Chess psychometrics – Ethnic Elo Gaps in the US

In the last post, we showed that US-American Ashkenazim have a higher average Elo rating than the average US chess player. In this post we extend the analysis to the ethnic and racial minorities African Americans, Asians and Hispanics.

Originally, I was just curious whether Asians had overtaken the Ashkenazim. This seems to be happening in several other measures of academic excellence. I assumed that to be the case, because chess rating first and foremost reflects how much work you have put into chess. So it likely responds strongly to the Asian work ethic.

When comparing Ashkenazim to the general US player I was methodologically quite lazy. I didn’t bother correcting for age because I assumed that the age structure would be similar enough. When looking at the Asian Elo this assumption no longer holds: Most current Asian chess players in the US are likely the kids of relatively recent immigrants.

Indeed, the average birth year of Asian chess players in the US seems to be 2001 vs 1986 for all players. Instead of using Elo directly, we look at the deviation from the average global Elo for the given birth year. This allows us to eliminate the effect of age.
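The age correction can be sketched in a few lines. This is only an illustration under assumptions: the (birth year, Elo) pairs are made up, and the function names are mine rather than anything from the actual analysis.

```python
from collections import defaultdict
from statistics import mean

def global_mean_by_birth_year(players):
    """Average Elo per birth year over the whole (global) player pool."""
    by_year = defaultdict(list)
    for birth_year, elo in players:
        by_year[birth_year].append(elo)
    return {year: mean(elos) for year, elos in by_year.items()}

def age_adjusted_deviation(group, baseline):
    """Mean of (player Elo minus global mean Elo for that birth year)."""
    return mean(elo - baseline[birth_year] for birth_year, elo in group)

# Made-up (birth_year, elo) pairs, for illustration only.
all_players = [(1986, 2000), (1986, 1900), (2001, 1500), (2001, 1700)]
group = [(2001, 1700), (1986, 2000)]

baseline = global_mean_by_birth_year(all_players)
print(age_adjusted_deviation(group, baseline))
```

A young group is then compared only against players of the same birth year, so a low raw average caused by age alone no longer counts against it.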

Global average Elo rating by age

We use the 2010 US Census [1] to create samples of Asian, Black, Hispanic and White chess players. These are all players whose names belong with a >90% probability to the respective ethnic group. (For Blacks we choose >80% probability, because otherwise the sample is almost nonexistent.) I restrict the samples to male players and birth years >1950.
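The surname filter might look roughly like this. The table below is a hypothetical stand-in with invented percentages, not the real census file, and the layout (surname mapped to per-group percentages) is an assumption for the sketch.

```python
# Hypothetical surname table: surname -> percent of bearers per group.
# All numbers here are invented for illustration.
SURNAME_PCT = {
    "NGUYEN": {"asian": 96.0, "white": 1.0, "black": 0.1, "hispanic": 0.6},
    "SMITH": {"asian": 0.5, "white": 70.0, "black": 23.0, "hispanic": 2.5},
    "EXAMPLEB": {"asian": 0.2, "white": 10.0, "black": 85.0, "hispanic": 1.0},
}

# Thresholds from the text: >90% in general, >80% for the Black sample.
THRESHOLDS = {"asian": 90.0, "hispanic": 90.0, "white": 90.0, "black": 80.0}

def classify(surname):
    """Return the group a surname is assigned to, or None if ambiguous."""
    pct = SURNAME_PCT.get(surname.upper())
    if pct is None:
        return None
    for group, threshold in THRESHOLDS.items():
        if pct.get(group, 0.0) > threshold:
            return group
    return None

print(classify("Nguyen"), classify("Smith"), classify("ExampleB"))
```

Names like the made-up "SMITH" entry fall through, which is exactly why the White and Black samples end up small.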

I also create an additional Ashkenazi sample using my list of Ashkenazi names that occur among Israeli chess players and are not obviously mostly non-Ashkenazi (I excluded Perez, Miller, Brown).

This results in 420 Asians, 60 Hispanics, 66 Ashkenazim, 391 Whites, 14 Blacks. The low number of Whites and Blacks is a result of the difficulty of clearly distinguishing these two groups by surname. The high number of Asians is a result of uniquely Asian names. Ashkenazim are probably much more overrepresented than they are in my samples.

These are the age-controlled deviations from the global average Elo for each group:
Asians 339
Ashkenazim 250
Whites 189
Hispanics 140
Blacks 125

With a White standard deviation of 215 we transform these numbers into IQ for a more intuitive comparison:

Asians 110.4
Ashkenazim 104.2
Whites 100.0
Hispanics 96.6
Blacks 95.5
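For reference, the conversion is presumably the usual one: anchor the White mean at 100 and count one White standard deviation as 15 points. A small sketch under that assumption (the function name is mine):

```python
WHITE_SD = 215.0  # White standard deviation in Elo points, from the text

# Age-controlled deviations from the global average Elo, from the text.
deviations = {
    "Asians": 339,
    "Ashkenazim": 250,
    "Whites": 189,
    "Hispanics": 140,
    "Blacks": 125,
}

def to_iq(deviation, reference=189.0, sd=WHITE_SD):
    """Map an Elo deviation onto a scale with mean 100 and SD 15."""
    return 100.0 + 15.0 * (deviation - reference) / sd

for group, deviation in deviations.items():
    print(group, round(to_iq(deviation), 1))
```

Because the deviations listed above are already rounded, the last digit can differ slightly from the values in the list.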

The differences are probably distorted by weaker groups being more predominantly sampled from the right tail. But the ranking is quite unsurprising. After all, the 2019 US chess champions are called Hikaru Nakamura and Jennifer Yu [2]. With the much younger age structure US chess is bound to become much more Asian dominated in the future, with the ex-Soviets fading into the background.

[1] US census 2010

[2] US chess championship 2019


Chess psychometrics – The Ashkenazi advantage

In this post we are going to take a look at the chess performance of Ashkenazim relative to other Europeans. Ashkenazim are massively overrepresented in the higher echelons of chess history, with almost half of all World Chess Champions having at least partial Ashkenazi ancestry. But of course, it is not a priori clear that this is the result of stronger chess playing ability on average. 

Our data set has some clear sampling issues. Generally only relatively strong players are going to play rated games, and the less developed the chess infrastructure of a country, the more that is going to be the case. For example, the highest average rating of all countries is exhibited by Cuba. Cuba is a legit strong chess playing country with a World Champion (Capablanca) and some current very strong players (Lazaro Bruzon, Leinier Dominguez) to its name. But if we compare the rating distribution to the distribution of the USA, we see that the higher average is due to the left part of the distribution missing, while there is still a gap at the right side.

In this figure we normalized the distributions by height. That’s not perfect, but it is probably better than normalizing by number of players, because that would overemphasize the right tail if the left tail is thinner due to undersampling. 
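To make the two normalizations concrete, here is a minimal sketch. The ratings and the 100-point bin width are made up, and the function names are mine.

```python
def histogram(ratings, bin_width=100):
    """Count ratings per bin, keyed by the left edge of the bin."""
    counts = {}
    for rating in ratings:
        left_edge = (rating // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

def normalize_by_height(counts):
    """Scale so that the tallest bin equals 1."""
    peak = max(counts.values())
    return {edge: count / peak for edge, count in counts.items()}

def normalize_by_count(counts):
    """Scale so that all bins sum to 1."""
    total = sum(counts.values())
    return {edge: count / total for edge, count in counts.items()}

# Made-up ratings, for illustration only.
sample = [1500, 1520, 1610, 1650, 1690, 1700]
counts = histogram(sample)
print(normalize_by_height(counts))
print(normalize_by_count(counts))
```

With height normalization, a missing left tail leaves the remaining bins unchanged relative to the peak, whereas count normalization inflates every remaining bin.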

For the Ashkenazim, we can partly circumvent that problem by comparing US Ashkenazim to all other US players. Then at least the chess infrastructure is the same, although a group with a lower mean might still be undersampled at the left tail, possibly reducing the difference. This is also a high bar to clear for the Ashkenazim, because the US has stronger rated players than comparable Western European countries.

We look at US players with a typical Ashkenazi name. We circumvent the problem discussed in my post “Counting Names” by only considering names that also occur among Israeli players. This makes it unlikely that we pick up German, English or Spanish names that are also (but rarely) found among Ashkenazim. 

The figure shows an Ashkenazi advantage both at the left tail and the right tail. Contrary to my assumptions it is bigger at the left tail. It might be the case that fewer Ashkenazim commit to high level chess due to a comparative advantage in other fields. That is for example very noticeable in Germany, where there is a strong chess infrastructure, a general high level, but almost no (native) chess professionals and consequently no international contenders. 

With these caveats in mind, the average US Ashkenazi rating is 2014 while the average US non-Ashkenazi rating is 1924, a difference of 90 points or 0.40 standard deviations. In terms of IQ this translates to 106, which is not completely out of line with other cognitive measures, especially if one takes into account that the g-loading of chess is likely not very high. 

Chess psychometrics – The gender equality paradox

Chess databases contain millions of games, whose players can largely be identified by the Fide players database [1] which contains age, sex, nationality and ratings. These games are interactions providing information about behavior in a competitive context. They are a goldmine for psychological or sociological research into a wide range of topics. The datasets derivable from chess databases are much larger than what can be realistically achieved in typical psychological research. As a researcher you are really only limited by your imagination and the number of your grad students.

While the typical university professor is severely limited in the former, I am unfortunately limited in the latter. So, we will see how many of my chess psychometrics projects I’ll be able to bring to completion. For now we’ll start with something simple and not very original: We will check whether the gender equality paradox holds in chess.

The gender equality paradox is the observation that women in more gender equal societies tend to choose more stereotypically female occupations and are less likely, for example, to go into STEM. The gender imbalance in chess is very comparable to the imbalance in STEM research [2]. In fact, it is usually even more extreme, with women in many countries representing less than 5% of the pool of rated players.

Outside of developed countries, the number of rated players is often quite small and not very representative when it comes to age, rating or possibly sex. So it is not surprising that on a global level we find no correlation between the global gender gap index and the fraction of female players.

In European countries however, there is a significant negative correlation between gender equality and the fraction of female players. (Yes, Turkey is for some reason in my list of European countries.)

Pearson correlation: -0.43603692297301305, p-value: 0.01119225533794084
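For readers who want to reproduce this kind of number, the coefficient itself is easy to compute by hand. The sketch below is a plain textbook implementation with made-up inputs; it does not compute the p-value, for which a statistics package would be the obvious choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up (gender gap index, female player fraction) values.
ggg = [0.65, 0.70, 0.75, 0.80]
female_fraction = [0.12, 0.10, 0.09, 0.05]
print(pearson_r(ggg, female_fraction))
```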

This looks like a straightforward result. However, I am generally skeptical about the significance of these kinds of correlations, because I suspect that often the significance is a result of countries falling into a small number of similarly behaving clusters. If these clusters are then arranged linearly by chance, we get a significant correlation by virtue of decomposing these clusters into many countries.

So it might be the case that Northern countries all have high gender equality and low female chess player fractions by chance, while Eastern Europeans have low gender equality and high female chess participation for historical reasons. Because these clusters and the rest of the countries in between constitute a lot of observations, the result looks a lot more robust than it really is.

Sure enough, there is no such correlation in Eastern Europe, and restricted to Western Europe the correlation loses all significance.

On the other hand, the loss of statistical significance is due to just two outliers: Iceland and France. So is the gender equality paradox a thing in chess or not?

To detect even a rather weak tendency, we average over all countries that fall into the same section of the GGG-index. This time we look at all countries in our dataset.
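The averaging step could be sketched like this. The bin width of 0.05 and the country values are assumptions for illustration; the text does not say how wide the sections of the index were.

```python
from collections import defaultdict
from math import floor
from statistics import mean

def binned_means(countries, bin_width=0.05):
    """Average the female player fraction over countries in the same index bin.

    `countries` is a list of (ggg_index, female_fraction) pairs. The result
    maps the left edge of each bin to the mean fraction in that bin.
    """
    bins = defaultdict(list)
    for ggg, fraction in countries:
        # Rounding first guards against float artifacts such as 0.7 / 0.05.
        bin_number = floor(round(ggg / bin_width, 9))
        bins[bin_number].append(fraction)
    return {round(number * bin_width, 2): mean(values)
            for number, values in sorted(bins.items())}

# Made-up (ggg_index, female_fraction) pairs.
demo = [(0.62, 0.10), (0.64, 0.14), (0.71, 0.08), (0.74, 0.06), (0.81, 0.05)]
print(binned_means(demo))
```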

If we ignore the four countries with the lowest gender equality which average very low, we actually see a nice downward trend in female chess playing the higher the gender equality. I tentatively conclude that the gender equality paradox does actually exist in chess.

[1] Fide player database

[2] The Gender-Equality Paradox in STEM Education

Demographic Change in France: Discussion

The typical rightwing theory would be that the immigrant population is outbreeding the natives due to much higher birthrates. Is this the story behind the sickle cell data? If we translate the percentages into absolute numbers based on the number of births in each relevant year, we get absolute births:
2000: 778900
2007: 785985
2010: 802224
2012: 790290
2013: 781621
2015: 760421
Sickle cell tested newborns:
2000: 147991
2007: 223613
2010: 252701
2012: 272176
2013: 279039
2015: 295804
Not sickle cell tested newborns:
2000: 630909
2007: 562372
2010: 549523
2012: 518114
2013: 502582
2015: 464617

This is a growth of 4.72% per year and a shrinkage of 2.02% per year respectively. Let’s imagine there was a French population and an immigrant population established maybe in the 1960s. And both populations were breeding merrily away with a perfectly steady fertility rate. In this closed system, what kind of fertility rates would account for the growth rates we see?
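These rates follow from the first and last rows of the lists above, assuming a compound annual rate between 2000 and 2015 (the intermediate years are not used in this sketch):

```python
def annual_growth(start, end, years):
    """Compound annual growth rate between two counts."""
    return (end / start) ** (1 / years) - 1

# Counts for 2000 and 2015, taken from the lists above.
tested_growth = annual_growth(147991, 295804, 15)
untested_growth = annual_growth(630909, 464617, 15)

print(f"tested:   {tested_growth:+.2%} per year")
print(f"untested: {untested_growth:+.2%} per year")
```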

Well, with a generation length of 30 years (and human generation lengths almost always fall close to 30 years even with very different fertility rates), a fertility rate of 7.97 kids per woman for the immigrant population and 1.08 kids per woman for the French population would lead exactly to the growth rates we calculated.
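The arithmetic behind these figures, as far as I can reconstruct it, assumes that exactly 2.0 kids per woman keep a population constant (no child mortality, equal sex ratio), so the growth factor per generation is simply the fertility rate divided by two:

```python
GENERATION_YEARS = 30     # generation length from the text
REPLACEMENT_LEVEL = 2.0   # simplifying assumption: 2.0 kids per woman is replacement

def implied_fertility(annual_growth_rate):
    """Steady fertility rate that would produce the given annual growth."""
    growth_per_generation = (1 + annual_growth_rate) ** GENERATION_YEARS
    return REPLACEMENT_LEVEL * growth_per_generation

print(round(implied_fertility(0.0472), 2))   # immigrant population
print(round(implied_fertility(-0.0202), 2))  # French population
```

With the rounded growth rates as input, the results land within a hundredth or so of the figures quoted above.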

That’s of course insane. According to the studies I have seen on this topic, immigrant birth rates have never been close to 8 kids per woman and these days they are certainly far lower. The French fertility rate of 1.08 kids per woman is also crazy low, because we made the assumption of a steady rate over many generations. If the rate was higher in the past, today’s birth rate would have to be even lower to account for the decline. Or conversely, if the birthrate today was actually higher, it must have been below 1.0 in the past.

So how do we square the circle? The sickle cell birth rate increase has to be predominantly driven by recent immigration. That fits the numbers. Birthrates among immigrants usually drop relatively quickly towards the birth rate typical of the country. Legal immigration to France has been massive in recent years. And there is also illegal immigration, estimated by Wikipedia to lie between 80,000 and 100,000 per year [1]. The story is probably one of young immigrants coming to France and then having kids, which also implies that a total immigration stop might lead to a reduction of the percentage of sickle cell babies.

But how can the native French birth rate be this low? The answer is probably that there is a certain amount of intermarriage, which gets counted for the sickle cell numbers, and the overall fertility rate of ethnic French women is pretty close to that in neighbouring European countries, maybe in the vicinity of 1.4.

By the way, the current rates of growth and decline predict parity in sickle cell births and non-sickle cell births in 2022 and a 66% majority for the former in 2032. Of course, for reliable predictions a more sophisticated model is needed than just extrapolating growth rates.
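The extrapolation is nothing more than letting both 2015 counts grow at their constant rates. A sketch of that naive model (function names are mine):

```python
from math import log

def years_until_share(tested, untested, g_tested, g_untested, share=0.5):
    """Years until tested / (tested + untested) reaches `share`."""
    target_ratio = share / (1 - share)
    return (log(target_ratio * untested / tested)
            / log((1 + g_tested) / (1 + g_untested)))

def projected_share(tested, untested, g_tested, g_untested, years):
    """Share of tested newborns after `years` of constant growth rates."""
    t = tested * (1 + g_tested) ** years
    u = untested * (1 + g_untested) ** years
    return t / (t + u)

# 2015 counts and annual rates from the text.
args = (295804, 464617, 0.0472, -0.0202)
print(2015 + years_until_share(*args))            # year of parity
print(projected_share(*args, years=2032 - 2015))  # share in 2032
```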

I am not somebody to grieve for the French genepool, but I think this rapid change is dangerous for a variety of reasons.

It seems probable that within a decade or two, most French people will wake up to a reality where France is still 70% white, but the future is very noticeably 70% black. How will they react?

If the relative growth rates hold and at some point the political power changes hands, how will that affect the ethnic French? Even in a very peaceful best case scenario the new government will have been brought into power by an electorate much younger than the opposition, and with wages and pensions much lower than those of the opposition. In this situation a massive cut in pensions is the logical result in a democracy.

Or maybe the percentage stabilizes somewhere and French and Africans just have to live side by side. Well, the Basques, the Northern Irish, the Ukrainians and all inhabitants of Balkan states will tell you that even in Europe, having different ethnic groups in one country is not a recipe for peace. What does the trouble in the banlieues look like if scaled up five-fold?

I am not sure how big the achievement gap between second generation immigrants and the ethnic French is. But if there is a significant gap, the massive influx of lower qualified workers into the labour market will retard economic growth. That’s not gonna work wonders for ethnic relations. All in all a very worrying development, with France not the only European country in which rapid demographic change might lead to major upheaval in the next decades.

[1] Illegal Immigration to France

Demographic Change in France: The Numbers

A few years back certain medical data made a big wave in right-wing circles, never quite spilling over into mainstream media. The data in question consisted of percentages of newborns tested for sickle cell anemia in mainland France. In France, only newborns that have at least one parent originating from a region in which sickle cell anemia is common are tested for the disease. As sickle cell anemia is mostly prevalent in Africa, these percentages were taken as a stand-in for the percentage of French newborns of African heritage.

The screening data suggested that in 2000, 19 percent of babies born in mainland France (excluding overseas departments) were of African origin, a number that rose steadily to 38.9 percent in 2015. This is certainly surprising. If these numbers are correct, France’s ethnic makeup seems poised to jump from entirely European to basically Brazil within two generations.

To me, these data are worth investigating for several reasons.

During the last decades the media fed us a steady diet of articles about the French family-friendly policies that were the reason for the birth rate collapse failing to materialize in France. It would certainly be interesting if that was just nonsense and the real reason was a more fecund (or just bigger) class of immigrants.

Ethnic replacement is a centerpiece of rightwing agitation. Of course, the media tells us that it is just a conspiracy theory. Just as with the French birth rate, I am very much interested in the extent of lies told to me by mainstream media outlets. Call it a desire for informational emancipation.

Ethnicity correlates with lots of variables of interest. Quantifying such a rapid change would allow predictions of crime rates, economic growth, human capital, unemployment, etc. Rapid change of any sort is often accompanied by many dangers. If you don’t know about the change, you can’t look out for the dangers.

There are several arguments against equating sickle cell screening with African origin. Among the countries that provided significant numbers of immigrants to France, sickle cell anemia is prevalent in Italy, Greece and Turkey aside from the Maghreb, Subsaharan Africa and the Caribbean. However, the number of recent European immigrants from sickle cell regions is too small to account for more than a few percentage points.

It has also been argued that some hospitals do not distinguish by origin, but instead test all newborns. That is entirely possible, however, it leads to a dilemma. If the absolute number of newborns at risk for sickle cell anemia is overestimated, the growth rate has to be underestimated!
Or to put it differently: If the 19% in 2000 were actually just 10% because 9% were due to unnecessary testing, then to get to 39% in 2015 the percentage of actual kids at risk had to triple from 10% to 30% instead of double from 19% to 39%.
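The same point in code, assuming the unnecessarily tested share of births stays constant over the whole period (which is the scenario described here):

```python
def true_growth_factor(observed_start, observed_end, spurious):
    """Growth factor of the genuinely at-risk share of births.

    All arguments are percentages of all births. `spurious` is the share
    that is tested without need and is assumed constant over time.
    """
    return (observed_end - spurious) / (observed_start - spurious)

print(true_growth_factor(19, 39, 0))  # no unnecessary testing
print(true_growth_factor(19, 39, 9))  # 9 points of unnecessary testing
```

The larger the constant spurious share, the steeper the growth of the real at-risk share has to be to produce the observed numbers.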

Alternatively, the number of hospitals just testing everybody has steadily risen. In which case the entire data is worthless. Or the original study could just be a hoax by a devious far-right physician. Who knows?

So the first point on our agenda is trying to independently verify the plausibility of the data.

To this end, I downloaded the data for given names in France provided by the French bureau for statistics, INSEE [1]. I also created a list of 2211 popular Muslim names, specifically Arab and Turkish names. Not all of the sickle cell tested babies will be of Arab or Turkish origin. And not all kids of Arab and Turkish origin will be given Arab and Turkish names. And additionally, my list probably doesn’t cover more than a small chunk of all actual Arab and Turkish names. But it still allows us to track the increase of a certain subset of all kids that would be subject to sickle cell testing.

A first quick and dirty run of the numbers: In 2000, out of 800039 kids my list covers 46718 or 5.8%. In 2015, my list covers 80387 out of 777746 names, or 10.3%. This amounts to an estimated 1.77-fold increase of Arab/Turkish newborns over the time span in which the sickle cell percentage roughly doubled, which is reasonably close.
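The calculation behind the 1.77 is just a ratio of two coverage shares, using the counts quoted above:

```python
def coverage(covered, total):
    """Fraction of all newborns whose given name is on the list."""
    return covered / total

share_2000 = coverage(46718, 800039)
share_2015 = coverage(80387, 777746)
increase = share_2015 / share_2000

print(f"{share_2000:.1%} -> {share_2015:.1%}, factor {increase:.2f}")
```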

However, out of my 2211 names only 103 and 127 actually occur in the INSEE list of given names for the years 2000 and 2015. Only 77 names occur in both lists. Some of these names are clearly not just popular among Muslims; girls’ names especially are often ambiguous. So let’s try to tighten up the method.

Now, we only look at names present in both years. We exclude all ambiguous names. Each remaining name provides a separate estimate of how much the percentage of Muslim newborns has changed between 2000 and 2015. This time the overall percentage accounted for by these 56 names almost exactly doubles from 1.95% to 3.89%. The median increase, which should be more robust against outliers (like short term trends in popularity), is also exactly 2.0.
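A sketch of the per-name estimate follows. The name counts and placeholder names are invented, only the total numbers of births are taken from above, and the function name is mine.

```python
from statistics import median

def share_ratios(counts_2000, counts_2015, births_2000, births_2015):
    """Per-name ratio of the 2015 share to the 2000 share.

    Only names present in both years are kept.
    """
    ratios = {}
    for name in counts_2000.keys() & counts_2015.keys():
        share_2000 = counts_2000[name] / births_2000
        share_2015 = counts_2015[name] / births_2015
        ratios[name] = share_2015 / share_2000
    return ratios

# Invented counts for placeholder names, for illustration only.
counts_2000 = {"name_a": 400, "name_b": 900, "name_c": 150}
counts_2015 = {"name_a": 800, "name_b": 1500, "name_d": 60}

ratios = share_ratios(counts_2000, counts_2015, 800039, 777746)
print(median(ratios.values()))
```

Taking the median over names means a single name that suddenly becomes fashionable cannot dominate the estimate.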

To my mind this provides strong confirmation that the sickle cell data is correctly interpreted as showing that the percentage of a predominantly African-derived immigrant population among the newborns in France has doubled between 2000 and 2015. Confirmation of the growth rate makes it rather unlikely that the absolute percentage numbers are off by any significant degree.

I did these analyses quite some time ago. At one point I became aware that my given name analysis had been scooped by a French far-right website. (Which was one motivation to finally get the blog going.) In their analysis they try to capture all Muslim names and give a definite estimate of the absolute numbers. They handle ambiguous names by just counting them as half a Muslim. According to their analysis the number of Muslim newborns more than doubled between 2000 and 2015.

This got me thinking about how to do this analysis right. Counting ambiguous names as half is a really ugly hack, likely to overcount names as long as Muslims are a minority. Instead one might use the regional and temporal variation to infer for each name separately how it contributes to the number of Muslims.

Once you have done that you can subtract a precise estimate of the number of Muslims from the sickle cell data to get an estimate of the increase of Subsaharan Africans for each region. This allows you to do the same inference for SS-African names, which are probably much more ambiguous than the Islamic ones.

If that works, you have ended up with a method to create precise estimates for both groups directly from given names, even in the likely case that the sickle cell data stops being published. Unfortunately this takes quite a lot of time. And of course there is no guarantee that it would work. Maybe a project for the future.


Four problems with cousin marriage

Cousin marriage was prevalent all over the world with the big exception of western Europe [1]. It still is especially common in the Islamic world. Marrying your relatives has the advantage of keeping the family together. Clans of up to several hundred closely related persons are the result and especially in a pre-state context, that is a pretty useful organisational unit. However, from a genetic or evolutionary perspective there are several potential problems with cousin marriage.

The obvious one is the prevalence of homozygosity runs, i.e. sections in the genome that are identical for the chromosome coming from the father and the chromosome received from the mother. These are generally problematic, because the other chromosome copy has to step in whenever something is significantly messed up in one chromosome. Homozygosity runs mean that mutational load hits with full force for some sections of the genome. There is probably an IQ loss of several points and congenital diseases become much more common.

However, all problems caused by homozygosity runs can be fixed by a single outbreeding event. But what if living in a clan environment has reduced the selection for individual achievement in a population for hundreds of years? The welfare state is often blamed for the reduced or reversed selection for positive traits, but a clan is a form of welfare state. The clan provides you with a job, a wife, takes care of you when you fall ill or lose your ability to feed your kids. It is conceivable that the existence of clan structures prevented the slow replacement of the lower class by the middle class that is conjectured to have raised the IQ in Europe until the nineteenth century [2].

Clan borders also work to a certain degree as genetic barriers. This means that positive mutations have a much harder time sweeping the population. If the default is marrying a relative, a positive mutation will have to sweep each clan separately and additionally jump from clan to clan.

The fourth potential problem I see is a reduced response to selection. Response to selection depends on the variance of the trait in question. Variance within each clan (not necessarily within the full population) will be lower for two reasons: The reduced genetic diversity and less assortative mating. Depending on how the selection pressure is structured this might reduce the speed of genetic adaptation.

It is possible that these four factors played a role in the precipitous fall of intellectual productivity that the Islamic world has experienced since the Islamic Golden Age [3].

[1] Cousin marriage Europe

[2] Farewell to alms

[3] Cousin marriage Middle East

Too much of a good thing

The gay germ theory proposes that there is a pathogen that causes male homosexuality [1]. The reasoning behind this theory is that the fitness hit of homosexuality is too big to allow a genetic explanation. I.e. any genetic variation involved would very quickly be weeded from the gene pool and the occurrence of homosexuality would depend on de novo mutations and be very rare. Pathogens on the other hand, can do with us whatever they want to, because we cannot out-evolve them.

However, there are several indications that an immune reaction of the mother is somehow involved [2]. Here, the idea would be that embryonic tissue that expresses proteins alien to the immune system is attacked and damaged. This would explain an underdevelopment of a hypothalamic nucleus responsible for a male heterosexual orientation, which expresses male specific genes during masculinization and defeminization in utero.

If we combine both ideas, we end up with a putative pathogen that might trigger an immune response against male specific proteins. This gives us a hint how to identify the pathogen in question: It would have to be a pathogen that shares an epitope with such a male specific protein. An epitope is a surface of a folded protein which is recognized by the immune system’s antibodies.

Unfortunately identifying epitopes from protein sequences requires a solution to the protein folding problem. So we are not going to be able to do it just by downloading a bunch of genetic sequences.

However, it is entirely possible that there isn’t actually a pathogen involved. The effect of shared environment on sexual orientation, for example, is zero [3]. Pretty weird for an infection. Instead male homosexuality might be a case of what I like to call the “too much of a good thing”-failure mode of evolution. Occasionally evolution finds itself in dead ends, where there is a selection pressure towards a good thing, and a catastrophic failure mode whenever there is too much of the good thing.

One example of this are the trisomies. Here the good thing is having a big ovum [4]. Human egg cells are pretty big and for good reason. After fertilization they have to divide quickly and set up shop in the uterus. If they run out of gas before the placenta is in place, that’s it. One way ova get big is very unequal division in the last two rounds of cell division. One new cell keeps all the cell plasma, the other one is discarded. Very unequal cell division results in a big final cell, but it also increases the likelihood that not all of the chromosomes meant for the small cell can be stashed in it.

Another example might be autism. One of the symptoms of autism is neural overgrowth in some parts of the cortex [5]. And while the average autist does not do too well on an IQ test, extreme precociousness in children is often accompanied by autism. It seems growing too many neurons and learning too fast has catastrophic failure modes too.

This could be going on with male homosexuality. A strong immune system is nice, but not if it attacks vital parts of your unborn child’s brain. Or conversely, toning down the immune system to accommodate your child is a fine thing, if it doesn’t get one or both of you killed. One could argue that evolution should find some sideway avenue to avoid the failure mode. But it didn’t for the trisomies, nor did it for autism.

[1] Gay germ theory

[2] Antibodies against male specific proteins in mothers of gay sons

[3] Shared environment of homosexuality is zero.

[4] The ovum is large.

[5] Brain overgrowth in autism