Too much of a good thing

The gay germ theory proposes that a pathogen causes male homosexuality [1]. The reasoning behind this theory is that the fitness cost of homosexuality is too large to allow a genetic explanation. That is, any genetic variants involved would very quickly be weeded out of the gene pool, so homosexuality would depend on de novo mutations and be very rare. Pathogens, on the other hand, can do with us whatever they want, because we cannot out-evolve them.

However, there are several indications that an immune reaction of the mother is somehow involved [2]. Here, the idea is that embryonic tissue expressing proteins that are foreign to the mother's immune system is attacked and damaged. This would explain the underdevelopment of a hypothalamic nucleus responsible for male heterosexual orientation, which expresses male-specific genes during masculinization and defeminization.

If we combine both ideas, we end up with a putative pathogen that might trigger an immune response against male-specific proteins. This gives us a hint on how to identify the pathogen in question: it would have to share an epitope with such a male-specific protein. An epitope is the part of a folded protein's surface that is recognized by an antibody.

Unfortunately, identifying epitopes from protein sequences requires a solution to the protein folding problem, so we are not going to be able to do it just by downloading a bunch of genetic sequences.

However, it is entirely possible that there isn’t actually a pathogen involved. The effect of shared environment on sexual orientation, for example, is zero [3], which is pretty weird for an infection. Instead, male homosexuality might be a case of what I like to call the “too much of a good thing” failure mode of evolution. Occasionally evolution finds itself in a dead end, where there is selection pressure towards a good thing and a catastrophic failure mode whenever there is too much of that good thing.

One example of this is the trisomies. Here the good thing is a big ovum [4]. Human egg cells are pretty big, and for good reason: after fertilization they have to divide quickly and set up shop in the uterus. If they run out of gas before the placenta is in place, that’s it. One way ova get big is through very unequal division in the last two rounds of cell division: one new cell keeps almost all of the cytoplasm, the other one is discarded. Very unequal division results in a big final cell, but it also makes it more likely that the small cell does not receive its full half of the chromosomes.

Another example might be autism. One feature of autism is neural overgrowth in some parts of the cortex [5]. And while the average autist does not do too well on an IQ test, extreme precociousness in children is often accompanied by autism. It seems that growing too many neurons and learning too fast have catastrophic failure modes too.

The same could be going on with male homosexuality. A strong immune system is nice, but not if it attacks vital parts of your unborn child’s brain. Or, conversely, toning down the immune system to accommodate your child is a fine thing, as long as it doesn’t get one or both of you killed. One could argue that evolution should find some side route around the failure mode. But it didn’t for the trisomies, nor did it for autism.

[1] Gay germ theory

[2] Antibodies against male specific proteins in mothers of gay sons

[3] Shared environment of homosexuality is zero

[4] The ovum is large

[5] Brain overgrowth in autism


Counting names

In the last blog post we counted a lot of names to determine how NE-Asians are represented among songwriting competition winners. In this post we take a closer look at methods for determining the ethnic composition of a dataset by surname analysis.

Without the anatomical baggage, the theory of NE-Asian relative underperformance boils down to “whatever Ashkenazim have that makes them successful (beyond the absolute IQ-value), NE-Asians probably have slightly less of it than Europeans”.

So we are naturally interested in the Ashkenazi representation among songwriting competition winners. We try to estimate it using two datasets of Ashkenazi names [1],[2]. With the first set of names we count 17% Ashkenazi names among songwriting competition winners; with the second set we find only 6.5%. This mostly tells us that we cannot reliably assess the number of Ashkenazim with this method. The lists contain too many names that are also quite common among gentiles, and on the other hand it is unclear how complete they are.

There has been much speculation about whether Ashkenazi intellectual performance is declining due to outmarriage and low fertility. Even if our method is too crude to give precise percentages, we can at least say that this dataset shows no evidence of declining performance.

It seems counting Ashkenazim is methodologically much harder than counting NE-Asians. The same of course holds for African-Americans or Hispanics, who share a lot of surnames with White Americans.

Counting Ashkenazim is a time-honoured method that has been used to advance many different theories. Given the Ashkenazi IQ advantage of 10 points, Ashkenazi overrepresentation can, for example, be used to gauge the intellectual difficulty of a feat. For very high-profile samples it is possible to check ethnicity by hand. For the NE-Asian songwriters this was possible only because a look at a picture is enough. For Ashkenazim this is generally not possible, which leaves a lot of room for data fudging. We would really like to do better than that.

How do we accurately and objectively assess the likely number of Ashkenazim in a sample of surnames? Let’s assume we have the probability distribution of Ashkenazi family names, i.e. the frequency with which each name appears in the Ashkenazi population. We also have the surname distribution of the general US population. We can then create a mixed distribution of x% Ashkenazim and (100 − x)% non-Ashkenazim and calculate the likelihood of our sample under this mixed distribution. The likelihood is just the product, over all names in the sample, of each name’s frequency in the mixed distribution. By doing this for different values of x, we can find the mixture under which our sample is most likely. The x% used to create this maximum-likelihood distribution is our best guess at the Ashkenazi percentage in our sample. We may have to adjust this percentage by the fraction of the population that is actually covered by our name database.
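A minimal sketch of this maximum-likelihood estimate in Python, assuming the two surname distributions are available as hypothetical dictionaries mapping surname to relative frequency within each population (for example, census counts divided by the group total). It sums log-frequencies instead of multiplying raw frequencies, which has the same maximum but avoids numerical underflow for large samples, and finds the best x by a plain grid search. The function names, the toy surnames "k" and "s", and the small floor for surnames missing from both tables are illustrative assumptions, not part of the original analysis.

```python
import math


def log_likelihood(sample, freq_a, freq_rest, x, floor=1e-12):
    """Log-likelihood of a surname sample under a mixture in which a
    fraction x comes from group A and 1 - x from everybody else.

    freq_a / freq_rest: hypothetical dicts mapping surname -> relative
    frequency of that surname within the respective population.
    """
    total = 0.0
    for name in sample:
        # Frequency of this surname in the mixed distribution.
        p = x * freq_a.get(name, 0.0) + (1 - x) * freq_rest.get(name, 0.0)
        # The floor (an assumption) keeps unknown names from giving log(0).
        total += math.log(max(p, floor))
    return total


def ml_fraction(sample, freq_a, freq_rest, steps=1000):
    """Grid search over x in [0, 1] for the maximum-likelihood fraction."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: log_likelihood(sample, freq_a, freq_rest, x))


# Toy usage with made-up frequencies: "k" is common in group A,
# "s" is common in the rest of the population.
freq_a = {"k": 0.5, "s": 0.5}
freq_rest = {"k": 0.1, "s": 0.9}
sample = ["k"] * 3 + ["s"] * 7
print(ml_fraction(sample, freq_a, freq_rest))  # 0.5
```

Keeping the whole curve of `log_likelihood` over the grid, rather than only its maximum, gives the distribution of likely percentages.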

It may seem easier to just add up the Ashkenazi fractions for each name in the sample. So if we have ten names, and in the general population 10% of the bearers of each of these names are Ashkenazim, these ten names add up to one full Ashkenazi. Unfortunately, this undervalues overrepresented groups. In a sample with five-fold overrepresentation, the odds that a bearer of one of these names is Ashkenazi rise from 1:9 to roughly 5:9, so each name should count as about 36% Ashkenazi rather than 10%. One idea would be to calculate one estimate, use it to update the expected Ashkenazi fraction for each name and then iteratively get closer to the real value, but the maximum likelihood method also gives us a distribution of likely percentages.
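The iterative idea can be sketched as follows, again with hypothetical frequency dictionaries as inputs. Under this mixture model, the first pass started at the population share is exactly the naive per-name summation, and repeating the update is the expectation-maximization (EM) iteration for the mixing weight of a two-component mixture with known components, which converges to the same maximum-likelihood fraction. The names `responsibilities`, `naive_fraction` and `em_fraction` and the toy data are illustrative, not from the original.

```python
def responsibilities(sample, freq_a, freq_rest, x):
    """Probability that each surname's bearer belongs to group A,
    given that a fraction x of the sample comes from group A."""
    out = []
    for name in sample:
        a = x * freq_a.get(name, 0.0)
        b = (1 - x) * freq_rest.get(name, 0.0)
        out.append(a / (a + b) if a + b > 0 else 0.0)
    return out


def naive_fraction(sample, freq_a, freq_rest, population_share):
    """Naive estimate: average each name's group-A share in the population."""
    r = responsibilities(sample, freq_a, freq_rest, population_share)
    return sum(r) / len(sample)


def em_fraction(sample, freq_a, freq_rest, x0, tol=1e-12, max_iter=10_000):
    """Repeat the naive update, each time using the latest estimate as the
    assumed share, until it stops changing (EM for the mixing weight)."""
    x = x0
    for _ in range(max_iter):
        x_new = sum(responsibilities(sample, freq_a, freq_rest, x)) / len(sample)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Made-up example: group A makes up 10% of the population.
freq_a = {"k": 0.5, "s": 0.5}
freq_rest = {"k": 0.1, "s": 0.9}
sample = ["k"] * 3 + ["s"] * 7
print(round(naive_fraction(sample, freq_a, freq_rest, 0.1), 3))  # 0.148
print(round(em_fraction(sample, freq_a, freq_rest, 0.1), 3))     # 0.5
```

The naive estimate stays close to the population share, while the iteration moves all the way to the maximum-likelihood value, which is the undervaluation described above.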

The 2010 US census provides surname distributions for the general US population as well as for Whites, Blacks, Asians and Hispanics [3]. This allows us to test the method, at least for these groups. For the songwriting competition winners, the likelihood peaks at an Asian percentage of 0.8%, which would correspond to 23 artists. Given that the last blog post looked only at NE-Asians, this result is quite compatible with our earlier one.

Likelihood for percentage of non-Asians peaks at 99.8%.

So we do have a method and we do have a motive, but unfortunately we don’t have a distribution of Ashkenazi surnames in the US. One way to get one would be to scrape the names of Holocaust victims from the Yad Vashem database [4]. However, that is certainly somewhat disrespectful, and it is unclear whether that distribution is all that similar to the current American one. The other way is to collect names of American Jews from a variety of sources, a hundred here and a hundred there, until a significant fraction of the most common names is covered. However, I am currently too lazy to do that. But maybe I will get around to it if something a lot more interesting than songwriting competition winners turns up.

[1] Ashkenazi names 1

[2] Ashkenazi names 2

[3] 2010 US Census

[4] Holocaust victim names

Verbal IQ and songwriting – NE-Asian underperformance

In my two-part blog post “A theory of intelligence” I examine the unusual IQ profiles of both Ashkenazi Jews (high verbal) and NE-Asians (high math-spatial) to propose a theory of intelligence. This theory tries to explain NE-Asian underperformance in GDP and science relative to their very high IQ by positing that NE-Asians create fewer lateral and top-down synapses. This would lead to slightly lower verbal IQ and conceptual creativity compared to Europeans and especially compared to Ashkenazim.

One of my intuitions is that verbal IQ tests do not pick up on this difference particularly well, because they also load on knowledge and pattern recognition. I wondered whether tail effects in verbally creative endeavors might lend support to my theory. To this end I analyzed a dataset of songwriting competition winners [1].

The dataset consists of 2875 US artists who won prizes or honorable mentions in the years 2002-2017. To identify NE-Asian artists, I compared the names against the most common Korean [2], Japanese [3] and Chinese [4] surnames. These surnames cover roughly 90%, 33% and 84.8% of these populations, respectively. I then checked each hit by hand to exclude anybody who is provably not Asian (quite likely for some names like Young, Shaw or Lee).

Chinese Americans constitute 1.5% of the US population, Japanese Americans 0.4% and Korean Americans 0.8%. Multiplying by the sensitivity of our method, we expect $(0.015 \times 0.848 + 0.004 \times 0.33 + 0.008 \times 0.90) \times 2875 \approx 61$ hits for a perfectly proportional representation of NE-Asian Americans.
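As a quick check of this arithmetic, a minimal Python sketch using only the shares and surname-list coverages quoted above (the dictionary layout is just one convenient choice):

```python
# Population shares and surname-list coverage, as quoted in the text.
groups = {
    # group: (share of US population, fraction covered by surname list)
    "Chinese": (0.015, 0.848),
    "Japanese": (0.004, 0.33),
    "Korean": (0.008, 0.90),
}
n_winners = 2875

# Expected number of hits if NE-Asian Americans were represented proportionally.
expected_hits = n_winners * sum(share * coverage for share, coverage in groups.values())
print(round(expected_hits))  # 61
```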

Instead, we find only 13 NE-Asian names that cannot be excluded, more than a four-fold underrepresentation. Of course, one may argue that this is the result of language deficiencies due to relatively recent immigration. However, there is no upward trend visible over these 15 years either. Only seven of these artists are unambiguously Japanese, Chinese or Korean American. Of the rest, one is Japanese but not American, one is Malaysian, one is Taiwanese (not included in the 1.5%) and four I could not identify.

Also, perfectly proportional representation may be the wrong baseline to compare against. Of the 1554 winners of the 2019 US Open Music Competition [5], a classical music competition, 1050 have NE-Asian surnames. Their surnames are also much more typical, the most common being Yang, Wang, Chen, Li, Truong, Zhang, Liu, Loo, Wu and Lin. This makes it plausible that we are still overcounting NE-Asians in the songwriting dataset.

There is the additional fudge factor that you can’t tell who wrote the lyrics. Christopher Tin, for example, whom we counted twice, is a classical composer. His most famous piece is Baba Yetu, the theme song of Civilization IV, whose lyrics are a Swahili version of the Lord’s Prayer.

Overall, we see at least a 4.5-fold underrepresentation relative to the population share, and a roughly 150-fold underrepresentation relative to the share among classical music winners. I chalk this up as consistent with my theory.
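Both ratios follow from counts already given in this post. A small Python check, treating the 61 expected hits from the formula above as the proportional baseline:

```python
expected_hits = 61                 # proportional baseline from the formula above
observed_hits = 13                 # NE-Asian names that could not be excluded
songwriting_share = observed_hits / 2875
classical_share = 1050 / 1554      # NE-Asian surnames, 2019 US Open Music Competition

print(round(expected_hits / observed_hits, 1))        # 4.7
print(round(classical_share / songwriting_share))     # 149
```

Both shares are raw name counts, so the second ratio is only as comparable as the two surname-matching procedures are.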

[1] International songwriting competition

[2] Korean surnames

[3] Japanese surnames

[4] Chinese surnames

[5] US Open Music Competition Winners

Hereditarianism III: Discussion

In the last post, we saw that for African-Americans and Hispanics, IQ varies according to ancestry. In this post we will discuss what this actually means and whether there is still room for the environmentalist to wriggle.

The key idea of this kind of admixture study is to show that the differences between ethnic groups can be entirely explained by genetic factors. This is done by showing that the IQ differences by ancestry within each ethnic group extrapolate to the differences between ethnic groups. So it is essential that we look at IQ and ancestry only within each ethnic group.

Without a strict restriction to one ethnic group, showing that IQ correlates with admixture would not be enough. We already know that there is an IQ gap, and we already know that there is an “admixture gap”, so a correlation across groups is a given.

But what if self-identified ethnicity is noisy? For example, some of the “Hispanics” might actually identify or be identified as White. In that case the correlation between ethnicity and IQ would bleed over into the IQ-admixture correlation. Of course, this worry borders on paranoia. But the observed correlations are quite small, which means that admixture explains very little of the IQ variance in the data set. That might seem counterintuitive from a hereditarian perspective.

So what kind of correlation should we expect? If the European-Amerindian gap is 16 points, similar to the Hispanic standard deviation, shouldn’t we expect admixture to explain a very significant part of the variation? Well, actually not. If admixture is uniformly distributed, the mean absolute difference in admixture between two Hispanics is only one third (33.3 percentage points). This means the average IQ difference explained by admixture would be at most 5-6 points (16/3 ≈ 5.3). But admixture is not uniformly distributed: Hispanics with less than 40% European admixture are notably rarer. This is why the actual standard deviation of admixture is just 23.3 percentage points, below the roughly 28.9 of a uniform distribution. So we are down to less than 4 points (16 × 0.233 ≈ 3.7) explained by admixture. This would lead to a correlation of 0.50 … given perfect data. But both the admixture data and especially the IQ data invariably contain noise, reducing this correlation further. So it is actually not surprising that we only see correlations between 0.17 (for the very range-restricted African Americans) and 0.41 (for the much more uniformly distributed African-European Hispanics).
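The intermediate numbers in this paragraph can be reproduced with a few lines of Python. This sketch takes the 16-point gap and the 23.3-point admixture standard deviation from the text as given; the Monte Carlo part only checks that two independent, uniformly distributed admixture values differ by one third on average.

```python
import math
import random

gap = 16.0            # European-Amerindian gap in IQ points, as assumed in the text
sd_admixture = 0.233  # observed SD of European admixture, as a fraction

# Mean absolute difference of two independent Uniform(0, 1) values is 1/3.
rng = random.Random(42)
n = 200_000
mad = sum(abs(rng.random() - rng.random()) for _ in range(n)) / n

print(f"simulated mean |difference|: {mad:.3f} (exact value 1/3)")
print(f"IQ difference for uniform admixture: {gap / 3:.1f} points")
print(f"SD of a uniform admixture: {1 / math.sqrt(12):.3f}")
print(f"IQ spread from the observed admixture SD: {gap * sd_admixture:.1f} points")
```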

Rather than looking at correlations, a better way to drive home the meaning of the hereditarian hypothesis is to visualize how the mean IQ changes across admixture percentiles. The hereditarian hypothesis posits that IQ varies continuously with admixture, so the IQ averages of the admixture percentiles should increase more or less linearly.

Showing this effect for each percentile would require a much larger data set. This data set is almost too small and heterogeneous to show the effect convincingly even for quartiles. For example, as we have seen, Hispanic IQ is slightly depressed compared to African Americans with the same admixture. Because the middle range of European admixture is dominated by Hispanics, using the whole sample results in a depressed middle.

Instead we restrict ourselves to the Hispanic sample. Because the mean White and mean Asian IQ in our data is almost identical, we can just pool European and East Asian admixture to create a well-powered Hispanic quartile admixture plot:

n=323, slope=21.56, intercept=75.32, correlation=0.273, p-value=6.217e-07

Here, we see that the average IQs of the admixture quartiles fall pretty nicely on the regression line.
This plot illustrates the hereditarian hypothesis nicely: the averages track admixture closely. (Note also that if we drew a line through the first two quartile averages only, we would overshoot the mean white IQ, presumably because the lowest quartile is slightly environmentally depressed. This might be happening in the African-American sample, too.)
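A minimal sketch of how such quartile averages can be computed, assuming the admixture fractions and IQ scores are available as two parallel lists (a hypothetical input format; the function name is illustrative). It uses `statistics.quantiles` for the cut points; how the original analysis split the quartiles and handled ties is not stated, so this is only one reasonable choice. The plot's claim is that these quartile means lie close to the regression line fitted to the individual data points.

```python
from statistics import mean, quantiles


def quartile_means(admixture, iq):
    """Return (mean admixture, mean IQ) for each admixture quartile."""
    cuts = quantiles(admixture, n=4)          # three cut points
    groups = [[] for _ in range(4)]
    for a, q in zip(admixture, iq):
        k = sum(a > c for c in cuts)          # index of the quartile containing a
        groups[k].append((a, q))
    return [
        (mean(a for a, _ in g), mean(q for _, q in g))
        for g in groups
        if g                                  # skip empty groups (possible with ties)
    ]


# Made-up data lying exactly on the line IQ = 80 + 20 * admixture.
admix = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
scores = [80 + 20 * a for a in admix]
for a, q in quartile_means(admix, scores):
    print(f"admixture {a:.2f}: mean IQ {q:.1f}")
```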

It is tough to come up with environmental causes for IQ differences that vary according to ancestry. Colorism is one of the best tries. Colorism is the idea that racism is graded by how dark somebody’s skin is, which varies according to ancestry, and that this racism somehow reduces IQ. Except when you are NE-Asian … Colorism as the reason for IQ varying with ancestry is a theory that has a lot to prove before it can be taken remotely seriously.

However, IQ varying by ancestry also doesn’t prove that the gap is fully genetic. Or, to put it differently, even if we could predict IQ perfectly directly from the genome, it remains theoretically possible that there are gene-environment feedback mechanisms involved that allow us to reduce the magnitude of the gap by improving living/learning conditions. Of course the history of intervention studies tells us not to hold our breath.

So, what are the take-aways from this series:

  1. IQ varies by ancestry within ethnic groups with the same country of birth.
  2. This intra-ethnic variation fully explains IQ differences between ethnic groups.
  3. This invalidates most environmental explanations for the IQ gaps.
  4. And strongly suggests a genetic reason for IQ gaps between ethnic groups.
  5. Ancestry nonetheless explains little individual IQ variation – people should be judged as individuals.

Hereditarianism II: Admixture Data and Gaps

In the last post, we saw that the environmentalist position on group differences in IQ is mostly based on the idea of x-factors: hard-to-identify factors that vary systematically between groups and affect IQ. Given that there are many factors that vary between ethnic groups, this is a difficult theory to disprove.

However, from a hereditarian perspective, two persons belonging to the same ethnic group can sometimes be differentiated by different amounts of a certain genetic ancestry. So in ethnic groups whose members have varying degrees of admixture of some original founding populations we can put the hereditarian hypothesis to the test. This is the case for African-Americans, who have varying degrees of European ancestry and for Hispanics, who are mostly a mixture of Europeans, Amerindians and Africans.

The hereditarian hypothesis predicts that IQ will vary within these groups with the amount of admixture for any chosen ancestral group. This type of admixture study has the power to rule out the majority of x-factors that systematically vary between ethnic groups, except for those that vary roughly according to ancestry.

A recent paper showed IQ varying by ancestry for Hispanics and African Americans [1]. These are the key figures.

The regression line of the relationship between cognitive ability and European ancestry in African Americans
And the same thing for Hispanics …

Courtesy of Emil Kirkegaard, we can reanalyze the underlying data set. It contains IQ scores for a few hundred self-identified Whites, Blacks, Hispanics, East Asians and other minorities, together with the percentage of their genome that is European, African, Amerindian, Asian etc.

First, we translate the cognitive ability measure, given here in whole-sample standard deviations above the sample mean, into IQ, with white mean = 100 and white standard deviation = 15.
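This translation is a linear rescaling. A minimal Python sketch, assuming the scores and the White subsample are plain lists of whole-sample z-scores; the function name is hypothetical, and whether the original analysis used the sample or the population standard deviation is not stated.

```python
from statistics import mean, stdev


def to_white_normed_iq(scores, white_scores):
    """Rescale scores, given in whole-sample SD units, to an IQ scale on
    which the White subgroup has mean 100 and standard deviation 15."""
    mu = mean(white_scores)
    sd = stdev(white_scores)  # sample SD; the original choice is not stated
    return [100 + 15 * (s - mu) / sd for s in scores]


# Made-up example: Whites average 0.3 whole-sample SDs above the sample mean.
white = [0.1, 0.3, 0.5]
print(to_white_normed_iq([0.3, 0.5, -0.3], white))
```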

n=137, slope=23.283, intercept=79.6, correlation=0.176, p-value=0.0392

The slope of 23.283 directly gives us the gap between 100% European and 100% African ancestry, while the intercept gives the IQ of a 100% African African-American. The regression line overshoots the mean white IQ (79.6 + 23.283 ≈ 102.9 at 100% European). This might be noise, or legitimately smarter white genes in the black population, or Amerindian admixture in the whites reducing their mean, or a slight environmental downward bend in the left part of the plot. But whether we take the estimated gap or the difference between the actual white mean IQ and the 100% African IQ, the result is always strikingly close to Galton’s estimate.

Of course, this is just a very small sample with a very restricted range. However, we can immediately replicate this regression line with those Hispanics who have predominantly African and European admixture.

n=79, slope=23.837, intercept=73.33, correlation=0.416, p-value=0.000134

This gives us a virtually identical gap, but the whole line is shifted down. This fits well with other results, see for example [2]. The average Hispanic IQ in this sample is only 89.5, compared to a usual US Hispanic IQ of 92-93, so it might still be missing a few points of Flynn effect. Note, however, that this seems to affect the entire IQ range in the same fashion.

The combined sample of African Americans and Euro-African Hispanics of course also validates Galton’s estimate of the gap almost perfectly.

n=257, slope=22.282, intercept=77.979, correlation=0.401, p-value=2.34e-11

For comparison, for Hispanics with predominantly European and Amerindian ancestry, the admixture plot looks like this.

n=323, slope=16.65, intercept=80.024, correlation=0.233, p-value=2.231e-05

The gap is some 7 points smaller, and the percentage of European admixture is generally quite high, which is why, despite the missing Flynn effect points, the average Hispanic IQ is 89.5 vs 83.7 for African Americans.

[1] Biogeographic Ancestry, Cognitive Ability and Socioeconomic Outcomes

[2] A study of intelligence of children in Brazil

Hereditarianism I: Galton and Gaps

Hereditarianism is the idea that differences in abilities and character traits are substantially genetic in origin. This has been largely validated for individual differences, especially when it comes to IQ.

Everything is heritable.

“Hereditary Genius” by Francis Galton, published in 1869, can be seen as the founding document of hereditarianism [1]. In it, Galton observes that human traits, including intellectual abilities, are often normally distributed. He then proposes a method to sort people into different grades of “eminence”. The grades A, B, C, D, E, F, G and X are above average, each more illustrious than the last, and the grades a, b, c, d, e, f, g and x classify people below average in lifetime achievement. He gives precise frequencies for each grade, so it is possible to translate his statements into the language of IQ. Although Galton’s “eminence” is based on more than just intelligence (he mentions “zeal” and “working capacity”), intelligence is probably its most important aspect.

His grades correspond to the following IQs:

A >100.0
B >110.39
C >120.88
D >131.33
E >141.78
F >152.24
G >162.60
X >171.30

As we can see, each grade should roughly correspond to a range of 10.5 IQ points.
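Going the other way, from the IQ bounds back to how rare each grade is, makes the table easier to compare with Galton’s own frequencies. A small Python sketch, assuming the bounds above sit on a normal distribution with mean 100 and standard deviation 15:

```python
from statistics import NormalDist

# Lower IQ bounds of Galton's upper grades, as listed above.
bounds = {"A": 100.0, "B": 110.39, "C": 120.88, "D": 131.33,
          "E": 141.78, "F": 152.24, "G": 162.60, "X": 171.30}
iq = NormalDist(mu=100, sigma=15)

previous = None
tails = {}
for grade, bound in bounds.items():
    # Fraction of the population scoring above this grade's lower bound.
    tails[grade] = 1 - iq.cdf(bound)
    step = "" if previous is None else f", {bound - previous:.2f} points above the previous bound"
    print(f"{grade}: IQ > {bound}, about 1 in {1 / tails[grade]:,.0f}{step}")
    previous = bound
```

The steps between neighbouring bounds come out at about 10.4 to 10.5 points up to G; the last step, from G to X, is narrower at 8.7 points.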

Using his grading system, he then analyses the pedigrees of English judges and other notable men. He finds that “eminence” runs in families, and he rules out a decisive role of nurture by looking at the adopted sons of popes.

He finally goes on to assess the difference between Africans and Europeans, in essence relying on several observations of tail effects. He diagnoses an average intellectual ability gap of 2 grades, which would translate to 21 IQ points.

First, the negro race has occasionally, but very rarely, produced such men as Toussaint l’Ouverture, who are of our class F; that is to say, its X, or its total classes above G, appear to correspond with our F, showing a difference of not less than two grades between the black and white races, and it may be more.

Hereditary Genius

To Galton, group differences were obviously innate, but he did see moderating environmental influences. On the Africans in Africa he says:

Thirdly, we may compare, but with much caution, the relative position of negroes in their native country with that of the travellers who visit them. … [A]n average actual difference of three grades, of which one may be due to the relative demerits of native education, and the remaining two to a difference in natural gifts.

Hereditary Genius

However, the existing results on the heritability of IQ differences between individuals do not automatically transfer to group differences. If there are systematic environmental differences between groups, within-group heritability could be high while the between-group differences are environmental. And of course there are many actual and potential systematic differences between groups. So many, in fact, that as soon as hereditarians have disproven one potential environmental cause of group differences, environmentalists line up two new ones. These potential causes include parental socio-economic status, lead exposure, number of words heard in early childhood, peer groups, stereotype threat, many aspects of education, prenatal and postnatal nutrition, breast feeding, systemic racism and many more.

Although there is no clear-cut argument for predominantly environmental IQ gaps between ethnic groups, the environmental position is the current consensus.

“Hereditary Genius” is a great read because, while his methods are pretty dodgy, Galton was some hundred years ahead of the curve. A true founder of the field. In the next post we are going to analyze a data set to see how well Galton’s assessment of group differences holds up, or whether the current environmentalist consensus is still in decent shape.

[1] Hereditary Genius

IQ-GDP VIII: Linear g theory

The second idea of how to interpret the GDP-IQ relationship is based on several different results of IQ research.

As you might know, there is a general factor of intelligence that can be extracted from any battery of cognitive tests. This so-called g-factor explains a large part of the results on any IQ test. The essential point is that it explains the predictive part [1]. That means if you factor out the g-factor, IQ tests do not tell you much about educational attainment, income, criminality or performance in other cognitive domains.

As you might further know, there has been a steady rise in IQ scores, called the Flynn effect [2]. However, the Flynn effect has not been on the g-factor, i.e. the size of the Flynn gains has been anti-correlated with the g-loadings of different IQ tests. This explains why our grandparents’ generation does not seem to be made up of morons, despite scoring 30 points lower on Raven’s matrices. The Flynn effect doesn’t really increase cognitive ability; rather, it increases the more specific factors, which unfortunately do not generalize.

As the Flynn effect is still ongoing in many countries but has stopped in the most developed ones, it obviously plays a role in the differences in national mean IQ. Once all countries have reached the end of the Flynn effect, we would expect the differences in mean IQ to have decreased substantially.

But here comes the rub: If the differences decrease due to the Flynn effect, and the Flynn effect is not on g, and only g is predictive of performance in the real world … why would we expect the shrinking IQ gap to be accompanied by a shrinking performance gap in GDP and co?

The linear g theory says that if we could compare nations by g-factor instead of IQ, we would see a linear relationship between g and GDP. The exponential relationship observed between IQ and GDP is just an artifact of poorer countries still having a lot of Flynn effect left to go.

This figure illustrates the linear g theory: The developed countries have IQs close to their g-factor, everybody else is still catching up. The relationship between g and GDP is linear.

I do not endorse a strong version of the linear g theory. But given the results of IQ research cited above, the hollowness of the Flynn effect must play some role in distorting the IQ-GDP relationship.

[1] g-factor

[2] Flynn effect