Too much of a good thing

The gay germ theory proposes that there is a pathogen that causes male homosexuality [1]. The reasoning behind this theory is that the fitness hit of homosexuality is too big to allow a genetic explanation. That is, any genetic variant involved would very quickly be weeded out of the gene pool, so the occurrence of homosexuality would depend on de novo mutations and be very rare. Pathogens, on the other hand, can do with us whatever they want, because we cannot out-evolve them.

However, there are several indications that an immune reaction of the mother is somehow involved [2]. Here, the idea would be that embryonic tissue expressing proteins alien to the mother’s immune system is attacked and damaged. This would explain the underdevelopment of a hypothalamic nucleus responsible for male heterosexual orientation, which expresses male-specific genes during masculinization and defeminization.

If we combine both ideas, we end up with a putative pathogen that might trigger an immune response against male-specific proteins. This gives us a hint at how to identify the pathogen in question: it would have to be a pathogen that shares an epitope with such a male-specific protein. An epitope is the part of a folded protein’s surface that is recognized by an antibody of the immune system.

Unfortunately, identifying epitopes from protein sequences requires a solution to the protein folding problem. So we are not going to be able to do it just by downloading a bunch of genetic sequences.

However, it is entirely possible that there isn’t actually a pathogen involved. The effect of shared environment on sexual orientation, for example, is zero [3]. Pretty weird for an infection. Instead male homosexuality might be a case of what I like to call the “too much of a good thing”-failure mode of evolution. Occasionally evolution finds itself in dead ends, where there is a selection pressure towards a good thing, and a catastrophic failure mode whenever there is too much of the good thing.

One example of this is the trisomies. Here the good thing is having a big ovum [4]. Human egg cells are pretty big, and for good reason. After fertilization they have to divide quickly and set up shop in the uterus. If they run out of gas before the placenta is in place, that’s it. One way ova get big is very unequal division in the last two rounds of cell division: one new cell keeps nearly all the cytoplasm, the other one is discarded. Very unequal cell division results in a big final cell, but it also increases the likelihood that the small cell cannot take up its full half of the chromosomes.

Another example might be autism. One of the symptoms of autism is neural overgrowth in some parts of the cortex [5]. And while the average autist does not do too well on an IQ test, extreme precociousness in children is often accompanied by autism. It seems growing too many neurons and learning too fast has catastrophic failure modes too.

This could be going on with male homosexuality. A strong immune system is nice, but not if it attacks vital parts of your unborn child’s brain. Or conversely, toning down the immune system to accommodate your child is a fine thing, as long as it doesn’t get one or both of you killed. One could argue that evolution should find some side avenue to avoid the failure mode. But it didn’t for the trisomies, nor did it for autism.

[1] Gay germ theory

[2] Antibodies against male specific proteins in mothers of gay sons

[3] Shared environment of homosexuality is zero.

[4] The ovum is large.

[5] Brain overgrowth in autism


Counting names

In the last blogpost we counted a lot of names to determine how NE-Asians are represented among songwriting competition winners. In this post we take a closer look at some methods for determining the ethnic composition of a dataset by surname analysis.

Without the anatomical baggage, the theory of NE-Asian relative underperformance boils down to “whatever Ashkenazim have that makes them successful (beyond the absolute IQ-value), NE-Asians probably have slightly less of it than Europeans”.

So we are naturally interested in the Ashkenazi representation among songwriting competition winners. We try to calculate it using two datasets of Ashkenazi names [1],[2]. Using the first set of names we count 17% Ashkenazi names among songwriting competition winners; using the second set of names we only find 6.5%. This mostly tells us that we cannot reliably assess the number of Ashkenazim with this method. Too many of the listed names are quite common among gentiles as well. And on the other hand, it is unclear how complete the lists are.

There has been much speculation about whether Ashkenazi intellectual performance is declining due to outmarriage and low fertility. Even if our method is too crude to give precise percentages, we can at least say that there is no evidence of declining performance in this dataset.

It seems counting Ashkenazim is methodologically significantly more difficult than counting NE-Asians. And of course the same holds true for African-Americans or Hispanics, who share a lot of surnames with White Americans.

Counting Ashkenazim is a time-honoured method that has been used to advance many different theories. Due to the Ashkenazi IQ advantage of 10 points, Ashkenazi overrepresentation can, for example, be used to assess the intellectual difficulty of a feat. For very high-profile samples it is possible to check ethnicity by hand. For the NE-Asian songwriters this was only possible because a look at a picture is enough. Generally this is not possible for Ashkenazim, which leads to a lot of potential data fudging. We would really like to do better than that.

How do we accurately and objectively assess the likely number of Ashkenazim in a sample of surnames? Let’s assume we have the probability distribution of Ashkenazi family names, i.e. the frequency with which each name appears in the Ashkenazi population. We also have the surname distribution in the general US population. Then we can create a mixed distribution of x% Ashkenazim and (100-x)% non-Ashkenazim and calculate the likelihood of our sample given this mixed distribution. The likelihood is just the product of the mixed-distribution frequencies of the names in the sample. By doing this for different x we can find the mixed distribution that leads to the maximum likelihood of our sample. The x% used to create this maximum likelihood distribution is our best guess at the Ashkenazi percentage in our sample. Possibly, we have to adjust the percentage by the fraction of the population that is actually covered by our name database.
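The search over x described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the numbers in this post: the function names, the `floor` guard for names missing from both tables, the grid resolution and the toy frequencies are all assumptions invented for the example. Logarithms are summed instead of multiplying frequencies, which picks the same maximum but avoids numerical underflow for large samples.

```python
import math

def log_likelihood(sample, group_freq, rest_freq, x, floor=1e-12):
    """Log-likelihood of the sample under a mixture with group share x (0..1)."""
    total = 0.0
    for name in sample:
        p = x * group_freq.get(name, 0.0) + (1.0 - x) * rest_freq.get(name, 0.0)
        total += math.log(max(p, floor))  # floor is an assumed guard against log(0)
    return total

def best_share(sample, group_freq, rest_freq, steps=1000):
    """Return the grid value of x that maximizes the likelihood of the sample."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: log_likelihood(sample, group_freq, rest_freq, x))

# Toy surname frequencies, invented purely for illustration
group_freq = {"name_a": 0.6, "name_b": 0.4}
rest_freq = {"name_a": 0.1, "name_b": 0.9}
sample = ["name_a"] * 3 + ["name_b"] * 7

estimate = best_share(sample, group_freq, rest_freq)
print(estimate)
```

In this toy sample 30% of the names are name_a, and the mixture reproduces that frequency at x = 0.4, which is where the grid search ends up. Evaluating `log_likelihood` over the whole grid also gives the shape of the likelihood curve, not just its peak.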

It may seem easier to just add the fractions of Ashkenazim for each name in the sample. So if we have ten names and in the general population for each of these names 10% are Ashkenazim, these ten names add up to one full Ashkenazi. Unfortunately, this undervalues overrepresented groups. In a sample with five-fold overrepresentation, each of these ten names should be weighted far more heavily than 10% (naively scaling by five gives 50%). One idea would be to calculate one estimate, use this estimate to update the expected fraction of Ashkenazim for each name and then iteratively get closer to the real value, but the maximum likelihood method also gives us a distribution of likely percentages.
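For comparison, the iterative idea can be sketched as follows. This is my own formalization under stated assumptions, essentially an expectation-maximization update for the mixing share, and not something that was run for this post: `iterate_share`, its parameters and the toy frequencies are all hypothetical. Each round computes, for every name, the expected fraction of group members given the current share x, and then takes the average of these fractions as the new x.

```python
def iterate_share(sample, group_freq, rest_freq, start, rounds=200):
    """Repeatedly re-estimate the group share from per-name expected fractions."""
    x = start
    for _ in range(rounds):
        weights = []
        for name in sample:
            g = x * group_freq.get(name, 0.0)
            r = (1.0 - x) * rest_freq.get(name, 0.0)
            # Names missing from both tables carry no information; counted as 0 here
            weights.append(g / (g + r) if g + r > 0 else 0.0)
        x = sum(weights) / len(weights)
    return x

# Toy surname frequencies, invented purely for illustration
group_freq = {"name_a": 0.6, "name_b": 0.4}
rest_freq = {"name_a": 0.1, "name_b": 0.9}
sample = ["name_a"] * 3 + ["name_b"] * 7

# One round starting from an assumed population share of 10% is the simple "add the fractions" count
one_pass = iterate_share(sample, group_freq, rest_freq, start=0.1, rounds=1)
converged = iterate_share(sample, group_freq, rest_freq, start=0.1)
print(one_pass, converged)
```

With these toy numbers the single pass stays well below the converged value, which illustrates how the simple sum undervalues an overrepresented group. The converged value is the same share that maximizes the likelihood of the sample.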

The 2010 US census provides the surname distributions for the general US population, Whites, Blacks, Asians and Hispanics [3]. This allows us to test this method at least for these groups. For the songwriting competition winners it results in a maximum likelihood for an Asian percentage of 0.8%. This would correspond to 23 artists. Given that we only looked at NE-Asians in the last blogpost, this result is very compatible with our earlier result.

Likelihood for percentage of non-Asians peaks at 99.8%.

So we do have a method and we do have a motive, but unfortunately we don’t have a distribution of Ashkenazi surnames in the US. One way to get a distribution of Ashkenazi surnames would be to scrape the names of Holocaust victims from the Yad Vashem database [4]. However, that is certainly somewhat disrespectful and it is unclear whether the distribution is all that similar to the current American one. The other way is to collect names of American Jews from a variety of sources, here a hundred and there a hundred, until a significant fraction of the most common names is covered. However, I am currently too lazy to do that. But maybe I will get around to it, if something a lot more interesting than songwriting competition winners turns up.

[1] Ashkenazi names 1

[2] Ashkenazi names 2

[3] 2010 US Census

[4] Holocaust victim names

Verbal IQ and songwriting – NE-Asian underperformance

In my two-part blog post “A theory of intelligence” I examine the unusual IQ profiles of both Ashkenazi Jews (high verbal) and NE-Asians (high math-spatial) to propose a theory of intelligence. This theory tries to explain the NE-Asian underperformance in GDP and science relative to their very high IQ by positing that NE-Asians create fewer lateral and top-down synapses. This leads to slightly lower verbal IQ and conceptual creativity compared to Europeans and especially compared to Ashkenazim.

One of my intuitions is that verbal IQ tests do not pick up on this difference particularly well, because they also load on knowledge and pattern recognition. I wondered whether tail effects in verbally creative endeavors would maybe lend support to my theory. To this end I analyzed a dataset of songwriting competition winners [1].

The dataset consists of 2875 US artists who won prizes or honorable mentions in the years 2002-2017. To identify NE-Asian artists I compare the names against the most common Korean [2], Japanese [3] and Chinese [4] surnames. These surnames cover roughly 90%, 33% and 84.8% of these populations respectively. I then check each hit by hand to exclude anybody who is provably not Asian (quite likely for some names like Young, Shaw or Lee).

Chinese Americans constitute 1.5% of the US population, Japanese Americans 0.4% and Korean Americans 0.8%. Multiplied by the sensitivity of our method, this leads to (0.015 × 0.848 + 0.004 × 0.33 + 0.008 × 0.90) × 2875 ≈ 61 being the expected number of hits for a perfectly proportional representation of NE-Asian Americans.
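The expected number of hits can be checked with a short Python snippet. The population shares, surname coverage and dataset size are the figures quoted in this post; the variable names are just illustrative.

```python
# Population shares and surname-list coverage as quoted in the text
population_share = {"Chinese": 0.015, "Japanese": 0.004, "Korean": 0.008}
surname_coverage = {"Chinese": 0.848, "Japanese": 0.33, "Korean": 0.90}
n_artists = 2875

# Fraction of all artists we would expect to flag under proportional representation
hit_rate = sum(population_share[g] * surname_coverage[g] for g in population_share)
expected_hits = n_artists * hit_rate
print(round(expected_hits))  # 61
```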

Instead we only find 13 NE-Asian names that cannot be excluded, more than a four-fold underrepresentation. Of course, one may argue that this is a result of language deficiencies due to relatively recent immigration. However, there is also no upward trend visible over these 15 years. Only seven of these artists are unambiguously Japanese, Chinese or Korean American. Of the rest, one is Japanese but not American, one is Malaysian, one is Taiwanese (not included in the 1.5%) and four I could not identify.

Also, perfectly proportional representation may be the wrong baseline to compare against. Of the 1554 winners of the US Open Music Competition 2019 [5], a competition in classical music, 1050 have NE-Asian surnames. These also have much more typical names, with the most common being Yang, Wang, Chen, Li, Truong, Zhang, Liu, Loo, Wu, Lin. This makes it plausible that we are still overcounting NE-Asians in the songwriting dataset.

There is the additional fudge factor that you can’t tell who wrote the lyrics. Christopher Tin, for example, whom we counted twice, is a classical composer. His most famous piece is Baba Yetu, the theme song of Civilization IV. Its lyrics are a Swahili version of the Lord’s Prayer.

Overall, we see at least a 4.5-fold underrepresentation relative to population percentage and, compared to the classical music winners, a roughly 150-fold underrepresentation. I chalk this up as consistent with my theory.
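For reference, both ratios follow directly from the counts given in this post. The snippet below is just that arithmetic, with illustrative variable names.

```python
# Counts as quoted in the text
songwriting_hits = 13       # NE-Asian names that could not be excluded
songwriting_total = 2875
expected_hits = 61          # proportional-representation baseline
classical_hits = 1050       # NE-Asian surnames among classical winners
classical_total = 1554

# Underrepresentation relative to the population baseline
vs_population = expected_hits / songwriting_hits

# Underrepresentation relative to the NE-Asian share among classical winners
vs_classical = (classical_hits / classical_total) / (songwriting_hits / songwriting_total)

print(f"{vs_population:.1f}-fold, {vs_classical:.0f}-fold")
```

The first ratio comes out slightly above 4.5 and the second slightly below 150.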

[1] International songwriting competition

[2] Korean Surnames

[3] Japanese surnames

[4] Chinese surnames

[5] US Open Music Competition Winners