Is Eric Turkheimer a Scientist?

Mar 12, 2019

Eric Nathan Turkheimer, the Hugh Scott Hamilton Professor of Psychology at the University of Virginia, is much more likely to be a meretricious fraud promoting skullduggery than he is to be a scientist. He has committed himself to the misrepresentation of science, its history, and the actors within it, for clearly ideological reasons. His scientific raison d’être has already been declared, and it is not only unscientific but anti-scientific. In this piece, I elaborate and substantiate this claim, showing that Turkheimer has produced weak research, knowingly misrepresented findings, and treated himself to statistical liberties that would be unacceptable across the political divide. He has done this and more for the purpose of striking at the credibility of scholars with whom he disagrees for ideological reasons.

Why am I writing this?

I’m writing this now because Turkheimer is leading a special issue of the esteemed journal Behavior Genetics. The title of his introduction is Genetics and Human Agency: The Philosophy of Behavior Genetics. With a title like this, you would expect the writer to be someone who appreciates behavior genetics. If that’s what you thought, then you do not know Eric Turkheimer. For him, the only proper philosophy of science is ideology combined with nihilism: no science at all.

Turkheimer begins his introduction with a lie. He writes:

Behavior genetics has a fraught history with philosophy. In the early days of the field it seemed as though the only philosophical discussion that was possible was between the hereditarian descendants of Galton – Cyril Burt, Raymond Cattell, Hans Eysenck and Arthur Jensen – and their critics from the intellectual left: Richard Lewontin, Leon Kamin and Stephen J. Gould. Many of the participants on both sides, of course, were neither geneticists nor philosophers, but they nevertheless set the tone of the debate: the discussion about the genetics of behavior was a clash between a harsh and often racialized determinism and a politically motivated commitment to individual freedom and progressive social values.

On the issue of race and freedom, Turkheimer continues:

Racism and social elitism fundamentally arise from identification of individuals with their genetic ancestry; they ignore individuality in favor of group characteristics; they emphasize pride in group characteristics, not individual accomplishment; they are more concerned with who belongs to what, and with head-counting and percentages and quotas than with respecting the characteristics of individuals in their own right. This kind of thinking is contradicted by genetics; it is anti-Mendelian. And even if you profess to abhor racism and social elitism and are joined in battle against them, you can only remain in a miserable quandary if at the same time you continue to think, explicitly or implicitly, in terms of non-genetic or antigenetic theories of human differences. Wrong theories exact their own penalties from those who believe them.

I’m mistaken. That quote is actually from Arthur Jensen in his 1973 book Educability and Group Differences. Here’s Turkheimer on his neutrality with respect to issues of race:

Given the present state of our knowledge, and insufficient thought on my part, my own prescription for the time being is to deal as best we can with individual differences and let the statistical group differences fall where they may. Society’s general concern with race and other social group differences is not the product of research on these matters, but arises from chauvinist-like attitudes of racial group identity and solidarity in connection with political power and economic interest. It might be termed meta-racism. The ‘race problem’ from that viewpoint is lower in my own hierarchy of values than concern with individual justice and alleviation of individual misfortune. Though it would be blind not to acknowledge the reality of certain statistical differences among populations, I would find it difficult to be the least concerned with any given individual’s racial heritage. Perhaps I may be too insensitive on this score, never having felt much sense of racial identity myself.

I’m mistaken. That quote is actually from Arthur Jensen’s chapter of Modgil & Modgil (1987). Here’s the real Turkheimer quote:

We share a commitment to the prospect of the creation of a more socially just — a socialist society. And we recognize that a critical science is an integral part of the struggle to create that society, just as we also believe that the social function of much of today’s science is to hinder the creation of that society by acting to preserve the interests of the dominant class, gender, and race.

I’m mistaken. That quote is actually from Lewontin, Rose & Kamin’s Not in Our Genes (1984). Here’s the real Turkheimer quote:

[Gould’s] The Mismeasure of Man remains a curiously unpolitical and unphilosophical book…. [Its emphasis] on racism and ethnocentrism in the study of abilities is an American bias…. In America, race, ethnicity, and class are so confounded, and the reality of social class so firmly denied, that it is easy to lose sight of the general setting of class conflict out of which biological determinism arose.

I am again mistaken. That quote is excerpted from Lewontin’s New York Review of Books review of Gould’s The Mismeasure of Man (1980). The tenor of Lewontin’s review should be unsurprising; what’s surprising is that Lewontin had time to write it at all, what with the uninterrupted effort to construct a “socialist science” in opposition to the “bourgeois science” of those — to ape Turkheimer — racialized, reductionistic, determinist opponents of individual freedom. Lewontin’s writing was always reactionary. Turkheimer’s writing is similarly deflectionary and unctuous schlock aimed at safeguarding personal ideology.

Turkheimer has clearly picked a side and he has picked it against all reason. There is no reason at all to believe that Galton, Burt, Cattell, Eysenck or Jensen are anything but staunch individualists; on the other hand, there is no reason at all to believe that Lewontin, Kamin, and Gould, soi-disant socialists, are anything but the collectivists they claim to be. Turkheimer prefers not to take men at their word. For Turkheimer, whatever keeps the light on his face is what’s true. This is perhaps too perfect, because, like the scientifically reviled and politically adored Gould, he also shouldn’t be taken at his word, at least when it comes to science.

Summa: Turkheimer lies about the beliefs of people whose views he thinks are bad.

Turkheimer’s Stance

Turkheimer is noticeably incorrect about some of the most prominent figures in his field from the last century. That on its own is not a damning indictment of anyone. If these men were alive, however, his comments would be libel. And if it can be substantiated that his beliefs come from a place of bias, then there may be something worth discussing.

Let’s take Turkheimer at his word when it comes to something else. In his review of Snyderman & Rothman’s The IQ Controversy, The Media, and Public Policy, he writes:

If it is ever documented conclusively, the genetic inferiority of a race on a trait as important as intelligence will rank with the atomic bomb as the most destructive scientific discovery in human history [emphasis always mine]. The correct conclusion is to withhold judgment.

Let us ignore that Turkheimer’s views about the atomic bomb, perhaps the most powerful prophylactic against large-scale military conflict ever conceived, are likely puerile. Scientists should not withhold judgment regarding matters of fact. Science is the pursuit of truth, not theoretical applied egalitarianism. Conclusive documentation of some phenomenon should come with similarly definitive reporting. Other researchers have noted that this “unfortunate and aberrant tendency” (p. 718) is applied only to the social sciences. Turkheimer, evidently, does not believe in science qua science, but instead, science qua something else. What could he want?

Turkheimer has maintained these views. Writing in 2018:

I should be clear that I am not making a “both sides do it” argument. It is the hereditarians who are trying to reach a strong and potentially destructive conclusion, and the burden is absolutely on them to demonstrate that they have a well-grounded empirical and quantitative theory to work with. So, if you are out there and think that group differences t [sic] are at least partially genetic, please explain exactly what you mean, in empirical terms.

In the first highlighted quote, he makes an assumption — a baseless one. No one has ever demonstrated harm arising from hereditarian views. Ideologues like the abovementioned Gould, Lewontin, Kamin, and Rose have all made attempts to insert their preferred myths about the harms of hereditarianism into the popular imagination, but these have never held any water. For instance, it is often claimed that America’s Johnson-Reed Act was inspired by cognitive testing. This is contradicted by the historical record and has not once been substantiated.

Turkheimer provides other examples, namely, the Holocaust and Jim Crow. Invoking the latter is evidence of yet more puerility. Jim Crow was neither caused by beliefs in scientific findings of group differences nor held aloft by them. Jim Crow was intended to suppress the economic rise of blacks; it was protectionism writ large, but research was not a signatory.

In the former case, he is confused. He has taken to Twitter to convey two things: (1) idle speculation is only wrong if it is hereditarian, and (2) group differences are harmful to talk about idly. Note his tacit implication: that group differences research at the time (and by association, now) was responsible for the Holocaust and Jim Crow. This is perhaps a hubristic view. As a “scientist,” Turkheimer may imagine that the things he says have a great deal of influence on political machination. Or he may not, and may instead believe that such research had this influence in the past. It did not, but I welcome him to show that it did. It is a fact, however, that environmentarian beliefs (à la Lysenko et al.) have caused enormous harm. Turkheimer lobbies for enormous harm to science without having offered any justification for it.

Either way, group differences researchers have always been passionately liberal. Galton, Spearman, and Jensen were certainly so (Cattell and Eysenck, perhaps less). A left bias is what surveys of IQ researchers indicate. This is what surveys of allied fields show as well. The portrayal of these people as anything other than classical liberals can only be made with tendentious evidence and threadbare arguments about hidden motives that — unlike Turkheimer’s — have never been revealed in any substantive fashion. Jensen never called to suppress research or any form of pre-scientific idle chatter but Turkheimer has and continues to. Turkheimer is no liberal and certainly no proponent of scientific integrity. The topics of his snide, libelous remarks often are.

Contrary to Turkheimer’s insinuation, the Nazis vehemently opposed the Galton-Spearman tradition. They preferred a Jungian, typological and völkisch conception of intelligence, more akin to Gardner’s Multiple Intelligences or Sternberg’s Triarchic Abilities. In fact, the arguments of Gould, Lewontin, Rose, Kamin, Sternberg, and Gardner are hardly distinguishable from those made by prominent Nazi psychologists (few, if any, Nazis considered themselves to be concerned primarily with mental measurement; this was “bourgeois”).

Given that I’m a native German speaker, I can read sources on this that Turkheimer may be unaware of. Erich Rudolph Jaensch and Friedrich Becker offer an excellent description of Nazi views towards intelligence and group differences. From their writing we can see that the Nazis believed intelligence research represented the victory of a “bourgeois spirit”; that intelligence measurement — especially with a predominant general factor a la Spearman — was an “instrument for Jewry” to “fortify its hegemony” over white Gentiles; that the use of intelligence testing in schools was “a system of examination of Jewish origin,” etc. Almost exactly like Sternberg (and more recently people like Nassim Taleb), the Nazis claimed that, because people differ and thus intelligence differs, there should be examinations not for “intellectualism,” but for “practical intelligence.” The Nazis claimed that correlation and factor analysis were invalid tools for understanding anything (compare this to Hilgard 1955 p. 228: “Correlation is an instrument of the devil”) and that even if a general factor emerged from ability tests, it was invalid; that life was more complex than a dominant general factor (a fact no one contests); and that regardless of what the results showed, they would be unconvincing, because understanding is always distant, multifaceted, complex. In Volkmar Weiss’ Vorgeschichte und Folgen des arischen Ahnenpasses: Zur Geschichte der Genealogie im 20. Jahrhundert (also recommended is Thilo Sarrazin’s Deutschland schafft sich ab) he records that the Nazis rejected IQ tests; if they had accepted them, they would have had to admit that Jews outperformed Germans and that Slavs performed very similarly to Germans. This is completely inconsistent with what Turkheimer implies and with the apparently popular view that Nazis cherished and frequently used IQ tests. 
When it came to group differences in humans, the Nazis were pragmatic nihilists capable of denying evidence when it suited their agenda — like Turkheimer.

The disdain for the “bourgeois” displayed by the Nazis was shared by their intellectual successors, Lewontin et al. With the same collectivistic impulse, both groups attacked the individualistic g-based Galton-Spearman-Jensen paradigm. The first articulations of the criticisms of intelligence brought forward by pop intellectuals like Gould et al. were made by the Nazis (this includes “reification” and certain types of accusations of class bias). Exchanging the terms “1%” or “dominant class” for “Jews” would make it impossible to distinguish their arguments. The censure of Luria by the communists for his interest in intelligence is yet another example of the totalitarian (this time communist) opposition to this sort of research — les extrêmes se touchent.

The Nazis also believed much of the Jewish-Gentile difference to be the result of environments, not genes. The Nazis were typological thinkers when it came to anthropology, but types were often the result of the environment (certain types of education, they believed, would make Germans more like Jews). In Adolf Hitler’s Mein Kampf, he mentions a blackguard type bearing an indelible blemish from his cramped home environment. Hitler believed that by improving German rearing environments, these blackguards could be prevented from existing, without a single shot being fired. Similarly, the Nazis believed that Jewish children could be re-educated away from their Jewishness. Heredity was secondary for the Nazis. Lothrop Stoddard’s trip to the Nazi eugenics courts in Into the Darkness is an almost farcical illustration of this fact. Many young Jewish children were forcibly adopted into German families out of the Nazi belief in the impotence of heredity and the fluidity of the mind.

How could Turkheimer have believed the study of heredity and “reductionism” to g was related to the Holocaust or the Nazis in general? It’s hard to say, because he has never substantiated anything he’s said on this matter. It doesn’t seem likely he can. It does seem likely, however, that he makes these claims for ideological reasons. It’s very easy to taboo research as a means of combatting it when the empirical evidence is not on your side. But do we know that Turkheimer has Nazi-esque views on intelligence research, capable of motivating him to push for a taboo, like the enlightened progressives Lewontin, Rose, Kamin, and Gould? In fact, we do:

Scientific rightists are comfortable using race as an explanatory variable, tend toward single-factor models of ability, would not mind having their views characterized as philosophically reductionist, and accept a moderate to large degree of genetic influence in most human behavior; leftists reject race, at least as a biological variable, support multifactorial views of ability, support more holistic views of the philosophy of science, and are suspicious, to put it mildly, of genetic accounts of behavior.
Schönemann’s work is an important part of a literature that is founded on a thoroughgoing rejection of a complex of ideas embraced by school of establishment psychometricians and behavior geneticists under the influence of Galton, Spearman, and Pearson, by way of Burt, Eysenck, and Cattell, and more recently by Jensen. Of course, grouping together such an enormous and varied collection of psychometric theorists only serves to emphasize the differences among them, but that is precisely the point I wish to make: The psychometric establishment includes considerable variability of opinion about issues like single factor models of ability, the quantification of genetic influence, and the applicability of psychometric theory to social issues involving race and poverty. Nevertheless, there can be little doubt that the centroid of this multivariate belief space lies to the right of the scientific and political center. One need only turn to the preemptively titled “Mainstream Views on Intelligence”, published in the Wall Street Journal (of all places) to get a flavor of the central tenets on which the establishment is able to agree: Intelligence is a meaningful attribute of human beings, well-represented by a single factor called g, and substantially heritable; it is an important determinant of social and economic success in America, and contributes to an unknown degree to differences in socioeconomic status between White and Black Americans…. [I have skipped a great deal of content which is good and I will admit shows well on Turkheimer, because I wanted to save space. I recommend reading this whole paper. It is a good illustration of the publicly reasonable Turkheimer, as opposed to the other Turkheimer who lacks scientific integrity.]
A psychometric left would recognize that human ability, individual differences in human ability, measures of human ability, and genetic influences on human ability ‘are all real but profoundly complex, too complex for the imposition of biogenetic or political schemata.’ It would assert that the most important difference between the races is racism, with its origins in the horrific institution of slavery only a very few generations ago. Opposition to determinism, reductionism and racism, in their extreme or moderate forms, need not depend on blanket rejection of undeniable if easily misinterpreted facts like heritability, or useful if easily misapplied tools like factor analysis. Indeed it had better not, because if it does the eventual victory of the psychometric right is assured.

A lot of this is commendable. Much of the rest is not. The similarity to Nazi conceptions of cognitive ability is obvious. The point here is not to disqualify Turkheimer, as there is no real problem with having beliefs that are similar to those held by bad people. There is a problem, however, with his habit of lying about the beliefs of others and their relationships to these bad people. When Turkheimer claims that Jensen et al. are similar to the Nazis, he betrays that his own beliefs more closely resemble theirs, and that he has expressed the sort of unscientific, fundamentalist opposition to empirical fact that belongs only in a dogmatic church. The people Turkheimer praises have done similarly; the people he smears have not. This is a matter of fact, and it matters because it shows Turkheimer is not acting in good faith.

His biases expressed above have undoubtedly influenced his work. Consider, for example, his work on the general factor of personality (GFP). As noted above, Turkheimer is opposed to explanations of phenomena which are unidimensional or nearly so (though, of course, things like g and the GFP do not preclude or in any way militate against nuance). He therefore considered it adequate to orthogonalize the GFP and treat the residual item clusters as evidence against its plausibility. He claimed these clusters pointed to evaluative rather than descriptive personality variance, but his method could not have supported this claim in any way but a subjective one. This methodological limitation is compounded by his failure even to test whether the hypothesis is true: Turkheimer’s study made no attempt to relate item content to criterion variables, so its conclusion could not follow from its data. It is lucky that other researchers were as unconvinced as I am, because it’s these researchers who will ultimately make contributions to our understanding instead of manufacturing sophistic attacks under the guise of science.
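As a rough illustration of what “orthogonalizing” a general factor involves, the Python sketch below partials the first principal component out of an item correlation matrix and returns the residual correlations. It is a simplification under stated assumptions: it uses principal components (via power iteration) rather than a fitted factor model, and the function names and toy matrix are hypothetical, not taken from Turkheimer’s study. When every item correlates 0.3 with every other item, the residuals come out uniform (about −0.175 off the diagonal), so any clustering that survives in real data reflects structure beyond the first component; whether that structure is evaluative or descriptive is a separate question that needs criterion data to answer.

```python
import math


def first_component(r, iters=200):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix via power iteration.

    Assumes the leading eigenvalue is well separated, as it is for an
    all-positive correlation matrix (the dominant, 'general' component).
    """
    n = len(r)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T R v gives the eigenvalue for a unit eigenvector
    lam = sum(v[i] * sum(r[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def residualize_first_component(r):
    """Residual matrix R - lam * v v^T after removing the first principal component."""
    lam, v = first_component(r)
    n = len(r)
    return [[r[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]


# Hypothetical toy matrix: four items that all correlate 0.3 (one general dimension only)
R = [[1.0 if i == j else 0.3 for j in range(4)] for i in range(4)]
resid = residualize_first_component(R)
print([round(x, 3) for x in resid[0]])
```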

Some researchers have responded in rather simple ways, such as by controlling for the social desirability of questions; others have emphasized the continued criterion validity of the GFP in spite of Turkheimer’s claims; some have preempted his study with proper factor analysis; yet others have used multitrait-multimethod (MTMM) designs, which do not directly invalidate explanations based on item valence but still make them unlikely; some have even emphasized the relationship of the GFP to objective behavioral outcomes and pathology. Most importantly, contra Turkheimer, there is work which has used ipsative personality evaluations. The ipsative format is more commonly known as “forced choice.” When there is no difference in item valence, the GFP continues to emerge. It is unsurprising that Turkheimer believes otherwise and that he does so on the basis of such weak evidence.

To adapt Marx: for Turkheimer, vague “complexity” is a passe-partout which explains everything because it explains nothing. It is nothing more than a ruse. Turkheimer is absolutely certain when it comes to disproving various theories that don’t comport with his views, but by his own standards, this is never a real possibility.

The bad faith of Eric Turkheimer is more than obvious from his gallingly ignorant and insulting remarks towards more scientifically-inclined researchers. For instance:

Only @KirkegaardEmil would proudly advertise support for the author of “Early Jews and the Rise of Jewish Money Power” and “Swindlers of the Crematoria” as a list of “who’s cool in behavioral genetics and IQ.” He that lieth down with dogs shall wake up with fleas.

Since that tweet, the person Turkheimer attacked has published a paper, and replicated its result, showing that the mean Jewish advantage in cognitive ability compared to white Gentiles is mediated by an advantage in their mean polygenic scores. But the inaccuracy regarding Kirkegaard’s attitudes towards Jews (such as myself and Jensen; Kirkegaard himself is also part-Jewish) is beside the point. Here, Turkheimer’s remarks have been shown to be nothing more than an ungrounded character attack, ostensibly based on his reading and misinterpretation of a Wikipedia page. Turkheimer’s commentary is unwarranted, incorrect, and unbecoming of a putatively trustworthy researcher. It is worth noting that Turkheimer is willing to cite apparently abhorrent people like Kirkegaard if it supports his Weltanschauung. Kirkegaard lieth with dogs, but Turkheimer lyeth with a straight face.

In the second highlighted quote, Turkheimer indicts himself. Turkheimer does believe group differences have a genetic component, and yet he has never formally explained how these might work. I am of course talking about social class differences, sibship differences, differences between family members. Why do we accept these differences but not racial ones? These are just as statistical in nature and are thus just as constrained theoretically and empirically as are race differences. I attribute this admission to the fact that Turkheimer would probably be discredited if he did not admit even these differences.

The between-group heritability calculated via the DeFries formula (of which Turkheimer feigns ignorance) is at least 71–73% for the Jewish-white Gentile difference. There is no difference between the validity of this heritability and the heritability found through the study of twins, nor is there any reason to think this is different than the similarly sized differences between US blacks and whites. On that point, Turkheimer is wont to underestimate heritability. In his introduction to the special issue, he gives an off-hand figure of 0.4 for the heritability of extraversion. It is doubtful that Turkheimer does not know that personality traits have higher heritabilities when they’re measured with item response theory methods than with cruder methods (which he uses but is, mysteriously, not criticized for), like comparisons of twin resemblance or Falconer’s equation. But it is in his interest to give the lowest possible credible estimates for heritability. This is because Turkheimer understands that variance components are not independent of group differences.
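For reference, both formulas named above are short enough to write out. The sketch below implements Falconer’s approximation, h² = 2(r_MZ − r_DZ), and the DeFries formula for between-group heritability, h²_B = h²_W · r(1 − t) / [t(1 − r)], where t is the share of phenotypic variance lying between groups and r is the share of genetic variance lying between groups. The function names are hypothetical and the inputs are made up for illustration, not estimates from any study; note that r is not observed directly and has to be estimated or assumed separately.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: twice the difference between MZ and DZ twin correlations."""
    return 2.0 * (r_mz - r_dz)


def defries_between_h2(h2_within: float, t: float, r: float) -> float:
    """DeFries formula: h2_B = h2_W * r * (1 - t) / (t * (1 - r)).

    t: phenotypic intraclass correlation (share of phenotypic variance between groups)
    r: genetic intraclass correlation (share of genetic variance between groups);
       this must be estimated or assumed, it is not observed directly.
    """
    if not (0.0 < t < 1.0 and 0.0 <= r < 1.0):
        raise ValueError("t must lie in (0, 1) and r in [0, 1)")
    return h2_within * r * (1.0 - t) / (t * (1.0 - r))


# Hypothetical inputs, chosen only to show the mechanics
print(round(falconer_h2(0.80, 0.50), 2))
print(round(defries_between_h2(0.6, t=0.2, r=0.2), 2))
```

With these made-up inputs r equals t, and the between-group heritability comes out equal to the within-group value of 0.6; a smaller assumed r lowers it.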

Every credible person in this field recognizes that high heritabilities within groups do not mean that the between-group heritability for some trait is necessarily high or even greater than zero. However, variance components do constrain the variance attributable to different sources. To determine how large the environmental differences would need to be in order to explain a group gap entirely in environmental terms, Jensen, in his 1998 The g Factor, supplies the formula d/√(1 − h²), where d is the size of the mean difference in some trait in terms of Cohen’s d and h² is the narrow-sense heritability of the trait. If groups display a 1 d difference in some trait that’s 50% heritable, the environment must be 1.41 d worse on average in the group with the lower mean level of the trait in question. For the black-white gap in intelligence, which Turkheimer is most focused on, the difference in the US is 1.1 d and the latent heritability has reliably been found to be greater than 85% in adulthood. Assuming the naïvely corrected heritability of 91% from Panizzon et al. (2014), and assuming that both races show the same heritability, the necessary environmental gap is 1.1/√0.09 ≈ 3.67 d. If environmental quality is normally distributed with equal variance in both groups, this means that only about 0.01% of black environments (roughly 1 in 8,000) would meet the white mean environmental quality. Given that socioeconomic status measures index environmental quality, and that the SES gaps in datasets like the NLSY, NELS, NSID, NCPP, and various test standardization samples are usually around 0.65 d, this suggestion is simply incredible.
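The arithmetic in the preceding paragraph can be checked with a few lines of Python. This is a minimal sketch assuming, as the argument does, equal heritability in both groups, normally distributed environments with equal variance, and purely additive environmental effects; the function names are hypothetical. The tail share is the standard normal survival function, computed with `math.erfc`.

```python
import math


def required_env_gap(d: float, h2: float) -> float:
    """Environmental gap, in environmental SD units, needed to produce a phenotypic
    gap of d SD if the environment accounts for the remaining 1 - h2 of the variance."""
    return d / math.sqrt(1.0 - h2)


def share_above_reference_mean(gap: float) -> float:
    """P(Z > gap) for a standard normal Z: the share of a normal distribution shifted
    down by `gap` SDs that lies above the reference group's mean."""
    return 0.5 * math.erfc(gap / math.sqrt(2.0))


for d, h2 in [(1.0, 0.50), (1.1, 0.91)]:
    gap = required_env_gap(d, h2)
    share = share_above_reference_mean(gap)
    print(f"d={d}, h2={h2}: gap = {gap:.2f} SD, share above reference mean = {share:.1e}")
```

For d = 1 and h² = 0.5 this reproduces the 1.41 d figure; for d = 1.1 and h² = 0.91 it gives a gap of about 3.67 SD and a tail share of roughly 1.2 × 10⁻⁴.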

Summa: Turkheimer likes to imply that Nazi ideology resulted from hereditarianism, and thus that modern IQ research is liable to give way to Nazism. His understanding of Nazi belief regarding heredity is best described as naïve and pragmatic. He does not consider there to be any danger associated with environmentarianism despite the fact that it has harmed many people (and in spite of the possibility that biological attributions may be associated with tolerance and outcomes directly opposite what he has claimed, like reduced sentencing severity for criminals). When it comes to potential harms from hereditarianism, the onus is on Turkheimer to show them, but all he has for his case is misrepresentation. His bias has contaminated his work and led to shoddy empiricism that comports with ideology but not logic. Turkheimer does believe in genetic group differences, but only selectively. Turkheimer is not an agnostic with regard to the sources of group differences. Turkheimer understands the relationship between variance components and group differences, though he acts as if it’s irrelevant. Bad maths does not disturb Turkheimer if it comports with his ideology, which can be summarized as follows:

Opposition to findings that comport with a unidimensional view of cognitive ability;
Opposition to high heritabilities;
Opposition to the existence of genetic group differences;
A dogmatic insistence that there is no genetic component to (select) group differences.

When one’s ideology involves the wholesale denial of genetic differences between groups, as it does for Turkheimer, one cannot be trusted to discuss the existence (or non-existence) of these differences in an objective, scientific manner. Turkheimer echoes this sentiment.

Lies, Damned Lies and the (Non-)Sparsity of Effects

In order to maintain the plausibility of a purely environmental racial gap, Turkheimer underestimates heritabilities. But that’s not all. He also acts as if widespread and highly-significant statistical interactions can explain the observed group differences. This is his hobbyhorse, the Scarr-Rowe effect. This is where the most obvious examples of scientific misconduct come into play.

The Scarr-Rowe hypothesis is a theory based on the idea that environments and genetic variance interact rather than combine purely additively. By this theory, worse environments entail reduced genetic contributions to traits. Therefore, if environmental inequalities are greater, the genetic variance in a trait in some population will be reduced. When a Scarr-Rowe effect is found, it’s generally the case that heritability is found to be lower in individuals with lower environmental quality as assessed by measures of socioeconomic status (SES). However, this is not always the case. In other studies, it is found that SES, which is a shared environmental variable, merely moderates the relative sizes of the non-shared and shared environmental variance, such that individuals at the lower end of the SES distribution have greater shared environmental influences on their abilities. The shared environment can be bad, or it can be good, but it is generally bad when its impact persists into adulthood. In the context of the Scarr-Rowe effect, moderation of the non-shared environment by the shared environment typically implies constraints on the ability to choose one’s own environment.
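One common way to formalize this kind of moderation in twin data is a Purcell (2002)-style model, in which each variance path is a linear function of the moderator and each variance component is the square of its moderated path. The sketch below assumes that form and uses made-up parameter values (not estimates from any study) with SES expressed in standard-deviation units; the function name is hypothetical. It only shows how heritability can rise with SES when the additive genetic path grows and the shared environmental path shrinks as SES increases.

```python
def moderated_h2(m, a=(0.7, 0.15), c=(0.4, -0.10), e=(0.5, 0.0)):
    """Heritability at moderator value m under a Purcell-style GxE model.

    Each tuple is (intercept, moderation slope) for a path coefficient; the
    corresponding variance component is the square of the moderated path.
    The default values are hypothetical, chosen only for illustration.
    """
    var_a = (a[0] + a[1] * m) ** 2  # additive genetic variance at SES = m
    var_c = (c[0] + c[1] * m) ** 2  # shared environmental variance at SES = m
    var_e = (e[0] + e[1] * m) ** 2  # non-shared environmental variance at SES = m
    return var_a / (var_a + var_c + var_e)


for ses in (-2.0, 0.0, 2.0):
    print(f"SES = {ses:+.0f} SD: h2 = {moderated_h2(ses):.2f}")
```

With these inputs, heritability rises from about 0.21 at −2 SD to about 0.54 at the mean and about 0.78 at +2 SD; real estimates of the slopes are, as noted above, usually much smaller.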

I believe this hypothesis is true, if weak. I know one person who aims to meta-analyze the literature on this effect and he tells me (personal correspondence):

The Scarr-Rowe effect is robust and it doesn’t only affect intelligence. In my upcoming meta-analysis, I show that the animal evidence is also consistent with the existence of sizeable and reproducible Scarr-Rowe [effects].

The size of this effect is usually small and this literature is plagued by faulty methods and bad statistical reporting (most interactions, as in all fields, fail or are absurdly weak). Turkheimer (with Richard Nisbett and Paige Harden) implies that the Scarr-Rowe effect can explain some part of the racial gap in ability:

The heritability of intelligence, although never zero, is markedly lower among American children raised in poverty. Several interpretations of this fact are possible. The one we find most persuasive is that children raised in those circumstances are unable to take full advantage of their genetic potential because they do not have access to the high-quality environments that could support it.

It is implied that blacks show lower heritabilities because they are generally poorer than white Americans. However, the heritability of cognitive ability has been found to be essentially the same regardless of race in the US. This is an instance where Turkheimer is verifiably promoting fraud. He has done this more than once. In the article containing that quote (an article discussing racial differences), he links to his 2003 Scarr-Rowe finding from the NCPP study. This finding has never replicated at this magnitude and is, in fact, by far the largest effect size in the literature on this subject. More important, however, is that this study did not find a Race x SES interaction, and that Turkheimer knew this. Turkheimer also knows that the heritability of IQ, like that of many traits, increases with age; this is one confound he is confident in ignoring. Here’s the relevant table omitted from the cited study:

And here’s what Turkheimer said about it privately:

[T]here were some differences, but they were NS [meaning non-significant].

And yet Turkheimer has never stopped anyone from implying his results vindicate a Race x SES interaction; nor have null findings persuaded him to avoid arguing this way. Pre-scientific idle speculation is apparently OK if it’s Turkheimer doing it (ignore that this can have real, negative effects in the form of resentment, hatred, and the tabooing of useful research). And it is clear that he knows he’s doing this because he knows the origin of the Scarr-Rowe hypothesis as an explanation for black-white differences. He clearly knows much more than he lets on, but in order to avoid the “dismal” (quoting Turkheimer, on multiple occasions) conclusion that race differences have a genetic component, as he admits other group differences do, he has to act like he doesn’t. This is what he has done with Spearman’s hypothesis.

Turkheimer knows about Spearman’s hypothesis (the hypothesis that the g factor of intelligence tests is central to group differences just as it is to the predictive validity of IQ tests for almost everything else). Turkheimer is also not mathematically inept (in point of fact, he’s better with maths than most of the people he criticizes, but he makes what seem to be selective errors). And yet, in his commentary on Schönemann’s criticism of Spearman’s hypothesis, he writes:

Suffice it to say that I find his case compelling, reinforcing the strong impression that Guttman’s (1992) classic posthumous paper had already made. There is one statistical point which needs to be explored more fully, however. Although Schönemann appears to be fully correct regarding the Level I interpretation of Spearman’s hypothesis, his psychometric derivation of the Level II interpretation depends on the particular method he employs to divide the groups — i.e., dividing them into high and low scorers.

Schönemann, and through the acceptance of his thesis Turkheimer, believed that Spearman’s hypothesis was a mathematical tautology. But this is in fact only true in the context of Schönemann’s simulations. The problem with Schönemann’s simulations is that they do not represent any of the empirical data Jensen presented. What’s more, Spearman’s hypothesis is clearly non-tautological, as it only necessarily follows given that tests display strict factorial invariance (SFI), which makes the group differences in test means collinear with the tests’ factor loadings. Through his association with Paige Harden, who is aware of measurement invariance and its implications, Turkheimer almost certainly understands that the finding of SFI makes the Scarr-Rowe effect unlikely to be explanatory with regard to group differences.
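A minimal one-factor sketch of why invariance ties group differences to loadings, assuming a single common factor η with intercepts τ_j, loadings λ_j and residual variances θ_j shared by groups A and B (the SFI constraints), factor means κ_A and κ_B, and, for the standardized step, a factor variance φ that is also equal across groups. The notation is illustrative, not taken from Jensen or Schönemann.

```latex
\begin{aligned}
x_{ij} &= \tau_j + \lambda_j \eta_i + \varepsilon_{ij},
  \qquad \operatorname{Var}(\varepsilon_{ij}) = \theta_j \\
\mathbb{E}[x_j \mid A] - \mathbb{E}[x_j \mid B]
  &= (\tau_j + \lambda_j \kappa_A) - (\tau_j + \lambda_j \kappa_B)
   = \lambda_j (\kappa_A - \kappa_B) \\
d_j = \frac{\lambda_j (\kappa_A - \kappa_B)}{\sqrt{\lambda_j^2 \phi + \theta_j}}
  &= \lambda_j^{*} \, \frac{\kappa_A - \kappa_B}{\sqrt{\phi}},
  \qquad \lambda_j^{*} = \frac{\lambda_j \sqrt{\phi}}{\sqrt{\lambda_j^2 \phi + \theta_j}}
\end{aligned}
```

Because (κ_A − κ_B)/√φ is the same for every test, the standardized differences d_j are exactly proportional to the standardized loadings λ_j*. If intercepts differ across groups, an extra term τ_jA − τ_jB enters each difference and the proportionality breaks, which is why the result is conditional on invariance rather than true by construction.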

More important than all of these small errors and misrepresentations of his own findings, though, is that Turkheimer implies that the Scarr-Rowe effect can explain race differences when the published data seem to indicate it is an anti-Jensen effect. Insofar as group differences in ability are a product of g, the things which affect these differences must be Jensen effects, meaning they have a substantial, positive relationship with g; a large negative relationship with g indicates an anti-Jensen effect, and thus a tenuous or perhaps nonexistent relationship between that variable and group differences. To give two examples, one Turkheimer certainly knows (because he has cited it on numerous occasions) and one he almost certainly does not, take Scarr (1981) and the Penn Neurodevelopmental Cohort (PNC) study (chosen because it’s from the same area as much of Scarr’s work).

In Scarr’s piece, she showcases differences in the heritability of the results of five IQ tests given to blacks and whites in the Philadelphia Twin sample. The tests are the Raven’s, Columbia, Peabody, Benton Error, and Paired Association tasks. In order, the g-loadings for whites are 0.82, 0.74, 0.76, 0.77, and 0.5; for blacks, 0.8, 0.77, 0.7, 0.74, and 0.6. The averages are 0.81, 0.75, 0.73, 0.76, and 0.54. The heritabilities for whites are 0.68, 0.48, 0.48, 0.62, and 0.14; for blacks, 0.54, 0.42, 0.28, 0.6, and 0.5. The Scarr-Rowe effect in this sample, defined as black minus white heritability, is -0.14, -0.06, -0.2, -0.02, and 0.36. The correlation between average g-loadings and the magnitude of the Scarr-Rowe effect is -0.9, a strong anti-Jensen effect. Therefore, using Scarr’s original sample, we can conclude that the Scarr-Rowe effect is probably unrelated to group differences in ability (at least, if the result is significant).
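The arithmetic above can be checked directly. This is a minimal sketch in plain Python using only the figures quoted in the paragraph; the `pearson` helper is an illustrative name, and the sketch computes the correlation only, with no significance test (with five tests, any such test would be weak). Recomputing the averages from the two groups’ loadings gives 0.755, 0.755 and 0.55 in places, but the correlation still rounds to -0.90.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Figures as reported above for Scarr (1981), in the order Raven's, Columbia,
# Peabody, Benton Error, Paired Association.
avg_g = [0.81, 0.75, 0.73, 0.76, 0.54]
h2_white = [0.68, 0.48, 0.48, 0.62, 0.14]
h2_black = [0.54, 0.42, 0.28, 0.60, 0.50]

# Scarr-Rowe effect as defined in the text: black minus white heritability.
scarr_rowe = [round(b - w, 2) for b, w in zip(h2_black, h2_white)]
r = pearson(avg_g, scarr_rowe)

print(scarr_rowe)    # [-0.14, -0.06, -0.2, -0.02, 0.36]
print(round(r, 2))   # -0.9
```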

A genome-wide complex trait analysis (GCTA) of the PNC revealed mostly insignificant heritability differences between blacks and whites. Since we have a strong prior expectation about the direction of the Scarr-Rowe effect, it is appropriate to use one-tailed rather than two-tailed significance tests. Turkheimer has also argued for the plausibility of insignificant values for the Scarr-Rowe effect and has failed to supply adequate model fit statistics while calling a model that’s non-invariant by conventional definitions “excellent,” so we can be statistically liberal too. (As a further, interesting note, when Turkheimer applied for his grant to study the Louisville Twin Study, he claimed the data should be made available to the wider scientific community. This may have been a ploy to obtain funding, as his desire to share data apparently does not apply very widely: the data are still not available. Further, when asked in a conversation with Charles Murray on Twitter to share data on an apparent vindication of the Scarr-Rowe effect due to a preschool intervention, he did not. The person I quoted about a meta-analysis of the Scarr-Rowe effect above contradicted me on this, saying that Turkheimer is actually very open and helpful with obtaining data, but I will leave the reader to decide what’s correct.)
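The one-tailed versus two-tailed distinction can be sketched with a simple Wald-style z-test for a difference between two heritability estimates. This is an illustration under stated assumptions, not the GCTA software’s own test: it assumes the two groups are independent samples with approximately normal estimates, and the estimates and standard errors below are hypothetical. The function name `heritability_diff_test` is illustrative, and which direction counts as “predicted” is an input rather than something the sketch decides.

```python
from math import sqrt
from statistics import NormalDist

def heritability_diff_test(h2_a, se_a, h2_b, se_b, predicted_sign):
    """Wald-style z-test for h2_a - h2_b from two independent samples.

    Returns (z, one_tailed_p, two_tailed_p). The one-tailed p-value is for a
    difference in the direction given by predicted_sign (+1 or -1).
    """
    z = (h2_a - h2_b) / sqrt(se_a ** 2 + se_b ** 2)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - std_normal.cdf(predicted_sign * z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return z, one_tailed, two_tailed

# Hypothetical estimates and standard errors, for illustration only.
z, p_one, p_two = heritability_diff_test(0.30, 0.12, 0.55, 0.12, predicted_sign=-1)
print(f"z = {z:.2f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
```

When the observed difference has the predicted sign, the one-tailed p-value is exactly half the two-tailed one; when it has the opposite sign, the one-tailed p-value exceeds 0.5.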

The heritabilities of various cognitive tests in the PNC for whites were 0.32, 0 (overlapped with 0), 0.29, 0.17, 0 (overlapped with 0), 0.28, 0.29, 0.24, 0.48, 0.73, 0.31, 0.35, 0.18, and 0.28; for blacks, 0.34, 0.12, 0.28, 0.38, 0.42, 0.56, 0.46, 0.35, 0.45, 0.67, 0.22, 0.3, 0.11, 0.34, and 0.37. The Scarr-Rowe effect was thus 0.02, 0.12, -0.01, 0.19, 0.42, 0.28, 0.17, 0.11, -0.03, -0.06, -0.09, -0.05, -0.07, 0.06, and 0.37. The aggregate g-loadings for these tests were 0.315, 0.453, 0.499, 0.434, 0.395, 0.245, 0.632, 0.63, 0.71, 0.633, 0.455, 0.327, 0.39, 0.329, and 0.271. The correlation between g-loadings and the Scarr-Rowe effect is thus -0.35, another anti-Jensen effect. If we omit Scarr-Rowe effects due to non-significant heritabilities in the white sample, the correlation becomes -0.19, still an anti-Jensen effect. These are likely underestimates because GCTA presents what are effectively lower bounds for heritability. The g-loadings were also uncorrected, and I did not calculate these using ρ.
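The two PNC correlations can be reproduced from the fifteen reported Scarr-Rowe differences and g-loadings. This is a minimal sketch that works from those reported differences directly rather than recomputing them from the heritability lists. Which tests were dropped for the second figure is an assumption: removing the second, fifth and fifteenth tests reproduces the reported -0.19, while removing only the two tests listed as 0 for whites leaves the correlation at about -0.35. The `pearson` helper is the same illustrative function as before.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Aggregate g-loadings and Scarr-Rowe differences (black minus white) as reported.
g_loadings = [0.315, 0.453, 0.499, 0.434, 0.395, 0.245, 0.632, 0.63,
              0.71, 0.633, 0.455, 0.327, 0.39, 0.329, 0.271]
scarr_rowe = [0.02, 0.12, -0.01, 0.19, 0.42, 0.28, 0.17, 0.11,
              -0.03, -0.06, -0.09, -0.05, -0.07, 0.06, 0.37]

r_all = pearson(g_loadings, scarr_rowe)

# Assumption: the tests omitted for non-significant white heritability are the
# 2nd, 5th and 15th (0-based indices 1, 4, 14); this reproduces the reported -0.19.
dropped = {1, 4, 14}
kept = [i for i in range(len(g_loadings)) if i not in dropped]
r_sig = pearson([g_loadings[i] for i in kept], [scarr_rowe[i] for i in kept])

print(round(r_all, 2), round(r_sig, 2))  # -0.35 -0.19
```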

While these give us some initial reason to doubt that the Scarr-Rowe effect is even related to group differences in cognitive ability (or, perhaps more generally, to any trait which has experienced strong selective pressures in the past because plasticity is maladaptive, though this remains to be seen), we can go further and formally analyze a published dataset on a related topic: phenotype-to-environment transmission. Ritchie, Bates & Plomin (2015) provide the correlation matrices and effect sizes of intelligence gains due to discordance in reading within twin pairs. Based on their intelligence test correlation matrix, we can generate g-loadings. The n-weighted average g-loadings by test category (verbal IQ, non-verbal IQ) can then be calculated. By squaring the path weights, we get an effect size that we can correlate with the g-loadings. Because the reliabilities are not supplied, I cannot correct for them. Many of these paths are insignificant, the total correlation is insignificant, and the total effect amounts to very few IQ points, but my main interest is the overall correlation between the gains and g, which is r = 0.29. To my knowledge, this is the first discovered environmental Jensen effect, and it adds up to real g gains. This makes for the first demonstration of a real increase in intelligence from the environment. Because these were not solid longitudinal data, it is uncertain whether the effect is real or confounded by preexisting differences, but the point remains that these sorts of things deserve further investigation and consideration through the lens of theories like Spearman’s hypothesis. Regardless of whether group differences are concentrated on g, ignoring them will not fix them, but investigations like Ritchie, Bates & Plomin’s may help.
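The procedure described above (g-loadings from a correlation matrix, squared path weights as effect sizes, then a correlation between the two) can be sketched as follows, under heavy assumptions. The correlation matrix and path weights below are made up and are not the Ritchie, Bates & Plomin values. The g-loadings are taken from the first principal component, a common stand-in for g (a common-factor model would give somewhat lower loadings), and the n-weighted averaging by test category is omitted. The helper names `pearson` and `first_pc_loadings` are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def first_pc_loadings(R, iters=500):
    """Loadings on the first principal component of correlation matrix R,
    found by power iteration: unit eigenvector scaled by sqrt(eigenvalue)."""
    n = len(R)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the dominant eigenvalue for a unit-length v.
    eigval = sum(v[i] * sum(R[i][j] * v[j] for j in range(n)) for i in range(n))
    if sum(v) < 0:  # fix the arbitrary sign so loadings come out positive
        v = [-x for x in v]
    return [x * sqrt(eigval) for x in v]

# Hypothetical one-factor correlation matrix (NOT the published data).
true_l = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else true_l[i] * true_l[j] for j in range(4)] for i in range(4)]
g = first_pc_loadings(R)

# Hypothetical standardized path weights from reading discordance to each test.
paths = [0.10, 0.05, 0.20, 0.15]
effects = [p ** 2 for p in paths]  # squared path weight used as the effect size
r = pearson(g, effects)
print([round(x, 3) for x in g], round(r, 2))
```

Squaring the path weights discards their sign, as the text’s procedure does; a signed variant would correlate the raw weights instead.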

(I did not initially expect to find a Jensen effect on an environmental variable when I began writing this piece. I checked my numbers multiple times to make sure I had made no mistake. This finding is genuine, if causally underdetermined. Findings like this should warrant further research; they are not worthy of being tabooed as pre-scientific idle chatter.)

With further investigation, we could be reasonably certain of whether the Scarr-Rowe effect and the effect of selecting one’s own environment are really Jensen or anti-Jensen effects. But Turkheimer is having none of that. He has not made his data available for either of these effects to be examined, and he has all but denounced the idea that science can help us to understand the natural world (in fairness to him, I again note that I have had friends contradict me on this very issue). For him, group differences are apparently not an empirical question, but a moral one. This makes no sense.

He inveighs:

I want to start by considering why the discussion is so frustrating…. The third reason why this argument never gets anywhere is the most important: there is no valid scientific basis for answering the question in the first place [sic]. Think for a moment what the discussion would be like if it was about heritability of individual differences in IQ rather than group differences. There are still a few people out there who don’t believe that genes have anything to do with individual differences in intelligence.

The two issues with the whole piece and this part, in particular, are that:

Turkheimer’s standard of “valid scientific basis” implies that most science, being indirect as it is, is not valid; and
Group differences are literally no more than aggregated individual differences. There is no gestalt shift between levels of comparison, except that ACE components at the individual level which represent randomness (parts of E) cannot contribute to group means. (In the same article, Turkheimer seems to be unaware of, or in denial about, the existence of formulae like DeFries’s (above) or structural equation models like Rowe’s.)

If we can allow a contribution of genetics to individual-level differences, then we must allow one for group-level differences unless there are X-factors separating the groups. X-factors have never been demonstrated and will be falsified in an admixture design unless they’re of the totally unfalsifiable, unscientific variety. There is no avoiding this fact; it is a litmus test of epistemic rationality. To claim the absence of genetic influence on one group difference but not another with no logical, empirically testable reason for that conclusion is to argue for ignorance. What’s more, it follows from his nihilism towards empiricism, coupled with his claims about dangers, that he shouldn’t speculate about the gap. This should be especially true for instances where his speculations are ones that he knows are based on things which are wrong, as in his Vox article attacking Charles Murray, where he uses both lies and apparently unacceptable methods to do it.

Others have pointed out the inadequacy and contradiction inherent in Turkheimer’s “spittle-flecked invective.” Given his remarks about the philosophy of science, consistency would require him to denounce all indeterminate models and the policy consequences of environmentarianism, but he does not. He has never come out against left-friendly narratives, preferring instead to repeat them even when they cannot be falsified.

Summa: Turkheimer knowingly misleads. He alludes to the non-existence of behavior genetic methods and advances he is well aware of, like structural equation models for group differences, techniques like Mendelian Randomization (genetic instrumental variables), latent causal variable analysis (which has just confirmed that larger brains are a causal influence on ability), and equations relating group differences to individual heritabilities and environmentalities, for no clear reason. Instead of citing the far more numerous moderator meta-analyses he is well aware of, on effects he believes are important for this topic (an impossible judgment by his logic), he chooses to present examples of the Winner’s Curse in the form of pre-scientific idle speculation he knows can be falsified with data he possesses. Many of the things he does with regard to the topic of group differences are flagrant contradictions of himself. Using an earlier example, he is certain that the GFP does not exist, even though the method precludes even possibly demonstrating that; and yet for him, there can be no evidence for group differences by any means. He tergiversates between evidence being possible and impossible as soon as it suits his wants. This evinces anti-scientific, anti-objective bias, of which he has, thankfully, published extensive proofs.

Miscellany

For Turkheimer, something is only causal for something else if its effects can be mapped to the most elementary level, not if it can be demonstrated to be causal by normal standards. Perhaps I’m biased by being an economist, but if the standards for evidence were as high in our field as he wants them to be for this issue, none of us would ever have published anything about anything and the world would be worse for it (there is no evidence or reason to believe the world would be worse for knowing about the causes and consequences of group differences, however; this applies to Turkheimer’s work as well, since it would also be hopelessly underdetermined). Luckily, science does not work by withholding judgment, but by proposing competing theories and testing among them. Turkheimer should read Causal Inference: The Mixtape or the behavior genetic version of it. But Turkheimer and people who appear to be little more than his lackeys already know this and allude to it to give the appearance that they have something resembling scholarly integrity. For example, Paige Harden writes:

In my next post, this is the question that I will consider in more detail: what would strong evidence regarding genetically-based group differences look like?

Paige likely understands that science has always advanced as Lakatos described: by prediction. The choice between theories is arbitrary, but the theories which should win out tend to be the ones supported by the evidence. What Turkheimer instead argues is that science ends here. Unless Paige has written what she would consider evidence on some other site, she has not made the post she claimed she would, and we’re coming up on two years now. She should take a lesson from David Geary and specify her ideas clearly, present the evidence for them, detail the underlying mathematics if any are necessary, and then give us examples of predictions that would follow from or falsify that theory, as hereditarians have (see also here and here). To another point, the Genetics and Human Agency issue seems to be attracting only out-and-out liars. To err on the side of charity, they may just be imbeciles. Paige’s errors in the same article are excellent proof of this; for example:

As Turkheimer et al. (2003) makes clear, theirs was not the original study that inspired this line of research. This line of research dates back to Sandra Scarr (1971, whose original hypothesis was quoted in the first paragraph of Tucker-Drob & Bates, 2016), and there were subsequent tests of the hypothesis by David Rowe and colleagues. Given this, why should the Turkheimer et al. (2003) effect size be the anchoring point?

Turkheimer et al. (2003) should clearly be the anchoring point because it led to the solidification of the Scarr-Rowe hypothesis and inspired most of the follow-on analyses. Without Turkheimer, the Scarr-Rowe hypothesis would likely have remained undeveloped for far longer, if it had been developed at all, and the new class of experts on the Scarr-Rowe effect, like Woodley of Menie and Bates, might not have cropped up with the new methods (individual-level heritabilities and an understanding of genetic expressivity moderation through the continuous-parameter estimation method and genomic virtual parent designs, for example) needed to continue the empirical advance of this idea.

WWBS then question whether the interaction would hold past childhood: “…even if the heritability of intelligence in some groups was low in childhood (say, 10% or so), it is not clear that it would remain low into adulthood.”
This exact question was tested in the meta-analysis! From Tucker-Drob and Bates (2016), emphasis added: “We examined whether test performance was measured in childhood or adulthood, childhood age of testing, whether the tests measured either achievement and knowledge or intelligence, whether a single or composite indicator of SES was used, and whether the tests were of single ability or a composite cognitive measure. None of these additional moderators achieved statistical significance, and the cross-national difference in the Gene × SES effect remained when each of these possible moderators was entered into the meta-analytic model. Thus, the cross-national difference identified does not appear to be an epiphenomenon of cross-national differences in the age ranges examined [none of this emphasis is mine] or the particular intelligence or achievement outcomes measured.”

The difference between meta-analysis and case-level data is well known, but Harden doesn’t appreciate this fact. Moderators in a meta-analysis are not likely to have the same effects they would when pooling case-level data. The authors did not, and because of data-sharing rules and researcher proclivities could not, actually test whether age was a moderator, or whether any of these other items was a meaningful moderator, given the level of power and heterogeneity. Paige should know this: just two paragraphs prior, she remarked that the studies could not be treated as arising from the same population. Why the contradiction? An example of age moderation comes from a more recent study by Turkheimer.

There are no twin datasets with enough Black participants to provide a strong test of racial differences heritability of IQ is lower in Blacks. (To give you an idea of how large the necessary sample size might be, Tucker-Drob and Bates, based on their meta-analysis, estimated that a sample of 3300 twin pairs would be necessary to detect the gene × socioeconomic status interaction with sufficient statistical power.) Power for designs that compare full-siblings vs. half-siblings (such as in the study they cited) is even lower than for a twin design.

If all of the published literature (a meta-analysis of this was linked above) is insufficient to determine whether or not the heritabilities are statistically differentiable, we can confidently conclude that the difference is minor. If it were substantive, it would be easily picked up in existing datasets. I hasten to add that Figlio et al. (2017) had a sufficient sample size and still found no Race x SES interaction (personal correspondence). Project Talent and the NLSYLinks are just two other datasets that are sufficiently large to pick up differences of even very minor size. I suggest researchers use them.

They then suggest that the Black-White IQ gap persists even controlling for social class: “However, even if there is a small race difference in heritability estimates, researchers could limit their analyses to middle- and upper-class Blacks and Whites, where the IQ gap is about the same, if not larger, than it is among lower-class Blacks and Whites.”
This comment implies that the environmental experiences of Blacks and Whites in America can be equated by simply equating groups on income. This is sociologically uninformed. Pages and pages could be written about this, but I will limit myself to two examples. First, middle- and upper-income Black households, on average, live in neighborhoods that are much more disadvantaged than the neighborhoods that Whites with similar incomes live in. For example, Black households earning $60,000 per year live in neighborhoods that are comparable to the neighborhoods where White households making only $11,800 live. Second, even when you limit comparisons to the top quintile of income, the median wealth of Black households is 43% of the median wealth of White households.

This comment is statistically uninformed. Paige may not understand restriction of range. Even when we confine our observations to the next-highest quintile, the differences remain. The point of the people she’s addressing is that the gap does not shrink but instead grows when the relative level of environmental insults (as indexed by SES indicators) declines. As Jensen demonstrated in 1973 and 1998, the gap is not reduced by more than a third in a hierarchical regression on as many environmental variables as you like, but these variables are genetically confounded. The SES-IQ correlation is really entirely genetic. The reason for the seemingly contradictory finding that adoption raises IQ is that those gains do not relate to g, whereas the SES-IQ correlation in biological families is related to g, the meaningful and most heritable component of IQ tests. What she’s saying does nothing to that unless we assume the Scarr-Rowe effect is wrong. We can accept her contradiction and imagine that environments uplift (when this has never been demonstrated, though the reverse has), but doing that requires abandoning empiricism. For her to be correct, we should be able to improve environments and thus reduce the genetic variance for traits, but only the opposite has been demonstrated.

So, a child growing up in a middle- or upper-class Black home, as defined by household income, still begins life with significant material and contextual disadvantages compared to a child growing up in a White household with comparable income.

While some of her research has shown that race is a shared environmental variable that explains a significant portion of the variance in environments between blacks and whites in childhood, there is no evidence that it’s a meaningful causal influence in adulthood for almost any trait of interest. Other analyses have found that the effects on g/IQ of many types of environmental influences are basically nil. What could she mean? She wouldn’t dare say that race is an unmediated influence on the environment. It’s unlikely that it’s something related to group differences in intelligence, for the reasons already stated, but also because racial differences in ability are positively correlated with g-loadings and heritabilities and inversely correlated with environmentality. That is, when a test is more g-loaded, the black-white differences in it are larger; when the test result is more heritable, the black-white differences in it are larger; when the test displays greater environmentality, the black-white differences in it are much smaller. She will have to be specific, but there’s reason to doubt that information will be forthcoming: one paragraph below the one quoted above, Paige repeats the lie from the Vox article, but this may be because she’s ignorant of the things Turkheimer knows.

Paige is possibly an even more vitriolic character than Turkheimer. Here she writes that open science is leading to eugenicist pseudoscience and alludes to this being about Kirkegaard’s genetic mediation analysis conducted in a sample of 53 Ashkenazi Jews. (Note that this is high powered, consistent with prior literature, and immediately replicated with a four-times larger sample in the HRS.) It is far easier to smear those you dislike than to provide a reason to believe they’re wrong. This is what Harden has elected to do instead of science. This is what will continue to happen in the future as well, since the biggest barrier to the ascendancy of her and Turkheimer’s pragmatic, ideological interpretation of biology is the availability of a much stronger, far more cogent, and wholly contradictory theory which can be backed up by publicly available data.

Summa: Turkheimer may not be alone in his political crusade against behavior genetics and the study of group differences in particular. When it comes to answers to the hard questions, such as those about g, neither he nor his lapdog, Harden, seems to be clear or even slightly coherent. Their writing is rife with contradiction and opportunistic interpretation. They have demonstrated a lack of basic scientific integrity that should cast doubt on anything they have to say about these matters. At the same time, they have attempted to impose confused standards on debate and to derogate or taboo researchers interested in tackling these hard questions in an objective fashion.

On the basis of these facts, it should be unsurprising when Turkheimer et al. question the usefulness of things like polygenic scores on the basis that they’ve helped no one or solved none of the badly defined issues he believes they should have solved in order to be valid. Turkheimer is guilty not only of bad science but of bad faith in general. I view it as unlikely that Turkheimer will produce anything but obloquy in the future. If science is a battle against ad hoc explanation, then science is a battle against Turkheimer.

Written by Crémieux