Correcting a Confused Hereditarian

Crémieux
May 20, 2020

Do stilts make you taller? Two immediate answers, “yes” and “no,” are both correct, depending on one’s point of reference and how one chooses to interpret the question. The question thus presents a problem: both answers are legitimate given elaboration, but no elaboration is provided when it is asked as it is. This does not mean that neither answer is correct. Similarly, when the question “Are racial differences genetic?” comes to the fore, both “yes” and “no” are correct answers provided some extension of the question.

Today, Ion Rimaru (the pseudonym comes from a Romanian serial killer) has made a post explaining why he thinks the gap between Blacks and Whites in the United States is “almost certainly genetic.” I believe he’s confused about both the question and his answers; likewise, I believe he doesn’t know how to even begin asking the question he intends to answer. In short, I don’t think he’s that bright. I’ll document these conceptual confusions just as he documents his own reasons for assuming the Black-White gap is genetic. Before that, however, I’d like to clarify that I also believe this gap is basically “entirely genetic,” but I’m not going to discuss what that means or why (because I don’t have to). Readers who are not interested in pettiness should not continue reading since I am a pedant.

The Existence of a Black-White IQ Gap

The first line of Rimaru’s short read is this: “The existence of a Black-White IQ gap in the US is not controversial at all.” This is true. However, it is important to remember that this is also not interesting in itself. The reason is simple: IQ is not an attribute, it is a score; it is merely the result of a summing procedure for items on an IQ test. To clarify why this is not necessarily interesting, consider a corresponding gap in IQ between northeast Asians and Whites; this gap is typically on the order of a 6 IQ point advantage for Asians. But the source of this gap is not the same as the source of the gap between Blacks and Whites, which is psychometric general intelligence (g). Instead, the differences between Asians and Whites come primarily (but not wholly) from psychometric s, or the sources of common variance measured with IQ tests which are not general. Anyone faintly familiar with this line of research knows that psychometric g, not s, is the source of the majority of the criterion validity of IQ scores. Viewing only the IQ scores themselves is a surefire way to conflate differences that are meaningful and likely to predict something with differences that are not. If this seemingly minor detail isn’t recognized, it can lead hereditarians to confusion about what to predict regarding group performance differences, in addition to their causes. This is a petty quibble, but it is important to accurately specify the referent in group comparisons, lest one be left exposed in future arguments when someone hands them a “gotcha” over their lack of specificity.

Within- and Between-Group Heritability

Somehow in his brief excursion into the relationship between within- and between-group heritability, Rimaru fails to actually supply its formula. As originally given in this context by DeFries (1972), the formula is

h²B ≈ h²W × [(1 − t) r] / [(1 − r) t]

Taken from Jensen (1975)

In this formula h²B and h²W are the heritabilities between and within groups, r is the genetic intraclass correlation, and t is the phenotypic intraclass correlation. There are several important things to note about this formula. One is that positive gene-environment covariance leads to an underestimation of h²B, and if the covariance is negative, the converse is true and h²B is overestimated. Another important thing, which is relevant to what Rimaru says just a few lines later, is that the “h” in “h²B” and “h²W” is lower-case. This signifies the narrow-sense, or additive, heritability, not the broad-sense heritability, commonly denoted with H. If the heritability of traits is due to a combination of additive and non-additive factors (or, more importantly, if population differences are formed by, say, dominance and epistasis), then this formula cannot describe the whole of the group differences. Since selection tends to erode non-additive variance like a sandcastle in the waves, we should expect that if it acted to differentiate populations, then this formula would be, at best, partially valid. Most importantly, the h²B of the group difference depends on much more than h²W. This can be demonstrated with worked examples.

Consider a purely additive 50% heritable trait with population differences of 0.1, 0.5, 1.0, 1.5, and 2.0 in Cohen’s d. The genetic intraclass correlation coefficient, r, corresponds to 2Fst; we will hold Fst constant at 0.05, so that r = 0.10. At each of these levels of trait differentiation, h²B equates to >1, 0.89, 0.20, 0.10, and 0.05: as the trait becomes more phenotypically differentiated, the proportion attributable to genes can become vanishingly small if it isn’t accompanied by genetic differentiation between the groups. Next, holding the difference between populations constant at 1 d and varying the differentiation (in terms of r) for the trait being inspected across levels of 0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.25, and 0.40, we arrive at values of h²B of 0.02, 0.06, 0.11, 0.15, 0.20, 0.35, 0.60, and >1. Finally, holding d constant at 1 and r at 0.10 but varying h²W from 0.10 to 1 in increments of 0.10, we arrive at levels of h²B of 0.04, 0.08, 0.13, 0.17, 0.20, 0.26, 0.31, 0.35, 0.40, and 0.40: h²B is barely 40% even when h²W is total.
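The sweeps above can be reproduced approximately with a few lines of Python. This is a sketch under stated assumptions rather than the exact calculation behind the quoted figures: it assumes the DeFries formula takes its usual form, h²B ≈ h²W × [(1 − t) r] / [(1 − r) t]; that t is obtained from Cohen’s d for two equal-sized groups as t = d² / (d² + 4); and that r = 2Fst = 0.10 in the first sweep. The function names are illustrative only.

```python
def phenotypic_icc(d):
    """Phenotypic intraclass correlation t for two equal-sized groups whose
    means differ by d within-group SDs (an assumed conversion)."""
    between_var = (d / 2.0) ** 2  # each group mean sits d/2 from the grand mean
    return between_var / (between_var + 1.0)  # within-group variance fixed at 1


def h2_between(h2_within, r, d):
    """Between-group heritability under the assumed DeFries (1972) form."""
    t = phenotypic_icc(d)
    return h2_within * ((1.0 - t) * r) / ((1.0 - r) * t)


# Sweep 1: h2W = 0.5 and r = 0.10 (Fst = 0.05), varying d.
for d in (0.1, 0.5, 1.0, 1.5, 2.0):
    print(f"d = {d:<4} h2B = {h2_between(0.5, 0.10, d):.3f}")

# Sweep 2: h2W = 0.5 and d = 1, varying r.
for r in (0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.25, 0.40):
    print(f"r = {r:<4} h2B = {h2_between(0.5, r, 1.0):.3f}")

# Sweep 3: d = 1 and r = 0.10, varying h2W from 0.1 to 1.0.
for step in range(1, 11):
    h2w = step / 10.0
    print(f"h2W = {h2w:<4} h2B = {h2_between(h2w, 0.10, 1.0):.3f}")
```

Under these assumptions the outputs land close to the figures quoted, though not always exactly on them (about 0.22 rather than 0.20 at d = 1 and r = 0.10, for instance), which may reflect rounding or a slightly different conversion from d to t. Values above 1 indicate that the chosen combination of d, r, and h²W is not internally coherent.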

It is eminently true that there are rising constraints, but that model merely describes the required size of some environmental variable to explain group differences at a certain level of h²W, not the statistical relationship between h²W and h²B. The plausibility, which differs between people just like answers to the stilt-height question, is irrelevant; there is environmental variance, and in it could lie the answer. This is the essence of what Mackenzie (1984) called the hereditarian fallacy. The important counter here is to note other facts. The equality of variance components in some trait by groups is an excellent start; the nature of those components likewise presents an excellent argument. For instance, if 50% of the variance in some trait is attributable to additive genetics and the residual 50% to environmental influences which are not systematic with respect to sibship (i.e., the unshared environment), then it’s difficult to surmise an influence which could fit in that component and explain the group differences, because it must be systematic with respect to race whilst being unsystematic with respect to siblings. If such a variable (or set of systematically-directed variables) existed, it would lead to a considerable expansion of the variance in the trait for one group but not the other, and depending on statistical power, this may or may not be detectable in a given sample. A more interesting and conclusive way to test for this would be to assess the variance within sibling pairs, comparing sibling intraclass correlations in groups with disparate means, and assessing the similarity of sibling (and parent) regression to the mean; if these are equal, then the unshared variance (which cannot drive regression to the mean) is not systematic with respect to group, negating the presence of an explanatory mystery variable between the groups.
If this is your argument, assemble the evidence (this has been done), but don’t leave premises unstated, and always push the argument along to the point where a determinate conclusion can be reached. The problem with these specific disconfirmations of mysterious explanations is that they also require power which is desperately lacking in the majority of existing datasets, so as a means of disconfirming, they fall short, and similarly, at some sample size, they may confirm a difference, as no null hypothesis is actually true.

A more important thing to note about mystery variables is that they must be specified to mimic the differences within groups if there is measurement invariance. Rimaru alludes to this with reference to Lubke et al. (2003) and the Rowe studies, but fails to consider the meanings of the theories in these works. Regarding the chronologically earlier of the two, Rowe’s method suffers from the same problem as testing for the differences in variances which should result from these mysterious X factors: power. If you have too little, the test isn’t meaningful. But just as well, depending on the measures, Rowe’s method can indicate artefactual differences. For example, it is commonly noted that parental socioeconomic status is less predictive for Blacks and more predictive for Asians. An assumed reason is social, and per Rowe’s method, this would count as some evidence in favor of social influence, but an explanation that predicts the extent of the effect is regression to the mean, and this appears to be accurate. In either case, with a large enough sample, Rowe’s test will show some evidence for a difference in the “developmental process” and with a small enough one, it won’t tell us anything.

Regarding Lubke et al., the absence of a mysterious explanatory variable is evidenced by the invariance of a given factor model for some assemblage of tests whose constructs are modeled in it. However, this does not rule out an X factor that affects only latent constructs. The fact that these constructs have the same genetic and environmental components and that they don’t show considerable differences in variances in well-powered tests does rule it out, but this is little-tested. Bifactor models of intelligence have a particularly interesting place in testing this sort of effect: in a multi-group confirmatory factor model, the power to detect a difference in latent variances is greatly expanded, and if measurement invariance can be confirmed, a well-powered test for X factors is afforded, because the bifactor model does not have covariances among the factors, so their inequality (which could be due to an X factor) is ruled out. Until this is done, only the largest X factors can be ruled out. Fortunately, this sort of analysis is not affected by sample size in the same way the Rowe studies are, because many common goodness-of-fit measures for structural equation models move independently of sample size, making this test possible, but it has still not been done. Moreover, with many different groups, the likelihood of some parameter failing to show invariance increases.

In this section, Rimaru confused the broad-sense and additive heritabilities. Panizzon et al. — whom he cited — did not assess the broad-sense heritability, and importantly, they found that the construct of g was 86% heritable in their sample, not that IQ — a score, not a construct — was 80% heritable. Moreover, a model like theirs is not confirmed to be correct for blacks and whites just because standard estimates of genetic and environmental influence seem to be equal; it is necessary to fit a model like this for both groups simultaneously. Finding invariance between those groups in that model would then justify the equivalence of aetiological influences for them, but until that is done, it’s only an assumption. Moreover, the power to detect an X factor in that model (which could take the form of, say, a covariance between genetic influences on latent constructs which differs by groups) still needs to be addressed.

This entire section by Rimaru was a plausibility argument, with the evidence insufficient to drive home a strong point. It is possible to address these holes in the evidence, but it hasn’t been done yet. So far, his evidence is still just indirect, even if it’s plausibly consistent.

Global Admixture

This section is grossly inappropriate. First, there is confusion about what constitutes an admixture analysis. Second, direct and indirect evidence are confused. Third, the evidence is made up. Fourth, global admixture is triply confounded in ways that can be but have not been addressed.

Firstly, Shuey and Rowe did not perform admixture analyses. Their analyses took the form of assessing whether mixed children scored in between the races of their parents, as expected under an admixture model. However, these studies were all small and their results deviated from what a pure admixture model predicts. The estimates are affected by sampling variance, by systematic variance from the selection of parents into mixing, by potential bias in the tests, and by the potential for parental effects which mirror the proposed genetic dispositions of members of either parent’s race; together, these make the relationship between expectations and findings indeterminate as to its cause. This is not a major issue, but it does need to be clarified, and old studies like these need to be meta-analyzed to assess how they conform to expectations and why they may differ.

Global admixture estimates computed from actual genetic data are not direct evidence for genetic differentiation; they’re indirect evidence (many papers interpret this correctly, like this). Global admixture evidence is used to compute h²B by comparing expected and observed correlations between admixture and values for a trait. For example, imagine you have DNA from a population of Nzime and a population of Baka, with a 2 d difference in height and an admixture cline in the Baka population, where those situated closer to the Nzime have relatively more Nzime admixture. With a mean of 20% Nzime admixture and an SD of 10% in the Baka, and a mean of 100% with an SD of, for this example, zero in the Nzime, the expected correlation would be r = 0.25 within the admixed Baka group if the difference was entirely genetic. Obviously, interpolation of the difference is not direct evidence based on how the word “direct” is used in the sciences.
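The arithmetic behind the r = 0.25 figure can be sketched as a linear interpolation. The sketch assumes that the trait changes linearly with admixture, that the trait SD within the admixed group is roughly one within-group SD, and that a `genetic_share` parameter (a hypothetical name introduced here for illustration) scales the slope when only part of the gap is genetic.

```python
def expected_admixture_r(d_gap, admix_gap, admix_sd, trait_sd=1.0, genetic_share=1.0):
    """Expected within-group correlation between admixture and a trait.

    d_gap         : mean trait difference between the populations, in SD units
    admix_gap     : difference in mean admixture between them (as a fraction)
    admix_sd      : SD of admixture within the admixed group (as a fraction)
    trait_sd      : trait SD within the admixed group (assumed to be about 1)
    genetic_share : assumed fraction of the gap that is genetic
    """
    # Trait SDs gained per unit of admixture if the gap is interpolated linearly.
    slope = genetic_share * d_gap / admix_gap
    # A standardized slope is a correlation: slope * SD(x) / SD(y).
    return slope * admix_sd / trait_sd


# Baka at 20% Nzime admixture (SD 10%), Nzime at 100%, a 2 d gap in height.
print(round(expected_admixture_r(2.0, 1.00 - 0.20, 0.10), 3))
# The same setup if only half of the gap were genetic.
print(round(expected_admixture_r(2.0, 1.00 - 0.20, 0.10, genetic_share=0.5), 3))
```

Comparing an observed correlation with the value expected under a fully genetic gap is what gives the implied share, which is why the result is an interpolation rather than a direct observation.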

Now, we can complicate this scenario. Three factors make admixture estimates difficult to interpret:
1) Assortative mating
2) Self-identification by admixture or one of its correlates
3) Range restriction
Assortative mating can work such that, for example, if there is a zone between two populations where population mixing occurs, if individuals from one population are selected based on the trait being investigated and the others are not, an admixture-trait relationship can emerge due to nothing more than that selection, even if there is no relationship within either population. This can work in either direction and can be disqualified indirectly by discussing the evidence for this sort of assortative mating, or directly, by assessing polygenic scores for local ancestry segments derived from the population whose ancestry is the independent variable in the global ancestry regression; this ought to be done eventually for both types of admixture.

Self-identification effects can lead to a mischaracterization of the departure from a null of no relationship between admixture and a trait. For example, if people are more likely to identify with a given racial or ethnic group depending on their level of admixture from one group or another (or on a closely aligned trait such as skin colour), the selective identification can induce the effect (or reduce it). One way to test this is to assess what the relationship would be at different ancestry cutoffs assuming environmental stratification due to selective self-identification (i.e., no relationship within a group, just between the groups) and to compare that with what it would be under heritable stratification (since there should be no relationship at early cutoffs assuming a decently strong probabilistic relationship between admixture and self-identification with some group). Another is to perform cross-validation, leaving some fold out, but the number of people to remove is uncertain, and at some point, removal may affect the variance, which will usually already be constrained, contributing to the likelihood of type-II error. Simulating expected coefficients and comparing them to the value observed seems like a simple way to get around this issue.

Range restriction is one of the biggest problems for global admixture analyses. This problem should, in general, lead to the underestimation of h²B from global admixture. For example, in white Americans, the mean level of European admixture is approximately 99% with almost no variability. Even sampling a billion Europeans would not deliver sufficient power to estimate an admixture effect with a true size of, say, r = 0.10. This is similarly the case when comparing siblings. Not all populations will, thus, be amenable to checking global admixture expectations against observations and to appropriately calculating h²B, as the relationship can’t even approach its true value.
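How quickly range restriction shrinks an observable correlation can be illustrated with the classical formula for direct range restriction (often called Thorndike’s Case II). It is brought in here purely as an illustration and is not taken from any of the analyses discussed. It assumes a linear, homoscedastic relationship and restriction directly on admixture; the ratios `u` below are hypothetical values, not estimates for any real population.

```python
import math


def restricted_r(r_full, u):
    """Correlation expected after direct range restriction on the predictor.

    r_full : correlation in the unrestricted population
    u      : ratio of restricted to unrestricted predictor SD (0 < u <= 1)
    """
    return r_full * u / math.sqrt(1.0 - r_full ** 2 + (r_full * u) ** 2)


# A true r of 0.10 under progressively narrower admixture variance.
for u in (1.0, 0.5, 0.1, 0.01):
    print(f"u = {u:<5} r_restricted = {restricted_r(0.10, u):.4f}")
```

As the admixture SD shrinks towards zero, the expected correlation shrinks roughly in proportion, which is the sense in which the observed relationship cannot approach its true value in a nearly unadmixed group.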

Rimaru makes two additional tendentious claims in this section. Namely:

The way we know confounding factors are not at fault for this is because most of the variance is between rather than within sibling pairs.

and

Furthermore, consider this paper which found that education differences between European countries are partly genetic.

The latter claim is simply inaccurate. That paper he linked talks quite a bit about population stratification and the claim that the differences between European countries are due to genetics is as consistent with the evidence as differences in “education, cultural norms, the relationship between education and GDP, and discrimination within the educational system” and “dynastic effects.” Given that the outcome is so abundantly social in nature, environmental explanations are quite plausible. If, for example, one country subsidises education and another does not, but they are otherwise equal, we will still see a difference, and if the subsidising nation has a population with a lower level of differentiation from the discovery population, their scores may be higher tautologically (or vice-versa, since the effect can go in the opposite direction too).

The former claim makes little sense. Most of the variance is, of course, between rather than within sibling pairs. Siblings share genetics, and thus typically show a correlation in outcomes as a result, and thus a smaller variance compared to the population. The point of the linked paper is to assess whether there’s a within-family relationship between colour — a proxy for admixture — and intelligence, and there is not, indicating that the effect is familial, but the reason for the familiality is not certain, as it could be due to the family environment. If we have samples of monozygotic (identical) and dizygotic (fraternal) twins, they can be used to assess the reduction in the relationship between an outcome and a trait with its heritability in mind in order to see if the reduction in the relationship is greater than what’s predicted based on that. If it is not, then it’s likely that the familiality is genetic, and if it is, then there’s also an environmental element, and with appropriate data and designs, these can be disentangled, even accounting for potential confounding from, say, gene-environment correlations. All that paper can say with its sample — assuming it is well-powered — is that colour does not causally influence g.

Correspondence with the Wilson Effect

It does not seem like Rimaru has thought through the relationship between the Wilson effect and group differences in light of the age distribution of group differences. Dickens & Flynn’s evidence that the Black-White gap grew with age was insufficient, as it did not actually cover the growth of the gap with age, nor any growth of the gap at the construct level. They neglected other samples and tests and came to a conclusion prior to the advent of sufficient evidence. In 2013, Jason Malloy assessed this question more comprehensively and found a fully-developed gap by age three, citing longitudinal analyses that support gap stability with age much more strongly than a handful of results and a glut of extrapolations from norming samples. Since then, Cottrell, Newman & Roisman (2015) found a similar result, with the gap in g practically fully developed at age four and changing little by age fifteen. There aren’t many assessments of this, but the longitudinal evidence, and the majority of the cross-sectional evidence, seem to agree that the gap develops early (the growth is also consistent with the observation of the Jensen effect since this can describe the direction of the effect on cognitive components of increasing or decreasing some variance component).

This result contrasts with observations from cross-racial adoption studies. These studies (of which there are but a handful) have usually found reduced IQ gaps which grow with age (e.g.). This squares with the Wilson effect in the following way: the shared environment is a systematic influence of the parental environment, and in cross-racial adoption the child is separated from the parents with whom they share genes, so that source of gene-environment covariance goes away and the child reflects the influence of their adoptive parents instead, yielding reduced group differences. The fade towards normal-sized gaps is, thus, due to a reduction in shared environmental variance (this also occurs in virtual twins, so it is quite real). I don’t know where Rimaru got the idea that there was a “perfect correlation” between the Wilson effect and the Black-White IQ gap. This just seems like a restatement of the increase in heritability with age and the alleged increase in the Black-White IQ gap along the same grounds, which he seems to have been confused about.

Subtest Heritability and g

A more common term for this relationship is the “Jensen effect.” For it to say anything about group differences requires some transitivity between them, or a common pathway model to be true. Corrections are generally the only way these correlations can achieve the requisite level of >0.71 to be considered surely transitive, and that level is rarely reached. For example, Jensen found that the correlation between group differences and subtest g loadings was r = 0.63 in The g Factor, and the relationship between g loadings and heritabilities seems to be somewhat weaker, at approximately r = 0.25 to 0.35 in the analysis by Kan et al. (2013). Corrections for reliability affirm the consequent, and the corrected correlations thus cannot be used for this (nor is significance computed correctly in Kan et al., but this does not affect the relevant part, the r). Regardless, these vector correlations do not correspond to the proportion of group differences due to g; take Frisby & Beaujean (2015), whose data showed correlations, r and ρ, of 0.58 and 0.62, but whose confirmatory factor model of Spearman’s hypothesis attributed 75% of group differences to g. These results do not square, but one method (MGCFA, multi-group confirmatory factor analysis) delivers a result that can be readily interpreted and the other (MCV, the method of correlated vectors) does not. In fact, the latter relationships can be reduced to nothing even if they’re substantial if there are, for example, some strong group factors beyond g.
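The >0.71 threshold presumably comes from the standard bound on a third correlation given two others: if A correlates with B at r_AB and B correlates with C at r_BC, then r_AC is only guaranteed to be positive when r_AB² + r_BC² > 1, which holds when both exceed √0.5 ≈ 0.707. A minimal sketch of that bound follows; the function name is illustrative, and the inputs are the correlations quoted above.

```python
import math


def r_ac_bounds(r_ab, r_bc):
    """Lower and upper bounds on r_AC implied by r_AB and r_BC alone."""
    product = r_ab * r_bc
    slack = math.sqrt((1.0 - r_ab ** 2) * (1.0 - r_bc ** 2))
    return product - slack, product + slack


# Group differences vs. g loadings (0.63) and g loadings vs. heritabilities (0.30).
low, high = r_ac_bounds(0.63, 0.30)
print(f"r_AC lies between {low:.2f} and {high:.2f}")

# Both links just above the threshold: the lower bound turns positive.
low, high = r_ac_bounds(0.72, 0.72)
print(f"r_AC lies between {low:.2f} and {high:.2f}")
```

With the quoted correlations the lower bound is well below zero, so the two vector correlations on their own do not fix even the sign of the relationship between group differences and heritabilities.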

Even if we grant this method and its conclusion, the interpretation that “the gap is on g, and only genetic factors have been found to correlate with g, the obvious conclusion is that the gap is genetic” is foolhardy. These things may make a conclusion likely, but they do not make it obvious — especially when there are alternative explanations for how those relationships should arise, and there may be as-yet undocumented influences that act differently (see Mackenzie, 1984, again). The use of the term “magical environmental factor” is just insulting and that’s all it’s intended to be. There hasn’t, in Rimaru’s piece, been any reason stated as to why that factor should be construed in that way. There are reasons why it could be, but he didn’t state them, and I, frankly, do not believe he could.

Stability

Going from gap stability to the “gap is thus likely due to genetic factors” is simply not possible. The one does not imply the other. Although I do not believe it because I’m privy to more of the data than most, a plausible alternative argument is that the relative environments of Blacks and Whites which are causal for g have not changed. Anyone willing to accept this has to exclude a wide array of factors, including education and discrimination, for which there is excellent evidence of growing equality and of decline, respectively, but they can ride on things like wealth and an ad hoc move towards areas, groups (particularly via sampling complaints), and time periods where there’s little evidence. For instance, they could use the data from Kuhn, Schularick & Steins (2018) and claim that the lack of convergence in wealth is a persistent driver of the Black-White gap and that it would have been larger in the past where there’s a paucity of data. This would be strictly untrue because there is data from then with which to construct measurement models and they don’t agree, and equating in terms of wealth does not close the gap (nor could equating show that, since it is genetically confounded), but it is seemingly legitimate and plausible when neither side is looking over all of the evidence. Moreover, the inclusion of other groups necessarily suggests that this explanation has no legs. Take Hispanics and northeast Asians in the United States. When Jensen’s Educability and Group Differences came out in 1973, Hispanics were poorer than Blacks and northeast Asians were poorer than Whites, but they had higher IQs than these comparison groups. Years later, Hispanics are richer than Blacks and Asians richer than Whites (although some will contest this in terms of specific measures of “wealth” in an ad hoc fashion), consistent with convergence to expectations based on IQ instead of those based on socioeconomic status.

The failure to acknowledge gaps in knowledge and plausible alternatives is just embarrassing. The failure of a gap to close is supportive of a role for genetic factors and it disqualifies numerous environmental explanations, but it is not itself evidence for a genetic explanation. Perhaps broadening the discussion to talk about and present evidence regarding more groups can help to push this discussion along.

I don’t care much about what else Rimaru wrote, but the piece is in general vague and often misleading, and the logic is weak and rarely extends beyond quotes to inferences. I am completely unsure how it can be concluded, based on the evidence presented and how it was presented, that “the gap has a significant genetic influence.” There was simply nothing that justified that or attempted to. Even calculating h²B a single time would have justified the claim, but this was never done; instead, short and overly specific arguments were presented which didn’t do anything to establish his point. The two penultimate sentences are just strange; these said:

The Hereditarian Hypothesis makes a lot of risky predictions, and it comes up on top. Meanwhile, the Environmentalist hypothesis bends itself at every new finding that goes against it.

These “risky predictions” are absent from Rimaru’s article. Instances of “the Environmentalist hypothesis” bending itself at new claims are also absent. Even clear instances of things that are difficult to explain for environmentalists, like the Jensen effect, are only used in a handwaving fashion and they are, in fact, explicable in some models of intelligence considered to be more environmentalist-friendly, like mutualism.

In toto, Rimaru’s article is a series of overstatements and scant little empirical content, failing to connect-the-dots between bits of evidence or to admit to serious limitations and alternative explanations. There is a kernel of a case within the piece, but at the moment, it serves as nothing more than a disjointed collection of plausibility-based indirect arguments that fail to constitute a decent case for what he believes they should because they aren’t properly tied together if they’re able to be at all.

As I alluded at the beginning of this little response, the interpretations of practically all of his evidence are unwieldy and can be interpreted differently in ways which aren’t consonant with his ultimate conclusion, so without elaboration of the questions he seems to be asking, there’s not much to take away or learn from him; without explanations and observations that disqualify some of the alternatives to the phenomena he has basically just listed off, I can’t see how his piece could be anything but a source of confusion and a waste of valuable reading time.
