
We are searching data for your request:
Upon completion, a link will appear to access the found materials.
I am reading chapter 7 of Falconer and Mackay where they give a formula to calculate the population mean as a deviation from the heterozygote genotypic value.
As an example, imagine a one locus two allele trait, with alleles $A_1$ and $A_2$, with respective frequencies $p$ and $q$, and genotypic values ($ heta$) of $A_1A_1 = heta_{1,1} = 0.5$, $A_1A_2 = heta_{1,2} = 0.4$, and $A_2A_2 = heta_{2,2} = 0.3$.
The mean is given by
$ M = a_1(p-q) + 2pqd$
where $a_1$ is the deviation between the genotypic value of $A_1$ homozygotes and genotypic value heterozygotes, and, because $d=0$ (the dominance deviation), this equation reduces to
$ M = a_1(p-q)$
For my above example, if $p = 0.7$ then $M = 0.04$ and the population mean should be
$frac{1}{2} heta_{1,1} + frac{1}{2} heta_{2,2} + M = 0.44$
$frac{1}{2} heta_{1,2} + M = 0.44$
Is this correct?
What is the correct term and notation for the genotypic values (which I termed $ heta$)
Implications of the difference between true and predicted breeding values for the study of natural selection and micro-evolution
The ability to predict individual breeding values in natural populations with known pedigrees has provided a powerful tool to separate phenotypic values into their genetic and environmental components in a nonexperimental setting. This has allowed sophisticated analyses of selection, as well as powerful tests of evolutionary change and differentiation. To date, there has, however, been no evaluation of the reliability or potential limitations of the approach. In this article, I address these gaps. In particular, I emphasize the differences between true and predicted breeding values (PBVs), which as yet have largely been ignored. These differences do, however, have important implications for the interpretation of, firstly, the relationship between PBVs and fitness, and secondly, patterns in PBVs over time. I subsequently present guidelines I believe to be essential in the formulation of the questions addressed in studies using PBVs, and I discuss possibilities for future research.
Abstract
Better understanding of the genetic control of traits in breeding populations is crucial for the selection of superior varieties and parents. This study aimed to assess genetic parameters and breeding values for six essential traits in a white Guinea yam (Dioscorea rotundata Poir.) breeding population. For this, pedigree-based best linear unbiased prediction (P-BLUP) was used. The results revealed significant nonadditive genetic variances and medium to high (.45–.79) broad-sense heritability estimates for the traits studied. The pattern of associations among the genetic values of the traits suggests that selection based on a multiple-trait selection index has potential for identifying superior breeding lines. Parental breeding values predicted using progeny performance identified 13 clones with high genetic potential for simultaneous improvement of the measured traits in the yam breeding program. Subsets of progeny were identified for intermating or further variety testing based on additive genetic and total genetic values. Selection of the top 5% progenies based on the multi-trait index revealed positive genetic gains for fresh tuber yield (t ha −1 ), tuber yield (kg plant −1 ), and average tuber weight (kg). However, genetic gain was negative for tuber dry matter content and Yam mosaic virus resistance in comparison with standard varieties. Our results show the relevance of P-BLUP for the selection of superior parental clones and progenies with higher breeding values for interbreeding and higher genotypic value for variety development in yam.
Abbreviations
Results
In this section, we use empirical data and simulations of the toy-model to show that most of the heritability estimators borrowed from classical quantitative genetics are prone to significant bias, because they neglect or inaccurately model the change in resemblance between transmission partners caused by within-host evolution of the pathogen. Based on the toy-model simulations, we designate the intraclass correlation in the closest phylogenetic pairs (CPPs) and the phylogenetic heritability, H OU 2 ( t ¯ ) , measured by the phylogenetic Ornstein–Uhlenbeck mixed model (POUMM) ( Mitov and Stadler 2016 Blanquart et al. 2017) as the most reliable estimators of pathogen trait heritability. Based on applying these estimators to a large HIV cohort, we establish a lower bound for the lg(spVL)-heritability.
Through the rest of the article, we use the symbol dij to denote the phylogenetic distance between two tips, i and j, on a transmission tree ( fig. 1). dij summarizes the total evolutionary distance between two infected hosts at the moment of measuring the trait value and is measured in substitutions per site for real trees and arbitrary time units for simulated trees. We begin our report with a result from HIV data demonstrating the relevance of within-host evolution for estimating heritability.
The lg(spVL) Correlation in HIV Phylogenetic Pairs Decreases with dij
We used one-way analysis of variance (ANOVA, rA) and Spearman correlation (rSp) to estimate the correlation in phylogenetic pairs (PP) extracted from a recently published transmission tree of 8,483 HIV patients ( Hodcroft et al. 2014). As defined in Shirreff et al. (2013), phylogenetic pairs represent pairs of tips in the transmission tree that are mutually nearest to each other by phylogenetic distance (dij) ( fig. 1). We ordered the PPs by dij and split them into ten strata of equal size (deciles), evaluating the correlation between pair trait values (rA and rSp) in each stratum. The point estimates and the 95% confidence intervals (CI) are shown with black and magenta points and error bars on figure 3. Dashed horizontal bars denote the 95% CI for rA evaluated on all phylogenetic pairs. Despite some irregularities, there is a well pronounced pattern of decay in the correlation—strata to the left (small dij) tend to have higher rA values than strata to the right (big dij). The values of rA closely matched the values from other correlation estimators, such as DR (b) and the Pearson product mean correlation (r) (results not shown). We performed ordinary least squares regressions (OLS) of the values r A , D k and r Sp , D k on the mean phylogenetic distance, d i j , k ¯ , in each stratum, k = 1 , … , 10 . The slopes of both regressions were significantly negative (P<0.05) and are shown as black and magenta lines on figure 3. Similar slopes were obtained when using other stratifications of the data ( supplementary fig. S1 , Supplementary Material online).
Correlation between lg(spVL)-values in HIV phylogenetic pairs. A sample of 1917 PPs with lg(spVL)-measurements from HIV patients shows a decrease in the correlation (ICC) between pair trait values as a function of the pair phylogenetic distance dij. The point estimates and 95% CIs in ten strata of equal size (deciles) are depicted as points and error bars positioned at the mean dij for each stratum, d i j ¯ . Black and magenta points with error-bars denote the estimated rA and rSp in the real data. Dashed horizontal bars denote the 95% CI for rA evaluated on all phylogenetic pairs. A black and a magenta inclined line denote the least squares linear regression of rA and rSp on d i j ¯ . Brown and green points with error bars denote the estimated values of rA obtained after replacing the real trait values on the tree by values simulated under the maximum likelihood fit of the PMM and the POUMM methods, respectively (mean and 95% CI estimated from 100 replications). A brown and a green line show the expected correlation between pairs of tips at distance dij, as modeled under the ML-fit of the PMM and the POUMM (eqs. 2 and 3). A light-brown and a light-green region depict the 95% high posterior density (HPD) intervals inferred from Bayesian fit of the two models (Materials and Methods).
Correlation between lg(spVL)-values in HIV phylogenetic pairs. A sample of 1917 PPs with lg(spVL)-measurements from HIV patients shows a decrease in the correlation (ICC) between pair trait values as a function of the pair phylogenetic distance dij. The point estimates and 95% CIs in ten strata of equal size (deciles) are depicted as points and error bars positioned at the mean dij for each stratum, d i j ¯ . Black and magenta points with error-bars denote the estimated rA and rSp in the real data. Dashed horizontal bars denote the 95% CI for rA evaluated on all phylogenetic pairs. A black and a magenta inclined line denote the least squares linear regression of rA and rSp on d i j ¯ . Brown and green points with error bars denote the estimated values of rA obtained after replacing the real trait values on the tree by values simulated under the maximum likelihood fit of the PMM and the POUMM methods, respectively (mean and 95% CI estimated from 100 replications). A brown and a green line show the expected correlation between pairs of tips at distance dij, as modeled under the ML-fit of the PMM and the POUMM (eqs. 2 and 3). A light-brown and a light-green region depict the 95% high posterior density (HPD) intervals inferred from Bayesian fit of the two models (Materials and Methods).
The above result shows that the value of a heritability estimator based on the correlation within phylogenetic pairs (including DR couples) depends strongly on dij. Another issue of all estimators of H 2 using the correlation in phylogenetic or DR pairs is that the underlying statistical methods require independence between the pairs—the trait values in one pair should not influence or be correlated with the trait values in any other pair. This assumption is not valid in general, due to the phylogenetic relationship between all patients. One way to mitigate the effects of phylogenetic relationship between pairs is to limit the analysis to the closest pairs (i.e., pairs, for which dij does not exceed some user specified threshold). This approach has the drawback of omitting much of the data from the analysis. As an alternative taking advantage of the entire tree, it is possible to correct for the phylogenetic relationship by using a phylogenetic comparative method (PCM). PCMs attempt to solve both of the above problems, because they 1) incorporate the branch lengths in the transmission tree to model the variance–covariance structure of the data and 2) correct for the phylogenetic correlation when estimating evolutionary parameters or the phylogenetic heritability of the trait ( Felsenstein 1985 Housworth et al. 2004 Alizon et al. 2010). These advantages of the PCMs come at the price of assuming a specific stochastic process as a model of the trait evolution along the tree. In the next subsection, we show that assuming an inappropriate process for the trait evolution can cause a significant bias in the estimate of phylogenetic heritability.
A Brownian Motion Process Cannot Reproduce the Decay of Correlation in the UK Data
We implemented a maximum likelihood and a Bayesian fit of the PMM ( Lynch 1991 Housworth et al. 2004) and its extension to an Ornstein–Uhlenbeck model of evolution (POUMM) ( Hansen 1997 Mitov and Stadler 2016 Blanquart et al. 2017). The PMM and the POUMM assume an additive model of the trait values, z ( t ) = g ( t ) + e , in which z(t) represents the trait value at time t for a given lineage of the tree, g(t) represents a heritable (genotypic) value at time t for this lineage and e represents a nonheritable contribution summarizing the effects of the host and his/her environment on the trait and the measurement error. The only difference between the two models is their assumption about the evolution of g(t) along the branches of the tree—the PMM assumes a Brownian motion process the POUMM assumes an Ornstein–Uhlenbeck process ( Uhlenbeck and Ornstein 1930 Lande 1976 Hansen 1997).
Using the maximum likelihood estimates of the model parameters ( supplementary table S1 , Supplementary Material online), we simulated random trait trajectories on the UK tree, running 100 replications for each model. For each replication, we estimated the correlation, rA, in PPs using the simulated values instead of the real values. The resulting correlation estimates are shown on figure 3 as brown and green points and error bars for the PMM and POUMM simulations, respectively. We notice that there is a significant difference between the correlation estimates of the two models. In particular, in the leftmost decile the POUMM estimate is significantly higher than the PMM estimate (the POUMM 95% CI excludes the PMM estimate).
The last approximation in equation (5) follows from the fact that the term exp ( − 8.35 + 36.47 d i j ) is nearly 0 for the range of phylogenetic distances ( d i j ∈ [ 0 , 0.14 ] ) in the UK tree (see supplementary information , Supplementary Material online, for further details on the above approximations).
Equations (4) and (5) represent a linear and an exponential model of the correlation as a function of dij. The values of these equations at dij=0 are equal to the phylogenetic heritabilities estimated at the mean root-tip distance t ¯ under PMM and POUMM (details on that later). The slope of the linear model ( eq. 4) equals −0.36 (95% HPD [−0.58, −0.21]). The rate of the exponential decay ( eq. 5) equals the POUMM parameter α=28.78 (95% HPD [16.64, 46.93]) and the half-life of decay equals ln ( 2 ) / α = 0.02 substitutions per site (95% HPD [0.01, 0.04]).
Plotting the values of equations (4) and (5) and their 95% HPD intervals on figure 3 reveals visually that the POUMM fits better to the data than the PMM. Statistically, this is confirmed by a lower Akaike Information Criterion (AICc) for the POUMM fit and a strictly positive HPD interval for the OU parameter α ( supplementary table S1 and fig. S8, Supplementary Material online). The slope of the linear model derived from the PMM fit ( eq. 4, brown line on fig. 3) is nearly flat compared with the slopes of the two OLS fits (black and magenta lines on fig. 3). To explain this, we notice that in PMM, the covariance in phylogenetic pairs and the variance at the population level are modeled as linear functions of the root-mrca distance (tij) and the root-tip distance (t) (numerator and denominator in eq. 2). Importantly, both of these linear functions are bound to the same slope parameter, σ 2 . As it turns out, in the UK data, the covariance and the variance increase at different rates with respect to tij and t (see supplementary fig. S2 and supplementary information , Supplementary Material online). We conclude that the PMM is not an appropriate model for the correlation in phylogenetic pairs, being unable to model the above difference in the rates.
In the limit d i j → 0 , a phylogenetic pair should be equivalent to a DR couple at the moment of transmission, that is, before the genotypes in the two hosts have diverged due to within-host evolution. Thus, it appears reasonable to use an estimate of the correlation at dij=0 as a proxy for the broad-sense heritability, H 2 , in the entire population. This idea has been applied in previous studies of HIV ( Hecht et al. 2010 Hollingsworth et al. 2010 Bachmann et al. 2017 Blanquart et al. 2017) as well as malaria ( Anderson et al. 2010). One potential obstacle to this approach is the possibility of introducing a sampling bias by filtering of the data. For example, if the study is on a trait, which evolves toward higher values during the course of infection, patients with lower trait values would tend to be more frequent among the CPPs than in the entire population. Thus, there is no guarantee that the trait distribution and, therefore, the heritability measured in the CPPs equals the heritability in the entire population. This problem of sampling bias affects both, resemblance-based as well as the currently used phylogenetic comparative methods. This suggests that the approach of imposing a threshold on dij or estimating the correlation (rA, rSp or another correlation measure) at dij=0 needs further validation. In the next subsection, we use simulations of the toy model to show that sampling bias, although present, is comparatively small with respect to the negative bias due to measurement delay.
ANOVA-CPP and POUMM Are the Least Biased Heritability Estimators in Toy-Model Simulations
Grouping of the trait values by identical pathogen genotype. We evaluated the coefficient of determination adjusted for finite sample size, R adj 2 , and the intraclass correlation (ICC) estimated using one-way ANOVA, r A [ id ] . The main difference between these two estimators is the ANOVA assumption that the group-means (genotypic values) are sampled from a distribution of potentially many more genotypes than the ones found in the data. In contrast, R adj 2 assumes that all genotypes in the population are present in the sample. Since the latter assumption is true for the simulated epidemics, R adj 2 represents the reference (true) value of H 2 to which all other estimates are compared.
Known DR couples. We evaluated the regression slope of recipient on donor values in three ways: 1) b—based on the trait values at the moment of diagnosing the infection 2) b0—based on the trait values right after the transmission events and 3) b d i j ′ —based on the subsample of diagnosed couples having dij not exceeding a threshold d i j ′ . Based on a trade-off between precision and bias, we specified d i j ′ = D 1 , D1 denoting the first decile in the empirical distribution of dij (see supplementary information , Supplementary Material online).
Phylogenetic pairs (PPs) in T 10 k . We evaluated ICC using ANOVA in three ways: 1) rA—based on all PPs 2) r A , D 1 —based on CPPs defined as PPs in T 10 k having dij not exceeding the first decile, D1 and 3) r A , 0 , lin —the estimated intercept from a linear regression of the values r A , D k on the mean values d i j , k in each decile, k = 1 , … , 10 For the latter two estimators, which attempt to estimate rA at dij=0, we use the acronym ANOVA-CPP. As an alternative to ANOVA, which is more robust to outliers (e.g., extreme values at the tails of the trait distribution), we evaluated the Spearman correlation in the first decile, hereby denoted as r Sp , D 1 .
Transmission tree T 10 k . We evaluated the phylogenetic heritability based on the ML fit of the PMM and POUMM models. Specifically, we compared the classical formula evaluated at the mean root-tip distance t ¯ in the tree (eqs. 10 and 12) ( Housworth et al. 2004 Leventhal and Bonhoeffer 2016) and the empirical formula based on the sample trait variance, s 2 (z) (eqs. 11 and 13) (described in Materials and Methods). For the PMM, we denote these estimators by H BM 2 ( t ¯ ) and H BM e 2 for the POUMM, we use the symbols H OU 2 ( t ¯ ) and H OU e 2 :
Table 1 summarizes the mathematical definition and the assumptions of the above estimators. A more detailed description of the PMM and the POUMM methods is provided in Materials and Methods. The referenced textbooks on quantitative genetics ( Lynch and Walsh 1998) are excellent references for the other methods.
Tested Estimators of the Broad-Sense Heritability of Pathogen Traits.
Input Data . | Method (Abbreviation) . | Assumptions . | Estimator . |
---|---|---|---|
Grouping by identical infecting strain | Adjusted coefficient of determination | The sample of data contains all genotypes present in the population | R adj 2 = 1 − N − 1 N − K s 2 ( z − G ^ ) s 2 ( z ) (6) |
One-way analysis of variance (ANOVA) | Independently sampled genotypes | r A [ id ] = ( M S b − M S e ) / n ( M S b − M S e ) / n + M S e (7) | |
i.i.n.d. trait-values within each group | |||
Equal within-group variances (homoscedasticity) | |||
Known donor–recipient couples | Donor–recipient regression (DR) | Independently sampled donor–recipient couples | |
Equal residual variance across the range of donor-values (homoscedasticity) | b = s ( z don , z rcp ) s 2 ( z don ) , (8) | ||
Equal donor and population variances | variants: b , b 0 , b d i j ′ | ||
Phylogenetic pairs (PPs) | ANOVA on all/closest PPs (ANOVA-PP, ANOVA-CPP) | ANOVA assumptions (see above) | Defined as in equation (7), but calculated on PPs |
variants: r A , r A , d i j ′ | |||
Spearman correlation on all/closest PPs | PPs are independent from one another | Pearson (product mean) correlation, calculated on the ranks of the trait-values. | |
variants: r Sp , r Sp , d i j ′ | |||
Linear regression of rA on dij upon a stratification | rAdepends linearly on dij | The intercept, r A , 0 , l i n , from the OLS fit of the model | |
Equal residual variance across the range of dij | r A ( d i j ) = r A , 0 , l i n + ω 1 d i j . (9) | ||
Transmission tree | Phylogenetic mixed model (PMM) | Branching BM evolution | H BM 2 ( t ¯ ) = t ¯ σ 2 / ( t ¯ σ 2 + σ e 2 ) (10) |
i.i.n.d. distributed environmental deviation, e ∼ N ( 0 , σ e 2 ) | H BM e 2 = 1 − σ e 2 / s 2 ( z ) (11) | ||
Phylogenetic Ornstein–Uhlenbeck mixed model (POUMM) | Branching OU evolution | H OU 2 ( t ¯ ) = σ 2 ( 1 − exp ( − 2 α t ¯ ) ) σ 2 ( 1 − exp ( − 2 α t ¯ ) ) + 2 α σ e 2 (12) | |
i.i.n.d. environmental deviation, e ∼ N ( 0 , σ e 2 ) | H OU e 2 = 1 − σ e 2 / s 2 ( z ) (13) |
Input Data . | Method (Abbreviation) . | Assumptions . | Estimator . |
---|---|---|---|
Grouping by identical infecting strain | Adjusted coefficient of determination | The sample of data contains all genotypes present in the population | R adj 2 = 1 − N − 1 N − K s 2 ( z − G ^ ) s 2 ( z ) (6) |
One-way analysis of variance (ANOVA) | Independently sampled genotypes | r A [ id ] = ( M S b − M S e ) / n ( M S b − M S e ) / n + M S e (7) | |
i.i.n.d. trait-values within each group | |||
Equal within-group variances (homoscedasticity) | |||
Known donor–recipient couples | Donor–recipient regression (DR) | Independently sampled donor–recipient couples | |
Equal residual variance across the range of donor-values (homoscedasticity) | b = s ( z don , z rcp ) s 2 ( z don ) , (8) | ||
Equal donor and population variances | variants: b , b 0 , b d i j ′ | ||
Phylogenetic pairs (PPs) | ANOVA on all/closest PPs (ANOVA-PP, ANOVA-CPP) | ANOVA assumptions (see above) | Defined as in equation (7), but calculated on PPs |
variants: r A , r A , d i j ′ | |||
Spearman correlation on all/closest PPs | PPs are independent from one another | Pearson (product mean) correlation, calculated on the ranks of the trait-values. | |
variants: r Sp , r Sp , d i j ′ | |||
Linear regression of rA on dij upon a stratification | rAdepends linearly on dij | The intercept, r A , 0 , l i n , from the OLS fit of the model | |
Equal residual variance across the range of dij | r A ( d i j ) = r A , 0 , l i n + ω 1 d i j . (9) | ||
Transmission tree | Phylogenetic mixed model (PMM) | Branching BM evolution | H BM 2 ( t ¯ ) = t ¯ σ 2 / ( t ¯ σ 2 + σ e 2 ) (10) |
i.i.n.d. distributed environmental deviation, e ∼ N ( 0 , σ e 2 ) | H BM e 2 = 1 − σ e 2 / s 2 ( z ) (11) | ||
Phylogenetic Ornstein–Uhlenbeck mixed model (POUMM) | Branching OU evolution | H OU 2 ( t ¯ ) = σ 2 ( 1 − exp ( − 2 α t ¯ ) ) σ 2 ( 1 − exp ( − 2 α t ¯ ) ) + 2 α σ e 2 (12) | |
i.i.n.d. environmental deviation, e ∼ N ( 0 , σ e 2 ) | H OU e 2 = 1 − σ e 2 / s 2 ( z ) (13) |
Note .—Notation: s 2 ( · ) , sample variance s ( · , · ) , sample covariance N, number of patients K, number of distinct groups of patients, that is, genotypes or phylogenetic pairs z, measured values G ^ , estimated genotypic values: mean values from patients carrying a given genotype z don , donor values z rcp , recipient values M S e , within-group mean square: M S e = ∑ ( z i − z ¯ i ) 2 N − K , where zi is an individual’s value and z ¯ i is the mean value of the group to which the individual belongs M S b , among-group mean square: M S b = ∑ ( z ¯ i − z ¯ ) 2 K − 1 , where z ¯ i is defined as above and z ¯ is the population mean value n, weighted mean number patients in a group, that is, n=2 for phylogenetic pairs and n = ( N − ∑ n i 2 N ) / ( K − 1 ) for groups of variable size α, σ, σe: PMM/POUMM parameters (described in Materials and Methods).
i.i.n.d., independent and identically normally distributed dij, phylogenetic distance between donor–recipient pairs or phylogenetic pairs d i j ′ , threshold on dij (see text).
Tested Estimators of the Broad-Sense Heritability of Pathogen Traits.
Input Data . | Method (Abbreviation) . | Assumptions . | Estimator . |
---|---|---|---|
Grouping by identical infecting strain | Adjusted coefficient of determination | The sample of data contains all genotypes present in the population | R adj 2 = 1 − N − 1 N − K s 2 ( z − G ^ ) s 2 ( z ) (6) |
One-way analysis of variance (ANOVA) | Independently sampled genotypes | r A [ id ] = ( M S b − M S e ) / n ( M S b − M S e ) / n + M S e (7) | |
i.i.n.d. trait-values within each group | |||
Equal within-group variances (homoscedasticity) | |||
Known donor–recipient couples | Donor–recipient regression (DR) | Independently sampled donor–recipient couples | |
Equal residual variance across the range of donor-values (homoscedasticity) | b = s ( z don , z rcp ) s 2 ( z don ) , (8) | ||
Equal donor and population variances | variants: b , b 0 , b d i j ′ | ||
Phylogenetic pairs (PPs) | ANOVA on all/closest PPs (ANOVA-PP, ANOVA-CPP) | ANOVA assumptions (see above) | Defined as in equation (7), but calculated on PPs |
variants: r A , r A , d i j ′ | |||
Spearman correlation on all/closest PPs | PPs are independent from one another | Pearson (product mean) correlation, calculated on the ranks of the trait-values. | |
variants: r Sp , r Sp , d i j ′ | |||
Linear regression of rA on dij upon a stratification | rAdepends linearly on dij | The intercept, r A , 0 , l i n , from the OLS fit of the model | |
Equal residual variance across the range of dij | r A ( d i j ) = r A , 0 , l i n + ω 1 d i j . (9) | ||
Transmission tree | Phylogenetic mixed model (PMM) | Branching BM evolution | H BM 2 ( t ¯ ) = t ¯ σ 2 / ( t ¯ σ 2 + σ e 2 ) (10) |
i.i.n.d. distributed environmental deviation, e ∼ N ( 0 , σ e 2 ) | H BM e 2 = 1 − σ e 2 / s 2 ( z ) (11) | ||
Phylogenetic Ornstein–Uhlenbeck mixed model (POUMM) | Branching OU evolution | H OU 2 ( t ¯ ) = σ 2 ( 1 − exp ( − 2 α t ¯ ) ) σ 2 ( 1 − exp ( − 2 α t ¯ ) ) + 2 α σ e 2 (12) | |
i.i.n.d. environmental deviation, e ∼ N ( 0 , σ e 2 ) | H OU e 2 = 1 − σ e 2 / s 2 ( z ) (13) |
Input Data . | Method (Abbreviation) . | Assumptions . | Estimator . |
---|---|---|---|
Grouping by identical infecting strain | Adjusted coefficient of determination | The sample of data contains all genotypes present in the population | R adj 2 = 1 − N − 1 N − K s 2 ( z − G ^ ) s 2 ( z ) (6) |
One-way analysis of variance (ANOVA) | Independently sampled genotypes | r A [ id ] = ( M S b − M S e ) / n ( M S b − M S e ) / n + M S e (7) | |
i.i.n.d. trait-values within each group | |||
Equal within-group variances (homoscedasticity) | |||
Known donor–recipient couples | Donor–recipient regression (DR) | Independently sampled donor–recipient couples | |
Equal residual variance across the range of donor-values (homoscedasticity) | b = s ( z don , z rcp ) s 2 ( z don ) , (8) | ||
Equal donor and population variances | variants: b , b 0 , b d i j ′ | ||
Phylogenetic pairs (PPs) | ANOVA on all/closest PPs (ANOVA-PP, ANOVA-CPP) | ANOVA assumptions (see above) | Defined as in equation (7), but calculated on PPs |
variants: r A , r A , d i j ′ | |||
Spearman correlation on all/closest PPs | PPs are independent from one another | Pearson (product mean) correlation, calculated on the ranks of the trait-values. | |
variants: r Sp , r Sp , d i j ′ | |||
Linear regression of rA on dij upon a stratification | rAdepends linearly on dij | The intercept, r A , 0 , l i n , from the OLS fit of the model | |
Equal residual variance across the range of dij | r A ( d i j ) = r A , 0 , l i n + ω 1 d i j . (9) | ||
Transmission tree | Phylogenetic mixed model (PMM) | Branching BM evolution | H BM 2 ( t ¯ ) = t ¯ σ 2 / ( t ¯ σ 2 + σ e 2 ) (10) |
i.i.n.d. distributed environmental deviation, e ∼ N ( 0 , σ e 2 ) | H BM e 2 = 1 − σ e 2 / s 2 ( z ) (11) | ||
Phylogenetic Ornstein–Uhlenbeck mixed model (POUMM) | Branching OU evolution | H OU 2 ( t ¯ ) = σ 2 ( 1 − exp ( − 2 α t ¯ ) ) σ 2 ( 1 − exp ( − 2 α t ¯ ) ) + 2 α σ e 2 (12) | |
i.i.n.d. environmental deviation, e ∼ N ( 0 , σ e 2 ) | H OU e 2 = 1 − σ e 2 / s 2 ( z ) (13) |
Note .—Notation: s 2 ( · ) , sample variance s ( · , · ) , sample covariance N, number of patients K, number of distinct groups of patients, that is, genotypes or phylogenetic pairs z, measured values G ^ , estimated genotypic values: mean values from patients carrying a given genotype z don , donor values z rcp , recipient values M S e , within-group mean square: M S e = ∑ ( z i − z ¯ i ) 2 N − K , where zi is an individual’s value and z ¯ i is the mean value of the group to which the individual belongs M S b , among-group mean square: M S b = ∑ ( z ¯ i − z ¯ ) 2 K − 1 , where z ¯ i is defined as above and z ¯ is the population mean value n, weighted mean number patients in a group, that is, n=2 for phylogenetic pairs and n = ( N − ∑ n i 2 N ) / ( K − 1 ) for groups of variable size α, σ, σe: PMM/POUMM parameters (described in Materials and Methods).
i.i.n.d., independent and identically normally distributed dij, phylogenetic distance between donor–recipient pairs or phylogenetic pairs d i j ′ , threshold on dij (see text).
Within: neutral/Between: neutral
Within: select/Between: neutral
Within: neutral/Between: select
Within: select/Between: select
For each of these scenarios and mean contact interval 1 / κ ∈ < 2 , 4 , 6 , 8 , 10 , 12 >(arbitrary time units), we executed ten simulations resulting in a total of 4 × 6 × 10 = 240 simulations. Of the 240 simulations, 175 resulted in epidemic outbreaks of at least 10,000 diagnosed hosts. For each outbreak, we analyzed the populations of the first up to 10,000 diagnosed hosts.
Rarer transmission events (bigger 1 / κ ) result in longer transmission trees and, therefore, longer average phylogenetic distance between tips, dij ( supplementary fig. S3 , Supplementary Material online). This enabled demonstrating the effect of accumulating within-host evolution on the different heritability estimators ( fig. 4).
Heritability estimates in toy-model simulations. (A–D) H 2 -estimates in simulations of “neutral” and “select” within-/between-host dynamics. Each group of box-whiskers summarizes the simulations for a fixed scenario and contact interval, 1 / κ white boxes (background) denote true heritability, colored boxes denote estimates (foreground). Statistical significance is evaluated through t-tests summarized in table 2.
Heritability estimates in toy-model simulations. (A–D) H 2 -estimates in simulations of “neutral” and “select” within-/between-host dynamics. Each group of box-whiskers summarizes the simulations for a fixed scenario and contact interval, 1 / κ white boxes (background) denote true heritability, colored boxes denote estimates (foreground). Statistical significance is evaluated through t-tests summarized in table 2.
Figure 4 shows that the estimators b D 1 , b , r A , D 1 , and r A are negatively biased in general for all toy-model scenarios. This bias tends to increase with the mean contact interval, 1 / κ (respectively, dij), because random within-host mutation tends to decrease the genetic overlap between DRs and phylogenetic pairs ( supplementary fig. S4 , Supplementary Material online). The negative bias was far less pronounced when imposing a threshold on dij but this came at the cost of precision (less biased but longer box-whisker plots for b D 1 and r A , D 1 compared with b and rA) ( fig. 4). Several additional sources of bias were revealed when considering the practically unavailable estimators b0 and r A [ i d ] . The estimator r A [ i d ] was positively biased due to the small number of simulated genotypes (only six)—this was validated through additional simulations showing that r A [ i d ] converges to the true value for a slightly bigger number of genotypes (e.g., K≥24 genotypes, see supplementary information , Supplementary Material online). The estimator b0 was behaving accurately in the neutral/neutral scenario (excluding very short contact intervals) but tended to have a bias in both directions in all scenarios involving selection. The main reason for these biases was the phenomenon of “sampling bias” consisting in a difference between the distributions of measured values in the DR couples and the population of interest. Although its magnitude was comparatevely small in the simulations, we presume that sampling bias could play an important role in real biological applications. We already gave an example of this bias in the previous subsection. Another manifestation of sampling bias is the fact that b0 does not fully eliminate the effect of within host-evolution (and selection) in the donors. This is why, in cases of selection, the phenotypic variance in the donors tends to be smaller than the variance in the recipients as well as the variance in the population ( supplementary fig. S5 , Supplementary Material online). Additional details on these potential sources of bias are provided in supplementary information , Supplementary Material online.
Further, the simulations showed that a worsening fit of the BM model on longer transmission trees was causing an inflated estimate of the environmental deviation, σ e , in the PMM fits and, therefore, a negative bias in H BM 2 ( t ¯ ) and H BM e 2 (compare estimates for small and big values of 1 / κ on fig. 4 and supplementary fig. S6 C, Supplementary Material online). In contrast with the PMM, the POUMM estimates, H OU 2 ( t ¯ ) and H OU e 2 were far more accurate and the value of σ e in the POUMM ML fit was nearly matching the true nonheritable deviation in most simulations ( fig. 4 and supplementary fig. S6 C, Supplementary Material online). The better ML fit of the POUMM was confirmed by stronger statistical support, namely by lower AICc values in all toy-model simulations ( supplementary fig. S6 D, Supplementary Material online).
The fact that the POUMM outperformed the PMM in all scenarios contradicted with the initial belief that the PMM should be the better suited model for a neutrally evolving trait represented by the neutral/neutral scenario, whereas the POUMM should fit better to scenarios involving selection. It was also counterintuitive that the inferred parameter α from the POUMM model was significantly positive in all simulations including the neutral/neutral scenario ( supplementary fig. S6 B, Supplementary Material online). To better understand this phenomenon, we performed the PP stratification analysis on the toy-model data ( supplementary fig. S7 , Supplementary Material online). This revealed a pattern of correlation that decays exponentially with dij. The shape of exponential decay was mostly pronounced for longer contact intervals, 1 / κ , particularly in the neutral/neutral scenario (first column on supplementary fig. S7 , Supplementary Material online). In supplementary information , Supplementary Material online, we show that an exponentially decaying phenotypic correlation is consistent with a neutrally mutating genotype under a Jukes–Cantor substitution model ( Yang 2006). The decay of the correlation was still present in scenarios involving within- and/or between-host selection but the observed pattern was rather irregular and deviating from an exponential function of dij ( supplementary fig. S7 , Supplementary Material online). In most cases, the ML fit of the PMM method was a bad fit to the decay of correlation (brown dots and error-bars on supplementary fig. S7 , Supplementary Material online) for longer contact intervals, there was a tendency toward constant values of the correlation under PMM far below the true value (brown dots and error bars on supplementary fig. S7 , Supplementary Material online). This explains the overall better accuracy of the POUMM versus the PMM method.
Table 2 shows the average bias of each tested estimator for each of the four scenarios. We conclude that, apart from the practically inaccessible estimators based on grouping by identical genotype ( R adj 2 and r A [ id ] ), the most accurate estimators of H 2 in the toy-model simulations are H OU 2 ( t ¯ ) and H OU e 2 followed by estimators of the correlation in PPs minimizing the phylogenetic distance dij, that is ( r A , D 1 , r A , 0 , l i n , r Sp , D 1 ). In the next subsection, we report the results from these estimators in the UK HIV data.
Mean Difference H 2 ̂ − R adj 2 from the Toy-Model Simulations Grouped by Scenario.
Within: . | Neutral . | Neutral . | Select . | Select . |
---|---|---|---|---|
Between: . | Neutral . | Select . | Neutral . | Select . |
N | 50 | 41 | 47 | 37 |
b 0 | −0.01 * | −0.02 ** | 0.05 ** | 0.04 ** |
b D 1 | −0.07 ** | −0.04 ** | 0 | −0.01 |
b | −0.25 ** | −0.2 ** | −0.07 ** | −0.06 ** |
r A [ i d ] | 0.05 ** | 0.05 ** | 0.08 ** | 0.06 ** |
r A , 0 , l i n ^ | −0.05 ** | −0.06 ** | 0.01 | −0.04 ** |
r A , D 1 | −0.05 ** | −0.06 ** | 0 | −0.03 * |
r A | −0.18 ** | −0.15 ** | −0.06 ** | −0.08 ** |
r Sp , D 1 | −0.05 ** | −0.05 ** | −0.05 ** | −0.07 ** |
H BM 2 ( t ¯ ) | −0.17 ** | −0.17 ** | −0.01 | −0.04 * |
H BM e 2 | −0.28 ** | −0.24 ** | −0.12 ** | −0.16 ** |
H OU 2 ( t ¯ ) | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
H OU e 2 | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
Within: . | Neutral . | Neutral . | Select . | Select . |
---|---|---|---|---|
Between: . | Neutral . | Select . | Neutral . | Select . |
N | 50 | 41 | 47 | 37 |
b 0 | −0.01 * | −0.02 ** | 0.05 ** | 0.04 ** |
b D 1 | −0.07 ** | −0.04 ** | 0 | −0.01 |
b | −0.25 ** | −0.2 ** | −0.07 ** | −0.06 ** |
r A [ i d ] | 0.05 ** | 0.05 ** | 0.08 ** | 0.06 ** |
r A , 0 , l i n ^ | −0.05 ** | −0.06 ** | 0.01 | −0.04 ** |
r A , D 1 | −0.05 ** | −0.06 ** | 0 | −0.03 * |
r A | −0.18 ** | −0.15 ** | −0.06 ** | −0.08 ** |
r Sp , D 1 | −0.05 ** | −0.05 ** | −0.05 ** | −0.07 ** |
H BM 2 ( t ¯ ) | −0.17 ** | −0.17 ** | −0.01 | −0.04 * |
H BM e 2 | −0.28 ** | −0.24 ** | −0.12 ** | −0.16 ** |
H OU 2 ( t ¯ ) | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
H OU e 2 | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
Note .—Statistical significance is estimated by Student’s t-tests, P values denoted by an asterisk as follows: * P<0.01 **P<0.001. Gray background indicates estimates that are unavailable in practice.
Mean Difference H 2 ̂ − R adj 2 from the Toy-Model Simulations Grouped by Scenario.
Within: . | Neutral . | Neutral . | Select . | Select . |
---|---|---|---|---|
Between: . | Neutral . | Select . | Neutral . | Select . |
N | 50 | 41 | 47 | 37 |
b 0 | −0.01 * | −0.02 ** | 0.05 ** | 0.04 ** |
b D 1 | −0.07 ** | −0.04 ** | 0 | −0.01 |
b | −0.25 ** | −0.2 ** | −0.07 ** | −0.06 ** |
r A [ i d ] | 0.05 ** | 0.05 ** | 0.08 ** | 0.06 ** |
r A , 0 , l i n ^ | −0.05 ** | −0.06 ** | 0.01 | −0.04 ** |
r A , D 1 | −0.05 ** | −0.06 ** | 0 | −0.03 * |
r A | −0.18 ** | −0.15 ** | −0.06 ** | −0.08 ** |
r Sp , D 1 | −0.05 ** | −0.05 ** | −0.05 ** | −0.07 ** |
H BM 2 ( t ¯ ) | −0.17 ** | −0.17 ** | −0.01 | −0.04 * |
H BM e 2 | −0.28 ** | −0.24 ** | −0.12 ** | −0.16 ** |
H OU 2 ( t ¯ ) | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
H OU e 2 | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
Within: . | Neutral . | Neutral . | Select . | Select . |
---|---|---|---|---|
Between: . | Neutral . | Select . | Neutral . | Select . |
N | 50 | 41 | 47 | 37 |
b 0 | −0.01 * | −0.02 ** | 0.05 ** | 0.04 ** |
b D 1 | −0.07 ** | −0.04 ** | 0 | −0.01 |
b | −0.25 ** | −0.2 ** | −0.07 ** | −0.06 ** |
r A [ i d ] | 0.05 ** | 0.05 ** | 0.08 ** | 0.06 ** |
r A , 0 , l i n ^ | −0.05 ** | −0.06 ** | 0.01 | −0.04 ** |
r A , D 1 | −0.05 ** | −0.06 ** | 0 | −0.03 * |
r A | −0.18 ** | −0.15 ** | −0.06 ** | −0.08 ** |
r Sp , D 1 | −0.05 ** | −0.05 ** | −0.05 ** | −0.07 ** |
H BM 2 ( t ¯ ) | −0.17 ** | −0.17 ** | −0.01 | −0.04 * |
H BM e 2 | −0.28 ** | −0.24 ** | −0.12 ** | −0.16 ** |
H OU 2 ( t ¯ ) | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
H OU e 2 | −0.01 | −0.02 ** | 0.01 * | 0.03 ** |
Note .—Statistical significance is estimated by Student’s t-tests, P values denoted by an asterisk as follows: * P<0.01 **P<0.001. Gray background indicates estimates that are unavailable in practice.
Heratibality of lg(spVL) in the UK HIV Cohort
We evaluated the correlation in the CPPs (ANOVA and Spearman correlation) in data from the UK HIV cohort comprising lg(spVL) measurements and a tree of viral (pol) sequences from 8,483 patients inferred previously in ( Hodcroft et al. 2014). In addition, we performed a Bayesian fit of the POUMM and the PMM methods to the same data. The goal was to test our conclusions on a real data set and to compare the H 2 -estimates from CPPs and POUMM to previous PMM/ReML-estimates on exactly the same data ( Hodcroft et al. 2014).
In applying ANOVA-CPP, the first step has been to define the threshold phylogenetic distance for defining CPPs. To that end, we explored different stratifications of the PPs as shown on supplementary figure S1 B, Supplementary Material online, and a scatter plot of the phylogenetic distances against the absolute phenotypic differences, | Δ lg ( spVL ) | ( fig. 5A). This revealed a small set of 116 PPs having d i j ≤ 10 − 4 and narrowly coinciding with the first vigintile (also called 20-quantile or ventile) of dij. The phylogenetic distance in all remaining tip-pairs was more than an order of magnitude bigger, that is, d i j > 10 − 3 . Given that the phylogenetic distance on the transmission tree is measured in substitutions per site and the length of the pol-region is in the order of 10 3 sites, we presume that the above set of 116 PPs corresponds to a set of 116 pairs of identical pol consensus sequences (no sequence data were available to check this). Based on this observation, we defined the above pairs as CPPs and the threshold was formally set to d i j ′ = 10 − 4 . We validated that the CPPs were randomly distributed along the tree ( fig. 5B). The random distribution of the CPPs along the transmission tree suggests that these phylogenetic pairs correspond to randomly occurring early detections of infection (trait values from each pair depicted as magenta segments on fig. 5B). To check that the filtering of the data, did not introduce a considerable sampling bias due to selection (see previous subsection), we also validated that there was no substantial difference in the trait distributions of all patients, the PPs and the CPPs ( fig. 5C).
Phylogenetic pairs in lg(spVL) data from the United Kingdom. (A) A scatter plot of the phylogenetic distances between pairs of tips against their absolute phenotypic differences: gray, PPs ( d i j > 10 − 4 ) magenta, CPPs ( d i j < 10 − 4 ). A black line shows the linear regression of | Δ lg ( spVL ) | on dij (the slope of the regression was statistically positive at the 0.01 level). (B) A box-plot representing the trait-distribution along the transmission tree. Each box-whisker represents the lg(spVL)-distribution of patients grouped by their distance from the root of the tree measured in substitutions per site. Wider boxes indicate groups bigger in size. Segments in magenta denote lg(spVL)-values in CPPs. (C) A box-plot of the lg(spVL)-distribution in all patients (black), PPs (gray), and CPPs (magenta).
Phylogenetic pairs in lg(spVL) data from the United Kingdom. (A) A scatter plot of the phylogenetic distances between pairs of tips against their absolute phenotypic differences: gray, PPs ( d i j > 10 − 4 ) magenta, CPPs ( d i j < 10 − 4 ). A black line shows the linear regression of | Δ lg ( spVL ) | on dij (the slope of the regression was statistically positive at the 0.01 level). (B) A box-plot representing the trait-distribution along the transmission tree. Each box-whisker represents the lg(spVL)-distribution of patients grouped by their distance from the root of the tree measured in substitutions per site. Wider boxes indicate groups bigger in size. Segments in magenta denote lg(spVL)-values in CPPs. (C) A box-plot of the lg(spVL)-distribution in all patients (black), PPs (gray), and CPPs (magenta).
ANOVA-CPPs ( r A , D 1 , r A , 10 − 4 , r A , V 1 ) and the original PP-method rA
The intercept from the linear regression of rA on dij upon a stratification of the PPs into deciles ( r A , 0 , l i n , eq. 9)
Spearman correlatoin in CPPs ( r Sp , D 1 , r Sp , 10 − 4 , r Sp , V 1 ) and in all PPs ( r Sp ).
The intercept from the linear regression of r Sp on dij upon a stratification of the PPs into deciles ( r Sp , 0 , l i n )
POUMM ( H OU 2 ( t ¯ ) , H OU e 2 ), versus PMM ( H BM 2 ( t ¯ ) , H BM e 2 ) on the entire tree
The results from these analyses are reported in table 3. ANOVA- and Spearman-correlation estimates, which minimized the phylogenetic distance by means of regression or filtering of the phylogenetic pairs had point-estimates of r A , 10 − 4 = 0.17 and r Sp , 10 − 4 = 0.22 . The slightly higher estimate for the Spearman correlation could be explained by the presence of outliers in the data. Applying the POUMM to the entire tree reported a point estimate H OU 2 ( t ¯ ) = 0.21 (8,483 patients, 95% CI [0.14, 0.29]).
Estimates of lg(spVL)-Heritability in HIV Data from the United Kingdom.
Method . | N . | H ^ 2 . | 95% CI . | 95% HPD . |
---|---|---|---|---|
Linear regression of rA on d i j ¯ in deciles (eq. 9) ( r A , 0 , l i n ) | 10 points | 0.17 | [0.09, 0.24] | – |
Linear regression of rSp on d i j ¯ in deciles ( r Sp , 0 , l i n ) | 10 points | 0.18 | [0.11, 0.25] | – |
ANOVA-CPP ( r A , V 1 ) | 224 | 0.17 | [−0.02, 0.31] | – |
ANOVA-CPP ( r A , 10 − 4 ) | 232 | 0.16 | [0.01, 0.30] | – |
ANOVA-CPP ( r A , D 1 ) | 384 | 0.16 | [0.06, 0.25] | – |
ANOVA-PP (rA) a | 3,834 | 0.11 | [0.08, 0.14] | – |
Spearman-CPP ( r Sp , V 1 ) | 224 | 0.23 | [0.05, 0.42] | – |
Spearman-CPP ( r Sp , 10 − 4 ) | 232 | 0.22 | [0.03, 0.4] | – |
Spearman-CPP ( r Sp , D 1 ) | 384 | 0.2 | [0.06, 0.34] | – |
Spearman-PP (rSp) a | 3,834 | 0.11 | [0.06, 0.15] | – |
POUMM ( H OU 2 ( t ¯ ) ) | 8,483 | 0.21 | – | [0.14, 0.29] |
POUMM ( H OU e 2 ) | 8,483 | 0.2 | – | [0.13, 0.29] |
PMM ( H BM 2 ( t ¯ ) ) b | 8,483 | 0.08 | – | [0.05, 0.12] |
PMM ( H BM e 2 ) b | 8,483 | 0.06 | – | [0.02, 0.1] |
PMM, ReML ( Hodcroft et al. 2014) b | 8,483 | 0.06 | [0.03, 0.09] | – |
Method . | N . | H ^ 2 . | 95% CI . | 95% HPD . |
---|---|---|---|---|
Linear regression of rA on d i j ¯ in deciles (eq. 9) ( r A , 0 , l i n ) | 10 points | 0.17 | [0.09, 0.24] | – |
Linear regression of rSp on d i j ¯ in deciles ( r Sp , 0 , l i n ) | 10 points | 0.18 | [0.11, 0.25] | – |
ANOVA-CPP ( r A , V 1 ) | 224 | 0.17 | [−0.02, 0.31] | – |
ANOVA-CPP ( r A , 10 − 4 ) | 232 | 0.16 | [0.01, 0.30] | – |
ANOVA-CPP ( r A , D 1 ) | 384 | 0.16 | [0.06, 0.25] | – |
ANOVA-PP (rA) a | 3,834 | 0.11 | [0.08, 0.14] | – |
Spearman-CPP ( r Sp , V 1 ) | 224 | 0.23 | [0.05, 0.42] | – |
Spearman-CPP ( r Sp , 10 − 4 ) | 232 | 0.22 | [0.03, 0.4] | – |
Spearman-CPP ( r Sp , D 1 ) | 384 | 0.2 | [0.06, 0.34] | – |
Spearman-PP (rSp) a | 3,834 | 0.11 | [0.06, 0.15] | – |
POUMM ( H OU 2 ( t ¯ ) ) | 8,483 | 0.21 | – | [0.14, 0.29] |
POUMM ( H OU e 2 ) | 8,483 | 0.2 | – | [0.13, 0.29] |
PMM ( H BM 2 ( t ¯ ) ) b | 8,483 | 0.08 | – | [0.05, 0.12] |
PMM ( H BM e 2 ) b | 8,483 | 0.06 | – | [0.02, 0.1] |
PMM, ReML ( Hodcroft et al. 2014) b | 8,483 | 0.06 | [0.03, 0.09] | – |
Note .—Also written are the results from a previous analysis on the same data set ( Hodcroft et al. 2014). “–”: the analysis was not done in the mentioned study. Gray background: estimates considered unreliable due to: a negative bias caused by measurement delays and b negative bias caused by BM violation. Uncertainty in the estimates is expressed in terms of 95% confidence intervals (CI), or, in the case of Bayesian inference, by 95% high posterior density intervals (HPDs).
Estimates of lg(spVL)-Heritability in HIV Data from the United Kingdom.
Method . | N . | H ^ 2 . | 95% CI . | 95% HPD . |
---|---|---|---|---|
Linear regression of rA on d i j ¯ in deciles (eq. 9) ( r A , 0 , l i n ) | 10 points | 0.17 | [0.09, 0.24] | – |
Linear regression of rSp on d i j ¯ in deciles ( r Sp , 0 , l i n ) | 10 points | 0.18 | [0.11, 0.25] | – |
ANOVA-CPP ( r A , V 1 ) | 224 | 0.17 | [−0.02, 0.31] | – |
ANOVA-CPP ( r A , 10 − 4 ) | 232 | 0.16 | [0.01, 0.30] | – |
ANOVA-CPP ( r A , D 1 ) | 384 | 0.16 | [0.06, 0.25] | – |
ANOVA-PP (rA) a | 3,834 | 0.11 | [0.08, 0.14] | – |
Spearman-CPP ( r Sp , V 1 ) | 224 | 0.23 | [0.05, 0.42] | – |
Spearman-CPP ( r Sp , 10 − 4 ) | 232 | 0.22 | [0.03, 0.4] | – |
Spearman-CPP ( r Sp , D 1 ) | 384 | 0.2 | [0.06, 0.34] | – |
Spearman-PP (rSp) a | 3,834 | 0.11 | [0.06, 0.15] | – |
POUMM ( H OU 2 ( t ¯ ) ) | 8,483 | 0.21 | – | [0.14, 0.29] |
POUMM ( H OU e 2 ) | 8,483 | 0.2 | – | [0.13, 0.29] |
PMM ( H BM 2 ( t ¯ ) ) b | 8,483 | 0.08 | – | [0.05, 0.12] |
PMM ( H BM e 2 ) b | 8,483 | 0.06 | – | [0.02, 0.1] |
PMM, ReML ( Hodcroft et al. 2014) b | 8,483 | 0.06 | [0.03, 0.09] | – |
Method . | N . | H ^ 2 . | 95% CI . | 95% HPD . |
---|---|---|---|---|
Linear regression of rA on d i j ¯ in deciles (eq. 9) ( r A , 0 , l i n ) | 10 points | 0.17 | [0.09, 0.24] | – |
Linear regression of rSp on d i j ¯ in deciles ( r Sp , 0 , l i n ) | 10 points | 0.18 | [0.11, 0.25] | – |
ANOVA-CPP ( r A , V 1 ) | 224 | 0.17 | [−0.02, 0.31] | – |
ANOVA-CPP ( r A , 10 − 4 ) | 232 | 0.16 | [0.01, 0.30] | – |
ANOVA-CPP ( r A , D 1 ) | 384 | 0.16 | [0.06, 0.25] | – |
ANOVA-PP (rA) a | 3,834 | 0.11 | [0.08, 0.14] | – |
Spearman-CPP ( r Sp , V 1 ) | 224 | 0.23 | [0.05, 0.42] | – |
Spearman-CPP ( r Sp , 10 − 4 ) | 232 | 0.22 | [0.03, 0.4] | – |
Spearman-CPP ( r Sp , D 1 ) | 384 | 0.2 | [0.06, 0.34] | – |
Spearman-PP (rSp) a | 3,834 | 0.11 | [0.06, 0.15] | – |
POUMM ( H OU 2 ( t ¯ ) ) | 8,483 | 0.21 | – | [0.14, 0.29] |
POUMM ( H OU e 2 ) | 8,483 | 0.2 | – | [0.13, 0.29] |
PMM ( H BM 2 ( t ¯ ) ) b | 8,483 | 0.08 | – | [0.05, 0.12] |
PMM ( H BM e 2 ) b | 8,483 | 0.06 | – | [0.02, 0.1] |
PMM, ReML ( Hodcroft et al. 2014) b | 8,483 | 0.06 | [0.03, 0.09] | – |
Note .—Also written are the results from a previous analysis on the same data set ( Hodcroft et al. 2014). “–”: the analysis was not done in the mentioned study. Gray background: estimates considered unreliable due to: a negative bias caused by measurement delays and b negative bias caused by BM violation. Uncertainty in the estimates is expressed in terms of 95% confidence intervals (CI), or, in the case of Bayesian inference, by 95% high posterior density intervals (HPDs).
Conversely, the heritability estimates from the original PP method (ANOVA or Spearman correlatoin on all PPs) and the PMM were significantly lower and falling below the 95% CIs from the POUMM ( table 3). This confirms the observation from the toy-model simulations that these estimators are negatively biased, since they ignore or inaccurately model the changing correlation within pairs of tips. We validated the stronger statistical support for the POUMM with respect to the PMM, by its lower AICc value ( supplementary table S1 , Supplementary Material online) and by the posterior density for the POUMM parameter α ( supplementary fig. S8 , Supplementary Material online).
Finally, we compared our estimates of lg(spVL)-heritability to previous applications of the same methods on different data sets ( fig. 6). In agreement with the toy-model simulations, estimates of H 2 using PMM or other BM-based phylogenetic methods (i.e., Blomberg’s K and Pagel’s λ) are notably lower than all other estimates, suggesting that these phylogenetic comparative methods underestimate H 2 resemblance-based estimates are down-biased by measurement delays (e.g., compare early vs. late in the Netherlands on fig. 6).
A comparison between H 2 H 2 -estimates from the UK HIV-cohort and previous estimates on African, Swiss, and Dutch data. (A) Estimates with minimized measurement delay (dark cadet-blue) and POUMM estimates (green) (B) Down-biased estimates due to higher measurement delays (light-blue) or violated BM-assumption (brown). Confidence is depicted either as segments indicating estimated 95% CI or P values in cases of missing 95% CIs. References to the corresponding publications are written as numbers in superscript as follows: 1: Tang et al. (2004) 2: Hecht et al. (2010) 3: Hollingsworth et al. (2010) 4: van der Kuyl et al. (2010) 5: Lingappa et al. (2013) 6: Yue et al. (2013) 7: Alizon et al. (2010) 8: Shirreff et al. (2013) 9: Hodcroft et al. (2014) 10: Blanquart et al. (2017) 11: Bertels et al. (2018) 12: this work. For clarity, estimates from previous studies, which are not directly comparable (e.g., previous results from Swiss MSM/strict data sets Alizon et al. 2010).
A comparison between H 2 H 2 -estimates from the UK HIV-cohort and previous estimates on African, Swiss, and Dutch data. (A) Estimates with minimized measurement delay (dark cadet-blue) and POUMM estimates (green) (B) Down-biased estimates due to higher measurement delays (light-blue) or violated BM-assumption (brown). Confidence is depicted either as segments indicating estimated 95% CI or P values in cases of missing 95% CIs. References to the corresponding publications are written as numbers in superscript as follows: 1: Tang et al. (2004) 2: Hecht et al. (2010) 3: Hollingsworth et al. (2010) 4: van der Kuyl et al. (2010) 5: Lingappa et al. (2013) 6: Yue et al. (2013) 7: Alizon et al. (2010) 8: Shirreff et al. (2013) 9: Hodcroft et al. (2014) 10: Blanquart et al. (2017) 11: Bertels et al. (2018) 12: this work. For clarity, estimates from previous studies, which are not directly comparable (e.g., previous results from Swiss MSM/strict data sets Alizon et al. 2010).
In summary, POUMM and ANOVA-CPP yield agreeing estimates for H 2 in the UK data and these estimates agree with resemblance-based estimates in data sets with short measurement delay (different African countries and the Netherlands). Similar to the toy-model simulations, we notice a well-pronounced pattern of negative bias for the other estimators, PMM and ANOVA-PP, as well as for the previous resemblance-based studies on data with long measurement delay.
Inbreeding and Imprinting
An interesting aspect of the above variance decompositions is the similarity between inbreeding and imprinting, as inbreeding also introduces a covariance between additive and dominance effects (Harris 1964) that may not be partitioned if an incorrect method is used. To investigate this similarity, we incorporate inbreeding into Approach 2b. We represent an inbred population by dividing the population into two groups: a group that represents the expected Hardy-Weinberg proportions, comprising an overall proportion of (1 − f), and a completely homozygous group with no heterozygotes, comprising a proportion f of the population. Thus genotypic frequencies for A1A1, A2A1 and A1A2, and A2A2 genotypes are each and , respectively. Now the overall population mean incorporating both inbreeding (I) and imprinting is
When there is no inbreeding, f = 0, the population is in Hardy-Weinberg proportions, and the mean reduces to as expected. With no imprinting, the mean reduces to . We assume that the inbreeding coefficient f is stable across generations, so the proportion of heterozygotes does not change.
As in Approach 2b, male and female additive and dominance deviations may be calculated separately (Santure and Spencer 2006). For example, the additive effect of inheriting an A1 allele maternally is The remaining additive effects are
Breeding values and dominance deviations may be calculated as in Approach 2b.
Genetic variance components
The total variance for an inbred population with imprinting is where is the total genetic variance for the case of imprinting only (2). When there is complete inbreeding (f = 1), the total variance is (6) and for no inbreeding (f = 0), we recover . (7) The total variance may also be rewritten as (8) and for the case of no imprinting (k1 = k2 = k), the total genetic variance becomes (9) (Harris, 1964), where , (10) and
Comparing equations (6) and (7), we can see that in the absence of imprinting and dominance (k1 = k2 = f = 0), the total variance for an inbred population is twice that of an outbred population (k1 = k2 = 0, f = 1), and indeed the effect of inbreeding is linear an inbreeding coefficient of yields a total variance times the variance with no inbreeding. However, for dominance but no imprinting (9), the effect of increasing inbreeding is nonlinear the population variance may increase or decrease relative to an outbred population depending on the allele frequencies and the value of the dominance coefficient. A similar effect is evident with imprinting if there is no dominance (i.e., k1 = −k2), the population variance linearly increases with increasing inbreeding, while with both dominance and imprinting the population variance is a quadratic function of f. Thus, it is dominance but not imprinting which determines the relationship of the population variance with increasing inbreeding.
For a highly selfing species, the degree of imprinting may have a large effect on the total population variance. For example, consider a population with and . Setting k1 = −k2 (imprinting, no dominance), the total variance is 0.20 for and 0.25 for . Interestingly, the effect of imprinting becomes less pronounced as inbreeding levels increase for , the total variance increases from 0.18 for to 0.25 for , while for the variance increases from 0.23 for to 0.25 for .
The female and male additive variances are (11) and (12) where and are the female and male additive variances calculated from Approaches 1 and 2b (Table 3). The female and male dominance variances are (13) and (14) where is the dominance variance (Approaches 1 and 2b, Table 3). Interestingly, and unlike the pure imprinting case, the dominance variance is different for males and females. The female and male covariances between additive and dominance terms are (15) and (16) where and are the female and male covariances between additive and dominance effects (Approaches 1 and 2b, Table 3). When there is no imprinting, the female and male covariances reduce to the same value:
Considering that we can see that and hence the covariance is strictly negative under inbreeding alone. Recall that under imprinting alone and (Table 3). Now we may rearrange to give , so that and hence , so the average of male and female covariances under inbreeding is also strictly negative. However, if k1 and k2 are of opposite sign, then one of or may be positive. Thus, although both imprinting and inbreeding introduce a covariance between additive and dominance effects, it is only the presence of imprinting that allows the covariance in one sex to be positive. Imprinting can therefore have a significant effect on the total genetic variance and on the sex-specific components of variance of an inbred population.
05tFmQ2,tf%fP&X1`[email protected]*q765G F1DDKuVVqH>8RohB%5t3V8,d:Lt*5Ut_q/FY [email protected]#,=.SgJBM^7nSqa7dOgFZj^0ed^Lf apS)Eoeq?5IC2U(9_5m'c3mDHafeLaPDZ[Q+Dh+ N,t0B/(+,W)5N5'm0pmOls_*U0H(rE4eW!S+bjEp,5K,2F7U6,JOF,[email protected]@2 3tEYeR.MDsBc(Fr%dZfbm7GPVCl =][0W #f`X" @ZK``'g 4,fpdBkUL)/ [Fe_&&Oh#TsSUoGo3>>EZ3j=Ki9ZD_S#>%=+Y!2-[YAf3f6U.[6m [email protected]_:$X+23Ep[oT86`2*QrNa(j[KR=]SqZB.6s]V.pGt[/Dr 1OTKF MQ1q8EmMZ4fp=tGQ"se>i!rtl3?nh%2j&!iK(&[gS9iZ02*48 @e(JqKEQVDiE.Z&kk0S=:0cu1"#[email protected]%_5M-CGeB8"Q-["[email protected]!M8 'nN$T$tGYk>@E30?`Xgkg0gR%%o4pr#57VQ VFpN6/)1T6c'UY+4T`eX(SoTklQX)iKo,BJcVR0 L5?DdPOnf*E^[8MWC"iMA$.G)S[palnB"mtjHJPBIg70W-fdmXOck.O8YUa:_` 4%L,ai(>@[email protected]*(`(>` QE]8Yq'L/LBa6Smob=>[email protected][1k%cGde>F!=*TinrO)"NBU/#,LY% ),^ZM9k5d"0Wn`F.b.j&+(ZR'oRJ_nomjT** 1 [email protected]@Q#-gR'SL#FYVQLEG-RGRb)C+)=bYop[A%3)AE1k_87ZJGOG#f%6[E*$VO3LM [J=41ahY-9&Ki'O$NH60$:OsVKFo]XFmL"%,):R_YKJ,AWhN96#[email protected]"*PUEfX nCL[/ph^X!5W)rDe6E-&6YbUH[[email protected][G6*63SpAE !R,mkIN!BXBo!hgSD:K"IP_3d:) #c^Pi!E"a`IA mbjPlXI?pqr,u6%`D -0CP`60&^ 2W'm&5A?^shU.C55Z[3UGV4kq3^3l64sL/SXG9n.rsh'F_X[4W?(c#3P"[email protected][o: f(eY[.8ZWNchmIIU:#[email protected]@%iWO%Ht1T3,eUUW6)-_QN4KKY+ ,IY4bBZUH"O/S).-"`(40*]H*E/J6:bo?q qK%sG#DGBImOiYWN-RREA-k&q0Vi.5VfJSPY`Zk#YH))K%C=U d`ISef1a#_b TK: oW`d2N%l8:CE%[email protected])oMIk&TfpGXVet&mVuY*A)Q5EXs+Dd>`DZhVH,Qf>?0&l?j%4'83,D.3eCiLW(MD^QW 4)PVs6^0ophVk2U(iLK/Z:_WbdcGJ#hubM7cZ2KJOd97cV `**l8R) _snHC7U\%Ljt]Em&uR as([email protected]>)&!]J!d_`s 5c1k/mu+.ZEr.1r_E"Bf1CS:.'/[email protected][3Fo-Udml"YEkX'[email protected])p)IY ,q/UM F#,]ecF=NN3b+JfOJalJFSd 5`0nTE-:QHX*`D2[aO_*r 01e"20C_dfXf=/!b8!qb6$19f])i?e3gY9 6>DD&%km5* 4Rgm%qA4PIl)uf.b+_imk[f#B^D!%05lh4P:[email protected]:Ol [email protected] $U4/Z>p>>b"19(IoLR!nX"L/.?uJ%Z_LGVALHR,8C[,Q1tP':6oh$6`Y66bOaAF3 M'M[A9(cI'k8_VIa /iY#rZARLa09U4d3 [email protected]`5P]9c! 5GN>qQ7$7N QF8 [email protected]`40Rs1rOeF`,%gFVsdroC[NCO&&05tgnf 71",o+E_=2? B3#p9e1!j%&38#9YVi[=`&P1O4_WB+ju#SD0Eb8V/K,[email protected]/DBbjK6 0k12n9gmRV_=dUdFt+ug#"?Bg2k4`p1Gb1siM130j__2>:tYAg$W2:R5QX [email protected],e(6F:0$4F4./0s)9,)pjA hVr!C4g,[email protected]^IAAk0nrC1,:-oQEiMNm3kNa*G`P?5Z&X KAHoV+gP&U2f+Ird)T 1r"^=RQpRDs-R7".'"']&BD21s$"[email protected]'Rj8j2]!C=4>6['s6mBa.[XMfs/lCd2O8 ArS&4.(#Hj?=o(MKRO6:$4kU#MFm4DH)?4LSFdm2j./p^irLZ$uKJ,U.B+SU9Hs %G(p8g/eepe&J+D:X>m,[email protected]_`)"UPpC9rLO0ooIgnD)]m-r'3ALs'I, "tu>*[email protected]>p>gGR=t=OH'de1
/YF)6oQu+CPG^kR2[[email protected]/] +AZXUlT>r4TsH%Be peLpOioWn'Z,?c?Jg.o!LdqqaNOGcg8Da?a8 6(G>Fd*[email protected]@te)m+hBBjlZo7X39FC-Vg:]L,"+d:9YpR(ND#s?%`rJ=(7F: pnPJ,QFRB=/QdqQ8Us$dV=9K9 2nuU^_k'LgY&M>8CZ5n8imM^DN,^Mt!1Y n4U6$-X/:UaND"K#E5cDTjslW^ks7D!W
5cXY(>_=U#-l%_l9n3/[email protected]'H`3F9?)2X'@5t(OLg$]baN K0_0V&ufFeoughh*fPGD33jF[&%!W'5j'uD:K X`[email protected],d,WnDl!jB6/6.Sdea##ZUEhb<>0):tP9-/U` DKI=fgi] @M,O+jqUF[RP[ [email protected]