Published Date
Broad-sense heritability
Genetic correlation
Sampling variance
Modified augmented design
http://www.sciencedirect.com/science/article/pii/S2214514116000179
April 2016, Vol.4(2):107–118, doi:10.1016/j.cj.2016.01.003
Title
Estimation of genetic parameters and their sampling variances for quantitative traits in the type 2 modified augmented design
Received 11 November 2015. Revised 16 December 2015. Accepted 2 February 2016. Available online 11 February 2016.
Abstract
The type 2 modified augmented design (MAD2) is an efficient unreplicated experimental design used for evaluating large numbers of lines in plant breeding and for assessing genetic variation in a population. Statistical methods and data adjustment for soil heterogeneity have been previously described for this design. In the absence of replicated test genotypes in MAD2, their total variance cannot be partitioned into genetic and error components as required to estimate heritability and genetic correlation of quantitative traits, the two conventional genetic parameters used for breeding selection. We propose a method of estimating the error variance of unreplicated genotypes that uses replicated controls, and then of estimating the genetic parameters. Using the Delta method, we also derived formulas for estimating the sampling variances of the genetic parameters. Computer simulations indicated that the proposed method for estimating genetic parameters and their sampling variances was feasible and the reliability of the estimates was positively associated with the level of heritability of the trait. A case study of estimating the genetic parameters of three quantitative traits, iodine value, oil content, and linolenic acid content, in a biparental recombinant inbred line population of flax with 243 individuals, was conducted using our statistical models. A joint analysis of data over multiple years and sites was suggested for genetic parameter estimation. A pipeline module using SAS and Perl was developed to facilitate data analysis and appended to the previously developed MAD data analysis pipeline (http://probes.pw.usda.gov/bioinformatics_tools/MADPipeline/index.html).
Keywords
1 Introduction
In the early stages of breeding programs, a considerable number of test lines and a limited seed supply constrain the use of complete experimental designs with replications. Augmented designs, a class of unreplicated experimental designs, are a potential solution to this problem [1], [2] and [3]. The augmented design usually has control lines arranged in a standard design such as a Latin square with several replications in soil-homogeneous blocks. Then the blocks are augmented to accommodate unreplicated test lines. Since control lines are in a standard design, the block effects can be estimated to adjust the observations of the test lines, and the error effects within control lines can be used to test the significance of performance differences among lines. Lin and Poushinsky [4] and [5] proposed a modified augmented design (MAD) with two subtypes. The type 1 MAD is used for square plots [4] and the type 2 MAD (MAD2) for rectangular plots [5]. This modified design is superior to the general augmented design in systematic placement of control and test genotypes within a block to enhance adjustment for soil heterogeneity [4].
MAD2 is used largely for early evaluation of breeding lines in crops such as wheat [6] and [7], potato [8], soybean [9], barley [10] and [11], sugarcane [12] and [13], and maize [14]. It is also used in flax breeding programs in Canada for field evaluation of flax yield, seed oil component, disease resistance, and other traits of agronomic and economic importance and for purposes of QTL identification, association mapping, and genomic selection [15], [16], [17] and [18]. In genetic experiments, individuals may have adequate amounts of seed for replicated trials, but it may be impractical to accommodate hundreds of genotypes in one homogeneous block of a field, owing to soil heterogeneity. Our earlier study [19] indicated that soil heterogeneity can be sufficiently adjusted for traits in MAD2 trials, suggesting that genetic variance of traits can be determined using a MAD2 approach.
Heritability and genetic correlation are crucial genetic parameters for quantitative traits because they can be used to predict the response to selection in plant breeding. Because the theoretical statistical distributions of these genetic parameter estimators are unknown, approximate tests of significance can be performed only on the basis of sampling errors. Methods for estimating sampling variances of the genetic correlation coefficient and heritability in some replicated experimental designs have been reported [20], [21], [22], [23] and [24].
We have improved upon previous methods of MAD2 statistical analysis in adjusting for soil heterogeneity [19]. Owing to the lack of replication of test genotypes in the design, however, the total variance for test genotypes cannot be partitioned into its genetic and error components, and for this reason the method is unable to estimate genetic parameters. Here we present a method for estimating broad-sense heritability (H2) and genetic correlation coefficients (rg) of quantitative traits in the MAD2. We also derive the statistical formulas for estimating their sampling variances. We used computer simulations to evaluate the reliability of the proposed methods. As a case study using flax, we estimated the genetic parameters of three quantitative traits in a biparental recombinant inbred line (RIL) population of 243 lines.
2 Methods
2.1 Experimental design and statistical analysis
A typical MAD2 has r * c whole plots structured as a grid of r rows and c columns. Each whole plot is split into k (an odd number, usually five or seven) parallel rectangular subplots. The whole experiment has a total of r * c * k subplots. A control genotype is assigned to the central subplot of each whole plot (plot control). Two additional control genotypes serve as subplot controls randomly assigned to subplots in randomly selected whole plots with n replicates. Thus, the entire trial accommodates rck − rc − 2n test genotypes that are randomly allocated to the remaining subplots (see Fig. 1 in [19] for the field layout).
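The plot bookkeeping described above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `mad2_capacity` and its argument names are hypothetical and are not part of the MAD pipeline.

```python
def mad2_capacity(rows, cols, subplots, subplot_control_reps):
    """Return (total subplots, control subplots, test-genotype subplots).

    rows, cols: dimensions r and c of the whole-plot grid.
    subplots: k, the (odd) number of subplots per whole plot.
    subplot_control_reps: n, replicates of each of the two subplot controls.
    """
    if subplots % 2 == 0:
        raise ValueError("k must be odd so that a central subplot exists")
    total = rows * cols * subplots
    plot_controls = rows * cols                  # one central subplot per whole plot
    subplot_controls = 2 * subplot_control_reps  # two subplot controls, n replicates each
    controls = plot_controls + subplot_controls
    return total, controls, total - controls


print(mad2_capacity(7, 7, 7, 7))    # 7 x 7 grid with seven subplots (case-study layout)
print(mad2_capacity(10, 10, 5, 5))  # 10 x 10 grid with five subplots (simulated layout)
```

For the simulated layout, the last value is 390, the number of test genotypes used in the simulations.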
Control plots are used to estimate row (R), column (C) and R × C interaction effects and to test for additive soil variation in the row and/or column directions. The two subplot controls plus one plot control are used to estimate the subplot error and test for non-additive soil variation in multiple directions across the field [9] and [19]. The test results are used to determine whether data adjustment is needed and which method of adjustment should be used. Three methods have been proposed to adjust test genotypes to reduce or remove effects due to soil heterogeneity [4], [5] and [9]. For MAD2, method 1 is used if the row or column effects or both are significant, method 3 is used if the R × C interaction is significant [5], [9] and [25] and a combined methods 1 and 3 approach is suggested in most cases [19]. A detailed statistical analysis for MAD2 trials has been described [19].
2.2 Case study
An RIL population with 243 lines derived from a cross between “CDC Bethune” and “Macbeth” (BM) was used to evaluate genetic variation. The single MAD2 trial consisted of 49 whole plots (a 7 × 7 grid), each split into seven parallel subplots (1.5 m × 2.0 m with a 20-cm row spacing). CDC Bethune with 49 replicates was used as the plot control, and 7 replicates of both Hanley and Macbeth served as subplot controls. Field trials with the same design were conducted at two locations in Canada (Morden, Manitoba and Kernen Farm near Saskatoon, Saskatchewan) from 2009 to 2012 [18]. Genetic parameters and their sampling variances were estimated for three traits: oil content (OIL), iodine value (IOD), and linolenic acid content (LIN). The raw phenotypic data are presented in Table S1.
2.3 Estimation of genetic parameters
Observations of test genotypes and control genotypes after statistical adjustment [19] are expected to exclude the effect of soil heterogeneity; thus, the variation among replications of each control genotype should be caused only by random errors. The adjusted dataset in the trials corresponds to that obtained from a completely random design. Because each test genotype has a single adjusted observation, the total variance among test genotypes cannot be partitioned into genetic and error variances. However, the total variance within each control genotype, which is caused by random error, can be treated as the error variance of the test genotypes because it is reasonable to assume that the error effects of test genotypes and control genotypes follow the same normal distribution N(0, σe2), where σe2 is the error variance. Accordingly, the genetic variance can be estimated by subtracting the error variance from the total variance of the test genotypes.
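Thus, the genetic correlation coefficient ($\hat{r}_g$), error correlation coefficient ($\hat{r}_e$), and phenotypic correlation coefficient ($\hat{r}_p$) between two traits i and j (i, j = 1, 2), and the broad-sense heritability ($\hat{H}^2$) of any single trait i, can be defined as

$$\hat{r}_g = \frac{\hat{\sigma}_{G_{ij}}}{\sqrt{\hat{\sigma}_{G_{ii}}\,\hat{\sigma}_{G_{jj}}}} \tag{1}$$

$$\hat{r}_e = \frac{\hat{\sigma}_{e_{ij}}}{\sqrt{\hat{\sigma}_{e_{ii}}\,\hat{\sigma}_{e_{jj}}}} \tag{2}$$

$$\hat{r}_p = \frac{\hat{\sigma}_{P_{ij}}}{\sqrt{\hat{\sigma}_{P_{ii}}\,\hat{\sigma}_{P_{jj}}}} \tag{3}$$

$$\hat{H}^2 = \frac{\hat{\sigma}_{G_{ii}}}{\hat{\sigma}_{P_{ii}}} \tag{4}$$

where $\hat{\sigma}_{P_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{e_{ij}}$ represent the phenotypic, genetic and error variances of single traits (i = j) or covariances of two traits (i ≠ j), respectively. Estimation of these variances and covariances is dependent on statistical models.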
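As a minimal sketch of how these definitions are applied, the following Python functions compute a correlation coefficient and a broad-sense heritability from variance and covariance components. The function names and the numerical values are illustrative assumptions, not estimates from the study.

```python
import math


def correlation(cov_ij, var_ii, var_jj):
    """Correlation coefficient from one covariance and two variances.

    Used identically for the genetic, error and phenotypic components.
    """
    if var_ii <= 0 or var_jj <= 0:
        raise ValueError("variance components must be positive")
    return cov_ij / math.sqrt(var_ii * var_jj)


def heritability(var_g, var_p):
    """Broad-sense heritability: genetic variance over phenotypic variance."""
    return var_g / var_p


# Hypothetical components for two traits (plot basis: P = G + e).
genetic = {"ii": 4.0, "jj": 9.0, "ij": 3.0}
error = {"ii": 1.0, "jj": 1.0, "ij": 0.0}
phenotypic = {key: genetic[key] + error[key] for key in genetic}

r_g = correlation(genetic["ij"], genetic["ii"], genetic["jj"])
r_e = correlation(error["ij"], error["ii"], error["jj"])
r_p = correlation(phenotypic["ij"], phenotypic["ii"], phenotypic["jj"])
h2_i = heritability(genetic["ii"], phenotypic["ii"])
print(r_g, r_e, r_p, h2_i)
```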
2.3.1 Model 1: Single trial
For a single trial with g test genotypes and t control genotypes (including main plot controls and subplot controls), the adjusted observation of any test genotype with no replication can be expressed as

$$y_i = \mu + G_i + \varepsilon_i \tag{5}$$
where yi ~ N(μ, σP2), Gi ~ N(0, σG2) and εi ~ N(0, σe2). σP2, σG2, and σe2 are phenotypic, genetic and error variances, respectively. The error variance σe2 is estimated based on t replicated control genotypes. For a given trait i (i = 1, 2), the analyses of variance and covariance are shown in Table 1.
Table 1. Analyses of variance and covariance for model 1.
Source | DF | MS | EMS | COV | ECOV |
---|---|---|---|---|---|
Genotype variance and covariance analyses | |||||
Genotype (G) | g − 1 | Aii | σe2 + σG2 | Aij | COVe + COVG |
Error variance and covariance analyses | |||||
Control (C) | t − 1 | Bii | σe2 + nκC2 | Bij | COVe + nCOVC |
Error | rc + 2m − t | Cii | σe2 | Cij | COVe |
DF: degrees of freedom; MS: mean square; EMS: expected mean square; COV: covariance; ECOV: expected covariance; g: number of genotypes; t: number of control genotypes; n: average number of replicates for each control genotype (see Formula (7) in text); r and c are the number of rows and columns, respectively; and m is the number of replicates for two subplot controls.
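For the two traits i and j (i, j = 1, 2), the error, genetic and phenotypic variances and covariances can be estimated as $\hat{\sigma}_{e_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{P_{ij}}$ as follows:

$$\hat{\sigma}_{e_{ij}} = C_{ij}, \qquad \hat{\sigma}_{G_{ij}} = A_{ij} - C_{ij}, \qquad \hat{\sigma}_{P_{ij}} = \begin{cases} \hat{\sigma}_{G_{ij}} + \hat{\sigma}_{e_{ij}} & \text{(plot basis)} \\ \hat{\sigma}_{G_{ij}} + \hat{\sigma}_{e_{ij}}/n & \text{(entry-mean basis)} \end{cases} \tag{6}$$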
where n is the number of replicates and Cij and Aij are the error and genotype covariance for trait i and j in Table 1, respectively. Because the number of replicates per control genotype differs in the MAD2 design, the number of replicates used for phenotypic variance estimation as described above is estimated [26] and [27] as
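$$n = \frac{1}{t-1}\left(\sum_{k=1}^{t} n_k - \frac{\sum_{k=1}^{t} n_k^2}{\sum_{k=1}^{t} n_k}\right) \tag{7}$$

where $n_k$ is the number of replicates for the kth control genotype and t is the number of control genotypes used, usually 3 in MAD2.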
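The following sketch illustrates model 1 numerically. It assumes that the average number of replicates is the usual unbalanced one-way ANOVA coefficient, that the components are obtained by solving the expected mean squares in Table 1, and that on an entry-mean basis the error variance is divided by n; the function names and mean squares are hypothetical.

```python
def effective_replicates(reps):
    """Average number of replicates for control genotypes with unequal replication."""
    t = len(reps)
    total = sum(reps)
    return (total - sum(n_k ** 2 for n_k in reps) / total) / (t - 1)


def model1_estimates(a_ii, c_ii, n=None):
    """Solve the Table 1 expected mean squares for a single trait.

    a_ii: mean square among test genotypes; c_ii: error mean square from controls.
    n: None for a plot basis, or the number of replicates for an entry-mean basis
       (assumed form: phenotypic variance = genetic + error / n).
    Returns (genetic variance, error variance, heritability).
    """
    sigma_e = c_ii
    sigma_g = a_ii - c_ii
    if sigma_g <= 0:
        raise ValueError("negative genetic variance estimate; H2 cannot be estimated")
    sigma_p = sigma_g + (sigma_e if n is None else sigma_e / n)
    return sigma_g, sigma_e, sigma_g / sigma_p


n_case = effective_replicates([49, 7, 7])   # one plot control, two subplot controls
print(n_case)
print(model1_estimates(10.0, 2.0))          # hypothetical mean squares, plot basis
print(model1_estimates(10.0, 2.0, n=4))     # same mean squares, entry-mean basis
```

The `ValueError` branch corresponds to the situation in which the error mean square exceeds the mean square among test genotypes, so that no genetic parameter can be estimated.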
2.3.2 Model 2: Trials in multiple environments
For the joint analysis of data in multiple environments or trials with the same design (each trial from different years and sites treated as environments), the adjusted observation of any test genotype with e environments and without replication can be expressed as
$$y_{ij} = \mu + G_i + E_j + (GE)_{ij} + \varepsilon_{ij} \tag{8}$$
where yij ~ N(μ, σP2), Gi ~ N(0, σG2), Ej ~ N(0, σE2), (GE)ij ~ N(0, σGE2), and εij ~ N(0, σe2). σP2, σG2, σE2, σGE2, and σe2 are the phenotypic, genetic, environmental, genotype-by-environment interaction (G × E), and error variances, respectively. σe2 is jointly estimated based on e trials with t replicated control genotypes in each trial. For a given trait i (i = 1, 2), the analyses of variance and covariance are shown in Table 2.
Table 2. Analyses of variance and covariance for model 2.
Source | DF | MS | EMS | COV | ECOV |
---|---|---|---|---|---|
Genotype variance and covariance analyses | |||||
Genotype (G) | g − 1 | Aii | σe2 + σGE2 + eσG2 | Aij | COVe + COVGE + eCOVG |
Environment (E) | e − 1 | Bii | σe2 + σGE2 + gσE2 | Bij | COVe + COVGE + gCOVE |
G × E | (g − 1)(e − 1) | Cii | σe2 + σGE2 | Cij | COVe + COVGE |
Error variance and covariance analyses | |||||
Control (C) | t − 1 | Dii | σe2 + enκC2 | Dij | COVe + enCOVC |
Environment (E) | e − 1 | Eii | σe2 + tnσE2 | Eij | COVe + tnCOVE |
C × E | (t − 1)(e − 1) | Fii | σe2 + nσCE2 | Fij | COVe + nCOVCE |
Error | e(rc + 2 m − t) | Gii | σe2 | Gij | COVe |
e: number of environments. See Table 1 for other notes.
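For the two traits i and j (i, j = 1, 2), the error, genetic, G × E, and phenotypic variance and covariance can be estimated as $\hat{\sigma}_{e_{ij}}$, $\hat{\sigma}_{G_{ij}}$, $\hat{\sigma}_{GE_{ij}}$, and $\hat{\sigma}_{P_{ij}}$ as follows:

$$\hat{\sigma}_{e_{ij}} = G_{ij}, \qquad \hat{\sigma}_{G_{ij}} = \frac{A_{ij} - C_{ij}}{e}, \qquad \hat{\sigma}_{GE_{ij}} = C_{ij} - G_{ij}, \qquad \hat{\sigma}_{P_{ij}} = \begin{cases} \hat{\sigma}_{G_{ij}} + \hat{\sigma}_{GE_{ij}} + \hat{\sigma}_{e_{ij}} & \text{(plot basis)} \\ \hat{\sigma}_{G_{ij}} + \hat{\sigma}_{GE_{ij}}/e + \hat{\sigma}_{e_{ij}}/(en) & \text{(entry-mean basis)} \end{cases} \tag{9}$$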
where Gij, Cij, and Aij are the covariances for error, G × E, and genotype for trait i and j in Table 2, respectively. Genetic parameters can be estimated using Formulas (1)–(4) and (9).
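A short Python sketch shows how the variance components of model 2 follow from the expected mean squares in Table 2. The function names are hypothetical, and the entry-mean form of the phenotypic variance (interaction divided by e, error divided by en) is an assumption based on dividing each component by its number of observations.

```python
def model2_components(a, c, g, e):
    """Solve the Table 2 expected mean squares.

    a, c, g: mean squares (or mean products) for genotype, G x E and error.
    e: number of environments.
    """
    return {"G": (a - c) / e, "GE": c - g, "e": g}


def model2_heritability(comp, e, n=None):
    """H2 on a plot basis (n is None) or on an entry-mean basis (n given)."""
    if n is None:
        phenotypic = comp["G"] + comp["GE"] + comp["e"]
    else:
        phenotypic = comp["G"] + comp["GE"] / e + comp["e"] / (e * n)  # assumed form
    return comp["G"] / phenotypic


# Round trip with hypothetical true components: G = 3, GE = 1, error = 2, e = 8.
envs = 8
ms_error = 2.0
ms_ge = 2.0 + 1.0
ms_genotype = 2.0 + 1.0 + envs * 3.0
comp = model2_components(ms_genotype, ms_ge, ms_error, envs)
print(comp)
print(model2_heritability(comp, envs))         # plot basis
print(model2_heritability(comp, envs, n=10))   # entry-mean basis
```

With the same components, the entry-mean heritability is larger than the plot-basis value because the interaction and error terms in the denominator shrink.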
2.3.3 Model 3: Trials in multiple years and sites
Specifically for the joint analysis of data in multiple years and sites, the adjusted observation of any test genotype during y years at s sites with no replication can be expressed as
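$$y_{ijk} = \mu + G_i + Y_j + (GY)_{ij} + S_k + (GS)_{ik} + (YS)_{jk} + (GYS)_{ijk} + \varepsilon_{ijk} \tag{10}$$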
where yijk ~ N(μ, σP2), Gi ~ N(0, σG2), Yj ~ N(0, σY2), (GY)ij ~ N(0, σGY2), Sk ~ N(0, σS2), (GS)ik ~ N(0, σGS2), (YS)jk ~ N(0, σYS2), (GYS)ijk ~ N(0, σGYS2), and εijk ~ N(0, σe2). σP2, σG2, σY2, σGY2, σS2, σGS2, σYS2, σGYS2, and σe2 are the variances for phenotype, genotype (G), year (Y), G × Y, site (S), G × S, Y × S, G × Y × S, and error, respectively. σe2 is jointly estimated based on t replicated control genotypes during y years at s sites. For a given trait i (i = 1, 2), the analyses of variance and covariance are shown in Table 3.
Table 3. Analyses of variance and covariance for model 3.
Source | DF | MS | EMS | COV | ECOV |
---|---|---|---|---|---|
Genotype variance and covariance analyses | |||||
Genotype (G) | g − 1 | Aii | σe2 + σGYS2 + sσGY2 + yσGS2 + ysσG2 | Aij | COVe + COVGYS + sCOVGY + yCOVGS + ysCOVG |
Year (Y) | y − 1 | Bii | σe2 + σGYS2 + sσGY2 + gσYS2 + gsσY2 | Bij | COVe + COVGYS + sCOVGY + gCOVYS + gsCOVY |
Site (S) | s − 1 | Cii | σe2 + σGYS2 + yσGS2 + gσYS2 + gyσS2 | Cij | COVe + COVGYS + yCOVGS + gCOVYS + gyCOVS |
G × Y | (g − 1)(y − 1) | Dii | σe2 + σGYS2 + sσGY2 | Dij | COVe + COVGYS + sCOVGY |
G × S | (g − 1)(s − 1) | Eii | σe2 + σGYS2 + yσGS2 | Eij | COVe + COVGYS + yCOVGS |
Y × S | (y − 1)(s − 1) | Fii | σe2 + σGYS2 + gσYS2 | Fij | COVe + COVGYS + gCOVYS |
G × Y × S | (g − 1)(y − 1)(s − 1) | Gii | σe2 + σGYS2 | Gij | COVe + COVGYS |
Error variance and covariance analyses | |||||
Control (C) | t − 1 | Hii | σe2 + ysnκC2 | Hij | COVe + ysnCOVC |
Year (Y) | y − 1 | Iii | σe2 + nσCYS2 + snσCY2 + tnσYS2 + tsnσY2 | Iij | COVe + nCOVCYS + snCOVCY + tnCOVYS + tsnCOVY |
Site (S) | s − 1 | Jii | σe2 + nσCYS2 + ynσCS2 + tnσYS2 + tynσS2 | Jij | COVe + nCOVCYS + ynCOVCS + tnCOVYS + tynCOVS |
C × Y | (t − 1)(y − 1) | Kii | σe2 + nσCYS2 + snσCY2 | Kij | COVe + nCOVCYS + snCOVCY |
C × S | (t − 1)(s − 1) | Lii | σe2 + nσCYS2 + ynσCS2 | Lij | COVe + nCOVCYS + ynCOVCS |
Y × S | (y − 1)(s − 1) | Mii | σe2 + nσCYS2 + tnσYS2 | Mij | COVe + nCOVCYS + tnCOVYS |
C × Y × S | (t − 1)(y − 1)(s − 1) | Nii | σe2 + nσCYS2 | Nij | COVe + nCOVCYS |
Error | ys(rc + 2 m − t) | Oii | σe2 | Oij | COVe |
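For the two traits i and j (i, j = 1, 2), the variances and covariances for error, G, G × Y, G × S, and G × Y × S, together with the phenotypic variances and covariances, can be estimated as $\hat{\sigma}_{e_{ij}}$, $\hat{\sigma}_{G_{ij}}$, $\hat{\sigma}_{GY_{ij}}$, $\hat{\sigma}_{GS_{ij}}$, $\hat{\sigma}_{GYS_{ij}}$, and $\hat{\sigma}_{P_{ij}}$, respectively:

$$\begin{aligned}
\hat{\sigma}_{e_{ij}} &= O_{ij}, & \hat{\sigma}_{GYS_{ij}} &= G_{ij} - O_{ij}, \\
\hat{\sigma}_{GY_{ij}} &= \frac{D_{ij} - G_{ij}}{s}, & \hat{\sigma}_{GS_{ij}} &= \frac{E_{ij} - G_{ij}}{y}, \\
\hat{\sigma}_{G_{ij}} &= \frac{A_{ij} - D_{ij} - E_{ij} + G_{ij}}{ys}, & \hat{\sigma}_{P_{ij}} &= \hat{\sigma}_{G_{ij}} + \hat{\sigma}_{GY_{ij}} + \hat{\sigma}_{GS_{ij}} + \hat{\sigma}_{GYS_{ij}} + \hat{\sigma}_{e_{ij}} \ \text{(plot basis)}
\end{aligned} \tag{11}$$

On an entry-mean basis, the interaction and error components in $\hat{\sigma}_{P_{ij}}$ are divided by the corresponding numbers of observations.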
where Oij, Aij, Dij, Eij, and Gij are the covariance for error, G, G × Y, G × S, and G × Y × S for traits i and j in Table 3, respectively. Similarly, several genetic parameters can be estimated by applying Formula (11) to Formulas (1)–(4).
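The relations between the mean squares of Table 3 and the variance components can be verified with a round-trip sketch: build the expected mean squares from known components, then solve them back. The function names and numerical values are illustrative assumptions.

```python
def model3_expected_ms(comp, y, s):
    """Expected mean squares of Table 3 for G, G x Y, G x S, G x Y x S and error."""
    base = comp["e"] + comp["GYS"]
    return {
        "A": base + s * comp["GY"] + y * comp["GS"] + y * s * comp["G"],  # genotype
        "D": base + s * comp["GY"],                                       # G x Y
        "E": base + y * comp["GS"],                                       # G x S
        "G": base,                                                        # G x Y x S
        "O": comp["e"],                                                   # error (controls)
    }


def model3_components(ms, y, s):
    """Solve the expected mean squares for the variance components."""
    return {
        "e": ms["O"],
        "GYS": ms["G"] - ms["O"],
        "GY": (ms["D"] - ms["G"]) / s,
        "GS": (ms["E"] - ms["G"]) / y,
        "G": (ms["A"] - ms["D"] - ms["E"] + ms["G"]) / (y * s),
    }


# Hypothetical components for 4 years and 2 sites.
true_comp = {"e": 1.0, "GYS": 0.5, "GY": 0.25, "GS": 0.75, "G": 2.0}
ms = model3_expected_ms(true_comp, y=4, s=2)
recovered = model3_components(ms, y=4, s=2)
h2_plot = recovered["G"] / sum(recovered.values())   # plot basis
print(ms)
print(recovered, h2_plot)
```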
2.4 Estimation of sampling variances
The Delta method [28] and [29] was used to derive the formulas for sampling errors for several genetic parameters. General formulas for sampling errors of several genetic parameters are available [22], [24] and [30]:
12
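We noticed that $\hat{\sigma}_{P_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{e_{ij}}$ in Formulas (6), (9), and (11) are linear functions of moments, θ(m1, m2, …, mk):

$$\theta = \sum_{i=1}^{k} a_i m_i \tag{13}$$

where $a_i$ is a constant coefficient and $m_i$ corresponds to the mean square of a variation source in Table 1, Table 2 and Table 3. Then the variance of θ in Formula (13) can be estimated [31]:

$$V(\theta) \approx \sum_{i=1}^{k} \left(\frac{\partial \theta}{\partial m_i}\right)^2 V(m_i) + 2\sum_{i<j} \frac{\partial \theta}{\partial m_i}\,\frac{\partial \theta}{\partial m_j}\,\mathrm{COV}(m_i, m_j) \tag{14}$$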
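Similarly, the approximate covariance between two functions of moments θl(m1, …, mk) (l = 1, 2) is given by [31]:

$$\mathrm{COV}(\theta_1, \theta_2) \approx \sum_{i=1}^{k}\sum_{j=1}^{k} \frac{\partial \theta_1}{\partial m_i}\,\frac{\partial \theta_2}{\partial m_j}\,\mathrm{COV}(m_i, m_j) \tag{15}$$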
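For a variation source with mean squares and mean products $M_{qr}$ of traits q and r, V(mi) and COV(mi, mj) in Formulas (14) and (15) can be calculated using the following formulas [32] and [33]:

$$V(M_{qr}) = \frac{M_{qq}M_{rr} + M_{qr}^2}{df + 2}, \qquad \mathrm{COV}(M_{qr}, M_{st}) = \frac{M_{qs}M_{rt} + M_{qt}M_{rs}}{df + 2} \tag{16}$$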
where q, r, s, t = 1, 2 and df are the degrees of freedom. The denominator value df + 2 has been suggested [34] to yield unbiased estimates.
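As a hedged illustration of how these pieces combine, the sketch below computes the approximate sampling variance of a heritability estimate for model 1 on a plot basis. It assumes that the covariance of two mean products from one source is (M_qs M_rt + M_qt M_rs)/(df + 2), that the genotype and error mean squares are independent because they come from different plots, and that the first-order Delta-method variance of a ratio applies. All function names and numbers are hypothetical.

```python
def cov_mean_products(m, q, r, s, t, df):
    """Approximate COV(m[q][r], m[s][t]) for one source of variation.

    m is the matrix of mean squares (diagonal) and mean products (off-diagonal)
    of the traits; df is the degrees of freedom of the source.
    """
    return (m[q][s] * m[r][t] + m[q][t] * m[r][s]) / (df + 2)


def delta_var_ratio(num, den, var_num, var_den, cov):
    """First-order Delta-method variance of the ratio num / den."""
    ratio = num / den
    return ratio ** 2 * (var_num / num ** 2 + var_den / den ** 2
                         - 2.0 * cov / (num * den))


def model1_h2_variance(a, c, df_a, df_c):
    """Approximate sampling variance of H2 = (A - C) / A for a single trait."""
    var_a = cov_mean_products([[a]], 0, 0, 0, 0, df_a)   # equals 2 A^2 / (df_a + 2)
    var_c = cov_mean_products([[c]], 0, 0, 0, 0, df_c)
    # Assumption: A (test genotypes) and C (controls) are independent.
    var_g = var_a + var_c    # V(A - C)
    var_p = var_a            # V(A), plot-basis phenotypic variance
    cov_gp = var_a           # COV(A - C, A)
    return delta_var_ratio(a - c, a, var_g, var_p, cov_gp)


# Hypothetical mean squares with df from a 10 x 10 trial: g - 1 = 389, rc + 2m - t = 107.
variance = model1_h2_variance(10.0, 2.0, 389, 107)
print(variance, variance ** 0.5)
```

Under these assumptions the result simplifies to (C/A)² [2/(df_A + 2) + 2/(df_C + 2)], which makes explicit that the sampling error shrinks as C/A decreases, that is, as heritability increases.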
Suppose that genotype and environment are independent. By applying Formulas (14)–(16) to Formulas (6), (9), and (11), we can calculate the variances of $\hat{\sigma}_{P_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{e_{ij}}$ (i, j = 1, 2; i = j or i ≠ j) and the covariances between any two of them, which are finally used to estimate the variances of the correlation coefficients ($\hat{r}_g$, $\hat{r}_e$, $\hat{r}_p$) and of $\hat{H}^2$.
For model 1, we derived a general formula to calculate the variances of $\hat{\sigma}_{P_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{e_{ij}}$ (i, j = 1, 2; i = j or i ≠ j) and the covariances between any two of them:
17
where $\hat{\sigma}$ represents $\hat{\sigma}_P$, $\hat{\sigma}_G$, or $\hat{\sigma}_e$; q, r, s, t = 1, 2; n is the number of replicates of control genotypes estimated from Formula (7); dA = (g − 1) + 2 and dC = (rc + 2m − t) + 2 from Table 1; and C1 and C2 are the correction coefficients listed in Table 4 for calculation of different variances or covariances.
Table 4. Correction coefficients in Formula (17) for sampling variance estimation for model 1.
- aOn a plot basis.
- bOn an entry-mean basis.
For model 2, a similar general formula was derived to calculate the variances of $\hat{\sigma}_{P_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{e_{ij}}$ and the covariances between any two of them:
18
where e is the number of environments; n is the number of replicates estimated with Formula (7); dA = (g − 1) + 2, dC = (g − 1)(e − 1) + 2 and dG = e(rc + 2 m − t) + 2 in Table 2; and C1, C2, and C3 are the correction coefficients listed in Table 5 for calculation of different variances or covariances.
Table 5. Correction coefficients in Formula (18) for sampling variance estimation for model 2.
- aOn a plot basis.
- bOn an entry-mean basis.
For model 3, we derived the following general formula to calculate the variances of $\hat{\sigma}_{P_{ij}}$, $\hat{\sigma}_{G_{ij}}$, and $\hat{\sigma}_{e_{ij}}$ and the covariances between any two of them:
19
where y is the number of years; s is the number of sites; n is the number of control replicates estimated with Formula (7); dA = (g − 1) + 2, dD = (g − 1)(y − 1) + 2, dE = (g − 1)(s − 1) + 2, dG = (g − 1)(y − 1)(s − 1) + 2 and dO = ys(rc + 2 m − t) + 2 from Table 3; and C1, C2, C3, C4, and C5 are the correction coefficients listed in Table 6 for calculation of different variances or covariances.
Table 6. Correction coefficients in Formula (19) for sampling variance estimation for model 3.
C1 | C2 | C3 | C4 | C5 | C1 | C2 | C3 | C4 | C5 | C1 | C2 | C3 | C4 | C5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | n2 | n2(y− 1)2 | n2(s− 1)2 | n2(y− 1)2(s− 1)2 | 0 | n2 | –n2(y− 1) | –n2(s− 1) | n2(y− 1)(s− 1) | 0 | 0 | 0 | 0 | 0 | 0 |
b | n2 | 0 | 0 | 4n2 | (1 − n)2 | n2 | 0 | 0 | − 2n2 | 0 | 0 | 0 | 0 | 0 | 0 |
n2 | n2 | n2 | n2 | n2 | 0 | 0 | 0 | 0 | 0 | ||||||
0 | 0 | 0 | 0 | (nys)2 |
- aOn a plot basis.
- bOn an entry-mean basis.
2.5 Computer simulations
Based on the MAD2 design scheme, we simulated single MAD2 trials for estimation of two genetic parameters: H2 of a trait and rg between two traits. The purposes of the simulations were to (1) validate the proposed method for estimating genetic parameters in the MAD2 trials and (2) assess the accuracy of the derived theoretical formulas of the sampling variances for the two genetic parameters. We compared the given H2 and rg values with the simulated estimates $\hat{H}^2$ and $\hat{r}_g$ to determine whether these parameters were accurately estimated.
A single MAD2 trial with 10 × 10 whole plots and five subplots in each whole plot was simulated. A dataset of 390 test genotypes with one observation each, one main plot control with 100 replicates, and two subplot controls with five replicates each was generated based on the assumptions in Formula (5) and given values for heritability and genetic correlation of the test genotype population. All simulations were performed using R software (https://www.r-project.org/), and the R code is available upon request.
2.5.1 Broad-sense heritability (H2)
Given the σG2 and H2 of a trait, the error variance on a plot basis is σe2 = σG2(1 − H2)/H2, so the effect of different error variances on the estimation of H2 in MAD2 can be simulated. Data were generated as follows: (1) given the μ and σG2 of a trait, we generated a set of normal random numbers following N(μ, σG2) for 390 test genotypes plus three control genotypes, corresponding to the genetic values of the test and control genotypes; (2) given H2, we calculated σe2 and generated 100 sets of normal random numbers following N(0, σe2), corresponding to the error effects of 100 replicates; (3) we added the genetic values and error effects to obtain phenotypic values of the test and control genotypes in 100 replicates, creating a matrix of 393 rows and 100 columns that followed N(μ, σP2) and represented the phenotypic values of the single MAD2 trial; and (4) we randomly chose 390 rows with one column each to simulate test genotypes without replication, one row with all 100 columns to simulate the plot control with 100 replicates, and two rows with five columns each to simulate the two subplot controls with five replicates. For each given H2 value from 0.1 to 0.9 in steps of 0.1, 1000 simulations were performed. For each simulation, the data were analyzed using model 1 (Table 1), and $\hat{H}^2$ and its sampling error were estimated using Formulas (4) and (12). The standard deviation of $\hat{H}^2$ over the 1000 samples was calculated to represent the actual sampling error (henceforth termed the “simulated” sampling error) for comparison with the values calculated from Formula (12).
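The procedure can be sketched in Python (the original simulations were written in R, and that code is not reproduced here). This simplified version draws only the observations that are actually used rather than the full 393 × 100 matrix, and it estimates the error variance as the pooled within-control variance; the function name, the default parameter values and the seed are assumptions.

```python
import random
import statistics


def simulate_h2_once(h2, mu=50.0, var_g=4.0, n_test=390,
                     control_reps=(100, 5, 5), rng=random):
    """Simulate one single-trial data set and return the plot-basis H2 estimate."""
    var_e = var_g * (1.0 - h2) / h2
    sd_g, sd_e = var_g ** 0.5, var_e ** 0.5
    # Unreplicated test genotypes: genetic value plus one error effect each.
    test_obs = [rng.gauss(mu, sd_g) + rng.gauss(0.0, sd_e) for _ in range(n_test)]
    # Replicated controls: one genetic value per control, several error effects.
    ss_within = 0.0
    for reps in control_reps:
        genetic_value = rng.gauss(mu, sd_g)
        obs = [genetic_value + rng.gauss(0.0, sd_e) for _ in range(reps)]
        mean_obs = statistics.fmean(obs)
        ss_within += sum((x - mean_obs) ** 2 for x in obs)
    ms_genotype = statistics.variance(test_obs)                     # df = n_test - 1
    ms_error = ss_within / (sum(control_reps) - len(control_reps))  # pooled error
    return (ms_genotype - ms_error) / ms_genotype


rng = random.Random(2016)
estimates = [simulate_h2_once(0.5, rng=rng) for _ in range(500)]
print(statistics.fmean(estimates))   # average estimate over simulations
print(statistics.stdev(estimates))   # "simulated" sampling error
```

A negative return value corresponds to a simulated data set in which the error variance estimate exceeds the total variance of the test genotypes.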
2.5.2 Genetic correlation (rg)
Given two traits (1 and 2) following N(μ1, σG12) and N(μ2, σG22) with rg, we generated two sets of correlated random numbers to simulate genetic values of traits as follows: (1) we generated two sequences of uncorrelated standard normal distributed random numbers X1 and X2; (2) we defined a new variable $Y = r_g X_1 + \sqrt{1 - r_g^2}\,X_2$, which has a correlation of rg with X1; and (3) we transformed X1 and Y into two new variables following the given normal distributions: X1' = X1σG1 + μ1 and X2' = YσG2 + μ2. To simplify the simulation, we set the error correlation re between the two traits to zero. We then generated two sets of independent random numbers for the error effects of the two traits. All other procedures followed the principles described above.
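A minimal Python sketch of this construction follows. It assumes the standard mixing rule Y = rg·X1 + sqrt(1 − rg²)·X2 for producing a variable with correlation rg to X1; the function names, parameter values and seed are illustrative.

```python
import math
import random


def correlated_genetic_values(n, rg, mu1, sd1, mu2, sd2, rng):
    """Return n pairs of genetic values for two traits with correlation rg."""
    pairs = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        y = rg * x1 + math.sqrt(1.0 - rg ** 2) * x2   # correlated with x1 by rg
        pairs.append((x1 * sd1 + mu1, y * sd2 + mu2))  # rescale to each trait
    return pairs


def pearson(pairs):
    """Sample correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / math.sqrt(s_aa * s_bb)


values = correlated_genetic_values(20000, 0.6, 10.0, 2.0, 50.0, 5.0, random.Random(1))
print(pearson(values))   # should be close to the requested rg
```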
2.5.3 Simulation of trial data from multiple years and sites
When trial data from multiple years and sites are available, both models 2 and 3 can be used for genetic parameter estimation. Model 1 can also be applied for analysis of single trials. To compare these three statistical models, we simulated trial data from four years and two sites per year that were similar to those of the case study. The same trial design and simulation procedure as the single trial were used but several major effects for years and sites, and some interaction effects, were added to the linear model (Formula (10) and Table 3). A total of eight trials were produced for a given H2. All three models were used to estimate H2.
2.6 Pipeline programs
The ANOVA and covariance analyses in Table 1, Table 2 and Table 3 were implemented using SAS software (SAS Institute Inc., Cary, USA). The results from SAS served as input to a Perl program and were further analyzed to estimate several genetic parameters and their sampling variances. A new module including a SAS and a Perl program was appended to the MAD pipeline [19].
3 Results
3.1 Computer simulations
3.1.1 Estimation of genetic parameters and their sampling errors
Given the different H2 values from 0.1 to 1.0, the average $\hat{H}^2$ estimates of 1000 simulated datasets were highly correlated (R2 = 0.998) with H2 (Fig. 1A); both theoretical and simulated sampling errors, s($\hat{H}^2$), decreased with increasing H2 (Fig. 1B); and the simulated s($\hat{H}^2$) was highly correlated with the theoretical s($\hat{H}^2$) (Fig. 1C). The s($\hat{H}^2$) values estimated from the two methods were consistent except when H2 was less than 0.3. These results indicate that estimation of H2 and its s($\hat{H}^2$) using the derived theoretical formula is reliable and that the reliability of the estimates increases with H2.
Similarly, we simulated trial data for estimation of rg for values ranging from 0.1 to 0.9. rg was calculated based on the genetic covariance and variances of two traits. Considering that two traits may have different heritabilities, we generated data for 729 parameter combinations of different rg (0.1–0.9), H12 (0.1–0.9) and H22 (0.1–0.9), each with 1000 simulations. A significant correlation between rg and $\hat{r}_g$ (R2 = 0.7242) was observed (Fig. 2A), but this relationship was more complex than that between $\hat{H}^2$ and H2 (Fig. 1A). Large sampling errors were observed for any given rg, which may result from the bias caused by the correlated errors of two traits (re). We also noticed that the theoretical s($\hat{r}_g$) was slightly higher than the simulated s($\hat{r}_g$) (Fig. 2B), though the theoretical s($\hat{r}_g$) was also highly correlated with the simulated s($\hat{r}_g$) (Fig. 2C).
3.1.2 Sampling distribution of genetic parameters
Using 1000 simulations (or samples) for each given parameter or combination of parameters, we can calculate the sampling error for each simulation and assess the sampling distribution of the parameter. Most samples appeared to be normally or nearly normally distributed for $\hat{H}^2$ and $\hat{r}_g$. Fig. 3A and B show several typical examples of the sampling distributions for $\hat{H}^2$ and $\hat{r}_g$, respectively. Based on all the simulated samples in the two simulation experiments, 97% and 92% of the samples for $\hat{H}^2$ and $\hat{r}_g$, respectively, were normally distributed (P > 0.05) and the remainder followed an approximately normal distribution, suggesting that the theoretically estimated sampling error of a parameter estimate can be used in a Z test to assess approximately whether the estimate differs significantly from zero.
3.1.3 Comparison of statistical models
For the joint data analysis of trials from multiple years and sites, two statistical models, model 2 (Table 2) and model 3 (Table 3), are suitable. Technically, model 1 (Table 1) can also be used for a single-trial analysis. The question was whether all three models could accurately estimate the genetic parameters when significant genotype-by-environment interaction effects were present. To compare the three statistical models for the same sets of data, we simulated trial data from four years and two sites (similar to the case study). The results showed that only model 3 produced accurate H2 estimates, whereas models 2 and 1 overestimated H2, especially at low H2 values (Fig. 4A). The theoretically estimated sampling errors of $\hat{H}^2$ fitted the simulated ones well in all three models (Fig. 4B). The sampling errors of $\hat{H}^2$ in model 3 were higher than those in models 2 and 1. Although $\hat{H}^2$ in model 1 had the lowest sampling errors, the estimates deviated greatly from the correct values.
3.2 Case study
OIL, IOD and LIN are three phenotypic traits important in flax breeding for flaxseed or linseed. For the trial data of the BM population from four years at two sites, we first performed data adjustment using the MAD pipeline [19]. Then, using the adjusted observations, we calculated $\hat{H}^2$ (Table 7) and $\hat{r}_g$ (Table 8) for the three traits and their sampling errors on a single-plot and a genotype mean basis. Two statistical models (models 2 and 3) were applied to the same dataset. We also estimated the genetic parameters using model 1 independently for each of the eight trials. Similar estimates of both parameters were obtained using model 2 and model 3, possibly because of the high heritability of the traits. As expected, higher estimates of $\hat{H}^2$ and $\hat{r}_g$ of the three traits were obtained from model 1 (Table 7 and Table 8). The sampling error estimates from model 3 were consistently higher than those from models 2 and 1 (Table 7 and Table 8), in accordance with the simulation results (Fig. 4). Because the two genetic parameters follow a normal sampling distribution (Fig. 3), we could perform an approximate Z test to determine whether the estimates of the parameters were significantly different from zero. All three traits had high and statistically significant (P < 0.01) heritability estimates. For $\hat{r}_g$, the estimates of all possible trait pairs were significant in model 2 and model 1, but the estimates of some trait pairs in model 3 were not significant because of their higher sampling errors. In addition, the estimates of $\hat{H}^2$ based on the genotype mean were larger than those based on single plots because the estimation of phenotypic variances differed (Formulas (6), (9), and (11)).
Table 7. $\hat{H}^2$ and s($\hat{H}^2$) for three traits (OIL, IOD and LIN) in the BM population.
Modelᵃ | Unitᵇ | OIL | IOD | LIN |
---|---|---|---|---|
Model 3 | Genotype mean | 0.916 ± 0.063⁎⁎ | 0.957 ± 0.025⁎⁎ | 0.954 ± 0.025⁎⁎ |
Model 2 | Genotype mean | 0.888 ± 0.011⁎⁎ | 0.963 ± 0.004⁎⁎ | 0.964 ± 0.004⁎⁎ |
Model 1 | Genotype mean | 0.996 ± 0.001⁎⁎ | 0.997 ± 0.001⁎⁎ | 0.997 ± 0.001⁎⁎ |
Model 3 | Plot | 0.490 ± 0.084⁎⁎ | 0.748 ± 0.059⁎⁎ | 0.748 ± 0.060⁎⁎ |
Model 2 | Plot | 0.400 ± 0.025⁎⁎ | 0.676 ± 0.021⁎⁎ | 0.675 ± 0.021⁎⁎ |
Model 1 | Plot | 0.905 ± 0.017⁎⁎ | 0.919 ± 0.014⁎⁎ | 0.925 ± 0.013⁎⁎ |
- ᵃ Model 3: joint analysis of 4 years × 2 sites; Model 2: joint analysis using eight environments (each site/year as an environment); and Model 1: one single trial (2012 at Morden) is shown as an example.
- ᵇ Genotype mean: on an entry-mean basis; Plot: on a plot basis.
- ⁎⁎ Represents statistical significance at the 0.01 probability level.
Table 8. $\hat{r}_g$ and s($\hat{r}_g$) between three traits (OIL, IOD and LIN) in the BM population.
Model | OIL vs. IOD | OIL vs. LIN | IOD vs. LIN |
---|---|---|---|
Model 3 | −0.277 ± 0.187 | −0.261 ± 0.188 | 0.963 ± 0.015⁎⁎ |
Model 2 | −0.268 ± 0.065⁎⁎ | −0.259 ± 0.065⁎⁎ | 0.961 ± 0.005⁎⁎ |
Model 1 | −0.336 ± 0.065⁎⁎ | −0.330 ± 0.064⁎⁎ | 0.957 ± 0.006⁎⁎ |
See Table 7 for the same notes.
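The approximate Z test used for the significance marks can be sketched as follows, taking an estimate and its sampling error from Table 7 or Table 8 as input. The function name is hypothetical, and a two-sided test against zero under a normal sampling distribution is assumed.

```python
import math


def z_test(estimate, sampling_error):
    """Two-sided Z test of H0: parameter = 0; returns (z, p-value)."""
    z = estimate / sampling_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Genetic correlation between OIL and IOD from Table 8.
for label, estimate, error in [("Model 3", -0.277, 0.187),
                               ("Model 2", -0.268, 0.065)]:
    z, p_value = z_test(estimate, error)
    print(label, round(z, 3), p_value)
```

The estimates from the two models are similar, but the larger sampling error in model 3 leaves the correlation non-significant, whereas it is significant at the 0.01 level in model 2.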
4 Discussion
An augmented design is usually applied by breeders to a large number of lines that are to be planted in a field of limited size. Error variance and genetic parameters may be estimated from replicated controls in unreplicated experimental designs such as MAD2. In the present study, genetic variance (covariance) was calculated based on total phenotypic variance (covariance) estimated from the test genotypes minus error variance (covariance) estimated from the control genotypes. This separate analysis approach provides approximate estimates of genetic parameters based on the MAD2 design, although it is not optimal for some cases. Our simulation results suggest that the method we propose is highly accurate for estimating H2 with the reliability of the estimates increasing with trait heritability. Estimates of rg had larger sampling errors than those of H2, indicating that the latter is less subject to environmental effects.
We derived approximate theoretical sampling error formulas for the two genetic parameters using the Delta method [28] and [29]. The theoretical sampling errors of both genetic parameters were highly consistent with the simulated sampling errors, except for a few cases at very low heritability (Fig. 1, Fig. 2 and Fig. 3). This suggests that estimation of the sampling errors of the two genetic parameters in MAD2 is reliable and that these sampling errors can be used to test whether the estimated genetic parameters differ significantly from zero.
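To make the Delta-method idea concrete, here is a minimal Python sketch under stated assumptions: H2 is written as 1 − E/A, where A is the total mean square of the test genotypes and E the error mean square from the controls; A and E are treated as independent; and each mean square is given the normal-theory variance 2·MS²/df. These choices, and the normal-approximation test of the hypothesis that a parameter is zero, are assumptions for illustration; the paper's own formulas may differ in detail.

```python
import math

def h2_delta_se(ms_total, ms_error, df_total, df_error):
    """First-order (Delta method) standard error of H2 = 1 - ms_error / ms_total."""
    var_total = 2.0 * ms_total**2 / df_total   # assumed Var(A)
    var_error = 2.0 * ms_error**2 / df_error   # assumed Var(E)
    d_total = ms_error / ms_total**2           # dH2/dA
    d_error = -1.0 / ms_total                  # dH2/dE
    return math.sqrt(d_total**2 * var_total + d_error**2 * var_error)

def z_test(estimate, se):
    """Two-sided normal-approximation test of H0: parameter = 0."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical mean squares and degrees of freedom.
se_h2 = h2_delta_se(ms_total=1.5, ms_error=0.5, df_total=242, df_error=87)

# rg (OIL vs. IOD) and its sampling error as reported in Table 8.
z_m3, p_m3 = z_test(-0.277, 0.187)   # Model 3
z_m2, p_m2 = z_test(-0.268, 0.065)   # Model 2
print(f"SE(H2) = {se_h2:.4f}")
print(f"Model 3: z = {z_m3:.2f}, significant at 0.01: {p_m3 < 0.01}")
print(f"Model 2: z = {z_m2:.2f}, significant at 0.01: {p_m2 < 0.01}")
```

Under these assumptions the standard error simplifies to (1 − H2)·sqrt(2/df_total + 2/df_error), so it shrinks as heritability rises. The two tests agree with the significance markers for OIL vs. IOD in Table 8.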
Theoretically, the total variance of the test genotypes (the mean square Aii in Table 1) will be greater than the error variance in a single trial. Accordingly, we were able to obtain genetic variance as total variance minus the error variance. However, because a limited number of control genotypes (three in our case) were used to estimate the error variance, the latter estimate is occasionally greater than the total variance of the test genotypes as a consequence of sampling error. This results in negative genetic variance estimates and failure to estimate genetic parameters. In our simulation, when H2 = 0.1, 22.5% of simulated data sets failed to yield estimates of genetic parameters; when H2 = 0.3, only 0.6% failed; and when H2 > 0.3, none failed. When the heritability of a trait is very low (e.g. < 0.1), the method proposed in this paper is therefore sometimes unable to estimate genetic parameters precisely. In addition, there is some risk of misadjustment in this design if control genotypes show a different error variance or perform differently from the unreplicated entries [35]. Some alternatives have been proposed to reduce this risk, such as partially replicated (p–rep) designs, where a proportion of the test entries are replicated at each location [36], [37] and [38].
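The failure mechanism can be illustrated with a small Monte Carlo sketch in Python. It assumes both mean squares are scaled chi-square variables and uses hypothetical settings (243 test lines, 20 error degrees of freedom from the controls), so its failure rates are not expected to reproduce the percentages quoted above. It shows only the qualitative pattern: failures are frequent at low heritability and vanish as heritability rises.

```python
import numpy as np

def failure_rate(h2, n_test=243, df_error=20, n_sim=5000, seed=0):
    """Share of simulated trials in which error MS >= total MS of test genotypes.

    All settings are hypothetical; total variance is fixed at 1 so that
    the true error variance equals 1 - h2.
    """
    rng = np.random.default_rng(seed)
    df_total = n_test - 1
    ms_total = rng.chisquare(df_total, n_sim) / df_total
    ms_error = (1.0 - h2) * rng.chisquare(df_error, n_sim) / df_error
    return float(np.mean(ms_error >= ms_total))  # negative genetic variance estimate

rates = {h2: failure_rate(h2) for h2 in (0.1, 0.3, 0.5, 0.9)}
for h2, rate in rates.items():
    print(f"H2 = {h2:.1f}: {100 * rate:.1f}% of simulated trials fail")
```

Raising `df_error`, for example through more control plots, narrows the sampling distribution of the error mean square and lowers the failure rate at any given heritability.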
Phenotypic variance can be expressed on two bases: the single plot and the genotype mean. The two bases give different estimates of H2; the estimate of rg, however, is unaffected because the numerator and denominator of Formula (1) for calculating rg involve only genetic components. The estimates of H2 based on the genotype mean were always larger than those based on the plot (Table 7) because, on a genotype-mean basis, the error and interaction variance components are divided by the corresponding number of observations (Formulas (6), (9), and (11)), which reduces the phenotypic variance in the denominator. Because MAD2 is an unreplicated, unbalanced design in which each adjusted observation comes from a single plot, estimates based on the plot may be more reasonable estimates of genetic variation.
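The effect of the measurement unit can be shown with a short Python sketch. It uses the common textbook forms of plot-basis and entry-mean-basis heritability for a multi-environment trial; Formulas (6), (9), and (11) are not reproduced in this section, so these forms and the variance components below are assumptions for illustration.

```python
def h2_plot_basis(var_g, var_ge, var_e):
    """Heritability on a single-plot basis: no component is averaged."""
    return var_g / (var_g + var_ge + var_e)

def h2_mean_basis(var_g, var_ge, var_e, n_env, n_rep=1):
    """Heritability on a genotype-mean basis over n_env environments.

    The interaction and error components are divided by the number of
    observations that enter each genotype mean.
    """
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))

# Hypothetical variance components, eight environments, one plot per environment.
plot_h2 = h2_plot_basis(var_g=1.0, var_ge=0.5, var_e=1.0)
mean_h2 = h2_mean_basis(var_g=1.0, var_ge=0.5, var_e=1.0, n_env=8)
print(f"plot basis: {plot_h2:.3f}, genotype-mean basis: {mean_h2:.3f}")
```

The genetic variance in the numerator is the same in both functions; only the denominator changes, which is why the genotype-mean value is the larger of the two.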
Three statistical models were considered. Because model 1 deals only with single-trial data, the genetic variance contains an undecomposable genotype-by-environment interaction and consequently H2 and rg are always overestimated (Fig. 4A, Table 7 and Table 8). For this reason, we suggest a joint analysis of trials from multiple environments (different years and/or sites) with model 2 or model 3. However, in the presence of a significant genotype-by-environment effect, H2 is generally overestimated in both models 1 and 2 (Fig. 4A). Theoretically, in model 2, the total variation of the test genotypes is partitioned into three components: G, E, and G × E (Table 2), whereas in model 3, E is further partitioned into Y, S, and Y × S, and G × E into G × Y, G × S, and G × Y × S. Hence, ANOVA of the same dataset gave an identical sum of squares (SS) of G in models 2 and 3; the SS of E was equal to the sum of the SS of Y, S, and Y × S; and the SS of G × E was equal to the sum of the SS of G × Y, G × S, and G × Y × S. Both models also yielded the same error variances. The two models applied different formulas (Formulas (9) and (11)) to estimate σG² and σGE² (model 2) or σG², σGY², σGS², and σGYS² (model 3), which resulted in higher σG² and lower σGE² in model 2 than in model 3 (Fig. 5) and consequently in the overestimation of genetic parameters in model 2. However, because more of the partitioned variance components in model 3 are estimated indirectly, higher sampling errors usually ensue; this is the major reason for the higher sampling variance of the genetic parameters estimated from model 3. Model 2 yields reasonable estimation accuracy and low sampling variance. Because model 2 treats all years, sites, or their combinations as environments, it can be applied when data for an entire year or site are missing, or when data from only years or only sites are available. Thus, model 2 is a more practical and flexible statistical model for genetic parameter estimation using datasets from multiple years and sites.
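The sum-of-squares identities between models 2 and 3 can be checked numerically. The Python sketch below uses hypothetical, fully balanced data (50 genotypes × 4 years × 2 sites, one value per cell), which is an assumption made for the example; real MAD2 data with missing cells are unbalanced and would need a proper ANOVA or mixed-model fit rather than these closed-form sums.

```python
import numpy as np

rng = np.random.default_rng(0)
n_g, n_y, n_s = 50, 4, 2
y = rng.normal(size=(n_g, n_y, n_s))   # hypothetical balanced G x Y x S data

grand = y.mean()
m_g = y.mean(axis=(1, 2))              # genotype means
m_y = y.mean(axis=(0, 2))              # year means
m_s = y.mean(axis=(0, 1))              # site means
m_ys = y.mean(axis=0)                  # environment (year x site) means
m_gy = y.mean(axis=2)                  # genotype x year means
m_gs = y.mean(axis=1)                  # genotype x site means

# Model 2: each year/site combination is one environment.
ss_e = n_g * np.sum((m_ys - grand) ** 2)
ss_ge = np.sum((y - m_g[:, None, None] - m_ys[None, :, :] + grand) ** 2)

# Model 3: environments split into years, sites, and their interaction.
ss_y = n_g * n_s * np.sum((m_y - grand) ** 2)
ss_s = n_g * n_y * np.sum((m_s - grand) ** 2)
ss_ys = n_g * np.sum((m_ys - m_y[:, None] - m_s[None, :] + grand) ** 2)
ss_gy = n_s * np.sum((m_gy - m_g[:, None] - m_y[None, :] + grand) ** 2)
ss_gs = n_y * np.sum((m_gs - m_g[:, None] - m_s[None, :] + grand) ** 2)
ss_gys = np.sum((y - m_gy[:, :, None] - m_gs[:, None, :] - m_ys[None, :, :]
                 + m_g[:, None, None] + m_y[None, :, None] + m_s[None, None, :]
                 - grand) ** 2)

print("SS(E) = SS(Y) + SS(S) + SS(YxS):", np.isclose(ss_e, ss_y + ss_s + ss_ys))
print("SS(GxE) = SS(GxY) + SS(GxS) + SS(GxYxS):",
      np.isclose(ss_ge, ss_gy + ss_gs + ss_gys))
```

Both models therefore describe the same total variation; they differ only in how the environmental and interaction sums of squares are subdivided before variance components are estimated.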
5 Conclusions
We have proposed an approximation method to estimate H2 and rg and their respective sampling variances for MAD2 trials. The simulation results suggest that H2 can be reliably estimated in the MAD2 trial. The sampling error estimates based on the derived theoretical formulas agree with the simulated values and can be applied to statistical tests of estimated genetic parameters.
Acknowledgments
This work was partly supported by an A-base project funded by Agriculture and Agri-Food Canada, the TUFGEN project funded by Genome Canada and other stakeholders, and funds from the Western Grains Research Foundation. The authors thank Andrzej Walichnowski for manuscript editing.
Appendix A Supplementary data
References
- [1] Augmented (or hoonuiaku) designs. Hawaiian Planter's Record, Volume 55, 1956, pp. 191–208.
- [2] Some augmented row-column designs. Biometrics, Volume 31, 1975, pp. 361–374.
- [3] On augmented designs. Biometrics, Volume 31, 1975, pp. 29–35.
- [4] A modified augmented design for an early stage of plant selection involving a large number of test lines without replication. Biometrics, Volume 39, 1983, pp. 553–561.
- [5] A modified augmented design (type 2) for rectangular plots. Can. J. Plant Sci., Volume 65, 1985, pp. 743–749.
- [6] Determination of the best indirect selection criteria in Iranian durum wheat (Triticum aestivum L.) genotypes under irrigated and drought stress conditions. Genetika, Volume 47, 2015, pp. 549–558.
- [7] Field evaluation of type 2 modified augmented designs for non-replicated yield trials in the early stages of a wheat breeding program. Bericht uber die Arbeitstagung 2002 der Vereinigung der Pflanzenzuchter und Saatgutkaufleute Osterreichs gehalten vom 26. bis 28 November 2002 in Gumpenstein, P. Ruckenbauer, F. Raab, R. Kern, K. Buchgraber, A. Schaumberger, 2003.
- [8] Field evaluation of a modified augmented design for early stage selection involving a large number of test lines without replication. Potato Res., Volume 30, 1987, pp. 35–45.
- [9] Efficiency of type 2 modified augmented designs in soybean variety trials. Agron. J., Volume 81, 1989, pp. 512–517.
- [10] Success of a selection program for increasing grain yield of two-row barley lines and evaluation of the modified augmented design (type 2). Can. J. Plant Sci., Volume 75, 1995, pp. 795–799.
- [11] Field evaluation of a modified augmented design (type 2) for screening barley lines. Can. J. Plant Sci., Volume 69, 1989, pp. 9–15.
- [12] A modified augmented design for early selection stages in sugarcane and its limitation. Sugar Tech., Volume 1, 1999, pp. 63–66.
- [13] Use of a modified augmented design in the unreplicated stages of sugarcane selection. Report of Projects Louisiana Agricultural Experiment Station, 1990, Department of Agronomy, pp. 132–135.
- [14] Evaluation of maize inbred lines for resistance to fusarium ear rot and fumonisin accumulation in grain in tropical Africa. Plant Dis., Volume 91, 2007, pp. 279–286.
- [15] Association mapping of seed quality traits using the Canadian flax (Linum usitatissimum L.) core collection. Theor. Appl. Genet., Volume 127, 2014, pp. 881–896.
- [16] Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping. J. Integr. Plant Biol., Volume 56, 2014, pp. 75–87.
- [17] Fatty acid composition and desaturase gene expression in flax (Linum usitatissimum L.). J. Appl. Genet., Volume 55, 2014, pp. 423–432.
- [18] QTL for fatty acid composition and yield in linseed (Linum usitatissimum L.). Theor. Appl. Genet., Volume 128, 2015, pp. 965–984.
- [19] Statistical analysis and field evaluation of the type 2 modified augmented design (MAD) in phenotyping of flax (Linum usitatissimum) germplasms in multiple environments. Aust. J. Crop Sci., Volume 7, 2013, pp. 1789–1800.
- [20] The variance of the genetic correlation coefficient. Biometrics, Volume 11, 1955, pp. 357–374.
- [21] The sampling variance of the genetic correlation coefficient. Biometrics, Volume 15, 1959, pp. 469–485.
- [22] Pleiotropism and the genetic variance and covariance. Biometrics, Volume 15, 1959, pp. 518–537.
- [23] The sampling variance of the correlation coefficients estimated in genetic experiments. Biometrics, Volume 22, 1966, pp. 187–191.
- [24] The sampling variances of co-heritability and genetic correlation coefficients between traits in genetic design of single factor. J. Nanjing Agric. Univ., Volume 17, 1994, pp. 13–20 (in Chinese with English abstract).
- [25] Simulation study of three adjustment methods for the modified augmented design and comparison with the balanced lattice square design. J. Agric. Sci., Volume 100, 1983, pp. 527–534.
- [26] Analysis of genetic parameters of agronomic and seed quality traits of soybean landraces in southern China soybean. Soybean Sci., Volume 9, 1990, pp. 9–18 (in Chinese with English abstract).
- [27] The Advanced Theory of Statistics, Vol. 2: Inference and Relationship. Griffin, London, 1979.
- [28] A note on the Delta method. Am. Stat., Volume 46, 1992, pp. 27–29.
- [29] Statistical Models. Cambridge University Press, Cambridge, 2003.
- [30] Estimation of the sampling variance of co-heritability. J. Genet. Genomics, Volume 20, 1993, pp. 504–513 (in Chinese with English abstract).
- [31] The Advanced Theory of Statistics, Vol. 1: Distribution Theory. Griffin, London, 1977.
- [32] Sampling errors of genetic correlation coefficients calculated from analyses of variance and covariance. Aust. J. Stat., Volume 1, 1959, pp. 35–43.
- [33] The Mathematical Theory of Quantitative Genetics. Clarendon Press, Oxford, 1980.
- [34] An Introduction to Quantitative Genetics. Longman, New York and London, 1957.
- [35] The design and analysis of unreplicated field trials. Vortrage Fuer Pflanzenzuchtung, Volume 7, 1984, pp. 219–242.
- [36] Construction of more flexible and efficient p-rep designs. Aust. Nz. J. Stat., Volume 56, 2014, pp. 89–96.
- [37] Augmented p-rep designs. Biom. J., Volume 53, 2011, pp. 19–27.
- [38] On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat., Volume 11, 2006, pp. 381–393.
- Peer review under responsibility of Crop Science Society of China and Institute of Crop Science, CAAS.
- ⁎ Corresponding author. Tel.: + 1 204 822 7525; fax: + 1 204 808 7507.