GBB was first proposed and reported in 2014 by Liu et al.[10] and Zhang et al.[11], based on the following hypotheses and facts:
(1) Genes are the primary determinants of the genetic performance of all traits and are thus central to plant and animal breeding, even though trait performance is also modulated by epigenetic factors, such as small RNAs and methylation, and by variable environments (G × E interaction). Therefore, a variety with a desirable gene content, especially favorable alleles, heterosis genotypes, and desirable gene networks, will perform best when grown in the environment to which it is best adapted. In fact, when the transcript expression of the genes controlling breeding objective traits is used for GBB, epigenetic factors and G × E interaction are already included. This is because transcript expression reflects not only gene activity but also gene × gene interaction, gene × non-gene element interaction, G × E interaction, and epigenetic factors such as small RNAs and methylation.
(2) The molecular basis of breeding is to continuously incorporate the favorable alleles and heterotic genotypes of the genes controlling objective traits into new varieties.
(3) Most agronomic traits are controlled by numerous genes, probably more than 1,000[10−14] (e.g., GenBank acc. No.: MW082098–MW082571). Such traits include yield, quality, yield and quality component traits, and biotic/abiotic stress tolerance. This makes it difficult, if not impossible, for breeders to manipulate such large numbers of genes simultaneously for efficient breeding. It also makes it hard to incorporate most, if not all, of their favorable alleles and heterosis genotypes from parents into new varieties.
(4) Both pure-line varieties (e.g., bread wheat and soybean) and hybrid varieties (e.g., maize, rice, most vegetable crops, and most fruit trees) are used in food production. Current breeding procedures, for both pure-line and hybrid varieties, include parent selection, crossing design, and progeny selection, followed by multi-location variety testing. Because GBB performs parent selection, crossing design, and progeny selection on the basis of genes, these steps could be carried out in a factory-like manner in greenhouses or phytotrons. Multi-location variety testing then determines where a GBB variety should be grown. This dramatically accelerates the breeding process, increases breeding efficiency, and reduces breeding cost.
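Point (3) can be made concrete with a textbook Mendelian calculation that is not part of the GBB studies cited here. The calculation rests on three assumptions:

- a cross segregates at n unlinked genes;
- the F1 is heterozygous (Aa) at every one of them;
- each gene segregates 1 AA : 2 Aa : 1 aa in the F2.

Under these assumptions, the probability that a single F2 plant is homozygous for the favorable allele at all n genes is (1/4)^n. The Python sketch below (function names are illustrative) evaluates that probability and the expected number of F2 plants to screen per such plant. Linkage, which real genomes have, is ignored.

```python
from fractions import Fraction


def prob_all_homozygous_favorable(n_genes: int) -> Fraction:
    """Probability that one F2 plant is AA at every one of n unlinked genes.

    Assumes the F1 is Aa at every gene, each gene segregates 1 AA : 2 Aa : 1 aa,
    and genes assort independently (no linkage).
    """
    return Fraction(1, 4) ** n_genes


def expected_f2_plants_needed(n_genes: int) -> Fraction:
    """Expected number of F2 plants screened per all-AA plant (mean of a geometric distribution)."""
    return 1 / prob_all_homozygous_favorable(n_genes)


for n in (5, 10, 20, 50):
    p = prob_all_homozygous_favorable(n)
    print(f"{n:>3} genes: p = {float(p):.3e}, "
          f"about {float(expected_f2_plants_needed(n)):.3e} F2 plants to screen")
```

Even at 20 genes, far fewer than the more than 1,000 estimated for complex traits, the expected screen exceeds 10^12 plants. This illustrates the difficulty described in point (3).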
GBB was first tested in two systems:

- in cotton, for pure-line variety breeding, using fiber length as the breeding objective trait[10,12,15,16]; fiber length is a typical quantitative agronomic trait with a high heritability of 0.83–0.90;
- in maize, for inbred line and hybrid variety breeding, using grain yield as the breeding objective trait[11,13,14]; grain yield is an extremely complex agronomic trait with a low to moderate heritability of 0.41–0.62.

These studies yielded the following lessons. First, because GBB uses the genes controlling breeding objective traits, it is efficient not only for progeny selection but also for parent selection and crossing design, which are critical to breeding success. Three consequences follow:

- the most desirable parents can be selected to approach the breeding objectives;
- an optimal crossing design can be chosen to maximally incorporate the favorable alleles and heterosis genotypes of the genes controlling the objective traits from the parents into new progeny;
- the best individuals for the objective traits can be accurately identified in the progeny pool and rapidly developed into a new elite variety or breeding line in early generations.

A successful breeding program requires two things. First, it needs parents that are maximally complementary in their content of the genes controlling the objective traits and that can potentially yield progeny outperforming the currently released commercial varieties. Second, it needs a cross that maximally combines the favorable alleles and heterosis genotypes of those genes from the parents into progeny, including individuals superior to the current best commercial variety. A breeder cannot identify superior individuals in a progeny pool, or develop them into superior varieties, if no superior individual exists in that pool.
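The complementarity argument can be illustrated with a toy model. The Python sketch below makes three assumptions:

- the parents are homozygous (inbred);
- genes are scored additively, as in the first row of Table 2 (AA = 2, aa = 0);
- recombination among genes is free.

Under these assumptions, a pure-line progeny can reach AA at a gene only if at least one parent carries the favorable allele there. The best achievable total NFAs of a cross is therefore twice the number of genes at which either parent is AA. All gene IDs and genotypes are hypothetical.

```python
from typing import Dict

# Hypothetical inbred parent: gene ID -> "AA" (favorable) or "aa" (unfavorable).
Parent = Dict[str, str]


def nfa_additive(parent: Parent) -> int:
    """Total NFAs of a homozygous line under additive scoring (AA = 2, aa = 0)."""
    return sum(2 for genotype in parent.values() if genotype == "AA")


def best_pure_line_nfa(p1: Parent, p2: Parent) -> int:
    """Upper bound on total NFAs of a pure-line progeny from the cross p1 x p2.

    Assumes free recombination: the progeny can be fixed for AA at every gene
    where at least one parent carries the favorable allele.
    """
    return sum(2 for gene in p1 if "AA" in (p1[gene], p2[gene]))


# Two high-scoring but genetically similar parents ...
high_1 = {"g1": "AA", "g2": "AA", "g3": "AA", "g4": "AA", "g5": "aa", "g6": "aa"}
high_2 = dict(high_1)
# ... versus two mediocre but complementary parents.
mediocre_1 = {"g1": "AA", "g2": "AA", "g3": "AA", "g4": "aa", "g5": "aa", "g6": "aa"}
mediocre_2 = {"g1": "aa", "g2": "aa", "g3": "aa", "g4": "AA", "g5": "AA", "g6": "AA"}

print(nfa_additive(high_1), best_pure_line_nfa(high_1, high_2))              # 8 8
print(nfa_additive(mediocre_1), best_pure_line_nfa(mediocre_1, mediocre_2))  # 6 12
```

The mediocre pair has the lower parental score but the higher ceiling. This mirrors the contrast described in the next paragraph (Fig. 2).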
For example, in a current breeding (CB) program, which may be assisted by genomic selection, two higher-yielding parents can be selected without comprehensive knowledge of the genes underlying yield. However, if they carry the same or similar sets of alleles at the genes controlling yield, crossing these two seemingly superior parents may produce no genetic improvement (Fig. 2). Conversely, GBB may select as parents two lines that appear phenotypically mediocre but possess complementary allele combinations, provided detailed knowledge of their favorable alleles and heterosis genotypes is available. When crossed, such lines can produce superior progeny from which superior varieties can be developed[3] (Figs 1 & 2).

In comparison, genomic selection is a genomics-assisted method developed for genome-wide assisted progeny selection[7]. Over the past 20 years it has been studied extensively for progeny selection in breeding programs, using genome-wide random SNP markers[17−22], genome-wide gene expression[23], or genome-wide metabolites[23,24]. However, although genomic selection has been shown to be efficient for progeny selection, no study has been reported, to the best of my knowledge, on its efficiency for parent selection and crossing design.

Second, for progeny selection, GBB predicted cotton fiber length in breeding progeny, and maize inbred line grain yield and F1 hybrid grain yield from parents. The prediction accuracies (r, the correlation coefficient between predicted and observed phenotypes) were 0.83–0.86[12−16] (Fig. 3).
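Prediction accuracy as defined here is the Pearson correlation between predicted and observed phenotypes. A minimal Python sketch follows. The helper `relative_gain` assumes, as an interpretation not stated in the source, that comparisons such as "higher by 116%" are relative differences between two r values. The example values are hypothetical.

```python
import math
from typing import Sequence


def prediction_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation r between predicted and observed phenotypes."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(ss_p * ss_o)


def relative_gain(r_new: float, r_ref: float) -> float:
    """Percent by which r_new exceeds r_ref: 100 * (r_new - r_ref) / r_ref (assumed definition)."""
    return 100.0 * (r_new - r_ref) / r_ref


# Hypothetical fiber lengths (mm) for five progeny plants.
predicted = [30.1, 31.5, 29.8, 32.4, 33.0]
observed = [30.5, 31.0, 30.2, 32.0, 33.4]
print(round(prediction_accuracy(predicted, observed), 3))
print(relative_gain(0.85, 0.50))  # about 70
```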
These prediction accuracies of GBB were higher than those of genomic selection. The genomic selection accuracies were obtained for the same or similar types of populations, under cross-validation schemes using tens to hundreds of thousands of genome-wide random DNA markers, gene expression values, or metabolites. GBB's accuracies were higher by:

- 116% for cotton fiber length[12];
- 63% for maize inbred line grain yield[13];
- 27%–406% for maize F1 hybrid grain yield predicted from parents[14].

Furthermore, the phenotypes of breeding objective traits can be predicted for GBB progeny selection with genic SNP/InDel markers, NFAs (numbers of favorable alleles), and transcript expression, individually or jointly. When the phenotypes predicted with two or all three genic datasets were used jointly for progeny selection, up to 100% of the top 10% of plants selected by GBB matched the top 10% of plants selected on the basis of phenotypes determined in standard replicated field trials[12,13] (Table 1).

Third, variety development with GBB can be designed according to the breeder's objectives. Parents can be selected that can potentially yield progeny with more NFAs than the current best commercial variety (Fig. 1). Crosses can be designed to maximally combine the favorable alleles and heterosis genotypes of the genes from the parents into progeny. Individuals with more NFAs than the current best commercial variety can then be accurately identified. The outcome of breeding efforts is therefore much more predictable with GBB than with current breeding methods.

Fourth, GBB bases parent selection, crossing design, and progeny selection on the genes controlling breeding objective traits. It can therefore be practiced in a factory-like manner in greenhouses or phytotrons before multi-location variety testing, dramatically accelerating the breeding process, reducing breeding cost, and increasing genetic gain per unit time.
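The consistency figures in Table 1 compare two top-10% selections. One plausible way to compute such a figure, assumed here rather than taken from the source, is the share of GBB-selected plants that also fall in the field-trial top 10%. The Python sketch below uses hypothetical plant IDs and scores and rounds the selection size to the nearest whole plant.

```python
from typing import Dict, Set


def top_fraction(scores: Dict[str, float], fraction: float = 0.10) -> Set[str]:
    """IDs of the highest-scoring plants; set size is fraction * n, rounded (minimum 1)."""
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def selection_consistency(predicted: Dict[str, float],
                          observed: Dict[str, float],
                          fraction: float = 0.10) -> float:
    """Percent of the predicted top set that is also in the observed top set."""
    top_pred = top_fraction(predicted, fraction)
    top_obs = top_fraction(observed, fraction)
    return 100.0 * len(top_pred & top_obs) / len(top_pred)


# Hypothetical yields for 20 plants: GBB-predicted vs field-measured.
predicted = {f"plant{i}": float(i) for i in range(20)}
observed = dict(predicted)
observed["plant18"] = -1.0  # ranks high by prediction but low in the field
print(selection_consistency(predicted, observed))  # 50.0
```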
Fifth, varieties developed by GBB can be readily fine-tuned by gene or genome editing if they have undesirable traits. This is possible because the genes controlling the objective traits are known, including their SNP/InDel mutations, expression variation, interactions or networks, and impacts on the performance of objective traits.

Sixth, genomic selection projects often need to train and validate prediction models with a portion of the targeted breeding population, known as a training population[8,18,25], which is costly and slows the breeding process. In contrast, when the NFAs of the genes are used for GBB, neither a training population nor a prediction model is needed for progeny performance prediction and selection. The total NFAs of the genes controlling objective traits were correlated with the performance of those traits in both cotton and maize (r = 0.85, p < 0.0001)[13−16] and can therefore be used directly for progeny selection.

Seventh, when the total NFAs of the genes controlling breeding objective traits are used for GBB, phenotype prediction for the objective traits, which otherwise requires supercomputing facilities, is dramatically simplified. A simple calculator suffices to compute the total NFAs of the genes controlling a breeding objective trait in an individual or hybrid. This total is the sum of the NFAs of the individual genes controlling the trait in that individual or hybrid (Table 2).

Eighth, a simple and rapid method for high-throughput genotyping can be developed for GBB by sequencing only the genes controlling the objective traits. With it, more than 1,000,000 progeny individuals can be genotyped at more than 2,000 genes controlling 10–30 agronomic traits in a single Illumina HiSeq 4000 run, at a cost of <$2.00 per individual for progeny selection (in preparation). Liu et al.[12] and Zhang et al.[13] showed that 125 key genes controlling cotton fiber length and 150 key genes controlling maize grain yield were sufficient for accurate phenotype prediction of those two traits, respectively. This genotyping method substantially reduces the cost of progeny selection relative to genomic selection, which often uses tens to hundreds of thousands of omic features. It also allows superior individuals to be identified in the progeny pool as early as the F2 generation. Such individuals are homozygous at non-heterosis genes but heterozygous at heterosis genes among the genes controlling the objective and other agronomic traits. Identifying them this early substantially accelerates development of the selected progeny into a pure-line variety. Keeping the heterosis genes of the objective trait in the heterozygous state is a major advantage for a new variety, because their heterosis can be exploited for further crop improvement and increased production.

Finally, and importantly, Liu et al.[16] studied the genetic potential of advanced cotton breeding lines using 226 GFL (Gossypium fiber length) genes. They found that the current best cotton varieties and advanced breeding lines contain only approximately 52% of the total NFAs of the genes controlling fiber length. Using the cotton GFL genes and breeding lines, they trained linear and non-linear models. These models predicted that GBB could further improve the current best varieties or advanced breeding lines by up to 118% (Fig. 4) if the favorable alleles and heterosis genotypes of all 226 GFL genes were incorporated into a new variety[16]. GBB therefore holds promise for further large-scale improvement of crops and livestock and for enhanced food and feed production.

Table 1.
Progeny selection by gene-based breeding (GBB) versus phenotypic selection (PS) in conventional breeding, for inbred line breeding in maize[13]. GBB selected the top 10% of plants with the highest grain yields predicted with ZmINGY genes. PS selected the top 10% of plants with the highest grain yields determined by replicated field trials.
Consistency of GBB with PS, by the ZmINGY genic dataset(s) used for prediction:

| Field trial | I | II | III | I + II | I + III | II + III | I + II + III |
|---|---|---|---|---|---|---|---|
| Halfway, Texas, 2010 | 40.0% | 50.0% | 66.7% | 100.0% | 66.7% | 100.0% | 100.0% |
| College Station, Texas, 2010 | 41.2% | 33.3% | 55.6% | 80.0% | 100.0% | 100.0% | 100.0% |

Dataset key:

- I: number of favorable alleles (NFAs) of the 27 SNP/InDel-containing ZmINGY genes;
- II: SNPs/InDels of the 27 SNP/InDel-containing ZmINGY genes;
- III: transcript expression of the 150 key ZmINGY genes.

When grain yields predicted with two or all three genic datasets of the ZmINGY genes were used jointly for progeny selection, up to 100% of the top 10% of plants selected on predicted grain yield matched those selected on grain yield determined by replicated field trials. Halfway, Texas and College Station, Texas represent two different agricultural ecosystems and climate zones in the USA.

Table 2. Statistics of NFAs of each gene and total NFAs of all genes controlling an agronomic trait[15].
| Statistical analysis (ANOVA and LSD) | Effect | NFAs of genotype AA | NFAs of genotype Aa | NFAs of genotype aa |
|---|---|---|---|---|
| AA > Aa > aa | Additive | 2 | 1 | 0 |
| Aa = AA > aa | Complete dominant | 2 | 2 | 0 |
| Aa > AA > aa | Over-dominant | 2 | 3 | 0 |

Allele 'A' is the favorable allele over allele 'a' when 'AA' is significantly larger than 'aa' (p ≤ 0.05). The total NFAs of all genes controlling an agronomic trait in an individual is calculated as $y = \sum_{i=1}^{n} x_i$. Here $y$ is the total NFAs of the genes controlling the trait, $x_i$ is the NFAs of gene $i$ (0, 1, 2, or 3), and $n$ is the number of genes controlling the trait.

Figure 4.
Genetic potential of current cotton varieties or advanced breeding lines with a fiber length of 33.8 mm, predicted using the total NFAs of the 226 GFL genes in advanced cotton breeding plants or lines[16].

(a) Training of linear and non-linear models using two inputs: the fiber lengths of 198 advanced breeding lines, determined by multiple replicated field trials, and the total NFAs of their 226 GFL genes controlling fiber length.

(b) Prediction of the genetic potential of the current cotton variety or advanced breeding line with the longest fiber length, using the trained linear and non-linear models, respectively.

The current best cotton varieties or advanced breeding lines, with a fiber length of 33.8 mm, contained only 52% of the 728 total NFAs of the 226 GFL genes. They could potentially be further improved by up to 118% (linear model) or 73% (non-linear model) if all 728 NFAs of the 226 GFL genes were incorporated into a new variety through GBB.
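Table 2 and the Figure 4 analysis can be sketched in a few lines of Python. The sketch below does three things:

- scores each gene from its genotype and effect type using the Table 2 values;
- sums those scores into the total NFAs y;
- fits an ordinary least-squares line of fiber length on total NFAs to extrapolate toward a higher total.

It assumes one scored site per gene and a linear model only. The non-linear model of Liu et al.[16] is not reproduced, and all gene IDs, genotypes, and training pairs are hypothetical.

```python
from typing import Dict, List, Tuple

# NFAs per genotype for each gene effect type, as listed in Table 2.
NFA_SCORES: Dict[str, Dict[str, int]] = {
    "additive": {"AA": 2, "Aa": 1, "aa": 0},
    "complete_dominant": {"AA": 2, "Aa": 2, "aa": 0},
    "over_dominant": {"AA": 2, "Aa": 3, "aa": 0},
}


def total_nfa(genotypes: Dict[str, str], effects: Dict[str, str]) -> int:
    """y = sum of x_i, where x_i is the Table 2 score of gene i's genotype."""
    return sum(NFA_SCORES[effects[gene]][gt] for gene, gt in genotypes.items())


def max_nfa(effects: Dict[str, str]) -> int:
    """Highest attainable total NFAs: the best genotype score for every gene."""
    return sum(max(NFA_SCORES[effect].values()) for effect in effects.values())


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical four-gene trait and one plant's genotypes.
effects = {"g1": "additive", "g2": "complete_dominant", "g3": "over_dominant", "g4": "additive"}
plant = {"g1": "Aa", "g2": "Aa", "g3": "Aa", "g4": "aa"}
y = total_nfa(plant, effects)
print(y, max_nfa(effects), f"{100 * y / max_nfa(effects):.0f}% of maximum")  # 6 9 67% of maximum

# Hypothetical training pairs (total NFAs, fiber length in mm) for breeding lines.
nfas = [300.0, 320.0, 340.0, 360.0, 380.0]
lengths = [29.0, 30.2, 31.1, 32.5, 33.6]
a, b = fit_line(nfas, lengths)
print(f"predicted length at a hypothetical 450 NFAs: {a + b * 450:.1f} mm")
```

Extrapolating a straight line far beyond the training range is a strong assumption. This is presumably one reason the source reports a lower ceiling for its non-linear model than for its linear one.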
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
About this article
Cite this article
Zhang HB. 2024. Gene-based Breeding (GBB), a novel discipline of biological science and technology for plant and animal breeding. Tropical Plants 3: e005 doi: 10.48130/tp-0024-0005
- Received: 19 December 2023
- Accepted: 15 January 2024
- Published online: 20 February 2024
Abstract: Gene-based breeding (GBB) is an innovative technology and science for plant and animal breeding. Studies have shown that GBB is extremely powerful, predictable, accelerated, and cost-efficient for both pure-line and hybrid variety breeding. Moreover, the concepts, principles, techniques, and methodologies developed and used for GBB are also applicable in two further areas. The first is molecular precision agriculture, such as gene-based agriculture. The second is molecular precision medicine in humans and animals, such as gene-based health, gene-based clinics, and gene-based medicine. Research, development, and applications of GBB for plant and animal breeding therefore promise to promote substantial genetic improvement of crops and livestock and enhanced agricultural production. They also promise to improve current phenotypic medicine and to support its transition to genotypic medicine in humans and animals.
Key words:
- Gene-based Breeding /
- Molecular breeding /
- Genes controlling agronomic trait /
- Plant /
- Animal