CN111524545A - Method and apparatus for whole genome selective breeding - Google Patents

Method and apparatus for whole genome selective breeding

Info

Publication number
CN111524545A
Authority
CN
China
Prior art keywords
genome
model
analysis
breeding
bayesian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010366270.XA
Other languages
Chinese (zh)
Other versions
CN111524545B (en)
Inventor
喻宇烨
梁齐齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Novogene Biological Information Technology Co ltd
Original Assignee
Tianjin Novogene Biological Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Novogene Biological Information Technology Co ltd
Priority to CN202010366270.XA
Publication of CN111524545A
Application granted
Publication of CN111524545B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B10/00 ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Animal Behavior & Ethology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method and an apparatus for whole genome selective breeding. The method comprises: obtaining markers in a training population that are significantly associated with a target phenotype; calculating a genomic estimated breeding value for each individual in the breeding population using a plurality of whole genome selection prediction models, according to the training population and the markers; and, with individuals ordered from high to low genomic estimated breeding value, selecting as breeding material those individuals that rank within a predetermined top number in every one of the prediction models. Genomic estimated breeding values are calculated by integrating a plurality of models, the results of the models are intersected, and individuals with high breeding values in all of the models are selected as breeding material, which greatly improves the accuracy of the result. The method can adapt to most material backgrounds, fills the gap in genome selection analysis on supercomputers, improves the effect of breeding selection and promotes the progress of breeding.

Description

Method and apparatus for whole genome selective breeding
Technical Field
The invention relates to the field of molecular breeding, in particular to a method and a device for whole genome selective breeding.
Background
The history of selective breeding has progressed from empirical breeding to the exploration of breeding theories and methods, giving rise to selection theory, pure line theory, backcross breeding, recurrent selection, mutation breeding, single seed descent and ideal plant type. It then advanced to marker-assisted selection breeding, in which a variety of markers have been explored, such as amplified fragment length polymorphism (AFLP), microsatellite (SSR) and single nucleotide polymorphism (SNP) marker-assisted selection. With the development of sequencing technology, sequencing throughput keeps rising, cost keeps falling and computing power keeps improving, which creates the technical conditions for a new breeding technology and has given rise to the wave of genome selection (GS) breeding.
Genome selective breeding can effectively overcome limiting factors such as traits that are difficult to measure, a large element of chance, long time consumption and high technical difficulty, and so accelerates the pace of breeding. It is a marker-assisted selection breeding method that uses high-density molecular genetic markers covering the entire genome.
The best-known software currently offering genome selection (GS) analysis is iPat. Its interface is friendly, but it provides only three GS models: genomic best linear unbiased prediction (GBLUP), ridge regression best linear unbiased prediction (RRBLUP) and Bayesian ridge regression (BRR).
However, for companies with a demand for rapid breeding, the existing genome selection analysis has low efficiency and relatively low accuracy of analysis results, and thus cannot meet the demand.
Disclosure of Invention
The invention mainly aims to provide a method and a device for whole genome selective breeding, so as to solve the problem of low accuracy of analysis results in the prior art.
In order to achieve the above object, according to one aspect of the present invention, there is provided a method of whole genome selective breeding, the method comprising: obtaining markers in a training population that are significantly associated with a target phenotype; calculating a genomic estimated breeding value for each individual in the breeding population using a plurality of whole genome selection prediction models, according to the training population and the markers; and, with individuals ordered from high to low genomic estimated breeding value, selecting as breeding material those individuals that rank within a predetermined top number in every one of the plurality of whole genome selection prediction models.
Further, the plurality of whole genome selection prediction models includes at least 4 of: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model.
Further, when the plurality of whole genome selection prediction models includes at least 3 of a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model, calculating the genomic estimated breeding value of each individual in the breeding population using the plurality of whole genome selection prediction models comprises: evaluating the accuracy of the plurality of whole genome selection prediction models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; calculating the effect value of each marker using the one or more whole genome selection prediction models that meet the accuracy requirement; and calculating the genomic estimated breeding value of each individual in the breeding population using the effect values of the markers.
Further, obtaining markers in the training population that are significantly associated with the target phenotype comprises: performing whole genome association analysis on sequencing data of the training population, derived from a gene chip or from genome resequencing, to obtain markers significantly associated with the target phenotype.
Further, performing whole genome association analysis on the sequencing data to obtain markers significantly associated with the target phenotype comprises: performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and performing whole genome association analysis according to the results of the comprehensive analysis, thereby obtaining markers significantly associated with the target phenotype.
Further, performing the comprehensive analysis of the sequencing data and performing whole genome association analysis according to its results, thereby obtaining markers significantly associated with the target phenotype, comprises: detecting whether the quantitative trait phenotype in the sequencing data follows a normal or a skewed distribution, and rejecting extreme phenotypes that deviate from the leverage value; calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis model as a fixed effect; filtering the markers of the whole genome by linkage disequilibrium according to the decay distance, to remove markers that cause multicollinearity; calculating the genetic distance between individuals in the training population, and adding the genetic distance to the whole genome association analysis model as a random effect; and calculating the association between the quantitative trait phenotype and the markers of the whole genome using the whole genome association analysis model, thereby selecting markers significantly associated with the target phenotype. Preferably, the whole genome association analysis model is a mixed linear model.
In order to achieve the above object, according to one aspect of the present invention, there is provided an apparatus for whole genome selective breeding, the apparatus comprising an acquisition module, a breeding value estimation module and a selection module, wherein: the acquisition module is used for obtaining markers in a training population that are significantly associated with a target phenotype; the breeding value estimation module is used for calculating a genomic estimated breeding value for each individual in the breeding population using a plurality of whole genome selection prediction models, according to the training population and the markers; and the selection module is used for selecting as breeding material, with individuals ordered from high to low genomic estimated breeding value, those individuals that rank within a predetermined top number in every one of the plurality of whole genome selection prediction models.
Further, the plurality of whole genome selection prediction models includes at least 4 of: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model.
Further, when the plurality of whole genome selection prediction models includes at least 3 of a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model, the breeding value estimation module comprises: a model accuracy evaluation module, used for evaluating the accuracy of the plurality of whole genome selection prediction models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; an effect value calculation module, used for calculating the effect value of each marker using the one or more whole genome selection prediction models that meet the accuracy requirement; and a breeding value estimation submodule, used for calculating the genomic estimated breeding value of each individual in the breeding population using the effect values of the markers.
Further, the acquisition module comprises: a whole genome association analysis module, used for performing whole genome association analysis on sequencing data of the training population, derived from a gene chip or from genome resequencing, to obtain markers significantly associated with the target phenotype.
Further, the whole genome association analysis module comprises: a comprehensive analysis module, used for performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and a whole genome association analysis submodule, used for performing whole genome association analysis according to the results of the comprehensive analysis to obtain markers significantly associated with the target phenotype.
Further, the whole genome association analysis module comprises: a phenotype distribution analysis module, used for detecting whether the quantitative trait phenotype in the sequencing data follows a normal or a skewed distribution and rejecting extreme phenotypes that deviate from the leverage value; a population structure analysis module, used for calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis submodule as a fixed effect; a linkage disequilibrium analysis module, used for filtering the markers of the whole genome by linkage disequilibrium according to the decay distance and removing markers that cause multicollinearity; a genetic relationship analysis module, used for calculating the genetic distance between individuals in the training population and adding the genetic distance to the whole genome association analysis submodule as a random effect; and the whole genome association analysis submodule, used for calculating the association between the quantitative trait phenotype and the markers of the whole genome, thereby selecting markers significantly associated with the target phenotype. Preferably, the whole genome association analysis submodule uses a mixed linear model.
In order to achieve the above object, according to one aspect of the present invention, there is provided a storage medium comprising a stored program, wherein the program is executed to control a device on which the storage medium is located to perform any one of the above-mentioned methods for whole genome selective breeding.
To achieve the above object, according to one aspect of the present invention, there is provided a processor for running a program, wherein the program, when run, performs any one of the above methods of whole genome selective breeding.
By applying the technical scheme of the invention, genomic estimated breeding values are calculated by integrating a plurality of models, the results of the models are intersected, and individuals with high breeding values in all of the models are selected as breeding material, which greatly improves the accuracy of the result. In addition, the method can identify the best of the models and use it to predict the best breeding material, which further improves the accuracy of genome selective breeding. The method can adapt to most material backgrounds, fills the gap in genome selection analysis on supercomputers, improves the effect of breeding selection and promotes the progress of breeding.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow diagram illustrating a method of whole genome selective breeding according to a preferred embodiment 1 of the present invention;
FIG. 2 is a detailed flow diagram of a method of whole genome selective breeding according to the preferred embodiment 2 of the present invention;
FIGS. 3A to 3H6 are schematic diagrams showing the results of each step in the detailed flow of a method of whole genome selective breeding provided according to preferred embodiment 3 of the present invention, wherein FIG. 3A shows the results of the phenotypic characteristic analysis step, FIG. 3B shows the results of the population structure analysis step, FIG. 3C shows the results of the linkage disequilibrium analysis step, FIG. 3D shows the results of the genetic relationship analysis step, FIG. 3E shows the results of the whole genome association analysis step, FIG. 3F shows the results of the model evaluation and selection step, FIGS. 3G1 to 3G6 show the results of the marker effect analysis step for the 6 two-step models, and FIGS. 3H1 to 3H6 show the results of the genomic estimated breeding value analysis step for the 6 two-step models;
FIGS. 4A to 4D show the results of a whole genome selective breeding analysis provided in embodiment 4 of the present invention, wherein FIG. 4A shows the model prediction accuracy results, among which BB has the highest value, FIG. 4B shows the marker effect values calculated by the rrBLUP model, FIG. 4C shows the genomic estimated breeding values calculated by the rrBLUP model, and FIG. 4D shows a Venn diagram of the counts obtained when the 7 prediction models are combined with each other for selective breeding;
FIG. 5 is a schematic structural diagram of a whole genome selective breeding apparatus provided in example 4 of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail with reference to examples.
Interpretation of terms:
Genome-wide selection (GS) breeding: estimating the effects of all markers or haplotypes across the whole genome to obtain a genomic estimated breeding value. The greatest difference from traditional marker-assisted selection is that whole genome selection does not rely solely on a set of significant markers; instead, all markers in a population are analyzed jointly to predict individual breeding values. Compared with traditional marker-assisted selection, whole genome selection offers two major breakthroughs: first, the parental populations used for genome mapping can be applied directly to breeding; second, it is better suited to improving quantitative traits controlled by multiple genes of small effect.
Genomic estimated breeding value (GEBV): obtained by detecting markers covering the whole genome and using genome-level genetic information to evaluate individuals genetically, so that breeding values are estimated with higher accuracy. It allows early selection for traits that are difficult to measure, shortens the generation interval, accelerates the breeding process and thereby saves a large amount of breeding cost.
As mentioned in the background art, the existing genome selective breeding method still has the defect of low accuracy of the prediction result, and in order to further improve the accuracy of the prediction, the application improves the existing whole genome selective breeding method.
Example 1
This example provides a method of whole genome selective breeding, as shown in fig. 1, comprising:
step S101, obtaining markers in a training population that are significantly associated with a target phenotype;
step S102, calculating a genomic estimated breeding value for each individual in the breeding population using a plurality of whole genome selection prediction models, according to the training population and the markers;
step S103, with individuals ordered from high to low genomic estimated breeding value, selecting as breeding material those individuals that rank within a predetermined top number in every one of the plurality of whole genome selection prediction models.
In this method of whole genome selective breeding, the genomic estimated breeding values of the breeding population are calculated by integrating a plurality of prediction models, and the results of the models are then intersected to locate individual materials with high breeding values; that is, individuals with high breeding values in the results of all prediction models are selected as breeding material, which greatly improves the accuracy of the result. The method can also identify the best of the prediction models and use it to predict the best breeding material, which further improves the accuracy of genome selective breeding. The method can adapt to most material backgrounds, fills the gap in genome selection (GS) analysis on supercomputers, and helps to improve the effect of breeding selection and promote the progress of breeding.
The predetermined number of top-ranked individuals selected as breeding material varies with the species and with the size of the population to be selected; in some cases the selection may be made by proportion, such as the top 3%, 5%, 8% or 10% of individuals.
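The selection rule of step S103 can be sketched as a set intersection. This is a minimal illustration, not the patented implementation: the function names, the model labels and the GEBV numbers are all hypothetical, and the top fraction (40% here) is chosen only to make the toy example small.

```python
from typing import Dict, List, Set


def top_ranked(gebv: Dict[str, float], fraction: float) -> Set[str]:
    """Return the IDs ranked in the top `fraction` by GEBV, high to low."""
    n_keep = max(1, int(len(gebv) * fraction))
    ranked = sorted(gebv, key=gebv.get, reverse=True)
    return set(ranked[:n_keep])


def select_breeding_material(
    gebv_by_model: Dict[str, Dict[str, float]], fraction: float
) -> List[str]:
    """Keep only the individuals that are top-ranked in every model."""
    top_sets = [top_ranked(gebv, fraction) for gebv in gebv_by_model.values()]
    return sorted(set.intersection(*top_sets))


# Hypothetical GEBVs for five individuals under three prediction models.
gebv_by_model = {
    "GBLUP": {"id1": 2.1, "id2": 1.9, "id3": 0.4, "id4": -0.2, "id5": -1.0},
    "RRBLUP": {"id1": 1.8, "id2": 2.0, "id3": 0.1, "id4": 0.3, "id5": -0.9},
    "BayesB": {"id1": 2.5, "id2": 0.2, "id3": 1.7, "id4": -0.1, "id5": -1.2},
}
selected = select_breeding_material(gebv_by_model, fraction=0.4)
print(selected)
```

Only individuals that appear in the top set of every model survive the intersection, so adding more models can only shrink or preserve the selected set.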
Further prediction models can be added as needed to the three existing models (genomic best linear unbiased prediction (GBLUP), ridge regression best linear unbiased prediction (RRBLUP) and Bayesian ridge regression (BRR)). The models in this application include, but are not limited to: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model. Preferably, the plurality of models comprises at least 4 of the above, more preferably 5, 6 or 7. In some cases, new prediction models constructed for different species and different traits may also be incorporated into the prediction models described above.
Note that the markers used in the present application are SNP markers on the genome, but the marker type may be SNP markers from WGS sequencing, INDEL markers, or SNP markers from GBS sequencing, RAD sequencing or chip data.
Among the above whole genome selection prediction models, ridge regression best linear unbiased prediction (RRBLUP) and genomic best linear unbiased prediction (GBLUP) belong to the category of penalized methods. The Bayesian lasso (BL), Bayesian A (BA), Bayesian B (BB), Bayesian C (BC) and Bayesian ridge regression (BRR) models belong to the category of Bayesian methods. GBLUP is a one-step method, and the other 6 models are two-step methods. A one-step method obtains the genomic estimated breeding value (GEBV) directly and yields no single nucleotide polymorphism (SNP) marker effect values. A two-step method first obtains the SNP marker effect values and then obtains the GEBV from the marker effect value matrix and the genetic (genotype) matrix, so that a ranking of the contribution of genomic regions to the GEBV is also obtained.
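The two-step idea can be sketched with plain ridge regression standing in for a model of the RRBLUP type. This is an assumption-laden illustration: the shrinkage parameter `lam` is fixed by hand here (real RRBLUP software usually estimates it from variance components), the data are simulated, and all names are hypothetical.

```python
import numpy as np


def ridge_marker_effects(genotypes, phenotypes, lam):
    """Step 1: estimate one effect value per marker by ridge regression."""
    marker_means = genotypes.mean(axis=0)
    centered = genotypes - marker_means
    lhs = centered.T @ centered + lam * np.eye(genotypes.shape[1])
    rhs = centered.T @ (phenotypes - phenotypes.mean())
    return np.linalg.solve(lhs, rhs), marker_means


def gebv_from_effects(genotypes, effects, marker_means):
    """Step 2: GEBV = centered genotype matrix times marker effect vector."""
    return (genotypes - marker_means) @ effects


# Simulated data: genotypes coded 0/1/2, 20 markers (all values hypothetical).
rng = np.random.default_rng(0)
train_geno = rng.integers(0, 3, size=(60, 20)).astype(float)
breed_geno = rng.integers(0, 3, size=(30, 20)).astype(float)
true_effects = rng.normal(size=20)
train_pheno = train_geno @ true_effects + rng.normal(scale=0.1, size=60)

effects, marker_means = ridge_marker_effects(train_geno, train_pheno, lam=1.0)
breed_gebv = gebv_from_effects(breed_geno, effects, marker_means)

# Markers ordered by absolute effect: a simple contribution ranking.
marker_ranking = np.argsort(-np.abs(effects))
```

Because the marker effects are explicit, the same vector serves both to score new individuals and to rank markers, which is what a one-step method such as GBLUP does not provide.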
Therefore, when the plurality of whole genome selection prediction models used for combined selective breeding includes the ridge regression best linear unbiased prediction model, the Bayesian lasso model, the Bayesian A model, the Bayesian B model, the Bayesian C model and the Bayesian ridge regression model, calculating the genomic estimated breeding value of each individual in the breeding population using the plurality of models comprises: evaluating the accuracy of the plurality of whole genome selection prediction models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; selecting one or more whole genome selection prediction models according to the model accuracy results and calculating the effect value of each marker; and calculating the genomic estimated breeding value of each individual in the breeding population using the effect values of the markers.
Once statistically significant markers have been selected, these 6 models use the relationship between the significantly associated markers and the phenotype in the training population to evaluate the accuracy of each prediction model and of its predicted genomic estimated breeding values, so as to judge whether the corresponding model is suitable for the training population. The evaluation standard or requirement can be set reasonably according to actual needs, depending on the research project and research purpose. Models that pass the accuracy evaluation can then be used to calculate the genomic estimated breeding values of the breeding population.
Specifically, cross-validation is generally used to evaluate the accuracy of each prediction model. The training population is divided into two parts: one part serves as the training set (for example 70%, 75% or 80%) and the other as the test set (for example 30%, 25% or 20%). The training set is used to build the models and the test set is used to test whether each resulting prediction model is accurate, so that models are screened for the subsequent calculation of genomic estimated breeding values and are ensured to have relatively high accuracy. The ratio of training set to test set is determined by project conditions, species type and sample number.
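A hold-out evaluation of this kind can be sketched as follows. The accuracy measure used here, the Pearson correlation between predicted and observed phenotypes in the test set, is a common choice in genome selection but is an assumption, since the text does not name a specific metric. The ridge model, the 0.5 threshold and all names are likewise hypothetical.

```python
import numpy as np


def fit_ridge(genotypes, phenotypes, lam=1.0):
    """Fit a ridge model and return a prediction function (illustrative model)."""
    g_mean, p_mean = genotypes.mean(axis=0), phenotypes.mean()
    centered = genotypes - g_mean
    lhs = centered.T @ centered + lam * np.eye(genotypes.shape[1])
    beta = np.linalg.solve(lhs, centered.T @ (phenotypes - p_mean))
    return lambda new_genotypes: p_mean + (new_genotypes - g_mean) @ beta


def holdout_accuracy(genotypes, phenotypes, fit, train_fraction=0.8, seed=0):
    """Split the training population, fit on one part, score on the other."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(phenotypes))
    n_train = int(round(train_fraction * len(phenotypes)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    predict = fit(genotypes[train_idx], phenotypes[train_idx])
    predicted = predict(genotypes[test_idx])
    return float(np.corrcoef(predicted, phenotypes[test_idx])[0, 1])


def models_meeting_requirement(accuracies, threshold):
    """Keep the names of models whose accuracy reaches the threshold."""
    return [name for name, acc in accuracies.items() if acc >= threshold]


# Simulated training population: 100 individuals, 15 markers coded 0/1/2.
rng = np.random.default_rng(1)
geno = rng.integers(0, 3, size=(100, 15)).astype(float)
pheno = geno @ rng.normal(size=15) + rng.normal(scale=0.5, size=100)

accuracy = holdout_accuracy(geno, pheno, fit_ridge, train_fraction=0.8)
passed = models_meeting_requirement({"ridge": accuracy}, threshold=0.5)
```

Repeating the split with different seeds, or using k folds, would give a more stable accuracy estimate than a single split.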
The markers significantly associated with the target phenotype may be derived from the pedigree background of the population, or statistically significant markers may be obtained by genome-wide association analysis (GWAS). It is only necessary to convert the data of the markers to be used into a format that each prediction model can accept and process.
This application focuses on screening statistically significant markers by whole genome association analysis. In a preferred embodiment, obtaining markers in the training population that are significantly associated with the target phenotype comprises: performing whole genome association analysis on sequencing data of the training population, derived from a gene chip or from genome resequencing, to obtain markers significantly associated with the target phenotype.
When whole genome association analysis is performed, in order to screen more accurately for statistically significant markers that are highly associated with the target trait, the influence of various factors on the statistical results is considered together, such as the phenotype distribution and structural characteristics of the training populations of different species and the linkage disequilibrium relationships among markers. In a preferred embodiment, performing whole genome association analysis on the sequencing data to obtain markers significantly associated with the target phenotype comprises: performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and performing whole genome association analysis according to the results of the comprehensive analysis, thereby obtaining markers significantly associated with the target phenotype. In a specific analysis, the screening conditions and the factors considered are set reasonably according to the species and the target trait.
In a more preferred embodiment, performing the comprehensive analysis of the sequencing data and performing whole genome association analysis according to its results, to obtain markers significantly associated with the target phenotype, comprises: detecting whether the quantitative trait phenotype in the sequencing data follows a normal or a skewed distribution, and rejecting extreme phenotypes that deviate from the leverage value; calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis model as a fixed effect; filtering the markers of the whole genome by linkage disequilibrium according to the decay distance, to remove markers that cause multicollinearity; calculating the genetic distance between individuals in the training population, and adding the genetic distance to the whole genome association analysis model as a random effect; and calculating the association between the quantitative trait phenotype and the markers of the whole genome using the whole genome association analysis model, thereby selecting markers significantly associated with the target phenotype.
In this preferred embodiment, phenotype characteristic analysis removes extreme phenotypes that deviate from the leverage value in good time, so that they do not affect the results of the subsequent association analysis. Population structure analysis makes it possible to assess how accurately breeding values can be estimated for another population with the same population structure: the smaller the difference in population structure, the higher the accuracy of the genome estimated breeding value calculated by the model. The number of subpopulations in the training population is determined by Principal Component Analysis (PCA) or population structure analysis (Structure), and the calculated result is added to the genome-wide association analysis (GWAS) model as a fixed effect, so that the influence of this factor on the association results is taken into account. The LD decay distances of the subpopulations and of the whole population are determined, and the high-density markers are filtered for Linkage Disequilibrium (LD) according to the decay distance, to prevent strong LD among SNPs from causing multicollinearity that reduces the fit of the GWAS model. The genetic distance between individuals in the training population is calculated and added to the GWAS model as a random effect, which improves the accuracy of the analysis.
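The linkage disequilibrium filtering step can be illustrated with a minimal sketch. It assumes genotypes coded as allele counts (0/1/2), markers sorted by chromosome and position, and a greedy rule that drops a marker when its squared correlation (r²) with an already retained marker within a distance window reaches a threshold. The function names, the window, and the threshold are hypothetical and are not taken from the application.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)


def ld_prune(markers, max_dist, r2_threshold):
    """Greedy LD filter over (chrom, pos, genotypes) tuples sorted by position.

    A marker is dropped if it lies within max_dist of a retained marker on
    the same chromosome and its r^2 with that marker is >= r2_threshold.
    """
    kept = []
    for chrom, pos, geno in markers:
        redundant = any(
            k_chrom == chrom
            and pos - k_pos <= max_dist
            and r_squared(k_geno, geno) >= r2_threshold
            for k_chrom, k_pos, k_geno in kept
        )
        if not redundant:
            kept.append((chrom, pos, geno))
    return kept


# Hypothetical toy data: the second marker duplicates the first one.
markers = [
    ("1", 100, [0, 1, 2, 1]),
    ("1", 150, [0, 1, 2, 1]),
    ("1", 200, [2, 0, 1, 1]),
    ("2", 120, [0, 1, 2, 1]),
]
pruned = ld_prune(markers, max_dist=1000, r2_threshold=0.8)
print([(chrom, pos) for chrom, pos, _ in pruned])
```

In practice the distance window would be set from the LD decay distance estimated for the population, as described above.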
There are also many methods for screening statistically significant markers by genome-wide association analysis, and one can be chosen reasonably according to actual needs. The present application preferably uses a mixed linear model to calculate the association between phenotype and markers, and retains the significantly associated markers as markers for the target phenotype.
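The text does not write out the mixed linear model. A commonly used formulation, given here as an assumption and not as the applicant's exact model, treats the tested marker and the population structure as fixed effects and the polygenic background as a random effect:

```latex
y = X\beta + Qv + Zu + e, \qquad
u \sim N\!\left(0,\ \sigma_g^{2} K\right), \qquad
e \sim N\!\left(0,\ \sigma_e^{2} I\right)
```

Here $y$ is the phenotype vector, $X\beta$ is the fixed effect of the marker being tested, $Qv$ is the fixed effect of population structure (for example principal components), $Zu$ is the random polygenic effect whose covariance is given by the kinship matrix $K$ derived from the genetic relationship analysis, and $e$ is the residual.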
Example 2
This example provides a method for carrying out genome selection breeding based on the results of genome-wide association analysis (GWAS), which comprises the following steps:
The data to be prepared are the phenotypes of the Training Population (TP), the genotypes of the TP (i.e. the marker genotypes), the GWAS result data of the TP, the Genome Selection (GS) prediction models, and the genotype data of the Breeding Population (BP). The GWAS result file requires only 3 columns: the chromosome number, the physical position of the Single Nucleotide Polymorphism (SNP), and the P-value of the SNP (i.e. the P-value of its association with the target phenotype).
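As an illustration of the three-column GWAS result file, the following sketch reads such a file and keeps the markers whose P-value falls below a threshold. The tab-delimited layout without a header line, the function name, and the threshold value are assumptions made for this example and are not specified in the application.

```python
import csv
import io


def significant_markers(gwas_file, p_threshold):
    """Return (chromosome, position) pairs whose P-value is below p_threshold.

    gwas_file is any iterable of tab-delimited lines with three columns:
    chromosome number, SNP physical position, SNP P-value.
    """
    selected = []
    for chrom, pos, p_value in csv.reader(gwas_file, delimiter="\t"):
        if float(p_value) < p_threshold:
            selected.append((chrom, int(pos)))
    return selected


# Hypothetical GWAS result content, held in memory instead of on disk.
example = io.StringIO("1\t10500\t3.2e-9\n1\t20733\t0.41\n2\t5120\t7.5e-8\n")
hits = significant_markers(example, p_threshold=1e-6)
print(hits)
```

The retained positions would then be used to extract the corresponding genotypes of the training and breeding populations.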
FIG. 2 shows the detailed flow of the genome selection breeding method of this embodiment, which requires two populations: a training population and a breeding population. After the training population is analyzed by genome-wide association analysis (GWAS), the statistically significant genotypes are extracted. The training population is then divided into a training set and a test set, on which the prediction models are evaluated and the best model is selected. The genetic effect values of all markers are calculated with the best model, and these effect values are used to estimate the genome breeding values.
Selecting the best prediction model, calculating the genetic effect values of all markers with it, and then calculating the genome estimated breeding value applies to some embodiments: for example, when the evaluation of several prediction models yields a single best model, or when the comparison shows that the prediction accuracy of the best model covers the results predicted by the other models. In these cases the genome estimated breeding value is calculated with the best prediction model.
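The model comparison can be sketched as follows. The application does not state how accuracy is measured; this example assumes the commonly used Pearson correlation between observed phenotypes and predicted values on the test set. The model names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_models(observed, predictions):
    """Sort models by prediction accuracy on the test set, best first.

    predictions maps a model name to its predicted values for the same
    test-set individuals, in the same order as observed.
    """
    scores = {name: pearson(observed, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical test-set phenotypes and predictions from two models.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "rrBLUP": [1.1, 1.9, 3.2, 3.8],
    "BB": [2.0, 1.0, 4.0, 3.0],
}
ranking = rank_models(observed, predictions)
best_model = ranking[0][0]
print(best_model, ranking)
```

Repeating the split into training and test sets several times and averaging the correlations would give a more stable ranking than a single split.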
In other embodiments, for example when the accuracies of several prediction models do not differ significantly, the accuracy of genome selection breeding can be further improved as follows: the accuracy of each prediction model is evaluated separately; the genome estimated breeding values of the breeding population are calculated separately with each prediction model that meets the accuracy requirement; the results of each prediction model are ranked by genome estimated breeding value from high to low; and the individuals that several prediction models jointly place among the high breeding values are selected as breeding materials.
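The joint selection described above amounts to intersecting the top-ranked individuals of each model. A minimal sketch, assuming each model's result is a mapping from individual ID to GEBV (the names and values below are hypothetical):

```python
def top_individuals(gebv, n_top):
    """Return the n_top individuals with the highest GEBV in one model."""
    ranked = sorted(gebv, key=gebv.get, reverse=True)
    return set(ranked[:n_top])


def co_selected(gebv_by_model, n_top):
    """Individuals that rank within the top n_top in every model."""
    top_sets = [top_individuals(gebv, n_top) for gebv in gebv_by_model.values()]
    return set.intersection(*top_sets)


# Hypothetical GEBVs of four breeding-population individuals in three models.
gebv_by_model = {
    "GBLUP": {"a": 2.1, "b": 1.7, "c": 0.3, "d": -0.5},
    "rrBLUP": {"a": 1.9, "b": 0.2, "c": 1.5, "d": -0.1},
    "BayesB": {"a": 2.4, "b": 1.1, "c": 1.3, "d": 0.0},
}
shared = co_selected(gebv_by_model, n_top=2)
print(shared)
```

Only individual "a" is in the top two of all three models, so it alone is retained; individuals favored by a single model are dropped.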
Example 3
This embodiment discloses a method of genome selection breeding for a certain trait in sheep. As shown in FIGS. 3A to 3H6, the method includes the following steps:
S1, phenotypic characterization: detecting whether the quantitative trait phenotype follows a normal or a skewed distribution, and removing in good time any extreme phenotypes that deviate from the leverage value; the result is shown in FIG. 3A;
S2, population stratification analysis: determining the number of subpopulations in the population by Principal Component Analysis (PCA) or population structure analysis (Structure), and adding the calculated result to the genome-wide association analysis (GWAS) model as a fixed effect; the result is shown in FIG. 3B;
S3, Linkage Disequilibrium (LD) analysis: determining the LD decay distances of the subpopulations and of the whole population, and filtering the high-density markers for LD according to the decay distance, to prevent strong LD among SNPs from causing multicollinearity that reduces the fit of the GWAS model; the result is shown in FIG. 3C;
S4, genetic relationship analysis: calculating the genetic distance between individuals in the population and adding it to the GWAS model as a random effect; the result is shown in FIG. 3D;
S5, genome-wide association analysis (GWAS): calculating the degree of association between the phenotype and the high-density markers with the GWAS model (i.e. a mixed linear model), and selecting the statistically significant markers; the result is shown in FIG. 3E;
S6, model evaluation and selection: evaluating the accuracy of each of the 6 two-step models (RRBLUP, BL, BA, BB, BC and BRR), and selecting the model suited to the project; the result is shown in FIG. 3F;
S7, marker effect analysis: calculating, for each of the 6 two-step models, the effect values of the statistically significant markers across the whole genome; the results are shown in FIGS. 3G1 to 3G6;
S8, Genome Estimated Breeding Value (GEBV) analysis: predicting the genome breeding value for each model from the effect values of the 6 two-step models obtained in S7; the genome optimal linear unbiased prediction (GBLUP) model is a one-step method that skips model evaluation and marker effect values and calculates the GEBV directly; individuals with high GEBV are selected for breeding. The results of the 7 models can also be considered together, and materials that several models jointly place among the high breeding values are selected for breeding, as shown in FIGS. 3H1 to 3H6.
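For the two-step models, steps S7 and S8 reduce to a weighted sum once the marker effects are known. The sketch below assumes additive genotype coding (0/1/2 allele counts) and omits the overall mean, which does not change the ranking. The effect values and individual IDs are hypothetical.

```python
def estimate_gebv(genotypes, effects):
    """GEBV of each individual as the sum of genotype code x marker effect.

    genotypes maps an individual ID to its marker codes (0/1/2), listed in
    the same marker order as effects.
    """
    return {
        individual: sum(code * effect for code, effect in zip(codes, effects))
        for individual, codes in genotypes.items()
    }


# Hypothetical marker effects from one two-step model (step S7).
effects = [0.5, -0.2, 0.1]
# Hypothetical genotypes of three breeding-population individuals.
genotypes = {
    "bp1": [2, 0, 1],
    "bp2": [0, 2, 2],
    "bp3": [1, 1, 0],
}
gebv = estimate_gebv(genotypes, effects)
# Rank candidates from high to low GEBV (step S8).
ranking = sorted(gebv, key=gebv.get, reverse=True)
print(ranking)
```

The same function can be applied to the effect values of each two-step model in turn, giving one ranking per model for the joint selection.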
Example 4
As shown in FIGS. 4A to 4D, this example presents the results of a genome selection breeding analysis of the pectoralis major weight ratio in chickens, based on high-density SNP marker data obtained by high-throughput sequencing on the Illumina platform; 2,010,956 SNPs and 519 samples were used for whole genome selective breeding.
FIG. 4A shows the accuracy results of model prediction, in which the BB model has the highest mode value; FIG. 4B shows the marker effect values calculated by the rrBLUP model; FIG. 4C shows the genome estimated breeding values calculated by the rrBLUP model. FIG. 4D is a Venn diagram of the numbers of samples selected by the 7 prediction models in combination, where each model contributes the top 5% of samples by GEBV. The number 7 at the center refers to the 7 samples selected in common by all 7 models, and the other numbers are the samples selected under the other combinations of the 7 models.
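The counts shown in a Venn diagram such as FIG. 4D can be derived by recording, for every selected sample, exactly which models selected it. A minimal sketch with three hypothetical models and sample IDs (the application uses 7 models and the top 5% of samples per model):

```python
from collections import Counter


def venn_region_counts(selected_by_model):
    """Count samples per exclusive Venn region.

    Each region is identified by the exact set of models that selected the
    sample, so every sample is counted once.
    """
    membership = {}
    for model, samples in selected_by_model.items():
        for sample in samples:
            membership.setdefault(sample, set()).add(model)
    return Counter(frozenset(models) for models in membership.values())


# Hypothetical top-ranked samples of three models.
selected_by_model = {
    "GBLUP": {"s1", "s2", "s3"},
    "rrBLUP": {"s1", "s2", "s4"},
    "BB": {"s1", "s5"},
}
counts = venn_region_counts(selected_by_model)
for models, n in counts.items():
    print(sorted(models), n)
```

The region keyed by all model names corresponds to the central number of the diagram, that is, the samples selected by every model.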
The prior art generally uses only one model for genome selection, with an accuracy of about 0.3. On the basis of testing the accuracy of the individual models, the embodiments of the present application select breeding material jointly with several models and eliminate the samples selected by only one model, which may be false positives. This reduces false positives while improving the accuracy of sample selection, and thus greatly improves breeding efficiency.
As can be seen from the above description, the above-described embodiments of the present invention achieve the following technical effects: genome estimated breeding value calculation is carried out on breeding populations by integrating a plurality of prediction models, and then individual materials with high breeding values are co-located by using calculation results of the plurality of models, namely, individuals with high breeding values in calculation results of all the prediction models are selected as breeding materials, so that the accuracy of results is greatly improved. Moreover, the method can also find out the best model from various prediction models to predict the best breeding material, thereby improving the accuracy of the genome selective breeding result.
The method can run multiple models on multiple phenotypes simultaneously and generate the genome selection results at the same time, thereby completing genome selection analysis efficiently. The method adapts to most material backgrounds, fills the gap in Genome Selection (GS) analysis on supercomputers, and helps to improve the effect of breeding selection and promote breeding progress.
Most existing methods run on desktop PCs (personal computers). By contrast, the method and apparatus of the present application handle both small and large data volumes and run mainly on a supercomputer: once the input files are adapted, the analysis is started with one-click operation, the number of CPUs used is adjusted automatically to improve running efficiency, and the 7 models can run simultaneously in the background. The results of the 7 models are plotted in detail. The method provided by the present application therefore offers considerable improvements in the amount of data that can be processed, the completeness of the models computed, the computational efficiency, and the presentation of results.
The method and apparatus of the present application are suitable for many types of samples and can analyze any type of data, for example genotypes from WGS, GBS, RAD or chips, and samples such as natural populations or pedigree populations, as long as the format is consistent.
The method and the device adopt the joint analysis of a plurality of models to jointly breed the sample, so that the accuracy is higher and the selectivity is richer.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily all required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be essentially or partially implemented in the form of software products, which are stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and include instructions for causing a computing device to execute the methods of the embodiments of the present invention or causing a processor to execute the methods of the embodiments of the present invention.
Example 5
This example provides an apparatus for whole genome selective breeding, as shown in fig. 5, the apparatus comprising: the system comprises an acquisition module 20, a breeding value estimation module 40 and a selection module 60, wherein the acquisition module 20 is used for acquiring markers which are obviously associated with target phenotypes in training groups; a breeding value estimation module 40, which is used for calculating the genome estimation breeding value of each individual in the breeding population by using various whole genome selection prediction models according to the training population and the markers; and a selection module 60 for selecting individuals, each ranked in a predetermined number in the plurality of genome-wide selection prediction models, as breeding materials in order of the genome estimated breeding value from high to low.
In the above apparatus, the plurality of genome wide selection prediction models include: at least 4 of a genome optimal linear unbiased prediction model, a ridge regression optimal linear unbiased estimation model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model, and a Bayesian ridge regression model.
In a preferred embodiment of the above apparatus, the plurality of whole genome selection prediction models includes at least 3 of a ridge regression optimal linear unbiased estimation model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model. Because these 6 models estimate the genome breeding value by a two-step method, the breeding value estimation module then comprises: a model accuracy evaluation module for evaluating the accuracy of the plurality of whole genome selection prediction models by using the significant associations between target phenotypes and markers in the training population, to obtain one or more whole genome selection prediction models meeting the accuracy requirement; an effect value calculation module for calculating the effect value of each marker by using the one or more whole genome selection prediction models meeting the accuracy requirement; and a breeding value estimation submodule for calculating the genome estimated breeding value of each individual in the breeding population by using the effect value of each marker.
In a preferred embodiment, the acquisition module includes: a whole genome association analysis module for performing whole genome association analysis on sequencing data of a training population derived from a gene chip or genome re-sequencing, so as to obtain markers significantly associated with the target phenotype.
In a preferred embodiment, the whole genome association analysis module comprises: a comprehensive analysis module for performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and a whole genome association analysis submodule for performing whole genome association analysis according to the results of the comprehensive analysis, so as to obtain markers significantly associated with the target phenotype.
In a preferred embodiment, the whole genome association analysis module comprises: a phenotype distribution analysis module for detecting whether the quantitative trait phenotypes in the sequencing data follow a normal or a skewed distribution, and removing extreme phenotypes that deviate from the leverage value; a population structure analysis module for calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis submodule as a fixed effect; a linkage disequilibrium analysis module for performing linkage disequilibrium filtering on the genome-wide markers according to the LD decay distance, to remove markers that cause multicollinearity; a genetic relationship analysis module for calculating the genetic distance between individuals in the training population and adding the genetic distance to the whole genome association analysis submodule as a random effect; and the whole genome association analysis submodule, for calculating the association between the quantitative trait phenotype and the genome-wide markers, so as to select markers significantly associated with the target phenotype.
Example 6
The present application also provides a storage medium comprising a stored program, wherein the program, when run, controls a device on which the storage medium is located to perform any of the above-described methods of whole genome selective breeding.
Example 7
The present application also provides a processor for running a program, wherein the program when run performs any of the methods of whole genome selection breeding.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in this application are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus the necessary hardware devices, such as detection instruments. Based on such understanding, the data processing part of the technical solution of the present application may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments of the present application or of some parts of the embodiments.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, multiprocessor systems, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
It will be apparent to those skilled in the art that some of the above-described modules or steps of the present application may be implemented in a general purpose computing device, they may be centralized on a single computing device or distributed over a network of multiple computing devices, and alternatively, they may be implemented in program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method of whole genome selection breeding, the method comprising:
obtaining markers in a training population that are significantly associated with a target phenotype;
calculating a genome estimated breeding value of each individual in a breeding population by utilizing a plurality of genome-wide selection prediction models according to the training population and the markers;
selecting individuals, each ranked in a predetermined number in the plurality of the genome-wide selection prediction models, as breeding materials in order of the genome estimated breeding value from high to low.
2. The method of claim 1, wherein the plurality of genome wide selection prediction models comprises: at least 4 of a genome optimal linear unbiased prediction model, a ridge regression optimal linear unbiased estimation model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model, and a Bayesian ridge regression model.
3. The method of claim 2, wherein calculating the genome estimated breeding value for each individual in the breeding population using the plurality of genome-wide selection prediction models comprises, when the plurality of genome-wide selection prediction models comprises at least 3 of a ridge regression optimal linear unbiased estimation model, a bayesian lasso model, a bayesian a model, a bayesian B model, a bayesian C model, and a bayesian ridge regression model:
evaluating the accuracy of the plurality of genome wide selection prediction models by using the significant associations between the target phenotypes and the markers in the training population to obtain one or more genome wide selection prediction models meeting the accuracy requirement;
calculating the effect value of each marker by using the one or more whole genome selection prediction models meeting the accuracy requirement;
calculating the estimated genome breeding value of each individual in the breeding population by using the effect value of each marker.
4. The method of any one of claims 1 to 3, wherein obtaining markers in the training population that are significantly associated with the target phenotype comprises:
performing whole genome association analysis on sequencing data of the training population derived from gene chips or genome re-sequencing, thereby obtaining markers significantly associated with the target phenotype.
5. The method of claim 4, wherein performing the genome-wide association analysis from the sequencing data to obtain markers significantly associated with the target phenotype comprises:
performing comprehensive analysis on sequencing data, wherein the comprehensive analysis comprises phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis;
performing the genome-wide association analysis according to the result of the comprehensive analysis, thereby obtaining a marker significantly associated with the target phenotype.
6. The method of claim 5, wherein performing a comprehensive analysis of sequencing data and performing the genome-wide association analysis based on the results of the comprehensive analysis to obtain markers significantly associated with the target phenotype comprises:
detecting whether the quantitative trait phenotypes in the sequencing data follow a normal or a skewed distribution, and removing extreme phenotypes that deviate from the leverage value;
calculating a population structure in the training population through principal component analysis or population structure analysis, and adding the population structure into a whole genome association analysis model as a fixed effect;
performing linkage disequilibrium filtering on the genome-wide markers according to the LD decay distance, to remove markers that cause multicollinearity;
calculating the genetic distance between individuals in the training population, and adding the genetic distance into the whole genome association analysis model as a random effect;
calculating the association between the quantitative trait phenotype and the genome-wide markers by using the whole genome association analysis model, so as to select the markers significantly associated with the target phenotype;
preferably, the whole genome association analysis model is a mixed linear model.
7. An apparatus for whole genome selective breeding, the apparatus comprising:
an acquisition module for acquiring markers in a training population that are significantly associated with a target phenotype;
a breeding value estimation module for calculating a genome estimated breeding value of each individual in a breeding population by using a plurality of whole genome selection prediction models according to the training population and the markers;
and a selection module for selecting individuals, each of which is ranked in a predetermined number in the plurality of genome-wide selection prediction models, as breeding materials in order of the genome estimated breeding values from high to low.
8. The apparatus of claim 7, wherein the plurality of genome wide selection prediction models comprises: at least 4 of a genome optimal linear unbiased prediction model, a ridge regression optimal linear unbiased estimation model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model, and a Bayesian ridge regression model.
9. The apparatus as claimed in claim 8, wherein when the plurality of genome-wide selection prediction models includes at least 3 of a ridge regression optimal linear unbiased estimation model, a bayesian lasso model, a bayesian a model, a bayesian B model, a bayesian C model and a bayesian ridge regression model, the breeding value estimation module comprises:
a model accuracy evaluation module for performing accuracy evaluation on the plurality of genome-wide selection prediction models by using the significant correlation between the target phenotype and the marker in the training population to obtain one or more genome-wide selection prediction models meeting accuracy requirements;
the effect value calculation module is used for calculating the effect value of each marker by utilizing the one or more whole genome selection prediction models meeting the accuracy requirement;
and a breeding value estimation submodule for calculating a genome estimated breeding value of each individual in the breeding population by using the effect value of each marker.
10. The apparatus of any one of claims 7 to 9, wherein the acquisition module comprises: a whole genome association analysis module for performing whole genome association analysis on sequencing data of the training population derived from a gene chip or genome re-sequencing, so as to obtain markers significantly associated with the target phenotype.
11. The apparatus of claim 10, wherein the genome wide association analysis module comprises:
a comprehensive analysis module for performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis;
and a whole genome association analysis submodule for performing the whole genome association analysis according to the results of the comprehensive analysis, so as to obtain markers significantly associated with the target phenotype.
12. The apparatus of claim 11, wherein the genome wide association analysis module comprises:
a phenotype distribution analysis module for detecting whether the quantitative trait phenotypes in the sequencing data follow a normal or a skewed distribution, and removing extreme phenotypes that deviate from the leverage value;
a population structure analysis module for calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis submodule as a fixed effect;
a linkage disequilibrium analysis module for performing linkage disequilibrium filtering on the genome-wide markers according to the LD decay distance, to remove markers that cause multicollinearity;
a genetic relationship analysis module for calculating the genetic distance between individuals in the training population and adding the genetic distance to the whole genome association analysis submodule as a random effect;
the whole genome association analysis submodule being used for calculating the association between the quantitative trait phenotype and the genome-wide markers, so as to select the markers significantly associated with the target phenotype;
preferably, the whole genome association analysis submodule uses a mixed linear model.
13. A storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method of any one of claims 1 to 6.
14. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 6.
CN202010366270.XA 2020-04-30 2020-04-30 Method and device for whole genome selective breeding Active CN111524545B (en)


Publications (2)

Publication Number Publication Date
CN111524545A true CN111524545A (en) 2020-08-11
CN111524545B CN111524545B (en) 2023-11-10


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914631A (en) * 2014-02-26 2014-07-09 中国农业大学 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip
CN105868584A (en) * 2016-05-23 2016-08-17 厦门胜芨科技有限公司 Method for performing whole genome selective breeding by selecting extreme character individual
CN106755441A (en) * 2016-12-29 2017-05-31 华南农业大学 Method for multi-trait pyramiding breeding of forest trees based on multi-trait genomic selection
CN107278877A (en) * 2017-07-25 2017-10-24 山东省农业科学院玉米研究所 Whole genome selective breeding method for corn seed-producing rate
US20190228839A1 (en) * 2015-07-23 2019-07-25 Limagrain Europe Improved computer implemented method for predicting true agronomical value of a plant
CN110610744A (en) * 2019-09-11 2019-12-24 华中农业大学 Efficient whole genome selection method capable of realizing parallel operation and high accuracy
CN110867208A (en) * 2019-11-29 2020-03-06 中国科学院海洋研究所 Method for improving whole genome selective breeding efficiency of aquatic animals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Ling (徐玲): "Research on the design of a low-density SNP chip for whole genome selection in beef Simmental cattle" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113373245A (en) * 2021-07-14 2021-09-10 广东海洋大学 Method for cultivating improved variety of pinctada martensii with golden yellow shell color character based on whole genome selection
CN115443907A (en) * 2022-07-26 2022-12-09 开封市农林科学研究院 High-yield large-fruit peanut hybridization combination selection method based on whole genome selection
CN115443907B (en) * 2022-07-26 2023-04-21 开封市农林科学研究院 High-yield large-fruit peanut hybrid combination selection method based on whole genome selection
CN116467596A (en) * 2023-04-11 2023-07-21 广州国家现代农业产业科技创新中心 Training method of rice grain length prediction model, morphology prediction method and apparatus
CN116467596B (en) * 2023-04-11 2024-03-26 广州国家现代农业产业科技创新中心 Training method of rice grain length prediction model, morphology prediction method and apparatus
CN116732222A (en) * 2023-05-26 2023-09-12 南京农业大学 Method for efficiently predicting salt tolerance of chrysanthemum based on whole genome
CN116732222B (en) * 2023-05-26 2024-03-15 南京农业大学 Method for efficiently predicting salt tolerance of chrysanthemum based on whole genome
CN117672360A (en) * 2024-01-30 2024-03-08 北京市农林科学院信息技术研究中心 Genome selection method, device, equipment and medium based on transfer learning
CN117672360B (en) * 2024-01-30 2024-06-11 北京市农林科学院信息技术研究中心 Genome selection method, device, equipment and medium based on transfer learning

Also Published As

Publication number Publication date
CN111524545B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111524545B (en) Method and device for whole genome selective breeding
Qanbari On the extent of linkage disequilibrium in the genome of farm animals
Kardos et al. Inferring individual inbreeding and demographic history from segments of identity by descent in Ficedula flycatcher genome sequences
Hozé et al. High-density marker imputation accuracy in sixteen French cattle breeds
Nietlisbach et al. Pedigree-based inbreeding coefficient explains more variation in fitness than heterozygosity at 160 microsatellites in a wild bird population
Sousa et al. Identifying loci under selection against gene flow in isolation-with-migration models
Chen et al. Using Mendelian inheritance to improve high-throughput SNP discovery
Kelly et al. The genomic signal of partial sweeps in Mimulus guttatus
CN106446597B (en) Multi-species feature selection and method for identifying unknown genes
Pérez-Enciso et al. Evaluating sequence-based genomic prediction with an efficient new simulator
CN105868584A (en) Method for performing whole genome selective breeding by selecting extreme character individual
CN114530198A (en) Screening method of SNP (single nucleotide polymorphism) sites for detecting sample pollution level and detection method of sample pollution level
Salmona et al. Inferring demographic history using genomic data
Balding et al. Population genetics of STR loci in Caucasians
CN110444253B (en) Method and system suitable for mixed pool gene positioning
Mollandin et al. An evaluation of the predictive performance and mapping power of the BayesR model for genomic prediction
Aylward et al. How methodological changes have influenced our understanding of population structure in threatened species: insights from tiger populations across India
Anastasiadi et al. Bioinformatic analysis for age prediction using epigenetic clocks: Application to fisheries management and conservation biology
Jiang et al. Fast Bayesian fine-mapping of 35 production, reproduction and body conformation traits with imputed sequences of 27K Holstein bulls
Chiquitto et al. Impact of sequencing technologies on long non-coding RNA computational identification
Hodel et al. Linking genome signatures of selection and adaptation in non-model plants: exploring potential and limitations in the angiosperm Amborella
US20220020449A1 (en) Vector-based haplotype identification
Noto et al. Ethnicity estimate 2018 white paper
Emma Huang et al. iDArTs: increasing the value of genomic resources at no cost
CN114496076B (en) Genome genetic layering joint analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant