CN111524545B - Method and device for whole genome selective breeding - Google Patents

Publication number
CN111524545B
Authority
CN
China
Prior art keywords
genome
breeding
model
analysis
whole genome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010366270.XA
Other languages
Chinese (zh)
Other versions
CN111524545A (en)
Inventor
喻宇烨
梁齐齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Novogene Biological Information Technology Co ltd
Original Assignee
Tianjin Novogene Biological Information Technology Co ltd

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B10/00 ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention provides a whole genome selective breeding method and a whole genome selective breeding device. The method comprises: obtaining markers in a training population that are significantly associated with a target phenotype; calculating, from the training population and the markers, a genomic estimated breeding value for each individual in a breeding population using a plurality of whole genome selection prediction models; and selecting as breeding material a predetermined number of individuals that rank at the top in every one of the prediction models when sorted by genomic estimated breeding value from high to low. Genomic estimated breeding values are calculated by integrating multiple models, and the results of those models are used to jointly locate individuals: those with high breeding values in all models are selected as breeding material, which greatly improves the accuracy of the result. The method adapts to most material backgrounds, fills a gap in genome selection analysis on supercomputers, improves the effect of breeding selection, and advances breeding.

Description

Method and device for whole genome selective breeding
Technical Field
The invention relates to the field of molecular breeding, in particular to a whole genome selective breeding method and a whole genome selective breeding device.
Background
The history of selective breeding has moved from experience-based breeding to the exploration of breeding theories and methods, in which pure-line theory was adopted together with backcross breeding, recurrent selection, mutation breeding, single-seed descent and ideal plant type breeding, and then to marker-assisted selection breeding, for which a variety of markers have been explored, such as amplified fragment length polymorphism (AFLP) markers, microsatellite (SSR) markers and single nucleotide polymorphism (SNP) markers. With the development of sequencing technology, sequencing throughput has risen, costs have fallen and computing power has improved continuously, creating the technical conditions for a new breeding technology and giving rise to a wave of genomic selection (Genomic Selection, GS) breeding.
Genome selective breeding can effectively overcome limitations such as traits that are difficult to measure, a large element of chance, long time consumption and high technical difficulty, and it accelerates the pace of breeding. It is a marker-assisted selection method that uses high-density molecular genetic markers covering the whole genome.
The best-known genome selection (GS) analysis software at present is Ipat. Its interface is user-friendly, but it offers only three GS models: genomic best linear unbiased prediction (GBLUP), ridge regression best linear unbiased prediction (RRBLUP) and Bayesian ridge regression (BRR).
However, for companies with rapid breeding requirements, the existing genome selection analysis has low efficiency, and the accuracy of analysis results is relatively low, so that the requirements cannot be met.
Disclosure of Invention
The invention mainly aims to provide a whole genome selective breeding method and device, which are used for solving the problem of low accuracy of analysis results in the prior art.
To achieve the above object, according to one aspect of the present invention, there is provided a whole genome selective breeding method comprising: obtaining markers in a training population that are significantly associated with a target phenotype; calculating, from the training population and the markers, a genomic estimated breeding value for each individual in a breeding population using a plurality of whole genome selection prediction models; and selecting as breeding material a predetermined number of individuals that rank at the top in every one of the plurality of whole genome selection prediction models when sorted by genomic estimated breeding value from high to low.
Further, the plurality of whole genome selection prediction models includes at least 4 of: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model.
Further, when the plurality of whole genome selection prediction models includes at least 3 of a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model, calculating the genomic estimated breeding value of each individual in the breeding population using the plurality of whole genome selection prediction models includes: evaluating the accuracy of the plurality of models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; calculating the effect value of each marker using the one or more models that meet the accuracy requirement; and calculating the genomic estimated breeding value of each individual in the breeding population using the effect value of each marker.
Further, obtaining markers in the training population that are significantly associated with the target phenotype comprises: performing whole genome association analysis on sequencing data of the training population, derived from a gene chip or genome re-sequencing, to obtain the markers that are significantly associated with the target phenotype.
Further, performing whole genome association analysis on the sequencing data to obtain the markers that are significantly associated with the target phenotype comprises: performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and performing whole genome association analysis according to the result of the comprehensive analysis, thereby obtaining the markers that are significantly associated with the target phenotype.
Further, performing the comprehensive analysis of the sequencing data and performing whole genome association analysis according to its result comprises: detecting whether the quantitative trait phenotypes in the sequencing data follow a normal distribution or a skewed distribution, and eliminating extreme phenotypes that deviate from the leverage value; calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis model as a fixed effect; performing linkage disequilibrium filtering on the whole-genome markers according to the decay distance, to remove markers that exhibit multicollinearity; calculating the genetic relationship distances among individuals in the training population, and adding them to the whole genome association analysis model as a random effect; and calculating the association between the quantitative trait phenotypes and the whole-genome markers using the whole genome association analysis model, thereby selecting the markers that are significantly associated with the target phenotype. Preferably, the whole genome association analysis model is a mixed linear model.
To achieve the above object, according to one aspect of the present invention, there is provided a device for whole genome selective breeding, the device comprising an acquisition module, a breeding value estimation module and a selection module, wherein: the acquisition module is used for obtaining markers in a training population that are significantly associated with a target phenotype; the breeding value estimation module is used for calculating, from the training population and the markers, a genomic estimated breeding value for each individual in a breeding population using a plurality of whole genome selection prediction models; and the selection module is used for selecting as breeding material a predetermined number of individuals that rank at the top in every one of the plurality of whole genome selection prediction models when sorted by genomic estimated breeding value from high to low.
Further, the plurality of whole genome selection prediction models includes at least 4 of: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model.
Further, when the plurality of whole genome selection prediction models includes at least 3 of a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model, the breeding value estimation module includes: a model accuracy evaluation module, used for evaluating the accuracy of the plurality of models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; an effect value calculation module, used for calculating the effect value of each marker using the one or more models that meet the accuracy requirement; and a breeding value estimation sub-module, used for calculating the genomic estimated breeding value of each individual in the breeding population using the effect value of each marker.
Further, the acquisition module includes: a whole genome association analysis module, used for performing whole genome association analysis on sequencing data of the training population, derived from a gene chip or genome re-sequencing, to obtain the markers that are significantly associated with the target phenotype.
Further, the whole genome association analysis module includes: a comprehensive analysis module, used for performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and a whole genome association analysis sub-module, used for performing whole genome association analysis according to the result of the comprehensive analysis, to obtain the markers that are significantly associated with the target phenotype.
Further, the whole genome association analysis module includes: a phenotype distribution analysis module, used for detecting whether the quantitative trait phenotypes in the sequencing data follow a normal distribution or a skewed distribution, and eliminating extreme phenotypes that deviate from the leverage value; a population structure analysis module, used for calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis sub-module as a fixed effect; a linkage disequilibrium analysis module, used for performing linkage disequilibrium filtering on the whole-genome markers according to the decay distance, to remove markers that exhibit multicollinearity; a genetic relationship analysis module, used for calculating the genetic relationship distances among individuals in the training population and adding them to the whole genome association analysis sub-module as a random effect; and the whole genome association analysis sub-module, used for calculating the association between the quantitative trait phenotypes and the whole-genome markers, thereby selecting the markers that are significantly associated with the target phenotype. Preferably, the whole genome association analysis sub-module uses a mixed linear model.
To achieve the above object, according to one aspect of the present application, there is provided a storage medium including a stored program, wherein, when the program is run, a device in which the storage medium is located is controlled to execute any one of the above whole genome selective breeding methods.
To achieve the above object, according to one aspect of the present application, there is provided a processor for running a program, wherein the program, when run, performs any one of the above whole genome selective breeding methods.
By applying the technical scheme of the application, genomic estimated breeding values are calculated by integrating multiple models, the results of those models are used to jointly locate individuals, and individuals with high breeding values in all models are selected as breeding material, which greatly improves the accuracy of the result. In addition, the method can identify the best of the multiple models and use it to predict the best breeding material, which further improves the accuracy of genome selective breeding results. The method disclosed by the application adapts to most material backgrounds, fills a gap in genome selection analysis on supercomputers, improves the effect of breeding selection, and advances breeding.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 shows a schematic flow diagram of a method for whole genome selective breeding according to the preferred embodiment 1 of the present application;
FIG. 2 shows a detailed flow diagram of a method for whole genome selective breeding according to the preferred embodiment 2 of the present application;
FIGS. 3A to 3H6 are schematic views showing the results of each step in a detailed flow of a whole genome selective breeding method according to the preferred embodiment 3 of the present application, wherein FIG. 3A shows the results of the phenotypic characteristic analysis step, FIG. 3B shows the results of the population structure analysis step, FIG. 3C shows the results of the linkage disequilibrium analysis step, FIG. 3D shows the results of the genetic relationship analysis step, FIG. 3E shows the results of the whole genome association analysis step, FIG. 3F shows the results of the model evaluation and selection step, FIGS. 3G1 to 3G6 show the results of the marker effect analysis step for the 6 two-step models, and FIGS. 3H1 to 3H6 show the results of the genomic estimated breeding value analysis step for the 6 two-step models;
FIGS. 4A to 4D show the whole genome selective breeding analysis results provided in embodiment 4 of the present application, wherein FIG. 4A shows that, among the model prediction accuracy results, the BB model has the highest mode value, FIG. 4B shows the marker effect values calculated by the rrBLUP model, FIG. 4C shows the genomic estimated breeding values calculated by the rrBLUP model, and FIG. 4D shows a Venn diagram of selective breeding using the 7 prediction models jointly;
FIG. 5 shows a schematic structural diagram of a whole genome selective breeding device according to embodiment 4 of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The present application will be described in detail with reference to examples.
Term interpretation:
Whole genome selection (Genomic selection, GS) breeding: the effects of all markers or haplotypes across the whole genome are estimated to obtain a genomic estimated breeding value. The biggest difference from traditional marker-assisted selection is that whole genome selection does not rely only on a set of significant markers, but jointly analyzes all markers in a population to predict individual breeding values. Compared with traditional marker-assisted selection, whole genome selection offers two breakthroughs: first, a genome-oriented parent population can be applied directly to breeding; second, it is better suited to improving quantitative traits controlled by many genes of small effect.
Genomic estimated breeding value (Genomic Estimated Breeding Value, GEBV): by detecting markers that cover the whole genome, individuals are genetically evaluated using genome-level genetic information, which yields more accurate breeding value estimates. Traits that are difficult to measure can be selected at an early stage, which shortens the generation interval, accelerates the breeding process and thereby saves substantial breeding cost.
As mentioned in the background art, the existing genome selective breeding method still has the defect of low prediction result accuracy, and in order to further improve the prediction accuracy, the application improves the existing whole genome selective breeding method.
Example 1
The embodiment provides a whole genome selective breeding method, as shown in fig. 1, which comprises the following steps:
Step S101, obtaining markers in a training population that are significantly associated with a target phenotype;
Step S102, calculating, from the training population and the markers, a genomic estimated breeding value for each individual in a breeding population using a plurality of whole genome selection prediction models;
Step S103, selecting as breeding material a predetermined number of individuals that rank at the top in every one of the plurality of whole genome selection prediction models when sorted by genomic estimated breeding value from high to low.
In this whole genome selective breeding method, the genomic estimated breeding values of the breeding population are calculated by integrating multiple prediction models, and the calculation results of those models are then used to jointly locate individual material with high breeding values; that is, individuals with high breeding values in the results of all the prediction models are selected as breeding material, which greatly improves the accuracy of the result. In addition, the method can identify the best of the multiple prediction models and use it to predict the best breeding material, which further improves the accuracy of genome selective breeding results. The method adapts to most material backgrounds, fills a gap in genome selection (GS) analysis on supercomputers, helps improve the effect of breeding selection, and advances breeding.
The predetermined number of top-ranked individuals selected as breeding material varies with the species and the size of the population under selection; in some cases it may be set as a proportion, such as the top 3%, 5%, 8% or 10% of individuals.
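The joint selection in step S103 can be sketched as a set intersection: take the top-ranked fraction of individuals under each model, then keep only those present in every top set. This is a minimal illustration, not the patented implementation; the function names, the model labels and the GEBV numbers are all hypothetical.

```python
def top_fraction(gebv, fraction):
    """Return the IDs of the individuals ranked in the top `fraction` by GEBV."""
    n_keep = max(1, int(len(gebv) * fraction))
    ranked = sorted(gebv, key=gebv.get, reverse=True)
    return set(ranked[:n_keep])


def co_select(gebv_by_model, fraction=0.05):
    """Keep only individuals that are in the top fraction under every model."""
    top_sets = [top_fraction(gebv, fraction) for gebv in gebv_by_model.values()]
    return set.intersection(*top_sets)


# Hypothetical GEBVs for five individuals under three models (made-up numbers).
gebv_by_model = {
    "GBLUP": {"ind1": 2.1, "ind2": 1.8, "ind3": 0.4, "ind4": -0.3, "ind5": 1.9},
    "RRBLUP": {"ind1": 1.9, "ind2": 2.0, "ind3": 0.1, "ind4": 0.2, "ind5": 1.1},
    "BRR": {"ind1": 2.4, "ind2": 1.7, "ind3": 0.9, "ind4": -0.1, "ind5": 1.0},
}

# With five individuals, a fraction of 0.4 keeps the top two per model.
selected = co_select(gebv_by_model, fraction=0.4)
print(selected)
```

In this toy data only ind1 is among the top two under all three models, so it is the only individual retained. A stricter or looser fraction changes how many individuals survive the intersection.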
The plurality of whole genome selection prediction models described above can be extended as needed beyond the three existing models, namely genomic best linear unbiased prediction (GBLUP), ridge regression best linear unbiased prediction (RRBLUP) and Bayesian ridge regression (BRR). The models in the present application include, but are not limited to: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model. Preferably, the plurality of models includes at least 4 of the above, more preferably 5, 6 or 7. In some cases, new prediction models constructed for different species and different traits may also be incorporated into the prediction models described above.
The markers used in the present application are SNP markers on the genome, but the markers may be SNP markers from WGS sequencing, INDEL markers, SNP markers from GBS sequencing, SNP markers from RAD sequencing, or chip data.
Of the whole genome selection prediction models described above, ridge regression best linear unbiased prediction (RRBLUP) and genomic best linear unbiased prediction (GBLUP) belong to the category of penalized methods. The Bayesian lasso (Bayesian Lasso, BL), Bayesian A (BA), Bayesian B (BB), Bayesian C (BC) and Bayesian ridge regression (BRR) models, 5 in total, belong to the category of Bayesian methods. Genomic best linear unbiased prediction (GBLUP) is a one-step method, and the other 6 models are two-step methods. A one-step method obtains the genomic estimated breeding value (GEBV) in a single step and produces no single nucleotide polymorphism (SNP) marker effect values. A two-step method obtains the GEBV from the SNP marker effect value matrix and the genetic matrix, and thereby also yields the ranking of genomic regions by their contribution to the GEBV.
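The second step of a two-step method can be illustrated as a weighted sum: each individual's GEBV is the sum of its marker codes multiplied by the marker effect values. The sketch below assumes the genetic matrix is a genotype matrix coded 0/1/2 (allele copy count) and that the marker effects were already estimated by one of the two-step models. The names and numbers are hypothetical, and any intercept or centering is omitted.

```python
def gebv_from_effects(genotypes, effects):
    """GEBV of each individual: sum over markers of genotype code * marker effect."""
    return {
        ind: sum(code * effect for code, effect in zip(codes, effects))
        for ind, codes in genotypes.items()
    }


def rank_markers(marker_names, effects):
    """Rank markers by absolute effect size, largest first."""
    pairs = zip(marker_names, effects)
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical marker effects (first step output) and genotype codes.
marker_names = ["snp1", "snp2", "snp3"]
effects = [0.5, -0.2, 0.1]
genotypes = {
    "ind1": [2, 0, 1],
    "ind2": [0, 2, 2],
    "ind3": [1, 1, 0],
}

gebv = gebv_from_effects(genotypes, effects)
ranking = rank_markers(marker_names, effects)
print(gebv)
print(ranking)
```

Ranking the markers by absolute effect gives a simple view of which genomic positions contribute most to the GEBV, which a one-step method does not provide.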
Therefore, when the plurality of whole genome selection prediction models used for joint selective breeding comprises a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model, calculating the genomic estimated breeding value of each individual in the breeding population using the plurality of models comprises: evaluating the accuracy of the plurality of models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; selecting one or more whole genome selection prediction models according to the model accuracy results, and calculating the effect value of each marker; and calculating the genomic estimated breeding value of each individual in the breeding population using the effect value of each marker.
For the above 6 models, once statistically significant markers have been screened, the relationship between the significantly associated markers and the phenotypes in the training population is used to evaluate how accurately each prediction model predicts the genomic estimated breeding value, and thereby to judge whether the model suits the training population. The evaluation criteria or requirements may be set according to actual needs, depending on the differences between research projects and research objectives. Models that pass the accuracy evaluation can be used to calculate the genomic estimated breeding values of the breeding population.
The accuracy of each prediction model is typically evaluated by cross-validation. The training population is divided into two parts: one part serves as the training set (for example 70%, 75% or 80%) and the other as the test set (for example 30%, 25% or 20%). The training set is used to construct each model, and the test set is used to test whether the constructed prediction model is accurate. The models used for the subsequent calculation of genomic estimated breeding values are screened in this way, which ensures that they have relatively high accuracy. The ratio of training set to test set is determined by the project, the species and the number of samples.
The markers that are significantly associated with the target phenotype may come from a pedigree population background, or may be statistically significant markers obtained by whole genome association analysis (genome-wide association study, GWAS). Any source may be used, so long as the marker data are converted into a format that each prediction model can accept and process.
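A single hold-out split of this kind can be sketched as follows. The accuracy measure (Pearson correlation between predicted and observed phenotypes in the test set) is a common choice but is an assumption here, since the source does not name the metric. `fit_marginal` is a toy stand-in for a real GS model, and all names and data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def holdout_accuracy(genotypes, phenotypes, fit, train_fraction=0.8, seed=0):
    """Split the training population once, fit on one part, score on the other."""
    ids = sorted(genotypes)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    train, test = ids[:n_train], ids[n_train:]
    predict = fit([genotypes[i] for i in train], [phenotypes[i] for i in train])
    predicted = [predict(genotypes[i]) for i in test]
    observed = [phenotypes[i] for i in test]
    return pearson(predicted, observed)


def fit_marginal(train_x, train_y):
    """Toy stand-in for a GS model: average of per-marker simple regressions."""
    n, m = len(train_x), len(train_x[0])
    mean_y = sum(train_y) / n
    means, slopes = [], []
    for j in range(m):
        col = [row[j] for row in train_x]
        mean_j = sum(col) / n
        var_j = sum((v - mean_j) ** 2 for v in col)
        cov_j = sum((v - mean_j) * (y - mean_y) for v, y in zip(col, train_y))
        means.append(mean_j)
        slopes.append(cov_j / var_j if var_j > 0 else 0.0)

    def predict(codes):
        terms = (s * (c - mu) for s, c, mu in zip(slopes, codes, means))
        return mean_y + sum(terms) / m

    return predict


# Hypothetical data: 27 individuals covering every 0/1/2 combination of 3 markers.
genotypes = {f"ind{i}": [i % 3, (i // 3) % 3, (i // 9) % 3] for i in range(27)}
phenotypes = {k: 9.0 * g[0] + 3.0 * g[1] + g[2] for k, g in genotypes.items()}

accuracy = holdout_accuracy(genotypes, phenotypes, fit_marginal, train_fraction=0.8)
print(accuracy)
```

Running the same function once per candidate model, and keeping the models whose accuracy exceeds a project-specific threshold, gives the screening described above. Repeating the split with different seeds would give a more stable estimate than a single split.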
In the present application, the method of screening for statistically significant markers by whole genome association analysis is focused on. In a preferred embodiment, obtaining markers in the training population that are significantly associated with the phenotype of interest comprises: whole genome association analysis is performed on sequencing data from a training population derived from a gene chip or genome re-sequencing to obtain markers that are significantly associated with the phenotype of interest.
In whole genome association analysis, in order to screen more accurately for statistically significant markers that are highly associated with the target trait, the differences in phenotype distribution and population structure among training populations of different species, and the influence on the statistical result of factors such as linkage disequilibrium among markers, are considered together. In a preferred embodiment, performing whole genome association analysis on the sequencing data to obtain the markers that are significantly associated with the target phenotype comprises: performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and performing whole genome association analysis according to the result of the comprehensive analysis, thereby obtaining the markers that are significantly associated with the target phenotype. In a specific analysis, the screening conditions and factors to consider are set according to the particular species and the particular target trait.
In a more preferred embodiment, performing the comprehensive analysis of the sequencing data and performing whole genome association analysis according to its result comprises: detecting whether the quantitative trait phenotypes in the sequencing data follow a normal distribution or a skewed distribution, and eliminating extreme phenotypes that deviate from the leverage value; calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis model as a fixed effect; performing linkage disequilibrium filtering on the whole-genome markers according to the decay distance, to remove markers that exhibit multicollinearity; calculating the genetic relationship distances among individuals in the training population, and adding them to the whole genome association analysis model as a random effect; and calculating the association between the quantitative trait phenotypes and the whole-genome markers using the whole genome association analysis model, thereby selecting the markers that are significantly associated with the target phenotype.
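The phenotype screening step can be sketched with two small helpers: a skewness statistic to flag a skewed distribution, and a filter that drops extreme values. The source does not define the cutoff it applies, so the rule used here (more than three population standard deviations from the mean) is purely an assumption, as are the function names and the data.

```python
import statistics


def skewness(values):
    """Population skewness: close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def drop_extremes(phenotypes, max_sd=3.0):
    """Remove individuals whose phenotype lies more than max_sd SDs from the mean.

    The max_sd cutoff is an assumed rule, not one stated in the source.
    """
    values = list(phenotypes.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {
        ind: value
        for ind, value in phenotypes.items()
        if abs(value - mean) <= max_sd * sd
    }


# Hypothetical trait values: 19 individuals near 10, plus one extreme record.
phenotypes = {f"ind{i}": 10 + 0.1 * ((i * 7) % 11 - 5) for i in range(19)}
phenotypes["ind19"] = 50.0

raw_skew = skewness(list(phenotypes.values()))
cleaned = drop_extremes(phenotypes)
print(raw_skew, len(cleaned))
```

In this toy data the single extreme record produces a strongly positive skew and is the only individual removed by the filter.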
In the preferred embodiment, phenotype characteristic analysis removes extreme phenotypes that deviate from the leverage value early, so that they do not affect the results of the subsequent association analysis. Population structure analysis also helps gauge how accurately breeding values can be estimated for another population with the same population structure: the smaller the difference in population structure, the higher the accuracy of the genome estimated breeding values calculated by the model. The number of subpopulations in the training population is determined by principal component analysis (PCA) or population structure analysis (Structure), and the result is added to the whole genome association analysis (GWAS) model as a fixed effect, so that the influence of this factor on the association result is taken into account. The linkage disequilibrium (LD) decay distances of the subpopulations and of the whole population are determined, and the high-density markers are LD-filtered according to the decay distance, so as to prevent the multicollinearity caused by strong LD among SNPs from reducing the goodness of fit of the GWAS model. The genetic relationship distances among individuals in the training population are calculated and added to the GWAS model as a random effect, which improves the accuracy of the analysis.
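The LD filtering described above can be sketched as a greedy pruning pass. This is only a minimal illustration under stated assumptions: genotypes are coded as allele dosages (0/1/2), LD is approximated by the squared correlation of dosages, the window is set to the LD decay distance, and the threshold `r2_max` and all names are hypothetical. Production analyses normally rely on dedicated tools rather than code like this.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two dosage vectors (0/1/2)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic marker: correlation undefined, treated as unlinked
    return cov * cov / (var_a * var_b)

def ld_prune(markers, window_bp, r2_max):
    """Greedy LD pruning.

    markers: list of (chromosome, position, dosages), sorted by chromosome
    and position. A marker is dropped when an already kept marker on the
    same chromosome lies within `window_bp` and has r^2 above `r2_max`.
    """
    kept = []
    for chrom, pos, dosages in markers:
        redundant = any(
            k_chrom == chrom
            and abs(pos - k_pos) <= window_bp
            and r_squared(dosages, k_dos) > r2_max
            for k_chrom, k_pos, k_dos in kept
        )
        if not redundant:
            kept.append((chrom, pos, dosages))
    return kept

# Hypothetical markers: the second is a perfect proxy of the first.
markers = [
    ("1", 100, [0, 1, 2, 1]),
    ("1", 200, [0, 1, 2, 1]),
    ("1", 300, [2, 0, 1, 1]),
    ("2", 150, [0, 1, 2, 1]),
]
pruned = ld_prune(markers, window_bp=1000, r2_max=0.8)
```

The quadratic scan over `kept` keeps the sketch short; with millions of SNPs a sliding window over position-sorted markers would be needed.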
Various methods can be used to screen for statistically significant markers by whole genome association analysis, and a suitable one can be chosen according to actual needs. In the present application, a mixed linear model is preferably used to calculate the association between phenotype and markers, and the significantly associated markers are retained as the markers for the target phenotype.
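In a mixed linear model, the relatedness among individuals typically enters as the covariance structure of the random effect. The text does not name the estimator used for the genetic relationship distances, so the sketch below uses one common choice, the VanRaden genomic relationship matrix G = ZZ' / (2 Σ p(1 - p)), purely as an assumed example. Genotypes are assumed to be dosages (0/1/2), and the function name is hypothetical.

```python
def vanraden_kinship(genotypes):
    """Genomic relationship matrix from a list of dosage rows (one row per individual).

    Z is the dosage matrix centred by twice the allele frequency of each
    marker; the result is Z Z' scaled by 2 * sum(p * (1 - p)).
    """
    n_ind = len(genotypes)
    n_mark = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n_ind) for j in range(n_mark)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    centred = [[row[j] - 2 * freqs[j] for j in range(n_mark)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zj)) / scale for zj in centred]
        for zi in centred
    ]

# Hypothetical dosages: individuals 0 and 1 are identical, individual 2 is opposite.
G = vanraden_kinship([[0, 1, 2], [0, 1, 2], [2, 1, 0]])
```

Identical individuals receive the same value off the diagonal as on it, and the contrasting individual receives a negative relationship coefficient, which is the behaviour a random-effect covariance is meant to capture.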
Example 2
The present example provides a genome selective breeding method developed for the results obtained by whole genome association analysis (GWAS), which comprises the following steps:
The data to be prepared are: the phenotypes of the training population (TP), the genotypes of the TP (i.e. the marker genotypes), the GWAS result data of the TP, the genome selection (GS) prediction models, and the genotype data of the breeding population (BP). The result file of the whole genome association analysis (GWAS) requires only 3 columns: the chromosome number, the physical position of the single nucleotide polymorphism (SNP), and the P-value of the SNP (i.e. the P-value of its association with the target phenotype).
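A three-column GWAS result file of this kind could be read and filtered as in the following sketch. The tab delimiter, the optional header row, the significance threshold and the function name are assumptions; the text fixes only the three columns.

```python
import csv
import io

def load_significant_markers(handle, p_threshold):
    """Read (chromosome, position, P-value) rows and keep those with P < p_threshold."""
    significant = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        chrom, pos_text, p_text = row
        try:
            pos, p_value = int(pos_text), float(p_text)
        except ValueError:
            continue  # skip a header row such as "chr pos p"
        if p_value < p_threshold:
            significant.append((chrom, pos, p_value))
    return significant

# Hypothetical file content; a real run would pass an open file handle instead.
example = io.StringIO("chr\tpos\tp\n1\t1000\t1e-9\n1\t2000\t0.2\n2\t500\t3e-8\n")
hits = load_significant_markers(example, p_threshold=5e-8)
```

The threshold of 5e-8 is only a placeholder; a project would normally derive its own cut-off, for example from the number of markers tested.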
Fig. 2 shows the detailed flow of the genome selective breeding method of the present embodiment, which requires two populations: a training population and a breeding population. After the training population is analyzed by whole genome association analysis (GWAS), the statistically significant genotypes are extracted; the training population is then divided into a training set and a test set, the candidate prediction models are evaluated on them, and the optimal model is selected; the genetic effect values of all markers are calculated with the optimal model, and the genome breeding values are estimated from these effect values.
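The last two stages of this flow, estimating breeding values from marker effects and picking the best model on a test set, can be sketched as follows. The text does not define the accuracy measure, so the Pearson correlation between predicted and observed test-set phenotypes is used here as an assumption, and the model names, effect values and function names are hypothetical.

```python
import math

def gebv(genotypes, effects):
    """Genome estimated breeding value: sum of dosage * marker effect per individual."""
    return [sum(g * e for g, e in zip(row, effects)) for row in genotypes]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def best_model(effects_by_model, test_genotypes, test_phenotypes):
    """Return (name of most accurate model, accuracy of every model)."""
    accuracy = {
        name: pearson(gebv(test_genotypes, effects), test_phenotypes)
        for name, effects in effects_by_model.items()
    }
    return max(accuracy, key=accuracy.get), accuracy

# Hypothetical test set of 4 individuals and 2 significant markers.
test_genotypes = [[0, 1], [1, 0], [2, 1], [1, 2]]
test_phenotypes = [4.0, 7.0, 8.0, 5.0]
effects_by_model = {"modelA": [2.0, -1.0], "modelB": [1.0, 1.0]}
winner, accuracy = best_model(effects_by_model, test_genotypes, test_phenotypes)
```

Because the correlation ignores shifts, the sketch omits the model intercept; ranking individuals by GEBV is unaffected by that omission.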
Regarding the steps of selecting the optimal prediction model, calculating the genetic effect values of all markers with it, and then calculating the genome estimated breeding values: in some embodiments, when evaluation of the multiple prediction models yields a single optimal model, or when comparison shows that the predictions of the optimal model cover the prediction results of the other models, the genome estimated breeding values are calculated with that optimal prediction model.
In other embodiments, for example when the accuracies of the multiple prediction models do not differ significantly, the accuracy of genome selective breeding can be improved further as follows: the accuracy of each prediction model is evaluated separately; the genome estimated breeding values of the breeding population are calculated with every prediction model that meets the accuracy requirement; the results of each model are ranked by genome estimated breeding value from high to low; and the individuals that multiple prediction models jointly place among the high breeding values are selected as breeding materials.
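The joint selection described above amounts to intersecting the top-ranked individuals of each model. A minimal sketch follows, in which the model names, the GEBV values and the cut-off `n_top` are hypothetical.

```python
def top_individuals(gebv_by_sample, n_top):
    """IDs of the n_top individuals with the highest GEBV under one model."""
    ranked = sorted(gebv_by_sample, key=gebv_by_sample.get, reverse=True)
    return set(ranked[:n_top])

def co_selected(gebv_by_model, n_top):
    """Individuals ranked in the top n_top by every model."""
    top_sets = [top_individuals(scores, n_top) for scores in gebv_by_model.values()]
    return set.intersection(*top_sets)

# Hypothetical GEBVs for four candidates under three models.
gebv_by_model = {
    "GBLUP":  {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0},
    "rrBLUP": {"a": 2.5, "b": 0.1, "c": 2.0, "d": 1.0},
    "BayesB": {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.5},
}
breeding_materials = co_selected(gebv_by_model, n_top=2)
```

Here only candidate `a` is ranked in the top two by all three models, so it alone is retained; candidates favoured by a single model are discarded.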
Example 3
This embodiment discloses a genome selective breeding method for a certain trait in sheep. As shown in fig. 3A to 3H6, the method includes the following steps:
S1, phenotype characteristic analysis: detecting whether the phenotype of the quantitative trait follows a normal distribution or a skewed distribution; extreme phenotypes that deviate from the leverage value need to be removed promptly. The result is shown in fig. 3A;
S2, population stratification analysis: determining the number of subpopulations in the population by principal component analysis (PCA) or population structure analysis (Structure), and adding the result to the whole genome association analysis (GWAS) model as a fixed effect. The result is shown in fig. 3B;
S3, linkage disequilibrium (LD) analysis: determining the LD decay distances of the subpopulations and of the whole population, and LD-filtering the high-density markers according to the decay distance, so as to prevent the multicollinearity caused by strong LD among SNPs from reducing the goodness of fit of the GWAS model. The result is shown in fig. 3C;
S4, genetic relationship analysis: adding the genetic relationship distances among the individuals of the population to the GWAS model as a random effect. The result is shown in fig. 3D;
S5, whole genome association analysis (GWAS): calculating the degree of association between the phenotype and the high-density markers with the whole genome association analysis model (namely a mixed linear model), and selecting the statistically significant markers. The result is shown in fig. 3E;
S6, model evaluation and selection: evaluating the accuracy of each of the six two-step models rrBLUP, BL, BA, BB, BC and BRR, and selecting the models suited to the project. The result is shown in fig. 3F;
S7, marker effect analysis: calculating, for each of the 6 two-step models, the effect values of the statistically significant markers across the whole genome. The results are shown in fig. 3G1 to 3G6;
S8, genome estimated breeding value (GEBV) analysis: the genome breeding values of the corresponding models are predicted from the effect values of the 6 two-step models in S7, while the genomic best linear unbiased prediction (GBLUP) model calculates the GEBV directly by a one-step method that combines model evaluation and marker effect estimation; individuals with high GEBV are selected for breeding. The results of all 7 models can also be examined together, and the materials that multiple models jointly place among the high breeding values can be selected for breeding, as shown in fig. 3H1 to 3H6.
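To make the marker effect step (S7) concrete, the sketch below estimates marker effects by ridge regression, the penalised least-squares form that underlies rrBLUP-type models: effects = (X'X + λI)^-1 X'y on centred data. It is a toy illustration, not the implementation used in the embodiment: the penalty `lam` is simply passed in (rrBLUP software normally derives it from variance components), the Bayesian models are not covered, and all names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; use a larger ridge penalty")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def ridge_marker_effects(genotypes, phenotypes, lam):
    """Marker effects from ridge regression on centred dosages and phenotypes."""
    n, m = len(genotypes), len(genotypes[0])
    col_mean = [sum(row[j] for row in genotypes) / n for j in range(m)]
    y_mean = sum(phenotypes) / n
    xc = [[row[j] - col_mean[j] for j in range(m)] for row in genotypes]
    yc = [y - y_mean for y in phenotypes]
    xtx = [
        [sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(m)]
        for i in range(m)
    ]
    xty = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(m)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated as phenotype = 2*marker1 - 1*marker2 + 5.
train_genotypes = [[0, 1], [1, 0], [2, 1], [1, 2]]
train_phenotypes = [4.0, 7.0, 8.0, 5.0]
unpenalised = ridge_marker_effects(train_genotypes, train_phenotypes, lam=0.0)
shrunk = ridge_marker_effects(train_genotypes, train_phenotypes, lam=2.0)
```

With no penalty the generating effects are recovered; with `lam=2.0` both effects are shrunk towards zero, which is what keeps the estimates stable when markers far outnumber individuals.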
Example 4
As shown in fig. 4A to 4D, this example presents the genome selective breeding analysis results for the pectoral muscle weight ratio in chicken. The results were obtained by genome breeding analysis of high-density SNP marker data generated by high-throughput sequencing on the Illumina platform; 2,010,956 SNPs and 519 samples were used for the whole genome selective breeding.
Fig. 4A shows that the model with the highest prediction accuracy is the BB model. Fig. 4B shows the marker effect values calculated by the rrBLUP model, and fig. 4C shows the genome estimated breeding values calculated by the rrBLUP model. Fig. 4D shows a Venn diagram of the 7 prediction models combined for selective breeding: each model contributes the samples in the top 5% by GEBV; the number 7 at the center denotes the 7 samples selected by all 7 models, and the other numbers are the samples selected by the other combinations of the 7 models.
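The numbers in a Venn diagram of this kind are counts of samples per exact combination of models. The sketch below shows one way such counts could be tallied from the per-model selections; the model names, the sample IDs and the function name are hypothetical, and it is not the plotting code behind fig. 4D.

```python
from collections import Counter

def venn_region_counts(selected_by_model):
    """Count samples for each exact combination of models that selected them."""
    membership = {}
    for model, samples in selected_by_model.items():
        for sample in samples:
            membership.setdefault(sample, set()).add(model)
    return Counter(frozenset(models) for models in membership.values())

# Hypothetical top-ranked sample sets from three models.
selected_by_model = {
    "A": {"s1", "s2", "s3"},
    "B": {"s1", "s2", "s4"},
    "C": {"s1", "s5"},
}
regions = venn_region_counts(selected_by_model)
shared_by_all = regions[frozenset(selected_by_model)]
```

The entry keyed by the full set of models corresponds to the central number of the diagram, and the entries keyed by a single model are the model-specific samples that the joint selection discards.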
The prior art generally uses only one model for genome selection, with an accuracy of about 0.3. Building on the accuracy assessment of each individual model, the embodiment of the application performs joint selection with multiple models and excludes the samples selected by only one model, which may be false positives. This reduces false positives, improves the accuracy of the selected samples and greatly improves breeding efficiency.
As can be seen from the above description, the above embodiments of the present application achieve the following technical effects: the genome estimated breeding values of the breeding population are calculated with multiple prediction models, and the calculation results of the multiple models are then used to jointly locate individual materials with high breeding values, that is, individuals with high breeding values in the results of all prediction models are selected as breeding materials, which greatly improves the accuracy of the results. In addition, the method can identify the optimal model among the multiple prediction models and use it to predict the optimal breeding materials, thereby improving the accuracy of the genome selective breeding results.
The method can run multiple models on multiple phenotypes at the same time and generate the genome selection results together, so that the genome selection analysis is completed efficiently. The method suits most material backgrounds and fills the gap in genome selection (GS) analysis on supercomputers, helping to improve the effect of breeding selection and to advance breeding.
Compared with the prior art, the method and device of the application handle both small and large data volumes and serve most experiments. They can run on a desktop PC but are mainly run on a supercomputer: once the input files are in the required format, the analysis is started with a single command, the number of CPUs used can be adjusted to improve running efficiency, and the 7 models can run simultaneously in the background. The results of the 7 models are plotted in detail. The method of the application therefore offers a marked improvement in the volume of data that can be processed, the completeness of the models computed, the computational efficiency and the presentation of results.
The method and device of the application are applicable to a variety of data and sample types: genotype data from WGS, GBS, RAD or chips, and samples from natural populations, pedigree populations and the like, can all be analyzed as long as the format is kept consistent.
The method and device use multiple models for joint analysis and select samples jointly, which gives higher accuracy and a richer choice.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required for the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in part in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) including instructions for causing a computing device to perform the method of the various embodiments of the present application or for causing a processor to perform the method of the various embodiments of the present application.
Example 5
The present embodiment provides a whole genome selective breeding device which, as shown in fig. 5, comprises an acquisition module 20, a breeding value estimation module 40 and a selection module 60. The acquisition module 20 is used for acquiring markers in the training population that are significantly associated with the target phenotype; the breeding value estimation module 40 is used for calculating a genome estimated breeding value of each individual in the breeding population using a plurality of whole genome selection prediction models, according to the training population and the markers; the selection module 60 is used for selecting, as breeding materials, a predetermined number of individuals that are ranked at the top by each of the plurality of whole genome selection prediction models when the genome estimated breeding values are sorted from high to low.
In the above device, the plurality of whole genome selection prediction models includes at least 4 of: the genomic best linear unbiased prediction model, the ridge regression best linear unbiased prediction model, the Bayesian lasso model, the Bayesian A model, the Bayesian B model, the Bayesian C model and the Bayesian ridge regression model.
In the above device, when the plurality of whole genome selection prediction models includes at least 3 of the ridge regression best linear unbiased prediction model, the Bayesian lasso model, the Bayesian A model, the Bayesian B model, the Bayesian C model and the Bayesian ridge regression model, these 6 models calculate the genome estimated breeding value by a two-step method. In a preferred embodiment, the breeding value estimation module therefore comprises: a model accuracy evaluation module, used for evaluating the accuracy of the multiple whole genome selection prediction models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement; an effect value calculation module, used for calculating the effect value of each marker using the one or more whole genome selection prediction models that meet the accuracy requirement; and a breeding value estimation submodule, used for calculating the genome estimated breeding value of each individual in the breeding population using the effect value of each marker.
In a preferred embodiment, the acquisition module comprises: a whole genome association analysis module, used for performing whole genome association analysis on sequencing data of the training population derived from a gene chip or genome re-sequencing, so as to obtain markers that are significantly associated with the target phenotype.
In a preferred embodiment, the whole genome association analysis module comprises: a comprehensive analysis module, used for performing a comprehensive analysis of the sequencing data, the comprehensive analysis comprising phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis; and a whole genome association analysis submodule, used for performing the whole genome association analysis according to the result of the comprehensive analysis, so as to obtain markers that are significantly associated with the target phenotype.
In a preferred embodiment, the whole genome association analysis module comprises: a phenotype distribution analysis module, used for detecting whether the phenotype of the quantitative trait in the sequencing data follows a normal distribution or a skewed distribution, and removing extreme phenotypes that deviate from the leverage value; a population structure analysis module, used for calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the number of subpopulations to the whole genome association analysis submodule as a fixed effect; a linkage disequilibrium analysis module, used for filtering the markers of the whole genome for linkage disequilibrium according to the linkage disequilibrium decay distance and removing the markers that cause multicollinearity; a genetic relationship analysis module, used for calculating the genetic relationship distances among individuals in the training population and adding them to the whole genome association analysis submodule as a random effect; and a whole genome association analysis submodule, used for calculating the association between the quantitative trait phenotype and the markers of the whole genome, so as to select the markers that are significantly associated with the target phenotype.
Example 6
The application also provides a storage medium comprising a stored program, wherein the program is used for controlling a device where the storage medium is located to execute any one of the whole genome selective breeding methods.
Example 7
The application also provides a processor for running a program, wherein the program runs to execute any whole genome selective breeding method.
It should be noted that the terms "comprises" and "comprising," along with any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary hardware devices such as detection devices. With such understanding, portions of the data processing in the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, magnetic disk, optical disk, etc., including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods of various embodiments or portions of embodiments of the application.
In this specification, the embodiments are described in a progressive manner: identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, see the corresponding parts of the description of the method embodiments.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, multiprocessor systems, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
It will be apparent to those skilled in the art that some of the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, or they may alternatively be implemented in program code executable by a computing device, so that they may be stored in a memory device for execution by the computing device, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of whole genome selective breeding, the method comprising:
obtaining a marker in the training population that is significantly associated with the target phenotype;
calculating a genome estimated breeding value of each individual in a breeding population by utilizing a plurality of whole genome selection prediction models according to the training population and the markers;
selecting, as breeding materials, a predetermined number of individuals that are ranked at the top by each of the plurality of whole genome selection prediction models when the genome estimated breeding values are sorted from high to low;
wherein the markers comprise any one or more of the following: SNP markers on the genome, WGS sequenced SNP markers, WGS sequenced INDEL markers, GBS sequenced SNP markers, RAD sequenced SNP markers, or SNP markers in chip data;
the breeding population comprises a natural population and/or a pedigree population;
the obtaining of markers in the training population that are significantly associated with the target phenotype comprises:
performing whole genome association analysis on sequencing data derived from a gene chip or genome re-sequencing of the training population, thereby obtaining a marker that is significantly associated with the phenotype of interest;
performing the whole genome association analysis from the sequencing data to obtain a marker that is significantly associated with the phenotype of interest comprises:
performing comprehensive analysis on the sequencing data, wherein the comprehensive analysis comprises phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis;
and performing the genome-wide association analysis according to the result of the comprehensive analysis, thereby obtaining a marker which is obviously associated with the target phenotype.
2. The method of claim 1, wherein the plurality of whole genome selection prediction models comprises at least 4 of: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model.
3. The method of claim 2, wherein when the plurality of whole genome selection prediction models comprises at least 3 of the ridge regression best linear unbiased prediction model, the Bayesian lasso model, the Bayesian A model, the Bayesian B model, the Bayesian C model and the Bayesian ridge regression model, calculating a genome estimated breeding value for each individual in the breeding population using the plurality of whole genome selection prediction models comprises:
performing accuracy assessment on the multiple whole genome selection prediction models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement;
calculating the effect value of each marker by using the one or more whole genome selection prediction models meeting the accuracy requirement;
and calculating a genome estimated breeding value of each individual in the breeding population by using the effect value of each marker.
4. The method of claim 1, wherein performing the comprehensive analysis on the sequencing data and performing the whole genome association analysis according to the result of the comprehensive analysis to obtain markers that are significantly associated with the target phenotype comprises:
detecting whether the phenotype of the quantitative trait in the sequencing data follows a normal distribution or a skewed distribution, and removing extreme phenotypes that deviate from the leverage value;
calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to a whole genome association analysis model as a fixed effect;
filtering the markers of the whole genome for linkage disequilibrium according to the linkage disequilibrium decay distance, so as to remove the markers that cause multicollinearity;
calculating the genetic relationship distances among individuals in the training population, and adding the genetic relationship distances to the whole genome association analysis model as a random effect;
calculating the association between the quantitative trait phenotype and the markers of the whole genome using the whole genome association analysis model, thereby selecting the markers that are significantly associated with the target phenotype.
5. The method of claim 4, wherein the whole genome association analysis model is a mixed linear model.
6. A whole genome selective breeding device, the device comprising:
an acquisition module, used for acquiring markers in the training population that are significantly associated with the target phenotype; a breeding value estimation module, used for calculating a genome estimated breeding value of each individual in the breeding population using a plurality of whole genome selection prediction models, according to the training population and the markers;
a selection module, used for selecting, as breeding materials, a predetermined number of individuals that are ranked at the top by each of the plurality of whole genome selection prediction models when the genome estimated breeding values are sorted from high to low;
the acquisition module comprises: a whole genome association analysis module, used for performing whole genome association analysis on sequencing data of the training population derived from a gene chip or genome re-sequencing, so as to obtain markers that are significantly associated with the target phenotype;
the whole genome association analysis module comprises:
a comprehensive analysis module, used for performing a comprehensive analysis of the sequencing data, wherein the comprehensive analysis comprises phenotype distribution analysis, population structure analysis, linkage disequilibrium analysis and genetic relationship analysis;
a whole genome association analysis submodule, used for performing the whole genome association analysis according to the result of the comprehensive analysis, thereby obtaining markers that are significantly associated with the target phenotype;
the markers include any one or more of the following: SNP markers on the genome, WGS sequenced SNP markers, WGS sequenced INDEL markers, GBS sequenced SNP markers, RAD sequenced SNP markers, or SNP markers in chip data;
the breeding populations include natural populations and/or pedigree populations.
7. The apparatus of claim 6, wherein the plurality of whole genome selection prediction models comprises at least 4 of: a genomic best linear unbiased prediction model, a ridge regression best linear unbiased prediction model, a Bayesian lasso model, a Bayesian A model, a Bayesian B model, a Bayesian C model and a Bayesian ridge regression model.
8. The apparatus of claim 7, wherein when the plurality of whole genome selection prediction models comprises at least 3 of the ridge regression best linear unbiased prediction model, the Bayesian lasso model, the Bayesian A model, the Bayesian B model, the Bayesian C model and the Bayesian ridge regression model, the breeding value estimation module comprises:
a model accuracy evaluation module, used for evaluating the accuracy of the multiple whole genome selection prediction models using the significant association between the target phenotype and the markers in the training population, to obtain one or more whole genome selection prediction models that meet the accuracy requirement;
the effect value calculation module is used for calculating the effect value of each marker by using the one or more whole genome selection prediction models meeting the accuracy requirement;
a breeding value estimation sub-module, configured to calculate a genome estimated breeding value of each individual in the breeding population using the effect value of each marker.
9. The apparatus of claim 6, wherein the whole genome association analysis module comprises:
a phenotype distribution analysis module, used for detecting whether the phenotype of the quantitative trait in the sequencing data follows a normal distribution or a skewed distribution, and removing extreme phenotypes that deviate from the leverage value;
a population structure analysis module, used for calculating the population structure of the training population by principal component analysis or population structure analysis, and adding the population structure to the whole genome association analysis submodule as a fixed effect;
a linkage disequilibrium analysis module, used for filtering the markers of the whole genome for linkage disequilibrium according to the linkage disequilibrium decay distance and removing the markers that cause multicollinearity;
a genetic relationship analysis module, used for calculating the genetic relationship distances among individuals in the training population and adding the genetic relationship distances to the whole genome association analysis submodule as a random effect;
the whole genome association analysis submodule is used for calculating the association between the quantitative trait phenotype and the markers of the whole genome, so as to select the markers that are significantly associated with the target phenotype.
10. The apparatus of claim 9, wherein the whole genome association analysis submodule is a mixed linear model module.
11. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the method of any one of claims 1 to 5.
12. A processor for running a program, wherein the program when run performs the method of any one of claims 1 to 5.
CN202010366270.XA 2020-04-30 2020-04-30 Method and device for whole genome selective breeding Active CN111524545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366270.XA CN111524545B (en) 2020-04-30 2020-04-30 Method and device for whole genome selective breeding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010366270.XA CN111524545B (en) 2020-04-30 2020-04-30 Method and device for whole genome selective breeding

Publications (2)

Publication Number Publication Date
CN111524545A CN111524545A (en) 2020-08-11
CN111524545B true CN111524545B (en) 2023-11-10

Family

ID=71906503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366270.XA Active CN111524545B (en) 2020-04-30 2020-04-30 Method and device for whole genome selective breeding

Country Status (1)

Country Link
CN (1) CN111524545B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113373245A (en) * 2021-07-14 2021-09-10 广东海洋大学 Method for cultivating improved variety of pinctada martensii with golden yellow shell color character based on whole genome selection
CN115443907B (en) * 2022-07-26 2023-04-21 开封市农林科学研究院 High-yield large-fruit peanut hybrid combination selection method based on whole genome selection
CN116467596B (en) * 2023-04-11 2024-03-26 广州国家现代农业产业科技创新中心 Training method of rice grain length prediction model, morphology prediction method and apparatus
CN116732222B (en) * 2023-05-26 2024-03-15 南京农业大学 Method for efficiently predicting salt tolerance of chrysanthemum based on whole genome

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103914631A (en) * 2014-02-26 2014-07-09 中国农业大学 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip
CN105868584A (en) * 2016-05-23 2016-08-17 厦门胜芨科技有限公司 Method for performing whole genome selective breeding by selecting extreme character individual
CN106755441A (en) * 2016-12-29 2017-05-31 华南农业大学 A kind of method that gene group selection based on multiple characters carries out forest multiple characters pyramiding breeding
CN107278877A (en) * 2017-07-25 2017-10-24 山东省农业科学院玉米研究所 A kind of full-length genome selection and use method of corn seed-producing rate
CN110610744A (en) * 2019-09-11 2019-12-24 华中农业大学 Efficient whole genome selection method capable of realizing parallel operation and high accuracy
CN110867208A (en) * 2019-11-29 2020-03-06 中国科学院海洋研究所 Method for improving whole genome selective breeding efficiency of aquatic animals

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2017013462A1 (en) * 2015-07-23 2017-01-26 Limagrain Europe Improved computer implemented method for predicting true agronomical value of a plant

Non-Patent Citations (1)

Title
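Xu Ling. Research on the design of a low-density SNP chip for whole genome selection in beef Simmental cattle. China Master's Theses Full-text Database, Agricultural Science and Technology Series. 2019, main text Sections 2.5-2.7 and Section 3.5.5. *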

Also Published As

Publication number Publication date
CN111524545A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN111524545B (en) Method and device for whole genome selective breeding
Granato et al. snpReady: a tool to assist breeders in genomic analysis
Qanbari On the extent of linkage disequilibrium in the genome of farm animals
Dassonneville et al. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations
Zhao et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa
Hozé et al. High-density marker imputation accuracy in sixteen French cattle breeds
CN113519028B (en) Methods and compositions for estimating or predicting genotypes and phenotypes
Jenko et al. Cow genotyping strategies for genomic selection in a small dairy cattle population
US20060111849A1 (en) Computer systems and methods that use clinical and expression quantitative trait loci to associate genes with traits
Kelly et al. The genomic signal of partial sweeps in Mimulus guttatus
Chen et al. Using Mendelian inheritance to improve high-throughput SNP discovery
Pérez-Enciso et al. Evaluating sequence-based genomic prediction with an efficient new simulator
Kozak et al. Genome-wide admixture is common across the Heliconius radiation
CN105868584A (en) Method for performing whole genome selective breeding by selecting extreme character individual
Timmermans et al. Mimicry diversification in Papilio dardanus via a genomic inversion in the regulatory region of engrailed–invected
Salmona et al. Inferring demographic history using genomic data
Bartholomé et al. Genomic prediction: progress and perspectives for rice improvement
CN108376210A (en) A kind of breeding parent selection method excavated based on the advantageous haplotypes of full-length genome SNP of genomic information auxiliary breeding means II-
Aylward et al. How methodological changes have influenced our understanding of population structure in threatened species: insights from tiger populations across India
Gil et al. Accurate, efficient and user-friendly mutation calling and sample identification for TILLING experiments
Hill et al. A global barley panel revealing genomic signatures of breeding in modern cultivars
You et al. Genomic cross prediction for linseed improvement
Jiang et al. Fast Bayesian fine-mapping of 35 production, reproduction and body conformation traits with imputed sequences of 27K Holstein bulls
CN111898807B (en) Tobacco leaf yield prediction method based on whole genome selection and application
Hodel et al. Linking genome signatures of selection and adaptation in non-model plants: exploring potential and limitations in the angiosperm Amborella

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant