CN105512510A - Algorithm for assessing heritability through genome data - Google Patents
- Publication number: CN105512510A
- Application number: CN201510873172.4A
- Authority: CN (China)
- Prior art keywords: algorithm, genome, value, GEBV, heritability
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an algorithm for assessing heritability from genomic data. For a given quantitative trait, marker effects across the whole genome are estimated with the GBLUP algorithm using reference populations of different sizes; the breeding values of the estimation population are then obtained and the estimation accuracy is calculated. A linearized curve fit is performed between the genomic estimation accuracy and the reference population size, and the reciprocal of the intercept of the fitted regression equation is taken as the heritability estimate. The results can be applied directly to breeding for quantitative traits in animals and plants. No pedigree is recorded for the individuals; instead, each individual's genome is sequenced and the heritability of the trait is predicted from genome-wide markers. The heritability estimate is intended mainly for use in subsequent breeding work. In addition, because sequencing can capture Mendelian sampling error, it yields more accurate relationship information than recorded pedigree data.
Description
Technical field
The present invention relates to the field of genetic engineering, and specifically to an algorithm for assessing heritability from genomic data.
Background
Current methods of assessing heritability mainly rely on the kinship between individuals and infer heritability by statistical means such as analysis of variance and correlation analysis. These methods require complete pedigree records, but for some species, such as aquatic animals, pedigree recording involves a very large workload and may be impractical. In addition, traditional methods treat genomic information as a "black box" and therefore cannot capture the specifics of how genes are transmitted from parents to offspring. That is, they cannot accurately capture Mendelian sampling error, which leads to a larger estimation error. To solve the problems of heavy pedigree-recording workload and the inability to capture Mendelian sampling error accurately in conventional heritability estimation, the prior art needs to be improved.
Summary of the invention
The object of the present invention is to provide an algorithm for assessing heritability from genomic data that overcomes the large error and laborious pedigree recording of conventional heritability estimation, thereby solving the problems raised in the background above.
The invention does not record individual pedigrees. Instead, it sequences the genomes of all individuals and, combining individual performance records with genomic marker information, estimates the accuracy of the genomic estimated breeding values and from that the heritability of the trait.
To achieve the above object, the invention provides the following technical solution:
An algorithm for assessing heritability from genomic data: for a given quantitative trait, marker effects across the whole genome are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated. This procedure is in essence the procedure of genomic selection. The invention uses GBLUP as the algorithm for computing marker effects. The GBLUP algorithm was proposed by Meuwissen et al. in 2001; its prior distribution assumes that the effect variances of all marker loci in the genome are equal, and the marker effects are obtained from formula (1):

[Formula (1) is not preserved in this text.]

where $\mu$ is the population mean and $g$ is the vector of effects of all marker loci. The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $\mathrm{GEBV} = \sum_i X_i g_i$. The accuracy of the GEBV is the correlation coefficient between the GEBV and the true breeding value (TBV), i.e. $r_{(\mathrm{GEBV},\mathrm{TBV})}$. Daetwyler et al. derived in 2008 another formula for $r_{(\mathrm{GEBV},\mathrm{TBV})}$ when breeding values are estimated with the GBLUP algorithm:

$$r_{(\mathrm{GEBV},\mathrm{TBV})} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \qquad (2)$$

where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective chromosome segments that determine the trait. In practice the TBV cannot be known, so the phenotypic value ($Y$) is used in its place, and the relation between GEBV and $Y$ is derived as:

$$r_{(\mathrm{GEBV},Y)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \qquad (3)$$

In formula (3), different values of $r_{(\mathrm{GEBV},Y)}$ are obtained by varying $N_p$, and the curve is then fitted. The fit is done by linearizing the curve; rearranging formula (3) gives the linear equation:

$$\frac{1}{r^2_{(\mathrm{GEBV},Y)}} = \frac{1}{h^2} + \frac{M}{h^4}\cdot\frac{1}{N_p} \qquad (4)$$

This equation is equivalent to the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(\mathrm{GEBV},Y)}$ and $x$ is the reciprocal of $N_p$. The intercept $a$ is the reciprocal of the heritability, so the heritability estimate is obtained as the reciprocal of the intercept, $h^2 = 1/a$.
As a further aspect of the invention: the genomes of all individuals are sequenced to obtain SNP information, the SNP loci correspond across all individuals, and missing data are filled in by imputation.
As a further aspect of the invention: to prevent a large error from a single estimate, repeated cross-validation is used. The reference population and the estimation population are drawn at random from the whole population multiple times, giving an estimate closer to the true value.
As a further aspect of the invention: reference populations of different sizes are used with the GBLUP algorithm to calculate the effect of each marker in the genome and so obtain the breeding values of the estimation population; the estimation accuracy is obtained by correlation analysis between the breeding values and the phenotypic values of the estimation population.
Compared with the prior art, the beneficial effects of the invention are as follows. The invention assesses the heritability of quantitative traits from genomic data, and the results can be applied directly to breeding for quantitative traits in animals and plants. The algorithm does not need to be built on family structure: the heritability of a trait is predicted from genome-wide markers, which solves the problem that pedigree recording is laborious or even impractical. Because sequencing can capture Mendelian sampling error, the algorithm obtains more accurate relationship information than recorded pedigree data.
Brief description of the drawings
Fig. 1 is a flow chart of the algorithm of the present invention.
Fig. 2 shows how GEBV accuracy varies with reference population size for the two traits, body weight and body length.
Fig. 3 shows GEBV accuracy against reference population size for the two traits after transformation according to formula (4).
In Fig. 3, the abscissa is the reciprocal of the number of individuals in the reference population, the ordinate is the reciprocal of the squared GEBV accuracy, and $R^2$ is the coefficient of determination of the regression equation.
Detailed description
The technical solution of this patent is described in more detail below with reference to embodiments.
Referring to Figs. 1-3, an algorithm for assessing heritability from genomic data: for a given quantitative trait, marker effects across the whole genome are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated. A linearized curve fit is performed between the genomic estimation accuracy and the reference population size, and the reciprocal of the intercept of the fitted regression equation is the heritability estimate. The algorithm is characterized in that the genomic selection procedure uses GBLUP as the algorithm for computing marker effects, the effect variances of all marker loci in the genome are equal, and the marker effects are obtained from formula (1):

[Formula (1) is not preserved in this text.]

where $\mu$ is the population mean and $g$ is the vector of effects of all marker loci. The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $\mathrm{GEBV} = \sum_i X_i g_i$. The accuracy of the GEBV is obtained as the correlation coefficient between the GEBV and the true breeding value (TBV), i.e. $r_{(\mathrm{GEBV},\mathrm{TBV})}$. When breeding values are estimated with the GBLUP algorithm, another formula for $r_{(\mathrm{GEBV},\mathrm{TBV})}$ is:

$$r_{(\mathrm{GEBV},\mathrm{TBV})} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \qquad (2)$$

where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective chromosome segments that determine the trait. In practice the TBV cannot be known, so the phenotypic value ($Y$) is used in its place, and the relation between GEBV and $Y$ is derived as:

$$r_{(\mathrm{GEBV},Y)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \qquad (3)$$

In formula (3), different values of $r_{(\mathrm{GEBV},Y)}$ are obtained by varying $N_p$, and the curve is then fitted. The fit is done by linearizing the curve; rearranging formula (3) gives the linear equation:

$$\frac{1}{r^2_{(\mathrm{GEBV},Y)}} = \frac{1}{h^2} + \frac{M}{h^4}\cdot\frac{1}{N_p} \qquad (4)$$

This equation is equivalent to the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(\mathrm{GEBV},Y)}$ and $x$ is the reciprocal of $N_p$. The intercept $a$ is the reciprocal of the heritability, so the heritability estimate is obtained as the reciprocal of the intercept, $h^2 = 1/a$.
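The rearrangement from the accuracy formula to the linear form can be written out step by step. The derivation below is a reconstruction from the definitions in the text (y is the reciprocal of the squared accuracy, x is the reciprocal of N_p, and the intercept is the reciprocal of heritability). It assumes the accuracy expression attributed to Daetwyler et al. (2008) and that the correlation with phenotype equals the correlation with true breeding value multiplied by h.

$$
\begin{aligned}
r_{(\mathrm{GEBV},\mathrm{TBV})} &= \sqrt{\frac{N_p h^2}{N_p h^2 + M}}, \\
r_{(\mathrm{GEBV},Y)} &= h \, r_{(\mathrm{GEBV},\mathrm{TBV})}
\;\Longrightarrow\;
r^2_{(\mathrm{GEBV},Y)} = \frac{N_p h^4}{N_p h^2 + M}, \\
\frac{1}{r^2_{(\mathrm{GEBV},Y)}} &= \frac{N_p h^2 + M}{N_p h^4}
= \underbrace{\frac{1}{h^2}}_{a} + \underbrace{\frac{M}{h^4}}_{b} \cdot \frac{1}{N_p}.
\end{aligned}
$$

Under these assumptions the intercept gives $h^2 = 1/a$, and the slope would additionally give $M = b/a^2$, although the text uses only the intercept.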
The genomes of all individuals are sequenced to obtain SNP information, the SNP loci correspond across all individuals, and missing data are filled in by imputation. To prevent a large error from a single estimate, repeated cross-validation is used: the reference population and the estimation population are drawn at random from the whole population multiple times, giving an estimate closer to the true value. Reference populations of different sizes are used with the GBLUP algorithm to calculate the effect of each marker in the genome and so obtain the breeding values of the estimation population, and the estimation accuracy is obtained by correlation analysis between the breeding values and the phenotypic values of the estimation population. This solves the problem that pedigree recording is laborious or even impractical, while accurately capturing the Mendelian sampling error that arises when alleles are transmitted.
Example 1
1. The test subjects were 500 large yellow croakers produced by artificially induced spawning; all fish were born on the same day and were therefore the same age. Measurements were taken when the fish were two years old, and the traits measured were the body weight and body length of every fish.
2. All individuals under study were genotyped by GBS (genotyping-by-sequencing), and qualified SNP loci were selected with the following thresholds: marker loci were retained if MAF > 0.05, the Hardy-Weinberg equilibrium test gave P-value > 0.001, and the missing rate at the locus was below 20%. In total 29,748 qualified SNP markers were retained. Missing genotypes were filled in with the imputation program of Beagle version 3.3.2.
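The two simpler filters in step 2 can be illustrated with a short NumPy sketch. It is an assumption-laden example, not the pipeline used in the patent: the function name `filter_snps`, the 0/1/2 genotype coding and the use of NaN for missing calls are all hypothetical, and the Hardy-Weinberg test and the Beagle imputation step are deliberately left out.

```python
import numpy as np

def filter_snps(genotypes, maf_min=0.05, max_missing=0.20):
    """Return a boolean mask of SNP columns passing the MAF and missing-rate filters.

    genotypes: (individuals x SNPs) array coded as 0/1/2 allele copies,
    with np.nan marking a missing call (coding is an assumption).
    """
    g = np.asarray(genotypes, dtype=float)
    missing_rate = np.isnan(g).mean(axis=0)        # fraction of missing calls per SNP
    called = (~np.isnan(g)).sum(axis=0)            # number of non-missing calls per SNP
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = np.nansum(g, axis=0) / (2.0 * called)  # frequency of the counted allele
    maf = np.minimum(freq, 1.0 - freq)             # minor allele frequency
    return (maf > maf_min) & (missing_rate < max_missing)

# Toy data: 5 individuals, 4 SNPs.
toy = np.array([
    [0, 2, 0, np.nan],
    [1, 2, 0, np.nan],
    [2, 2, 0, 1],
    [1, 2, 1, np.nan],
    [0, 2, 0, 1],
])
mask = filter_snps(toy)
print(mask)
```

In the toy matrix the second SNP is monomorphic and the fourth has 60% missing calls, so only the first and third columns pass.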
3. From the 500 individuals, 20% (100 individuals) were drawn at random as the estimation population. From the remainder, reference populations of four sizes (100, 200, 300 and 400 individuals) were formed in order to observe how estimation accuracy changes with reference population size. For each size, all marker effects were estimated with the GBLUP algorithm, the GEBV of each individual in the estimation population was obtained, and the estimation accuracy $r_{(\mathrm{GEBV},Y)}$ was calculated as the correlation coefficient between the GEBV and the phenotypic values of the estimation population.
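Step 3 can be sketched as follows. Because the marker-effect formula itself is not preserved in this text, the example substitutes ridge regression with a single shrinkage parameter, which is the usual marker-effect formulation when all loci are assumed to share one effect variance. The function names, the centring of genotypes, the value of `lam` and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def marker_effects_ridge(X_ref, y_ref, lam):
    """Estimate marker effects with one common shrinkage parameter `lam`.

    X_ref: (n_ref, n_markers) genotype matrix coded 0/1/2.
    y_ref: (n_ref,) phenotypes of the reference population.
    """
    col_means = X_ref.mean(axis=0)
    Xc = X_ref - col_means          # centre genotypes (an assumption)
    mu = y_ref.mean()               # population mean
    n = Xc.shape[0]
    # Dual ridge form g = Xc' (Xc Xc' + lam I)^-1 (y - mu): an n x n solve,
    # cheaper than a markers x markers solve when markers outnumber individuals.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y_ref - mu)
    return mu, Xc.T @ alpha, col_means

def predict_gebv(X, g, col_means):
    """GEBV as the sum of marker effects weighted by centred genotypes."""
    return (X - col_means) @ g

def accuracy(gebv, y):
    """Pearson correlation between GEBV and phenotype, r(GEBV, Y)."""
    return float(np.corrcoef(gebv, y)[0, 1])

# Simulated data (hypothetical sizes and variances, for illustration only).
rng = np.random.default_rng(1)
n, m = 500, 200
X = rng.integers(0, 3, size=(n, m)).astype(float)
true_g = rng.normal(0.0, 1.0, m)
y = X @ true_g + rng.normal(0.0, 5.0, n)

val, ref = np.arange(100), np.arange(100, 500)  # 100 estimation, 400 reference
mu, g_hat, means = marker_effects_ridge(X[ref], y[ref], lam=25.0)
r_gebv_y = accuracy(predict_gebv(X[val], g_hat, means), y[val])
print(r_gebv_y)
```

Here `lam` is set to the ratio of the simulated residual variance to the simulated marker-effect variance; with real data that ratio is unknown and would have to be estimated.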
To reduce the influence of the error from any single sample, step 3 was repeated 20 times. Because the estimation population and the reference population are drawn at random each time, the result of each repetition differs slightly, but the mean of the 20 results is closer to the true value. The means of the 20 results are shown in Fig. 2.
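The repeated random sampling can be sketched as a loop that redraws the estimation and reference populations and averages the accuracy. This is an illustrative sketch on simulated data: `mean_accuracy` and `ridge_fit_predict` are hypothetical names, and the ridge predictor is a stand-in for the GBLUP step, not the patent's own code.

```python
import numpy as np

def ridge_fit_predict(X_ref, y_ref, X_val, lam=9.0):
    """Stand-in predictor: ridge marker effects, then GEBV for X_val."""
    means = X_ref.mean(axis=0)
    Xc = X_ref - means
    n = Xc.shape[0]
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y_ref - y_ref.mean())
    return (X_val - means) @ (Xc.T @ alpha)

def mean_accuracy(X, y, n_ref, fit_predict, n_val=100, n_rep=20, seed=0):
    """Average r(GEBV, Y) over n_rep random estimation/reference splits."""
    n = len(y)
    if n_val + n_ref > n:
        raise ValueError("n_val + n_ref exceeds the number of individuals")
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_rep):
        perm = rng.permutation(n)
        val = perm[:n_val]                   # estimation population
        ref = perm[n_val:n_val + n_ref]      # reference population of the given size
        pred = fit_predict(X[ref], y[ref], X[val])
        accs.append(np.corrcoef(pred, y[val])[0, 1])
    return float(np.mean(accs))

# Simulated data (hypothetical), 500 individuals as in the example above.
rng = np.random.default_rng(42)
n, m = 500, 50
X = rng.integers(0, 3, size=(n, m)).astype(float)
y = X @ rng.normal(0.0, 1.0, m) + rng.normal(0.0, 3.0, n)
accuracy_by_size = {n_ref: mean_accuracy(X, y, n_ref, ridge_fit_predict)
                    for n_ref in (100, 200, 300, 400)}
print(accuracy_by_size)
```

The resulting dictionary of mean accuracies by reference population size is the input that the linearized regression in step 4 would consume.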
4. The reciprocal of the reference population size ($N_p$) was taken for each level, together with the reciprocal of the square of the mean of the 20 accuracy results ($r_{(\mathrm{GEBV},Y)}$) for each level. The relation between the two is shown in Fig. 3. The final regression equation was fitted according to formula (4), as shown in the following table:
From the results in the table, the heritability estimate is 0.227 for body weight and 0.196 for body length.
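As a consistency check on the reported estimates, the relation between intercept and heritability can be applied in reverse. The regression table is not reproduced in this text, so the intercepts below are values implied by the reported heritabilities under $a = 1/h^2$, not figures taken from the patent.

$$
a_{\text{body weight}} = \frac{1}{0.227} \approx 4.41, \qquad
a_{\text{body length}} = \frac{1}{0.196} \approx 5.10
$$

A fitted intercept near 4.41 would therefore correspond to the body-weight estimate, and one near 5.10 to the body-length estimate.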
The preferred embodiments of this patent have been described in detail above, but the patent is not limited to these embodiments. Within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the purpose of this patent.
Claims (4)
1. An algorithm for assessing heritability from genomic data, wherein, for a given quantitative trait, marker effects across the whole genome are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated; a linearized curve fit is performed between the genomic estimation accuracy and the reference population size, and the reciprocal of the intercept of the fitted regression equation is the heritability estimate; characterized in that the genomic selection procedure uses GBLUP as the algorithm for computing marker effects, the effect variances of all marker loci in the genome are equal, and the marker effects are obtained from formula (1):

[Formula (1) is not preserved in this text.]

where $\mu$ is the population mean and $g$ is the vector of effects of all marker loci; the genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $\mathrm{GEBV} = \sum_i X_i g_i$; the accuracy of the GEBV is obtained as the correlation coefficient between the GEBV and the true breeding value (TBV), i.e. $r_{(\mathrm{GEBV},\mathrm{TBV})}$; when breeding values are estimated with the GBLUP algorithm, another formula for $r_{(\mathrm{GEBV},\mathrm{TBV})}$ is:

$$r_{(\mathrm{GEBV},\mathrm{TBV})} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \qquad (2)$$

where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective chromosome segments that determine the trait; in practice the TBV cannot be known, so the phenotypic value ($Y$) is used in its place, and the relation between GEBV and $Y$ is derived as:

$$r_{(\mathrm{GEBV},Y)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \qquad (3)$$

in formula (3), different values of $r_{(\mathrm{GEBV},Y)}$ are obtained by varying $N_p$, and the curve is then fitted; the fit is done by linearizing the curve, and rearranging formula (3) gives the linear equation:

$$\frac{1}{r^2_{(\mathrm{GEBV},Y)}} = \frac{1}{h^2} + \frac{M}{h^4}\cdot\frac{1}{N_p} \qquad (4)$$

this equation is equivalent to the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(\mathrm{GEBV},Y)}$ and $x$ is the reciprocal of $N_p$; the intercept $a$ is the reciprocal of the heritability, and the heritability estimate is obtained as the reciprocal of the intercept of this equation.
2. The algorithm for assessing heritability from genomic data according to claim 1, characterized in that the genomes of all individuals are sequenced to obtain SNP information, the SNP loci correspond across all individuals, and missing data are filled in by imputation.
3. The algorithm for assessing heritability from genomic data according to claim 1, characterized in that, to prevent a large error from a single estimate, repeated cross-validation is used: the reference population and the estimation population are drawn at random from the whole population multiple times, giving an estimate closer to the true value.
4. The algorithm for assessing heritability from genomic data according to claim 1, characterized in that reference populations of different sizes are used with the GBLUP algorithm to calculate the effect of each marker in the genome and so obtain the breeding values of the estimation population, and the estimation accuracy is obtained by correlation analysis between the breeding values and the phenotypic values of the estimation population.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510873172.4A CN105512510B (en) | 2015-12-03 | 2015-12-03 | A method for assessing heritability from genomic data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510873172.4A CN105512510B (en) | 2015-12-03 | 2015-12-03 | A method for assessing heritability from genomic data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105512510A true CN105512510A (en) | 2016-04-20 |
CN105512510B CN105512510B (en) | 2019-03-08 |
Family
ID=55720487
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510873172.4A Active CN105512510B (en) | 2015-12-03 | 2015-12-03 | A method for assessing heritability from genomic data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105512510B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107338321A (en) * | 2017-08-29 | 2017-11-10 | 集美大学 | A kind of method for determining optimal SNP quantity and its carrying out genome selection and use to large yellow croaker production performance by selection markers |
CN109817281A (en) * | 2019-01-23 | 2019-05-28 | 湖南农业大学 | Estimation method, device and the electronic equipment that genome kind is constituted |
CN111627495A (en) * | 2020-06-01 | 2020-09-04 | 集美大学 | Method for judging species value of population |
CN114410746A (en) * | 2022-03-29 | 2022-04-29 | 中国海洋大学三亚海洋研究院 | Dongxiang spot molecule source-tracing selection breeding method and application thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103914631A (en) * | 2014-02-26 | 2014-07-09 | 中国农业大学 | Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip |
CN103914632A (en) * | 2014-02-26 | 2014-07-09 | 中国农业大学 | Method for rapidly evaluating genome breeding value and application |
-
2015
- 2015-12-03 CN CN201510873172.4A patent/CN105512510B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103914631A (en) * | 2014-02-26 | 2014-07-09 | 中国农业大学 | Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip |
CN103914632A (en) * | 2014-02-26 | 2014-07-09 | 中国农业大学 | Method for rapidly evaluating genome breeding value and application |
Non-Patent Citations (4)
Title |
---|
ANDRES LEGARRA ETAL: ""performance of genomic selection in mice"", 《GENETICS》 * |
M. SHIRALI ETAL: ""A Comparison of the Sensitivity of the BayesC and Genomic Best Linear Unbiased Prediction (GBLUP) Methods of Estimating Genomic Breeding Values under Different Quantitative Trait Locus (QTL) Model Assumptions"", 《IRANIAN JOURNAL OF APPLIED ANIMAL SCIENCE》 * |
冷静的疯子: ""曲线拟合与曲线直线化"", 《HTTP://BLOG.SINA.COM.CN/S/BLOG_6E59E3730100VFMH.HTML》 * |
张勤: "《动物重要经济性状基因的分离与应用》", 28 February 2212, 中国农业大学出版社 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107338321A (en) * | 2017-08-29 | 2017-11-10 | 集美大学 | A kind of method for determining optimal SNP quantity and its carrying out genome selection and use to large yellow croaker production performance by selection markers |
CN107338321B (en) * | 2017-08-29 | 2020-05-19 | 集美大学 | Method for determining optimal SNP (single nucleotide polymorphism) quantity and performing genome selective breeding on production performance of large yellow croaker through screening markers |
CN109817281A (en) * | 2019-01-23 | 2019-05-28 | 湖南农业大学 | Estimation method, device and the electronic equipment that genome kind is constituted |
CN109817281B (en) * | 2019-01-23 | 2022-12-23 | 湖南农业大学 | Method and device for estimating genome variety composition, and electronic device |
CN111627495A (en) * | 2020-06-01 | 2020-09-04 | 集美大学 | Method for judging species value of population |
CN111627495B (en) * | 2020-06-01 | 2023-03-14 | 集美大学 | Method for judging species value of population |
CN114410746A (en) * | 2022-03-29 | 2022-04-29 | 中国海洋大学三亚海洋研究院 | Dongxiang spot molecule source-tracing selection breeding method and application thereof |
CN114410746B (en) * | 2022-03-29 | 2022-07-12 | 中国海洋大学三亚海洋研究院 | Dongxiang spot molecule source-tracing selection breeding method and application thereof |
Also Published As
Publication number | Publication date |
---|---|
CN105512510B (en) | 2019-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Montesinos-López et al. | A genomic Bayesian multi-trait and multi-environment model | |
Hall et al. | A practical toolbox for design and analysis of landscape genetics studies | |
Prunier et al. | Optimizing the trade‐off between spatial and genetic sampling efforts in patchy populations: Towards a better assessment of functional connectivity using an individual‐based sampling scheme | |
Legarra et al. | Improved Lasso for genomic selection | |
Taylor | Implementation and accuracy of genomic selection | |
CN105512510A (en) | Algorithm for assessing heritability through genome data | |
Twyford et al. | Multi-level patterns of genetic structure and isolation by distance in the widespread plant Mimulus guttatus | |
Hvilsom et al. | Understanding geographic origins and history of admixture among chimpanzees in European zoos, with implications for future breeding programmes | |
Bothwell et al. | Identifying genetic signatures of selection in a non-model species, alpine gentian (Gentiana nivalis L.), using a landscape genetic approach | |
US20110296753A1 (en) | Methods and compositions for predicting unobserved phenotypes (pup) | |
CN105868584A (en) | Method for performing whole genome selective breeding by selecting extreme character individual | |
Flanagan et al. | Population genomics reveals multiple drivers of population differentiation in a sex‐role‐reversed pipefish | |
Ye et al. | Phylogeography of Schisandra chinensis (Magnoliaceae) reveal multiple refugia with ample gene flow in Northeast China | |
Maenhout et al. | Prediction of maize single-cross hybrid performance: support vector machine regression versus best linear prediction | |
Hernández-Velasco et al. | Spatial genetic structure in four Pinus species in the Sierra Madre Occidental, Durango, México | |
Mäntysaari et al. | Use of bivariate EBV-DGV model to combine genomic and conventional breeding value evaluations | |
CN105740649A (en) | Multi-character correlation analysis method based on mixed linear model | |
CN108197435A (en) | Localization method between a kind of multiple characters multi-region for containing error based on marker site genotype | |
Chybicki et al. | Isolation-by-distance within naturally established populations of European beech (Fagus sylvatica) | |
Wang et al. | Molecular phylogeography and historical demography of a widespread herbaceous species from eastern North America, Podophyllum peltatum | |
Legarra et al. | Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation–maximization maximum likelihood and increase of relationships | |
Clark et al. | Chloroplast DNA phylogeography in long-lived Huon pine, a Tasmanian rain forest conifer | |
CN108090325B (en) | Method for analyzing single cell sequencing data by applying beta-stability | |
Vandergast | Incorporating genetic sampling in long-term monitoring and adaptive management in the San Diego County Management Strategic Plan Area, Southern California | |
Degner | Genomics of adaptation in interior spruce to past, present, and future climates of western Canada |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||