CN105512510A - Algorithm for assessing heritability through genome data - Google Patents

Publication number
CN105512510A
Authority
CN
China
Prior art keywords
algorithm
genome
value
gebv
heritability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510873172.4A
Other languages
Chinese (zh)
Other versions
CN105512510B (en)
Inventor
肖世俊
董林松
王志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jimei University
Original Assignee
Jimei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jimei University
Priority to CN201510873172.4A
Publication of CN105512510A
Application granted
Publication of CN105512510B
Legal status: Active

Links

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an algorithm for assessing heritability from genomic data. For a given quantitative trait, the GBLUP algorithm is used to estimate genome-wide marker effects with reference populations of different sizes; the breeding values of an estimation population are then obtained and the estimation accuracy is calculated. A linearized curve fit is performed on the genomic estimation accuracy against the reference population size, and the reciprocal of the intercept of the fitted regression equation is taken as the heritability estimate. Because the heritability of the quantitative trait is assessed from genomic data, the results can be applied directly to quantitative-trait breeding in animals and plants. No pedigree is recorded for the individuals; instead, the genome of each individual is sequenced and the heritability of the trait is predicted from genome-wide markers. The heritability estimate is intended mainly for future breeding work. In addition, sequencing captures Mendelian sampling error, so it yields more accurate relationship information than recorded pedigree data.

Description

An algorithm for assessing heritability through genomic data
Technical field
The present invention relates to the field of genetic engineering, and specifically to an algorithm for assessing heritability through genomic data.
Background art
Current methods for estimating heritability mainly rely on the relationships between individuals and infer heritability with statistical techniques such as analysis of variance and correlation analysis. These methods require complete pedigree records, but for some species, such as aquatic animals, pedigree recording is very laborious or even infeasible. In addition, traditional methods treat genomic information as a "black box", so they cannot capture how genes are actually transmitted from parents to offspring; that is, they cannot accurately capture Mendelian sampling error, which leads to large estimation errors. To address the heavy workload of pedigree recording and the inability to capture Mendelian sampling error accurately in conventional heritability estimation, the prior art needs to be improved.
Summary of the invention
The object of the present invention is to provide an algorithm for assessing heritability through genomic data that overcomes the large errors and laborious pedigree recording of conventional heritability estimation, thereby solving the problems raised in the background art above.
The present invention records no individual pedigrees. Instead, the genomes of all individuals are sequenced directly, and individual performance records are combined with genomic marker information to estimate the accuracy of the genomic breeding values and, from that, the heritability of the trait.
To achieve the above object, the present invention provides the following technical solution:
An algorithm for assessing heritability through genomic data: for a given quantitative trait, genome-wide marker effects are estimated using reference populations with different numbers of individuals, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated. This process is in essence the procedure of genomic selection. The invention adopts GBLUP as the algorithm for calculating marker effects. GBLUP was proposed by Meuwissen et al. in 2001; its prior assumes that the effect variances of all marker loci in the genome are equal. The marker effects are obtained from the following equation:
$$\begin{bmatrix} 1_n' 1_n & 1_n' X \\ X' 1_n & X'X + I\lambda \end{bmatrix} \begin{bmatrix} \hat{\mu} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} 1_n' y \\ X' y \end{bmatrix} \tag{1}$$
where $\hat{\mu}$ is the population mean and $\hat{g}$ is the vector of effects of all marker loci. The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum X_i g_i$. The accuracy of the GEBV is the correlation coefficient between the GEBV and the true breeding value (TBV), i.e. $r_{(GEBV,TBV)}$. In addition, Daetwyler et al. derived in 2008 another formula for $r_{(GEBV,TBV)}$ when breeding values are estimated with GBLUP:
$$r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \tag{2}$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective genome segments determining the trait. In practice, however, the TBV cannot be known, so the phenotypic value (Y) is used in its place, and the relationship between GEBV and Y is derived as:
$$r_{(GEBV,Y)} = r_{(GEBV,TBV)} \cdot h = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \cdot h \tag{3}$$
In formula (3), different values of $r_{(GEBV,Y)}$ are obtained by varying $N_p$, and the curve is then fitted. The fit is done by linearizing the curve; rearranging formula (3) gives the linear equation:
$$\frac{1}{r_{(GEBV,Y)}^2} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \tag{4}$$
This equation has the form of the linear regression model y = a + bx, where y is the reciprocal of the square of $r_{(GEBV,Y)}$ and x is the reciprocal of $N_p$. The intercept a is the reciprocal of the heritability, so taking the reciprocal of the intercept gives the heritability estimate.
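The rearrangement from formula (3) to formula (4) is stated without intermediate steps. A short derivation, using only the symbols already defined, makes the roles of the intercept and slope explicit. The last line, which recovers M from the fitted coefficients, is an added observation rather than a step stated in the patent.

$$r_{(GEBV,Y)}^2 = \frac{N_p h^2}{N_p h^2 + M} \cdot h^2 = \frac{N_p h^4}{N_p h^2 + M}$$

$$\frac{1}{r_{(GEBV,Y)}^2} = \frac{N_p h^2 + M}{N_p h^4} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p}$$

$$a = \frac{1}{h^2}, \qquad b = \frac{M}{h^4} \quad\Longrightarrow\quad h^2 = \frac{1}{a}, \qquad M = \frac{b}{a^2}$$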
As a further aspect of the invention: the genomes of all individuals are sequenced to obtain SNP information, the SNP loci are matched across all individuals, and missing data are filled in by imputation.
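Before SNP information is used, loci are usually screened for quality; Embodiment 1 of this patent retains loci with MAF > 0.05, a Hardy-Weinberg test P-value > 0.001, and a missing rate below 20%. The following is a minimal sketch of such a filter, not the authors' pipeline. The function names, the 0/1/2 genotype coding with `None` for missing calls, and the use of a one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium are all assumptions made for illustration.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(genotypes, maf_min=0.05, hwe_p_min=0.001, max_missing=0.20):
    """Return True if one locus passes the three thresholds.

    genotypes: list of 0, 1, 2 (allele counts) or None (missing call).
    The default thresholds follow the values quoted in Embodiment 1.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if not called or missing_rate >= max_missing:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]
    alt_freq = (counts[1] + 2 * counts[2]) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)      # minor allele frequency
    if maf <= maf_min:
        return False
    return hwe_p_value(*counts) > hwe_p_min
```

Loci that pass the filter would then go on to imputation of the remaining missing calls, which this sketch does not attempt.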
As a further aspect of the invention: to prevent a large error from any single estimate, repeated cross-validation is used; the reference population and the estimation population are repeatedly drawn at random from the whole population, giving an estimate closer to the true value.
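The repeated random sampling described above (a form of cross-validation) can be sketched as follows. This is an illustrative outline under stated assumptions: `random_split` and `mean_accuracy` are hypothetical helper names, individuals are identified by integer index, and `accuracy_fn` stands for whatever routine computes the accuracy for one split.

```python
import random

def random_split(n_total, n_validation, n_reference, rng):
    """Return disjoint (reference, validation) index lists drawn at random."""
    if n_validation + n_reference > n_total:
        raise ValueError("reference and validation sets exceed the population size")
    shuffled = rng.sample(range(n_total), n_total)   # random permutation of all individuals
    validation = shuffled[:n_validation]
    reference = shuffled[n_validation:n_validation + n_reference]
    return reference, validation

def mean_accuracy(accuracy_fn, n_total, n_validation, n_reference, n_repeats, seed=0):
    """Average accuracy_fn(reference, validation) over repeated random splits."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_repeats):
        reference, validation = random_split(n_total, n_validation, n_reference, rng)
        values.append(accuracy_fn(reference, validation))
    return sum(values) / len(values)
```

Drawing the estimation population first and the reference population from the remainder keeps the two sets disjoint in every repetition, so no individual contributes to both the marker-effect estimate and the accuracy it is scored against.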
As a further aspect of the invention: reference populations of different sizes are used with the GBLUP algorithm to calculate the effect of each marker in the genome and thereby obtain the breeding values of the estimation population; the estimation accuracy is obtained by correlating the breeding values of the estimation population with their phenotypic values.
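A minimal numerical sketch of this step, assuming NumPy is available, is given below. It solves the system in formula (1) for the mean and the marker effects, sums marker effects into GEBVs, and scores them by Pearson correlation with phenotypes. The function names are hypothetical, and because the patent does not say how λ is chosen, it is treated here as a user-supplied shrinkage parameter.

```python
import numpy as np

def solve_marker_effects(X, y, lam):
    """Solve formula (1) for the mean mu_hat and the marker effects g_hat.

    X: (n, m) genotype matrix of the reference population.
    y: (n,) phenotypic values of the reference population.
    lam: shrinkage parameter lambda (assumed to be supplied by the user).
    """
    n, m = X.shape
    ones = np.ones(n)
    lhs = np.empty((m + 1, m + 1))
    lhs[0, 0] = n                               # 1'1
    lhs[0, 1:] = ones @ X                       # 1'X
    lhs[1:, 0] = X.T @ ones                     # X'1
    lhs[1:, 1:] = X.T @ X + lam * np.eye(m)     # X'X + I*lambda
    rhs = np.concatenate(([ones @ y], X.T @ y))
    solution = np.linalg.solve(lhs, rhs)
    return solution[0], solution[1:]

def estimation_accuracy(gebv, phenotypes):
    """Pearson correlation between GEBVs and phenotypic values."""
    return np.corrcoef(gebv, phenotypes)[0, 1]
```

For an estimation population with genotype matrix `X_val`, the GEBVs are `X_val @ g_hat`, which is the sum of marker effects weighted by genotype.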
Compared with the prior art, the beneficial effects of the present invention are as follows. The invention assesses the heritability of a quantitative trait from genomic data, and the results can be applied directly to quantitative-trait breeding in animals and plants. The algorithm does not need to be built on family structure: the heritability of the trait is predicted from genome-wide markers, which removes the problem of pedigree recording being laborious or even infeasible. Moreover, because sequencing captures Mendelian sampling error, the algorithm obtains more accurate relationship information than recorded pedigree data.
Brief description of the drawings
Fig. 1 is a flow chart of the algorithm of the present invention.
Fig. 2 shows how GEBV accuracy changes with reference population size for the two traits body weight and body length.
Fig. 3 shows the GEBV accuracy and reference population size for body weight and body length after transformation according to formula (4).
Here, the horizontal axis is the reciprocal of the number of individuals in the reference population, the vertical axis is the reciprocal of the squared GEBV accuracy, and R² is the coefficient of determination of the regression equation.
Detailed description of embodiments
The technical solution of this patent is described in more detail below with reference to specific embodiments.
Referring to Figs. 1 to 3, an algorithm for assessing heritability through genomic data: for a given quantitative trait, genome-wide marker effects are estimated using reference populations with different numbers of individuals, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated. A linearized curve fit is performed on the genomic estimation accuracy against the reference population size, and the reciprocal of the intercept of the fitted regression equation is the heritability estimate. The algorithm is characterized in that the genomic selection procedure adopts GBLUP as the algorithm for calculating marker effects, the effect variances of all marker loci in the genome are assumed equal, and the marker effects are obtained from the following equation:
$$\begin{bmatrix} 1_n' 1_n & 1_n' X \\ X' 1_n & X'X + I\lambda \end{bmatrix} \begin{bmatrix} \hat{\mu} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} 1_n' y \\ X' y \end{bmatrix} \tag{1}$$
where $\hat{\mu}$ is the population mean and $\hat{g}$ is the vector of effects of all marker loci. The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum X_i g_i$. The accuracy of the GEBV is obtained as the correlation coefficient between the GEBV and the true breeding value (TBV), i.e. $r_{(GEBV,TBV)}$. When breeding values are estimated with GBLUP, another formula for $r_{(GEBV,TBV)}$ is:
$$r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \tag{2}$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective genome segments determining the trait. In practice, the TBV cannot be known, so the phenotypic value (Y) is used in its place, and the relationship between GEBV and Y is derived as:
$$r_{(GEBV,Y)} = r_{(GEBV,TBV)} \cdot h = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \cdot h \tag{3}$$
In formula (3), different values of $r_{(GEBV,Y)}$ are obtained by varying $N_p$, and the curve is then fitted. The fit is done by linearizing the curve; rearranging formula (3) gives the linear equation:
$$\frac{1}{r_{(GEBV,Y)}^2} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \tag{4}$$
This equation has the form of the linear regression model y = a + bx, where y is the reciprocal of the square of $r_{(GEBV,Y)}$ and x is the reciprocal of $N_p$. The intercept a is the reciprocal of the heritability, so taking the reciprocal of the intercept gives the heritability estimate.
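The linearized fit can be sketched in a few lines of plain Python. This is an illustrative ordinary-least-squares fit, not the authors' code; `fit_heritability` and `expected_accuracy` are hypothetical names, and recovering M from the slope is an added step that follows from b = M / h^4 in formula (4).

```python
def fit_heritability(ref_sizes, accuracies):
    """Fit 1/r^2 = a + b * (1/N_p) by ordinary least squares.

    ref_sizes: reference population sizes N_p.
    accuracies: mean r(GEBV, Y) observed at each size.
    Returns (h2, m, a, b) with h2 = 1/a and m = b * h2**2.
    """
    if len(ref_sizes) != len(accuracies) or len(ref_sizes) < 2:
        raise ValueError("need at least two (N_p, accuracy) pairs")
    xs = [1.0 / n for n in ref_sizes]            # x = 1 / N_p
    ys = [1.0 / (r * r) for r in accuracies]     # y = 1 / r^2
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    h2 = 1.0 / intercept                         # heritability = 1 / a
    m = slope * h2 * h2                          # from b = M / h^4
    return h2, m, intercept, slope

def expected_accuracy(n_ref, h2, m):
    """Formula (3): r(GEBV, Y) = sqrt(N_p h^2 / (N_p h^2 + M)) * h."""
    return (n_ref * h2 / (n_ref * h2 + m)) ** 0.5 * h2 ** 0.5
```

Feeding `fit_heritability` with accuracies generated by `expected_accuracy` for known values of h² and M returns those values, which is a quick way to check the transformation before applying it to real accuracy estimates.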
The genomes of all individuals are sequenced to obtain SNP information, the SNP loci are matched across all individuals, and missing data are filled in by imputation. To prevent a large error from any single estimate, repeated cross-validation is used: the reference population and the estimation population are repeatedly drawn at random from the whole population, giving an estimate closer to the true value. Reference populations of different sizes are used with the GBLUP algorithm to calculate the effect of each marker in the genome and thereby obtain the breeding values of the estimation population, and the estimation accuracy is obtained by analyzing the breeding values of the estimation population against their phenotypic values. This removes the problem of pedigree recording being laborious or even infeasible, while accurately capturing the Mendelian sampling error that arises as alleles are transmitted.
Embodiment 1
1. The test subjects were 500 large yellow croakers. Artificial spawning induction was used so that all fish were born on the same day and were therefore of the same age. The traits were measured when the fish were two years old; the measured traits were the body weight and body length of all fish.
2. All individuals under study were sequenced using GBS (genotyping-by-sequencing), and qualified SNP loci were screened with the following parameters: marker loci with MAF > 0.05, a Hardy-Weinberg equilibrium test P-value > 0.001, and a per-locus missing rate below 20% were retained. In total, 29,748 qualified SNP markers were obtained. Missing genotypes were filled in with the imputation program of Beagle version 3.3.2.
3. From the 500 individuals, 20% (100 individuals) were drawn at random as the estimation population. The remaining individuals were used as the reference population at four levels of 100, 200, 300 and 400 individuals, in order to observe how the estimation accuracy changes with reference population size. At each level, all marker effects were estimated with the GBLUP algorithm and the GEBV of each individual in the estimation population was obtained; the estimation accuracy, r(GEBV, Y), was obtained as the correlation coefficient between the GEBV and the phenotypic values of the estimation population.
To reduce the influence of sampling error in any single draw, step 3 was repeated 20 times. Because the individuals of the estimation and reference populations are drawn at random each time, the results of the repetitions differ slightly, but the mean of the 20 results is closer to the true result. The means over the 20 repetitions are shown in Fig. 2.
4. The reciprocal of the reference population size (N_p) was taken at each level, together with the reciprocal of the square of the mean of the 20 accuracy estimates, r(GEBV, Y), at each level. The relationship between the two is shown in Fig. 3. The final regression equations were fitted according to formula (4), as shown in the following table:
According to the results in the table, the heritability estimate is 0.227 for body weight and 0.196 for body length.
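As a check on the reciprocal relation in formula (4), the two reported estimates imply the regression intercepts below. These values are back-calculated from the heritability estimates alone and are not taken from the patent's table; the subscripts are labels introduced here.

$$a_{\text{weight}} = \frac{1}{h^2_{\text{weight}}} = \frac{1}{0.227} \approx 4.41, \qquad a_{\text{length}} = \frac{1}{h^2_{\text{length}}} = \frac{1}{0.196} \approx 5.10$$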
The preferred embodiments of this patent have been described in detail above, but this patent is not limited to them; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the purpose of this patent.

Claims (4)

1. An algorithm for assessing heritability through genomic data, in which, for a given quantitative trait, genome-wide marker effects are estimated using reference populations with different numbers of individuals, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated; a linearized curve fit is performed on the genomic estimation accuracy against the reference population size, and the reciprocal of the intercept of the fitted regression equation is the heritability estimate; characterized in that the genomic selection procedure adopts GBLUP as the algorithm for calculating marker effects, the effect variances of all marker loci in the genome are equal, and the marker effects are obtained from the following equation:
$$\begin{bmatrix} 1_n' 1_n & 1_n' X \\ X' 1_n & X'X + I\lambda \end{bmatrix} \begin{bmatrix} \hat{\mu} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} 1_n' y \\ X' y \end{bmatrix} \tag{1}$$
where $\hat{\mu}$ is the population mean and $\hat{g}$ is the vector of effects of all marker loci; the genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum X_i g_i$; the accuracy of the GEBV is obtained as the correlation coefficient between the GEBV and the true breeding value (TBV), i.e. $r_{(GEBV,TBV)}$; when breeding values are estimated with GBLUP, another formula for $r_{(GEBV,TBV)}$ is:
$$r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \tag{2}$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective genome segments determining the trait; in practice, the TBV cannot be known, so the phenotypic value (Y) is used in its place, and the relationship between GEBV and Y is derived as:
$$r_{(GEBV,Y)} = r_{(GEBV,TBV)} \cdot h = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \cdot h \tag{3}$$
in formula (3), different values of $r_{(GEBV,Y)}$ are obtained by varying $N_p$, and the curve is then fitted; the fit is done by linearizing the curve, and rearranging formula (3) gives the linear equation:
$$\frac{1}{r_{(GEBV,Y)}^2} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \tag{4}$$
this equation has the form of the linear regression model y = a + bx, where y is the reciprocal of the square of $r_{(GEBV,Y)}$ and x is the reciprocal of $N_p$; the intercept a is the reciprocal of the heritability, so taking the reciprocal of the intercept gives the heritability estimate.
2. The algorithm for assessing heritability through genomic data according to claim 1, characterized in that the genomes of all individuals are sequenced to obtain SNP information, the SNP loci are matched across all individuals, and missing data are filled in by imputation.
3. The algorithm for assessing heritability through genomic data according to claim 1, characterized in that, to prevent a large error from any single estimate, repeated cross-validation is used, in which the reference population and the estimation population are repeatedly drawn at random from the whole population, giving an estimate closer to the true value.
4. The algorithm for assessing heritability through genomic data according to claim 1, characterized in that reference populations of different sizes are used with the GBLUP algorithm to calculate the effect of each marker in the genome and thereby obtain the breeding values of the estimation population, and the estimation accuracy is obtained by correlating the breeding values of the estimation population with their phenotypic values.
CN201510873172.4A 2015-12-03 2015-12-03 A method for assessing heritability through genomic data Active CN105512510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510873172.4A CN105512510B (en) 2015-12-03 2015-12-03 A method of genetic force is assessed by genomic data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510873172.4A CN105512510B (en) 2015-12-03 2015-12-03 A method of genetic force is assessed by genomic data

Publications (2)

Publication Number Publication Date
CN105512510A true CN105512510A (en) 2016-04-20
CN105512510B CN105512510B (en) 2019-03-08

Family

ID=55720487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510873172.4A Active CN105512510B (en) 2015-12-03 2015-12-03 A method for assessing heritability through genomic data

Country Status (1)

Country Link
CN (1) CN105512510B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107338321A (en) * 2017-08-29 2017-11-10 集美大学 A kind of method for determining optimal SNP quantity and its carrying out genome selection and use to large yellow croaker production performance by selection markers
CN109817281A (en) * 2019-01-23 2019-05-28 湖南农业大学 Estimation method, device and the electronic equipment that genome kind is constituted
CN111627495A (en) * 2020-06-01 2020-09-04 集美大学 Method for judging species value of population
CN114410746A (en) * 2022-03-29 2022-04-29 中国海洋大学三亚海洋研究院 Dongxiang spot molecule source-tracing selection breeding method and application thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914631A (en) * 2014-02-26 2014-07-09 中国农业大学 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip
CN103914632A (en) * 2014-02-26 2014-07-09 中国农业大学 Method for rapidly evaluating genome breeding value and application

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914631A (en) * 2014-02-26 2014-07-09 中国农业大学 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip
CN103914632A (en) * 2014-02-26 2014-07-09 中国农业大学 Method for rapidly evaluating genome breeding value and application

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDRES LEGARRA ETAL: ""performance of genomic selection in mice"", 《GENETICS》 *
M. SHIRALI ETAL: ""A Comparison of the Sensitivity of the BayesC and Genomic Best Linear Unbiased Prediction (GBLUP) Methods of Estimating Genomic Breeding Values under Different Quantitative Trait Locus (QTL) Model Assumptions"", 《IRANIAN JOURNAL OF APPLIED ANIMAL SCIENCE》 *
冷静的疯子: ""曲线拟合与曲线直线化"", 《HTTP://BLOG.SINA.COM.CN/S/BLOG_6E59E3730100VFMH.HTML》 *
张勤: "《动物重要经济性状基因的分离与应用》", 28 February 2212, 中国农业大学出版社 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107338321A (en) * 2017-08-29 2017-11-10 集美大学 A kind of method for determining optimal SNP quantity and its carrying out genome selection and use to large yellow croaker production performance by selection markers
CN107338321B (en) * 2017-08-29 2020-05-19 集美大学 Method for determining optimal SNP (single nucleotide polymorphism) quantity and performing genome selective breeding on production performance of large yellow croaker through screening markers
CN109817281A (en) * 2019-01-23 2019-05-28 湖南农业大学 Estimation method, device and the electronic equipment that genome kind is constituted
CN109817281B (en) * 2019-01-23 2022-12-23 湖南农业大学 Method and device for estimating genome variety composition, and electronic device
CN111627495A (en) * 2020-06-01 2020-09-04 集美大学 Method for judging species value of population
CN111627495B (en) * 2020-06-01 2023-03-14 集美大学 Method for judging species value of population
CN114410746A (en) * 2022-03-29 2022-04-29 中国海洋大学三亚海洋研究院 Dongxiang spot molecule source-tracing selection breeding method and application thereof
CN114410746B (en) * 2022-03-29 2022-07-12 中国海洋大学三亚海洋研究院 Dongxiang spot molecule source-tracing selection breeding method and application thereof

Also Published As

Publication number Publication date
CN105512510B (en) 2019-03-08

Similar Documents

Publication Publication Date Title
Montesinos-López et al. A genomic Bayesian multi-trait and multi-environment model
Hall et al. A practical toolbox for design and analysis of landscape genetics studies
Prunier et al. Optimizing the trade‐off between spatial and genetic sampling efforts in patchy populations: Towards a better assessment of functional connectivity using an individual‐based sampling scheme
Legarra et al. Improved Lasso for genomic selection
Taylor Implementation and accuracy of genomic selection
CN105512510A (en) Algorithm for assessing heritability through genome data
Twyford et al. Multi-level patterns of genetic structure and isolation by distance in the widespread plant Mimulus guttatus
Hvilsom et al. Understanding geographic origins and history of admixture among chimpanzees in European zoos, with implications for future breeding programmes
Bothwell et al. Identifying genetic signatures of selection in a non-model species, alpine gentian (Gentiana nivalis L.), using a landscape genetic approach
US20110296753A1 (en) Methods and compositions for predicting unobserved phenotypes (pup)
CN105868584A (en) Method for performing whole genome selective breeding by selecting extreme character individual
Flanagan et al. Population genomics reveals multiple drivers of population differentiation in a sex‐role‐reversed pipefish
Ye et al. Phylogeography of Schisandra chinensis (Magnoliaceae) reveal multiple refugia with ample gene flow in Northeast China
Maenhout et al. Prediction of maize single-cross hybrid performance: support vector machine regression versus best linear prediction
Hernández-Velasco et al. Spatial genetic structure in four Pinus species in the Sierra Madre Occidental, Durango, México
Mäntysaari et al. Use of bivariate EBV-DGV model to combine genomic and conventional breeding value evaluations
CN105740649A (en) Multi-character correlation analysis method based on mixed linear model
CN108197435A (en) Localization method between a kind of multiple characters multi-region for containing error based on marker site genotype
Chybicki et al. Isolation-by-distance within naturally established populations of European beech (Fagus sylvatica)
Wang et al. Molecular phylogeography and historical demography of a widespread herbaceous species from eastern North America, Podophyllum peltatum
Legarra et al. Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation–maximization maximum likelihood and increase of relationships
Clark et al. Chloroplast DNA phylogeography in long-lived Huon pine, a Tasmanian rain forest conifer
CN108090325B (en) Method for analyzing single cell sequencing data by applying beta-stability
Vandergast Incorporating genetic sampling in long-term monitoring and adaptive management in the San Diego County Management Strategic Plan Area, Southern California
Degner Genomics of adaptation in interior spruce to past, present, and future climates of western Canada

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant