CN105512510B - A method for assessing heritability from genomic data - Google Patents
- Publication number: CN105512510B
- Application number: CN201510873172.4A
- Authority: CN (China)
- Prior art keywords: estimation, value, group, GEBV, heritability
- Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The invention discloses a method for assessing heritability from genomic data. For a given quantitative trait, genome-wide marker effects are estimated with the GBLUP algorithm using reference populations of different sizes; the breeding values of the estimation population are then obtained and the estimation accuracy is calculated. The relationship between genomic estimation accuracy and reference population size is fitted by curve linearization, and the reciprocal of the intercept of the fitted regression equation is the estimate of heritability. Because the invention assesses the heritability of quantitative traits from genomic data, its results can be applied directly to breeding for quantitative traits in animals and plants. The algorithm keeps no pedigree records; instead, the genome of each individual is sequenced and the heritability of the trait is predicted from genome-wide markers. The heritability estimates are intended mainly for future breeding work. In addition, sequencing can capture the Mendelian sampling error, so it yields more accurate relationship information than recorded pedigree data.
Description
Technical field
The present invention relates to the field of genetic engineering, and specifically to a method for assessing heritability from genomic data.
Background art
Current methods for assessing heritability rely mainly on the kinship between individuals and infer heritability with statistical tools such as analysis of variance and correlation analysis. These methods require complete pedigree records, yet for some species, such as aquatic animals, pedigree recording is extremely laborious or even impracticable. In addition, traditional methods treat genomic information as a "black box" and therefore cannot capture the specific information on how genes are transmitted from parents to offspring; that is, they cannot accurately capture the Mendelian sampling error, which leads to large estimation errors. To address the heavy workload of pedigree recording and the inability to capture the Mendelian sampling error accurately in traditional heritability estimation, the prior art needs to be improved.
Summary of the invention
The purpose of the present invention is to provide an algorithm for assessing heritability from genomic data that overcomes the large errors and cumbersome pedigree recording of traditional heritability estimation, thereby solving the problems raised in the background above.
The invention requires no pedigree records. The genomes of all individuals are sequenced directly, and the individuals' performance records are combined with genomic marker information to estimate the accuracy of the genomic breeding values and, from it, the heritability of the trait.
To achieve the above purpose, the present invention provides the following technical solution:
An algorithm for assessing heritability from genomic data: for a given quantitative trait, genome-wide marker effects are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated. This process is in fact the procedure of genomic selection. The invention uses GBLUP to calculate the marker effects. The GBLUP algorithm, proposed by Meuwissen et al. in 2001, assumes a priori that the effect variances of all marker loci in the genome are equal. Marker effects are calculated from the following formula:
$$\mathbf{y} = \mu\mathbf{1} + \sum_{i} X_i g_i + e \qquad (1)$$
where $\mathbf{y}$ is the vector of phenotypic values, $\mu$ is the population mean, $g_i$ is the effect of marker locus $i$ (together forming the effect vector of all marker loci), $X_i$ is the vector of genotypes at that locus, and $e$ is the residual. The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum_i X_i g_i$. The accuracy of the GEBV is the correlation coefficient between the GEBV and the true breeding value (TBV), written $r_{(GEBV,\,TBV)}$. Daetwyler et al. (2008) derived a further expression for $r_{(GEBV,\,TBV)}$ when breeding values are estimated with GBLUP:
$$r_{(GEBV,\,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \qquad (2)$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective genomic segments that determine the trait. In practice the TBV cannot be known, so it is replaced by the phenotypic value, denoted $Y$. Because the correlation between TBV and $Y$ is $h$, the relationship between GEBV and $Y$ is:
$$r_{(GEBV,\,Y)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \qquad (3)$$
In formula (3), different values of $r_{(GEBV,\,Y)}$ are obtained by varying $N_p$, and the curve is fitted to them. The fit uses curve linearization: squaring formula (3) and taking reciprocals gives the linear equation:
$$\frac{1}{r_{(GEBV,\,Y)}^2} = \frac{1}{h^2} + \frac{M}{h^4}\cdot\frac{1}{N_p} \qquad (4)$$
This equation has the form of the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(GEBV,\,Y)}$ and $x$ is the reciprocal of $N_p$. The intercept $a$ is the reciprocal of the heritability, so the heritability estimate is the reciprocal of the fitted intercept.
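The linearized fit can be illustrated with a minimal Python sketch. It is not the patent's own implementation: the function names are hypothetical, the fit is plain ordinary least squares, and the accuracies are synthetic values generated from formula (3) with assumed parameters (h^2 = 0.25, M = 200), so the sketch only checks that the regression of 1/r^2 on 1/N_p recovers them.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def heritability_from_accuracy(ref_sizes, accuracies):
    """Estimate h^2 from accuracies r(GEBV, Y) observed at several N_p."""
    x = [1.0 / n for n in ref_sizes]          # x = 1 / N_p
    y = [1.0 / (r * r) for r in accuracies]   # y = 1 / r^2
    intercept, slope = fit_line(x, y)
    h2 = 1.0 / intercept                      # intercept a = 1 / h^2
    m_eff = slope * h2 * h2                   # slope b = M / h^4
    return h2, m_eff

# Synthetic accuracies from formula (3); the parameter values are assumptions.
true_h2, true_m = 0.25, 200.0
sizes = [100, 200, 300, 400]
acc = [math.sqrt(n * true_h2 ** 2 / (n * true_h2 + true_m)) for n in sizes]
h2_hat, m_hat = heritability_from_accuracy(sizes, acc)
print(round(h2_hat, 4), round(m_hat, 1))
```

With real data the accuracies are noisy averages, so the fitted intercept carries sampling error and a coefficient of determination is worth reporting alongside the estimate.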
As a further aspect of the invention, the genomes of all individuals are sequenced to obtain SNP information; the SNP loci correspond across all individuals, and missing data are filled in by imputation.
As a further aspect of the invention, to guard against a large error in any single estimate, repeated cross-validation is used: the reference population and the estimation population are drawn at random from the whole population multiple times, so that the estimate approaches the true value.
As a further aspect of the invention, the effect of every marker in the genome is calculated with the GBLUP algorithm using reference populations of different sizes, giving the breeding values of the estimation population; the estimation accuracy is obtained by correlating the breeding values of the estimation population with their phenotypic values.
Compared with the prior art, the beneficial effects of the invention are as follows. The invention assesses the heritability of quantitative traits from genomic data, and its results can be applied directly to breeding for quantitative traits in animals and plants. The algorithm predicts the heritability of a trait from genome-wide markers without establishing families, which solves the problem that pedigree recording is cumbersome or even impracticable. Because sequencing can capture the Mendelian sampling error, the algorithm obtains more accurate relationship information than recorded pedigree data.
Brief description of the drawings
Fig. 1 is a flow chart of the algorithm of the invention.
Fig. 2 shows how the GEBV accuracy for the two traits, body weight and body length, changes with reference population size.
Fig. 3 shows the relationship between the GEBV accuracy for body weight and body length and the reference population size after transformation according to formula (4). The abscissa is the reciprocal of the number of individuals in the reference population; the ordinate is the reciprocal of the squared GEBV accuracy; R² is the coefficient of determination of the regression equation.
Detailed description of the embodiments
The technical solution of this patent is described in further detail below with reference to specific embodiments.
Referring to Figs. 1-3, an algorithm for assessing heritability from genomic data: for a given quantitative trait, genome-wide marker effects are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated; the relationship between genomic estimation accuracy and reference population size is fitted by curve linearization, and the reciprocal of the intercept of the fitted regression equation is the estimate of heritability. The algorithm is characterized in that the genomic selection procedure uses GBLUP to calculate the marker effects, the effect variances of all marker loci in the genome are equal, and the marker effects are calculated from the following formula:
$$\mathbf{y} = \mu\mathbf{1} + \sum_{i} X_i g_i + e \qquad (1)$$
where $\mathbf{y}$ is the vector of phenotypic values, $\mu$ is the population mean, $g_i$ is the effect of marker locus $i$ (together forming the effect vector of all marker loci), $X_i$ is the vector of genotypes at that locus, and $e$ is the residual. The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum_i X_i g_i$. The accuracy of the GEBV is the correlation coefficient between the GEBV and the true breeding value (TBV), written $r_{(GEBV,\,TBV)}$. When breeding values are estimated with GBLUP, $r_{(GEBV,\,TBV)}$ can also be calculated as:
$$r_{(GEBV,\,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \qquad (2)$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait under study, and $M$ is the number of effective genomic segments that determine the trait. In practice the TBV cannot be known, so it is replaced by the phenotypic value, denoted $Y$; the relationship between GEBV and $Y$ is:
$$r_{(GEBV,\,Y)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \qquad (3)$$
In formula (3), different values of $r_{(GEBV,\,Y)}$ are obtained by varying $N_p$, and the curve is fitted to them. The fit uses curve linearization; rearranging formula (3) gives the linear equation:
$$\frac{1}{r_{(GEBV,\,Y)}^2} = \frac{1}{h^2} + \frac{M}{h^4}\cdot\frac{1}{N_p} \qquad (4)$$
This equation has the form of the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(GEBV,\,Y)}$ and $x$ is the reciprocal of $N_p$. The intercept $a$ is the reciprocal of the heritability, so the heritability estimate is the reciprocal of the fitted intercept.
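The marker-effect step can be sketched in Python as ridge regression on marker genotypes (often called SNP-BLUP), which under the equal-variance assumption yields the same GEBV as GBLUP built on the corresponding genomic relationship matrix. This is a sketch under stated assumptions rather than the patent's implementation: the function names are hypothetical, genotypes are assumed to be coded 0/1/2 and centered per marker, and the shrinkage parameter `lam` (ratio of residual variance to per-marker effect variance) is supplied by hand, whereas in practice it would be estimated from the data.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def snp_blup(genotypes, phenotypes, lam):
    """Marker effects from (X'X + lam*I) g = X'(y - mu), X centered per marker."""
    n, m = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in genotypes]
    yc = [p - mu for p in phenotypes]
    lhs = [[sum(x[k][i] * x[k][j] for k in range(n)) + (lam if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    rhs = [sum(x[k][i] * yc[k] for k in range(n)) for i in range(m)]
    return mu, col_means, solve_linear(lhs, rhs)

def gebv(genotype, col_means, effects):
    """GEBV = sum_i X_i g_i for one individual, using centered genotypes."""
    return sum((x - c) * g for x, c, g in zip(genotype, col_means, effects))

# Toy reference population: only the first marker affects the phenotype.
ref_geno = [[0, 2], [1, 0], [2, 1], [1, 1], [0, 0], [2, 2]]
ref_pheno = [1.0 + 2.0 * g[0] for g in ref_geno]
mu, means, effects = snp_blup(ref_geno, ref_pheno, lam=1e-6)
print([round(e, 3) for e in effects], round(gebv([2, 0], means, effects), 3))
```

With tens of thousands of markers and a few hundred individuals, solving the individual-by-individual GBLUP system is far cheaper than the marker-by-marker system shown here; the marker form is used only because it mirrors the formula GEBV = ∑ X_i g_i.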
The genomes of all individuals are sequenced to obtain SNP information; the SNP loci correspond across all individuals, and missing data are filled in by imputation. To guard against a large error in any single estimate, repeated cross-validation is used: the reference population and the estimation population are drawn at random from the whole population multiple times, so that the estimate approaches the true value. The effect of every marker in the genome is calculated with the GBLUP algorithm using reference populations of different sizes, giving the breeding values of the estimation population, and the estimation accuracy is obtained by correlating the breeding values of the estimation population with their phenotypic values. This solves the problem that pedigree recording is laborious or even impossible to complete, while accurately capturing the Mendelian sampling error that arises when alleles are transmitted.
Embodiment 1
1. The test subjects were 500 large yellow croakers (Larimichthys crocea). Spawning was induced artificially, so all fish were hatched on the same day and were of the same age. The fish were tested at two years of age, and the traits measured on every fish were body weight and body length.
2. The genomes of all study individuals were sequenced using GBS (genotyping-by-sequencing), and qualified SNP loci were screened with the following thresholds: a locus was retained if MAF > 0.05, the Hardy-Weinberg equilibrium test P-value > 0.001, and the per-locus missing rate was below 20%. A total of 29,748 qualified SNP markers were retained. Missing genotypes were filled in with the imputation program of Beagle version 3.3.2.
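The three screening thresholds can be sketched as a per-locus filter in Python. The patent does not say which Hardy-Weinberg test or software was used, so the chi-square test with one degree of freedom below is an assumption, as are the function names and the coding of genotypes as 0/1/2 with `None` for a missing call.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df

def passes_qc(calls, maf_min=0.05, hwe_min=0.001, missing_max=0.20):
    """calls: genotypes at one SNP coded 0/1/2, with None for a missing call."""
    typed = [c for c in calls if c is not None]
    if not typed:
        return False
    missing_rate = 1.0 - len(typed) / len(calls)
    freq = sum(typed) / (2 * len(typed))      # frequency of the counted allele
    maf = min(freq, 1.0 - freq)
    counts = [typed.count(g) for g in (0, 1, 2)]
    p_hwe = hwe_p_value(counts[2], counts[1], counts[0])
    return maf > maf_min and p_hwe > hwe_min and missing_rate < missing_max
```

For example, `passes_qc([0] * 25 + [1] * 50 + [2] * 25)` keeps a locus in exact Hardy-Weinberg proportions, while a locus with 30% missing calls is dropped regardless of its other statistics.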
3. From the 500 individuals, 20% (100 individuals) were drawn at random as the estimation population. Reference populations of four sizes (100, 200, 300, and 400 individuals) were formed from the remaining individuals to observe how the estimation accuracy changes with reference population size. At each size, all marker effects were estimated with the GBLUP algorithm and the breeding value (GEBV) of each individual in the estimation population was obtained; the estimation accuracy $r_{(GEBV,\,Y)}$ was calculated as the correlation coefficient between the GEBV and the phenotypic values of the estimation population.
To reduce the influence of sampling error in any single run, step 3 was repeated 20 times. Because the individuals of the estimation and reference populations are drawn at random each time, the results differ slightly between repetitions, but the mean of the 20 results lies closer to the true value. The means of the 20 results are shown in Fig. 2.
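The repeated random sampling in step 3 can be sketched as follows. The code is illustrative only: `predict` is a hypothetical callback standing in for the GBLUP step, individuals are identified by their index, and each reference population is assumed to be drawn independently from the individuals left after the estimation population is set aside (the patent does not state whether the four reference sets are nested).

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mean_accuracy(phenotypes, predict, ref_sizes=(100, 200, 300, 400),
                  n_est=100, n_rep=20, seed=0):
    """Average r(GEBV, Y) per reference size over repeated random splits.

    predict(ref_ids, est_ids) is a hypothetical callback that trains on the
    reference individuals and returns one GEBV per estimation individual.
    """
    rng = random.Random(seed)
    ids = list(range(len(phenotypes)))
    totals = {size: 0.0 for size in ref_sizes}
    for _ in range(n_rep):
        rng.shuffle(ids)
        est_ids, pool = ids[:n_est], ids[n_est:]   # new split every repetition
        y_est = [phenotypes[i] for i in est_ids]
        for size in ref_sizes:
            ref_ids = rng.sample(pool, size)
            totals[size] += pearson(predict(ref_ids, est_ids), y_est)
    return {size: total / n_rep for size, total in totals.items()}
```

The returned dictionary maps each reference size to its mean accuracy, which is the quantity plotted against reference population size.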
4. The reciprocal of the reference population size ($N_p$) was taken at each size, together with the reciprocal of the square of the mean estimation accuracy ($r_{(GEBV,\,Y)}$) over the 20 repetitions. The relationship between the two is shown in Fig. 3. The final regression equations were fitted according to formula (4), as shown in the following table:
[Table: fitted regression equations for body weight and body length]
According to the results in the table, the heritability estimate is 0.227 for body weight and 0.196 for body length.
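As an arithmetic check, and assuming only the relation stated for formula (4) that the intercept $a$ equals $1/h^2$, the two heritability estimates correspond to intercepts of roughly the following size. These are back-calculated values, not figures taken from the regression table.
$$a_{\text{weight}} = \frac{1}{0.227} \approx 4.41, \qquad a_{\text{length}} = \frac{1}{0.196} \approx 5.10$$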
The preferred embodiment of this patent has been described in detail above, but the patent is not limited to that embodiment; within the knowledge of a person of ordinary skill in the art, various changes can be made without departing from the purpose of the patent.
Claims (4)
1. A method for assessing heritability from genomic data, in which, for a given quantitative trait, genome-wide marker effects are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the estimation accuracy is calculated; the relationship between genomic estimation accuracy and reference population size is fitted by curve linearization, and the reciprocal of the intercept of the fitted regression equation is the estimate of heritability; characterized in that: the genomic selection procedure uses GBLUP as the algorithm for calculating marker effects, the effect variances of all marker loci in the genome are equal, and the marker effects are calculated from the following formula:
$$\mathbf{y} = \mu\mathbf{1} + \sum_{i} X_i g_i + e \qquad (1)$$
where $\mathbf{y}$ is the vector of phenotypic values, $\mu$ is the population mean, $g_i$ is the effect of marker locus $i$ (together forming the effect vector of all marker loci), $X_i$ is the vector of genotypes at that locus, and $e$ is the residual; the genomic estimated breeding value, denoted GEBV, is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum_i X_i g_i$; the accuracy of the GEBV is obtained as the correlation coefficient between the GEBV and the true breeding value, denoted TBV, i.e. $r_{(GEBV,\,TBV)}$; when breeding values are estimated with the GBLUP algorithm, $r_{(GEBV,\,TBV)}$ can also be calculated as:
$$r_{(GEBV,\,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \qquad (2)$$
where $N_p$ is the number of individuals in the reference population; $h^2$ is the heritability of the trait under study; $M$ is the number of effective genomic segments that determine the trait; in practice the TBV cannot be known, so it is replaced by the phenotypic value, denoted $Y$, and the relationship between GEBV and $Y$ is:
$$r_{(GEBV,\,Y)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \qquad (3)$$
in formula (3), different values of $r_{(GEBV,\,Y)}$ are obtained by varying $N_p$, and the curve is fitted to them; the fit uses curve linearization, and rearranging formula (3) gives the linear equation:
$$\frac{1}{r_{(GEBV,\,Y)}^2} = \frac{1}{h^2} + \frac{M}{h^4}\cdot\frac{1}{N_p} \qquad (4)$$
this equation has the form of the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(GEBV,\,Y)}$ and $x$ is the reciprocal of $N_p$; the intercept $a$ is the reciprocal of the heritability, and the heritability estimate is found as the reciprocal of the fitted intercept.
2. The method for assessing heritability from genomic data according to claim 1, characterized in that the genomes of all individuals are sequenced to obtain SNP information, the SNP loci correspond across all individuals, and missing data are filled in by imputation.
3. The method for assessing heritability from genomic data according to claim 1, characterized in that, to guard against a large error in any single estimate, repeated cross-validation is used: the reference population and the estimation population are drawn at random from the whole population multiple times, so that the estimate approaches the true value.
4. The method for assessing heritability from genomic data according to claim 1, characterized in that the effect of every marker in the genome is calculated with the GBLUP algorithm using reference populations of different sizes, giving the breeding values of the estimation population, and the estimation accuracy is obtained by correlation analysis of the breeding values and phenotypic values of the estimation population.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510873172.4A | 2015-12-03 | 2015-12-03 | A method for assessing heritability from genomic data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105512510A | 2016-04-20 |
CN105512510B | 2019-03-08 |
Family
ID=55720487
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201510873172.4A | Expired - Fee Related | CN105512510B (en) | 2015-12-03 | 2015-12-03 | A method for assessing heritability from genomic data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105512510B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107338321B (en) * | 2017-08-29 | 2020-05-19 | Jimei University | Method for determining optimal SNP (single nucleotide polymorphism) quantity and performing genome selective breeding on production performance of large yellow croaker through screening markers |
CN109817281B (en) * | 2019-01-23 | 2022-12-23 | Hunan Agricultural University | Method and device for estimating genome variety composition, and electronic device |
CN111627495B (en) * | 2020-06-01 | 2023-03-14 | Jimei University | Method for judging species value of population |
CN114410746B (en) * | 2022-03-29 | 2022-07-12 | Sanya Oceanographic Institution, Ocean University of China | Dongxiang spot molecule source-tracing selection breeding method and application thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103914631A (en) * | 2014-02-26 | 2014-07-09 | China Agricultural University | Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip |
CN103914632A (en) * | 2014-02-26 | 2014-07-09 | China Agricultural University | Method for rapidly evaluating genome breeding value and application |
Non-Patent Citations (3)
Title |
---|
"A Comparison of the Sensitivity of the BayesC and Genomic Best Linear Unbiased Prediction (GBLUP) Methods of Estimating Genomic Breeding Values under Different Quantitative Trait Locus (QTL) Model Assumptions"; M. Shirali et al.; Iranian Journal of Applied Animal Science; 2015-03-31; vol. 5, no. 1; p. 42, right column, paragraph 1 to p. 44, left column, paragraph 1
"Performance of genomic selection in mice"; Andres Legarra et al.; Genetics; 2008-09-30; vol. 180, no. 1; p. 618, left column, paragraph 1
"Curve fitting and curve linearization" (曲线拟合与曲线直线化); 冷静的疯子; http://blog.sina.com.cn/s/blog_6e59e3730100vfmh.html; 2011-11-16; p. 2
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190308 |