CN105512510B - Method for estimating heritability from genomic data - Google Patents

Method for estimating heritability from genomic data

Publication number
CN105512510B
Authority
CN
China
Prior art keywords
estimation
value
group
gebv
heritability
Legal status
Expired - Fee Related
Application number
CN201510873172.4A
Other languages
Chinese (zh)
Other versions
CN105512510A (en)
Inventor
肖世俊
董林松
王志勇
Current Assignee
Jimei University
Original Assignee
Jimei University
Priority date
2015-12-03
Filing date
2015-12-03
Publication date
2019-03-08
Application filed by Jimei University
Priority to CN201510873172.4A
Publication of CN105512510A
Application granted
Publication of CN105512510B
Expired - Fee Related

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention discloses a method for estimating heritability from genomic data. For a given quantitative trait, genome-wide marker effects are estimated with the GBLUP algorithm using reference populations of different sizes; the breeding values of the estimation population are then obtained and the prediction accuracy is calculated. The curve relating genomic prediction accuracy to reference population size is linearized and fitted, and the reciprocal of the intercept of the fitted regression equation is the estimate of heritability. The invention estimates the heritability of a quantitative trait from genomic data, and the results can be applied directly to the breeding of quantitative traits in animals and plants. The algorithm does not require pedigree records; instead, the genomes of the individuals are sequenced and the heritability of the trait is predicted from genome-wide markers. The heritability estimate is intended mainly for future breeding work. In addition, sequencing can capture Mendelian sampling error, so it provides more accurate kinship information than recorded pedigree data.

Description

Method for estimating heritability from genomic data
Technical field
The present invention relates to the field of genetic engineering, and specifically to a method for estimating heritability from genomic data.
Background art
Current methods for estimating heritability rely mainly on the kinship between individuals and infer heritability with statistical tools such as analysis of variance and correlation analysis. These methods require complete pedigree records, but for some species, such as aquatic animals, pedigree recording is very laborious or even impractical. In addition, traditional methods treat genomic information as a "black box" and therefore cannot capture how genes are actually transmitted from parents to offspring; that is, they cannot accurately capture Mendelian sampling error, which leads to large estimation errors. To address the heavy pedigree-recording workload and the inability to capture Mendelian sampling error accurately in conventional heritability estimation, the prior art needs to be improved.
Summary of the invention
The object of the present invention is to provide an algorithm for estimating heritability from genomic data that overcomes the large errors and cumbersome pedigree recording of conventional heritability estimation, so as to solve the problems raised in the background art above.
The invention requires no pedigree records. The genomes of all individuals are sequenced directly, and the individuals' performance records are combined with genomic marker information to estimate the accuracy of the genomic estimated breeding values, from which the heritability of the trait is then estimated.
To achieve the above object, the invention provides the following technical solution:
An algorithm for estimating heritability from genomic data: for a given quantitative trait, genome-wide marker effects are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the prediction accuracy is calculated. This process is in fact the procedure of genomic selection. The invention uses GBLUP as the algorithm for calculating marker effects. The GBLUP algorithm was proposed by Meuwissen et al. in 2001; its prior assumes that the effect variances of all marker loci in the genome are equal. Marker effects are calculated by formula (1), whose unknowns are the population mean and the vector of effects of all marker loci.
The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum_i X_i g_i$. The accuracy of GEBV is the correlation coefficient between GEBV and the true breeding value (TBV), i.e. $r_{(GEBV,TBV)}$. Daetwyler et al. (2008) derived another expression for $r_{(GEBV,TBV)}$ when breeding values are estimated with the GBLUP algorithm:
$$r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \tag{2}$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait studied, and $M$ is the number of effective genome segments determining the trait. In practice the TBV cannot be known, so it is replaced by the phenotypic value $Y$, and the relationship between GEBV and $Y$ is derived as:
$$r_{(GEBV,Y)} = h \, r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \tag{3}$$
In formula (3), different values of $r_{(GEBV,Y)}$ are obtained by varying $N_p$, and the curve is fitted to them. The fit is done by linearizing the curve; rearranging formula (3) gives the linear equation:
$$\frac{1}{r^2_{(GEBV,Y)}} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \tag{4}$$
This equation is equivalent to the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(GEBV,Y)}$, $x$ is the reciprocal of $N_p$, and the intercept $a$ is the reciprocal of the heritability. The estimate of heritability is therefore the reciprocal of the fitted intercept.
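The linearized fit can be illustrated with a minimal sketch. It assumes formula (3) has the Daetwyler-type form $r_{(GEBV,Y)} = \sqrt{N_p h^4 / (N_p h^2 + M)}$, which is consistent with the stated intercept $1/h^2$. The function names, the simulated heritability of 0.3, and the value $M = 500$ are hypothetical and chosen only to show that the intercept recovers the heritability.

```python
import numpy as np

def expected_accuracy(n_ref, h2, m_eff):
    """r(GEBV, Y) under the assumed Daetwyler-type form of formula (3)."""
    n_ref = np.asarray(n_ref, dtype=float)
    return np.sqrt(n_ref * h2**2 / (n_ref * h2 + m_eff))

def heritability_from_accuracies(n_ref, r_gebv_y):
    """Fit 1/r^2 = a + b * (1/N_p) and return (h2_hat, a, b)."""
    x = 1.0 / np.asarray(n_ref, dtype=float)        # reciprocal of N_p
    y = 1.0 / np.asarray(r_gebv_y, dtype=float)**2  # reciprocal of squared accuracy
    b, a = np.polyfit(x, y, 1)                      # polyfit returns slope first
    return 1.0 / a, a, b

# Hypothetical, noise-free accuracies at the four reference sizes.
n_ref = np.array([100, 200, 300, 400])
r = expected_accuracy(n_ref, h2=0.3, m_eff=500.0)
h2_hat, a, b = heritability_from_accuracies(n_ref, r)
print(round(h2_hat, 4))
```

With real data the accuracies are noisy, so the intercept, and hence the heritability estimate, carries sampling error that this noise-free illustration does not show.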
As a further embodiment of the invention: the genomes of all individuals are sequenced to obtain SNP information; the SNP loci of all individuals correspond to one another, and missing data are filled in by imputation.
As a further embodiment of the invention: to prevent a large error from a single estimate, repeated cross-validation is used, in which the reference population and the estimation population are drawn at random from the whole population multiple times, so as to obtain an estimate close to the true value.
As a further embodiment of the invention: the effect of each genomic marker is calculated with the GBLUP algorithm using reference populations of different sizes, giving the breeding values of the estimation population; the prediction accuracy is obtained by correlating the breeding values of the estimation population with their phenotypic values.
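The marker-effect and accuracy calculation can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: formula (1) is assumed to be the usual ridge-type mixed-model equations with one shrinkage parameter `lam` (the ratio of residual variance to per-marker variance), and the genotypes and phenotypes are simulated. The names `snp_blup`, `gebv_accuracy`, and `lam` are hypothetical.

```python
import numpy as np

def snp_blup(X_ref, y_ref, lam):
    """Estimate the mean and equal-variance marker effects by solving
    ridge-type mixed-model equations (an assumed form of formula (1))."""
    n, m = X_ref.shape
    W = np.hstack([np.ones((n, 1)), X_ref])   # column of ones for the mean
    penalty = lam * np.eye(m + 1)
    penalty[0, 0] = 0.0                       # the mean is not shrunk
    solution = np.linalg.solve(W.T @ W + penalty, W.T @ y_ref)
    return solution[0], solution[1:]

def gebv_accuracy(X_ref, y_ref, X_est, y_est, lam):
    """Correlation between GEBV and phenotype in the estimation population."""
    _, g_hat = snp_blup(X_ref, y_ref, lam)
    gebv = X_est @ g_hat                      # GEBV = sum_i X_i g_i
    return float(np.corrcoef(gebv, y_est)[0, 1])

# Simulated data only: 500 individuals, 200 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(500, 200)).astype(float)
g_true = rng.normal(0.0, 0.1, size=200)
y = X @ g_true + rng.normal(0.0, 1.0, size=500)

X_est, y_est = X[:100], y[:100]               # estimation population
accuracies = {
    n_ref: gebv_accuracy(X[100:100 + n_ref], y[100:100 + n_ref],
                         X_est, y_est, lam=100.0)
    for n_ref in (100, 200, 300, 400)
}
```

Here `lam=100.0` matches the simulated variances (residual variance 1.0, per-marker variance 0.01); with real data the variance ratio would have to be estimated.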
Compared with the prior art, the beneficial effects of the invention are as follows. The invention estimates the heritability of quantitative traits from genomic data, and the results can be applied directly to the breeding of quantitative traits in animals and plants. The algorithm can predict the heritability of a trait from genome-wide markers without establishing families, which solves the problem that pedigree recording is cumbersome or even impractical. Because sequencing can capture Mendelian sampling error, the algorithm obtains more accurate kinship information than recorded pedigree data.
Brief description of the drawings
Fig. 1 is a flow chart of the algorithm of the invention.
Fig. 2 shows how the GEBV accuracy for the two traits, body weight and body length, changes with reference population size.
Fig. 3 shows the relationship between GEBV accuracy and reference population size for body weight and body length after transformation according to formula (4).
In Fig. 3, the abscissa is the reciprocal of the number of individuals in the reference population, the ordinate is the reciprocal of the squared GEBV accuracy, and R2 is the coefficient of determination of the regression equation.
Detailed description of the embodiments
The technical solution of this patent is explained in further detail below with reference to specific embodiments.
Referring to Figs. 1-3, an algorithm for estimating heritability from genomic data: for a given quantitative trait, genome-wide marker effects are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the prediction accuracy is calculated; the curve relating genomic prediction accuracy to reference population size is linearized and fitted, and the reciprocal of the intercept of the fitted regression equation is the estimate of heritability. The algorithm is characterized in that genomic selection is carried out using GBLUP as the algorithm for calculating marker effects, the effect variances of all marker loci in the genome being equal. Marker effects are calculated by formula (1), whose unknowns are the population mean and the vector of effects of all marker loci.
The genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum_i X_i g_i$. The accuracy of GEBV is obtained as the correlation coefficient between GEBV and the true breeding value (TBV), i.e. $r_{(GEBV,TBV)}$. When breeding values are estimated with the GBLUP algorithm, another expression for $r_{(GEBV,TBV)}$ is:
$$r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \tag{2}$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait studied, and $M$ is the number of effective genome segments determining the trait. In practice the TBV cannot be known, so it is replaced by the phenotypic value $Y$, and the relationship between GEBV and $Y$ is derived as:
$$r_{(GEBV,Y)} = h \, r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \tag{3}$$
In formula (3), different values of $r_{(GEBV,Y)}$ are obtained by varying $N_p$, and the curve is fitted to them. The fit is done by linearizing the curve; rearranging formula (3) gives the linear equation:
$$\frac{1}{r^2_{(GEBV,Y)}} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \tag{4}$$
This equation is equivalent to the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(GEBV,Y)}$, $x$ is the reciprocal of $N_p$, and the intercept $a$ is the reciprocal of the heritability. The estimate of heritability is therefore the reciprocal of the fitted intercept.
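The rearrangement from formula (3) to formula (4) can be written out step by step. The sketch below assumes formula (3) has the Daetwyler-type form $r^2_{(GEBV,Y)} = N_p h^4 / (N_p h^2 + M)$, which is consistent with the stated intercept $1/h^2$. The slope expression $b = M/h^4$ and the resulting estimate of $M$ follow from that assumption and are not stated in the text.

```latex
\begin{align*}
r^2_{(GEBV,Y)} &= \frac{N_p h^4}{N_p h^2 + M} \\
\frac{1}{r^2_{(GEBV,Y)}} &= \frac{N_p h^2 + M}{N_p h^4}
  = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \\
y &= a + b x, \qquad
  y = \frac{1}{r^2_{(GEBV,Y)}},\quad x = \frac{1}{N_p},\quad
  a = \frac{1}{h^2},\quad b = \frac{M}{h^4} \\
\hat{h}^2 &= \frac{1}{\hat{a}}, \qquad \hat{M} = \frac{\hat{b}}{\hat{a}^2}
\end{align*}
```

Under this form, a larger reference population moves $x$ toward zero, so $y$ approaches the intercept $1/h^2$ and the accuracy $r_{(GEBV,Y)}$ approaches its upper limit $h$.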
The genomes of all individuals are sequenced to obtain SNP information; the SNP loci of all individuals correspond to one another, and missing data are filled in by imputation. To prevent a large error from a single estimate, repeated cross-validation is used: the reference population and the estimation population are drawn at random from the whole population multiple times to obtain an estimate close to the true value. The effect of each genomic marker is calculated with the GBLUP algorithm using reference populations of different sizes, giving the breeding values of the estimation population, and the prediction accuracy is obtained by analyzing these breeding values against the phenotypic values. This solves the problem that pedigree recording is laborious or even infeasible, while accurately capturing the Mendelian sampling error of alleles during transmission.
Embodiment 1
1. The test subjects were 500 large yellow croakers (Larimichthys crocea). Artificially induced spawning was used so that all fish were hatched on the same day, i.e. all were of the same age. Measurements were taken when the fish were two years old; the traits measured were the body weight and body length of every fish.
2. All individuals under study were genome-sequenced with GBS (genotyping-by-sequencing), and qualified SNP loci were screened with the following thresholds: loci with MAF > 0.05, a Hardy-Weinberg equilibrium test P-value > 0.001, and a per-locus missing rate below 20% were retained. A total of 29,748 qualified SNP markers were finally obtained; missing genotypes were filled in with the imputation program of the software Beagle, version 3.3.2.
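The three screening thresholds can be sketched as a per-locus filter. The patent does not say which Hardy-Weinberg test was used; the sketch assumes a one-degree-of-freedom chi-square goodness-of-fit test. Genotypes are assumed to be coded as 0/1/2 allele counts with `np.nan` for missing calls, and the function names are hypothetical.

```python
import math
import numpy as np

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a chi-square goodness-of-fit test for Hardy-Weinberg
    equilibrium (assumed test; 1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = np.array([p * p * n, 2 * p * q * n, q * q * n])
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    if np.any(expected == 0):
        return 1.0                                # monomorphic: nothing to test
    chi2 = float(((observed - expected) ** 2 / expected).sum())
    return math.erfc(math.sqrt(chi2 / 2.0))       # chi-square survival function, 1 df

def passes_qc(genotypes, maf_min=0.05, hwe_p_min=0.001, max_missing=0.20):
    """Apply the missing-rate, MAF, and HWE thresholds to one SNP locus."""
    g = np.asarray(genotypes, dtype=float)
    missing = np.isnan(g)
    if missing.mean() >= max_missing:             # missing rate must be below 20%
        return False
    called = g[~missing]
    freq = called.mean() / 2.0                    # frequency of the counted allele
    if min(freq, 1.0 - freq) <= maf_min:          # MAF must exceed 0.05
        return False
    counts = [int((called == k).sum()) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) > hwe_p_min   # HWE P-value must exceed 0.001

in_hwe = [0] * 49 + [1] * 42 + [2] * 9            # genotype counts exactly at HWE
print(passes_qc(in_hwe), passes_qc([1] * 100))
```

A locus in which every individual is heterozygous has a usable allele frequency but fails the equilibrium test, which is the kind of genotyping artifact this threshold is meant to remove.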
3. From all 500 individuals, 20% (100 individuals) were drawn at random as the estimation population. From the remaining individuals, reference populations of 100, 200, 300 and 400 individuals were formed, giving four levels, in order to observe how prediction accuracy changes across the four reference population sizes. At each level, all marker effects were estimated with the GBLUP algorithm to obtain the GEBV of every individual in the estimation population, and the prediction accuracy, $r_{(GEBV,Y)}$, was obtained as the correlation coefficient between the GEBV and the phenotypic values of the estimation population.
To reduce the influence of sampling error from a single draw, step 3 was repeated 20 times. Because the estimation and reference populations are drawn at random each time, the result differs slightly between replicates, but the mean of the 20 results is closer to the true value. The means over the 20 replicates are shown in Fig. 2.
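The sampling design of step 3 and its 20 replicates can be sketched as a resampling loop. Two details are assumptions made for illustration: the estimation population is drawn once per replicate and shared by all four levels, and the smaller reference populations are nested subsets of the larger ones. `accuracy_fn` is a hypothetical callback standing in for the GBLUP accuracy calculation, and `dummy_accuracy` exists only to exercise the loop.

```python
import numpy as np

def replicate_design(n_total=500, n_est=100,
                     ref_sizes=(100, 200, 300, 400), rng=None):
    """Draw one estimation population and reference populations of each size."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(n_total)
    est_idx = order[:n_est]                   # 20% of individuals
    rest = order[n_est:]                      # candidates for the reference population
    return est_idx, {n: rest[:n] for n in ref_sizes}

def mean_accuracy(accuracy_fn, n_rep=20, seed=0, **design):
    """Average accuracy_fn(ref_idx, est_idx) over n_rep random replicates."""
    rng = np.random.default_rng(seed)
    sums = {}
    for _ in range(n_rep):
        est_idx, ref_sets = replicate_design(rng=rng, **design)
        for n_ref, ref_idx in ref_sets.items():
            sums[n_ref] = sums.get(n_ref, 0.0) + accuracy_fn(ref_idx, est_idx)
    return {n_ref: total / n_rep for n_ref, total in sums.items()}

def dummy_accuracy(ref_idx, est_idx):
    """Placeholder that depends only on reference size, not on real data."""
    return len(ref_idx) / (len(ref_idx) + 500.0)

means = mean_accuracy(dummy_accuracy)
```

In a real analysis `accuracy_fn` would index the genotype matrix and phenotype vector with `ref_idx` and `est_idx`, estimate marker effects in the reference population, and return the correlation between GEBV and phenotype in the estimation population.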
4. The reference population size at each level (i.e. $N_p$) was inverted, and the reciprocal of the square of the mean of the 20 prediction accuracies (i.e. $r_{(GEBV,Y)}$) at each level was taken. The relationship between the two is shown in Fig. 3. The final regression equations were fitted according to formula (4), as shown in the table:
From the results in the table, the estimated heritability is 0.227 for body weight and 0.196 for body length.
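Because the heritability estimate is the reciprocal of the fitted intercept, the reported estimates imply the following intercepts. These values are back-calculated from the two reported heritabilities for illustration; they are not read from the patent's regression table.

```latex
\begin{align*}
\hat{a}_{\text{weight}} &= \frac{1}{\hat{h}^2_{\text{weight}}} = \frac{1}{0.227} \approx 4.41 \\
\hat{a}_{\text{length}} &= \frac{1}{\hat{h}^2_{\text{length}}} = \frac{1}{0.196} \approx 5.10
\end{align*}
```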
A preferred embodiment of this patent has been described in detail above, but the patent is not limited to this embodiment; within the knowledge of a person of ordinary skill in the art, various changes may be made without departing from the spirit of the patent.

Claims (4)

1. A method for estimating heritability from genomic data, in which, for a given quantitative trait, genome-wide marker effects are estimated using reference populations of different sizes, the breeding values of the estimation population are then obtained, and the prediction accuracy is calculated; the curve relating genomic prediction accuracy to reference population size is linearized and fitted, and the reciprocal of the intercept of the fitted regression equation is the estimate of heritability; characterized in that: genomic selection is carried out using GBLUP as the algorithm for calculating marker effects, the effect variances of all marker loci in the genome being equal, and marker effects are calculated by formula (1), whose unknowns are the population mean and the vector of effects of all marker loci;
the genomic estimated breeding value (GEBV) is obtained by summing the effects of all marker loci, i.e. $GEBV = \sum_i X_i g_i$; the accuracy of GEBV is obtained as the correlation coefficient between GEBV and the true breeding value (TBV), i.e. $r_{(GEBV,TBV)}$; when breeding values are estimated with the GBLUP algorithm, another expression for $r_{(GEBV,TBV)}$ is:
$$r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^2}{N_p h^2 + M}} \tag{2}$$
where $N_p$ is the number of individuals in the reference population, $h^2$ is the heritability of the trait studied, and $M$ is the number of effective genome segments determining the trait; in practice the TBV cannot be known, so it is replaced by the phenotypic value $Y$, and the relationship between GEBV and $Y$ is derived as:
$$r_{(GEBV,Y)} = h \, r_{(GEBV,TBV)} = \sqrt{\frac{N_p h^4}{N_p h^2 + M}} \tag{3}$$
in formula (3), different values of $r_{(GEBV,Y)}$ are obtained by varying $N_p$, and the curve is fitted to them; the fit is done by linearizing the curve, and rearranging formula (3) gives the linear equation:
$$\frac{1}{r^2_{(GEBV,Y)}} = \frac{1}{h^2} + \frac{M}{h^4} \cdot \frac{1}{N_p} \tag{4}$$
this equation is equivalent to the linear regression model $y = a + bx$, where $y$ is the reciprocal of the square of $r_{(GEBV,Y)}$, $x$ is the reciprocal of $N_p$, and the intercept $a$ is the reciprocal of the heritability; the estimate of heritability is obtained as the reciprocal of the fitted intercept.
2. The method for estimating heritability from genomic data according to claim 1, characterized in that the genomes of all individuals are sequenced to obtain SNP information, the SNP loci of all individuals correspond to one another, and missing data are filled in by imputation.
3. The method for estimating heritability from genomic data according to claim 1, characterized in that, to prevent a large error from a single estimate, repeated cross-validation is used, in which the reference population and the estimation population are drawn at random from the whole population multiple times, so as to obtain an estimate close to the true value.
4. The method for estimating heritability from genomic data according to claim 1, characterized in that the effect of each genomic marker is calculated with the GBLUP algorithm using reference populations of different sizes, giving the breeding values of the estimation population, and the prediction accuracy is obtained by correlating the breeding values of the estimation population with their phenotypic values.
CN201510873172.4A 2015-12-03 2015-12-03 Method for estimating heritability from genomic data Expired - Fee Related CN105512510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510873172.4A CN105512510B (en) 2015-12-03 2015-12-03 Method for estimating heritability from genomic data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510873172.4A CN105512510B (en) 2015-12-03 2015-12-03 Method for estimating heritability from genomic data

Publications (2)

Publication Number Publication Date
CN105512510A CN105512510A (en) 2016-04-20
CN105512510B true CN105512510B (en) 2019-03-08

Family

ID=55720487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510873172.4A Expired - Fee Related CN105512510B (en) 2015-12-03 2015-12-03 Method for estimating heritability from genomic data

Country Status (1)

Country Link
CN (1) CN105512510B (en)

Also Published As

Publication number Publication date
CN105512510A (en) 2016-04-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190308