CN104846077B - Method for testing the specificity, consistency and stability of a new pure-line rape variety - Google Patents

Method for testing the specificity, consistency and stability of a new pure-line rape variety

Info

Publication number
CN104846077B
Authority
CN
China
Prior art keywords
hybrid strain
genotype
measured
rate
hybrid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510148702.9A
Other languages
Chinese (zh)
Other versions
CN104846077A (en
Inventor
张静
彭海
陈斌
陈红霖
王沁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jianghan University
Institute of Crop Sciences of Chinese Academy of Agricultural Sciences
Original Assignee
Jianghan University
Institute of Crop Sciences of Chinese Academy of Agricultural Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for testing the specificity, consistency and stability of a new pure-line rape variety. The method comprises: obtaining variant sites; determining the test regions of the rape variety to be tested; building a database; after determining the sampling size, sampling at random, mixing the samples and extracting the DNA of the mixed sample; preparing primers; amplifying the DNA of the mixed sample with the primers, the amplification products being used to build a high-throughput sequencing library; performing high-throughput sequencing on the library to obtain a sequencing segment group; analyzing the sequencing segment group to obtain the genotypes of the rape variety to be tested and the hybrid strain genotypes; obtaining the approximate variety, the variant sites and the variant site rate by comparison; comparing the hybrid strain genotypes with the genotypes in the database to obtain the hybrid strain varieties, and then calculating the hybrid strain rate; and judging the specificity, consistency and stability of the rape variety to be tested from the variant sites, the variant site rate and the hybrid strain rate. The method can judge the specificity, stability and consistency of the rape variety to be tested accurately and completely.

Description

Method for testing the specificity, consistency and stability of a new pure-line rape variety
Technical field
The present invention relates to biotechnology, and in particular to a method for testing the specificity, consistency and stability of a new pure-line rape variety.
Background technology
As a specialized form of intellectual property, new plant varieties have become part of the core competitiveness of companies and even of countries. Granting new plant variety rights and resolving the related legal issues depend on DUS testing, that is, on a field planting test or an indoor molecular marker identification of the specificity (Distinctness), consistency (Uniformity) and stability (Stability) of the rape variety to be tested. The field planting test proceeds as follows: the rape variety to be tested and its approximate varieties are planted in the field at the same time; over two or more growing seasons, multiple traits are observed; the significance of the difference between the rape variety to be tested and the approximate varieties, i.e. specificity, is judged from trait expression, and the proportion of hybrid strains in the population, i.e. consistency and stability, is judged at the same time. The indoor molecular marker identification proceeds as follows: DNA is extracted plant by plant from each sample of the rape variety to be tested and of the approximate varieties; PCR (Polymerase Chain Reaction) is performed separately on each test region of each sample; each PCR product is detected by electrophoresis or first-generation sequencing; from the detection results, the proportion of differing sites between the rape variety to be tested and the approximate varieties is obtained, and the specificity of the rape variety to be tested is judged from that proportion.
The drawbacks of the field planting test are a long cycle, a heavy workload, and trait expression that is affected by the environment, which makes the judgment inaccurate. The drawbacks of indoor molecular marker identification are that each test region of each sample must be processed separately, which is a heavy workload; that samples and test regions cannot be sampled in large numbers; and that the hybrid strain rate cannot be calculated, so stability and consistency cannot be tested. A drawback shared by the field planting test and indoor molecular marker identification is that, because of the workload, the approximate variety cannot be selected objectively from the existing varieties and can only be provided by the applicant for the variety right; driven by commercial interests or similar motives, the applicant may provide an approximate variety that is not the true one, leading to the legal consequence of a wrongly granted variety right.
Summary of the invention
In order to solve the problems in the prior art, an embodiment of the present invention provides a method for testing the specificity, consistency and stability of a new pure-line rape variety. The technical solution is as follows:
An embodiment of the present invention provides a method for testing the specificity, consistency and stability of a new pure-line rape variety, the method comprising:
Obtaining the variant sites between different rape varieties;
Determining the test regions of the rape variety to be tested from the variant sites, the test regions including universal test regions, and at least some of the variant sites being contained in the universal test regions;
Building a database containing the genotypes of the different rape varieties in all test regions;
After determining the sampling size SN of the rape variety to be tested, sampling at random, mixing the samples and extracting the DNA of the mixed sample;
Preparing primers for amplifying the test regions, the primers including universal test region primers;
Amplifying the DNA of the mixed sample with the primers to obtain the amplification products of the test regions, the amplification products being used to build a high-throughput sequencing library;
Performing high-throughput sequencing on the high-throughput sequencing library to obtain a sequencing segment group;
Analyzing the sequencing segment group to obtain the genotypes of the rape variety to be tested and the hybrid strain genotypes;
Comparing the genotypes of the rape variety to be tested with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant site rate of the rape variety to be tested;
Comparing the hybrid strain genotypes with the genotypes of the different varieties in the database to obtain the hybrid strain varieties, and then calculating the hybrid strain rate;
Judging the specificity, consistency and stability of the rape variety to be tested from the variant sites, the variant site rate and the hybrid strain rate.
Specifically, the sampling size SN satisfies the condition BINOM.INV(SN, M, 0.95)/SN ≤ 1.15*M, where BINOM.INV is the function in Excel 2010 and M is the threshold selected for judging consistency and stability. The condition means that even if the hybrid strain rate exceeds the threshold M by only 15%, the sampling size is sufficient to judge the stability and consistency of the rape variety to be tested correctly with 95% probability.
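The sampling-size condition can be checked outside Excel. The following is a minimal sketch, assuming that BINOM.INV(n, p, alpha) returns the smallest k whose cumulative binomial probability reaches alpha; the function names and the threshold value M = 1% used in the usage note are illustrative assumptions, not values from the patent.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_inv(n, p, alpha):
    """Smallest k with P(X <= k) >= alpha (assumed equivalent of Excel's BINOM.INV)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= alpha:
            return k
    return n

def sampling_size_ok(sn, m, alpha=0.95, margin=1.15):
    """Check the condition BINOM.INV(SN, M, alpha) / SN <= margin * M."""
    return binom_inv(sn, m, alpha) / sn <= margin * m
```

With the hypothetical threshold M = 1%, `sampling_size_ok(100, 0.01)` is false and `sampling_size_ok(20000, 0.01)` is true. Because the binomial quantile is discrete, very small SN values can satisfy the inequality trivially (the quantile is 0), so the check is only meaningful for realistic sample sizes.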
Specifically, the depth CF of the high-throughput sequencing satisfies the following three conditions:
BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) ≥ 99.9%;
1 - BINOM.DIST(10000, 10000, 1 - BINOM.DIST(8, 20, 1 - BINOM.DIST(99.99%*CF, CF, 99.9989%, TRUE), TRUE), FALSE) ≤ 0.1%;
BINOM.DIST(10*(1-M)*CF, 10*CF, 1-110%*M, TRUE) ≥ 95.0%.
Here CF is the depth of the high-throughput sequencing, M is the threshold selected for judging consistency and stability, and BINOM.DIST is the function in Excel 2010. The three conditions mean the following. First, when the hybrid strain rate is as low as 0.1%, there are 10 hybrid strain varieties, and each hybrid strain variety differs from the rape variety at only 20 sites on average, the probability, determined by the depth CF, of detecting all the hybrid strain varieties is ≥ 99.9%. Second, when the database contains 10000 varieties and each differs from the rape variety at only 20 sites on average, the probability, determined by the depth CF, of misjudging a hybrid strain variety is ≤ 0.1%. Third, when there are 10 hybrid strain varieties and the true hybrid strain rate exceeds the selected threshold M by only 10%, the probability, determined by the depth CF, that the conclusion on stability and consistency is correct is ≥ 95.0%.
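As an illustration, the first of the depth conditions (detecting all hybrid strain varieties) can be evaluated as follows. This is a sketch under the assumption that the nested BINOM.DIST calls read as: probability that a 0.1% genotype is missed in one region, probability that at most 8 of 20 differing regions are missed, and probability that this holds for all 10 hybrid strain varieties. The function and parameter names are illustrative, and the other two conditions are not covered.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); mirrors BINOM.DIST(k, n, p, TRUE)."""
    k = min(int(k), n)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def detection_condition(cf, freq=0.001, regions=20, max_missed=8,
                        varieties=10, target=0.999):
    """First depth condition: all hybrid strain varieties detected with prob >= target."""
    # Probability that none of the CF reads of a region carries the hybrid genotype.
    p_missed = binom_cdf(0, cf, freq)
    # Probability that at most `max_missed` of the differing regions are missed.
    p_variety_found = binom_cdf(max_missed, regions, p_missed)
    # BINOM.DIST(10, 10, q, FALSE) is q**10: every variety must be found.
    p_all_found = p_variety_found ** varieties
    return p_all_found >= target
```

Under these assumptions a depth of 100 reads per region fails the condition, while a depth of 5000 passes it.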
Specifically, the test regions further include non-universal test regions, and the primers further include non-universal test region primers.
Further, the non-universal test region primers include a first primer and a second primer. The first primer includes a first forward primer and a first reverse primer, and the second primer includes a second forward primer and a second reverse primer. The first primer and the second primer are each used in a separate amplification to obtain two amplification products of the non-universal test region, and the two amplification products are mixed in equal amounts to build the high-throughput sequencing library of the separate amplification;
The 5' end of the first forward primer is linked to sequence 1, shown as SEQ ID NO: 1 in the sequence listing, and the 5' end of the first reverse primer is linked to sequence 2, shown as SEQ ID NO: 2 in the sequence listing;
The 5' end of the second forward primer is linked to sequence 2, shown as SEQ ID NO: 2 in the sequence listing, and the 5' end of the second reverse primer is linked to sequence 1, shown as SEQ ID NO: 1 in the sequence listing.
Further, the method for judging the specificity, consistency and stability of the rape variety to be tested from the variant sites, the variant site rate and the hybrid strain rate includes:
When the variant site rate ≥ SD, or when the variant sites are present in the non-universal test regions, the rape variety to be tested has specificity; when the variant site rate < SD, or when the variant site rate < SD and the variant sites are not present in the non-universal test regions, the rape variety to be tested does not have specificity, where SD is the threshold selected for judging specificity;
When the hybrid strain rate of the rape variety to be tested ≤ M, the rape variety to be tested has consistency and stability; when the hybrid strain rate of the rape variety to be tested > M, the rape variety to be tested does not have consistency and stability, where M is the threshold selected for judging consistency and stability;
The hybrid strain rate R=R1+R2-R3-R4, wherein:
For R1: n1 is the number of nucleus hybrid strain varieties; t1 is the number of all special hybrid strain nuclear genotypes of the i1-th nucleus hybrid strain variety; i1j1 denotes the j1-th special hybrid strain nuclear genotype after all special hybrid strain nuclear genotypes of the i1-th nucleus hybrid strain variety are sorted by frequency from low to high; R1i1j1 is the frequency of the i1j1-th special hybrid strain nuclear genotype. R1 is the sum of the hybrid strain rates of the nucleus hybrid strain varieties calculated from the hybrid strain nuclear genotypes; the hybrid strain rate of a nucleus hybrid strain variety is 2 times the average of the frequencies of its special hybrid strain nuclear genotypes that remain after the lowest 80% and the highest 10% of those frequencies are removed;
For R2: t2 is the number of hybrid strain nuclear genotypes with frequency ≥ 0.17%, excluding the hybrid strain nuclear genotypes possessed by the nucleus hybrid strain varieties; i2 denotes the i2-th hybrid strain nuclear genotype after all hybrid strain nuclear genotypes other than those possessed by the nucleus hybrid strain varieties are sorted by frequency from low to high; R2i2 is the frequency of the i2-th hybrid strain nuclear genotype. R2 is the hybrid strain rate calculated from the hybrid strain nuclear genotypes other than those possessed by the nucleus hybrid strain varieties; it is 2 times the average of the values that remain after the lowest 80% and the highest 10% of their frequencies are removed;
For R3: N2 is the number of cytoplasm hybrid strain varieties; R3i3 is the hybrid strain rate of the i3-th cytoplasm hybrid strain variety; R3ic is the value of R3i3 when i3 = ic, where ic is, when the rape variety to be tested is a nucleo-cytoplasmic interaction sterile line or maintainer line, the cytoplasm hybrid strain variety of the corresponding maintainer line or sterile line; t3 is the number of all special hybrid strain cytoplasmic genotypes of the i3-th cytoplasm hybrid strain variety; i3j3 denotes the j3-th special hybrid strain cytoplasmic genotype after all special hybrid strain cytoplasmic genotypes of the i3-th cytoplasm hybrid strain variety are sorted by frequency from low to high; R3i3j3 is the frequency of the i3j3-th special hybrid strain cytoplasmic genotype; R3ic refers to the hybrid strain rate of the maintainer line mixed into the sterile line, or of the sterile line mixed into the maintainer line. R3 is the sum of the hybrid strain rates of the cytoplasm hybrid strain varieties calculated from the hybrid strain cytoplasmic genotypes; the hybrid strain rate of a cytoplasm hybrid strain variety is the average of the frequencies of its special hybrid strain cytoplasmic genotypes that remain after the lowest 80% and the highest 10% of those frequencies are removed;
For R4: t4 is the number of hybrid strain cytoplasmic genotypes with frequency ≥ 0.17%, excluding the hybrid strain cytoplasmic genotypes possessed by the cytoplasm hybrid strain varieties; i4 denotes the i4-th hybrid strain cytoplasmic genotype after all hybrid strain cytoplasmic genotypes other than those possessed by the cytoplasm hybrid strain varieties are sorted by frequency from low to high; R4i4 is the frequency of the i4-th hybrid strain cytoplasmic genotype. R4 is the hybrid strain rate calculated from the hybrid strain cytoplasmic genotypes other than those possessed by the cytoplasm hybrid strain varieties; it is the average of the values that remain after the lowest 80% and the highest 10% of their frequencies are removed;
Int() is the rounding function;
A nucleus hybrid strain variety is a hybrid strain variety obtained using nuclear genotypes only, and a cytoplasm hybrid strain variety is a hybrid strain variety obtained using cytoplasmic genotypes only. A special hybrid strain nuclear genotype is a hybrid strain nuclear genotype that belongs to only one nucleus hybrid strain variety, and a special hybrid strain cytoplasmic genotype is a hybrid strain cytoplasmic genotype that belongs to only one cytoplasm hybrid strain variety. A hybrid strain nuclear genotype is a hybrid strain genotype that is a nuclear genotype, i.e. a genotype located on the nuclear genome; a hybrid strain cytoplasmic genotype is a hybrid strain genotype that is a cytoplasmic genotype, i.e. a genotype located on the cytoplasmic genome. The frequency of a genotype is the number of sequencing segments in the sequencing segment group that represent the genotype, divided by the total number of sequencing segments of the test region where the genotype is located.
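The trimmed-average rule and the decision rules above can be sketched as follows. This is an illustration only: the exact use of Int() in the index bounds is not recoverable from the text, so the sketch assumes that floor(0.8*t) of the lowest and floor(0.1*t) of the highest frequencies are dropped. The function names and example thresholds are hypothetical.

```python
def trimmed_rate(frequencies, multiplier=1.0):
    """Average the frequencies left after dropping the lowest 80% and highest 10%.

    multiplier=2.0 corresponds to the nuclear-genotype case (R1, R2) and
    multiplier=1.0 to the cytoplasmic-genotype case (R3, R4).
    """
    ordered = sorted(frequencies)
    t = len(ordered)
    low_cut = (t * 8) // 10   # assumed rounding: floor(0.8 * t) lowest values dropped
    high_cut = t // 10        # assumed rounding: floor(0.1 * t) highest values dropped
    kept = ordered[low_cut:t - high_cut]
    if not kept:
        return 0.0
    return multiplier * sum(kept) / len(kept)

def judge(variant_site_rate, hybrid_rate, sd, m, non_universal_variant=False):
    """Return (has_specificity, has_consistency_and_stability)."""
    specific = variant_site_rate >= sd or non_universal_variant
    consistent_and_stable = hybrid_rate <= m
    return specific, consistent_and_stable
```

For ten frequencies 0.001, 0.002, ..., 0.010, only 0.009 survives the trimming, so the nuclear-genotype rate is 2 × 0.009 = 0.018.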
Further, the method further includes judging the probability that the conclusion on the consistency and stability of the rape variety to be tested is correct, as follows: when the rape variety to be tested has consistency and stability, the probability that the conclusion is correct ≥ BINOM.DIST(M*SN, SN, R, TRUE) * BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE); when the rape variety to be tested does not have consistency and stability, the probability that the conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, 1-R, TRUE) * BINOM.DIST(∑SeN*(1-M), ∑SeN, 1-R, TRUE). Here ∑SeN is the total number of sequencing segments of the test regions in which the genotypes whose frequencies are used to calculate the hybrid strain rate R are located, and M is the threshold selected for judging consistency and stability. BINOM.DIST(M*SN, SN, R, TRUE) is the probability that, when SN samplings are made from the rape variety to be tested, the hybrid strain rate actually drawn is below the threshold M; BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE) is the probability that, when ∑SeN samplings are made from the rape variety to be tested, the hybrid strain rate actually drawn is below the threshold M.
Further, when the variant sites are not present in the non-universal test regions: if the rape variety to be tested is judged to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST((1-SD)*TRN, TRN, 1-OD, TRUE); if the rape variety to be tested is judged not to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST(SD*TRN, TRN, OD, TRUE). Here TRN is the number of successfully detected test regions, OD is the variant site rate, and BINOM.DIST is the function in Excel 2010. The probability that the conclusion is correct is the probability that the variant site rate is greater than SD when the rape variety to be tested is judged to have specificity, and the probability that the variant site rate is less than SD when it is judged not to have specificity.
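The lower bounds on the probability of a correct conclusion can be computed without Excel. The sketch below assumes that BINOM.DIST(k, n, p, TRUE) is the cumulative binomial probability with k truncated to an integer; the function names and the numeric inputs in the usage note are illustrative, not values from the patent.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= floor(k)) for X ~ Binomial(n, p), summed in log space."""
    k = min(math.floor(k), n)
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def consistency_confidence(r, m, sn, total_reads, passes):
    """Lower bound on the probability that the consistency/stability conclusion is correct."""
    if passes:  # variety judged consistent and stable (R <= M)
        return binom_cdf(m * sn, sn, r) * binom_cdf(total_reads * m, total_reads, r)
    return (binom_cdf((1 - m) * sn, sn, 1 - r)
            * binom_cdf(total_reads * (1 - m), total_reads, 1 - r))

def specificity_confidence(od, sd, trn, specific):
    """Lower bound on the probability that the specificity conclusion is correct."""
    if specific:
        return binom_cdf((1 - sd) * trn, trn, 1 - od)
    return binom_cdf(sd * trn, trn, od)
```

For example, with hypothetical values R = 0.5%, M = 2%, SN = 1000 and ∑SeN = 100000, `consistency_confidence(0.005, 0.02, 1000, 100000, True)` is very close to 1, because the observed rate lies far below the threshold.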
Specifically, the hybrid strain varieties are obtained as follows: a hybrid strain variety is a variety present in the database for which the number of test regions where its potential hybrid strain genotype is identical to a hybrid strain genotype accounts for ≥ 60% of the total number of its test regions that have a potential hybrid strain genotype; a hybrid strain genotype is a potential hybrid strain genotype with frequency ≥ 0.02%;
The number of differing bases between a potential hybrid strain genotype and all genotypes of the rape variety to be tested is ≥ 2, or the differing bases involve an insertion or deletion of non-consecutive bases.
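The matching rule for hybrid strain varieties can be sketched as below. The data layout (dictionaries keyed by test region) and the function names are assumptions made for illustration; the 0.02% frequency cut-off and the 60% ratio come from the text.

```python
def hybrid_strain_genotypes(region_freqs, min_freq=0.0002):
    """Keep potential hybrid strain genotypes whose frequency is >= 0.02%.

    region_freqs: {region: {genotype: frequency}} (hypothetical layout).
    """
    return {region: {g for g, f in genotypes.items() if f >= min_freq}
            for region, genotypes in region_freqs.items()}

def is_hybrid_strain_variety(variety_potential, observed, min_ratio=0.60):
    """Decide whether a database variety counts as a hybrid strain variety.

    variety_potential: {region: genotype} for the regions where the database
    variety carries a potential hybrid strain genotype.
    observed: output of hybrid_strain_genotypes().
    """
    if not variety_potential:
        return False
    shared = sum(1 for region, genotype in variety_potential.items()
                 if genotype in observed.get(region, set()))
    return shared / len(variety_potential) >= min_ratio
```

A database variety whose potential hybrid strain genotypes are observed in 2 of its 3 regions (about 67%) would be accepted, whereas one matched in only 1 of 3 regions would not.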
Specifically, the universal test regions are determined from the variant sites as follows:
The discrimination of each variation window region is calculated as $D = 1 - \frac{\sum_{i=1}^{k} \binom{b_i}{2}}{\binom{a}{2}}$, where a is the total number of varieties detected in the variation window region, b_i is the number of varieties having the i-th genotype in the variation window region, with b_i > 1, and k is the number of genotypes that are shared by more than one variety. A variation window region is a detection window centered on a single nucleotide variation site and extended on each side of that site by 1/2 of the sequencing length;
The universal test regions are selected from the 6000 variation windows with the greatest discrimination on the nuclear genome and the 100 variation windows with the greatest discrimination on the cytoplasmic genome.
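The discrimination of a variation window is the share of variety pairs that the window can tell apart. A minimal sketch, using the pair-counting principle given in the embodiment (all pairs minus pairs that share a genotype); the function name is an assumption.

```python
from math import comb

def discrimination(a, genotype_counts):
    """Fraction of variety pairs distinguished by one variation window.

    a: total number of varieties detected in the window.
    genotype_counts: number of varieties carrying each genotype; genotypes
    carried by a single variety contribute no indistinguishable pair.
    """
    if a < 2:
        raise ValueError("at least two varieties are needed")
    indistinguishable = sum(comb(b, 2) for b in genotype_counts if b > 1)
    return 1 - indistinguishable / comb(a, 2)
```

With the embodiment's example of 13 varieties and two shared genotypes carried by 4 and 7 varieties, `discrimination(13, [4, 7])` gives 1 - 27/78, about 65%.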
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows: by means of high-throughput sequencing and multi-site amplification, the method achieves large-sample sampling of the rape variety to be tested and large-sample sampling of the test regions of individuals within the variety; combined with the definition of hybrid strain genotypes, the definition of cytoplasm hybrid strain varieties and the hybrid strain rate calculation formula, it judges the specificity, stability and consistency of the rape variety to be tested accurately and completely, and the test is faster and can be completed within 10 days.
Specific implementation mode
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below.
Embodiment: testing the specificity, consistency and stability of the rape variety "Soviet Union 2051"
The rape variety to be tested in this embodiment is the rape variety "Soviet Union 2051", which is a pure-line rape variety in public use. The method for testing the specificity, consistency and stability of this rape variety includes the following steps.
One, the variant sites between different rape varieties are obtained.
Variant sites between different rape varieties can be obtained from published literature, but the results obtained this way are fragmentary. In this embodiment, a large number of variant sites between different rape varieties are obtained by comparing the genome sequences of different rape varieties.
Further, the genome sequences of the different rape varieties are obtained as follows:
The genome sequences of the different rape varieties in this embodiment come from two sources. The first is the high-throughput sequencing data of the genomes of 10 rape varieties by Huang et al., with the following reference: Huang et al.: Identification of genome-wide single nucleotide polymorphisms in allopolyploid crop Brassica napus. BMC Genomics 2013 14:717. The genome sequences of these 10 rape varieties are published in the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession number SRA057227. The second is the high-throughput sequencing of "430AB", "P65" and the hybrid "miscellaneous No. 9 peaceful", carried out by the method described in the above article by Huang et al. In total, this embodiment obtains high-throughput sequencing data for the genomes of 13 rape varieties.
Further, the variant sites are obtained from the genome sequences of the different varieties.
Specifically, because the sequencing depth of these 13 rape varieties is not high or the fragment length is not long enough, only single nucleotide polymorphism (SNP) sites can be identified; other variation types, such as variation in repeat number, are not identified because of their low reliability. Using the Frederick Sanger alignment software (version 0.4), the high-throughput sequencing data of the genomes of the 13 rape varieties are aligned separately to the rape nuclear reference genome (version: Release v1.01, download address: http://www.ncbi.nlm.nih.gov) and to the cytoplasmic reference genome. The cytoplasmic reference genome includes the mitochondrial reference genome and the chloroplast reference genome, whose accession numbers at NCBI (National Center for Biotechnology Information) are NC_016734.1 and AP006444.1, respectively. For the alignment, the insert fragment length is set to 500bp and the other parameters are left at their default values. The Ssaha Pileup software package (version 0.5) is used to identify the SNP sites of each rape variety. An SNP site is defined as a base pair with a definite difference, a single-base insertion or a single-base deletion. A base pair with a definite difference excludes base pairs whose difference is uncertain, i.e. base pairs involving certain degenerate bases. For example, R represents A or G, so A and R may or may not differ; the difference between A and R is therefore uncertain, and they do not form an SNP with each other. The SNP sites in this embodiment therefore do not include such base pairs with an uncertain difference. Under this definition, a total of 911346 SNP sites are obtained among all 13 rape varieties, of which 18543 SNP sites are located on the cytoplasmic genome and the remaining SNP sites are located on the nuclear genome. In what follows, a genotype is the combination of the multiple SNP sites in a test region; a nuclear genotype is a genotype located on the nuclear genome, and a cytoplasmic genotype is a genotype located on the cytoplasmic genome. For example, the 1st test region in Table 1 is located on the nuclear genome and is a nuclear genotype; the test region contains 3 SNP sites, and its genotype is the combination of these 3 SNP sites.
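The rule that degenerate bases with an uncertain difference do not form an SNP can be expressed as a set test. The sketch below uses the standard IUPAC nucleotide codes; treating two bases as definitely different exactly when their possible-base sets do not overlap is an assumed reading of the definition above, and the function name is hypothetical.

```python
# Standard IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def is_definite_difference(base1, base2):
    """True when two base calls cannot represent the same nucleotide."""
    return set(IUPAC[base1.upper()]).isdisjoint(IUPAC[base2.upper()])
```

Under this reading, A versus G is a definite difference, while A versus R is not, because R may itself be A.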
Two, the test regions of the rape variety to be tested are determined from the variant sites; the test regions include universal test regions, and at least some of the variant sites are contained in the universal test regions. The method includes:
Determining the universal test regions
The universal test regions are regions of high discrimination on the cytoplasmic genome, or regions of high discrimination that are evenly distributed on the nuclear genome. The discrimination is $D = 1 - \frac{\sum_{i=1}^{k} \binom{b_i}{2}}{\binom{a}{2}}$, where a is the total number of varieties detected in the variation window region, b_i is the number of varieties having the i-th genotype in the variation window region, with b_i > 1, and k is the number of genotypes that are shared by more than one variety. A variation window region is a detection window centered on a single nucleotide variation site (SNP site) and extended on each side of that site by 1/2 of the sequencing length. The discrimination is derived as follows. The number of pairwise combinations among all varieties is $\binom{a}{2}$. Combinations of different varieties that share the same genotype cannot be distinguished, and their number is $\sum_{i=1}^{k} \binom{b_i}{2}$. The proportion of variety combinations that cannot be distinguished is therefore $\sum_{i=1}^{k} \binom{b_i}{2} / \binom{a}{2}$, and the proportion of variety combinations that can be distinguished, i.e. the discrimination, is $D = 1 - \sum_{i=1}^{k} \binom{b_i}{2} / \binom{a}{2}$. The greater the discrimination, the better different varieties can be distinguished, and the more effective the variation window region is for DUS testing. If the variation window regions on the nuclear genome are unevenly distributed, some regions will be adjacent and therefore inherited in linkage, so that their information tends to overlap. The principle for selecting universal test regions on the nuclear genome is therefore high discrimination together with evenly distributed SNP sites. The cytoplasmic genome has no linkage inheritance problem, so on the cytoplasmic genome only regions of high discrimination need to be selected.
In this embodiment, high-throughput sequencing is carried out on a Proton high-throughput sequencer, whose sequencing length for a test region can reach 200bp. To obtain the maximum amount of information, the longest test region in this embodiment is also 200bp. A variant site in this embodiment therefore refers to an entire test region, which may contain multiple SNP sites.
First, each obtained SNP site is taken as a center and extended by 99bp to the left and 100bp to the right, forming a 200bp variation window. The 911346 SNP sites obtained yield 911346 variation windows, and the discrimination $D = 1 - \sum_{i=1}^{k} \binom{b_i}{2} / \binom{a}{2}$ of each variation window region is calculated. For example, in the 1st variation window region, a = 13 varieties are detected in total and there are k = 2 genotypes, CCA and TTA, carried by b1 = 4 and b2 = 7 varieties respectively; therefore $D = 1 - \frac{\binom{4}{2} + \binom{7}{2}}{\binom{13}{2}} = 1 - \frac{6 + 21}{78} \approx 65\%$. This means that the 1st variation window region can distinguish 65% of the variety combinations among the 13 varieties; the other 35% cannot be distinguished by it, and more variation windows are needed to distinguish them. In the same way, the discrimination of all 911346 variation windows is calculated, and the 6000 variation windows with the greatest discrimination on the nuclear genome and the 100 variation windows with the greatest discrimination on the cytoplasmic genome are selected. Among the 6000 variation windows on the nuclear genome, the distance between each variation window and the next one is checked one by one; if the distance is less than 200K (1K = 1000 bases), the variation window with the smaller discrimination is discarded and the check is repeated, until the distances between all adjacent variation windows are greater than 200K. The 200K distance criterion is chosen because the rape genome is about 930M in size (1M = 1 million bases); with 2000 universal test regions finally selected on the nuclear genome, the average distance between universal test regions would be about 500K, but since some specific regions, such as centromeres, contain few variant sites, the average distance should be less than 500K. Through this process, 4367 variation windows on the nuclear genome are selected; together with the 100 variation windows with the greatest discrimination on the cytoplasmic genome, a total of 4467 variation windows are selected as candidate test regions. The number of variation windows with the greatest discrimination to select (200) is an empirical value and can be modified as the case requires.
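The window construction and the distance-based thinning of nuclear variation windows can be sketched as follows. This is a single-pass illustration that assumes windows closer together than the minimum gap are thinned by keeping the one with the higher discrimination, so that all remaining neighbours are more than 200K apart (the stopping condition stated in the paragraph). The tuple layout and function names are assumptions.

```python
def variation_window(snp_pos, left=99, right=100):
    """Return the (start, end) coordinates of the 200bp window around an SNP."""
    return max(1, snp_pos - left), snp_pos + right

def thin_windows(windows, min_gap=200_000):
    """Drop the lower-discrimination window of any pair closer than min_gap.

    windows: list of (chromosome, position, discrimination) tuples.
    """
    kept = []
    for chrom, pos, score in sorted(windows, key=lambda w: (w[0], w[1])):
        too_close = kept and kept[-1][0] == chrom and pos - kept[-1][1] <= min_gap
        if too_close:
            # Keep whichever of the two neighbouring windows discriminates better.
            if score > kept[-1][2]:
                kept[-1] = (chrom, pos, score)
            continue
        kept.append((chrom, pos, score))
    return kept
```

Replacing the last kept window with a later, better one never violates the spacing to the window before it, because the replacement lies further to the right.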
The test regions may also include non-universal test regions.
Determining the non-universal test regions:
A non-universal test region is a non-universal site that needs to be detected for particular varieties. DUS testing needs to detect the non-universal sites of targeted modification. Targeted modification is a common technique in modern breeding, for example backcross breeding and transgenic breeding, and a variety obtained by targeted modification can also become a new variety because it has specificity. According to the principle for judging specificity in new variety protection, a non-universal test region should not be contained in the universal test regions and should be a known site controlling a qualitative trait. In this embodiment, since the rape variety to be tested was not obtained by targeted modification, no non-universal site needs to be detected, and there is therefore no non-universal test region.
Three, primers for amplifying the test regions are prepared; the primers include universal test region primers, as follows:
The universal test region primers are prepared; these primers apply to all varieties. Specifically:
The universal test regions are detected by multiplex PCR, in which multiple PCR primers are added to the same PCR reaction to amplify multiple sites of the genome simultaneously. The key to this technique is the design and synthesis of the multiplex PCR primers. This embodiment uses the multiplex PCR technology provided by LifeTechnology (USA), which allows up to 12000-plex PCR primers.
The primers are obtained as follows. Log in to the Life Technologies online multiplex PCR primer design webpage https://ampliseq.com/protected/help/pipelineDetails.action and submit the relevant information as required. In this embodiment, the "Application type" option is set to "DNA Hotspot designs (single-pool)". If multi-pool is selected, the multiplex PCR has to be carried out in several tubes, which increases cost, whereas single-pool primers need only one multiplex PCR and save cost. The disadvantage is that primer design may fail for some universal test regions, but because there are many candidate universal test regions on the genome, discarding some of them does not affect the result. The nuclear reference genome and the cytoplasmic reference genome of the rape variety to be tested are merged into one file; after "Custom" is selected in the "Select the genome you wish to use" option, the merged file is uploaded as the reference genome for designing the multiplex PCR primers. The DNA type option is set to "Standard DNA". In the Add Hotspot option, the location information of the SNP sites in the universal test regions to be designed is added, including the chromosome, the start position of the SNP and the end position of the SNP; partial examples are shown in Table 1. Finally, the "Submit targets" button is clicked to submit the request and design the multiplex PCR primers. In this embodiment, 2302 pairs of multiplex PCR primers were designed and successfully verified from the 4467 universal test regions, for amplifying the corresponding 2302 universal test regions. The multiplex PCR primers are verified as follows: by the method provided by the present invention, genomic DNA is extracted from leaves of a single rape plant, amplified with the designed multiplex PCR primers, built into a library, sequenced by high-throughput sequencing, and the sequencing segment group is analyzed; the primers corresponding to the following test regions are removed: test regions with fewer than 1000 sequencing segments or in which a hybrid strain genotype is present. The primers that remain are the successfully verified multiplex PCR primers. Because the genomic DNA comes from the leaves of a single rape plant, no hybrid strain variety can be present; any hybrid strain genotype is therefore a PCR or sequencing bias error caused by the special structure of the test region, and removing these test regions avoids such systematic errors. The successfully verified multiplex PCR primers are also mixed by the company and supplied to the customer in liquid form. The 2302 universal test regions for which multiplex PCR primers were successfully designed are the universal test regions finally used to detect the rape variety to be tested, and each variety in the constructed database also covers these 2302 universal test regions; 55 of them are located in the cytoplasmic genome and the remaining 2247 in the nuclear genome.
It should be noted that the number of universal test regions is required to be ≥900, for the following reason: with fewer than 900, the probability that a misjudged hybrid strain variety exists exceeds 1%; the calculation of this threshold is shown in Table 2. Because some test regions may fail detection, the number of test regions is generally ≥1000.
The test region primers may also include non-universal test region primers, which are specific to the rape variety to be tested, as follows:
Prepare the non-universal test region primers:
The primers for a non-universal test region comprise a first primer and a second primer. The first primer comprises a first forward primer and a first reverse primer, and the second primer comprises a second forward primer and a second reverse primer. The first primer and the second primer are each used in a separate amplification to give two amplification products of the non-universal test region, which are mixed in equal amounts to build the high-throughput sequencing library of the separately amplified region. The 5' end of the first forward primer is linked to sequence 1, shown as SEQ ID NO:1 in the sequence listing, and the 5' end of the first reverse primer is linked to sequence 2, shown as SEQ ID NO:2 in the sequence listing; the 5' end of the second forward primer is linked to sequence 2 (SEQ ID NO:2), and the 5' end of the second reverse primer is linked to sequence 1 (SEQ ID NO:1).
The non-universal test region primers are designed as follows. In the first step, a PCR forward primer and reverse primer amplifying the non-universal test region are designed by ordinary PCR primer design methods, subject to the requirements that the amplicon is no longer than 200 bp and contains all SNP sites of the non-universal test region. In the second step, SEQ ID NO:1 and SEQ ID NO:2 of the sequence listing are linked to the 5' ends of the designed forward primer and reverse primer respectively, giving the forward primer and the reverse primer of the first primer. In the third step, SEQ ID NO:2 and SEQ ID NO:1 of the sequence listing are linked to the 5' ends of the designed forward primer and reverse primer respectively, giving the forward primer and the reverse primer of the second primer. SEQ ID NO:1 and SEQ ID NO:2 are the adapter sequences used in high-throughput sequencing, so the PCR products carry the high-throughput sequencing adapters and can be mixed directly with the sequencing library of the amplified universal test regions and sequenced together, without tedious library construction steps such as fragmentation and adapter ligation, which improves efficiency and reduces cost. Two primer pairs that differ only in their adapters are used so that the non-universal test region is sequenced from both ends simultaneously.
In this embodiment the rape variety to be tested has no non-universal test region, so there are no non-universal test region primers.
Four, construct the database containing the genotypes of all test regions of different rape varieties, as follows:
This embodiment obtained 2302 universal test region primers and 0 non-universal test region primers; the regions they amplify are the test regions of the rape variety to be tested. A database is constructed containing the genotypes of the 2302 test regions of 13 varieties and the location information of their SNPs; partial results are shown in Table 1.
Table 1 shows partial examples of the database variety genotypes and their positions, the genotypes of the rape variety to be tested, and the hybrid strain genotypes and their frequencies
In Table 1, "/" indicates that the test region has a heterozygous genotype, with the two different genotypes given before and after the "/". Letters other than A, T, G and C represent degenerate bases. If a genotype consists entirely of the degenerate base N, the genotype and SNP data of the corresponding test region are said to be missing; a missing genotype or SNP is treated as showing no difference when compared with any genotype or SNP. The database varieties can be tested by the method provided by the present invention for detecting the genotype of the rape variety to be tested, in order to fill in the missing genotypes.
Owing to space limitations, this embodiment does not list the full database content; only the information for 10 test regions of 5 varieties is listed. For the same reason, some other parts of this embodiment also list only partial examples; the remaining unlisted data can be completed according to the method of this embodiment.
Five, after determining the sample size SN of the rape variety to be tested, take a random sample, mix it and extract the DNA of the mixed sample, as follows:
Calculate the sample size of the rape variety to be tested
The sample size SN should satisfy the condition BINOM.INV(SN, M, 0.95)/SN ≤ 1.15*M, where BINOM.INV is the function in Excel 2010, used as defined in Excel 2010; it returns the smallest integer for which the cumulative binomial distribution is greater than or equal to a criterion value. The meaning of this condition is that even if the hybrid strain rate exceeds the threshold M by only 15%, the sample size allows the stability and consistency of the rape variety to be tested to be judged correctly with 95% probability. The value of M is set according to conditions such as the crop species, the variety type and the specific requirements. The "Guidelines for the conduct of tests for specificity, consistency and stability of new plant varieties: Brassica napus", published by the Office for the Protection of New Varieties of Plants of the Ministry of Agriculture, stipulates that for conventional varieties (including parental lines) a population standard of 2% and an acceptance probability of at least 95% are used. Therefore, in this embodiment the intermediate value 2% is selected as M. Evaluating the above formula while gradually increasing SN shows that BINOM.INV(SN, 2%, 0.95)/SN ≤ 1.15*2% holds when SN ≥ 5957. The sample size of the rape variety to be tested in this embodiment should therefore be ≥5957.
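The sample size condition can be checked outside Excel with a short standard-library Python sketch. Here `binom_inv` is a hypothetical helper that mirrors the documented behaviour of BINOM.INV (the smallest k whose cumulative binomial probability reaches the criterion), and `sample_size_ok` evaluates the inequality as written above; neither name comes from the patent.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_inv(n: int, p: float, alpha: float) -> int:
    """Smallest k with P(X <= k) >= alpha for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if cdf >= alpha:
            return k
    return n


def sample_size_ok(sn: int, m: float,
                   confidence: float = 0.95, margin: float = 1.15) -> bool:
    """Condition from the text: BINOM.INV(SN, M, 0.95) / SN <= 1.15 * M."""
    return binom_inv(sn, m, confidence) / sn <= margin * m


small = sample_size_ok(100, 0.02)      # too few individuals
large = sample_size_ok(10_000, 0.02)   # comfortably large sample
```

The inequality is not monotone in SN: for very small samples the quantile is 0 and the condition holds trivially (for example SN = 1), so a scan should look for the value beyond which the condition keeps holding, which the text reports as 5957 for M = 2%.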
Take a random sample, mix it and extract the DNA of the mixed sample
In this embodiment, 10000 seeds were selected and germinated, and 8000 sprouts of roughly equal size were randomly taken, mixed and placed in a mortar; liquid nitrogen was added to the mortar and the sprouts were ground thoroughly to a powder. The DNA of the mixed sample of the rape variety to be tested was extracted with the plant genomic DNA extraction kit (catalog No. DP305) produced by Tiangen Biotech (Beijing) Co., Ltd., following the operation manual of the kit. The DNA obtained was quantified with the dsDNA HS Assay Kit (catalog No. Q32852) produced by Invitrogen (USA), following its instructions, and the quantified DNA of the rape variety to be tested was diluted to 10.00 ng/μl.
Six, amplify the DNA of the mixed sample with the primers to obtain the amplification products of the test regions, which are used to build the high-throughput sequencing library, as follows:
The high-throughput sequencing library comprises the high-throughput sequencing library of the universal test regions and that of the non-universal test regions. The two libraries are built separately and then mixed to obtain the high-throughput sequencing library of all test regions. There is no non-universal test region in this embodiment, so the high-throughput sequencing library of the test regions is the high-throughput sequencing library of the universal test regions.
The high-throughput sequencing library of the universal test regions is built as follows:
The universal test regions are amplified by multiplex PCR with the library construction Kit 2.0 (produced by Life Technologies, USA, catalog No. 4475345), and the amplification products are used to build the high-throughput sequencing library. The kit contains the following reagents: 5× Ion AmpliSeq™ HiFi Mix, FuPa reagent, transfer reagent, sequencing adapter solution and DNA ligase. The library is built according to the kit's operation manual "Ion AmpliSeq™ Library Preparation" (publication number MAN0006735, revision A.0). The 2302 universal test regions are amplified by multiplex PCR with the following reaction system: 4 μl of 5× Ion AmpliSeq™ HiFi Mix, 4 μl of the prepared universal test region primer mixture, 10 ng of DNA of the rape variety to be tested and 11 μl of nuclease-free water. The multiplex PCR program is: 99 °C for 2 minutes; (99 °C for 15 seconds; 60 °C for 4 minutes) × 25 cycles; hold at 10 °C. The excess primers in the multiplex PCR product are digested with FuPa reagent and the product is then phosphorylated, as follows: 2 μL of FuPa reagent is added to the multiplex PCR product and mixed, and the reaction is run in a PCR instrument with the following program: 50 °C for 10 minutes; 55 °C for 10 minutes; 60 °C for 10 minutes; hold at 10 °C. This gives mixture a, a solution containing the phosphorylated amplification products. Sequencing adapters are then ligated to the phosphorylated amplification products, as follows: 4 μL of transfer reagent, 2 μL of sequencing adapter solution and 2 μL of DNA ligase are added to mixture a and mixed, and the reaction is run in a PCR instrument with the following program: 22 °C for 30 minutes; 72 °C for 10 minutes; hold at 10 °C. This gives mixture b. Mixture b is purified by standard ethanol precipitation and dissolved in 10 μL of nuclease-free water. The mass concentration of mixture b is measured with the dsDNA HS Assay Kit (catalog No. Q32852) produced by Invitrogen (USA), following its instructions, and the purified mixture b is then diluted to 15 ng/ml to give the high-throughput sequencing library of the universal test regions at a concentration of about 100 pM.
The high-throughput sequencing library of the non-universal test regions is built as follows:
With the DNA of the rape variety to be tested as template, the first primer and the second primer of the non-universal test region prepared above are used in separate PCR amplifications, and the amplification products are mixed in equal amounts to give the high-throughput sequencing library of the non-universal test region. The procedure follows "Ion Amplicon Library Preparation (Fusion Method)" (publication number 4468326) and is, in outline, as follows. The forward primer and the reverse primer of the first primer are each dissolved in water to a concentration of 10 μM and mixed in equal volumes to give the first primer solution. The PCR reaction system is prepared from 1 μL of the first primer solution, 30 ng of DNA of the rape variety to be tested and 45 μL of PCR high-fidelity mix (produced by Invitrogen, USA, catalog No. 12532016). After mixing, the reaction is run in a PCR instrument with the following program: 94 °C for 3 minutes; (94 °C for 30 seconds; 58 °C for 30 seconds; 68 °C for 1 minute) × 40 cycles; hold at 4 °C. The PCR product is purified by standard ethanol precipitation and dissolved in 10 μL of water. Its molar concentration is measured with the DNA 1000 kit (catalog No. 5067-1504) on the Bioanalyzer (model 2100) produced by Agilent (USA), following the kit instructions, and the product is then diluted to 200 pM to give the amplification product of the first primer. The amplification product of the second primer, at a concentration of 200 pM, is obtained in the same way. The amplification products of the first primer and of the second primer are mixed in equal volumes to give the high-throughput sequencing library of the non-universal test region at a concentration of 100 pM. In this embodiment there is no non-universal test region, so no high-throughput sequencing library of non-universal test regions needs to be built.
Obtain the high-throughput sequencing library of all test regions
The high-throughput sequencing library of the universal test regions and that of the non-universal test regions, at equal molar concentrations, are mixed in the ratio of the number of universal test regions to the number of non-universal test regions; the resulting mixture is the high-throughput sequencing library of all test regions. In this embodiment there is no high-throughput sequencing library of non-universal test regions, so the library built is the high-throughput sequencing library of the universal test regions at a concentration of 100 pM.
Seven, perform high-throughput sequencing on the high-throughput sequencing library to obtain the sequencing segment group, as follows:
Principle for determining the high-throughput sequencing depth: the depth of high-throughput sequencing satisfies the following three conditions: BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) ≥ 99.9%; 1-BINOM.DIST(10000, 10000, 1-BINOM.DIST(8, 20, 1-BINOM.DIST(99.99%*CF, CF, 99.9989%, TRUE), TRUE), FALSE) ≤ 0.1%; and BINOM.DIST(10*(1-M)*CF, 10*CF, 1-110%*M, TRUE) ≥ 95.0%. Here CF is the depth of high-throughput sequencing, that is, the average number of times each test region is covered; M is the threshold selected for judging consistency and stability; and BINOM.DIST is the function in Excel 2010, used as defined in Excel 2010, which returns the binomial distribution probability. The three conditions have the following meanings. First, with a hybrid strain rate as low as 0.1%, as many as 10 hybrid strain varieties, and on average only 20 difference sites between a hybrid strain variety and the rape variety to be tested, the probability, as determined by the sequencing depth, of detecting all hybrid strain varieties is ≥99.9%. Second, with as many as 10000 database varieties and on average only 20 difference sites between a hybrid strain variety and the rape variety to be tested, the probability, as determined by the sequencing depth, that a misjudged hybrid strain variety exists is ≤0.1%. Third, with as many as 10 hybrid strain varieties and a true hybrid strain rate that exceeds the selected judgment threshold by only 10%, the probability, as determined by the sequencing depth, that the conclusion on stability and consistency is correct is ≥95.0%. These conditions are very stringent, so the actual performance is better than the above thresholds. The calculation of the above probabilities is shown in Table 2.
Table 2 shows the calculation methods for the probabilities used in this embodiment
Table 2 is an Excel 2010 data sheet; functions, cells and so on are as defined in Excel 2010. Cell B3 is "threshold selected for judging consistency and stability (M)"; the other cells are numbered according to the rules of Excel 2010 with B2 as the reference. For example, the cell containing "hybrid strain rate (R)" is 4 rows down and 1 column across from B2, so its number is C7; the other cells are numbered in the same way.
The sequencing depth for this embodiment is determined as follows: M = 2% is substituted into the three formulas above and the sequencing depth CF is increased step by step; all three conditions hold once CF reaches 1935, so the sequencing depth for this embodiment is set to ≥1935 times.
Perform high-throughput sequencing with the high-throughput sequencing library
Before sequencing, the high-throughput sequencing library of all test regions is amplified by ePCR (emulsion PCR, emulsion polymerase chain reaction) with the kit Ion PI Template OT2 200 Kit v2 (produced by Invitrogen, USA, catalog No. 4485146), following the operation manual of the kit. The ePCR product is then sequenced on a Proton second-generation high-throughput sequencer with the kit Ion PI Sequencing 200 Kit v2 (produced by Invitrogen, USA, catalog No. 4485149), following the operation manual of the kit. In this embodiment, the sequencing throughput is set to an average coverage of 10000 times per test region.
Pre-process the high-throughput sequencing results
First, it is determined whether the quality of the high-throughput sequencing data is ≥Q20. If it is <Q20 (which is rare), high-throughput sequencing is repeated as described above until the quality reaches the Q20 standard; the Q20 standard satisfies the requirement in Table 2 that the "probability of a sequencing error to a specific base" is ≤0.33%. The high-throughput sequencing segments that meet the quality requirement are aligned to all 2302 test regions; segments that fail to align and segments with incomplete genotype detection are removed, and all remaining sequencing segments are called the sequencing segment group. A sequencing segment with incomplete genotype detection is one that fails to cover all SNP sites, as listed under "Position of SNP in reference genome" in Table 1, of the test region to which the segment belongs. Genotype detection is incomplete when the sequencing segment is too short; alignment fails mostly because the sequencing segment is a non-specific amplification product.
Eight, analyze the sequencing segment group to obtain the genotypes of the rape variety to be tested and the hybrid strain genotypes, as follows:
The sequencing segment group is aligned to all test regions and the number of sequencing segments in each test region is counted; test regions with ≤1000 sequencing segments are removed, and the remaining test regions are the successfully detected test regions. In this embodiment, 2117 successfully detected test regions were obtained in total. The segments aligned to a test region are called the sequencing segments of that test region, and the base composition extracted from a sequencing segment at the positions listed under "Position of SNP in reference genome" in Table 1 is called the genotype of that sequencing segment. The frequency of a genotype is the proportion, within the sequencing segment group, of the sequencing segments representing that genotype among all sequencing segments of the test region concerned. The genotype with the highest frequency is called the genotype of the rape variety to be tested. A hybrid strain genotype is a potential hybrid strain genotype with a frequency ≥0.02%, where a potential hybrid strain genotype is one that differs from all genotypes of the rape variety to be tested by ≥2 bases, or whose differing bases involve an insertion or deletion of non-contiguous bases. The rationale for this definition is as follows. In high-throughput sequencing, insertion and deletion errors are extremely rare, and the probability that sequencing errors produce 2 specified differing bases is as low as (1%/3)² = 0.0011%. With the additional requirement that the hybrid strain genotype frequency is ≥0.02%, even at a sequencing depth of 30000 the probability that sequencing errors generate a given hybrid strain genotype is only 0.0001% (the calculation is shown in Table 2). A frequency of 0.02% meets the most stringent current DUS testing standard, namely detecting as few as 2 hybrid strains among 10,000 seeds. If the number of differing bases is set to 1, every test region produces erroneous hybrid strain genotypes (the calculation is shown in Table 2); if it is set to ≥3, the number of hybrid strain genotypes drops sharply and the hybrid strain rate R is difficult to calculate accurately. A threshold of ≥2 differing bases is therefore optimal.
For example, in the sequencing segment group, the 1st test region has 11230 sequencing segments in total and 29 genotypes, TTA, CCA, TTT, TTG and so on, represented by 10840, 281, 2, 2, … sequencing segments respectively; the frequencies of these genotypes are 10840/11230 = 96.53%, 281/11230 = 2.50%, 2/11230 = 0.02%, 2/11230 = 0.02%, …. By the definitions of the genotype of the rape variety to be tested and of the hybrid strain genotype, TTA is the genotype of the rape variety to be tested in the 1st test region; the frequency of CCA exceeds 0.02% and CCA differs from the variety genotype TTA by 2 bases, which satisfies ≥2, so CCA is a hybrid strain genotype; the other genotypes are produced by sequencing errors. A hybrid strain nuclear genotype is a hybrid strain genotype that is a nuclear genotype, and a hybrid strain cytoplasmic genotype is a hybrid strain genotype that is a cytoplasmic genotype. By this definition, the hybrid strain genotype CCA of the 1st test region is also a hybrid strain nuclear genotype. In the same way, the genotype of the rape variety to be tested, the hybrid strain genotypes and their frequencies are determined for all 2117 successfully detected test regions, and each hybrid strain genotype obtained is classified as a hybrid strain nuclear genotype or a hybrid strain cytoplasmic genotype. The result: 177 hybrid strain genotypes were obtained in total, of which 174 are hybrid strain nuclear genotypes and 3 are hybrid strain cytoplasmic genotypes.
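The calling rule for one test region can be sketched as follows, using the counts of the 1st test region quoted above. This is a minimal illustration rather than the patent's implementation: the function names are hypothetical, an indel is assumed to show up as a length change or a "-" gap character, and heterozygous variety genotypes are not handled.

```python
from typing import Dict, List, Optional, Tuple


def differing_bases(a: str, b: str) -> int:
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_distinct(a: str, b: str, min_diff: int = 2) -> bool:
    """True for an (assumed) indel or for at least min_diff differing bases."""
    if len(a) != len(b):
        return True
    indel = any((x == "-") != (y == "-") for x, y in zip(a, b))
    return indel or differing_bases(a, b) >= min_diff


def call_genotypes(read_counts: Dict[str, int],
                   total: Optional[int] = None,
                   min_freq: float = 0.0002) -> Tuple[str, List[Tuple[str, float]]]:
    """Return (variety genotype, [(hybrid strain genotype, frequency), ...])."""
    total = total if total is not None else sum(read_counts.values())
    variety = max(read_counts, key=read_counts.get)  # most frequent genotype
    hybrids = [(g, n / total) for g, n in read_counts.items()
               if g != variety and n / total >= min_freq
               and is_distinct(g, variety)]
    return variety, hybrids


# Four of the 29 genotypes of the 1st test region; total is all 11230 segments.
counts = {"TTA": 10840, "CCA": 281, "TTT": 2, "TTG": 2}
variety, hybrids = call_genotypes(counts, total=11230)
```

TTT and TTG are rejected on both criteria here: each differs from TTA by a single base, and 2/11230 is slightly below the 0.02% cut-off before rounding.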
The standard sample detection method of this embodiment is briefly described below. One seed is taken from the rape variety to be tested, sown and grown into a seedling, and genomic DNA is extracted from a leaf of the seedling by the same method as for the rape variety to be tested; this DNA is called the standard sample of the rape variety to be tested. The high-throughput sequencing library of the standard sample is built and sequenced at the same time as, in parallel with and by the same method as that of the rape variety to be tested. The genotype with the highest frequency is called the standard sample genotype; a hybrid strain genotype of the standard sample has a frequency ≥0.02% and differs from the standard sample genotype by ≥2 bases, or its differing bases involve an insertion or deletion of non-contiguous bases. By the same method as for the rape variety to be tested, the standard sample genotype and the standard sample hybrid strain genotypes are obtained for every successfully detected test region. If the test regions in which the standard sample genotype is identical to the genotype of the variety to be tested account for more than 90% of the test regions successfully detected in both the standard sample and the variety to be tested, the standard sample is correct; otherwise, another seed is taken from the rape variety to be tested and the above procedure is repeated until a correct standard sample is obtained. The hybrid strain genotypes of the correct standard sample are compared with the hybrid strain genotypes of the corresponding test regions of the variety to be tested to find the identical hybrid strain genotypes; these are removed from the variety to be tested, and the correct hybrid strain genotypes of the variety to be tested are retained for subsequent analysis. This measure removes hybrid strain genotypes caused by systematic selection errors, which are mainly selective PCR mis-amplification caused by special structures of the gene sequence. It should be noted that when the database contains many varieties and broadly represents the genotypes of different varieties, a hybrid strain genotype can instead be required to be identical to some genotype of a database variety, which serves the same function as the standard sample; in that case the standard sample need not be tested, which reduces the workload. The result in this embodiment: of the 177 hybrid strain genotypes obtained, 2 were removed, both hybrid strain nuclear genotypes and 0 hybrid strain cytoplasmic genotypes; the remaining 175 hybrid strain genotypes are used for subsequent analysis, and partial results are shown in Table 1.
Nine, compare the genotypes of the rape variety to be tested with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant site rate, as follows:
If, in a test region, the genotype of neither the rape variety to be tested nor the database variety is missing, the test region is called a shared test region of the rape variety to be tested and that database variety. In a shared test region, if the genotypes of the rape variety to be tested and the database variety are not exactly the same, the test region is called a difference site of the rape variety to be tested and that database variety, and the two genotypes are differential genotypes of each other; difference site rate = number of difference sites / number of shared test regions. The database variety with the lowest difference site rate is called the approximate variety of the rape variety to be tested, and the corresponding difference sites are called variant sites; variant site rate = number of variant sites / number of shared test regions.
In this embodiment, the rape variety to be tested and the 1st database variety, "430AB", have 2021 shared test regions. In the 1st shared test region, the genotypes of the rape variety to be tested and of "430AB" are TTA and CCA respectively; the two differ, so the 1st shared test region is a difference site of the rape variety to be tested and "430AB", and TTA and CCA are their differential genotypes. Comparing the genotypes of the rape variety to be tested and "430AB" in all shared test regions in the same way gives 278 difference sites, so the difference site rate = 278/2021 = 13.76%. The difference site rates between the rape variety to be tested and all 13 varieties in the database are obtained in the same way; the variety with the lowest difference site rate is "P65", with a difference site rate of 3.68%. "P65" is therefore the approximate variety of the rape variety to be tested, and the variant site rate of the rape variety to be tested is 3.68%.
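The comparison in this step reduces to counting mismatches over the shared test regions. The following Python sketch shows one way to compute the difference site rate and pick the approximate variety; the function names and the toy genotypes are hypothetical, genotypes are compared as plain strings, and a genotype made up entirely of N is treated as missing, as in the notes to Table 1.

```python
from typing import Dict, Optional, Tuple

Genotypes = Dict[str, Optional[str]]  # test region id -> genotype string


def is_missing(genotype: Optional[str]) -> bool:
    """A genotype made up entirely of the degenerate base N counts as missing."""
    return genotype is None or set(genotype) <= {"N"}


def difference_site_rate(test: Genotypes, db: Genotypes) -> Tuple[int, int, float]:
    """Return (difference sites, shared test regions, difference site rate)."""
    shared = [r for r in test
              if r in db and not is_missing(test[r]) and not is_missing(db[r])]
    diff = sum(test[r] != db[r] for r in shared)
    rate = diff / len(shared) if shared else float("inf")
    return diff, len(shared), rate


def approximate_variety(test: Genotypes, database: Dict[str, Genotypes]) -> str:
    """Database variety with the lowest difference site rate."""
    return min(database, key=lambda name: difference_site_rate(test, database[name])[2])


# Toy data for illustration only; region ids and genotypes are invented.
test_variety = {"r1": "TTA", "r2": "GG", "r3": "AC", "r4": "NN"}
database = {
    "430AB": {"r1": "CCA", "r2": "GG", "r3": "AT", "r4": "CC"},
    "P65": {"r1": "TTA", "r2": "GG", "r3": "AT", "r4": "CC"},
}
closest = approximate_variety(test_variety, database)
rate_430ab = round(278 / 2021 * 100, 2)  # figures quoted in the text
```

Region r4 is missing in the toy test variety, so it is excluded from the shared test regions of both comparisons.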
Ten, compare the hybrid strain genotypes with the genotypes of the different varieties in the database to obtain the hybrid strain varieties, then calculate the hybrid strain rate, as follows:
Obtaining the hybrid strain varieties: a hybrid strain variety is a variety present in the database for which the test regions where its potential hybrid strain genotype is identical to a hybrid strain genotype account for ≥60% of all test regions in which it has a potential hybrid strain genotype. Here, a potential hybrid strain genotype is one that differs from all genotypes of the rape variety to be tested by ≥2 bases, or whose differing bases involve an insertion or deletion of non-contiguous bases. Hybrid strain varieties are divided into nuclear hybrid strain varieties and cytoplasmic hybrid strain varieties: a nuclear hybrid strain variety is one identified using nuclear genotypes only, and a cytoplasmic hybrid strain variety is one identified using cytoplasmic genotypes only. For example, suppose the genotypes of a database variety are AA, AA, AA/TT, AA/TT, AA/TT, AA/TT and AA, and the corresponding genotypes of the rape variety to be tested are AA, AA/TT, TT, AA, TT/CC, GG/CC and -A; the corresponding potential hybrid strain genotypes are then: none, none, AA, TT, AA, AA/TT and AA. Pure-line varieties generally have no heterozygous genotypes, although a very few sites may, and hybrid strains are mostly hybrid varieties, in which heterozygous sites are common; the various possible cases are therefore listed. The 60% parameter ensures that the probability of detecting all hybrid strain varieties is 100% and that the probability that a misjudged hybrid strain variety exists is 0%; the determination of this parameter value is shown in Table 2.
In the first test region of this embodiment, the genotypes of the first database variety "430AB" and of the rape variety to be measured are CCA and TTA respectively. They differ by 2 bases, so CCA is a potential hybrid strain genotype, and it is identical to the hybrid strain genotype CCA in the first test region. In the same way, for every test region with a nuclear genotype, it is judged whether the genotype of "430AB" is a potential hybrid strain genotype and, if so, whether that potential hybrid strain genotype is identical to a hybrid strain genotype. The results show that "430AB" has 210 test regions with a potential hybrid strain genotype, and in all of them the potential hybrid strain genotype is identical to a hybrid strain genotype of the same test region. The ratio is 210/210 = 100% > 60%, so "430AB" is judged to be a nuclear hybrid strain variety. Similarly, using all test regions with cytoplasmic genotypes, "430AB" is judged not to be a cytoplasmic hybrid strain variety. Every other variety in the database is judged in the same way. The results show that only "430AB" is a nuclear hybrid strain variety, and no cytoplasmic hybrid strain variety is found. This indicates that the genotypes of "430AB" entered the rape variety to be measured through pollination by stray pollen rather than through mechanical admixture.
Obtaining special hybrid strain genotypes: A special hybrid strain genotype is a hybrid strain genotype possessed by only one hybrid strain variety. It includes special hybrid strain nuclear genotypes and special hybrid strain cytoplasmic genotypes: a special hybrid strain nuclear genotype is a hybrid strain nuclear genotype possessed by only one nuclear hybrid strain variety, and a special hybrid strain cytoplasmic genotype is a hybrid strain cytoplasmic genotype possessed by only one cytoplasmic hybrid strain variety. In this embodiment, 177 hybrid strain genotypes are obtained in total, of which 174 are hybrid strain nuclear genotypes and 3 are hybrid strain cytoplasmic genotypes. The first hybrid strain nuclear genotype, CCA, is possessed only by the nuclear hybrid strain variety "430AB", so CCA is a special hybrid strain nuclear genotype of "430AB". Judging all 177 hybrid strain genotypes one by one in the same way shows that 82 are special hybrid strain nuclear genotypes possessed by "430AB". Similarly, the 10 hybrid strain cytoplasmic genotypes are judged not to be special hybrid strain cytoplasmic genotypes.
The principle for calculating the hybrid strain rate R is as follows:
Hybrid strain rate R = R1 + R2 - R3 - R4, wherein:

$$R1=\sum_{i1=1}^{n1}\left(2\times\frac{\sum_{j1=\mathrm{Int}(0.8\times t1)+1}^{t1-\mathrm{Int}(0.1\times t1)}R1_{i1j1}}{t1-\mathrm{Int}(0.1\times t1)-\mathrm{Int}(0.8\times t1)}\right)$$

where n1 is the number of nuclear hybrid strain varieties; t1 is the number of all special hybrid strain nuclear genotypes of the i1-th nuclear hybrid strain variety; i1j1 denotes the j1-th special hybrid strain nuclear genotype after all special hybrid strain nuclear genotypes of the i1-th nuclear hybrid strain variety are sorted by frequency from low to high; and $R1_{i1j1}$ is the frequency of the i1j1-th special hybrid strain nuclear genotype. R1 is the sum of the hybrid strain rates of the nuclear hybrid strain varieties calculated from hybrid strain nuclear genotypes. The hybrid strain rate of a nuclear hybrid strain variety is 2 times the average of the frequencies of its special hybrid strain nuclear genotypes that remain after the lowest 80% and the highest 10% of frequencies are removed.

$$R2=2\times\frac{\sum_{i2=\mathrm{Int}(0.8\times t2)+1}^{t2-\mathrm{Int}(0.1\times t2)}R2_{i2}}{t2-\mathrm{Int}(0.1\times t2)-\mathrm{Int}(0.8\times t2)}$$

where t2 is the number of hybrid strain nuclear genotypes that are not possessed by any nuclear hybrid strain variety and have a frequency ≥0.17%; i2 denotes the i2-th hybrid strain nuclear genotype after all such hybrid strain nuclear genotypes are sorted by frequency from low to high; and $R2_{i2}$ is the frequency of the i2-th hybrid strain nuclear genotype. R2 is the hybrid strain rate calculated from the hybrid strain nuclear genotypes not possessed by any nuclear hybrid strain variety: 2 times the average of the frequencies that remain after the lowest 80% and the highest 10% of values are removed.

$$R3=\sum_{i3=1}^{n2}R3_{i3}-2\times R3_{ic},\qquad R3_{i3}=\frac{\sum_{j3=\mathrm{Int}(0.8\times t3)+1}^{t3-\mathrm{Int}(0.1\times t3)}R3_{i3j3}}{t3-\mathrm{Int}(0.1\times t3)-\mathrm{Int}(0.8\times t3)}$$

where n2 is the number of cytoplasmic hybrid strain varieties; $R3_{i3}$ is the hybrid strain rate of the i3-th cytoplasmic hybrid strain variety; $R3_{ic}$ is the value of $R3_{i3}$ when i3 = ic, where ic is the cytoplasmic hybrid strain variety that is the corresponding maintainer line or sterile line when the rape variety to be measured is a nucleo-cytoplasmic interaction sterile line or maintainer line, so that $R3_{ic}$ is the hybrid strain rate of the maintainer line mixed into the sterile line or of the sterile line mixed into the maintainer line; t3 is the number of all special hybrid strain cytoplasmic genotypes of the i3-th cytoplasmic hybrid strain variety; i3j3 denotes the j3-th special hybrid strain cytoplasmic genotype after all special hybrid strain cytoplasmic genotypes of the i3-th cytoplasmic hybrid strain variety are sorted by frequency from low to high; and $R3_{i3j3}$ is the frequency of the i3j3-th special hybrid strain cytoplasmic genotype. R3 is the sum of the hybrid strain rates of the cytoplasmic hybrid strain varieties calculated from hybrid strain cytoplasmic genotypes. The hybrid strain rate of a cytoplasmic hybrid strain variety is the average of the frequencies of its special hybrid strain cytoplasmic genotypes that remain after the lowest 80% and the highest 10% of frequencies are removed.

$$R4=\frac{\sum_{i4=\mathrm{Int}(0.8\times t4)+1}^{t4-\mathrm{Int}(0.1\times t4)}R4_{i4}}{t4-\mathrm{Int}(0.1\times t4)-\mathrm{Int}(0.8\times t4)}$$

where t4 is the number of hybrid strain cytoplasmic genotypes that are not possessed by any cytoplasmic hybrid strain variety and have a frequency ≥0.17%; i4 denotes the i4-th hybrid strain cytoplasmic genotype after all such hybrid strain cytoplasmic genotypes are sorted by frequency from low to high; and $R4_{i4}$ is the frequency of the i4-th hybrid strain cytoplasmic genotype. R4 is the hybrid strain rate calculated from the hybrid strain cytoplasmic genotypes not possessed by any cytoplasmic hybrid strain variety: the average of the frequencies that remain after the lowest 80% and the highest 10% of values are removed.

In all of the above, Int() is the rounding-down function, which returns the integer part of the number in brackets.
Hybrid strains in the rape variety to be measured arise during propagation from two sources: pollination by stray pollen and mechanical admixture. Pollination by stray pollen is the main source of the complexity of hybrid strain varieties. It means that pollen of a hybrid strain variety is carried, by wind or other means, to the rape variety to be measured and pollinates it to form hybrid seed. Stray pollen cannot introduce cytoplasm, so it gives rise only to hybrid strain nuclear genotypes, and the hybrid strain rate is 2 times the hybrid strain nuclear genotype frequency. Mechanical admixture means that seeds of a hybrid strain variety are mixed directly into the rape variety to be measured. It introduces nucleus and cytoplasm together, producing both hybrid strain nuclear genotypes and hybrid strain cytoplasmic genotypes, and the hybrid strain rate should equal the frequency of the hybrid strain cytoplasmic genotype. In the formula for R, R1+R2 overestimates the hybrid strain rate of mechanical admixture by a factor of 1 (that is, counts it twice) and must be corrected; after correction, R = R1+R2-R3-R4. Distinguishing mechanical admixture from pollination by stray pollen is a technical difficulty, which the present invention solves.
In the formula for R, the hybrid strain rate of a nuclear hybrid strain variety is always 2 × the hybrid strain nuclear genotype frequency, for the following reason: in diploid or allopolyploid rape, a test region of the nuclear genome has 2 copies, so the hybrid strain rate is 2 times the corresponding hybrid strain nuclear genotype frequency. If a nuclear genome test region with N copies has to be selected, the coefficient should be adjusted to N. If the copy number is unknown, N=2 is used; any test regions for which this is wrong are excluded as low extreme values when the lowest 80% is removed in calculating R.
In the formula for R, only the 10% of hybrid strain genotype frequencies that lie between the lowest 80% and the highest 10% are used. The rationale is as follows. The different hybrid strain genotypes of the same hybrid strain variety are all determined by the hybrid strain rate of that variety, so their frequencies have the same expected value; differences between frequencies arise from errors in PCR amplification and high-throughput sequencing. The definitions of the hybrid strain genotype and of the standard sample of the rape variety to be measured already eliminate most of these erroneous values, so removing the highest 10% is enough to exclude the very few test regions that deviate from the true hybrid strain rate. The lowest 80% is removed, whereas only the highest 10% is removed, for two reasons: (1) the largest source of error is sequencing error, and hybrid strain genotypes produced by sequencing errors have very low frequencies; (2) among the frequencies of hybrid strain genotypes that do not belong to any hybrid strain variety, high values are more likely to be hybrid strain genotypes shared by different hybrid strains, and so represent the true hybrid strain rate.
When the rape variety to be measured is a nucleo-cytoplasmic interaction sterile line and the corresponding maintainer line is mixed into it as a hybrid strain variety, the maintainer line is detected as a cytoplasmic hybrid strain variety because its cytoplasm differs from that of the rape variety to be measured. However, because the nuclei of the sterile line and the maintainer line are identical, it is not detected as a nuclear hybrid strain variety. The value of R3ic is therefore not counted in R1+R2 but is counted in R3i3, so 2 × R3ic must be subtracted in R3 as a correction. For the same reason, when the rape variety to be measured is a nucleo-cytoplasmic interaction maintainer line, 2 × R3ic of the corresponding sterile line hybrid strain variety must also be subtracted in R3 as a correction. Clearly, when the rape variety to be measured is neither a nucleo-cytoplasmic interaction sterile line nor a nucleo-cytoplasmic interaction maintainer line, R3ic = 0.
The formulas for R2 and R4 require a hybrid strain genotype frequency ≥0.17%, for the following reason: when the number of varieties in the database and the number of detection sites reach 10000, 149 misjudged hybrid strain genotypes are produced on average; when the hybrid strain genotype frequency is required to be ≥0.17%, the probability that no hybrid strain genotype is misjudged is ≥99.98% (the estimation method is shown in Table 2), so the values of R2 and R4 can be calculated accurately. A database of 10000 varieties and 10000 detection sites is already the practical limit, so the threshold of ≥0.17% is applicable in all cases. The introduction of R2 and R4 enables the present invention to calculate the hybrid strain rate R even when the number of varieties in the database is 0, that is, without database support.
In particular, if all hybrid strain genotypes of hybrid strain variety A are also possessed by hybrid strain variety B or other hybrid strain varieties, hybrid strain variety A has no special hybrid strain genotype. In this case, when the hybrid strain rate R is calculated, the hybrid strain rates of hybrid strain variety A and hybrid strain variety B are not calculated separately; instead, the hybrid strain rate of a combined hybrid strain variety AB is calculated. The special hybrid strain genotypes of hybrid strain variety AB are the hybrid strain genotypes common to hybrid strain variety A and hybrid strain variety B.
The formula for the hybrid strain rate R is a general formula; in practice, a rape variety to be measured is usually mixed with only one hybrid strain variety.
A hypothetical example of calculating the hybrid strain rate R
Table 3 gives a hypothetical example to illustrate the calculation of the hybrid strain rate R more clearly.
Table 3. A hypothetical example of calculating the hybrid strain rate R
In Table 3 there are two nuclear hybrid strain varieties, A and B, so n1=2, and only one cytoplasmic hybrid strain variety, C, so n2=1. By the definition of a special hybrid strain nuclear genotype, the special hybrid strain nuclear genotypes of hybrid strain variety A are the hybrid strain nuclear genotypes numbered 1-10: AA, TT, TCC, GG, AC, TTC, TCCC, GGC, ACC and AG, so t1=10. Their frequencies are 0.10%, 1.20%, 0.10%, 0.10%, 0.02%, 0.10%, 0.10%, 0.10%, 0.10% and 0.10% respectively. Sorted from low to high, these 10 frequencies are $R1_{1,1}$=0.02%, $R1_{1,2}$=0.10%, $R1_{1,3}$=0.10%, $R1_{1,4}$=0.10%, $R1_{1,5}$=0.10%, $R1_{1,6}$=0.10%, $R1_{1,7}$=0.10%, $R1_{1,8}$=0.10%, $R1_{1,9}$=0.10% and $R1_{1,10}$=1.20%, where the first subscript is i1=1 and the second is j1. The values of $R1_{1,j1}$ from j1=Int(0.8×t1)+1=Int(0.8×10)+1=9 to j1=t1-Int(0.1×t1)=10-Int(0.1×10)=9 consist of $R1_{1,9}$=0.10% only, so the hybrid strain rate of nuclear hybrid strain variety A is 2×0.10%/1=0.20%. In the same way, the hybrid strain rate of nuclear hybrid strain variety B is obtained as 0.40%, so that R1=0.20%+0.40%=0.60% for the nuclear hybrid strain varieties. Similarly, R2=0.02%, the hybrid strain rate of the cytoplasmic hybrid strain variety gives R3=0.10%, and R4=0.04%. Therefore, in this hypothetical example the hybrid strain rate R=R1+R2-R3-R4=0.60%+0.02%-0.10%-0.04%=0.48%.
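The trimming rule used in this example (drop the lowest Int(0.8×t) and the highest Int(0.1×t) frequencies, then average what remains) can be sketched in Python. This is a minimal sketch; the function names are hypothetical, frequencies are expressed in percent, and the values of R1 to R4 are simply taken from the text rather than recomputed from Table 3.

```python
from typing import List


def kept_frequencies(freqs: List[float]) -> List[float]:
    """Sort ascending, then drop the lowest Int(0.8*t) and highest Int(0.1*t) values."""
    ordered = sorted(freqs)
    t = len(ordered)
    n_low = (8 * t) // 10   # Int(0.8 * t), computed in integer arithmetic
    n_high = t // 10        # Int(0.1 * t)
    return ordered[n_low:t - n_high]


def variety_rate(freqs: List[float], copies: int = 1) -> float:
    """Mean of the kept frequencies times `copies` (2 for nuclear, 1 for cytoplasmic)."""
    kept = kept_frequencies(freqs)
    return copies * sum(kept) / len(kept)


# Frequencies (in %) of the 10 special nuclear genotypes of variety A, as listed in the text.
freqs_a = [0.10, 1.20, 0.10, 0.10, 0.02, 0.10, 0.10, 0.10, 0.10, 0.10]
rate_a = variety_rate(freqs_a, copies=2)

# R1..R4 (in %) as stated in the text for the hypothetical example.
r1, r2, r3, r4 = 0.60, 0.02, 0.10, 0.04
r_total = r1 + r2 - r3 - r4
```

With 82 frequencies the same rule drops 65 low and 8 high values and keeps 9. The sketch does not handle an empty frequency list, which would need a separate guard.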
Following the hypothetical example above, the hybrid strain rate R of this embodiment is calculated. In this embodiment the only hybrid strain variety is "430AB", which is a nuclear hybrid strain variety, so R2, R3 and R4 are all 0 and R = R1 = $R1_{1}$. "430AB" has 82 special hybrid strain nuclear genotypes, with frequencies of 2.50%, 2.53%, 2.53% ... (details are shown in Table 1). By the computation rule of R, after the lowest 80% (65 values) and the highest 10% (8 values) of frequencies are removed, the remaining 9 frequencies give the hybrid strain rate R = 2.52%.
11. Using the variant sites, the variant site rate and the hybrid strain rate, judge the specificity, consistency and stability of the rape variety to be measured, as follows:
Here SD is the threshold selected for judging specificity, and M is the threshold selected for judging consistency and stability. The specificity, consistency and stability of the rape variety to be measured are judged as follows: when the variant site rate ≥ SD, or a variant site exists in a non-universal test region, the rape variety to be measured has specificity; when the variant site rate < SD and no variant site exists in a non-universal test region, the rape variety to be measured does not have specificity. When the hybrid strain rate of the rape variety to be measured ≤ M, the rape variety to be measured has consistency and stability; when the hybrid strain rate of the rape variety to be measured > M, it does not have consistency and stability. Like the value of M, the value of SD is set manually according to factors such as breeding level, stringency of requirements and marker characteristics. In this embodiment, SD is set to 1%.
In this embodiment, the variant site rate is 3.68% > SD = 1%, so the rape variety to be measured is judged to have specificity; the hybrid strain rate of the rape variety to be measured is 2.52% > M = 2%, so the rape variety to be measured is judged not to have consistency and stability.
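The two threshold comparisons can be written as a small decision function. This is only an illustrative sketch with a hypothetical function name; rates are passed as fractions, and the flag for variant sites in non-universal test regions is an assumed input.

```python
from typing import Tuple


def judge_dus(variant_site_rate: float, hybrid_rate: float, sd: float, m: float,
              variant_in_non_universal: bool = False) -> Tuple[bool, bool]:
    """Return (has specificity, has consistency and stability)."""
    has_specificity = variant_site_rate >= sd or variant_in_non_universal
    has_consistency_stability = hybrid_rate <= m
    return has_specificity, has_consistency_stability


# Values of this embodiment: variant site rate 3.68%, hybrid strain rate 2.52%, SD 1%, M 2%.
result = judge_dus(0.0368, 0.0252, sd=0.01, m=0.02)
```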
Further, after the specificity, consistency and stability of the rape variety to be measured have been judged, the accuracy of the judgment is estimated as follows:
A pure line new rape variety in the present invention refers to types bred with a homozygous genotype as the target, such as conventional varieties, inbred lines, restorer lines, maintainer lines and sterile lines.
Calculating the accuracy of the specificity judgment: when no variant site exists in a non-universal test region, if the rape variety to be measured is judged to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST((1-SD)*TRN, TRN, 1-OD, TRUE); if the rape variety to be measured is judged not to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST(SD*TRN, TRN, OD, TRUE), where TRN is the number of successfully detected test regions, OD is the variant site rate, and BINOM.DIST is the function in Excel 2010, used as defined in Excel 2010; it returns a binomial distribution probability. What these probabilities actually compute is: when the variety is judged to have specificity, the probability that the variant site rate is greater than SD; when the variety is judged not to have specificity, the probability that the variant site rate is less than SD.
In this embodiment, the rape variety to be measured is judged to have specificity on the basis of the variant site rate. The probability that the specificity conclusion is correct is therefore ≥ BINOM.DIST((1-1%)*2117, 2117, 1-3.68%, TRUE) = 100.00%, so the specificity judgment in this embodiment is highly accurate.
Calculating the accuracy of the consistency and stability judgment
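Outside Excel, the cumulative form of BINOM.DIST can be reproduced with the Python standard library. The following is a minimal sketch: `binom_cdf` and `specificity_confidence` are hypothetical names, the first argument is truncated to an integer as Excel does, and the small epsilon added before truncation is an assumption introduced here to guard against floating-point rounding.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); mirrors BINOM.DIST(k, n, p, TRUE)."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        # Log of the binomial probability mass, to avoid overflow for large n.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def specificity_confidence(sd: float, trn: int, od: float, judged_specific: bool) -> float:
    """Lower bound on the probability that the specificity conclusion is correct."""
    if judged_specific:
        return binom_cdf(int((1 - sd) * trn + 1e-9), trn, 1 - od)
    return binom_cdf(int(sd * trn + 1e-9), trn, od)


# Values of this embodiment: SD = 1%, TRN = 2117, OD = 3.68%.
confidence = specificity_confidence(0.01, 2117, 0.0368, judged_specific=True)
```

The result can be compared with the 100.00% reported above after rounding to two decimal places.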
The probability that the conclusion on the consistency and stability of the rape variety to be measured is correct is as follows. When the rape variety to be measured is judged to have consistency and stability, the probability that the conclusion is correct ≥ BINOM.DIST(M*SN, SN, R, TRUE) * BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE); when the rape variety to be measured is judged not to have consistency and stability, the probability that the conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) * BINOM.DIST(∑SeN*(1-M), ∑SeN, 1-R, TRUE). Here SN is the sampling amount of the rape variety to be measured; ∑SeN is the total number of sequencing fragments of the test regions whose genotype frequencies are used to calculate the hybrid strain rate R, that is, of the test regions that remain after the lowest 80% and the highest 10% of values are removed; and M is the threshold selected for judging consistency and stability. The accuracy of judging consistency and stability depends entirely on the accuracy of the hybrid strain rate, which in turn depends on the accuracy of three steps: first, sampling the rape variety to be measured; second, detecting hybrid strain varieties in the sample; third, calculating the hybrid strain rate from the detected hybrid strain varieties. The accuracy of judging the consistency and stability of the rape variety to be measured is therefore the product of the accuracies of these three steps. In the present invention, even under the most stringent conditions, the accuracy of detecting hybrid strain varieties is kept above 99.9%, and in practice it is close to 100% in the vast majority of cases. For example, in this embodiment the probability of detecting all hybrid strain varieties is at least 100.0000% and the probability of misjudging a hybrid strain variety is at most 0.0000% (the calculation method is shown in Table 2). The accuracy of judging the consistency and stability of the rape variety to be measured can therefore be estimated as the product of the accuracies of the first and third steps, which are the values of the first and second functions in the formulas above, respectively. For example, BINOM.DIST(M*SN, SN, R, TRUE) means: when SN samplings are made of the rape variety to be measured, the probability that the hybrid strain rate R actually sampled is less than the threshold M. Each sequencing fragment used to calculate the hybrid strain rate of the rape variety to be measured is in essence also one sampling of the rape variety to be measured, so BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE) means: when ∑SeN samplings are made of the rape variety to be measured, the probability that the hybrid strain rate R actually sampled is less than the threshold M.
In this embodiment, after the lowest 80% and the highest 10% of hybrid strain genotype frequencies are removed, 9 hybrid strain genotype frequencies are used to calculate the hybrid strain rate R. The total number of sequencing fragments of their test regions is 89091, so ∑SeN = 89091; that is, the 8000 sampled individuals were in effect sampled a further 89091 times, and the error of so large a sample is quite small. In this embodiment the rape variety to be measured is judged not to have consistency and stability, so the probability that this conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) * BINOM.DIST(∑SeN*(1-M), ∑SeN, 1-R, TRUE) = BINOM.DIST((1-2%)*8000, 8000, (1-2.52%), TRUE) * BINOM.DIST(89091*(1-2%), 89091, 1-2.52%, TRUE) = 99.90%. The judgment of the consistency and stability of the rape variety to be measured in this embodiment is thus also highly accurate.
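The product of the two binomial terms can be evaluated with the Python standard library as a cross-check. This is a sketch under the same assumptions as Excel's cumulative BINOM.DIST (first argument truncated to an integer); the function names are hypothetical and the epsilon before truncation is an added guard against floating-point rounding.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); mirrors BINOM.DIST(k, n, p, TRUE)."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        # Log of the binomial probability mass, to avoid overflow for large n.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def consistency_confidence(m: float, r: float, sn: int, sum_sen: int,
                           judged_consistent: bool) -> float:
    """Lower bound on the probability that the consistency/stability conclusion is correct."""
    if judged_consistent:
        return (binom_cdf(int(m * sn + 1e-9), sn, r)
                * binom_cdf(int(m * sum_sen + 1e-9), sum_sen, r))
    return (binom_cdf(int((1 - m) * sn + 1e-9), sn, 1 - r)
            * binom_cdf(int((1 - m) * sum_sen + 1e-9), sum_sen, 1 - r))


# Values of this embodiment: M = 2%, R = 2.52%, SN = 8000, sum of SeN = 89091.
confidence = consistency_confidence(0.02, 0.0252, 8000, 89091, judged_consistent=False)
```

The first factor (sampling of 8000 individuals) dominates the result, since the second factor is based on a far larger number of draws.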
Result verification
The rape variety to be measured and its approximate variety "430AB" were planted and observed according to the method in 《New Plant Variety Specificity, Consistency and Stability Test Guide: Cabbage-Type Rape》. The rape variety to be measured was found to differ clearly from the approximate variety in several characters, such as plant height. The guide stipulates that a rape variety under application can be judged to have specificity when it shows a clear and reproducible difference from the approximate variety in at least one character. The rape variety to be measured is therefore judged to have specificity. In the experiment, 200 plants each of the rape variety to be measured and the approximate variety were planted (100 plants per plot, 2 replicates), and 12 off-type plants were found. The guide stipulates that at most 7 off-type plants are allowed when the observed sample is 200 plants, so the rape variety to be measured is judged not to have consistency. Since the rape variety to be measured does not have consistency, it cannot have stability either, and it is therefore judged not to have stability. These experiments show that the judgments of the specificity, stability and consistency of the rape variety to be measured in this embodiment are correct.
The embodiment of the present invention uses high-throughput sequencing and multi-site amplification to achieve large-sample sampling of the rape variety to be measured and large-sample sampling of the test regions of individuals within the variety. Combined with the definition of hybrid strain genotypes, the definition of cytoplasmic hybrid strain varieties and the hybrid strain rate formula, it achieves accurate, rapid and complete judgment of the specificity, stability and consistency of the rape variety to be measured, a technical effect that no existing DUS test method achieves. Existing molecular DUS detection technologies, such as chips, detect only fixed test regions and cannot flexibly select non-universal test regions case by case. The present invention detects PCR products, so primers can be designed flexibly for each case and non-universal test regions can be detected easily. In addition, the sampling amount of 8000 individuals in this embodiment would be too much work for conventional DUS test technologies to complete. For example, in a field DUS test, sampling 8000 rape plants requires planting more than 2 mu for 2 years, and several characters must be investigated on every plant each year. In the widely used SSR molecular DUS test, 8000 separate DNA extractions, 8000*2302 PCRs and 8000*2302 PCR product detections would be needed (assuming that, as in this embodiment, 2302 universal test regions are detected). Because the workload is excessive, existing molecular DUS tests do not test stability and consistency. Field DUS tests do test consistency and stability, but their sample sizes are all below 1000 plants, whereas this embodiment samples 8000 rape plants, so its accuracy is clearly higher. This embodiment can increase the sampling amount because all 8000 samples are mixed and processed as one sample; compared with the field DUS test, the workload is reduced to 1/8000. Furthermore, all 2302 universal test regions require only one mixed amplification and one high-throughput sequencing detection; compared with the SSR molecular DUS test, the workload is reduced to 1/(8000*2302). The present invention therefore achieves large-sample, multi-site detection with a greatly reduced workload, making the DUS test both accurate and simple. Moreover, the database variety genotypes in the embodiment of the present invention are base compositions, a digital standard: testing the same variety by the method of the present invention under different experimental conditions yields the same genotype, so the DUS test need not be repeated under different conditions. The embodiment of the present invention can therefore compare directly with the database variety genotypes and objectively select the approximate variety of the rape variety to be measured. Existing DUS test technologies are not standardized in this way: the rape variety to be measured and the approximate variety must be tested in parallel at the same time to obtain a reliable conclusion, and to reduce the workload the approximate variety has to be provided by the variety rights applicant; if the approximate variety is wrong, the legal consequence may be an erroneous grant.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A method for testing the specificity, consistency and stability of a pure line new rape variety, characterized in that the method comprises:
Obtain the variant sites between different rape varieties;
Determine the test regions of the rape variety to be measured from the variant sites, the test regions including universal test regions, and at least some of the variant sites being contained in the universal test regions; a discrimination value is calculated by the discrimination formula, wherein a is the total number of varieties detected in a variation window, bi is the number of varieties having the i-th genotype in the variation window, bi>1, and k is the number of genotypes that contain more than 1 variety; a variation window is a detection window centered on a single nucleotide variant site and extending to each side of that site by 1/2 of the sequence length to be measured; the universal test regions are regions of high discrimination on the cytoplasmic genome or the nuclear genome, namely the 6000 variation windows with the highest discrimination on the nuclear genome and the 100 variation windows with the highest discrimination on the cytoplasmic genome, wherein a genotype is the combination of multiple single nucleotide variant sites in a test region;
Construct a database comprising the genotypes of the different rape varieties in all the test regions;
After determining the sampling amount SN of the rape variety to be measured, sample at random, mix the samples, and extract the DNA of the mixed sample;
Prepare primers for amplifying the test regions, the primers including universal test region primers;
Amplify the DNA of the mixed sample with the primers to obtain amplification products of the test regions, the amplification products being used to construct a high-throughput sequencing library;
Perform high-throughput sequencing on the high-throughput sequencing library to obtain a set of sequencing fragments, the depth CF of the high-throughput sequencing satisfying the following conditions: BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) ≥ 99.9%, 1-BINOM.DIST(10000, 10000, 1-BINOM.DIST(8, 20, 1-BINOM.DIST(99.99%*CF, CF, 99.9989%, TRUE), TRUE), FALSE) ≤ 0.1%, and BINOM.DIST(10*(1-M)*CF, 10*CF, 1-110%*M, TRUE) ≥ 95.0%, wherein CF is the depth of the high-throughput sequencing, M is the threshold selected for judging the consistency and stability, and BINOM.DIST is the function in Excel 2010; the conditions satisfied by the depth CF mean that: when the hybrid strain rate is as low as 0.1%, there are 10 hybrid strain varieties, and each hybrid strain variety differs from the rape variety to be measured at only 20 difference sites on average, the probability, as determined by the depth CF, of detecting all the hybrid strain varieties is ≥ 99.9%; when the database contains 10000 varieties and each hybrid strain variety differs from the rape variety to be measured at only 20 difference sites on average, the probability, as determined by the depth CF, of misjudging a hybrid strain variety is ≤ 0.1%; and when there are 10 hybrid strain varieties and the true hybrid strain rate exceeds the threshold M by only 10%, the probability, as determined by the depth CF, that the judgment of stability and consistency is correct is ≥ 95.0%;
Analyze the set of sequencing fragments to obtain the genotype of the rape variety to be measured and the hybrid strain genotypes;
Compare the genotype of the rape variety to be measured with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant site rate of the rape variety to be measured;
Compare the hybrid strain genotypes with the genotypes of the different varieties in the database to obtain the hybrid strain varieties, then calculate the hybrid strain rate;
Using the variant sites, the variant site rate and the hybrid strain rate, judge the specificity, consistency and stability of the rape variety to be measured.
2. The method according to claim 1, characterized in that the sampling amount SN satisfies the following condition: BINOM.INV(SN, M, 0.95)/SN ≤ 1.15*M, wherein BINOM.INV is the function in Excel 2010 and M is the threshold selected for judging the consistency and stability; the condition means that even if the hybrid strain rate exceeds the threshold M by only 15%, the sampling amount ensures, with 95% probability, a correct judgment of the stability and consistency of the rape variety to be measured.
3. The method according to claim 1, characterized in that the test regions further include non-universal test regions, and the primers further include non-universal test region primers.
4. The method according to claim 3, characterized in that the non-universal test region primers include a first primer and a second primer, the first primer including a first forward primer and a first reverse primer, and the second primer including a second forward primer and a second reverse primer; the first primer and the second primer are each used in a separate amplification to obtain two amplification products of the non-universal test region, and the two amplification products are mixed in equal amounts and used to construct a separately amplified high-throughput sequencing library;
The 5' end of the first forward primer is linked to sequence 1 as shown in SEQ ID NO: 1 of the sequence listing, and the 5' end of the first reverse primer is linked to sequence 2 as shown in SEQ ID NO: 2 of the sequence listing;
The 5' end of the second forward primer is linked to sequence 2 as shown in SEQ ID NO: 2 of the sequence listing, and the 5' end of the second reverse primer is linked to sequence 1 as shown in SEQ ID NO: 1 of the sequence listing.
5. The method according to claim 3, characterized in that the method of using the variant sites, the variant sites rate and the hybrid strain rate to judge the specificity, consistency and stability of the rape variety to be measured includes:
When the variant sites rate ≥ SD, or the variant sites are present in the non-universal test zone, the rape variety to be measured has specificity; when the variant sites rate < SD and the variant sites are not present in the non-universal test zone, the rape variety to be measured does not have specificity, wherein SD is the threshold selected when judging specificity;
When the hybrid strain rate of the rape variety to be measured is ≤ M, the rape variety to be measured has consistency and stability; when the hybrid strain rate of the rape variety to be measured is > M, the rape variety to be measured does not have consistency and stability, wherein M is the threshold selected when judging the consistency and stability;
The hybrid strain rate R=R1+R2-R3-R4, wherein:
Wherein, n1 is the number of nucleus hybrid strain kinds; t1 is the number of all special hybrid strain karyogene types of the i1-th nucleus hybrid strain kind; i1j1 denotes the j1-th special hybrid strain karyogene type after all special hybrid strain karyogene types of the i1-th nucleus hybrid strain kind are sorted by frequency from low to high; R1i1j1 is the frequency of the i1j1-th special hybrid strain karyogene type; R1 is the sum of the hybrid strain rates of the nucleus hybrid strain kinds calculated from the hybrid strain karyogene types, where the hybrid strain rate of a nucleus hybrid strain kind is 2 times the average of the frequencies of the special hybrid strain karyogene types that remain after the lowest 80% and the highest 10% of those frequencies in that nucleus hybrid strain kind are removed;
Wherein, t2 is the number of hybrid strain karyogene types, other than those possessed by the nucleus hybrid strain kinds, whose frequency is ≥ 0.17%; i2 denotes the i2-th hybrid strain karyogene type after all hybrid strain karyogene types other than those possessed by the nucleus hybrid strain kinds are sorted by frequency from low to high; R2i2 is the frequency of the i2-th hybrid strain karyogene type; R2 is the hybrid strain rate calculated from the hybrid strain karyogene types other than those possessed by the nucleus hybrid strain kinds, and is 2 times the average of the values that remain after the lowest 80% and the highest 10% of the frequencies of those hybrid strain karyogene types are removed;
Wherein, n2 is the number of cytoplasm hybrid strain kinds; R3i3 is the hybrid strain rate of the i3-th cytoplasm hybrid strain kind; R3ic is the value of R3i3 when i3 = ic, where ic is, when the rape variety to be measured is a nucleo-cytoplasmic interaction sterile line or maintainer line, the cytoplasm hybrid strain kind that is the corresponding maintainer line or sterile line; t3 is the number of all special hybrid strain matter genotypes of the i3-th cytoplasm hybrid strain kind; i3j3 denotes the j3-th special hybrid strain matter genotype after all special hybrid strain matter genotypes of the i3-th cytoplasm hybrid strain kind are sorted by frequency from low to high; R3i3j3 is the frequency of the i3j3-th special hybrid strain matter genotype; R3ic refers to the hybrid strain rate of the maintainer line mixed into the sterile line, or of the sterile line mixed into the maintainer line; R3 is the sum of the hybrid strain rates of the cytoplasm hybrid strain kinds calculated from the hybrid strain matter genotypes, where the hybrid strain rate of a cytoplasm hybrid strain kind is the average of the frequencies of the special hybrid strain matter genotypes that remain after the lowest 80% and the highest 10% of those frequencies in that cytoplasm hybrid strain kind are removed;
Wherein, t4 is the number of hybrid strain matter genotypes, other than those possessed by the cytoplasm hybrid strain kinds, whose frequency is ≥ 0.17%; i4 denotes the i4-th hybrid strain matter genotype after all hybrid strain matter genotypes other than those possessed by the cytoplasm hybrid strain kinds are sorted by frequency from low to high; R4i4 is the frequency of the i4-th hybrid strain matter genotype; R4 is the hybrid strain rate calculated from the hybrid strain matter genotypes other than those possessed by the cytoplasm hybrid strain kinds, and is the average of the values that remain after the lowest 80% and the highest 10% of the frequencies of those hybrid strain matter genotypes are removed;
Int() is the function that rounds to an integer;
The nucleus hybrid strain kind refers to a hybrid strain kind obtained by calculation using only karyogene types, and the cytoplasm hybrid strain kind refers to a hybrid strain kind obtained by calculation using only matter genotypes; the special hybrid strain karyogene type refers to all hybrid strain karyogene types that belong to only one nucleus hybrid strain kind; the special hybrid strain matter genotype refers to all hybrid strain matter genotypes that belong to only one cytoplasm hybrid strain kind; the hybrid strain karyogene type refers to a hybrid strain genotype that is a karyogene type, and the karyogene type refers to a genotype located on the nuclear genome; the hybrid strain matter genotype refers to a hybrid strain genotype that is a matter genotype, and the matter genotype refers to a genotype located on the cytoplasmic genome.
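The trimmed averaging and the decision rules of claim 5 can be illustrated with a short sketch. It rests on several assumptions that the claim text does not settle: frequencies are fractions (so 0.17% is 0.0017), the lowest 80% and highest 10% are cut at integer-floor positions, and the role of R3ic is left out because the formula that uses it is not reproduced in the text. All function names are hypothetical.

```python
from typing import Sequence

MIN_OTHER_FREQUENCY = 0.0017  # 0.17 %, the cut-off stated for R2 and R4


def trimmed_mean(frequencies: Sequence[float]) -> float:
    """Mean of what remains after dropping the lowest 80 % and highest 10 %.

    Cut positions use integer floor division; this is an assumption about
    how Int() is applied.
    """
    ordered = sorted(frequencies)
    t = len(ordered)
    if t == 0:
        return 0.0
    low_cut = (8 * t) // 10   # number of lowest values removed
    high_cut = t // 10        # number of highest values removed
    kept = ordered[low_cut:t - high_cut]
    return sum(kept) / len(kept)


def hybrid_strain_rate(nuclear_kinds: Sequence[Sequence[float]],
                       other_nuclear: Sequence[float],
                       cytoplasm_kinds: Sequence[Sequence[float]],
                       other_cytoplasm: Sequence[float]) -> float:
    """R = R1 + R2 - R3 - R4 as stated in the claim."""
    # Nuclear terms use 2 times the trimmed mean, cytoplasm terms use it once.
    r1 = sum(2 * trimmed_mean(freqs) for freqs in nuclear_kinds)
    r2 = 2 * trimmed_mean(
        [f for f in other_nuclear if f >= MIN_OTHER_FREQUENCY])
    r3 = sum(trimmed_mean(freqs) for freqs in cytoplasm_kinds)
    r4 = trimmed_mean(
        [f for f in other_cytoplasm if f >= MIN_OTHER_FREQUENCY])
    return r1 + r2 - r3 - r4


def has_specificity(variant_sites_rate: float, sd: float,
                    variant_in_non_universal_zone: bool) -> bool:
    """Specific if the rate reaches SD or a non-universal zone varies."""
    return variant_sites_rate >= sd or variant_in_non_universal_zone


def is_consistent_and_stable(rate: float, m: float) -> bool:
    """Consistent and stable if the hybrid strain rate does not exceed M."""
    return rate <= m
```

With ten special genotype frequencies, for example, only the ninth-lowest value survives the trimming, so a single very frequent genotype (possibly an artefact) and the many low-frequency ones do not drive the estimate.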
6. The method according to claim 5, characterized in that the method further includes judging, in the following way, the probability that the conclusion on the consistency and stability of the rape variety to be measured is correct: when the rape variety to be measured has consistency and stability, the probability that the conclusion is correct ≥ BINOM.DIST(M*SN, SN, R, TRUE) * BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE); when the rape variety to be measured does not have consistency and stability, the probability that the conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) * BINOM.DIST(∑SeN*(1-M), ∑SeN, 1-R, TRUE); wherein ∑SeN is the total number of sequencing segments of the test zones containing all the genotypes whose frequencies are used for calculating the hybrid strain rate R, and M is the threshold selected when judging the consistency and stability; BINOM.DIST(M*SN, SN, R, TRUE) is the probability that, when SN samplings are carried out on the rape variety to be measured, the hybrid strain rate R actually drawn is less than the threshold M; BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE) is the probability that, when ∑SeN samplings are carried out on the rape variety to be measured, the hybrid strain rate R actually drawn is less than the threshold M.
7. The method according to claim 5, characterized in that, when the variant sites are not present in the non-universal test zone: if the rape variety to be measured is judged to have specificity, the probability that the conclusion is correct ≥ BINOMDIST((1-SD)*TRN, TRN, 1-OD, TRUE); if the rape variety to be measured is judged not to have specificity, the probability that the conclusion is correct ≥ BINOMDIST(SD*TRN, TRN, OD, TRUE); wherein TRN is the number of successfully detected test zones, OD is the variant sites rate, SD is the threshold selected when judging specificity, and BINOMDIST is a function in Excel 2010; the probability that the conclusion is correct means, when the rape variety to be measured is judged to have specificity, the probability that the variant sites rate is greater than SD, and, when the rape variety to be measured is judged not to have specificity, the probability that the variant sites rate is less than SD; the successfully detected test zones are obtained by analyzing the sequencing segment group.
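The lower bounds in claims 6 and 7 are all cumulative binomial probabilities, so they can be reproduced outside a spreadsheet. The sketch below assumes that the Excel functions truncate a non-integer number of successes to an integer, and it adds a small tolerance before truncating to absorb floating-point error in products such as M*SN; that tolerance is a choice made here, not part of the patent. The function names are illustrative.

```python
import math


def binom_dist_cumulative(number_s: float, trials: int, p: float) -> float:
    """P(X <= number_s) for X ~ Binomial(trials, p), number_s truncated."""
    k = math.floor(number_s + 1e-9)
    if k < 0:
        return 0.0
    if k >= trials or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0  # all trials succeed and k < trials
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(trials + 1)
            - math.lgamma(i + 1)
            - math.lgamma(trials - i + 1)
            + i * math.log(p)
            + (trials - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def bound_consistent(m: float, sn: int, sum_sen: int, r: float) -> float:
    """Claim 6 bound when the variety is judged consistent and stable."""
    return (binom_dist_cumulative(m * sn, sn, r)
            * binom_dist_cumulative(m * sum_sen, sum_sen, r))


def bound_not_consistent(m: float, sn: int, sum_sen: int, r: float) -> float:
    """Claim 6 bound when the variety is judged not consistent and stable."""
    return (binom_dist_cumulative((1 - m) * sn, sn, 1 - r)
            * binom_dist_cumulative((1 - m) * sum_sen, sum_sen, 1 - r))


def bound_specific(sd: float, trn: int, od: float) -> float:
    """Claim 7 bound when the variety is judged to have specificity."""
    return binom_dist_cumulative((1 - sd) * trn, trn, 1 - od)


def bound_not_specific(sd: float, trn: int, od: float) -> float:
    """Claim 7 bound when the variety is judged not to have specificity."""
    return binom_dist_cumulative(sd * trn, trn, od)
```

Claim 6 multiplies two factors because two sampling steps are involved: drawing SN individuals from the variety and drawing ∑SeN sequencing segments from the amplified material.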
8. The method according to claim 1, characterized in that the method of obtaining the hybrid strain kind includes: the hybrid strain kind is a variety present in the database, and the number of test zones in which the potential hybrid strain genotype of the hybrid strain kind is identical to the hybrid strain genotype accounts for ≥ 60% of the total number of test zones in which the hybrid strain kind has the potential hybrid strain genotype; the hybrid strain genotype refers to a potential hybrid strain genotype with a frequency ≥ 0.02%;
The number of differing bases between the potential hybrid strain genotype and all genotypes of the rape variety to be measured is ≥ 2, or the differing bases include an insertion or deletion of non-contiguous bases.
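The matching rule of claim 8 can be sketched as follows. This is one possible reading, under loud assumptions: each test zone carries a single genotype sequence per variety, only equal-length sequences are compared (insertions and deletions are not handled), and a test zone counts as informative for a database variety when its genotype differs from that of the variety to be measured by at least 2 bases. The data layout and all names are hypothetical.

```python
from typing import Dict

MIN_FREQUENCY = 0.0002    # 0.02 %, minimum frequency of a hybrid strain genotype
MIN_DIFFERING_BASES = 2   # minimum base differences from the tested variety
MIN_MATCH_RATIO = 0.60    # minimum share of matching test zones


def count_differing_bases(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_hybrid_strain_kind(candidate: Dict[str, str],
                          tested: Dict[str, str],
                          observed: Dict[str, Dict[str, float]]) -> bool:
    """Decide whether a database variety qualifies as a hybrid strain kind.

    candidate: test zone -> genotype of the database variety
    tested:    test zone -> genotype of the variety to be measured
    observed:  test zone -> {genotype: frequency} found in the sequencing
               segments in addition to the tested variety's own genotype
    """
    # Zones where the candidate would be distinguishable from the tested variety.
    informative = [
        zone for zone, genotype in candidate.items()
        if zone in tested
        and len(genotype) == len(tested[zone])
        and count_differing_bases(genotype, tested[zone]) >= MIN_DIFFERING_BASES
    ]
    if not informative:
        return False
    # Zones where the candidate genotype was actually seen often enough.
    matched = sum(
        1 for zone in informative
        if observed.get(zone, {}).get(candidate[zone], 0.0) >= MIN_FREQUENCY
    )
    return matched / len(informative) >= MIN_MATCH_RATIO
```

Requiring at least 2 differing bases means a single miscalled base in a sequencing segment is less likely to be taken for a genotype from another variety.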
CN201510148702.9A 2015-03-31 2015-03-31 A method of specificity, consistency and the stability of test pure lines new rape variety Active CN104846077B (en)

Publications (2)

Publication Number Publication Date
CN104846077A (en) 2015-08-19
CN104846077B (en) 2018-11-13

Family

ID=53846060

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102747163A (en) * 2012-07-24 2012-10-24 江苏省农业科学院 Method for identifying maize intercross species by using molecular marker
WO2014048062A1 (en) * 2012-09-28 2014-04-03 未名兴旺系统作物设计前沿实验室(北京)有限公司 Snp loci set and usage method and application thereof
CN104328507A (en) * 2014-10-11 2015-02-04 中国水稻研究所 SNP chip used for identifying rice variety, preparation method and application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Evaluation of the use of high-density SNP genotyping to implement UPOV Model 2 for DUS testing in barley;Huw Jones et al;《Theor. Appl. Genet.》;20121212;第126卷;901-911 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant