CN104805189B - A method for determining the specificity, uniformity and stability of new hybrid plant varieties
- Publication number
- CN104805189B (application CN201510150504.6A)
- Authority
- CN
- China
- Prior art keywords
- hybrid strain
- genotype
- measured
- hybrid
- rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Immunology (AREA)
- Biophysics (AREA)
- Physics & Mathematics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Botany (AREA)
- Mycology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a method for determining the specificity, uniformity and stability of new hybrid plant varieties. The method comprises: obtaining variant sites; determining the test regions of the variety under test; building a database; after determining the sample size, sampling at random, mixing the samples and extracting the DNA of the mixed sample; preparing primers; amplifying the DNA of the mixed sample with the primers, the amplified products being used to build a high-throughput sequencing library; performing high-throughput sequencing on the library to obtain sequencing fragments; analyzing the sequencing fragments to obtain the genotype of the variety under test and the hybrid strain genotypes; comparing genotypes to obtain the approximate variety, the variant sites and the variant site rate; comparing the hybrid strain genotypes with the genotypes in the database to identify the hybrid strain varieties, and then calculating the hybrid strain rate; and using the variant sites, the variant site rate and the hybrid strain rate to judge the specificity, uniformity and stability of the variety under test. The method judges the specificity, stability and uniformity of the variety under test accurately and completely, and testing is faster.
Description
Technical field
The present invention relates to the field of biotechnology, and in particular to a method for determining the specificity, uniformity and stability of new hybrid plant varieties.
Background
As a specialized form of intellectual property, new plant varieties have become part of the core competitiveness of companies and even of nations. Granting new plant variety rights, and resolving the related legal issues, depends on DUS testing, that is, testing the specificity (Distinctness), uniformity (Uniformity) and stability (Stability) of the variety under test, either by field planting tests or by laboratory molecular marker identification. The field planting test proceeds as follows: the variety under test and its approximate varieties are planted in the field at the same time; over two or more growing seasons, multiple traits are observed; the significance of the difference between the variety under test and the approximate varieties (the specificity) is judged from trait expression, and the proportion of hybrid strains in the population (the uniformity and stability) is judged at the same time. Laboratory molecular marker identification proceeds as follows: DNA is extracted plant by plant from each sample of the variety under test and of the approximate varieties; each test region of each sample is amplified separately by PCR (Polymerase Chain Reaction); each PCR product is examined by electrophoresis or first-generation sequencing; the proportion of differing sites between the variety under test and the approximate varieties is obtained from the results; and the specificity of the variety under test is judged from that proportion.
The drawbacks of the field planting test are a long cycle, a heavy workload, and traits that are affected by the environment, which makes the judgment inaccurate. The drawback of laboratory molecular marker identification is that each test region of each sample has to be processed separately: the workload is heavy, neither samples nor test regions can be sampled in large numbers, and the hybrid strain rate cannot be calculated, so uniformity and stability cannot be tested. A drawback common to both approaches is that, because of the workload, approximate varieties cannot be selected objectively from the existing varieties and can only be supplied by the applicant for the variety right. An applicant motivated by commercial interest may supply approximate varieties that are not genuine, which can lead to a variety right being granted in error.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a method for determining the specificity, uniformity and stability of new hybrid plant varieties. The technical solution is as follows:
An embodiment of the present invention provides a method for determining the specificity, uniformity and stability of new hybrid plant varieties, the method comprising:
Obtaining the variant sites between different varieties of the species to which the variety under test belongs;
Determining the test regions of the variety under test from the variant sites, where the test regions include universal test regions and at least some of the variant sites lie within the universal test regions;
Building a database containing the genotypes of the different varieties in all test regions;
After determining the sample size SN for the variety under test, sampling at random, mixing the samples and extracting the DNA of the mixed sample;
Preparing primers for amplifying the test regions, the primers including universal test region primers;
Amplifying the DNA of the mixed sample with the primers to obtain the amplified products of the test regions, the amplified products being used to build a high-throughput sequencing library;
Performing high-throughput sequencing on the high-throughput sequencing library to obtain sequencing fragments;
Analyzing the sequencing fragments to obtain the genotype of the variety under test and the hybrid strain genotypes;
Comparing the genotype of the variety under test with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant site rate of the variety under test;
Comparing the hybrid strain genotypes with the genotypes of the different varieties in the database to identify the hybrid strain varieties, and then calculating the hybrid strain rate;
Using the variant sites, the variant site rate and the hybrid strain rate to judge the specificity, uniformity and stability of the variety under test.
Specifically, the sample size SN satisfies the condition BINOM.INV(SN, M, 0.95)/SN ≤ 1.15*M, where BINOM.INV is the function in Excel 2010 and M is the threshold chosen for judging uniformity and stability. The condition means that even if the hybrid strain rate exceeds the threshold M by only 15%, the sample size still allows the stability and uniformity of the variety under test to be judged correctly with 95% probability.
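The sample size condition can be checked outside Excel. The following is a minimal sketch, assuming that BINOM.INV(trials, p, alpha) returns the smallest count whose cumulative binomial probability reaches alpha; the function names and the example threshold M = 0.04 are illustrative assumptions, not values taken from the patent.

```python
import math

def binom_inv(trials, p, alpha):
    """Smallest k with P(X <= k) >= alpha for X ~ Binomial(trials, p).

    Mirrors the documented behaviour of Excel's BINOM.INV; the probability
    mass is computed in log space so that large sample sizes do not underflow.
    """
    cumulative = 0.0
    for k in range(trials + 1):
        log_pmf = (math.lgamma(trials + 1) - math.lgamma(k + 1)
                   - math.lgamma(trials - k + 1)
                   + k * math.log(p) + (trials - k) * math.log1p(-p))
        cumulative += math.exp(log_pmf)
        if cumulative >= alpha:
            return k
    return trials

def sample_size_ok(sn, m, alpha=0.95, margin=1.15):
    """Check BINOM.INV(SN, M, 0.95) / SN <= 1.15 * M for a candidate SN."""
    return binom_inv(sn, m, alpha) / sn <= margin * m

# Hypothetical threshold M = 0.04 (4% hybrid strains), chosen for illustration only.
for candidate in (100, 1000, 10000):
    print(candidate, sample_size_ok(candidate, 0.04))
```

Very small sample sizes can satisfy the inequality trivially (with SN = 1 the quantile is 0), so in practice the check is only meaningful for sample sizes large enough that the expected number of hybrid strains is well above zero.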
Specifically, the depth CF of the high-throughput sequencing satisfies the following conditions: BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) ≥ 99.9%; 1-BINOM.DIST(10000, 10000, 1-BINOM.DIST(8, 20, 1-BINOM.DIST(99.99%*CF, CF, 99.9989%, TRUE), TRUE), FALSE) ≤ 0.1%; and BINOM.DIST(10*(1-M)*CF, 10*CF, 1-110%*M, TRUE) ≥ 95.0%, where CF is the depth of the high-throughput sequencing, M is the threshold chosen for judging uniformity and stability, and BINOM.DIST is the function in Excel 2010. The three conditions mean the following. First, when the hybrid strain rate is as low as 0.1%, there are 10 hybrid strain varieties, and each hybrid strain variety differs from the variety under test at only 20 sites on average, the probability that the depth CF detects all of the hybrid strain varieties is ≥ 99.9%. Second, when the database contains 10000 varieties and each hybrid strain variety differs from the variety under test at only 20 sites on average, the probability that the depth CF leads to a hybrid strain variety being misidentified is ≤ 0.1%. Third, when there are 10 hybrid strain varieties and the true hybrid strain rate exceeds the threshold M by only 10%, the probability that the depth CF gives a correct conclusion on stability and uniformity is ≥ 95.0%.
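As an illustration of how such a depth condition can be evaluated, the sketch below implements only the first of the three conditions. It rests on one reading of the nested formula (innermost term: a site is missed at depth CF; middle term: at most 8 of 20 differing sites are missed; outer term: all 10 hybrid strain varieties are found). The function names and that interpretation are assumptions, not statements from the patent.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), as BINOM.DIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), as BINOM.DIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def detects_all_hybrid_strain_varieties(cf, rate=0.001, sites=20,
                                        max_missed=8, varieties=10,
                                        target=0.999):
    """First depth condition, using the constants quoted in the text."""
    p_site_missed = binom_cdf(0, cf, rate)            # no fragment seen at depth cf
    p_variety_found = binom_cdf(max_missed, sites, p_site_missed)
    p_all_found = binom_pmf(varieties, varieties, p_variety_found)
    return p_all_found >= target

def minimum_depth(limit=100000):
    """Smallest depth that satisfies the first condition (linear search)."""
    for cf in range(1, limit + 1):
        if detects_all_hybrid_strain_varieties(cf):
            return cf
    return None

print(minimum_depth())
```

The other two conditions could be evaluated with the same two helpers; they are left out here because their arguments involve non-integer counts whose rounding the text does not specify.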
Specifically, the test regions also include non-universal test regions, and the primers also include non-universal test region primers.
Further, the non-universal test region primers include a first primer pair and a second primer pair. The first primer pair consists of a first forward primer and a first reverse primer, and the second primer pair consists of a second forward primer and a second reverse primer. The first primer pair and the second primer pair are each used in a separate amplification, giving two amplified products of the non-universal test region; the two amplified products are mixed in equal amounts and used to build the high-throughput sequencing library for the separately amplified regions;
The 5' end of the first forward primer is linked to sequence 1, shown as SEQ ID NO: 1 in the sequence listing, and the 5' end of the first reverse primer is linked to sequence 2, shown as SEQ ID NO: 2 in the sequence listing;
The 5' end of the second forward primer is linked to sequence 2, shown as SEQ ID NO: 2 in the sequence listing, and the 5' end of the second reverse primer is linked to sequence 1, shown as SEQ ID NO: 1 in the sequence listing.
Further, the method of judging the specificity, uniformity and stability of the variety under test from the variant sites, the variant site rate and the hybrid strain rate comprises:
When the variant site rate ≥ SD, or a variant site is present in the non-universal test regions, the variety under test has specificity; when the variant site rate < SD and no variant site is present in the non-universal test regions, the variety under test does not have specificity, where SD is the threshold chosen for judging specificity;
When the hybrid strain rate of the variety under test ≤ M, the variety under test has uniformity and stability; when the hybrid strain rate of the variety under test > M, the variety under test does not have uniformity and stability, where M is the threshold chosen for judging uniformity and stability;
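The two decision rules above reduce to simple threshold comparisons. A minimal sketch follows; the function names and the example thresholds are hypothetical and chosen only to show the logic.

```python
def has_specificity(variant_site_rate, sd, variant_in_non_universal_region):
    """Specific if the variant site rate reaches SD, or if any variant site
    is found in a non-universal test region."""
    return variant_site_rate >= sd or variant_in_non_universal_region

def has_uniformity_and_stability(hybrid_strain_rate, m):
    """Uniform and stable if the hybrid strain rate does not exceed M."""
    return hybrid_strain_rate <= m

# Hypothetical thresholds SD = 0.05 and M = 0.04, for illustration only.
print(has_specificity(0.02, 0.05, variant_in_non_universal_region=True))
print(has_uniformity_and_stability(0.03, 0.04))
```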
The hybrid strain rate is R = R1 + R2 - R3 - R4 + Rm, where:
For R1: n1 is the number of nuclear hybrid strain varieties; t1 is the number of special hybrid strain nuclear genotypes of the i1-th nuclear hybrid strain variety; i1j1 denotes the j1-th special hybrid strain nuclear genotype of the i1-th nuclear hybrid strain variety after all of its special hybrid strain nuclear genotypes are sorted by frequency from low to high; and R1i1j1 is the frequency of that genotype. R1 is the sum of the hybrid strain rates of the nuclear hybrid strain varieties, calculated from hybrid strain nuclear genotypes. The hybrid strain rate of one nuclear hybrid strain variety is 2 times the mean of the frequencies that remain after the lowest 80% and the highest 10% of the frequencies of its special hybrid strain nuclear genotypes are removed;
For R2: t2 is the number of hybrid strain nuclear genotypes that are not possessed by the nuclear hybrid strain varieties and whose frequency is ≥ 0.17%; i2 denotes the i2-th of these hybrid strain nuclear genotypes after they are sorted by frequency from low to high; and R2i2 is the frequency of the i2-th of these genotypes. R2 is the hybrid strain rate calculated from the hybrid strain nuclear genotypes other than those possessed by the nuclear hybrid strain varieties: 2 times the mean of the values that remain after the lowest 80% and the highest 10% of these frequencies are removed;
For R3: n2 is the number of cytoplasmic hybrid strain varieties; R3i3 is the hybrid strain rate of the i3-th cytoplasmic hybrid strain variety; t3 is the number of special hybrid strain cytoplasmic genotypes of the i3-th cytoplasmic hybrid strain variety; i3j3 denotes the j3-th special hybrid strain cytoplasmic genotype of the i3-th cytoplasmic hybrid strain variety after all of its special hybrid strain cytoplasmic genotypes are sorted by frequency from low to high; and R3i3j3 is the frequency of that genotype. R3 is the sum of the hybrid strain rates of the cytoplasmic hybrid strain varieties, calculated from hybrid strain cytoplasmic genotypes. The hybrid strain rate of one cytoplasmic hybrid strain variety is the mean of the frequencies that remain after the lowest 80% and the highest 10% of the frequencies of its special hybrid strain cytoplasmic genotypes are removed;
For R4: t4 is the number of hybrid strain cytoplasmic genotypes that are not possessed by the cytoplasmic hybrid strain varieties and whose frequency is ≥ 0.17%; i4 denotes the i4-th of these hybrid strain cytoplasmic genotypes after they are sorted by frequency from low to high; and R4i4 is the frequency of the i4-th of these genotypes. R4 is the hybrid strain rate calculated from the hybrid strain cytoplasmic genotypes other than those possessed by the cytoplasmic hybrid strain varieties: the mean of the values that remain after the lowest 80% and the highest 10% of these frequencies are removed;
For Rm: t5 is the number of hybrid-specific test regions; i5 denotes the i5-th hybrid-specific test region; Rmi5 is the frequency of the maternal genotype in the i5-th hybrid-specific test region; and Rfi5 is the frequency of the paternal genotype in the i5-th hybrid-specific test region. Rm is the hybrid strain rate due to selfing of the female parent: the mean, over the hybrid-specific test regions, of the difference between the frequency of the maternal genotype and the frequency of the paternal genotype;
Int() is the integer (rounding) function;
A nuclear hybrid strain variety is a hybrid strain variety identified using nuclear genotypes only; a cytoplasmic hybrid strain variety is a hybrid strain variety identified using cytoplasmic genotypes only. A special hybrid strain nuclear genotype is a hybrid strain nuclear genotype possessed by only one nuclear hybrid strain variety; a special hybrid strain cytoplasmic genotype is a hybrid strain cytoplasmic genotype possessed by only one cytoplasmic hybrid strain variety. A hybrid strain nuclear genotype is a hybrid strain genotype that is a nuclear genotype; a hybrid strain cytoplasmic genotype is a hybrid strain genotype that is a cytoplasmic genotype. In a hybrid-specific test region, the maternal genotype differs from the paternal genotype, and both differ from the genotypes of all nuclear hybrid strain varieties. The maternal genotype is the genotype in the variety under test that is identical to the genotype of the female parent; the paternal genotype is the genotype in the variety under test that is identical to the genotype of the male parent;
A nuclear genotype is a genotype located on the nuclear genome; a cytoplasmic genotype is a genotype located on the cytoplasmic genome.
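The components of R are all trimmed means of genotype frequencies, so the calculation can be sketched in a few lines. The sketch below is an interpretation of the verbal definitions above: the exact index bounds of the original formulas are not available in this text, so the trimming uses integer division (floor of 80% and floor of 10% of the count) as an assumption, and all names and input layouts are hypothetical.

```python
def trimmed_mean(frequencies):
    """Mean after dropping the lowest 80% and the highest 10% of the values.

    Counts are rounded down, so at least one value always remains
    for a non-empty input (an assumption about how Int() is applied).
    """
    values = sorted(frequencies)
    if not values:
        return 0.0
    low = len(values) * 8 // 10
    high = len(values) - len(values) // 10
    kept = values[low:high]
    return sum(kept) / len(kept)

def hybrid_strain_rate(nuclear_special, nuclear_other,
                       cyto_special, cyto_other, parental_pairs,
                       min_frequency=0.0017):
    """R = R1 + R2 - R3 - R4 + Rm.

    nuclear_special / cyto_special: {variety: [special genotype frequencies]}
    nuclear_other / cyto_other: frequencies of hybrid strain genotypes not
        possessed by any identified hybrid strain variety
    parental_pairs: [(maternal frequency, paternal frequency)] per
        hybrid-specific test region
    """
    r1 = sum(2 * trimmed_mean(f) for f in nuclear_special.values())
    r2 = 2 * trimmed_mean([f for f in nuclear_other if f >= min_frequency])
    r3 = sum(trimmed_mean(f) for f in cyto_special.values())
    r4 = trimmed_mean([f for f in cyto_other if f >= min_frequency])
    if parental_pairs:
        rm = sum(m - f for m, f in parental_pairs) / len(parental_pairs)
    else:
        rm = 0.0
    return r1 + r2 - r3 - r4 + rm
```

With ten frequencies for one variety, only the ninth value (after sorting) survives the trimming, which shows how strongly the estimate is anchored to the upper end of the distribution.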
Further, the method also includes estimating the probability that the conclusion on the uniformity and stability of the variety under test is correct, as follows. When the variety under test has uniformity and stability, the probability that the conclusion is correct ≥ BINOM.DIST(M*SN, SN, R, TRUE)*BINOM.DIST(ΣSeN*M, ΣSeN, R, TRUE). When the variety under test does not have uniformity and stability, the probability that the conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE)*BINOM.DIST(ΣSeN*(1-M), ΣSeN, 1-R, TRUE). Here M is the threshold chosen for judging uniformity and stability, and ΣSeN is the total number of sequencing fragments over all test regions whose genotype frequencies are used to calculate the hybrid strain rate R. BINOM.DIST(M*SN, SN, R, TRUE) is the probability that, when SN samples are drawn from the variety under test, the hybrid strain rate actually drawn is less than the threshold M; BINOM.DIST(ΣSeN*M, ΣSeN, R, TRUE) is the probability that, when ΣSeN samples are drawn from the variety under test, the hybrid strain rate actually drawn is less than the threshold M. BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) is the probability that, when SN samples are drawn from the variety under test, the hybrid strain rate actually drawn is greater than the threshold M; BINOM.DIST(ΣSeN*(1-M), ΣSeN, 1-R, TRUE) is the probability that, when ΣSeN samples are drawn from the variety under test, the hybrid strain rate actually drawn is greater than the threshold M. The frequency of a genotype is the proportion of sequencing fragments representing that genotype among all sequencing fragments of the test region in which the genotype lies.
Further, when no variant site is present in the non-universal test regions: if the variety under test is judged to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST((1-SD)*TRN, TRN, 1-OD, TRUE); if the variety under test is judged not to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST(SD*TRN, TRN, OD, TRUE), where TRN is the number of successfully detected test regions, OD is the variant site rate, and BINOM.DIST is the function in Excel 2010. The probability that the conclusion is correct is the probability that the variant site rate is greater than SD when the variety under test is judged to have specificity, and the probability that the variant site rate is less than SD when it is judged not to have specificity. The successfully detected test regions are obtained by analyzing the sequencing fragments.
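The lower bounds on the probability of a correct conclusion, for uniformity and stability and for specificity, are products or single values of cumulative binomial probabilities. The sketch below reproduces them under two assumptions: counts such as M*SN are rounded down, as Excel's BINOM.DIST does with non-integer arguments, and the cumulative probability is summed in log space. Function names and example values are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), as BINOM.DIST(k, n, p, TRUE)."""
    k = min(math.floor(k + 1e-9), n)   # tolerate float noise such as 89.999999
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                          - math.lgamma(n - i + 1)
                          + i * math.log(p) + (n - i) * math.log1p(-p))
    return min(total, 1.0)

def uniformity_confidence(sn, total_fragments, r, m, passed):
    """Lower bound on the probability that the uniformity/stability
    conclusion is correct (sn = SN, total_fragments = sum of SeN)."""
    if passed:
        return (binom_cdf(m * sn, sn, r)
                * binom_cdf(m * total_fragments, total_fragments, r))
    return (binom_cdf((1 - m) * sn, sn, 1 - r)
            * binom_cdf((1 - m) * total_fragments, total_fragments, 1 - r))

def specificity_confidence(trn, od, sd, judged_specific):
    """Lower bound on the probability that the specificity conclusion is
    correct when no variant site lies in a non-universal test region."""
    if judged_specific:
        return binom_cdf((1 - sd) * trn, trn, 1 - od)
    return binom_cdf(sd * trn, trn, od)

# Hypothetical inputs, for illustration only.
print(uniformity_confidence(sn=1000, total_fragments=20000, r=0.01, m=0.04, passed=True))
print(specificity_confidence(trn=100, od=0.5, sd=0.1, judged_specific=True))
```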
Specifically, obtaining the method for the hybrid strain kind includes:The hybrid strain kind is to be present in the database
Kind, and the potential hybrid strain genotype of the hybrid strain kind is with there is the test section of phase homogenic type between the hybrid strain genotype
The number in domain accounts for total ratio >=60% that the hybrid strain kind has the test zone of the potential hybrid strain genotype;
The hybrid strain genotype refers to the potential hybrid strain genotype of frequency >=0.02%;
Quantity >=2 of distinguishing base between all genotype of the potential hybrid strain genotype and the kind to be measured or
There are insertion or the missing of discontinuous base in the distinguishing base.
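The identification rule above can be sketched as a small filter over the database. This is a simplified illustration: genotypes are assumed to be equal-length strings so that differences can be counted position by position, the insertion/deletion criterion is omitted, and the data layout and all names are hypothetical.

```python
MIN_FREQUENCY = 0.0002   # hybrid strain genotype: frequency >= 0.02%
MIN_SHARE = 0.60         # >= 60% of candidate regions must match

def count_differences(a, b):
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def is_potential(genotype, test_variety_genotypes):
    """Potential hybrid strain genotype: >= 2 differing bases against every
    genotype of the variety under test (indel criterion not modelled)."""
    return all(count_differences(genotype, g) >= 2
               for g in test_variety_genotypes)

def find_hybrid_strain_varieties(database, test_variety, observed):
    """database: {variety: {region: genotype}}
    test_variety: {region: [genotypes of the variety under test]}
    observed: {region: {genotype: frequency among sequencing fragments}}
    """
    found = []
    for variety, regions in database.items():
        candidates = matches = 0
        for region, genotype in regions.items():
            if not is_potential(genotype, test_variety.get(region, [])):
                continue
            candidates += 1
            if observed.get(region, {}).get(genotype, 0.0) >= MIN_FREQUENCY:
                matches += 1
        if candidates and matches / candidates >= MIN_SHARE:
            found.append(variety)
    return found
```

A database variety with no region that qualifies as a potential hybrid strain genotype is never reported, since it cannot be told apart from the variety under test in this scheme.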
Specifically, the universal test regions are determined from the variant sites as follows: the value of the discrimination, $\text{discrimination} = 1 - \sum_{i=1}^{k}\binom{b_i}{2} \big/ \binom{a}{2}$, is calculated, where a is the total number of varieties detected in the variation window region, $b_i$ is the number of varieties having the i-th genotype in the variation window region, with $b_i > 1$, and k is the number of genotypes that contain more than one variety. A variation window region is centered on a single nucleotide variant site and extends on each side of that site by 1/2 of the sequencing length, forming the detection window;
A universal test region is a region of high discrimination on the cytoplasmic genome, or a region on the nuclear genome where the discrimination is high and the sites are evenly distributed.
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows: by means of high-throughput sequencing and multi-site amplification, the method achieves large-sample sampling of the individuals of the variety under test and large-sample sampling of the test regions of each individual. Combined with the hybrid strain genotypes and the hybrid strain rate, it judges the specificity, stability and uniformity of the variety under test accurately and completely, and testing is faster, completing within 10 days.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are described in further detail below.
Embodiment 1: determining the specificity, uniformity and stability of the new rice variety "Keyou 8377"
The variety under test in this embodiment is the rice variety "Keyou 8377", which is the cross combination of the rice varieties "R8377" and "Jinke 1A"; all of these are publicly known varieties. The method of determining the specificity, uniformity and stability of this rice variety comprises the following steps.
1. Obtaining the variant sites between different varieties of the species to which the variety under test belongs.
The variety under test is rice, so this step means obtaining the variant sites between different rice varieties. These can be taken from published literature, but the results obtained that way are fragmentary. In this embodiment, the genome sequences of different rice varieties are compared with the genome sequence of a reference rice variety, which yields a large number of variant sites between rice varieties. The reference rice variety can be "Nipponbare", which may be replaced by another known reference rice variety.
Further, the genome sequences of the different rice varieties are obtained as follows:
The genome sequences of the different rice varieties in this embodiment come from three sources. The first is the high-throughput genome sequencing of 1082 rice varieties by Han Bin et al.; the reference is: Huang XH et al. A map of rice genome variation reveals the origin of cultivated rice. Nature. 2012;7:497–503. The genome sequences of these 1082 rice varieties are published in the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under accession numbers ERP001143, ERP000729 and ERP000106. The second is the high-throughput genome sequencing of 50 rice varieties by Xu Xun et al.; the reference is: Xun X et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat Biotechnol. 2011, 30(1):105-11. The genome sequences of these 50 rice varieties are published in the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession number SRA023116. The third is high-throughput sequencing, by the method given in the article by Han Bin et al. cited above, of "R8377", "Jinke 1A", "IRBB23", the hybrid "Jinke 1A/R7723" and the hybrid strain variety "D You 527". In total, this embodiment obtained high-throughput genome sequences of 1137 rice varieties.
Further, the variant sites are obtained from the genome sequences of the different varieties.
Specifically, because the sequencing depth of these 1137 rice varieties is not high, only single nucleotide variant (SNP) sites can be identified; other types of variation, such as repeat number variation, are not identified because their reliability is low. Using the Sanger alignment software (version 0.4), the high-throughput genome sequences of the 1137 rice varieties are each aligned to the "Nipponbare" rice nuclear reference genome (version IRGSP 4.0; download address: http://www.ncbi.nlm.nih.gov) and to the cytoplasmic reference genomes. The cytoplasmic reference genomes comprise the mitochondrial reference genome and the chloroplast reference genome, whose accession numbers at NCBI (National Center for Biotechnology Information) are NC_011033 and NC_001320 respectively. During alignment, the insert fragment length is set to 500 bp and all other parameters are left at their default values. The Ssaha Pileup software package (version 0.5) is then used to identify the SNP sites of each rice variety. A SNP site is defined as a base pair with a definite difference, a single-base insertion, or a single-base deletion. A base pair with a definite difference excludes base pairs whose difference is uncertain, that is, base pairs involving degenerate bases. For example, R stands for A or G, so A and R may differ or may not; the difference between A and R is uncertain, and the pair is not counted as a SNP. The SNP sites in this embodiment therefore exclude such uncertain base pairs. Under this definition, this embodiment obtained 7236888 SNP sites among all 1137 rice varieties, of which 59503 are on the cytoplasmic genome and the rest are on the nuclear genome. Hereafter, a genotype means the combination of the multiple SNP sites within a test region; a nuclear genotype is a genotype located on the nuclear genome, and a cytoplasmic genotype is a genotype located on the cytoplasmic genome. For example, test region 8 in Table 1 is on the nuclear genome, so its genotype is a nuclear genotype; the region contains 9 SNP sites, and its genotype is the combination of these 9 SNP sites.
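The rule for degenerate bases can be made concrete with the standard IUPAC nucleotide codes: two base calls are a definite difference only if the sets of bases they may represent do not overlap. The sketch below is illustrative; the function name is hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def definitely_different(base_a, base_b):
    """True only when the two calls cannot represent the same base."""
    return not (set(IUPAC[base_a]) & set(IUPAC[base_b]))

print(definitely_different("A", "R"))  # A vs (A or G): difference is uncertain
print(definitely_different("A", "C"))  # two unambiguous, different bases
```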
2. Determining the test regions of the variety under test from the variant sites, where the test regions include universal test regions and at least some of the variant sites lie within the universal test regions. The method is as follows:
Determining the universal test regions
A universal test region is a region of high discrimination on the cytoplasmic genome, or a region on the nuclear genome where the discrimination is high and the SNP sites are evenly distributed, where $\text{discrimination} = 1 - \sum_{i=1}^{k}\binom{b_i}{2} \big/ \binom{a}{2}$, a is the total number of varieties detected in the variation window region, $b_i$ is the number of varieties having the i-th genotype in the variation window region, with $b_i > 1$, and k is the number of genotypes that contain more than one variety. A variation window region is centered on a single nucleotide variant site and extends on each side of that site by 1/2 of the sequencing length, forming the detection window. The discrimination is derived as follows. The number of pairwise combinations among all varieties is $\binom{a}{2}$. Pairs of different varieties that share the same genotype cannot be distinguished, and their number is $\sum_{i=1}^{k}\binom{b_i}{2}$. The proportion of variety pairs that cannot be distinguished is therefore $\sum_{i=1}^{k}\binom{b_i}{2} \big/ \binom{a}{2}$, and the proportion that can be distinguished, that is, the discrimination, is $1 - \sum_{i=1}^{k}\binom{b_i}{2} \big/ \binom{a}{2}$. The larger the discrimination, the better the window separates different varieties, and the more effective a high-discrimination variation window region is for DUS testing. If the test regions are unevenly distributed on the nuclear genome, some regions will be adjacent and therefore inherited in linkage, so that their information tends to overlap. The principle for selecting universal test regions on the nuclear genome is therefore high discrimination and evenly distributed SNP sites. The cytoplasmic genome has no linkage inheritance problem, so on the cytoplasmic genome only regions of high discrimination need to be selected.
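The discrimination described above is straightforward to compute from the number of varieties sharing each genotype in a window. The following sketch uses the counts from the worked example of this embodiment (a = 520 varieties; shared genotypes carried by 10, 30 and 431 varieties); the function name is a hypothetical choice.

```python
from math import comb

def discrimination(total_varieties, shared_genotype_counts):
    """Share of variety pairs that a variation window can tell apart.

    total_varieties: a, the number of varieties detected in the window
    shared_genotype_counts: b_i for each genotype carried by more than
        one variety (genotypes unique to one variety contribute nothing)
    """
    indistinguishable = sum(comb(b, 2) for b in shared_genotype_counts)
    return 1 - indistinguishable / comb(total_varieties, 2)

print(round(discrimination(520, [10, 30, 431]), 4))
```

A window in which every variety has its own genotype scores 1, and a window in which all varieties share one genotype scores 0, which matches the intuition behind ranking windows by this value.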
In this embodiment, high-throughput sequencing is performed on a Proton high-throughput sequencer, which can sequence test regions up to 200 bp long. To obtain the maximum amount of information, the longest test region in this embodiment is also 200 bp. The variant sites referred to in this embodiment therefore lie within a whole test region and may include multiple SNP sites.
First, each SNP site obtained is taken as a centre and extended by 99bp to the left and 100bp to the right, forming a 200bp variation window. The 7236888 SNP sites obtained yield 7236888 variation windows, and the discrimination of each variation window region is calculated as $1 - \frac{\sum_{i=1}^{k} b_i(b_i-1)}{a(a-1)}$. For example, in the 1st variation window region a = 520 varieties are detected in total, with k = 3 genotypes, ACCT, CGTT and ACCC, carried by b1 = 10, b2 = 30 and b3 = 431 varieties respectively. Therefore the discrimination is $1 - \frac{10 \times 9 + 30 \times 29 + 431 \times 430}{520 \times 519} \approx 31\%$. This means that the 1st variation window region alone can distinguish 31% of the variety combinations among the 520 varieties; the other 69% cannot be distinguished and need additional variation windows. In the same way, the discrimination of all 7236888 variation windows is calculated, and the 6800 variation windows of highest discrimination on the nuclear genome and the 200 variation windows of highest discrimination on the cytoplasmic genome are selected. The 6800 variation windows on the nuclear genome are then checked one by one for the distance between each variation window and the next. If the distance is less than 100K (1K = 1000 bases), the variation window of lower discrimination is discarded and the check is repeated, until the distance between all adjacent variation windows is greater than 100K. The 100K criterion was chosen because the rice genome is about 500M in size (1M = 1,000,000 bases): with 2000 universal test regions finally selected on the nuclear genome, the average distance between universal test regions would be 250K, but because some specific regions such as centromeres contain few variant sites, the average distance should be less than 250K. This process selected 4061 variation windows on the nuclear genome. Together with the 200 variation windows of highest discrimination on the cytoplasmic genome, these 4261 variation windows are the selected universal test regions. The figure of 200 highest-discrimination variation windows is an empirical value and can be modified as circumstances require.
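The spacing check on the nuclear windows can be pictured with the short Python sketch below. It is one possible greedy reading of the procedure, taking the stated stopping condition (all adjacent windows more than 100K apart) as the rule; the function name, the tuple layout and the use of a single position per window are assumptions made for illustration.

```python
def thin_windows(windows, min_gap=100_000):
    """Thin the variation windows of ONE chromosome.

    windows -- iterable of (position, discrimination) tuples
    Returns the retained windows, sorted by position, with every pair of
    neighbours more than `min_gap` bases apart.
    """
    kept = []
    for pos, disc in sorted(windows):
        if kept and pos - kept[-1][0] <= min_gap:
            # Too close to the last retained window: keep whichever of the
            # two has the higher discrimination.
            if disc > kept[-1][1]:
                kept[-1] = (pos, disc)
        else:
            kept.append((pos, disc))
    return kept

example = [(100, 0.5), (50_000, 0.9), (200_000, 0.4), (250_000, 0.6)]
print(thin_windows(example))  # [(50000, 0.9), (250000, 0.6)]
```

Replacing the last retained window with a later one can only widen its distance to the window before it, so a single left-to-right pass is enough here. An implementation that literally re-checks the whole list after each removal may retain a different but equally valid set of windows.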
The test regions may also include non-universal test regions.
Determine the non-universal test regions
A non-universal test region is a special site that needs to be detected for a particular variety. DUS testing needs to detect the special sites introduced by targeted modification. Targeted modification, for example by backcross breeding or transgenic breeding, is a technique commonly used in modern breeding, and a variety improved in this way can also become a new variety by virtue of its specificity. Following the principle by which specificity is judged in new variety protection, a non-universal test region should not be contained in a universal test region and should be a site known to control a qualitative trait.
In this embodiment, the gene Xa23, which confers high resistance to bacterial blight, is present in the database variety IRBB23. The bacterial blight resistance controlled by Xa23 is a qualitative trait, and Xa23 originates from wild rice and is not contained in any universal test region. On this basis, the Xa23 gene is detected as a non-universal test region. Xa23 has been cloned, and its resistance is caused by a deletion of 7 bases. The special detection region of the variety to be tested is therefore these 7 deleted bases, located at positions 24046820 to 24046825 of chromosome 11 of the Nipponbare reference genome. For more detailed information on the Xa23 gene, see: Wang, C., X. Zhang, et al. (2014). "XA23 is an executor R protein and confers broad-spectrum disease resistance in rice." Molecular Plant: ssu132.
3rd, prepare the primers for amplifying the test regions. The primers include universal test region primers, as follows:
Prepare the universal test region primers, which apply to all varieties. Specifically:
The universal test regions are detected by multiplex PCR, in which multiple PCR primers are added to a single PCR reaction so that multiple sites on the genome are amplified simultaneously. The key to the technique is designing and synthesising the multiplex PCR primers. This embodiment uses the multiplex PCR technology provided by Thermo Fisher Scientific (USA), which can design PCR primers of up to 12000-plex.
The primers are obtained as follows. Log in to the Thermo Fisher Scientific online multiplex PCR primer design page, https://Ampliseq.com/protected/help/pipelineDetails.action, and submit the information it requests. In this embodiment, "DNA Hotspot designs (single-pool)" is selected under "Application type". If multi-pool were selected, the multiplex PCR would have to be run in several tubes, which increases cost, whereas single-pool primers need only one multiplex PCR, which saves cost. The drawback is that primer design may fail for some universal test regions, but because the genome offers many candidate universal test regions, discarding some of them does not affect the result. The nuclear reference genome and the cytoplasmic reference genome of the variety to be tested are merged into one file, and after "Custom" is selected under "Select the genome you wish to use", the merged file is uploaded as the reference genome for designing the multiplex PCR primers. "Standard DNA" is selected under "DNA type". Under "Add Hotspot", the position information of the SNP sites of the universal test regions to be designed is added, comprising the chromosome, the SNP start site and the SNP end site; examples are shown in Table 1. Finally, the "Submit targets" button is clicked to submit the request and obtain the designed multiplex PCR primers. In this embodiment, 2231 pairs of multiplex PCR primers were designed and successfully verified from the 4261 universal test regions obtained above, and are used to amplify the corresponding 2231 universal test regions. The multiplex PCR primers are verified by the method provided by the invention: genomic DNA is extracted from the leaves of a single rice plant, amplified with the designed multiplex PCR primers, built into a library and sequenced at high throughput, and the sequencing fragment group is analysed. The primers of the following test regions are removed: test regions with fewer than 1000 sequencing fragments, and test regions in which a hybrid strain genotype is found. The primers that remain are the successfully verified multiplex PCR primers. Because the genomic DNA comes from the leaves of a single rice plant, no hybrid strain variety can be present, so any hybrid strain genotype is an error of PCR or sequencing bias caused by the special structure of the test region; removing these test regions avoids such systematic errors. The successfully verified multiplex PCR primers are mixed by the company and supplied to the customer in liquid form. The 2231 universal test regions for which multiplex PCR primers were successfully designed are the universal test regions finally used to test the variety to be tested, and every variety in the database that is built also covers these 2231 universal test regions. Of these, 100 universal test regions lie on the cytoplasmic genome and the remaining 2131 universal test regions lie on the nuclear genome.
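The primer verification rule above amounts to a simple filter over per-region results. The sketch below is illustrative only; the dictionary layout, the function name and the region identifiers are hypothetical, and only the 1000-fragment threshold comes from the text.

```python
MIN_FRAGMENTS = 1000  # regions with fewer sequencing fragments are rejected

def verified_regions(region_stats):
    """Return the test regions whose primers pass verification.

    region_stats -- {region_id: (fragment_count, has_hybrid_strain_genotype)}
                    measured on DNA from a single plant
    """
    return [
        region_id
        for region_id, (fragment_count, has_hybrid) in region_stats.items()
        if fragment_count >= MIN_FRAGMENTS and not has_hybrid
    ]

stats = {
    "region_1": (33180, False),  # passes
    "region_2": (640, False),    # too few sequencing fragments
    "region_3": (28400, True),   # hybrid strain genotype seen in a single plant
}
print(verified_regions(stats))  # ['region_1']
```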
It should be noted that the number of universal test regions must be >= 900, for the following reason: with fewer than 900, the probability of misjudging the presence of a hybrid strain variety would exceed 1%. The derivation of this threshold is shown in Table 2. Because some test regions may fail to be detected, the number of test regions is generally >= 1000.
The test region primers may also include non-universal test region primers, which are specific to the variety to be tested, as follows:
The non-universal test region primers comprise a first primer and a second primer. The first primer comprises a first forward primer and a first reverse primer, and the second primer comprises a second forward primer and a second reverse primer. The first primer and the second primer are each used in a separate amplification, giving two amplified products of the non-universal test region, and the two amplified products are mixed in equal amounts and used to build the high-throughput sequencing library of the separately amplified regions. The 5' end of the first forward primer is joined to sequence 1 shown in SEQ ID NO:1 of the sequence listing, and the 5' end of the first reverse primer is joined to sequence 2 shown in SEQ ID NO:2 of the sequence listing; the 5' end of the second forward primer is joined to sequence 2 shown in SEQ ID NO:2, and the 5' end of the second reverse primer is joined to sequence 1 shown in SEQ ID NO:1.
The non-universal test region primers are designed as follows. In the first step, a PCR forward primer and reverse primer for amplifying the non-universal test region are designed by ordinary PCR primer design methods, with the requirements that the amplicon is no longer than 200bp and contains all SNP sites of the non-universal test region. In the second step, SEQ ID NO:1 and SEQ ID NO:2 of the sequence listing are joined to the 5' ends of the designed forward primer and reverse primer respectively, giving the forward primer and the reverse primer of the first primer. In the third step, SEQ ID NO:2 and SEQ ID NO:1 of the sequence listing are joined to the 5' ends of the designed forward primer and reverse primer respectively, giving the forward primer and the reverse primer of the second primer. SEQ ID NO:1 and SEQ ID NO:2 are the adapter sequences used in high-throughput sequencing, so the resulting PCR products already carry the sequencing adapters. They can be mixed directly with the amplified products of the universal test regions to form the sequencing library and sequenced together, without laborious library-building steps such as fragmentation and adapter ligation, which improves efficiency and reduces cost. Two primer pairs that differ only in their adapters are used so that the non-universal test region is sequenced from both ends.
Specifically, in this embodiment, the ordinary PCR primers designed to amplify the non-universal test region of the variety to be tested (the Xa23 gene) are the forward primer TGCGGCATCACTAACATCAG and the reverse primer TGTTAGTGATGCGGGAGGAA. After SEQ ID NO:1 and SEQ ID NO:2 of the sequence listing are added to their 5' ends, the forward primer of the first primer is 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGTGCGGCATCACTAACATCAG (SEQ ID NO:3 in the sequence listing); the reverse primer of the first primer is 5'-CCTCTCTATGGGCAGTCGGTGATTGTTAGTGATGCGGGAGGAA (SEQ ID NO:4 in the sequence listing); the forward primer of the second primer is 5'-CCTCTCTATGGGCAGTCGGTGATTGCGGCATCACTAACATCAG (SEQ ID NO:5 in the sequence listing); and the reverse primer of the second primer is 5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGTGTTAGTGATGCGGGAGGAA (SEQ ID NO:6 in the sequence listing). The designed non-universal test region primers were synthesised by Thermo Fisher Scientific (USA).
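The construction of the four fusion primers is plain string concatenation, as the following Python sketch shows. The adapter sequences are taken here as the 5' prefixes of the fusion primers listed above (the gene-specific primer removed from each), which is an inference from those sequences; the function and key names are assumptions.

```python
# Adapters recovered as the 5' prefixes of the listed fusion primers.
ADAPTER_1 = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"  # taken to be SEQ ID NO:1
ADAPTER_2 = "CCTCTCTATGGGCAGTCGGTGAT"         # taken to be SEQ ID NO:2

# Gene-specific primers for the Xa23 non-universal test region.
FORWARD = "TGCGGCATCACTAACATCAG"
REVERSE = "TGTTAGTGATGCGGGAGGAA"

def fusion_primers(forward, reverse, adapter_1, adapter_2):
    """Join the adapters to the 5' ends in both orientations."""
    return {
        "first_forward": adapter_1 + forward,    # SEQ ID NO:3
        "first_reverse": adapter_2 + reverse,    # SEQ ID NO:4
        "second_forward": adapter_2 + forward,   # SEQ ID NO:5
        "second_reverse": adapter_1 + reverse,   # SEQ ID NO:6
    }

for name, seq in fusion_primers(FORWARD, REVERSE, ADAPTER_1, ADAPTER_2).items():
    print(f"{name}: 5'-{seq}")
```

Swapping the two adapters between the primer pairs is what lets the same amplicon be read from either end.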
4th, the method for the database of genotype in all test zones of the structure comprising different cultivars is as follows:
The database of genotype in all test zones of the structure comprising different cultivars, specifically, in kind to be measured
On test zone, different cultivars is obtained to genotype that should be on test zone and composition data storehouse.This example obtains
2231 universal test region primers and 1 non-universal test zone primer, amplification region corresponding to them is kind to be measured
Test zone.The database of the genotype of 2232 test zones of the structure comprising 1137 kinds and its SNP positional information,
Partial results are shown in Table 1.
Table 1. Examples of database variety genotypes and their positions, genotypes of the variety to be tested, and hybrid strain genotypes and their frequencies
In Table 1, '-' indicates that the SNP site is deleted at that position in the reference genome; "/" indicates that the test region has a heterozygous genotype, with the two different genotypes given before and after the "/"; letters other than A, T, G and C denote degenerate bases. If a genotype consists entirely of the degenerate base N, the genotype and SNP data of the corresponding test region are said to be missing, and a missing genotype or SNP is treated as showing no difference when compared with any genotype or SNP. The method provided by the invention for detecting the genotypes of the variety to be tested can be used to test the database varieties and fill in the missing genotypes.
For reasons of space, this embodiment does not list the full content of the database, only the information for 10 test regions of 5 varieties. For the same reason, some other parts of this embodiment also list only partial examples; the remaining unlisted data can be completed by the method of this embodiment.
5th, after determining the sampling amount SN of the variety to be tested, sample at random, pool the samples and extract the DNA of the pooled sample, as follows:
Calculate the sampling amount of the variety to be tested
The sampling amount SN must satisfy the following condition: BINOM.INV(SN, M, 0.95)/SN <= 1.15*M, where BINOM.INV is the Excel 2010 function, used as defined in Excel 2010; it returns the smallest integer for which the cumulative binomial distribution is greater than or equal to the criterion value. The condition means that, even when the hybrid strain rate exceeds the threshold M by only 15%, the sampling amount allows the stability and uniformity of the variety to be tested to be judged correctly with 95% probability. The value of M is set by the user according to conditions such as the crop species, its type and specific requirements. The "Guidelines for Testing the Specificity, Uniformity and Stability of New Plant Varieties: Rice" issued by the Office for the Protection of New Varieties of Plants of the Ministry of Agriculture stipulate that hybrid rice seed uses a population standard of 1%; in this embodiment, M is therefore set to 1%. Increasing SN stepwise and evaluating the formula shows that BINOM.INV(SN, 1%, 0.95)/SN <= 1.15*1% holds when SN >= 12000. The sampling amount of the sample to be tested in this embodiment should therefore be >= 12000.
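For readers without Excel, the sampling condition can be checked with the standard-library Python sketch below. It re-implements the behaviour described for BINOM.INV (smallest integer whose cumulative binomial probability reaches the criterion) rather than calling Excel; the helper names are assumptions, and results very close to the boundary may differ from Excel's in the last step because of floating-point rounding.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with 0 < p < 1, via log-gamma."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def binom_inv(n, p, alpha):
    """Smallest k with P(X <= k) >= alpha (the behaviour described for BINOM.INV)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= alpha:
            return k
    return n

def sample_size_ok(sn, m, confidence=0.95, margin=1.15):
    """Check BINOM.INV(SN, M, 0.95) / SN <= 1.15 * M."""
    return binom_inv(sn, m, confidence) / sn <= margin * m

for sn in (1000, 100000):
    print(sn, sample_size_ok(sn, 0.01))
```

Scanning SN upward with `sample_size_ok` and taking the point from which the check keeps holding mirrors the stepwise search described above.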
Sample at random, pool the samples and extract the DNA of the pooled sample
In this embodiment, 50000 seeds were taken and germinated, and 30000 sprouts of roughly equal size were selected at random, pooled in a mortar and ground thoroughly to powder after liquid nitrogen was added to the mortar. The DNA of the pooled sample of the variety to be tested was extracted with the plant genomic DNA extraction kit (catalogue No. DP305) produced by Tiangen Biotech (Beijing) Co., Ltd., following the operation manual of the kit. The DNA obtained was quantified with the dsDNA HS Assay Kit (catalogue No. Q32852) produced by Invitrogen (USA), following its instructions, and the quantified DNA of the variety to be tested was diluted to 10.00ng/μl.
6th, amplify the DNA of the pooled sample with the primers to obtain the amplified products of the test regions, and use the amplified products to build the high-throughput sequencing library. The primers include the universal test region primers, and the library includes the high-throughput sequencing library of the universal test regions. The specific method is as follows:
The high-throughput sequencing library comprises the high-throughput sequencing library of the universal test regions and the high-throughput sequencing library of the non-universal test regions. In this embodiment, the two libraries are built separately and then mixed to obtain the high-throughput sequencing library of all test regions.
The high-throughput sequencing library of the universal test regions is built as follows:
The universal test regions are amplified by multiplex PCR with library construction Kit 2.0 (produced by Thermo Fisher Scientific, USA; catalogue No. 4475345), and the high-throughput sequencing library is built from the amplified product. The kit contains the following reagents: 5× Ion AmpliSeq™ HiFi Mix, FuPa reagent, Switch Solution, sequencing adapter solution and DNA ligase. The library is built according to the operation manual of the kit, "Ion AmpliSeq™ Library Preparation" (publication No. MAN0006735, revision A.0). The 2231 universal test regions are amplified by multiplex PCR in the following reaction system: 4μl of 5× Ion AmpliSeq™ HiFi Mix, 4μl of the prepared universal test region primer mix, 10ng of DNA of the variety to be tested and 11μl of nuclease-free water. The multiplex PCR programme is: 99°C for 2 minutes; (99°C for 15 seconds; 60°C for 4 minutes) × 25 cycles; hold at 10°C. The excess primers in the multiplex PCR product are digested with FuPa reagent and the product is then phosphorylated, as follows: 2μL of FuPa reagent is added to the multiplex PCR product, mixed, and reacted in a PCR instrument by the following programme: 50°C for 10 minutes; 55°C for 10 minutes; 60°C for 10 minutes; store at 10°C. This gives mixture a, a solution containing the phosphorylated amplified product. Sequencing adapters are then ligated to the phosphorylated amplified product, as follows: 4μL of Switch Solution, 2μL of sequencing adapter solution and 2μL of DNA ligase are added to mixture a, mixed, and reacted in a PCR instrument by the following programme: 22°C for 30 minutes; 72°C for 10 minutes; store at 10°C. This gives mixture b. Mixture b is purified by the standard ethanol precipitation method and dissolved in 10μL of nuclease-free water. Its mass concentration is measured with the dsDNA HS Assay Kit (catalogue No. Q32852) produced by Invitrogen (USA) according to its instructions, and the purified mixture b is then diluted to 15ng/ml, giving the high-throughput sequencing library of the universal test regions at a concentration of about 100pM.
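The step from a mass concentration of 15ng/ml to roughly 100pM depends on the average length of the library fragments, which the text does not state. The sketch below shows the usual conversion for double-stranded DNA, assuming about 660 g/mol per base pair; the mean length of 227bp used in the example is a hypothetical value chosen only because it reproduces the quoted figures.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair, a common approximation for dsDNA

def ng_per_ml_to_pM(ng_per_ml, mean_length_bp):
    """Convert a dsDNA mass concentration to a molar concentration in pM."""
    grams_per_litre = ng_per_ml * 1e-9 * 1e3  # ng/ml -> g/L
    mol_per_litre = grams_per_litre / (AVG_BP_MASS * mean_length_bp)
    return mol_per_litre * 1e12               # mol/L -> pM

# Hypothetical mean fragment length (amplicon plus adapters) of 227 bp.
print(round(ng_per_ml_to_pM(15, 227), 1))  # 100.1
```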
The high-throughput sequencing library of the non-universal test region is built as follows:
With the DNA of the variety to be tested as template, the first primer and the second primer of the non-universal test region prepared by the method above are each used in a separate PCR amplification, and the amplified products are mixed in equal amounts to obtain the high-throughput sequencing library of the non-universal test region. The procedure follows "Ion Amplicon Library Preparation (Fusion Method)" (publication No. 4468326) and is, in outline, as follows. The forward primer and the reverse primer of the first primer are each dissolved in water to a concentration of 10μM and mixed in equal volumes to give the first primer solution. The PCR reaction system is prepared as follows: 1μL of first primer solution, 30ng of DNA of the variety to be tested and 45μL of high-fidelity PCR mix (produced by Invitrogen, USA; catalogue No. 12532016). After mixing, the reaction is run in a PCR instrument by the following programme: 94°C for 3 minutes; (94°C for 30 seconds; 58°C for 30 seconds; 68°C for 1 minute) × 40 cycles; hold at 4°C. The PCR product is purified by the standard ethanol precipitation method and dissolved in 10μL of water. Its molar concentration is determined with the DNA 1000 kit (catalogue No. 5067-1504) on a bioanalyzer (model 2100) produced by Agilent (USA), following the kit instructions, and it is diluted to 200pM to give the amplified product of the first primer. The amplified product of the second primer, at a concentration of 200pM, is obtained by the same method. The amplified products of the first primer and the second primer are mixed in equal volumes to give the high-throughput sequencing library of the non-universal test region at a concentration of 100pM.
Obtain the high-throughput sequencing library of all test regions
The high-throughput sequencing library of the universal test regions and that of the non-universal test regions, at equal molar concentration, are mixed in the ratio of the number of universal test regions to the number of non-universal test regions; the resulting mixture is the high-throughput sequencing library of all test regions. In this embodiment, 2231μL of the high-throughput sequencing library of the universal test regions is mixed with 1μL of the high-throughput sequencing library of the non-universal test region, giving the high-throughput sequencing library of all test regions at a concentration of 100pM.
7th, sequence the high-throughput sequencing library at high throughput to obtain the sequencing fragment group, as follows:
Principle for determining the high-throughput sequencing depth CF: the depth CF must satisfy the following three conditions: BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) >= 99.9%; 1-BINOM.DIST(10000, 10000, 1-BINOM.DIST(8, 20, 1-BINOM.DIST(99.99%*CF, CF, 99.9989%, TRUE), TRUE), FALSE) <= 0.1%; and BINOM.DIST(10*(1-M)*CF, 10*CF, 1-110%*M, TRUE) >= 95.0%. Here CF is the depth of high-throughput sequencing, i.e. the average number of times each test region is covered; M is the threshold selected for judging uniformity and stability; and BINOM.DIST is the Excel 2010 function, used as defined in Excel 2010, which returns a binomial distribution probability. The three conditions mean the following. First, with a hybrid strain rate as low as 0.1%, as many as 10 hybrid strain varieties, and on average only 20 differing sites between a hybrid strain variety and the variety to be tested, the probability, as determined by the sequencing depth, of detecting all hybrid strain varieties is >= 99.9%. Second, with as many as 10000 database varieties and on average only 20 differing sites between a hybrid strain variety and the variety to be tested, the probability, as determined by the sequencing depth, of wrongly calling a hybrid strain variety is <= 0.1%. Third, with as many as 10 hybrid strain varieties and a true hybrid strain rate that exceeds the selected judgment threshold by only 10%, the probability, as determined by the sequencing depth, that the conclusion on stability and uniformity is correct is >= 95.0%. These conditions are very strict, so the actual performance is better than the thresholds above. The derivation of these probabilities is shown in Table 2.
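To show how such nested BINOM.DIST expressions evaluate, the Python sketch below reproduces only the first of the three conditions. The parameter names and the interpretive comments are a reading of the formula, not wording from the patent, and the remaining two conditions would have to be checked in the same way before a depth is accepted.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), i.e. BINOM.DIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def first_condition_holds(cf, hybrid_rate=0.001, diff_sites=20,
                          max_missed=8, n_hybrid_varieties=10, target=0.999):
    # BINOM.DIST(0, CF, 0.1%, TRUE): none of the CF fragments covering a
    # site comes from the hybrid strain.
    p_site_missed = (1 - hybrid_rate) ** cf
    # BINOM.DIST(8, 20, ..., TRUE): at most 8 of the 20 differing sites
    # are missed in this way.
    p_variety_detected = binom_cdf(max_missed, diff_sites, p_site_missed)
    # BINOM.DIST(10, 10, ..., FALSE): the above holds for all 10 hybrid
    # strain varieties at once.
    return p_variety_detected ** n_hybrid_varieties >= target

print(first_condition_holds(10))    # False: far too shallow
print(first_condition_holds(5000))  # True
```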
Table 2. Calculation of the probabilities used in this embodiment
Table 2 is an Excel 2010 worksheet whose functions, cells and so on follow the Excel 2010 definitions. "Threshold selected for judging uniformity and stability (M)" is cell B2, and the other cells are numbered by Excel 2010 rules with B2 as reference. For example, the cell containing "hybrid strain rate (R)" is 4 rows and 1 column further on from B2, so its number is C6; the other cells are numbered in the same way.
The high-throughput sequencing depth of this embodiment is determined as follows: M = 1% is substituted into the three conditions above and the sequencing depth CF is increased stepwise; all three conditions hold once CF reaches 2783. The sequencing depth of this embodiment is therefore set at >= 2783 times.
Carry out high-throughput sequencing with the high-throughput sequencing library
Before sequencing, the high-throughput sequencing library of all test regions is amplified by ePCR (Emulsion PCR, emulsion polymerase chain reaction) with the Ion PI Template OT2 200 Kit v2 (produced by Invitrogen, USA; catalogue No. 4485146), following the operation manual of the kit. The ePCR product is then sequenced on a Proton second-generation high-throughput sequencer with the Ion PI Sequencing 200 Kit v2 (produced by Invitrogen, USA; catalogue No. 4485149), following the operation manual of the kit. In this embodiment, the sequencing throughput is set so that the test regions are covered 30000 times on average.
Preprocess the high-throughput sequencing results
First check whether the quality of the high-throughput sequencing data is >= Q20. If it is < Q20 (which is rare), repeat the high-throughput sequencing as described above until the quality reaches the Q20 standard. The Q20 standard is what satisfies the requirement in Table 2 that the "probability of a sequencing error producing a specific base" be <= 0.33%. The high-throughput sequencing fragments that meet the quality requirement are aligned to all 2232 test regions. Fragments that fail to align and fragments with incomplete genotype detection are removed, and all remaining sequencing fragments are called the sequencing fragment group. A sequencing fragment with incomplete genotype detection is one that does not cover all the SNP sites listed under "position of SNP in the reference genome" in Table 1. Genotype detection is incomplete when the sequencing fragment is too short; alignment fails mostly because the sequencing fragment is a non-specific amplification product.
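The link between Q20 and the 0.33% figure follows from the standard Phred definition, in which a quality score Q corresponds to an error probability of 10^(-Q/10). The short sketch below works it through, assuming (as the 1%/3 arithmetic used in this embodiment implies) that an erroneous call is equally likely to be any of the three other bases.

```python
def phred_to_error_probability(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)

per_base_error = phred_to_error_probability(20)  # about 0.01, i.e. 1 %
specific_wrong_base = per_base_error / 3         # error lands on one given base

print(f"{per_base_error:.2%}")       # 1.00%
print(f"{specific_wrong_base:.2%}")  # 0.33%
```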
8th, analyse the sequencing fragment group to obtain the genotypes of the variety to be tested and the hybrid strain genotypes, as follows:
The sequencing fragment group is aligned to all test regions and the number of sequencing fragments in each test region is counted. Test regions with <= 1000 sequencing fragments are removed, and the remaining test regions are the successfully detected test regions. In this embodiment, 2030 successfully detected test regions were obtained. A fragment aligned to a test region is called a sequencing fragment of that test region, and the bases extracted from a sequencing fragment at the positions listed under "position of SNP in the reference genome" in Table 1 are called the genotype of that sequencing fragment. The frequency of a genotype is the number of sequencing fragments in the sequencing fragment group that represent the genotype, as a proportion of the total number of sequencing fragments of the test region in which the genotype lies. A genotype with frequency >= 30% is called a genotype of the variety to be tested. In general, hybrid strains make up no more than 15% of the sample taken and sequencing errors no more than 1%, so the two together make up no more than 16%. At a homozygous site the variety to be tested therefore has only one genotype, whose frequency should be more than 84%, and at a heterozygous site it has 2 genotypes, each of whose frequencies should be more than 42%. Requiring a frequency >= 30% for a genotype of the variety to be tested thus excludes interference from sequencing errors and from hybrid strains mixed into the variety to be tested. A hybrid strain genotype is a potential hybrid strain genotype with frequency >= 0.02%, where a potential hybrid strain genotype differs from all genotypes of the variety to be tested by >= 2 bases, or its differing bases include an insertion or deletion of non-contiguous bases. The rationale for this rule is as follows. In high-throughput sequencing, insertion and deletion errors are extremely rare, and the probability that sequencing errors produce 2 specified differing bases is as low as (1%/3)^2 = 0.0011%. Since a hybrid strain genotype must also have a frequency >= 0.02%, under these constraints the probability that sequencing errors produce a given hybrid strain genotype is only 0.0001% even at a sequencing depth of 30000 (the calculation is shown in Table 2). A frequency of 0.02% meets the strictest current DUS testing standard, i.e. detecting as few as 2 hybrid strains among 10,000 seeds. If the number of differing bases were 1, every test region would produce false hybrid strain genotypes (the calculation is shown in Table 2); if it were >= 3, the number of hybrid strain genotypes would fall sharply, making it difficult to calculate the hybrid strain rate R accurately. A threshold of >= 2 differing bases is therefore optimal.
For example, in the sequencing fragment group, the 1st test region has 33180 sequencing fragments in total, carrying 41 genotypes, ACCC, CGTT, CCCC, GCCC and so on, which are represented by 16709, 16334, 2, 2, ... sequencing fragments respectively. The frequencies of these genotypes are 16709/33180 = 50.36%, 16334/33180 = 49.23%, 2/33180 = 0.006%, 2/33180 = 0.006%, and so on. By the definitions of the genotype of the variety to be tested and the hybrid strain genotype, ACCC and CGTT are the genotypes of the variety to be tested in the 1st test region, and the other genotypes are produced by sequencing errors. In a hybrid-specific test region, the maternal genotype differs from the paternal genotype, the maternal genotype differs from the genotypes of all nuclear hybrid strain varieties, and the paternal genotype also differs from the genotypes of all nuclear hybrid strain varieties. The maternal genotype is the genotype in the variety to be tested that is identical to the genotype of the female parent; the paternal genotype is the genotype in the variety to be tested that is identical to the genotype of the male parent. In the 1st test region, the maternal genotype CGTT differs from the paternal genotype ACCC, and both differ from the genotypes of all nuclear hybrid strain varieties (in this embodiment there is no hybrid strain variety), so the 1st test region is also a hybrid-specific test region. A hybrid strain nuclear genotype is a hybrid strain genotype that is a nuclear genotype, and a hybrid strain cytoplasmic genotype is a hybrid strain genotype that is a cytoplasmic genotype. By this definition, the 1st test region has no hybrid strain genotype and therefore no hybrid strain nuclear genotype or hybrid strain cytoplasmic genotype. By the same method, the genotypes of the variety to be tested, the hybrid-specific test regions, and the hybrid strain genotypes and their frequencies are determined for all 2030 successfully detected test regions, and each hybrid strain genotype found is classified as a hybrid strain nuclear genotype or a hybrid strain cytoplasmic genotype. The results show that in this embodiment there is no hybrid strain genotype, and there are 153 hybrid-specific test regions.
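The frequency and difference-count rules can be sketched as follows. This is a simplified illustration: the function names are assumptions, genotypes are compared position by position as equal-length strings, and the insertion/deletion rule is not modelled. The counts are those quoted above for the 1st test region, with the total passed in separately because only 4 of its 41 genotypes are listed.

```python
VARIETY_MIN_FREQ = 0.30   # genotype of the variety to be tested: >= 30 %
HYBRID_MIN_FREQ = 0.0002  # hybrid strain genotype: >= 0.02 %
MIN_DIFF_BASES = 2        # differing bases required for a hybrid strain genotype

def differing_bases(genotype_a, genotype_b):
    """Number of positions at which two equal-length genotypes differ."""
    return sum(a != b for a, b in zip(genotype_a, genotype_b))

def call_genotypes(fragment_counts, total_fragments):
    """Split one test region's genotypes into variety and hybrid strain calls."""
    freqs = {g: n / total_fragments for g, n in fragment_counts.items()}
    variety = [g for g, f in freqs.items() if f >= VARIETY_MIN_FREQ]
    hybrid = [
        g for g, f in freqs.items()
        if g not in variety
        and f >= HYBRID_MIN_FREQ
        and all(differing_bases(g, v) >= MIN_DIFF_BASES for v in variety)
    ]
    return variety, hybrid

counts = {"ACCC": 16709, "CGTT": 16334, "CCCC": 2, "GCCC": 2}
variety, hybrid = call_genotypes(counts, total_fragments=33180)
print(variety)  # ['ACCC', 'CGTT']
print(hybrid)   # []
```

CCCC and GCCC fall out on frequency alone (about 0.006%), which matches the conclusion above that the 1st test region has no hybrid strain genotype.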
The standard sample detection method of the present embodiment is briefly introduced as follows. One seed is taken from the variety under test, sown and grown into a seedling; genomic DNA is extracted from the leaves of the seedling by the same method as for the variety under test, and this DNA is called the standard sample of the variety under test. A high-throughput sequencing library of the standard sample is constructed and sequenced in parallel with the variety under test, at the same time and by the same method. A genotype with frequency ≥30% is called a standard sample genotype; a standard sample hybrid strain genotype has frequency ≥0.02% and differs from the standard sample genotype by ≥2 bases, or its distinguishing bases include a discontinuous base insertion or deletion. The standard sample genotypes and standard sample hybrid strain genotypes in each successfully detected test region are obtained by the same method as for the variety under test. If the test regions in which the standard sample genotype is identical to the variety genotype under test account for more than 90% of the test regions successfully detected in both the standard sample and the variety under test, the standard sample is correct; otherwise, another seed is taken from the variety under test and the above procedure is repeated until a correct standard sample is obtained. The hybrid strain genotypes of the correct standard sample are compared with the hybrid strain genotypes of the corresponding test regions of the variety under test to find the identical hybrid strain genotypes; these identical hybrid strain genotypes are removed from the variety under test, and the remaining, correct hybrid strain genotypes of the variety under test are retained for subsequent analysis. This measure eliminates hybrid strain genotypes caused by systematic selection errors, which are mainly selective PCR mis-amplifications caused by special structures in the gene sequence. It should be noted that, when the database contains many varieties and broadly represents the genotypes of different varieties, a hybrid strain genotype may instead be required to be identical to some genotype of a database variety; this serves the same function as the standard sample, so the standard sample need not be tested and the workload is reduced. In the present embodiment, because no hybrid strain genotype was detected, the question of removing erroneous hybrid strain genotypes does not arise.
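The two standard-sample checks described above can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names and the region-keyed dictionaries are assumptions, and the toy data are invented purely to exercise the logic.

```python
def standard_sample_is_correct(sample_genotypes, variety_genotypes, min_ratio=0.90):
    """Accept the standard sample when its genotype matches the variety genotype
    in more than `min_ratio` of the regions detected in both samples."""
    shared = [r for r in sample_genotypes if r in variety_genotypes]
    if not shared:
        return False
    same = sum(1 for r in shared if sample_genotypes[r] == variety_genotypes[r])
    return same / len(shared) > min_ratio


def remove_systematic_errors(variety_hybrid_strain, sample_hybrid_strain):
    """Drop, region by region, every hybrid strain genotype of the variety under
    test that also appears as a hybrid strain genotype of the standard sample."""
    return {
        region: genotypes - sample_hybrid_strain.get(region, set())
        for region, genotypes in variety_hybrid_strain.items()
    }


# Toy data (hypothetical): region id -> genotype, or -> set of hybrid strain genotypes.
sample = {1: "CGTT/ACCC", 2: "AA", 3: "GG"}
variety = {1: "CGTT/ACCC", 2: "AA", 3: "GG", 4: "TT"}
ok = standard_sample_is_correct(sample, variety)

cleaned = remove_systematic_errors({1: {"CCCC", "GCCC"}, 2: {"TT"}}, {1: {"CCCC"}})
```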
Ninth, the variety genotypes under test are compared with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant site rate, as follows:
If, in a test region, the genotypes of both the variety under test and a database variety are present (not missing), that test region is called a shared test region of the variety under test and that database variety. In a shared test region, if the genotypes of the variety under test and the database variety are not completely identical, the test region is called a difference site between the variety under test and that database variety, and the corresponding genotypes are differential genotypes of each other; difference site rate = number of difference sites / number of shared test regions. The database variety with the lowest difference site rate is called the approximate variety of the variety under test, and the corresponding difference sites are called variant sites; variant site rate = number of variant sites / number of shared test regions.
In the present embodiment, the variety under test and the 1st database variety, "Jin Ke 1A", have 2025 shared test regions. In the 1st shared test region, the genotypes of the variety under test and "Jin Ke 1A" are CGTT/ACCC and CGTT respectively; the two are not completely identical, so the 1st shared test region is a difference site between the variety under test and "Jin Ke 1A", and CGTT/ACCC and CGTT are the differential genotypes of the variety under test and "Jin Ke 1A". By the same method, the genotypes of the variety under test and "Jin Ke 1A" are compared across all shared test regions; 152 difference sites are found in total, so the difference site rate = 152/2025 = 7.51%. By the same method, the difference site rates between the variety under test and all 1137 varieties in the database are obtained. The variety with the lowest difference site rate is "Jin Ke 1A/R7723", with only 1 difference site, which lies in the non-universal test region numbered 10 (see Table 1); its difference site rate is 0.05%. Therefore, "Jin Ke 1A/R7723" is the approximate variety of the variety under test, and the variant site rate of the variety under test is 0.05%.
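The search for the approximate variety can be sketched as below. This is a minimal sketch under stated assumptions: genotypes are stored per test region as sets of alleles, `None` marks a missing genotype, and the function names and the toy database are hypothetical.

```python
def difference_site_rate(test, db):
    """Return (difference sites, shared regions, rate) for one database variety.
    A region is shared when neither genotype is missing; it is a difference
    site when the two genotypes are not completely identical."""
    shared = [r for r in test if test[r] is not None and db.get(r) is not None]
    diff = [r for r in shared if test[r] != db[r]]
    rate = len(diff) / len(shared) if shared else None
    return len(diff), len(shared), rate


def approximate_variety(test, database):
    """Database variety with the lowest difference site rate, and that rate."""
    rates = {}
    for name, genotypes in database.items():
        rate = difference_site_rate(test, genotypes)[2]
        if rate is not None:
            rates[name] = rate
    best = min(rates, key=rates.get)
    return best, rates[best]


# Toy data (hypothetical): region id -> set of alleles, None = missing.
test_variety = {1: frozenset({"CGTT", "ACCC"}), 2: frozenset({"AA"}), 3: None}
database = {
    "parent line": {1: frozenset({"CGTT"}), 2: frozenset({"AA"}), 3: frozenset({"GG"})},
    "hybrid": {1: frozenset({"CGTT", "ACCC"}), 2: frozenset({"AA"})},
}
closest, variant_site_rate = approximate_variety(test_variety, database)
```

With the figures of the present embodiment, the same ratio gives 152 / 2025 ≈ 7.51% for "Jin Ke 1A".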
Tenth, the hybrid strain genotypes are compared with the genotypes of the different varieties in the database to obtain the hybrid strain varieties, and the hybrid strain rate is then calculated, as follows:
Obtaining hybrid strain varieties: a hybrid strain variety is a variety present in the database for which the number of test regions in which its potential hybrid strain genotypes and the hybrid strain genotypes include an identical genotype accounts for ≥60% of the total number of test regions in which that variety has potential hybrid strain genotypes. Here, a potential hybrid strain genotype differs from every genotype of the variety under test by ≥2 bases, or its distinguishing bases include a discontinuous base insertion or deletion. Hybrid strain varieties are divided into nuclear hybrid strain varieties and cytoplasmic hybrid strain varieties: a nuclear hybrid strain variety is a hybrid strain variety obtained by calculation using nuclear genotypes only, and a cytoplasmic hybrid strain variety is a hybrid strain variety obtained by calculation using cytoplasmic genotypes only. For example, suppose the genotypes of a database variety are AA, AA, AA/TT, AA/TT, AA/TT, AA/TT and AA, and the corresponding genotypes of the variety under test are AA, AA/TT, TT, AA, TT/CC, GG/CC and -A; the corresponding potential hybrid strain genotypes are then: none, none, AA, TT, AA, AA/TT and AA. Homozygous varieties generally have no heterozygous genotypes, although a very few sites may; in addition, hybrid strains are mostly hybrids, in which heterozygous sites are common. All possible cases are therefore listed. The 60% parameter ensures that the probability of detecting all hybrid strain varieties is 100% and that the probability of a misjudged hybrid strain variety is 0%; the method for determining this parameter value is shown in Table 2.
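The seven-case example above and the 60% rule can be reproduced with a short sketch. The helper names are hypothetical, genotypes are written as "/"-separated alleles, and the "discontinuous insertion or deletion" condition is simplified here to "one allele has a gap character where the other has a base", which is an assumption rather than the patent's exact rule.

```python
def differs_enough(a, b):
    """True if alleles a and b differ by >= 2 bases, or differ by an indel ('-')."""
    if a == b:
        return False
    if len(a) != len(b):
        return True  # unequal length is treated as an indel (assumption)
    mismatches = sum(x != y for x, y in zip(a, b))
    has_indel = any((x == "-") != (y == "-") for x, y in zip(a, b))
    return mismatches >= 2 or has_indel


def potential_hybrid_strain_genotypes(db_genotype, test_genotype):
    """Alleles of the database variety that differ enough from every allele
    of the variety under test in the same test region."""
    test_alleles = test_genotype.split("/")
    return [a for a in db_genotype.split("/")
            if all(differs_enough(a, t) for t in test_alleles)]


def is_hybrid_strain_variety(potential, observed, threshold=0.60):
    """potential: region -> potential genotypes of one database variety;
    observed: region -> hybrid strain genotypes detected in the variety under test."""
    regions = [r for r, genotypes in potential.items() if genotypes]
    if not regions:
        return False
    hits = sum(1 for r in regions if set(potential[r]) & observed.get(r, set()))
    return hits / len(regions) >= threshold


db = ["AA", "AA", "AA/TT", "AA/TT", "AA/TT", "AA/TT", "AA"]
test = ["AA", "AA/TT", "TT", "AA", "TT/CC", "GG/CC", "-A"]
potential = [potential_hybrid_strain_genotypes(d, t) for d, t in zip(db, test)]
```

Here `potential` corresponds to the list "none, none, AA, TT, AA, AA/TT, AA" given in the text, with "none" represented by an empty list.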
In the present embodiment, because no hybrid strain genotype was detected, there is no hybrid strain variety either. A special hybrid strain genotype is a hybrid strain genotype possessed by only one hybrid strain variety; it includes special hybrid strain nuclear genotypes and special hybrid strain cytoplasmic genotypes. A special hybrid strain nuclear genotype is a hybrid strain nuclear genotype possessed by only one nuclear hybrid strain variety, and a special hybrid strain cytoplasmic genotype is a hybrid strain cytoplasmic genotype possessed by only one cytoplasmic hybrid strain variety. In the present embodiment, because there is no hybrid strain variety, there is no special hybrid strain genotype either.
Principle of calculating the hybrid strain rate R
Hybrid strain rate R = R1 + R2 - R3 - R4 + Rm, wherein:

$$R1=\sum_{i_1=1}^{n_1}\left(2\times\frac{\sum_{j_1=\mathrm{Int}(0.8\,t_1)+1}^{t_1-\mathrm{Int}(0.1\,t_1)}R1_{i_1j_1}}{t_1-\mathrm{Int}(0.1\,t_1)-\mathrm{Int}(0.8\,t_1)}\right)$$

where n1 is the number of nuclear hybrid strain varieties; t1 is the number of all special hybrid strain nuclear genotypes of the i1-th nuclear hybrid strain variety; i1j1 denotes the j1-th special hybrid strain nuclear genotype after all special hybrid strain nuclear genotypes of the i1-th nuclear hybrid strain variety are sorted by frequency from low to high; and $R1_{i_1j_1}$ is the frequency of the i1j1-th special hybrid strain nuclear genotype. R1 is the sum of the hybrid strain rates of the nuclear hybrid strain varieties calculated from hybrid strain nuclear genotypes; the hybrid strain rate of a nuclear hybrid strain variety is 2 times the average of the frequencies that remain after the lowest 80% and the highest 10% of the frequencies of its special hybrid strain nuclear genotypes are removed.

$$R2=2\times\frac{\sum_{i_2=\mathrm{Int}(0.8\,t_2)+1}^{t_2-\mathrm{Int}(0.1\,t_2)}R2_{i_2}}{t_2-\mathrm{Int}(0.1\,t_2)-\mathrm{Int}(0.8\,t_2)}$$

where t2 is the number of hybrid strain nuclear genotypes that are not possessed by any nuclear hybrid strain variety and have frequency ≥0.17%; i2 denotes the i2-th hybrid strain nuclear genotype after all such genotypes are sorted by frequency from low to high; and $R2_{i_2}$ is the frequency of the i2-th hybrid strain nuclear genotype. R2 is the hybrid strain rate calculated from the hybrid strain nuclear genotypes other than those possessed by nuclear hybrid strain varieties; it is 2 times the average of the values that remain after the lowest 80% and the highest 10% of these frequencies are removed.

$$R3=\sum_{i_3=1}^{n_2}R3_{i_3},\qquad R3_{i_3}=\frac{\sum_{j_3=\mathrm{Int}(0.8\,t_3)+1}^{t_3-\mathrm{Int}(0.1\,t_3)}R3_{i_3j_3}}{t_3-\mathrm{Int}(0.1\,t_3)-\mathrm{Int}(0.8\,t_3)}$$

where n2 is the number of cytoplasmic hybrid strain varieties; $R3_{i_3}$ is the hybrid strain rate of the i3-th cytoplasmic hybrid strain variety; t3 is the number of all special hybrid strain cytoplasmic genotypes of the i3-th cytoplasmic hybrid strain variety; i3j3 denotes the j3-th special hybrid strain cytoplasmic genotype after all special hybrid strain cytoplasmic genotypes of the i3-th cytoplasmic hybrid strain variety are sorted by frequency from low to high; and $R3_{i_3j_3}$ is the frequency of the i3j3-th special hybrid strain cytoplasmic genotype. R3 is the sum of the hybrid strain rates of the cytoplasmic hybrid strain varieties calculated from hybrid strain cytoplasmic genotypes; the hybrid strain rate of a cytoplasmic hybrid strain variety is the average of the frequencies that remain after the lowest 80% and the highest 10% of the frequencies of its special hybrid strain cytoplasmic genotypes are removed.

$$R4=\frac{\sum_{i_4=\mathrm{Int}(0.8\,t_4)+1}^{t_4-\mathrm{Int}(0.1\,t_4)}R4_{i_4}}{t_4-\mathrm{Int}(0.1\,t_4)-\mathrm{Int}(0.8\,t_4)}$$

where t4 is the number of hybrid strain cytoplasmic genotypes that are not possessed by any cytoplasmic hybrid strain variety and have frequency ≥0.17%; i4 denotes the i4-th hybrid strain cytoplasmic genotype after all such genotypes are sorted by frequency from low to high; and $R4_{i_4}$ is the frequency of the i4-th hybrid strain cytoplasmic genotype. R4 is the hybrid strain rate calculated from the hybrid strain cytoplasmic genotypes other than those possessed by cytoplasmic hybrid strain varieties; it is the average of the values that remain after the lowest 80% and the highest 10% of these frequencies are removed.

$$Rm=\frac{1}{t_5}\sum_{i_5=1}^{t_5}\left(Rm_{i_5}-Rf_{i_5}\right)$$

where t5 is the number of hybrid-specific test regions; i5 denotes the i5-th hybrid-specific test region; $Rm_{i_5}$ is the frequency of the female parent genotype in the i5-th hybrid-specific test region; and $Rf_{i_5}$ is the frequency of the male parent genotype in the i5-th hybrid-specific test region. Rm is the hybrid strain rate due to female parent selfing; it is the average, over the hybrid-specific test regions, of the difference between the frequency of the female parent genotype and the frequency of the male parent genotype. Int() is the rounding-down function, which returns the integer part of the number in the brackets.
The hybrid strains in the variety under test arise from female parent selfing, stray-pollen contamination and mechanical admixture during seed production; of these, female parent selfing and stray-pollen contamination are the main sources of the complexity of hybrid strain varieties. Female parent selfing means that, during hybrid seed production, the female parent, being a sterile line, should not set seed by selfing, but because its fertility is partially restored it does produce seed, thereby forming hybrid strains. Stray-pollen contamination means that pollen of a hybrid strain variety is carried to the variety under test by wind or other means and pollinates it, forming hybrid seed; stray pollen cannot introduce cytoplasm, so it can only give rise to hybrid strain nuclear genotypes, and its hybrid strain rate is 2 times the hybrid strain nuclear genotype frequency. Mechanical admixture means that seeds of a hybrid strain variety are mixed directly into the variety under test; this introduces nucleus and cytoplasm together and forms both hybrid strain nuclear genotypes and hybrid strain cytoplasmic genotypes, and its hybrid strain rate should be the frequency of the hybrid strain cytoplasmic genotype. In the formula for the hybrid strain rate R, R1+R2 counts the hybrid strain rate of mechanical admixture twice and must be corrected; after correction it becomes R1+R2-R3-R4. Distinguishing mechanical admixture from stray-pollen contamination is a technical difficulty, and the present invention solves this problem.
In the formula for the hybrid strain rate R, the hybrid strain rate of a nuclear hybrid strain variety is always 2 × the hybrid strain nuclear genotype frequency, for the following reason: diploid and allopolyploid plants have 2 copies of each test region of the nuclear genome, so the hybrid strain rate is 2 times the corresponding hybrid strain nuclear genotype frequency. If a test region with N copies in the nuclear genome must be selected, the coefficient should be adjusted to N; if the copy number is uncertain, N = 2 is used, and if this is wrong, such regions are excluded when R is calculated, through the removal of the lowest 80% of values.
In the formula for the hybrid strain rate R, only the 10% of hybrid strain genotype frequency values that lie between the lowest 80% and the highest 10% are used in the calculation. The principle is as follows: the different hybrid strain genotypes of the same hybrid strain variety are all determined by the hybrid strain rate of that variety, so their expected frequencies are equal, and the differences between their frequencies are caused by errors during PCR amplification and high-throughput sequencing. Through the definition of the hybrid strain genotype and the standard sample of the variety under test, these erroneous values are essentially eliminated, and removing the highest 10% of values is sufficient to remove the very few test regions that deviate from the true hybrid strain rate. The lowest 80% are removed while only the highest 10% are removed for the following reasons: (1) the largest source of error is sequencing error, and hybrid strain genotype frequencies caused by sequencing error are very low; (2) among the frequencies of hybrid strain genotypes other than those of hybrid strain varieties, high values are more likely to be hybrid strain genotypes shared by different hybrid strains and thus represent the true hybrid strain rate.
In the formulas for R2 and R4, the frequency of a hybrid strain genotype is required to be ≥0.17%. The principle is as follows: when the number of varieties in the database and the number of detection sites both reach 10000, 149 misjudged hybrid strain genotypes are produced on average; when the hybrid strain genotype frequency is set at ≥0.17%, the probability that no hybrid strain genotype is misjudged is ≥99.98% (for the method of calculation, see Table 2), so R2 and R4 can be calculated accurately. A database of 10000 varieties and 10000 detection sites is already the practical limit, so the threshold of hybrid strain genotype frequency ≥0.17% is suitable for all situations. The introduction of R2 and R4 enables the present invention to calculate the hybrid strain rate R even when the database contains 0 varieties, that is, without database support. In particular, if all hybrid strain genotypes of hybrid strain variety A are also possessed by hybrid strain variety B or other hybrid strain varieties, hybrid strain variety A has no special hybrid strain genotype. In this case, when the hybrid strain rate R is calculated, the hybrid strain rates of hybrid strain variety A and hybrid strain variety B are not calculated separately; instead, the hybrid strain rate of hybrid strain variety AB is calculated. The special hybrid strain genotypes of hybrid strain variety AB are the hybrid strain genotypes common to hybrid strain variety A and hybrid strain variety B.
The formula for the hybrid strain rate R is a general formula. In practice, the variety under test is usually mixed with only one kind of hybrid strain. Because hybrid seed production areas are large and the process is standardized, the likelihood of stray-pollen contamination and mechanical admixture is very low, and the most common case is hybrid strains formed by female parent selfing; the present embodiment is such a case.
Hypothetical example of calculating the hybrid strain rate R
Table 3 gives a hypothetical example of a hybrid strain rate calculation, to illustrate the calculation of the hybrid strain rate R more clearly.
Table 3: A hypothetical example of calculating the hybrid strain rate R
In Table 3, there are two nuclear hybrid strain varieties, A and B, so n1 = 2; there is only one cytoplasmic hybrid strain variety, C, so n2 = 1. By the definition of the special hybrid strain nuclear genotype, the special hybrid strain nuclear genotypes of hybrid strain variety A are the hybrid strain nuclear genotypes numbered 1 to 10, namely AA, TT, TCC, GG, AC, TTC, TCCC, GGC, ACC and AG, so t1 = 10. Their frequencies are 0.10%, 1.20%, 0.10%, 0.10%, 0.02%, 0.10%, 0.10%, 0.10%, 0.10% and 0.10% respectively. After these 10 special hybrid strain nuclear genotype frequencies are sorted from low to high, $R1_{1,1}$ = 0.02%, $R1_{1,2}$ = 0.10%, $R1_{1,3}$ = 0.10%, $R1_{1,4}$ = 0.10%, $R1_{1,5}$ = 0.10%, $R1_{1,6}$ = 0.10%, $R1_{1,7}$ = 0.10%, $R1_{1,8}$ = 0.10%, $R1_{1,9}$ = 0.10% and $R1_{1,10}$ = 1.20%. From j1 = Int(0.8 × t1) + 1 = Int(0.8 × 10) + 1 = 9 to j1 = t1 - Int(0.1 × t1) = 10 - Int(0.1 × 10) = 9, the only value of $R1_{1,j_1}$ is $R1_{1,9}$ = 0.10%, so the hybrid strain rate of nuclear hybrid strain variety A is 2 × 0.10% = 0.20%. In the same way, the hybrid strain rate of nuclear hybrid strain variety B is obtained, and summing over the nuclear hybrid strain varieties gives R1 = 0.60%. In a similar manner, R2 = 0.02%, the hybrid strain rate of the cytoplasmic hybrid strain variety gives R3 = 0.10%, and R4 = 0.04%. In the 1st hybrid-specific test region, $Rm_{i_5}$ = 52.36% and $Rf_{i_5}$ = 46.34%; therefore, the female parent selfing rate calculated from the 1st hybrid-specific test region is 52.36% - 46.34% = 6.02%. By the same method, the female parent selfing rates in the other hybrid-specific test regions are 3.94%, 6.06%, 6.22% and 7.54%, so the final female parent selfing rate in this hypothetical example is Rm = (6.02% + 3.94% + 6.06% + 6.22% + 7.54%)/5 = 5.96%. Therefore, in this hypothetical example, the hybrid strain rate R = R1 + R2 - R3 - R4 + Rm = 0.60% + 0.02% - 0.10% - 0.04% + 5.96% = 6.44%.
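The arithmetic of the hypothetical example can be checked with a short sketch. The function name `trimmed_mean` is an assumption; it implements the rule stated in the text (drop the lowest Int(0.8 × t) and the highest Int(0.1 × t) sorted values, then average the rest), using integer arithmetic for Int() to avoid floating-point rounding. The values of R1 to R4 are taken from the text as given, and all figures are in percent.

```python
def trimmed_mean(frequencies):
    """Average of the values left after removing the lowest Int(0.8*t)
    and the highest Int(0.1*t) of the sorted frequencies."""
    values = sorted(frequencies)
    t = len(values)
    start = (8 * t) // 10        # Int(0.8 * t) lowest values are dropped
    stop = t - t // 10           # Int(0.1 * t) highest values are dropped
    kept = values[start:stop]
    return sum(kept) / len(kept)


# Frequencies (%) of the 10 special hybrid strain nuclear genotypes of variety A.
freq_a = [0.10, 1.20, 0.10, 0.10, 0.02, 0.10, 0.10, 0.10, 0.10, 0.10]
rate_a = 2 * trimmed_mean(freq_a)          # nuclear genotypes: factor of 2

# Female parent selfing rates (%) of the five hybrid-specific test regions.
selfing = [6.02, 3.94, 6.06, 6.22, 7.54]
rm = sum(selfing) / len(selfing)

# R1..R4 (%) as stated in the hypothetical example.
r1, r2, r3, r4 = 0.60, 0.02, 0.10, 0.04
r = r1 + r2 - r3 - r4 + rm
```

Rounded to two decimals, `rm` and `r` agree with the 5.96% and 6.44% stated in the example.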
With reference to the above hypothetical example, the hybrid strain rate R of the present embodiment is calculated as follows. In the present embodiment, there is no hybrid strain variety and no hybrid strain genotype, and, apart from the hybrid strain genotypes possessed by hybrid strain varieties, there is no hybrid strain genotype with a frequency above 0.17%; therefore R1, R2, R3 and R4 are all 0, and R = Rm. In the 1st hybrid-specific test region, $Rm_{i_5}$ = 50.36% and $Rf_{i_5}$ = 49.23%, so the female parent selfing rate calculated from the 1st test region is 50.36% - 49.23% = 1.13%. By the same method, the female parent selfing rates in all 152 hybrid-specific test regions are calculated as 1.13%, 1.02%, 1.03%, .... By the definition of Rm, the average of the female parent selfing rates of these hybrid-specific test regions is calculated, giving R = Rm = 1.09% in the present embodiment.
Eleventh, the specificity, uniformity and stability of the variety under test are judged using the variant sites, the variant site rate and the hybrid strain rate, as follows:
SD is the threshold selected for judging specificity, and M is the threshold selected for judging uniformity and stability. The specificity, uniformity and stability of the variety under test are judged as follows: when the variant site rate ≥ SD, or a variant site exists in a non-universal test region, the variety under test has specificity; when the variant site rate < SD and no variant site exists in a non-universal test region, the variety under test does not have specificity. When the hybrid strain rate of the variety under test ≤ M, the variety under test has uniformity and stability; when the hybrid strain rate of the variety under test > M, the variety under test does not have uniformity and stability. The values of M and SD are set manually according to factors such as the level of breeding, the required stringency and the characteristics of the markers. In the present embodiment, the 1% standard is selected for SD.
In the present embodiment, the variant site rate is 0.05% < SD = 1%, but a variant site exists in a non-universal test region (the test region numbered 10; see Table 1), so the variety under test is judged to have specificity. The hybrid strain rate of the variety under test is 1.09% > M = 1%, so the variety under test is judged not to have uniformity and stability.
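The decision rule can be written compactly as below. This is an illustrative sketch only; the function and parameter names are assumptions, and rates are expressed as fractions rather than percentages.

```python
def judge_dus(variant_site_rate, variant_in_non_universal_region,
              hybrid_strain_rate, sd, m):
    """Return (has_specificity, has_uniformity_and_stability).

    Specificity: variant site rate >= SD, or a variant site lies in a
    non-universal test region. Uniformity and stability: hybrid strain rate <= M.
    """
    has_specificity = variant_site_rate >= sd or variant_in_non_universal_region
    has_uniformity_and_stability = hybrid_strain_rate <= m
    return has_specificity, has_uniformity_and_stability


# Values of the present embodiment: variant site rate 0.05%, a variant site in
# non-universal region No. 10, hybrid strain rate 1.09%, SD = M = 1%.
result = judge_dus(0.0005, True, 0.0109, sd=0.01, m=0.01)
```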
Further, after the specificity, uniformity and stability of the variety under test have been judged, the accuracy of the judgment is estimated as follows:
Calculation of the accuracy of specificity: when no variant site exists in a non-universal test region, if the variety under test is judged to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST((1-SD)*TRN, TRN, 1-OD, TRUE); if the variety under test is judged not to have specificity, the probability that the conclusion is correct ≥ BINOM.DIST(SD*TRN, TRN, OD, TRUE). Here TRN is the number of successfully detected test regions, OD is the variant site rate, and BINOM.DIST is a function in Excel 2010, used as defined in Excel 2010, which returns a binomial distribution probability. What these probabilities actually measure is: when the variety is judged to have specificity, the probability that the variant site rate is greater than SD; and when the variety under test is judged not to have specificity, the probability that the variant site rate is less than SD. The successfully detected test regions are obtained by analyzing the sequencing fragment group.
The present embodiment does not use the variant site rate to judge the specificity of the variety under test; therefore, the probability that the specificity conclusion is correct is not calculated.
Calculation of the accuracy of uniformity and stability
The probability that the conclusion on the uniformity and stability of the variety under test is correct is as follows. When the variety under test is judged to have uniformity and stability, the probability that the conclusion is correct ≥ BINOM.DIST(M*SN, SN, R, TRUE) * BINOM.DIST(ΣSeN*M, ΣSeN, R, TRUE); when the variety under test is judged not to have uniformity and stability, the probability that the conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) * BINOM.DIST(ΣSeN*(1-M), ΣSeN, 1-R, TRUE). Here M is the threshold selected for judging uniformity and stability, and ΣSeN is the total number of sequencing fragments in the test regions whose genotype frequencies are used to calculate the hybrid strain rate R, that is, the test regions that remain for calculating the hybrid strain rate after the lowest 80% and the highest 10% of values are removed. BINOM.DIST(M*SN, SN, R, TRUE) is the probability that, when SN samples are drawn from the variety under test, the hybrid strain rate R actually drawn is less than the threshold M. Each sequencing fragment used to calculate the hybrid strain rate of the variety under test is in essence equivalent to one sampling of the variety under test; BINOM.DIST(ΣSeN*M, ΣSeN, R, TRUE) is therefore the probability that, when ΣSeN samples are drawn from the variety under test, the hybrid strain rate R actually drawn is less than the threshold M. Likewise, BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) is the probability that, when SN samples are drawn, the hybrid strain rate R actually drawn is greater than the threshold M, and BINOM.DIST(ΣSeN*(1-M), ΣSeN, 1-R, TRUE) is the probability that, when ΣSeN samples are drawn, the hybrid strain rate R actually drawn is greater than the threshold M. The accuracy of the judgment of uniformity and stability depends entirely on the accuracy of the hybrid strain rate, and the accuracy of the hybrid strain rate depends on the accuracy of three steps: first, the sampling of the variety under test; second, the detection of hybrid strain varieties in the extracted sample; third, the calculation of the hybrid strain rate from the detected hybrid strain varieties. The accuracy of the judgment of the uniformity and stability of the variety under test is therefore the product of the accuracies of these three steps. Because the present invention keeps the accuracy of detecting hybrid strain varieties above 99.9% even under the most stringent conditions, and in practice it is mostly close to 100%, the accuracy of the judgment can be estimated as the product of the accuracies of the first and third steps, which are the values of the first and second functions in the above formulas respectively.
In the present embodiment, the sites used for the hybrid strain rate R are the 153 hybrid-specific test regions, and their total sequencing amount is 4403423 fragments; that is, the 30000 sampled individuals were in effect sampled again 4403423 times, and with so large a sample the sampling error is quite small. In the present embodiment, the variety under test is judged not to have uniformity and stability; therefore, the probability that this conclusion is correct ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) * BINOM.DIST(ΣSeN*(1-M), ΣSeN, 1-R, TRUE) = BINOM.DIST((1-1%)*30000, 30000, (1-1.09%), TRUE) * BINOM.DIST(4403423*(1-1%), 4403423, 1-1.09%, TRUE) = 93.84%. It can be seen that the judgment of the uniformity and stability of the variety under test in the present embodiment is very accurate.
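The probability above can be evaluated without Excel. The sketch below is one possible way to do so using only the Python standard library; `binom_cdf` is a hypothetical helper that plays the role of BINOM.DIST(k, n, p, TRUE), and it sums the shorter tail of the distribution so that it stays fast even for millions of trials. It assumes 0 < p < 1.

```python
import math


def log_pmf(i, n, p):
    """Natural log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_cdf(k, n, p):
    """P(X <= k): the cumulative probability BINOM.DIST(k, n, p, TRUE) returns."""
    k = math.floor(k)
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    # Sum the tail that lies away from the mean; its terms shrink monotonically.
    if k < n * p:
        step, i, complement = -1, k, False      # lower tail: k, k-1, ...
    else:
        step, i, complement = 1, k + 1, True    # upper tail: k+1, k+2, ...
    total = 0.0
    while 0 <= i <= n:
        term = math.exp(log_pmf(i, n, p))
        total += term
        if term <= 1e-17 * total:               # remaining terms are negligible
            break
        i += step
    return 1.0 - total if complement else total


# Values of the present embodiment: M = 1%, R = 1.09%, SN = 30000,
# and 4403423 sequencing fragments in the regions used to calculate R.
M, R, SN, TOTAL_READS = 0.01, 0.0109, 30000, 4403423
p_sampling = binom_cdf(round((1 - M) * SN), SN, 1 - R)
p_reads = binom_cdf(math.floor((1 - M) * TOTAL_READS), TOTAL_READS, 1 - R)
confidence = p_sampling * p_reads
```

Under these assumptions `confidence` should land close to the 93.84% reported above; the second factor is essentially 1, so the result is dominated by the sampling of the 30000 individuals.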
Result verification
The rice variety under test and its approximate variety were planted and observed according to the method in 《Guidelines for testing the specificity, uniformity and stability of new plant varieties - Rice》. The rice variety under test was found to be highly susceptible to bacterial leaf blight, whereas the approximate variety was highly resistant to bacterial leaf blight. 《Guidelines for testing the specificity, uniformity and stability of new plant varieties - Rice》 stipulates that, when a variety differs clearly and reproducibly from the approximate variety in at least one character, the variety applied for can be judged to possess specificity. Therefore, the rice variety under test is judged to have specificity. In the experiment, 400 plants each of the variety under test and the approximate variety were planted (200 plants per plot, 2 replicates in total), and 10 off-type plants were found. 《Guidelines for testing the specificity, uniformity and stability of new plant varieties - Rice》 stipulates that, when the sample size is 400 to 471 plants, at most 8 off-type plants are allowed. Therefore, the variety under test is judged not to have uniformity. It is generally accepted that a variety without uniformity does not have stability either. This shows that the judgments of the specificity, stability and uniformity of the variety under test in the present embodiment are correct.
Embodiment two: determining the specificity, uniformity and stability of the corn variety "G95/1102"
The corn variety under test in the present embodiment is the corn variety "G95/1102", which is the cross combination of the corn varieties "1102" and "G95"; these are all publicly known varieties. The method for determining the specificity, uniformity and stability of this corn variety is similar to that of Embodiment one; therefore, only the differences are described.
First, the variant sites between different corn varieties are obtained by a method similar to that of Embodiment one.
The genome sequences of the different corn varieties are obtained as follows:
The genome sequences of the different corn varieties in the present embodiment come from two sources. The first is the high-throughput sequencing of the genomes of 103 corn varieties by Chia et al.; the reference is: Chia JM et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat Genet. 2012, 44(7): 803-7. The genome sequences of these 103 corn varieties are published in the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession number SRA051245. The second is the high-throughput sequencing of "G95", "1102" and the hybrid "height relies 145" by the method provided in the above article by Chia et al. In total, the present embodiment obtains the high-throughput sequencing sequences of the genomes of 106 corn varieties.
Further, the variant sites are obtained from the genome sequences of the different varieties by a method similar to that of Embodiment one.
Using the Sanger alignment software (version 0.4), the high-throughput sequencing sequences of the genomes of these 106 corn varieties were aligned to the "B73" maize nuclear reference genome (version: AGPv1; download address: http://www.ncbi.nlm.nih.gov) and to the cytoplasmic reference genome. The cytoplasmic reference genome comprises the mitochondrial reference genome and the chloroplast reference genome, whose accession numbers at NCBI (National Center for Biotechnology Information) are NC_007982.1 and NC_001666.2 respectively. For the alignment, the insert size was set to 500 bp and the other parameters were left at their default values. The Ssaha Pileup software package (version 0.5) was used to identify the SNP sites of each corn variety. In the present embodiment, 53855606 SNP sites were obtained among all 106 corn varieties, of which 9005 are located in the cytoplasmic genome and the rest in the nuclear genome.
Second, the test regions of the corn variety under test are determined from the variant sites; the test regions include universal test regions, and at least some of the variant sites are contained in the universal test regions. The method includes:
Determining the universal test regions by a method similar to that of Embodiment one
First, centered on each SNP site obtained, the sequence is extended 99 bp to the left and 100 bp to the right to form a 200 bp variation window. From the 53855606 SNP sites obtained, 53855606 variation windows are obtained. The discrimination of all 53855606 variation windows is calculated, and the 8000 variation windows with the highest discrimination in the nuclear genome and the 100 variation windows with the highest discrimination in the cytoplasmic genome are chosen. The 8000 variation windows in the nuclear genome are checked one by one for the distance between each variation window and the next; if the distance is less than 500K (1K = 1000 bases), the variation window with the lower discrimination is discarded and the check is repeated, until the distances between all adjacent variation windows are greater than 500K. The 500K distance criterion is chosen because the maize genome is about 2300M in size (1M = 1,000,000 bases); with about 2400 universal test regions based on the nuclear genome to be finally selected, the average distance between universal test regions is about 1M, but because some specific regions, such as centromeres, contain few variant sites, the average distance should be less than 1M. By the above method, 5030 variation windows located in the nuclear genome were selected; together with the 100 variation windows of highest discrimination obtained in the cytoplasmic genome, a total of 5130 variation windows serve as the selected universal test regions.
In the present embodiment, because the corn variety under test was not obtained by site-directed modification, there are no non-universal sites that need to be detected; therefore, there is no non-universal test region.
3. Primers for amplifying the test regions are prepared by the same method as in Embodiment One. The primers include universal test region primers. The details are as follows:
The primers were obtained by the same method as in Embodiment One. From all 5130 universal test regions, 2506 pairs of multiplex PCR primers were designed and successfully validated, for amplifying the corresponding 2506 universal test regions. The multiplex PCR primers were validated by the same method as in Embodiment One. The 2506 universal test regions for which multiplex PCR primers were successfully designed serve as the final universal test regions for detecting the corn variety to be measured. Each variety in the constructed database also contains these 2506 universal test regions, of which 34 are located on the cytoplasmic genome and the remaining 2472 on the nuclear genome.
Because the corn variety to be measured in this embodiment has no non-universal test regions, there are no non-universal test region primers.
4th, the method for the database of genotype in all test zones of the structure comprising different cultivars is as follows:
This example obtains 2506 universal test region primers and 0 non-universal test zone primer, corresponding to them
Amplification region is the test zone of corn variety to be measured.The genotype of 2506 test zones of the structure comprising 106 kinds and
The database of its SNP positional information, partial results are shown in Table 4.
Table 4 is database variety and genetype and its position, Maize Genotypes to be measured, hybrid strain genotype and its frequency
Certain embodiments
Symbol in table 4 is identical with the symbolic significance in table 1.
5. After the sample size SN of the corn variety to be measured is determined, samples are drawn at random and mixed, and the DNA of the mixed sample is extracted, as follows:
By the same method as in Embodiment One, the sample size for the corn variety to be measured was calculated to be ≥ 4000. In this embodiment, 6000 seeds were germinated, and 5000 buds of roughly equal size were selected at random, mixed, and placed in a mortar. The DNA of the mixed sample of the corn variety to be measured was extracted by the same method as in Embodiment One and quantified, and the quantified DNA was diluted to 10.00 ng/μl.
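The sample-size requirement can be checked against the condition given in claim 1, BINOM.INV(SN, M, 0.95)/SN ≤ 1.15*M. The sketch below re-implements the Excel function BINOM.INV in plain Python as the smallest k whose cumulative binomial probability reaches the requested level. The function names `binom_inv` and `sample_size_ok` are hypothetical, and M = 3% is the threshold used later in this embodiment.

```python
import math


def binom_inv(n: int, p: float, alpha: float) -> int:
    """Smallest k with P(X <= k) >= alpha for X ~ Binomial(n, p), 0 < p < 1.

    Mirrors the documented behaviour of Excel's BINOM.INV(trials, p, alpha).
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    cumulative = 0.0
    for k in range(n + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        cumulative += math.exp(log_pmf)
        if cumulative >= alpha:
            return k
    return n


def sample_size_ok(sn: int, m: float, level: float = 0.95, margin: float = 1.15) -> bool:
    """Evaluate BINOM.INV(SN, M, level) / SN <= margin * M."""
    return binom_inv(sn, m, level) / sn <= margin * m


if __name__ == "__main__":
    for sn in (100, 1000, 5000, 20000):
        print(sn, sample_size_ok(sn, 0.03))
```

Small samples fail the condition because the 95% quantile of the observed hybrid strain count sits far above 1.15*M*SN; the condition only holds once SN is in the thousands.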
6. The DNA of the mixed sample is amplified with the primers to obtain the amplification products of the test regions, and the amplification products are used to construct a high-throughput sequencing library. The primers include universal test region primers, and the library is the high-throughput sequencing library of the universal test regions. The specific method is as follows:
The high-throughput sequencing library of the universal test regions was constructed by the same method as in Embodiment One. In this embodiment there is no high-throughput sequencing library of non-universal test regions, so the library constructed for the universal test regions is the high-throughput sequencing library of all test regions, at a concentration of 100 pM.
7. High-throughput sequencing is performed on the high-throughput sequencing library to obtain the sequencing fragment group.
By the same method as in Embodiment One, the high-throughput sequencing depth CF was determined to be ≥ 2237×. High-throughput sequencing was performed on the library by the same method as in Embodiment One, with the sequencing throughput set to an average coverage of 30000× over the test regions.
The high-throughput sequencing results were pre-processed by the same method as in Embodiment One.
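Claim 1 places three conditions on the sequencing depth CF. As an illustration, the sketch below evaluates only the first of them, BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) ≥ 99.9%. The reading of the nested terms given in the comments (a site is "missed" when no fragment of the hybrid strain genotype is sequenced) is an interpretation, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Excel BINOM.DIST(k, n, p, FALSE)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Excel BINOM.DIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def detect_all_probability(cf: int, rate: float = 0.001, sites: int = 20,
                           max_missed: int = 8, varieties: int = 10) -> float:
    """First depth condition of claim 1, for sequencing depth cf."""
    # Chance that a site shows zero fragments of a genotype present at `rate`.
    p_site_missed = binom_cdf(0, cf, rate)
    # Chance that at most `max_missed` of the differing sites are missed.
    p_variety_found = binom_cdf(max_missed, sites, p_site_missed)
    # Chance that all hybrid strain varieties are found.
    return binom_pmf(varieties, varieties, p_variety_found)


if __name__ == "__main__":
    for cf in (500, 2237, 5000):
        print(cf, detect_all_probability(cf) >= 0.999)
```

The remaining two conditions of claim 1 would have to be evaluated in the same way before a depth could be accepted; the figure CF ≥ 2237 quoted above comes from the source, not from this sketch.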
8. The sequencing fragment group is analyzed to obtain the genotypes of the corn variety to be measured and the hybrid strain genotypes, as follows:
By the same method as in Embodiment One, a total of 44 hybrid-specific test regions and 61 hybrid strain genotypes were obtained, and their frequencies were calculated. A standard sample was tested by the same method as in Embodiment One, and no hybrid strain genotype was found in it, so there were no erroneous hybrid strain genotypes to remove.
9. The genotypes of the corn variety to be measured are compared with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant sites rate, as follows:
By the same method as in Embodiment One, "high to rely 145" was found to be the approximate variety of the corn variety to be measured, and the variant sites rate of the corn variety to be measured is 1.91%.
10. The hybrid strain genotypes are compared with the genotypes of the different varieties in the database to obtain the hybrid strain varieties, after which the hybrid strain rate is calculated, as follows:
By the same method as in Embodiment One, BKN017:MZ was identified as a nuclear hybrid strain variety, and all 61 hybrid strain genotypes are present in BKN017:MZ. In this embodiment there is no cytoplasmic hybrid strain variety.
The hybrid strain rate R of this embodiment is calculated by the same method as in Embodiment One, with reference to the hypothetical example (Table 3). In this embodiment there is one nuclear hybrid strain variety, BKN017:MZ, and no cytoplasmic hybrid strain variety. Apart from the hybrid strain genotypes possessed by the hybrid strain variety, no hybrid strain genotype has a frequency greater than 0.17%. Therefore R2, R3 and R4 are all 0, and R = R1 + Rm. Because there is only one hybrid strain variety, all 61 hybrid strain genotypes are specific hybrid strain genotypes, with frequencies of 1.03%, 1.02%, and so on. After the highest 10% and the lowest 80% of these frequencies are removed, the average of the remainder is multiplied by 2, giving R1 = 2.07%. In the 1st hybrid-specific test region, Rmi5 = 48.88% and Rfi5 = 48.84%, so the maternal selfing rate calculated from the 1st test region is 48.88% - 48.84% = 0.04%. By the same method, the maternal selfing rates in all 44 hybrid-specific test regions are 0.04%, 0.05%, 0.03%, and so on. By the definition of Rm, the average of the maternal selfing rates of these hybrid-specific test regions gives Rm = 0.04% in this embodiment. Therefore, in this embodiment, R = R1 + Rm = 2.07% + 0.04% = 2.11%.
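The arithmetic behind R1 and Rm can be sketched as below. This is an illustration only: the function names are hypothetical, the frequency lists in the demo are made up (the source gives only the first few values), and the way the 80% and 10% cut-offs are turned into whole counts (integer floor) is an assumption about how Int() is applied.

```python
from typing import Sequence, Tuple


def trimmed_mean(freqs: Sequence[float], low_pct: int = 80, high_pct: int = 10) -> float:
    """Mean of the frequencies left after dropping the lowest 80% and highest 10%."""
    if not freqs:
        raise ValueError("at least one frequency is required")
    ordered = sorted(freqs)
    t = len(ordered)
    lo = t * low_pct // 100          # how many of the lowest values to drop
    hi = t - t * high_pct // 100     # index just past the last value kept
    kept = ordered[lo:hi]
    return sum(kept) / len(kept)


def nuclear_hybrid_rate(freqs: Sequence[float]) -> float:
    """Hybrid strain rate of one nuclear hybrid strain variety: 2 x trimmed mean."""
    return 2.0 * trimmed_mean(freqs)


def maternal_selfing_rate(pairs: Sequence[Tuple[float, float]]) -> float:
    """Rm: mean of (female-parent frequency - male-parent frequency) per region."""
    return sum(rm - rf for rm, rf in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up per-region (Rmi5, Rfi5) pairs, in percent.
    regions = [(48.88, 48.84), (48.90, 48.85), (48.80, 48.77)]
    rm = maternal_selfing_rate(regions)
    r1 = 2.07  # value reported in the source, in percent
    print(f"Rm = {rm:.2f}%, R = R1 + Rm = {r1 + rm:.2f}%")
```

With 61 genotypes, this cut-off rule keeps 7 frequencies; a different rounding convention would keep a slightly different number.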
11. By the same method as in Embodiment One, the specificity, uniformity and stability of the corn variety to be measured are judged using the variant sites, the variant sites rate and the hybrid strain rate, as follows:
In this embodiment, the variant sites rate is 1.91% > SD = 1%, so the corn variety to be measured is judged to have specificity. The hybrid strain rate of the corn variety to be measured is 2.11% < M = 3%, so the corn variety to be measured is judged to have uniformity and stability.
Further, after the specificity, uniformity and stability of the corn variety to be measured have been judged, the accuracy of the judgment is estimated by the same method as in Embodiment One, as follows:
In this embodiment, the specificity of the corn variety to be measured was judged using the variant sites rate, and the variety was judged to have specificity. Therefore, the probability that the conclusion is correct is ≥ BINOM.DIST((1-SD)*TRN, TRN, 1-OD, TRUE) = BINOM.DIST((1-1%)*2465, 2465, 1-1.91%, TRUE) = 99.99%. The probability that the specificity conclusion of this embodiment is correct is therefore very high.
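The specificity figure can be re-evaluated outside Excel with a small cumulative-binomial helper. The sketch below is a plain-Python stand-in for BINOM.DIST(..., TRUE); the helper name `binom_cdf` is hypothetical, and truncating the first argument to an integer follows Excel's documented handling of non-integer arguments.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed from log-space terms."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


SD = 0.01     # specificity threshold used in this embodiment
TRN = 2465    # number of successfully detected test regions
OD = 0.0191   # observed variant sites rate

k = int((1 - SD) * TRN)            # BINOM.DIST truncates its first argument
p_correct = binom_cdf(k, TRN, 1 - OD)

if __name__ == "__main__":
    print(f"lower bound on P(specificity conclusion correct) = {p_correct:.4%}")
```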
The accuracy of the uniformity and stability judgment is calculated by the same method as in Embodiment One.
In this embodiment, the sites used for the hybrid strain rate R are the 44 hybrid-specific test regions and the test regions of the 61 specific hybrid strain genotypes, and their total sequencing amount is 3513478. In other words, the 5000 sampled individuals were sampled a further 3513478 times, and the error of so large a sample is quite small. In this embodiment, the corn variety to be measured was judged to have uniformity and stability. Therefore, the probability that this conclusion is correct is ≥ BINOM.DIST(M*SN, SN, R, TRUE) * BINOM.DIST(ΣSeN*M, ΣSeN, R, TRUE) = BINOM.DIST(3%*5000, 5000, 2.11%, TRUE) * BINOM.DIST(3513478*3%, 3513478, 2.11%, TRUE) = 100.00%. The judgment of the uniformity and stability of the corn variety to be measured in this embodiment is therefore very accurate.
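The product of the two binomial terms can likewise be evaluated with a cumulative-binomial helper that works in log space, so that the 3513478-trial term does not overflow. This is a sketch under assumptions: `binom_cdf` and `excel_trunc` are hypothetical names, and the small tolerance added before truncation only guards against floating-point representation of products such as 3% * 5000.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed from log-space terms."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def excel_trunc(x: float) -> int:
    """Truncate like BINOM.DIST does, tolerating tiny floating-point error."""
    return int(x + 1e-9)


M = 0.03               # uniformity/stability threshold
SN = 5000              # individuals sampled
R = 0.0211             # observed hybrid strain rate
TOTAL_SEQ = 3_513_478  # total sequencing amount over the regions used for R

p_individuals = binom_cdf(excel_trunc(M * SN), SN, R)
p_fragments = binom_cdf(excel_trunc(TOTAL_SEQ * M), TOTAL_SEQ, R)
p_correct = p_individuals * p_fragments

if __name__ == "__main__":
    print(f"lower bound on P(uniformity/stability conclusion correct) = {p_correct:.4%}")
```

The first factor reflects sampling 5000 individuals and the second the much larger number of sequenced fragments, which is why the second factor is effectively 1.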
Result verification
The corn variety to be measured and its approximate variety were planted and observed according to the method in 《Guidelines for Testing the Specificity, Uniformity and Stability of New Plant Varieties: Corn》. The corn variety to be measured was found to differ significantly from the approximate variety in several test characters. The guidelines stipulate that when a clear and reproducible difference from the approximate variety exists in at least one character, the corn variety under application can be judged to have specificity. The corn variety to be measured is therefore judged to have specificity. In the experiment, 40 plants each of the variety to be measured and the approximate variety were planted (20 plants per plot, 2 replicates in total), and 1 off-type plant was found. The guidelines stipulate that when the sample size is 40 plants, at most 3 off-type plants are allowed, so the variety to be measured is judged to have uniformity. The guidelines also stipulate that if a variety has uniformity, it can be considered to have stability. The variety to be measured is therefore also judged to have stability. The above experiment shows that the judgment in this embodiment of the specificity, stability and uniformity of the variety to be measured is correct.
Through high-throughput sequencing and multi-site amplification, the embodiments of the present invention achieve large-sample sampling of the variety to be measured and large-sample sampling of the test regions among individuals of the variety. Combined with comprehensive means such as defining the hybrid strain genotype, defining the cytoplasmic hybrid strain variety and defining the formula for the hybrid strain rate, they achieve the goal of judging the specificity, stability and uniformity of the variety to be measured accurately, quickly and completely, a technical effect that no existing DUS test method achieves. What the present invention detects is PCR products, so primers can easily be designed case by case to detect non-universal test regions.
Take Embodiment One of the present invention as an example. For traditional DUS test techniques, a sample size of 30000 individuals means too much work to complete. In a field DUS test, for example, sampling 30000 rice plants requires planting more than 2 mu of paddy field for 2 years, and more than 70 characters must be investigated for every plant every year. In the widely used SSR molecular DUS test, 30000 DNA extractions, 30000*2231 PCRs and 30000*2231 PCR product detections would be needed (assuming that, as in this embodiment, 2231 universal test regions are detected). Because the workload is excessive, existing molecular DUS tests do not test stability and uniformity at all. Field DUS tests do test uniformity and stability, but their sample sizes are all below 1000 plants, whereas this embodiment sampled 30000 rice plants, so its accuracy is clearly higher.
This embodiment can increase the sample size because all 30000 samples are mixed and then processed as a single sample; compared with the field DUS test, the workload is reduced to the equivalent of 1/30000. Further, all 2231 universal test regions undergo only one mixed amplification and one high-throughput sequencing detection; compared with the SSR molecular DUS test, the workload is reduced to the equivalent of 1/(30000*2231). The present invention therefore achieves large-sample, multi-site detection with a greatly reduced workload, making the DUS test both accurate and simple.
In addition, the database variety genotypes in Embodiment One of the present invention consist of base compositions, which are highly standardized: detecting the same variety by the method of the present invention under different experimental conditions gives the same genotype, so the DUS test need not be repeated under different conditions. The embodiments of the present invention can therefore compare directly against the database variety genotypes and objectively select the approximate variety of the variety to be measured. Existing DUS test techniques are not standardized: the variety to be measured and the approximate variety must be DUS-tested side by side at the same time to obtain a reliable conclusion. To reduce the workload, the approximate variety has to be supplied by the applicant for the variety right, and if the approximate variety is wrong, the legal consequence of an erroneous grant may follow.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. A method for determining the specificity, uniformity and stability of a new hybrid plant variety, characterized in that the method comprises:
obtaining the variant sites between different varieties within the species to which the variety to be measured belongs;
determining the test regions of the variety to be measured from the variant sites, wherein the test regions include universal test regions and at least some of the variant sites lie within the universal test regions; a discrimination value is calculated by a formula (not reproduced in this text), wherein a is the total number of varieties detected in the variation window, bi is the number of varieties having the i-th genotype in the variation window, with bi > 1, and k is the number of genotypes that each comprise more than one variety; the variation window is the detection window centred on each single nucleotide variant site and extended on each side of that site by 1/2 of the sequencing length; the universal test regions are the 6800 variation windows with the greatest discrimination on the cytoplasmic genome or the 200 variation windows with the greatest discrimination and uniform distribution on the nuclear genome; and the genotype is the combination of the multiple single nucleotide variant sites within a test region;
constructing a database containing the genotypes of the different varieties in all the test regions;
after determining the sample size SN of the variety to be measured, drawing samples at random, mixing them and extracting the DNA of the mixed sample, the sample size SN satisfying the following condition: BINOM.INV(SN, M, 0.95)/SN ≤ 1.15*M, wherein BINOM.INV is a function in Excel 2010 and M is the threshold selected for judging the uniformity and stability; the meaning of the condition satisfied by the sample size SN is that even if the hybrid strain rate exceeds the threshold M by only 15%, the sample size allows the stability and uniformity of the variety to be measured to be judged correctly with a probability of 95%;
preparing primers for amplifying the test regions, the primers including universal test region primers;
amplifying the DNA of the mixed sample with the primers to obtain the amplification products of the test regions, the amplification products being used to construct a high-throughput sequencing library;
performing high-throughput sequencing on the high-throughput sequencing library to obtain a sequencing fragment group, the depth CF of the high-throughput sequencing satisfying the following conditions: BINOM.DIST(10, 10, BINOM.DIST(8, 20, BINOM.DIST(0, CF, 0.1%, TRUE), TRUE), FALSE) ≥ 99.9%; 1-BINOM.DIST(10000, 10000, 1-BINOM.DIST(8, 20, 1-BINOM.DIST(99.99%*CF, CF, 99.9989%, TRUE), TRUE), FALSE) ≤ 0.1%; and BINOM.DIST(10*(1-M)*CF, 10*CF, 1-110%*M, TRUE) ≥ 95.0%; wherein CF is the depth of the high-throughput sequencing, M is the threshold selected for judging the uniformity and stability, and BINOM.DIST is a function in Excel 2010; the meaning of the conditions satisfied by the depth CF is as follows: when the hybrid strain rate is as low as 0.1%, there are 10 hybrid strain varieties, and the hybrid strain varieties differ from the variety to be measured at only 20 sites on average, the probability, determined by the depth CF, of detecting all the hybrid strain varieties is ≥ 99.9%; when there are 10000 database varieties and the hybrid strain varieties differ from the variety to be measured at only 20 sites on average, the probability, determined by the depth CF, of misjudging that a hybrid strain variety is present is ≤ 0.1%; and when there are 10 hybrid strain varieties and the true hybrid strain rate exceeds the threshold selected when judging specificity by only 10%, the probability, determined by the depth CF, that the conclusion on stability and uniformity is correct is ≥ 95.0%;
analyzing the sequencing fragment group to obtain the genotypes of the variety to be measured and the hybrid strain genotypes;
comparing the genotypes of the variety to be measured with the genotypes of the different varieties in the database to obtain the approximate variety, the variant sites and the variant sites rate of the variety to be measured;
comparing the hybrid strain genotypes with the genotypes of the different varieties in the database to obtain the hybrid strain varieties, and then calculating the hybrid strain rate;
judging the specificity, uniformity and stability of the variety to be measured using the variant sites, the variant sites rate and the hybrid strain rate.
2. The method according to claim 1, characterized in that the test regions further include non-universal test regions, and the primers further include non-universal test region primers.
3. The method according to claim 2, characterized in that the non-universal test region primers include a first primer and a second primer, the first primer including a first forward primer and a first reverse primer, and the second primer including a second forward primer and a second reverse primer; the first primer and the second primer are each used in a separate amplification to obtain two amplification products of the non-universal test regions, and the two amplification products are mixed in equal amounts and used to construct the high-throughput sequencing library of the separate amplification;
the 5' end of the first forward primer is linked to sequence 1 as shown in SEQ ID NO: 1 of the sequence listing, and the 5' end of the first reverse primer is linked to sequence 2 as shown in SEQ ID NO: 2 of the sequence listing;
the 5' end of the second forward primer is linked to sequence 2 as shown in SEQ ID NO: 2 of the sequence listing, and the 5' end of the second reverse primer is linked to sequence 1 as shown in SEQ ID NO: 1 of the sequence listing.
4. The method according to claim 2, characterized in that the method of judging the specificity, uniformity and stability of the variety to be measured using the variant sites, the variant sites rate and the hybrid strain rate comprises:
when the variant sites rate ≥ SD, or variant sites are present in the non-universal test regions, the variety to be measured has specificity; when the variant sites rate < SD and no variant sites are present in the non-universal test regions, the variety to be measured does not have specificity, wherein SD is the threshold selected for judging specificity;
when the hybrid strain rate of the variety to be measured is ≤ M, the variety to be measured has uniformity and stability; when the hybrid strain rate of the variety to be measured is > M, the variety to be measured does not have uniformity and stability, wherein M is the threshold selected for judging the uniformity and stability;
the hybrid strain rate R = R1 + R2 - R3 - R4 + Rm, wherein:
In the formula for R1 (not reproduced in this text), n1 is the number of nuclear hybrid strain varieties; t1 is the number of all specific hybrid strain nuclear genotypes of the i1-th nuclear hybrid strain variety; i1j1 denotes the j1-th specific hybrid strain nuclear genotype of the i1-th nuclear hybrid strain variety after all of its specific hybrid strain nuclear genotypes are sorted by frequency from low to high; R1i1j1 is the frequency of the i1j1-th specific hybrid strain nuclear genotype; R1 is the sum of the hybrid strain rates of the nuclear hybrid strain varieties calculated from the hybrid strain nuclear genotypes, and the hybrid strain rate of a nuclear hybrid strain variety is 2 times the average frequency of the specific hybrid strain nuclear genotypes that remain after those with the lowest 80% and the highest 10% of frequencies in that variety are removed;
In the formula for R2 (not reproduced in this text), t2 is the number of hybrid strain nuclear genotypes, other than those possessed by the nuclear hybrid strain varieties, whose frequency is ≥ 0.17%; i2 denotes the i2-th such hybrid strain nuclear genotype after all of them are sorted by frequency from low to high; R2i2 is the frequency of the i2-th hybrid strain nuclear genotype; R2 is the hybrid strain rate calculated from the hybrid strain nuclear genotypes other than those possessed by the nuclear hybrid strain varieties, and is 2 times the average of the values that remain after the lowest 80% and the highest 10% of these frequencies are removed;
In the formula for R3 (not reproduced in this text), n2 is the number of cytoplasmic hybrid strain varieties; R3i3 is the hybrid strain rate of the i3-th cytoplasmic hybrid strain variety; t3 is the number of all specific hybrid strain cytoplasmic genotypes of the i3-th cytoplasmic hybrid strain variety; i3j3 denotes the j3-th specific hybrid strain cytoplasmic genotype of the i3-th cytoplasmic hybrid strain variety after all of its specific hybrid strain cytoplasmic genotypes are sorted by frequency from low to high; R3i3j3 is the frequency of the i3j3-th specific hybrid strain cytoplasmic genotype; R3 is the sum of the hybrid strain rates of the cytoplasmic hybrid strain varieties calculated from the hybrid strain cytoplasmic genotypes, and the hybrid strain rate of a cytoplasmic hybrid strain variety is the average frequency of the specific hybrid strain cytoplasmic genotypes that remain after those with the lowest 80% and the highest 10% of frequencies in that variety are removed;
In the formula for R4 (not reproduced in this text), t4 is the number of hybrid strain cytoplasmic genotypes, other than those possessed by the cytoplasmic hybrid strain varieties, whose frequency is ≥ 0.17%; i4 denotes the i4-th such hybrid strain cytoplasmic genotype after all of them are sorted by frequency from low to high; R4i4 is the frequency of the i4-th hybrid strain cytoplasmic genotype; R4 is the hybrid strain rate calculated from the hybrid strain cytoplasmic genotypes other than those possessed by the cytoplasmic hybrid strain varieties, and is the average of the values that remain after the lowest 80% and the highest 10% of these frequencies are removed;
In the formula for Rm (not reproduced in this text), t5 is the number of hybrid-specific test regions; i5 denotes the i5-th hybrid-specific test region; Rmi5 is the frequency of the female-parent genotype in the i5-th hybrid-specific test region; Rfi5 is the frequency of the male-parent genotype in the i5-th hybrid-specific test region; Rm is the hybrid strain rate due to maternal selfing, and is the average, over the hybrid-specific test regions, of the difference between the frequency of the female-parent genotype and the frequency of the male-parent genotype;
Int() is the integer (rounding) function;
the nuclear hybrid strain variety is a hybrid strain variety obtained by calculation using only nuclear genotypes, and the cytoplasmic hybrid strain variety is a hybrid strain variety obtained by calculation using only cytoplasmic genotypes; the specific hybrid strain nuclear genotypes are all the hybrid strain nuclear genotypes that belong to only one nuclear hybrid strain variety; the specific hybrid strain cytoplasmic genotypes are all the hybrid strain cytoplasmic genotypes that belong to only one cytoplasmic hybrid strain variety; a hybrid strain nuclear genotype is a hybrid strain genotype that is a nuclear genotype; a hybrid strain cytoplasmic genotype is a hybrid strain genotype that is a cytoplasmic genotype; in a hybrid-specific test region, the female-parent genotype differs from the male-parent genotype, the female-parent genotype differs from the genotypes of all the nuclear hybrid strain varieties, and the male-parent genotype also differs from the genotypes of all the nuclear hybrid strain varieties; the female-parent genotype is the genotype in the variety to be measured that is identical to the genotype of the female parent; the male-parent genotype is the genotype in the variety to be measured that is identical to the genotype of the male parent;
the nuclear genotype is a genotype located on the nuclear genome; the cytoplasmic genotype is a genotype located on the cytoplasmic genome.
5. The method according to claim 4, characterized in that the method further comprises estimating, in the following manner, the probability that the conclusion on the uniformity and stability of the variety to be measured is correct: when the variety to be measured has uniformity and stability, the probability that the conclusion is correct is ≥ BINOM.DIST(M*SN, SN, R, TRUE) * BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE); when the variety to be measured does not have uniformity and stability, the probability that the conclusion is correct is ≥ BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) * BINOM.DIST(∑SeN*(1-M), ∑SeN, 1-R, TRUE); wherein M is the threshold selected for judging the uniformity and stability, and ∑SeN is the total number of sequencing fragments of all the test regions that contain the genotypes whose frequencies are used to calculate the hybrid strain rate R; BINOM.DIST(M*SN, SN, R, TRUE) is the probability that, when SN samplings of the variety to be measured are made, the hybrid strain rate R actually sampled is less than the threshold M; BINOM.DIST(∑SeN*M, ∑SeN, R, TRUE) is the probability that, when ∑SeN samplings of the variety to be measured are made, the hybrid strain rate R actually sampled is less than the threshold M; BINOM.DIST((1-M)*SN, SN, (1-R), TRUE) is the probability that, when SN samplings of the variety to be measured are made, the hybrid strain rate R actually sampled is greater than the threshold M; BINOM.DIST(∑SeN*(1-M), ∑SeN, 1-R, TRUE) is the probability that, when ∑SeN samplings of the variety to be measured are made, the hybrid strain rate R actually sampled is greater than the threshold M; the frequency of a genotype is the proportion, within the sequencing fragment group, of the sequencing fragments representing that genotype among all the sequencing fragments of the test region in which the genotype is located.
6. The method according to claim 4, characterized in that, when no variant sites are present in the non-universal test regions: if the variety to be measured is judged to have specificity, the probability that the conclusion is correct is ≥ BINOM.DIST((1-SD)*TRN, TRN, 1-OD, TRUE); if the variety to be measured is judged not to have specificity, the probability that the conclusion is correct is ≥ BINOM.DIST(SD*TRN, TRN, OD, TRUE); wherein TRN is the number of successfully detected test regions, OD is the variant sites rate, and BINOM.DIST is a function in Excel 2010; the probability that the conclusion is correct means, when the variety to be measured is judged to have specificity, the probability that the variant sites rate is greater than SD, and, when the variety to be measured is judged not to have specificity, the probability that the variant sites rate is less than SD; the successfully detected test regions are obtained by analyzing the sequencing fragment group.
7. The method according to claim 1, characterized in that the method of obtaining the hybrid strain varieties comprises: a hybrid strain variety is a variety present in the database for which the number of test regions in which its potential hybrid strain genotype is identical to a hybrid strain genotype accounts for ≥ 60% of the total number of test regions in which the variety has a potential hybrid strain genotype; a hybrid strain genotype is a potential hybrid strain genotype with a frequency ≥ 0.02%; a potential hybrid strain genotype differs from all the genotypes of the variety to be measured by ≥ 2 bases, or the differing bases include an insertion or deletion of non-contiguous bases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510150504.6A CN104805189B (en) | 2015-03-31 | 2015-03-31 | A kind of method of the specificity for determining hybrid plant new varieties, uniformity and stability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104805189A CN104805189A (en) | 2015-07-29 |
CN104805189B true CN104805189B (en) | 2017-12-15 |
Family
ID=53690385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510150504.6A Active CN104805189B (en) | 2015-03-31 | 2015-03-31 | A kind of method of the specificity for determining hybrid plant new varieties, uniformity and stability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104805189B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111524022B (en) * | 2020-03-12 | 2023-04-25 | 中国农业科学院蔬菜花卉研究所 | Plant variety DUS testing method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9290774B2 (en) * | 2011-12-01 | 2016-03-22 | Rijk Zwaan Zaadteelt En Zaadhandel B.V. | Lettuce variety 79-69 RZ “triplex RZ” |
CN103088120B (en) * | 2012-11-29 | 2014-10-15 | 北京百迈客生物科技有限公司 | Large-scale genetic typing method based on SLAF-seq (Specific-Locus Amplified Fragment Sequencing) technology |
CN103060318B (en) * | 2013-01-11 | 2014-08-13 | 山东省农业科学院作物研究所 | SSR (Simple Sequence Repeat) core primer group developed based on whole genome sequence of foxtail millet and application of SSR core primer group |
CN103194537B (en) * | 2013-03-13 | 2014-07-09 | 山东省农业科学院作物研究所 | Cabbage SSR fingerprint construction method |
US20140283187A1 (en) * | 2013-03-13 | 2014-09-18 | Rijk Zwaan Zaadteelt En Zaadhandel B.V. | Hybrid melon variety 34-309 rz |
US20140182006A1 (en) * | 2014-02-26 | 2014-06-26 | Nunhems B.V. | Hybrid carrot varity NUN 89201 |
CN104328507B (en) * | 2014-10-11 | 2016-03-30 | 中国水稻研究所 | A kind of SNP chip, Preparation method and use for rice varieties qualification |
Also Published As
Publication number | Publication date |
---|---|
CN104805189A (en) | 2015-07-29 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104846076B (en) | A method of specificity, consistency and the stability of measurement cross-bred rape new varieties | |
Williams et al. | [51] Genetic analysis using random amplified polymorphic DNA markers | |
CN103114150B (en) | The method that storehouse order-checking is identified is built with the mononucleotide polymorphism site of Bayesian statistic based on enzyme action | |
CN107937502A (en) | A kind of method for screening the high polymorphic molecular marker site of microorganism | |
CN106834507A (en) | DMD gene traps probe and its application in DMD detection in Gene Mutation | |
CN104480205A (en) | Method of establishing animal paternity identification system on basis of whole genome STR | |
US20210285063A1 (en) | Genome-wide maize snp array and use thereof | |
CN109182538A (en) | Mastadenitis of cow key SNPs site rs88640083 and 2b-RAD Genotyping and analysis method | |
CN113136422A (en) | Method for detecting high-throughput sequencing sample contamination by grouping SNP sites | |
CN104830975A (en) | Novel method for testing corn parent source authenticity and proportion | |
US11739374B2 (en) | Methods and compositions for pathogen detection in plants | |
CN104805189B (en) | A kind of method of the specificity for determining hybrid plant new varieties, uniformity and stability | |
CN107815489A (en) | A kind of method for screening the high polymorphic molecular marker site of plant | |
CN104805182B (en) | A kind of method for the specificity, uniformity and stability for determining new hybrid rice varieties | |
CN104805184B (en) | A kind of method of the specificity for testing pure lines new rice variety, uniformity and stability | |
CN105603081B (en) | Non-diagnosis-purpose qualitative and quantitative detection method for intestinal microorganisms | |
CN104805191B (en) | A kind of method of the specificity for testing pure lines corn variety, uniformity and stability | |
CN104805190B (en) | A kind of method of the specificity for determining hybrid maize variety, uniformity and stability | |
CN104805187B (en) | A kind of method of the specificity for testing pure lines new soybean varieties, uniformity and stability | |
CN105624298A (en) | Method for detecting genetically modified components of rape | |
CN104846077B (en) | A method of specificity, consistency and the stability of test pure lines new rape variety | |
CN104805186B (en) | A kind of method for testing corn variety substance derived relation | |
CN104805183A (en) | Method for testing distinctness, uniformity and stability of pure-line plant new variety | |
CN104805185B (en) | A kind of method of test plants kind substance derived relation | |
CN104805193A (en) | Method for testing substantive derivation relation of rice varieties | |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
EXSB | Decision made by SIPO to initiate substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |