CN106520961B

CN106520961B - Corn microsatellite marker locus development method and length detection method of microsatellite markers in microsatellite marker locus

Info

Publication number: CN106520961B
Application number: CN201611030504.3A
Authority: CN
Inventors: 李论; 彭海; 周俊飞; 方治伟
Original assignee: Jianghan University
Current assignee: Jianghan University
Priority date: 2016-11-16
Filing date: 2016-11-16
Publication date: 2020-03-27
Anticipated expiration: 2036-11-16
Also published as: CN106520961A

Abstract

The invention discloses a method for developing corn microsatellite marker loci and a method for detecting the length of the microsatellite markers in those loci. The development method comprises the following steps: obtaining a mixed sample; extracting the genome of the mixed sample; fragmenting the genome to obtain genome fragments; hybridizing each probe of a probe set with the genome fragments separately to obtain a plurality of hybridization solutions; purifying the successfully hybridized genome fragments from each hybridization solution; mixing the purified hybrid genome fragments and detecting them by high-throughput sequencing; screening for effective high-throughput sequencing fragments; and classifying the effective high-throughput sequencing fragments. The detection method comprises the following steps: selecting a microsatellite marker locus to be detected; and amplifying the microsatellite marker in that locus with multiplex amplification primers to obtain the length of the microsatellite marker in the locus. The method is simple, rapid, comprehensive and accurate.

Description

Corn microsatellite marker locus development method and length detection method of microsatellite markers in microsatellite marker locus

Technical Field

The invention relates to the field of biotechnology, in particular to a corn microsatellite marker locus development method and a microsatellite marker length detection method in the microsatellite marker locus.

Background

A microsatellite marker, also called a Short Tandem Repeat (STR) or Simple Sequence Repeat (SSR), consists of tandem repeats of a repeat unit of 2 or more nucleotides. Microsatellite marker loci are the loci on a genome that contain microsatellite markers; they are abundant and evenly distributed over the genome, and the development of microsatellite marker loci refers to the process of searching for them on the genome. In different samples, the number of repeats of the repeat unit at the same microsatellite marker locus may differ, so the marker length varies among samples; the polymorphism of a microsatellite marker locus therefore mainly refers to the length polymorphism of the microsatellite markers at that locus. Microsatellite marker detection techniques are techniques that determine the length of the microsatellite marker in a microsatellite marker locus. Because the length polymorphism of microsatellite markers can be used to establish the identity of samples, microsatellite marker technology is widely applied, for example in biodiversity identification and in the fingerprint (identity card) identification of animal and plant varieties.

The traditional development and detection of maize microsatellite marker loci comprises the following steps: extracting the genome, fragmenting the genome, ligating adapters, amplifying, hybridizing with simple repetitive sequences, purifying the hybridization product, cloning the hybridization product, transforming Escherichia coli with the cloning product, picking single clones, performing first-generation sequencing on the target site of each single clone, analyzing the sequencing results to obtain microsatellite marker loci, detecting the polymorphism of the microsatellite marker loci in a plurality of corn samples, selecting the microsatellite marker loci with high polymorphism, and then, for each sample to be detected, amplifying each microsatellite marker locus to be detected one by one and detecting the microsatellite marker by electrophoresis.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

first, the development and detection process for corn microsatellite marker loci is complex, low-throughput and extremely time-consuming and laborious; second, electrophoretic detection of microsatellite marker loci has low resolution, so the detection results are inaccurate and must be corrected with reference samples and the like to become accurate. Problems derived from this include: the number of developed microsatellite marker loci is small, usually fewer than 200, which is about 1% of all the microsatellite marker loci on the genome; the corn samples used to detect the polymorphism of the microsatellite marker loci are few, usually a few dozen, so the polymorphism results are inaccurate; the conservation of the flanking sequences of the microsatellite marker loci is unknown, which affects the universality of the primers used to amplify the loci; and the number of microsatellite marker loci detected is limited, generally a few dozen per sample, so that the DNA identity card information established for the sample is incomplete and inaccurate.

Disclosure of Invention

In order to solve the problems of the prior art, the embodiment of the invention provides a corn microsatellite marker locus development method and a microsatellite marker length detection method in the microsatellite marker locus. The technical scheme is as follows:

in one aspect, an embodiment of the present invention provides a method for developing a maize microsatellite marker locus, including:

mixing n corn samples with polymorphism in equal mass to obtain a mixed sample, wherein n is more than 1;

extracting the genome of the mixed sample;

fragmenting the genome of the mixed sample to obtain a genome fragment;

using a plurality of probes with simple repetitive sequences as probe sets, hybridizing the genomic fragments with each probe in the probe sets respectively to obtain a plurality of hybridization solutions, and purifying the genomic fragments successfully hybridized in the hybridization solutions respectively to obtain a plurality of purified hybrid genomic fragments;

after a plurality of purified hybrid genome segments are mixed in equal mass, detecting the mixed purified hybrid genome segments by using high-throughput sequencing to obtain a first high-throughput sequencing segment;

screening said first high-throughput sequencing fragment for an effective high-throughput sequencing fragment comprising a microsatellite marker within a microsatellite marker locus;

classifying the effective high-throughput sequencing fragments according to the homology of the sequences on the two sides of the microsatellite marker in the effective high-throughput sequencing fragments, wherein effective high-throughput sequencing fragments of the same class are effective high-throughput sequencing fragments of the same microsatellite marker locus; if the number of effective high-throughput sequencing fragments of the same microsatellite marker locus is greater than or equal to α1, one microsatellite marker locus is successfully developed, wherein α1 is a first judgment threshold and α1 ≥ (high-throughput sequencing depth × proportion of effective high-throughput sequencing fragments / number of microsatellite marker loci detectable on the genome) × probability.

In general, to facilitate purification of the successfully hybridized genomic fragments from the hybridization solution, the probe may carry a functional label, for example:

Hybridizing a biotin-labeled probe with a simple repetitive sequence with the genome segment to obtain a hybridization solution;

and purifying the successfully hybridized genome fragment in the hybridization solution by using streptavidin magnetic beads to obtain a purified genome fragment.

In the above step, because the probe has a biotin label, the successfully hybridized genome segment is also labeled with biotin, so that the successfully hybridized genome segment can be purified from the hybridization solution by using streptavidin magnetic beads. The technology of using biotin labeling and streptavidin magnetic bead purification is a well-known technology.

Specifically, α 1 is 20 or more.

Specifically, the microsatellite marker refers to a sequence formed by tandem repeat of a repeating unit consisting of more than or equal to 2 bases.

Specifically, the number of bases of the sequences on both sides of the microsatellite marker in the effective high-throughput sequencing fragment is more than or equal to 1, and the number of bases of the sequences on at least one side of the microsatellite marker in the effective high-throughput sequencing fragment is more than or equal to 10.

Specifically, the method for selecting the n samples having polymorphisms includes: selecting corn samples with different external morphology, corn samples with different biological classifications, corn samples carrying different markers, or corn samples of wild resources from different ecological regions.

Specifically, the number of the probes is 12, the repeating unit in the simple repeating sequence of each probe is CT, GA, TG, AC, TA, TGT, CCA, ATC, CCT, AGA, ATG or CAA, the repeating number of the simple repeating sequence of each probe is 6-20, preferably 6-15, for example, the repeating number is 8 or 12.

Specifically, the sequence of the probe is shown as SEQ ID NO 1-SEQ ID NO 12 in the sequence table.

In another aspect, an embodiment of the present invention provides a method for detecting a length of a microsatellite marker in a microsatellite marker locus successfully developed by the above development method, where the method includes:

selecting a microsatellite marker locus to be detected from the successfully developed microsatellite marker loci;

amplifying the microsatellite marker in the microsatellite marker locus to be detected by using a multiplex amplification primer to obtain an amplification product, carrying out high-throughput sequencing on the amplification product to obtain a second high-throughput sequencing fragment, and analyzing the second high-throughput sequencing fragment to obtain the length of the microsatellite marker in the microsatellite marker locus.

Specifically, the method for selecting the microsatellite marker loci to be detected from the successfully developed microsatellite marker loci comprises the following steps:

the criterion for selecting the microsatellite marker loci to be detected is the maximum H value, wherein the H value is the polymorphism index of the microsatellite marker locus,

wherein i denotes the i-th class when the effective high-throughput sequencing fragments of the microsatellite marker locus are classified according to the length of the microsatellite marker, i being a natural number; and a_i is the ratio of the number of effective high-throughput sequencing fragments of the i-th class to the total number of effective high-throughput sequencing fragments.

Specifically, the method for preparing the multiplex amplification primer comprises the following steps:

extracting the microsatellite marker from all the effective high-throughput sequencing fragments of the selected microsatellite marker locus to be detected and selecting the longest microsatellite marker as the microsatellite marker of the template sequence of the multiplex amplification primer;

extracting the left sequences of the microsatellite markers from all the effective high-throughput sequencing fragments of the selected microsatellite marker locus to be detected; selecting all of these sequences longer than α 2 bases and, among them, the sequence with the highest frequency as the reference sequence; aligning the reference sequence with the left sequences of all the microsatellite markers to obtain the coverage multiple and the variation frequency of each base in the reference sequence; and, in the reference sequence, changing to N every base whose coverage multiple is less than or equal to 1/α 3 or whose variation frequency is greater than or equal to α 3, the result being taken as the left sequence of the template sequence of the multiplex amplification primers, wherein N represents any one of the four bases A, T, C and G; α 2 is a second judgment threshold, α 2 is (the average length of the first high-throughput sequencing fragment-the length of the microsatellite marker loci) 2; α 3 is a third judgment threshold, α 3 is not less than or equal to 365 × the first high-throughput sequencing fragment (the accuracy of the first high-throughput sequencing fragment is obtained by taking the sequence of the multiple amplification primers as the template sequences of the multiple amplification primers);

obtaining a right sequence of the template sequence of the multiplex amplification primer sequence according to a method identical to the left sequence of the template sequence of the multiplex amplification primer;

and sequentially connecting the left sequences of the template sequences of the multiple amplification primers, the microsatellite markers of the template sequences of the multiple amplification primers and the right sequences of the template sequences of the multiple amplification primers to obtain the template sequences of the multiple amplification primers of the microsatellite marker loci, and obtaining the multiple amplification primers by utilizing the template sequences of the multiple amplification primers of the microsatellite marker loci.
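The template construction described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented procedure itself: the function names `mask_flank` and `build_template` are hypothetical, flanks are compared without gaps and anchored at the base adjacent to the microsatellite marker (a simplification standing in for the alignment step), and the thresholds are passed in as plain parameters (`alpha2`, `min_coverage`, `max_variation`) rather than derived from the α2 and α3 definitions.

```python
from collections import Counter


def mask_flank(flanks, alpha2, min_coverage, max_variation):
    """Return the most frequent flank longer than alpha2, with unreliable bases set to N.

    Flanks must be oriented so that their LAST base is adjacent to the marker.
    Assumption: ungapped, marker-anchored comparison replaces a real alignment.
    """
    candidates = [f for f in flanks if len(f) > alpha2]
    if not candidates:
        raise ValueError("no flank longer than alpha2")
    reference = Counter(candidates).most_common(1)[0][0]
    masked = []
    for offset in range(1, len(reference) + 1):      # 1 = base next to the marker
        ref_base = reference[-offset]
        column = [f[-offset] for f in flanks if len(f) >= offset]
        coverage = len(column)                       # number of flanks covering this base
        variation = sum(base != ref_base for base in column) / coverage
        unreliable = coverage <= min_coverage or variation >= max_variation
        masked.append("N" if unreliable else ref_base)
    return "".join(reversed(masked))


def build_template(fragments, alpha2, min_coverage, max_variation):
    """fragments: list of (left_flank, marker, right_flank) tuples for one locus."""
    lefts = [left for left, _, _ in fragments]
    markers = [marker for _, marker, _ in fragments]
    # Reverse the right flanks so that, like the left flanks, they end at the marker.
    rights = [right[::-1] for _, _, right in fragments]
    left = mask_flank(lefts, alpha2, min_coverage, max_variation)
    right = mask_flank(rights, alpha2, min_coverage, max_variation)[::-1]
    # Left flank + longest marker + right flank, joined in that order.
    return left + max(markers, key=len) + right


# Hypothetical reads: two of ten differ at one base of the left flank.
fragments = ([("ACGTACGTAC", "TG" * 21, "GATTACAGAT")] * 8
             + [("ACGTACGTTC", "TG" * 22, "GATTACAGAT")] * 2)
template = build_template(fragments, alpha2=5, min_coverage=1, max_variation=0.2)
print(template)
```

In this toy input the variable base in the left flank is replaced by N, so a primer designed on the template would avoid relying on that position.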

Specifically, the method for obtaining the length of the microsatellite marker in the microsatellite marker locus comprises the following steps: obtaining a left border sequence and a right border sequence of the second high-throughput sequencing fragment after removing the microsatellite marker from the second high-throughput sequencing fragment; aligning each of the second high-throughput sequencing fragments to the microsatellite marker locus to be detected by using the left border sequence and the right border sequence; intercepting the microsatellite marker in the second high-throughput sequencing fragments of each microsatellite marker locus to be detected; classifying the obtained microsatellite markers according to length, and calculating the truth degree of the i-th class, R_i = N_i/N_max, wherein i is the i-th class when classified by the length of the microsatellite marker in said effective high-throughput sequencing fragment of said microsatellite marker locus, N_i is the number of said second high-throughput sequencing fragments of the i-th class, and N_max is the maximum of the numbers of the second high-throughput sequencing fragments over all classes; if the truth degree R_i ≥ α4, the length of the microsatellite marker of the i-th class is a length of the microsatellite marker in the microsatellite marker locus; if the truth degree R_i < α4, the length of the microsatellite marker of the i-th class is not a length of the microsatellite marker in the microsatellite marker locus, wherein α4 is the fourth judgment threshold and α4 is 0.3.
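The truth-degree rule can be illustrated with a short Python sketch. The function name `call_marker_lengths` and the read counts in the example are hypothetical; only the ratio R_i = N_i/N_max and the threshold α4 = 0.3 come from the description.

```python
from collections import Counter


def call_marker_lengths(marker_lengths, alpha4=0.3):
    """marker_lengths: marker length (in bases) of each read aligned to one locus.

    Returns {length: R_i} for the length classes whose truth degree R_i >= alpha4.
    """
    counts = Counter(marker_lengths)          # N_i for every length class
    if not counts:
        return {}
    n_max = max(counts.values())              # N_max, the largest class
    return {length: n / n_max
            for length, n in sorted(counts.items())
            if n / n_max >= alpha4}


# Hypothetical counts: 100 reads of 42 bases, 60 of 44 bases, 10 of 40 bases.
lengths = [42] * 100 + [44] * 60 + [40] * 10
print(call_marker_lengths(lengths))
```

Here the 40-base class has R = 10/100 = 0.1, below 0.3, so it is rejected, while the 42-base and 44-base classes are both accepted as marker lengths at the locus.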

Specifically, the method for fragmenting the genome of the mixed sample is mechanical disruption or enzyme digestion.

The technical scheme provided by the embodiment of the invention has the following beneficial effects: the technology for developing and detecting corn microsatellite marker loci provided by the invention is simple, rapid, high-throughput, comprehensive and accurate. The time required is shortened from 1 to 2 years to 1 to 2 days; the proportion of microsatellite marker loci developed rises from about 1% of all the microsatellite marker loci in the genome to close to 100%; the number of corn samples used to test the polymorphism of the microsatellite marker loci increases from a few dozen to essentially unlimited, which greatly improves the accuracy of the polymorphism results; the conservation of the flanking sequences of the microsatellite marker loci can be obtained, which ensures the universality of the primers used to amplify the loci; multiple microsatellite marker loci are detected together as if they were one locus rather than one by one, and multiple corn samples to be detected are processed in a single run rather than in repeated runs, which greatly reduces the workload of microsatellite marker locus detection, so that the number of microsatellite marker loci detected is almost unlimited. The detection result for a microsatellite marker locus is obtained as bases, with an accuracy close to 100%; the detection resolution for the microsatellite marker locus is raised to the highest possible resolution, a single base; and the detection result does not need to be corrected against reference varieties.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.

The procedures or specifications of the procedures not shown or described in detail in the examples of the present invention are well known to those skilled in the art of molecular biology. Reagents or biological materials not mentioned in the examples of the present invention are commonly available reagents or biological materials, which are well known to those skilled in ordinary molecular biology and commercially available.

Examples

The development method of the maize microsatellite marker locus comprises the following steps:

n corn samples with polymorphism are mixed in equal amount to obtain a mixed sample, wherein n is more than 1.

Maize samples with polymorphism include maize samples with different external morphology (morphological polymorphism), maize samples with different biological classifications (such as different varieties), samples with different markers (such as protein markers), or wild-resource maize samples from different ecological regions. The more maize samples are selected (the larger the value of n), the richer the polymorphism and the wider the applicability of the developed microsatellite marker loci. In this example, the species for which microsatellite marker loci are developed is maize, and the selected samples are the following maize varieties: "nong Hua 101", "Weike 702", "Xian Yu 335", "Longping 208", "Dafeng 30", "Ningyu 507", "Liyu 88", "Denghai 618", "Liang Yu No. 99", "Huanong 138", "nong Hua 101", "Jincheng 508", "Ningyu 666", "Yuyu Yu 30", "Jingke 665" and "Ling Chuan 808". These 16 maize varieties are hybrids widely used in China, publicly known and commercially available. Here, the microsatellite marker refers to a sequence formed by the tandem repeat of a repeating unit consisting of 2 or more bases.

Equal masses of leaves from the above 16 corn varieties are taken and mixed, and the genome of the mixed sample is extracted according to the operation manual of the novel plant genome extraction kit (product number DP320) of Tiangen Biochemical Technology (Beijing) Co. The corn sample selected in this embodiment is leaf tissue; as is common knowledge, the corn sample may also be taken from seeds or the like.

The genome of the mixed sample is fragmented to obtain genome fragments. Specifically, the genome of the mixed sample is fragmented by mechanical disruption or enzymatic digestion. The fragment length is kept within the range of fragment lengths detectable by high-throughput sequencing. In this embodiment, the high-throughput sequencing uses a PI chip on a Proton high-throughput sequencer, with a detection length of about 200 bp, so the peak of the fragment length distribution is also kept as close to 200 bp as possible. In this example, the genome of the mixed sample was sheared with a Covaris S220 focused-ultrasonicator (manufactured by Covaris, USA, model S220), following the method for obtaining 200 bp (peak) target fragments described in the instrument manual "DNA Shearing with S220/E220 Focused-ultrasonicator" (version number: 010308 Rev G). After shearing, the genome fragments of the mixed sample were quantified with a Q5000 spectrophotometer (manufactured by Quawell, USA) according to its double-stranded DNA procedure, and then diluted or concentrated to 100 ng/μL to obtain the genome fragments.

A plurality of biotin-labeled probes having simple repetitive sequences were used as the probe set, and the probes were hybridized with the genome fragments to obtain hybridization solutions. The number of bases of the repeating unit in a probe having a simple repetitive sequence is 2 or more. Specifically, the repeat unit in the simple repetitive sequence of each probe is CT, GA, TG, AC, TA, TGT, CCA, ATC, CCT, AGA, ATG or CAA; these 12 probes can hybridize to all possible microsatellite markers having repeat units of 2 bases and 3 bases, and can therefore be used to capture microsatellite markers from genome fragments in all species. In preliminary experiments, the efficiency of capturing microsatellite markers with probes of different lengths was tested; the efficiency is higher when the number of repeats in the simple repetitive sequence of the probe is 6-20, preferably 6-15, for example 8 or 12. In this embodiment, the probe set comprises 12 probes, whose sequences are shown as SEQ ID NO 1-SEQ ID NO 12 in the sequence table. The probes are synthesized by Beijing Optimalaceae New Biotechnology Limited and labeled with biotin at the 5' end. Preliminary experiments also showed that capturing microsatellite markers from the genome fragments with each probe separately is more efficient than capturing them with all probes mixed together, so in this embodiment the microsatellite markers are captured from the genome fragments with each probe separately. Specifically, each probe is dissolved in enzyme-free water to the same molar concentration (10 pM/μL), and 1 μL of each of the 12 probes in the probe set is separately mixed evenly with 5 μg of the genome fragments of the mixed sample and then hybridized, giving 12 hybridization solutions.
The procedure for hybridization was: 95 ℃ for 10 minutes, 65 ℃ for 10 minutes and 37 ℃ for 10 minutes.
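How a simple-repeat probe set of this kind can be written out is sketched below in Python. This is an illustration only, assuming each probe is just its repeat unit repeated a fixed number of times; the function name `build_probes` and the choice of 8 repeats are illustrative, and the actual probe sequences are those given as SEQ ID NO 1-SEQ ID NO 12 in the sequence table.

```python
# Repeat units of the 12 probes, as listed in the description.
REPEAT_UNITS = ["CT", "GA", "TG", "AC", "TA", "TGT",
                "CCA", "ATC", "CCT", "AGA", "ATG", "CAA"]


def build_probes(units=REPEAT_UNITS, repeats=8):
    """Return {repeat unit: probe sequence} for a given number of repeats."""
    if not 6 <= repeats <= 20:
        # The description reports good capture efficiency for 6-20 repeats.
        raise ValueError("repeat number should be between 6 and 20")
    return {unit: unit * repeats for unit in units}


probes = build_probes(repeats=8)
for unit, sequence in probes.items():
    print(f"({unit}){8}: {sequence}")
```

Each entry corresponds to one separate hybridization reaction, in line with the one-probe-per-reaction setup of this embodiment.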

The successfully hybridized genome fragments in the hybridization solutions are purified using streptavidin magnetic beads to obtain purified genome fragments. Specifically, each of the 12 hybridization solutions is purified separately with streptavidin magnetic beads as follows: one of the 12 hybridization solutions is placed on a magnetic rack (manufactured by Invitrogen, USA) until the solution clears, the solution is aspirated, and the magnetic beads are washed twice with enzyme-free water; 10 μL of enzyme-free water is then mixed with the streptavidin magnetic beads, heated in a PCR instrument at 95 ℃ for 5 minutes, and quickly placed on the magnetic rack; the resulting solution is the purified hybrid genome fragments of the first probe. The purified hybrid genome fragments of all 12 probes are obtained in turn in the same manner and mixed together, which finally gives the purified hybrid genome fragments of all probes. To purify the successfully hybridized genome fragments, this embodiment uses biotin-labeled probes with simple repetitive sequences in combination with streptavidin magnetic beads; in other embodiments, the hybridization and purification of the genome fragments can be performed in other ways.

The purified hybrid genome fragments are detected by second-generation high-throughput sequencing to obtain the first high-throughput sequencing fragments. A second-generation high-throughput sequencing library is constructed using a DNA library preparation kit (manufactured by NEB, UK, product code E6270L) according to the operation manual of the kit. Emulsion PCR (ePCR) amplification before sequencing is performed on the resulting library using the Ion PI Template OT2 200 Kit v2 (manufactured by Invitrogen, USA, product code 4485146) according to the operation manual of the kit, giving an ePCR amplification product. High-throughput sequencing is then performed on a Proton second-generation high-throughput sequencer using the ePCR amplification product and the Ion PI Sequencing 200 Kit v2 (manufactured by Invitrogen, USA, Cat. No. 4485149), again according to the manual of the kit. In this example, the high-throughput sequencing amount is set to 10M sequencing fragments (1M = 1,000,000) and the sequencing length is set to 500 cycles; after sequencing is finished, the first high-throughput sequencing fragments are obtained.

Effective high-throughput sequencing fragments are screened from the first high-throughput sequencing fragments. An effective high-throughput sequencing fragment contains the microsatellite marker of a microsatellite marker locus, has at least 1 base of sequence on each side of the microsatellite marker, and has at least 10 bases of sequence on at least one side of the microsatellite marker. Each of the first high-throughput sequencing fragments is analyzed to determine whether it contains a microsatellite marker, and the first high-throughput sequencing fragments that do not contain one are removed. For the retained first high-throughput sequencing fragments, it is then checked whether the sequences on both sides of the microsatellite marker each have at least 1 base; if so, the microsatellite marker is complete within the fragment. This is necessary because the polymorphism of a microsatellite marker is a length polymorphism, and the length can only be obtained correctly, and the subsequent analysis carried out correctly, if the microsatellite marker is complete. A first high-throughput sequencing fragment in which the sequences on both sides of the microsatellite marker are each shorter than 10 bases cannot be used reliably in the subsequent homology analysis, because such short sequences introduce errors; these fragments are therefore also removed. The first high-throughput sequencing fragments that remain after these steps are the effective high-throughput sequencing fragments.

To determine whether each of the first high-throughput sequencing fragments contains a microsatellite marker, analysis software commonly used in the prior art can be used, or each of the first high-throughput sequencing fragments can simply be examined manually.
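The screening rule can be sketched with a regular expression in Python. This is a minimal illustration, not the software used in the embodiment: the names `split_read` and `is_effective` are hypothetical, and it assumes that a microsatellite marker is a 2- or 3-base unit repeated at least `MIN_REPEATS` times (a minimum repeat count is not stated in the description and is chosen here only for the example).

```python
import re

# Assumption: a marker is a 2-3 base unit repeated at least MIN_REPEATS times.
MIN_REPEATS = 6
SSR_PATTERN = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (MIN_REPEATS - 1))


def split_read(read):
    """Return (left_flank, marker, right_flank) for the longest repeat run, or None."""
    best = None
    for match in SSR_PATTERN.finditer(read):
        if len(set(match.group(1))) == 1:        # skip homopolymer runs such as AAAA...
            continue
        if best is None or len(match.group(0)) > len(best.group(0)):
            best = match
    if best is None:
        return None
    return read[:best.start()], best.group(0), read[best.end():]


def is_effective(read):
    """Apply the flank rules: >= 1 base on both sides, >= 10 bases on at least one side."""
    parts = split_read(read)
    if parts is None:
        return False
    left, _, right = parts
    return len(left) >= 1 and len(right) >= 1 and max(len(left), len(right)) >= 10


reads = [
    "ACGTACGTACCA" + "TG" * 20 + "C",        # 12-base left flank, 1-base right flank
    "TG" * 20 + "ACGTACGTACCA",              # marker starts at the read end: incomplete
    "ACGTA" + "TG" * 20 + "CCATT",           # both flanks shorter than 10 bases
    "ACGTACGTACGATCGATCAG",                  # no microsatellite marker
]
print([is_effective(r) for r in reads])
```

Only the first read passes: it contains a complete marker with a flank long enough for the later homology analysis.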

The effective high-throughput sequencing fragments are classified according to the homology of the sequences on both sides of the microsatellite marker in each effective high-throughput sequencing fragment; effective high-throughput sequencing fragments of the same class are effective high-throughput sequencing fragments of the same microsatellite marker locus. If the number of effective high-throughput sequencing fragments of the same microsatellite marker locus is greater than or equal to α1, one microsatellite marker locus is successfully developed, wherein α1 is the first judgment threshold and α1 ≥ (high-throughput sequencing depth × proportion of effective high-throughput sequencing fragments / number of microsatellite marker loci detectable on the genome) × probability guarantee; the specific value of α1 is adjusted according to the depth of high-throughput sequencing. For the classification, the microsatellite markers are removed from the effective high-throughput sequencing fragments, the remaining sequences on the two sides are joined into one complete sequence, and pairwise alignment analysis between the joined sequences is performed using Megablast (version 2.2.26), the parameters of alignment analysis are set as 1e-5, the parameters-p-is set as 0, the effective homology-p.
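The grouping-and-threshold logic of this step can be sketched in Python. The sketch makes a strong simplifying assumption that is not part of the embodiment: fragments are treated as belonging to the same locus only when their joined flanking sequences are identical, which stands in for the Megablast homology comparison. The function name `develop_loci` is hypothetical, and the default `alpha1=20` follows the statement that α1 is 20 or more.

```python
from collections import defaultdict


def develop_loci(fragments, alpha1=20):
    """fragments: iterable of (left_flank, marker, right_flank) tuples.

    Returns {joined_flanks: [marker, ...]} for classes with at least alpha1 fragments.
    """
    classes = defaultdict(list)
    for left, marker, right in fragments:
        # Assumption: identical joined flanks stand in for "homologous" flanks.
        classes[left + right].append(marker)
    return {key: markers for key, markers in classes.items()
            if len(markers) >= alpha1}


# Hypothetical input: 25 fragments of one locus, 3 fragments of another.
fragments = ([("AAAC", "TG" * 21, "GGTT")] * 25
             + [("CCCA", "GA" * 10, "TTTC")] * 3)
loci = develop_loci(fragments)
print({key: len(markers) for key, markers in loci.items()})
```

With these counts only the first class reaches the threshold, so one locus counts as successfully developed; a real pipeline would need a tolerant sequence comparison because flanks differ in length and contain sequencing errors.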

The criterion for selecting the microsatellite marker loci to be detected is the maximum H value, wherein the H value is the polymorphism index of the microsatellite marker locus,

wherein i denotes the i-th class when the effective high-throughput sequencing fragments of the microsatellite marker locus are classified according to the length of the microsatellite marker, i being a natural number; and a_i is the ratio of the number of effective high-throughput sequencing fragments of the i-th class to the total number of effective high-throughput sequencing fragments. For example, the hypothetical microsatellite marker locus in Table 1, classified by the length of the microsatellite marker in its effective high-throughput sequencing fragments, has 3 classes in total: (TG)20, (TG)21 and (TG)22, so S = 3. The total number of effective high-throughput sequencing fragments for this microsatellite marker locus is 40, of which the first class, (TG)20, accounts for 3, so a_1 = 3/40 = 7.50%; in the same way, a_2 = 32/40 = 80% and a_3 = 5/40 = 12.50%. Substituting these values into the formula for H gives an H value of 0.98 for this microsatellite marker locus.
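The class proportions of the worked example (3, 32 and 5 fragments out of 40) can be reproduced with a few lines of Python. The sketch only computes the number of length classes S and the proportions a_i; it does not compute H itself, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical locus from the worked example: 3 x (TG)20, 32 x (TG)21, 5 x (TG)22.
markers = ["TG" * 20] * 3 + ["TG" * 21] * 32 + ["TG" * 22] * 5

length_counts = Counter(len(marker) for marker in markers)   # classes by marker length
total = sum(length_counts.values())
proportions = {length: Fraction(count, total)
               for length, count in sorted(length_counts.items())}

S = len(proportions)                                          # number of length classes
print("S =", S)
for length, a in proportions.items():
    print(f"marker length {length}: a = {float(a):.2%}")
```

Using exact fractions avoids rounding when the proportions are later fed into a polymorphism index.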

The H values of all the microsatellite marker loci successfully developed in this embodiment are calculated in the same way as for the hypothetical microsatellite marker locus in Table 1. The H values of all the loci are sorted from largest to smallest, and the top 50 loci are selected as the microsatellite marker loci to be detected in the samples of this embodiment. The number 50 is chosen according to actual needs: for example, 1 microsatellite marker locus is needed to identify corn purity, about 50 microsatellite marker loci are generally selected to construct a corn fingerprint, and about 300 microsatellite marker loci are needed to analyze essentially derived relationships among varieties. The microsatellite marker loci with the largest H values are selected because they have the strongest discriminating power: more samples can be distinguished, and as much information as possible provided, with the fewest microsatellite marker loci, and distinguishing samples is the core task of microsatellite marker technology.
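The ranking step amounts to a sort followed by a cut-off, as in the following Python sketch. The function name `select_loci`, the locus identifiers and the H values in the example are all hypothetical; only the rule "sort by H from largest to smallest and keep the top k" comes from the description.

```python
def select_loci(h_values, k=50):
    """h_values: {locus_id: H}. Return the ids of the k loci with the largest H."""
    ranked = sorted(h_values, key=h_values.get, reverse=True)
    return ranked[:k]


# Hypothetical H values for four developed loci.
h_values = {"locus_A": 0.20, "locus_B": 0.98, "locus_C": 0.50, "locus_D": 0.75}
print(select_loci(h_values, k=2))
```

Changing `k` (for example to 1, 50 or 300) adapts the same selection to the different applications mentioned above.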

The microsatellite markers in all effective high-throughput sequencing fragments of a microsatellite marker locus are extracted; for the putative microsatellite marker locus listed in Table 1 they form a set of 3 (TG)20, 32 (TG)21 and 5 (TG)22. The sequences to the left of the microsatellite marker in all effective high-throughput sequencing fragments of the locus constitute the left sequences of the microsatellite marker locus; for the putative locus in Table 1 they form a set of 3 (A)2G(A)2, 5 (A)87G(A)3, 27 (A)86G(A)3 and 5 (A)81G(A)4. The right sequences of the microsatellite marker locus are obtained in the same manner; for the putative locus in Table 1 they form a set of 3 (A)4G(A)80, 5 (A)3G(A)2, 27 (A)3G(A)81 and 5 (A)2G(A)85.
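Splitting a sequencing fragment into its left sequence, microsatellite marker and right sequence can be sketched with a regular expression. The Python example below is an illustration under stated assumptions: the function name `split_read`, the minimum of 6 repeats, and the pairing of the left sequence (A)86G(A)3 with the marker (TG)21 and the right sequence (A)3G(A)81 are all hypothetical, since the text does not give the individual fragments of Table 1.

```python
import re

def split_read(read, unit, min_repeats=6):
    """Split a read into (left, marker, right) around its longest run of `unit`.

    Returns None when no run of at least `min_repeats` units is found.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_repeats))
    best = max(pattern.finditer(read),
               key=lambda m: m.end() - m.start(),
               default=None)
    if best is None:
        return None
    return read[:best.start()], best.group(), read[best.end():]

# Hypothetical fragment assembled from the compressed notation used in the text.
read = "A" * 86 + "G" + "A" * 3 + "TG" * 21 + "A" * 3 + "G" + "A" * 81
left, marker, right = split_read(read, "TG")
print(len(left), len(marker) // 2, len(right))
```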

The method for detecting the length of the microsatellite marker in the maize microsatellite marker locus comprises the following steps:

selecting a microsatellite marker locus to be detected from the successfully developed microsatellite marker loci, and designing a multiplex amplification primer for amplifying the microsatellite marker locus to be detected. The selection of microsatellite marker sites and the design of multiplex amplification primers are described below using the putative microsatellite marker sites in Table 1 as an example.

A method of designing the multiplex amplification primers for a selected microsatellite marker locus comprises: extracting the microsatellite markers from all effective high-throughput sequencing fragments of the selected microsatellite marker locus, and taking the longest of them as the microsatellite marker of the template sequence used for multiplex amplification primer design. Among the microsatellite markers of the putative locus in Table 1, (TG)22 is the longest, so (TG)22 is the microsatellite marker of the template sequence for the multiplex amplification primers of that locus. The longest microsatellite marker is selected so that the length of the microsatellite marker locus amplified by the designed multiplex amplification primers does not exceed the amplification capacity of multiplex PCR, which reduces data loss in microsatellite detection.

The left sequences of the microsatellite marker are extracted from all effective high-throughput sequencing fragments of the selected microsatellite marker locus, and all sequences with a length ≥ α2 bases are picked out, where α2 is the second determination threshold. α2 is determined from the average length of the high-throughput sequencing fragments detectable by the second high-throughput sequencing technique and the length of the microsatellite marker of the template sequence of the multiplex amplification primers, which is 44 bp in this example (TG repeated 22 times); in the present embodiment α2 is 78 bp. Among the left sequences of the putative microsatellite marker locus in Table 1, the sequences with a length ≥ α2 are 5 (A)87G(A)3, 27 (A)86G(A)3 and 5 (A)81G(A)4. The left sequence of the template sequence of the multiplex amplification primers is then obtained from the selected sequences; for the putative microsatellite marker locus in Table 1 it is (A)85NNN(A)2.
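The length filter applied to the left sequences can be reproduced for the Table 1 example. The Python sketch below expands the compressed notation of the text, such as (A)87G(A)3, into plain base strings and keeps the sequences that reach the threshold of 78 bases; the helper name `expand` and the reading of the threshold as "at least 78 bases" are assumptions made for illustration.

```python
import re

def expand(notation):
    """Expand compressed notation such as '(A)87G(A)3' into a base string."""
    out = []
    for unit, count, single in re.findall(
            r"\(([ACGTN]+)\)(\d+)|([ACGTN])", notation):
        out.append(unit * int(count) if unit else single)
    return "".join(out)

ALPHA_2 = 78  # second determination threshold of this embodiment, in bases

# Left sequences of the putative locus in Table 1 and their fragment counts.
left_sequences = {
    "(A)2G(A)2": 3,
    "(A)87G(A)3": 5,
    "(A)86G(A)3": 27,
    "(A)81G(A)4": 5,
}

lengths = {n: len(expand(n)) for n in left_sequences}
selected = {n: c for n, c in left_sequences.items() if lengths[n] >= ALPHA_2}
print(lengths)
print(selected)
```

Only the 3 short (A)2G(A)2 sequences fall below the threshold, which matches the three sequence types listed in the text.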

The right sequence of the template sequence of the multiplex amplification primers is obtained by exactly the same method as the left sequence. For the putative microsatellite marker locus in Table 1, the right sequence of the template sequence of the multiplex amplification primers is (A)2NNN(A)80. The left sequence, the microsatellite marker and the right sequence of the template sequence are joined in that order to give the template sequence of the multiplex amplification primers for the microsatellite marker locus, and the multiplex amplification primers are obtained from this template sequence. The template sequence of the multiplex amplification primers for the putative microsatellite marker locus in Table 1 is (A)85NNN(A)2(TG)22(A)2NNN(A)80.

The template sequences of the multiplex amplification primers for the 50 microsatellite marker loci finally selected in this example were obtained according to the same method and parameters as described above.

TABLE 1 first high throughput sequencing fragment of a putative microsatellite marker site

In the first high throughput sequencing fragment type shown in table 1, the underlined parts represent microsatellite markers, the letters in parentheses represent repeat units of the microsatellite markers, and the numbers after the parentheses represent the number of repetitions of the repeat units.

The multiplex amplification primers for amplifying the selected microsatellite marker loci are designed from the template sequences of the multiplex amplification primers of all the selected loci. The specific method is as follows. The template sequences of the multiplex amplification primers of the 50 microsatellite marker loci are joined to one another by runs of 100 N to construct an artificial reference genome. Log in to the multiplex PCR primer design website https://ampliseq.com/ and select "DNA Hotspot designs (single-pool)" under the "Application type" option. Under the option "Select the genome you wish to use", select "Custom" and upload the constructed artificial reference genome. Under the "DNA Type" option, select "Standard DNA". Under the "Add Hotspot" option, fill in the start and end positions of each microsatellite marker in the constructed artificial reference genome, then click the "Submit targets" button to submit the design and obtain the sequences of the multiplex amplification primers. In this embodiment, multiplex amplification primers were successfully designed for 46 of the 50 selected microsatellite marker loci, and these 46 loci are the microsatellite marker loci to be detected. This example uses the multiplex PCR technology provided by Thermo Fisher Scientific, USA, which can amplify 12000 target regions simultaneously, so the present invention can detect 12000 microsatellite marker loci at a time, 12000 times the capacity of traditional microsatellite marker locus detection.
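Building the artificial reference genome and finding the start and end position of each microsatellite marker in it can be sketched as follows. This Python example is an illustration only: the function name `build_reference` is hypothetical, and the use of 1-based inclusive coordinates is an assumption, since the text does not state which coordinate convention the primer design website expects.

```python
SPACER = "N" * 100  # run of 100 N joining neighbouring template sequences

def build_reference(templates):
    """Join (left, marker, right) templates with 100 N.

    Returns the reference string and, for each template, the assumed
    1-based inclusive (start, end) position of its marker.
    """
    parts, hotspots, offset = [], [], 0
    for index, (left, marker, right) in enumerate(templates):
        if index > 0:
            parts.append(SPACER)
            offset += len(SPACER)
        start = offset + len(left) + 1
        end = start + len(marker) - 1
        hotspots.append((start, end))
        template = left + marker + right
        parts.append(template)
        offset += len(template)
    return "".join(parts), hotspots

# Template of the putative locus in Table 1: (A)85NNN(A)2 (TG)22 (A)2NNN(A)80.
table1 = ("A" * 85 + "NNN" + "AA", "TG" * 22, "AA" + "NNN" + "A" * 80)
# The same template is used twice here only to show the offsets.
reference, hotspots = build_reference([table1, table1])
print(len(reference), hotspots)
```

With real data each of the 50 loci would contribute its own template, and the resulting positions would be the values entered under "Add Hotspot".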

The microsatellite markers in the microsatellite marker loci to be detected are amplified with the multiplex amplification primers to obtain amplification products, and the amplification products are subjected to high-throughput sequencing to obtain the second high-throughput sequencing fragments. In this embodiment, the sample to be detected is 100 corn leaves taken from a corn field in the Wuhan development area. The 100 corn leaves are mixed in equal amounts to obtain a mixed sample, and the genomic DNA of the mixed sample is extracted with a plant genomic DNA extraction kit (Cat. No. DP305, Tiangen Biotech (Beijing) Co., Ltd.) according to its operation manual. The 46 pairs of multiplex amplification primers designed above and the library construction kit 2.0 (Life Technologies, USA, Cat. No. 4475345) are used to amplify the genomic DNA of the mixed sample according to the kit's operation manual and to construct a high-throughput sequencing library. The library is then subjected to ePCR (emulsion PCR) before sequencing with the Ion PI Template OT2 200 Kit v2 (Invitrogen, Cat. No. 4485146) according to the kit's operation manual, giving the ePCR product. High-throughput sequencing is performed on a Proton second-generation high-throughput sequencer using the ePCR product and the Ion PI Sequencing 200 Kit v2 (Invitrogen, USA, Cat. No. 4485149) according to the kit's operation manual. In this example, the high-throughput sequencing amount is set to 1M sequencing fragments (1M = 1,000,000) and the high-throughput sequencing length is set to 500 cycles; after sequencing is finished, the second high-throughput sequencing fragments are obtained.

The length of the microsatellite marker within the microsatellite marker locus is obtained by analyzing the second high-throughput sequencing fragments. The specific method is as follows: the microsatellite marker is removed from each second high-throughput sequencing fragment to obtain the left border sequence and the right border sequence of that fragment; each second high-throughput sequencing fragment is aligned to a microsatellite marker locus to be detected by using the left border sequence and the right border sequence; the microsatellite marker is intercepted from the second high-throughput sequencing fragments of each microsatellite marker locus to be detected; the microsatellite markers obtained are classified by length, and the truth degree of the i-th class is calculated as R_i = N_i/N_max, where N_i is the number of second high-throughput sequencing fragments in the i-th class and N_max is the maximum of the numbers of second high-throughput sequencing fragments over all classes. If R_i ≥ α4, the length of the microsatellite marker of the i-th class is a length of the microsatellite marker within the microsatellite marker locus; if R_i < α4, it is not. Here α4 is the fourth decision threshold. Since the polymorphism of a microsatellite marker is a length polymorphism caused by differences in the number of repetitions of the simple repetitive sequence, detecting a microsatellite marker locus mainly means detecting the length of the microsatellite marker within the locus.

Typically the species is diploid: if the sample is homozygous, a given microsatellite marker locus contains only one microsatellite marker allele, and if the sample is heterozygous, the locus has 2 different microsatellite marker alleles. If the sample is polyploid, such as wheat or cotton, the decision criteria should be adjusted accordingly. Slippage may occur at a microsatellite marker locus during multiplex amplification, so that in the second high-throughput sequencing fragments the length of a microsatellite marker produced by slippage differs from the length of the real microsatellite marker in the mixed sample, which generates interference noise. The truth degree R_i reflects the strength of this interference noise. Where reference data are available, α4 can be set from them. For example, if it is known for a microsatellite marker locus that in more than 95 of 100 detections the interfering microsatellite markers generated by slippage have a truth degree below 0.3, α4 can be set to 0.3, and a class with R_i < α4 can then be judged, with 95% confidence, not to be a true microsatellite marker.

In the present embodiment, because reference data for determining α4 are lacking and the sample to be detected is diploid and heterozygous, α4 is set to 0.6/2 = 0.3. The amplification products of a false microsatellite marker generated by slippage and of the real microsatellite marker differ little in length. Most conventional microsatellite marker detection methods rely on electrophoresis, which cannot resolve such small length differences and, even where it can, cannot quantify them accurately. Conventional microsatellite marker detection therefore cannot calculate R_i, or cannot calculate it accurately, which leads to many inaccurate or even erroneous conclusions.

In the following, the locus of Table 1 is again taken as a detected microsatellite marker locus to describe how a microsatellite marker locus to be detected is detected in the mixed sample. In the second high-throughput sequencing fragments of the putative microsatellite marker locus in Table 1, the intercepted microsatellite markers form a set of 3 (TG)20, 32 (TG)21 and 5 (TG)22. Classified by repeat unit, they are all TG, so the retained microsatellite markers of the most frequent repeat unit are the same set of 3 (TG)20, 32 (TG)21 and 5 (TG)22. The retained microsatellite markers are further classified by length into 3 classes: (TG)20, (TG)21 and (TG)22. Of these 3 classes, the class with the most second high-throughput sequencing fragments is the 2nd class, (TG)21, so N_max = N_2 = 32. The 1st class, (TG)20, has N_1 = 3 second high-throughput sequencing fragments, so R_1 = 3/32 < α4 = 0.3, and the 1st class (TG)20 is judged not to be truly present but to result from slippage. Similarly, R_2 = 1 and R_3 = 5/32, so by the same criterion the 2nd class is truly present and the 3rd class is not. Thus the length of the microsatellite marker within the microsatellite marker locus to be detected in the mixed sample is the length of the microsatellite marker of the 2nd class; that is, the microsatellite marker within the putative microsatellite marker locus in Table 1 is 42 bp long (TG is repeated 21 times, so the length is 21 × 2 bp = 42 bp).
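The truth-degree decision for the Table 1 example can be reproduced in a few lines. The Python sketch below is illustrative: the function name `true_marker_classes` and the dictionary keyed by repeat count are assumptions, while the formula R_i = N_i/N_max and the threshold α4 = 0.3 are taken from the text.

```python
def true_marker_classes(class_counts, alpha4=0.3):
    """Return (truth degrees, classes kept) for one locus.

    class_counts maps a length class (here the repeat count) to the
    number of second high-throughput sequencing fragments N_i.
    """
    n_max = max(class_counts.values())
    truth = {k: n / n_max for k, n in class_counts.items()}
    kept = sorted(k for k, r in truth.items() if r >= alpha4)
    return truth, kept

# Table 1 example: N_1 = 3 for (TG)20, N_2 = 32 for (TG)21, N_3 = 5 for (TG)22.
counts = {20: 3, 21: 32, 22: 5}
truth, kept = true_marker_classes(counts)
lengths_bp = [repeats * len("TG") for repeats in kept]
print(truth)
print(kept, lengths_bp)
```

Only the (TG)21 class reaches the threshold, giving the 42 bp marker length stated in the text.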

The length of the microsatellite markers in the 46 microsatellite marker loci to be detected in this example was successfully determined by repeating the detection in the same manner and parameters as in the above-described hypothetical example.

The development method and the detection method of the microsatellite marker locus provided by the embodiment of the invention are quick, simple, comprehensive and accurate. The traditional development method can only discover about 1 percent of the microsatellite marker loci in a genome and, because of the heavy workload, can only verify their polymorphism in fewer than 100 samples. The invention can in theory find all the microsatellite marker loci on the genome. In the corn embodiment, more than 10,000 microsatellite marker loci were found, a 50-fold improvement in discovery capability, and increasing the amount of high-throughput sequencing, which is easy to do, could raise this to 80-fold or even close to 100-fold. The embodiment of the invention combines the development (discovery) of microsatellite marker loci with polymorphism detection at no extra effort, whereas traditional polymorphism detection of microsatellite marker loci is time-consuming and hard to carry out: detecting the polymorphism of 18101 microsatellite marker loci in 16 corn varieties would require 16 × 18101 = 289,616 PCR amplifications and electrophoresis runs, an almost unimaginable workload.

In addition, the traditional development technology involves a heavy workload and cannot detect multiple sequences of the same microsatellite marker locus, so the conservation of the multiplex amplification primers cannot be analyzed and the developed primers have poor universality; the embodiment of the invention solves this problem. Taking as an example the detection of 46 microsatellite marker loci at a time by the method of the invention for detecting the length of the microsatellite marker in maize microsatellite marker loci, the traditional detection method would need 46 PCR amplifications and electrophoresis runs. For the present invention, detecting even 10,000 microsatellite marker loci does not increase the workload, whereas for the conventional detection method the workload increases 10,000-fold. The traditional detection method judges the length of the microsatellite marker by electrophoresis, which has measurement error, so reference varieties are needed for comparison; this increases the detection workload, and few laboratories hold a complete set of reference varieties. The embodiment of the invention instead obtains the base sequence by high-throughput sequencing, so the result is an absolute value free of electrophoresis error and reference varieties are no longer needed. Furthermore, electrophoresis cannot distinguish different individuals. For example, when the sample in corn detection is a mixture of 100 individuals, the proportions of the different microsatellite markers at the same microsatellite marker locus cannot be accurately calculated from an electrophoresis result, so individual plants cannot be distinguished and important indexes such as the rate of mixed plants cannot be calculated.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Sequence listing

<110> Jianghan University

<120> corn microsatellite marker locus development method and microsatellite marker length detection method in microsatellite marker locus

<160>12

<170>PatentIn version 3.4

<210>1

<211>24

<212>DNA

<213> Artificial sequence

<400>1

ctctctctct ctctctctct ctct 24

<210>2

<211>24

<212>DNA

<213> Artificial sequence

<400>2

gagagagaga gagagagaga gaga 24

<210>3

<211>24

<212>DNA

<213> Artificial sequence

<400>3

tgtgtgtgtg tgtgtgtgtg tgtg 24

<210>4

<211>24

<212>DNA

<213> Artificial sequence

<400>4

acacacacac acacacacac acac 24

<210>5

<211>24

<212>DNA

<213> Artificial sequence

<400>5

tatatatata tatatatata tata 24

<210>6

<211>24

<212>DNA

<213> Artificial sequence

<400>6

tgttgttgtt gttgttgttg ttgt 24

<210>7

<211>24

<212>DNA

<213> Artificial sequence

<400>7

ccaccaccac caccaccacc acca 24

<210>8

<211>24

<212>DNA

<213> Artificial sequence

<400>8

atcatcatca tcatcatcat catc 24

<210>9

<211>24

<212>DNA

<213> Artificial sequence

<400>9

cctcctcctc ctcctcctcc tcct 24

<210>10

<211>24

<212>DNA

<213> Artificial sequence

<400>10

agaagaagaa gaagaagaag aaga 24

<210>11

<211>24

<212>DNA

<213> Artificial sequence

<400>11

atgatgatga tgatgatgat gatg 24

<210>12

<211>24

<212>DNA

<213> Artificial sequence

<400>12

caacaacaac aacaacaaca acaa 24

Claims

1. A method for developing a maize microsatellite marker locus, said method comprising:

mixing n corn samples having polymorphism in equal mass to obtain a mixed sample, wherein n is greater than 1, and wherein the method for selecting the n corn samples having polymorphism comprises: selecting corn samples with different external morphology, corn samples of different biological classifications, and corn samples identified as different corn varieties or wild resources from different ecological regions;

extracting the genome of the mixed sample;

fragmenting the genome of the mixed sample to obtain a genome fragment;

using a plurality of probes with simple repetitive sequences as a probe set, hybridizing each probe in the probe set with the genome fragment respectively to obtain a plurality of hybridization solutions, wherein the number of the probes is 12, the repetitive unit in each simple repetitive sequence of the probe is CT, GA, TG, AC, TA, TGT, CCA, ATC, CCT, AGA, ATG or CAA, the repetition frequency of each simple repetitive sequence of the probe is 6-20, and purifying the successfully hybridized genome fragments in the hybridization solutions respectively to obtain a plurality of purified hybridization genome fragments;

screening effective high-throughput sequencing fragments from the first high-throughput sequencing fragments, wherein the effective high-throughput sequencing fragments comprise microsatellite markers in microsatellite marker sites, the base numbers of sequences on two sides of the microsatellite markers in the effective high-throughput sequencing fragments are more than or equal to 1, and the base numbers of sequences on at least one side of the microsatellite markers in the effective high-throughput sequencing fragments are more than or equal to 10;

2. The method of claim 1, wherein α1 is 20 or more.

3. The method of claim 1, wherein the number of repetitions of the simple repeat sequence of each of the probes is 6 to 15.

4. The development method of claim 1, wherein the probe has a sequence as shown in SEQ ID NO 1-SEQ ID NO 12 of the sequence Listing.

5. A method for detecting the length of a microsatellite marker located within a microsatellite marker locus successfully developed by the development method according to any one of claims 1 to 4, wherein said method for detecting comprises:

selecting microsatellite marker loci to be detected from the successfully developed microsatellite marker loci, wherein the method for selecting the microsatellite marker loci to be detected from the successfully developed microsatellite marker loci comprises the following steps:

wherein S is the number of classes obtained when the microsatellite marker locus is classified according to the length of the microsatellite marker in the effective high-throughput sequencing fragments, i is the i-th class of that classification, and i is a natural number; a_i is the proportion of the number of the effective high-throughput sequencing fragments of the i-th class to the total number of the effective high-throughput sequencing fragments; amplifying a microsatellite marker in the microsatellite marker locus to be detected by utilizing a multiplex amplification primer to obtain an amplification product, performing high-throughput sequencing on the amplification product to obtain a second high-throughput sequencing fragment, and obtaining the length of the microsatellite marker in the microsatellite marker locus by analyzing the second high-throughput sequencing fragment;

the method for obtaining the length of the microsatellite marker in the microsatellite marker locus comprises the following steps: after removing the microsatellite marker from the second high-throughput sequencing fragment, obtaining a left border sequence of the second high-throughput sequencing fragment and a right border sequence of the second high-throughput sequencing fragment; aligning each of the second high-throughput sequencing fragments to the microsatellite marker locus to be detected by using the left border sequence and the right border sequence; intercepting the microsatellite marker in the second high-throughput sequencing fragment of each microsatellite marker locus to be detected; classifying the obtained microsatellite markers according to length, and calculating the truth degree of the i-th class, R_i = N_i/N_max, wherein i is the i-th class when classified by the length of the microsatellite marker in said effective high-throughput sequencing fragment of said microsatellite marker locus, N_i is the number of said second high-throughput sequencing fragments of said i-th class, and N_max is the maximum of the numbers of said second high-throughput sequencing fragments over all classes; if the truth degree R_i ≥ α4, the length of the microsatellite marker of the i-th class is the length of the microsatellite marker in the microsatellite marker locus; if the truth degree R_i < α4, the length of the microsatellite marker of the i-th class is not the length of the microsatellite marker in the microsatellite marker locus, wherein α4 is a fourth decision threshold and α4 is 0.3;

the method for preparing the multiplex amplification primer comprises the following steps: