CN107868843B - Method for screening high-polymorphism molecular marker sites of mung beans

Info

Publication number: CN107868843B
Application number: CN201711288256.7A
Authority: CN
Inventors: 李伦; 方治伟; 周俊飞; 刘致浩; 彭海; 高丽芬; 李丽丽; 陈丽红
Original assignee: Jianghan University
Current assignee: Jianghan University
Priority date: 2017-12-07
Filing date: 2017-12-07
Publication date: 2021-06-29
Anticipated expiration: 2037-12-07
Also published as: CN107868843A

Abstract

The invention discloses a method for screening high-polymorphism molecular marker loci of mung beans, which comprises: mixing mung bean seeds of different varieties in equal quantity and extracting total nucleic acid; constructing a high-throughput sequencing library; searching for variation sites on the genome; screening high-polymorphism sites by sliding-window translation; designing multiplex amplification primers on both sides of each candidate site; and screening the multiplex primers to obtain new high-polymorphism molecular marker loci. In theory, the invention can screen out all available high-polymorphism molecular marker loci on the genome at one time, and the loci can be used directly for high-throughput detection of varieties. Because genome information from a plurality of varieties is integrated during development, the repeated per-sample verification required by traditional screening methods is not needed. The molecular marker loci screened by the method can be detected in batch rather than amplified and detected one locus at a time, which increases detection speed and improves accuracy. The screened molecular markers have high polymorphism and good resolution, and the screening process is simple, quick and standardized.

Description

Method for screening high-polymorphism molecular marker sites of mung beans

Technical Field

The invention discloses a method for screening high-polymorphism molecular marker loci of mung beans, and belongs to the technical field of biology.

Background

A molecular marker is a specific DNA fragment that reflects a difference between the genomes of biological individuals or populations. Molecular marker technology is widely applied in gene mapping, genetic map analysis, forensic science, approval of new varieties and similar fields, and has broad application value and theoretical significance. However, limited by existing development methods, the number of currently available molecular markers is very small, and some species lack usable molecular marker sites altogether. In the process of implementing the invention, the inventors found that the prior art has at least the following problems. The development of traditional high-polymorphism molecular markers depends mostly on the personal experience of researchers and on a large number of verification tests; the process is costly and slow, and is not guaranteed to yield a truly high-polymorphism marker site. In addition, the development process finds only one high-polymorphism molecular marker site at a time and cannot address the many potential high-polymorphism sites present on the genome. In application, traditional high-polymorphism molecular markers are mostly detected one marker at a time, so a plurality of molecular marker sites cannot be detected on a large scale and at high throughput. Some species currently have abundant variation site information, but that information is not necessarily applicable to all varieties. Therefore, finding suitable molecular marker sites in large numbers is the biggest obstacle to current molecular marker research and application.

Disclosure of Invention

Aiming at the defects in the prior art, the invention mainly aims to provide a method for screening mung bean high-polymorphism molecular marker loci, which can screen all available high-polymorphism molecular marker loci on a genome at one time and realize batch screening.

In order to achieve the purpose, the invention adopts the following technical scheme: a method for screening mung bean high polymorphism molecular marker loci comprises the following steps:

equivalently mixing mung bean seeds of different varieties, extracting total nucleic acid, and constructing a high-throughput sequencing library;

sequencing the high-throughput sequencing library with high coverage by adopting a high-throughput sequencing method to obtain a sequencing fragment; and comparing the sequencing result to the reference genome of the mung bean; obtaining all mutation sites according to the comparison result;

obtaining the combination number of the mutation sites of each window according to window translation, and converting the number of sequencing fragments of each combination into the percentage of all sequencing fragments on the window; calculating the polymorphism of each window, and evaluating whether the window is in a single copy region to obtain a candidate site;

searching conserved regions at two sides of the candidate site, and designing a molecular marker multiple amplification primer in the conserved regions;

and screening the multiple amplification primers to obtain a new high-polymorphism molecular marker locus.

More preferably, the number of the mung bean varieties is more than 2. Too few varieties will affect the assessment of the polymorphism of the molecular marker.

As a further preference, the equal mixing is: the molar ratio of the genomes between varieties is (0.9-1) to (1-1.2), and the different varieties are uniformly mixed when the high-throughput sequencing library is constructed.

As a further preference, the high coverage sequencing comprises: obtaining nucleic acid sequencing fragments capable of covering genome, wherein the number of the sequencing fragments is higher than that of the subclasses.

As a further preference, aligning the sequencing result to the reference genome of the mung bean comprises: proceeding one sequencing fragment (read) at a time, marking each base site on each sequencing fragment that differs from the reference genome together with its mutation type; defining reads with the same mutant base sites and mutation types within the window as one genotype; and thereby obtaining all candidate genotypes for the window.

As a further preference, the window translation comprises: the window length is set to L; the number of sequenced fragments for each genotype was counted and divided by the total number of sequenced fragments that completely covered the window to obtain the frequency of the genotype.

As a further preference, the single copy area is: no genomic fragments that interfere with the single copy region are present elsewhere on the genome.

As a further preference, the multiple amplification primers for molecular marker design in conserved regions comprise: for a sliding window with the length L, the starting position of the window on the genome is set as S, the ending position of the window is set as E, the left-side conserved region of the window is defined as continuous n base sites with the coordinate position smaller than S and without variation found in sequencing data and existing data, and the right-side conserved region of the window is defined as continuous n base sites with the coordinate position larger than E and without variation found in sequencing data and existing knowledge (n is more than or equal to 30).

It is further preferred that the 3' end of a primer designed in the conserved region does not contain low-complexity sequences, such as runs of consecutive A or consecutive AT repeats.

As a further preference, said screening for said multiplexed amplification primers comprises: and verifying the amplification uniformity among the molecular markers.

The invention has the beneficial effects that: the method comprises the steps of mixing different mung bean varieties and extracting total nucleic acid; constructing a high-throughput sequencing library; searching variation sites on the genome; screening high polymorphism sites by sliding translation; designing multiple amplification primers at two sides of the candidate site; and screening the multiple primers to obtain a new high-polymorphism molecular marker locus. Theoretically, the invention can screen all available high-polymorphism molecular marker loci on the mung bean genome at one time and can be directly used for high-throughput detection of mung bean varieties; because genome information of a plurality of varieties is integrated in the development process, the repeated verification process aiming at each sample in the traditional screening method is not needed; the molecular marker loci screened by the method can be detected in batch, and one molecular marker locus does not need to be subjected to molecular amplification and detection, so that the detection speed is increased, and the accuracy is improved; the screened molecular marker has high polymorphism, good resolution, simple and quick screening process and standard flow.

Detailed Description

The invention provides a method for screening mung bean high-polymorphism molecular marker loci, and solves the problem of difficulty in developing high-polymorphism molecular markers in the prior art.

In order to solve the above-mentioned defects, the main idea of the embodiment of the present invention is:

the method for screening the mung bean high-polymorphism molecular marker loci in the embodiment of the invention comprises the following steps:

The embodiment of the invention does not need to rely on the information of the existing molecular markers of mung beans and the information of the constructed genetic population required in the traditional screening process, can detect all the high-polymorphism molecular marker sites on the genome at one time only by high-throughput sequencing, and has simple and rapid screening process and standard flow.

The high-polymorphism molecular marker sites obtained by the method can all be detected with amplicon sequencing technology in a single round of amplification and sequencing, which is quick, accurate and high-throughput, and is generally suitable for research and applications based on molecular markers.

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below. Unless otherwise specified, the reagents used in the invention are common commercially available reagents that are sold by most biotechnology companies and perform equivalently.

Example I screening of mung bean high polymorphism molecular marker loci

In this example, 10 mung bean varieties with phenotypic differences were selected as materials, and the purpose is to screen a set of high-polymorphism molecular marker sites in a single batch.

First, extraction of mixed genomic DNA

Five seeds of each of the 10 mung bean varieties were germinated. After the seedlings had grown for 3 days, 0.1 g of fresh leaves was taken from each seedling, and all collected leaves were pooled, frozen in liquid nitrogen and ground to powder. Nucleic acid was extracted from the pooled sample with a plant genomic DNA extraction kit (product number: DP305; manufacturer: Tiangen Biotech (Beijing) Co., Ltd.) according to the operation manual. The mung bean varieties used are listed in Table 1.

TABLE 1 mung bean variety information used in the present example

Second, construction and sequencing of high-throughput sequencing library

The OD260/280 of the nucleic acid was measured as 1.89 with an ultraviolet spectrophotometer (NanoDrop OneC, Thermo Fisher Scientific). The extracted nucleic acid was quantified with a Qubit fluorometer to confirm that the DNA concentration met the requirement for library construction. The DNA was sheared to 250 bp with a Covaris ultrasonicator (Covaris M220), and a high-throughput sequencing library was then constructed according to the Ion Torrent whole-genome library construction kit and subjected to surface PCR amplification. Sequencing was performed on an Ion Torrent S5 high-throughput sequencer.

When a high-throughput sequencing library is constructed, different mung bean varieties are uniformly mixed, and the mole ratio of the genome of each variety is close to 1: 1. Obtaining enough nucleic acid fragments capable of covering the genome based on high-throughput sequencing, wherein the number of the fragments is higher than that of the subclasses.

Third, method for screening polymorphic molecular marker sites

3.1 alignment of sequencing fragments with genomic sequences

All sequencing fragments were aligned to the mung bean reference genome using bowtie2 (version 2.1.0). The mung bean genome version is GCF_000741045 (download address: https://www.ncbi.nlm.nih.gov/assembly/GCF_000741045.1/), and all alignment parameters are default values.
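The alignment step can be sketched as two command lines built in Python. This is only an illustration: the file names and index prefix are hypothetical, and the single-end input (`-U`) is an assumption based on the Ion Torrent platform named above. Only the standard `bowtie2-build` and `bowtie2 -x/-U/-S` arguments are used, so all alignment parameters stay at their defaults, as the text specifies.

```python
from typing import List


def bowtie2_commands(reference_fasta: str, index_prefix: str,
                     reads_fastq: str, output_sam: str) -> List[List[str]]:
    """Return the index-building and alignment commands as argument lists."""
    # Build the bowtie2 index from the reference genome FASTA.
    build = ["bowtie2-build", reference_fasta, index_prefix]
    # -x: index prefix, -U: unpaired reads, -S: SAM output.
    # No other options are given, so bowtie2 uses its default parameters.
    align = ["bowtie2", "-x", index_prefix, "-U", reads_fastq, "-S", output_sam]
    return [build, align]


# Hypothetical file names, for illustration only.
commands = bowtie2_commands("GCF_000741045.1.fna", "mungbean",
                            "pooled_reads.fastq", "pooled_reads.sam")
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 is installed.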

3.2 analysis of variation sites

The variation sites are analyzed read by read: each base site on each sequencing fragment that differs from the reference genome is marked together with its mutation type; reads having the same mutant base sites and mutation types within a window are defined as one genotype; and all candidate genotypes for the window are thereby obtained.

Variation sites on the genome are counted from the alignment result as follows. The sliding window size is set to 100 bp, and the window moves forward by 30 bp each time. For each window, the variation site information of each read is recorded first: if the base on the genome is A and the corresponding site on the sequencing fragment is T, the site is recorded as T; if the base is the same as the base on the genome, the site is recorded as R. Taken together, the records for all base positions give the genotype of the sequencing fragment in the window. Because insertions and deletions are introduced at a high rate during sequencing, and sequencing errors are especially frequent at simple repeat sequences, all insertion and deletion sites and all variation sites in simple repeat regions are ignored.
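The per-window genotype encoding described above can be sketched as follows. The read representation (a 0-based start coordinate plus an ungapped aligned sequence) and all function names are assumptions made for illustration. Reads with insertions or deletions are assumed to have been excluded already, and masking of simple repeat regions is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (0-based start on the reference, aligned bases without gaps) -- hypothetical format
Read = Tuple[int, str]


def window_genotype(reference: str, read: Read, w_start: int, w_len: int) -> str:
    """Encode one read over a window: 'R' where it matches the reference, else the read base."""
    start, seq = read
    encoded = []
    for pos in range(w_start, w_start + w_len):
        base = seq[pos - start]
        encoded.append("R" if base == reference[pos] else base)
    return "".join(encoded)


def genotype_counts(reference: str, reads: Iterable[Read],
                    w_start: int, w_len: int) -> Counter:
    """Count genotypes among reads that completely cover the window."""
    counts: Counter = Counter()
    for start, seq in reads:
        covers = start <= w_start and start + len(seq) >= w_start + w_len
        if covers:
            counts[window_genotype(reference, (start, seq), w_start, w_len)] += 1
    return counts


def sliding_windows(ref_len: int, w_len: int = 100, step: int = 30) -> List[int]:
    """Start coordinates of w_len-bp windows moved forward by step bp each time."""
    return list(range(0, ref_len - w_len + 1, step))
```

Reads that only partly overlap a window are skipped, which matches the rule in section 3.4 that only fragments completely covering the window are considered.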

3.3 calculate the polymorphism index for each window.

The percentage frequency of the genotype is obtained by counting the number of sequenced fragments for each genotype and dividing by the total number of sequenced fragments that completely cover the window. The polymorphism index calculation formula for this window is as follows:

where p_i is the frequency of the i-th genotype, and n is the number of genotypes that appear within the window. If the polymorphism index of the window is less than 0.2, the window is discarded. Otherwise, let n be the position of the first variation in the window and m the position of the last variation (here n and m denote positions, not the number of genotypes), and let L = 200 - (m - n). The region from (n - L) bp to n bp and the region from m bp to (m + L) bp are each checked for a conserved region longer than 50 bp in which no base variation is detected. If both sides contain such a region, the window is retained as a candidate polymorphic site; otherwise the window is discarded.
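The flank check can be sketched as below. The function names are hypothetical, and variant positions are assumed to be supplied as integer reference coordinates. "Longer than 50 bp" is read here as a run of more than 50 consecutive invariant positions.

```python
from typing import Iterable, Set


def longest_invariant_run(variant_positions: Set[int], lo: int, hi: int) -> int:
    """Length of the longest run of positions in [lo, hi) that carry no variation."""
    best = run = 0
    for pos in range(lo, hi):
        run = 0 if pos in variant_positions else run + 1
        best = max(best, run)
    return best


def has_conserved_flanks(variants: Iterable[int], first_var: int, last_var: int,
                         min_run: int = 50, span: int = 200) -> bool:
    """Check both flanks of a window for a variation-free stretch longer than min_run bp."""
    vset = set(variants)
    flank = span - (last_var - first_var)  # L = 200 - (m - n)
    # Left flank: from (n - L) up to, but not including, the first variant.
    left = longest_invariant_run(vset, max(0, first_var - flank), first_var)
    # Right flank: from just after the last variant to (m + L).
    # Clipping at the chromosome end is not handled in this sketch.
    right = longest_invariant_run(vset, last_var + 1, last_var + 1 + flank)
    return left > min_run and right > min_run
```

With variants only at 1000 and 1040, both 160-bp flanks are variation-free and the window is kept. Adding variants at 860, 900 and 950 breaks the left flank into runs of at most 49 bp, so the window is rejected.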

In the polymorphism index formula above, ten mung bean varieties can theoretically carry 20 genotypes at any polymorphic site, but some varieties share the same genotype, so only i genotypes may actually appear. Some genotypes account for a larger share of the sequencing fragments at the site and others for a smaller share. The i genotypes are numbered 1, 2, 3, ..., i; the frequency of the first genotype is p_1, that of the second is p_2, ..., and that of the i-th genotype is p_i. The frequency of a genotype is defined as the number of sequencing fragments of that genotype divided by the number of all sequencing fragments at the site. The polymorphism value D of the site is 1 minus the sum over the frequencies of all genotypes, as given by the formula. Finally, high-polymorphism sites are screened in order of the calculated D values.
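The frequency and index calculation can be sketched as follows. The formula itself does not appear in the text, so this sketch rests on an assumption: it uses the common form D = 1 - sum(p_i^2). A plain sum of frequencies would always equal 1 and give D = 0 for every window, which suggests that a sum of squared frequencies is intended, but that reading is not confirmed by the source. The function names are hypothetical.

```python
from typing import Dict, Mapping


def genotype_frequencies(counts: Mapping[str, int]) -> Dict[str, float]:
    """Frequency of each genotype: its read count divided by all reads covering the window."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads completely cover the window")
    return {genotype: count / total for genotype, count in counts.items()}


def polymorphism_index(counts: Mapping[str, int]) -> float:
    """Polymorphism index D of a window.

    ASSUMPTION: D = 1 - sum(p_i ** 2); the source formula is not reproduced in the text.
    """
    freqs = genotype_frequencies(counts)
    return 1.0 - sum(p * p for p in freqs.values())


def passes_threshold(counts: Mapping[str, int], threshold: float = 0.2) -> bool:
    """Windows with D below the threshold (0.2 in the text) are discarded."""
    return polymorphism_index(counts) >= threshold
```

Under this assumption, a window with read counts 50, 30 and 20 for three genotypes has frequencies 0.5, 0.3 and 0.2 and D = 1 - 0.38 = 0.62, while a window with a single genotype has D = 0 and is discarded.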

3.4 screening of molecular marker sites

The screening of the molecular marker locus is carried out in a window translation mode, and the window length is set as L; only the sequencing fragments which can completely cover the window are considered in the window translation process, and other sequencing fragments are not considered; counting the number of sequencing fragments of each genotype, and dividing by the total number of sequencing fragments completely covering the window, thereby obtaining the frequency of the genotype; the method specifically comprises the following steps:

The window is translated forward by 30 bp and steps 3.1-3.3 are repeated, giving candidate molecular marker loci on each chromosome. The top 30 sites ranked by polymorphism are then selected, and sites that lie close to each other on the genome are removed as follows: a window 1,000,000 bp long is set and checked for candidate polymorphic sites; if there are none, the window is extended forward by 500,000 bp and searched again; if there is one site, it is retained; if there are multiple sites, the one with the highest polymorphism is retained. The high-polymorphism molecular marker sites selected in this example are shown in Table 2.
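One possible reading of the distance-based thinning is sketched below. The text does not say how the window advances after a site is kept, so restarting the next window at the end of the current one is an assumption, as are the function name and the `(position, D)` tuple format.

```python
from typing import List, Tuple

Site = Tuple[int, float]  # (position on the chromosome, polymorphism index D)


def thin_sites(sites: List[Site], chrom_len: int,
               window: int = 1_000_000, extend: int = 500_000) -> List[Site]:
    """Keep at most one site per window, preferring the highest polymorphism index."""
    kept: List[Site] = []
    start = 0
    while start < chrom_len:
        end = start + window
        hits = [s for s in sites if start <= s[0] < end]
        # No candidate in the window: extend forward by 500,000 bp and search again.
        while not hits and end < chrom_len:
            end += extend
            hits = [s for s in sites if start <= s[0] < end]
        if hits:
            # One site is kept as is; among several, the highest D wins.
            kept.append(max(hits, key=lambda s: s[1]))
        start = end  # ASSUMPTION: the next window begins where this one ended
    return kept
```

For sites at 100,000 bp (D = 0.5), 400,000 bp (D = 0.8) and 2,700,000 bp (D = 0.6) on a 4,000,000 bp chromosome, the sketch keeps the second and third sites.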

TABLE 2 mung bean high polymorphism molecular marker loci

Fourth, primer design method

The selected molecular marker sites have characteristic fragments in the species and show high polymorphism in the mixed sequencing sample. Each selected molecular marker lies in a single-copy region of the genome, with no genome segment elsewhere that interferes with the region. Conserved regions suitable for primer design should be present on both sides of the molecular marker: for a sliding window of length L whose start position on the genome is S and whose end position is E, the left-side conserved region is defined as n consecutive base sites with coordinates less than S in which no variation is found in the sequencing data or existing data, and the right-side conserved region is defined as n consecutive base sites with coordinates greater than E in which no variation is found in the sequencing data or existing knowledge. Primers designed within the conserved regions do not contain low-complexity sequences at the 3' end, such as multiple consecutive A or consecutive AT repeats.

Log in to the Life Technologies online multiplex amplification primer design page at https://ampliseq.com, click the "My References" option, select "Add reference" on the page that opens, choose the user's own reference genome, and click "save" to upload the mung bean reference genome sequence. Then click "start a new design" under the "my design" option to enter the primer design page. On that page, select "store" under "Select genome to use", select the mung bean reference genome sequence uploaded in the previous step, and select "DNA Hotspot designs" (single-pool) under "Application type". Next, click the "add targets" button, enter the start and stop positions of each candidate polymorphic molecular marker site in the new interface, and click "Submit targets" to begin primer design. After primer design is complete, check whether the 3' end of each primer contains a low-complexity sequence, including runs of A, T, C or G and ATATAT-like sequences. If so, set the corresponding site on the reference genome to N and redesign the primer. The primer sequences of the molecular marker sites obtained in this example are shown in Table 3.
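The 3'-end check on designed primers can be sketched with regular expressions. The text only says "a plurality" of identical bases and "ATATAT-like" sequences, so the thresholds below (the last 8 bases, a run of 4 or more identical bases, 3 or more AT/TA repeats) are assumptions chosen for illustration, as is the function name.

```python
import re


def has_low_complexity_3prime(primer: str, tail: int = 8,
                              min_homopolymer: int = 4,
                              min_dinuc_repeats: int = 3) -> bool:
    """Flag primers whose 3' end holds a homopolymer run or AT dinucleotide repeats."""
    end = primer.upper()[-tail:]  # primers are written 5' -> 3', so the 3' end is the tail
    # A base followed by at least (min_homopolymer - 1) copies of itself, e.g. AAAA.
    homopolymer = re.search(r"([ACGT])\1{%d,}" % (min_homopolymer - 1), end)
    # At least min_dinuc_repeats consecutive AT or TA units, e.g. ATATAT.
    dinucleotide = re.search(r"(AT|TA){%d,}" % min_dinuc_repeats, end)
    return bool(homopolymer or dinucleotide)
```

A primer flagged by this check would go back through the redesign step described above, with the offending site on the reference set to N.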

TABLE 3 molecular marker primer information

Fifth, verifying polymorphism of molecular marker sites

In addition, 5 officially approved conventional mung bean varieties were selected, 10 seeds were taken from each variety, and total nucleic acid was extracted. A high-throughput sequencing library was then constructed with the amplicon sequencing library construction kit (cat. no. 4475345) produced by Life Technologies (USA). The kit comprises the following reagents: 5× Ion AmpliSeq™ HiFi Mix, FuPa reagent, conversion reagent, sequencing adapter solution and DNA ligase. The library was constructed according to the kit operation manual, Ion AmpliSeq™ Library Preparation (publication number: MAN0006735, version: A.0). The multiplex PCR amplification system was: 2 μl of 10× Ion AmpliSeq™ HiFi Mix, 3 μl of the synthesized mixed multiplex amplification primers, 8 ng of nucleic acid extracted from the mung bean variety, and 15 μl of enzyme-free water. The multiplex PCR program was: 95 ℃ for 2 minutes; (95 ℃ for 10 seconds; 55 ℃ for 45 seconds) × 15 cycles; hold at 10 ℃. Excess primers in the multiplex PCR product were digested with FuPa reagent, and the product was then phosphorylated. The phosphorylated amplification product was ligated to the sequencing adapter as follows: 1.5 μl of DNA ligase, 2 μl of conversion reagent and 2 μl of sequencing adapter solution were added to the amplification product, mixed well and incubated at 22 ℃ for 30 minutes; the product was purified by ethanol precipitation and diluted to 15 ng/ml, giving a sequencing library with a concentration of 100 pM. High-throughput sequencing was performed on an Ion Torrent S5.
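The dilution figures can be cross-checked with a short mass-to-molarity conversion. The sketch assumes the commonly used average of about 660 g/mol per base pair of double-stranded DNA. The library fragment length is not stated in the text, so the code only derives the length that the two stated numbers imply.

```python
def dsdna_molarity_pm(conc_ng_per_ml: float, length_bp: float,
                      g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA mass concentration (ng/ml) to picomolar for a given fragment length."""
    grams_per_litre = conc_ng_per_ml * 1e-9 / 1e-3  # ng/ml -> g/L
    mol_per_litre = grams_per_litre / (g_per_mol_per_bp * length_bp)
    return mol_per_litre * 1e12  # mol/L -> pM


def implied_length_bp(conc_ng_per_ml: float, molarity_pm: float,
                      g_per_mol_per_bp: float = 660.0) -> float:
    """Average fragment length (bp) at which the mass and molar concentrations agree."""
    grams_per_litre = conc_ng_per_ml * 1e-9 / 1e-3
    return grams_per_litre / (molarity_pm * 1e-12) / g_per_mol_per_bp


# 15 ng/ml and 100 pM together imply an average library fragment of about 227 bp.
length = implied_length_bp(15.0, 100.0)
```

Under the 660 g/mol assumption, the stated 15 ng/ml and 100 pM are mutually consistent for a library of roughly 227 bp including adapters.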

The reads obtained were aligned to the reference genome by the method of the third step, and the genotype of each sequencing fragment was analyzed within a window defined as the region delimited on the reference genome by each primer pair, using the analysis method of 3.1-3.3. The differences among the five varieties were analyzed at each site. The results show that any two of the varieties can be distinguished by the 30 molecular markers screened in this example, which display extremely high polymorphism and therefore provide sufficient resolution for distinguishing mung bean varieties.
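The claim that any two varieties can be distinguished can be checked mechanically once each variety has a genotype call at every marker. The sketch below assumes one call per variety per marker, stored as a sequence in a fixed marker order. The variety names and genotype strings are made-up placeholders, not data from this example.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def differing_markers(profile_a: Sequence[str], profile_b: Sequence[str]) -> int:
    """Number of markers at which two varieties carry different genotypes."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def indistinguishable_pairs(profiles: Dict[str, Sequence[str]]) -> List[Tuple[str, str]]:
    """Variety pairs with identical genotypes at every marker (empty if all are distinguishable)."""
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if differing_markers(profiles[a], profiles[b]) == 0]


# Placeholder profiles over two markers, for illustration only.
toy_profiles = {
    "variety_1": ["RRRR", "TRRR"],
    "variety_2": ["RRRR", "RRRR"],
    "variety_3": ["RRRR", "TRRR"],
}
unresolved = indistinguishable_pairs(toy_profiles)
```

In the toy data, `variety_1` and `variety_3` share both genotypes and would need an additional marker to be told apart. An empty result corresponds to the outcome reported above.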

The invention can be applied to the development of high-polymorphism molecular markers in various species; only the sample collection method needs slight changes between applications, so the invention is broadly applicable. The invention overcomes two limitations of existing methods: that only one high-polymorphism site can be developed at a time, and that the developed sites cannot be detected together in a single amplification. It provides a new method for high-throughput development of high-polymorphism molecular marker sites and for simultaneous detection of a plurality of such sites by high-throughput sequencing in practical applications. It is simple, rapid and convenient, and is clearly inventive.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for screening high-polymorphism molecular marker loci of mung beans, characterized by comprising the following steps:

1) equivalently mixing mung bean seeds of different varieties, extracting total nucleic acid, and constructing a high-throughput sequencing library;

2) sequencing the high-throughput sequencing library with high coverage by adopting a high-throughput sequencing method to obtain a sequencing fragment; and comparing the sequencing result to the reference genome of the mung bean; obtaining all mutation sites according to the comparison result;

3) obtaining the combination number of the mutation sites of each window according to window translation, and converting the number of sequencing fragments of each combination into the percentage of all sequencing fragments on the window; calculating the polymorphism of each window, and evaluating whether the window is in a single copy region to obtain a candidate site;

4) searching conserved regions at two sides of the candidate site, and designing a molecular marker multiple amplification primer in the conserved regions;

5) screening the multiple amplification primers to obtain new high-polymorphism molecular marker loci;

in the step 1), the equivalent mixing is as follows: the molar ratio of the genome among each variety is (0.9-1) to (1-1.2);

in the step 2), the step of comparing the sequencing result to the reference genome of the mung bean comprises the following steps: spreading the sequencing fragments as units, and marking each base site and mutation type of each sequencing fragment, which are different from the reference genome; defining reads with the same mutant base site and mutation type as one genotype in the window, and obtaining all candidate genotypes of the window; ignoring all insertion, deletion variations, and variation sites in simple repeat regions;

in the step 3), calculating the polymorphism of each window, including: the window length is set to L; counting the number of sequencing fragments of each genotype, and dividing the number by the total number of sequencing fragments completely covering the window to obtain the frequency P of the genotype; calculating the polymorphism index D of the window according to P, wherein the formula is as follows:

wherein p_i is the frequency of the i-th genotype of the window; threshold value of the polymorphism index: according to the result of the polymorphism index formula, candidate windows with a D value less than 0.2 are discarded;

in the step 4), designing a molecular marker multiplex amplification primer in the conserved region, which comprises: for a sliding window with the length of L, the starting position of the window on the genome is set as S, the ending position of the window is set as E, the left-side conserved region of the window is defined as continuous n base sites of which the coordinate position is less than S and no variation is found in sequencing data and existing data, and the right-side conserved region of the window is defined as continuous n base sites of which the coordinate position is greater than E and no variation is found in the sequencing data and existing knowledge;

in the step 4), the 3' end of the primer designed in the conserved region does not contain low-complexity sequences.

2. The method for screening mung bean high polymorphism molecular marker loci according to claim 1, characterized in that: in the step 1), the number of the mung bean varieties is more than 2.

3. The method for screening mung bean high polymorphism molecular marker loci according to claim 1, characterized in that: in the step 2), the sequencing with high coverage degree comprises the following steps: obtaining nucleic acid sequencing fragments capable of covering genome, wherein the number of the sequencing fragments is higher than that of the subclasses.