CN116804231A

CN116804231A - DNA bar code, primer, kit, method and application

Info

Publication number: CN116804231A
Application number: CN202310827017.3A
Authority: CN
Inventors: 徐平; 施佳辉; 唐蜀昆; 张亚峰; 高慧英; 高媛; 曾新生; 邵爱菊; 潘淑康; 蒋洁琳; 陈川龙; 贾鳗
Original assignee: Menghai Tea Industry Co ltd; Yunnan Dayi Microbial Technology Co ltd
Current assignee: Menghai Tea Industry Co ltd; Yunnan Dayi Microbial Technology Co ltd
Priority date: 2023-07-06
Filing date: 2023-07-06
Publication date: 2023-09-26

Abstract

The invention provides a DNA barcode, a primer, a kit, a method and applications thereof. Using proteogenomics, the invention selects the coding gene sequence of the OST protein as a DNA barcode, which enables rapid identification and differentiation of strains within the species Rasamsonia emersonii. The invention thereby establishes a standard gene sequence and a sample identification method for Rasamsonia emersonii strain TMCC 70008, which is used in the industrial fermentation production of Pu'er tea. Compared with traditional morphological identification, the method is universal, easy to amplify and easy to align, and it markedly improves identification efficiency. It therefore provides a powerful technical means for protecting the controllable pure-culture fermentation process and for discovering, protecting and utilizing microbial strain resources in the Pu'er tea fermentation industry.

Description

DNA bar code, primer, kit, method and application

Technical Field

The invention belongs to the field of species and strain identification, and particularly relates to a DNA barcode, a DNA barcode primer composition, a kit, a method and applications for identifying a strain used in the fermentation production of Pu'er tea. Specifically, the Pu'er tea fermentation production strain is Rasamsonia emersonii strain TMCC 70008.

Background

Pu'er tea is a post-fermented tea produced within its protected geographical indication area in Yunnan, made from sun-dried green tea of the large-leaf variety through a series of processes. In the traditional manufacturing process, the picked fresh leaves are rolled and dried to produce sun-dried green tea as raw material, which then undergoes impurity removal, moistening, pile fermentation, airing, screening, compression molding and packaging before leaving the factory. Pile fermentation is the main factor in the formation of Pu'er tea quality: during this step the contents of components such as tea polyphenols, caffeine and some polysaccharides change greatly, giving Pu'er tea its characteristic flavor, taste, quality and various health effects.

In traditional Pu'er tea production, the moist, hot environment activates enzymes in the tea leaves, which convert part of the leaf components into substances that microorganisms can use. The microorganisms then grow in large numbers during fermentation and produce abundant intracellular and extracellular enzymes, and these enzymes catalyze a series of conversions of the tea components, giving Pu'er tea its unique quality. Differences in producing area, microbial species and community structure account for the distinctive flavor and quality of individual Pu'er teas.

Besides its unique flavor and culture, Pu'er tea has health effects such as aiding weight loss, lowering blood sugar and blood lipids, preventing and improving cardiovascular disease, anti-aging, anti-cancer and anti-inflammatory effects, aiding digestion and nourishing the stomach, and it has therefore attracted wide attention and become popular with consumers. Growing market demand drives the development of the Pu'er tea industry and promotes economic growth in Yunnan.

With rising living standards, consumers pay increasing attention to food hygiene and safety. Food quality and safety incidents have nevertheless occurred frequently in recent years, and the quality and safety of tea have drawn growing concern. In addition to pesticide residues, microorganisms in the Pu'er tea pile fermentation process may be an important factor affecting tea quality, and the Pu'er tea industry therefore faces a number of problems, for example unstable product quality, long production cycles, high labor input, microbial counts exceeding standards, and mite infestation.

At present, most manufacturers still produce Pu'er tea by experience-based, semi-natural artificial pile fermentation. Although the microbial communities in pile fermentation, dominated by common microorganisms including Rasamsonia emersonii, are relatively stable, there is considerable room for improvement in product stability, and certain safety hazards inevitably arise during production. To win further consumer favor and market acceptance, break through foreign trade barriers and improve the market competitiveness of Pu'er tea enterprises, Pu'er tea production must become manually controllable, clean and efficient, with continuous product development and extension of the industry chain. This requires new technology: a series of safe, clean, efficient and controllable automated Pu'er tea processes that ensure the healthy development of the Pu'er tea industry and bring long-term benefits to the country and its people.

Artificial inoculation and pure-culture fermentation are new development directions for the controllable fermentation of Pu'er tea. To protect the Pu'er tea fermentation process and high-quality microbial germplasm resources, a rapid and accurate method for identifying Pu'er tea fermentation strains needs to be developed.

DNA barcode technology can rapidly and simply identify and distinguish similar strains. It can provide a theoretical basis and technical means for developing the artificially controllable pure-culture fermentation process and for protecting resources, and thus promote the healthy development of the Pu'er tea industry.

Disclosure of Invention

To overcome the shortcomings of morphological identification of Pu'er tea fermentation production strains, the invention provides a DNA barcode, a primer, a kit, a method and applications for identifying the Pu'er tea fermentation strain Rasamsonia emersonii TMCC 70008. With these, the TMCC 70008 strain can be accurately identified among easily confused species or species complexes, and Pu'er tea produced by fermentation with this strain can likewise be accurately identified. This enables rapid identification and differentiation, provides rapid identification and assessment for new Pu'er tea fermentation processes, helps prevent interference from contaminating microorganisms during fermentation, and provides a means of evidence and a basis for addressing misuse of the artificially controllable Pu'er tea fermentation process and of the strain.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

(1) A DNA barcode for identifying a Rasamsonia emersonii strain or Pu'er tea produced by fermentation thereof, characterized in that the DNA barcode is derived from the genome of Rasamsonia emersonii strain TMCC 70008 and is selected from a sequence of at least 100bp in the DNA sequence shown in SEQ ID No.4.

(2) The DNA barcode according to (1), wherein the nucleotide sequence of the DNA barcode comprises a sequence shown as SEQ ID No.1, SEQ ID No.2 or SEQ ID No. 7; alternatively, the DNA barcode is selected from the group consisting of the sequences shown in the DNA sequences of SEQ ID No.1, SEQ ID No.2, or SEQ ID No. 7; preferably, the nucleotide sequence of the DNA bar code is shown as SEQ ID No.1, SEQ ID No.2, SEQ ID No.4 or SEQ ID No.7.

(3) A primer pair for amplifying the DNA barcode of (1) or (2).

(4) The primer pair according to (3), wherein the nucleotide sequence of the forward primer is identical to a sequence in the genome of Rasamsonia emersonii strain TMCC 70008 that lies in the region from position 1 to position 2296 of the nucleotide sequence shown as SEQ ID No.4, and the length of the forward primer is 15-30bp; and the reverse primer is the reverse complement of a sequence in the genome of the TMCC 70008 strain that lies in the region from position 86 to the last position of the nucleotide sequence shown as SEQ ID No.4, and the length of the reverse primer is 15-30bp.

(5) The primer set according to (4), wherein the nucleotide sequences of the forward primer and the reverse primer are as follows:

forward primer: 5'-ATATAAAAGCCTCTAGGGTGCC-3';

reverse primer: 5'-CACAACAAGCCTGCCTACC-3'.

(6) A kit for identifying Rasamsonia emersonii strain TMCC 70008 or Pu'er tea produced by fermentation thereof, comprising the primer pair according to any one of (3) to (5).

(7) A method of identifying Rasamsonia emersonii strain TMCC 70008, comprising the steps of:

a) Providing genomic DNA of a strain to be tested;

b) Performing PCR amplification using the genomic DNA of step a) as a template and the primer set according to any one of (3) to (5) to obtain a PCR product;

c) Detecting the PCR product by electrophoresis; if no target band is present, judging that the strain to be tested is not Rasamsonia emersonii strain TMCC 70008; if the target band is present, performing step d);

d) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested, and comparing it for homology with the nucleotide sequence of the DNA barcode according to (1) or (2); if the homology is more than 99%, judging that the strain to be tested is Rasamsonia emersonii strain TMCC 70008.

(8) A method for identifying Pu'er tea produced by fermentation with Rasamsonia emersonii strain TMCC 70008, comprising the steps of:

a) Providing a puer tea sample;

b) Extracting genome DNA of a microorganism strain from the puer tea sample;

c) Performing PCR amplification using the genomic DNA of step b) as a template and the primer set according to any one of (3) to (5) to obtain a PCR product;

d) Detecting the PCR product by electrophoresis; if no target band is present, judging that the Pu'er tea is not Pu'er tea produced by fermentation with Rasamsonia emersonii strain TMCC 70008; if the target band is present, performing step e);

e) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested, and comparing it for homology with the nucleotide sequence of the DNA barcode according to (1) or (2); if the homology is more than 99%, judging that the Pu'er tea is Pu'er tea produced by fermentation with Rasamsonia emersonii strain TMCC 70008.

(9) Use of the DNA barcode according to (1) or (2) for identifying Rasamsonia emersonii strain TMCC 70008 or Pu'er tea produced by fermentation thereof.

(10) Use of the primer pair according to any one of (3) to (5) for identifying Rasamsonia emersonii strain TMCC 70008 or Pu'er tea produced by fermentation thereof.

(11) Use of the kit according to (6) for identifying Rasamsonia emersonii strain TMCC 70008 or Pu'er tea produced by fermentation thereof.

Compared with the prior art, the invention has the following advantages and positive effects:

1. The invention uses proteogenomics to find a peptide missing from the annotation of the genome of Rasamsonia emersonii strain TMCC 70008. Based on the position of the peptide coding sequence in the genome, the possible gene sequence and protein sequence encoding the peptide are determined. Through careful research and comparative analysis, the invention further develops a DNA barcode based on the gene encoding the peptide (the OST gene). The barcode sequence enables rapid identification and differentiation of strains within Rasamsonia emersonii, can accurately identify strain TMCC 70008 among easily confused species or species complexes, and can in turn accurately identify Pu'er tea produced by fermentation with this strain.

2. The invention further finds that the sequence of the OST gene (e.g., SEQ ID No. 1) is universal, easy to amplify and easy to align, and that, compared with other genes, it differs markedly between strains within Rasamsonia emersonii.

3. The invention establishes a standard gene sequence and a sample identification method for Rasamsonia emersonii strain TMCC 70008, used in the industrial fermentation production of Pu'er tea. Compared with traditional morphological identification, the method markedly improves the efficiency of identifying the target strain. It places low demands on sample integrity, and its identification index can be quantified, providing an effective basis for timely assessment of the Pu'er tea fermentation process and its germplasm resources. In addition, morphologically confusable species were included in the analysis; the reliability and accuracy of identification are better than those of conventional molecular identification methods, and the invention fills the gap in DNA-barcode-based identification of Rasamsonia emersonii strains used to produce fermented Pu'er tea.

Drawings

FIG. 1 shows a mass spectrum of the newly identified peptide YAPIDLDDTMYDELTSAPR.

FIG. 2 shows a comparison of the mass spectrum of the chemically synthesized peptide YAPIDLDDTMYDELTSAPR with that of the originally identified peptide, i.e. the peptide obtained by mass spectrometry analysis and identification; the upper part of the figure is the mass spectrum of the originally identified peptide, and the lower part is that of the chemically synthesized peptide.

FIG. 3 shows the sequence of SEQ ID No.1, wherein the gray background part is an intron, the start site is ATG and the termination site is TAG.

FIG. 4 shows the correspondence of SEQ ID No.1 and the amino acid sequence of the protein encoded thereby (SEQ ID No. 3), wherein the grey part is the newly identified peptide YAPIDLDDTMYDELTSAPR.

FIG. 5 shows the results of homology analysis of SEQ ID No.3 by NCBI-BLASTP. Each line segment under the query result represents a sequence that has a certain similarity to the target sequence that is matched in NCBI.

FIG. 6 shows the BLASTP homology comparison of SEQ ID No. 3.

FIG. 7 shows the results of homology analysis of SEQ ID No.1 by NCBI-BLASTN. The line segment on the left below the query represents the match to the Rasamsonia emersonii CBS 393.64 oligosaccharyl transferase subunit mRNA; the line segment on the right represents the match to the Aspergillus pseudotamarii CBS 117625 unknown protein (BDV 38 DRAGT_ 260987) mRNA.

FIG. 8 shows the results of homology analysis of SEQ ID No.4 by NCBI-BLASTN. Each line segment under the query result represents a sequence that has a certain similarity to the target sequence that is matched in NCBI.

FIG. 9 shows the results of homology analysis of SEQ ID No.2 by NCBI-BLASTN. The line segment on the left below the query represents the match to the Rasamsonia emersonii CBS 393.64 oligosaccharyl transferase subunit mRNA; the line segment on the right represents the match to the Aspergillus pseudotamarii CBS 117625 unknown protein (BDV 38 DRAGT_ 260987) mRNA.

FIG. 10 shows the result of agarose gel electrophoresis of the products obtained by PCR amplification using the primers designed for the DNA barcode of Rasamsonia emersonii strain TMCC 70008 of the present invention.

FIG. 11 shows the alignment of the sequencing result of the PCR product of Rasamsonia emersonii strain TMCC 70008 with the theoretical sequence (SEQ ID No. 7).

FIG. 12 shows an NJ phylogenetic tree constructed from the DNA bar code (SEQ ID No. 7) and the sequencing results of the PCR products of each strain to be identified.

Detailed Description

The invention is further described below by means of the description of specific embodiments and with reference to the accompanying drawings, which are not intended to be limiting, but a person skilled in the art can make various modifications or improvements according to the basic idea of the invention, all without departing from the scope of the invention.

As used herein, the term "missed annotation" refers to a gene or protein that gene prediction software (e.g., GeneMark, Augustus, Glimmer) fails to predict after the genome of a species has been sequenced; such genes or proteins are usually not expressed at high levels under specific conditions and are therefore difficult to find in research.

The term "DNA barcoding" refers to a novel technique for molecular identification of species using a standard, short DNA fragment within the genome, which allows rapid and accurate species identification.

The term "six-frame translation" is a known term in proteomics and genomics, and is based on the principle that when a DNA encodes a protein, the triplet codon is used to encode the protein, and given a DNA sequence, there are 3 encoding possibilities, plus 3 encoding possibilities on its complementary strand, for a total of 6 encoding possibilities (+1, +2, +3, -3, -2, -1).
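The six coding possibilities described above can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code; the function names and the toy input sequence are hypothetical and are not taken from the patent.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(dna):
    """Return the translations of frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy input (hypothetical, not a sequence from the patent); '*' marks a stop codon.
frames = six_frame_translate("ATGGCCTAA")
for name in ("+1", "+2", "+3", "-1", "-2", "-3"):
    print(name, frames[name])
```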

The invention uses systematic proteogenomics to find, in Rasamsonia emersonii strain TMCC 70008, a protein encoded by a species-specific gene that is difficult for conventional gene prediction software to find. The peptide of this gene product is supported by accurate and reliable proteomic mass spectrometry data. Based on the position of the peptide coding sequence in the genome, the possible gene sequence and protein sequence encoding the peptide were determined; the gene encodes a novel oligosaccharyl transferase (OST). The invention finds that the coding frame of the OST gene (SEQ ID No. 1) can distinguish strain TMCC 70008 from easily confused species, so it can be used to develop a DNA barcode for identifying Rasamsonia emersonii strain TMCC 70008, the strain used in industrial Pu'er tea fermentation. Compared with the prior art, the DNA barcode obtained in this way has higher specificity.

Through careful research and comparative analysis, the invention further obtains, on the basis of the above specific DNA sequence (SEQ ID No. 1), DNA barcodes that accurately and effectively identify Rasamsonia emersonii strain TMCC 70008.

Specifically, through systematic proteomics research, the invention discovered a peptide that is not covered by the originally annotated genes of Rasamsonia emersonii strain TMCC 70008. The sequence of the peptide is YAPIDLDDTMYDELTSAPR, and it is supported by accurate and reliable proteomic mass spectrometry data. Based on the position of the peptide coding sequence in the genome of strain TMCC 70008, the possible gene sequence (SEQ ID No. 1) and protein sequence (SEQ ID No. 3) were determined. SEQ ID No.1 is 1163bp long and, according to comparison and analysis, is unique to the genome of Rasamsonia emersonii strain TMCC 70008, so the sequence can be used as a DNA barcode for that strain.

In addition, comparative analysis and experimental verification show that a longer DNA barcode comprising this sequence retains its specificity. When a DNA barcode is too long (e.g., greater than 4000 bp), however, it is less suitable for amplification. Accordingly, starting from the gene sequence, the invention identified longer, highly specific DNA barcode sequences (SEQ ID No.2 and SEQ ID No. 4) in the genome of Rasamsonia emersonii strain TMCC 70008. On the other hand, a DNA barcode sequence of 100bp to 200bp can be used as a DNA mini-barcode. Comparison and experimental verification show that a sequence of at least 100bp within the DNA sequence shown as SEQ ID No.4 (such as SEQ ID No. 7) is highly specific and enables rapid and accurate identification and differentiation of Rasamsonia emersonii strain TMCC 70008.

Based on the above findings, according to one aspect of the present invention, there is provided a DNA barcode for identifying a Rasamsonia emersonii strain or Pu'er tea produced by fermentation thereof, characterized in that the DNA barcode is derived from the genome of Rasamsonia emersonii strain TMCC 70008 and is selected from a sequence of at least 100bp in the DNA sequence shown as SEQ ID No.4.

In some embodiments, the DNA barcode is selected from a sequence of at least 500bp, 600bp, 700bp, 800bp, 900bp, 1000bp, 1100bp, 1200bp, 1300bp, 1400bp, 1500bp, 1600bp, 1700bp, 1800bp, 1900bp, 2000bp, 2100bp, 2200bp, or 2300bp in the DNA sequence shown as SEQ ID No.4.

In other embodiments, the DNA barcode is selected from the group consisting of at least 126bp, 130bp, 140bp, 150bp, 160bp, 170bp, 180bp, 190bp, or 200bp of the DNA sequence shown as SEQ ID No.4, and comprises the sequence shown as SEQ ID No.7.

In some preferred embodiments, the DNA barcode is selected from the sequences set forth in the DNA sequences set forth in SEQ ID No.1, SEQ ID No.2, or SEQ ID No.7. Preferably, the DNA barcode is selected from the group consisting of at least 500bp, 600bp, 700bp, 800bp, 900bp, 1000bp, 1100bp, or 1200bp of the DNA sequence shown in SEQ ID No.1 or SEQ ID No.2.

In other preferred embodiments, the nucleotide sequence of the DNA barcode comprises a sequence as set forth in SEQ ID No.1, SEQ ID No.2, or SEQ ID No.7.

In particular, the present invention has found that the DNA sequence shown as SEQ ID No.7, which is contained in SEQ ID No.4 in the genome of Rasamsonia emersonii strain TMCC 70008, has particularly high specificity both between species and between strains within Rasamsonia emersonii. Since no homologous sequence for this fragment was found in the NCBI database, it is suitable as a mini-barcode for identifying Rasamsonia emersonii strain TMCC 70008.

Preferably, the DNA barcode sequence according to the present invention is shown as SEQ ID No.1, SEQ ID No.2, SEQ ID No.4 or SEQ ID No.7.

These DNA sequences can distinguish Rasamsonia emersonii strain TMCC 70008 from easily confused species, so they can be used to develop a DNA barcode for identifying this industrial Pu'er tea fermentation strain. Compared with the prior art, the DNA barcode obtained in this way has higher specificity.

The sequences of SEQ ID No.1, SEQ ID No.2, SEQ ID No.3, SEQ ID No.4 and SEQ ID No.7 are shown below:

SEQ ID No.1：

wherein the gray background portion is an intron, the start site is ATG and the termination site is TAG.

SEQ ID No.2：

wherein the double-underlined part is an intron, the gray part is SEQ ID No.1, the start site is ATG and the termination site is TAG.

SEQ ID No.3：

MKFLSCFITLLCTAGIALSAEPAIDKFHKFQSLSRYAPIDLDDTMYDELTSAPRDYYVAILLTALEARYGCILCREFQSEWELIAKSWNKANQPDGIKLLFGTLDFSNGRNTFQKLMLQTAPIVLLFPPTVGPSATLDGAPVRFDFSGPISADQLYVWMNRHLPEGPKPPLVRPINYMRLVSAITILLGLITLFTVSSPYVLPVVQNRNLWAAISLIAILLFTSGHMFNHIRKVPYVAGDGRGGISYFAGGFSNQFGMETQIIAAICKFSLELKLSLKIGIGSNHLLIHPDGILSFATIALALKVPRMADAKAQQLAVIVWGVVLLGMYSFLLSIFRMKNGGYPFFLPPF

Peptide YAPIDLDDTMYDELTSAPR is located at the N-terminus of the protein encoded by the gene, as shown underlined in the sequence of SEQ ID No. 3.
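The position of the peptide within SEQ ID No.3 can be checked with a simple substring search. The sketch below uses only the first residues of SEQ ID No.3 as printed above; the helper name `locate` is hypothetical.

```python
# First 61 residues of SEQ ID No.3, copied from the listing above.
SEQ3_PREFIX = "MKFLSCFITLLCTAGIALSAEPAIDKFHKFQSLSRYAPIDLDDTMYDELTSAPRDYYVAILL"
PEPTIDE = "YAPIDLDDTMYDELTSAPR"


def locate(protein, peptide):
    """Return the 1-based (start, end) of the first match, or None."""
    start = protein.find(peptide)
    if start == -1:
        return None
    return start + 1, start + len(peptide)


position = locate(SEQ3_PREFIX, PEPTIDE)
print(position)  # 1-based start and end residue of the peptide
```

Run on this prefix, the peptide spans residues 36 to 54, i.e. it lies near the N-terminus rather than at the very first residue.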

SEQ ID No.4：

Wherein the grey part is SEQ ID No.1, double underlines are introns in the gene, and single underlines are SEQ ID No.7.

The invention also designs an amplification primer pair based on the nucleotide sequence of the barcode. By primer amplification, Rasamsonia emersonii strain TMCC 70008 can be rapidly and accurately identified from the presence or absence of a PCR product, differences in the amplified fragment, and optionally a phylogenetic tree.

Thus, according to another aspect of the present invention, there is provided a primer pair for amplifying a DNA barcode according to the present invention.

It will be appreciated by those skilled in the art that, given the DNA barcode sequences provided herein for identifying Rasamsonia emersonii strains, corresponding primer pairs can readily be designed to amplify the desired DNA barcodes.

Preferably, the nucleotide sequence of the forward primer of the primer pair is identical to a sequence in the genome of Rasamsonia emersonii strain TMCC 70008 that lies in the region from position 1 to position 2296 of the nucleotide sequence shown as SEQ ID No.4, and the length of the forward primer is generally 15-30bp; the reverse primer is the reverse complement of a sequence in the genome of the TMCC 70008 strain that lies in the region from position 86 to the last position of the nucleotide sequence shown as SEQ ID No.4, and the length of the reverse primer is also generally 15-30bp. The product amplified by the forward and reverse primers is a sequence of at least 100bp selected from the nucleotide sequence shown in SEQ ID No.4.

In a more preferred embodiment, the primer pair is used to amplify a sequence selected from the nucleotide sequences set forth in SEQ ID No.1, 2, or 7.

In some embodiments, the nucleotide sequences of the forward and reverse primers are shown below, respectively:

OST-F: 5'-ATATAAAAGCCTCTAGGGTGCC-3' (SEQ ID No.5);

OST-R: 5'-CACAACAAGCCTGCCTACC-3' (SEQ ID No.6).

The primer pair of the invention enables specific amplification of the DNA barcode sequence.
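How a primer pair delimits an amplicon can be illustrated with a minimal exact-match "in silico PCR" sketch. The primer sequences are those of SEQ ID No.5 and SEQ ID No.6; the template is a hypothetical toy sequence (the barcode sequences themselves are not reproduced in this text), and the function names are assumptions made for illustration. Real primer binding tolerates mismatches and depends on annealing conditions, which this sketch ignores.

```python
FWD = "ATATAAAAGCCTCTAGGGTGCC"  # OST-F, SEQ ID No.5
REV = "CACAACAAGCCTGCCTACC"     # OST-R, SEQ ID No.6


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primer_length_ok(primer, low=15, high=30):
    """Check the 15-30 nt primer length range stated in the description."""
    return low <= len(primer) <= high


def in_silico_pcr(template, fwd, rev):
    """Return the exact-match amplicon on the forward strand, or None."""
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the forward strand at its reverse complement.
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return template[start:rev_site + len(rev)]


# Hypothetical toy template: flank + forward site + filler + reverse site + flank.
toy_template = "GGG" + FWD + "ACGT" * 10 + reverse_complement(REV) + "TTT"
amplicon = in_silico_pcr(toy_template, FWD, REV)
print(len(FWD), len(REV), len(amplicon))
```

A template lacking either primer site returns `None`, which corresponds to the "no target band" outcome in the electrophoresis step.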

The invention also provides a kit for identifying Rasamsonia emersonii strain TMCC 70008 or Pu'er tea produced by fermentation thereof, which comprises the primer pair.

In another embodiment, the kit further comprises a DNA barcode according to the invention. The DNA barcode may be present on a recording medium. The recording medium is, for example, an optical disc. The kit may also comprise any means and reagents for experimental manipulation.

In yet another aspect, the present invention provides a method for identifying Rasamsonia emersonii strain TMCC 70008, comprising the steps of:

a) Providing genomic DNA of a strain to be tested;

b) Using the genome DNA of the step a) as a template, and carrying out PCR amplification by using the primer pair to obtain a PCR product;

c) Detecting the PCR product by electrophoresis (e.g., agarose gel electrophoresis); if there is no target band, determining that the strain to be tested is not Rasamsonia emersonii strain TMCC 70008, and if there is a target band, performing step d);

d) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested, and comparing it for homology with the nucleotide sequence of the DNA barcode; if the homology is more than 99% (preferably 100%), judging that the strain to be tested is Rasamsonia emersonii strain TMCC 70008.

The terms "identity" and "homology" as used herein have the same meaning and are used interchangeably.

The invention also provides a method for identifying Pu'er tea produced by fermentation with Rasamsonia emersonii strain TMCC 70008, which comprises the following steps:

a) Providing a puer tea sample;

b) Extracting genome DNA of a microorganism strain from the puer tea sample;

c) Taking the genome DNA in the step b) as a template, and carrying out PCR amplification by using the primer pair to obtain a PCR product;

d) Detecting the PCR product by electrophoresis (e.g., agarose gel electrophoresis); if there is no target band, determining that the Pu'er tea is not Pu'er tea produced by fermentation with Rasamsonia emersonii strain TMCC 70008; if the target band is present, performing step e);

e) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested, and comparing it for homology with the nucleotide sequence of the DNA barcode; if the homology is more than 99% (preferably 100%), judging that the Pu'er tea is Pu'er tea produced by fermentation with Rasamsonia emersonii strain TMCC 70008.

In step b), genomic DNA of the microorganism strain may be directly extracted from the puer tea sample, or the microorganism strain may be first isolated from puer tea, and genomic DNA may be extracted from the isolated microorganism strain.

In a specific embodiment of the method for identifying strains or Pu'er tea of the present invention, the PCR amplification program is: 1) pre-denaturation at 94 °C for 5 minutes; 2) denaturation at 94 °C for 30 seconds, annealing at 58 °C for 30 seconds and extension at 72 °C for 15 seconds, for 30 cycles; 3) final extension at 72 °C for 10 minutes.
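As a quick worked check of this program, the total programmed hold time can be added up. The sketch below encodes only the temperatures and durations stated above; the variable names are hypothetical, and ramp times between temperatures (which depend on the thermocycler) are not included.

```python
# (step name, temperature in deg C, hold time in seconds), as stated in the text
PRE_DENATURATION = ("pre-denaturation", 94, 5 * 60)
CYCLE_STEPS = [
    ("denaturation", 94, 30),
    ("annealing", 58, 30),
    ("extension", 72, 15),
]
N_CYCLES = 30
FINAL_EXTENSION = ("final extension", 72, 10 * 60)


def total_hold_seconds():
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return PRE_DENATURATION[2] + N_CYCLES * per_cycle + FINAL_EXTENSION[2]


total = total_hold_seconds()
print(total, "s =", total / 60, "min")  # 300 + 30 * 75 + 600 seconds
```

The programmed holds therefore add up to 3150 seconds, or 52.5 minutes, before ramping is taken into account.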

In yet another embodiment of the present invention, the method may further comprise performing a cluster analysis (e.g., a phylogenetic tree) of the sequenced nucleotide sequence to be tested together with the DNA barcode of the present invention; if the sequence to be tested clusters with the DNA barcode, the strain to be tested is determined to be Rasamsonia emersonii strain TMCC 70008, or the Pu'er tea to be tested is determined to be Pu'er tea produced by fermentation with that strain. For example, the DNA barcode sequence of strain TMCC 70008 and the sample sequences of the strains to be tested (which may include sequences of other strains within Rasamsonia emersonii) are combined, an NJ phylogenetic tree is constructed with the MEGA5 software, and the strain to be tested is identified according to whether its sequence clusters with the DNA barcode sequence.

In a specific embodiment of the invention, genomic DNA extracted from the strain to be identified is PCR-amplified using the primer pair of the invention and then analyzed by agarose gel electrophoresis. Strains are first screened by the presence or absence of a PCR product: if no target band is amplified, the strain is not TMCC 70008; if the target band is amplified, the strain may be TMCC 70008. For further identification, the PCR product is sequenced and the sequencing result is compared with the DNA barcode sequence to obtain the similarity (i.e., homology) between the sequences. If the sequence homology is less than 99%, the strain to be tested is judged not to be Rasamsonia emersonii strain TMCC 70008; if it is greater than or equal to 99%, the strain is judged to be Rasamsonia emersonii strain TMCC 70008.
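The two-stage decision described here (band present, then sequence identity against the barcode) can be sketched as follows. This is a simplified illustration, not the alignment procedure used in the patent: it assumes the two sequences are already aligned and of equal length with no gaps, the sequences are hypothetical toy data, and the 99% cut-off is applied as "greater than or equal to", following this embodiment.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def classify(band_present, test_seq=None, barcode=None, threshold=99.0):
    """Apply the band check first, then the identity threshold."""
    if not band_present:
        return "not TMCC 70008"
    identity = percent_identity(test_seq, barcode)
    return "TMCC 70008" if identity >= threshold else "not TMCC 70008"


# Hypothetical 100 nt toy barcode and two toy test sequences.
barcode = "ACGT" * 25
one_mismatch = "T" + barcode[1:]     # 99 of 100 positions match
two_mismatches = "TT" + barcode[2:]  # 98 of 100 positions match

print(classify(True, one_mismatch, barcode))
print(classify(True, two_mismatches, barcode))
print(classify(False))
```

The `threshold` parameter makes the cut-off explicit; with 100% required, as the text lists as preferred, only a perfect match would be accepted.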

If cluster analysis, such as phylogenetic tree, is performed, the DNA bar code is used to construct NJ phylogenetic tree using MEGA5 software together with the DNA sequencing result (i.e., the sequence to be tested) of each strain to be identified. If the test sequence of the strain to be identified is clustered with the DNA barcode of the strain of Eimeria Sahnsonii TMCC 70008, then the strain of Eimeria Sahnsonii TMCC 70008 is identified.
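Neighbor-joining starts from a matrix of pairwise distances. The sketch below shows only that input, using the simple p-distance (proportion of differing sites) on hypothetical, pre-aligned toy sequences; it does not build the tree, and the distance model actually chosen in MEGA5 is not stated in the text.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(sequences):
    """Return {(name_1, name_2): p-distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(sequences[n1], sequences[n2])
        for n1, n2 in combinations(sequences, 2)
    }


# Hypothetical toy alignment: a barcode and two test sequences.
toy = {
    "barcode": "ACGTACGTAC",
    "sample_A": "ACGTACGTAC",  # identical to the barcode
    "sample_B": "ACGTTCGTAA",  # differs at two of ten sites
}
dist = distance_matrix(toy)
for pair, d in dist.items():
    print(pair, d)
```

In this toy case `sample_A` has zero distance to the barcode and would sit on the same branch, whereas `sample_B` would be placed apart.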

The term "cluster" as used herein means that, after phylogenetic tree analysis, the sequences lie in the same branch and have the same evolutionary distance.
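An NJ tree is built from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such distance (the p-distance, i.e., the proportion of differing sites) and is meant only to illustrate what "same evolutionary distance" means numerically. It does not reproduce MEGA5 or the NJ algorithm itself, and the three short sequences are hypothetical toy data, not sequences from the invention.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return diffs / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Symmetric matrix of pairwise p-distances, keyed by (name, name)."""
    dist = {(name, name): 0.0 for name in seqs}
    for m, n in combinations(seqs, 2):
        dist[(m, n)] = dist[(n, m)] = p_distance(seqs[m], seqs[n])
    return dist


# Hypothetical toy sequences for illustration only.
toy = {
    "barcode": "ATGCCAGCAT",
    "sample_A": "ATGCCAGCAT",
    "sample_B": "ATGACAGTAT",
}
dist = distance_matrix(toy)
for pair in (("barcode", "sample_A"), ("barcode", "sample_B")):
    print(pair, dist[pair])
```

In this toy case sample_A is at distance 0 from the barcode and would sit on the same branch, whereas sample_B would be separated from it.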

The invention also provides use of the DNA barcode in identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation with the same.

The invention also provides use of the primer pair in identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation with the same.

The invention also provides use of the kit in identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation with the same.

Examples

The invention will be further illustrated with reference to specific examples. The methods used in the examples, unless specifically indicated, all employ conventional methods and known tools.

Example 1: Acquisition of the OST gene and DNA barcode

1. Using high-coverage proteome techniques, a deep-coverage proteome study of Rasamsonia emersonii TMCC 70008 was performed with the pFind and pAnno software (pFind was used for searching the proteome database; pAnno performed genome re-annotation by mapping the proteome data back onto the genome), and the annotated coding genes of its genome were verified. To find new protein-coding regions, a six-frame translation database of the Rasamsonia emersonii TMCC 70008 genomic data was built using the six-frame translation (Six Frame Translation) strategy of proteogenomics, which exhausts the 6 coding possibilities of the genome (+1, +2, +3, -1, -2, -3). The resulting nucleic acid sequences are referred to as "six-frame translated nucleic acid sequences" and the protein sequences as "six-frame translated protein sequences". Typically, a six-frame translated nucleic acid sequence is the sequence from one stop codon to the next. Using this database, new peptides and new proteins were identified from high-coverage proteome mass spectrometry data of the total cellular proteins of the TMCC 70008 strain with the pFind and pAnno software.
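The six-frame translation strategy can be illustrated with a minimal sketch: translate the forward strand and its reverse complement in three offsets each, then split every translated frame at stop codons to obtain stop-to-stop segments. The sketch assumes the standard genetic code and uses hypothetical function names and a toy input; it is not the database-construction pipeline actually used with pFind and pAnno.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only are complemented)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(genome: str) -> dict:
    """Return the translations of frames +1, +2, +3, -1, -2, -3."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_segments(protein: str, min_len: int = 1) -> list:
    """Split a translated frame at stop codons into candidate protein segments."""
    return [seg for seg in protein.split("*") if len(seg) >= min_len]


frames = six_frame_translation("ATGGCCTAAATG")  # toy input, not genomic data
for name, protein in frames.items():
    print(name, protein, stop_to_stop_segments(protein))
```

Each stop-to-stop segment becomes one entry of the searchable protein database, which is why a peptide missing from the existing annotation can still be matched.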

A peptide, YAPIDLDDTMYDELTSAPR (SEQ ID No. 8), that is absent from the existing gene annotation of Rasamsonia emersonii TMCC 70008 was identified; its mass spectrum is shown in FIG. 1.

Manual inspection of the mass spectrum showed that, in the secondary mass spectrum (MS2) of peptide YAPIDLDDTMYDELTSAPR, almost all ions of the y-ion series were matched with strong signals, so the result is reliable.

2. To further confirm this identification, the peptide was chemically synthesized according to the amino acid sequence of the newly identified peptide YAPIDLDDTMYDELTSAPR, and the high-energy collision MS2 spectrum generated from the synthesized peptide was used for verification. Both the primary parent ion and the secondary daughter ions agreed with the theoretical values, indicating that the sequence of the synthesized peptide was correct (see FIG. 2).

On this basis, the MS2 spectrum of the synthetic peptide and the spectrum of the new peptide identified from the large-scale proteome data were compared manually. The two spectra were almost completely consistent, and the cosine similarity calculated from the fragment ions was as high as 0.97, confirming that the identification of the new peptide from the puer tea industrial fermentation strain Rasamsonia emersonii TMCC 70008 was correct.
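A cosine similarity between two fragment-ion spectra can be computed as in the sketch below, assuming each spectrum is represented as a mapping from fragment-ion label to intensity. The ion labels and intensities shown are hypothetical, and the exact peak matching and weighting behind the reported 0.97 are not specified in the text.

```python
import math


def cosine_similarity(spec_a: dict, spec_b: dict) -> float:
    """Cosine of the angle between two intensity vectors over the union of ions."""
    ions = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(ion, 0.0) * spec_b.get(ion, 0.0) for ion in ions)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as having no similarity
    return dot / (norm_a * norm_b)


# Hypothetical intensities for a few y ions, for illustration only.
identified = {"y3": 120.0, "y4": 340.0, "y5": 510.0, "y6": 95.0}
synthetic = {"y3": 110.0, "y4": 360.0, "y5": 480.0, "y6": 130.0}
print(round(cosine_similarity(identified, synthetic), 3))
```

A value close to 1 means the relative fragment intensities of the two spectra agree, which is the sense in which the synthetic peptide confirms the identification.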

3. Based on the position of the new peptide, the six-frame translated nucleic acid sequence containing it (ORF coding frame), SEQ ID No.2, was obtained; it is bounded by the preceding stop codon and the following stop codon.

4. To further determine the coding start site and termination site of the coding gene, the six-frame translated nucleic acid sequence was extended by 1000 bp both upstream and downstream, and gene prediction was performed with AUGUSTUS using Schizosaccharomyces pombe as the reference species. A protein-coding gene (a novel oligosaccharyltransferase-encoding gene, OST) was predicted in this region, and its coding frame was substantially identical to the ORF coding frame sequence described above. Based on the prediction result, the probable coding start site and termination site were determined; the complete gene sequence from the start codon to the stop codon is SEQ ID No.1, as shown in FIG. 3.

The correspondence between the nucleotide sequence of SEQ ID No.1 and the amino acid sequence of the protein it encodes (namely SEQ ID No. 3) is shown in FIG. 4. Peptide YAPIDLDDTMYDELTSAPR is located near the N-terminus of the protein.

The nucleotide sequence shown as SEQ ID No.1 is 1163 bp in total, including the stop codon. After removal of the intron regions, it encodes 350 amino acids with a theoretical molecular weight of 38.75 kDa. The theoretically encoded amino acid sequence is shown as SEQ ID No. 3.
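As a quick consistency check on these figures, the coding length implied by 350 amino acids can be compared with the 1163 bp gene length. The sketch assumes that SEQ ID No.1 runs exactly from the start codon to the stop codon inclusive and that everything outside the coding sequence within it is intron; the resulting intron total is an inference from the stated numbers, not a value given in the text.

```python
STOP_CODON_BP = 3  # one stop codon terminates the coding sequence


def coding_length_bp(n_amino_acids: int) -> int:
    """Length of the coding sequence in bp, including the stop codon."""
    return 3 * n_amino_acids + STOP_CODON_BP


def implied_intron_bp(gene_bp: int, n_amino_acids: int) -> int:
    """Bases of the gene not accounted for by the coding sequence."""
    remainder = gene_bp - coding_length_bp(n_amino_acids)
    if remainder < 0:
        raise ValueError("gene is shorter than its stated coding sequence")
    return remainder


print(coding_length_bp(350))         # coding sequence plus stop codon, in bp
print(implied_intron_bp(1163, 350))  # bases left over for intron regions
```

Under these assumptions the coding sequence takes up 1053 bp and the intron regions about 110 bp in total.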

5. NCBI-BLASTP analysis was performed with the amino acid sequence of the theoretical encoded product of this gene (SEQ ID No. 3). It showed low homology (< 72%) to the existing sequences in the NCBI nr database, as shown in FIG. 5.

Each line segment under the query result in FIG. 5 represents a sequence matched in NCBI that has some similarity to the target sequence (SEQ ID No. 3). The left ends of the line segments are clearly ragged rather than aligned at full length, indicating that these amino acid sequences are not highly similar to the OST protein (SEQ ID No. 3).

The 9 sequences with the highest homology to the amino acid sequence of the TMCC 70008 OST protein among those shown in FIG. 5 are listed in Table 1. Homology alignments are usually evaluated on two kinds of data. One is sequence coverage: Table 1 shows that the coverage of these 9 sequences is essentially >99%. The other is the similarity of the matched sequences: even the top-ranked amino acid sequence (from Talaromyces proteolyticus PMI_201) has a sequence similarity of only 71.76%, so these sequences have low similarity to the amino acid sequence of the TMCC 70008 OST protein (SEQ ID No. 3).

Table 1 Sequences with higher homology to the TMCC 70008 OST protein sequence

The BLASTP results shown in FIG. 5 indicate that the product of the gene missed by the existing annotation contains an OST3_OST6 domain, and that the identified sequence is not highly similar to the existing sequences in the NCBI nr database. The sequence alignment shown in FIG. 6 supports this result, i.e., the similarity to homologous proteins is not high.

6. The DNA sequence of the identified OST gene (i.e., SEQ ID No. 1) was subjected to NCBI-BLASTN analysis, the results of which are shown in FIG. 7.

The line segments on the left below the query results in FIG. 7 represent matches to the mRNA of the oligosaccharyltransferase subunit of Rasamsonia emersonii CBS 393.64, which match positions 4-348 and 399-500 of SEQ ID No. 1, respectively; the line segment on the right represents a match to the mRNA of an unknown protein (BDV 38 DRAGT_ 260987) of Aspergillus pseudotamarii CBS 117625, which matches positions 558-918 of SEQ ID No. 1.

Table 2 shows the data for sequences with higher homology to the TMCC 70008 OST gene sequence (SEQ ID No. 1) in the NCBI-BLASTN analysis described above.

Table 2 Sequences with higher homology to the TMCC 70008 OST gene sequence

The results in Table 2 show that the OST gene sequence SEQ ID No.1 has only 2 homologous sequences in the NCBI database, both of which differ substantially from it, with coverage not exceeding 38%. This shows that the OST gene sequence (SEQ ID No. 1) found in the Rasamsonia emersonii TMCC 70008 strain is highly specific relative to the existing sequences in the NCBI nr database and can be used as a DNA barcode for distinguishing the TMCC 70008 strain from other species or strains.

7. The transcribed spacer sequences upstream and downstream of the OST gene sequence (i.e., SEQ ID No. 1) were further taken into account, giving SEQ ID No.4. A homology (NCBI-BLASTN) comparison was performed with SEQ ID No.4, as shown in FIG. 8.

Table 3 shows the data for sequences with higher homology to SEQ ID No.4 in the NCBI-BLASTN analysis described above. The coverage of these 5 sequences does not exceed 18%, thus indicating that these sequences do not have high similarity to SEQ ID No.4.

Table 3 Sequences with higher homology to SEQ ID No.4

The results of FIG. 8 and Table 3 show that SEQ ID No.4 has high specificity in the NCBI database. This indicates that SEQ ID No.4 effectively distinguishes TMCC 70008 from closely related strains and can therefore be used as a DNA barcode.

8. A homology (NCBI-BLASTN) comparison was further performed on SEQ ID No.2. As shown in FIG. 9, the line segment on the left below the query results represents a match to the mRNA of the oligosaccharyltransferase subunit of Rasamsonia emersonii CBS 393.64; the line segment on the right represents a match to the mRNA of an unknown protein (BDV 38 DRAGT_ 260987) of Aspergillus pseudotamarii CBS 117625.

Table 4 shows the data for sequences with higher homology to SEQ ID No.2 in the NCBI-BLASTN analysis described above. Their coverage does not exceed 32%, indicating that these sequences are not highly similar to SEQ ID No.2.

Table 4 Sequences with higher homology to SEQ ID No.2

The results in FIG. 9 and Table 4 show that SEQ ID No.2 has high specificity in the NCBI database. This indicates that SEQ ID No.2 effectively distinguishes TMCC 70008 from closely related strains and can therefore be used as a DNA barcode.

Example 2: Identification of strains using DNA barcodes

Whether a sample to be tested is the puer tea industrial strain Rasamsonia emersonii TMCC 70008 is judged from the amplification result of the sample and from its sequence homology with the OST gene sequence of the Rasamsonia emersonii TMCC 70008 strain.

The transcribed spacer sequence of the OST gene of the Rasamsonia emersonii TMCC 70008 strain was selected for primer design.

(1) Based on the results of the gene prediction in step 4 of Example 1, PCR primers were designed at both ends of the gene using the NCBI primer design tool. The sequences of the resulting forward and reverse primers are as follows:

OST-F: 5'-ATATAAAAGCCTCTAGGGTGCC-3' (SEQ ID No.5);

OST-R: 5'-CACAACAAGCCTGCCTACC-3' (SEQ ID No.6).

The amplified sequence is as follows (SEQ ID No.7, 126 bp):

5'-ATATAAAAGCCTCTAGGGTGCCAACATGAAGCTTCCAAGGATGCCAGCATGAAGCCTCCAGGGATGCCAGCATAAAGCCTTATAAGGGTGCCAGTGTGAAGCCTGTGGGTAGGCAGGCTTGTTGTG-3'

wherein the underlined regions are the positions of the primers.

NCBI-BLASTN homology analysis of SEQ ID No.7 found no homologous sequences. This indicates that SEQ ID No.7 effectively distinguishes TMCC 70008 from closely related strains and can therefore be used as a DNA barcode.
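The relationship between the primer pair and the amplicon can be checked with a simple in-silico PCR: the forward primer should match the start of the amplicon, and the reverse complement of the reverse primer should match its end. The sketch below uses the three sequences given above; the function names and the exact-match search are illustrative assumptions, and real primer binding tolerates mismatches that this check does not model.

```python
from typing import Optional

OST_F = "ATATAAAAGCCTCTAGGGTGCC"  # SEQ ID No.5
OST_R = "CACAACAAGCCTGCCTACC"     # SEQ ID No.6
SEQ_ID_NO_7 = (
    "ATATAAAAGCCTCTAGGGTGCCAACATGAAGCTTCCAAGGATGCCAGCATGAAGCCTCCAGGGATGCCAGCAT"
    "AAAGCCTTATAAGGGTGCCAGTGTGAAGCCTGTGGGTAGGCAGGCTTGTTGTG"
)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the exact-match amplicon on the forward strand, or None."""
    template = template.upper()
    start = template.find(forward)
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template: the barcode embedded in arbitrary flanking bases.
amplicon = in_silico_pcr("GG" + SEQ_ID_NO_7 + "TT", OST_F, OST_R)
print(len(amplicon), amplicon == SEQ_ID_NO_7)
```

The check confirms that the stated 126 bp length and both primer positions are consistent with the listed sequence.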

(2) Sources of strains

Table 5 Information on the strains used

(3) Extraction of strain DNA: genomic DNA of each strain was extracted using OMEGA E.Z.N.A.™, and the sample DNA was diluted to a concentration of 0.5 μg/μL with sterilized deionized water.

(4) Amplification of the DNA fragment by polymerase chain reaction (PCR), using the following primer sequences:

forward primer sequence OST-F:5'-ATATAAAAGCCTCTAGGGTGCC-3';

reverse primer sequence OST-R:5'-CACAACAAGCCTGCCTACC-3'.

The PCR reaction volume was 50 μL, using Thermo Scientific™ Taq DNA Polymerase (recombinant) as the PCR reagent: ddH₂O 37.7 μL, MgCl₂ 5 μL, dNTPs 4 μL, forward primer 1 μL, reverse primer 1 μL, Taq DNA polymerase 0.3 μL, and DNA template 1 μL, with no dye. The amplification program was: pre-denaturation at 94°C for 5 minutes; then 30 cycles of denaturation at 94°C for 30 seconds, annealing at 58°C for 30 seconds, and extension at 72°C for 15 seconds; and a final extension at 72°C for 10 minutes.
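The reaction recipe and cycling program can be written down as data and sanity-checked, as in the sketch below. The component volumes are as read from the recipe above (the 1 μL template volume is an interpretation of it), and the time computed is only the sum of the programmed holds, without the ramping time of a real thermocycler.

```python
import math

# Component volumes in microlitres, as read from the recipe above.
REACTION_MIX_UL = {
    "ddH2O": 37.7,
    "MgCl2": 5.0,
    "dNTPs": 4.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Taq DNA polymerase": 0.3,
    "DNA template": 1.0,  # interpretation of the recipe
}

# Cycling program: hold times in seconds.
PROGRAM = {
    "pre_denaturation_s": 5 * 60,   # 94 C
    "cycles": 30,
    "cycle_steps_s": {
        "denaturation (94 C)": 30,
        "annealing (58 C)": 30,
        "extension (72 C)": 15,
    },
    "final_extension_s": 10 * 60,   # 72 C
}


def total_volume_ul(mix: dict) -> float:
    """Sum of all component volumes."""
    return sum(mix.values())


def hold_time_s(program: dict) -> int:
    """Total programmed hold time, ignoring temperature ramping."""
    per_cycle = sum(program["cycle_steps_s"].values())
    return (program["pre_denaturation_s"]
            + program["cycles"] * per_cycle
            + program["final_extension_s"])


print(round(total_volume_ul(REACTION_MIX_UL), 1), "uL")
print(hold_time_s(PROGRAM) / 60, "min")
```

The components add up to the stated 50 μL, and the programmed holds come to 52.5 minutes.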

(5) Detection of amplification products: the PCR fragment size was determined by electrophoresis on a 1.0% agarose gel in 1× TBE running buffer alongside a DNA molecular weight marker. If the strain to be tested does not show the expected 126 bp amplification band, the strain is not Rasamsonia emersonii TMCC 70008; if a clear band without non-specific bands appears, the DNA fragment is sent to a sequencing company for sequencing.

(6) The theoretical amplification product of the PCR primers is 126 bp. As shown in FIG. 10, the primers amplified a band of the expected size only in the Rasamsonia emersonii TMCC 70008 strain; the bands amplified from the conspecific strains CBS 355.92, CBS 395.64, CBS 396.64 and CBS 397.64 differed markedly from the expected size, and no amplification was obtained from CBS 472.92 and CBS 549.92.

(7) For strains that yielded the correct band, sequencing and sequence alignment were performed to further verify the sequence of the amplified DNA. First, the quality of the sequencing chromatograms was checked with Chromas; once the chromatogram quality was confirmed to meet the requirements of data analysis, the forward and reverse reads were assembled with SeqMan from the DNASTAR software package. The sequencing results were manually proofread and assembled before sequence alignment. FIG. 11 shows the alignment of the sequenced PCR product of the Rasamsonia emersonii TMCC 70008 strain with the theoretical sequence (SEQ ID No. 7). The PCR product of the TMCC 70008 strain was 100% identical to the standard DNA barcode of the strain (SEQ ID No. 7). This shows that the PCR amplification result based on the DNA barcode SEQ ID No.7 is fully consistent with the theoretical sequence (SEQ ID No. 7) and that the strain can be distinguished from other intraspecies strains.
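The percent identity between a sequenced PCR product and the barcode can be computed from a global alignment. The sketch below uses a plain Needleman-Wunsch alignment with illustrative scores (match +1, mismatch -1, gap -1) and reports matches divided by alignment columns. These scoring values and the identity definition are assumptions for illustration; the alignment software used in this example may score and report identity differently.

```python
SEQ_ID_NO_7 = (
    "ATATAAAAGCCTCTAGGGTGCCAACATGAAGCTTCCAAGGATGCCAGCATGAAGCCTCCAGGGATGCCAGCAT"
    "AAAGCCTTATAAGGGTGCCAGTGTGAAGCCTGTGGGTAGGCAGGCTTGTTGTG"
)


def global_identity(query: str, reference: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity (matches / alignment columns) from a global alignment."""
    q, r = query.upper(), reference.upper()
    n, m = len(q), len(r)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if q[i - 1] == r[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting matches and alignment columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if q[i - 1] == r[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += q[i - 1] == r[j - 1]
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in the reference
        else:
            j -= 1  # gap in the query
    return 100.0 * matches / columns


print(global_identity(SEQ_ID_NO_7, SEQ_ID_NO_7))
```

For a 126 bp barcode, a single substitution gives 125/126, or about 99.2% identity, which still meets a 99% threshold, whereas two substitutions give about 98.4% and do not.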

(8) Cluster analysis was then performed: the DNA barcode SEQ ID No.7 of the Rasamsonia emersonii TMCC 70008 strain was combined with the sequencing results of the PCR products of the strains to be tested, and an NJ phylogenetic tree was constructed with MEGA5 software. Because no fragment could be amplified from Rasamsonia emersonii CBS 472.92 and CBS 549.92, only the PCR fragments from TMCC 70008 and the four other strains that yielded amplification products (CBS 355.92, CBS 395.64, CBS 396.64 and CBS 397.64) were sent for sequencing, and the sequencing results were used to construct the NJ phylogenetic tree. TMCC 70008 was clearly separated from the other strains (FIG. 12), further demonstrating that only strains that amplify a fragment matching the theoretical size of SEQ ID No.7 can be identified as the Rasamsonia emersonii TMCC 70008 strain.

Claims

1. A DNA barcode for identifying a Rasamsonia emersonii strain or puer tea produced by fermentation therewith, characterized in that the DNA barcode is derived from the genome of the Rasamsonia emersonii TMCC 70008 strain and is selected from sequences of at least 100 bp within the DNA sequence shown in SEQ ID No.4.

2. The DNA barcode of claim 1, wherein the nucleotide sequence of the DNA barcode comprises a sequence as set forth in SEQ ID No.1, SEQ ID No.2, or SEQ ID No. 7; alternatively, the DNA barcode is selected from sequences within the DNA sequence shown in SEQ ID No.1, SEQ ID No.2, or SEQ ID No. 7; preferably, the nucleotide sequence of the DNA barcode is as shown in SEQ ID No.1, SEQ ID No.2, SEQ ID No.4 or SEQ ID No.7.

3. A primer pair for amplifying the DNA barcode of claim 1 or 2.

4. The primer pair according to claim 3, wherein the nucleotide sequence of the forward primer is identical to a sequence in the genome of the Rasamsonia emersonii TMCC 70008 strain that lies between position 1 and position 2296 of the nucleotide sequence shown as SEQ ID No.4, and the forward primer is 15-30 bp in length; and the reverse primer is the reverse complement of a sequence in the genome of the TMCC 70008 strain that lies between position 86 and the last position of the nucleotide sequence shown as SEQ ID No.4, and the reverse primer is 15-30 bp in length.

5. The primer pair of claim 4, wherein the forward primer and the reverse primer have the nucleotide sequences shown below, respectively:

forward primer: 5'-ATATAAAAGCCTCTAGGGTGCC-3';

reverse primer: 5'-CACAACAAGCCTGCCTACC-3'.

6. A kit for identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation therewith, comprising a primer pair according to any one of claims 3-5.

7. A method of identifying the Rasamsonia emersonii TMCC 70008 strain, comprising the steps of:

a) Providing genomic DNA of a strain to be tested;

b) Performing PCR amplification using the genomic DNA of step a) as a template and the primer pair according to any one of claims 3-5 to obtain a PCR product;

d) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested; comparing the nucleotide sequence to be tested for homology with the nucleotide sequence of the DNA barcode according to claim 1 or 2; and determining that the strain to be tested is the Rasamsonia emersonii TMCC 70008 strain if the homology is more than 99%.

8. A method for identifying puer tea produced by fermentation with the Rasamsonia emersonii TMCC 70008 strain, comprising the steps of:

a) Providing a puer tea sample;

b) Extracting genome DNA of a microorganism strain from the puer tea sample;

c) Performing PCR amplification using the genomic DNA of step b) as a template and the primer pair according to any one of claims 3 to 5 to obtain a PCR product;

e) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested; comparing the nucleotide sequence to be tested for homology with the nucleotide sequence of the DNA barcode according to claim 1 or 2; and determining that the puer tea is puer tea produced by fermentation with the Rasamsonia emersonii TMCC 70008 strain if the homology is more than 99%.

9. Use of a DNA barcode according to claim 1 or 2 for identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation therewith.

10. Use of a primer pair according to any one of claims 3-5 for identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation therewith.

11. Use of the kit according to claim 6 for identifying the Rasamsonia emersonii TMCC 70008 strain or puer tea produced by fermentation therewith.