CN117363787A - DNA barcode, primer, kit, method and application
- Publication number: CN117363787A (CN202311568743.4A)
- Authority: CN (China)
- Prior art keywords: strain, tmcc, candida, seq, brownii
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- C12Q1/6895: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes, for detection or identification of organisms, for plants, fungi or algae
- C12Q1/686: Polymerase chain reaction [PCR]
- C12R2001/72: Candida
Abstract
The invention provides a DNA barcode, a primer, a kit, a method and an application. Using proteogenomics, the invention selects the standard gene sequence DYBXS-1 as the DNA barcode, which enables rapid identification and differentiation of strains within the species Candida brownii (Candida blancii). The invention thereby establishes a standard gene sequence and a sample identification method for the Candida brownii TMCC 70011 strain used in the industrial fermentation production of Pu'er tea. Compared with traditional morphological identification, the method is universal, easy to amplify and easy to compare, and markedly improves identification efficiency. It thus provides a powerful technical means for protecting the controllable pure-culture fermentation process and for discovering, protecting and utilizing microbial strain resources in the Pu'er tea fermentation industry.
Description
Technical Field
The invention belongs to the field of fungal strain identification, and particularly relates to a DNA barcode primer composition, a DNA barcode, a kit, a method and an application for identifying a strain used in the fermentation production of Pu'er tea. Specifically, the Pu'er tea fermentation production strain is the Candida brownii TMCC 70011 strain.
Background
Pu'er tea is a post-fermented tea produced within the geographical indication area of Yunnan, made from sun-dried green tea of the large-leaf variety through a series of processes. The traditional Pu'er tea manufacturing process is as follows: freshly picked tea leaves are rolled and dried to give the raw material, sun-dried green tea; this raw material then undergoes impurity removal, moistening with water, pile fermentation, airing, screening, compression molding and packaging before leaving the factory. In Pu'er tea production, pile fermentation is the main factor in quality formation. During this step the constituents of the tea, such as tea polyphenols, caffeine and some polysaccharides, change greatly, which gives Pu'er tea its particular flavor, taste, quality and various health effects.
In traditional Pu'er tea production, the moist, hot environment activates enzymes contained in the tea leaves, which convert part of the leaf constituents into substances that microorganisms can use. Microorganisms grow in large numbers during Pu'er tea fermentation and produce abundant intracellular and extracellular enzymes, which catalyze a series of conversions of the tea constituents and give Pu'er tea its unique quality. Differences in producing area, microbial species and community structure give each Pu'er tea its particular flavor and quality.
Besides its unique flavor and culture, Pu'er tea is reported to have health effects such as aiding weight loss, lowering blood sugar and blood lipids, preventing and improving cardiovascular disease, anti-aging, anti-cancer and anti-inflammatory activity, aiding digestion and nourishing the stomach, and it attracts wide attention and is popular with consumers. Growing market demand drives the development of the Pu'er tea industry and promotes economic growth in Yunnan.
As living standards improve, consumers pay increasing attention to food hygiene and safety. Food quality and safety incidents have nevertheless occurred frequently in recent years, and the quality and safety of tea have drawn increasing concern. In addition to pesticide residues, the microorganisms present during Pu'er tea pile fermentation may be an important factor affecting tea quality, so the development of the Pu'er tea industry faces a number of difficulties: unstable product quality, long production cycles, high labor input, microbial counts exceeding the standard, mite infestation, and the like.
At present, most manufacturers still produce Pu'er tea by empirical, semi-natural artificial pile fermentation. Although the communities in pile fermentation, dominated by common microorganisms including Candida brownii (Candida blancii), are relatively stable, there is considerable room to improve product stability, and certain safety hazards inevitably arise during production. To further win consumer favor and market acceptance, break through foreign trade barriers and raise the market competitiveness of Pu'er tea enterprises, Pu'er tea production must become artificially controllable, clean and efficient, with continuous product development and extension of the industry chain. This requires technological innovation: a series of safe, clean, efficient, artificially controllable and automated new Pu'er tea processes must be invented to ensure the healthy development of the Pu'er tea industry.
Artificial inoculation and pure-culture fermentation are a new direction for the controllable fermentation of Pu'er tea. To protect the Pu'er tea fermentation process and high-quality microbial germplasm resources, a rapid and accurate method for identifying the strains used in Pu'er tea fermentation needs to be developed.
DNA barcode technology can identify and distinguish similar strains quickly and simply. It can provide a theoretical basis and technical means for developing artificially controllable pure-culture fermentation processes and for protecting the resources of Pu'er tea, and can promote the healthy development of the Pu'er tea industry.
Disclosure of Invention
To overcome the shortcomings of morphological methods in identifying the strains used in Pu'er tea fermentation, the invention provides a DNA barcode, a primer, a kit, a method and an application for identifying the Pu'er tea fermentation strain Candida brownii TMCC 70011. With these, the Candida brownii TMCC 70011 strain can be accurately identified among easily confused species or species complexes, and Pu'er tea produced by fermentation with the strain can likewise be accurately identified. The invention enables rapid identification and differentiation, supports rapid identification and assessment of new Pu'er tea fermentation processes, helps prevent interference from contaminating microorganisms during fermentation, and provides a method and basis for obtaining evidence concerning misuse of the artificially controllable Pu'er tea fermentation process and of the strain.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
(1) A DNA barcode for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof, characterized in that the DNA barcode is derived from the genome of the Candida brownii TMCC 70011 strain and is selected from sequences of at least 1000 bp within the DNA sequence shown as SEQ ID No.4.
(2) The DNA barcode according to (1), wherein the nucleotide sequence of the DNA barcode comprises the sequence shown as SEQ ID No.1 or SEQ ID No.2; alternatively, the DNA barcode is selected from sequences within the DNA sequence shown as SEQ ID No.1 or SEQ ID No.2; preferably, the nucleotide sequence of the DNA barcode is as shown in SEQ ID No.1, SEQ ID No.2 or SEQ ID No.4.
(3) A primer pair for amplifying the DNA barcode of (1) or (2).
(4) The primer pair according to (3), wherein the forward primer is identical to a sequence within the region from position 1 to position 401 of the nucleotide sequence shown as SEQ ID No.4 in the genome of the Candida brownii TMCC 70011 strain, and is 15-30 bp long; and the reverse primer is reverse complementary to a sequence within the region from position 986 to the last position of the nucleotide sequence shown as SEQ ID No.4 in the genome of the TMCC 70011 strain, and is 15-30 bp long.
(5) The primer pair according to (4), wherein the nucleotide sequences of the forward primer and the reverse primer are as follows:
forward primer: 5'-ATGCACCCGTGAGTATGTGA-3';
reverse primer: 5'-CTCGGCAGTGTTATGCTCAA-3'.
(6) A kit for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof, comprising the primer pair according to any one of (3) to (5).
(7) A method for identifying the Candida brownii TMCC 70011 strain, comprising the steps of:
a) Providing genomic DNA of a strain to be tested;
b) Performing PCR amplification using the genomic DNA of step a) as a template and the primer pair according to any one of (3) to (5) to obtain a PCR product;
c) Detecting the PCR product by electrophoresis; if there is no target band, judging that the strain to be tested is not the Candida brownii TMCC 70011 strain; if there is a target band, performing steps d) and/or e);
d) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested; comparing the nucleotide sequence to be tested with the nucleotide sequence of the DNA barcode according to (1) or (2), and judging that the strain to be tested is the Candida brownii TMCC 70011 strain if the homology is more than 99%;
e) Performing nucleic acid mass spectrometry on the obtained PCR product; if the base masses of the PCR product differ from those of the DNA barcode according to (1) or (2) and the number of bases showing a mass difference is greater than or equal to 10, judging that the strain to be tested is not the Candida brownii TMCC 70011 strain.
(8) A method for identifying Pu'er tea produced by fermentation with the Candida brownii TMCC 70011 strain, comprising the steps of:
a) Providing a Pu'er tea sample;
b) Extracting genomic DNA of the microbial strain from the Pu'er tea sample;
c) Performing PCR amplification using the genomic DNA of step b) as a template and the primer pair according to any one of (3) to (5) to obtain a PCR product;
d) Detecting the PCR product by electrophoresis; if there is no target band, judging that the Pu'er tea is not Pu'er tea produced by fermentation with the Candida brownii TMCC 70011 strain; if there is a target band, performing steps e) and/or f);
e) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested; comparing the nucleotide sequence to be tested with the nucleotide sequence of the DNA barcode according to (1) or (2), and judging that the Pu'er tea is Pu'er tea produced by fermentation with the Candida brownii TMCC 70011 strain if the homology is more than 99%;
f) Performing nucleic acid mass spectrometry on the obtained PCR product; if the base masses of the PCR product differ from those of the DNA barcode according to (1) or (2) and the number of bases showing a mass difference is greater than or equal to 10, judging that the Pu'er tea is not Pu'er tea produced by fermentation with the Candida brownii TMCC 70011 strain.
(9) Use of the DNA barcode according to (1) or (2) for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof.
(10) Use of the primer pair according to any one of (3) to (5) for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof.
(11) Use of the kit according to (6) for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof.
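The primer constraints of (4) and the decision rules of (7) and (8) can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the function names, the simple position-by-position identity measure, and the order in which the two optional checks are applied when both are available are hypothetical choices, not part of the claims, and the SEQ ID No.4 sequence itself is not reproduced here.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def primer_pair_in_windows(template: str, forward: str, reverse: str,
                           fwd_last: int = 401, rev_first: int = 986) -> bool:
    """Check the claim (4) style constraints on a template (1-based positions).

    The forward primer must match the template inside positions 1..fwd_last,
    the reverse primer must be reverse complementary to a site that starts
    at or after rev_first, and both primers must be 15-30 bp long.
    """
    template = template.upper()
    if not (15 <= len(forward) <= 30 and 15 <= len(reverse) <= 30):
        return False
    f = template.find(forward.upper())              # 0-based start, -1 if absent
    r = template.find(reverse_complement(reverse))  # binding site of reverse primer
    if f < 0 or r < 0:
        return False
    return f + len(forward) <= fwd_last and r + 1 >= rev_first


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def classify_strain(has_target_band: bool,
                    identity: Optional[float] = None,
                    mass_diff_bases: Optional[int] = None) -> str:
    """Apply the thresholds stated in (7): >99% identity, >=10 differing bases.

    Assumption: when both results are supplied, the mass spectrometry
    exclusion is checked first; the claims do not specify an order.
    """
    if not has_target_band:
        return "not TMCC 70011"
    if mass_diff_bases is not None and mass_diff_bases >= 10:
        return "not TMCC 70011"
    if identity is not None and identity > 99.0:
        return "TMCC 70011"
    return "undetermined"
```

For example, `classify_strain(True, identity=99.6)` returns `"TMCC 70011"`, while a sample with no target band is excluded without sequencing. The same rules apply to a Pu'er tea sample once genomic DNA has been extracted from it as in (8).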
Compared with the prior art, the invention has the following advantages and positive effects:
1. The invention uses proteogenomics to find annotation-missing peptides in the genome of the Candida brownii TMCC 70011 strain. Based on the position of the peptide coding sequence in the genome, the possible gene and protein sequences encoding the peptide are determined. Through careful research and comparative analysis, the invention further develops a DNA barcode based on the gene encoding the peptide (the DYBXS-1 gene). The barcode sequence enables rapid identification and distinction of strains within Candida brownii, can accurately identify the Candida brownii TMCC 70011 strain among easily confused species or species complexes, and can therefore accurately identify Pu'er tea produced by fermentation with the strain.
2. The invention further finds that the sequence of the DYBXS-1 gene (e.g., SEQ ID No.1) is universal, easy to amplify and easy to align, and that, compared with other genes, it differs markedly among strains within the species Candida brownii.
3. The invention establishes a standard gene sequence and a sample identification method for the Candida brownii TMCC 70011 strain used in the industrial fermentation production of Pu'er tea. Compared with traditional morphological identification, the method markedly improves the efficiency of identifying the target strain. It places low demands on sample integrity, and its identification index can be quantified, which provides an effective basis for timely assessment of the Pu'er tea fermentation process and its germplasm resources. In addition, the invention can use a phylogenetic tree method based on cluster analysis and an array of strain difference sites based on a nucleic acid mass spectrometry strategy; its identification sensitivity and accuracy are better than those of conventional molecular identification methods, and it fills the gap in DNA barcode based identification of the Candida brownii strain used to produce fermented Pu'er tea.
Drawings
FIG. 1 shows a mass spectrum of the newly identified peptide SIAAEQQDAVSSR.
FIG. 2 shows a comparison of the mass spectrum of the chemically synthesized peptide SIAAEQQDAVSSR with that of the originally identified peptide; the originally identified peptide is the peptide identified by mass spectrometry analysis; the upper part of the figure is the mass spectrum of the originally identified peptide, and the lower part is the mass spectrum of the chemically synthesized peptide.
FIG. 3 shows the sequence of SEQ ID No.1, with the start codon ATG and the stop codon TAA.
FIG. 4 shows the correspondence between SEQ ID No.1 and the amino acid sequence of the protein it encodes (SEQ ID No.3), wherein the gray part is the newly identified peptide SIAAEQQDAVSSR.
FIG. 5 shows the SDS-PAGE separation of whole-cell proteins of TMCC 70011. The left lane (M) is the protein molecular weight marker and the middle lane (TCL) is the whole-cell protein separation of TMCC 70011. The whole lane was cut into 30 slices according to molecular weight and protein abundance, and the positions of the slices are shown on the right. The protein corresponding to SEQ ID No.3 is located in the 3rd slice, which is shown in bold in the middle TCL lane.
FIG. 6 shows the results of homology analysis of SEQ ID No.3 by NCBI BLASTP. The line segments below the query represent sequences matched in NCBI that have some similarity to the target sequence.
FIG. 7 shows the BLASTP homology comparison of SEQ ID No.3.
FIG. 8 shows the sequence of SEQ ID No.4. The underlined regions at the front and rear are the positions of the primers of one embodiment of the invention, and the gray background region is the sequence of SEQ ID No.1, with the start codon ATG and the stop codon TAA.
FIG. 9 shows the agarose gel electrophoresis result of the product obtained by PCR amplification using primers designed for the DNA barcode of the Candida brownii TMCC 70011 strain of the invention.
FIG. 10 shows the comparison of the PCR product sequencing result of the TMCC 70011 strain with the DNA barcode SEQ ID No.1.
FIG. 11 shows an array of differential sites based on a nucleic acid mass spectrometry strategy, wherein the leading number is the position of the differential site in SEQ ID No.4, 1 is the base of the Candida brownii ABL strain at that position, and 2 is the base of the TMCC 70011 strain at that position, so that the sites at which the two strains differ can be seen.
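The layout of the differential site array described for FIG. 11 (position, base in strain 1, base in strain 2) can be illustrated with a short Python sketch. The function name and the toy sequences below are hypothetical and are not taken from SEQ ID No.4; the sketch also assumes the two sequences are already aligned and of equal length, so insertions and deletions are not handled.

```python
from typing import List, Tuple


def differential_sites(strain1: str, strain2: str) -> List[Tuple[int, str, str]]:
    """List the positions at which two pre-aligned sequences differ.

    Each entry is (1-based position, base in strain 1, base in strain 2),
    mirroring the position / 1 / 2 layout described for FIG. 11.
    """
    if len(strain1) != len(strain2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(strain1.upper(), strain2.upper())
    return [(pos, b1, b2)
            for pos, (b1, b2) in enumerate(pairs, start=1)
            if b1 != b2]


# Toy example with made-up 10-base sequences (not real barcode data).
sites = differential_sites("ATGCACCCGT", "ATGTACCAGT")
for pos, b1, b2 in sites:
    print(pos, b1, b2)
```

The length of the returned list is the number of differing bases, which is the quantity compared against the threshold of 10 in the nucleic acid mass spectrometry step.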
Detailed Description
The invention is further described below by means of specific embodiments and with reference to the accompanying drawings. These are not intended to be limiting; a person skilled in the art can make various modifications or improvements according to the basic idea of the invention without departing from its scope.
As used herein, the term "misannotation" refers to the failure of gene prediction software (e.g., GeneMark, AUGUSTUS, Glimmer, etc.) to predict a gene or protein after genome sequencing of a species has been completed. Such genes are usually not expressed in high amounts under specific conditions and are therefore difficult to find in research.
The term "DNA barcoding" refers to a novel technique for molecular identification of species using a standard, short DNA fragment within the genome, which allows rapid and accurate species identification.
The term "six-frame translation" is a known term in proteomics and genomics. It is based on the principle that DNA encodes protein in triplet codons: a given DNA sequence can be read in 3 frames, and its complementary strand can be read in 3 more, for a total of 6 coding possibilities (+1, +2, +3, -3, -2, -1).
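The six reading frames described above can be illustrated with a minimal sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names are illustrative and are not taken from any software named in this description.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# The first 18 bases of SEQ ID No.1 as a small input.
frames = six_frame_translation("ATGGCCAACACACAGTCG")
print(frames["+1"])
```

Frame +1 of this input corresponds to the first six residues of SEQ ID No.3; stop codons appear as `*` in the output of the other frames.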
The invention uses systematic proteogenomics to discover, in the Candida brownii TMCC 70011 strain, a protein encoded by a species-specific gene that is difficult to find with traditional gene prediction software. The peptide of this gene product is supported by proteomic mass spectrometry data and is accurate and reliable. Based on the position of the peptide coding sequence in the genome, the possible gene sequence and protein sequence encoding the peptide are determined, and the coding gene is named the DYBXS-1 gene. The invention finds that the coding frame (SEQ ID No.1) of the DYBXS-1 gene can distinguish the Candida brownii TMCC 70011 strain from easily confused species, so that the DYBXS-1 gene can be used to develop DNA barcodes for identifying the Candida brownii TMCC 70011 strain, which is a fermentation production strain of the puer tea industry. Compared with the prior art, the DNA barcode obtained by the method has higher specificity.
The invention further obtains the DNA bar code which can accurately and effectively identify the candida brownii TMCC 70011 strain based on the special DNA sequence (SEQ ID No. 1) through careful research and comparative analysis.
Specifically, through systematic proteomics research, the invention discovers a peptide that is not contained in the originally annotated genes of the Candida brownii TMCC 70011 strain. The sequence of the peptide is SIAAEQQDAVSSR, and the peptide is supported by the related proteomic mass spectrometry data and is accurate and reliable. Based on the position of the peptide coding sequence in the genome of the Candida brownii TMCC 70011 strain, a possible gene sequence (SEQ ID No.1) and a protein sequence (SEQ ID No.3) are determined. SEQ ID No.1 is 1071 bp long and, according to comparison and analysis, is unique to the genome of the Candida brownii TMCC 70011 strain, so that the sequence can be used as a DNA barcode of the Candida brownii TMCC 70011 strain.
In addition, the invention shows through comparative analysis and experimental verification that a longer DNA barcode containing this sequence provides a better guarantee of specificity. However, a DNA barcode that is too long (e.g., greater than 4000 bp) is less suitable for amplification. Based on the gene sequence, the invention identifies longer DNA barcode sequences with better specificity (SEQ ID No.2 and SEQ ID No.4) in the genome of the Candida brownii TMCC 70011 strain.
Based on the above findings, according to one aspect of the present invention, there is provided a DNA barcode for identifying candida brownii strain or puer tea produced by fermentation thereof, characterized in that the DNA barcode is derived from genome of candida brownii TMCC 70011 strain and is selected from a sequence of at least 1000bp among DNA sequences shown as SEQ ID No. 4.
In some embodiments, the DNA barcode is selected from a sequence of at least 1000bp, 1100bp, 1200bp, 1300bp, 1310bp, 1320bp, 1330bp, 1340bp, 1350bp, or 1360bp in the DNA sequence as shown in SEQ ID No. 4.
In some preferred embodiments, the DNA barcode is selected from the sequences in the DNA sequences set forth as SEQ ID No.1 or SEQ ID No. 2. Preferably, the DNA barcode is selected from the group consisting of at least 1000bp, 1010bp, 1020bp, 1030bp, 1040bp, 1050bp and 1060bp of the DNA sequence shown in SEQ ID No. 1. Also preferably, the DNA barcode is selected from the group consisting of at least 1000bp, 1010bp, 1020bp, 1030bp, 1040bp, 1050bp, 1060bp, 1070bp, 1080bp, 1090bp, 1100bp, 1110bp, 1120bp, 1130bp, 1140bp, or 1150bp sequences in the DNA sequence shown in SEQ ID No. 2.
In other preferred embodiments, the nucleotide sequence of the DNA barcode comprises a sequence as set forth in SEQ ID No.1 or SEQ ID No. 2.
In particular, the present invention has found that the DNA sequence shown as SEQ ID No.1, SEQ ID No.2 or SEQ ID No.4 in the genome of the TMCC 70011 strain has a particularly high specificity both among different species and among different strains within the species Candida brownii. Since no homologous sequences of the above fragments were found in the NCBI database, they are suitable as DNA barcodes for identifying the Candida brownii TMCC 70011 strain.
Preferably, the DNA barcode sequence according to the present invention is shown as SEQ ID No.1, SEQ ID No.2 or SEQ ID No. 4.
The DNA sequence can distinguish the Candida brownii TMCC 70011 strain from easily confused species, so that it can be used to develop DNA barcodes for identifying the Candida brownii TMCC 70011 strain, a fermentation production strain of the puer tea industry. Compared with the prior art, the DNA barcode obtained by the method has higher specificity.
The sequences of SEQ ID No.1, SEQ ID No.2, SEQ ID No.3 and SEQ ID No.4 are shown below:
SEQ ID No.1:
ATGGCCAACACACAGTCGCCAGACGCTTCCCGCAATGCAGAACATCAGCGCACCACTTCTTCTAGCACGAGAAGGGGCGTTGCGGCCGCGATCTCTCGGATAAGAGCGGCGATGGTGACCGATCCTGGAGATGATGACGTTGTAAAGCGGGAGGTTCCTGACGAGAAAAAGCTCGATTTGACGTCAGAGCCTTCGCATCCTGCTCTTGGCCTTACTCGTGCCGAGTATATGCGCAAACAGGACCGCCTGTTCAAGGAACTAACTGGCATGGATAGTTTCAAGGATGTGAGGCGAGACGAACACATTGATGATATACACGATCCGCTACTGATGCGGGTGCTTGAGACGTACGCCATGGCGGCAGACCAGGACATCTTCGACGACGTCAATCACCAACGCCATGTCATGGACGAGATTGAAGAAATGCAAAATATCGCATTTCAGTCGGTCTACGATATGGAGTCCGACGAGAAACGCATACGCGCGCTGGCTTTTAACCTCGGGCAAATGAGGTCCGAGATGGCTAAGCTGGAAGTTAAGTTCCTTCAAATGCAGCAGTCGCGAACTTTGACTGAAAAACAGCTGGTCGCTTACCGTGACGGATACCGTTTGGAGCTCTTGCAGCGTCTTCAGTTGGAAGAGGTATGTCGCATATTGCGCAATCGGCTAGATCATGCCGGAAAGAGCATCGCTGCAGAACAACAAGATGCTGTCTCATCTCGTGCTCCGGAGAAACAGGCACATTCAGAACCATCACGTCAGGAGACAAGTCAACTGGCATCGATAGAGGCTGAGAGTGCGAGCAAGGTGTCCATGGCGTCGTCTGCGCCTTCAAAGCCGACGACAGAGAAGTCTGTGGGTGAGCCTGCAAAACTTCAAAAATCGACGACCGAGAAGACTATGGAAGAGCCCAAAACCAAGTCCGATATTGATCCCTCGAAACGCCAAAAGGACGAAGTAAGTCACGAAATAACTTCCCAGTTAAATTGTCGGATTCAACAAACTAACAGCAGGACACTGCCGTCGAGGAGGACCGAGAAGTCAGGGAGCAGTTGGCCAGCGACGTATTA A
wherein the start site is ATG and the termination site is TAA.
SEQ ID No.2:
wherein the gray background is SEQ ID No.1, the start site is ATG, and the termination site is TAA.
SEQ ID No.3:
MANTQSPDASRNAEHQRTTSSSTRRGVAAAISRIRAAMVTDPGDDDVVKREVPDEKKLDLTSEPSHPALGLTRAEYMRKQDRLFKELTGMDSFKDVRRDEHIDDIHDPLLMRVLETYAMAADQDIFDDVNHQRHVMDEIEEMQNIAFQSVYDMESDEKRIRALAFNLGQMRSEMAKLEVKFLQMQQSRTLTEKQLVAYRDGYRLELLQRLQLEEVCRILRNRLDHAGKSIAAEQQDAVSSRAPEKQAHSEPSRQETSQLASIEAESASKVSMASSAPSKPTTEKSVGEPAKLQKSTTEKTMEEPKTKSDIDPSKRQKDEVSHEITSQLNCRIQQTNSRTLPSRRTEKSGSSWPATY*
Peptide SIAAEQQDAVSSR is underlined in the sequence of SEQ ID No. 3.
SEQ ID No.4:
wherein the gray background is SEQ ID No.1, the start site is ATG, and the termination site is TAA; the underlined portions at both ends of SEQ ID No.4 are the positions of the primers designed in Example 2.
The invention also designs an amplification primer pair based on the nucleotide sequence of the barcode. Through amplification with these primers, the Candida brownii TMCC 70011 strain can be identified rapidly and accurately according to the presence or absence of a PCR product, differences in the amplified fragment and, optionally, a phylogenetic tree.
Thus, according to another aspect of the present invention, there is provided a primer pair for amplifying a DNA barcode according to the present invention.
It will be appreciated by those skilled in the art that, according to the DNA barcode sequence provided by the present invention for identifying a strain of Candida brownii, a corresponding primer pair can be easily designed to amplify a desired DNA barcode.
Preferably, the forward primer of the primer pair is identical to a sequence located in the region from position 1 to position 401 of the nucleotide sequence shown as SEQ ID No.4 in the genome of the Candida brownii TMCC 70011 strain, and is generally 15-30 bp long. The reverse primer is the reverse complement of a sequence located in the region from position 986 to the last position of the nucleotide sequence shown as SEQ ID No.4 in the genome of the TMCC 70011 strain, and is likewise generally 15-30 bp long. The product amplified by the forward and reverse primers is a sequence of at least 1000 bp selected from the nucleotide sequence shown in SEQ ID No.4.
In a more preferred embodiment, the primer pair is used to amplify a sequence selected from the nucleotide sequences set forth in SEQ ID No.1 or 2.
In some specific embodiments, the nucleotide sequences of the forward and reverse primers are shown below, respectively:
DYBXS1-F: 5'-ATGCACCCGTGAGTATGTGA-3' (SEQ ID No.5);
DYBXS1-R: 5'-CTCGGCAGTGTTATGCTCAA-3' (SEQ ID No.6).
The primer pair of the invention can realize the specific amplification of the DNA barcode sequence.
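How a primer pair delimits an amplicon can be sketched with a simple in-silico PCR check: the forward primer is searched on the template strand as written, and the reverse primer is searched as its reverse complement. This is only an illustration under stated assumptions: it requires exact matches, ignores annealing thermodynamics, and uses a hypothetical toy template, because the full SEQ ID No.4 is not reproduced in this text.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon bounded by an exactly matching primer pair, or None."""
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse.upper())
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer binding site must lie downstream of the forward primer.
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


FORWARD = "ATGCACCCGTGAGTATGTGA"  # DYBXS1-F (SEQ ID No.5)
REVERSE = "CTCGGCAGTGTTATGCTCAA"  # DYBXS1-R (SEQ ID No.6)

# Hypothetical toy template: the two primer sites flanking a placeholder insert.
toy_insert = "ACGT" * 10
toy_template = "GG" + FORWARD + toy_insert + reverse_complement(REVERSE) + "CC"

amplicon = in_silico_pcr(toy_template, FORWARD, REVERSE)
print(len(amplicon))
```

With a real template, the length of the returned amplicon could be compared against the expected product size before any bench work.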
The invention also provides a kit for identifying candida brownii TMCC 70011 strain or puer tea produced by fermentation of the candida brownii TMCC 70011 strain, which comprises the primer pair.
In another embodiment, the kit further comprises a DNA barcode according to the invention. The DNA barcode may be present on a recording medium. The recording medium is, for example, an optical disc. The kit may also comprise any means and reagents for experimental manipulation.
In yet another aspect, the present invention provides a method for identifying candida brownii TMCC 70011 strain, comprising the steps of:
a) Providing genomic DNA of a strain to be tested;
b) Using the genome DNA of the step a) as a template, and carrying out PCR amplification by using the primer pair to obtain a PCR product;
c) Detecting the PCR product by electrophoresis (e.g., agarose gel electrophoresis), if there is no target band, determining that the strain to be detected is not a strain of Candida brownii TMCC 70011, and if there is a target band, performing steps d) and/or e);
d) Sequencing the obtained PCR product to obtain a nucleotide sequence to be detected; the nucleotide sequence to be detected is subjected to homology comparison with the nucleotide sequence of the DNA bar code, and if the homology is more than 99% (preferably 100%), the strain to be detected is judged to be the candida brownii TMCC 70011 strain;
e) Performing nucleic acid mass spectrometry analysis on the obtained PCR product; if the base masses of the PCR product differ from those of the DNA barcode according to the invention, and the number of bases with a mass difference is greater than or equal to 10, preferably greater than or equal to 6, more preferably greater than or equal to 4, and most preferably greater than or equal to 2, the strain to be tested is determined not to be the Candida brownii TMCC 70011 strain.
The terms "identity" and "homology" as used herein have the same meaning and are used interchangeably.
The invention also provides a method for identifying puer tea produced by fermentation of candida brownii TMCC 70011 strain, which comprises the following steps:
a) Providing a puer tea sample;
b) Extracting genome DNA of a microorganism strain from the puer tea sample;
c) Taking the genome DNA in the step b) as a template, and carrying out PCR amplification by using the primer pair to obtain a PCR product;
d) Detecting the PCR product by electrophoresis (e.g., agarose gel electrophoresis); if there is no target band, determining that the puer tea is not puer tea produced by fermentation of the Candida brownii TMCC 70011 strain; if there is a target band, performing steps e) and/or f);
e) Sequencing the obtained PCR product to obtain a nucleotide sequence to be detected; the nucleotide sequence to be detected is subjected to homology comparison with the nucleotide sequence of the DNA bar code, and if the homology is more than 99% (preferably 100%), the puer tea is judged to be puer tea generated by fermentation of candida brownii TMCC 70011 strain;
f) Performing nucleic acid mass spectrometry analysis on the obtained PCR product; if the base masses of the PCR product differ from those of the DNA barcode according to the invention, and the number of bases with a mass difference is 10 or more, preferably 6 or more, more preferably 4 or more, and most preferably 2 or more, the puer tea is determined not to be puer tea produced by fermentation of the Candida brownii TMCC 70011 strain.
In step b), genomic DNA of the microorganism strain may be directly extracted from the puer tea sample, or the microorganism strain may be first isolated from puer tea, and genomic DNA may be extracted from the isolated microorganism strain.
In a specific embodiment of the method for identifying strains or puer tea of the present invention, the PCR amplification program is: 1) pre-denaturation at 94 ℃ for 8 minutes; 2) denaturation at 94 ℃ for 45 seconds, annealing at 56 ℃ for 45 seconds, and extension at 72 ℃ for 1 minute 15 seconds, wherein step 2) is carried out for 32-35 cycles, preferably 35 cycles; 3) final extension at 72 ℃ for 10 minutes.
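For planning purposes, the amplification program above can be written down as data and its total hold time added up. The sketch below is a plain arithmetic illustration; the dictionary layout and function name are assumptions, and ramp times between temperatures are ignored because they depend on the instrument.

```python
# Each step is (temperature in degrees Celsius, hold time in seconds).
PCR_PROGRAM = {
    "pre_denaturation": (94, 8 * 60),
    "cycle": [
        (94, 45),  # denaturation
        (56, 45),  # annealing
        (72, 75),  # extension, 1 minute 15 seconds
    ],
    "final_extension": (72, 10 * 60),
}


def total_hold_seconds(program, cycles):
    """Sum the hold times of the program for a given number of cycles."""
    per_cycle = sum(seconds for _, seconds in program["cycle"])
    return (
        program["pre_denaturation"][1]
        + cycles * per_cycle
        + program["final_extension"][1]
    )


for n in (32, 35):
    minutes = total_hold_seconds(PCR_PROGRAM, n) / 60
    print(f"{n} cycles: {minutes:.2f} min of hold time (ramping excluded)")
```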
In yet another embodiment of the present invention, the method may further comprise performing a cluster analysis (e.g., a phylogenetic tree) of the nucleotide sequence to be tested obtained from the sequencing together with the DNA barcode of the present invention. If the sequence to be tested clusters with the DNA barcode, the strain to be tested is determined to be the Candida brownii TMCC 70011 strain, or the puer tea to be tested is determined to be puer tea produced by fermentation of the Candida brownii TMCC 70011 strain. For example, the sample sequence of the strain to be tested (which may include sequences of other strains within the species Candida brownii) and the DNA barcode sequence of the invention are combined, a neighbor-joining (NJ) phylogenetic tree is constructed using MEGA 6 or PAUP software, and the strain to be tested is identified according to whether its sequence clusters with the DNA barcode sequence.
In a specific embodiment of the invention, genomic DNA extracted from the strain to be identified is PCR amplified using the primer pair of the invention, followed by agarose gel electrophoresis detection. Identifying strains based on detecting the presence or absence of PCR products: if the strain to be identified does not amplify the corresponding target band, it is indicated that the strain is not TMCC 70011; if the corresponding target band is amplified, it is demonstrated that the strain is likely TMCC 70011. For further identification, the PCR product is sequenced, the DNA sequencing result is subjected to homology comparison with the DNA barcode sequence, so that the similarity (i.e. homology) between the sequences is obtained, and if the sequence homology is less than 99%, the strain to be detected is judged not to be the candida brownii TMCC 70011 strain. If the sequence homology is greater than or equal to 99%, the strain to be tested is determined to be the candida brownii TMCC 70011 strain.
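The decision logic of this embodiment can be summarized in a short sketch. It assumes the two sequences have already been aligned to equal length (gaps written as `-`), counts gapped columns as mismatches, and applies the "greater than or equal to 99%" threshold stated in this paragraph. The function names and return strings are illustrative only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def classify_strain(band_present, identity_percent=None, threshold=99.0):
    """Apply the band check first, then the sequence identity threshold."""
    if not band_present:
        return "not TMCC 70011"
    if identity_percent is None:
        return "possibly TMCC 70011; sequence the PCR product"
    if identity_percent >= threshold:
        return "TMCC 70011"
    return "not TMCC 70011"


# Toy aligned sequences, hypothetical and far shorter than a real barcode.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(identity, classify_strain(True, identity))
```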
If cluster analysis, such as a phylogenetic tree, is performed, the DNA barcode is used together with the DNA sequencing result (i.e., the sequence to be tested) of each strain to be identified to construct an NJ phylogenetic tree using MEGA 6 or PAUP software. If the test sequence of the strain to be identified clusters with the DNA barcode of the Candida brownii TMCC 70011 strain, the strain is identified as the Candida brownii TMCC 70011 strain.
The term "cluster" as used herein means that, after phylogenetic tree analysis, the sequences fall on the same branch and have the same evolutionary distance.
In yet another embodiment of the present invention, the method of identifying a strain or puer tea of the present invention may further comprise constructing a strain differential site array based on a nucleic acid mass spectrometry strategy, which identifies the TMCC 70011 strain by means of highly sensitive, highly accurate nucleic acid mass spectrometry. Nucleic acid mass spectrometry enables accurate detection of the DNA sequence of a target fragment by combining PCR, high-throughput chips and mass spectrometers (e.g., time-of-flight mass spectrometry, MALDI-TOF MS). The principle is as follows: after treatment, the PCR product of the strain to be tested is mixed and co-crystallized with the chip matrix; laser irradiation rapidly vaporizes the matrix, and the absorbed laser energy ionizes the sample; the resulting ions are accelerated through the flight tube under an electric field, and their flight times differ according to their mass and charge, so that differences in DNA sequence can be analyzed efficiently and sensitively and different strains can be distinguished. The DNA barcode is compared with the genome sequence of the strain to be identified, and a differential site array of the two strains is drawn from the comparison result, so that the differential sites of the two strains are found and the Candida brownii TMCC 70011 strain is identified.
Specifically, in the nucleic acid mass spectrometry, the genomic sequence of the strain to be identified may first be PCR amplified with a specifically designed primer pair to obtain a PCR product fragment of the strain to be identified. The product is then treated with shrimp alkaline phosphatase (SAP) to eliminate excess primers and dNTPs in the reaction system, subjected to a single-base extension reaction and desalting purification, and the purified sample is combined with a matrix chip and detected by a mass spectrometer.
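Why flight time depends on mass and charge can be made concrete with the idealized linear time-of-flight relation t = L * sqrt(m / (2 * z * e * U)), in which an ion of mass m and charge z*e is accelerated through a voltage U and then drifts over a length L. The sketch below is a simplified physical illustration; the drift length (1 m) and accelerating voltage (20 kV) are assumed values, not parameters of the invention, and initial ion velocity is neglected.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mass_da, charge=1, voltage=20_000.0, length_m=1.0):
    """Idealized drift time (seconds) of an ion in a linear TOF analyzer."""
    mass_kg = mass_da * DALTON
    # Kinetic energy gained in the accelerating field: z * e * U = m * v^2 / 2.
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return length_m / velocity


# Two hypothetical extension products that differ by a small mass.
for mass in (5000.0, 5016.0):
    print(f"{mass:.0f} Da: {flight_time(mass) * 1e6:.3f} microseconds")
```

Because flight time scales with the square root of mass over charge, products that differ by a single incorporated base arrive at slightly different times, which is what allows the differing sites to be read out.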
The invention also provides application of the DNA bar code in identifying candida brownii TMCC 70011 strain or puer tea produced by fermentation thereof.
The invention also provides application of the primer pair in identifying candida brownii TMCC 70011 strain or puer tea produced by fermentation thereof.
The invention also provides application of the kit in identifying candida brownii TMCC 70011 strain or puer tea produced by fermentation thereof.
Examples
The invention will be further illustrated with reference to specific examples. The methods used in the examples, unless specifically indicated, all employ conventional methods and known tools.
Example 1: DYBXS-1 gene and DNA barcoding
1. Using high-coverage proteome techniques, a deep-coverage proteome study of Candida brownii TMCC 70011 (Candida blankii TMCC 70011) was performed with the pFind and pAnno software (pFind searches the proteome database; pAnno re-annotates the genome by mapping the proteome data back onto it), and the annotated coding genes of its genome were verified. To find new protein-coding regions, the six-frame translation (Six Frame Translation) strategy of systematic proteogenomics was used to build a six-frame translation database from the Candida brownii TMCC 70011 genome data, exhausting the 6 coding possibilities (+1, +2, +3, -1, -2, -3) of the genome. The resulting nucleic acid sequences are referred to as "six-frame translation nucleic acid sequences" and the protein sequences as "six-frame translation protein sequences". Typically, a six-frame translation nucleic acid sequence is the sequence from one stop codon to the next. Using this database, new peptides and new proteins were identified with pFind and pAnno from the high-coverage proteome mass spectrometry data of the total cellular proteins of the TMCC 70011 strain.
A peptide, SIAAEQQDAVSSR, that is not found among the annotated genes of Candida brownii TMCC 70011 in the prior art was identified; its mass spectrum is shown in FIG. 1.
Manual inspection of the mass spectrum showed that, in the secondary mass spectrum (MS2) of peptide SIAAEQQDAVSSR, almost all of the y ion series is matched and the signal is strong, so the result is reliable.
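For readers who want to see what "matching the y ion series" involves, the theoretical m/z values of the singly charged y ions can be computed by summing monoisotopic residue masses from the C-terminus and adding one water and one proton. This is a generic sketch using standard monoisotopic masses; it assumes unmodified residues and is not the scoring procedure of pFind.

```python
# Monoisotopic residue masses (Da) for the residues present in the peptide.
RESIDUE_MASS = {
    "S": 87.03203,
    "I": 113.08406,
    "A": 71.03711,
    "E": 129.04259,
    "Q": 128.05858,
    "D": 115.02694,
    "V": 99.06841,
    "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def y_ion_mz(peptide):
    """Singly charged y-ion m/z values, y1 first; the last value is [M+H]+."""
    values = []
    running = WATER + PROTON
    for residue in reversed(peptide):
        running += RESIDUE_MASS[residue]
        values.append(round(running, 4))
    return values


y_series = y_ion_mz("SIAAEQQDAVSSR")
for index, mz in enumerate(y_series, start=1):
    print(f"y{index}: {mz}")
```

Observed fragment peaks that fall within the instrument's mass tolerance of these values count as matched y ions.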
2. To further confirm this identification, the peptide was chemically synthesized according to the amino acid sequence of the newly identified peptide SIAAEQQDAVSSR, and the high-energy collision MS2 spectrum generated from the synthesized peptide was used for verification. Both the primary parent ion and the secondary fragment ions met the theoretical values, indicating that the sequence of the synthesized peptide was correct (see FIG. 2).
On this basis, the MS2 spectrum of the synthetic peptide was compared manually with the spectrum of the new peptide identified from the large-scale proteome data. The two spectra are almost completely consistent, and the cosine value obtained from the fragment ion similarity is as high as 0.99, which proves that the identification of the new peptide from the Candida brownii TMCC 70011 strain of the puer tea industry is correct.
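A cosine value of this kind is commonly computed by treating the intensities of the matched fragment ions in the two spectra as vectors and taking the cosine of the angle between them. The sketch below shows that calculation under the assumption that the two intensity lists are already matched peak by peak; the intensity values are hypothetical and are not data from this study.

```python
import math


def cosine_similarity(intensities_a, intensities_b):
    """Cosine of the angle between two equally long intensity vectors."""
    if len(intensities_a) != len(intensities_b):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(a * b for a, b in zip(intensities_a, intensities_b))
    norm_a = math.sqrt(sum(a * a for a in intensities_a))
    norm_b = math.sqrt(sum(b * b for b in intensities_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical relative intensities of the same fragment ions in two spectra.
identified_spectrum = [100.0, 62.0, 45.0, 30.0, 12.0]
synthetic_spectrum = [96.0, 65.0, 40.0, 33.0, 10.0]
print(round(cosine_similarity(identified_spectrum, synthetic_spectrum), 3))
```

A value close to 1 indicates that the relative fragment intensities of the two spectra follow the same pattern.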
3. The peptide sequence was aligned with the TMCC 70011 genome and, according to the position of the new peptide, the six-frame translation nucleic acid sequence bounded by the preceding stop codon and the following stop codon was obtained, namely the open reading frame (Open Reading Frame, ORF) DNA sequence shown in SEQ ID No.2.
4. To further determine the coding start site and termination site of the coding gene, the ORF sequence was extended by 1000 bp both upstream and downstream, and gene prediction was performed using AUGUSTUS with Schizosaccharomyces pombe selected as the reference species. A protein-coding gene was predicted in this region, and its coding frame was substantially identical to the ORF sequence (SEQ ID No.2) described above. According to the prediction result, the possible coding start site and termination site were determined, and the complete gene sequence from the start codon to the stop codon is shown as SEQ ID No.1 (FIG. 3).
The correspondence between SEQ ID No.1 and the amino acid sequence of the protein it encodes (namely SEQ ID No.3) is shown in FIG. 4. Peptide SIAAEQQDAVSSR is located in the downstream portion of the protein.
The nucleotide sequence shown as SEQ ID No.1 is 1071 bp long including the stop codon and encodes 356 amino acids in total, with a theoretical molecular weight of 40.42 kDa (it contains no introns). The theoretically encoded amino acid sequence of the DYBXS-1 gene is shown as SEQ ID No.3.
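The stop-to-stop boundary search described in this step can be sketched on a single translated frame: locate the peptide, walk back to the preceding stop codon and forward to the next one, and convert the amino acid coordinates into nucleotide coordinates. The sketch assumes a forward-strand frame in which stops are written as `*`; the toy frame is hypothetical and the function names are illustrative.

```python
def stop_to_stop(translated_frame, peptide):
    """Return (start, end) amino acid indices of the stop-bounded region
    containing the peptide, or None if the peptide is not in this frame."""
    hit = translated_frame.find(peptide)
    if hit == -1:
        return None
    # rfind returns -1 when there is no upstream stop, which gives start 0.
    start = translated_frame.rfind("*", 0, hit) + 1
    end = translated_frame.find("*", hit)
    if end == -1:
        end = len(translated_frame)
    return start, end


def to_nucleotide_coords(aa_start, aa_end, frame_offset):
    """Map 0-based amino acid indices to 0-based nucleotide indices
    on the forward strand for a frame starting at frame_offset."""
    return frame_offset + 3 * aa_start, frame_offset + 3 * aa_end


# Hypothetical translated frame containing the identified peptide.
toy_frame = "KL*MAGKSIAAEQQDAVSSRAPE*QT"
region = stop_to_stop(toy_frame, "SIAAEQQDAVSSR")
print(region, toy_frame[region[0]:region[1]])
print(to_nucleotide_coords(*region, frame_offset=0))
```

For frames on the reverse strand the same search applies to the reverse complement, and the coordinates then have to be mapped back to the forward strand.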
5. To further determine the molecular weight of the protein encoded by the region of the ORF sequence identified by SIAAEQQDAVSSR, we determined its apparent molecular weight.
As shown in FIG. 5, TMCC 70011 was cultured in YPD medium (2% peptone + 1% yeast extract + 2% glucose) at 38 ℃ for 6 hours, and fresh early-stage cells were collected for the subsequent experiments. Whole-cell proteins were separated by SDS-PAGE, the whole lane was cut into a total of 30 strips according to molecular weight and protein abundance, and the strips were subjected to in-gel digestion and mass spectrometry; the protein corresponding to SEQ ID No.3 was identified in the mass spectrometry data of strip 3.
6. NCBI-BLASTP analysis (FIG. 6) was performed on the amino acid sequence (SEQ ID No. 3) of the theoretical encoded product of the gene, which showed very low homology (no more than 27%) with the existing sequences in the NCBI nr database, indicating that the SEQ ID No.3 sequence has better sequence specificity.
The line segment under the query result in FIG. 6 represents a sequence matched in NCBI that has some similarity to the target sequence (SEQ ID No.3). It can be seen that there is only 1 similar sequence, and that its amino acid sequence is not highly similar to SEQ ID No.3.
The data for the one sequence shown in FIG. 6, which has the highest homology to the amino acid sequence of the TMCC 70011 DYBXS-1 protein, are set forth in Table 1. As can be seen from the results in Table 1, the coverage of this sequence with SEQ ID No.3 is only 61% and the similarity is only 27%, indicating that the sequence has low similarity to the amino acid sequence (SEQ ID No.3) of the TMCC 70011 DYBXS-1 protein.
Table 1. The sequence with the highest homology to the TMCC 70011 DYBXS-1 protein sequence
Based on the BLASTP results shown in FIG. 6, the detected misannotated gene product contains the domain COG5038, but at a position that deviates from that in the homologous sequence obtained by BLASTP. The result shows that the sequence of SEQ ID No.3 has good specificity and very low homology to the existing sequences in the NCBI nr database.
FIG. 7 shows an alignment of the amino acid sequence (SEQ ID No.3) of the TMCC 70011 DYBXS-1 protein with the above putative protein AWJ20_3636, which confirms the above result that the similarity between the homologous proteins is low.
7. NCBI-BLASTN analysis of the DNA sequence of the identified DYBXS-1 gene (i.e., SEQ ID No.1) showed that there is no homologous sequence in the NCBI database. This indicates that the DYBXS-1 gene sequence found in the Candida brownii TMCC 70011 strain is highly specific relative to the existing sequences in the NCBI nr database and can be used as a DNA barcode for distinguishing the TMCC 70011 strain from other species or strains.
8. Further consideration was given to the sequence of the transcribed spacer before and after the DYBXS-1 gene sequence (i.e., SEQ ID No. 1), resulting in SEQ ID No.4, as shown in FIG. 8.
Since the sequences of SEQ ID No.2 and SEQ ID No.4 both contain the sequence of SEQ ID No.1, and SEQ ID No.1 has no homologous sequences in the NCBI database, it can reasonably be concluded that the sequences of SEQ ID No.2 and SEQ ID No.4 also have no homologous sequences in the NCBI database. Further, BLASTN analysis of the SEQ ID No.4 sequence confirmed that no homologous sequence was found in the NCBI database. Therefore, SEQ ID No.2 and SEQ ID No.4 are highly specific relative to the existing sequences in the NCBI nr database and can also be used as DNA barcodes for distinguishing the TMCC 70011 strain from other species or strains.
Example 2: Identification of strains using DNA barcodes
Whether a sample to be tested is the Candida brownii TMCC 70011 strain used in the puer tea industry is judged according to the PCR amplification result of the sample and the homology of its sequence to the DYBXS-1 gene sequence of the Candida brownii TMCC 70011 strain.
The primers were designed in the transcribed spacer sequences flanking the DYBXS-1 gene of the Candida brownii TMCC 70011 strain.
(1) Based on the results of the gene prediction in step 4 of example 1, PCR primers were designed at both ends of the gene using NCBI primer design tools. The sequences of the obtained forward and reverse primers are respectively as follows:
DYBXS1-F: 5'-ATGCACCCGTGAGTATGTGA-3' (SEQ ID No.5);
DYBXS1-R: 5'-CTCGGCAGTGTTATGCTCAA-3' (SEQ ID No.6).
The primers are located in the underlined regions at the front and rear of the SEQ ID No.4 sequence.
(2) Extraction of DNA of the TMCC 70011 strain: genomic DNA of the strain was extracted using the OMEGA E.Z.N.A.™ Yeast DNA Kit, and the DNA concentration of the sample was diluted to 0.5 μg/μL with sterilized deionized water.
(3) Amplification of the DNA fragment by polymerase chain reaction (PCR), wherein the sequences of the primers are respectively as follows:
Forward primer sequence DYBXS1-F: 5'-ATGCACCCGTGAGTATGTGA-3';
Reverse primer sequence DYBXS1-R: 5'-CTCGGCAGTGTTATGCTCAA-3'.
The PCR reaction system is 50 μL, and the PCR reagent is Thermo Scientific™ Taq DNA Polymerase (recombinant): ddH2O 37.7 μL, MgCl2 5 μL, dNTPs 4 μL, forward primer 1 μL, reverse primer 1 μL, Taq DNA polymerase 0.3 μL, DNA template 1 μL, without dye. The amplification program was: pre-denaturation at 94 ℃ for 8 minutes; then denaturation at 94 ℃ for 45 seconds, annealing at 56 ℃ for 45 seconds, and extension at 72 ℃ for 1 minute 15 seconds, for a total of 32-35 cycles; finally, extension at 72 ℃ for 10 minutes.
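As a quick consistency check on the reaction system, the component volumes can be summed and scaled into a master mix. The sketch reads the component list as ddH2O 37.7 μL, MgCl2 5 μL, dNTPs 4 μL, forward primer 1 μL, reverse primer 1 μL, Taq DNA polymerase 0.3 μL and DNA template 1 μL, a reading that adds up to the stated 50 μL. The 10% pipetting overage and the function names are assumptions added for illustration.

```python
# Per-reaction volumes in microliters, as read from the reaction system above.
REACTION_UL = {
    "ddH2O": 37.7,
    "MgCl2": 5.0,
    "dNTPs": 4.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Taq DNA polymerase": 0.3,
    "DNA template": 1.0,
}


def total_volume(mix):
    """Total volume of one reaction, rounded to avoid floating-point noise."""
    return round(sum(mix.values()), 2)


def master_mix(mix, reactions, overage=0.10, exclude=("DNA template",)):
    """Scale shared components for several reactions; template is added per tube."""
    factor = reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in mix.items()
        if name not in exclude
    }


print(total_volume(REACTION_UL))
for name, volume in master_mix(REACTION_UL, reactions=8).items():
    print(f"{name}: {volume} uL")
```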
(4) Detection of amplification products: the PCR fragment size was determined by electrophoresis using 1.0% agarose gel, 1 XTBE running buffer and using a DNA molecular weight marker. If the strain to be tested does not have an expected 1370bp (SEQ ID No. 4) amplification band, the strain is not candida brownii TMCC 70011; if a clear band appears and no band exists, the DNA fragment is sent to a biological sequencing company for sequencing.
(5) The theoretical amplification sequence of the PCR primer is 1370bp. As a result, as shown in FIG. 9, the primer was able to effect amplification in the Candida bronsted TMCC 70011 strain.
(6) For strains amplified with the correct bands, sequencing and sequence alignment were performed in order to further verify the sequence of the amplified DNA. Firstly, checking the quality of a sequence peak diagram obtained after sequencing by using software Chromas, and after determining that the quality of the peak diagram meets the requirement of data analysis, splicing forward and reverse sequences by using SeqMan in DNASTAR software package. And (3) carrying out manual proofreading and sequence splicing on the sequencing result, and then carrying out sequence comparison. FIG. 10 shows the result of the comparison of the sequencing of the PCR product of the strain TMCC 70011 of Candida Brookfield with the theoretical sequence (SEQ ID No. 4). The results showed that the PCR product of the Candida Brookfield TMCC 70011 strain was 100% similar to the Candida Brookfield TMCC 70011 standard DNA barcode (SEQ ID No. 4). The result shows that the PCR amplification result of the strain to be detected based on the DNA bar code SEQ ID No.4 sequence is completely consistent with the theoretical sequence (SEQ ID No. 4), the feasibility of the method of the invention is further confirmed, and the existence of the annotation-missing gene is verified.
(7) TMCC 70011 was further aligned with the parallel strain and the reference genomic sequence ASM2473431v1 (https:// www.ncbi.nlm.nih.gov/dataset/genome/GCA_ 024734315.1 /) of Candida bronstaeca, also the only strain genomic sequence sequenced and assembled in Candida bronstaeca, was downloaded from the NCBI database, from the Candida bronstaeca ABL strain.
(8) Bacterial strain origin
TABLE 2 information on relevant strains selected for use
(9) Nucleic acid mass spectrometry is a detection method based on mass spectrometry. It can be used to detect the sequence, structure, and content of DNA or RNA, and is widely applied to the analysis of single nucleotide polymorphisms (SNPs), gene mutations, DNA methylation, copy number variation (CNV), and the like. The principle is as follows. The PCR product of the strain to be tested is treated with shrimp alkaline phosphatase (SAP) to eliminate the excess primers and dNTPs in the reaction system. A single-base extension reaction is then performed, and the product is desalted and purified. The purified sample is mixed with the chip matrix and co-crystallized. Under laser irradiation the matrix evaporates rapidly, and the absorbed laser energy ionizes the sample. The resulting ions are accelerated through the flight tube by an electric field, and their flight times differ according to their mass and charge, so that differences in DNA sequence can be detected efficiently and sensitively and different strains can be distinguished.
Following this strategy, the DNA barcode standard sequence (SEQ ID No. 4) of the Candida brownii TMCC 70011 strain was aligned with the corresponding sequence of the Candida brownii ABL strain (using R Studio v 4.3.1), and a map of the differing sites between the two strains was drawn from the alignment (also using R Studio v 4.3.1). A total of 115 differing sites were found (FIG. 11), which is sufficient to identify the Candida brownii TMCC 70011 strain by the nucleic acid mass spectrometry strategy.
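As an illustrative sketch only (the comparison described above was carried out in R Studio, and its script is not disclosed), the counting of differing sites between two already-aligned sequences can be expressed in Python as follows. The function name `differing_sites` and the toy fragments are hypothetical and are not SEQ ID No. 4 or the ABL sequence; counting gap columns as differences is also an assumption.

```python
def differing_sites(aligned_a: str, aligned_b: str) -> list[int]:
    """Return the 1-based alignment columns at which two pre-aligned
    sequences differ. Gap characters ('-') count as differences
    (an assumption of this sketch)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        column
        for column, (a, b) in enumerate(
            zip(aligned_a.upper(), aligned_b.upper()), start=1
        )
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical toy fragments, already aligned to the same length.
    barcode_fragment = "ATGCACC-GTGAGTA"
    reference_fragment = "ATGTACCGGTGAGCA"
    sites = differing_sites(barcode_fragment, reference_fragment)
    print(f"{len(sites)} differing sites at columns {sites}")
```

Applied to the real alignment, the length of the returned list would correspond to the number of differing sites reported in FIG. 11, provided the same alignment and the same treatment of gaps are used.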
Claims (11)
1. A DNA barcode for identifying a Candida brownii strain or Pu'er tea produced by fermentation thereof, characterized in that the DNA barcode is derived from the genome of the Candida brownii TMCC 70011 strain and is a sequence of at least 1000 bp selected from the DNA sequence shown in SEQ ID No. 4.
2. The DNA barcode of claim 1, wherein the nucleotide sequence of the DNA barcode comprises a sequence as shown in SEQ ID No. 1 or SEQ ID No. 2; alternatively, the DNA barcode is selected from sequences within the DNA sequence shown in SEQ ID No. 1 or SEQ ID No. 2; preferably, the nucleotide sequence of the DNA barcode is as shown in SEQ ID No. 1, SEQ ID No. 2 or SEQ ID No. 4.
3. A primer pair for amplifying the DNA barcode of claim 1 or 2.
4. The primer pair of claim 3, wherein the forward primer has a nucleotide sequence identical to a sequence in the genome of the Candida brownii TMCC 70011 strain located within the region from position 1 to position 401 of the nucleotide sequence shown in SEQ ID No. 4, the forward primer being 15-30 bp in length; and the reverse primer is the reverse complement of a sequence in the genome of the TMCC 70011 strain located within the region from position 986 to the last position of the nucleotide sequence shown in SEQ ID No. 4, the reverse primer being 15-30 bp in length.
5. The primer pair of claim 4, wherein the forward primer and the reverse primer have the nucleotide sequences shown below, respectively:
forward primer: 5'-ATGCACCCGTGAGTATGTGA-3';
reverse primer: 5'-CTCGGCAGTGTTATGCTCAA-3'.
6. A kit for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof, comprising the primer pair according to any one of claims 3-5.
7. A method for identifying the Candida brownii TMCC 70011 strain, comprising the steps of:
a) Providing genomic DNA of a strain to be tested;
b) Performing PCR amplification using the genomic DNA of step a) as a template and the primer pair according to any one of claims 3-5 to obtain a PCR product;
c) Detecting the PCR product by electrophoresis, and if no target band is present, judging that the strain to be tested is not the Candida brownii TMCC 70011 strain; if a target band is present, performing steps d) and/or e);
d) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested; comparing the nucleotide sequence to be tested with the nucleotide sequence of the DNA barcode of claim 1 or 2, and judging that the strain to be tested is the Candida brownii TMCC 70011 strain if the homology is more than 99%;
e) Performing nucleic acid mass spectrometry on the obtained PCR product, and judging that the strain to be tested is not the Candida brownii TMCC 70011 strain if the base masses of the PCR product differ from those of the DNA barcode according to claim 1 or 2 and the number of bases with a mass difference is greater than or equal to 10.
8. A method for identifying Pu'er tea produced by fermentation of the Candida brownii TMCC 70011 strain, comprising the steps of:
a) Providing a Pu'er tea sample;
b) Extracting genomic DNA of a microbial strain from the Pu'er tea sample;
c) Performing PCR amplification using the genomic DNA of step b) as a template and the primer pair according to any one of claims 3-5 to obtain a PCR product;
d) Detecting the PCR product by electrophoresis, and if no target band is present, judging that the Pu'er tea is not Pu'er tea produced by fermentation of the Candida brownii TMCC 70011 strain; if a target band is present, performing steps e) and/or f);
e) Sequencing the obtained PCR product to obtain a nucleotide sequence to be tested; comparing the nucleotide sequence to be tested with the nucleotide sequence of the DNA barcode of claim 1 or 2, and judging that the Pu'er tea is Pu'er tea produced by fermentation of the Candida brownii TMCC 70011 strain if the homology is more than 99%;
f) Performing nucleic acid mass spectrometry on the obtained PCR product, and judging that the Pu'er tea is not Pu'er tea produced by fermentation of the Candida brownii TMCC 70011 strain if the base masses of the PCR product differ from those of the DNA barcode according to claim 1 or 2 and the number of bases with a mass difference is greater than or equal to 10.
9. Use of the DNA barcode according to claim 1 or 2 for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof.
10. Use of the primer pair according to any one of claims 3-5 for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof.
11. Use of the kit according to claim 6 for identifying the Candida brownii TMCC 70011 strain or Pu'er tea produced by fermentation thereof.
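For illustration only, and without forming part of the claims, the decision logic of steps c) to e) of claim 7 can be sketched in Python as follows. The function name `identify`, its parameters, and the returned labels are hypothetical. The claim states only a positive criterion for sequencing (homology of more than 99%) and a negative criterion for mass spectrometry (10 or more bases with a mass difference); treating every other outcome as "inconclusive", and letting a negative mass spectrometry result take precedence when both tests are run, are assumptions of this sketch.

```python
from typing import Optional

IDENTITY_THRESHOLD = 99.0  # percent; step d) requires MORE than this value
MASS_DIFF_THRESHOLD = 10   # bases; step e) rejects at this count or above


def identify(
    band_present: bool,
    identity_pct: Optional[float] = None,
    mass_diff_bases: Optional[int] = None,
) -> str:
    """Classify a tested strain from electrophoresis, sequencing and/or
    nucleic acid mass spectrometry results (hypothetical helper)."""
    # Step c): no target band means the strain is not TMCC 70011.
    if not band_present:
        return "not TMCC 70011"
    # Step e): too many bases with a mass difference rules the strain out.
    # Checking this first is an assumption; the claim does not order d) and e).
    if mass_diff_bases is not None and mass_diff_bases >= MASS_DIFF_THRESHOLD:
        return "not TMCC 70011"
    # Step d): homology strictly above 99% identifies the strain.
    if identity_pct is not None and identity_pct > IDENTITY_THRESHOLD:
        return "TMCC 70011"
    # Outcomes the claim does not address are left open.
    return "inconclusive"


if __name__ == "__main__":
    print(identify(band_present=False))
    print(identify(band_present=True, identity_pct=100.0))
    print(identify(band_present=True, mass_diff_bases=115))
```

The same structure would apply to claim 8, with the tea sample's extracted microbial DNA as the template and the verdict referring to the tea rather than the strain.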
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311568743.4A CN117363787A (en) | 2023-11-22 | 2023-11-22 | DNA bar code, primer, kit, method and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117363787A (en) | 2024-01-09 |
Family
ID=89394823
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311568743.4A Pending CN117363787A (en) | 2023-11-22 | 2023-11-22 | DNA bar code, primer, kit, method and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117363787A (en) |
- 2023-11-22: CN application CN202311568743.4A, published as CN117363787A; status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||