CN108165564B - Mycobacterium tuberculosis H37Rv encoding gene and application thereof
- Publication number: CN108165564B
- Application number: CN201711251274.8A
- Authority: CN (China)
- Legal status: Active (The legal status is an assumption and is not a legal conclusion.)
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/195—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
- C07K14/35—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Mycobacteriaceae (F)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/689—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
Abstract
The invention relates to a coding gene of Mycobacterium tuberculosis H37Rv that can serve as a standard gene for molecular identification of the Mycobacterium tuberculosis complex and can be used for molecular identification and clinical detection of the complex.
Description
Technical Field
The invention relates to the field of gene detection, in particular to identification of pathogenic bacteria species.
Background
Mycobacterium tuberculosis (MTB) is the pathogenic bacterium that causes tuberculosis in humans. It can invade any organ of the body, but pulmonary tuberculosis is the most common form. Tuberculosis remains an extremely important infectious disease and seriously threatens human life and health. According to WHO reports, about 8 million new cases occur each year and at least 3 million people die from the disease. Clinical MTB strains are difficult to culture and grow slowly, cross-infection with other mycobacteria occurs, and the symptoms of tuberculosis are hard to distinguish from those of other respiratory tract infections, all of which make rapid clinical diagnosis and treatment very difficult. Establishing a rapid, accurate, specific, sensitive and inexpensive tuberculosis detection method is therefore a prerequisite for effectively treating tuberculosis and controlling its spread, and it is a new challenge and task for mycobacterial detection in clinical laboratories.
The Mycobacterium tuberculosis complex (MTBC) comprises M. tuberculosis, M. africanum, M. orygis, M. bovis, M. microti, M. canettii, M. caprae, M. pinnipedii, M. suricattae and M. mungi, all of which can cause tuberculosis in humans or other animals. At present, MTBC identification methods used in China and abroad fall into three main categories: traditional isolation and culture; molecular-level detection (IS6110, restriction fragment length polymorphism analysis, multilocus variable-number tandem repeat analysis, etc.); and chromatographic analysis of bacterial cell components (fatty acids, mycolic acids). Each has its advantages, but each also has drawbacks: isolation and culture takes a long time and the culturable rate of the bacteria is low; current molecular-level detection is poor in specificity, sensitivity and simplicity; and analysis of cell-component characteristics is costly and complex to operate.
The whole genome of MTB H37Rv was sequenced in 1998, making it the first MTB strain with a complete genome sequence. Since then, researchers in many countries have been refining and supplementing the H37Rv gene annotation databases through algorithm optimization, updated annotation software, transcriptomics and proteomics. However, because MTB is a prokaryote and prokaryotic genome annotation technology has inherent shortcomings, the genome annotation may still contain errors (over-annotation, wrong gene boundaries, wrong ORF start or stop sites, alternative splicing, ribosome translocation, missed annotation), which hampers deep and accurate analysis of biological mechanisms. Proteomics has been used to correct the annotated genes of H37Rv, but the field still faces a high proportion of false positives and difficulties in predicting annotated genes, verifying new genes, and analyzing and applying the functions of new genes.
In general, traditional Mycobacterium tuberculosis complex (MTBC) identification strategies suffer from long turnaround times, tedious steps, and low specificity and sensitivity. There is therefore an urgent need to further refine the re-annotation of the H37Rv whole genome, to find the missed-annotation genes in H37Rv, to protect these genes and their application in MTBC molecular identification, and to develop a method for rapidly and accurately identifying the MTBC group using the new H37Rv genes.
Disclosure of Invention
An object of the present invention is to provide a new coding gene of Mycobacterium tuberculosis H37Rv, namely the missed-annotation coding gene Rv3108c (- |3476972-3477175|) of H37Rv, which can serve as a barcode molecular marker for detecting the Mycobacterium tuberculosis complex and whose sequence is shown in SEQ ID NO. 1.
Other objects of the present invention include providing specific PCR primers for amplifying the above coding gene and providing a method of detecting or identifying the presence of the Mycobacterium tuberculosis complex in a sample; the invention also provides a detection kit related to the coding gene and applications of the gene.
According to one aspect of the invention, proteomics-based research techniques were used to discover a protein-coding sequence of H37Rv that is difficult to find with gene prediction software and that effectively distinguishes MTBC from other species of the same genus. The gene is a missed-annotation gene of Mycobacterium tuberculosis H37Rv, namely Rv3108c (- |3476972-3477175|). Comparative genomics studies show that the gene sequence can distinguish Mycobacterium tuberculosis complex (MTBC) strains from other species of Mycobacterium.
Specifically, a primer capable of specifically amplifying the Rv3108c (- |3476972-3477175|) gene of MTBC is designed, namely the primer provided by the invention, and the primer sequence is as follows:
F:5’-GACCAGTGCCCTCGCAGT-3’;
R:5’-AGGACGATCATGGCTCCG-3’.
MTBC can be identified rapidly and accurately according to whether a PCR product of this gene's DNA sequence is present in the sample to be tested, or according to differences in the DNA sequence.
According to another aspect of the present invention, based on the above-mentioned new standard encoding gene of Mycobacterium tuberculosis H37Rv, the present invention specifically establishes a method for detecting or identifying Mycobacterium tuberculosis complex, comprising the following steps:
(1) separating and extracting genome DNA from a sample to be detected;
(2) performing PCR amplification with the DNA obtained in step (1) as the template, using the following primers:
F:5’-GACCAGTGCCCTCGCAGT-3’(SEQ ID NO.4);
R:5’-AGGACGATCATGGCTCCG-3’(SEQ ID NO.5);
(3) performing gel electrophoresis analysis or sequencing on the DNA product amplified in step (2);
(4) comparing the result of step (3) with the barcode gene Rv3108c (- |3476972-3477175|); if the homology is greater than 99%, the sample to be tested is judged to contain the Mycobacterium tuberculosis complex.
Further, following the DNA barcode principle, the PCR product is first analyzed by electrophoresis. If the strain to be tested shows no target band, it is not MTBC. If the band is present, sequencing verification can follow: the sequence obtained is aligned with the standard Rv3108c (- |3476972-3477175|) sequence of H37Rv to obtain the similarity between the two, and if the sequence homology is greater than 99% the strain can be judged to be MTBC. The MTBC family is distinguished from nontuberculous mycobacteria, common respiratory pathogenic bacteria and common respiratory viruses according to how the DNA barcode sequence of the strain to be identified clusters with the standard sequence.
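The decision rule described above can be sketched in a few lines of Python. This is only an illustration, not software from the patent: the function names `percent_identity` and `classify_sample` are hypothetical, and the sketch assumes the sequenced amplicon has already been aligned to the reference so that both strings have equal length and contain no gaps (in practice an alignment tool would supply the identity value).

```python
from typing import Optional


def percent_identity(query: str, reference: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not reference or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def classify_sample(band_present: bool,
                    amplicon: Optional[str] = None,
                    reference: Optional[str] = None,
                    threshold: float = 99.0) -> str:
    """Apply the two-stage rule: electrophoresis band first, then sequence identity."""
    if not band_present:
        # No target band on the gel: the strain is not MTBC.
        return "not MTBC"
    if amplicon is None or reference is None:
        # Band seen but no sequence yet: sequencing is the optional confirmation step.
        return "band present, sequencing recommended"
    # The text requires homology greater than 99%.
    return "MTBC" if percent_identity(amplicon, reference) > threshold else "not MTBC"
```

With a toy 200-nt reference, a read carrying one mismatch (99.5% identity) is called MTBC, while a read with three mismatches (98.5%) is not.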
The detection method can be used for strain identification research of the mycobacterium tuberculosis complex and can also be used for clinical rapid inspection. The sample to be detected can be H37Rv strain, other MTBC, nontuberculous mycobacteria, respiratory tract common pathogenic bacteria and respiratory tract common virus strain; or directly using sputum, saliva or blood of tuberculosis and other respiratory patients.
Based on the above method, the present invention also provides a detection kit. The kit contains, in a container, a reagent for detecting the new standard coding gene of Mycobacterium tuberculosis H37Rv, together with information on the manufacture, use and sale of the drug or biological product as approved by a government drug regulatory agency. For example, the reagent for directly detecting the Rv3108c (- |3476972-3477175|) gene in a sample after PCR amplification may comprise one or more of: amplification primers, dNTPs, the DNA polymerase used for the PCR reaction and its buffer, and the reagents required for an enzyme digestion reaction and/or a sequencing reaction. Those skilled in the art will understand that these components are merely illustrative; for example, the primers may be the specific PCR primers described above, and the DNA polymerase may be any enzyme suitable for PCR amplification. Detection of the coding gene of the present invention can also be provided in an integrated form, for example as a gene chip.
Advantageous effects: the invention provides a standard gene and a molecular identification method for the Mycobacterium tuberculosis complex (MTBC). The gene effectively distinguishes MTBC from other species of the same genus. The identification method based on it overcomes shortcomings of existing MTBC identification procedures, such as the need for multiple primer designs and poor reproducibility of results. It is universal, easy to amplify and easy to compare, accurately separates the group from closely related mycobacteria and from other respiratory pathogens, and provides a powerful technical means and research tool for epidemiological investigation and for rapid clinical diagnosis and identification of tuberculosis patients.
Drawings
FIG. 1: evidence of peptide profile matching supporting the discovery of new coding genes;
FIG. 2: comparing the mass spectrogram of the synthesized peptide fragment with the mass spectrogram of the original identified peptide fragment;
FIG. 3: a corresponding diagram of a protein sequence coded by ORF of the peptide fragment locus region; the underlined part is the peptide identified in proteomics and verified by the synthetic peptide;
FIG. 4: comparing the homology of the Rv3108c (- |3476972-3477175|) standard gene sequence;
FIG. 5: the result of BLASTP of a protein sequence corresponding to the Rv3108c (- |3476972-3477175|) gene of the H37Rv strain;
FIG. 6: the result of agarose gel electrophoresis of the PCR amplification product of the Rv3108c (- |3476972-3477175|) specific primer;
wherein, the specific information of each lane sample is shown in Table 1;
FIG. 7: the PCR amplification sequencing result of the Rv3108c (- |3476972-3477175|) gene is compared with a standard sequence.
Detailed Description
The invention is further described with reference to specific embodiments, but the scope of the claims is not limited thereto. The reagents used in the present invention are all commercially available.
Example 1: search for genes encoding missing release of the genome of strain H37Rv
1.1 high coverage proteomic validation of the genome of the H37Rv strain
The deep coverage study of proteome was performed on the H37Rv strain using the high coverage proteome technique. Annotated encoding gene validation was performed on its genome using the pFind 3 engine based on the Tuberculosis (20160307) database. To find new protein coding regions, we performed six-reading-frame database translation of H37Rv in the genome-wide (NC _000962.3) file published at NCBI using pAnno software based on proteomic technology, and identified new peptide fragments and new proteins using this database for mass spectrometry data. To reduce the false positive rate, we used 3 filtering methods to separately estimate class FDR for the annotated and new peptide fragments, S-FDR, T-FDR I and T-FDR II, respectively, during the data filtering.
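The six-reading-frame translation mentioned above can be illustrated with a minimal Python sketch. It is not the pAnno implementation: the function names are hypothetical, the codon table is the standard genetic code (whose amino acid assignments are the same as those of the bacterial code), and a real proteogenomic database would additionally split each frame at stop codons and digest the pieces in silico.

```python
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate the three forward and three reverse-strand reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For the toy input `ATGGCCTAA`, frame `+1` reads `MA*` and frame `-1` (the reverse complement `TTAGGCCAT`) reads `LGH`.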
Data analysis identified a total of 3,238 annotated H37Rv genes, covering more than 80% of the strain's annotated genes; this is the largest H37Rv protein mass spectrometry dataset reported so far. In addition, we obtained new peptide fragments after filtering with all three FDRs ≤ 1%. To further ensure the quality of the new peptide fragments, the spectra corresponding to the fragments remaining after filtering were screened for spectrum quality, and only peptide fragments with good spectrum quality were retained. To rule out the possibility that these high-quality peptides arose from single amino acid mutations of annotated peptides, we performed amino acid mutation checks, ensuring that the new peptides were newly identified peptides of H37Rv.
1.2 Verification of the encoded protein of the Rv3108c (- |3476972-3477175|) gene against databases
After high-coverage proteome verification, we found several candidate new peptide fragments that were missing from the annotation. The high-confidence candidates were verified by peptide synthesis, with a similarity score of ≥ 0.8 between the original spectrum and the spectrum of the synthetic peptide taken as the threshold. After scoring and screening, a number of peptide fragments passed verification; each corresponds to a new open reading frame (ORF), that is, a potential missed-annotation gene of the current H37Rv strain.
Among them, BLASTP comparison showed that the product of the new missed-annotation gene Rv3108c (- |3476972-3477175|) has 99% similarity to M. tuberculosis 1825K, A70645 and M. canettii CIPT140070010 and less than 76% similarity to other strains, and that it is a protein of unknown function. We detected the peptide ATSALAVIR (SEQ ID NO.6), which maps to the new gene Rv3108c (- |3476972-3477175|). As shown in FIG. 1, the spectrum quality was good, the b/y ions matched continuously, the noise peaks were low, and the result was highly reliable.
To further confirm this identification, we chemically synthesized the peptide according to the amino acid sequence of our newly identified peptide and generated a secondary spectrum of the synthesized peptide using the mass spectrometry conditions described above.
We verified the synthetic peptide by higher-energy collisional dissociation MS2: both the precursor (parent) ion and the fragment (daughter) ions agreed with the theoretical values, confirming that the sequence of the synthesized peptide was correct. On this basis, we manually compared the MS2 spectrum of the synthetic peptide with the spectrum of the new peptide from the large-scale proteomic identification. The two were almost identical, and the cosine similarity computed from the fragment ions was 0.98, which confirms that the new peptide we identified from H37Rv is correct (FIG. 2).
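For readers unfamiliar with the score, the cosine similarity between two spectra can be sketched as follows. The sketch assumes that fragment-ion peaks of the two spectra have already been matched, so that each spectrum is reduced to an intensity vector over the same ordered set of ions; the peak-matching step and the function name `cosine_similarity` are assumptions for illustration, not details given in the source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two matched fragment-ion intensity vectors."""
    if len(a) != len(b):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("intensity vectors must not be all zero")
    return dot / (norm_a * norm_b)


# Toy intensities: two spectra with the same ions but different relative heights.
original = [3.0, 4.0]
synthetic = [4.0, 3.0]
score = cosine_similarity(original, synthetic)  # 24 / 25 = 0.96
passes = score >= 0.8  # 0.8 is the similarity threshold used in this example section
```

A score of 1 means the relative intensities are identical and 0 means no shared signal; the value of 0.98 reported here lies well above the 0.8 threshold.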
After confirming the sequence of the missed-annotation peptide fragment, we located its position in the genome and took the region bounded by the preceding stop codon and the following stop codon to obtain the DNA sequence of the open reading frame (ORF) containing the new peptide fragment, shown in SEQ ID NO. 2.
TAGTCAGCTGGCATCCTGAAGGGCATGCCAGGCAAGGAAATCGATCGAGTCCGGGCGACCAGTGCCCTCGCAGTGATTAGGCAGCACCCGGTAATGGTGTTCTTCGCGCTGTCGCCGGTACTCGCCGCATTGGGTGTCATGTGGTGGCTAGCCGGTGCTGGATGGGCTATCGTCGCGGCCCTGGTGCTGGTGGTCGTCGGCGGAGCCATGATCGTCCTCAAACGCTGA(SEQ ID NO.2)
The correspondence between the open reading frame code and the amino acid sequence is shown in FIG. 3.
Further translation analysis showed that the actual gene sequence (SEQ ID NO.1) begins at the ATG within the above open reading frame DNA (SEQ ID NO.2); it is 204 bp long and encodes 67 amino acids with a theoretical molecular weight of 7.10 kDa. This is the gene Rv3108c (- |3476972-3477175|).
ATGCCAGGCAAGGAAATCGATCGAGTCCGGGCGACCAGTGCCCTCGCAGTGATTAGGCAGCACCCGGTAATGGTGTTCTTCGCGCTGTCGCCGGTACTCGCCGCATTGGGTGTCATGTGGTGGCTAGCCGGTGCTGGATGGGCTATCGTCGCGGCCCTGGTGCTGGTGGTCGTCGGCGGAGCCATGATCGTCCTCAAACGCTGA(SEQ ID NO.1)
The theoretical coding product amino acid sequence of the gene is shown as SEQ ID NO. 3:
MPGKEIDRVRATSALAVIRQHPVMVFFALSPVLAALGVMWWLAGAGWAIVAALVLVVVGGAMIVLKR(SEQ ID NO.3)
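The relationship between SEQ ID NO.2, SEQ ID NO.1 and SEQ ID NO.3 can be checked with a short Python sketch. The sequences are copied from this document; the helper names are hypothetical, and the standard genetic code is assumed (its amino acid assignments match the bacterial code).

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# SEQ ID NO.1 (CDS), written in 30-nt chunks for readability.
SEQ_ID_1 = (
    "ATGCCAGGCAAGGAAATCGATCGAGTCCGG"
    "GCGACCAGTGCCCTCGCAGTGATTAGGCAG"
    "CACCCGGTAATGGTGTTCTTCGCGCTGTCG"
    "CCGGTACTCGCCGCATTGGGTGTCATGTGG"
    "TGGCTAGCCGGTGCTGGATGGGCTATCGTC"
    "GCGGCCCTGGTGCTGGTGGTCGTCGGCGGA"
    "GCCATGATCGTCCTCAAACGCTGA"
)
# SEQ ID NO.2 (stop-to-stop ORF) is the same sequence with 24 extra nt upstream.
SEQ_ID_2 = "TAGTCAGCTGGCATCCTGAAGGGC" + SEQ_ID_1
SEQ_ID_3 = "MPGKEIDRVRATSALAVIRQHPVMVFFALSPVLAALGVMWWLAGAGWAIVAALVLVVVGGAMIVLKR"
PEPTIDE = "ATSALAVIR"  # SEQ ID NO.6, the peptide identified by mass spectrometry


def first_in_frame_atg(orf: str) -> int:
    """Index of the first ATG codon in frame with the start of the ORF, or -1."""
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] == "ATG":
            return i
    return -1


def translate_to_stop(cds: str) -> str:
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


start = first_in_frame_atg(SEQ_ID_2)   # where the CDS begins inside the ORF
cds = SEQ_ID_2[start:]
protein = translate_to_stop(cds)
peptide_position = protein.find(PEPTIDE) + 1  # 1-based position in the protein
```

Run as written, the in-frame ATG sits 24 nt into SEQ ID NO.2, the CDS is 204 bp, and its translation reproduces the 67-residue SEQ ID NO.3 with the identified peptide starting at residue 11.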
The amino acid sequence of the theoretical gene product shown in SEQ ID NO.3 was analyzed by NCBI BLASTP. It has 99% similarity to M. tuberculosis 1825K, A70645 and M. canettii CIPT140070010 and less than 76% similarity to other strains, and it is a protein of unknown function (see FIG. 5). This shows that the product of the Rv3108c (- |3476972-3477175|) gene we detected is missing from the annotation of the H37Rv strain database.
We also carried out a comparative-genomics local BLAST analysis of the DNA sequence of the Rv3108c (- |3476972-3477175|) gene, as shown in FIG. 4. The results show that the Rv3108c (- |3476972-3477175|) gene sequence is specific to the MTBC family and has no closely homologous sequence in other species. The gene sequence found in the H37Rv strain therefore has good sequence specificity and can distinguish MTBC from other mycobacteria of the same genus and from other bacteria that infect the respiratory tract.
Example 2: Establishing a method for identifying the MTBC group
(1) Primer design:
Based on the CDS sequence of the Rv3108c (- |3476972-3477175|) gene shown in SEQ ID NO.1, PCR primers were designed with Oligo 7.0; their sequences are as follows:
F:5’-GACCAGTGCCCTCGCAGT-3’(SEQ ID NO.4);
R:5’-AGGACGATCATGGCTCCG-3’(SEQ ID NO.5)
The positions of the primers on the Rv3108c (- |3476972-3477175|) gene are shown below, with the primer binding sites marked by single underlines.
ATGCCAGGCAAGGAAATCGATCGAGTCCGGGCGACCAGTGCCCTCGCAGTGATTAGGCAGCACCCGGTAATGGTGTTCTTCGCGCTGTCGCCGGTACTCGCCGCATTGGGTGTCATGTGGTGGCTAGCCGGTGCTGGATGGGCTATCGTCGCGGCCCTGGTGCTGGTGGTCGTCGGCGGAGCCATGATCGTCCTCAAACGCTGA(SEQ ID NO.1)
(2) Total DNA was extracted from the strains to be tested, including M. tuberculosis H37Rv. Of these, 40 standard Mycobacterium strains were preserved by the National Center for Medical Culture Collections (CMCC), and the other 16 non-tuberculous mycobacteria were clinical isolates from the 309th Hospital of the Chinese People's Liberation Army, for which the 16S rRNA genes had been sequenced and compared and the sequences submitted to NCBI. The strains tested are shown in Table 1:
TABLE 1 related strains selected
(3) The DNA fragment was amplified by polymerase chain reaction (PCR) using the above F/R primers.
PCR system (25 μL): ddH2O (9.5 μL), 2× Taq PCR MasterMix (TIANGEN, 12.5 μL), primer F (10 μM, 1 μL), primer R (10 μM, 1 μL), DNA template (1 μL);
Amplification program: pre-denaturation at 94 °C for 3 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 58 °C for 30 s, and extension at 72 °C for 1 min; final extension at 72 °C for 5 min.
(4) The amplified product was detected by agarose gel electrophoresis in 1× TBE running buffer. As shown in FIG. 6, an amplification band appeared at 162 bp in the MTBC strains and the positive control group. The amplification result was consistent with expectation, and the specificity was 98.3%.
(5) To further verify the amplified DNA, we sequenced the amplification product and compared it with the original sequence. As shown in FIG. 7, it matched the expected sequence perfectly, without errors, which further confirms the existence of the new gene with a missing annotation.
This indicates that the method for identifying the MTBC based on the Rv3108c (- |3476972-3477175|) gene is reliable.
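The expected product size and the sequence comparison in steps (4) and (5) can be sketched in code. The example below is an illustration under stated assumptions, not the procedure actually used: `in_silico_pcr` and `percent_identity` are hypothetical helper names, primer binding is modelled as an exact string match, and "homology" is simplified to ungapped percent identity between equal-length sequences. The 99% threshold is the one recited in claim 5.

```python
# SEQ ID NO.1, copied from the sequence listing (whitespace removed below).
SEQ1 = "".join("""
atgccaggca aggaaatcga tcgagtccgg gcgaccagtg ccctcgcagt gattaggcag
cacccggtaa tggtgttctt cgcgctgtcg ccggtactcg ccgcattggg tgtcatgtgg
tggctagccg gtgctggatg ggctatcgtc gcggccctgg tgctggtggt cgtcggcgga
gccatgatcg tcctcaaacg ctga
""".split()).upper()

F_PRIMER = "GACCAGTGCCCTCGCAGT"  # SEQ ID NO.4
R_PRIMER = "AGGACGATCATGGCTCCG"  # SEQ ID NO.5


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, fwd, rev):
    """Return the region bounded by exact primer matches, or None if a primer is absent."""
    start = template.find(fwd)
    if start < 0:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer site.
    rev_site = template.find(revcomp(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return template[start:rev_site + len(rev)]


def percent_identity(read, reference):
    """Ungapped identity (%) between two sequences of equal length (a simplification)."""
    if len(read) != len(reference):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(read, reference))
    return 100.0 * matches / len(reference)


amplicon = in_silico_pcr(SEQ1, F_PRIMER, R_PRIMER)
print(len(amplicon))  # 162

# A hypothetical sequencing read with a single substitution still exceeds 99%.
one_mismatch = amplicon[:80] + ("A" if amplicon[80] != "A" else "C") + amplicon[81:]
print(percent_identity(one_mismatch, amplicon) > 99)  # True (161/162 matches)
```

With a 162 bp product, a read passes the 99% threshold only if it differs from the reference at no more than one position, since two mismatches give 160/162, or about 98.8%.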
SEQUENCE LISTING
<110> Beijing Proteome Research Center
<120> Mycobacterium tuberculosis H37Rv encoding gene and application thereof
<130> BJ1936-17P121794
<160> 6
<170> PatentIn version 3.3
<210> 1
<211> 204
<212> DNA
<213> Artificial
<220>
<223> Mycobacterium tuberculosis H37Rv encoding gene Rv3108c (- |3476972-3477175|)
<400> 1
atgccaggca aggaaatcga tcgagtccgg gcgaccagtg ccctcgcagt gattaggcag 60
cacccggtaa tggtgttctt cgcgctgtcg ccggtactcg ccgcattggg tgtcatgtgg 120
tggctagccg gtgctggatg ggctatcgtc gcggccctgg tgctggtggt cgtcggcgga 180
gccatgatcg tcctcaaacg ctga 204
<210> 2
<211> 228
<212> DNA
<213> Artificial
<220>
<223> open reading frame DNA sequence comprising peptide fragment with missing annotation
<400> 2
tagtcagctg gcatcctgaa gggcatgcca ggcaaggaaa tcgatcgagt ccgggcgacc 60
agtgccctcg cagtgattag gcagcacccg gtaatggtgt tcttcgcgct gtcgccggta 120
ctcgccgcat tgggtgtcat gtggtggcta gccggtgctg gatgggctat cgtcgcggcc 180
ctggtgctgg tggtcgtcgg cggagccatg atcgtcctca aacgctga 228
<210> 3
<211> 67
<212> PRT
<213> Artificial
<220>
<223> theoretical coding product amino acid sequence of Rv3108c (- |3476972-3477175|) gene
<400> 3
Met Pro Gly Lys Glu Ile Asp Arg Val Arg Ala Thr Ser Ala Leu Ala
1 5 10 15
Val Ile Arg Gln His Pro Val Met Val Phe Phe Ala Leu Ser Pro Val
20 25 30
Leu Ala Ala Leu Gly Val Met Trp Trp Leu Ala Gly Ala Gly Trp Ala
35 40 45
Ile Val Ala Ala Leu Val Leu Val Val Val Gly Gly Ala Met Ile Val
50 55 60
Leu Lys Arg
65
<210> 4
<211> 18
<212> DNA
<213> Artificial
<220>
<223> F primer sequences
<400> 4
gaccagtgcc ctcgcagt 18
<210> 5
<211> 18
<212> DNA
<213> Artificial
<220>
<223> R primer sequences
<400> 5
aggacgatca tggctccg 18
<210> 6
<211> 9
<212> PRT
<213> Artificial
<220>
<223> peptide fragment with missing annotation
<400> 6
Ala Thr Ser Ala Leu Ala Val Ile Arg
1 5
Claims (5)
1. An identification method for distinguishing the Mycobacterium tuberculosis complex strain from other strains of the Mycobacterium genus, which is not used for the diagnosis and treatment of diseases, characterized in that whether the Mycobacterium tuberculosis complex exists in a sample to be detected is determined by detecting whether the gene Rv3108c (- |3476972-3477175|) encoded by the Mycobacterium tuberculosis H37Rv exists in the sample to be detected, and the nucleotide sequence of the gene Rv3108c (- |3476972-3477175|) encoded by the H37Rv is shown as SEQ ID NO. 1.
2. The method as claimed in claim 1, wherein the gene Rv3108c (- |3476972-3477175|) encoded by H37Rv encodes the amino acid sequence shown in SEQ ID NO.3.
3. The method of claim 1, comprising the steps of:
(1) separating and extracting genome DNA from a sample to be detected;
(2) adding an amplification primer by taking the DNA obtained in the step (1) as a template to perform polymerase chain reaction;
(3) carrying out gel electrophoresis analysis and sequencing on the DNA product obtained by amplification in the step (2);
(4) comparing the result of step (3) with the gene Rv3108c (- |3476972-3477175|) encoded by H37Rv according to claim 1, and determining from the homology whether the Mycobacterium tuberculosis complex is present in the sample to be detected.
4. The method of claim 3, wherein the amplification primer sequence of step (2) is:
F: 5’- GACCAGTGCCCTCGCAGT -3’;
R: 5’- AGGACGATCATGGCTCCG -3’.
5. The method according to claim 3, wherein in step (4), if the homology is more than 99%, it is determined that the Mycobacterium tuberculosis complex is present in the sample to be detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711251274.8A CN108165564B (en) | 2017-12-01 | 2017-12-01 | Mycobacterium tuberculosis H37Rv encoding gene and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108165564A CN108165564A (en) | 2018-06-15 |
CN108165564B CN108165564B (en) | 2021-06-08 |
Family
ID=62525068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711251274.8A Active CN108165564B (en) | 2017-12-01 | 2017-12-01 | Mycobacterium tuberculosis H37Rv encoding gene and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108165564B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110343706A (en) * | 2019-01-25 | 2019-10-18 | 北京蛋白质组研究中心 | Mycobacterium tuberculosis H37Rv encoding gene and its application |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012031752A2 (en) * | 2010-09-07 | 2012-03-15 | Institut Pasteur Korea | Tuberculosis mutants |
CN104212890A (en) * | 2010-02-24 | 2014-12-17 | 布罗德研究所有限公司 | Methods of diagnosing infectious disease pathogens and their drug sensitivity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006029248A2 (en) * | 2004-09-04 | 2006-03-16 | Haper Laboratories Llc | Hollow fiber technique for in vivo study of cell populations |
Non-Patent Citations (2)
Title |
---|
"membrane protein [Mycobacterium tuberculosis TRS4],ACCESSION:AQO70035.1";Eilertson,B. et al.;《GenBank》;20170213;第1-4页 * |
"Mycobacterium tuberculosis H37Rv complete genome,ACCESSION:AL123456.3 ";Cole,S.T. et al.;《Genbank》;20150227;第1页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | |
Effective date of registration: 2024-01-31; Address after: 100850 No. 27 Taiping Road, Haidian District, Beijing; Patentee after: ACADEMY OF MILITARY MEDICAL SCIENCES; Country or region after: China; Address before: Building 1, No.33, kekeyuan Road, Changping District, Beijing; Patentee before: BEIJING PROTEOME RESEARCH CENTER; Country or region before: China |