CN113943787A - High-throughput detection method and system for antibiotic resistance genes in environmental sample - Google Patents

Publication number
CN113943787A
Authority
CN
China
Prior art keywords
antibiotic resistance
resistance gene
gene
sequence
resistance genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111282530.6A
Other languages
Chinese (zh)
Inventor
韩毛振
张雁
汪栋
罗学才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Medical University
Original Assignee
Anhui Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Medical University
Priority to CN202111282530.6A
Publication of CN113943787A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Wood Science & Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The embodiment of the invention discloses a high-throughput detection method and system for antibiotic resistance genes in an environmental sample. Metagenomic DNA is extracted from the environmental sample, and a library is constructed and sequenced to obtain metagenome data; quality control is performed on the metagenome data, which are then spliced to obtain a spliced sequence; gene prediction and protein translation are performed on the spliced sequence to obtain genes with complete sequences; antibiotic resistance genes are predicted and screened among the genes with complete sequences to obtain first antibiotic resistance genes; the sources of the first antibiotic resistance genes are predicted, the sequences of second antibiotic resistance genes whose sources are not detected are extracted, and the protein sequences of the second antibiotic resistance genes are extracted and their sources detected; and the classification status of the source information is analyzed to obtain the composition of the antibiotic resistance genes in the environmental sample. The method enables comprehensive detection and analysis of the antibiotic resistance genes in an environmental sample, so that the environmental sample can be effectively assessed.

Description

High-throughput detection method and system for antibiotic resistance genes in environmental sample
Technical Field
The embodiment of the invention relates to the technical field of microbiology and bioinformatics, in particular to a high-throughput detection method and system for antibiotic resistance genes in an environmental sample.
Background
Antibiotics are a powerful class of drugs that are widely used not only to treat human and animal diseases caused by pathogenic microorganisms, but also as growth promoters in agricultural farming and fishery production. In recent years, as antibiotic use has grown, the problem of antibiotic abuse has received increasing attention. According to statistics, more than one third of the patients treated with antibiotics in China do not need them at all, and about 80,000 people die from antibiotic abuse every year. Antibiotics used in human therapy and in animal and plant farming enter the environment, including water, sediment, soil, air, sludge and sewage, through various routes. On the one hand, these antibiotics pollute those environments; on the other hand, they accelerate the emergence of drug resistance genes (i.e., antibiotic resistance genes) in environmental microorganisms. These genes make microorganisms resistant to antibiotics, so that the antibiotics lose their effect and clinical treatment becomes difficult. Because antibiotic resistance genes threaten human health, they have become one of the most serious problems of the 21st century (Chen et al. 2016).
At present, most methods for detecting antibiotic resistance genes in environmental samples are based on traditional molecular biology methods (including PCR and qPCR) or on high-throughput real-time fluorescent quantitative nucleic acid amplification (high-throughput qPCR) (Brunel et al. 2019; Sui et al. 2016; Waseem et al. 2019), and each assay detects only a few to a dozen targeted antibiotic resistance genes. Because antibiotic resistance genes reside in microbial cells, the methods used to study microbial communities (i.e., microbiomics) can also be applied to the antibiotic resistance genes in the microorganisms of environmental samples. Microbiomics takes an ecological population of microorganisms (i.e., a microbial community) as its object of study. One of its most important tools is high-throughput sequencing, which yields a large amount of genetic information from a microbial community; that information can then be screened for genes of interest, including antibiotic resistance genes. However, no method based on high-throughput sequencing has yet been established for detecting antibiotic resistance genes in environmental samples.
Establishing a method for high-throughput detection of antibiotic resistance genes in environmental samples is a precondition for detecting those genes and for understanding their variety and distribution comprehensively. Source analysis of antibiotic resistance genes is the basis for studying how they spread, and studying the correlations among antibiotic resistance genes, microbial community structure and the physical and chemical factors of the environmental sample can provide a basis for managing antibiotic resistance genes in the environment. A method for detecting antibiotic resistance genes in environmental samples is therefore urgently needed.
The prior art suffers from slow detection, low throughput and an inability to identify the antibiotic resistance genes in a sample comprehensively and effectively. How to provide a high-throughput detection method and system for antibiotic resistance genes in environmental samples that solves these technical problems is therefore an urgent question.
Disclosure of Invention
The embodiment of the invention aims to provide a high-throughput detection method and system for antibiotic resistance genes in an environmental sample, which offer fast detection and high throughput and can obtain more comprehensive antibiotic resistance gene data from the sample.
In a first aspect of the present invention, the present embodiments provide a method for high-throughput detection of antibiotic resistance genes in environmental samples, the method comprising:
extracting metagenomic DNA from the environmental sample, constructing a library and sequencing it to obtain metagenome data;
performing quality control on the metagenome data to obtain high-quality metagenome data;
splicing the high-quality metagenome data to obtain a spliced sequence;
performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence;
predicting and screening antibiotic resistance genes of the genes with complete sequences to obtain first antibiotic resistance genes;
predicting the source of the first antibiotic resistance gene to obtain the source information of the first antibiotic resistance gene, and extracting a sequence of a second antibiotic resistance gene of which the source is not detected;
extracting and detecting the source of the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene;
and analyzing the classification status of the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene to obtain the composition of the antibiotic resistance genes in the environmental sample.
Further, the environmental sample includes at least one of a body of water, animals and plants, soil, sediment, and air.
Further, the quality control of the metagenome data to obtain high-quality metagenome data includes:
removing low-quality metagenome data by adopting PRINSEQ software with preset quality control conditions to obtain high-quality metagenome data, wherein the quality control conditions are as follows: -ns_max_p 10, -no_qual_header, -min_len 25, and -min_qual_mean 20.
Further, the splicing the high-quality metagenome data to obtain a spliced sequence comprises:
splicing the high-quality metagenome data by adopting the metagenome data assembly software MEGAHIT to obtain spliced sequences (contigs).
Further, the gene prediction and protein translation of the spliced sequence are carried out to obtain a gene with a complete sequence, and the method comprises the following steps:
carrying out gene prediction and protein translation on the spliced sequence by adopting Prodigal software to obtain a gene with a complete sequence.
Further, the predicting and screening of the antibiotic resistance gene with the complete sequence to obtain a first antibiotic resistance gene comprises:
predicting antibiotic resistance genes among the genes with complete sequences by adopting Abricate software and the CARD database to obtain predicted antibiotic resistance genes;
and comparing the sequence of the predicted antibiotic resistance gene with that of the corresponding antibiotic resistance gene, and screening the predicted antibiotic resistance gene with the similarity of more than or equal to 80% and the coverage rate of more than or equal to 70% to obtain the antibiotic resistance gene in the environmental sample.
Further, the predicting the source of the first antibiotic resistance gene, obtaining the source information of the first antibiotic resistance gene, and extracting a sequence of a second antibiotic resistance gene whose source is not detected includes:
establishing a local plasmid database;
and introducing the local plasmid database into a blastall tool, predicting the source of the first antibiotic resistance gene, obtaining the source information of the first antibiotic resistance gene, and extracting the sequence of a second antibiotic resistance gene of which the source is not detected.
Further, the extracting and detecting the source of the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene includes:
extracting a protein sequence of the second antibiotic resistance gene, and detecting the source of the protein sequence by using a blastp tool and an NR database to obtain source information of the protein sequence of the second antibiotic resistance gene.
In a second aspect of the present invention, the embodiments of the present invention further provide a system for high-throughput detection of antibiotic resistance genes in environmental samples, the system comprising:
the acquisition module is used for acquiring metagenome data of the environment sample;
the quality control module is used for performing quality control on the metagenome data to obtain high-quality metagenome data;
the splicing module is used for splicing the high-quality metagenome data to obtain a spliced sequence;
the first prediction module is used for performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence;
the second prediction module is used for predicting and screening the antibiotic resistance genes of the genes with complete sequences to obtain the sequences of the first antibiotic resistance genes;
the third prediction module is used for extracting and detecting the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene;
and the analysis module is used for analyzing the classification status of the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene to obtain the composition of the antibiotic resistance genes in the environmental sample.
Further, the acquisition module comprises a metagenome DNA extraction module and a sequencing module; the quality control module comprises PRINSEQ software; the splicing module comprises the metagenome data assembly software MEGAHIT; the first prediction module comprises Prodigal software; the second prediction module comprises Abricate software and a CARD database; the third prediction module comprises a local plasmid database and a blastall tool; the analysis module comprises a blastp tool and an NR database.
In a third aspect of the present invention, the embodiments of the present invention further provide a system for high-throughput detection of antibiotic resistance genes in environmental samples, the system comprising:
a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system for microbial authentication to perform the steps of the method.
In a fourth aspect of the present invention, the present invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method.
One or more technical solutions in the embodiments of the present invention have at least the following technical effects or advantages:
the embodiment of the invention provides a high-throughput detection method and system for antibiotic resistance genes in an environmental sample, wherein the method comprises the following steps: extracting metagenomic DNA from the environmental sample, constructing a library and sequencing it to obtain metagenome data; performing quality control on the metagenome data to obtain high-quality metagenome data; splicing the high-quality metagenome data to obtain a spliced sequence; performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence; predicting and screening antibiotic resistance genes of the genes with complete sequences to obtain first antibiotic resistance genes; predicting the source of the first antibiotic resistance gene to obtain the source information of the first antibiotic resistance gene, and extracting a sequence of a second antibiotic resistance gene of which the source is not detected; extracting and detecting the source of the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene; and analyzing the classification status of the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene to obtain the composition of the antibiotic resistance genes in the environmental sample. Compared with the prior art, the invention takes the metagenome data of an environmental sample as its source and, based on the research methods of microbiomics and the analytical approach of bioinformatics, provides a high-throughput detection method for the antibiotic resistance genes of environmental samples. The method has the following advantages:
1. Comprehensiveness: compared with traditional methods for identifying antibiotic resistance genes, the method obtains more comprehensive antibiotic resistance gene data from a sample because it is based on metagenome data and is non-targeted;
2. High cost performance: a single metagenome sequencing run yields a more comprehensive antibiotic resistance gene composition, and unlike traditional methods no primers need to be designed and no PCR amplification is required;
3. High added value: the metagenome sequencing data of the environmental sample can be mined further, for example for virulence factors or for the metabolism of the microbial community;
4. Ease of use: the overall analysis workflow is clear, and the tools and databases are easy to obtain and install locally, easy to understand and use, and require little specialist knowledge.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a method for high throughput detection of antibiotic resistance genes in environmental samples according to an embodiment of the present invention;
FIG. 2 is a block diagram of a system for high-throughput detection of antibiotic resistance genes in environmental samples according to an embodiment of the present invention; 10-an acquisition module; 20-a quality control module; 30-a splicing module; 40-a first prediction module; 50-a second prediction module; 60-a third prediction module; 70-an analysis module;
FIG. 3 is a schematic diagram of a method for high throughput detection of antibiotic resistance genes in environmental samples according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to specific embodiments and examples, and the advantages and various effects of the embodiments of the present invention will be more clearly apparent therefrom. It will be understood by those skilled in the art that the present embodiments and examples are illustrative of the present invention and are not to be construed as limiting the present invention.
Throughout the specification, unless otherwise specifically noted, terms used herein should be understood as having meanings as commonly used in the art. Accordingly, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which embodiments of the invention belong. If there is a conflict, the present specification will control.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Unless otherwise specifically stated, various raw materials, reagents, instruments, equipment and the like used in the examples of the present invention are commercially available or can be prepared by an existing method.
In order to solve the technical problems, the embodiment of the invention provides the following general ideas:
according to an exemplary embodiment of the present invention, a method for high-throughput detection of antibiotic resistance genes in environmental samples is provided, as shown in fig. 1, including:
S1, extracting metagenomic DNA from the environmental sample, constructing a library and sequencing it to obtain metagenome data;
the environmental sample includes at least one of a body of water, animals and plants, soil, sediment, and air.
Step S1 specifically includes extracting metagenomic DNA from the environmental sample, constructing a library and sequencing it to obtain the metagenome data of the sample.
The subject of whole-genome sequencing is a single species, such as a particular bacterial strain, whereas the subject of a metagenome is much broader and can be the entire microbial population in a specific environment.
In a metagenomic study, DNA is extracted directly from the sample (for example, a fermentation product) without first isolating individual microorganisms, and all the genes present are detected. For example, changes in the microbial population during vinegar fermentation can be studied, and the microbial species in the vinegar identified, by metagenome sequencing.
S2, performing quality control on the metagenome data to obtain high-quality metagenome data;
the step S2 specifically includes:
removing low-quality metagenome data by adopting PRINSEQ software with preset quality control conditions to obtain high-quality metagenome data, wherein the quality control conditions are as follows: -ns_max_p 10, -no_qual_header, -min_len 25, and -min_qual_mean 20.
The metagenomic data analysis method used is an effective bioinformatics method for metagenome data. It performs quality control on the metagenome data and also supports assembly of the sample's metagenome data (step S3) and prediction of the genes and proteins in the sample (step S4).
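As a rough illustration of what the quality control conditions in step S2 mean, the following Python sketch applies the same three thresholds to reads held in memory. It is not PRINSEQ itself: the names `passes_qc` and `filter_reads`, the tuple layout of a read, and the Phred+33 quality encoding are assumptions made for this example only.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory read: (read id, bases, Phred+33 quality string).
Read = Tuple[str, str, str]


def passes_qc(seq: str, qual: str, min_len: int = 25,
              min_qual_mean: float = 20.0, ns_max_p: float = 10.0) -> bool:
    """Return True if a read meets the three thresholds quoted in step S2."""
    # Reject empty reads and reads shorter than the minimum length.
    if not seq or len(seq) < min_len:
        return False
    # Reject reads in which more than ns_max_p percent of the bases are N.
    n_percent = 100.0 * seq.upper().count("N") / len(seq)
    if n_percent > ns_max_p:
        return False
    # Reject reads whose mean quality score is below the threshold
    # (assumes Phred+33 encoding, one quality character per base).
    mean_quality = sum(ord(ch) - 33 for ch in qual) / len(qual)
    return mean_quality >= min_qual_mean


def filter_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield only the reads that pass the quality control conditions."""
    for read_id, seq, qual in reads:
        if passes_qc(seq, qual):
            yield read_id, seq, qual
```

The -no_qual_header condition only affects how the quality header line is written in the output file, so it has no counterpart in this in-memory sketch.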
S3, splicing the high-quality metagenome data to obtain a spliced sequence;
the step S3 specifically includes:
splicing the high-quality metagenome data by adopting the metagenome data assembly software MEGAHIT to obtain spliced sequences (contigs).
Sequences generated by high-throughput sequencing platforms are called reads. Splicing software joins reads on the basis of the overlap regions between them, and each sequence obtained by splicing is called a contig.
After de novo genome sequencing, once contigs have been obtained by splicing reads, a 454 paired-end library or an Illumina mate-pair library is often constructed to obtain the sequences at both ends of fragments of a given size (such as 3 kb, 6 kb, 10 kb or 20 kb). These sequences establish the order of some of the contigs, and contigs whose order is known constitute a scaffold.
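The idea that reads are joined through their overlap regions can be shown with a toy Python sketch. This is only an illustration of the overlap concept described above; MEGAHIT itself builds a de Bruijn graph rather than comparing reads pairwise, and the function names and the minimum overlap of 3 bases are assumptions chosen for the example.

```python
from typing import Optional


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge_pair(a: str, b: str, min_len: int = 3) -> Optional[str]:
    """Join two reads into one longer sequence if they overlap sufficiently."""
    k = overlap(a, b, min_len)
    # Append only the part of `b` that extends beyond the shared region.
    return a + b[k:] if k else None


if __name__ == "__main__":
    print(merge_pair("ACGTAC", "TACGGA"))
```

Here the two reads share the three bases TAC, so they merge into a single nine-base sequence; reads without a sufficient overlap are left unmerged.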
S4, performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence;
the step S4 specifically includes:
carrying out gene prediction and protein translation on the spliced sequence by adopting Prodigal software to obtain a gene with a complete sequence.
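One possible way to select genes with a complete sequence from Prodigal output is sketched below in Python. Prodigal writes a `partial` attribute into each FASTA header, and `partial=00` marks a gene with both ends inside the contig; treating that flag as the patent's criterion for "complete" is an assumption, as are the function names and the sample headers.

```python
from typing import Dict, Iterable, List


def parse_prodigal_header(header: str) -> Dict[str, str]:
    """Parse the ';'-separated key=value attributes at the end of a header."""
    attributes = header.rsplit(" # ", 1)[-1]
    return dict(
        item.split("=", 1)
        for item in attributes.strip().split(";")
        if "=" in item
    )


def complete_gene_ids(headers: Iterable[str]) -> List[str]:
    """Return the ids of genes whose header carries partial=00."""
    ids = []
    for header in headers:
        if parse_prodigal_header(header).get("partial") == "00":
            # The gene id is the first whitespace-delimited token after '>'.
            ids.append(header.lstrip(">").split()[0])
    return ids
```

The returned ids can then be used to pull the matching nucleotide and protein records from the files Prodigal produces.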
S5, predicting and screening antibiotic resistance genes of the genes with complete sequences to obtain first antibiotic resistance genes;
the step S5 specifically includes:
predicting antibiotic resistance genes among the genes with complete sequences by adopting Abricate software and the CARD database to obtain predicted antibiotic resistance genes;
and comparing the sequences of the predicted antibiotic resistance genes with the corresponding antibiotic resistance genes, screening the predicted antibiotic resistance genes with the similarity of more than or equal to 80 percent and the coverage rate of more than or equal to 70 percent, and obtaining the types, gene sequences and corresponding products of the antibiotic resistance genes in the environmental sample.
The antibiotic resistance gene detection database used is the CARD database. Among existing antibiotic resistance gene databases it is relatively comprehensive, is updated frequently and is easy to install locally, so it supports comprehensive detection of the antibiotic resistance genes in an environmental sample and therefore more accurate study of them.
The antibiotic resistance gene detection tool used is Abricate, which is free and open source, easy to obtain and install locally, and easy to use.
The analytical method used is a rapid and efficient bioinformatics method for antibiotic resistance genes. It identifies the antibiotic resistance genes in an environmental sample effectively and accurately and also outputs the corresponding gene products.
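The screening thresholds of step S5 (similarity of at least 80% and coverage of at least 70%) could be applied to a tab-separated report along the following lines. This Python sketch assumes the report carries the column names SEQUENCE, GENE, %COVERAGE, %IDENTITY and PRODUCT, as in Abricate's tabular output; the function name `filter_abricate` and the shape of the returned records are hypothetical.

```python
import csv
import io
from typing import Dict, List


def filter_abricate(tsv_text: str, min_identity: float = 80.0,
                    min_coverage: float = 70.0) -> List[Dict[str, str]]:
    """Keep hits with identity >= 80% and coverage >= 70% (step S5)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["%IDENTITY"])
        coverage = float(row["%COVERAGE"])
        if identity >= min_identity and coverage >= min_coverage:
            # Retain the gene type, the sequence it was found on and its product.
            kept.append({
                "sequence": row["SEQUENCE"],
                "gene": row["GENE"],
                "product": row["PRODUCT"],
            })
    return kept
```

Both thresholds are inclusive, matching the "more than or equal to" wording of the screening condition.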
S6, predicting the source of the first antibiotic resistance gene to obtain the source information of the first antibiotic resistance gene, and extracting the sequence of a second antibiotic resistance gene of which the source is not detected;
the step S6 specifically includes:
establishing a local plasmid database;
and introducing the local plasmid database into a blastall tool, predicting the source of the first antibiotic resistance gene, obtaining the source information of the first antibiotic resistance gene, and extracting the sequence of a second antibiotic resistance gene of which the source is not detected.
The tools used in the studies on the origin of antibiotic resistance genes are the blastn and blastp tools in the BLAST suite, which are freely available and easily localized and used.
S7, extracting and detecting the source of the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene;
the step S7 specifically includes:
extracting a protein sequence of the second antibiotic resistance gene, and detecting the source of the protein sequence by using a blastp tool and an NR database to obtain source information of the protein sequence of the second antibiotic resistance gene.
This step traces the source of each antibiotic resistance gene and indicates whether it comes from a plasmid or from a bacterium. The databases used are a database built from 13,957 plasmid sequences and the NR database, both of which are easy to download and install locally.
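The division of labour between steps S6 and S7 can be sketched in Python as follows: genes with a hit in the plasmid database receive a plasmid source, and the remainder are passed on as second antibiotic resistance genes for the blastp search against NR. The sketch assumes the standard 12-column BLAST tabular layout; the function name `split_by_plasmid_hit` and the E-value cut-off of 1e-5 are assumptions, since the text does not state a cut-off.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_plasmid_hit(
    arg_ids: Iterable[str], blast_tabular: str, max_evalue: float = 1e-5
) -> Tuple[Dict[str, str], List[str]]:
    """Return (gene -> best plasmid hit, genes with no accepted plasmid hit)."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tabular.splitlines():
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comments and malformed lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is too weak under the assumed cut-off
        # Keep the highest-scoring plasmid hit for each query gene.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    arg_list = list(arg_ids)
    with_source = {g: best[g][0] for g in arg_list if g in best}
    without_source = [g for g in arg_list if g not in best]
    return with_source, without_source
```

The second element of the result corresponds to the genes whose protein sequences would be extracted and searched in step S7.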
S8, analyzing the classification status of the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene to obtain the composition of the antibiotic resistance genes in the environmental sample.
With this method for high-throughput detection of antibiotic resistance genes in environmental samples, the antibiotic resistance genes obtained can serve as input for antibiotic-related research, such as antibiotic resistance studies and association analyses among antibiotics, microbial community composition and the environmental sample.
The classification status refers to taxonomic information at the phylum, class, order, family, genus and species levels.
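Summarizing the classification status in step S8 amounts to counting resistance genes per taxon at a chosen rank. A minimal Python sketch is given below; the function name `composition_by_rank`, the dictionary layout of a lineage and the "unclassified" label for missing ranks are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def composition_by_rank(lineages: Dict[str, Dict[str, str]], rank: str) -> Counter:
    """Count antibiotic resistance genes per taxon at the given rank."""
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    # Genes whose source lacks the requested rank are grouped as 'unclassified'.
    return Counter(
        lineage.get(rank, "unclassified") for lineage in lineages.values()
    )
```

Calling the function once per rank gives the composition of the antibiotic resistance genes at each taxonomic level.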
according to an exemplary embodiment of the present invention, there is provided a system for high-throughput detection of antibiotic resistance genes in environmental samples, the system comprising:
an obtaining module 10, configured to obtain metagenome data of an environment sample;
a quality control module 20, configured to perform quality control on the metagenome data to obtain high-quality metagenome data;
a splicing module 30, configured to splice the high-quality metagenome data to obtain a spliced sequence;
the first prediction module 40 is used for performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence;
a second prediction module 50, which predicts and screens antibiotic resistance genes of the genes with complete sequences to obtain first antibiotic resistance genes;
a third prediction module 60, configured to extract and detect a source of the protein sequence of the second antibiotic resistance gene, so as to obtain source information of the protein sequence of the second antibiotic resistance gene;
and an analysis module 70, configured to perform classification status analysis on the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene, so as to obtain a composition of the antibiotic resistance genes in the environmental sample.
As an optional embodiment, the obtaining module comprises a metagenomic DNA extraction module and a sequencing module; the quality control module comprises PRINSEQ software; the splicing module comprises the metagenome data assembly software MEGAHIT; the first prediction module comprises Prodigal software; the second prediction module comprises Abricate software and a CARD database; the third prediction module comprises a local plasmid database and a blastall tool; the analysis module comprises a blastp tool and an NR database.
According to an exemplary embodiment of the present invention, there is provided a system for high-throughput detection of antibiotic resistance genes in environmental samples, the system comprising:
a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system for microbial authentication to perform the steps of the method.
According to an exemplary embodiment of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method.
The method provided by the embodiments of the invention enables high-throughput detection of antibiotic resistance genes in an environmental sample. By combining a local plasmid database with the non-redundant protein database of NCBI, the sources of the antibiotic resistance genes in the sample can be traced, so that the antibiotic resistance genes in the environmental sample can be comprehensively detected and analyzed. The invention takes high-throughput sequencing technology as its means, combines it with bioinformatic methods for analyzing antibiotic resistance genes, and establishes a method for analyzing the sources of antibiotic resistance genes. On this basis, the invention provides a method for high-throughput detection and source tracing ('traceability') of antibiotic resistance genes in environmental samples, which is comprehensive, cost-effective, high-throughput, and of high added value.
The method and system for high-throughput detection of antibiotic resistance genes in environmental samples according to the present application will be described in detail below with reference to examples, comparative examples, and experimental data.
Example one
The sample involved in this example is a water sample (CH01) collected near the mouth of the Nanfei River where it enters Chaohu Lake, and the data involved are the metagenome data of that sample. The process for detecting antibiotic resistance genes in the environmental sample is shown in fig. 1, and specifically comprises the following steps:
1. extracting the metagenome of the environmental sample CH01, and performing library construction and sequencing to obtain the metagenome data of the sample;
2. performing quality control on the sample metagenome data with PRINSEQ (http://prinseq.sourceforge.net/) to remove low-quality sequencing data, under the following quality control conditions: -ns_max_p: 10, -no_qual_header, -min_len: 25, and -min_qual_mean: 20, thereby obtaining high-quality metagenome data;
3. assembling the high-quality metagenome data with the metagenome data assembly software MEGAHIT to obtain contig sequences;
4. using Prodigal to predict the genes and protein sequences on the contig sequences, wherein only genes and proteins with complete sequences are retained;
5. predicting antibiotic resistance genes among the genes of step 4 using the Abricate tool and the CARD database;
6. establishing a local plasmid database: downloading plasmid sequences from the NCBI website, extracting the nucleic acid sequences of complete plasmids (13957 nucleic acid sequences in total), and constructing a local plasmid database using makeblastdb;
7. building a local non-redundant protein database (NR database): downloading the NR database from the NCBI website for local use;
8. predicting the source of the antibiotic resistance genes predicted in step 5 by using blastn in the blastall toolkit and the local plasmid database established in step 6, and extracting the names of the gene sequences whose source is not detected;
9. extracting the protein sequences of the genes whose source is not detected in step 8, and detecting the sources of these protein sequences by using the blastp tool in the blastall toolkit and the NR (non-redundant) database;
10. analyzing the sources of the antibiotic resistance genes determined in steps 8 and 9, mainly the taxonomic status of the source species, namely the phylum, class, order, family, genus and species; and outputting composition information of the antibiotic resistance genes in the environmental sample based on the results of the above steps.
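The tool chain described above can be sketched as a list of command lines. This is only an illustrative sketch, not the inventors' scripts: the file names, the helper names `prinseq_cmd` and `build_pipeline`, and every option other than the four PRINSEQ quality control conditions quoted in the text (for example `-p meta` for Prodigal, `-m 8` for tabular blastall output, and single-end input for MEGAHIT) are assumptions made for the example.

```python
from typing import List


def prinseq_cmd(fastq: str, out_prefix: str) -> List[str]:
    """PRINSEQ quality control using the four conditions quoted in the example."""
    return [
        "prinseq-lite.pl", "-fastq", fastq,
        "-ns_max_p", "10",       # drop reads with more than 10% ambiguous bases
        "-no_qual_header",       # write an empty quality header line
        "-min_len", "25",        # drop reads shorter than 25 bp
        "-min_qual_mean", "20",  # drop reads with mean quality below 20
        "-out_good", out_prefix, "-out_bad", "null",
    ]


def build_pipeline(sample: str, plasmid_fasta: str = "plasmids.fna") -> List[List[str]]:
    """Return the commands in the order the example describes (hypothetical paths)."""
    clean = f"{sample}.clean"                      # PRINSEQ appends .fastq to this prefix
    asm_dir = f"{sample}_megahit"
    contigs = f"{asm_dir}/final.contigs.fa"        # MEGAHIT's contig output file
    genes, proteins = f"{sample}.genes.fna", f"{sample}.proteins.faa"
    return [
        prinseq_cmd(f"{sample}.fastq", clean),
        ["megahit", "-r", f"{clean}.fastq", "-o", asm_dir],
        ["prodigal", "-i", contigs, "-p", "meta", "-d", genes, "-a", proteins,
         "-o", f"{sample}.gff", "-f", "gff"],
        # abricate writes its report to standard output; redirect it when running
        ["abricate", "--db", "card", genes],
        ["makeblastdb", "-in", plasmid_fasta, "-dbtype", "nucl", "-out", "plasmid_db"],
        # assumed file holding the predicted resistance genes, searched against plasmids
        ["blastall", "-p", "blastn", "-d", "plasmid_db", "-i", f"{sample}.arg.fna",
         "-o", f"{sample}.plasmid.tsv", "-m", "8"],
        # assumed file holding proteins of genes with no plasmid hit, searched against NR
        ["blastall", "-p", "blastp", "-d", "nr", "-i", f"{sample}.untraced.faa",
         "-o", f"{sample}.nr.tsv", "-m", "8"],
    ]
```

Each entry is an argument list that could be handed to `subprocess.run(cmd, check=True)`; keeping the commands as data makes it easy to log or review them before anything is executed.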
TABLE 1 Classification and number statistics of antibiotic resistance genes in CH01 samples
Figure BDA0003331660720000101
Figure BDA0003331660720000111
TABLE 2 - Statistics of the number of genes conferring resistance to each class of antibiotics in the CH01 sample
Classes of antibiotics   CH01   Classes of antibiotics   CH01
rifamycin                103    sulfonamide              27
macrolide                52     monobactam               26
fluoroquinolone          48     peptide                  25
cephalosporin            48     aminoglycoside           17
tetracycline             42     lincosamide              7
phenicol                 42     acridine_dye             6
penam                    37     RND                      4
diaminopyrimidine        35     mupirocin                3
carbapenem               33     streptogramin            1
penem                    31     oxazolidinone            1
cephamycin               29     pleuromutilin            1
aminocoumarin            29     triclosan                1
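A per-class tally of this kind could be produced as sketched below. The sketch assumes, purely for illustration, that each predicted gene carries a semicolon-separated list of antibiotic classes and that a gene is counted once in every class it is annotated with; the source does not state how the counts in the table were derived, and the function name `tally_drug_classes` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_drug_classes(resistance_fields: Iterable[str]) -> Counter:
    """Count predicted genes per antibiotic class.

    Each input string is the class annotation of one gene, with classes
    separated by ';'. A gene contributes one count to each distinct class.
    """
    counts: Counter = Counter()
    for field in resistance_fields:
        # normalise case and whitespace, and ignore empty entries
        classes = {c.strip().lower() for c in field.split(";") if c.strip()}
        counts.update(classes)
    return counts


# Made-up annotations for three genes (not data from the patent).
example = ["macrolide;tetracycline", "rifamycin", "Macrolide"]
counts = tally_drug_classes(example)
```

Because a multi-class gene is counted in several rows under this assumption, the column total can exceed the number of detected genes.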
TABLE 3-analysis of the origin of some of the antibiotic resistance genes in CH01 samples
Figure BDA0003331660720000121
From the data processing results of Example one above, it can be seen that high-throughput detection was achieved based on the metagenome data of the water sample (CH01) collected near the mouth of the Nanfei River where it enters Chaohu Lake. After detection with this method, corresponding files are provided that describe the distribution of antibiotic resistance genes in the CH01 sample, the classes of antibiotics to which they confer resistance, and their sources.
Example two
The sample referred to in this example is a soil sample (Soil01) taken from a collected database, and the process for detecting the antibiotic resistance genes in the environmental sample is shown in fig. 1, and is specifically as follows:
1. extracting the metagenome of the environmental sample Soil01, and performing library construction and sequencing to obtain the metagenome data of the sample;
2. performing quality control on the sample metagenome data with PRINSEQ (http://prinseq.sourceforge.net/) to remove low-quality sequencing data, under the following quality control conditions: -ns_max_p: 10, -no_qual_header, -min_len: 25, and -min_qual_mean: 20, thereby obtaining high-quality metagenome data;
3. assembling the high-quality metagenome data with the metagenome data assembly software MEGAHIT to obtain contig sequences;
4. using Prodigal to predict the genes and protein sequences on the contig sequences, wherein only genes and proteins with complete sequences are retained;
5. predicting antibiotic resistance genes among the genes of step 4 using the Abricate tool and the CARD database;
6. establishing a local plasmid database: downloading plasmid sequences from the NCBI website, extracting the nucleic acid sequences of complete plasmids (13957 nucleic acid sequences in total), and constructing a local plasmid database using makeblastdb;
7. building a local non-redundant protein database (NR database): downloading the NR database from the NCBI website for local use;
8. predicting the source of the antibiotic resistance genes predicted in step 5 by using blastn in the blastall toolkit and the local plasmid database established in step 6, and extracting the names of the gene sequences whose source is not detected;
9. extracting the protein sequences of the genes whose source is not detected in step 8, and detecting the sources of these protein sequences by using the blastp tool in the blastall toolkit and the NR (non-redundant) database;
10. analyzing the sources of the antibiotic resistance genes determined in steps 8 and 9, mainly the taxonomic status of the source species, namely the phylum, class, order, family, genus and species; and outputting composition information of the antibiotic resistance genes in the environmental sample based on the results of the above steps.
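The two-stage source tracing in the steps above (a blastn search against the plasmid database first, then a blastp search against NR only for genes left without a plasmid hit) can be sketched as follows. The sketch assumes the BLAST results were written in the 12-column tabular format and that the best hit is chosen by bit score; neither choice is stated in the source, and the function names `best_hits` and `split_by_source` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its highest-bit-score subject.

    Expects tab-separated BLAST tabular rows: query, subject, identity,
    length, mismatches, gap opens, q.start, q.end, s.start, s.end,
    e-value, bit score.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def split_by_source(
    arg_ids: Iterable[str], plasmid_hits: Dict[str, Tuple[str, float]]
) -> Tuple[List[str], List[str]]:
    """Split resistance genes into plasmid-traced and untraced IDs."""
    traced: List[str] = []
    untraced: List[str] = []
    for gene_id in arg_ids:
        (traced if gene_id in plasmid_hits else untraced).append(gene_id)
    return traced, untraced
```

The `untraced` list corresponds to the gene names extracted after the plasmid search; their protein sequences are what the subsequent blastp search against NR would take as input.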
TABLE 4 Classification and number statistics of antibiotic resistance genes in the Soil01 samples
Figure BDA0003331660720000131
Figure BDA0003331660720000141
TABLE 5 - Statistics of the number of genes conferring resistance to each class of antibiotics in the Soil01 sample
Classes of antibiotics Soil01
rifamycin 82
macrolide 42
fluoroquinolone 38
cephalosporin 39
tetracycline 40
phenicol 36
carbapenem 39
diaminopyrimidine 32
penam 24
penem 30
cephamycin 18
aminocoumarin 19
sulfonamide 20
monobactam 19
peptide 19
lincosamide 17
aminoglycoside 5
mupirocin 4
acridine_dye 1
RND 3
streptogramin 4
oxazolidinone 4
pleuromutilin 4
Example three
The embodiment of the invention provides a high-throughput detection system for antibiotic resistance genes in an environmental sample, as shown in fig. 2, the system comprises:
an obtaining module 10, configured to obtain metagenome data of an environment sample; the obtaining module comprises a metagenome DNA extraction module and a sequencing module;
a quality control module 20, configured to perform quality control on the metagenome data to obtain high-quality metagenome data; the quality control module comprises PRINSEQ software;
a splicing module 30, configured to splice the high-quality metagenome data to obtain a spliced sequence; the splicing module comprises the metagenome data assembly software MEGAHIT;
a first prediction module 40, configured to perform gene prediction and protein translation on the spliced sequence to obtain genes with complete sequences; the first prediction module comprises Prodigal software;
a second prediction module 50, configured to predict and screen antibiotic resistance genes among the genes with complete sequences to obtain first antibiotic resistance genes; the second prediction module comprises Abricate software and a CARD database;
a third prediction module 60, configured to extract and detect a source of the protein sequence of the second antibiotic resistance gene, so as to obtain source information of the protein sequence of the second antibiotic resistance gene; the third prediction module comprises a local plasmid database and a blastall tool;
and an analysis module 70, configured to perform classification status analysis on the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene, so as to obtain a composition and a source of the antibiotic resistance genes in the environmental sample; the analysis module comprises a blastp tool and an NR database.
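The first prediction module keeps only genes with complete sequences. One way this could be done, shown here as a hedged sketch rather than the inventors' implementation, relies on the `partial=` field that Prodigal writes into its FASTA headers, where `partial=00` marks a gene that is complete at both ends. The function names `read_fasta` and `complete_genes` are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Prodigal headers carry a field such as "partial=00", "partial=10" or "partial=01".
_PARTIAL = re.compile(r"partial=(\d{2})")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def complete_genes(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (gene_id, sequence) for genes flagged complete at both ends."""
    for header, seq in read_fasta(lines):
        match = _PARTIAL.search(header)
        if match and match.group(1) == "00":
            yield header.split()[0], seq
```

The same filter applies to both the nucleotide and the protein FASTA files that Prodigal produces, since both use the same header layout.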
Example four
The fourth embodiment of the invention provides a high-throughput detection system for antibiotic resistance genes in an environmental sample, which comprises:
a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system for microbial authentication to perform the steps of the method of either Example one or Example two.
Example five
The fifth embodiment discloses a computer-readable storage medium on which a computer program is stored, which computer program, when executed by a processor, carries out the steps of the method of any of the first to fourth embodiments.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for enabling an electronic device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the foregoing embodiment, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Finally, it should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the embodiments of the present invention and their equivalents, the embodiments of the present invention are also intended to encompass such modifications and variations.

Claims (10)

1. A method for high throughput detection of antibiotic resistance genes in environmental samples, the method comprising:
extracting metagenomic DNA from the environmental sample, constructing a library, and sequencing to obtain metagenome data;
performing quality control on the metagenome data to obtain high-quality metagenome data;
splicing the high-quality metagenome data to obtain a spliced sequence;
performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence;
predicting and screening antibiotic resistance genes of the genes with complete sequences to obtain first antibiotic resistance genes;
predicting the source of the first antibiotic resistance gene to obtain the source information of the first antibiotic resistance gene, and extracting a second antibiotic resistance gene of which the source is not detected;
extracting and detecting the source of the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene;
and analyzing the classification status of the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene to obtain the composition of the antibiotic resistance genes in the environmental sample.
2. The method for high-throughput detection of antibiotic resistance genes in environmental samples according to claim 1, wherein the quality control of the metagenomic data to obtain high-quality metagenomic data comprises:
removing low-quality metagenome data using PRINSEQ software under preset quality control conditions to obtain high-quality metagenome data, wherein the quality control conditions are: -ns_max_p: 10, -no_qual_header, -min_len: 25, and -min_qual_mean: 20.
3. The method for high-throughput detection of antibiotic resistance genes in environmental samples according to claim 1, wherein the splicing of the high-quality metagenome data to obtain a spliced sequence comprises:
splicing the high-quality metagenome data using the metagenome data assembly software MEGAHIT to obtain spliced sequence contigs.
4. The method for high-throughput detection of antibiotic resistance genes in environmental samples according to claim 1, wherein the gene prediction and protein translation of the spliced sequence to obtain the gene with the complete sequence comprises:
and (3) carrying out gene prediction and protein translation on the spliced sequence by adopting Prodigal software to obtain a gene with a complete sequence.
5. The method for high-throughput detection of antibiotic resistance genes in environmental samples according to claim 1, wherein the predicting and screening of antibiotic resistance genes for the genes having complete sequences to obtain a first antibiotic resistance gene comprises:
predicting antibiotic resistance genes among the genes with complete sequences using Abricate software and a CARD database to obtain predicted antibiotic resistance genes;
and comparing the sequence of the predicted antibiotic resistance gene with that of the corresponding antibiotic resistance gene, and screening the predicted antibiotic resistance gene with the similarity of more than or equal to 80% and the coverage rate of more than or equal to 70% to obtain the antibiotic resistance gene in the environmental sample.
6. The method of claim 1, wherein the predicting the source of the first antibiotic resistance gene, obtaining the information of the source of the first antibiotic resistance gene, and extracting the sequence of the second antibiotic resistance gene whose source is not detected comprises:
establishing a local plasmid database;
and introducing the local plasmid database into a blastall tool, predicting the source of the first antibiotic resistance gene, obtaining the source information of the first antibiotic resistance gene, and extracting the sequence of a second antibiotic resistance gene of which the source is not detected.
7. The method for high-throughput detection of antibiotic resistance genes in environmental samples according to claim 1, wherein the extracting and detecting the source of the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene comprises:
extracting a protein sequence of the second antibiotic resistance gene, and detecting the source of the protein sequence by using a blastp tool and an NR database to obtain source information of the protein sequence of the second antibiotic resistance gene.
8. A system for high throughput detection of antibiotic resistance genes in environmental samples, said system comprising:
the acquisition module is used for acquiring metagenome data of the environment sample;
the quality control module is used for performing quality control on the metagenome data to obtain high-quality metagenome data;
the splicing module is used for splicing the high-quality metagenome data to obtain a spliced sequence;
the first prediction module is used for performing gene prediction and protein translation on the spliced sequence to obtain a gene with a complete sequence;
the second prediction module is used for predicting the antibiotic resistance genes of the genes with complete sequences to obtain first antibiotic resistance genes in the environmental sample and extracting sequences of second antibiotic resistance genes of which the sources are not detected in the prediction of the antibiotic resistance genes;
the third prediction module is used for extracting and detecting the protein sequence of the second antibiotic resistance gene to obtain the source information of the protein sequence of the second antibiotic resistance gene;
and the analysis module is used for analyzing the classification status of the source information of the first antibiotic resistance gene and the source information of the protein sequence of the second antibiotic resistance gene to obtain the composition of the antibiotic resistance genes in the environmental sample.
9. The system for high-throughput detection of antibiotic resistance genes in environmental samples according to claim 8, wherein the obtaining module comprises a metagenomic DNA extraction module and a sequencing module; the quality control module comprises PRINSEQ software; the splicing module comprises the metagenome data assembly software MEGAHIT; the first prediction module comprises Prodigal software; the second prediction module comprises Abricate software and a CARD database; the third prediction module comprises a local plasmid database and a blastall tool; the analysis module comprises a blastp tool and an NR database.
10. A system for high throughput detection of antibiotic resistance genes in environmental samples, said system comprising:
a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system for microbial authentication to perform the steps of the method of claims 1-7.
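The screening criterion recited in claim 5 (similarity of at least 80% and coverage of at least 70%) can be illustrated with a short filter. This is a sketch under stated assumptions: the `Hit` record and the function name `screen_hits` are hypothetical, and identity and coverage are taken to be percentages already computed by the prediction tool. ABRicate also offers `--minid` and `--mincov` options that apply comparable cut-offs at prediction time; whether they were used here is not stated in the source.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One predicted antibiotic resistance gene (hypothetical record layout)."""
    gene: str
    identity: float   # percent sequence identity to the reference gene
    coverage: float   # percent of the reference gene covered by the alignment


def screen_hits(
    hits: Iterable[Hit], min_identity: float = 80.0, min_coverage: float = 70.0
) -> List[Hit]:
    """Keep hits meeting both thresholds; the boundaries are inclusive."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.coverage >= min_coverage
    ]


# Made-up hits for illustration only (not data from the patent).
candidates = [
    Hit("geneA", 95.2, 100.0),
    Hit("geneB", 80.0, 70.0),   # exactly on both thresholds, so it is kept
    Hit("geneC", 79.9, 99.0),   # fails the identity threshold
    Hit("geneD", 99.0, 65.0),   # fails the coverage threshold
]
kept = screen_hits(candidates)
```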
CN202111282530.6A 2021-11-01 2021-11-01 High-throughput detection method and system for antibiotic resistance genes in environmental sample Pending CN113943787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111282530.6A CN113943787A (en) 2021-11-01 2021-11-01 High-throughput detection method and system for antibiotic resistance genes in environmental sample

Publications (1)

Publication Number Publication Date
CN113943787A true CN113943787A (en) 2022-01-18

Family

ID=79337341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111282530.6A Pending CN113943787A (en) 2021-11-01 2021-11-01 High-throughput detection method and system for antibiotic resistance genes in environmental sample

Country Status (1)

Country Link
CN (1) CN113943787A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577919A (en) * 2017-08-21 2018-01-12 上海派森诺生物科技股份有限公司 A kind of grand genomic data analysis method based on high throughput sequencing technologies
CN111944914A (en) * 2020-07-16 2020-11-17 中国科学院生态环境研究中心 Method for evaluating water health risk based on resistance gene and virulence factor gene
CN113337591A (en) * 2021-06-30 2021-09-03 清华大学深圳国际研究生院 Method for quantifying activity of antibiotic resistance gene in environment based on macrotranscriptomics and macrogenomics and identifying host

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117699958A (en) * 2024-02-02 2024-03-15 青岛海湾中水有限公司 Sewage treatment system and sewage treatment method
CN117699958B (en) * 2024-02-02 2024-04-19 青岛海湾中水有限公司 Sewage treatment system and sewage treatment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination