CN117476109A - Microbial data analysis method based on super-multiple targeted sequencing technology - Google Patents


Publication number
CN117476109A
Authority
CN
China
Prior art keywords
microorganism
microorganisms
positive
detected
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311524535.4A
Other languages
Chinese (zh)
Inventor
殷琼慧
郭文浒
陈晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rgi Fuzhou Genetic Medicine Laboratory Co ltd
Original Assignee
Rgi Fuzhou Genetic Medicine Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rgi Fuzhou Genetic Medicine Laboratory Co ltd filed Critical Rgi Fuzhou Genetic Medicine Laboratory Co ltd
Priority to CN202311524535.4A priority Critical patent/CN117476109A/en
Publication of CN117476109A publication Critical patent/CN117476109A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of data analysis, and in particular to a microbial data analysis method based on ultra-multiplex targeted sequencing technology. The microbial data analysis method comprises the following steps: S1: performing ultra-multiplex PCR amplification and high-throughput sequencing on a sample to be tested, obtaining an original detection data list for the sample, and classifying and grading the microorganisms in the original detection data list; S2: sequentially filtering out false positive microorganisms and batch contamination microorganisms; S3: screening positive microorganisms and positive drug resistance genes according to the number of targets and the homogenized sequence count; S4: analyzing the microorganisms corresponding to the positive drug resistance genes. The method can accurately determine the positive microorganisms and drug resistance genes in the sample and analyze the microorganisms corresponding to the positive drug resistance genes, so that the t-NGS ultra-multiplex targeted pathogen detection technology can be applied more effectively to microorganism detection.

Description

Microbial data analysis method based on super-multiple targeted sequencing technology
Technical Field
The invention relates to the technical field of data analysis, in particular to a microbial data analysis method based on a super-multiple targeted sequencing technology.
Background
Microorganisms are classified into bacteria, fungi, viruses, parasites, mycoplasma, chlamydia, and so on, according to morphological characteristics, culture characteristics, physiological and biochemical characteristics, ecological habits, and the like.
Traditional microbiological detection methods include smear microscopy, pathogen culture, serological tests, and molecular biological methods, but each has its own limitations: smear microscopy has low sensitivity and specificity and depends on the experience of the operator; microbial culture has low sensitivity and specificity, is very time-consuming for some microorganisms, and is difficult or impossible in vitro for others; serum-specific antibody detection may give false negatives in early infection, and paired sera from the acute and convalescent phases must be collected for testing and result evaluation; humoral immunodeficiency can lead to false negatives, which can also arise from other factors; antigen detection results have relatively low accuracy; and qPCR, the traditional molecular biological method, selects a single target and has a limited detection range, so it is prone to false negative results.
t-NGS (Targeted enrichment of pathogens for sequencing) ultra-multiplex targeted sequencing is a microorganism detection approach developed on the principles of multiplex PCR and second-generation sequencing. It identifies the types of microorganisms and their drug resistance genes by detecting and analyzing the nucleic acid information and drug resistance gene information of target microorganisms in a sample. t-NGS can target from tens to thousands of microorganisms, and different microorganisms can be freely combined into different panels. It allows DNA and RNA microorganisms to be detected together, requires a low sequencing volume, and has a moderate cost, so it has attracted attention and is widely favored. Chinese patent application No. 202310206240.6 discloses a primer library, a kit, and a detection method for detecting mycobacteria and/or tuberculosis drug resistance genes by t-NGS, which can detect 47 mycobacteria and 20 drug resistance genes simultaneously. However, t-NGS also has some problems in practical applications. For example, different microorganisms differ in pathogenicity and infection rate, and also in the number of targets and in target detection performance; aerosol contamination may occur during experimental operation and is amplified by the ultra-multiplex amplification, so it is difficult to judge whether a detected microorganism is genuinely present in the sample or comes from contamination between samples in the same batch; and different microorganisms may carry the same drug resistance gene, so it is difficult to determine which microorganism a drug resistance gene comes from. These problems make it hard to decide what to keep and what to discard in the t-NGS data analysis step, and lead to false positives and false negatives.
How to overcome these problems and identify, from the detection data, the microorganisms and drug resistance genes that are genuinely present in the sample is the key to applying t-NGS technology effectively to microorganism detection.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a microbial data analysis method that can accurately determine the positive microorganisms and drug resistance genes in a sample.
In order to solve the technical problem, the invention adopts the following technical scheme: the microbial data analysis method based on the ultra-multiplex targeted sequencing technology comprises the following steps:
S1: performing ultra-multiplex PCR amplification and high-throughput sequencing on a sample to be tested, obtaining an original detection data list for the sample, and classifying and grading the microorganisms in the original detection data list;
S2: sequentially filtering out false positive microorganisms and batch contamination microorganisms;
S3: screening positive microorganisms and positive drug resistance genes according to the number of targets and the homogenized sequence count;
S4: analyzing the microorganisms corresponding to the positive drug resistance genes.
The invention has the following beneficial effects: it provides a method for determining positive microorganisms and drug resistance genes in a sample in ultra-multiplex targeted sequencing, which can accurately determine the positive microorganisms and drug resistance genes actually present in the sample and analyze the microorganisms corresponding to the positive drug resistance genes, so that the t-NGS ultra-multiplex targeted pathogen detection technology can be applied more effectively to microorganism detection.
Drawings
FIG. 1 is a partial flow chart of a method for analyzing microbiological data according to the present invention;
FIG. 2 is a flow chart of another part of the method for analyzing microbiological data according to the present invention;
Detailed Description
In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.
The microbial data analysis method based on the ultra-multiple targeted sequencing technology comprises the following steps:
S1: performing ultra-multiplex PCR amplification and high-throughput sequencing on a sample to be tested, obtaining an original detection data list for the sample, and classifying and grading the microorganisms in the original detection data list;
S2: sequentially filtering out false positive microorganisms and batch contamination microorganisms;
S3: screening positive microorganisms and positive drug resistance genes according to the number of targets and the homogenized sequence count;
S4: analyzing the microorganisms corresponding to the positive drug resistance genes.
From the above description, the beneficial effects of the invention are as follows: the invention provides a method for determining positive microorganisms and drug resistance genes in a sample in ultra-multiplex targeted sequencing. The microorganisms in the original list are first classified and graded to make a preliminary distinction; positive microorganisms and positive drug resistance genes are then screened according to the number of targets and the homogenized sequence count, where the homogenized sequence count indirectly reflects how dominant a microorganism is in the sample.
Further, the specific steps of filtering out false positive microorganisms are as follows: performing ultra-multiplex PCR amplification and high-throughput sequencing on a negative quality control sample to obtain an original detection data list for the negative quality control sample; for a given microorganism, taking the target with the highest number of detected sequences, and filtering the microorganism out directly if the homogenized sequence count of that target is lower than the homogenized sequence count detected for the same target in the negative quality control sample.
From the above description, the homogenized sequence count indirectly reflects the proportion (i.e., abundance) of a microorganism in the sample: the higher the count, the more dominant the microorganism is in the sample, indicating that it is likely to belong to the dominant flora of the sample. If the homogenized sequence count of the microorganism's highest target is lower than the count detected for that target in the negative quality control sample, the microorganism accounts for a low proportion of the sample and is a false positive microorganism.
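The negative quality control comparison described above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the nested-dictionary layout, the function name `filter_false_positives`, and the choice to treat a target absent from the negative control as a count of 0 are all assumptions made for the example.

```python
def filter_false_positives(sample, negative_control):
    """Drop microorganisms whose top target falls below the negative control.

    Both arguments are assumed to look like
    {microorganism: {target: homogenized_sequence_count}}.
    """
    kept = {}
    for microbe, targets in sample.items():
        if not targets:
            continue  # nothing detected for this microorganism
        # Target with the highest homogenized sequence count in the sample.
        top_target, top_count = max(targets.items(), key=lambda kv: kv[1])
        # Count for the same target in the negative control (assumed 0 if absent).
        nc_count = negative_control.get(microbe, {}).get(top_target, 0)
        if top_count < nc_count:
            continue  # lower than the negative control: treated as false positive
        kept[microbe] = targets
    return kept


# Hypothetical data, for illustration only.
sample = {"microbe_A": {"t1": 50, "t2": 10}, "microbe_B": {"t1": 5}}
negative_control = {"microbe_A": {"t1": 3}, "microbe_B": {"t1": 8}}
print(sorted(filter_false_positives(sample, negative_control)))
```

Only a count strictly lower than the negative control is filtered here, following the wording "lower than" in the text; how ties should be handled is not stated in the source.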
Further, the specific steps of filtering out batch contamination microorganisms are as follows: a microorganism for which the homogenized sequence count of at least 1 target in a sample is greater than or equal to 10000 is defined as a strong positive detected microorganism;
for a sample adjacent to the sample in which the strong positive was detected, the microorganism is filtered out when the number of sequences detected for the same target is less than 1/100 of the homogenized sequence count of the strong positive detected microorganism;
for a sample not adjacent to the sample in which the strong positive was detected, the microorganism is filtered out when the number of sequences detected for the same target is less than 1/1000 of the homogenized sequence count of the strong positive detected microorganism.
From the above description, the following was observed in approximately 50,000 sets of t-NGS targeted pathogen detection data: in t-NGS pathogen detection, contamination between samples mainly occurs when a strong positive microorganism is detected in the batch, and contamination within a batch mainly occurs between adjacent or nearby samples. Therefore, when batch contamination is assessed in data analysis, a contamination threshold can be defined from the strong positive detection data and the positions of the samples in the experiment, and microorganisms in a sample that may result from contamination can be filtered out.
10000 is selected as the contamination index because a large number of experiments show that contamination between samples of a batch is more likely to occur when the number of detected sequences of a microorganism is greater than or equal to 10000.
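The position-dependent thresholds can be expressed as a small predicate. The sketch below assumes that the comparison is made per target against the strong positive sample's homogenized count, and the function and constant names are hypothetical. Integer arithmetic is used so that the 1/100 and 1/1000 comparisons are exact.

```python
STRONG_POSITIVE_MIN = 10000  # homogenized count that defines a strong positive
ADJACENT_DIVISOR = 100       # threshold is 1/100 for adjacent samples
NON_ADJACENT_DIVISOR = 1000  # threshold is 1/1000 for non-adjacent samples


def is_batch_contamination(count, strong_count, adjacent):
    """Return True if `count` should be filtered as likely carry-over.

    count        -- sequences detected for the target in the sample under review
    strong_count -- homogenized count of the same target in the strong positive sample
    adjacent     -- whether the two samples were adjacent during the experiment
    """
    if strong_count < STRONG_POSITIVE_MIN:
        return False  # no strong positive in the batch, so the rule does not apply
    divisor = ADJACENT_DIVISOR if adjacent else NON_ADJACENT_DIVISOR
    # count < strong_count / divisor, written without floating point
    return count * divisor < strong_count


# Figures taken from the first embodiment (non-adjacent samples):
print(is_batch_contamination(209, 25980, adjacent=False))  # 209 vs 25.98
print(is_batch_contamination(23, 25634, adjacent=False))   # 23 vs 25.634
```

With the embodiment's figures, the first call returns False (the microorganism is kept) and the second returns True (the microorganism is filtered).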
Further, the samples are classified into respiratory tract samples (alveolar lavage fluid samples, sputum samples, pharyngeal swab samples), blood samples, and cerebrospinal fluid samples.
Further, each microorganism is given a label of P1, P2, or P3 during classification;
P1 is an "absolute" key microorganism, P2 is a "relative" key microorganism, and P3 is a background microorganism.
From the above description, the three labels are understood as follows. "Absolute" key microorganisms: key microorganisms that are genuinely present in the sample; any positive detection result for these microorganisms is meaningful, so they are judged positive as soon as they are detected. "Relative" key microorganisms: common pathogenic microorganisms for the corresponding sample type; a positive detection result is meaningful when the host has low, impaired, or deficient immunity, impaired barrier function, or microecological imbalance. These microorganisms can colonize the host at low load and cause infection at high load, so they are judged positive only when their detection data are prominent. Background microorganisms: microorganisms that belong to the microecological flora of the sample type and commonly colonize it; they generally do not cause infection, are not usually judged positive, and are presented only in a background list.
Further, when screening positive microorganisms, positive microorganisms are selected from among those labeled P1 and P2, and background microorganisms are selected from among those labeled P3.
Referring to FIG. 1, the specific steps for screening positive microorganisms and background microorganisms are as follows:
a microorganism labeled P1 is reported as positive and displayed at the positive microorganism position of the report;
for a microorganism labeled P2: if its preset target number is 1, it is reported as positive and displayed at the positive microorganism position of the report when its homogenized sequence count ranks in the top 5 of the original detection data list; if its preset target number is greater than or equal to 2, it is reported as positive and displayed at the positive microorganism position when its detected target number is greater than or equal to 2, or when its detected target number is 1 and its homogenized sequence count ranks in the top 3 of the original detection data list;
for a microorganism labeled P3: if its preset target number is 1, it is displayed at the background microorganism position of the report when its homogenized sequence count ranks in the top 10 of the original detection data list; if its preset target number is greater than or equal to 2, it is displayed at the background microorganism position when its detected target number is greater than or equal to 2, or when its detected target number is 1 and its homogenized sequence count ranks in the top 5 of the original detection data list.
From the above description, a microorganism labeled P2 is a common pathogenic microorganism that can colonize at low load and cause infection at high load. If its number of detected sequences ranks high in the overall detection of the sample, indicating that it is a dominant species in the sample, it is more likely to be in a pathogenic state than in a colonizing state and can be reported as positive.
The more targets of a microorganism are detected, the higher the reliability. If two targets are preset for a microorganism but only one is detected, the reliability is relatively lower; in that case, a more demanding sequence count or ranking is required of that target before the detection is judged reliable. Therefore, for a microorganism with a preset target number of not less than 2, when only a single target is detected, it is reported as positive only if it ranks in the top 3. For a microorganism with a preset target number of 1, the ranking requirement is slightly lower, and it can be reported as positive if it ranks in the top 5.
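The screening rules for P1, P2, and P3 microorganisms can be summarized as a single decision function. This is a sketch under stated assumptions: `classify` and its parameters are hypothetical names, `rank` is taken to be the 1-based position of the microorganism's homogenized sequence count in the original detection data list, and a microorganism with no detected target is assumed not to be reported.

```python
def classify(label, preset_targets, detected_targets, rank):
    """Return 'positive', 'background', or None (not reported)."""
    if detected_targets < 1:
        return None  # assumption: undetected microorganisms are never reported
    if label == "P1":
        return "positive"  # reported as soon as it is detected
    if label == "P2":
        single_top, multi_top, outcome = 5, 3, "positive"
    elif label == "P3":
        single_top, multi_top, outcome = 10, 5, "background"
    else:
        raise ValueError(f"unknown label: {label!r}")
    if preset_targets == 1:
        # One preset target: decided by rank alone.
        return outcome if rank <= single_top else None
    # Two or more preset targets: either two detected targets,
    # or a single detected target with a stricter rank cutoff.
    if detected_targets >= 2 or rank <= multi_top:
        return outcome
    return None


# Cases modelled on the first embodiment:
print(classify("P2", preset_targets=4, detected_targets=2, rank=3))
print(classify("P3", preset_targets=1, detected_targets=1, rank=4))
print(classify("P3", preset_targets=2, detected_targets=1, rank=5))
```

The three calls return "positive", "background", and "background" respectively. The rank passed in the first call is hypothetical, since the embodiment does not state it and the two-target rule makes it irrelevant there.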
Referring to FIG. 2, the specific steps for screening positive drug resistance genes are as follows: a drug resistance gene is reported as positive when its number of detected targets is greater than or equal to 2 and its homogenized sequence count is greater than or equal to 100.
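The drug resistance gene rule is a conjunction of two thresholds and can be sketched as follows. The `GeneCall` record and `positive_genes` function are hypothetical names, and the example counts are invented. The source does not say how a gene-level homogenized count is aggregated across targets, so the sketch takes that figure as an input rather than computing it.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    gene: str
    detected_targets: int
    homogenized_count: int  # gene-level count; aggregation is left to the caller


def positive_genes(calls, min_targets=2, min_count=100):
    """Return the names of genes meeting both thresholds."""
    return [
        c.gene
        for c in calls
        if c.detected_targets >= min_targets and c.homogenized_count >= min_count
    ]


# Hypothetical detections, for illustration only.
calls = [
    GeneCall("mecA", detected_targets=2, homogenized_count=150),
    GeneCall("NDM", detected_targets=1, homogenized_count=5000),
    GeneCall("TEM", detected_targets=3, homogenized_count=99),
]
print(positive_genes(calls))
```

Only the first record satisfies both conditions; the second fails on the number of targets and the third on the sequence count.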
Further, the specific steps of S4 are as follows: first, a list of the microorganisms corresponding to each drug resistance gene and a list of preferentially displayed microorganisms are constructed; if only 1 of the microorganisms corresponding to the drug resistance gene is reported as positive, that microorganism is determined to be the microorganism corresponding to the drug resistance gene;
if several of the microorganisms corresponding to the drug resistance gene are reported as positive and a preferentially displayed microorganism is among them, the preferentially displayed microorganism is the microorganism corresponding to the drug resistance gene; if no preferentially displayed microorganism is among them, the positive microorganisms are sorted in descending order of the homogenized sequence count of their highest target, and the microorganism with the highest count is the microorganism corresponding to the drug resistance gene.
From the above description, a drug resistance gene is a DNA fragment whose nucleotide sequence encodes a drug resistance trait. Drug resistance genes can be located on the bacterial chromosome or on an extrachromosomal plasmid, and plasmid-borne drug resistance genes can be transmitted between strains of the same bacterium, or even between different bacteria, by conjugation, transformation, transduction, and the like. After studying a large amount of literature and clinical data, the inventors found that clinically common bacteria correspond strongly to particular drug resistance genes: for example, methicillin-resistant Staphylococcus aureus (MRSA) often carries the drug resistance gene mecA, vancomycin-resistant enterococci (VRE) often carry the drug resistance genes VanA and VanB, and extended-spectrum beta-lactamase (ESBL)-producing enterobacteria often carry drug resistance genes such as SHV and TEM. Because drug resistance genes can be transmitted through plasmid exchange between microorganisms, the same drug resistance gene may correspond to several microorganisms; for example, the drug resistance gene NDM can originate from different microorganisms such as Escherichia coli, Klebsiella pneumoniae, and Enterobacter cloacae.
In order to identify the microorganism from which a drug resistance gene originates, the inventors consulted a large amount of literature and established a correspondence model between drug resistance genes and microorganisms, together with a model of preferentially displayed microorganisms for each drug resistance gene. The preferentially displayed microorganism is the one most strongly associated with the drug resistance gene, i.e., the microorganism that occurs most frequently among the strains from which the drug resistance gene has been isolated. The source microorganism of a drug resistance gene can then be determined from the preferentially displayed microorganisms and the homogenized sequence counts.
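The attribution logic of S4 can be sketched as a lookup followed by a tie-break. Everything here is illustrative: the function name, the dictionary layout, and in particular the preferred-microorganism entry are hypothetical and are not taken from the patent's tables. Where several preferentially displayed microorganisms are positive at once, the source does not specify a rule, so the sketch falls back on the highest homogenized sequence count as an assumption.

```python
def attribute_gene(gene, positive_microbes, gene_to_microbes, preferred):
    """Pick the microorganism most likely to carry a positive resistance gene.

    positive_microbes -- {microorganism: homogenized count of its highest target}
    gene_to_microbes  -- {gene: [microorganisms known to carry it]}
    preferred         -- {gene: [preferentially displayed microorganisms]}
    """
    candidates = [m for m in gene_to_microbes.get(gene, []) if m in positive_microbes]
    if not candidates:
        return None  # no corresponding microorganism was reported positive
    if len(candidates) == 1:
        return candidates[0]
    preferred_hits = [m for m in candidates if m in preferred.get(gene, [])]
    # Prefer the preferentially displayed microorganism(s); otherwise use all
    # candidates. Ties are broken by highest count (an assumption).
    pool = preferred_hits or candidates
    return max(pool, key=lambda m: positive_microbes[m])


# The NDM mapping follows the examples in the description; the preferred
# entry and the counts are hypothetical.
gene_to_microbes = {
    "NDM": ["Escherichia coli", "Klebsiella pneumoniae", "Enterobacter cloacae"]
}
preferred = {"NDM": ["Klebsiella pneumoniae"]}
positives = {"Escherichia coli": 900, "Klebsiella pneumoniae": 300}
print(attribute_gene("NDM", positives, gene_to_microbes, preferred))
```

In this example the preferentially displayed microorganism is chosen even though another positive microorganism has a higher count, which mirrors the order of the rules in the text.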
Further, the specific steps of obtaining the original detection data list of the sample to be tested are as follows: nucleic acid extraction, library construction, and on-machine sequencing are performed in sequence on the sample to obtain the processed microbial sequence data of the sample; the microbial sequences obtained from the sample are aligned to the target microorganism sequences, sequences with no more than 2 mismatched bases in the alignment are selected, and the sequence counts are homogenized to obtain the homogenized sequence count of each target of each microorganism, giving the original detection data list of the sample.
As can be seen from the above description, sequencing has a certain error rate, so a certain number of base mismatches is allowed during alignment. Mismatches are limited to no more than 2 because, given constraints such as sequencing read length, allowing too many mismatches may let sequences from non-specific amplification be retained and cause false positive results.
Preferably, the permitted number of mismatched bases is set to 2.
From the above description, if only perfect matches or at most 1 mismatch were allowed, a large amount of genuine target information would easily be filtered out, readily causing false negative results.
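The mismatch cutoff can be illustrated with a simple position-by-position comparison. This is a simplified sketch: it assumes an ungapped alignment of a read against a reference segment of equal length, whereas a real pipeline would take the mismatch count from its aligner's output. The function names are hypothetical.

```python
def mismatches(read, reference):
    """Count mismatched bases between two equal-length sequences."""
    if len(read) != len(reference):
        raise ValueError("sketch assumes an ungapped, equal-length alignment")
    return sum(a != b for a, b in zip(read, reference))


def keep_read(read, reference, max_mismatches=2):
    """Keep a read only if it has at most `max_mismatches` mismatched bases."""
    return mismatches(read, reference) <= max_mismatches


reference = "ACGTACGT"
print(keep_read("ACGTACGT", reference))  # 0 mismatches
print(keep_read("ACCTACGA", reference))  # 2 mismatches
print(keep_read("TCCTACGA", reference))  # 3 mismatches
```

The first two reads are kept and the third is discarded, which corresponds to the "no more than 2" threshold described above.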
Further, during the homogenization treatment, the data amount is homogenized to 100K or more.
As can be seen from the above description, the amount of sequencing data differs between samples. In order to establish a unified standard, the sequence counts must be homogenized, and homogenizing to a data amount of 100K or more meets the expected sensitivity.
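The source does not give the homogenization formula. One common reading, shown here purely as an assumption, is a proportional rescaling of each target's raw count to a fixed total of 100K sequences. The function name `homogenize` and the rounding to whole sequences are likewise choices made for this sketch.

```python
def homogenize(raw_counts, total_reads, scale=100_000):
    """Rescale raw per-target counts to a common sequencing depth.

    Assumed formula: homogenized = raw * scale / total_reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        target: round(count * scale / total_reads)
        for target, count in raw_counts.items()
    }


# Hypothetical sample: 200,000 total sequences, two targets.
print(homogenize({"t1": 50, "t2": 4000}, total_reads=200_000))
```

Under this assumption, counts from samples sequenced to different depths become directly comparable, which is what the thresholds in the screening steps rely on.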
The microbial classification lists for respiratory tract samples, blood samples, and cerebrospinal fluid samples are shown in Tables 1, 2, and 3 respectively, and the drug resistance genes are shown in Table 4.
TABLE 1
TABLE 2
TABLE 3
TABLE 4
The first embodiment of the invention is as follows: an alveolar lavage fluid sample in which t-NGS detected microorganisms including the Mycobacterium tuberculosis complex.
Sample type: alveolar lavage fluid (a respiratory tract sample); detection method: t-NGS; sample sequencing amount: 667,297k. The data analysis was performed on the sample using the following steps:
S1 t-NGS detection step: (1) sample processing: the sample submitted for testing is liquefied to obtain a processed sample; (2) nucleic acid extraction: nucleic acid is extracted from the processed sample, and concentration measurement and quality control are performed on the obtained nucleic acid; (3) multiplex PCR amplification and library construction: a sequencing library of the sample is constructed using a customized library construction kit to obtain the sample sequencing library, and concentration measurement and quality inspection are performed on the obtained nucleic acid; (4) on-machine sequencing: the library is sequenced at high throughput to obtain the off-machine data of the processed sample; (5) bioinformatics analysis: the obtained microbial sequences are aligned to the target microorganism sequences, sequences with no more than 2 mismatched bases in the alignment are selected, and the sequence counts are homogenized (to 100K) to obtain the homogenized sequence count of each target of each microorganism, giving the original list of sample detection data. A total of 7 microorganisms were detected in the alveolar lavage fluid, and each was given a P1/P2/P3 label.
S2 negative control sample filtration: for every microorganism, the sequence count of its highest target is greater than that in the negative control sample, so all microorganisms proceed to the next step;
S3 batch strong positive detection data and adjacent-position filtering: Haemophilus influenzae and Prevotella intermedia were detected as strong positives in other samples of the same batch, and this sample was not adjacent to those strong positive samples during the experiment. The homogenized sequence count of Haemophilus influenzae was 209, which is greater than 1/1000 of the strong positive sample's homogenized sequence count of 25980, so it proceeded to the microorganism classification screening step. The homogenized sequence count of the highest target of Prevotella intermedia was 23, which is less than 1/1000 of the strong positive sample's homogenized sequence count of 25634, so it was filtered out. The remaining microorganisms, namely the Mycobacterium tuberculosis complex, human parainfluenza virus type 3, Streptococcus intermedius, Lactobacillus iners, and Haemophilus parainfluenzae, had no strong positive detection in samples of the same batch, and all proceeded to the microorganism classification screening step.
S4 microbial pathogenicity grading and detection data filtering: the Mycobacterium tuberculosis complex and human parainfluenza virus type 3 are labeled P1, with sequence counts greater than 0 and detected target numbers greater than 0, so they are reported as positive and presented at the positive microorganism position of the report; Haemophilus influenzae is labeled P2, with a preset target number of 4 and a detected target number of 2, so it is reported as positive and presented at the positive microorganism position of the report; Streptococcus intermedius is labeled P3, with a preset target number of 1 and a homogenized sequence count ranking 4th overall, so it is reported and displayed at the background microorganism position of the report; Lactobacillus iners is labeled P3, with a preset target number of 2, a detected target number of 1, and a homogenized sequence count ranking 5th in the overall detection of the sample, so it is reported and displayed at the background microorganism position of the report; Haemophilus parainfluenzae is labeled P3, with a preset target number of 3 and a detected target number of 2, so it is reported and displayed at the background microorganism position of the report (the analysis process is shown in Table 5).
TABLE 5
The accuracy of the data analysis in the first embodiment was verified against mNGS detection results (the detection method is the same as that of Chinese invention patent application No. 202310568758.4), and the analysis results are shown in Table 6.
TABLE 6
As can be seen from Table 6, the microorganisms reported by t-NGS in the invention were all detected in the mNGS verification, indicating that analyzing t-NGS data with the method of this application does not cause microorganisms to be reported as false positives. The positive microorganisms found by mNGS were completely consistent with those of the first embodiment, and most of the background microorganisms were also consistent; only a few background microorganisms differed, because t-NGS had no corresponding targets for them. This indicates that data analysis with the method of the invention does not produce false negatives for the targeted microorganisms. The data analysis method therefore has good accuracy.
The second embodiment of the invention is as follows: t-NGS detection of microorganisms and drug resistance genes in an alveolar lavage fluid sample
Sample type: alveolar lavage fluid (a respiratory tract sample); detection method: t-NGS; sample sequencing amount: 916, 365 k. Data analysis was performed on the sample using the following steps:
S1, t-NGS detection step: (1) sample processing: the submitted sample to be detected is liquefied to obtain a processed sample; (2) nucleic acid extraction: nucleic acid is extracted from the processed sample, and concentration measurement and quality control are performed on the obtained nucleic acid; (3) multiplex PCR amplification and library construction: a sequencing library of the sample is constructed with a customized library construction kit to obtain a sample sequencing library, and concentration measurement and quality inspection are performed on the obtained nucleic acid; (4) sequencing on the machine: the library is sequenced at high throughput to obtain the off-machine data of the processed sample; (5) bioinformatics analysis: the obtained microorganism sequences are compared with the target microorganism sequences, sequences with no more than 2 mismatched bases in the alignment are selected, and the sequence numbers are homogenized (homogenized to 100K) to obtain the homogenization sequence number of each target of each microorganism, thereby obtaining the original list of sample detection data. In the alveolar lavage fluid sample, 8 microorganisms and 3 drug resistance genes were detected, and the microorganisms were assigned P1/P2/P3 labels.
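Step (5) does not spell out how sequence numbers are homogenized to 100K. One possible reading, assumed here and not confirmed by the text, is that each target's raw sequence number is rescaled to a fixed total of 100,000 sequenced reads. The function name `homogenize_counts` and the example numbers are hypothetical.

```python
from typing import Dict

HOMOGENIZATION_SCALE = 100_000  # "homogenizing to 100K" in step (5)


def homogenize_counts(raw_counts: Dict[str, int], total_reads: int,
                      scale: int = HOMOGENIZATION_SCALE) -> Dict[str, float]:
    """Rescale raw per-target sequence numbers to a common sequencing depth.

    ASSUMPTION: homogenized = raw * scale / total_reads. The patent text only
    states that counts are homogenized to 100K, not the formula used.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {target: count * scale / total_reads
            for target, count in raw_counts.items()}


# Hypothetical sample: 1,000,000 total reads, two targets.
example = homogenize_counts({"target_a": 500, "target_b": 50}, 1_000_000)
```

Under this assumption, samples sequenced to different depths become comparable, which is what the later ranking and ratio thresholds rely on.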
S2, negative control sample filtering: Staphylococcus aureus, Klebsiella oxytoca and Klebsiella aerogenes were all detected in the negative control sample, but their sequence numbers in the sample are all larger than those in the negative control sample, so they proceed to the next screening step. The other microorganisms and the drug resistance genes were not detected in the negative control sample and likewise proceed to the next screening step.
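The negative control comparison in S2 can be sketched as follows, using the rule stated in claim 2: a microorganism is filtered when the homogenization sequence number of its highest target is lower than that of the same target in the negative control. The data layout (nested dictionaries) and all counts are illustrative assumptions, not values from the patent.

```python
from typing import Dict

Counts = Dict[str, Dict[str, float]]  # organism -> target -> homogenized count


def filter_by_negative_control(sample: Counts, control: Counts) -> Counts:
    """Keep organisms whose top target is not below the negative control."""
    kept: Counts = {}
    for organism, targets in sample.items():
        if not targets:
            continue  # nothing detected for this organism
        # Target with the highest homogenization sequence number in the sample.
        top_target = max(targets, key=targets.get)
        control_count = control.get(organism, {}).get(top_target, 0.0)
        if targets[top_target] >= control_count:
            kept[organism] = targets
    return kept


# Hypothetical counts for illustration only.
sample = {
    "organism_a": {"t1": 320.0, "t2": 40.0},  # above control: kept
    "organism_b": {"t1": 8.0},                # below control: filtered
    "organism_c": {"t1": 55.0},               # absent from control: kept
}
control = {"organism_a": {"t1": 12.0}, "organism_b": {"t1": 20.0}}
kept = filter_by_negative_control(sample, control)
```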
S3, filtering by strong positive microorganisms in the batch and by adjacency: Acinetobacter baumannii, Streptococcus pneumoniae and viridans streptococci were detected as strong positives in other samples of the same batch. During the experiment, the strong positive samples for Acinetobacter baumannii and viridans streptococci were adjacent to this sample. The homogenization sequence number of Acinetobacter baumannii, 795, is greater than 1/100 of the homogenization sequence number of the strong positive sample, 65892, so it proceeds to the microbial grading screening step. The homogenization sequence number of viridans streptococci, 57, is less than 1/100 of the homogenization sequence number of the strong positive sample, 32654, so it is filtered. The strong positive sample for Streptococcus pneumoniae was not adjacent to this sample; its homogenization sequence number, 259, is greater than 1/1000 of the homogenization sequence number of the strong positive sample, 76532, so it proceeds to the microbial grading screening step. The remaining microorganisms, Staphylococcus aureus, Klebsiella aerogenes, Streptococcus intermedius and Streptococcus anginosus, were not detected as strong positives in the samples of the same batch, and all proceed to the microbial grading screening step.
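The three comparisons in S3 can be checked with a short sketch. It assumes the thresholds given in claim 3 (a strong positive has a homogenization sequence number of at least 10000 at one target; the cutoff is 1/100 for adjacent samples and 1/1000 for non-adjacent ones). The function name `is_batch_contamination` is hypothetical.

```python
STRONG_POSITIVE_MIN = 10_000  # claim 3: strong positive threshold
ADJACENT_DIVISOR = 100        # adjacent to the strong positive sample
NON_ADJACENT_DIVISOR = 1000   # same batch, not adjacent


def is_batch_contamination(count: int, strong_count: int, adjacent: bool) -> bool:
    """Return True when the detection should be filtered as batch contamination.

    count        -- homogenization sequence number in the sample under analysis
    strong_count -- homogenization sequence number of the same target in the
                    strong positive sample of the same batch
    """
    if strong_count < STRONG_POSITIVE_MIN:
        return False  # the other sample is not a strong positive
    divisor = ADJACENT_DIVISOR if adjacent else NON_ADJACENT_DIVISOR
    # count < strong_count / divisor, written with integers to avoid rounding.
    return count * divisor < strong_count


# The three cases from step S3 of the second embodiment.
case_adjacent_kept = is_batch_contamination(795, 65892, adjacent=True)
case_adjacent_filtered = is_batch_contamination(57, 32654, adjacent=True)
case_non_adjacent_kept = is_batch_contamination(259, 76532, adjacent=False)
```

Here 795 exceeds 65892/100 = 658.92 and 259 exceeds 76532/1000 = 76.532, so both detections are kept, while 57 is below 32654/100 = 326.54 and is filtered.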
S4, microbial pathogenicity grading and detection data filtering: Staphylococcus aureus, Klebsiella oxytoca, Klebsiella pneumoniae, Acinetobacter baumannii and Streptococcus pneumoniae are labeled P2; their preset target numbers are greater than 1 and their detected target numbers are greater than or equal to 2, so they are reported as positive and shown at the positive microorganism position in the report. Streptococcus intermedius is labeled P3, with a preset target number of 1, and its homogenization sequence number ranks 4th among all microorganisms detected in the sample; it is reported as positive and displayed at the background microorganism position in the report. Streptococcus anginosus has a preset target number of 3 and a detected target number of 3; it is reported as positive and displayed at the background microorganism position in the report.
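The grading decision in S4 can be sketched as a single function, assuming the thresholds of claim 6: P2 organisms use rank cutoffs of top 5 (one preset target) and top 3 (several preset targets, one detected), and background organisms, taken here to be those labeled P3, use top 10 and top 5. The function name, the return values and the `rank` argument (1 = highest homogenization sequence number in the original list) are illustrative choices.

```python
from typing import Optional


def report_position(label: str, preset_targets: int,
                    detected_targets: int, rank: int) -> Optional[str]:
    """Return 'positive', 'background', or None (not reported)."""
    if detected_targets < 1:
        return None  # nothing detected
    if label == "P1":
        return "positive"  # any detection of a P1 organism is reported
    if label == "P2":
        single_rank, multi_rank, position = 5, 3, "positive"
    elif label == "P3":
        single_rank, multi_rank, position = 10, 5, "background"
    else:
        raise ValueError(f"unknown label: {label}")

    if preset_targets == 1:
        reported = rank <= single_rank
    else:
        reported = detected_targets >= 2 or (
            detected_targets == 1 and rank <= multi_rank)
    return position if reported else None


# Cases taken from the embodiments; ranks not given in the text are hypothetical.
p2_four_preset_two_detected = report_position("P2", 4, 2, rank=20)
p3_one_preset_rank_4 = report_position("P3", 1, 1, rank=4)
p3_two_preset_one_detected_rank_5 = report_position("P3", 2, 1, rank=5)
```

Keeping the thresholds in one place makes it easy to see that P2 and P3 follow the same rule shape and differ only in rank cutoffs and report position.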
S5, screening drug resistance genes and determining drug-resistant microorganisms: 3 drug resistance genes were detected in the sample: KPC, mecA and OXA-23. OXA-23 has fewer than 2 detected targets and is filtered directly. KPC and mecA each have 2 detected targets and a sequence number greater than 100, so they are reported as positive and proceed to the step of matching drug resistance genes with microorganisms with reference to Table 4. The only microorganism corresponding to the drug resistance gene mecA is Staphylococcus aureus, so the microorganism corresponding to mecA is Staphylococcus aureus. The microorganisms corresponding to KPC include Klebsiella pneumoniae, Escherichia coli, Enterobacter cloacae, Klebsiella oxytoca and others, and the preferentially displayed microorganism is Klebsiella pneumoniae, so the microorganism corresponding to the drug resistance gene KPC is Klebsiella pneumoniae (the analysis process is shown in Table 7).
TABLE 7
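The drug resistance gene step can be sketched in two parts, assuming the criteria of claims 7 and 8: a gene is positive when at least 2 targets are detected and its homogenization sequence number is at least 100; it is then assigned to the only positive candidate, else to the preferentially displayed candidate, else to the positive candidate with the highest top-target homogenization sequence number. The candidate lists below contain only the organisms named in S5, and all counts are hypothetical.

```python
from typing import Dict, List, Optional


def gene_is_positive(detected_targets: int, homogenized_count: float) -> bool:
    """Claim 7 criterion for reporting a drug resistance gene as positive."""
    return detected_targets >= 2 and homogenized_count >= 100


def assign_organism(candidates: List[str], preferred: Optional[str],
                    positives: Dict[str, float]) -> Optional[str]:
    """Pick the organism that carries a positive drug resistance gene.

    candidates -- organisms that may carry the gene
    preferred  -- preferentially displayed organism for the gene, if any
    positives  -- positive organisms -> highest-target homogenized count
    """
    hits = [name for name in candidates if name in positives]
    if not hits:
        return None
    if len(hits) == 1:
        return hits[0]
    if preferred in hits:
        return preferred
    # No preferred organism is positive: take the highest sequence number.
    return max(hits, key=lambda name: positives[name])


kpc_candidates = ["Klebsiella pneumoniae", "Escherichia coli",
                  "Enterobacter cloacae", "Klebsiella oxytoca"]
positives = {  # hypothetical homogenization sequence numbers
    "Staphylococcus aureus": 1500.0,
    "Klebsiella pneumoniae": 900.0,
    "Klebsiella oxytoca": 2400.0,
}
mec_a_host = assign_organism(["Staphylococcus aureus"], None, positives)
kpc_host = assign_organism(kpc_candidates, "Klebsiella pneumoniae", positives)
```

Note that the preferred organism wins even when another positive candidate has a higher sequence number, which is the behavior described for KPC in S5.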
The accuracy of the data analysis in example two was verified against the mNGS detection result (the mNGS detection method is the same as that of Chinese invention patent application No. 202310568758.4), and the comparison is shown in Table 8.
TABLE 8
As can be seen from Table 8, all microorganisms detected by t-NGS in the invention were also detected in the mNGS verification, so no false-positive microorganisms were reported. The positive microorganisms reported by mNGS were completely consistent with those of example two, and only a few background microorganisms were inconsistent, because t-NGS has no targets set for them. This indicates that data analysis with the method of the invention does not report false negatives for the targeted microorganisms. Therefore, the data analysis method has good accuracy.
In summary, in the method of the invention for determining positive microorganisms and drug resistance genes in a sample during ultra-multiple targeted sequencing, the microorganisms in the original list are first classified and graded into "absolute" key microorganisms, "relative" key microorganisms and background microorganisms, so that positives are preliminarily distinguished according to the relation between the microorganisms and the sample. False positives are then filtered out by comparison with the negative control sample, and batch contamination microorganisms are filtered out by comparison with the other samples of the same batch. Positive microorganisms and positive drug resistance genes are then screened according to the number of targets and the homogenization sequence number; the homogenization sequence number indirectly reflects how dominant a microorganism is in the sample. With this method, the positive microorganisms and drug resistance genes truly present in the sample can be judged accurately. The invention also establishes a model of the correspondence between drug resistance genes and microorganisms and a model of the preferential microorganisms of drug resistance genes, so that the microorganism corresponding to a positive drug resistance gene can be analyzed according to display priority and homogenization sequence number, thereby enabling the t-NGS ultra-multiple targeted pathogen detection technology to be applied more effectively to microorganism detection.
The foregoing description is only illustrative of the present invention and is not intended to limit its scope; all equivalent changes made using the contents of the specification and drawings of the present invention, or direct or indirect applications in related technical fields, are likewise included in the scope of the present invention.

Claims (10)

1. A microbial data analysis method based on the ultra-multiple targeted sequencing technology, characterized by comprising the following steps:
S1: performing super-multiplex PCR amplification and high-throughput sequencing on a sample to be detected, obtaining a detection data original list of the sample to be detected, and classifying and grading the microorganisms in the detection data original list;
S2: sequentially filtering false positive microorganisms and batch contamination microorganisms;
S3: screening positive microorganisms and positive drug resistance genes according to the number of targets and the homogenization sequence number;
S4: analyzing the microorganisms corresponding to the positive drug resistance genes.
2. The method for analyzing microbial data based on the ultra-multiplex targeted sequencing technology according to claim 1, wherein the specific step of filtering the false positive microorganisms is as follows: performing super-multiplex PCR amplification and high-throughput sequencing on a negative quality control sample to obtain a detection data original list of the negative quality control sample; if, for a given microorganism, the homogenization sequence number of its target with the highest detected sequence number is lower than the homogenization sequence number detected for the same target in the negative quality control sample, the microorganism is filtered directly.
3. The method for analyzing microbial data based on the ultra-multiplex targeted sequencing technology according to claim 1, wherein the specific steps of filtering the batch contamination microorganisms are: defining a microorganism for which 1 target has a homogenization sequence number greater than or equal to 10000 in a sample as a strong positive microorganism;
for a sample to be detected that is adjacent to the strong positive sample, filtering the microorganism when the detected sequence number of the same target is less than 1/100 of the homogenization sequence number of the strong positive microorganism;
for a sample to be detected that is not adjacent to the strong positive sample, filtering the microorganism when the detected sequence number of the same target is less than 1/1000 of the homogenization sequence number of the strong positive microorganism.
4. The method for analyzing microbial data based on the super multiplex targeted sequencing technology according to claim 1, wherein the classification and grading is performed by assigning each microorganism a P1, P2 or P3 label;
P1 denotes an "absolute" key microorganism, P2 denotes a "relative" key microorganism, and P3 denotes a background microorganism.
5. The method for analyzing microbial data based on the super multiplex targeted sequencing technology according to claim 4, wherein when screening the positive microorganisms, screening positive microorganisms among microorganisms labeled P1 and P2 and screening background microorganisms among microorganisms labeled P3.
6. The method for analyzing microbial data based on the ultra-multiplex targeted sequencing technology according to claim 5, wherein the specific steps of screening the positive microorganisms and the background microorganisms are as follows:
a microorganism labeled P1 is reported as positive and is displayed at the positive microorganism position of the report;
for a microorganism labeled P2: if its preset target number is 1, it is reported as positive and displayed at the positive microorganism position of the report when its homogenization sequence number ranks in the top 5 of the detection data original list; if its preset target number is greater than or equal to 2, it is reported as positive and displayed at the positive microorganism position of the report when its detected target number is greater than or equal to 2, or when its detected target number is 1 and its homogenization sequence number ranks in the top 3 of the detection data original list;
for a microorganism labeled P3: if its preset target number is 1, it is displayed at the background microorganism position of the report when its homogenization sequence number ranks in the top 10 of the detection data original list; if its preset target number is greater than or equal to 2, it is displayed at the background microorganism position of the report when its detected target number is greater than or equal to 2, or when its detected target number is 1 and its homogenization sequence number ranks in the top 5 of the detection data original list.
7. The method for analyzing microbial data based on the ultra-multiplex targeted sequencing technology according to claim 1, wherein the specific step of screening the positive drug resistance genes is as follows: a drug resistance gene is reported as positive when its number of detected targets is greater than or equal to 2 and its homogenization sequence number is greater than or equal to 100.
8. The method for analyzing microbial data based on the super-multiplex targeted sequencing technology according to claim 1, wherein the specific steps of S4 are as follows: first, constructing a list of the microorganisms corresponding to each drug resistance gene and of the preferentially displayed microorganisms; if only 1 microorganism corresponding to the drug resistance gene is reported as positive, that microorganism is determined to be the microorganism corresponding to the drug resistance gene;
if a plurality of microorganisms corresponding to the drug resistance gene are reported as positive and a preferentially displayed microorganism is among them, the preferentially displayed microorganism is the microorganism corresponding to the drug resistance gene; when no preferentially displayed microorganism is present, the positive microorganisms are sorted in descending order of the homogenization sequence number of their highest target, and the microorganism with the highest sequence number is the microorganism corresponding to the drug resistance gene.
9. The microbial data analysis method based on the ultra-multiple targeted sequencing technology according to claim 1, wherein the specific steps of obtaining the detection data original list of the sample to be detected are as follows: sequentially performing nucleic acid extraction, library construction and on-machine sequencing on the sample to be detected to obtain the microbial sequence data of the processed sample; comparing the obtained microorganism sequences of the sample to be detected with the target microorganism sequences, selecting sequences with no more than 2 mismatched bases in the alignment, and homogenizing the sequence numbers to obtain the homogenization sequence number of each target of each microorganism, thereby obtaining the detection data original list of the sample to be detected.
10. The method for analyzing microbial data based on the ultra-multiple targeted sequencing technology according to claim 9, wherein the homogenization treatment homogenizes the sequence numbers to 100K or more.
CN202311524535.4A 2023-11-15 2023-11-15 Microbial data analysis method based on super-multiple targeted sequencing technology Pending CN117476109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311524535.4A CN117476109A (en) 2023-11-15 2023-11-15 Microbial data analysis method based on super-multiple targeted sequencing technology

Publications (1)

Publication Number Publication Date
CN117476109A true CN117476109A (en) 2024-01-30

Family

ID=89625431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311524535.4A Pending CN117476109A (en) 2023-11-15 2023-11-15 Microbial data analysis method based on super-multiple targeted sequencing technology

Country Status (1)

Country Link
CN (1) CN117476109A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117935918A (en) * 2024-03-21 2024-04-26 北京诺禾致源科技股份有限公司 Pathogenic microorganism data analysis method and device and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination