CN113223618B - Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome

Info

Publication number
CN113223618B
Authority
CN
China
Prior art keywords
virulence
gene
clinical
result
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110579642.1A
Other languages
Chinese (zh)
Other versions
CN113223618A (en)
Inventor
夏涵
官远林
江月
樊淑
杨静
胡煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuguo Microcode Biotechnology Co ltd Of Xixian New Area
Yuguo Zhizao Technology Beijing Co ltd
Yuguo Biotechnology Beijing Co ltd
Original Assignee
Yuguo Microcode Biotechnology Co ltd Of Xixian New Area
Yuguo Zhizao Technology Beijing Co ltd
Yuguo Biotechnology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yuguo Microcode Biotechnology Co ltd Of Xixian New Area, Yuguo Zhizao Technology Beijing Co ltd, Yuguo Biotechnology Beijing Co ltd filed Critical Yuguo Microcode Biotechnology Co ltd Of Xixian New Area
Priority to CN202110579642.1A priority Critical patent/CN113223618B/en
Publication of CN113223618A publication Critical patent/CN113223618A/en
Application granted granted Critical
Publication of CN113223618B publication Critical patent/CN113223618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a metagenome-based method and system for detecting virulence genes of clinically important pathogenic bacteria. The method comprises the following steps: S10, establishing a virulence gene database of clinical pathogenic bacteria; S20, acquiring raw metagenomic sequencing data from a clinical sample and preprocessing it to obtain target data; S30, analyzing the target data with a preset multiple-alignment and annotation system for metagenomic sequencing data to identify virulence genes; S40, establishing an association database linking important virulence genes, virulence factors, and characterization (function/clinical phenotype); and S50, generating a virulence gene identification report from the identification results and the association database using a preset automated clinical reporting system. The system can identify virulence genes in metagenomic sequencing data from different types of clinical infection samples (cerebrospinal fluid and the like), can identify multiple important virulence genes of multiple pathogenic bacteria in a sample in a single run, offers good sensitivity and accuracy, and helps physicians make timely decisions on diagnosis, treatment, and prognosis.

Description

Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome
Technical Field
The invention belongs to the technical field of bioinformatics algorithms and software. It can be applied in clinical pathogen detection products, namely to the analysis of virulence genes of clinical pathogenic bacteria in pathogen metagenomic testing, and it covers hundreds of virulence genes from a variety of clinical pathogenic bacteria. Field of application: detecting, identifying, and tracing the virulence genes of pathogenic bacteria found by pathogen metagenomic testing in samples such as tissues and body fluids (cerebrospinal fluid, alveolar lavage fluid, blood, and sputum) from patients with various infectious diseases, so as to assist clinicians in accurate diagnosis, treatment selection, and prognosis, and to provide useful information for the surveillance of bacterial infectious diseases.
Background
Bacterial infection can cause a variety of acute and chronic diseases. Bacteria can also act as opportunistic pathogens, causing disease through the interaction of pathogen, host, and environmental factors, and certain clinical pathogenic bacteria seriously endanger human life and health. For example, Staphylococcus aureus often causes pyogenic infections in humans and can directly cause pneumonia, pseudomembranous enteritis, pericarditis, and even systemic infections such as septicemia and sepsis. Klebsiella pneumoniae is widely present on animal mucosal surfaces (such as the human gastrointestinal tract) and in the environment, and is a major pathogen of healthcare-associated infections and severe community-acquired infections. In China, Klebsiella pneumoniae accounts for 11.9% of the pathogens isolated from ventilator-associated pneumonia and intensive-care-unit-acquired pneumonia. Streptococcus pneumoniae is one of the main pathogens of community-acquired pneumonia, otitis media, meningitis, abscesses, septicemia, and the like. In developing countries, more than 1.1 million children die of pneumonia each year, with Streptococcus pneumoniae accounting for approximately 20% of these deaths. Another pathogen that is difficult to treat, carries a high mortality rate, and often shows multidrug or pan-drug resistance is Pseudomonas aeruginosa, a Gram-negative bacterium that readily colonizes and infects the respiratory tract, particularly in immunocompromised people. During infection of humans, these clinical pathogens often exert their pathogenicity through multiple virulence genes, leading to the development of disease.
"Virulence factors" is a general term for the effector or regulatory molecules (proteins, lipid molecules, compounds, and the like), and the functional units formed by their combinations, that are produced by pathogenic microorganisms and cause disease in the host. The genes encoding these virulence factors are generally called virulence genes. For example, Staphylococcus aureus produces a variety of virulence factors to adhere to, infect, and spread among host cells, and it can evade the host immune system or antibiotics by forming biofilms. Its important virulence genes include pvl, sea, and seb. pvl promotes the lysis of neutrophils and confers strong pathogenicity on the strain; it is associated with suppurative skin and soft-tissue infections and, in severe cases, can cause necrotizing pneumonia with high lethality. The enterotoxin genes sea, seb, and others can stimulate the vomiting center and cause acute gastroenteritis with vomiting as the main symptom; they are a main cause of bacterial food poisoning in humans. Streptococcus pneumoniae likewise carries a variety of virulence genes, such as the capsular polysaccharide synthetase A gene, the pneumolysin gene (ply), lytA, and nanA. Capsule-related genes such as cps4A are a prerequisite for the pathogenicity of Streptococcus pneumoniae; the hemolysin encoded by ply (pneumolysin) can lyse host cells, cause alveolar edema and hemorrhage, and induce pneumonia, and it can also lead to bacteremia by driving bacteria into the bloodstream; lytA is involved in bacterial autolysis, which releases the hemolysin and other components and may trigger a strong inflammatory response in the host.
Differences in the virulence genes carried by different strains of Klebsiella pneumoniae lead to differences in their pathogenicity. One important virulence gene of Klebsiella pneumoniae is rmpA, which regulates capsular polysaccharide synthesis and produces the hypermucoviscous phenotype, giving the strain strong pathogenicity; this virulence function, together with virulence genes such as iutA, influences the formation of liver abscesses. Virulence genes encode virulence factors such as toxins and surface proteins that help bacteria adhere to and invade host cells, improve bacterial survival and replication inside host cells, and cause toxic death of host cells, ultimately leading to various infectious diseases in the host. Detecting and identifying the virulence genes of high-frequency or important pathogenic bacteria in clinical samples therefore helps to identify potential pathogens, to evaluate the virulence of clinical strains, and to support the selection and implementation of specific measures for the diagnosis, precise treatment, and prognosis of clinical infectious diseases. In the field of public health, moreover, detecting virulence genes and identifying their specific virulence profiles provide useful information for monitoring bacterial infectious diseases, judging the probability of outbreaks, and evaluating epidemic severity, and they help in proposing and implementing reasonable disease control measures.
Methods for detecting and identifying the virulence genes of clinical pathogenic bacteria mainly include single- or multi-gene detection based on the polymerase chain reaction (PCR) and its derivative techniques, loop-mediated isothermal amplification, gene chips, and metagenomic detection based on second-generation sequencing. Of these, PCR is the most widely used in clinical practice. It designs specific primers against a conserved region of the nucleic acid sequence of a particular virulence gene and uses the nucleic acid of a clinical sample or an isolated strain as the template for amplification and detection. The technique detects genes rapidly and with high sensitivity. Its main clinical applications are: 1) multiplex PCR, in which two or more primer pairs are added to the same reaction system and several nucleic acid fragments are amplified simultaneously, so that two or more virulence genes can be detected and identified at the same time; and 2) fluorescent quantitative PCR, which adds real-time detection of the fluorescence signal of each cycle's product to conventional PCR, enabling both quantitative and qualitative analysis of the initial template DNA. Both PCR techniques, however, are complex to operate, place high demands on instruments and personnel, and are unsuitable for rapid on-site diagnosis.
Loop-mediated isothermal amplification (LAMP) is a nucleic acid amplification technique distinct from PCR. It relies on a DNA polymerase with strand-displacement activity and 2 pairs of specially designed primers, requires neither repeated temperature cycling nor expensive instruments, and completes the amplification reaction efficiently and quickly under isothermal conditions; it is now widely used to detect and identify pathogens such as bacteria, viruses, and parasites. Compared with conventional PCR, LAMP offers high specificity, high sensitivity, simple operation, low instrument requirements, and rapid amplification at constant temperature. Its drawbacks are demanding primer design, difficulty in distinguishing non-specific amplification, and high susceptibility to contamination. A gene chip, also called a DNA microarray, is a dense molecular array formed by fixing a large number of DNA probes, such as gene fragments and synthetic oligonucleotides, onto a carrier in a pre-designed layout by in-situ synthesis, micro-spotting, or similar methods. The array is hybridized with a nucleic acid sample labeled with fluorescein or by other means, and the presence and quantity of target genes in the sample are determined from the strength of the hybridization signal. With recent developments, gene chip technology is used in gene expression analysis, mutation and polymorphism analysis, and other fields. Compared with PCR or LAMP, gene chips can detect a large number of genes in a single experiment and offer speed, high parallelism, diversity, and automation.
On the other hand, gene chips have high detection cost, demanding operation, and poor sensitivity, which limits their range of application. PCR, LAMP, and gene chips all require prior knowledge of the sample and can detect only specific virulence genes of specific bacteria; they cope poorly with highly variable genes and unpredicted variants, and they cannot fully cover the virulence genes of clinical importance. Metagenomic sequencing based on next-generation sequencing, which has developed rapidly in recent years, has unique advantages in overcoming these shortcomings. Metagenomic sequencing does not require pathogens to be isolated and cultured separately; clinical samples can be analyzed directly after nucleic acid extraction and purification, and comprehensive virulence gene annotation and identification can be carried out by sequence homology alignment.
A clinical infectious disease sample may contain several pathogenic bacteria whose pathogenic mechanisms involve different virulence factors, and pathogenicity arises from the coordinated regulation of multiple virulence genes. Existing techniques for detecting microbial virulence genes, including PCR and its derivatives, loop-mediated isothermal amplification, and gene chips, are limited in the number and range of virulence genes they can detect, depend on prior knowledge, and are prone to cross-contamination. Current products for virulence gene identification are based mainly on PCR and can detect only a limited range of bacteria and a limited number of virulence genes. With conventional PCR in particular, only one virulence gene of one bacterium can be detected per experiment; Chinese patent publication CN110669853A, for example, can detect only the ampR gene of non-mucoid Klebsiella pneumoniae. Even with multiplex PCR, too many primer pairs readily form dimers and reduce amplification efficiency, so the number of virulence genes detected remains small. For example, Chinese patent publication CN111876509A uses multiplex PCR to detect four virulence genes of Acinetobacter baumannii, such as abaR, CsuA, and bap, at a time; the multiplex PCR product of Chinese patent publication CN109554449A detects 7 virulence genes of Aeromonas; and Chinese patent publication CN108707680A designs a seven-plex PCR primer set that covers only specific regions of 21 virulence genes of Streptococcus agalactiae, such as sip, fbsA, and hylB.
Multiplex fluorescent PCR is also applied to virulence gene detection because its results are easy to interpret. For example, Chinese patent publication CN112430677A, filed in 2020, designs a multiplex fluorescent PCR that quantitatively detects three virulence genes of Klebsiella pneumoniae: iucA, rmpA1, and rmpA2. Loop-mediated isothermal amplification, developed in recent years, is likewise applied to clinical virulence gene detection. For example, Chinese patent publication CN11150075A discloses using 2 pairs of primers to amplify 6 different regions of the peg-344 gene of hypervirulent Klebsiella pneumoniae in order to identify clinical hypervirulent strains. Gene chips are more costly and are used less often for clinical virulence gene detection. The product of Chinese patent CN105950732B, filed in 2016, is designed to identify 17 virulence genes of 9 pathogenic bacteria from animal-derived food, including Salmonella, Enterococcus, and Clostridium perfringens. All of these prior techniques require specific primers for one or several known genes to be designed or selected before the experiment, so only virulence genes within a preset range can be detected. Clinical practice needs a more sensitive and comprehensive strategy for detecting the virulence genes of infectious pathogenic bacteria, one that meets China's requirements for the diagnosis, treatment, and epidemiological surveillance of important pathogens with high incidence and high virulence. Metagenomic sequencing, developed in recent years, takes the entire microbial community in a specific habitat as its object of study and directly extracts the DNA of all microorganisms in a clinical sample for sequencing, annotation, and comparative analysis.
This technology makes up for the shortcomings of the earlier detection methods: it needs neither culture nor prior knowledge of the sample, and it can scan and identify virulence genes comprehensively across the metagenome of clinical pathogens. Among published Chinese patent applications and funded projects there is currently no product or similar project for detecting virulence genes based on the metagenome, and the research, development, and promotion of such a product will help meet the need for diagnosing hypervirulent pathogenic bacteria in clinical infectious diseases.
Disclosure of Invention
This patent provides a metagenome-based method and system for detecting virulence genes of clinically important pathogenic bacteria, including but not limited to the identification of hundreds of important virulence genes, such as rmpA, iucA, ply, cps, stx1A, bexA, lukF-PV, hly, ompA, plc, cylL, ctxA, eccA1, lipA, slo, acm, icmTlef, toxA, and pgm, from a variety of pathogenic bacteria such as Klebsiella pneumoniae, Streptococcus pneumoniae, Escherichia coli, Haemophilus influenzae, and Staphylococcus aureus. The method comprises the following main parts: 1) establishing a virulence gene database of clinical pathogenic bacteria; 2) obtaining raw metagenomic sequencing data from a clinical sample and preprocessing it to obtain target data; 3) analyzing the target data with a preset multiple-alignment system and multiple-annotation system for metagenomic sequencing data to identify virulence genes; 4) establishing an association database linking important virulence genes, virulence factors, and characterization (function/clinical phenotype); and 5) generating a virulence gene identification report from the identification results and the association database using a preset automated clinical reporting system. The method suits multiple types of clinical infectious disease samples (cerebrospinal fluid, alveolar lavage fluid, blood, and the like) and identifies multiple high-frequency and important virulence genes of multiple clinical pathogenic bacteria in a single run, reducing additional screening time. The in-depth association database and the multiple-alignment strategy provide high sensitivity and accuracy, and the automated reporting system generates reports quickly, helping physicians identify, diagnose, and treat infections caused by hypervirulent pathogenic strains, and assess their prognosis, in a timely manner.
The invention discloses a method for detecting virulence genes of clinically important pathogenic bacteria based on metagenome, which comprises the following steps:
S10, establishing a virulence gene database of clinical pathogenic bacteria;
S20, acquiring raw metagenomic sequencing data from a clinical sample and preprocessing it to obtain target data;
S30, analyzing the target data with a preset multiple-alignment system and multiple-annotation system for metagenomic sequencing data, and identifying virulence genes;
S40, establishing an association database linking important virulence genes, virulence factors, and characterization (function/clinical phenotype);
and S50, generating a virulence gene identification report from the S30 virulence gene identification results and the S40 association database using a preset automated clinical reporting system.
In some embodiments of the invention, S10 includes the following steps:
obtaining the virulence genes of clinical pathogenic bacteria and their sequences from a virulence gene database;
acquiring all genomes, gene sequences, and annotation information of the clinical pathogenic bacteria from public databases;
filtering out pseudogenes, fragments, and misannotated sequences from the gene sequences;
clustering the sequences of each gene unit at multiple thresholds, and performing within-group cross-alignment and de-duplication;
testing the reference gene sequences of each gene unit with simulated data sets, and adjusting and supplementing the reference sequences of the gene units;
repeating the multi-threshold clustering of the sequences of each gene unit, with within-group cross-alignment and de-duplication;
clustering the reference sequences of all gene units and filtering out abnormal sequences;
extracting public-database annotation information, such as gene names and species names, for the reference sequences, and proofreading and standardizing the reference sequence annotations of each gene unit;
building a reference sequence index for all virulence gene units;
and optionally, building software to automate sequence downloading, cluster de-duplication, updating, and standardization of the database.
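The multi-threshold clustering and de-duplication step can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name a clustering tool, a similarity measure, or threshold values, so the greedy longest-first clustering, the use of `difflib.SequenceMatcher` as a stand-in for alignment identity, and the thresholds `(1.0, 0.99, 0.95)` are all hypothetical choices.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real pipeline would use alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold):
    """Keep one representative per cluster, visiting sequences longest first.

    A sequence joins an existing cluster (and is dropped) when its similarity
    to any kept representative is at or above `threshold`.
    """
    representatives = []
    for seq in sorted(set(seqs), key=lambda s: (-len(s), s)):
        if all(similarity(seq, rep) < threshold for rep in representatives):
            representatives.append(seq)
    return representatives


def multi_threshold_dedup(seqs, thresholds=(1.0, 0.99, 0.95)):
    """Apply greedy clustering repeatedly, from the strictest threshold down.

    The threshold values are hypothetical; the source only says that
    multiple thresholds are used.
    """
    representatives = list(seqs)
    for threshold in thresholds:
        representatives = greedy_cluster(representatives, threshold)
    return representatives


if __name__ == "__main__":
    unit = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TTTTTGGGGG"]
    print(multi_threshold_dedup(unit, thresholds=(1.0, 0.85)))
```

Applied per gene unit, the first pass removes exact duplicates and later passes collapse near-identical sequences, which is one plausible reading of "clustering at multiple thresholds".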
In some embodiments of the invention, S20 includes:
filtering out reads in which bases with a quality value below 2 account for 40% of the read;
trimming bases where the average quality within a sliding window (5 bp) falls below 20;
filtering out reads with an average quality below 20, more than 5 N bases, or a length of less than 50.
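The three preprocessing rules above can be sketched in a few lines. This is a minimal illustration, assuming that qualities are Phred scores already decoded to integers, that "40%" means at least 40% of bases, and that window trimming cuts the read at the first 5 bp window whose mean quality drops below 20 (a common convention, not stated in the source). The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def low_quality_fraction(quals: List[int], cutoff: int = 2) -> float:
    """Fraction of bases whose quality is below `cutoff`."""
    if not quals:
        return 1.0
    return sum(q < cutoff for q in quals) / len(quals)


def sliding_window_trim(seq: str, quals: List[int],
                        window: int = 5, min_avg: float = 20.0):
    """Cut the read at the first window whose mean quality is below `min_avg`."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


def preprocess_read(seq: str, quals: List[int]) -> Optional[Tuple[str, List[int]]]:
    """Return the cleaned read, or None when the read is filtered out."""
    if not seq or len(seq) != len(quals):
        return None
    # Rule 1: too many very low quality bases (assumed to mean >= 40%).
    if low_quality_fraction(quals, cutoff=2) >= 0.40:
        return None
    # Rule 2: sliding-window trimming (5 bp window, mean quality below 20).
    seq, quals = sliding_window_trim(seq, quals)
    # Rule 3: drop reads that are short, low quality on average, or N-rich.
    if len(seq) < 50:
        return None
    if sum(quals) / len(quals) < 20:
        return None
    if seq.upper().count("N") > 5:
        return None
    return seq, quals
```

In practice these rules map closely onto options offered by existing read-trimming tools, so a production pipeline would more likely configure such a tool than reimplement the logic.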
In some embodiments of the present invention, S30 includes the following steps:
the set of reference sequences for a particular virulence gene is defined as $\{s_1, s_2, \dots, s_n\}$, where $s_n$ is reference sequence n and n is the total number of reference sequences;
aligning the high-quality reads (clean reads) of the metagenome to the reference sequence set with a multiple-alignment algorithm, using an E-value threshold of 1e-5;
the alignment results of each read are $\{R_1, R_2, \dots, R_m\} \in g_i$, where $0 \le m \le n$, $R_m$ is the m-th alignment result, and $g_i$ is the i-th gene unit;
filtering strategy for detecting virulence genes (VF-result):
Figure GDA0003737837600000051
where id = sequence similarity score (%);
Figure GDA0003737837600000052
score is the quality score of the pairwise sequence alignment;
filtering condition: if VF-result belongs to $g_i$ with $i > 1$, it is discarded and no result is obtained;
and optionally, building software to automate alignment, filtering, and result list generation.
In some embodiments of the invention, the alignment of S30 comprises:
when there is a single alignment result (m = 1), taking that result ($R_m$) as the final result (r);
when there are multiple alignment results (m > 1) and the target reference sequences belong to the same gene unit of the same species, the results are scored and ranked, and the final result $r_i$ is:
Figure GDA0003737837600000061
when there are multiple alignment results (m > 1) and the target reference sequences belong to different gene units ($g_i$, $i > 1$) of the same species, the final result r is the union of the best results from each gene group, $\{r_1, r_2, \dots, r_i\}$, where the result within group $g_i$ is:
Figure GDA0003737837600000062
where score is the quality score of the pairwise sequence alignment, identity is the function that computes sequence identity, and Max is the maximum function, applied after ranking the results whose target reference sequences belong to the same gene unit of the same species.
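The three cases above (a single hit, several hits in one gene unit, several hits across gene units) reduce to "keep the best hit per gene unit". The sketch below shows that reduction for one read. The selection formulas themselves appear only as images in the source, so ranking by score first and identity second is an assumption, and the `Hit` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Hit:
    """One alignment result R_m for a read (field names are illustrative)."""
    read_id: str
    ref_id: str
    gene_unit: str
    score: float     # pairwise alignment quality score
    identity: float  # sequence identity in percent


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Return the best hit per gene unit among one read's alignment results.

    Assumption: "best" means maximum score, with identity as tie-breaker.
    An empty input (m = 0) yields an empty result.
    """
    by_unit = defaultdict(list)
    for hit in hits:
        by_unit[hit.gene_unit].append(hit)
    return {
        unit: max(group, key=lambda h: (h.score, h.identity))
        for unit, group in by_unit.items()
    }


if __name__ == "__main__":
    hits = [
        Hit("read1", "ref_a", "rmpA", 90.0, 98.0),
        Hit("read1", "ref_b", "rmpA", 95.0, 97.0),
        Hit("read1", "ref_c", "iucA", 80.0, 99.0),
    ]
    for unit, hit in best_hits(hits).items():
        print(unit, hit.ref_id, hit.score)
```

Returning a mapping keyed by gene unit makes the "union of best results" in the multi-unit case explicit, and the single-hit case falls out without special handling.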
In some embodiments of the invention, S40 includes the following steps:
collecting metagenomic sequencing data from clinical samples of pathogenic bacteria;
analyzing the data according to S20 and S30, and constructing a virulence gene profile of each single pathogenic bacterium in each sample;
extracting and standardizing the clinical phenotypes and the physiological and biochemical indices corresponding to the samples;
analyzing and extracting gene features using a maximum likelihood method;
constructing a virulence gene feature profile relevant to clinical diagnosis by applying multiple machine learning strategies to the routine clinical test indices combined with the sequence features (PAAC and PSSM-C) of the virulence genes;
clustering synergistic virulence genes into single virulence factor units and associating them with the corresponding characterization (function/clinical phenotype);
constructing virulence factor-virulence gene and virulence factor-characterization (function/clinical phenotype) association tables, and establishing the association database of clinically important virulence genes, virulence factors, and characterization (function/clinical phenotype);
and optionally, implementing automated alignment, filtering, and result list generation in software.
In some embodiments of the present invention, analyzing and extracting gene features using the maximum likelihood method in S40 includes:
extraction of the physicochemical features (PAAC) of the protein sequences of the virulence genes:
Figure GDA0003737837600000063
$a_i$: the physicochemical feature set of the 20 amino acids,
Figure GDA0003737837600000064
the n-th physicochemical feature; N: the total number of amino acid physicochemical features;
where the physicochemical features of a single amino acid are:
Figure GDA0003737837600000065
n = 1, 2, …, N; x = 1 to n;
for any two amino acids $R_b$ and $R_d$, the correlation is:
Figure GDA0003737837600000071
$F_q(R_b)$ is the q-th physicochemical feature of $R_b$, and $F_q(R_d)$ is the q-th physicochemical feature of $R_d$;
for an amino acid sequence of length L, the sequence position correlation parameter $\theta_h$ is defined as:
Figure GDA0003737837600000072
then the physicochemical feature extraction formula for amino acid e in the (20 + λ)-dimensional (λ = 2) sequence representation is:
Figure GDA0003737837600000073
where $f_e$ is the frequency of amino acid e in the sequence; ω is the position weighting parameter of the amino acids in the sequence, with a default value of 0.1; and $\theta_{e-20}$ reflects the effect of the sequence-order parameter for amino acid e.
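Because the PAAC formulas survive only as image references, the sketch below follows the widely used pseudo-amino-acid composition form, which matches the symbols defined above ($F_q$, $\theta_h$, $f_e$, ω, λ = 2); whether the patent's formulas are identical is an assumption. The property table is a placeholder with made-up values: real use would substitute measured physicochemical scales.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalize(props):
    """Standardize each property to zero mean, unit SD over the 20 amino acids."""
    n_props = len(props[AMINO_ACIDS[0]])
    out = {aa: [] for aa in AMINO_ACIDS}
    for q in range(n_props):
        column = [props[aa][q] for aa in AMINO_ACIDS]
        mu, sd = mean(column), pstdev(column)
        for aa in AMINO_ACIDS:
            out[aa].append((props[aa][q] - mu) / sd)
    return out


def paac(seq, props, lam=2, omega=0.1):
    """Return a (20 + lam)-dimensional pseudo-amino-acid composition vector.

    Theta(R_b, R_d) = mean over q of (F_q(R_b) - F_q(R_d))^2
    theta_h         = mean over b of Theta(R_b, R_{b+h}),  h = 1..lam
    The first 20 entries are f_e / D, the last lam are omega * theta_h / D,
    with D = sum(f) + omega * sum(theta).
    """
    if any(res not in props for res in seq):
        raise ValueError("sequence contains residues outside the 20 standard amino acids")
    length = len(seq)
    if length <= lam:
        raise ValueError("sequence must be longer than lam")

    def correlation(b, d):
        return mean((fb - fd) ** 2 for fb, fd in zip(props[b], props[d]))

    freqs = [seq.count(aa) / length for aa in AMINO_ACIDS]
    thetas = [
        mean(correlation(seq[i], seq[i + h]) for i in range(length - h))
        for h in range(1, lam + 1)
    ]
    denom = sum(freqs) + omega * sum(thetas)
    return [f / denom for f in freqs] + [omega * t / denom for t in thetas]


def toy_properties():
    """Placeholder property table (NOT real physicochemical values)."""
    return normalize({aa: [float(i), float((i * 7) % 20)]
                      for i, aa in enumerate(AMINO_ACIDS)})


if __name__ == "__main__":
    vector = paac("ACDEFGHIKLMNPQ", toy_properties())
    print(len(vector), round(sum(vector), 6))
```

With this normalization the vector sums to 1, so the last λ entries show directly how much weight the sequence-order terms carry relative to plain composition.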
Extraction of the evolutionary features (PSSM) of the virulence protein sequences:
the protein sequences of the virulence genes within a gene unit are converted into the original PSSM matrix as follows:
Figure GDA0003737837600000074
where L is the length of the sequence; the 20 columns represent the 20 natural amino acids; and $p_{u,v}$ is the likelihood that the amino acid at position u mutates to amino acid v during evolution;
PSSM-C transforms the PSSM matrix into a 20 × 20 matrix, in which the row $Z_u$ for amino acid u is calculated as:
Figure GDA0003737837600000075
where
Figure GDA0003737837600000076
$z_t$ is the value at position t in the original PSSM table; $p_t$ is the amino acid at position t in the sequence; L is the length of the sequence; and $a_u$ is the u-th of the 20 amino acids.
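The PSSM-C transformation can be illustrated as below. The formula is an image in the source, so the sketch assumes the usual PSSM composition: row u of the 20 × 20 matrix is the sum of the PSSM rows at positions where the sequence carries amino acid $a_u$, divided by the sequence length L. The alphabetical column order is also an assumption; PSSM files produced by alignment tools may order columns differently.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row/column order


def pssm_composition(seq: str, pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Collapse an L x 20 PSSM into a fixed-size 20 x 20 composition matrix.

    Z_u = (1 / L) * sum over positions t with p_t == a_u of z_t,
    where z_t is the 20-value PSSM row at position t.
    """
    length = len(seq)
    if length == 0 or len(pssm) != length:
        raise ValueError("PSSM must have one row per sequence position")
    if any(len(row) != 20 for row in pssm):
        raise ValueError("each PSSM row must have 20 columns")

    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    composition = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(seq, pssm):
        u = index.get(residue)
        if u is None:
            continue  # skip non-standard residues
        for v, value in enumerate(row):
            composition[u][v] += value / length
    return composition


if __name__ == "__main__":
    toy_pssm = [[1.0] * 20, [3.0] * 20, [6.0] * 20]  # made-up scores
    matrix = pssm_composition("AAC", toy_pssm)
    print(matrix[0][0], matrix[1][0])
```

The point of the transformation is that proteins of any length map to the same 400 features, which is what allows them to be fed to a fixed-input machine learning model.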
In some embodiments of the invention, S50 includes the following steps:
importing the result list obtained in S30 and comparing it against the S40 association database to generate the virulence gene results, which include the pathogenic bacterial species (Latin and Chinese species names) and gene information (gene name, virulence factor, characterization, support score, and the like);
importing the results into the corresponding table of a report template;
importing the customer information from the database into the report template;
generating the final virulence gene identification report (PDF format) for the specific pathogenic bacteria of the clinical sample.
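The first two reporting steps amount to a lookup of each identified gene in the association database followed by filling a table. A minimal sketch is given below; the record layout, the field names, and the sample association entry are hypothetical (the rmpA annotation paraphrases the background section), and the table is rendered as CSV text rather than the PDF the patent describes.

```python
import csv
import io

# Hypothetical association entries keyed by (species, gene).
ASSOCIATION = {
    ("Klebsiella pneumoniae", "rmpA"): {
        "virulence_factor": "capsular polysaccharide regulation",
        "characterization": "hypermucoviscous phenotype",
    },
}

FIELDS = ["species", "gene", "virulence_factor", "characterization", "support_score"]


def build_report_rows(results, association):
    """Join S30-style results with association entries.

    Genes missing from the association table are kept and marked
    "unannotated" (an assumption; the source does not cover this case).
    """
    missing = {"virulence_factor": "unannotated", "characterization": "unannotated"}
    rows = []
    for result in results:
        info = association.get((result["species"], result["gene"]), missing)
        rows.append({
            "species": result["species"],
            "gene": result["gene"],
            "virulence_factor": info["virulence_factor"],
            "characterization": info["characterization"],
            "support_score": result["support_score"],
        })
    return sorted(rows, key=lambda row: (row["species"], -row["support_score"]))


def render_table(rows):
    """Render the report table as CSV text (stand-in for the report template)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    results = [
        {"species": "Klebsiella pneumoniae", "gene": "iucA", "support_score": 12},
        {"species": "Klebsiella pneumoniae", "gene": "rmpA", "support_score": 31},
    ]
    print(render_table(build_report_rows(results, ASSOCIATION)))
```

Keeping the join separate from the rendering means the same rows could feed a PDF template, which is the output format the method actually specifies.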
A second aspect of the invention discloses a metagenome-based system for detecting virulence genes of clinically important pathogenic bacteria, comprising the following components:
a virulence gene database of clinical pathogenic bacteria;
an association database of important virulence genes, virulence factors, and characterization (function/clinical phenotype);
a multiple-alignment system and a multiple-annotation system for metagenomic sequencing data;
an automated clinical reporting system.
The beneficial technical effects of the invention are as follows:
(1) A metagenome-based system for detecting virulence genes of clinically important pathogenic bacteria is established, overcoming many limitations of the prior techniques and methods. Virulence genes can be detected and identified in clinical infection samples of different types (cerebrospinal fluid, lung lavage fluid, blood, throat swabs, and the like) and with low nucleic acid content; hundreds of important virulence genes of various clinical pathogenic bacteria can be identified in a single run, reducing additional screening time. The in-depth database and the two-step alignment strategy improve the sensitivity and accuracy of identification. The automated clinical reporting system generates reports quickly, helping physicians make timely decisions on diagnosis, treatment, and prognosis;
(2) A comprehensive, manually curated virulence gene database of clinically important pathogenic bacteria is constructed. It contains all reference sequences for hundreds of important virulence genes of various clinically important pathogenic bacteria, together with corrected species and functional annotation information;
(3) Machine learning algorithms are applied to mine the literature and large-scale clinical data, identifying the high-frequency and important virulence gene profile of each pathogenic bacterium along with its virulence factors and characterization (function/clinical phenotype). The genes are grouped into virulence factor units according to their synergistic effects, and a knowledge base strongly associating important virulence genes with their virulence factors and characterization (function/clinical phenotype) is established, which offers greater reference value for clinical diagnosis and prognosis;
(4) An alignment-score threshold filtering algorithm, derived from the analysis of a large number of samples, improves the sensitivity of virulence gene detection while maintaining alignment accuracy. It overcomes the limitations of the prior related techniques in identifying low-abundance, short-read samples and is particularly suitable for detecting and identifying virulence genes in clinical samples with single-end short reads (50-75 bp) and low nucleic acid content (such as cerebrospinal fluid);
(5) The alignment results of the metagenomic data and the virulence factor and characterization (function/clinical phenotype) information of the important virulence genes are integrated in the automated clinical reporting system, giving the system higher reliability and clinical practicality.
Drawings
FIG. 1 is a flow chart of a method for detecting a virulence gene of a clinically important pathogen according to an embodiment of the present invention;
FIG. 2 is a flow chart of the operation of the system for detecting virulence genes of clinically important pathogenic bacteria according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Example 1
As shown in figure 1, the method for detecting the virulence gene of clinically important pathogenic bacteria based on metagenome mainly comprises the following steps:
1. establishing clinical pathogenic bacteria virulence gene database
1.1. obtaining 1761 virulence genes and their sequences for 24 important pathogenic bacteria (covering 18 genera; 10 gram-negative and 14 gram-positive bacteria) from a virulence database such as VFDB;
1.2. downloading all genomes, gene sequences and annotation information of the 24 pathogens from a public database (NCBI RefSeq);
1.3. filtering out pseudogenes, fragments and misannotated sequences from the downloaded sequences using self-developed software;
1.4. clustering the sequences of each gene unit at multiple thresholds, and performing cross-comparison and de-duplication within groups;
1.5. testing the gene unit reference sequences against a simulated data set, and adjusting and supplementing the gene unit reference sequences;
1.6. repeating step 1.4: clustering the gene unit sequences and performing cross-comparison and de-duplication within groups;
1.7. clustering the reference sequences of all gene units and filtering out abnormal sequences;
1.8. extracting NCBI annotation information such as gene names and species names of the reference sequences using regular expressions, and proofreading and standardizing the annotations of the reference sequences of all gene units;
1.9. establishing reference sequence indexes for all virulence gene units;
1.10. the software (VF_MKDB) automates sequence download, cluster de-duplication, and database updating and standardization.
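The annotation extraction of step 1.8 can be illustrated with a minimal Python sketch. It assumes RefSeq-style FASTA definition lines of the form ">ACCESSION description [Species name]"; the function name parse_refseq_header, the returned field names and the example header are hypothetical and are not taken from VF_MKDB.

```python
import re

# Assumed header layout: ">ACCESSION free-text description [Species name]".
HEADER_RE = re.compile(
    r"^>(?P<accession>\S+)\s+(?P<description>.*?)\s*\[(?P<species>[^\[\]]+)\]\s*$"
)

def parse_refseq_header(line):
    """Return accession, description and species of one FASTA header, or None."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None  # header does not follow the assumed layout
    # Collapse repeated whitespace so annotations compare consistently.
    return {key: " ".join(value.split()) for key, value in match.groupdict().items()}

# Made-up header, for illustration only.
example = ">WP_000000001.1 hypothetical toxin subunit A [Escherichia coli]"
print(parse_refseq_header(example))
```

Headers that do not match return None, so they can be routed to manual proofreading rather than silently mis-annotated.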
2. Obtaining the raw metagenomic sequencing data of clinical samples and preprocessing the raw data to obtain target data
2.1. filtering out reads in which bases with a quality value below 2 account for 40% of the whole read;
2.2. trimming bases where the average base quality within the sliding window (5 bp) is below 20;
2.3. filtering out reads with an average quality below 20, reads with more than 5 N bases, and reads shorter than 50 bp.
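Steps 2.1 to 2.3 can be sketched in Python as follows. This is a minimal illustration, not the preprocessing software actually used: it assumes that qualities are supplied as integer Phred values, that "40%" means 40% or more, and that the sliding window moves from the 5' end and removes everything from the first failing window onward. All function and parameter names are hypothetical.

```python
def trim_sliding_window(seq, quals, window=5, min_mean=20):
    """Cut the read at the first window whose mean quality is below min_mean."""
    for start in range(max(len(seq) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals

def preprocess_read(seq, quals, min_len=50, max_n=5, min_mean=20, max_low_frac=0.40):
    """Return the cleaned (seq, quals) pair, or None if the read is discarded."""
    # Step 2.1: discard reads dominated by bases with a quality value below 2.
    if not quals or sum(q < 2 for q in quals) / len(quals) >= max_low_frac:
        return None
    # Step 2.2: sliding-window trimming (trimming direction is an assumption).
    seq, quals = trim_sliding_window(seq, quals)
    # Step 2.3: length, N-count and mean-quality filters.
    if len(seq) < max(min_len, 1) or seq.count("N") > max_n:
        return None
    if sum(quals) / len(quals) < min_mean:
        return None
    return seq, quals
```

A read that passes returns its trimmed sequence and qualities; any failed criterion returns None so the caller can count discarded reads.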
3. Analyzing the target data with the preset metagenomic sequencing data multiple alignment system and multiple annotation system, and identifying virulence genes using the following two-step alignment strategy and decision method based on metagenomic sequencing reads:
3.1. the set of reference sequences for a particular virulence gene is defined as {s_1, s_2, …, s_n}, where s_n is reference sequence n;
3.2. the high-quality reads (clean reads) of the metagenome are aligned to the reference sequence set using a multiple alignment algorithm (threshold: e-value = 1e-5);
3.3. the alignment results of each read are {R_1, R_2, …, R_m} ∈ g_i, where 0 ≤ m ≤ n; R_m is the result of the m-th alignment; g_i is the i-th gene unit;
3.4. step one: if there is a single alignment result (m = 1), that result (R_m) is taken as the final result (r);
3.5. if there are multiple alignment results (m > 1), there are two cases:
3.5.1. the target reference sequences belong to the same gene unit of the same species; after scoring and ranking, the final result r_i is as follows:
Figure GDA0003737837600000101
3.5.2. the target reference sequences belong to different gene units of the same species (g_i, i > 1); the final result r is the union of the best results from each gene group, {r_1, r_2, …, r_i}, where the result within group g_i is as follows:
Figure GDA0003737837600000102
3.6. step two: filtering strategy for the virulence gene detection result (VF-result):
Figure GDA0003737837600000103
where id = sequence similarity score (%);
Figure GDA0003737837600000104
score is the quality score of the pairwise sequence alignment;
filtering condition: a VF-result belonging to g_i, i > 1, is discarded and the result is None;
3.7. the software (VF_Finder) automates alignment, filtering and result list generation.
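The two-step decision of steps 3.4 to 3.6 can be sketched as below. The scoring and threshold formulas appear only as figures in the published text, so several points are assumptions: the best hit within a gene unit is taken as the one with the highest alignment score (ties broken by identity), coverage is computed as 100 × aligned length / read length, the thresholds min_id and min_cov are placeholder values, and a read whose surviving hits span more than one gene unit is discarded. The names Hit, best_per_gene_unit and filter_hits are hypothetical and do not describe VF_Finder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    ref_id: str        # reference sequence identifier (s_n)
    gene_unit: str     # gene unit g_i that the reference belongs to
    score: float       # quality score of the pairwise alignment
    identity: float    # id, sequence similarity (%)
    aligned_len: int   # number of aligned bases
    read_len: int      # length of the sequencing read

    @property
    def coverage(self):
        # Assumed definition: cov = 100 * aligned length / read length.
        return 100.0 * self.aligned_len / self.read_len

def best_per_gene_unit(hits):
    """Step one: keep the top-ranked hit of every gene unit."""
    best = {}
    for hit in hits:
        current = best.get(hit.gene_unit)
        if current is None or (hit.score, hit.identity) > (current.score, current.identity):
            best[hit.gene_unit] = hit
    return list(best.values())

def filter_hits(hits, min_id=90.0, min_cov=80.0):
    """Step two: apply placeholder thresholds; ambiguous or empty results give None."""
    kept = [h for h in best_per_gene_unit(hits)
            if h.identity >= min_id and h.coverage >= min_cov]
    if len(kept) != 1:
        return None  # nothing passed, or hits remain in more than one gene unit
    return kept[0]
```

Returning None for ambiguous reads mirrors the stated filtering condition; a graded threshold scheme would replace the two fixed cut-offs used here.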
4. Establishing important virulence gene-virulence factor-characterization (function/clinical phenotype) correlation database
4.1. collecting metagenomic sequencing data from clinical samples of the 24 pathogens (approximately 50 samples per pathogen);
4.2. analyzing the data with the metagenomic sequencing data multiple alignment system and multiple annotation system, and constructing the virulence gene profile of each single pathogenic bacterium in each sample;
4.3. extracting and standardizing the corresponding clinical phenotypes and physiological and biochemical indexes of the samples, mainly including: white blood cell count, neutrophil count, monocyte fraction, lymphocyte fraction, C-reactive protein, endotoxin, etc.;
4.4. analyzing and extracting gene characteristics by using a maximum likelihood method:
4.4.1. extraction of protein sequence physicochemical characteristics (PAAC) of virulence genes:
Figure GDA0003737837600000105
a_i: the physicochemical feature set of the 20 amino acids,
Figure GDA0003737837600000106
the n-th physicochemical characteristic; N: the total number of physicochemical characteristics of an amino acid;
where the physicochemical characteristic of a single amino acid is as follows:
Figure GDA0003737837600000111
n = 1, 2, …, N; x = 1 to n;
for any two amino acids R_b and R_d, the correlation is:
Figure GDA0003737837600000112
F_q(R_b) is the q-th physicochemical characteristic of R_b, and F_q(R_d) is the q-th physicochemical characteristic of R_d;
for an amino acid sequence of length L, the sequence position correlation parameter θ_h is defined as follows:
Figure GDA0003737837600000113
then, the physicochemical feature extraction formula for amino acid e in the (20 + λ)-dimensional (λ = 2) sequence representation is as follows:
Figure GDA0003737837600000114
where f_e is the frequency of amino acid e in the sequence, and ω is the weight parameter for amino acid position in the sequence, with a default value of 0.1.
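The symbol definitions in 4.4.1 match the widely used pseudo amino acid composition (PseAAC) formulation. For orientation, that standard formulation is written out below with the symbols of this section; it is given as an assumption about what the formulas of 4.4.1 express, not as a transcription of them. F_n^0 denotes a raw (unstandardized) physicochemical value and X_e the e-th feature component; both symbols are introduced here for illustration only.

```latex
% Standardization of the n-th physicochemical characteristic over the 20 amino acids
F_n(a_i) = \frac{F_n^{0}(a_i) - \frac{1}{20}\sum_{x=1}^{20} F_n^{0}(a_x)}
                {\sqrt{\frac{1}{20}\sum_{x=1}^{20}\Bigl[F_n^{0}(a_x) - \frac{1}{20}\sum_{y=1}^{20} F_n^{0}(a_y)\Bigr]^{2}}},
\qquad n = 1, 2, \dots, N

% Correlation between two amino acids R_b and R_d
\Theta(R_b, R_d) = \frac{1}{N}\sum_{q=1}^{N}\bigl[F_q(R_b) - F_q(R_d)\bigr]^{2}

% Sequence position correlation parameter for a sequence of length L
\theta_h = \frac{1}{L-h}\sum_{j=1}^{L-h}\Theta(R_j, R_{j+h}),
\qquad h = 1, \dots, \lambda

% (20 + \lambda)-dimensional feature vector
X_e =
\begin{cases}
\dfrac{f_e}{\sum_{k=1}^{20} f_k + \omega\sum_{h=1}^{\lambda}\theta_h}, & 1 \le e \le 20,\\[2ex]
\dfrac{\omega\,\theta_{e-20}}{\sum_{k=1}^{20} f_k + \omega\sum_{h=1}^{\lambda}\theta_h}, & 21 \le e \le 20+\lambda .
\end{cases}
```

With λ = 2 and ω = 0.1 as stated above, each protein sequence is represented by a 22-dimensional vector: 20 composition terms plus two sequence-order terms.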
4.4.2. Extraction of evolutionary features (PSSM) of virulence protein sequences:
the protein sequences of the virulence genes within a gene unit are converted with PSI-BLAST into the original PSSM (Position-Specific Scoring Matrix) as follows:
Figure GDA0003737837600000115
where L is the length of the sequence; the 20 columns represent the 20 natural amino acids; p_{u,v} is the likelihood that the u-th amino acid mutates to the v-th amino acid during evolution;
PSSM-C (PSSM-composition) transforms the PSSM matrix into a 20×20 matrix, in which the row Z_u for the u-th amino acid is calculated as follows:
Figure GDA0003737837600000116
where
Figure GDA0003737837600000117
z_t: the value at position t in the original PSSM table; p_t: the amino acid at position t in the sequence; L: the length of the sequence; a_u: the u-th amino acid among the 20 amino acids.
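The PSSM-C transformation can be sketched in Python under the usual PSSM-composition definition, which is an assumption here: row u is the sum of the PSSM rows z_t at all positions t where p_t equals a_u, divided by L. The amino acid order in AMINO_ACIDS follows the column order commonly found in PSI-BLAST ASCII PSSM output; the function name pssm_composition is hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed PSSM column order

def pssm_composition(sequence, pssm):
    """Collapse an L x 20 PSSM into a 20 x 20 PSSM-composition matrix.

    Row u is the sum of the rows z_t at every position t whose residue p_t
    equals amino acid a_u, divided by the sequence length L.
    """
    length = len(sequence)
    if length == 0 or length != len(pssm):
        raise ValueError("sequence and PSSM must have the same non-zero length")
    composition = {aa: [0.0] * 20 for aa in AMINO_ACIDS}
    for residue, row in zip(sequence, pssm):
        if residue in composition:  # non-standard residues such as X are skipped
            for v, value in enumerate(row):
                composition[residue][v] += value
    return [[value / length for value in composition[aa]] for aa in AMINO_ACIDS]
```

Flattening the returned matrix row by row gives a fixed-length 400-dimensional feature vector regardless of the protein length, which is what allows sequences of different lengths to be fed to the same classifier.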
4.5. constructing a virulence gene characteristic spectrum related to clinical diagnosis by combining routine clinical test indexes with the sequence features (PAAC and PSSM-C) of virulence genes and applying multiple machine learning strategies (multi-task logistic regression, random forest, support vector machine and the like);
4.6. clustering synergistic virulence genes into single virulence factor units and associating them with the corresponding characterizations (function/clinical phenotype);
4.7. constructing virulence factor-virulence gene and virulence factor-characterization (function/clinical phenotype) association tables, and establishing the clinically important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
4.8. the software (VF-KDB) automates data collection, analysis and updating.
5. Generating a virulence gene identification report from the virulence gene identification result and the association database using the preset clinical automated reporting system
5.1. importing the result list produced by the metagenomic sequencing data multiple alignment system and multiple annotation system, comparing it against the important virulence gene-virulence factor-characterization (function/clinical phenotype) association database, and automatically generating a gene result list (text format) comprising the pathogenic bacteria species (Latin and Chinese species names) and gene information (gene name, virulence factor, characterization, support score, etc.);
5.2. the program automatically imports the results into the corresponding tables of the report template;
5.3. the program automatically imports the customer information from the database into the report template;
5.4. the final virulence gene identification report (PDF format) is generated for the specific pathogenic bacteria in the clinical sample.
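The lookup in step 5.1 amounts to joining the result list with the association database. A minimal Python sketch follows; the field names, the dictionary layout keyed by (species, gene) and the function name build_report_rows are hypothetical, and template filling and PDF rendering are deliberately left out.

```python
def build_report_rows(result_list, association_db):
    """Join detected genes with their virulence factor and characterization."""
    rows = []
    for hit in result_list:
        annotation = association_db.get((hit["species"], hit["gene"]))
        if annotation is None:
            continue  # gene is absent from the association database
        rows.append({
            "species": hit["species"],
            "gene": hit["gene"],
            "virulence_factor": annotation["virulence_factor"],
            "characterization": annotation["characterization"],
            "support_score": hit["support_score"],
        })
    # Group by species and list the best supported genes first.
    return sorted(rows, key=lambda row: (row["species"], -row["support_score"]))

# Illustrative data only.
association_db = {
    ("Staphylococcus aureus", "hla"): {
        "virulence_factor": "alpha-hemolysin",
        "characterization": "pore-forming toxin",
    },
}
result_list = [
    {"species": "Staphylococcus aureus", "gene": "hla", "support_score": 12},
    {"species": "Staphylococcus aureus", "gene": "unlisted_gene", "support_score": 3},
]
for row in build_report_rows(result_list, association_db):
    print(row)
```

Keeping this join separate from the rendering step means the same rows can feed both the text-format result list and the report template tables.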
As shown in fig. 2, the metagenome-based system for detecting virulence genes of clinically important pathogenic bacteria comprises the following components:
1. a clinical pathogen virulence gene database;
2. an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
3. a metagenomic sequencing data multiple alignment system and a multiple annotation system;
4. a clinical automated reporting system.
While the preferred embodiments and examples of the present invention have been described in detail, the present invention is not limited to the embodiments and examples, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (7)

1. A metagenome-based method for detecting virulence genes of clinically important pathogenic bacteria is characterized by comprising the following steps:
S10, establishing a clinical pathogenic bacterium virulence gene database;
S20, acquiring original data of clinical sample metagenome sequencing, and preprocessing to obtain target data;
S30, analyzing the target data by using a preset metagenome sequencing data multiple comparison annotation system, and identifying virulence genes;
S40, establishing an important virulence gene-virulence factor-characterization correlation database;
S50, generating a virulence gene identification report based on the virulence gene identification result and the associated database by using a preset clinical automation report system;
the S30 includes:
the set of reference sequences for a particular virulence gene is defined as {s_1, s_2, …, s_n}, where s_n is reference sequence n and n is the total number of reference sequences;
aligning the high-quality reads of the metagenome to the reference sequence set using a multiple alignment algorithm, with a threshold of e-value = 1e-5;
the alignment results of each read are {R_1, R_2, …, R_m} ∈ g_i, where 0 ≤ m ≤ n; R_m is the result of the m-th alignment; g_i is the i-th gene unit;
filtering strategy for the virulence gene detection result:
VF-result=
Figure 353395DEST_PATH_IMAGE001
where id = sequence similarity score (%);
cov=100
Figure 680472DEST_PATH_IMAGE002
Figure 168085DEST_PATH_IMAGE003
);
score is the quality score of the pairwise sequence alignment;
filtering condition: a VF-result belonging to g_i, i > 1, is discarded and no result is obtained;
the alignment result in S30 includes:
when there is a single alignment result, that result is taken as the final result;
when there are multiple alignment results and the target reference sequences belong to the same gene unit of the same species, after scoring and ranking, the final result r_i is as follows:
Figure 846191DEST_PATH_IMAGE004
=
Figure 77452DEST_PATH_IMAGE005
where Score is the quality score of the pairwise sequence alignment; R_m is the result of the m-th alignment; max selects the highest-ranked target reference sequence after the scores within the same gene unit of the same species are sorted; identity is the identity calculation function;
when there are multiple alignment results and the target reference sequences belong to different gene units of the same species, the final result is the union of the best results from each gene group, where the result within group g_i is as follows:
Figure 524614DEST_PATH_IMAGE006
=
Figure 183128DEST_PATH_IMAGE007
where Score is the quality score of the pairwise sequence alignment; R_m is the result of the m-th alignment; max selects the highest-ranked target reference sequence after the scores within the same gene unit of the same species are sorted; identity is the identity calculation function;
the S40 includes:
collecting metagenome sequencing data of a pathogenic bacteria clinical sample;
analyzing the data based on the S20 and S30, and constructing a virulence gene spectrum of the single pathogenic bacteria of each sample;
extracting and standardizing corresponding clinical phenotype and physiological and biochemical indexes of the sample;
analyzing and extracting gene characteristics by using a maximum likelihood method;
combining clinical routine detection indexes and sequence characteristics of virulence genes, and applying a multiple machine learning strategy to construct a virulence gene characteristic spectrum related to clinical diagnosis;
clustering virulence genes with synergistic action to a single virulence factor unit, and associating corresponding representations;
constructing a virulence factor-virulence gene and virulence factor-characterization association table, and establishing a clinically important virulence gene-virulence factor-characterization association database;
the S50 includes:
importing the result list obtained in the S30, and comparing the result list with the S40 association database to generate a virulence gene result which comprises pathogenic bacteria species and gene information;
importing the result into a corresponding table of a report template;
importing the client information of the database into a report template;
generating a virulence gene identification report of the specific pathogenic bacteria of the final clinical sample;
the characterization is a functional characterization or a clinical phenotypic characterization.
2. The method according to claim 1, wherein the S30 further comprises:
software is established to realize automatic comparison, filtering and result list generation.
3. The method according to claim 1, wherein the S40 further comprises:
software is established to realize automatic comparison, filtering and result list generation.
4. The method according to claim 1, wherein the S10 includes:
obtaining the virulence genes and sequences of clinical pathogenic bacteria from a virulence gene database;
acquiring all genomes, gene sequences and annotation information of the clinical pathogenic bacteria from a public database;
filtering pseudogenes, fragments, and misannotated sequences in the gene sequence;
clustering each gene unit sequence by multiple thresholds, and performing cross comparison and de-duplication in groups;
simulating a data set to test a gene unit reference gene sequence, and adjusting a supplementary gene unit reference sequence;
clustering each gene unit sequence by circulating multiple threshold values, and performing cross comparison and duplication removal in groups;
clustering reference sequences of all gene units, and filtering abnormal sequences;
extracting public database annotation information of the reference sequence, and proofreading and standardizing reference sequence annotations of each gene unit;
and establishing reference sequence indexes of all virulence gene units.
5. The method according to claim 1, wherein the S10 further comprises:
and establishing a database for realizing automatic downloading sequence, cluster duplicate removal, updating and standardization by software.
6. The method according to claim 1, wherein the S20 includes:
filtering out reads in which bases with a quality value below 2 account for 40% of the whole read;
trimming bases where the average base quality within the window is below 20;
filtering out reads with an average quality below 20, reads with more than 5 N bases, and reads with a length below 50.
7. A system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome comprises the following components:
a clinical pathogen virulence gene database;
an important virulence gene-virulence factor-characterization association database;
a multiple alignment annotation system for metagenomic sequencing data;
a clinical automated reporting system;
the system employs the method of claim 1 for detection.
CN202110579642.1A 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome Active CN113223618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579642.1A CN113223618B (en) 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome

Publications (2)

Publication Number Publication Date
CN113223618A CN113223618A (en) 2021-08-06
CN113223618B true CN113223618B (en) 2022-09-16

Family

ID=77099541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579642.1A Active CN113223618B (en) 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome

Country Status (1)

Country Link
CN (1) CN113223618B (en)

Also Published As

Publication number Publication date
CN113223618A (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant