CN113223618A - Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome - Google Patents

Publication number
CN113223618A
Authority
CN
China
Prior art keywords
virulence
gene
clinical
sequence
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110579642.1A
Other languages
Chinese (zh)
Other versions
CN113223618B (en)
Inventor
夏涵
官远林
江月
樊淑
杨静
胡煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuguo Microcode Biotechnology Co ltd Of Xixian New Area
Yuguo Zhizao Technology Beijing Co ltd
Yuguo Biotechnology Beijing Co ltd
Original Assignee
Yuguo Microcode Biotechnology Co ltd Of Xixian New Area
Yuguo Zhizao Technology Beijing Co ltd
Yuguo Biotechnology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yuguo Microcode Biotechnology Co ltd Of Xixian New Area, Yuguo Zhizao Technology Beijing Co ltd, Yuguo Biotechnology Beijing Co ltd
Priority to CN202110579642.1A (patent CN113223618B)
Publication of CN113223618A
Application granted
Publication of CN113223618B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a metagenome-based method and system for detecting virulence genes of clinically important pathogenic bacteria. The method comprises the following steps: S10, establishing a virulence gene database of clinical pathogenic bacteria; S20, acquiring raw metagenomic sequencing data of a clinical sample and preprocessing it to obtain target data; S30, analyzing the target data with a preset multiple-alignment annotation system for metagenomic sequencing data and identifying virulence genes; S40, establishing an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database; and S50, generating a virulence gene identification report from the virulence gene identification result and the association database with a preset clinical automated reporting system. The system can identify virulence genes in metagenomic sequencing data from different types of clinical infection samples (cerebrospinal fluid and the like), can identify multiple important virulence genes of multiple pathogenic bacteria in a sample in a single run, offers good sensitivity and accuracy, and helps doctors diagnose, treat and assess prognosis in a timely manner.

Description

Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome
Technical Field
The invention belongs to the technical field of bioinformatics algorithms and software. It can be applied to clinical pathogen detection products, namely the analysis of virulence genes of clinical pathogenic bacteria in pathogen metagenomic testing, and covers hundreds of virulence genes of various clinical pathogenic bacteria. Fields of application: identifying, characterizing and tracing the source of virulence genes of pathogenic bacteria detected by pathogen metagenomics in samples such as tissues and body fluids (cerebrospinal fluid, alveolar lavage fluid, blood and sputum) from patients with various infectious diseases, thereby assisting clinicians in accurate diagnosis, choice of treatment and prognosis, and providing useful information for the surveillance of bacterial infectious diseases.
Background
Bacterial infection can cause a variety of acute and chronic diseases, and opportunistic pathogens can also cause disease through interactions among pathogen, host and environmental factors; some clinical pathogenic bacteria seriously endanger human life and health. For example, Staphylococcus aureus often causes pyogenic infections in humans and can directly cause pneumonia, pseudomembranous enteritis and pericarditis, and even systemic infections such as septicemia and sepsis. Klebsiella pneumoniae is widely present on animal mucosal surfaces (such as the human gastrointestinal tract) and in the environment, and is a major pathogen of healthcare-associated infections and severe community-acquired infections. In China, Klebsiella pneumoniae accounts for 11.9% of pathogens isolated from ventilator-associated pneumonia and intensive care unit-acquired pneumonia. Streptococcus pneumoniae is one of the main pathogens of community-acquired pneumonia, otitis media, meningitis, abscesses, septicemia and the like. In developing countries, more than 1.1 million children die of pneumonia each year, with Streptococcus pneumoniae accounting for approximately 20%. Pseudomonas aeruginosa, a Gram-negative bacterium that readily colonizes and infects the respiratory tract, particularly in immunocompromised people, is another pathogen that is difficult to treat, carries a high mortality rate and often shows multidrug or pan-drug resistance. In the course of infecting humans, these clinical pathogens often exert their pathogenicity through multiple virulence genes, leading to the development of disease.
Virulence factors is a general term for the effector or regulatory molecules (proteins, lipid molecules, compounds and the like), and functional units formed by combinations of them, that are produced by pathogenic microorganisms and cause disease in the host. The genes encoding these virulence factors are usually called virulence genes. For example, Staphylococcus aureus adheres to, infects and spreads among host cells by producing various virulence factors, and can evade the host immune system or antibiotics by forming a biofilm. Its important virulence genes include pvl, sea, seb and others. pvl promotes lysis of neutrophils and gives the strain strong pathogenicity; it is associated with suppurative infections of the skin and soft tissue, and in severe cases can cause necrotizing pneumonia with a high fatality rate. The enterotoxin genes sea, seb and others stimulate the vomiting center and cause acute gastroenteritis with vomiting as the main symptom, and are a main cause of bacterial food poisoning in humans. Streptococcus pneumoniae also has various virulence genes, such as the capsular polysaccharide synthetase A gene, the pneumolysin gene (ply), lytA and nanA. Capsule-related genes such as cps4A are a prerequisite for the pathogenicity of Streptococcus pneumoniae; the hemolysin pneumolysin lyses host cells, causes alveolar edema and hemorrhage, induces pneumonia, and can cause bacteremia by allowing bacteria to enter the bloodstream; lytA is involved in bacterial autolysis, which releases hemolysin and other components and may provoke a strong inflammatory response in the host.
Differences in the virulence genes carried by different strains of Klebsiella pneumoniae lead to differences in their pathogenicity. One important virulence gene of Klebsiella pneumoniae is rmpA, which regulates the synthesis of capsular polysaccharide and produces the hypermucoviscous phenotype, making strains highly pathogenic; together with virulence genes such as iutA, it influences the formation of liver abscesses. Virulence genes of this kind encode virulence factors such as toxins and surface proteins, which help bacteria adhere to and invade host cells, improve bacterial survival and replication within host cells, and cause toxic death of host cells, ultimately causing various infectious diseases in the host. Detecting and identifying the virulence genes of high-frequency or important pathogenic bacteria in clinical samples therefore helps to identify potential pathogens, assess the virulence of clinical strains, and support the choice and implementation of specific measures for diagnosis, precise treatment and prognosis of clinical infectious diseases. In public health, moreover, detecting virulence genes and identifying their specific virulence profiles provides useful information for monitoring bacterial infectious diseases, judging the probability of an outbreak and assessing epidemic severity, and helps in proposing and implementing reasonable disease control measures.
Methods for detecting and identifying virulence genes of clinical pathogenic bacteria mainly include single-gene or multi-gene detection based on polymerase chain reaction (PCR) and its derivative technologies, loop-mediated isothermal amplification, gene chips, and metagenomic detection based on next-generation sequencing. Among these, PCR is the most widely used in clinical practice: specific primers are designed against conserved regions of the nucleic acid sequence of a given virulence gene, and nucleic acid from a clinical sample or an isolated strain is used as the template for amplification and detection. The technique detects genes rapidly and is highly sensitive. Its main clinical applications are: 1) multiplex PCR, in which two or more primer pairs are added to the same reaction system so that several nucleic acid fragments are amplified simultaneously and two or more virulence genes can be detected and identified at once; 2) fluorescent quantitative PCR, which adds real-time detection of the fluorescence signal of each cycle's product to conventional PCR, enabling both quantitative and qualitative analysis of the initial template DNA. Both PCR techniques, however, are complex to operate, place high demands on instruments and personnel, and are unsuitable for rapid on-site diagnosis.
Loop-mediated isothermal amplification (LAMP) is a nucleic acid amplification technology distinct from PCR. It relies on a DNA polymerase with strand displacement activity and 2 pairs of specially designed primers, needs neither repeated temperature cycling nor expensive instruments, completes the amplification reaction efficiently and quickly under isothermal conditions, and is now widely used to detect and identify pathogens such as bacteria, viruses and parasites. Compared with conventional PCR, LAMP offers high specificity, high sensitivity, simple operation, low equipment requirements and rapid nucleic acid amplification at a constant temperature. Its drawbacks are demanding primer design, difficulty in distinguishing non-specific amplification, and high susceptibility to contamination. A gene chip, also called a DNA microarray, is a dense molecular array formed by fixing a large number of DNA probes, such as gene fragments and synthetic oligonucleotides, on a carrier in a pre-designed arrangement by in-situ synthesis, micro-spotting or other methods. The array is hybridized with a nucleic acid sample labeled with fluorescein or by other means, and the presence and quantity of a target gene in the sample are determined from the strength of the hybridization signal. Recent developments have brought gene chip technology into use in gene expression analysis, mutation and polymorphism analysis and other fields. Compared with PCR or LAMP, gene chips can detect a large number of genes in a single experiment and offer speed, high parallelism, diversity and automation.
On the other hand, gene chips are costly, demanding to operate and poorly sensitive, which limits their range of application. PCR, LAMP and gene chips all require prior knowledge of the sample, can detect only specific virulence genes of specific bacteria, cope poorly with genes that vary widely or unpredictably, and cannot fully cover the clinically significant virulence genes. Metagenomic sequencing based on next-generation sequencing, which has developed rapidly in recent years, has unique advantages in overcoming these shortcomings. It requires no separate isolation and culture of pathogens: clinical samples can be analyzed directly after nucleic acid extraction and purification, and comprehensive virulence gene annotation and identification are carried out by sequence homology alignment.
A clinical infectious disease sample may contain multiple pathogenic bacteria whose pathogenic mechanisms involve different virulence factors, and pathogenicity arises from the coordinated regulation of multiple virulence genes. The prior art for detecting microbial virulence genes comprises PCR and its derivative technologies, loop-mediated isothermal amplification, gene chips and the like, and suffers from a limited number and range of detectable virulence genes, a need for prior knowledge, and susceptibility to cross-contamination. Current products for virulence gene identification are based primarily on PCR, which can detect only a limited range of bacteria and a limited number of virulence genes. In particular, conventional PCR can detect only one virulence gene of one bacterium per experiment; for example, Chinese patent publication CN110669853A can detect only the ampR gene of non-mucoid Klebsiella pneumoniae. Even with multiplex PCR, an excessive number of primer pairs readily forms dimers and reduces amplification efficiency, so the number of virulence genes that can be detected remains small. For example, Chinese patent publication CN111876509A uses multiplex PCR to detect four virulence genes of Acinetobacter baumannii, such as abaR, CsuA and bap, at a time; the multiplex PCR product of Chinese patent publication CN109554449A detects 7 virulence genes of Aeromonas; and Chinese patent publication CN108707680A designs a seven-plex PCR detection primer set that covers only specific regions of 21 virulence genes of Streptococcus agalactiae, such as sip, fbsA and hylB.
Multiplex fluorescent PCR is also applied to virulence gene detection because its results are easy to interpret; for example, Chinese patent publication CN112430677A, filed in 2020, designs a multiplex fluorescent PCR that quantitatively detects three virulence genes of Klebsiella pneumoniae: iucA, rmpA1 and rmpA2. Loop-mediated isothermal amplification, developed in recent years, is likewise applied to clinical virulence gene detection. For example, Chinese patent publication CN11150075A uses 2 pairs of primers to amplify 6 different regions of the peg-344 gene of hypervirulent Klebsiella pneumoniae in order to identify clinical hypervirulent strains. Gene chips are more expensive and are less often used for clinical virulence gene detection. The product of Chinese patent No. CN105950732B, filed in 2016, is designed to identify 17 virulence genes of 9 animal-derived foodborne pathogens, including Salmonella, Enterococcus and Clostridium perfringens. All of these prior techniques require specific primers for one or several known genes to be designed or selected before the experiment, so only virulence genes within a preset range can be detected. Clinically, a more sensitive and comprehensive strategy for detecting virulence genes of infectious pathogenic bacteria is needed to meet China's requirements for the diagnosis, treatment and epidemiological surveillance of important pathogens with high incidence and high virulence. Metagenomic sequencing, developed in recent years, takes the whole microbial community of a specific habitat as its object of study: the DNA of all microorganisms in a clinical sample is extracted directly for sequencing, annotation and comparative analysis.
This technology makes up for the shortcomings of the earlier methods: it requires neither culture nor prior knowledge of the sample, and it can scan and identify virulence genes comprehensively across the metagenome of clinical pathogens. No published Chinese patent application or funded project to date offers a product or similar project for metagenome-based virulence gene detection; developing and promoting such a product helps to meet the need for diagnosing highly virulent pathogenic bacteria in clinical infectious diseases.
Disclosure of Invention
This patent provides a metagenome-based method and system for detecting virulence genes of clinically important pathogenic bacteria, including but not limited to the identification of hundreds of important virulence genes of various pathogenic bacteria such as Klebsiella pneumoniae, Streptococcus pneumoniae, Escherichia coli, Haemophilus influenzae and Staphylococcus aureus, for example rmpA, iucA, ply, cps, stx1A, bexA, lukF-PV, hly, ompA, plc, cylL, ctxA, eccA1, lipA, slo, acm, icmTlef, toxA, pgm and the like. The method comprises the following main parts: 1) establishing a virulence gene database of clinical pathogenic bacteria; 2) obtaining raw metagenomic sequencing data of a clinical sample and preprocessing it to obtain target data; 3) analyzing the target data with a preset multiple-alignment annotation system for metagenomic sequencing data and identifying virulence genes; 4) establishing an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database; 5) generating a virulence gene identification report from the virulence gene identification result and the association database with a preset clinical automated reporting system. The method suits multiple sample types from clinical infectious diseases (cerebrospinal fluid, alveolar lavage fluid, blood and the like) and identifies multiple high-frequency and important virulence genes of multiple clinical pathogenic bacteria in a single run, reducing additional screening time. The in-depth association database and the multiple-alignment strategy provide higher sensitivity and accuracy, and the automated reporting system generates reports rapidly, helping doctors identify, diagnose and treat infections by highly virulent pathogenic strains and assess their prognosis in a timely manner.
The first aspect of the invention discloses a metagenome-based method for detecting virulence genes of clinically important pathogenic bacteria, which comprises the following steps:
S10, establishing a virulence gene database of clinical pathogenic bacteria;
S20, acquiring raw metagenomic sequencing data of a clinical sample and preprocessing it to obtain target data;
S30, analyzing the target data with a preset multiple-alignment annotation system for metagenomic sequencing data and identifying virulence genes;
S40, establishing an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
and S50, generating a virulence gene identification report from the S30 virulence gene identification result and the S40 association database with a preset clinical automated reporting system.
In some embodiments of the present invention, S10 includes the following steps:
obtaining the virulence genes and sequences of clinical pathogenic bacteria from a virulence gene database;
acquiring all genomes, gene sequences and annotation information of the clinical pathogenic bacteria from public databases;
filtering out pseudogenes, fragments and misannotated sequences from the gene sequences;
clustering the sequences of each gene unit at multiple thresholds, and cross-aligning and de-duplicating within groups;
testing the reference gene sequences of each gene unit against simulated data sets, and adjusting and supplementing the gene unit reference sequences;
iteratively repeating the multi-threshold clustering of each gene unit's sequences, with cross-alignment and de-duplication within groups;
clustering the reference sequences of all gene units and filtering out abnormal sequences;
extracting public database annotation information, such as gene names and species names, for the reference sequences, and proofreading and standardizing the reference sequence annotations of each gene unit;
building reference sequence indexes for all virulence gene units;
and, optionally, building software to automate sequence download, clustering and de-duplication, updating and standardization of the database.
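The filtering and multi-threshold de-duplication steps of S10 can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the patent names no clustering tool, so `difflib.SequenceMatcher` is swapped in as a crude similarity measure, and the keyword list, the minimum length of 150 and the thresholds (1.0, 0.95, 0.90) are hypothetical values chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical annotation keywords that mark records to discard.
EXCLUDE_KEYWORDS = ("pseudogene", "partial", "fragment")


def keep_record(description: str, sequence: str, min_len: int = 150) -> bool:
    """Drop records flagged as pseudogenes/fragments or shorter than min_len."""
    text = description.lower()
    return len(sequence) >= min_len and not any(k in text for k in EXCLUDE_KEYWORDS)


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for a real alignment identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: list[str], threshold: float) -> list[str]:
    """Keep one representative per cluster, visiting the longest sequences first."""
    reps: list[str] = []
    for s in sorted(set(seqs), key=lambda x: (-len(x), x)):
        # A sequence becomes a new representative only if it is not
        # similar enough to any representative chosen so far.
        if all(identity(s, r) < threshold for r in reps):
            reps.append(s)
    return reps


def multi_threshold_dedup(seqs: list[str], thresholds=(1.0, 0.95, 0.90)) -> list[str]:
    """Cluster repeatedly at descending thresholds, keeping representatives."""
    reps = list(seqs)
    for t in thresholds:
        reps = greedy_cluster(reps, t)
    return reps
```

In practice the sequences of one gene unit would be passed to `multi_threshold_dedup` at a time, so that representatives are only merged within the same unit.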
In some embodiments of the invention, S20 includes:
filtering out reads in which bases with a quality value below 2 account for 40% of the whole read;
trimming bases where the average quality within a sliding window (5 bp) is below 20;
filtering out reads with an average quality below 20, more than 5 N bases, or a length below 50.
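The three preprocessing filters can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the first filter is read as "bases with quality below 2 make up 40% or more of the read", and the sliding-window step is assumed to cut the read at the first window whose mean quality drops below 20. Quality values are taken to be already decoded into integer Phred scores.

```python
def low_quality_fraction(quals: list[int], cutoff: int = 2) -> float:
    """Fraction of bases whose quality value is below the cutoff."""
    return sum(q < cutoff for q in quals) / len(quals)


def sliding_window_trim(seq: str, quals: list[int], window: int = 5, min_mean: float = 20):
    """Cut the read at the first window whose mean quality is below min_mean."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


def preprocess_read(seq: str, quals: list[int]):
    """Return the cleaned (sequence, qualities) pair, or None if the read is rejected."""
    # Filter 1: too many very low quality bases.
    if not quals or low_quality_fraction(quals) >= 0.40:
        return None
    # Filter 2: sliding-window quality trimming (5 bp window, mean >= 20).
    seq, quals = sliding_window_trim(seq, quals)
    # Filter 3: length, N content and average quality of the trimmed read.
    if len(seq) < 50:
        return None
    if seq.upper().count("N") > 5:
        return None
    if sum(quals) / len(quals) < 20:
        return None
    return seq, quals
```

Reads that survive `preprocess_read` correspond to the high-quality reads (clean reads) that are passed on to S30.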
In some embodiments of the present invention, S30 includes the following steps:
the set of reference sequences for a particular virulence gene is defined as $\{s_1, s_2, \ldots, s_n\}$, where $s_n$ is reference sequence n;
aligning the high-quality reads (clean reads) of the metagenome to the reference sequence set with a multiple-alignment algorithm, with an e-value threshold of 1e-5;
the alignment results of each read are $\{R_1, R_2, \ldots, R_m\} \in g_i$, where $0 \le m \le n$; $R_m$ is the m-th alignment result; $g_i$ is the i-th gene unit;
filtering strategy for the detection of virulence genes (VF-result):
[Formula image BDA0003085682040000051]
where id = sequence similarity score (%);
[Formula image BDA0003085682040000052]
score is the quality score of the pairwise sequence alignment;
filtering condition: if the VF-result belongs to more than one gene unit ($g_i$, $i > 1$), it is discarded and no result is reported;
and, optionally, building software to automate alignment, filtering and result list generation.
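One reading of the filtering step can be sketched as follows. Only the e-value threshold of 1e-5 comes from the text; the identity and score thresholds are given in formula images that are not reproduced here, so `min_identity=90.0` and `min_score=50.0` are placeholder assumptions, as are the `Hit` record and its field names. The sketch also assumes that a read whose surviving hits fall into more than one gene unit is discarded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of a read to a reference sequence (hypothetical record)."""
    read_id: str
    ref_id: str
    gene_unit: str
    identity: float  # sequence similarity score, in percent
    score: float     # pairwise alignment quality score
    evalue: float


def filter_hits(hits, max_evalue=1e-5, min_identity=90.0, min_score=50.0):
    """Return {read_id: [hits]} for reads whose passing hits share one gene unit."""
    by_read = defaultdict(list)
    for h in hits:
        # Threshold filter: e-value from the text, the other two are placeholders.
        if h.evalue <= max_evalue and h.identity >= min_identity and h.score >= min_score:
            by_read[h.read_id].append(h)
    kept = {}
    for read_id, read_hits in by_read.items():
        # Discard reads whose remaining hits span more than one gene unit.
        if len({h.gene_unit for h in read_hits}) == 1:
            kept[read_id] = read_hits
    return kept
```

The returned dictionary plays the role of the result list that the later steps consume.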
In some embodiments of the invention, the alignment of S30 comprises:
when there is a single alignment result (m = 1), taking that result ($R_m$) as the final result (r);
when there are multiple alignment results (m > 1) and the target reference sequences belong to the same gene unit of the same species, the results are scored and sorted, and the final result $r_i$ is as follows:
[Formula image BDA0003085682040000061]
when there are multiple alignment results (m > 1) and the target reference sequences belong to different gene units of the same species ($g_i$, $i > 1$), the final result r is the union of the best results in each gene group, $\{r_1, r_2, \ldots, r_i\}$, where the result within group $g_i$ is as follows:
[Formula image BDA0003085682040000062]
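The three cases above can be sketched with a single function that keeps the best alignment per gene unit. Because the selection formulas are images that are not reproduced in this text, the sketch assumes that "best" means the highest alignment score and that ties go to the first hit seen; the function name and the tuple layout are hypothetical.

```python
def select_final(hits):
    """Pick the final result(s) for one read.

    hits: list of (ref_id, gene_unit, score) tuples for a single read.
    Returns {gene_unit: best_hit}, which covers all three cases:
      - one hit (m = 1): that hit is the result;
      - several hits in one gene unit: the top-scoring hit is kept;
      - hits in several gene units: the union of the best hit per unit.
    """
    best = {}
    for ref_id, unit, score in hits:
        # Assumption: a higher score is better; the first hit wins a tie.
        if unit not in best or score > best[unit][2]:
            best[unit] = (ref_id, unit, score)
    return best
```

For a read with hits `("s1", "rmpA", 80.0)` and `("s2", "rmpA", 95.0)`, the function keeps only the `s2` hit for the `rmpA` unit.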
In some embodiments of the present invention, S40 includes the following steps:
collecting metagenomic sequencing data from clinical samples containing pathogenic bacteria;
analyzing the data according to S20 and S30, and constructing a virulence gene profile for each individual pathogenic bacterium in each sample;
extracting and standardizing the corresponding clinical phenotypes and the physiological and biochemical indexes of the samples;
analyzing and extracting gene features using the maximum likelihood method;
combining routine clinical test indexes with the sequence features (PAAC and PSSM-C) of the virulence genes, and applying multiple machine learning strategies to construct a virulence gene feature profile relevant to clinical diagnosis;
clustering synergistic virulence genes into single virulence factor units and associating them with the corresponding characterization (function/clinical phenotype);
constructing virulence factor-virulence gene and virulence factor-characterization (function/clinical phenotype) association tables, and establishing the clinically important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
and, optionally, implementing automated alignment, filtering and result list generation in software.
In some embodiments of the present invention, analyzing and extracting gene features using the maximum likelihood method in S40 includes:
Extraction of the protein sequence physicochemical features (PAAC) of virulence genes:
[Formula image BDA0003085682040000063]
$a_i$: the physicochemical feature set of the 20 amino acids;
[Formula image BDA0003085682040000064]: the n-th physicochemical feature; N: the total number of amino acid physicochemical features;
where the physicochemical features of a single amino acid are:
[Formula image BDA0003085682040000065]
for any two amino acids $R_b$ and $R_d$, the correlation is:
[Formula image BDA0003085682040000066]
$F_k(R_b)$ is the k-th physicochemical feature of $R_b$;
for an amino acid sequence of length L, the sequence position correlation parameter $\theta_h$ is defined as:
[Formula image BDA0003085682040000067]
the physicochemical feature of amino acid e in the (20 + λ)-dimensional feature vector (λ = 2) is then extracted as follows:
[Formula image BDA0003085682040000071]
where $f_e$ is the frequency of amino acid e in the sequence, and ω is the weighting parameter for amino acid position in the sequence, with a default value of 0.1.
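Since the PAAC formulas are images that are not reproduced in this text, the sketch below follows the widely used pseudo amino acid composition formulation, which matches the symbols defined above ($F_k$, $\theta_h$, $f_e$, ω, λ = 2); whether the patent's formulas are identical is an assumption. Each property is standardized over the 20 amino acids, the correlation of two residues is the mean squared difference of their standardized properties, $\theta_h$ averages that correlation over residue pairs h positions apart, and the final vector has 20 + λ entries. The function names are hypothetical, and the property tables must be supplied by the caller.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop: dict) -> dict:
    """Scale one property table to zero mean and unit variance over the 20 amino acids."""
    values = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def paac(seq: str, props: list, lam: int = 2, weight: float = 0.1) -> list:
    """Return the (20 + lam)-dimensional feature vector of a protein sequence.

    seq must contain only the 20 standard amino acid letters and be longer than lam.
    props is a list of {amino_acid: value} property tables supplied by the caller.
    """
    length = len(seq)
    if length <= lam:
        raise ValueError("sequence must be longer than lam")
    tables = [standardize(p) for p in props]

    def correlation(b: str, d: str) -> float:
        # Mean squared difference of the standardized properties of two residues.
        return mean([(t[b] - t[d]) ** 2 for t in tables])

    # theta[h-1]: average correlation between residues h positions apart.
    theta = [
        mean([correlation(seq[i], seq[i + h]) for i in range(length - h)])
        for h in range(1, lam + 1)
    ]
    freq = [seq.count(a) / length for a in AMINO_ACIDS]
    denom = sum(freq) + weight * sum(theta)
    return [f / denom for f in freq] + [weight * t / denom for t in theta]
```

With this normalization the 20 + λ entries sum to 1, so the composition terms and the sequence-order terms are on a common scale before being passed to a classifier.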
Extraction of the evolutionary features (PSSM) of virulence protein sequences:
the protein sequences of the virulence genes within each gene unit are transformed into the original PSSM matrix, as follows:
[Formula image BDA0003085682040000072]
where L is the length of the sequence; the 20 columns represent the 20 natural amino acids; $p_{u,v}$ is the likelihood that the amino acid at position u of the sequence mutates to the v-th amino acid during evolution;
PSSM-C: the PSSM matrix is transformed into a 20 × 20 matrix, in which row $Z_u$ for amino acid u is calculated as follows:
[Formula image BDA0003085682040000073]
where
[Formula image BDA0003085682040000074]
$z_t$: the values at position t in the original PSSM; $p_t$: the amino acid at position t in the sequence; L: the length of the sequence; $a_u$: the u-th of the 20 amino acids.
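The PSSM-C transformation can be sketched as below, reading the definitions above as $Z_u = \frac{1}{L}\sum_{t=1}^{L} z_t \cdot \delta(p_t, a_u)$, with δ equal to 1 when the residue at position t is amino acid $a_u$ and 0 otherwise. The formula images are not reproduced in this text, so this reading is an assumption. The alphabetical column order is also an assumption; a PSSM produced by an alignment tool may order its columns differently and would need to be reordered first.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed row/column order


def pssm_composition(seq: str, pssm: list) -> list:
    """Collapse an L x 20 PSSM into a 20 x 20 matrix.

    Row u is the sum of the PSSM rows at positions whose residue is the
    u-th amino acid, divided by the sequence length L.
    """
    length = len(seq)
    if length == 0 or len(pssm) != length or any(len(row) != 20 for row in pssm):
        raise ValueError("pssm must have one 20-column row per residue")
    index = {a: u for u, a in enumerate(AMINO_ACIDS)}
    out = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(seq, pssm):
        u = index[residue]
        for v in range(20):
            out[u][v] += row[v] / length
    return out
```

The result has a fixed size regardless of sequence length, which is what allows proteins of different lengths to be compared by the same model.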
In some embodiments of the present invention, S50 includes the following steps:
importing the result list obtained in S30 and comparing it with the S40 association database to generate the virulence gene result, which comprises the pathogenic bacterial species (Latin and Chinese species names) and gene information (gene name, virulence factor, characterization, support score and the like);
importing the result into the corresponding table of a report template;
importing the client information from the database into the report template;
finally generating a virulence gene identification report (PDF format) for the specific pathogenic bacteria of the clinical sample.
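The join between the S30 result list and the S40 association database can be sketched as follows. The field names (`species`, `gene`, `support_score`, `virulence_factor`, `characterization`) and the dictionary-based lookup are hypothetical, and filling the template and rendering the PDF are left out.

```python
def build_report_rows(results: list, association: dict) -> list:
    """Annotate identified genes with virulence factor and characterization.

    results: dicts with 'species', 'gene' and 'support_score' keys (S30 output).
    association: {(species, gene): {'virulence_factor': ..., 'characterization': ...}}.
    Genes without an entry in the association table are left out of the report.
    """
    rows = []
    for r in results:
        info = association.get((r["species"], r["gene"]))
        if info is None:
            continue
        rows.append({
            "species": r["species"],
            "gene": r["gene"],
            "virulence_factor": info["virulence_factor"],
            "characterization": info["characterization"],
            "support_score": r["support_score"],
        })
    # Stable, readable ordering for the report table.
    return sorted(rows, key=lambda row: (row["species"], row["gene"]))
```

Each returned row corresponds to one line of the gene information table in the report template.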
The second aspect of the invention discloses a metagenome-based system for detecting virulence genes of clinically important pathogenic bacteria, which comprises the following components:
a virulence gene database of clinical pathogenic bacteria;
an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
a multiple-alignment annotation system for metagenomic sequencing data;
a clinical automated reporting system.
The beneficial technical effects of the invention are as follows:
(1) a metagenome-based system for detecting virulence genes of clinically important pathogenic bacteria is established that overcomes many limitations of the prior art and methods. It can detect and identify virulence genes in clinical infection samples of different types (cerebrospinal fluid, alveolar lavage fluid, blood, throat swabs and the like) and with low nucleic acid content, and can identify hundreds of important virulence genes of various clinical pathogenic bacteria in a single run, reducing additional screening time. The in-depth database and the two-step alignment strategy improve the sensitivity and accuracy of identification. The clinical automated reporting system generates reports quickly to help doctors diagnose, treat and assess prognosis in a timely manner;
(2) a comprehensive, manually curated virulence gene database of clinically important pathogenic bacteria is constructed, containing all reference sequences of hundreds of important virulence genes of various clinically important pathogenic bacteria, together with corrected species and function annotation information;
(3) machine learning algorithms are applied to mine the literature and clinical big data, identifying the high-frequency and important virulence gene profile of each pathogenic bacterium together with its virulence factors and characterization (function/clinical phenotype). Genes are grouped into virulence factor units according to their synergistic effects, and a strongly associated knowledge base of important virulence genes, virulence factors and characterization (function/clinical phenotype) is established, which gives greater reference value for clinical diagnosis and prognosis;
(4) an alignment score threshold filtering algorithm derived from large-sample analysis improves the sensitivity of virulence gene detection while preserving alignment accuracy. It overcomes the limitations of the prior related art in identifying low-abundance, short-read samples, and is particularly suitable for detecting and identifying virulence genes in clinical samples with single-end short reads (50-75 bp) and low nucleic acid content, such as cerebrospinal fluid;
(5) the alignment results of the metagenomic data and the virulence factor and characterization (function/clinical phenotype) information of important virulence genes are integrated in the clinical automated reporting system, giving higher reliability and clinical practicality.
Drawings
FIG. 1 is a flow chart of a method for detecting a virulence gene of a clinically important pathogen according to an embodiment of the present invention;
FIG. 2 is a flow chart of the operation of the system for detecting virulence genes of clinically important pathogenic bacteria according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Example 1
As shown in figure 1, the method for detecting the virulence gene of clinically important pathogenic bacteria based on metagenome mainly comprises the following steps:
1. Establishing the clinical pathogen virulence gene database
1.1. acquiring 1761 virulence genes and their sequences for 24 important pathogenic bacteria (covering 18 genera; 10 gram-negative and 14 gram-positive bacteria) from virulence databases such as VFDB;
1.2. downloading all genomes, gene sequences, and annotation information of the 24 pathogens from a public database (NCBI RefSeq);
1.3. filtering out pseudogenes, fragments, and misannotated sequences from the downloaded sequences using self-developed software;
1.4. clustering the sequences of each gene unit at multiple thresholds, and performing cross comparison and de-duplication within groups;
1.5. testing the gene unit reference sequences with simulated data sets, and adjusting and supplementing the gene unit reference sequences;
1.6. repeating step 1.4: clustering the sequences of each gene unit at multiple thresholds, and performing cross comparison and de-duplication within groups;
1.7. clustering the reference sequences of all gene units and filtering out abnormal sequences;
1.8. extracting NCBI annotation information, such as gene names and species names, from the reference sequences using regular expressions, and proofreading and standardizing the annotations of the reference sequences of all gene units;
1.9. building reference sequence indexes for all virulence gene units;
1.10. the software (VF_MKDB) automates sequence downloading, cluster de-duplication, and database updating and standardization.
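The annotation extraction in step 1.8 and the simplest form of de-duplication can be sketched as follows. This is only an illustration: it assumes RefSeq-style protein FASTA headers of the form `>accession description [Species name]`, the helper names `parse_header` and `deduplicate` are hypothetical, the sample records and accessions are made up, and the VF_MKDB software described above may work quite differently (for example, by clustering at similarity thresholds rather than by exact matching).

```python
import re

# Assumed header layout: ">ACCESSION description [Genus species]".
HEADER_RE = re.compile(
    r"^>(?P<acc>\S+)\s+(?P<desc>.*?)\s*\[(?P<species>[^\]]+)\]\s*$"
)


def parse_header(header):
    """Return (accession, description, species), or None if the header does not match."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return match.group("acc"), match.group("desc"), match.group("species")


def deduplicate(records):
    """Keep the first record for each distinct sequence (exact-match de-duplication only)."""
    seen = set()
    unique = []
    for header, sequence in records:
        key = sequence.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, sequence))
    return unique


# Made-up records for illustration; accessions and sequences are not real data.
records = [
    (">WP_000000001.1 hypothetical toxin A [Escherichia coli]", "MKTLLA"),
    (">WP_000000002.1 hypothetical toxin A [Escherichia coli]", "mktlla"),
    (">WP_000000003.1 hypothetical adhesin B [Klebsiella pneumoniae]", "MSSNQA"),
]
unique_records = deduplicate(records)
parsed = [parse_header(header) for header, _ in unique_records]
```

Headers that do not follow the assumed layout return `None`, which gives a natural hook for the manual proofreading mentioned in step 1.8.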
2. Obtaining raw metagenomic sequencing data of the clinical sample and preprocessing it to obtain target data
2.1. filtering out reads in which bases with a quality value below 2 account for 40% of the read;
2.2. trimming bases whose average quality within the sliding window (5 bp) is below 20;
2.3. filtering out reads with an average quality below 20, reads with more than 5 N bases, and reads shorter than 50 bp.
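The three preprocessing rules can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline actually used: quality values are taken to be Phred scores already decoded to integers, the 40% rule is read as "40% or more", the sliding-window step is assumed to cut the read at the first window whose mean quality falls below the threshold, and the function names are hypothetical.

```python
def mean_quality(quals):
    """Average quality of a list of integer scores (0.0 for an empty list)."""
    return sum(quals) / len(quals) if quals else 0.0


def trim_sliding_window(seq, quals, window=5, min_mean=20):
    """Cut the read at the first window whose mean quality drops below min_mean."""
    for start in range(0, max(len(seq) - window + 1, 0)):
        if mean_quality(quals[start:start + window]) < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


def passes_qc(seq, quals, low_q=2, low_frac=0.4, min_mean=20, max_n=5, min_len=50):
    """Apply steps 2.1 to 2.3; return the trimmed read, or None if it is filtered out."""
    if not seq:
        return None
    # Step 2.1: too many bases with a very low quality value.
    low_bases = sum(1 for q in quals if q < low_q)
    if low_bases / len(seq) >= low_frac:
        return None
    # Step 2.2: sliding-window trimming (assumed cut-at-first-failure behaviour).
    seq, quals = trim_sliding_window(seq, quals, min_mean=min_mean)
    # Step 2.3: length, average quality, and N-count filters.
    if len(seq) < min_len:
        return None
    if mean_quality(quals) < min_mean:
        return None
    if seq.upper().count("N") > max_n:
        return None
    return seq, quals
```

For example, a 70-base read whose last 15 bases have quality 5 is trimmed to 53 bases and kept, while a read with six N bases is discarded.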
3. Analyzing the target data with the preset metagenomic sequencing data multiple-alignment annotation system to identify virulence genes
The two-step alignment strategy and decision method based on metagenomic sequencing reads is as follows:
3.1. the set of reference sequences for a particular virulence gene is defined as $\{s_1, s_2, \ldots, s_n\}$, where $s_n$ is reference sequence $n$;
3.2. aligning the high-quality reads (clean reads) of the metagenome to the reference sequence set using a multiple alignment algorithm (e-value threshold of 1e-5);
3.3. the alignment results of each read are $\{R_1, R_2, \ldots, R_m\} \in g_i$, where $0 \le m \le n$, $R_m$ is the $m$-th alignment result, and $g_i$ is the $i$-th gene unit;
3.4. step one:
if there is a single alignment result ($m = 1$), that result ($R_m$) is taken as the final result ($r$);
3.5. if there are multiple alignment results ($m > 1$), there are two cases:
3.5.1. the targeted reference sequences belong to the same gene unit of the same species; after scoring and ranking, the final result $r_i$ is as follows:
Figure BDA0003085682040000091
3.5.2. the targeted reference sequences belong to different gene units of the same species ($g_i$, $i > 1$); the final result $r$ is the union of the best results from each gene group, $\{r_1, r_2, \ldots, r_i\}$, where the result within group $g_i$ is as follows:
Figure BDA0003085682040000101
3.6. step two:
filtering strategy for the virulence gene detection result (VF-result):
Figure BDA0003085682040000102
where id = sequence similarity score (%);
Figure BDA0003085682040000103
score is the quality score of the pairwise sequence alignment;
filtering condition: if the VF-result belongs to $g_i$ with $i > 1$ (more than one gene unit), it is discarded and no result (None) is reported;
3.7. the software (VF_Finder) automates alignment, filtering, and result list generation.
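The control flow of the two-step strategy can be sketched as below. The exact scoring and threshold formulas are given only as figures in the source, so this sketch makes several assumptions that should be read as placeholders: hits are ranked by their alignment score, the thresholds `min_identity` and `min_score` are hypothetical values, and the dictionary keys and function names are invented for illustration. It is not the VF_Finder implementation.

```python
from collections import defaultdict


def step_one(hits):
    """Pick the best hit per gene unit (assumed ranking: highest alignment score)."""
    if not hits:
        return []
    if len(hits) == 1:
        return list(hits)
    by_unit = defaultdict(list)
    for hit in hits:
        by_unit[hit["gene_unit"]].append(hit)
    # Union of the best result from each gene unit group.
    return [max(group, key=lambda h: h["score"]) for group in by_unit.values()]


def step_two(candidates, min_identity=90.0, min_score=50.0):
    """Apply placeholder thresholds, then discard reads still assigned to several gene units."""
    kept = [
        h for h in candidates
        if h["identity"] >= min_identity and h["score"] >= min_score
    ]
    units = {h["gene_unit"] for h in kept}
    if len(units) != 1:
        return None  # no hit passed, or the read is ambiguous between gene units
    return max(kept, key=lambda h: h["score"])


# Hypothetical alignment results for one read.
same_unit = [
    {"gene_unit": "geneA", "identity": 98.0, "score": 80.0},
    {"gene_unit": "geneA", "identity": 99.0, "score": 95.0},
]
two_units = same_unit + [{"gene_unit": "geneB", "identity": 97.0, "score": 90.0}]

result_single = step_two(step_one(same_unit))
result_ambiguous = step_two(step_one(two_units))
```

With these inputs, the read that hits only `geneA` keeps its higher-scoring alignment, while the read that passes the thresholds for both `geneA` and `geneB` yields `None`, mirroring the filtering condition in step 3.6.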
4. Establishing the important virulence gene-virulence factor-characterization (function/clinical phenotype) association database
4.1. collecting metagenomic sequencing data from clinical samples of the 24 pathogens (approximately 50 samples per pathogen);
4.2. analyzing the data with the metagenomic sequencing data multiple-alignment annotation system, and constructing the virulence gene profile of each single pathogen in each sample;
4.3. extracting and standardizing the clinical phenotypes and physiological and biochemical indexes corresponding to the samples, mainly including white blood cell count, neutrophil count, monocyte fraction, lymphocyte fraction, C-reactive protein, endotoxin, and the like;
4.4. analyzing and extracting gene characteristics by using a maximum likelihood method:
4.4.1. extraction of protein sequence physicochemical characteristics (PAAC) of virulence genes:
Figure BDA0003085682040000104
$a_i$: the physicochemical feature set of the 20 amino acids,
Figure BDA0003085682040000105
is the physicochemical characteristic at the $n$-th position, and $N$ is the total number of amino acid physicochemical characteristics;
where the physicochemical characteristics of a single amino acid are as follows:
Figure BDA0003085682040000106
for any two amino acids $R_b$ and $R_d$, the correlation is:
Figure BDA0003085682040000107
$F_k(R_b)$ is the $k$-th physicochemical characteristic of $R_b$;
for an amino acid sequence of length $L$, the sequence position correlation parameter $\theta_h$ is defined as follows:
Figure BDA0003085682040000111
then the $(20 + \lambda)$-dimensional ($\lambda = 2$) physicochemical feature extraction formula for amino acid $e$ in the sequence is as follows:
Figure BDA0003085682040000112
where $f_e$ is the frequency of amino acid $e$ in the sequence, and $\omega$ is the weighting parameter for amino acid position in the sequence, with a default value of 0.1.
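Because the formulas in this step survive only as figure references, the following sketch falls back on the widely used pseudo amino acid composition formulation, which matches the quantities named in the text (a $(20 + \lambda)$-dimensional vector, $\lambda = 2$, $\omega = 0.1$, pairwise correlation, and $\theta_h$). It should be read as an assumption about what the figures contain, not a transcription of them. For brevity it uses a single physicochemical property, the Kyte-Doolittle hydropathy scale, standardized over the 20 amino acids; the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as the only property (N = 1).
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(prop):
    """Scale a property to zero mean and unit standard deviation over the 20 amino acids."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def paac(sequence, properties=(HYDROPATHY,), lam=2, omega=0.1):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    seq = [a for a in sequence.upper() if a in AMINO_ACIDS]
    length = len(seq)
    if length <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    def correlation(rb, rd):
        # Mean squared difference over the N physicochemical properties.
        return sum((p[rb] - p[rd]) ** 2 for p in props) / len(props)

    # theta_h: average correlation between residues h positions apart.
    thetas = [
        sum(correlation(seq[i], seq[i + h]) for i in range(length - h)) / (length - h)
        for h in range(1, lam + 1)
    ]
    freqs = [seq.count(a) / length for a in AMINO_ACIDS]
    denom = sum(freqs) + omega * sum(thetas)
    return [f / denom for f in freqs] + [omega * t / denom for t in thetas]


features = paac("MKTLLAGGWYMKTLLA")
```

The first 20 components are the normalized amino acid frequencies and the last $\lambda$ components carry the sequence-order information; by construction the whole vector sums to 1.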
4.4.2. Extraction of evolutionary features (PSSM) of virulence protein sequences:
PSI-BLAST is used to convert the protein sequences of the virulence genes within a gene unit into the original PSSM (position-specific scoring matrix), as follows:
Figure BDA0003085682040000113
where $L$ is the sequence length; the 20 columns represent the 20 natural amino acids; and $p_{u,v}$ is the likelihood that the $u$-th amino acid mutates to the $v$-th amino acid during evolution;
PSSM-C (PSSM-composition) converts the PSSM matrix into a 20 × 20 matrix, in which the row $Z_u$ for the $u$-th amino acid is calculated as follows:
Figure BDA0003085682040000114
where
Figure BDA0003085682040000115
$z_t$: the value at position $t$ in the original PSSM; $p_t$: the amino acid at position $t$ in the sequence; $L$: the sequence length; $a_u$: the $u$-th of the 20 amino acids.
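The PSSM-C transformation can be illustrated with a short sketch. It assumes the usual PSSM-composition definition, in which the PSSM rows at positions occupied by amino acid $a_u$ are summed and divided by the sequence length $L$; this agrees with the symbol definitions above, but the formula itself is only available as a figure. The column order in `AMINO_ACIDS` and the function name are assumptions, and the toy PSSM is made up rather than produced by PSI-BLAST.

```python
# Assumed amino acid order for both rows and columns of the output matrix.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_composition(sequence, pssm, alphabet=AMINO_ACIDS):
    """Sum PSSM rows by residue type and divide by L, giving a 20 x 20 matrix."""
    length = len(sequence)
    if length == 0 or length != len(pssm):
        raise ValueError("sequence and PSSM must have the same non-zero length")
    index = {a: u for u, a in enumerate(alphabet)}
    result = [[0.0] * len(alphabet) for _ in alphabet]
    for residue, row in zip(sequence.upper(), pssm):
        u = index.get(residue)
        if u is None:
            continue  # skip non-standard residues
        for v, value in enumerate(row):
            result[u][v] += value / length
    return result


# Toy L x 20 PSSM for the sequence "AAR" (values are illustrative only).
toy_pssm = [[1] * 20, [3] * 20, [6] * 20]
matrix = pssm_composition("AAR", toy_pssm)
```

Whatever the sequence length, the output is always 20 × 20, which is what lets proteins of different lengths be compared in the later machine learning step.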
4.5. combining routine clinical test indexes with the sequence features of virulence genes (PAAC and PSSM-C), and applying multiple machine learning strategies (multi-task logistic regression, random forest, support vector machine, and the like) to construct a virulence gene feature profile related to clinical diagnosis;
4.6. clustering synergistic virulence genes into single virulence factor units and associating them with the corresponding characterizations (function/clinical phenotype);
4.7. constructing virulence factor-virulence gene and virulence factor-characterization (function/clinical phenotype) association tables, and establishing the clinically important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
4.8. the software (VF-KDB) automates data collection, analysis, and updating.
5. Generating a virulence gene identification report from the virulence gene identification result and the association database using the preset clinical automated reporting system
5.1. importing the result list produced by the metagenomic sequencing data multiple-alignment annotation system, comparing it against the important virulence gene-virulence factor-characterization (function/clinical phenotype) association database, and automatically generating a gene result list (text format) comprising the pathogen species (Latin and Chinese species names) and gene information (gene name, virulence factor, characterization, support score, and the like);
5.2. the program automatically imports the results into the corresponding tables of the report template;
5.3. the program automatically imports the client information from the database into the report template;
5.4. generating the final virulence gene identification report (PDF format) for the specific pathogenic bacteria in the clinical sample.
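The join between the result list and the association database in step 5.1 can be sketched as a simple lookup. The record layout, key names, and sample values below are all hypothetical, and the sketch stops at the text-format result list; filling the report template and rendering the PDF are not shown.

```python
def build_report_rows(result_list, association_db):
    """Join detected genes with the association table to produce report rows."""
    rows = []
    for hit in result_list:
        info = association_db.get((hit["species"], hit["gene"]))
        if info is None:
            continue  # gene not present in the association database
        rows.append({
            "species_latin": hit["species"],
            "gene": hit["gene"],
            "virulence_factor": info["virulence_factor"],
            "characterization": info["characterization"],
            "support_score": hit["support_score"],
        })
    # Group by species, strongest support first.
    return sorted(rows, key=lambda r: (r["species_latin"], -r["support_score"]))


def render_text_table(rows):
    """Render the rows as tab-separated text (the text-format gene result list)."""
    header = ["species_latin", "gene", "virulence_factor", "characterization", "support_score"]
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(row[col]) for col in header))
    return "\n".join(lines)


# Hypothetical inputs for illustration only.
result_list = [
    {"species": "Examplea speciesa", "gene": "geneA", "support_score": 12},
    {"species": "Examplea speciesa", "gene": "geneZ", "support_score": 3},
]
association_db = {
    ("Examplea speciesa", "geneA"): {
        "virulence_factor": "toxin unit 1",
        "characterization": "hypothetical phenotype",
    },
}
rows = build_report_rows(result_list, association_db)
table_text = render_text_table(rows)
```

Genes absent from the association database are silently skipped here; a production system might instead flag them for review.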
As shown in fig. 2, the metagenome-based system for detecting virulence genes of clinically important pathogenic bacteria comprises the following components:
1. a clinical pathogen virulence gene database;
2. an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
3. a multiple alignment annotation system for metagenomic sequencing data;
4. a clinical automated reporting system.
While the preferred embodiments and examples of the present invention have been described in detail, the present invention is not limited to the embodiments and examples, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (9)

1. A method for detecting virulence genes of clinically important pathogenic bacteria based on metagenome is characterized by comprising the following steps:
S10, establishing a clinical pathogen virulence gene database;
S20, acquiring raw metagenomic sequencing data of a clinical sample, and preprocessing it to obtain target data;
S30, analyzing the target data with a preset metagenomic sequencing data multiple-alignment annotation system to identify virulence genes;
S40, establishing an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
S50, generating a virulence gene identification report from the virulence gene identification result and the association database using a preset clinical automated reporting system.
2. The method according to claim 1, wherein S10 includes:
obtaining the virulence genes and sequences of clinical pathogenic bacteria from a virulence gene database;
acquiring all genomes, gene sequences, and annotation information of the clinical pathogenic bacteria from a public database;
filtering out pseudogenes, fragments, and misannotated sequences from the gene sequences;
clustering the sequences of each gene unit at multiple thresholds, and performing cross comparison and de-duplication within groups;
testing the gene unit reference sequences with simulated data sets, and adjusting and supplementing the gene unit reference sequences;
repeating the multiple-threshold clustering of the sequences of each gene unit, and performing cross comparison and de-duplication within groups;
clustering the reference sequences of all gene units and filtering out abnormal sequences;
extracting the public database annotation information of the reference sequences, and proofreading and standardizing the reference sequence annotations of each gene unit;
building reference sequence indexes for all virulence gene units;
and optionally, building software to automate sequence downloading, cluster de-duplication, and database updating and standardization.
3. The method according to claim 1, wherein S20 includes:
filtering out reads in which bases with a quality value below 2 account for 40% of the read;
trimming bases whose average quality within the sliding window (5 bp) is below 20;
filtering out reads with an average quality below 20, reads with more than 5 N bases, and reads shorter than 50 bp.
4. The method according to claim 1, wherein S30 includes:
the set of reference sequences for a particular virulence gene is defined as $\{s_1, s_2, \ldots, s_n\}$, where $s_n$ is reference sequence $n$;
aligning the high-quality reads of the metagenome to the reference sequence set using a multiple alignment algorithm, with an e-value threshold of 1e-5;
the alignment results of each read are $\{R_1, R_2, \ldots, R_m\} \in g_i$, where $0 \le m \le n$, $R_m$ is the $m$-th alignment result, and $g_i$ is the $i$-th gene unit;
filtering strategy for the virulence gene detection result:
Figure FDA0003085682030000021
where id = sequence similarity score (%);
Figure FDA0003085682030000022
score is the quality score of the pairwise sequence alignment;
filtering condition: if the VF-result belongs to $g_i$ with $i > 1$, it is discarded and no result is reported;
and optionally, building software to automate alignment, filtering, and result list generation.
5. The method of claim 4, wherein the comparing in S30 comprises:
when there is a single alignment result, taking that result as the final result;
when there are multiple alignment results and the targeted reference sequences belong to the same gene unit of the same species, after scoring and ranking, the final result $r_i$ is as follows:
Figure FDA0003085682030000023
when there are multiple alignment results and the targeted reference sequences belong to different gene units of the same species, the final result is the union of the best results from each gene group, where the result within group $g_i$ is as follows:
Figure FDA0003085682030000024
6. The method according to claim 1, wherein S40 includes:
collecting metagenomic sequencing data of clinical samples of pathogenic bacteria;
analyzing the data according to S20 and S30, and constructing the virulence gene profile of each single pathogen in each sample;
extracting and standardizing the clinical phenotypes and physiological and biochemical indexes corresponding to the samples;
analyzing and extracting gene features using a maximum likelihood method;
combining routine clinical test indexes with the sequence features of virulence genes, and applying multiple machine learning strategies to construct a virulence gene feature profile related to clinical diagnosis;
clustering synergistic virulence genes into single virulence factor units and associating them with the corresponding characterizations (function/clinical phenotype);
constructing virulence factor-virulence gene and virulence factor-characterization (function/clinical phenotype) association tables, and establishing the clinically important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
and optionally, automated alignment, filtering, and result list generation are implemented by software.
7. The method according to claim 6, wherein analyzing and extracting gene features using the maximum likelihood method in S40 comprises:
extracting the protein sequence physicochemical features of virulence genes:
Figure FDA0003085682030000031
$a_i$: the physicochemical feature set of the 20 amino acids,
Figure FDA0003085682030000032
is the physicochemical characteristic at the $n$-th position, and $N$ is the total number of amino acid physicochemical characteristics;
where the physicochemical characteristics of a single amino acid are as follows:
Figure FDA0003085682030000033
$n = 1, 2, \ldots, N$; $x = 1$ to $n$;
for any two amino acids $R_b$ and $R_d$, the correlation is:
Figure FDA0003085682030000034
$F_k(R_b)$ is the $k$-th physicochemical characteristic of $R_b$;
for an amino acid sequence of length $L$, the sequence position correlation parameter $\theta_h$ is defined as follows:
Figure FDA0003085682030000035
$h = 1, 2, \ldots, L-1$;
then the $(20 + \lambda)$-dimensional ($\lambda = 2$) physicochemical feature extraction formula for amino acid $e$ in the sequence is as follows:
Figure FDA0003085682030000036
where $f_e$ is the frequency of amino acid $e$ in the sequence, and $\omega$ is the weighting parameter for amino acid position in the sequence, with a default value of 0.1;
extracting the evolutionary features of the virulence protein sequences:
the protein sequences of the virulence genes within a gene unit are converted into the original PSSM matrix, as follows:
Figure FDA0003085682030000037
where $L$ is the sequence length; the 20 columns represent the 20 natural amino acids; and $p_{u,v}$ is the likelihood that the $u$-th amino acid mutates to the $v$-th amino acid during evolution;
the PSSM matrix is converted into a 20 × 20 matrix, in which the row $Z_u$ for the $u$-th amino acid is calculated as follows:
Figure FDA0003085682030000038
where
Figure FDA0003085682030000039
($u = 1, \ldots, 20$; $t = 1, \ldots, L$)
$z_t$: the value at position $t$ in the original PSSM; $p_t$: the amino acid at position $t$ in the sequence; $L$: the sequence length; $a_u$: the $u$-th of the 20 amino acids.
8. The method according to claim 1, wherein S50 includes:
importing the result list obtained in S30 and comparing it against the association database from S40 to generate a virulence gene result comprising the pathogen species and gene information;
importing the result into the corresponding table of a report template;
importing the client information from the database into the report template;
generating the final virulence gene identification report for the specific pathogenic bacteria in the clinical sample.
9. A system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome comprises the following components:
a clinical pathogen virulence gene database;
an important virulence gene-virulence factor-characterization (function/clinical phenotype) association database;
a multiple alignment annotation system for metagenomic sequencing data;
a clinical automated reporting system.
CN202110579642.1A 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome Active CN113223618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579642.1A CN113223618B (en) 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110579642.1A CN113223618B (en) 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome

Publications (2)

Publication Number Publication Date
CN113223618A (en) 2021-08-06
CN113223618B (en) 2022-09-16

Family

ID=77099541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579642.1A Active CN113223618B (en) 2021-05-26 2021-05-26 Method and system for detecting virulence genes of clinically important pathogenic bacteria based on metagenome

Country Status (1)

Country Link
CN (1) CN113223618B (en)

Also Published As

Publication number Publication date
CN113223618B (en) 2022-09-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant