CN117935918A - Pathogenic microorganism data analysis method and device and processor - Google Patents


Info

Publication number
CN117935918A
Authority
CN
China
Prior art keywords
species
sample
detected
screening
abundance
Legal status
Granted
Application number
CN202410323192.3A
Other languages
Chinese (zh)
Other versions
CN117935918B (en)
Inventor
于洋
赵琳
李连凤
李学平
王蒙
欧宇红
史祺云
Current Assignee
Beijing Novogene Technology Co ltd
Original Assignee
Beijing Novogene Technology Co ltd
Application filed by Beijing Novogene Technology Co ltd
Priority to CN202410323192.3A
Publication of CN117935918A
Application granted
Publication of CN117935918B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a pathogenic microorganism data analysis method, device and processor, and relates to the technical field of biological analysis. The analysis method works semi-automatically. It processes the original microorganism detection list according to preset logic and removes interfering species step by step, so that results genuinely present in the sample are more likely to be retained. This narrows the scope of pathogen report interpretation. The judgment of how the sample's clinical information relates to the report results is left to a clinician or report interpreter, which reduces the risk of species being removed by mistake.

Description

Pathogenic microorganism data analysis method and device and processor
Technical Field
The invention relates to the technical field of bioinformatics analysis, in particular to a pathogenic microorganism data analysis method, device and processor.
Background
Microbial metagenomic next-generation sequencing (mNGS) uses a high-throughput sequencing platform and bioinformatics analysis to identify microorganisms from the nucleic acids present in a sample. Compared with traditional microbiological detection methods, it has clear advantages. It is unbiased, covers a wide detection range, is highly sensitive and does not depend on culture. It is now widely applied to pathogen metagenomic analysis for clinical infectious diseases. Because pathogen nucleic acid information can be analyzed directly and rapidly from clinical samples, the technology addresses pain points such as emerging pathogens, mixed infections and the identification of rare pathogens. It provides timely reference information for clinical diagnosis and treatment, and its detection value has been demonstrated in a large number of cases. However, many factors influence mNGS results, and alignment against nucleic acid references yields an enormous amount of species annotation information. An important criterion for evaluating mNGS products is therefore whether the small amount of clinically pathogenic microbial information can be accurately extracted from this vast dataset. Going from raw sequencing data to clinically meaningful report results is a continuous process of screening and filtering. Bioinformatics analysis can filter and flag nucleic acids derived from the host, the environment and the reagent background. A clinician or report interpretation auditor must then combine these results with the sample's clinical information to judge which microorganisms are most likely pathogenic.
Although mNGS is of great value in detecting infection, every analytical step from sample processing to the pathogen report is subject to considerable variability. To promote standardized management of the technology, clinical experts have successively released several consensus statements on pathogen mNGS detection. These statements make recommendations on sample collection, nucleic acid processing, bioinformatics analysis, report content and result interpretation. According to these statements, mNGS data contain the microbial nucleic acid fragments that are genuinely present in the sample, and also fragments from other sources such as the host, reagents and the laboratory environment. In addition, the microecological flora differs markedly between sample types. Accurately identifying the pathogenic microorganisms that are truly present from sample detection data is therefore a major challenge for mNGS. Meeting it requires a large accumulation of samples and extensive experience in correlating results with clinical information.
Some institutions have proposed classification methods that identify key microorganisms fully automatically (e.g., alongan, CN 116631511A, fozhou). In that scheme, the originally detected microorganisms are classified into 5 categories: microorganisms with low nucleic acid extraction efficiency, "absolute" key microorganisms, "relative" key microorganisms, kit nucleic acid sequences and laboratory environment microorganisms. These categories are labelled P1 to P5 respectively. Microorganisms are then removed according to the relative importance of the 5 categories to determine the true microbial result in the sample.
The key idea of that scheme is to remove kit nucleic acid sequences and common laboratory microbial nucleic acids first, and then screen the other categories of key microorganisms. In practice the scheme has two main drawbacks:
1. The scheme provides a reference method for obtaining kit nucleic acid sequences and common laboratory microbial sequences, but the procedure is complex and cumbersome. Its practicality is low when reagent and consumable batches and the laboratory environment change frequently.
2. The factors affecting mNGS results are now fairly well understood, but some unknown factors remain, and fully automatic analysis without manual review may miss important information. This matters because the interpretation of an mNGS report is closely tied to the sample's clinical information, and that scheme does not consider the reference value of clinical information.
In view of this, the present invention has been made.
Disclosure of Invention
The first object of the present invention is to provide a pathogenic microorganism data analysis method. The method starts from the original detection list of a sample's microbial data. It performs a primary screening for clinically relevant pathogenic microorganisms and a supplementary screening for key-focus microorganisms, and so generates a pathogen report list that is easy for reviewers to read. Because it works semi-automatically, the method keeps the pathogen filtering results comprehensive and meaningful while giving clinicians and report interpretation auditors concise yet complete clinical microbial detection information. It can improve the efficiency of staff who interpret pathogen reports, reduce the risk of microorganisms being missed from a sample's report, and help accumulate sample data.
A second object of the present invention is to provide a pathogenic microorganism data analysis device.
A third object of the present invention is to provide a processor.
In order to achieve the above object, the following technical scheme is adopted:
In a first aspect, the present invention provides a method for analyzing pathogenic microorganism data, comprising the steps of:
a. obtaining the species alignment results produced by mNGS bioinformatics analysis of a sample to be tested and of its negative control sample (NC). According to the alignment results, the detected microorganisms are divided by clinical pathogenic significance into clinically pathogenic and non-clinically pathogenic microorganisms. They are also divided by biological taxonomic group into three categories: bacteria and fungi, viruses, and parasites;
b. counting the detection frequency of each microorganism among samples of the same type as the sample to be tested that underwent the same experimental procedure, and using these frequencies as a background library;
c. performing the following screening on the clinically pathogenic microorganisms:
for bacteria and fungi, a species enters pathogen candidate list 1 if all of the following conditions are met:
- Its genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance.
- Its detection rate in the background library is below 25% or is not recorded.
- Its NC_ratio is at least 3 or is na.
- If it is the species ranked first by intra-genus abundance, it has at least 3 sequences. If it is the species ranked second, it has at least 3 sequences and more than 0.1 times as many sequences as the first-ranked species.
- The species does not belong to the common microecological flora;
for viruses, a species enters pathogen candidate list 1 if the following conditions are met:
- A virus commonly carried by the human body, or a species that is a suspected background detection, has at least 3 sequences.
- A virus that is not commonly carried by the human body and is not a suspected background detection has at least 1 sequence.
- A suspected background detection is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested;
for parasites, a species enters pathogen candidate list 1 if the following conditions are met:
it has at least 3 sequences, and its NC_ratio is at least 3 or is na;
d. obtaining a list of key-focus pathogen species for all sample types and a list of key-focus pathogen species for respiratory tract samples. A detected microorganism enters pathogen candidate list 2 if it meets either of the following conditions:
- It belongs to the list of key-focus pathogen species for all sample types.
- The sample to be tested is a respiratory tract sample, and the species ranks in the top 2 by intra-genus species abundance, has at least 3 sequences and belongs to the list of key-focus pathogen species for respiratory tract samples;
e. merging pathogen candidate list 1 and pathogen candidate list 2 and removing duplicates to obtain the pathogen candidate list;
NC_ratio = RPM value of the species in the sample to be tested / RPM value of the species in the negative control sample. If the species is not detected in the negative control sample, NC_ratio is recorded as na;
The list of key-focus pathogen species for all sample types includes: Cryptococcus neoformans, Yarrowia, Trichosporon assamica, Aspergillus fumigatus, Mycobacterium tuberculosis complex, Mycoplasma pneumoniae, Chlamydia psittaci, Treponema pallidum, Bartonella henselae, Coxiella burnetii, Clostridium tetani, Vibrio vulnificus, wound coccus porus, Helicobacter pylori, Leptospira interrogans, Orientia tsutsugamushi and Tropheryma whipplei;
The list of key-focus pathogen species for respiratory tract samples includes: Escherichia coli, Klebsiella pneumoniae, Klebsiella aerogenes, Staphylococcus aureus, Acinetobacter baumannii, Streptococcus pneumoniae, Pseudomonas aeruginosa, Brucella abortus, Brucella melitensis, Brucella accidentalis, Burkholderia cepacia, Haemophilus influenzae, Nocardia abscessus, Nocardia asteroides, Nocardia asiatica, Nocardia brasiliensis, Nocardia otitidiscaviarum, Proteus mirabilis, Rhodococcus maltophilia, Streptococcus agalactiae, Streptococcus pyogenes, Moraxella catarrhalis, Achromobacter xylosoxidans, Enterobacter cloacae complex, exopathia, isaria meningitidis, Morganella morganii and Serratia marcescens.
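The bacteria and fungi conditions in step c can be written as one predicate. The sketch below is illustrative only. The `Detection` record, its field names, and the use of `None` for both "na" and "not recorded" are assumptions. It also assumes that the common-flora exclusion is a separate condition that applies to every candidate, not only to the second-ranked species.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """One bacterial or fungal species detected in a sample (hypothetical record layout)."""
    species: str
    genus_rank: int                   # rank of the species' genus by genus abundance (1 = most abundant)
    rank_in_genus: int                # rank of the species by intra-genus abundance (1 = first)
    reads: int                        # number of sequences assigned to the species
    top_species_reads: int            # sequences of the first-ranked species in the same genus
    background_rate: Optional[float]  # detection rate in the background library; None = not recorded
    nc_ratio: Optional[float]         # RPM(sample) / RPM(NC); None stands for "na"
    is_common_flora: bool             # True if marked as common microecological flora


def passes_bacteria_fungi_screen(d: Detection) -> bool:
    """Return True only if every step-c condition for bacteria and fungi holds."""
    # Genus in the top 15 by abundance and species in the top 2 within its genus.
    if d.genus_rank > 15 or d.rank_in_genus > 2:
        return False
    # Background-library detection rate below 25%, or not recorded at all.
    if d.background_rate is not None and d.background_rate >= 0.25:
        return False
    # NC_ratio of at least 3, or "na" (species absent from the negative control).
    if d.nc_ratio is not None and d.nc_ratio < 3:
        return False
    # Sequence-count rule depends on the species' rank within its genus.
    if d.rank_in_genus == 1:
        reads_ok = d.reads >= 3
    else:
        reads_ok = d.reads >= 3 and d.reads > 0.1 * d.top_species_reads
    # Assumption: the common-flora exclusion applies to every candidate.
    return reads_ok and not d.is_common_flora
```

A second-ranked species with 5 sequences whose genus leader has 100 sequences fails the screen, because 5 is not more than 0.1 × 100 = 10. The same species with 11 sequences passes, provided the other conditions hold.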
As a further technical scheme, the negative control sample is a healthy human sample.
As a further technical scheme, step a further comprises marking, according to the alignment results, which of the microorganisms detected in the sample to be tested belong to the common microecological flora, and calculating the genus abundance and the intra-genus species abundance.
As a further technical scheme, in step b, at least 4 samples that underwent the same experimental procedure and are of the same type as the sample to be tested are counted to calculate the detection frequency of each microorganism in that sample type.
As a further technical scheme, the type of the sample to be tested comprises a cerebrospinal fluid sample, a respiratory tract sample or a blood sample.
In a second aspect, the invention provides a pathogenic microorganism data analysis device, comprising a species comparison result acquisition module, a background library acquisition module, a screening module and an output module;
The species comparison result acquisition module is used to obtain the species alignment results produced by mNGS bioinformatics analysis of a sample to be tested and of its negative control sample. According to the alignment results, it divides the detected microorganisms by clinical pathogenic significance into clinically pathogenic and non-clinically pathogenic microorganisms. It also divides them by biological taxonomic group into three categories: bacteria and fungi, viruses, and parasites;
the background library acquisition module is used to count the detection frequency of each microorganism among samples of the same type as the sample to be tested that underwent the same experimental procedure, so as to obtain the background library;
the screening module is used to screen qualifying species according to the species alignment results. It comprises a bacteria and fungi screening module, a virus screening module, a parasite screening module and a key-focus pathogen species screening module;
the bacteria and fungi screening module, the virus screening module and the parasite screening module are used to screen the clinically pathogenic microorganisms;
The bacteria and fungi screening module is used to screen bacteria and fungi and to list qualifying species in pathogen candidate list 1. The screening conditions are:
- The genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance.
- The detection rate in the background library is below 25% or is not recorded.
- NC_ratio is at least 3 or is na.
- If it is the species ranked first by intra-genus abundance, it has at least 3 sequences. If it is the species ranked second, it has at least 3 sequences and more than 0.1 times as many sequences as the first-ranked species.
- The species does not belong to the common microecological flora;
The virus screening module is used to screen viruses and to list qualifying species in pathogen candidate list 1. The screening conditions are:
- A virus commonly carried by the human body, or a species that is a suspected background detection, has at least 3 sequences.
- A virus that is not commonly carried by the human body and is not a suspected background detection has at least 1 sequence.
- A suspected background detection is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested;
the parasite screening module is used to screen parasites and to list qualifying species in pathogen candidate list 1. The screening conditions are: at least 3 sequences, and NC_ratio of at least 3 or na;
The key-focus pathogen species screening module is used to screen key-focus pathogen species and to list qualifying species in pathogen candidate list 2. The screening conditions are: the species belongs to the list of key-focus pathogen species for all sample types; or it belongs to the list of key-focus pathogen species for respiratory tract samples, ranks in the top 2 by intra-genus species abundance and has at least 3 sequences;
the output module is used to merge pathogen candidate list 1 and pathogen candidate list 2 and remove duplicates to obtain the pathogen candidate list;
NC_ratio = RPM value of the species in the sample to be tested / RPM value of the species in the negative control sample. If the species is not detected in the negative control sample, NC_ratio is recorded as na;
The list of key-focus pathogen species for all sample types includes: Cryptococcus neoformans, Yarrowia, Trichosporon assamica, Aspergillus fumigatus, Mycobacterium tuberculosis complex, Mycoplasma pneumoniae, Chlamydia psittaci, Treponema pallidum, Bartonella henselae, Coxiella burnetii, Clostridium tetani, Vibrio vulnificus, wound coccus porus, Helicobacter pylori, Leptospira interrogans, Orientia tsutsugamushi and Tropheryma whipplei;
The list of key-focus pathogen species for respiratory tract samples includes: Escherichia coli, Klebsiella pneumoniae, Klebsiella aerogenes, Staphylococcus aureus, Acinetobacter baumannii, Streptococcus pneumoniae, Pseudomonas aeruginosa, Brucella abortus, Brucella melitensis, Brucella accidentalis, Burkholderia cepacia, Haemophilus influenzae, Nocardia abscessus, Nocardia asteroides, Nocardia asiatica, Nocardia brasiliensis, Nocardia otitidiscaviarum, Proteus mirabilis, Rhodococcus maltophilia, Streptococcus agalactiae, Streptococcus pyogenes, Moraxella catarrhalis, Achromobacter xylosoxidans, Enterobacter cloacae complex, exopathia, isaria meningitidis, Morganella morganii and Serratia marcescens.
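The virus screening module combines two inputs: a set of viruses commonly carried by the human body, and a per-batch set of suspected background viruses. The sketch below assumes, hypothetically, that the batch detection rates arrive as a dictionary from virus name to the fraction of same-type samples in the batch that detected it. It shows how the 50% background rule and the 3-versus-1 sequence thresholds interact.

```python
from typing import Dict, Set


def suspected_background(batch_rates: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Viruses detected in more than `threshold` of same-type samples in the same batch."""
    return {virus for virus, rate in batch_rates.items() if rate > threshold}


def passes_virus_screen(virus: str, reads: int,
                        human_carried: Set[str], background: Set[str]) -> bool:
    """Virus rule: commonly carried or suspected-background viruses need at least
    3 sequences; every other virus needs at least 1."""
    if virus in human_carried or virus in background:
        return reads >= 3
    return reads >= 1
```

Commonly carried low-load viruses, such as the human herpesviruses mentioned later in the description, therefore need stronger evidence than viruses that are rarely seen. A single sequence is enough for the latter.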
As a further technical scheme, the device further comprises an abundance calculation module. According to the species alignment results, this module calculates the genus abundance and the intra-genus species abundance of the microorganisms detected in the sample to be tested.
As a further technical scheme, the device further comprises an mNGS bioinformatics analysis module. This module performs mNGS bioinformatics analysis on the sequencing data of the sample to be tested and of its negative control sample, and outputs the resulting species alignment results to the species comparison result acquisition module.
As a further technical scheme, the negative control sample is a healthy human sample;
the type of the sample to be tested comprises a cerebrospinal fluid sample, a respiratory tract sample or a blood sample.
In a third aspect, the present invention provides a processor for running a program, wherein the program is run to perform the pathogenic microorganism data analysis method.
Compared with the prior art, the invention has the following beneficial effects:
The analysis method provided by the invention works semi-automatically. It processes the original microorganism detection list with a specific screening method and removes interfering species step by step, so that results genuinely present in the sample are more likely to be retained. This narrows the scope of pathogen report interpretation. The judgment of how the sample's clinical information relates to the report results is left to a clinician or report interpreter, which reduces the risk of species being removed by mistake.
Drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1: schematic diagram of the data preprocessing method;
Fig. 2: effect of the method on the filtering proportion of originally detected species, in different sample types;
Fig. 3: proportion of effective audits obtained by applying the method, in different sample types;
Fig. 4: number of reported species as a proportion of originally detected species without the method, in different sample types;
Fig. 5: improvement in the report audit effectiveness rate after applying the method, in different sample types.
Detailed Description
Embodiments of the present invention are described in detail below with reference to examples. Those skilled in the art will understand that the following embodiments and examples only illustrate the invention and should not be construed as limiting its scope. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention. Where specific conditions are not stated, conventional conditions or conditions recommended by the manufacturer are used. Reagents or instruments whose manufacturers are not stated are conventional commercially available products.
In the present invention, clinically pathogenic microorganisms are microorganisms capable of causing disease in humans, for example:
1. pathogenic microorganisms listed in the Catalogue of Pathogenic Microorganisms Transmitted between Humans;
2. pathogenic microorganisms described in Harrison's Infectious Diseases, the Manual of Clinical Microbiology, Diagnostic and Graphic Clinical Microbiology and similar references;
3. pathogenic microorganisms listed on authoritative Chinese and international websites, such as the national bacterial resistance monitoring network and the International Pathogen Surveillance Network (IPSN, https://www.who.int/data/collections).
"Bacteria and fungi" refers to bacteria together with fungi. Bacteria means the kingdom Bacteria in the broad sense, including mycobacteria, mycoplasmas, chlamydiae, spirochetes, rickettsiae and other bacterial species that may cause human disease. Fungi means the kingdom Fungi, including fungal species reported in the literature to cause human disease, such as Trichophyton, Candida, Cryptococcus, molds and Malassezia.
Viruses mainly include viral species capable of causing human disease, covering both RNA and DNA viruses.
Parasites mainly include protozoan and helminth species capable of causing human disease.
In the present invention, common microecological flora refers to the normal microbial flora that colonizes parts of the human body. The normal microecological systems include the skin, oral cavity, respiratory tract, gastrointestinal tract, genitourinary tract and other systems. Microorganisms that normally live in symbiosis with the human body without causing disease can be regarded as common microecological flora. For example, Actinomyces is generally considered part of the oral and respiratory microecological flora.
In the present invention, viruses commonly carried by the human body are viruses that many healthy people carry at extremely low load and that do not cause disease when immunity is normal. Examples are the human herpesviruses, parvoviruses and human papillomaviruses.
In a first aspect, the present invention provides a method for analyzing pathogenic microorganism data, comprising the steps of:
a. obtaining the species alignment results produced by mNGS bioinformatics analysis of a sample to be tested and of its negative control sample. According to the alignment results, the detected microorganisms are divided by clinical pathogenic significance into clinically pathogenic and non-clinically pathogenic microorganisms. They are also divided by biological taxonomic group into three main categories: bacteria and fungi, viruses, and parasites;
b. counting the detection frequency of each microorganism among samples of the same type as the sample to be tested that underwent the same experimental procedure, and using these frequencies as a background library;
c. performing the following screening on the clinically pathogenic microorganisms:
for bacteria and fungi, a species enters pathogen candidate list 1 if all of the following conditions are met:
- Its genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance.
- Its detection rate in the background library is below 25% or is not recorded.
- Its NC_ratio is at least 3 or is na.
- If it is the species ranked first by intra-genus abundance, it has at least 3 sequences. If it is the species ranked second, it has at least 3 sequences and more than 0.1 times as many sequences as the first-ranked species.
- The species does not belong to the common microecological flora;
for viruses, a species enters pathogen candidate list 1 if the following conditions are met:
- A virus commonly carried by the human body, or a species that is a suspected background detection, has at least 3 sequences.
- A virus that is not commonly carried by the human body and is not a suspected background detection has at least 1 sequence.
- A suspected background detection is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested;
for parasites, a species enters pathogen candidate list 1 if the following conditions are met:
it has at least 3 sequences, and its NC_ratio is at least 3 or is na;
d. obtaining a list of key-focus pathogen species for all sample types and a list of key-focus pathogen species for respiratory tract samples. A detected microorganism enters pathogen candidate list 2 if it meets either of the following conditions:
- It belongs to the list of key-focus pathogen species for all sample types.
- The sample to be tested is a respiratory tract sample, and the species ranks in the top 2 by intra-genus species abundance, has at least 3 sequences and belongs to the list of key-focus pathogen species for respiratory tract samples.
This step mainly serves to catch omissions by screening clinically important microorganisms according to the sample type.
e. merging pathogen candidate list 1 and pathogen candidate list 2 and removing duplicates to obtain the pathogen candidate list;
NC_ratio = RPM value of the species in the sample to be tested / RPM value of the species in the negative control sample. If the species is not detected in the negative control sample, NC_ratio is recorded as na;
The list of key-focus pathogen species for all sample types includes: Cryptococcus neoformans, Yarrowia, Trichosporon assamica, Aspergillus fumigatus, Mycobacterium tuberculosis complex, Mycoplasma pneumoniae, Chlamydia psittaci, Treponema pallidum, Bartonella henselae, Coxiella burnetii, Clostridium tetani, Vibrio vulnificus, wound coccus porus, Helicobacter pylori, Leptospira interrogans, Orientia tsutsugamushi and Tropheryma whipplei;
The list of key-focus pathogen species for respiratory tract samples includes: Escherichia coli, Klebsiella pneumoniae, Klebsiella aerogenes, Staphylococcus aureus, Acinetobacter baumannii, Streptococcus pneumoniae, Pseudomonas aeruginosa, Brucella abortus, Brucella melitensis, Brucella accidentalis, Burkholderia cepacia, Haemophilus influenzae, Nocardia abscessus, Nocardia asteroides, Nocardia asiatica, Nocardia brasiliensis, Nocardia otitidiscaviarum, Proteus mirabilis, Rhodococcus maltophilia, Streptococcus agalactiae, Streptococcus pyogenes, Moraxella catarrhalis, Achromobacter xylosoxidans, Enterobacter cloacae complex, exopathia, isaria meningitidis, Morganella morganii and Serratia marcescens.
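NC_ratio and the parasite rule can be sketched together, because the parasite rule depends only on the sequence count and NC_ratio. The text does not define the RPM normalization. The `rpm` helper below assumes, hypothetically, that RPM means species sequences per million total sequenced reads of the same sample. "na" is represented as `None`.

```python
from typing import Optional


def rpm(reads: int, total_reads: int) -> float:
    """Reads per million. Assumption: normalized to the sample's total sequenced reads."""
    return reads * 1_000_000 / total_reads


def nc_ratio(sample_rpm: float, nc_rpm: float) -> Optional[float]:
    """NC_ratio = RPM in the sample to be tested / RPM in the negative control.
    Returns None ("na") when the species was not detected in the negative control."""
    if nc_rpm == 0:
        return None
    return sample_rpm / nc_rpm


def passes_parasite_screen(reads: int, ratio: Optional[float]) -> bool:
    """Parasite rule: at least 3 sequences, and NC_ratio of at least 3 or na."""
    return reads >= 3 and (ratio is None or ratio >= 3)
```

Treating "na" as passing matches the text. A species absent from the negative control cannot be explained by contamination that is also present in the control.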
It should be noted that, in the present invention, the "same experimental procedure" means using the same processing steps (e.g., extraction and library construction) and the same reagents as the sample to be tested. The "detection frequency" is the number of samples in which the species is detected, divided by the total number of samples of the same type as the sample to be tested. For the abundance definitions, suppose the following for one sample. A bacterial species A has x detected sequences, and species A belongs to genus G. The other species classified under genus G have y sequences in total. All bacterial species together have z sequences. Then the abundance of species A is x/z and the relative abundance of genus G is (x+y)/z. "Same batch" refers to samples tested at the same time and place, by the same group of operators, using the same test procedure.
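The abundance definitions above translate directly into code. In the minimal sketch below, the input format is a hypothetical mapping from species name to (genus, number of sequences), covering every detected species of one class in one sample. A genus-ranking helper is included because step c ranks genera by abundance. Ties in that ranking keep first-seen order, which is an arbitrary choice and not part of the text.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical input: species name -> (genus name, number of sequences) for one class
# (for example, all bacteria) detected in a single sample.
Detections = Dict[str, Tuple[str, int]]


def species_and_genus_abundance(detections: Detections, species: str) -> Tuple[float, float]:
    """Return (abundance of `species`, relative abundance of its genus).

    x = sequences of the species, y = sequences of the other species in the same genus,
    z = sequences of all species in the class; species = x / z, genus = (x + y) / z.
    """
    z = sum(reads for _, reads in detections.values())
    genus, x = detections[species]
    x_plus_y = sum(reads for g, reads in detections.values() if g == genus)
    return x / z, x_plus_y / z


def genus_ranks(detections: Detections) -> Dict[str, int]:
    """Rank genera by total sequences (1 = most abundant); ties keep first-seen order."""
    totals: Dict[str, int] = defaultdict(int)
    for genus, reads in detections.values():
        totals[genus] += reads
    ordered = sorted(totals, key=lambda g: totals[g], reverse=True)
    return {genus: rank for rank, genus in enumerate(ordered, start=1)}
```

For example, suppose a sample has 60 sequences of Klebsiella pneumoniae, 20 of Klebsiella aerogenes and 20 of Escherichia coli. Then K. pneumoniae has abundance 0.6 and the genus Klebsiella has relative abundance 0.8.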
In the invention, the list of key pathogenic species for all sample types and the list of key pathogenic species for respiratory tract samples contain clinically common pathogens, in particular those that are easily missed during analysis when their concentration in the sample to be tested is low; these species require particular attention to reduce the risk of omission during microbial screening. Because the respiratory tract hosts a relatively complex micro-ecosystem, the pathogenic species commonly found in respiratory tract samples are compiled into a separate list to catch omissions. The key respiratory pathogens are mostly also detected in other sample types, but they are especially likely to appear in respiratory tract samples, where they must be judged and screened in combination with other clinical information.
The analysis method provided by the invention is semi-automatic: the original microorganism detection list is processed by a specific screening method that removes interfering species step by step while retaining the results more likely to be genuinely present in the sample. This narrows the range that must be examined when interpreting the pathogen report, and the correlation between the clinical information of the sample and the reported results is left to the clinician or report interpreter for the final judgment, which reduces the risk of species being removed by mistake.
In some alternative embodiments, the sample to be tested comprises a clinical sample;
The negative control sample is a healthy human sample.
In some alternative embodiments, step a further comprises labeling, according to the comparison result, the microorganisms detected in the sample to be tested that belong to the common micro-ecological flora.
In some alternative embodiments, step a further comprises calculating the abundance of the genus and the abundance of the species within the genus.
In some alternative embodiments, in step b, at least 4 samples of the same type as, and processed with the same experimental procedure as, the sample to be tested are counted, and the detection frequency of each microorganism in that sample type is calculated.
In some optional embodiments, in step b, the samples of the same type and experimental procedure as the sample to be tested are samples tested in the laboratory within a selected time period, for example within 1 month before the sample to be tested is tested. The closer the selected time period is to the test time of the sample to be tested, the better the background interference from reagents, consumables and the experimental environment can be eliminated.
This way of constructing the background library file does not require separate collection and evaluation of environmental samples. Samples are tested in a laboratory that has undergone standard sterilization, using reagents that have passed quality inspection, and the test results are accumulated periodically as monitoring and reference data, which also helps eliminate nucleic acid interference from the reagent background and the laboratory environment. Compared with establishing a reagent and laboratory-environment background reference from a single test, a regularly rebuilt dynamic background library offers a larger sample size and references from similar samples.
In some alternative embodiments, the type of sample to be tested includes, but is not limited to, a cerebrospinal fluid sample, a respiratory tract sample, or a blood sample.
In a second aspect, the invention provides a pathogenic microorganism data analysis device, which comprises a species comparison result acquisition module, a background library acquisition module, a screening module and an output module;
The species comparison result acquisition module is used for acquiring the species comparison results obtained after mNGS bioinformatic analysis of a sample to be tested and its negative control sample, classifying the detected microorganisms into clinically pathogenic and non-clinically pathogenic microorganisms according to their clinical pathogenic significance, and dividing them by taxonomic group into three major categories: bacteria and fungi, viruses, and parasites;
the background library acquisition module is used for counting the detection frequency of each microorganism among samples of the same type as, and processed with the same experimental procedure as, the sample to be tested, so as to obtain a background library;
the screening module is used for screening qualifying species according to the species comparison results, and comprises a bacteria and fungi screening module, a virus screening module, a parasite screening module and a key pathogenic species screening module;
the bacteria and fungus screening module, the virus screening module and the parasite screening module are used for screening clinical pathogenic microorganisms;
The bacteria and fungi screening module is used for screening bacteria and fungi and listing the qualifying species in pathogen candidate list 1; the screening conditions comprise: the genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance; the detection rate in the background library is less than 25% or not recorded; NC_ratio is not less than 3 or is na; and either the species ranks first by intra-genus abundance with a sequence count of at least 3, or it ranks second by intra-genus abundance, does not belong to the common micro-ecological flora, has a sequence count of at least 3, and its sequence count is greater than 0.1 times that of the species ranked first in the genus;
The virus screening module is used for screening viruses and listing the qualifying species in pathogen candidate list 1; the screening conditions comprise: the sequence count is at least 3 for a virus commonly carried by the human body or a suspected background species, or at least 1 for any other virus; a suspected background species is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested;
the parasite screening module is used for screening parasites and listing the qualifying species in pathogen candidate list 1; the screening conditions comprise: the sequence count is at least 3, and NC_ratio is at least 3 or is na;
The key pathogenic species screening module is used for screening key pathogenic species and listing the qualifying species in pathogen candidate list 2; the screening conditions comprise: the species belongs to the list of key pathogenic species for all sample types; or the sample to be tested is a respiratory tract sample and the species belongs to the list of key pathogenic species for respiratory tract samples, ranks in the top 2 by intra-genus species abundance, and has a sequence count of at least 3;
the output module is used for merging and de-duplicating the pathogen candidate list 1 and the pathogen candidate list 2 to obtain a pathogen candidate list;
NC_ratio = RPM value of a species in the sample to be tested / RPM value of the species in the negative control sample; if the species is not detected in the negative control sample, NC_ratio is denoted as na;
The list of key pathogenic species for all sample types includes: Cryptococcus neoformans, Yarrowia, Trichosporon asahii, Aspergillus fumigatus, Mycobacterium tuberculosis complex, Mycoplasma pneumoniae, Chlamydia psittaci, Treponema pallidum, Bartonella henselae, Coxiella burnetii, Clostridium tetani, Vibrio vulnificus, wound coccus porus, Helicobacter pylori, Leptospira interrogans, Orientia tsutsugamushi and Tropheryma whipplei;
The list of key pathogenic species for respiratory tract samples includes: Escherichia coli, Klebsiella pneumoniae, Klebsiella aerogenes, Staphylococcus aureus, Acinetobacter baumannii, Streptococcus pneumoniae, Pseudomonas aeruginosa, Brucella abortus, Brucella melitensis, Brucella accidentalis, Burkholderia cepacia, Haemophilus influenzae, Nocardia abscessus, Nocardia asteroides, Nocardia asiatica, Nocardia brasiliensis, Nocardia otitidiscaviarum, Proteus mirabilis, Stenotrophomonas maltophilia, Streptococcus agalactiae, Streptococcus pyogenes, Moraxella catarrhalis, Achromobacter xylosoxidans, Enterobacter cloacae complex, exopathia, isaria meningitidis, Morganella morganii and Serratia marcescens.
The device is based on the pathogenic microorganism data analysis method and has a simple structure. It performs a primary screening of clinically relevant pathogenic microorganisms and a supplementary screening of key microorganisms on the acquired original detection list of sample microorganisms, and generates a pathogen report list suited to review by auditors. The device provides clinicians and report interpreters with concise yet comprehensive clinical microorganism detection information, improves the efficiency of pathogen report interpretation, reduces the risk of missing microorganisms in a sample, and helps accumulate sample data.
In some optional embodiments, the device further comprises an abundance calculation module for calculating, according to the species comparison results, the genus abundance and the intra-genus species abundance of the microorganisms detected in the sample to be tested.
In some optional embodiments, the device further comprises an mNGS bioinformatic analysis module, which performs mNGS bioinformatic analysis on the sequencing data of the sample to be tested and its negative control sample and outputs the resulting species comparison results to the species comparison result acquisition module.
In some alternative embodiments, the sample to be tested comprises a clinical sample;
The negative control sample is a healthy human sample;
Types of the sample to be tested include, but are not limited to, a cerebrospinal fluid sample, a respiratory tract sample, or a blood sample.
In some alternative embodiments, the background library acquisition module counts at least 4 samples of the same type as, and processed with the same experimental procedure as, the sample to be tested, and calculates the detection frequency of each microorganism in that sample type.
In some alternative embodiments, the samples of the same type and experimental procedure as the sample to be tested are samples tested in the laboratory within a selected time period, for example within 1 month before the sample to be tested is tested. The closer the selected time period is to the test time of the sample to be tested, the better the background interference from reagents, consumables and the experimental environment can be eliminated.
In a third aspect, the present invention provides a processor for running a program, wherein the program, when run, performs the pathogenic microorganism data analysis method.
The processor can generate a pathogen report list suited to review by auditors, improve the efficiency of pathogen report interpreters, reduce the risk of microorganisms being missed from a sample report, and help accumulate sample data.
The invention is further illustrated by the following specific examples, but it should be understood that these examples are for the purpose of illustration only and are not to be construed as limiting the invention in any way.
Example 1
A pathogenic microorganism data analysis method, whose analysis flow is shown in Figure 1, comprises the following steps:
1) Obtaining the species comparison results after mNGS bioinformatic analysis of the samples:
High-throughput sequencing is performed on the sample to be tested to obtain the original microorganism list of the sample detection data. In addition to conventional clinical samples, the samples analyzed here include a negative control (NC) sample from the same analysis batch; the negative control sample is a healthy human sample. According to clinical database information, the original microorganism list is classified into three major categories: common micro-ecological flora, special pathogens and other pathogens. At the same time, the microorganisms are preliminarily labeled by clinical pathogenic significance to distinguish clinically pathogenic from non-clinically pathogenic microorganisms, and the species detected in the conventional sample are matched against the reference information from the negative control sample. The microorganisms originally detected in the sample to be tested are divided by taxonomic group into three major categories (bacteria and fungi, viruses, and parasites), and, based on the taxonomic relationships of the detected species, the abundance of each corresponding taxonomic unit is calculated from the number of sequences detected for each species. For every detected species, the abundance of the species and of its genus is calculated.
2) Obtaining background library species information corresponding to current analysis:
The background library species information comes from periodic, centralized processing of the microorganism detection results of historical samples. A fixed time interval is designated (the month preceding the test of the sample to be tested is recommended), and the frequency with which each species occurs across samples processed with the same experimental procedure is counted. Samples are grouped by broad type, for example cerebrospinal fluid samples, respiratory tract samples, blood samples and others. For the sample set of the selected period, the distribution of microbial detections in each sample type is counted in this way, and the detection frequency of each species in the corresponding sample type (number of samples of that type in which the species is detected / number of samples of that type) is calculated. The resulting file is the background library file used for subsequent filtering analysis.
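The background library can be sketched as follows, assuming each historical sample is available as a (test date, sample type, set of detected species) record that has already been restricted to the same experimental procedure. The 30-day window stands in for the recommended 1-month interval and the 4-sample minimum follows the "at least 4 samples" embodiment; `build_background_library` and the record layout are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def build_background_library(history, sample_type, test_date,
                             window_days=30, min_samples=4):
    """Detection frequency of each species among recent samples of one type.

    history: iterable of (test_date, sample_type, set_of_detected_species),
    assumed to contain only samples run with the same experimental procedure.
    """
    start = test_date - timedelta(days=window_days)
    selected = [species for day, kind, species in history
                if kind == sample_type and start <= day < test_date]
    if len(selected) < min_samples:
        raise ValueError("too few reference samples for a background library")
    # Each sample is a set, so it contributes at most one count per species.
    counts = Counter(sp for species in selected for sp in species)
    return {sp: n / len(selected) for sp, n in counts.items()}
```

Calling it with a `date` for the sample under test returns a species-to-frequency mapping; a species absent from the mapping counts as "not recorded" in the later background check.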
3) Primary screening of clinically relevant pathogenic microorganisms:
The following screening is performed on clinically pathogenic microorganisms:
a. Bacteria & fungi: bacteria and fungi are among the most abundant species detected in samples and also account for a considerable proportion of clinically relevant pathogenic microorganisms. Using the calculated abundance values, the sample detection results are sorted in descending order, first by genus abundance and then by species abundance, and each detected species is evaluated in turn as follows:
i. Ranking requirement: if the genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance, the species enters the next round of screening; otherwise, it is eliminated.
ii. Background discrimination: referring to the background library file, if the species' detection rate in the background library is less than 25% or is not recorded, it enters the next round of screening; otherwise, it is eliminated.
iii. Negative control discrimination: referring to the detection result of the species in the negative control sample, the difference between the detections in the conventional sample and in the negative control is calculated. For a species s detected in both the negative control sample and the conventional sample, the corresponding read counts are normalized by the data volume of each sample to obtain two RPM (reads per million) values, one for the negative control sample and one for the conventional sample, and NC_ratio = RPM(sample)/RPM(negative control). If species s is detected only in the conventional sample, NC_ratio is denoted as na. If NC_ratio is at least 3 or is na, the species enters the next round of screening; otherwise, it is eliminated.
iv. Intra-genus ranking discrimination: if the species ranks first by intra-genus abundance and its sequence count is at least 3, it enters pathogen candidate list 1 directly. If the species ranks second by intra-genus abundance and is labeled as common micro-ecological flora, it is eliminated; otherwise, if its sequence count is at least 3 and greater than 0.1 times the sequence count of the species ranked first in the genus, it enters pathogen candidate list 1. All other species are eliminated.
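Steps i to iv can be sketched as a single predicate, together with the RPM-based NC_ratio from step iii. This is a minimal illustration assuming the ranks, sequence counts, background-library rate and commensal label have already been computed for each species; the `Hit` record, its field names and the use of `None` for "na" or "not recorded" are assumptions, not the patent's data format.

```python
from dataclasses import dataclass
from typing import Optional


def nc_ratio(sample_reads: int, sample_total: int,
             nc_reads: int, nc_total: int) -> Optional[float]:
    """NC_ratio = RPM(sample) / RPM(negative control); None stands for 'na'."""
    if nc_reads == 0:                      # species not detected in the negative control
        return None
    rpm_sample = sample_reads / sample_total * 1e6
    rpm_nc = nc_reads / nc_total * 1e6
    return rpm_sample / rpm_nc


@dataclass
class Hit:
    genus_rank: int                   # genus rank by abundance among bacteria (or fungi)
    in_genus_rank: int                # species rank by abundance within its genus
    reads: int                        # sequence count of the species
    top_in_genus_reads: int           # sequence count of the rank-1 species of the genus
    background_rate: Optional[float]  # background-library detection rate; None = not recorded
    nc: Optional[float]               # NC_ratio; None = na
    commensal: bool                   # labeled as common micro-ecological flora


def passes_bacteria_fungi(h: Hit) -> bool:
    # i. ranking: genus in the top 15, species in the top 2 of its genus
    if h.genus_rank > 15 or h.in_genus_rank > 2:
        return False
    # ii. background library: keep only if rate < 25% or not recorded
    if h.background_rate is not None and h.background_rate >= 0.25:
        return False
    # iii. negative control: keep only if NC_ratio >= 3 or na
    if h.nc is not None and h.nc < 3:
        return False
    # iv. intra-genus ranking
    if h.in_genus_rank == 1:
        return h.reads >= 3
    return (not h.commensal and h.reads >= 3
            and h.reads > 0.1 * h.top_in_genus_reads)
```

Checking the rank cut-offs first mirrors the order of steps i to iv, so most species are rejected before the background and negative-control look-ups are consulted.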
b. Viruses: viral genomes are generally smaller than bacterial and fungal genomes, and fewer viral sequences are detected, so no filtering condition is set during data preparation in order to reduce the risk of false negatives. Species labeled as viruses are first judged against the clinical database reference labels and the background library file. If the species is a virus commonly carried by the human body or a suspected background species (for example human herpesvirus 7) and its detected sequence count is at least 3, it enters pathogen candidate list 1 directly; a virus of any other category enters candidate list 1 if its sequence count is at least 1. A suspected background species is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested.
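A sketch of the virus rule follows, assuming the "commonly carried by the human body" label comes from the clinical database as a boolean and that the same-batch, same-type samples are given as sets of detected species. The text does not say whether the sample under test is itself counted in the batch, so the caller decides; the function names and virus labels are hypothetical.

```python
def is_suspected_background(virus, batch_samples):
    """True if the virus is detected in more than 50% of same-batch, same-type samples."""
    if not batch_samples:
        return False
    hits = sum(1 for detected in batch_samples if virus in detected)
    return hits / len(batch_samples) > 0.5


def passes_virus(virus, reads, human_carried, batch_samples):
    """Candidate list 1 rule for viruses: at least 3 sequences for human-carried
    or suspected background viruses, at least 1 sequence for any other virus."""
    if human_carried or is_suspected_background(virus, batch_samples):
        return reads >= 3
    return reads >= 1
```

The deliberately low threshold of 1 sequence for other viruses reflects the stated aim of avoiding false negatives for small viral genomes.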
c. Parasites: parasite genomes are larger than those of the other categories, and because their sequences are highly similar to the host's, false-positive alignments are likely in mNGS assays. A higher reporting threshold is therefore set for parasite detections: species with fewer than 20 sequences are eliminated directly. Otherwise, NC_ratio is evaluated: if NC_ratio is at least 3 or is na, the species passes and enters candidate list 1; otherwise, it is eliminated.
Candidate list 1 of the sample is obtained from the screening in step 3).
4) Screening of key species of interest:
This step mainly serves to catch omissions: clinically important microorganisms are screened according to sample type. Based on preliminary empirical summaries, a list of key pathogens (Table 1) is compiled into a checklist, and the detection of these key species in the sample is checked to form candidate list 2. For respiratory tract samples (including alveolar lavage fluid, pharyngeal swabs, sputum, etc.), which contain complex microbial communities, additional conditions apply: the species must first rank in the top 2 within its genus, otherwise it is eliminated; then, if its sequence count is at least 3 and it belongs to the key pathogenic species for respiratory tract samples, it enters candidate list 2. The remaining species that do not meet these conditions are eliminated.
Candidate list 2 of the sample is obtained from the screening in step 4).
Table 1: Checklist of key pathogens for all sample types (18 species in total)
Table 2: Checklist of key pathogens for respiratory tract samples (32 species in total)
5) Obtaining the audit list for interpreting the sample's pathogenic species:
This step merges and de-duplicates the two candidate lists obtained in steps 3) and 4). All results are divided into two groups, common micro-ecological flora and other pathogen results, and each group is sorted by the following rules: first by taxonomic group, in the agreed output order bacteria, fungi, viruses, parasites; second, in descending order of total sequence count; third, in descending order of the sequence count of the species within its genus. Finally, taking the common micro-ecological flora labels into account, the species are arranged into two classes, special pathogens and other pathogens, and the whole result is output.
Step 5) yields the worksheet with which the sample enters report interpretation and auditing.
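The parasite rule reduces to a sequence-count threshold plus the NC_ratio check. The device description states a threshold of 3 sequences while this example uses 20, so the sketch below takes the threshold as a parameter (defaulting to 20, as in this example); the function name is hypothetical and `None` stands for an NC_ratio of na.

```python
from typing import Optional


def passes_parasite(reads: int, nc_ratio: Optional[float], min_reads: int = 20) -> bool:
    """Keep a parasite hit only if it clears the read threshold and the NC check."""
    if reads < min_reads:                    # below the reporting limit for parasites
        return False
    return nc_ratio is None or nc_ratio >= 3  # NC_ratio >= 3, or na
```

Passing `min_reads=3` reproduces the threshold given in the device description.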
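The checklist step can be sketched as below. It follows one reading of the rule: any species on the all-sample-type checklist qualifies, while a species on the respiratory checklist qualifies only in a respiratory tract sample, when it ranks in the top 2 within its genus and has at least 3 sequences. The two sets contain only a few names taken from the lists above, and the function name is hypothetical.

```python
# Excerpts only; the full checklists are Tables 1 and 2.
ALL_SAMPLE_KEY = {"Mycoplasma pneumoniae", "Aspergillus fumigatus", "Chlamydia psittaci"}
RESPIRATORY_KEY = {"Klebsiella pneumoniae", "Staphylococcus aureus", "Haemophilus influenzae"}


def passes_key_pathogen(species, respiratory_sample, in_genus_rank, reads,
                        all_key=ALL_SAMPLE_KEY, resp_key=RESPIRATORY_KEY):
    """Candidate list 2 rule for key species of interest."""
    if species in all_key:
        return True
    return (respiratory_sample and species in resp_key
            and in_genus_rank <= 2 and reads >= 3)
```

Keeping the checklists as sets makes the membership tests constant-time and lets a laboratory swap in its own lists without touching the rule.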
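Merging, de-duplication and ordering can be sketched as follows. The `Candidate` record and species labels are hypothetical; "total sequence count" is interpreted here as the sequence count of the species' genus, which is an assumption, and the final arrangement into special pathogens and other pathogens is not modeled.

```python
from dataclasses import dataclass

# Agreed output order of taxonomic groups.
GROUP_ORDER = {"bacteria": 0, "fungi": 1, "virus": 2, "parasite": 3}


@dataclass(frozen=True)
class Candidate:
    name: str
    group: str          # "bacteria", "fungi", "virus" or "parasite"
    genus_reads: int    # assumed meaning of "total sequence count"
    reads: int          # sequence count of the species within its genus
    commensal: bool     # labeled as common micro-ecological flora


def audit_list(list1, list2):
    """Merge candidate lists 1 and 2, drop duplicates, split and sort."""
    merged = {}
    for cand in list(list1) + list(list2):
        merged.setdefault(cand.name, cand)   # keep the first occurrence of each species

    def sort_key(c):
        # group order first, then descending genus and species sequence counts
        return (GROUP_ORDER[c.group], -c.genus_reads, -c.reads)

    return {
        "commensal": sorted((c for c in merged.values() if c.commensal), key=sort_key),
        "other": sorted((c for c in merged.values() if not c.commensal), key=sort_key),
    }
```

Negating the counts in the sort key gives descending order within each group while the group index keeps the bacteria, fungi, viruses, parasites order ascending.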
Test example 1
The mNGS microorganism DNA detection result of an alveolar lavage fluid sample was analyzed using the method of Example 1.
Sample type: alveolar lavage fluid; sample sequencing data volume: 36M; sample sequencing strategy: single ended sequencing 50bp (SE 50). The original detection result of the sample and the final report result after manual examination are shown in Table 3.
Table 3: Original microorganism detection list and pretreatment results for Test Example 1
Note: Special pathogen: indicates whether the detected microorganism is a special pathogen according to the clinical database information. Pathogenic meaning: indicates the level of attention given to the originally detected microorganism according to the clinical database information, i.e. whether it is a Clinical Reportable Pathogen (CRP) or a Clinical Concern Reportable Pathogen (CCRP). Pretreatment result: the result after analysis by the invention, either retained or filtered. Final reported species: whether the species is reported after manual review by the report interpreter.
Description of the analytical procedure:
The sample was processed with a general mNGS experimental procedure, and the sequencing data were analyzed with an mNGS pathogen analysis pipeline built in the laboratory (comprising data quality control, human host read filtering, alignment against a pathogen genome database, and other general analysis modules). Seven microorganisms were detected, all clinically reportable pathogens: 4 bacteria, 1 fungus and 2 parasites.
The results were analyzed with the method of Example 1. All 5 bacterial and fungal pathogens met the first requirement (genus in the top 15 and species ranked first within its genus), but 4 of them were eliminated because their sequence counts were below 3. Both parasites had sequence counts below 20 and were therefore eliminated. Mycoplasma pneumoniae thus entered candidate list 1. Meanwhile, the original detection results were compared with the key-species checklists, and only Mycoplasma pneumoniae entered candidate list 2; the other pathogens did not meet the additional conditions for respiratory tract samples and were eliminated. After candidate lists 1 and 2 were merged and de-duplicated, only Mycoplasma pneumoniae remained on the list for review. The report interpreter could then confirm Mycoplasma pneumoniae as the pathogenic microorganism of the sample on the basis of the relevant clinical information (e.g. fever, cough, pneumonia diagnosis).
Test example 2
The mNGS microorganism DNA detection result of a plasma sample was analyzed using the method of Example 1.
Sample type: plasma; sample sequencing data volume: 30M; sample sequencing strategy: single ended sequencing 50bp (SE 50).
Part of the original detection results of the sample and the final report results after manual review are shown in Table 4.
Table 4: Original microorganism detection list and pretreatment results for Test Example 2
Description of the analytical procedure:
The sample was processed with a general mNGS experimental procedure, and the sequencing data were analyzed with the laboratory's mNGS pathogen analysis pipeline (same as in Test Example 1). A total of 89 microorganisms were detected: 80 bacteria (33 genera), 5 fungi (5 genera) and 4 viruses (4 genera). For ease of presentation, the list of originally detected bacteria is abridged in Table 4.
The results were analyzed with the method of Example 1. First, non-clinically pathogenic microorganisms were removed according to the pathogenic meaning column; then the filtering steps were applied according to the screening criteria of each group. For bacteria and fungi, all species whose genus ranked below 15th by abundance were excluded; species with a detection rate of 25% or more in the background library were removed; where several species of the same genus were detected, all species ranked below 2nd within the genus were excluded; and finally, species with fewer than 3 sequences were eliminated, so that Mycoplasma pneumoniae and streptococcus in children from the bacterial group entered candidate list 1. No parasites were detected in the sample. All 4 detected viruses had sequence counts of at least 1 and entered candidate list 1. Meanwhile, the original detection results were compared with the key-species checklist, and Mycoplasma pneumoniae entered candidate list 2. After candidate lists 1 and 2 were merged and de-duplicated, 6 pathogenic microorganism results were retained (Mycoplasma pneumoniae, Streptococcus suis, human adenovirus B, human adenovirus 7, human adenovirus B3 and human herpesvirus 5), forming the list for review. The report interpreter then checked the specificity of the viral sequences by online BLAST analysis, excluded the 3 viruses with lower specificity, and reported the likely pathogenic microorganisms of the sample as Mycoplasma pneumoniae, human herpesvirus 5 and Streptococcus suis.
Test example 3
To further verify how the invention supports manual interpretation and auditing, we collected the mNGS test results of 782 samples of different types (analyzed as in Test Example 1) and compared the number of species entering manual review before and after applying the analysis method of Example 1; the composition of the test samples is shown in Table 5. To quantify the improvement in manual review efficiency, three values are first obtained: T, the total number of microorganism species obtained after mNGS analysis of the sample; S, the number of microorganism species remaining after processing by the analysis method; and F, the number of species finally reported after interpretation and review by the report interpreter. Four indicators are then defined: 1. species filtration ratio (filtration ratio for short), the proportion of the total detected species filtered out by the analysis, calculated as (T - S)/T × 100%; 2. effective audit ratio, the proportion of species remaining after pretreatment that are finally reported, calculated as F/S × 100%; 3. reported-species proportion, the proportion of all detected species that are finally reported, calculated as F/T × 100%; 4. efficiency improvement, the difference between the effective audit ratio and the reported-species proportion, which reflects the overall improvement in review efficiency for each sample's detection results after the method is applied.
For the test examples above: Test Example 1 has a filtration ratio of 85.7% (6/7), an effective audit ratio of 100% (1/1), a reported-species proportion of 14.29% (1/7) and an efficiency improvement of 85.71%; Test Example 2 has a filtration ratio of 93.3% (83/89), an effective audit ratio of 50% (3/6), a reported-species proportion of 3.37% (3/89) and an efficiency improvement of 46.6%.
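The four indicators can be computed directly from T, S and F. The helper below is a hypothetical sketch that returns percentages; applied to Test Example 1 (T = 7, S = 1, F = 1) and Test Example 2 (T = 89, S = 6, F = 3), it reproduces the figures given above up to rounding.

```python
def audit_metrics(total, screened, reported):
    """Return the four review-efficiency indicators as percentages.

    total:    T, species detected after mNGS analysis
    screened: S, species left after the screening method
    reported: F, species finally reported after manual review
    """
    filtration = (total - screened) / total * 100
    effective_audit = reported / screened * 100
    reported_share = reported / total * 100
    return {
        "filtration_ratio": filtration,
        "effective_audit_ratio": effective_audit,
        "reported_proportion": reported_share,
        "efficiency_improvement": effective_audit - reported_share,
    }


if __name__ == "__main__":
    for name, (t, s, f) in {"Test Example 1": (7, 1, 1), "Test Example 2": (89, 6, 3)}.items():
        metrics = audit_metrics(t, s, f)
        print(name, {k: round(v, 2) for k, v in metrics.items()})
```

For Test Example 1 this gives a filtration ratio of about 85.71% and a reported-species proportion of about 14.29%.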
Table 5: test sample composition (DNA detection flow)
The performance of the 4 indicators in the test samples is described below by sample type.
Filtration ratio: as shown in FIG. 2, the method filtered out more than 80% of the original microorganism detection results in all 5 tested sample types. The average filtration ratio was 90.14% in alveolar lavage fluid (n=376), 90.79% in cerebrospinal fluid (n=92), 84.91% in whole blood (n=151), 89.88% in sputum (n=103) and 85.07% in tissue (n=60).
Effective auditing ratio: as shown in FIG. 3, the effective auditing rate obtained by the test of the method can reach more than 60% in the sample types of 5 tests. The average effective audit rate was 73.59% (n=376) in alveolar lavage fluid, 62.14% (n=92) in cerebrospinal fluid, 65.39% (n=151) in whole blood, 72.12% (n=103) in sputum, 63.89% (n=60) in tissue. This ratio demonstrates that the range of microorganism species after pretreatment is effectively focused.
Reporting the ratio of detection: as shown in fig. 4, the sample reported species ratio was less than 10% of the original detected ratio in the 5 sample types tested without the treatment of the present method. This means that the number of microorganisms originally detected by different sample types is large, and the workload of manual interpretation and verification is huge if no effective pretreatment method is available. In these test samples, the number of originally detected microorganisms can range from 10≡0 to 10≡3, with the average value reported to be 7.16% (n=376), 5.14% (n=92), 9.19% (n=151) of whole blood, 7.05% (n=103) of sputum, and 8.86% (n=60) of tissue in the alveolar lavage fluid.
Effective rate improvement ratio: as shown in FIG. 5, after pretreatment is finished by using the method, the workload of manual auditing is reduced, and the improvement of the effective auditing proportion is reflected. If the pretreatment method is not used, reporting work is to select less than 10% of microorganisms from the original detection range for judging pathogenicity; after the pretreatment method, the report only needs to be screened in the range after filtration, wherein more than about half of the microorganisms need to be further interpreted as pathogenicity. The average value of the effective improvement is respectively as follows in the sample types of 5 tests: alveolar lavage fluid 66.44% (n=376), cerebrospinal fluid 56.73% (n=92), whole blood 56.20% (n=151), sputum 65.07% (n=103), tissue 55.04% (n=60).
In summary, with the pathogenic microorganism data analysis method provided by the invention, once the original list of detection data for a sample has been obtained, the detected microorganisms are classified and graded according to defined rules and screened separately by detected group (bacteria, fungi, viruses and parasites), while the species of key concern for each sample type are checked. This effectively narrows the target range on the basis of the original detection list and gives subsequent report interpreters a smaller auditing range; the interpreters then judge the pathogenic significance of each species in combination with the clinical information of the sample. The auditing efficiency of report interpreters is thereby improved.
In addition, the analysis method builds the background library dynamically by reference to previously tested samples, which reduces the operational complexity of establishing a background library and makes effective use of the sample data accumulated in the laboratory from past tests.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit of the invention.
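A minimal Python sketch of such a dynamically built background library follows. It assumes each past sample of the same type and experimental workflow is represented as a set of detected species names. The detection frequency is the fraction of those samples in which a species was detected, and the bacteria and fungi rule keeps a species whose frequency is below 25% or that is not recorded. The function names and the `min_samples=4` guard (mirroring the "at least 4 samples" of claim 4) are assumptions for illustration.

```python
from collections import Counter


def build_background_library(historical_samples, min_samples=4):
    """Map each species to its detection frequency across historical samples.

    historical_samples: one set of detected species names per past sample of
    the same sample type, processed with the same experimental workflow.
    """
    n = len(historical_samples)
    if n < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {n}")
    counts = Counter()
    for detected in historical_samples:
        counts.update(set(detected))  # count each species at most once per sample
    return {species: c / n for species, c in counts.items()}


def passes_background_filter(species, library, max_rate=0.25):
    """Bacteria/fungi rule: detection rate below 25%, or species not recorded."""
    rate = library.get(species)
    return rate is None or rate < max_rate


# Hypothetical history of four same-type samples.
history = [
    {"Cutibacterium acnes", "Escherichia coli"},
    {"Cutibacterium acnes"},
    {"Cutibacterium acnes", "Ralstonia pickettii"},
    {"Ralstonia pickettii"},
]
library = build_background_library(history)
print(library["Cutibacterium acnes"])                              # 0.75
print(passes_background_filter("Escherichia coli", library))       # False: 0.25 is not < 0.25
print(passes_background_filter("Klebsiella pneumoniae", library))  # True: not recorded
```

Because the library is just a frequency table over past results, it can be rebuilt whenever new samples of the same type accumulate, which is the practical benefit the paragraph above describes.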

Claims (10)

1. A method for analyzing pathogenic microorganism data, comprising the steps of:
a. obtaining the species comparison results produced by mNGS bioinformatic analysis of a sample to be tested and of its negative control sample; according to the comparison results, dividing the detected microorganisms into clinically pathogenic and non-clinically pathogenic microorganisms by clinical pathogenic significance, and dividing them into three main categories (bacteria and fungi, viruses, and parasites) according to the taxonomic group to which they belong;
b. counting, over samples of the same type as the sample to be tested that were processed with the same experimental workflow, the detection frequency of each microorganism, and using these frequencies as a background library;
c. performing the following screening on the clinically pathogenic microorganisms:
for bacteria and fungi, a species enters pathogen candidate list 1 if all of the following conditions are met:
the genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance; the detection rate in the background library is less than 25% or is not recorded; NC_ratio is at least 3 or is na; the species ranked first by intra-genus abundance has at least 3 reads, or the species ranked second by intra-genus abundance has at least 3 reads and more than 0.1 times the reads of the first-ranked species; and the species does not belong to the common micro-ecological flora;
for viruses, a species enters pathogen candidate list 1 if the following condition is met:
a virus carried by the human body, or a species flagged as suspected background, has at least 3 reads; a virus that is neither carried by the human body nor flagged as suspected background has at least 1 read; a suspected-background species is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested;
for parasites, a species enters pathogen candidate list 1 if the following condition is met:
the number of reads is at least 3, and NC_ratio is at least 3 or is na;
d. obtaining a list of pathogen species of key concern for all sample types and a list of pathogen species of key concern for respiratory tract samples; a detected microorganism enters pathogen candidate list 2 if it meets either of the following conditions:
it belongs to the list of pathogen species of key concern for all sample types; or the sample to be tested is a respiratory tract sample, and the species ranks in the top 2 by intra-genus species abundance, has at least 3 reads and belongs to the list of pathogen species of key concern for respiratory tract samples;
e. merging and de-duplicating pathogen candidate list 1 and pathogen candidate list 2 to obtain the pathogen candidate list;
where NC_ratio = RPM of the species in the sample to be tested / RPM of the species in the negative control sample, and NC_ratio is recorded as na if the species is not detected in the negative control sample;
The list of pathogen species of key concern for all sample types includes: Cryptococcus neoformans, Yarrowia, Trichosporon asahii, Aspergillus fumigatus, Mycobacterium tuberculosis complex, Mycoplasma pneumoniae, Chlamydia psittaci, Treponema pallidum, Bartonella henselae, Coxiella burnetii, Clostridium tetani, Vibrio vulnificus, wound coccus porus, Helicobacter pylori, Leptospira interrogans, Orientia tsutsugamushi and Tropheryma whipplei;
The list of pathogen species of key concern for respiratory tract samples includes: Escherichia coli, Klebsiella pneumoniae, Klebsiella aerogenes, Staphylococcus aureus, Acinetobacter baumannii, Streptococcus pneumoniae, Pseudomonas aeruginosa, Brucella abortus, Brucella melitensis, Brucella accidentalis, Burkholderia cepacia, Haemophilus influenzae, Nocardia abscessus, Nocardia asteroides, Nocardia asiatica, Nocardia brasiliensis, Nocardia otitidiscaviarum, Proteus mirabilis, Stenotrophomonas maltophilia, Streptococcus agalactiae, Streptococcus pyogenes, Moraxella catarrhalis, Achromobacter xylosoxidans, Enterobacter cloacae complex, exopathia, Neisseria meningitidis, Morganella morganii and Serratia marcescens.
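The bacteria and fungi branch of step c, together with the NC_ratio definition, can be sketched in Python as a single predicate over a hypothetical per-species record. The field names, the use of `None` for "na" (and for an unrecorded background rate), and the reading of the two intra-genus read conditions are assumptions drawn from the wording of the claim, not a reference implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical per-species record for a bacterium or fungus."""
    species: str
    genus_rank: int            # rank of the genus among bacterial (or fungal) genera
    intra_genus_rank: int      # rank of the species within its genus, 1 = top
    reads: int
    top_species_reads: int     # reads of the rank-1 species in the same genus
    rpm: float                 # RPM in the sample to be tested
    nc_rpm: Optional[float]    # RPM in the negative control; None if not detected
    background_rate: Optional[float]  # background-library rate; None if not recorded
    is_common_flora: bool


def nc_ratio(rpm: float, nc_rpm: Optional[float]) -> Optional[float]:
    """NC_ratio = sample RPM / negative-control RPM; None stands for 'na'."""
    if nc_rpm is None or nc_rpm == 0:
        return None
    return rpm / nc_rpm


def passes_bacteria_fungi(h: Hit) -> bool:
    """All conditions of the bacteria/fungi branch must hold."""
    if h.intra_genus_rank == 1:
        reads_ok = h.reads >= 3
    elif h.intra_genus_rank == 2:
        reads_ok = h.reads >= 3 and h.reads > 0.1 * h.top_species_reads
    else:
        reads_ok = False  # only the top-2 species of a genus can pass
    ratio = nc_ratio(h.rpm, h.nc_rpm)
    return (
        h.genus_rank <= 15
        and reads_ok
        and (h.background_rate is None or h.background_rate < 0.25)
        and (ratio is None or ratio >= 3)
        and not h.is_common_flora
    )


hit = Hit("Klebsiella pneumoniae", genus_rank=2, intra_genus_rank=1, reads=120,
          top_species_reads=120, rpm=8.0, nc_rpm=1.0, background_rate=0.10,
          is_common_flora=False)
print(passes_bacteria_fungi(hit))  # True
# A rank-2 species with 5 reads fails when the top species has 100 reads (5 <= 10).
print(passes_bacteria_fungi(replace(hit, intra_genus_rank=2, reads=5,
                                    top_species_reads=100)))  # False
```

Treating "na" as `None` lets a species that is absent from the negative control pass the NC_ratio condition, as the claim requires, without a division by zero.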
2. The method of claim 1, wherein the negative control sample is a healthy human sample.
3. The method of claim 1, wherein step a further comprises: according to the comparison results, labeling those microorganisms detected in the sample to be tested that belong to the common micro-ecological flora, and calculating the genus abundance and the intra-genus species abundance.
4. The method of claim 1, wherein in step b, the detection frequency of each microorganism for the type of the sample to be tested is calculated from at least 4 samples of the same type, processed with the same experimental workflow as the sample to be tested.
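Claim 3 does not define how genus abundance and intra-genus species abundance are computed. A minimal Python sketch, assuming both are relative read counts (genus reads over all reads of the group, species reads over the reads of its genus), that ties are broken alphabetically and that all read counts are positive, could look like this; every name here is hypothetical.

```python
from collections import defaultdict


def abundance_tables(species_reads):
    """species_reads: {(genus, species): reads} for one group, e.g. bacteria.

    Returns (genus_abundance, intra_genus_abundance), both as read fractions.
    """
    genus_reads = defaultdict(int)
    for (genus, _species), reads in species_reads.items():
        genus_reads[genus] += reads
    total = sum(genus_reads.values())
    genus_abundance = {g: r / total for g, r in genus_reads.items()}
    intra_genus_abundance = {
        (g, s): r / genus_reads[g] for (g, s), r in species_reads.items()
    }
    return genus_abundance, intra_genus_abundance


def ranks(scores):
    """Rank keys by descending score (1 = highest); ties broken by key order."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: i + 1 for i, k in enumerate(ordered)}


def intra_genus_ranks(intra_genus_abundance):
    """Rank species separately within each genus."""
    by_genus = defaultdict(dict)
    for (g, s), a in intra_genus_abundance.items():
        by_genus[g][s] = a
    return {(g, s): r for g, scores in by_genus.items()
            for s, r in ranks(scores).items()}


# Hypothetical bacterial read counts.
example_reads = {
    ("Klebsiella", "Klebsiella pneumoniae"): 90,
    ("Klebsiella", "Klebsiella aerogenes"): 10,
    ("Escherichia", "Escherichia coli"): 50,
}
genus_ab, intra_ab = abundance_tables(example_reads)
print(ranks(genus_ab))                                    # {'Klebsiella': 1, 'Escherichia': 2}
print(intra_ab[("Klebsiella", "Klebsiella pneumoniae")])  # 0.9
```

The genus ranks produced this way would feed the "top 15" condition and the intra-genus ranks the "top 2" condition of the bacteria and fungi screening in claim 1.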
5. The method of claim 1, wherein the sample type comprises a cerebrospinal fluid sample, a respiratory tract sample, or a blood sample.
6. A pathogenic microorganism data analysis device, characterized by comprising a species comparison result acquisition module, a background library acquisition module, a screening module and an output module;
The species comparison result acquisition module is used to acquire the species comparison results produced by mNGS bioinformatic analysis of a sample to be tested and of its negative control sample, to divide the detected microorganisms, according to the comparison results, into clinically pathogenic and non-clinically pathogenic microorganisms by clinical pathogenic significance, and to divide them into three main categories (bacteria and fungi, viruses, and parasites) according to the taxonomic group to which they belong;
the background library acquisition module is used to count, over samples of the same type as the sample to be tested that were processed with the same experimental workflow, the detection frequency of each microorganism, to obtain a background library;
the screening module is used to screen out qualifying species according to the species comparison results, and comprises a bacteria and fungi screening module, a virus screening module, a parasite screening module and a key-concern pathogen species screening module;
the bacteria and fungi screening module, the virus screening module and the parasite screening module are used to screen the clinically pathogenic microorganisms;
The bacteria and fungi screening module is used to screen bacteria and fungi and to list qualifying species in pathogen candidate list 1, the screening conditions comprising: the genus ranks in the top 15 by bacterial genus abundance or in the top 15 by fungal genus abundance, and the species ranks in the top 2 by intra-genus species abundance; the detection rate in the background library is less than 25% or is not recorded; NC_ratio is at least 3 or is na; the species ranked first by intra-genus abundance has at least 3 reads, or the species ranked second by intra-genus abundance has at least 3 reads and more than 0.1 times the reads of the first-ranked species; and the species does not belong to the common micro-ecological flora;
The virus screening module is used to screen viruses and to list qualifying species in pathogen candidate list 1, the screening conditions comprising: a virus carried by the human body, or a species flagged as suspected background, has at least 3 reads; a virus that is neither carried by the human body nor flagged as suspected background has at least 1 read; a suspected-background species is a virus detected in more than 50% of the samples of the same type tested in the same batch as the sample to be tested;
the parasite screening module is used to screen parasites and to list qualifying species in pathogen candidate list 1, the screening conditions comprising: the number of reads is at least 3, and NC_ratio is at least 3 or is na;
The key-concern pathogen species screening module is used to screen pathogen species of key concern and to list qualifying species in pathogen candidate list 2, the screening conditions comprising: the species belongs to the list of pathogen species of key concern for all sample types; or it belongs to the list of pathogen species of key concern for respiratory tract samples, ranks in the top 2 by intra-genus species abundance and has at least 3 reads;
the output module is used to merge and de-duplicate pathogen candidate list 1 and pathogen candidate list 2 to obtain the pathogen candidate list;
where NC_ratio = RPM of the species in the sample to be tested / RPM of the species in the negative control sample, and NC_ratio is recorded as na if the species is not detected in the negative control sample;
The list of pathogen species of key concern for all sample types includes: Cryptococcus neoformans, Yarrowia, Trichosporon asahii, Aspergillus fumigatus, Mycobacterium tuberculosis complex, Mycoplasma pneumoniae, Chlamydia psittaci, Treponema pallidum, Bartonella henselae, Coxiella burnetii, Clostridium tetani, Vibrio vulnificus, wound coccus porus, Helicobacter pylori, Leptospira interrogans, Orientia tsutsugamushi and Tropheryma whipplei;
The list of pathogen species of key concern for respiratory tract samples includes: Escherichia coli, Klebsiella pneumoniae, Klebsiella aerogenes, Staphylococcus aureus, Acinetobacter baumannii, Streptococcus pneumoniae, Pseudomonas aeruginosa, Brucella abortus, Brucella melitensis, Brucella accidentalis, Burkholderia cepacia, Haemophilus influenzae, Nocardia abscessus, Nocardia asteroides, Nocardia asiatica, Nocardia brasiliensis, Nocardia otitidiscaviarum, Proteus mirabilis, Stenotrophomonas maltophilia, Streptococcus agalactiae, Streptococcus pyogenes, Moraxella catarrhalis, Achromobacter xylosoxidans, Enterobacter cloacae complex, exopathia, Neisseria meningitidis, Morganella morganii and Serratia marcescens.
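The virus and parasite screening modules of claim 6 can be sketched in Python as follows. The set of human-carried viruses is a hypothetical placeholder (the claims do not enumerate them), suspected background is computed from the same-type samples of the batch as the claim describes, and `None` stands for a species not detected in the negative control (NC_ratio of na). Whether the batch passed in should include the sample under test is an assumption left to the caller.

```python
from typing import Iterable, Optional, Set

# Hypothetical placeholder; the claims do not enumerate human-carried viruses.
HUMAN_CARRIED = {"Torque teno virus", "Human gammaherpesvirus 4"}


def suspected_background(batch_same_type: Iterable[Set[str]]) -> Set[str]:
    """Viruses detected in more than 50% of same-type samples in the same batch."""
    samples = list(batch_same_type)
    counts = {}
    for detected in samples:
        for virus in set(detected):
            counts[virus] = counts.get(virus, 0) + 1
    return {v for v, c in counts.items() if c / len(samples) > 0.5}


def passes_virus(name: str, reads: int, human_carried: Set[str],
                 background: Set[str]) -> bool:
    """At least 3 reads if human-carried or suspected background, else at least 1."""
    threshold = 3 if (name in human_carried or name in background) else 1
    return reads >= threshold


def passes_parasite(reads: int, rpm: float, nc_rpm: Optional[float]) -> bool:
    """At least 3 reads, and NC_ratio >= 3 or na (not detected in the control)."""
    if nc_rpm is None or nc_rpm == 0:
        ratio_ok = True
    else:
        ratio_ok = rpm / nc_rpm >= 3
    return reads >= 3 and ratio_ok


batch = [
    {"Torque teno virus", "Human mastadenovirus B"},
    {"Torque teno virus"},
    {"Human alphaherpesvirus 1"},
]
background = suspected_background(batch)  # Torque teno virus: 2 of 3 samples
print(passes_virus("Human mastadenovirus B", 1, HUMAN_CARRIED, background))  # True
print(passes_virus("Torque teno virus", 2, HUMAN_CARRIED, background))       # False
print(passes_parasite(4, rpm=9.0, nc_rpm=None))                              # True
```

The lower 1-read threshold for viruses that are neither human-carried nor batch background reflects the claim's intent to keep rare, potentially significant viral hits, while commonly carried or batch-wide viruses need more supporting reads.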
7. The pathogenic microorganism data analysis device of claim 6, further comprising an abundance calculation module for calculating, based on the species comparison results, the genus abundance and the intra-genus species abundance of the microorganisms detected in the sample to be tested.
8. The pathogenic microorganism data analysis device of claim 6, further comprising an mNGS bioinformatic analysis module for performing mNGS bioinformatic analysis on the sequencing data of the sample to be tested and of its negative control sample, and for outputting the resulting species comparison results to the species comparison result acquisition module.
9. The pathogenic microorganism data analysis device of claim 6, wherein the negative control sample is a healthy human sample;
the type of the sample to be tested comprises a cerebrospinal fluid sample, a respiratory tract sample or a blood sample.
10. A processor, wherein the processor is configured to run a program, and the program, when run, performs the pathogenic microorganism data analysis method of any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410323192.3A CN117935918B (en) 2024-03-21 2024-03-21 Pathogenic microorganism data analysis method and device and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410323192.3A CN117935918B (en) 2024-03-21 2024-03-21 Pathogenic microorganism data analysis method and device and processor

Publications (2)

Publication Number Publication Date
CN117935918A (en) 2024-04-26
CN117935918B (en) 2024-07-02

Family

ID=90761182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410323192.3A Active CN117935918B (en) 2024-03-21 2024-03-21 Pathogenic microorganism data analysis method and device and processor

Country Status (1)

Country Link
CN (1) CN117935918B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160053295A1 (en) * 2009-05-07 2016-02-25 Biomerieux, Inc. Methods for Antimicrobial Resistance Determination
CN110751984A (en) * 2019-10-31 2020-02-04 广州微远基因科技有限公司 Automatic analysis method and system for sequencing data of metagenome or macrotranscriptome
CN112837745A (en) * 2021-01-15 2021-05-25 广州微远基因科技有限公司 Pathogenic microorganism virulence gene association model and establishment method and application thereof
CN114334005A (en) * 2021-12-06 2022-04-12 上海锐翌生物科技有限公司 Method and system for analyzing and identifying broad-spectrum pathogenic microorganisms
CN117476109A (en) * 2023-11-15 2024-01-30 阿吉安(福州)基因医学检验实验室有限公司 Microbial data analysis method based on super-multiple targeted sequencing technology

Also Published As

Publication number Publication date
CN117935918B (en) 2024-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant