CN116694744A - Multiple pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology - Google Patents


Publication number
CN116694744A
Authority
CN
China
Prior art keywords
pathogen
read data
sequencing read
sequencing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310600810.XA
Other languages
Chinese (zh)
Inventor
张明磊
栾亮
刘静
关会霞
褚美玲
赵汐渟
王继红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Hospital of Shenyang Military Region
Original Assignee
General Hospital of Shenyang Military Region
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Hospital of Shenyang Military Region
Priority to CN202310600810.XA
Publication of CN116694744A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30: Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the technical field of microorganism detection and discloses a multi-pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology, comprising a sampling module, a gene extraction module, a main control module, an amplification reaction module, a gene sequencing module, a pathogen type identification module, a mechanism analysis module and a display module. The pathogen type identification module accurately determines the pathogen types present in the pathogen sample to be tested. The mechanism analysis module acquires whole pathogen genome expression data after silencing of a plurality of pathogen genes, acquires the pathogen gene expression rank sequence corresponding to that data, and acquires whole pathogen genome expression profile data after infection by a plurality of different pathogens. Large volumes of cross-platform transcriptome data are thereby fused, so that pathogens need not be cultured from scratch, cells need not be infected and no large-scale experiment is required, which reduces research and development cost and shortens the detection period.

Description

Multiple pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology
Technical Field
The invention belongs to the technical field of microorganism detection, and particularly relates to a multi-pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology.
Background
Microorganisms include bacteria, viruses, fungi, some small protozoa, microalgae and the like. They are individually tiny and closely related to human life. They include many harmful species and are relevant to a wide range of fields such as food, medicine, industry, agriculture, environmental protection and sports. In Chinese textbooks, microorganisms are classified into the following 8 major categories: bacteria, viruses, fungi, actinomycetes, rickettsia, mycoplasma, chlamydia and spirochetes. Some microorganisms are visible to the naked eye, such as the fungi mushrooms and Ganoderma lucidum. Microorganisms also include a class of "non-cellular organisms" consisting of only a few components, such as nucleic acids and proteins. However, existing microorganism detection instruments identify pathogen types inaccurately. Meanwhile, existing differential expression analysis of pathogen genes analyzes the pathogen gene expression differences caused by infection with an unknown pathogen and then analyzes the damage mechanism of that infection by focusing on the pathogen genes with the largest differential expression fold change; this requires culturing the pathogen from scratch, infecting cells and carrying out large-scale experiments, which is time-consuming and labor-intensive.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) The existing microorganism detection instrument is inaccurate in pathogen type identification.
(2) Existing differential expression analysis of pathogen genes analyzes the pathogen gene expression differences caused by infection with an unknown pathogen and then analyzes the damage mechanism of that infection by focusing on the pathogen genes with the largest differential expression fold change; this requires culturing the pathogen from scratch, infecting cells and carrying out large-scale experiments, which is time-consuming and labor-intensive.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a multi-pathogen detection one-stop type microorganism detection instrument based on a nucleic acid amplification technology.
The invention is realized as a multi-pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology, comprising:
a sampling module, a gene extraction module, a main control module, an amplification reaction module, a gene sequencing module, a pathogen type identification module, a mechanism analysis module and a display module;
the sampling module is connected with the gene extraction module and is used for collecting pathogen samples;
the sampling method of the sampling module comprises the following steps:
the gene extraction module is connected with the sampling module and the main control module and is used for extracting pathogen genes;
The extraction method of the gene extraction module comprises the following steps:
(1) Cell disruption: disrupting microbial cells in the collected pathogen sample by an ultrasonic disruption method to release nucleic acids in the cells;
(2) DNA/RNA isolation: centrifuging the disrupted cell mixture to separate the supernatant containing nucleic acid; for DNA extraction, a DNA-binding column extraction method is used; for RNA extraction, a phenol/chloroform extraction method is used;
(3) Enzyme treatment: performing enzyme treatment on the extracted nucleic acid to remove protein and RNase;
(4) Purifying and concentrating: removing residual contaminants using column chromatography while concentrating the nucleic acid solution for subsequent analysis;
the main control module is connected with the gene extraction module, the amplification reaction module, the gene sequencing module, the pathogen type identification module, the mechanism analysis module and the display module and used for controlling the normal operation of each module;
the amplification reaction module is connected with the main control module and is used for carrying out nucleic acid amplification reaction on pathogen genes;
the reaction method of the amplification reaction module comprises the following steps:
(1) Preparing a PCR reaction system comprising PCR buffer, template DNA, primers, polymerase and dNTPs, and selecting singleplex PCR, multiplex PCR or real-time fluorescent quantitative PCR according to the types and number of pathogens to be detected;
(2) Adding the template DNA and primers to the PCR reaction system and carrying out the PCR reaction in a thermal cycler;
the gene sequencing module is connected with the main control module and is used for sequencing pathogen genes;
the sequencing method of the gene sequencing module comprises the following steps:
(1) Preparing a sample: extracting nucleic acid from the pathogen sample obtained by the sampling module and preparing the nucleic acid as a starting material for sequencing;
(2) Library construction: processing and preparing the extracted pathogen nucleic acid sample, including removing contaminants and low quality nucleic acid fragments that may be present; converting the processed nucleic acid sample into a library for subsequent sequencing analysis;
(3) Sequencing method selection: in a gene sequencing module, selecting a proper sequencing method to sequence pathogen genes;
(4) Sequencing: in the gene sequencing module, performing the selected sequencing method to sequence the pathogen nucleic acid sample; performing paired-end sequencing according to the selected sequencing method;
(5) Data analysis: the obtained sequencing data is imported into a main control module, and data processing and interpretation are carried out by using sequencing data analysis software based on machine learning;
the pathogen type identification module is connected with the main control module and used for identifying pathogen types;
The mechanism analysis module is connected with the main control module and used for analyzing pathogen infection damage mechanisms;
the display module is connected with the main control module and used for displaying a gene sequencing result, a pathogen type recognition result and a pathogen infection damage mechanism analysis result.
Further, the identification method of the pathogen type identification module is as follows:
(1) Acquiring at least one sequencing read data obtained by detecting the nucleic acid sequence of the pathogen sample to be tested; counting the sequencing read data; determining the data type of each sequencing read data in the at least one sequencing read data, and cleaning each sequencing read data according to its data type and a preset data cleaning strategy to obtain at least one first sequencing read data of qualified quality;
(2) Performing preliminary identification of pathogen type on each first sequencing read data using a first identification method and a second identification method respectively, to obtain a first pathogen type identification result and a second pathogen type identification result corresponding to each first sequencing read data; selecting second sequencing read data according to the first pathogen type identification result and the second pathogen type identification result corresponding to each first sequencing read data to obtain at least one second sequencing read data, and classifying the at least one second sequencing read data by identified pathogen type to obtain second sequencing read data sets corresponding to different pathogen types;
(3) For any one of the second sequencing read data sets, aligning each second sequencing read data in that set with the pathogen reference sequence of the pathogen type corresponding to that set, to obtain an alignment result for each second sequencing read data in that set; and determining the final pathogen type identification result of the pathogen sample to be tested according to the alignment result of each second sequencing read data in each second sequencing read data set.
Further, cleaning each sequencing read data in the at least one sequencing read data according to its data type and the preset data cleaning strategy to obtain at least one first sequencing read data of qualified quality comprises:
determining, using FastQC, the read quality of each sequencing read data whose data type is second-generation sequencing data, and selecting each sequencing read data whose quality meets a preset first sequencing read data quality standard as first sequencing read data;
determining, using NanoFilt, the read quality of each sequencing read data whose data type is third-generation sequencing data, and selecting each sequencing read data whose quality meets a preset second sequencing read data quality standard as first sequencing read data.
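The type-dependent cleaning rule above can be sketched in a few lines of Python. This is a minimal illustration only, not the patented implementation: the thresholds in `CLEANING_POLICY`, the tuple layout of a read and the use of an arithmetic mean of Phred scores are all assumptions made for the example. In practice FastQC reports quality metrics and NanoFilt filters long reads; neither tool is called here.

```python
from statistics import mean

# Hypothetical "preset data cleaning strategy": one rule per data type.
# The numeric thresholds are illustrative assumptions, not values from the patent.
CLEANING_POLICY = {
    "second_generation": {"min_mean_quality": 20.0, "min_length": 50},
    "third_generation": {"min_mean_quality": 7.0, "min_length": 500},
}


def mean_phred(quality_string, offset=33):
    """Arithmetic mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    return mean(ord(ch) - offset for ch in quality_string)


def clean_reads(reads, data_type, policy=CLEANING_POLICY):
    """Keep reads (id, sequence, quality) that satisfy the rule for their data type."""
    rule = policy[data_type]
    return [
        read
        for read in reads
        if len(read[1]) >= rule["min_length"]
        and mean_phred(read[2]) >= rule["min_mean_quality"]
    ]


short_reads = [
    ("a", "ACGT" * 20, "I" * 80),   # long enough, high quality
    ("b", "ACGT" * 20, "#" * 80),   # long enough, low quality
    ("c", "ACGT" * 5, "I" * 20),    # too short
]
long_reads = [("d", "A" * 600, "+" * 600)]

first_reads = clean_reads(short_reads, "second_generation")
first_long_reads = clean_reads(long_reads, "third_generation")
```

Keeping the per-type thresholds in one table makes the "preset strategy" explicit and lets a new data type be added without touching the filtering logic.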
Further, selecting the second sequencing read data according to the first pathogen type identification result and the second pathogen type identification result corresponding to each first sequencing read data to obtain at least one second sequencing read data comprises:
for any one of the first sequencing read data, if the first pathogen type identification result corresponding to that first sequencing read data indicates that the identified pathogen type belongs to a preset pathogen type set, taking that first sequencing read data as second sequencing read data;
and for any one of the first sequencing read data, if the second pathogen type identification result corresponding to that first sequencing read data indicates that the identified pathogen type belongs to the preset pathogen type set, the first alignment score between that first sequencing read data and the pathogen reference sequence corresponding to the identified pathogen type is greater than or equal to a preset score threshold, and the first match length ratio, that is, the ratio of the length of that first sequencing read data matching the pathogen reference sequence corresponding to the identified pathogen type to the total length of that first sequencing read data, is greater than or equal to a preset first ratio threshold, taking that first sequencing read data as one of the second sequencing read data.
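The two selection rules can be expressed as a single predicate followed by grouping by pathogen type. The sketch below is a hedged illustration: the `ReadCall` record, its field names and the choice to prefer the first method's call when both methods qualify are assumptions introduced for the example and are not stated in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadCall:
    """Hypothetical record of both preliminary calls for one first sequencing read."""
    read_id: str
    read_length: int
    first_type: Optional[str]    # pathogen type from the first identification method
    second_type: Optional[str]   # pathogen type from the second identification method
    alignment_score: float       # first alignment score against the reference of second_type
    matched_length: int          # read bases matching that reference


def is_second_read(call, preset_types, score_threshold, ratio_threshold):
    """Rule 1: first call is in the preset set. Rule 2: second call is in the
    preset set AND score and match length ratio both reach their thresholds."""
    if call.first_type in preset_types:
        return True
    if call.second_type in preset_types and call.read_length > 0:
        ratio = call.matched_length / call.read_length
        return call.alignment_score >= score_threshold and ratio >= ratio_threshold
    return False


def group_second_reads(calls, preset_types, score_threshold, ratio_threshold):
    """Return {pathogen type: [read ids]} for reads kept as second sequencing reads."""
    groups = {}
    for call in calls:
        if not is_second_read(call, preset_types, score_threshold, ratio_threshold):
            continue
        # Assumption: when the first method's call qualifies, it names the group.
        pathogen = call.first_type if call.first_type in preset_types else call.second_type
        groups.setdefault(pathogen, []).append(call.read_id)
    return groups


preset = {"Klebsiella pneumoniae", "Candida albicans"}
calls = [
    ReadCall("r1", 100, "Klebsiella pneumoniae", None, 0.0, 0),
    ReadCall("r2", 100, None, "Candida albicans", 60.0, 90),
    ReadCall("r3", 100, None, "Candida albicans", 60.0, 40),
    ReadCall("r4", 100, "Homo sapiens", "Homo sapiens", 99.0, 100),
]
second_read_sets = group_second_reads(calls, preset, score_threshold=50.0, ratio_threshold=0.8)
```

Here `r3` is dropped because its match length ratio (0.4) is below the threshold, and `r4` is dropped because its type is outside the preset set.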
Further, the alignment result includes: a second alignment score, a second match length ratio, a sequence similarity and an alignment expectation value of each second sequencing read data against the pathogen reference sequence corresponding to the identified pathogen type;
determining the final pathogen type identification result of the pathogen sample to be tested according to the alignment result of each second sequencing read data in each second sequencing read data set comprises:
for any second sequencing read data set, selecting the second sequencing read data with the highest second alignment score in that set as target sequencing read data, and judging whether the target sequencing read data simultaneously satisfies the following: the second match length ratio is greater than or equal to a preset second ratio threshold, the sequence similarity is greater than or equal to a preset sequence similarity threshold, and the alignment expectation value is less than or equal to a preset alignment expectation threshold; if so, determining that the pathogen type corresponding to the target sequencing read data is present in the pathogen sample to be tested;
and determining the final pathogen type identification result according to all pathogen types determined to be present in the pathogen sample to be tested.
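As a rough sketch of this confirmation step, the function below picks the best-scoring read per pathogen type and applies the three thresholds. The `Alignment` record, its field names and the example threshold values are hypothetical; treating the alignment expectation value like a BLAST-style E-value (smaller is better) is an interpretation, not something the source states.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical alignment result of one second sequencing read."""
    read_id: str
    score: float               # second alignment score
    match_length_ratio: float  # second match length ratio
    similarity: float          # sequence similarity, 0..1
    expectation: float         # alignment expectation value (smaller is better)


def pathogen_present(alignments, ratio_threshold, similarity_threshold, expectation_threshold):
    """Judge one pathogen type using only its highest-scoring (target) read."""
    if not alignments:
        return False
    target = max(alignments, key=lambda a: a.score)
    return (
        target.match_length_ratio >= ratio_threshold
        and target.similarity >= similarity_threshold
        and target.expectation <= expectation_threshold
    )


def final_identification(read_sets, ratio_threshold, similarity_threshold, expectation_threshold):
    """Return the sorted list of pathogen types judged present in the sample."""
    return sorted(
        pathogen
        for pathogen, alignments in read_sets.items()
        if pathogen_present(alignments, ratio_threshold, similarity_threshold, expectation_threshold)
    )


read_sets = {
    "Klebsiella pneumoniae": [
        Alignment("r1", 180.0, 0.95, 0.99, 1e-40),
        Alignment("r2", 90.0, 0.50, 0.80, 1e-3),
    ],
    "Candida albicans": [Alignment("r3", 75.0, 0.60, 0.97, 1e-10)],
}
present = final_identification(
    read_sets, ratio_threshold=0.8, similarity_threshold=0.95, expectation_threshold=1e-5
)
```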
Further, the method further comprises:
carrying out statistical analysis on the sequencing read data of each identified pathogen type according to the final pathogen type identification result of the pathogen sample to be tested, to obtain a statistical analysis result; wherein the statistical analysis result includes: the pathogen types present in the pathogen sample to be tested, the genus-level and species-level classification of each pathogen type, the composition ratio, the RPM (reads per million) value, and the reference genome coverage.
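Three of these statistics have simple arithmetic definitions, sketched below under common conventions. The source does not define them, so the formulas are assumptions: RPM is taken as reads assigned to a pathogen per million total reads, composition ratio as a pathogen's share of all pathogen-assigned reads, and coverage as the fraction of reference positions covered by at least one read. Function names and the example numbers are illustrative.

```python
def rpm(read_count, total_reads):
    """Reads per million: pathogen reads scaled to one million total reads."""
    return read_count * 1_000_000 / total_reads


def composition_ratios(read_counts):
    """Share of each pathogen type among all pathogen-assigned reads."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {pathogen: count / total for pathogen, count in read_counts.items()}


def genome_coverage(intervals, genome_length):
    """Fraction of the reference covered by at least one read.

    `intervals` are half-open (start, end) alignment positions; overlapping
    intervals are merged on the fly so shared bases are counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


counts = {"Klebsiella pneumoniae": 30, "Candida albicans": 10}
ratios = composition_ratios(counts)
klebsiella_rpm = rpm(counts["Klebsiella pneumoniae"], total_reads=2_000_000)
coverage = genome_coverage([(0, 100), (50, 150), (300, 400)], genome_length=1000)
```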
Further, the method further comprises:
acquiring annotation information for each pathogen type in the final pathogen type identification result, information on the pathogen sample to be tested, and information on the target object from which the pathogen sample to be tested was taken, and generating a detection report according to the statistical analysis result, the annotation information, the sample information and the target object information.
Further, the analysis method of the mechanism analysis module is as follows:
1) Constructing a pathogen database and storing the acquired pathogen gene data in the pathogen database; acquiring whole pathogen genome expression data after silencing of a plurality of pathogen genes; acquiring the pathogen gene expression rank sequence corresponding to the whole pathogen genome expression data;
2) Obtaining, from a database, whole pathogen genome expression profile data after infection by a plurality of different pathogens; constructing an imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data; acquiring the enrichment score of the pathogen gene expression rank sequence and the imprinted pathogen gene set; and determining the damage mechanism of the pathogen infection according to the enrichment score.
Further, constructing the imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data comprises:
obtaining the differential expression quantity corresponding to the whole pathogen genome expression profile data;
sorting the pathogen genes corresponding to the whole pathogen genome expression profile data in descending order of differential expression quantity to obtain a ranked sequence;
obtaining a plurality of pathogen genes from the top and the bottom of the ranked sequence;
identifying the plurality of pathogen genes as the imprinted pathogen gene set for the pathogen infection;
and determining the damage mechanism of the pathogen infection according to the enrichment score comprises:
determining whether the enrichment score is a positive number;
if so, determining, according to the enrichment score, that the cellular response to the pathogen gene silencing is consistent with the cellular response to the pathogen infection damage;
wherein determining whether the enrichment score is a positive number comprises:
determining whether the target pathogen gene subset of the imprinted pathogen gene set is located at the corresponding position of the pathogen gene expression rank sequence under the pathogen gene silencing;
if so, determining that the enrichment score is a positive number;
alternatively, determining the damage mechanism of the pathogen infection according to the enrichment score comprises:
constructing, with the enrichment score as the weight, an association network between the pathogen genes used for the pathogen gene silencing and the pathogens used for the pathogen infection;
acquiring all association relations with the highest enrichment score in the association network, wherein each association relation represents the relation between a pathogen gene and a pathogen used for the pathogen infection;
and determining the damage mechanism of the pathogen infection according to the association relations.
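The gene set construction, enrichment scoring and network lookup described above can be illustrated with a small self-contained sketch. It is not the method claimed in the source: the score used here is an unweighted running-sum statistic in the style of gene set enrichment analysis, the top and bottom genes are kept as separate "up" and "down" subsets, and the combined `connectivity` value is one common way to obtain a signed score. All function names, gene labels and numbers are hypothetical.

```python
def build_signature(differential_expression, n):
    """Rank genes by differential expression (descending) and take the top and bottom n."""
    ranked = sorted(differential_expression, key=differential_expression.get, reverse=True)
    return set(ranked[:n]), set(ranked[-n:])


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up on a hit, down on a miss, and
    return the signed deviation from zero with the largest magnitude."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running = best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def connectivity(ranked_genes, up_set, down_set):
    """Positive when infection-up genes sit near the top and infection-down
    genes near the bottom of the gene-silencing rank sequence."""
    return (enrichment_score(ranked_genes, up_set) - enrichment_score(ranked_genes, down_set)) / 2


def strongest_associations(edge_scores):
    """Edges (silenced gene, pathogen) that carry the highest score in the network."""
    top = max(edge_scores.values())
    return sorted(edge for edge, score in edge_scores.items() if score == top)


# Hypothetical infection profile: gene -> differential expression quantity.
infection_profile = {"g1": 3.0, "g2": 2.0, "g3": 0.1, "g4": -0.2, "g5": -2.5, "g6": -3.0}
up_genes, down_genes = build_signature(infection_profile, n=2)

# Hypothetical rank sequences after silencing two different genes.
rank_after_silencing = {
    "GENE_A": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "GENE_B": ["g6", "g5", "g4", "g3", "g2", "g1"],
}
network = {
    (gene, "pathogen_X"): connectivity(ranking, up_genes, down_genes)
    for gene, ranking in rank_after_silencing.items()
}
best_edges = strongest_associations(network)
```

In this toy data, silencing `GENE_A` reproduces the infection signature (score 1.0) while silencing `GENE_B` reverses it (score -1.0), so only the `GENE_A` edge is returned.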
Further, acquiring the whole pathogen genome expression data after silencing of a plurality of pathogen genes comprises:
acquiring the whole pathogen genome expression data after silencing of a plurality of pathogen genes from the LINCS database.
In combination with the technical scheme and the technical problems to be solved, the technical scheme to be protected has the following advantages and positive effects:
Through the pathogen type identification module, the invention makes a preliminary determination of pathogen type on the first sequencing read data of different data types of the pathogen sample to be tested using two different algorithms, selects the second sequencing read data, and then aligns, verifies and screens the second sequencing read data of each identified pathogen type against the corresponding pathogen reference sequence, thereby accurately determining the pathogen types in the pathogen sample to be tested. Meanwhile, the mechanism analysis module acquires whole pathogen genome expression data after silencing of a plurality of pathogen genes; acquires the pathogen gene expression rank sequence corresponding to that data; acquires whole pathogen genome expression profile data after infection by a plurality of different pathogens; constructs an imprinted pathogen gene set for the pathogen infection based on the expression profile data; acquires the enrichment score of the pathogen gene expression rank sequence and the imprinted pathogen gene set; and determines the damage mechanism of the pathogen infection according to the enrichment score. Large volumes of cross-platform transcriptome data are thereby fused, so that the pathogen need not be cultured from scratch, cells need not be infected and no large-scale experiment is required, which reduces research and development cost and shortens the detection period.
Drawings
FIG. 1 is a block diagram of a multi-pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology according to an embodiment of the present invention;
FIG. 2 is a flowchart of the identification method of the pathogen type identification module according to an embodiment of the present invention;
FIG. 3 is a flowchart of the analysis method of the mechanism analysis module according to an embodiment of the present invention.
In FIG. 1: 1. sampling module; 2. gene extraction module; 3. main control module; 4. amplification reaction module; 5. gene sequencing module; 6. pathogen type identification module; 7. mechanism analysis module; 8. display module.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in FIG. 1, a multi-pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology according to an embodiment of the present invention includes: a sampling module 1, a gene extraction module 2, a main control module 3, an amplification reaction module 4, a gene sequencing module 5, a pathogen type identification module 6, a mechanism analysis module 7 and a display module 8.
The sampling module 1 is connected with the gene extraction module 2 and is used for collecting pathogen samples;
the gene extraction module 2 is connected with the sampling module 1 and the main control module 3 and is used for extracting pathogen genes;
the extraction method of the gene extraction module comprises the following steps:
(1) Cell disruption: disrupting microbial cells in the collected pathogen sample by an ultrasonic disruption method to release nucleic acids in the cells;
(2) DNA/RNA isolation: centrifuging the disrupted cell mixture to separate the supernatant containing nucleic acid; for DNA extraction, a DNA-binding column extraction method is used; for RNA extraction, a phenol/chloroform extraction method is used;
(3) Enzyme treatment: performing enzyme treatment on the extracted nucleic acid to remove protein and RNase;
(4) Purifying and concentrating: removing residual contaminants using column chromatography while concentrating the nucleic acid solution for subsequent analysis;
the main control module 3 is connected with the gene extraction module 2, the amplification reaction module 4, the gene sequencing module 5, the pathogen type identification module 6, the mechanism analysis module 7 and the display module 8 and is used for controlling the normal operation of each module;
the amplification reaction module 4 is connected with the main control module 3 and is used for carrying out nucleic acid amplification reaction on pathogen genes;
The reaction method of the amplification reaction module comprises the following steps:
(1) Preparing a PCR reaction system comprising PCR buffer, template DNA, primers, polymerase and dNTPs, and selecting singleplex PCR, multiplex PCR or real-time fluorescent quantitative PCR according to the types and number of pathogens to be detected;
(2) Adding the template DNA and primers to the PCR reaction system and carrying out the PCR reaction in a thermal cycler;
the gene sequencing module 5 is connected with the main control module 3 and is used for sequencing pathogen genes;
the sequencing method of the gene sequencing module comprises the following steps:
(1) Preparing a sample: extracting nucleic acid from the pathogen sample obtained by the sampling module and preparing the nucleic acid as a starting material for sequencing;
(2) Library construction: processing and preparing the extracted pathogen nucleic acid sample, including removing contaminants and low quality nucleic acid fragments that may be present; converting the processed nucleic acid sample into a library for subsequent sequencing analysis;
(3) Sequencing method selection: in a gene sequencing module, selecting a proper sequencing method to sequence pathogen genes;
(4) Sequencing: in the gene sequencing module, performing the selected sequencing method to sequence the pathogen nucleic acid sample; performing paired-end sequencing according to the selected sequencing method;
(5) Data analysis: the obtained sequencing data is imported into a main control module, and data processing and interpretation are carried out by using sequencing data analysis software based on machine learning;
the pathogen type identification module 6 is connected with the main control module 3 and is used for identifying pathogen types;
the mechanism analysis module 7 is connected with the main control module 3 and is used for analyzing pathogen infection damage mechanisms;
and the display module 8 is connected with the main control module 3 and used for displaying a gene sequencing result, a pathogen type recognition result and a pathogen infection damage mechanism analysis result.
As shown in FIG. 2, the identification method of the pathogen type identification module provided by the invention is as follows:
S101, acquiring at least one sequencing read data obtained by detecting the nucleic acid sequence of the pathogen sample to be tested; counting the sequencing read data; determining the data type of each sequencing read data in the at least one sequencing read data, and cleaning each sequencing read data according to its data type and a preset data cleaning strategy to obtain at least one first sequencing read data of qualified quality;
S102, performing preliminary identification of pathogen type on each first sequencing read data using a first identification method and a second identification method respectively, to obtain a first pathogen type identification result and a second pathogen type identification result corresponding to each first sequencing read data; selecting second sequencing read data according to the first pathogen type identification result and the second pathogen type identification result corresponding to each first sequencing read data to obtain at least one second sequencing read data, and classifying the at least one second sequencing read data by identified pathogen type to obtain second sequencing read data sets corresponding to different pathogen types;
S103, for any second sequencing read data set, aligning each second sequencing read data in that set with the pathogen reference sequence of the pathogen type corresponding to that set, to obtain an alignment result for each second sequencing read data in that set; and determining the final pathogen type identification result of the pathogen sample to be tested according to the alignment result of each second sequencing read data in each second sequencing read data set.
In the invention, performing data cleaning on each of the at least one sequencing read data according to a preset data cleaning strategy selected by data type, so as to obtain at least one first sequencing read data of qualified quality, comprises:
determining, using FastQC, the quality of each sequencing read data whose data type is second-generation sequencing data, and screening out, as first sequencing read data, each sequencing read data whose quality meets a preset first sequencing read data quality standard;
determining, using NanoFilt, the quality of each sequencing read data whose data type is third-generation sequencing data, and screening out, as first sequencing read data, each sequencing read data whose quality meets a preset second sequencing read data quality standard.
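The type-dependent cleaning step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Read` structure, the `clean_reads` function, the data type labels, and the threshold values are all hypothetical, and a plain mean-quality and minimum-length filter stands in for whatever criteria the preset quality standards actually define.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Read:
    read_id: str
    sequence: str
    qualities: List[int]  # per-base Phred scores
    data_type: str        # hypothetical labels: "second_generation" / "third_generation"


# Hypothetical preset quality standards, one per data type (values are assumptions).
QUALITY_STANDARDS: Dict[str, Dict[str, float]] = {
    "second_generation": {"min_mean_quality": 20.0, "min_length": 50},
    "third_generation": {"min_mean_quality": 7.0, "min_length": 500},
}


def mean_quality(read: Read) -> float:
    """Arithmetic mean of the Phred scores (a simplification for illustration)."""
    return sum(read.qualities) / len(read.qualities) if read.qualities else 0.0


def clean_reads(reads: List[Read]) -> List[Read]:
    """Keep only reads that meet the standard defined for their own data type."""
    passed = []
    for read in reads:
        standard = QUALITY_STANDARDS.get(read.data_type)
        if standard is None:
            continue  # no cleaning strategy is defined for this data type
        long_enough = len(read.sequence) >= standard["min_length"]
        good_quality = mean_quality(read) >= standard["min_mean_quality"]
        if long_enough and good_quality:
            passed.append(read)
    return passed
```

Keeping the standards in a lookup table keyed by data type mirrors the idea of a cleaning strategy chosen per data type: supporting another sequencing platform would only require another table entry.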
In the invention, selecting second sequencing read data according to the first pathogen type identification result and the second pathogen type identification result of each first sequencing read data, to obtain at least one second sequencing read data, comprises:
for any first sequencing read data, if its first pathogen type identification result indicates that the identified pathogen type belongs to a preset pathogen type set, taking that first sequencing read data as second sequencing read data;
and for any first sequencing read data, if its second pathogen type identification result indicates that the identified pathogen type belongs to the preset pathogen type set, the first alignment score between that first sequencing read data and the pathogen reference sequence of the identified pathogen type is greater than or equal to a preset score threshold, and the first matching length ratio, namely the ratio of the length within that first sequencing read data that matches the pathogen reference sequence of the identified pathogen type to the total length of that first sequencing read data, is greater than or equal to a preset first ratio threshold, taking that first sequencing read data as second sequencing read data.
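The selection rule above can be written as a small predicate. The sketch below is illustrative only: it reads the two conditions as alternatives (a read is kept if either one holds), and the names `SecondMethodResult` and `select_second_read`, the order in which the two results are checked, and all threshold values are assumptions rather than details given in the source.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SecondMethodResult:
    pathogen_type: Optional[str]  # pathogen type reported by the second method
    alignment_score: float        # first alignment score against the reference
    matched_length: int           # number of read bases matched to the reference


def select_second_read(
    read_length: int,
    first_type: Optional[str],
    second: SecondMethodResult,
    preset_types: Set[str],
    score_threshold: float,
    ratio_threshold: float,
) -> Optional[str]:
    """Return the pathogen type under which the read is kept, or None if it is dropped."""
    # Condition 1: the first method alone suffices if its type is in the preset set.
    if first_type is not None and first_type in preset_types:
        return first_type
    # Condition 2: the second method must also pass the score and length-ratio thresholds.
    if (
        second.pathogen_type in preset_types
        and second.alignment_score >= score_threshold
        and read_length > 0
        and second.matched_length / read_length >= ratio_threshold
    ):
        return second.pathogen_type
    return None
```

Returning the pathogen type rather than a boolean makes the subsequent grouping of second sequencing read data by pathogen type a simple dictionary append.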
In the invention, the comparison result comprises: a second alignment score, a second matching length ratio, a sequence similarity, and an alignment expectation value of each second sequencing read data relative to the pathogen reference sequence of the identified pathogen type;
determining the final pathogen type identification result of the pathogen sample to be tested according to the comparison result of each second sequencing read data in each second sequencing read data set comprises:
for any second sequencing read data set, selecting the second sequencing read data with the highest second alignment score in that set as target sequencing read data, and judging whether the target sequencing read data simultaneously satisfies the following: the second matching length ratio is greater than or equal to a preset second ratio threshold, the sequence similarity is greater than or equal to a preset sequence similarity threshold, and the alignment expectation value is less than or equal to a preset alignment expectation threshold; if so, determining that the pathogen type corresponding to the target sequencing read data is present in the pathogen sample to be tested;
and determining the final pathogen type identification result according to all the pathogen types determined to be present in the pathogen sample to be tested.
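The verification step can be sketched as follows: for each pathogen type, take the best-scoring read and test it against the three thresholds. The `Alignment` structure, the function names, and the example threshold values are hypothetical; the source only states that the thresholds are preset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alignment:
    score: float               # second alignment score
    match_length_ratio: float  # second matching length ratio
    similarity: float          # sequence similarity, as a fraction in [0, 1]
    expectation: float         # alignment expectation value (lower is better)


def pathogen_present(
    alignments: List[Alignment],
    min_ratio: float,
    min_similarity: float,
    max_expectation: float,
) -> bool:
    """Check the single best-scoring read of one pathogen type against all thresholds."""
    if not alignments:
        return False
    target = max(alignments, key=lambda a: a.score)  # the target sequencing read
    return (
        target.match_length_ratio >= min_ratio
        and target.similarity >= min_similarity
        and target.expectation <= max_expectation
    )


def final_identification(
    read_sets: Dict[str, List[Alignment]],
    min_ratio: float = 0.8,          # assumed value
    min_similarity: float = 0.9,     # assumed value
    max_expectation: float = 1e-5,   # assumed value
) -> List[str]:
    """Return every pathogen type whose target read passes the checks."""
    return sorted(
        pathogen
        for pathogen, alignments in read_sets.items()
        if pathogen_present(alignments, min_ratio, min_similarity, max_expectation)
    )
```

Note that only the top-scoring read decides the outcome for a pathogen type; a lower-scoring read that would pass all thresholds does not rescue a set whose best-scoring read fails them.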
The method provided by the invention further comprises the following steps:
performing statistical analysis on the sequencing read data of the identified pathogen types according to the final pathogen type identification result of the pathogen sample to be tested, to obtain a statistical analysis result; wherein the statistical analysis result comprises: the pathogen types present in the pathogen sample to be tested and, for each pathogen type, its genus-level and species-level classification, composition ratio, RPM value, and reference genome coverage.
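Three of the listed statistics are simple to compute once reads have been assigned to pathogen types. The sketch below assumes the usual meaning of RPM (reads assigned to a pathogen per million total sequencing reads), defines the composition ratio as each pathogen's share of all pathogen-assigned reads, and measures coverage as the fraction of reference bases covered by at least one aligned read. These definitions and the function names are assumptions; the source does not spell them out.

```python
from typing import Dict, List, Tuple


def rpm(assigned_reads: int, total_reads: int) -> float:
    """Reads per million: reads assigned to one pathogen, scaled to 1e6 total reads."""
    return assigned_reads * 1_000_000 / total_reads


def composition_ratios(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each pathogen type among all pathogen-assigned reads."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {pathogen: count / total for pathogen, count in read_counts.items()}


def genome_coverage(intervals: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the reference covered by half-open alignment intervals [start, end)."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length
```

For example, 50 assigned reads out of 2,000,000 total reads give an RPM of 25, and overlapping alignments are counted only once when computing coverage.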
The method provided by the invention further comprises the following steps:
acquiring annotation information for each pathogen type in the final pathogen type identification result, information on the pathogen sample to be tested, and information on the target object of the pathogen sample to be tested, and generating a detection report according to the statistical analysis result, the annotation information, the pathogen sample information, and the target object information.
As shown in Fig. 3, the analysis method of the mechanism analysis module provided by the invention is as follows:
S201, constructing a pathogen database, and storing the acquired pathogen gene data in the pathogen database; acquiring whole pathogen genome expression data after silencing of a plurality of pathogen genes; acquiring a pathogen gene expression rank sequence corresponding to the whole pathogen genome expression data;
S202, acquiring, from a database, whole pathogen genome expression profile data after infection by a plurality of different pathogens; constructing an imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data; acquiring an enrichment score between the pathogen gene expression rank sequence and the imprinted pathogen gene set; and determining the damage mechanism of the pathogen infection according to the enrichment score.
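The source does not state how the enrichment score is calculated. One common choice for scoring a gene set against a ranked gene list is a running-sum statistic in the style of gene set enrichment analysis; the sketch below uses the unweighted form of that statistic purely as an assumed stand-in. The function name and the gene identifiers used with it are hypothetical.

```python
from typing import List, Set


def enrichment_score(ranked_genes: List[str], gene_set: Set[str]) -> float:
    """Unweighted running-sum enrichment score of gene_set within ranked_genes.

    Walk down the ranking, stepping up at each gene in the set and down at each
    gene outside it, and return the running sum's largest deviation from zero.
    A positive value means the set is concentrated near the top of the ranking,
    a negative value means it is concentrated near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        return 0.0  # the score is undefined here; report no enrichment
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme
```

Under this assumed statistic, the sign of the score carries the interpretation used in the text: a positive score indicates that the gene set sits where the ranking after gene silencing is strongest.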
In the invention, constructing the imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data comprises:
obtaining the differential expression values corresponding to the whole pathogen genome expression profile data;
sorting the pathogen genes corresponding to the whole pathogen genome expression profile data from highest to lowest differential expression value to obtain a ranking;
obtaining a plurality of pathogen genes from the top and the bottom of the ranking;
identifying the plurality of pathogen genes as the imprinted pathogen gene set for the pathogen infection;
and determining the damage mechanism of the pathogen infection according to the enrichment score comprises:
determining whether the enrichment score is a positive number;
if so, determining, according to the enrichment score, that the cellular response to the pathogen gene silencing is consistent with the cellular response to the pathogen infection damage;
wherein determining whether the enrichment score is a positive number comprises:
determining whether a target pathogen gene subset of the imprinted pathogen gene set is located at the corresponding position of the pathogen gene expression rank sequence under the pathogen gene silencing;
if so, determining that the enrichment score is a positive number;
alternatively, determining the damage mechanism of the pathogen infection according to the enrichment score comprises:
constructing, with the enrichment score as the weight, an association network between the pathogen genes used for the pathogen gene silencing and the pathogens used for the pathogen infection;
acquiring all association relations with highest enrichment scores in the association network, wherein the association relations are used for representing the relation between the pathogen genes and pathogens used by pathogen infection;
and determining the damage mechanism of the pathogen infection according to the association relation.
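The gene set construction and the network lookup described above can be sketched together. In this illustration the gene set is taken as the `n` highest and `n` lowest genes by differential expression, and the association network is a plain dictionary mapping (silenced gene, pathogen) pairs to enrichment scores. The function names, the choice of `n`, and the reading of "highest enrichment scores" as all edges tied for the maximum are assumptions.

```python
from typing import Dict, List, Tuple


def imprinted_gene_set(
    differential_expression: Dict[str, float], n: int
) -> Tuple[List[str], List[str]]:
    """Return the n top-ranked and n bottom-ranked genes by differential expression."""
    if n <= 0 or 2 * n > len(differential_expression):
        raise ValueError("n must be positive and at most half the number of genes")
    ranking = sorted(
        differential_expression, key=differential_expression.get, reverse=True
    )
    return ranking[:n], ranking[-n:]


def strongest_associations(
    network: Dict[Tuple[str, str], float]
) -> List[Tuple[str, str]]:
    """Return every (silenced gene, pathogen) edge that carries the maximum weight."""
    if not network:
        return []
    top = max(network.values())
    return sorted(edge for edge, weight in network.items() if weight == top)
```

A usage example with made-up identifiers: `imprinted_gene_set({"a": 3.0, "b": -2.0, "c": 0.5, "d": 1.5}, 1)` yields `(["a"], ["b"])`, and the edges returned by `strongest_associations` are the gene and pathogen pairs from which a damage mechanism would then be inferred.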
In the invention, acquiring the whole pathogen genome expression data after silencing of a plurality of pathogen genes comprises:
acquiring, from the LINCS database, the whole pathogen genome expression data after silencing of a plurality of pathogen genes.
In operation, the embodiment of the invention first collects a pathogen sample through the sampling module 1 and extracts pathogen genes through the gene extraction module 2; second, the main control module 3 performs a nucleic acid amplification reaction on the pathogen genes through the amplification reaction module 4, sequences the pathogen genes through the gene sequencing module 5, and identifies the pathogen type through the pathogen type identification module 6; then, the pathogen infection damage mechanism is analyzed through the mechanism analysis module 7; finally, the gene sequencing result, the pathogen type identification result, and the pathogen infection damage mechanism analysis result are displayed through the display module 8.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, for example provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of the above hardware circuitry and software, such as firmware.
The pathogen type identification module makes a preliminary judgment of the pathogen type for the first sequencing read data of different data types from the pathogen sample to be tested using two different algorithms, selects second sequencing read data, and then compares the second sequencing read data of each identified pathogen type with the corresponding pathogen reference sequence for verification and screening, so that the pathogen types in the pathogen sample to be tested are determined accurately. Meanwhile, the mechanism analysis module acquires whole pathogen genome expression data after silencing of a plurality of pathogen genes; acquires the pathogen gene expression rank sequence corresponding to the whole pathogen genome expression data; acquires whole pathogen genome expression profile data after infection by a plurality of different pathogens; constructs an imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data; acquires the enrichment score between the pathogen gene expression rank sequence and the imprinted pathogen gene set; and determines the damage mechanism of the pathogen infection according to the enrichment score. In this way, massive cross-platform transcriptome data are integrated, and there is no need to culture pathogens from scratch, infect cells, and run large-scale experiments, which reduces research and development cost and shortens the detection period.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto; any modification, equivalent replacement, or improvement made by those skilled in the art within the spirit and principles of the present invention, and within the technical scope disclosed herein, shall fall within the scope of protection of the present invention.

Claims (10)

1. A multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology, characterized in that the multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology comprises:
the system comprises a sampling module, a gene extraction module, a main control module, an amplification reaction module, a gene sequencing module, a pathogen type identification module, a mechanism analysis module and a display module;
the sampling module is connected with the gene extraction module and is used for collecting pathogen samples;
the gene extraction module is connected with the sampling module and the main control module and is used for extracting pathogen genes;
the extraction method of the gene extraction module comprises the following steps:
(1) Cell disruption: disrupting microbial cells in the collected pathogen sample by an ultrasonic disruption method to release nucleic acids in the cells;
(2) DNA/RNA isolation: centrifuging the disrupted cell mixture to separate out the supernatant containing nucleic acid; for DNA extraction, using a DNA binding column extraction method; for RNA extraction, using a phenol/chloroform extraction method;
(3) Enzyme treatment: performing enzyme treatment on the extracted nucleic acid to remove protein and RNase;
(4) Purifying and concentrating: removing residual contaminants using column chromatography while concentrating the nucleic acid solution for subsequent analysis;
the main control module is connected with the gene extraction module, the amplification reaction module, the gene sequencing module, the pathogen type identification module, the mechanism analysis module and the display module and used for controlling the normal operation of each module;
the amplification reaction module is connected with the main control module and is used for carrying out nucleic acid amplification reaction on pathogen genes;
the reaction method of the amplification reaction module comprises the following steps:
(1) Preparing a PCR reaction system comprising a PCR buffer solution, template DNA, primers, polymerase and dNTPs, and selecting singleplex PCR, multiplex PCR, or real-time fluorescent quantitative PCR according to the types and number of pathogens to be detected;
(2) Adding the template DNA and the primers to the PCR reaction system, and performing the PCR reaction in a thermal cycler;
the gene sequencing module is connected with the main control module and is used for sequencing pathogen genes;
the sequencing method of the gene sequencing module comprises the following steps:
(1) Preparing a sample: extracting nucleic acid from the pathogen sample obtained by the sampling module and preparing the nucleic acid as a starting material for sequencing;
(2) Library construction: processing and preparing the extracted pathogen nucleic acid sample, including removing contaminants and low quality nucleic acid fragments that may be present; converting the processed nucleic acid sample into a library for subsequent sequencing analysis;
(3) Sequencing method selection: in a gene sequencing module, selecting a proper sequencing method to sequence pathogen genes;
(4) Performing sequencing: in the gene sequencing module, sequencing the pathogen nucleic acid sample with the selected sequencing method; performing paired-end sequencing according to the selected sequencing method;
(5) Data analysis: importing the obtained sequencing data into the main control module, and processing and interpreting the data using machine-learning-based sequencing data analysis software;
the pathogen type identification module is connected with the main control module and used for identifying pathogen types;
the mechanism analysis module is connected with the main control module and used for analyzing pathogen infection damage mechanisms;
the display module is connected with the main control module and is used for displaying the gene sequencing result, the pathogen type identification result, and the pathogen infection damage mechanism analysis result.
2. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 1, wherein the identification method of the pathogen type identification module is as follows:
(1) Acquiring at least one sequencing read data obtained by nucleic acid sequence detection of a pathogen sample to be tested; counting the sequencing read data; determining the data type of each of the at least one sequencing read data, and performing data cleaning on each sequencing read data according to a preset data cleaning strategy selected by its data type, so as to obtain at least one first sequencing read data of qualified quality;
(2) Performing preliminary pathogen type identification on each of the at least one first sequencing read data using a first identification method and a second identification method respectively, to obtain a first pathogen type identification result and a second pathogen type identification result for each first sequencing read data; selecting second sequencing read data according to the first and second pathogen type identification results of each first sequencing read data to obtain at least one second sequencing read data, and classifying the at least one second sequencing read data by identified pathogen type to obtain second sequencing read data sets corresponding to different pathogen types;
(3) For each second sequencing read data set, comparing each second sequencing read data in the set with the pathogen reference sequence of the pathogen type corresponding to that set, to obtain a comparison result for each second sequencing read data in the set; and determining a final pathogen type identification result of the pathogen sample to be tested according to the comparison results of the second sequencing read data in every second sequencing read data set.
3. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 2, wherein performing data cleaning on each of the at least one sequencing read data according to a preset data cleaning strategy selected by data type, so as to obtain at least one first sequencing read data of qualified quality, comprises:
determining, using FastQC, the quality of each sequencing read data whose data type is second-generation sequencing data, and screening out, as first sequencing read data, each sequencing read data whose quality meets a preset first sequencing read data quality standard;
determining, using NanoFilt, the quality of each sequencing read data whose data type is third-generation sequencing data, and screening out, as first sequencing read data, each sequencing read data whose quality meets a preset second sequencing read data quality standard.
4. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 2, wherein selecting second sequencing read data according to the first pathogen type identification result and the second pathogen type identification result of each first sequencing read data, to obtain at least one second sequencing read data, comprises:
for any first sequencing read data, if its first pathogen type identification result indicates that the identified pathogen type belongs to a preset pathogen type set, taking that first sequencing read data as second sequencing read data;
and for any first sequencing read data, if its second pathogen type identification result indicates that the identified pathogen type belongs to the preset pathogen type set, the first alignment score between that first sequencing read data and the pathogen reference sequence of the identified pathogen type is greater than or equal to a preset score threshold, and the first matching length ratio, namely the ratio of the length within that first sequencing read data that matches the pathogen reference sequence of the identified pathogen type to the total length of that first sequencing read data, is greater than or equal to a preset first ratio threshold, taking that first sequencing read data as second sequencing read data.
5. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 2, wherein the comparison result comprises: a second alignment score, a second matching length ratio, a sequence similarity, and an alignment expectation value of each second sequencing read data relative to the pathogen reference sequence of the identified pathogen type;
determining the final pathogen type identification result of the pathogen sample to be tested according to the comparison result of each second sequencing read data in each second sequencing read data set comprises:
for any second sequencing read data set, selecting the second sequencing read data with the highest second alignment score in that set as target sequencing read data, and judging whether the target sequencing read data simultaneously satisfies the following: the second matching length ratio is greater than or equal to a preset second ratio threshold, the sequence similarity is greater than or equal to a preset sequence similarity threshold, and the alignment expectation value is less than or equal to a preset alignment expectation threshold; if so, determining that the pathogen type corresponding to the target sequencing read data is present in the pathogen sample to be tested;
and determining the final pathogen type identification result according to all the pathogen types determined to be present in the pathogen sample to be tested.
6. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 2, wherein the method further comprises:
performing statistical analysis on the sequencing read data of the identified pathogen types according to the final pathogen type identification result of the pathogen sample to be tested, to obtain a statistical analysis result; wherein the statistical analysis result comprises: the pathogen types present in the pathogen sample to be tested and, for each pathogen type, its genus-level and species-level classification, composition ratio, RPM value, and reference genome coverage.
7. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 2, wherein the method further comprises:
acquiring annotation information for each pathogen type in the final pathogen type identification result, information on the pathogen sample to be tested, and information on the target object of the pathogen sample to be tested, and generating a detection report according to the statistical analysis result, the annotation information, the pathogen sample information, and the target object information.
8. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 1, wherein the analysis method of the mechanism analysis module is as follows:
1) Constructing a pathogen database, and storing the acquired pathogen gene data in the pathogen database; acquiring whole pathogen genome expression data after silencing of a plurality of pathogen genes; acquiring a pathogen gene expression rank sequence corresponding to the whole pathogen genome expression data;
2) Acquiring, from a database, whole pathogen genome expression profile data after infection by a plurality of different pathogens; constructing an imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data; acquiring an enrichment score between the pathogen gene expression rank sequence and the imprinted pathogen gene set; and determining the damage mechanism of the pathogen infection according to the enrichment score.
9. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 8, wherein constructing the imprinted pathogen gene set for the pathogen infection based on the whole pathogen genome expression profile data comprises:
obtaining the differential expression values corresponding to the whole pathogen genome expression profile data;
sorting the pathogen genes corresponding to the whole pathogen genome expression profile data from highest to lowest differential expression value to obtain a ranking;
obtaining a plurality of pathogen genes from the top and the bottom of the ranking;
identifying the plurality of pathogen genes as the imprinted pathogen gene set for the pathogen infection;
and determining the damage mechanism of the pathogen infection according to the enrichment score comprises:
determining whether the enrichment score is a positive number;
if so, determining, according to the enrichment score, that the cellular response to the pathogen gene silencing is consistent with the cellular response to the pathogen infection damage;
wherein determining whether the enrichment score is a positive number comprises:
determining whether a target pathogen gene subset of the imprinted pathogen gene set is located at the corresponding position of the pathogen gene expression rank sequence under the pathogen gene silencing;
if so, determining that the enrichment score is a positive number;
alternatively, determining the damage mechanism of the pathogen infection according to the enrichment score comprises:
constructing, with the enrichment score as the weight, an association network between the pathogen genes used for the pathogen gene silencing and the pathogens used for the pathogen infection;
acquiring all association relations with highest enrichment scores in the association network, wherein the association relations are used for representing the relation between the pathogen genes and pathogens used by pathogen infection;
and determining the damage mechanism of the pathogen infection according to the association relation.
10. The multiple pathogen detection one-stop microorganism detection instrument based on a nucleic acid amplification technology of claim 8, wherein acquiring the whole pathogen genome expression data after silencing of a plurality of pathogen genes comprises:
acquiring, from the LINCS database, the whole pathogen genome expression data after silencing of a plurality of pathogen genes.
CN202310600810.XA 2023-05-25 2023-05-25 Multiple pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology Pending CN116694744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310600810.XA CN116694744A (en) 2023-05-25 2023-05-25 Multiple pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology

Publications (1)

Publication Number Publication Date
CN116694744A true CN116694744A (en) 2023-09-05

Family

ID=87838463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310600810.XA Pending CN116694744A (en) 2023-05-25 2023-05-25 Multiple pathogen detection one-stop microorganism detection instrument based on nucleic acid amplification technology

Country Status (1)

Country Link
CN (1) CN116694744A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination