CN112852916A - Marker combination for intestinal microecology, auxiliary diagnosis model and application of marker combination
- Publication number
- CN112852916A CN202110192133.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/02—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving viable microorganisms
- C12Q1/04—Determining presence or kind of microorganism; Use of selective media for testing antibiotics or bacteriocides; Compositions containing a chemical indicator therefor
- C12Q1/10—Enterobacteria
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/02—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving viable microorganisms
- C12Q1/04—Determining presence or kind of microorganism; Use of selective media for testing antibiotics or bacteriocides; Compositions containing a chemical indicator therefor
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/02—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving viable microorganisms
- C12Q1/04—Determining presence or kind of microorganism; Use of selective media for testing antibiotics or bacteriocides; Compositions containing a chemical indicator therefor
- C12Q1/06—Quantitative determination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/689—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/20—Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2333/00—Assays involving biological materials from specific organisms or of a specific nature
- G01N2333/195—Assays involving biological materials from specific organisms or of a specific nature from bacteria
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2333/00—Assays involving biological materials from specific organisms or of a specific nature
- G01N2333/195—Assays involving biological materials from specific organisms or of a specific nature from bacteria
- G01N2333/24—Assays involving biological materials from specific organisms or of a specific nature from bacteria from Enterobacteriaceae (F), e.g. Citrobacter, Serratia, Proteus, Providencia, Morganella, Yersinia
- G01N2333/25—Shigella (G)
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2333/00—Assays involving biological materials from specific organisms or of a specific nature
- G01N2333/195—Assays involving biological materials from specific organisms or of a specific nature from bacteria
- G01N2333/24—Assays involving biological materials from specific organisms or of a specific nature from bacteria from Enterobacteriaceae (F), e.g. Citrobacter, Serratia, Proteus, Providencia, Morganella, Yersinia
- G01N2333/26—Klebsiella (G)
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2800/00—Detection or diagnosis of diseases
- G01N2800/28—Neurological disorders
- G01N2800/2835—Movement disorders, e.g. Parkinson, Huntington, Tourette
Abstract
The invention discloses a marker combination for intestinal microecology, an auxiliary diagnosis model and applications thereof. The marker combination includes the following microorganisms: Scardovia, Ruminococcus (unnamed; ruminococcus noname), Bilophila, Bacteroides, Gemella, Alistipes, Oxalobacter, Solobacterium, Bifidobacterium and Clostridium (unnamed; clostridium noname). The model of the invention can provide high-accuracy, noninvasive auxiliary diagnosis of Parkinson's disease, with an accuracy of up to 80.3%.
Description
Technical Field
The invention relates to the fields of medicine, biology and bioinformatics, in particular to a marker combination for intestinal microecology, an auxiliary diagnosis model and application thereof.
Background
Parkinson's disease (PD) is a multifocal, progressive neurodegenerative disease and the second most common neurodegenerative disease after Alzheimer's disease. By 2015, the number of patients worldwide was 6.2 million, of whom about 117,000 died from Parkinson's disease. Pathologically, PD is characterized primarily by degeneration of nigral dopaminergic neurons, striatal dopamine depletion, and the formation of abnormal protein aggregates (Lewy bodies) within neurons. Clinically, the main features of PD are resting tremor, rigidity, bradykinesia and gait abnormalities, which are also considered the four cardinal signs of PD. Other characteristics include freezing of gait, postural instability, dysphasia, autonomic dysfunction, paresthesia, mood disorders, sleep disorders, cognitive decline and dementia. Many PD patients show gastrointestinal dysfunction before they develop motor symptoms. A range of PD-associated gastrointestinal dysfunctions has been identified clinically, including weight loss, gastroparesis, constipation and dyschezia. In recent years, metagenomic studies have further examined the association between Parkinson's disease and abnormalities of the intestinal flora, which can be regarded as an extension of the gastrointestinal tract hypothesis to the intestinal flora.
The intestinal flora consists of bacterial communities in the gastrointestinal tract that live in symbiosis with the human host. The development of the intestinal flora is influenced by many factors, such as diet, antibiotic treatment, mode of delivery and breastfeeding. A healthy and stable intestinal flora plays a crucial role in maintaining intestinal barrier integrity, in keeping function, metabolism and immunity in homeostatic balance, and in regulating the gut-brain axis. Recent studies have highlighted the effects of the gut flora on the gut-brain axis and its potential role in central nervous system disorders and neuropsychiatric disorders such as multiple sclerosis, autism, depression and schizophrenia. Intestinal flora and microbial metabolites are known to significantly affect the metabolism, cognition, behavior and immunity of the host, so their role in the pathogenesis of PD is of increasing interest, and some phenotypic correlations have recently been shown. For example, changes in the number and composition of gut microbiota and microbial metabolites are found in PD patients. Therefore, understanding the early interactions between the gut flora and the development of PD will open new avenues for intervention, especially for early diagnosis and treatment of PD.
At present, disease diagnostic models based on the intestinal flora have been reported, such as diagnostic models for colorectal cancer and ulcerative enteritis and predictive models for coronary artery disease. However, apart from the intestinal flora diagnostic model for Alzheimer's disease, drugs that target the intestinal flora to treat degenerative neurological diseases (such as GV-971 for Alzheimer's disease) and intestinal flora diagnostic models for diseases of the central nervous system are mostly still under development. Because early diagnostic markers for Parkinson's disease are lacking, most patients are diagnosed at an advanced stage and the prognosis is poor. Since the diagnosis of Parkinson's disease relies on complicated rating scales and the experience of doctors, a novel diagnostic marker and an efficient diagnostic model for Parkinson's disease are urgently needed to improve prognosis.
From Rehman A et al., Geographical patterns of the standing and active human gut microbiome in health and IBD. Gut 65, 238-248 (2016); Kushugulova A et al., Metagenomic analysis of gut microbial communities from a Central Asian population. BMJ Open 8, e021682 (2018); and Deschasaux M et al., Depicting the composition of gut microbiota in a population with varied ethnic origins but shared geography. Nat. Med. 24, 1526-1531 (2018), it is known that the intestinal flora correlates strongly with diet and ethnicity. Because the dietary structure of Western populations is very different, a diagnostic method tailored to the target population is necessary (S16).
In mainland China, the diversity of the gut microbiome has been analyzed by organizations in five cities (Beijing, Shanghai, Guangzhou, vinblastic and jin) using 16S rRNA amplicon sequencing technology. However, large-scale population research shows that diagnostic models of diseases based on the intestinal flora have a very pronounced regional dependence, and that different diseases are affected by regional factors to different degrees. Therefore, it is necessary to find intestinal flora diagnostic markers for Parkinson's disease and to construct an auxiliary diagnostic model for the auxiliary diagnosis of Parkinson's disease in the population of a selected region.
Disclosure of Invention
To solve the technical problem that the prior art lacks a high-accuracy, noninvasive intestinal flora diagnostic marker and auxiliary diagnosis model for Parkinson's disease, the invention provides an intestinal microecological marker combination, an auxiliary diagnosis model and applications thereof. The diagnostic markers are selected from the intestinal flora to detect Parkinson's disease. Markers targeting the intestinal microecology can serve as a potential noninvasive diagnostic tool for Parkinson's disease in a given region, and the diagnostic accuracy for Parkinson's disease can reach 80.3%.
In the metagenomic analysis of the collected samples, the inventors found that the relative abundance information of intestinal microorganisms from Parkinson's disease patients and healthy people maps mostly to the kingdom Bacteria. Further diversity analysis showed that, at the genus and species levels, α diversity was higher in Parkinson's disease patients than in healthy persons, and that the disease state was associated with changes in intestinal microorganisms. The β diversity assessment showed that, across classification levels, the difference at high classification levels was more pronounced than the difference at low classification levels. Therefore, the inventors selected microorganisms that differ significantly between groups at the genus level, chose a random forest model, verified the role of these genus-level microorganisms in predicting the type of a sample to be tested, and constructed a Parkinson's disease auxiliary diagnosis model based on the random forest model.
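The diversity comparison described above can be illustrated with a minimal Python sketch. The text does not state which diversity metrics were used, so the Shannon index (for α diversity) and the Bray-Curtis dissimilarity (for β diversity) are assumptions chosen as common defaults, and the abundance values are hypothetical numbers for illustration only.

```python
import math

def shannon_index(abundances):
    """Shannon alpha diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h

def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples given as {taxon: abundance} dicts."""
    taxa = set(sample_a) | set(sample_b)
    num = sum(abs(sample_a.get(t, 0.0) - sample_b.get(t, 0.0)) for t in taxa)
    den = sum(sample_a.get(t, 0.0) + sample_b.get(t, 0.0) for t in taxa)
    return num / den if den > 0 else 0.0

# Hypothetical genus-level relative abundances (percent), for illustration only.
patient = {"Bacteroides": 40.0, "Alistipes": 30.0, "Bilophila": 20.0, "Scardovia": 10.0}
control = {"Bacteroides": 70.0, "Alistipes": 30.0}

alpha_patient = shannon_index(patient.values())  # within-sample diversity
alpha_control = shannon_index(control.values())
beta = bray_curtis(patient, control)             # between-sample dissimilarity
```

In this toy data the patient profile is spread over more genera, so its Shannon index is higher than that of the control profile; real comparisons would be made across groups of samples with an appropriate statistical test.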
A first aspect of the invention provides a marker combination for intestinal microecology, the marker combination comprising the following microorganisms: Scardovia, Ruminococcus (unnamed; ruminococcus noname), Bilophila, Bacteroides, Gemella, Alistipes, Oxalobacter, Solobacterium, Bifidobacterium and Clostridium (unnamed; clostridium noname).
In a preferred embodiment of the present invention, the marker combination further comprises: Roseburia, Anaerostipes, Parasutterella, Megamonas, Klebsiella, Butyricimonas, Collinsella, Shigella, Subdoligranulum and Flavonifractor.
The marker combination is suitable for the Xiangyang area of Hubei.
A second aspect of the invention provides a combination of reagents comprising reagents capable of detecting a combination of markers as described in the first aspect.
In a preferred embodiment of the present invention, the reagent combination comprises a reagent for PCR or a reagent for sequencing.
Preferably, the reagent combination comprises a marker combination as described in the first aspect.
A third aspect of the invention provides the use of a marker combination as described in the first aspect, or of a combination of reagents as described in the second aspect, for the preparation of a diagnostic agent for the diagnosis of Parkinson's disease.
A fourth aspect of the invention provides an auxiliary diagnosis model comprising:
(1) an input module for inputting the microbial taxonomic characterization and relative abundance information of the sample to be tested, to obtain the top 10 or top 20 genera ranked by mean decrease in accuracy (MDA);
(2) a processing module that uses a random forest classifier to call a prediction function and predicts the source (class) of the sample to be tested based on the top 10 or top 20 genera from (1);
the random forest classifier obtains the characteristic flora of the known sample based on the microbial taxonomic characterization and relative abundance information of the known sample;
the random forest classifier is defined as follows: randomForest(class ~ ., data = data_df, ntree, nPerm = 50, mtry = floor(sqrt(ncol(data_df) - 1)), proximity = T, importance = T); wherein class is a data set of the microbial taxonomic characterization and relative abundance information of the known sample;
the prediction function is defined as follows: predict(rf, newdata = test_df, type = "response"); wherein test_df is the information in (1).
In a preferred embodiment of the present invention, the characteristic flora consists of the top 10 genera ranked by mean decrease in accuracy (MDA); and/or the microbial taxonomic characterization and relative abundance information of the known sample and of the sample to be tested is obtained by microbiome metagenomic analysis, for example with MetaPhlAn2.
In a more preferred embodiment of the present invention, the characteristic flora consists of the top 20 genera ranked by mean decrease in accuracy (MDA).
The sample to be tested may be an intestinal secretion conventional in the art, preferably feces, for example from a subject in the Xiangyang area of Hubei Province.
In a preferred embodiment of the present invention, the auxiliary diagnosis model further comprises (0) a preprocessing module and/or (3) an output module. The preprocessing module extracts the DNA of the sample, constructs a library and sequences it, obtains the raw metagenomic reads of the sample DNA, removes noise, and passes the denoised information to the input module; the output module outputs the prediction result of the processing module;
wherein noise removal means: performing a quality check on the raw metagenomic reads and trimming low-quality sequences to obtain the metagenomic reads of the microbial DNA of the sample to be tested.
Preferably, the quality check is performed with second-generation sequencing quality control software such as FastQC, SolexaQA or PRINSEQ; and/or the trimming of low-quality sequences is performed with metagenomic sequencing quality control software such as KneadData;
more preferably, the KneadData parameters are set to "SLIDINGWINDOW:4:20 MINLEN:50"; and/or the noise removal further comprises: deleting unwanted human DNA reads after trimming low-quality sequences, the unwanted human DNA reads being deleted with the parameters "--very-sensitive --dovetail".
A fifth aspect of the invention provides a method of obtaining the characteristic flora of a known sample, comprising: obtaining the characteristic flora with a random forest classifier based on the microbial taxonomic characterization and relative abundance information of the known sample;
wherein the random forest classifier is defined as follows: randomForest(class ~ ., data = data_df, ntree, nPerm = 50, mtry = floor(sqrt(ncol(data_df) - 1)), proximity = T, importance = T); wherein class is a data set of the microbial taxonomic characterization and relative abundance information of the known sample.
In a preferred embodiment of the present invention, the characteristic flora consists of the top 10 genera ranked by mean decrease in accuracy (MDA).
In a more preferred embodiment of the present invention, the characteristic flora consists of the top 20 genera ranked by mean decrease in accuracy (MDA).
And/or, the microbial taxonomic characterization and relative abundance information of the known sample is obtained by microbiome metagenomic analysis, for example with MetaPhlAn2.
In an embodiment of the invention, the method further comprises evaluating the accuracy of the random forest classifier.
In one embodiment of the invention, the accuracy of the random forest classifier is assessed by cross-validation; the cross-validation is preferably selected from simple cross-validation, k-fold cross-validation and leave-one-out cross-validation, for example leave-one-out cross-validation.
The advantage of leave-one-out cross-validation is that each iteration uses the largest possible number of samples for training, and the partitioning of the samples is deterministic. With this largest possible number of cross-validation iterations, a more accurate classifier may be obtained.
In a preferred embodiment of the present invention, the random forest classifier has 1000 decision trees (ntree = 1000), the number of feature variables preselected at each node of each tree is the square root of (the number of columns of the matrix minus one), rounded down, and the seed is set to 2019613.
The random forest classifier and the cross-validation are implemented in R.
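The parameter choices above can be illustrated with a short sketch. The workflow described here is implemented in R; the Python below is only an illustration under stated assumptions. The function name `preselected_features` and the constant names are hypothetical, and the matrix is assumed to hold one class column plus one column per genus.

```python
import math

# Values stated in the text; the constant names are hypothetical.
N_TREES = 1000      # number of decision trees (ntree)
SEED = 2019613      # random seed

def preselected_features(n_columns: int) -> int:
    """Features tried at each node: floor(sqrt(n_columns - 1)).

    n_columns is assumed to count the class column as well, so
    n_columns - 1 is the number of genus (feature) columns.
    """
    if n_columns < 2:
        raise ValueError("need the class column plus at least one feature column")
    # math.isqrt returns the floor of the integer square root exactly.
    return math.isqrt(n_columns - 1)
```

Under this assumption, a matrix with the class column plus the top 10 genera gives `preselected_features(11)`, and one with the top 20 genera gives `preselected_features(21)`.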
In a preferred embodiment of the invention, the information on relative abundance is the difference in abundance of the bacterial population at different taxonomic levels assessed based on α diversity and β diversity.
Preferably, α diversity is assessed with a t test, preferably Student's t test; β diversity is assessed by nonparametric permutational multivariate analysis of variance (PERMANOVA) of genus abundance based on the Bray-Curtis distance, and by principal coordinates analysis (PCoA).
The PERMANOVA preferably evaluates how the samples cluster under predictors such as disease status, sex and age, for example using the vegan package (version 2.5-4).
The principal coordinates analysis (PCoA) visualizes the clustering of the samples.
α diversity is calculated as the Shannon index and/or species richness.
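As a minimal sketch of the two α diversity measures named above, the following Python computes richness as the number of taxa with non-zero abundance and the Shannon index as H = -Σ p_i ln p_i. The function names are hypothetical, and the use of the natural logarithm is an assumption, since the text does not state the base.

```python
import math

def richness(abundances):
    """Number of taxa with non-zero abundance in one sample."""
    return sum(1 for a in abundances if a > 0)

def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    Abundances may be counts or percentages; they are normalised to
    proportions first. Natural logarithm is an assumption.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no abundance")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)
```

A sample with four equally abundant genera has a Shannon index of ln 4, the maximum for four taxa; a sample dominated by a single genus has an index of 0.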
In a more preferred embodiment of the present invention, the method further comprises a preprocessing step: extracting the DNA of the known sample, constructing a library and sequencing it to obtain the raw metagenomic reads of the DNA of the known sample, and removing noise;
wherein noise removal means: performing a quality check on the raw metagenomic reads and trimming low-quality sequences to obtain the metagenomic reads of the microbial DNA of the known sample.
The quality check is performed with second-generation sequencing quality control software, preferably FastQC, SolexaQA or PRINSEQ, for example FastQC.
The trimming of low-quality sequences is performed with metagenomic sequencing quality control software, preferably KneadData.
The KneadData parameters are set to "SLIDINGWINDOW:4:20 MINLEN:50".
The noise removal further includes: deleting unwanted human DNA reads after trimming low-quality sequences; the parameters for deleting the unwanted human DNA reads are "--very-sensitive --dovetail".
A sixth aspect of the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the functions of the auxiliary diagnostic model according to the fourth aspect.
On the basis of common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the invention.
The positive effects of the invention are as follows:
the model of the invention provides high-accuracy, noninvasive auxiliary diagnosis of Parkinson's disease based on the selected intestinal flora diagnostic markers, with an accuracy of up to 80.3%. If patients can gain a rough picture of their own intestinal flora composition at the time of diagnosis, subsequent treatment may be more effective.
Drawings
FIG. 1 shows the α and β diversity analysis of Example 1;
wherein: (a) richness of the PD group and the SP group at the genus level, (b) Shannon index of the PD group and the SP group at the genus level, (c) richness of the PD group and the SP group at the species level, (d) Shannon index of the PD group and the SP group at the species level, (e) PCoA of the Bray-Curtis distance between samples;
FIG. 2 shows the differences in abundance between the PD group and the SP group at the genus level.
FIG. 3 shows the differences in abundance between the PD group and the SP group at the species level.
FIG. 4 is a cladogram of the differentially abundant taxa in the gut microbiome of the PD group and the SP group.
FIG. 5 shows the 10 most important characteristic genera for the diagnosis model, as determined by the MDA of the random forest classifier.
FIG. 6 shows the 20 most important characteristic genera for the diagnosis model, as determined by the MDA of the random forest classifier.
FIG. 7 is the ROC curve for predicting the occurrence of PD in the patient cohort.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention. Experimental methods for which no specific conditions are given in the following examples were carried out according to conventional methods and conditions, or according to the product instructions.
Example 1
The embodiment comprises the following steps:
First, selecting the patient cohort
This example is a cross-sectional study with 78 subjects, all enrolled from the city of Xiangyang, Hubei Province. To reduce the potential influence of factors such as diet and daily routine, the subjects were couples: one member of each couple was a Parkinson's disease patient (PD group) and the other served as the control (SP group).
PD was diagnosed with reference to the 2015 diagnostic criteria of the International Parkinson and Movement Disorder Society (MDS). The first core criterion is to determine whether the patient has parkinsonism: a patient who exhibits bradykinesia combined with resting tremor and/or muscular rigidity is considered to have parkinsonism. Once parkinsonism is established, the patient is further assessed against the supportive criteria, the exclusion criteria and the red flags, and is determined to be a clinically probable PD patient.
Exclusion criteria for the PD group were:
(1) oral or intravenous use of antibiotics or probiotics within the previous three months;
(2) severe gastrointestinal disorders;
(3) overt psychiatric disease;
(4) platelet count below 80 × 10^9/L;
(5) prothrombin time (PT) > 15 s;
(6) a history of bleeding in any organ.
No specific dietary regimen was required before fecal collection, and each sample was taken from the first bowel movement of the day.
Second, extracting fecal DNA, constructing the DNA library, and sequencing
Fecal DNA was extracted according to the protocol provided by MetaHIT, the DNA concentration was determined with a Qubit fluorometer (Invitrogen), and a DNA library was constructed according to the manufacturer's (MGI, China) instructions. Specifically, a paired-end library with a 350 bp insert size was constructed from 500 ng of DNA and sequenced on a BGISEQ-500 sequencer in PE100 mode. A total of 1761.8 GB of raw sequencing data was obtained for the 78 stool samples.
Third, denoising and taxonomic analysis of metagenomic reads
The shotgun metagenome data were processed according to the SOP of Microbiome Helper (https://github.com/LangilleLab/Microbiome_Helper/wiki/Metagenomics-Tutorial-Humann2). FastQC was used to check the quality of the raw metagenomic reads, and KneadData was used to trim low-quality sequences (parameters: "SLIDINGWINDOW:4:20 MINLEN:50") and to delete unwanted human genome (HG19) reads (parameters: "--very-sensitive --dovetail").
After trimming and filtering with KneadData, a total of more than 4.05 × 10^9 high-quality 100 bp paired-end reads were obtained, of which 4.52 × 10^7 were human reads, a proportion of 1.12%. After removal of host contamination, the mean number of reads per sample was 5.31 × 10^7 ± 1.58 × 10^7 in the PD group and 4.95 × 10^7 ± 2.26 × 10^7 in the SP group (Student's t-test, P = 0.41). The mean number of host reads was 6.07 × 10^5 ± 1.01 × 10^6 in the PD group and 5.53 × 10^5 ± 1.71 × 10^6 in the SP group (Student's t-test, P = 0.87).
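The trimming rule named by the parameters "SLIDINGWINDOW:4:20 MINLEN:50" can be sketched as follows. This is a simplified illustration, not the implementation used by KneadData or its underlying trimmer: it cuts the read at the start of the first 4-base window whose mean Phred quality falls below 20, then discards reads shorter than 50 bases. The function names and the toy quality values are hypothetical.

```python
def sliding_window_keep_length(quals, window=4, min_mean=20):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality is below min_mean. Simplified: real trimmers apply extra
    end-of-read handling that is not reproduced here.
    """
    n = len(quals)
    for start in range(0, n - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return n

def passes_minlen(length, minlen=50):
    """MINLEN filter: keep the read only if it is at least minlen bases long."""
    return length >= minlen

def trim_read(seq, quals, window=4, min_mean=20, minlen=50):
    """Trim one read; return (sequence, qualities), or None if it is too short."""
    keep = sliding_window_keep_length(quals, window, min_mean)
    if not passes_minlen(keep, minlen):
        return None
    return seq[:keep], quals[:keep]
```

For a 70-base read whose last 10 bases have quality 10 and whose other bases have quality 30, the read is cut where the window mean first drops below 20 and is kept because the remainder is still at least 50 bases long.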
MetaPhlAn2 uses unique clade-specific markers to detect the taxonomic clades present in microbiome samples and to estimate their relative abundance. The processed reads were taxonomically characterized and their abundance estimated using the default parameters of MetaPhlAn2.
Most reads in the samples examined mapped to the bacterial kingdom: 98.61 ± 5.45% in the PD group and 99.87 ± 0.41% in the SP group (Mann-Whitney U test, P = 0.67). A smaller proportion mapped to viruses: 1.36 ± 5.42% in the PD group and 0.13 ± 0.41% in the SP group (Mann-Whitney U test, P = 0.89).
α diversity was estimated by the Shannon index and richness. Student's t test was used to assess α diversity.
This example analyzed the richness and the Shannon index of the microbiome at the genus and species levels, respectively. The genus richness of the PD group was significantly higher than that of the SP group (53.15 ± 7.69 vs. 48.56 ± 7.29, Student's t-test, P = 0.004) (FIG. 1a). The genus-level Shannon index of the PD group was also significantly higher than that of the SP group (2.08 ± 0.38 vs. 1.76 ± 0.42, Student's t-test, P = 0.0002) (FIG. 1b). Similar trends were observed at the lower taxonomic level. The species richness (115.69 ± 21.07 vs. 106.26 ± 17.43, Student's t-test, P = 0.017) (FIG. 1c) and the species-level Shannon index (2.77 ± 0.53 vs. 2.54 ± 0.51, Student's t-test, P = 0.028) (FIG. 1d) of the PD group were significantly higher than those of the SP group. The results show that the diversity of the gut microbiome is significantly higher in PD patients than in healthy people. Thus, in this example, higher gut microbiome richness and a higher Shannon index may not be indicative of a healthy gut microbiome.
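For readers unfamiliar with the test used for these group comparisons, the following is a minimal sketch of the two-sample Student's t statistic with pooled variance. It is written in Python for illustration only, the function name is hypothetical, and the toy inputs in the usage note are not data from this example. It returns the statistic and the degrees of freedom; the P value would come from the t distribution, which is outside this sketch.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Two-sample Student's t statistic assuming equal variances.

    Returns (t, degrees_of_freedom). Each group needs at least two values.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

With the toy groups `[2, 3, 4]` and `[1, 2, 3]`, both variances equal 1, so the pooled variance is 1 and the statistic is 1 / sqrt(2/3) with 4 degrees of freedom.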
β diversity was assessed on the basis of the Bray-Curtis distance matrix. Nonparametric permutational multivariate analysis of variance (PERMANOVA) was performed on the genus abundances of all samples to assess how the samples clustered under predictors such as disease status, sex and age, and how these predictors related to the composition of the intestinal microorganisms. Principal coordinates analysis (PCoA) plots were then used to visualize the overall difference in microbial communities between the two groups.
PERMANOVA was performed with the vegan package (version 2.5-4).
In this example, disease status was associated with differences in intestinal microorganisms between the groups, and the effects of age and sex were relatively independent. The PCoA plot revealed a certain degree of separation between the healthy controls and the PD population. The first two principal coordinates explained 41.63% and 13.81% of the variance, respectively (FIG. 1e).
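The Bray-Curtis distance underlying the PERMANOVA and PCoA steps can be sketched as follows. This is an illustrative Python version with hypothetical function names, not the vegan implementation; it computes BC(a, b) = Σ|a_i - b_i| / Σ(a_i + b_i) for two abundance vectors over the same genera and builds the pairwise distance matrix that such analyses take as input.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    denominator = sum(a + b for a, b in zip(sample_a, sample_b))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / denominator

def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = bray_curtis(samples[i], samples[j])
    return matrix
```

Two samples that share no genera have a dissimilarity of 1, and two identical samples have a dissimilarity of 0.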
Taxa that differed in abundance between the PD and SP groups were identified by the linear discriminant analysis (LDA) effect size (LEfSe) method.
Only bacterial taxa with P < 0.05 (Kruskal-Wallis test) and an LDA score > 2 were considered significantly enriched.
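The significance filter stated above can be expressed as a short sketch. It assumes the test results are already available as one record per taxon; the field names `taxon`, `p_value` and `lda_score` are hypothetical, and the values in the usage note are invented for illustration and are not results from this example.

```python
def significantly_enriched(results, p_cutoff=0.05, lda_cutoff=2.0):
    """Keep taxa with Kruskal-Wallis P < p_cutoff and LDA score > lda_cutoff.

    `results` is a list of dicts with hypothetical keys
    'taxon', 'p_value' and 'lda_score'.
    """
    return [
        r["taxon"]
        for r in results
        if r["p_value"] < p_cutoff and r["lda_score"] > lda_cutoff
    ]
```

For example, a record with P = 0.01 and an LDA score of 3.1 passes the filter, while a record with P = 0.20, or one with an LDA score of 1.5, is dropped.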
According to the analysis, the gut microbiome in the samples consisted mainly of 3 phyla: Bacteroidetes (PD 54.79 ± 16.42%, SP 61.49 ± 12.88%, Mann-Whitney U-test, P = 0.09), Firmicutes (PD 28.90 ± 14.76%, SP 30.34 ± 13.17%, Mann-Whitney U-test, P = 0.47) and Proteobacteria (PD 12.34 ± 17.36%, SP 7.04 ± 6.82%, Mann-Whitney U-test, P = 0.43). Notably, Actinobacteria (PD 1.54 ± 2.11%, SP 0.56 ± 0.77%, Mann-Whitney U-test, P = 0.01) and Synergistetes (PD 2.52 ± 7.26%, SP 0.33 ± 1.12%, Mann-Whitney U-test, P = 0.01) differed significantly between the groups, with significantly higher abundance in the PD group. These results indicate that, at high taxonomic levels, the gut microbiome differs significantly between the PD and SP groups. This also means that corresponding changes may occur at lower taxonomic levels.
As shown in FIGS. 2-4, a total of 71 bacterial taxa were identified in this example as differing in abundance between the two groups. The LEfSe algorithm revealed differences in 1 phylum, 2 classes, 3 orders, 7 families, 14 genera and 44 species. Enrichment at the genus and species levels is shown in FIG. 2 and FIG. 3, respectively. As shown in FIG. 4, in the PD group, p_Actinobacteria, c_Actinobacteria, o_Bifidobacteriales, f_Bifidobacteriaceae and g_Scardovia, which belong to the same clade, were enriched at their respective taxonomic levels. In addition, the taxa c_Deltaproteobacteria, o_Desulfovibrionales, f_Desulfovibrionaceae, g_Desulfovibrio and g_Bilophila also showed consistent enrichment across taxonomic levels. In the SP group, f_Bacteroidaceae and g_Bacteroides, which share the same clade, showed a similar enrichment trend.
Fourth, constructing the auxiliary disease diagnosis model
To determine the fecal bacterial features for disease classification of metagenomic samples, the study used a random forest (RF) classifier and evaluated its accuracy by leave-one-out cross-validation: in each iteration, one sample is held out as the validation set and the remaining samples are used as the training set to determine the parameters of the random forest, and the probability of correct prediction for the validation samples is calculated.
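The leave-one-out procedure can be sketched in a few lines. This Python illustration is not the R workflow used in this example: the function names are hypothetical, and a 1-nearest-neighbour rule stands in for the random forest purely to keep the sketch self-contained. Any function with the same signature could be passed as `fit_predict`.

```python
def leave_one_out(n_samples):
    """Yield (training_indices, held_out_index) for each sample in turn."""
    for held_out in range(n_samples):
        yield [i for i in range(n_samples) if i != held_out], held_out

def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier (NOT a random forest): label of the closest sample."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: squared_distance(train_x[i], x))
    return train_y[best]

def loo_accuracy(features, labels, fit_predict=nearest_neighbour):
    """Fraction of held-out samples that are predicted correctly."""
    correct = 0
    for train, held_out in leave_one_out(len(labels)):
        prediction = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            features[held_out],
        )
        correct += prediction == labels[held_out]
    return correct / len(labels)
```

With 78 subjects, `leave_one_out(78)` produces 78 splits, each training on 77 samples and validating on the remaining one.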
A prediction model was constructed based on the relative abundance of the gut microflora of the 78 subjects. The number of decision trees in the RF was set to 1000 (ntree = 1000), the number of feature variables preselected at each node of each tree was the square root of (the number of columns of the matrix minus one), and the seed was set to 2019613. The variables with the greatest classification power were determined by analyzing the mean decrease in accuracy (MDA), and the random forest classifier was finally established.
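The idea behind ranking variables by mean decrease in accuracy can be sketched with a model-agnostic permutation test: shuffle one feature column, measure how much the accuracy drops, and average over repeats. This is a simplified Python illustration with hypothetical function names. It is not the procedure of the R randomForest package, which computes the measure per tree on out-of-bag samples.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=2019613):
    """Mean drop in accuracy after shuffling each feature column in turn.

    `rows` is a list of lists (one list of feature values per sample).
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def top_k(importances, names, k):
    """Names of the k features with the largest mean decrease in accuracy."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

A feature the classifier ignores gets an importance of exactly 0, because shuffling it cannot change any prediction; `top_k` then selects the most informative features, in the spirit of the top 10 and top 20 selections.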
An ROC curve was established and the area under the ROC curve (AUC) was calculated to evaluate the accuracy of the model in predicting the disease.
In this example, the random forest algorithm was used to classify the samples according to disease status and to establish a diagnosis model. One advantage of the random forest algorithm is that it can estimate the importance of each feature and identify the most important features in the classification process. As shown in FIGS. 5 and 6, the 10 most important genera in the random forest classifier based on MDA are Scardovia, Ruminococcus noname, Bilophila, Bacteroides, Gemella, Alistipes, Oxalobacter, Solobacterium, Bifidobacterium and Clostridiales noname; the 20 most important genera additionally include Roseburia, Anaerostipes, Parasutterella, Megamonas, Klebsiella, Butyricimonas, Collinsella, Shigella, Subdoligranulum and Flavonifractor. These were verified as characteristic bacterial groups. To improve the results of the random forest classifier, models were constructed using the top 10 MDA features and the top 20 MDA features.
This example uses the ROC curve and the area under the curve (AUC) to evaluate the performance of the RF binary classifier. As shown in FIG. 7, the ordinate is sensitivity and the abscissa is specificity. Using all genera, PD could be distinguished from SP with an AUC of 0.663; using the variables selected by the LEfSe method, the AUC was 0.760; using the top 10 MDA features, the AUC was 0.795; and using the top 20 MDA features, the AUC was 0.803, a further improvement in diagnostic accuracy.
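The AUC values reported here summarize how well the classifier's scores rank PD samples above SP samples. A minimal sketch of that calculation, using the rank-based (Mann-Whitney) formulation, is given below. It is an illustrative Python version with a hypothetical function name; the scores are assumed to be predicted probabilities of the PD class.

```python
def roc_auc(scores, labels, positive="PD"):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one, counting ties as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == positive]
    negatives = [s for s, y in zip(scores, labels) if y != positive]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 1.0 means every PD sample scores above every SP sample, and 0.5 corresponds to chance-level ranking.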
The above workflow was carried out in R (randomForest package, version 4.6-14).
Claims (10)
1. A marker combination for intestinal microecology, wherein the marker combination comprises the following microorganisms: Scardovia, Ruminococcus (unnamed; Ruminococcus noname), Bilophila, Bacteroides, Gemella, Alistipes, Oxalobacter, Solobacterium, Bifidobacterium and Clostridium (unnamed; Clostridium noname).
2. The marker combination of claim 1, wherein said marker combination further comprises: Roseburia, Anaerostipes, Parasutterella, Megamonas, Klebsiella, Butyricimonas, Collinsella, Shigella, Subdoligranulum and Flavonifractor.
3. A reagent combination comprising reagents capable of detecting a marker combination according to claim 1 or 2, such as reagents for PCR or sequencing; preferably, the reagent combination further comprises a marker combination according to claim 1 or 2.
4. Use of a marker combination according to claim 1 or 2 or a reagent combination according to claim 3 for the preparation of a diagnostic agent for the diagnosis of Parkinson's disease.
5. An auxiliary diagnosis model, comprising:
(1) an input module for inputting the microbial taxonomic characterization and relative abundance information of the sample to be tested, to obtain the top 10 or top 20 genera ranked by mean decrease in accuracy (MDA);
(2) a processing module that uses a random forest classifier to call a prediction function and predicts the source (class) of the sample to be tested based on the top 10 or top 20 genera from (1);
the random forest classifier obtains the characteristic flora of the known sample based on the microbial taxonomic characterization and relative abundance information of the known sample;
the random forest classifier is defined as follows: randomForest(class ~ ., data = data_df, ntree, nPerm = 50, mtry = floor(sqrt(ncol(data_df) - 1)), proximity = T, importance = T); wherein class is a data set of the microbial taxonomic characterization and relative abundance information of the known sample;
the prediction function is defined as follows: predict(rf, newdata = test_df, type = "response"); wherein test_df is the information in (1);
preferably, the characteristic flora consists of the top 10 genera ranked by mean decrease in accuracy (MDA); and/or the microbial taxonomic characterization and relative abundance information of the known sample and of the sample to be tested is obtained by microbiome metagenomic analysis, for example with MetaPhlAn2;
more preferably, the characteristic flora consists of the top 20 genera ranked by mean decrease in accuracy (MDA);
even more preferably, the sample to be tested is feces, for example feces from a subject in the Xiangyang area of Hubei Province.
6. The auxiliary diagnosis model of claim 5, further comprising (0) a preprocessing module and/or (3) an output module, wherein the preprocessing module extracts the DNA of the sample, constructs a library and sequences it, obtains the raw metagenomic reads of the sample DNA, removes noise, and passes the denoised information to the input module; the output module outputs the prediction result of the processing module;
wherein noise removal means: performing a quality check on the raw metagenomic reads and trimming low-quality sequences to obtain the metagenomic reads of the microbial DNA of the sample to be tested;
preferably, the quality check is performed with second-generation sequencing quality control software such as FastQC, SolexaQA or PRINSEQ; and/or the trimming of low-quality sequences is performed with metagenomic sequencing quality control software such as KneadData;
more preferably, the KneadData parameters are set to "SLIDINGWINDOW:4:20 MINLEN:50"; and/or the noise removal further comprises: deleting unwanted human DNA reads after trimming low-quality sequences, the unwanted human DNA reads being deleted with the parameters "--very-sensitive --dovetail".
7. A method for obtaining the characteristic flora of a known sample, comprising: obtaining the characteristic flora of the known sample with a random forest classifier based on the microbial taxonomic characterization and relative abundance information of the known sample;
wherein the random forest classifier is defined as follows: randomForest(class ~ ., data = data_df, ntree, nPerm = 50, mtry = floor(sqrt(ncol(data_df) - 1)), proximity = T, importance = T); wherein class is a data set of the microbial taxonomic characterization and relative abundance information of the known sample;
preferably, the characteristic flora consists of the top 10 genera ranked by mean decrease in accuracy (MDA);
more preferably, the characteristic flora consists of the top 20 genera ranked by mean decrease in accuracy (MDA); and/or the microbial taxonomic characterization and relative abundance information of the known sample is obtained by microbiome metagenomic analysis, for example with MetaPhlAn2;
further preferably, the method further comprises the step of evaluating the accuracy of the random forest classifier; for example, the accuracy of the random forest classifier is assessed by cross-validation; the cross-validation is preferably selected from simple cross-validation, k-fold cross-validation and leave-one-out cross-validation, for example leave-one-out cross-validation.
8. The method of claim 7, wherein the information on relative abundance is the difference in abundance of the bacterial population at different taxonomic levels assessed based on α diversity and β diversity;
preferably, α diversity is calculated as the Shannon index and/or species richness and is assessed with a t test, preferably Student's t test; β diversity is assessed by nonparametric permutational multivariate analysis of variance of genus abundance based on the Bray-Curtis distance, for example using the vegan package (version 2.5-4), and by principal coordinates analysis.
9. The method of claim 7, further comprising a preprocessing step: extracting the DNA of the known sample, constructing a library and sequencing it to obtain the raw metagenomic reads of the DNA of the known sample, and removing noise;
wherein noise removal means: performing a quality check on the raw metagenomic reads and trimming low-quality sequences to obtain the metagenomic reads of the microbial DNA of the known sample;
preferably, the quality check is performed with second-generation sequencing quality control software such as FastQC, SolexaQA or PRINSEQ; and/or the trimming of low-quality sequences is performed with metagenomic sequencing quality control software such as KneadData;
more preferably, the KneadData parameters are set to "SLIDINGWINDOW:4:20 MINLEN:50"; and/or the noise removal further comprises: deleting unwanted human DNA reads after trimming low-quality sequences, the unwanted human DNA reads being deleted with the parameters "--very-sensitive --dovetail".
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, carries out the functions of the auxiliary diagnosis model as claimed in claim 5 or 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110192133.3A CN112852916A (en) | 2021-02-19 | 2021-02-19 | Marker combination for intestinal microecology, auxiliary diagnosis model and application of marker combination |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112852916A true CN112852916A (en) | 2021-05-28 |
Family
ID=75988369
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110192133.3A Pending CN112852916A (en) | 2021-02-19 | 2021-02-19 | Marker combination for intestinal microecology, auxiliary diagnosis model and application of marker combination |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112852916A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114085899A (en) * | 2021-10-11 | 2022-02-25 | 广东省人民医院 | Judgment marker for cognitive impairment of Parkinson's disease and application of judgment marker |
CN114373505A (en) * | 2021-12-29 | 2022-04-19 | 浙江大学 | System for early prediction of postpartum subclinical ketosis of dairy cow based on intestinal microorganisms |
CN114854884A (en) * | 2022-05-27 | 2022-08-05 | 山东农业大学 | Method for early warning or noninvasive diagnosis of fatty liver dairy cow by using fecal microorganisms belonging to level |
WO2023138266A1 (en) * | 2022-01-20 | 2023-07-27 | 浙江养生堂天然药物研究所有限公司 | Biomarker related to parkinson's disease and application thereof |
WO2023206739A1 (en) * | 2022-04-26 | 2023-11-02 | 中国科学院深圳先进技术研究院 | Feces-based biomarker of alzheimer's disease and use thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160266113A1 (en) * | 2000-11-16 | 2016-09-15 | Curemark Llc | Methods for Diagnosing Pervasive Development Disorders, Dysautonomia and Other Neurological Conditions |
CN109658980A (en) * | 2018-03-20 | 2019-04-19 | 上海交通大学医学院附属瑞金医院 | A kind of screening and application of excrement gene marker |
CN110546280A (en) * | 2017-02-24 | 2019-12-06 | Md保健株式会社 | Method for diagnosing Parkinson's disease by bacterial metagenomic analysis |
-
2021
- 2021-02-19 CN CN202110192133.3A patent/CN112852916A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160266113A1 (en) * | 2000-11-16 | 2016-09-15 | Curemark Llc | Methods for Diagnosing Pervasive Development Disorders, Dysautonomia and Other Neurological Conditions |
CN110546280A (en) * | 2017-02-24 | 2019-12-06 | Md保健株式会社 | Method for diagnosing Parkinson's disease by bacterial metagenomic analysis |
CN109658980A (en) * | 2018-03-20 | 2019-04-19 | 上海交通大学医学院附属瑞金医院 | A kind of screening and application of excrement gene marker |
Non-Patent Citations (11)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114085899A (en) * | 2021-10-11 | 2022-02-25 | 广东省人民医院 | Judgment marker for cognitive impairment of Parkinson's disease and application of judgment marker |
CN114373505A (en) * | 2021-12-29 | 2022-04-19 | 浙江大学 | System for early prediction of postpartum subclinical ketosis of dairy cow based on intestinal microorganisms |
CN114373505B (en) * | 2021-12-29 | 2022-11-01 | 浙江大学 | System for early prediction of postpartum subclinical ketosis of dairy cow based on intestinal microorganisms |
WO2023138266A1 (en) * | 2022-01-20 | 2023-07-27 | 浙江养生堂天然药物研究所有限公司 | Biomarker related to Parkinson's disease and application thereof |
WO2023206739A1 (en) * | 2022-04-26 | 2023-11-02 | 中国科学院深圳先进技术研究院 | Feces-based biomarker of Alzheimer's disease and use thereof |
CN114854884A (en) * | 2022-05-27 | 2022-08-05 | 山东农业大学 | Method for early warning or noninvasive diagnosis of fatty liver in dairy cows using genus-level fecal microorganisms |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105603066B (en) | Intestinal microbial marker of mental disorder and application thereof | |
CN112852916A (en) | Marker combination for intestinal microecology, auxiliary diagnosis model and application of marker combination | |
Galvez et al. | Shaping of intestinal microbiota in Nlrp6-and Rag2-deficient mice depends on community structure | |
CN108350502B (en) | Microbiome derived diagnostic and therapeutic methods and systems for oral health | |
CN105368944B (en) | Biomarker of detectable disease and application thereof | |
Paulson et al. | Differential abundance analysis for microbial marker-gene surveys | |
CN111430027B (en) | Bipolar affective disorder biomarker based on intestinal microorganisms and screening application thereof | |
CN105132518B (en) | Large intestine carcinoma marker and its application | |
CN107217089B (en) | Method and device for determining individual state | |
CN105296590A (en) | Colorectal cancer marker and application thereof | |
WO2020244018A1 (en) | Small-scale schizophrenia biomarker combination, application thereof and MetaPhlAn2 screening method therefor | |
CN110904213B (en) | Ulcerative colitis biomarker based on intestinal flora and application thereof | |
CN112119167B (en) | Biomarker for depression and application thereof | |
WO2017109059A1 (en) | Microbial marker in inflammatory arthritis diseases | |
Lin et al. | Alterations in the fecal microbiota of patients with spinal cord injury | |
Abbas-Egbariya et al. | Meta-analysis defines predominant shared microbial responses in various diseases and a specific inflammatory bowel disease signal | |
Kosciolek et al. | Individuals with substance use disorders have a distinct oral microbiome pattern | |
Fan et al. | Altered gut microbiota in older adults with mild cognitive impairment: a case-control study | |
Cui et al. | Gut microbiome distinguishes patients with epilepsy from healthy individuals | |
Yuan et al. | Classification of mild cognitive impairment with multimodal data using both labeled and unlabeled samples | |
Ren et al. | Lifestyle patterns influence the composition of the gut microbiome in a healthy Chinese population | |
CN112384634B (en) | Osteoporosis biomarker and application thereof | |
CN114657270B (en) | Alzheimer disease biomarker based on intestinal flora and application thereof | |
RU2699284C2 (en) | System and method of interpreting data and providing recommendations to user based on genetic data thereof and data on composition of intestinal microbiota | |
CN110396538A (en) | Migraine biomarker and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210528 |