CN115261499B - Intestinal microbial marker related to endurance and application thereof - Google Patents
Intestinal microbial marker related to endurance and application thereof
- Publication number
- CN115261499B CN202210970874.4A
- Authority
- CN
- China
- Prior art keywords
- endurance
- intestinal
- reagent
- sample
- subject
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/689—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/20—Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12R—INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
- C12R2001/00—Microorganisms ; Processes using microorganisms
- C12R2001/01—Bacteria or Actinomycetales ; using bacteria or Actinomycetales
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12R—INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
- C12R2001/00—Microorganisms ; Processes using microorganisms
- C12R2001/01—Bacteria or Actinomycetales ; using bacteria or Actinomycetales
- C12R2001/07—Bacillus
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12R—INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
- C12R2001/00—Microorganisms ; Processes using microorganisms
- C12R2001/01—Bacteria or Actinomycetales ; using bacteria or Actinomycetales
- C12R2001/145—Clostridium
Abstract
The invention belongs to the field of biological medicine, and in particular relates to an intestinal microbial marker related to endurance and application thereof. Specifically, the intestinal microorganisms include: Proteobacteria_bacterium_CAG_139, Parasutterella_excrementihominis, Burkholderiales_bacterium_1_1_47, Olsenella_profusa, Parasutterella_excrementihominis_CAG_233, Clostridium_ihumii, Thermoanaerobacter_italicus, Bacillus_sp_MUM_116.
Description
Technical Field
The invention belongs to the field of biological medicine, and in particular relates to an intestinal microbial marker related to endurance and application thereof.
Background
The disclosure of this background section is only intended to increase the understanding of the general background of the invention and is not necessarily to be construed as an admission or any form of suggestion that this information forms the prior art already known to those of ordinary skill in the art.
Endurance is a person's ability to sustain muscular work over a long period during strenuous physical activity, i.e., the ability to resist fatigue. It has two components: muscular endurance and cardiovascular endurance. Improvement in endurance depends not only on a person's developmental maturity but also on the training load.
Endurance exercise promotes health, and improving endurance exercise capacity has a marked effect in preventing and treating diabetes, cardiovascular disease and other chronic diseases. Regular endurance load training can bring about adaptations in the muscles, organs, cardiopulmonary system, blood, immune system and regulation of substance metabolism. Endurance exercise induces a series of physiological adaptations that play an important role in promoting cardiac function and the body's aerobic metabolic capacity.
The human gastrointestinal tract hosts trillions of microorganisms, called the intestinal flora (intestinal microorganisms), which constitute a second set of genomic information for the human body in addition to its own chromosomes. With continual advances in DNA sequencing technology, studying the intestinal flora has become more convenient and research on it has deepened. The intestinal flora maintains the balance of the body's physiological activities and is influenced by various factors. There is increasing evidence that the human gut microbiota (GM) may be a useful marker for, and contributor to, the diagnosis, treatment and prevention of many human diseases, such as obesity, diabetes, liver disease, cancer and neurodegenerative diseases. The intestinal microbiota is a complex micro-ecological system that remains relatively stable throughout life but may fluctuate with diet.
Disclosure of Invention
The technical problem addressed by this disclosure is to provide intestinal microorganisms that distinguish high and low endurance populations non-invasively, and to apply these microorganisms in preparing related products for that purpose, including but not limited to kits and systems.
To solve this technical problem, the inventors studied whether intestinal flora (also called intestinal bacteria or intestinal microorganisms) can serve as biomarkers for distinguishing high and low endurance populations. As shown by its receiver operating characteristic (ROC) curve, the model provided by the invention has high discriminative ability and can distinguish high and low endurance populations.
Based on the research findings, the present disclosure proposes the following technical solutions:
In a first aspect, the present invention provides the use of a reagent for detecting a combination of intestinal microorganisms consisting of:
Proteobacteria_bacterium_CAG_139, Parasutterella_excrementihominis,
Burkholderiales_bacterium_1_1_47, Olsenella_profusa,
Parasutterella_excrementihominis_CAG_233, Clostridium_ihumii,
Thermoanaerobacter_italicus, Bacillus_sp_MUM_116.
Preferably, the detection is performed on a biological sample from a subject; most preferably, the biological sample is a fecal sample. More precisely, the detection measures the abundance of each microorganism of the combination in the subject's biological sample (the expression level of its genes is taken as representative of its abundance).
Preferably, the reagent may include at least one of a sequencing reagent, a reagent for detecting gene expression amount, and a reagent for extracting DNA.
Preferably, the sequencing (DNA sequencing) method may be any sequencing method known in the art, such as a first-generation sequencing method (Sanger method), a second-generation sequencing method (also known as next-generation sequencing or high-throughput sequencing), or a third-generation sequencing method (single-molecule sequencing). Specifically, the sequencing method used in the embodiments of the invention is a second-generation method, namely sequencing on an Illumina HiSeq platform, which is suitable for whole-genome and transcriptome sequencing.
Preferably, the kind of the reagent for detecting gene expression is also well known, and specifically includes, but is not limited to, a specific probe that specifically binds to a target sequence, a specific primer that amplifies the target sequence, and the like.
The "specific probe" of the present invention may be a single-labeled nucleic acid probe, such as a radionuclide-labeled probe (e.g., 32P, 3H, 35S, etc.), a biotin-labeled probe, a horseradish peroxidase-labeled probe, a digoxigenin-labeled probe, or a fluorophore-labeled probe (e.g., FITC, FAM, TET, HEX, TAMRA, Cy3, Cy5, etc.); the "specific probe" may also be a double-labeled nucleic acid probe such as a TaqMan probe, molecular beacon, displacement probe, QUAL probe, FRET probe, etc.
The "specific primers" of the present invention bind to a target nucleic acid to initiate nucleic acid synthesis along the template in a nucleic acid amplification reaction. Nucleic acid amplification reactions include PCR and related methods, specifically reverse transcription PCR (RT-PCR), in situ PCR, ligase chain reaction (LCR), labeled-primer PCR (LP-PCR), inverse PCR (amplification of unknown sequences outside the two primers), asymmetric PCR, touchdown PCR, recombinant PCR, nested PCR, multiplex PCR, immuno-PCR, mRNA differential PCR, strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription-based amplification system (TAS), Q-beta replicase-catalyzed RNA amplification, rolling circle amplification (RCA), loop-mediated isothermal amplification (LAMP), and the like.
Preferably, the DNA extraction reagent comprises a reagent used in the CTAB method, a reagent used in the GITC method, or a commercial DNA extraction kit (as used in the embodiments of the present invention). More specifically, the reagents for extracting DNA may include a lysis buffer, a binding buffer, a wash buffer, and an elution buffer. The lysis buffer typically consists of a protein denaturant, a detergent, a pH buffer and a nuclease inhibitor. The binding buffer typically consists of a protein denaturant and a pH buffer. The wash buffer is divided into wash buffer A and wash buffer B: wash buffer A consists of a protein denaturant, a nuclease inhibitor, a detergent, a pH buffer and ethanol; wash buffer B consists of a nuclease inhibitor, a pH buffer and ethanol. The elution buffer typically consists of a nuclease inhibitor and a pH buffer. The protein denaturant is selected from one or more of guanidine isothiocyanate, guanidine hydrochloride and urea; the detergent is selected from one or more of Tween 20, IGEPAL CA-630, Triton X-100, NP-40 and SDS; the pH buffer is selected from one or more of Tris, boric acid, phosphate, MES and HEPES; the nuclease inhibitor is selected from one or more of EDTA, EGTA and DEPC.
In a second aspect, the present invention provides a kit for distinguishing between high and low endurance populations, the kit comprising reagents for detecting the following combination of intestinal microorganisms:
Proteobacteria_bacterium_CAG_139, Parasutterella_excrementihominis,
Burkholderiales_bacterium_1_1_47, Olsenella_profusa,
Parasutterella_excrementihominis_CAG_233, Clostridium_ihumii,
Thermoanaerobacter_italicus, Bacillus_sp_MUM_116.
Preferably, the kit may further comprise an instrument and/or reagent for collecting a biological sample from the subject.
Most preferably, the biological sample is a fecal sample.
In a third aspect, the present invention provides a system for distinguishing between high and low endurance populations, the system comprising a computing device for determining whether the endurance of a subject is high or low according to the detection result for the intestinal microorganism combination.
Preferably, the detection result is obtained by testing a biological sample from the subject.
Most preferably, the biological sample is a fecal sample.
Preferably, the detection comprises sequencing or gene expression level detection; the gene expression level obtained for each intestinal microorganism represents its abundance, and high and low endurance populations are distinguished according to the abundance of each intestinal microorganism in the combination.
Preferably, the system may further comprise detection means.
Preferably, the detecting comprises sequencing or detecting the gene expression levels of the individual intestinal microorganisms in the intestinal microorganism combination.
Preferably, the detection device can comprise a real-time quantitative PCR instrument, a high throughput sequencing platform, a detection chip, a chip signal reader and the like.
Preferably, the system may further comprise a detection result collecting device.
Preferably, the system may further comprise a result output means.
Preferably, the system may further include a result transmitting means that can transmit the classification result (whether the subject belongs to the high or low endurance population) to an information communication terminal device that the patient or medical staff can consult.
In a fourth aspect, the invention provides application of the kit and the system in distinguishing high and low endurance population and in preparing products for distinguishing high and low endurance population.
In a fifth aspect, the present invention provides a method of determining the level of endurance in a subject, the method comprising determining the level of endurance in the subject based on the results of the detection of the combination of intestinal microorganisms.
More specifically, the method comprises the steps of:
1) Collecting a biological sample of the subject, in particular a fecal sample;
2) Extracting DNA;
3) Sequencing or detecting the gene expression quantity of each intestinal microorganism in the intestinal microorganism combination;
4) Judging whether the subject belongs to the high endurance population or the low endurance population according to the detection result of step 3).
Implementation of the methods and/or systems provided by the present invention may include performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, the actual instrumentation and equipment of the embodiments of the method and/or system according to the present invention could implement several selected tasks by hardware, by software, or by firmware or by a combination thereof using an operating system. For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system.
In a sixth aspect, the present invention provides a screening method for a combination of intestinal microorganisms useful for determining the endurance level of a subject, the method comprising the steps of:
1) Dividing the subjects into two groups, high endurance and low endurance;
2) Detecting a candidate intestinal microbial level in the subject;
3) Judging the difference degree and/or similarity of the candidate intestinal microorganisms between the high-endurance subjects and the low-endurance subjects, so as to obtain the difference intestinal microorganisms;
4) Verifying the differential intestinal microorganisms obtained in step 3); those whose AUC value reaches a certain standard are determined to be intestinal microorganisms that can be used for judging the endurance level of the subject.
Preferably, the endurance level is represented by the completion time of a 3000 m run.
Preferably, the result of the screening method is to give the intestinal microorganism combination provided by the invention.
Preferably, step 2) is performed by genetic sequencing.
In a seventh aspect, the present invention further provides a method for modeling a population with high and low endurance, the method comprising modeling using the intestinal microorganism combination provided by the present invention.
Preferably, the algorithms for model construction include logistic regression (LogReg), linear discriminant analysis (LDA), eigengene linear discriminant analysis (ELDA), support vector machines (SVM), random forest (RF), recursive partitioning trees (RPART), XGBoost (XGB) and other related decision tree classification techniques, shrunken centroids (SC), StepAIC, k-nearest neighbors, boosting, decision trees, neural networks, Bayesian networks and hidden Markov models, among others. Many of these techniques can additionally be combined with feature selection and regularization, as in ridge regression and lasso. The resulting predictive models may be validated in other studies, or cross-validated in the study in which they were originally trained, using techniques such as bootstrap, leave-one-out (LOO) and 10-fold cross-validation (10-fold CV). At various steps, the false discovery rate may be estimated by value permutation according to techniques known in the art.
The term "intestinal microorganisms" as used herein refers to the vast number of microorganisms present in the human intestinal tract, which depend on the host's intestine for survival while helping the host carry out a variety of physiological and biochemical functions. The intestinal tract is not only an important site for digestion and absorption by the human body, but also the largest immune organ, and plays an extremely important role in maintaining normal immune defenses. The "intestinal microorganisms" include the intestinal flora, intestinal viruses such as phages, and the like.
Drawings
Fig. 1 is the ROC curve of the optimal model for distinguishing high and low endurance populations.
Figure 2 is a statistical plot of AUC values for different species numbers for high and low endurance population discrimination.
Detailed Description
The present invention is further described in terms of the following examples, which are given by way of illustration only, and not by way of limitation, of the present invention, and any person skilled in the art may make any modifications to the equivalent examples using the teachings disclosed above. Any simple modification or equivalent variation of the following embodiments according to the technical substance of the present invention falls within the scope of the present invention.
Example 1 screening and validation of differential intestinal microorganisms between high endurance, low endurance populations
1. Subject information
140 subjects were selected and tested over a 3000 m run. All 140 subjects were men aged 18-22 who were neither obese nor underweight and had no underlying metabolic disease or injury.
The 3000 m results of the 140 subjects were ranked by completion time, with a shorter time indicating stronger endurance and a longer time indicating weaker endurance. Taking the median result of the whole test population as the cut-off, the first 72 subjects formed the first group and the remaining 68 the second group. The average time of the first group was 13 minutes 46 seconds, and that of the second group was 16 minutes 1 second. The two groups represent the high and low endurance populations, respectively.
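The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the subject IDs and the times are hypothetical, and assigning times equal to the median to the high-endurance group is an assumption that the text does not specify.

```python
from statistics import median

def split_by_median(times_s):
    """Split subjects into high/low endurance groups by 3000 m completion time.

    times_s maps a subject ID to a completion time in seconds.
    Assumption: times at or below the median count as high endurance.
    """
    cutoff = median(times_s.values())
    high = sorted(s for s, t in times_s.items() if t <= cutoff)
    low = sorted(s for s, t in times_s.items() if t > cutoff)
    return cutoff, high, low

# Hypothetical times in seconds, for illustration only (not study data).
example = {"S1": 800, "S2": 826, "S3": 826, "S4": 905, "S5": 961, "S6": 990}
cutoff, high, low = split_by_median(example)
print(cutoff, high, low)
```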
2. Experimental method
1. Fecal sample collection and DNA extraction
Stool samples were collected from the subjects, and DNA was extracted with a kit to obtain the extracted DNA samples.
2. Metagenome high throughput sequencing and analysis
In this study, sequencing was performed on an Illumina HiSeq platform. A total of 1,101,388.83 Mbp of raw data (Raw Data) was obtained (average data volume 6,517.09 Mbp); after quality control, 1,098,496.61 Mbp of effective data (Clean Data) remained (average data volume 6,499.98 Mbp). Single-sample assembly and mixed assembly together yielded 21,297,727,819 bp of Scaftigs. Gene prediction was performed on each sample and on the mixed assembly using MetaGeneMark, giving 26,606,828 open reading frames (ORFs) (157,437 on average); after redundancy removal, 3,005,425 ORFs with a total length of 2,155.60 Mbp remained, of which 1,788,981 (59.53%) were complete genes. The non-redundant gene set was aligned against the MicroNR library with blastp and species annotation was performed with the LCA algorithm; the proportions annotated to the genus and phylum levels were 68.59% and 88.87%, respectively. The non-redundant gene set was annotated against common functional databases using DIAMOND (e-value <= 1e-5): 93,372 (3.11%) ORFs aligned to the CAZy database, 1,834,009 (61.02%) to the KEGG database, and 1,779,683 (59.22%) to the eggNOG database. The non-redundant gene set was also annotated against the resistance gene database CARD (e-value <= 1e-30), with 1,330 genes aligned to CARD.
(1) Sequencing data pretreatment
Summary of quality control results: the total sequencing data volume is 1,101,388.83 Mbp and the average is 6,517.09 Mbp; after quality control, the total and average data volumes are 1,098,496.61 Mbp and 6,499.98 Mbp, respectively, and the effective data rate of quality control is 99.74%.
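As a quick arithmetic check, the effective data rate can be recomputed from the two totals quoted above. The variable names are illustrative; the figures are the ones stated in the text.

```python
# Totals quoted in the quality control summary, in Mbp.
raw_mbp = 1_101_388.83
clean_mbp = 1_098_496.61

# Effective data rate = clean data volume / raw data volume, as a percentage.
effective_rate = clean_mbp / raw_mbp * 100
print(f"{effective_rate:.2f}%")
```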
The specific processing steps of the data preprocessing are as follows:
1) Removing reads in which low-quality bases (quality value <= 38) exceed a certain proportion (40 bp by default);
2) Removing reads in which N bases reach a certain proportion (10 bp by default);
3) Removing reads whose overlap with adapter sequences exceeds a certain threshold (15 bp by default);
4) If the sample has host contamination, aligning the reads against a host database and filtering out reads that may derive from the host;
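The first three filters can be sketched as a single predicate. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the adapter overlap length is assumed to be computed upstream, and the strictness of each comparison (greater-than versus at-least) follows the wording of the steps rather than any documented tool behaviour. Host filtering (step 4) is not covered.

```python
def passes_qc(quals, seq, adapter_overlap_bp,
              low_qual_cutoff=38, max_low_qual_bp=40,
              max_n_bp=10, max_adapter_bp=15):
    """Return True if a read survives the three read-level filters.

    quals: per-base quality values (ints); seq: the read's bases;
    adapter_overlap_bp: overlap with adapter in bp (assumed precomputed).
    """
    # Filter 1: too many low-quality bases (quality value <= cutoff).
    low_qual_bp = sum(1 for q in quals if q <= low_qual_cutoff)
    if low_qual_bp > max_low_qual_bp:
        return False
    # Filter 2: the number of N bases reaches the limit.
    if seq.upper().count("N") >= max_n_bp:
        return False
    # Filter 3: the adapter overlap exceeds the limit.
    if adapter_overlap_bp > max_adapter_bp:
        return False
    return True

# Hypothetical 150 bp read with uniformly high quality and no adapter.
print(passes_qc([40] * 150, "A" * 150, adapter_overlap_bp=0))
```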
(2) Metagenome assembly
Summary of assembly results: assembly yielded 23,380,685,107 bp of Scaffolds in total, with an average length of 2,043.82 bp, a maximum length of 1,391,704 bp, an N50 of 5,318.81 bp and an N90 of 724.33 bp. Breaking the Scaffolds at N yielded 21,297,727,819 bp of Scaftigs, with an average length of 1,966 bp, an N50 of 4,668 bp and an N90 of 703 bp.
The specific processing steps of metagenome assembly are as follows:
1) The Clean Data obtained after preprocessing is assembled using the SOAPdenovo assembly software;
2) For each single sample, one K-mer (55 by default) is first selected for assembly to obtain the assembly result of that sample;
3) The assembled Scaffolds are broken at N junctions to give N-free sequence fragments, termed Scaftigs (i.e., continuous sequences within Scaffolds);
4) The quality-controlled Clean Data of each sample is aligned to that sample's assembled Scaftigs using Bowtie2 to obtain the PE reads that were not used;
5) The unused reads of all samples are pooled for mixed assembly; considering the computational cost and time of assembly, only one K-mer is used (-K 55 by default), with the other assembly parameters the same as for a single sample;
6) The Scaffolds from the mixed assembly are broken at N junctions to give N-free Scaftigs;
7) Scaftigs shorter than 500 bp from the single-sample and mixed assemblies are filtered out, followed by statistical analysis and subsequent gene prediction;
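Steps 3), 6) and 7), together with the N50 statistic reported in the summary, can be sketched as follows. The function names are hypothetical, and the N50 shown is the common textbook definition (the length at which the longest sequences first cover at least half of the total bases), which is assumed here to match the statistic used in the study.

```python
import re

def scaffold_to_scaftigs(scaffold, min_len=500):
    """Break a scaffold at runs of N and keep N-free pieces of >= min_len bp."""
    pieces = re.split(r"[Nn]+", scaffold)
    return [p for p in pieces if len(p) >= min_len]

def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

# Hypothetical scaffold: 600 bp, a gap, 100 bp, a gap, 500 bp.
scaffold = "A" * 600 + "NNN" + "C" * 100 + "N" + "G" * 500
scaftigs = scaffold_to_scaftigs(scaffold)
print([len(s) for s in scaftigs], n50([len(s) for s in scaftigs]))
```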
(3) Gene prediction and abundance analysis
Summary of gene prediction results: a total of 26,606,828 ORFs were predicted, with an average of 157,437 ORFs per sample; after redundancy elimination, 3,005,425 ORFs with total length of 2,155.60Mbp, average length of 717.24bp and GC content of 44.76% are obtained, wherein 1,788,981 complete genes account for 59.53% of the total number of all non-redundant genes.
Basic steps of gene prediction:
1) Performing ORF (open reading frame) prediction with MetaGeneMark on the Scaftigs (>= 500 bp) of each sample and of the mixed assembly, and filtering the predictions;
2) Removing redundancy from the ORF predictions of each sample and the mixed assembly using CD-HIT;
3) Aligning the Clean Data of each sample to the de-replicated representative genes, and calculating the number of reads aligned to each gene in each sample;
4) Filtering out genes that are not supported by more than 2 reads in any sample, to obtain the gene catalogue (Unigenes) finally used for subsequent analysis;
5) Calculating the abundance of each gene in each sample from the number of aligned reads and the gene length;
6) Based on the abundance of each gene of the gene catalogue in each sample, performing basic statistics, core-pan gene analysis, inter-sample correlation analysis and gene-number Venn diagram analysis.
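Step 5) does not give the abundance formula. A commonly used choice, assumed here purely for illustration, is length-normalized read count divided by the sum of that quantity over all genes: G_k = (r_k / L_k) / sum_i (r_i / L_i). The function name and example values below are hypothetical.

```python
def relative_abundance(read_counts, gene_lengths):
    """Relative abundance per gene within one sample.

    Assumed formula: (reads / length) for each gene, normalized so the
    values across all genes sum to 1.
    """
    per_base = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(per_base.values())
    if total == 0:
        return {g: 0.0 for g in per_base}
    return {g: v / total for g, v in per_base.items()}

# Hypothetical counts and lengths (bp) for three genes in one sample.
counts = {"geneA": 100, "geneB": 100, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 500, "geneC": 800}
abundance = relative_abundance(counts, lengths)
print(abundance)
```

With equal read counts, the shorter gene receives the higher abundance, which is the purpose of the length normalization.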
(4) Species annotation
Species annotation results overview: of the 3,005,425 non-redundant predicted genes, 2,499,701 ORFs (83.17%) were annotated to the NR database. Among the ORFs annotated to the NR database, the proportion annotated at the kingdom level was 91.61%, at the phylum level 88.87%, at the class level 84.75%, at the order level 84.12%, at the family level 73.12%, at the genus level 68.59%, and at the species level 50.11%. The dominant phyla mainly include Firmicutes, Proteobacteria and Bacteroidetes. The phyla with significant differences between groups are mainly k__Bacteria;p__Acidobacteria, k__Eukaryota;p__Zoopagomycota and k__Bacteria;p__Dictyoglomi.
The basic steps of annotation:
1) Unigenes were aligned, using DIAMOND (blastp, e-value <= 1e-5), against the bacterial (Bacteria), fungal (Fungi), archaeal (Archaea) and viral (Viruses) sequences extracted from NCBI's NR database (Version: 2018.01);
2) Filtering the alignment results: for each sequence, the alignments with e-value <= 10 × the minimum e-value were retained for subsequent analysis;
3) After filtering, the LCA algorithm (as applied to taxonomic classification in the MEGAN software) was used, and the classification level before the first branch was taken as the species annotation of each sequence;
4) From the LCA annotation results and the gene abundance table, the abundance and gene-number information of each sample at each taxonomic level (kingdom, phylum, class, order, family, genus, species) was obtained;
5) From the abundance tables at each taxonomic level (kingdom, phylum, class, order, family, genus, species), Krona analysis, relative abundance profiles, abundance clustering heat maps, PCA and NMDS dimensionality reduction, Anosim analysis of between-group (within-group) differences, and Metastats and LEfSe multivariate statistical analysis of species differing between groups were performed.
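Steps 2) and 3) can be sketched as below, assuming the filter in step 2) keeps hits whose e-value is at most ten times the smallest e-value for that sequence, and treating the LCA as the longest taxonomic prefix shared by the retained hits. The function names and the example lineages are illustrative and are not taken from the study or from MEGAN's implementation.

```python
def filter_hits(hits, factor=10.0):
    """Keep (lineage, evalue) hits with evalue <= factor * the smallest evalue."""
    if not hits:
        return []
    best = min(e for _, e in hits)
    return [(lineage, e) for lineage, e in hits if e <= best * factor]

def lowest_common_ancestor(lineages):
    """Longest shared prefix of rank lists ordered from kingdom to species."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # first branch point: stop at the level before it
        shared.append(ranks[0])
    return shared

# Illustrative hits for one gene: (lineage, e-value).
hits = [
    (["Bacteria", "Proteobacteria", "Betaproteobacteria",
      "Burkholderiales", "Sutterellaceae", "Parasutterella"], 1e-50),
    (["Bacteria", "Proteobacteria", "Betaproteobacteria",
      "Burkholderiales", "Burkholderiaceae", "Burkholderia"], 5e-50),
    (["Bacteria", "Firmicutes", "Clostridia",
      "Clostridiales", "Clostridiaceae", "Clostridium"], 1e-20),
]
kept = filter_hits(hits)
annotation = lowest_common_ancestor([lineage for lineage, _ in kept])
print(annotation)
```

In this example the weak third hit is discarded by the e-value filter, so the gene is annotated down to the order level rather than only to the kingdom.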
3. Construction of classification model
And establishing a machine learning classification model by utilizing the microbial species abundance information table obtained by the flow.
Based on XGBoost (eXtreme Gradient Boosting), selecting different numbers of intestinal microorganism characteristics to classify the population with high and low endurance, and finally taking the average value of AUC values (the area under the ROC curve) by using a ten-fold cross-validation mode, wherein the final screening of the optimal classification model comprises the following 8 intestinal microorganisms:
Proteobacteria_bacterium_CAG_139;
Parasutterella_excrementihominis;
Burkholderiales_bacterium_1_1_47;
Olsenella_profusa;
Parasutterella_excrementihominis_CAG_233;
Clostridium_ihumii;
Thermoanaerobacter_italicus;
Bacillus_sp_MUM_116.
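The evaluation scheme described above (ten-fold cross-validation with the AUC averaged over folds) can be sketched as follows. To keep the example dependency-free, the XGBoost classifier is replaced by a simple stand-in that weights each feature by the difference of class means; in practice a gradient-boosted classifier such as `xgboost.XGBClassifier` would take its place. The synthetic data, the 70/70 class split, and all function names are assumptions made for illustration only.

```python
import random
from statistics import mean

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds while keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_stand_in(X, y):
    """Stand-in classifier (NOT XGBoost): linear score with class-mean-difference weights."""
    def col_mean(cls, f):
        return mean(row[f] for row, lab in zip(X, y) if lab == cls)
    weights = [col_mean(1, f) - col_mean(0, f) for f in range(len(X[0]))]
    return lambda row: sum(w * x for w, x in zip(weights, row))

def cv_mean_auc(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUC values."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit_stand_in([X[i] for i in train_idx], [y[i] for i in train_idx])
        aucs.append(auc([y[i] for i in test_idx], [score(X[i]) for i in test_idx]))
    return mean(aucs)

# Synthetic abundance-like data: 140 subjects, 8 features, 1 = high endurance, 0 = low.
rng = random.Random(1)
y = [1] * 70 + [0] * 70
X = [[rng.gauss(1.0 * label, 1.0) for _ in range(8)] for label in y]
mean_auc = cv_mean_auc(X, y, k=10)
```

Repeating `cv_mean_auc` on subsets of columns of `X` gives the AUC for each candidate number of features, which is the kind of comparison used to pick the 8-microorganism model.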
3. Experimental results
The model constructed from the 8 intestinal microorganisms is the optimal model. The intestinal metagenome data of 140 subjects were used to classify the two groups; the ROC curve is shown in Fig. 1, and the AUC reaches 0.83, indicating that the model has application value in accurately distinguishing high-endurance and low-endurance populations.
In addition, the two groups were classified using different numbers of intestinal microorganisms, and the AUC values corresponding to the different numbers were examined; the results are shown in Fig. 2.
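For readers unfamiliar with how a ROC curve and its AUC are derived from per-subject model scores, the following sketch builds the curve by sweeping the decision threshold and integrates it with the trapezoidal rule. The labels and scores are a small made-up example, not data from this study, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, lowering the threshold step by step."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting one point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up example: 1 = high endurance, 0 = low endurance.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(labels, scores)
area = trapezoid_auc(curve)
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which gives context for the reported value of 0.83.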
Claims (21)
1. Use of a reagent for detecting a combination of intestinal microorganisms in distinguishing between high and low endurance populations, the combination of intestinal microorganisms consisting of the following microorganisms:
Proteobacteria_bacterium_CAG_139, Parasutterella_excrementihominis,
Burkholderiales_bacterium_1_1_47, Olsenella_profusa,
Parasutterella_excrementihominis_CAG_233, Clostridium_ihumii,
Thermoanaerobacter_italicus, Bacillus_sp_MUM_116.
2. The use of claim 1, wherein the detection is performed on a biological sample from a subject, the biological sample being a fecal sample.
3. The use of claim 1, wherein the reagent comprises at least one of a sequencing reagent, a reagent for detecting gene expression level, and a reagent for extracting DNA.
4. The use of claim 1, wherein the method of sequencing comprises a first generation sequencing method, a second generation sequencing method, a third generation sequencing method.
5. The use according to claim 1, wherein the reagent for detecting gene expression comprises a specific probe which specifically binds to a target sequence, and a specific primer for amplifying the target sequence.
6. The use of claim 1, wherein the DNA extraction reagent comprises a reagent used in the CTAB method, a reagent used in the GITC method, or a commercial DNA extraction kit.
7. A kit for distinguishing between high and low endurance populations, the kit comprising the reagent for detecting a combination of intestinal microorganisms according to claim 1, the combination of intestinal microorganisms consisting of:
Proteobacteria_bacterium_CAG_139, Parasutterella_excrementihominis,
Burkholderiales_bacterium_1_1_47, Olsenella_profusa,
Parasutterella_excrementihominis_CAG_233, Clostridium_ihumii,
Thermoanaerobacter_italicus, Bacillus_sp_MUM_116.
8. The kit of claim 7, further comprising an instrument and/or reagent for collecting a biological sample from a subject, the biological sample being a fecal sample.
9. A system for distinguishing between high and low endurance populations, the system comprising a computing device for determining the endurance of a subject based on the results of the detection of the combination of intestinal microorganisms according to claim 1.
10. The system of claim 9, wherein the test results are obtained by testing a sample from a subject, the sample being a fecal sample.
11. The system of claim 9, further comprising a detection device.
12. The system of claim 11, wherein the detection device comprises a PCR instrument, a high throughput sequencing platform, and a detection chip.
13. The system of claim 9, further comprising a test result collection device.
14. The system of claim 9, further comprising a result output device.
15. The system of claim 9, further comprising a result transmitting device.
16. Use of the kit of claim 7 or the system of claim 9 for distinguishing between high and low endurance populations.
17. A method of determining the endurance level of a subject, the method comprising determining the endurance level of the subject based on the detection result of the intestinal microorganism combination of claim 1.
18. The method of claim 17, comprising the steps of:
1) Collecting a biological sample of the subject;
2) Extracting DNA;
3) Sequencing or detecting abundance information of each intestinal microorganism in the intestinal microorganism combination;
4) Determining whether the subject belongs to the high endurance population or the low endurance population according to the detection result of step 3).
19. The method of claim 18, wherein the biological sample is a fecal sample.
20. A method of modeling a population of high and low endurance, the method comprising modeling using the intestinal microbiota combination of claim 1.
21. The method of claim 20, wherein the model building algorithm comprises logistic regression, linear discriminant analysis, eigenvector linear discriminant analysis, support vector machine, random forest, recursive partitioning tree, XGBoost decision tree classification technique, shrunkenCentroids, stepAIC, kth-Nearest Neighbor, boosting, neural network, bayesian network, hidden markov model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210970874.4A CN115261499B (en) | 2022-08-14 | 2022-08-14 | Intestinal microbial marker related to endurance and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115261499A (en) | 2022-11-01 |
CN115261499B (en) | 2023-04-28 |
Family
ID=83751801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210970874.4A Active CN115261499B (en) | 2022-08-14 | 2022-08-14 | Intestinal microbial marker related to endurance and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115261499B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||