CN110396537B - Asthma biomarker and application thereof - Google Patents

Asthma biomarker and application thereof

Info

Publication number
CN110396537B
Authority
CN
China
Prior art keywords
asthma
biomarker
relative abundance
sequencing
sample
Legal status
Active
Application number
CN201810371588.XA
Other languages
Chinese (zh)
Other versions
CN110396537A (en)
Inventor
郭锐进
王奇
贾慧珏
鞠艳梅
朱杰
赵慧
Current Assignee
BGI Shenzhen Co Ltd
Original Assignee
BGI Shenzhen Co Ltd
Application filed by BGI Shenzhen Co Ltd
Priority to CN201810371588.XA
Publication of CN110396537A
Application granted
Publication of CN110396537B
Legal status: Active (current)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/136 Screening for pharmacological compounds

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the field of biomedicine, and in particular to an asthma biomarker and its application. The present invention provides biomarkers for asthma or related diseases, comprising at least one selected from the group consisting of: Bacteroides stercoris and/or its analogs, Eggerthella lenta and/or its analogs, and Sutterella wadsworthensis and/or its analogs. The invention provides biomarkers for the early diagnosis and diagnosis of asthma or related diseases, together with a method for predicting disease risk. These address the shortcomings of existing asthma diagnostic methods, such as their inability to provide early warning or to predict asthma attacks.

Description

Asthma biomarker and application thereof
Technical Field
The invention relates to the field of biomedicine, and in particular to an asthma biomarker and its application. More specifically, the invention relates to biomarkers for asthma or related diseases, methods of diagnosing asthma or related diseases or predicting their risk, kits, and the use of the biomarkers in the preparation of such kits.
Background
Asthma is a common chronic inflammatory disease of the airways, characterized mainly by variable and recurrent symptoms, reversible airflow obstruction and bronchospasm. Common symptoms are wheezing, coughing, chest tightness and dyspnea. Coughing may bring up sputum from the lungs, but the sputum is often difficult to cough up. During an asthma episode, the sputum may appear pus-like because of an increased number of white blood cells called eosinophils. Symptoms are often worse at night and in the early morning, and in response to exercise or cold air. Some people with asthma react only slightly to triggers, while others react strongly and have persistent symptoms.
As of 2011, between 235 million and 300 million people worldwide had asthma, and about 250,000 people died of it. Prevalence varies between countries, ranging from 1% to 18%.
Although asthma is a well-recognized disease, there is currently no precise diagnostic test; diagnosis relies on the pattern of symptoms and the response to treatment over time. Asthma in children under six years of age is especially difficult to diagnose because they are too young to perform spirometry.
Research on asthma therefore remains incomplete, and there is an urgent need in the art for further research into asthma biomarkers.
Disclosure of Invention
The present application is based on the inventors' discovery and recognition of the following facts and problems. Intestinal microorganisms are the microbial community living in the human intestinal tract and are often called the "second genome" of the human body. The intestinal flora and the host form an interdependent whole: intestinal microorganisms not only break down nutrients in food and supply the host with vitamins and other nutrients, but also promote the differentiation and maturation of intestinal epithelial cells, thereby activating the intestinal immune system and regulating host energy storage and metabolism. They therefore play important roles in human digestion and absorption, immune responses and metabolic activity. By analyzing the intestinal flora and gene sequences of asthmatic patients and healthy people, the inventors screened out biomarkers that are highly correlated with asthma. These biomarkers can be used to accurately diagnose asthma or related diseases, to predict disease risk, and to monitor treatment effects.
Therefore, the invention aims to provide a biomarker for assessing the risk of asthma or diagnosing asthma at an early stage, together with methods for diagnosing asthma and assessing disease risk. These address the shortcomings of existing diagnostic methods, which cannot provide early warning or predict the onset and progression of asthma. The invention is thus applicable to predicting the incidence and progression of asthma and to the pathological typing of the disease.
Asthma-related biomarkers are believed to be valuable for early diagnosis for the following reasons. First, the markers of the present invention are specific and sensitive. Second, stool analysis offers accuracy, safety, affordability and good patient compliance, and stool samples are easy to transport; because polymerase chain reaction (PCR) based stool assays are comfortable and non-invasive, people are more willing to take part in a given screening program. Third, the markers of the invention can be used to monitor therapy in asthmatic patients and to detect their response to treatment.
According to a first aspect of the invention, the invention provides a biomarker. According to an embodiment of the invention, the biomarker comprises at least one selected from the group consisting of:
Bacteroides stercoris and/or its analogs, Eggerthella lenta and/or its analogs, and Sutterella wadsworthensis and/or its analogs, where an analog of Bacteroides stercoris has an alignment similarity of 85% or more with the genomic sequence of Bacteroides stercoris, an analog of Eggerthella lenta has an alignment similarity of 85% or more with the genomic sequence of Eggerthella lenta, and an analog of Sutterella wadsworthensis has an alignment similarity of 85% or more with the genomic sequence of Sutterella wadsworthensis. These biomarkers can be used for asthma detection: by determining whether one, two or all three of them are present in the intestinal flora of a subject, it can be effectively determined whether the subject has or is susceptible to asthma (i.e. the risk of developing asthma can be predicted), and the biomarkers can further be used to monitor therapeutic efficacy in asthmatic patients. In addition, when the healthy sample size is sufficiently large, a person skilled in the art can obtain the normal value or normal range of each biomarker in the intestine using known test and calculation methods. This normal value indicates the content of each biomarker in healthy samples, so that measuring the content of at least one of the biomarkers in the intestinal flora of a subject's sample can be used to determine whether the subject has or is susceptible to asthma, and to monitor the effect of treatment in asthmatic patients.
Moreover, it is known to those skilled in the art that when an unknown microorganism, or a gene sequence derived from a nucleic acid, has an alignment similarity of 85% or more with a gene sequence of a known strain, the microorganism can be considered to belong to the same genus as the strain, or the gene sequence can be assigned to the same genus as the strain. Because microorganisms of the same genus generally have the same or similar functions, the analogs can also be used as markers of asthma.
In the invention, the alignment similarity refers to the proportion of identical bases or amino acid residues between a target sequence (the sequence to be determined) and a reference sequence (a known sequence) during sequence alignment.
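As an illustration of the definition above, the following minimal sketch computes alignment similarity from the two rows of an existing pairwise alignment. The function name `alignment_similarity` is hypothetical, and the patent does not say how gap columns are handled; here they are assumed to count toward the denominator but never as matches, which is only one of several conventions used by alignment tools.

```python
def alignment_similarity(target_aln: str, reference_aln: str) -> float:
    """Proportion of alignment columns in which target and reference are identical.

    Both arguments are the two rows of a pairwise alignment: equal length,
    with '-' marking a gap. Gap columns count toward the denominator but never
    as matches (an assumed convention; tools differ on this point).
    """
    if len(target_aln) != len(reference_aln):
        raise ValueError("aligned rows must have equal length")
    if not target_aln:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for t, r in zip(target_aln.upper(), reference_aln.upper())
        if t == r and t != "-"
    )
    return matches / len(target_aln)


print(alignment_similarity("ACGTACGTAC", "ACGTTCGTAA"))  # 0.8 (8 of 10 columns match)
print(alignment_similarity("AC-T", "ACGT"))              # 0.75
```

In practice the aligned rows would come from an alignment program; the sketch only covers the final ratio.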
According to an embodiment of the invention, the biomarker is at least one selected from Bacteroides stercoris ATCC 43183, Eggerthella lenta DSM 2243 and Sutterella wadsworthensis 3_1_45B. These strains, as representative strains of Bacteroides stercoris, Eggerthella lenta and Sutterella wadsworthensis respectively, can be used to indicate the status or risk of asthma or asthma-related diseases.
According to an embodiment of the invention, the analog of Bacteroides stercoris has an alignment similarity of more than 95% with the genomic sequence of Bacteroides stercoris, the analog of Eggerthella lenta has an alignment similarity of more than 95% with the genomic sequence of Eggerthella lenta, and the analog of Sutterella wadsworthensis has an alignment similarity of more than 95% with the genomic sequence of Sutterella wadsworthensis. It is known to those skilled in the art that when an unknown microorganism, or a gene sequence derived from a nucleic acid, has a similarity of 95% or more with a known strain, the microorganism can be considered the same species as the strain, or the gene sequence can be assigned to the same species as the strain. A person skilled in the art can therefore obtain nucleic acid sequence information directly from the test subject and compare it with the genomic sequence of Bacteroides stercoris, Eggerthella lenta or Sutterella wadsworthensis; if the sequence similarity is more than 95%, the sequence information can be used as a marker for detecting whether the subject has or is susceptible to asthma.
According to an embodiment of the present invention, when an analog of Bacteroides stercoris, Eggerthella lenta or Sutterella wadsworthensis is aligned with the genomic sequence of the corresponding species and the alignment coverage is 80% or more and the alignment similarity is 85% or more, the analog can be considered to belong to the same genus as the corresponding strain and can be used as a marker of asthma. Preferably, when the alignment coverage is 80% or more and the alignment similarity is 95% or more, the analog can be considered to belong to the same species as the corresponding strain and can be used as a marker of asthma.
In the invention, the alignment coverage refers to the proportion of the total length of the target sequence that is aligned to the reference sequence when the target sequence is compared with the reference sequence.
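The coverage definition and the genus/species thresholds from the preceding embodiments (coverage of 80% or more; similarity of 85% or more for the same genus, 95% or more for the same species) can be combined into a simple decision rule. The sketch below assumes similarity and coverage are already expressed as fractions; the function names and the returned label strings are hypothetical.

```python
def alignment_coverage(aligned_length: int, target_length: int) -> float:
    """Proportion of the target sequence's total length that aligns to the reference."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return aligned_length / target_length


def classify_analog(similarity: float, coverage: float) -> str:
    """Apply the coverage/similarity thresholds stated in the embodiments.

    Both arguments are fractions in [0, 1]. The returned labels are illustrative.
    """
    if coverage < 0.80:
        return "not an analog"
    if similarity >= 0.95:
        return "same species"  # usable as a species-level marker
    if similarity >= 0.85:
        return "same genus"    # usable as a genus-level marker
    return "not an analog"


print(classify_analog(0.97, alignment_coverage(900, 1000)))  # same species
print(classify_analog(0.90, alignment_coverage(850, 1000)))  # same genus
print(classify_analog(0.99, alignment_coverage(500, 1000)))  # not an analog
```

The coverage check comes first because a short, highly similar fragment would otherwise pass on similarity alone.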
According to a second aspect of the invention, the invention proposes a method of diagnosing whether a subject has asthma or a related disease, or predicting whether a subject is at risk of asthma or a related disease. According to an embodiment of the invention, the method comprises the steps of: (1) collecting a sample from the subject; (2) determining relative abundance information of a biomarker in the sample obtained in step (1), the biomarker being a biomarker according to the first aspect of the invention; and (3) comparing the relative abundance information obtained in step (2) with a reference dataset or reference value. The method may be used for disease diagnosis in the sense of patent law, as well as for scientific research or other non-diagnostic purposes, such as enriching personal genetic information or a genetic information database. The relative abundance of each biomarker in the subject is compared with the reference dataset or reference value to determine whether the subject has asthma or a related disease, or to predict the risk of developing asthma or a related disease.
The reference dataset refers to the relative abundance information of each biomarker obtained from samples of individuals confirmed to be ill or healthy, and serves as the reference for the relative abundance of each biomarker. In one embodiment of the invention, the reference dataset is a training dataset. The terms "training set" and "validation set" have their usual meanings in the art. In one embodiment of the invention, the training set is a dataset containing the content of each biomarker in test samples from a given number of asthmatic and non-asthmatic subjects. The validation set is an independent dataset used to test the performance of the model built on the training set.
The reference value refers to a reference value or normal value for healthy controls. It is known to those skilled in the art that, when the sample size is sufficiently large, a normal range (in absolute values) for each biomarker in the sample can be obtained using detection and calculation methods known in the art. When the level of a biomarker is measured with an assay, its absolute level in the sample can be compared directly with the reference value to assess disease risk and to diagnose, or diagnose early, asthma or related diseases; statistical methods may optionally be used in this comparison.
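The patent does not specify how the normal range is calculated. One common choice, assumed here, is the central 95% interval (2.5th to 97.5th percentile) of healthy-control values. The sketch uses Python's standard `statistics.quantiles`; the function names are hypothetical and the input values are toy numbers, not data from the invention.

```python
import statistics
from typing import Sequence, Tuple


def reference_range(healthy_values: Sequence[float]) -> Tuple[float, float]:
    """Central 95% interval (2.5th to 97.5th percentile) of healthy-control values."""
    if len(healthy_values) < 2:
        raise ValueError("need at least two healthy-control samples")
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(healthy_values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def compare_to_reference(value: float, ref: Tuple[float, float]) -> str:
    """Report whether a subject's value lies below, within or above the range."""
    low, high = ref
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


healthy = [float(i) for i in range(1, 42)]  # toy values 1.0 .. 41.0
ref = reference_range(healthy)
print(ref)                              # (2.0, 40.0)
print(compare_to_reference(1.5, ref))   # below
```

A percentile interval is used instead of mean plus or minus two standard deviations because microbial abundances are usually strongly skewed. Either choice is an assumption.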
The asthma-related diseases referred to in the present invention are diseases associated with asthma. They include diseases or symptoms in the early stages of asthma, such as inflammatory responses involving mast cells, eosinophils, T lymphocytes and other cells, as well as symptoms or diseases that follow or accompany asthma, such as acid-base imbalance and pulmonary infection.
According to an embodiment of the present invention, the method may further include the following technical features:
According to an embodiment of the invention, the reference dataset comprises relative abundance information of the biomarkers in samples from a plurality of asthmatic patients and a plurality of healthy controls, the biomarkers being biomarkers according to the first aspect of the invention.
According to an embodiment of the invention, comparing the relative abundance information obtained in step (2) with the reference dataset further comprises running a multivariate statistical model to obtain a probability of illness. Using a multivariate statistical model enables rapid and efficient detection.
According to an embodiment of the invention, the multivariate statistical model is a random forest model.
According to an embodiment of the invention, a probability of illness greater than a threshold indicates that the subject has, or is at risk of having, asthma or a related disease.
According to an embodiment of the invention, the threshold value is 0.5.
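The embodiments name a random forest and a threshold of 0.5 but give no hyperparameters. The following is a minimal, standard-library sketch of the idea, not the inventors' model. It uses a toy forest of depth-1 trees (decision stumps), each fitted on a bootstrap sample using one randomly chosen feature; the probability of illness is the fraction of trees voting "asthma", and a subject is flagged when it exceeds 0.5. All function names, the tree count, the seed and the training values are hypothetical. In practice a full implementation such as scikit-learn's `RandomForestClassifier` with `predict_proba` would normally be used instead.

```python
import random


def fit_stump(X, y, feature):
    """Depth-1 tree: pick the threshold and direction on one feature with fewest errors."""
    values = sorted({row[feature] for row in X})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for thr in candidates:
        for sign in (1, -1):  # sign=1: vote "asthma" above thr; sign=-1: below thr
            errors = sum(
                int(sign * (row[feature] - thr) > 0) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, thr, sign)
    return feature, best[1], best[2]


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps, each on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        feature = rng.randrange(n_features)         # random feature subset of size 1
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def probability_of_illness(forest, x):
    """Fraction of trees voting "asthma" (label 1)."""
    votes = sum(int(sign * (x[f] - thr) > 0) for f, thr, sign in forest)
    return votes / len(forest)


def is_at_risk(forest, x, threshold=0.5):
    return probability_of_illness(forest, x) > threshold


# Hypothetical relative abundances: [B. stercoris, E. lenta, S. wadsworthensis]
healthy = [[0.050 + 0.001 * i, 0.001 + 0.0001 * i, 0.020 + 0.001 * i] for i in range(10)]
asthma = [[0.010 + 0.001 * i, 0.010 + 0.001 * i, 0.005 + 0.0005 * i] for i in range(10)]
X = healthy + asthma
y = [0] * len(healthy) + [1] * len(asthma)
forest = fit_forest(X, y)
print(is_at_risk(forest, [0.005, 0.030, 0.002]))   # True (asthma-like profile)
print(is_at_risk(forest, [0.070, 0.0005, 0.040]))  # False (healthy-like profile)
```

The toy training data are perfectly separable on every feature, which keeps the example deterministic. Real data would need deeper trees, more features and cross-validation.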
According to an embodiment of the invention, compared with a reference value, a decrease in Bacteroides stercoris and/or its analogs, or in Sutterella wadsworthensis and/or its analogs, indicates that the subject has, or is at risk of having, asthma or a related disease; an increase in Eggerthella lenta and/or its analogs likewise indicates that the subject has, or is at risk of having, asthma or a related disease.
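The direction of change described above can be expressed as a per-marker check against a reference value. In this sketch the reference values (for example, a healthy-control median) and the function name are hypothetical. The patent does not state how the three per-marker results should be combined into a single call, so the sketch only reports them individually.

```python
# Direction of change associated with asthma in the embodiment above.
EXPECTED_DIRECTION = {
    "Bacteroides stercoris": "decrease",
    "Sutterella wadsworthensis": "decrease",
    "Eggerthella lenta": "increase",
}


def asthma_direction_flags(sample, reference):
    """True for each marker that deviates from its reference value in the
    asthma-associated direction, False otherwise."""
    flags = {}
    for species, direction in EXPECTED_DIRECTION.items():
        value, ref = sample[species], reference[species]
        flags[species] = value < ref if direction == "decrease" else value > ref
    return flags


sample = {"Bacteroides stercoris": 0.01, "Sutterella wadsworthensis": 0.02,
          "Eggerthella lenta": 0.005}
reference = {"Bacteroides stercoris": 0.03, "Sutterella wadsworthensis": 0.01,
             "Eggerthella lenta": 0.002}
print(asthma_direction_flags(sample, reference))
# {'Bacteroides stercoris': True, 'Sutterella wadsworthensis': False, 'Eggerthella lenta': True}
```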
According to an embodiment of the invention, the relative abundance information of the biomarker in step (2) is obtained by sequencing, which further comprises: isolating a nucleic acid sample from the subject's sample, constructing a DNA library from the nucleic acid sample, sequencing the DNA library to obtain sequencing results, and aligning the sequencing results to a reference gene set to determine the relative abundance information of the biomarker. According to one embodiment of the present invention, the sequencing results may be aligned to the reference gene set using at least one of SOAP2 and MAQ, which improves alignment efficiency and hence the efficiency of asthma detection. According to embodiments of the invention, multiple (at least two) biomarkers can be detected simultaneously, which further improves the efficiency of asthma detection.
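The patent does not give the formula that turns aligned reads into relative abundance. A common approach, assumed here, starts from per-gene read counts produced by an aligner such as SOAP2 or MAQ. It normalizes each count by gene length, rescales so all genes sum to 1, and sums the genes assigned to each species. Published pipelines often use the median of a species' marker genes rather than the sum, so treat this aggregation as illustrative. The function name, gene identifiers and counts are hypothetical.

```python
def species_relative_abundance(read_counts, gene_lengths, gene_to_species):
    """Aggregate per-gene read counts into per-species relative abundances.

    read_counts:     gene id -> number of reads aligned to that gene
    gene_lengths:    gene id -> gene length in bp
    gene_to_species: gene id -> species name (genes absent here are unassigned)
    """
    # Length-normalize so long genes are not over-counted.
    depth = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(depth.values())
    if total == 0:
        raise ValueError("no reads aligned to the reference gene set")
    abundance = {}
    for gene, d in depth.items():
        species = gene_to_species.get(gene)
        if species is not None:  # unassigned genes still count in the denominator
            abundance[species] = abundance.get(species, 0.0) + d / total
    return abundance


counts = {"g1": 100, "g2": 50, "g3": 200, "g4": 0, "g5": 100}
lengths = {"g1": 1000, "g2": 500, "g3": 2000, "g4": 1000, "g5": 1000}
species_of = {"g1": "Bacteroides stercoris", "g2": "Bacteroides stercoris",
              "g3": "Eggerthella lenta", "g4": "Sutterella wadsworthensis"}
result = species_relative_abundance(counts, lengths, species_of)
print({k: round(v, 3) for k, v in result.items()})
# {'Bacteroides stercoris': 0.5, 'Eggerthella lenta': 0.25, 'Sutterella wadsworthensis': 0.0}
```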
According to an embodiment of the invention, the reference gene set is obtained by performing metagenomic sequencing on samples from a plurality of asthmatic patients and a plurality of healthy controls to obtain a non-redundant gene set, and then combining the non-redundant gene set with an intestinal microbial gene set. The reference gene set used in the invention may be an existing gene set, such as a published intestinal microbial reference gene set, or it may be constructed as just described; a reference gene set constructed in this way contains more comprehensive information and yields more reliable detection results.
The term "non-redundant gene set" has its usual meaning in the art: the set of genes remaining after redundant genes have been removed. Redundant genes generally refer to multiple copies of a gene occurring on a chromosome.
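The merging and de-duplication steps described above can be sketched as follows. This is a simplified stand-in, not the inventors' procedure: real pipelines usually cluster genes at a sequence-identity cutoff with tools such as CD-HIT, whereas this sketch only discards a gene if it, or its reverse complement, is contained exactly in a longer gene already kept. The function names and the `catalog:`/`study:` identifier prefixes are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def non_redundant(genes):
    """genes: gene id -> DNA sequence. Keep a gene only if neither it nor its
    reverse complement is contained in a longer gene already kept."""
    kept = {}
    # Longest first, so shorter genes are tested against longer ones.
    for gene_id, seq in sorted(genes.items(), key=lambda kv: len(kv[1]), reverse=True):
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in k or rc in k for k in kept.values()):
            continue  # redundant copy or fragment
        kept[gene_id] = seq
    return kept


def build_reference_gene_set(study_genes, gut_catalog):
    """Merge study genes into an existing gut gene catalog, then de-duplicate."""
    combined = {f"catalog:{g}": s for g, s in gut_catalog.items()}
    combined.update({f"study:{g}": s for g, s in study_genes.items()})
    return non_redundant(combined)


catalog = {"c1": "ATGAAACCCGGGTTT"}
study = {"s1": "AAACCCGGG",        # fragment of c1
         "s2": "AAACCCGGGTTTCAT",  # reverse complement of c1
         "s3": "ATGCATGC"}         # new gene
print(sorted(build_reference_gene_set(study, catalog)))  # ['catalog:c1', 'study:s3']
```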
According to an embodiment of the invention, the sample is a stool sample.
According to an embodiment of the invention, the sequencing method is performed by a second generation sequencing method or a third generation sequencing method. The means for sequencing is not particularly limited, and rapid and efficient sequencing can be achieved by sequencing through a second-generation or third-generation sequencing method.
According to an embodiment of the invention, sequencing is performed using at least one selected from the group consisting of HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices. The high-throughput, deep-sequencing capability of these platforms facilitates subsequent analysis of the sequencing data, in particular the precision and accuracy of statistical testing.
According to a third aspect of the present invention, there is provided a kit comprising reagents for detecting a biomarker, the biomarker comprising a biomarker according to the first aspect of the present invention. With the kit, the relative abundance of these markers in the intestinal flora can be determined. From the resulting relative abundance values, it can be determined whether the subject has or is susceptible to asthma, and the treatment response of asthmatic patients can be monitored.
According to an embodiment of the invention, the kit comprises a reference dataset or reference values for the relative abundance of each biomarker. The reference dataset or reference values may preferably be provided on a physical carrier, e.g. an optical disc such as a CD-ROM.
According to an embodiment of the invention, the kit further comprises a first computer program product for obtaining the reference dataset or reference values, i.e. a reference dataset or reference values used to diagnose whether the subject has asthma or a related disease or to predict whether the subject is at risk of asthma or a related disease.
According to an embodiment of the invention, the kit further comprises a second computer program product which may also be used to perform the method of diagnosing whether a subject has asthma or a related disease or predicting whether a subject has a risk of asthma or a related disease according to the second aspect of the invention.
According to a fourth aspect of the invention, the invention proposes the use of a biomarker in the manufacture of a kit for diagnosing whether a subject has asthma or a related disease, or predicting whether a subject is at risk of asthma or a related disease. According to an embodiment of the invention, the diagnosis or prediction comprises the steps of: 1) collecting a sample from the subject; 2) determining relative abundance information of a biomarker in the sample obtained in step 1), the biomarker being a biomarker according to the first aspect of the invention; and 3) comparing the relative abundance information obtained in step 2) with a reference dataset or reference value. With the kit, the relative abundance of these markers in the intestinal flora can be determined. From the resulting relative abundance values, it can be determined whether the subject has or is susceptible to asthma, and the therapeutic effect in asthmatic patients can be monitored.
According to embodiments of the invention, the above use of the biomarker in the manufacture of a kit may further include the following technical features:
According to an embodiment of the invention, in the above use, the reference dataset comprises relative abundance information of a biomarker in samples from a plurality of asthmatic patients and a plurality of healthy controls, the biomarker being a biomarker according to the first aspect of the invention.
According to an embodiment of the present invention, in the above use, in the step of comparing the relative abundance information described in step 2) with a reference dataset, further comprising executing a multivariate statistical model to obtain a probability of illness; preferably, the multivariate statistical model is a random forest model.
According to an embodiment of the invention, in the above use, a probability of illness greater than a threshold indicates that the subject has, or is at risk of having, asthma or a related disease; preferably, the threshold is 0.5.
According to an embodiment of the invention, in the above use, compared with a reference value, a decrease in Bacteroides stercoris and/or its analogs, or in Sutterella wadsworthensis and/or its analogs, indicates that the subject has, or is at risk of having, asthma or a related disease; an increase in Eggerthella lenta and/or its analogs likewise indicates that the subject has, or is at risk of having, asthma or a related disease.
According to an embodiment of the present invention, in the above use, the relative abundance information of the biomarker in step 2) is obtained by sequencing, which further comprises: isolating a nucleic acid sample from the subject's sample, constructing a DNA library from the nucleic acid sample, sequencing the DNA library to obtain sequencing results, and aligning the sequencing results to a reference gene set to determine the relative abundance of the biomarker.
According to an embodiment of the present invention, in the above use, the reference gene set is obtained by performing metagenomic sequencing on samples from a plurality of asthmatic patients and a plurality of healthy controls to obtain a non-redundant gene set, and then combining the non-redundant gene set with an intestinal microbial gene set.
According to an embodiment of the invention, in the above use, the sample is a stool sample.
According to an embodiment of the invention, in the above use, the sequencing method is performed by a second generation sequencing method or a third generation sequencing method.
According to an embodiment of the present invention, in the above use, sequencing is performed using at least one selected from the group consisting of HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices.
According to a fifth aspect of the invention, the invention proposes the use of a biomarker as a target for screening drugs for the treatment or prevention of asthma or related diseases. According to an embodiment of the invention, the biomarker is a biomarker according to the first aspect of the invention. According to embodiments of the present invention, the changes in these biomarkers before and after administration of a drug candidate can be used to determine whether the candidate may be used to treat or prevent asthma.
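One way to quantify the before/after comparison is a per-marker log2 fold change, signed so that a positive score means the candidate drug moved the marker opposite to its asthma-associated change: up for Bacteroides stercoris and Sutterella wadsworthensis, down for Eggerthella lenta. The scoring scheme, the pseudocount and all values below are assumptions for illustration. The patent does not define a screening score.

```python
import math

# Direction of change associated with asthma in the embodiments:
# -1 = decreased in asthma, +1 = increased in asthma.
ASTHMA_DIRECTION = {
    "Bacteroides stercoris": -1,
    "Sutterella wadsworthensis": -1,
    "Eggerthella lenta": +1,
}


def reversal_scores(before, after, pseudocount=1e-6):
    """Per-marker log2 fold change (after/before), signed so that a positive score
    means the candidate moved the marker away from its asthma-associated change."""
    scores = {}
    for species, direction in ASTHMA_DIRECTION.items():
        # The pseudocount avoids division by zero for undetected markers.
        lfc = math.log2((after[species] + pseudocount) / (before[species] + pseudocount))
        scores[species] = -direction * lfc
    return scores


before = {"Bacteroides stercoris": 0.01, "Sutterella wadsworthensis": 0.01,
          "Eggerthella lenta": 0.02}
after = {"Bacteroides stercoris": 0.04, "Sutterella wadsworthensis": 0.01,
         "Eggerthella lenta": 0.005}
print({k: round(v, 2) for k, v in reversal_scores(before, after).items()})
# {'Bacteroides stercoris': 2.0, 'Sutterella wadsworthensis': 0.0, 'Eggerthella lenta': 2.0}
```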
The beneficial effects of the invention are as follows. Feces contain not only human metabolic products but also intestinal microorganisms that are closely linked to metabolism, immunity and other functions of the body. Studies of fecal samples show clear differences in intestinal flora composition between asthmatic patients and healthy people, which allows accurate risk assessment and early diagnosis of asthma. By comparing and analyzing the intestinal flora of asthmatic patients and healthy people, the invention identifies several asthma-related intestinal microorganisms. Using high-quality metagenomic species (MGSs) from asthmatic and healthy individuals as the training set, the risk of asthma can be accurately assessed and asthma can be diagnosed early. Compared with current common diagnostic methods, the method is convenient and rapid.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 shows a schematic diagram of a device for determining whether a subject has asthma or a related disease, or predicting whether a subject is at risk of asthma or a related disease, according to one embodiment of the invention: panel a is a schematic diagram of the device, and panel b is a schematic diagram of the biomarker relative abundance determination unit in the device.
Fig. 2 shows gene-level alpha diversity in asthmatic patients and healthy controls according to one embodiment of the invention, including the between-group differences in gene count (p = 0.02618, Wilcoxon test) and Shannon index (p = 0.0801, Wilcoxon test).
Fig. 3 shows the distribution of error rates from five repetitions of 10-fold cross-validation of the random forest classifier according to one embodiment of the invention.
Fig. 4 shows the receiver operating characteristic (ROC) curve and area under the curve (AUC) for a training set of healthy controls and asthmatic patients, based on a random forest model using 3 intestinal markers, according to one embodiment of the invention.
Fig. 5 shows the ROC curve and AUC for a validation set of 65 healthy controls and 7 asthmatic patients, based on a random forest model using 3 intestinal markers, according to one embodiment of the invention.
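The figures above report two kinds of summary statistics: alpha diversity (gene count and Shannon index, Fig. 2) and ROC AUC (Figs. 4 and 5). The sketch below computes them with the standard library for readers who want to reproduce such numbers. It assumes the natural logarithm for the Shannon index, a common convention that the patent does not state. AUC is computed as the probability that a randomly chosen patient scores higher than a randomly chosen control, which equals the area under the ROC curve. The Wilcoxon tests themselves are not reproduced here, and all function names and inputs are illustrative.

```python
import math


def gene_count(abundances):
    """Number of genes detected (abundance > 0) in one sample."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over detected features."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def roc_auc(scores, labels):
    """AUC as the probability that a random positive (label 1) outscores a random
    negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(gene_count([0, 3, 5, 0]))                      # 2
print(round(shannon_index([1, 1, 1, 1]), 4))         # 1.3863 (= ln 4)
print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```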
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
To address the shortcomings of existing asthma diagnosis methods, which cannot provide early warning or predict the onset and progression of asthma, the invention provides a biomarker for assessing asthma risk or for early diagnosis of asthma, together with a method for asthma diagnosis and disease risk assessment that can predict the onset and progression of asthma and can be applied to pathological typing of the disease.
Biomarkers
According to one aspect of the invention, the invention proposes a biomarker.
The terms used in the present invention have meanings commonly understood by those of ordinary skill in the relevant art. However, for a better understanding of the present invention, some definitions and related terms are explained as follows:
According to the present invention, the term "asthma" refers to the most common chronic inflammatory disease of the airways, mainly characterized by variable and recurrent symptoms, reversible airflow obstruction and bronchospasm. Common symptoms are wheezing, coughing, chest tightness and dyspnea.
According to the invention, the level of a biomarker is expressed as its relative abundance.
According to the present invention, the term "biomarker", also called a "biological marker", refers to a measurable indicator of the biological state of an individual. A biomarker may be any substance in the individual, as long as it is related to a specific biological state (e.g., a disease) of the subject, for example nucleic acid markers (also called genetic markers, e.g., DNA), protein markers, cytokine markers, chemokine markers, carbohydrate markers, antigen markers, antibody markers, species markers (markers of species or genera), functional markers (KO/OG markers) and the like. Nucleic acid markers are not limited to existing genes that can be expressed as biologically active proteins; they include any nucleic acid fragment, which may be DNA, RNA, modified or unmodified DNA or RNA, or any combination thereof. Nucleic acid markers are sometimes also referred to herein as signature fragments. In the present invention, biomarkers may also be called "intestinal markers", because all of the asthma-associated biomarkers found in the present invention are present in the intestinal tract of the subject. Biomarkers are often measured and evaluated to examine normal biological processes, pathogenic processes, or pharmacological responses to therapeutic intervention, and are useful in many scientific fields.
According to embodiments of the invention, fecal samples from healthy people and asthmatic patients can be analyzed in batches by high-throughput sequencing. Based on the sequencing data, the healthy population is compared with the asthmatic patient population to identify specific nucleic acid sequences associated with the asthmatic population. Briefly, the procedure is as follows:
sample collection and processing: fecal samples are collected from healthy people and asthmatic patients, and DNA is extracted with a kit to obtain nucleic acid samples;
library construction and sequencing: DNA libraries are constructed and sequenced by high-throughput sequencing to obtain the nucleic acid sequences of the intestinal microorganisms contained in the fecal samples;
bioinformatic analysis: specific intestinal microbial nucleic acid sequences associated with asthmatic patients are identified by bioinformatic analysis. First, the sequencing reads are aligned to a reference gene set (which may be a newly constructed gene set or any database of known sequences, for example a known non-redundant gene set of the human intestinal microbiota). Next, based on the alignment results, the relative abundance of each gene is determined in the nucleic acid samples from the feces of the healthy and asthmatic populations. Alignment establishes a correspondence between sequencing reads and genes in the reference gene set, so the number of reads assigned to a given gene effectively reflects that gene's relative abundance in the nucleic acid sample; the relative abundance of each gene can therefore be determined by alignment and conventional statistical analysis. Finally, the relative abundances of each gene in the samples from the healthy and asthmatic populations are compared by a statistical test to determine whether any gene differs significantly in relative abundance between the two populations; a gene with a significant difference is regarded as a biomarker of the abnormal state, i.e., a nucleic acid marker.
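The gene-level steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the inventors' pipeline: per-gene read counts are assumed to be already available from alignment, counts are divided by gene length before normalisation (an assumption here, following the formula used in the Qin et al. 2012 study cited in Example 1), and the comparison uses a two-sided Wilcoxon rank-sum test with a normal approximation, the test named in Example 1. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def relative_abundance(read_counts: Dict[str, int],
                       gene_lengths: Dict[str, int]) -> Dict[str, float]:
    """Per-gene relative abundance for one sample.

    Read counts are divided by gene length (assumed normalisation: longer genes
    attract more reads) and rescaled so the sample's abundances sum to 1.
    """
    per_base = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(per_base.values())
    return {g: v / total for g, v in per_base.items()}


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value, normal approximation with tie
    correction and no continuity correction."""
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    pooled = list(cases) + list(controls)
    u = sum(_average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in (pooled.count(v) for v in set(pooled)))
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))
```

In use, `rank_sum_p` would be called once per gene with that gene's relative abundances across the case samples and the control samples; Example 1 uses p < 0.05 as the significance cut-off.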
In addition, known or newly constructed reference gene sets usually carry species information and functional annotations for their genes. Once gene relative abundances have been determined, genes can therefore be grouped by species and by functional annotation to determine the species relative abundance and functional relative abundance of each microorganism in the intestinal flora, and thus to identify species markers and functional markers of the abnormal state. Briefly, the method of determining species markers and functional markers further comprises: aligning the sequencing reads of the healthy population and the asthmatic population to the reference gene set; based on the alignment results, determining the species relative abundances and functional relative abundances in the nucleic acid samples of the healthy population and the asthmatic population, respectively; statistically testing these species and functional relative abundances between the two populations; and identifying, as species markers and functional markers respectively, the species and functions whose relative abundance differs significantly between the nucleic acid samples of the healthy population and the asthmatic population. According to embodiments of the present invention, species relative abundance and functional relative abundance may be obtained by aggregating (for example, summing, averaging or taking the median of) the relative abundances of genes from the same species or with the same functional annotation.
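Rolling gene-level abundances up to species or function level is a group-and-combine operation. A minimal sketch, assuming each gene carries at most one annotation label (a species name or a functional ID); the gene IDs and function name are hypothetical, and the combining function can be sum, mean or median as listed above.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Callable, Dict, List


def aggregate_by_annotation(gene_abundance: Dict[str, float],
                            annotation: Dict[str, str],
                            how: Callable[[List[float]], float] = sum
                            ) -> Dict[str, float]:
    """Collapse gene-level relative abundances to species or function level.

    Genes sharing a label are grouped and combined with `how` (sum by default;
    mean or median also fit the text). Genes without a label are skipped.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for gene, value in gene_abundance.items():
        label = annotation.get(gene)
        if label is not None:
            groups[label].append(value)
    return {label: how(values) for label, values in groups.items()}


if __name__ == "__main__":
    genes = {"g1": 0.2, "g2": 0.3, "g3": 0.5}  # hypothetical gene IDs
    labels = {"g1": "Eggerthella lenta", "g2": "Eggerthella lenta",
              "g3": "Bacteroides stercoris"}
    print(aggregate_by_annotation(genes, labels))            # summed
    print(aggregate_by_annotation(genes, labels, how=mean))  # averaged
    print(aggregate_by_annotation(genes, labels, how=median))
```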
Finally, biomarkers whose relative abundance differs significantly between fecal samples from the healthy and asthmatic populations were identified, namely the microbial species Bacteroides stercoris and/or its analogues, Eggerthella lenta and/or its analogues, and/or Sutterella wadsworthensis and/or its analogues. Thus, detecting the presence of at least one of these markers effectively determines whether a subject has or is susceptible to asthma, and can be used to monitor the effectiveness of treatment in asthmatic patients. The term "presence" as used herein is to be understood broadly: it covers qualitative analysis of whether the corresponding target is present in a sample, quantitative analysis of the target in the sample, and further statistical analysis, or any known mathematical operation, comparing the quantitative results with a reference (e.g., quantitative results obtained by testing a sample of known status in parallel). Those skilled in the art can readily choose among these as needed and according to the test conditions. According to embodiments of the present invention, whether a subject has or is susceptible to asthma can also be determined, and the effectiveness of treatment in asthmatic patients monitored, by determining the relative abundance of these microorganisms in the intestinal flora.
Whether a subject has or is susceptible to asthma may be determined, and the therapeutic effect in an asthmatic patient monitored, by detecting the presence of at least one of the above microbial species in the subject's intestinal flora, or by detecting two or more of them, i.e., a combination of the above biomarkers. Herein, the term "biomarker combination" refers to a combination of two or more biomarkers.
For species markers and functional markers, those skilled in the art can also determine the presence or absence of the species or function in the intestinal flora by conventional strain identification and biological activity assays. For example, strains can be identified by 16S rRNA gene sequencing.
Device for detecting whether a subject suffers from asthma or a related disease or for predicting whether a subject suffers from asthma or a related disease
According to a further aspect, the present invention provides a device for detecting or predicting whether a subject has asthma or a related disease, as shown in Fig. 1. According to an embodiment of the invention, the device comprises a sample acquisition means 100, a biomarker relative abundance determination means 200 and a disease probability determination means 300 (shown in panel a of Fig. 1). The sample acquisition means is adapted to acquire a sample from the subject; the biomarker relative abundance determination means is connected to the sample acquisition means and is adapted to determine relative abundance information of a biomarker in the acquired sample, the biomarker being the biomarker according to the first aspect of the invention; and the disease probability determination means is connected to the biomarker relative abundance determination means and compares the biomarker relative abundance information obtained by the latter with a reference data set or a reference value.
According to a specific embodiment of the invention, the reference dataset comprises relative abundance information of the biomarker according to the first aspect of the invention in samples from a plurality of asthmatic patients and a plurality of healthy controls.
According to one embodiment of the present invention, the disease probability determination means is further configured to execute a multivariate statistical model to obtain a probability of illness; preferably, the multivariate statistical model is a random forest model. According to a preferred embodiment of the invention, a probability of illness greater than a threshold indicates that the subject has or is at risk of having asthma or a related disease; preferably, the threshold is 0.5. According to a preferred embodiment of the invention, compared with a reference value, a decrease in Bacteroides stercoris and/or its analogues and/or in Sutterella wadsworthensis and/or its analogues indicates that the subject has or is at risk of having asthma or a related disease, and an increase in Eggerthella lenta and/or its analogues likewise indicates that the subject has or is at risk of having asthma or a related disease.
According to one embodiment of the invention, the biomarker relative abundance determination means (shown in panel b of Fig. 1) further comprises a nucleic acid sample separation unit 210, a sequencing unit 220 and an alignment unit 230. The nucleic acid sample separation unit is adapted to separate a nucleic acid sample from the subject's sample; the sequencing unit is connected to the nucleic acid sample separation unit, constructs a DNA library from the obtained nucleic acid sample and sequences the library to obtain sequencing results; and the alignment unit is connected to the sequencing unit and aligns the sequencing results to a reference gene set to determine the relative abundance information of the biomarker.
According to one embodiment of the invention, the reference gene set is obtained by performing metagenomic sequencing on samples from a plurality of asthmatic patients and a plurality of healthy controls to obtain a non-redundant gene set, and then merging this non-redundant gene set with an intestinal microbial gene set.
According to embodiments of the present invention, the sequencing unit is not particularly limited. Preferably, sequencing is performed by a second-generation or third-generation sequencing method. Preferably, the sequencing unit is at least one selected from HiSeq2000, SOLiD, 454 and single-molecule sequencing devices. The high-throughput, deep sequencing provided by these platforms facilitates the subsequent analysis of the sequencing data, in particular its precision and accuracy during statistical testing.
According to one embodiment of the invention, the alignment unit performs the alignment using at least one of SOAP2 and MAQ. This improves alignment efficiency and, in turn, the efficiency of asthma detection.
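The two decision rules described above can be sketched as follows, assuming the model's probability of illness and per-marker reference values are already available (the reference values, dictionary and function names are hypothetical). The sketch uses probability ≥ 0.5, the form used in Example 1; the paragraph above says "greater than", so how the boundary value is treated is a choice.

```python
from typing import Dict

# Direction of change associated with asthma, as stated in the text:
# -1 = lower than reference, +1 = higher than reference.
ASTHMA_DIRECTION: Dict[str, int] = {
    "Bacteroides stercoris": -1,
    "Sutterella wadsworthensis": -1,
    "Eggerthella lenta": +1,
}


def classify(probability: float, threshold: float = 0.5) -> bool:
    """True if the model's probability of illness indicates asthma or asthma risk."""
    return probability >= threshold


def direction_flags(sample: Dict[str, float],
                    reference: Dict[str, float]) -> Dict[str, bool]:
    """Per marker: True if the sample deviates from its reference value in the
    direction associated with asthma (equal values are not flagged)."""
    return {
        marker: (sample[marker] - reference[marker]) * sign > 0
        for marker, sign in ASTHMA_DIRECTION.items()
    }
```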
In addition, according to embodiments of the invention, a drug screening method is also provided, in which markers closely related to asthma serve as drug design targets, thereby promoting the discovery of new drugs for treating asthma. For example, whether a candidate agent may be used to treat or prevent asthma can be determined by detecting changes in biomarker levels before and after contact with the candidate agent, e.g., whether the level of a detrimental biomarker decreases and the level of a beneficial biomarker increases after contact. Candidate compounds may also be screened as agents for treating or preventing asthma by determining the direct or indirect effect of a drug on the biological activity of at least one of Bacteroides stercoris and/or its analogues, Eggerthella lenta and/or its analogues, and Sutterella wadsworthensis and/or its analogues. Accordingly, the present invention also proposes the use of the asthma biomarkers in screening for drugs for the treatment or prevention of asthma.
It is to be noted that the explanation of the terms is provided herein only for better understanding of the present invention by those skilled in the art, and is not to be construed as limiting the present invention.
It is understood that, within the scope of the present invention, the technical features described above and those described specifically below (e.g., in the examples) may be combined with each other to form new or preferred technical solutions; for reasons of space, these are not described individually here.
The invention will now be described with reference to specific examples, which are intended to be illustrative only and are not to be construed as limiting the invention.
Unless otherwise indicated, the technical means used in the examples are conventional means well known to those skilled in the art and may be carried out with reference to Molecular Cloning: A Laboratory Manual (3rd edition) or the instructions for the relevant products; the reagents and products used are commercially available. Processes and methods not described in detail are conventional methods well known in the art. The sources, trade names and, where necessary, constituents of reagents are indicated at their first occurrence; unless otherwise indicated, the same reagents used thereafter are the same as at their first occurrence.
The invention adopts a metagenome-wide association study (MWAS) approach, analyzing the composition and functional differences of the fecal flora by sequencing; a random forest discrimination model is then used to distinguish asthmatic from non-asthmatic groups and obtain a probability of illness, which is used for disease risk assessment, diagnosis and early diagnosis of asthma, or for finding potential drug targets.
According to the present invention, the term "MGS" refers to a metagenomic species (Nielsen HB, Almeida M, Juncker AS, et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nature Biotechnology, 2014, 32(8): 822-828), a uniform label assigned artificially to a taxonomic unit (strain, species, genus, group, etc.) for ease of analysis in phylogenetic or population genetic studies. Sequences are usually partitioned into different MGSs by a similarity threshold, and each MGS is usually regarded as one microbial species. An MGS is considered a known strain if more than 90% of its sequences align to the known strain with 95% sequence identity over an alignment length greater than 80% of the sequence length; it is considered a known species if more than 50% of its sequences align to the known species with 95% identity over more than 80% of their length; it is annotated at the genus level if more than 50% of its sequences align to a known genus with 85% identity over more than 80% of their length; and it is annotated at the phylum level if more than 50% of its sequences align to a known phylum with 75% identity over more than 80% of their length.
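The MGS annotation rules above can be expressed as a small rule table. This sketch simplifies by assuming every gene in the MGS has been aligned against one candidate reference lineage, takes "95% sequence identity" to mean at least 95%, and reads "greater than 80%" coverage literally; the names are illustrative and not part of any published tool.

```python
from typing import Optional, Sequence, Tuple

# (level, fraction of MGS genes that must pass, minimum identity %), finest first.
RULES = [
    ("strain", 0.90, 95.0),
    ("species", 0.50, 95.0),
    ("genus", 0.50, 85.0),
    ("phylum", 0.50, 75.0),
]
MIN_COVERAGE = 80.0  # alignment must cover > 80% of the gene length


def annotation_level(hits: Sequence[Tuple[float, float]],
                     n_genes: int) -> Optional[str]:
    """Finest taxonomic level at which an MGS can be annotated.

    hits: (identity %, coverage %) of each MGS gene's alignment to the
    candidate reference; genes without a hit are simply absent.
    n_genes: total number of genes in the MGS.
    """
    for level, min_fraction, min_identity in RULES:
        passing = sum(1 for ident, cov in hits
                      if ident >= min_identity and cov > MIN_COVERAGE)
        if passing / n_genes > min_fraction:
            return level
    return None
```

Because strain and species share the 95% identity cut-off, the only difference between them in this table is the required fraction of genes (more than 90% versus more than 50%).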
According to the invention, the term "individual" refers to an animal, in particular a mammal, such as a primate, preferably a human.
According to the present invention, the articles "a", "an" and "the" are not limited to a single individual but include the general class of which a specific example may be used for illustration.
In the present invention, sequencing (second-generation sequencing) and MWAS are well known in the art and can be adjusted by those skilled in the art as the situation requires. According to embodiments of the present invention, they may be performed as described in the literature (Wang J, Jia H. Metagenome-wide association studies: fine-mining the microbiome. Nature Reviews Microbiology, 2016, 14(8): 508-522).
In the present invention, the use of random forest models and ROC curves is well known in the art, and those skilled in the art can set and adjust parameters as the situation requires. According to embodiments of the present invention, this can be done as described in the literature (Drogan D, Dunn WB, Lin W, Buijsse B, Schulze MB, Langenberg C, Brown M, Floegel A, Dietrich S, Rolandsson O, Wedge DC, Goodacre R, Forouhi NG, Sharp SJ, Spranger J, Wareham NJ, Boeing H. Untargeted metabolic profiling identifies altered serum metabolites of type 2 diabetes mellitus in a prospective, nested case control study. Clin Chem, 2015, 61: 487-497; Mihalik SJ, Michaliszyn SF, de las Heras J, Bacha F, Lee S, Chace DH, DeJesus VR, Vockley J, Arslanian SA. Metabolomic profiling of fatty acid and amino acid metabolism in youth with obesity and type 2 diabetes: evidence for enhanced mitochondrial oxidation. Diabetes Care, 2012, 35: 605-611).
In the invention, a training set of biomarker values from asthmatic and non-asthmatic subjects is constructed, and the biomarker content of a sample to be tested is evaluated against this training set.
Those skilled in the art will appreciate that, as the sample size is further increased, the normal range (in absolute values) of each biomarker in samples can be derived using sample detection and calculation methods well known in the art. The detected absolute content of a biomarker can then be compared with the normal range, optionally in combination with statistical methods, for asthma risk assessment, diagnosis, monitoring of treatment efficacy in asthmatic patients, and so on.
Without wishing to be bound by any theory, the inventors point out that these biomarkers are intestinal bacteria present in the human body. The method of the invention performs association analysis on a subject's intestinal flora, and the biomarkers identified in the asthmatic population show characteristic abundance ranges when the flora is tested.
Example 1
1.1 sample collection
Following the method described in the literature (Qin J, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature, 2012, 490: 55-60), stool samples were collected, shipped frozen, rapidly transferred to -80 °C for storage, and subjected to DNA extraction to obtain DNA samples. The stool samples of asthmatic and non-asthmatic subjects came from 250 adult twins in the United Kingdom. Of these, 29 samples with missing phenotype, i.e., samples whose disease status could not be determined by clinical means, were discarded, leaving 221 samples: 185 healthy and 36 asthmatic.
1.2 metagenomic sequencing and Assembly
A sequencing library was constructed from the extracted DNA samples, and paired-end metagenomic sequencing (350 bp insert, 100 bp reads) was performed on an Illumina HiSeq2000 platform. The sequencing data were quality-filtered (removing adapter-contaminated sequences, low-quality sequences and host-genome contaminating sequences) and assembled de novo with SOAPdenovo (v2.04) to obtain assembled fragments (contigs).
1.3 Gene set construction
Genes were predicted from the assembled contigs using GeneMark (v2.7d), and redundancy was removed using BLAT (alignment identity 95% or more, alignment coverage 90% or more, no gaps), yielding a non-redundant gene set of 5,901,478 genes. Then, as described in the literature (Li J, Jia H, Cai X, et al. An integrated catalog of reference genes in the human gut microbiome. Nature Biotechnology, 2014, 32(8): 834-841), this fecal-sample gene set was merged with CD-HIT (alignment identity 95% or more, alignment coverage 90% or more) into a published intestinal microbial reference gene set of 9,879,896 genes, finally giving a new gene set of 11,446,577 genes.
As described in the literature (Qin J, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature, 2012, 490: 55-60), the high-quality reads used for assembly in "1.2 metagenomic sequencing and assembly" above were aligned to the intestinal reference gene set (the 11,446,577 genes above) to obtain the relative abundance of each gene.
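As a toy illustration of threshold-based redundancy removal, the sketch below keeps a gene only if it does not match an already kept gene at 95% identity over 90% of its length. It is not BLAT or CD-HIT: it uses a greedy longest-first strategy and an ungapped character comparison (mirroring the "no gaps" criterion), and measuring overlap against the shorter gene is an assumption. Function names are hypothetical.

```python
from typing import List


def ungapped_match(a: str, b: str, min_identity: float = 0.95,
                   min_overlap: float = 0.90) -> bool:
    """True if the shorter sequence overlaps the longer one at some offset,
    without gaps, over >= min_overlap of its length with >= min_identity."""
    if len(a) > len(b):
        a, b = b, a  # a is now the shorter sequence
    need = min_overlap * len(a)
    for offset in range(-len(a) + 1, len(b)):  # start of a relative to b
        start_a, start_b = max(0, -offset), max(0, offset)
        overlap = min(len(a) - start_a, len(b) - start_b)
        if overlap < need:
            continue
        matches = sum(a[start_a + k] == b[start_b + k] for k in range(overlap))
        if matches / overlap >= min_identity:
            return True
    return False


def remove_redundancy(genes: List[str]) -> List[str]:
    """Greedy, longest-first: keep a gene only if it matches no kept gene."""
    kept: List[str] = []
    for gene in sorted(genes, key=len, reverse=True):
        if not any(ungapped_match(gene, rep) for rep in kept):
            kept.append(gene)
    return kept
```

Real tools replace the brute-force offset scan with indexed seed-and-extend alignment, which is what makes them usable on millions of genes.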
1.4 species classification annotation and abundance calculation
As described in the literature (Qin J, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature, 2012, 490: 55-60), the predicted genes were assigned taxonomically by alignment to the IMG (v400) database. An alignment identity of 65% or more with an alignment coverage of 70% or more was used as the threshold for phylum-level assignment, an identity of 85% or more as the threshold for genus-level assignment, and an identity of 95% or more as the threshold for strain-level assignment.
The relative abundance of each species was then calculated from the gene relative abundances, and species whose relative abundance differed significantly between cases and controls were identified with the Wilcoxon rank-sum test (p < 0.05), as described in the literature (Qin J, Li Y, Cai Z, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature, 2012, 490(7418): 55-60).
1.5 biomarker abundance calculations
Genes were clustered according to their relative abundances (software: https://bitbucket.org/HeyHo/MGS-canopy-algorithm.git), and MGSs containing more than 700 genes were selected for species annotation. The relative abundance of each MGS was obtained by summing the relative abundances of its genes, and MGSs whose relative abundance differed significantly between cases and controls were identified.
1.6 screening potential biomarkers for the development and progression of asthma using random forest (ROC/AUC)
To further screen for potential intestinal biomarkers of disease, this example constructs a training set of biomarker values from asthmatic and non-asthmatic subjects and, on that basis, evaluates the biomarker content of samples to be tested. Here the training set and validation set have their usual meanings in the art: the training set is a data set containing the content of each biomarker in test samples from a certain number of asthmatic and non-asthmatic subjects, and the validation set is an independent data set used to test the performance of the model built on the training set. Non-asthmatic subjects are subjects in a good mental state; subjects may be humans or model animals, and human subjects were used in this example.
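The MGS abundance step can be sketched as a size filter followed by a sum over member genes, assuming the MGS membership lists produced by the clustering step are available. MGS and gene IDs are hypothetical, and genes absent from a sample's abundance table are treated as contributing 0.

```python
from typing import Dict, List


def mgs_abundance(mgs_genes: Dict[str, List[str]],
                  gene_abundance: Dict[str, float],
                  min_genes: int = 700) -> Dict[str, float]:
    """Relative abundance of each MGS = sum of its genes' relative abundances.

    Only MGSs with more than `min_genes` genes are kept, mirroring the
    '> 700 genes' filter in the text.
    """
    return {
        mgs: sum(gene_abundance.get(g, 0.0) for g in genes)
        for mgs, genes in mgs_genes.items()
        if len(genes) > min_genes
    }
```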
The method specifically comprises the following steps:
because asthmatic samples were scarce, the invention used oversampling. Of the 221 samples (185 healthy and 36 asthmatic, after the 29 samples with missing phenotype had been discarded from the original 250), 29 asthmatic samples were drawn at random and oversampled with replacement following the literature (Zheng Z, Cai Y, Li Y. Oversampling method for imbalanced classification. Computing and Informatics, 2016, 34(5): 1017-1037) to give 120 asthmatic samples. These were combined with 120 samples drawn from the 185 healthy samples to form a training set of 240 samples (120 asthmatic and 120 healthy); the remaining samples (7 asthmatic and 65 healthy) formed the validation set.
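The sampling scheme above can be sketched with Python's standard `random` module. Simple random oversampling with replacement (each drawn case kept once, then topped up with random repeats) is an assumption here; the method in the cited Zheng et al. paper may differ. The default sizes reproduce the counts in the text, and sample IDs are hypothetical and assumed unique.

```python
import random
from typing import List, Tuple


def split_and_oversample(cases: List[str], controls: List[str],
                         n_case_train: int = 29, n_control_train: int = 120,
                         n_case_target: int = 120, seed: int = 0
                         ) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Return (train_cases, train_controls, valid_cases, valid_controls).

    Cases and controls not drawn for training form the validation set; the
    drawn training cases are kept once each and topped up to n_case_target by
    sampling them again with replacement.
    """
    rng = random.Random(seed)
    case_pool = rng.sample(cases, n_case_train)
    train_controls = rng.sample(controls, n_control_train)
    extra = rng.choices(case_pool, k=n_case_target - n_case_train)
    train_cases = case_pool + extra
    chosen = set(case_pool) | set(train_controls)
    valid_cases = [c for c in cases if c not in chosen]
    valid_controls = [c for c in controls if c not in chosen]
    return train_cases, train_controls, valid_cases, valid_controls
```

With 36 case IDs and 185 control IDs this yields 120 training cases (29 distinct), 120 training controls, and 7 + 65 validation samples.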
1.6.1 biomarkers screened using training set data
First, the relative abundance of each gene in each training-set sample was calculated and the genes were clustered as described in 1.4-1.5. MGSs containing more than 700 genes in the training set were then input into a random forest (RF) classifier (randomForest 4.6-12 in R 3.2.5). The classifier was evaluated by five rounds of 10-fold cross-validation, repeated 10 times; the relative abundances of the MGSs selected by the RF model were used to calculate each individual's asthma risk, a receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) was calculated as the performance measure of the discrimination model. The marker combination with fewer than 30 markers and the best discrimination performance was selected as the combination of the invention. The model also outputs how often each MGS is selected; the higher this frequency, the more important the marker is for discriminating asthma from non-asthma.
The RF classifier obtained by the invention contains 3 markers (MGSs); their relative abundances are shown in Table 1 and detailed information in Table 2. Fig. 3 shows the distribution of error rates over five runs of 10-fold cross-validation of the random forest classifier. The model was trained on the training-set samples (120 asthmatic patients and 120 healthy controls) using the relative abundances of the qualifying MGSs obtained by the MWAS procedure. In Fig. 3, the solid black curve is the mean of the 5 runs (the light gray curves are the individual runs), and the vertical line marks the number of MGSs in the selected optimal combination. Fig. 4 shows the receiver operating characteristic (ROC) curve and area under the curve (AUC) for discriminating asthmatic patients from healthy controls in the training set with the random forest model (3 biomarkers). Specificity is the probability that a non-diseased sample is correctly classified, and sensitivity is the probability that a diseased sample is correctly classified. On the training set, AUC = 98.72% (95% confidence interval, CI: 96.92-100%), indicating that the marker combination obtained by this model can serve as a potential biomarker for distinguishing asthma from non-asthma.
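Repeated k-fold cross-validation splits of the kind used for the RF classifier can be sketched as follows. The inventors used R's randomForest; this only illustrates the splitting logic, and stratification (keeping the case/control ratio similar in every fold) is an assumption, since the text does not say whether folds were stratified. The function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def repeated_stratified_kfold(labels: Sequence[int], k: int = 10,
                              repeats: int = 5, seed: int = 0
                              ) -> List[Tuple[List[int], List[int]]]:
    """(train_indices, test_indices) pairs for `repeats` rounds of k-fold CV.

    Within a round every sample is tested exactly once; samples of each class
    are shuffled and dealt round-robin so every fold gets a similar class mix.
    """
    rng = random.Random(seed)
    splits: List[Tuple[List[int], List[int]]] = []
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            for j, i in enumerate(members):
                folds[j % k].append(i)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            splits.append((train, test))
    return splits
```

For the 240-sample training set (labels 1 for asthma, 0 for healthy) this gives 50 splits with 12 cases and 12 controls in each test fold. If the training cases include oversampled duplicates, one may prefer to assign all copies of an original sample to the same fold; this sketch does not do that.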
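The AUC values reported here can be computed without plotting the curve: the AUC equals the probability that a randomly chosen diseased sample receives a higher score than a randomly chosen healthy one, with ties counted as one half. A minimal sketch with a hypothetical function name; it does not compute confidence intervals.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 = asthma, 0 = healthy. Runs in O(n_pos * n_neg) time.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one diseased and one healthy sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```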
Table 1. Relative abundance of the intestinal markers (MGSs) in the random forest model training set
[Table 1 is provided as images in the original publication.]
Table 2. Detailed information on the 3 biomarkers
[Table 2 is provided as images in the original publication.]
Wherein, in Table 2, the size of each marker gene set represents the number of nucleic acid sequences included in each marker; the marker gene set annotation numbers represent: how many genes are annotated to the marker; the optimal annotation of the markers is characterized in that all gene sets included by each marker are compared with an IMG (v 400) database, and corresponding species classification is obtained; the optimal annotated gene proportions are characterized by: how much proportion of genes within this cluster are annotated to that species; the best annotated similarity is characterized by: annotating the species in the gene clusters, wherein the average value of annotation accuracy of all genes is used as the optimal annotation similarity of the marker; the direction of enrichment represents the change in relative abundance of each biomarker in asthmatic patients and healthy controls, where a < N represents the relative abundance of the biomarker in asthmatic patients as less than in healthy controls and N < a represents the relative abundance of the biomarker in asthmatic patients as greater than in healthy controls; the screening frequency is represented by: 5-fold 10-fold cross-validation was performed with the frequency at which the biomarker was selected; the validation set AUC represents: representing the discrimination degree of the verification set data under the training set data obtaining model; the 95% confidence interval (95% ci) is between a and b, representing a corresponding 95% probability for each biomarker given, which can be said to be 5% for samples between a and b given.
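The annotation columns of Table 2 can be derived from per-gene annotations. A minimal sketch, assuming each annotated gene has a single (species, identity %) hit and that the best-annotated gene proportion is taken relative to the full cluster size; ties between equally frequent species are resolved arbitrarily, and the function name and the "Other" label in the usage test are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Optional, Tuple


def best_annotation(cluster_size: int,
                    hits: Dict[str, Tuple[str, float]]
                    ) -> Tuple[int, Optional[str], float, float]:
    """Summarise one marker gene cluster in the style of the Table 2 columns.

    hits maps gene ID -> (annotated species, identity %) for annotated genes.
    Returns (number of annotated genes, best species, proportion of cluster
    genes annotated to it, mean identity of those genes).
    """
    if not hits:
        return 0, None, 0.0, 0.0
    counts = Counter(species for species, _ in hits.values())
    best, n_best = counts.most_common(1)[0]
    identities = [ident for species, ident in hits.values() if species == best]
    return len(hits), best, n_best / cluster_size, mean(identities)
```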
As the enrichment-direction column of Table 2 shows, compared with healthy controls, asthmatic patients had a higher relative abundance of Eggerthella lenta DSM 2243 and lower relative abundances of Bacteroides stercoris ATCC 43183 and Sutterella wadsworthensis 3_1_45B.
Table 3 gives strain information for each microorganism.
Table 3. Information on the 3 biomarkers
Table 4 shows the disease probability predicted for the training set by the combination of the 3 biomarkers; a disease probability ≥ 0.5 indicates that an individual has asthma or is at risk of asthma.
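The 0.5 decision rule can be sketched in Python as follows. The helper names are hypothetical; the only rule taken from the text is that a disease probability of at least 0.5 is called as asthma or asthma risk. The confusion counts illustrate how such calls could be scored against the known disease status ('1' asthma, '0' healthy).

```python
def call_asthma_risk(probability, threshold=0.5):
    """True if the predicted disease probability meets the threshold."""
    return probability >= threshold


def confusion_counts(probabilities, labels, threshold=0.5):
    """Tally calls against true status (1 = asthma, 0 = healthy control)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, y in zip(probabilities, labels):
        predicted = call_asthma_risk(p, threshold)
        if predicted and y == 1:
            counts["TP"] += 1  # asthma correctly called
        elif predicted and y == 0:
            counts["FP"] += 1  # healthy control called as asthma
        elif not predicted and y == 0:
            counts["TN"] += 1  # healthy control correctly cleared
        else:
            counts["FN"] += 1  # asthma missed
    return counts
```

Accuracy is then `(TP + TN)` divided by the total number of samples.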
Table 4. Disease probability for the training set predicted by the combination of the 3 biomarkers
1.6.2 Validation of the screened biomarkers using the validation set data
The model was then validated in an independent population, with a disease probability (RP) ≥ 0.5 predicting that an individual has asthma or is at risk of asthma. First, the relative abundance of each biomarker in each validation set sample was calculated as described in 1.5. The validation set data were then evaluated with the random forest model as described in 1.6.1.
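Section 1.5 is not reproduced here, so the following Python sketch only illustrates one common way to derive marker abundances from read alignments; it is an assumption, not the method of 1.5. Each gene's aligned read count is divided by the gene length and normalized so that all genes in a sample sum to 1, and a marker (MGS) is summarized as the mean relative abundance of its member genes. All names are hypothetical.

```python
def gene_relative_abundance(read_counts, gene_lengths):
    """Length-normalized relative abundance of each gene in one sample.

    read_counts: gene_id -> number of reads aligned to the gene
    gene_lengths: gene_id -> gene length in base pairs
    """
    depth = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(depth.values())
    if total == 0:
        return {g: 0.0 for g in depth}  # no reads aligned in this sample
    return {g: d / total for g, d in depth.items()}


def marker_relative_abundance(gene_abundance, marker_genes):
    """Mean relative abundance of the genes that make up one marker (MGS)."""
    values = [gene_abundance.get(g, 0.0) for g in marker_genes]
    return sum(values) / len(values)
```

Dividing by gene length keeps long genes from appearing more abundant simply because they attract more reads.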
Table 5. Relative abundance of intestinal markers (MGS) in the random forest validation set
Based on the model:
Figure 5 shows the receiver operating characteristic (ROC) curve and the area under the curve (AUC) of the random forest model (3 biomarkers) for discriminating asthmatic patients from healthy controls in the validation set. For independent validation set 1 (asthma = 7, healthy controls = 65), the model achieved an AUC of 99.78% (95% CI = 99.17-100%) and an accuracy of 100%.
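As a worked illustration of what the AUC measures, the Python sketch below computes it in the rank-based (Mann-Whitney) form: the proportion of case-control pairs in which the asthmatic sample receives the higher predicted probability, with ties counted as one half. For example, with the 7 cases and 65 controls of validation set 1 there are 7 × 65 = 455 case-control pairs, so each misordered pair lowers the AUC by 1/455 (about 0.22 percentage points). The function name is hypothetical and this is not necessarily how the reported AUC was computed.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a correctly ordered pair (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in case_scores:
        for n in control_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The double loop is quadratic but entirely adequate for cohorts of this size; a rank-sum implementation gives the same value in O(n log n).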
Random forest classification and regression were performed with the randomForest package (version 4.6-12) in R (version 3.2.5). The inputs were the training set data (the relative abundance of the selected MGS markers in the training samples; see Table 1), the disease status of the training samples (a vector in which '1' denotes asthma and '0' a healthy individual), and the validation set (the relative abundance of the selected MGS markers in the validation samples; see Table 5). The inventors then used the randomForest function of the randomForest package in R to build the classification and prediction function, predicted the validation set data, and output the prediction result as a disease probability. The threshold is 0.5: an individual with a disease probability ≥ 0.5 is considered to be at risk of asthma.
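The workflow above (train on Table 1 abundances with a 0/1 status vector, then output vote-based disease probabilities for validation samples) can be illustrated with a deliberately small, dependency-free Python sketch. It is not the randomForest package: each tree here is a depth-1 stump grown on a bootstrap sample from `mtry` randomly chosen features, whereas randomForest grows full trees and uses `ntree = 500` by default. `mtry = 1` mirrors randomForest's classification default of floor(sqrt(p)) for p = 3 markers, and the probability is the fraction of trees voting for asthma, analogous to `predict(..., type = "prob")`. The function names and settings are hypothetical, not the inventors' parameters.

```python
import random


def _majority(labels, default):
    """Majority class (1 = asthma, 0 = healthy); ties go to 1."""
    if not labels:
        return default
    return 1 if 2 * sum(labels) >= len(labels) else 0


def _fit_stump(rows, labels, features):
    """Fit a depth-1 tree: the single split with the fewest training errors."""
    default = _majority(labels, 0)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in cuts:
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = _majority(left, default)
            right_label = _majority(right, default)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=101, mtry=1, seed=0):
    """Grow `n_trees` stumps, each on a bootstrap sample and random features."""
    rng = random.Random(seed)
    n_samples, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap
        features = rng.sample(range(n_features), mtry)
        forest.append(_fit_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], features))
    return forest


def predict_proba(forest, row):
    """Disease probability = fraction of trees voting for asthma (label 1)."""
    votes = sum(left if row[f] <= t else right for f, t, left, right in forest)
    return votes / len(forest)
```

A validation sample would then be called at risk when `predict_proba(forest, row) >= 0.5`, matching the threshold stated in the text.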
Table 6. Probability of having or being at risk of asthma predicted by the random forest model (3 intestinal markers combined, and individual biomarkers) for asthmatic patients and healthy controls
These results show that the disclosed biomarkers have high accuracy and specificity and good prospects for development as a diagnostic method, providing a basis for risk assessment, diagnosis and early diagnosis of asthma and for the search for potential drug targets.
In the description of the present invention, it should be understood that the orientations or positional relationships indicated by the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. are based on those shown in the drawings and are used merely for convenience and simplicity of description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified and limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediary; or an internal communication or interaction between two elements. The specific meaning of these terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or in indirect contact through an intermediate medium. Moreover, a first feature being "above," "over," or "on" a second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.
In the description of this specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," "some examples," etc. means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may join and combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.
While embodiments of the present invention have been shown and described above, it will be understood that they are illustrative and are not to be construed as limiting the invention, and that those of ordinary skill in the art may make changes, modifications, substitutions and variations to them within the scope of the invention.

Claims (14)

1. Use of a reagent for detecting a biomarker in the preparation of a kit for diagnosing whether a subject has asthma or predicting whether a subject is at risk of asthma;
The biomarker is:
Bacteroides stercoris, Eggerthella lenta and Sutterella wadsworthensis.
2. The use according to claim 1, wherein the biomarker is Bacteroides stercoris ATCC 43183, Eggerthella lenta DSM 2243 and Sutterella wadsworthensis 3_1_45B.
3. The use according to claim 1, wherein the diagnosis or prediction comprises the following steps:
1) Collecting a sample from the subject;
2) Determining the relative abundance information of the biomarker in the sample obtained in step 1);
3) Comparing the relative abundance information described in step 2) with a reference dataset or reference value.
4. The use of claim 3, wherein the reference dataset comprises relative abundance information of the biomarker in samples from a plurality of asthmatic patients and a plurality of healthy controls.
5. The use according to claim 3, wherein the step of comparing the relative abundance information obtained in step 2) with a reference dataset further comprises applying a multivariate statistical model to obtain a probability of illness.
6. The use according to claim 5, wherein the multivariate statistical model is a random forest model.
7. The use of claim 5, wherein the probability of illness being greater than a threshold value indicates that the subject has or is at risk of having asthma.
8. The use according to claim 7, wherein the threshold value is 0.5.
9. The use according to claim 3, wherein, compared with a reference value, a decrease in Bacteroides stercoris and Sutterella wadsworthensis indicates that the subject has or is at risk of asthma, and an increase in Eggerthella lenta indicates that the subject has or is at risk of asthma.
10. The use according to claim 3, wherein the relative abundance information of the biomarker in step 2) is obtained by a sequencing method comprising:
isolating a nucleic acid sample from said sample of said subject,
constructing a DNA library based on the obtained nucleic acid sample, sequencing the DNA library to obtain a sequencing result,
and aligning the sequencing result to a reference gene set to determine the relative abundance of the biomarker.
11. The use according to claim 10, wherein the reference gene set is obtained by performing metagenomic sequencing on samples from a plurality of asthmatic patients and a plurality of healthy controls to obtain a non-redundant gene set, and then combining the non-redundant gene set with an intestinal microbial gene set.
12. The use according to claim 10, wherein the sample is a fecal sample.
13. The use according to claim 10, wherein the sequencing method is performed by a second generation sequencing method or a third generation sequencing method.
14. The use according to claim 10, wherein the sequencing method is performed by at least one selected from the group consisting of HiSeq 2000, SOLiD, 454 and single-molecule sequencing devices.
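The reference-value comparison of claim 9 can be pictured with the short Python sketch below, which flags, for each of the three taxa, whether a sample deviates from a reference value in the direction associated with asthma (lower Bacteroides stercoris and Sutterella wadsworthensis, higher Eggerthella lenta). The claim does not specify the reference values or how the three flags are combined, so the inputs and the function name are hypothetical.

```python
# Direction of change in asthma relative to a reference value, per claim 9.
EXPECTED_CHANGE = {
    "Bacteroides stercoris": "decrease",
    "Sutterella wadsworthensis": "decrease",
    "Eggerthella lenta": "increase",
}


def direction_flags(sample, reference):
    """For each taxon, True if the sample deviates in the asthma direction.

    sample, reference: taxon name -> relative abundance (hypothetical inputs).
    """
    flags = {}
    for taxon, change in EXPECTED_CHANGE.items():
        if change == "decrease":
            flags[taxon] = sample[taxon] < reference[taxon]
        else:
            flags[taxon] = sample[taxon] > reference[taxon]
    return flags
```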
CN201810371588.XA 2018-04-24 2018-04-24 Asthma biomarker and application thereof Active CN110396537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810371588.XA CN110396537B (en) 2018-04-24 2018-04-24 Asthma biomarker and application thereof


Publications (2)

Publication Number Publication Date
CN110396537A CN110396537A (en) 2019-11-01
CN110396537B true CN110396537B (en) 2023-06-20

Family

ID=68320264


Country Status (1)

Country Link
CN (1) CN110396537B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016049920A1 (en) * 2014-09-30 2016-04-07 Bgi Shenzhen Co., Limited Biomarkers for coronary artery disease
CN107541544A (en) * 2016-06-27 2018-01-05 卡尤迪生物科技(北京)有限公司 Methods, systems, kits, uses and compositions for determining a microbial profile
CN107849569A (en) * 2015-11-05 2018-03-27 深圳华大生命科学研究院 Adenocarcinoma of lung biomarker and its application

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9050276B2 (en) * 2009-06-16 2015-06-09 The Trustees Of Columbia University In The City Of New York Autism-associated biomarkers and uses thereof
US20150211053A1 (en) * 2012-08-01 2015-07-30 Bgi-Shenzhen Biomarkers for diabetes and usages thereof
GB201505364D0 (en) * 2015-03-27 2015-05-13 Genetic Analysis As Method for determining gastrointestinal tract dysbiosis




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant