CN111128378B - Prediction method for evaluating infant intestinal flora development age - Google Patents


Info

Publication number
CN111128378B
Authority
CN
China
Legal status
Active
Application number
CN201911278021.9A
Other languages
Chinese (zh)
Other versions
CN111128378A (en)
Inventor
杨恒文
谭宇翔
钟竞辉
尹芝南
Current Assignee
Jinan University
Original Assignee
Jinan University
Priority date
2019-12-12
Filing date
2019-12-12
Publication date
2023-08-25
Application filed by Jinan University
Priority to CN201911278021.9A
Publication of CN111128378A
Application granted
Publication of CN111128378B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a prediction method for evaluating the developmental age of infant intestinal flora. The method comprises the following steps: acquiring infant intestinal flora data; constructing a prediction model (a classification model) from the intestinal flora data by combining linear discriminant analysis with random forests; inputting a sample to be tested into the prediction model and outputting its predicted class as the prediction result; obtaining the intestinal flora developmental age range of the sample from the prediction result; and comparing this flora developmental age with the sample's actual age to judge whether the infant's intestinal flora is disordered or shows a developmental deviation. Because the prediction model combines linear discriminant analysis with a random forest, prediction accuracy is greatly improved. The model predicts the age corresponding to the flora, and comparing the predicted age with the actual age indicates whether the flora is developing abnormally (dysplasia).

Description

Prediction method for evaluating infant intestinal flora development age
Technical Field
The invention relates to the field of intestinal flora prediction, and in particular to a prediction method for evaluating the developmental age of infant intestinal flora.
Background
In the prior art, methods for detecting human intestinal microorganisms are relatively few. For example, patent CN109448842A does not use linear discriminant analysis, is not specific to infants, and does not make judgements on newly added individuals; it mainly estimates whether a person's intestinal microorganisms are imbalanced, does not predict age or use age as a reference, and its prediction accuracy is below 70%. Patent CN108345768A predicts the maturity of the flora rather than age, and its prediction accuracy is also relatively low. An imbalance of the intestinal microorganisms is a consequence of sub-health and may in turn aggravate sub-health and lead to disease. The intestinal microecosystem is the most important and largest ecosystem of the body. The vast number of microorganisms in the intestinal tract are in dynamic equilibrium and remain relatively stable, but many factors influence this balance. The onset, progression and treatment of sub-health are accompanied by changes in, or imbalance of, the normal intestinal flora, which in turn affect the growth and development of infants. However, to date there is no good method for predicting the age of the infant intestinal flora.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a prediction method for evaluating the developmental age of infant intestinal flora, which establishes a prediction model and judges whether the intestinal flora is dysregulated by predicting the age of the flora.
The aim of the invention is achieved by the following technical scheme:
A predictive method for assessing the developmental age of infant gut flora, comprising the following steps:
acquiring intestinal flora data of infants as raw data and storing the raw data in a reference data set of a database;
preprocessing the intestinal flora data by linear discriminant analysis to obtain classification data, and constructing a prediction model by random forest training;
inputting a sample to be tested into the prediction model to obtain a prediction result, and obtaining the intestinal flora developmental age bracket of the sample from the prediction result;
and comparing the intestinal flora developmental age obtained for the sample with its actual age, and judging whether the infant's intestinal tract is disordered or shows a developmental deviation.
Further, the infant intestinal flora data are acquired as follows: samples are sequenced and analyzed using 16S amplicon sequencing; stool samples from healthy infants aged 1-48 months are collected for testing, and the infants' condition is observed and recorded in the reference data set of the database.
Further, the intestinal flora data are labeled 525-dimensional 10-class data, where 525-dimensional means that the flora structure consists of 525 taxonomic units; the 10 classes comprise 8 classes spanning 1-48 months and two classes of young adults and of middle-aged and elderly adults.
Further, the construction of the prediction model is specifically as follows:
using the intestinal flora data and the corresponding sampling-age information, the labeled 525-dimensional 10-class data are preprocessed by linear discriminant analysis, i.e. reduced in dimension to obtain low-dimensional data; the low-dimensional data are then divided into training data and test data for the random forest, the number of base classifiers is set to K, and the prediction model is obtained by training.
Further, the ratio of training data to test data is 7:3, and the number K of base classifiers is greater than 100.
Further, the prediction result is obtained specifically as follows:
determining the importance of each original feature of the original data set, i.e. the feature importance of each original taxon: performing a permutation (shuffling) operation separately on each new feature obtained by the linear discriminant analysis transformation to obtain permuted features, classifying the permuted data again with the random forest, and judging the importance of each permuted feature from the difference between the accuracy of the model obtained each time and the accuracy of the original model, to obtain its permutation importance;
calculating the Pearson correlation coefficient between each original feature and each new feature to determine their correlation, and using the absolute value of this coefficient as a weight, the feature importance of each original feature is calculated as follows:
$$F_i = \sum_{j} \lvert p_{i,j} \rvert \, f_j$$
where $F_i$ is the feature importance of the $i$-th original taxon, $p_{i,j}$ is the Pearson correlation coefficient between the $i$-th original taxon and the $j$-th new feature, and $f_j$ is the permutation importance of the $j$-th new feature.
Further, the flora developmental age of the sample to be tested is compared with its actual age to judge whether the infant's intestinal tract is disordered or shows a developmental deviation, specifically as follows:
if the deviation between the predicted age range of the test individual and the individual's actual age is less than N months, the individual is judged normal; if the deviation is greater than N months, the flora is judged to show developmental abnormality (dysplasia), and an intervention plan is further formulated according to the actual situation.
Further, N is 12.
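The comparison rule can be sketched as a small function. This is an illustrative sketch, not the patent's implementation: the names `age_deviation` and `assess` are hypothetical, the predicted bracket is assumed to be a (low, high) pair in months, and the deviation is assumed to be the distance from the actual age to the nearest edge of that bracket. The text does not say what happens when the deviation is exactly N months; the sketch flags that case.

```python
# Hypothetical sketch of the N-month deviation rule (N = 12 in the embodiment).
# Assumptions not stated in the patent: the predicted age bracket is a
# (low, high) pair in months, and the deviation is the distance from the
# actual age to the nearest edge of that bracket (0 if the age lies inside it).


def age_deviation(predicted_range, actual_age_months):
    """Return how many months the actual age lies outside the predicted bracket."""
    low, high = predicted_range
    if actual_age_months < low:
        return low - actual_age_months
    if actual_age_months > high:
        return actual_age_months - high
    return 0


def assess(predicted_range, actual_age_months, n_months=12):
    """Classify a sample as 'normal' or 'possible dysplasia'.

    The source only covers deviation < N (normal) and > N (dysplasia);
    this sketch treats a deviation of exactly N as flagged.
    """
    deviation = age_deviation(predicted_range, actual_age_months)
    return "normal" if deviation < n_months else "possible dysplasia"


# Flora predicted as 6-12 months for a 14-month-old: 2 months off.
print(assess((6, 12), 14))  # -> normal
# Flora predicted as 1-6 months for a 36-month-old: 30 months off.
print(assess((1, 6), 36))  # -> possible dysplasia
```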
Compared with the prior art, the invention has the following advantages and beneficial effects:
According to the invention, the data set is established from data collected by amplicon sequencing, and the prediction model is built with linear discriminant analysis and a random forest. The method supports discrimination among multiple age groups, covers a wide range and improves prediction accuracy. By monitoring the development of the infant intestinal flora, a series of subsequent immune, metabolic, nervous-system and other problems caused by flora disorder can be prevented in advance, which is of great significance for healthy childbearing and child-rearing.
Drawings
FIG. 1 is a flow chart of a prediction method for assessing the development age of intestinal flora of infants according to the invention.
FIG. 2 is a schematic diagram of prediction accuracy according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
A predictive method for assessing infant gut flora development age, as shown in figure 1, comprising the steps of:
acquiring intestinal flora data of infants;
Because the composition of the intestinal microorganisms in feces changes continuously and is affected by many short-term factors (e.g., antibiotic use, probiotic intake, disease state), a baseline data set covering the developmental age span of healthy infants was established by collecting feces from healthy infants aged 1, 6, 12, 18, 24, 30, 36 and 48 months. At the time of sample collection, the infants had no intestinal condition (such as constipation or diarrhea) and no immune-activating illness (such as a cold or fever), and had taken no antibiotics, probiotics or prebiotic preparations within the previous month. During sample collection, the collection staff observe and record the infant's condition. The feces are placed into three collection tubes, preserved on dry ice, quickly returned to the laboratory and placed in a -80 °C freezer. Tubes containing a room-temperature preservation fluid can be kept at room temperature for 2 weeks; empty tubes must be kept on dry ice or in another low-temperature environment for no more than 24 hours and transferred to a low-temperature freezer, or subjected to DNA extraction, as soon as possible. DNA is extracted from the sample and the sample is prepared; the prepared sample is then subjected to amplicon sequencing on a sequencer to obtain amplicon sequencing results, and the amplicon sequencing data are analyzed to obtain the intestinal flora data.
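The inclusion and sample-handling rules above can be expressed as a checklist. The sketch below is a hypothetical encoding, not part of the patent: the `StoolSample` fields and the `eligible_for_baseline` name are invented for illustration, and "within one month" is approximated as 30 days.

```python
from dataclasses import dataclass

# Ages (months) sampled for the healthy-infant baseline in the embodiment.
BASELINE_AGES_MONTHS = {1, 6, 12, 18, 24, 30, 36, 48}


@dataclass
class StoolSample:
    # All field names are hypothetical; the patent does not define a schema.
    age_months: int
    gut_condition: bool              # e.g. constipation or diarrhea at collection
    immune_activation: bool          # e.g. cold or fever at collection
    days_since_antibiotics: float    # use float("inf") if never taken
    days_since_pro_prebiotics: float
    tube_type: str                   # "preservative" or "empty"
    hours_before_storage: float      # time until -80 °C storage or DNA extraction


def eligible_for_baseline(s: StoolSample) -> bool:
    """Apply the inclusion and handling rules described in the text."""
    if s.age_months not in BASELINE_AGES_MONTHS:
        return False
    if s.gut_condition or s.immune_activation:
        return False
    # "Within one month" is approximated here as 30 days (an assumption).
    if s.days_since_antibiotics < 30 or s.days_since_pro_prebiotics < 30:
        return False
    if s.tube_type == "preservative":
        # Tube with room-temperature preservation fluid: up to 2 weeks.
        return s.hours_before_storage <= 14 * 24
    if s.tube_type == "empty":
        # Empty tube on dry ice or at low temperature: at most 24 hours.
        return s.hours_before_storage <= 24
    return False


sample = StoolSample(12, False, False, float("inf"), 60, "empty", 6)
print(eligible_for_baseline(sample))  # -> True
```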
The concentration of the DNA extracted from each sample was checked with a Qubit instrument, and its quality was examined by agarose gel electrophoresis. The V4 region of the 16S rRNA gene was selected for amplicon sequencing (forward primer 515F: 5'-GTGCCAGCMGCCGCGGTAA-3'; reverse primer 806R: 5'-GGACTACHVGGGTWTCTAAT-3'). Each primer carries an Illumina linker sequence at its 3' end and a 12 bp sample identification (barcode) sequence, and sequencing was performed on the Illumina MiSeq and HiSeq 2500 platforms.
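Both primers contain IUPAC degenerate bases (M = A/C in 515F; H = A/C/T, V = A/C/G and W = A/T in 806R), so each one is really a small pool of sequences. The following illustrative snippet, not part of the patented method, expands them; it ignores the barcode and Illumina linker portions, and `expand_degenerate` is a hypothetical helper name.

```python
from itertools import product

# IUPAC nucleotide codes -> the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"


def expand_degenerate(primer):
    """List every concrete sequence a degenerate primer can match."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


for name, seq in [("515F", PRIMER_515F), ("806R", PRIMER_806R)]:
    variants = expand_degenerate(seq)
    print(name, len(seq), "nt,", len(variants), "variants")
# -> 515F 19 nt, 2 variants
# -> 806R 20 nt, 18 variants
```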
The raw data from the sequencer were split into per-sample data sets according to the sample identification sequences. Paired-end reads were merged and low-quality fragments removed using FLASH, and chimeras were removed using the USEARCH method and the Greengenes database to improve data purity. Finally, the overall flora structure was analyzed with the QIIME toolkit.
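The text does not state how the QIIME output is turned into the 525-dimensional feature vectors. One common option, assumed here purely for illustration, is to convert each sample's per-taxon read counts into relative abundances over a fixed taxon list, filling absent taxa with zero. The function names and taxon labels below are hypothetical.

```python
def to_feature_vector(taxon_counts, taxa_order):
    """Convert one sample's taxon read counts into a relative-abundance vector.

    taxon_counts: dict mapping taxon name -> read count for one sample.
    taxa_order:   fixed list of taxa defining the feature columns (525 in the text).
    Taxa absent from the sample get 0.0.
    """
    total = sum(taxon_counts.values())
    if total == 0:
        return [0.0] * len(taxa_order)
    return [taxon_counts.get(taxon, 0) / total for taxon in taxa_order]


def build_matrix(samples, taxa_order):
    """Stack per-sample vectors into a samples x taxa matrix (list of lists)."""
    return [to_feature_vector(counts, taxa_order) for counts in samples]


# Toy example with three hypothetical taxa instead of 525.
taxa = ["g__Bifidobacterium", "g__Bacteroides", "g__Faecalibacterium"]
samples = [
    {"g__Bifidobacterium": 6, "g__Bacteroides": 2},
    {"g__Bacteroides": 5, "g__Faecalibacterium": 5},
]
matrix = build_matrix(samples, taxa)
print(matrix)  # -> [[0.75, 0.25, 0.0], [0.0, 0.5, 0.5]]
```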
Constructing a prediction model by linear discriminant analysis and random forests on the basis of the intestinal flora data;
The data to be processed, MINdePTH-L7, are labeled 525-dimensional 10-class data. Linear discriminant analysis (LDA) is used as supervised preprocessing of the data, and a prediction model capable of multi-class classification is then trained with the random forest (Random Forest) multi-class method.
To convert the high-dimensional data set into a form that is easier to process, linear discriminant analysis (LDA) is used to reduce the given data, in a supervised manner, from 525 dimensions to 9. This yields a more efficient data representation and facilitates further training and prediction by a machine learning model.
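The choice of 9 dimensions follows from a standard property of LDA rather than from anything specific to this patent. With $C$ classes, $n_c$ samples in class $c$ (the set $\mathcal{D}_c$), class means $\mu_c$, overall mean $\mu$ and $p$ input features, the textbook formulation is:

```latex
S_B = \sum_{c=1}^{C} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
S_W = \sum_{c=1}^{C} \sum_{x \in \mathcal{D}_c} (x - \mu_c)(x - \mu_c)^{\top}

S_B\, w = \lambda\, S_W\, w,
\qquad
\operatorname{rank}(S_B) \le C - 1

d_{\max} = \min(p,\; C - 1) = \min(525,\; 10 - 1) = 9
```

Because the weighted deviations $n_c(\mu_c - \mu)$ sum to zero, only $C - 1$ of them are linearly independent, which gives the rank bound. Since the data set has far fewer samples than 525 taxa (197 are listed for FIG. 2), $S_W$, whose rank is at most $n - C$, is singular here, so a practical implementation would need shrinkage or an SVD-based solver; the patent does not say which was used.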
To complete training and prediction, a random forest was adopted, a relatively lightweight classification method that handles missing values (Missing values) conveniently. Given the sparsity of the training data, the data were split into 70% training data and 30% test data, and the number of base classifiers was set to 200, yielding the final prediction model.
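The 70/30 split can be sketched in plain Python. Stratifying by age class is an assumption (the text only gives the ratio); it keeps small classes, such as the 30-month group, represented in both sets. The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test sets, preserving class proportions.

    Returns two sorted index lists. Per-class train counts are rounded, so the
    overall ratio is approximately, not exactly, train_frac.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(train_frac * len(idxs))
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


# Toy example: 10 samples in each of two age classes.
labels = ["1 month"] * 10 + ["6 months"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # -> 14 6
```

The random forest would then be trained on the training indices only; in scikit-learn, for example, that could be `RandomForestClassifier(n_estimators=200)`, although the patent does not name a library.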
Inputting a sample to be detected into a prediction model for prediction, and outputting classification data to obtain a prediction result; obtaining the intestinal flora development age range of the sample to be detected according to the prediction result;
to further determine, from the classification results, the importance of each feature (taxon) of the original data set (Feature Importance), we performed a permutation (shuffling) operation separately on each of the 9 new features obtained by the LDA transformation to obtain permuted features, re-classified the data obtained from each permutation with the random forest, and judged the importance of each permuted feature from the difference between the accuracy of the classification model obtained each time and that of the original model; this is called permutation importance (Permutation Importance).
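Permutation importance on the 9 LDA features might look like the sketch below. It assumes the standard variant, in which the already-trained forest re-predicts the shuffled data; the text's "re-classify" could also mean retraining the forest for each permutation. `predict` stands in for any trained classifier's prediction function, and the toy model is hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    predictions = predict(X)
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    predict: callable mapping a list of feature rows to a list of labels
             (e.g. a trained random forest's prediction function).
    X:       list of feature rows (here, the 9 LDA features per sample).
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j with its shuffled version; other columns stay intact.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only looks at feature 0; feature 1 is a constant column.
def toy_predict(rows):
    return [1 if row[0] > 0 else 0 for row in rows]


X = [[x, 5.0] for x in range(-10, 10)]
y = [1 if x > 0 else 0 for x in range(-10, 10)]
print(permutation_importance(toy_predict, X, y))
```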
Based on the permutation importance of the 9 new features, we then compute the importance of the 525 original features (taxa). First, the Pearson correlation coefficient between each original feature and each of the 9 new features is calculated to determine their correlation; the feature importance of each original feature is then computed using the absolute values of these correlation coefficients as weights, as follows:
$$F_i = \sum_{j=1}^{9} \lvert p_{i,j} \rvert \, f_j$$
where $F_i$ is the feature importance of the $i$-th original taxon, $p_{i,j}$ is the Pearson correlation coefficient between the $i$-th original taxon and the $j$-th new feature, and $f_j$ is the permutation importance of the $j$-th new feature;
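Reading the importance of each original taxon as the correlation-weighted sum $F_i = \sum_j \lvert p_{i,j} \rvert f_j$, the mapping from LDA-feature importances back to the original taxa can be sketched as follows. Treating a constant taxon's correlation as 0 is an added assumption, since Pearson's coefficient is undefined for zero variance; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an undefined correlation contributes no weight
    return sxy / math.sqrt(sxx * syy)


def original_feature_importance(X_orig, Z_lda, perm_importance):
    """F_i = sum_j |p_ij| * f_j for every original feature i.

    X_orig:          n x p matrix of original taxon abundances (p = 525 in the text).
    Z_lda:           n x m matrix of LDA-transformed features (m = 9 in the text).
    perm_importance: length-m list of permutation importances f_j.
    """
    p, m = len(X_orig[0]), len(Z_lda[0])
    x_cols = [[row[i] for row in X_orig] for i in range(p)]
    z_cols = [[row[j] for row in Z_lda] for j in range(m)]
    return [
        sum(abs(pearson(x_cols[i], z_cols[j])) * perm_importance[j] for j in range(m))
        for i in range(p)
    ]


# Toy data: taxon 0 tracks the single LDA feature, taxon 1 is its mirror
# image (correlation -1), and taxon 2 never changes.
X = [[1, -1, 7], [2, -2, 7], [3, -3, 7], [4, -4, 7]]
Z = [[1], [2], [3], [4]]
print(original_feature_importance(X, Z, [0.5]))  # -> [0.5, 0.5, 0.0]
```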
and comparing the intestinal flora developmental age obtained for the sample to be tested with its actual age, and judging whether the infant's intestinal tract is disordered or shows a developmental deviation. If the deviation between the predicted age range of the test individual and the individual's actual age is less than 12 months, the individual is normal; if the deviation is greater than 12 months, the flora is dysregulated, and an intervention plan is further formulated according to the actual situation.
The prediction results are shown in FIG. 2, in which the classes are:
A: 1 month (22 people); B: 6 months (34 people); C: 12 months (30 people); D: 18 months (20 people);
E: 24 months (18 people); F: 30 months (9 people); G: 36 months (13 people); H: 48 months (16 people);
adults aged 36-51 years (13 people); Y: young adults aged 20-27 years (22 people).
Total: 197 people.
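As a quick consistency check, the per-class counts in the FIG. 2 legend sum to the stated total of 197 and give the 10 classes described earlier. The dictionary below simply transcribes the legend; the variable names are illustrative.

```python
# Sample counts per class as listed in the FIG. 2 legend.
SAMPLES_PER_CLASS = {
    "1 month": 22,
    "6 months": 34,
    "12 months": 30,
    "18 months": 20,
    "24 months": 18,
    "30 months": 9,
    "36 months": 13,
    "48 months": 16,
    "adults 36-51 years": 13,
    "young adults 20-27 years": 22,
}

total = sum(SAMPLES_PER_CLASS.values())
infant_total = sum(n for label, n in SAMPLES_PER_CLASS.items() if "month" in label)
print(len(SAMPLES_PER_CLASS), "classes,", total, "samples,", infant_total, "infant samples")
# -> 10 classes, 197 samples, 162 infant samples
```

The classes are also unbalanced (from 9 to 34 samples), which is one reason a per-class (stratified) split can be worth considering.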
The above examples are preferred embodiments of the present invention, but embodiments of the invention are not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the invention is an equivalent replacement and is included within the protection scope of the invention.

Claims (2)

1. A predictive method for assessing infant gut flora development age comprising the steps of:
acquiring intestinal flora data of infants as raw data and storing the raw data in a reference data set of a database;
the infant intestinal flora data are acquired specifically as follows: samples are sequenced and analyzed using 16S amplicon sequencing; stool samples from healthy infants aged 1-48 months are collected for testing, and the infants' condition is observed and recorded in the reference data set of the database;
the intestinal flora data are labeled 525-dimensional 10-class data, where 525-dimensional means that the flora structure consists of 525 taxonomic units; the 10 classes comprise 8 classes spanning 1-48 months and two classes of young adults and of middle-aged and elderly adults;
preprocessing by linear discriminant analysis based on the intestinal flora data to obtain classification data, and constructing a prediction model by random forest training;
the construction of the prediction model is specifically as follows:
preprocessing the labeled 525-dimensional 10-class data by linear discriminant analysis, based on the intestinal flora data and the corresponding sampling-age information, i.e. reducing its dimension to obtain low-dimensional data; dividing the low-dimensional data into training data and test data for the random forest, setting the number of base classifiers to K, and training to obtain the prediction model;
inputting a sample to be detected into a prediction model for prediction to obtain a prediction result, and obtaining an intestinal flora development age bracket of the sample to be detected according to the prediction result;
the prediction result is obtained specifically as follows:
determining the importance of each original feature of the original data set, i.e. the feature importance of each original taxon: performing a permutation (shuffling) operation separately on each new feature obtained by the linear discriminant analysis transformation to obtain permuted features, classifying the permuted data again with the random forest, and judging the importance of each permuted feature from the difference between the accuracy of the model obtained each time and the accuracy of the original model, to obtain its permutation importance;
calculating the Pearson correlation coefficient between each original feature and each new feature to determine their correlation, and using the absolute value of this coefficient as a weight, the feature importance of each original feature being calculated as:
$$F_i = \sum_{j} \lvert p_{i,j} \rvert \, f_j$$
where $F_i$ is the feature importance of the $i$-th original taxon, $p_{i,j}$ is the Pearson correlation coefficient between the $i$-th original taxon and the $j$-th new feature, and $f_j$ is the permutation importance of the $j$-th new feature;
and comparing the development age of the intestinal flora of the obtained sample to be detected with the actual age, and judging whether the intestinal tract of the infant is disordered or has development deviation.
2. The predictive method for assessing the developmental age of infant intestinal flora according to claim 1, wherein the ratio of training data to test data is 7:3 and the number K of base classifiers is greater than 100.
CN201911278021.9A 2019-12-12 2019-12-12 Prediction method for evaluating infant intestinal flora development age Active CN111128378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911278021.9A CN111128378B (en) 2019-12-12 2019-12-12 Prediction method for evaluating infant intestinal flora development age

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911278021.9A CN111128378B (en) 2019-12-12 2019-12-12 Prediction method for evaluating infant intestinal flora development age

Publications (2)

Publication Number Publication Date
CN111128378A (en) 2020-05-08
CN111128378B (en) 2023-08-25

Family

ID=70498577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911278021.9A Active CN111128378B (en) 2019-12-12 2019-12-12 Prediction method for evaluating infant intestinal flora development age

Country Status (1)

Country Link
CN (1) CN111128378B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712856A (en) * 2020-12-25 2021-04-27 北京群峰纳源健康科技有限公司 Method for analyzing dietary structure based on intestinal flora

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2009058915A1 (en) * 2007-10-29 2009-05-07 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)
CN104851346A (en) * 2015-04-30 2015-08-19 暨南大学 Modular animal digestive tract in-vitro simulation system and human intestinal tract simulation method thereof
CN108345768A (en) * 2017-01-20 2018-07-31 深圳华大生命科学研究院 Method for determining the maturity of infant intestinal flora and marker combination

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
EP3140424B1 (en) * 2014-05-06 2020-04-29 IS Diagnostics LTD Microbial population analysis
US11001900B2 (en) * 2015-06-30 2021-05-11 Psomagen, Inc. Method and system for characterization for female reproductive system-related conditions associated with microorganisms

Non-Patent Citations (1)

Title
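臧凯丽 et al. Microecological preparations regulating constipation and diarrhea in humans ... correlation with key short-chain fatty acid genera. Food Science (食品科学), 2018, Vol. 39, No. 05, pp. 155-165. *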

Also Published As

Publication number Publication date
CN111128378A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN109706235A (en) Detection and analysis method and system for intestinal microflora
CN104603283B (en) Determine the method and system of abnormality associated biomarkers
Robinson et al. Intricacies of assessing the human microbiome in epidemiologic studies
CN110892081A (en) Method for diagnosing dysbacteriosis
CN108345768B (en) Method for determining maturity of intestinal flora of infants and marker combination
CN110097928B (en) Prediction method and prediction model for predicting tissue trace element content based on intestinal flora
Kudirkiene et al. Rapid and accurate identification of Streptococcus equi subspecies by MALDI-TOF MS
CN113186310B (en) Method for predicting healthy aging through relative abundance of intestinal flora
CN112852916A (en) Marker combination for intestinal microecology, auxiliary diagnosis model and application of marker combination
CN111128378B (en) Prediction method for evaluating infant intestinal flora development age
CN111206079A (en) Death time inference method based on microbiome sequencing data and machine learning algorithm
CN112435756A (en) Intestinal flora associated disease risk prediction system based on mutual evidence of multiple data set differences
CN114582429B (en) Mycobacterium tuberculosis drug resistance prediction method and device based on hierarchical attention neural network
CN115896242A (en) Intelligent cancer screening model and method based on peripheral blood immune characteristics
CN110827917A (en) Method for identifying individual intestinal flora type based on SNP
CN110734989A (en) medicinal plant symbiotic microorganism identification method and application thereof
CN114023386A (en) Metagenome data analysis and characteristic bacteria screening method
CN110111841B (en) Method for constructing identification model of atherosclerosis
CN116590381A (en) Method for screening key water quality factors influencing river biodiversity by reclaimed water supplementing
CN112908414A (en) Large-scale single cell typing method, system and storage medium
CN115527608A (en) Intestinal age prediction method and system
Tang et al. Mixed effect Dirichlet-tree Multinomial for longitudinal microbiome data and weight prediction
Paulson Normalization and differential abundance analysis of metagenomic biomarker-gene surveys
CN111261222A (en) Construction method and application of oral microbial community detection model
CN114717342B (en) Intestinal microbial gene marker for predicting neutralizing antibody level of new coronal pneumonia patient after one year and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant