CN109448842B - Method, apparatus and electronic device for determining human intestinal dysbiosis - Google Patents

Method, apparatus and electronic device for determining human intestinal dysbiosis

Info

Publication number
CN109448842B
Authority
CN
China
Prior art keywords
sample
random forest
model
intestinal tract
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811357592.7A
Other languages
Chinese (zh)
Other versions
CN109448842A (en
Inventor
朱永亮
陆敏
穆延召
陈倩
张水龙
史文阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Prison Gene Technology Co., Ltd.
Original Assignee
Suzhou Puruisen Gene Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Puruisen Gene Technology Co Ltd
Priority to CN201811357592.7A
Publication of CN109448842A
Application granted
Publication of CN109448842B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

The present invention provides a method, an apparatus and an electronic device for determining human intestinal dysbiosis, and relates to the technical field of microecology. The method includes: obtaining intestinal flora data of a target human body, where the intestinal flora data is a single sample containing multiple species markers; inputting the intestinal flora data into an intestinal random forest prediction model for prediction to obtain an output result, where the intestinal random forest prediction model has a corresponding threshold for judging the sample phenotype and includes a random forest OTU model or a Shannon index random forest model; and judging, based on the output result, whether the intestinal microecology of the target human body is in an imbalanced state. The invention can thus use a prediction model to judge whether the target human body is in a state of intestinal microecological imbalance.

Description

Method, apparatus and electronic device for determining human intestinal dysbiosis
Technical field
The present invention relates to the technical field of microecology, and in particular to a method, an apparatus and an electronic device for determining human intestinal dysbiosis.
Background
Unhealthy eating habits, environmental pollution, and the improper use of pesticides, chemical fertilizers, antibiotics and hormones ultimately affect human health. Some of these effects may accumulate gradually: they do not cause disease immediately, but they can lead to microecological imbalance in the human body, which becomes an important contributor to sub-health. Microecology is an emerging biological discipline developed over the past thirty to forty years; it studies the relationships among a host (human, animal or plant), the microbial communities inside the host, and the internal and external environments. It examines the relationship between human health and these environments from an ecological perspective, and reveals how the environments affect the human body. Microecological imbalance is an important sign of sub-health: its appearance indicates that the body is about to become ill or is already ill. Correcting microecological imbalance, in other words correcting sub-health, is not a problem that medical departments alone can solve for so large a population; it must be achieved by publicizing and popularizing knowledge of public health and nutrition.
Almost every cause of sub-health can also disturb the balance of the intestinal microecology. Intestinal microecological imbalance is both a result of sub-health and a factor that may aggravate it and lead to disease. The intestinal microecology is the body's most important, largest and most distinctive ecosystem. The large number of microorganisms in the intestine is constantly in a dynamic yet relatively stable equilibrium, and many factors influence this balance. The onset, development and outcome of sub-health are accompanied by changes in, or imbalance of, the normal intestinal flora. To date, however, there has been no good method for judging human intestinal dysbiosis.
Summary of the invention
In view of this, the purpose of the present invention is to provide a method for determining human intestinal dysbiosis.
In a first aspect, an embodiment of the present invention provides a method for determining human intestinal dysbiosis, including: obtaining intestinal flora data of a target human body, where the intestinal flora data is a single sample containing multiple species markers; inputting the intestinal flora data into an intestinal random forest prediction model for prediction to obtain an output result, where the intestinal random forest prediction model has a corresponding threshold for judging the sample phenotype and includes a random forest OTU model or a Shannon index random forest model; and judging, based on the output result, whether the intestinal microecology of the target human body is in an imbalanced state.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which there are multiple intestinal random forest prediction models. Inputting the intestinal flora data into the intestinal random forest prediction model for prediction to obtain an output result includes: inputting the intestinal flora data into each intestinal random forest prediction model for prediction, to obtain multiple output results. Judging, based on the multiple output results, whether the intestinal microecology of the target human body is in an imbalanced state includes: determining the prediction result of each intestinal random forest prediction model based on its output result, to obtain multiple prediction results; and determining whether the intestinal microecology of the target human body is in an imbalanced state based on the proportion of target prediction results among the multiple prediction results, where a target prediction result is a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which determining whether the intestinal microecology of the target human body is in an imbalanced state based on the proportion of target prediction results among the multiple prediction results includes: if the proportion of target prediction results among the multiple prediction results is greater than a first preset proportion threshold, determining that the intestinal microecology of the target human body is in an imbalanced state; if the proportion is greater than a second preset proportion threshold and less than the first preset proportion threshold, determining whether the intestinal microecology of the target human body is in an imbalanced state based on the prediction result of a target intestinal random forest prediction model among the multiple intestinal random forest prediction models. The target intestinal random forest prediction model is the model with the largest threshold among the multiple intestinal random forest prediction models, the second preset proportion threshold is less than the first preset proportion threshold, and each intestinal random forest prediction model corresponds to one threshold.
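The two-threshold decision rule described above can be sketched roughly as follows. This is only an illustration: the function name, the default ratio values 0.6 and 0.4, and the `None` return for cases this passage does not cover are all assumptions, not values taken from the patent.

```python
from typing import List, Optional


def decide_imbalance(predictions: List[bool],
                     model_thresholds: List[float],
                     first_ratio: float = 0.6,    # assumed value, not from the source
                     second_ratio: float = 0.4    # assumed value, not from the source
                     ) -> Optional[bool]:
    """Combine per-model predictions (True = imbalanced) into one decision.

    Returns None when the proportion falls outside the two cases that
    the passage describes (hypothetical handling).
    """
    if not predictions or len(predictions) != len(model_thresholds):
        raise ValueError("need one prediction and one threshold per model")

    # Proportion of models that predict an imbalanced state.
    ratio = sum(predictions) / len(predictions)

    if ratio > first_ratio:
        return True

    if second_ratio < ratio < first_ratio:
        # Defer to the model whose phenotype threshold is the largest.
        target = max(range(len(model_thresholds)),
                     key=lambda i: model_thresholds[i])
        return predictions[target]

    return None  # case not specified in this passage
```

For example, with four models of which three predict imbalance, the proportion 0.75 exceeds the assumed first threshold and the result is `True`.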
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which determining the prediction result of each intestinal random forest prediction model based on its output result includes: inputting the intestinal flora data into each intestinal random forest prediction model Ai to obtain an output result Bi, where i takes the values 1 to I in turn and I is the number of intestinal random forest prediction models; if the output result Bi is greater than or equal to the threshold corresponding to the model Ai, obtaining a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state; and if the output result Bi is less than the threshold corresponding to the model Ai, obtaining a prediction result indicating that the intestinal microecology of the target human body is in a normal state.
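As a minimal sketch of this per-model comparison, each model Ai can be treated as a callable that maps one sample to a score Bi. The callables below are toy stand-ins rather than trained random forests, and the names are hypothetical.

```python
from typing import Callable, List, Sequence

# A model is assumed to be any callable that maps a sample to a score.
Model = Callable[[Sequence[float]], float]


def predict_each(sample: Sequence[float],
                 models: List[Model],
                 thresholds: List[float]) -> List[bool]:
    """Return one prediction per model: True = imbalanced, False = normal."""
    results = []
    for model, threshold in zip(models, thresholds):
        output = model(sample)                # output result B_i of model A_i
        results.append(output >= threshold)   # B_i >= threshold -> imbalanced
    return results


# Toy stand-in models (assumptions for illustration only).
toy_models = [lambda s: s[0], lambda s: sum(s) / len(s)]
toy_predictions = predict_each([0.5, 0.1], toy_models, [0.5, 0.5])
```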
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which the random forest OTU model is constructed by the following steps: obtaining a first sample set, where the first sample set is a target matrix composed of intestinal flora data collected from a first human population and a second human population, the intestinal microecology of every person in the first population is in a normal state, and the intestinal microecology of every person in the second population is in an imbalanced state; the target matrix is an M×N matrix, where M is the number of samples in the first sample set and N is the number of species markers per sample, and each entry is the abundance value representing the quantity of one species marker in one sample; establishing an intestinal random forest model based on the first sample set; screening a preset number of important species markers out of the N species markers of the first sample set based on the intestinal random forest model, to form a second sample set; and constructing the random forest OTU model based on the second sample set.
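To make the M×N target matrix concrete, the sketch below assembles it from per-sample abundance dictionaries. The marker names, the label encoding (0 = normal, 1 = imbalanced) and the choice to fill a missing marker with 0.0 are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def build_first_sample_set(normal: List[Dict[str, float]],
                           imbalanced: List[Dict[str, float]],
                           markers: List[str]
                           ) -> Tuple[List[List[float]], List[int]]:
    """Return (M x N abundance matrix, phenotype labels)."""
    samples = normal + imbalanced
    # One row per sample, one column per species marker.
    matrix = [[float(s.get(m, 0.0)) for m in markers] for s in samples]
    # Assumed encoding: 0 = normal population, 1 = imbalanced population.
    labels = [0] * len(normal) + [1] * len(imbalanced)
    return matrix, labels


markers = ["OTU_1", "OTU_2", "OTU_3"]            # hypothetical marker names
normal = [{"OTU_1": 0.6, "OTU_2": 0.3, "OTU_3": 0.1}]
imbalanced = [{"OTU_1": 0.1, "OTU_3": 0.9}]
matrix, labels = build_first_sample_set(normal, imbalanced, markers)
```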
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which constructing the random forest OTU model based on the second sample set includes: establishing the random forest OTU model from the second sample set using leave-one-out cross-validation and a target modeling method, where the target modeling method is the method used to construct the intestinal random forest model.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which the target modeling method includes: sampling the first sample set M times with replacement to obtain M training sets, and establishing a decision tree for each training set to obtain M decision trees; splitting each of the M decision trees to obtain M fully grown decision trees, thereby determining the split nodes of each decision tree and the split feature corresponding to each split node; and combining the M fully grown decision trees to obtain the intestinal random forest model.
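The sampling-with-replacement step can be illustrated as below. Only the bootstrap sampling is shown; growing and combining the decision trees is left out. The size of each training set (equal to the size of the sample set) and the fixed seed are assumptions, since the passage does not state them.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(samples: Sequence[T],
                            n_sets: int,
                            seed: int = 0) -> List[List[T]]:
    """Draw n_sets training sets from samples, sampling with replacement."""
    rng = random.Random(seed)   # fixed seed only to make the sketch repeatable
    size = len(samples)         # assumed: each training set is as large as the sample set
    return [[samples[rng.randrange(size)] for _ in range(size)]
            for _ in range(n_sets)]


sample_ids = ["s1", "s2", "s3", "s4"]   # hypothetical sample identifiers
training_sets = bootstrap_training_sets(sample_ids, n_sets=4, seed=1)
```

Because sampling is done with replacement, a training set may contain the same sample more than once and may omit others entirely.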
With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, in which splitting each of the M decision trees includes: selecting, from the N species markers in the first sample set, the split feature for each split node of each decision tree, so as to split each decision tree.
With reference to the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, in which screening a preset number of important species markers out of the N species markers of the first sample set based on the intestinal random forest model, to form the second sample set, includes: outputting a ranking of the importance of the N species markers by means of the importance function (importance) of the intestinal random forest model; taking the species markers with the highest importance in the ranking, up to the preset number, as the important species markers; and forming the second sample set based on the operational taxonomic unit (OTU) table of the important species markers.
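A rough sketch of the ranking and selection step follows. It assumes the importance scores have already been obtained from a fitted random forest and are available as a plain dictionary; the marker names and scores are made up for illustration.

```python
from typing import Dict, List


def select_important_markers(importances: Dict[str, float], k: int) -> List[str]:
    """Rank markers by importance (descending) and keep the top k."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def subset_otu_table(otu_table: List[Dict[str, float]],
                     markers: List[str]) -> List[Dict[str, float]]:
    """Keep only the selected marker columns in every sample row."""
    return [{m: row[m] for m in markers} for row in otu_table]


# Hypothetical importance scores and OTU table.
scores = {"OTU_1": 0.1, "OTU_2": 0.5, "OTU_3": 0.3}
table = [{"OTU_1": 0.6, "OTU_2": 0.3, "OTU_3": 0.1},
         {"OTU_1": 0.1, "OTU_2": 0.0, "OTU_3": 0.9}]
top = select_important_markers(scores, k=2)
second_sample_set = subset_otu_table(table, top)
```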
With reference to the first aspect, an embodiment of the present invention provides a ninth possible implementation of the first aspect, in which establishing the random forest OTU model from the second sample set using leave-one-out cross-validation and the target modeling method includes: constructing M test sets and M training sets from the samples in the second sample set, where M is the number of samples in the second sample set, and when the m-th test set is the m-th sample, the m-th training set consists of all the other samples in the second sample set; and establishing M first sub-models based on the M training sets and the target modeling method, the M first sub-models together serving as the random forest OTU model.
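The leave-one-out construction of the M test sets and M training sets can be sketched as follows; the function name and sample identifiers are hypothetical, and model fitting on each training set is omitted.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out_splits(samples: Sequence[T]) -> List[Tuple[List[T], List[T]]]:
    """Return M (training set, test set) pairs for M samples."""
    splits = []
    for m in range(len(samples)):
        test_set = [samples[m]]                                  # the m-th sample alone
        train_set = list(samples[:m]) + list(samples[m + 1:])    # all other samples
        splits.append((train_set, test_set))
    return splits


loo_splits = leave_one_out_splits(["s1", "s2", "s3"])
```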
With reference to the first aspect, an embodiment of the present invention provides a tenth possible implementation of the first aspect, in which the method further includes: using each of the M first sub-models to predict its corresponding test set among the M test sets, to obtain M first output results; drawing a first ROC curve from the M first output results; and calculating the threshold of the random forest OTU model based on the first ROC curve, where the threshold of the random forest OTU model is used to judge the sample phenotype.
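The passage does not say how the threshold is read off the ROC curve. As one common possibility, the sketch below computes the ROC points from the M output scores and picks the cut-off that maximizes Youden's J statistic (TPR minus FPR). The use of Youden's J is an assumption made here for illustration, not the patent's stated rule; labels are assumed to be 1 for imbalanced and 0 for normal.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for every distinct score used as a cut-off.

    A sample is predicted positive (imbalanced) when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both phenotypes must be present")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


def youden_threshold(scores: List[float], labels: List[int]) -> float:
    """Pick the cut-off maximizing TPR - FPR (assumed selection rule)."""
    best = max(roc_points(scores, labels), key=lambda p: p[2] - p[1])
    return best[0]
```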
With reference to the first aspect, an embodiment of the present invention provides an eleventh possible implementation of the first aspect, in which, after the preset number of important species markers is screened out of the N species markers of the first sample set based on the intestinal random forest model to form the second sample set, the method further includes: calculating the Shannon index of each sample in the second sample set from the abundance values of the important species markers; adding the Shannon index to the OTU table of the important species markers to obtain a third sample set; and establishing the Shannon index random forest model from the third sample set using leave-one-out cross-validation and the target modeling method, where the target modeling method is the method used to construct the intestinal random forest model.
With reference to the first aspect, an embodiment of the present invention provides a twelfth possible implementation of the first aspect, in which calculating the Shannon index of each sample in the second sample set from the abundance values of the important species markers includes calculating the Shannon index of each sample with the following formula: $S = -(P_1 \ln P_1 + P_2 \ln P_2 + \dots + P_n \ln P_n)$, where $P_1, P_2, \dots, P_n$ are the abundance values of the 1st, 2nd, ..., n-th species markers of the sample, and $S$ is the Shannon index of the sample.
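The formula translates directly into a few lines of code. The sketch below assumes the abundance values are already relative abundances and skips zero values (treating 0 × ln 0 as 0), which is a conventional choice the passage does not spell out. The helper that appends the index to a sample row is a hypothetical illustration of forming a third sample set.

```python
import math
from typing import List, Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """S = -(P1*ln P1 + ... + Pn*ln Pn); zero abundances contribute nothing."""
    return -sum(p * math.log(p) for p in abundances if p > 0)


def append_shannon(row: Sequence[float]) -> List[float]:
    """Return the sample row with its Shannon index added as an extra column."""
    return list(row) + [shannon_index(row)]


even = shannon_index([0.5, 0.5])      # two equally abundant markers
extended = append_shannon([0.5, 0.5])
```

For two equally abundant markers the index equals ln 2, and it drops to 0 when a single marker accounts for the whole sample.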
With reference to the first aspect, an embodiment of the present invention provides a thirteenth possible implementation of the first aspect, in which establishing the Shannon index random forest model from the third sample set using leave-one-out cross-validation and the target modeling method includes: constructing M test sets and M training sets from the samples in the third sample set, where M is the number of samples in the third sample set, and when the m-th test set is the m-th sample, the m-th training set consists of all the other samples in the third sample set; and establishing M second sub-models based on the M training sets and the target modeling method, the M second sub-models together serving as the Shannon index random forest model.
With reference to the first aspect, an embodiment of the present invention provides a fourteenth possible implementation of the first aspect, in which the method further includes: using each of the M second sub-models to predict its corresponding test set among the M test sets, to obtain M second prediction results; drawing a second ROC curve from the M second prediction results; and calculating the threshold of the Shannon index random forest model based on the second ROC curve, where the threshold of the Shannon index random forest model is used to judge the sample phenotype.
In a second aspect, an embodiment of the present invention provides an apparatus for determining human intestinal dysbiosis, including: a data acquisition module, configured to obtain intestinal flora data of a target human body, where the intestinal flora data is a single sample containing multiple species markers; a model prediction module, configured to input the intestinal flora data into an intestinal random forest prediction model for prediction to obtain an output result, where the intestinal random forest prediction model has a corresponding threshold for judging the sample phenotype and includes a random forest OTU model or a Shannon index random forest model; and a result judgment module, configured to judge, based on the output result, whether the intestinal microecology of the target human body is in an imbalanced state.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which there are multiple intestinal random forest prediction models. The model prediction module is further configured to input the intestinal flora data into each intestinal random forest prediction model for prediction, to obtain multiple output results. The result judgment module is further configured to: determine the prediction result of each intestinal random forest prediction model based on the multiple output results, to obtain multiple prediction results; and determine whether the intestinal microecology of the target human body is in an imbalanced state based on the proportion of target prediction results among the multiple prediction results, where a target prediction result is a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the result judgment module is further configured to: if the proportion of target prediction results among the multiple prediction results is greater than a first preset proportion threshold, determine that the intestinal microecology of the target human body is in an imbalanced state; and if the proportion is greater than a second preset proportion threshold and less than the first preset proportion threshold, determine whether the intestinal microecology of the target human body is in an imbalanced state based on the prediction result of a target intestinal random forest prediction model among the multiple intestinal random forest prediction models. The target intestinal random forest prediction model is the model with the largest threshold among the multiple intestinal random forest prediction models, and the second preset proportion threshold is less than the first preset proportion threshold.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the result judgment module is further configured to: input the intestinal flora data into each intestinal random forest prediction model Ai to obtain an output result Bi, where i takes the values 1 to I in turn and I is the number of intestinal random forest prediction models; if the output result Bi is greater than or equal to the threshold corresponding to the model Ai, obtain a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state; and if the output result Bi is less than the threshold corresponding to the model Ai, obtain a prediction result indicating that the intestinal microecology of the target human body is in a normal state.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation of the second aspect, further including a model construction module. The model construction module includes: a first sample set acquisition module, configured to obtain a first sample set, where the first sample set is a target matrix composed of intestinal flora data collected from a first human population and a second human population, the intestinal microecology of every person in the first population is in a normal state, and the intestinal microecology of every person in the second population is in an imbalanced state; the target matrix is an M×N matrix, where M is the number of samples in the first sample set and N is the number of species markers per sample, and each entry is the abundance value representing the quantity of one species marker in one sample; a basic model establishment module, configured to establish an intestinal random forest model based on the first sample set; a second sample set acquisition module, configured to screen a preset number of important species markers out of the N species markers of the first sample set based on the intestinal random forest model, to form a second sample set; and a prediction model construction module, configured to construct the random forest OTU model based on the second sample set.
With reference to the second aspect, an embodiment of the present invention provides a fifth possible implementation of the second aspect, in which the prediction model construction module is further configured to establish the random forest OTU model from the second sample set using leave-one-out cross-validation and a target modeling method, where the target modeling method is the method used to construct the intestinal random forest model.
With reference to the second aspect, an embodiment of the present invention provides a sixth possible implementation of the second aspect, in which the basic model establishment module is further configured to: sample the first sample set M times with replacement to obtain M training sets, and establish a decision tree for each training set to obtain M decision trees; split each of the M decision trees to obtain M fully grown decision trees, thereby determining the split nodes of each decision tree and the split feature corresponding to each split node; and combine the M fully grown decision trees to obtain the intestinal random forest model.
With reference to the second aspect, an embodiment of the present invention provides a seventh possible implementation of the second aspect, in which the basic model establishment module is further configured to select, from the N species markers in the first sample set, the split feature for each split node of each decision tree, so as to split each decision tree.
With reference to the second aspect, an embodiment of the present invention provides an eighth possible implementation of the second aspect, in which the second sample set acquisition module is further configured to: output a ranking of the importance of the N species markers by means of the importance function (importance) of the intestinal random forest model; take the species markers with the highest importance in the ranking, up to the preset number, as the important species markers; and form the second sample set based on the operational taxonomic unit (OTU) table of the important species markers.
With reference to the second aspect, an embodiment of the present invention provides a ninth possible implementation of the second aspect, in which the prediction model construction module is further configured to: construct M test sets and M training sets from the samples in the second sample set, where M is the number of samples in the second sample set, and when the m-th test set is the m-th sample, the m-th training set consists of all the other samples in the second sample set; and establish M first sub-models based on the M training sets and the target modeling method, the M first sub-models together serving as the random forest OTU model.
With reference to the second aspect, an embodiment of the present invention provides a tenth possible implementation of the second aspect, further including a first threshold determination module, configured to: use each of the M first sub-models to predict its corresponding test set among the M test sets, to obtain M first output results; draw a first ROC curve from the M first output results; and calculate the threshold of the random forest OTU model based on the first ROC curve, where the threshold is used to judge the sample phenotype.
With reference to the second aspect, an embodiment of the present invention provides an eleventh possible implementation of the second aspect, in which the model construction module further includes a third sample set acquisition module, configured to calculate the Shannon index of each sample in the second sample set from the abundance values of the important species markers, and to add the Shannon index to the OTU table of the important species markers to obtain a third sample set. The prediction model construction module is further configured to establish the Shannon index random forest model from the third sample set using leave-one-out cross-validation and the target modeling method, where the target modeling method is the method used to construct the intestinal random forest model.
With reference to the second aspect, an embodiment of the present invention provides a twelfth possible implementation of the second aspect, in which the third sample set acquisition module is further configured to calculate the Shannon index of each sample in the second sample set with the following formula: $S = -(P_1 \ln P_1 + P_2 \ln P_2 + \dots + P_n \ln P_n)$, where $P_1, P_2, \dots, P_n$ are the abundance values of the 1st, 2nd, ..., n-th species markers of the sample, and $S$ is the Shannon index of the sample.
With reference to the second aspect, an embodiment of the present invention provides a thirteenth possible implementation of the second aspect, in which the prediction model construction module is further configured to: construct M test sets and M training sets from the samples in the third sample set, where M is the number of samples in the third sample set, and when the m-th test set is the m-th sample, the m-th training set consists of all the other samples in the third sample set; and establish M second sub-models based on the M training sets and the target modeling method, the M second sub-models together serving as the Shannon index random forest model.
With reference to the second aspect, an embodiment of the present invention provides a fourteenth possible implementation of the second aspect, further including a second threshold determination module, configured to: use each of the M second sub-models to predict its corresponding test set among the M test sets, to obtain M second prediction results; draw a second ROC curve from the M second prediction results; and calculate the threshold of the Shannon index random forest model based on the second ROC curve, where the threshold is used to judge the sample phenotype.
In a third aspect, an embodiment of the present invention further provides an electronic device including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor, when executing the computer program, implements the steps of the method described in any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable medium storing non-volatile program code executable by a processor, where the program code causes the processor to execute the method described in any possible implementation of the first aspect.
The embodiments of the present invention bring the following beneficial effects:
In the method for determining human intestinal dysbiosis provided by the embodiments of the present invention, intestinal flora data of the target human body is first obtained; the intestinal flora data is a single sample containing multiple species markers. The data is input into a pre-trained intestinal random forest prediction model for prediction, to obtain the output result of that model. There may be one or multiple intestinal random forest prediction models; each has a corresponding threshold for judging the sample phenotype, and the models may include a random forest OTU model or a Shannon index random forest model. Finally, whether the intestinal microecology of the target human body is in an imbalanced state is judged based on the output result. This scheme makes it possible to judge quickly and fairly accurately whether the target human body is in a state of intestinal microecological imbalance.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention are realized and attained by the structure particularly pointed out in the description, the claims and the accompanying drawings.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 3 is a flow chart of another method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the M*N matrix of the first sample set in a method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 5 is the ROC curve used to determine the threshold of the random forest OTU model in a method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 6 is a flow chart of another method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 7 is the ROC curve used to determine the threshold of the Shannon index random forest model in a method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of the analysis steps from sequencing fragments to species abundance in a method for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of a device for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of another device for determining human intestinal microecological imbalance provided by an embodiment of the present invention;
Fig. 11 is a schematic diagram of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
At present there is no good method for judging human intestinal microecological imbalance. On this basis, an embodiment of the present invention provides a method for determining human intestinal microecological imbalance that uses a prediction model to judge whether a target human body is in a state of intestinal microecological imbalance.
To facilitate understanding of this embodiment, the method for determining human intestinal microecological imbalance disclosed in the embodiment of the present invention is first described in detail.
An embodiment of the present invention provides a method for determining human intestinal microecological imbalance. Referring to Fig. 1, the method comprises the following steps:
Step S102: obtain the intestinal flora data of the target human body.
The intestinal flora data are one sample comprising multiple species markers. For example, if the number of species markers is N, the intestinal flora data form a 1*N matrix consisting of the abundance values of the N species markers.
Step S104: input the intestinal flora data into the intestinal random forest prediction model for prediction and obtain the output result.
In a specific implementation, there may be a single intestinal random forest prediction model, for example a random forest OTU model or a Shannon index random forest model. There may also be several, for example multiple random forest OTU models and/or multiple Shannon index random forest models trained on sample sets with different numbers of species markers. Each intestinal random forest prediction model has a corresponding threshold for judging the sample phenotype, and the sample phenotype is either normal or imbalanced.
In the case of one prediction model, inputting the intestinal flora data into the intestinal random forest prediction model yields one output result. In the case of multiple prediction models, inputting the intestinal flora data into the multiple intestinal random forest prediction models yields multiple output results. Each output result is the prediction probability produced by the model.
Step S106: judge, on the basis of the output result, whether the intestinal microecology of the target human body is in an imbalanced state.
In the case of one prediction model, after the single output result is obtained, it is compared with the threshold corresponding to the model to judge whether the intestinal microecology of the target human body is in an imbalanced state.
The specific judgment process is as follows:
Judge whether the output result is greater than the threshold corresponding to the intestinal random forest prediction model. If so, the intestinal microecology of the target human body is determined to be in an imbalanced state; if not, the intestinal microecology of the target human body is determined to be in a normal state.
In the case of multiple prediction models, after the multiple output results are obtained, the prediction result of each intestinal random forest prediction model is first determined from the output results. Whether the intestinal microecology of the target human body is in an imbalanced state is then determined from the proportion of target prediction results among the multiple prediction results, where a target prediction result is a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state.
Both cases make it possible to judge whether the intestinal microecology of the target human body is in an imbalanced state. Judging with a single model gives a fast judgment process, while predicting with several pre-trained models and judging on the basis of the multiple prediction results can improve the accuracy of the judgment.
In one possible embodiment, determining whether the intestinal microecology of the target human body is in an imbalanced state from the proportion of target prediction results among the multiple prediction results covers the following two cases:
(1) If the proportion of target prediction results among the multiple prediction results is greater than a first preset ratio threshold, the intestinal microecology of the target human body is determined to be in an imbalanced state.
Here, a target prediction result is a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state, that is, a prediction result of "imbalanced". In practice the first preset ratio threshold is generally set to 80%: when more than 80% of the prediction results are "imbalanced", the intestinal microecology of the target human body can be determined to be in an imbalanced state. The larger the first preset ratio threshold, the more reliable a judgment of imbalance is.
(2) If the proportion of target prediction results among the multiple prediction results is greater than a second preset ratio threshold and less than the first preset ratio threshold, whether the intestinal microecology of the target human body is in an imbalanced state is determined from the prediction result of the target intestinal random forest prediction model. The target intestinal random forest prediction model is the model with the largest threshold among the multiple intestinal random forest prediction models, and the second preset ratio threshold is less than the first preset ratio threshold.
In practice the second preset ratio threshold is generally set to 20%. That is, when more than 20% and less than 80% of the prediction results are "imbalanced", the judgment follows the prediction result of the intestinal random forest prediction model with the largest threshold. If that prediction result is "imbalanced", the intestinal microecology of the target human body is determined to be in an imbalanced state; if it is "normal", the intestinal microecology of the target human body is determined to be in a normal state.
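The two cases above can be sketched as a small voting routine. This is a minimal illustration rather than the patented implementation: the function name `decide_from_votes` and its parameters are hypothetical, and the text does not say what happens when the proportion is at or below the second threshold or exactly equal to the first one, so those branches are labeled assumptions.

```python
from typing import Sequence


def decide_from_votes(predictions: Sequence[bool],
                      thresholds: Sequence[float],
                      first_ratio: float = 0.8,
                      second_ratio: float = 0.2) -> bool:
    """Return True when the target is judged to be in an imbalanced state.

    predictions[i] is True when model i predicted "imbalanced";
    thresholds[i] is the phenotype threshold of model i.
    """
    if not predictions or len(predictions) != len(thresholds):
        raise ValueError("need one prediction and one threshold per model")

    ratio = sum(predictions) / len(predictions)

    # Case (1): clear majority of "imbalanced" predictions.
    if ratio > first_ratio:
        return True

    # Case (2): follow the model with the largest threshold.
    # Assumption: a ratio exactly equal to first_ratio is also handled here.
    if second_ratio < ratio <= first_ratio:
        target = max(range(len(thresholds)), key=lambda i: thresholds[i])
        return bool(predictions[target])

    # Assumption: a ratio at or below second_ratio is treated as normal;
    # the text does not state this case explicitly.
    return False
```

With the default values of 80% and 20% taken from the text, nine "imbalanced" votes out of ten return `True` directly, while a five-to-five split defers to the model whose threshold is largest.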
Each of the multiple intestinal random forest prediction models has its own threshold. Determining the prediction result of each intestinal random forest prediction model from the output results of the multiple models comprises the following steps, as shown in Fig. 2:
Step S202: input the intestinal flora data into each intestinal random forest prediction model Ai to obtain the output result Bi, where i runs from 1 to I and I is the number of intestinal random forest prediction models;
Step S204: if the output result Bi is greater than or equal to the threshold corresponding to the intestinal random forest prediction model Ai, a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state is obtained;
Step S206: if the output result Bi is less than the threshold corresponding to the intestinal random forest prediction model Ai, a prediction result indicating that the intestinal microecology of the target human body is in a normal state is obtained.
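Steps S202 to S206 amount to comparing each output Bi with the threshold of its model Ai. The following minimal sketch assumes the outputs have already been computed; the function name `predict_phenotypes` and the label strings are hypothetical.

```python
from typing import List, Sequence


def predict_phenotypes(outputs: Sequence[float],
                       thresholds: Sequence[float]) -> List[str]:
    """Map each model output Bi to a phenotype using that model's threshold.

    Follows steps S204 and S206: Bi >= threshold gives "imbalanced",
    Bi < threshold gives "normal".
    """
    if len(outputs) != len(thresholds):
        raise ValueError("need one threshold per model output")
    return ["imbalanced" if b >= t else "normal"
            for b, t in zip(outputs, thresholds)]
```

For example, outputs of 0.70 and 0.60 compared with the thresholds 0.628 and 0.717 reported later for the two example models give "imbalanced" and "normal" respectively.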
The threshold for judging the sample phenotype that corresponds to each intestinal random forest prediction model is the optimal threshold selected from many candidate thresholds (the selection process is described below). A prediction result obtained by comparing the output result with this threshold is therefore more accurate, and determining whether the intestinal microecology of the target human body is imbalanced from the proportion of "imbalanced" results among several such prediction results gives a more accurate result still.
The construction of the random forest OTU model, one of the intestinal random forest prediction models, is described in detail below. Referring to Fig. 3, the construction method of the prediction model comprises the following steps:
Step S302: obtain the first sample set.
The first sample set is a target matrix built from the intestinal flora data collected from a first human group and a second human group. The intestinal microecology of every person in the first human group is in a normal state, and the intestinal microecology of every person in the second human group is in an imbalanced state. The target matrix is an M*N matrix: the first sample set contains M samples, and each sample corresponds to N species markers. The target matrix consists of the abundance values of the N species markers of each of the M samples.
Taking humans as an example, the first human group is a population with normal intestinal microecology and the second human group is a population with imbalanced intestinal microecology. For instance, the first sample set may consist of the intestinal microecology species marker data of 500 people with normal intestinal microecology and 500 people with imbalanced intestinal microecology. The species marker data of each person are usually treated as one sample; each sample contains N (for example 1000) species markers and is labeled with a phenotype, either normal or imbalanced. As shown in Fig. 4, the M*N matrix consists of the abundance values, which indicate the quantity of each species marker, of the N species markers of each of the M samples.
Step S304: build the intestinal random forest model from the first sample set.
Specifically, this comprises the following steps:
(1) Sample the first sample set M times with replacement to obtain M training sets, and build a decision tree for each training set to obtain M decision trees.
(2) Split each of the M decision trees to obtain M maximally grown decision trees, thereby determining the split nodes of each decision tree and the split feature of each split node.
Splitting each of the M decision trees comprises:
selecting, from the N species markers of the first sample set, the split feature for each split node of each decision tree, so as to split each decision tree.
(3) Combine the M maximally grown decision trees to obtain the intestinal random forest model.
Specifically, let M denote the number of training samples and N the number of features per sample. The random forest is built as follows:
1) Input the feature number n, which is used to determine the decision at a node of a decision tree; n should be much smaller than N.
2) Draw M samples with replacement from the M training samples to form a training set, and use the samples that were not drawn for prediction in order to assess the error.
3) For each node, randomly select n features; the decision at each node of the decision tree is determined from these features. Compute the optimal split from the n features.
4) Each tree is grown fully without pruning, although pruning might be applied after an ordinary tree classifier has been built.
5) Repeat the above steps to build many more decision trees until the predetermined number of decision trees is reached; the random forest is then complete.
In a specific implementation, samples are drawn with replacement. For the species markers, n of the N species markers are selected (n < N): when each sample has N attributes and a node of the decision tree needs to be split, n attributes are selected at random from the N attributes. A decision tree is then built on the sampled data by complete splitting. At each node, one attribute is chosen from the n attributes obtained by this column sampling, according to some strategy, as the split attribute of the node. Every node is split in this way while the tree is formed, until no further split is possible.
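The row and column sampling described above can be sketched with the standard library alone. This is an illustrative fragment, not a complete random forest: the helper names `bootstrap_sample` and `candidate_features` are hypothetical, and tree construction itself is left out.

```python
import random
from typing import List, Tuple


def bootstrap_sample(num_samples: int,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw num_samples row indices with replacement.

    Returns the in-bag indices (duplicates allowed) and the out-of-bag
    indices, i.e. the rows never drawn, which can be used to assess error.
    """
    in_bag = [rng.randrange(num_samples) for _ in range(num_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(num_samples) if i not in drawn]
    return in_bag, out_of_bag


def candidate_features(num_features: int, n_try: int,
                       rng: random.Random) -> List[int]:
    """Pick n_try distinct feature indices for one node split (n < N)."""
    if not 0 < n_try < num_features:
        raise ValueError("n_try must satisfy 0 < n_try < num_features")
    return sorted(rng.sample(range(num_features), n_try))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which matters when error rates from different runs are compared.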
The mtry parameter is the number of variables sampled at random when a decision tree branch is built during random forest modeling. Choosing a suitable mtry value can reduce the prediction error rate of the random forest model. For example, given n species markers, mtry can be set to each value from 1 to m in turn, giving m modeling runs; the error rate of each run is recorded, and the mtry value with the lowest error rate is used to build the random forest model.
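The mtry traversal can be sketched as a simple search over candidate values. The sketch assumes a caller-supplied function that builds a model for a given mtry and returns its error rate; `select_mtry` and `error_rate_for` are hypothetical names, and no particular random forest library is implied.

```python
from typing import Callable, Dict


def select_mtry(error_rate_for: Callable[[int], float], max_mtry: int) -> int:
    """Try mtry = 1..max_mtry and return the value with the lowest error.

    error_rate_for(mtry) is expected to train one model with that mtry
    and return its error rate (for example the out-of-bag error).
    """
    if max_mtry < 1:
        raise ValueError("max_mtry must be at least 1")
    errors: Dict[int, float] = {
        mtry: error_rate_for(mtry) for mtry in range(1, max_mtry + 1)
    }
    # On ties, min() keeps the first key seen, i.e. the smallest mtry.
    return min(errors, key=lambda mtry: errors[mtry])
```

Keeping the model-building step behind a callable lets the same search be reused with whatever modeling routine is actually in place.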
Step S306: using the intestinal random forest model, select a preset number of important species markers from the N species markers of the first sample set to form the second sample set.
The selection process comprises the following steps:
(1) Use the importance function of the intestinal random forest model to output a ranking of the N species markers by importance.
(2) Take the species markers with the highest importance values in the ranking, up to the preset number, as the important species markers.
(3) Form the second sample set from the operational taxonomic unit (OTU) table of the important species markers.
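The selection in steps (1) to (3) can be sketched as a ranking followed by a column filter. The sketch assumes the importance scores are already available as a mapping from marker name to score; `top_markers` and `restrict_table` are hypothetical helpers.

```python
from typing import Dict, List, Sequence


def top_markers(importances: Dict[str, float], k: int = 15) -> List[str]:
    """Return the k marker names with the highest importance scores."""
    ranked = sorted(importances, key=lambda name: importances[name],
                    reverse=True)
    return ranked[:k]


def restrict_table(otu_table: Dict[str, Dict[str, float]],
                   markers: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Keep only the selected marker columns for every sample."""
    return {sample: {m: row[m] for m in markers}
            for sample, row in otu_table.items()}
```

The default of 15 follows the number of important species markers used in the example of this embodiment.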
In practice, experimental data show that when the top 15 species markers in the ranking are selected as the important species markers and used to form the samples for model training, the resulting model has higher prediction accuracy. The important species markers selected by the above process are listed in the following table:
Step S308: build the random forest OTU model from the second sample set.
Specifically, the random forest OTU model is built from the second sample set using leave-one-out cross-validation and the target modeling method, where the target modeling method is the method used to build the intestinal random forest model.
In a specific implementation, M test sets and M training sets are constructed from the samples in the second sample set, where M is the number of samples in the second sample set. When the m-th test set is the m-th sample, the m-th training set consists of all samples in the second sample set other than the m-th sample;
M first submodels are built from the M training sets using the target modeling method, and these serve as the random forest OTU (Operational Taxonomic Unit) model.
In leave-one-out cross-validation, suppose the sample data set contains M samples. Each sample in turn is used alone as the test set and the remaining M-1 samples are used as the training set, which yields M classifiers or models; the average classification accuracy of these M classifiers or models is used as the performance indicator of the classifier. Because each classifier or model is trained on almost all of the samples, the training data are as close as possible to the full sample set and the resulting assessment is reliable. The procedure involves no random factors, so the whole process is repeatable.
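The leave-one-out construction of the M training and test sets can be sketched as an index generator; `leave_one_out_splits` is a hypothetical helper, and model training on each split is left to the caller.

```python
from typing import Iterator, List, Tuple


def leave_one_out_splits(num_samples: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training indices, test index) for each of the M samples.

    The m-th split holds out sample m and trains on the other M-1 samples.
    """
    for m in range(num_samples):
        train = [i for i in range(num_samples) if i != m]
        yield train, m
```

Because the splits are enumerated in a fixed order, every sample is used as the test set exactly once and the procedure gives the same splits on every run.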
A prediction result is obtained only by comparing the output result of the model with a threshold, so the method further includes computing the model threshold, as follows:
(1) Use each of the M first submodels to predict its corresponding test set among the M test sets, obtaining M first output results.
(2) Draw the first ROC (Receiver Operating Characteristic) curve from the M first output results.
(3) Compute the threshold of the random forest OTU model from the first ROC curve, where the threshold is used to judge the sample phenotype.
In this method, one person among the M samples (for example 500 people with normal intestinal microecology and 500 people with imbalanced intestinal microecology) is selected as the test set, the data of the remaining people are used as the training set to build a model with the above method for building the intestinal random forest model, and the model is used to predict the test set. This is repeated until every individual has been selected as the test set exactly once, and the prediction probability of each individual is recorded. The ROC curve (receiver operating characteristic curve) and the AUC (Area Under Curve) value are obtained from the recorded prediction probabilities. The AUC value is the area under the ROC curve; the larger the AUC, the better the classifier performs. Specificity and sensitivity are then computed, and the optimal threshold for judging the sample phenotype (normal or imbalanced intestinal flora) is selected. Fig. 5 shows an ROC curve provided by the embodiment of the present invention. On the ROC curve, the point closest to the upper-left corner of the plot is the critical value at which both sensitivity and specificity are high, that is, the optimal threshold. The optimal threshold of the ROC curve is obtained with the Youden index, also called the correct index, which equals sensitivity plus specificity minus 1. The threshold determined from Fig. 5 is 0.628; at this threshold the sensitivity of the model in distinguishing normal samples from samples with intestinal microecological imbalance is 0.737 (73.7%) and the specificity is 0.957 (95.7%).
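Threshold selection with the Youden index can be sketched as follows. The sketch assumes that the leave-one-out prediction probabilities and the true phenotypes are available as two parallel lists and that a score at or above the threshold counts as "imbalanced"; `youden_threshold` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def youden_threshold(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    labels use 1 for imbalanced and 0 for normal. Every distinct score is
    tried as a candidate threshold; J = sensitivity + specificity - 1.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both phenotypes must be present")

    best: Optional[Tuple[float, float, float, float]] = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, t, sensitivity, specificity)

    assert best is not None  # scores is non-empty when labels are valid
    return best[1], best[2], best[3]
```

When several thresholds give the same index, this sketch keeps the lowest one; the text does not say how ties are resolved.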
In another possible embodiment, after the preset number of important species markers have been selected from the N species markers of the first sample set using the intestinal random forest model and the second sample set has been formed, the method further comprises the following steps, as shown in Fig. 6:
Step S602: compute the Shannon index of each sample in the second sample set from the abundance values of the important species markers.
In a specific implementation, the Shannon index of each sample in the second sample set is computed with the following formula:
$$S = -\left(P_1 \ln P_1 + P_2 \ln P_2 + \dots + P_n \ln P_n\right)$$
where $P_1, P_2, \dots, P_n$ are the abundance values of the 1st, 2nd, ..., n-th species marker of the sample and $S$ is the Shannon index of the sample.
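The formula can be sketched in a few lines. Two assumptions go beyond the text and are labeled in the code: the abundance values are normalized to proportions before the logarithm is taken (a no-op if they are already relative abundances summing to 1), and markers with zero abundance are skipped, following the usual convention that 0 · ln 0 = 0. The name `shannon_index` is hypothetical.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Compute S = -sum(P_i * ln(P_i)) for one sample."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no positive abundance")
    # Assumption: convert raw abundances to proportions P_i.
    # Assumption: zero-abundance markers contribute nothing to the sum.
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)
```

A sample whose markers are all equally abundant gives the maximum value ln(n), and a sample dominated by a single marker gives a value near 0.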
Chao1, ACE, Shannon, npShannon, Simpson and Good's coverage are six commonly used alpha diversity indices. Broadly, the Chao1 and ACE indices focus on the species richness of a sample; Good's coverage reflects how well low-abundance OTUs in the sample are covered; and Simpson, Shannon and npShannon reflect both the richness and the evenness of species. The Shannon index differs from Chao1 and ACE: Chao1 and ACE are mainly used to compute species richness and are concerned with whether a species is present in the sample, whereas the Shannon index considers both species richness and species evenness and is therefore a more comprehensive reflection of community structure.
There are many formulas for computing species diversity indices; they differ in form but are much the same in essence. For most diversity indices, the more species make up the community, the larger the index value.
Step S604: add the Shannon index to the OTU table of the important species markers to obtain the third sample set.
Step S606: build the Shannon index random forest model from the third sample set using leave-one-out cross-validation and the target modeling method, where the target modeling method is the method used to build the intestinal random forest model.
The Shannon index random forest model is built in the same way as the random forest OTU model described above, specifically as follows:
M test sets and M training sets are constructed from the samples in the third sample set, where M is the number of samples in the third sample set. When the m-th test set is the m-th sample, the m-th training set consists of all samples in the third sample set other than the m-th sample.
M second submodels are built from the M training sets using the target modeling method, and these serve as the Shannon index random forest model.
Likewise, the method further includes computing the threshold of the Shannon index random forest model, as follows:
(1) Use each of the M second submodels to predict its corresponding test set among the M test sets, obtaining M second prediction results.
(2) Draw the second ROC curve from the M second prediction results.
(3) Compute the threshold of the Shannon index random forest model from the second ROC curve, where the threshold is used to judge the sample phenotype.
In this embodiment, the Shannon index of each sample is added as a new "OTU" to the OTU table of the 15 important species markers described above, forming a new data table of 16 OTUs (15 OTUs plus the Shannon index). For this new OTU table, the Shannon index random forest model is built using leave-one-out cross-validation. From the 500 people with normal intestinal microecology and the 500 people with imbalanced intestinal microecology, one person is selected as the test set and the remaining people are used as the training set to build a model with the above method for building the intestinal random forest model, and the model is used to predict the test set. This is repeated until every individual has been selected as the test set exactly once, and the prediction probability of each individual is recorded. The ROC curve (receiver operating characteristic curve) and the AUC value are obtained from the recorded prediction probabilities, and specificity and sensitivity are computed in order to select the optimal judgment threshold. Fig. 7 shows the ROC curve of this example. The judgment threshold is 0.717; at this threshold the sensitivity of the model in distinguishing normal samples from samples with intestinal microecological imbalance is 0.868 (86.8%) and the specificity is 0.826 (82.6%). Compared with the previous model, both the AUC value and the sensitivity are improved.
The process of obtaining the intestinal flora data of the target human body, that is, the process of generating the OTU table, is explained in detail below:
1. Preliminary processing of the data obtained from the gene sequencer:
High-throughput sequencing produces massive amounts of DNA sequence data. Processing these data bioinformatically and extracting the biologically meaningful information is an important part of the whole project. This embodiment uses a bioinformatics platform for processing 16S rRNA high-throughput sequencing data. The platform integrates public database resources and several open-source software packages, together with Perl, Python and R programs written in-house on the Linux operating system, to analyze the 16S rRNA data.
Data quality control: high-throughput sequencing commonly produces sequencing errors such as point mutations, and the quality at the ends of sequences is relatively low. To obtain higher-quality and more accurate bioinformatic analysis results, the raw sequencing data need to be optimized. High-throughput sequencing data generally contain sequence information and sequencing quality data. Base calling is performed on the raw image data of the sequencing run with the CASAVA software (v1.8.2), and preliminary quality analysis yields the raw sequencing data, which are stored in FASTQ format. Each FASTQ record contains the sequence information (second line of the record) and the corresponding sequencing quality information (fourth line of the record).
Optimization steps and parameters:
1) Remove primer and adapter sequences, remove bases with a quality value below 25 from both ends, and use the sliding window method to remove bases whose average quality is below 25.
2) Align the two paired reads with the pandaseq software and merge them according to the aligned overlap region at their ends; remove merged sequences that contain N, and remove merged sequences longer than 480 bp or shorter than 400 bp.
3) Compare the merged and filtered sequences with a database and remove the chimeric sequences among them to obtain the final valid data.
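Two of the filters above, end trimming at quality 25 and the length and N checks on merged reads, can be sketched in plain Python. This is an illustration of the stated rules only: it does not reproduce pandaseq or the sliding window step, quality values are assumed to be already decoded to integers, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def trim_low_quality_ends(sequence: str, qualities: Sequence[int],
                          min_quality: int = 25) -> Tuple[str, List[int]]:
    """Strip bases with quality below min_quality from both ends."""
    start, end = 0, len(sequence)
    while start < end and qualities[start] < min_quality:
        start += 1
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], list(qualities[start:end])


def keep_merged_read(sequence: str, min_len: int = 400,
                     max_len: int = 480) -> bool:
    """Keep a merged read only if it has no N and is 400 to 480 bp long."""
    seq = sequence.upper()
    return "N" not in seq and min_len <= len(seq) <= max_len
```

Reads of exactly 400 bp or 480 bp are kept, since the text removes only reads shorter than 400 bp or longer than 480 bp.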
Quality checking also includes removing poor-quality sequences (abnormal read length, ambiguous base calls and so on) and then checking for chimeras. If there is contamination from chloroplast or mitochondrial DNA sequences, these sequences should be removed first. High-throughput sequencing data contain some chimeras, and these sequences must also be removed during data processing. Finally, statistical analysis is carried out on the OTU matrix and the assigned taxonomic relationships, using software such as MAFFT or Blastn or integrated open-source packages such as RDP and QIIME.
2. OTU calculation:
A single sample may contain hundreds or thousands of microorganisms, and 16S rRNA sequencing measures the variable-region fragment of the 16S rRNA gene of each microorganism. Put simply, if a species has only one individual in the sample, the sequencer measures one fragment of that species; if another species is more numerous in the sample (high abundance), for example 100 individuals, the sequencer measures 100 fragments of that species. In theory, fragments with identical sequences represent one type of microorganism. When the sequences produced by the sequencer are analyzed, the fragments are first clustered by sequence similarity into many groups; fragments that are 97% similar are generally placed in one group, and each such group is an OTU. OTU is the abbreviation of Operational Taxonomic Unit. One OTU generally corresponds to one microbial species, and the species can be identified by comparing the sequence with a standard microbial gene database. The number of times an OTU is measured in a sample is the abundance of that microbial species in the sample.
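The last point, that the abundance of an OTU is the number of reads assigned to it in a sample, can be sketched as a counting step. The sketch assumes that clustering has already assigned an OTU identifier to every read; `build_otu_table` and the identifiers in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_otu_table(
    reads_by_sample: Dict[str, Sequence[str]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Count reads per OTU for every sample.

    reads_by_sample maps a sample name to the OTU identifier of each read.
    Returns the sorted OTU identifiers and, per sample, one abundance row
    in the same column order (a 1*N row for N OTUs).
    """
    counts = {sample: Counter(otus)
              for sample, otus in reads_by_sample.items()}
    otu_ids = sorted({otu for c in counts.values() for otu in c})
    # Counter returns 0 for OTUs that never occur in a sample.
    table = {sample: [c[otu] for otu in otu_ids]
             for sample, c in counts.items()}
    return otu_ids, table
```

For a sample whose three reads were assigned to "OTU1", "OTU2" and "OTU2", the row for the columns "OTU1" and "OTU2" is `[1, 2]`.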
The number of OTUs and their relative abundances represent the diversity actually observed in a sample, so computing the distribution of OTUs across samples is an extremely important step in processing high-throughput sequencing data. Before OTUs are computed, the sequences must be aligned. There are many alignment algorithms and many clustering methods, and different software packages choose different algorithms; the ESPRIT platform, for example, applies pairwise alignment. This application uses the QIIME platform: OTU clustering is performed with the UCLUST method with the sequence similarity within an OTU set to 97%, giving the OTU list and the OTU representative sequences. The OTU representative sequences at the 97% similarity level are then classified taxonomically with the Bayesian algorithm of the RDP classifier, and the community composition of each sample is summarized at each taxonomic level, as shown in Fig. 8.
Fig. 9 shows a kind of determining device of human body intestinal canal Dysbiosis provided in an embodiment of the present invention, comprising: data Obtain module 91, model prediction module 92 and result judgment module 93.
Wherein, data acquisition module 91, for obtaining the intestinal flora data of target body, wherein intestinal flora data For a sample data comprising multiple species markers;Model prediction module 92, for intestinal flora data to be input to intestines It is predicted in road random forest prediction model, obtains output result;Enteron aisle random forest prediction model is corresponding with for judging The threshold value of sample phenotype, enteron aisle random forest prediction model include: random forest OTU model or Shannon exponential random forest Model;As a result judgment module 93, for judging whether the intestinal microecology of target body is in imbalance state based on output result.
When the quantity of enteron aisle random forest prediction model is multiple, model prediction module 92 is also used to: by intestinal flora Data are input in each enteron aisle random forest prediction model and are predicted, obtain multiple output results;As a result judgment module 93, It is also used to: determining the prediction result of each enteron aisle random forest prediction model based on multiple output results, obtain multiple prediction knots Fruit;And whether the intestinal microecology based on the ratio-dependent target body of target prediction result in multiple prediction results is in unbalance State, wherein target prediction result is to be in unbalance shape for characterizing the intestinal microecology of target body in multiple prediction results The prediction result of state.
In a specific implementation, the result judgment module 93 is further configured to: if, among the multiple prediction results, the proportion of target prediction results is greater than a first preset ratio threshold, determine that the intestinal microecology of the target human body is in an imbalanced state; and if, among the multiple prediction results, the proportion of target prediction results is greater than a second preset ratio threshold and less than the first preset ratio threshold, determine whether the intestinal microecology of the target human body is in an imbalanced state based on the prediction result of a target intestinal random forest prediction model among the multiple intestinal random forest prediction models. The target intestinal random forest prediction model is the intestinal random forest prediction model corresponding to the maximum threshold among the multiple intestinal random forest prediction models, the second preset ratio threshold is less than the first preset ratio threshold, and each intestinal random forest prediction model corresponds to one threshold.
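The two-tier voting rule described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `decide_dysbiosis`, its parameter names and the boolean encoding (True for imbalanced) are hypothetical. The text does not say what happens when the proportion is at or below the second ratio threshold, or exactly equal to the first; the sketch assumes a "normal" result in those cases.

```python
from typing import Sequence


def decide_dysbiosis(
    predictions: Sequence[bool],
    thresholds: Sequence[float],
    first_ratio: float,
    second_ratio: float,
) -> bool:
    """Combine per-model predictions (True = imbalanced) into one decision."""
    if not predictions or len(predictions) != len(thresholds):
        raise ValueError("need one prediction and one threshold per model")
    if not second_ratio < first_ratio:
        raise ValueError("second_ratio must be less than first_ratio")

    # Proportion of models that predicted the imbalanced state.
    ratio = sum(predictions) / len(predictions)
    if ratio > first_ratio:
        return True
    if second_ratio < ratio < first_ratio:
        # Defer to the model whose own phenotype threshold is the largest.
        target = max(range(len(thresholds)), key=lambda i: thresholds[i])
        return predictions[target]
    # Assumption: cases not covered by the text are treated as normal.
    return False
```

With four models and ratio thresholds of 0.7 and 0.3, three "imbalanced" votes decide the case directly, while two votes hand the decision to the model with the largest threshold.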
In a specific implementation, the result judgment module 93 is further configured to: input the intestinal flora data into each intestinal random forest prediction model Ai to obtain an output result Bi, where i takes the values 1 to I in turn and I is the number of intestinal random forest prediction models; if the output result Bi is greater than or equal to the threshold corresponding to the intestinal random forest prediction model Ai, obtain a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state; and if the output result Bi is less than the threshold corresponding to the intestinal random forest prediction model Ai, obtain a prediction result indicating that the intestinal microecology of the target human body is in a normal state.
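The per-model comparison of each output Bi with the threshold of model Ai can be illustrated with a short sketch. The function name `classify_outputs` and the boolean encoding (True for imbalanced, False for normal) are assumptions made for illustration only.

```python
from typing import List, Sequence


def classify_outputs(outputs: Sequence[float], thresholds: Sequence[float]) -> List[bool]:
    """Return True (imbalanced) where B_i >= threshold_i, else False (normal)."""
    if len(outputs) != len(thresholds):
        raise ValueError("need exactly one threshold per model output")
    return [b >= t for b, t in zip(outputs, thresholds)]
```

An output exactly equal to its threshold counts as imbalanced, matching the "greater than or equal to" wording above.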
In another embodiment, the above device for determining human intestinal dysbiosis further comprises a model construction module 94. As shown in Fig. 10, the model construction module 94 specifically comprises: a first sample set acquisition module 941, a base model establishment module 942, a second sample set acquisition module 943 and a prediction model construction module 944.
The first sample set acquisition module 941 is configured to obtain a first sample set. The first sample set is based on a target matrix composed of intestinal flora data collected from a first human population and a second human population; the intestinal microecology of each human body in the first human population is in a normal state, and the intestinal microecology of each human body in the second human population is in an imbalanced state. The target matrix is an M*N matrix: the first sample set contains M samples, and N indicates that each sample corresponds to N species markers. The target matrix is composed of the abundance values of the N species markers corresponding to each of the M samples. The base model establishment module 942 is configured to establish an intestinal random forest model based on the first sample set. The second sample set acquisition module 943 is configured to screen out, based on the intestinal random forest model, a predetermined number of important species markers from the N species markers of the first sample set to form a second sample set. The prediction model construction module 944 is configured to construct the random forest OTU model based on the second sample set.
In a specific implementation, the prediction model construction module 944 is further configured to establish the random forest OTU model based on the second sample set using leave-one-out cross-validation and a target modeling method, wherein the target modeling method is the method used to construct the intestinal random forest model.
The above base model establishment module 942 is further configured to: sample the first sample set M times with replacement to obtain M training sets, and establish a decision tree for each training set to obtain M decision trees; split each of the M decision trees to obtain M maximally grown decision trees, thereby determining the split nodes of each decision tree and the splitting feature corresponding to each split node; and combine the M maximally grown decision trees to obtain the intestinal random forest model.
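The sampling-with-replacement step can be sketched as below. Only the construction of the M training sets is shown; growing and combining the decision trees is left to whatever random forest implementation is used. The function name `bootstrap_training_sets`, the `seed` parameter and the choice to give every training set the same size M as the original sample set are assumptions, since the text does not state the size of each training set.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(samples: Sequence[T], seed: int = 0) -> List[List[T]]:
    """Draw M training sets, each of M samples, with replacement from M samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    m = len(samples)
    return [[samples[rng.randrange(m)] for _ in range(m)] for _ in range(m)]
```

Because sampling is with replacement, a given training set usually contains some samples more than once and omits others.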
In another embodiment, the base model establishment module 942 is further configured to select, from the N species markers of the first sample set, the corresponding splitting feature for each split node in each decision tree, thereby splitting each decision tree.
In a specific implementation, the second sample set acquisition module 943 is further configured to: output a ranking of the importance of the N species markers through the importance function (importance) of the intestinal random forest model; take the species markers with the top predetermined number of importance values in the ranking as the important species markers; and form the second sample set based on the operational taxonomic unit (OTU) table of the important species markers.
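The ranking and selection step might look like the following sketch, which assumes the importance scores have already been obtained from the fitted model and are passed in as a plain dictionary. The names `select_important_markers` and `restrict_otu_table`, and the nested-dictionary layout of the OTU table, are hypothetical.

```python
from typing import Dict, List


def select_important_markers(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k marker names with the largest importance scores."""
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    return ranked[:k]


def restrict_otu_table(
    otu_table: Dict[str, Dict[str, float]], markers: List[str]
) -> Dict[str, Dict[str, float]]:
    """Keep only the selected marker columns for every sample row."""
    return {sample: {m: row[m] for m in markers} for sample, row in otu_table.items()}
```

The restricted table plays the role of the second sample set: the same samples as before, described only by the selected markers.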
In another embodiment, the prediction model construction module 944 is further configured to: construct M test sets and M training sets based on each sample in the second sample set, where M is the number of samples in the second sample set, and when the m-th test set among the M test sets is the m-th sample, the m-th training set among the M training sets consists of the other samples in the second sample set except the m-th sample; and establish M first submodels based on the M training sets and the target modeling method, the M first submodels serving as the random forest OTU model.
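The leave-one-out construction of the M test sets and M training sets can be written compactly. The helper name `leave_one_out_splits` is hypothetical, and fitting one submodel per training set is omitted.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out_splits(samples: Sequence[T]) -> List[Tuple[List[T], T]]:
    """For each m, pair the training set (all samples but m) with test sample m."""
    return [
        (list(samples[:m]) + list(samples[m + 1:]), samples[m])
        for m in range(len(samples))
    ]
```

Each of the M pairs would then be used to fit one submodel on the training set and evaluate it on the single held-out sample.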
In another possible embodiment, the above device for determining human intestinal dysbiosis further comprises a first threshold determination module 946, configured to: predict with each first submodel using the test set, among the M test sets, that corresponds to that first submodel, obtaining M first output results; draw a first ROC curve from the M first output results; and calculate the threshold of the random forest OTU model based on the first ROC curve, wherein the threshold is used for judging the sample phenotype.
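The text does not state how the threshold is derived from the ROC curve. One common choice, used here purely as an assumption, is the cut-off that maximises Youden's J statistic (true positive rate minus false positive rate). The function names `roc_points` and `youden_threshold` are hypothetical, and labels are assumed to be 1 for imbalanced and 0 for normal.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) per candidate."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need samples of both phenotypes to draw a ROC curve")
    points = []
    # Each distinct score is tried as a cut-off; score >= cut-off means imbalanced.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the cut-off maximising TPR - FPR (assumed criterion, not from the text)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]
```

Here `scores` would be the M leave-one-out output results and `labels` the known phenotypes of the corresponding samples.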
In another possible embodiment, the above model construction module 94 further comprises a third sample set acquisition module 945, configured to calculate the Shannon index of each sample in the second sample set from the abundance values of the important species markers, and to add the Shannon index to the OTU table of the important species markers to obtain a third sample set. The prediction model construction module 944 is further configured to establish a Shannon index random forest model based on the third sample set using leave-one-out cross-validation and the target modeling method, wherein the target modeling method is the method used to construct the intestinal random forest model.
In a specific implementation, the third sample set acquisition module 945 is further configured to calculate the Shannon index of each sample in the second sample set using the following formula:
S = -(P1 × ln P1 + P2 × ln P2 + ... + Pn × ln Pn);
where P1, P2, ..., Pn are the abundance values of the 1st, 2nd, ..., n-th species marker of the sample, and S is the Shannon index of the sample.
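The Shannon index formula above translates directly into code. The sketch assumes the abundance values are relative abundances (proportions) and skips zero abundances, since P × ln P tends to 0 as P tends to 0; neither point is stated in the text. The function name `shannon_index` is hypothetical.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """S = -(P1*ln P1 + ... + Pn*ln Pn); zero abundances contribute nothing."""
    return -sum(p * math.log(p) for p in abundances if p > 0)
```

For two equally abundant markers the index equals ln 2, and for a sample dominated by a single marker it is 0.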
The above prediction model construction module 944 is further configured to: construct M test sets and M training sets based on each sample in the third sample set, where M is the number of samples in the third sample set, and when the m-th test set among the M test sets is the m-th sample, the m-th training set among the M training sets consists of the other samples in the third sample set except the m-th sample; and establish M second submodels based on the M training sets and the target modeling method, the M second submodels serving as the Shannon index random forest model.
In another possible embodiment, the above device for determining human intestinal dysbiosis further comprises a second threshold determination module 947, configured to: predict with each second submodel using the test set, among the M test sets, that corresponds to that second submodel, obtaining M second prediction results; draw a second ROC curve from the M second prediction results; and calculate the threshold of the Shannon index random forest model based on the second ROC curve, wherein the threshold is used for judging the sample phenotype.
In the device for determining human intestinal dysbiosis provided by the embodiment of the present invention, each module has the same technical features as the aforementioned method for determining human intestinal dysbiosis, and can therefore achieve the same functions. For the specific working process of each module in the device, refer to the method embodiments above; details are not repeated here.
Fig. 11 shows an electronic device provided by an embodiment of the present invention. The electronic device comprises a processor 110, a memory 111, a bus 112 and a communication interface 113; the processor 110, the communication interface 113 and the memory 111 are connected through the bus 112. The processor 110 is configured to execute executable modules stored in the memory 111, such as a computer program. When the processor executes the computer program, the steps of the method described in the method embodiments are implemented.
The memory 111 may include high-speed random access memory (RAM, Random Access Memory) and may also include non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 113 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and so on.
The bus 112 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 11, but this does not mean that there is only one bus or only one type of bus.
The memory 111 is used to store a program, and the processor 110 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 110 or implemented by the processor 110.
The processor 110 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 110 or by instructions in the form of software. The above processor 110 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 111; the processor 110 reads the information in the memory 111 and completes the steps of the above method in combination with its hardware.
The computer program product of the method for determining human intestinal dysbiosis provided by the embodiment of the present invention comprises a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to perform the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, and details are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the device and the electronic equipment described above may refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
The flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of methods and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore should not be construed as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, intended to illustrate its technical solution rather than to limit it, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (27)

1. A method for determining human intestinal dysbiosis, characterized by comprising:
obtaining intestinal flora data of a target human body, wherein the intestinal flora data is one sample of data comprising multiple species markers;
inputting the intestinal flora data into an intestinal random forest prediction model for prediction to obtain an output result; wherein the intestinal random forest prediction model has a corresponding threshold for judging the sample phenotype, and the intestinal random forest prediction model comprises: a random forest OTU model or a Shannon index random forest model;
judging, based on the output result, whether the intestinal microecology of the target human body is in an imbalanced state;
wherein there are multiple intestinal random forest prediction models;
inputting the intestinal flora data into the intestinal random forest prediction model for prediction to obtain an output result comprises: inputting the intestinal flora data into each of the intestinal random forest prediction models for prediction to obtain multiple output results;
judging, based on the multiple output results, whether the intestinal microecology of the target human body is in an imbalanced state comprises: determining the prediction result of each intestinal random forest prediction model based on each output result to obtain multiple prediction results; and determining whether the intestinal microecology of the target human body is in an imbalanced state based on the proportion of target prediction results among the multiple prediction results, wherein a target prediction result is a prediction result, among the multiple prediction results, indicating that the intestinal microecology of the target human body is in the imbalanced state;
determining whether the intestinal microecology of the target human body is in an imbalanced state based on the proportion of target prediction results among the multiple prediction results comprises:
if, among the multiple prediction results, the proportion of target prediction results is greater than a first preset ratio threshold, determining that the intestinal microecology of the target human body is in an imbalanced state;
if, among the multiple prediction results, the proportion of target prediction results is greater than a second preset ratio threshold and less than the first preset ratio threshold, determining whether the intestinal microecology of the target human body is in an imbalanced state based on the prediction result of a target intestinal random forest prediction model among the multiple intestinal random forest prediction models, wherein the target intestinal random forest prediction model is the intestinal random forest prediction model corresponding to the maximum threshold among the multiple intestinal random forest prediction models, and the second preset ratio threshold is less than the first preset ratio threshold.
2. The method according to claim 1, characterized in that determining the prediction result of each intestinal random forest prediction model based on each output result comprises:
inputting the intestinal flora data into each intestinal random forest prediction model Ai to obtain an output result Bi, wherein i takes the values 1 to I in turn, and I is the number of the multiple intestinal random forest prediction models;
if the output result Bi is greater than or equal to the threshold corresponding to the intestinal random forest prediction model Ai, obtaining a prediction result indicating that the intestinal microecology of the target human body is in an imbalanced state;
if the output result Bi is less than the threshold corresponding to the intestinal random forest prediction model Ai, obtaining a prediction result indicating that the intestinal microecology of the target human body is in a normal state.
3. The method according to claim 1, characterized in that the random forest OTU model is constructed as follows, specifically comprising:
obtaining a first sample set; wherein the first sample set is based on a target matrix composed of intestinal flora data collected from a first human population and a second human population, the intestinal microecology of each human body in the first human population is in a normal state, the intestinal microecology of each human body in the second human population is in the imbalanced state, the target matrix is an M*N matrix, the first sample set contains M samples, and N indicates that each sample corresponds to N species markers; the target matrix is composed of the abundance values, indicating the quantity of each species marker, of the N species markers corresponding to each of the M samples;
establishing an intestinal random forest model based on the first sample set;
screening out, based on the intestinal random forest model, a predetermined number of important species markers from the N species markers of the first sample set to form a second sample set;
constructing the random forest OTU model based on the second sample set.
4. The method according to claim 3, characterized in that constructing the random forest OTU model based on the second sample set comprises:
establishing the random forest OTU model based on the second sample set using leave-one-out cross-validation and a target modeling method, wherein the target modeling method is the method used to construct the intestinal random forest model.
5. The method according to claim 4, characterized in that the target modeling method comprises:
sampling the first sample set M times with replacement to obtain M training sets, and establishing a decision tree for each training set to obtain M decision trees;
splitting each of the M decision trees to obtain M maximally grown decision trees, so as to determine the split nodes of each decision tree and the splitting feature corresponding to each split node;
combining the M maximally grown decision trees to obtain the intestinal random forest model.
6. The method according to claim 5, characterized in that splitting each of the M decision trees comprises:
selecting, from the N species markers of the first sample set, the corresponding splitting feature for each split node in each decision tree, so as to split each decision tree.
7. The method according to claim 3, characterized in that screening out, based on the intestinal random forest model, a predetermined number of important species markers from the N species markers of the first sample set to form a second sample set comprises:
outputting a ranking of the importance of the N species markers through the importance function (importance) of the intestinal random forest model;
taking the species markers with the top predetermined number of importance values in the ranking as the important species markers;
forming the second sample set based on the operational taxonomic unit (OTU) table of the important species markers.
8. The method according to claim 4, characterized in that establishing the random forest OTU model based on the second sample set using leave-one-out cross-validation and the target modeling method comprises:
constructing M test sets and M training sets based on each sample in the second sample set, wherein M is the number of samples in the second sample set, and when the m-th test set among the M test sets is the m-th sample, the m-th training set among the M training sets consists of the other samples in the second sample set except the m-th sample;
establishing M first submodels based on the M training sets and the target modeling method, the M first submodels serving as the random forest OTU model.
9. The method according to claim 8, characterized in that the method further comprises:
predicting with each first submodel using the test set, among the M test sets, that corresponds to that first submodel, to obtain M first output results;
drawing a first ROC curve from the M first output results;
calculating the threshold of the random forest OTU model based on the first ROC curve, wherein the threshold of the random forest OTU model is used for judging the sample phenotype.
10. The method according to claim 3, characterized in that, after screening out, based on the intestinal random forest model, a predetermined number of important species markers from the N species markers of the first sample set to form a second sample set, the method further comprises:
calculating the Shannon index of each sample in the second sample set from the abundance values of the important species markers;
adding the Shannon index to the OTU table of the important species markers to obtain a third sample set;
establishing a Shannon index random forest model based on the third sample set using leave-one-out cross-validation and a target modeling method, wherein the target modeling method is the method used to construct the intestinal random forest model.
11. The method according to claim 10, characterized in that calculating the Shannon index of each sample in the second sample set from the abundance values of the important species markers comprises:
calculating the Shannon index of each sample in the second sample set using the following formula:
S = -(P1 × ln P1 + P2 × ln P2 + ... + Pn × ln Pn);
where P1, P2, ..., Pn are the abundance values of the 1st, 2nd, ..., n-th species marker of the sample, and S is the Shannon index of the sample.
12. The method according to claim 10, characterized in that establishing the Shannon index random forest model based on the third sample set using leave-one-out cross-validation and the target modeling method comprises:
constructing M test sets and M training sets based on each sample in the third sample set, wherein M is the number of samples in the third sample set, and when the m-th test set among the M test sets is the m-th sample, the m-th training set among the M training sets consists of the other samples in the third sample set except the m-th sample;
establishing M second submodels based on the M training sets and the target modeling method, the M second submodels serving as the Shannon index random forest model.
13. The method according to claim 12, characterized in that the method further comprises:
predicting with each second submodel using the test set, among the M test sets, that corresponds to that second submodel, to obtain M second prediction results;
drawing a second ROC curve from the M second prediction results;
calculating the threshold of the Shannon index random forest model based on the second ROC curve, wherein the threshold of the Shannon index random forest model is used for judging the sample phenotype.
14. A device for determining human intestinal dysbiosis, characterized by comprising:
a data acquisition module, configured to obtain intestinal flora data of a target human body, wherein the intestinal flora data is one sample of data comprising multiple species markers;
a model prediction module, configured to input the intestinal flora data into an intestinal random forest prediction model for prediction to obtain an output result; wherein the intestinal random forest prediction model has a corresponding threshold for judging the sample phenotype, and the intestinal random forest prediction model comprises: a random forest OTU model or a Shannon index random forest model;
a result judgment module, configured to judge, based on the output result, whether the intestinal microecology of the target human body is in an imbalanced state;
wherein there are multiple intestinal random forest prediction models;
the model prediction module is further configured to: input the intestinal flora data into each of the intestinal random forest prediction models for prediction to obtain multiple output results;
the result judgment module is further configured to: determine the prediction result of each intestinal random forest prediction model based on the multiple output results to obtain multiple prediction results; and determine whether the intestinal microecology of the target human body is in an imbalanced state based on the proportion of target prediction results among the multiple prediction results, wherein a target prediction result is a prediction result, among the multiple prediction results, indicating that the intestinal microecology of the target human body is in the imbalanced state;
the result judgment module is further configured to:
if, among the multiple prediction results, the proportion of target prediction results is greater than a first preset ratio threshold, determine that the intestinal microecology of the target human body is in an imbalanced state;
if, among the multiple prediction results, the proportion of target prediction results is greater than a second preset ratio threshold and less than the first preset ratio threshold, determine whether the intestinal microecology of the target human body is in an imbalanced state based on the prediction result of a target intestinal random forest prediction model among the multiple intestinal random forest prediction models, wherein the target intestinal random forest prediction model is the intestinal random forest prediction model corresponding to the maximum threshold among the multiple intestinal random forest prediction models, the second preset ratio threshold is less than the first preset ratio threshold, and each intestinal random forest prediction model corresponds to one threshold.
15. The device according to claim 14, characterized in that the result judgment module is further configured to:
Input the intestinal flora data into each intestinal random forest prediction model Ai to obtain an output result Bi, wherein i takes the values 1 to I in turn, and I is the number of the plurality of intestinal random forest prediction models;
If the output result Bi is greater than or equal to the threshold corresponding to the intestinal random forest prediction model Ai, obtain a prediction result characterizing the intestinal microecology of the target body as being in the imbalanced state;
If the output result Bi is less than the threshold corresponding to the intestinal random forest prediction model Ai, obtain a prediction result characterizing the intestinal microecology of the target body as being in the normal state.
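As a small illustration of the per-model comparison in claim 15, the sketch below turns each output Bi into a prediction by comparing it with the threshold of model Ai. The function name `classify_outputs` is hypothetical, and the outputs are assumed to be numeric scores.

```python
from typing import List, Sequence


def classify_outputs(outputs: Sequence[float], thresholds: Sequence[float]) -> List[bool]:
    """Return True (imbalanced) where B_i >= threshold_i, else False (normal)."""
    if len(outputs) != len(thresholds):
        raise ValueError("need exactly one threshold per model output")
    return [b >= t for b, t in zip(outputs, thresholds)]
```

An output that exactly equals its threshold counts as imbalanced, matching the "greater than or equal to" wording of the claim.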
16. The device according to claim 14, characterized by further comprising: a model construction module; the model construction module comprises:
A first sample set acquisition module, configured to acquire a first sample set; the first sample set is a target matrix composed of intestinal flora data collected from a first human group and a second human group, wherein the intestinal microecology of each person in the first human group is in the normal state and the intestinal microecology of each person in the second human group is in the imbalanced state; the target matrix is an M*N matrix, wherein the first sample set contains M samples and N indicates that each sample corresponds to N species markers; the target matrix consists of abundance values, each representing the quantity of one of the N species markers in one of the M samples;
A basic model establishment module, configured to establish an intestinal random forest model based on the first sample set;
A second sample set acquisition module, configured to screen out, based on the intestinal random forest model, a preset number of important species markers from the N species markers of the first sample set, forming a second sample set;
A prediction model construction module, configured to construct the random forest OTU model based on the second sample set.
17. The device according to claim 16, characterized in that the prediction model construction module is further configured to:
Establish the random forest OTU model based on the second sample set, using leave-one-out cross-validation and a target modeling method, wherein the target modeling method is the method used to construct the intestinal random forest model.
18. The device according to claim 17, characterized in that the basic model establishment module is further configured to:
Sample the first sample set M times with replacement to obtain M training sets, and build a decision tree for each training set, obtaining M decision trees;
Split each of the M decision trees to obtain M fully grown decision trees, so as to determine the split nodes of each decision tree and the splitting feature corresponding to each split node;
Combine the M fully grown decision trees to obtain the intestinal random forest model.
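The sampling step of claim 18 can be sketched as follows. This is only an illustration of drawing M training sets with replacement; growing and combining the decision trees is left out. The function name `bootstrap_training_sets` is hypothetical, and the size of each training set (here equal to the number of samples, as in a standard bootstrap) is an assumption that the claim does not state.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_training_sets(samples: Sequence[T], seed: int = 0) -> List[List[T]]:
    """Draw M training sets from M samples, sampling with replacement.

    Assumption: each training set has the same size as the sample set.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    m = len(samples)
    return [[samples[rng.randrange(m)] for _ in range(m)] for _ in range(m)]
```

Because sampling is with replacement, a given training set usually repeats some samples and omits others, which is what makes the individual trees differ from one another.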
19. The device according to claim 18, characterized in that the basic model establishment module is further configured to:
Select, from the N species markers in the first sample set, the corresponding splitting feature for each split node in each decision tree, so as to split each decision tree.
20. The device according to claim 16, characterized in that the second sample set acquisition module is further configured to:
Output a ranking of the N species markers by importance, using the importance function of the intestinal random forest model;
Take the species markers corresponding to the top preset number of importance values in the ranking as the important species markers;
Form the second sample set from the operational taxonomic unit (OTU) table of the important species markers.
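The selection in claim 20 amounts to ranking markers by importance and keeping the top few columns of the OTU table. In the sketch below the importance values are assumed to have been computed already by the random forest model; the names `select_important_markers` and `subset_otu_table`, and the dictionary-based table layout, are hypothetical.

```python
from typing import Dict, List, Sequence


def select_important_markers(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k marker names with the largest importance, best first."""
    if not 0 < k <= len(importances):
        raise ValueError("k must be between 1 and the number of markers")
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    return ranked[:k]


def subset_otu_table(
    table: Sequence[Dict[str, float]], markers: Sequence[str]
) -> List[Dict[str, float]]:
    """Keep only the selected marker columns in every sample row."""
    return [{m: row[m] for m in markers} for row in table]
```

Each row of `table` stands for one sample and maps a marker name to its abundance value, so the reduced table keeps all M samples but only the preset number of columns.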
21. The device according to claim 17, characterized in that the prediction model construction module is further configured to:
Construct M test sets and M training sets from the samples in the second sample set, wherein M is the number of samples in the second sample set, and when the m-th test set of the M test sets is the m-th sample, the m-th training set of the M training sets consists of the other samples in the second sample set, excluding the m-th sample;
Establish M first submodels based on the M training sets and the target modeling method, the M first submodels serving as the random forest OTU model.
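The leave-one-out construction in claim 21 can be written in a few lines. The function name `leave_one_out_splits` is hypothetical; fitting a submodel on each training set is not shown.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out_splits(samples: Sequence[T]) -> List[Tuple[List[T], T]]:
    """Return M (training set, test sample) pairs, one per held-out sample."""
    return [
        (list(samples[:m]) + list(samples[m + 1:]), samples[m])
        for m in range(len(samples))
    ]
```

With M samples this yields M training sets of M-1 samples each, so every sample is used exactly once as the test set of the submodel that never saw it.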
22. The device according to claim 21, characterized by further comprising: a first threshold determination module, configured to:
Use the test set, among the M test sets, that corresponds to each of the M first submodels to run a prediction with that first submodel, obtaining M first output results;
Draw a first ROC curve from the M first output results;
Calculate the threshold of the random forest OTU model based on the first ROC curve, wherein the threshold of the random forest OTU model is used to judge the phenotype of a sample.
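The claim does not state how the threshold is derived from the ROC curve. One common choice, used here purely as an assumption, is the cut-off that maximizes Youden's J statistic (true positive rate minus false positive rate). The function names `roc_points` and `youden_threshold` are hypothetical, labels are assumed to be 1 for imbalanced and 0 for normal, and a score at or above the cut-off is treated as imbalanced.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) triples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one sample of each phenotype")
    points = []
    # Every distinct score is tried as a candidate cut-off.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the cut-off maximizing TPR - FPR (Youden's J; an assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]
```

Here `scores` would hold the M leave-one-out outputs and `labels` the known phenotypes of the held-out samples.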
23. The device according to claim 16, characterized in that the model construction module further comprises:
A third sample set acquisition module, configured to calculate the Shannon index of each sample in the second sample set according to the abundance values of the important species markers, and to add the Shannon index to the OTU table of the important species markers, obtaining a third sample set;
The prediction model construction module is further configured to establish, based on the third sample set, a Shannon index random forest model using leave-one-out cross-validation and a target modeling method, wherein the target modeling method is the method used to construct the intestinal random forest model.
24. The device according to claim 23, characterized in that the third sample set acquisition module is further configured to:
Calculate the Shannon index of each sample in the second sample set using the following formula:
S = -(P1 × ln P1 + P2 × ln P2 + ... + Pn × ln Pn);
Wherein P1, P2, ..., Pn are the abundance values of the 1st, 2nd, ..., n-th species markers of the sample, and S is the Shannon index of the sample.
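The formula above can be checked with a short Python sketch. The function names `shannon_index` and `add_shannon_column` are hypothetical. Two assumptions are made: the abundance values are relative abundances, and markers with zero abundance contribute nothing to the sum (the usual convention that 0 × ln 0 is taken as 0).

```python
import math
from typing import List, Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Compute S = -(P1 ln P1 + ... + Pn ln Pn), skipping zero abundances."""
    if any(p < 0 for p in abundances):
        raise ValueError("abundance values must be non-negative")
    return -sum(p * math.log(p) for p in abundances if p > 0)


def add_shannon_column(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append each sample's Shannon index as an extra column of the table."""
    return [list(row) + [shannon_index(row)] for row in table]
```

For a sample split evenly between two markers, `shannon_index([0.5, 0.5])` equals ln 2, and a sample dominated by a single marker has an index of 0. The second helper mirrors the step of claim 23 in which the index is added to the OTU table.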
25. The device according to claim 23, characterized in that the prediction model construction module is further configured to:
Construct M test sets and M training sets from the samples in the third sample set, wherein M is the number of samples in the third sample set, and when the m-th test set of the M test sets is the m-th sample, the m-th training set of the M training sets consists of the other samples in the third sample set, excluding the m-th sample;
Establish M second submodels based on the M training sets and the target modeling method, the M second submodels serving as the Shannon index random forest model.
26. The device according to claim 25, characterized by further comprising: a second threshold determination module, configured to:
Use the test set, among the M test sets, that corresponds to each of the M second submodels to run a prediction with that second submodel, obtaining M second prediction results;
Draw a second ROC curve from the M second prediction results;
Calculate the threshold of the Shannon index random forest model based on the second ROC curve, wherein the threshold of the Shannon index random forest model is used to judge the phenotype of a sample.
27. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 13.
CN201811357592.7A 2018-11-15 2018-11-15 The determination method, apparatus and electronic equipment of human body intestinal canal Dysbiosis Active CN109448842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811357592.7A CN109448842B (en) 2018-11-15 2018-11-15 The determination method, apparatus and electronic equipment of human body intestinal canal Dysbiosis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811357592.7A CN109448842B (en) 2018-11-15 2018-11-15 The determination method, apparatus and electronic equipment of human body intestinal canal Dysbiosis

Publications (2)

Publication Number Publication Date
CN109448842A CN109448842A (en) 2019-03-08
CN109448842B true CN109448842B (en) 2019-09-24

Family

ID=65553505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811357592.7A Active CN109448842B (en) 2018-11-15 2018-11-15 The determination method, apparatus and electronic equipment of human body intestinal canal Dysbiosis

Country Status (1)

Country Link
CN (1) CN109448842B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111897A (en) * 2019-05-24 2019-08-09 天益健康科学研究院(镇江)有限公司 A method of obtaining genital tract flora sequencer address
CN111710364B (en) * 2020-05-08 2022-02-15 中国科学院深圳先进技术研究院 Method, device, terminal and storage medium for acquiring flora marker
CN114999574B (en) * 2022-08-01 2022-12-27 中山大学 Parallel identification and analysis method and system for intestinal flora big data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103045748A (en) * 2013-01-08 2013-04-17 浙江大学 Reagent kit and testing method for rapidly evaluating whether intestinal microecology of human body is unbalanced
GB201505364D0 (en) * 2015-03-27 2015-05-13 Genetic Analysis As Method for determining gastrointestinal tract dysbiosis
CN107480474B (en) * 2017-08-01 2019-03-26 山东师范大学 Classifier modeling evaluation method of calibration and system based on intestinal flora abundance
CN107586862B (en) * 2017-10-12 2019-10-08 青岛大学附属医院 Application of the intestinal flora in repeated respiratory tract infections in children diagnosis
CN107894827B (en) * 2017-10-31 2020-07-07 Oppo广东移动通信有限公司 Application cleaning method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN109448842A (en) 2019-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191211

Address after: 341000 office building 1, floor 3, No.28, National Road 323 section, high tech Zone, Zhanggong District, Ganzhou City, Jiangxi Province

Patentee after: Jiangxi Prison Gene Technology Co., Ltd.

Address before: Unit 505, A5 Building, 218 Xinghu Street, Suzhou Industrial Park, Jiangsu Province

Patentee before: Suzhou Puruisen Gene Technology Co Ltd
