CN111599477A - Model construction method and system for predicting diabetes based on eating habits - Google Patents
- Publication number: CN111599477A
- Application number: CN202010664488.3A
- Authority
- CN
- China
- Prior art keywords
- decision tree
- diabetes
- intake
- sample
- sample set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/60—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to nutrition control, e.g. diets
Abstract
The invention relates to a model construction method for predicting diabetes based on eating habits, which comprises the following steps: obtaining a first sample set comprising food material information of the multi-day meals of each sample; extracting a plurality of data from the first sample set and using them as features to form a second sample set; dividing the second sample set into a training set and a verification set, and taking the training set as the input of a decision tree model; and training the decision tree model until the information gain of the features is lower than a threshold value, thereby obtaining the decision tree model. The method and the system analyze the main causes of diabetes, evaluate the eating behavior of the user, and combine cumulative weighting of the nutrient content of food materials with a decision tree algorithm, so that the user can quickly and conveniently learn his or her nutrient intake and obtain a predicted diabetes risk index, which improves the user experience.
Description
Technical Field
The invention relates to the field of medical information processing, and in particular to a model construction method and a system for predicting diabetes based on eating habits.
Background
At present, the diagnostic standard for diabetes is established uniformly by the World Health Organization. It applies regardless of population, age, or sex, and takes blood sugar (fasting blood sugar, blood sugar at any time, or a glucose tolerance test) as the only diagnostic criterion. Studies have shown that a high blood glucose value is an important criterion for the diagnosis of diabetes, but by no means the only one. Diabetes is not only a problem of high blood sugar but also a problem of where the blood sugar goes: during metabolism, blood sugar is converted into fat. Therefore, all diabetics need to strictly control both blood sugar and blood fat and actively adopt scientific methods of treatment to avoid harm.
With the continuous improvement of living standards and changes in lifestyle, the prevalence of diabetes is on the rise. Diabetes is a group of metabolic diseases characterized by chronically elevated blood glucose levels. The postprandial glycemic response (PGR) of a person is influenced by a variety of factors. For a single food, it is significantly correlated with food composition, glycemic index (GI), and glycemic load (GL) values; for mixed diets, however, there is no significant correlation between food composition and postprandial blood glucose, although there is some correlation with GL and GI values. In addition, the protein and fat in food also have some effect on blood glucose. In descending order of their effect on blood glucose: GL > protein > fat.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides a model construction method for predicting diabetes based on eating habits.
The technical scheme for solving the technical problems is as follows: a model construction method for predicting diabetes based on eating habits comprises the following steps: obtaining a first sample set comprising food material information of the multi-day meals of each sample; extracting a plurality of data from the first sample set and using them as features to form a second sample set; dividing the second sample set into a training set and a verification set, and taking the training set as the input of a decision tree model; and training the decision tree model until the information gain of the features is lower than a threshold value, thereby obtaining the decision tree model.
In some embodiments of the invention, the first sample set further comprises the age, sex, weight, height, past history of diabetes, and history of allergies of the sample population.
In some embodiments of the invention, the nutrient intake status is calculated from the age, sex, and multi-day meal food material information of each sample in the first sample set, and is used as a feature of the second sample set.
In some embodiments of the invention, the nutrients include protein, fat, and glucose.
In some embodiments of the invention, the features of the second sample set include age, glycemic load, fat, protein, obesity, and genetic history of diabetes.
In some embodiments of the invention, the glycemic load is taken as the root node of the decision tree model.
The invention also provides a system for predicting diabetes based on eating habits, which comprises an acquisition module, a matching module, a calculation module, and a decision tree model. The acquisition module is used for acquiring the user's age, sex, weight, height, past history of diabetes, allergy history, and food material information of multi-day meals; the matching module is used for looking up the daily intake of nutrients and the nutrient content of food materials according to the sex and age of the user; the calculation module is used for performing a weighted calculation on the nutrient content retrieved by the matching module and comparing the result with the daily intake to obtain the nutrient intake features; the decision tree model predicts the probability of the user suffering from diabetes based on the nutrient intake features.
In some embodiments of the invention, the decision tree model includes a model constructed by the model construction method for predicting diabetes based on eating habits.
In some embodiments of the invention, the nutrients are glucose, fat, and protein.
In some embodiments of the invention, the decision tree model is optimized by a random forest.
The invention has the following beneficial effects: the main causes of diabetes are analyzed, and the dietary behavior of the user is monitored and evaluated in real time with few resources. By combining cumulative weighting of the nutrient content of food materials with the decision tree (ID3) algorithm, the user can quickly and conveniently learn his or her nutrient intake and obtain a predicted diabetes risk index, so that disease risk is estimated faster and the user experience is improved.
By recording or acquiring dietary behavior data over multiple days, the daily glycemic load (GL), protein, and fat intake are analyzed and tallied, and it is determined whether a balance exists between what the body requires and what is supplied. If this balance is disturbed, the risk of diabetes increases. The diabetes risk index is then evaluated from the glycemic load (GL), protein, and fat intake over a given period of time.
Drawings
FIG. 1 is a basic flow diagram of a model building method for predicting diabetes based on eating habits in some embodiments of the present invention;
FIG. 2 is a schematic diagram of the architecture of a system for predicting diabetes based on eating habits in some embodiments of the present invention;
FIG. 3 is an example of a portion of samples in a second sample set in some embodiments of the invention;
FIG. 4 is a decision tree model in some embodiments of the invention;
FIGS. 5a and 5b are reference tables of the composition of common foods per 100 grams of food.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
First, some necessary concepts of the present application are explained:
Label: the label is what we want to predict, i.e., the y variable in a simple linear regression. The label may be the future price of wheat, the animal species shown in a picture, the meaning of an audio clip, or anything else. In this application, the label indicates whether a person in the sample has diabetes.
Feature: features are input variables, i.e., the x variables in a simple linear regression. A simple machine learning project may use a single feature, while a more complex one may use millions of features, specified as x1, x2, ..., xN. In this application, a feature may be a numerical or Boolean value corresponding to age, glucose, fat, protein, obesity, or genetic history of diabetes.
Sample: a sample is a specific instance of data, x (x is a vector). Samples fall into two categories: labeled samples and unlabeled samples. A labeled sample contains both features and a label, i.e., labeled examples: {features, label}: (x, y). Labeled samples are used to train the model. In this application, a labeled sample is a user explicitly labeled as "having diabetes" or "not having diabetes". For example, a patient or user sample may include features such as age, gender, weight, height, obesity, and genetic history of diabetes.
The decision tree (ID3) algorithm is described as follows:
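Let the training data set be $D$, and let $|D|$ denote its sample capacity, i.e., the number of samples. There are $K$ classes $C_k$, $k = 1, 2, \dots, K$, where $|C_k|$ is the number of samples belonging to class $C_k$, so that $\sum_{k=1}^{K} |C_k| = |D|$. Let feature $a^*$ have $V$ different values $\{a_1, a_2, \dots, a_V\}$. According to the value of feature $a^*$, $D$ is divided into $V$ subsets $D_1, D_2, \dots, D_V$, where $|D_i|$ is the number of samples in $D_i$, so that $\sum_{i=1}^{V} |D_i| = |D|$. Let $D_{ik}$ denote the set of samples in subset $D_i$ that belong to class $C_k$, i.e., $D_{ik} = D_i \cap C_k$, and let $|D_{ik}|$ be the number of samples in $D_{ik}$. The information gain is then calculated as follows:
(1) Calculate the empirical entropy $H(D)$ of data set $D$:
$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$$
(2) Calculate the empirical conditional entropy $H(D \mid a^*)$ of feature $a^*$ on data set $D$:
$$H(D \mid a^*) = \sum_{i=1}^{V} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{V} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|}$$
The information gain of feature $a^*$ is the difference $g(D, a^*) = H(D) - H(D \mid a^*)$.
Assume a given training data set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$,
where $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(n)})^T$ is an input instance, i.e., a feature vector; $n$ is the number of features; $i = 1, 2, \dots, N$; $N$ is the number of samples; and $y_i \in \{1, 2, \dots, K\}$ is the class label.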
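The entropy and information gain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four-sample toy data are hypothetical and do not come from the patent.

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy H(D), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def conditional_entropy(values, labels):
    """Empirical conditional entropy H(D | a*) for one feature column."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)  # D_i for each feature value
    return sum(len(s) / total * entropy(s) for s in subsets.values())

def information_gain(values, labels):
    """g(D, a*) = H(D) - H(D | a*)."""
    return entropy(labels) - conditional_entropy(values, labels)

# Hypothetical toy data: 1 = has diabetes, 0 = does not.
labels = [1, 1, 0, 0]
gl_status = ["high", "high", "low", "low"]    # separates the classes perfectly
fat_status = ["high", "low", "high", "low"]   # carries no class information

gain_gl = information_gain(gl_status, labels)
gain_fat = information_gain(fat_status, labels)
```

In this toy case the GL column has the largest possible gain for a balanced two-class set and the fat column has none, which is the kind of ordering ID3 uses to pick a splitting feature.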
The technical scheme of the invention is specifically described as follows:
Referring to FIG. 1, a model construction method for predicting diabetes based on eating habits includes the following steps: S101, obtaining a first sample set comprising food material information of the multi-day meals of each sample; S102, extracting a plurality of data from the first sample set and using them as features to form a second sample set; S103, dividing the second sample set into a training set and a verification set, and taking the training set as the input of the decision tree model; S104, training the decision tree model until the information gain of the features is lower than a threshold value, thereby obtaining the decision tree model.
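Step S103 can be illustrated with a minimal split routine. The patent does not state a split ratio or a shuffling strategy, so the 80/20 fraction, the fixed seed, and the function name below are assumptions made only for this sketch.

```python
import random

def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into (training, verification)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical sample identifiers standing in for rows of the second sample set.
second_sample_set = list(range(10))
training_set, verification_set = split_samples(second_sample_set)
```

The verification set is held back from training so that the finished tree can be checked on samples it has not seen.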
In some embodiments of the invention, the first sample set further comprises the age, sex, weight, height, past history of diabetes, and history of allergies of the sample population. The past history of diabetes comprises the genetic (family) history of diabetes and the person's own history of diabetes.
In some embodiments of the invention, the nutrient intake status is calculated from the age, sex, and multi-day meal food material information of each sample in the first sample set, and is used as a feature of the second sample set. The nutrient intake status falls into three cases: below the standard value, recorded as low; equal to the standard value, recorded as medium or moderate; above the standard value, recorded as high or slightly high.
Referring to FIG. 3, in some embodiments of the invention, the nutrients include protein, fat, and glucose.
Preferably, the features of the second sample set include age, glycemic load, fat, protein, obesity, and genetic history of diabetes. Two definitions are needed here. Glycemic load (GL): GL = GI of the food × the amount of available carbohydrate (g) actually ingested from that food / 100. Glycemic index (GI): the percentage value of the in vivo blood glucose response after eating a meal containing 50 g of carbohydrate relative to that after an equivalent amount of a standard food (glucose or white bread). The calculation formula is: GI = (area under the blood glucose rise curve in the two hours after eating a food containing the equivalent of 100 g of glucose in sugar) / (area under the blood glucose rise curve in the two hours after eating 100 g of glucose) × 100. The GI value of glucose is usually set to 100. The glycemic load is therefore related to some extent to other nutrients such as glucose, protein, and fat.
Referring to FIGS. 3, 5a and 5b, in some embodiments of the invention, the features of the samples need to be normalized to prevent under-fitting or over-fitting of the model. The food material information and nutrients in the first or second sample set have many features whose values span a large range, so features with large values would dominate the classification result and weaken the influence of features with relatively small values; the data therefore needs to be normalized. Min-max (deviation) normalization applies a linear transformation to the original data so that the result falls into a range such as [0,1] or [0,10]; the range can be adjusted according to actual conditions.
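A minimal sketch of this linear rescaling is shown below. The function name and the handling of a constant column are assumptions; the target range is a parameter because the text allows it to be adjusted.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values onto [low, high] (min-max normalization)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant feature carries no information, map it to `low`.
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]

# Hypothetical raw feature column, e.g. grams of a nutrient per day.
raw = [10, 20, 30]
unit_range = min_max_scale(raw)               # scaled to [0, 1]
ten_range = min_max_scale(raw, 0.0, 10.0)     # scaled to [0, 10]
```

After scaling, every feature occupies the same numeric range, so no single feature dominates merely because of its units.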
Referring to FIG. 4, in some embodiments of the invention, the glycemic load (GL) is the root node of the decision tree model.
Referring to FIG. 2, another aspect of the present invention provides a system 1 for predicting diabetes based on eating habits, including an obtaining module 11, a matching module 12, a calculating module 13, and a decision tree model 14. The obtaining module 11 is configured to obtain the user's age, gender, weight, height, past history of diabetes, allergy history, and food material information of multi-day meals; the matching module 12 is used for looking up the daily intake of nutrients and the nutrient content of food materials according to the gender and age of the user; the calculating module 13 is used for performing a weighted calculation on the nutrient content retrieved by the matching module and comparing the result with the daily intake to obtain the nutrient intake features; the decision tree model 14 predicts the probability of the user suffering from diabetes based on the nutrient intake features.
In some embodiments of the invention, the decision tree model 14 includes a model constructed by the aforementioned model construction method for predicting diabetes based on eating habits.
In some embodiments of the invention, the nutrients are glucose, fat, and protein.
In some embodiments of the invention, the decision tree model 14 is optimized by a random forest.
In some embodiments of the invention, a system 1 for predicting diabetes based on eating habits comprises an obtaining module 11, a matching module 12, a calculating module 13, and a decision tree model 14:
the obtaining module 11: obtains the user's age, sex, weight, height, past history, allergy history, and food material information of multi-day meals;
the matching module 12: according to the data input by the user, looks up in turn, in each nutrient table (protein, fat, and glucose), the daily intake for the user's age and gender and the nutrient content of the food materials;
the calculating module 13: performs a weighted calculation on the nutrient content according to the retrieved data and benchmarks the weighted result against the daily intake. The nutrient intake status (low, high, or moderate) is judged from the benchmarking result, and classification statistics are compiled;
the decision tree model 14: the diet of 800 users is recorded continuously for 60 days, and their protein, fat, and glucose intake is analyzed; the selected user features are "protein", "fat", "glycemic load", "obesity", and "genetic history of diabetes". The information gain of each feature is calculated from the feature information, the feature with the largest information gain is selected as the root node, and features with successively smaller gains are used as child nodes; the method is called recursively on the child nodes to construct the decision tree until the information gain of all features is very small or no feature remains to be selected, which yields the final decision tree. It should be noted that the 800 users (samples) above are only for illustration, and the number of samples may be adjusted as appropriate.
Diabetes risk index prediction: according to the user information, the decision tree (ID3) model is applied to predict the diabetes risk index or the probability of having the disease.
The technical solution of the present application will be described below with reference to specific examples.
Example: for a 40-year-old woman with a height of 164 cm and a weight of 58 kg, one week of eating behavior is recorded and analyzed, the intake of each nutrient is tallied, and the diabetes risk index is predicted using the GL, protein, and fat intake status (high, medium, or low) as features.
1. Calculate the daily nutrient intake: the food materials eaten each day and their weights are recorded, and the protein and fat content of each food material is looked up in the knowledge-base food material table. Content is given in grams per 100 g of food and is converted to a percentage for the calculation (for example, 100 g of eggs contains 8.8 g of fat, i.e., 8.8/100 = 8.8%). The total daily intakes of protein and fat are obtained by weighted calculation and denoted X1 and Y1, respectively.
2. Similarly, following step 1, the protein intakes of the remaining six days are calculated and denoted X2, X3, X4, X5, X6, and X7, and the fat intakes are denoted Y2, Y3, Y4, Y5, Y6, and Y7. The total protein intake for the week is X = X1 + X2 + ... + X7, and the average daily protein intake is X/7; the total fat intake is Y = Y1 + Y2 + ... + Y7, and the average daily fat intake is Y/7.
3. Benchmark the intake: according to the knowledge-base daily intake table, the daily protein intake for a 40-year-old woman is B1 and the daily fat intake is B2. Comparing X/7 with B1 gives three possibilities, greater than, less than, or equal, corresponding to higher, lower, or moderate protein intake, respectively. In the same way, comparing Y/7 with B2 classifies fat intake as higher, lower, or moderate.
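Steps 1 to 3 can be sketched as follows. Only the egg figure (8.8 g of fat per 100 g) comes from the text; the weekly totals, the reference value standing in for B2, the function names, and the optional tolerance band are hypothetical assumptions for illustration.

```python
def nutrient_grams(food_weight_g, grams_per_100g):
    """Grams of a nutrient in a portion, given its content per 100 g of food."""
    return food_weight_g * grams_per_100g / 100.0

def daily_total(foods, nutrient):
    """Sum one nutrient over a day's foods: [(weight_g, {nutrient: g per 100 g}), ...]."""
    return sum(nutrient_grams(weight, content[nutrient]) for weight, content in foods)

def classify_intake(daily_totals, reference, tolerance=0.0):
    """Compare the average daily intake with the reference daily intake."""
    average = sum(daily_totals) / len(daily_totals)
    if average > reference + tolerance:
        return "higher"
    if average < reference - tolerance:
        return "lower"
    return "moderate"

# Figure from the text: 100 g of eggs contains 8.8 g of fat.
egg_fat = nutrient_grams(100, 8.8)

# Hypothetical daily fat totals Y1..Y7 (grams) and a hypothetical reference B2.
fat_week = [62.0, 70.0, 58.0, 66.0, 75.0, 61.0, 68.0]
fat_status = classify_intake(fat_week, reference=60.0)
```

The default tolerance of zero matches the strict greater-than, less-than, or equal comparison in step 3; a small band may be more practical with measured data, but that is a design choice outside the source.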
GL = carbohydrate (g) × GI / 100. For example, 100 grams of watermelon contains 7.5 grams of carbohydrate and the glycemic index of watermelon is 72, so its glycemic load (GL) is 7.5 × 72 / 100 = 5.4; the glycemic load (GL) of 500 grams of watermelon is 37.5 × 72 / 100 = 27. Foods with GL > 20 are high-GL foods; foods with GL from 10 to 20 are medium-GL foods; foods with GL < 10 are low-GL foods. GL is not benchmarked against a daily intake amount. Instead, the GL value of each food material is looked up in turn in the knowledge-base food material table and checked against these intervals: if the GL value of a food material is 22, the result is higher; if the GL value is 15, the result is moderate; if the GL value is 2, the result is lower. The GL values in the food material table correspond to 100 g of food and are scaled in proportion to the weight of the food material the user actually ate. The decision tree (ID3) algorithm is then used to predict the diabetes risk index.
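The watermelon arithmetic and the three GL intervals can be checked with a short sketch. The function names are hypothetical; the formula, the watermelon figures, and the thresholds are those stated in the text.

```python
def glycemic_load(carbohydrate_g, glycemic_index):
    """GL = carbohydrate (g) x GI / 100."""
    return carbohydrate_g * glycemic_index / 100.0

def gl_category(gl):
    """Map a GL value to the intervals in the text: >20 high, 10-20 medium, <10 low."""
    if gl > 20:
        return "higher"
    if gl >= 10:
        return "moderate"
    return "lower"

# Watermelon: 7.5 g of carbohydrate per 100 g, GI of 72.
gl_100g = glycemic_load(7.5, 72)        # 100 g portion
gl_500g = glycemic_load(7.5 * 5, 72)    # 500 g portion, carbohydrate scaled by weight
```

The same food moves from the low interval to the high interval as the portion grows, which is why the table value per 100 g has to be scaled by the weight actually eaten.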
From the sample data set (analysis and statistics of nutrient intake over 60 consecutive days), the selected features are "age", "obesity", "sugar intake status", "protein intake status", "fat intake status", and "family history of diabetes"; the number of samples is 800. There are two class labels: diabetes and not diabetes.
The calculation steps are illustrated as follows:
1) Calculate the information entropy required to classify the given samples (the smaller the information entropy, the smaller the uncertainty, the greater the certainty, and the higher the purity of the information);
According to the sample set, the number of people with diabetes is 123 and the number of people without diabetes is 800 − 123 = 677; the information entropy i of the sample set is calculated from these two class counts.
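As a worked instance, the standard two-class entropy formula can be applied to these counts (123 with diabetes and 800 − 123 = 677 without). The rounded value below is computed here for illustration and is not quoted from the source:

```latex
\[
i = -\frac{123}{800}\log_2\frac{123}{800} - \frac{677}{800}\log_2\frac{677}{800} \approx 0.619
\]
```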
2) Calculate the information entropy and the information gain of each feature (age, obesity, GL intake status, protein intake status, fat intake status, and family genetic history of diabetes).
Taking the "GL intake" feature as an example, the intake is divided into three groups: low, high, and moderate. For the low-GL group: according to the data set, the number of people with low GL is 177 out of 800 samples, so the proportion P0 of low-GL samples in the total is 177/800, and the information entropy i0 of the low-GL group is calculated with the information entropy formula. Similarly, the information entropy i1 of the high-GL group and its proportion P1 of the total samples are calculated, as are the information entropy i2 of the moderate-GL group and its proportion P2. The information entropy of GL intake is: E(GL intake) = P0 × i0 + P1 × i1 + P2 × i2;
3) Calculate the information gain. The information gain of GL intake is G(GL intake) = i − E(GL intake). The information gains of the other features, namely age, obesity, protein intake, fat intake, and family genetic history of diabetes, are calculated in the same way. The results are: information gain of GL, 0.2667; information gain of genetic history of diabetes, 0.2033; information gain of fat, 0.1624; information gain of obesity, 0.1273; information gain of protein, 0.0968; information gain of age, 0.0183.
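Given the gains listed above, choosing the root node amounts to ranking the features. In the sketch below the gain values are the ones quoted in the text, while the threshold of 0.05 and the function name are hypothetical, since the patent only says the gain must not fall below "a threshold value".

```python
# Information gains quoted in the text.
gains = {
    "GL": 0.2667,
    "genetic history of diabetes": 0.2033,
    "fat": 0.1624,
    "obesity": 0.1273,
    "protein": 0.0968,
    "age": 0.0183,
}

GAIN_THRESHOLD = 0.05  # hypothetical cut-off, not taken from the patent

def rank_features(gains, threshold):
    """Return feature names ordered by decreasing gain, dropping those below threshold."""
    kept = [(name, gain) for name, gain in gains.items() if gain >= threshold]
    return [name for name, _ in sorted(kept, key=lambda item: item[1], reverse=True)]

ranked = rank_features(gains, GAIN_THRESHOLD)
root = ranked[0]
```

With these numbers GL becomes the root, and under the assumed threshold the age feature is dropped.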
According to the information gain results, the feature with the largest gain, GL, is selected as the root node, and features with successively smaller gains are used as child nodes; the method is called recursively on the child nodes to construct the decision tree until the information gain of all features is very small or no feature remains to be selected, which yields the final decision tree. In particular, the decision tree model is optimized by a random forest.
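The recursive construction can be sketched in plain Python as below. This follows the textbook ID3 procedure, in which the gain is recomputed on each subset before choosing the next split; whether the patent recomputes gains per node or reuses the global ranking is not stated. The six-row data set, the threshold, and all names are hypothetical, and the random forest step is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, labels, feature):
    """Information gain of splitting (rows, labels) on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, features, threshold=1e-3):
    """Return a nested dict, or a leaf holding the fraction of positive samples."""
    positive_rate = sum(labels) / len(labels)
    if not features or len(set(labels)) == 1:
        return positive_rate
    best = max(features, key=lambda f: gain(rows, labels, f))
    if gain(rows, labels, best) < threshold:   # stop when the gain is very small
        return positive_rate
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining, threshold)
    return {"feature": best, "branches": branches, "default": positive_rate}

def predict(tree, row):
    """Walk the tree; unseen feature values fall back to the node's positive rate."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical labeled samples: 1 = has diabetes, 0 = does not.
rows = [
    {"GL": "higher", "fat": "higher"}, {"GL": "higher", "fat": "lower"},
    {"GL": "lower", "fat": "higher"}, {"GL": "lower", "fat": "lower"},
    {"GL": "moderate", "fat": "higher"}, {"GL": "moderate", "fat": "lower"},
]
labels = [1, 1, 0, 0, 1, 0]
tree = build_tree(rows, labels, ["GL", "fat"])
risk = predict(tree, {"GL": "moderate", "fat": "higher"})
```

Each leaf stores the share of positive training samples that reached it, which is one simple way to report a risk figure as a probability rather than a hard class.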
According to the information input by the user, obesity (no) is calculated from the height and weight, and the family genetic history of diabetes is no; according to the statistical results of step 3, protein intake is moderate, fat intake is higher, and GL is higher. With the decision tree model, the user's risk of diabetes is predicted to be 56.9%.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units or modules is merely a logical division, and an actual implementation may use another division; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
Based on such understanding, the technical solution of the present application, or the part thereof that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A model construction method for predicting diabetes based on eating habits is characterized by comprising the following steps:
obtaining a first sample set comprising food material information of a sample multi-day meal;
extracting a plurality of data in the first sample set, and forming a second sample set by using the data as features;
dividing the second sample set into a training set and a verification set, and taking the training set as the input of a decision tree model;
and training the decision tree model until the information gain of the features is lower than a threshold value, to obtain the trained decision tree model.
2. The method of claim 1, wherein the first sample set further comprises age, gender, weight, height, past history of diabetes, and history of allergies of the sample population.
3. The method of claim 1, wherein the nutrient intake is calculated from the age, sex, and food material information of the multi-day meal of each sample in the first sample set, and the nutrient intake is used as a feature of the second sample set.
4. The method of claim 3, wherein the nutrients comprise protein, fat, and glucose.
5. The method of any of claims 3 or 4, wherein the characteristics of the second sample set include age, glucose load, fat, protein, obesity, genetic history of diabetes.
6. The method of any one of claims 1-4, wherein the glucose load is used as a root node of the decision tree model.
7. A system for predicting diabetes based on eating habits is characterized by comprising an acquisition module, a matching module, a calculation module and a decision tree model,
the acquisition module is used for acquiring the age, sex, weight, height, past history of diabetes, allergy history and food material information of multi-day meals of a user;
the matching module is used for searching the daily intake of nutrients and the content of the nutrients according to the gender and the age of the user;
the calculation module is used for carrying out weighted calculation on the content of the nutrients retrieved by the matching module and comparing the content of the nutrients with the daily intake to obtain the characteristics of the intake of the nutrients;
the decision tree model predicts the probability of the user suffering from diabetes based on the nutrient intake characteristics.
8. The system for predicting diabetes based on eating habits according to claim 7, wherein the decision tree model comprises a model constructed by the model construction method for predicting diabetes based on eating habits according to any one of claims 1 to 6.
9. The system for predicting diabetes based on eating habits of claim 8, wherein the nutrient is glucose, fat, protein.
10. The system for predicting diabetes based on eating habits of claim 9, wherein the decision tree model is optimized by a random forest.
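The division of the second sample set into a training set and a verification set, as recited in claim 1, might look like the following minimal sketch. The 80/20 ratio, the fixed seed and the function name are assumptions made for illustration; the claims do not specify them. The 800 placeholder samples mirror the sample count given in the description.

```python
import random

def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into (training, verification)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder sample identifiers standing in for the second sample set.
second_sample_set = list(range(800))
training_set, verification_set = split_samples(second_sample_set)
print(len(training_set), len(verification_set))
```

Only the training set is passed to the decision tree model; the verification set is held back to check the trained model.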
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010664488.3A CN111599477A (en) | 2020-07-10 | 2020-07-10 | Model construction method and system for predicting diabetes based on eating habits |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111599477A (en) | 2020-08-28 |
Family
ID=72188249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010664488.3A Pending CN111599477A (en) | 2020-07-10 | 2020-07-10 | Model construction method and system for predicting diabetes based on eating habits |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111599477A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104391970A (en) * | 2014-12-04 | 2015-03-04 | 深圳先进技术研究院 | Attribute subspace weighted random forest data processing method |
CN107403072A (en) * | 2017-08-07 | 2017-11-28 | 北京工业大学 | A kind of diabetes B prediction and warning method based on machine learning |
CN110379488A (en) * | 2019-07-12 | 2019-10-25 | 深圳市预防宝科技有限公司 | A kind of pair of postprandial hyperglycemia carries out the device and method of early warning |
CN111383758A (en) * | 2020-03-06 | 2020-07-07 | 三七二二(北京)健康咨询有限公司 | Method and device for predicting postprandial blood glucose based on multidimensional data |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112133434A (en) * | 2020-09-17 | 2020-12-25 | 吾征智能技术(北京)有限公司 | Dietary habit-based hyperlipidemia auxiliary diagnosis system, device and storage medium |
CN112786200A (en) * | 2021-01-18 | 2021-05-11 | 吾征智能技术(北京)有限公司 | Intelligent diet evaluation system based on meal data |
CN112967807A (en) * | 2021-03-03 | 2021-06-15 | 吾征智能技术(北京)有限公司 | System, device and storage medium for predicting cerebral apoplexy based on eating behavior |
CN112967807B (en) * | 2021-03-03 | 2023-12-01 | 吾征智能技术(北京)有限公司 | System, device and storage medium for predicting cerebral apoplexy based on diet behavior |
CN113257422A (en) * | 2021-06-04 | 2021-08-13 | 福州大学 | Method and system for constructing disease prediction model based on glucose metabolism data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111599477A (en) | Model construction method and system for predicting diabetes based on eating habits | |
Vanhonacker et al. | The concept of farm animal welfare: Citizen perceptions and stakeholder opinion in Flanders, Belgium | |
Renner et al. | Why we eat what we eat. The Eating Motivation Survey (TEMS) | |
Kim et al. | Health knowledge and consumer use of nutritional labels: the issue revisited | |
Mohr et al. | Personal and lifestyle characteristics predictive of the consumption of fast foods in Australia | |
O'sullivan et al. | 21st century toolkit for optimizing population health through precision nutrition | |
Waynforth | Life-history theory, chronic childhood illness and the timing of first reproduction in a British birth cohort | |
CN110349647A (en) | Dietary management method, system, electronic equipment and storage medium | |
CN112951430A (en) | Multiple chronic disease joint management apparatus and computer-readable storage medium | |
Waltner et al. | Personalized dietary self-management using mobile vision-based assistance | |
US20230207136A1 (en) | Methods and systems for generating a vibrant compatbility plan using artificial intelligence | |
CN111816280A (en) | Disease prediction model construction method and system based on eating behavior | |
US20220399099A1 (en) | System and method for providing personalized dietary suggestion platform | |
Calder et al. | Factors influencing women’s choice of weight-loss diet | |
KR20220168142A (en) | System and method for providing personalized dietary suggestion platform | |
KR102298350B1 (en) | Method and device for providing customized health functional food manufacturing and recommendation information using microbiome | |
Hanley-Cook et al. | Food biodiversity: Quantifying the unquantifiable in human diets | |
CN116130058A (en) | Recipe recommendation method based on intelligent diet | |
US12001796B2 (en) | Methods and systems for personal recipe generation | |
Gatica-Perez et al. | Discovering eating routines in context with a smartphone app | |
RU2721234C1 (en) | Method and system for tracking a ration and forming an opinion on the quality of nutrition and / or individual nutrition recommendations | |
WO2021148545A1 (en) | System and method for data-driven individualized nutrition | |
US20220367050A1 (en) | Predicting gut microbiome diversity | |
Kassahun-Yimer et al. | A joint model for multivariate hierarchical semicontinuous data with replications | |
Gardner et al. | Three factors that need to be addressed more consistently in nutrition studies:“instead of what?”,“in what context?”, and “for what?” |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200828 |