WO2020181805A1

WO2020181805A1 - Diabetes prediction method and apparatus, storage medium, and computer device

Info

Publication number: WO2020181805A1
Application number: PCT/CN2019/117217
Authority: WO
Inventors: 金晓辉; 阮晓雯; 徐亮; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-03-12
Filing date: 2019-11-11
Publication date: 2020-09-17
Also published as: CN110197720A

Abstract

The present application discloses a diabetes prediction method and apparatus, a storage medium and a computer device, relating to the field of computer technology, being able to effectively solve the problem in the prior art that it can only be determined whether a user suffers from diabetes, but the severity of the disease thereof cannot be determined. Said method comprises: acquiring sample user data from original health profile and electronic medical record data; creating, according to user features in the sample user data, a regression prediction model of a numerical type; determining, using the regression prediction model, a first physical examination index value of fasting blood glucose of a target user and a second physical examination index value of blood glucose within a pre-set time period after a meal; and determining, according to the first physical examination index value and/or the second physical examination index value, the disease severity of the target user. The present application is applicable to the prediction of diabetes, and to the determination of the severity of diabetes.

Description

Diabetes prediction method and device, storage medium, and computer equipment

This application claims priority to Chinese patent application No. 2019101850792, entitled "Diabetes prediction method and device, storage medium, computer equipment", filed with the Chinese Patent Office on March 12, 2019, the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of computer technology, in particular to a method and device for predicting diabetes, a storage medium, and computer equipment.

Background technique

Diabetes is a group of metabolic diseases characterized by high blood sugar. It can damage large blood vessels and capillaries, endanger the heart, brain, kidneys, peripheral nerves, eyes, feet and other parts of the body, and lead to many complications. Strengthening the prediction of diabetes is therefore necessary. With the advancement of technology, the diagnosis of diseases is no longer limited to analysis by doctors, and using artificial intelligence to predict diabetes is in line with current development trends.

The inventors found that the method commonly used in the industry for diabetes prediction is to collect diabetes medical records, compare the data of diabetic patients with the data of healthy people, build a 0-1 classification model, and judge from the patient's feature data in various dimensions whether the user has diabetes. However, existing diabetes prediction methods can only judge whether a patient has diabetes and cannot judge the severity of the disease. The diagnosis result is therefore incomplete, supporting control and treatment cannot be matched to the degree of the disease, and the patient's condition may deteriorate.

Summary of the invention

In view of this, this application provides a diabetes prediction method and device, a storage medium, and computer equipment. The main purpose is to solve the problem that a constructed 0-1 classification model can only judge whether the user has diabetes and cannot judge the severity of the disease, which leads to an incomplete diagnosis.

According to one aspect of the present application, a method for predicting diabetes is provided. The method includes: obtaining sample user data from original health files and electronic medical record data; creating a numerical regression prediction model based on user characteristics in the sample user data; using the regression prediction model to determine a first physical examination index value, for the fasting blood glucose of the target user, and a second physical examination index value, for the blood glucose at a preset duration after a meal; and determining the degree of illness of the target user according to the first physical examination index value and/or the second physical examination index value.

According to another aspect of the present application, a device for predicting diabetes is provided, the device comprising:

The obtaining unit is used to obtain sample user data in the original health file and electronic medical record data;

The creation unit is used to create a numerical regression prediction model based on the user characteristics in the sample user data;

A judging unit, configured to use the regression prediction model to judge the first physical examination index value of the fasting blood glucose of the target user and the second physical examination index value of the blood glucose of the preset duration after a meal;

The determining unit is configured to determine the disease degree of the target user according to the first physical examination index value and/or the second physical examination index value.

According to another aspect of the present application, there is provided a non-volatile readable storage medium having computer-readable instructions stored thereon; when the computer-readable instructions are executed by a processor, the aforementioned diabetes prediction method is implemented. According to another aspect of the present application, there is provided a computer device including a non-volatile readable storage medium, a processor, and computer-readable instructions that are stored on the non-volatile readable storage medium and can run on the processor; when the processor executes the computer-readable instructions, the aforementioned diabetes prediction method is implemented.

With the above technical solutions, and compared with the current approach of predicting diabetes with a constructed 0-1 classification model, the diabetes prediction method and device, storage medium, and computer equipment provided by this application add a regression prediction model of fasting blood glucose and 2h postprandial blood glucose on top of the classification model. The regression prediction model can be used to determine the first physical examination index value, for the target user's fasting blood glucose, and the second physical examination index value, for the blood glucose at the preset duration after a meal. These physical examination index values can be used to determine whether the target user has diabetes and, further, to judge the degree of the target user's disease.

The above description is only an overview of the technical solution of this application. So that the technical means of this application can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other purposes, features and advantages of this application are more obvious and understandable, specific implementations of this application are set out below.

Description of the drawings

The drawings described here are used to provide a further understanding of the application and constitute a part of the application. The exemplary embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of the application. In the drawings:

FIG. 1 shows a schematic flow chart of a method for predicting diabetes according to an embodiment of the present application;

FIG. 2 shows a schematic flow chart of another method for predicting diabetes according to an embodiment of the present application;

FIG. 3 shows a schematic structural diagram of a diabetes prediction device provided by an embodiment of the present application;

FIG. 4 shows a schematic structural diagram of another diabetes prediction device provided by an embodiment of the present application.

Detailed description

Hereinafter, the present application will be described in detail with reference to embodiments and in conjunction with the drawings. It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other if there is no conflict. To address the problem that the severity of diabetes cannot be judged from user data when a constructed 0-1 classification model is used to predict diabetes, this embodiment provides a method for predicting diabetes. As shown in FIG. 1, the method includes:

101. Obtain sample user data from original health files and electronic medical record data. The sample user data may include patient visit data, physical examination index data, medication data, and health notification data.

102. Create a numerical regression prediction model based on user characteristics in the sample user data. The user characteristics can include feature data in multiple dimensions, such as fasting blood glucose and 2h postprandial blood glucose, blood pressure, skinfold thickness, insulin, body mass index (BMI), diabetes genetic information, age, and diagnosis results. In specific implementations, the regression prediction model can be built from several different decision-tree-based framework models, that is, ensemble learning is used to combine multiple decision-tree-based prediction models and so improve the accuracy of the prediction results. A decision tree is a relatively simple supervised learning algorithm for classification. It is a predictive model that represents a mapping between object attributes and object values: each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf node. A decision tree has only a single output; if multiple outputs are needed, independent decision trees can be built to handle the different outputs. Decision tree algorithms include ID3, C4.5, and CART. What they have in common is that they are all greedy algorithms; they differ in the split criterion. For example, ID3 uses information gain, while C4.5 uses the information gain ratio. The regression prediction model created in this way reflects the fasting blood glucose values and 2h postprandial blood glucose values of sample users with different blood pressure, skinfold thickness, insulin, BMI, diabetes genetic information, age, diagnosis results, and so on.
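To make the difference between the ID3 and C4.5 criteria mentioned in step 102 concrete, the following is a minimal sketch of information gain and information gain ratio for a categorical attribute. The toy data (`bmi_band`, `diagnosis`) and the function names are hypothetical and chosen only for illustration; they do not come from the application.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """ID3 criterion: entropy reduction obtained by splitting on an attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: one attribute and a 0/1 diagnosis label.
bmi_band = ["high", "high", "normal", "normal", "high", "normal"]
diagnosis = [1, 1, 0, 0, 1, 0]
print(information_gain(bmi_band, diagnosis))
print(gain_ratio(bmi_band, diagnosis))
```

The gain ratio normalizes by the entropy of the attribute itself, which is why C4.5 is less biased than ID3 toward attributes with many distinct values.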

103. Use the regression prediction model to determine the first physical examination index value, for the fasting blood glucose of the target user, and the second physical examination index value, for the blood glucose at the preset duration after a meal. The target user is a user whose diabetes condition needs to be predicted; the first physical examination index value corresponds to the test result for the target user's fasting blood glucose; the second physical examination index value corresponds to the test result for the target user's blood glucose at the preset duration after a meal; and the preset duration can be set according to actual needs. In this embodiment, based on the fasting blood glucose values and 2h postprandial blood glucose values of sample users with different characteristics, the target user's characteristics are matched against those of the sample users to find the fasting blood glucose value and 2h postprandial blood glucose value corresponding to the characteristics of the matched sample user.

104. Determine the disease degree of the target user according to the first physical examination index value and/or the second physical examination index value.

In a specific application scenario, whether the target user's fasting blood glucose is normal can be judged from the first physical examination index value, and whether the target user's blood glucose at the preset duration after a meal is normal can be judged from the second physical examination index value. When the first physical examination index value and/or the second physical examination index value is abnormal, it can be judged that the user has diabetes, and the degree of the disease can be further judged by comparison with thresholds. With the diabetes prediction method in this embodiment, a numerical regression prediction model can be created based on the user characteristics in the sample user data, the regression prediction model can be used to determine the first physical examination index value of the target user's fasting blood glucose and the second physical examination index value of the blood glucose at the preset duration after a meal, and whether the target user is ill and how severe the disease is can be determined from the first physical examination index value and/or the second physical examination index value. The diagnosis result is thus more accurate and the diagnosis content more complete, and timely and effective supporting treatment can be given according to the stage of the diabetes to curb the development of the disease.
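The threshold comparison described above can be sketched as follows. The application does not state the thresholds or the units, so the cut-offs below are assumptions: they are commonly cited WHO diagnostic values in mmol/L (7.0 fasting and 11.1 two hours after a glucose load for diabetes, 6.1 and 7.8 as the lower bounds of impaired regulation). The category names and the function `assess` are likewise hypothetical.

```python
# Assumed cut-offs in mmol/L (commonly cited WHO values; not stated in the source).
FASTING_DIABETES = 7.0
POSTMEAL_2H_DIABETES = 11.1
FASTING_IMPAIRED = 6.1
POSTMEAL_2H_IMPAIRED = 7.8

def assess(fasting=None, postmeal_2h=None):
    """Map predicted glucose values to a coarse category.

    Either value may be None, mirroring the "and/or" in the method.
    """
    if fasting is None and postmeal_2h is None:
        raise ValueError("at least one index value is required")
    if (fasting is not None and fasting >= FASTING_DIABETES) or \
       (postmeal_2h is not None and postmeal_2h >= POSTMEAL_2H_DIABETES):
        return "diabetes"
    if (fasting is not None and fasting >= FASTING_IMPAIRED) or \
       (postmeal_2h is not None and postmeal_2h >= POSTMEAL_2H_IMPAIRED):
        return "elevated"
    return "normal"

print(assess(fasting=5.2, postmeal_2h=6.5))
print(assess(fasting=6.5))
print(assess(postmeal_2h=12.0))
```

Because the regression model outputs numerical values rather than a 0/1 label, finer severity bands could be added in the same way by inserting further thresholds.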

Further, as a refinement and extension of the specific implementation of the foregoing embodiment, in order to fully explain the specific implementation process in the embodiment of the present application, another method for predicting diabetes is provided. As shown in FIG. 2, the method includes:

201. Obtain sample user data from original health files and electronic medical record data. For example, a total of about 100 sample user data with complete user characteristics are obtained from the original health file and electronic medical record data, and then the sample user data is further analyzed and processed.

Two prediction methods are described below. One uses the fasting blood glucose value as the physical examination index value for prediction (the process shown in steps 202a to 205a); the other uses the blood glucose value two hours after a meal as the physical examination index value for prediction (the process shown in steps 202b to 205b).

202a. Use the fasting blood glucose value in the user characteristics of the sample user as label information Y1, and use the target feature data of the sample user, excluding the fasting blood glucose value and the two-hour postprandial blood glucose value, as the feature information X1 to create a first model training set. The user characteristics are extracted from the sample user data using regular expressions. The target feature data includes one or more of the sample user's medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data; for example, it may include the user's medical history, hospitalization, medication, physical examination, health notification and other related information. The created first model training set contains each item of feature information X1 and its corresponding label information Y1, that is, the fasting blood glucose values of sample users with different medical history records, hospitalization records, medication status, physical examination status, and health notifications.
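A minimal sketch of step 202a follows: numeric fields are pulled out of free-text records with a regular expression, the fasting value becomes the label, and both glucose values are excluded from the features. The record format, the field names (`fasting_glucose`, `glucose_2h` and so on) and the helper functions are hypothetical; the application does not specify them.

```python
import re

# Hypothetical free-text records; field names and format are illustrative only.
records = [
    "age: 54; bmi: 27.3; systolic: 138; fasting_glucose: 7.4; glucose_2h: 11.9",
    "age: 41; bmi: 22.1; systolic: 118; fasting_glucose: 5.1; glucose_2h: 6.8",
]

FIELD = re.compile(r"(\w+):\s*([-+]?\d+(?:\.\d+)?)")
EXCLUDED = {"fasting_glucose", "glucose_2h"}  # never used as input features

def parse(record):
    """Extract numeric fields from one record with a regular expression."""
    return {name: float(value) for name, value in FIELD.findall(record)}

def build_training_set(records, label):
    """Return (X, Y): feature dicts without either glucose value, and the labels."""
    X, Y = [], []
    for rec in map(parse, records):
        if label not in rec:
            continue  # a sample without the label cannot be used for training
        X.append({k: v for k, v in rec.items() if k not in EXCLUDED})
        Y.append(rec[label])
    return X, Y

X1, Y1 = build_training_set(records, "fasting_glucose")
print(X1, Y1)
```

Calling the same function with `"glucose_2h"` as the label would give the analogous feature and label sets for the two-hour postprandial value.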

203a. Train a first recognition model for judging the value of the first physical examination index based on a preset regression prediction algorithm through the first model training set.

The preset regression prediction algorithm is obtained by fusing four algorithms: Random Forest, Gradient Boosting Decision Tree (GBDT), XGBoost, and LightGBM. The first recognition model is evaluated with the mean absolute percentage error (MAPE) index: when the MAPE value of the first recognition model is less than the preset standard comparison threshold, the first recognition model is determined to meet the evaluation standard. MAPE measures the error between the model's predicted value and the true value. Common evaluation indices for regression models are MAE, MSE, RMSE and MAPE; MAE, MSE, and RMSE consider only the magnitude of the error, whereas MAPE also considers the ratio between the error and the true value. The calculation formula is:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{X_i - Y_i}{X_i} \right|$$

In the above formula, N is the total number of samples, X is the measured (true) value, and Y is the simulated (predicted) value. The smaller the MAPE value, the smaller the error between the model's predicted value and the true value. In specific implementations, the standard comparison threshold can be set according to the actual situation. When the MAPE is less than the standard comparison threshold, the first recognition model meets the evaluation standard. Predicting with a recognition model that meets the evaluation standard helps ensure the accuracy of the prediction results. The first mapping relationship between the feature information X1 and the label information Y1 can be determined from the first recognition model that meets the evaluation standard. To illustrate how the preset regression prediction algorithm, obtained by fusing the above four algorithms, is used to obtain the first recognition model, the process may, as an optional method, specifically include:

(1) Use random sampling to obtain a first, a second, a third, and a fourth training sample set from the first model training set. For example, n training samples are drawn at random from the first model training set in each of four rounds, giving four training sets (the four training sets are independent of each other, and their elements may repeat);

(2) Train a first classifier on the first training sample set using the random forest algorithm; train a second classifier on the second training sample set using the GBDT algorithm; train a third classifier on the third training sample set using the XGBoost algorithm; and train a fourth classifier on the fourth training sample set using the LightGBM algorithm. Each training sample set contains different feature information X1 and the corresponding label information Y1. Each of the four classifiers is trained with its corresponding model training algorithm, and each can be used on its own to predict a user's diabetes: the feature data of the user to be tested (whose content corresponds to the feature information X1) is input, and the classifier finds the corresponding label information Y1.

For the specific training process of the first classifier: ① From the first training sample set, use the bootstrap method to draw m samples at random with replacement; repeat the sampling n times to generate n training sets. ② Train one decision tree model on each of the n training sets (each can be built with the ID3 algorithm, the C4.5 algorithm, the CART algorithm, or other existing algorithms). ③ For a single decision tree model, assuming the training samples have n features, each split selects the best feature according to information gain, information gain ratio, or the Gini index. ④ Each tree keeps splitting in this way until all training examples at a node belong to the same category; the decision tree does not need to be pruned during splitting. ⑤ The generated decision trees form a random forest. For regression problems, the final prediction is the average of the predicted values of the individual trees, and this average is the prediction result of the first classifier.
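Steps ① to ⑤ can be sketched in plain Python as below. This is a deliberately simplified illustration, not the application's implementation: it assumes a single numeric feature, uses one-split regression trees ("stumps") chosen by squared error in place of fully grown ID3/C4.5/CART trees, and all function names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature.

    Returns (threshold, left_value, right_value) minimising squared error.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lv, rv = mean(left), mean(right)
        err = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x identical: predict the mean everywhere
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def fit_forest(xs, ys, n_trees=25, m=None, seed=0):
    """Train n_trees stumps, each on m samples drawn with replacement (bootstrap)."""
    rng = random.Random(seed)
    m = m or len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(m)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Regression output: the average of the individual tree predictions."""
    return mean(predict_stump(s, x) for s in forest)

# Hypothetical toy data: BMI as the only feature, fasting glucose as the label.
bmi = [20, 22, 24, 30, 32, 34]
glucose = [5.0, 5.1, 5.2, 7.5, 7.8, 8.0]
forest = fit_forest(bmi, glucose)
print(predict_forest(forest, 21), predict_forest(forest, 33))
```

Averaging over trees trained on different bootstrap samples is what reduces the variance of the individual trees.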

For the specific training process of the second classifier: the input is the second training sample set T = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, the maximum number of iterations T, and the loss function L. The output is the strong learner f(x):

A) Initialize the weak learner:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c)$$

where c is a constant.

B) For iterations t = 1, 2, ..., T:

a) For samples i = 1, 2, ..., m, calculate the negative gradient r_ti:

$$r_{ti} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{t-1}(x)}$$

b) Use (x_i, r_ti) (i = 1, 2, ..., m) to fit a CART regression tree, giving the t-th regression tree with leaf node regions R_tj, j = 1, 2, ..., J, where J is the number of leaf nodes of regression tree t.

c) For leaf regions j = 1, 2, ..., J, calculate the best fit value c_tj:

$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L(y_i, f_{t-1}(x_i) + c)$$

d) Update the strong learner:

$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$

where I(x ∈ R_tj) equals 1 when sample x falls in leaf node region R_tj and 0 otherwise.

C) Obtain the expression of the strong learner f(x):

$$f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$

The second classifier is obtained by training based on the above strong learner f(x).
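Steps A) to C) can be illustrated with a small gradient boosting sketch. It assumes the squared loss L = ½(y − f)², for which the initial constant is the label mean, the negative gradient is the residual, and the best leaf value is the mean residual in the leaf. One-split regression trees on a single feature stand in for CART trees; these choices and all names are assumptions made for brevity.

```python
from statistics import mean

def fit_stump(xs, rs):
    """One-split regression tree: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lv, rv = mean(left), mean(right)  # leaf values: mean residual in each region
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x identical
        m = mean(rs)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def fit_gbdt(xs, ys, n_rounds=10):
    f0 = mean(ys)                      # A) constant that minimises squared loss
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):          # B) iterations t = 1..T
        residuals = [y - p for y, p in zip(ys, preds)]  # a) negative gradient
        stump = fit_stump(xs, residuals)                 # b) and c) fit tree, leaf values
        stumps.append(stump)
        preds = [p + predict_stump(stump, x) for p, x in zip(preds, xs)]  # d) update
    return f0, stumps

def predict_gbdt(model, x):
    f0, stumps = model                 # C) f(x) = f0 + sum of all tree outputs
    return f0 + sum(predict_stump(s, x) for s in stumps)

# Hypothetical toy data.
bmi = [20, 22, 24, 30, 32, 34]
glucose = [5.0, 5.1, 5.2, 7.5, 7.8, 8.0]
model = fit_gbdt(bmi, glucose)
print(predict_gbdt(model, 21), predict_gbdt(model, 33))
```

Practical implementations usually add a learning rate that shrinks each tree's contribution; it is omitted here because the steps above do not mention one.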

For the specific training process of the third classifier:

A) Establish an initial model, specifically as follows:

where k is the number of trees, F is the set of tree structures constructed, and x_i is the i-th sample; the sum of the scores of x_i on each tree is the predicted value of x_i.

The objective function of this initial model is

yi is the actual value of the sample corresponding to xi.

B) As the tree grows, through the t-round formula recursion, the final objective function is obtained as

among them,

I_j denotes all samples contained in the j-th leaf, w_j denotes the weight of the j-th leaf, and in the term γT, T is the number of leaves.

C) Fit the initial model above to the data of the third training sample set, and use the final objective function above to measure how well the model fits the training data (that is, the objective function is used to calculate the loss; the smaller the loss, the better the model fits the training data), until the bias and variance of the model meet the standard requirements. The model trained in this way is the third classifier.
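For orientation, the model and objectives referred to in steps A) and B) can be written out under the standard XGBoost formulation. This is an assumption: the expressions below are the textbook forms, using the symbols named in the text (x_i, y_i, F, I_j, w_j, γ, T) plus the usual regularization coefficient λ and the gradient statistics g_i and h_i, and they are not confirmed to match the application's own formulas term for term.

```latex
% Additive tree model: K trees f_k drawn from the space F of tree structures
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in F

% Regularized objective, with l the loss and \Omega the tree complexity
\mathrm{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2

% Objective at round t after a second-order expansion, grouped by leaf
\mathrm{Obj}^{(t)} \approx \sum_{j=1}^{T} \Big[ G_j w_j + \tfrac{1}{2} (H_j + \lambda) w_j^2 \Big] + \gamma T,
\qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i

% First and second derivatives of the loss at the previous round's prediction
g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big), \qquad
h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)
```

In this form the optimal weight of leaf j is w_j = −G_j / (H_j + λ), which is how the per-leaf grouping over I_j turns tree fitting into a closed-form scoring of candidate splits.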

For the specific training process of the fourth classifier: A) Use the existing LightGBM algorithm to fit the data in the fourth training sample set, and test the model obtained after each fit on a test set selected from the fourth training sample set to obtain the corresponding coefficient of determination and mean square error; B) When the coefficient of determination is greater than a certain threshold and the mean square error is less than a certain threshold, the fitted model is determined to meet the standard and is taken as the fourth classifier.
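The acceptance test in step B) can be sketched as follows. The coefficient of determination and the mean square error are computed in plain Python; the threshold values `min_r2` and `max_mse` are arbitrary placeholders, since the application only says "a certain threshold", and the function names are hypothetical.

```python
def mse(actual, predicted):
    """Mean square error between measured and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination; requires that `actual` is not constant."""
    avg = sum(actual) / len(actual)
    ss_tot = sum((a - avg) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot

def meets_standard(actual, predicted, min_r2=0.8, max_mse=0.5):
    """Accept the fitted model only if both criteria hold (thresholds are placeholders)."""
    return r_squared(actual, predicted) > min_r2 and mse(actual, predicted) < max_mse

# Hypothetical test-set values.
actual = [5.0, 6.0, 7.0, 8.0]
good = [5.1, 5.9, 7.2, 7.8]
flat = [6.5, 6.5, 6.5, 6.5]
print(meets_standard(actual, good), meets_standard(actual, flat))
```

Requiring both criteria guards against a model that has a low absolute error only because the test labels vary little.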

(3) Finally, the first, second, third, and fourth classifiers are fused by bagging to obtain the first recognition model. The fusion works by voting, that is, the majority principle: the minority defers to the majority. For example, after the medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data of the user to be tested are input into the four classifiers, if the fasting blood glucose values predicted by three of the four classifiers meet the criteria for diabetes, the user to be tested can be determined to have diabetes; if the fasting blood glucose value predicted by only one classifier meets the criteria for diabetes and the values predicted by the other three do not, the user to be tested can be determined not to have diabetes.
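The voting rule in step (3) can be sketched as a small function. The diagnostic threshold of 7.0 (a commonly cited fasting cut-off in mmol/L) is an assumption, as is returning `None` for an even split, a case this step does not cover.

```python
def vote(fasting_predictions, threshold=7.0):
    """Majority vote over per-classifier predictions.

    Returns True (diabetes), False (no diabetes), or None on a tie.
    The threshold is an assumed diagnostic cut-off, not a value from the source.
    """
    positive = sum(1 for v in fasting_predictions if v >= threshold)
    negative = len(fasting_predictions) - positive
    if positive == negative:
        return None
    return positive > negative

print(vote([7.4, 7.1, 6.8, 7.9]))  # three of four meet the criterion
print(vote([7.4, 6.0, 5.8, 6.5]))  # only one of four meets the criterion
```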

It should be noted that if the MAPE value of the first recognition model is greater than the preset standard comparison threshold, that is, the trained first recognition model does not meet the evaluation standard, the first model training set can be re-divided to obtain new first, second, third, and fourth training sample sets. The new first training sample set is then used to continue training the first classifier, the new second training sample set to continue training the second classifier, the new third training sample set to continue training the third classifier, and the new fourth training sample set to continue training the fourth classifier. The four newly trained classifiers are fused, and it is determined whether the MAPE value of the new first recognition model is less than the preset standard comparison threshold. If it is still greater than the threshold, the process of re-dividing the model training set and retraining the classifiers is repeated until the MAPE value of the latest first recognition model is less than the preset standard comparison threshold, that is, until it meets the evaluation standard.
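The evaluate-and-retrain loop can be sketched as below. MAPE is computed as a percentage from measured and predicted values. The callable `train_and_predict` is hypothetical: it stands for training and fusing the four classifiers and returning the fused model together with measured and predicted values on evaluation data. The `max_attempts` limit is an added safeguard, since the text describes an open-ended loop.

```python
import random

def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(measured) * sum(
        abs((x - y) / x) for x, y in zip(measured, predicted)
    )

def train_until_acceptable(training_set, train_and_predict, threshold,
                           max_attempts=20, seed=0):
    """Re-divide the training set and retrain until MAPE < threshold.

    train_and_predict (hypothetical) receives four sample sets and returns
    (model, measured_values, predicted_values).
    """
    rng = random.Random(seed)
    n = len(training_set)
    for attempt in range(1, max_attempts + 1):
        # Re-divide: four independent draws of n samples, with replacement.
        subsets = [rng.choices(training_set, k=n) for _ in range(4)]
        model, measured, predicted = train_and_predict(subsets)
        score = mape(measured, predicted)
        if score < threshold:
            return model, score, attempt
    raise RuntimeError("no model met the MAPE threshold")
```

A caller would pass its own training routine as `train_and_predict`; the function returns the accepted model, its MAPE, and the number of attempts needed.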

204a. Input the characteristic information of the target user into the first recognition model to perform similarity matching with the feature information X1. The characteristic information of the target user is the target user's target feature data, excluding the fasting blood glucose value and the blood glucose value two hours after a meal. As an optional method, step 204a may specifically include: performing data cleaning, feature extraction, missing value filling, and outlier processing on the characteristic information of the target user to obtain feature information in the form of structured data; and performing similarity matching between that structured feature information and the feature information X1.

The characteristic information of the target user sometimes contains useless data, missing values, and/or outliers, that is, it is unstructured data that is not suitable for direct prediction with the first recognition model. The characteristic information can therefore be cleaned first to remove useless data (for example, the user's current place of residence or place of household registration), keeping only the medical history data, hospitalization data, medical treatment data, physical examination data, health notification data and the like. Features are then extracted from the retained data (for example, medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data). If the extracted feature data has missing values, they can be filled with 0 (for example, a height or weight missing from the user's physical examination data can be filled with 0), so that subsequent matching against the feature information X1 in the model remains comparable and no-match errors during feature matching are avoided. If the extracted feature data contains outliers, they can be corrected with reference to the actual situation (for example, a hospitalization length of 99999 days is obviously abnormal; the correct length can be calculated from the start and end times of the hospitalization and substituted). Through the data cleaning, feature extraction, missing value filling, and outlier processing in this optional method, structured data comparable to the feature information X1 in the first recognition model is obtained, no-match errors during feature matching are avoided, outliers are removed, and the accuracy of feature matching is improved.
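The preprocessing described above might look like the following sketch. The field names (`height_cm`, `weight_kg`, `hospital_days`, `admitted`, `discharged`, `residence`) and the plausibility bound are hypothetical; only the three operations (dropping unused fields, filling missing values with 0, and recomputing an implausible hospitalization length from its dates) come from the text.

```python
from datetime import date

KEPT_FIELDS = ("height_cm", "weight_kg", "hospital_days")  # hypothetical feature names
MAX_PLAUSIBLE_STAY = 3650  # assumed sanity bound, in days

def clean(raw):
    """Return a structured feature dict from one raw user record."""
    # Data cleaning and feature extraction: keep only the model's features,
    # filling any that are missing (or None) with 0.
    features = {name: raw.get(name) or 0 for name in KEPT_FIELDS}
    # Outlier handling: recompute an implausible stay from the admission
    # and discharge dates when both are available.
    if features["hospital_days"] > MAX_PLAUSIBLE_STAY:
        if raw.get("admitted") and raw.get("discharged"):
            features["hospital_days"] = (raw["discharged"] - raw["admitted"]).days
        else:
            features["hospital_days"] = 0
    return features

raw_record = {
    "residence": "Shenzhen",      # useless for prediction, dropped
    "height_cm": 170,
    "hospital_days": 99999,       # obviously abnormal
    "admitted": date(2019, 1, 1),
    "discharged": date(2019, 1, 11),
}
print(clean(raw_record))
```

Filling with 0 follows the text; in practice a fill value such as the feature mean may distort similarity matching less, but that is a design choice outside what the application describes.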

205a. Determine the first physical examination index value of the target user from the first mapping relationship and the feature information X1 whose similarity is both greater than the preset threshold and the highest. The preset threshold can be set in advance according to actual needs: the larger the threshold, the more accurate the corresponding feature match, and a similarity of 100% means the features match completely. For example, inputting the target user's medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data into the first recognition model is equivalent to inputting these data into the four classifiers of step 203a. In each classifier, the data is similarity-matched against that classifier's feature information, the feature information that is most similar and above the threshold is found, and the corresponding physical examination index value, namely the target user's fasting blood glucose value, is obtained from each of the four classifiers. If three of the four fasting blood glucose values meet the criteria for diabetes, the target user can be determined to have diabetes, and the average of these three values is taken as the first physical examination index value calculated by the first recognition model. If two of the four fasting blood glucose values meet the criteria for diabetes and the other two do not, the average of all four values is taken as the first physical examination index value calculated by the first recognition model, and whether the target user has diabetes is determined from this average.
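A minimal sketch of step 205a follows. The application does not name a similarity measure, so cosine similarity over numeric feature vectors is an assumption, as are the threshold values and function names. The fusion rule follows the two cases given in the text; averaging the majority values when most classifiers predict a value below the criterion is an assumed symmetric extension.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(target, samples, min_similarity=0.9):
    """Label of the most similar sample whose similarity exceeds the threshold, else None."""
    best_label, best_sim = None, min_similarity
    for features, label in samples:
        sim = cosine_similarity(target, features)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

def fuse(values, threshold=7.0):
    """Combine the classifiers' values into one index value and a diagnosis.

    A strict majority: average the majority's values (the negative-majority
    case is an assumption). An even split: average all values.
    """
    positive = [v for v in values if v >= threshold]
    negative = [v for v in values if v < threshold]
    if len(positive) > len(negative):
        chosen = positive
    elif len(negative) > len(positive):
        chosen = negative
    else:
        chosen = values
    value = sum(chosen) / len(chosen)
    return value, value >= threshold

print(best_match([1.0, 0.0], [([1.0, 0.0], 5.0), ([0.0, 1.0], 9.0)]))
print(fuse([7.4, 7.1, 6.8, 7.9]))
print(fuse([7.4, 7.2, 6.0, 5.5]))
```

Cosine similarity is sensitive to feature scale, so real feature vectors would normally be standardized before matching.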

202b. In parallel with step 202a, use the blood glucose value two hours after a meal in the user characteristics of the sample user as the label information Y2, and use the target feature data of the sample user, excluding the fasting blood glucose value and the two-hour postprandial blood glucose value, as the feature information X2 to create a second model training set. Step 202b is similar to step 202a. The target feature data of the sample user includes one or more of the sample user's medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data. The created second model training set contains each item of feature information X2 and its corresponding label information Y2, that is, the two-hour postprandial blood glucose values of sample users with different medical history records, hospitalization records, medication status, physical examination status, and health notifications.

203b. Train, on the second model training set and based on the preset regression prediction algorithm, a second recognition model for judging the second physical examination index value. The second recognition model is also evaluated with the MAPE index: when its MAPE value is less than the preset standard comparison threshold, the second recognition model is determined to meet the evaluation standard, and the second mapping relationship between the feature information X2 and the label information Y2 can be determined from the second recognition model that meets the standard. As an optional method, step 203b may specifically include: (1) using random sampling to obtain a fifth, a sixth, a seventh, and an eighth training sample set from the second model training set; (2) training a fifth classifier on the fifth training sample set using the random forest algorithm, a sixth classifier on the sixth training sample set using the GBDT algorithm, a seventh classifier on the seventh training sample set using the XGBoost algorithm, and an eighth classifier on the eighth training sample set using the LightGBM algorithm; (3) fusing the fifth, sixth, seventh, and eighth classifiers by bagging to obtain the second recognition model.

Similar to the optional manner in step 203a, the specific fusion processing is also a voting process, that is, the majority principle is adopted and the minority defers to the majority. For example, after the medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data of the user to be tested are input into these four classifiers, if the two-hour postprandial blood glucose values predicted by three of the four classifiers meet the criterion for diabetes, it can be determined that the user to be tested has diabetes; if the two-hour postprandial blood glucose value predicted by only one classifier meets the criterion for diabetes and the values predicted by the other three classifiers do not, it can be determined that the user to be tested does not have diabetes. It should be noted that if the MAPE index value of the second recognition model is greater than the preset standard comparison threshold, that is, the trained second recognition model does not meet the evaluation standard, the second model training set can be re-divided to obtain a new fifth training sample set, sixth training sample set, seventh training sample set, and eighth training sample set. The new fifth training sample set is then used to continue training the fifth classifier, the new sixth training sample set to continue training the sixth classifier, the new seventh training sample set to continue training the seventh classifier, and the new eighth training sample set to continue training the eighth classifier. The four newly trained classifiers are then fused, and it is determined whether the MAPE index value of the new second recognition model is less than the preset standard comparison threshold.
If it is still greater than the preset standard comparison threshold, the process of re-dividing the model training set and retraining the classifiers is repeated until the MAPE index value of the most recently obtained second recognition model is less than the preset standard comparison threshold, that is, until it meets the evaluation standard.
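The MAPE evaluation and the repeat-until-acceptable loop can be sketched as follows. MAPE is computed here as a fraction (0.1 means 10%). The callback `train_and_score` and the `max_rounds` cap are assumptions added for the sketch; the application does not state an upper bound on the number of repetitions.

```python
from typing import Callable, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%).

    Actual values must be non-zero, which holds for blood glucose readings.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def train_until_acceptable(
    train_and_score: Callable[[int], float],
    threshold: float,
    max_rounds: int = 10,
) -> int:
    """Re-divide and retrain until the MAPE falls below the threshold.

    `train_and_score(round_index)` is a hypothetical callback that re-samples
    the training set, retrains the base learners, fuses them, and returns the
    MAPE of the new model. Returns the index of the first acceptable round.
    """
    for round_index in range(max_rounds):
        if train_and_score(round_index) < threshold:
            return round_index
    raise RuntimeError("model did not reach the MAPE threshold")
```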

204b. Input the feature information of the target user into the second recognition model to perform similarity matching with the feature information X2. In this step, the feature information of the target user corresponds to the target feature data of the target user other than the fasting blood glucose value and the two-hour postprandial blood glucose value. As an optional manner, step 204b may specifically include: subjecting the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information in the form of structured data; and performing similarity matching between the feature information of the structured data and the feature information X2.
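Two of the preprocessing operations named in this step, missing value filling and outlier processing, might look like the following minimal sketch. Median imputation and clamping to a plausible range are assumed choices; the application does not say which filling or outlier strategy is used.

```python
from statistics import median
from typing import Optional


def fill_missing(values: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values: list[float], low: float, high: float) -> list[float]:
    """Clamp each value to the plausible range [low, high]."""
    return [min(max(v, low), high) for v in values]
```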

Similar to the optional manner in step 204a, the series of processing operations in this optional manner, such as data cleaning, feature extraction, missing value filling, and outlier processing, ensures that the target user's data is comparable structured data when it is matched with the feature information X2 in the second recognition model. This avoids mismatch errors during feature matching, removes outliers, and improves the accuracy of feature matching.
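The similarity matching itself is not specified in detail. One possible reading, assuming numeric feature vectors and cosine similarity as the measure (both assumptions of this sketch), selects the stored feature information with the highest similarity above a predetermined threshold:

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    target: Sequence[float],
    candidates: Sequence[Sequence[float]],
    threshold: float,
) -> Optional[int]:
    """Index of the most similar candidate above the threshold, else None."""
    best_index, best_score = None, threshold
    for index, candidate in enumerate(candidates):
        score = cosine_similarity(target, candidate)
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

The returned index would then be used with the mapping relationship to look up the corresponding label information.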

205b. Use the feature information X2 whose similarity is the highest and is greater than the predetermined threshold, together with the second mapping relationship, to determine the second physical examination index value corresponding to the target user. The predetermined threshold can be preset according to actual needs: the larger the predetermined threshold, the higher the matching accuracy of the corresponding feature, and a similarity of 100% means that the feature is completely matched. For example, inputting the medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data of the target user into the second recognition model is equivalent to inputting these data into the four classifiers of step 203b, performing similarity matching against the feature information corresponding to each classifier, and finding the feature information that is most similar and exceeds the threshold. The corresponding physical examination index values, that is, the two-hour postprandial blood glucose values of the target user, are then obtained through the four classifiers. If three of the four two-hour postprandial blood glucose values meet the criterion for diabetes, it can be determined that the target user has diabetes, and the average of these three values is taken as the second physical examination index value calculated by the second recognition model. If two of the four two-hour postprandial blood glucose values do not meet the criterion for diabetes and the other two do, the average of all four values is taken as the second physical examination index value calculated by the second recognition model, and whether the target user has diabetes is determined based on this average value.
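The fusion rule described in this step can be sketched as follows. The three-of-four and two-versus-two cases follow the text; averaging the majority's values when the majority falls below the threshold is an assumption, since the text only states the outcome for that case. The function name is hypothetical.

```python
from statistics import mean


def fuse_predictions(values: list[float], threshold: float) -> tuple[float, bool]:
    """Fuse per-classifier glucose predictions into (index value, has_diabetes).

    A strict majority decides the outcome, and the index value is the mean of
    the majority's predictions. On a tie, the mean of all predictions is used
    and compared against the threshold.
    """
    positive = [v for v in values if v >= threshold]
    negative = [v for v in values if v < threshold]
    if len(positive) > len(negative):
        return mean(positive), True
    if len(negative) > len(positive):
        # Assumption: the source does not say which values are averaged here.
        return mean(negative), False
    overall = mean(values)
    return overall, overall >= threshold
```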

206. Determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.

As an optional manner, step 206 may specifically include: if the first physical examination index value corresponding to the target user is greater than or equal to a first preset threshold, and/or the second physical examination index value is greater than or equal to a second preset threshold, determining that the target user suffers from diabetes; then judging the disease severity of the target user through the first numerical interval in which the first physical examination index value lies and/or the second numerical interval in which the second physical examination index value lies. The first preset threshold is determined according to the established fasting blood glucose criterion for diagnosing diabetes, such as 7.0 mmol/L; the second preset threshold is determined according to the established two-hour postprandial blood glucose criterion for diagnosing diabetes, such as 11.1 mmol/L. For example, if the first physical examination index value of the target user is determined to be 8.0 mmol/L and the second physical examination index value is 7.6 mmol/L, the target user can be determined to have diabetes because the first physical examination index value is greater than the first preset threshold. If the first physical examination index value of the target user is determined to be 5.7 mmol/L and the second physical examination index value is 11.9 mmol/L, the target user can be determined to have diabetes because the second physical examination index value is greater than the second preset threshold. If the first physical examination index value of the target user is 8.3 mmol/L and the second physical examination index value is 11.7 mmol/L, the target user can be determined to have diabetes because the first physical examination index value is greater than the first preset threshold and the second physical examination index value is greater than the second preset threshold.
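The threshold test above reduces to a short predicate. The sketch below uses the example thresholds of 7.0 mmol/L and 11.1 mmol/L from the text; the function and constant names are hypothetical, and either value may be omitted to reflect the "and/or" wording.

```python
from typing import Optional

FASTING_THRESHOLD = 7.0        # mmol/L, first preset threshold (example value)
POSTPRANDIAL_THRESHOLD = 11.1  # mmol/L, second preset threshold (example value)


def has_diabetes(
    fasting: Optional[float] = None, postprandial: Optional[float] = None
) -> bool:
    """Return True if either available index value reaches its threshold."""
    if fasting is None and postprandial is None:
        raise ValueError("at least one index value is required")
    fasting_hit = fasting is not None and fasting >= FASTING_THRESHOLD
    postprandial_hit = (
        postprandial is not None and postprandial >= POSTPRANDIAL_THRESHOLD
    )
    return fasting_hit or postprandial_hit
```

With the values from the examples, `has_diabetes(8.0, 7.6)`, `has_diabetes(5.7, 11.9)`, and `has_diabetes(8.3, 11.7)` all evaluate to `True`.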

As for the severity of diabetes, the following three cases are discussed: (1) Judging by the first physical examination index value only, that is, judging the target user's disease severity through the first numerical interval in which the first physical examination index value lies. This may specifically include: dividing out a plurality of numerical intervals that are greater than the first preset threshold and increase according to a predetermined numerical rule; creating a third mapping relationship between the plurality of numerical intervals and the severity of diabetes; determining the first numerical interval, among the plurality of numerical intervals, to which the first physical examination index value corresponds; and determining the first diabetes severity of the target user according to the third mapping relationship and the first numerical interval. For example, the numerical intervals greater than the first preset threshold of 7.0 mmol/L and the corresponding diabetes severities in the third mapping relationship are set as follows: mild diabetes: 7.0 to 8.4 mmol/L; moderate diabetes: 8.4 to 10.1 mmol/L; severe diabetes: greater than 10.1 mmol/L. If the first physical examination index value is determined to be 9.6 mmol/L, the first numerical interval in which it lies is 8.4 to 10.1 mmol/L, and according to the third mapping relationship and the first numerical interval, the diabetes severity of the target user is determined to be moderate diabetes.
(2) Judging by the second physical examination index value only, that is, judging the target user's disease severity through the second numerical interval in which the second physical examination index value lies. This specifically includes: dividing out a plurality of numerical intervals that are greater than the second preset threshold and increase according to a predetermined numerical rule; creating a fourth mapping relationship between the plurality of numerical intervals and the severity of diabetes; determining the second numerical interval, among the plurality of numerical intervals, to which the second physical examination index value corresponds; and determining the second diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval. For example, the numerical intervals greater than the second preset threshold of 11.1 mmol/L and the corresponding diabetes severities in the fourth mapping relationship are set as follows: moderate diabetes: 11.1 to 16.7 mmol/L; severe diabetes: greater than 16.7 mmol/L (above 16.7 mmol/L, ketoacidosis is prone to occur). If the second physical examination index value is determined to be 12.6 mmol/L, the second numerical interval in which it lies is 11.1 to 16.7 mmol/L, and according to the fourth mapping relationship and the second numerical interval, the diabetes severity of the target user is determined to be moderate diabetes.
(3) Combining the first physical examination index value and the second physical examination index value to make a comprehensive judgment (this judgment takes more factors into account, so its prediction accuracy is relatively high), that is, judging the target user's disease severity through both the first numerical interval in which the first physical examination index value lies and the second numerical interval in which the second physical examination index value lies. This includes: if the first diabetes severity and the second diabetes severity are the same, determining the final severity according to that shared severity; if the first diabetes severity and the second diabetes severity differ, obtaining a first weight corresponding to the first recognition model and a second weight corresponding to the second recognition model according to the accuracy or adoption rate fed back by users for the two prediction methods of the first recognition model and the second recognition model; when the first weight is greater than the second weight, determining the first diabetes severity as the disease severity of the target user; and when the second weight is greater than the first weight, determining the second diabetes severity as the disease severity of the target user.
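The interval-to-severity mappings of cases (1) and (2) can be sketched as a sorted-boundary lookup. The boundaries are the example values from the text (fasting: mild from 7.0, moderate from 8.4, severe above 10.1 mmol/L; two-hour postprandial: moderate from 11.1, severe above 16.7 mmol/L). Assigning a value that falls exactly on a boundary to the higher interval is an assumption of this sketch, as the text does not say how boundary values are treated.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds in mmol/L and their labels, taken from the examples in the text.
FASTING_BOUNDS = [7.0, 8.4, 10.1]
FASTING_LABELS = ["mild", "moderate", "severe"]
POSTPRANDIAL_BOUNDS = [11.1, 16.7]
POSTPRANDIAL_LABELS = ["moderate", "severe"]


def severity(value: float, bounds: list[float], labels: list[str]) -> Optional[str]:
    """Map an index value to a severity label, or None below the first bound."""
    position = bisect_right(bounds, value)
    return labels[position - 1] if position else None
```

For the worked examples, `severity(9.6, FASTING_BOUNDS, FASTING_LABELS)` and `severity(12.6, POSTPRANDIAL_BOUNDS, POSTPRANDIAL_LABELS)` both return `"moderate"`.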

In this embodiment, the weights corresponding to the two prediction methods can be set according to the accuracy or adoption rate fed back by users. Specifically, the weight values corresponding to different accuracy rates or adoption rates can be collected statistically, and the weight of each prediction method can then be looked up through the mapping relationship obtained from these statistics. The accuracy or adoption rate fed back by users reflects which prediction method predicts more accurately, so selecting the result of the more accurate prediction method as the final judgment makes the result more accurate. In addition, the weights of the two prediction methods can also be preset manually according to the actual situation. For example, if user feedback shows that predicting the severity of diabetes from the first physical examination index value is more accurate, a weight of 70% can be configured for the prediction method based on the first physical examination index value and a weight of 30% for the prediction method based on the second physical examination index value. When the two predictions produce different results, the result predicted from the first physical examination index value can be fed back to the target user as the final diagnosis result. Assuming that the first physical examination index value predicts moderate diabetes and the second physical examination index value predicts severe diabetes, it is finally determined, according to the configured weights, that the diabetes severity of the target user is moderate diabetes.
After the actual fasting blood glucose value and the actual two-hour postprandial blood glucose value of the target user are obtained, they can be used as a new sample training set to continue training the two recognition models in this embodiment, so as to achieve higher prediction accuracy. With the above diabetes prediction method, the mapping relationship between feature information and label information can be determined by training on the model training sets; the structured data of the target user is matched against the regression prediction model, and the first physical examination index value of fasting blood glucose and the second physical examination index value of blood glucose two hours after a meal are then determined through the mapping relationships. These values can be compared with the first preset threshold and the second preset threshold to determine whether the user has diabetes. Starting from the diagnostic indicators of diabetes, the method can not only predict whether the user is ill but also judge the target user's disease severity through the first numerical interval in which the first physical examination index value lies and/or the second numerical interval in which the second physical examination index value lies, making the diagnosis result more complete.
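The weight-based resolution used when the two severity results differ can be sketched as below, using the 70% and 30% weights from the example. The function name is hypothetical, and raising an error for equal weights is an assumption, since the text does not cover that case.

```python
def resolve_severity(
    first: str, second: str, first_weight: float, second_weight: float
) -> str:
    """Pick the final severity from the two model-specific results.

    `first` comes from the fasting blood glucose model and `second` from the
    two-hour postprandial model. When they differ, the result of the method
    with the larger weight is returned.
    """
    if first == second:
        return first
    if first_weight == second_weight:
        # Assumption: the source does not define a tie-breaking rule.
        raise ValueError("equal weights cannot resolve differing results")
    return first if first_weight > second_weight else second
```

With the example above, `resolve_severity("moderate", "severe", 0.7, 0.3)` returns `"moderate"`.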

Further, as a specific implementation of the methods shown in FIG. 1 and FIG. 2, an embodiment of the present application provides a diabetes prediction device. As shown in FIG. 3, the device includes: an obtaining unit 31, a creating unit 32, a judging unit 33, and a determining unit 34. The obtaining unit 31 can be used to obtain sample user data from original health files and electronic medical record data; the creating unit 32 can be used to create a numerical regression prediction model based on user characteristics in the sample user data; the judging unit 33 can be used to judge, using the regression prediction model, the first physical examination index value of fasting blood glucose of the target user and the second physical examination index value of blood glucose at a preset duration after a meal; and the determining unit 34 can be used to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.

In a specific application scenario, in order to create a numerical regression prediction model based on user characteristics in the sample user data, as shown in FIG. 4, the creating unit 32 may specifically include: a creating module 321, a training module 322, and a determining module 323. The creating module 321 can be specifically used to use the fasting blood glucose value in the user characteristics as the label information Y1, and use the target feature data of the sample user other than the fasting blood glucose value and the two-hour postprandial blood glucose value as the feature information X1, to create a first model training set, where the target feature data includes at least one or more of the sample user's medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data. The training module 322 can be specifically used to train, through the first model training set and based on a preset regression prediction algorithm, a first recognition model for judging the first physical examination index value, where the preset regression prediction algorithm is a fusion of four algorithms: random forest, gradient boosting decision tree (GBDT), Xgboost, and LightGBM. The evaluation of the first recognition model uses the mean absolute percentage error (MAPE) index: when the MAPE index value corresponding to the first recognition model is less than the preset standard comparison threshold, it is determined that the first recognition model meets the evaluation standard. The determining module 323 can be specifically used to determine the first mapping relationship between the feature information X1 and the label information Y1 through the first recognition model that meets the evaluation standard. The creating module 321 can also be specifically used to use the two-hour postprandial blood glucose value in the user characteristics as the label information Y2, and use the target feature data of the sample user as the feature information X2, to create a second model training set. The training module 322 can also be specifically used to train, through the second model training set and based on the preset regression prediction algorithm, a second recognition model for judging the second physical examination index value, where the evaluation of the second recognition model uses the MAPE index: when the MAPE index value corresponding to the second recognition model is less than the preset standard comparison threshold, it is determined that the second recognition model meets the evaluation standard. The determining module 323 can also be specifically used to determine the second mapping relationship between the feature information X2 and the label information Y2 through the second recognition model that meets the evaluation standard.

Correspondingly, in order to determine the first physical examination index value of fasting blood glucose of the target user and the second physical examination index value of blood glucose at the preset duration after a meal, as shown in FIG. 4, the judging unit 33 may specifically include: a matching module 331 and a determining module 332. The matching module 331 can be specifically used to input the feature information of the target user into the first recognition model to perform similarity matching with the feature information X1, where the feature information of the target user corresponds to the target feature data of the target user other than the fasting blood glucose value and the two-hour postprandial blood glucose value. The determining module 332 can be specifically used to determine the first physical examination index value corresponding to the target user by using the feature information X1 whose similarity is the highest and is greater than the preset threshold, together with the first mapping relationship. The matching module 331 can also be specifically used to input the feature information of the target user into the second recognition model to perform similarity matching with the feature information X2. The determining module 332 can also be specifically used to determine the second physical examination index value corresponding to the target user by using the feature information X2 whose similarity is the highest and is greater than the predetermined threshold, together with the second mapping relationship. In a specific application scenario, in order to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value, as shown in FIG. 4, the determining unit 34 may specifically include: a determining module 341 and a judging module 342.

The determining module 341 can be used to determine that the target user has diabetes if the first physical examination index value corresponding to the target user is greater than or equal to the first preset threshold, and/or the second physical examination index value is greater than or equal to the second preset threshold. The judging module 342 can be used to judge the disease severity of the target user through the first numerical interval in which the first physical examination index value lies and/or the second numerical interval in which the second physical examination index value lies. In a specific application scenario, in order to accurately determine the disease severity of the target user, the judging module 342 is specifically used to: divide out a plurality of numerical intervals that are greater than the first preset threshold and increase according to a predetermined numerical rule; create a third mapping relationship between the plurality of numerical intervals and the severity of diabetes; determine the first numerical interval, among the plurality of numerical intervals, to which the first physical examination index value corresponds; and determine the first diabetes severity of the target user according to the third mapping relationship and the first numerical interval. The judging module 342 is also specifically used to: divide out a plurality of numerical intervals that are greater than the second preset threshold and increase according to a predetermined numerical rule; create a fourth mapping relationship between the plurality of numerical intervals and the severity of diabetes; determine the second numerical interval, among the plurality of numerical intervals, to which the second physical examination index value corresponds; and determine the second diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval.

The judging module 342 is specifically further configured to: if the first diabetes severity and the second diabetes severity are different, obtain a first weight corresponding to the first recognition model and a second weight corresponding to the second recognition model according to the accuracy or adoption rate fed back by users for the two prediction methods of the first recognition model and the second recognition model; when the first weight is greater than the second weight, determine the first diabetes severity as the disease severity of the target user; and when the second weight is greater than the first weight, determine the second diabetes severity as the disease severity of the target user. In a specific application scenario, the matching module 331 can be specifically used to subject the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information in the form of structured data, and to perform similarity matching between the feature information of the structured data and the feature information X1. The matching module 331 can also be specifically used to subject the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information in the form of structured data, and to perform similarity matching between the feature information of the structured data and the feature information X2.

In a specific application scenario, the training module 322 can be specifically used to: obtain a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set by random sampling; train a first classifier based on the first training sample set using the random forest algorithm; train a second classifier based on the second training sample set using the GBDT algorithm; train a third classifier based on the third training sample set using the Xgboost algorithm; train a fourth classifier based on the fourth training sample set using the LightGBM algorithm; and fuse the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first recognition model. The training module 322 can also be specifically used to: obtain a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling; train a fifth classifier based on the fifth training sample set using the random forest algorithm; train a sixth classifier based on the sixth training sample set using the GBDT algorithm; train a seventh classifier based on the seventh training sample set using the Xgboost algorithm; train an eighth classifier based on the eighth training sample set using the LightGBM algorithm; and fuse the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second recognition model.

It should be noted that, for other corresponding descriptions of the functional units involved in the diabetes prediction device provided in this embodiment, reference may be made to the corresponding descriptions of FIG. 1 and FIG. 2, and details are not repeated here. Based on the methods shown in FIG. 1 and FIG. 2, embodiments of the present application correspondingly also provide a storage medium on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the diabetes prediction method shown in FIG. 1 and FIG. 2 is implemented. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of each implementation scenario of the present application.

Based on the methods shown in FIG. 1 and FIG. 2 and the virtual device embodiments shown in FIG. 3 and FIG. 4, in order to achieve the above objectives, an embodiment of the present application also provides a computer device, which may be a personal computer, a server, a network device, etc. The physical device includes a storage medium and a processor; the storage medium is used to store computer-readable instructions; and the processor is used to execute the computer-readable instructions to implement the diabetes prediction method shown in FIG. 1 and FIG. 2. Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface or a WI-FI interface), etc.

Those skilled in the art can understand that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or arrange the components differently. The non-volatile readable storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device for diabetes prediction, and supports the operation of information processing programs and other software and/or programs. The network communication module is used to implement communication between the components within the non-volatile readable storage medium, as well as communication with other hardware and software in the physical device. Through the description of the foregoing implementations, those skilled in the art can clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware. By applying the technical solution of this application, compared with the prior art, the severity of the disease can be further determined once the target user has been found to suffer from diabetes. The diagnosis result is therefore more complete, the development of the target user's condition can be tracked in a timely manner, and corresponding supporting treatment can be carried out.

Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of preferred implementation scenarios, and that the modules or processes in the accompanying drawings are not necessarily required for implementing this application. Those skilled in the art can understand that the modules of the device in an implementation scenario can be distributed in the device as described for that scenario, or can be changed accordingly and located in one or more devices different from that implementation scenario. The modules of the above implementation scenarios can be combined into one module or further divided into multiple sub-modules. The above serial numbers of this application are for description only and do not represent the merits of the implementation scenarios. The above disclosures are only a few specific implementation scenarios of this application; however, this application is not limited to them, and any changes that can be conceived by those skilled in the art shall fall within the protection scope of this application.

Claims

A method for predicting diabetes, comprising:

obtaining sample user data from original health files and electronic medical record data;

creating a numerical regression prediction model based on user characteristics in the sample user data;

using the regression prediction model to determine a first physical examination index value of fasting blood glucose of a target user and a second physical examination index value of blood glucose at a preset duration after a meal; and

determining the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
The method according to claim 1, wherein the user characteristics are extracted from the sample user data by using regular expressions, and the preset duration is two hours; and the creating a numerical regression prediction model based on user characteristics in the sample user data specifically includes: using the fasting blood glucose value in the user characteristics as label information Y1, and using the target feature data of the sample user other than the fasting blood glucose value and the blood glucose value two hours after a meal as feature information X1, to create a first model training set, the target feature data including at least one or more of the sample user's medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data; training, through the first model training set and based on a preset regression prediction algorithm, a first recognition model for judging the first physical examination index value, wherein the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), Xgboost, and LightGBM; the first recognition model is evaluated with the mean absolute percentage error (MAPE) index, and when the MAPE index value corresponding to the first recognition model is less than a preset standard comparison threshold, the first recognition model is determined to meet the evaluation standard, and the first mapping relationship between the feature information X1 and the label information Y1 can be determined through the first recognition model that meets the evaluation standard; using the blood glucose value two hours after a meal in the user characteristics as label information Y2, and using the target feature data of the sample user as feature information X2, to create a second model training set; and training, through the second model training set and based on the preset regression prediction algorithm, a second recognition model for judging the second physical examination index value, wherein the second recognition model is evaluated with the MAPE index, and when the MAPE index value corresponding to the second recognition model is less than a predetermined standard comparison threshold, the second recognition model is determined to meet the evaluation standard, and the second mapping relationship between the feature information X2 and the label information Y2 can be determined through the second recognition model that meets the evaluation standard.
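The training-set construction and the MAPE acceptance check described in this claim can be illustrated with a minimal sketch. The field names (`fasting_glucose`, `glucose_2h`), the 10% threshold, and the choice to exclude both glucose readings from X2 as well as X1 are assumptions made for illustration; the claim does not fix any of them.

```python
def build_training_set(samples, label_key, excluded_keys):
    """Split each sample record into feature information X and label information Y."""
    features, labels = [], []
    for sample in samples:
        features.append({k: v for k, v in sample.items() if k not in excluded_keys})
        labels.append(sample[label_key])
    return features, labels


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Measured blood glucose values are strictly positive, so no division by zero.
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def meets_evaluation_standard(actual, predicted, threshold=0.10):
    """A model is accepted when its MAPE is below the comparison threshold.

    The default threshold is a hypothetical value, not one stated in the claim.
    """
    return mape(actual, predicted) < threshold


# Hypothetical sample records; real records would carry many more features.
samples = [
    {"age": 50, "bmi": 27.0, "fasting_glucose": 7.2, "glucose_2h": 11.5},
    {"age": 40, "bmi": 22.5, "fasting_glucose": 5.1, "glucose_2h": 6.8},
]
glucose_keys = {"fasting_glucose", "glucose_2h"}
X1, Y1 = build_training_set(samples, "fasting_glucose", glucose_keys)
X2, Y2 = build_training_set(samples, "glucose_2h", glucose_keys)
```

Keeping the label out of the feature set matters here: if the fasting glucose reading stayed in X1, the first model would simply copy it rather than learn to estimate it from the other data.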
The method according to claim 2, wherein the using the regression prediction model to determine the first physical examination index value of the fasting blood glucose of the target user and the second physical examination index value of the blood glucose a preset duration after a meal specifically includes: inputting the feature information of the target user into the first recognition model to perform similarity matching with the feature information X1, the feature information of the target user corresponding to the target feature data of the target user other than the fasting blood glucose value and the blood glucose value two hours after a meal; determining the first physical examination index value corresponding to the target user by using the first mapping relationship and the feature information X1 whose similarity is greater than a preset threshold and is the highest; inputting the feature information of the target user into the second recognition model to perform similarity matching with the feature information X2; and determining the second physical examination index value corresponding to the target user by using the second mapping relationship and the feature information X2 whose similarity is greater than a predetermined threshold and is the highest.
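The matching step in this claim can be sketched as a nearest-match lookup. The claim does not name a similarity measure, so cosine similarity over numeric feature vectors is an assumption here, as are the function names, the 0.9 threshold, and the use of a plain dictionary to stand in for the mapping relationship between stored feature information and its label.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_by_best_match(target, stored_features, mapping, threshold=0.9):
    """Return the label mapped to the most similar stored feature vector.

    Only candidates whose similarity exceeds the threshold are considered;
    None is returned when no stored vector is similar enough.
    """
    best, best_similarity = None, threshold
    for candidate in stored_features:
        similarity = cosine_similarity(target, candidate)
        if similarity > best_similarity:
            best, best_similarity = candidate, similarity
    return None if best is None else mapping[best]


# Hypothetical stored feature information and its mapped glucose values (mmol/L).
stored = [(1.0, 0.0), (0.0, 1.0)]
first_mapping = {(1.0, 0.0): 7.4, (0.0, 1.0): 5.2}
estimate = predict_by_best_match((0.9, 0.1), stored, first_mapping)
```

Returning `None` when nothing clears the threshold leaves the caller to decide what to do with an unmatched user, which the claim does not address.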
The method according to claim 3, wherein the determining the disease degree of the target user according to the first physical examination index value and/or the second physical examination index value specifically includes: if the first physical examination index value corresponding to the target user is greater than or equal to a first preset threshold, and/or the second physical examination index value is greater than or equal to a second preset threshold, determining that the target user has diabetes; and judging the disease degree of the target user through the first numerical interval in which the first physical examination index value is located and/or the second numerical interval in which the second physical examination index value is located.
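The threshold test in this claim reduces to a short predicate. The sketch below reads "and/or" as "either value reaching its threshold is sufficient", and uses 7.0 mmol/L (fasting) and 11.1 mmol/L (two hours after a meal) as default thresholds; these defaults follow common diagnostic practice and are assumptions, since the claim only refers to preset thresholds.

```python
def has_diabetes(fasting=None, postprandial=None,
                 fasting_threshold=7.0, postprandial_threshold=11.1):
    """Return True when either available index value reaches its threshold.

    Values are in mmol/L. The default thresholds are assumed, not taken
    from the claim, which only refers to first and second preset thresholds.
    """
    if fasting is None and postprandial is None:
        raise ValueError("at least one index value is required")
    fasting_hit = fasting is not None and fasting >= fasting_threshold
    postprandial_hit = (postprandial is not None
                        and postprandial >= postprandial_threshold)
    return fasting_hit or postprandial_hit
```

For example, `has_diabetes(fasting=6.1, postprandial=11.8)` is true because the second value alone reaches its threshold.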
The method according to claim 4, wherein judging the disease degree of the target user according to the first numerical interval in which the first physical examination index value is located specifically includes: dividing a plurality of numerical intervals that are greater than the first preset threshold and increase according to a predetermined numerical rule; creating a third mapping relationship between the plurality of numerical intervals and the degree of diabetes; determining the first numerical interval, among the plurality of numerical intervals, to which the first physical examination index value corresponds; and judging the first diabetes degree of the target user according to the third mapping relationship and the first numerical interval;

Judging the disease degree of the target user according to the second numerical interval in which the second physical examination index value is located specifically includes: dividing a plurality of numerical intervals that are greater than the second preset threshold and increase according to a predetermined numerical rule; creating a fourth mapping relationship between the plurality of numerical intervals and the degree of diabetes; determining the second numerical interval, among the plurality of numerical intervals, to which the second physical examination index value corresponds; and judging the second diabetes degree of the target user according to the fourth mapping relationship and the second numerical interval;

Judging the disease degree of the target user according to the first numerical interval in which the first physical examination index value is located and the second numerical interval in which the second physical examination index value is located specifically includes: if the first diabetes degree is different from the second diabetes degree, obtaining, according to the accuracy or adoption rate fed back by users on the two prediction methods of the first recognition model and the second recognition model, a first weight corresponding to the first recognition model and a second weight corresponding to the second recognition model; when the first weight is greater than the second weight, determining the first diabetes degree as the disease degree of the target user; and when the second weight is greater than the first weight, determining the second diabetes degree as the disease degree of the target user.
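The interval mapping and the weight-based tie-break in this claim can be sketched as follows. The sketch assumes equal-width intervals starting at the threshold, three hypothetical degree labels, and a fallback to the first result when the two weights are equal; the claim specifies none of these.

```python
def degree_from_value(value, threshold, step, degrees):
    """Map an index value to a degree label via equal-width intervals.

    Intervals start at `threshold` and each spans `step`; values beyond the
    last interval get the last label. Returns None below the threshold.
    Equal-width intervals are an assumed form of the 'predetermined rule'.
    """
    if value < threshold:
        return None
    index = int((value - threshold) // step)
    return degrees[min(index, len(degrees) - 1)]


def resolve_degree(first_degree, second_degree, first_weight, second_weight):
    """Pick one degree when the two models disagree, using feedback weights."""
    if first_degree == second_degree:
        return first_degree
    if second_weight > first_weight:
        return second_degree
    # Covers first_weight > second_weight; a tie also falls back to the
    # first model, which is an assumption (the claim leaves ties open).
    return first_degree


DEGREES = ["mild", "moderate", "severe"]  # hypothetical labels
first = degree_from_value(8.5, threshold=7.0, step=1.0, degrees=DEGREES)
second = degree_from_value(16.0, threshold=11.1, step=2.0, degrees=DEGREES)
final = resolve_degree(first, second, first_weight=0.4, second_weight=0.6)
```

The weights would in practice come from user feedback on each model's accuracy or adoption rate; here they are fixed numbers for illustration only.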
The method according to claim 3, wherein the inputting the feature information of the target user into the first recognition model to perform similarity matching with the feature information X1 specifically includes: subjecting the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information of structured data; and performing similarity matching between the feature information of the structured data and the feature information X1;

The inputting the feature information of the target user into the second recognition model to perform similarity matching with the feature information X2 specifically includes: subjecting the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information of structured data; and performing similarity matching between the feature information of the structured data and the feature information X2.
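The preprocessing steps named in this claim can be illustrated with a small sketch. The concrete techniques are assumptions: a regular expression that pulls a number following a field name out of free text, median filling for missing values, and clipping to a fixed range for outliers. The function names and the example text are hypothetical.

```python
import re
from statistics import median


def extract_number(text, name):
    """Extract the first number that follows `name` in free text, or None."""
    pattern = re.escape(name) + r"\s*[:=]?\s*(\d+(?:\.\d+)?)"
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to derive a fill value from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Clamp every value into the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical record text; real records would need field-specific patterns.
record = "BMI: 27.5, age 50"
bmi = extract_number(record, "BMI")
ages = clip_outliers(fill_missing([50, None, 64, 430]), lower=0, upper=120)
```

Median filling and range clipping are chosen here only because they are simple and robust to the outliers they are meant to handle; other imputation or outlier rules would fit the claim equally well.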
The method according to claim 2, wherein the training, through the first model training set and based on a preset regression prediction algorithm, of the first recognition model for judging the first physical examination index value specifically includes: obtaining a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set by random sampling; training a first classifier with the random forest algorithm based on the first training sample set; training a second classifier with the GBDT algorithm based on the second training sample set; training a third classifier with the Xgboost algorithm based on the third training sample set; training a fourth classifier with the LightGBM algorithm based on the fourth training sample set; and fusing the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first recognition model;

The training, through the second model training set and based on the preset regression prediction algorithm, of the second recognition model for judging the second physical examination index value specifically includes: obtaining a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling; training a fifth classifier with the random forest algorithm based on the fifth training sample set; training a sixth classifier with the GBDT algorithm based on the sixth training sample set; training a seventh classifier with the Xgboost algorithm based on the seventh training sample set; training an eighth classifier with the LightGBM algorithm based on the eighth training sample set; and fusing the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second recognition model.
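The sample-then-fuse structure of this claim can be sketched independently of the four named algorithms. In the sketch below each learner is passed in as a training function, and the fused model averages the learners' predictions, which is the usual bagging rule for regression. Sampling with replacement, the 80% sample fraction, and the trivial mean-predicting stand-in learner are all assumptions; in practice the four training functions would wrap random forest, GBDT, Xgboost, and LightGBM regressors.

```python
import random


def train_fused_model(X, y, trainers, sample_fraction=0.8, seed=0):
    """Train one model per trainer on its own random sample, then average.

    Each trainer is a function (X_sample, y_sample) -> predict(x).
    Sampling with replacement and the sample fraction are assumed details.
    """
    rng = random.Random(seed)
    n = len(X)
    sample_size = max(1, int(n * sample_fraction))
    models = []
    for train in trainers:
        indices = [rng.randrange(n) for _ in range(sample_size)]
        models.append(train([X[i] for i in indices], [y[i] for i in indices]))

    def predict(x):
        # Fusion step: the unweighted mean of the individual predictions.
        return sum(model(x) for model in models) / len(models)

    return predict


def train_mean_predictor(X_sample, y_sample):
    """Stand-in learner that always predicts the mean label of its sample."""
    mean_label = sum(y_sample) / len(y_sample)
    return lambda x: mean_label


# Four stand-in learners take the place of the four algorithms in the claim.
X = [[50, 27.0], [40, 22.5], [63, 30.1], [35, 21.0], [58, 28.4]]
y = [7.2, 5.1, 8.0, 4.9, 7.6]
first_recognition_model = train_fused_model(X, y, [train_mean_predictor] * 4)
prediction = first_recognition_model([45, 25.0])
```

Giving each learner its own random sample reduces the correlation between their errors, which is what lets the averaged prediction be more stable than any single learner.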
A diabetes prediction device, including:

An obtaining unit, configured to obtain sample user data from original health files and electronic medical record data;

A creation unit, configured to create a numerical regression prediction model based on user characteristics in the sample user data;

A judging unit, configured to use the regression prediction model to judge the first physical examination index value of the fasting blood glucose of a target user and the second physical examination index value of the blood glucose a preset duration after a meal;

A determining unit, configured to determine the disease degree of the target user according to the first physical examination index value and/or the second physical examination index value.
The apparatus according to claim 8, wherein the creation unit specifically includes: a creation module, a training module, and a determination module;

The creation module is specifically configured to use the fasting blood glucose value in the user characteristics as label information Y1, and use the target feature data of the sample user other than the fasting blood glucose value and the blood glucose value two hours after a meal as feature information X1, to create a first model training set, the target feature data including at least one or more of medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data of the sample user;

The training module is specifically configured to train, through the first model training set and based on a preset regression prediction algorithm, a first recognition model for judging the first physical examination index value, wherein the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), Xgboost, and LightGBM; the first recognition model is evaluated with the mean absolute percentage error (MAPE) index, and when the MAPE index value corresponding to the first recognition model is less than a preset standard comparison threshold, the first recognition model is determined to meet the evaluation standard;

The determining module is specifically configured to determine the first mapping relationship between the feature information X1 and the label information Y1 through the first recognition model that meets the evaluation standard;

The creation module is specifically further configured to use the blood glucose value two hours after a meal in the user characteristics as label information Y2, and use the target feature data of the sample user as feature information X2, to create a second model training set;

The training module is specifically further configured to train, through the second model training set and based on the preset regression prediction algorithm, a second recognition model for judging the second physical examination index value, wherein the second recognition model is evaluated with the MAPE index, and when the MAPE index value corresponding to the second recognition model is less than a predetermined standard comparison threshold, the second recognition model is determined to meet the evaluation standard;

The determining module is specifically further configured to determine the second mapping relationship between the feature information X2 and the label information Y2 through the second recognition model that meets the evaluation standard.
The device according to claim 9, wherein the judgment unit specifically includes: a matching module and a determining module;

The matching module is specifically configured to input the characteristic information of the target user into the first recognition model to perform similarity matching with the characteristic information X1, the characteristic information of the target user corresponding to the target characteristic data of the target user other than the fasting blood glucose value and the blood glucose value two hours after a meal;

The determining module is specifically configured to determine the first physical examination index value corresponding to the target user by using the characteristic information X1 with the similarity greater than a preset threshold and the highest similarity and the first mapping relationship;

The matching module is specifically further configured to input feature information of the target user into the second recognition model to perform similarity matching with the feature information X2;

The determining module is specifically further configured to determine the second physical examination index value corresponding to the target user by using the feature information X2 with the similarity greater than a predetermined threshold and the highest similarity and the second mapping relationship.
The device according to claim 10, the determining unit specifically includes: a determining module and a judging module;

The determining module is configured to determine that the target user has diabetes if the first physical examination index value corresponding to the target user is greater than or equal to a first preset threshold, and/or the second physical examination index value is greater than or equal to a second preset threshold;

The judgment module is configured to judge the disease degree of the target user through the first numerical interval where the first physical examination index value is located and/or the second numerical interval where the second physical examination index value is located.
The device according to claim 11, wherein the judgment module is specifically configured to divide a plurality of numerical intervals that are greater than the first preset threshold and increase according to a predetermined numerical rule; create a third mapping relationship between the plurality of numerical intervals and the degree of diabetes; determine the first numerical interval, among the plurality of numerical intervals, to which the first physical examination index value corresponds; and judge the first diabetes degree of the target user according to the third mapping relationship and the first numerical interval; and is further configured to divide a plurality of numerical intervals that are greater than the second preset threshold and increase according to a predetermined numerical rule; create a fourth mapping relationship between the plurality of numerical intervals and the degree of diabetes; determine the second numerical interval, among the plurality of numerical intervals, to which the second physical examination index value corresponds; and judge the second diabetes degree of the target user according to the fourth mapping relationship and the second numerical interval;

The judgment module is further specifically configured to, if the first diabetes degree is different from the second diabetes degree, obtain, according to the accuracy or adoption rate fed back by users on the two prediction methods of the first recognition model and the second recognition model, the first weight corresponding to the first recognition model and the second weight corresponding to the second recognition model; when the first weight is greater than the second weight, determine the first diabetes degree as the disease degree of the target user; and when the second weight is greater than the first weight, determine the second diabetes degree as the disease degree of the target user.
The device according to claim 10, wherein the matching module is specifically configured to subject the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information of structured data, and to perform similarity matching between the feature information of the structured data and the feature information X1;

The matching module is specifically further configured to subject the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier processing to obtain feature information of structured data, and to perform similarity matching between the feature information of the structured data and the feature information X2.
The device according to claim 9, wherein the training module is specifically configured to obtain a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set by random sampling; train a first classifier with the random forest algorithm based on the first training sample set; train a second classifier with the GBDT algorithm based on the second training sample set; train a third classifier with the Xgboost algorithm based on the third training sample set; train a fourth classifier with the LightGBM algorithm based on the fourth training sample set; and fuse the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first recognition model;

The training module is specifically further configured to obtain a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling; train a fifth classifier with the random forest algorithm based on the fifth training sample set; train a sixth classifier with the GBDT algorithm based on the sixth training sample set; train a seventh classifier with the Xgboost algorithm based on the seventh training sample set; train an eighth classifier with the LightGBM algorithm based on the eighth training sample set; and fuse the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second recognition model.
A non-volatile readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement a method for predicting diabetes, including: obtaining sample user data from original health files and electronic medical record data; creating a numerical regression prediction model based on user characteristics in the sample user data; using the regression prediction model to determine the first physical examination index value of the fasting blood glucose of a target user and the second physical examination index value of the blood glucose a preset duration after a meal; and determining the disease degree of the target user according to the first physical examination index value and/or the second physical examination index value.
The non-volatile readable storage medium according to claim 15, wherein the user characteristics are extracted from the sample user data using regular expressions, and the preset duration is two hours; and when the computer-readable instructions are executed by the processor, the creating a numerical regression prediction model based on user characteristics in the sample user data specifically includes: using the fasting blood glucose value in the user characteristics as label information Y1, and using the target feature data of the sample user other than the fasting blood glucose value and the blood glucose value two hours after a meal as feature information X1, to create a first model training set, the target feature data including at least one or more of the sample user's medical history data, hospitalization data, medical treatment and medication data, physical examination data, and health notification data; training, through the first model training set and based on a preset regression prediction algorithm, a first recognition model for judging the first physical examination index value, wherein the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), Xgboost, and LightGBM; the first recognition model is evaluated with the mean absolute percentage error (MAPE) index, and when the MAPE index value corresponding to the first recognition model is less than a preset standard comparison threshold, the first recognition model is determined to meet the evaluation standard, and the first mapping relationship between the feature information X1 and the label information Y1 can be determined through the first recognition model that meets the evaluation standard; using the blood glucose value two hours after a meal in the user characteristics as label information Y2, and using the target feature data of the sample user as feature information X2, to create a second model training set; and training, through the second model training set and based on the preset regression prediction algorithm, a second recognition model for judging the second physical examination index value, wherein the second recognition model is evaluated with the MAPE index, and when the MAPE index value corresponding to the second recognition model is less than a predetermined standard comparison threshold, the second recognition model is determined to meet the evaluation standard, and the second mapping relationship between the feature information X2 and the label information Y2 can be determined through the second recognition model that meets the evaluation standard.
The non-volatile readable storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor, the using the regression prediction model to determine the first physical examination index value of the fasting blood glucose of the target user and the second physical examination index value of the blood glucose a preset duration after a meal specifically includes: inputting the characteristic information of the target user into the first recognition model to perform similarity matching with the characteristic information X1, the characteristic information of the target user corresponding to the target feature data of the target user other than the fasting blood glucose value and the blood glucose value two hours after a meal; determining the first physical examination index value corresponding to the target user by using the first mapping relationship and the characteristic information X1 whose similarity is greater than a preset threshold and is the highest; inputting the characteristic information of the target user into the second recognition model to perform similarity matching with the characteristic information X2; and determining the second physical examination index value corresponding to the target user by using the second mapping relationship and the characteristic information X2 whose similarity is greater than a predetermined threshold and is the highest.
A computer device, including a non-volatile readable storage medium, a processor, and computer-readable instructions stored on the non-volatile readable storage medium and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements a method for predicting diabetes, including: obtaining sample user data from original health files and electronic medical record data; creating a numerical regression prediction model based on user characteristics in the sample user data; using the regression prediction model to determine the first physical examination index value of the fasting blood glucose of a target user and the second physical examination index value of the blood glucose a preset duration after a meal; and determining the disease degree of the target user according to the first physical examination index value and/or the second physical examination index value.
The computer device according to claim 18, wherein the user characteristics are extracted from the sample user data using regular expressions, and the preset duration is two hours; and when the computer-readable instructions are executed by the processor, the creating a numerical regression prediction model based on user characteristics in the sample user data specifically includes: using the fasting blood glucose value in the user characteristics as label information Y1, and using the target feature data of the sample user other than the fasting blood glucose value and the blood glucose value two hours after a meal as feature information X1, to create a first model training set, the target feature data including at least one or more of the sample user's medical history data, hospitalization data, medical treatment data, physical examination data, and health notification data; training, through the first model training set and based on a preset regression prediction algorithm, a first recognition model for judging the first physical examination index value, wherein the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), Xgboost, and LightGBM; the first recognition model is evaluated with the mean absolute percentage error (MAPE) index, and when the MAPE index value corresponding to the first recognition model is less than a preset standard comparison threshold, the first recognition model is determined to meet the evaluation standard, and the first mapping relationship between the feature information X1 and the label information Y1 can be determined through the first recognition model that meets the evaluation standard; using the blood glucose value two hours after a meal in the user characteristics as label information Y2, and using the target feature data of the sample user as feature information X2, to create a second model training set; and training, through the second model training set and based on the preset regression prediction algorithm, a second recognition model for judging the second physical examination index value, wherein the second recognition model is evaluated with the MAPE index, and when the MAPE index value corresponding to the second recognition model is less than a predetermined standard comparison threshold, the second recognition model is determined to meet the evaluation standard, and the second mapping relationship between the feature information X2 and the label information Y2 can be determined through the second recognition model that meets the evaluation standard.
The computer device according to claim 19, wherein, when the computer-readable instructions are executed by the processor, the using the regression prediction model to determine the first physical examination index value of the fasting blood glucose of the target user and the second physical examination index value of the blood glucose a preset duration after a meal specifically includes: inputting the characteristic information of the target user into the first recognition model to perform similarity matching with the characteristic information X1, the characteristic information of the target user corresponding to the target feature data of the target user other than the fasting blood glucose value and the blood glucose value two hours after a meal; determining the first physical examination index value corresponding to the target user by using the first mapping relationship and the feature information X1 whose similarity is greater than a preset threshold and is the highest; inputting the feature information of the target user into the second recognition model to perform similarity matching with the feature information X2; and determining the second physical examination index value corresponding to the target user by using the second mapping relationship and the feature information X2 whose similarity is greater than a predetermined threshold and is the highest.