CN110197720A - Diabetes prediction method and apparatus, storage medium, and computer device - Google Patents


Publication number
CN110197720A
Authority
CN
China
Prior art keywords
physical examination
index value
training
diabetes
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910185079.2A
Other languages
Chinese (zh)
Inventor
金晓辉
阮晓雯
徐亮
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910185079.2A
Publication of CN110197720A
Priority to PCT/CN2019/117217 (WO2020181805A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

This application discloses a diabetes prediction method and apparatus, a storage medium, and a computer device, and relates to the field of computer technology. It addresses the problem that the prior art can only judge whether a user has diabetes and cannot judge the severity of the illness. The method includes: obtaining sample user data from original health records and electronic medical record data; creating a numeric regression prediction model from the user features in the sample user data; using the regression prediction model to determine a first physical examination index value for the target user's fasting blood glucose and a second physical examination index value for the target user's blood glucose at a preset time after a meal; and determining the target user's degree of illness from the first physical examination index value and/or the second physical examination index value. The application is suitable for predicting diabetes and for determining the degree of diabetic illness.

Description

Diabetes prediction method and apparatus, storage medium, and computer device
Technical field
This application relates to the field of computer technology, and in particular to a diabetes prediction method and apparatus, a storage medium, and a computer device.
Background
Diabetes is a group of metabolic diseases characterized by hyperglycemia. Its onset damages large and small blood vessels and endangers many parts of the body, including the heart, brain, kidneys, peripheral nerves, eyes, and feet, and it can be accompanied by multiple complications, so strengthening diabetes prediction work is necessary. With advances in science and technology, the diagnosis of disease is no longer limited to a doctor's analysis, and using artificial intelligence to predict diabetes is in keeping with current trends.
The common industry approach to diabetes prediction is to collect diabetes cases, compare diabetic patient data with healthy population data, and build a 0-1 classification model over the patient's feature dimensions to judge whether a user has diabetes.
However, existing diabetes prediction methods can only judge whether a patient has diabetes and cannot judge the severity of the illness. The diagnostic result is therefore incomplete, matching control and treatment cannot be chosen according to the degree of illness, and the patient's condition may worsen as a result.
Summary of the invention
In view of this, the application provides a diabetes prediction method and apparatus, a storage medium, and a computer device. The main purpose is to solve the problem that, when diabetes is predicted with a constructed 0-1 classification model, only whether the user has diabetes can be judged and the severity of the illness cannot, which leaves the diagnostic result incomplete.
According to one aspect of the application, a diabetes prediction method is provided, the method including:
Obtaining sample user data from original health records and electronic medical record data;
Creating a numeric regression prediction model from the user features in the sample user data;
Using the regression prediction model to determine a first physical examination index value for the target user's fasting blood glucose and a second physical examination index value for blood glucose at a preset time after a meal;
Determining the target user's degree of illness from the first physical examination index value and/or the second physical examination index value.
According to another aspect of the application, a diabetes prediction apparatus is provided, the apparatus including:
An acquiring unit, configured to obtain sample user data from original health records and electronic medical record data;
A creating unit, configured to create a numeric regression prediction model from the user features in the sample user data;
A judging unit, configured to use the regression prediction model to determine the first physical examination index value for the target user's fasting blood glucose and the second physical examination index value for blood glucose at a preset time after a meal;
A determination unit, configured to determine the target user's degree of illness from the first physical examination index value and/or the second physical examination index value.
According to yet another aspect of the application, a non-volatile readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, it implements the above diabetes prediction method.
According to a further aspect of the application, a computer device is provided, including a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and runnable on the processor; the processor implements the above diabetes prediction method when executing the program.
With the above technical solution, compared with current methods that predict diabetes using a constructed 0-1 classification model, the application adds regression prediction models for fasting blood glucose and 2-hour postprandial blood glucose on top of the existing diabetes prediction model. The regression prediction model yields the first physical examination index value for the target user's fasting blood glucose and the second physical examination index value for blood glucose at a preset time after a meal. These index values can be used to determine whether the target user has diabetes and, further, to judge the target user's degree of illness.
The above is only an overview of the technical solution of the application. So that the technical means of the application can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the application are more apparent, specific embodiments of the application are set out below.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a flow diagram of a diabetes prediction method provided by an embodiment of the application;
Fig. 2 shows a flow diagram of another diabetes prediction method provided by an embodiment of the application;
Fig. 3 shows a structural diagram of a diabetes prediction apparatus provided by an embodiment of the application;
Fig. 4 shows a structural diagram of another diabetes prediction apparatus provided by an embodiment of the application.
Detailed description
The application is described in detail below with reference to the embodiments and the drawings. Where no conflict arises, the embodiments of the application and the features in them can be combined with each other.
A 0-1 classification model built in the current manner cannot determine the severity of diabetes from user data. To address this problem, this embodiment provides a diabetes prediction method. As shown in Fig. 1, the method includes:
101. Obtain sample user data from original health records and electronic medical record data.
The sample user data may include patient visit data, physical examination index data, medication data, health declaration data, and the like.
102. Create a numeric regression prediction model from the user features in the sample user data.
The user features may include multiple feature dimensions such as fasting blood glucose and 2-hour postprandial blood glucose, blood pressure, stratum corneum lipids, insulin, body mass index (BMI), hereditary information on diabetes, age, and diagnostic result.
In a specific embodiment, the regression prediction model can be built from several decision-tree-based models of different frameworks. That is, multiple decision-tree-based prediction models are combined using the idea of ensemble learning to improve the accuracy of the prediction result. A decision tree is one of the simpler supervised learning algorithms in machine learning. It is a prediction model that represents a mapping between object attributes and object values. Each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf. A decision tree has a single output; if multiple outputs are needed, independent decision trees can be built to handle the different outputs. Decision tree algorithms include ID3, C4.5, and CART. All of them are greedy algorithms and differ in the split metric: ID3 uses information gain, and C4.5 uses the gain ratio.
The resulting regression prediction model reflects the fasting blood glucose value and 2-hour postprandial blood glucose value that correspond to sample users with different blood pressure, stratum corneum lipids, insulin, BMI, hereditary information on diabetes, age, diagnostic result, and so on.
103. Use the regression prediction model to determine the first physical examination index value for the target user's fasting blood glucose and the second physical examination index value for blood glucose at a preset time after a meal.
The target user is the user whose diabetic condition is to be predicted. The first physical examination index value corresponds to the test result for the target user's fasting blood glucose. The second physical examination index value corresponds to the test result for the target user's blood glucose at the preset time after a meal. The preset time can be set according to actual needs.
Because the model of this embodiment reflects the fasting and 2-hour postprandial blood glucose values of sample users with different features, the target user's features are matched against the sample users' features, and the fasting and 2-hour postprandial blood glucose values corresponding to the matched sample user features are looked up.
104. Determine the target user's degree of illness from the first physical examination index value and/or the second physical examination index value.
In a specific application scenario, the first physical examination index value shows whether the target user's fasting blood glucose is normal, and the second physical examination index value shows whether the target user's blood glucose at the preset time after a meal is normal. When the first and/or second physical examination index value is abnormal, the user can be judged to have diabetes, and comparing the values with critical values further determines the patient's degree of illness.
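The comparison with critical values in step 104 can be sketched as follows. This is a minimal illustration, not the patent's rule: the text does not state its critical values, so the cut-offs below (in mmol/L, following commonly cited diagnostic thresholds), the three grade names, and the function names `grade` and `degree_of_illness` are all assumptions.

```python
from typing import Optional

# Assumed cut-offs in mmol/L; the patent does not state its critical values.
FASTING_IMPAIRED = 6.1
FASTING_DIABETES = 7.0
POSTPRANDIAL_IMPAIRED = 7.8
POSTPRANDIAL_DIABETES = 11.1

GRADES = ("normal", "impaired glucose regulation", "diabetic range")


def grade(value: Optional[float], impaired: float, diabetic: float) -> int:
    """Return 0 (normal), 1 (impaired) or 2 (diabetic range) for one index value."""
    if value is None:          # index value not available
        return 0
    if value >= diabetic:
        return 2
    if value >= impaired:
        return 1
    return 0


def degree_of_illness(fasting: Optional[float] = None,
                      postprandial: Optional[float] = None) -> str:
    """Combine the first and/or second index value by taking the worse grade."""
    if fasting is None and postprandial is None:
        raise ValueError("at least one index value is required")
    worst = max(
        grade(fasting, FASTING_IMPAIRED, FASTING_DIABETES),
        grade(postprandial, POSTPRANDIAL_IMPAIRED, POSTPRANDIAL_DIABETES),
    )
    return GRADES[worst]
```

Taking the worse of the two grades is one way to read the "and/or" in step 104; a deployment would substitute whatever critical values and grading scheme its clinicians specify.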
With the diabetes prediction method of this embodiment, a numeric regression prediction model is created from the user features in the sample user data, the regression prediction model is used to determine the first physical examination index value for the target user's fasting blood glucose and the second physical examination index value for blood glucose at a preset time after a meal, and the first and/or second physical examination index value is used to determine whether the target user is ill and how severe the illness is. This makes the diagnosis more accurate and more complete, and allows timely, matched treatment according to the stage of diabetes, which helps contain progression of the disease.
Further, as a refinement and extension of the above embodiment, and to fully illustrate the implementation process of the embodiments of the application, another diabetes prediction method is provided. As shown in Fig. 2, the method includes:
201. Obtain sample user data from original health records and electronic medical record data.
For example, about 100 sample user records with complete user features are obtained from the original health records and electronic medical record data, and the sample user data is then analyzed further.
Two prediction modes are described below. One predicts the fasting blood glucose index value (steps 202a to 205a); the other predicts the 2-hour postprandial blood glucose index value (steps 202b to 205b).
202a. Take the fasting blood glucose value among the sample users' features as label information Y1, and take the sample users' target feature data other than the fasting blood glucose value and the 2-hour postprandial blood glucose value as feature information X1, to create a first model training set.
The user features are extracted from the sample user data using regular expressions. The target feature data includes at least one or more of the sample user's medical history data, hospitalization data, medication data, physical examination data, and health declaration data. For example, it may include the user's medical history records, hospitalization records, medication status, physical examination status, health declarations, and other related information.
The first model training set contains each item of feature information X1 and its corresponding label information Y1, that is, the fasting blood glucose values of sample users with different medical history records, hospitalization records, medication status, physical examination status, health declarations, and so on.
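As a rough sketch of step 202a, the snippet below extracts numeric features from a record with a regular expression and splits them into feature information X1 and label information Y1. The `key=value` record layout, the field names `fasting_glucose` and `glucose_2h`, and both function names are hypothetical; the patent does not describe its record format or its expressions.

```python
import re

# Hypothetical record layout: "name=value" pairs separated by semicolons.
FIELD = re.compile(r"(\w+)\s*=\s*(-?\d+(?:\.\d+)?)")


def extract_features(record_text):
    """Pull every numeric name=value pair out of one record."""
    return {name: float(value) for name, value in FIELD.findall(record_text)}


def build_training_set(records, label_key="fasting_glucose",
                       excluded=("fasting_glucose", "glucose_2h")):
    """Return (X1, Y1): features without either glucose value, and the label."""
    features, labels = [], []
    for text in records:
        row = extract_features(text)
        if label_key not in row:   # skip records that lack the label
            continue
        labels.append(row[label_key])
        features.append({k: v for k, v in row.items() if k not in excluded})
    return features, labels
```

Calling `build_training_set(records, label_key="glucose_2h")` would give the analogous X2 and Y2 split used in step 202b.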
203a. Using the first model training set, train a first identification model for determining the first physical examination index value, based on a preset regression prediction algorithm.
The preset regression prediction algorithm is obtained by fusing four algorithms: random forest (Random Forest), gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT), Xgboost, and LightGBM. The first identification model is evaluated with the mean absolute percentage error (MAPE). When the MAPE of the first identification model is less than a preset standard comparison threshold, the model is determined to meet the evaluation standard. MAPE measures the error between the model's predicted values and the true values. Common regression evaluation metrics are MAE, MSE, RMSE, and MAPE. MAE, MSE, and RMSE consider only the magnitude of the error, whereas MAPE also considers the ratio of the error to the true value. It is calculated as:
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{X_i - Y_i}{X_i}\right|$$
In the formula above, N is the total number of samples, X is the measured value, and Y is the simulated (predicted) value. The smaller the MAPE, the smaller the error between the model's predicted values and the true values. In a specific embodiment, the standard comparison threshold can be set according to the actual situation; when the MAPE is less than the standard comparison threshold, the first identification model meets the evaluation standard. Predicting with an identification model that meets the evaluation standard helps ensure the accuracy of the prediction result.
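The MAPE check can be illustrated with a few lines of Python. The function names and the example threshold are assumptions for illustration; the patent leaves the standard comparison threshold to be set according to the actual situation.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent, of predictions against measured values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for x, y in zip(measured, predicted):
        if x == 0:
            raise ValueError("MAPE is undefined when a measured value is 0")
        total += abs((x - y) / x)
    return 100.0 * total / len(measured)


def meets_evaluation_standard(measured, predicted, threshold_percent):
    """True when the MAPE is below the (assumed) standard comparison threshold."""
    return mape(measured, predicted) < threshold_percent
```

For measured values `[5.0, 10.0]` and predictions `[4.5, 11.0]`, each prediction is off by 10% of the measured value, so the MAPE is 10%.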
The first identification model that meets the evaluation standard determines a first mapping relationship between the feature information X1 and the label information Y1.
To illustrate how the first identification model is trained with the preset regression prediction algorithm fused from the four algorithms above, as an optional approach the process may include:
(1) Using random sampling, obtain a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set. For example, randomly draw n training samples from the first model training set, and repeat the draw for four rounds to obtain four training sets. (The four training sets are independent of one another, and elements may repeat.)
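Step (1) amounts to four independent random draws of n samples each. A minimal sketch, assuming sampling with replacement (one reading of "elements may repeat"); the function name and the `seed` parameter are illustrative only.

```python
import random


def draw_training_sets(training_set, n, rounds=4, seed=None):
    """Draw `rounds` independent samples of size n, with replacement, from training_set."""
    rng = random.Random(seed)
    return [rng.choices(training_set, k=n) for _ in range(rounds)]
```

Each of the four returned lists would then be handed to one of the four training algorithms in step (2).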
(2) Train a first classifier on the first training sample set with the random forest algorithm; train a second classifier on the second training sample set with the GBDT algorithm; train a third classifier on the third training sample set with the Xgboost algorithm; and train a fourth classifier on the fourth training sample set with the LightGBM algorithm.
Each training sample set contains different feature information X1 and the corresponding label information Y1. Each of the four classifiers is trained with its own model training algorithm, and each trained classifier can predict a user's diabetes on its own: given the feature data of the user under test (corresponding to feature information X1), the classifier finds the corresponding label information Y1.
For the first classifier, the training process is: 1. From the first training sample set, select m samples by bootstrap sampling (random sampling with replacement), and repeat for n rounds to generate n training sets. 2. Train n decision tree models, one on each of the n training sets (they can be built with existing algorithms such as ID3, C4.5, or CART). 3. For a single decision tree model, assuming the number of training sample features is n, choose the best feature at each split according to information gain, information gain ratio, or the Gini index. 4. Each tree keeps splitting in this way until all training samples at a node belong to the same class; no pruning is needed while the tree is split. 5. The generated decision trees form the random forest. For a regression problem, the final prediction is the mean of the predictions of the individual trees, and this is the prediction of the first classifier.
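The bootstrap-and-average idea behind the first classifier can be sketched in plain Python. To stay short, the sketch uses depth-1 regression trees (stumps) split on squared error rather than the fully grown, unpruned trees the text describes, and it draws a random feature subset per tree, which is standard random forest practice but is not stated in the text. All function names are illustrative.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_indices):
    """Fit a depth-1 regression tree: best (feature, threshold) by squared error."""
    best = None
    for f in feature_indices:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:               # no valid split: predict the mean everywhere
        m = mean(targets)
        return lambda row: m
    _, f, threshold, lm, rm = best
    return lambda row: lm if row[f] <= threshold else rm


def fit_forest(rows, targets, n_trees=10, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]      # sampling with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Regression output: the mean of the individual trees' predictions."""
    return mean(tree(row) for tree in forest)
```

In practice a library implementation would replace the stumps with full trees; the part that carries over is that the regression output is the mean over trees.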
For the second classifier, the training process is as follows. The input is the second training sample set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, the maximum number of iterations T, and the loss function L. The output is the strong learner f(x):
A) Initialize the weak learner:
$$f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c)$$
where c is a constant.
B) For iteration rounds t = 1, 2, ..., T:
a) For samples i = 1, 2, ..., m, compute the negative gradient:
$$r_{ti} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$
b) Fit a CART regression tree to $(x_i, r_{ti})$, i = 1, 2, ..., m, to obtain the t-th regression tree, whose leaf node regions are $R_{tj}$, j = 1, 2, ..., J, where J is the number of leaf nodes of regression tree t.
c) For leaf regions j = 1, 2, ..., J, compute the best-fit value:
$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L(y_i, f_{t-1}(x_i) + c)$$
d) Update the strong learner:
$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$
where $I(x \in R_{tj})$ indicates whether x belongs to the training samples of leaf node region $R_{tj}$.
C) Obtain the expression of the strong learner f(x):
$$f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T}\sum_{j=1}^{J} c_{tj}\, I(x \in R_{tj})$$
The second classifier is obtained by training based on this strong learner f(x).
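The boosting loop for the second classifier can be sketched for the special case of squared loss, where the negative gradient is simply the residual and the best-fit leaf value is the mean residual in the leaf. The sketch assumes a single numeric feature and depth-1 CART trees; these simplifications and all function names are illustrative, not taken from the patent.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """One-feature regression stump: returns (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:      # drop the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:                            # all x equal: constant leaf
        m = mean(residuals)
        return (float("inf"), m, m)
    return best[1:]


def stump_value(stump, x):
    threshold, lv, rv = stump
    return lv if x <= threshold else rv


def fit_gbdt(xs, ys, rounds=20):
    """Gradient boosting with squared loss: each round fits a stump to the residuals."""
    f0 = mean(ys)                               # constant minimising the squared loss
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # negative gradient
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + stump_value(stump, x) for p, x in zip(preds, xs)]
    return f0, stumps


def predict_gbdt(model, x):
    """Strong learner: initial constant plus the sum of all stump outputs."""
    f0, stumps = model
    return f0 + sum(stump_value(s, x) for s in stumps)
```

No learning rate is applied, matching the update rule as written; library implementations usually shrink each tree's contribution.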
For the third classifier, the training process is:
A) Establish an initial model, with the following formula:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where k indexes the trees (K trees in total), F denotes the tree structures that can be built, $x_i$ is the i-th sample, and the sum of the scores of $x_i$ on each tree is the predicted value $\hat{y}_i$ of $x_i$.
The objective function of the initial model is
$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$
where $y_i$ is the actual sample value corresponding to $x_i$, l is the loss function, and $\Omega$ is the regularization term on each tree.
B) As trees are added, recursion over t rounds gives the final objective function
$$Obj^{(t)} \approx \sum_{j=1}^{T}\left[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2\right] + \gamma T$$
where $I_j$ is the set of all samples contained in the j-th leaf, $w_j$ is the weight of the j-th leaf, and $\gamma T$ is the penalty on the number of leaves T. As in the standard XGBoost derivation, $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the previous round's prediction, and $\lambda$ is the regularization coefficient on the leaf weights.
C) Fit the initial model to the data of the third training sample set, and use the final objective function to measure how well the model fits the training data (the loss is computed from the objective function; the smaller the loss, the better the model fits the training data), until the bias and variance of the model meet the standard requirements. The model finally trained is the third classifier.
For the fourth classifier, the training process is:
A) Fit the data in the fourth training sample set with the existing LightGBM algorithm, and after each fit test the resulting model on a test set selected from the fourth training sample set, obtaining the corresponding coefficient of determination and mean square error;
B) When the coefficient of determination is greater than a certain threshold and the mean square error is less than a certain threshold, the fitted model is determined to meet the standard, and the model that meets the standard is taken as the fourth classifier.
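The acceptance test for the fourth classifier depends only on two metrics, which can be computed without any particular boosting library. A minimal sketch; the function names and the threshold values in the usage note are assumptions, since the text says only "a certain threshold".

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def coefficient_of_determination(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    avg = sum(actual) / len(actual)
    ss_tot = sum((a - avg) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def fit_is_acceptable(actual, predicted, min_r2, max_mse):
    """True when R^2 exceeds min_r2 and the MSE is below max_mse."""
    return (coefficient_of_determination(actual, predicted) > min_r2
            and mean_squared_error(actual, predicted) < max_mse)
```

For example, `fit_is_acceptable(test_labels, model_outputs, min_r2=0.9, max_mse=0.1)` would accept a fitted model only when both conditions hold on the held-out test set.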
(3) Finally, fuse the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first identification model.
The fusion is performed by voting, that is, by majority rule. For example, after the medical history data, hospitalization data, medication data, physical examination data, and health declaration data of the user under test are input to the four classifiers, if the fasting blood glucose values predicted by three of the four classifiers meet the standard for diabetes, the user under test can be determined to have diabetes. If the fasting blood glucose value predicted by only one classifier meets the standard for diabetes and the values predicted by the other three do not, the user under test can be determined not to have diabetes.
It should be noted that if the MAPE of the first identification model is greater than the preset standard comparison threshold, the trained first identification model does not meet the evaluation standard. In that case the first model training set can be re-divided to obtain a new first, second, third, and fourth training sample set. The first classifier is then trained further on the new first training sample set, the second classifier on the new second training sample set, the third classifier on the new third training sample set, and the fourth classifier on the new fourth training sample set. The four newly trained classifiers are fused again, and the MAPE of the new first identification model is checked against the preset standard comparison threshold. If it is still greater than the threshold, the process of re-dividing the model training set and retraining the classifiers is repeated until the MAPE of the latest first identification model is less than the preset standard comparison threshold, that is, until the model meets the evaluation standard.
204a. Input the target user's feature information into the first identification model and perform similarity matching against the feature information X1.
The target user's feature information corresponds to the target user's target feature data other than the fasting blood glucose value and the 2-hour postprandial blood glucose value.
As an optional approach, step 204a may include: applying data cleaning, feature extraction, missing value filling, and outlier handling to the target user's feature information to obtain feature information in the form of structured data; and performing similarity matching between this structured feature information and the feature information X1.
The target user's feature information sometimes contains useless data, missing values, and/or abnormal values, which makes it unstructured data that is not suitable for direct prediction with the first identification model. The target user's feature information can therefore first be cleaned to remove useless data (for example, removing the user's current place of residence and registered household location, and keeping only medical history data, hospitalization data, medication data, physical examination data, health declaration data, and the like). Features are then extracted from the retained data (for example, medical history data, hospitalization data, medication data, physical examination data, and health declaration data). If the extracted feature data contains missing values, they can be filled with 0 (for example, if the height and weight items are empty in the user's physical examination data, they are filled with 0, which keeps the data comparable when it is later matched against feature information X1 in the model and avoids matching failures). If the extracted feature data contains abnormal values, they can be corrected with reference to the actual situation (for example, a hospitalization duration of 99999 days is clearly abnormal; the correct duration can be calculated from the hospitalization start and end times and the value corrected accordingly).
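The cleaning steps can be sketched as a single function over one raw record. The field names, the plausibility limit for a hospital stay, and the choice to treat a record as a dictionary are all hypothetical; the patent describes the steps but not a data layout.

```python
from datetime import date

# Hypothetical fields kept after cleaning; everything else is treated as useless data.
KEPT_FIELDS = ("medical_history", "hospital_days", "medication_count",
               "height_cm", "weight_kg")
MAX_PLAUSIBLE_STAY_DAYS = 3650   # assumed limit for flagging an abnormal stay


def preprocess(raw):
    """Clean one raw record into structured feature data."""
    # Data cleaning and feature extraction: keep only the relevant fields.
    features = {key: raw.get(key) for key in KEPT_FIELDS}

    # Outlier handling: recompute an implausible stay from the admission dates.
    stay = features["hospital_days"]
    if stay is not None and not 0 <= stay <= MAX_PLAUSIBLE_STAY_DAYS:
        start, end = raw.get("admitted"), raw.get("discharged")
        features["hospital_days"] = (end - start).days if start and end else None

    # Missing value filling: replace every missing value with 0.
    return {key: 0 if value is None else value for key, value in features.items()}
```

A record with `hospital_days=99999`, an admission on 2019-01-01 and a discharge on 2019-01-11 would come out with `hospital_days` corrected to 10, and any empty height or weight set to 0.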
In this optional approach, data cleaning, feature extraction, missing value filling, and outlier handling together ensure that the resulting structured data is comparable with the feature information X1 in the first identification model. This avoids matching failures, removes abnormal values, and improves the accuracy of feature matching.
205a. Use the feature information X1 whose similarity is greater than a preset threshold and is the highest, together with the first mapping relationship, to determine the first physical examination index value corresponding to the target user.
The preset threshold can be set in advance according to actual needs. The larger the preset threshold, the higher the feature matching precision; a similarity of 100% means the features match exactly.
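Step 205a can be read as a nearest-match lookup with a similarity floor. The sketch below uses cosine similarity over numeric feature vectors; the patent does not name its similarity measure, so that choice, the default threshold, and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_index_value(target, training_pairs, threshold=0.9):
    """Return the label of the most similar feature vector above threshold, else None.

    training_pairs is a list of (feature_vector, label) tuples, i.e. (X1, Y1).
    """
    best_score, best_label = threshold, None
    for features, label in training_pairs:
        score = cosine_similarity(target, features)
        if score > best_score:
            best_score, best_label = score, label
    return best_label
```

Returning `None` when nothing clears the threshold leaves the caller to decide what to do with an unmatched user, a case the text does not address.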
For example, after the target user's medical history data, hospitalization data, medication data, physical examination data, and health declaration data are input to the first identification model, the data is in effect input to each of the four classifiers of step 203a and matched by similarity against the feature information of each classifier. Each classifier finds the feature information that is most similar and above a certain threshold, and then finds the corresponding physical examination index value, that is, a fasting blood glucose value for the target user. If three of the four fasting blood glucose values meet the standard for diabetes, the target user can be determined to have diabetes, and the average of those three values is taken as the first physical examination index value calculated by the first identification model. If two of the four values do not meet the standard for diabetes and the other two do, the average of all four values is taken as the first physical examination index value calculated by the first identification model, and whether the target user has diabetes is determined from this average.
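The voting rule described for fusing the four classifiers' fasting blood glucose predictions can be sketched as below. The 7.0 mmol/L cut-off is an assumed stand-in for "the standard for diabetes", which the text does not quantify. The text covers the three-to-one and two-to-two cases; returning the average of the majority group when the majority is negative, and handling a four-to-zero vote the same way as three-to-one, are assumptions.

```python
from statistics import mean

ASSUMED_DIABETES_CUTOFF = 7.0   # mmol/L, illustrative only


def fuse_predictions(values, cutoff=ASSUMED_DIABETES_CUTOFF):
    """Fuse per-classifier predictions into (index_value, has_diabetes)."""
    positive = [v for v in values if v >= cutoff]
    negative = [v for v in values if v < cutoff]
    if len(positive) > len(negative):       # majority meets the standard
        return mean(positive), True
    if len(positive) < len(negative):       # majority does not meet the standard
        return mean(negative), False
    fused = mean(values)                    # tie: decide from the overall average
    return fused, fused >= cutoff
```

With predictions `[7.5, 8.1, 7.2, 6.0]` the three values above the cut-off win the vote and their average, 7.6, becomes the first physical examination index value.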
Step 202b, parallel to step 202a: use the postprandial two-hour blood glucose value in the user features of the sample users as label information Y2, use the target feature data of the sample users other than the fasting blood glucose value and the postprandial two-hour blood glucose value as feature information X2, and create a second model training set.
It should be noted that step 202b is similar to step 202a. The target feature data of a sample user includes at least one or more of the sample user's medical history data, hospitalization data, medication data, physical examination data, and health disclosure data. The second model training set so created contains each item of feature information X2 and its corresponding label information Y2, that is, the postprandial two-hour blood glucose values of sample users with different medical histories, hospitalization records, medication use, physical examination results, health disclosures, and so on.
203b. Using the second model training set, train a second identification model for judging the second physical examination index value based on a preset regression prediction algorithm.
The second identification model is likewise evaluated with the MAPE index. When the MAPE value of the second identification model is less than a preset standard comparison threshold, the second identification model is determined to meet the evaluation standard, and the second mapping relationship between feature information X2 and label information Y2 can be determined from the second identification model that meets the evaluation standard.
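For reference, the mean absolute percentage error over n samples is 100/n times the sum of |(actual − predicted) / actual|. The sketch below computes it and applies the acceptance test described above. The function names and the 10% threshold used in the usage note are illustrative assumptions; the description does not give a numeric value for the comparison threshold.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    # Blood glucose values are strictly positive, so dividing by `a` is safe here.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def meets_evaluation_standard(actual, predicted, threshold_percent):
    """True when the model's MAPE is below the preset comparison threshold."""
    return mape(actual, predicted) < threshold_percent
```

For example, `meets_evaluation_standard(measured, predicted, 10.0)` would accept a model whose predictions deviate from the measured values by less than 10% on average.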
Optionally, step 203b may specifically include:
(1) obtaining a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling;
(2) training a fifth classifier on the fifth training sample set using the random forest algorithm; training a sixth classifier on the sixth training sample set using the GBDT algorithm; training a seventh classifier on the seventh training sample set using the Xgboost algorithm; and training an eighth classifier on the eighth training sample set using the LightGBM algorithm;
(3) fusing the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second identification model.
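The sampling and fusion structure of steps (1) to (3) can be sketched in plain Python. This is only a structural illustration under stated assumptions: `train_mean_learner` is a trivial stand-in for the four real algorithms (random forest, GBDT, Xgboost, LightGBM), sampling with replacement is assumed for the random sampling, and the fusion shown is a plain average of the learners' numeric outputs rather than the voting procedure the description goes on to explain. All names are hypothetical.

```python
import random
from statistics import mean


def draw_training_subsets(training_set, n_subsets=4, seed=0):
    """Draw n_subsets bootstrap samples (sampling with replacement)."""
    rng = random.Random(seed)
    size = len(training_set)
    return [[rng.choice(training_set) for _ in range(size)] for _ in range(n_subsets)]


def train_mean_learner(subset):
    """Stand-in learner: always predicts the mean label of its own subset."""
    label_mean = mean(label for _, label in subset)
    return lambda features: label_mean


def fuse(learners):
    """Fuse learners by averaging their numeric predictions."""
    return lambda features: mean(learner(features) for learner in learners)


# Hypothetical (feature vector, postprandial two-hour glucose) training pairs.
training_set = [((1.0, 0.0), 9.8), ((0.0, 1.0), 12.4), ((1.0, 1.0), 11.6), ((0.5, 0.5), 10.2)]
subsets = draw_training_subsets(training_set)
second_model = fuse([train_mean_learner(subset) for subset in subsets])
prediction = second_model((0.7, 0.3))
```

In a real system each subset would be passed to a different training algorithm; the stand-in keeps the example self-contained.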
Similar to the optional implementation in step 203a, the fusion is again performed by voting, that is, by majority rule. For example, after the medical history data, hospitalization data, medication data, physical examination data, and health disclosure data of the user under test are input into these four classifiers, if the postprandial two-hour blood glucose values predicted by three of the four classifiers meet the criterion for diabetes, the user under test can be determined to have diabetes; if the value predicted by only one classifier meets the criterion and the values predicted by the other three do not, the user under test can be determined not to have diabetes.
It should be noted that if the MAPE value of the second identification model is greater than the preset standard comparison threshold, the trained second identification model does not meet the evaluation standard. In that case the second model training set can be re-divided to obtain a new fifth training sample set, sixth training sample set, seventh training sample set, and eighth training sample set. The fifth classifier then continues training on the new fifth training sample set, the sixth classifier on the new sixth training sample set, the seventh classifier on the new seventh training sample set, and the eighth classifier on the new eighth training sample set. The four newly trained classifiers are fused again, and it is determined whether the MAPE value of the new second identification model is less than the preset standard comparison threshold. If it is still greater than the threshold, the process of re-dividing the model training set and retraining the classifiers is repeated until the MAPE value of the latest second identification model is less than the preset standard comparison threshold, that is, until the model meets the evaluation standard.
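The re-divide-and-retrain loop described above can be sketched as a small driver. The callables `train_once` and `evaluate_mape` are hypothetical placeholders for the re-sampling, training, and fusion steps and for the MAPE evaluation; the `max_rounds` cap is an added safeguard that the description does not mention.

```python
def train_until_acceptable(train_once, evaluate_mape, threshold_percent, max_rounds=10):
    """Retrain until the fused model's MAPE drops below the threshold.

    train_once(round_index) -> model   (re-divides the training set and retrains)
    evaluate_mape(model)    -> float   (MAPE of the fused model, in percent)
    """
    for round_index in range(1, max_rounds + 1):
        model = train_once(round_index)
        score = evaluate_mape(model)
        if score < threshold_percent:
            # The evaluation standard is met: stop retraining.
            return model, score, round_index
    raise RuntimeError("evaluation standard not met within max_rounds")
```

The loop exits on the first round whose MAPE is below the threshold, which matches the stopping condition stated in the text.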
204b. Input the feature information of the target user into the second identification model and perform similarity matching against the feature information X2.
In this step, the feature information of the target user corresponds to the target user's target feature data other than the fasting blood glucose value and the postprandial two-hour blood glucose value.
Optionally, step 204b may specifically include: subjecting the feature information of the target user to data cleaning, feature extraction, missing-value filling, and outlier handling to obtain feature information in the form of structured data; and performing similarity matching between the structured feature information and the feature information X2.
Similar to the optional implementation in step 204a, the series of processing steps in this optional implementation (data cleaning, feature extraction, missing-value filling, and outlier handling) yields comparable structured data for matching against the feature information X2 in the second identification model. This avoids unmatchable-data errors during feature matching, removes outliers, and improves the accuracy of feature matching.
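Two of the preprocessing steps named above, missing-value filling and outlier handling, can be illustrated on a single numeric feature. This is a sketch under assumptions: the description does not specify the fill strategy or the outlier rule, so median filling and a fixed plausible range are used here, outliers are masked before filling so they do not skew the fill value, and the function names are hypothetical.

```python
from statistics import median


def mask_outliers(values, lower, upper):
    """Replace values outside [lower, upper] with None so they are treated as missing."""
    return [v if v is not None and lower <= v <= upper else None for v in values]


def fill_missing(values):
    """Fill None entries with the median of the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values available to compute a fill value")
    fill_value = median(present)
    return [fill_value if v is None else v for v in values]


def preprocess(values, lower, upper):
    """Mask implausible values, then fill every gap."""
    return fill_missing(mask_outliers(values, lower, upper))
```

For a hypothetical column `[5.0, None, 6.0, 99.0, 7.0]` with a plausible range of 1.0 to 40.0, the value 99.0 is masked and both gaps are filled with the median of the remaining values.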
205b. Determine the second physical examination index value of the target user by using the feature information X2 that has the highest similarity, with that similarity greater than a predetermined threshold, together with the second mapping relationship.
The predetermined threshold can be set in advance according to actual needs. For example, the larger the predetermined threshold, the higher the corresponding feature-matching precision; a similarity of 100% indicates an exact feature match.
For example, after the target user's medical history data, hospitalization data, medication data, physical examination data, and health disclosure data are input into the second identification model, these data are in effect fed into each of the four classifiers of step 203b and matched for similarity against the feature information of each classifier. Each classifier finds the most similar feature information whose similarity exceeds the threshold and then yields a corresponding physical examination index value, namely a postprandial two-hour blood glucose value of the target user. If three of these four postprandial two-hour blood glucose values meet the criterion for diabetes, the target user can be determined to have diabetes, and the average of these three values is taken as the second physical examination index value calculated by the second identification model. If two of the four values do not meet the criterion and the other two do, the average of all four values is taken as the second physical examination index value calculated by the second identification model, and whether the target user has diabetes is determined from this average.
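The matching in steps 204b and 205b (highest similarity above a threshold, then a lookup through the mapping relationship) can be sketched as follows. Cosine similarity is an assumption, since the description does not name a similarity measure; the mapping is represented as a hypothetical list of (feature vector, label) pairs, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_index_value(target_features, mapping, threshold):
    """Return the label of the most similar stored feature vector.

    mapping: list of (feature_vector, label) pairs standing in for the
    mapping relationship between feature information and label information.
    Returns None when no stored vector is similar enough.
    """
    if not mapping:
        return None
    best_similarity, best_label = max(
        ((cosine_similarity(target_features, features), label) for features, label in mapping),
        key=lambda pair: pair[0],
    )
    return best_label if best_similarity > threshold else None
```

Raising `threshold` makes the match stricter, which corresponds to the remark above that a larger threshold gives higher matching precision.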
206. Determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
Optionally, step 206 may specifically include: if the first physical examination index value of the target user is greater than or equal to a first preset threshold and/or the second physical examination index value is greater than or equal to a second preset threshold, determining that the target user has diabetes; and then judging the disease severity of the target user from the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls.
The first preset threshold is determined from the established criterion for diagnosing diabetes by fasting blood glucose, for example 7.0 mmol/L; the second preset threshold is determined from the established criterion for diagnosing diabetes by postprandial two-hour blood glucose, for example 11.1 mmol/L.
For example, if the first physical examination index value of the target user is determined to be 8.0 mmol/L and the second physical examination index value is 7.6 mmol/L, the target user can be determined to have diabetes because the first physical examination index value is greater than the first preset threshold. If the first physical examination index value is 5.7 mmol/L and the second physical examination index value is 11.9 mmol/L, the target user can be determined to have diabetes because the second physical examination index value is greater than the second preset threshold. If the first physical examination index value is 8.3 mmol/L and the second physical examination index value is 11.7 mmol/L, the target user can be determined to have diabetes because the first physical examination index value is greater than the first preset threshold and the second physical examination index value is greater than the second preset threshold.
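The threshold test of step 206 reduces to a short predicate. The sketch below uses the example thresholds given in the description (7.0 mmol/L fasting, 11.1 mmol/L postprandial two-hour); the function name and the handling of a missing value are illustrative assumptions.

```python
def has_diabetes(fasting=None, postprandial=None,
                 fasting_threshold=7.0, postprandial_threshold=11.1):
    """Apply the 'greater than or equal to either threshold' rule (values in mmol/L)."""
    checks = []
    if fasting is not None:
        checks.append(fasting >= fasting_threshold)
    if postprandial is not None:
        checks.append(postprandial >= postprandial_threshold)
    if not checks:
        raise ValueError("at least one index value is required")
    # "and/or" in the text: one value at or above its threshold is sufficient.
    return any(checks)
```

Applied to the three worked examples, `has_diabetes(8.0, 7.6)`, `has_diabetes(5.7, 11.9)`, and `has_diabetes(8.3, 11.7)` all evaluate to `True`.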
The severity of diabetes is discussed below in three cases:
(1) Judging by the first physical examination index value only, that is, judging the disease severity of the target user from the first numerical interval in which the first physical examination index value falls. This may specifically include: dividing the range above the first preset threshold into multiple numerical intervals that increase according to a predetermined rule; creating a third mapping relationship between the multiple numerical intervals and diabetes severity; determining the first numerical interval, among the multiple numerical intervals, that corresponds to the first physical examination index value; and judging the first diabetes severity of the target user according to the third mapping relationship and the first numerical interval.
For example, the numerical intervals above the first preset threshold of 7.0 mmol/L and the diabetes severities assigned to them by the third mapping relationship are set as follows: mild diabetes, 7.0 to 8.4 mmol/L; moderate diabetes, 8.4 to 10.1 mmol/L; severe diabetes, greater than 10.1 mmol/L. If the first physical examination index value is determined to be 9.6 mmol/L, the first numerical interval in which it falls is 8.4 to 10.1 mmol/L, and according to the third mapping relationship and the first numerical interval the target user's diabetes severity can be judged to be moderate.
(2) Judging by the second physical examination index value only, that is, judging the disease severity of the target user from the second numerical interval in which the second physical examination index value falls. This specifically includes: dividing the range above the second preset threshold into multiple numerical intervals that increase according to a predetermined rule; creating a fourth mapping relationship between the multiple numerical intervals and diabetes severity; determining the second numerical interval, among the multiple numerical intervals, that corresponds to the second physical examination index value; and judging the second diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval.
For example, the numerical intervals above the second preset threshold of 11.1 mmol/L and the diabetes severities assigned to them by the fourth mapping relationship are set as follows: moderate diabetes, 11.1 to 16.7 mmol/L; severe diabetes, greater than 16.7 mmol/L (ketoacidosis tends to occur above 16.7 mmol/L). If the second physical examination index value is determined to be 12.6 mmol/L, the second numerical interval in which it falls is 11.1 to 16.7 mmol/L, and according to the fourth mapping relationship and the second numerical interval the target user's diabetes severity can be judged to be moderate.
(3) Making a combined judgment from both the first physical examination index value and the second physical examination index value (because this approach takes several factors into account, its prediction precision is relatively high), that is, judging the disease severity of the target user from the first numerical interval in which the first physical examination index value falls and the second numerical interval in which the second physical examination index value falls. This specifically includes: if the first diabetes severity and the second diabetes severity are the same, taking that shared severity as the final severity. If the first diabetes severity and the second diabetes severity differ, obtaining a first weight for the first identification model and a second weight for the second identification model from the accuracy rate or adoption rate fed back by users for the two prediction approaches; when the first weight is greater than the second weight, taking the first diabetes severity as the disease severity of the target user; and when the second weight is greater than the first weight, taking the second diabetes severity as the disease severity of the target user.
In this embodiment, the weights of the two prediction approaches can be set from the accuracy rate or adoption rate fed back by users. Specifically, the weight values corresponding to different accuracy rates or adoption rates can be compiled statistically, and the weight for a prediction approach is then looked up through the resulting mapping. In this embodiment, the accuracy rate or adoption rate fed back by users indicates which prediction approach is more precise, so the result of the more precise approach can be selected as the final determination, which is more accurate. Alternatively, the weights of the two prediction approaches can be preset manually according to the actual situation.
For example, if user feedback shows that predicting diabetes severity from the first physical examination index value is more accurate, a weight of 70% can be configured for the prediction approach based on the first physical examination index value and a weight of 30% for the approach based on the second physical examination index value. When the two predictions differ, the result predicted from the first physical examination index value is returned to the target user as the final diagnostic result. Suppose the first physical examination index value predicts moderate diabetes and the second predicts severe diabetes; according to the configured weights, the target user's diabetes severity is finally determined to be moderate.
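The three cases above can be sketched with an interval lookup and a weight comparison. The band boundaries are the example values from the description; which band a value lying exactly on a boundary belongs to, the behavior when only one value is in the diabetic range, and the error on equal weights are assumptions, because the text does not cover those cases. The names are hypothetical.

```python
from bisect import bisect_right

# (lower bound in mmol/L, severity) pairs, sorted by lower bound.
FASTING_BANDS = [(7.0, "mild"), (8.4, "moderate"), (10.1, "severe")]
POSTPRANDIAL_BANDS = [(11.1, "moderate"), (16.7, "severe")]


def severity(value, bands):
    """Return the severity band containing value, or None below the first bound."""
    lower_bounds = [lower for lower, _ in bands]
    position = bisect_right(lower_bounds, value)
    # Assumption: a value equal to a boundary falls into the higher band.
    return bands[position - 1][1] if position else None


def combined_severity(fasting, postprandial, first_weight, second_weight):
    """Case (3): agree when possible, otherwise follow the higher-weighted model."""
    first = severity(fasting, FASTING_BANDS)
    second = severity(postprandial, POSTPRANDIAL_BANDS)
    if first == second:
        return first
    if first is None or second is None:
        # Assumption: if only one value is in the diabetic range, use its band.
        return first or second
    if first_weight > second_weight:
        return first
    if second_weight > first_weight:
        return second
    raise ValueError("equal weights: the description defines no tie-break")
```

With weights of 0.7 and 0.3, a fasting value in the moderate band and a postprandial value in the severe band resolve to moderate, as in the worked example.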
After the target user's actual fasting blood glucose value and postprandial two-hour blood glucose value are subsequently obtained, they can also be added as new training samples to continue training the two identification models of this embodiment, so as to improve prediction precision. With the diabetes prediction method above, the mapping relationship between feature information and label information can be determined by training on the model training sets; the structured data of the target user is matched against the regression prediction model, and the mapping relationships then give the first physical examination index value for fasting blood glucose and the second physical examination index value for postprandial two-hour blood glucose. Comparing these values with the first preset threshold and the second preset threshold determines whether the user has diabetes. The method therefore not only predicts from diabetes diagnostic indicators whether the user is ill, but also judges the target user's disease severity from the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls, making the diagnostic result more complete.
Further, as a concrete implementation of the methods shown in Fig. 1 and Fig. 2, an embodiment of the present application provides a diabetes prediction device. As shown in Fig. 3, the device includes an acquiring unit 31, a creating unit 32, a judging unit 33, and a determining unit 34.
The acquiring unit 31 may be used to obtain sample user data from original health records and electronic medical record data;
The creating unit 32 may be used to create a numerical regression prediction model according to the user features in the sample user data;
The judging unit 33 may be used to judge, using the regression prediction model, a first physical examination index value for the target user's fasting blood glucose and a second physical examination index value for the target user's blood glucose at a preset duration after a meal;
The determining unit 34 may be used to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
In a specific application scenario, in order to create the numerical regression prediction model according to the user features in the sample user data, as shown in Fig. 4, the creating unit 32 may specifically include a creating module 321, a training module 322, and a determining module 323.
The creating module 321 may specifically be used to take the fasting blood glucose value in the user features as label information Y1, take the target feature data of the sample users other than the fasting blood glucose value and the postprandial two-hour blood glucose value as feature information X1, and create a first model training set, where the target feature data includes at least one or more of the sample user's medical history data, hospitalization data, medication data, physical examination data, and health disclosure data;
The training module 322 may specifically be used to train, using the first model training set and based on a preset regression prediction algorithm, a first identification model for judging the first physical examination index value, where the preset regression prediction algorithm is obtained by fusing four algorithms (random forest, gradient boosting decision tree GBDT, Xgboost, and LightGBM), the first identification model is evaluated with the mean absolute percentage error (MAPE) index, and when the MAPE value of the first identification model is less than a preset standard comparison threshold, the first identification model is determined to meet the evaluation standard;
The determining module 323 may specifically be used to determine, by means of the first identification model that meets the evaluation standard, the first mapping relationship between the feature information X1 and the label information Y1;
The creating module 321 may further be used to take the postprandial two-hour blood glucose value in the user features as label information Y2, take the target feature data of the sample users as feature information X2, and create a second model training set;
The training module 322 may further be used to train, using the second model training set and based on the preset regression prediction algorithm, a second identification model for judging the second physical examination index value, where the second identification model is evaluated with the MAPE index, and when the MAPE value of the second identification model is less than the preset standard comparison threshold, the second identification model is determined to meet the evaluation standard;
The determining module 323 may further be used to determine, by means of the second identification model that meets the evaluation standard, the second mapping relationship between the feature information X2 and the label information Y2.
Correspondingly, in order to judge the first physical examination index value for the target user's fasting blood glucose and the second physical examination index value for blood glucose at a preset duration after a meal, as shown in Fig. 4, the judging unit 33 may specifically include a matching module 331 and a determining module 332.
The matching module 331 may specifically be used to input the feature information of the target user into the first identification model and perform similarity matching against the feature information X1, where the feature information of the target user corresponds to the target user's target feature data other than the fasting blood glucose value and the postprandial two-hour blood glucose value;
The determining module 332 may specifically be used to determine the first physical examination index value of the target user by using the feature information X1 that has the highest similarity, with that similarity greater than a preset threshold, together with the first mapping relationship;
The matching module 331 may further be used to input the feature information of the target user into the second identification model and perform similarity matching against the feature information X2;
The determining module 332 may further be used to determine the second physical examination index value of the target user by using the feature information X2 that has the highest similarity, with that similarity greater than a predetermined threshold, together with the second mapping relationship.
In a specific application scenario, in order to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value, as shown in Fig. 4, the determining unit 34 may specifically include a determining module 341 and a judging module 342.
The determining module 341 may be used to determine that the target user has diabetes if the first physical examination index value of the target user is greater than or equal to the first preset threshold and/or the second physical examination index value is greater than or equal to the second preset threshold;
The judging module 342 may be used to judge the disease severity of the target user from the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls.
In a specific application scenario, in order to judge the disease severity of the target user accurately, the judging module 342 is specifically further used to: divide the range above the first preset threshold into multiple numerical intervals that increase according to a predetermined rule; create a third mapping relationship between the multiple numerical intervals and diabetes severity; determine the first numerical interval, among the multiple numerical intervals, that corresponds to the first physical examination index value; and judge the diabetes severity of the target user according to the third mapping relationship and the first numerical interval. It is likewise used to: divide the range above the second preset threshold into multiple numerical intervals that increase according to a predetermined rule; create a fourth mapping relationship between the multiple numerical intervals and diabetes severity; determine the second numerical interval, among the multiple numerical intervals, that corresponds to the second physical examination index value; and judge the diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval;
The judging module 342 is specifically further used to: if the first diabetes severity and the second diabetes severity differ, obtain a first weight for the first identification model and a second weight for the second identification model from the accuracy rate or adoption rate fed back by users for the two prediction approaches based on the first identification model and the second identification model; when the first weight is greater than the second weight, take the first diabetes severity as the disease severity of the target user; and when the second weight is greater than the first weight, take the second diabetes severity as the disease severity of the target user.
In a specific application scenario, the matching module 331 may specifically be used to subject the feature information of the target user to data cleaning, feature extraction, missing-value filling, and outlier handling to obtain feature information in the form of structured data, and to perform similarity matching between the structured feature information and the feature information X1;
The matching module 331 may also specifically be used to subject the feature information of the target user to data cleaning, feature extraction, missing-value filling, and outlier handling to obtain feature information in the form of structured data, and to perform similarity matching between the structured feature information and the feature information X2.
In a specific application scenario, the training module 322 may specifically be used to: obtain a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set by random sampling; train a first classifier on the first training sample set using the random forest algorithm; train a second classifier on the second training sample set using the GBDT algorithm; train a third classifier on the third training sample set using the Xgboost algorithm; train a fourth classifier on the fourth training sample set using the LightGBM algorithm; and fuse the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first identification model;
The training module 322 may further be used to: obtain a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling; train a fifth classifier on the fifth training sample set using the random forest algorithm; train a sixth classifier on the sixth training sample set using the GBDT algorithm; train a seventh classifier on the seventh training sample set using the Xgboost algorithm; train an eighth classifier on the eighth training sample set using the LightGBM algorithm; and fuse the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second identification model.
It should be noted that each functional unit involved by a kind of prediction meanss of diabetes provided in this embodiment is other Corresponding description, can be referring to figs. 1 to the corresponding description in Fig. 2, and details are not described herein.
Based on above-mentioned method as depicted in figs. 1 and 2, correspondingly, the embodiment of the present application also provides a kind of storage medium, On be stored with computer program, which realizes the above-mentioned prediction such as Fig. 1 and diabetes shown in Fig. 2 when being executed by processor Method.
Based on this understanding, the technical solution of the application can be embodied in the form of software products, which produces Product can store in a non-volatile memory medium (can be CD-ROM, USB flash disk, mobile hard disk etc.), including some instructions With so that computer equipment (can be personal computer, server or the network equipment an etc.) execution the application is each The method of implement scene.
Based on above-mentioned method as shown in Figure 1 and Figure 2 and Fig. 3, virtual bench embodiment shown in Fig. 4, in order to realize Above-mentioned purpose, the embodiment of the present application also provides a kind of computer equipments, are specifically as follows personal computer, server, network Equipment etc., the entity device include storage medium and processor;Storage medium, for storing computer program;Processor is used for Computer program is executed to realize the prediction technique of above-mentioned diabetes as depicted in figs. 1 and 2.
Optionally, which can also include user interface, network interface, camera, radio frequency (Radio Frequency, RF) circuit, sensor, voicefrequency circuit, WI-FI module etc..User interface may include display screen (Display), input unit such as keyboard (Keyboard) etc., optional user interface can also connect including USB interface, card reader Mouthful etc..Network interface optionally may include standard wireline interface and wireless interface (such as blue tooth interface, WI-FI interface).
It will be understood by those skilled in the art that computer equipment structure provided in this embodiment is not constituted and is set to the entity Standby restriction may include more or fewer components, perhaps combine certain components or different component layouts.
It can also include operating system, network communication module in non-volatile readable storage medium.Operating system is management The program of the entity device hardware and software resource of the prediction of diabetes, support message handling program and other softwares and/or The operation of program.Network communication module for realizing the communication between component each inside non-volatile readable storage medium, and It is communicated between hardware and softwares other in the entity device.
Through the above description of the embodiments, those skilled in the art can be understood that the application can borrow It helps software that the mode of necessary general hardware platform is added to realize, hardware realization can also be passed through.Pass through the skill of application the application Art scheme, compared with currently available technology, the application can be on the basis of detecting target user with diabetes, further Judge the severity of illness, diagnostic result can be made more perfect, and then the state of an illness hair for understanding target user can be tracked in time Situation is opened up, and carries out corresponding mating treatment.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required to implement the application. Those skilled in the art will also understand that the modules of a device in an implementation scenario may be distributed in that device as described, or may be changed accordingly and located in one or more devices different from those of this implementation scenario. The modules of the above implementation scenario may be merged into one module or further split into multiple submodules.
The serial numbers used in the application above are for description only and do not indicate the relative merits of the implementation scenarios. What is disclosed above is only several specific implementation scenarios of the application; the application is not limited to them, and any change that a person skilled in the art can conceive of shall fall within the protection scope of the application.

Claims (10)

1. A method for predicting diabetes, characterized by comprising:
obtaining sample user data from original health records and electronic medical record data;
creating a numeric regression prediction model according to user features in the sample user data;
using the regression prediction model to determine a first physical examination index value for a target user's fasting blood glucose and a second physical examination index value for the target user's blood glucose a preset duration after a meal;
determining the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
2. The method according to claim 1, characterized in that the user features are extracted from the sample user data using regular expressions, and the preset duration is two hours;
creating the numeric regression prediction model according to the user features in the sample user data specifically comprises:
taking the fasting blood glucose value in the user features as label information Y1, and taking target feature data of the sample users other than the fasting blood glucose value and the two-hour postprandial blood glucose value as feature information X1, to create a first model training set, wherein the target feature data includes at least one or more of the sample users' medical history data, hospitalization data, medication data, physical examination data, and health disclosure data;
training, with the first model training set and based on a preset regression prediction algorithm, a first identification model for determining the first physical examination index value, wherein the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), Xgboost, and LightGBM; the first identification model is evaluated using the mean absolute percentage error (MAPE) index; when the MAPE value corresponding to the first identification model is less than a preset standard comparison threshold, the first identification model is determined to meet the evaluation standard; and the first identification model that meets the evaluation standard can determine a first mapping relationship between the feature information X1 and the label information Y1;
taking the two-hour postprandial blood glucose value in the user features as label information Y2, and taking the target feature data of the sample users as feature information X2, to create a second model training set;
training, with the second model training set and based on the preset regression prediction algorithm, a second identification model for determining the second physical examination index value, wherein the second identification model is evaluated using the MAPE index; when the MAPE value corresponding to the second identification model is less than the preset standard comparison threshold, the second identification model is determined to meet the evaluation standard; and the second identification model that meets the evaluation standard can determine a second mapping relationship between the feature information X2 and the label information Y2.
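The MAPE acceptance check described in claim 2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `mape` and `meets_evaluation_standard` are hypothetical, and the threshold is passed in by the caller because the claim does not state its value.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, which holds for blood glucose readings.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def meets_evaluation_standard(actual, predicted, threshold_pct):
    """Return True when the model's MAPE is below the preset threshold.

    threshold_pct is a hypothetical parameter; the claim only says the
    threshold is preset.
    """
    return mape(actual, predicted) < threshold_pct
```

Under this reading, a model would be evaluated on held-out fasting or two-hour postprandial glucose values and accepted only if `meets_evaluation_standard` returns True.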
3. The method according to claim 2, characterized in that using the regression prediction model to determine the first physical examination index value for the target user's fasting blood glucose and the second physical examination index value for blood glucose the preset duration after a meal specifically comprises:
inputting the feature information of the target user into the first identification model to perform similarity matching with the feature information X1, wherein the feature information of the target user corresponds to the target feature data other than the fasting blood glucose value and the two-hour postprandial blood glucose value;
determining the first physical examination index value corresponding to the target user by using the feature information X1 whose similarity is greater than a preset threshold and is the highest, together with the first mapping relationship;
inputting the feature information of the target user into the second identification model to perform similarity matching with the feature information X2;
determining the second physical examination index value corresponding to the target user by using the feature information X2 whose similarity is greater than the preset threshold and is the highest, together with the second mapping relationship.
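One literal reading of the matching step in claim 3 is sketched below: the target user's feature vector is compared with each stored feature vector, and the label of the most similar one above a threshold is returned. Cosine similarity, the function names, and the threshold value are all assumptions made for illustration; the claim does not name a similarity measure, and with a trained tree ensemble this step would more likely be a single predict call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_u * norm_v)


def match_index_value(target, samples, threshold=0.9):
    """Return the label of the most similar sample above the threshold.

    samples is a list of (feature_vector, label) pairs standing in for the
    mapping between feature information and label information.
    Returns None when no sample is similar enough.
    """
    best_label, best_similarity = None, threshold
    for features, label in samples:
        similarity = cosine_similarity(target, features)
        if similarity > best_similarity:
            best_label, best_similarity = label, similarity
    return best_label
```

The same function could serve both models: called once with (X1, Y1) pairs for the fasting value and once with (X2, Y2) pairs for the two-hour postprandial value.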
4. The method according to claim 3, characterized in that determining the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value specifically comprises:
if the first physical examination index value corresponding to the target user is greater than or equal to a first preset threshold and/or the second physical examination index value is greater than or equal to a second preset threshold, determining that the target user has diabetes;
judging the disease severity of the target user by the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls.
5. The method according to claim 4, characterized in that judging the disease severity of the target user by the first numerical interval in which the first physical examination index value falls specifically comprises:
dividing the range above the first preset threshold into multiple numerical intervals that increase according to a predetermined rule;
creating a third mapping relationship between the multiple numerical intervals and diabetes severity levels;
determining the first numerical interval, among the multiple numerical intervals, that corresponds to the first physical examination index value;
judging a first diabetes severity of the target user according to the third mapping relationship and the first numerical interval;
judging the disease severity of the target user by the second numerical interval in which the second physical examination index value falls specifically comprises:
dividing the range above the second preset threshold into multiple numerical intervals that increase according to a predetermined rule;
creating a fourth mapping relationship between the multiple numerical intervals and diabetes severity levels;
determining the second numerical interval, among the multiple numerical intervals, that corresponds to the second physical examination index value;
judging a second diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval;
judging the disease severity of the target user by both the first numerical interval in which the first physical examination index value falls and the second numerical interval in which the second physical examination index value falls specifically comprises:
if the first diabetes severity differs from the second diabetes severity, obtaining a first weight corresponding to the first identification model and a second weight corresponding to the second identification model, according to the accuracy rate or adoption rate that users have fed back for the two prediction modes of the first identification model and the second identification model;
when the first weight is greater than the second weight, determining the first diabetes severity as the disease severity of the target user;
when the second weight is greater than the first weight, determining the second diabetes severity as the disease severity of the target user.
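The interval lookup and weight-based conflict resolution of claims 4 and 5 can be sketched as below. The threshold, interval width, and severity labels are hypothetical values chosen for illustration, and equal-width intervals are only one possible "predetermined rule". The claim does not say what happens when the two weights are equal; this sketch falls back to the first severity in that case.

```python
def severity_from_value(value, threshold, step, labels):
    """Map an index value to a severity label.

    Values below the threshold yield None (no diabetes indicated).
    Values at or above it fall into equal-width intervals of size step,
    and anything beyond the last interval takes the last label.
    """
    if value < threshold:
        return None
    index = int((value - threshold) // step)
    return labels[min(index, len(labels) - 1)]


def resolve_severity(first, second, weight_first, weight_second):
    """Pick one severity when the two models disagree.

    The weights stand for the user-feedback accuracy or adoption rates of
    the two identification models. Ties go to the first model, which is an
    assumption of this sketch.
    """
    if first is None:
        return second
    if second is None or first == second:
        return first
    return first if weight_first >= weight_second else second
```

For example, with an assumed fasting threshold of 7.0 and a step of 2.0, a predicted value of 9.5 falls into the second interval, and a disagreement between the two models is settled in favor of whichever has the higher feedback weight.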
6. The method according to claim 3, characterized in that inputting the feature information of the target user into the first identification model to perform similarity matching with the feature information X1 specifically comprises:
subjecting the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier handling, to obtain feature information in the form of structured data;
performing similarity matching between the structured feature information and the feature information X1;
inputting the feature information of the target user into the second identification model to perform similarity matching with the feature information X2 specifically comprises:
subjecting the feature information of the target user to data cleaning, feature extraction, missing value filling, and outlier handling, to obtain feature information in the form of structured data;
performing similarity matching between the structured feature information and the feature information X2.
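The missing value filling and outlier handling named in claim 6 can be sketched as below. The claim does not specify how values are filled or how outliers are treated, so filling from precomputed reference values (for example, training-set medians) and clipping to a plausible range are assumptions, as are the function and field names.

```python
def preprocess_features(raw, fill_values, bounds):
    """Turn one user's raw feature dict into a fixed-order numeric vector.

    raw:         feature name -> value, where a value may be missing or None
    fill_values: feature name -> value used when the feature is missing
    bounds:      feature name -> (low, high) plausible range; values outside
                 the range are treated as outliers and clipped to it
    """
    structured = []
    for name in sorted(fill_values):  # fixed ordering gives a stable layout
        value = raw.get(name)
        if value is None:
            value = fill_values[name]
        low, high = bounds[name]
        structured.append(min(max(float(value), low), high))
    return structured
```

The resulting list has the same layout for every user, which is what a later matching or prediction step would need.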
7. The method according to claim 2, characterized in that training, with the first model training set and based on the preset regression prediction algorithm, the first identification model for determining the first physical examination index value specifically comprises:
obtaining a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set by random sampling;
training a first classifier on the first training sample set using the random forest algorithm;
training a second classifier on the second training sample set using the GBDT algorithm;
training a third classifier on the third training sample set using the Xgboost algorithm;
training a fourth classifier on the fourth training sample set using the LightGBM algorithm;
fusing the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first identification model;
training, with the second model training set and based on the preset regression prediction algorithm, the second identification model for determining the second physical examination index value specifically comprises:
obtaining a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling;
training a fifth classifier on the fifth training sample set using the random forest algorithm;
training a sixth classifier on the sixth training sample set using the GBDT algorithm;
training a seventh classifier on the seventh training sample set using the Xgboost algorithm;
training an eighth classifier on the eighth training sample set using the LightGBM algorithm;
fusing the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second identification model.
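The sampling and fusion structure of claim 7 can be sketched as below. To keep the example self-contained, the four base learners are replaced by a trivial stand-in (`mean_trainer`); in practice each trainer would wrap a random forest, GBDT, Xgboost, or LightGBM regressor. Sampling with replacement and fusing by simple averaging are assumptions, since the claim only says "random sampling" and "bagging method".

```python
import random


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement (an assumed sampling scheme)."""
    indices = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in indices], [y[i] for i in indices]


def train_fused_model(trainers, X, y, seed=0):
    """Train each base learner on its own random sample and average them.

    trainers is a list of callables (X, y) -> predict_fn, one per base
    algorithm. Returns a predict function for a single feature vector.
    """
    rng = random.Random(seed)
    models = [train(*bootstrap_sample(X, y, rng)) for train in trainers]

    def predict(x):
        return sum(model(x) for model in models) / len(models)

    return predict


def mean_trainer(X, y):
    """Stand-in base learner that always predicts the mean training label."""
    average = sum(y) / len(y)
    return lambda x: average
```

Calling `train_fused_model([mean_trainer] * 4, X, y)` mirrors the four-sample, four-model layout of the claim; swapping in real regressors only changes the trainer callables.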
8. A device for predicting diabetes, characterized by comprising:
an acquiring unit, configured to obtain sample user data from original health records and electronic medical record data;
a creating unit, configured to create a numeric regression prediction model according to user features in the sample user data;
a judging unit, configured to use the regression prediction model to determine a first physical examination index value for a target user's fasting blood glucose and a second physical examination index value for the target user's blood glucose a preset duration after a meal;
a determining unit, configured to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
9. A non-volatile readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method for predicting diabetes according to any one of claims 1 to 7.
10. A computer device, comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, characterized in that the processor implements the method for predicting diabetes according to any one of claims 1 to 7 when executing the program.
CN201910185079.2A 2019-03-12 2019-03-12 Prediction technique and device, storage medium, the computer equipment of diabetes Pending CN110197720A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910185079.2A CN110197720A (en) 2019-03-12 2019-03-12 Prediction technique and device, storage medium, the computer equipment of diabetes
PCT/CN2019/117217 WO2020181805A1 (en) 2019-03-12 2019-11-11 Diabetes prediction method and apparatus, storage medium, and computer device

Publications (1)

Publication Number Publication Date
CN110197720A true CN110197720A (en) 2019-09-03

Family

ID=67751751

Country Status (2)

Country Link
CN (1) CN110197720A (en)
WO (1) WO2020181805A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164454A (en) * 2020-10-10 2021-01-01 联仁健康医疗大数据科技股份有限公司 Diagnosis prediction method and device and electronic equipment
CN113057586B (en) * 2021-03-17 2024-03-12 上海电气集团股份有限公司 Disease early warning method, device, equipment and medium
CN113035357A (en) * 2021-04-06 2021-06-25 昆明医科大学第一附属医院 Diabetic kidney disease risk assessment system
CN113113134A (en) * 2021-04-07 2021-07-13 闵东 Clinical etiology prejudgment device and system
CN113488166A (en) * 2021-07-28 2021-10-08 联仁健康医疗大数据科技股份有限公司 Diabetes data analysis model training and data management method, device and equipment
CN113808744A (en) * 2021-09-22 2021-12-17 河北工程大学 Diabetes risk prediction method, device, equipment and storage medium
CN116189896B (en) * 2023-04-24 2023-08-08 北京快舒尔医疗技术有限公司 Cloud-based diabetes health data early warning method and system
CN117112729A (en) * 2023-08-21 2023-11-24 北京科文思数据管理有限公司 Medical resource docking method and system based on artificial intelligence
CN117494688B (en) * 2023-12-29 2024-03-29 深圳智能思创科技有限公司 Form information extraction method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN2850518Y (en) * 2005-10-24 2006-12-27 北京软测科技有限公司 Portable diabetes condition monitoring apparatus
US20150347707A1 (en) * 2014-05-30 2015-12-03 Anthony Michael Albisser Computer-Implemented System And Method For Improving Glucose Management Through Cloud-Based Modeling Of Circadian Profiles
CN109378072A (en) * 2018-10-13 2019-02-22 中山大学 A kind of abnormal fasting blood sugar method for early warning based on integrated study Fusion Model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11144825B2 (en) * 2016-12-01 2021-10-12 University Of Southern California Interpretable deep learning framework for mining and predictive modeling of health care data
CN106682412A (en) * 2016-12-22 2017-05-17 浙江大学 Diabetes prediction method based on medical examination data
CN109308545B (en) * 2018-08-21 2023-07-07 中国平安人寿保险股份有限公司 Method, device, computer equipment and storage medium for predicting diabetes probability
CN110197720A (en) * 2019-03-12 2019-09-03 平安科技(深圳)有限公司 Prediction technique and device, storage medium, the computer equipment of diabetes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
武士敏 et al.: "《实用全科护理学》" (Practical General Nursing), 30 April 2017, 吉林科学技术出版社 (Jilin Science and Technology Press) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181805A1 (en) * 2019-03-12 2020-09-17 平安科技(深圳)有限公司 Diabetes prediction method and apparatus, storage medium, and computer device
CN111429289B (en) * 2020-03-23 2023-03-24 平安医疗健康管理股份有限公司 Single disease identification method and device, computer equipment and storage medium
CN111429289A (en) * 2020-03-23 2020-07-17 平安医疗健康管理股份有限公司 Single disease identification method and device, computer equipment and storage medium
CN111599470A (en) * 2020-04-23 2020-08-28 中国科学院上海技术物理研究所 Method for improving near-infrared noninvasive blood glucose detection precision
CN111710420A (en) * 2020-05-15 2020-09-25 深圳先进技术研究院 Complication morbidity risk prediction method, system, terminal and storage medium based on electronic medical record big data
CN111710420B (en) * 2020-05-15 2024-03-19 深圳先进技术研究院 Complication onset risk prediction method, system, terminal and storage medium based on electronic medical record big data
WO2021151273A1 (en) * 2020-05-26 2021-08-05 平安科技(深圳)有限公司 Disease prediction method and apparatus, electronic device, and storage medium
CN111739646A (en) * 2020-06-22 2020-10-02 平安医疗健康管理股份有限公司 Data verification method and device, computer equipment and readable storage medium
CN111657873A (en) * 2020-07-07 2020-09-15 四川长虹电器股份有限公司 Physical constitution prediction method based on visible light and near infrared spectrum technology
CN111797284A (en) * 2020-07-08 2020-10-20 北京康健德科技有限公司 Graph database construction method and device, electronic equipment and storage medium
CN112382394A (en) * 2020-11-05 2021-02-19 苏州麦迪斯顿医疗科技股份有限公司 Event processing method and device, electronic equipment and storage medium
CN113658704A (en) * 2021-09-17 2021-11-16 平安国际智慧城市科技股份有限公司 Diabetes risk prediction device, apparatus and storage medium
CN113796852A (en) * 2021-09-30 2021-12-17 太原理工大学 Diabetes foot prediction method based on gradient lifting decision tree model algorithm
CN113796852B (en) * 2021-09-30 2023-09-08 太原理工大学 Diabetes foot prediction method based on gradient lifting decision tree model algorithm
CN114242247A (en) * 2021-12-30 2022-03-25 吉林大学第一医院 Non-obese MAFLD prediction system, device and storage medium

Also Published As

Publication number Publication date
WO2020181805A1 (en) 2020-09-17

Similar Documents

Publication Publication Date Title
CN110197720A (en) Prediction technique and device, storage medium, the computer equipment of diabetes
US7809660B2 (en) System and method to optimize control cohorts using clustering algorithms
McFall et al. Quantifying the information value of clinical assessments with signal detection theory
US8793144B2 (en) Treatment effect prediction system, a treatment effect prediction method, and a computer program product thereof
CN110197724A (en) Predict the method, apparatus and computer equipment in diabetes illness stage
CN110197728A (en) Prediction technique, device and the computer equipment of diabetes
WO2021151295A1 (en) Method, apparatus, computer device, and medium for determining patient treatment plan
Koehl et al. Landmark-free geometric methods in biological shape analysis
CN102405473A (en) A point-of-care enactive medical system and method
De Falco et al. A genetic programming-based regression for extrapolating a blood glucose-dynamics model from interstitial glucose measurements and their first derivatives
Nagpal et al. Auton-survival: An open-source package for regression, counterfactual estimation, evaluation and phenotyping with censored time-to-event data
WO2016006042A1 (en) Data analysis device, control method for data analysis device, and control program for data analysis device
Hezarjaribi et al. Human-in-the-loop learning for personalized diet monitoring from unstructured mobile data
Cheng et al. Classification models for pulmonary function using motion analysis from phone sensors
US11961204B2 (en) State visualization device, state visualization method, and state visualization program
Cheng et al. Mining discriminative patterns to predict health status for cardiopulmonary patients
Liu et al. Methods for estimating and interpreting provider-specific standardized mortality ratios
WO2021122345A1 (en) Aortic stenosis classification
Johnson Mortality prediction and acuity assessment in critical care
Gyuk et al. Diabetes lifestyle support with improved glycemia prediction algorithm
KR102550465B1 (en) Artificial intelligence-based virtual patient management system
US20220406017A1 (en) Health management system, and human body information display method and human body model generation method applied to same
Xu The Application of Machine Learning-Based Prediction Models for Cardiometabolic Risk Among a Representative US Adult Population: A Cross-Sectional Study of NHANES 1999-2006
Priya et al. Multi Modal Smart Diagnosis of Pulmonary Diseases
Shamsuddin Analyzing and Synthesizing Healthcare Time Series Data for Decision-Support

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190903
