CN110197720A - Diabetes prediction method and device, storage medium, and computer equipment - Google Patents
Diabetes prediction method and device, storage medium, and computer equipment
- Publication number
- CN110197720A (application number CN201910185079.2A)
- Authority
- CN
- China
- Prior art keywords
- physical examination
- index value
- training
- diabetes
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
This application discloses a diabetes prediction method and device, a storage medium, and computer equipment in the field of computer technology. It addresses a limitation of the prior art, which can only determine whether a user has diabetes and cannot determine how severe the condition is. The method includes: obtaining sample user data from original health archives and electronic medical record data; creating a numerical regression prediction model from the user features in the sample user data; using the regression prediction model to determine a first physical examination index value, the target user's fasting blood glucose, and a second physical examination index value, the target user's blood glucose at a preset duration after a meal; and determining the severity of the target user's condition from the first and/or second physical examination index value. The application is suitable for predicting diabetes and for determining its severity.
Description
Technical field
This application relates to the field of computer technology, and in particular to a diabetes prediction method and device, a storage medium, and computer equipment.
Background art
Diabetes is a group of metabolic diseases characterized by hyperglycemia. It damages large blood vessels and capillaries and endangers multiple organs, including the heart, brain, kidneys, peripheral nerves, eyes, and feet. It is also often accompanied by multiple complications, so better diabetes prediction is essential. Advances in science and technology mean that disease diagnosis is no longer limited to a doctor's own analysis, and using artificial intelligence to predict diabetes is in line with current trends.
The common industry approach to diabetes prediction is to collect diabetes cases and compare patient data with data from the healthy population. A 0-1 (binary) classification model is then built from the patients' data across various feature dimensions to judge whether a user has diabetes.
However, existing diabetes prediction methods can only judge whether a patient has diabetes. They cannot judge how severe the condition is, so the diagnostic result is incomplete and treatment cannot be matched to the severity of the disease. This in turn may allow the patient's condition to worsen.
Summary of the invention
In view of this, this application provides a diabetes prediction method and device, a storage medium, and computer equipment. The main purpose is to solve a problem that arises when diabetes is predicted with a 0-1 classification model: the model can only judge whether the user has diabetes, not the severity of the condition, which leaves the diagnostic result incomplete.
According to one aspect of this application, a diabetes prediction method is provided, comprising:
obtaining sample user data from original health archives and electronic medical record data;
creating a numerical regression prediction model according to the user features in the sample user data;
using the regression prediction model to determine a first physical examination index value of the target user's fasting blood glucose and a second physical examination index value of the target user's blood glucose at a preset duration after a meal;
determining the severity of the target user's condition according to the first physical examination index value and/or the second physical examination index value.
According to a further aspect of this application, a diabetes prediction device is provided, comprising:
an acquiring unit, configured to obtain sample user data from original health archives and electronic medical record data;
a creating unit, configured to create a numerical regression prediction model according to the user features in the sample user data;
a judging unit, configured to use the regression prediction model to determine the first physical examination index value of the target user's fasting blood glucose and the second physical examination index value of the target user's blood glucose at a preset duration after a meal;
a determination unit, configured to determine the severity of the target user's condition according to the first physical examination index value and/or the second physical examination index value.
According to another aspect of this application, a non-volatile readable storage medium is provided. A computer program is stored on it, and the program implements the above diabetes prediction method when executed by a processor.
According to yet another aspect of this application, computer equipment is provided. It includes a non-volatile readable storage medium, a processor, and a computer program that is stored on the storage medium and can run on the processor. The processor implements the above diabetes prediction method when executing the program.
The technical solution above can be compared with the current approach of predicting diabetes with a 0-1 classification model. Building on existing diabetes prediction models, the diabetes prediction method and device, storage medium, and computer equipment provided by this application add regression prediction models for fasting blood glucose and 2-hour postprandial blood glucose. The regression prediction model determines the first physical examination index value, the target user's fasting blood glucose, and the second physical examination index value, the target user's blood glucose at a preset duration after a meal. These index values show whether the target user has diabetes and, further, how severe the condition is.
The above description is only an overview of the technical solution of this application. Specific embodiments are given below. They aim to make the technical means of this application clearer, so that they can be implemented according to the specification, and to make the above and other objects, features, and advantages of this application more apparent.
Description of the drawings
The drawings described herein provide a further understanding of this application and form part of it. The illustrative embodiments of this application and their description explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of a diabetes prediction method provided by an embodiment of this application;
Fig. 2 is a schematic flowchart of another diabetes prediction method provided by an embodiment of this application;
Fig. 3 is a schematic structural diagram of a diabetes prediction device provided by an embodiment of this application;
Fig. 4 is a schematic structural diagram of another diabetes prediction device provided by an embodiment of this application.
Detailed description of the embodiments
This application is described in detail below with reference to the drawings and the embodiments. Where no conflict arises, the embodiments of this application and the features in them may be combined with each other.
A 0-1 classification model built for diabetes prediction cannot determine the severity of diabetes from user data. To address this problem, this embodiment provides a diabetes prediction method. As shown in Fig. 1, the method includes:
101. Obtain sample user data from original health archives and electronic medical record data.
The sample user data may include patient assessment data, physical examination index data, medication data, health declaration data, and the like.
102. Create a numerical regression prediction model according to the user features in the sample user data.
The user features may cover multiple feature dimensions, such as fasting blood glucose, 2-hour postprandial blood glucose, blood pressure, stratum corneum lipids, insulin, BMI (body mass index), diabetes hereditary information, age, and diagnosis results.
In a specific embodiment, the regression prediction model can be built from several decision-tree-based models with different frameworks. Following the idea of ensemble learning, multiple decision-tree-based prediction models are combined to improve the accuracy of the prediction results. The decision tree is one of the simpler supervised learning algorithms in machine learning. As a prediction model, it represents a mapping between object attributes and object values. Each node in the tree tests an attribute of the object, each branch represents a possible value of that attribute, and each leaf node holds the value of the object described by the path from the root node to that leaf. A decision tree has a single output; if multiple outputs are needed, an independent decision tree can be built for each output. Common decision tree algorithms include ID3, C4.5, and CART. All of them are greedy algorithms and differ mainly in how they measure the quality of a split: ID3 uses information gain, C4.5 uses the information gain ratio, and CART uses the Gini index for classification and squared error for regression.
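The following Python sketch illustrates the split criteria named above for one candidate split on a discrete feature. It computes information gain (ID3), information gain ratio (C4.5), and the weighted Gini index (CART). It is a minimal sketch for classification labels, and the function names and toy data are hypothetical; they do not come from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels (CART classification)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(feature, labels):
    """Return (information gain, gain ratio, weighted Gini) for splitting
    `labels` on the discrete values of `feature`."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    cond_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond_entropy             # ID3
    split_info = entropy(list(feature))                    # intrinsic value of the split
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0  # C4.5
    weighted_gini = sum(len(g) / n * gini(g) for g in groups.values())  # CART
    return info_gain, gain_ratio, weighted_gini
```

A feature that separates the classes perfectly, such as `split_scores(["high", "high", "low", "low"], [1, 1, 0, 0])`, gives an information gain and gain ratio of 1.0 and a weighted Gini index of 0.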
The resulting regression prediction model captures the fasting and 2-hour postprandial blood glucose values of sample users across different blood pressure, stratum corneum lipids, insulin, BMI, diabetes hereditary information, age, diagnosis results, and so on.
103. Use the regression prediction model to determine the first physical examination index value of the target user's fasting blood glucose and the second physical examination index value of the target user's blood glucose at a preset duration after a meal.
Here, the target user is the user whose diabetic condition is to be predicted. The first physical examination index value corresponds to the test result of the target user's fasting blood glucose. The second physical examination index value corresponds to the test result of the target user's blood glucose at the preset duration after a meal. The preset duration can be set according to actual needs.
In this embodiment, the model captures the fasting and 2-hour postprandial blood glucose values of sample users with different features. The target user's features are therefore matched against the sample users' features, and the fasting and 2-hour postprandial blood glucose values of the matching sample users are retrieved.
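The matching step can be sketched as a nearest-neighbour lookup. The text does not say which similarity measure is used, so Euclidean distance over numeric feature vectors is an assumption here. The function name `match_glucose` and the toy features (BMI, age) are also hypothetical.

```python
import math


def match_glucose(target, samples):
    """Return the (fasting, 2h-postprandial) glucose pair of the sample user
    whose feature vector is closest to `target` (Euclidean distance, assumed).

    `samples` is a list of (feature_vector, (fasting, postprandial_2h)) tuples.
    """
    closest = min(samples, key=lambda s: math.dist(target, s[0]))
    return closest[1]
```

For example, with samples `([25.0, 40.0], (5.2, 7.1))` and `([31.0, 60.0], (7.4, 12.3))`, a target of `[30.0, 58.0]` is matched to the second sample. In practice the features would need to be on comparable scales before distances are meaningful.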
104. Determine the severity of the target user's condition according to the first physical examination index value and/or the second physical examination index value.
In a specific application scenario, the first physical examination index value shows whether the target user's fasting blood glucose is normal. The second physical examination index value shows whether the target user's blood glucose at the preset duration after a meal is normal. When the first and/or second physical examination index value is abnormal, the user can be judged to have diabetes. Comparing the values with critical values then indicates the severity of the condition.
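The text does not give the critical values used to grade severity. The sketch below assumes the WHO 1999 cut-offs in mmol/L:

- Fasting: below 6.1 is normal, 6.1 to 6.9 is impaired fasting glucose, and 7.0 or above is diabetes.
- 2-hour postprandial: below 7.8 is normal, 7.8 to 11.0 is impaired glucose tolerance, and 11.1 or above is diabetes.

When both values are available, the sketch reports the worse of the two grades, mirroring the "and/or" wording of step 104. The three-level scale and the name `glycemic_status` are hypothetical.

```python
def glycemic_status(fasting=None, post_2h=None):
    """Grade glucose values (mmol/L) against WHO 1999 cut-offs (assumed).

    Either value may be omitted; the worse grade of those given is returned.
    """
    levels = []
    if fasting is not None:
        levels.append(2 if fasting >= 7.0 else 1 if fasting >= 6.1 else 0)
    if post_2h is not None:
        levels.append(2 if post_2h >= 11.1 else 1 if post_2h >= 7.8 else 0)
    if not levels:
        raise ValueError("at least one glucose value is required")
    return ("normal", "impaired", "diabetes")[max(levels)]
```

For example, `glycemic_status(5.5, 12.0)` returns `"diabetes"` because the postprandial value alone crosses the diabetes cut-off.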
The diabetes prediction method of this embodiment creates a numerical regression prediction model from the user features in the sample user data. It uses the model to determine the target user's first physical examination index value (fasting blood glucose) and second physical examination index value (blood glucose at a preset duration after a meal). From the first and/or second index value, it then determines whether the target user has diabetes and how severe the condition is. This makes the inference more accurate and the diagnosis more complete. It also supports timely treatment matched to the stage of diabetes, which helps curb disease progression.
Further, as a refinement and extension of the above embodiment, and to fully describe its implementation process, another diabetes prediction method is provided. As shown in Fig. 2, the method includes:
201. Obtain sample user data from original health archives and electronic medical record data.
For example, about 100 sample users with complete user features are obtained from the original health archives and electronic medical record data. Their data are then further analyzed and processed.
Two prediction modes are described below. One predicts using the fasting blood glucose value as the physical examination index value (the process shown in steps 202a to 205a). The other predicts using the 2-hour postprandial blood glucose value (the process shown in steps 202b to 205b).
202a. Create a first model training set. The fasting blood glucose value in the sample users' features serves as label information Y1. The sample users' target feature data, excluding the fasting and 2-hour postprandial blood glucose values, serve as feature information X1.
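A minimal sketch of building the first model training set follows. It assumes each sample user is a flat dict of numeric fields; the field names `fasting_glucose` and `postprandial_2h_glucose` are hypothetical. Both glucose fields are excluded from the features, and the `label` argument selects which one becomes Y. Passing the 2-hour field as `label` yields the training set for the second prediction mode (steps 202b to 205b).

```python
# Hypothetical field names for the two glucose measurements.
EXCLUDED = {"fasting_glucose", "postprandial_2h_glucose"}


def build_training_set(records, label="fasting_glucose"):
    """Split user records (dicts) into feature names, feature rows X, and labels Y.

    Both glucose fields are removed from the features, so the model cannot
    read the target directly; `label` selects which one becomes Y.
    """
    feature_names = sorted(k for k in records[0] if k not in EXCLUDED)
    X = [[r[k] for k in feature_names] for r in records]
    Y = [r[label] for r in records]
    return feature_names, X, Y
```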
The user features are extracted from the sample user data using regular expressions. The target feature data include at least one of the sample user's medical history data, hospitalization data, medication data, physical examination data, and health declaration data. Examples include information on the user's medical history, hospitalization records, medication, physical examination results, and health declarations.
The resulting first model training set contains each piece of feature information X1 and its corresponding label information Y1. In other words, it records the fasting blood glucose values of sample users with different medical histories, hospitalization records, medication, physical examination results, health declarations, and so on.
203a. Using a preset regression prediction algorithm, train on the first model training set a first identification model for determining the first physical examination index value.
The preset regression prediction algorithm fuses four algorithms: Random Forest, Gradient Boosting Decision Tree (GBDT), XGBoost, and LightGBM. The first identification model is evaluated with the mean absolute percentage error (MAPE). When its MAPE value is less than a preset standard comparison threshold, the model is deemed to meet the evaluation criterion. MAPE measures the error between a model's predicted values and the true values. Common regression evaluation metrics include MAE, MSE, RMSE, and MAPE. MAE, MSE, and RMSE consider only the magnitude of the error, whereas MAPE also considers the ratio of the error to the true value. It is calculated as:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{X_i - Y_i}{X_i} \right|$$

where N is the total number of samples, X_i is the measured value, and Y_i is the simulated (predicted) value. A smaller MAPE means a smaller error between the predicted and true values. In a specific embodiment, the standard comparison threshold can be set according to actual conditions. When MAPE is below this threshold, the first identification model meets the evaluation criterion. Predicting only with a model that meets the evaluation criterion helps ensure accurate prediction results.
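The MAPE check can be written directly from the formula, with X as the measured values and Y as the predicted values. The threshold of 10% is an assumed placeholder for the preset standard comparison threshold, which the text leaves to be set according to actual conditions.

```python
def mape(measured, simulated):
    """Mean absolute percentage error in percent (X = measured, Y = simulated)."""
    if not measured or len(measured) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(x == 0 for x in measured):
        raise ValueError("MAPE is undefined when a measured value is 0")
    n = len(measured)
    return 100.0 / n * sum(abs((x - y) / x) for x, y in zip(measured, simulated))


def meets_standard(measured, simulated, threshold=10.0):
    """True when MAPE is below the preset threshold (10% is an assumed value)."""
    return mape(measured, simulated) < threshold
```

Because each error is divided by the measured value, MAPE is undefined for zero measurements. This is not a practical concern for blood glucose, but the sketch guards against it anyway.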
A first identification model that meets the evaluation criterion defines a first mapping relationship between feature information X1 and label information Y1.
The training of the first identification model with the preset regression prediction algorithm, which fuses the four algorithms above, may, as an optional implementation, specifically include:
(1) Obtain a first, second, third, and fourth training sample set from the first model training set by random sampling. For example, n training samples are drawn at random from the first model training set in each of four rounds, giving four training sets. The four sets are mutually independent, and elements may repeat.
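Step (1) amounts to four independent draws of n samples with replacement. A minimal sketch using only the standard library follows; the name `draw_training_subsets` and its `seed` parameter are hypothetical.

```python
import random


def draw_training_subsets(dataset, n, rounds=4, seed=None):
    """Draw `rounds` independent subsets of size n, sampling with replacement,
    so the same element may appear more than once within or across subsets."""
    rng = random.Random(seed)
    return [rng.choices(dataset, k=n) for _ in range(rounds)]
```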
(2) Train four classifiers, one per training sample set:

- the first classifier with the random forest algorithm on the first training sample set;
- the second classifier with GBDT on the second training sample set;
- the third classifier with XGBoost on the third training sample set;
- the fourth classifier with LightGBM on the fourth training sample set.
Each training sample set contains different feature information X1 and the corresponding label information Y1. Each of the four classifiers is trained with its own model training algorithm and can predict the user's diabetes on its own. The data of the user to be tested, whose content corresponds to feature information X1, is input, and the classifier finds the corresponding label information Y1.
Training process of the first classifier:

1. Draw m samples from the first training sample set at random with replacement using the bootstrap method. Repeat the sampling n times to generate n training sets.
2. Train one decision tree model on each of the n training sets. The trees can be built with an existing algorithm such as ID3, C4.5, or CART.
3. For a single decision tree, assume the training samples have n features. At each split, select the best feature according to information gain, information gain ratio, or Gini index.
4. Keep splitting each tree until all training samples at a node belong to the same class. No pruning is performed while the tree grows.
5. The generated decision trees form the random forest. For a regression problem, the final prediction is the mean of the predictions of all trees, and this is the prediction result of the first classifier.
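The random forest procedure can be sketched from scratch as follows. The sketch makes these assumptions and departures from the text:

- Because the label (fasting blood glucose) is continuous, it replaces the classification criteria of step 3 with squared-error (variance) reduction, the usual regression-tree criterion.
- In addition to the pure-node rule of step 4, it stops at a depth limit as a safety guard.
- It samples `max_features` candidate features per split. This is a standard random forest ingredient that the text does not spell out; leaving it as `None` considers all features, as the text describes.

All names are hypothetical.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared deviations from the mean (regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, rng, k, depth=0, max_depth=6):
    """Grow an unpruned regression tree; each leaf stores the mean target."""
    if depth >= max_depth or len(y) < 2 or len(set(y)) == 1:
        return mean(y)
    best = None  # (impurity, feature, threshold, left indices, right indices)
    for f in rng.sample(range(len(X[0])), k):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant: make a leaf
        return mean(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], rng, k, depth + 1, max_depth)

    return (f, t, grow(left), grow(right))


def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Bagged regression trees; the forest predicts the mean of its trees."""
    rng = random.Random(seed)
    k = max_features or len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=len(X))  # bootstrap sample with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, k))
    return lambda x: mean(predict_tree(tree, x) for tree in trees)
```

The returned callable averages the trees' outputs, which corresponds to the regression rule in step 5.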
Training process of the second classifier: the input is the second training sample set T = {(x1, y1), (x2, y2), ..., (xm, ym)}, the maximum number of iterations T, and the loss function L. The output is the strong learner f(x).

A) Initialize the weak learner:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c)$$

where c is a constant.

B) For iteration rounds t = 1, 2, ..., T:

a) For samples i = 1, 2, ..., m, compute the negative gradient

$$r_{ti} = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{t-1}(x)}$$

b) Fit a CART regression tree to (x_i, r_ti), i = 1, 2, ..., m, to obtain the t-th regression tree. Its leaf node regions are R_tj, j = 1, 2, ..., J, where J is the number of leaf nodes of regression tree t.

c) For each leaf region j = 1, 2, ..., J, compute the best-fit value

$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L(y_i, f_{t-1}(x_i) + c)$$

d) Update the strong learner:

$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj} \, I(x \in R_{tj})$$

where I(x ∈ R_tj) is the indicator function. It equals 1 when x falls into leaf region R_tj, that is, the set of training samples in that leaf, and 0 otherwise.

C) The strong learner f(x) is then:

$$f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{j=1}^{J} c_{tj} \, I(x \in R_{tj})$$

The second classifier is obtained by training this strong learner f(x).
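The text leaves the loss L general. Assuming squared loss L = (y − f)²/2 simplifies each step:

- f_0 is the mean of y.
- The negative gradient r_ti is simply the residual y_i − f_{t−1}(x_i).
- Each best-fit leaf value c_tj is the mean residual in that leaf.

The sketch below uses one-split regression trees (stumps) and adds a shrinkage factor `learning_rate`; setting it to 1.0 reproduces the update in step d) exactly. Names are hypothetical.

```python
from statistics import mean


def fit_stump(X, r):
    """Fit a one-split CART regression tree to residuals r; leaves hold c_tj."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            cl, cr = mean(left), mean(right)  # squared loss: c_tj = mean residual
            err = sum((v - cl) ** 2 for v in left) + sum((v - cr) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, f, t, cl, cr)
    if best is None:  # no feature varies: a single leaf
        c = mean(r)
        return lambda x: c
    _, f, t, cl, cr = best
    return lambda x: cl if x[f] <= t else cr


def gbdt(X, y, n_iter=100, learning_rate=0.1):
    """Gradient boosting with squared loss L = (y - f)^2 / 2 (assumed)."""
    f0 = mean(y)  # argmin_c sum L(y_i, c) is the mean for squared loss
    trees = []

    def predict(x):
        return f0 + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_iter):
        # Negative gradient of squared loss = residual y_i - f_{t-1}(x_i).
        residuals = [yi - predict(xi) for xi, yi in zip(X, y)]
        trees.append(fit_stump(X, residuals))
    return predict
```

With `learning_rate=1.0`, a single stump already fits a clean step pattern exactly. Smaller values shrink each update and need more iterations.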
Training process of the third classifier:

A) Establish the initial model:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

where K is the number of trees, F is the space of tree structures being built, and x_i is the i-th sample. The predicted value \hat{y}_i of x_i is the sum of its scores on all trees.

The objective function of the initial model is

$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

where y_i is the actual value of sample x_i and T is the number of leaves of a tree.

B) As the trees grow, expanding the objective recursively at round t gives the final objective function

$$Obj^{(t)} \approx \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \right] + \gamma T$$

where I_j is the set of samples in the j-th leaf, w_j is the weight of the j-th leaf, γT penalizes the number of leaves T, and g_i and h_i are the first and second derivatives of l(y_i, \hat{y}_i^{(t-1)}) with respect to \hat{y}_i^{(t-1)}.

C) Fit the initial model to the third training sample set. Use the final objective function to measure how well the model fits the training data: the loss is computed from the objective function, and a smaller loss means a better fit. Training continues until the bias and variance of the model meet the standard requirements, which yields the third classifier.
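For a fixed tree structure, define G_j = Σ_{i∈I_j} g_i and H_j = Σ_{i∈I_j} h_i. Minimizing the final objective leaf by leaf gives the optimal weight w_j* = −G_j / (H_j + λ). Substituting it back gives the objective value −½ Σ_j G_j² / (H_j + λ) + γT. The sketch below evaluates these quantities. The squared-loss gradients (g_i = ŷ_i − y_i, h_i = 1) and the function names are assumptions for illustration, not the XGBoost library's API.

```python
def leaf_weights_and_objective(leaves, lam=1.0, gamma=0.0):
    """Optimal leaf weights w_j* and objective value for a fixed tree structure.

    `leaves` is a list of leaves; each leaf is a list of (g_i, h_i) pairs
    for the samples I_j that fall into it.
    """
    weights, obj = [], gamma * len(leaves)  # gamma * T, T = number of leaves
    for leaf in leaves:
        G = sum(g for g, _ in leaf)
        H = sum(h for _, h in leaf)
        weights.append(-G / (H + lam))
        obj -= 0.5 * G * G / (H + lam)
    return weights, obj


def squared_loss_grad(y, y_hat):
    """(g_i, h_i) pairs for l = (y - y_hat)^2 / 2: g_i = y_hat - y, h_i = 1."""
    return [(p - t, 1.0) for t, p in zip(y, y_hat)]
```

Comparing the objective of a one-leaf structure with that of a two-leaf structure shows whether a split pays for its extra γ penalty.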
Training process of the fourth classifier:

A) Fit the data in the fourth training sample set with the existing LightGBM algorithm. Test each fitted model on a test set selected from the fourth training sample set to obtain its coefficient of determination and mean squared error.

B) When the coefficient of determination exceeds a certain threshold and the mean squared error is below a certain threshold, the fitted model is deemed to meet the standard, and that model is taken as the fourth classifier.
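The acceptance test in step B) needs only the coefficient of determination R² and the mean squared error on the held-out test set, so it can be written independently of the LightGBM library. The thresholds `min_r2=0.8` and `max_mse=1.0` are hypothetical placeholders for the "certain thresholds" in the text.

```python
def r2_and_mse(y_true, y_pred):
    """Coefficient of determination R^2 and mean squared error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot, ss_res / n


def accept_model(y_true, y_pred, min_r2=0.8, max_mse=1.0):
    """Accept a fitted model when R^2 > min_r2 and MSE < max_mse (assumed thresholds)."""
    r2, mse = r2_and_mse(y_true, y_pred)
    return r2 > min_r2 and mse < max_mse
```

A model that always predicts the mean of the test labels has R² = 0 and is rejected, however small its MSE happens to be.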
(3) Finally, fuse the first, second, third, and fourth classifiers with the bagging method to obtain the first identification model.

The fusion works by voting: the majority decides. For example, the medical history, hospitalization, medication, physical examination, and health declaration data of the user to be tested are input into the four classifiers. If the fasting blood glucose values predicted by three of the four classifiers meet the criterion for diabetes, the user is determined to have diabetes. If only one classifier's predicted value meets the criterion and the other three do not, the user is determined not to have diabetes.
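The voting rule can be sketched as below. It assumes each classifier outputs a predicted fasting glucose value in mmol/L and that 7.0 mmol/L (the WHO diabetes cut-off) is the criterion. The text does not say how a two-two tie among four classifiers is resolved, so this sketch returns `None` for a tie rather than guessing. The function name is hypothetical.

```python
def vote_diabetes(fasting_predictions, cutoff=7.0):
    """Majority vote over the classifiers' predicted fasting glucose (mmol/L).

    Returns True (diabetes), False (no diabetes), or None on a tie.
    """
    votes = sum(1 for p in fasting_predictions if p >= cutoff)
    if votes * 2 == len(fasting_predictions):
        return None  # tie: resolution is not specified in the text
    return votes * 2 > len(fasting_predictions)
```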
It should be noted that if the MAPE value of the first identification model is greater than the preset standard comparison threshold, the trained first identification model does not meet the evaluation criterion. In that case the first model training set can be re-partitioned to obtain new first, second, third and fourth training sample sets. The first classifier then continues to be trained on the new first training sample set, the second classifier on the new second training sample set, the third classifier on the new third training sample set, and the fourth classifier on the new fourth training sample set. The four newly trained classifiers are fused again, and it is checked whether the MAPE value of the new first identification model is less than the preset standard comparison threshold. If it is still greater than the threshold, the process of re-partitioning the model training set and retraining the classifiers is repeated until the MAPE value of the latest first identification model is less than the preset standard comparison threshold, i.e., until the evaluation criterion is met.
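The evaluate-and-retrain loop can be sketched as below. MAPE is computed in the usual way, as 100 times the mean of |(y - ŷ) / y|, which assumes no true value is zero (glucose readings are positive). `train_once` is a hypothetical callable standing in for "re-partition the training set with a new random seed, retrain the four classifiers and fuse them". The `max_rounds` cap is an added assumption, since the text simply repeats until the criterion is met.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], float]


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def train_until_acceptable(train_once: Callable[[int], Model],
                           validation: Sequence[Tuple[Features, float]],
                           mape_threshold: float,
                           max_rounds: int = 20) -> Tuple[Model, float, int]:
    """Retrain on a fresh partition each round until MAPE < mape_threshold.

    Returns (model, its MAPE, number of rounds used).
    """
    y_true = [y for _, y in validation]
    for round_no in range(max_rounds):
        model = train_once(round_no)  # round_no doubles as the re-partition seed
        score = mape(y_true, [model(x) for x, _ in validation])
        if score < mape_threshold:  # evaluation criterion met
            return model, score, round_no + 1
    raise RuntimeError(f"no model reached MAPE < {mape_threshold} in {max_rounds} rounds")
```

The strict `<` comparison mirrors the text: a model passes only when its MAPE is below the comparison threshold.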
204a. Input the feature information of the target user into the first identification model and perform similarity matching against the feature information X1.
Here, the feature information of the target user corresponds to the target user's target feature data other than the fasting blood glucose value and the 2-hour postprandial blood glucose value.
Optionally, step 204a may specifically include: subjecting the feature information of the target user to data cleansing, feature extraction, missing-value filling and outlier handling to obtain structured feature information; and performing similarity matching between the structured feature information and the feature information X1.
The feature information of the target user sometimes contains irrelevant data, missing values and/or outliers, i.e., it is unstructured data that is not suitable for direct prediction with the first identification model. Therefore, data cleansing can first be applied to the feature information of the target user to remove irrelevant data (for example, removing the user's current residence and place of household registration, and retaining only the medical history data, hospitalization data, medication data, physical examination data, health disclosure data and the like). Feature extraction is then performed on the retained data (for example, extracting the medical history data, hospitalization data, medication data, physical examination data, health disclosure data and the like). If the extracted feature data contains missing values, they can be filled with 0 (for example, if the height and weight fields in the user's physical examination data are empty, they are filled with 0; this keeps the data comparable when it is later matched against the feature information X1 in the model and avoids matching failures during feature matching). If the extracted feature data contains outliers, they can be corrected according to the actual situation (for example, a hospitalization duration of 99999 days is clearly abnormal; the correct duration can be computed from the admission start time and end time, and the value corrected accordingly).
Through this series of processing steps in this optional manner (data cleansing, feature extraction, missing-value filling, outlier handling, etc.), structured data that is comparable with the feature information X1 in the first identification model can be obtained, matching failures during feature matching are avoided, outliers are removed, and the accuracy of feature matching is improved.
205a. Determine the first physical examination index value corresponding to the target user by using the feature information X1 whose similarity is greater than a preset threshold and is the highest, together with the first mapping relationship.
The preset threshold can be set in advance according to actual needs. The larger the preset threshold, the higher the corresponding feature-matching precision; a similarity of 100% indicates an exact feature match.
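The matching rule of step 205a (keep the stored sample whose similarity exceeds the preset threshold and is the highest, then read off its label) can be sketched as a nearest-neighbour lookup. The text does not name a similarity measure; cosine similarity is used here purely as an assumption, and in practice the features would need to be put on comparable scales first. `match_label` is a hypothetical name.

```python
import math
from typing import Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_label(query: Sequence[float],
                samples: Sequence[Tuple[Sequence[float], float]],
                threshold: float) -> Optional[float]:
    """Label (e.g. Y1) of the most similar sample whose similarity exceeds threshold.

    Returns None when no sample clears the threshold.
    """
    best_sim, best_label = threshold, None
    for features, label in samples:
        sim = cosine_similarity(query, features)
        if sim > best_sim:  # strictly above the threshold and above the best so far
            best_sim, best_label = sim, label
    return best_label


samples = [([1.0, 0.0], 6.1), ([0.0, 1.0], 8.9)]
print(match_label([0.1, 1.0], samples, threshold=0.9))  # 8.9
```

Raising `threshold` towards 1.0 makes matching stricter, in line with the remark above; note that a cosine similarity of 1.0 means the vectors point in the same direction, which is weaker than identical values.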
For example, inputting the medical history data, hospitalization data, medication data, physical examination data and health disclosure data of the target user into the first identification model is equivalent to inputting these data separately into the four classifiers of step 203a and performing similarity matching against the feature information corresponding to each classifier, so that each classifier finds the most similar feature information whose similarity exceeds a certain threshold. The four classifiers then each yield a corresponding physical examination index value, i.e., a fasting blood glucose value of the target user. If three of these four fasting blood glucose values meet the diagnostic standard for diabetes, the target user can be determined to have diabetes, and the average of these three values is taken as the first physical examination index value calculated by the first identification model. If two of the four values do not meet the diagnostic standard and the other two do, the average of all four values is taken as the first physical examination index value calculated by the first identification model, and whether the target user has diabetes is determined from this average.
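The rule for turning the four per-classifier values into the first physical examination index value can be written as a small function. It follows the two cases the text spells out: three or more values at or above the threshold means diabetic, averaging the values that meet it; a two-to-two split means averaging all four and comparing the average with the threshold. Treating "at most one value meets the threshold" as not diabetic is taken from the voting rule of step 203a, and reporting the plain average in that case is an assumption. `aggregate_index` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence, Tuple


def aggregate_index(values: Sequence[float], threshold: float = 7.0) -> Tuple[float, bool]:
    """Combine four per-classifier glucose values into (index value, has_diabetes)."""
    if len(values) != 4:
        raise ValueError("expected one value per classifier (four in total)")
    meeting = [v for v in values if v >= threshold]
    if len(meeting) >= 3:
        # Majority meets the standard: average only the values that meet it.
        return mean(meeting), True
    overall = mean(values)
    if len(meeting) == 2:
        # Two-to-two split: decide from the average of all four values.
        return overall, overall >= threshold
    # At most one value meets the standard: not diabetic (voting rule of step 203a).
    return overall, False


print(aggregate_index([9.0, 8.0, 6.0, 6.2]))  # (7.3, True)
```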
Step 202b, parallel to step 202a: take the 2-hour postprandial blood glucose value in the user features of the sample users as label information Y2, and take the target feature data of the sample users other than the fasting blood glucose value and the 2-hour postprandial blood glucose value as feature information X2, to create a second model training set.
It should be noted that step 202b is similar to step 202a: the target feature data of the sample users includes at least one or more of the sample users' medical history data, hospitalization data, medication data, physical examination data and health disclosure data. The created second model training set contains each piece of feature information X2 and its corresponding label information Y2, i.e., the 2-hour postprandial blood glucose values of sample users with different medical histories, hospitalization records, medication conditions, physical examination results, health disclosures, and so on.
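Building the (X2, Y2) pairs of step 202b amounts to projecting each sample-user record onto the non-glucose feature fields and reading the 2-hour postprandial value as the label. The sketch below assumes flat dictionaries with hypothetical field names; skipping records that lack the label and zero-filling missing features are assumptions consistent with, but not stated by, the text.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical names of the two glucose fields that serve as labels Y1 and Y2.
GLUCOSE_FIELDS = ("fasting_glucose", "postprandial_2h_glucose")


def build_training_set(records: Sequence[Dict[str, Any]],
                       label_field: str,
                       feature_fields: Sequence[str]) -> Tuple[List[List[float]], List[float]]:
    """Return (X, y): feature vectors without glucose fields, and the chosen label."""
    if set(feature_fields) & set(GLUCOSE_FIELDS):
        raise ValueError("glucose values must not be used as features")
    X: List[List[float]] = []
    y: List[float] = []
    for rec in records:
        label = rec.get(label_field)
        if label is None:
            continue  # no measured label for this user, so it cannot be used for training
        X.append([float(rec.get(f) or 0) for f in feature_fields])  # missing -> 0
        y.append(float(label))
    return X, y


records = [
    {"fasting_glucose": 6.1, "postprandial_2h_glucose": 9.4, "bmi": 27.0, "hospital_days": 3},
    {"fasting_glucose": 5.2, "bmi": 22.5, "hospital_days": None},
]
X2, Y2 = build_training_set(records, "postprandial_2h_glucose", ("bmi", "hospital_days"))
print(X2, Y2)  # [[27.0, 3.0]] [9.4]
```

Passing `label_field="fasting_glucose"` would give the (X1, Y1) set of step 202a instead.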
203b. Based on a preset regression prediction algorithm, train, on the second model training set, a second identification model for determining the second physical examination index value.
The second identification model is likewise evaluated with the MAPE metric: when the MAPE value corresponding to the second identification model is less than the preset standard comparison threshold, the second identification model is determined to meet the evaluation criterion, and the second mapping relationship between the feature information X2 and the label information Y2 can be determined by the second identification model that meets the evaluation criterion.
Optionally, step 203b may specifically include:
(1) obtaining a fifth training sample set, a sixth training sample set, a seventh training sample set and an eighth training sample set from the second model training set by random sampling;
(2) training a fifth classifier on the fifth training sample set with the random forest algorithm; training a sixth classifier on the sixth training sample set with the GBDT algorithm; training a seventh classifier on the seventh training sample set with the XGBoost algorithm; and training an eighth classifier on the eighth training sample set with the LightGBM algorithm;
(3) fusing the fifth, sixth, seventh and eighth classifiers by the bagging method to obtain the second identification model.
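Step (1) above can be sketched as drawing four bootstrap samples, one per base algorithm. Sampling with replacement is what bagging normally uses, but the text only says "random sampling", so this is an assumption. Training the base learners would then use libraries such as scikit-learn's `RandomForestRegressor` and `GradientBoostingRegressor`, `xgboost.XGBRegressor` and `lightgbm.LGBMRegressor`; that part is omitted to keep the sketch dependency-free. `bootstrap_subsets` and the algorithm labels are hypothetical.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# One bootstrap sample per base learner, in the order given in step (2).
BASE_ALGORITHMS = ("random_forest", "gbdt", "xgboost", "lightgbm")


def bootstrap_subsets(data: Sequence[T], seed: int = 0) -> Dict[str, List[T]]:
    """Draw len(data) items with replacement for each base algorithm."""
    if not data:
        raise ValueError("training set is empty")
    rng = random.Random(seed)  # seeded so a given re-partition can be reproduced
    n = len(data)
    return {name: [data[rng.randrange(n)] for _ in range(n)] for name in BASE_ALGORITHMS}


subsets = bootstrap_subsets(list(range(10)), seed=42)
print({name: len(s) for name, s in subsets.items()})
# {'random_forest': 10, 'gbdt': 10, 'xgboost': 10, 'lightgbm': 10}
```

Changing `seed` yields a new partition, which is what the retraining step needs when the MAPE criterion is not met.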
As in the optional manner of step 203a, the specific fusion is also performed by voting, i.e., by the majority rule: the minority defers to the majority. For example, after the medical history data, hospitalization data, medication data, physical examination data and health disclosure data of a user under test are input into the four classifiers, if the prediction results of three of the four classifiers correspond to 2-hour postprandial blood glucose values that meet the diagnostic standard for diabetes, the user under test can be determined to have diabetes. If only one classifier yields a 2-hour postprandial blood glucose value that meets the standard and the values from the other three classifiers do not, the user under test can be determined not to have diabetes.
It should be noted that if the MAPE value of the second identification model is greater than the preset standard comparison threshold, the trained second identification model does not meet the evaluation criterion. In that case the second model training set can be re-partitioned to obtain new fifth, sixth, seventh and eighth training sample sets. The fifth classifier then continues to be trained on the new fifth training sample set, the sixth classifier on the new sixth training sample set, the seventh classifier on the new seventh training sample set, and the eighth classifier on the new eighth training sample set. The four newly trained classifiers are fused again, and it is checked whether the MAPE value of the new second identification model is less than the preset standard comparison threshold. If it is still greater than the threshold, the process of re-partitioning the model training set and retraining the classifiers is repeated until the MAPE value of the latest second identification model is less than the preset standard comparison threshold, i.e., until the evaluation criterion is met.
204b. Input the feature information of the target user into the second identification model and perform similarity matching against the feature information X2.
In this step, the feature information of the target user corresponds to the target user's target feature data other than the fasting blood glucose value and the 2-hour postprandial blood glucose value.
Optionally, step 204b may specifically include: subjecting the feature information of the target user to data cleansing, feature extraction, missing-value filling and outlier handling to obtain structured feature information; and performing similarity matching between the structured feature information and the feature information X2.
As in the optional manner of step 204a, this series of processing steps (data cleansing, feature extraction, missing-value filling, outlier handling, etc.) yields structured data that is comparable with the feature information X2 in the second identification model, avoids matching failures during feature matching, removes outliers, and improves the accuracy of feature matching.
205b. Determine the second physical examination index value corresponding to the target user by using the feature information X2 whose similarity is greater than a predetermined threshold and is the highest, together with the second mapping relationship.
The predetermined threshold can be set in advance according to actual needs. The larger the predetermined threshold, the higher the corresponding feature-matching precision; a similarity of 100% indicates an exact feature match.
For example, inputting the medical history data, hospitalization data, medication data, physical examination data and health disclosure data of the target user into the second identification model is equivalent to inputting these data separately into the four classifiers of step 203b and performing similarity matching against the feature information corresponding to each classifier, so that each classifier finds the most similar feature information whose similarity exceeds a certain threshold. The four classifiers then each yield a corresponding physical examination index value, i.e., a 2-hour postprandial blood glucose value of the target user. If three of these four values meet the diagnostic standard for diabetes, the target user can be determined to have diabetes, and the average of these three values is taken as the second physical examination index value calculated by the second identification model. If two of the four values do not meet the diagnostic standard and the other two do, the average of all four values is taken as the second physical examination index value calculated by the second identification model, and whether the target user has diabetes is determined from this average.
206. Determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
Optionally, step 206 may specifically include: if the first physical examination index value corresponding to the target user is greater than or equal to a first preset threshold and/or the second physical examination index value is greater than or equal to a second preset threshold, determining that the target user has diabetes; and then determining the disease severity of the target user according to the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls.
The first preset threshold is determined by the established criterion for diagnosing diabetes from fasting blood glucose, e.g., 7.0 mmol/L; the second preset threshold is determined by the established criterion for diagnosing diabetes from 2-hour postprandial blood glucose, e.g., 11.1 mmol/L.
For example, if the first physical examination index value of the target user is determined to be 8.0 mmol/L and the second physical examination index value 7.6 mmol/L, the target user can be determined to have diabetes because the first physical examination index value is greater than the first preset threshold. If the first physical examination index value is 5.7 mmol/L and the second is 11.9 mmol/L, the target user can be determined to have diabetes because the second physical examination index value is greater than the second preset threshold. If the first physical examination index value is 8.3 mmol/L and the second is 11.7 mmol/L, the target user can be determined to have diabetes because the first physical examination index value is greater than the first preset threshold and the second physical examination index value is greater than the second preset threshold.
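The diagnostic rule of step 206 (diabetic if the fasting value reaches the first preset threshold and/or the 2-hour postprandial value reaches the second) can be checked against the three examples above with a short function. The 7.0 and 11.1 mmol/L constants are the example thresholds from the text; allowing either index value to be missing is an assumption.

```python
from typing import Optional

FIRST_PRESET_THRESHOLD = 7.0    # fasting blood glucose, mmol/L
SECOND_PRESET_THRESHOLD = 11.1  # 2-hour postprandial blood glucose, mmol/L


def has_diabetes(first_index: Optional[float] = None,
                 second_index: Optional[float] = None) -> bool:
    """True if either available index value reaches its preset threshold."""
    fasting_hit = first_index is not None and first_index >= FIRST_PRESET_THRESHOLD
    postprandial_hit = second_index is not None and second_index >= SECOND_PRESET_THRESHOLD
    return fasting_hit or postprandial_hit


# The three worked examples above: each one is diagnosed as diabetic.
for fpg, ppg in [(8.0, 7.6), (5.7, 11.9), (8.3, 11.7)]:
    print(fpg, ppg, has_diabetes(fpg, ppg))
```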
The severity of diabetes is discussed below in three cases:
(1) Judging by the first physical examination index value alone, i.e., determining the disease severity of the target user from the first numerical interval in which the first physical examination index value falls. This may specifically include: dividing the range above the first preset threshold into multiple numerical intervals that increase according to a preset rule; creating a third mapping relationship between these numerical intervals and diabetes severity levels; determining the first numerical interval, among the multiple numerical intervals, in which the first physical examination index value falls; and determining the first diabetes severity of the target user according to the third mapping relationship and the first numerical interval.
For example, the numerical intervals above the first preset threshold of 7.0 mmol/L and the corresponding diabetes severity levels in the third mapping relationship are set as follows: mild diabetes, 7.0 to 8.4 mmol/L; moderate diabetes, 8.4 to 11.1 mmol/L; severe diabetes, greater than 11.1 mmol/L. If the first physical examination index value is determined to be 9.6 mmol/L, it falls in the first numerical interval of 8.4 to 11.1 mmol/L, and according to the third mapping relationship and this interval, the diabetes severity of the target user is determined to be moderate.
(2) Judging by the second physical examination index value alone, i.e., determining the disease severity of the target user from the second numerical interval in which the second physical examination index value falls. This specifically includes: dividing the range above the second preset threshold into multiple numerical intervals that increase according to a preset rule; creating a fourth mapping relationship between these numerical intervals and diabetes severity levels; determining the second numerical interval, among the multiple numerical intervals, in which the second physical examination index value falls; and determining the second diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval.
For example, the numerical intervals above the second preset threshold of 11.1 mmol/L and the corresponding diabetes severity levels in the fourth mapping relationship are set as follows: moderate diabetes, 11.1 to 16.7 mmol/L; severe diabetes, greater than 16.7 mmol/L (ketoacidosis is prone to occur above 16.7 mmol/L). If the second physical examination index value is determined to be 12.6 mmol/L, it falls in the second numerical interval of 11.1 to 16.7 mmol/L, and according to the fourth mapping relationship and this interval, the diabetes severity of the target user is determined to be moderate.
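Looking up the severity level in cases (1) and (2) is an interval search over the lower bounds of the bands. The sketch uses the intervals given in the two examples above. Treating each lower bound as inclusive (so exactly 8.4 mmol/L counts as moderate) is an assumption, because the text gives ranges such as "7.0 to 8.4 mmol/L" without saying which end is closed. The names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Bands = Sequence[Tuple[float, str]]  # (inclusive lower bound in mmol/L, severity), ascending

# Third mapping relationship (fasting) and fourth mapping relationship (2 h postprandial).
FASTING_BANDS: Bands = ((7.0, "mild"), (8.4, "moderate"), (11.1, "severe"))
POSTPRANDIAL_BANDS: Bands = ((11.1, "moderate"), (16.7, "severe"))


def severity(value: float, bands: Bands) -> Optional[str]:
    """Severity label of the band containing value, or None below the first band."""
    lower_bounds = [lower for lower, _ in bands]
    i = bisect_right(lower_bounds, value)  # number of lower bounds <= value
    return bands[i - 1][1] if i > 0 else None


print(severity(9.6, FASTING_BANDS))        # moderate
print(severity(12.6, POSTPRANDIAL_BANDS))  # moderate
```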
(3) Comprehensive judgment combining the first and second physical examination index values (because this approach takes more factors into account, its prediction precision is relatively high), i.e., determining the disease severity of the target user from both the first numerical interval in which the first physical examination index value falls and the second numerical interval in which the second physical examination index value falls. This specifically includes: if the first diabetes severity and the second diabetes severity are the same, the final severity is that common severity. If they differ, a first weight corresponding to the first identification model and a second weight corresponding to the second identification model are obtained according to the accuracy or adoption rate fed back by users for the two prediction modes (prediction via the first identification model and via the second identification model). When the first weight is greater than the second weight, the first diabetes severity is determined as the disease severity of the target user; when the second weight is greater than the first weight, the second diabetes severity is determined as the disease severity of the target user.
In this embodiment, the weights of the two prediction modes can be set according to the accuracy or adoption rate fed back by users. Specifically, the weight values corresponding to different accuracies or adoption rates can be obtained statistically, and the weight of each prediction mode is then looked up through the mapping relationship obtained from these statistics. The accuracy or adoption rate fed back by users can reflect which prediction mode has the higher prediction precision, so taking the result of the more precise mode as the final result makes the determination more accurate. Alternatively, the weights of the two prediction modes can be preset manually according to the actual situation.
For example, if user feedback shows that predicting diabetes severity from the first physical examination index value is more accurate, a weight of 70% can be configured for the first-index prediction mode and a weight of 30% for the second-index prediction mode. When the two predictions differ, the result predicted from the first physical examination index value is fed back to the target user as the final diagnostic result. Suppose the prediction from the first physical examination index value is moderate diabetes and the prediction from the second is severe diabetes; then, according to the configured weights, the final diabetes severity of the target user is determined to be moderate.
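The combination rule of case (3) reduces to: keep the common result when both severities agree, otherwise take the one whose prediction mode carries the higher weight. The text leaves open how feedback accuracy maps to a weight, so the sketch simply normalises the two accuracies to sum to one (an assumption), and it raises an error on equal weights, a case the text does not cover. The function names are hypothetical.

```python
from typing import Tuple


def weights_from_accuracy(acc_first: float, acc_second: float) -> Tuple[float, float]:
    """Assumed mapping: normalise user-feedback accuracies into weights summing to 1."""
    total = acc_first + acc_second
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return acc_first / total, acc_second / total


def final_severity(first: str, second: str, w_first: float, w_second: float) -> str:
    """Case (3): agree -> that severity; disagree -> the higher-weighted mode's result."""
    if first == second:
        return first
    if w_first > w_second:
        return first
    if w_second > w_first:
        return second
    raise ValueError("equal weights: tie-breaking is not specified")


# Worked example from the text: 70% / 30% weights, moderate vs severe -> moderate.
print(final_severity("moderate", "severe", 0.7, 0.3))  # moderate
```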
After the target user's actual fasting blood glucose value and 2-hour postprandial blood glucose value are obtained later, they can also be used as new training samples to continue training the two identification models in this embodiment, so as to achieve higher prediction precision. With the diabetes prediction method described above, mapping relationships between feature information and label information are determined by training on the model training sets; the structured data of the target user is matched against the regression prediction model, and the first physical examination index value (fasting blood glucose) and the second physical examination index value (2-hour postprandial blood glucose) are then determined through the mapping relationships. Comparing these values with the first preset threshold and the second preset threshold determines whether the user has diabetes. The method therefore not only predicts whether the user has the disease from diabetes diagnostic indicators, but also determines the disease severity of the target user from the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls, making the diagnostic result more complete.
Further, as a specific implementation of the methods shown in Fig. 1 and Fig. 2, an embodiment of the present application provides a diabetes prediction apparatus. As shown in Fig. 3, the apparatus includes: an acquisition unit 31, a creation unit 32, a judging unit 33 and a determination unit 34.
The acquisition unit 31 may be configured to obtain sample user data from original health records and electronic medical record data;
The creation unit 32 may be configured to create a numeric regression prediction model according to the user features in the sample user data;
The judging unit 33 may be configured to use the regression prediction model to determine the first physical examination index value of the target user's fasting blood glucose and the second physical examination index value of the blood glucose at a preset postprandial duration;
The determination unit 34 may be configured to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value.
In a specific application scenario, in order to create the numeric regression prediction model according to the user features in the sample user data, as shown in Fig. 4, the creation unit 32 may specifically include: a creation module 321, a training module 322 and a determination module 323.
The creation module 321 may specifically be configured to take the fasting blood glucose value in the user features as label information Y1, and take the target feature data of the sample users other than the fasting blood glucose value and the 2-hour postprandial blood glucose value as feature information X1, to create the first model training set, where the target feature data includes at least one or more of the sample users' medical history data, hospitalization data, medication data, physical examination data and health disclosure data;
The training module 322 may specifically be configured to train, on the first model training set and based on the preset regression prediction algorithm, the first identification model for determining the first physical examination index value, where the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), XGBoost and LightGBM. The first identification model is evaluated with the mean absolute percentage error (MAPE) metric, and when the MAPE value corresponding to the first identification model is less than the preset standard comparison threshold, the first identification model is determined to meet the evaluation criterion;
The determination module 323 may specifically be configured to determine, by the first identification model that meets the evaluation criterion, the first mapping relationship between the feature information X1 and the label information Y1;
The creation module 321 may further be configured to take the 2-hour postprandial blood glucose value in the user features as label information Y2, and take the target feature data of the sample users as feature information X2, to create the second model training set;
The training module 322 may further be configured to train, on the second model training set and based on the preset regression prediction algorithm, the second identification model for determining the second physical examination index value, where the second identification model is evaluated with the MAPE metric, and when the MAPE value corresponding to the second identification model is less than the preset standard comparison threshold, the second identification model is determined to meet the evaluation criterion;
The determination module 323 may further be configured to determine, by the second identification model that meets the evaluation criterion, the second mapping relationship between the feature information X2 and the label information Y2.
Correspondingly, in order to determine the first physical examination index value of the target user's fasting blood glucose and the second physical examination index value of the blood glucose at the preset postprandial duration, as shown in Fig. 4, the judging unit 33 may specifically include: a matching module 331 and a determination module 332.
The matching module 331 may specifically be configured to input the feature information of the target user into the first identification model and perform similarity matching against the feature information X1, where the feature information of the target user corresponds to the target user's target feature data other than the fasting blood glucose value and the 2-hour postprandial blood glucose value;
The determination module 332 may specifically be configured to determine the first physical examination index value corresponding to the target user by using the feature information X1 whose similarity is greater than the preset threshold and is the highest, together with the first mapping relationship;
The matching module 331 may further be configured to input the feature information of the target user into the second identification model and perform similarity matching against the feature information X2;
The determination module 332 may further be configured to determine the second physical examination index value corresponding to the target user by using the feature information X2 whose similarity is greater than the predetermined threshold and is the highest, together with the second mapping relationship.
In a specific application scenario, in order to determine the disease severity of the target user according to the first physical examination index value and/or the second physical examination index value, as shown in Fig. 4, the determination unit 34 may specifically include: a determination module 341 and a judgment module 342.
The determination module 341 may be configured to determine that the target user has diabetes if the first physical examination index value corresponding to the target user is greater than or equal to the first preset threshold and/or the second physical examination index value is greater than or equal to the second preset threshold;
The judgment module 342 may be configured to determine the disease severity of the target user according to the first numerical interval in which the first physical examination index value falls and/or the second numerical interval in which the second physical examination index value falls.
In a specific application scenario, in order to accurately determine the disease severity of the target user, the judgment module 342 may specifically be further configured to: divide the range above the first preset threshold into multiple numerical intervals increasing according to a preset rule; create a third mapping relationship between these numerical intervals and diabetes severity levels; determine the first numerical interval, among the multiple numerical intervals, in which the first physical examination index value falls; and determine the diabetes severity of the target user according to the third mapping relationship and the first numerical interval. It may likewise divide the range above the second preset threshold into multiple numerical intervals increasing according to a preset rule; create a fourth mapping relationship between these numerical intervals and diabetes severity levels; determine the second numerical interval, among the multiple numerical intervals, in which the second physical examination index value falls; and determine the diabetes severity of the target user according to the fourth mapping relationship and the second numerical interval;
The judgment module 342 may specifically be further configured to: if the first diabetes severity differs from the second diabetes severity, obtain a first weight corresponding to the first identification model and a second weight corresponding to the second identification model according to the accuracy or adoption rate fed back by users for the two prediction modes (via the first identification model and via the second identification model); when the first weight is greater than the second weight, determine the first diabetes severity as the disease severity of the target user; and when the second weight is greater than the first weight, determine the second diabetes severity as the disease severity of the target user.
In a specific application scenario, the matching module 331 may specifically be configured to subject the feature information of the target user to data cleansing, feature extraction, missing-value filling and outlier handling to obtain structured feature information, and perform similarity matching between the structured feature information and the feature information X1;
The matching module 331 may also be configured to subject the feature information of the target user to data cleansing, feature extraction, missing-value filling and outlier handling to obtain structured feature information, and perform similarity matching between the structured feature information and the feature information X2.
In a specific application scenario, the training module 322 may specifically be configured to: obtain a first training sample set, a second training sample set, a third training sample set and a fourth training sample set from the first model training set by random sampling; train a first classifier on the first training sample set with the random forest algorithm; train a second classifier on the second training sample set with the GBDT algorithm; train a third classifier on the third training sample set with the XGBoost algorithm; train a fourth classifier on the fourth training sample set with the LightGBM algorithm; and fuse the first, second, third and fourth classifiers by the bagging method to obtain the first identification model;
Training module 322 specifically can also be used to obtain respectively using stochastical sampling mode from second model training concentration
Take the 5th training sample set, the 6th training sample set, the 7th training sample set, the 8th training sample set;Based on the 5th instruction
Practice sample set and utilize random forests algorithm, training obtains the 5th classifier;It is calculated based on the 6th training sample set using GBDT
Method, training obtain the 6th classifier;Xgboost algorithm is utilized based on the 7th training sample set, training obtains the 7th classification
Device;LightGBM algorithm is utilized based on the 8th training sample set, training obtains the 8th classifier;By the 5th classification
Device, the 6th classifier, the 7th classifier, the 8th classifier carry out fusion treatment using bagging method, obtain institute
State the second identification model.
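The training scheme described above (random sampling, four base learners, fusion) and the MAPE acceptance check from claim 2 can be sketched with the standard library alone. Two assumptions are made, because the source specifies neither: random sampling is taken to be bootstrap sampling with replacement, and "fusion using the bagging method" is read as averaging the four learners' predictions. In practice the four learner slots would hold random forest, GBDT, XGBoost, and LightGBM regressors. Here they are filled with trivial hypothetical stand-ins (`fit_nearest`, `fit_mean`) so the sketch runs without third-party packages.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]
Fitter = Callable[[List[Row], List[float]], Predictor]


def bootstrap(xs: List[Row], ys: List[float], rng: random.Random) -> Tuple[List[Row], List[float]]:
    """Random sampling (assumed bootstrap): draw len(xs) row indices with replacement."""
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def fit_fused(xs: List[Row], ys: List[float], fitters: List[Fitter], seed: int = 0) -> Predictor:
    """Train each learner on its own random sample, then average their outputs."""
    rng = random.Random(seed)
    models = [fit(*bootstrap(xs, ys, rng)) for fit in fitters]
    return lambda row: mean(model(row) for model in models)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def meets_standard(model: Predictor, xs: List[Row], ys: List[float], threshold: float) -> bool:
    """The model is accepted when its MAPE is below the preset threshold."""
    return mape(ys, [model(x) for x in xs]) < threshold


# Hypothetical stand-ins for the real regressors.
def fit_mean(xs: List[Row], ys: List[float]) -> Predictor:
    centre = mean(ys)
    return lambda row: centre


def fit_nearest(xs: List[Row], ys: List[float]) -> Predictor:
    def predict(row: Row) -> float:
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for x in xs]
        return ys[dists.index(min(dists))]
    return predict


# Slots for random forest, GBDT, XGBoost, and LightGBM (placeholders only).
FITTERS: List[Fitter] = [fit_nearest, fit_mean, fit_nearest, fit_mean]

xs = [[25.0, 50.0], [31.0, 62.0], [22.0, 35.0], [29.0, 58.0]]
ys = [6.1, 8.4, 5.0, 7.6]
model = fit_fused(xs, ys, FITTERS, seed=42)
print(meets_standard(model, xs, ys, threshold=15.0))
```

The same routine would be run once on the first model training set (labels Y1) and once on the second (labels Y2), yielding the two identification models.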
It should be noted that, for other descriptions of the functional units involved in the diabetes prediction apparatus provided in this embodiment, reference may be made to the corresponding descriptions of Fig. 1 and Fig. 2; details are not repeated here.
Based on the methods shown in Fig. 1 and Fig. 2, an embodiment of the present application correspondingly further provides a storage medium storing a computer program which, when executed by a processor, implements the diabetes prediction method shown in Fig. 1 and Fig. 2.
Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods of the implementation scenarios of the present application.
Based on the methods shown in Fig. 1 and Fig. 2 and the virtual apparatus embodiments shown in Fig. 3 and Fig. 4, and to achieve the above objective, an embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor: the storage medium is configured to store a computer program, and the processor is configured to execute the computer program to implement the diabetes prediction method shown in Fig. 1 and Fig. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and the like. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface and a wireless interface (such as a Bluetooth interface or a Wi-Fi interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The non-volatile readable storage medium may further contain an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device used for diabetes prediction, and supports the running of the information processing program and other software and/or programs. The network communication module implements communication among the components inside the non-volatile readable storage medium, as well as communication with other hardware and software in the physical device.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Compared with the prior art, the technical solution of the present application can, after detecting that the target user has diabetes, further judge the severity of the disease. This makes the diagnostic result more complete, so that the progression of the target user's condition can be tracked in time and appropriate treatment provided.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and the modules or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the modules of the apparatus in an implementation scenario may be distributed in that apparatus as described, or may be correspondingly changed and located in one or more apparatuses different from this implementation scenario. The modules of the above implementation scenario may be merged into one module, or further split into multiple sub-modules.
The serial numbers of the above implementation scenarios are for description only and do not indicate their relative merits. The above discloses only several specific implementation scenarios of the present application; however, the present application is not limited thereto, and any change that a person skilled in the art can conceive of shall fall within the protection scope of the present application.
Claims (10)
1. A diabetes prediction method, characterized by comprising:
obtaining sample user data from original health records and electronic medical record data;
creating a numerical regression prediction model according to user features in the sample user data;
using the regression prediction model to judge a first physical examination indicator value of the target user's fasting blood glucose and a second physical examination indicator value of the target user's blood glucose at a preset duration after a meal;
determining the diabetes severity of the target user according to the first physical examination indicator value and/or the second physical examination indicator value.
2. The method according to claim 1, characterized in that the user features are extracted from the sample user data using regular expressions, and the preset duration is two hours;
creating the numerical regression prediction model according to the user features in the sample user data specifically comprises:
using the fasting blood glucose value in the user features as label information Y1, and using the target feature data of the sample user other than the fasting blood glucose value and the two-hour postprandial blood glucose value as feature information X1, to create a first model training set, wherein the target feature data comprises at least one or more of the sample user's disease history data, hospitalization data, medication data, physical examination data, and health report data;
training, on the first model training set with a preset regression prediction algorithm, a first identification model for judging the first physical examination indicator value, wherein the preset regression prediction algorithm is obtained by fusing four algorithms: random forest, gradient boosting decision tree (GBDT), XGBoost, and LightGBM; the first identification model is evaluated using the mean absolute percentage error (MAPE) metric; when the MAPE value of the first identification model is less than a preset standard comparison threshold, the first identification model is determined to meet the evaluation standard; and a first mapping relation between feature information X1 and label information Y1 can be determined through the first identification model that meets the evaluation standard;
using the two-hour postprandial blood glucose value in the user features as label information Y2, and using the target feature data of the sample user as feature information X2, to create a second model training set;
training, on the second model training set with the preset regression prediction algorithm, a second identification model for judging the second physical examination indicator value, wherein the second identification model is evaluated using the MAPE metric; when the MAPE value of the second identification model is less than the preset standard comparison threshold, the second identification model is determined to meet the evaluation standard; and a second mapping relation between feature information X2 and label information Y2 can be determined through the second identification model that meets the evaluation standard.
3. The method according to claim 2, characterized in that using the regression prediction model to judge the first physical examination indicator value of the target user's fasting blood glucose and the second physical examination indicator value of the blood glucose at the preset duration after a meal specifically comprises:
inputting the feature information of the target user into the first identification model for similarity matching with feature information X1, wherein the feature information of the target user is the target user's target feature data other than the fasting blood glucose value and the two-hour postprandial blood glucose value;
determining the first physical examination indicator value of the target user using the feature information X1 whose similarity is greater than a preset threshold and is the highest, together with the first mapping relation;
inputting the feature information of the target user into the second identification model for similarity matching with feature information X2;
determining the second physical examination indicator value of the target user using the feature information X2 whose similarity is greater than a predetermined threshold and is the highest, together with the second mapping relation.
4. The method according to claim 3, characterized in that determining the diabetes severity of the target user according to the first physical examination indicator value and/or the second physical examination indicator value specifically comprises:
if the first physical examination indicator value of the target user is greater than or equal to a first preset threshold and/or the second physical examination indicator value is greater than or equal to a second preset threshold, determining that the target user has diabetes;
judging the diabetes severity of the target user through the first numerical interval in which the first physical examination indicator value lies and/or the second numerical interval in which the second physical examination indicator value lies.
5. The method according to claim 4, characterized in that judging the diabetes severity of the target user through the first numerical interval in which the first physical examination indicator value lies specifically comprises:
dividing the range above the first preset threshold into multiple numerical intervals that increase by a predetermined value;
creating a third mapping relation between the multiple numerical intervals and diabetes severity levels;
determining the first numerical interval, among the multiple numerical intervals, that corresponds to the first physical examination indicator value;
judging a first diabetes severity of the target user according to the third mapping relation and the first numerical interval;
judging the diabetes severity of the target user through the second numerical interval in which the second physical examination indicator value lies specifically comprises:
dividing the range above the second preset threshold into multiple numerical intervals that increase by a predetermined value;
creating a fourth mapping relation between the multiple numerical intervals and diabetes severity levels;
determining the second numerical interval, among the multiple numerical intervals, that corresponds to the second physical examination indicator value;
judging a second diabetes severity of the target user according to the fourth mapping relation and the second numerical interval;
judging the diabetes severity of the target user through both the first numerical interval in which the first physical examination indicator value lies and the second numerical interval in which the second physical examination indicator value lies specifically comprises:
if the first diabetes severity differs from the second diabetes severity, obtaining a first weight corresponding to the first identification model and a second weight corresponding to the second identification model, according to the accuracy rate or adoption rate fed back by users for the predictions of the first identification model and the second identification model;
when the first weight is greater than the second weight, determining the first diabetes severity as the diabetes severity of the target user;
when the second weight is greater than the first weight, determining the second diabetes severity as the diabetes severity of the target user.
6. The method according to claim 3, characterized in that inputting the feature information of the target user into the first identification model for similarity matching with feature information X1 specifically comprises:
processing the feature information of the target user through data cleansing, feature extraction, missing-value imputation, and outlier processing to obtain structured feature information;
performing similarity matching between the structured feature information and feature information X1;
inputting the feature information of the target user into the second identification model for similarity matching with feature information X2 specifically comprises:
processing the feature information of the target user through data cleansing, feature extraction, missing-value imputation, and outlier processing to obtain structured feature information;
performing similarity matching between the structured feature information and feature information X2.
7. The method according to claim 2, characterized in that training, on the first model training set with the preset regression prediction algorithm, the first identification model for judging the first physical examination indicator value specifically comprises:
obtaining a first training sample set, a second training sample set, a third training sample set, and a fourth training sample set from the first model training set by random sampling;
training a first classifier on the first training sample set using the random forest algorithm;
training a second classifier on the second training sample set using the GBDT algorithm;
training a third classifier on the third training sample set using the XGBoost algorithm;
training a fourth classifier on the fourth training sample set using the LightGBM algorithm;
fusing the first classifier, the second classifier, the third classifier, and the fourth classifier using the bagging method to obtain the first identification model;
training, on the second model training set with the preset regression prediction algorithm, the second identification model for judging the second physical examination indicator value specifically comprises:
obtaining a fifth training sample set, a sixth training sample set, a seventh training sample set, and an eighth training sample set from the second model training set by random sampling;
training a fifth classifier on the fifth training sample set using the random forest algorithm;
training a sixth classifier on the sixth training sample set using the GBDT algorithm;
training a seventh classifier on the seventh training sample set using the XGBoost algorithm;
training an eighth classifier on the eighth training sample set using the LightGBM algorithm;
fusing the fifth classifier, the sixth classifier, the seventh classifier, and the eighth classifier using the bagging method to obtain the second identification model.
8. A diabetes prediction apparatus, characterized by comprising:
an acquiring unit, configured to obtain sample user data from original health records and electronic medical record data;
a creating unit, configured to create a numerical regression prediction model according to user features in the sample user data;
a judging unit, configured to use the regression prediction model to judge a first physical examination indicator value of the target user's fasting blood glucose and a second physical examination indicator value of the target user's blood glucose at a preset duration after a meal;
a determining unit, configured to determine the diabetes severity of the target user according to the first physical examination indicator value and/or the second physical examination indicator value.
9. A non-volatile readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the diabetes prediction method according to any one of claims 1 to 7.
10. A computer device, comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, characterized in that the processor implements the diabetes prediction method according to any one of claims 1 to 7 when executing the program.
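Claim 2 extracts user features with regular expressions and builds two training sets: one labelled with the fasting value (Y1) and one labelled with the two-hour postprandial value (Y2), with both glucose values excluded from the features. A minimal sketch follows, assuming a hypothetical English free-text note format and hypothetical field names. Real records and their patterns would differ.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns for a free-text examination note such as
# "Fasting glucose: 7.8 mmol/L; 2h postprandial glucose: 12.3 mmol/L; BMI: 27.1; age: 56".
NUMBER = r"(\d+(?:\.\d+)?)"
PATTERNS = {
    "fasting_glucose": re.compile(r"fasting glucose:\s*" + NUMBER, re.IGNORECASE),
    "postprandial_2h": re.compile(r"2h postprandial glucose:\s*" + NUMBER, re.IGNORECASE),
    "bmi": re.compile(r"\bBMI:\s*" + NUMBER, re.IGNORECASE),
    "age": re.compile(r"\bage:\s*" + NUMBER, re.IGNORECASE),
}
LABELS = ("fasting_glucose", "postprandial_2h")

Sample = Tuple[Dict[str, float], float]


def extract(note: str) -> Dict[str, float]:
    """Pull each numeric field the note mentions; absent fields are skipped."""
    features: Dict[str, float] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match:
            features[name] = float(match.group(1))
    return features


def build_training_sets(records: List[Dict[str, float]]) -> Tuple[List[Sample], List[Sample]]:
    """First set: (X1, Y1 = fasting value). Second set: (X2, Y2 = two-hour value)."""
    first, second = [], []
    for record in records:
        # Both glucose values are excluded from the features.
        x = {k: v for k, v in record.items() if k not in LABELS}
        if "fasting_glucose" in record:
            first.append((x, record["fasting_glucose"]))
        if "postprandial_2h" in record:
            second.append((x, record["postprandial_2h"]))
    return first, second


notes = [
    "Fasting glucose: 7.8 mmol/L; 2h postprandial glucose: 12.3 mmol/L; BMI: 27.1; age: 56",
    "BMI: 22.0; age: 41; fasting glucose: 5.2",
]
first_set, second_set = build_training_sets([extract(n) for n in notes])
print(len(first_set), len(second_set))
```

A record missing one of the glucose values still contributes to the training set whose label it has, which is one reasonable reading of the claim rather than a stated requirement.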
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185079.2A CN110197720A (en) | 2019-03-12 | 2019-03-12 | Prediction technique and device, storage medium, the computer equipment of diabetes |
PCT/CN2019/117217 WO2020181805A1 (en) | 2019-03-12 | 2019-11-11 | Diabetes prediction method and apparatus, storage medium, and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185079.2A CN110197720A (en) | 2019-03-12 | 2019-03-12 | Prediction technique and device, storage medium, the computer equipment of diabetes |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110197720A (en) | 2019-09-03 |
Family ID: 67751751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910185079.2A Pending CN110197720A (en) | 2019-03-12 | 2019-03-12 | Prediction technique and device, storage medium, the computer equipment of diabetes |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110197720A (en) |
WO (1) | WO2020181805A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111429289A (en) * | 2020-03-23 | 2020-07-17 | 平安医疗健康管理股份有限公司 | Single disease identification method and device, computer equipment and storage medium |
CN111599470A (en) * | 2020-04-23 | 2020-08-28 | 中国科学院上海技术物理研究所 | Method for improving near-infrared noninvasive blood glucose detection precision |
CN111657873A (en) * | 2020-07-07 | 2020-09-15 | 四川长虹电器股份有限公司 | Physical constitution prediction method based on visible light and near infrared spectrum technology |
WO2020181805A1 (en) * | 2019-03-12 | 2020-09-17 | 平安科技(深圳)有限公司 | Diabetes prediction method and apparatus, storage medium, and computer device |
CN111710420A (en) * | 2020-05-15 | 2020-09-25 | 深圳先进技术研究院 | Complication morbidity risk prediction method, system, terminal and storage medium based on electronic medical record big data |
CN111739646A (en) * | 2020-06-22 | 2020-10-02 | 平安医疗健康管理股份有限公司 | Data verification method and device, computer equipment and readable storage medium |
CN111797284A (en) * | 2020-07-08 | 2020-10-20 | 北京康健德科技有限公司 | Graph database construction method and device, electronic equipment and storage medium |
CN112382394A (en) * | 2020-11-05 | 2021-02-19 | 苏州麦迪斯顿医疗科技股份有限公司 | Event processing method and device, electronic equipment and storage medium |
WO2021151273A1 (en) * | 2020-05-26 | 2021-08-05 | 平安科技(深圳)有限公司 | Disease prediction method and apparatus, electronic device, and storage medium |
CN113658704A (en) * | 2021-09-17 | 2021-11-16 | 平安国际智慧城市科技股份有限公司 | Diabetes risk prediction device, apparatus and storage medium |
CN113796852A (en) * | 2021-09-30 | 2021-12-17 | 太原理工大学 | Diabetes foot prediction method based on gradient lifting decision tree model algorithm |
CN114242247A (en) * | 2021-12-30 | 2022-03-25 | 吉林大学第一医院 | Non-obese MAFLD prediction system, device and storage medium |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112164454A (en) * | 2020-10-10 | 2021-01-01 | 联仁健康医疗大数据科技股份有限公司 | Diagnosis prediction method and device and electronic equipment |
CN113057586B (en) * | 2021-03-17 | 2024-03-12 | 上海电气集团股份有限公司 | Disease early warning method, device, equipment and medium |
CN113035357A (en) * | 2021-04-06 | 2021-06-25 | 昆明医科大学第一附属医院 | Diabetic kidney disease risk assessment system |
CN113113134A (en) * | 2021-04-07 | 2021-07-13 | 闵东 | Clinical etiology prejudgment device and system |
CN113488166A (en) * | 2021-07-28 | 2021-10-08 | 联仁健康医疗大数据科技股份有限公司 | Diabetes data analysis model training and data management method, device and equipment |
CN113808744A (en) * | 2021-09-22 | 2021-12-17 | 河北工程大学 | Diabetes risk prediction method, device, equipment and storage medium |
CN116189896B (en) * | 2023-04-24 | 2023-08-08 | 北京快舒尔医疗技术有限公司 | Cloud-based diabetes health data early warning method and system |
CN117112729A (en) * | 2023-08-21 | 2023-11-24 | 北京科文思数据管理有限公司 | Medical resource docking method and system based on artificial intelligence |
CN117494688B (en) * | 2023-12-29 | 2024-03-29 | 深圳智能思创科技有限公司 | Form information extraction method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN2850518Y (en) * | 2005-10-24 | 2006-12-27 | 北京软测科技有限公司 | Portable diabetes condition monitoring apparatus |
US20150347707A1 (en) * | 2014-05-30 | 2015-12-03 | Anthony Michael Albisser | Computer-Implemented System And Method For Improving Glucose Management Through Cloud-Based Modeling Of Circadian Profiles |
CN109378072A (en) * | 2018-10-13 | 2019-02-22 | 中山大学 | A kind of abnormal fasting blood sugar method for early warning based on integrated study Fusion Model |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11144825B2 (en) * | 2016-12-01 | 2021-10-12 | University Of Southern California | Interpretable deep learning framework for mining and predictive modeling of health care data |
CN106682412A (en) * | 2016-12-22 | 2017-05-17 | 浙江大学 | Diabetes prediction method based on medical examination data |
CN109308545B (en) * | 2018-08-21 | 2023-07-07 | 中国平安人寿保险股份有限公司 | Method, device, computer equipment and storage medium for predicting diabetes probability |
CN110197720A (en) * | 2019-03-12 | 2019-09-03 | 平安科技(深圳)有限公司 | Prediction technique and device, storage medium, the computer equipment of diabetes |
2019
- 2019-03-12 CN CN201910185079.2A patent/CN110197720A/en active Pending
- 2019-11-11 WO PCT/CN2019/117217 patent/WO2020181805A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
Wu Shimin et al. (武士敏 等): "Practical General Nursing" (《实用全科护理学》), 30 April 2017, Jilin Science and Technology Press (吉林科学技术出版社) * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181805A1 (en) * | 2019-03-12 | 2020-09-17 | 平安科技(深圳)有限公司 | Diabetes prediction method and apparatus, storage medium, and computer device |
CN111429289B (en) * | 2020-03-23 | 2023-03-24 | 平安医疗健康管理股份有限公司 | Single disease identification method and device, computer equipment and storage medium |
CN111429289A (en) * | 2020-03-23 | 2020-07-17 | 平安医疗健康管理股份有限公司 | Single disease identification method and device, computer equipment and storage medium |
CN111599470A (en) * | 2020-04-23 | 2020-08-28 | 中国科学院上海技术物理研究所 | Method for improving near-infrared noninvasive blood glucose detection precision |
CN111710420A (en) * | 2020-05-15 | 2020-09-25 | 深圳先进技术研究院 | Complication morbidity risk prediction method, system, terminal and storage medium based on electronic medical record big data |
CN111710420B (en) * | 2020-05-15 | 2024-03-19 | 深圳先进技术研究院 | Complication onset risk prediction method, system, terminal and storage medium based on electronic medical record big data |
WO2021151273A1 (en) * | 2020-05-26 | 2021-08-05 | 平安科技(深圳)有限公司 | Disease prediction method and apparatus, electronic device, and storage medium |
CN111739646A (en) * | 2020-06-22 | 2020-10-02 | 平安医疗健康管理股份有限公司 | Data verification method and device, computer equipment and readable storage medium |
CN111657873A (en) * | 2020-07-07 | 2020-09-15 | 四川长虹电器股份有限公司 | Physical constitution prediction method based on visible light and near infrared spectrum technology |
CN111797284A (en) * | 2020-07-08 | 2020-10-20 | 北京康健德科技有限公司 | Graph database construction method and device, electronic equipment and storage medium |
CN112382394A (en) * | 2020-11-05 | 2021-02-19 | 苏州麦迪斯顿医疗科技股份有限公司 | Event processing method and device, electronic equipment and storage medium |
CN113658704A (en) * | 2021-09-17 | 2021-11-16 | 平安国际智慧城市科技股份有限公司 | Diabetes risk prediction device, apparatus and storage medium |
CN113796852A (en) * | 2021-09-30 | 2021-12-17 | 太原理工大学 | Diabetes foot prediction method based on gradient lifting decision tree model algorithm |
CN113796852B (en) * | 2021-09-30 | 2023-09-08 | 太原理工大学 | Diabetes foot prediction method based on gradient lifting decision tree model algorithm |
CN114242247A (en) * | 2021-12-30 | 2022-03-25 | 吉林大学第一医院 | Non-obese MAFLD prediction system, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020181805A1 (en) | 2020-09-17 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190903 |