CN109215781A - Construction method and construction system for a Kawasaki disease risk assessment model based on the logistic algorithm - Google Patents

Construction method and construction system for a Kawasaki disease risk assessment model based on the logistic algorithm

Publication number
CN109215781A
Authority
CN
China
Prior art keywords
kawasaki disease
data
sample
model
logistic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811075730.2A
Other languages
Chinese (zh)
Other versions
CN109215781B (en)
Inventor
丁国徽
贾佳
李光
徐重飞
周珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Daozhi precision medicine technology (Shanghai) Co.,Ltd.
Original Assignee
Basepair Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Basepair Biotechnology Co Ltd
Priority to CN201811075730.2A
Publication of CN109215781A
Application granted
Publication of CN109215781B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a construction method and a construction system for a Kawasaki disease risk assessment model based on the logistic algorithm. The construction method comprises: extracting, from a sample data set, valid samples that can be used for model construction and evaluation; screening out, from the feature set of the valid samples, 10 features suited to on-site medical auxiliary diagnosis; randomly dividing the incomplete data set of the valid samples into a training set and a validation set; fitting the training set with the logistic method to construct the model, tuning the fitted function by cross-validation and recording the optimal model parameters; and calculating the model classification threshold t on the validation set according to the ROC curve, thereby obtaining the Kawasaki disease risk assessment model. The invention also constructs a corresponding Kawasaki disease risk assessment system, which is applied to data to be assessed to obtain a KDx score. The invention helps to reduce the misdiagnosis rate and missed-diagnosis rate of Kawasaki disease, so that patients can receive effective prevention, intervention and treatment at an early stage of onset.

Description

Construction method and construction system for a Kawasaki disease risk assessment model based on the logistic algorithm
Technical field
The present invention relates to a method for constructing a model, in particular to a construction method, a construction system and an assessment system for a logistic-algorithm-based assessment model that predicts Kawasaki disease risk, and belongs to the technical field of risk assessment model construction.
Technical background
Kawasaki disease, also known as acute Kawasaki syndrome, is an acute febrile exanthematous disease whose main lesion is systemic vasculitis. It mostly affects infants and young children under 5 years of age, boys more often than girls, and draws attention because it can cause serious cardiovascular complications. Its most common symptom is persistent fever, and its clinical presentation resembles that of common diseases such as pneumonia, so it is easily missed or misdiagnosed. It may leave coronary artery lesions and can even be life-threatening; it is the most common cause of acquired heart disease in children and a risk factor for ischemic heart disease in adulthood. The timing of treatment significantly affects prognosis, so timely diagnosis and treatment are the key to avoiding coronary artery lesions.
Current diagnostic criteria require fever lasting at least 5 days and the appearance of clinical symptoms, supplemented by laboratory tests and echocardiography, so the child can easily miss the optimal treatment window. There is still no specific diagnostic method, which readily leads to missed diagnosis and misdiagnosis; delayed clinical treatment in turn causes greater harm. A diagnostic approach with high sensitivity and high specificity is therefore an urgent, unmet need in the diagnosis and treatment of Kawasaki disease.
A Kawasaki disease prediction model built on medical data can assist assessment, help reduce the missed-diagnosis and misdiagnosis rates, and further guide subsequent treatment. Existing data-based Kawasaki disease classification models mostly use linear methods, typically logistic regression analysis. Because their sensitivity and specificity are insufficient, patients with Kawasaki disease are missed or misdiagnosed and their treatment is delayed.
How to optimize existing Kawasaki disease prediction models and construct a risk assessment model with high sensitivity and specificity has therefore long been a goal of researchers in the field.
Summary of the invention
The main purpose of the present invention is to provide a construction method and a construction system for a Kawasaki disease risk assessment model based on the logistic algorithm, so as to overcome the deficiencies of the prior art.
Another purpose of the present invention is to provide a Kawasaki disease risk assessment system based on the logistic algorithm.
To achieve the foregoing purposes, the technical solution adopted by the present invention includes the following.
An embodiment of the present invention provides a construction method for a Kawasaki disease risk assessment model based on the logistic algorithm, comprising:
extracting, from a sample data set, valid samples that can be used for constructing and evaluating the model;
screening out, from the feature set of the valid samples, 10 features suited to on-site medical auxiliary diagnosis;
randomly dividing the incomplete data set of the valid samples into a training set and a validation set;
fitting the training set with the logistic method to construct the model, tuning the fitted function by cross-validation, and recording the optimal model parameters; and calculating the model classification threshold t on the validation set according to the ROC curve, thereby obtaining the Kawasaki disease risk assessment model.
An embodiment of the present invention also provides a construction system for a Kawasaki disease risk assessment model based on the logistic algorithm, applied to the foregoing construction method, comprising:
a data acquisition module, at least for acquiring data to obtain a sample data set;
a data processing module, at least for extracting from the sample data set valid samples that can be used for constructing the assessment model;
a model construction module, at least for randomly dividing the incomplete data set of the valid samples into a training set and a validation set, fitting the training set with the logistic method, tuning the fitted function by cross-validation, and recording the optimal model parameters;
a threshold calculation module, at least for calculating the model classification threshold on the validation set according to the ROC curve.
An embodiment of the present invention also provides the Kawasaki disease risk assessment model based on the logistic algorithm constructed by the foregoing method.
An embodiment of the present invention also provides a Kawasaki disease risk assessment system based on the logistic algorithm, comprising:
an input module, at least for inputting data to be assessed;
the Kawasaki disease risk assessment model based on the logistic algorithm constructed by the foregoing method, at least for assessing the data to be assessed;
a display module, at least for displaying the assessment result, i.e. the KDx score.
1) Compared with the prior art, the construction method and system for a Kawasaki disease risk assessment model based on the logistic algorithm provided by the present invention perform systematic statistical analysis and modelling using medical data related to Kawasaki disease, and provide a model evaluation method. Based on existing Kawasaki disease medical data, the model provides scientific and effective auxiliary assessment of patients with suspected Kawasaki disease, helps reduce misdiagnosis and missed diagnosis, allows patients to obtain effective prevention and intervention at an early stage of onset, and provides a scientific and reliable basis for guiding subsequent treatment toward the best therapeutic effect. This avoids the missed diagnoses, misdiagnoses and treatment delays that arise because existing diagnostic approaches lack an assessment model with high sensitivity and specificity;
2) With diagnosis time in mind, the feature items selected by the present invention take little time to measure, which greatly shortens the time a doctor needs for diagnosis. In addition, few feature items are selected, which reduces testing costs;
3) The data sample size of the present invention is very large, which is a prominent advantage.
Brief description of the drawings
To explain the technical solution of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a construction method for a Kawasaki disease risk assessment model based on the logistic algorithm in an exemplary embodiment of the present invention.
Fig. 2 is the ROC curve of the Kawasaki disease risk assessment model based on the logistic algorithm in Embodiment 1 of the present invention.
Specific embodiment
As stated above, in view of the deficiencies of the prior art, the inventors arrived at the technical solution of the present invention after long-term research and extensive practice. The construction method and construction system for a Kawasaki disease risk assessment model based on the logistic algorithm are described in further detail below with reference to the drawings and embodiments. The protected content of the present invention includes but is not limited to the following embodiments. Variations and advantages that a person skilled in the art can conceive of without departing from the spirit and scope of the invention are all included in the present invention, with the appended claims defining the scope of protection.
The logistic method used in the present invention is a method for improving the accuracy of weak classification algorithms: it constructs a series of prediction functions and then combines them in some way into a single prediction function. It is a method for improving the accuracy of any given learning algorithm, and its idea originates from the PAC (Probably Approximately Correct) learning model proposed by Valiant.
The present invention models mainly on the medical data in electronic medical records, uses the information contained in the data to assess a patient's risk of having Kawasaki disease, and expresses the assessment result numerically as the KDx score. The invention covers the data processing workflow for modelling medical data and the methods and results for Kawasaki disease classification prediction, analysis and scoring. It combines medical data with data mining methods, an innovation in joining medical data with big data analysis; to some extent it fills a gap in domestic medical data research, and it is novel in using medical data to assist the detection and analysis of Kawasaki disease.
One aspect of the embodiments of the present invention provides a construction method for a Kawasaki disease risk assessment model based on the logistic algorithm, comprising:
extracting, from a sample data set, valid samples that can be used for constructing and evaluating the model;
screening out, from the feature set of the valid samples, 10 features suited to on-site medical auxiliary diagnosis;
randomly dividing the incomplete data set of the valid samples into a training set and a validation set;
fitting the training set with the logistic method to construct the model, tuning the fitted function by cross-validation, and recording the optimal model parameters; and calculating the model classification threshold t on the validation set according to the ROC curve, thereby obtaining the Kawasaki disease risk assessment model.
In some embodiments, the construction method includes:
Step 1: Data sample selection. Extract, from the sample data set, valid samples that can be used for modelling and model evaluation;
Step 2: Feature screening. Screen out, from the feature set of the constructed sample data, 10 features suited to on-site medical auxiliary diagnosis. The specific steps are as follows:
In this embodiment the Akaike information criterion (AIC) is chosen as the reference for feature selection. Using the best-subset selection method, subsets of the sample feature variables are fitted by linear regression to obtain several linear models, and the AIC of each model is calculated as $\mathrm{AIC} = 2k - 2\ln(L)$. Models with a smaller AIC are chosen and, with application in on-site medical auxiliary diagnosis in mind, compared comprehensively against the time needed to obtain each feature value, preferring features that take less time to obtain.
Here, k is the number of unknown parameters in the model and L is the maximised value of the model's likelihood function.
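As an illustration of the AIC comparison described above, the following minimal Python sketch scores a few candidate feature subsets and picks the one with the smallest AIC. It assumes the standard definition AIC = 2k - 2 ln(L); the subset names, parameter counts and log-likelihood values are hypothetical and are not taken from the patent's data.

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion, AIC = 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical candidates: subset name -> (k, maximised log-likelihood ln L).
candidates = {
    "CRP+ALB": (3, -420.0),
    "CRP+ALB+FG": (4, -405.5),
    "CRP+ALB+FG+C3": (5, -405.1),
}

# Score every candidate subset and keep the one with the smallest AIC.
aic_scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}
best_subset = min(aic_scores, key=aic_scores.get)
```

In this made-up example the fourth feature improves the likelihood too little to pay for its extra parameter, so the three-feature subset is preferred. The patent additionally weighs how quickly each feature value can be obtained, which this sketch does not model.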
The n feature values of each observed sample are represented by the matrix X:
$$X = (x_{ij})_{m \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix} = (\alpha_1, \alpha_2, \ldots, \alpha_n)$$
where n is the number of variables entering the sample, m is the total sample size, $x_{mn}$ is the n-th feature value of the m-th patient, and $\alpha_n$ is a feature vector holding all observations of the n-th feature.
Whether multicollinearity exists between the feature vectors is then checked. The specific steps are as follows:
1. Take feature vector $\alpha_i$ as the response variable and the remaining feature vectors as predictor variables, build a model by multiple linear regression, and calculate the multiple correlation coefficient $R_i$ between $\alpha_i$ and the remaining feature vectors.
2. Calculate the variance inflation factor of feature vector $\alpha_i$: $\mathrm{VIF}_i = 1/(1 - R_i^2)$.
3. If 0 < VIF < 10, there is no multicollinearity between the feature variables; if 10 < VIF < 100, there is relatively strong multicollinearity between the feature variables and the collinearity needs to be eliminated; if VIF > 100, there is severe multicollinearity between the feature variables and the correlated variables need to be deleted.
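The multicollinearity check above can be sketched in Python as follows. The sketch assumes the standard definition VIF = 1/(1 - R^2) and takes the multiple correlation coefficient R as an input instead of running the regression of step 1. The function names are illustrative, and assigning the boundary values 10 and 100 to the higher band is an assumption, since the text leaves them unspecified.

```python
def vif(multiple_r):
    """Variance inflation factor from the multiple correlation coefficient R."""
    return 1.0 / (1.0 - multiple_r ** 2)

def collinearity_level(vif_value):
    """Map a VIF value onto the three bands described in the text."""
    if vif_value < 10:
        return "none"
    if vif_value < 100:
        return "strong: eliminate collinearity"
    return "severe: delete correlated variables"

# Example R values for three hypothetical features.
levels = {r: collinearity_level(vif(r)) for r in (0.5, 0.96, 0.999)}
```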
Step 3: Construction of the Kawasaki disease risk prediction model. The model is constructed using the logistic method, as follows:
(1) Given an incomplete data set and a complete data set: randomly divide the incomplete data set into a training set Xtrain and a validation set Xderivation at a ratio of 1:1 to 10:1, and use the complete data set as the test set Xtest;
(2) Fit the Xtrain data set with the logistic method to construct the model, tune the fitted function by cross-validation, and record the optimal model parameters. The specific steps are as follows:
1. Divide the training set data equally into ten folds;
2. Take nine of the folds and fit them with the logistic method to obtain the model:
$$p(X) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n}}$$
where p(X) is the probability that the event occurs (i.e. the probability of illness), $\beta_0$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ are the model coefficients, and n is the number of variables entering the model.
3. Use the model from step 2 to predict the remaining fold and calculate its prediction error;
4. Change the parameters and repeat steps 2 to 3;
5. Compare the prediction errors and record the parameters of the model with the smallest prediction error as the optimal model parameters.
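The ten-fold procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions and not the patent's implementation: the text does not say which parameter is varied in step 4, so an L2 penalty strength `l2` is used as a stand-in; the model is fitted by simple batch gradient descent; the prediction error is the misclassification rate at a 0.5 cut-off; and the data are synthetic.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l2, lr=0.5, epochs=200):
    """Fit p(x) = sigmoid(b0 + b.x) by batch gradient descent with an L2 penalty."""
    m, n = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * n
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(n):
                g[j] += err * xi[j]
        b0 -= lr * g0 / m
        b = [bj - lr * (gj / m + l2 * bj) for bj, gj in zip(b, g)]
    return b0, b

def predict(model, x):
    """Predicted probability of the positive class for one sample."""
    b0, b = model
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))

def cv_error(X, y, l2, k=10):
    """Mean misclassification rate: fit on k-1 folds, test on the held-out fold."""
    idx = list(range(len(X)))
    errors = []
    for f in range(k):
        held = set(idx[f::k])
        X_fit = [X[i] for i in idx if i not in held]
        y_fit = [y[i] for i in idx if i not in held]
        model = fit_logistic(X_fit, y_fit, l2)
        wrong = sum((predict(model, X[i]) >= 0.5) != (y[i] == 1) for i in held)
        errors.append(wrong / len(held))
    return sum(errors) / k

# Synthetic two-feature data set (illustrative only).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
y = [1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.5) > 0 else 0 for x in X]

# Steps 4 and 5: try several parameter values and keep the one with the smallest error.
cv_results = {l2: cv_error(X, y, l2) for l2 in (0.0, 0.01, 0.1)}
best_l2 = min(cv_results, key=cv_results.get)
```

The final model would then be refitted on the whole training set with the selected parameter value.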
(3) Calculate the model classification threshold t on the validation set according to the ROC curve. The specific steps for calculating t are as follows:
1. Using the optimal parameters, build the optimal model on the training set;
2. Predict the validation set observations with the model to obtain classification scores;
3. Choose different values in the range [0,1] as classification thresholds and partition the classification scores from step 2 accordingly;
4. Calculate the sensitivity, specificity and accuracy of the prediction under each classification threshold, and draw the ROC curve;
5. From the ROC curve, choose a classification threshold at which the sensitivity, specificity and accuracy of the prediction are all good.
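The threshold scan in steps 3 to 5 can be sketched as below. The text only says that a threshold with good sensitivity, specificity and accuracy is chosen; this sketch assumes one common concrete rule, maximising Youden's J (sensitivity + specificity - 1), which corresponds to the point closest to the top-left region of the ROC plot. The scores and labels are made-up validation data.

```python
def confusion(scores, labels, t):
    """Count TP, FN, FP, TN when scores >= t are predicted positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
    tn = sum(1 for s, l in zip(scores, labels) if s < t and l == 0)
    return tp, fn, fp, tn

def roc_table(scores, labels, thresholds):
    """Return (t, sensitivity, specificity, accuracy) for each threshold."""
    rows = []
    for t in thresholds:
        tp, fn, fp, tn = confusion(scores, labels, t)
        rows.append((t, tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)))
    return rows

def pick_threshold(rows):
    """Assumed rule: maximise Youden's J = sensitivity + specificity - 1."""
    return max(rows, key=lambda r: r[1] + r[2] - 1)[0]

# Hypothetical classification scores and true labels for a validation set.
scores = [0.05, 0.20, 0.35, 0.65, 0.45, 0.60, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = [i / 10 for i in range(1, 10)]

rows = roc_table(scores, labels, thresholds)
t_best = pick_threshold(rows)
```

Plotting sensitivity against 1 - specificity for each row of `rows` gives the ROC curve referred to in step 4.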
In some embodiments, the 10 features are:
A. Sex;
B. Age;
C. C-reactive protein concentration (CRP, g/L);
D. Fibrinogen concentration (FG, g/L);
E. Albumin concentration (ALB, g/L);
F. Globulin concentration (GLB, g/L);
G. Complement C3 concentration (C3, g/L);
H. Immunoglobulin G concentration (IgG, g/L);
I. Prealbumin concentration (PAB, g/L);
J. Albumin/globulin ratio (A/G).
In some embodiments, the split ratio of the training set (Xtrain) to the validation set (Xderivation) is 1:1 to 10:1.
In some embodiments, the construction method includes: calculating the model classification threshold t on the validation set according to the ROC curve. A KDx score above the classification threshold t is predicted as high risk of Kawasaki disease, and the higher the value, the greater the probability of illness; a score below t is predicted as low risk of Kawasaki disease, and the lower the value, the smaller the probability of illness.
Further, the construction method also includes: using the complete data set as the test set (Xtest) to test the constructed Kawasaki disease risk assessment model, and carrying out predictive analysis of the test set samples with the calculated classification threshold t.
For example, more specifically, the steps of constructing the prediction model on the training set and predicting the test set data include:
1) Using the optimal logistic prediction model obtained by fitting the training set, predict the classification score, i.e. the KDx score, of each patient in the test set. Patients with a classification score greater than t are at high risk of Kawasaki disease; patients with a classification score less than t are at low risk of Kawasaki disease;
2) From the classification scores of the test set, calculate the sensitivity, specificity and accuracy of the model in assisting Kawasaki disease diagnosis.
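A minimal sketch of how a KDx score could be computed from a fitted logistic model and compared against the threshold t is given below. The intercept, coefficients and feature values are hypothetical placeholders (the patent's fitted parameters are given in its Table 2, which is not reproduced here), and treating a score exactly equal to t as low risk is an assumption.

```python
import math

def kdx_score(features, intercept, coefs):
    """Logistic score p = 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def risk_group(score, t):
    """Scores above t are high risk; scores at or below t are treated as low risk."""
    return "high risk" if score > t else "low risk"

# Hypothetical two-feature model and patient (illustrative values only).
score = kdx_score([2.0, 1.0], intercept=-1.0, coefs=[0.8, -0.5])
group = risk_group(score, t=0.5)
```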
For example, in some more specific embodiments, the process of obtaining valid samples that can be used for constructing the assessment model includes:
(a) dividing the sample data into two groups, Kawasaki disease and common febrile disease, according to the Kawasaki disease diagnostic criteria issued by the American Heart Association (AHA) in 2017, and deleting sample data without a definite diagnosis;
(b) deleting duplicate data;
(c) deleting indicators whose data completeness is below 80%;
(d) filling incomplete or erroneous data with the median, thereby obtaining valid samples that can be used for constructing the assessment model.
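Steps (a) to (d) can be sketched in plain Python as below. The record layout (dictionaries with `id`, `group` and indicator keys), the group labels and the sample values are assumptions made for illustration. Only missing values (`None`) are median-filled here; detecting erroneous values would need domain-specific rules that the text does not give.

```python
from statistics import median

def clean(records, indicators, min_fill=0.8):
    # (a) keep only records with a definite diagnosis group
    rows = [r for r in records if r.get("group") in ("KD", "fever")]
    # (b) drop exact duplicate records
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # (c) drop indicators filled in for fewer than 80% of the records
    kept = [c for c in indicators
            if sum(r.get(c) is not None for r in unique) / len(unique) >= min_fill]
    # (d) fill remaining gaps with the indicator's median
    cleaned = [{"id": r["id"], "group": r["group"], **{c: r.get(c) for c in kept}}
               for r in unique]
    for c in kept:
        med = median(r[c] for r in unique if r.get(c) is not None)
        for r in cleaned:
            if r[c] is None:
                r[c] = med
    return cleaned, kept

# Hypothetical records (values are illustrative only).
records = [
    {"id": 1, "group": "KD", "CRP": 80.0, "ALB": 35.0, "C3": None},
    {"id": 1, "group": "KD", "CRP": 80.0, "ALB": 35.0, "C3": None},
    {"id": 2, "group": "fever", "CRP": None, "ALB": 40.0, "C3": None},
    {"id": 3, "group": None, "CRP": 10.0, "ALB": 42.0, "C3": 1.1},
    {"id": 4, "group": "fever", "CRP": 20.0, "ALB": 41.0, "C3": None},
    {"id": 5, "group": "KD", "CRP": 60.0, "ALB": 33.0, "C3": 1.3},
    {"id": 6, "group": "KD", "CRP": 90.0, "ALB": 30.0, "C3": 1.2},
]
cleaned, kept = clean(records, ["CRP", "ALB", "C3"])
```

In this example the undiagnosed record and the duplicate are dropped, the sparsely filled C3 column is removed, and the one missing CRP value is replaced by the median of the remaining CRP values.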
The medical data used by the present invention, i.e. the sample data set, come from the hospital's online electronic medical record entry system (EDC) and comprise multidimensional data including doctors' orders, examinations, laboratory tests, disease course, patient medical history, out-of-hospital follow-up data, multi-centre sample data and sample molecular test data.
In some more specific embodiments, referring to Fig. 1, the construction method for a Kawasaki disease risk assessment model based on the logistic algorithm has the following specific steps:
1. Sample selection
The raw data set is dataset1. Patients without a definite diagnosis, duplicate data and indicators with data completeness below 80% are removed from the data set, giving dataset2.
2. Feature screening
Feature screening is performed on dataset2 according to the Akaike information criterion while also considering how long each feature value takes to obtain; feature items that take less time to obtain are kept, giving dataset3.
3. Construction of the Kawasaki disease classification model
1) Given an incomplete data set and a complete data set: randomly divide the incomplete data set into a training set Xtrain and a validation set Xderivation at a ratio of 1:1 to 10:1, and use the complete data set as the test set Xtest;
2) Fit the Xtrain data set with the logistic method to construct the model, tune the fitted function by cross-validation, and record the optimal model parameters;
3) Calculate the model classification threshold t on the validation set according to the ROC curve.
Another aspect of the embodiments of the present invention further provides a construction system for a Kawasaki disease risk assessment model based on the logistic algorithm, applied to the foregoing construction method, comprising:
a data acquisition module, at least for acquiring data to obtain a sample data set;
a data processing module, at least for extracting from the sample data set valid samples that can be used for constructing the assessment model;
a model construction module, at least for randomly dividing the incomplete data set of the valid samples into a training set and a validation set, fitting the training set with the logistic method, tuning the fitted function by cross-validation, and recording the optimal model parameters;
a threshold calculation module, at least for calculating the model classification threshold on the validation set according to the ROC curve.
Another aspect of the embodiments of the present invention further provides the Kawasaki disease risk assessment model based on the logistic algorithm constructed by the foregoing method.
Correspondingly, another aspect of the embodiments of the present invention further provides a Kawasaki disease risk assessment system based on the logistic algorithm, comprising:
an input module, at least for inputting data to be assessed;
the Kawasaki disease risk assessment model based on the logistic algorithm constructed by the foregoing method, at least for assessing the data to be assessed;
a display module, at least for displaying the assessment result, i.e. the KDx score.
In summary, the model construction method and system of the present invention perform systematic statistical analysis and modelling using medical data related to Kawasaki disease, and provide a model evaluation method. Based on existing Kawasaki disease medical data, the model provides scientific and effective auxiliary assessment of patients with suspected Kawasaki disease, helps reduce misdiagnosis and missed diagnosis, allows patients to obtain effective prevention and intervention at an early stage of onset, and provides a scientific and reliable basis for guiding subsequent treatment toward the best therapeutic effect. This avoids the missed diagnoses, misdiagnoses and treatment delays that arise because existing diagnostic approaches lack an assessment model with high sensitivity and specificity.
To make the purposes, technical solutions and advantages of the present invention clearer, the technical solution is described in further detail below with reference to several preferred embodiments. The present invention is not limited to the following embodiments, and non-essential modifications and adaptations made by a person skilled in the art under the core guiding idea of the present invention still fall within its scope of protection.
Embodiment 1:
To verify the effectiveness of the construction system for a Kawasaki disease risk assessment model based on the logistic algorithm, this embodiment uses 42,498 patient records from electronic medical records covering July 2008 to March 2018. This embodiment uses the logistic method.
1. Data processing:
After deletion processing of the raw data set, the incomplete data set contains 8204 samples and the complete data set contains 471 samples. The data set used has the following form: each row holds the information of one patient and each column holds one feature, such as ID, group, sex, age, CRP, FG, etc. The data set format is shown in Table 1.
After data sample selection and feature screening, the final data set contains 8675 rows and 11 feature columns, as shown in Table 1.
Table 1
2. Optimal model data
The incomplete data set is randomly divided into a training set (5742 samples) and a validation set (2462 samples) at a ratio of 7:3, and the complete data set is used as the test set (471 samples). The resulting optimal model parameters are shown in Table 2:
Table 2
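The 7:3 random split of the 8204-sample incomplete data set can be reproduced in outline as follows. This is a sketch only: the seed and the use of Python's `random` module are assumptions, and the indices merely stand in for patient records.

```python
import random

def split(indices, train_fraction=0.7, seed=0):
    """Shuffle the indices and cut them into training and validation parts."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 8204 is the size of the incomplete data set reported in this embodiment.
train_idx, val_idx = split(range(8204))
```

With 8204 samples this yields 5742 training and 2462 validation indices, matching the counts reported above.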
3. Selecting the classification threshold t
The validation set is predicted with the optimal-parameter model; 2109 classification thresholds are generated automatically at random in the range [0,1], the corresponding sensitivity, specificity and accuracy are calculated, and the ROC curve is drawn, as shown in Fig. 2.
The classification threshold t = 0.5 is chosen, which lies close to the upper-left corner of the curve and gives good sensitivity, specificity and accuracy.
4. Numerical scoring of the prediction results
The above model is used as a Kawasaki disease risk assessment system, and the observations in the test set are fed into this system for prediction.
The test set results are shown in Table 3-1 and Table 3-2; in this experiment the test set contains 471 people.
Table 3-1
Table 3-2
Note: some indicators for classification problems are explained here. For a binary classification problem, the two classes are defined as the positive class and the negative class; each object in the positive class is a positive example and each object in the negative class is a negative example. When predicting Kawasaki disease, Kawasaki disease samples are generally the positive class and other fever patients the negative class. Predicting test samples with a classification model gives four possible outcomes: a positive example predicted as positive is a true positive (TP); a negative example predicted as positive is a false positive (FP); a negative example predicted as negative is a true negative (TN); and a positive example predicted as negative is a false negative (FN).
TP: number of positive examples predicted as positive;
FN: number of positive examples predicted as negative;
FP: number of negative examples predicted as positive;
TN: number of negative examples predicted as negative;
Sensitivity: proportion of positive-class examples correctly predicted as positive, i.e. TP/(TP+FN);
Specificity: proportion of negative-class examples correctly predicted as negative, i.e. TN/(TN+FP);
Positive predictive value (PPV): proportion of true positive examples among the examples predicted as positive, i.e. TP/(TP+FP);
Accuracy: proportion of all examples that are predicted correctly, i.e. (TP+TN)/(TP+FN+TN+FP).
Experimental results
According to the true classes of the test set data, 278 people have Kawasaki disease and 193 have common fever. The test set data are applied to the optimal logistic model, which predicts the class probability KDx of the response from the observed values (as shown in Table 3-1), and the results are divided according to the classification threshold t = 0.5. The outcome is that 259 people are predicted to have Kawasaki disease and 212 people are predicted to have common fever. Comparison with the true classes in the test set gives 227 true positives (TP), 161 true negatives (TN), 32 false positives (FP), and 51 false negatives (FN) (as shown in Table 3-2).
From the test classification results, the sensitivity is 81.65%, the specificity is 83.42%, the positive predictive value (PPV) is 87.64%, and the accuracy is 82.38%.
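As a check on the arithmetic, the reported rates follow directly from the counts TP = 227, FN = 51, FP = 32 and TN = 161 given above. The worked computation below is only a restatement of those figures, rounded to two decimal places.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP+FN} = \frac{227}{227+51} = \frac{227}{278} \approx 81.65\% \\
\text{Specificity} &= \frac{TN}{TN+FP} = \frac{161}{161+32} = \frac{161}{193} \approx 83.42\% \\
\text{PPV} &= \frac{TP}{TP+FP} = \frac{227}{227+32} = \frac{227}{259} \approx 87.64\% \\
\text{Accuracy} &= \frac{TP+TN}{TP+FN+TN+FP} = \frac{227+161}{471} = \frac{388}{471} \approx 82.38\%
\end{align*}
```

The row and column totals are also consistent: TP + FN = 278 true Kawasaki disease cases, TN + FP = 193 true common fever cases, TP + FP = 259 predicted Kawasaki disease cases, and TN + FN = 212 predicted common fever cases.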
In conclusion a kind of Kawasaki disease risk assessment system of the present invention being capable of base by the model by above data In existing Kawasaki disease medical data, scientific and effective aided assessment is carried out to the patient of doubtful Kawasaki disease, helps to reduce it Misdiagnosis rate and rate of missed diagnosis make patient that can obtain effective prevention, intervention in morbidity early stage, and science reliably instructs subsequent control Treatment process provides foundation to reach optimum therapeuticing effect.For to diagnosis the used time the considerations of, the present invention selected by characteristic item detection Used time is shorter, greatly shortens the time used in diagnosis.Also, characteristic item chooses less, reduction detection cost used.The present invention Data sample amount is huge, and advantage is prominent, and incomplete data sets include 8204 samples after raw data set passes through delete processing, Complete data collection includes 471 samples.
The embodiments described above explain the technical solution of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, supplement, or similar substitution made within the spirit of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm, characterized by comprising:
extracting, from a sample data set, effective samples that can be used for constructing the assessment model;
screening, from the feature set of the effective samples, 10 features suited to on-site clinical auxiliary diagnosis;
randomly dividing the incomplete data set of the effective samples into a training set and a validation set;
building the model by fitting the training set with the logistic method, tuning the fitting function by cross-validation, and recording the optimal model parameters; and, at the same time, calculating the model classification threshold t from the ROC curve using the validation set, thereby obtaining the Kawasaki disease risk assessment model.
2. The method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm according to claim 1, characterized in that: the 10 features are sex, age, C-reactive protein concentration, fibrinogen concentration, albumin concentration, globulin concentration, complement C3 concentration, immunoglobulin G concentration, prealbumin concentration, and albumin/globulin ratio.
3. The method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm according to claim 1, characterized in that: the split ratio of the training set to the validation set is 1:1 to 10:1.
4. The method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm according to claim 1, characterized by comprising: calculating the model classification threshold t from the ROC curve using the validation set, wherein a KDx score higher than the classification threshold t is predicted as high risk of Kawasaki disease and a KDx score lower than the classification threshold t is predicted as low risk of Kawasaki disease.
5. The method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm according to any one of claims 1-4, characterized by further comprising: using the complete data set as a test set, on which the constructed Kawasaki disease risk assessment model is used to make predictions.
6. The method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm according to claim 1, characterized by comprising:
dividing the sample data set into two groups, Kawasaki disease and common fever, according to the Kawasaki disease diagnostic criteria, and deleting samples without a definite diagnosis;
deleting duplicate data;
deleting indicators whose data volume is less than 80%;
filling incomplete or erroneous data with the median, thereby obtaining the effective samples that can be used for constructing the assessment model.
7. The method for constructing a Kawasaki disease risk assessment model based on a logistic algorithm according to claim 6, characterized in that: the sample data set is derived from the hospital's online electronic medical record entry system and includes medical orders, laboratory tests, examinations, disease course, patient medical history data, out-of-hospital follow-up data, multicenter sample data, and sample molecular test data.
8. A system for constructing a Kawasaki disease risk assessment model based on a logistic algorithm, applied to the construction method according to any one of claims 1-7, comprising:
a data acquisition module, at least for acquiring data to obtain a sample data set;
a data processing module, at least for extracting, from the sample data set, effective samples that can be used for constructing the assessment model;
a model construction module, at least for randomly dividing the incomplete data set of the effective samples into a training set and a validation set, fitting the training set with the logistic method, tuning the fitting function by cross-validation, and recording the optimal model parameters;
a threshold calculation module, at least for calculating the model classification threshold from the ROC curve using the validation set.
9. A Kawasaki disease risk assessment model based on a logistic algorithm, constructed by the method according to any one of claims 1-7.
10. A Kawasaki disease risk assessment system based on a logistic algorithm, characterized by comprising:
an input module, at least for inputting data to be assessed;
the Kawasaki disease risk assessment model based on a logistic algorithm constructed by the method according to any one of claims 1-7, at least for assessing the data to be assessed;
a display module, at least for displaying the assessment result, i.e. the KDx score.
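The scoring and threshold steps recited in the claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims say only that the threshold t is calculated from the ROC curve on the validation set, so the use of Youden's J statistic (sensitivity + specificity - 1) as the selection criterion is an assumption, as are the function names, the weights, and the toy validation data. A score equal to t is treated here as high risk, which the claims do not specify.

```python
import math
from typing import Sequence, Tuple


def kdx_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-model class probability: sigmoid(bias + sum(w_i * x_i))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Scan every observed score as a candidate threshold (one ROC point each)
    and return the threshold that maximises J = TPR - FPR, together with J.
    Youden's J is an assumed criterion; ties keep the lowest threshold."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("validation set must contain both classes")
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical validation labels and KDx scores, not data from the patent.
y_val = [0, 0, 1, 0, 1, 1, 0, 1]
kdx_val = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
t, j = youden_threshold(y_val, kdx_val)

# Hypothetical two-feature patient with made-up weights, for illustration only.
score = kdx_score([0.0, 0.0], [1.0, 1.0], 0.0)
risk = "high" if score >= t else "low"
print(t, j, score, risk)
```

In practice the weights and bias would come from fitting the logistic model on the training set with cross-validation, as the claims describe; that fitting step is omitted here to keep the sketch dependency-free.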
CN201811075730.2A 2018-09-14 2018-09-14 Method and system for constructing risk assessment model of Kawasaki disease based on logistic algorithm Active CN109215781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811075730.2A CN109215781B (en) 2018-09-14 2018-09-14 Method and system for constructing risk assessment model of Kawasaki disease based on logistic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811075730.2A CN109215781B (en) 2018-09-14 2018-09-14 Method and system for constructing risk assessment model of Kawasaki disease based on logistic algorithm

Publications (2)

Publication Number Publication Date
CN109215781A true CN109215781A (en) 2019-01-15
CN109215781B CN109215781B (en) 2021-11-12

Family

ID=64983757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811075730.2A Active CN109215781B (en) 2018-09-14 2018-09-14 Method and system for constructing risk assessment model of Kawasaki disease based on logistic algorithm

Country Status (1)

Country Link
CN (1) CN109215781B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243736A (en) * 2019-10-24 2020-06-05 中国人民解放军海军军医大学第三附属医院 Survival risk assessment method and system
CN113223708A (en) * 2021-05-24 2021-08-06 浙江医院 Method for constructing disease risk prediction model and related equipment
CN113936804A (en) * 2021-08-23 2022-01-14 四川大学华西医院 System for constructing model for predicting risk of continuous air leakage after lung cancer resection
US20220084635A1 (en) * 2020-09-15 2022-03-17 Acer Incorporated Disease classification method and disease classification device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100205042A1 (en) * 2009-02-11 2010-08-12 Mun Johnathan C Integrated risk management process
CN106295229A (en) * 2016-08-30 2017-01-04 青岛大学 Kawasaki disease hierarchical prediction method based on medical data modeling
CN106339593A (en) * 2016-08-31 2017-01-18 青岛睿帮信息技术有限公司 Kawasaki disease classification and prediction method based on medical data modeling
CN107230108A (en) * 2017-06-13 2017-10-03 北京百分点信息科技有限公司 The processing method and processing device of business datum
US20180098728A1 (en) * 2011-03-11 2018-04-12 Centre Hospitalier Universitaire D'angers Non-invasive method for assessing the presence or severity of liver fibrosis based on a new detailed classification

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100205042A1 (en) * 2009-02-11 2010-08-12 Mun Johnathan C Integrated risk management process
US20180098728A1 (en) * 2011-03-11 2018-04-12 Centre Hospitalier Universitaire D'angers Non-invasive method for assessing the presence or severity of liver fibrosis based on a new detailed classification
CN106295229A (en) * 2016-08-30 2017-01-04 青岛大学 Kawasaki disease hierarchical prediction method based on medical data modeling
CN106339593A (en) * 2016-08-31 2017-01-18 青岛睿帮信息技术有限公司 Kawasaki disease classification and prediction method based on medical data modeling
CN107230108A (en) * 2017-06-13 2017-10-03 北京百分点信息科技有限公司 The processing method and processing device of business datum

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
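He Yang et al.: "Airport flight delay prediction based on support vector machine regression", Journal of Civil Aviation University of China *
Fan Chu et al.: "BP neural network model established based on data mining technology", Chinese Journal of Evidence-Based Pediatrics *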

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243736A (en) * 2019-10-24 2020-06-05 中国人民解放军海军军医大学第三附属医院 Survival risk assessment method and system
CN111243736B (en) * 2019-10-24 2023-09-01 中国人民解放军海军军医大学第三附属医院 Survival risk assessment method and system
US20220084635A1 (en) * 2020-09-15 2022-03-17 Acer Incorporated Disease classification method and disease classification device
CN113223708A (en) * 2021-05-24 2021-08-06 浙江医院 Method for constructing disease risk prediction model and related equipment
CN113936804A (en) * 2021-08-23 2022-01-14 四川大学华西医院 System for constructing model for predicting risk of continuous air leakage after lung cancer resection
CN113936804B (en) * 2021-08-23 2023-03-28 四川大学华西医院 System for constructing model for predicting risk of continuous air leakage after lung cancer resection

Also Published As

Publication number Publication date
CN109215781B (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN109273094A (en) A kind of construction method and building system of the Kawasaki disease risk evaluation model based on Boosting algorithm
CN109065171B (en) Integrated learning-based Kawasaki disease risk assessment model construction method and system
CN109273093A (en) A kind of construction method and building system of Kawasaki disease risk evaluation model
CN109215781A (en) A kind of construction method and building system of the Kawasaki disease risk evaluation model based on logistic algorithm
CN109243604A (en) A kind of construction method and building system of the Kawasaki disease risk evaluation model based on neural network algorithm
CN110097975A (en) A kind of nosocomial infection intelligent diagnosing method and system based on multi-model fusion
CN106339593A (en) Kawasaki disease classification and prediction method based on medical data modeling
Manzak et al. Automated classification of Alzheimer’s disease using deep neural network (DNN) by random forest feature elimination
CN106295229A (en) Kawasaki disease hierarchical prediction method based on medical data modeling
CN110491506A (en) Auricular fibrillation prediction model and its forecasting system
Chang et al. The study that applies artificial intelligence and logistic regression for assistance in differential diagnostic of pancreatic cancer
Sharanyaa et al. Hybrid machine learning techniques for heart disease prediction
CN117116477A (en) Construction method and system of prostate cancer disease risk prediction model based on random forest and XGBoost
Idowu Classification techniques using EHG signals for detecting preterm births
Sherly An ensemble basedheart disease predictionusing gradient boosting decision tree
Akbar et al. Comparison of Machine Learning Techniques for Heart Disease Diagnosis and Prediction
CN117116475A (en) Method, system, terminal and storage medium for predicting risk of ischemic cerebral apoplexy
JP2024061599A (en) A system for identifying abnormalities in the course of medical treatment based on a hierarchical neural network
Thangarasu et al. Prediction of hidden knowledge from clinical database using data mining techniques
Tang et al. A neural network to pulmonary embolism aided diagnosis with a feature selection approach
Chatzimichail et al. An evolutionary two-objective genetic algorithm for asthma prediction
Uphade et al. Identification of parameters for classification of COVID-19 patient’s recovery days using machine learning techniques
Kruthi et al. Detection of autism spectrum disorder using machine learning
Castronuovo et al. Analyzing the Interactions between Environmental Parameters and Cardiovascular Diseases Using Random Forest and SHAP Algorithms
Jasmine et al. Heart Disease Prediction and Analysis Using Ensemble Classifier in Machine Learning Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190115

Assignee: Shanghai Qianbei Medical Technology Co.,Ltd.

Assignor: BASEPAIR BIOTECHNOLOGY Co.,Ltd.

Contract record no.: X2020980002296

Denomination of invention: Logistic algorithm-based construction method of Kawasaki disease risk assessment model and construction system

License type: Common License

Record date: 20200518

TA01 Transfer of patent application right

Effective date of registration: 20210719

Address after: 201600 room 406, no.6, Lane 1015, Longteng Road, Songjiang District, Shanghai

Applicant after: Daozhi precision medicine technology (Shanghai) Co.,Ltd.

Address before: Unit 426, A2 Floor, 218 Xinghu Street, Suzhou Industrial Park, Jiangsu Province

Applicant before: BASEPAIR BIOTECHNOLOGY Co.,Ltd.

GR01 Patent grant
EC01 Cancellation of recordation of patent licensing contract

Assignee: Shanghai Qianbei Medical Technology Co.,Ltd.

Assignor: BASEPAIR BIOTECHNOLOGY Co.,Ltd.

Contract record no.: X2020980002296

Date of cancellation: 20231218