CN106980757A - Risk factor management system and mining method for coronary artery lesions complicating Kawasaki disease - Google Patents


Info

Publication number
CN106980757A
Authority
CN
China
Prior art keywords
data
kawasaki disease
coronary artery
pathological changes
database
Prior art date
Legal status
Pending
Application number
CN201710154709.0A
Other languages
Chinese (zh)
Inventor
贺向前
张胜
田杰
樊楚
谭续海
Current Assignee
Chongqing Medical University
Original Assignee
Chongqing Medical University
Priority date
Filing date
Publication date
Application filed by Chongqing Medical University
Priority to CN201710154709.0A
Publication of CN106980757A
Legal status: Pending (current)


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Abstract

The invention discloses a risk factor management system and mining method for coronary artery lesions complicating Kawasaki disease. The system comprises a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor. The entry module is used to enter Kawasaki disease data; the management control module pre-processes the entered Kawasaki disease data and saves it, classified, to the Kawasaki disease database; the data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database. Beneficial effects: the management system improves the quality and efficiency of Kawasaki disease data analysis; risk factors related to the disease are found using strong association rules, and prediction accuracy is improved using a random forest model. The invention is highly usable and reliable, draws on a wide range of data sources, is easy to implement, and requires little manual work.

Description

Risk factor management system and mining method for coronary artery lesions complicating Kawasaki disease
Technical field
The present invention relates to the technical field of life sciences, and specifically to a risk factor management system and mining method for coronary artery lesions complicating Kawasaki disease.
Background art
Kawasaki disease is an acute febrile eruptive pediatric disease whose principal lesion is systemic vasculitis. Coronary artery injury is its major complication: about 15%-25% of untreated children develop coronary artery lesions, including coronary artery dilation, coronary aneurysm, coronary artery stenosis, occlusion and atherosclerosis. Thrombus formation within an aneurysm can cause myocardial infarction and ischemic heart disease, and in severe cases sudden death. Preventing and treating coronary artery lesions is therefore the primary goal of pediatricians treating children with Kawasaki disease.
Scientists in China and abroad have conducted extensive, in-depth research on the factors associated with coronary artery injury complicating Kawasaki disease. However, there is still no globally accepted conclusion, and no system that can be widely used in clinical practice to assess the risk of coronary artery injury in Kawasaki disease.
Many researchers have identified risk factors for coronary artery injury in Kawasaki disease through statistical analysis of patients' clinical data. With the rapid development of electronic medical record systems in recent years, however, hospitals have accumulated clinical data resources of considerable scale, and for such large, wide-ranging electronic data the query and search mechanisms of conventional database management systems and traditional statistical analysis methods cannot analyze the data effectively.
Existing statistical and search-based techniques also have many drawbacks, such as a single analysis method, high labor and time costs, and prediction methods of limited scientific rigor, and they cannot meet the needs of an increasingly intelligent era.
In summary, a technology that meets these needs is required to analyze the risk factors for coronary artery lesions complicating Kawasaki disease in a more detailed and intelligent way.
Summary of the invention
In view of the above problems, the invention provides a risk factor management system and mining method for coronary artery lesions complicating Kawasaki disease. It establishes a risk factor management system for Kawasaki disease with coronary artery involvement, compiles statistics on Kawasaki disease data, and mines the risk factors for coronary artery involvement from a large body of statistical data.
To achieve the above purpose, the specific technical solution adopted by the present invention is as follows:
A risk factor management system for coronary artery lesions complicating Kawasaki disease, characterized in that it comprises a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor. The entry module is used to enter Kawasaki disease data; the management control module pre-processes the entered Kawasaki disease data and saves it, classified, to the Kawasaki disease database; the data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database.
With the above design, the management system compiles Kawasaki disease data so that users can retrieve it. The management control module pre-processes and classifies the entered Kawasaki disease data before saving it, which keeps the data well organized. The data processor then cleans, integrates and transforms all data in the Kawasaki disease database to obtain the Kawasaki disease data set.
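The module wiring described above can be pictured as a small pipeline. The following Python sketch illustrates it under stated assumptions and is not the patented implementation: records are plain dictionaries, the class names simply mirror the module names in the text, and the pre-processing step (trimming whitespace) and the two-way classification rule in the usage lines are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict  # hypothetical: one patient record, field name -> value


@dataclass
class KawasakiDatabase:
    """Holds pre-processed records grouped by classification label."""
    tables: dict = field(default_factory=dict)

    def save(self, label: str, record: Record) -> None:
        self.tables.setdefault(label, []).append(record)

    def all_records(self) -> list:
        return [r for rows in self.tables.values() for r in rows]


@dataclass
class ManagementControlModule:
    """Pre-processes entered records and saves them, classified, to the database."""
    database: KawasakiDatabase
    classify: Callable[[Record], str]

    def receive(self, record: Record) -> None:
        # Placeholder pre-processing: trim whitespace in text fields.
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        self.database.save(self.classify(cleaned), cleaned)


@dataclass
class EntryModule:
    """Passes each entered record to the management control module's input."""
    control: ManagementControlModule

    def enter(self, record: Record) -> None:
        self.control.receive(record)


@dataclass
class DataProcessor:
    """Reads all records from the database and applies the processing steps
    (cleaning, integration, transformation) supplied as callables, in order."""
    database: KawasakiDatabase
    steps: list = field(default_factory=list)

    def run(self) -> list:
        rows = self.database.all_records()
        for step in self.steps:
            rows = step(rows)
        return rows


# Usage with a toy two-way rule (the text defines four grades).
db = KawasakiDatabase()
control = ManagementControlModule(db, classify=lambda r: "NCAL" if r["z_score"] < 2.5 else "CAL")
entry = EntryModule(control)
entry.enter({"patient_id": 1, "sex": " M ", "z_score": 3.1})
entry.enter({"patient_id": 2, "sex": "F", "z_score": 1.2})
dataset = DataProcessor(db).run()
```

Keeping the processing steps as a list of callables lets cleaning, integration and transformation be developed and tested separately before they are chained.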
Further, to capture all Kawasaki disease data, the Kawasaki disease database comprises a personal information database, a clinical examination database, an echocardiography database, a diagnostic result database and an electronic medical record database for patients with Kawasaki disease.
A risk factor mining method for coronary artery lesions complicating Kawasaki disease, characterized in that it comprises the following steps:
S1: Obtain the personal data, clinical examination data, echocardiography data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease from the Kawasaki disease database;
S2: The data processor performs data cleaning, data integration and data transformation on all data obtained in step S1 to obtain the Kawasaki disease data set;
S3: Mine the Kawasaki disease data set using the association rule method to obtain the risk factors related to coronary artery lesions;
S4: Build a random forest risk prediction model for coronary artery lesions complicating Kawasaki disease with the random forest algorithm, and calculate the AUC (area under the ROC curve) of the random forest risk prediction model.
The data obtained in step S1 have been pre-processed by the management control module; the pre-processing steps are specifically:
S11: Obtain the personal data, clinical examination data, echocardiography data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease;
S12: From all data obtained in step S11, extract all predictive variables and the mean of each predictive variable;
S13: Determine the classification variable indicating whether a patient with Kawasaki disease has developed coronary artery lesions, the classification grades, and the classification variable values corresponding to each grade;
S14: Classify all patients with Kawasaki disease and save them to the Kawasaki disease database.
The predictive variables comprise the sex, age and 52 laboratory indicators of the patients with Kawasaki disease. The 52 laboratory indicators are: C-reactive protein, white blood cell count, absolute monocyte count, absolute lymphocyte count, absolute neutrophil count, red blood cell count, hemoglobin, packed cell volume, mean corpuscular volume, mean corpuscular hemoglobin concentration (MCHC), red cell distribution width, red cell distribution width (absolute value), platelet count, mean platelet volume, platelet large cell ratio, platelet distribution width, plateletcrit, absolute eosinophil count, conjugated bilirubin, total bile acid, albumin, serum complement C4, bilirubin, urine protein, gamma-glutamyl transferase, alanine aminotransferase, aspartate aminotransferase, AST/ALT ratio, red cell morphology, creatinine, creatine kinase, creatine kinase isoenzyme, indirect bilirubin, alkaline phosphatase, phosphorus, chloride, magnesium, sodium, urea nitrogen, uric acid, urine glucose, prealbumin, globulin, lactate dehydrogenase, ketone bodies, urine vitamin C, erythrocyte sedimentation rate, nitrite, total bilirubin, total protein and total calcium;
The classification variable is the z-score from the echocardiography data;
The classification grades are no coronary artery lesion (NCAL), small coronary aneurysm (SCAL), medium coronary aneurysm (MCAL) and giant coronary aneurysm (GCAL);
The classification variable value corresponding to no coronary artery lesion is: z-score < 2.5;
The classification variable value corresponding to a small coronary aneurysm is: 2.5 ≤ z-score < 5.0;
The classification variable value corresponding to a medium coronary aneurysm is: 5.0 ≤ z-score < 10.0;
The classification variable value corresponding to a giant coronary aneurysm is: z-score ≥ 10.0.
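Steps S12 to S14 can be sketched in Python as below, assuming each patient record is a dictionary with a single z-score and that missing values are stored as None; the function names and sample records are hypothetical. The grade boundaries are exactly the ones listed above (lower bound inclusive, upper bound exclusive).

```python
from statistics import mean

# (inclusive lower bound, grade label), checked from most to least severe,
# following the z-score thresholds stated in the text.
GRADE_THRESHOLDS = [(10.0, "GCAL"), (5.0, "MCAL"), (2.5, "SCAL")]


def cal_grade(z_score: float) -> str:
    """Map an echocardiographic coronary z-score to its lesion grade."""
    for lower_bound, label in GRADE_THRESHOLDS:
        if z_score >= lower_bound:
            return label
    return "NCAL"  # z-score < 2.5: no coronary artery lesion


def predictor_means(records: list, predictors: list) -> dict:
    """Mean of each predictive variable over the records, ignoring None."""
    means = {}
    for name in predictors:
        observed = [r[name] for r in records if r.get(name) is not None]
        means[name] = mean(observed) if observed else None
    return means


patients = [
    {"sex": "M", "age": 1.5, "crp": 62.0, "z_score": 1.8},
    {"sex": "F", "age": 3.0, "crp": None, "z_score": 6.2},
]
for p in patients:
    p["grade"] = cal_grade(p["z_score"])  # S13/S14: label each patient
```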
The data cleaning in step S2 specifically comprises:
For indicators with many missing values, the gaps are filled by multiple imputation, using the mean of the predictive variable as the imputed value;
For indicators with few missing values that occur at random, the records with missing data are deleted;
The data integration specifically comprises: merging the data in all data tables of the electronic medical record data of step S1 into one consolidated table;
The data transformation specifically comprises: converting the value of each attribute in the consolidated table into a form suitable for data mining, and normalizing and encoding each attribute according to its characteristics.
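A minimal Python sketch of the cleaning and integration rules, with two stated substitutions: the text does not say how many missing values count as "many", so `many_missing` is a hypothetical cut-off, and the imputation here is a single mean fill rather than a full multiple-imputation procedure. Tables are assumed to share a hypothetical `patient_id` key.

```python
from statistics import mean


def clean(rows: list, columns: list, many_missing: float = 0.2) -> list:
    """Mean-fill columns whose missing share exceeds `many_missing` (hypothetical
    cut-off) and drop rows missing a value in any other listed column."""
    n = len(rows)
    impute_cols, delete_cols = [], []
    for col in columns:
        share = sum(r.get(col) is None for r in rows) / n if n else 0.0
        (impute_cols if share > many_missing else delete_cols).append(col)

    # Column means computed from observed values only.
    fill = {}
    for col in impute_cols:
        observed = [r[col] for r in rows if r.get(col) is not None]
        fill[col] = mean(observed) if observed else None

    cleaned = []
    for r in rows:
        if any(r.get(col) is None for col in delete_cols):
            continue  # few, random gaps: delete the record
        row = dict(r)  # copy so the input rows stay unchanged
        for col in impute_cols:
            if row.get(col) is None:
                row[col] = fill[col]
        cleaned.append(row)
    return cleaned


def integrate(tables: list, key: str = "patient_id") -> list:
    """Merge per-source tables into one consolidated table, one row per patient."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())
```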
In step S3, mining the Kawasaki disease data set with the association rule method specifically comprises the following steps:
S31: Obtain the data of all patients with Kawasaki disease who have coronary artery lesions;
S32: Perform association rule analysis on the data obtained in step S31, and use rule constraints and interestingness constraints to obtain the strong association rules related to coronary artery lesions complicating Kawasaki disease;
S33: Take the predictive variables that appear in the strong association rules as the risk factors for coronary artery lesions complicating Kawasaki disease.
The strong association rules in step S32 are obtained as follows:
Establish association rules X → Y, where the condition X contains at least one predictive variable and the result Y is one classification grade of coronary artery lesions;
Set a minimum confidence and a minimum support;
When the support and the confidence of an association rule exceed the minimum support and the minimum confidence, respectively, the rule is a strong association rule.
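Here the support of X → Y is the share of records containing both X and Y, and the confidence is the share of records containing X that also contain Y. The Python sketch below applies those definitions by brute-force enumeration of small antecedents; it is not the Apriori algorithm, and `max_len` is a hypothetical size limit. Each record is assumed to be a set of encoded items such as `"crp=H"`, with grade items carrying a hypothetical `grade=` prefix. The thresholds are tested inclusively; a strict reading of "exceed" would swap `>=` for `>`. The second function performs step S33, collecting the variable names that appear in strong-rule antecedents.

```python
from itertools import combinations


def strong_rules(transactions, grade_prefix="grade=", max_len=2,
                 min_support=0.01, min_confidence=0.9):
    """Enumerate rules X -> Y (X: predictor items, Y: one grade item) and keep
    those meeting both thresholds. Returns (X, Y, support, confidence) tuples."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    predictors = sorted(i for i in items if not i.startswith(grade_prefix))
    grades = sorted(i for i in items if i.startswith(grade_prefix))
    rules = []
    for size in range(1, max_len + 1):
        for x in combinations(predictors, size):
            covering = [t for t in transactions if set(x) <= t]  # records containing X
            if not covering:
                continue
            for y in grades:
                both = sum(1 for t in covering if y in t)  # records containing X and Y
                support = both / n
                confidence = both / len(covering)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((x, y, support, confidence))
    return rules


def risk_factors(rules):
    """Step S33: predictive variables named in any strong-rule antecedent."""
    return sorted({item.split("=", 1)[0] for x, _, _, _ in rules for item in x})


transactions = [
    {"sex=M", "plcr=H", "grade=SCAL"},
    {"sex=M", "plcr=H", "grade=SCAL"},
    {"sex=F", "plcr=N", "grade=SCAL"},
    {"sex=F", "plcr=H", "grade=MCAL"},
]
rules = strong_rules(transactions, min_support=0.01, min_confidence=0.9)
```

Brute force is only practical for short antecedents over a few dozen encoded items; a production version would prune with Apriori or FP-growth.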
Building and evaluating the random forest risk prediction model in step S4 specifically comprises the following steps:
S41: Divide the Kawasaki disease data set obtained in step S2 into a training sample and a test sample at a ratio of N:1;
S42: Use the risk factors found in step S3 as the predictors of the prediction model;
S43: Build the random forest risk prediction model for coronary artery lesions complicating Kawasaki disease;
Tune the number of attributes considered at each split (mtry) and the number of decision trees generated (ntree), observe how the model's prediction error changes with ntree, determine the optimal number of trees accordingly, and build the random forest risk prediction model;
S44: Calculate the AUC of the random forest risk prediction model on the test sample from step S41.
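The AUC in step S44 equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counting one half. The Python sketch below computes it that way for a binary outcome. The text does not say how the four lesion grades are combined into one AUC, so scoring each grade one-vs-rest (that grade positive, all others negative) is an assumption, as is the per-case dictionary of predicted grade probabilities.

```python
def auc(scores, labels):
    """AUC for binary labels (1 = positive, 0 = negative): the share of
    positive/negative pairs in which the positive case scores higher (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(grade_scores, true_grades, grade):
    """AUC with one grade treated as positive; grade_scores[i][grade] is the
    model's (hypothetical) predicted probability of that grade for case i."""
    scores = [s[grade] for s in grade_scores]
    labels = [1 if g == grade else 0 for g in true_grades]
    return auc(scores, labels)
```

The pairwise loop is quadratic, which is still cheap for a test sample of a few thousand cases; a rank-sum formulation gives the same value in O(n log n).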
Beneficial effects of the present invention: a risk factor management system for Kawasaki disease with coronary artery involvement is established, Kawasaki disease data are compiled, and the risk factors for coronary artery involvement are mined from a large body of statistical data. Strong association rules are used to find the risk factors related to the disease, and the random forest model built with the random forest algorithm achieves prediction accuracy far higher than the traditional multivariate logistic regression model, improving the quality and efficiency of the analysis. The invention is highly usable and reliable, draws on a wide range of data sources, is easy to implement, and requires little manual work.
Brief description of the drawings
Fig. 1 is a block diagram of the management system of the invention;
Fig. 2 is a flow chart of the data mining of the invention;
Fig. 3 shows the analysis results of the association rule method;
Fig. 4 shows how the prediction error of the random forest risk prediction model varies with the number of generated decision trees;
Fig. 5 shows the prediction results of the random forest risk prediction model;
Fig. 6 compares the ROC curves of the random forest risk prediction model and the logistic regression model;
Detailed description of the embodiments
The embodiments and operating principle of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, a risk factor management system for coronary artery lesions complicating Kawasaki disease comprises a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor. The entry module is used to enter Kawasaki disease data; the management control module pre-processes the entered Kawasaki disease data and saves it, classified, to the Kawasaki disease database; the data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database.
In this embodiment, the Kawasaki disease database comprises a personal information database, a clinical examination database, an echocardiography database, a diagnostic result database and an electronic medical record database for patients with Kawasaki disease.
As can be seen from Fig. 2, a risk factor mining method for coronary artery lesions complicating Kawasaki disease comprises the following steps:
S1: Obtain the personal data, clinical examination data, echocardiography data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease from the Kawasaki disease database;
The pre-processing steps are specifically:
S11: Obtain the personal data, clinical examination data, echocardiography data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease;
S12: From all data obtained in step S11, extract all predictive variables and the mean of each predictive variable;
S13: Determine the classification variable indicating whether a patient with Kawasaki disease has developed coronary artery lesions, the classification grades, and the classification variable values corresponding to each grade;
S14: Classify all patients with Kawasaki disease and save them to the Kawasaki disease database.
In this embodiment, the electronic medical record database contains 8501 patients in total, of whom 5020 were diagnosed with Kawasaki disease; 343 of these developed coronary artery lesions and 4677 did not.
In this embodiment, the predictive variables comprise the sex, age and 52 laboratory indicators of the patients with Kawasaki disease. The 52 laboratory indicators are: C-reactive protein, white blood cell count, absolute monocyte count, absolute lymphocyte count, absolute neutrophil count, red blood cell count, hemoglobin, packed cell volume, mean corpuscular volume, mean corpuscular hemoglobin concentration, red cell distribution width, red cell distribution width (absolute value), platelet count, mean platelet volume, platelet large cell ratio, platelet distribution width, plateletcrit, absolute eosinophil count, conjugated bilirubin, total bile acid, albumin, serum complement C4, bilirubin, urine protein, gamma-glutamyl transferase, alanine aminotransferase, aspartate aminotransferase, AST/ALT ratio, red cell morphology, creatinine, creatine kinase, creatine kinase isoenzyme, indirect bilirubin, alkaline phosphatase, phosphorus, chloride, magnesium, sodium, urea nitrogen, uric acid, urine glucose, prealbumin, globulin, lactate dehydrogenase, ketone bodies, urine vitamin C, erythrocyte sedimentation rate, nitrite, total bilirubin, total protein and total calcium;
The classification variable is the z-score from the echocardiography data;
The classification grades are no coronary artery lesion (NCAL), small coronary aneurysm (SCAL), medium coronary aneurysm (MCAL) and giant coronary aneurysm (GCAL);
The classification variable value corresponding to no coronary artery lesion is: z-score < 2.5;
The classification variable value corresponding to a small coronary aneurysm is: 2.5 ≤ z-score < 5.0;
The classification variable value corresponding to a medium coronary aneurysm is: 5.0 ≤ z-score < 10.0;
The classification variable value corresponding to a giant coronary aneurysm is: z-score ≥ 10.0.
S2: The data processor performs data cleaning, data integration and data transformation on all data obtained in step S1 to obtain the Kawasaki disease data set;
The data cleaning specifically comprises:
For indicators with many missing values, the gaps are filled by multiple imputation, using the mean of the predictive variable as the imputed value;
For indicators with few missing values that occur at random, the records with missing data are deleted;
The data integration specifically comprises:
Merging the data in all data tables of the electronic medical record data of step S1 into one consolidated table;
The data transformation specifically comprises:
Converting the value of each attribute in the consolidated table into a form suitable for data mining, and normalizing and encoding each attribute according to its characteristics.
In this embodiment, normalization and encoding are performed as follows:
Age is divided into four intervals, under 2 years, 2 to 5 years, 5 to 7 years and over 7 years, represented by a, b, c and d, respectively.
Laboratory biological indicators are split at their normal range. For example, the normal range of C-reactive protein is < 8 mg/L, so it is divided into the two intervals < 8 mg/L and ≥ 8 mg/L, represented by N and H, respectively. This encoding is done with SQL statements in a MySQL database.
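The SQL encoding step can be sketched with standard `CASE` expressions. The example below runs them on Python's built-in SQLite as a stand-in for MySQL; the table and column names are hypothetical, and because the text does not say which side of 2, 5 and 7 years is inclusive, each lower bound is assumed inclusive. Missing values are passed through as NULL rather than falling into the `ELSE` branch.

```python
import sqlite3

# Standard SQL CASE expressions; interval edges (lower bound inclusive)
# and column names are assumptions for illustration.
DISCRETIZE_SQL = """
SELECT patient_id,
       CASE WHEN age_years IS NULL THEN NULL
            WHEN age_years < 2 THEN 'a'
            WHEN age_years < 5 THEN 'b'
            WHEN age_years < 7 THEN 'c'
            ELSE 'd' END AS age_code,
       CASE WHEN crp_mg_l IS NULL THEN NULL
            WHEN crp_mg_l < 8 THEN 'N'
            ELSE 'H' END AS crp_code
FROM patients
ORDER BY patient_id
"""


def discretize(rows):
    """Load (patient_id, age_years, crp_mg_l) rows into an in-memory SQLite
    table and return (patient_id, age_code, crp_code) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, "
                     "age_years REAL, crp_mg_l REAL)")
        conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)
        return conn.execute(DISCRETIZE_SQL).fetchall()
    finally:
        conn.close()
```

The same `SELECT` should run unchanged on MySQL, since `CASE WHEN ... THEN ... ELSE ... END` is standard SQL.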
S3: Mine the Kawasaki disease data set using the association rule method to obtain the risk factors related to coronary artery lesions. Mining is performed on the data set of all 343 patients with Kawasaki disease in the total sample who had coronary artery lesions; the specific steps are:
S31: Obtain the data of all patients with Kawasaki disease who have coronary artery lesions;
S32: Perform association rule analysis on the data obtained in step S31, and use rule constraints and interestingness constraints to obtain the strong association rules related to coronary artery lesions complicating Kawasaki disease;
The specific method is:
Establish association rules X → Y, where the condition X contains at least one predictive variable and the result Y is one classification grade of coronary artery lesions;
Set a minimum confidence and a minimum support;
When the support and the confidence of an association rule exceed the minimum support and the minimum confidence, respectively, the rule is a strong association rule.
In this embodiment, the minimum confidence is 0.9 and the minimum support is 0.01.
S33: Take the predictive variables that appear in the strong association rules as the risk factors for coronary artery lesions complicating Kawasaki disease.
In this embodiment, the 30 predictive variables related to coronary artery lesions complicating Kawasaki disease that appear in the strong association rules are used as risk factors for prediction. These indicators are: sex, age, packed cell volume, platelet large cell ratio, C-reactive protein, platelet count, aspartate aminotransferase, alanine aminotransferase, AST/ALT ratio, erythrocyte sedimentation rate, mean platelet volume, absolute monocyte count, albumin, ketone bodies, serum inorganic phosphorus, blood chloride, alkaline phosphatase, red blood cell count, mean corpuscular hemoglobin concentration, absolute eosinophil count, urea nitrogen, absolute neutrophil count, mean corpuscular volume, red cell distribution width, red cell morphology, red cell distribution width (absolute value), urine protein, total protein, prealbumin and mean corpuscular hemoglobin.
As shown in Fig. 3, counting over the top 1000 association rules shows that male sex, elevated platelet large cell ratio, elevated platelet distribution width, elevated urea nitrogen and elevated serum inorganic phosphorus are strongly correlated with coronary artery lesions complicating Kawasaki disease.
S4: Build a random forest risk prediction model for coronary artery lesions complicating Kawasaki disease with the random forest algorithm, and calculate the AUC of the random forest risk prediction model.
The specific steps are:
S41: Divide the Kawasaki disease data set obtained in step S2 into a training sample and a test sample at a ratio of N:1;
In this embodiment, the data set is randomly divided at a ratio of 3:1 into a training sample (3765 cases) and a test sample (1255 cases).
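A 3:1 split of the 5020 cases puts 5020 / 4 = 1255 cases in the test sample and the remaining 3765 in the training sample. The Python sketch below reproduces those sizes with a plain random shuffle; the function name and the `seed` parameter are hypothetical, added here for reproducibility.

```python
import random


def split_train_test(records, ratio=3, seed=0):
    """Shuffle records and split them into (train, test) at ratio:1."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state alone
    n_test = round(len(shuffled) / (ratio + 1))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(5020))
```

With only 343 of 5020 patients (about 6.8%) having lesions, a split stratified by grade would keep that rate similar in both samples; the text does not say whether stratification was used.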
The training sample is used for modeling and the test sample for model evaluation.
S42: Use the risk factors found in step S3 as the predictors of the prediction model;
The 30 indicators that appear in the above association rules are used as the model's predictors;
S43: Build the random forest risk prediction model for coronary artery lesions complicating Kawasaki disease;
Tune the number of attributes considered at each split (mtry) and the number of decision trees generated (ntree), observe how the model's prediction error changes with ntree, determine the optimal number of trees accordingly, and build the random forest risk prediction model;
Because the default value of mtry is the square root of the number of attributes and the present invention selects 54 predictive variables, tuning starts from mtry = 8, and ntree is varied from 100 to 400. The change of the model's prediction error with ntree is observed, and the optimal number of trees determined in this way is used to build the random forest risk prediction model.
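The tuning loop can be sketched in Python as below. Rounding √54 ≈ 7.35 up gives the starting value of 8 used in the text (R's randomForest package floors it to 7 for classification). The callback `error_for(ntree)` is hypothetical: it stands for training a forest with that many trees and returning its prediction error. The `tolerance` parameter, which prefers a smaller forest when its error is nearly as good as the best, is also an assumption.

```python
import math
from typing import Callable, Iterable


def starting_mtry(n_features: int) -> int:
    """Square root of the number of attributes, rounded up (54 -> 8)."""
    return math.ceil(math.sqrt(n_features))


def choose_ntree(error_for: Callable[[int], float], candidates: Iterable[int],
                 tolerance: float = 0.0) -> int:
    """Return the smallest ntree whose error is within `tolerance` of the best.
    `error_for(ntree)` is a hypothetical callback that trains a forest with
    ntree trees and returns its prediction error."""
    errors = {ntree: error_for(ntree) for ntree in candidates}
    best = min(errors.values())
    return min(ntree for ntree, err in errors.items() if err <= best + tolerance)
```

With scikit-learn, for example, `error_for` could fit `RandomForestClassifier(n_estimators=ntree, max_features=mtry, oob_score=True)` and return `1 - oob_score_`; using out-of-bag error is an assumption, since the text does not say which error Fig. 4 plots. A grid such as `range(100, 401, 100)` matches the 100 to 400 range, but its step size is not given in the text.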
S44: Calculate the AUC of the random forest risk prediction model on the test sample from step S41.
As shown in Fig. 4, the overall prediction error of the random forest risk prediction model decreases with the number of generated decision trees. The figure also shows that the optimal number of trees is about 80: at that point the prediction errors for no coronary artery lesion (NCAL), small coronary aneurysm (SCAL), medium coronary aneurysm (MCAL) and giant coronary aneurysm (GCAL) have all stabilized and all stay below 0.1.
Fig. 5 shows the prediction results of the random forest risk prediction model. C-reactive protein, erythrocyte sedimentation rate, sex, age, mean corpuscular hemoglobin concentration, albumin, prealbumin and absolute eosinophil count are the predictive variables of highest importance in the model's predictions. In addition, as the severity of the coronary artery lesions increases, the predictive importance of alanine aminotransferase, platelets, packed cell volume, aspartate aminotransferase, ketone bodies, AST/ALT ratio, mean corpuscular volume, urine protein, urea nitrogen, total protein and red cell distribution width (absolute value) increases accordingly.
Fig. 6 shows the receiver operating characteristic (ROC) curves of the random forest risk prediction model (Randomforest) and the multivariate logistic regression model (Logistic Regression). Computing the respective AUCs gives 98.2% for the random forest risk prediction model and 59.2% for the regression model, so the random forest model clearly predicts better than the regression model.
It should be noted that the above description does not limit the present invention, and the present invention is not limited to the above examples. Changes, modifications, additions or substitutions made by those skilled in the art within the essential scope of the present invention also fall within the protection scope of the present invention.

Claims (9)

1. A risk factor management system for coronary artery lesions complicating Kawasaki disease, characterized in that it comprises a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor;
The entry module is used to enter Kawasaki disease data;
The management control module pre-processes the entered Kawasaki disease data and saves it, classified, to the Kawasaki disease database;
The data processor is used to perform data cleaning, data integration and data transformation on all data in the Kawasaki disease database.
2. The risk factor management system for coronary artery lesions complicating Kawasaki disease according to claim 1, characterized in that the Kawasaki disease database comprises a personal information database, a clinical examination database, an echocardiography database, a diagnostic result database and an electronic medical record database for patients with Kawasaki disease.
3. A risk factor mining method for coronary artery lesions complicating Kawasaki disease, characterized in that it comprises the following steps:
S1: Obtain the personal data, clinical examination data, echocardiography data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease from the Kawasaki disease database;
S2: The data processor performs data cleaning, data integration and data transformation on all data obtained in step S1 to obtain the Kawasaki disease data set;
S3: Mine the Kawasaki disease data set using the association rule method to obtain the risk factors related to coronary artery lesions;
S4: Build a random forest risk prediction model for coronary artery lesions complicating Kawasaki disease with the random forest algorithm, and calculate the AUC of the random forest risk prediction model.
4. The risk factor mining method for coronary artery lesions complicating Kawasaki disease according to claim 3, characterized in that the data obtained in step S1 have been pre-processed by the management system, and the pre-processing steps are specifically:
S11: Obtain the personal data, clinical examination data, echocardiography data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease;
S12: From all data obtained in step S11, extract all predictive variables and the mean of each predictive variable;
S13: Determine the classification variable indicating whether a patient with Kawasaki disease has developed coronary artery lesions, the classification grades, and the classification variable values corresponding to each grade;
S14: Classify all patients with Kawasaki disease and save them to the Kawasaki disease database.
5. the concurrent coronary artery pathological changes hazards method for digging of Kawasaki disease according to claim 4, it is characterised in that:Institute Stating predictive variable includes sex, age and 52 laboratory checking index of patients with Kawasaki disease, and 52 laboratory examinations refer to It is designated as:C reactive protein, leucocyte, monocyte absolute value, lymphocyte absolute value, neutrophil leucocyte absolute value, red blood cell, Hemoglobin, packed cell volume, MCVU, mean corpuscular hemoglobin concentration (MCHC), RDW is red thin Born of the same parents' dispersion of distribution absolute value, platelet count, mean platelet volume, large platelet cell ratio, MPW, blood platelet Hematocrit, the absolute value of eosinophil, with reference to bilirubin, total bile acid, albumin, serum complement C4, bilirubin urinates egg In vain, gamma-glutamyl turns peptide, glutamic-pyruvic transaminase, glutamic-oxalacetic transaminease, millet straw/paddy third, red cell morphology;Creatinine, creatine kinase, flesh Acid kinase isodynamic enzyme, indirect bilirubin, alkaline phosphatase, phosphorus, chlorine, magnesium, sodium, urea nitrogen, uric acid, urine glucose, prealbumin, Globulin, lactic dehydrogenase, body ketone urinates vitamin C, erythrocyte sedimentation rate, nitrite, total bilirubin, total protein, total calcium;
The classification variable is the z-score value in the echocardiography data;
The classification grades are: no coronary artery lesion, small coronary aneurysm, medium coronary aneurysm and giant coronary aneurysm;
The classification variable value corresponding to no coronary artery lesion is: z-score < 2.5;
The classification variable value corresponding to a small coronary aneurysm is: 2.5 ≤ z-score < 5.0;
The classification variable value corresponding to a medium coronary aneurysm is: 5.0 ≤ z-score < 10.0;
The classification variable value corresponding to a giant coronary aneurysm is: z-score ≥ 10.0.
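The grading rule above is a threshold mapping on the z-score. The following Python sketch shows one way it could be applied when classifying patients in steps S13 and S14; the function name `grade_from_z_score`, the label constants and the patient IDs are illustrative assumptions, not names taken from the patent.

```python
import math

# Hypothetical labels for the four classification grades of claim 5.
NO_LESION = "no coronary artery lesion"
SMALL = "small coronary aneurysm"
MEDIUM = "medium coronary aneurysm"
GIANT = "giant coronary aneurysm"


def grade_from_z_score(z_score: float) -> str:
    """Map an echocardiographic coronary z-score to a lesion grade (claim 5 thresholds)."""
    if math.isnan(z_score):
        raise ValueError("z-score is missing (NaN)")
    if z_score < 2.5:
        return NO_LESION
    if z_score < 5.0:   # 2.5 <= z < 5.0
        return SMALL
    if z_score < 10.0:  # 5.0 <= z < 10.0
        return MEDIUM
    return GIANT        # z >= 10.0


# Toy example (hypothetical patients): classify before saving to the database (S14).
patients = {"P001": 1.8, "P002": 3.2, "P003": 11.4}
graded = {pid: grade_from_z_score(z) for pid, z in patients.items()}
```

Because each branch tests only an upper bound, checking the thresholds in ascending order yields exactly the half-open intervals of the claim (lower bound inclusive, upper bound exclusive). The claim does not say which coronary segment's z-score is used when several are measured; that choice is left open here.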
6. The method for mining risk factors of coronary artery lesions complicating Kawasaki disease according to claim 3, characterised in that:
The specific content of the data cleaning in step S2 is:
For indicators with a large amount of missing data, the gaps are filled by multiple imputation, using the mean of the predictor variable as the imputed value;
For indicators with little missing data, where the missing values occur at random, the entries with missing data are deleted;
The specific content of the data integration is: merging the data in all data tables of the electronic health record data described in step S1 into a combined table;
The specific content of the data transformation is: converting the value of each attribute in the combined table into a form suitable for data mining, and applying normalization and encoding to each attribute according to its characteristics.
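The cleaning, integration and transformation steps of claim 6 can be sketched in dependency-free Python as follows. Several details are assumptions: records are plain dictionaries keyed by a hypothetical `patient_id` field, a predictor counts as having "a large amount of" missing data when more than a hypothetical `heavy_missing_ratio` of its values are absent, and min-max scaling stands in for the unspecified normalization. The claim names multiple imputation while filling with predictor means; this sketch performs only single mean imputation, which is a simplification.

```python
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def integrate(tables: List[List[dict]], key: str = "patient_id") -> List[dict]:
    """Data integration: merge rows from several EHR tables into one row per patient."""
    merged: Dict[object, dict] = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def clean(records: List[Record], predictors: List[str],
          heavy_missing_ratio: float = 0.1) -> List[Record]:
    """Data cleaning: mean-fill heavily missing predictors, drop rows with sparse gaps."""
    if not records:
        return []
    n = len(records)
    # S12: mean of each predictor over its observed (non-missing) values.
    means = {}
    for p in predictors:
        observed = [r[p] for r in records if r.get(p) is not None]
        means[p] = mean(observed) if observed else None
    # Predictors whose missing fraction exceeds the assumed threshold are imputed.
    heavy = {p for p in predictors
             if sum(r.get(p) is None for r in records) / n > heavy_missing_ratio}
    cleaned = []
    for r in records:
        row = dict(r)  # copy so the caller's data is not modified
        for p in heavy:
            if row.get(p) is None:
                row[p] = means[p]
        # Remaining gaps are few and assumed random: delete the record.
        if any(row.get(p) is None for p in predictors):
            continue
        cleaned.append(row)
    return cleaned


def min_max_normalise(records: List[Record], predictors: List[str]) -> List[Record]:
    """Data transformation: rescale every predictor to the range [0, 1]."""
    out = [dict(r) for r in records]
    if not out:
        return out
    for p in predictors:
        values = [r[p] for r in out]
        lo, hi = min(values), max(values)
        for r in out:
            r[p] = 0.0 if hi == lo else (r[p] - lo) / (hi - lo)
    return out
```

Encoding of categorical fields such as sex (for example as 0/1) would belong in the same transformation pass; it is omitted here for brevity.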
7. The prediction method for risk factors of coronary artery lesions complicating Kawasaki disease based on data mining technology according to claim 6, characterised in that the specific steps by which step S3 performs data mining on the Kawasaki disease data set using the association rule method are:
S31: Obtain the data of all patients with Kawasaki disease who have coronary artery lesions;
S32: Perform association rule analysis on the data obtained in step S31, and use rule constraints and interestingness constraints to obtain the strong association rules related to coronary artery lesions complicating Kawasaki disease;
S33: Take the predictor variables that appear in the strong association rules as the risk factors for coronary artery lesions complicating Kawasaki disease.
8. The prediction method for risk factors of coronary artery lesions complicating Kawasaki disease based on data mining technology according to claim 7, characterised in that the strong association rules in step S32 are obtained as follows:
Establish an association rule X → Y, where X is the condition and contains at least one predictor variable, and Y is the result and contains a classification grade of coronary artery lesion;
Set a minimum confidence and a minimum support;
When the support of an association rule is greater than the minimum support and its confidence is greater than the minimum confidence, the rule is a strong association rule.
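Claims 7 and 8 filter rules X → Y by support and confidence and then read the risk factors off the antecedents. A brute-force Python sketch follows, assuming each patient has already been turned into a set of discretised items such as the hypothetical `"CRP=high"` and `"grade=small"`. It enumerates small antecedents exhaustively rather than using Apriori or FP-growth, restricts consequents to lesion-grade items (one possible reading of the "rule constraint"), and offers lift as an optional, assumed interestingness measure, since the claims do not name one. Following the claim wording, thresholds are compared with strict `>`; textbook definitions usually use `>=`.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Tuple

Rule = Tuple[FrozenSet[str], str, float, float]  # (X, Y, support, confidence)


def strong_rules(transactions: Sequence[FrozenSet[str]], grade_items: Sequence[str],
                 min_support: float, min_confidence: float,
                 max_antecedent: int = 2,
                 min_lift: Optional[float] = None) -> List[Rule]:
    """Return rules X -> Y (Y a lesion grade) whose support and confidence exceed the minima."""
    if not transactions:
        return []
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of patients whose item set contains every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    predictor_items = sorted({i for t in transactions for i in t} - set(grade_items))
    rules: List[Rule] = []
    for size in range(1, max_antecedent + 1):
        for combo in combinations(predictor_items, size):
            x = frozenset(combo)
            sup_x = support(x)
            if sup_x == 0:
                continue
            for y in grade_items:
                sup_xy = support(x | {y})
                confidence = sup_xy / sup_x
                if sup_xy <= min_support or confidence <= min_confidence:
                    continue
                # Optional interestingness constraint (assumed): lift = conf / support(Y).
                if min_lift is not None and confidence / support(frozenset({y})) <= min_lift:
                    continue
                rules.append((x, y, sup_xy, confidence))
    return rules


def risk_factors(rules: List[Rule]) -> List[str]:
    """S33: predictor names appearing in any antecedent, e.g. 'CRP' from 'CRP=high'."""
    return sorted({item.split("=")[0] for x, _, _, _ in rules for item in x})


# Toy data (not from the patent): four patients who all have coronary lesions (S31).
transactions = [
    frozenset({"CRP=high", "ALB=low", "grade=small"}),
    frozenset({"CRP=high", "grade=small"}),
    frozenset({"CRP=high", "ALB=low", "grade=medium"}),
    frozenset({"ESR=high", "grade=small"}),
]
rules = strong_rules(transactions, ["grade=small", "grade=medium"],
                     min_support=0.4, min_confidence=0.6)
factors = risk_factors(rules)
```

On this toy data only `{CRP=high} → grade=small` passes (support 0.5, confidence about 0.67), so `factors` is `["CRP"]`. Exhaustive enumeration grows combinatorially with the number of items; with 54 predictors a real implementation would use Apriori-style pruning.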
9. The prediction method for risk factors of coronary artery lesions complicating Kawasaki disease based on data mining technology according to any one of claims 3 to 8, characterised in that the specific steps for building and evaluating the random forest risk prediction model in step S4 are:
S41: Divide the Kawasaki disease data set obtained in step S2 into a training sample and a test sample in the ratio N:1;
S42: Use the risk factors found in step S3 as the predictors of the prediction model;
S43: Build the random forest risk prediction model for coronary artery lesions complicating Kawasaki disease;
Adjust the parameter mtry (the number of attributes randomly selected as split candidates) and the parameter ntree (the number of decision trees generated), observe how the model's prediction error varies with ntree, use this to determine the optimal number of trees in the forest, and build the random forest risk prediction model;
S44: Using the test sample from step S41, calculate the AUC (area under the ROC curve) of the random forest risk prediction model.
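Steps S41 and S44 can be made concrete without committing to a particular random forest library. The sketch below, whose function names are hypothetical, shuffles the data set into an N:1 training/test split and computes AUC on the test sample through the Mann-Whitney formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. It assumes a binary outcome (lesion vs. no lesion), which the claim does not state explicitly. The forest itself would come from a library: `mtry` and `ntree` are the argument names of R's `randomForest()`, and scikit-learn's `RandomForestClassifier` exposes the same controls as `max_features` and `n_estimators`.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_n_to_1(samples: Sequence[T], n: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """S41: shuffle, then split into training and test samples in the ratio n:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = len(shuffled) // (n + 1)
    return shuffled[n_test:], shuffled[:n_test]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """S44: ROC AUC for binary labels (1 = lesion, 0 = none) via pairwise comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train, test = split_n_to_1(list(range(100)), n=4)  # 80 training, 20 test samples
```

The pairwise loop costs O(P × N), which is acceptable for a cohort of a few thousand patients; a rank-based formula gives the same value faster on larger data. A stratified split (shuffling lesion and non-lesion cases separately) would keep the lesion rate similar in both samples; the claim does not say whether one is used.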
CN201710154709.0A 2017-03-15 2017-03-15 The concurrent coronary artery pathological changes hazards management system of Kawasaki disease and method for digging Pending CN106980757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710154709.0A CN106980757A (en) 2017-03-15 2017-03-15 The concurrent coronary artery pathological changes hazards management system of Kawasaki disease and method for digging

Publications (1)

Publication Number Publication Date
CN106980757A (en) 2017-07-25

Family

ID=59339518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710154709.0A Pending CN106980757A (en) 2017-03-15 2017-03-15 The concurrent coronary artery pathological changes hazards management system of Kawasaki disease and method for digging

Country Status (1)

Country Link
CN (1) CN106980757A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688653A (en) * 2017-09-01 2018-02-13 武汉倚天剑科技有限公司 User behavior data digging system and its method based on network shallow-layer data
CN108039207A (en) * 2017-12-06 2018-05-15 无锡市儿童医院 The assessment system and method for thrombotic risk factor under Kawasaki disease
CN109215788A (en) * 2018-08-22 2019-01-15 四川大学 A kind of prediction technique and device of mucous membrane of mouth disease damage canceration degree of danger
CN110335679A (en) * 2019-06-21 2019-10-15 山东大学 A kind of Prediction of survival method and system based on more granularity graph mode excavations
CN110957034A (en) * 2018-09-26 2020-04-03 金敏 Disease prediction system
CN111241148A (en) * 2018-11-29 2020-06-05 金敏 Medical data sorting method, medical data sorting device and electronic equipment
CN113270194A (en) * 2021-04-22 2021-08-17 深圳市雅士长华智能科技有限公司 Health data management system based on cloud computing
CN113380329A (en) * 2021-06-08 2021-09-10 重庆医科大学 Prediction system and device for resistance of first injection gamma globulin of Kawasaki disease patient child

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104866713A (en) * 2015-05-12 2015-08-26 南京霁云信息科技有限公司 Kawasaki disease and fever diagnosis system based on embedding of incremental local discrimination subspace
CN105095673A (en) * 2015-08-26 2015-11-25 中国人民解放军军事医学科学院放射与辐射医学研究所 Construction method of chronic disease risk model on the basis of medical big data mining
CN106295229A (en) * 2016-08-30 2017-01-04 青岛大学 A kind of mucocutaneous lymphnode syndrome grade predicting method based on medical data modeling
CN106339593A (en) * 2016-08-31 2017-01-18 青岛睿帮信息技术有限公司 Kawasaki disease classification and prediction method based on medical data modeling

Non-Patent Citations (7)

Title
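Tae Yeun Kim et al.: "Predictive risk factors for coronary artery abnormalities in Kawasaki disease", Eur J Pediatr *
V. E. A. Honkanen: "Clinical Relevance of the Risk Factors for Coronary Artery Inflammation in Kawasaki Disease", Pediatric Cardiology *
Liu Tianshi et al.: "Software Case Analysis" (软件案例分析), Tsinghua University Press, 31 January 2016 *
Zhang Ying et al.: "Prediction and Evaluation" (预测与评价), Tianjin University Press, 31 May 2015 *
Cao Wenzhe et al.: "Risk prediction of retinopathy complicating type 2 diabetes based on logistic regression and random forest algorithms: a comparative study" (基于Logistic回归和随机森林算法的2型糖尿病并发视网膜病变风险预测及对比研究), China Medical Devices *
Fan Chu et al.: "A BP neural network model built with data mining techniques for distinguishing Kawasaki disease from febrile illnesses in children" (基于数据挖掘技术建立的BP神经网络模型鉴别儿童川崎病与发热性疾病的研究), Chinese Journal of Evidence-Based Pediatrics *
Duan Hongyu et al.: "Analysis of high-risk factors for coronary artery lesions in children with Kawasaki disease" (川崎病患儿并发冠状动脉损害的高危因素分析), Journal of Clinical Pediatrics *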

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN107688653A (en) * 2017-09-01 2018-02-13 武汉倚天剑科技有限公司 User behavior data digging system and its method based on network shallow-layer data
CN108039207A (en) * 2017-12-06 2018-05-15 无锡市儿童医院 The assessment system and method for thrombotic risk factor under Kawasaki disease
CN109215788A (en) * 2018-08-22 2019-01-15 四川大学 A kind of prediction technique and device of mucous membrane of mouth disease damage canceration degree of danger
CN109215788B (en) * 2018-08-22 2022-01-18 四川大学 Method and device for predicting canceration risk degree of oral mucosa lesion
CN110957034A (en) * 2018-09-26 2020-04-03 金敏 Disease prediction system
CN110957043A (en) * 2018-09-26 2020-04-03 金敏 Disease prediction system
CN111241148A (en) * 2018-11-29 2020-06-05 金敏 Medical data sorting method, medical data sorting device and electronic equipment
CN110335679A (en) * 2019-06-21 2019-10-15 山东大学 A kind of Prediction of survival method and system based on more granularity graph mode excavations
CN113270194A (en) * 2021-04-22 2021-08-17 深圳市雅士长华智能科技有限公司 Health data management system based on cloud computing
CN113380329A (en) * 2021-06-08 2021-09-10 重庆医科大学 Prediction system and device for resistance of first injection gamma globulin of Kawasaki disease patient child

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170725
