CN106980757A - Management system and mining method for risk factors of coronary artery lesions complicating Kawasaki disease - Google Patents
- Publication number
- CN106980757A CN201710154709.0A
- Authority
- CN
- China
- Prior art keywords
- data
- kawasaki disease
- coronary artery
- pathological changes
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Abstract
The invention discloses a management system and a mining method for risk factors of coronary artery lesions complicating Kawasaki disease. The system includes a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor. The entry module is used to enter Kawasaki disease data; the management control module preprocesses the entered Kawasaki disease data and then saves them to the Kawasaki disease database by category; the data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database. Beneficial effects: the management system improves the quality and efficiency of Kawasaki disease data analysis; strong association rules are used to find disease-related risk factors, and a random forest model is used to improve prediction accuracy; the approach is highly usable and reliable, draws on a wide range of data sources, is easy to implement, and requires little manual work.
Description
Technical field
The present invention relates to the technical field of life science, and specifically to a management system and mining method for risk factors of coronary artery lesions complicating Kawasaki disease.
Background technology
Kawasaki disease is an acute febrile exanthematous disease of childhood whose main lesion is systemic vasculitis. Coronary artery damage is the major complication of Kawasaki disease: about 15%-25% of untreated children develop coronary artery lesions, which include coronary artery dilatation, coronary aneurysm, coronary artery stenosis, occlusion and atherosclerosis, and thrombus formation within aneurysms. In severe cases myocardial infarction, ischemic heart disease or even sudden death may occur, so preventing and treating coronary artery lesions is the pediatrician's primary goal in treating children with Kawasaki disease.
Researchers at home and abroad have carried out extensive, in-depth studies of the factors associated with coronary artery damage complicating Kawasaki disease. However, there is still no conclusion accepted worldwide, and no system widely usable in clinical practice for evaluating the degree of risk of coronary artery damage complicating Kawasaki disease.
Many researchers have identified risk factors for coronary artery damage complicating Kawasaki disease through statistical analysis of the clinical data of patients with Kawasaki disease. However, with the rapid development of electronic medical record systems in recent years, hospitals have gradually accumulated clinical data resources of considerable scale. For electronic data that are this large in volume and broad in scope, the query and retrieval mechanisms and statistical analysis methods of conventional database management systems cannot analyze the data effectively.
Existing statistical and retrieval techniques also have many drawbacks, such as a single analysis method, high consumption of manpower and time, and prediction methods with a weak scientific basis, so they cannot meet the needs of an era of intelligent living.
In summary, a technique that meets these needs is required, so that the risk factors of coronary artery lesions complicating Kawasaki disease can be analyzed in a more detailed and intelligent way.
Summary of the invention
In view of the above problems, the invention provides a management system and mining method for risk factors of coronary artery lesions complicating Kawasaki disease. A risk factor management system for Kawasaki disease with coronary artery involvement is established, Kawasaki disease data are compiled, and the risk factors of coronary artery involvement are mined from the large volume of compiled data.
To achieve the above object, the specific technical solution adopted by the invention is as follows:
A management system for risk factors of coronary artery lesions complicating Kawasaki disease, the key point of which is that it includes a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor. The entry module is used to enter Kawasaki disease data. The management control module preprocesses the entered Kawasaki disease data and then saves them to the Kawasaki disease database by category. The data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database.
With the above design, the management system compiles the Kawasaki disease data so that users can retrieve them. The management control module preprocesses the entered Kawasaki disease data and saves them by category, which makes the Kawasaki disease data clearer. The data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database to obtain the Kawasaki disease data set.
Further, in order to obtain all Kawasaki disease data, the Kawasaki disease database includes a personal information database of patients with Kawasaki disease, a clinical examination database, an echocardiogram database, a diagnostic result database and an electronic medical record database.
A mining method for risk factors of coronary artery lesions complicating Kawasaki disease, the key point of which is that it comprises the following steps:
S1: Obtain the personal data, clinical examination data, echocardiogram data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease from the Kawasaki disease database;
S2: The data processor performs data cleaning, data integration and data transformation on all data obtained in step S1 to obtain the Kawasaki disease data set;
S3: Mine the Kawasaki disease data set using the association rule method to obtain the risk factors related to coronary artery lesions;
S4: Build a random forest risk prediction model for coronary artery lesions complicating Kawasaki disease using the random forest algorithm, and calculate the AUC of the random forest risk prediction model.
The data obtained in step S1 are data that have been preprocessed by the management control module; the preprocessing steps are specifically:
S11: Obtain the personal data, clinical examination data, echocardiogram data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease;
S12: From all data obtained in step S11, extract all predictor variables and the mean of each predictor variable;
S13: Determine the classification variable indicating whether a patient with Kawasaki disease has developed coronary artery lesions, the classification grades, and the classification variable value corresponding to each grade;
S14: Classify all patients with Kawasaki disease and save the results to the Kawasaki disease database.
The predictor variables include the sex and age of patients with Kawasaki disease and 52 laboratory test indicators. The 52 laboratory test indicators are: C-reactive protein, white blood cells, monocyte absolute value, lymphocyte absolute value, neutrophil absolute value, red blood cells, hemoglobin, hematocrit, mean corpuscular volume, mean corpuscular hemoglobin concentration (MCHC), red cell distribution width, red cell distribution width absolute value, platelet count, mean platelet volume, platelet large cell ratio, platelet distribution width, plateletcrit, eosinophil absolute value, conjugated bilirubin, total bile acid, albumin, serum complement C4, bilirubin, urine protein, gamma-glutamyl transpeptidase, glutamic-pyruvic transaminase (ALT), glutamic-oxaloacetic transaminase (AST), AST/ALT ratio, red cell morphology, creatinine, creatine kinase, creatine kinase isoenzyme, indirect bilirubin, alkaline phosphatase, phosphorus, chlorine, magnesium, sodium, urea nitrogen, uric acid, urine glucose, prealbumin, globulin, lactate dehydrogenase, ketone bodies, urine vitamin C, erythrocyte sedimentation rate, nitrite, total bilirubin, total protein, and total calcium;
The classification variable is the z-score value in the echocardiogram data;
The classification grades are: no coronary artery lesion (NCAL), small coronary aneurysm (SCAL), medium coronary aneurysm (MCAL) and giant coronary aneurysm (GCAL);
The classification variable value corresponding to no coronary artery lesion is: z-score < 2.5;
The classification variable value corresponding to small coronary aneurysm is: 2.5 ≤ z-score < 5.0;
The classification variable value corresponding to medium coronary aneurysm is: 5.0 ≤ z-score < 10.0;
The classification variable value corresponding to giant coronary aneurysm is: z-score ≥ 10.0.
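The grading thresholds above can be sketched as a small function. This is only an illustration of the four intervals stated in the text; the function name `classify_cal` and the sample z-scores are hypothetical and do not come from the patent.

```python
def classify_cal(z_score: float) -> str:
    """Map an echocardiographic z-score to the grade defined in the text."""
    if z_score < 2.5:
        return "NCAL"  # no coronary artery lesion
    if z_score < 5.0:
        return "SCAL"  # small coronary aneurysm
    if z_score < 10.0:
        return "MCAL"  # medium coronary aneurysm
    return "GCAL"      # giant coronary aneurysm


# Illustrative z-scores only (not patient data); note the boundary values.
grades = [classify_cal(z) for z in (1.8, 2.5, 7.2, 10.0)]
```

Each lower bound is inclusive, so a z-score of exactly 2.5 falls in SCAL and exactly 10.0 falls in GCAL, matching the inequalities as written.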
The specific content of the data cleaning in step S2 is:
Indicators with many missing values are filled by multiple imputation, using the mean of the predictor variable for imputation;
For indicators with few missing values, where the values are missing at random, the missing data are deleted;
The specific content of the data integration is: the data in all data tables of the electronic medical record data described in step S1 are merged into a comprehensive table;
The specific content of the data transformation is: the value of each attribute in the comprehensive table is converted into a form suitable for data mining, and normalization and coding are applied according to the characteristics of each attribute.
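The cleaning rule can be sketched as follows. This is a simplified stand-in, not the patented procedure: it performs a single mean fill rather than full multiple imputation, and the 20% cut-off between "many" and "few" missing values (`many_missing`) is an assumption, since the text gives no threshold. Field names and sample values are hypothetical.

```python
from statistics import mean


def clean(records, fields, many_missing=0.2):
    """records: list of dicts in which None marks a missing value."""
    rows = [dict(r) for r in records]  # work on copies, leave the input intact
    sparse = []                        # fields with only a few missing values
    for f in fields:
        present = [r[f] for r in rows if r[f] is not None]
        missing_share = (len(rows) - len(present)) / len(rows)
        if missing_share > many_missing:
            # Many gaps: fill with the mean of the observed values
            # (assumes at least one observed value exists).
            fill = mean(present)
            for r in rows:
                if r[f] is None:
                    r[f] = fill
        elif missing_share > 0:
            sparse.append(f)
    # Few gaps: delete the records that still lack a value.
    return [r for r in rows if all(r[f] is not None for f in sparse)]


# Illustrative values only (not patient data).
sample = [
    {"crp": 12.0, "esr": 40.0},
    {"crp": None, "esr": 55.0},
    {"crp": 30.0, "esr": 35.0},
    {"crp": None, "esr": 60.0},
    {"crp": 18.0, "esr": None},
]
cleaned = clean(sample, ["crp", "esr"])
```

Here `crp` is missing in 2 of 5 records and is mean-filled, while `esr` is missing in only 1 of 5, so that record is dropped.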
In step S3, the specific steps of mining the Kawasaki disease data set using the association rule method are:
S31: Obtain the data of all patients with Kawasaki disease who have coronary artery lesions;
S32: Perform association rule analysis on the data obtained in step S31, and use rule constraints and interestingness constraints to obtain the strong association rules related to coronary artery lesions complicating Kawasaki disease;
S33: Take the predictor variables that appear in the strong association rules as the risk factors of coronary artery lesions complicating Kawasaki disease.
In step S32, the strong association rules are obtained as follows:
An association rule X → Y is established, where X is the condition and includes at least one predictor variable, and Y is the result and includes one classification grade of coronary artery lesion;
A minimum confidence and a minimum support are set;
When the support of an association rule is greater than the minimum support and its confidence is greater than the minimum confidence, the rule is a strong association rule.
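The strong-rule test can be illustrated with the standard definitions of support and confidence. This is a minimal sketch: the item strings such as `"sex=M"` and the four sample records are hypothetical, and the default thresholds are the values given in the embodiment (minimum support 0.01, minimum confidence 0.9).

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def is_strong(transactions, x, y, min_support=0.01, min_confidence=0.9):
    """Check whether the rule X -> Y exceeds both thresholds."""
    s_xy = support(transactions, set(x) | set(y))  # support of the rule
    s_x = support(transactions, x)
    confidence = s_xy / s_x if s_x else 0.0        # estimate of P(Y | X)
    return s_xy > min_support and confidence > min_confidence


# Illustrative coded records only (not patient data).
patients = [
    {"sex=M", "crp=H", "grade=SCAL"},
    {"sex=M", "crp=H", "grade=SCAL"},
    {"sex=F", "crp=H", "grade=MCAL"},
    {"sex=M", "crp=N", "grade=SCAL"},
]
strong = is_strong(patients, {"sex=M", "crp=H"}, {"grade=SCAL"})
weak = is_strong(patients, {"crp=H"}, {"grade=SCAL"})
```

The first rule has support 0.5 and confidence 1.0 and passes; the second has confidence 2/3 and fails the confidence threshold.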
In step S4, the specific steps of building and evaluating the random forest risk prediction model are:
S41: Divide the Kawasaki disease data set obtained in step S2 into training samples and test samples in a ratio of N:1;
S42: Use the risk factors obtained in step S3 as the prediction indicators of the prediction model;
S43: Build the random forest risk prediction model for coronary artery lesions complicating Kawasaki disease;
By adjusting the number of split attributes, mtry, and the number of generated decision trees, ntree, observe how the prediction error of the model changes with ntree, determine the optimal number of trees in the random forest from this, and build the random forest risk prediction model;
S44: Using the test samples from step S41, calculate the AUC of the random forest risk prediction model.
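Step S44 can be illustrated with a rank-based AUC computation. This sketch assumes a binary outcome (lesion versus no lesion) and a model that outputs one risk score per test sample; the text does not say how the four-grade outcome was reduced for the ROC analysis, so that reduction is an assumption. The labels and scores below are made up for illustration.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Illustrative test-sample labels and predicted risk scores only.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.3, 0.6, 0.7, 0.1]
area = auc(y_true, y_score)
```

Of the six positive/negative pairs here, five are ranked correctly, so `area` equals 5/6.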
Beneficial effects of the invention: a risk factor management system for Kawasaki disease with coronary artery involvement is established, Kawasaki disease data are compiled, and the risk factors of coronary artery involvement are mined from the large volume of compiled data. Strong association rules are used to find disease-related risk factors, and the random forest model built with the random forest algorithm achieves prediction accuracy far exceeding that of the traditional multivariate logistic regression model, which improves the quality and efficiency of analysis. The approach is highly usable and reliable, draws on a wide range of data sources, is easy to implement, and requires little manual work.
Brief description of the drawings
Fig. 1 is a block diagram of the management system of the invention;
Fig. 2 is a flow chart of the data mining of the invention;
Fig. 3 shows the analysis results of the association rule method;
Fig. 4 shows how the prediction error of the random forest risk prediction model varies with the number of generated decision trees;
Fig. 5 shows the prediction results of the random forest risk prediction model;
Fig. 6 compares the ROC curves of the random forest risk prediction model and the logistic regression model.
Detailed description
The embodiments and working principle of the invention are described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, a management system for risk factors of coronary artery lesions complicating Kawasaki disease includes a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor. The entry module is used to enter Kawasaki disease data. The management control module preprocesses the entered Kawasaki disease data and then saves them to the Kawasaki disease database by category. The data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database.
In the present embodiment, the Kawasaki disease database includes a personal information database of patients with Kawasaki disease, a clinical examination database, an echocardiogram database, a diagnostic result database and an electronic medical record database.
As can be seen from Fig. 2, a mining method for risk factors of coronary artery lesions complicating Kawasaki disease comprises the following steps:
S1: Obtain the personal data, clinical examination data, echocardiogram data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease from the Kawasaki disease database;
The preprocessing steps are specifically:
S11: Obtain the personal data, clinical examination data, echocardiogram data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease;
S12: From all data obtained in step S11, extract all predictor variables and the mean of each predictor variable;
S13: Determine the classification variable indicating whether a patient with Kawasaki disease has developed coronary artery lesions, the classification grades, and the classification variable value corresponding to each grade;
S14: Classify all patients with Kawasaki disease and save the results to the Kawasaki disease database.
In the present embodiment, the electronic medical record database contains 8501 patients in total, of whom 5020 were diagnosed with Kawasaki disease; among these, 343 developed coronary artery lesions and 4677 did not.
In the present embodiment, the predictor variables include the sex and age of patients with Kawasaki disease and 52 laboratory test indicators. The 52 laboratory test indicators are: C-reactive protein, white blood cells, monocyte absolute value, lymphocyte absolute value, neutrophil absolute value, red blood cells, hemoglobin, hematocrit, mean corpuscular volume, mean corpuscular hemoglobin concentration, red cell distribution width, red cell distribution width absolute value, platelet count, mean platelet volume, platelet large cell ratio, platelet distribution width, plateletcrit, eosinophil absolute value, conjugated bilirubin, total bile acid, albumin, serum complement C4, bilirubin, urine protein, gamma-glutamyl transpeptidase, glutamic-pyruvic transaminase (ALT), glutamic-oxaloacetic transaminase (AST), AST/ALT ratio, red cell morphology, creatinine, creatine kinase, creatine kinase isoenzyme, indirect bilirubin, alkaline phosphatase, phosphorus, chlorine, magnesium, sodium, urea nitrogen, uric acid, urine glucose, prealbumin, globulin, lactate dehydrogenase, ketone bodies, urine vitamin C, erythrocyte sedimentation rate, nitrite, total bilirubin, total protein, and total calcium;
The classification variable is the z-score value in the echocardiogram data;
The classification grades are: no coronary artery lesion (NCAL), small coronary aneurysm (SCAL), medium coronary aneurysm (MCAL) and giant coronary aneurysm (GCAL);
The classification variable value corresponding to no coronary artery lesion is: z-score < 2.5;
The classification variable value corresponding to small coronary aneurysm is: 2.5 ≤ z-score < 5.0;
The classification variable value corresponding to medium coronary aneurysm is: 5.0 ≤ z-score < 10.0;
The classification variable value corresponding to giant coronary aneurysm is: z-score ≥ 10.0.
S2: The data processor performs data cleaning, data integration and data transformation on all data obtained in step S1 to obtain the Kawasaki disease data set;
The specific content of the data cleaning is:
Indicators with many missing values are filled by multiple imputation, using the mean of the predictor variable for imputation;
For indicators with few missing values, where the values are missing at random, the missing data are deleted;
The specific content of the data integration is:
The data in all data tables of the electronic medical record data described in step S1 are merged into a comprehensive table;
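The integration step can be sketched as a merge of per-source tables into one record per patient. The key name `patient_id`, the table names and the sample values are hypothetical; the text does not state which key joins the tables.

```python
def integrate(*tables):
    """Merge rows from several tables into one combined record per patient."""
    merged = {}
    for table in tables:
        for row in table:
            # Rows that share a patient_id are folded into the same record.
            merged.setdefault(row["patient_id"], {}).update(row)
    return list(merged.values())


# Illustrative tables only (not patient data).
personal = [{"patient_id": 1, "sex": "M"}, {"patient_id": 2, "sex": "F"}]
labs = [{"patient_id": 1, "crp": 35.0}, {"patient_id": 2, "crp": 6.0}]
echo = [{"patient_id": 1, "z_score": 3.1}]
combined = integrate(personal, labs, echo)
```

A patient who is absent from one table simply lacks those fields in the combined record, which is the kind of gap the data cleaning rules then have to handle.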
The specific content of the data transformation is:
The value of each attribute in the comprehensive table is converted into a form suitable for data mining, and normalization and coding are applied according to the characteristics of each attribute.
In the present embodiment, normalization and coding are carried out as follows:
Age is divided into 4 intervals (under 2 years, 2 to 5 years, 5 to 7 years, and over 7 years), represented by a, b, c and d respectively.
Laboratory indicators are coded against their normal ranges. For example, the normal range of C-reactive protein is <8 mg/L, so its values are divided into the two intervals <8 mg/L and ≥8 mg/L, represented by N and H respectively. This is done with SQL statements in the MySQL database.
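The coding scheme can be sketched in Python for illustration, although the embodiment performs it with SQL statements in MySQL. The function names are hypothetical, and the handling of ages exactly on a boundary (2, 5 or 7 years) is an assumption, since the text does not say which interval they belong to.

```python
def code_age(age_years: float) -> str:
    """Code age into the four intervals a, b, c, d (lower bounds inclusive)."""
    if age_years < 2:
        return "a"
    if age_years < 5:
        return "b"
    if age_years < 7:
        return "c"
    return "d"


def code_crp(crp_mg_per_l: float) -> str:
    """Code C-reactive protein as N (normal, <8 mg/L) or H (>=8 mg/L)."""
    return "N" if crp_mg_per_l < 8 else "H"


# Illustrative values only (not patient data).
coded = [(code_age(a), code_crp(c)) for a, c in ((1.5, 3.0), (6.0, 42.0))]
```

Coding every attribute into a small set of symbols turns each patient record into a set of items, which is the input form that association rule mining expects.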
S3: Mine the Kawasaki disease data set using the association rule method to obtain the risk factors related to coronary artery lesions. The data of the 343 patients with Kawasaki disease in the total sample who developed coronary artery lesions are mined; the specific steps are:
S31: Obtain the data of all patients with Kawasaki disease who have coronary artery lesions;
S32: Perform association rule analysis on the data obtained in step S31, and use rule constraints and interestingness constraints to obtain the strong association rules related to coronary artery lesions complicating Kawasaki disease;
The specific method is:
An association rule X → Y is established, where X is the condition and includes at least one predictor variable, and Y is the result and includes one classification grade of coronary artery lesion;
A minimum confidence and a minimum support are set;
When the support of an association rule is greater than the minimum support and its confidence is greater than the minimum confidence, the rule is a strong association rule.
In the present embodiment, the minimum confidence is 0.9 and the minimum support is 0.01.
S33: Take the predictor variables that appear in the strong association rules as the risk factors of coronary artery lesions complicating Kawasaki disease.
In the present embodiment, the 30 predictor variables related to coronary artery lesions complicating Kawasaki disease that appear in the strong association rules are used as risk factors for prediction. These indicators are: sex, age, hematocrit, platelet large cell ratio, C-reactive protein, platelet count, glutamic-oxaloacetic transaminase (AST), glutamic-pyruvic transaminase (ALT), AST/ALT ratio, erythrocyte sedimentation rate, mean platelet volume, monocyte absolute value, albumin, ketone bodies, serum phosphorus, serum chloride, alkaline phosphatase, red blood cells, mean corpuscular hemoglobin concentration, eosinophil absolute value, urea nitrogen, neutrophil absolute value, mean corpuscular volume, red cell distribution width, red cell morphology, red cell distribution width absolute value, urine protein, total protein, prealbumin, and mean corpuscular hemoglobin.
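Step S33 amounts to collecting every predictor variable that appears on the condition side of a strong rule. A minimal sketch, assuming rules are stored as (condition items, result) pairs with items written as `"variable=code"`; this representation and the two sample rules are hypothetical.

```python
def risk_factors(strong_rules):
    """Return the sorted predictor names found in rule conditions."""
    return sorted(
        {item.split("=")[0] for condition, _ in strong_rules for item in condition}
    )


# Illustrative rules only (not mined results).
rules = [
    ({"sex=M", "plcr=H"}, "grade=SCAL"),
    ({"bun=H", "sex=M"}, "grade=MCAL"),
]
factors = risk_factors(rules)
```

Taking the set union removes duplicates, so a variable that appears in many rules is still counted once.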
As shown in Fig. 3, statistics over the top 1000 association rules show that male sex, elevated platelet large cell ratio, elevated platelet distribution width, elevated urea nitrogen and elevated serum phosphorus are strongly correlated with coronary artery lesions complicating Kawasaki disease.
S4: Build a random forest risk prediction model for coronary artery lesions complicating Kawasaki disease using the random forest algorithm, and calculate the AUC of the random forest risk prediction model.
The specific steps are:
S41: Divide the Kawasaki disease data set obtained in step S2 into training samples and test samples in a ratio of N:1;
In the present embodiment, the data set is randomly divided in a ratio of 3:1 into training samples (3765) and test samples (1255).
The training samples are used for modeling, and the test samples are used for model evaluation.
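The N:1 random split can be sketched as follows. The function name `split` and the fixed seed are assumptions added for reproducibility; only the 3:1 ratio and the sample counts come from the text.

```python
import random


def split(records, n=3, seed=0):
    """Randomly divide records into training and test parts in an n:1 ratio."""
    rows = list(records)
    random.Random(seed).shuffle(rows)    # seeded shuffle, repeatable
    cut = len(rows) * n // (n + 1)       # size of the training part
    return rows[:cut], rows[cut:]


# 5020 patients diagnosed with Kawasaki disease, represented here by index.
train, test = split(range(5020))
```

With 5020 records and a 3:1 ratio this yields 3765 training and 1255 test samples, matching the counts given in the embodiment.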
S42: Use the risk factors obtained in step S3 as the prediction indicators of the prediction model;
The 30 indicators that appear in the above association rules are used as the model's prediction indicators;
S43: Build the random forest risk prediction model for coronary artery lesions complicating Kawasaki disease;
By adjusting the number of split attributes, mtry, and the number of generated decision trees, ntree, observe how the prediction error of the model changes with ntree, determine the optimal number of trees in the random forest from this, and build the random forest risk prediction model;
Because the default value of mtry is the square root of the number of attributes, and the number of predictor variables selected by the invention is 54, tuning starts from mtry = 8. The number of generated decision trees, ntree, is varied from 100 to 400, and the change in the model's prediction error with ntree is observed in each case; the optimal number of generated decision trees is determined from this, and the random forest risk prediction model is built.
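The tuning procedure can be sketched as a simple search over ntree. This sketch makes several assumptions that the text does not state: that the starting mtry of 8 comes from rounding the square root of 54 upward, that ntree is stepped in increments of 100, and that the model error is supplied by a caller-provided function. The helper `pick_ntree` and the stand-in error function are hypothetical, not a trained model.

```python
import math

n_predictors = 54
# sqrt(54) is about 7.35; rounding up gives the starting value 8 (assumed rounding).
mtry_start = math.ceil(math.sqrt(n_predictors))

# Candidate tree counts from 100 to 400; the step of 100 is an assumption.
ntree_grid = list(range(100, 401, 100))


def pick_ntree(evaluate_error, grid):
    """Evaluate the error for each ntree and return the best one plus all errors."""
    errors = {ntree: evaluate_error(ntree) for ntree in grid}
    return min(errors, key=errors.get), errors


# Stand-in error curve for illustration only; a real run would train a
# random forest with each ntree and measure its prediction error.
best, errors = pick_ntree(lambda ntree: abs(ntree - 200) / 1000, ntree_grid)
```

In practice `evaluate_error` would wrap model training and an out-of-bag or held-out error estimate, and the same loop could be repeated for several mtry values.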
S44: Using the test samples from step S41, calculate the AUC of the random forest risk prediction model.
As shown in Fig. 4, the overall prediction error of the random forest risk prediction model decreases along with the number of generated decision trees. It can also be seen from Fig. 3 that the optimal number of generated decision trees is about 80, at which point the prediction errors for no coronary artery lesion (NCAL), small coronary aneurysm (SCAL), medium coronary aneurysm (MCAL) and giant coronary aneurysm (GCAL) all reach a stable state and are all kept below 0.1.
Fig. 5 shows the prediction results of the random forest risk prediction model. It can be seen that C-reactive protein, erythrocyte sedimentation rate, sex, age, mean corpuscular hemoglobin concentration (MCHC), albumin, prealbumin and eosinophil absolute value are the predictor variables of higher importance in the model's predictions. In addition, as the severity of coronary artery lesions increases, the importance in prediction of glutamic-pyruvic transaminase (ALT), platelets, hematocrit, glutamic-oxaloacetic transaminase (AST), ketone bodies, AST/ALT ratio, mean corpuscular volume, urine protein, urea nitrogen, total protein and red cell distribution width absolute value also increases.
Fig. 6 shows the receiver operating characteristic (ROC) curves of the random forest risk prediction model (Random Forest) and the multivariate logistic regression model (Logistic Regression). Computing the area under each curve gives an AUC of 98.2% for the random forest risk prediction model and 59.2% for the regression model, so the random forest model clearly predicts better than the regression model.
It should be noted that the above description does not limit the invention, and the invention is not limited to the above examples. Changes, modifications, additions or substitutions made by those skilled in the art within the essential scope of the invention also fall within the scope of protection of the invention.
Claims (9)
1. A management system for risk factors of coronary artery lesions complicating Kawasaki disease, characterized in that: it includes a management control module; an entry module is provided at the input of the management control module, the output of the management control module is connected to a Kawasaki disease database, and the output of the Kawasaki disease database is connected to a data processor;
The entry module is used to enter Kawasaki disease data;
The management control module preprocesses the entered Kawasaki disease data and then saves them to the Kawasaki disease database by category;
The data processor performs data cleaning, data integration and data transformation on all data in the Kawasaki disease database.
2. The management system for risk factors of coronary artery lesions complicating Kawasaki disease according to claim 1, characterized in that: the Kawasaki disease database includes a personal information database of patients with Kawasaki disease, a clinical examination database, an echocardiogram database, a diagnostic result database and an electronic medical record database.
3. A mining method for risk factors of coronary artery lesions complicating Kawasaki disease, characterized in that it comprises the following steps:
S1: Obtain the personal data, clinical examination data, echocardiogram data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease from the Kawasaki disease database;
S2: The data processor performs data cleaning, data integration and data transformation on all data obtained in step S1 to obtain the Kawasaki disease data set;
S3: Mine the Kawasaki disease data set using the association rule method to obtain the risk factors related to coronary artery lesions;
S4: Build a random forest risk prediction model for coronary artery lesions complicating Kawasaki disease using the random forest algorithm, and calculate the AUC of the random forest risk prediction model.
4. The mining method for risk factors of coronary artery lesions complicating Kawasaki disease according to claim 3, characterized in that the data obtained in step S1 are data that have been preprocessed by the management control module, the preprocessing steps being specifically:
S11: Obtain the personal data, clinical examination data, echocardiogram data, diagnostic result data and electronic medical record data of all patients with Kawasaki disease;
S12: From all data obtained in step S11, extract all predictor variables and the mean of each predictor variable;
S13: Determine the classification variable indicating whether a patient with Kawasaki disease has developed coronary artery lesions, the classification grades, and the classification variable value corresponding to each grade;
S14: Classify all patients with Kawasaki disease and save the results to the Kawasaki disease database.
5. The method for mining risk factors for coronary artery lesions complicating Kawasaki disease according to claim 4, characterized in that:
The predictor variables include the sex and age of the Kawasaki disease patient and 52 laboratory test indicators. The 52 laboratory test indicators are: C-reactive protein, white blood cells, absolute monocyte count, absolute lymphocyte count, absolute neutrophil count, red blood cells, hemoglobin, hematocrit, MCVU, mean corpuscular hemoglobin concentration (MCHC), red cell distribution width (RDW), red cell distribution width absolute value, platelet count, mean platelet volume, large platelet ratio, MPW, plateletcrit, absolute eosinophil count, conjugated bilirubin, total bile acid, albumin, serum complement C4, urine bilirubin, urine protein, gamma-glutamyl transpeptidase, alanine aminotransferase (ALT), aspartate aminotransferase (AST), AST/ALT ratio, red cell morphology, creatinine, creatine kinase, creatine kinase isoenzyme, indirect bilirubin, alkaline phosphatase, phosphorus, chloride, magnesium, sodium, urea nitrogen, uric acid, urine glucose, prealbumin, globulin, lactate dehydrogenase, urine ketone bodies, urine vitamin C, erythrocyte sedimentation rate, nitrite, total bilirubin, total protein, total calcium;
The classification variable is the z-score value in the echocardiography data;
The classification grades are: no coronary artery lesion, small coronary aneurysm, medium coronary aneurysm, and giant coronary aneurysm;
The classification variable value corresponding to no coronary artery lesion is: z-score < 2.5;
The classification variable value corresponding to a small coronary aneurysm is: 2.5 ≤ z-score < 5.0;
The classification variable value corresponding to a medium coronary aneurysm is: 5.0 ≤ z-score < 10.0;
The classification variable value corresponding to a giant coronary aneurysm is: z-score ≥ 10.0.
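The grading thresholds in claim 5 can be illustrated with a short Python sketch. The function name and the grade labels are hypothetical; only the z-score cut-offs (2.5, 5.0, 10.0) come from the claim.

```python
def classify_z_score(z: float) -> str:
    """Map an echocardiographic z-score to one of the four grades in claim 5.

    The function name and returned label strings are illustrative assumptions;
    the cut-offs are the ones stated in the claim.
    """
    if z < 2.5:
        return "no coronary artery lesion"
    if z < 5.0:
        return "small coronary aneurysm"
    if z < 10.0:
        return "medium coronary aneurysm"
    return "giant coronary aneurysm"
```

Each lower bound is inclusive and each upper bound exclusive, so a z-score of exactly 2.5 is graded as a small coronary aneurysm and a z-score of exactly 10.0 as a giant one.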
6. The method for mining risk factors for coronary artery lesions complicating Kawasaki disease according to claim 3, characterized in that:
The data cleaning in step S2 specifically comprises:
For indicators with many missing values, filling in the missing values by multiple imputation, in which the mean of the predictor variable is used for imputation;
For indicators with few missing values, where the missing values occur at random, deleting the missing data;
The data integration specifically comprises: merging the data in all data tables of the electronic medical record data of step S1 into one comprehensive table;
The data transformation specifically comprises: converting the value of each attribute in the comprehensive table into a form suitable for data mining, and normalizing and encoding each attribute according to its characteristics.
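The cleaning and transformation steps of claim 6 might look roughly like the following Python sketch. Several details are assumptions, not taken from the claim: the 30% cut-off that separates "many" from "few" missing values, the use of a single mean fill as a simplified stand-in for multiple imputation, min-max scaling as the normalization, and all function and field names.

```python
from statistics import mean

def clean(records, predictors, many_missing=0.3):
    """Clean a list of patient dicts in which None marks a missing value.

    Assumption: an indicator counts as having "many" missing values when
    its missing rate exceeds `many_missing`; the claim gives no threshold.
    """
    n = len(records)
    rate = {p: sum(r[p] is None for r in records) / n for p in predictors}
    heavy = [p for p in predictors if rate[p] > many_missing]
    light = [p for p in predictors if p not in heavy]
    # Few, randomly missing values: delete the affected records.
    kept = [dict(r) for r in records if all(r[p] is not None for p in light)]
    # Many missing values: fill with the mean of the observed values
    # (a single mean fill, simpler than true multiple imputation).
    for p in heavy:
        observed = [r[p] for r in kept if r[p] is not None]
        if not observed:
            continue  # nothing to average; leave the gaps untouched
        fill = mean(observed)
        for r in kept:
            if r[p] is None:
                r[p] = fill
    return kept

def min_max_normalize(records, predictors):
    """Scale each predictor to [0, 1]; min-max scaling is an assumed choice."""
    out = [dict(r) for r in records]
    for p in predictors:
        values = [r[p] for r in out]
        lo, hi = min(values), max(values)
        for r in out:
            r[p] = 0.0 if hi == lo else (r[p] - lo) / (hi - lo)
    return out
```

For example, with five records in which `crp` is missing twice and `esr` once, `clean(records, ["crp", "esr"])` drops the one record lacking `esr` and fills the remaining `crp` gaps with the observed mean.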
7. The method for predicting risk factors for coronary artery lesions complicating Kawasaki disease based on data mining technology according to claim 6, characterized in that the specific steps by which step S3 mines the Kawasaki disease data set using the association rule method are:
S31: Obtain the data of all Kawasaki disease patients who have coronary artery lesions;
S32: Perform association rule analysis on the data obtained in step S31, and use rule constraints and interestingness constraints to obtain the strong association rules related to coronary artery lesions complicating Kawasaki disease;
S33: Take the predictor variables that appear in the strong association rules as the risk factors for coronary artery lesions complicating Kawasaki disease.
8. The method for predicting risk factors for coronary artery lesions complicating Kawasaki disease based on data mining technology according to claim 7, characterized in that the specific method for obtaining the strong association rules in step S32 is:
Establish an association rule X → Y, where X is the condition, comprising at least one predictor variable, and Y is the result, comprising one of the classification grades of coronary artery lesions;
Set a minimum support and a minimum confidence;
When the support and the confidence of an association rule are greater than the minimum support and the minimum confidence respectively, the rule is a strong association rule.
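As an illustration of claims 7 and 8, the following Python sketch computes the support and confidence of a rule X → Y, applies the two minimum thresholds, and collects the predictor variables of the strong rules as risk factors (step S33). The item names such as `CRP_high` are hypothetical discretized values, and the sketch evaluates rules that are handed to it; it does not enumerate candidate rules the way a full association rule miner would.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    `transactions` is a list of sets, one per patient, holding that
    patient's discretized predictor values and lesion grade.
    """
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n_x = sum(x <= t for t in transactions)    # patients matching X
    n_xy = sum(xy <= t for t in transactions)  # patients matching X and Y
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

def is_strong(transactions, antecedent, consequent, min_support, min_confidence):
    """A rule is strong when both measures exceed their minimum thresholds."""
    support, confidence = support_confidence(transactions, antecedent, consequent)
    return support > min_support and confidence > min_confidence

def risk_factors(strong_rules):
    """Step S33: gather every predictor appearing in a strong rule's condition."""
    return sorted({item for antecedent, _ in strong_rules for item in antecedent})
```

With four patients of whom three have `CRP_high` and two of those have `small_aneurysm`, the rule `{CRP_high} → {small_aneurysm}` has support 0.5 and confidence 2/3.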
9. The method for predicting risk factors for coronary artery lesions complicating Kawasaki disease based on data mining technology according to any one of claims 3-8, characterized in that the specific steps by which step S4 builds and evaluates the random forest risk prediction model are:
S41: Divide the Kawasaki disease data set obtained in step S2 into training samples and test samples at a ratio of N:1;
S42: Use the risk factors obtained in step S3 as the prediction indicators of the prediction model;
S43: Build the random forest risk prediction model for coronary artery lesions complicating Kawasaki disease;
The parameter mtry (the number of candidate split attributes selected at each node) and the parameter ntree (the number of decision trees generated) are adjusted, and the change of the model's prediction error with ntree is observed; the optimal number of trees in the random forest is determined from this, and the random forest risk prediction model is built;
S44: Using the test samples of step S41, calculate the area under the ROC curve (AUC) of the random forest risk prediction model.
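Steps S41 and S44 can be sketched in Python as below. This covers only the N:1 split and the AUC calculation; the random forest itself and the tuning of mtry and ntree are not shown. The function names, the fixed shuffle seed, and the reduction of the outcome to two classes (1 = coronary artery lesion, 0 = no lesion) are assumptions made for illustration.

```python
import random

def split_n_to_1(samples, n, seed=0):
    """Shuffle and split samples into training and test sets at ratio n:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = len(shuffled) * n // (n + 1)
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = lesion, 0 = none).

    Computed as the fraction of positive/negative pairs in which the
    positive case receives the higher score, with ties counted as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Here `scores` would be the predicted lesion probabilities that a trained model assigns to the test samples; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.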
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710154709.0A CN106980757A (en) | 2017-03-15 | 2017-03-15 | The concurrent coronary artery pathological changes hazards management system of Kawasaki disease and method for digging |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710154709.0A CN106980757A (en) | 2017-03-15 | 2017-03-15 | The concurrent coronary artery pathological changes hazards management system of Kawasaki disease and method for digging |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106980757A true CN106980757A (en) | 2017-07-25 |
Family
ID=59339518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710154709.0A Pending CN106980757A (en) | 2017-03-15 | 2017-03-15 | The concurrent coronary artery pathological changes hazards management system of Kawasaki disease and method for digging |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106980757A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107688653A (en) * | 2017-09-01 | 2018-02-13 | 武汉倚天剑科技有限公司 | User behavior data digging system and its method based on network shallow-layer data |
CN108039207A (en) * | 2017-12-06 | 2018-05-15 | 无锡市儿童医院 | The assessment system and method for thrombotic risk factor under Kawasaki disease |
CN109215788A (en) * | 2018-08-22 | 2019-01-15 | 四川大学 | A kind of prediction technique and device of mucous membrane of mouth disease damage canceration degree of danger |
CN110335679A (en) * | 2019-06-21 | 2019-10-15 | 山东大学 | A kind of Prediction of survival method and system based on more granularity graph mode excavations |
CN110957034A (en) * | 2018-09-26 | 2020-04-03 | 金敏 | Disease prediction system |
CN111241148A (en) * | 2018-11-29 | 2020-06-05 | 金敏 | Medical data sorting method, medical data sorting device and electronic equipment |
CN113270194A (en) * | 2021-04-22 | 2021-08-17 | 深圳市雅士长华智能科技有限公司 | Health data management system based on cloud computing |
CN113380329A (en) * | 2021-06-08 | 2021-09-10 | 重庆医科大学 | Prediction system and device for resistance of first injection gamma globulin of Kawasaki disease patient child |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866713A (en) * | 2015-05-12 | 2015-08-26 | 南京霁云信息科技有限公司 | Kawasaki disease and fever diagnosis system based on embedding of incremental local discrimination subspace |
CN105095673A (en) * | 2015-08-26 | 2015-11-25 | 中国人民解放军军事医学科学院放射与辐射医学研究所 | Construction method of chronic disease risk model on the basis of medical big data mining |
CN106295229A (en) * | 2016-08-30 | 2017-01-04 | 青岛大学 | A kind of mucocutaneous lymphnode syndrome grade predicting method based on medical data modeling |
CN106339593A (en) * | 2016-08-31 | 2017-01-18 | 青岛睿帮信息技术有限公司 | Kawasaki disease classification and prediction method based on medical data modeling |
Non-Patent Citations (7)
Title |
---|
TAE YEUN KIM ET AL: "Predictive risk factors for coronary artery abnormalities in Kawasaki disease", 《EUR J PEDIATR》 * |
V.E.A.HONKANEN: "Clinical Relevance of the Risk Factors for Coronary Artery Inflammation in Kawasaki Disease", 《PEDIATRIC CARDIOLOGY》 * |
刘天时等: "《软件案例分析》", 31 January 2016, 清华大学出版社 * |
张影等: "《预测与评价》", 31 May 2015, 天津大学出版社 * |
曹文哲等: "基于Logistic回归和随机森林算法的2型糖尿病并发视网膜病变风险预测及对比研究", 《中国医疗设备》 * |
樊楚等: "基于数据挖掘技术建立的BP神经网络模型鉴别儿童川崎病与发热性疾病的研究", 《中国循证儿科杂志》 * |
段泓宇等: "川崎病患儿并发冠状动脉损害的高危因素分析", 《临床儿科杂志》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107688653A (en) * | 2017-09-01 | 2018-02-13 | 武汉倚天剑科技有限公司 | User behavior data digging system and its method based on network shallow-layer data |
CN108039207A (en) * | 2017-12-06 | 2018-05-15 | 无锡市儿童医院 | The assessment system and method for thrombotic risk factor under Kawasaki disease |
CN109215788A (en) * | 2018-08-22 | 2019-01-15 | 四川大学 | A kind of prediction technique and device of mucous membrane of mouth disease damage canceration degree of danger |
CN109215788B (en) * | 2018-08-22 | 2022-01-18 | 四川大学 | Method and device for predicting canceration risk degree of oral mucosa lesion |
CN110957034A (en) * | 2018-09-26 | 2020-04-03 | 金敏 | Disease prediction system |
CN110957043A (en) * | 2018-09-26 | 2020-04-03 | 金敏 | Disease prediction system |
CN111241148A (en) * | 2018-11-29 | 2020-06-05 | 金敏 | Medical data sorting method, medical data sorting device and electronic equipment |
CN110335679A (en) * | 2019-06-21 | 2019-10-15 | 山东大学 | A kind of Prediction of survival method and system based on more granularity graph mode excavations |
CN113270194A (en) * | 2021-04-22 | 2021-08-17 | 深圳市雅士长华智能科技有限公司 | Health data management system based on cloud computing |
CN113380329A (en) * | 2021-06-08 | 2021-09-10 | 重庆医科大学 | Prediction system and device for resistance of first injection gamma globulin of Kawasaki disease patient child |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170725 | |