CN110895972A - Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction - Google Patents

Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction

Info

Publication number
CN110895972A
Authority
CN
China
Prior art keywords
atrial fibrillation
decision tree
prediction
selecting
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811068302.7A
Other languages
Chinese (zh)
Inventor
张树龙
张敏
杨慧英
冯雪颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN201811068302.7A
Publication of CN110895972A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00: ICT specially adapted for the handling or processing of medical references
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)

Abstract

A method for selecting indexes through an atrial fibrillation artificial intelligence experiment, and the application of a prediction decision tree in atrial fibrillation prediction, belong to the field of data processing and aim to solve the problem of selecting indexes that more accurately reflect atrial fibrillation. The method comprises the following steps: S1, constructing a decision tree; S2, adjusting parameters to optimize the decision tree; and S3, performing experiments on the possible values of the various parameters and finally selecting the optimal experimental result, whose attributes serve as the main indexes for decision tree prediction.

Description

Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction
Technical Field
The invention belongs to the field of data processing, and relates to a method for constructing an atrial fibrillation prediction decision tree and a method for selecting indexes in an atrial fibrillation artificial intelligence experiment.
Background
Atrial fibrillation is a supraventricular tachyarrhythmia characterized by rapid, chaotic atrial electrical activity. On an electrocardiogram, atrial fibrillation appears mainly as the disappearance of P waves, which are replaced by irregular fibrillation waves, and as absolutely irregular RR intervals (when atrioventricular conduction is present). These findings are currently the main basis for diagnosing atrial fibrillation in medicine. According to the duration of an episode, atrial fibrillation is medically classified mainly into paroxysmal atrial fibrillation (paroxysmal AF), persistent atrial fibrillation (persistent AF), long-standing persistent atrial fibrillation (long-standing persistent AF), and permanent atrial fibrillation (permanent AF). The specific classification is shown in Table 1.
Table 1. Detailed medical classification of atrial fibrillation
[Table 1 is an image in the original publication (Figure RE-GDA0001851358430000011) and is not reproduced in this text.]
Atrial fibrillation is a very common arrhythmia in clinical practice. Its incidence in China is 0.5% to 1%, and the probability of onset rises with age. The risk of atrial fibrillation in hypertensive patients is 1.7 times higher than in normotensive patients, and at present 33 percent of atrial fibrillation cases are caused by hypertension. Given this high incidence among hypertensive patients, atrial fibrillation is even regarded by some as another manifestation of hypertensive target organ damage. However, there is currently no good clinical index for predicting the occurrence of AF in hypertensive patients. In addition, some patients with atrial fibrillation have no obvious clinical symptoms, so they are unknowingly exposed to the risk of various critical illnesses. By the time clinical symptoms appear or the disease strikes suddenly, organic cardiovascular lesions have often developed, which seriously affects the patient's health and can even be life-threatening. It is therefore very important to study the probability of atrial fibrillation in the hypertensive population.
At present there are many methods for predicting atrial fibrillation. In the medical field, prediction starts from the treatment of atrial fibrillation. Internationally, the CHA2DS2-VASc score (hypertension, age, diabetes, stroke, vascular disease, sex, congestive heart failure) and the HATCH score (hypertension, age, cerebral ischemic attack, chronic obstructive pulmonary disease, heart failure) are used to predict atrial fibrillation, but both scores have various limitations that leave the prediction method non-standardized and the prediction results inaccurate. In the computing field, it is common to take the patient's electrocardiogram, identify the P wave, and analyze how the RR interval distribution varies over time, together with other factors, to judge whether the patient has atrial fibrillation; the algorithms used come from statistics and machine learning. Other approaches detect certain physiological indexes with a smart watch, scan the face with a smartphone and predict from facial color, or, for asymptomatic patients, measure the Holter heart rate directly with a medical instrument. All of these still lack standardization and follow no specific standard.
Disclosure of Invention
In order to solve the problem of selecting an index which more accurately reflects atrial fibrillation, the invention provides the following scheme:
A method for selecting indexes in an atrial fibrillation artificial intelligence experiment comprises the following steps:
S1, constructing a decision tree;
S2, adjusting parameters to optimize the decision tree;
S3, performing experiments on the possible values of the various parameters, and finally selecting the optimal experimental result, which is used as the main index for decision tree prediction.
Further, the main indexes are three of the cardiac ultrasound attributes: A peak, ef, and last.
Further, the main indexes are XGN (cardiac function grade), A peak (cardiac ultrasound index), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration), and E peak (cardiac ultrasound index).
Further, the method for constructing the decision tree is as follows:
Step 1: if all samples in the data set S belong to the same category, create a leaf node, mark it with the corresponding category label, and stop building the tree; otherwise, go to Step 2;
Step 2: calculate the information gain ratio Gain-rate(A) of every attribute in the data set S;
Step 3: select the attribute A with the maximum information gain ratio;
Step 4: establish the attribute A as the root node of the decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into several subsets according to the different values of the attribute A, execute Steps 1-4 recursively on each subset Sv, and construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add the subtree Tv to the corresponding branch of the decision tree T;
Step 7: when the recursion finishes, the decision tree T is obtained.
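The seven steps can be sketched in Python as a short recursive function. This is a minimal illustration, not the patent's implementation: it handles only categorical attributes, and the helper names (`entropy`, `gain_ratio`, `build_tree`), the majority-class fallback when no informative attribute remains, and the toy rows are all assumptions made for the example. The gain ratio uses the standard C4.5 definition (information gain divided by split information).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by the split information."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    weights = [len(g) / total for g in groups.values()]
    conditional = sum(w * entropy(g) for w, g in zip(weights, groups.values()))
    split_info = -sum(w * math.log2(w) for w in weights)
    if split_info == 0:  # the attribute takes a single value: nothing to split on
        return 0.0
    return (entropy(labels) - conditional) / split_info


def build_tree(rows, labels, attrs):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: one category only -> leaf. (Assumption: also stop when no attribute is left.)
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Steps 2-3: compute the gain ratio of every attribute and keep the best one.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:  # assumption: uninformative split -> leaf
        return majority
    # Step 4: the best attribute becomes the root of this (sub)tree.
    node = {"attr": best, "branches": {}}
    rest = [a for a in attrs if a != best]
    # Steps 5-6: split on each value v, build the subtree Tv, attach it to its branch.
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], rest
        )
    return node  # Step 7


# Toy, made-up rows (not patent data): f = atrial fibrillation, z = normal.
ROWS = [
    {"a_peak": "zero", "ef": "low"},
    {"a_peak": "zero", "ef": "high"},
    {"a_peak": "zero", "ef": "low"},
    {"a_peak": "positive", "ef": "low"},
    {"a_peak": "positive", "ef": "high"},
]
LABELS = ["f", "f", "f", "z", "f"]
TREE = build_tree(ROWS, LABELS, ["a_peak", "ef"])
```

On these toy rows the root splits on `a_peak`, and only the `positive` branch needs a second split on `ef`.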
The invention also relates to application of the prediction decision tree in atrial fibrillation prediction.
Advantageous effects: the invention selects atrial fibrillation prediction indexes more reasonably by means of artificial intelligence and big data processing. Because the indexes are obtained through big data processing, they reflect atrial fibrillation more accurately; using them to evaluate atrial fibrillation reduces missed detection of atrial fibrillation.
Drawings
FIG. 1 is a schematic diagram of a decision tree structure;
FIG. 2 is a schematic illustration of a medical data manuscript;
FIG. 3 is a schematic diagram of a derived Excel table;
FIG. 4 is a schematic representation of cardiac ultrasound properties;
FIG. 5 is a schematic view of the Weka operating interface;
FIG. 6 is a schematic diagram of decision trees each using default values;
FIG. 7 is a schematic diagram of decision tree accuracy;
FIG. 8 is a schematic diagram of a decision tree of 154 factors;
FIG. 9 is a schematic diagram of decision tree accuracy.
Detailed Description
Example 1:
In order to solve the problem of building a decision tree for atrial fibrillation prediction, the invention provides the following technical scheme: a method of constructing an atrial fibrillation prediction decision tree, comprising:
Step 1: if all samples in the data set S belong to the same category, create a leaf node, mark it with the corresponding category label, and stop building the tree; otherwise, go to Step 2;
Step 2: calculate the information gain ratio Gain-rate(A) of every attribute in the data set S;
Step 3: select the attribute A with the maximum information gain ratio;
Step 4: establish the attribute A as the root node of the decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into several subsets according to the different values of the attribute A, execute Steps 1-4 recursively on each subset Sv, and construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add the subtree Tv to the corresponding branch of the decision tree T;
Step 7: when the recursion finishes, the decision tree T is obtained.
Further, the data processing method comprises the following steps: a record whose class label is missing is deleted directly; a missing attribute value is either assigned to a category of its own or replaced with the most common value; to process a continuous attribute, the values are first sorted, each value is used in turn as a threshold to divide the data set, the information gain of each division is calculated, and the threshold with the maximum gain is selected and used to divide the data set.
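The threshold search and the most-common-value replacement described above can be sketched as follows. This is an illustrative sketch only: the function names, the `<= t` versus `> t` split convention, and the sample values are assumptions, not taken from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Use each distinct value as a threshold and keep the one with the largest gain."""
    base = entropy(labels)
    total = len(values)
    best_t, best_gain = None, -1.0
    # The largest value is skipped: it would leave the "> t" side empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - (len(left) / total * entropy(left)
                       + len(right) / total * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def fill_most_common(values):
    """Replace missing entries (None) with the most common observed value."""
    common = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [common if v is None else v for v in values]


# Made-up values for illustration only (z = normal, f = atrial fibrillation).
EF_VALUES = [50, 55, 57, 60, 65, 70]
EF_LABELS = ["z", "z", "z", "f", "f", "f"]
THRESHOLD, GAIN = best_threshold(EF_VALUES, EF_LABELS)
```

Because the made-up labels separate cleanly, the search picks the last value on the `z` side as the threshold.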
Further, the decision tree is pruned as follows:
1) Calculate three misclassification counts: the sum of the numbers of misclassified samples over all leaf nodes of the subtree Tv, recorded as E1; the number of misclassified samples when the subtree Tv is pruned and replaced by a leaf node, recorded as E2; and the number of misclassified samples of the largest branch of the subtree Tv, recorded as E3;
2) Compare them: if E1 is the minimum, do not prune; if E2 is the minimum, prune and replace the subtree Tv with a leaf node; if E3 is the minimum, replace the subtree Tv with its largest branch.
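The three-way comparison can be written as a small decision function. The sketch below assumes the three counts have already been computed; the function name, the returned strings, and the tie-breaking order (prefer the simpler tree when counts are equal) are assumptions, since the text does not say how ties are resolved.

```python
def prune_decision(e1, e2, e3):
    """Choose a pruning action from the three misclassification counts.

    e1: errors summed over all leaves of the subtree Tv (keep the subtree)
    e2: errors if Tv is replaced by a single leaf
    e3: errors if Tv is replaced by its largest branch
    """
    smallest = min(e1, e2, e3)
    # Assumption: on ties, prefer the simplest result (leaf, then graft, then keep).
    if e2 == smallest:
        return "replace_with_leaf"
    if e3 == smallest:
        return "replace_with_largest_branch"
    return "keep_subtree"
```

For example, `prune_decision(5, 4, 3)` selects replacement by the largest branch because E3 is the minimum.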
Further, the splitting attribute is selected according to the information gain ratio.
The formula of the information entropy is as follows:
$$H(S) = -\sum_{i} p(c_i)\,\log_2 p(c_i) \qquad (1)$$
where S represents a data set, $c_i$ represents the i-th class of the data set, and $p(c_i)$ represents the probability that class $c_i$ is selected;
In decision tree division, the information entropy of a given characteristic attribute is generally calculated. If the characteristic attribute A has n different values, it divides the data set S into n small data sets, denoted $s_i$, and the probability of each small data set being selected is $p(s_i)$. From equation (1), each small data set $s_i$ has an information entropy $H(s_i)$, and the information entropy of the characteristic attribute A is calculated as follows:
$$H(A) = \sum_{i=1}^{n} p(s_i)\,H(s_i) \qquad (2)$$
The information gain calculation formula is as follows:
$$\mathrm{Info\_Gain}(A) = H(S) - H(A) \qquad (3)$$
The information gain ratio calculation formula is as follows:
[Equation (4) is an image in the original publication (Figure RE-GDA0001851358430000043) and is not reproduced in this text.]
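For reference, the gain ratio in the standard C4.5 formulation is usually written as shown below. Whether the patent's equation (4) takes exactly this form is an assumption, because its formula image is not available in this text; the symbol SplitInfo comes from the standard formulation and does not appear in the patent.

```latex
\mathrm{SplitInfo}(A) = -\sum_{i=1}^{n} p(s_i)\,\log_2 p(s_i),
\qquad
\text{Gain-rate}(A) = \frac{\mathrm{Info\_Gain}(A)}{\mathrm{SplitInfo}(A)}
```

Dividing by the split information penalizes attributes that break the data into many small subsets, which plain information gain tends to favor.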
Furthermore, the constructed decision tree is adjusted continuously by changing the parameters of the decision tree algorithm, so that the accuracy and the branch attribute values of the constructed tree are optimal. The J48 algorithm exposes 11 modifiable parameters. Default values are kept for binarySplits, debug, saveInstanceData, subtreeRaising, unpruned, and useLaplace; the five parameters confidenceFactor, minNumObj, numFolds, seed, and reducedErrorPruning are modified and verified so as to approach the accurate value for the medical data step by step. The preprocessed data file is loaded into the Weka software, the algorithm is selected, its parameters are modified, and the run produces a result; experiments are performed on the possible values of the various parameters, and the optimal experimental result is finally selected.
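Trying the possible parameter values amounts to enumerating a grid of J48 configurations. The sketch below only builds option lists and does not call Weka. It assumes the usual J48 command-line flags (`-C` confidenceFactor, `-M` minNumObj, `-R` reducedErrorPruning, `-N` numFolds, `-Q` seed), which should be checked against the installed Weka version; the candidate values are illustrative and not taken from the patent.

```python
from itertools import product


def j48_option_grid(confidences, min_objs, fold_counts, seeds):
    """Yield one J48 option list per parameter combination.

    Two families are kept separate on the assumption that confidenceFactor
    applies to the default pruning mode, while numFolds and seed only matter
    when reduced-error pruning (-R) is switched on.
    """
    for c, m in product(confidences, min_objs):
        yield ["-C", str(c), "-M", str(m)]
    for m, n, q in product(min_objs, fold_counts, seeds):
        yield ["-R", "-N", str(n), "-Q", str(q), "-M", str(m)]


# Illustrative candidate values (assumptions, not the patent's settings).
GRID = list(j48_option_grid(
    confidences=[0.1, 0.25, 0.5],
    min_objs=[2, 5, 10],
    fold_counts=[3, 5],
    seeds=[1],
))
```

Each option list can then be passed to a Weka run, and the resulting accuracies compared to pick the best configuration.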
The experiment is divided into two branches:
In the first branch, the experiment is performed on several cardiac ultrasound attributes; the last column is the class label, where f is atrial fibrillation and z is normal, and every parameter of the algorithm uses its default value. According to the resulting decision tree, three of the cardiac ultrasound attributes, A peak, ef, and last, have a large influence on atrial fibrillation. Specifically, the root node of the decision tree is A peak, which has the maximum information gain ratio and whose normal range is 41 to 87. The first branch of the tree is a <= 0, where a is the value of the A peak, and it classifies the patient as having atrial fibrillation; because the data contain no negative values, this means that the patient is judged to have atrial fibrillation when a equals 0. When a is greater than 0, the ef attribute must be considered next, and when the ef value is less than 58 the patient is judged to be normal;
In the second branch, the characteristic indexes of the patients are collected, with the test items of blood routine, thyroid function, coagulation profile, liver function, blood lipids, and cardiac ultrasound as attribute columns; the last column is the class label, where f is atrial fibrillation and z is normal, and every parameter of the algorithm uses its default value. According to the resulting decision tree, the attributes that act on atrial fibrillation are XGN (cardiac function grade), A peak (cardiac ultrasound index), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration), and E peak (cardiac ultrasound index). Specifically, the root node of the decision tree is XGN. When the XGN grade is greater than 1, the patient is judged to have atrial fibrillation; when it is less than or equal to 1, the A peak is considered. When the A peak is 0, FS is considered: when FS is greater than 0, the patient is judged to have atrial fibrillation; otherwise FJB is considered. When FJB is less than or equal to 0, LVPWD is considered. When LVPWD is less than or equal to 9, the value of EF (EF1 in the decision tree) is considered: when EF is less than or equal to 57, the patient is judged to be normal; otherwise the patient is judged to have atrial fibrillation. Backtracking to the right branch of LVPWD, when LVPWD is greater than 9, the value of FDMB1 is considered: when it is less than or equal to 101, the patient is judged to have atrial fibrillation; otherwise LAD is considered, and when LAD is less than or equal to 50 the patient is judged to have atrial fibrillation, otherwise normal. Backtracking to the right branch of FJB, when FJB is greater than 0, GXB is considered: when GXB is less than or equal to 2, the patient is judged to be normal; otherwise the patient is judged to have atrial fibrillation. Backtracking to the right branch of FS, when FS is greater than 0, the patient is judged to have atrial fibrillation. Backtracking to the right branch of the A peak, when A is greater than 0, TNB is considered: when TNB is less than or equal to 0, the patient is judged to be normal; when FDMB is greater than 0, the patient is judged to be normal; otherwise the value of E is considered. When E is greater than 72, the patient is judged to have atrial fibrillation; otherwise the value of MCHC is considered: when MCHC is less than or equal to 338, the patient is judged to have atrial fibrillation; otherwise the patient is judged to be normal. This completes the traversal of the whole decision tree.
Example 2:
the present disclosure adopts a data mining method to establish a standard decision tree model for medical reference.
The standard terminology involved therein is explained:
Data Mining (DM) extracts valuable patterns, associations, and knowledge from massive data that come from varied sources and have accumulated over a long time. It mines the data and discovers knowledge without any prior hypothesis. Data mining is a technique for finding regularities in a large amount of data by analyzing each record, and it consists mainly of three steps: data preparation, rule searching, and rule representation. Data mining tasks include association analysis, cluster analysis, classification analysis, anomaly analysis, specific group analysis, evolution analysis, and the like. The present method performs classification analysis to determine whether a hypertensive patient has atrial fibrillation.
The decision tree algorithm is a typical algorithm used for classification prediction in the field of data mining, and has low computational complexity and intuitive output results. The present invention introduces a decision tree algorithm into predicting the probability of having atrial fibrillation in a hypertensive patient.
The decision tree is a basic classification and regression method, and the invention mainly adopts a classification decision tree. The decision tree model has a tree structure and, in a classification problem, represents the process of classifying instances based on their features. Compared with naive Bayes classification, a decision tree has the advantage that no domain knowledge or parameter setting is needed during construction, so in practice it is better suited to exploratory knowledge discovery. Decision tree algorithms include the ID3 algorithm, the C4.5 algorithm, and the CART algorithm. The invention uses the C4.5 algorithm for its experiments. C4.5 is mainly an improvement on ID3: because ID3 selects attributes by information gain, it tends to prefer attributes that have many values. To solve this problem, the C4.5 algorithm replaces the information gain with the information gain ratio. The decision tree is a tree structure composed of a root node, a series of internal nodes, and leaf nodes; each node has only one parent node and two or more child nodes, and nodes are connected by branches. Each internal node of the decision tree corresponds to a non-category attribute or a combination of attributes, each edge corresponds to a possible value of that attribute, and each leaf node corresponds to a category attribute value. An example of a decision tree structure is shown in FIG. 1.
Given these properties, the decision tree is well suited to classifying the indexes used for atrial fibrillation prediction, as follows:
C4.5 algorithm flow
Step 1: if all samples in the data set S belong to the same category, create a leaf node, mark it with the corresponding category label, and stop building the tree; otherwise, go to Step 2;
Step 2: calculate the information gain ratio Gain-rate(A) of every attribute in the data set S;
Step 3: select the attribute A with the maximum information gain ratio;
Step 4: establish the attribute A as the root node of the decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into several subsets according to the different values of the attribute A, execute Steps 1-4 recursively on each subset Sv, and construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add the subtree Tv to the corresponding branch of the decision tree T;
Step 7: when the recursion finishes, the decision tree T is obtained.
Numerical value processing: training data with missing attribute values can be processed. A record whose class label is missing is deleted directly; a missing attribute value is either assigned to a category of its own or replaced with the most common value. Continuous-valued attributes can also be processed: the values are first sorted, each value is used in turn as a threshold to divide the data set, the information gain of each division is calculated, and the threshold with the maximum gain is selected and used to divide the data set.
Pruning: the generation process above yields a decision tree built from a training data set, but the accuracy and other performance measures of that tree still have to be evaluated. Because the tree is based purely on the training data, it may overfit. To solve this problem, the decision tree must be pruned. The basic idea of pruning is to remove the parts of the tree (subtrees) that do not contribute to classification accuracy on unseen test samples. There are two improved recursive methods for producing a simpler, more easily understood tree: pre-pruning and post-pruning.
Pre-pruning: a decision is made before each branching step to prevent the data set from generating too many branches. Pruning is performed while the decision tree is being constructed.
Post-pruning: this mainly addresses the influence of noise and prunes redundant branches.
Since the J48 algorithm used in the invention applies post-pruning, the post-pruning method is described in detail here. Post-pruning methods include REP (Reduced Error Pruning), PEP (Pessimistic Error Pruning), MEP (Minimum Error Pruning), CCP (Cost-Complexity Pruning), and the like. The default pruning method for the C4.5 algorithm is REP pruning. The basic idea is as follows:
1) Calculate three misclassification counts: the sum of the numbers of misclassified samples over all leaf nodes of the subtree Tv, recorded as E1; the number of misclassified samples when the subtree Tv is pruned and replaced by a leaf node, recorded as E2; and the number of misclassified samples of the largest branch of the subtree Tv, recorded as E3.
2) Compare them. If E1 is the minimum, do not prune; if E2 is the minimum, prune and replace the subtree Tv with a leaf node; if E3 is the minimum, adopt a "grafting" strategy, i.e. replace the subtree Tv with its largest branch.
Splitting attribute selection: the criterion for selecting the splitting attribute is the fundamental difference between decision tree algorithms. As mentioned above, ID3 selects the splitting attribute by information gain, whereas C4.5 selects it by information gain ratio. Information entropy is the expected value of information; for a data set, it expresses the degree of disorder of the data set. The more categories a data set contains, the greater the corresponding information entropy. The formula is as follows:
$$H(S) = -\sum_{i} p(c_i)\,\log_2 p(c_i) \qquad (1)$$
where S represents a data set, $c_i$ represents the i-th class of the data set, and $p(c_i)$ represents the probability that class $c_i$ is selected;
In decision tree division, the information entropy of a given characteristic attribute is generally calculated. If the characteristic attribute A has n different values, it divides the data set S into n small data sets, denoted $s_i$, and the probability of each small data set being selected is $p(s_i)$. From equation (1), each small data set $s_i$ has an information entropy $H(s_i)$, and the information entropy of the characteristic attribute A is calculated as follows:
$$H(A) = \sum_{i=1}^{n} p(s_i)\,H(s_i) \qquad (2)$$
The information gain calculation formula is as follows:
$$\mathrm{Info\_Gain}(A) = H(S) - H(A) \qquad (3)$$
The information gain ratio calculation formula is as follows:
[Equation (4) is an image in the original publication (Figure RE-GDA0001851358430000082) and is not reproduced in this text.]
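To make the quantities concrete, the sketch below evaluates the entropy of the class distribution reported for the first experiment (178 atrial fibrillation, 182 normal) and the gain ratio of one candidate split. The split counts are hypothetical and chosen only for illustration, and the gain ratio uses the standard C4.5 definition (gain divided by split information), which is assumed rather than read from the patent's equation (4).

```python
import math


def entropy_from_counts(counts):
    """Entropy (base 2) of a distribution given as class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratio_from_counts(partition):
    """Gain ratio of a split given per-branch class counts, e.g. [[af, normal], ...]."""
    class_totals = [sum(column) for column in zip(*partition)]
    total = sum(class_totals)
    h_s = entropy_from_counts(class_totals)                      # H(S)
    h_a = sum(sum(branch) / total * entropy_from_counts(branch)  # H(A)
              for branch in partition)
    split_info = entropy_from_counts([sum(branch) for branch in partition])
    return (h_s - h_a) / split_info if split_info > 0 else 0.0


# Class counts from the first experiment: 178 atrial fibrillation, 182 normal.
H_S = entropy_from_counts([178, 182])

# Hypothetical split (NOT from the patent): one branch holds 100 AF and 0 normal,
# the other holds the remaining 78 AF and 182 normal.
RATIO = gain_ratio_from_counts([[100, 0], [78, 182]])
```

Because the two classes are almost balanced, `H_S` is very close to 1 bit, the maximum for a two-class problem.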
Application of the algorithm
Description of data: the data used by the invention were provided by a hospital in Dalian and were measured from hypertensive patients; there are 360 records in total. The laboratory report mainly contains white blood cell count (WBC), absolute granulocyte count (Neu#), NT-proBNP, EF (ejection fraction), LVEF (left ventricular ejection fraction), hypertension grade, whether atrial fibrillation is present, and the like. FIG. 2 shows part of the original data items.
Data preprocessing: the Weka platform runs on csv files, whereas our data file is an Excel table, so the first step is to convert the data file into a csv file. Indexes in the hospital data that are not considered by the invention are filtered out, leaving only the objects of study. Abnormal data are deleted, and missing attribute values are handled automatically by the J48 algorithm. Because the 154-dimensional data set is large, the invention also extracts, on the basis of relevant medical standards, the 11-dimensional data of the cardiac ultrasound indexes, such as ef (ejection fraction), A peak, and E peak, to carry out a more specific experiment, as shown in FIG. 3.
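The column filtering step can be sketched with Python's standard `csv` module. The column names (`WBC`, `EF`, `A`, `label`) and the sample rows are hypothetical; the sketch keeps only the chosen attribute columns plus the class label, drops records without a valid class label, and leaves empty attribute cells untouched so that the downstream tool can treat them as missing.

```python
import csv
import io


def keep_columns(csv_text, wanted, label="label"):
    """Return csv text containing only `wanted` columns and the class label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted + [label], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Records whose class label is missing are deleted (f = AF, z = normal).
        if row[label] not in ("f", "z"):
            continue
        writer.writerow({name: row[name] for name in wanted + [label]})
    return out.getvalue()


# Hypothetical input: three records, one with an empty EF cell, one without a label.
RAW = "WBC,EF,A,label\n6.1,60,55,z\n7.0,,0,f\n5.5,58,40,\n"
FILTERED = keep_columns(RAW, ["EF", "A"])
```

The same function could be applied to the exported csv file by reading its contents as text first.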
Operating environment: WEKA, the Waikato Environment for Knowledge Analysis, is free, non-commercial, open-source machine learning and data mining software based on Java; its main developers are from New Zealand. The official website is http://weka.wikispace.com/. As a public data mining workbench, WEKA integrates a large number of machine learning algorithms that can carry out data mining tasks, including data preprocessing, association analysis, classification, regression, clustering, and visualization on an interactive interface. WEKA can be embedded into MyEclipse, which facilitates secondary development; the latest data mining algorithms can be modified or added, and the mining results can be displayed in various forms so that users can find the required knowledge clearly and conveniently. Before mining, JDBC must be configured and the database driver loaded. The Weka control platform and operating interface are shown in FIG. 4. To use the Weka software, the control platform is opened and the first option, Explorer, is selected to start an experiment; the interface that opens is shown in the operating interface diagram. In the first step, the data for the experiment are selected through the open file option, and then different experiments are performed on the data as required, using options such as data preprocessing, classification algorithms, clustering algorithms, and association rules. According to the experimental requirements, the method selects the J48 algorithm among the classification algorithms. The software operating interface is shown in FIG. 5.
Decision tree construction: a decision tree for a given data set is not unique, and unfortunately constructing an optimal decision tree is an NP-hard problem. How to construct a good decision tree is therefore the focus of research. The invention adjusts the constructed decision tree continuously by changing the parameters of the decision tree algorithm, so that the accuracy and the branch attribute values of the constructed tree are optimal. The J48 algorithm exposes 11 modifiable parameters. Default values are kept for binarySplits, debug, saveInstanceData, subtreeRaising, unpruned, and useLaplace, and the five parameters confidenceFactor, minNumObj, numFolds, seed, and reducedErrorPruning are modified. The experiment mainly modifies and verifies these five remaining parameters to approach the accurate value for the medical data step by step, making the decision tree more accurate and more practical. The Weka software works like a black box: the processed data file is loaded into Weka, the desired algorithm is selected, its parameters are modified, and the run produces a result. All possible values of the various parameters are tested, and the optimal result is finally selected, as follows. The experiment is divided into two branches. The first performs the experiment on 11 cardiac ultrasound attributes; the last column is the class label, where f is atrial fibrillation and z is normal. The experimental data contain 360 records in total, 186 from men and 174 from women. There are 178 patients with atrial fibrillation and 182 normal subjects (normal here means patients with hypertension alone). Default values are used for the parameters of the algorithm, and the experimental result is shown in FIG. 6.
The decision tree shows that, among the cardiac ultrasound attributes, three have a great influence on atrial fibrillation: A peak, ef and last. In particular, the root node of the decision tree is the A peak, the attribute with the largest information gain ratio; its normal range is 41 to 87, and its disappearance means that atrial fibrillation has occurred. In the first branch, when A ≤ 0 the patient has atrial fibrillation; since the data contain no negative values, this means that the patient can be judged to have atrial fibrillation when A = 0. When A is greater than 0, the ef attribute must be considered, and when the ef value is smaller than 58 the patient is judged to be normal. The rest of the decision tree is read in the same way. The accuracy screenshot of the decision tree includes the accuracy, error rate, Kappa value and so on, all of which can be used to evaluate the quality of the algorithm. The invention mainly uses accuracy as the basis for judgment. As can be seen from fig. 7, the accuracy is 83.0556%.
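As a quick arithmetic check, the reported accuracy is consistent with 299 of the 360 instances being classified correctly. The count 299 is inferred from the percentage here and is not stated in the source; the helper names are illustrative only.

```python
def implied_correct(n_instances, accuracy_percent):
    """Number of correct classifications implied by a reported accuracy."""
    return round(n_instances * accuracy_percent / 100)


def accuracy_percent(n_correct, n_instances):
    """Accuracy as a percentage, rounded to four decimals as in the report."""
    return round(100 * n_correct / n_instances, 4)


# 360 instances with a reported accuracy of 83.0556%
n_correct = implied_correct(360, 83.0556)
print(n_correct, accuracy_percent(n_correct, 360))
```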
The second part of the experimental data contains 308 records in total. The attribute columns are 154 characteristic indexes of the patients, including test items from routine blood tests, thyroid function, coagulation profile, liver function, blood lipids, cardiac ultrasound and so on; the last column is the class label, where f is atrial fibrillation and z is normal. The data include 162 men and 146 women, with 128 atrial fibrillation patients and 180 normal patients. As above, default values are used for the parameters of the algorithm, and the experimental results are shown in fig. 8.
The decision tree shows that, among the 154 attributes, those contributing to atrial fibrillation are XGN (cardiac function level), A peak (cardiac ultrasound index), FS (rheumatic heart valve disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound index). Some of these 13 indexes have not yet attracted sufficient attention in medicine, for example the effect of hemoglobin concentration on atrial fibrillation.
Specifically, the root node of the decision tree is XGN, which indicates that this index has a great effect on the occurrence of atrial fibrillation. When the XGN level is less than or equal to 1, the A peak is considered next. When the A peak is 0, FS is considered: when FS is greater than 0, the patient is judged to have atrial fibrillation; otherwise FJB is considered. When FJB is less than or equal to 0, LVPWD is considered. When LVPWD is less than or equal to 9, the value of EF (EF1 in the decision tree) is considered: when EF is less than or equal to 57, the patient is judged to be normal; otherwise the patient is judged to have atrial fibrillation. Backtracking to the right branch of LVPWD, when LVPWD is greater than 9, the value of FDMB1 is considered: when it is less than or equal to 101, the patient is judged to have atrial fibrillation; otherwise LAD is considered, and when LAD is less than or equal to 50 the patient is judged to have atrial fibrillation, otherwise to be normal. Backtracking to the right branch of FJB, when FJB is greater than 0, GXB is considered: when GXB is less than or equal to 2 the patient is judged to be normal, otherwise to have atrial fibrillation. Backtracking to the right branch of the A peak, when A is greater than 0, TNB is considered: when TNB is less than or equal to 0 the patient is judged to be normal; otherwise FDMB is considered, and when FDMB is greater than 0 the patient is judged to be normal; otherwise the value of E is considered, and when E is greater than 72 the patient is judged to have atrial fibrillation; otherwise the value of MCHC is considered, and when MCHC is less than or equal to 338 the patient is judged to have atrial fibrillation; otherwise the value of E is considered again, and so on until the whole decision tree has been traversed. The accuracy of this model is 85.0649%.
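The branches walked through above can be written down as nested rules. The following is a minimal sketch that covers only the branches described in the text; it returns `None` where the text gives no rule (the XGN > 1 branch and the deeper E-peak tests). The function name and the dictionary keys are illustrative assumptions, not part of the original model files.

```python
AF, NORMAL = "f", "z"  # class labels used in the data: f = atrial fibrillation, z = normal


def classify(p):
    """Classify one patient record `p` (attribute name -> numeric value).

    Only the branches spelled out in the description are encoded; None is
    returned for paths the description does not detail.
    """
    if p["XGN"] > 1:
        return None  # this branch is not described in the text
    if p["A"] <= 0:  # A peak absent
        if p["FS"] > 0:
            return AF
        if p["FJB"] <= 0:
            if p["LVPWD"] <= 9:
                return NORMAL if p["EF"] <= 57 else AF
            if p["FDMB1"] <= 101:
                return AF
            return AF if p["LAD"] <= 50 else NORMAL
        return NORMAL if p["GXB"] <= 2 else AF
    # A peak present (A > 0)
    if p["TNB"] <= 0:
        return NORMAL
    if p["FDMB"] > 0:
        return NORMAL
    if p["E"] > 72:
        return AF
    if p["MCHC"] <= 338:
        return AF
    return None  # further tests on E are mentioned but not detailed
```

For example, a record with XGN = 1, A = 0, FS = 0, FJB = 0, LVPWD = 8 and EF = 55 follows the left-most path and is labelled normal.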
Based on the above experiments, and considering both the decision tree and its accuracy, the invention selects the tree shown in fig. 8 as the final model. This model takes more factors into account and is more comprehensive. For medical workers it is more concise and clear. The model has also been recognized medically.
To address the lack of a unified, standardized model for predicting atrial fibrillation in the medical field, and the fact that hypertensive patients are more likely than the general population to develop atrial fibrillation, the invention draws on the medical summary of atrial fibrillation prediction and provides a decision-tree-based atrial fibrillation prediction method. With this method, an intuitive and concise decision tree is established for reference in medical research. The model is built on a large amount of real medical data so as to make it as accurate as possible; its accuracy is 85.0649%. In the course of building the model, it is possible to mine not only the potential relations among the medical indexes of hypertensive patients but also which indexes are more likely to lead to atrial fibrillation, some of which have not been studied in depth in medicine. In future work, the first step is to increase the amount of data so that the model generalizes better and over-fitting is avoided. The second step is to use machine learning algorithms to achieve better classification and to establish a practical, standardized decision tree.
Example 3:
In order to solve the problem of selecting indexes that more accurately reflect atrial fibrillation, the invention provides a method for selecting indexes in an atrial fibrillation artificial intelligence experiment, which comprises the following steps:
S1, constructing a decision tree;
S2, adjusting parameters to optimize the decision tree;
S3, performing experiments on the possible values of the parameters, and finally selecting the optimal experimental result, which is used as the main indexes for decision tree prediction.
Further, the main indexes are three attributes of A peak, ef and last in the attributes of the cardiac ultrasound.
Further, the main indexes are XGN (cardiac function grade), peak a (cardiac ultrasound index), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration), peak E (cardiac ultrasound index).
The method for constructing the decision tree is described in Embodiments 1 and 2.
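The construction recited in claim 4 (compute the information gain ratio of every attribute, split on the attribute with the maximum ratio, and recurse on each subset) can be sketched as follows. This is a simplified illustration for categorical attributes only: the threshold splits that J48 uses for numeric attributes, and its pruning, are omitted, and the majority-vote fallback when no attributes remain is an assumption added so the recursion always terminates.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain ratio of splitting on attribute `attr`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a nested dict {attr: {value: subtree}}."""
    if len(set(labels)) == 1:  # step 1: all samples belong to one class
        return labels[0]
    if not attrs:  # assumption: majority vote when no attribute is left
        return Counter(labels).most_common(1)[0][0]
    # steps 2-4: the attribute with the maximum gain ratio becomes the node
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    rest = [a for a in attrs if a != best]
    tree = {best: {}}
    # steps 5-6: build one subtree per value of the chosen attribute
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest
        )
    return tree
```

On a toy data set in which attribute `a` alone determines the class, the sketch selects `a` as the root and produces one leaf per value of `a`.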
The invention also relates to application of the prediction decision tree in atrial fibrillation prediction.
Through artificial intelligence and big data processing, the invention makes a more reasonable selection of atrial fibrillation prediction indexes. The indexes obtained through big data processing reflect atrial fibrillation more accurately; using them to evaluate atrial fibrillation reduces missed detection of atrial fibrillation.
Example 4:
Atrial fibrillation (AF) is one of the most common cardiac arrhythmias in clinical practice. Its prevalence in the general population is about 0.4% to 1.0% and increases with age; studies have shown that the prevalence is only 0.1% in people under 55 years and up to 9% in people over 80 years. The common clinical complication of atrial fibrillation is systemic thromboembolism. Stroke is the main embolic event caused by atrial fibrillation and is also the complication with the highest disability rate among atrial fibrillation patients. Compared with patients without atrial fibrillation, the incidence of stroke in atrial fibrillation patients is increased 5-fold and the mortality 2-fold, with ischemic stroke being the leading cause of the increased mortality. Atrial fibrillation is an independent risk factor for ischemic stroke, and its incidence increases with age. Other hazards of atrial fibrillation include heart failure due to loss of the atrial booster pump function, sudden death due to electrical disturbance, irregular and rapid ventricular rate, and other physical and psychological disorders.
Accurately predicting the occurrence of atrial fibrillation and applying effective preventive measures are important links in the treatment of atrial fibrillation. At present, the diagnosis of atrial fibrillation is based mainly on the electrocardiogram and its extensions, such as the dynamic electrocardiogram, the monitoring electrocardiogram and the implantable long-term electrocardiogram. In recent years, great progress has been made by combining electrocardiogram technology with artificial intelligence. However, although diagnosis based on the traditional, century-old electrocardiogram technology has high accuracy, it also has a high missed-diagnosis rate, particularly for paroxysmal atrial fibrillation with infrequent attacks and for asymptomatic atrial fibrillation, whose harm is no less than that of symptomatic atrial fibrillation. The present technology develops a new atrial fibrillation diagnosis system based on clinical big data combined with artificial intelligence (AI), intended to replace the traditional electrocardiogram diagnosis technology, or at least to serve as a screening diagnosis system for patients at high risk of atrial fibrillation before electrocardiogram examination and as an important supplement to the classical electrocardiogram examination.
Method and technology: the research uses the information integration platform of the applicant's affiliated hospital, the Affiliated Zhongshan Hospital of Dalian University, to analyze all clinical, imaging and laboratory examination data of hypertensive patients, and builds an automatic intelligent diagnosis model, such as a decision tree model, through big data processing, for example the decision tree method described in Embodiment 3. The invention closely combines clinical big data with AI; through big data processing and AI self-learning it is expected to open a new breakthrough in predicting the occurrence of AF and to provide an important diagnostic means for atrial fibrillation prevention strategies.
Building the AI model: using the information integration platform of the applicant's affiliated hospital, the Affiliated Zhongshan Hospital of Dalian University, big data processing is applied to the clinical data (medical history, physical examination, physical and chemical examinations and so on) of hypertensive patients registered at the hospital between January 2010 and 2017, and a preliminary diagnosis model is established.
AI model verification: the relevant parameter data of hypertensive patients hospitalized and diagnosed in our hospital are input into the computer running the preliminary AI model, and the diagnostic ability of the AI model (including prediction sensitivity, specificity, coincidence rate and prediction efficiency) is checked.
Improving the AI model: the model is continuously corrected through the AI's own deep learning ability and is gradually developed and perfected.
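The verification step could compute its measures from predicted and actual class labels along the following lines. This is a hedged sketch: the function name is illustrative, and reading "coincidence rate" as the overall agreement between prediction and diagnosis (that is, accuracy) is an assumption, since the source does not define it.

```python
def diagnostic_metrics(y_true, y_pred, positive="f"):
    """Sensitivity, specificity and coincidence rate for a binary diagnosis.

    `positive` is the label for atrial fibrillation ("f" in the data sets
    described earlier). A metric is None when its denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        # assumption: coincidence rate = share of cases where the model
        # agrees with the clinical diagnosis
        "coincidence_rate": (tp + tn) / len(pairs) if pairs else None,
    }
```

For instance, with true labels f, f, z, z and predictions f, z, z, z, one of two atrial fibrillation cases is detected and both normal cases are recognized.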
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (5)

1. A method for selecting indexes in an atrial fibrillation artificial intelligence experiment is characterized by comprising the following steps:
S1, constructing a decision tree;
S2, adjusting parameters to optimize the decision tree;
S3, performing experiments on the possible values of the parameters, and finally selecting the optimal experimental result, which is used as the main indexes for decision tree prediction.
2. The method of selecting an indicator for atrial fibrillation artificial intelligence experiments according to claim 1, wherein: the main indexes are three attributes of A peak, ef and last in the attributes of the cardiac ultrasound.
3. The method of selecting an indicator for atrial fibrillation artificial intelligence experiments according to claim 1, wherein: the main indexes are XGN (cardiac function grade), A peak (cardiac ultrasonic index), FS (rheumatic heart valve disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasonic index), EF (cardiac ultrasonic index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasonic index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasonic index).
4. The method of selecting an indicator for atrial fibrillation artificial intelligence experiments according to claim 1, wherein: the method for constructing the decision tree comprises the following steps:
Step 1: if all samples in the data set S belong to the same category, create a leaf node, mark it with the corresponding category label, and stop building the tree; otherwise, go to step 2;
Step 2: calculate the information gain ratio Gain-rate(A) of every attribute in the data set S;
Step 3: select the attribute A with the maximum information gain ratio;
Step 4: set the attribute A as the root node of the decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into subsets according to the different values of the attribute A, and execute steps 1 to 4 recursively on each subset Sv to construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add the subtree Tv to the corresponding branch of the decision tree T;
Step 7: when the loop ends, the decision tree T is obtained.
5. An application of a prediction decision tree in atrial fibrillation prediction.
CN201811068302.7A 2018-09-13 2018-09-13 Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction Pending CN110895972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811068302.7A CN110895972A (en) 2018-09-13 2018-09-13 Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction

Publications (1)

Publication Number Publication Date
CN110895972A true CN110895972A (en) 2020-03-20

Family

ID=69785537

Country Status (1)

Country Link
CN (1) CN110895972A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150164349A1 (en) * 2013-12-12 2015-06-18 Alivecor, Inc. Methods and systems for arrhythmia tracking and scoring
CN107610771A (en) * 2017-08-23 2018-01-19 上海电力学院 A kind of medical science Testing index screening technique based on decision tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
范洁, 杨岳湘: "决策树后剪枝算法的研究", 湖南广播电视大学学报, no. 01, 25 March 2005 (2005-03-25), pages 54 - 56 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination