CN110895972A - Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction - Google Patents
- Publication number
- CN110895972A (application CN201811068302.7A)
- Authority
- CN
- China
- Prior art keywords
- atrial fibrillation
- decision tree
- prediction
- selecting
- attribute
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Biomedical Technology (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Pathology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
Abstract
A method for selecting indexes through an atrial fibrillation artificial intelligence experiment, and the application of a prediction decision tree in atrial fibrillation prediction, belong to the field of data processing and aim to solve the problem of selecting indexes that more accurately reflect atrial fibrillation. The method comprises the following steps: S1, constructing a decision tree; S2, adjusting parameters to optimize the decision tree; and S3, performing experiments on the possible values of each parameter, and finally selecting the optimal experimental result as the main indexes for decision tree prediction.
Description
Technical Field
The invention belongs to the field of data processing, and relates to a method for constructing an atrial fibrillation prediction decision tree and a method for selecting indexes in an atrial fibrillation artificial intelligence experiment.
Background
Atrial fibrillation (AF) is a supraventricular tachyarrhythmia characterized by rapid, chaotic atrial electrical activity. On an electrocardiogram, atrial fibrillation mainly appears as the disappearance of P waves, which are replaced by irregular fibrillatory waves, and as absolutely irregular RR intervals (when atrioventricular conduction is present). This is currently the main basis for diagnosing atrial fibrillation in medicine. According to the duration of episodes, atrial fibrillation is medically classified into paroxysmal AF, persistent AF, long-standing persistent AF and permanent AF. The detailed classification is shown in Table 1.1.
TABLE 1.1 detailed classification of atrial fibrillation in medicine
Atrial fibrillation is a very common clinical arrhythmia. Its incidence in China is 0.5%-1%, and it rises with age. The risk of atrial fibrillation in hypertensive patients is 1.7 times higher than in normotensive patients, and hypertension currently accounts for 33% of atrial fibrillation cases. Given the high incidence of atrial fibrillation among hypertensive patients, some even regard atrial fibrillation as another manifestation of hypertensive target-organ damage. However, there is currently no good clinical index for predicting the occurrence of AF in hypertensive patients. In addition, some patients with atrial fibrillation have no obvious clinical symptoms and are therefore unknowingly exposed to the risk of various critical diseases; by the time clinical symptoms appear or a disease strikes suddenly, organic cardiovascular lesions have often already developed, seriously affecting the patient's health and even threatening life. It is therefore very important to study the probability of atrial fibrillation in the hypertensive population.
Many methods for predicting atrial fibrillation currently exist. In the medical field, they start from the treatment of atrial fibrillation: internationally, the CHA2DS2-VASc score (hypertension, age, diabetes, stroke, vascular disease, sex, congestive heart failure) and the HATCH score (hypertension, age, transient ischemic attack or stroke, chronic obstructive pulmonary disease, heart failure) are used to predict atrial fibrillation, but both scores have various limitations that leave the prediction method non-standardized and the prediction results inaccurate. In the computer field, it is common to use the patient's electrocardiogram, identify the P waves and analyze how the distribution of RR intervals varies over time, among other factors, to judge whether the patient has atrial fibrillation, using statistical and machine-learning algorithms. Other approaches detect characteristic indexes of the body with a smart watch, scan the face with a smartphone to predict from facial color, or, even for asymptomatic patients, record the patient's heart rhythm directly with a Holter monitor. These approaches still lack standardization and have no unified standard.
Disclosure of Invention
To solve the problem of selecting indexes that more accurately reflect atrial fibrillation, the invention provides the following scheme:
a method for selecting indexes in an atrial fibrillation artificial intelligence experiment, comprising the following steps:
S1, constructing a decision tree;
S2, adjusting parameters to optimize the decision tree;
S3, performing experiments on the possible values of each parameter, and finally selecting the optimal experimental result as the main indexes for decision tree prediction.
Further, the main indexes are the three cardiac ultrasound attributes A peak, ef and last.
Further, the main indexes are XGN (cardiac function grade), A peak (cardiac ultrasound index), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound index).
Further, the method for constructing the decision tree is as follows:
Step 1: if all samples in data set S belong to the same category, create a leaf node, mark it with that category label, and stop building the tree; otherwise, go to Step 2;
Step 2: calculate the information gain ratio Gain_ratio(A) of every attribute A in data set S;
Step 3: select the attribute A with the largest information gain ratio;
Step 4: make attribute A the root node of decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into subsets according to the different values of attribute A, and recursively execute Steps 1-4 on each subset Sv to construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add subtree Tv to the corresponding branch of decision tree T;
Step 7: end the loop to obtain decision tree T.
The invention also relates to application of the prediction decision tree in atrial fibrillation prediction.
Beneficial effects: through artificial intelligence and big-data processing, the invention makes a more reasonable selection of atrial fibrillation prediction indexes. Because the indexes are obtained through big-data processing, they reflect atrial fibrillation more accurately, and using them to evaluate atrial fibrillation reduces missed detections of atrial fibrillation.
Drawings
FIG. 1 is a schematic diagram of a decision tree structure;
FIG. 2 is a schematic illustration of the original medical data;
FIG. 3 is a schematic diagram of the exported Excel table;
FIG. 4 is a schematic representation of the cardiac ultrasound attributes;
FIG. 5 is a schematic view of the Weka operating interface;
FIG. 6 is a schematic diagram of the decision tree built with all parameters at their default values;
FIG. 7 is a schematic diagram of decision tree accuracy;
FIG. 8 is a schematic diagram of a decision tree of 154 factors;
FIG. 9 is a schematic diagram of decision tree accuracy.
Detailed Description
Example 1:
To solve the problem of building a decision tree for atrial fibrillation prediction, the invention provides the following technical scheme: a method of constructing an atrial fibrillation prediction decision tree, comprising:
Step 1: if all samples in data set S belong to the same category, create a leaf node, mark it with that category label, and stop building the tree; otherwise, go to Step 2;
Step 2: calculate the information gain ratio Gain_ratio(A) of every attribute A in data set S;
Step 3: select the attribute A with the largest information gain ratio;
Step 4: make attribute A the root node of decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into subsets according to the different values of attribute A, and recursively execute Steps 1-4 on each subset Sv to construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add subtree Tv to the corresponding branch of decision tree T;
Step 7: end the loop to obtain decision tree T.
Further, the data are processed as follows: records with a missing class label are deleted directly; a missing attribute value is either merged into a class or replaced with the most common value of that attribute. To handle a continuous attribute, its values are first sorted; each value is then used in turn as a threshold to split the data set, the information gain of each split is calculated, and the threshold with the largest gain is selected and used to split the data set.
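The missing-value rules above can be sketched in Python as follows. This is a minimal illustration, assuming each record is a dict mapping attribute names to string values (None when missing) and storing the class label under a hypothetical key `"label"`; of the two options for missing attribute values, only replacement with the most common value is shown.

```python
from collections import Counter
from typing import Optional

Row = dict[str, Optional[str]]


def clean_records(rows: list[Row], label_key: str = "label") -> list[Row]:
    """Drop rows whose class label is missing, then fill each missing
    attribute value with that attribute's most common observed value."""
    # Rule 1: records without a class label are deleted outright (copies are kept,
    # so the caller's rows are not modified).
    labelled = [dict(r) for r in rows if r.get(label_key) is not None]
    if not labelled:
        return []
    attributes = [k for k in labelled[0] if k != label_key]
    for attr in attributes:
        observed = [r[attr] for r in labelled if r.get(attr) is not None]
        if not observed:
            continue  # nothing to impute from; leave the column as it is
        most_common = Counter(observed).most_common(1)[0][0]
        # Rule 2: replace every missing value with the most common value.
        for r in labelled:
            if r.get(attr) is None:
                r[attr] = most_common
    return labelled
```

Merging missing values into their own class would instead replace `None` with a placeholder category such as `"missing"`.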
Further, the decision tree is pruned as follows:
1) Calculate three counts of misclassified samples: E1, the sum of the misclassified samples over all leaf nodes of subtree Tv; E2, the number of misclassified samples when Tv is pruned and replaced by a leaf node; and E3, the number of misclassified samples when Tv is replaced by its largest branch;
2) Compare the three counts: if E1 is the smallest, do not prune; if E2 is the smallest, prune and replace Tv with a leaf node; if E3 is the smallest, replace Tv with its largest branch.
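The three-way comparison can be written as a small function. This is a sketch only; the tie-breaking order (prefer a leaf, then the largest branch, then keeping the subtree, i.e. prefer the simpler tree) is our assumption, since the text does not say what happens when two counts are equal.

```python
def prune_decision(e1: int, e2: int, e3: int) -> str:
    """Choose the pruning action for subtree Tv from three error counts.

    e1: errors summed over Tv's leaves (keep the subtree),
    e2: errors if Tv is replaced by a single leaf,
    e3: errors if Tv is replaced by its largest branch."""
    smallest = min(e1, e2, e3)
    # Assumed tie-break: favour the simpler tree.
    if e2 == smallest:
        return "replace_with_leaf"
    if e3 == smallest:
        return "replace_with_largest_branch"
    return "keep_subtree"
```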
Further, the splitting attribute is selected according to the information gain ratio:
The information entropy is calculated as:
$$H(S) = -\sum_{i=1}^{m} p(c_i)\log_2 p(c_i) \qquad (1)$$
where S is the data set, c_i is the i-th of its m classes, and p(c_i) is the probability that class c_i is selected.
In decision tree splitting, the information entropy of a feature attribute is generally calculated. If feature attribute A has n distinct values, it divides data set S into n subsets s_1, ..., s_n, where p(s_i) is the probability that subset s_i is selected. By equation (1), each subset s_i has information entropy H(s_i), and the information entropy of feature attribute A is:
$$H(A) = \sum_{i=1}^{n} p(s_i)\,H(s_i) \qquad (2)$$
The information gain is calculated as:
$$\mathrm{Info\_Gain}(A) = H(S) - H(A) \qquad (3)$$
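The information gain ratio is calculated as:
$$\mathrm{Gain\_ratio}(A) = \frac{\mathrm{Info\_Gain}(A)}{\mathrm{SplitInfo}(A)}, \qquad \mathrm{SplitInfo}(A) = -\sum_{i=1}^{n} p(s_i)\log_2 p(s_i) \qquad (4)$$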
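The entropy, information gain and gain ratio calculations above translate directly into Python. The sketch assumes base-2 logarithms, estimates probabilities by frequency, and takes an attribute column and a label column as parallel sequences; the function names are illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(S) = -sum_i p(c_i) * log2 p(c_i), with p(c_i) taken as class frequency."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def attribute_entropy(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(A) = sum_i p(s_i) * H(s_i), where s_i holds the rows with the i-th value of A."""
    n = len(labels)
    subsets: dict = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    return sum(len(s) / n * entropy(s) for s in subsets.values())


def info_gain(values, labels) -> float:
    """Info_Gain(A) = H(S) - H(A)."""
    return entropy(labels) - attribute_entropy(values, labels)


def gain_ratio(values, labels) -> float:
    """Info_Gain(A) / SplitInfo(A); SplitInfo(A) = -sum_i p(s_i) log2 p(s_i)
    is the entropy of the attribute's own value distribution."""
    split_info = entropy(values)
    return 0.0 if split_info == 0 else info_gain(values, labels) / split_info
```

For example, with 4 samples labelled `"f"` and 6 labelled `"z"`, split evenly by an attribute so that one half holds 4 f and 1 z, `gain_ratio` comes out at about 0.610.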
Furthermore, the constructed decision tree is adjusted repeatedly by changing the parameters of the decision tree algorithm, so that the accuracy and the branch attribute values of the tree are optimal. The J48 algorithm exposes 11 modifiable parameters: binarySplits, debug, saveInstanceData, subtreeRaising, unpruned and useLaplace are kept at their default values, while the five parameters confidenceFactor, minNumObj, numFolds, seed and reducedErrorPruning are modified and verified so as to approximate the medical data ever more accurately. The preprocessed data files are loaded into the Weka software, the algorithm is selected, its parameters are modified and the results are recorded; experiments are performed on the possible values of each parameter, and the best experimental result is finally selected.
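The parameter experiments can also be scripted rather than run by hand in the Weka interface. The sketch below rests on several assumptions: it maps the five tuned parameters to J48 command-line flags as we understand them (-C confidenceFactor, -M minNumObj, -R reducedErrorPruning, -N numFolds, -Q seed), the candidate values are hypothetical, and `evaluate` is a hypothetical callback that would run Weka and return an accuracy. Our understanding is that J48 rejects -C together with -R, and that numFolds and seed only matter for reduced-error pruning, so the two settings are generated separately.

```python
from itertools import product
from typing import Callable


def j48_options(confidence_factor: float = 0.25, min_num_obj: int = 2,
                reduced_error_pruning: bool = False, num_folds: int = 3,
                seed: int = 1) -> list[str]:
    """Build J48 command-line options for the five tuned parameters.

    Defaults follow J48's defaults; the other six parameters are omitted,
    i.e. kept at their defaults as in the text."""
    opts = ["-M", str(min_num_obj)]
    if reduced_error_pruning:
        # numFolds and seed only take effect with reduced-error pruning.
        opts += ["-R", "-N", str(num_folds), "-Q", str(seed)]
    else:
        # confidenceFactor only applies to the default (non-REP) pruning.
        opts += ["-C", str(confidence_factor)]
    return opts


def grid_search(evaluate: Callable[[list[str]], float]) -> tuple[float, list[str]]:
    """Score every candidate option list with `evaluate` and return the best.

    `evaluate` is hypothetical, e.g. a wrapper that runs
    java -cp weka.jar weka.classifiers.trees.J48 -t data.arff <options>
    and parses the reported accuracy. Candidate values are illustrative."""
    candidates = [j48_options(confidence_factor=cf, min_num_obj=m)
                  for cf, m in product([0.1, 0.25, 0.5], [2, 5, 10])]
    candidates += [j48_options(min_num_obj=m, reduced_error_pruning=True,
                               num_folds=folds, seed=s)
                   for folds, s, m in product([3, 5], [1, 2], [2, 5, 10])]
    return max((evaluate(opts), opts) for opts in candidates)
```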
The experiment is divided into two branches:
In the first branch experiment, multiple cardiac ultrasound attributes are used as attribute columns; the last column is the class label, where f denotes atrial fibrillation and z denotes normal, and every parameter of the algorithm is left at its default value. According to the resulting decision tree, three of the cardiac ultrasound attributes, A peak, ef and last, have the greatest influence on atrial fibrillation. Specifically, the root node of the tree is A peak, which has the largest information gain ratio; its normal range is 41 to 87. The first branch of the tree is a <= 0, where a is the value of A peak, and it leads to atrial fibrillation; because the data contain no negative values, the patient is judged to have atrial fibrillation when a = 0. When a > 0, the ef attribute must be considered next, and when the ef value is less than 58, the patient is judged to be normal.
In the second branch experiment, the patients' characteristic indexes are collected as attribute columns, covering the test items of routine blood tests, thyroid function, coagulation profile, liver function, blood lipids and cardiac ultrasound; the last column is the class label (f for atrial fibrillation, z for normal), and every parameter of the algorithm is left at its default value. According to the resulting decision tree, the attributes that act on atrial fibrillation are XGN (cardiac function grade), A peak (cardiac ultrasound index), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound index).
Specifically, the root node of the decision tree is XGN. When the XGN grade is greater than 1, the patient is judged to have atrial fibrillation; when it is less than or equal to 1, A peak is considered next. When A peak is 0, FS is considered: if FS > 0, the patient is judged to have atrial fibrillation; otherwise FJB is considered. When FJB <= 0, LVPWD is considered; when LVPWD <= 9, the value of EF (EF1 in the decision tree) is considered, and the patient is judged normal when EF <= 57 and to have atrial fibrillation otherwise. On the right branch of LVPWD (LVPWD > 9), FDMB1 is considered: when it is <= 101, the patient is judged to have atrial fibrillation; otherwise LAD is considered, with LAD <= 50 judged as atrial fibrillation and LAD > 50 as normal. On the right branch of FJB (FJB > 0), GXB is considered: GXB <= 2 is judged normal, otherwise atrial fibrillation. On the right branch of FS, FS > 0 is judged as atrial fibrillation. Finally, on the right branch of A peak (A > 0), TNB is considered: when TNB <= 0, the patient is judged normal; when FDMB > 0, the patient is judged normal; otherwise E peak is considered, with E > 72 judged as atrial fibrillation; otherwise MCHC is considered, with MCHC <= 338 judged as atrial fibrillation and the patient judged normal otherwise. This completes the traversal of the whole decision tree.
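The second-branch tree can be expressed as nested rules. This is a hypothetical transcription for illustration only: the dictionary keys (`A_peak`, `E_peak`, and so on) are our names, "XGN greater than 1" is read as an atrial fibrillation leaf, and the chain on the A peak > 0 side (TNB, FDMB, E peak, MCHC) follows the wording of the text literally, which may not match the tree in the figures exactly.

```python
def branch2_predict(x: dict[str, float]) -> str:
    """Walk the second-branch tree; return "f" (atrial fibrillation) or "z" (normal).

    Hypothetical transcription of the described rules; keys are illustrative."""
    if x["XGN"] > 1:                  # cardiac function grade above 1
        return "f"
    if x["A_peak"] <= 0:              # A peak is 0 (no negative values expected)
        if x["FS"] > 0:               # rheumatic valvular heart disease present
            return "f"
        if x["FJB"] > 0:              # interstitial lung disease present
            return "z" if x["GXB"] <= 2 else "f"
        if x["LVPWD"] <= 9:
            return "z" if x["EF"] <= 57 else "f"
        if x["FDMB1"] <= 101:         # pulmonary valve blood flow velocity
            return "f"
        return "f" if x["LAD"] <= 50 else "z"
    # A peak > 0 side, read literally from the text (an assumption).
    if x["TNB"] <= 0:                 # no diabetes
        return "z"
    if x["FDMB"] > 0:
        return "z"
    if x["E_peak"] > 72:
        return "f"
    return "f" if x["MCHC"] <= 338 else "z"
```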
Example 2:
The present disclosure adopts a data mining method to establish a standard decision tree model for medical reference.
The standard terms involved are explained below:
Data mining (DM) extracts valuable, previously unknown patterns, relationships and knowledge from massive data that come from many sources and have accumulated over a long time. It mines data and discovers knowledge without prior assumptions. Data mining searches for regularities in large amounts of data by analyzing each record, and mainly comprises three steps: data preparation, regularity search and regularity representation. Data mining tasks include association analysis, cluster analysis, classification analysis, anomaly analysis, specific-group analysis, evolution analysis and so on. The present method uses classification analysis to determine whether a hypertensive patient has atrial fibrillation.
The decision tree algorithm is a typical classification-prediction algorithm in data mining, with low computational complexity and intuitive output. The invention applies a decision tree algorithm to predicting the probability that a hypertensive patient has atrial fibrillation.
A decision tree is a basic method for classification and regression; the invention mainly uses classification decision trees. A decision tree model has a tree structure and, in a classification problem, represents the process of classifying instances based on their features. Compared with naive Bayes classification, a decision tree requires no domain knowledge or parameter setting during construction, which makes it well suited to exploratory knowledge discovery in practice. Decision tree algorithms include ID3, C4.5 and CART; the invention uses the C4.5 algorithm for its experiments. C4.5 is an improvement on ID3: because ID3 selects attributes by information gain, it favors attributes with many values, and C4.5 solves this problem by replacing the information gain with the information gain ratio. A decision tree consists of a root node, a series of internal nodes and leaf nodes connected by branches; every node except the root has exactly one parent node, and every internal node has two or more child nodes. Each internal node corresponds to a non-class attribute or a combination of attributes, each edge corresponds to a possible value of that attribute, and each leaf node corresponds to a class value. An example of a decision tree structure is shown in Fig. 1.
Given these properties, the decision tree is suitable for classifying the indexes for atrial fibrillation prediction, as follows:
C4.5 algorithm flow
Step 1: if all samples in data set S belong to the same category, create a leaf node, mark it with that category label, and stop building the tree; otherwise, go to Step 2;
Step 2: calculate the information gain ratio Gain_ratio(A) of every attribute A in data set S;
Step 3: select the attribute A with the largest information gain ratio;
Step 4: make attribute A the root node of decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into subsets according to the different values of attribute A, and recursively execute Steps 1-4 on each subset Sv to construct a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add subtree Tv to the corresponding branch of decision tree T;
Step 7: end the loop to obtain decision tree T.
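Steps 1 to 7 can be sketched as a recursive function. This is a simplified illustration, not the J48 implementation: it handles categorical attributes only, skips pruning and missing values, and adds a majority-class leaf when no attributes remain, a stopping case the steps above do not mention. Leaves are class labels and internal nodes are `(attribute, {value: subtree})` tuples, a representation chosen here for illustration.

```python
import math
from collections import Counter


def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def _gain_ratio(rows, labels, attr):
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    h_a = sum(len(g) / n * _entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return 0.0 if split_info == 0 else (_entropy(labels) - h_a) / split_info


def build_tree(rows, labels, attrs):
    # Step 1: every sample has the same class, so this node is a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Extra stopping case: no attributes left, so use the majority class.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-4: the attribute with the largest gain ratio becomes this node.
    best = max(attrs, key=lambda a: _gain_ratio(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    # Step 5: one subset Sv per value v of the chosen attribute.
    for v in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        # Step 6: attach the recursively built subtree Tv to branch v.
        branches[v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)  # Step 7: the finished tree T


def predict(tree, row):
    # Follow branches until a leaf; raises KeyError for unseen attribute values.
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree
```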
Numerical value processing: C4.5 can handle training data with missing attribute values. Records with a missing class label are deleted directly; missing attribute values are either merged into a class or replaced with the most common value. It can also handle continuous attributes: the values are first sorted, each value is used in turn as a threshold to split the data set, the information gain of each split is calculated, and the threshold with the largest gain is selected and used to split the data set.
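The threshold search for a continuous attribute can be sketched as follows, assuming numeric values and binary splits of the form x <= t versus x > t, with candidate thresholds taken from the observed values. As in the text, the threshold is chosen by information gain; the full C4.5/J48 implementation adds refinements that this sketch does not attempt.

```python
import math
from collections import Counter
from typing import Optional


def _entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values: list[float], labels: list[str]) -> tuple[Optional[float], float]:
    """Return (t, gain) for the split x <= t / x > t with the largest
    information gain, or (None, 0.0) if all values are equal."""
    pairs = sorted(zip(values, labels))          # sort the data first
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n, base = len(ys), _entropy(ys)
    best_t, best_gain = None, float("-inf")
    for k in range(1, n):
        if xs[k] == xs[k - 1]:
            continue                             # split only between distinct values
        left, right = ys[:k], ys[k:]
        gain = base - (len(left) / n * _entropy(left) + len(right) / n * _entropy(right))
        if gain > best_gain:
            best_t, best_gain = xs[k - 1], gain  # threshold is an observed value
    return (None, 0.0) if best_t is None else (best_t, best_gain)
```

With illustrative A peak values `[0, 0, 0, 55, 60, 70]` labelled `f, f, f, z, z, z`, the best threshold is 0 with a gain of 1.0, mirroring the a <= 0 branch described in the first experiment.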
Pruning: the generation process above builds a decision tree from the training data set, but the accuracy and other properties of the tree still have to be evaluated. Because the tree is built purely from the training data, it may overfit. To address this, the decision tree is pruned. The basic idea of pruning is to remove parts of the tree (subtrees) that do not contribute to classification accuracy on unseen test samples. There are two approaches for producing a simpler, more understandable tree: pre-pruning and post-pruning.
Pre-pruning: the decision whether to split is made before each branching, which prevents the data set from generating too many branches; pruning happens while the decision tree is being built.
Post-pruning: mainly addresses the influence of noise by removing redundant branches after the tree has been built.
Since the J48 algorithm used in the invention performs post-pruning, post-pruning is described in detail here. Post-pruning methods include REP (Reduced Error Pruning), PEP (Pessimistic Error Pruning), MEP (Minimum Error Pruning), CCP (Cost-Complexity Pruning) and others. The default pruning method of the C4.5 algorithm (and of J48) is error-based pruning, which estimates errors pessimistically using a confidence factor; REP is available in J48 as an option. The basic idea is as follows:
1) Calculate three misclassification counts: E1, the total number of misclassified samples over all leaf nodes of the subtree Tv; E2, the number of misclassified samples if Tv is pruned and replaced by a leaf node; and E3, the number of misclassified samples if Tv is replaced by its largest branch.
2) Compare the three counts. If E1 is the smallest, do not prune. If E2 is the smallest, prune, replacing the subtree Tv with a leaf node. If E3 is the smallest, apply "grafting", i.e., replace the subtree Tv with its largest branch.
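The comparison in steps 1) and 2) can be sketched as a small decision function. This is an illustrative sketch with hypothetical names: it compares the raw counts exactly as the text does, whereas C4.5/J48 compares pessimistic (confidence-adjusted) error estimates, and the tie-breaking rule is an assumption because the text does not specify one.

```python
def prune_decision(leaf_errors, e2, e3):
    """Decide how to treat subtree Tv from the three misclassification counts.

    leaf_errors: misclassified samples at each leaf of Tv (summed to give E1)
    e2: misclassified samples if Tv is replaced by a single leaf (E2)
    e3: misclassified samples if Tv is replaced by its largest branch (E3)
    Returns "keep", "leaf" or "graft". Ties go to the simpler option
    (leaf, then graft); this tie rule is an assumption.
    """
    e1 = sum(leaf_errors)
    if e2 <= e1 and e2 <= e3:
        return "leaf"   # pruning: replace Tv with a leaf node
    if e3 <= e1:
        return "graft"  # grafting: replace Tv with its largest branch
    return "keep"       # E1 is strictly smallest: do not prune


print(prune_decision([1, 2, 0], e2=5, e3=4))  # E1 = 3 is smallest -> keep
```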
Splitting attribute selection: the criterion used to select the splitting attribute is the fundamental difference between decision tree algorithms. As mentioned above, ID3 selects the splitting attribute by information gain, while C4.5 selects it by information gain ratio. Information entropy is the expected value of information; for a data set, it measures the degree of disorder of the data set. The more classes a data set contains, and the more evenly its samples are spread across them, the greater its information entropy. The formula is as follows:

$$H(S) = -\sum_{i} p(c_i)\log_2 p(c_i) \qquad (1)$$
where $S$ is the data set, $c_i$ is the $i$-th class of the data set, and $p(c_i)$ is the probability that class $c_i$ is selected;
In decision tree splitting, the information entropy of a feature attribute is generally calculated. If feature attribute $A$ has $n$ distinct values, it divides the data set $S$ into $n$ subsets $s_1, \dots, s_n$. The probability that subset $s_i$ is selected is $p(s_i)$, and by equation (1) the information entropy of each subset $s_i$ is $H(s_i)$. The information entropy of feature attribute $A$ is calculated as follows:

$$H(A) = \sum_{i=1}^{n} p(s_i)\,H(s_i) \qquad (2)$$
The information gain is calculated as follows:

$$\mathrm{Info\_Gain}(A) = H(S) - H(A) \qquad (3)$$
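The information gain ratio is calculated as follows, where the split information $\mathrm{SplitInfo}(A)$ is the entropy of the partition of $S$ into the subsets $s_i$:

$$\mathrm{Gain\_ratio}(A) = \frac{\mathrm{Info\_Gain}(A)}{\mathrm{SplitInfo}(A)}, \qquad \mathrm{SplitInfo}(A) = -\sum_{i=1}^{n} p(s_i)\log_2 p(s_i) \qquad (4)$$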
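The entropy, H(A), information gain and gain ratio can be computed directly for a categorical attribute. This Python sketch assumes base-2 logarithms and takes the gain ratio in its standard C4.5 form (information gain divided by the split information, i.e. the entropy of the partition itself); the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(items):
    """H = -sum p log2 p over the distinct values in items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def attribute_entropy(values, labels):
    """H(A): entropy of each subset s_i weighted by p(s_i)."""
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)
    n = len(labels)
    return sum(len(s) / n * entropy(s) for s in subsets.values())


def info_gain(values, labels):
    """Info_Gain(A) = H(S) - H(A)."""
    return entropy(labels) - attribute_entropy(values, labels)


def gain_ratio(values, labels):
    """Info_Gain(A) / SplitInfo(A); SplitInfo(A) is the entropy of the partition."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


labels = ["f", "f", "z", "z"]
two_valued = [1, 1, 0, 0]   # splits S into two pure halves
id_like = [1, 2, 3, 4]      # one value per sample, also pure subsets
print(info_gain(two_valued, labels), gain_ratio(two_valued, labels))  # 1.0 1.0
print(info_gain(id_like, labels), gain_ratio(id_like, labels))        # 1.0 0.5
```

Both attributes have the same information gain, but the identifier-like attribute has higher split information, so its gain ratio is halved. This penalty on many-valued attributes is the reason C4.5 uses the gain ratio rather than the raw gain.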
Application of the algorithm
Description of data: the data used in the invention were provided by a hospital in Dalian and consist of actual measurements from hypertensive patients, 360 records in total. The laboratory report mainly includes white blood cell count (WBC), absolute neutrophil count (Neu#), NT-proBNP, EF (ejection fraction), LVEF (left ventricular ejection fraction), hypertension grade, and whether atrial fibrillation is present. Fig. 2 shows part of the original data items.
Data preprocessing: Weka reads csv files, whereas our data were Excel spreadsheets, so the first step is to convert the data file to csv. Indicators in the hospital data that the present invention does not consider are filtered out, leaving only the study variables. Abnormal records are deleted, while missing attribute values are left for the J48 algorithm to handle automatically. Because the full data set has 154 dimensions, which is large, the invention also extracts, according to relevant medical standards, an 11-dimensional subset, namely the cardiac ultrasound indices (such as EF (ejection fraction), A peak and E peak), for more focused experiments, as shown in Fig. 3.
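A minimal Python sketch of the filtering and csv-writing step, assuming the spreadsheet rows have already been read into dictionaries (for example from an export of the Excel file). The column names, the validity rule used to flag abnormal records and the function name are all hypothetical, not taken from the study. Missing values are written as `?`, which Weka's CSV loader treats as missing by default, so J48 can handle them itself.

```python
import csv
import io


def to_weka_csv(rows, keep, is_valid):
    """Write the retained columns of valid rows as CSV text.

    rows: list of dicts, one per patient
    keep: column names to retain, with the class label last
    is_valid: predicate that returns False for rows judged abnormal
    Missing values (None or empty string) are written as "?".
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keep)
    for row in rows:
        if not is_valid(row):
            continue  # delete abnormal records
        writer.writerow(["?" if row.get(c) in (None, "") else row[c] for c in keep])
    return out.getvalue()


def valid(row):
    """Hypothetical plausibility rule: EF must be missing or within (0, 100]."""
    return row["EF"] is None or 0 < row["EF"] <= 100


rows = [
    {"EF": 62, "A_peak": 70, "E_peak": 80, "WBC": 6.1, "class": "z"},
    {"EF": None, "A_peak": 0, "E_peak": 95, "WBC": 7.0, "class": "f"},
    {"EF": 250, "A_peak": 65, "E_peak": 70, "WBC": 5.2, "class": "z"},  # impossible EF
]
ultrasound_only = ["EF", "A_peak", "E_peak", "class"]  # WBC is filtered out
print(to_weka_csv(rows, ultrasound_only, valid), end="")
```

The third record is dropped as abnormal, the WBC column is filtered out, and the missing EF of the second record is written as `?`.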
Operating environment: WEKA (Waikato Environment for Knowledge Analysis) is free, non-commercial, open-source machine learning and data mining software written in Java, developed mainly at the University of Waikato in New Zealand. Its official website is http://weka.wikispaces.com/. As a public data mining platform, WEKA integrates a large number of machine learning algorithms for data mining tasks, covering data preprocessing, correlation analysis, classification, regression, clustering and visualization in an interactive interface. WEKA can be embedded in MyEclipse, which facilitates secondary development, modification of existing algorithms and addition of the latest data mining algorithms; mining results can be displayed in various forms, so users can readily find the knowledge they need. Before mining, JDBC must be configured and the database driver loaded. The Weka control platform and operation interface are shown in Fig. 4. To use Weka, open the control platform and select the first option, Explorer, to start an experiment; the window that opens is the operation interface. First, load the data for the experiment through the Open file option; then run the experiments the study requires, using the available options for data preprocessing, classification algorithms, clustering algorithms, association rules and so on. According to the experimental requirements, the invention uses the J48 algorithm among the classification algorithms. The software operating interface is shown in Fig. 5.
Decision tree construction: the decision tree built for a data set is not unique, and constructing an optimal decision tree is unfortunately NP-complete. How to construct a good decision tree is therefore the focus of research. The invention repeatedly adjusts the constructed tree by changing the parameters of the decision tree algorithm, so that the accuracy and the branch attribute values of the tree become optimal. The J48 algorithm has 11 configurable parameters: default values are used for binarySplits, debug, saveInstanceData, subtreeRaising, unpruned and useLaplace, while the five parameters confidenceFactor, minNumObj, numFolds, seed and reducedErrorPruning are modified. The experiments mainly modify and verify these five parameters so as to approach the true values of the medical data ever more closely, making the decision tree more accurate and more feasible. Weka behaves much like a black box: once the processed data file is loaded, the desired algorithm selected and its parameters set, the result can be produced directly. All candidate values of the parameters were tested, and the best results are reported below. The experiments are divided into two parts. The first uses the 11 cardiac ultrasound attributes; the last column is the class label, where f denotes atrial fibrillation and z denotes normal. This data set contains 360 records, 186 male and 174 female; 178 patients have atrial fibrillation and 182 are normal (here "normal" means patients with hypertension only). With the algorithm's default parameters, the experimental result is shown in Fig. 6.
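The "test all candidate values, keep the best result" procedure amounts to a grid search over the five tuned options. The Python sketch below assumes the J48 defaults listed in Weka's documentation as a starting point; the candidate grid and the scoring function are hypothetical stand-ins (a real scorer would train and cross-validate J48 with each option set, which is not shown here).

```python
from itertools import product

# Weka's J48 defaults for the five options the study varies.
J48_DEFAULTS = {"confidenceFactor": 0.25, "minNumObj": 2, "numFolds": 3,
                "seed": 1, "reducedErrorPruning": False}


def grid_search(grid, evaluate):
    """Evaluate every combination in grid (option name -> candidate values),
    starting from the J48 defaults, and return (best_options, best_accuracy).
    evaluate is a caller-supplied function mapping an options dict to accuracy;
    the first combination wins ties."""
    names = list(grid)
    best_opts, best_acc = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        opts = {**J48_DEFAULTS, **dict(zip(names, combo))}
        acc = evaluate(opts)
        if acc > best_acc:
            best_opts, best_acc = opts, acc
    return best_opts, best_acc


def toy_accuracy(opts):
    """Stand-in scorer for illustration only (hypothetical)."""
    return 0.80 + 0.01 * opts["minNumObj"] - (0.05 if opts["reducedErrorPruning"] else 0.0)


grid = {"minNumObj": [2, 5], "reducedErrorPruning": [False, True]}
best, acc = grid_search(grid, toy_accuracy)
print(best["minNumObj"], best["reducedErrorPruning"], round(acc, 4))  # 5 False 0.85
```

According to Weka's option descriptions, numFolds and seed only take effect when reducedErrorPruning is enabled, and confidenceFactor only when it is disabled, so many combinations in a full grid over these five options are equivalent and could be skipped.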
The decision tree shows that, among the cardiac ultrasound attributes, A peak, ef and last have a large influence on atrial fibrillation. In particular, A peak is the root node of the tree, i.e., the attribute with the largest information gain ratio; its normal range is 41 to 87, and its disappearance indicates that atrial fibrillation has occurred. In the first branch, a patient with A ≤ 0 is classified as having atrial fibrillation; since the data contain no negative values, A ≤ 0 means that the A peak has disappeared. When A > 0, the ef attribute must be considered: when ef is less than 58, the patient is classified as normal, and so on through the rest of the tree. The accuracy summary of the tree includes the accuracy, error rate, Kappa statistic and other measures, all of which can be used to evaluate the algorithm; the invention mainly uses accuracy as the criterion. As Fig. 7 shows, the accuracy is 83.0556%.
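Accuracy, error rate and the Kappa statistic reported in the evaluation summary can all be derived from a confusion matrix. The Python sketch below computes them, with Kappa taken as Cohen's kappa; the function name and the 2x2 matrix are hypothetical and are not the study's results.

```python
def summarize(matrix):
    """matrix[i][j] = samples of actual class i predicted as class j.
    Returns (accuracy, error_rate, kappa), where kappa is Cohen's kappa."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(k)) / total
    actual = [sum(row) for row in matrix]
    predicted = [sum(row[j] for row in matrix) for j in range(k)]
    chance = sum(a * p for a, p in zip(actual, predicted)) / total ** 2  # expected agreement
    kappa = (accuracy - chance) / (1 - chance)
    return accuracy, 1 - accuracy, kappa


# Hypothetical 2x2 matrix (rows and columns: f, z)
acc, err, kappa = summarize([[40, 10], [5, 45]])
print(round(acc, 4), round(err, 4), round(kappa, 4))  # 0.85 0.15 0.7
```

For reference, the reported accuracy of 83.0556% corresponds to 299 correctly classified records out of 360.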
The second part of the experiments uses 308 records. The 154 characteristic indices of the patients, covering test items such as routine blood tests, thyroid function, coagulation, liver function, blood lipids and cardiac ultrasound, serve as the attribute columns; the last column is the class label, where f denotes atrial fibrillation and z denotes normal. The data include 162 men and 146 women; 128 patients have atrial fibrillation and 180 are normal. As before, the algorithm's default parameters are used, and the experimental results are shown in Fig. 8.
The decision tree shows that, of the 154 attributes, those contributing to atrial fibrillation are XGN (cardiac function grade), A peak (cardiac ultrasound index), FS (rheumatic heart valve disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (mean corpuscular hemoglobin concentration) and E peak (cardiac ultrasound index). Some of these 13 indices have not received sufficient attention in medicine, for example the effect of hemoglobin concentration on atrial fibrillation.
Specifically, the root node of the decision tree is XGN, which indicates that this index has a large effect on the occurrence of atrial fibrillation. When XGN ≤ 1, the A peak is considered next. When the A peak is 0, FS is considered: if FS > 0, the patient is classified as having atrial fibrillation; otherwise FJB is considered. When FJB ≤ 0, LVPWD is considered; when LVPWD ≤ 9, the value of EF (EF1 in the tree) is considered, and the patient is classified as normal if EF ≤ 57 and as having atrial fibrillation otherwise. Going back to the right branch of LVPWD, when LVPWD > 9, FDMB1 is considered: if FDMB1 ≤ 101, the patient is classified as having atrial fibrillation; otherwise LAD is considered, and the patient is classified as having atrial fibrillation if LAD ≤ 50 and as normal otherwise. Going back to the right branch of FJB, when FJB > 0, GXB is considered: the patient is classified as normal if GXB ≤ 2 and as having atrial fibrillation otherwise. Going back to the right branch of FS, the patient is classified as having atrial fibrillation when FS > 0. Going back to the right branch of the A peak, when A > 0, TNB is considered: if TNB ≤ 0, the patient is classified as normal; otherwise FDMB is considered, and the patient is classified as normal if FDMB > 0. Otherwise E is considered: if E > 72, the patient is classified as having atrial fibrillation; otherwise MCHC is considered, and the patient is classified as having atrial fibrillation if MCHC ≤ 338. Otherwise E is considered again, and so on until the whole tree has been traversed. The accuracy of this model is 85.0649%.
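The branches of the Fig. 8 tree that the text spells out can be transcribed as a Python function. This is a partial sketch: the dictionary keys are hypothetical encodings of the attributes named above ("A" and "E" for the A and E peaks), "A peak is 0" is assumed to correspond to a split A ≤ 0, and branches the text does not describe (XGN > 1, and the continuation after MCHC > 338) return None rather than a guessed label.

```python
def classify(r):
    """Partial transcription of the Fig. 8 tree as described in the text.
    r: dict of attribute values. Returns "f" (atrial fibrillation),
    "z" (normal), or None where the text does not spell out the branch."""
    if r["XGN"] > 1:
        return None                      # branch not described in the text
    if r["A"] <= 0:                      # A peak absent
        if r["FS"] > 0:
            return "f"
        if r["FJB"] > 0:
            return "z" if r["GXB"] <= 2 else "f"
        if r["LVPWD"] <= 9:
            return "z" if r["EF1"] <= 57 else "f"
        if r["FDMB1"] <= 101:
            return "f"
        return "f" if r["LAD"] <= 50 else "z"
    # A peak present
    if r["TNB"] <= 0:
        return "z"
    if r["FDMB"] > 0:
        return "z"
    if r["E"] > 72:
        return "f"
    if r["MCHC"] <= 338:
        return "f"
    return None                          # "otherwise, considering the value E": not given


print(classify({"XGN": 1, "A": 0, "FS": 0, "FJB": 1, "GXB": 3}))  # -> f
```

Only the attributes on the path actually taken need to be present in the record, mirroring how the tree consults one index at a time.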
Weighing the decision trees and accuracies obtained in these experiments, the model of Fig. 8 is selected as the final model. It takes more factors into account, is more comprehensive, and is concise and clear for medical staff. The model has also been endorsed from a medical standpoint.
In the medical field there is no unified, standardized model for predicting atrial fibrillation, and hypertensive patients are more likely than the general population to develop it. To address this, the invention draws on the medical literature on atrial fibrillation prediction and provides a decision-tree-based atrial fibrillation prediction method, which yields an intuitive and concise decision tree for medical research. The model is built on a large amount of real medical data to make it as accurate and complete as possible; its accuracy is 85.0649%. Building the model not only reveals latent relationships among the medical indices of hypertensive patients, but also shows which indices are more likely to cause atrial fibrillation, including some that have received little attention in medicine. In future work, first, the amount of data will be increased so that the model generalizes better and overfitting is avoided; second, better machine learning classification will be used to build a practical, standardized decision tree.
Example 3:
In order to select indices that reflect atrial fibrillation more accurately, the invention provides a method for selecting indices in an atrial fibrillation artificial intelligence experiment, comprising the following steps:
S1: construct a decision tree;
S2: adjust the parameters to optimize the decision tree;
S3: run experiments on the possible values of the parameters and finally select the optimal experimental result, which is used as the main index for decision tree prediction.
Further, the main indices are the three cardiac ultrasound attributes A peak, ef and last.
Further, the main indices are XGN (cardiac function grade), A peak (cardiac ultrasound index), FS (rheumatic valvular heart disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasound index), EF (cardiac ultrasound index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasound index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasound index).
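Reading the main indices off the selected tree amounts to collecting the attributes that appear at its split nodes. The Python sketch below assumes a hypothetical tree representation (a leaf is a class label string; an internal node is a tuple of attribute, threshold, left subtree, right subtree) and uses a toy tree with made-up attribute names rather than the study's tree.

```python
def tree_attributes(node, found=None):
    """Collect, in pre-order and without duplicates, the split attributes of a tree.
    A node is either a leaf label (str) or a tuple (attribute, threshold, left, right)."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found                    # leaf: no attribute to record
    attr, _threshold, left, right = node
    if attr not in found:
        found.append(attr)
    tree_attributes(left, found)
    tree_attributes(right, found)
    return found


# Toy tree (hypothetical attributes and thresholds)
toy = ("x1", 0.5, "f", ("x2", 3, "z", ("x1", 9, "f", "z")))
print(tree_attributes(toy))  # -> ['x1', 'x2']
```

Pre-order traversal lists the root attribute first, which matches the way the text names the root node (for example XGN) as the most influential index.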
The method for constructing the decision tree is described in Examples 1 and 2.
The invention also relates to the application of the prediction decision tree in atrial fibrillation prediction.
Through artificial intelligence and big data processing, the invention selects atrial fibrillation prediction indices more rationally. The indices obtained in this way reflect atrial fibrillation more accurately and are used to evaluate atrial fibrillation and reduce missed diagnoses.
Example 4:
Atrial fibrillation (AF) is one of the most common cardiac arrhythmias in clinical practice. Its prevalence in the general population is about 0.4% to 1.0% and increases with age: studies have shown a prevalence of only 0.1% in people under 55 years but up to 9% in people over 80. The most common clinical complication of AF is systemic thromboembolism. Stroke is the main embolic event caused by AF and also the complication with the highest disability rate among AF patients. Compared with patients without AF, AF patients have a 5-fold higher incidence of stroke and a 2-fold higher mortality, with ischemic stroke the leading cause of the increased mortality; AF is an independent risk factor for ischemic stroke, whose incidence increases with age. Other hazards of AF include heart failure due to loss of the atrial booster-pump function, sudden death due to electrical disturbance, an irregular and rapid ventricular rate, and other physical and psychological disorders.
Accurately predicting the occurrence of AF and applying effective preventive measures are important links in AF treatment. At present, AF diagnosis relies mainly on the electrocardiogram (ECG) and its extensions, such as ambulatory (Holter) ECG, ECG monitoring and implantable long-term ECG recorders. In recent years, combining ECG technology with artificial intelligence has achieved great success. However, although diagnostic accuracy for AF based on the century-old ECG technique is high, the missed-diagnosis rate is also high, especially for infrequent paroxysmal AF and asymptomatic AF, whose harm is no less than that of symptomatic AF. This technology develops a new AF diagnosis system based on clinical big data combined with artificial intelligence (AI), either to replace conventional ECG diagnosis or at least to serve as a screening system for patients at high risk of AF before ECG examination and as an important supplement to conventional ECG examination.
Methods and technology: this research uses the information integration platform of the applicant's affiliated hospital, the Affiliated Zhongshan Hospital of Dalian University, to analyze all clinical, imaging and laboratory data of hypertensive patients, and builds an automatic intelligent diagnosis model, such as a decision tree model, by big data processing such as the decision tree approach described in Example 3. By closely combining clinical big data with AI, the invention is expected to open a new path for predicting AF through big data processing and AI self-learning, and provides an important diagnostic tool for AF prevention strategies.
Building the AI model: using the information integration platform of the applicant's affiliated hospital, the Affiliated Zhongshan Hospital of Dalian University, the clinical data (medical history, physical examination, laboratory tests, etc.) of hypertensive patients registered at the hospital from January 2010 to 2017 are processed as big data to establish a preliminary diagnosis model.
AI model verification: using the preliminary AI model, the relevant parameter data of hypertensive patients hospitalized and diagnosed at the hospital are entered into the computer to test the diagnostic ability of the model (including prediction sensitivity, specificity, agreement rate and prediction efficiency).
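The sensitivity, specificity and agreement (coincidence) rate used in this verification step follow from the four cells of a 2x2 table with atrial fibrillation as the positive class. The Python sketch below is illustrative: the function name and counts are hypothetical, reading "coincidence rate" as overall accuracy is an assumption, and "prediction efficiency" is not computed because the text does not define it.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Screening metrics with atrial fibrillation as the positive class.

    tp: AF cases predicted AF      fn: AF cases predicted normal
    fp: normal cases predicted AF  tn: normal cases predicted normal
    """
    return {
        "sensitivity": tp / (tp + fn),                 # share of AF cases detected
        "specificity": tn / (tn + fp),                 # share of normal cases cleared
        "agreement": (tp + tn) / (tp + fn + fp + tn),  # coincidence (accuracy) rate
    }


# Hypothetical counts for illustration only
print(diagnostic_metrics(tp=40, fn=10, fp=5, tn=45))
```

Sensitivity is the figure most directly tied to the stated goal of reducing missed AF diagnoses, since every false negative lowers it.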
Improving the AI model: the model is continuously corrected and refined through the AI's own deep learning ability, so that it is progressively developed and improved.
The above describes only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any substitution or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, falls within the scope of the present invention.
Claims (5)
1. A method for selecting indexes in an atrial fibrillation artificial intelligence experiment is characterized by comprising the following steps:
S1, constructing a decision tree;
S2, adjusting parameters to optimize the decision tree;
S3, performing experiments on the possible values of the parameters, and finally selecting an optimal experimental result, which is used as the main index for decision tree prediction.
2. The method of selecting an indicator for atrial fibrillation artificial intelligence experiments according to claim 1, wherein: the main indexes are the three cardiac ultrasound attributes A peak, ef and last.
3. The method of selecting an indicator for atrial fibrillation artificial intelligence experiments according to claim 1, wherein: the main indexes are XGN (cardiac function grade), A peak (cardiac ultrasonic index), FS (rheumatic heart valve disease), FJB (interstitial lung disease), LVPWD (cardiac ultrasonic index), EF (cardiac ultrasonic index), FDMB1 (pulmonary valve blood flow velocity), FDMB (pulmonary valve), LAD (cardiac ultrasonic index), GXB (coronary heart disease), TNB (diabetes), MCHC (hemoglobin concentration) and E peak (cardiac ultrasonic index).
4. The method of selecting an indicator for atrial fibrillation artificial intelligence experiments according to claim 1, wherein: the method for constructing the decision tree comprises the following steps:
Step 1: if all samples in the data set S belong to the same class, create a leaf node, label it with that class, and stop building the tree; otherwise, go to step 2;
Step 2: calculate the information gain ratio Gain_ratio(A) of every attribute A in the data set S;
Step 3: select the attribute A with the largest information gain ratio;
Step 4: make attribute A the root node of the decision tree T, where T is the decision tree to be built;
Step 5: divide the data set into subsets according to the different values of attribute A, and repeatedly execute steps 1 to 4 on each subset Sv to build a subtree Tv, where Sv is the subset of samples whose value of attribute A is v;
Step 6: add the subtree Tv to the corresponding branch of the decision tree T;
Step 7: when the loop ends, the decision tree T is obtained.
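Steps 1 to 7 describe a recursive C4.5-style tree builder. The Python sketch below follows them for categorical attributes only; it is a simplified illustration, not J48, and omits continuous thresholds, missing-value handling and pruning. The extra stopping rule for an exhausted attribute list (return the majority class) is an assumption needed to guarantee termination, and the function names, attribute names and 0/1 coding are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(items):
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())


def gain_ratio(rows, labels, attr):
    """Information gain of attr divided by its split information."""
    groups = defaultdict(list)
    for r, y in zip(rows, labels):
        groups[r[attr]].append(y)
    n = len(labels)
    h_a = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([r[attr] for r in rows])
    return (entropy(labels) - h_a) / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """rows: list of dicts of categorical values; labels: class of each row.
    Returns a leaf label or {"attr": A, "branches": {value: subtree}}."""
    if len(set(labels)) == 1:                     # step 1: pure node -> leaf
        return labels[0]
    if not attrs:                                 # assumed extra stop: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))  # steps 2-3
    tree = {"attr": best, "branches": {}}         # step 4: best attribute as root
    remaining = [a for a in attrs if a != best]
    for v in sorted({r[best] for r in rows}, key=str):  # step 5: one subset Sv per value v
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        subtree = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
        tree["branches"][v] = subtree             # step 6: attach Tv
    return tree                                   # step 7: finished tree T


rows = [{"TNB": 1, "GXB": 0}, {"TNB": 1, "GXB": 1},
        {"TNB": 0, "GXB": 0}, {"TNB": 0, "GXB": 1}]
labels = ["f", "f", "z", "z"]
print(build_tree(rows, labels, ["TNB", "GXB"]))
```

In the toy data TNB separates the classes perfectly (gain ratio 1) while GXB carries no information (gain ratio 0), so the tree splits once on TNB.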
5. An application of a prediction decision tree in atrial fibrillation prediction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811068302.7A CN110895972A (en) | 2018-09-13 | 2018-09-13 | Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110895972A (en) | 2020-03-20 |
Family
ID=69785537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811068302.7A Pending CN110895972A (en) | 2018-09-13 | 2018-09-13 | Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110895972A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150164349A1 (en) * | 2013-12-12 | 2015-06-18 | Alivecor, Inc. | Methods and systems for arrhythmia tracking and scoring |
CN107610771A (en) * | 2017-08-23 | 2018-01-19 | 上海电力学院 (Shanghai University of Electric Power) | Decision-tree-based screening method for medical test indices |
Non-Patent Citations (1)
Title |
---|
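Fan Jie (范洁), Yang Yuexiang (杨岳湘): "Research on post-pruning algorithms for decision trees" (决策树后剪枝算法的研究), Journal of Hunan Radio and TV University (湖南广播电视大学学报), no. 01, 25 March 2005 (2005-03-25), pages 54-56 * |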
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Krishnani et al. | Prediction of coronary heart disease using supervised machine learning algorithms | |
David et al. | HEART DISEASE PREDICTION USING DATA MINING TECHNIQUES. | |
Behlouli et al. | Identifying relative cut-off scores with neural networks for interpretation of the Minnesota Living with Heart Failure questionnaire | |
Ordonez et al. | Evaluating association rules and decision trees to predict multiple target attributes | |
Au-Yeung et al. | Reduction of false alarms in the intensive care unit using an optimized machine learning based approach | |
CN110895969A (en) | Atrial fibrillation prediction decision tree and pruning method thereof | |
Rezapour et al. | Implementation of predictive data mining techniques for identifying risk factors of early AVF failure in hemodialysis patients | |
Pal et al. | Data mining approach for coronary artery disease screening | |
Jelinek et al. | Decision trees and multi-level ensemble classifiers for neurological diagnostics | |
Shen et al. | Risk prediction for cardiovascular disease using ECG data in the China Kadoorie Biobank | |
Bakar et al. | A Review: Heart Disease Prediction in Machine Learning & Deep Learning | |
Patidar et al. | Comparative analysis of machine learning algorithms for heart disease predictions | |
Li et al. | Research on massive ECG data in XGBoost | |
Thaiparnit et al. | A classification for patients with heart disease based on hoeffding tree | |
Janghorbani et al. | Prediction of acute hypotension episodes using logistic regression model and support vector machine: A comparative study | |
Jelinek et al. | Multi-layer attribute selection and classification algorithm for the diagnosis of cardiac autonomic neuropathy based on HRV attributes | |
de Andrades et al. | Hyperparameter tuning and its effects on cardiac arrhythmia prediction | |
CN110895972A (en) | Method for selecting indexes through atrial fibrillation artificial intelligence experiment and application of prediction decision tree in atrial fibrillation prediction | |
CN110895669A (en) | Method for constructing atrial fibrillation prediction decision tree | |
Tsipouras et al. | A decision support system for the diagnosis of coronary artery disease | |
Guidi et al. | Performance assessment of a clinical decision support system for analysis of heart failure | |
Roland et al. | An Automated System for Arrhythmia Detection using ECG records from MITDB | |
Li et al. | Multi-label feature selection for long-term electrocardiogram signals | |
Jelinek et al. | A survey of data mining methods for automated diagnosis of cardiac autonomic neuropathy progression | |
Nandanwar et al. | ECG Signals-Early detection of Arrhythmia using Machine Learning approaches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination ||