AU2021103976A4 - Asthma diagnosis system based on decision tree and improved SMOTE algorithm - Google Patents

Asthma diagnosis system based on decision tree and improved SMOTE algorithm

Info

Publication number
AU2021103976A4
Authority
AU
Australia
Prior art keywords
data
asthma
decision tree
model
diagnosis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021103976A
Inventor
Wen Chen
Yubao CUI
Zhifeng Liu
Ya Ma
Limin Xia
Conghua Zhou
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Publication of AU2021103976A4
Legal status: Ceased

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present disclosure relates to the technical field of data mining, and in particular to an asthma diagnosis system based on a decision tree and the improved SMOTE algorithm. The data consists of blood routine physical examination data of healthy people and of asthma patients. Particle Swarm Optimization (PSO) is used to optimize the sampling rate of the SMOTE over-sampling technique, yielding an improved SMOTE over-sampling technique. This algorithm is used to over-sample the data sets, and the data is then modeled and diagnosed by the decision tree algorithm. Compared with traditional diagnosis based on patients' symptoms, the asthma diagnosis system can automatically diagnose whether a patient suffers from asthma according to his/her blood routine physical examination data, which reduces the influence of physician fatigue, misjudgment or inexperience and improves the efficiency of asthma diagnosis. The present disclosure can be applied to intelligent detection of asthma.

Description

[FIG. 1: System structure. Model training and validation: data acquisition, data processing, oversampling processing, blood routine database (for training), construct and train decision tree model, blood routine database (for validation), "Does the model work?" (Yes/No). Application of disease diagnosis: data acquisition, data processing, decision tree model of asthma diagnosis, results of intelligent diagnosis by machine, auxiliary diagnosis by physician.]
ASTHMA DIAGNOSIS SYSTEM BASED ON DECISION TREE AND IMPROVED SMOTE ALGORITHM
TECHNICAL FIELD
[01] The present disclosure relates to the technical field of data mining, and in particular to an asthma diagnosis system based on a decision tree and the improved SMOTE algorithm.
BACKGROUND ART
[02] Bronchial asthma (asthma for short) is a chronic inflammatory disease of the airway that involves various cells (such as eosinophils, mast cells, T lymphocytes, neutrophils, and airway epithelial cells) and cellular components. Asthma is an allergic inflammatory reaction of the airway. Its clinical manifestations in an acute attack include repeated wheezing, dyspnea, chest tightness, cough, and decreased exercise tolerance, accompanied by airway hyper-responsiveness and obstruction. Asthma is a chronic respiratory disease that seriously threatens human health; it has a high incidence and cannot be cured, and it seriously affects the work and daily life of patients. Many patients who do not receive timely treatment, or who are treated incorrectly, suffer further damage to their lung function. A severe asthma attack, if not treated in time, can even endanger the patient's life.
[03] Statistically, about 300 million people in the world are suffering from asthma, and the number of affected patients is increasing exponentially. By 2025, another 100 million people may be affected by asthma. Commonly used methods for evaluating asthma, such as sputum smear observation of eosinophils, pulmonary function testing (SPIR) and the impulse oscillometry system (IOS), are difficult to perform, time-consuming, laborious, and expensive. These methods require a large number of professionals with expertise and diagnostic experience, but such professionals are few relative to the large patient population, which causes great fatigue among medical staff and makes misdiagnosis more likely. Moreover, because unified clinical indexes are lacking, different physicians may give different diagnoses, which is restrictive and dangerous. Some patients have paroxysmal cough as their only symptom, which is often misdiagnosed as bronchitis in the clinic, while some teenagers have chest tightness and shortness of breath during exercise as their only clinical manifestation. If physicians do not know enough about asthma or hold incorrect ideas about clinical diagnosis, misdiagnosis or missed diagnosis can easily occur.
[04] The present disclosure focuses on asthma. It uses blood routine data of asthma patients obtained from relevant hospital departments and combines the data with machine-learning data mining algorithms to establish an asthma diagnosis model system that assists physicians in clinical diagnosis, thereby enabling early diagnosis and treatment and helping to reduce the incidence of asthma.
SUMMARY
[05] In view of the above problems, the present disclosure provides an asthma diagnosis system based on a decision tree and the improved SMOTE algorithm, which includes a primary module of data acquisition, a primary module of oversampling processing, a primary module of decision tree, a primary training module and a primary detection module;
[06] The primary module of data acquisition is used for acquiring asthma data, preprocessing the acquired asthma data to obtain preprocessed data, and inputting the preprocessed data into the primary module of oversampling processing;
[07] The primary module of over-sampling processing is used for processing input data and randomly dividing the processed data into two groups, namely a training sample set and a validation sample set;
[08] The primary module of decision tree is used for constructing an asthma diagnosis model;
[09] The primary training module trains the constructed asthma disease diagnosis model by using the training sample set, and obtains the trained asthma diagnosis model;
[10] The primary detection module is used for loading the trained asthma diagnosis model, and validating the trained asthma diagnosis model by using the validation sample set;
[11] If the trained asthma diagnosis model has an asthma diagnosis accuracy of greater than or equal to 85% on the validation sample set, the trained asthma diagnosis model is used as the final model, and the final model is used for asthma diagnosis;
[12] Otherwise, the parameters of the constructed asthma model are adjusted, and the constructed asthma model is retrained by using the training sample set until the asthma diagnosis accuracy of the trained model on the validation sample set is greater than or equal to 85%, then the final model is obtained, and the final model is used for asthma diagnosis.
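The accept-or-retrain rule of paragraphs [11] and [12] can be sketched as a simple loop. This is a minimal illustration, not the disclosed implementation: the function name `select_final_model` and the `train` and `validate` callables are hypothetical placeholders for the training module and the detection module, and only the 85% threshold comes from the text.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def select_final_model(
    candidate_params: Iterable[Any],
    train: Callable[[Any], Any],
    validate: Callable[[Any], float],
    threshold: float = 0.85,
) -> Optional[Tuple[Any, Any, float]]:
    """Retrain with adjusted parameters until validation accuracy >= threshold.

    `train` and `validate` are hypothetical stand-ins for the training and
    detection modules; `candidate_params` is the sequence of parameter
    settings to try. Returns (model, params, accuracy), or None if no
    candidate reaches the threshold.
    """
    for params in candidate_params:
        model = train(params)        # fit on the training sample set
        accuracy = validate(model)   # score on the validation sample set
        if accuracy >= threshold:
            return model, params, accuracy
    return None
```

Returning `None` when every candidate fails is a design choice of this sketch; the text itself only says that retraining continues until the threshold is met.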
[13] The present disclosure has the following beneficial effects:
[14] In the prior art, physicians diagnosed asthma according to their own experience in combination with the patient's characteristics. According to the present disclosure, physicians may carry out diagnosis simply from the patient's blood routine physical examination data, which greatly assists physicians, reduces the medical burden, and makes diagnosis faster.
BRIEF DESCRIPTION OF THE DRAWINGS
[15] Fig. 1 is a structure schematic view of the system according to the present disclosure.
[16] Fig. 2 is a flow chart of the improved SMOTE algorithm according to the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[17] In order to make the technical schemes provided by the present disclosure clearer, the present disclosure will be further described in detail with reference to the accompanying drawings and embodiments below. It should be understood that the specific embodiments described herein are only used to explain the present disclosure without limiting the same.
[18] As shown in Fig. 1, the present disclosure discloses an asthma diagnosis system based on a decision tree and an improved SMOTE algorithm, which includes a primary module of data acquisition, a primary module of oversampling processing, a primary module of decision tree, a primary training module and a primary detection module;
[19] The specific steps are as follows:
[20] In Step 1, the primary module of data acquisition obtains 1800 entries of blood routine physical examination data of patients and normal people from the outpatient department of Wuxi People's Hospital, including 400 patients, and the outpatient data mainly relates to the basic information of patients and various asthma-related detection indexes.
[21] In Step 2, the data is cleaned, including missing value cleaning, format content cleaning, logic error cleaning, non-required data cleaning, correlation verification and other steps as follows:
[22] 2.1) The missing value cleaning step includes: determining the missing value range, calculating the missing value ratio for each field, and formulating strategies according to the missing value ratio and the significance of the field; unnecessary or meaningless fields, such as a patient's physical examination serial number, are removed.
[23] 2.2) The missing values are filled in. Different filling methods are used for features of different kinds, such as filling according to a physician's experience, or filling with special values, the median, or hot-deck imputation.
[24] 2.3) Data is reacquired: when a feature is very important but its missing ratio is too high, it is necessary to contact the outpatient department to reacquire the data.
[25] 2.4) Cleaning the format content resolves the following problems: time and date values have inconsistent display formats, the content contains characters that should not be present, and the content of a field is inconsistent with what the field should contain.
[26] 2.5) The operation of cleaning logical errors is to remove some data that can be found problematic by using simple logical reasoning, so as to prevent the analysis results from deviating. The step mainly includes eliminating duplication, removing unreasonable values, correcting contradictory contents and so on.
[27] 2.6) Non-required data cleaning is to delete unnecessary fields.
[28] 2.7) The purpose of correlation verification is to ensure the correctness of the correlation among data when the data comes from multiple tables or databases, so as to prevent errors from occurring in the correlation or contradictions from occurring among data.
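A few of the cleaning steps above (missing-value ratio per field, removal of unneeded fields, median filling, and duplicate elimination) can be sketched in plain Python as follows. The field names, the 0.5 missing-ratio cut-off, and the function names are assumptions made for illustration; the disclosure does not specify them, and it also lists other filling strategies (physician experience, special values, hot deck) that are not shown here.

```python
from statistics import median


def missing_ratio(rows, field):
    """Fraction of records in which `field` is missing (None)."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def clean(rows, drop_fields=("serial_number",), max_missing=0.5):
    """Drop unneeded fields, fill gaps with the median, remove duplicates.

    Assumes every record has the same keys and that the kept fields are
    numeric. `drop_fields` and `max_missing` are hypothetical settings.
    """
    fields = [f for f in rows[0] if f not in drop_fields]
    # Keep only fields whose missing ratio is acceptable.
    kept = [f for f in fields if missing_ratio(rows, f) <= max_missing]
    # Median of the observed values, used to fill the gaps.
    medians = {
        f: median(r[f] for r in rows if r.get(f) is not None) for f in kept
    }
    cleaned, seen = [], set()
    for r in rows:
        record = tuple(r[f] if r.get(f) is not None else medians[f] for f in kept)
        # Simplification: records identical on all kept fields count as duplicates.
        if record not in seen:
            seen.add(record)
            cleaned.append(dict(zip(kept, record)))
    return cleaned
```

In practice a field with a high missing ratio but high clinical significance would be reacquired rather than dropped, as step 2.3 describes.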
[29] In Step 3, the discrete data is pre-processed, which includes the following steps:
[30] Discrete data cannot be fed to the model directly; the features of the discrete data must first be digitized. The One-Hot encoding scheme is adopted in the present disclosure. One-Hot encoding, also known as one-bit valid encoding, uses an N-bit status register to encode N states, wherein each state has its own register bit and only one bit is valid at any time. One-Hot encoding represents categorical variables as binary vectors. It requires mapping the categorical values to integer values and then representing each integer value as a binary vector in which every entry is zero except the entry at the index of the integer, which is marked as 1. With One-Hot encoding, the values of a discrete feature are extended to the Euclidean space, and each value of the discrete feature corresponds to a point in the Euclidean space. Since the calculation of distance or similarity among features is very important, and the commonly used distance or similarity calculations are defined in the Euclidean space, One-Hot encoding of discrete features makes the calculation of distance among features more reasonable.
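As a small illustration of the encoding just described, the sketch below maps each category to an integer index and then to a binary vector with a single 1. The function name and the example values are hypothetical; the categories are sorted only to make the index assignment reproducible.

```python
def one_hot_encode(values):
    """Return (categories, vectors) for a list of categorical values.

    Each value is mapped to an integer index, then to a binary vector that
    is all zeros except for a 1 at that index.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


# Example with a hypothetical gender feature.
categories, vectors = one_hot_encode(["male", "female", "male"])
```

Any two distinct categories end up at the same Euclidean distance from each other, which is why the encoding avoids imposing an artificial ordering on the feature.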
[31] In Step 4: the primary module of over-sampling processing includes the following steps:
[32] Firstly, the K-means clustering algorithm is used to cluster samples of minority classes to form fixed K clusters and record each cluster center.
$$E = \sum_{i=1}^{K} \sum_{x_j \in N_i} \lVert x_j - z_i \rVert^2$$
[33] Wherein:
[34] In the above formula, $x_j$ represents the j-th data sample in the data set; $N_i$ represents the i-th cluster; $z_i$ represents the cluster center of the i-th cluster.
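The clustering step and its objective E (the sum of squared distances of samples to their cluster centers) can be sketched with a plain Lloyd-style K-means. The initialization from the first K points and the fixed iteration count are simplifying assumptions of this sketch, not details taken from the disclosure.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iterations=20):
    """Cluster `points` into k clusters; returns (centers, clusters).

    Naive initialization: the first k points serve as initial centers.
    """
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers, clusters


def sse(centers, clusters):
    """Objective E: total squared distance of samples to their cluster center."""
    return sum(
        squared_distance(p, z)
        for z, members in zip(centers, clusters)
        for p in members
    )
```

Once the assignments stop changing, the returned centers are the means of the returned clusters, so `sse` evaluates E for the final partition.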
[35] m sampling points are selected from n samples nearest to the minority class sample. The sampling rate is optimized by particle swarm optimization (PSO) algorithm.
[36] In the formula:
[37] $$v_i^d = w \times v_i^d + c_1 \times r_1 \times (pbest_i^d - x_i^d) + c_2 \times r_2 \times (gbest^d - x_i^d)$$
[38] $$x_i^d = x_i^d + v_i^d$$
[39] w represents the inertia factor, whose value is non-negative; i represents the i-th particle and d represents the d-th dimension of the particle. c1 and c2 are the acceleration coefficients. r1 and r2 represent two random numbers in [0,1] (for different dimensions of a particle, r1 and r2 have different values). pbest refers to the position where the particle obtained its highest (lowest) fitness, and gbest refers to the position where the whole swarm obtained the highest (lowest) fitness. In this way, the optimal sampling rate may be found.
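The velocity and position updates can be sketched for a one-dimensional search, which is the shape of the problem when a single sampling rate is being tuned. Everything beyond the update rule is an assumption of this sketch: the coefficient values, the clamping to a search interval, and the toy fitness function used in the usage line. In the disclosed system the fitness would presumably reflect classifier performance at a given sampling rate, which is not modeled here.

```python
import random


def update_particle(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, r1=0.5, r2=0.5):
    """One PSO update in one dimension: returns (new position, new velocity)."""
    new_v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + new_v, new_v


def pso_minimize(fitness, low, high, particles=20, iterations=50, seed=0):
    """Search [low, high] for the value with the lowest fitness."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(particles)]
    vs = [0.0] * particles
    pbest = xs[:]                    # best position seen by each particle
    gbest = min(pbest, key=fitness)  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(particles):
            x, v = update_particle(
                xs[i], vs[i], pbest[i], gbest, r1=rng.random(), r2=rng.random()
            )
            x = min(max(x, low), high)  # keep the particle inside the interval
            xs[i], vs[i] = x, v
            if fitness(x) < fitness(pbest[i]):
                pbest[i] = x
                if fitness(x) < fitness(gbest):
                    gbest = x
    return gbest


# Toy usage: the fitness is a stand-in, not the disclosed objective.
best_rate = pso_minimize(lambda r: (r - 3.0) ** 2, 0.0, 10.0)
```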
[40] After selecting the original point and the sampling rate, new minority class samples are generated.
[41] In the formula:
$$X_{new} = X + rand(0,1) \times (M_i - X), \quad i = 1, 2, \ldots, N$$
[42] $X_{new}$ is a newly inserted sample; X is the selected original sample; rand(0,1) represents a random number between 0 and 1; $M_i$ is the best sampling point optimized by PSO among the nearest samples of the original sample X.
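The interpolation that generates a new minority-class sample can be sketched in a few lines. The function name is hypothetical, and the selection of the neighbor M by PSO is outside the scope of this sketch; the neighbor is simply passed in.

```python
import random


def synthesize(x, m, rng=random):
    """Generate a synthetic sample on the segment between x and its neighbor m.

    Implements X_new = X + rand(0,1) * (M - X), using one random factor for
    all features so the new point lies on the line from x to m.
    """
    gap = rng.random()  # random number in [0, 1)
    return [xi + gap * (mi - xi) for xi, mi in zip(x, m)]


# Example: interpolate between an original sample and a chosen neighbor.
new_sample = synthesize([0.0, 0.0], [2.0, 4.0], random.Random(0))
```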
[43] In Step 5, the primary decision tree module includes the following steps:
[44] 1) In the attribute space of the training samples, a region is segmented into two sub-regions and the output values of the sub-regions are determined. By recursively executing this step, a decision tree is constructed. The optimal segmentation variable j and segmentation point s are selected by solving
[45] $$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
[46] R1 and R2 represent the segmented sub-regions. By traversing the variable j and, for each fixed segmentation variable j, scanning the segmentation point s, the pair (j, s) that minimizes the above formula is found.
[47] 2) The region is segmented with the selected (j, s) and the corresponding output values are determined:
[48] $$R_1(j,s) = \{x \mid x^{(j)} \le s\}, \quad R_2(j,s) = \{x \mid x^{(j)} > s\}$$
[49] $$c_m = \frac{1}{N_m} \sum_{x_i \in R_m(j,s)} y_i, \quad x \in R_m, \; m = 1, 2$$
where $N_m$ is the number of training samples in $R_m$.
[50] 3) Steps 1) and 2) are called repeatedly on the two sub-regions until the stopping conditions are met.
[51] 4) The feature space is segmented into M regions R1, R2, R3, ..., RM and a decision tree is constructed:
[52] $$f(x) = \sum_{m=1}^{M} c_m I(x \in R_m)$$
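The search for the segmentation variable j and segmentation point s in step 1) can be sketched as an exhaustive scan that minimizes the summed squared error of the two sub-regions. The function name and the choice of candidate split points (the observed feature values) are assumptions of this sketch; a full tree would apply it recursively to each sub-region.

```python
def best_split(X, y):
    """Find the (j, s) pair minimizing the squared error of a two-way split.

    Returns (error, j, s, c1, c2), where c1 and c2 are the mean outputs of
    the regions x[j] <= s and x[j] > s, or None if no valid split exists.
    """
    best = None
    for j in range(len(X[0])):
        # Candidate split points: the distinct observed values of feature j.
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            if not left or not right:
                continue  # a split must leave samples on both sides
            c1 = sum(left) / len(left)
            c2 = sum(right) / len(right)
            error = sum((v - c1) ** 2 for v in left) + sum(
                (v - c2) ** 2 for v in right
            )
            if best is None or error < best[0]:
                best = (error, j, s, c1, c2)
    return best
```

With 0/1 labels, the region means c1 and c2 are the fractions of positive samples on each side, which is how a regression tree can serve a binary diagnosis task.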
[53] In Step 6: the MEP post-pruning algorithm is adopted, with the step including the following steps:
[54] 1) If there are k classes of samples in total, the probability of belonging to class i in the training sample at the decision tree node t is as follows:
[55] $$P_i(t) = \frac{n_i(t) + p_{ai} \cdot m}{n(t) + m}$$
[56] Wherein: $p_{ai}$ is the prior probability of the class-i samples, namely the proportion of class-i samples in the whole data set; m is the influence factor of the prior probability on the posterior probability, and m is not a fixed value. Then the prediction error rate $E_s(t)$ of the node t is defined by the following formula:
[57] $$E_s(t) = \min_i \{1 - P_i(t)\} = \min_i \left\{ \frac{n(t) - n_i(t) + (1 - p_{ai}) \cdot m}{n(t) + m} \right\}$$
[58] If the prior probabilities of all classes are the same, namely $p_{ai} = 1/k$ (i = 1, 2, ..., k), and m = k, then $E_s(t)$ can be expressed as
[59] $$E_s(t) = \frac{n(t) - n_c(t) + (k - 1)}{n(t) + k}$$
[60] In the above formulas: n(t) is the total number of samples at the node t; $n_i(t)$ is the number of class-i samples at the node t; $n_c(t)$ is the number of samples of the primary class at the node t.
[61] Finally, for each non-leaf node t, the error $E_s(t)$ of the node and the error $E(T_t)$ of the sub-tree rooted at t are calculated respectively; if $E_s(t)$ is greater than $E(T_t)$, the sub-tree is retained, otherwise the sub-tree is cut off.
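The node error used by MEP pruning can be sketched as below. The m-probability estimate and the node error follow the formulas of this step; the sub-tree error as a sample-weighted average of child errors and the prune-when-not-worse rule are the usual MEP conventions and are assumptions here, since the text does not spell them out. All function names are hypothetical.

```python
def class_probability(n_i, n_t, prior_i, m):
    """m-probability estimate of class i at a node with n_t samples."""
    return (n_i + prior_i * m) / (n_t + m)


def static_error(class_counts, priors, m):
    """Expected error at a node if it were turned into a leaf."""
    n_t = sum(class_counts)
    return min(
        1 - class_probability(n_i, n_t, p, m)
        for n_i, p in zip(class_counts, priors)
    )


def subtree_error(children):
    """Sample-weighted average of child errors; children = [(n, error), ...]."""
    total = sum(n for n, _ in children)
    return sum(n * e for n, e in children) / total


def should_prune(node_error, backed_up_error):
    """Assumed MEP rule: prune when the sub-tree is no better than a leaf."""
    return node_error <= backed_up_error


# Example: a node with 8 and 2 samples of two equally likely classes, m = 2.
node = static_error([8, 2], [0.5, 0.5], 2)
leaves = [
    (5, static_error([5, 0], [0.5, 0.5], 2)),
    (5, static_error([3, 2], [0.5, 0.5], 2)),
]
decision = should_prune(node, subtree_error(leaves))
```

With equal priors 1/k and m = k, `static_error` reduces to (n(t) - n_c(t) + k - 1) / (n(t) + k), which is the special case given in the text.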
[62] In Step 7: the system is constructed and the visual design is executed, including the following steps:
[63] The trained model is used to construct the system, and a visual operation interface is designed. After the user inputs his/her blood routine data into the system, the system diagnoses whether he/she suffers from asthma according to each entry of the user's data. After testing on a large amount of data, the validation accuracy of the system exceeds 96.5%, which is valuable in practical application.
[64] The above description only aims at providing preferred embodiments of the present disclosure, but not limiting the present disclosure in other forms. Anyone skilled in this art may use the technical content disclosed above to change or modify the embodiments herein into equivalent embodiments with equivalent variations to be applied in other fields. However, any simple modification, equivalent variation and modification made to the above embodiment according to the technical essence of the present disclosure without departing from the technical scheme content thereof still falls within the claimed scope of the technical scheme of the present disclosure.

Claims (4)

WHAT IS CLAIMED IS:
1. An asthma diagnosis system based on a decision tree and the improved SMOTE algorithm, comprising a primary module of data acquisition, a primary module of oversampling processing, a primary module of decision tree, a primary training module and a primary detection module;
the primary module of data acquisition is used for acquiring blood routine physical examination data, preprocessing the acquired data to obtain preprocessed data, and inputting the preprocessed data into the primary module of oversampling processing;
the primary module of oversampling processing is used for processing input data and dividing the processed and balanced data into two groups, namely a training sample set and a validation sample set;
the primary module of oversampling processing consists of a PSO optimization module, a newly generated sample detection module and a correlation sorting module;
the PSO optimization module is a SMOTE over-sampling method based on the PSO algorithm; in order to improve the accuracy of the model diagnosis, it is necessary to over-sample the minority-class asthma samples; to address the blind selection of neighbors caused by the fixed sampling rate of traditional SMOTE, PSO is used to optimize the over-sampling rate of SMOTE and select an optimal sampling rate;
the newly generated sample detection module addresses the fuzzy boundary issue of the sample points newly generated by SMOTE, and frames a space centered on each newly generated point; if the ratio of minority-class samples to majority-class samples in that space is less than 1/2, the newly generated sample is considered a "garbage point" and is discarded; otherwise, it is retained;
the correlation sorting module performs feature selection on the whole data set including the generated data, sorts the features according to their correlation, and selects the features ranked before the median as the data set for training the decision tree model;
the primary module of decision tree is used for constructing an asthma diagnosis model; as asthma diagnosis is a binary classification problem and the feature values are continuous, for which the CART regression tree algorithm is suitable, the CART regression tree algorithm is adopted; moreover, most of the data sets are small due to the unbalanced distribution of samples, and the ID3 and C4.5 algorithms respectively use information gain and information gain ratio for node calculations, which biases node selection toward features with many values and thereby affects the accuracy; therefore, the CART regression tree algorithm can better deal with continuous feature values, and it is more advantageous when the mean square deviation is used as the standard for selecting nodes; as the pre-pruning algorithm is simple but may lose important information, the MEP post-pruning algorithm is adopted; the MEP post-pruning algorithm requires no additional pruning set, so that it can be applied in a wider range; firstly, the K-fold cross-validation method is introduced to select the optimal influence factor m, and then m is substituted into the MEP algorithm to prune the original decision tree; in this way, a more accurate and precise decision tree can be obtained, and the influence characteristics of the decision tree can be retained at the same time;
the primary training module trains the constructed asthma diagnosis model by using the training sample set, and obtains the trained asthma diagnosis model; the specific process of this step is as follows: cross-validation and grid search are used to construct the decision tree model, wherein a fold number is selected for the cross-validation of the training set and testing set, and the ratio of training set to testing set is 4:1; the training set is used for model training and the testing set is used for model checking; each parameter range is divided into grid cells, and the results of different parameters are compared to find the global optimal or nearly global optimal target value and parameter solution;
the primary detection module is used for loading the trained asthma diagnosis model, and validating the trained asthma diagnosis model by using the validation sample set; if the trained asthma diagnosis model has an asthma diagnosis accuracy of greater than or equal to 85% on the validation sample set, the trained asthma diagnosis model is used as the final model, and the final model is used for asthma diagnosis; otherwise, the parameters of the constructed asthma model are adjusted, and the constructed asthma model is retrained by using the training sample set until the asthma diagnosis accuracy of the trained model on the validation sample set is greater than or equal to 85%, then the final model is obtained, and the final model is used for asthma diagnosis.
2. The asthma diagnosis system based on a decision tree and the improved SMOTE algorithm according to claim 1, wherein the primary module of data acquisition acquires blood routine physical examination data from hospitals, wherein the physical examination data of asthma patients are taken as positive samples, and a large number of physical examination data of people not suffering from asthma are taken as negative samples. Each examinee is taken as a sample, and each sample has 23 features as follows: gender, age, basophil ratio, basophil count, eosinophil ratio, eosinophil count, HCT, hemoglobin, lymphocyte ratio, lymphocyte count, average erythrocyte hemoglobin content, average erythrocyte hemoglobin concentration, average erythrocyte volume, monocyte ratio, monocyte count, average platelet volume, neutrophil ratio, neutrophil count, PCT, platelet distribution width, platelet count, red blood cell count, red blood cell distribution width, white blood cell count, diagnosis result, etc.
3. The asthma diagnosis system based on a decision tree and the improved SMOTE algorithm according to claim 1, wherein the primary module of over-sampling processing comprises the following processing steps: Firstly, the K-means clustering algorithm is used to cluster samples of minority classes to form fixed K clusters and record each cluster center, wherein:
$$E = \sum_{i=1}^{K} \sum_{x_j \in N_i} \lVert x_j - z_i \rVert^2$$
In the above formula, $x_j$ represents the j-th data sample in the data set; $N_i$ represents the i-th cluster; $z_i$ represents the cluster center of the i-th cluster.
m sampling points are selected from n samples nearest to the minority class sample. The sampling rate is optimized by the particle swarm optimization (PSO) algorithm. In the formula:
$$v_i^d = w \times v_i^d + c_1 \times r_1 \times (pbest_i^d - x_i^d) + c_2 \times r_2 \times (gbest^d - x_i^d)$$
$$x_i^d = x_i^d + v_i^d$$
w represents the inertia factor, whose value is non-negative; i represents the i-th particle and d represents the d-th dimension of the particle. r1 and r2 represent two random numbers in [0,1] (for different dimensions of a particle, r1 and r2 have different values). pbest refers to the position where the particle obtained its highest (lowest) fitness, and gbest refers to the position where the whole swarm obtained the highest (lowest) fitness. In this way, the optimal sampling rate may be found. After selecting the original point and the sampling rate, new minority class samples are generated. In the formula:
$$X_{new} = X + rand(0,1) \times (M_i - X), \quad i = 1, 2, \ldots, N$$
$X_{new}$ is a newly inserted sample; X is the selected original sample; rand(0,1) represents a random number between 0 and 1; $M_i$ is the best sampling point optimized by PSO among the nearest samples of the original sample X.
4. The asthma diagnosis system based on a decision tree and the improved SMOTE algorithm according to claim 1, wherein the primary decision tree module comprises the following processing steps: After the positive and negative samples are balanced, a CART regression tree is generated; The MEP post-pruning algorithm is adopted for the generated decision tree.
AU2021103976A 2021-03-22 2021-07-08 Asthma diagnosis system based on decision tree and improved SMOTE algorithm Ceased AU2021103976A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110302072.1 2021-03-22
CN202110302072.1A CN112951413B (en) 2021-03-22 2021-03-22 Asthma diagnosis system based on decision tree and improved SMOTE algorithm

Publications (1)

Publication Number Publication Date
AU2021103976A4 true AU2021103976A4 (en) 2021-09-09

Family

ID=76227537

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021103976A Ceased AU2021103976A4 (en) 2021-03-22 2021-07-08 Asthma diagnosis system based on decision tree and improved SMOTE algorithm

Country Status (3)

Country Link
CN (1) CN112951413B (en)
AU (1) AU2021103976A4 (en)
WO (1) WO2022198761A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114611616A (en) * 2022-03-16 2022-06-10 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN115169556A (en) * 2022-07-25 2022-10-11 美的集团(上海)有限公司 Model pruning method and device

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091026A (en) * 2021-11-25 2022-02-25 云南电网有限责任公司信息中心 Integrated learning-based network abnormal intrusion detection method and system
CN116434950B (en) * 2023-06-05 2023-08-29 山东建筑大学 Diagnosis system for autism spectrum disorder based on data clustering and ensemble learning
CN117198517B (en) * 2023-06-27 2024-04-30 安徽省立医院(中国科学技术大学附属第一医院) Modeling method of motion reactivity assessment and prediction model based on machine learning
CN117316295A (en) * 2023-09-13 2023-12-29 哈尔滨工业大学 Endocrine disease cell identification method based on cell heterogeneity gene and pathway function
CN117637154B (en) * 2024-01-27 2024-03-29 南通大学附属医院 Nerve internal department severe index prediction method and system based on optimization algorithm
CN117743957B (en) * 2024-02-06 2024-05-07 北京大学第三医院(北京大学第三临床医学院) Data sorting method and related equipment of Th2A cells based on machine learning
CN117766155B (en) * 2024-02-22 2024-05-10 中国人民解放军海军青岛特勤疗养中心 Dynamic blood pressure medical data processing system based on artificial intelligence

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN105930856A (en) * 2016-03-23 2016-09-07 深圳市颐通科技有限公司 Classification method based on improved DBSCAN-SMOTE algorithm
JP2020004178A (en) * 2018-06-29 2020-01-09 ルネサスエレクトロニクス株式会社 Learning model evaluation method, learning method, device, and program
CN109147949A (en) * 2018-08-16 2019-01-04 辽宁大学 A method of based on post-class processing come for detecting teacher's sub-health state
CN111145902A (en) * 2019-12-06 2020-05-12 江苏大学 Asthma diagnosis method based on improved artificial neural network
CN112102945B (en) * 2020-11-09 2021-02-05 电子科技大学 Device for predicting severe condition of COVID-19 patient

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114611616A (en) * 2022-03-16 2022-06-10 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN114611616B (en) * 2022-03-16 2023-02-07 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN115169556A (en) * 2022-07-25 2022-10-11 美的集团(上海)有限公司 Model pruning method and device
CN115169556B (en) * 2022-07-25 2023-08-04 美的集团(上海)有限公司 Model pruning method and device

Also Published As

Publication number Publication date
CN112951413B (en) 2023-07-21
WO2022198761A1 (en) 2022-09-29
CN112951413A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
AU2021103976A4 (en) Asthma diagnosis system based on decision tree and improved SMOTE algorithm
CN109350032B (en) Classification method, classification system, electronic equipment and storage medium
CN107066791A (en) A kind of aided disease diagnosis method based on patient's assay
CN110246577B (en) Method for assisting gestational diabetes genetic risk prediction based on artificial intelligence
CN111951965B (en) Panoramic health dynamic monitoring and predicting system based on time sequence knowledge graph
CN113274031B (en) Arrhythmia classification method based on depth convolution residual error network
CN111145902A (en) Asthma diagnosis method based on improved artificial neural network
Inan et al. A hybrid probabilistic ensemble based extreme gradient boosting approach for breast cancer diagnosis
CN113470816A (en) Machine learning-based diabetic nephropathy prediction method, system and prediction device
WO2021073255A1 (en) Time series clustering-based medication reminder method and related device
CN112652398A (en) New coronary pneumonia severe prediction method and system based on machine learning algorithm
CN108346474A (en) The electronic health record feature selection approach of distribution within class and distribution between class based on word
CN115691788A (en) Dual attention coupling network diabetes classification system based on heterogeneous data
CN116564521A (en) Chronic disease risk assessment model establishment method, medium and system
CN109907751B (en) Laboratory chest pain data inspection auxiliary identification method based on artificial intelligence supervised learning
CN113674824B (en) Disease coding method and system based on regional medical big data
Sari et al. Best performance comparative analysis of architecture deep learning on ct images for lung nodules classification
Zhang et al. A deep Bayesian neural network for cardiac arrhythmia classification with rejection from ECG recordings
Chandra et al. Application Of Machine Learning K-Nearest Neighbour Algorithm To Predict Diabetes
CN111261283B (en) Electrocardiosignal deep neural network modeling method based on pyramid convolution layer
CN117195027A (en) Cluster weighted clustering integration method based on member selection
Ali et al. Cardiovascular disease detection using multiple machine learning algorithms and their performance analysis
Xu et al. Hybrid label noise correction algorithm for medical auxiliary diagnosis
Ahouz et al. Extracting rules for diagnosis of diabetes using genetic programming
Chudacek et al. Comparison of seven approaches for holter ECG clustering and classification

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry