CN109545372B - Patient physiological data feature selection method based on greedy-of-distance strategy

Publication number
CN109545372B
Authority
CN
China
Prior art keywords
wolf
vector
value
distance
greedy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811313953.8A
Other languages
Chinese (zh)
Other versions
CN109545372A (en)
Inventor
钮焱
李军
童坤
刘宇强
李星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology
Priority to CN201811313953.8A
Publication of CN109545372A
Application granted
Publication of CN109545372B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The invention discloses a patient physiological data feature selection method based on a greedy distance strategy, which addresses the comparatively low performance of existing feature selection algorithms.

Description

Patient physiological data feature selection method based on greedy-of-distance strategy
Technical Field
The invention belongs to the field of medical technology, relates to a method for selecting features from patient physiological data, and in particular relates to a gray wolf feature selection method based on a greedy distance strategy.
Background
Science and technology are developing rapidly, and medical detection systems are continuously being updated and becoming more mature. Heart disease is a major threat to human health, so detecting it before onset is of great significance. Patient physiological data contain a large number of features, many of them redundant; the redundant features greatly increase the workload of heart disease detection and degrade its results. The gray wolf optimization algorithm (GWO) is a swarm intelligence algorithm currently in use. It locates the prey, that is, the optimal solution of the optimization problem, by simulating the hunting behavior of a wolf pack, and it is widely used for feature selection; however, the algorithm itself converges slowly and searches inefficiently. The invention provides an improved gray wolf algorithm for feature selection. It replaces the position-update part of the standard gray wolf algorithm with a greedy strategy and improves the efficiency of the search for the optimum, so that a better feature set can be extracted, which benefits sample detection.
The purpose of feature selection is to extract the important features from the data and remove the redundant ones. Feature selection can reduce data dimensionality, improve prediction performance, reduce overfitting, and improve the understanding of the relationship between features and feature values. Real-world data to be classified often contain many redundant features: some features can be replaced by others, and the replaced features can be removed during classification. Furthermore, the relationships between features strongly affect the classification output; if these relationships can be found, a large amount of information hidden in the data can be uncovered.
Feature selection algorithms fall into three categories: filter, embedded and wrapper methods. A filter method first selects features from the data set and then trains the classifier, so that feature selection is decoupled from the classifier. Its key is a measure of feature importance, such as the Pearson correlation coefficient or mutual information. The features are sorted by this measure, and the top-ranked ones are selected as the features used for classification. The drawback of this approach is that it ignores the interdependence between features. On the one hand, if some of the top-ranked features are strongly correlated with each other, selecting them introduces redundancy. On the other hand, a lower-ranked feature whose measure is small and whose value is not obvious on its own may predict well in combination with other features, so valuable features are lost. An embedded method integrates feature selection into the training of the learner, and both are completed in a single process, for example Lasso and ridge regression. The core idea of the wrapper method is that, given a training model and a method for evaluating prediction performance, different feature subsets of the feature space are evaluated by their prediction performance, and the subset with the best performance is chosen as the final training subset.
Because it takes the interdependence between features into account, the wrapper method selects feature subsets with better prediction performance than the filter method, but it is computationally expensive because the number of feature subsets is exponential. Different algorithms have been proposed for searching the whole feature space efficiently.
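To make the filter approach described above concrete, the following minimal sketch ranks features by the absolute Pearson correlation between each feature column and the label. It is an illustration only and not part of the invention; the function names and the toy data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)

def rank_features(rows, labels):
    """Return feature indices sorted by |correlation with the label|, best first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data (hypothetical): feature 0 tracks the label, feature 1 is constant.
rows = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.1], [3.0, 5.0, 0.4], [4.0, 5.0, 0.2]]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)
```

Because each feature is scored on its own, such a ranking cannot see that two highly ranked features are redundant with each other, which is the weakness of filter methods noted above.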
The genetic algorithm was the first intelligent algorithm used to solve this problem. Its idea derives from reproduction and heredity in natural biological populations: a solution of the optimization problem is regarded as a gene, and genetic exchange, including crossover and mutation, takes place across the whole population. The natural environment corresponds to the objective function, and genes that are well adapted to the environment are retained and passed on to the next generation. Genetic algorithms can solve complex nonlinear optimization problems, but they have disadvantages such as low efficiency and a tendency to fall into local optima.
Particle swarm optimization (PSO) stems from the study of the foraging behavior of bird flocks. Each potential solution of the optimization problem is regarded as a point in a d-dimensional search space, called a 'particle'. Every particle has a fitness value determined by the objective function and a velocity that determines the direction and distance of its flight, and the particles search the solution space by following the current best particle. Compared with traditional multi-objective optimization methods, PSO has clear advantages on multi-objective problems, but it suffers from low precision, a tendency to diverge, and similar drawbacks.
Disclosure of Invention
The invention aims to solve the problems that existing patient physiological data feature selection algorithms converge slowly, search inefficiently and easily fall into local optima, and provides a gray wolf feature selection method based on a distance greedy strategy that improves classification accuracy and reduces data feature redundancy.
The technical scheme adopted by the invention is as follows: a patient physiological data feature selection method based on a greedy-of-distance strategy, characterized by comprising the following steps:
Step 1: input the data captured from the patient's physiological data, and form a training set from the labeled sample data; the label marks the disease state represented by the patient's physiological data, and the disease state is either diseased or non-diseased;
Step 2: for the captured data, select the patient physiological data features using the gray wolf feature selection method based on the greedy distance strategy;
Step 2.1: initialize the current iteration count, the index of the wolf individual, the population size of the wolf pack and the position vector of each wolf individual; the position vector of each wolf individual represents a candidate solution of the feature selection problem;
Step 2.2: calculate the coding vector of each wolf from its position vector, and calculate the adaptive value of each wolf from its coding vector;
Step 2.3: set the maximum number of iterations maxiter, and select the three best wolves by adaptive value as α, β and δ;
Step 2.4: calculate the distance mapping of each wolf;
Step 2.5: update the coding vectors of α, β and δ according to the distance mapping of each wolf;
Step 2.6: determine whether t is greater than maxiter;
if so, execute step 3 below;
if not, set t = t + 1 and return to step 2.4;
Step 3: output the feature subset corresponding to the coding vector of α.
The invention addresses the comparatively low performance of existing feature selection algorithms. It improves the position-update part of the original gray wolf algorithm with a greedy strategy, which strengthens the algorithm's ability to exploit the optimal solution and speeds up convergence, and it can effectively improve classification accuracy and reduce data feature redundancy.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a graph comparing the detection error rate of the present invention with three other feature selection algorithms;
FIG. 3 is a comparison graph of feature selection numbers after feature selection in the present invention versus three other feature selection algorithms.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and examples. The embodiments described here merely illustrate and explain the invention and do not limit it.
The core of the invention is to treat the feature selection problem for medical data containing N features as a discrete combinatorial optimization problem in an N-dimensional binary space: each feature subset is represented by an N-dimensional binary vector, and the improved gray wolf optimization algorithm searches this N-dimensional binary space.
Referring to fig. 1, the method for selecting physiological data characteristics of a patient based on a greedy-of-distance strategy provided by the invention comprises the following steps:
Step 1: input the data captured from the patient's physiological data, and form a training set from the labeled sample data; the label marks the disease state in the patient's physiological data, which is either diseased or non-diseased: 0 represents normal, and 1 to 4 represent the degree of vasoconstriction;
In this embodiment, Z captured medical records containing N features are input. Each record in the data set is a sample, so the sample size is Z. Each input record is represented by a feature vector in which each dimension represents one feature of the data, and all samples with category labels form the training sample set T.
Step 2: for the captured data, select the patient physiological data features using the gray wolf feature selection method based on the greedy distance strategy;
Step 2.1: initialize the current iteration count, the index of the wolf individual, the population size of the wolf pack and the position vector of each wolf individual; the position vector of each wolf individual represents a candidate solution of the feature selection problem;
(1) initialize the current iteration count t = 1, the wolf index i = 1, and the population size of the wolf pack K;
(2) for the wolf individuals i = 1, 2, …, K, randomly initialize within (0, max) the position vector of each wolf in the pack
Figure BDA0001855769850000041
The vector dimension is N, where max denotes the maximum value of a wolf's position and is taken as 1;
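The initialization of step 2.1 can be sketched as follows. This is a minimal illustration under assumptions: positions are drawn uniformly (the text only says "randomly"), and the function names and the `seed` parameter are hypothetical. The pack size 12 is the value reported in the experiment; 13 is the number of non-label attributes in the experiment's attribute list.

```python
import random

def draw_open(rng, max_value):
    """Draw one value strictly inside the open interval (0, max_value)."""
    value = rng.random() * max_value
    while not 0.0 < value < max_value:
        value = rng.random() * max_value
    return value

def init_positions(pack_size, n_features, max_value=1.0, seed=None):
    """Return pack_size position vectors, each of dimension n_features.

    Assumption: a uniform distribution over (0, max_value).
    """
    rng = random.Random(seed)
    return [[draw_open(rng, max_value) for _ in range(n_features)]
            for _ in range(pack_size)]

# K = 12 wolves, N = 13 features, max = 1 as in this embodiment.
positions = init_positions(pack_size=12, n_features=13, seed=0)
```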
Step 2.2: calculate the coding vector of each wolf from its position vector, and calculate the adaptive value of each wolf from its coding vector;
(1) find a mapping function f that maps values in the interval (0, max) into the discrete set {0, 1}, such that there is a number δ in (0, max) for which f(temp1) < f(temp2) for all temp1 ∈ (0, δ) and temp2 ∈ [δ, max), so that the continuous feature vector
Figure BDA0001855769850000042
becomes a binary coding vector containing only 0 and 1
Figure BDA0001855769850000043
The following function is selected as the mapping function in this embodiment:
Figure BDA0001855769850000044
where position(i, j) denotes the value of the j-th dimension of
Figure BDA0001855769850000045
and the result at index (i, j) is the value of the j-th dimension of the vector
Figure BDA0001855769850000046
This function converts the position of a gray wolf from continuous values into binary coded values 0 and 1, so that it can be used in the feature selection algorithm.
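The mapping function of this embodiment is given only as a figure, so the sketch below substitutes a simple threshold function as an assumption. It satisfies the stated requirement with δ equal to the threshold: every value below δ maps to 0 and every value at or above δ maps to 1. The threshold 0.5 and the function name are hypothetical.

```python
def to_code_vector(position, threshold=0.5):
    """Map a continuous position vector in (0, max) to a 0/1 coding vector.

    Assumption: a hard threshold stands in for the mapping function that
    the original text shows only as a figure.
    """
    return [1 if value >= threshold else 0 for value in position]

code = to_code_vector([0.1, 0.5, 0.93, 0.49])
```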
(2) In the coding vector of each wolf
Figure BDA0001855769850000051
Figure BDA0001855769850000051
1 means that the feature is selected and 0 means that it is not. According to the coding vector
Figure BDA0001855769850000052
the selected features of the training set T are retained and the unselected features are deleted, giving a new training set T_solution.
(3) A classifier is used to compute the average classification accuracy (or the classification error rate) on T_solution, and this value is taken as the adaptive value P_i corresponding to the wolf's coding vector
Figure BDA0001855769850000053
Different classifiers, such as an SVM (support vector machine) or an artificial neural network, can be chosen according to the actual situation; this embodiment uses a KNN classifier with K = 5 in KNN;
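The adaptive value (fitness) computation above can be sketched with a small pure-Python KNN. This is an illustration under assumptions: the text does not say how the error rate is estimated, so a held-out evaluation set is assumed; an empty feature subset is assigned the worst error rate 1.0; all function names and the toy data are hypothetical.

```python
from collections import Counter

def project(rows, code):
    """Keep only the columns whose bit in the coding vector is 1."""
    return [[v for v, bit in zip(row, code) if bit == 1] for row in rows]

def knn_predict(train_rows, train_labels, query, k=5):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

def error_rate(code, train_rows, train_labels, eval_rows, eval_labels, k=5):
    """Classification error rate of KNN restricted to the selected features."""
    if sum(code) == 0:
        return 1.0  # assumption: selecting no feature is the worst case
    train_sel = project(train_rows, code)
    eval_sel = project(eval_rows, code)
    wrong = sum(
        1 for row, label in zip(eval_sel, eval_labels)
        if knn_predict(train_sel, train_labels, row, k) != label
    )
    return wrong / len(eval_labels)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
train_rows = [[0.0, 9.0], [0.1, 1.0], [0.2, 5.0], [1.0, 2.0], [1.1, 8.0], [1.2, 4.0]]
train_labels = [0, 0, 0, 1, 1, 1]
eval_rows = [[0.05, 7.0], [1.05, 3.0]]
eval_labels = [0, 1]
p_good = error_rate([1, 0], train_rows, train_labels, eval_rows, eval_labels, k=3)
p_none = error_rate([0, 0], train_rows, train_labels, eval_rows, eval_labels, k=3)
```

With the error rate as the adaptive value, a lower value means a better wolf, which matches the convention used in step 2.3.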
Step 2.3: set the maximum number of iterations maxiter, and select the three best wolves by adaptive value as α, β and δ;
Set the maximum number of iterations maxiter, and then take the coding vector of the wolf with the best adaptive value P_i as the coding vector of α. Which adaptive value counts as best is relative and depends on the meaning of the chosen adaptive value function. The invention uses the classification error rate as the adaptive value of a wolf: the lower the classification error rate, the better the classification and the better the wolf. The initialization of α, β and δ therefore consists of the following three substeps:
(1) select the wolf with the lowest adaptive value P_i
Figure BDA0001855769850000054
and initialize the coding vector of α
Figure BDA0001855769850000055
to the coding vector of wolf j
Figure BDA0001855769850000056
(2) after removing j, select from the remaining wolf individuals the wolf with the lowest adaptive value P_i
Figure BDA0001855769850000057
and initialize the coding vector of β
Figure BDA0001855769850000058
to the coding vector of wolf n
Figure BDA0001855769850000059
(3) after removing n, select from the remaining wolf individuals the wolf with the lowest adaptive value P_i
Figure BDA00018557698500000510
and initialize the coding vector of δ
Figure BDA00018557698500000511
to the coding vector of wolf m
Figure BDA00018557698500000512
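The three substeps above amount to taking the three wolves with the lowest error rate. A minimal sketch, with hypothetical function and variable names, sorts the wolf indices by adaptive value instead of deleting the selected wolves one at a time; the result is the same.

```python
def select_leaders(code_vectors, adaptive_values):
    """Return [(code, value)] for alpha, beta, delta: the three lowest error rates."""
    if len(adaptive_values) < 3:
        raise ValueError("at least three wolves are needed")
    order = sorted(range(len(adaptive_values)), key=lambda i: adaptive_values[i])
    # Copy each coding vector so later changes to the pack do not alter the leaders.
    return [(list(code_vectors[i]), adaptive_values[i]) for i in order[:3]]

codes = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
errors = [0.30, 0.10, 0.25, 0.40, 0.20]
alpha, beta, delta = select_leaders(codes, errors)
```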
Step 2.4: calculate the distance mapping of each wolf;
This step is the core and the innovation of the invention. It remedies the shortcomings of the existing gray wolf optimization algorithm, strengthens the algorithm's ability to exploit the optimal solution, speeds up convergence, and can effectively improve classification accuracy and reduce data feature redundancy.
In this embodiment, a greedy strategy is used to calculate the distance mapping of each wolf; the specific implementation comprises the following substeps:
Step 2.4.1: compute the continuous-coded distance vector from the selected α, β and δ
Figure BDA0001855769850000061
Figure BDA0001855769850000062
where
Figure BDA0001855769850000063
denote three different random vectors of the parameter
Figure BDA0001855769850000064
and the parameter
Figure BDA0001855769850000065
is calculated in step 2.4.2;
Figure BDA0001855769850000066
and
Figure BDA0001855769850000067
denote the distances from the individual to α, β and δ, and are defined as follows:
Figure BDA0001855769850000068
Figure BDA0001855769850000069
Figure BDA00018557698500000610
where
Figure BDA00018557698500000611
denote three different random vectors of the parameter
Figure BDA00018557698500000612
and the parameter
Figure BDA00018557698500000613
is calculated in step 2.4.2;
Figure BDA00018557698500000614
and
Figure BDA00018557698500000615
denote the position vectors of α, β and δ at the t-th iteration;
Figure BDA00018557698500000616
is an intermediate parameter representing the final position of each wolf after it moves toward α, β and δ at the t-th iteration; it is defined as follows:
Figure BDA00018557698500000617
Figure BDA00018557698500000618
Figure BDA00018557698500000619
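The formulas of step 2.4.1 survive only as figures in this text. As a hedged illustration, the sketch below uses the widely published standard gray wolf optimization forms: the distance to a leader D = |C · X_leader − X|, the intermediate position X_k = X_leader − A · D, and the mean of the three intermediate positions. Whether the figures match these forms exactly is an assumption, and the function names are hypothetical.

```python
def leader_pull(leader, wolf, A, C):
    """Intermediate position toward one leader, per dimension.

    Standard GWO form (assumed): X_k = X_leader - A * |C * X_leader - X|.
    """
    return [xl - a * abs(c * xl - x) for xl, x, a, c in zip(leader, wolf, A, C)]

def continuous_candidate(alpha, beta, delta, wolf, coeffs):
    """Average of the three intermediate positions (standard GWO, assumed).

    coeffs holds three (A, C) pairs, one per leader, each a pair of vectors.
    """
    x1 = leader_pull(alpha, wolf, *coeffs[0])
    x2 = leader_pull(beta, wolf, *coeffs[1])
    x3 = leader_pull(delta, wolf, *coeffs[2])
    return [(p + q + r) / 3.0 for p, q, r in zip(x1, x2, x3)]

# With A = 0 the wolf lands exactly on each leader, so the result is their mean.
zero_a = ([0.0, 0.0], [1.0, 1.0])
candidate = continuous_candidate(
    [1.0, 0.0], [0.0, 0.0], [0.5, 0.3], [0.2, 0.2], [zero_a, zero_a, zero_a]
)
```

The parameters A and C passed in here are the ones that step 2.4.2 calculates.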
Step 2.4.2: calculate the parameters
Figure BDA00018557698500000620
and a, using the following formulas:
Figure BDA00018557698500000621
Figure BDA00018557698500000622
Figure BDA00018557698500000623
where
Figure BDA0001855769850000071
takes values in the range [0, 1]; a is a control parameter that balances the exploitation and exploration of the algorithm and decreases linearly from 2 to 0 as the number of iterations increases; t is the current iteration count, and maxiter is the total number of iterations of the algorithm;
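The three formulas of step 2.4.2 are also shown only as figures. The sketch below assumes the standard gray wolf optimization definitions, which agree with the description in the text (a falls linearly from 2 to 0; the random vectors take values in [0, 1]): a = 2 − 2t/maxiter, A = 2a·r1 − a and C = 2·r2. The function names are hypothetical.

```python
import random

def control_parameter(t, max_iter):
    """a decreases linearly from 2 (t = 0) to 0 (t = max_iter)."""
    return 2.0 - 2.0 * t / max_iter

def draw_coefficients(a, n_dims, rng):
    """Per-dimension coefficient vectors A and C (standard GWO form, assumed).

    A = 2 * a * r1 - a lies in [-a, a); C = 2 * r2 lies in [0, 2),
    with r1 and r2 drawn independently from [0, 1).
    """
    A = [2.0 * a * rng.random() - a for _ in range(n_dims)]
    C = [2.0 * rng.random() for _ in range(n_dims)]
    return A, C

rng = random.Random(1)
a = control_parameter(t=1, max_iter=6)  # maxiter = 6 as in the experiment
A, C = draw_coefficients(a, n_dims=13, rng=rng)
```

A large |A| early in the run pushes wolves away from the leaders (exploration), and as a shrinks the wolves close in on them (exploitation).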
Step 2.4.3: compute
Figure BDA0001855769850000072
Figure BDA0001855769850000073
where
Figure BDA0001855769850000074
denotes the value of the n-th dimension of the following vector calculated in step 2.4.1,
Figure BDA0001855769850000075
Figure BDA0001855769850000076
denotes the value of the n-th dimension of the vector
Figure BDA0001855769850000077
B denotes the maximum value of the assumed problem search interval, and
Figure BDA0001855769850000078
denotes the mapping function obtained from
Figure BDA0001855769850000079
over different problem search intervals;
Step 2.4.4: calculate X_d, change and hold;
Figure BDA00018557698500000710
Figure BDA00018557698500000711
Figure BDA00018557698500000712
where
Figure BDA00018557698500000713
is the value of the d-th dimension of
Figure BDA00018557698500000714
Figure BDA00018557698500000715
denotes the binary coding vector of the individual, and X_d denotes the value of the d-th dimension of that binary coding vector; d_d is the value of the d-th dimension of the continuous coding vector
Figure BDA00018557698500000716
the random number takes values in the interval [0, 1]; hold and change denote the values obtained by operating on
Figure BDA00018557698500000717
which are taken as the value of X_d.
Step 2.5: update the coding vectors of α, β and δ according to the distance mapping of each wolf;
To update the coding vectors of α, β and δ, sort the updated adaptive values of the wolf individuals, take the adaptive values P_α', P_β' and P_δ' of the three best-ranked wolves, and compare them with the adaptive values P_α, P_β and P_δ of the original α, β and δ respectively. If a new adaptive value P_i' is better than the original adaptive value P_i, the corresponding coding vector
Figure BDA0001855769850000081
is updated to the coding vector corresponding to the new adaptive value
Figure BDA0001855769850000082
Otherwise, no update is performed.
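The leader update of step 2.5 can be sketched as a rank-by-rank comparison. This is a minimal illustration assuming, as in step 2.3, that the adaptive value is an error rate, so "better" means strictly lower; the function name and the data are hypothetical.

```python
def update_leaders(leaders, code_vectors, adaptive_values):
    """Compare the three best updated wolves with alpha, beta, delta in order.

    leaders is a list of three (code_vector, adaptive_value) pairs.
    A leader is replaced only if the challenger of the same rank is better.
    """
    order = sorted(range(len(adaptive_values)), key=lambda i: adaptive_values[i])[:3]
    updated = []
    for (old_code, old_value), i in zip(leaders, order):
        if adaptive_values[i] < old_value:
            updated.append((list(code_vectors[i]), adaptive_values[i]))
        else:
            updated.append((old_code, old_value))
    return updated

leaders = [([1, 0], 0.10), ([0, 1], 0.20), ([1, 1], 0.30)]
new_codes = [[1, 1], [0, 1], [1, 0]]
new_values = [0.25, 0.05, 0.22]
leaders_next = update_leaders(leaders, new_codes, new_values)
```

Here α and δ are replaced, while β keeps its coding vector because the second-best challenger (0.22) is not better than the old β (0.20).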
Step 2.6: determine whether t is greater than maxiter;
if so, execute step 3 below;
if not, set t = t + 1 and return to step 2.4;
Step 3: output the feature subset corresponding to the coding vector of α.
The coding vector of α
Figure BDA0001855769850000083
is a binary string representing the optimal feature subset, in which 1 means that the feature is selected and 0 means that it is not; the features corresponding to the dimensions of
Figure BDA0001855769850000084
whose value is 1 are extracted and output.
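Decoding the final coding vector of α into a feature subset is a simple selection by bit. In the sketch below the attribute names are taken from the experiment's attribute list (without the label, status); the function name and the example coding vector are hypothetical.

```python
FEATURE_NAMES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slop", "ca", "thal",
]

def selected_features(code, names):
    """Return the names whose bit in the coding vector is 1."""
    if len(code) != len(names):
        raise ValueError("coding vector and name list must have the same length")
    return [name for name, bit in zip(names, code) if bit == 1]

# Hypothetical alpha coding vector selecting four of the thirteen features.
alpha_code = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
subset = selected_features(alpha_code, FEATURE_NAMES)
```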
The effects of the present invention will be further described below by comparative experiments.
(1) Simulation conditions;
The data set used in the experiment is a set of heart disease data from the UCI database, divided equally into two parts, one used as the training set and the other as the test set. All methods in the experiment are implemented in MATLAB.
(2) Experimental content and results;
A set of heart disease data from the UCI database is used as the data set, and the KNN algorithm in MATLAB is used as the sample classifier. The original GWO, the improved GWO, the genetic algorithm (GA) and particle swarm optimization (PSO) are used as the feature selection algorithms. With the sample classification error rate and the final number of selected features as comparison indexes, the average performance of the four feature selection algorithms is compared for different numbers of runs.
The data set used in the experiment is the heart disease data set provided by the UCI database. It contains 303 records, each of which records the physiological indicators of a heart patient. Each record consists of 14 attributes: 13 features and one label (status). The population size of the wolf pack is set to 12, the number of iterations maxiter of the algorithm is set to 6, and KNN with K = 5 is used as the classifier.
The 14 data attributes are: age is the patient's age; sex is the patient's gender, where 0 denotes female and 1 denotes male; cp is the patient's chest pain type, one of four types 1, 2, 3 and 4; trestbps is the patient's resting blood pressure; chol is the patient's cholesterol value; fbs is the patient's fasting plasma glucose level; restecg is the patient's electrocardiogram result, where 0 means normal, 1 means mild and 2 means severe; thalach is the patient's maximum heart rate; exang indicates whether the patient has exercise-induced angina, where 0 indicates present and 1 indicates absent; oldpeak is the ST-segment depression induced by exercise; slop is the slope of the ST segment during exercise; ca is the number of vessels seen in the patient's fluoroscopy; thal is the patient's defect type, one of 3, 6 and 7; status is the patient's disease status, where 0 means normal and 1 to 4 indicate the degree of vasoconstriction.
The experiments compare the performance indexes of the four feature selection algorithms for different numbers of algorithm runs, increasing from 20 to 200. The abscissa of FIG. 2 and FIG. 3 is the experiment index: 1 means the first experiment, with 20 runs, and 10 means the tenth experiment, with 200 runs. Errorb and count denote the error rate and the number of selected features when the original gray wolf algorithm is used for feature selection, and Errore and count denote the error rate and the number of selected features when the improved gray wolf algorithm is used. As FIG. 2 shows, except in the 2nd experiment (40 runs), the classification accuracy of the improved algorithm is better than that of all the other algorithms; its average error rate is below 1.85%, a clear improvement, and its fluctuation is small, so its behavior is stable. As FIG. 3 shows, the average number of features selected by the improved algorithm is below 3.85 in all ten experiments, lower than for both PSO and GA. Compared with the gray wolf algorithm used for comparison, the number of selected features is greatly reduced and fluctuates little.
In conclusion, the experiments show that under the same conditions the algorithm achieves better feature selection results. In the longitudinal comparison, its detection error rate after feature selection is better than that of the original BGWO, and its number of selected features is better than that of the improved EBGWO, so overall it combines the advantages of the two algorithms. In the transverse comparison, it is better than PSO and GA in both the number of selected features and the detection error rate; it converges quickly and achieves good results within few iterations.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. A patient physiological data feature selection method based on a greedy-of-distance strategy, characterized by comprising the following steps:
Step 1: input the data captured from the patient's physiological data, and form a training set from the labeled sample data; the label marks the disease state represented by the patient's physiological data, and the disease state is either diseased or non-diseased;
Step 2: for the captured data, select the patient physiological data features using the gray wolf feature selection method based on the greedy distance strategy;
Step 2.1: initialize the current iteration count, the index of the wolf individual, the population size of the wolf pack and the position vector of each wolf individual; the position vector of each wolf individual represents a candidate solution of the feature selection problem;
wherein the iteration count is initialized to t = 1 and the population size of the wolf pack to K, and for the wolf individuals i = 1, 2, …, K, the position vector of each wolf in the pack is randomly initialized within (0, max)
Figure FDA0002987005230000011
The vector dimension is N, where max denotes the maximum value of a wolf's position;
Step 2.2: calculate the coding vector of each wolf from its position vector, and calculate the adaptive value of each wolf from its coding vector;
Step 2.3: set the maximum number of iterations maxiter, and select the three best wolves by adaptive value as α, β and δ;
Step 2.4: calculate the distance mapping of each wolf;
wherein the distance mapping of each wolf is calculated using a greedy strategy; the specific implementation comprises the following substeps:
Step 2.4.1: compute the continuous-coded distance vector from the selected α, β and δ
Figure FDA0002987005230000012
Figure FDA0002987005230000013
where
Figure FDA0002987005230000014
denote three different random vectors of the parameter
Figure FDA0002987005230000015
and the parameter
Figure FDA0002987005230000016
is calculated in step 2.4.2;
Figure FDA0002987005230000017
and
Figure FDA0002987005230000018
denote the distances from the individual to α, β and δ, and are defined as follows:
Figure FDA0002987005230000019
Figure FDA00029870052300000110
Figure FDA00029870052300000111
where
Figure FDA0002987005230000021
denote three different random vectors of the parameter
Figure FDA0002987005230000022
and the parameter
Figure FDA0002987005230000023
is calculated in step 2.4.2;
Figure FDA0002987005230000024
and
Figure FDA0002987005230000025
denote the position vectors of α, β and δ at the t-th iteration;
Figure FDA0002987005230000026
is an intermediate parameter representing the final position of each wolf after it moves toward α, β and δ at the t-th iteration; it is defined as follows:
Figure FDA0002987005230000027
Figure FDA0002987005230000028
Figure FDA0002987005230000029
step 2.4.2: calculating parameters
Figure FDA00029870052300000210
And a, calculated using the following formula:
Figure FDA00029870052300000211
Figure FDA00029870052300000212
Figure FDA00029870052300000213
wherein [Figure FDA00029870052300000214] takes values in the range [0, 1]; a is a control parameter that balances the exploitation and exploration abilities of the algorithm and decreases linearly from 2 to 0 as the number of iterations increases; t is the current iteration number, and maxiter is the total number of iterations of the algorithm;
step 2.4.3: computing
Figure FDA00029870052300000215
Figure FDA00029870052300000216
wherein [Figure FDA00029870052300000217] represents the value of the n-th dimension of [Figure FDA00029870052300000218] calculated in step 2.4.1, and [Figure FDA00029870052300000219] represents the value of the n-th dimension of the vector [Figure FDA00029870052300000220]; b represents the maximum value of the assumed problem search interval, and [Figure FDA00029870052300000221] represents the mapping function obtained from [Figure FDA00029870052300000222] under different problem search intervals;
step 2.4.4: calculating X_d, change and hold:
Figure FDA0002987005230000031
Figure FDA0002987005230000032
Figure FDA0002987005230000033
wherein [Figure FDA0002987005230000034] is the value of the d-th dimension of [Figure FDA0002987005230000035]; [Figure FDA0002987005230000036] represents the binary coded vector of an individual, and X_d represents the d-th dimension value of each individual's binary coded vector; d_d is the d-th dimension value of the continuous coded vector [Figure FDA0002987005230000037], and the random number takes values in the interval [0, 1]; hold and change represent operations on [Figure FDA0002987005230000038], and the value obtained after the operation is taken as the value of X_d;
step 2.5: updating the coding vectors of alpha, beta and delta according to the distance mapping of each wolf;
wherein updating the coding vectors of alpha, beta and delta comprises: sorting the updated adaptive values of the individual wolves, selecting the adaptive values P_α', P_β' and P_δ' of the three wolves with the best adaptive values, and comparing them with the corresponding adaptive values P_α, P_β and P_δ of the original alpha, beta and delta; if a new adaptive value P_i' is superior to the original adaptive value P_i, the corresponding coding vector [Figure FDA0002987005230000039] is updated to the coding vector [Figure FDA00029870052300000310] corresponding to the new adaptive value; otherwise, it is not updated;
step 2.6: judging whether t is larger than the maximum iteration number maxiter;
if yes, executing step 3;
if not, setting t = t + 1 and returning to step 2.4;
step 3: outputting the feature subset corresponding to the coding vector of alpha.
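The parameter schedule of step 2.4.2 and the leader update of step 2.5 can be sketched as follows. This is only an illustration under stated assumptions: the formula images of the claim are not reproduced here, so the expressions for A and C follow the commonly published grey wolf optimizer form (A = 2·a·r1 − a, C = 2·r2), which may differ from the patented formulas. The function names, the `(coding_vector, adaptive_value)` pair layout, and the convention that a larger adaptive value is better are all hypothetical choices made for the example.

```python
import random


def control_parameter(t, maxiter):
    """Linearly decrease a from 2 (at t = 0) to 0 (at t = maxiter)."""
    return 2.0 - 2.0 * t / maxiter


def coefficient_vectors(a, dim, rng):
    """Assumed standard GWO form: A = 2*a*r1 - a and C = 2*r2, with r1, r2 in [0, 1)."""
    A = [2.0 * a * rng.random() - a for _ in range(dim)]
    C = [2.0 * rng.random() for _ in range(dim)]
    return A, C


def update_leaders(leaders, pack):
    """Step 2.5: compare the pack's three best wolves with alpha, beta, delta in order.

    leaders and pack are lists of (coding_vector, adaptive_value) pairs.
    Assumption: a larger adaptive value (e.g. accuracy) is better.
    """
    top3 = sorted(pack, key=lambda wolf: wolf[1], reverse=True)[:3]
    # Replace a leader only when the corresponding new value is strictly better.
    return [new if new[1] > old[1] else old for old, new in zip(leaders, top3)]


rng = random.Random(0)
a = control_parameter(t=5, maxiter=10)
A, C = coefficient_vectors(a, dim=4, rng=rng)
```

If the adaptive value is a classification error rate rather than an accuracy, the sort order and the comparison in `update_leaders` would need to be reversed.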
2. The greedy-of-distance-strategy-based patient physiological data feature selection method as recited in claim 1, wherein: in step 1, each item of collected and labeled data is represented by a feature vector, and each dimension of the vector represents one feature of the data.
3. The greedy-of-distance-strategy-based patient physiological data feature selection method as recited in claim 1, wherein: in step 2.2, a mapping function f is found that maps the values in the interval (0, max) into the discrete set {0, 1} and ensures that there is a number δ in (0, max) such that f(temp1) < f(temp2) for all temp1 ∈ (0, δ) and temp2 ∈ [δ, max), so that the continuous feature vector [Figure FDA0002987005230000041] becomes a binary coded vector [Figure FDA0002987005230000042] containing only 0 and 1.
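One function that satisfies the condition of claim 3 is a step function at δ: every value below δ maps to 0 and every value at or above δ maps to 1, so f(temp1) = 0 < 1 = f(temp2). The sketch below assumes this particular choice; the claim itself does not fix f or the threshold, and the names `make_threshold_map` and `binarize` are hypothetical. Note that δ here is the threshold of the claim, not the delta wolf.

```python
def make_threshold_map(threshold):
    """Return f: (0, max) -> {0, 1} with f(x) = 0 below the threshold and 1 otherwise."""
    def f(value):
        return 1 if value >= threshold else 0
    return f


def binarize(position, f):
    """Turn a continuous position vector into a binary coding vector."""
    return [f(value) for value in position]


f = make_threshold_map(0.5)
code = binarize([0.1, 0.5, 0.9], f)
```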
4. The greedy-of-distance-strategy-based patient physiological data feature selection method as recited in claim 1, wherein: in step 2.2, the adaptive value of each wolf is calculated according to the binary coded vector [Figure FDA0002987005230000043] of the wolf; in the coding vector [Figure FDA0002987005230000044] of each wolf, 1 represents that a feature is selected and 0 represents that it is not selected; let T_solution be the training set obtained from the training set T under the features selected by the coding vector [Figure FDA0002987005230000045]; after T_solution is classified with a classifier, the average precision or the classification error rate P_i is calculated, and the accuracy is used as the adaptive value corresponding to the coding vector [Figure FDA0002987005230000046] of the wolf pack.
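The adaptive value calculation of claim 4 can be illustrated with a small, dependency-free sketch. The claim does not name a classifier, so a leave-one-out 1-nearest-neighbour classifier is assumed here purely for illustration; the function names, the toy data, and the rule that an empty feature subset scores 0 are likewise hypothetical.

```python
def select_features(rows, code):
    """Build T_solution: keep only the columns whose bit in the coding vector is 1."""
    return [[v for v, bit in zip(row, code) if bit == 1] for row in rows]


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((p - q) ** 2 for p, q in zip(row, rows[j])),
        )
        correct += labels[nearest] == label
    return correct / len(rows)


def fitness(rows, labels, code):
    """Adaptive value of one wolf: classification accuracy on the selected features."""
    if not any(code):
        return 0.0  # assumption: selecting no feature gets the worst score
    return loo_1nn_accuracy(select_features(rows, code), labels)


# Toy training set T: the first feature separates the classes, the second does not.
T = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
y = [0, 0, 1, 1]
```

On this toy set, the coding vector `[1, 0]` keeps the informative feature and the vector `[0, 1]` keeps the misleading one, so the first should score higher than the second.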
CN201811313953.8A 2018-11-06 2018-11-06 Patient physiological data feature selection method based on greedy-of-distance strategy Active CN109545372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811313953.8A CN109545372B (en) 2018-11-06 2018-11-06 Patient physiological data feature selection method based on greedy-of-distance strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811313953.8A CN109545372B (en) 2018-11-06 2018-11-06 Patient physiological data feature selection method based on greedy-of-distance strategy

Publications (2)

Publication Number Publication Date
CN109545372A CN109545372A (en) 2019-03-29
CN109545372B true CN109545372B (en) 2021-07-06

Family

ID=65846544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811313953.8A Active CN109545372B (en) 2018-11-06 2018-11-06 Patient physiological data feature selection method based on greedy-of-distance strategy

Country Status (1)

Country Link
CN (1) CN109545372B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382366B (en) * 2020-03-03 2022-11-25 重庆邮电大学 Social network user identification method and device based on language and non-language features
CN112002419B (en) * 2020-09-17 2023-09-26 吾征智能技术(北京)有限公司 Disease auxiliary diagnosis system, equipment and storage medium based on clustering

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832830A (en) * 2017-11-17 2018-03-23 湖北工业大学 Intruding detection system feature selection approach based on modified grey wolf optimized algorithm

Also Published As

Publication number Publication date
CN109545372A (en) 2019-03-29

Similar Documents

Publication Publication Date Title
Purwar et al. Hybrid prediction model with missing value imputation for medical data
Zriqat et al. A comparative study for predicting heart diseases using data mining classification methods
Mucherino et al. Data mining in agriculture
CN111180068A (en) Chronic disease prediction system based on multi-task learning model
CN110110753B (en) Effective mixed characteristic selection method based on elite flower pollination algorithm and ReliefF
CN109545372B (en) Patient physiological data feature selection method based on greedy-of-distance strategy
CN113962278A (en) Intelligent ensemble learning classification method based on clustering
Bicego K-Random Forests: A K-means style algorithm for Random Forest clustering
CN116628510A (en) Self-training iterative artificial intelligent model training method
CN110400610B (en) Small sample clinical data classification method and system based on multichannel random forest
Rattan et al. Artificial intelligence and machine learning: what you always wanted to know but were afraid to ask
Thinsungnoen et al. Deep autoencoder networks optimized with genetic algorithms for efficient ECG clustering
CN113707317A (en) Disease risk factor importance analysis method based on mixed model
Thakkar et al. Metaheuristics in classification, clustering, and frequent pattern mining
Angayarkanni Predictive analytics of chronic kidney disease using machine learning algorithm
CN114255865A (en) Diagnosis and treatment project prediction method based on recurrent neural network
CN113378946A (en) Robust multi-label feature selection method considering feature label dependency
CN112800224A (en) Text feature selection method and device based on improved bat algorithm and storage medium
Kecman et al. Adaptive local hyperplane for regression tasks
Aslan An Artificial Bee Colony-Guided Approach for Electro-Encephalography Signal Decomposition-Based Big Data Optimization
CN114565972B (en) Skeleton action recognition method, system, equipment and storage medium
Punjabi et al. Enhancing Performance of Lazy Learner by Means of Binary Particle Swarm Optimization
CN116130110A (en) Biological big data analysis, disease precise identification, classification and prediction system based on algorithm and blockchain and application
Priya et al. Deep learning-based breast cancer disease prediction framework for medical industries
Kiage A data mining approach for forecasting cancer threats

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant