CN111709441A - Behavior recognition feature selection method based on improved feature subset discrimination - Google Patents
- Publication number
- CN111709441A
- Authority
- CN
- China
- Prior art keywords
- feature
- dfs
- subset
- features
- redundancy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
- G06F18/21343—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis using decorrelation or non-stationarity, e.g. minimising lagged cross-correlations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a behavior recognition feature selection method based on improved feature subset discrimination. The method comprises: establishing a sample feature set based on the DFS feature subset discrimination criterion and obtaining the feature subset discrimination measure $DFS_S$; obtaining the mutual information between two random variables based on concepts from information theory and probability theory, and from it an expression for the minimum redundancy of intra-class features; combining the measure $DFS_S$ with the intra-class minimum-redundancy expression to define a joint function of maximum correlation and minimum redundancy; and training the joint function of maximum correlation and minimum redundancy to complete the selection process. On the one hand, the method adds redundancy analysis and deletes redundant features, which improves classification accuracy while further reducing computational complexity; on the other hand, by computing maximum correlation and minimum redundancy, it reduces the redundancy between features while preserving the inter-class discrimination capability of the feature subset.
Description
Technical Field
The invention belongs to the technical field of behavior recognition feature selection methods, and in particular relates to a behavior recognition feature selection method based on improved feature subset discrimination.
Background
In recent years, with the rapid development of computer science and sensor technology, sensors of many forms have increasingly affected every aspect of people's lives, and recognizing and understanding human actions and behaviors from sensor data is a key task for future human-centered computing.
Research on human behavior recognition can provide more user-friendly services, such as elderly monitoring, motion-sensing games, and health care. Sensor-based human behavior recognition is a newer branch of behavior recognition. Compared with image-based behavior recognition it is more convenient, unconstrained, and safe; it depends little on the external environment, the sensors can be worn freely, and the privacy of user data is better protected.
In human behavior recognition research based on acceleration sensors, time-domain and frequency-domain features are generally extracted. A small feature subset may lead to a relatively high classification error rate, whereas a large one may lead to a relatively low error rate. However, the number of extracted features must not be excessive: as the number of features increases, the amount of computation grows exponentially, leading to the curse of dimensionality. In addition, because the extracted features include irrelevant and redundant ones, dimensionality reduction is needed to select relevant features and remove irrelevant and redundant ones, ultimately achieving a better classification result.
According to the relationship between the feature selection method and the classifier, feature selection methods are generally divided into four categories: filter, wrapper, embedded, and hybrid. Filter feature selection algorithms are independent of the classifier and typically use distance, correlation, consistency, or information measures to assess the correlation between features and class labels and the redundancy between features; different evaluation criteria may yield quite different optimal feature subsets. Wrapper feature selection methods take the interaction between features into account, depend on the performance of the classifier, and output the optimal feature subset directly once the algorithm finishes. Embedded feature selection algorithms treat feature selection as part of constructing the learning algorithm and perform it together with classification, embedding feature selection into the construction of the classifier so as to select a feature subset effectively. Hybrid feature selection algorithms combine the advantages of filter and wrapper algorithms: a filter algorithm first generates a candidate feature subset, and a wrapper algorithm then further reduces it.
Given that the hybrid approach offers higher classification accuracy, an existing feature selection method based on the feature subset discrimination (DFS) measurement criterion takes the correlation between features into account and, by computing the joint contribution of multiple features to classification, combines a search strategy with a classifier to optimize the feature subset. However, during feature selection that method does not consider the influence of redundancy between features on the classification result, so the selected feature subset contains redundant features.
Disclosure of Invention
The invention aims to provide a behavior recognition feature selection method based on improved feature subset discrimination, which solves the problems of many redundant features, low classification accuracy, and high computational complexity in existing behavior recognition feature selection methods.
The technical scheme adopted by the invention is a behavior recognition feature selection method based on improved feature subset discrimination, comprising the following steps:
Step 1: establish a sample feature set based on the DFS feature subset discrimination criterion, and obtain the feature subset discrimination measure $DFS_S$;
Step 2: obtain the mutual information between two random variables based on concepts from information theory and probability theory, and from it the expression for the minimum redundancy of intra-class features in the sample feature set of step 1;
Step 3: combine the feature subset discrimination measure $DFS_S$ of step 1 with the intra-class minimum-redundancy expression of step 2 to define a joint function of maximum correlation and minimum redundancy;
Step 4: train the joint function of maximum correlation and minimum redundancy defined in step 3 to complete the selection process.
The invention is further characterized as follows.
Step 1 is specifically as follows:
Let the m-dimensional real space be denoted $\mathbb{R}^m$. For any k-class problem (k ≥ 2), let the total number of samples be n, the number of samples in the i-th class be $n_i$, and the sample dimension be m. The training set T can then be expressed as formula (1):
$$T=\{(x_t,y_t)\mid x_t\in\mathbb{R}^m,\ y_t\in\{1,2,\dots,k\},\ t\in\{1,2,\dots,n\}\} \qquad (1)$$
The data thus contain k classes. Let |S| denote the number of elements in the feature set S, with $0 < |S| < m$. The discrimination measure $DFS_S$ of a feature subset containing |S| features is given by formula (2):
where $\bar{x}$ denotes the mean vector obtained by averaging all samples, $\bar{x}^{(i)}$ denotes the mean vector obtained by averaging the samples of the i-th class, and $x_j^{(i)}$ denotes the feature vector of the j-th sample of the i-th class; all three vectors contain |S| features (|S| < m).
Step 2 is specifically as follows:
Let the probability densities of any two random variables X and Y and their joint probability density be p(x), p(y), and p(x, y), respectively; the mutual information between the two random variables is then defined as in formula (3):
$$I(X;Y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (3)$$
The minimum redundancy of intra-class features is calculated as in formula (4):
where u and v denote any two features in the feature set, and R(S) denotes the mutual information between all features in the set S, that is, the redundancy between features.
Step 3 is specifically as follows:
Combining formulas (2) and (4), the joint function of maximum correlation and minimum redundancy is given by formula (5):
$$f(DFS_S, R'(S)) = DFS_S - R'(S) \qquad (5)$$
where k denotes the number of classes; $x_{j,\alpha_1}^{(i)}$ and $x_{j,\alpha_2}^{(i)}$ denote the actual values of features $\alpha_1$ and $\alpha_2$ in the j-th sample of the i-th class; and $DFS_S$ denotes the DFS value of the feature subset containing |S| features.
Step 4 is specifically as follows:
First, using the feature subset discrimination measure $DFS_S$ and starting from an empty set, select the feature with the largest DFS value and add it to the initially empty optimal feature subset X.
Then, decide whether to retain each newly added feature according to the accuracy of the random forest classifier on the optimal feature subset X after the feature is added: if the accuracy increases, retain the newly added feature; otherwise, delete it. Iterate until all features have been tested; the optimal feature subset X at the end of the iteration is the final selection result.
The beneficial effects of the invention are as follows. On the one hand, the behavior recognition feature selection method based on improved feature subset discrimination adds redundancy analysis to the feature selection process and deletes redundant features, which improves classification accuracy while further reducing computational complexity. On the other hand, by computing maximum correlation and minimum redundancy, it reduces the redundancy between features while preserving the inter-class discrimination capability of the feature subset, and therefore has good practical value.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments.
The invention provides a behavior recognition feature selection method based on improved feature subset discrimination, comprising the following steps.
Step 1: establish a sample feature set based on the DFS feature subset discrimination criterion, and obtain the feature subset discrimination measure $DFS_S$. Specifically:
Let the m-dimensional real space be denoted $\mathbb{R}^m$. For any k-class problem (k ≥ 2), let the total number of samples be n, the number of samples in the i-th class be $n_i$, and the sample dimension be m. The training set T can then be expressed as formula (1):
$$T=\{(x_t,y_t)\mid x_t\in\mathbb{R}^m,\ y_t\in\{1,2,\dots,k\},\ t\in\{1,2,\dots,n\}\} \qquad (1)$$
The data thus contain k classes. Let |S| denote the number of elements in the feature set S, with $0 < |S| < m$. The discrimination measure $DFS_S$ of a feature subset containing |S| features is given by formula (2):
where $\bar{x}$ denotes the mean vector obtained by averaging all samples, $\bar{x}^{(i)}$ denotes the mean vector obtained by averaging the samples of the i-th class, and $x_j^{(i)}$ denotes the feature vector of the j-th sample of the i-th class; all three vectors contain |S| features (|S| < m).
A larger numerator indicates that the classes are more widely separated on the feature subset, and a smaller denominator indicates that the samples within each class are more tightly clustered. Thus, the larger the value of $DFS_S$, the stronger the inter-class discrimination capability of the feature subset, the better the classification and recognition performance, and the greater its influence on the classification result.
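As a hedged illustration of the criterion described above, the following Python sketch computes a ratio of between-class scatter to within-class scatter for a chosen feature subset. Formula (2) itself is not reproduced in this text, so the exact form used here, including the 1/(n_i - 1) normalization of the within-class term, is an assumption; the names `dfs_score`, `samples`, `labels`, and `subset` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def dfs_score(samples: Sequence[Sequence[float]],
              labels: Sequence[int],
              subset: Sequence[int]) -> float:
    """Between-class over within-class scatter on the features in `subset`.

    Assumed form (not taken verbatim from the source):
      numerator   = sum_i ||mean_i - mean_all||^2
      denominator = sum_i 1/(n_i - 1) * sum_j ||x_j^(i) - mean_i||^2
    """
    # Project every sample onto the selected feature indices.
    proj = [[row[f] for f in subset] for row in samples]

    def mean(rows: List[List[float]]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Group the projected samples by class label.
    by_class: Dict[int, List[List[float]]] = defaultdict(list)
    for row, label in zip(proj, labels):
        by_class[label].append(row)

    overall = mean(proj)
    between = 0.0
    within = 0.0
    for rows in by_class.values():
        centre = mean(rows)
        between += sq_dist(centre, overall)
        # Needs at least two samples per class and non-zero within-class spread.
        within += sum(sq_dist(r, centre) for r in rows) / (len(rows) - 1)
    return between / within
```

Under this assumed form, a feature whose class means coincide scores zero, while a feature whose class means are far apart relative to the within-class spread scores high, which matches the numerator and denominator interpretation given in the text.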
Step 2: obtain the mutual information between two random variables based on concepts from information theory and probability theory, and from it the expression for the minimum redundancy of intra-class features in the sample feature set of step 1. Specifically:
In information theory and probability theory, the mutual information of two random variables measures the degree of mutual dependence between them. More specifically, it quantifies the amount of information obtained about one random variable by observing the other. Unlike the correlation coefficient, it is not limited to real-valued random variables; it measures how similar the joint distribution is to the product of the marginal distributions. Using mutual information can effectively reduce feature redundancy and further improve classification accuracy, which makes it valuable for feature optimization.
Mutual information indicates the degree of correlation, that is, the degree of redundancy, between two features. Let the probability densities of any two random variables X and Y and their joint probability density be p(x), p(y), and p(x, y), respectively; the mutual information between the two random variables is then defined as in formula (3):
$$I(X;Y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (3)$$
The minimum redundancy of intra-class features is calculated as in formula (4):
where u and v denote any two features in the feature set, and R(S) denotes the mutual information between all features in the set S, that is, the redundancy between features.
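To make the redundancy term more concrete, the sketch below estimates mutual information between two features from discretized samples and averages it over all feature pairs. The equal-width binning, the plug-in estimator, and the averaging over distinct pairs are all assumptions chosen for illustration; formula (4) is not reproduced in this text, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def discretize(values: Sequence[float], bins: int = 4) -> List[int]:
    """Map continuous values to equal-width bin indices (assumed estimator choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y) in nats from two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b).
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def redundancy(features: Sequence[Sequence[float]], bins: int = 4) -> float:
    """Mean pairwise mutual information over distinct feature pairs.

    Averaging over pairs is an assumption, not the source's formula (4).
    """
    coded = [discretize(f, bins) for f in features]
    pairs = list(combinations(coded, 2))
    if not pairs:
        return 0.0
    return sum(mutual_information(u, v) for u, v in pairs) / len(pairs)
```

Each element of `features` is one feature column across all samples. Two identical columns give the maximum redundancy for their distribution, and two independent columns give a value near zero.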
The process of feature selection can be viewed as a search for the most representative feature subset, that is, reducing computational complexity while maximizing accuracy. The selected feature subset should therefore both maximize the correlation with the class labels and minimize the redundancy between features.
Step 3: combine the feature subset discrimination measure $DFS_S$ of step 1 with the intra-class minimum-redundancy expression of step 2 to define a joint function of maximum correlation and minimum redundancy. Specifically:
Combining formulas (2) and (4), the joint function of maximum correlation and minimum redundancy is given by formula (5):
$$f(DFS_S, R'(S)) = DFS_S - R'(S) \qquad (5)$$
where k denotes the number of classes; $x_{j,\alpha_1}^{(i)}$ and $x_{j,\alpha_2}^{(i)}$ denote the actual values of features $\alpha_1$ and $\alpha_2$ in the j-th sample of the i-th class; and $DFS_S$ denotes the DFS value of the feature subset containing |S| features.
Step 4: train the joint function of maximum correlation and minimum redundancy defined in step 3 to complete the selection process. Specifically:
First, using the feature subset discrimination measure $DFS_S$ and starting from an empty set, select the feature with the largest DFS value and add it to the initially empty optimal feature subset X.
Then, decide whether to retain each newly added feature according to the accuracy of the random forest classifier on the optimal feature subset X after the feature is added: if the accuracy increases, retain the newly added feature; otherwise, delete it. Iterate until all features have been tested; the optimal feature subset X at the end of the iteration is the final selection result.
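The add-and-test loop of step 4 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the classifier is abstracted behind a hypothetical `accuracy` callable (in the method described above it would be the cross-validated accuracy of a random forest on the candidate subset), and the order in which candidates are tried, `ranked`, is assumed to come from the scores computed in the earlier steps.

```python
from typing import Callable, List, Sequence


def greedy_select(ranked: Sequence[int],
                  accuracy: Callable[[List[int]], float]) -> List[int]:
    """Try features in ranked order; keep one only if accuracy increases."""
    selected: List[int] = []
    best = float("-inf")  # the first feature tried is therefore always retained
    for feat in ranked:
        candidate = selected + [feat]
        acc = accuracy(candidate)
        if acc > best:
            # Accuracy improved: retain the newly added feature.
            selected, best = candidate, acc
        # Otherwise the feature is dropped and the subset is left unchanged.
    return selected


# Toy stand-in for a classifier: features 3 and 1 help, every feature has a small cost.
useful = {3, 1}
toy_accuracy = lambda s: 0.5 + 0.2 * len(useful & set(s)) - 0.01 * len(s)
print(greedy_select([3, 0, 1, 2], toy_accuracy))
```

Keeping the evaluator as a callable makes the sketch independent of any particular machine learning library; the strict `>` comparison mirrors the rule that a feature is retained only if accuracy increases.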
Examples
First, experimental data
The experiment uses the UCI HAR Dataset. Thirty subjects of different ages, heights, and weights carried a smartphone at the waist, and acceleration sensor data for six types of human behavior were collected at a constant sampling rate (50 Hz): walking, ascending stairs, descending stairs, sitting, standing, and lying down.
Features are extracted from the denoised data set using a sliding window technique (the window size is 110%, the overlap is 50%). Fifteen types of features are extracted, numbered 1 to 15: mean, variance, root mean square, mean absolute deviation, interquartile range, inter-axis correlation coefficient, kurtosis, skewness, energy, maximum, minimum, median absolute deviation, signal magnitude area, peak-to-peak value, and median.
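The windowing step can be illustrated with a short sketch that slides a fixed-length window with 50% overlap over a one-axis signal and computes a few of the listed features. The window length is left as a parameter because the source does not state it in samples, only a subset of the fifteen features is shown, and the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, Iterator, Sequence


def sliding_windows(signal: Sequence[float], size: int,
                    overlap: float = 0.5) -> Iterator[Sequence[float]]:
    """Yield consecutive windows of `size` samples with the given overlap."""
    step = max(1, int(size * (1 - overlap)))
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]


def window_features(window: Sequence[float]) -> Dict[str, float]:
    """Compute a few time-domain features for one window."""
    return {
        "mean": statistics.fmean(window),
        "variance": statistics.pvariance(window),  # population variance
        "rms": math.sqrt(sum(v * v for v in window) / len(window)),
        "max": max(window),
        "min": min(window),
        "peak_to_peak": max(window) - min(window),
        "median": statistics.median(window),
    }


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
rows = [window_features(w) for w in sliding_windows(signal, size=4)]
print(len(rows), rows[0]["mean"])
```

Each window yields one feature row, so a recording of N samples with window length W and 50% overlap produces roughly 2N/W rows.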
To obtain a reliable and stable classification model, a 10-fold cross-validation experiment is used. To obtain evenly distributed experimental data, the sample order is first randomly shuffled, and the samples of each class are then added one by one, in turn, to 10 initially empty sample sets until every sample of that class has been assigned, so that the samples are divided randomly and evenly into 10 parts. Then one part is used as the test set and the other nine parts as the training set, rotating through the parts in turn, which yields 10-fold cross-validation.
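The partitioning procedure just described (shuffle, deal each class round-robin into 10 initially empty sets, then rotate the held-out part) can be sketched as below. The function names and the fixed random seed are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], n_folds: int = 10,
                     seed: int = 0) -> List[List[int]]:
    """Shuffle each class, then deal its sample indices round-robin into folds."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Use each fold once as the test set and the remaining folds as training data."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


labels = [0] * 20 + [1] * 30
for train, test in train_test_splits(stratified_folds(labels)):
    pass  # fit on `train`, evaluate on `test`
```

Dealing each class separately keeps the class proportions approximately equal across the folds, which is the stated purpose of the procedure.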
Second, preprocessing of experimental data
Because of noise in the hardware of the smartphone's acceleration sensor and the influence of the external environment, the collected raw data deviate from the true values. A moving average filter is therefore used for smoothing and denoising, with the following calculation formula:
where Original is the raw data collected by the acceleration sensor, Result is the calculated result, i denotes the i-th time instant, and n denotes the window length used for smoothing.
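A minimal sketch of such a filter is given below. The formula is not reproduced in this text, so the alignment of the window is an assumption: here the window is centered on sample i, n is assumed to be odd, and the window is shortened at the edges of the signal. The names follow the symbols Original, Result, and n used above.

```python
from typing import List, Sequence


def moving_average(original: Sequence[float], n: int) -> List[float]:
    """Centered moving average with window length n (assumed odd).

    Near the edges the window shrinks to the samples that exist.
    """
    half = n // 2
    result: List[float] = []
    for i in range(len(original)):
        lo = max(0, i - half)
        hi = min(len(original), i + half + 1)
        window = original[lo:hi]
        result.append(sum(window) / len(window))
    return result


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3))
```

A trailing window (averaging only past samples) would be an equally plausible reading; it introduces a delay but can be applied to streaming data.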
Third, analysis of experimental results
To verify the effectiveness of the behavior recognition feature selection method R-DFS, recall R, precision P, and the F1 score are used as three evaluation metrics.
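For reference, the three metrics can be computed per class from a confusion matrix as sketched below, using the standard definitions (recall = TP / actual, precision = TP / predicted, F1 = 2PR / (P + R)). The function name and the row/column convention are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def per_class_metrics(confusion: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    k = len(confusion)
    out: List[Dict[str, float]] = []
    for c in range(k):
        tp = confusion[c][c]
        actual = sum(confusion[c])                       # row sum: true class c
        predicted = sum(confusion[r][c] for r in range(k))  # column sum: predicted c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out.append({"precision": precision, "recall": recall, "f1": f1})
    return out


metrics = per_class_metrics([[8, 2], [1, 9]])
print(metrics[0])
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two values and is pulled toward the smaller one.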
10-fold cross-validation is performed on five classifiers: K-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), naive Bayes (NB), and random forest (RF). The five classifiers are compared via confusion matrices under the optimal feature subset, and the experimental results are shown in Tables 1-6.
Table 1. Recall comparison data based on DFS
Classifier | Walking | Ascending stairs | Descending stairs | Sitting | Standing | Lying down |
---|---|---|---|---|---|---|
KNN | 0.9 | 1 | 0.8 | 0.9 | 1 | 1 |
SVM | 0.8 | 0.6 | 0.9 | 0.9 | 1 | 1 |
DT | 0.7 | 0.6 | 0.8 | 0.9 | 1 | 0.9 |
NB | 0.9 | 0.7 | 1 | 0.9 | 1 | 1 |
RF | 0.9 | 1 | 1 | 1 | 1 | 1 |
Table 2. Precision comparison data based on DFS
Classifier | Walking | Ascending stairs | Descending stairs | Sitting | Standing | Lying down |
---|---|---|---|---|---|---|
KNN | 0.9 | 0.9 | 0.9 | 1 | 1 | 0.9 |
SVM | 0.9 | 0.8 | 0.7 | 1 | 1 | 0.9 |
DT | 0.8 | 0.7 | 0.6 | 0.9 | 0.9 | 0.9 |
NB | 0.9 | 0.9 | 0.8 | 1 | 0.9 | 1 |
RF | 1 | 0.9 | 0.9 | 1 | 1 | 1 |
Table 3. F1 score comparison data based on DFS
Classifier | Walking | Ascending stairs | Descending stairs | Sitting | Standing | Lying down |
---|---|---|---|---|---|---|
KNN | 0.95 | 0.91 | 0.87 | 0.95 | 0.95 | 0.95 |
SVM | 0.85 | 0.72 | 0.83 | 0.94 | 0.96 | 0.96 |
DT | 0.77 | 0.61 | 0.69 | 0.89 | 0.95 | 0.93 |
NB | 0.9 | 0.79 | 0.86 | 0.95 | 0.98 | 0.98 |
RF | 0.95 | 0.94 | 0.96 | 0.99 | 0.99 | 0.99 |
Table 4. Recall comparison data based on R-DFS
Classifier | Walking | Ascending stairs | Descending stairs | Sitting | Standing | Lying down |
---|---|---|---|---|---|---|
KNN | 0.9 | 1 | 0.8 | 0.9 | 1 | 1 |
SVM | 0.8 | 0.6 | 0.9 | 0.9 | 1 | 1 |
DT | 0.7 | 0.6 | 0.8 | 0.9 | 1 | 0.9 |
NB | 0.9 | 0.7 | 1 | 0.9 | 1 | 1 |
RF | 0.9 | 1 | 1 | 1 | 1 | 1 |
Table 5. Precision comparison data based on R-DFS
Classifier | Walking | Ascending stairs | Descending stairs | Sitting | Standing | Lying down |
---|---|---|---|---|---|---|
KNN | 1 | 0.9 | 1 | 1 | 1 | 1 |
SVM | 0.9 | 0.8 | 0.8 | 1 | 1 | 1 |
DT | 0.8 | 0.7 | 0.7 | 0.9 | 0.9 | 1 |
NB | 0.9 | 0.9 | 0.8 | 1 | 1 | 1 |
RF | 1 | 1 | 1 | 1 | 1 | 1 |
Table 6. F1 score comparison data based on R-DFS
Classifier | Walking | Ascending stairs | Descending stairs | Sitting | Standing | Lying down |
---|---|---|---|---|---|---|
KNN | 0.96 | 0.94 | 0.9 | 0.97 | 0.97 | 0.98 |
SVM | 0.87 | 0.76 | 0.87 | 0.99 | 0.99 | 1 |
DT | 0.81 | 0.65 | 0.75 | 0.91 | 0.97 | 0.95 |
NB | 0.93 | 0.83 | 0.89 | 0.97 | 0.99 | 0.99 |
RF | 0.98 | 0.98 | 0.99 | 1 | 1 | 1 |
The comparison of Tables 1 to 6 shows that the R-DFS feature selection method is generally superior to the DFS method on the three evaluation metrics of precision, recall, and F1 score.
Furthermore, the RF algorithm was verified to perform best among the five classifiers. Compared with the DFS feature selection method, R-DFS improves average performance by 1.9% in precision, 1.8% in recall, and 1.9% in F1 score.
The behavior recognition feature selection method based on improved feature subset discrimination was analyzed experimentally on the UCI HAR Dataset. The results show that, compared with the DFS method, it further improves classification performance by removing redundancy; in addition, among the five classifiers, the RF classifier achieves the highest accuracy.
Claims (5)
1. A behavior recognition feature selection method based on improved feature subset discrimination, characterized by comprising the following steps:
Step 1: establish a sample feature set based on the DFS feature subset discrimination criterion, and obtain the feature subset discrimination measure $DFS_S$;
Step 2: obtain the mutual information between two random variables based on concepts from information theory and probability theory, and from it the expression for the minimum redundancy of intra-class features in the sample feature set of step 1;
Step 3: combine the feature subset discrimination measure $DFS_S$ of step 1 with the intra-class minimum-redundancy expression of step 2 to define a joint function of maximum correlation and minimum redundancy;
Step 4: train the joint function of maximum correlation and minimum redundancy defined in step 3 to complete the selection process.
2. The behavior recognition feature selection method based on improved feature subset discrimination as claimed in claim 1, wherein step 1 is specifically:
Let the m-dimensional real space be denoted $\mathbb{R}^m$. For any k-class problem (k ≥ 2), let the total number of samples be n, the number of samples in the i-th class be $n_i$, and the sample dimension be m. The training set T can then be expressed as formula (1):
$$T=\{(x_t,y_t)\mid x_t\in\mathbb{R}^m,\ y_t\in\{1,2,\dots,k\},\ t\in\{1,2,\dots,n\}\} \qquad (1)$$
The data thus contain k classes. Let |S| denote the number of elements in the feature set S, with $0 < |S| < m$. The discrimination measure $DFS_S$ of a feature subset containing |S| features is given by formula (2):
where $\bar{x}$ denotes the mean vector obtained by averaging all samples, $\bar{x}^{(i)}$ denotes the mean vector obtained by averaging the samples of the i-th class, and $x_j^{(i)}$ denotes the feature vector of the j-th sample of the i-th class; all three vectors contain |S| features (|S| < m).
3. The behavior recognition feature selection method based on improved feature subset discrimination as claimed in claim 2, wherein step 2 is specifically:
Let the probability densities of any two random variables X and Y and their joint probability density be p(x), p(y), and p(x, y), respectively; the mutual information between the two random variables is then defined as in formula (3):
$$I(X;Y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \qquad (3)$$
The minimum redundancy of intra-class features is calculated as in formula (4):
where u and v denote any two features in the feature set, and R(S) denotes the mutual information between all features in the set S, that is, the redundancy between features.
4. The behavior recognition feature selection method based on improved feature subset discrimination as claimed in claim 3, wherein step 3 is specifically:
Combining formulas (2) and (4), the joint function of maximum correlation and minimum redundancy is given by formula (5):
$$f(DFS_S, R'(S)) = DFS_S - R'(S) \qquad (5)$$
5. The behavior recognition feature selection method based on improved feature subset discrimination as claimed in claim 4, wherein step 4 is specifically:
First, using the feature subset discrimination measure $DFS_S$ and starting from an empty set, select the feature with the largest DFS value and add it to the initially empty optimal feature subset X.
Then, decide whether to retain each newly added feature according to the accuracy of the random forest classifier on the optimal feature subset X after the feature is added: if the accuracy increases, retain the newly added feature; otherwise, delete it. Iterate until all features have been tested; the optimal feature subset X at the end of the iteration is the final selection result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010377788.3A CN111709441B (en) | 2020-05-07 | 2020-05-07 | Behavior recognition feature selection method based on improved feature subset distinction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010377788.3A CN111709441B (en) | 2020-05-07 | 2020-05-07 | Behavior recognition feature selection method based on improved feature subset distinction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111709441A true CN111709441A (en) | 2020-09-25 |
CN111709441B CN111709441B (en) | 2024-02-02 |
Family
ID=72536556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010377788.3A Active CN111709441B (en) | 2020-05-07 | 2020-05-07 | Behavior recognition feature selection method based on improved feature subset distinction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111709441B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991446A (en) * | 2017-04-06 | 2017-07-28 | 哈尔滨理工大学 | A kind of embedded dynamic feature selection method of the group policy of mutual information |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633346A (en) * | 2020-12-17 | 2021-04-09 | 西安理工大学 | Feature selection method based on feature interactivity |
CN114615020A (en) * | 2022-02-15 | 2022-06-10 | 中国人民解放军战略支援部队信息工程大学 | Method and system for quickly identifying network equipment based on feature reduction and dynamic weighting |
CN114615020B (en) * | 2022-02-15 | 2023-05-26 | 中国人民解放军战略支援部队信息工程大学 | Method and system for rapidly identifying network equipment based on feature reduction and dynamic weighting |
Also Published As
Publication number | Publication date |
---|---|
CN111709441B (en) | 2024-02-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Rustempasic et al. | Diagnosis of parkinson’s disease using fuzzy c-means clustering and pattern recognition | |
CN111009321A (en) | Application method of machine learning classification model in juvenile autism auxiliary diagnosis | |
CN111000553B (en) | Intelligent classification method for electrocardiogram data based on voting ensemble learning | |
CN108304887A (en) | Naive Bayesian data processing system and method based on the synthesis of minority class sample | |
CN114469120B (en) | Multi-scale Dtw-BiLstm-Gan electrocardiosignal generation method based on similarity threshold migration | |
CN111986811A (en) | Disease prediction system based on big data | |
CN107110743A (en) | Check data processing equipment and check data processing method | |
CN111709441A (en) | Behavior recognition feature selection method based on improved feature subset discrimination | |
CN112215259B (en) | Gene selection method and apparatus | |
CN112885334A (en) | Disease recognition system, device, storage medium based on multi-modal features | |
CN115273236A (en) | Multi-mode human gait emotion recognition method | |
CN112233742B (en) | Medical record document classification system, equipment and storage medium based on clustering | |
CN114242178A (en) | Method for quantitatively predicting biological activity of ER alpha antagonist based on gradient lifting decision tree | |
CN117390371A (en) | Bearing fault diagnosis method, device and equipment based on convolutional neural network | |
Sari et al. | Best performance comparative analysis of architecture deep learning on ct images for lung nodules classification | |
CN110010246A (en) | A kind of disease Intelligent Diagnosis Technology based on neural network and confidence interval | |
Gil et al. | Fusion of feature selection methods in gene recognition | |
Zhang et al. | Enhanced Breast Cancer Classification through Data Fusion Modeling | |
CN111709440A (en) | Feature selection method based on FSA-Choquet fuzzy integration | |
Varma et al. | Machine learning based breast cancer visualization and classification | |
Sulaiman et al. | Classification of healthy and white root disease infected rubber trees based on relative permittivity and capacitance input properties using LM and SCG artificial neural network | |
CN114239738B (en) | Medical data classification method and related equipment for small samples | |
CN114239742B (en) | Medical data classification method based on rule classifier and related equipment | |
CN116226629B (en) | Multi-model feature selection method and system based on feature contribution | |
CN113553896B (en) | Electroencephalogram emotion recognition method based on multi-feature depth forest |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||