CN114529975B - Self-adaptive rapid unsupervised feature selection method applied to face recognition
- Publication number: CN114529975B (application CN202210183736.1A)
- Authority: CN (China)
- Prior art keywords: feature, features, clustering, face image, expressed
- Prior art date: 2022-02-25
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (under G06F17/10, complex mathematical operations)
- G06F18/22—Pattern recognition: matching criteria, e.g. proximity measures
- G06F18/2321—Pattern recognition: non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/2411—Pattern recognition: classification techniques based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention relates to a self-adaptive rapid unsupervised feature selection method applied to face recognition, which addresses the difficulty of analyzing high-dimensional face images that contain a large number of meaningless and redundant features. First, a self-adaptive rapid density peak clustering method is proposed to cluster the face image features; then a feature importance evaluation function is defined, and the most representative feature is selected from each feature cluster and added to a feature subset, completing feature selection. The invention achieves a more accurate feature subset and faster feature selection.
Description
Technical Field
The invention belongs to the technical fields of signal processing, data analysis and the like, and particularly relates to a self-adaptive rapid unsupervised feature selection method applied to face recognition.
Background
With the advent of the information explosion age, large volumes of high-dimensional data, such as face image data, are generated. Directly processing these large, high-dimensional face image data not only significantly increases the computation time and memory burden on algorithms and computer hardware, but also degrades performance because of noisy and redundant dimensions in the face image, such as uncorrelated background and clothing. The features of a face image are its pixels; the purpose of feature selection is to pick out the key pixels, so that only the selected pixels need to be considered when the face image data are analyzed, which greatly improves the efficiency of data analysis. Feature selection is one of the main means of handling face image data: a small number of features are selected from the original face image feature set to form a subset that preserves the main information and structure of the face image, thereby improving both computational efficiency and accuracy in face recognition applications. Feature selection methods are generally divided into two main categories according to the availability of data labels: supervised and unsupervised. A supervised feature selection method relies on the association between features and class labels; it selects features that are strongly correlated with the class labels, yielding a high-performing feature subset. An unsupervised feature selection method selects features according to the relevance between features and requires no label support. Since labels are scarce in most real face image datasets, unsupervised feature selection methods are favored in this field.
Most existing unsupervised feature selection (UFS) algorithms for face recognition build a sparse matrix to complete feature selection while maintaining the structure among face image features, and do not fully consider the redundancy among features. In recent years, scholars have proposed several UFS methods based on feature clustering that effectively reduce redundancy between features, but these methods often incur higher computational cost and require many additional parameters to construct the feature subset. To better address feature redundancy, time cost, and the large number of parameters, the invention designs a self-adaptive rapid unsupervised feature selection method based on feature clustering. First, a rapid density peak clustering algorithm with self-adaptive parameters is proposed to cluster the features of the data; then a feature importance evaluation function is defined, and the most representative feature is selected from each feature cluster to form a feature subset, completing feature selection.
Disclosure of Invention
Aiming at the defects of the prior art, the invention designs and provides an unsupervised feature selection method based on self-adaptive rapid density peak clustering. Compared with existing density peak clustering methods, the invention (a) provides a self-adaptive rapid density peak feature clustering method that adaptively determines both the number of selected features and the cutoff distance parameter d_c, solving the problem of dataset adaptation, and (b) selects representative features from the feature clusters formed by clustering: a feature importance evaluation function is defined that scores each feature by its feature standard deviation and feature expression capability, and the highest-scoring feature is added to the feature subset, so that redundancy among features is considered more fully. To achieve the above purpose, the present invention adopts the following technical scheme:
an unsupervised feature selection method based on self-adaptive rapid density peak clustering, comprising:
S1, acquiring an original face image data matrix D, and performing data standardization processing to obtain original face image characteristics;
S2, carrying out self-adaptive rapid density peak clustering on the features, and grouping features with higher similarity to form similar feature clusters;
S3, selecting the most representative features from the similar feature clusters through a feature importance evaluation function;
S4, adding the most representative feature of each feature cluster into the feature subset to obtain the optimal feature subset.
Further, the standardization processing in step S1 normalizes the samples with a normalization method, specifically:
First, the original face image data matrix D is normalized to obtain the normalized face image data matrix X.
Here the original face image data matrix D ∈ R^(n×w×h), where n is the number of face images, w is the width of a face image, h is the height of a face image, and D_ij is the element at position (i, j) of a single face image. Each two-dimensional face image is flattened into a one-dimensional row vector, so the data matrix becomes D′ ∈ R^(n×d), where d = w×h is the number of face image features and D′_ij is the element at position (i, j) of the data matrix. The normalization formula for D′_ij is as follows:
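A minimal sketch of this formula, assuming per-column min-max normalization (the exact form is an assumption, chosen for consistency with the min-max standardization used in the experiments below):

$$x_{ij} = \frac{D'_{ij} - \min(D'_j)}{\max(D'_j) - \min(D'_j)}, \qquad i = 1,\dots,n,\; j = 1,\dots,d,$$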
where D′_j = [D′_1j, D′_2j, …, D′_nj]^T;
Then the normalized matrix X = {x_ij} is expanded by rows and written as:
X = [x_1, x_2, …, x_n]^T ∈ R^(n×d),
where x_i = [x_i1, x_i2, …, x_id];
the normalized matrix X = {x_ij} is expanded by columns and written as:
X = [f_1, f_2, …, f_d] ∈ R^(n×d),
where f_i = [x_1i, x_2i, …, x_ni]^T, and f_1, f_2, …, f_d are the d features of the original face image.
Further, step S2 is the self-adaptive rapid density peak feature clustering method, which specifically includes:
S21, calculating the local density of each feature and adaptively determining the cutoff distance parameter d_c:
In density peak feature clustering, the local density of a feature is the number of features whose distance to it is smaller than the cutoff distance d_c, so the choice of d_c directly influences the clustering performance of the algorithm. The conventional density peak clustering algorithm selects this parameter manually based on experience, which often yields unsatisfactory clustering and adapts poorly to different datasets. If the local densities of all features differ little, the local density information entropy is large; conversely, if they differ greatly, the entropy is small. When the entropy is at its maximum, all features have the same local density and the clustering centers cannot be determined from the local density. To better use the local densities to determine the cluster centers, the feature local densities should differ as much as possible, i.e., the local density information entropy should be minimal.
The parameter d_c is therefore determined by minimizing the local density information entropy function H(d_c), where ρ_i represents the local density of the i-th feature, calculated with a Gaussian function; d_ij represents the Euclidean distance between the i-th and j-th features; and Z represents the sum of the local densities of all features. The model is given below.
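A hedged reconstruction of the model from the definitions above, assuming the standard Gaussian-kernel density-peak formulation:

$$\rho_i = \sum_{j \neq i} \exp\!\left(-\frac{d_{ij}^2}{d_c^2}\right), \qquad Z = \sum_{i=1}^{d} \rho_i,$$

$$H(d_c) = -\sum_{i=1}^{d} \frac{\rho_i}{Z} \log\frac{\rho_i}{Z}, \qquad d_c = \arg\min_{d_c} H(d_c).$$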
S22, calculating the feature distance δ_i:
The concept of a cluster center: a cluster center is surrounded by points of lower local density and is comparatively far from other points of high density. The invention characterizes this relationship with the feature distance δ_i rather than the raw Euclidean distance between features. When ρ_i is the maximum, the feature distance of the i-th feature is the maximum Euclidean distance between this feature and the other features. When ρ_i is not the maximum, the feature distance of the i-th feature is defined as the minimum Euclidean distance between feature f_i and all other features whose local density is greater than ρ_i, specifically expressed as:
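A reconstruction of this definition (the piecewise form is an assumption matching the text):

$$\delta_i = \begin{cases} \min\limits_{j:\, \rho_j > \rho_i} d_{ij}, & \text{if } \rho_i < \max_k \rho_k, \\[4pt] \max\limits_{j} d_{ij}, & \text{if } \rho_i = \max_k \rho_k. \end{cases}$$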
S23, determining the clustering centers:
The clustering centers are determined from the local density ρ_i and the feature distance δ_i. The final goal of clustering is that the resulting clusters have small intra-cluster distances and large inter-cluster distances; for density peak clustering, the corresponding criterion is that both the local density and the feature distance should be as large as possible. In the ρ_i–δ_i plot, the points toward the upper right have these properties and are the cluster centers. To determine the cluster centers more precisely, the values ρ_i·δ_i are sorted from large to small to obtain an n–γ plot, where γ is the sorted sequence and n is the feature rank. The points in this plot show an obvious step, and the points before the step are the cluster centers. To locate the step, the invention draws a straight line tangent to a curve fitted to the points; the tangent point is the turning point, and the n value corresponding to the turning point is obtained by solving an optimization problem for the feature rank n, expressed as follows:
where α is the inclination angle of the tangent line.
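One way to formalize this tangent construction (a sketch; the exact form of the objective is an assumption) is to find the rank at which the slope of a smooth fit γ̂ to the sorted curve equals the tangent slope:

$$n^{*} = \arg\min_{n} \left| \hat{\gamma}'(n) - \tan\alpha \right|.$$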
S24, determining the cluster to which each feature belongs:
The features are assigned to classes: every feature other than a clustering center is assigned to the class of the clustering center nearest to it in Euclidean distance, which completes the clustering of all features. A code sketch of steps S21–S24 follows.
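A minimal Python sketch of the adaptive density peak feature clustering (S21–S24). The candidate grid for d_c, the helper name, and the slope-based turning-point search are illustrative assumptions, not the patent's verbatim implementation:

```python
import numpy as np

def adaptive_dpc_feature_clustering(X, alpha_deg=-20.0):
    """Cluster the columns (features) of X, where X has shape (n_samples, d_features)."""
    F = X.T                                               # feature vectors as rows, (d, n)
    d = F.shape[0]
    dist = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)  # (d, d) Euclidean

    # S21: pick d_c by minimizing the local-density information entropy H(d_c).
    best = (np.inf, None, None)
    for dc in np.percentile(dist[dist > 0], np.arange(1, 21)):    # candidate grid (assumption)
        rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0         # Gaussian local density
        p = rho / rho.sum()
        H = -(p * np.log(p + 1e-12)).sum()
        if H < best[0]:
            best = (H, dc, rho)
    _, dc, rho = best

    # S22: feature distance delta_i.
    order = np.argsort(-rho)                  # features by decreasing density
    delta = np.zeros(d)
    delta[order[0]] = dist[order[0]].max()    # densest feature: maximum distance
    for k in range(1, d):
        i = order[k]
        delta[i] = dist[i, order[:k]].min()   # min distance to any denser feature

    # S23: sort gamma = rho * delta and cut where the slope of the sorted
    # curve reaches tan(alpha) -- the turning point of the n-gamma plot.
    gamma = np.sort(rho * delta)[::-1]
    slope = np.gradient(gamma)
    n_centers = int(np.argmin(np.abs(slope - np.tan(np.radians(alpha_deg))))) + 1
    centers = np.argsort(-(rho * delta))[:n_centers]

    # S24: assign every non-center feature to its Euclidean-nearest center.
    labels = np.full(d, -1, dtype=int)
    labels[centers] = np.arange(n_centers)
    for i in range(d):
        if labels[i] == -1:
            labels[i] = labels[centers[np.argmin(dist[i, centers])]]
    return labels, centers
```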
Further, the step S3 specifically includes:
S31, calculating the feature standard deviation S_i:
The feature standard deviation reflects the dispersion of the data under a feature: the values of samples from different classes often differ considerably under an effective feature, so selecting features with larger standard deviations matches intuition. For example, when distinguishing a zebra from a horse, the feature of whether there are stripes is an effective feature while the other features contribute little, and the standard deviation of the stripe feature is necessarily large.
The feature standard deviation S_i of feature f_i in its feature cluster is expressed as:
where m is the number of features in the feature cluster containing f_i, and f_ij = x_ji.
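A sketch of this statistic, assuming the usual standard deviation of f_i over the n samples (with f_ij = x_ji as defined above; the cluster size m enters the expression-capability term below):

$$S_i = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left(f_{ij} - \bar{f}_i\right)^2}, \qquad \bar{f}_i = \frac{1}{n} \sum_{j=1}^{n} f_{ij}.$$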
S32, calculating the feature expression capability Corr_i:
Representative features cannot be selected accurately by the standard deviation alone. From another angle, an effective and representative feature in a highly redundant feature cluster can adequately express the other redundant features of the entire cluster; this property is referred to here as feature expression capability. The Pearson correlation coefficient is widely used to measure the degree of correlation between two variables: the larger its absolute value, the stronger the correlation. To exclude interference from correlations that are individually very small but numerous, the feature expression capability is defined as the sum of the absolute Pearson correlation coefficients that exceed a threshold t.
The feature expression capability Corr_i of feature f_i is specifically expressed as:
where δ(i, j) is an indicator function of whether the absolute Pearson correlation coefficient between features f_i and f_j exceeds the threshold t;
the threshold t takes the value of the 0.1 quantile of the Pearson correlation coefficients between the feature and the other features;
the Pearson correlation coefficient between features f_i and f_j is defined with the feature mean, whose calculation formula is given below.
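A hedged reconstruction of these quantities from the definitions above (standard Pearson form; the per-cluster summation range is an assumption):

$$\mathrm{Corr}_i = \sum_{j=1}^{m} \delta(i,j)\,\bigl|P(f_i, f_j)\bigr|, \qquad \delta(i,j) = \begin{cases} 1, & |P(f_i, f_j)| > t, \\ 0, & \text{otherwise}, \end{cases}$$

$$P(f_i, f_j) = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{f}_i)(x_{kj} - \bar{f}_j)}{\sqrt{\sum_{k=1}^{n} (x_{ki} - \bar{f}_i)^2}\,\sqrt{\sum_{k=1}^{n} (x_{kj} - \bar{f}_j)^2}}, \qquad \bar{f}_i = \frac{1}{n} \sum_{k=1}^{n} x_{ki}.$$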
S33, determining the feature importance evaluation function:
To balance the difference in order of magnitude between the feature standard deviation S_i and the feature expression capability Corr_i, an adaptive parameter λ is introduced: λ is the ratio of the sum of the feature standard deviations to the sum of the feature expression capabilities over the features of the same cluster. The feature importance evaluation function is defined from the feature standard deviation and the feature correlation term as:
Score_i = S_i + λ·Corr_i
where Score_i represents the feature score of feature f_i, and the adaptive parameter λ is expressed as:
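A sketch of λ consistent with this description, where C denotes the set of features in the cluster (the notation is an assumption):

$$\lambda = \frac{\sum_{k \in C} S_k}{\sum_{k \in C} \mathrm{Corr}_k}.$$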
S34, selecting the most representative features:
The features in each feature cluster are evaluated for importance; the highest-scoring feature of each cluster, i.e., its most representative feature, is selected and added to the feature subset, completing feature selection. A code sketch of this scoring step follows.
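A minimal Python sketch of steps S31–S34 under the reconstructions above; the per-feature 0.1-quantile threshold handling and the removal of the self-correlation are illustrative assumptions:

```python
import numpy as np

def select_representatives(X, labels):
    """Pick the highest-scoring feature of every cluster; X has shape (n, d)."""
    P = np.abs(np.corrcoef(X.T))                  # |Pearson| between all feature pairs
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]            # features of this cluster
        S = X[:, idx].std(axis=0)                 # feature standard deviations
        corr = np.empty(len(idx))
        for a, i in enumerate(idx):
            p = P[i, idx]
            t = np.quantile(p, 0.1)               # per-feature threshold (assumption)
            corr[a] = p[p > t].sum() - 1.0        # drop the self-correlation of 1
        lam = S.sum() / corr.sum()                # adaptive balance parameter lambda
        selected.append(idx[np.argmax(S + lam * corr)])
    return np.array(selected)                     # column indices of the feature subset
```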
Advantageous effects
Compared with other unsupervised feature selection algorithms, the invention first designs an adaptive rapid density peak feature clustering method for clustering the features; its parameters are determined adaptively from the dataset, which solves the dataset-adaptation difficulty of traditional density peak clustering, and the number of selected features need not be specified separately. A feature importance evaluation function is then defined, and the most representative feature is selected in each cluster from the two aspects of feature standard deviation and feature expression capability. Meanwhile, owing to the speed advantage of density clustering and the tailoring of the clustering method to the feature selection scenario, AFDPCFS achieves better feature selection results at a higher speed.
Drawings
Fig. 1 is an algorithm framework diagram of a feature clustering-based adaptive rapid unsupervised feature selection method (AFDPCFS) according to a first embodiment
Fig. 2 is a flowchart of a self-adaptive fast density peak feature clustering method according to an embodiment
Fig. 3 shows the ρ_i–δ_i plot and n–γ plot in the self-adaptive fast density peak feature clustering method according to the first embodiment
FIG. 4 is a diagram showing statistics of six face image datasets provided in an experiment
Fig. 5 is a schematic diagram of classification results (ACC) of corresponding KNN classifiers on respective face image datasets for different feature selection methods provided in experiments
FIG. 6 is a schematic diagram of classification results (ACC) of corresponding SVM classifiers on various face image datasets for different feature selection methods provided in experiments
Fig. 7 is a schematic diagram of classification results (ACC) of corresponding CART classifiers on each face image dataset for different feature selection methods provided in experiments
FIG. 8 is a graphical representation of the total Time spent (Time) on various face image datasets for different feature selection methods provided in the experiment
Detailed Description
Other advantages and effects of the present invention will become readily apparent to those skilled in the art from the following disclosure, which describes the embodiments of the invention with reference to specific examples. The invention may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the present invention.
Aiming at the existing defects, the invention provides a self-adaptive rapid unsupervised feature selection method based on feature clustering.
Example 1
The adaptive quick unsupervised feature selection method applied to face recognition provided in this embodiment, as shown in fig. 1, includes:
S1, acquiring an original face image data matrix, and performing data standardization processing to obtain the original face image features;
S2, carrying out self-adaptive rapid density peak clustering on the features, and grouping features with higher similarity to form similar feature clusters;
S3, selecting the most representative feature from each similar feature cluster through a feature importance evaluation function;
S4, adding the most representative feature of each feature cluster into the feature subset to obtain the optimal feature subset.
Further, the standardization in step S1 normalizes the samples with a normalization method, expressed as:
First, the original face image data matrix D is normalized to obtain the normalized face image data matrix X.
Here the original face image data matrix D ∈ R^(n×w×h), where n is the number of face images, w is the width of a face image, h is the height of a face image, and D_ij is the element at position (i, j) of a single face image. Each two-dimensional face image is flattened into a one-dimensional row vector, so the data matrix becomes D′ ∈ R^(n×d), where d = w×h is the number of face image features and D′_ij is the element at position (i, j) of the data matrix. The normalization formula for D′_ij is as follows:
where D′_j = [D′_1j, D′_2j, …, D′_nj]^T;
Then the normalized matrix X = {x_ij} is expanded by rows and written as:
X = [x_1, x_2, …, x_n]^T ∈ R^(n×d),
where x_i = [x_i1, x_i2, …, x_id];
the normalized matrix X = {x_ij} is expanded by columns and written as:
X = [f_1, f_2, …, f_d] ∈ R^(n×d),
where f_i = [x_1i, x_2i, …, x_ni]^T;
Further, the flow of step S2 is shown in Fig. 2; step S2 specifically includes:
S21, determining the parameter d_c by minimizing the local density information entropy function H(d_c), where ρ_i represents the local density of the i-th feature, calculated with a Gaussian function; d_ij represents the Euclidean distance between the i-th and j-th features; and Z represents the sum of the local densities of all features.
S22, calculating the feature distance δ_i:
When ρ_i is the maximum, the feature distance of the i-th feature is the maximum Euclidean distance between this feature and the other features. When ρ_i is not the maximum, the feature distance of the i-th feature is defined as the minimum Euclidean distance between feature f_i and all other features whose local density is greater than ρ_i, specifically expressed as:
To better understand how the feature distance is calculated, consider the following example: suppose D ∈ R^(n×3), i.e., the dataset has n samples and 3 features, with ρ_1 = 1, ρ_2 = 0.8, ρ_3 = 0.6. Then δ_1 equals the largest Euclidean distance from f_1 to the other features; δ_2 equals the smallest Euclidean distance from f_2 to the features with larger local density, i.e., the Euclidean distance to f_1; and δ_3 equals the smaller of the Euclidean distances from f_3 to f_1 and f_2.
S23, determining the clustering centers:
The clustering centers are determined from the local density ρ_i and the feature distance δ_i; the ρ_i–δ_i plot and n–γ plot are shown in Fig. 3. The n value corresponding to the turning point is obtained by solving an optimization problem, and the clustering centers are the features whose γ ranking lies before n. The optimization problem is expressed as follows:
where γ is the sorted sequence of ρ_i·δ_i values, n is the feature rank, α is the inclination angle of the tangent line, and α = −20°.
S24, determining the cluster to which each feature belongs:
The features are assigned to classes: every feature other than a clustering center is assigned to the class of the clustering center nearest to it in Euclidean distance, which completes the clustering of all features.
Further, the step S3 specifically includes:
S31, calculating the feature standard deviation S_i:
The feature standard deviation S_i of feature f_i in its feature cluster is expressed as:
where m is the number of features in the feature cluster containing f_i, and f_ij = x_ji.
S32, calculating the feature expression capability Corr_i:
The feature expression capability Corr_i of feature f_i is expressed as:
where δ(i, j) is an indicator function of whether the absolute Pearson correlation coefficient between features f_i and f_j exceeds the threshold t, expressed as above;
the threshold t takes the value of the 0.1 quantile of the Pearson correlation coefficients between the feature and the other features;
the Pearson correlation coefficient between features f_i and f_j is defined as above, with the feature mean calculated as given in the Disclosure.
S33, determining the feature importance evaluation function:
The feature importance evaluation function is defined from the feature standard deviation and the feature correlation term, with the calculation formula:
Score_i = S_i + λ·Corr_i
where Score_i denotes the feature score of feature f_i, and the adaptive parameter λ, introduced to balance the difference in order of magnitude between the feature standard deviation S_i and the feature expression capability Corr_i, is specifically expressed as:
S34, selecting the most representative features:
The features in each feature cluster are evaluated for importance; the highest-scoring feature of each cluster, i.e., its most representative feature, is selected and added to the feature subset, completing feature selection.
Compared with other unsupervised feature selection algorithms, the invention first designs an adaptive rapid density peak feature clustering method for clustering the features, whose parameters are determined adaptively from the dataset, solving the dataset-adaptation difficulty of traditional density peak clustering; a feature importance evaluation function is then defined, and the most representative feature is selected in each cluster from the two angles of feature standard deviation and feature expression capability. Meanwhile, owing to the speed advantage of density clustering and the tailoring of the clustering method to the feature selection scenario, AFDPCFS achieves better feature selection results at a higher speed.
Experimental part
The experimental part fully verifies the efficiency of the proposed AFDPCFS method.
The performance of the AFDPCFS method was tested on six public face image datasets (Yale, YaleB, warpAR10P, warpPIE10P, ORL, COIL20) and compared with the following six currently popular unsupervised feature selection approaches:
(1) Baseline: all of the original features are employed.
(2) LS: the importance of features is assessed using the method definition LAPLACIAN SCORE of the Laplacian feature map, which is the standard Filter-type feature selection method.
(3) MCFS: the unsupervised feature selection algorithm of the multi-class clusters considers the inherent manifold structure to perform spectrum analysis when the goodness of the clusters is measured, so that the multi-cluster structure of the features is better reserved, and meanwhile, the sparse feature value problem and the L1 regularized least square problem are optimized.
(4) NDFS: the non-negative discrimination unsupervised feature selection algorithm utilizes non-negative discrimination information to obtain more accurate cluster labels, and simultaneously performs joint learning on the cluster labels and the feature selection matrix, so that features with the most discrimination capability can be selected.
(5) UDFS: ℓ2,1-norm regularized discriminative feature selection, which combines discriminative analysis and ℓ2,1-norm minimization into a joint framework of unsupervised feature selection and selects the most discriminative feature subset in a batch manner.
(6) FSSC-SD: the feature is clustered through the spectral clustering of the self-adaptive neighborhood, and the product of feature distinction and feature independence is defined as a feature importance index to select a feature subset with strong classification capability.
The comparison algorithms LS, MCFS, and NDFS compute distances between features with the Euclidean distance and measure feature similarity with the heat-kernel similarity; the number of neighbors K is set to 5 and the bandwidth parameter t is set to 1. For NDFS, the remaining parameter γ is set to 10^8 and α and β are both set to 1. For UDFS, the regularization parameter is set to 0.1, the number of clusters to 5, and the number of neighbors K to 5.
In the experiments, the AFDPCFS method is compared with the other six unsupervised feature selection methods on six public face image datasets: Yale, YaleB, warpAR10P, warpPIE10P, ORL, and COIL20. Statistics of these face image datasets are shown in Fig. 4.
For all experimental datasets, the comparison algorithms (unlike AFDPCFS) require the number of selected features to be specified separately; it is set to {2, 4, 6, …, 100}, whereas AFDPCFS determines the number of selected features adaptively. K-nearest-neighbor (KNN), support vector machine (SVM), and classification and regression tree (CART) classifiers are used to classify the datasets, with the KNN parameters set to K = 5 and p = 1; the remaining KNN, SVM, and CART parameters use the defaults of the scikit-learn Python package. The data are normalized with min-max standardization, 10-fold cross-validation is repeated 5 times, and the mean of the 5 results is taken as the experimental result for comparing the algorithms. A sketch of this protocol follows.
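A minimal sketch of the evaluation protocol under the stated settings, assuming a selected-feature index array produced by AFDPCFS; dataset loading is omitted, and the scikit-learn calls are standard:

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

def evaluate_subset(X, y, selected, seed=0):
    """Mean accuracy of KNN (K=5, p=1) over 5 repeats of 10-fold CV."""
    Xs = MinMaxScaler().fit_transform(X[:, selected])   # min-max standardization
    clf = KNeighborsClassifier(n_neighbors=5, p=1)      # parameters from the text
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=seed)
    return cross_val_score(clf, Xs, y, cv=cv, scoring="accuracy").mean()
```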
The evaluation uses two criteria, ACC and Time: ACC is the classification accuracy, i.e., the proportion of correctly classified samples among all samples, and Time is the total time the algorithm spends on all experimental datasets during the feature selection stage.
To evaluate the accuracy of the unsupervised feature selection algorithms, the invention records each algorithm's best result over the different feature subset sizes. Figs. 5-7 report the maximum classification accuracy (ACC) of AFDPCFS and the other five unsupervised feature selection algorithms (FSSC-SD, Laplacian Score, MCFS, NDFS, UDFS) under the KNN, SVM, and CART classifiers on the face image datasets of Fig. 4. The optimal results are marked in bold and the suboptimal results are underlined; finally, the average results of each algorithm over all experimental datasets are ranked.
To evaluate the speed of the unsupervised feature selection algorithms, the invention records the total time each algorithm spends on all experimental datasets during the feature selection stage; Fig. 8 shows the total time spent by AFDPCFS and the other five unsupervised feature selection algorithms (FSSC-SD, Laplacian Score, MCFS, NDFS, UDFS) on the face image datasets of Fig. 4.
The results show that unsupervised feature selection greatly reduces the feature dimension while improving classification accuracy; in particular, the proposed AFDPCFS algorithm has the highest average ranking under every classifier and, in terms of time consumption, a clear speed advantage. This shows that the feature subsets selected by AFDPCFS are efficient and representative.
Claims (2)
1. The self-adaptive rapid unsupervised feature selection method applied to face recognition is characterized by comprising the following steps of:
S1, acquiring an original face image data matrix D, and performing data standardization processing to obtain original face image characteristics;
S2, carrying out self-adaptive rapid density peak clustering on the features, and grouping features with higher similarity to form similar feature clusters;
S3, selecting the most representative features from the similar feature clusters through a feature importance evaluation function;
S4, adding the most representative feature of each feature cluster into the feature subset to obtain the optimal feature subset;
wherein the step S2 specifically includes the following steps:
S21, determining the parameter d_c by minimizing the local density information entropy function H(d_c), wherein ρ_i represents the local density of the i-th feature, calculated with a Gaussian function; d_ij represents the Euclidean distance between the i-th and j-th features; and Z represents the sum of the local densities of all the features;
S22, calculating the feature distance δ_i:
when ρ_i is the maximum, the feature distance of the i-th feature is the maximum Euclidean distance between this feature and the other features; when ρ_i is not the maximum, the feature distance of the i-th feature is defined as the minimum Euclidean distance between feature f_i and all other features whose local density is greater than ρ_i;
S23, determining the clustering centers:
the clustering centers are determined from the local density ρ_i and the feature distance δ_i; the clustering centers are the features whose γ ranking lies before n, and an optimization problem is used to calculate the feature rank n,
wherein γ is the sorted sequence of ρ_i·δ_i values, n is the feature rank, α is the inclination angle of the tangent line, and α = −20°;
S24, determining the cluster to which each feature belongs:
the features are assigned to classes, and every feature other than a clustering center is assigned to the class of the clustering center nearest in Euclidean distance, completing the clustering of all the features;
The step S3 specifically comprises the following steps:
S31, calculating the feature standard deviation S_i:
the feature standard deviation S_i of feature f_i in its feature cluster is expressed as:
wherein m is the number of features in the feature cluster containing f_i, and f_ij = x_ji;
S32, calculating the feature expression capability Corr_i:
the feature expression capability Corr_i of feature f_i is expressed as:
wherein δ(i, j) is an indicator function of whether the absolute Pearson correlation coefficient between features f_i and f_j exceeds the threshold t;
the threshold t takes the value of the 0.1 quantile of the Pearson correlation coefficients between the feature and the other features;
the Pearson correlation coefficient between features f_i and f_j is defined with the feature mean;
S33, determining the feature importance evaluation function:
the feature importance evaluation function is defined from the feature standard deviation and the feature correlation term, with the calculation formula:
Score_i = S_i + λ·Corr_i
wherein Score_i denotes the feature score of feature f_i, and the adaptive parameter λ, introduced to balance the difference in order of magnitude between the feature standard deviation S_i and the feature expression capability Corr_i, is specifically expressed as:
S34, selecting the most representative features:
the features in each feature cluster are evaluated for importance, the highest-scoring feature of each cluster, i.e., its most representative feature, is selected and added to the feature subset, and feature selection is completed.
2. The adaptive fast unsupervised feature selection method for face recognition according to claim 1, wherein the step S1 specifically comprises the following steps:
first, normalizing the original face image data matrix D to obtain the normalized face image data matrix X,
wherein the original face image data matrix D ∈ R^(n×w×h), n is the number of face images, w is the width of a face image, h is the height of a face image, and D_ij is the element at position (i, j) of a single face image; each two-dimensional face image is flattened into a one-dimensional row vector, so the data matrix becomes D′ ∈ R^(n×d), where d = w×h is the number of face image features and D′_ij is the element at position (i, j) of the data matrix, with the normalization formula of D′_ij as follows:
wherein D′_j = [D′_1j, D′_2j, …, D′_nj]^T;
then the normalized matrix X = {x_ij} is expanded by rows and written as:
X = [x_1, x_2, …, x_n]^T ∈ R^(n×d),
wherein x_i = [x_i1, x_i2, …, x_id];
the normalized matrix X = {x_ij} is expanded by columns and written as:
X = [f_1, f_2, …, f_d] ∈ R^(n×d),
wherein f_i = [x_1i, x_2i, …, x_ni]^T, and f_1, f_2, …, f_d are the d features of the original face image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210183736.1A | 2022-02-25 | 2022-02-25 | Self-adaptive rapid unsupervised feature selection method applied to face recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210183736.1A | 2022-02-25 | 2022-02-25 | Self-adaptive rapid unsupervised feature selection method applied to face recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114529975A CN114529975A (en) | 2022-05-24 |
CN114529975B (en) | 2024-05-31 |
Family ID: 81624726
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210183736.1A (patent CN114529975B, active) | Self-adaptive rapid unsupervised feature selection method applied to face recognition | 2022-02-25 | 2022-02-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114529975B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118335206B * | 2024-06-07 | 2024-10-01 | Qilu University of Technology (Shandong Academy of Sciences) | Unsupervised feature selection method based on cancer multi-omics data |
- 2022-02-25: CN application CN202210183736.1A filed; granted as CN114529975B (en), status active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109948534A (en) * | 2019-03-19 | 2019-06-28 | 华侨大学 | The method for carrying out recognition of face is clustered using fast density peak value |
CN113239859A (en) * | 2021-05-28 | 2021-08-10 | 合肥工业大学 | Focus-guided face subspace fuzzy clustering method and system |
Non-Patent Citations (1)
Title |
---|
Unsupervised feature selection algorithm based on density peaks; Xie Juanying et al.; Journal of Nanjing University (Natural Science); 2016-07-30 (No. 04); pp. 735-745 *
Also Published As
Publication number | Publication date |
---|---|
CN114529975A (en) | 2022-05-24 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |