CN116206166A - Data dimension reduction method, device and medium based on kernel projection learning - Google Patents
- Publication number
- CN116206166A (application CN202310490898.4A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V40/166—Human faces: Detection; Localisation; Normalisation using acquisition arrangements
- G06V40/168—Human faces: Feature extraction; Face representation
- G06V40/174—Facial expression recognition
- Y02T10/40—Engine management systems
Abstract
The invention discloses a data dimension reduction method, device and medium based on kernel projection learning, comprising the following specific steps: constructing an OPLFE model, introducing a kernel function, and transferring the original data space into a high-dimensional nonlinear space; obtaining the mapping relation of the space conversion, determining the constraint conditions, constructing a kernel Gram matrix, and guiding the low-rank matrix to learn the high-dimensional nonlinear relations of the data; constructing a multi-kernel Gram matrix from the single-kernel Gram matrices and defining kernel Gram matrix weight parameters; determining a joint kernel projection and representation learning (JNPRL) model for image feature extraction from the multi-kernel Gram matrix; and optimizing the JNPRL model and outputting the dimension-reduced image data. The data structure relations learned in the high-dimensional nonlinear kernel space guide the conversion of the data from the original space to the low-dimensional feature space, the main feature information of the image is extracted, and the accuracy of face image feature extraction is improved.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a data dimension reduction method, device and medium based on kernel projection learning.
Background
At present, in image data processing, and especially in face image processing, the complexity of the human face means that the extracted image generally differs considerably from the extraction target during feature extraction and data conversion. Existing feature extraction methods based on projection learning realize the conversion of data from high dimension to low dimension by learning a projection matrix, retaining the main feature information of the data in the process so as to filter out irrelevant features and noise. The orthogonal projection learning feature extraction (OPLFE) model focuses on the data conversion from the original data space to the projection space when processing face image data, optimizes the projection matrix by learning a low-rank matrix, and retains the main feature information of the face data. However, the model ignores higher-order correlations among the data, so it cannot perform well on complex face data or high-dimensional data, and the extracted face data still differs from the original face data.
Disclosure of Invention
The invention aims to provide a data dimension reduction method, device and medium based on kernel projection learning. On the basis of the orthogonal projection learning feature extraction (OPLFE) method, kernel projection learning is introduced: the data structure relations learned in the high-dimensional nonlinear kernel space guide the conversion of the data from the original space to the low-dimensional feature space, a joint kernel projection and representation learning model is constructed to extract the main feature information of the image, and the accuracy of face image feature extraction is improved.
The invention is realized by the following technical scheme:
The first aspect of the invention provides a data dimension reduction method based on kernel projection learning, which comprises the following specific steps:
S1, acquiring face data, preprocessing the face data, constructing an orthogonal projection learning image feature extraction (OPLFE) model, introducing a kernel function into the OPLFE model, and transferring the original data space into a high-dimensional nonlinear space;
S2, obtaining the mapping relation of the space conversion, determining the constraint conditions, constructing a kernel Gram matrix, and guiding the low-rank matrix to learn the high-dimensional nonlinear relations of the data;
S3, constructing a multi-kernel Gram matrix from the single-kernel Gram matrices, and defining kernel Gram matrix weight parameters;
S4, determining a joint kernel projection and representation learning (JNPRL) model for image feature extraction from the multi-kernel Gram matrix;
and S5, optimizing the JNPRL model and outputting the dimension-reduced image data.
On the basis of the orthogonal projection learning feature extraction (OPLFE) method, kernel projection learning is introduced: the data structure relations learned in the high-dimensional nonlinear kernel space guide the conversion of the data from the original space to the low-dimensional feature space, a joint kernel projection and representation learning model is constructed to extract the main feature information of the image, and the accuracy of face image feature extraction on complex or high-dimensional face data is improved.
Further, the constructing the OPLFE model further includes:
acquiring image data, preprocessing the image data, and extracting image features;
and constructing an error matrix, a projection matrix, a data matrix and a low-rank matrix according to the image characteristics, optimizing the projection matrix by using the low-rank matrix, and determining the data matrix.
Further, constructing the kernel Gram matrix specifically includes:
acquiring the kernel mapping relation from the space conversion relation between the original data space and the high-dimensional nonlinear space;
substituting the kernel mapping relation into the data matrix to obtain the mapped data matrix;
and determining the constraint conditions from the mapped data matrix and constructing the kernel Gram matrix.
Further, constructing the multi-kernel Gram matrix specifically includes:
determining the number of reference kernel Gram matrices and the consensus kernel Gram matrix to be learned according to the mapping relation, and constructing the multi-kernel Gram matrix.
Further, defining the kernel Gram matrix weight parameters specifically includes:
designing weight parameters and assigning different weights to different kernel Gram matrices to obtain the weight vector over the single-kernel Gram matrices;
and defining the kernel Gram matrix weight parameters according to that weight vector.
Further, optimizing the model specifically includes:
defining the Lagrange multipliers and penalty parameter of the JNPRL model and, together with an auxiliary matrix, obtaining the Lagrangian formulation;
and solving, in the Lagrangian formulation, for the low-rank matrix L, the projection matrix Q, the auxiliary matrix J and the consensus kernel Gram matrix K in turn.
Further, the solving step specifically includes:
solving matrix L: update L with K, J and Q fixed, discard the terms unrelated to L, and determine the iterative formula for L;
solving matrix Q: update Q with L, K and J fixed, discard the terms unrelated to Q, and determine the iterative formula for Q;
solving matrix J: update J with L, K and Q fixed, discard the terms unrelated to J, and determine the iterative formula for J;
solving matrix K: update K with L, J and Q fixed, discard the terms unrelated to K, and determine the iterative formula for K.
Further, the solving process further includes:
acquiring the relevant constraint conditions of the JNPRL model, designing an iterative solution algorithm according to them, and approaching the optimal solution of the JNPRL model through the matrices obtained by iterative solution;
during iteration, when the distance between the matrix obtained in the previous iteration and the matrix obtained in the current iteration is smaller than a set threshold, the iteration stops and the optimal solution is obtained.
A second aspect of the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements a data dimension reduction method based on kernel projection learning when executing the program.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for dimension reduction of data based on kernel projection learning.
Compared with the prior art, the invention has the following advantages and beneficial effects:
based on an image feature extraction (OPLFE) method based on orthogonal projection learning, nuclear projection learning is introduced, data conversion of data from an original space to a low-dimensional feature space process is guided through a data structure relation of learning data in a high-dimensional nonlinear nuclear space, main feature information of an image is extracted by combining nuclear projection and an image feature extraction model representing the learning, and the accuracy of face image feature extraction is improved.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, the drawings that are needed in the examples will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and that other related drawings may be obtained from these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a flowchart of an algorithm in an embodiment of the present invention.
Description of the embodiments
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
Examples
As shown in fig. 1, a first aspect of the present invention provides a data dimension reduction method based on kernel projection learning, which comprises the following specific steps:
S1, acquiring face data, preprocessing the face data, constructing an orthogonal projection learning image feature extraction (OPLFE) model, introducing a kernel function into the OPLFE model, and transferring the original data space into a high-dimensional nonlinear space;
S2, obtaining the mapping relation of the space conversion, determining the constraint conditions, constructing a kernel Gram matrix, and guiding the low-rank matrix to learn the high-dimensional nonlinear relations of the data;
S3, constructing a multi-kernel Gram matrix from the single-kernel Gram matrices, and defining kernel Gram matrix weight parameters;
S4, determining a joint kernel projection and representation learning (JNPRL) model for image feature extraction from the multi-kernel Gram matrix;
and S5, optimizing the JNPRL model and outputting the dimension-reduced image data.
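The steps S1 to S5 can be sketched numerically. The following is a minimal, hedged sketch only: it assumes RBF kernels for the kernel pool, replaces the learned consensus kernel with a plain average, and uses a kernel-PCA-style eigendecomposition as a stand-in for the optimized projection matrix Q; none of these choices are fixed by the patent.

```python
import numpy as np

def rbf_gram(X, gamma):
    # Kernel Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)  (S1/S2).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.clip(d2, 0.0, None))

def reduce_dim(X, dim, gammas=(0.1, 1.0, 10.0)):
    # S3: several single-kernel Gram matrices; here combined by a plain
    # average instead of the learned consensus kernel of the patent (S4).
    K = np.mean([rbf_gram(X, g) for g in gammas], axis=0)
    # S5 stand-in: center the Gram matrix and project onto its leading
    # eigenvectors (kernel-PCA style) in place of the optimized matrix Q.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:dim]
    return Kc @ V[:, idx]          # dimension-reduced data
```

The helper names `rbf_gram` and `reduce_dim` are illustrative; the patent's actual projection is learned jointly with the low-rank and consensus kernel matrices.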
In this embodiment, on the basis of the orthogonal projection learning image feature extraction (OPLFE) method, kernel projection learning is introduced: the data structure relations learned in the high-dimensional nonlinear kernel space guide the conversion of the data from the original space to the low-dimensional feature space, a joint nuclear projection and representation learning for image feature extraction (JNPRL) model is constructed to extract the main feature information of the image, and the accuracy of image feature extraction is improved.
In some possible embodiments, constructing the OPLFE model further comprises:
acquiring image data, preprocessing the image data, and extracting image features;
and constructing an error matrix, a projection matrix, a data matrix and a low-rank matrix according to the image characteristics, optimizing the projection matrix by using the low-rank matrix, and determining the data matrix.
In some possible embodiments, constructing the kernel Gram matrix specifically includes:
acquiring the kernel mapping relation from the space conversion relation between the original data space and the high-dimensional nonlinear space;
substituting the kernel mapping relation into the data matrix to obtain the mapped data matrix;
and determining the constraint conditions from the mapped data matrix and constructing the kernel Gram matrix.
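The kernel trick underlying this construction can be verified numerically: an explicit feature map and the Gram matrix computed directly from the kernel give the same result, so the mapped data never have to be materialized. The degree-2 polynomial kernel below is an illustrative choice; the patent leaves the kernel function generic.

```python
import numpy as np

def poly2_features(X):
    # Explicit feature map phi(x) for the kernel k(x, y) = (x . y)^2:
    # x_i^2 terms plus sqrt(2) * x_i * x_j cross terms (i < j).
    n, d = X.shape
    feats = [X[:, i] * X[:, j] * (np.sqrt(2.0) if i < j else 1.0)
             for i in range(d) for j in range(i, d)]
    return np.stack(feats, axis=1)

def poly2_gram(X):
    # The same Gram matrix obtained directly by the kernel trick,
    # without computing phi(x) in the higher-dimensional space.
    return (X @ X.T) ** 2
```

Both routes yield identical Gram matrices, which is what lets the model work with K rather than with the mapped data matrix.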
In some possible embodiments, constructing the multi-kernel Gram matrix specifically includes:
determining the number of reference kernel Gram matrices and the consensus kernel Gram matrix to be learned according to the mapping relation, and constructing the multi-kernel Gram matrix.
In some possible embodiments, defining the kernel Gram matrix weight parameters specifically includes:
designing weight parameters and assigning different weights to different kernel Gram matrices to obtain the weight vector over the single-kernel Gram matrices;
and defining the kernel Gram matrix weight parameters according to that weight vector.
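Building the set of reference (single-kernel) Gram matrices might look as follows. The particular kernels (linear, polynomial, RBF) and their parameters are illustrative assumptions, since the patent does not fix the kernel pool.

```python
import numpy as np

def kernel_pool(X, rbf_gammas=(0.5, 1.0), poly_degrees=(2, 3)):
    # Base Gram matrices K_1..K_m from which the consensus kernel K is
    # learned; the kernel choices here are assumptions, not the patent's.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    lin = X @ X.T
    pool = [lin]                                   # linear kernel
    pool += [(lin + 1.0) ** p for p in poly_degrees]  # polynomial kernels
    pool += [np.exp(-g * d2) for g in rbf_gammas]     # RBF kernels
    return np.stack(pool)                          # shape (m, n, n)
```

The stacked array plays the role of the reference kernel Gram matrices that constrain the learned consensus kernel.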
In some possible embodiments, optimizing the model specifically includes:
defining the Lagrange multipliers and penalty parameter of the JNPRL model and, together with an auxiliary matrix, obtaining the Lagrangian formulation;
and solving, in the Lagrangian formulation, for the low-rank matrix L, the projection matrix Q, the auxiliary matrix J and the consensus kernel Gram matrix K in turn.
In some possible embodiments, the solving step specifically includes:
solving matrix L: update L with K, J and Q fixed, discard the terms unrelated to L, and determine the iterative formula for L;
solving matrix Q: update Q with L, K and J fixed, discard the terms unrelated to Q, and determine the iterative formula for Q;
solving matrix J: update J with L, K and Q fixed, discard the terms unrelated to J, and determine the iterative formula for J;
solving matrix K: update K with L, J and Q fixed, discard the terms unrelated to K, and determine the iterative formula for K.
In some possible embodiments, the solving process further comprises:
acquiring the relevant constraint conditions of the JNPRL model, designing an iterative solution algorithm according to them, and approaching the optimal solution of the JNPRL model through the matrices obtained by iterative solution;
during iteration, when the distance between the matrix obtained in the previous iteration and the matrix obtained in the current iteration is smaller than a set threshold, the iteration stops and the optimal solution is obtained.
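The stopping rule described above can be written directly: iteration halts once consecutive iterates are closer than a user-set threshold.

```python
import numpy as np

def converged(prev, curr, tol=1e-6):
    # Stop when the distance between the variable at the previous and the
    # current iteration falls below the user-set threshold `tol`.
    return float(np.linalg.norm(curr - prev)) < tol
```

The Frobenius norm is used here as the matrix distance; any matrix norm would serve the same purpose.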
A second aspect of the present embodiment provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements a method for reducing data dimension based on kernel projection learning when the processor executes the program.
A third aspect of the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for data dimension reduction based on kernel projection learning.
Examples
The present embodiment provides a model solving overall process:
1. In the OPLFE algorithm (image feature extraction based on orthogonal projection learning), the information contained in the low-rank matrix is used to optimize the learning of the projection matrix. The specific model is:
equation (1) aims at minimizing E, F, L, P, Q, the correlation entropyMeasuring data errors to enhance the robustness of the model; generalized related entropy->For testing classification errors; />The constraint enables the model to obtain good embedded representation in the dimension reduction process; />The problem of low rank constraint optimization can be better solved, < ->For Schatten-p norm, p is a parameter weighting Schatten-p norm, and w is a weight.
The model aims at focusing on data conversion in the process of data from an original data space to a projection space, optimizes the projection matrix through learning of a low-rank matrix, and ignores high-order correlation among the data while retaining main characteristic information of the data, so that the model cannot show excellent performance when processing complex face data or high-dimensional data. To capture the high-order correlation of high-dimensional data, we introduce a kernel method into the model.
(1) P is a reconstruction matrix; I is the identity matrix; Q^T is the transpose of Q; E and F are error matrices, i.e. the difference between the original data matrix X and the processed data matrix; L is a low-rank matrix obtained by processing the data matrix X; Q is a projection matrix that is both symmetric (equal elements mirrored about the main diagonal) and idempotent (Q is square and Q^2 = Q); the remaining scalars are balance parameters; ||.||_{2,1}^2 denotes the square of the 2,1 norm and ||.||_F^2 the square of the Frobenius norm;
(4) WSNM is a non-convex low-rank regularizer that assigns different weights to different singular values and thus mines rank information more reasonably. It is defined as
||X||_{w,Sp}^p = sum_i w_i * sigma_i(X)^p,
where p is the parameter of the weighted Schatten-p norm, w_i is a weight coefficient, and sigma_i(X) is the i-th singular value of the matrix X.
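The definition can be evaluated directly from a singular value decomposition:

```python
import numpy as np

def weighted_schatten_p(X, w, p):
    # ||X||_{w,Sp}^p = sum_i w_i * sigma_i(X)^p, with sigma_i the i-th
    # singular value of X (the WSNM regularizer defined above).
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(np.asarray(w)[:len(s)] * s ** p))
```

With w_i = 1 and p = 1 this reduces to the ordinary nuclear norm, which is a convenient sanity check.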
3. For the model in step 1, the original data space is transferred into the high-dimensional nonlinear space. For convenience of discussion, the model of equation (1) is first simplified, specifically as follows:
In the simplification, two of the terms are kept and two are removed; an additional term is introduced to facilitate the subsequent solving, and the constraint condition changes accordingly. Equation (2) minimizes E, L and Q under this constraint.
3. Introducing a kernel function maps the data in the original space into a high-dimensional kernel space. If the kernel mapping is denoted phi(.), the mapped data can be expressed as phi(X), and equation (2) becomes:
4. However, the conversion in equation (3) requires explicitly computing the values of the data in the kernel space, which is a very difficult task. By the kernel trick, the kernel Gram matrix K = phi(X)^T phi(X) can be used instead, without explicitly computing the data in the kernel space. After the model is convexly relaxed, the learned projection matrix projects the data from the kernel space to the low-dimensional space; the kernel method is thus introduced to capture more of the data structure relations in the high-dimensional space and to guide the projection learning of the data from the original space to the low-dimensional space. The constraint condition changes accordingly and a substitute term replaces E, whereby the model can be written in the following reduced form:
(1) Tr denotes the trace of a matrix. For an n x n matrix A, the sum of the elements on its main diagonal (from top-left to bottom-right) is called the trace of A, written Tr(A);
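The trace definition can be checked numerically:

```python
import numpy as np

# Tr of an n-by-n matrix: the sum of its main-diagonal elements.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
tr = np.trace(A)        # 1.0 + 4.0
```
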
5. The kernel term in equation (4) can be viewed as the interaction of the kernel Gram matrix with the low-rank matrix, which shows that constructing a kernel Gram matrix through the kernel function can guide the low-rank matrix to effectively learn the high-dimensional nonlinear relations of the data. The following model is thus obtained:
6. The model of equation (5) only considers projecting the data into a Hilbert space through a single kernel function. The kernel function suited to a specific data set is usually unknown, choosing the kernel type and tuning the kernel parameters are both difficult problems, and real-world data are generally not linearly separable under a single kernel mapping. Multi-kernel learning is therefore realized by constructing multiple kernel Gram matrices to constrain and learn a consensus kernel Gram matrix, which avoids the task of searching a kernel pool for the single optimal kernel function. The following term is added to equation (5):
With equation (6), the optimal kernel function and its tuning parameters are selected automatically for different data sets, so high-dimensional nonlinear data are handled better.
where i indexes the reference kernel Gram matrices, K is the consensus kernel Gram matrix to be learned, K_i denotes one of the base kernel Gram matrices, and w is the weight vector. A weight parameter w_i is designed to assign different weights to different kernel Gram matrices; it is defined as follows:
where theta is a scalar parameter and the remaining symbol denotes the corresponding average value.
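Since the exact weighting formula is garbled in the text, the sketch below assumes a plausible form consistent with the description (a scalar parameter theta and the average of the base Gram matrices): kernels closer to the average receive larger weight.

```python
import numpy as np

def kernel_weights(pool, theta=1.0):
    # Assumed weighting rule: softmax of negative distances to the mean
    # Gram matrix K_bar, scaled by the scalar parameter theta. This is a
    # stand-in for the patent's garbled definition, not its exact formula.
    K_bar = pool.mean(axis=0)
    d = np.array([np.linalg.norm(K - K_bar) for K in pool])
    w = np.exp(-d / theta)
    return w / w.sum()
```

The resulting weights are nonnegative and sum to one, which is the usual normalization for multi-kernel combinations.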
7. Finally, the image feature extraction JNPRL algorithm combining kernel projection and representation learning is obtained; the model is as follows:
The final model minimizes Q, L and K under its constraints. The introduced multi-kernel term extends the model to multi-kernel learning so that it better learns the high-dimensional nonlinear characteristics of the data; applying the 2,1 norm to the projection matrix Q yields a better embedded representation; and regularizing the low-rank matrix with the weighted truncated Schatten-p norm (WTSN) better solves the low-rank-constrained optimization problem.
8. After the final JNPRL model is obtained, solving it directly is very difficult. In general, an iterative solution algorithm is designed according to the specific model and its constraint conditions, and the matrices obtained by iterative solution approach the optimal solution of the model. When the distance between the matrix obtained in the previous iteration and the matrix obtained in the current iteration is smaller than a user-set value, the result at that moment is taken as the optimal solution of the model. An effective ADMM-based solution algorithm is designed to find the final result. For convenience, an auxiliary matrix is introduced into the expression, giving the following model:
9. The model in step 8 is rewritten in augmented Lagrangian form, with a penalty parameter and a Lagrange multiplier:
10. According to the alternating direction method of multipliers (ADMM), the solution of step 9 proceeds as follows:
(1) Update L: fix K, J and Q, discard the terms unrelated to L; the matrix L can then be solved from:
Taking the partial derivative with respect to L on both sides of equation (10) finally gives the iterative formula for the matrix L:
(2) Update Q: fix K, J and L, discard the terms unrelated to Q; the matrix Q can then be solved from:
Similarly to the solution of the matrix L, the iterative formula for the matrix Q is as follows:
where D is a block-diagonal matrix whose diagonal elements are built from the rows of the matrix, the i-th row being denoted q_i.
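The block-diagonal matrix D appearing in the Q-update is the standard reweighting matrix used when minimizing a 2,1 norm; a small helper (with an assumed epsilon guard for zero rows) is:

```python
import numpy as np

def row_norm_diag(Q, eps=1e-8):
    # Diagonal matrix with D_ii = 1 / (2 * ||q_i||_2), q_i the i-th row
    # of Q: the standard reweighting for 2,1-norm minimization. The eps
    # guard against zero rows is an assumption, not from the patent.
    r = np.linalg.norm(Q, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(r, eps)))
```

At each iteration D is rebuilt from the current Q, so the 2,1-norm term behaves like a weighted quadratic in the update.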
(3) Update J: fix K, L and Q, discard the terms unrelated to J; the matrix J can then be solved from:
where w is the weight vector and sigma_i(J) is the i-th largest singular value of J.
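The J-update amounts to a weighted shrinkage of singular values. The patent's exact proximal operator for the weighted truncated Schatten-p norm is not recoverable from the text, so the sketch below uses the simpler weighted soft-thresholding of singular values as a stand-in:

```python
import numpy as np

def weighted_svt(M, w, tau):
    # Shrink each singular value sigma_i by tau * w_i and rebuild the
    # matrix: weighted singular value thresholding, used here as an
    # assumed stand-in for the patent's exact J-update.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shr = np.maximum(s - tau * np.asarray(w)[:len(s)], 0.0)
    return U @ np.diag(s_shr) @ Vt
```

Larger weights shrink the corresponding singular values more strongly, which is how the weighted norm "mines rank information more reasonably" than uniform thresholding.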
(4) Update K: fix L, Q and J, discard the terms unrelated to K; the matrix K can then be solved from:
The iterative formula for the resulting matrix K is as follows:
the whole algorithm flow is as follows:
2. Initialization: initialize the low-rank matrix L, the auxiliary matrix J, the projection matrix Q, the consensus kernel matrix K and the Lagrange multiplier; set the iteration counter t to 0; set the penalty parameter, the maximum number of iterations and the convergence accuracy;
3.6 Update t = t + 1;
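The alternating flow above can be organized as a generic ADMM-style loop; the concrete update formulas for L, Q, J and K would be plugged in as the `updates` functions. The function name, the penalty growth schedule and the default parameters are assumptions for illustration, not the patent's exact algorithm.

```python
import numpy as np

def admm_solve(init, updates, rho=1.0, mu=1.05, rho_max=1e6,
               tol=1e-6, max_iter=200):
    # Generic ADMM-style alternating loop matching the flow above:
    # update each block variable in turn with the others fixed, grow the
    # penalty parameter, and stop when consecutive iterates are close.
    # `updates` maps a variable name (e.g. "L", "Q", "J", "K") to a
    # function of the full state and the current penalty parameter.
    state = {k: v.copy() for k, v in init.items()}
    for t in range(max_iter):
        prev = {k: v.copy() for k, v in state.items()}
        for name, step in updates.items():     # L, Q, J, K in turn
            state[name] = step(state, rho)
        rho = min(mu * rho, rho_max)           # penalty schedule (assumed)
        gap = max(np.linalg.norm(state[k] - prev[k]) for k in state)
        if gap < tol:                          # step 3 stopping rule
            break
    return state, t + 1
```

As a usage sketch, a single toy update that halves the distance to a fixed point converges in a few dozen iterations under the default threshold.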
Examples
This embodiment verifies the method on specific face data:
Related experiments were performed on three common face data sets, namely the YTC, AR and ORL face data sets, whose details are shown in Table 1:
All samples are divided into two parts, used for the feature extraction training phase and the feature extraction testing phase respectively. The JNPRL model is compared with several state-of-the-art models and achieves the best recognition accuracy, which effectively demonstrates the rationality and superiority of the model.
As can be seen from Table 2:
from the above table, it can be derived that the best results are obtained in this embodiment, compared with other latest feature extraction methods on three face datasets, YTC, AR and ORL, where 92.52% is best achieved in the YTC dataset, 98.27% is best achieved in the AR dataset, and 96.31% is best achieved in the ORL dataset.
The proposed JNPRL model is compared extensively with five models: Sparse Representation Classification (SRC), Latent Low-Rank Representation (LatLRR), Low-Rank Embedding (LRE), Approximate Low-Rank Projection Learning (SALPL), and image feature extraction based on orthogonal projection learning (OPLFE). The model obtains better results on the recognition accuracy index.
We performed a comparison experiment on three face datasets, YTC, AR and ORL, with 7 best values obtained among 9 indices.
The detailed experimental results and analyses were as follows:
(1) Comparative experiments on YTC dataset:
the YouTube celebrity (YTC) dataset is a celebrity picture taken from 1900 videos, 47 celebrities are taken, and each celebrity has more than 1000 face pictures with different expressions and different scenes. Each celebrity in the experiment randomly selects 100 pictures, and 4700 face pictures in total, and the pixels of each face image are adjusted to be 30 multiplied by 30 to form a YTC sub-data set.
During the experiment, we selected t= {20, 30, 40} data samples from each class in the YTC subset data set, and the remaining images were processed as test samples. The recognition results of all the methods are shown in table 2. It is worth mentioning that JNPRL achieves the best results when the training sample number tr=40. Specifically, JNPRL has a recognition accuracy higher than OPLFE by 1.06% when the training sample number tr=20, as compared with OPLFE; when the training sample number tr=30, the recognition accuracy of JNPRL is 0.6% higher than that of OPLFE; when the training sample number tr=40, the recognition accuracy of JNPRL is 0.96% higher than that of OPLFE.
(2) Comparative experiments on AR dataset:
The AR face image dataset contains 4,000 images of 126 persons (70 men and 56 women). Each subject provides 26 frontal images with different expressions, illumination and occlusion. The experiment uses 50 women and 50 men to construct a sub-dataset containing 100 classes, with 700 data samples in total.
During the experiment, tr = {3, 4, 5} samples were selected from each class as training samples, and the rest were used as test samples. As shown in Table 2, the JNPRL model performs competitively whether the number of training samples is 3, 4, or 5. When tr = 3, the recognition accuracy of JNPRL is slightly lower than that of OPLFE, but as the number of training samples increases, JNPRL surpasses OPLFE: when tr = 4 its recognition accuracy is 1.4% higher than OPLFE's, and when tr = 5 it is 1.37% higher.
(3) Comparative experiments on ORL dataset:
Experiments on the ORL dataset use faces from 40 different classes, each class containing 10 samples, for 400 face images in total. Facial expression and shooting angle vary during image acquisition; in addition, some subjects wear glasses while others do not, producing differences in facial detail between subjects.
In this experiment, tr = {5, 6, 7} data samples were selected from each class, and the remaining data served as the test set. As can be seen from Table 2, the relatively strong LRE, SALPL and OPLFE models all perform well under the three settings of tr = {5, 6, 7}, while the JNPRL model exceeds the recognition accuracy of all of them when tr = 5 or tr = 7. Specifically, when tr = 5 the recognition accuracy of JNPRL is 0.15% higher than that of OPLFE, and when tr = 7 it is 0.14% higher.
High-dimensional nonlinear data: high-dimensional means that the number of images (and thus of pixel features) in the dataset is large, while nonlinear means that the features of different samples are not related linearly. Taking the ORL dataset as an example, it contains 40 classes of faces with 10 samples per class, 400 images in total, which is high-dimensional; during image acquisition, facial expression and shooting angle vary, and some subjects wear glasses while others do not, so facial details differ between subjects. The images therefore differ considerably in their features, and the relationships among them are nonlinear.
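The kernel machinery that handles such nonlinearity replaces an explicit high-dimensional mapping with a Gram matrix of pairwise kernel evaluations. Below is a minimal sketch using a Gaussian (RBF) kernel; the kernel choice and the bandwidth `gamma` are illustrative assumptions, not taken from this description.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    Encodes nonlinear pairwise similarity without ever forming the
    implicit high-dimensional feature vectors."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp tiny negatives from rounding

X = np.random.rand(5, 3)
K = rbf_gram(X, gamma=0.5)
print(K.shape, np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))
# (5, 5) True True
```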
Experimental analysis: On the YTC dataset, the recognition accuracy of our proposed JNPRL is consistently higher than that of OPLFE. The results on the AR and ORL datasets show that OPLFE can outperform JNPRL when the number of training samples is small, but JNPRL overtakes OPLFE as the number of training samples grows. This is due to the multi-kernel learning we introduce into JNPRL: the multi-kernel method enables the model to better learn the high-dimensional nonlinear characteristics of the data and thus effectively guide the projection of the data from the original space into the low-dimensional space.
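The multi-kernel idea credited with this advantage combines several base kernel Gram matrices into one consensus Gram matrix under weights constrained to sum to one. In JNPRL the weights are learned jointly with the model; the sketch below fixes them for illustration, and the RBF base kernels and bandwidth values are our own assumptions.

```python
import numpy as np

def multi_kernel_gram(X, gammas, weights):
    """Consensus Gram matrix as a convex combination of RBF base kernels
    at several bandwidths. Weights are normalized onto the simplex; in
    JNPRL they would be learned rather than fixed."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # enforce the sum-to-one weight constraint
    K = np.zeros((len(X), len(X)))
    for g, wi in zip(gammas, w):
        K += wi * np.exp(-g * d2)  # weighted base kernel Gram matrix
    return K

X = np.random.rand(6, 4)
K = multi_kernel_gram(X, gammas=[0.1, 1.0, 10.0], weights=[1, 1, 1])
print(K.shape)  # (6, 6)
```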
The foregoing description of the embodiments is provided to illustrate the principles of the invention and is not intended to limit its scope to the particular embodiments described; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to fall within the scope of the invention.
Claims (10)
1. A data dimension reduction method based on kernel projection learning, characterized by comprising the following specific steps:
S1, acquiring face data, preprocessing the face data, constructing an orthogonal projection learning image feature extraction (OPLFE) model, introducing a kernel function into the OPLFE model, and mapping the original data space into a high-dimensional nonlinear space;
S2, obtaining the mapping relation of the space conversion, determining constraint conditions, constructing a kernel Gram matrix, and guiding the low-rank matrix to learn the high-dimensional nonlinear relations of the data;
S3, constructing a multi-kernel Gram matrix based on the single-kernel Gram matrices, and defining kernel Gram matrix weight parameters;
S4, according to the multi-kernel Gram matrix, mathematically joining kernel projection with representation learning for image feature extraction to determine a JNPRL model;
and S5, optimizing the JNPRL model and outputting the dimension-reduced image data.
2. The data dimension reduction method based on kernel projection learning according to claim 1, wherein constructing the OPLFE model further comprises:
acquiring image data, preprocessing the image data, and extracting image features;
constructing an error matrix, a projection matrix, a data matrix and a low-rank matrix according to the image features, optimizing the projection matrix by using the low-rank matrix, and determining the data matrix.
3. The data dimension reduction method based on kernel projection learning according to claim 2, wherein constructing the kernel Gram matrix specifically comprises:
acquiring a kernel mapping relation according to the space conversion relation from the original data space to the high-dimensional nonlinear space;
substituting the kernel mapping relation into the data matrix to obtain a mapped data matrix;
and determining a constraint condition according to the mapped data matrix, and constructing the kernel Gram matrix.
4. The data dimension reduction method based on kernel projection learning according to claim 3, wherein constructing the multi-kernel Gram matrix specifically comprises:
determining the number of base kernel Gram matrices and the consensus kernel Gram matrix to be learned according to the mapping relation, and constructing the multi-kernel Gram matrix.
5. The data dimension reduction method based on kernel projection learning according to claim 4, wherein defining the kernel Gram matrix weight parameters specifically comprises:
designing weight parameters, and assigning different weights to different kernel Gram matrices to obtain the weight vector of each single-kernel Gram matrix;
defining the kernel Gram matrix weight parameters according to the weight vectors of the single-kernel Gram matrices.
6. The data dimension reduction method based on kernel projection learning according to claim 1, wherein optimizing the model specifically comprises:
defining Lagrange parameters and multiplier parameters of the JNPRL model, and combining them with an auxiliary matrix to obtain a Lagrangian solution formula;
and respectively solving the low-rank matrix L, the projection matrix Q, the auxiliary matrix J and the consensus Gram matrix K in the Lagrangian solution formula.
7. The data dimension reduction method based on kernel projection learning according to claim 6, wherein the solving step specifically comprises:
solving the matrix L: updating L with K, J and Q fixed, removing terms irrelevant to L, and determining the update formula of the matrix L;
solving the matrix Q: updating Q with L, K and J fixed, removing terms irrelevant to Q, and determining the update formula of the matrix Q;
solving the matrix J: updating J with L, K and Q fixed, removing terms irrelevant to J, and determining the update formula of the matrix J;
solving the matrix K: updating K with L, J and Q fixed, removing terms irrelevant to K, and determining the update formula of the matrix K.
8. The method according to claim 7, wherein the solving process further comprises:
acquiring the relevant constraint conditions of the JNPRL model, designing an iterative solution algorithm according to the constraint conditions, and approaching the optimal solution of the JNPRL model through the matrix results obtained by iterative solution;
during the iteration, when the distance between the matrix obtained in the previous iteration and the matrix obtained in the current iteration is smaller than a set threshold, stopping the iteration to obtain the optimal solution.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the data dimension reduction method based on kernel projection learning according to any one of claims 1 to 8.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the data dimension reduction method based on kernel projection learning according to any one of claims 1 to 8.
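The alternating scheme of claims 6 to 8 — fix all variables but one, update that one, and stop when successive iterates are closer than a set threshold — can be illustrated on a deliberately simplified two-variable problem. The objective and updates below are placeholder stand-ins, not the patent's actual JNPRL Lagrangian over L, Q, J and K:

```python
import numpy as np

def alternating_lowrank(X, r, tol=1e-6, max_iter=200):
    """Alternately minimize ||X - Q @ L||_F over Q (d x r) and L (r x n),
    stopping when successive iterates move less than `tol` -- the same
    fix-and-update / threshold pattern as JNPRL's four-matrix scheme."""
    rng = np.random.default_rng(0)
    d, n = X.shape
    Q = rng.standard_normal((d, r))
    L = np.zeros((r, n))
    for _ in range(max_iter):
        L_new = np.linalg.lstsq(Q, X, rcond=None)[0]            # update L with Q fixed
        Q_new = np.linalg.lstsq(L_new.T, X.T, rcond=None)[0].T  # update Q with L fixed
        done = max(np.linalg.norm(L_new - L), np.linalg.norm(Q_new - Q)) < tol
        L, Q = L_new, Q_new
        if done:  # successive iterates closer than the threshold
            break
    return Q, L

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 15))  # exactly rank-5 data
Q, L = alternating_lowrank(X, r=5)
print(np.linalg.norm(X - Q @ L) < 1e-6)  # True: the rank-5 factorization is recovered
```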
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310490898.4A CN116206166B (en) | 2023-05-05 | 2023-05-05 | Data dimension reduction method, device and medium based on kernel projection learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116206166A true CN116206166A (en) | 2023-06-02 |
CN116206166B CN116206166B (en) | 2023-08-11 |
Family
ID=86511521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310490898.4A Active CN116206166B (en) | 2023-05-05 | 2023-05-05 | Data dimension reduction method, device and medium based on kernel projection learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206166B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130322728A1 (en) * | 2011-02-17 | 2013-12-05 | The Johns Hopkins University | Multiparametric non-linear dimension reduction methods and systems related thereto |
CN106203487A (en) * | 2016-06-30 | 2016-12-07 | 北京航空航天大学 | Image classification method and device based on multiple kernel learning and multiple classifier fusion |
CN106909895A (en) * | 2017-02-17 | 2017-06-30 | 华南理工大学 | Gesture recognition method based on random projection multiple kernel learning |
CN107194417A (en) * | 2017-05-09 | 2017-09-22 | 北京积水潭医院 | Finger injury detection method and system based on MR images |
CN109726725A (en) * | 2018-12-28 | 2019-05-07 | 中南大学 | Oil painting author identification method based on large-margin inter-class heterogeneous multiple kernel learning |
CN112116017A (en) * | 2020-09-25 | 2020-12-22 | 西安电子科技大学 | Data dimension reduction method based on kernel preservation |
CN112232438A (en) * | 2020-11-05 | 2021-01-15 | 华东理工大学 | Multi-kernel subspace learning framework for high-dimensional image representation |
CN112766180A (en) * | 2021-01-22 | 2021-05-07 | 重庆邮电大学 | Pedestrian re-identification method based on feature fusion and multiple kernel learning |
CN113283482A (en) * | 2021-05-14 | 2021-08-20 | 中国人民解放军火箭军工程大学 | Imbalanced hyperspectral image classification method based on multiple kernel learning |
US20220066456A1 (en) * | 2016-02-29 | 2022-03-03 | AI Incorporated | Obstacle recognition method for autonomous robots |
Non-Patent Citations (6)
Title |
---|
REN, ZHENWEN et al.: "Learning Latent Low-Rank and Sparse Embedding for Robust Image Feature Extraction", IEEE Transactions on Image Processing, pages 2094-2107 *
SHAN ZENG et al.: "Multiple Kernel-Based Discriminant Analysis via Support Vectors for Dimension Reduction", IEEE Access, pages 35418-35430 *
WEN GE et al.: "Multi-kernel PCA Based High-dimensional Images Feature Reduction", 2011 International Conference on Electric Information and Control Engineering, pages 5966-5969 *
HOU, SHUDONG: "Research on Feature Extraction Based on Correlation Projection Analysis and Its Application in Image Recognition", China Doctoral Dissertations Full-text Database, Information Science and Technology Series, pages 138-33 *
HOU, XIAONAN: "Research on Key Technologies of Face Recognition", China Doctoral Dissertations Full-text Database, Information Science and Technology Series, pages 138-28 *
ZHANG, XIAOQIAN et al.: "Image Feature Extraction Algorithm Based on Orthogonal Projection Learning", Journal of Beijing University of Posts and Telecommunications, pages 85-90 *
Also Published As
Publication number | Publication date |
---|---|
CN116206166B (en) | 2023-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wen et al. | Incomplete multiview spectral clustering with adaptive graph learning | |
Chen et al. | Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization | |
Wong et al. | Low-rank embedding for robust image feature extraction | |
CN110263912B (en) | Image question-answering method based on multi-target association depth reasoning | |
Yan et al. | Ranking with uncertain labels | |
CN110659665B (en) | Model construction method of different-dimension characteristics and image recognition method and device | |
Yang et al. | Efficient and robust MultiView clustering with anchor graph regularization | |
CN111428557A (en) | Method and device for automatically checking handwritten signature based on neural network model | |
WO2020097834A1 (en) | Feature processing method and apparatus, storage medium and program product | |
CN109784359A (en) | Image generating method, device, equipment and readable storage medium storing program for executing | |
CN109063555B (en) | Multi-pose face recognition method based on low-rank decomposition and sparse representation residual error comparison | |
CN114494701A (en) | Semantic segmentation method and device based on graph structure neural network | |
CN115080749A (en) | Weak supervision text classification method, system and device based on self-supervision training | |
CN114898167A (en) | Multi-view subspace clustering method and system based on inter-view difference detection | |
CN113569855A (en) | Tongue picture segmentation method, equipment and storage medium | |
Yan et al. | A parameter-free framework for general supervised subspace learning | |
CN116206166B (en) | Data dimension reduction method, device and medium based on kernel projection learning | |
CN111667495A (en) | Image scene analysis method and device | |
JP5197492B2 (en) | Semi-teacher image recognition / retrieval device, semi-teacher image recognition / retrieval method, and program | |
CN115170418B (en) | Low-rank high-dimensional image filling model conforming to degradation and filling method and system thereof | |
CN116975347A (en) | Image generation model training method and related device | |
CN116797681A (en) | Text-to-image generation method and system for progressive multi-granularity semantic information fusion | |
CN107403145B (en) | Image feature point positioning method and device | |
CN112950654B (en) | Brain tumor image segmentation method based on multi-core learning and super-pixel nuclear low-rank representation | |
CN113962846A (en) | Image alignment method and device, computer readable storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||