CN111582321A - Tensor subspace learning algorithm based on HSIC maximization - Google Patents

Tensor subspace learning algorithm based on HSIC maximization

Info

Publication number
CN111582321A
CN111582321A (application CN202010303130.8A)
Authority
CN
China
Prior art keywords
data
tensor
rkhs
hsic
maximization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010303130.8A
Other languages
Chinese (zh)
Inventor
马争鸣
陈李创凯
甘伟超
冯伟佳
刘洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University filed Critical National Sun Yat Sen University
Priority to CN202010303130.8A
Publication of CN111582321A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for studying the dimension-reduction problem of multi-dimensional data. The invention uses a tensor to represent a multi-dimensional data set, where the first dimensions of the tensor represent the dimensions of the multi-dimensional data and the last dimension represents the number of data items contained in the data set. Because the mode product of a tensor and a matrix can change the size of a given dimension of the tensor, a tensor-data dimension-reduction model based on the tensor mode product is proposed, in which the matrix of the mode product is selectable and can be determined according to different criteria. The invention determines the matrix of the mode product according to the criterion of maximizing the HSIC between the two tensors before and after dimension reduction. The HSIC maps the two data sets onto two reproducing kernel Hilbert spaces (RKHS) and then measures the statistical dependence of the two mapped data sets using the Hilbert-Schmidt (HS) operator between the two RKHS. The advantage of the present invention is that the RKHS is selectable, and one can choose the RKHS with the best dimension-reduction effect for a given data set.

Description

Tensor subspace learning algorithm based on HSIC maximization
Technical Field
The invention belongs to the field of machine learning and relates to a new subspace-learning algorithm in manifold learning based on HSIC maximization, applied to the dimension reduction of tensor data. The original data and the dimension-reduced data are respectively mapped into two different reproducing kernel Hilbert spaces (RKHS), and the statistical dependence between the two data sets is measured with the Hilbert-Schmidt (HS) operator between the two RKHS; this determines the dimension-reduced subspace, whose data set retains the geometric structure of the original data set as far as possible. The dimension-reduction method thereby addresses the curse of dimensionality in machine learning.
Background
With the advent of the big-data era, problems related to the curse of dimensionality have become more and more serious, so subspace-learning algorithms are gaining increasing importance. Subspace learning is one family of dimension-reduction algorithms: it realizes the mapping from high-dimensional features to a low-dimensional space through projection, a classical dimension-reduction idea. In pattern recognition, most common dimension-reduction (projection) algorithms can be formulated as subspace learning, for example PCA, LDA, LPP and LLE. The main questions in subspace learning are how to compress features from a high-dimensional space to a low-dimensional space, what information needs to be retained, what criterion to adopt, and what properties the low-dimensional features should have.
The HSIC criterion measures the statistical dependence between data sets with the HS operator between two RKHS and maximizes this dependence, thereby determining the orthonormal basis of the reduced subspace. The most critical problem when mapping the original data set to an RKHS, however, is how to preserve the geometric structure of the original data; since the kernel determines the RKHS, the selection of the kernel is also an important issue.
Mathematically, a function that is symmetric, square-integrable and positive definite is called a kernel function. According to the Moore-Aronszajn theorem, given a kernel function k(x, y) there is exactly one Hilbert space H such that H is a reproducing kernel Hilbert space and k(x, y) is its reproducing kernel; hence, as soon as a kernel function is defined, an RKHS and its reproducing kernel are defined. Because the dimensionality of tensor data is high, the curse of dimensionality arises when tensor data are processed in machine-learning algorithms, so the data need dimension-reduction processing. Dimension reduction is the main application of manifold learning, and from the dimension-reduction point of view most manifold-learning algorithms are local-feature-preserving algorithms. This is likely due to the mathematical nature of manifolds: in mathematics, a manifold is defined as a space that is locally homeomorphic to Euclidean space. In recent years, manifold-learning-based local and global feature-preserving algorithms have been widely used. In many such algorithms, the (linear or nonlinear) global relationship between the high-dimensional and low-dimensional data is first assumed and then substituted into the manifold-learning objective function to determine that relationship.
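For illustration only (this sketch is not part of the claimed method; the function name, kernel choice and parameters are assumptions of this description), the following fragment builds a Gaussian kernel matrix for a small data set, with samples stored as columns, and numerically checks the symmetry and positive semi-definiteness that the kernel definition above requires:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    X holds one sample per column (D x N), matching the convention
    used elsewhere in this description.
    """
    # Pairwise squared Euclidean distances between the columns of X.
    sq_norms = np.sum(X ** 2, axis=0)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X.T @ X
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.randn(5, 20)                       # 20 samples in R^5
K = gaussian_kernel_matrix(X, sigma=2.0)

print(np.allclose(K, K.T))                       # symmetry
print(np.min(np.linalg.eigvalsh(K)) >= -1e-10)   # positive semi-definite up to rounding
```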
In contrast, subspace learning and tensor data dimensionality reduction are combined, the original tensor data are considered to be elements of a high-dimensional data space, and the data subjected to dimensionality reduction are considered to be elements of a learned low-dimensional subspace. Mapping the high-dimensional data and the low-dimensional data into two different RKHS, and measuring the statistical dependence of the two different RKHS by using an HS operator between the two different RKHS to make the dependence maximum. That is, the subspace is determined by finding the orthonormal basis of the target subspace using the criterion of HSIC maximization. By doing so, the data after dimensionality reduction can keep the geometric structure of the original data as much as possible, and a better dimensionality reduction effect is achieved.
Disclosure of Invention
In existing machine learning, much of the input data is non-Euclidean and high-dimensional, and linear operations cannot be carried out on it directly; the input data are therefore mapped onto an RKHS with a kernel function, and linear operations on the RKHS then handle many machine-learning problems well. The curse of dimensionality is addressed by performing dimension reduction on the RKHS. The kernel functions used in machine-learning algorithms are essentially fixed, so the RKHS mapped to by a given kernel function is also fixed, and different kernel functions generate different RKHS. Each RKHS corresponds to applications in different fields, which generalizes the application of manifold-learning dimension-reduction algorithms across fields. The invention therefore proposes a subspace-learning framework. Most manifold-learning objective functions can be simplified to the following form:
min_Y tr(Y L Y^T)
where tr(·) is the trace of a matrix, Y is the data after dimension reduction, and L is a symmetric positive semi-definite matrix derived from the high-dimensional data according to the particular manifold-learning algorithm. The high-dimensional data X and the low-dimensional data Y are assumed to be linearly related, that is, Y = W^T X, and the linear transformation matrix W is determined by the following manifold-learning objective function:
min_W tr(W^T X L X^T W)
the algorithm represented by this formula is LPP or a variant of LPP, theoretically tr (YLY)T) It can be said to be the objective function of any manifold learning algorithm.
The framework for dimension reduction of tensor data based on subspace learning is as follows. For a high-dimensional data set 𝒳 ∈ R^{L_1 × L_2 × ⋯ × L_N}, a subspace span(W) is found according to a certain criterion, and the coordinates of 𝒳 projected onto span(W) — also called the Fourier coefficients of 𝒳 — are obtained. Here span(W) is the space spanned by the column vectors of W, with W_n ∈ R^{L_n × J_n} and J_n ≪ L_n, n = 1, 2, …, N-1. For tensor data, the projection is expressed with the tensor mode product:

𝒴 = 𝒳 ×_1 W_1^T ×_2 W_2^T ⋯ ×_{N-1} W_{N-1}^T

Unfolding the tensor along mode N gives the matrix form:

Y_(N) = X_(N) (W_{N-1} ⊗ W_{N-2} ⊗ ⋯ ⊗ W_1)
for subspace learning, the orthonormal basis for the subspace should be determined according to a certain criterion, wherein,
Figure BDA0002454758300000027
a common criterion is the minimum distance criterion, namely: raw data
Figure BDA00024547583000000212
The distance from the projected data is minimal as follows:
Figure BDA0002454758300000028
this algorithm is now the so-called PCA algorithm. However, the algorithm proposed by the present invention is based on the criterion of HSIC maximization to determine the orthonormal basis of the subspace.
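To make the mode-product model concrete, the sketch below reduces every data mode of a toy tensor data set with mode products; the projection matrices are random orthonormal stand-ins for matrices that would be chosen by a criterion such as HSIC maximization, and the helper name is an assumption of this illustration:

```python
import numpy as np

def mode_product(T, M, mode):
    """Mode-`mode` product T x_mode M^T: contracts dimension `mode` of T
    with the rows of M, so size L_mode becomes M.shape[1]."""
    out = np.tensordot(T, M, axes=([mode], [0]))   # new axis ends up last
    return np.moveaxis(out, -1, mode)

# Data set: 40 images of size 32 x 24, stored as a 32 x 24 x 40 tensor
# (the last dimension indexes the samples, as in the description above).
X = np.random.randn(32, 24, 40)

W1 = np.linalg.qr(np.random.randn(32, 5))[0]   # orthonormal L_1 x J_1
W2 = np.linalg.qr(np.random.randn(24, 4))[0]   # orthonormal L_2 x J_2

Y = mode_product(mode_product(X, W1, 0), W2, 1)
print(Y.shape)   # (5, 4, 40): every sample reduced, the sample mode untouched
```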
The invention has the characteristics and significance that:
(1) The dimension-reduction problem of multi-dimensional data is studied here. Unlike most applications, which use a tensor to represent a single multi-dimensional datum, the first contribution here is to represent a multi-dimensional data set with one tensor, where the first dimensions of the tensor represent the dimensions of the multi-dimensional data and the last dimension represents the number of data items contained in the data set.
(2) The tensor-matrix mode product can change the size of a given dimension of a tensor. The second contribution of the invention is therefore a tensor-data dimension-reduction model based on the mode product, in which the matrix of the mode product is selectable and can be determined according to different criteria.
(3) The matrix of the mode product is determined by the criterion of maximizing the HSIC between the two tensors before and after dimension reduction. The HSIC maps the two data sets onto two reproducing kernel Hilbert spaces (RKHS) and then measures the statistical dependence of the two mapped data sets using the HS operator between the two RKHS.
Drawings
FIG. 1: a tensor subspace learning algorithm flow chart based on HSIC maximization.
Detailed Description
A tensor subspace learning algorithm based on HSIC maximization comprises the following specific contents:
In this invention, maximization of HSIC(X, Y) is used as the criterion for dimension reduction, namely:

max_Y HSIC(X, Y) = max_Y (1/(N-1)^2) tr(K_X C_N K_Y C_N)

where K_X and K_Y are the kernel matrices of X and Y, and C_N = I_N − (1/N) 1_N 1_N^T is the centering matrix.
In other words, the aim is to find a data set Y in a low-dimensional Euclidean space R^d that is as statistically dependent as possible, in the HSIC sense, on the data set X in the high-dimensional Euclidean space R^D; Y can then be regarded as the result of reducing the dimension of X. For ease of description, in the remainder of this document the algorithm presented here is referred to as HSIC. Compared with other dimension-reduction algorithms that impose a linear relationship (such as PCA, where Y = W^T X and W is a linear transformation matrix), the HSIC algorithm better respects the nature of the data itself.
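For reference, a minimal sketch of the empirical HSIC between two data sets follows, using the biased estimator tr(K_X C_N K_Y C_N)/(N-1)^2 with Gaussian kernels; the kernel choice, bandwidths and function names are assumptions of this illustration rather than prescriptions of the invention:

```python
import numpy as np

def gaussian_gram(Z, sigma=1.0):
    """Gram matrix for samples stored as columns of Z."""
    sq = np.sum(Z ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC between data sets X (D x N) and Y (d x N)."""
    N = X.shape[1]
    C = np.eye(N) - np.ones((N, N)) / N        # centering matrix C_N
    Kx = gaussian_gram(X, sigma_x)
    Ky = gaussian_gram(Y, sigma_y)
    return np.trace(Kx @ C @ Ky @ C) / (N - 1) ** 2

X = np.random.randn(10, 50)
Y_dependent = X[:3] + 0.1 * np.random.randn(3, 50)   # a function of X plus noise
Y_independent = np.random.randn(3, 50)

print(hsic(X, Y_dependent))     # typically noticeably larger than ...
print(hsic(X, Y_independent))   # ... the value for independent data
```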
In HSIC(X, Y), the dimension-reduction result Y is hidden inside the kernel matrix K_Y, which is inconvenient when solving the HSIC maximization problem above. To express Y explicitly, the kernel function of Y in HSIC is defined as k_Y : R^d × R^d → R such that, for any y', y'' ∈ R^d,
k_Y(y', y'') = y'^T y'' + κ k(y', y'')

where κ > 0 and the term κ k(y', y'') is added to guarantee, in theory, that k_Y is positive definite.
Clearly, the formula k_Y(y', y'') = y'^T y'' + κ k(y', y'') shows that k_Y is a kernel function, so the kernel matrix K_Y can be expressed as

K_Y = Y^T Y + κ K

where K is the kernel matrix generated by k.
Substituting this expression into the HSIC objective max_Y tr(K_X C_N K_Y C_N) yields:

max_Y [ tr(C_N K_X C_N Y^T Y) + κ tr(C_N K_X C_N K) ]
Since the term κ tr(C_N K_X C_N K) does not depend on Y, and N and κ are constants, the problem above is equivalent to the following:

max_Y tr(Y C_N K_X C_N Y^T)
Because Y = W^T X and X is the known original data set, choosing Y is equivalent to choosing W, so the above is equivalent to the following problem:

max_W tr(W^T X C_N K_X C_N X^T W)
Clearly, the above formula is simple to understand and to use, and the problem it presents is easy to solve. In fact, because the kernel matrix K_X is a symmetric positive definite matrix, the solution can be converted into computing the maximum of a Rayleigh quotient under the orthogonality condition W^T W = I_d. The Rayleigh-quotient problem is a standard matrix computation, and many existing routines are available for it.
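A minimal sketch of this last step in the linear case is given below, assuming a Gaussian kernel for K_X and illustrative function names: because M = X C_N K_X C_N X^T is symmetric, the trace maximization under W^T W = I_d is solved by the eigenvectors of M belonging to its d largest eigenvalues.

```python
import numpy as np

def hsic_max_linear_subspace(X, Kx, d):
    """max_W tr(W^T X C Kx C X^T W)  subject to  W^T W = I_d.

    X: original data, one sample per column (D x N).
    Kx: N x N kernel matrix of X.  Returns W (D x d) and Y = W^T X.
    """
    N = X.shape[1]
    C = np.eye(N) - np.ones((N, N)) / N          # centering matrix C_N
    M = X @ C @ Kx @ C @ X.T                     # symmetric D x D matrix
    vals, vecs = np.linalg.eigh(M)               # ascending eigenvalues
    W = vecs[:, -d:]                             # top-d eigenvectors
    return W, W.T @ X

# Usage with a Gaussian kernel on a toy data set.
X = np.random.randn(20, 100)
sq = np.sum(X ** 2, axis=0)
Kx = np.exp(-(sq[:, None] + sq[None, :] - 2 * X.T @ X) / 2.0)
W, Y = hsic_max_linear_subspace(X, Kx, d=5)
```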
The application of the above-described framework of subspace learning to high-dimensional tensor data is specifically as follows:
First, given tensor data 𝒳 ∈ R^{L_1 × ⋯ × L_{N-1} × L_N}, the aim is to convert it into low-dimensional tensor data 𝒴 ∈ R^{J_1 × ⋯ × J_{N-1} × L_N} such that the statistical dependence between 𝒳 and 𝒴 is maximal, where J_n ≪ L_n, n = 1, …, N-1. We then set

𝒴 = 𝒳 ×_1 A_1^T ×_2 A_2^T ⋯ ×_{N-1} A_{N-1}^T

where A_n ∈ R^{L_n × J_n}, n = 1, …, N-1, and 𝒴 contains the coordinates on the subspace. The task of dimension reduction is thus to find the matrices A_n, n = 1, …, N-1, and the HSIC maximization criterion is used to determine this subspace.
By mode-N unfolding, the tensor model

𝒴 = 𝒳 ×_1 A_1^T ×_2 A_2^T ⋯ ×_{N-1} A_{N-1}^T

can be converted into the matrix form

Y = W^T X

where X = X_(N)^T, Y = Y_(N)^T and W = A_{N-1} ⊗ A_{N-2} ⊗ ⋯ ⊗ A_1. Finally, substituting Y = W^T X into the HSIC maximization problem gives the final objective function:

max_W tr(W^T X C_N K_X C_N X^T W)

where C_N = I_N − (1/N) 1_N 1_N^T, K_X is the kernel matrix of X, and k_X(·, ·) is a kernel function.
Solving the problem revealed by this equation is also simple. In fact, since the matrix X C_N K_X C_N X^T is symmetric positive definite, the problem can be converted into computing the maximum of a Rayleigh quotient under the orthogonality condition W^T W = I. The objective function of the invention is therefore:

max_W tr(W^T X C_N K_X C_N X^T W),  subject to  W^T W = I

Applying the Lagrange multiplier method gives the eigenvalue problem

X C_N K_X C_N X^T W = W Λ

It is therefore only necessary to perform an eigenvalue decomposition of the matrix X C_N K_X C_N X^T and take the eigenvectors of the leading eigenvalues to form W, which completes the learning of the subspace.
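For concreteness, the sketch below applies this eigendecomposition step to tensor data after unfolding each sample into a vector; it assumes a Gaussian kernel and treats W as an unstructured projection, so the Kronecker structure W = A_{N-1} ⊗ ⋯ ⊗ A_1 described above is not enforced and the recovery of the individual factors A_n (for example by alternating optimization) is not shown. All names are illustrative:

```python
import numpy as np

def hsic_max_tensor_subspace(T, d, sigma=1.0):
    """Sketch of the final step for tensor data.

    T: data tensor whose LAST axis indexes the samples (L_1 x ... x L_{N-1} x L_N).
    Each sample is unfolded into a column of X, a Gaussian kernel matrix K_X is
    built on the samples, and W is taken from the top-d eigenvectors of
    X C K_X C X^T, as in the derivation above.
    """
    N = T.shape[-1]
    X = T.reshape(-1, N)                             # samples as columns
    sq = np.sum(X ** 2, axis=0)
    Kx = np.exp(-(sq[:, None] + sq[None, :] - 2 * X.T @ X) / (2 * sigma ** 2))
    C = np.eye(N) - np.ones((N, N)) / N
    M = X @ C @ Kx @ C @ X.T
    vals, vecs = np.linalg.eigh(M)                   # ascending eigenvalues
    W = vecs[:, -d:]                                 # leading eigenvectors form W
    return W, W.T @ X                                # reduced coordinates per sample

# Toy usage: 60 samples of 16 x 12 data.
T = np.random.randn(16, 12, 60)
W, Y = hsic_max_tensor_subspace(T, d=8)
print(W.shape, Y.shape)    # (192, 8) (8, 60)
```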

Claims (1)

1. A tensor subspace learning algorithm based on HSIC maximization is characterized in that:
A. a subspace-learning framework based on HSIC maximization is proposed; the HSIC criterion measures the statistical dependence between data sets in two reproducing kernel Hilbert spaces (RKHS), and this dependence reflects the geometric structure of the manifold through the information in the available data samples, so that the kernel mapping preserves the original geometric structure of the manifold;
B. applying the subspace learning framework to tensor data;
C. considering that tensor data have high dimensionality and suffer from the curse of dimensionality, processing the tensor data directly is complicated; the tensor data are therefore mapped to an RKHS and dimension-reduction processing is carried out there;
D. a multi-dimensional data set is expressed with a tensor, and dimension reduction is carried out with a tensor-data dimension-reduction model based on the tensor mode product;
E. the dimension-reduced data set uses the inner-product kernel as its reproducing kernel, which generates its RKHS;
F. the reproducing kernel of the original data set is selectable, with different reproducing kernels producing different RKHS;
G. the statistical dependence between the two data sets is measured with the HS operator between the two RKHS;
H. the matrix of the mode product is determined by the HSIC maximization criterion, which in turn determines the orthonormal basis of the subspace;
I. the algorithm of this patent is suitable for data sets in machine-learning applications such as face recognition and object classification.
CN202010303130.8A 2020-04-17 2020-04-17 Tensor subspace learning algorithm based on HSIC maximization Pending CN111582321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010303130.8A CN111582321A (en) 2020-04-17 2020-04-17 Tensor subspace learning algorithm based on HSIC maximization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010303130.8A CN111582321A (en) 2020-04-17 2020-04-17 Tensor subspace learning algorithm based on HSIC maximization

Publications (1)

Publication Number Publication Date
CN111582321A 2020-08-25

Family

ID=72117517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010303130.8A Pending CN111582321A (en) 2020-04-17 2020-04-17 Tensor subspace learning algorithm based on HSIC maximization

Country Status (1)

Country Link
CN (1) CN111582321A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287811A (en) * 2020-10-27 2021-01-29 广州番禺职业技术学院 Domain self-adaption method based on HSIC and RKHS subspace learning


Similar Documents

Publication Publication Date Title
Liu et al. Multiview dimension reduction via Hessian multiset canonical correlations
Shang A survey of functional principal component analysis
Guo et al. A feature fusion based forecasting model for financial time series
Zhang et al. Robust non-negative matrix factorization
Sirimongkolkasem et al. On regularisation methods for analysis of high dimensional data
Gallant et al. The relative efficiency of method of moments estimators
Guo et al. A stock market forecasting model combining two-directional two-dimensional principal component analysis and radial basis function neural network
Tenreiro Machado et al. Analysis of financial data series using fractional Fourier transform and multidimensional scaling
Xiao et al. Primal and dual alternating direction algorithms for ℓ 1-ℓ 1-norm minimization problems in compressive sensing
US20150074130A1 (en) Method and system for reducing data dimensionality
CN110334546B (en) Difference privacy high-dimensional data release protection method based on principal component analysis optimization
Yu et al. A classification scheme for ‘high-dimensional-small-sample-size’data using soda and ridge-SVM with microwave measurement applications
Yin et al. Nonnegative matrix factorization with bounded total variational regularization for face recognition
Lisboa et al. Cluster-based visualisation with scatter matrices
CN111582321A (en) Tensor subspace learning algorithm based on HSIC maximization
Fan et al. An efficient KPCA algorithm based on feature correlation evaluation
Li Estimation of large dynamic covariance matrices: A selective review
Mehrali et al. A Jensen–Gini measure of divergence with application in parameter estimation
Batalo et al. Temporal-stochastic tensor features for action recognition
Su et al. Regularized denoising latent subspace based linear regression for image classification
Lapko et al. Analysis of the ratio of the standard deviations of the kernel estimate of the probability density with independent and dependent random variables
Liu et al. Grey incidence analysis models
CN109165679B (en) Data processing method and device
Luo et al. Frequency Information Matters for Image Matting
Bannour Lahaw et al. A new greedy sparse recovery algorithm for fast solving sparse representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200825