CN110619348A

CN110619348A - Feature learning algorithm based on personalized discrimination

Info

Publication number: CN110619348A
Application number: CN201910724615.1A
Authority: CN
Inventors: 郭艳蓉; 郝世杰; 汪萌; 洪日昌; 陈涛
Original assignee: Hefei Polytechnic University
Current assignee: Hefei University of Technology; Hefei Polytechnic University
Priority date: 2019-08-07
Filing date: 2019-08-07
Publication date: 2019-12-27

Abstract

The invention discloses a feature learning algorithm based on personalized discrimination, which comprises an algorithm for learning weights for different classes of samples. The algorithm is implemented in the following steps: updating the global weight by determining the terms related to the global weight; updating the personalized weight by determining the terms related to the personalized weight; and updating the similarity matrix by converting the objective function with the variables W_G and W_P held fixed. By simultaneously exploring features compatible with all samples and features specific to each sample topic, the method mines the globally shared discriminative features and the personalized heterogeneous features of the samples. In addition, the method introduces an adaptive spectral clustering algorithm to mine the manifold structure of the data. The method can obtain better prediction performance.

Description

Feature learning algorithm based on personalized discrimination

Technical Field

The invention relates to the field of algorithms, and in particular to a feature learning algorithm based on personalized discrimination.

Background

In recent years, large amounts of high-dimensional data have appeared in applications such as data mining, machine learning, computer vision, and natural language processing. Such data not only incur enormous computational and storage costs but can also greatly degrade model performance. Feature selection is an efficient preprocessing technique for handling high-dimensional data: it selects relevant features directly from the original feature space and learns a more compact, representative feature subset. Feature selection thus yields a more compact and more interpretable model, which further improves learning performance.

Supervised algorithms are mainly used for classification and regression problems and aim to identify discriminative features for predicting samples. In feature selection based on sparse learning, all samples share the same global pattern, and this approach has achieved some success in classification accuracy. In practical applications, however, samples are highly personalized: a global model inevitably ignores the personalized features of the samples, so the personalized features among samples are not fully mined and the heterogeneity of the samples is not explored.

In existing algorithms, the global pattern can only identify globally discriminative features and cannot capture the heterogeneity and individuality of the samples, so the personalized features among samples are ignored and the performance of the supervised algorithm suffers.

Disclosure of Invention

The invention aims to provide a feature learning algorithm based on personalized discrimination that not only establishes a global pattern for all samples but also explores personalized patterns among samples, so as to overcome the following defect of the existing algorithms described in the background: the global pattern can only identify globally discriminative features and cannot capture the heterogeneity and individuality of the samples, so the personalized features among samples are ignored.

To achieve this purpose, the invention provides the following technical solution: a feature learning algorithm based on personalized discrimination, which finds the personalized features corresponding to each class of samples and the global features of all samples, thereby improving the prediction performance of the model. The method first clusters the samples of each class and regards each clustering result as a topic. It then simultaneously explores features compatible with all samples and features specific to each sample topic, so as to mine the globally shared discriminative features and the personalized heterogeneous features of the samples.

Preferably, the specific method of the algorithm for learning the weights of the samples of different classes includes:

Begin:

updating the global weight;

updating the personalized weight;

updating the similarity matrix;

End.

Preferably, the algorithm for learning the weights of the samples of different classes is implemented as follows:

Begin:

updating the global weight by determining the terms related to the global weight;

updating the personalized weight by determining the terms related to the personalized weight;

updating the similarity matrix by converting the objective function with the variables W_G and W_P held fixed;

End.

Advantageous effects:

By simultaneously exploring features compatible with all samples and features specific to each sample topic, the invention mines the globally shared discriminative features and the personalized heterogeneous features of the samples. In addition, the method introduces an adaptive spectral clustering algorithm to mine the manifold structure of the data. The method can obtain better prediction performance.

Drawings

FIG. 1 is a schematic structural view of the present invention;

FIG. 2 is a diagram of a method of practicing the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

As shown in FIGS. 1-2, the present invention provides a technical solution: a feature learning algorithm based on personalized discrimination, which finds the personalized features corresponding to each class of samples and the global features of all samples, thereby improving the prediction performance of the model. The samples of each class are first clustered, and each clustering result is regarded as a topic. Features compatible with all samples and features specific to each sample topic are then explored simultaneously, so as to mine the globally shared discriminative features and the personalized heterogeneous features of the samples.

The specific method of the algorithm for learning the weights of the samples of different classes comprises the following steps:

Begin:

updating the global weight (101);

updating the personalized weight (102);

updating the similarity matrix (103);

End.

The algorithm for learning the weights of the samples of different classes is implemented in the following steps:

Begin:

updating the global weight by determining the terms related to the global weight (S101);

updating the personalized weight by determining the terms related to the personalized weight (S102);

updating the similarity matrix by converting the objective function with the variables W_G and W_P held fixed (S103);

End.

The overall objective function is:

where $X = [x_1, \ldots, x_i, \ldots, x_n] \in \mathbb{R}^{d \times n}$, and each sample is described by a $d$-dimensional feature vector. Each class is first clustered (into $k$ clusters), with $m$ samples in each personalized cluster. $W_G$ is the global weight, and $W_P$ is the personalized weight learned for each cluster. A loss function is first used to model the objective, and norm constraints on the weight matrices, including $\|W_G\|_{2,1}$, are added to avoid over-fitting and to obtain sparsity among the features.
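The sparsity effect of the $\|W_G\|_{2,1}$ constraint can be illustrated with a minimal NumPy sketch. It assumes the usual definition of the norm as the sum of the Euclidean norms of the rows of the weight matrix, with one row per feature; the function name `l21_norm` and the two example matrices are illustrative and do not come from the patent.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per feature)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

# Two weight matrices with the same Frobenius norm (5.0):
# W_sparse puts all of its mass on a single feature row,
# W_dense spreads the same mass evenly over every row.
W_sparse = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])
W_dense = np.full((3, 2), 5.0 / np.sqrt(6.0))

print(l21_norm(W_sparse))
print(l21_norm(W_dense))
```

The row-sparse matrix has the smaller norm, so penalizing this norm favors solutions in which whole feature rows vanish, which is what makes it usable for feature selection.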

Meanwhile, a spectral clustering algorithm is introduced into the objective function to explore the data structure. Because the data contain many useless and noisy features, the Laplacian matrix in the objective function cannot accurately reflect the relationships between samples. Adaptive manifold structure learning is therefore also introduced: the original data are mapped to another space through the personalized weight matrix, and the similarity matrix is constrained, so that the accuracy of the data structure is preserved as far as possible.

The optimization process of the algorithm is as follows:

Preferably, the specific method of the algorithm for learning the weights of the samples of different classes includes:

(1) Updating the global weight $W_G$

The terms related to the global weight $W_G$ are determined:

Taking the partial derivative with respect to $W_G$ gives:

where $U$ is a diagonal matrix. Setting the partial derivative to 0 gives:

$W_G = -(XX^T + \alpha U)^{-1} X (Z W_P - Y)$
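The closed-form update above can be sketched in NumPy as follows. The sketch rests on assumptions that the text does not state explicitly: the terms related to $W_G$ are taken to be $\|X^T W_G + Z W_P - Y\|_F^2 + \alpha \|W_G\|_{2,1}$, which reproduces the closed form when differentiated, and the diagonal of $U$ is taken to be $1 / (2\|w_G^i\|_2)$ for row $i$ of $W_G$, the usual reweighting for this norm. The function name, the matrix shapes, and the `eps` safeguard are hypothetical.

```python
import numpy as np

def update_global_weight(X, Y, Z, W_P, W_G, alpha, eps=1e-8):
    """One update W_G = -(X X^T + alpha U)^{-1} X (Z W_P - Y).

    Assumed shapes: X (d, n), Y (n, c), Z (n, kd), W_P (kd, c), W_G (d, c).
    U is assumed diagonal with U_ii = 1 / (2 * ||row i of W_G||_2).
    """
    row_norms = np.linalg.norm(W_G, axis=1)
    # eps guards against division by zero for rows that are already zero.
    U = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    A = X @ X.T + alpha * U
    B = X @ (Z @ W_P - Y)
    # Solve the linear system rather than forming an explicit inverse.
    return -np.linalg.solve(A, B)

rng = np.random.default_rng(0)
d, n, c, kd = 4, 20, 2, 8
X = rng.normal(size=(d, n))
Y = rng.normal(size=(n, c))
Z = rng.normal(size=(n, kd))
W_P = np.zeros((kd, c))
W_G = np.ones((d, c))
W_G_new = update_global_weight(X, Y, Z, W_P, W_G, alpha=0.1)
print(W_G_new.shape)
```

Because $U$ depends on the current $W_G$, such an update is normally repeated, recomputing $U$ from the latest weights on each pass.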

(2) Updating the personalized weight $W_P$

The terms related to the personalized weight are determined:

$L$ is a Laplacian matrix, learned from the similarity between samples:

$L = D - G$, where $G$ is the similarity matrix and $D$ is the diagonal matrix whose diagonal elements are the row sums of $G$.
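The construction of the Laplacian matrix from the similarity matrix can be sketched directly from this definition. The function name `graph_laplacian` and the small example matrix are illustrative only.

```python
import numpy as np

def graph_laplacian(G):
    """Return L = D - G, where D is diagonal and holds the row sums of G."""
    D = np.diag(G.sum(axis=1))
    return D - G

# A small symmetric similarity matrix over three samples.
G = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
L = graph_laplacian(G)
print(L)
print(L.sum(axis=1))  # rows of a Laplacian built this way sum to zero
```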

Taking the partial derivative with respect to the personalized weight gives:

where $F \in \mathbb{R}^{kd \times kd}$ is a diagonal matrix whose diagonal elements are defined as:

$I_{mj}$ is an indicator function, set to 1 when the membership condition holds and to 0 otherwise. A closed-form solution of the personalized weight then follows:

$W_P = -(Z^T Z + \beta F + Z^T L Z)^{-1} Z^T (X^T W_G - Y)$
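The personalized-weight update can be sketched in the same way. The text does not define $Z$, so the sketch assumes a block layout in which row $i$ of $Z$ holds sample $x_i$ in the block of its cluster and zeros elsewhere; this gives $Z$ the shape $n \times kd$ required by $F \in \mathbb{R}^{kd \times kd}$. The helper `build_cluster_design`, the placeholder identity matrix used for $F$, and the all-ones similarity matrix are hypothetical stand-ins, not the patent's definitions.

```python
import numpy as np

def build_cluster_design(X, cluster_ids, k):
    """Hypothetical Z: row i holds x_i in the block of its cluster, zeros elsewhere."""
    d, n = X.shape
    Z = np.zeros((n, k * d))
    for i, cid in enumerate(cluster_ids):
        Z[i, cid * d:(cid + 1) * d] = X[:, i]
    return Z

def update_personalized_weight(X, Y, Z, W_G, L, F, beta):
    """W_P = -(Z^T Z + beta F + Z^T L Z)^{-1} Z^T (X^T W_G - Y)."""
    A = Z.T @ Z + beta * F + Z.T @ L @ Z
    B = Z.T @ (X.T @ W_G - Y)
    # Solve the linear system rather than forming an explicit inverse.
    return -np.linalg.solve(A, B)

rng = np.random.default_rng(1)
d, n, c, k = 3, 12, 2, 2
X = rng.normal(size=(d, n))
Y = rng.normal(size=(n, c))
W_G = rng.normal(size=(d, c))
cluster_ids = np.arange(n) % k           # toy cluster assignment
Z = build_cluster_design(X, cluster_ids, k)
G = np.ones((n, n)) - np.eye(n)          # placeholder similarity matrix
L = np.diag(G.sum(axis=1)) - G           # Laplacian L = D - G
F = np.eye(k * d)                        # placeholder for the diagonal matrix F
W_P = update_personalized_weight(X, Y, Z, W_G, L, F, beta=0.5)
print(W_P.shape)
```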

(3) Updating the similarity matrix $G$

With the variables $W_G$ and $W_P$ fixed, the objective function can be converted to the following:

where $\mathbf{1}$ denotes a vector whose elements are all 1. Further, the Lagrangian function is formed:

where $\tau$ and $\eta$ are Lagrange multipliers. Based on the Karush-Kuhn-Tucker (KKT) conditions, a closed-form solution of the similarity matrix is obtained:
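The closed-form expression itself is not reproduced in the text, so the following is only a generic sketch and not the patent's formula. It assumes that each row of the similarity matrix is constrained to be non-negative and to sum to one, which is consistent with the all-ones vector and the two multipliers mentioned above. Under that assumption the KKT conditions lead to a thresholded solution that coincides with a Euclidean projection onto the probability simplex. The function names, the parameter `gamma`, and the use of squared distances between samples in the mapped space `E` are all assumptions.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {g : g >= 0, sum(g) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    # Largest index whose sorted entry stays positive after thresholding.
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def update_similarity(E, gamma=1.0):
    """Row-wise similarity from mapped samples E (n, c); gamma is hypothetical."""
    sq = np.sum(E ** 2, axis=1)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * E @ E.T, 0.0)
    n = E.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i       # exclude self-similarity
        G[i, others] = project_to_simplex(-dist[i, others] / (2.0 * gamma))
    return G

rng = np.random.default_rng(2)
E = rng.normal(size=(6, 2))              # stand-in for samples mapped by W_P
G = update_similarity(E)
print(G.sum(axis=1))
```

In this form, nearby samples in the mapped space receive larger similarity and distant ones are thresholded to zero, with `gamma` controlling how many neighbors stay non-zero.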

Although embodiments of the present invention have been shown and described, those skilled in the art will appreciate that changes, modifications, substitutions, and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An algorithm for learning weights for different classes of samples, characterized by: finding the personalized features corresponding to each class of samples and the global features of all samples, thereby improving the prediction performance of the model; wherein the samples of each class are first clustered, each clustering result is regarded as a topic, and features compatible with all samples and features specific to each sample topic are explored simultaneously, so as to mine the globally shared discriminative features and the personalized heterogeneous features of the samples.

2. The feature learning algorithm based on personalized discrimination as claimed in claim 1, wherein the specific method of the algorithm for learning the weights of the samples of different classes comprises the following steps: Begin: updating the global weight (101); updating the personalized weight (102); updating the similarity matrix (103);

End.

3. The feature learning algorithm based on personalized discrimination as claimed in claim 1, wherein the algorithm for learning the weights of the samples of different classes is implemented in the following steps:

Begin: updating the global weight by determining the terms related to the global weight (S101);

updating the personalized weight by determining the terms related to the personalized weight (S102);

updating the similarity matrix by converting the objective function with the variables W_G and W_P held fixed (S103);

End.