CN115563492A - Data dimension reduction method and device based on local ratio and linear discriminant analysis - Google Patents
Classifications
- G06F17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The invention belongs to the technical field of data processing and discloses a data dimension reduction method and device based on local ratio-sum linear discriminant analysis. A current neighbor matrix and a current projection matrix are initialized from high-dimensional sample data; the current neighbor matrix is then updated according to the current projection matrix to obtain an optimized neighbor matrix; the current projection matrix is trained according to the optimized neighbor matrix to obtain a target projection matrix; finally, the optimized neighbor matrix is updated using the target projection matrix to obtain a target neighbor matrix. If the pre-constructed local ratio-sum model converges, the target projection matrix is determined to be the optimal solution and is used for data dimension reduction. By introducing neighbor weights, the method takes the local structure of the sample data into account and adapts better to real-world datasets; at the same time, the alternating mutual training between the projection matrix and the neighbor matrix reduces the influence of high-dimensional noise on the neighbor matrix, further improving the dimension reduction effect.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data dimension reduction method, device, equipment, and storage medium based on local ratio-sum linear discriminant analysis.
Background
With the development of science and technology, the sampling precision of modern sensors keeps improving, and the dimensionality of the sampled data grows accordingly. However, the discrimination performance of a classifier is not always positively correlated with the sample dimensionality: beyond a critical point, further increasing the dimensionality degrades the classifier's performance, the well-known "Hughes effect", while higher dimensionality also makes the computational cost of the classifier grow exponentially.
To solve the above problem, many scholars propose using a dimension reduction algorithm to map data points into a low-dimensional subspace and find a representation with optimal discriminative performance. Dimension reduction algorithms can be subdivided into feature selection and feature extraction. A feature selection algorithm only searches for an optimal subset of the original features; the subset is contained in the original feature set, and the original feature space is unchanged. A feature extraction algorithm instead seeks optimal projection directions through a linear transformation, thereby changing the original feature space.
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are the most popular feature extraction dimension reduction algorithms in the unsupervised and supervised fields, respectively. Unsupervised PCA aims to find a projection that retains as much variance information as possible. Supervised LDA introduces label information to find a projection space that simultaneously minimizes intra-class distances and maximizes inter-class distances. LDA was first proposed only for the binary classification problem; its later extension to multi-class tasks has brought it increasing attention.
Unfortunately, the original LDA is a trace-ratio problem (first take the traces, then take their ratio), which is difficult to solve directly in closed form. Some scholars therefore convert it into a ratio-trace problem (first take the ratio, then the trace) to solve for the projection matrix. However, such algorithms cannot account for the local structure of the data and adapt poorly to real-world datasets; moreover, the solution obtained through this conversion is suboptimal, so the optimal projection matrix cannot be obtained.
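For concreteness, the ratio-trace relaxation mentioned above admits a closed-form solution through an eigendecomposition of Sw^{-1}Sb; the following is a minimal NumPy sketch of that conventional construction, purely illustrative and not the method claimed here (the small ridge added to Sw is our assumption so the inverse exists with few samples):

```python
import numpy as np

def lda_ratio_trace(X, y, m):
    """Ratio-trace LDA: take the top-m eigenvectors of Sw^{-1} Sb.
    X: (d, n) data with samples as columns, y: (n,) integer labels,
    m: subspace dimension."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))   # between-class scatter
    Sw = np.zeros((d, d))   # within-class scatter
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T
        Sw += (Xc - mc) @ (Xc - mc).T
    # Ridge on Sw (our addition) keeps the inverse well defined.
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw + 1e-6 * np.eye(d)) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:m]].real

rng = np.random.default_rng(0)
X = np.hstack([rng.normal([0, 0, 0], 0.1, size=(50, 3)).T,
               rng.normal([1, 1, 0], 0.1, size=(50, 3)).T])
y = np.array([0] * 50 + [1] * 50)
W = lda_ratio_trace(X, y, 1)   # one discriminant direction
```

On this toy two-class data, the learned direction separates the projected class means, which is exactly the global criterion the patent argues is insufficient for data with local structure.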
At present, scholars have also tried converting the trace-ratio problem into a ratio-sum problem (first take each ratio, then sum them). For example, the prior art proposes an adaptive-neighbor local ratio-sum linear discriminant analysis algorithm that maximizes the ratio sum and solves it with a greedy algorithm; it introduces a locality concept and assigns weights through a parameter-free strategy so as to obtain a better projection matrix. In practice, however, the graph constructed by this algorithm is built from the original data, and the original space contains substantial noise, so the constructed affinity matrix is corrupted by that noise and remains suboptimal.
Disclosure of Invention
The invention aims to provide a data dimension reduction method, device, equipment, and storage medium based on local ratio-sum linear discriminant analysis that can take the local structure of the samples into account, adapt better to real-world datasets, and reduce the noise influence of the original data, thereby further improving the dimension reduction effect.
A first aspect of the invention discloses a data dimension reduction method based on local ratio-sum linear discriminant analysis, which comprises the following steps:
initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
training a current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
if the local ratio-sum model converges after the substitution, outputting the target projection matrix as an optimal solution;
and performing dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
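The claimed steps form an alternating optimization loop. The following schematic Python sketch shows only that control flow; every function parameter is a placeholder we introduce for illustration, not part of the claims:

```python
def local_ratio_sum_reduce(X, init_S, init_W, update_S, update_W, objective,
                           tol=1e-5, max_iter=100):
    """Alternately optimize the neighbor matrix S and projection matrix W
    until the local ratio-sum objective stops changing."""
    S, W = init_S(X), init_W(X)        # initialize from the sample data
    prev = float("inf")
    for _ in range(max_iter):
        S = update_S(X, W)             # update neighbor matrix, W fixed
        W = update_W(X, S)             # train projection matrix, S fixed
        S = update_S(X, W)             # refresh neighbor matrix with new W
        obj = objective(X, S, W)       # evaluate the local ratio-sum model
        if abs(obj - prev) < tol:      # converged: W is the optimal solution
            break
        prev = obj                     # otherwise carry matrices to next pass
    return W
```

The returned projection matrix is then applied to the high-dimensional samples to obtain the low-dimensional representation.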
A second aspect of the present invention discloses a data dimension reduction device based on local ratio-sum linear discriminant analysis, which includes:
the initialization unit is used for initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit is used for updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit is used for updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
the substitution unit is used for substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
the output unit is used for outputting the target projection matrix as an optimal solution when the substituted local ratio-sum model converges;
and the processing unit is used for carrying out dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
A third aspect of the invention discloses an electronic device comprising a memory storing executable program code and a processor coupled to the memory; the processor calls the executable program code stored in the memory to execute the data dimension reduction method based on local ratio-sum linear discriminant analysis disclosed in the first aspect.
A fourth aspect of the present invention discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the data dimension reduction method based on local ratio-sum linear discriminant analysis disclosed in the first aspect.
Compared with the prior art, the method, device, equipment, and storage medium for data dimension reduction based on local ratio-sum linear discriminant analysis have the following beneficial effects: a current neighbor matrix and a current projection matrix are initialized from high-dimensional sample data; the current neighbor matrix is updated according to the current projection matrix to obtain an optimized neighbor matrix; the current projection matrix is trained according to the optimized neighbor matrix to obtain a target projection matrix; and the optimized neighbor matrix is updated with the target projection matrix to obtain a target neighbor matrix. If the target projection matrix and the target neighbor matrix make the pre-constructed local ratio-sum model converge, the target projection matrix is determined to be the optimal solution, which is finally used for data dimension reduction. By introducing neighbor weights, the method takes the local structure of the sample data into account and adapts better to real-world datasets; meanwhile, through alternating mutual training between the projection matrix and the neighbor matrix, the neighbor matrix is iteratively optimized alongside the projection matrix, which reduces the noise influence of the high-dimensional data on the neighbor matrix and further improves the dimension reduction effect.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the principles and effects of the invention.
Unless otherwise specified or defined, the same reference numerals in different figures represent the same or similar technical features, and different reference numerals may be used for the same or similar technical features.
FIG. 1 is a flow chart of a data dimension reduction method based on local ratio-sum linear discriminant analysis according to an embodiment of the present invention;
FIG. 2 is a two-dimensional visualization of the prior LDA algorithm on the synthetic three-ring dataset;
FIG. 3 is a two-dimensional visualization of the prior art Decomposed Newton's Method (DNM) algorithm on the synthetic three-ring dataset;
FIG. 4 is a two-dimensional visualization of the prior art Greedy Ratio Sum (GRS) algorithm on the synthetic three-ring dataset;
FIG. 5 is a two-dimensional visualization of the prior art Local Fisher Discriminant Analysis (LFDA) algorithm on the synthetic three-ring dataset;
FIG. 6 is a two-dimensional visualization of the prior art Locality Sensitive Discriminant Analysis (LSDA) algorithm on the synthetic three-ring dataset;
FIG. 7 is a two-dimensional visualization of the prior art Dynamic Maximum Entropy Graph (DMEG) algorithm on the synthetic three-ring dataset;
FIG. 8 is a two-dimensional visualization of the prior art Adaptive Neighbor Local Ratio Sum Linear Discriminant Analysis (ANLRSLDA) algorithm on the synthetic three-ring dataset;
FIG. 9 is a two-dimensional visualization of the Local Ratio-Sum Discriminant Analysis with Adaptive Subspace Graph (LRSDAASG) algorithm proposed by the present invention on the synthetic three-ring dataset;
FIG. 10 is a schematic structural diagram of a data dimension reduction apparatus based on local ratio-sum linear discriminant analysis according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Description of the reference numerals:
101. an initialization unit; 102. a first update unit; 103. a training unit; 104. a second updating unit; 105. a substitution unit; 106. an output unit; 107. a processing unit; 108. a circulation unit; 1101. a memory; 1102. a processor.
Detailed Description
Unless specifically stated or otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Where the technical solutions of the invention are applied in a realistic scenario, the terms used herein may also carry meanings consistent with achieving the purpose of those solutions. As used herein, "first" and "second" merely distinguish names and do not denote a particular quantity or order. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Unless otherwise specified or defined, the terms "comprises," "comprising," and "including" are used interchangeably and indicate an open-ended inclusion: the recited elements are present, but additional elements are not excluded.
Technical content or features contrary to, or clearly contradicting, the object of the present invention are of course excluded. To facilitate understanding of the invention, specific embodiments are described in more detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention discloses a data dimension reduction method based on local ratio-sum linear discriminant analysis, which may be implemented by computer programming. The execution subject of the method may be an electronic device such as a computer, a notebook computer, or a tablet computer, or a data dimension reduction device embedded in such an electronic device; the invention is not limited in this respect. In this embodiment, an electronic device is taken as the example. The method comprises the following steps S10-S80:
and S10, initializing the electronic equipment according to the high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix.
In the embodiment of the invention, an initial projection matrix can be randomly generated as the current projection matrix; the current neighbor matrix can likewise be randomly generated, or it can be constructed from the high-dimensional sample data of the original space.
Constructing the current neighbor matrix from the high-dimensional sample data of the original space specifically includes: assigning neighbor weights according to the original-space high-dimensional sample data so as to initialize the neighbor matrix, thereby obtaining the current neighbor matrix.
Before executing step S10, the electronic device may further obtain a plurality of original sample data and their label information, then sort and normalize the original sample data according to the label information to obtain the high-dimensional sample data.
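This preprocessing can be sketched as follows; the patent does not specify the normalization, so the min-max scaling below is our assumption:

```python
import numpy as np

def preprocess(X, y):
    """Sort samples by label, then min-max normalize each feature to [0, 1].
    X: (n, d) raw samples, y: (n,) labels. Returns the high-dimensional
    sample matrix with rows grouped by class, plus the sorted labels."""
    order = np.argsort(y, kind="stable")   # group rows class by class
    X, y = X[order], y[order]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant features
    return (X - lo) / span, y
```

Sorting by label up front lets the later neighbor matrix take the block-diagonal form used in the derivation.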
It should be noted that, in the embodiment of the invention, the local ratio-sum model is constructed in advance. The local ratio-sum model may be a local maximized-ratio-sum model or a local minimized-ratio-sum model; the minimized form is preferred, since its search strategy is better than that of the maximized form. The derivation of the local minimized-ratio-sum model comprises the following steps S01-S02:
S01, minimized Ratio Sum algorithm
Let the high-dimensional sample data be $X=[x_{1},\dots,x_{n}]\in\mathbb{R}^{d\times n}$, where $n$ is the total number of samples and $d$ is the feature dimension, and assume the samples have been sorted by label. The low-dimensional sample data $Y\in\mathbb{R}^{m\times n}$ is a low-dimensional representation of $X$, where $m$ is the subspace dimension (i.e. there are $m$ projection directions in total). Then:

$$Y=W^{T}X \qquad (1)$$

where $W^{T}$ is the transpose of $W$, and $W\in\mathbb{R}^{d\times m}$ is the projection matrix converting the high-dimensional sample data $X$ into the low-dimensional sample data $Y$.
Some existing dimension reduction algorithms can be written as the following Trace Ratio problem:

$$\max_{W^{T}W=I}\ \frac{\operatorname{tr}\!\left(W^{T}S_{b}W\right)}{\operatorname{tr}\!\left(W^{T}S_{w}W\right)} \qquad (2)$$

If $S_{b}$ is the inter-class divergence matrix and $S_{w}$ the intra-class divergence matrix, formula (2) is the objective function of LDA; its optimization goal is to bring same-class data as close as possible and push different-class data as far apart as possible after projection. But the Trace Ratio objective may yield some poor projection directions. Therefore the Ratio Sum algorithm was proposed, with the specific form:

$$\max_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{w_{k}^{T}S_{b}w_{k}}{w_{k}^{T}S_{w}w_{k}} \qquad (3)$$

However, maximizing this problem still has drawbacks. The preferred option of the invention is therefore to invert each ratio term and solve a minimization problem instead:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{w_{k}^{T}S_{w}w_{k}}{w_{k}^{T}S_{b}w_{k}} \qquad (4)$$
S02, adaptive subspace graph
To complete the derivation, formula (4) is converted to vector form, i.e. formula (5) is obtained:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}} \qquad (5)$$

where $n$ is the total number of samples; $x_{i}$ and $x_{j}$ are the $i$-th and $j$-th samples overall; $C$ is the number of sample classes; $n_{c}$ is the total number of samples in class $c$; $x_{i}^{c}$ and $x_{j}^{c}$ are the $i$-th and $j$-th samples of class $c$; $m$ is the total number of projection directions; $W$ is the projection matrix, whose column vector $w_{k}$ is the $k$-th projection direction; $W^{T}W=I$, where $I$ is the identity matrix and the superscript $T$ is the transpose symbol.
Formula (5) means finding an optimal projection matrix $W$ such that the Euclidean distances between same-class samples are as small as possible while the distances between all sample pairs are as large as possible.
In order to let the model take the local structure into account, a weighting factor $s_{ij}^{c}$ is added to the same-class distance terms, giving the local minimized-ratio-sum model:

$$\min_{W^{T}W=I,\ S}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}} \qquad (6)$$

where $s_{ij}^{c}$ denotes the weight between the $i$-th and $j$-th samples of class $c$. When $x_{i}^{c}$ and $x_{j}^{c}$ are close, $s_{ij}^{c}$ is large, so formula (6) requires both that same-class samples stay close after projection and that the weight penalty remain as small as possible.
Further preferably, considering that formula (6) may have a trivial solution, a maximum-entropy regularization term constraint may be added, giving:

$$\min_{W^{T}W=I,\ S}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}}+\gamma\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\ln s_{ij}^{c} \qquad (7)$$

where $\gamma$ is the regularization parameter.
Based on this, executing steps S10 to S60 of the embodiment of the invention actually solves problem (7).
Problem (7) is an NP-hard problem and a polynomial-time solution cannot be obtained directly. The embodiment of the invention provides a new optimization method. First, a neighbor matrix $S$ is given; $S$ is the block-diagonal matrix formed by the per-class weight blocks $S^{c}$:

$$S=\begin{bmatrix}S^{1} & & \\ & \ddots & \\ & & S^{C}\end{bmatrix} \qquad (8)$$

In order to solve problem (7) smoothly, a projection matrix is randomly initialized as the current projection matrix $W$, and the original sample data are used to assign weights to the elements of the neighbor matrix, i.e. the following formula (9) is computed, thereby obtaining the initialized neighbor matrix as the current neighbor matrix:

$$s_{ij}^{c}=\frac{\exp\!\left(-\bigl\|x_{i}^{c}-x_{j}^{c}\bigr\|_{2}^{2}/\gamma\right)}{\sum_{j'\in N_{k}(x_{i}^{c})}\exp\!\left(-\bigl\|x_{i}^{c}-x_{j'}^{c}\bigr\|_{2}^{2}/\gamma\right)},\qquad j\in N_{k}\bigl(x_{i}^{c}\bigr) \qquad (9)$$

where $N_{k}(x_{i}^{c})$ denotes the set of $k$ nearest same-class neighbors of $x_{i}^{c}$, and $s_{ij}^{c}$ is the weight factor, i.e. an element of the neighbor matrix, describing the relationship between samples $x_{i}^{c}$ and $x_{j}^{c}$; when $x_{i}^{c}$ and $x_{j}^{c}$ are the same sample, $s_{ij}^{c}$ is set to 0.
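A sketch of this initialization under our reading of formula (9), with heat-kernel weights over each sample's k nearest same-class neighbors and rows normalized to sum to one (the exact kernel form is an assumption):

```python
import numpy as np

def init_neighbor_weights(X, y, k, gamma=1.0):
    """Initialize the block-structured neighbor matrix from the original
    samples: row i holds heat-kernel weights over the k nearest same-class
    neighbors of sample i, normalized to sum to 1.
    X: (n, d) samples sorted by label, y: (n,) labels."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        d2 = ((X[same] - X[i]) ** 2).sum(axis=1)   # squared distances
        nn = same[np.argsort(d2)[:k]]              # k nearest same-class
        w = np.exp(-np.sort(d2)[:k] / gamma)
        S[i, nn] = w / w.sum()
    return S
```

Because weights are assigned only within each class, the resulting matrix has exactly the block-diagonal structure of formula (8).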
and S20, the electronic equipment updates the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix.
In step S20, the electronic device may specifically fix the current projection matrix $W$ and update the neighbor matrix $S$: the high-dimensional sample data are projected into the subspace through $W$ to obtain subspace sample data, and neighbor weights are re-assigned to $S$ according to the subspace sample data, yielding the optimized neighbor matrix. With $W$ fixed, problem (7) reduces, for each sample $x_{i}^{c}$, to:

$$\min_{\sum_{j}s_{ij}^{c}=1}\ \sum_{j} s_{ij}^{c}\,d_{ij}+\gamma\,s_{ij}^{c}\ln s_{ij}^{c} \qquad (10)$$

where $d_{ij}$ collects the projected squared-distance terms of $x_{i}^{c}$ and $x_{j}^{c}$ in problem (7) with $W$ fixed. The corresponding Lagrange function is:

$$\mathcal{L}=\sum_{j}\bigl(s_{ij}^{c}d_{ij}+\gamma\,s_{ij}^{c}\ln s_{ij}^{c}\bigr)+\lambda\Bigl(\sum_{j}s_{ij}^{c}-1\Bigr) \qquad (11)$$

Setting its derivative with respect to $s_{ij}^{c}$ to zero gives:

$$s_{ij}^{c}=\frac{\exp\!\left(-d_{ij}/\gamma\right)}{\sum_{j'}\exp\!\left(-d_{ij'}/\gamma\right)} \qquad (12)$$

Therefore, using formula (12), the weights of the $k$ nearest same-class neighbors of each sample are re-assigned, updating the current neighbor matrix into the optimized neighbor matrix.
Comparing formula (9) with formula (12): formula (9) assigns weights from the Euclidean distances between the original sample points, a strategy susceptible to interference from redundant features in the original sample data; formula (12) instead computes the Euclidean distances between the subspace samples obtained through the current projection matrix, which avoids this problem well.
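Under the same assumptions as the initialization sketch, the update of formula (12) differs only in that distances are measured in the projected subspace:

```python
import numpy as np

def update_neighbor_weights(X, y, W, k, gamma=1.0):
    """Re-assign neighbor weights using distances in the subspace
    Y = X W, which suppresses noise carried by redundant original
    features. X: (n, d) samples, W: (d, m) current projection matrix."""
    Y = X @ W
    n = Y.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        d2 = ((Y[same] - Y[i]) ** 2).sum(axis=1)   # subspace distances
        nn = same[np.argsort(d2)[:k]]
        w = np.exp(-np.sort(d2)[:k] / gamma)
        S[i, nn] = w / w.sum()
    return S
```

In the usage below, the second feature is pure noise; a projection that keeps only the clean feature still recovers the true nearest neighbor, which is the point of re-weighting in the subspace.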
And S30, the electronic equipment trains the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix.
After the neighbor matrix has been updated into the optimized neighbor matrix with the current projection matrix fixed, the optimized neighbor matrix can in turn be fixed and the current projection matrix trained to obtain the target projection matrix. With $S$ fixed, the entropy term of problem (7) is constant and the problem reduces to:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}} \qquad (13)$$

The numerator equals $2\,w_{k}^{T}Aw_{k}$ with $A=\sum_{c}X^{c}\bigl(D^{c}-S^{c}\bigr)\bigl(X^{c}\bigr)^{T}$, where $D^{c}$ is a diagonal matrix whose diagonal elements are $d_{ii}^{c}=\sum_{j}s_{ij}^{c}$; the denominator equals $2n\,w_{k}^{T}Bw_{k}$ with $B=X\bigl(I-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{T}\bigr)X^{T}$, which is equivalent (up to a factor) to the covariance matrix of the samples. Formula (13) can therefore be rewritten as:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{w_{k}^{T}Aw_{k}}{n\,w_{k}^{T}Bw_{k}} \qquad (14)$$

Comparing with formula (4), $A$ plays the role of the intra-class divergence matrix and $B$ that of the total divergence. Suppose all projection directions except $w_{k}$ are already optimal; it remains to solve for $w_{k}$, specifically:

$$\min_{w_{k}^{T}w_{k}=1}\ \frac{w_{k}^{T}Aw_{k}}{w_{k}^{T}Bw_{k}} \qquad (15)$$

Writing $\rho_{k}$ for the current value of this ratio, formula (15) is then equivalent to:

$$\min_{w_{k}^{T}w_{k}=1}\ w_{k}^{T}\bigl(A-\rho_{k}B\bigr)w_{k} \qquad (16)$$

The corresponding Lagrange function is:

$$\mathcal{L}\bigl(w_{k},\eta\bigr)=w_{k}^{T}\bigl(A-\rho_{k}B\bigr)w_{k}-\eta\bigl(w_{k}^{T}w_{k}-1\bigr) \qquad (17)$$

Setting its derivative with respect to $w_{k}$ to zero yields the eigenproblem:

$$\bigl(A-\rho_{k}B\bigr)w_{k}=\eta\,w_{k} \qquad (18)$$

whose minimum eigenvalue corresponds to the desired eigenvector $w_{k}$. The directions are solved alternately and updated iteratively until the objective function value of problem (13) converges, thereby obtaining the target projection matrix.
And S40, the electronic equipment updates the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain the target neighbor matrix.
After the target projection matrix has been trained from the optimized neighbor matrix, the target projection matrix can in turn be used to update the optimized neighbor matrix, yielding the target neighbor matrix; it is then determined whether problem (7) has converged, i.e. step S50 is performed.
And S50, the electronic device substitutes the target projection matrix and the target neighbor matrix into the local ratio-sum model to judge whether it has converged. If so, steps S60-S70 are executed; otherwise, step S80 is executed and the flow returns to step S20.
All of the above criteria for judging whether the objective function value has converged take the form:

$$\bigl|J_{t}-J_{t-1}\bigr|<\varepsilon \qquad (19)$$

where $J_{t-1}$ is the objective function value of the previous iteration, $J_{t}$ that of the current iteration, and $\varepsilon$ the convergence threshold, which is usually set empirically to a small value in some embodiments.
And S60, the electronic equipment outputs the target projection matrix as an optimal solution.
And S70, the electronic equipment performs dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
S80, the electronic device takes the target projection matrix as the new current projection matrix and the target neighbor matrix as the new current neighbor matrix, and returns to step S20 until the local ratio-sum model converges.
Here, the "current" matrices produced by the previous iteration are replaced by those produced in this iteration; the initial current matrices are obtained from the corresponding initialization formulas.
To sum up, by alternately fixing one of the neighbor matrix and the projection matrix and updating the other, until the objective function value of problem (7) converges, the optimal target projection matrix, i.e. the optimal solution, is obtained.
Examples
The specific algorithm is shown in the following steps S91-S99:
S91, the electronic device acquires the original sample data and its label information, the neighbor number $k$, the subspace dimension $m$, and the regularization parameter $\gamma$, all input by the user.
S92, the electronic device sorts and normalizes the original sample data according to the label information to obtain the high-dimensional sample data, randomly initializes a projection matrix $W$ satisfying $W^{T}W=I$, and initializes the neighbor matrix $S$ using formula (9).
S93, the electronic device updates the current neighbor matrix according to formula (12) using the current projection matrix, obtaining the optimized neighbor matrix.
S94, the electronic device trains the current projection matrix according to formula (18) using the latest optimized neighbor matrix, until the objective function value of problem (13) converges, obtaining the target projection matrix.
S95, the electronic device updates the optimized neighbor matrix according to formula (12) using the target projection matrix, obtaining the target neighbor matrix.
S96, the electronic device judges, from the latest target projection matrix and target neighbor matrix, whether the objective function value of problem (7) has converged. If it has, steps S97-S98 are executed; otherwise, step S99 is executed, the flow returns to step S93, and steps S93-S96 are repeated.
S97, the electronic device outputs the target projection matrix as the optimal solution.
S98, the electronic device reduces the dimensionality of the high-dimensional sample data according to the optimal solution.
S99, the electronic device takes the target projection matrix as the new current projection matrix and the target neighbor matrix as the new current neighbor matrix.
Therefore, by introducing neighbor weights, the embodiment of the invention takes the local structure of the sample data into account and adapts better to real-world datasets; at the same time it uses alternating mutual training between the projection matrix and the neighbor matrix, i.e. the newly learned projection matrix projects the high-dimensional samples into the current best subspace, and the subspace data are used to construct a new neighbor matrix. In addition, the embodiment proposes a minimized ratio-sum model whose search strategy is better than that of the maximized ratio-sum model, further improving accuracy.
In order to verify the effectiveness of the embodiment of the invention, two experiments were designed to compare it with existing mainstream dimension reduction algorithms, namely LDA, DNM, GRS, LFDA, LSDA, DMEG and ANLRSLDA; the algorithm proposed by the invention is abbreviated LRSDAASG.
Experiment one: local-structure preservation of the different algorithms is verified on a synthetic three-ring data set. Specifically, the synthetic data are two-dimensional data forming three concentric circles, to which 50 noise dimensions are artificially added. A projection matrix is then learned with each dimension reduction algorithm, the synthetic data are projected into the corresponding space, and the result is visualized, as shown in figs. 2 to 9. The algorithms of figs. 2 to 4 (LDA, DNM, GRS) all lack the ability to preserve local structure: they consider only the overall characteristics of the samples and suffer severely from the interference of the noise dimensions, so they are poorly suited to real-world data sets. The algorithms of figs. 5 to 8 (LFDA, LSDA, DMEG, ANLRSLDA) do take the local structure into account, but the concentric circles are only roughly recognizable, whereas the algorithm of the invention (LRSDAASG), shown in fig. 9, preserves the local structure of the samples much better.
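The synthetic data of experiment one can be generated along these lines; the ring radii, sample counts and noise scale are assumptions, since the text fixes only the three-concentric-circle shape and the 50 noise dimensions.

```python
import numpy as np

def three_rings(n_per_ring=100, noise_dims=50, seed=0):
    """Synthetic set in the spirit of experiment one: three concentric
    circles in 2-D, padded with artificial noise dimensions."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, radius in enumerate([1.0, 2.0, 3.0]):
        theta = rng.uniform(0, 2 * np.pi, n_per_ring)
        ring = np.column_stack([radius * np.cos(theta),
                                radius * np.sin(theta)])
        X.append(ring)
        y += [label] * n_per_ring
    X = np.vstack(X)
    noise = rng.normal(scale=0.5, size=(X.shape[0], noise_dims))
    return np.hstack([X, noise]), np.array(y)
```

Only the first two coordinates carry class structure; an algorithm without local-structure preservation is easily misled by the remaining 50 noise coordinates.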
Experiment two: experiments were performed on a real-world data set to verify the validity of the proposed algorithm. The UMIST data set is used as the benchmark data set. Specifically, 40% of the samples in the data set are randomly selected as a training set to learn a projection matrix, and the remaining samples are then projected into the learned subspace. A 1NN classifier is then used to verify accuracy. The experiment was repeated ten times, with the results shown in table 1 below:
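The evaluation protocol of experiment two (random 40% training split, projection of the remainder, 1NN accuracy, ten repetitions) can be sketched as follows. `learn_projection` stands for whichever dimension reduction algorithm is under test, and the splitting and seeding details are assumptions.

```python
import numpy as np

def evaluate(X, y, learn_projection, train_frac=0.4, repeats=10, seed=0):
    """Protocol in the spirit of experiment two: learn a projection W on
    a random 40% split, project the rest, score with a 1-NN classifier,
    and repeat ten times.  learn_projection(X, y) -> W."""
    rng = np.random.default_rng(seed)
    accs = []
    n = len(y)
    for _ in range(repeats):
        idx = rng.permutation(n)
        n_tr = int(train_frac * n)
        tr, te = idx[:n_tr], idx[n_tr:]
        W = learn_projection(X[tr], y[tr])
        Ztr, Zte = X[tr] @ W, X[te] @ W
        # 1-NN: label of the closest training sample in the subspace
        d2 = ((Zte[:, None, :] - Ztr[None, :, :]) ** 2).sum(2)
        pred = y[tr][d2.argmin(1)]
        accs.append((pred == y[te]).mean())
    return np.mean(accs), np.std(accs)
```

Reporting the mean and standard deviation over the ten repetitions matches the style of comparison used in table 1.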
TABLE 1 Performance comparison of the algorithm of the present invention with the existing mainstream dimensionality reduction algorithm
As shown in fig. 10, the embodiment of the present invention discloses a data dimension reduction device based on local ratio and linear discriminant analysis (hereinafter referred to as data dimension reduction device), which includes an initialization unit 101, a first updating unit 102, a training unit 103, a second updating unit 104, a substitution unit 105, an output unit 106, a processing unit 107, and a circulation unit 108, wherein,
an initialization unit 101, configured to initialize according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit 102 is configured to update the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit 103 is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit 104 is configured to update the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substituting unit 105, configured to substitute the target projection matrix and the target neighbor matrix into the local ratio-sum model;
an output unit 106, configured to, after the substituting unit 105 substitutes the target projection matrix and the target neighbor matrix into the local ratio-sum model, output the target projection matrix as an optimal solution if the substituted local ratio-sum model has converged;
and the processing unit 107 is configured to perform dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
As an alternative implementation, the initialization unit 101 may include the following sub-units not shown in the figure:
the first allocation subunit is used for performing neighbor weight allocation according to high-dimensional sample data to obtain a current neighbor matrix;
and the generation subunit is used for randomly generating an initial projection matrix and taking the initial projection matrix as the current projection matrix.
Optionally, the data dimension reduction apparatus may further include the following units, not shown in the figure:
the acquisition unit is configured to acquire a plurality of original sample data and their label information before the first allocation subunit performs neighbor weight allocation according to the high-dimensional sample data to obtain the current neighbor matrix;
and the preprocessing unit is used for sequencing and normalizing the plurality of original sample data according to the label information to obtain high-dimensional sample data.
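The preprocessing performed by this unit (sorting by label, then normalizing) might look as follows; min-max scaling per feature is an assumption, since the text does not specify the normalization used.

```python
import numpy as np

def preprocess(X_raw, labels):
    """Sort samples by class label, then normalize each feature.
    Min-max scaling is an assumed choice of normalization."""
    order = np.argsort(labels, kind="stable")        # group samples by class
    X = X_raw[order].astype(float)
    y = np.asarray(labels)[order]
    span = X.max(0) - X.min(0)
    span[span == 0] = 1.0                            # guard constant features
    X = (X - X.min(0)) / span                        # scale features to [0, 1]
    return X, y
```

Sorting by label keeps each class's samples contiguous, which simplifies building the per-class neighbor weights later.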
As an optional implementation manner, the data dimension reduction apparatus may further include a circulation unit 108, configured to, after the substituting unit 105 substitutes the target projection matrix and the target neighbor matrix into the local ratio-sum model, if the substituted local ratio-sum model has not converged, take the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix, and trigger the first updating unit 102 to repeat the operation of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix, until the local ratio-sum model converges.
Further optionally, the first updating unit 102 may include the following sub-units not shown in the figure:
the projection subunit is used for projecting the high-dimensional sample data into a subspace using the current projection matrix to obtain subspace sample data;
and the second allocation subunit is used for performing neighbor weight reallocation on the current neighbor matrix according to the subspace sample data to obtain an optimized neighbor matrix.
As shown in fig. 11, an embodiment of the present invention discloses an electronic device, which includes a memory 1101 storing executable program codes and a processor 1102 coupled to the memory 1101;
the processor 1102 calls the executable program code stored in the memory 1101 to execute the data dimension reduction method based on the local ratio and linear discriminant analysis described in the above embodiments.
The embodiment of the invention also discloses a computer readable storage medium which stores a computer program, wherein the computer program enables a computer to execute the data dimension reduction method based on the local ratio and the linear discriminant analysis described in the embodiments.
The purpose of the above embodiments is to reproduce and derive the technical solutions of the invention by example and to fully describe its technical solutions, objects and effects, so that the public can understand the disclosure of the invention more thoroughly and comprehensively; they do not limit the protection scope of the invention.

The above examples are not exhaustive of the invention, and many other embodiments are possible. Any alterations and modifications that do not depart from the spirit of the invention fall within its scope.
Claims (10)
1. A data dimension reduction method based on local ratio and linear discriminant analysis, characterized by comprising the following steps:
initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
training a current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
if the substituted local ratio-sum model has converged, outputting the target projection matrix as an optimal solution;
and performing dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
2. The data dimension reduction method of claim 1, wherein after substituting the target projection matrix and the target neighbor matrix into the local ratio-sum model, the method further comprises:
if the substituted local ratio-sum model has not converged, taking the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix, and repeatedly executing the step of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix, until the local ratio-sum model converges.
3. The method of claim 1, wherein updating the current neighbor matrix to obtain an optimized neighbor matrix according to the current projection matrix and the high-dimensional sample data comprises:
projecting the high-dimensional sample data to a subspace by using a current projection matrix to obtain subspace sample data;
and performing neighbor weight redistribution on the current neighbor matrix according to the subspace sample data to obtain an optimized neighbor matrix.
4. The method according to any one of claims 1 to 3, wherein initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix, comprises:
performing neighbor weight distribution according to high-dimensional sample data to obtain a current neighbor matrix;
and randomly generating an initial projection matrix, and taking the initial projection matrix as a current projection matrix.
5. The method of claim 4, wherein before performing neighbor weight assignment according to high-dimensional sample data to obtain a current neighbor matrix, the method further comprises:
acquiring a plurality of original sample data and label information thereof;
and sequencing and normalizing the plurality of original sample data according to the label information to obtain high-dimensional sample data.
6. A method for data dimensionality reduction according to any one of claims 1 to 3, wherein the local ratio-sum model is represented by the following formula:
in the formula,which represents the total number of samples,respectively represent the second in the total sampleThe number of the data is set to be,the number of sample categories is indicated and,is a firstThe total number of class samples is,are respectively the firstClass II sample ofA piece of data;is the total number of the projection directions,representing a projection matrix of column vectorsRepresents the firstThe direction of the projection is determined by the direction of the projection,;in (1)As a unit matrix, superscriptIs a transposed symbol;represents the firstClass II specimenData and the secondThe weight between the individual pieces of data,is shown asClass II specimenData and the secondThe weight between each data.
7. A data dimension reduction device based on local ratio and linear discriminant analysis, characterized by comprising:
the initialization unit is used for initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit is used for updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit is used for updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substitution unit for substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
the output unit is used for outputting the target projection matrix as an optimal solution when the substituted local ratio-sum model has converged;
and the processing unit is used for carrying out dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
8. The data dimension reduction apparatus of claim 7, further comprising:
and the circulating unit is used for taking the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix when the substituted local ratio-sum model has not converged, and triggering the first updating unit to repeatedly execute the operation of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix, until the local ratio-sum model converges.
9. An electronic device comprising a memory storing executable program code and a processor coupled to the memory; the processor calls the executable program code stored in the memory for performing the data dimension reduction method based on local ratio and linear discriminant analysis of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program causes a computer to execute the method for data dimension reduction based on local ratio and linear discriminant analysis according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211547519.2A CN115563492A (en) | 2022-12-05 | 2022-12-05 | Data dimension reduction method and device based on local ratio and linear discriminant analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115563492A true CN115563492A (en) | 2023-01-03 |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116386830A (en) * | 2023-04-10 | 2023-07-04 | 山东博鹏信息科技有限公司 | Hospital management system based on big data |
CN116386830B (en) * | 2023-04-10 | 2023-09-22 | 山东博鹏信息科技有限公司 | Hospital management system based on big data |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230103 |