CN115563492A - Data dimension reduction method and device based on local ratio and linear discriminant analysis - Google Patents

Data dimension reduction method and device based on local ratio and linear discriminant analysis Download PDF

Info

Publication number
CN115563492A
CN115563492A CN202211547519.2A CN202211547519A CN115563492A CN 115563492 A CN115563492 A CN 115563492A CN 202211547519 A CN202211547519 A CN 202211547519A CN 115563492 A CN115563492 A CN 115563492A
Authority
CN
China
Prior art keywords
matrix
neighbor
current
sample data
projection matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211547519.2A
Other languages
Chinese (zh)
Inventor
杨晓君
周科艺
闵海波
曹传杰
程昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aibingo Technology Co ltd
Guangzhou University Town Guangong Science And Technology Achievement Transformation Center
Guangdong University of Technology
Original Assignee
Beijing Aibingo Technology Co ltd
Guangzhou University Town Guangong Science And Technology Achievement Transformation Center
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aibingo Technology Co ltd, Guangzhou University Town Guangong Science And Technology Achievement Transformation Center, Guangdong University of Technology filed Critical Beijing Aibingo Technology Co ltd
Priority to CN202211547519.2A priority Critical patent/CN115563492A/en
Publication of CN115563492A publication Critical patent/CN115563492A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Complex Calculations (AREA)

Abstract

The invention belongs to the technical field of data processing, and discloses a data dimension reduction method and a device based on local ratio and linear discriminant analysis.A current neighbor matrix and a current projection matrix are initialized according to high-dimensional sample data, then the current neighbor matrix is updated according to the current projection matrix to obtain an optimized neighbor matrix, then the current projection matrix is trained according to the optimized neighbor matrix to obtain a target projection matrix, finally the optimized neighbor matrix is updated by using the target projection matrix to obtain a target neighbor matrix, if the local ratio and a model which are constructed in advance are converged, the target projection matrix is determined to be an optimal solution, and the optimal solution is used for data dimension reduction; the method can consider the local structure of the sample data by introducing the neighbor weight, better adapts to a real world data set, and simultaneously reduces the noise influence of high-dimensional data on the neighbor matrix by utilizing the alternate mutual training optimization between the projection matrix and the neighbor matrix, thereby further optimizing the dimension reduction effect.

Description

Data dimension reduction method and device based on local ratio and linear discriminant analysis
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data dimension reduction method, a data dimension reduction device, data dimension reduction equipment and a data dimension reduction storage medium based on local ratio and linear discriminant analysis.
Background
With the development of science and technology, the sampling precision of modern sensors is higher and higher, and the dimensionality of sample data obtained by sampling is increased. However, the discrimination performance of the classifier is not always positively correlated with the increase of the sample data dimension, but after the discrimination performance of the classifier is increased to a critical point, the performance of the classifier is deteriorated if the sample data dimension is continuously increased, which is a well-known "houss effect", and the increase of the sample data dimension also causes the calculation cost index of the classifier to increase.
To solve the above problem, a large number of scholars propose to use a dimension reduction algorithm to map data points to a low-dimensional subspace to find a representation with optimal discriminant performance. The dimension reduction algorithm can be further subdivided into a feature selection algorithm and a feature extraction algorithm, the feature selection algorithm only searches an optimal subset of the original features, the optimal subset and the original feature set are in an included relation, and the original feature space is not changed; and the feature extraction algorithm aims to find the optimal projection direction through linear transformation, so that the original feature space is changed.
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are the most popular feature extraction dimension reduction algorithms in the unsupervised and supervised fields, respectively. Unsupervised PCA aims to find a variance information that remains as much as possible. While supervised LDA introduces label information to find a projection space that can simultaneously maximize intra-class method distances and minimize inter-class variance distances. LDA was first proposed only for the dichotomy problem, and later further expansion to the multiclassification task has brought it to the increasing interest.
Unfortunately, the original LDA is a trace ratio (first finding the trace and then finding the ratio) problem, which is difficult to directly solve to obtain a closed-form solution. Therefore, some scholars convert the projection matrix into a ratio trace problem (firstly, the ratio is calculated, and then the trace is calculated) to solve the projection matrix. However, the algorithm cannot consider the local structure of the data, and cannot better adapt to the real world data set, and the solution obtained by such conversion is suboptimal, and the optimal projection matrix cannot be obtained.
At present, scholars try to convert the trace ratio problem into a ratio sum (ratio is first solved and then summed) problem for solving, for example, an adaptive neighbor local ratio sum linear discriminant analysis algorithm which maximizes the ratio sum and solves by using a greedy algorithm is proposed in the prior art, the algorithm introduces a local concept, and weights are distributed through a parametrical strategy so as to further obtain a better projection matrix. However, in practice, it is found that the graph constructed by the algorithm is constructed by using the original data, and the original space has a large amount of noise, so that the constructed affinity matrix is influenced by the noise of the original data and is still suboptimal.
Disclosure of Invention
The invention aims to provide a data dimension reduction method, a data dimension reduction device, data dimension reduction equipment and a data dimension reduction storage medium based on local ratio and linear discriminant analysis, which can consider the local structure of a sample, better adapt to a real world data set, reduce the noise influence of original data and further optimize the dimension reduction effect.
The invention discloses a data dimension reduction method based on local ratio and linear discriminant analysis, which comprises the following steps:
initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
training a current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
substituting the target projection matrix and the target neighbor matrix into a local ratio and model;
if the substituted local ratio and the model are converged, outputting the target projection matrix as an optimal solution;
and performing dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
The second aspect of the present invention discloses a data dimension reduction device based on local ratio and linear discriminant analysis, which includes:
the initialization unit is used for initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit is used for updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit is used for updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substituting unit, configured to substitute the target projection matrix and the target neighbor matrix into a local ratio and a model;
the output unit is used for outputting the target projection matrix as an optimal solution when the substituted local ratio and the model are converged;
and the processing unit is used for carrying out dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
A third aspect of the invention discloses an electronic device comprising a memory storing executable program code and a processor coupled to the memory; the processor calls the executable program code stored in the memory for executing the data dimension reduction method based on the local ratio and linear discriminant analysis disclosed in the first aspect.
A fourth aspect of the present invention discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the data dimension reduction method based on local ratio and linear discriminant analysis disclosed in the first aspect.
The method, the device and the storage medium have the advantages that the method, the device and the storage medium for data dimension reduction based on the local ratio and the linear discriminant analysis are initialized according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix, then the current neighbor matrix is updated according to the current projection matrix to obtain an optimized neighbor matrix, then the current projection matrix is trained according to the optimized neighbor matrix to obtain a target projection matrix, finally the optimized neighbor matrix is updated by using the target projection matrix to obtain a target neighbor matrix, if the target projection matrix and the target neighbor matrix can enable the local ratio and the model which are constructed in advance to be converged, the target projection matrix is determined to be an optimal solution, and finally the optimal solution is used for data dimension reduction. Therefore, the method can take the local structure of the sample data into consideration by introducing the neighbor weight, can better adapt to a real world data set, simultaneously utilizes the alternate mutual training optimization between the projection matrix and the neighbor matrix, and can also carry out the iterative optimization on the neighbor matrix while carrying out the iterative optimization on the projection matrix.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the principles and effects of the invention.
Unless otherwise specified or defined, the same reference numerals in different figures represent the same or similar technical features, and different reference numerals may be used for the same or similar technical features.
FIG. 1 is a flow chart of a data dimension reduction method based on local ratio and linear discriminant analysis according to an embodiment of the present invention;
FIG. 2 is a two-dimensional visualization of a prior LDA algorithm on a synthetic three-ring dataset;
FIG. 3 is a two-dimensional visualization of a prior art Decomposed Newton's Method (DNM) algorithm on a synthetic three-ring dataset;
FIG. 4 is a two-dimensional visualization of the Greedy Ratio Sum (GRS) algorithm of the prior art on a synthetic three-ring dataset;
FIG. 5 is a two-dimensional visualization of a prior art Local Fisher Discriminant Analysis (LFDA) algorithm on a synthetic three-loop dataset;
FIG. 6 is a two-dimensional visualization diagram of a prior Local Sensitive Discriminant Analysis (LSDA) algorithm on a synthetic three-loop dataset;
FIG. 7 is a two-dimensional visualization of a prior art Dynamic Maximum Entropy map (DMEG) algorithm on a synthetic tricyclic data set;
FIG. 8 is a two-dimensional visualization of the Adaptive Neighbor Local Ratio Sum Linear Discriminant Analysis (ANLRSLDA) algorithm on a synthetic three-loop dataset;
FIG. 9 is a two-dimensional visualization diagram of the Local Ratio and Discriminant Analysis based on Adaptive Subspace map (LRSDAASG) algorithm on the synthetic triple-ring dataset proposed by the present invention;
FIG. 10 is a schematic structural diagram of a data dimension reduction apparatus based on local ratio and linear discriminant analysis according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Description of the reference numerals:
101. an initialization unit; 102. a first update unit; 103. a training unit; 104. a second updating unit; 105. a substitution unit; 106. an output unit; 107. a processing unit; 108. a circulation unit; 1101. a memory; 1102. a processor.
Detailed Description
Unless specifically stated or otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In the case of combining the technical solutions of the present invention in a realistic scenario, all technical and scientific terms used herein may also have meanings corresponding to the purpose of achieving the technical solutions of the present invention. As used herein, "first and second" \ 8230, "are used merely to distinguish between names and do not denote a particular quantity or order. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As used herein, unless otherwise specified or defined, the terms "comprises," "comprising," and "including" are used interchangeably and refer to the term "comprising," and are used interchangeably and refer to the term "comprising," or "comprises," as used herein.
It is needless to say that technical contents or technical features which are contrary to the object of the present invention or clearly contradicted by the object of the present invention should be excluded. In order to facilitate an understanding of the invention, specific embodiments thereof will be described in more detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention discloses a data dimension reduction method based on local ratio and linear discriminant analysis, which may be implemented by computer programming. The executing main body of the method may be an electronic device such as a computer, a notebook computer, a tablet computer, or a data dimension reduction device based on local ratio and linear discriminant analysis embedded in the electronic device, which is not limited in the present invention. In this embodiment, an electronic device is taken as an example for explanation. The method comprises the following steps S10-S80:
and S10, initializing the electronic equipment according to the high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix.
In the embodiment of the invention, an initial projection matrix can be randomly generated as a current projection matrix, and in addition, a current neighbor matrix can be randomly generated or constructed according to high-dimensional sample data of an original space.
The method for constructing the current neighbor matrix according to the high-dimensional sample data in the original space specifically includes that neighbor weight distribution is performed according to the high-dimensional sample data in the original space to initialize the neighbor matrix and obtain the current neighbor matrix.
Before executing step S10, the electronic device may further obtain a plurality of original sample data and tag information thereof, and then perform sorting and normalization processing on the plurality of original sample data according to the tag information to obtain high-dimensional sample data.
It should be noted that, in the embodiment of the present invention, the local ratio and the model are constructed in advance. The local ratio and model may be a locally maximized ratio and model, or may be a locally minimized ratio and model. Preferably, the search strategy for locally minimizing the ratio and the model is better than that for locally maximizing the ratio and the model. The derivation process aiming at the local minimum ratio and the model comprises the following steps S01-S02:
s01, minimize Ratio Sum (Ratio Sum) algorithm
Set high dimensional sample data
Figure 100002_DEST_PATH_IMAGE001
Wherein, in the process,
Figure 100002_DEST_PATH_IMAGE002
is the total number of the samples,
Figure 100002_DEST_PATH_IMAGE003
is a characteristic dimension. Assuming high dimensional sample data
Figure 100002_DEST_PATH_IMAGE004
Have been sorted by label size. Low dimensional sample data
Figure 100002_DEST_PATH_IMAGE005
For high dimensional sample data
Figure 100002_DEST_PATH_IMAGE006
Is represented by a low-dimensional representation of, among others,
Figure 100002_DEST_PATH_IMAGE007
for subspace dimensions (i.e. common)
Figure 111859DEST_PATH_IMAGE007
One projection direction), then there are:
Figure 100002_DEST_PATH_IMAGE008
wherein,
Figure 100002_DEST_PATH_IMAGE009
represent
Figure 100002_DEST_PATH_IMAGE010
The transpose matrix of (a) is,
Figure 100002_DEST_PATH_IMAGE011
as high dimensional sample data
Figure 100002_DEST_PATH_IMAGE012
Converting to low dimensional sample data
Figure 100002_DEST_PATH_IMAGE013
The projection matrix of (2).
Some existing dimension reduction algorithms can write a Trace Ratio value (Trace Ratio) problem in the form of the following formula (2):
Figure 100002_DEST_PATH_IMAGE014
if it is
Figure 100002_DEST_PATH_IMAGE015
The above formula is the objective function of LDA, wherein
Figure 100002_DEST_PATH_IMAGE016
Is an inter-class divergence matrix, and is,
Figure 100002_DEST_PATH_IMAGE017
the optimization goal of the intra-class divergence matrix is to make the data distance of the same class as close as possible and the data distance between different classes as far as possible after projection. But the objective function of Trace Ratio may get some bad projection directions. Therefore, a Ratio Sum algorithm is proposed, and the specific expression form is as the following formula (3):
Figure 100002_DEST_PATH_IMAGE018
but to maximize this problem, it still has drawbacks. Therefore, the preferred option of the present invention is to invert each ratio term to solve a minimization problem, and obtain the expression shown in the following formula (4):
Figure 100002_DEST_PATH_IMAGE019
s02, adaptive subspace map
To complete the derivation, equation (4) is converted to vector form, i.e., equation (5) is obtained:
Figure 100002_DEST_PATH_IMAGE020
wherein,
Figure 100002_DEST_PATH_IMAGE021
the total number of samples is represented by,
Figure 100002_DEST_PATH_IMAGE022
respectively represent the first in the total sample
Figure 100002_DEST_PATH_IMAGE023
The number of the data is one,
Figure 100002_DEST_PATH_IMAGE024
the number of sample categories is indicated and,
Figure 100002_DEST_PATH_IMAGE025
is a first
Figure 100002_DEST_PATH_IMAGE026
The total number of class samples is,
Figure DEST_PATH_IMAGE027
are respectively the first
Figure DEST_PATH_IMAGE028
Class II specimen
Figure DEST_PATH_IMAGE029
A piece of data;
Figure DEST_PATH_IMAGE030
is the total number of the projection directions,
Figure DEST_PATH_IMAGE031
representing projection matrices, their column vectors
Figure DEST_PATH_IMAGE032
Represents the first
Figure DEST_PATH_IMAGE033
The direction of the projection is determined by the direction of the projection,
Figure DEST_PATH_IMAGE034
Figure DEST_PATH_IMAGE035
in
Figure DEST_PATH_IMAGE036
As identity matrix, superscript
Figure DEST_PATH_IMAGE037
Is a transposed symbol;
Figure DEST_PATH_IMAGE038
represents the first
Figure DEST_PATH_IMAGE039
Class II specimen
Figure DEST_PATH_IMAGE040
Data and number one
Figure DEST_PATH_IMAGE041
The weight between the pieces of data is calculated,
Figure DEST_PATH_IMAGE042
denotes the first
Figure DEST_PATH_IMAGE043
Class II specimen
Figure DEST_PATH_IMAGE044
Data and the second
Figure DEST_PATH_IMAGE045
The weight between each data.
The above formula (5) means that an optimal projection matrix is found
Figure DEST_PATH_IMAGE046
The euclidean distance between samples of the same class is made as close as possible and the distance between all data samples is made as far as possible.
In order to allow the model to take into account local structure, a weighting factor is added to the equation of the denominator
Figure DEST_PATH_IMAGE047
And obtaining a local minimum ratio and a model as shown in the following formula (6):
Figure DEST_PATH_IMAGE048
(6)
in the formula (II)
Figure DEST_PATH_IMAGE049
And
Figure DEST_PATH_IMAGE050
when the phase difference is equal to each other,
Figure DEST_PATH_IMAGE051
the distance after the projection of the same kind of samples is required to be as close as possible, and the penalty factor of the weight is also as small as possible.
Further preferably, considering that the formula (6) may have a trivial solution, a maximum entropy regular term constraint may be added, i.e. the following expression (7) is obtained:
Figure DEST_PATH_IMAGE052
(7)
based on this, steps S10 to S60 of the embodiment of the present invention are executed, and the problem (7) is actually solved.
Problem (7) is an NP problem and a polynomial solution cannot be directly obtained. The embodiment of the invention provides a new optimization method, firstly, a neighbor matrix is given
Figure DEST_PATH_IMAGE053
Neighbor matrix
Figure DEST_PATH_IMAGE054
Is formed by
Figure DEST_PATH_IMAGE055
Block diagonal moment of formationThe matrix is shown as formula (8):
Figure DEST_PATH_IMAGE056
wherein,
Figure DEST_PATH_IMAGE057
indicating the number of sample categories.
In order to solve the problem (7) smoothly, a projection matrix is initialized randomly
Figure DEST_PATH_IMAGE058
As a current projection matrix
Figure DEST_PATH_IMAGE059
And using the original sample data to the neighbor matrix
Figure DEST_PATH_IMAGE060
The weight is assigned to the element(s) in (1), that is, the following formula (9) is calculated, so as to obtain the initialized neighbor matrix as the current neighbor matrix
Figure DEST_PATH_IMAGE061
Figure DEST_PATH_IMAGE062
Wherein,
Figure DEST_PATH_IMAGE063
to represent
Figure DEST_PATH_IMAGE064
In the same group of adjacent combinations
Figure DEST_PATH_IMAGE065
In the step (1), the first step,
Figure DEST_PATH_IMAGE066
representing weight factors, i.e. elements in a neighbor matrix, for describingSample data
Figure DEST_PATH_IMAGE067
A weight relationship therebetween when
Figure DEST_PATH_IMAGE068
And
Figure DEST_PATH_IMAGE069
when the time is equal to each other, the two phases,
Figure DEST_PATH_IMAGE070
and S20, the electronic equipment updates the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix.
In step S20, the electronic device may specifically fix the current projection matrix
Figure DEST_PATH_IMAGE071
Updating neighbor matrices
Figure DEST_PATH_IMAGE072
. I.e. by the current projection matrix
Figure DEST_PATH_IMAGE073
Projecting high-dimensional sample data to subspace to obtain subspace sample data, and then carrying out nearest neighbor matrix alignment according to the subspace sample data
Figure DEST_PATH_IMAGE074
Carrying out neighbor weight redistribution to obtain an optimized neighbor matrix
Figure DEST_PATH_IMAGE075
It is understood that if
Figure DEST_PATH_IMAGE076
As is known, the problem (7) can be simplified as:
Figure DEST_PATH_IMAGE077
(10)
the corresponding lagrange function is:
Figure DEST_PATH_IMAGE078
derivation is 0, which gives:
Figure DEST_PATH_IMAGE079
therefore, using equation (12), samples of the same type are analyzed
Figure DEST_PATH_IMAGE080
Front of (2)
Figure DEST_PATH_IMAGE081
The nearest neighbor samples are redistributed with weights so as to realize the updating of the current nearest neighbor matrix
Figure DEST_PATH_IMAGE082
Obtaining an optimized neighbor matrix
Figure DEST_PATH_IMAGE083
Comparing equation (9) with equation (12), equation (9) can be derived by calculating the original sample data points
Figure DEST_PATH_IMAGE084
The Euclidean distance between the two is used as the distribution weight value
Figure DEST_PATH_IMAGE085
Such a weight assignment strategy is susceptible to interference from redundant features in the original sample data. While equation (12) may pass through the current projection matrix
Figure DEST_PATH_IMAGE086
Computing subspace sample data projected into a subspace
Figure DEST_PATH_IMAGE087
The Euclidean distance between the two is used as the distribution weight value
Figure DEST_PATH_IMAGE088
The problem can be well avoided.
And S30, the electronic equipment trains the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix.
At a fixed current projection matrix
Figure DEST_PATH_IMAGE089
Updating neighbor matrices
Figure DEST_PATH_IMAGE090
Obtaining an optimized neighbor matrix
Figure DEST_PATH_IMAGE091
Then, the optimized neighbor matrix obtained by fixing the above steps can be further obtained
Figure DEST_PATH_IMAGE092
Then training the current projection matrix
Figure DEST_PATH_IMAGE093
Obtaining a target projection matrix
Figure DEST_PATH_IMAGE094
It is to be understood that if
Figure DEST_PATH_IMAGE095
As is known, the problem (7) can be simplified as:
Figure DEST_PATH_IMAGE096
wherein,
Figure DEST_PATH_IMAGE097
is equivalent to
Figure DEST_PATH_IMAGE098
Covariance matrix
Figure DEST_PATH_IMAGE099
Figure DEST_PATH_IMAGE100
Is equivalent to
Figure DEST_PATH_IMAGE101
Figure DEST_PATH_IMAGE102
,
Figure DEST_PATH_IMAGE103
Is a diagonal matrix, each diagonal element is
Figure DEST_PATH_IMAGE104
. Equation (13) can therefore be rewritten as:
Figure DEST_PATH_IMAGE105
comparative formula (4) then has
Figure DEST_PATH_IMAGE106
. Suppose that
Figure DEST_PATH_IMAGE107
Already for the optimal solution, it is now necessary to solve
Figure DEST_PATH_IMAGE108
In particular:
Figure DEST_PATH_IMAGE109
then there is a change in the number of,
Figure DEST_PATH_IMAGE110
the corresponding lagrange function is:
Figure DEST_PATH_IMAGE111
about
Figure DEST_PATH_IMAGE112
Taking the derivative, let the derivative be 0, we can obtain:
Figure DEST_PATH_IMAGE113
the minimum eigenvalue of equation (18) corresponds to a eigenvector of
Figure DEST_PATH_IMAGE114
. Alternative solution
Figure DEST_PATH_IMAGE115
Continuously and iteratively updating until the objective function value corresponding to the problem (13) is converged, thereby obtaining the objective projection matrix
Figure DEST_PATH_IMAGE116
And S40, the electronic equipment updates the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain the target neighbor matrix.
Based on optimizing the neighbor matrix
Figure DEST_PATH_IMAGE117
Training to obtain a target projection matrix
Figure DEST_PATH_IMAGE118
The target projection matrix can then be reused
Figure DEST_PATH_IMAGE119
For optimizing neighbor matrix
Figure DEST_PATH_IMAGE120
Updating to obtain target neighbor matrix
Figure DEST_PATH_IMAGE121
Then, it is determined whether the problem (7) converges, i.e., step S50 is performed.
And S50, substituting the target projection matrix and the target neighbor matrix into the local ratio and the model by the electronic equipment to judge whether the local ratio and the model are converged. If yes, executing steps S60-S70; otherwise, step S80 is executed and the flow goes to step S20.
All the above mentioned criteria for determining whether the objective function value converges are:
Figure DEST_PATH_IMAGE122
wherein,
Figure DEST_PATH_IMAGE123
the objective function value for the previous iteration,
Figure DEST_PATH_IMAGE124
for the value of the objective function of the current iteration,
Figure DEST_PATH_IMAGE125
to converge the threshold, it is usually set empirically. For example, in some embodiments, it is arranged to
Figure DEST_PATH_IMAGE126
And S60, the electronic equipment outputs the target projection matrix as an optimal solution.
And S70, the electronic equipment performs dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
S80, the electronic equipment projects the target to a matrix
Figure DEST_PATH_IMAGE127
As a new current projection matrix
Figure DEST_PATH_IMAGE128
And neighbor the target to the matrix
Figure DEST_PATH_IMAGE129
As a new current neighbor matrix
Figure DEST_PATH_IMAGE130
And the step S20 is executed in a turning manner until the local ratio and the model converge.
Wherein,
Figure DEST_PATH_IMAGE131
Figure DEST_PATH_IMAGE132
representing the current quantity after the previous iteration,
Figure DEST_PATH_IMAGE133
Figure DEST_PATH_IMAGE134
is the current quantity after this iteration,
Figure DEST_PATH_IMAGE135
Figure DEST_PATH_IMAGE136
can be obtained by respectively initializing corresponding formulas.
To sum up, by alternately fixing the neighbor matrices
Figure DEST_PATH_IMAGE137
And a projection matrix
Figure DEST_PATH_IMAGE138
One item is updated, and the other item is updated until the objective function value corresponding to the problem (7) is converged, so that the optimal objective projection matrix can be obtained
Figure 428001DEST_PATH_IMAGE138
And obtaining an optimal solution.
Examples
The specific algorithm is shown in the following steps S91-S99:
s91, the electronic equipment acquires original sample data input by a user and label information thereof
Figure DEST_PATH_IMAGE139
Number of neighbors
Figure DEST_PATH_IMAGE140
Subspace dimension
Figure DEST_PATH_IMAGE141
And regularization term parameter
Figure DEST_PATH_IMAGE142
S92, the electronic equipment sorts and normalizes the original sample data according to the label information to obtain high-dimensional sample data
Figure DEST_PATH_IMAGE143
. Random initialization projection matrix
Figure DEST_PATH_IMAGE144
Satisfy the requirement of
Figure DEST_PATH_IMAGE145
Initializing the neighbor matrix using equation (9)
Figure DEST_PATH_IMAGE146
S93, the electronic equipment utilizes the current projection matrix
Figure DEST_PATH_IMAGE147
Updating the current neighbor matrix according to equation (12)
Figure DEST_PATH_IMAGE148
Obtaining an optimized neighbor matrix
Figure DEST_PATH_IMAGE149
S94, the electronic equipment utilizes the latest optimized neighbor matrix
Figure DEST_PATH_IMAGE150
Training the current projection matrix according to equation (18)
Figure DEST_PATH_IMAGE151
Until the objective function value of the problem (13) converges, an objective projection matrix is obtained
Figure DEST_PATH_IMAGE152
S95, the electronic equipment utilizes the target projection matrix
Figure DEST_PATH_IMAGE153
Updating the optimized neighbor matrix according to equation (12)
Figure DEST_PATH_IMAGE154
Obtaining a target neighbor matrix
Figure DEST_PATH_IMAGE155
S96, the electronic equipment projects the matrix according to the latest target
Figure DEST_PATH_IMAGE156
And target neighbor matrix
Figure DEST_PATH_IMAGE157
And judging whether the objective function value corresponding to the problem (7) converges. If the convergence is reached, executing steps S97-S98, otherwise executing step S99, returning to step S93, and re-executing steps S93-S96.
S97, electronic equipment output target projection matrix
Figure DEST_PATH_IMAGE158
As the optimal solution.
And S98, the electronic equipment performs dimension reduction on the high-dimensional sample data according to the optimal solution.
S99, the electronic equipment projects the target to a matrix
Figure DEST_PATH_IMAGE159
As a new current projection matrix
Figure DEST_PATH_IMAGE160
And neighbor the target to the matrix
Figure DEST_PATH_IMAGE161
As a new current neighbor matrix
Figure DEST_PATH_IMAGE162
Therefore, by introducing the neighbor weight, the embodiment of the invention can consider the local structure of the sample data, can better adapt to the real world data set, and simultaneously utilizes the alternate mutual training optimization between the projection matrix and the neighbor matrix, namely, the newly learned projection matrix is utilized to project the high-dimensional sample data to the optimal subspace, so as to utilize the data of the optimal subspace to construct a new neighbor matrix. In addition, the embodiment of the invention provides a minimized ratio and a model, so that the searching strategy is better than the maximized ratio and the model, and the accuracy is further improved.
In order to verify the effectiveness of the embodiment of the invention, two experiments are designed and compared with the existing mainstream dimension reduction algorithm, and the existing mainstream dimension reduction algorithm comprises the following steps: LDA, DNM, GRS, LFDA, LSDA, DMEG, ANLRSLDA, and the algorithm proposed by the invention is abbreviated as LRSDAASG.
Experiment one: the local retention between different algorithms is verified by a synthetic three-ring dataset. Specifically, the synthesized data is two-dimensional data of three concentric circles, to which 50 noise dimensions are artificially added. Then, learning a projection matrix by using different dimension reduction algorithms, projecting the synthesized data to a corresponding space, and performing visual display, which is specifically shown in fig. 2 to 9. It can be concluded that the dimension reduction algorithms (LDA, DNM, GRS) of fig. 2 to 4 all lack the ability of local maintenance, can only consider the overall characteristics of the sample, and suffer from the interference of noise dimension seriously, and thus are not well suited for real-world datasets. The dimension reduction algorithms (LFDA, LSDA, DMEG, ANLRSLDA) of fig. 5 to 8, although having the capability of considering the local structure, can only roughly see that the samples are concentric circles, whereas the algorithm of the present invention (LRSDAASG) shown in fig. 9 better retains the local structure of the samples.
Experiment two: experiments were performed on real world datasets to verify the validity of kicked-out algorithms. The UMIST dataset is used as the reference dataset. Specifically, 40% of samples in the data set are randomly selected as a training set to learn a projection matrix, and then the rest of samples are projected to the learned subspace. And then using the 1NN classifier as accuracy verification. This experiment was repeated ten times, with the results shown in table 1 below:
TABLE 1 Performance comparison of the algorithm of the present invention with the existing mainstream dimensionality reduction algorithm
Figure DEST_PATH_IMAGE163
As shown in fig. 10, the embodiment of the present invention discloses a data dimension reduction device based on local ratio and linear discriminant analysis (hereinafter referred to as data dimension reduction device), which includes an initialization unit 101, a first updating unit 102, a training unit 103, a second updating unit 104, a substitution unit 105, an output unit 106, a processing unit 107, and a circulation unit 108, wherein,
an initialization unit 101, configured to initialize according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit 102 is configured to update the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit 103 is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit 104 is configured to update the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substituting unit 105, configured to substitute the target projection matrix and the target neighbor matrix into the local ratio and the model;
an output unit 106, configured to, after the substituting unit 105 substitutes the target projection matrix and the target neighbor matrix into the local ratio and the model, output the target projection matrix as an optimal solution if the substituted local ratio and the model are converged;
and the processing unit 107 is configured to perform dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
As an alternative implementation, the initialization unit 101 may include the following sub-units not shown in the figure:
the first allocation subunit is used for performing neighbor weight allocation according to high-dimensional sample data to obtain a current neighbor matrix;
and the generation subunit is used for randomly generating an initial projection matrix and taking the initial projection matrix as the current projection matrix.
Optionally, the data dimension reduction apparatus may further include the following sub-units, not shown:
the acquisition unit is used for carrying out neighbor weight allocation on the first allocation subunit according to the high-dimensional sample data and acquiring a plurality of original sample data and label information thereof before acquiring the current neighbor matrix;
and the preprocessing unit is used for sequencing and normalizing the plurality of original sample data according to the label information to obtain high-dimensional sample data.
As an optional implementation manner, the data dimension reduction apparatus may further include a circulation unit 108, configured to, after the substitution unit 105 substitutes the target projection matrix and the target neighbor matrix into the local ratio and the model, if the substituted local ratio and the model are not converged, take the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix, and trigger the first updating unit 102 to repeatedly perform an operation of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix until the local ratio and the model are converged.
Further optionally, the first updating unit 102 may include the following sub-units not shown in the figure:
the projection shadow unit is used for projecting the high-dimensional sample data to a subspace by utilizing the current projection matrix to obtain the subspace sample data;
and the second sub-allocation subunit is used for performing neighbor weight reallocation on the current neighbor matrix according to the subspace sample data to obtain an optimized neighbor matrix.
As shown in fig. 11, an embodiment of the present invention discloses an electronic device, which includes a memory 1101 storing executable program codes and a processor 1102 coupled to the memory 1101;
the processor 1102 calls the executable program code stored in the memory 1101 to execute the data dimension reduction method based on the local ratio and linear discriminant analysis described in the above embodiments.
The embodiment of the invention also discloses a computer readable storage medium which stores a computer program, wherein the computer program enables a computer to execute the data dimension reduction method based on the local ratio and the linear discriminant analysis described in the embodiments.
The purpose of the above embodiments is to make an exemplary reproduction and derivation of the technical solutions of the present invention, and to fully describe the technical solutions, objects and effects of the present invention, so as to make the public more thoroughly and comprehensively understand the disclosure of the present invention, and not to limit the protection scope of the present invention.
The above examples are not intended to be exhaustive of the invention and there may be many other embodiments not listed. Any alterations and modifications without departing from the spirit of the invention are within the scope of the invention.

Claims (10)

1. The data dimension reduction method based on the local ratio and the linear discriminant analysis is characterized by comprising the following steps of:
initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
training a current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
substituting the target projection matrix and the target neighbor matrix into a local ratio and a model;
if the substituted local ratio and the model are converged, outputting the target projection matrix as an optimal solution;
and performing dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
2. The data dimension reduction method of claim 1, wherein after substituting the target projection matrix and the target neighbor matrix into local ratio values and models, the method further comprises:
if the substituted local ratio and the model are not converged, taking the target projection matrix as a new current projection matrix and taking the target neighbor matrix as a new current neighbor matrix, and repeatedly executing the step of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix until the local ratio and the model are converged.
3. The method of claim 1, wherein updating the current neighbor matrix to obtain an optimized neighbor matrix according to the current projection matrix and the high-dimensional sample data comprises:
projecting the high-dimensional sample data to a subspace by using a current projection matrix to obtain subspace sample data;
and performing neighbor weight redistribution on the current neighbor matrix according to the subspace sample data to obtain an optimized neighbor matrix.
4. The method according to any one of claims 1 to 3, wherein initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix, comprises:
performing neighbor weight distribution according to high-dimensional sample data to obtain a current neighbor matrix;
and randomly generating an initial projection matrix, and taking the initial projection matrix as a current projection matrix.
5. The method of claim 4, wherein before performing neighbor weight assignment according to high-dimensional sample data to obtain a current neighbor matrix, the method further comprises:
acquiring a plurality of original sample data and label information thereof;
and sequencing and normalizing the plurality of original sample data according to the label information to obtain high-dimensional sample data.
6. A method for data dimensionality reduction according to any one of claims 1 to 3, wherein the local ratio and model are represented by the following formula:
Figure DEST_PATH_IMAGE001
in the formula,
Figure DEST_PATH_IMAGE002
which represents the total number of samples,
Figure DEST_PATH_IMAGE003
respectively represent the second in the total sample
Figure DEST_PATH_IMAGE004
The number of the data is set to be,
Figure DEST_PATH_IMAGE005
the number of sample categories is indicated and,
Figure DEST_PATH_IMAGE006
is a first
Figure DEST_PATH_IMAGE007
The total number of class samples is,
Figure DEST_PATH_IMAGE008
are respectively the first
Figure DEST_PATH_IMAGE009
Class II sample of
Figure DEST_PATH_IMAGE010
A piece of data;
Figure DEST_PATH_IMAGE011
is the total number of the projection directions,
Figure DEST_PATH_IMAGE012
representing a projection matrix of column vectors
Figure DEST_PATH_IMAGE013
Represents the first
Figure DEST_PATH_IMAGE014
The direction of the projection is determined by the direction of the projection,
Figure DEST_PATH_IMAGE015
Figure DEST_PATH_IMAGE016
in (1)
Figure DEST_PATH_IMAGE017
As a unit matrix, superscript
Figure DEST_PATH_IMAGE018
Is a transposed symbol;
Figure DEST_PATH_IMAGE019
represents the first
Figure DEST_PATH_IMAGE020
Class II specimen
Figure DEST_PATH_IMAGE021
Data and the second
Figure DEST_PATH_IMAGE022
The weight between the individual pieces of data,
Figure DEST_PATH_IMAGE023
is shown as
Figure DEST_PATH_IMAGE024
Class II specimen
Figure DEST_PATH_IMAGE025
Data and the second
Figure DEST_PATH_IMAGE026
The weight between each data.
7. Data dimension reduction device based on local ratio and linear discriminant analysis is characterized by comprising:
the initialization unit is used for initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit is used for updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit is used for updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substitution unit for substituting the target projection matrix and the target neighbor matrix into a local ratio and a model;
the output unit is used for outputting the target projection matrix as an optimal solution when the substituted local ratio and the model are converged;
and the processing unit is used for carrying out dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
8. The data dimension reduction apparatus of claim 7, further comprising:
and the circulating unit is used for taking the target projection matrix as a new current projection matrix and taking the target neighbor matrix as a new current neighbor matrix when the substituted local ratio and the model are not converged, and triggering the first updating unit to repeatedly execute the operation of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix until the local ratio and the model are converged.
9. An electronic device comprising a memory storing executable program code and a processor coupled to the memory; the processor calls the executable program code stored in the memory for performing the data dimension reduction method based on local ratio and linear discriminant analysis of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program causes a computer to execute the method for data dimension reduction based on local ratio and linear discriminant analysis according to any one of claims 1 to 6.
CN202211547519.2A 2022-12-05 2022-12-05 Data dimension reduction method and device based on local ratio and linear discriminant analysis Pending CN115563492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211547519.2A CN115563492A (en) 2022-12-05 2022-12-05 Data dimension reduction method and device based on local ratio and linear discriminant analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211547519.2A CN115563492A (en) 2022-12-05 2022-12-05 Data dimension reduction method and device based on local ratio and linear discriminant analysis

Publications (1)

Publication Number Publication Date
CN115563492A true CN115563492A (en) 2023-01-03

Family

ID=84769665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211547519.2A Pending CN115563492A (en) 2022-12-05 2022-12-05 Data dimension reduction method and device based on local ratio and linear discriminant analysis

Country Status (1)

Country Link
CN (1) CN115563492A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116386830A (en) * 2023-04-10 2023-07-04 山东博鹏信息科技有限公司 Hospital management system based on big data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116386830A (en) * 2023-04-10 2023-07-04 山东博鹏信息科技有限公司 Hospital management system based on big data
CN116386830B (en) * 2023-04-10 2023-09-22 山东博鹏信息科技有限公司 Hospital management system based on big data

Similar Documents

Publication Publication Date Title
Liu et al. Multiple kernel $ k $ k-means with incomplete kernels
Minaei-Bidgoli et al. Ensembles of partitions via data resampling
Zhu et al. Unsupervised spectral feature selection with dynamic hyper-graph learning
He et al. Unsupervised cross-modal retrieval through adversarial learning
Landgrebe et al. Efficient multiclass ROC approximation by decomposition via confusion matrix perturbation analysis
He et al. Kernel K-means sampling for Nyström approximation
Minaei-Bidgoli et al. Effects of resampling method and adaptation on clustering ensemble efficacy
Liu et al. Learning instance correlation functions for multilabel classification
Qin et al. Compressive sequential learning for action similarity labeling
Ali et al. Modeling global geometric spatial information for rotation invariant classification of satellite images
Zhang et al. An efficient framework for unsupervised feature selection
US7062504B2 (en) Creating ensembles of oblique decision trees with evolutionary algorithms and sampling
CN115563492A (en) Data dimension reduction method and device based on local ratio and linear discriminant analysis
CN111476100A (en) Data processing method and device based on principal component analysis and storage medium
Zhao et al. Joint adaptive graph learning and discriminative analysis for unsupervised feature selection
CN116229179A (en) Dual-relaxation image classification method based on width learning system
CN115080749A (en) Weak supervision text classification method, system and device based on self-supervision training
CN108108769A (en) Data classification method and device and storage medium
Niemelä et al. Toolbox for distance estimation and cluster validation on data with missing values
CN115526268A (en) Method and device for selecting characteristics for identifying sensitive information
CN116092138A (en) K neighbor graph iterative vein recognition method and system based on deep learning
Wang et al. A fuzzy consensus clustering based undersampling approach for class imbalanced learning
Zhao et al. Multiclass discriminant analysis via adaptive weighted scheme
Yin et al. Learning a representation with the block-diagonal structure for pattern classification
Lu et al. Combining multiple clusterings using fast simulated annealing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20230103