CN115563492A - Data dimension reduction method and device based on local ratio and linear discriminant analysis - Google Patents
Classifications
- G06F17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The invention belongs to the technical field of data processing and discloses a data dimension reduction method and device based on local ratio-sum linear discriminant analysis. A current neighbor matrix and a current projection matrix are initialized from high-dimensional sample data; the current neighbor matrix is then updated according to the current projection matrix to obtain an optimized neighbor matrix; the current projection matrix is trained according to the optimized neighbor matrix to obtain a target projection matrix; finally, the optimized neighbor matrix is updated using the target projection matrix to obtain a target neighbor matrix. If the pre-constructed local ratio-sum model converges, the target projection matrix is determined to be the optimal solution and is used for data dimension reduction. By introducing neighbor weights, the method takes the local structure of the sample data into account and adapts better to real-world datasets; at the same time, the alternating mutual training between the projection matrix and the neighbor matrix reduces the influence of high-dimensional noise on the neighbor matrix, further improving the dimension reduction effect.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data dimension reduction method, device, equipment, and storage medium based on local ratio-sum linear discriminant analysis.
Background
With the development of science and technology, the sampling precision of modern sensors keeps improving, and the dimensionality of the sampled data grows accordingly. However, the discrimination performance of a classifier is not always positively correlated with the sample dimensionality: beyond a critical point, further increasing the dimensionality degrades the classifier's performance, the well-known "Hughes effect", while higher dimensionality also makes the computational cost of the classifier grow exponentially.
To solve the above problem, many scholars propose using a dimension reduction algorithm to map data points into a low-dimensional subspace and find a representation with optimal discriminative performance. Dimension reduction algorithms can be subdivided into feature selection and feature extraction. A feature selection algorithm only searches for an optimal subset of the original features; the subset is contained in the original feature set, and the original feature space is unchanged. A feature extraction algorithm instead seeks optimal projection directions through a linear transformation, thereby changing the original feature space.
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are the most popular feature extraction dimension reduction algorithms in the unsupervised and supervised fields, respectively. Unsupervised PCA aims to find a projection that retains as much variance information as possible. Supervised LDA introduces label information to find a projection space that simultaneously minimizes intra-class distances and maximizes inter-class distances. LDA was first proposed only for the binary classification problem; its later extension to multi-class tasks has brought it increasing attention.
Unfortunately, the original LDA is a trace-ratio problem (first take the traces, then take their ratio), which is difficult to solve directly in closed form. Some scholars therefore convert it into a ratio-trace problem (first take the ratio, then the trace) to solve for the projection matrix. However, such algorithms cannot account for the local structure of the data and adapt poorly to real-world datasets; moreover, the solution obtained through this conversion is suboptimal, so the optimal projection matrix cannot be obtained.
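For concreteness, the ratio-trace relaxation mentioned above admits a closed-form solution through an eigendecomposition of Sw^{-1}Sb; the following is a minimal NumPy sketch of that conventional construction, purely illustrative and not the method claimed here (the small ridge added to Sw is our assumption so the inverse exists with few samples):

```python
import numpy as np

def lda_ratio_trace(X, y, m):
    """Ratio-trace LDA: take the top-m eigenvectors of Sw^{-1} Sb.
    X: (d, n) data with samples as columns, y: (n,) integer labels,
    m: subspace dimension."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((d, d))   # between-class scatter
    Sw = np.zeros((d, d))   # within-class scatter
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T
        Sw += (Xc - mc) @ (Xc - mc).T
    # Ridge on Sw (our addition) keeps the inverse well defined.
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw + 1e-6 * np.eye(d)) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:m]].real

rng = np.random.default_rng(0)
X = np.hstack([rng.normal([0, 0, 0], 0.1, size=(50, 3)).T,
               rng.normal([1, 1, 0], 0.1, size=(50, 3)).T])
y = np.array([0] * 50 + [1] * 50)
W = lda_ratio_trace(X, y, 1)   # one discriminant direction
```

On this toy two-class data, the learned direction separates the projected class means, which is exactly the global criterion the patent argues is insufficient for data with local structure.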
At present, scholars have also tried converting the trace-ratio problem into a ratio-sum problem (first take each ratio, then sum them). For example, the prior art proposes an adaptive-neighbor local ratio-sum linear discriminant analysis algorithm that maximizes the ratio sum and solves it with a greedy algorithm; it introduces a locality concept and assigns weights through a parameter-free strategy so as to obtain a better projection matrix. In practice, however, the graph constructed by this algorithm is built from the original data, and the original space contains substantial noise, so the constructed affinity matrix is corrupted by that noise and remains suboptimal.
Disclosure of Invention
The invention aims to provide a data dimension reduction method, device, equipment, and storage medium based on local ratio-sum linear discriminant analysis that can take the local structure of the samples into account, adapt better to real-world datasets, and reduce the noise influence of the original data, thereby further improving the dimension reduction effect.
A first aspect of the invention discloses a data dimension reduction method based on local ratio-sum linear discriminant analysis, which comprises the following steps:
initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
training a current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
if the local ratio-sum model converges after the substitution, outputting the target projection matrix as an optimal solution;
and performing dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
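The claimed steps form an alternating optimization loop. The following schematic Python sketch shows only that control flow; every function parameter is a placeholder we introduce for illustration, not part of the claims:

```python
def local_ratio_sum_reduce(X, init_S, init_W, update_S, update_W, objective,
                           tol=1e-5, max_iter=100):
    """Alternately optimize the neighbor matrix S and projection matrix W
    until the local ratio-sum objective stops changing."""
    S, W = init_S(X), init_W(X)        # initialize from the sample data
    prev = float("inf")
    for _ in range(max_iter):
        S = update_S(X, W)             # update neighbor matrix, W fixed
        W = update_W(X, S)             # train projection matrix, S fixed
        S = update_S(X, W)             # refresh neighbor matrix with new W
        obj = objective(X, S, W)       # evaluate the local ratio-sum model
        if abs(obj - prev) < tol:      # converged: W is the optimal solution
            break
        prev = obj                     # otherwise carry matrices to next pass
    return W
```

The returned projection matrix is then applied to the high-dimensional samples to obtain the low-dimensional representation.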
A second aspect of the present invention discloses a data dimension reduction device based on local ratio-sum linear discriminant analysis, which includes:
the initialization unit is used for initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit is used for updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit is used for updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
the substitution unit is used for substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
the output unit is used for outputting the target projection matrix as an optimal solution when the substituted local ratio-sum model converges;
and the processing unit is used for carrying out dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
A third aspect of the invention discloses an electronic device comprising a memory storing executable program code and a processor coupled to the memory; the processor calls the executable program code stored in the memory to execute the data dimension reduction method based on local ratio-sum linear discriminant analysis disclosed in the first aspect.
A fourth aspect of the present invention discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the data dimension reduction method based on local ratio-sum linear discriminant analysis disclosed in the first aspect.
Compared with the prior art, the method, device, equipment, and storage medium for data dimension reduction based on local ratio-sum linear discriminant analysis have the following beneficial effects: a current neighbor matrix and a current projection matrix are initialized from high-dimensional sample data; the current neighbor matrix is updated according to the current projection matrix to obtain an optimized neighbor matrix; the current projection matrix is trained according to the optimized neighbor matrix to obtain a target projection matrix; and the optimized neighbor matrix is updated with the target projection matrix to obtain a target neighbor matrix. If the target projection matrix and the target neighbor matrix make the pre-constructed local ratio-sum model converge, the target projection matrix is determined to be the optimal solution, which is finally used for data dimension reduction. By introducing neighbor weights, the method takes the local structure of the sample data into account and adapts better to real-world datasets; meanwhile, through alternating mutual training between the projection matrix and the neighbor matrix, the neighbor matrix is iteratively optimized alongside the projection matrix, which reduces the noise influence of the high-dimensional data on the neighbor matrix and further improves the dimension reduction effect.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the principles and effects of the invention.
Unless otherwise specified or defined, the same reference numerals in different figures represent the same or similar technical features, and different reference numerals may be used for the same or similar technical features.
FIG. 1 is a flow chart of a data dimension reduction method based on local ratio-sum linear discriminant analysis according to an embodiment of the present invention;
FIG. 2 is a two-dimensional visualization of the prior LDA algorithm on the synthetic three-ring dataset;
FIG. 3 is a two-dimensional visualization of the prior art Decomposed Newton's Method (DNM) algorithm on the synthetic three-ring dataset;
FIG. 4 is a two-dimensional visualization of the prior art Greedy Ratio Sum (GRS) algorithm on the synthetic three-ring dataset;
FIG. 5 is a two-dimensional visualization of the prior art Local Fisher Discriminant Analysis (LFDA) algorithm on the synthetic three-ring dataset;
FIG. 6 is a two-dimensional visualization of the prior art Locality Sensitive Discriminant Analysis (LSDA) algorithm on the synthetic three-ring dataset;
FIG. 7 is a two-dimensional visualization of the prior art Dynamic Maximum Entropy Graph (DMEG) algorithm on the synthetic three-ring dataset;
FIG. 8 is a two-dimensional visualization of the prior art Adaptive Neighbor Local Ratio Sum Linear Discriminant Analysis (ANLRSLDA) algorithm on the synthetic three-ring dataset;
FIG. 9 is a two-dimensional visualization of the Local Ratio-Sum Discriminant Analysis with Adaptive Subspace Graph (LRSDAASG) algorithm proposed by the present invention on the synthetic three-ring dataset;
FIG. 10 is a schematic structural diagram of a data dimension reduction apparatus based on local ratio-sum linear discriminant analysis according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Description of the reference numerals:
101. an initialization unit; 102. a first update unit; 103. a training unit; 104. a second updating unit; 105. a substitution unit; 106. an output unit; 107. a processing unit; 108. a circulation unit; 1101. a memory; 1102. a processor.
Detailed Description
Unless specifically stated or otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Where the technical solutions of the invention are applied in a realistic scenario, the terms used herein may also carry meanings consistent with achieving the purpose of those solutions. As used herein, "first" and "second" merely distinguish names and do not denote a particular quantity or order. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Unless otherwise specified or defined, the terms "comprises," "comprising," and "including" are used interchangeably and indicate an open-ended inclusion: the recited elements are present, but additional elements are not excluded.
Technical content or features contrary to, or clearly contradicting, the object of the present invention are of course excluded. To facilitate understanding of the invention, specific embodiments are described in more detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention discloses a data dimension reduction method based on local ratio-sum linear discriminant analysis, which may be implemented by computer programming. The execution subject of the method may be an electronic device such as a computer, a notebook computer, or a tablet computer, or a data dimension reduction device embedded in such an electronic device; the invention is not limited in this respect. In this embodiment, an electronic device is taken as the example. The method comprises the following steps S10-S80:
and S10, initializing the electronic equipment according to the high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix.
In the embodiment of the invention, an initial projection matrix can be randomly generated as the current projection matrix; the current neighbor matrix can likewise be randomly generated, or it can be constructed from the high-dimensional sample data of the original space.
Constructing the current neighbor matrix from the high-dimensional sample data of the original space specifically includes: assigning neighbor weights according to the original-space high-dimensional sample data so as to initialize the neighbor matrix, thereby obtaining the current neighbor matrix.
Before executing step S10, the electronic device may further obtain a plurality of original sample data and their label information, then sort and normalize the original sample data according to the label information to obtain the high-dimensional sample data.
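This preprocessing can be sketched as follows; the patent does not specify the normalization, so the min-max scaling below is our assumption:

```python
import numpy as np

def preprocess(X, y):
    """Sort samples by label, then min-max normalize each feature to [0, 1].
    X: (n, d) raw samples, y: (n,) labels. Returns the high-dimensional
    sample matrix with rows grouped by class, plus the sorted labels."""
    order = np.argsort(y, kind="stable")   # group rows class by class
    X, y = X[order], y[order]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard constant features
    return (X - lo) / span, y
```

Sorting by label up front lets the later neighbor matrix take the block-diagonal form used in the derivation.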
It should be noted that, in the embodiment of the invention, the local ratio-sum model is constructed in advance. The local ratio-sum model may be a local maximized-ratio-sum model or a local minimized-ratio-sum model; the minimized form is preferred, since its search strategy is better than that of the maximized form. The derivation of the local minimized-ratio-sum model comprises the following steps S01-S02:
S01, minimized Ratio Sum algorithm
Let the high-dimensional sample data be $X=[x_{1},\dots,x_{n}]\in\mathbb{R}^{d\times n}$, where $n$ is the total number of samples and $d$ is the feature dimension, and assume the samples have been sorted by label. The low-dimensional sample data $Y\in\mathbb{R}^{m\times n}$ is a low-dimensional representation of $X$, where $m$ is the subspace dimension (i.e. there are $m$ projection directions in total). Then:

$$Y=W^{T}X \qquad (1)$$

where $W^{T}$ is the transpose of $W$, and $W\in\mathbb{R}^{d\times m}$ is the projection matrix converting the high-dimensional sample data $X$ into the low-dimensional sample data $Y$.
Some existing dimension reduction algorithms can be written as the following Trace Ratio problem:

$$\max_{W^{T}W=I}\ \frac{\operatorname{tr}\!\left(W^{T}S_{b}W\right)}{\operatorname{tr}\!\left(W^{T}S_{w}W\right)} \qquad (2)$$

If $S_{b}$ is the inter-class divergence matrix and $S_{w}$ the intra-class divergence matrix, formula (2) is the objective function of LDA; its optimization goal is to bring same-class data as close as possible and push different-class data as far apart as possible after projection. But the Trace Ratio objective may yield some poor projection directions. Therefore the Ratio Sum algorithm was proposed, with the specific form:

$$\max_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{w_{k}^{T}S_{b}w_{k}}{w_{k}^{T}S_{w}w_{k}} \qquad (3)$$

However, maximizing this problem still has drawbacks. The preferred option of the invention is therefore to invert each ratio term and solve a minimization problem instead:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{w_{k}^{T}S_{w}w_{k}}{w_{k}^{T}S_{b}w_{k}} \qquad (4)$$
S02, adaptive subspace graph
To complete the derivation, formula (4) is converted to vector form, i.e. formula (5) is obtained:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}} \qquad (5)$$

where $n$ is the total number of samples; $x_{i}$ and $x_{j}$ are the $i$-th and $j$-th samples overall; $C$ is the number of sample classes; $n_{c}$ is the total number of samples in class $c$; $x_{i}^{c}$ and $x_{j}^{c}$ are the $i$-th and $j$-th samples of class $c$; $m$ is the total number of projection directions; $W$ is the projection matrix, whose column vector $w_{k}$ is the $k$-th projection direction; $W^{T}W=I$, where $I$ is the identity matrix and the superscript $T$ is the transpose symbol.
Formula (5) means finding an optimal projection matrix $W$ such that the Euclidean distances between same-class samples are as small as possible while the distances between all sample pairs are as large as possible.
In order to let the model take the local structure into account, a weighting factor $s_{ij}^{c}$ is added to the same-class distance terms, giving the local minimized-ratio-sum model:

$$\min_{W^{T}W=I,\ S}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}} \qquad (6)$$

where $s_{ij}^{c}$ denotes the weight between the $i$-th and $j$-th samples of class $c$. When $x_{i}^{c}$ and $x_{j}^{c}$ are close, $s_{ij}^{c}$ is large, so formula (6) requires both that same-class samples stay close after projection and that the weight penalty remain as small as possible.
Further preferably, considering that formula (6) may have a trivial solution, a maximum-entropy regularization term constraint may be added, giving:

$$\min_{W^{T}W=I,\ S}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}}+\gamma\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\ln s_{ij}^{c} \qquad (7)$$

where $\gamma$ is the regularization parameter.
Based on this, executing steps S10 to S60 of the embodiment of the invention actually solves problem (7).
Problem (7) is an NP-hard problem and a polynomial-time solution cannot be obtained directly. The embodiment of the invention provides a new optimization method. First, a neighbor matrix $S$ is given; $S$ is the block-diagonal matrix formed by the per-class weight blocks $S^{c}$:

$$S=\begin{bmatrix}S^{1} & & \\ & \ddots & \\ & & S^{C}\end{bmatrix} \qquad (8)$$

In order to solve problem (7) smoothly, a projection matrix is randomly initialized as the current projection matrix $W$, and the original sample data are used to assign weights to the elements of the neighbor matrix, i.e. the following formula (9) is computed, thereby obtaining the initialized neighbor matrix as the current neighbor matrix:

$$s_{ij}^{c}=\frac{\exp\!\left(-\bigl\|x_{i}^{c}-x_{j}^{c}\bigr\|_{2}^{2}/\gamma\right)}{\sum_{j'\in N_{k}(x_{i}^{c})}\exp\!\left(-\bigl\|x_{i}^{c}-x_{j'}^{c}\bigr\|_{2}^{2}/\gamma\right)},\qquad j\in N_{k}\bigl(x_{i}^{c}\bigr) \qquad (9)$$

where $N_{k}(x_{i}^{c})$ denotes the set of $k$ nearest same-class neighbors of $x_{i}^{c}$, and $s_{ij}^{c}$ is the weight factor, i.e. an element of the neighbor matrix, describing the relationship between samples $x_{i}^{c}$ and $x_{j}^{c}$; when $x_{i}^{c}$ and $x_{j}^{c}$ are the same sample, $s_{ij}^{c}$ is set to 0.
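A sketch of this initialization under our reading of formula (9), with heat-kernel weights over each sample's k nearest same-class neighbors and rows normalized to sum to one (the exact kernel form is an assumption):

```python
import numpy as np

def init_neighbor_weights(X, y, k, gamma=1.0):
    """Initialize the block-structured neighbor matrix from the original
    samples: row i holds heat-kernel weights over the k nearest same-class
    neighbors of sample i, normalized to sum to 1.
    X: (n, d) samples sorted by label, y: (n,) labels."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        d2 = ((X[same] - X[i]) ** 2).sum(axis=1)   # squared distances
        nn = same[np.argsort(d2)[:k]]              # k nearest same-class
        w = np.exp(-np.sort(d2)[:k] / gamma)
        S[i, nn] = w / w.sum()
    return S
```

Because weights are assigned only within each class, the resulting matrix has exactly the block-diagonal structure of formula (8).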
and S20, the electronic equipment updates the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix.
In step S20, the electronic device may specifically fix the current projection matrix $W$ and update the neighbor matrix $S$: the high-dimensional sample data are projected into the subspace through $W$ to obtain subspace sample data, and neighbor weights are re-assigned to $S$ according to the subspace sample data, yielding the optimized neighbor matrix. With $W$ fixed, problem (7) reduces, for each sample $x_{i}^{c}$, to:

$$\min_{\sum_{j}s_{ij}^{c}=1}\ \sum_{j} s_{ij}^{c}\,d_{ij}+\gamma\,s_{ij}^{c}\ln s_{ij}^{c} \qquad (10)$$

where $d_{ij}$ collects the projected squared-distance terms of $x_{i}^{c}$ and $x_{j}^{c}$ in problem (7) with $W$ fixed. The corresponding Lagrange function is:

$$\mathcal{L}=\sum_{j}\bigl(s_{ij}^{c}d_{ij}+\gamma\,s_{ij}^{c}\ln s_{ij}^{c}\bigr)+\lambda\Bigl(\sum_{j}s_{ij}^{c}-1\Bigr) \qquad (11)$$

Setting its derivative with respect to $s_{ij}^{c}$ to zero gives:

$$s_{ij}^{c}=\frac{\exp\!\left(-d_{ij}/\gamma\right)}{\sum_{j'}\exp\!\left(-d_{ij'}/\gamma\right)} \qquad (12)$$

Therefore, using formula (12), the weights of the $k$ nearest same-class neighbors of each sample are re-assigned, updating the current neighbor matrix into the optimized neighbor matrix.
Comparing formula (9) with formula (12): formula (9) assigns weights from the Euclidean distances between the original sample points, a strategy susceptible to interference from redundant features in the original sample data; formula (12) instead computes the Euclidean distances between the subspace samples obtained through the current projection matrix, which avoids this problem well.
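Under the same assumptions as the initialization sketch, the update of formula (12) differs only in that distances are measured in the projected subspace:

```python
import numpy as np

def update_neighbor_weights(X, y, W, k, gamma=1.0):
    """Re-assign neighbor weights using distances in the subspace
    Y = X W, which suppresses noise carried by redundant original
    features. X: (n, d) samples, W: (d, m) current projection matrix."""
    Y = X @ W
    n = Y.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        d2 = ((Y[same] - Y[i]) ** 2).sum(axis=1)   # subspace distances
        nn = same[np.argsort(d2)[:k]]
        w = np.exp(-np.sort(d2)[:k] / gamma)
        S[i, nn] = w / w.sum()
    return S
```

In the usage below, the second feature is pure noise; a projection that keeps only the clean feature still recovers the true nearest neighbor, which is the point of re-weighting in the subspace.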
And S30, the electronic equipment trains the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix.
After the neighbor matrix has been updated into the optimized neighbor matrix with the current projection matrix fixed, the optimized neighbor matrix can in turn be fixed and the current projection matrix trained to obtain the target projection matrix. With $S$ fixed, the entropy term of problem (7) is constant and the problem reduces to:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{\sum_{c=1}^{C}\sum_{i,j=1}^{n_{c}} s_{ij}^{c}\left(w_{k}^{T}\bigl(x_{i}^{c}-x_{j}^{c}\bigr)\right)^{2}}{\sum_{i,j=1}^{n}\left(w_{k}^{T}\bigl(x_{i}-x_{j}\bigr)\right)^{2}} \qquad (13)$$

The numerator equals $2\,w_{k}^{T}Aw_{k}$ with $A=\sum_{c}X^{c}\bigl(D^{c}-S^{c}\bigr)\bigl(X^{c}\bigr)^{T}$, where $D^{c}$ is a diagonal matrix whose diagonal elements are $d_{ii}^{c}=\sum_{j}s_{ij}^{c}$; the denominator equals $2n\,w_{k}^{T}Bw_{k}$ with $B=X\bigl(I-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{T}\bigr)X^{T}$, which is equivalent (up to a factor) to the covariance matrix of the samples. Formula (13) can therefore be rewritten as:

$$\min_{W^{T}W=I}\ \sum_{k=1}^{m}\frac{w_{k}^{T}Aw_{k}}{n\,w_{k}^{T}Bw_{k}} \qquad (14)$$

Comparing with formula (4), $A$ plays the role of the intra-class divergence matrix and $B$ that of the total divergence. Suppose all projection directions except $w_{k}$ are already optimal; it remains to solve for $w_{k}$, specifically:

$$\min_{w_{k}^{T}w_{k}=1}\ \frac{w_{k}^{T}Aw_{k}}{w_{k}^{T}Bw_{k}} \qquad (15)$$

Writing $\rho_{k}$ for the current value of this ratio, formula (15) is then equivalent to:

$$\min_{w_{k}^{T}w_{k}=1}\ w_{k}^{T}\bigl(A-\rho_{k}B\bigr)w_{k} \qquad (16)$$

The corresponding Lagrange function is:

$$\mathcal{L}\bigl(w_{k},\eta\bigr)=w_{k}^{T}\bigl(A-\rho_{k}B\bigr)w_{k}-\eta\bigl(w_{k}^{T}w_{k}-1\bigr) \qquad (17)$$

Setting its derivative with respect to $w_{k}$ to zero yields the eigenproblem:

$$\bigl(A-\rho_{k}B\bigr)w_{k}=\eta\,w_{k} \qquad (18)$$

whose minimum eigenvalue corresponds to the desired eigenvector $w_{k}$. The directions are solved alternately and updated iteratively until the objective function value of problem (13) converges, thereby obtaining the target projection matrix.
And S40, the electronic equipment updates the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain the target neighbor matrix.
After the target projection matrix has been trained from the optimized neighbor matrix, the target projection matrix can in turn be used to update the optimized neighbor matrix, yielding the target neighbor matrix; it is then determined whether problem (7) has converged, i.e. step S50 is performed.
And S50, the electronic device substitutes the target projection matrix and the target neighbor matrix into the local ratio-sum model to judge whether it has converged. If so, steps S60-S70 are executed; otherwise, step S80 is executed and the flow returns to step S20.
All of the above criteria for judging whether the objective function value has converged take the form:

$$\bigl|J_{t}-J_{t-1}\bigr|<\varepsilon \qquad (19)$$

where $J_{t-1}$ is the objective function value of the previous iteration, $J_{t}$ that of the current iteration, and $\varepsilon$ the convergence threshold, which is usually set empirically to a small value in some embodiments.
And S60, the electronic equipment outputs the target projection matrix as an optimal solution.
And S70, the electronic equipment performs dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
S80, the electronic device takes the target projection matrix as the new current projection matrix and the target neighbor matrix as the new current neighbor matrix, and returns to step S20 until the local ratio-sum model converges.
Here, the "current" matrices produced by the previous iteration are replaced by those produced in this iteration; the initial current matrices are obtained from the corresponding initialization formulas.
To sum up, by alternately fixing one of the neighbor matrix and the projection matrix and updating the other, until the objective function value of problem (7) converges, the optimal target projection matrix, i.e. the optimal solution, is obtained.
Examples
The specific algorithm is shown in the following steps S91-S99:
S91, the electronic device acquires the original sample data and its label information, the neighbor number $k$, the subspace dimension $m$, and the regularization parameter $\gamma$, all input by the user.
S92, the electronic device sorts and normalizes the original sample data according to the label information to obtain the high-dimensional sample data, randomly initializes a projection matrix $W$ satisfying $W^{T}W=I$, and initializes the neighbor matrix $S$ using formula (9).
S93, the electronic device updates the current neighbor matrix according to formula (12) using the current projection matrix, obtaining the optimized neighbor matrix.
S94, the electronic device trains the current projection matrix according to formula (18) using the latest optimized neighbor matrix, until the objective function value of problem (13) converges, obtaining the target projection matrix.
S95, the electronic device updates the optimized neighbor matrix according to formula (12) using the target projection matrix, obtaining the target neighbor matrix.
S96, the electronic device judges, from the latest target projection matrix and target neighbor matrix, whether the objective function value of problem (7) has converged. If it has, steps S97-S98 are executed; otherwise, step S99 is executed, the flow returns to step S93, and steps S93-S96 are repeated.
S97, the electronic device outputs the target projection matrix as the optimal solution.
S98, the electronic device reduces the dimensionality of the high-dimensional sample data according to the optimal solution.
S99, the electronic device takes the target projection matrix as the new current projection matrix and the target neighbor matrix as the new current neighbor matrix.
Therefore, by introducing neighbor weights, the embodiment of the invention takes the local structure of the sample data into account and adapts better to real-world datasets; at the same time it uses alternating mutual training between the projection matrix and the neighbor matrix, i.e. the newly learned projection matrix projects the high-dimensional samples into the current best subspace, and the subspace data are used to construct a new neighbor matrix. In addition, the embodiment proposes a minimized ratio-sum model whose search strategy is better than that of the maximized ratio-sum model, further improving accuracy.
In order to verify the effectiveness of the embodiment of the invention, two experiments were designed to compare it with existing mainstream dimension reduction algorithms, namely LDA, DNM, GRS, LFDA, LSDA, DMEG and ANLRSLDA; the algorithm proposed by the invention is abbreviated LRSDAASG.
Experiment one: local-structure preservation of the different algorithms is verified on a synthetic three-ring data set. Specifically, the synthetic data are two-dimensional data forming three concentric circles, to which 50 noise dimensions are artificially added. A projection matrix is then learned with each dimension reduction algorithm, the synthetic data are projected into the corresponding space, and the result is visualized, as shown in figs. 2 to 9. The algorithms of figs. 2 to 4 (LDA, DNM, GRS) all lack the ability to preserve local structure: they consider only the overall characteristics of the samples and suffer severely from the interference of the noise dimensions, so they are poorly suited to real-world data sets. The algorithms of figs. 5 to 8 (LFDA, LSDA, DMEG, ANLRSLDA) do take the local structure into account, but the concentric circles are only roughly recognizable, whereas the algorithm of the invention (LRSDAASG), shown in fig. 9, preserves the local structure of the samples much better.
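The synthetic data of experiment one can be generated along these lines; the ring radii, sample counts and noise scale are assumptions, since the text fixes only the three-concentric-circle shape and the 50 noise dimensions.

```python
import numpy as np

def three_rings(n_per_ring=100, noise_dims=50, seed=0):
    """Synthetic set in the spirit of experiment one: three concentric
    circles in 2-D, padded with artificial noise dimensions."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, radius in enumerate([1.0, 2.0, 3.0]):
        theta = rng.uniform(0, 2 * np.pi, n_per_ring)
        ring = np.column_stack([radius * np.cos(theta),
                                radius * np.sin(theta)])
        X.append(ring)
        y += [label] * n_per_ring
    X = np.vstack(X)
    noise = rng.normal(scale=0.5, size=(X.shape[0], noise_dims))
    return np.hstack([X, noise]), np.array(y)
```

Only the first two coordinates carry class structure; an algorithm without local-structure preservation is easily misled by the remaining 50 noise coordinates.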
Experiment two: experiments were performed on a real-world data set to verify the validity of the proposed algorithm. The UMIST data set is used as the benchmark data set. Specifically, 40% of the samples in the data set are randomly selected as a training set to learn a projection matrix, and the remaining samples are then projected into the learned subspace. A 1NN classifier is then used to verify accuracy. The experiment was repeated ten times, with the results shown in table 1 below:
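The evaluation protocol of experiment two (random 40% training split, projection of the remainder, 1NN accuracy, ten repetitions) can be sketched as follows. `learn_projection` stands for whichever dimension reduction algorithm is under test, and the splitting and seeding details are assumptions.

```python
import numpy as np

def evaluate(X, y, learn_projection, train_frac=0.4, repeats=10, seed=0):
    """Protocol in the spirit of experiment two: learn a projection W on
    a random 40% split, project the rest, score with a 1-NN classifier,
    and repeat ten times.  learn_projection(X, y) -> W."""
    rng = np.random.default_rng(seed)
    accs = []
    n = len(y)
    for _ in range(repeats):
        idx = rng.permutation(n)
        n_tr = int(train_frac * n)
        tr, te = idx[:n_tr], idx[n_tr:]
        W = learn_projection(X[tr], y[tr])
        Ztr, Zte = X[tr] @ W, X[te] @ W
        # 1-NN: label of the closest training sample in the subspace
        d2 = ((Zte[:, None, :] - Ztr[None, :, :]) ** 2).sum(2)
        pred = y[tr][d2.argmin(1)]
        accs.append((pred == y[te]).mean())
    return np.mean(accs), np.std(accs)
```

Reporting the mean and standard deviation over the ten repetitions matches the style of comparison used in table 1.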
TABLE 1 Performance comparison of the algorithm of the present invention with the existing mainstream dimensionality reduction algorithm
As shown in fig. 10, the embodiment of the present invention discloses a data dimension reduction device based on local ratio and linear discriminant analysis (hereinafter referred to as data dimension reduction device), which includes an initialization unit 101, a first updating unit 102, a training unit 103, a second updating unit 104, a substitution unit 105, an output unit 106, a processing unit 107, and a circulation unit 108, wherein,
an initialization unit 101, configured to initialize according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit 102 is configured to update the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit 103 is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit 104 is configured to update the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substituting unit 105, configured to substitute the target projection matrix and the target neighbor matrix into the local ratio-sum model;
an output unit 106, configured to, after the substituting unit 105 substitutes the target projection matrix and the target neighbor matrix into the local ratio-sum model, output the target projection matrix as an optimal solution if the substituted local ratio-sum model has converged;
and the processing unit 107 is configured to perform dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
As an alternative implementation, the initialization unit 101 may include the following sub-units not shown in the figure:
the first allocation subunit is used for performing neighbor weight allocation according to high-dimensional sample data to obtain a current neighbor matrix;
and the generation subunit is used for randomly generating an initial projection matrix and taking the initial projection matrix as the current projection matrix.
Optionally, the data dimension reduction apparatus may further include the following units, not shown in the figure:
the acquisition unit is configured to acquire a plurality of original sample data and their label information before the first allocation subunit performs neighbor weight allocation according to the high-dimensional sample data to obtain the current neighbor matrix;
and the preprocessing unit is used for sequencing and normalizing the plurality of original sample data according to the label information to obtain high-dimensional sample data.
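The preprocessing performed by this unit (sorting by label, then normalizing) might look as follows; min-max scaling per feature is an assumption, since the text does not specify the normalization used.

```python
import numpy as np

def preprocess(X_raw, labels):
    """Sort samples by class label, then normalize each feature.
    Min-max scaling is an assumed choice of normalization."""
    order = np.argsort(labels, kind="stable")        # group samples by class
    X = X_raw[order].astype(float)
    y = np.asarray(labels)[order]
    span = X.max(0) - X.min(0)
    span[span == 0] = 1.0                            # guard constant features
    X = (X - X.min(0)) / span                        # scale features to [0, 1]
    return X, y
```

Sorting by label keeps each class's samples contiguous, which simplifies building the per-class neighbor weights later.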
As an optional implementation manner, the data dimension reduction apparatus may further include a circulation unit 108, configured to, after the substituting unit 105 substitutes the target projection matrix and the target neighbor matrix into the local ratio-sum model, if the substituted local ratio-sum model has not converged, take the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix, and trigger the first updating unit 102 to repeat the operation of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix, until the local ratio-sum model converges.
Further optionally, the first updating unit 102 may include the following sub-units not shown in the figure:
the projection subunit is used for projecting the high-dimensional sample data into a subspace using the current projection matrix to obtain subspace sample data;
and the second allocation subunit is used for performing neighbor weight reallocation on the current neighbor matrix according to the subspace sample data to obtain an optimized neighbor matrix.
As shown in fig. 11, an embodiment of the present invention discloses an electronic device, which includes a memory 1101 storing executable program codes and a processor 1102 coupled to the memory 1101;
the processor 1102 calls the executable program code stored in the memory 1101 to execute the data dimension reduction method based on the local ratio and linear discriminant analysis described in the above embodiments.
The embodiment of the invention also discloses a computer readable storage medium which stores a computer program, wherein the computer program enables a computer to execute the data dimension reduction method based on the local ratio and the linear discriminant analysis described in the embodiments.
The purpose of the above embodiments is to reproduce and derive the technical solutions of the invention by example and to fully describe its technical solutions, objects and effects, so that the public can understand the disclosure of the invention more thoroughly and comprehensively; they do not limit the protection scope of the invention.

The above examples are not exhaustive of the invention, and many other embodiments are possible. Any alterations and modifications that do not depart from the spirit of the invention fall within its scope.
Claims (10)
1. A data dimension reduction method based on local ratio and linear discriminant analysis, characterized by comprising the following steps:
initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
training a current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
if the substituted local ratio-sum model has converged, outputting the target projection matrix as an optimal solution;
and performing dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
2. The data dimension reduction method of claim 1, wherein after substituting the target projection matrix and the target neighbor matrix into the local ratio-sum model, the method further comprises:
if the substituted local ratio-sum model has not converged, taking the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix, and repeatedly executing the step of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix, until the local ratio-sum model converges.
3. The method of claim 1, wherein updating the current neighbor matrix to obtain an optimized neighbor matrix according to the current projection matrix and the high-dimensional sample data comprises:
projecting the high-dimensional sample data to a subspace by using a current projection matrix to obtain subspace sample data;
and performing neighbor weight redistribution on the current neighbor matrix according to the subspace sample data to obtain an optimized neighbor matrix.
4. The method according to any one of claims 1 to 3, wherein initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix, comprises:
performing neighbor weight distribution according to high-dimensional sample data to obtain a current neighbor matrix;
and randomly generating an initial projection matrix, and taking the initial projection matrix as a current projection matrix.
5. The method of claim 4, wherein before performing neighbor weight assignment according to high-dimensional sample data to obtain a current neighbor matrix, the method further comprises:
acquiring a plurality of original sample data and label information thereof;
and sequencing and normalizing the plurality of original sample data according to the label information to obtain high-dimensional sample data.
6. A method for data dimensionality reduction according to any one of claims 1 to 3, wherein the local ratio-sum model is represented by the following formula:
in the formula,which represents the total number of samples,respectively represent the second in the total sampleThe number of the data is set to be,the number of sample categories is indicated and,is a firstThe total number of class samples is,are respectively the firstClass II sample ofA piece of data;is the total number of the projection directions,representing a projection matrix of column vectorsRepresents the firstThe direction of the projection is determined by the direction of the projection,;in (1)As a unit matrix, superscriptIs a transposed symbol;represents the firstClass II specimenData and the secondThe weight between the individual pieces of data,is shown asClass II specimenData and the secondThe weight between each data.
7. A data dimension reduction device based on local ratio and linear discriminant analysis, characterized by comprising:
the initialization unit is used for initializing according to high-dimensional sample data to obtain a current neighbor matrix and a current projection matrix;
the first updating unit is used for updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix;
the training unit is used for training the current projection matrix according to the optimized neighbor matrix and the high-dimensional sample data to obtain a target projection matrix;
the second updating unit is used for updating the optimized neighbor matrix according to the target projection matrix and the high-dimensional sample data to obtain a target neighbor matrix;
a substitution unit for substituting the target projection matrix and the target neighbor matrix into a local ratio-sum model;
the output unit is used for outputting the target projection matrix as an optimal solution when the substituted local ratio-sum model has converged;
and the processing unit is used for carrying out dimensionality reduction on the high-dimensional sample data according to the optimal solution to obtain low-dimensional sample data.
8. The data dimension reduction apparatus of claim 7, further comprising:
and the circulating unit is used for taking the target projection matrix as a new current projection matrix and the target neighbor matrix as a new current neighbor matrix when the substituted local ratio-sum model has not converged, and triggering the first updating unit to repeatedly execute the operation of updating the current neighbor matrix according to the current projection matrix and the high-dimensional sample data to obtain an optimized neighbor matrix, until the local ratio-sum model converges.
9. An electronic device comprising a memory storing executable program code and a processor coupled to the memory; the processor calls the executable program code stored in the memory for performing the data dimension reduction method based on local ratio and linear discriminant analysis of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program causes a computer to execute the method for data dimension reduction based on local ratio and linear discriminant analysis according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211547519.2A CN115563492A (en) | 2022-12-05 | 2022-12-05 | Data dimension reduction method and device based on local ratio and linear discriminant analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115563492A true CN115563492A (en) | 2023-01-03 |
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116386830A (en) * | 2023-04-10 | 2023-07-04 | 山东博鹏信息科技有限公司 | Hospital management system based on big data |
CN116386830B (en) * | 2023-04-10 | 2023-09-22 | 山东博鹏信息科技有限公司 | Hospital management system based on big data |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20230103 |