CN104966098A - Data discrimination dimension reducing method of exceptional point inhibition - Google Patents
- Publication number
- CN104966098A (application number CN201510325234.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/245—Classification techniques relating to the decision surface
- G06F18/2451—Classification techniques relating to the decision surface linear, e.g. hyperplane
Abstract
The present invention discloses an outlier-suppressing discriminative data dimensionality reduction method, with the following beneficial effects: (1) every data point is assigned a weight according to its contribution to the discriminative subspace learning process, with points that play a positive role receiving larger sample weights; in this way, outliers are adaptively attenuated during subspace learning; (2) based on the given class labels, the mean vector and covariance matrix of each class are estimated independently, and a linear discriminant criterion based on these new statistics is then proposed. This new sample-weighting scheme can also be applied to other covariance-matrix-based algorithms.
Description
Technical field
The present invention relates to the field of data processing and, more specifically, to an outlier-suppressing discriminative data dimensionality reduction method.
Background technology
Data dimensionality reduction methods based on subspace learning have received ample attention in intelligent analysis and cognitive systems. Linear discriminant analysis (LDA) and its many improved variants have been widely studied because of their supervised learning mode and simple implementation.
In real-world scenarios, however, two drawbacks limit the further application and popularization of LDA. First, the basic assumption that samples are independent and identically distributed (i.i.d.) is too strict. For data that do not satisfy it, an optimal solution cannot be guaranteed theoretically, and for high-dimensional data, verifying the i.i.d. assumption is itself very difficult. Second, data collected in practical environments often contain a certain degree of noise and outliers; their presence makes the learned subspace non-robust, and the i.i.d. assumption then introduces a large model error. In both cases, the traditional estimators of the mean and covariance matrix lose discriminative information about the subspace.
Researchers have found that, in data modeling and numerical computation, some data points play a more positive role than others in discriminative subspace learning. Estimating statistics from all data without any distinction therefore not only seems unreasonable but also yields weaker performance in practical applications. It is thus necessary to re-extract the local structural features of the data, distinguish the samples reasonably, and assign larger weights to those that play a positive role, so that the discriminative information contained in the data can be mined more effectively. By combining the basic ideas of Fisher linear discriminant analysis and locality preserving projections, LFDA can learn a discriminative subspace with local-structure-preserving properties. L1-Graph incorporates sparse representation into the characterization of local neighbor samples, effectively mining the sparse representation structure among samples and thereby obtaining a subspace that aids classification. Professor Xu Yong of Harbin Institute of Technology proposed a two-step LLDA method: for any given test sample y, a group of samples related to y (its neighbors) is first selected from the training set by sparse representation, and the classical Fisher criterion is then applied to these correlated samples; redundant samples can thus be eliminated, reducing computational complexity. Recently, Mu et al. proposed an adaptive embedding framework to handle the multi-class data dimensionality reduction problem.
It should be noted that the algorithms and methods above can all be summarized under the basic idea of "relation weighting". In other words, by re-estimating and analyzing the "neighbor relations" among samples (neighbors versus non-neighbors), the relation between any pair of samples (supervised same-class relations or unsupervised neighbor relations) is adjusted according to the local geometric structure, which benefits discriminant analysis. A major defect of this class of algorithms, however, is that if the data contain a certain degree of outliers, the relations between outliers and normal data points are amplified as well, which harms the learning of the discriminative subspace.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention proposes an outlier-suppressing discriminative data dimensionality reduction method. The method can effectively solve for the optimal discriminative subspace and, by estimating the contribution of each sample during learning, can better process data containing outliers.
To achieve these goals, the technical scheme of the present invention is:
The outlier-suppressing discriminative data dimensionality reduction method comprises the following steps:
S1. Input the raw data with class labels 1, 2, ..., C, where C is the total number of classes.
S2. Within the k-th class, 1 <= k <= C, compute the relation weight between each pair of data points x_i^(k) and x_j^(k), a similarity that decreases with their Euclidean distance and is governed by a prior parameter σ; then obtain the weights between the i-th data point of the class and every other data point, together with their sum, where n_k denotes the number of data points in the k-th class.
S3. For each data point of the k-th class, normalize its weights to all same-class data points by the weight sum computed in step S2, obtaining the final weights p_i^(k), k = 1, 2, ..., C; i = 1, 2, ..., n_k.
S4. For the data points in the k-th class, compute the class sample mean vector and covariance matrix from the respective sample weights.
S5. Over the data of all classes, compute the between-class scatter matrix S_b and the within-class scatter matrix S_w of all classes.
S6. To extract discriminative features, solve for the optimal orthogonal projection matrix A satisfying S_b A = λ S_w A. This can be converted into an eigendecomposition of the matrix S_w^(-1) S_b, where A is the m x (C-1) matrix to be solved and λ is the diagonal matrix of eigenvalues.
Further, step S6 comprises:
1) S_w ← S_w + ρI, where ρ is a very small positive number and I is the identity matrix;
2) compute the inverse matrix S_w^(-1) and let S = S_w^(-1) S_b;
3) decompose S into the form QΣQ^T, where Q is an m x m orthogonal matrix and Σ is an m x m diagonal matrix whose diagonal elements are non-negative real numbers arranged in descending order;
4) take the first C-1 columns of the matrix Q to form the new matrix A.
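The four sub-steps of S6 can be sketched in NumPy. This is a minimal sketch under our own naming, not the patented implementation: because Q can only be orthogonal when the decomposed matrix is symmetric, we use the symmetric whitened form S_w^(-1/2) S_b S_w^(-1/2) in place of a literal S_w^(-1) S_b, which is an assumption on our part.

```python
import numpy as np

def solve_projection(S_w, S_b, C, rho=1e-6):
    """Step S6: solve S_b A = lambda S_w A for the top C-1 eigenvectors.

    Follows the patent's sub-steps: regularize S_w, invert, and
    eigendecompose. The symmetric whitened matrix makes the eigenvector
    matrix Q orthogonal, matching the patent's statement about Q.
    """
    m = S_w.shape[0]
    S_w = S_w + rho * np.eye(m)                 # 1) S_w <- S_w + rho * I
    # 2) inverse square root of S_w via its own eigendecomposition
    w, V = np.linalg.eigh(S_w)
    S_w_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    # 3) decompose the symmetric matrix S = Q Sigma Q^T
    S = S_w_inv_sqrt @ S_b @ S_w_inv_sqrt
    sigma, Q = np.linalg.eigh(S)                # eigenvalues in ascending order
    Q = Q[:, np.argsort(sigma)[::-1]]           # reorder to descending
    # 4) keep the first C-1 columns, mapped back through the whitening
    return S_w_inv_sqrt @ Q[:, :C - 1]
```

By construction the returned columns satisfy the generalized eigenproblem of S6, and A^T (S_w + ρI) A = I, so the projection is orthonormal in the S_w metric.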
Compared with the prior art, the beneficial effects of the present invention are: (1) every data point is assigned a weight according to its contribution to the discriminative subspace learning process, with points that play a positive role receiving larger sample weights; accordingly, outliers are adaptively attenuated during subspace learning;
(2) based on the given class labels, the mean vector and covariance matrix of each class are estimated independently, and a linear decision rule based on these new statistics is then proposed; this new sample-weighting scheme can also be applied to other covariance-matrix-based algorithms.
The present invention proposes a new sample-weighting method and data dimensionality reduction method, which have an important effect and wide application space in improving the robustness of the discriminative subspace and in suppressing noise and outliers.
Accompanying drawing explanation
Fig. 1 is a schematic diagram of conventional neighbor relations.
Fig. 2 is a schematic diagram of the new mean-estimation model.
Fig. 3 is the flow chart of the method of the present invention.
Embodiment
The present invention is further described below with reference to the accompanying drawings; the embodiments of the present invention are, however, not limited thereto.
Fig. 1 shows the conventional graph embedding model, in which all other data points of the same class are given identical weights.
Fig. 2 shows mean estimation based on importance sampling, in which the filled circles represent important data points given larger weights; they are used to estimate the weighted within-class mean.
Fig. 3 is the flow chart of the method, comprising the main stages of data input, weight estimation, mean estimation, within-/between-class scatter matrix estimation, and subspace computation.
The outlier-suppressing discriminative data dimensionality reduction method comprises the following steps:
S1. Input the raw data with class labels 1, 2, ..., C, where C is the total number of classes.
S2. Within the k-th class, 1 <= k <= C, compute the relation weight between each pair of data points x_i^(k) and x_j^(k), a similarity that decreases with their Euclidean distance and is governed by a prior parameter σ; then obtain the weights between the i-th data point of the class and every other data point, together with their sum, where n_k denotes the number of data points in the k-th class.
S3. For each data point of the k-th class, normalize its weights to all same-class data points by the weight sum computed in step S2, obtaining the final weights p_i^(k), k = 1, 2, ..., C; i = 1, 2, ..., n_k.
S4. For the data points of the k-th class, compute the class sample mean vector and covariance matrix from the above sample weights.
S5. Over the data of all classes, compute the between-class scatter matrix S_b and the within-class scatter matrix S_w of all classes.
S6. To extract discriminative features, solve for the optimal orthogonal projection matrix A satisfying S_b A = λ S_w A. This can be converted into an eigendecomposition of the matrix S_w^(-1) S_b, where A is the m x (C-1) matrix to be solved and λ is the diagonal matrix of eigenvalues.
Further, step S6 comprises:
1) S_w ← S_w + ρI, where ρ is a very small positive number and I is the identity matrix;
2) compute the inverse matrix S_w^(-1) and let S = S_w^(-1) S_b;
3) decompose S into the form QΣQ^T, where Q is an m x m orthogonal matrix and Σ is an m x m diagonal matrix whose diagonal elements are non-negative real numbers arranged in descending order;
4) take the first C-1 columns of the matrix Q to form the new matrix A.
The details of the present invention are as follows:
1. Estimation of the basic statistics (within-class mean, within-class scatter matrix, between-class scatter matrix)
Suppose we are given n data points in m-dimensional space, {(x_i, b_i) | x_i ∈ R^m, i = 1, 2, ..., n}, where b_i ∈ {1, 2, ..., C} is the class label of x_i and C is the number of classes. According to mathematical statistics, if the probability density function of the data is p(x), the sample mean vector and covariance matrix are estimated by weighting each sample point x_i by p(x_i). In practice the true value of p(x_i) is often difficult to determine; it can be understood as the importance of the data point x_i within its local region, or as its contribution to computing the class center. The present invention proposes a new method of estimating p(x_i) and uses it to estimate the within-class mean, within-class scatter matrix, and between-class scatter matrix of each labeled class, thereby obtaining a new outlier-suppressing data dimensionality reduction algorithm.
In general, suppose d = ||x_i - x_j|| is the Euclidean distance between x_i and x_j, r is a scale parameter, and W_ij denotes the similarity measure between x_i and x_j; then W forms an n x n matrix. Let D be an n-dimensional vector with D(i) = Σ_j W_ij, so that D(i) is the sum of the similarities between sample x_i and all other sample points. Let p(x_i) = D(i) / Σ_j D(j). The present invention uses p(x_i) as the contribution/weight of the sample point x_i in the process of solving for the discriminative subspace.
Suppose X_k is the data matrix of the k-th class, n_k is its sample size, x_i^(k) denotes the i-th datum of X_k, and p_k(i) is its weight; then the mean vector of X_k can be written as the weighted average of its samples with weights p_k(i). Substituting this weighted mean, the within-class scatter matrix corresponding to X_k can be computed.
Suppose A ∈ R^(m x h) is the projection matrix to be solved (by default h = C-1); then the feature after dimensionality reduction is A^T x, and the within-class mean and covariance matrix of X_k after dimensionality reduction can be expressed in terms of A and the weighted statistics above.
We convert the vector p_k into a diagonal matrix, obtaining diag(p_k) = D_k / trace(D_k), and define a corresponding matrix E_k, with which the covariance matrix of the k-th class can be simplified.
Let X = [X_1, X_2, ..., X_C] denote the data matrix composed of the samples of all C classes. Splicing the matrices E_k of the individual classes along the diagonal forms the block-diagonal weight matrix W_w, and the sum of the within-class scatter matrices of all classes can then be expressed as a single quadratic form in X and W_w.
On the other hand, since the weighted mean vector of each class represents the class center, the class means can be used to compute the between-class scatter matrix.
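Steps S4 and S5 can be sketched as follows. The patent's formulas survive only as images, so the unweighted global mean and the class priors n_k/n used for the between-class scatter are our assumptions, as are all names.

```python
import numpy as np

def scatter_matrices(X, y, weights_fn=None):
    """Weighted within-/between-class scatter (a sketch of steps S4-S5).

    X : (n, m) data, y : (n,) integer class labels.
    weights_fn(X_k) should return the normalized per-class weights p_k
    described in the text; None falls back to uniform weights, which
    recovers classical LDA statistics.
    """
    n, m = X.shape
    mu = X.mean(axis=0)                        # global mean (assumed unweighted)
    S_w = np.zeros((m, m))
    S_b = np.zeros((m, m))
    for k in np.unique(y):
        X_k = X[y == k]
        n_k = len(X_k)
        p = weights_fn(X_k) if weights_fn else np.full(n_k, 1.0 / n_k)
        mu_k = p @ X_k                         # weighted class mean (class center)
        Xc = X_k - mu_k
        S_w += Xc.T @ (p[:, None] * Xc)        # weighted within-class scatter
        S_b += (n_k / n) * np.outer(mu_k - mu, mu_k - mu)
    return S_w, S_b
```

With a similarity-based weights_fn plugged in, outliers contribute little to both the class mean and the within-class scatter, which is the mechanism behind the claimed robustness.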
2. Linear dimensionality reduction model
To obtain the optimal low-dimensional discriminative space, the present invention adopts the Fisher quotient as the basic model. The objective function is thus the ratio of the traces of the projected between-class and within-class scatter matrices, where tr denotes the trace operator of linear algebra. It can be solved via a conventional matrix decomposition problem: S_b A = S_w A Δ, where Δ is the diagonal matrix whose diagonal elements are the h largest eigenvalues, and A is the matrix formed by the corresponding eigenvectors. Usually, the high dimensionality of the data makes S_w singular, which causes trouble for the above optimization problem, so a regularization method (e.g. S_w ← S_w + ρI, where ρ is a very small positive number and I is the identity matrix) is needed to ensure the invertibility of S_w.
It is worth mentioning that this basic model can also be applied to other margin-based criteria.
In the feature-embedding stage, suppose x_t is a test sample; the feature representation after dimensionality reduction is obtained simply by matrix projection: y_t ← A^T x_t.
The above-described embodiments of the present invention do not limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.
Claims (2)
1. An outlier-suppressing discriminative data dimensionality reduction method, characterized by comprising the following steps:
S1. input the raw data with class labels 1, 2, ..., C, where C is the total number of classes;
S2. within the k-th class, 1 <= k <= C, compute the relation weight between each pair of data points x_i^(k) and x_j^(k), where σ is a prior parameter; then obtain the weights between the i-th data point of the k-th class and the other data points of the class, together with their sum, where n_k denotes the number of data points in the k-th class;
S3. for each data point of the k-th class, normalize its weights to all same-class data points by the weight sum computed in step S2, obtaining the final weights p_i^(k), k = 1, 2, ..., C; i = 1, 2, ..., n_k;
S4. for the data points in the k-th class, compute the class sample mean vector and covariance matrix from the respective sample weights;
S5. over the data of all classes, compute the between-class scatter matrix S_b and the within-class scatter matrix S_w of all classes;
S6. to extract discriminative features, solve for the optimal orthogonal projection matrix A satisfying S_b A = λ S_w A; this can be converted into an eigendecomposition of the matrix S_w^(-1) S_b, where A is the m x (C-1) matrix to be solved and λ is the diagonal matrix of eigenvalues.
2. The outlier-suppressing discriminative data dimensionality reduction method according to claim 1, characterized in that step S6 comprises:
1) S_w ← S_w + ρI, where ρ is a very small positive number and I is the identity matrix;
2) compute the inverse matrix S_w^(-1) and let S = S_w^(-1) S_b;
3) decompose S into the form QΣQ^T, where Q is an m x m orthogonal matrix and Σ is an m x m diagonal matrix whose diagonal elements are non-negative real numbers arranged in descending order;
4) take the first C-1 columns of the matrix Q to form the new matrix A.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510325234.8A CN104966098A (en) | 2015-06-15 | 2015-06-15 | Data discrimination dimension reducing method of exceptional point inhibition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510325234.8A CN104966098A (en) | 2015-06-15 | 2015-06-15 | Data discrimination dimension reducing method of exceptional point inhibition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104966098A true CN104966098A (en) | 2015-10-07 |
Family
ID=54220133
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510325234.8A Pending CN104966098A (en) | 2015-06-15 | 2015-06-15 | Data discrimination dimension reducing method of exceptional point inhibition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104966098A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631433A (en) * | 2016-01-14 | 2016-06-01 | 江苏大学 | Two-dimension linearity discrimination analysis face identification method |
CN106446503A (en) * | 2016-07-21 | 2017-02-22 | 华侨大学 | Method for identifying time-varying working mode of auto-covariance matrix recursive principal component analysis with forgetting factor |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102142091A (en) * | 2011-03-30 | 2011-08-03 | 东华大学 | Kernel integration optimizing classification method |
CN104156628A (en) * | 2014-08-29 | 2014-11-19 | 东南大学 | Ship radiation signal recognition method based on multi-kernel learning and discriminant analysis |
- 2015-06-15: CN application CN201510325234.8A, publication CN104966098A (en), status active, Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102142091A (en) * | 2011-03-30 | 2011-08-03 | 东华大学 | Kernel integration optimizing classification method |
CN104156628A (en) * | 2014-08-29 | 2014-11-19 | 东南大学 | Ship radiation signal recognition method based on multi-kernel learning and discriminant analysis |
Non-Patent Citations (1)
Title |
---|
Ren Chuanxian (任传贤): "High-Dimensional Heterogeneous Information Fusion and Importance Sampling: Modeling, Computation and Applications", Wanfang Dissertation Database *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631433A (en) * | 2016-01-14 | 2016-06-01 | 江苏大学 | Two-dimension linearity discrimination analysis face identification method |
CN105631433B (en) * | 2016-01-14 | 2019-05-31 | 江苏大学 | A kind of face identification method of bidimensional linear discriminant analysis |
CN106446503A (en) * | 2016-07-21 | 2017-02-22 | 华侨大学 | Method for identifying time-varying working mode of auto-covariance matrix recursive principal component analysis with forgetting factor |
CN106446503B (en) * | 2016-07-21 | 2019-02-22 | 华侨大学 | Forget the time-varying operation mode recognition methods of auto-covariance matrix recursion pivot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110674938B (en) | Anti-attack defense method based on cooperative multi-task training | |
CN112115963B (en) | Method for generating unbiased deep learning model based on transfer learning | |
CN103632168B (en) | Classifier integration method for machine learning | |
CN110837850A (en) | Unsupervised domain adaptation method based on counterstudy loss function | |
CN110232319A (en) | A kind of ship Activity recognition method based on deep learning | |
CN105630901A (en) | Knowledge graph representation learning method | |
CN102520341A (en) | Analog circuit fault diagnosis method based on Bayes-KFCM (Kernelized Fuzzy C-Means) algorithm | |
CN114692741B (en) | Generalized face counterfeiting detection method based on domain invariant features | |
CN104751469B (en) | The image partition method clustered based on Fuzzy c-means | |
CN104239712B (en) | Real-time evaluation method for anti-interference performance of radar | |
CN109767225B (en) | Network payment fraud detection method based on self-learning sliding time window | |
CN111753918B (en) | Gender bias-removed image recognition model based on countermeasure learning and application | |
CN103473556A (en) | Hierarchical support vector machine classifying method based on rejection subspace | |
CN109214444B (en) | Game anti-addiction determination system and method based on twin neural network and GMM | |
CN110502989A (en) | A kind of small sample EO-1 hyperion face identification method and system | |
CN104778466A (en) | Detection method combining various context clues for image focus region | |
CN111310719B (en) | Unknown radiation source individual identification and detection method | |
CN116433909A (en) | Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method | |
CN111930945A (en) | Tor hidden service illegal content classification method | |
Sun et al. | Detecting Crime Types Using Classification Algorithms. | |
CN104966098A (en) | Data discrimination dimension reducing method of exceptional point inhibition | |
CN103150476B (en) | A kind of system efficiency evaluation method based on data station field | |
CN102664771A (en) | Network agent action detection system and detection method based on SVM (Support Vector Machine) | |
Shih et al. | Urban and Rural BMI Trajectories in Southeastern Ghana: A Space-Time Modeling Perspective on Spatial Autocorrelation | |
CN102880638B (en) | Self-adaptive robust constrained maximum variance mapping (CMVM) characteristic dimensionality reduction and extraction method for diversified image retrieval of plant leaves |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20151007 |
|
RJ01 | Rejection of invention patent application after publication |