CN109615014A - Data classification system and method based on KL divergence optimization - Google Patents
Data classification system and method based on KL divergence optimization
- Publication number
- CN109615014A (application number CN201811540690.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- feature
- divergence
- sample
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a method for classifying data based on KL divergence optimization: raw data such as images and text are preprocessed, and each object is modeled as a multi-dimensional distribution; a number of triplets are selected from the labeled training data for model training; with the selected triplets as training data, a linear mapping A is applied to all mean vectors, and the optimal linear mapping is learned by iterative optimization, the learning process resting on the basic assumption of metric learning, namely that the distance between same-class samples becomes smaller while the distance between different-class samples becomes larger; the optimization uses an intrinsic gradient descent algorithm, which projects the gradient of the objective function onto the tangent space of the manifold and then performs Riemannian gradient descent on the manifold of SPD matrices equipped with a given affine-invariant Riemannian metric; finally, the KL divergence between the test set and the training set is computed, and samples are classified with a k-nearest-neighbor (KNN) classifier. This method effectively improves the classification accuracy of the system and gives more stable performance.
Description
Technical field
The invention belongs to the field of machine learning, and more particularly relates to a data classification system and method based on KL divergence optimization.
Background
With the development of information technology, data classification has increasingly become a research hotspot in both academia and industry. Data classification refers to the process of automatically determining the category of data from its content under a given classification scheme. It covers a wide range of applications, such as image classification, text classification, and speech classification. A good classifier facilitates further downstream use of the data. For example, after preliminary text classification, the results can be applied in many fields, such as text filtering, automatic Web document categorization, the organization and management of digital libraries, word sense disambiguation, and document semantics.
In machine learning, data objects are usually modeled as multi-dimensional distributions to characterize their features. In data classification, how to measure the similarity between two data distributions therefore becomes the key problem: the higher the similarity between two samples, the more likely they belong to the same class. Common measures between probability distributions include the Jensen-Shannon divergence, the Earth Mover's Distance (EMD), and the Maximum Mean Discrepancy. Among them, the Kullback-Leibler divergence (KL divergence), also known as relative entropy, is one of the most commonly used measures of similarity between two probability distributions and is widely applied in fields such as computer vision and pattern recognition. The KL divergence expresses the information lost when a probability distribution Q is used to approximate the true distribution P, and can therefore measure the similarity between data distributions well.
In the real world, however, data sources are extremely complex and data quality cannot be guaranteed. Collected data sets may suffer from noisy data, missing data, imbalanced distributions, and intermixed data points that are hard to separate. Under such circumstances, the traditional KL divergence in Euclidean space has difficulty measuring the similarity between two data samples accurately, which in turn severely affects classification accuracy. In other words, the traditional KL divergence does not always yield an optimal representation of the data.
Existing research on the KL divergence has largely focused on two directions: first, using the KL divergence directly as a measure between multi-dimensional distributions, as in variational autoencoders; second, deriving a more effective measure by approximating the KL divergence, for example through a variational upper bound or Monte Carlo approximation. Most existing work thus applies the KL divergence directly, and little attention has been paid to optimizing the KL divergence itself.
Summary of the invention
The object of the present invention is to propose a data classification system and method based on KL divergence optimization. The system and method optimize the traditional KL divergence and learn an optimal representation of the data, which effectively improves the classification ability of existing systems and gives more stable performance.
The technical solution of the present invention provides a data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, a KL-divergence-based optimization module, and a classification module based on the KL divergence measure under the optimal linear mapping, characterized in that:
the feature extraction module extracts multi-view features of the raw data, including raw images and text data;
the feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space and whitens them, reducing the redundancy of the extracted multi-view features and removing the correlation between different sample features, and then maps the transformed data back to the original space;
the multi-view feature modeling module models and characterizes the multi-view features processed by the feature whitening module;
the training data selection module selects a number of triplets from the labeled training data for model training;
after the training data are selected, the feature mapping module generates a projection matrix and maps the original data features to a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger; the multi-view sample similarity calculation module then measures the similarity of multi-view samples by computing the optimized KL divergence in the new feature space;
the KL-divergence-based optimization module models the optimization of the KL divergence as a minimization problem on the manifold of positive definite matrices; the feature mapping module, the multi-view sample similarity calculation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping;
the classification module based on the KL divergence measure under the optimal linear mapping maps the original data features to the new feature space using the learned optimal linear mapping and, based on the KL divergence between the test set and the training set, classifies the test samples with a k-nearest-neighbor classifier.
Further, each sample is modeled as a Gaussian distribution, and any two Gaussian distributions are assumed to share the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and the covariance matrix of its Gaussian distribution.
Further, a triplet contains samples belonging to the same class of object and samples belonging to different classes of objects.
The present invention also provides a data classification method realized by the above data classification system based on KL divergence optimization, comprising:
Step 1: feature extraction is performed on the raw data, i.e., multi-view features of the raw data are extracted from the raw data, including images and text;
Step 2: the extracted features are whitened, i.e., after the multi-view features are extracted from the raw data, they are projected into a common low-dimensional space and whitened, reducing the redundancy of the extracted multi-view features and removing the correlation between different sample features, after which the transformed data are mapped back to the original space;
Step 3: the processed multi-view features are modeled and characterized;
Step 4: a number of triplets are selected as training data from the labeled training data, the distribution of the training data being the feature distribution obtained by modeling the samples in Step 3;
Step 5: with the selected triplets as training data, feature mapping is performed, i.e., a linear mapping is applied to all mean vectors, mapping the original data features to a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger;
Step 6: the similarity between multi-view samples is calculated, i.e., the optimized KL divergence computed in the mapped feature space measures the similarity of multi-view samples;
Step 7: with the selected triplets as training data, optimization is performed based on the KL divergence; the feature mapping module, the multi-view sample similarity calculation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping;
Step 8: the original data features are mapped to the new feature space using the optimal linear mapping learned in Step 7, and the test samples are classified in the new feature space with a k-nearest-neighbor classifier.
Further, in Step 7, a positive parameter γ is used to balance the influence of same-class and different-class data, and γ is set to γ = 1 − e^(−D̄_KL), where D̄_KL is the average KL divergence over the entire training data set.
Further, in Step 7, the gradient of the objective function is projected onto the tangent space of the manifold, and Riemannian gradient descent is performed on the manifold of symmetric positive definite matrices equipped with a given affine-invariant Riemannian metric; after symmetrization, each iteration of the optimization preserves the manifold structure of the learned linear mapping.
Further, in Step 8, the KL divergence between every sample in the test set and every sample in the training set is calculated; a priority queue of size k, ordered by descending distance, is maintained to store the nearest-neighbor training tuples; k tuples are chosen at random from the training tuples as the initial nearest neighbors, the distance from the test tuple to each of these k tuples is computed, and the class labels and distances of the training tuples are stored in the priority queue.
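As a minimal sketch of this classification step, the following Python snippet classifies a test sample by majority vote among the k training samples with the smallest KL divergence to it; the function names and the toy data are illustrative, and a simple sort stands in for the priority queue described above:

```python
import numpy as np

def kl_gaussian_shared_cov(mu1, mu2, cov_inv):
    """KL divergence between two Gaussians with a shared covariance
    reduces to half a Mahalanobis distance between the means."""
    d = mu1 - mu2
    return 0.5 * float(d @ cov_inv @ d)

def knn_classify(test_mu, train_mus, train_labels, cov_inv, k=3):
    """Classify one test sample by majority vote among the k training
    samples with the smallest KL divergence to it."""
    dists = np.array([kl_gaussian_shared_cov(test_mu, mu, cov_inv)
                      for mu in train_mus])
    nearest = np.argsort(dists)[:k]          # indices of k smallest distances
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)  # majority label

# toy example: two clusters of mean vectors, identity shared covariance
train_mus = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
train_labels = ["table", "table", "chair", "chair"]
cov_inv = np.eye(2)
pred = knn_classify(np.array([0.05, 0.1]), train_mus, train_labels, cov_inv, k=3)
```

With the toy data above, the test mean sits next to the "table" cluster, so the majority of its three nearest neighbors are tables.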
The beneficial effects of the present invention are:
(1) The method and system proposed by the present invention effectively improve the classification accuracy of the system and give more stable performance. They compensate for problems that may arise in real scenarios, such as noisy data, missing data, imbalanced distributions, and intermixed data points that are hard to separate.
(2) The present invention is applicable to the classification of multi-view data.
(3) The method learns an optimal linear mapping from the labeled training data, mapping the original data space to a new feature space. In the new feature space, the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger. Compared with existing systems of the same type, the present invention effectively improves classification ability and gives more stable performance.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention;
Fig. 2 is the schematic diagram of the method of the present invention;
Fig. 3 is an illustration of the intrinsic gradient descent algorithm used in the method of the present invention;
Fig. 4 shows the classification accuracy of this system compared with other systems on a 3D object recognition task, using the NTU16 data set;
Fig. 5 shows the classification accuracy of this system compared with other systems on a 3D object recognition task, using the NTU47 data set;
Fig. 6 shows the classification accuracy of this system compared with other systems on a text classification task, using the TWITTER data set;
Fig. 7 shows how the classification accuracy of this system varies with its parameters.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to Figures 1-5.
As shown in Fig. 1, this embodiment provides a data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, a KL-divergence-based optimization module, and a classification module based on the KL divergence measure under the optimal linear mapping, wherein:
The feature extraction module extracts multi-view features of the raw data, such as raw images and text.
The feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space and whitens them, reducing the redundancy of the extracted multi-view features and removing the correlation between different sample features, and then maps the transformed data back to the original space.
The multi-view feature modeling module models and characterizes the multi-view features processed by the feature whitening module, wherein: the system models each sample as a Gaussian distribution and assumes that any two Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and the covariance matrix of its Gaussian distribution.
After the multi-view feature modeling is completed, the training data selection module selects a number of triplets from the labeled training data for model training.
A triplet contains samples belonging to the same class of object and samples belonging to different classes of objects; that is, an input triplet comprises a positive sample pair and a negative sample pair. For example, suppose there are two classes of objects, tables and chairs. For table A, a possible triplet is (table A, table B, chair C): table A and table B form a positive pair, while table A and chair C form a negative pair.
After the training data are selected, the feature mapping module generates a projection matrix and maps the original data features to a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger, so that the mapped data are easier to classify.
Once the training data have been selected, the similarity between data samples must be computed for training. The multi-view sample similarity calculation module measures the similarity of multi-view samples by computing the optimized KL divergence in the mapped feature space.
The KL-divergence-based optimization module models the optimization of the KL divergence as a minimization problem on the manifold of positive definite matrices. To solve this optimization problem, the system uses an intrinsic gradient descent method, namely Riemannian gradient descent, refined by symmetrization. The Euclidean gradient ∇f(A_t) is symmetrized as (∇f(A_t) + ∇f(A_t)^T)/2, and the update takes the form A_{t+1} = A_t^{1/2} Exp(−α A_t^{1/2} ∇f(A_t) A_t^{1/2}) A_t^{1/2}, where Exp denotes the matrix exponential with base e, f(A_t) denotes the objective function after t iterations of the linear mapping, ∇f(A_t) denotes the corresponding gradient, and α denotes the learning rate. In this way, every iteration of the optimization preserves the manifold structure of the learned linear mapping, thereby guaranteeing the symmetric positive definiteness of the learned optimal KL divergence measure.
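One symmetrized step can be sketched as follows; this is a reconstruction assuming the standard affine-invariant exponential-map update on the SPD manifold, not the patent's verbatim formula, and the toy gradient is illustrative:

```python
import numpy as np

def spd_sqrt(M):
    # matrix square root of an SPD matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def sym_expm(M):
    # matrix exponential of a symmetric matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

def riemannian_step(A, euclid_grad, alpha=0.1):
    """One Riemannian gradient-descent step with the affine-invariant
    metric. Symmetrizing the Euclidean gradient first keeps every iterate
    symmetric positive definite, preserving the manifold structure of A."""
    G = 0.5 * (euclid_grad + euclid_grad.T)   # symmetrization
    S = spd_sqrt(A)
    return S @ sym_expm(-alpha * S @ G @ S) @ S

A = np.eye(2)                                 # start on the SPD manifold
grad = np.array([[1.0, 2.0], [0.0, 1.0]])     # deliberately non-symmetric
A_next = riemannian_step(A, grad)
```

Even with a non-symmetric Euclidean gradient, the new iterate stays symmetric and positive definite, which is the point of the symmetrization refinement.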
The feature mapping module, the multi-view sample similarity calculation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping.
On the whole, the system takes the triplets selected by the training data selection module as training data and learns an optimal linear mapping by optimizing the KL divergence, so that the mapped data are easier to classify.
The classification module based on the KL divergence measure under the optimal linear mapping maps the original data features to the new feature space using the learned optimal linear mapping and, based on the KL divergence between the test set and the training set, classifies the test samples with a k-nearest-neighbor (KNN) classifier.
The feature extraction module is the first module of the data classification system based on KL divergence optimization; it extracts features from the raw data. The features are then whitened as a preprocessing step and modeled as multi-view features.
After the system has modeled the multi-view features of the samples, the data classification system based on KL divergence optimization selects a number of triplets from them as training data.
In the training module, the data classification system based on KL divergence optimization first maps the features to a new feature space, then computes multi-view sample similarity with the optimized KL divergence measure after the mapping, and optimizes with the intrinsic gradient descent algorithm. The system repeats this process until convergence, at which point it has learned an optimal linear mapping. Finally, classification is performed by the classification module in the mapped feature space.
Notably, this method is based on the optimization of the KL divergence; furthermore, this embodiment does not optimize the classification system with a traditional gradient descent method, but with an intrinsic gradient descent algorithm refined by symmetrization, which guarantees the symmetric positive definiteness of the learned linear mapping in every optimization iteration. Moreover, the data classification system based on KL divergence optimization can be applied to the classification of multi-view data, whereas many systems of the same type can only be applied to single-view data.
This embodiment also provides a data classification method based on KL divergence optimization, which comprises the following steps.
Step 1: feature extraction is performed on the raw data. First, multi-view features are extracted from the raw data, such as images and text.
Taking 3D object classification as an example, in Step 1 each 3D object is characterized by a group of views from different directions. For each view, a set of convolutional neural network (CNN) features is extracted. In addition to feature extraction, the views are clustered to produce view clusters, from which a group of representative views is chosen, removing potentially redundant views. In this way, each object is characterized by the group of representative views selected from its view clusters, and multi-view feature extraction is achieved for the 3D object classification task.
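The view-clustering idea can be sketched as follows; the use of k-means and the keep-the-view-closest-to-the-centroid rule are illustrative assumptions, since the text does not specify the clustering algorithm:

```python
import numpy as np

def representative_views(view_feats, n_clusters=2, n_iter=20):
    """Cluster per-view features with a tiny k-means and keep, for each
    cluster, the single view closest to its centroid. `view_feats` has
    shape (num_views, feat_dim); returns sorted indices of kept views."""
    idx = np.linspace(0, len(view_feats) - 1, n_clusters).astype(int)
    centers = view_feats[idx].astype(float)          # deterministic init
    for _ in range(n_iter):
        d2 = ((view_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)                   # nearest-center labels
        for c in range(n_clusters):
            if np.any(assign == c):
                centers[c] = view_feats[assign == c].mean(axis=0)
    keep = []
    for c in range(n_clusters):
        members = np.where(assign == c)[0]
        if len(members):
            d = ((view_feats[members] - centers[c]) ** 2).sum(-1)
            keep.append(int(members[d.argmin()]))
    return sorted(keep)

# six "views" forming two redundant groups
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.05, 0.0],
                  [4.0, 4.0], [4.1, 4.0], [3.9, 4.0]])
kept = representative_views(feats)
```

With the toy features above, one view survives from each redundant group, which is exactly the pruning effect the clustering step is after.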
Taking text classification as an example, in Step 1 each text is characterized by bag-of-words (BOW) features, and the text distribution then serves as the distribution of the text data. Specifically, the stop words are first removed from all texts, and all remaining non-stop words are embedded into a word vector (word2vec) space; that is, the vector representation of each word is learned by a three-layer neural network (the word2vec model). Each text then yields a normalized bag-of-words (nBOW) vector, which reflects the frequency with which each non-stop word appears in the text. In this way, feature extraction is achieved for the text classification task.
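A normalized bag-of-words vector can be sketched as follows; the stop-word list and vocabulary are toy assumptions, and in the method each remaining word would additionally be embedded with word2vec:

```python
import numpy as np

STOPWORDS = {"the", "a", "is", "of", "and", "to"}

def nbow(text, vocab):
    """Normalized bag-of-words: counts of each non-stop word in `vocab`,
    normalized so the vector sums to 1 (a distribution over the vocabulary)."""
    tokens = [w for w in text.lower().split() if w not in STOPWORDS]
    counts = np.array([tokens.count(w) for w in vocab], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

vocab = ["cat", "dog", "mat"]
v = nbow("the cat sat on the mat and the cat slept", vocab)
```

Here "cat" appears twice and "mat" once among the vocabulary words, so the vector is (2/3, 0, 1/3).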
It should be noted that the system of the present invention does not require a specific feature extraction procedure or method; this means other features can also be used in the system, and CNN features and normalized bag-of-words vectors are merely examples.
Step 2: the extracted features are whitened.
After the multi-view features are extracted from the raw data, they are projected into a common low-dimensional space and whitened, reducing the redundancy of the extracted multi-view features and removing the correlation between different sample features, after which the transformed data are mapped back to the original space.
Taking 3D object classification as an example, for each view of an object A, all views are first concatenated and projected into a common low-dimensional space with PCA (principal component analysis); the projected features are then whitened, and finally the different views of the transformed data are separated again and mapped back to the original space.
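The PCA projection plus whitening can be sketched as follows; the dimensionalities and random data are illustrative, and the mapping back to the original space described above is omitted:

```python
import numpy as np

def pca_whiten(X, n_components, eps=1e-8):
    """Project row vectors in X onto `n_components` principal directions and
    rescale each direction to unit variance (whitening), so the projected
    features are decorrelated."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    w, V = np.linalg.eigh(cov)
    order = np.argsort(w)[::-1][:n_components]   # largest-variance directions
    W = V[:, order] / np.sqrt(w[order] + eps)    # combined project + whiten
    return Xc @ W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
Z, W = pca_whiten(X, n_components=3)
```

After whitening, the sample covariance of the projected features is (up to the small eps term) the identity, which is the decorrelation property the module relies on.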
Step 3: the processed multi-view features are modeled and characterized.
The system models each sample as a multi-dimensional Gaussian distribution and assumes that any two Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and the covariance matrix of its Gaussian distribution. For example, in an embodiment of the present invention, each 3D object and each text is modeled as a multi-dimensional Gaussian distribution, and all 3D objects and texts share the same covariance matrix.
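This modeling step can be sketched as follows; pooling all views of all samples to estimate the shared covariance, and the small ridge term that keeps it positive definite, are illustrative choices not spelled out in the text:

```python
import numpy as np

def shared_covariance(all_view_feats, ridge=1e-6):
    """Pool every view of every sample into one covariance estimate; a small
    ridge keeps the matrix positive definite even for degenerate data."""
    stacked = np.vstack(all_view_feats)
    cov = np.cov(stacked.T)
    return cov + ridge * np.eye(cov.shape[0])

def gaussian_model(view_feats, shared_cov):
    """Represent one sample by the mean of its per-view features together
    with the covariance matrix shared across all samples."""
    mu = view_feats.mean(axis=0)
    return mu, shared_cov

samples = [np.array([[1.0, 2.0], [1.2, 1.8]]),
           np.array([[3.0, 0.0], [2.8, 0.2]])]
cov = shared_covariance(samples)
mu0, _ = gaussian_model(samples[0], cov)
```

Each sample is thus reduced to a mean vector, with one covariance matrix serving every sample, matching the shared-covariance assumption above.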
Step 4: a number of triplets are selected as training data from the labeled training data, the distribution of the training data being the feature distribution obtained by modeling the samples in Step 3.
As shown in Fig. 2, computing the loss over all triplets is expensive (since the training data are triplets, the complexity is O(n³)), so not all triplets need to be calculated. For each sample in the training data set, the data classification system based on KL divergence optimization chooses k_i samples from the same class and k_g samples from different classes for training; k_i and k_g are both hyperparameters.
Taking 3D object classification as an example, for each object in the training set, the k_i objects from the same class most similar to it (i.e., with the smallest KL divergence to it) and k_g objects from different classes are selected, since the gradient computed in the subsequent optimization is updated accordingly.
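This selection step can be sketched as follows, with a hand-made divergence matrix standing in for the real pairwise KL divergences and k_i = k_g = 2 as toy hyperparameter values:

```python
import numpy as np

def select_triplet_neighbors(dists, labels, anchor, ki=2, kg=2):
    """For one anchor sample, pick the ki same-class samples with the
    smallest divergence to it and the kg nearest different-class samples.
    `dists` is a full pairwise divergence matrix."""
    labels = np.asarray(labels)
    n = len(labels)
    same = np.where((labels == labels[anchor]) & (np.arange(n) != anchor))[0]
    diff = np.where(labels != labels[anchor])[0]
    same_pick = same[np.argsort(dists[anchor, same])[:ki]]
    diff_pick = diff[np.argsort(dists[anchor, diff])[:kg]]
    return [int(i) for i in same_pick], [int(i) for i in diff_pick]

# toy divergence matrix for 5 samples; class 0 = {0,1,2}, class 1 = {3,4}
D = np.array([[0.0, 1.0, 2.0, 9.0, 8.0],
              [1.0, 0.0, 1.5, 7.0, 6.0],
              [2.0, 1.5, 0.0, 5.0, 4.0],
              [9.0, 7.0, 5.0, 0.0, 1.0],
              [8.0, 6.0, 4.0, 1.0, 0.0]])
labels = [0, 0, 0, 1, 1]
same_pick, diff_pick = select_triplet_neighbors(D, labels, anchor=0)
```

For anchor 0, the two closest same-class samples are 1 and 2, and the two different-class samples are returned in order of increasing divergence.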
Step 5: with the selected triplets as training data, feature mapping is performed, that is, a linear mapping A is applied to all mean vectors, mapping the original data features to a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger.
Most existing systems of the same type measure the similarity of two samples by directly computing the KL divergence between them. To better distinguish samples of different classes, the data classification system based on KL divergence optimization makes the following improvement: a linear mapping A is applied to all mean vectors, i.e., every μ_i is replaced by Aμ_i, mapping the original data features to a new feature space. The objective function is shown in Fig. 2.
Taking 3D object classification as an example, after the feature mapping a 3D object follows the Gaussian distribution θ_i = g(x; Aμ_i; Σ_i), where A is the learned linear mapping, θ_i denotes the i-th object, g denotes a Gaussian distribution, Σ denotes the covariance matrix, and μ denotes the mean vector.
Step 6: the similarity between multi-view samples is calculated, i.e., the optimized KL divergence computed in the mapped feature space measures the similarity of multi-view samples. Note that the optimized KL divergence here is continuously updated, i.e., the similarity between multi-view samples is continuously updated.
In an embodiment of the present invention, each text is modeled as a multi-dimensional Gaussian distribution, so the similarity between two texts is measured by the KL divergence between them. The original KL divergence is expressed as
D_KL(P1‖P2) = 1/2 [ log(det Σ2 / det Σ1) + tr(Σ2⁻¹Σ1) + (μ2 − μ1)ᵀ Σ2⁻¹ (μ2 − μ1) − n ],
where D_KL(P1‖P2) denotes the KL divergence between samples P1 and P2, log denotes the natural logarithm, det denotes the determinant of a matrix, n denotes the feature dimension of a sample, tr denotes the trace of a matrix, Σ denotes a covariance matrix, and μ denotes a mean vector. The method assumes that any two Gaussian distributions share the same covariance matrix, i.e., Σ1 = Σ2 = Σ, so the above formula simplifies to
D_KL(P1‖P2) = 1/2 (μ2 − μ1)ᵀ Σ⁻¹ (μ2 − μ1).
In each iteration, the learned mapping is continuously updated; the optimized KL divergence in the mapped feature space then measures the similarity of two samples and is expressed as
K_A(P1‖P2) = 1/2 (μ2 − μ1)ᵀ Aᵀ Σ⁻¹ A (μ2 − μ1),
where K_A(P1‖P2) denotes the optimized KL divergence measure between samples P1 and P2, Σ denotes the covariance matrix, μ denotes a mean vector, and A is the learned linear mapping.
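These divergence formulas can be checked numerically with a small sketch; the toy means, identity covariance, and the function names are illustrative:

```python
import numpy as np

def kl_full(mu1, cov1, mu2, cov2):
    """Full KL divergence between two n-dimensional Gaussians."""
    n = len(mu1)
    cov2_inv = np.linalg.inv(cov2)
    d = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(cov2) / np.linalg.det(cov1))
                  + np.trace(cov2_inv @ cov1) + d @ cov2_inv @ d - n)

def kl_shared(mu1, mu2, cov_inv):
    """Shared-covariance case: the log-det and trace terms cancel,
    leaving half a Mahalanobis distance between the means."""
    d = mu2 - mu1
    return 0.5 * float(d @ cov_inv @ d)

def kl_mapped(mu1, mu2, cov_inv, A):
    """Optimized divergence K_A: means are first mapped by the learned A."""
    return kl_shared(A @ mu1, A @ mu2, cov_inv)

mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
cov_inv = np.linalg.inv(np.eye(2))
base = kl_shared(mu1, mu2, cov_inv)
mapped = kl_mapped(mu1, mu2, cov_inv, 2.0 * np.eye(2))
```

With equal covariances the full formula collapses to the shared-covariance one, and scaling the means by A = 2I multiplies the divergence by four, as the quadratic form predicts.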
Step 7: with the selected triplets as training data, optimization is performed based on the KL divergence; the feature mapping module, the multi-view sample similarity calculation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping.
As shown in Fig. 3, the data classification system based on KL divergence optimization models the problem as a minimization problem on the manifold of positive definite matrices. In the concrete execution, the optimized KL divergences between all pairs of samples in the training data (with the linear mapping A applied) are first computed in Step 6 and placed into different groups according to whether the classes are the same. The first term of the objective function is the sum of the KL divergences of all same-class sample pairs, and the second term is the sum of the KL divergences of all different-class sample pairs. Here and below, all KL divergences mentioned are the optimized KL divergences.
In addition, unlike existing systems of the same type, the data classification system based on KL divergence optimization does not use a hinge loss function, but instead uses a new positive parameter γ to balance the influence of same-class and different-class data. According to the distribution characteristics of the data set, in this embodiment γ is set to γ = 1 − e^(−D̄_KL), where D̄_KL is the average KL divergence over the entire training data set. As the distance between same-class and different-class data grows, i.e., as different-class data move further and further apart, the entropy of the system becomes very large because the data become uniform, and γ then tends to 1 (D̄_KL becomes very large, e^(−D̄_KL) tends to 0, so γ tends to 1). The modified triplet constraint, i.e., γ, therefore captures the classification characteristics of the data well. In addition, λ in the objective function is a parameter balancing the loss term and the regularization term, with a value between 0 and 1; n denotes the number of samples; and K_A(θ_i‖θ_j) denotes the KL divergence measure under the mapping between samples θ_i and θ_j.
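Assuming the reconstruction γ = 1 − e^(−D̄_KL) above and a pull/push form of the triplet objective, the balancing behavior can be sketched as follows; the pair bookkeeping, values, and function names are toy assumptions:

```python
import numpy as np

def gamma_from_mean_kl(mean_kl):
    """gamma tends to 1 as the average divergence of the training set grows,
    and toward 0 for tightly packed data."""
    return 1.0 - np.exp(-mean_kl)

def triplet_objective(K, same_pairs, diff_pairs, gamma, lam, reg):
    """Pull same-class pairs together, push different-class pairs apart
    (weighted by gamma), plus a regularization term weighted by lam."""
    pull = sum(K[p] for p in same_pairs)
    push = sum(K[p] for p in diff_pairs)
    return pull - gamma * push + lam * reg

K = {(0, 1): 0.5, (0, 2): 3.0}      # toy optimized divergences K_A
g = gamma_from_mean_kl(2.0)
loss = triplet_objective(K, same_pairs=[(0, 1)], diff_pairs=[(0, 2)],
                         gamma=g, lam=0.1, reg=1.0)
```

Minimizing this loss shrinks same-class divergences and grows different-class ones, with γ controlling how strongly the push term counts.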
Besides the modified triplet constraint, the data classification system based on KL divergence optimization uses a new regularizer to prevent overfitting, which frequently occurs in such data classification systems, especially in high dimensions. To preserve the local topology of the input space, the system designs a regularizer based on the local topological structure, characterized by local neighborhoods satisfying local smoothness, and adds it to the objective function, namely r(A) = Σ_i β_i Σ_{j∈N_i} S_ij K_A(θ_i‖θ_j), where β_i belongs to the positive real domain and in fact denotes the density function p(X_i) of the input X_i. The data classification system based on KL divergence optimization estimates the density function p(X_i) with the Parzen window method (kernel density estimation).
Here k_h is a Gaussian kernel function, and h denotes the kernel width. The kernel width controls the influence of the spacing between samples; h is typically set to 0.4. N_i is the neighbor index set of the kernel center x_i, and the length of N_i is taken to be 3.
S_ij denotes the similarity between two samples and is computed with a Gaussian kernel of the form S_ij = exp(−D_ij² / (2σ²)), where σ = minD + (1/v)(maxD − minD), and maxD and minD denote respectively the maximum and minimum pairwise KL divergence over all samples. v is a control parameter, set to 10 in the present system. D_ij denotes the KL divergence between sample i and sample j. Note that S_ij is computed from the initial KL divergences of the training data and is not updated afterwards; that is, S_ij lives in the original feature space of the data. K_A(θ_i||θ_j) denotes the KL divergence measure between sample θ_i and sample θ_j under the mapping.
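The similarity computation can be sketched as follows. The exact Gaussian form is an assumption; the bandwidth σ = minD + (1/v)(maxD − minD) follows the definition in the text, and the names are illustrative:

```python
import numpy as np

def similarity_matrix(D, v=10):
    """Similarities from the initial pairwise KL divergence matrix D
    (n x n, symmetric, zero diagonal). sigma interpolates between the
    smallest and largest off-diagonal divergence, controlled by v."""
    off = D[~np.eye(len(D), dtype=bool)]  # off-diagonal divergences
    min_d, max_d = off.min(), off.max()
    sigma = min_d + (max_d - min_d) / v
    return np.exp(-D ** 2 / (2 * sigma ** 2))
```

Because D is fixed to the initial divergences, this matrix is computed once before training and never refreshed, exactly as the text specifies.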
Furthermore, to solve the minimization problem on the manifold, the present invention does not use the traditional gradient descent method but designs an intrinsic gradient descent algorithm. As shown in Figure 3, after projecting the gradient of the objective function onto the tangent space of the manifold, the method performs Riemannian gradient descent on the manifold of SPD matrices (symmetric positive definite matrices) endowed with the affine-invariant Riemannian metric. After symmetrization in each iteration of the optimization, the manifold structure of the learned linear mapping is retained, thereby guaranteeing the symmetric positive definiteness of the learned optimal KL divergence metric. The update formula of the intrinsic gradient descent optimization algorithm is A_{t+1} = A_t^{1/2} Exp(−α A_t^{−1/2} ∇f(A_t) A_t^{−1/2}) A_t^{1/2}, where Exp denotes the exponential function with the natural constant e as base (here, the matrix exponential), f(A_t) denotes the objective function after t iterations of the linear mapping, ∇f(A_t) denotes the corresponding gradient, and α denotes the learning rate.
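One intrinsic gradient step on the SPD manifold can be sketched as follows, assuming the standard affine-invariant-metric update A_{t+1} = A_t^{1/2} Exp(−α A_t^{−1/2} ∇f A_t^{−1/2}) A_t^{1/2}; the matrix square root and exponential are evaluated via eigendecomposition, and all names are illustrative:

```python
import numpy as np

def sym_funcm(S, f):
    """Apply a scalar function f to a symmetric matrix S through its
    eigendecomposition (valid because S is symmetric)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(f(w)) @ V.T

def spd_gradient_step(A, grad, alpha):
    """One Riemannian gradient-descent step on the SPD manifold under
    the affine-invariant metric. The gradient is symmetrized first, so
    the next iterate stays symmetric positive definite."""
    G = 0.5 * (grad + grad.T)                       # symmetrization
    A_half = sym_funcm(A, np.sqrt)                  # A^{1/2}
    A_inv_half = sym_funcm(A, lambda w: 1.0 / np.sqrt(w))  # A^{-1/2}
    inner = -alpha * A_inv_half @ G @ A_inv_half    # tangent direction
    return A_half @ sym_funcm(0.5 * (inner + inner.T), np.exp) @ A_half
```

Because the update is a congruence of a matrix exponential, the iterate never leaves the SPD manifold, which is what guarantees the learned metric remains symmetric positive definite.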
The feature mapping process, the multi-view sample similarity computation process, and the optimization process are repeated until convergence; the linear mapping A at that point is the learned optimal linear mapping.
Step 8: map the original data features into the new feature space using the optimal linear mapping learned in step 7, and classify the test-set samples in the new feature space with a k-nearest-neighbor (KNN) classifier;
In the classification module, the data classification system based on KL divergence optimization uses the k-nearest-neighbor algorithm. First, the KL divergence between each sample of the test set and each sample of the training set is computed. A priority queue of size k, ordered by distance from largest to smallest, is maintained to store the nearest-neighbor training tuples. k tuples are chosen at random from the training tuples as the initial nearest neighbors; the distance from the test tuple to each of these k tuples is computed, and the training tuple labels and distances are stored in the priority queue. If most of the k most similar samples of a sample in feature space (i.e. its nearest neighbors in feature space) belong to some class, the sample is judged to belong to that class as well.
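The priority-queue-based nearest-neighbor vote described above can be sketched with a bounded max-heap (distances negated so the heap root is the current farthest kept neighbor), assuming the test-to-training divergences are precomputed; the names are illustrative:

```python
import heapq
from collections import Counter

def knn_classify(test_dists, train_labels, k=5):
    """Majority vote over the k nearest training samples.
    test_dists[j] is the (KL) divergence from the test sample to
    training sample j. A heap of size k, keyed on negated distance,
    mirrors the largest-to-smallest priority queue in the text."""
    heap = []  # entries: (-distance, label)
    for d, lab in zip(test_dists, train_labels):
        if len(heap) < k:
            heapq.heappush(heap, (-d, lab))
        elif -heap[0][0] > d:  # new sample is closer than the farthest kept
            heapq.heapreplace(heap, (-d, lab))
    votes = Counter(lab for _, lab in heap)
    return votes.most_common(1)[0][0]
```

Keeping the heap bounded at k makes each test sample cost O(n log k) over n training samples, rather than sorting all n divergences.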
In this embodiment, the data classification system based on KL divergence optimization has been tested on two classes of tasks, 3D object recognition and text classification. As shown in Figure 4, its classification accuracy is higher than that of existing systems.
As shown in Figures 4 to 6, the abscissa indicates that 20%, 30%, 40%, or 50% of the original data set is used as training data, and the ordinate indicates the classification accuracy obtained for each proportion of training data.
The classification system of the present invention is labeled KLD-M in the figures. It can be seen from the figures that, compared with eight mainstream data classification systems, the data classification system based on KL divergence optimization achieves higher classification accuracy than the existing systems. The eight mainstream data classification systems are, respectively, covariance discriminative learning with partial least squares (CDL_PLS), Bhattacharyya distance (BD), covariance discriminative learning with linear discriminant analysis (CDL_LDA), manifold discriminant analysis (MDA), projection metric learning (PML), Log-Euclidean metric learning (LEML), Earth Mover's Distance (EMD), and the traditional KL divergence (KLD).
As shown in Figure 5, the abscissa indicates the range of variation of λ, and the ordinate indicates the classification accuracy obtained for each value of λ. It can be seen from the figure that the data classification system based on KL divergence optimization maintains good performance as the parameter varies, which demonstrates its robustness.
It should be pointed out that those skilled in the art can make several improvements and modifications to the present invention without departing from the principle of the present invention, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (7)
1. A data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity computation module, a KL divergence-based optimization module, and a classification module under the KL divergence metric of the optimal linear mapping, characterized in that:
the feature extraction module is used to extract multi-view features of the original data, the original data including original images and text data;
the feature whitening module projects the multi-view features extracted by the feature extraction module uniformly into the same low-dimensional space and then whitens the features, reducing the redundancy of the multi-view features extracted from the original data and removing the correlation between different sample features, and then maps the transformed data back to the original space;
the multi-view feature modeling module is used to model and characterize the multi-view features processed by the feature whitening module;
the training data selection module is used to select a certain number of triples from the labeled training data for model training;
after the training data are selected, the feature mapping module generates a projection matrix and maps the original data features into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger; then, the multi-view sample similarity computation module measures the similarity of the multi-view samples by the optimized KL divergence computed in the new feature space;
the KL divergence-based optimization module is used to model the optimization problem of the KL divergence as a minimization problem on the manifold of the positive definite matrix group; the feature mapping module, the multi-view data similarity computation module, and the KL divergence-based optimization module train the model repeatedly until convergence, so as to learn the optimal linear mapping;
the classification module under the KL divergence metric of the optimal linear mapping is used to map the original data features into the new feature space using the learned optimal linear mapping, and, based on the KL divergences between the test set and the training set, to classify the test-set samples with a k-nearest-neighbor classifier.
2. The data classification system based on KL divergence optimization according to claim 1, characterized in that: each sample is modeled as a Gaussian distribution, and two Gaussian distributions are assumed to have the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and the covariance matrix of the Gaussian distribution.
3. The data classification system based on KL divergence optimization according to claim 1, characterized in that: a triple contains samples belonging to the same class of object and samples belonging to different classes of objects.
4. A data classification method implemented with the data classification system based on KL divergence optimization according to claim 1, characterized in that the data classification method comprises:
step 1: performing feature extraction on the original data, i.e. extracting multi-view features of the original data from original images and text data;
step 2: whitening the extracted features, i.e. after the multi-view features are extracted from the original data, projecting them uniformly into the same low-dimensional space and whitening them, reducing the redundancy of the multi-view features extracted from the original data and removing the correlation between different sample features, and then mapping the transformed data back to the original space;
step 3: modeling and characterizing the processed multi-view features;
step 4: selecting a certain number of triples from the labeled training data as training data, the training data being distributed according to the feature distribution of the samples formed in step 3;
step 5: using the selected triples as training data, performing feature mapping, i.e. applying a linear mapping to all mean vectors to map the original data features into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger;
step 6: computing the similarity between multi-view samples, i.e. measuring the similarity of the multi-view samples by the optimized KL divergence computed in the mapped new feature space;
step 7: using the selected triples as training data, optimizing based on the KL divergence; training the model repeatedly with the feature mapping module, the multi-view data similarity computation module, and the KL divergence-based optimization module until convergence, thereby learning the optimal linear mapping;
step 8: mapping the original data features into the new feature space using the optimal linear mapping learned in step 7, and classifying the test-set samples in the new feature space with the KL divergence-based k-nearest-neighbor classifier.
5. The data classification method according to claim 4, characterized in that: in step 7, a positive parameter γ is used to balance the influence of same-class data and different-class data; and the parameter γ is set to γ = 1/(1 + e^(−D̄)), where D̄ is the average KL divergence of the entire training data set.
6. The data classification method according to claim 4, characterized in that: in step 7, after the gradient of the objective function is projected onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of symmetric positive definite matrices endowed with the affine-invariant Riemannian metric, and after symmetrization in each iteration of the optimization the manifold structure of the learned linear mapping is retained.
7. The data classification method according to claim 4, characterized in that: in step 8, the KL divergence between each sample of the test set and each sample of the training set is computed; a priority queue of size k, ordered by distance from largest to smallest, is maintained to store the nearest-neighbor training tuples; k tuples are chosen at random from the training tuples as the initial nearest-neighbor tuples, the distance from the test tuple to each of the k tuples is computed, and the training tuple labels and distances are stored in the priority queue.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811540690.4A CN109615014B (en) | 2018-12-17 | 2018-12-17 | KL divergence optimization-based 3D object data classification system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109615014A true CN109615014A (en) | 2019-04-12 |
CN109615014B CN109615014B (en) | 2023-08-22 |
Family
ID=66009466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811540690.4A Active CN109615014B (en) | 2018-12-17 | 2018-12-17 | KL divergence optimization-based 3D object data classification system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109615014B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090171870A1 (en) * | 2007-12-31 | 2009-07-02 | Yahoo! Inc. | System and method of feature selection for text classification using subspace sampling |
US20130064444A1 (en) * | 2011-09-12 | 2013-03-14 | Xerox Corporation | Document classification using multiple views |
CN105574548A (en) * | 2015-12-23 | 2016-05-11 | 北京化工大学 | Hyperspectral data dimensionality-reduction method based on sparse and low-rank representation graph |
CN106126474A (en) * | 2016-04-13 | 2016-11-16 | 扬州大学 | A kind of linear classification method embedded based on local spline |
CN106951914A (en) * | 2017-02-22 | 2017-07-14 | 江苏大学 | The Electronic Nose that a kind of Optimization of Fuzzy discriminant vectorses are extracted differentiates vinegar kind method |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223275A (en) * | 2019-05-28 | 2019-09-10 | 陕西师范大学 | A kind of cerebral white matter fiber depth clustering method of task-fMRI guidance |
CN110118657B (en) * | 2019-06-21 | 2021-06-11 | 杭州安脉盛智能技术有限公司 | Rolling bearing fault diagnosis method and system based on relative entropy and K nearest neighbor algorithm |
CN110118657A (en) * | 2019-06-21 | 2019-08-13 | 杭州安脉盛智能技术有限公司 | Based on relative entropy and K nearest neighbor algorithm Fault Diagnosis of Roller Bearings and system |
CN112149699B (en) * | 2019-06-28 | 2023-09-05 | 北京京东尚科信息技术有限公司 | Method and device for generating model and method and device for identifying image |
CN112149699A (en) * | 2019-06-28 | 2020-12-29 | 北京京东尚科信息技术有限公司 | Method and device for generating model and method and device for recognizing image |
CN112949296B (en) * | 2019-12-10 | 2024-05-31 | 医渡云(北京)技术有限公司 | Word embedding method and device based on Riemann space, medium and equipment |
CN112949296A (en) * | 2019-12-10 | 2021-06-11 | 医渡云(北京)技术有限公司 | Riemann space-based word embedding method and device, medium and equipment |
CN111259938B (en) * | 2020-01-09 | 2022-04-12 | 浙江大学 | Manifold learning and gradient lifting model-based image multi-label classification method |
CN111259938A (en) * | 2020-01-09 | 2020-06-09 | 浙江大学 | Manifold learning and gradient lifting model-based image multi-label classification method |
CN111738351A (en) * | 2020-06-30 | 2020-10-02 | 创新奇智(重庆)科技有限公司 | Model training method and device, storage medium and electronic equipment |
CN111738351B (en) * | 2020-06-30 | 2023-12-19 | 创新奇智(重庆)科技有限公司 | Model training method and device, storage medium and electronic equipment |
CN113095731A (en) * | 2021-05-10 | 2021-07-09 | 北京人人云图信息技术有限公司 | Flight regulation and control method and system based on passenger flow time sequence clustering optimization |
CN113298731A (en) * | 2021-05-24 | 2021-08-24 | Oppo广东移动通信有限公司 | Image color migration method and device, computer readable medium and electronic equipment |
CN113688773A (en) * | 2021-09-03 | 2021-11-23 | 重庆大学 | Storage tank dome displacement data restoration method and device based on deep learning |
CN113688773B (en) * | 2021-09-03 | 2023-09-26 | 重庆大学 | Storage tank dome displacement data restoration method and device based on deep learning |
CN113655385B (en) * | 2021-10-19 | 2022-02-08 | 深圳市德兰明海科技有限公司 | Lithium battery SOC estimation method and device and computer readable storage medium |
CN113655385A (en) * | 2021-10-19 | 2021-11-16 | 深圳市德兰明海科技有限公司 | Lithium battery SOC estimation method and device and computer readable storage medium |
CN113887661A (en) * | 2021-10-25 | 2022-01-04 | 济南大学 | Image set classification method and system based on representation learning reconstruction residual analysis |
CN114882262A (en) * | 2022-05-07 | 2022-08-09 | 四川大学 | Multi-view clustering method and system based on topological manifold |
CN114882262B (en) * | 2022-05-07 | 2024-01-26 | 四川大学 | Multi-view clustering method and system based on topological manifold |
CN114662620A (en) * | 2022-05-24 | 2022-06-24 | 岚图汽车科技有限公司 | Automobile endurance load data processing method and device for market users |
CN116687406A (en) * | 2023-05-06 | 2023-09-05 | 粤港澳大湾区精准医学研究院(广州) | Emotion recognition method and device, electronic equipment and storage medium |
CN116687406B (en) * | 2023-05-06 | 2024-01-02 | 粤港澳大湾区精准医学研究院(广州) | Emotion recognition method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||