CN109615014A - Data classification system and method based on KL divergence optimization - Google Patents

Data classification system and method based on KL divergence optimization

Info

Publication number
CN109615014A
Authority
CN
China
Prior art keywords
data
feature
divergence
sample
module
Prior art date
Legal status
Granted
Application number
CN201811540690.4A
Other languages
Chinese (zh)
Other versions
CN109615014B
Inventor
高跃
吉书仪
赵曦滨
黄晋
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201811540690.4A
Publication of CN109615014A
Application granted
Publication of CN109615014B
Legal status: Active
Anticipated expiration


Classifications

    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification (G06F18/00 Pattern recognition; G06F18/24 Classification techniques; G06F18/2413 techniques based on distances to training or reference patterns)
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods (G06F18/21 Design or setup of recognition systems or techniques)
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y02T10/40 Engine management systems (Y02T Climate change mitigation technologies related to transportation)


Abstract

The present invention relates to a method for classifying data based on KL divergence optimization. Raw data such as images and text are preprocessed, and each object is modeled as a multi-dimensional distribution. A number of triplets are selected from the labeled training data for model training. Using these triplets as training data, a linear mapping A is applied to all mean vectors, and the optimal linear mapping is learned by iterative optimization; the learning follows the basic assumption of metric learning, i.e. the distance between same-class samples should shrink while the distance between different-class samples should grow. Optimization uses an intrinsic gradient descent algorithm: after projecting the gradient of the objective function onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of SPD matrices under an affine-invariant Riemannian metric. Finally, the KL divergence between the test set and the training set is computed, and samples are classified with a k-nearest-neighbor (KNN) classifier. The method effectively improves the classification accuracy of the system and gives more stable performance.

Description

Data classification system and method based on KL divergence optimization
Technical field
The invention belongs to the field of machine learning, and in particular relates to a system and method for classifying data based on KL divergence optimization.
Background technique
With the development of information technology, data classification has increasingly become a research hotspot in both academia and industry. Data classification refers to the process of automatically determining the category of data from its content under a given classification scheme; the technology covers many applications, such as image classification, text classification, and speech classification. A good classifier facilitates later use of the data. For example, after preliminary text classification, the results can be applied in many fields: text filtering, automatic categorization of Web documents, the organization and management of digital libraries, word-sense disambiguation, and document semantics.
In machine learning, data objects are usually modeled as multi-dimensional distributions to characterize their features. During classification, how to measure the similarity between two data distributions therefore becomes a key problem: the higher the similarity between two samples, the more likely they belong to the same class. Common measures between probability distributions include the Jensen-Shannon divergence, the Earth Mover's Distance (EMD), and Maximum Mean Discrepancy. Among these, the Kullback-Leibler divergence (KL divergence), also known as relative entropy, is one of the most widely used measures of similarity between two probability distributions, with applications in many fields such as computer vision and pattern recognition. The KL divergence expresses the information lost when a probability distribution Q is used to approximate the true distribution P, and can therefore measure the similarity between data distributions well.
In the real world, however, data sources are extremely complex and data quality cannot be guaranteed. Collected datasets may suffer from noise, missing values, imbalanced distributions, and intermixed data points that are hard to separate. Under such circumstances, the traditional KL divergence in Euclidean space has difficulty accurately measuring the similarity between two data samples, which in turn severely affects the accuracy of data classification. In other words, the traditional KL divergence does not always yield an optimal data representation.
Existing research on the KL divergence has largely focused on two directions: first, using the KL divergence directly as a measure between multi-dimensional distributions, as in variational autoencoders; second, deriving a more effective measure by approximating the KL divergence, e.g. via a variational upper bound or Monte Carlo approximation. Most existing work thus applies the KL divergence directly, and little research has addressed the optimization of the KL divergence itself.
Summary of the invention
The object of the present invention is to propose a data classification system and method based on KL divergence optimization. The system and method optimize the traditional KL divergence and learn an optimal representation of the data, which effectively improves the classification ability of existing systems and gives more stable performance.
The technical solution of the present invention provides a data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity computation module, a KL-divergence-based optimization module, and a classification module based on the KL divergence measure under the optimal linear mapping, characterized in that:
The feature extraction module extracts multi-view features of the raw data, including raw images and text data;
The feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space, applies a whitening transform to the projected features so as to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then maps the transformed data back to the original space;
The multi-view feature modeling module models and characterizes the whitened multi-view features;
The training data selection module selects a number of triplets from the labeled training data for model training;
After the training data are selected, the feature mapping module generates a projection matrix that maps the original data features into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger; the multi-view sample similarity computation module then measures the similarity of multi-view samples by computing the optimized KL divergence in the new feature space;
The KL-divergence-based optimization module models the optimization of the KL divergence as a minimization problem on the manifold of positive definite matrices; the feature mapping module, the multi-view sample similarity computation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping;
The classification module based on the KL divergence measure under the optimal linear mapping maps the original data features into the new feature space using the learned optimal linear mapping, and classifies the test-set samples with a k-nearest-neighbor classifier based on the KL divergence between the test set and the training set.
Further, each sample is modeled as a Gaussian distribution, and any two Gaussian distributions are assumed to share the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.
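To make this modeling step concrete, the following sketch (plain Python; the function name and toy data are illustrative, not from the patent) computes the mean vector and covariance matrix that characterize one multi-view sample:

```python
def gaussian_model(views):
    """Model a multi-view sample as a Gaussian: return the mean vector
    of its view features and the sample covariance matrix."""
    n, d = len(views), len(views[0])
    mean = [sum(v[j] for v in views) / n for j in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in views) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

# toy sample with three 2-D view features
views = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
mean, cov = gaussian_model(views)
```

Under the patent's shared-covariance assumption, the per-sample covariances would additionally be pooled into one common Σ across all samples.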
Further, each triplet contains samples belonging to the same class of object and samples belonging to different classes of objects.
The present invention also provides a data classification method realized by the above data classification system based on KL divergence optimization, comprising:
Step 1: perform feature extraction on the raw data, i.e. extract multi-view features from the raw data, including images and text;
Step 2: whiten the extracted features, i.e. after extracting the multi-view features from the raw data, project them into a common low-dimensional space, apply a whitening transform to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then map the transformed data back to the original space;
Step 3: model and characterize the whitened multi-view features;
Step 4: select a number of triplets from the labeled training data as training data, whose distribution is the feature distribution produced by modeling the samples in Step 3;
Step 5: using the selected triplets as training data, perform feature mapping, i.e. apply a linear mapping to all mean vectors so that the original data features are mapped into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger;
Step 6: compute the similarity between multi-view samples, i.e. measure the similarity of multi-view samples by computing the optimized KL divergence in the new mapped feature space;
Step 7: using the selected triplets as training data, perform KL-divergence-based optimization; the feature mapping module, the multi-view sample similarity computation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping;
Step 8: map the original data features into the new feature space using the optimal linear mapping learned in Step 7, and classify the test-set samples in the new feature space with a k-nearest-neighbor classifier.
Further, in Step 7, a positive parameter γ balances the influence of same-class and different-class data; γ is set as a function of D̄, the average KL divergence of the entire training dataset.
Further, in Step 7, the gradient of the objective function is projected onto the tangent space of the manifold, and Riemannian gradient descent is performed on the manifold of symmetric positive definite (SPD) matrices under an affine-invariant Riemannian metric; after symmetrization, each iteration of the optimization preserves the manifold structure of the learned linear mapping.
Further, in Step 8, the KL divergence between each sample in the test set and each sample in the training set is computed; a priority queue of size k, ordered by descending distance, is maintained to store the nearest-neighbor training tuples; k tuples are randomly chosen from the training tuples as the initial nearest neighbors, the distances from the test tuple to these k tuples are computed, and the labels and distances of the training tuples are stored in the priority queue.
The beneficial effects of the present invention are:
(1) The proposed method and system effectively improve the classification accuracy of the system and give more stable performance. They compensate for real-world problems such as noisy data, missing data, imbalanced data distributions, and intermixed data points that are hard to separate.
(2) The present invention is applicable to the classification of multi-view data.
(3) The method learns an optimal linear mapping from the labeled training data that maps the original data space into a new feature space. In the new feature space, same-class data samples can lie closer together and different-class data samples can lie farther apart. Compared with existing systems of the same type, the present invention effectively improves classification ability and gives more stable performance.
Detailed description of the invention
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a schematic diagram of the method of the present invention;
Fig. 3 is an illustration of the intrinsic gradient descent algorithm used in the method of the present invention;
Fig. 4 compares the classification accuracy of this system with other systems on a 3D object recognition task; the test data is the NTU16 dataset;
Fig. 5 compares the classification accuracy of this system with other systems on a 3D object recognition task; the test data is the NTU47 dataset;
Fig. 6 compares the classification accuracy of this system with other systems on a text classification task; the test data is the TWITTER dataset;
Fig. 7 shows how the classification accuracy of this system varies with its parameters.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to Figs. 1-5.
As shown in Fig. 1, this embodiment provides a data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity computation module, a KL-divergence-based optimization module, and a classification module based on the KL divergence measure under the optimal linear mapping, wherein:
The feature extraction module extracts multi-view features of the raw data from data such as raw images and text.
The feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space, applies a whitening transform to the projected features so as to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then maps the transformed data back to the original space.
The multi-view feature modeling module models and characterizes the whitened multi-view features.
Specifically, the system models each sample as a Gaussian distribution and assumes that any two Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.
After multi-view feature modeling is complete, the training data selection module selects a number of triplets from the labeled training data for model training.
Each triplet contains samples belonging to the same class of object and samples belonging to different classes, i.e. an input triplet comprises one positive pair and one negative pair. For example, suppose there are two object classes, tables and chairs. For table A, a possible triplet is (table A, table B, chair C): table A and table B form a positive pair, while table A and chair C form a negative pair.
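The table/chair example can be sketched in code as below (hypothetical sample names; enumerating all triplets is shown only for clarity, since the patent later subsamples them for efficiency):

```python
def build_triplets(samples):
    """Enumerate (anchor, positive, negative) triplets from labeled samples:
    the positive shares the anchor's label, the negative does not."""
    triplets = []
    for a, la in samples:
        for p, lp in samples:
            if p != a and lp == la:           # positive pair
                for n, ln in samples:
                    if ln != la:              # negative pair
                        triplets.append((a, p, n))
    return triplets

samples = [('tableA', 'table'), ('tableB', 'table'), ('chairC', 'chair')]
triplets = build_triplets(samples)
```

For the three samples above this yields exactly the two triplets described in the text, one anchored at each table.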
After the training data are selected, the feature mapping module generates a projection matrix that maps the original data features into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger, so that the mapped data are easier to classify.
Once the training data selection is complete, the similarity between data samples must be computed for training. The multi-view sample similarity computation module measures the similarity of multi-view samples by computing the optimized KL divergence in the new mapped feature space.
The KL-divergence-based optimization module models the optimization of the KL divergence as a minimization problem on the manifold of positive definite matrices. To solve this optimization problem, the system uses an intrinsic gradient descent method, i.e. Riemannian gradient descent, with a symmetrization refinement: the iterate B_t produced by the exponential-map update, which involves the matrix exponential exp with base e, the objective function f(A_t) after t iterations of the linear mapping, its gradient ∇f(A_t), and the learning rate α, is replaced by its symmetric part A_{t+1} = (B_t + B_t^T)/2.
In this way, each iteration of the optimization preserves the manifold structure of the learned linear mapping, thereby guaranteeing the symmetric positive definiteness of the learned optimal KL divergence measure.
The feature mapping module, the multi-view sample similarity computation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping.
Overall, the system takes the triplets chosen by the training data selection module as training data and learns an optimal linear mapping by optimizing the KL divergence, so that the mapped data are easier to classify.
The classification module based on the KL divergence measure under the optimal linear mapping maps the original data features into the new feature space using the learned optimal linear mapping, and classifies the test-set samples with a k-nearest-neighbor (KNN) classifier based on the KL divergence between the test set and the training set.
The feature extraction module is the first module of the data classification system based on KL divergence optimization; it extracts features from the raw data. The features are then whitened as preprocessing, and multi-view feature modeling is performed.
After the system completes multi-view feature modeling of the samples, the data classification system based on KL divergence optimization selects a number of triplets from them as training data.
In the training module, the data classification system based on KL divergence optimization first maps the features into a new feature space, then computes the multi-view sample similarity using the optimized KL divergence measure after the mapping, and optimizes with the intrinsic gradient descent algorithm. The system repeats this process until convergence, at which point it has learned an optimal linear mapping. Finally, the classification module classifies samples in the new mapped feature space.
Notably, the method is based on optimizing the KL divergence itself. Moreover, this embodiment does not use the traditional gradient descent method to optimize the classification system, but an intrinsic gradient descent algorithm with a symmetrization refinement, which guarantees the symmetric positive definiteness of the learned linear mapping at every optimization iteration. In addition, the data classification system based on KL divergence optimization can be applied to the classification of multi-view data, whereas many systems of the same type only apply to single-view data.
This embodiment also provides a data classification method based on KL divergence optimization, which includes:
Step 1: perform feature extraction on the raw data.
First, extract multi-view features of the raw data from data such as raw images and text.
Taking 3D object classification as an example, in Step 1 each 3D object is characterized by a group of views from different directions. For each view, a set of convolutional neural network (CNN) features is extracted. In addition to feature extraction, the views are clustered to generate view clusters, from which a group of representative views is chosen, removing potentially redundant views. In this way, each object is characterized by the group of representative views selected from the view clusters, and multi-view feature extraction can be performed for the 3D object classification task.
Taking text classification as an example, in Step 1 each text is characterized by bag-of-words (BOW) features, and the resulting distribution serves as the distribution of the text data. Specifically, all stop words are first identified and removed from the texts, and all remaining non-stop words are embedded into a word-vector (word2vec) space; that is, the vector representation of each word is learned by a three-layer neural network (the word2vec model). Each text then yields a normalized bag-of-words (nBOW) vector reflecting the frequency of each non-stop word in the text. In this way, feature extraction can be performed for the text classification task.
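A minimal sketch of the nBOW step follows (toy stop-word list; a real system would use a standard stop-word lexicon and pair these frequencies with word2vec embeddings, which are omitted here):

```python
from collections import Counter

STOP_WORDS = {'the', 'a', 'of', 'to', 'is'}  # toy stop-word list

def nbow(text):
    """Normalized bag-of-words vector: relative frequency of each
    non-stop word in the text."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

vec = nbow("the cat sat on the mat the cat")
```

Here `vec['cat']` is 2/5, since "cat" appears twice among the five non-stop words.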
It should be noted that the system of the invention does not prescribe a specific feature extraction procedure or method; other features can also be used in the system, and convolutional neural network features and normalized bag-of-words vectors are merely examples.
Step 2: whiten the extracted features.
After extracting the multi-view features from the raw data, project them into a common low-dimensional space, apply a whitening transform to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then map the transformed data back to the original space.
Taking 3D object classification as an example, the views of object A are first concatenated, projected into a common low-dimensional space by PCA (principal component analysis), and whitened after projection; finally the different views of the transformed data are separated again and mapped back to the original space.
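The whitening step can be illustrated as below. The patent uses PCA whitening; this sketch uses Cholesky whitening of 2-D points instead, which is simpler to write out dependency-free yet likewise yields unit covariance and removes feature correlation:

```python
import math

def whiten_2d(data):
    """Cholesky whitening of 2-D points: after the transform the sample
    covariance is the identity, so the two features are decorrelated."""
    n = len(data)
    mean = [sum(p[i] for p in data) / n for i in (0, 1)]
    c = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in data) / n
          for j in (0, 1)] for i in (0, 1)]
    # Cholesky factor L of the covariance, then map x -> L^{-1} (x - mean)
    l00 = math.sqrt(c[0][0])
    l10 = c[1][0] / l00
    l11 = math.sqrt(c[1][1] - l10 * l10)
    out = []
    for p in data:
        y0 = (p[0] - mean[0]) / l00
        y1 = ((p[1] - mean[1]) - l10 * y0) / l11
        out.append((y0, y1))
    return out

# strongly correlated toy features
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
white = whiten_2d(data)
```

After the transform, the sample covariance of `white` is the 2×2 identity, which is exactly the decorrelation property the whitening module relies on.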
Step 3: model and characterize the whitened multi-view features.
The system models each sample as a multi-dimensional Gaussian distribution and assumes that any two Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and covariance matrix of the sample's Gaussian distribution. For example, in an embodiment of the invention, each 3D object and each text is modeled as a multi-dimensional Gaussian distribution, and all 3D objects and all texts share the same covariance matrix.
Step 4: select a number of triplets from the labeled training data as training data, whose distribution is the feature distribution produced by modeling the samples in Step 3.
As shown in Fig. 2, considering the cost of computing the loss (since the training data are triplets, the complexity is O(n³)), it is unnecessary to compute all triplets. For each sample in the training set, the data classification system based on KL divergence optimization chooses k_i samples from the same class and k_g samples from different classes for training, where k_i and k_g are hyperparameters.
Taking 3D object classification as an example, for each object in the training set, the k_i most similar same-class objects (i.e. those with the smallest KL divergence to the object) and k_g objects from different classes are selected, to be used in the gradient computation and updates of the subsequent optimization.
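The neighbor selection can be sketched as follows. The patent specifies choosing the k_i most similar same-class samples; taking the k_g *nearest* different-class samples is an assumption here, since the different-class selection rule is not spelled out:

```python
def select_neighbors(anchor, dists, labels, ki, kg):
    """For sample `anchor`, pick the ki same-class samples with the smallest
    divergence and kg different-class samples (nearest ones, by assumption).
    `dists` is a precomputed pairwise KL-divergence matrix."""
    same = [j for j in range(len(labels))
            if j != anchor and labels[j] == labels[anchor]]
    diff = [j for j in range(len(labels)) if labels[j] != labels[anchor]]
    key = lambda j: dists[anchor][j]
    return sorted(same, key=key)[:ki], sorted(diff, key=key)[:kg]

labels = ['table', 'table', 'table', 'chair', 'chair']
dists = [[0.0, 0.2, 0.5, 1.0, 0.8],
         [0.2, 0.0, 0.3, 0.9, 0.7],
         [0.5, 0.3, 0.0, 0.6, 0.4],
         [1.0, 0.9, 0.6, 0.0, 0.1],
         [0.8, 0.7, 0.4, 0.1, 0.0]]
same, diff = select_neighbors(0, dists, labels, ki=2, kg=1)
```

Each anchor then contributes k_i positive pairs and k_g negative pairs, bringing the triplet count down from O(n³) to O(n·k_i·k_g).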
Step 5: using the selected triplets as training data, perform feature mapping, i.e. apply a linear mapping A to all mean vectors so that the original data features are mapped into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger.
Most existing systems of the same type measure the similarity of two samples by directly computing the KL divergence between them. To better distinguish samples of different classes, the data classification system based on KL divergence optimization makes the following improvement: a linear mapping A is applied to all mean vectors, i.e. every μ_i is replaced by Aμ_i, mapping the original data features into a new feature space. The objective function is shown in Fig. 2.
Taking 3D object classification as an example, a 3D object after feature mapping follows the Gaussian distribution θ_i = g(x; Aμ_i; Σ_i), where A is the learned linear mapping, θ_i denotes the i-th object, g denotes a Gaussian distribution, Σ the covariance matrix, and μ the mean vector.
Step 6: compute the similarity between multi-view samples, i.e. measure the similarity of multi-view samples by computing the optimized KL divergence in the new mapped feature space. Note that the optimized KL divergence is continuously updated, i.e. the similarity between multi-view samples is continuously updated.
In an embodiment of the invention, each text is modeled as a multi-dimensional Gaussian distribution, so the similarity between two texts is measured by the KL divergence between them. The original KL divergence is D_KL(P1||P2) = 1/2 [ log(det(Σ2)/det(Σ1)) − n + tr(Σ2^{-1} Σ1) + (μ2 − μ1)^T Σ2^{-1} (μ2 − μ1) ], where D_KL(P1||P2) denotes the KL divergence between samples P1 and P2, log the natural logarithm, det the matrix determinant, n the feature dimensionality of the samples, tr the matrix trace, Σ the covariance matrix, and μ the mean vector. The method assumes that the two Gaussian distributions share the same covariance matrix, i.e. Σ1 = Σ2 = Σ, so the above simplifies to D_KL(P1||P2) = 1/2 (μ1 − μ2)^T Σ^{-1} (μ1 − μ2). During each iteration, the learned mapping is continuously updated; the similarity of two samples is then measured by the optimized KL divergence in the new mapped feature space, K_A(P1||P2) = 1/2 (μ1 − μ2)^T A^T Σ^{-1} A (μ1 − μ2), where K_A(P1||P2) denotes the optimized KL divergence measure between samples P1 and P2, Σ the covariance matrix, μ the mean vector, and A the learned linear mapping.
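The simplified shared-covariance divergence and its mapped version can be computed directly. The sketch below implements D_KL = 1/2 (μ1−μ2)^T Σ^{-1} (μ1−μ2) and K_A with A applied to the mean difference (toy 2-D inputs, Σ^{-1} supplied precomputed):

```python
def kl_shared_cov(mu1, mu2, cov_inv, A=None):
    """KL divergence between two Gaussians sharing covariance Sigma:
    D = 1/2 d^T Sigma^{-1} d with d = mu1 - mu2; passing a linear map A
    replaces d by A d, giving the learned measure K_A."""
    d = [a - b for a, b in zip(mu1, mu2)]
    if A is not None:
        d = [sum(A[i][j] * d[j] for j in range(len(d))) for i in range(len(A))]
    s = [sum(cov_inv[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return 0.5 * sum(di * si for di, si in zip(d, s))

mu1, mu2 = [1.0, 0.0], [0.0, 0.0]
cov_inv = [[1.0, 0.0], [0.0, 1.0]]  # identity shared covariance
base = kl_shared_cov(mu1, mu2, cov_inv)                              # 1/2 * 1
scaled = kl_shared_cov(mu1, mu2, cov_inv, A=[[2.0, 0.0], [0.0, 2.0]])  # 1/2 * 4
```

With A = 2I the mean difference is stretched by 2, so the divergence grows fourfold, illustrating how the learned A reshapes the metric.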
Step 7, using the triple selected as training data, optimized based on KL divergence;Utilize Feature Mapping mould Block, multiple view data similarity calculation module and the optimization module based on KL divergence are repeated continuously training pattern, until convergence, To learn optimal linear mapping out;
It is flowed as shown in figure 3, problem is modeled as one by the data sorting system based on the optimization of KL divergence in positive definite matrix group Minimization problem in shape.In specific execute, between any two excellent of all samples in training data has been calculated in step 6 first KL divergence (apply Linear Mapping A) after change, and different groups is put into according to whether classification is identical.First item in objective function For the sum of the KL divergence of all samples from the same category, the Section 2 in objective function is all from different classes of sample This sum of KL divergence.Herein and subsequent mentioned all KL divergences be optimization after KL divergence.
In addition, the data sorting system based on the optimization of KL divergence is not using hinge unlike existing homogeneous system Chain loss function (hinge-loss function), but use a new positive parameter γ balance homogeneous data with not Homogeneous data bring influences.According to the characteristic distributions of data set, in the present embodiment, parameter γ is arranged toWherein,It is the average KL divergence of entire training dataset.With homogeneous data and inhomogeneity data Between distance increase, that is, the distance between inhomogeneity data becomes increasingly remoter, the entropy of system can be because data become uniform And become very big, at this time parameter γ can be intended to 1 (Become very big,It is intended to 0, so 1) γ can be intended to.Therefore, it repairs Positive ternary constraint, i.e. γ can interpretive classification well feature.In addition to this, the λ in objective function is one for putting down The parameter of weighing apparatus loss item and regularization term, value is between 0 to 1.The quantity of n representative sample.KAi||θj) indicate sample θiWith Sample θjBetween mapping under KL divergence measurement.
Beyond the modified ternary constraint, the data classification system based on KL divergence optimization uses a new regularization to prevent overfitting. Overfitting often occurs in such data classification systems, and is especially frequent in high dimensions. To preserve the local topology of the input space, the data classification system based on KL divergence optimization designs a regularizer based on the local topological structure, characterized by local neighbors satisfying local smoothness, and adds this regularizer to the objective function. In the regularizer, β_i belongs to the positive real domain and in fact represents the density function p(X_i) of the input X_i. The data classification system based on KL divergence optimization estimates the density function p(X_i) using the Parzen-window method (kernel density estimation).
Here k_h is a Gaussian kernel function, and h denotes the kernel width; the kernel width controls the influence of the spacing between samples, and h is typically set to 0.4. N_i is the neighbor index set centered at x_i, and the length of N_i is taken to be 3. In the standard Parzen-window form, p(X_i) is then estimated as the average of k_h(X_i − X_j) over the neighbors j ∈ N_i.
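The density estimate can be sketched as below. This is a minimal illustration of the standard Parzen-window (kernel density) estimator under the constants stated in the embodiment (Gaussian kernel, h = 0.4, |N_i| = 3); the function names are hypothetical and the exact printed formula in the patent is not preserved in this rendering.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Isotropic Gaussian kernel k_h with bandwidth h, applied row-wise."""
    d = u.shape[-1]
    norm = (2 * np.pi * h**2) ** (-d / 2)
    return norm * np.exp(-np.sum(u**2, axis=-1) / (2 * h**2))

def parzen_density(x_i, neighbors, h=0.4):
    """Parzen-window estimate of p(x_i) from the neighbor set N_i:
    the average kernel response over the neighbors."""
    return float(np.mean(gaussian_kernel(x_i - neighbors, h)))
```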
S_ij denotes the similarity between two samples and is computed with a Gaussian kernel of the divergence D_ij, with bandwidth σ = min D + (1/v)(max D − min D), where max D and min D denote respectively the maximum and minimum pairwise KL divergence among all samples. v is a control parameter, set to 10 in this system. D_ij denotes the KL divergence between sample i and sample j. Note that S_ij is computed using the initial KL divergences of the training data and is never updated afterwards; that is, S_ij lives in the original feature space of the data. K_A(θ_i‖θ_j) denotes the KL divergence between sample θ_i and sample θ_j under the mapping.
Furthermore, to solve the minimization problem on the manifold, the present invention does not use traditional gradient descent but designs an intrinsic gradient descent algorithm. As shown in Fig. 3, the method projects the gradient of the objective function onto the tangent space of the manifold and then performs Riemannian gradient descent on the manifold of SPD matrices (symmetric positive definite matrices) endowed with an affine-invariant Riemannian metric. After symmetrization, the manifold structure of the learned linear mapping is preserved in each iteration of the optimization, thereby ensuring the symmetric positive definiteness of the learned optimal KL divergence metric. The intrinsic gradient descent update takes the form A_{t+1} = Exp_{A_t}(−α∇f(A_t)), where Exp denotes the exponential map (with the natural constant e as base), f(A_t) is the objective function after t iterations of the linear mapping, ∇f(A_t) is the corresponding gradient, and α is the learning rate.
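One intrinsic descent step can be sketched as follows. This is an illustrative implementation of the standard affine-invariant exponential-map update on the SPD manifold, Exp_A(V) = A^{1/2} exp(A^{−1/2} V A^{−1/2}) A^{1/2}, with the Euclidean gradient symmetrized first; the patent's own update formula is an image lost in this rendering, so this sketch is an assumption consistent with the surrounding description, and the helper names are hypothetical.

```python
import numpy as np

def sym_funm(S, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * f(w)) @ Q.T

def spd_step(A, grad, alpha):
    """One intrinsic gradient-descent step on the SPD manifold under the
    affine-invariant metric: symmetrize the Euclidean gradient, then move
    along the exponential map, A_{t+1} = Exp_{A_t}(-alpha * grad)."""
    G = 0.5 * (grad + grad.T)                          # symmetric tangent vector
    half = sym_funm(A, np.sqrt)                        # A^{1/2}
    half_inv = sym_funm(A, lambda w: 1.0 / np.sqrt(w))  # A^{-1/2}
    inner = half_inv @ (-alpha * G) @ half_inv
    inner = 0.5 * (inner + inner.T)                    # guard against round-off
    return half @ sym_funm(inner, np.exp) @ half       # result stays SPD
```

Because the update is a congruence of a matrix exponential, the iterate remains symmetric positive definite at every step, which is exactly the property the text says symmetrization is meant to preserve.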
The feature mapping process, the multi-view sample similarity calculation process, and the optimization process are repeated until convergence; the linear mapping A at that point is the learned optimal linear mapping.
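The outer training loop described above can be sketched generically. Here `grad_fn` (gradient of the objective at A) and `step_fn` (one manifold descent step) are hypothetical callables standing in for the modules the text names; the stopping rule on the change in A is an assumed convergence test, not the patent's stated criterion.

```python
import numpy as np

def train_until_convergence(A0, grad_fn, step_fn, alpha=0.01,
                            max_iter=200, tol=1e-6):
    """Skeleton of the training loop: repeat the mapping / similarity /
    optimization cycle until the linear mapping A stops changing."""
    A = A0
    for _ in range(max_iter):
        A_next = step_fn(A, grad_fn(A), alpha)
        if np.linalg.norm(A_next - A) < tol:   # assumed convergence test
            return A_next
        A = A_next
    return A
```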
Step 8: using the optimal linear mapping learned in step 7, the original data features are mapped into the new feature space, and the test-set samples are classified in the new feature space with a k-nearest-neighbor (KNN) classifier.
In the classification module, the data classification system based on KL divergence optimization uses the k-nearest-neighbor algorithm. First, the KL divergence between each sample of the test set and each sample of the training set is computed. A priority queue of size k, ordered by distance from largest to smallest, is maintained for storing the nearest-neighbor training tuples. k tuples are randomly chosen from the training tuples as the initial nearest neighbors; the distance from the test tuple to each of these k tuples is computed, and the training tuple labels and distances are stored in the priority queue. If the majority of a sample's k most similar samples in the feature space (i.e., its nearest neighbors there) belong to some class, the sample is determined to belong to that class as well.
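The queue-based vote can be sketched as below. This is a minimal illustration, not the patent's code: Python's `heapq` min-heap holds negated distances, so its root is the current worst kept neighbor, playing the role of the largest-to-smallest priority queue the text describes.

```python
import heapq
from collections import Counter

def knn_predict(test_kl_row, train_labels, k=5):
    """Classify one test sample by majority vote among the k training
    samples with the smallest KL divergence to it."""
    heap = []  # entries are (-distance, label); root = worst kept neighbor
    for dist, label in zip(test_kl_row, train_labels):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, label))
        elif -heap[0][0] > dist:               # closer than the worst kept one
            heapq.heapreplace(heap, (-dist, label))
    votes = Counter(label for _, label in heap)
    return votes.most_common(1)[0][0]
```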
In this embodiment, the data classification system based on KL divergence optimization was tested on two kinds of tasks, 3D object recognition and text classification; as shown in Fig. 4, its classification accuracy is higher than that of existing systems.
As shown in Figs. 4-6, the abscissa indicates that 20%, 30%, 40%, or 50% of the original data set is used as training data, and the ordinate indicates the classification accuracy obtained for each training-data proportion.
The classification system of the present invention is labeled KLD-M in the figures. As can be seen from the figures, compared with eight mainstream data classification systems, the data classification system based on KL divergence optimization achieves higher classification accuracy. The eight mainstream data classification systems are, respectively: "covariance discriminative learning with partial least squares (CDL_PLS)", "Bhattacharyya distance (BD)", "covariance discriminative learning with linear discriminant analysis (CDL_LDA)", "manifold discriminant analysis (MDA)", "projection metric learning (PML)", "Log-Euclidean metric learning (LEML)", "earth mover's distance (EMD)", and "traditional KL divergence (KLD)".
As shown in Fig. 5, the abscissa indicates the range over which λ varies, and the ordinate indicates the classification accuracy obtained for each value of λ. The figure shows that the data classification system based on KL divergence optimization maintains good performance as the parameter varies, demonstrating its robustness.
It should be pointed out that, for those skilled in the art, several improvements and modifications can be made to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (7)

1. A data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, an optimization module based on KL divergence, and a classification module based on the KL divergence metric under the optimal linear mapping, characterized in that:
the feature extraction module is used to extract multi-view features of raw data, including raw image data and text data;
the feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space and then whitens them, reducing the redundancy of the multi-view features extracted from the raw data and removing the correlations between the features of different samples, and then maps the transformed data back to the original space;
the multi-view feature modeling module is used to model and characterize the multi-view features processed by the feature whitening module;
the training data selection module is used to select a certain number of triples from the labeled training data for model training;
after the training data are selected, the feature mapping module generates a projection matrix that maps the original data features into a new feature space, in which the distances between samples of the same class become smaller and the distances between samples of different classes become larger; the multi-view sample similarity calculation module then measures the similarity of the multi-view samples by computing the optimized KL divergence in the new feature space;
the optimization module based on KL divergence is used to model the KL divergence optimization problem as a minimization problem on the manifold of positive definite matrices; the feature mapping module, the multi-view data similarity calculation module, and the optimization module based on KL divergence repeatedly train the model until convergence, thereby learning the optimal linear mapping;
the classification module based on the KL divergence metric under the optimal linear mapping is used to map the original data features into the new feature space using the learned optimal linear mapping and, based on the KL divergences between the test set and the training set, to classify the test-set samples with a k-nearest-neighbor classifier.
2. The data classification system based on KL divergence optimization according to claim 1, characterized in that: each sample is modeled as a Gaussian distribution, and any two Gaussian distributions are assumed to share the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and covariance matrix of the Gaussian distribution.
3. The data classification system based on KL divergence optimization according to claim 1, characterized in that: a triple contains samples belonging to the same class of objects and samples belonging to different classes of objects.
4. A data classification method implemented using the data classification system based on KL divergence optimization according to claim 1, characterized in that the data classification method comprises:
step 1: extracting features from the raw data, i.e., extracting the multi-view features of the raw data, including raw image data and text data;
step 2: whitening the extracted features, i.e., after the multi-view features are extracted from the raw data, projecting them into a common low-dimensional space and whitening them, reducing the redundancy of the extracted multi-view features and removing the correlations between the features of different samples, and then mapping the transformed data back to the original space;
step 3: modeling and characterizing the processed multi-view features;
step 4: selecting a certain number of triples from the labeled training data as training data, the training data being distributed according to the feature distribution of the samples obtained in step 3;
step 5: using the selected triples as training data, performing feature mapping, i.e., applying a linear mapping to all mean vectors so as to map the original data features into a new feature space, in which the distances between samples of the same class become smaller and the distances between samples of different classes become larger;
step 6: computing the similarity between multi-view samples, i.e., measuring the similarity of the multi-view samples by computing the optimized KL divergence in the new, mapped feature space;
step 7: using the selected triples as training data, performing optimization based on KL divergence; the feature mapping module, the multi-view data similarity calculation module, and the optimization module based on KL divergence repeatedly train the model until convergence, thereby learning the optimal linear mapping;
step 8: mapping the original data features into the new feature space using the optimal linear mapping learned in step 7, and classifying the test-set samples in the new feature space with the k-nearest-neighbor classifier based on KL divergence.
5. The data classification method according to claim 4, characterized in that: in step 7, a positive parameter γ is used to balance the influence of same-class data and different-class data; and the parameter γ is set to γ = 1 − e^{−K̄}, where K̄ is the average KL divergence of the entire training data set.
6. The data classification method according to claim 5, characterized in that: in step 7, after the gradient of the objective function is projected onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of symmetric positive definite matrices endowed with an affine-invariant Riemannian metric, and the manifold structure of the learned linear mapping is preserved after symmetrization in each iteration of the optimization.
7. The data classification method according to claim 5, characterized in that: in step 8, the KL divergence between each sample of the test set and each sample of the training set is computed; a priority queue of size k, ordered by distance from largest to smallest, is maintained for storing the nearest-neighbor training tuples; k tuples are randomly chosen from the training tuples as the initial nearest neighbors, the distance from the test tuple to each of these k tuples is computed, and the training tuple labels and distances are stored in the priority queue.
CN201811540690.4A 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method Active CN109615014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811540690.4A CN109615014B (en) 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811540690.4A CN109615014B (en) 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method

Publications (2)

Publication Number Publication Date
CN109615014A true CN109615014A (en) 2019-04-12
CN109615014B CN109615014B (en) 2023-08-22

Family

ID=66009466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811540690.4A Active CN109615014B (en) 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method

Country Status (1)

Country Link
CN (1) CN109615014B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110118657A (en) * 2019-06-21 2019-08-13 杭州安脉盛智能技术有限公司 Based on relative entropy and K nearest neighbor algorithm Fault Diagnosis of Roller Bearings and system
CN110223275A (en) * 2019-05-28 2019-09-10 陕西师范大学 A kind of cerebral white matter fiber depth clustering method of task-fMRI guidance
CN111259938A (en) * 2020-01-09 2020-06-09 浙江大学 Manifold learning and gradient lifting model-based image multi-label classification method
CN111738351A (en) * 2020-06-30 2020-10-02 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN112149699A (en) * 2019-06-28 2020-12-29 北京京东尚科信息技术有限公司 Method and device for generating model and method and device for recognizing image
CN112949296A (en) * 2019-12-10 2021-06-11 医渡云(北京)技术有限公司 Riemann space-based word embedding method and device, medium and equipment
CN113095731A (en) * 2021-05-10 2021-07-09 北京人人云图信息技术有限公司 Flight regulation and control method and system based on passenger flow time sequence clustering optimization
CN113298731A (en) * 2021-05-24 2021-08-24 Oppo广东移动通信有限公司 Image color migration method and device, computer readable medium and electronic equipment
CN113655385A (en) * 2021-10-19 2021-11-16 深圳市德兰明海科技有限公司 Lithium battery SOC estimation method and device and computer readable storage medium
CN113688773A (en) * 2021-09-03 2021-11-23 重庆大学 Storage tank dome displacement data restoration method and device based on deep learning
CN113887661A (en) * 2021-10-25 2022-01-04 济南大学 Image set classification method and system based on representation learning reconstruction residual analysis
CN114662620A (en) * 2022-05-24 2022-06-24 岚图汽车科技有限公司 Automobile endurance load data processing method and device for market users
CN114882262A (en) * 2022-05-07 2022-08-09 四川大学 Multi-view clustering method and system based on topological manifold
CN116687406A (en) * 2023-05-06 2023-09-05 粤港澳大湾区精准医学研究院(广州) Emotion recognition method and device, electronic equipment and storage medium
CN112949296B (en) * 2019-12-10 2024-05-31 医渡云(北京)技术有限公司 Word embedding method and device based on Riemann space, medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090171870A1 (en) * 2007-12-31 2009-07-02 Yahoo! Inc. System and method of feature selection for text classification using subspace sampling
US20130064444A1 (en) * 2011-09-12 2013-03-14 Xerox Corporation Document classification using multiple views
CN105574548A (en) * 2015-12-23 2016-05-11 北京化工大学 Hyperspectral data dimensionality-reduction method based on sparse and low-rank representation graph
CN106126474A (en) * 2016-04-13 2016-11-16 扬州大学 A kind of linear classification method embedded based on local spline
CN106951914A (en) * 2017-02-22 2017-07-14 江苏大学 The Electronic Nose that a kind of Optimization of Fuzzy discriminant vectorses are extracted differentiates vinegar kind method


Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223275A (en) * 2019-05-28 2019-09-10 陕西师范大学 A kind of cerebral white matter fiber depth clustering method of task-fMRI guidance
CN110118657B (en) * 2019-06-21 2021-06-11 杭州安脉盛智能技术有限公司 Rolling bearing fault diagnosis method and system based on relative entropy and K nearest neighbor algorithm
CN110118657A (en) * 2019-06-21 2019-08-13 杭州安脉盛智能技术有限公司 Based on relative entropy and K nearest neighbor algorithm Fault Diagnosis of Roller Bearings and system
CN112149699B (en) * 2019-06-28 2023-09-05 北京京东尚科信息技术有限公司 Method and device for generating model and method and device for identifying image
CN112149699A (en) * 2019-06-28 2020-12-29 北京京东尚科信息技术有限公司 Method and device for generating model and method and device for recognizing image
CN112949296B (en) * 2019-12-10 2024-05-31 医渡云(北京)技术有限公司 Word embedding method and device based on Riemann space, medium and equipment
CN112949296A (en) * 2019-12-10 2021-06-11 医渡云(北京)技术有限公司 Riemann space-based word embedding method and device, medium and equipment
CN111259938B (en) * 2020-01-09 2022-04-12 浙江大学 Manifold learning and gradient lifting model-based image multi-label classification method
CN111259938A (en) * 2020-01-09 2020-06-09 浙江大学 Manifold learning and gradient lifting model-based image multi-label classification method
CN111738351A (en) * 2020-06-30 2020-10-02 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN111738351B (en) * 2020-06-30 2023-12-19 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN113095731A (en) * 2021-05-10 2021-07-09 北京人人云图信息技术有限公司 Flight regulation and control method and system based on passenger flow time sequence clustering optimization
CN113298731A (en) * 2021-05-24 2021-08-24 Oppo广东移动通信有限公司 Image color migration method and device, computer readable medium and electronic equipment
CN113688773A (en) * 2021-09-03 2021-11-23 重庆大学 Storage tank dome displacement data restoration method and device based on deep learning
CN113688773B (en) * 2021-09-03 2023-09-26 重庆大学 Storage tank dome displacement data restoration method and device based on deep learning
CN113655385B (en) * 2021-10-19 2022-02-08 深圳市德兰明海科技有限公司 Lithium battery SOC estimation method and device and computer readable storage medium
CN113655385A (en) * 2021-10-19 2021-11-16 深圳市德兰明海科技有限公司 Lithium battery SOC estimation method and device and computer readable storage medium
CN113887661A (en) * 2021-10-25 2022-01-04 济南大学 Image set classification method and system based on representation learning reconstruction residual analysis
CN114882262A (en) * 2022-05-07 2022-08-09 四川大学 Multi-view clustering method and system based on topological manifold
CN114882262B (en) * 2022-05-07 2024-01-26 四川大学 Multi-view clustering method and system based on topological manifold
CN114662620A (en) * 2022-05-24 2022-06-24 岚图汽车科技有限公司 Automobile endurance load data processing method and device for market users
CN116687406A (en) * 2023-05-06 2023-09-05 粤港澳大湾区精准医学研究院(广州) Emotion recognition method and device, electronic equipment and storage medium
CN116687406B (en) * 2023-05-06 2024-01-02 粤港澳大湾区精准医学研究院(广州) Emotion recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109615014B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN109615014A (en) A kind of data sorting system and method based on the optimization of KL divergence
CN109344736B (en) Static image crowd counting method based on joint learning
CN110309331B (en) Cross-modal deep hash retrieval method based on self-supervision
Ding et al. Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
Carrière et al. Stable topological signatures for points on 3d shapes
CN105184298B (en) A kind of image classification method of quick local restriction low-rank coding
Kim et al. Color–texture segmentation using unsupervised graph cuts
CN103988232B (en) Motion manifold is used to improve images match
Xing et al. Pixel-to-pixel learning with weak supervision for single-stage nucleus recognition in ki67 images
CN105354595A (en) Robust visual image classification method and system
CN104281835B (en) Face recognition method based on local sensitive kernel sparse representation
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN111126464A (en) Image classification method based on unsupervised domain confrontation field adaptation
CN110097096A (en) A kind of file classification method based on TF-IDF matrix and capsule network
CN109271546A (en) The foundation of image retrieval Feature Selection Model, Database and search method
Kang et al. Robust visual tracking via nonlocal regularized multi-view sparse representation
CN103955709A (en) Weighted synthetic kernel and triple markov field (TMF) based polarimetric synthetic aperture radar (SAR) image classification method
CN109993208A (en) A kind of clustering processing method having noise image
Das et al. Batch mode active learning on the Riemannian manifold for automated scoring of nuclear pleomorphism in breast cancer
Zhi et al. Gray image segmentation based on fuzzy c-means and artificial bee colony optimization
Pratikakis et al. Partial 3d object retrieval combining local shape descriptors with global fisher vectors
CN107315984A (en) A kind of method and device of pedestrian retrieval
CN104573726B (en) Facial image recognition method based on the quartering and each ingredient reconstructed error optimum combination
CN110097067A (en) It is a kind of based on layer into the Weakly supervised fine granularity image classification method of formula eigentransformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant