CN109615014A - Data classification system and method based on KL divergence optimization - Google Patents

Data classification system and method based on KL divergence optimization

Info

Publication number
CN109615014A
Authority
CN
China
Prior art keywords
data
feature
divergence
sample
module
Prior art date
Legal status
Granted
Application number
CN201811540690.4A
Other languages
Chinese (zh)
Other versions
CN109615014B
Inventor
高跃
吉书仪
赵曦滨
黄晋
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201811540690.4A
Publication of CN109615014A
Application granted
Publication of CN109615014B
Legal status: Active
Anticipated expiration


Classifications

    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification (G06F18/00 Pattern recognition; G06F18/24 Classification techniques; G06F18/2413 techniques based on distances to training or reference patterns)
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods (G06F18/21 Design or setup of recognition systems or techniques)
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y02T10/40 Engine management systems (Y02T Climate change mitigation technologies related to transportation)


Abstract

The present invention relates to a method for classifying data based on KL divergence optimization. Raw data such as images and text are preprocessed, and each object is modeled as a multi-dimensional distribution. A number of triplets are selected from the labeled training data for model training. Using these triplets as training data, a linear mapping A is applied to all mean vectors, and the optimal linear mapping is learned by iterative optimization; the learning follows the basic assumption of metric learning, i.e. the distance between same-class samples should shrink while the distance between different-class samples should grow. Optimization uses an intrinsic gradient descent algorithm: after projecting the gradient of the objective function onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of SPD matrices under an affine-invariant Riemannian metric. Finally, the KL divergence between the test set and the training set is computed, and samples are classified with a k-nearest-neighbor (KNN) classifier. The method effectively improves the classification accuracy of the system and gives more stable performance.

Description

Data classification system and method based on KL divergence optimization
Technical field
The invention belongs to the field of machine learning, and in particular relates to a system and method for classifying data based on KL divergence optimization.
Background technique
With the development of information technology, data classification has increasingly become a research hotspot in both academia and industry. Data classification refers to the process of automatically determining the category of data from its content under a given classification scheme; the technology covers many applications, such as image classification, text classification, and speech classification. A good classifier facilitates later use of the data. For example, after preliminary text classification, the results can be applied in many fields: text filtering, automatic categorization of Web documents, the organization and management of digital libraries, word-sense disambiguation, and document semantics.
In machine learning, data objects are usually modeled as multi-dimensional distributions to characterize their features. During classification, how to measure the similarity between two data distributions therefore becomes a key problem: the higher the similarity between two samples, the more likely they belong to the same class. Common measures between probability distributions include the Jensen-Shannon divergence, the Earth Mover's Distance (EMD), and Maximum Mean Discrepancy. Among these, the Kullback-Leibler divergence (KL divergence), also known as relative entropy, is one of the most widely used measures of similarity between two probability distributions, with applications in many fields such as computer vision and pattern recognition. The KL divergence expresses the information lost when a probability distribution Q is used to approximate the true distribution P, and can therefore measure the similarity between data distributions well.
In the real world, however, data sources are extremely complex and data quality cannot be guaranteed. Collected datasets may suffer from noise, missing values, imbalanced distributions, and intermixed data points that are hard to separate. Under such circumstances, the traditional KL divergence in Euclidean space has difficulty accurately measuring the similarity between two data samples, which in turn severely affects the accuracy of data classification. In other words, the traditional KL divergence does not always yield an optimal data representation.
Existing research on the KL divergence has largely focused on two directions: first, using the KL divergence directly as a measure between multi-dimensional distributions, as in variational autoencoders; second, deriving a more effective measure by approximating the KL divergence, e.g. via a variational upper bound or Monte Carlo approximation. Most existing work thus applies the KL divergence directly, and little research has addressed the optimization of the KL divergence itself.
Summary of the invention
The object of the present invention is to propose a data classification system and method based on KL divergence optimization. The system and method optimize the traditional KL divergence and learn an optimal representation of the data, which effectively improves the classification ability of existing systems and gives more stable performance.
The technical solution of the present invention provides a data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity computation module, a KL-divergence-based optimization module, and a classification module based on the KL divergence measure under the optimal linear mapping, characterized in that:
The feature extraction module extracts multi-view features of the raw data, including raw images and text data;
The feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space, applies a whitening transform to the projected features so as to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then maps the transformed data back to the original space;
The multi-view feature modeling module models and characterizes the whitened multi-view features;
The training data selection module selects a number of triplets from the labeled training data for model training;
After the training data are selected, the feature mapping module generates a projection matrix that maps the original data features into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger; the multi-view sample similarity computation module then measures the similarity of multi-view samples by computing the optimized KL divergence in the new feature space;
The KL-divergence-based optimization module models the optimization of the KL divergence as a minimization problem on the manifold of positive definite matrices; the feature mapping module, the multi-view sample similarity computation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping;
The classification module based on the KL divergence measure under the optimal linear mapping maps the original data features into the new feature space using the learned optimal linear mapping, and classifies the test-set samples with a k-nearest-neighbor classifier based on the KL divergence between the test set and the training set.
Further, each sample is modeled as a Gaussian distribution, and any two Gaussian distributions are assumed to share the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.
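To make this modeling step concrete, the following sketch (plain Python; the function name and toy data are illustrative, not from the patent) computes the mean vector and covariance matrix that characterize one multi-view sample:

```python
def gaussian_model(views):
    """Model a multi-view sample as a Gaussian: return the mean vector
    of its view features and the sample covariance matrix."""
    n, d = len(views), len(views[0])
    mean = [sum(v[j] for v in views) / n for j in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in views) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

# toy sample with three 2-D view features
views = [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]]
mean, cov = gaussian_model(views)
```

Under the patent's shared-covariance assumption, the per-sample covariances would additionally be pooled into one common Σ across all samples.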
Further, each triplet contains samples belonging to the same class of object and samples belonging to different classes of objects.
The present invention also provides a data classification method realized by the above data classification system based on KL divergence optimization, comprising:
Step 1: perform feature extraction on the raw data, i.e. extract multi-view features from the raw data, including images and text;
Step 2: whiten the extracted features, i.e. after extracting the multi-view features from the raw data, project them into a common low-dimensional space, apply a whitening transform to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then map the transformed data back to the original space;
Step 3: model and characterize the whitened multi-view features;
Step 4: select a number of triplets from the labeled training data as training data, whose distribution is the feature distribution produced by modeling the samples in Step 3;
Step 5: using the selected triplets as training data, perform feature mapping, i.e. apply a linear mapping to all mean vectors so that the original data features are mapped into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger;
Step 6: compute the similarity between multi-view samples, i.e. measure the similarity of multi-view samples by computing the optimized KL divergence in the new mapped feature space;
Step 7: using the selected triplets as training data, perform KL-divergence-based optimization; the feature mapping module, the multi-view sample similarity computation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping;
Step 8: map the original data features into the new feature space using the optimal linear mapping learned in Step 7, and classify the test-set samples in the new feature space with a k-nearest-neighbor classifier.
Further, in Step 7, a positive parameter γ balances the influence of same-class and different-class data; γ is set as a function of D̄, the average KL divergence of the entire training dataset.
Further, in Step 7, the gradient of the objective function is projected onto the tangent space of the manifold, and Riemannian gradient descent is performed on the manifold of symmetric positive definite (SPD) matrices under an affine-invariant Riemannian metric; after symmetrization, each iteration of the optimization preserves the manifold structure of the learned linear mapping.
Further, in Step 8, the KL divergence between each sample in the test set and each sample in the training set is computed; a priority queue of size k, ordered by descending distance, is maintained to store the nearest-neighbor training tuples; k tuples are randomly chosen from the training tuples as the initial nearest neighbors, the distances from the test tuple to these k tuples are computed, and the labels and distances of the training tuples are stored in the priority queue.
The beneficial effects of the present invention are:
(1) The proposed method and system effectively improve the classification accuracy of the system and give more stable performance. They compensate for real-world problems such as noisy data, missing data, imbalanced data distributions, and intermixed data points that are hard to separate.
(2) The present invention is applicable to the classification of multi-view data.
(3) The method learns an optimal linear mapping from the labeled training data that maps the original data space into a new feature space. In the new feature space, same-class data samples can lie closer together and different-class data samples can lie farther apart. Compared with existing systems of the same type, the present invention effectively improves classification ability and gives more stable performance.
Detailed description of the invention
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a schematic diagram of the method of the present invention;
Fig. 3 is an illustration of the intrinsic gradient descent algorithm used in the method of the present invention;
Fig. 4 compares the classification accuracy of this system with other systems on a 3D object recognition task; the test data is the NTU16 dataset;
Fig. 5 compares the classification accuracy of this system with other systems on a 3D object recognition task; the test data is the NTU47 dataset;
Fig. 6 compares the classification accuracy of this system with other systems on a text classification task; the test data is the TWITTER dataset;
Fig. 7 shows how the classification accuracy of this system varies with its parameters.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to Figs. 1-5.
As shown in Fig. 1, this embodiment provides a data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity computation module, a KL-divergence-based optimization module, and a classification module based on the KL divergence measure under the optimal linear mapping, wherein:
The feature extraction module extracts multi-view features of the raw data from data such as raw images and text.
The feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space, applies a whitening transform to the projected features so as to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then maps the transformed data back to the original space.
The multi-view feature modeling module models and characterizes the whitened multi-view features.
Specifically, the system models each sample as a Gaussian distribution and assumes that any two Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.
After multi-view feature modeling is complete, the training data selection module selects a number of triplets from the labeled training data for model training.
Each triplet contains samples belonging to the same class of object and samples belonging to different classes, i.e. an input triplet comprises one positive pair and one negative pair. For example, suppose there are two object classes, tables and chairs. For table A, a possible triplet is (table A, table B, chair C): table A and table B form a positive pair, while table A and chair C form a negative pair.
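The table/chair example can be sketched in code as below (hypothetical sample names; enumerating all triplets is shown only for clarity, since the patent later subsamples them for efficiency):

```python
def build_triplets(samples):
    """Enumerate (anchor, positive, negative) triplets from labeled samples:
    the positive shares the anchor's label, the negative does not."""
    triplets = []
    for a, la in samples:
        for p, lp in samples:
            if p != a and lp == la:           # positive pair
                for n, ln in samples:
                    if ln != la:              # negative pair
                        triplets.append((a, p, n))
    return triplets

samples = [('tableA', 'table'), ('tableB', 'table'), ('chairC', 'chair')]
triplets = build_triplets(samples)
```

For the three samples above this yields exactly the two triplets described in the text, one anchored at each table.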
After the training data are selected, the feature mapping module generates a projection matrix that maps the original data features into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger, so that the mapped data are easier to classify.
Once the training data selection is complete, the similarity between data samples must be computed for training. The multi-view sample similarity computation module measures the similarity of multi-view samples by computing the optimized KL divergence in the new mapped feature space.
The KL-divergence-based optimization module models the optimization of the KL divergence as a minimization problem on the manifold of positive definite matrices. To solve this optimization problem, the system uses an intrinsic gradient descent method, i.e. Riemannian gradient descent, with a symmetrization refinement: the iterate B_t produced by the exponential-map update, which involves the matrix exponential exp with base e, the objective function f(A_t) after t iterations of the linear mapping, its gradient ∇f(A_t), and the learning rate α, is replaced by its symmetric part A_{t+1} = (B_t + B_t^T)/2.
In this way, each iteration of the optimization preserves the manifold structure of the learned linear mapping, thereby guaranteeing the symmetric positive definiteness of the learned optimal KL divergence measure.
The feature mapping module, the multi-view sample similarity computation module, and the KL-divergence-based optimization module repeatedly train the model until convergence, so as to learn the optimal linear mapping.
Overall, the system takes the triplets chosen by the training data selection module as training data and learns an optimal linear mapping by optimizing the KL divergence, so that the mapped data are easier to classify.
The classification module based on the KL divergence measure under the optimal linear mapping maps the original data features into the new feature space using the learned optimal linear mapping, and classifies the test-set samples with a k-nearest-neighbor (KNN) classifier based on the KL divergence between the test set and the training set.
The feature extraction module is the first module of the data classification system based on KL divergence optimization; it extracts features from the raw data. The features are then whitened as preprocessing, and multi-view feature modeling is performed.
After the system completes multi-view feature modeling of the samples, the data classification system based on KL divergence optimization selects a number of triplets from them as training data.
In the training module, the data classification system based on KL divergence optimization first maps the features into a new feature space, then computes the multi-view sample similarity using the optimized KL divergence measure after the mapping, and optimizes with the intrinsic gradient descent algorithm. The system repeats this process until convergence, at which point it has learned an optimal linear mapping. Finally, the classification module classifies samples in the new mapped feature space.
Notably, the method is based on optimizing the KL divergence itself. Moreover, this embodiment does not use the traditional gradient descent method to optimize the classification system, but an intrinsic gradient descent algorithm with a symmetrization refinement, which guarantees the symmetric positive definiteness of the learned linear mapping at every optimization iteration. In addition, the data classification system based on KL divergence optimization can be applied to the classification of multi-view data, whereas many systems of the same type only apply to single-view data.
This embodiment also provides a data classification method based on KL divergence optimization, which includes:
Step 1: perform feature extraction on the raw data.
First, extract multi-view features of the raw data from data such as raw images and text.
Taking 3D object classification as an example, in Step 1 each 3D object is characterized by a group of views from different directions. For each view, a set of convolutional neural network (CNN) features is extracted. In addition to feature extraction, the views are clustered to generate view clusters, from which a group of representative views is chosen, removing potentially redundant views. In this way, each object is characterized by the group of representative views selected from the view clusters, and multi-view feature extraction can be performed for the 3D object classification task.
Taking text classification as an example, in Step 1 each text is characterized by bag-of-words (BOW) features, and the resulting distribution serves as the distribution of the text data. Specifically, all stop words are first identified and removed from the texts, and all remaining non-stop words are embedded into a word-vector (word2vec) space; that is, the vector representation of each word is learned by a three-layer neural network (the word2vec model). Each text then yields a normalized bag-of-words (nBOW) vector reflecting the frequency of each non-stop word in the text. In this way, feature extraction can be performed for the text classification task.
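A minimal sketch of the nBOW step follows (toy stop-word list; a real system would use a standard stop-word lexicon and pair these frequencies with word2vec embeddings, which are omitted here):

```python
from collections import Counter

STOP_WORDS = {'the', 'a', 'of', 'to', 'is'}  # toy stop-word list

def nbow(text):
    """Normalized bag-of-words vector: relative frequency of each
    non-stop word in the text."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

vec = nbow("the cat sat on the mat the cat")
```

Here `vec['cat']` is 2/5, since "cat" appears twice among the five non-stop words.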
It should be noted that the system of the invention does not prescribe a specific feature extraction procedure or method; other features can also be used in the system, and convolutional neural network features and normalized bag-of-words vectors are merely examples.
Step 2: whiten the extracted features.
After extracting the multi-view features from the raw data, project them into a common low-dimensional space, apply a whitening transform to reduce the redundancy of the extracted multi-view features and remove correlation between different sample features, and then map the transformed data back to the original space.
Taking 3D object classification as an example, the views of object A are first concatenated, projected into a common low-dimensional space by PCA (principal component analysis), and whitened after projection; finally the different views of the transformed data are separated again and mapped back to the original space.
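The whitening step can be illustrated as below. The patent uses PCA whitening; this sketch uses Cholesky whitening of 2-D points instead, which is simpler to write out dependency-free yet likewise yields unit covariance and removes feature correlation:

```python
import math

def whiten_2d(data):
    """Cholesky whitening of 2-D points: after the transform the sample
    covariance is the identity, so the two features are decorrelated."""
    n = len(data)
    mean = [sum(p[i] for p in data) / n for i in (0, 1)]
    c = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in data) / n
          for j in (0, 1)] for i in (0, 1)]
    # Cholesky factor L of the covariance, then map x -> L^{-1} (x - mean)
    l00 = math.sqrt(c[0][0])
    l10 = c[1][0] / l00
    l11 = math.sqrt(c[1][1] - l10 * l10)
    out = []
    for p in data:
        y0 = (p[0] - mean[0]) / l00
        y1 = ((p[1] - mean[1]) - l10 * y0) / l11
        out.append((y0, y1))
    return out

# strongly correlated toy features
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
white = whiten_2d(data)
```

After the transform, the sample covariance of `white` is the 2×2 identity, which is exactly the decorrelation property the whitening module relies on.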
Step 3: model and characterize the whitened multi-view features.
The system models each sample as a multi-dimensional Gaussian distribution and assumes that any two Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and covariance matrix of the sample's Gaussian distribution. For example, in an embodiment of the invention, each 3D object and each text is modeled as a multi-dimensional Gaussian distribution, and all 3D objects and all texts share the same covariance matrix.
Step 4: select a number of triplets from the labeled training data as training data, whose distribution is the feature distribution produced by modeling the samples in Step 3.
As shown in Fig. 2, considering the cost of computing the loss (since the training data are triplets, the complexity is O(n³)), it is unnecessary to compute all triplets. For each sample in the training set, the data classification system based on KL divergence optimization chooses k_i samples from the same class and k_g samples from different classes for training, where k_i and k_g are hyperparameters.
Taking 3D object classification as an example, for each object in the training set, the k_i most similar same-class objects (i.e. those with the smallest KL divergence to the object) and k_g objects from different classes are selected, to be used in the gradient computation and updates of the subsequent optimization.
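The neighbor selection can be sketched as follows. The patent specifies choosing the k_i most similar same-class samples; taking the k_g *nearest* different-class samples is an assumption here, since the different-class selection rule is not spelled out:

```python
def select_neighbors(anchor, dists, labels, ki, kg):
    """For sample `anchor`, pick the ki same-class samples with the smallest
    divergence and kg different-class samples (nearest ones, by assumption).
    `dists` is a precomputed pairwise KL-divergence matrix."""
    same = [j for j in range(len(labels))
            if j != anchor and labels[j] == labels[anchor]]
    diff = [j for j in range(len(labels)) if labels[j] != labels[anchor]]
    key = lambda j: dists[anchor][j]
    return sorted(same, key=key)[:ki], sorted(diff, key=key)[:kg]

labels = ['table', 'table', 'table', 'chair', 'chair']
dists = [[0.0, 0.2, 0.5, 1.0, 0.8],
         [0.2, 0.0, 0.3, 0.9, 0.7],
         [0.5, 0.3, 0.0, 0.6, 0.4],
         [1.0, 0.9, 0.6, 0.0, 0.1],
         [0.8, 0.7, 0.4, 0.1, 0.0]]
same, diff = select_neighbors(0, dists, labels, ki=2, kg=1)
```

Each anchor then contributes k_i positive pairs and k_g negative pairs, bringing the triplet count down from O(n³) to O(n·k_i·k_g).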
Step 5: using the selected triplets as training data, perform feature mapping, i.e. apply a linear mapping A to all mean vectors so that the original data features are mapped into a new feature space in which the distance between same-class samples becomes smaller and the distance between different-class samples becomes larger.
Most existing systems of the same type measure the similarity of two samples by directly computing the KL divergence between them. To better distinguish samples of different classes, the data classification system based on KL divergence optimization makes the following improvement: a linear mapping A is applied to all mean vectors, i.e. every μ_i is replaced by Aμ_i, mapping the original data features into a new feature space. The objective function is shown in Fig. 2.
Taking 3D object classification as an example, a 3D object after feature mapping follows the Gaussian distribution θ_i = g(x; Aμ_i; Σ_i), where A is the learned linear mapping, θ_i denotes the i-th object, g denotes a Gaussian distribution, Σ the covariance matrix, and μ the mean vector.
Step 6: compute the similarity between multi-view samples, i.e. measure the similarity of multi-view samples by computing the optimized KL divergence in the new mapped feature space. Note that the optimized KL divergence is continuously updated, i.e. the similarity between multi-view samples is continuously updated.
In an embodiment of the invention, each text is modeled as a multi-dimensional Gaussian distribution, so the similarity between two texts is measured by the KL divergence between them. The original KL divergence is D_KL(P1||P2) = 1/2 [ log(det(Σ2)/det(Σ1)) − n + tr(Σ2^{-1} Σ1) + (μ2 − μ1)^T Σ2^{-1} (μ2 − μ1) ], where D_KL(P1||P2) denotes the KL divergence between samples P1 and P2, log the natural logarithm, det the matrix determinant, n the feature dimensionality of the samples, tr the matrix trace, Σ the covariance matrix, and μ the mean vector. The method assumes that the two Gaussian distributions share the same covariance matrix, i.e. Σ1 = Σ2 = Σ, so the above simplifies to D_KL(P1||P2) = 1/2 (μ1 − μ2)^T Σ^{-1} (μ1 − μ2). During each iteration, the learned mapping is continuously updated; the similarity of two samples is then measured by the optimized KL divergence in the new mapped feature space, K_A(P1||P2) = 1/2 (μ1 − μ2)^T A^T Σ^{-1} A (μ1 − μ2), where K_A(P1||P2) denotes the optimized KL divergence measure between samples P1 and P2, Σ the covariance matrix, μ the mean vector, and A the learned linear mapping.
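The simplified shared-covariance divergence and its mapped version can be computed directly. The sketch below implements D_KL = 1/2 (μ1−μ2)^T Σ^{-1} (μ1−μ2) and K_A with A applied to the mean difference (toy 2-D inputs, Σ^{-1} supplied precomputed):

```python
def kl_shared_cov(mu1, mu2, cov_inv, A=None):
    """KL divergence between two Gaussians sharing covariance Sigma:
    D = 1/2 d^T Sigma^{-1} d with d = mu1 - mu2; passing a linear map A
    replaces d by A d, giving the learned measure K_A."""
    d = [a - b for a, b in zip(mu1, mu2)]
    if A is not None:
        d = [sum(A[i][j] * d[j] for j in range(len(d))) for i in range(len(A))]
    s = [sum(cov_inv[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return 0.5 * sum(di * si for di, si in zip(d, s))

mu1, mu2 = [1.0, 0.0], [0.0, 0.0]
cov_inv = [[1.0, 0.0], [0.0, 1.0]]  # identity shared covariance
base = kl_shared_cov(mu1, mu2, cov_inv)                              # 1/2 * 1
scaled = kl_shared_cov(mu1, mu2, cov_inv, A=[[2.0, 0.0], [0.0, 2.0]])  # 1/2 * 4
```

With A = 2I the mean difference is stretched by 2, so the divergence grows fourfold, illustrating how the learned A reshapes the metric.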
Step 7, using the triple selected as training data, optimized based on KL divergence;Utilize Feature Mapping mould Block, multiple view data similarity calculation module and the optimization module based on KL divergence are repeated continuously training pattern, until convergence, To learn optimal linear mapping out;
It is flowed as shown in figure 3, problem is modeled as one by the data sorting system based on the optimization of KL divergence in positive definite matrix group Minimization problem in shape.In specific execute, between any two excellent of all samples in training data has been calculated in step 6 first KL divergence (apply Linear Mapping A) after change, and different groups is put into according to whether classification is identical.First item in objective function For the sum of the KL divergence of all samples from the same category, the Section 2 in objective function is all from different classes of sample This sum of KL divergence.Herein and subsequent mentioned all KL divergences be optimization after KL divergence.
In addition, the data sorting system based on the optimization of KL divergence is not using hinge unlike existing homogeneous system Chain loss function (hinge-loss function), but use a new positive parameter γ balance homogeneous data with not Homogeneous data bring influences.According to the characteristic distributions of data set, in the present embodiment, parameter γ is arranged toWherein,It is the average KL divergence of entire training dataset.With homogeneous data and inhomogeneity data Between distance increase, that is, the distance between inhomogeneity data becomes increasingly remoter, the entropy of system can be because data become uniform And become very big, at this time parameter γ can be intended to 1 (Become very big,It is intended to 0, so 1) γ can be intended to.Therefore, it repairs Positive ternary constraint, i.e. γ can interpretive classification well feature.In addition to this, the λ in objective function is one for putting down The parameter of weighing apparatus loss item and regularization term, value is between 0 to 1.The quantity of n representative sample.KAi||θj) indicate sample θiWith Sample θjBetween mapping under KL divergence measurement.
Beyond the modified ternary constraint, the data classification system based on KL divergence optimization uses a new regularization to prevent overfitting. Overfitting often occurs in such data classification systems, and is especially frequent in high dimensions. To preserve the local topology of the input space, the data classification system based on KL divergence optimization designs a regularizer based on the local topological structure, characterized by local neighbors satisfying local smoothness, and adds this regularizer to the objective function. In the regularizer, β_i belongs to the positive real domain and in fact represents the density function p(X_i) of the input X_i. The data classification system based on KL divergence optimization estimates the density function p(X_i) using the Parzen-window method (kernel density estimation).
Here k_h is a Gaussian kernel function, and h denotes the kernel width; the kernel width controls the influence of the spacing between samples, and h is typically set to 0.4. N_i is the neighbor index set centered at x_i, and the length of N_i is taken to be 3. In the standard Parzen-window form, p(X_i) is then estimated as the average of k_h(X_i − X_j) over the neighbors j ∈ N_i.
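The density estimate can be sketched as below. This is a minimal illustration of the standard Parzen-window (kernel density) estimator under the constants stated in the embodiment (Gaussian kernel, h = 0.4, |N_i| = 3); the function names are hypothetical and the exact printed formula in the patent is not preserved in this rendering.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Isotropic Gaussian kernel k_h with bandwidth h, applied row-wise."""
    d = u.shape[-1]
    norm = (2 * np.pi * h**2) ** (-d / 2)
    return norm * np.exp(-np.sum(u**2, axis=-1) / (2 * h**2))

def parzen_density(x_i, neighbors, h=0.4):
    """Parzen-window estimate of p(x_i) from the neighbor set N_i:
    the average kernel response over the neighbors."""
    return float(np.mean(gaussian_kernel(x_i - neighbors, h)))
```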
S_ij denotes the similarity between two samples and is computed with a Gaussian kernel of the divergence D_ij, with bandwidth σ = min D + (1/v)(max D − min D), where max D and min D denote respectively the maximum and minimum pairwise KL divergence among all samples. v is a control parameter, set to 10 in this system. D_ij denotes the KL divergence between sample i and sample j. Note that S_ij is computed using the initial KL divergences of the training data and is never updated afterwards; that is, S_ij lives in the original feature space of the data. K_A(θ_i‖θ_j) denotes the KL divergence between sample θ_i and sample θ_j under the mapping.
Furthermore, to solve the minimization problem on the manifold, the present invention does not use traditional gradient descent but designs an intrinsic gradient descent algorithm. As shown in Fig. 3, the method projects the gradient of the objective function onto the tangent space of the manifold and then performs Riemannian gradient descent on the manifold of SPD matrices (symmetric positive definite matrices) endowed with an affine-invariant Riemannian metric. After symmetrization, the manifold structure of the learned linear mapping is preserved in each iteration of the optimization, thereby ensuring the symmetric positive definiteness of the learned optimal KL divergence metric. The intrinsic gradient descent update takes the form A_{t+1} = Exp_{A_t}(−α∇f(A_t)), where Exp denotes the exponential map (with the natural constant e as base), f(A_t) is the objective function after t iterations of the linear mapping, ∇f(A_t) is the corresponding gradient, and α is the learning rate.
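One intrinsic descent step can be sketched as follows. This is an illustrative implementation of the standard affine-invariant exponential-map update on the SPD manifold, Exp_A(V) = A^{1/2} exp(A^{−1/2} V A^{−1/2}) A^{1/2}, with the Euclidean gradient symmetrized first; the patent's own update formula is an image lost in this rendering, so this sketch is an assumption consistent with the surrounding description, and the helper names are hypothetical.

```python
import numpy as np

def sym_funm(S, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * f(w)) @ Q.T

def spd_step(A, grad, alpha):
    """One intrinsic gradient-descent step on the SPD manifold under the
    affine-invariant metric: symmetrize the Euclidean gradient, then move
    along the exponential map, A_{t+1} = Exp_{A_t}(-alpha * grad)."""
    G = 0.5 * (grad + grad.T)                          # symmetric tangent vector
    half = sym_funm(A, np.sqrt)                        # A^{1/2}
    half_inv = sym_funm(A, lambda w: 1.0 / np.sqrt(w))  # A^{-1/2}
    inner = half_inv @ (-alpha * G) @ half_inv
    inner = 0.5 * (inner + inner.T)                    # guard against round-off
    return half @ sym_funm(inner, np.exp) @ half       # result stays SPD
```

Because the update is a congruence of a matrix exponential, the iterate remains symmetric positive definite at every step, which is exactly the property the text says symmetrization is meant to preserve.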
The feature mapping process, the multi-view sample similarity calculation process, and the optimization process are repeated until convergence; the linear mapping A at that point is the learned optimal linear mapping.
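The outer training loop described above can be sketched generically. Here `grad_fn` (gradient of the objective at A) and `step_fn` (one manifold descent step) are hypothetical callables standing in for the modules the text names; the stopping rule on the change in A is an assumed convergence test, not the patent's stated criterion.

```python
import numpy as np

def train_until_convergence(A0, grad_fn, step_fn, alpha=0.01,
                            max_iter=200, tol=1e-6):
    """Skeleton of the training loop: repeat the mapping / similarity /
    optimization cycle until the linear mapping A stops changing."""
    A = A0
    for _ in range(max_iter):
        A_next = step_fn(A, grad_fn(A), alpha)
        if np.linalg.norm(A_next - A) < tol:   # assumed convergence test
            return A_next
        A = A_next
    return A
```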
Step 8: using the optimal linear mapping learned in step 7, the original data features are mapped into the new feature space, and the test-set samples are classified in the new feature space with a k-nearest-neighbor (KNN) classifier.
In the classification module, the data classification system based on KL divergence optimization uses the k-nearest-neighbor algorithm. First, the KL divergence between each sample of the test set and each sample of the training set is computed. A priority queue of size k, ordered by distance from largest to smallest, is maintained for storing the nearest-neighbor training tuples. k tuples are randomly chosen from the training tuples as the initial nearest neighbors; the distance from the test tuple to each of these k tuples is computed, and the training tuple labels and distances are stored in the priority queue. If the majority of a sample's k most similar samples in the feature space (i.e., its nearest neighbors there) belong to some class, the sample is determined to belong to that class as well.
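The queue-based vote can be sketched as below. This is a minimal illustration, not the patent's code: Python's `heapq` min-heap holds negated distances, so its root is the current worst kept neighbor, playing the role of the largest-to-smallest priority queue the text describes.

```python
import heapq
from collections import Counter

def knn_predict(test_kl_row, train_labels, k=5):
    """Classify one test sample by majority vote among the k training
    samples with the smallest KL divergence to it."""
    heap = []  # entries are (-distance, label); root = worst kept neighbor
    for dist, label in zip(test_kl_row, train_labels):
        if len(heap) < k:
            heapq.heappush(heap, (-dist, label))
        elif -heap[0][0] > dist:               # closer than the worst kept one
            heapq.heapreplace(heap, (-dist, label))
    votes = Counter(label for _, label in heap)
    return votes.most_common(1)[0][0]
```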
In this embodiment, the data classification system based on KL divergence optimization was tested on two kinds of tasks, 3D object recognition and text classification; as shown in Fig. 4, its classification accuracy is higher than that of existing systems.
As shown in Figs. 4-6, the abscissa indicates that 20%, 30%, 40%, or 50% of the original data set is used as training data, and the ordinate indicates the classification accuracy obtained for each training-data proportion.
The classification system of the present invention is labeled KLD-M in the figures. As can be seen from the figures, compared with eight mainstream data classification systems, the data classification system based on KL divergence optimization achieves higher classification accuracy. The eight mainstream data classification systems are, respectively: "covariance discriminative learning with partial least squares (CDL_PLS)", "Bhattacharyya distance (BD)", "covariance discriminative learning with linear discriminant analysis (CDL_LDA)", "manifold discriminant analysis (MDA)", "projection metric learning (PML)", "Log-Euclidean metric learning (LEML)", "earth mover's distance (EMD)", and "traditional KL divergence (KLD)".
As shown in Fig. 5, the abscissa indicates the range over which λ varies, and the ordinate indicates the classification accuracy obtained for each value of λ. The figure shows that the data classification system based on KL divergence optimization maintains good performance as the parameter varies, demonstrating its robustness.
It should be pointed out that, for those skilled in the art, several improvements and modifications can be made to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (7)

1. A data classification system based on KL divergence optimization, comprising: a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, an optimization module based on KL divergence, and a classification module based on the KL divergence metric under the optimal linear mapping, characterized in that:
the feature extraction module is used to extract multi-view features of raw data, including raw image data and text data;
the feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space and then whitens them, reducing the redundancy of the multi-view features extracted from the raw data and removing the correlations between the features of different samples, and then maps the transformed data back to the original space;
the multi-view feature modeling module is used to model and characterize the multi-view features processed by the feature whitening module;
the training data selection module is used to select a certain number of triples from the labeled training data for model training;
after the training data are selected, the feature mapping module generates a projection matrix that maps the original data features into a new feature space, in which the distances between samples of the same class become smaller and the distances between samples of different classes become larger; the multi-view sample similarity calculation module then measures the similarity of the multi-view samples by computing the optimized KL divergence in the new feature space;
the optimization module based on KL divergence is used to model the KL divergence optimization problem as a minimization problem on the manifold of positive definite matrices; the feature mapping module, the multi-view data similarity calculation module, and the optimization module based on KL divergence repeatedly train the model until convergence, thereby learning the optimal linear mapping;
the classification module based on the KL divergence metric under the optimal linear mapping is used to map the original data features into the new feature space using the learned optimal linear mapping and, based on the KL divergences between the test set and the training set, to classify the test-set samples with a k-nearest-neighbor classifier.
2. The data classification system based on KL divergence optimization according to claim 1, characterized in that: each sample is modeled as a Gaussian distribution, and any two Gaussian distributions are assumed to share the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and covariance matrix of the Gaussian distribution.
3. The data classification system based on KL divergence optimization according to claim 1, characterized in that: a triple contains samples belonging to the same class of objects and samples belonging to different classes of objects.
4. A data classification method implemented using the data classification system based on KL divergence optimization according to claim 1, characterized in that the data classification method comprises:
step 1: extracting features from the raw data, i.e., extracting the multi-view features of the raw data, including raw image data and text data;
step 2: whitening the extracted features, i.e., after the multi-view features are extracted from the raw data, projecting them into a common low-dimensional space and whitening them, reducing the redundancy of the extracted multi-view features and removing the correlations between the features of different samples, and then mapping the transformed data back to the original space;
step 3: modeling and characterizing the processed multi-view features;
step 4: selecting a certain number of triples from the labeled training data as training data, the training data being distributed according to the feature distribution of the samples obtained in step 3;
step 5: using the selected triples as training data, performing feature mapping, i.e., applying a linear mapping to all mean vectors so as to map the original data features into a new feature space, in which the distances between samples of the same class become smaller and the distances between samples of different classes become larger;
step 6: computing the similarity between multi-view samples, i.e., measuring the similarity of the multi-view samples by computing the optimized KL divergence in the new, mapped feature space;
step 7: using the selected triples as training data, performing optimization based on KL divergence; the feature mapping module, the multi-view data similarity calculation module, and the optimization module based on KL divergence repeatedly train the model until convergence, thereby learning the optimal linear mapping;
step 8: mapping the original data features into the new feature space using the optimal linear mapping learned in step 7, and classifying the test-set samples in the new feature space with the k-nearest-neighbor classifier based on KL divergence.
5. The data classification method according to claim 4, characterized in that: in step 7, a positive parameter γ is used to balance the influence of same-class data and different-class data; and the parameter γ is set to γ = 1 − e^{−K̄}, where K̄ is the average KL divergence of the entire training data set.
6. The data classification method according to claim 5, characterized in that: in step 7, after the gradient of the objective function is projected onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of symmetric positive definite matrices endowed with an affine-invariant Riemannian metric, and the manifold structure of the learned linear mapping is preserved after symmetrization in each iteration of the optimization.
7. The data classification method according to claim 5, characterized in that: in step 8, the KL divergence between each sample of the test set and each sample of the training set is computed; a priority queue of size k, ordered by distance from largest to smallest, is maintained for storing the nearest-neighbor training tuples; k tuples are randomly chosen from the training tuples as the initial nearest neighbors, the distance from the test tuple to each of these k tuples is computed, and the training tuple labels and distances are stored in the priority queue.
CN201811540690.4A 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method Active CN109615014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811540690.4A CN109615014B (en) 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811540690.4A CN109615014B (en) 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method

Publications (2)

Publication Number Publication Date
CN109615014A true CN109615014A (en) 2019-04-12
CN109615014B CN109615014B (en) 2023-08-22

Family

ID=66009466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811540690.4A Active CN109615014B (en) 2018-12-17 2018-12-17 KL divergence optimization-based 3D object data classification system and method

Country Status (1)

Country Link
CN (1) CN109615014B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110118657A (en) * 2019-06-21 2019-08-13 杭州安脉盛智能技术有限公司 Based on relative entropy and K nearest neighbor algorithm Fault Diagnosis of Roller Bearings and system
CN110223275A (en) * 2019-05-28 2019-09-10 陕西师范大学 A kind of cerebral white matter fiber depth clustering method of task-fMRI guidance
CN111259938A (en) * 2020-01-09 2020-06-09 浙江大学 Manifold learning and gradient lifting model-based image multi-label classification method
CN111738351A (en) * 2020-06-30 2020-10-02 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN112149699A (en) * 2019-06-28 2020-12-29 北京京东尚科信息技术有限公司 Method and device for generating model and method and device for recognizing image
CN112949296A (en) * 2019-12-10 2021-06-11 医渡云(北京)技术有限公司 Riemann space-based word embedding method and device, medium and equipment
CN113095731A (en) * 2021-05-10 2021-07-09 北京人人云图信息技术有限公司 Flight regulation and control method and system based on passenger flow time sequence clustering optimization
CN113298731A (en) * 2021-05-24 2021-08-24 Oppo广东移动通信有限公司 Image color migration method and device, computer readable medium and electronic equipment
CN113655385A (en) * 2021-10-19 2021-11-16 深圳市德兰明海科技有限公司 Lithium battery SOC estimation method and device and computer readable storage medium
CN113688773A (en) * 2021-09-03 2021-11-23 重庆大学 Storage tank dome displacement data restoration method and device based on deep learning
CN113887661A (en) * 2021-10-25 2022-01-04 济南大学 Image set classification method and system based on representation learning reconstruction residual analysis
CN114662620A (en) * 2022-05-24 2022-06-24 岚图汽车科技有限公司 Automobile endurance load data processing method and device for market users
CN114882262A (en) * 2022-05-07 2022-08-09 四川大学 Multi-view clustering method and system based on topological manifold
CN116687406A (en) * 2023-05-06 2023-09-05 粤港澳大湾区精准医学研究院(广州) Emotion recognition method and device, electronic equipment and storage medium
CN112949296B (en) * 2019-12-10 2024-05-31 医渡云(北京)技术有限公司 Word embedding method and device based on Riemann space, medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090171870A1 (en) * 2007-12-31 2009-07-02 Yahoo! Inc. System and method of feature selection for text classification using subspace sampling
US20130064444A1 (en) * 2011-09-12 2013-03-14 Xerox Corporation Document classification using multiple views
CN105574548A (en) * 2015-12-23 2016-05-11 北京化工大学 Hyperspectral data dimensionality-reduction method based on sparse and low-rank representation graph
CN106126474A (en) * 2016-04-13 2016-11-16 扬州大学 A kind of linear classification method embedded based on local spline
CN106951914A (en) * 2017-02-22 2017-07-14 江苏大学 The Electronic Nose that a kind of Optimization of Fuzzy discriminant vectorses are extracted differentiates vinegar kind method


Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223275A (en) * 2019-05-28 2019-09-10 陕西师范大学 A kind of cerebral white matter fiber depth clustering method of task-fMRI guidance
CN110118657B (en) * 2019-06-21 2021-06-11 杭州安脉盛智能技术有限公司 Rolling bearing fault diagnosis method and system based on relative entropy and K nearest neighbor algorithm
CN110118657A (en) * 2019-06-21 2019-08-13 杭州安脉盛智能技术有限公司 Based on relative entropy and K nearest neighbor algorithm Fault Diagnosis of Roller Bearings and system
CN112149699B (en) * 2019-06-28 2023-09-05 北京京东尚科信息技术有限公司 Method and device for generating model and method and device for identifying image
CN112149699A (en) * 2019-06-28 2020-12-29 北京京东尚科信息技术有限公司 Method and device for generating model and method and device for recognizing image
CN112949296B (en) * 2019-12-10 2024-05-31 医渡云(北京)技术有限公司 Word embedding method and device based on Riemann space, medium and equipment
CN112949296A (en) * 2019-12-10 2021-06-11 医渡云(北京)技术有限公司 Riemann space-based word embedding method and device, medium and equipment
CN111259938B (en) * 2020-01-09 2022-04-12 浙江大学 Manifold learning and gradient lifting model-based image multi-label classification method
CN111259938A (en) * 2020-01-09 2020-06-09 浙江大学 Manifold learning and gradient lifting model-based image multi-label classification method
CN111738351A (en) * 2020-06-30 2020-10-02 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN111738351B (en) * 2020-06-30 2023-12-19 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN113095731A (en) * 2021-05-10 2021-07-09 北京人人云图信息技术有限公司 Flight regulation and control method and system based on passenger flow time sequence clustering optimization
CN113298731A (en) * 2021-05-24 2021-08-24 Oppo广东移动通信有限公司 Image color migration method and device, computer readable medium and electronic equipment
CN113688773A (en) * 2021-09-03 2021-11-23 重庆大学 Storage tank dome displacement data restoration method and device based on deep learning
CN113688773B (en) * 2021-09-03 2023-09-26 重庆大学 Storage tank dome displacement data restoration method and device based on deep learning
CN113655385B (en) * 2021-10-19 2022-02-08 深圳市德兰明海科技有限公司 Lithium battery SOC estimation method and device and computer readable storage medium
CN113655385A (en) * 2021-10-19 2021-11-16 深圳市德兰明海科技有限公司 Lithium battery SOC estimation method and device and computer readable storage medium
CN113887661A (en) * 2021-10-25 2022-01-04 济南大学 Image set classification method and system based on representation learning reconstruction residual analysis
CN114882262A (en) * 2022-05-07 2022-08-09 四川大学 Multi-view clustering method and system based on topological manifold
CN114882262B (en) * 2022-05-07 2024-01-26 四川大学 Multi-view clustering method and system based on topological manifold
CN114662620A (en) * 2022-05-24 2022-06-24 岚图汽车科技有限公司 Automobile endurance load data processing method and device for market users
CN116687406A (en) * 2023-05-06 2023-09-05 粤港澳大湾区精准医学研究院(广州) Emotion recognition method and device, electronic equipment and storage medium
CN116687406B (en) * 2023-05-06 2024-01-02 粤港澳大湾区精准医学研究院(广州) Emotion recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109615014B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN109615014A (en) A kind of data sorting system and method based on the optimization of KL divergence
CN109344736B (en) Static image crowd counting method based on joint learning
CN110309331B (en) Cross-modal deep hash retrieval method based on self-supervision
Ding et al. Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
Carrière et al. Stable topological signatures for points on 3d shapes
CN105184298B (en) A kind of image classification method of quick local restriction low-rank coding
Kim et al. Color–texture segmentation using unsupervised graph cuts
CN103988232B (en) Motion manifold is used to improve images match
Xing et al. Pixel-to-pixel learning with weak supervision for single-stage nucleus recognition in ki67 images
CN105354595A (en) Robust visual image classification method and system
CN104281835B (en) Face recognition method based on local sensitive kernel sparse representation
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN111126464A (en) Image classification method based on unsupervised domain confrontation field adaptation
CN110097096A (en) A kind of file classification method based on TF-IDF matrix and capsule network
CN109271546A (en) The foundation of image retrieval Feature Selection Model, Database and search method
Kang et al. Robust visual tracking via nonlocal regularized multi-view sparse representation
CN103955709A (en) Weighted synthetic kernel and triple markov field (TMF) based polarimetric synthetic aperture radar (SAR) image classification method
CN109993208A (en) A kind of clustering processing method having noise image
Das et al. Batch mode active learning on the Riemannian manifold for automated scoring of nuclear pleomorphism in breast cancer
Zhi et al. Gray image segmentation based on fuzzy c-means and artificial bee colony optimization
Pratikakis et al. Partial 3d object retrieval combining local shape descriptors with global fisher vectors
CN107315984A (en) A kind of method and device of pedestrian retrieval
CN104573726B (en) Facial image recognition method based on the quartering and each ingredient reconstructed error optimum combination
CN110097067A (en) It is a kind of based on layer into the Weakly supervised fine granularity image classification method of formula eigentransformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant