CN109615014B

CN109615014B - KL divergence optimization-based 3D object data classification system and method

Info

Publication number: CN109615014B
Application number: CN201811540690.4A
Authority: CN
Inventors: 高跃; 吉书仪; 赵曦滨; 黄晋
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2023-08-22
Anticipated expiration: 2038-12-17
Also published as: CN109615014A

Abstract

The invention relates to a method for classifying 3D object data based on KL divergence optimization, comprising the following steps: preprocessing raw data such as images and texts, and modeling each object as a multidimensional distribution; selecting a number of triplets from the labeled training data for model training; using the selected triplets as training data, applying a linear mapping A to all mean vectors and learning the optimal linear mapping through iterative optimization, where learning follows the basic assumption of metric learning, namely that distances between samples of the same class are reduced and distances between samples of different classes are increased; optimizing with an intrinsic gradient descent algorithm, which projects the gradient of the objective function onto the tangent space of the manifold and then performs Riemannian gradient descent on the manifold of symmetric positive definite (SPD) matrices equipped with the affine-invariant Riemannian metric; and computing the KL divergence between the test set and the training set and classifying the samples with a K-nearest neighbor (KNN) classifier. The method effectively improves the classification accuracy of the system and delivers more stable performance.

Description

KL divergence optimization-based 3D object data classification system and method

Technical Field

The invention belongs to the field of machine learning, and particularly relates to a system and a method for classifying data based on KL divergence optimization.

Background

With the development of information technology, data classification has become a research hotspot in both academia and industry. Data classification is the process of automatically determining the category of data from its content under a given classification hierarchy; its applications include image classification, text classification, speech classification, and so forth. A good classifier benefits the downstream use of the data. For example, the results of text classification can be applied in many fields such as text filtering, automatic classification of Web documents, digital libraries, word semantic analysis, and document organization and management.

In machine learning, data objects are often modeled as multidimensional distributions to characterize their features. How to measure the similarity between two data distributions therefore becomes a core problem in classification tasks: the more similar two samples are, the more likely they belong to the same class. Common metrics between probability distributions include the Jensen-Shannon divergence, the Earth Mover's Distance (EMD), the Maximum Mean Discrepancy, and the like. Among these, the Kullback-Leibler divergence (KL divergence), also known as relative entropy, is one of the most commonly used measures of similarity between two probability distributions and is widely used in fields such as computer vision and pattern recognition. The KL divergence represents the information lost when a probability distribution Q is used to approximate the true distribution P, and therefore serves well as a measure of similarity between data distributions.

However, real-world data come from very complex sources and their quality cannot be guaranteed; a collected dataset may suffer from noisy data, missing data, imbalanced data distributions, and mixed data points that are hard to distinguish. Under such circumstances, the conventional KL divergence in Euclidean space struggles to measure the similarity between two data samples accurately, which can severely degrade classification accuracy. In other words, the conventional KL divergence does not always yield an optimal data representation.

Existing research on the KL divergence focuses mainly on two directions: using the KL divergence directly as a measure between multidimensional distributions, as in the variational autoencoder; and obtaining a more tractable measure by approximating the KL divergence, for example with a variational upper bound or a Monte Carlo approximation. Most existing studies thus apply the KL divergence as is, and few address the optimization of the KL divergence itself.

Disclosure of Invention

The invention aims to provide a data classification system and method based on KL divergence optimization, which optimize the conventional KL divergence to learn an optimal representation of the data, effectively improving the classification capability of existing systems and delivering more stable performance.

The technical solution of the invention provides a data classification system based on KL divergence optimization, comprising a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, a KL divergence-based optimization module, and a classification module based on the KL divergence under the optimal linear mapping, characterized in that:

the feature extraction module is used for extracting multi-view features of the original data from the original image and text data;

the feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space and whitens them, reducing the redundancy of the multi-view features extracted from the original data and removing the correlation among the features of different samples, and then transforms the data back to the original space;

the multi-view feature modeling module is used for modeling and characterizing the multi-view features processed by the feature whitening module;

the training data selection module selects a number of triplets from the labeled training data for model training;

after the training data are selected, the feature mapping module generates a projection matrix and maps the original data features to a new feature space in which distances between samples of the same class are reduced and distances between samples of different classes are increased; the multi-view sample similarity calculation module then measures the similarity of the multi-view samples by computing the optimized KL divergence in the new feature space;

the KL divergence-based optimization module models the KL divergence optimization problem as a minimization problem on the manifold of symmetric positive definite matrices; the model is trained repeatedly with the feature mapping module, the multi-view sample similarity calculation module, and the KL divergence-based optimization module until convergence, thereby learning the optimal linear mapping;

and the classification module, based on the KL divergence under the optimal linear mapping, maps the original data features to the new feature space using the learned optimal linear mapping and classifies the test set samples with a K-nearest neighbor classifier based on the KL divergence between the test set and the training set.

Further, each sample is modeled as a Gaussian distribution, and all the Gaussian distributions are assumed to share the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.

Further, the triplets include samples belonging to the same class of objects.

The invention further provides a data classification method implemented by the data classification system based on KL divergence optimization, comprising the following steps:

step 1, extracting features of the original data, namely extracting multi-view features from original data comprising images and texts;

step 2, whitening the extracted features, namely, after the multi-view features are extracted from the original data, projecting them into a common low-dimensional space and whitening them, reducing the redundancy of the multi-view features, removing the correlation among the features of different samples, and transforming the data back to the original space;

step 3, modeling and characterizing the processed multi-view features;

step 4, selecting a number of triplets from the labeled training data as training data, wherein the training data follow the feature distributions obtained by modeling the samples in step 3;

step 5, using the selected triplets as training data, performing feature mapping, namely applying a linear mapping to all mean vectors to map the original data features to a new feature space in which distances between samples of the same class are reduced and distances between samples of different classes are increased;

step 6, calculating the similarity between the multi-view samples, namely measuring the similarity of the multi-view samples by calculating the optimized KL divergence in the new feature space of the mapping;

step 7, using the selected triplets as training data, performing KL divergence-based optimization; the model is trained repeatedly with the feature mapping module, the multi-view sample similarity calculation module, and the KL divergence-based optimization module until convergence, thereby learning the optimal linear mapping;

step 8, mapping the original data features to the new feature space with the optimal linear mapping learned in step 7, and classifying the test set samples in the new feature space with a K-nearest neighbor classifier.

Further, in step 7, a positive parameter γ is used to balance the effects of same-class data and different-class data, and γ is set as a function of $\bar{D}$, the average KL divergence over the entire training dataset.

Further, in step 7, after the gradient of the objective function is projected onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of symmetric positive definite (SPD) matrices equipped with the affine-invariant Riemannian metric; with the symmetrization step, the manifold structure of the learned linear mapping is preserved at every iteration of the optimization;

Further, in step 8, the KL divergence between each test sample and each training sample is calculated; a priority queue of size k, ordered by distance from largest to smallest, is maintained to store the nearest-neighbor training tuples; k tuples are randomly selected from the training tuples as the initial nearest-neighbor tuples, the distances from the test tuple to these k tuples are calculated, and the training tuple labels and distances are stored in the priority queue.

The beneficial effects of the invention are as follows:

(1) The method and system provided by the invention effectively improve classification accuracy and deliver more stable performance. They address problems that may arise in real-world scenarios, such as noisy data, missing data, imbalanced data distributions, and mixed data points that are hard to distinguish.

(2) The invention can be applied to classification of multi-view data.

(3) The method learns an optimal linear mapping from the labeled training data, mapping the original data space to a new feature space. In the new feature space, the distances between data samples from the same class will be closer, while the distances between data samples from different classes will be further. Compared with the existing systems of the same type, the system and the method can effectively improve the classification capacity of the system and have more stable performance.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the method of the present invention;

FIG. 3 is an explanatory diagram of the intrinsic gradient descent algorithm employed in the method of the present invention;

FIG. 4 is a comparison result of classification accuracy of the system with other systems applied to a 3D object recognition task, wherein the test data set is an NTU16 data set;

FIG. 5 is a comparison result of classification accuracy of the system with other systems applied to a 3D object recognition task, wherein the test data set is an NTU47 data set;

FIG. 6 is a comparison result of classification accuracy of the present system with other systems applied in text classification tasks, the test dataset being a TWITTER dataset;

FIG. 7 shows how the classification accuracy of the system changes as its parameters vary.

Detailed Description

The technical scheme of the invention will be described in detail with reference to fig. 1-5.

As shown in FIG. 1, this embodiment provides a data classification system based on KL divergence optimization, comprising a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, a KL divergence-based optimization module, and a classification module based on the KL divergence under the optimal linear mapping, wherein:

The feature extraction module extracts multi-view features of the original data from raw images, texts, and other data.

The feature whitening module projects the multi-view features extracted by the feature extraction module into a common low-dimensional space and whitens them, reducing the redundancy of the multi-view features extracted from the original data and removing the correlation among the features of different samples, and then transforms the data back to the original space.

The multi-view feature modeling module is used for modeling and characterizing the multi-view features processed by the feature whitening module.

Specifically, the system models each sample as a Gaussian distribution and assumes that all the Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.

After the multi-view feature modeling is completed, the training data selection module selects a number of triplets from the labeled training data for model training.

Specifically, each triplet includes samples belonging to the same class and samples belonging to different classes, i.e., one input triplet contains a positive pair and a negative pair. For example, suppose there are two classes of objects, tables and chairs. For table A, one possible triplet is (table A, table B, chair C), where table A and table B form a positive pair and table A and chair C form a negative pair.

After the training data are selected, the feature mapping module generates a projection matrix and maps the original data features to a new feature space in which distances between samples of the same class are reduced and distances between samples of different classes are increased, so that the mapped data are easier to classify.

After the training data is selected, similarity calculation between the training data is needed. The multi-view sample similarity calculation module measures similarity of the multi-view samples by calculating an optimized KL-divergence in the mapped new feature space.

The KL divergence-based optimization module models the KL divergence optimization problem as a minimization problem on the manifold of symmetric positive definite (SPD) matrices. To solve this problem, the embodiment adopts an intrinsic gradient descent algorithm, namely Riemannian gradient descent, with a symmetrization improvement: the gradient update of the linear mapping is rewritten in a symmetrized form built from the matrix exponential exp, the objective function $f(A_t)$ evaluated at the linear mapping obtained after $t$ iterations, the corresponding gradient $\nabla f(A_t)$, and the learning rate $\alpha$.

In this way, the manifold structure of the learned linear mapping is preserved at every iteration of the optimization, which guarantees that the learned optimal KL divergence metric is symmetric positive definite.

The model is trained repeatedly with the feature mapping module, the multi-view sample similarity calculation module, and the KL divergence-based optimization module until convergence, at which point the optimal linear mapping has been learned.

In general, the system takes the triples selected in the training data selection module as training data, and learns an optimal linear mapping based on the optimization of KL divergence, so that the mapped data is easier to classify.

The classification module, based on the KL divergence under the optimal linear mapping, maps the original data features to the new feature space using the learned optimal linear mapping and classifies the test set samples with a K-nearest neighbor (KNN) classifier based on the KL divergence between the test set and the training set.

The feature extraction module is the first module of the data classification system based on KL divergence optimization and extracts features from the original data. The features are then whitened as preprocessing and used for multi-view feature modeling.

After the system models the multi-view characteristics of the samples, the data classification system based on KL divergence optimization selects a certain number of triples as training data.

In the training stage, the data classification system based on KL divergence optimization maps the features to a new feature space, computes the optimized KL divergence between the mapped samples in the multi-view sample similarity calculation module, and then optimizes the mapping with the gradient descent algorithm. The system repeats this process until convergence, at which point it has learned an optimal linear mapping. Finally, the system performs classification in the mapped new feature space with the classification module.

In particular, the method is based on optimizing the KL divergence. Moreover, instead of optimizing the classification system with conventional gradient descent, this embodiment uses an intrinsic gradient descent algorithm with a symmetrization improvement that guarantees the symmetric positive definiteness of the learned linear mapping at every optimization iteration. Furthermore, the data classification system based on KL divergence optimization can classify multi-view data, whereas many systems of the same type apply only to single-view data.

The embodiment also provides a data classification method based on KL divergence optimization, comprising the following steps.

Step 1, extracting features of the original data.

First, multi-view features of original data are extracted from the original image, text, etc.

Taking 3D object classification as an example, in step 1 each 3D object is depicted by a set of views taken from different directions. For each view, a set of convolutional neural network (CNN) features is extracted. In addition, view clustering is performed to group the views into view clusters, and a representative set of views is selected from the clusters, removing views that may be redundant. In this way, each object is characterized by a representative set of views selected from its view clusters. This yields the multi-view features for the 3D object classification task.
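The view-selection step can be illustrated with a minimal sketch. The source does not name the clustering algorithm, so plain k-means is an assumption here, as is the rule of keeping the view nearest each cluster center; the function name `select_representative_views` is hypothetical, and the per-view CNN features are assumed to be precomputed.

```python
import numpy as np


def select_representative_views(view_feats, n_clusters, n_iter=20, seed=0):
    """Cluster the views of one 3D object and keep one representative per cluster.

    view_feats: (n_views, d) array of per-view CNN features, assumed precomputed.
    Returns the sorted indices of the selected views.
    """
    X = np.asarray(view_feats, dtype=float)
    rng = np.random.default_rng(seed)
    # Plain k-means; the clustering algorithm itself is an assumption.
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every view to every cluster center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # The view closest to each center stands in for its whole cluster,
    # so near-duplicate views in the same cluster are dropped.
    return sorted({int(d2[:, c].argmin()) for c in range(n_clusters)})
```

With four views forming two tight groups and `n_clusters=2`, one view from each group is kept.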

Taking text classification as an example, in step 1 each piece of text may be characterized by a bag-of-words (BOW) feature, and the resulting text distribution is then used as the distribution of the text data. Specifically, stop words are first identified and removed from all texts, and all remaining non-stop words are embedded into a word vector space (word2vec space); that is, the vector representation of each word is learned by a three-layer neural network (the word2vec model). Each text then yields a normalized bag-of-words (nBOW) vector reflecting the frequency of occurrence of each non-stop word in the text. This yields the features for the text classification task.
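The nBOW vector described above can be sketched in a few lines, assuming the text is already tokenized and that `vocab` (a hypothetical ordered list) covers every non-stop word; the word2vec embedding step is not shown.

```python
from collections import Counter


def nbow_vector(tokens, vocab, stop_words):
    """Normalized bag-of-words: relative frequency of each non-stop word in one text.

    tokens: list of words in the text; vocab: ordered list of non-stop words
    (hypothetical, assumed to cover every non-stop word); stop_words: set of words to drop.
    """
    kept = [t for t in tokens if t not in stop_words]
    counts = Counter(kept)
    total = len(kept)
    # Each entry is count / total, so the entries over the vocabulary sum to 1.
    return [counts[w] / total if total else 0.0 for w in vocab]
```

For example, "the cat sat on the cat mat" with stop words {"the", "on"} gives frequencies 0.5, 0.25, 0.25 for "cat", "sat", "mat".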

It should be noted that the system of the present invention places no particular requirements on the feature extraction process or method, meaning that other features may be used in the system; the convolutional neural network features and normalized bag-of-words vector features are only examples.

Step 2, whitening the extracted features.

After the multi-view features are extracted from the original data, they are projected into a common low-dimensional space and whitened, which reduces the redundancy of the multi-view features extracted from the original data and removes the correlation among the features of different samples; the transformed data are then transformed back to the original space.

Taking 3D object classification as an example, for an object A, all of its views are concatenated and projected into a common low-dimensional space with principal component analysis (PCA); the projected features are whitened, and finally the transformed data are split back into the individual views and transformed back to the original space.
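A minimal NumPy sketch of this PCA-whitening step follows, assuming each view is given as a `(m_v, d)` array (a hypothetical layout). The rotation back to the original space makes it ZCA-style whitening; `eps` is an assumed small constant that guards against near-zero eigenvalues.

```python
import numpy as np


def pca_whiten_views(views, n_components, eps=1e-5):
    """Whiten all views of an object jointly and return them in the original space.

    views: list of (m_v, d) arrays, one per view (hypothetical layout).
    """
    X = np.vstack(views)                      # concatenate all views
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_components]
    U, lam = eigvec[:, top], eigval[top]
    Z = (Xc @ U) / np.sqrt(lam + eps)         # project to the low-dim space and whiten
    X_back = Z @ U.T                          # rotate back to the original d-dim space
    cuts = np.cumsum([len(v) for v in views])[:-1]
    return np.split(X_back, cuts)             # separate the views again
```

When `n_components` equals the feature dimension, the pooled output has (approximately) identity covariance; with fewer components, the covariance becomes a projection onto the retained principal subspace.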

Step 3, modeling and characterizing the processed multi-view features.

The system models each sample as a multidimensional Gaussian distribution and assumes that all the Gaussian distributions share the same covariance matrix. For each sample, the multi-view features are characterized by the mean vector and covariance matrix of the sample's Gaussian distribution. For example, in an embodiment of the present invention, each 3D object and each piece of text is modeled as a multidimensional Gaussian distribution, and all 3D objects (or all texts) share the same covariance matrix.
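The following minimal sketch shows one way to obtain the mean vectors and a single shared covariance. The source does not say how the shared covariance is estimated, so pooling the within-sample scatter of all samples, plus a small ridge to keep the matrix invertible, is an assumption; `model_samples` is a hypothetical name.

```python
import numpy as np


def model_samples(samples, ridge=1e-3):
    """samples: list of (m_i, d) arrays (views or word vectors of one sample).

    Returns (means, shared_cov): one mean vector per sample and one covariance for all.
    """
    means = np.stack([s.mean(axis=0) for s in samples])
    # Pool the within-sample scatter of every sample into one shared covariance (assumption).
    centered = np.vstack([s - s.mean(axis=0) for s in samples])
    cov = centered.T @ centered / max(len(centered) - len(samples), 1)
    cov += ridge * np.eye(cov.shape[0])  # keep the shared covariance invertible
    return means, cov
```

Each sample $i$ is then represented by the Gaussian $N(\mu_i, \Sigma)$ with `means[i]` as $\mu_i$ and `cov` as the shared $\Sigma$.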

Step 4, selecting a number of triplets from the labeled training data as training data. As shown in FIG. 2, because the training data are selected as triplets, using all of them would cost $O(n^3)$, so not all triplets are computed. The data classification system based on KL divergence optimization selects, for each sample in the training dataset, $k_i$ samples from the same class and $k_g$ samples from different classes for training; $k_i$ and $k_g$ are hyperparameters.

Taking 3D object classification as an example, for each object in the training set, the $k_i$ objects from the same class that are most similar to it (i.e., with the smallest KL divergence) and $k_g$ objects from different classes are selected; these are then used for the gradient calculations and updates in the subsequent optimization.
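The triplet selection can be sketched as follows, assuming a precomputed matrix `D` of pairwise KL divergences. Taking the *closest* different-class samples as the $k_g$ negatives is an assumption (the source only says $k_g$ samples from different classes), and `select_triplets` is a hypothetical name.

```python
import numpy as np


def select_triplets(D, labels, k_i, k_g):
    """D: (n, n) matrix with D[a, b] = KL divergence from sample a to sample b.
    labels: (n,) class ids. Returns (anchor, positive, negative) index triplets."""
    labels = np.asarray(labels)
    n = len(labels)
    triplets = []
    for a in range(n):
        same = np.flatnonzero((labels == labels[a]) & (np.arange(n) != a))
        diff = np.flatnonzero(labels != labels[a])
        # k_i most similar same-class samples (smallest KL divergence).
        pos = same[np.argsort(D[a, same])[:k_i]]
        # k_g different-class samples; choosing the closest ones is an assumption.
        neg = diff[np.argsort(D[a, diff])[:k_g]]
        triplets.extend((a, int(p), int(q)) for p in pos for q in neg)
    return triplets
```

This produces at most $n \cdot k_i \cdot k_g$ triplets instead of all $O(n^3)$ combinations.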

Step 5, using the selected triplets as training data, performing feature mapping, namely applying a linear mapping A to all mean vectors to map the original data features to a new feature space in which distances between samples of the same class are smaller and distances between samples of different classes are larger.

Most existing systems of this kind measure the similarity between two samples by directly computing the KL divergence between them. To better distinguish samples of different classes, the data classification system based on KL divergence optimization makes the following improvement: a linear mapping A is applied to all mean vectors, i.e., every $\mu_i$ is replaced with $A\mu_i$, which maps the original data features to a new feature space. The objective function is shown in FIG. 2.

Taking 3D object classification as an example, after feature mapping the i-th 3D object follows the Gaussian distribution $\theta_i = g(x; A\mu_i, \Sigma_i)$, where A is the learned linear mapping, $\theta_i$ represents the i-th object, g denotes a Gaussian distribution, $\Sigma$ denotes a covariance matrix, and $\mu$ denotes a mean vector.

Step 6, calculating the similarity between the multi-view samples, i.e., measuring the similarity of the multi-view samples by computing the optimized KL divergence in the mapped new feature space. Note that the optimized KL divergence is updated continuously, and so is the similarity between the multi-view samples.

In the embodiments of the present invention, each text is modeled as a multidimensional Gaussian distribution, so the similarity between two texts is measured by the KL divergence between them. The original KL divergence is

$$D_{KL}(P_1 \| P_2) = \frac{1}{2}\left[\log\frac{\det \Sigma_2}{\det \Sigma_1} - n + \operatorname{tr}\left(\Sigma_2^{-1}\Sigma_1\right) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1)\right]$$

where $D_{KL}(P_1 \| P_2)$ is the KL divergence between samples $P_1$ and $P_2$, log denotes the natural logarithm, det denotes the determinant of a matrix, n is the feature dimension of the samples, tr is the trace of a matrix, $\Sigma$ denotes a covariance matrix, and $\mu$ denotes a mean vector. The method assumes that the two Gaussian distributions share the same covariance matrix, i.e., $\Sigma_1 = \Sigma_2 = \Sigma$, so the formula simplifies to

$$D_{KL}(P_1 \| P_2) = \frac{1}{2}(\mu_1 - \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2).$$

During each iteration the learned mapping is updated, and the similarity of two samples is measured by the optimized KL divergence in the mapped new feature space:

$$K_A(P_1 \| P_2) = \frac{1}{2}(A\mu_1 - A\mu_2)^\top \Sigma^{-1} (A\mu_1 - A\mu_2)$$

where $K_A(P_1 \| P_2)$ is the optimized KL divergence measure between samples $P_1$ and $P_2$, $\Sigma$ denotes the covariance matrix, $\mu$ denotes a mean vector, and A is the learned linear mapping.
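The three divergences above can be checked numerically with a short NumPy sketch (function names are illustrative, not from the source): the general Gaussian KL divergence, its shared-covariance simplification, and the optimized measure $K_A$ obtained by replacing each $\mu$ with $A\mu$.

```python
import numpy as np


def gaussian_kl(mu1, S1, mu2, S2):
    """KL(P1 || P2) for multivariate Gaussians P1 = N(mu1, S1), P2 = N(mu2, S2)."""
    n = mu1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (logdet2 - logdet1 - n + np.trace(S2_inv @ S1) + diff @ S2_inv @ diff)


def kl_shared(mu1, mu2, S):
    """Simplified form when S1 = S2 = S: half a squared Mahalanobis distance."""
    diff = mu1 - mu2
    return 0.5 * diff @ np.linalg.solve(S, diff)


def kl_mapped(mu1, mu2, S, A):
    """Optimized measure K_A: the shared-covariance formula after mu -> A @ mu."""
    return kl_shared(A @ mu1, A @ mu2, S)
```

With equal covariances the general formula and the simplified one agree, and $K_A$ with $A = I$ reduces to the original divergence.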

Step 7, using the selected triplets as training data, performing KL divergence-based optimization. As shown in FIG. 3, the data classification system based on KL divergence optimization models the problem as a minimization problem on the manifold of symmetric positive definite matrices. In a specific implementation, the optimized KL divergences (with the linear mapping A applied) between all samples in the training data are first calculated as in step 6 and grouped according to whether the two samples belong to the same class. The first term of the objective function is the sum of the KL divergences between samples of the same class, and the second term is the sum of the KL divergences between samples of different classes. All KL divergences mentioned here and below are optimized KL divergences.

In addition, unlike existing systems of this kind, the data classification system based on KL divergence optimization does not use a hinge loss function; instead, it introduces a new positive parameter γ to balance the effects of same-class data and different-class data. According to the distribution characteristics of the dataset, in this embodiment γ is set as a function of $\bar{D}$, the average KL divergence over the entire training dataset. As the distance between data of different classes grows, the data become more uniform and the entropy of the system increases; $\bar{D}$ then becomes very large and γ tends to 1. The modified triplet constraint with γ therefore describes the characteristics of the classes well. In addition, λ in the objective function is a parameter between 0 and 1 that balances the loss term and the regularization term, n denotes the number of samples, and $K_A(\theta_i \| \theta_j)$ denotes the KL divergence measure between samples $\theta_i$ and $\theta_j$ under the mapping.
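The exact objective appears only in FIG. 2, so the following is a hypothetical sketch of its described structure: same-class $K_A$ terms minus γ times different-class $K_A$ terms, normalized by the number of samples n and mixed with a regularizer value through λ. The precise form, the normalization, and the names `k_a` and `objective` are assumptions; γ is passed in because its formula is not reproduced here.

```python
import numpy as np


def k_a(mu_a, mu_b, S_inv, A):
    """Optimized divergence with a shared covariance: 0.5 * d^T S^-1 d, d = A (mu_a - mu_b)."""
    d = A @ (mu_a - mu_b)
    return 0.5 * d @ S_inv @ d


def objective(A, means, S_inv, triplets, gamma, lam, reg_value=0.0):
    """Hypothetical objective: same-class K_A minus gamma * different-class K_A,
    divided by the number of samples n, then mixed with a regularizer value via lam."""
    n = len(means)
    same = sum(k_a(means[a], means[p], S_inv, A) for a, p, _ in triplets)
    diff = sum(k_a(means[a], means[q], S_inv, A) for a, _, q in triplets)
    loss = (same - gamma * diff) / n
    return (1 - lam) * loss + lam * reg_value
```

Without a regularizer, this loss decreases as the mapping pushes different-class samples apart, which is why the regularization term and the constraint to the SPD manifold matter in practice.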

Besides the modified triplet constraint, the data classification system based on KL divergence optimization uses a new regularization term to prevent overfitting, which often occurs in such classification systems, particularly in high-dimensional settings. To preserve the local topology of the input space, the system designs a regularizer based on the local topology, in which local neighborhoods are required to satisfy local smoothness, and adds it to the objective function. In this regularizer, the weight $\beta_i$ is a positive real number that in effect represents the density $p(X_i)$ of the input $X_i$; the system estimates $p(X_i)$ with the Parzen window method (kernel density estimation).

Wherein k is _h Is a gaussian kernel function. h represents the width of the kernel. The width of the kernel controls the effect of sample spacing. h is typically set to 0.4.N (N) _i Is the neighbor index set of the central core xi. N (N) _i Taken as 3 in length.

$S_{ij}$ denotes the similarity between two samples and is computed with a Gaussian kernel function with bandwidth $\sigma = \min D + \frac{1}{v}(\max D - \min D)$, where $\max D$ and $\min D$ are the maximum and minimum KL divergences over all pairs of samples, and v is a control parameter, set to 10 in this system. $D_{ij}$ denotes the KL divergence between samples i and j. Note that $S_{ij}$ is computed from the initial KL divergences of the training data and is not updated afterwards; that is, $S_{ij}$ is defined in the original feature space of the data. $K_A(\theta_i \| \theta_j)$ denotes the KL divergence measure between samples $\theta_i$ and $\theta_j$ under the mapping.
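The bandwidth rule above can be sketched directly. The kernel form $\exp(-D_{ij}^2/\sigma^2)$ is an assumption, since the source only says a Gaussian kernel is used. `similarity_matrix` is a hypothetical name, and $\min D$ and $\max D$ are taken over off-diagonal pairs.

```python
import numpy as np


def similarity_matrix(D, v=10):
    """S_ij from the initial (unmapped) KL divergences D_ij; computed once, never updated."""
    off = ~np.eye(D.shape[0], dtype=bool)  # ignore the zero self-divergences
    min_d, max_d = D[off].min(), D[off].max()
    sigma = min_d + (max_d - min_d) / v
    # Gaussian kernel on the divergence; this exact kernel form is an assumption.
    return np.exp(-(D ** 2) / (sigma ** 2))
```

For off-diagonal divergences ranging from 1 to 3 and v = 10, the bandwidth is σ = 1 + 2/10 = 1.2.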

In addition, to solve the minimization problem on the manifold, the invention does not use conventional gradient descent but designs an intrinsic gradient descent algorithm. As shown in FIG. 3, after the gradient of the objective function is projected onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of SPD (symmetric positive definite) matrices equipped with the affine-invariant Riemannian metric. With the symmetrization step, the manifold structure of the learned linear mapping is preserved at every iteration of the optimization, which guarantees that the learned optimal KL divergence metric is symmetric positive definite. The intrinsic gradient descent update is built from the matrix exponential exp, the objective function $f(A_t)$ evaluated at the linear mapping obtained after t iterations, the corresponding gradient $\nabla f(A_t)$, and the learning rate α.
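The following sketch shows one step of Riemannian gradient descent on the SPD manifold under the affine-invariant metric. It uses the standard exponential-map form $A_{t+1} = A_t^{1/2}\exp\left(-\alpha A_t^{1/2}\,\mathrm{sym}(\nabla f(A_t))\,A_t^{1/2}\right)A_t^{1/2}$, which is an assumption rather than the patent's exact formula. Matrix square roots and exponentials are computed through symmetric eigendecompositions, so only NumPy is needed; the function names are hypothetical.

```python
import numpy as np


def _sym_fun(M, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fun(w)) @ V.T


def spd_gradient_step(A, euclid_grad, alpha):
    """One Riemannian gradient-descent step on the SPD manifold under the
    affine-invariant metric (standard exponential-map form, assumed here)."""
    G = 0.5 * (euclid_grad + euclid_grad.T)   # symmetrize the Euclidean gradient
    A_half = _sym_fun(A, np.sqrt)             # A^{1/2}
    inner = -alpha * A_half @ G @ A_half      # step expressed in whitened coordinates
    inner = 0.5 * (inner + inner.T)           # guard against round-off asymmetry
    A_new = A_half @ _sym_fun(inner, np.exp) @ A_half
    return 0.5 * (A_new + A_new.T)            # result stays symmetric positive definite
```

As a usage check, minimizing $f(A) = \operatorname{tr}(A) - \log\det A$, whose Euclidean gradient is $I - A^{-1}$ and whose minimizer is $I$, drives an SPD starting point to the identity while every iterate stays SPD.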

The feature mapping, multi-view sample similarity calculation, and optimization processes are repeated until convergence; the resulting linear mapping A is the learned optimal linear mapping.

Step 8, mapping the original data features to the new feature space with the optimal linear mapping learned in step 7, and classifying the test set samples in the new feature space with a K-nearest neighbor (KNN) classifier.

In the classification module, the KL-divergence-optimization-based data classification system uses the K-nearest-neighbor algorithm. First, the KL divergence between each test-set sample and each training-set sample is calculated. A priority queue of size k, ordered by distance from largest to smallest, stores the current nearest-neighbor training tuples. k tuples are randomly selected from the training tuples as the initial nearest neighbors; the distances from the test tuple to these k tuples are calculated, and the training tuple labels and distances are stored in the priority queue. Each remaining training tuple is then compared with the farthest tuple in the queue and replaces it if it is closer. If the majority of the k most similar (i.e., nearest-neighbor) samples of a test sample in the feature space belong to a certain class, the test sample is assigned to that class.
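The priority-queue procedure can be sketched in plain Python with `heapq`, storing negated distances so that the heap root is the farthest of the current k neighbors. As assumptions for illustration, each sample is represented by its mean vector, the distance is the KL divergence between Gaussians sharing a *diagonal* covariance, $\tfrac12 \sum_d (\mu_{p,d} - \mu_{q,d})^2 / \sigma_d^2$, and the names `kl_shared_diag` and `knn_classify` are hypothetical.

```python
import heapq
import random
from collections import Counter


def kl_shared_diag(mu_p, mu_q, var):
    """KL(N(mu_p, S) || N(mu_q, S)) for a shared diagonal covariance S = diag(var)."""
    return 0.5 * sum((a - b) ** 2 / v for a, b, v in zip(mu_p, mu_q, var))


def knn_classify(test_x, train, k, dist, seed=0):
    """Majority vote over the k nearest training tuples (features, label)."""
    rng = random.Random(seed)
    order = list(range(len(train)))
    rng.shuffle(order)
    # Max-heap via negated distances: heap[0] is the farthest current neighbor.
    # The index breaks ties so labels are never compared.
    heap = []
    for idx in order[:k]:  # k randomly chosen initial neighbors
        x, y = train[idx]
        heapq.heappush(heap, (-dist(test_x, x), idx, y))
    for idx in order[k:]:  # scan the remaining training tuples
        x, y = train[idx]
        d = dist(test_x, x)
        if d < -heap[0][0]:  # closer than the farthest kept neighbor
            heapq.heapreplace(heap, (-d, idx, y))
    votes = Counter(y for _, _, y in heap)
    return votes.most_common(1)[0][0]


train = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
         ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.8, 3.1), "b")]
dist = lambda p, q: kl_shared_diag(p, q, (1.0, 1.0))
label = knn_classify((0.1, 0.1), train, 3, dist)
```

The random initialization only affects the order of the scan: after all training tuples are visited, the heap holds the exact k nearest neighbors, so the prediction does not depend on `seed`.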

In this embodiment, the KL-divergence-optimization-based data classification system was tested on both 3D object recognition and text classification tasks. As shown in fig. 4, it achieves higher classification accuracy than existing systems.

In figs. 4 to 6, the abscissa indicates the proportion of the original dataset used as training data (20%, 30%, 40%, and 50%), and the ordinate indicates the classification accuracy obtained with each proportion.

The proposed data classification system is labeled KLD-M in the figures. The figures show that, compared with eight other mainstream data classification systems, the KL-divergence-optimization-based system achieves higher classification accuracy. The eight systems are Partial Least Squares Covariance Discriminative Learning (CDL_PLS), Bhattacharyya Distance (BD), Linear Discriminant Analysis Covariance Discriminative Learning (CDL_LDA), Manifold Discriminant Analysis (MDA), Projection Metric Learning (PML), Log-Euclidean Metric Learning (LEML), Earth Mover's Distance (EMD), and conventional KL divergence (KLD).

As shown in fig. 5, the abscissa indicates the range over which λ is varied, and the ordinate indicates the classification accuracy obtained for each value of λ. The graph shows that the KL-divergence-optimization-based data classification system maintains good performance as the parameter changes, which demonstrates its robustness.

It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the invention as defined by the following claims.

Claims

1. A KL divergence optimization-based 3D object data classification system, comprising a feature extraction module, a feature whitening module, a multi-view feature modeling module, a training data selection module, a feature mapping module, a multi-view sample similarity calculation module, a KL-divergence-based optimization module, and a classification module based on KL divergence under the optimal linear mapping, characterized in that:

2. The KL-divergence-optimized 3D object data classification system as recited in claim 1, wherein: each sample is modeled as a Gaussian distribution, and any two Gaussian distributions are assumed to have the same covariance matrix; for each sample, the multi-view features are characterized by the mean vector and covariance matrix of its Gaussian distribution.

3. The KL-divergence-optimized 3D object data classification system as recited in claim 1, wherein: the triplets include samples belonging to the same class of objects and samples belonging to different classes of objects.

4. A 3D object data classification method implemented by the KL-divergence-optimization-based 3D object data classification system of claim 1, characterized in that the data classification method comprises the following steps:

step 3, modeling and characterizing the processed multi-view features;

step 4, selecting a certain number of triplets from the labeled training data as training data, wherein the distribution of the training data is the feature distribution obtained by modeling the multi-view features in step 3;

step 8, mapping the original data features to a new feature space using the optimal linear mapping learned in step 7, and classifying the test set samples in the new feature space with a KL-divergence-based K-nearest-neighbor classifier.

5. The 3D object data classification method of claim 4, wherein: in step 7, a positive parameter γ is used to balance the influence of same-class data and different-class data, and γ is set by a formula in terms of $\bar{D}$, where $\bar{D}$ is the average KL divergence over the entire training dataset.

6. The 3D object data classification method of claim 5, wherein: in step 7, after the gradient of the objective function is projected onto the tangent space of the manifold, Riemannian gradient descent is performed on the manifold of symmetric positive definite matrices equipped with the affine-invariant Riemannian metric, so that the manifold structure of the learned linear mapping is preserved in every iteration of the optimization.

7. The 3D object data classification method of claim 5, wherein: in step 8, the KL divergence between each test-set sample and each training-set sample is calculated; a priority queue of size k, ordered by distance from largest to smallest, is maintained to store the nearest-neighbor training tuples; and k tuples are randomly selected from the training tuples as the initial nearest-neighbor tuples, the distances from the test tuple to these k tuples are calculated, and the training tuple labels and distances are stored in the priority queue.