CN113159085A - Training of classification model, image-based classification method and related device - Google Patents

Training of classification model, image-based classification method and related device

Info

Publication number
CN113159085A
Authority
CN
China
Prior art keywords
model
feature
target
characteristic
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011604336.0A
Other languages
Chinese (zh)
Inventor
钱扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aibee Technology Co Ltd
Original Assignee
Beijing Aibee Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aibee Technology Co Ltd filed Critical Beijing Aibee Technology Co Ltd
Priority to CN202011604336.0A priority Critical patent/CN113159085A/en
Publication of CN113159085A publication Critical patent/CN113159085A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

The application provides a training method and apparatus for a classification model. A first feature extracted from sample data by a trained first model and a second feature extracted from the same sample data by a second model are acquired respectively, and the dimensions of the first feature are re-divided to obtain a first target feature. The weights of the features of each dimension in the first target feature are determined, and the second model is trained using the weights, the first target feature and a second target feature; a classification model with high accuracy and low complexity can thus be obtained by training. Correspondingly, the image-based classification method uses the trained second model to obtain the classification result of a target in an image, and can reduce the resource occupation of the classification operation while ensuring the accuracy of the classification result.

Description

Training of classification model, image-based classification method and related device
Technical Field
The present application relates to the field of electronic information, and in particular to a method and an apparatus for training a classification model and an image-based classification method.
Background
With the development of machine learning techniques, neural network models are applied in various fields as classifiers. As the demand for classification accuracy increases, the complexity (e.g., number of layers) of the neural network model increases, and the more complex neural network model occupies more resources at runtime.
In practice, especially in application scenarios with real-time requirements, the complexity of the neural network model has to be limited by the computing power of the available hardware, that is, a neural network model with lower complexity is adopted, which in turn limits the classification accuracy.
Therefore, how to reduce the resource occupation of the classification operation on the premise of ensuring the accuracy of the classification result becomes a problem to be solved.
Disclosure of Invention
The application provides a training method and a training device for a classification model, and aims to solve the problem of how to obtain a classification model with high accuracy and low complexity. Correspondingly, the application also provides an image-based classification method, so that the resource occupation amount of classification operation is reduced on the premise of ensuring the accuracy of the classification result.
In order to achieve the above object, the present application provides the following technical solutions:
a training method of a classification model comprises the following steps:
respectively acquiring a first feature and a second feature, wherein the first feature is extracted from the sample data by a trained first model, the second feature is extracted from the sample data by a second model, and the complexity of the first model is higher than that of the second model;
re-dividing the dimensions of the first feature to obtain a first target feature;
determining weights of features of respective dimensions in the first target feature;
training the second model using the weights, the first target feature, and a second target feature, the second target feature being obtained by re-dividing the dimensions of the second feature.
Optionally, the re-dividing the dimensions of the first feature to obtain a first target feature includes:
re-dividing the dimensions of the first feature on the basis of the information amount, or of the degree of influence on the output result of the classification model, to obtain the first target feature;
wherein, for the feature of any dimension, the larger the information amount, the larger the weight; or, for the feature of any dimension, the larger the degree of influence on the output result, the larger the weight.
Optionally, the step of re-dividing the dimension of the first feature based on the information amount to obtain a first target feature includes:
acquiring a principal component analysis transformation matrix of the first feature;
and performing linear transformation on the first feature using the principal component analysis transformation matrix to obtain the first target feature, in which the information amount of the features decreases dimension by dimension.
Optionally, the determining the weight of the feature of each dimension in the first target feature includes:
determining the weight of the feature of the target dimension according to the variance of the feature of the target dimension and the total variance; the target dimension is any one dimension, and the total variance is the sum of the variances of the features of all dimensions.
Optionally, the step of re-dividing the dimensionality of the first feature according to the degree of influence on the output result of the classification model to obtain a first target feature includes:
acquiring a linear discriminant analysis transformation matrix of the first feature;
and performing linear transformation on the first feature using the linear discriminant analysis transformation matrix to obtain the first target feature.
Optionally, the determining the weight of the feature of each dimension in the first target feature includes:
determining the weight of the feature of the target dimension according to the eigenvalue of the target dimension in the linear discriminant analysis transformation matrix and the total eigenvalue sum; the target dimension is any one dimension, and the total eigenvalue sum is the sum of the eigenvalues of all dimensions of the linear discriminant analysis transformation matrix;
and taking the weight of each dimension in the linear discriminant analysis transformation matrix as the weight of the corresponding dimension in the first target feature.
Optionally, the first model is a teacher model in a knowledge distillation model;
the second model is a student model in the knowledge distillation model.
Optionally, the training the second model using the weights, the first target feature and the second target feature includes:
determining a distance between the first target feature and the second target feature as a function of the weight, the first target feature and the second target feature;
training the second model using the distance as a first loss function.
Optionally, the training the second model using the weights, the first target feature and the second target feature includes:
determining a first similarity matrix and a second similarity matrix by using the distance, wherein the first similarity matrix is a similarity matrix of the first feature, and the second similarity matrix is a similarity matrix of the second feature;
determining a second loss function according to the first similarity matrix and the second similarity matrix;
training the second model using the second loss function.
An image-based classification method, comprising:
inputting an image into a classification model to obtain a classification result of a target in the image output by the classification model, wherein the classification model is the second model obtained by training using the above training method of a classification model.
A training apparatus for classification models, comprising:
a first acquisition module, used for acquiring a first feature and a second feature respectively, wherein the first feature is a feature extracted from sample data by a trained first model, the second feature is a feature extracted from the sample data by a second model, and the complexity of the first model is higher than that of the second model;
a second acquisition module, used for re-dividing the dimensions of the first feature to obtain a first target feature;
a determination module for determining a weight of a feature of each dimension in the first target feature;
a training module, used for training the second model using the weights, the first target feature and a second target feature, the second target feature being obtained by re-dividing the dimensions of the second feature.
An electronic device, comprising:
a memory and a processor;
the memory is used for storing a program, and the processor is used for running the program to implement the above training method of a classification model or the above image-based classification method.
A computer-readable storage medium, on which a program is stored, which, when read by a computing device, implements the above-described training method of a classification model or image-based classification method.
According to the training method and apparatus for a classification model provided by the application, a first feature extracted from sample data by a trained first model and a second feature extracted from the same sample data by a second model are acquired respectively, and the dimensions of the first feature are re-divided to obtain a first target feature. The weights of the features of each dimension in the first target feature are determined, and the second model is trained using the weights, the first target feature and a second target feature. Because the complexity of the first model is higher than that of the second model, the accuracy of the trained first model is higher than that of the trained second model, so training the second model with the first target feature enables the second model to learn toward the higher accuracy. Furthermore, the weight of the feature of each dimension is also used as a basis for training, so that finer-grained training can be realized and the accuracy of the second model is further improved; a classification model with high accuracy and low complexity can therefore be obtained by training. Correspondingly, the image-based classification method uses the trained second model to obtain the classification result of a target in an image, and can reduce the resource occupation of the classification operation while ensuring the accuracy of the classification result.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a classification model training method disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of a method for training a classification model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a training apparatus for classification models disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a training method of a classification model disclosed in an embodiment of the present application, including the following steps:
and S101, respectively acquiring a first feature and a second feature.
The first characteristic is the characteristic extracted by the trained first model to the sample data, and the second characteristic is the characteristic extracted by the second model to the sample data.
In this embodiment, the complexity of the first model is higher than that of the second model, so the trained first model can have higher classification accuracy than the trained second model. Therefore, the first model after training is used as one of the bases for training the second model, and the specific steps are as follows.
S102, re-dividing the dimensions of the first feature to obtain a first target feature.
Specifically, the dimensions may be re-divided according to the information amount of the first feature to obtain the first target feature.
It should be noted that, in a conventional model such as a neural network, the features (typically feature vectors) extracted from input data do not reflect how much information the feature of each dimension carries about the sample data.
The first target feature is obtained by re-dividing the dimensions based on the information amount, so that the features of each dimension in the first target feature are distinguished from the information amount.
For example, the first feature vector includes 512-dimensional features, and typically, one column in the feature vector matrix is a dimension, and data located in the same column is a feature of the same dimension.
After the dimensions are re-divided, the first target feature vector still includes 512-dimensional features, but the dimensions are arranged in descending order of information amount; that is, in the re-divided feature vector matrix, the information amount of the features in the first column is the largest and exceeds that of the features in the second column, the information amount of the features in the second column exceeds that of the features in the third column, and so on.
The specific manner of re-dividing the dimensions according to the amount of information will be explained in the following embodiments.
Or, the dimensions may be re-divided based on the degree of influence on the output result of the classification model to obtain the first target feature, so that the features of each dimension in the first target feature have the difference of the degree of influence. The specific division will be explained in the following embodiments.
S103, determining the weight of the feature of each dimension in the first target feature.
For dimensions divided according to information amount, the larger the information amount, the larger the weight; for dimensions divided according to the degree of influence on the output result, the larger the degree of influence, the larger the weight.
The manner in which the weights are calculated can be found in the prior art and will be illustrated in the examples that follow.
S104, training the second model using the weights, the first target feature and a second target feature.
The second target feature is obtained by re-dividing the dimensions of the second feature, in the same manner as the dimensions of the first feature.
It can be understood that the purpose of training the second model by using the weighted first target feature and the weighted second target feature is to enable the second model to learn the classification knowledge of the first model on the sample data, and add the weight of each dimension feature into the knowledge to enable the second model to learn finer-grained knowledge.
For an illustration of the training process, see the following examples.
In the process shown in fig. 1, the trained first model and the second model to be trained each extract features from the sample data, and the weights of the features of each dimension are used together with those features to train the second model, so that the second model can learn the classification knowledge of the first model. In summary, the accuracy of the second model is improved while the model remains lightweight.
Furthermore, the weight of the feature of each dimension is also used as a training basis, so that the second model can learn the classification knowledge of the first model in a finer granularity, and the accuracy is further improved.
The above examples are described in detail below by taking a knowledge distillation model as an example.
Fig. 2 is a training method of a classification model disclosed in an embodiment of the present application, including the following steps:
s201, inputting sample data into a teacher model and a student model in the knowledge distillation model to respectively obtain a first feature and a second feature.
The knowledge distillation model comprises a teacher model and a student model. The complexity of the teacher model is higher than that of the student models, so that the classification accuracy of the teacher model is high, and the resource occupation of the student models during operation is smaller. Therefore, in practice, the student model is generally used for prediction, and the teacher model is used for training the student model, so that the trained student model can learn the classification knowledge of the teacher model, thereby having higher accuracy.
It is to be understood that the sample data may comprise a plurality of pieces of sample data. Inputting any piece of sample data (for example, an image) into the teacher network yields the feature (for example, a feature vector matrix) extracted by the teacher network, namely the first feature; inputting the same sample data into the student network yields the feature extracted by the student network, namely the second feature.
It should be noted that the features extracted by the first model and the second model for the sample data may not be output data of the models but intermediate data.
For the training process of the first model, see the prior art.
S202, acquiring a principal component analysis transformation matrix of the first characteristic.
Principal component analysis is a common algorithm for extracting principal components in data, and the acquisition mode of a principal component analysis transformation matrix can be seen in the prior art.
S203, performing linear transformation on the first features using the principal component analysis transformation matrix to obtain first target features in which the information amount of the features decreases dimension by dimension.
Specifically, assume that the first feature is denoted as f_t and the principal component analysis transformation matrix is denoted as T; then Tf_t is the first target feature.
Based on the characteristics of principal component analysis, the information amount of the feature of each dimension in the first target feature obtained in the above manner is decreased progressively.
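For illustration only, the following is a minimal NumPy sketch of S202 and S203. It assumes the first features of all samples are stacked one per row into a single matrix; the function names and this stacking convention are illustrative assumptions rather than anything specified in the application:

```python
import numpy as np

def pca_transform_matrix(features: np.ndarray) -> np.ndarray:
    # features: (num_samples, dim), one first feature f_t per row (S202).
    centered = features - features.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)          # (dim, dim) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # largest variance first
    return eigvecs[:, order].T                    # rows = principal axes

def to_target_features(features: np.ndarray, T: np.ndarray) -> np.ndarray:
    # S203: apply the linear transformation T to each feature; afterwards
    # column i carries the i-th largest information amount (variance).
    return features @ T.T
```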
S204, determining the weight of the feature of the target dimension according to the variance of the feature of the target dimension and the total variance.
Wherein, the target dimension is any one dimension, and the total variance is the sum of the variances of the features of each dimension, that is, formula (1):
w_i = δ_i / Σ_j δ_j        (1)

wherein w_i is the weight of the feature of the i-th dimension, δ_i is the variance of the feature of the i-th dimension, and Σ_j δ_j is the sum of the variances of the features of all dimensions. It can be seen that the larger the variance, the larger the weight; since the variance is a parameter reflecting the information amount, the larger the information amount, the larger the weight.
It should be noted that, since S203 re-divides the dimensions of the features according to information amount, S204 can use formula (1) to obtain weights that reflect the information amount, that is, the importance, of the features of each dimension.
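Under the same row-wise stacking assumption as the sketch above, formula (1) can be computed in one step (again a sketch, not a required implementation):

```python
import numpy as np

def dimension_weights(target_features: np.ndarray) -> np.ndarray:
    # Formula (1): w_i = delta_i / sum_j delta_j, where delta_i is the
    # variance of the i-th dimension of the first target features.
    variances = target_features.var(axis=0)
    return variances / variances.sum()
```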
S205, calculating a distance function by using the weight, the first target feature and the second target feature.
In particular, assume that the second feature is denoted as f_s, so that the second target feature is Tf_s; then the distance function is as in formula (2):

Dist(f_t, f_s) = Σ_i w_i · ((Tf_t)_i − (Tf_s)_i)²        (2)

wherein (Tf_t)_i and (Tf_s)_i are the features of the i-th dimension of the first target feature and the second target feature respectively, and w_i is the weight of the feature of the i-th dimension.
And S206, training a second model by using the distance function.
Specifically, the training mode of the student model of the knowledge distillation model comprises two modes, and in the step, the distance function is combined into the two modes to realize the training of the student model.
The first mode is as follows: training based on absolute distance. That is, the loss function used to train the student model is Loss_kd = Dist(f_t, f_s); in this step, the weighted distance function of formula (2) is used as this loss function to train the second model.
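A minimal PyTorch sketch of this first training mode follows. The reading of formula (2) as a per-dimension weighted squared difference, the detaching of the teacher features, and all names are illustrative assumptions:

```python
import torch

def weighted_distance_loss(f_t: torch.Tensor, f_s: torch.Tensor,
                           T: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # f_t, f_s: (batch, dim) teacher / student features.
    # T: (dim, dim) transformation matrix; w: (dim,) dimension weights.
    z_t = (f_t @ T.t()).detach()   # first target features T f_t (no grads)
    z_s = f_s @ T.t()              # second target features T f_s
    # Formula (2): distance summed over dimensions, averaged over the batch.
    return (w * (z_t - z_s).pow(2)).sum(dim=1).mean()
```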
The second mode is as follows: training based on relative distance.
A similarity matrix of the first features is calculated using the distance function of formula (2) to obtain a first similarity matrix A_t, and a similarity matrix of the second features is calculated using the distance function of formula (2) to obtain a second similarity matrix A_s; the second model is then trained using the loss function Loss_kd = Norm(A_t, A_s), where Norm denotes a norm operation.
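A sketch of the second mode under the same assumptions; the application does not fix which norm Norm denotes, so the Frobenius norm is assumed here:

```python
import torch

def relative_distance_loss(f_t: torch.Tensor, f_s: torch.Tensor,
                           T: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    def pairwise(z):                              # z: (batch, dim)
        diff = z.unsqueeze(1) - z.unsqueeze(0)    # (batch, batch, dim)
        return (w * diff.pow(2)).sum(dim=-1)      # weighted distance matrix
    A_t = pairwise((f_t @ T.t()).detach())        # first similarity matrix
    A_s = pairwise(f_s @ T.t())                   # second similarity matrix
    return torch.norm(A_t - A_s, p='fro')         # Loss_kd = Norm(A_t, A_s)
```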
In the process shown in fig. 2, the weight indicating the importance degree of the feature of each dimension is determined based on the information amount of the feature of each dimension, and the weight is used as a training basis, so that finer-grained knowledge distillation is realized, and the performance of the student network is further improved.
Moreover, because the method builds on existing algorithms, it can be easily plugged into existing knowledge distillation frameworks, and is therefore broadly applicable.
The specific implementation of re-dividing the dimensions based on the degree of influence on the output result of the classification model is similar to the implementation based on information amount; the differences include:
1. the matrix used for linear transformation is a linear discriminant analysis transformation matrix.
Namely: a linear discriminant analysis transformation matrix of the first feature is acquired, and linear transformation is performed on the first feature using the linear discriminant analysis transformation matrix to obtain the first target feature.
Different from the variance-maximization principle of Principal Component Analysis (PCA), the idea of the Linear Discriminant Analysis (LDA) algorithm is to project the data into a low-dimensional space such that samples of the same class are as compact as possible and samples of different classes are as dispersed as possible. The specific way to obtain the LDA transformation matrix can be found in the prior art.
2. The way in which the weights for each dimension are computed is different.
Specifically, the weights of the dimensions in the LDA transformation matrix are determined first, that is: the weight of the feature of the target dimension is determined according to the eigenvalue (the "feature quantity") of the target dimension in the LDA transformation matrix and the sum of the eigenvalues of all dimensions.
The target dimension is any dimension in the LDA transformation matrix.
And then, taking the weight of each dimension in the LDA transformation matrix as the weight of each corresponding dimension in the first target characteristic. It can be understood that, because the specific form of the feature is a matrix, the corresponding dimensions of the two matrices are the same dimensions in the same order and in the same position. For example, a first column in the LDA transform matrix corresponds to a first column in the first target feature, a second column in the LDA transform matrix corresponds to a second column in the first target feature, and so on.
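For illustration, a NumPy sketch of the LDA variant follows. It solves the standard generalized eigenproblem for LDA and, reading "feature quantity" as the eigenvalue attached to each LDA dimension (an interpretation, not a statement from the application), derives the weights as eigenvalue over eigenvalue sum:

```python
import numpy as np

def lda_transform_and_weights(features: np.ndarray, labels: np.ndarray):
    # features: (num_samples, dim); labels: (num_samples,) class ids.
    dim = features.shape[1]
    mean_all = features.mean(axis=0)
    S_w = np.zeros((dim, dim))                    # within-class scatter
    S_b = np.zeros((dim, dim))                    # between-class scatter
    for c in np.unique(labels):
        X_c = features[labels == c]
        mean_c = X_c.mean(axis=0)
        S_w += (X_c - mean_c).T @ (X_c - mean_c)
        d = (mean_c - mean_all)[:, None]
        S_b += len(X_c) * (d @ d.T)
    # Directions that disperse different classes while keeping each compact.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    T = eigvecs.real[:, order].T                  # rows = LDA directions
    lam = np.clip(eigvals.real[order], 0.0, None)
    return T, lam / lam.sum()                     # w_i = lam_i / sum_j lam_j
```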
The training method of the classification model and the classification model obtained by training can be applied to the scene of pedestrian re-identification based on images:
the requirements for classification are: and identifying all images belonging to the same person according to the pedestrian images acquired by the plurality of cameras.
Based on the above requirements for classification, the training process for the classification model is as follows: the teacher model is trained using the sample images.
The sample images are input into the teacher model and the student model respectively to obtain the first features and the second features. The sample images used in this step may be the same as or different from those used for training the teacher model.
The first target feature and the second target feature are obtained in the above manner, and the weight of the feature of each dimension in the first target feature is determined. In the above manner, the student model is trained.
The image-based classification method using the trained student model includes the steps of:
1. and inputting the image into the trained student model to obtain the characteristic vector of the target in the image output by the student model.
2. The distance between the feature vector and the feature vectors of other images is calculated. If the distance is smaller than the set threshold value, the targets in the two images are judged to be the same person; otherwise, a different person is identified.
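A minimal sketch of step 2; the Euclidean distance and the placeholder threshold are illustrative assumptions, since the application only requires some distance compared against a set threshold:

```python
import numpy as np

def same_person(feat_a: np.ndarray, feat_b: np.ndarray,
                threshold: float = 1.0) -> bool:
    # Two images are judged to show the same person if the distance
    # between their student-model feature vectors is below the threshold.
    return float(np.linalg.norm(feat_a - feat_b)) < threshold
```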
Because the student model has learned the classification knowledge of the teacher model, its classification accuracy is high; and because its complexity is lower than that of the teacher model, it occupies fewer computing resources in this person re-identification scenario. The purpose of reducing the resource occupation of the classification operation is thus achieved while the accuracy of the classification result is ensured.
Fig. 3 is a training apparatus for a classification model according to an embodiment of the present application, including: the device comprises a first acquisition module, a second acquisition module, a determination module and a training module.
The first obtaining module is configured to obtain a first feature and a second feature respectively, where the first feature is a feature extracted from the trained first model for sample data, the second feature is a feature extracted from the second model for the sample data, and the complexity of the first model is higher than that of the second model.
A second acquisition module, used for re-dividing the dimensions of the first feature to obtain a first target feature.
A determining module for determining a weight of a feature of each dimension in the first target feature.
A training module, used for training the second model using the weights, the first target feature and a second target feature, the second target feature being obtained by re-dividing the dimensions of the second feature.
Optionally, the second acquisition module being configured to re-divide the dimensions of the first feature to obtain a first target feature includes:
the second acquisition module is specifically configured to re-divide the dimensions of the first feature based on the information amount, or on the degree of influence on the output result of the classification model, to obtain the first target feature;
wherein, for the feature of any dimension, the larger the information amount, the larger the weight; or, for the feature of any dimension, the larger the degree of influence on the output result, the larger the weight.
Optionally, the second acquisition module being configured to re-divide the dimensions of the first feature based on the information amount to obtain a first target feature includes:
the second acquisition module is specifically configured to obtain a principal component analysis transformation matrix of the first feature, and to perform linear transformation on the first feature using the principal component analysis transformation matrix to obtain the first target feature, in which the information amount of the features decreases dimension by dimension.
Optionally, the determining module is configured to determine the weight of each dimension of the first target feature, and includes:
The determining module is specifically configured to determine the weight of the feature of the target dimension according to the variance of the feature of the target dimension and the total variance; the target dimension is any one dimension, and the total variance is the sum of the variances of the features of all dimensions.
Optionally, the second acquisition module being configured to re-divide the dimensions of the first feature according to the degree of influence on the output result of the classification model to obtain a first target feature includes:
the second acquisition module is specifically configured to obtain a linear discriminant analysis transformation matrix of the first feature, and to perform linear transformation on the first feature using the linear discriminant analysis transformation matrix to obtain the first target feature.
Optionally, the determining module is configured to determine the weight of each dimension of the first target feature, and includes:
The determining module is specifically configured to determine the weight of the feature of the target dimension according to the eigenvalue (the "feature quantity") of the target dimension in the linear discriminant analysis transformation matrix and the sum of the eigenvalues of all dimensions; the target dimension is any one dimension; and to take the weight of each dimension in the linear discriminant analysis transformation matrix as the weight of the corresponding dimension in the first target feature.
Optionally, the first model is a teacher model in a knowledge distillation model; the second model is a student model in the knowledge distillation model.
Optionally, the training module is configured to train the second model using the weights, the first target features, and the second target features, and includes:
the training module is specifically configured to determine a distance between the first target feature and the second target feature according to the weight, the first target feature, and the second target feature; training the second model using the distance as a first loss function.
Optionally, the training module is configured to train the second model using the weights, the first target features, and the second target features, and includes:
The training module is specifically configured to determine a first similarity matrix and a second similarity matrix using the distance, where the first similarity matrix is a similarity matrix of the first feature and the second similarity matrix is a similarity matrix of the second feature; determine a second loss function according to the first similarity matrix and the second similarity matrix; and train the second model using the second loss function.
The apparatus provided by this embodiment can improve the accuracy of the model while reducing the complexity of the model, thereby reducing the resource occupation of the classification operation while ensuring the accuracy of the classification result.
The embodiment of the application also discloses an electronic device, which comprises: a memory and a processor. The memory is used for storing a program, and the processor is used for operating the program to realize the training method of the classification model or the image-based classification method.
The embodiment of the application also discloses a computer readable storage medium, wherein a program is stored on the computer readable storage medium, and when the program is read by a computing device, the training method of the classification model or the classification method based on the image is realized.
The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, part of the contribution to the prior art of the embodiments of the present application or part of the technical solution may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A training method of a classification model is characterized by comprising the following steps:
respectively acquiring a first feature and a second feature, wherein the first feature is extracted from the sample data by a trained first model, the second feature is extracted from the sample data by a second model, and the complexity of the first model is higher than that of the second model;
re-dividing the dimensions of the first feature to obtain a first target feature;
determining weights of features of respective dimensions in the first target feature;
training the second model using the weights, the first target feature, and a second target feature, the second target feature being obtained by re-dividing the dimensions of the second feature.
2. The method of claim 1, wherein the re-dividing the dimensions of the first feature to obtain a first target feature comprises:
re-dividing the dimensions of the first feature on the basis of the information amount, or of the degree of influence on the output result of the classification model, to obtain the first target feature;
wherein, for the feature of any dimension, the larger the information amount, the larger the weight; or, for the feature of any dimension, the larger the degree of influence on the output result, the larger the weight.
3. The method of claim 2, wherein re-dividing the dimensions of the first feature based on the information amount to obtain a first target feature comprises:
acquiring a principal component analysis transformation matrix of the first feature;
and performing linear transformation on the first feature using the principal component analysis transformation matrix to obtain the first target feature, in which the information amount of the features decreases dimension by dimension.
4. The method of claim 2, wherein the step of re-dividing the dimensions of the first feature according to the degree of influence on the output result of the classification model to obtain the first target feature comprises:
acquiring a linear discriminant analysis transformation matrix of the first feature;
and performing linear transformation on the first feature using the linear discriminant analysis transformation matrix to obtain the first target feature.
5. The method of claim 1, wherein the training the second model using the weights, the first target features, and second target features comprises:
determining a distance between the first target feature and the second target feature as a function of the weight, the first target feature and the second target feature;
training the second model using the distance as a first loss function.
6. The method of claim 1, wherein the training the second model using the weights, the first target features, and second target features comprises:
determining a first similarity matrix and a second similarity matrix by using the distance, wherein the first similarity matrix is a similarity matrix of the first feature, and the second similarity matrix is a similarity matrix of the second feature;
determining a second loss function according to the first similarity matrix and the second similarity matrix;
training the second model using the second loss function.
7. An image-based classification method, comprising:
inputting an image into a classification model to obtain a classification result of a target in the image output by the classification model, wherein the classification model is the second model obtained by training using the training method of a classification model according to any one of claims 1 to 6.
8. A training device for classification models, comprising:
a first acquisition module, used for acquiring a first feature and a second feature respectively, wherein the first feature is a feature extracted from sample data by a trained first model, the second feature is a feature extracted from the sample data by a second model, and the complexity of the first model is higher than that of the second model;
the second acquisition module is used for re-dividing the dimensionality of the first feature to obtain a first target feature;
a determination module for determining a weight of a feature of each dimension in the first target feature;
a training module, used for training the second model using the weights, the first target feature and a second target feature, the second target feature being obtained by re-dividing the dimensions of the second feature.
9. An electronic device, comprising:
a memory and a processor;
the memory is for storing a program and the processor is for executing the program to implement the method of any one of claims 1-6 or 7.
10. A computer-readable storage medium on which a program is stored, characterized in that the method of any one of claims 1-6 or 7 is implemented when the program is read by a computing device.
CN202011604336.0A 2020-12-30 2020-12-30 Training of classification model, image-based classification method and related device Pending CN113159085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011604336.0A CN113159085A (en) 2020-12-30 2020-12-30 Training of classification model, image-based classification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011604336.0A CN113159085A (en) 2020-12-30 2020-12-30 Training of classification model, image-based classification method and related device

Publications (1)

Publication Number Publication Date
CN113159085A (en) 2021-07-23

Family

ID=76878216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011604336.0A Pending CN113159085A (en) 2020-12-30 2020-12-30 Training of classification model, image-based classification method and related device

Country Status (1)

Country Link
CN (1) CN113159085A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160314174A1 (en) * 2013-12-10 2016-10-27 China Unionpay Co., Ltd. Data mining method
CN108960264A (en) * 2017-05-19 2018-12-07 华为技术有限公司 The training method and device of disaggregated model
US20200319015A1 (en) * 2019-04-02 2020-10-08 Bodygram, Inc. Systems and methods for weight measurement from user photos using deep learning networks
WO2020232874A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Modeling method and apparatus based on transfer learning, and computer device and storage medium
CN111062563A (en) * 2019-11-08 2020-04-24 支付宝(杭州)信息技术有限公司 Risk prediction model training method, risk prediction method and related device
CN111368911A (en) * 2020-03-03 2020-07-03 腾讯科技(深圳)有限公司 Image classification method and device and computer readable storage medium
CN111667022A (en) * 2020-06-30 2020-09-15 腾讯科技(深圳)有限公司 User data processing method and device, computer equipment and storage medium
CN111666925A (en) * 2020-07-02 2020-09-15 北京爱笔科技有限公司 Training method and device for face recognition model
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111950638A (en) * 2020-08-14 2020-11-17 厦门美图之家科技有限公司 Image classification method and device based on model distillation and electronic equipment

Similar Documents

Publication Publication Date Title
Zhou et al. Local and global feature learning for blind quality evaluation of screen content and natural scene images
CN108491817B (en) Event detection model training method and device and event detection method
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
CN107209942B (en) Object detection method and image retrieval system
US20120039527A1 (en) Computer-readable medium storing learning-model generating program, computer-readable medium storing image-identification-information adding program, learning-model generating apparatus, image-identification-information adding apparatus, and image-identification-information adding method
CN112257808B (en) Integrated collaborative training method and device for zero sample classification and terminal equipment
Greco et al. Effective training of convolutional neural networks for age estimation based on knowledge distillation
US10007678B2 (en) Image processing apparatus, image processing method, and recording medium
CN109726291B (en) Loss function optimization method and device of classification model and sample classification method
Ding et al. Single sample per person face recognition with KPCANet and a weighted voting scheme
CN112329660A (en) Scene recognition method and device, intelligent equipment and storage medium
CN112232506A (en) Network model training method, image target recognition method, device and electronic equipment
CN113657087B (en) Information matching method and device
CN113435531B (en) Zero sample image classification method and system, electronic equipment and storage medium
CN113435499A (en) Label classification method and device, electronic equipment and storage medium
Zhou et al. Blind screen content image quality measurement based on sparse feature learning
CN110210572B (en) Image classification method, device, storage medium and equipment
CN113159085A (en) Training of classification model, image-based classification method and related device
CN108229552B (en) Model processing method and device and storage medium
CN113177603B (en) Training method of classification model, video classification method and related equipment
CN114565797A (en) Neural network training and image classification method and device for classification
CN114299509A (en) Method, device, equipment and medium for acquiring information
CN113920406A (en) Neural network training and classifying method, device, equipment and storage medium
Suzuki et al. Support vector machine histogram: New analysis and architecture design method of deep convolutional neural network
Oh et al. Robust PCAs and PCA Using Generalized Mean

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination