CN115035341A - Image recognition knowledge distillation method capable of automatically selecting student model structure - Google Patents
- Publication number
- CN115035341A (application CN202210679569.XA)
- Authority
- CN
- China
- Prior art keywords
- model
- path
- image recognition
- student
- knowledge distillation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/08—Computing arrangements based on biological models; neural networks; learning methods
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
An image recognition knowledge distillation method that automatically selects the student model structure, relating to the field of knowledge distillation. The invention aims to solve the problem of low image recognition accuracy caused by the fixed, complex, and inflexible student model structures used in existing image recognition knowledge distillation methods. The method comprises: inputting a picture data set to be predicted into a classification network to obtain the picture categories. The classification network is obtained as follows: train a deep convolutional neural network on a picture training set to obtain a trained deep convolutional neural network; establish a sub-model space containing a plurality of selectable paths by presetting, in each stage of the deep convolutional neural network, paths with different depths, convolution channel forms, and numbers of convolution channels; and automatically select from the sub-model space according to the trained convolutional neural network, a global objective function, and staged objective functions to obtain the classification network. The method is used for compressing deep learning models.
Description
Technical Field
The invention relates to the field of knowledge distillation, and in particular to an image recognition knowledge distillation method that automatically selects the student model structure.
Background
With the development of science and technology, deep convolutional neural networks have achieved success in image classification, object detection, semantic segmentation, and other fields, but they have large numbers of parameters and large model sizes, so image classification with a convolutional neural network incurs high latency. Knowledge distillation methods therefore emerged; knowledge distillation is a deep learning model compression technique derived from transfer learning. The basic idea is to use a powerful, fully trained, complex teacher model to extract relevant knowledge during training; transferring this information-rich knowledge to a simpler student model can effectively improve the student model's accuracy. From the teacher model's perspective, knowledge distillation yields a simple model whose prediction accuracy approaches that of the original complex model, achieving the goal of model compression. Because it is simple to train and achieves good recognition results, knowledge distillation is a research focus in the field.
As knowledge distillation technology has developed, researchers have continually proposed improvements from many angles to enhance the effect of image recognition knowledge distillation. Hinton proposed the "soft target": the input to the final softmax layer of the teacher model is taken as the soft target and combined with the teacher model's output class labels (the "hard target") as the training target of the student model. Researchers then turned to the content of the transferred knowledge, including: intermediate-layer features, where shallow features capture texture details and deep features capture abstract semantics; task-related knowledge, such as classification probability distributions and, for object detection, instance semantics and location regression information; and feature-representation knowledge, which emphasizes transferring representational capacity and is relatively general and task-independent. Others have focused on the form in which knowledge is transferred, such as attention maps and the flow of solution procedure (FSP) matrix, or have started from the teacher's side, using an ensemble of multiple models as the teacher model. However, most of these methods consider aspects of the distillation learning process, such as where knowledge is extracted and how it is represented, and do not consider the structure of the student model. As a result, current student model structures are simplified versions of the teacher model, i.e., structured convolutional neural networks.
However, a structured convolutional neural network fixes hyper-parameters such as the convolution kernel size, the number of kernels, and the number of layers, and uses the same kernel counts and kernel sizes across layers. As a result, the student model obtained after image recognition knowledge distillation has a fixed, complex structure with poor flexibility, cannot achieve a smaller parameter count and faster inference while maintaining accuracy, and therefore suffers from low image recognition precision.
Disclosure of Invention
The invention aims to solve the problem that, because the student model used in existing image recognition knowledge distillation methods has a fixed, complex, and inflexible structure, the student model obtained after distillation cannot achieve a smaller parameter count and faster inference while maintaining accuracy, resulting in low image recognition precision.
The image recognition knowledge distillation method with automatic selection of the student model structure comprises the following specific process:
acquire a picture data set to be predicted, and input it into the target classification network to obtain the categories of the pictures to be predicted;
the target classification network is obtained as follows:
step one, acquire a training set of pictures, train a deep convolutional neural network model on the training set, and take the trained deep convolutional neural network model as the teacher model;
step two, establish a sub-model space containing a plurality of selectable paths for the student model:
in each stage of the deep convolutional neural network, preset paths with different depths, convolution channel forms, and numbers of convolution channels;
and step three, automatically select among the student model paths established in step two according to the teacher model obtained in step one, the preset global objective function, and the staged objective function; the trained student model is the target classification network.
The invention has the beneficial effects that:
according to the image recognition knowledge distillation method provided by the invention, a plurality of paths are arranged in different stages of the student model, and selection is carried out according to the similarity between the image characteristic maps output by the student model and the image characteristic maps output by the teacher model, so that the network structure of the sub-model is more flexible, and the output image characteristic maps of each part are closer to the teacher model, thereby improving the accuracy of image recognition; meanwhile, the method can avoid the determination of hyper-parameters, ensures the accuracy of the model, simultaneously reduces the parameter quantity of the model, and has higher reasoning speed.
Drawings
FIG. 1 is a schematic diagram of different information transfer channel paths in the same stage;
FIG. 2 is a technical flow chart of a knowledge distillation method for automatically selecting a student model structure.
Detailed Description
The first embodiment is as follows: the image recognition knowledge distillation method with automatic selection of the student model structure comprises the following specific process:
acquire a picture data set to be predicted, and input it into the target classification network to obtain the categories of the pictures to be predicted;
the target classification network is obtained as follows:
step one, acquire a training set of pictures, train a deep convolutional neural network model on the training set, and take the trained deep convolutional neural network model (e.g., Resnet56 or VGG) as the teacher model;
Step two, establish a sub-model space with a plurality of selectable paths for each stage of the student model, i.e., preset the student model:
in each stage of the deep convolutional neural network, preset candidate "paths" with different depths, convolution channel forms, and numbers of convolution channels for later selection;
the preset paths satisfy an output-consistency constraint, i.e., the feature maps output by all paths in a given stage have the same dimensions, as shown in FIG. 1;
the sub-model space consists of all models that can be formed from the candidate paths of every stage of the deep neural network;
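As a minimal sketch (not part of the patent text), the sub-model space of step two could be represented as one list of candidate paths per stage, all constrained to the same output dimensions. The `PathSpec` name and the concrete depth and channel-form values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PathSpec:
    depth: int            # number of convolutional layers in the path
    channels: int         # number of convolution channels
    channel_form: str     # e.g. "standard" vs. "depthwise" (assumed forms)
    out_shape: tuple      # (C, H, W) of the feature map the path emits

def build_submodel_space(stage_out_shapes):
    """Enumerate, per stage, candidate paths whose output shape equals the
    stage's required shape -- the output-consistency constraint of FIG. 1."""
    space = []
    for shape in stage_out_shapes:
        candidates = [
            PathSpec(depth=d, channels=shape[0], channel_form=form, out_shape=shape)
            for d in (1, 2, 3)
            for form in ("standard", "depthwise")
        ]
        space.append(candidates)
    return space

# three stages with progressively smaller spatial size, as in a typical CNN
space = build_submodel_space([(64, 32, 32), (128, 16, 16), (256, 8, 8)])
```

Any full student model is then one choice of path per stage, so this space encodes 6 × 6 × 6 = 216 candidate structures under these assumed settings.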
step three, automatically select among the student model paths according to the teacher model obtained in step one, the preset global objective function, and the staged objective function; the selected student model is the target classification network, as shown in FIG. 2:
Step one, perform iterative training on the student model: in each iteration, obtain the feature map output by every path of every stage in the sub-model space, and compute the similarity between each such feature map and the feature map output by the teacher model (the trained deep convolutional neural network).
The similarity between the feature map output by each path of each stage and the feature map output by the trained deep convolutional neural network is judged by the staged objective function:
L_s = ||W_ss - W_ts||_2 + λ·D_s
where L_s is the objective function of the current stage (the smaller L_s is, the more similar the two feature maps are), W_ss is the feature map output by the student model at the current stage, W_ts is the feature map output by the teacher model at the current stage, D_s is the parameter count of the selected path, and λ is a coefficient that balances the parameter-count term against the L2 norm in the loss;
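A hedged sketch of evaluating the staged objective above, with feature maps flattened to plain Python lists; the default λ value and the way the parameter count D_s is supplied are assumptions for illustration.

```python
import math

def staged_objective(student_fmap, teacher_fmap, path_param_count, lam=1e-6):
    """L_s = ||W_ss - W_ts||_2 + lam * D_s; a smaller L_s means the path's
    output is closer to the teacher's feature map."""
    l2 = math.sqrt(sum((s - t) ** 2 for s, t in zip(student_fmap, teacher_fmap)))
    return l2 + lam * path_param_count

# a path matching the teacher closely scores lower than a distant one
close = staged_objective([1.0, 2.0, 3.0], [1.0, 2.1, 3.0], path_param_count=1000)
far   = staged_objective([1.0, 2.0, 3.0], [3.0, 5.0, 0.0], path_param_count=1000)
```

The λ·D_s term lets a slightly less similar but much smaller path win the comparison, which is how the objective trades similarity to the teacher against parameter count.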
In this step, the path whose feature map is closest to the teacher model's is selected as the output of the current stage, according to the similarity between the image feature maps output by the teacher and student models. This brings the image recognition result of the student model closer to that of the teacher model, improving the student model's image recognition accuracy.
Step two, train the path with the greatest similarity in each stage as selected in step one, and record how many times each path is selected. Over multiple rounds of iterative updates, delete the least-selected paths in turn according to their selection counts; the network formed by the last remaining path in each stage is the final student model.
During training, after every 20 iterative updates, unimportant paths are deleted one by one, in the order stage 1, stage 2, stage 3, according to the selection counts of the different paths.
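The every-20-iterations pruning step might be sketched as follows; the per-stage selection-count bookkeeping and the one-deletion-per-sweep policy are assumptions drawn from the description above, not the patent's exact procedure.

```python
def prune_least_selected(selection_counts, stage_order):
    """selection_counts: one dict per stage mapping path id -> times selected.
    Deletes the least-selected path from the first stage in stage_order that
    still has more than one candidate; returns (stage, path_id) or None."""
    for stage in stage_order:
        counts = selection_counts[stage]
        if len(counts) > 1:
            victim = min(counts, key=counts.get)  # path chosen fewest times
            del counts[victim]
            return (stage, victim)
    return None  # every stage is already down to its final path

counts = [{"a": 5, "b": 1}, {"c": 3, "d": 9}]
pruned = prune_least_selected(counts, stage_order=[0, 1])
```

In a full training loop this function would be called once every 20 iterations, so the stages are thinned gradually rather than all at once, matching the stage 1, stage 2, stage 3 ordering above.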
Step three, training a final student model by utilizing a global objective function to obtain a target classification network:
the global target function adopts a traditional soft target + hard target mode (soft target + hard target), as follows:
simultaneously considering the difference (left term) between the network output and the real label of the data and the difference (right term) between the teacher network and the student network final output, wherein D (-) represents the cross entropy,andrespectively representing the logits output (namely the input of the last softmax layer of the convolutional neural network) generated by the student model and the teacher model on the jth input, y j The true tag representing the jth input, α is a parameter that trades off the importance of "soft target" and "hard target", T is the softening parameter of logits, j ∈ [1, m]Is a netThe label of the input data is rounded and m is the total number of input data.
Finally, fine-tune the selected student model according to the global loss function for 30 rounds to obtain the target classification network.
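The "soft target + hard target" global objective used in this final fine-tuning step can be sketched in the usual Hinton-style form. Since the patent's formula image is not reproduced in the text, the exact α placement and the temperature value here are assumptions for illustration.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy D(p, q) between two discrete distributions."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def kd_loss(student_logits, teacher_logits, onehot_label, alpha=0.5, T=4.0):
    """Hard term: student output vs. true label.  Soft term: student vs.
    teacher outputs, both softened with temperature T."""
    hard = cross_entropy(onehot_label, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, T), softmax(student_logits, T))
    return (1 - alpha) * hard + alpha * soft

loss = kd_loss([2.0, 0.5, -1.0], [1.8, 0.6, -0.9], [1, 0, 0])
```

Raising T flattens both distributions, so the soft term transfers the teacher's relative class preferences ("dark knowledge") rather than only its top prediction.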
In this embodiment, within one iteration, the same input to each stage passes through that stage's n paths, yielding n different feature maps. According to the staged objective function L_s, the path whose output feature map is closest to the teacher model's output at that stage is selected for training, and its selection count is incremented by one; path selection in the remaining stages proceeds in the same way. In each iteration, only the selected paths have the opportunity to be trained, i.e., only the network structure whose feature maps are closest to the teacher model's is updated. During training, redundant paths are gradually deleted according to differences in selection counts, so that finally only the single path closest to the teacher model is retained in each stage. F(·) in FIG. 2 denotes the staged objective function.
Claims (7)
1. An image recognition knowledge distillation method for automatically selecting a student model structure, characterized by comprising the following specific process: acquiring a picture data set to be predicted, and inputting it into a target classification network to obtain the categories of the pictures to be predicted;
step one, acquiring a training set of pictures, training a deep convolutional neural network model on the training set, and taking the trained deep convolutional neural network model as the teacher model;
step two, establishing a sub-model space containing a plurality of selectable paths for the student model:
in each stage of the deep convolutional neural network, presetting "paths" with different depths, convolution channel forms, and numbers of convolution channels, wherein all models that can be formed from these paths constitute the sub-model space;
and step three, automatically selecting among the student model paths established in step two according to the teacher model obtained in step one, the global objective function, and the staged objective function, the selected student model being the target classification network.
2. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 1, wherein: the deep convolutional neural network of the step one comprises: resnet56, VGG.
3. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 2, wherein: and each path in the submodel space containing the plurality of selectable paths in the step two meets the output consistency principle.
4. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 3, wherein in step three, the automatic selection over the sub-model space containing the selectable paths established in step two, according to the trained convolutional neural network model obtained in step one, the global objective function, and the staged objective function, to obtain the target classification network comprises the following steps:
performing iterative training on the student model, acquiring a feature map output by each path in each stage in a sub-model space in each iterative training process, and calculating the similarity between the feature map output by each path in each stage and the feature map output by the trained deep convolutional neural network;
step two, training the path with the greatest similarity in each stage as selected in step one, recording how many times each path is selected, and, over multiple rounds of iterative updates, deleting the least-selected paths in turn according to their selection counts, the network formed by the last remaining path in each stage being the final student model;
and step three, training a final student model by utilizing a global objective function to obtain a target classification network.
5. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 4, wherein the similarity between the feature map output by each path of each stage in the sub-model space during each iteration of training and the feature map output by the trained deep convolutional neural network is judged by the staged objective function:
L_s = ||W_ss - W_ts||_2 + λ·D_s
where L_s is the objective function of the current stage (the smaller L_s is, the more similar the two feature maps are), W_ss is the feature map output by the student model at the current stage, W_ts is the feature map output by the teacher model at the current stage, D_s is the parameter count of the selected path, and λ is a coefficient that balances the parameter-count term against the L2 norm in the loss.
6. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 5, wherein: the global objective function adopts a soft objective + hard objective mode.
7. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 6, wherein deleting the least-selected paths in turn according to their selection counts over multiple rounds of iterative updates specifically comprises: during iterative training over the sub-model space, after every preset number of iterative updates, deleting the path with the smallest selection count, one path at a time, following the order of the stages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210679569.XA CN115035341B (en) | 2022-06-15 | 2022-06-15 | Image recognition knowledge distillation method for automatically selecting student model structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115035341A true CN115035341A (en) | 2022-09-09 |
CN115035341B CN115035341B (en) | 2024-09-06 |
Family
ID=83125423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210679569.XA Active CN115035341B (en) | 2022-06-15 | 2022-06-15 | Image recognition knowledge distillation method for automatically selecting student model structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115035341B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199242A (en) * | 2019-12-18 | 2020-05-26 | 浙江工业大学 | Image increment learning method based on dynamic correction vector |
CN112116030A (en) * | 2020-10-13 | 2020-12-22 | 浙江大学 | Image classification method based on vector standardization and knowledge distillation |
WO2022057078A1 (en) * | 2020-09-21 | 2022-03-24 | 深圳大学 | Real-time colonoscopy image segmentation method and device based on ensemble and knowledge distillation |
US20220129731A1 (en) * | 2021-05-27 | 2022-04-28 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for training image recognition model, and method and apparatus for recognizing image |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117197590A (en) * | 2023-11-06 | 2023-12-08 | 山东智洋上水信息技术有限公司 | Image classification method and device based on neural architecture search and knowledge distillation |
CN117197590B (en) * | 2023-11-06 | 2024-02-27 | 山东智洋上水信息技术有限公司 | Image classification method and device based on neural architecture search and knowledge distillation |
CN117372785A (en) * | 2023-12-04 | 2024-01-09 | 吉林大学 | Image classification method based on feature cluster center compression |
CN117372785B (en) * | 2023-12-04 | 2024-03-26 | 吉林大学 | Image classification method based on feature cluster center compression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||