CN115035341A - Image recognition knowledge distillation method capable of automatically selecting student model structure


Info

Publication number: CN115035341A (application number CN202210679569.XA)
Other versions: CN115035341B (en)
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: model, path, image recognition, student, knowledge distillation
Legal status: Granted; Active
Inventors: 张翀 (Zhang Chong), 王宏志 (Wang Hongzhi), 刘宏伟 (Liu Hongwei), 丁小欧 (Ding Xiaoou)
Current Assignee: Harbin Institute of Technology
Original Assignee: Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority date / Filing date: 2022-06-15
Publication date: 2022-09-09 (CN115035341A); grant publication date: 2024-09-06 (CN115035341B)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements using classification, e.g. of video objects
    • G06V 10/82: Arrangements using neural networks
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

An image recognition knowledge distillation method that automatically selects the student model structure, relating to the field of knowledge distillation. The invention aims to solve the problem of low image recognition accuracy caused by the fixed, complex, and inflexible student model structures used in existing image recognition knowledge distillation methods. The method comprises: inputting a picture data set to be predicted into a classification network to obtain the picture categories. The classification network is obtained as follows: train a deep convolutional neural network on a picture training set to obtain a trained deep convolutional neural network; establish a sub-model space containing multiple selectable paths by presetting, in each stage of the deep convolutional neural network, paths with different depths, convolution channel forms, and convolution channel numbers; and automatically select from the sub-model space according to the trained convolutional neural network, a global objective function, and a staged objective function to obtain the classification network. The method is used for compressing deep learning models.

Description

Image recognition knowledge distillation method capable of automatically selecting student model structure
Technical Field
The invention relates to the field of knowledge distillation, in particular to an image recognition knowledge distillation method capable of automatically selecting a student model structure.
Background
With the development of science and technology, deep convolutional neural networks have achieved success in technical fields such as image classification, object detection, and semantic segmentation, but they contain large numbers of parameters and are large in volume, so image classification with a convolutional neural network suffers high latency. Knowledge distillation methods therefore emerged. Knowledge distillation is a deep learning model compression technique derived from transfer learning. Its basic idea is to use a powerful, fully trained, complex teacher model to extract relevant knowledge during training; transferring this information-rich knowledge to a simpler student model can effectively improve the student model's accuracy. From the perspective of the teacher model, knowledge distillation yields a simple model whose prediction accuracy approaches that of the original complex model, thereby achieving model compression. Because it is simple to train and achieves good recognition results, knowledge distillation is a research focus in this field.
As knowledge distillation technology has developed, researchers have continually proposed improvements from various angles to enhance the effect of image recognition knowledge distillation. Hinton proposed the "soft target": the input to the final softmax layer of the teacher model is taken as the soft target and combined with the teacher model's output class labels (the "hard target") as the training target of the student model. Researchers then turned their attention to the content of the migrated knowledge, including: intermediate-layer features, where shallow features attend to texture details and deep features attend to abstract semantics; task-related knowledge, such as classification probability distributions and the instance semantics and position regression information relevant to object detection; and feature-representation knowledge, which emphasizes migrating feature representation capability and is relatively universal and task-independent. Some researchers have focused on the representation form of the transferred knowledge, such as attention maps and the flow of solution procedure (FSP); others have started from the teacher's perspective and used an ensemble of multiple models as the teacher model. However, most of these methods consider aspects of the knowledge distillation learning process such as where knowledge is extracted and how it is represented, and do not consider the structure of the student model. As a result, current student model structures are simplified versions of the teacher model, i.e., structured convolutional neural networks. A structured convolutional neural network fixes hyper-parameters such as the convolution kernel size, the number of convolution kernels, and the number of layers, and the number and size of convolution kernels are the same across layers. Consequently, the student model obtained after image recognition knowledge distillation has a fixed, complex structure with poor flexibility: it cannot achieve fewer parameters and faster inference while maintaining accuracy, which in turn leads to low image recognition accuracy.
Disclosure of Invention
The invention aims to solve the problem that the student model adopted in existing image recognition knowledge distillation methods is fixed in structure, complex, and inflexible, so that the distilled student model cannot achieve fewer parameters and faster inference while maintaining accuracy, which leads to low image recognition accuracy.
The image recognition knowledge distillation method for automatically selecting the student model structure comprises the following specific processes:
acquiring a picture data set to be predicted, and inputting the picture data set to be predicted into a target classification network to acquire the category of the picture data to be predicted;
the target classification network is obtained by:
step one, acquiring a prediction picture training set, training a deep convolutional neural network model with the prediction picture training set, and taking the trained deep convolutional neural network model as the teacher model;
step two, establishing a sub-model space containing a plurality of selectable paths for the student model:
in each stage of the deep convolutional neural network, presetting paths with different depths, convolutional channel forms and convolutional channel numbers;
and step three, automatically selecting among the paths of the student model established in step two according to the teacher model obtained in step one, the set global objective function, and the staged objective function, and taking the trained student model as the target classification network.
The invention has the beneficial effects that:
In the image recognition knowledge distillation method provided by the invention, multiple paths are arranged in different stages of the student model, and selection is performed according to the similarity between the image feature maps output by the student model and those output by the teacher model. This makes the network structure of the student model more flexible and brings the feature maps output by each part closer to the teacher model's, thereby improving the accuracy of image recognition. At the same time, the method avoids manual determination of hyper-parameters and, while maintaining model accuracy, reduces the model's parameter count and achieves faster inference.
Drawings
FIG. 1 is a schematic diagram of different information transfer channel paths in the same stage;
FIG. 2 is a technical flow chart of a knowledge distillation method for automatically selecting a student model structure.
Detailed Description
The first embodiment is as follows: the image recognition knowledge distillation method for automatically selecting the student model structure comprises the following specific processes:
acquiring a picture data set to be predicted, and inputting the picture data set to be predicted into a target classification network to acquire the category of the picture data to be predicted;
the target classification network is obtained by:
step one, acquiring a prediction picture training set, training a deep convolutional neural network model (e.g., ResNet56 or VGG) with the prediction picture training set, and taking the trained deep convolutional neural network model as the teacher model;
Step two, establishing a sub-model space with multiple selectable paths for each stage of the student model, i.e., presetting the student model:
in each stage of the deep convolutional neural network, preset as many candidate "paths" as practical, with different depths, convolution channel forms, and convolution channel numbers, for later selection;
the preset paths satisfy the output consistency constraint, that is, the feature maps output by all paths in a given stage have the same dimensions, as shown in FIG. 1;
the sub-model space comprises all models that can be formed from the possible paths in each stage of the deep neural network; a minimal sketch of one such multi-path stage follows.
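The sketch below is a minimal PyTorch illustration, not code from the patent; the class name, layer choices, and channel sizes are illustrative assumptions. Three candidate paths differ in depth, kernel size, and channel arrangement, yet all map the same input to the same output shape, which is exactly the output consistency constraint of step two.

```python
import torch
import torch.nn as nn

class MultiPathStage(nn.Module):
    """One stage holding several candidate paths with identical output shapes."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        def conv_bn(ci, co, k, s=1):
            return nn.Sequential(
                nn.Conv2d(ci, co, k, stride=s, padding=k // 2, bias=False),
                nn.BatchNorm2d(co),
                nn.ReLU(inplace=True))
        self.paths = nn.ModuleList([
            # path 0: shallow, a single 3x3 convolution
            conv_bn(in_ch, out_ch, 3, stride),
            # path 1: deeper, two stacked 3x3 convolutions with a narrower middle
            nn.Sequential(conv_bn(in_ch, out_ch // 2, 3, stride),
                          conv_bn(out_ch // 2, out_ch, 3)),
            # path 2: different channel form, 1x1 bottleneck then a 5x5 convolution
            nn.Sequential(conv_bn(in_ch, in_ch // 2, 1),
                          conv_bn(in_ch // 2, out_ch, 5, stride)),
        ])

    def forward(self, x: torch.Tensor, path_idx: int) -> torch.Tensor:
        # only the requested path runs; every path shares the output dimensions
        return self.paths[path_idx](x)

stage = MultiPathStage(32, 64)
x = torch.randn(1, 32, 16, 16)
print([stage(x, i).shape for i in range(3)])  # all: torch.Size([1, 64, 8, 8])
```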
step three, automatically selecting among the paths of the student model according to the teacher model obtained in step one, the set global objective function, and the staged objective function, and taking the selected student model as the target classification network, as shown in FIG. 2:
step 3.1, iteratively training the student model; in each training iteration, obtain the feature map output by each path of each stage in the sub-model space, and compute the similarity between each such feature map and the feature map output by the teacher model (the trained deep convolutional neural network);
the similarity between the feature map output by each path in each stage and the feature map output by the trained deep convolutional neural network is judged by the staged objective function:
L_s = ||W_ss - W_ts||_2 + λ·D_s
where L_s is the objective function of the current stage (the smaller L_s, the more similar the two feature maps), W_ss is the feature map output by the student model at the current stage, W_ts is the feature map output by the teacher model at the current stage, D_s is the parameter count of the selected path, and λ is a coefficient balancing the parameter-count term against the L2 norm in the loss;
in this step, according to the similarity of the image feature maps output by the teacher and student models, the path closest to the teacher model is selected to produce the output feature map of the current stage, so that the image recognition results output by the student model come closer to those of the teacher model, improving the student model's image recognition accuracy; a sketch of this staged objective follows.
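A hedged PyTorch sketch of the staged objective above; the helper name and the default value of λ are illustrative assumptions, and it presumes the student's and teacher's stage outputs share a shape (per the output consistency constraint), otherwise a projection would be needed.

```python
import torch
import torch.nn as nn

def staged_objective(student_feat: torch.Tensor,
                     teacher_feat: torch.Tensor,
                     path: nn.Module,
                     lam: float = 1e-7) -> torch.Tensor:
    """L_s = ||W_ss - W_ts||_2 + lambda * D_s for one candidate path."""
    d_s = float(sum(p.numel() for p in path.parameters()))     # D_s: parameter count
    l2 = torch.linalg.vector_norm(student_feat - teacher_feat) # ||W_ss - W_ts||_2
    return l2 + lam * d_s
```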
Step 3.2, train the path with the highest similarity in each stage as selected in step 3.1, and record each path's selection count; over multiple rounds of iterative updates, delete the least-selected paths one by one according to their selection counts. The network formed by the last remaining path in each stage is the final student model;
during training, after every 20 rounds of iterative updates, unimportant paths are deleted one at a time according to the selection counts of the different paths, in the order stage 1, stage 2, stage 3.
Step 3.3, train the final student model with the global objective function to obtain the target classification network:
the global objective function adopts the traditional soft-target + hard-target form, as follows:
L = Σ_{j=1}^{m} [ α·D(y_j, softmax(z_j^s)) + (1 - α)·D(softmax(z_j^t / T), softmax(z_j^s / T)) ]
This simultaneously considers the difference between the network output and the true labels of the data (left term) and the difference between the final outputs of the teacher and student networks (right term), where D(·) denotes cross entropy, z_j^s and z_j^t denote the logits (i.e., the input to the last softmax layer of the convolutional neural network) produced by the student model and the teacher model on the j-th input, y_j is the true label of the j-th input, α is a parameter trading off the importance of the "soft target" and the "hard target", T is the softening temperature of the logits, j ∈ [1, m] indexes the network's input data, and m is the total number of inputs.
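A minimal PyTorch sketch of this soft-target + hard-target loss, under stated assumptions: the default values of α and T are illustrative, and KL divergence stands in for the cross entropy D(·) on the soft term (the two differ only by the teacher's entropy, which is constant with respect to the student's parameters).

```python
import torch
import torch.nn.functional as F

def global_kd_loss(z_s: torch.Tensor, z_t: torch.Tensor, y: torch.Tensor,
                   alpha: float = 0.5, T: float = 4.0) -> torch.Tensor:
    """alpha * D(y, softmax(z_s)) + (1 - alpha) * D(softmax(z_t/T), softmax(z_s/T))."""
    # left (hard) term: cross entropy between student output and true labels
    hard = F.cross_entropy(z_s, y)
    # right (soft) term: softened teacher vs. student outputs; KL divergence is
    # used in place of D(.) here. (Hinton's formulation also scales this term
    # by T^2 to balance gradient magnitudes; that scaling is omitted here.)
    soft = F.kl_div(F.log_softmax(z_s / T, dim=1),
                    F.softmax(z_t / T, dim=1),
                    reduction="batchmean")
    return alpha * hard + (1.0 - alpha) * soft

z_s, z_t = torch.randn(8, 10), torch.randn(8, 10)  # student / teacher logits
y = torch.randint(0, 10, (8,))                     # true labels
loss = global_kd_loss(z_s, z_t, y)
```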
The sub-model space is finally fine-tuned according to the global loss function and trained for 30 rounds to obtain the target classification network.
In this embodiment, within one iteration, the same input at each stage passes through that stage's n paths to obtain n different feature maps. According to the staged objective function L_s, the path whose output feature map is closest to the teacher model's feature map at that stage is selected for training, and the selected path's selection count is incremented by one; path selection in the remaining stages proceeds by the same scheme. In each iteration only the selected paths have the opportunity to be trained, i.e., only the network structure whose output feature maps are closest to the teacher model's is trained. During training, redundant paths are gradually deleted according to differences in selection counts, until each stage retains only the single path closest to the teacher model. F(·) in FIG. 2 denotes the staged objective function. A sketch of this selection-and-pruning loop follows.
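The per-iteration selection and count-based pruning can be sketched as below, building on the MultiPathStage and staged_objective helpers sketched earlier; the optimizer step, the teacher's real stage outputs, and the data loop are elided (the teacher feature here is a random stand-in), and the 20-iteration pruning interval mirrors the schedule of this embodiment.

```python
import torch

def select_path(stage, x, teacher_feat, alive, counts, lam=1e-7):
    """Pick the surviving path whose feature map is closest to the teacher's."""
    best = None
    for i in alive:  # only surviving paths compete
        feat = stage(x, i)
        loss = staged_objective(feat, teacher_feat, stage.paths[i], lam)
        if best is None or loss.item() < best[2].item():
            best = (i, feat, loss)
    counts[best[0]] += 1  # record the selection; only this path gets trained
    return best           # (index, feature map, staged loss L_s)

def prune_least_selected(alive, counts):
    """Delete the surviving path with the smallest selection count."""
    if len(alive) > 1:
        alive.remove(min(alive, key=counts.__getitem__))

alive, counts = [0, 1, 2], {0: 0, 1: 0, 2: 0}
teacher_feat = torch.randn(1, 64, 8, 8)  # stand-in for the teacher's stage output
for it in range(1, 61):
    idx, feat, loss = select_path(stage, x, teacher_feat, alive, counts)
    # ... backpropagate `loss` through the selected path only ...
    if it % 20 == 0:  # every 20 iterations, drop the least-used path
        prune_least_selected(alive, counts)
print(alive)          # a single surviving path remains for this stage
```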

Claims (7)

1. An image recognition knowledge distillation method for automatically selecting a student model structure, characterized by comprising the following specific process: acquiring a picture data set to be predicted, and inputting the picture data set to be predicted into a target classification network to obtain the category of the picture data to be predicted;
the target classification network is obtained by:
step one, acquiring a prediction picture training set, training a deep convolutional neural network model with the prediction picture training set, and taking the trained deep convolutional neural network model as the teacher model;
step two, establishing a sub-model space containing a plurality of selectable paths for the student model:
in each stage of the deep convolutional neural network, presetting "paths" with different depths, convolution channel forms, and convolution channel numbers, wherein all models that can be formed from these paths constitute the sub-model space;
and step three, automatically selecting among the paths of the student model established in step two according to the teacher model obtained in step one, the global objective function, and the staged objective function, and taking the selected student model as the target classification network.
2. The image recognition knowledge distillation method for automatically selecting the student model structure according to claim 1, wherein the deep convolutional neural network of step one comprises: ResNet56, VGG.
3. The image recognition knowledge distillation method for automatically selecting the student model structure according to claim 2, wherein each path in the sub-model space containing the plurality of selectable paths in step two satisfies the output consistency principle.
4. The image recognition knowledge distillation method for automatically selecting the student model structure according to claim 3, wherein in step three, automatically selecting from the sub-model space containing the selectable paths established in step two according to the trained convolutional neural network model obtained in step one, the global objective function, and the staged objective function to obtain the target classification network comprises the following steps:
step 3.1, iteratively training the student model, obtaining in each training iteration the feature map output by each path of each stage in the sub-model space, and computing the similarity between each such feature map and the feature map output by the trained deep convolutional neural network;
step 3.2, training the path corresponding to the maximum similarity in each stage as selected in step 3.1, recording each path's selection count, and, over multiple rounds of iterative updates, deleting the least-selected paths in turn according to their selection counts, wherein the network formed by the last remaining path in each stage is the final student model;
and step 3.3, training the final student model with the global objective function to obtain the target classification network.
5. The image recognition knowledge distillation method for automatically selecting the student model structure according to claim 4, wherein the similarity between the feature map output by each path of each stage in the sub-model space in each training iteration and the feature map output by the trained deep convolutional neural network is judged by the staged objective function:
L_s = ||W_ss - W_ts||_2 + λ·D_s
where L_s is the objective function of the current stage (the smaller L_s, the more similar the two feature maps), W_ss is the feature map output by the student model at the current stage, W_ts is the feature map output by the teacher model at the current stage, D_s is the parameter count of the selected path, and λ is a coefficient balancing the parameter-count term against the L2 norm in the loss.
6. The image recognition knowledge distillation method for automatically selecting the student model structure according to claim 5, wherein the global objective function adopts the soft-target + hard-target form.
7. The image recognition knowledge distillation method for automatically selecting the student model structure according to claim 6, wherein deleting the least-selected path in turn according to the paths' selection counts over multiple rounds of iterative updates specifically comprises: during the iterative training of the sub-model space, after every preset number of iterative updates, deleting the path with the smallest selection count, one at a time, in stage order.
Priority Applications (1)

Application number: CN202210679569.XA; Priority date: 2022-06-15; Filing date: 2022-06-15; Title: Image recognition knowledge distillation method for automatically selecting student model structure; Status: Active; Granted publication: CN115035341B (en)

Publications (2)

Publication number: CN115035341A, published 2022-09-09
Publication number: CN115035341B, published 2024-09-06

Family ID: 83125423

Patent Citations (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN111199242A * | 2019-12-18 | 2020-05-26 | Zhejiang University of Technology | Image increment learning method based on dynamic correction vector
WO2022057078A1 * | 2020-09-21 | 2022-03-24 | Shenzhen University | Real-time colonoscopy image segmentation method and device based on ensemble and knowledge distillation
CN112116030A * | 2020-10-13 | 2020-12-22 | Zhejiang University | Image classification method based on vector standardization and knowledge distillation
US20220129731A1 * | 2021-05-27 | 2022-04-28 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for training image recognition model, and method and apparatus for recognizing image

Cited By (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN117197590A * | 2023-11-06 | 2023-12-08 | 山东智洋上水信息技术有限公司 | Image classification method and device based on neural architecture search and knowledge distillation
CN117197590B * | 2023-11-06 | 2024-02-27 | 山东智洋上水信息技术有限公司 | Image classification method and device based on neural architecture search and knowledge distillation
CN117372785A * | 2023-12-04 | 2024-01-09 | Jilin University | Image classification method based on feature cluster center compression
CN117372785B * | 2023-12-04 | 2024-03-26 | Jilin University | Image classification method based on feature cluster center compression



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant