CN115035341A - Image recognition knowledge distillation method capable of automatically selecting student model structure - Google Patents
- Publication number
- CN115035341A (application CN202210679569.XA)
- Authority
- CN
- China
- Prior art keywords
- model
- path
- image recognition
- student
- knowledge distillation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/08—Computing arrangements based on biological models; neural networks; learning methods
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
An image recognition knowledge distillation method that automatically selects the student model structure, relating to the field of knowledge distillation. The invention aims to solve the problem of low image recognition accuracy caused by the fixed, complex, and inflexible student model structures used in existing image recognition knowledge distillation methods. The method comprises: inputting a picture data set to be predicted into a classification network to obtain the picture categories. The classification network is obtained as follows: train a deep convolutional neural network on a picture training set to obtain a trained deep convolutional neural network; establish a sub-model space containing a plurality of selectable paths by presetting, in each stage of the deep convolutional neural network, paths with different depths, convolution channel forms, and numbers of convolution channels; and automatically select from the sub-model space according to the trained convolutional neural network, a global objective function, and staged objective functions to obtain the classification network. The method is used for compressing deep learning models.
Description
Technical Field
The invention relates to the field of knowledge distillation, and in particular to an image recognition knowledge distillation method that automatically selects the student model structure.
Background
With the development of science and technology, deep convolutional neural networks have achieved success in image classification, object detection, semantic segmentation, and other fields, but they have large numbers of parameters and large model sizes, so image classification with a convolutional neural network incurs high latency. Knowledge distillation methods therefore emerged; knowledge distillation is a deep learning model compression technique derived from transfer learning. The basic idea is to use a powerful, fully trained, complex teacher model to extract relevant knowledge during training; transferring this information-rich knowledge to a simpler student model can effectively improve the student model's accuracy. From the teacher model's perspective, knowledge distillation yields a simple model whose prediction accuracy approaches that of the original complex model, achieving the goal of model compression. Because it is simple to train and achieves good recognition results, knowledge distillation is a research focus in the field.
As knowledge distillation technology has developed, researchers have continually proposed improvements from many angles to enhance the effect of image recognition knowledge distillation. Hinton proposed the "soft target": the input to the final softmax layer of the teacher model is taken as the soft target and combined with the teacher model's output class labels (the "hard target") as the training target of the student model. Researchers then turned to the content of the transferred knowledge, including: intermediate-layer features, where shallow features capture texture details and deep features capture abstract semantics; task-related knowledge, such as classification probability distributions and, for object detection, instance semantics and location regression information; and feature-representation knowledge, which emphasizes transferring representational capacity and is relatively general and task-independent. Others have focused on the form in which knowledge is transferred, such as attention maps and the flow of solution procedure (FSP) matrix, or have started from the teacher's side, using an ensemble of multiple models as the teacher model. However, most of these methods consider aspects of the distillation learning process, such as where knowledge is extracted and how it is represented, and do not consider the structure of the student model. As a result, current student model structures are simplified versions of the teacher model, i.e., structured convolutional neural networks.
However, a structured convolutional neural network fixes hyper-parameters such as the convolution kernel size, the number of kernels, and the number of layers, and uses the same kernel counts and kernel sizes across layers. As a result, the student model obtained after image recognition knowledge distillation has a fixed, complex structure with poor flexibility, cannot achieve a smaller parameter count and faster inference while maintaining accuracy, and therefore suffers from low image recognition precision.
Disclosure of Invention
The invention aims to solve the problem that, because the student model used in existing image recognition knowledge distillation methods has a fixed, complex, and inflexible structure, the student model obtained after distillation cannot achieve a smaller parameter count and faster inference while maintaining accuracy, resulting in low image recognition precision.
The image recognition knowledge distillation method with automatic selection of the student model structure comprises the following specific process:
acquire a picture data set to be predicted, and input it into the target classification network to obtain the categories of the pictures to be predicted;
the target classification network is obtained as follows:
step one, acquire a training set of pictures, train a deep convolutional neural network model on the training set, and take the trained deep convolutional neural network model as the teacher model;
step two, establish a sub-model space containing a plurality of selectable paths for the student model:
in each stage of the deep convolutional neural network, preset paths with different depths, convolution channel forms, and numbers of convolution channels;
and step three, automatically select among the student model paths established in step two according to the teacher model obtained in step one, the preset global objective function, and the staged objective function; the trained student model is the target classification network.
The invention has the beneficial effects that:
according to the image recognition knowledge distillation method provided by the invention, a plurality of paths are arranged in different stages of the student model, and selection is carried out according to the similarity between the image characteristic maps output by the student model and the image characteristic maps output by the teacher model, so that the network structure of the sub-model is more flexible, and the output image characteristic maps of each part are closer to the teacher model, thereby improving the accuracy of image recognition; meanwhile, the method can avoid the determination of hyper-parameters, ensures the accuracy of the model, simultaneously reduces the parameter quantity of the model, and has higher reasoning speed.
Drawings
FIG. 1 is a schematic diagram of different information transfer channel paths in the same stage;
FIG. 2 is a technical flow chart of a knowledge distillation method for automatically selecting a student model structure.
Detailed Description
The first embodiment is as follows: the image recognition knowledge distillation method with automatic selection of the student model structure comprises the following specific process:
acquire a picture data set to be predicted, and input it into the target classification network to obtain the categories of the pictures to be predicted;
the target classification network is obtained as follows:
step one, acquire a training set of pictures, train a deep convolutional neural network model on the training set, and take the trained deep convolutional neural network model (e.g., Resnet56 or VGG) as the teacher model;
Step two, establish a sub-model space with a plurality of selectable paths for each stage of the student model, i.e., preset the student model:
in each stage of the deep convolutional neural network, preset candidate "paths" with different depths, convolution channel forms, and numbers of convolution channels for later selection;
the preset paths satisfy an output-consistency constraint, i.e., the feature maps output by all paths in a given stage have the same dimensions, as shown in FIG. 1;
the sub-model space consists of all models that can be formed from the candidate paths of every stage of the deep neural network;
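As a minimal sketch (not part of the patent text), the sub-model space of step two could be represented as one list of candidate paths per stage, all constrained to the same output dimensions. The `PathSpec` name and the concrete depth and channel-form values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PathSpec:
    depth: int            # number of convolutional layers in the path
    channels: int         # number of convolution channels
    channel_form: str     # e.g. "standard" vs. "depthwise" (assumed forms)
    out_shape: tuple      # (C, H, W) of the feature map the path emits

def build_submodel_space(stage_out_shapes):
    """Enumerate, per stage, candidate paths whose output shape equals the
    stage's required shape -- the output-consistency constraint of FIG. 1."""
    space = []
    for shape in stage_out_shapes:
        candidates = [
            PathSpec(depth=d, channels=shape[0], channel_form=form, out_shape=shape)
            for d in (1, 2, 3)
            for form in ("standard", "depthwise")
        ]
        space.append(candidates)
    return space

# three stages with progressively smaller spatial size, as in a typical CNN
space = build_submodel_space([(64, 32, 32), (128, 16, 16), (256, 8, 8)])
```

Any full student model is then one choice of path per stage, so this space encodes 6 × 6 × 6 = 216 candidate structures under these assumed settings.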
step three, automatically select among the student model paths according to the teacher model obtained in step one, the preset global objective function, and the staged objective function; the selected student model is the target classification network, as shown in FIG. 2:
Step one, perform iterative training on the student model: in each iteration, obtain the feature map output by every path of every stage in the sub-model space, and compute the similarity between each such feature map and the feature map output by the teacher model (the trained deep convolutional neural network).
The similarity between the feature map output by each path of each stage and the feature map output by the trained deep convolutional neural network is judged by the staged objective function:
L_s = ||W_ss - W_ts||_2 + λ·D_s
where L_s is the objective function of the current stage (the smaller L_s is, the more similar the two feature maps are), W_ss is the feature map output by the student model at the current stage, W_ts is the feature map output by the teacher model at the current stage, D_s is the parameter count of the selected path, and λ is a coefficient that balances the parameter-count term against the L2 norm in the loss;
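A hedged sketch of evaluating the staged objective above, with feature maps flattened to plain Python lists; the default λ value and the way the parameter count D_s is supplied are assumptions for illustration.

```python
import math

def staged_objective(student_fmap, teacher_fmap, path_param_count, lam=1e-6):
    """L_s = ||W_ss - W_ts||_2 + lam * D_s; a smaller L_s means the path's
    output is closer to the teacher's feature map."""
    l2 = math.sqrt(sum((s - t) ** 2 for s, t in zip(student_fmap, teacher_fmap)))
    return l2 + lam * path_param_count

# a path matching the teacher closely scores lower than a distant one
close = staged_objective([1.0, 2.0, 3.0], [1.0, 2.1, 3.0], path_param_count=1000)
far   = staged_objective([1.0, 2.0, 3.0], [3.0, 5.0, 0.0], path_param_count=1000)
```

The λ·D_s term lets a slightly less similar but much smaller path win the comparison, which is how the objective trades similarity to the teacher against parameter count.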
In this step, the path whose feature map is closest to the teacher model's is selected as the output of the current stage, according to the similarity between the image feature maps output by the teacher and student models. This brings the image recognition result of the student model closer to that of the teacher model, improving the student model's image recognition accuracy.
Step two, train the path with the greatest similarity in each stage as selected in step one, and record how many times each path is selected. Over multiple rounds of iterative updates, delete the least-selected paths in turn according to their selection counts; the network formed by the last remaining path in each stage is the final student model.
During training, after every 20 iterative updates, unimportant paths are deleted one by one, in the order stage 1, stage 2, stage 3, according to the selection counts of the different paths.
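The every-20-iterations pruning step might be sketched as follows; the per-stage selection-count bookkeeping and the one-deletion-per-sweep policy are assumptions drawn from the description above, not the patent's exact procedure.

```python
def prune_least_selected(selection_counts, stage_order):
    """selection_counts: one dict per stage mapping path id -> times selected.
    Deletes the least-selected path from the first stage in stage_order that
    still has more than one candidate; returns (stage, path_id) or None."""
    for stage in stage_order:
        counts = selection_counts[stage]
        if len(counts) > 1:
            victim = min(counts, key=counts.get)  # path chosen fewest times
            del counts[victim]
            return (stage, victim)
    return None  # every stage is already down to its final path

counts = [{"a": 5, "b": 1}, {"c": 3, "d": 9}]
pruned = prune_least_selected(counts, stage_order=[0, 1])
```

In a full training loop this function would be called once every 20 iterations, so the stages are thinned gradually rather than all at once, matching the stage 1, stage 2, stage 3 ordering above.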
Step three, training a final student model by utilizing a global objective function to obtain a target classification network:
the global target function adopts a traditional soft target + hard target mode (soft target + hard target), as follows:
simultaneously considering the difference (left term) between the network output and the real label of the data and the difference (right term) between the teacher network and the student network final output, wherein D (-) represents the cross entropy,andrespectively representing the logits output (namely the input of the last softmax layer of the convolutional neural network) generated by the student model and the teacher model on the jth input, y j The true tag representing the jth input, α is a parameter that trades off the importance of "soft target" and "hard target", T is the softening parameter of logits, j ∈ [1, m]Is a netThe label of the input data is rounded and m is the total number of input data.
Finally, fine-tune the selected student model according to the global loss function for 30 rounds to obtain the target classification network.
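The "soft target + hard target" global objective used in this final fine-tuning step can be sketched in the usual Hinton-style form. Since the patent's formula image is not reproduced in the text, the exact α placement and the temperature value here are assumptions for illustration.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy D(p, q) between two discrete distributions."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def kd_loss(student_logits, teacher_logits, onehot_label, alpha=0.5, T=4.0):
    """Hard term: student output vs. true label.  Soft term: student vs.
    teacher outputs, both softened with temperature T."""
    hard = cross_entropy(onehot_label, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, T), softmax(student_logits, T))
    return (1 - alpha) * hard + alpha * soft

loss = kd_loss([2.0, 0.5, -1.0], [1.8, 0.6, -0.9], [1, 0, 0])
```

Raising T flattens both distributions, so the soft term transfers the teacher's relative class preferences ("dark knowledge") rather than only its top prediction.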
In this embodiment, within one iteration, the same input to each stage passes through that stage's n paths, yielding n different feature maps. According to the staged objective function L_s, the path whose output feature map is closest to the teacher model's output at that stage is selected for training, and its selection count is incremented by one; path selection in the remaining stages proceeds in the same way. In each iteration, only the selected paths have the opportunity to be trained, i.e., only the network structure whose feature maps are closest to the teacher model's is updated. During training, redundant paths are gradually deleted according to differences in selection counts, so that finally only the single path closest to the teacher model is retained in each stage. F(·) in FIG. 2 denotes the staged objective function.
Claims (7)
1. An image recognition knowledge distillation method for automatically selecting a student model structure, characterized by comprising the following specific process: acquiring a picture data set to be predicted, and inputting it into a target classification network to obtain the categories of the pictures to be predicted;
step one, acquiring a training set of pictures, training a deep convolutional neural network model on the training set, and taking the trained deep convolutional neural network model as the teacher model;
step two, establishing a sub-model space containing a plurality of selectable paths for the student model:
in each stage of the deep convolutional neural network, presetting "paths" with different depths, convolution channel forms, and numbers of convolution channels, wherein all models that can be formed from these paths constitute the sub-model space;
and step three, automatically selecting among the student model paths established in step two according to the teacher model obtained in step one, the global objective function, and the staged objective function, the selected student model being the target classification network.
2. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 1, wherein: the deep convolutional neural network of the step one comprises: resnet56, VGG.
3. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 2, wherein: and each path in the submodel space containing the plurality of selectable paths in the step two meets the output consistency principle.
4. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 3, wherein in step three, the automatic selection over the sub-model space containing the selectable paths established in step two, according to the trained convolutional neural network model obtained in step one, the global objective function, and the staged objective function, to obtain the target classification network comprises the following steps:
performing iterative training on the student model, acquiring a feature map output by each path in each stage in a sub-model space in each iterative training process, and calculating the similarity between the feature map output by each path in each stage and the feature map output by the trained deep convolutional neural network;
step two, training the path with the greatest similarity in each stage as selected in step one, recording how many times each path is selected, and, over multiple rounds of iterative updates, deleting the least-selected paths in turn according to their selection counts, the network formed by the last remaining path in each stage being the final student model;
and step three, training a final student model by utilizing a global objective function to obtain a target classification network.
5. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 4, wherein the similarity between the feature map output by each path of each stage in the sub-model space during each iteration of training and the feature map output by the trained deep convolutional neural network is judged by the staged objective function:
L_s = ||W_ss - W_ts||_2 + λ·D_s
where L_s is the objective function of the current stage (the smaller L_s is, the more similar the two feature maps are), W_ss is the feature map output by the student model at the current stage, W_ts is the feature map output by the teacher model at the current stage, D_s is the parameter count of the selected path, and λ is a coefficient that balances the parameter-count term against the L2 norm in the loss.
6. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 5, wherein: the global objective function adopts a soft objective + hard objective mode.
7. The image recognition knowledge distillation method for automatically selecting the structure of a student model according to claim 6, wherein deleting the least-selected paths in turn according to their selection counts over multiple rounds of iterative updates specifically comprises: during iterative training over the sub-model space, after every preset number of iterative updates, deleting the path with the smallest selection count, one path at a time, following the order of the stages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210679569.XA CN115035341B (en) | 2022-06-15 | 2022-06-15 | Image recognition knowledge distillation method for automatically selecting student model structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115035341A true CN115035341A (en) | 2022-09-09 |
CN115035341B CN115035341B (en) | 2024-09-06 |
Family
ID=83125423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210679569.XA Active CN115035341B (en) | 2022-06-15 | 2022-06-15 | Image recognition knowledge distillation method for automatically selecting student model structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115035341B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199242A (en) * | 2019-12-18 | 2020-05-26 | 浙江工业大学 | Image increment learning method based on dynamic correction vector |
CN112116030A (en) * | 2020-10-13 | 2020-12-22 | 浙江大学 | Image classification method based on vector standardization and knowledge distillation |
WO2022057078A1 (en) * | 2020-09-21 | 2022-03-24 | 深圳大学 | Real-time colonoscopy image segmentation method and device based on ensemble and knowledge distillation |
US20220129731A1 (en) * | 2021-05-27 | 2022-04-28 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for training image recognition model, and method and apparatus for recognizing image |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117197590A (en) * | 2023-11-06 | 2023-12-08 | 山东智洋上水信息技术有限公司 | Image classification method and device based on neural architecture search and knowledge distillation |
CN117197590B (en) * | 2023-11-06 | 2024-02-27 | 山东智洋上水信息技术有限公司 | Image classification method and device based on neural architecture search and knowledge distillation |
CN117372785A (en) * | 2023-12-04 | 2024-01-09 | 吉林大学 | Image classification method based on feature cluster center compression |
CN117372785B (en) * | 2023-12-04 | 2024-03-26 | 吉林大学 | Image classification method based on feature cluster center compression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||