CN113822339A - Natural image classification method combining self-knowledge distillation and unsupervised method - Google Patents
Natural image classification method combining self-knowledge distillation and unsupervised method
- Publication number
- CN113822339A (application CN202110992616.1A / CN202110992616A)
- Authority
- CN
- China
- Prior art keywords
- model
- unsupervised
- self
- loss
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 238000013140 knowledge distillation Methods 0.000 title claims abstract description 22
- 238000000605 extraction Methods 0.000 claims abstract description 5
- 238000012549 training Methods 0.000 claims description 18
- 230000006870 function Effects 0.000 claims description 10
- 230000008569 process Effects 0.000 claims description 8
- 230000002238 attenuated effect Effects 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 3
- 238000011478 gradient descent method Methods 0.000 claims description 2
- 238000012544 monitoring process Methods 0.000 claims description 2
- 238000010998 test method Methods 0.000 claims description 2
- 238000012360 testing method Methods 0.000 claims description 2
- 238000004821 distillation Methods 0.000 description 8
- 230000006835 compression Effects 0.000 description 4
- 238000007906 compression Methods 0.000 description 4
- 238000013459 approach Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a natural image classification method that combines self-knowledge distillation with an unsupervised method. Unsupervised learning aims to discover the intrinsic characteristics of the data, so that similar samples remain similar after feature extraction. Introducing this unsupervised mechanism into the existing self-knowledge distillation method strengthens the feature extraction capacity of each branch and improves the classification accuracy of the model. When designing the branch structure, grouped convolution is adopted to further reduce the number of parameters and improve the model inference speed.
Description
Technical Field
The invention relates to the fields of neural network model compression, unsupervised learning, and image classification, and more particularly to a natural image classification method that combines an unsupervised autoencoder with self-knowledge distillation.
Background
In deep neural networks with huge numbers of parameters, not all parameters contribute to the model: some have limited effect, express redundancy, and can even degrade performance, while the sheer parameter count makes training and deployment costly. Model compression aims to obtain a small network that uses fewer parameters and fewer resources than a large network while retaining good accuracy.
The advent of convolutional neural networks has greatly improved performance on computer vision and natural language processing tasks such as image classification, object detection, and text classification. Deep networks usually outperform shallow ones and capture features well, but they also bring problems: model parameters become more complex, and large amounts of computation and memory are required. If a deep neural network with tens of millions of parameters is deployed in full on resource-limited devices such as mobile devices, the corresponding inference tasks cannot run normally on those devices.
Knowledge distillation is a common model compression method that migrates the knowledge of a complex model, or of several models, into a lightweight model, reducing model size while minimizing the loss in performance. Existing deep learning distillation methods can be divided into those based on the final output and those based on intermediate feature layers. The traditional idea of knowledge distillation is to transfer the knowledge of a teacher to a student and thereby improve the student's ability.
Among existing model compression techniques, self-knowledge distillation is an improvement on conventional distillation. Conventional distillation requires introducing a large teacher network for supervision; this teacher network consumes a large amount of memory when loaded and needs a GPU for forward inference. Self-knowledge distillation introduces no additional teacher structure: the network serves as its own teacher, using its deep layers to train its shallow layers. The present method combines an unsupervised method with knowledge distillation. Unsupervised learning aims to discover the intrinsic characteristics of the data, so that similar samples remain similar after feature extraction. Introducing this mechanism into existing self-knowledge distillation methods improves the similarity of the features produced by each branch and thereby improves the classification accuracy of the model. When designing the branch structure, grouped convolution is adopted to further reduce the number of parameters and improve the model inference speed.
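As a minimal illustration of why grouped convolution saves parameters (the channel counts, kernel size, and group number below are illustrative assumptions, not the configuration claimed by the invention), compare a standard and a grouped convolution layer in PyTorch:

```python
import torch.nn as nn

in_ch, out_ch, k = 256, 256, 3

# Standard convolution: out_ch * in_ch * k * k weights.
standard = nn.Conv2d(in_ch, out_ch, k, padding=1, bias=False)

# Grouped convolution with 4 groups: each group connects only in_ch/4 input
# channels to out_ch/4 output channels, cutting the weights by the group count.
grouped = nn.Conv2d(in_ch, out_ch, k, padding=1, groups=4, bias=False)

print(sum(p.numel() for p in standard.parameters()))  # 589824
print(sum(p.numel() for p in grouped.parameters()))   # 147456 (4x fewer)
```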
Disclosure of Invention
Existing deep learning models have large numbers of parameters and are inflexible to deploy and use. To solve this problem, the invention combines self-knowledge distillation with an unsupervised autoencoder, improving the accuracy of each branch by strengthening its feature extraction capability and increasing the similarity of features within the same class. When the method is deployed, unnecessary parts can be cut off to reduce the parameter count, or, when the parameter count is not a concern, multiple branches can be combined to further improve model accuracy.
A natural image classification method combining self-knowledge distillation and an unsupervised method mainly comprises the following steps (an illustrative structural sketch follows the steps):
S1 data processing procedure
S1.1 Preprocess the data set to be trained using simple data augmentation.
S1.2 Randomly shuffle the data set, divide it into batches, and feed the batches into the designed network.
S2 training procedure
S2.1 Input the preprocessed data into the designed network model to obtain, for each branch structure, the feature value before the fully connected layer.
S2.2 Input the feature value into the designed decoder; the decoder outputs a feature with the same size as the input data, in essence restoring the original picture, and the MSE loss is computed. Each encoder-decoder structure is assigned an MSE loss with its own weight.
S2.3 Feed the pre-fully-connected features from the previous step into the fully connected layer and obtain a prediction through softmax. Compute the cross-entropy loss between the prediction and the true label.
S2.4 Back-propagate the loss and repeat until the model converges.
S3 prediction process
S3.1 Remove the decoder part and keep only the trunk and branch structures.
S3.2 Input the pictures to be classified into the network for prediction.
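To make the structure concrete, the following PyTorch sketch shows one possible way to organize a backbone divided by depth, with a branch exit after every stage. The backbone layers, channel widths, and number of exits are illustrative assumptions rather than the exact network of the invention; the decoders attached to each exit are sketched later in the detailed description.

```python
import torch.nn as nn

class BranchHead(nn.Module):
    """Branch exit: grouped convolution to refine features, then a classifier."""
    def __init__(self, in_ch, num_classes, groups=4):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        return self.fc(self.refine(x))       # branch logits

class MultiExitNet(nn.Module):
    """Backbone divided by depth; every stage gets its own exit branch."""
    def __init__(self, num_classes=100, channels=(64, 128, 256)):
        super().__init__()
        self.stages, prev = nn.ModuleList(), 3
        for c in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, padding=1, bias=False),
                nn.BatchNorm2d(c), nn.ReLU(inplace=True), nn.MaxPool2d(2)))
            prev = c
        self.heads = nn.ModuleList(BranchHead(c, num_classes) for c in channels)

    def forward(self, x):
        feats, logits = [], []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            feats.append(x)            # spatial feature handed to this branch's decoder
            logits.append(head(x))     # prediction of this exit
        return feats, logits           # the deepest exit (last entry) acts as the teacher
```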
For the training procedure in step S2, the resulting loss function combines the cross-entropy, distillation, and reconstruction losses described below.
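One consistent written form of this combined loss (the per-branch summation and the exact placement of the balancing weights are editorial assumptions) is

$$
L_{\mathrm{total}} \;=\; \sum_{i}\Big[\,\mathrm{CrossEntropy}(p_i,\,y)\;+\;\alpha\,\mathrm{KL}\!\big(p_i^{\tau}\,\big\|\,p_{\mathrm{teacher}}^{\tau}\big)\;+\;\beta_i\,\mathrm{MSE}(\hat{x}_i,\,x)\Big]
$$

where $p_i^{\tau}$ denotes the temperature-softened output of branch $i$ and $\hat{x}_i$ is the picture reconstructed by the decoder of branch $i$.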
Here, α and β balance the respective losses; β stores the loss weight of each branch's decoding result.
CrossEntropy(p_i, y) is the cross-entropy loss, where p_i is the network's final prediction for a sample and the index i ranges over the branches. KL(·) is the knowledge distillation loss, the Kullback-Leibler divergence between two outputs softened by a temperature coefficient; through this loss, teacher information is passed to the small student network. MSE is the reconstruction loss between the decoded output and the original picture. β = [β1, β2, …, βi] stores a different decoding loss weight for each branch.
During training, the learning rate is decayed at different rounds to accelerate the convergence of the model, and L2 regularization is introduced.
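A minimal sketch of this optimization setup, assuming the MultiExitNet from the structural sketch above; the milestone rounds, decay factor, and weight-decay strength are illustrative assumptions:

```python
import torch

model = MultiExitNet(num_classes=100)   # multi-exit network sketched above

# SGD with momentum; weight_decay realizes the L2 regularization term.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# Attenuate the learning rate at fixed rounds to speed up convergence.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[75, 130, 180], gamma=0.1)

for round_idx in range(200):
    # ... one pass over all batches: forward, combined loss, backward, step ...
    scheduler.step()
```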
Drawings
Fig. 1 is a model structure diagram according to the present invention.
Fig. 2 is a block diagram of the unsupervised autoencoder according to the present invention.
Fig. 3 is a flow chart according to the present invention.
Fig. 4 is a graph of extracted feature similarity in accordance with the present invention.
Detailed Description
For the purpose of promoting a better understanding of the objects, features and advantages of the invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
S1 data part
This embodiment uses the Cifar100 dataset as the image classification training dataset. Cifar100 contains 60,000 pictures, of which 50,000 form the training set and 10,000 form the test set, covering 100 categories in total.
S1.1 Data augmentation is carried out using simple random cropping and horizontal flipping.
S1.2 Normalize the data, shuffle it randomly, and divide it into batches.
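A sketch of this preprocessing pipeline with torchvision (the crop padding, normalization statistics, and batch size are commonly used values, assumed here for illustration):

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),             # simple random cropping
    T.RandomHorizontalFlip(),                # horizontal flipping
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409),    # per-channel normalization
                (0.2673, 0.2564, 0.2762)),
])

train_set = torchvision.datasets.CIFAR100(root="./data", train=True,
                                          download=True, transform=train_tf)
# shuffle=True randomly reorders the data before it is split into batches.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
```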
S2 training part
S2.1 The deepest branch of the depth-wise divided model serves as the teacher network, and its output supervises the other, shallower branches through knowledge distillation.
S2.2 Combine the depth-wise divided model and the designed decoders to construct the whole network.
S2.3 Input the preprocessed data into the designed network model to obtain, for each branch structure, the feature value before the fully connected layer.
S2.4 Input the feature value into the designed decoder, which outputs a feature with the same size as the input data, in essence restoring the original picture. Each encoder-decoder structure is assigned an MSE loss with its own weight.
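One possible shape for such a decoder (the upsampling depth and channel widths are illustrative assumptions; the only requirement imposed by the text is that the output matches the input picture size):

```python
import torch.nn as nn
import torch.nn.functional as F

class BranchDecoder(nn.Module):
    """Upsamples a branch feature map back to the 3-channel input resolution
    so the reconstruction can be compared with the original picture via MSE."""
    def __init__(self, in_ch, num_upsamples):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(num_upsamples):
            layers += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            ch //= 2
        layers.append(nn.Conv2d(ch, 3, 3, padding=1))   # back to an RGB picture
        self.net = nn.Sequential(*layers)

    def forward(self, feat):
        return self.net(feat)

def reconstruction_loss(decoded_list, images, beta):
    """Weighted MSE: beta[i] is the decoding loss weight of branch i."""
    return sum(b * F.mse_loss(d, images) for d, b in zip(decoded_list, beta))
```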
S2.5 Feed the pre-fully-connected features into the fully connected layer and obtain a prediction through softmax. Compute the cross-entropy loss between the prediction and the true label.
S2.6 A temperature hyper-parameter must be set during training; it softens the final predicted outputs of the teacher and student networks. The KL divergence computed between the softened distributions transfers the deep knowledge to the shallow branches.
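A sketch of this temperature-softened distillation loss in its standard form (the temperature value here is an illustrative assumption):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=3.0):
    """KL divergence between temperature-softened teacher and student outputs.
    The teacher is the deepest exit; detach() keeps gradients out of it."""
    soft_teacher = F.softmax(teacher_logits.detach() / tau, dim=1)
    log_student = F.log_softmax(student_logits / tau, dim=1)
    # Scaling by tau**2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (tau ** 2)
```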
S2.7 During training, the learning rate is decayed at different rounds to accelerate the convergence of the model, and L2 regularization is introduced.
In the total loss given above, α and β balance the respective losses, and β stores the different weights of the branches.
CrossEntropy(p_i, y) is the cross-entropy loss, where p_i represents the network's final prediction for a sample and the index i ranges over the branches. KL(·) is the knowledge distillation loss, the Kullback-Leibler divergence between two outputs; through this loss, deep teacher information is passed to the shallow, small student networks. MSE is the reconstruction loss between the decoded output and the original picture. β = [β1, β2, …, βi] stores a different decoding loss weight for each branch; autoencoders attached at different depths use different values of β.
After forward propagation, the outputs m (the decoder reconstructions) and p_i (the branch predictions) are obtained, and the loss value of the batch is computed according to the loss function above. The entire network is trained using stochastic gradient descent. A round is completed once every batch has been back-propagated.
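Putting the pieces together, one round of this procedure could look like the sketch below. It reuses MultiExitNet, BranchDecoder, and distillation_loss from the earlier sketches, and the values of alpha, beta, and tau are illustrative assumptions.

```python
import torch.nn.functional as F

def train_one_round(model, decoders, loader, optimizer,
                    alpha=0.5, beta=(0.1, 0.05, 0.01), tau=3.0):
    model.train()
    for images, labels in loader:
        feats, logits = model(images)      # m (decoder inputs) and p_i (branch predictions)
        teacher_logits = logits[-1]        # the deepest exit supervises the shallower ones

        loss = 0.0
        for i, (feat, p_i) in enumerate(zip(feats, logits)):
            loss = loss + F.cross_entropy(p_i, labels)             # CE with the true label
            if i < len(logits) - 1:                                 # shallow branches only
                loss = loss + alpha * distillation_loss(p_i, teacher_logits, tau)
            recon = decoders[i](feat)                               # decode back to picture size
            loss = loss + beta[i] * F.mse_loss(recon, images)       # weighted reconstruction MSE

        optimizer.zero_grad()
        loss.backward()       # back-propagate once per batch
        optimizer.step()      # stochastic gradient descent update
```

For the three exits of the network sketch above, the decoders could be instantiated as, for example, [BranchDecoder(64, 1), BranchDecoder(128, 2), BranchDecoder(256, 3)] so that each reconstruction matches the 32x32 input.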
S2.8 Repeat the training according to the procedure described above until the model finally converges.
S3 test procedure
S3.1 The designed model combines the unsupervised autoencoder with the original structure. When prediction is required, the autoencoder part can be removed and only the trunk and branch structures are loaded.
S3.2 Input the pictures to be classified into the model to obtain the classification result.
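A sketch of this prediction stage, reusing MultiExitNet from the earlier sketch (the checkpoint filename and the input batch are placeholders; using only the deepest exit is one choice, and several exits could also be averaged for higher accuracy):

```python
import torch

model = MultiExitNet(num_classes=100)           # trunk and branch heads only; no decoders
state = torch.load("trunk_and_branches.pt")     # placeholder checkpoint name
model.load_state_dict(state, strict=False)
model.eval()

images = torch.randn(8, 3, 32, 32)              # placeholder batch of pictures to classify
with torch.no_grad():
    _, logits = model(images)
    predicted_classes = logits[-1].argmax(dim=1)   # prediction of the deepest exit
```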
In summary, the invention provides a natural image classification method that combines an unsupervised method with knowledge distillation. Unsupervised learning aims to discover the intrinsic characteristics of the data, so that similar samples remain similar after feature extraction. Introducing this unsupervised mechanism into the existing self-knowledge distillation method strengthens the feature extraction capacity of each branch and improves the classification accuracy of the model. When designing the branch structure, grouped convolution is adopted to further reduce the number of parameters and improve the model inference speed.
The above description is only one embodiment of the present invention, and the scope of the invention is not limited to it; any modification or substitution that a person skilled in the art can readily conceive of falls within the scope of the invention. The scope of protection is therefore determined by the claims.
Claims (4)
1. A natural image classification method combining self-knowledge distillation and an unsupervised method, characterized in that the unsupervised extraction of image features proceeds as follows:
S1 data part
using the Cifar100 dataset as the image classification training dataset, where Cifar100 comprises 60,000 pictures, of which 50,000 form the training set and 10,000 form the test set, covering 100 categories in total;
S1.1 carrying out data augmentation using simple random cropping and horizontal flipping;
S1.2 normalizing the data, randomly shuffling it, and dividing it into batches;
S2 training part
S2.1 taking the deepest branch of the depth-wise divided model as the teacher network and supervising the other, shallower branches with its output through knowledge distillation;
S2.2 combining the depth-wise divided model and the designed decoders to construct the whole network;
S2.3 inputting the preprocessed data into the designed network model to obtain, for each branch structure, the feature value before the fully connected layer;
S2.4 inputting the feature value into the designed decoder, which finally outputs a feature with the same size as the input data, in essence restoring the original picture; each encoder-decoder structure is assigned an MSE loss with its own weight;
S2.5 feeding the pre-fully-connected features into the fully connected layer and obtaining a prediction through softmax; computing the cross-entropy loss between the prediction and the true label;
S2.6 setting a temperature hyper-parameter during training, which softens the final predicted outputs of the teacher and student networks; computing the KL divergence between the softened distributions so that the deep knowledge is transferred to the shallow branches;
S2.7 decaying the learning rate at different rounds during training to accelerate the convergence of the model, and introducing L2 regularization;
where α and β balance the respective losses, and β stores the different weights of the branches;
the cross-entropy loss is CrossEntropy(p_i, y), where p_i represents the network's final prediction for a sample and the index i ranges over the branches; KL(·) is the knowledge distillation loss, the Kullback-Leibler divergence between two outputs, through which deep teacher information is passed to the shallow, small student networks; MSE is the reconstruction loss between the decoded output and the original picture; β = [β1, β2, …, βi] stores a different decoding loss weight for each branch, and autoencoders attached at different depths use different values of β;
after forward propagation, the outputs m and p_i are obtained, and the loss value of the batch is computed according to the loss function; the whole network is trained using stochastic gradient descent; a round is finished once every batch has been back-propagated;
S2.8 repeating the training according to the process described above until the model finally converges;
S3 test procedure
S3.1 the designed model is formed by combining an unsupervised autoencoder with the original structure; when prediction is needed, the autoencoder part is removed and only the trunk and branch structures are loaded;
S3.2 inputting the pictures to be classified into the model to obtain the classification result.
2. The natural image classification method combining self-knowledge distillation and an unsupervised method according to claim 1, characterized in that the loss generated by the unsupervised autoencoder is added to the total loss.
3. The natural image classification method combining self-knowledge distillation and an unsupervised method according to claim 1, characterized in that the decoder is designed to match the feature outputs of the branch structures so that they can be connected together.
4. The natural image classification method combining self-knowledge distillation and an unsupervised method according to claim 1, characterized in that the multiple exit branch structures share a decoder, increasing the similarity between features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110992616.1A CN113822339A (en) | 2021-08-27 | 2021-08-27 | Natural image classification method combining self-knowledge distillation and unsupervised method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110992616.1A CN113822339A (en) | 2021-08-27 | 2021-08-27 | Natural image classification method combining self-knowledge distillation and unsupervised method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113822339A true CN113822339A (en) | 2021-12-21 |
Family
ID=78913667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110992616.1A Pending CN113822339A (en) | 2021-08-27 | 2021-08-27 | Natural image classification method combining self-knowledge distillation and unsupervised method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822339A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180053057A1 (en) * | 2016-08-18 | 2018-02-22 | Xerox Corporation | System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture |
CN110414368A (en) * | 2019-07-04 | 2019-11-05 | 华中科技大学 | A kind of unsupervised pedestrian recognition methods again of knowledge based distillation |
CN112116030A (en) * | 2020-10-13 | 2020-12-22 | 浙江大学 | Image classification method based on vector standardization and knowledge distillation |
CN112465111A (en) * | 2020-11-17 | 2021-03-09 | 大连理工大学 | Three-dimensional voxel image segmentation method based on knowledge distillation and countertraining |
CN112906747A (en) * | 2021-01-25 | 2021-06-04 | 北京工业大学 | Knowledge distillation-based image classification method |
CN112949786A (en) * | 2021-05-17 | 2021-06-11 | 腾讯科技(深圳)有限公司 | Data classification identification method, device, equipment and readable storage medium |
Non-Patent Citations (2)
Title |
---|
王金甲; 杨倩; 崔琳; 纪绍男: "Weakly labeled semi-supervised sound event detection based on the mean teacher model", Journal of Fudan University (Natural Science Edition), no. 05, 15 October 2020 (2020-10-15) *
赵胜伟; 葛仕明; 叶奇挺; 罗朝; 李强: "Traffic sign classification based on enhanced supervised knowledge distillation", China Sciencepaper (中国科技论文), no. 20, 23 October 2017 (2017-10-23) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108875807B (en) | Image description method based on multiple attention and multiple scales | |
CN112101190B (en) | Remote sensing image classification method, storage medium and computing device | |
CN111145116B (en) | Sea surface rainy day image sample augmentation method based on generation of countermeasure network | |
CN113159051B (en) | Remote sensing image lightweight semantic segmentation method based on edge decoupling | |
CN108648188B (en) | No-reference image quality evaluation method based on generation countermeasure network | |
CN110751698B (en) | Text-to-image generation method based on hybrid network model | |
CN107506823B (en) | Construction method of hybrid neural network model for dialog generation | |
CN109859288B (en) | Image coloring method and device based on generation countermeasure network | |
CN109857871B (en) | User relationship discovery method based on social network mass contextual data | |
CN109242090B (en) | Video description and description consistency judgment method based on GAN network | |
CN113705811B (en) | Model training method, device, computer program product and equipment | |
WO2021042857A1 (en) | Processing method and processing apparatus for image segmentation model | |
CN110009700B (en) | Convolutional neural network visual depth estimation method based on RGB (red, green and blue) graph and gradient graph | |
CN112784929A (en) | Small sample image classification method and device based on double-element group expansion | |
CN113807222A (en) | Video question-answering method and system for end-to-end training based on sparse sampling | |
CN115170874A (en) | Self-distillation implementation method based on decoupling distillation loss | |
CN114912419A (en) | Unified machine reading understanding method based on reorganization confrontation | |
CN113743277A (en) | Method, system, equipment and storage medium for short video frequency classification | |
CN115829029A (en) | Channel attention-based self-distillation implementation method | |
CN113822339A (en) | Natural image classification method combining self-knowledge distillation and unsupervised method | |
CN115471576A (en) | Point cloud lossless compression method and device based on deep learning | |
CN113747480B (en) | Processing method and device for 5G slice faults and computing equipment | |
CN115660882A (en) | Method for predicting user-to-user relationship in social network and multi-head mixed aggregation graph convolutional network | |
CN114139674A (en) | Behavior cloning method, electronic device, storage medium, and program product | |
CN113518229B (en) | Method and device for training loop filter network, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |