CN113822339A - Natural image classification method combining self-knowledge distillation and unsupervised method - Google Patents

Natural image classification method combining self-knowledge distillation and unsupervised method

Info

Publication number
CN113822339A
Authority
CN
China
Prior art keywords
model
unsupervised
self
loss
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110992616.1A
Other languages
Chinese (zh)
Inventor
杨新武
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110992616.1A priority Critical patent/CN113822339A/en
Publication of CN113822339A publication Critical patent/CN113822339A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The invention discloses a natural image classification method that combines self-knowledge distillation with an unsupervised method. Unsupervised learning aims to discover the intrinsic characteristics of the data, so that similar samples obtain similar representations after feature extraction. Introducing this unsupervised component into an existing self-knowledge distillation method strengthens the feature extraction capacity of each branch and thereby improves the classification accuracy of the model. When designing the branch structure, grouped convolution is adopted to further reduce the number of parameters and to speed up model inference.

Description

Natural image classification method combining self-knowledge distillation and unsupervised method
Technical Field
The invention relates to the fields of neural network model compression, unsupervised learning, and image classification, and more particularly to a natural image classification method that combines an unsupervised autoencoder with self-knowledge distillation.
Background
In deep neural networks with huge numbers of parameters, not all parameters contribute to the model: some have limited effect, are redundant, and may even degrade performance. The large parameter count also makes deployment costly. Model compression aims to obtain a small network that uses fewer parameters and fewer resources than a large network while retaining good accuracy.
The advent of convolutional neural networks has greatly improved performance on computer vision and natural language processing tasks such as image classification, object detection, and text classification. Deep networks usually outperform shallow ones and capture features well, but they also bring problems: the number of parameters grows, and a large amount of computation and memory is required. If a deep neural network with tens of millions of parameters is deployed unchanged on resource-limited devices such as mobile devices, the devices lack the resources needed to run the corresponding inference tasks normally.
Knowledge distillation is a common model compression method that transfers the knowledge of a complex model, or of several models, into a lightweight model, reducing model size while minimizing the loss in performance. Existing distillation methods can be divided into those based on the final output and those based on intermediate feature layers. The traditional idea in knowledge distillation is to transfer the teacher's knowledge to the student and thereby improve the student's ability.
Among existing model compression techniques, self-knowledge distillation is an improvement on conventional knowledge distillation. Conventional distillation requires introducing a large teacher network for supervision; the teacher network consumes a large amount of memory when loaded and needs a GPU for forward inference. Self-knowledge distillation requires no additional teacher structure: the model serves as its own teacher, with the deep layers training the shallow layers. The present method combines an unsupervised method with knowledge distillation. Unsupervised learning aims to discover the intrinsic characteristics of the data, so that similar samples obtain similar representations after feature extraction. Introducing this into an existing self-knowledge distillation method improves the similarity of features across branches and thereby improves the classification accuracy of the model. When designing the branch structure, grouped convolution is adopted to further reduce the number of parameters and to improve the model inference speed.
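As a concrete illustration of why grouped convolution reduces the parameter count, the following PyTorch sketch compares a standard 3x3 convolution with a grouped one; the channel sizes and the group count of 4 are hypothetical values chosen for the example, not taken from the patent.

```python
import torch
import torch.nn as nn

# Standard 3x3 convolution, 256 -> 256 channels: 256*256*3*3 = 589,824 weights.
standard = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

# Grouped 3x3 convolution with 4 groups: each group maps 64 -> 64 channels,
# so the weight count drops to 4*(64*64*3*3) = 147,456, a 4x reduction.
grouped = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=4, bias=False)

x = torch.randn(1, 256, 32, 32)
print(standard(x).shape, grouped(x).shape)  # identical output shapes
print(sum(p.numel() for p in standard.parameters()),
      sum(p.numel() for p in grouped.parameters()))
```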
Disclosure of Invention
Existing deep learning models have large numbers of parameters and are inflexible to deploy and use. To address this problem, the invention combines self-knowledge distillation with an unsupervised autoencoder. The method improves the accuracy of each branch by strengthening its feature extraction capability and by increasing the similarity of features within the same class. At deployment time, unnecessary parts can be pruned to reduce the parameter count, or, when the parameter count is not a concern, multiple branches can be combined to further improve the model accuracy.
A natural image classification method combining self-knowledge distillation and unsupervised methods mainly comprises the following steps:
S1 data processing procedure
S1.1 Preprocess the training data set using simple data augmentation.
S1.2 Randomly shuffle the data set, divide it into batches, and feed the batches into the designed network.
S2 training procedure
S2.1 Input the preprocessed data into the designed network model and obtain, for each branch structure, the feature values before the fully connected layer.
S2.2 Input the feature values into the designed decoder, which outputs a feature map of the same size as the input data; in essence the decoder reconstructs the original picture, and the MSE loss is computed. Each encoder-decoder structure is assigned its own MSE loss weight.
S2.3 Feed the pre-fully-connected features from the previous step into the fully connected layer and obtain predictions through softmax. Compute the cross-entropy loss between the predictions and the ground-truth labels.
S2.4 Back-propagate the loss and repeat until the model converges.
S3 prediction process
S3.1 Remove the decoder part, keeping only the trunk and branch structures.
S3.2 Input the pictures to be classified into the network for prediction.
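To make steps S2.1-S2.3 of the above procedure concrete, the following sketch shows a minimal multi-exit (multi-branch) classifier in PyTorch in which every branch returns both its pre-fully-connected features and its logits. The class name BranchedClassifier, the three-stage backbone, the channel sizes, and the use of groups=4 in the branches are illustrative assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

class BranchedClassifier(nn.Module):
    """Backbone split by depth, with a lightweight exit (branch) after each stage.

    Each exit returns (features, logits): the features feed a decoder for the
    unsupervised reconstruction loss, and the logits feed the cross-entropy and
    distillation losses against the deepest exit, which acts as the teacher.
    """
    def __init__(self, num_classes=100, channels=(32, 64, 128)):
        super().__init__()
        in_ch = 3
        self.stages, self.branches, self.heads = (
            nn.ModuleList(), nn.ModuleList(), nn.ModuleList())
        for out_ch in channels:
            # Backbone stage: plain conv that halves the spatial resolution.
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
            # Branch: grouped convolution keeps the exit's parameter count low.
            self.branches.append(nn.Sequential(
                nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=4),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
            self.heads.append(nn.Linear(out_ch, num_classes))
            in_ch = out_ch

    def forward(self, x):
        feats, logits = [], []
        for stage, branch, head in zip(self.stages, self.branches, self.heads):
            x = stage(x)                            # shared trunk
            f = branch(x)                           # features before the FC layer
            feats.append(f)                         # fed to the decoder (MSE loss)
            logits.append(head(f.mean(dim=(2, 3)))) # branch prediction (pre-softmax)
        return feats, logits                        # logits[-1] is the teacher exit
```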
For the training process of step S2, the total loss function is:

$$L = \sum_{i}\Big[\mathrm{CrossEntropy}(p_i, y) + \alpha \cdot \mathrm{KL}\big(p_i^{\tau}\,\|\,p_t^{\tau}\big) + \beta_i \cdot \mathrm{MSE}(m_i, x)\Big]$$

Here, α and β balance the respective losses; β stores the loss weight of the decoding result of each branch.
CrossEntropy(p_i, y) is the cross-entropy loss, where p_i is the network's final prediction for the sample and the index i ranges over the branches. KL() is the knowledge distillation loss, namely the Kullback-Leibler divergence between the two outputs after a temperature coefficient has softened their distributions; through this loss the teacher's information is passed to the small student networks. MSE is the loss between the decoded output m_i and the original picture x. β = [β_1, β_2, …, β_i] stores a different decoding loss weight for each branch.
During training, the learning rate is decayed at different epochs to accelerate convergence of the model, and L2 regularization is introduced.
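A minimal PyTorch sketch of this combined loss is given below. The function name combined_loss, the weights alpha and betas, and the temperature T are hypothetical names; the T*T scaling of the KL term is standard practice in distillation and is assumed here rather than stated in the patent.

```python
import torch
import torch.nn.functional as F

def combined_loss(branch_logits, teacher_logits, decoded, images, labels,
                  alpha=0.5, betas=None, T=3.0):
    """Cross-entropy + temperature-softened KL distillation + decoder MSE.

    branch_logits: list of logits, one per branch (the exits)
    teacher_logits: logits of the deepest branch (the teacher)
    decoded: list of decoder reconstructions, one per branch
    images: original input batch (reconstruction target)
    """
    betas = betas or [1.0] * len(branch_logits)
    soft_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    total = 0.0
    for logits, recon, beta in zip(branch_logits, decoded, betas):
        ce = F.cross_entropy(logits, labels)
        kl = F.kl_div(F.log_softmax(logits / T, dim=1),
                      soft_teacher, reduction="batchmean") * (T * T)
        mse = F.mse_loss(recon, images)
        total = total + ce + alpha * kl + beta * mse
    return total
```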
Drawings
Fig. 1 is a model structure diagram according to the present invention.
Fig. 2 is a block diagram of the unsupervised autoencoder according to the present invention.
Fig. 3 is a flow chart according to the present invention.
Fig. 4 is a graph of extracted feature similarity in accordance with the present invention.
Detailed Description
For the purpose of promoting a better understanding of the objects, features and advantages of the invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
S1 data part
This embodiment uses the Cifar100 dataset as the image classification training dataset. Cifar100 contains 60,000 pictures, of which 50,000 form the training set and 10,000 form the test set, covering 100 categories in total.
S1.1 Data augmentation is performed using simple random cropping and horizontal flipping.
S1.2 Normalize the data, shuffle it randomly, and divide it into batches.
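A minimal sketch of this preprocessing with torchvision is shown below; the normalization statistics and the batch size are assumed values commonly used for Cifar100 rather than values specified by the patent.

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR100
from torch.utils.data import DataLoader

# S1.1: random crop with padding + horizontal flip; S1.2: normalize.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

train_set = CIFAR100(root="./data", train=True, download=True, transform=train_tf)
# shuffle=True randomly reorders the data each epoch before batching.
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
```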
S2 training part
S2.1 The deepest branch of the model, which is divided by depth, serves as the teacher network, and its output supervises the other, shallower branches in a knowledge distillation manner.
S2.2 The depth-wise divided model and the designed decoder part are combined to construct the whole network.
S2.3 The preprocessed data are input into the designed network model to obtain, for each branch structure, the feature values before the fully connected layer.
S2.4 The feature values are input into the designed decoder, which outputs a feature map of the same size as the input data; in essence the decoder restores the original picture. Each encoder-decoder structure is assigned its own MSE loss weight, as sketched below.
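A minimal sketch of such a branch decoder is given below, assuming 32x32 input images: it upsamples a branch's feature map back to the input resolution so that the MSE reconstruction loss can be computed. The class name BranchDecoder, the hidden channel count, and the number of upsampling steps are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchDecoder(nn.Module):
    """Upsamples a branch's feature map back to an image of the input size."""
    def __init__(self, in_channels, num_upsamples, hidden=64):
        super().__init__()
        layers, ch = [], in_channels
        for _ in range(num_upsamples):             # each step doubles H and W
            layers += [nn.ConvTranspose2d(ch, hidden, 4, stride=2, padding=1),
                       nn.BatchNorm2d(hidden), nn.ReLU(inplace=True)]
            ch = hidden
        layers.append(nn.Conv2d(ch, 3, 3, padding=1))  # back to 3 image channels
        self.net = nn.Sequential(*layers)

    def forward(self, feat, target_size=(32, 32)):
        out = self.net(feat)
        # Guard against size mismatches: force the exact input resolution
        # so that the MSE reconstruction loss can be computed directly.
        return F.interpolate(out, size=target_size, mode="bilinear",
                             align_corners=False)

# Example (assumed sizes for 32x32 inputs and exits at 16/8/4 pixels):
#   decoders = [BranchDecoder(c, n) for c, n in zip((32, 64, 128), (1, 2, 3))]
#   recon_i = decoders[i](feats[i]); mse_i = F.mse_loss(recon_i, images)
```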
S2.5 The pre-fully-connected features are input into the fully connected layer, and predictions are obtained through softmax. The cross-entropy loss between the predictions and the ground-truth labels is computed.
S2.6 A temperature hyperparameter is set during training; the temperature softens the final predicted outputs of the teacher and student networks. The KL divergence between the softened distributions transfers the deep knowledge to the shallow branches.
S2.7 During training, the learning rate is decayed at different epochs to accelerate convergence of the model, and L2 regularization is introduced. The total loss is:
$$L = \sum_{i}\Big[\mathrm{CrossEntropy}(p_i, y) + \alpha \cdot \mathrm{KL}\big(p_i^{\tau}\,\|\,p_t^{\tau}\big) + \beta_i \cdot \mathrm{MSE}(m_i, x)\Big]$$

Here, α and β balance the respective losses; β stores a different weight for each branch.
CrossEntropy(p_i, y) is the cross-entropy loss, where p_i is the network's final prediction for the sample and the index i ranges over the branches. KL() is the knowledge distillation loss, namely the Kullback-Leibler divergence between the two softened outputs; through this loss the deep teacher information is passed to the shallow, small student networks. MSE is the loss between the decoded output and the original picture. β = [β_1, β_2, …, β_i] stores a different decoding loss weight for each branch; autoencoders at different depths use different values of β.
After forward propagation is completed, m_i and p_i are obtained, and the loss value of the batch is computed according to the loss function above. The entire network is trained with stochastic gradient descent. A round (epoch) is completed once every batch has been back-propagated, as sketched below.
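The paragraph above corresponds to a standard epoch loop. A minimal sketch is given below, reusing the hypothetical BranchedClassifier, BranchDecoder, combined_loss, and train_loader from the earlier sketches; the learning rate, milestones, weight decay, and epoch count are assumed hyperparameters, with weight_decay providing the L2 regularization and MultiStepLR providing the stepwise learning-rate decay.

```python
import torch

# Reusing the hypothetical components sketched earlier in this description.
model = BranchedClassifier(num_classes=100)
decoders = [BranchDecoder(c, n) for c, n in zip((32, 64, 128), (1, 2, 3))]

params = [p for m in [model, *decoders] for p in m.parameters()]
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9,
                            weight_decay=5e-4)       # weight_decay acts as the L2 term
scheduler = torch.optim.lr_scheduler.MultiStepLR(    # decay the LR at set epochs
    optimizer, milestones=[80, 120, 160], gamma=0.1)

for epoch in range(200):
    for images, labels in train_loader:               # batches from step S1.2
        feats, logits = model(images)                  # forward pass, all branches
        recons = [dec(f) for dec, f in zip(decoders, feats)]
        # The deepest exit is the teacher; its own KL term is effectively zero.
        loss = combined_loss(logits, logits[-1], recons, images, labels)
        optimizer.zero_grad()
        loss.backward()                                # back-propagate the batch loss
        optimizer.step()                               # stochastic gradient descent step
    scheduler.step()                                   # one round (epoch) completed
```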
S2.8 Repeat the training according to the procedure described above until the model converges.
S3 test procedure
S3.1 The designed model combines the unsupervised autoencoder with the original structure. When model prediction is required, the autoencoder part is removed and only the trunk and branch structures are loaded.
S3.2 The pictures to be predicted are input into the model to obtain the classification results, as sketched below.
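A minimal sketch of the prediction stage under the same assumptions is shown below: the decoders are simply not constructed at test time, and only the trunk-and-branch classifier is used. Taking the deepest exit as the final predictor is one choice; a shallower exit or an ensemble of exits could be used instead, as the disclosure notes.

```python
import torch
import torchvision.transforms as T
from torchvision.datasets import CIFAR100
from torch.utils.data import DataLoader

# Test data: normalization only, no augmentation, no shuffling (assumed setup).
test_tf = T.Compose([T.ToTensor(),
                     T.Normalize((0.5071, 0.4865, 0.4409),
                                 (0.2673, 0.2564, 0.2762))])
test_loader = DataLoader(CIFAR100("./data", train=False, download=True,
                                  transform=test_tf), batch_size=256)

model.eval()                       # trunk and branches only; decoders are not loaded
with torch.no_grad():
    for images, _ in test_loader:
        _, logits = model(images)
        preds = torch.softmax(logits[-1], dim=1).argmax(dim=1)  # deepest exit
```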
In summary, the invention provides a natural image classification method that combines an unsupervised method with knowledge distillation. Unsupervised learning aims to discover the intrinsic characteristics of the data, so that similar samples obtain similar representations after feature extraction. Introducing this unsupervised component into an existing self-knowledge distillation method strengthens the feature extraction capacity of each branch and thereby improves the classification accuracy of the model. When designing the branch structure, grouped convolution is adopted to further reduce the number of parameters and to speed up model inference.
The above description is only one embodiment of the present invention, and the scope of the invention is not limited thereto. Modifications or substitutions that would readily occur to any person skilled in the art fall within the scope of the invention; therefore, the protection scope of the invention shall be determined by the claims.

Claims (4)

1. A natural image classification method combining self-knowledge distillation and an unsupervised method, characterized in that the unsupervised method is used to improve the extraction capacity of image features, the method comprising:
S1 data part
using the Cifar100 dataset as the image classification training dataset; the Cifar100 dataset comprises 60,000 pictures, of which 50,000 form the training set and 10,000 form the test set, covering 100 categories in total;
S1.1 data augmentation is performed using simple random cropping and horizontal flipping;
S1.2 the data are normalized, randomly shuffled, and divided into batches;
S2 training part
S2.1 the deepest branch of the depth-wise divided model is taken as the teacher network, and its output supervises the other, shallower branches in a knowledge distillation manner;
S2.2 the depth-wise divided model and the designed decoder part are combined to construct the whole network;
S2.3 the preprocessed data are input into the designed network model to obtain, for each branch structure, the feature values before the fully connected layer;
S2.4 the feature values are input into the designed decoder, which outputs a feature map of the same size as the input data, in essence restoring the original picture; each encoder-decoder structure is assigned its own MSE loss weight;
S2.5 the pre-fully-connected features are input into the fully connected layer, and predictions are obtained through softmax; the cross-entropy loss between the predictions and the ground-truth labels is computed;
S2.6 a temperature hyperparameter is set during training; the temperature softens the final predicted outputs of the teacher and student networks, and the KL divergence between the softened distributions transfers the deep knowledge to the shallow branches;
S2.7 during training, the learning rate is decayed at different epochs to accelerate convergence, and L2 regularization is introduced; the total loss is:
$$L = \sum_{i}\Big[\mathrm{CrossEntropy}(p_i, y) + \alpha \cdot \mathrm{KL}\big(p_i^{\tau}\,\|\,p_t^{\tau}\big) + \beta_i \cdot \mathrm{MSE}(m_i, x)\Big]$$

α and β balance the respective losses; β stores a different weight for each branch;
CrossEntropy(p_i, y) is the cross-entropy loss, where p_i is the network's final prediction for the sample and the index i ranges over the branches; KL() is the knowledge distillation loss, namely the Kullback-Leibler divergence between the two softened outputs, through which deep teacher information is passed to the shallow, small student networks; MSE is the loss between the decoded output and the original picture; β = [β_1, β_2, …, β_i] stores a different decoding loss weight for each branch; autoencoders at different depths use different values of β;
after forward propagation is completed, m_i and p_i are obtained, and the loss value of the batch is computed according to the loss function above; the entire network is trained with stochastic gradient descent; a round is completed once every batch has been back-propagated;
S2.8 the training is repeated according to the above procedure until the model finally converges;
S3 test procedure
S3.1 the designed model is formed by combining the unsupervised autoencoder with the original structure; when model prediction is needed, the autoencoder part is removed and only the trunk and branch structures are loaded;
S3.2 the pictures to be predicted are input into the model to obtain the classification results.
2. The natural image classification method combining self-knowledge distillation and an unsupervised method according to claim 1, characterized in that: the loss generated by the unsupervised autoencoder is added to the total loss.
3. The natural image classification method combining self-knowledge distillation and an unsupervised method according to claim 1, characterized in that: the decoder is designed to match the feature outputs of the branch structures so that they can be connected together.
4. The natural image classification method combining self-knowledge distillation and an unsupervised method according to claim 1, characterized in that: the multiple exits (branch structures) share a decoder, which increases the similarity between features.
CN202110992616.1A 2021-08-27 2021-08-27 Natural image classification method combining self-knowledge distillation and unsupervised method Pending CN113822339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110992616.1A CN113822339A (en) 2021-08-27 2021-08-27 Natural image classification method combining self-knowledge distillation and unsupervised method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110992616.1A CN113822339A (en) 2021-08-27 2021-08-27 Natural image classification method combining self-knowledge distillation and unsupervised method

Publications (1)

Publication Number Publication Date
CN113822339A true CN113822339A (en) 2021-12-21

Family

ID=78913667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110992616.1A Pending CN113822339A (en) 2021-08-27 2021-08-27 Natural image classification method combining self-knowledge distillation and unsupervised method

Country Status (1)

Country Link
CN (1) CN113822339A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180053057A1 (en) * 2016-08-18 2018-02-22 Xerox Corporation System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture
CN110414368A (en) * 2019-07-04 2019-11-05 华中科技大学 A kind of unsupervised pedestrian recognition methods again of knowledge based distillation
CN112116030A (en) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector standardization and knowledge distillation
CN112465111A (en) * 2020-11-17 2021-03-09 大连理工大学 Three-dimensional voxel image segmentation method based on knowledge distillation and countertraining
CN112906747A (en) * 2021-01-25 2021-06-04 北京工业大学 Knowledge distillation-based image classification method
CN112949786A (en) * 2021-05-17 2021-06-11 腾讯科技(深圳)有限公司 Data classification identification method, device, equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王金甲; 杨倩; 崔琳; 纪绍男: "Weakly labeled semi-supervised sound event detection based on a mean teacher model" (基于平均教师模型的弱标记半监督声音事件检测), Journal of Fudan University (Natural Science) (复旦学报(自然科学版)), no. 05, 15 October 2020 (2020-10-15) *
赵胜伟; 葛仕明; 叶奇挺; 罗朝; 李强: "Traffic sign classification based on enhanced supervision knowledge distillation" (基于增强监督知识蒸馏的交通标识分类), China Sciencepaper (中国科技论文), no. 20, 23 October 2017 (2017-10-23) *

Similar Documents

Publication Publication Date Title
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
CN111145116B (en) Sea surface rainy day image sample augmentation method based on generation of countermeasure network
CN113159051B (en) Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN108648188B (en) No-reference image quality evaluation method based on generation countermeasure network
CN110751698B (en) Text-to-image generation method based on hybrid network model
CN107506823B (en) Construction method of hybrid neural network model for dialog generation
CN109859288B (en) Image coloring method and device based on generation countermeasure network
CN109857871B (en) User relationship discovery method based on social network mass contextual data
CN109242090B (en) Video description and description consistency judgment method based on GAN network
CN113705811B (en) Model training method, device, computer program product and equipment
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN110009700B (en) Convolutional neural network visual depth estimation method based on RGB (red, green and blue) graph and gradient graph
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN113807222A (en) Video question-answering method and system for end-to-end training based on sparse sampling
CN115170874A (en) Self-distillation implementation method based on decoupling distillation loss
CN114912419A (en) Unified machine reading understanding method based on reorganization confrontation
CN113743277A (en) Method, system, equipment and storage medium for short video frequency classification
CN115829029A (en) Channel attention-based self-distillation implementation method
CN113822339A (en) Natural image classification method combining self-knowledge distillation and unsupervised method
CN115471576A (en) Point cloud lossless compression method and device based on deep learning
CN113747480B (en) Processing method and device for 5G slice faults and computing equipment
CN115660882A (en) Method for predicting user-to-user relationship in social network and multi-head mixed aggregation graph convolutional network
CN114139674A (en) Behavior cloning method, electronic device, storage medium, and program product
CN113518229B (en) Method and device for training loop filter network, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination