CN114626504A - Model compression method based on group relation knowledge distillation - Google Patents
Model compression method based on group relation knowledge distillation
- Publication number: CN114626504A (application CN202210030247.2A)
- Authority: CN (China)
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion)
- Classifications: G06N3/045; G06F18/214; G06F18/23213; G06F18/2414; G06N3/047; G06N3/08; G06N5/02
Abstract
The invention discloses a model compression method based on group relation knowledge distillation. After the data set is preprocessed, a large-capacity convolutional neural network is randomly initialized as the teacher network and pre-trained with a cross entropy loss function. In the knowledge distillation stage, a small-capacity convolutional neural network is randomly initialized as the student network; the teacher network and the student network each cluster the image sample features with K-means, the relationships among the groups are computed with the maximum mean discrepancy, and the student network is trained with a weighted sum of the cross entropy and a group relation loss function. Finally, the trained network performs classification decisions on test images. The method guides the student network to imitate the teacher's ability to group samples, so that the performance of the student network approaches that of the teacher network.
Description
Technical Field
The invention relates to a model compression method based on group relation knowledge distillation, and belongs to the technical field of computer vision.
Background
In recent years, deep convolutional neural networks have enjoyed unprecedented success in many artificial intelligence fields, such as computer vision, natural language processing, and speech recognition. However, these successes depend heavily on powerful computing capability and large memory resources, which prevents deep convolutional networks from being widely deployed in embedded and mobile systems. To reduce the computational cost while maintaining excellent performance, knowledge distillation was proposed to transfer the knowledge learned by a large teacher network with many parameters to a small student network with few parameters, so that a model of lower complexity approaches or even exceeds the performance of the complex network.
Knowledge distillation is an effective model compression technique that aims to improve the performance of a lightweight student network by transferring knowledge from a pre-trained, large-capacity teacher network. The core idea is to train a complex teacher network in advance, then train a smaller student network with both the teacher's outputs and the true labels of the data. However, the output of the pre-trained teacher network contains information similar to the true labels, so only limited knowledge is exploited during distillation. Subsequently, several distillation methods based on the feature knowledge implicit in intermediate network layers were proposed. For example, Romero et al. propose directly matching the feature outputs of the student and teacher networks by minimizing the L2 distance between peer-level features. However, requiring the student network to imitate all of the teacher's features is so stringent that it adversely affects the performance and convergence of the model. Follow-up work therefore studied how to encode features more effectively so as to close the gap between student and teacher networks. Zagoruyko et al. convert feature maps into spatial attention maps and guide the student network to learn the teacher network's attention regions; Yim et al. take the Gram matrix between adjacent-layer features within the same residual block as the knowledge. Although these works achieve good performance, they ignore the relationships between different samples. Recent work has therefore focused on exploiting the relationships between sample features for knowledge distillation. For example, Tung et al. compute the similarity between different sample features to obtain a similarity matrix, and train the student so that the inter-sample similarities stay consistent with the teacher's. Zhu et al. propose using both the features and their gradient information to obtain richer relational knowledge.
Knowledge distillation methods based on sample-feature relationships achieve good performance when compressing models to low capacities, and several works have modeled such relationships. However, current work still only establishes relationships between individual samples and neglects the group relationships among samples. The group relationships established according to the similarity between samples constitute important knowledge and matter greatly for improving model performance.
Disclosure of Invention
In view of the problems in the prior art, the present invention provides a model compression method based on group relation knowledge distillation, so as to solve the above technical problems.
In order to achieve the purpose, the invention adopts the technical scheme that: a model compression method based on group relation knowledge distillation comprises the following steps:
the method comprises the following steps: processing image data;
step two: training a teacher network;
step three: constructing teacher network group relation knowledge;
step four: constructing student network group relation knowledge;
step five: training a student network;
step six: and (5) testing the student network.
Further, the specific steps of processing the image data in the first step are as follows:
s11: for a given image data set, randomly dividing the given image data set into three subsets, namely a training set, a verification set and a test set, and respectively using the three subsets for training a model, verifying a hyper-parameter and testing the performance of the model;
S12: the number of image classification classes is C; the m-th image sample in the training set is denoted x_m, with corresponding label y_m; the i-th image sample in the test set is denoted x_i.
Further, the training of the teacher network in the second step includes the specific steps of:
S21: randomly initialize a large-capacity convolutional neural network consisting of several convolutional layers, several fully-connected layers and a Softmax layer; the convolutional layers are represented as f_t(·; θ_t) and extract the features of image samples, where θ_t are the convolutional-layer parameters; the fully-connected layer is represented as g_t(·; W_t) and transforms and classifies the features, where W_t is the fully-connected-layer parameter matrix; the Softmax layer converts classification scores into classification probability outputs p_t(y_m = c | x_m);
S22: randomly draw M image samples from the training set and input them into the teacher network; the m-th image sample is represented by the convolutional layers as:
F^t_m = f_t(x_m; θ_t)
S23: the classification score obtained after the features of the image sample pass through the fully-connected layer is:
z^t_m = g_t(F^t_m; W_t) = W_t^T F^t_m
where W_t = [w^t_1, ..., w^t_C] collects the weight vectors of the teacher network's fully-connected layer; there are C of them, consistent with the number of image classification classes;
S24: the Softmax layer converts the classification score into the probability of belonging to the c-th class as:
p_t(y_m = c | x_m) = exp(z^t_{m,c}) / Σ_{j=1..C} exp(z^t_{m,j})
where z^t_{m,j} is the j-th component of the classification score z^t_m, obtained from the j-th weight vector w^t_j of the classifier parameter matrix W_t;
S25: the cross entropy loss function between the classification probability output and the true label is:
L^t_CE = −(1/M) Σ_{m=1..M} log p_t(y_m = c | x_m), with c the index of the true label y_m,
where p_t(y_m = c | x_m) is the classification probability output, w^t_j is the j-th weight vector of the fully-connected parameter matrix W_t, and M is the number of image samples.
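The teacher forward pass and cross entropy loss of steps S21 to S25 can be sketched in NumPy; this is a minimal illustration with made-up shapes, in which random `features` stand in for the output of the convolutional stack f_t, whose architecture the patent leaves unspecified:

```python
import numpy as np

rng = np.random.default_rng(0)
M, D, C = 8, 64, 10                       # batch size, feature dim, number of classes

# Stand-in for the convolutional stack f_t(x_m; theta_t): random features F_m^t.
features = rng.normal(size=(M, D))
W_t = rng.normal(size=(D, C)) * 0.01      # fully-connected parameter matrix W_t

scores = features @ W_t                   # classification scores z_m^t (S23)
# Softmax: probability of each class c for sample x_m (S24), with the usual
# max-subtraction for numerical stability.
probs = np.exp(scores - scores.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

labels = rng.integers(0, C, size=M)       # true labels y_m
# Cross entropy averaged over the batch (S25).
ce_loss = -np.log(probs[np.arange(M), labels]).mean()
```

The same computation, with θ_s and W_s, gives the student branch of step five.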
Further, the specific steps of the construction of teacher network group relation knowledge in the third step are as follows:
s31: randomly extracting M image samples in a training set, inputting the M image samples into a teacher network to extract features, and clustering the features of the M image samples into K groups by using a K-means algorithm;
S32: suppose the k-th group contains N_k samples, with the feature of the i-th sample denoted F^{t,k}_i, and the l-th group contains N_l samples, with the feature of the j-th sample denoted F^{t,l}_j; then the relation coefficient between the k-th and l-th groups of sample features, computed from the maximum mean discrepancy, is:
R^t_{kl} = || (1/N_k) Σ_{i=1..N_k} F^{t,k}_i − (1/N_l) Σ_{j=1..N_l} F^{t,l}_j ||²
and the coefficients for all group pairs form the teacher relation matrix R^t.
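Steps S31 and S32 can be sketched as follows. The maximum mean discrepancy here uses the identity feature map, so the relation coefficient reduces to the squared distance between group feature means; that kernel choice is an assumption, since the patent does not name one:

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain K-means on sample features (S31). Returns per-sample group labels."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every sample to every center, then nearest-center assignment.
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # keep the old center if a group empties
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def relation_matrix(features, labels, k):
    """R[k, l]: squared distance between group feature means (S32).

    This is the maximum mean discrepancy with an identity feature map,
    an assumption about the kernel the patent leaves unspecified."""
    dim = features.shape[1]
    means = np.stack([
        features[labels == j].mean(axis=0) if np.any(labels == j) else np.zeros(dim)
        for j in range(k)
    ])
    diff = means[:, None] - means[None]   # pairwise differences of group means
    return (diff ** 2).sum(axis=2)        # K x K relation matrix

rng = np.random.default_rng(1)
feats = rng.normal(size=(32, 16))         # stand-in features of M = 32 samples
labs = kmeans(feats, k=4)
R_t = relation_matrix(feats, labs, k=4)   # teacher relation matrix R^t
```

Running the same two functions on the student features yields R^s for step four.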
further, the concrete steps of the construction of the student network group relation knowledge in the fourth step are as follows:
s41: randomly initializing a small-capacity convolutional neural network;
S42: the same batch of M image samples randomly drawn from the training set is input into the student network to extract features, and the features of the M image samples are clustered into K groups with the K-means algorithm;
S43: suppose the k-th group contains N_k samples, with the feature of the i-th sample denoted F^{s,k}_i, and the l-th group contains N_l samples, with the feature of the j-th sample denoted F^{s,l}_j; then the relation coefficient between the k-th and l-th groups of sample features, computed from the maximum mean discrepancy, is:
R^s_{kl} = || (1/N_k) Σ_{i=1..N_k} F^{s,k}_i − (1/N_l) Σ_{j=1..N_l} F^{s,l}_j ||²
and the coefficients for all group pairs form the student relation matrix R^s.
further, the concrete steps of training the student network in the fifth step are as follows:
S51: the same batch of M image samples randomly drawn from the training set is input into the student network; the m-th image sample is represented by the convolutional layers as:
F^s_m = f_s(x_m; θ_s)
S52: the classification score obtained after the features of the image sample pass through the fully-connected layer is:
z^s_m = g_s(F^s_m; W_s) = W_s^T F^s_m
where W_s = [w^s_1, ..., w^s_C] collects the weight vectors of the fully-connected layer; there are C of them, consistent with the number of image classification classes;
S53: the Softmax layer converts the classification score into the probability of belonging to the c-th class as:
p_s(y_m = c | x_m) = exp(z^s_{m,c}) / Σ_{j=1..C} exp(z^s_{m,j})
where z^s_{m,j} is the j-th component of the classification score z^s_m, obtained from the j-th weight vector w^s_j of the classifier parameter matrix W_s;
S54: the cross entropy loss function between the classification probability output and the true label is:
L^s_CE = −(1/M) Σ_{m=1..M} log p_s(y_m = c | x_m), with c the index of the true label y_m,
where p_s(y_m = c | x_m) is the classification probability output of the student network, w^s_j is the j-th weight vector of the fully-connected parameter matrix W_s, and M is the number of image samples;
S55: the group relation distillation loss, based on the Frobenius norm between the relation matrices R^t and R^s, is:
L_GR = || R^t − R^s ||²_F
S56: the total loss function for student network optimization is:
L = α · L^s_CE + β · L_GR
where α and β are adjustable parameters; the parameters θ_s and W_s of the student network are optimized with this total loss, completing the model compression process.
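The loss combination of steps S55 and S56 can be sketched as follows; `R_t` and `R_s` stand for the K×K relation matrices from steps three and four, the cross entropy value is a placeholder, and the weights `alpha` and `beta` are illustrative choices rather than values the patent fixes:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
R_t = rng.random((K, K))                  # teacher relation matrix (step three)
R_s = rng.random((K, K))                  # student relation matrix (step four)

# S55: group relation distillation loss, squared Frobenius norm of R_t - R_s.
gr_loss = np.linalg.norm(R_t - R_s, ord="fro") ** 2

ce_loss = 1.7                             # placeholder student cross entropy (S54)
alpha, beta = 1.0, 0.5                    # adjustable weights of S56 (illustrative)
total_loss = alpha * ce_loss + beta * gr_loss
```

In training, `total_loss` would be backpropagated to update θ_s and W_s while the teacher parameters stay fixed.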
Further, the specific steps of testing the student network in the sixth step are as follows:
s61: parameter θ in fixed student networksAnd Ws(ii) a Randomly extracting any image in the test set and inputting the image into the student network, wherein the ith image sample is represented as xiThe features of the convolutional layer through the student network are represented as:
s62: the classification score obtained after the characteristics of the image sample pass through the full connection layer is expressed as:
whereinRepresenting weight vectors in the full-connection layer, wherein the total number of the weight vectors is C, and the weight vectors are consistent with the number of image classification categories;
s63: the calculation formula for converting the Softmax layer classification score into the probability output value belonging to the c category is as follows:
whereinScore a classificationThe (j) th component of (a),is the jth weight vector in the classifier parameter matrix Ws.
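At test time (S61 to S63) only the student forward pass runs; a minimal sketch, again with a random stand-in for the convolutional features F^s_i:

```python
import numpy as np

rng = np.random.default_rng(3)
D, C = 16, 5
W_s = rng.normal(size=(D, C))             # frozen student classifier matrix W_s
feat = rng.normal(size=D)                 # stand-in for F_i^s = f_s(x_i; theta_s)

score = feat @ W_s                        # classification score (S62)
prob = np.exp(score - score.max())
prob /= prob.sum()                        # p_s(y_i = c | x_i) for each class c (S63)
pred = int(prob.argmax())                 # classification decision on the test image
```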
The invention has the beneficial effects that: the disclosed model compression method based on group relation knowledge distillation clusters the features extracted from samples with the K-means algorithm on the teacher network, and then guides the student network to imitate the teacher's ability to group samples, so that the performance of the student network approaches that of the teacher network.
Drawings
FIG. 1 is a schematic structural diagram of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood, however, that the description herein of specific embodiments is only intended to illustrate the invention and not to limit the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, and the terms used herein in the specification of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention.
As shown in fig. 1, the present invention is a method for compressing a model based on group relation knowledge distillation, comprising the following steps:
the method comprises the following steps: processing image data;
s11: for a given image data set, randomly dividing the image data set into three subsets, namely a training set, a verification set and a test set, and respectively using the three subsets for training a model, verifying a hyper-parameter and testing the performance of the model;
S12: the number of image classification classes is C; the m-th image sample in the training set is denoted x_m, with corresponding label y_m; the i-th image sample in the test set is denoted x_i.
Step two: training a teacher network;
S21: randomly initialize a large-capacity convolutional neural network consisting of several convolutional layers, several fully-connected layers and a Softmax layer; the convolutional layers are represented as f_t(·; θ_t) and extract the features of image samples, where θ_t are the convolutional-layer parameters; the fully-connected layer is represented as g_t(·; W_t) and transforms and classifies the features, where W_t is the fully-connected-layer parameter matrix; the Softmax layer converts classification scores into classification probability outputs p_t(y_m = c | x_m);
S22: randomly draw M image samples from the training set and input them into the teacher network; the m-th image sample is represented by the convolutional layers as:
F^t_m = f_t(x_m; θ_t)
S23: the classification score obtained after the features of the image sample pass through the fully-connected layer is:
z^t_m = g_t(F^t_m; W_t) = W_t^T F^t_m
where W_t = [w^t_1, ..., w^t_C] collects the weight vectors of the teacher network's fully-connected layer; there are C of them, consistent with the number of image classification classes;
S24: the Softmax layer converts the classification score into the probability of belonging to the c-th class as:
p_t(y_m = c | x_m) = exp(z^t_{m,c}) / Σ_{j=1..C} exp(z^t_{m,j})
where z^t_{m,j} is the j-th component of the classification score z^t_m, obtained from the j-th weight vector w^t_j of the classifier parameter matrix W_t;
S25: the cross entropy loss function between the classification probability output and the true label is:
L^t_CE = −(1/M) Σ_{m=1..M} log p_t(y_m = c | x_m), with c the index of the true label y_m,
where p_t(y_m = c | x_m) is the classification probability output, w^t_j is the j-th weight vector of the fully-connected parameter matrix W_t, and M is the number of image samples.
Step three: constructing teacher network group relation knowledge;
s31: randomly extracting M image samples in a training set, inputting the M image samples into a teacher network to extract features, and clustering the features of the M image samples into K groups by using a K-means algorithm;
S32: suppose the k-th group contains N_k samples, with the feature of the i-th sample denoted F^{t,k}_i, and the l-th group contains N_l samples, with the feature of the j-th sample denoted F^{t,l}_j; then the relation coefficient between the k-th and l-th groups of sample features, computed from the maximum mean discrepancy, is:
R^t_{kl} = || (1/N_k) Σ_{i=1..N_k} F^{t,k}_i − (1/N_l) Σ_{j=1..N_l} F^{t,l}_j ||²
and the coefficients for all group pairs form the teacher relation matrix R^t.
step four: constructing student network group relation knowledge;
s41: randomly initializing a small-capacity convolutional neural network;
S42: the same batch of M image samples randomly drawn from the training set is input into the student network to extract features, and the features of the M image samples are clustered into K groups with the K-means algorithm;
S43: suppose the k-th group contains N_k samples, with the feature of the i-th sample denoted F^{s,k}_i, and the l-th group contains N_l samples, with the feature of the j-th sample denoted F^{s,l}_j; then the relation coefficient between the k-th and l-th groups of sample features, computed from the maximum mean discrepancy, is:
R^s_{kl} = || (1/N_k) Σ_{i=1..N_k} F^{s,k}_i − (1/N_l) Σ_{j=1..N_l} F^{s,l}_j ||²
and the coefficients for all group pairs form the student relation matrix R^s.
step five: training a student network;
S51: the same batch of M image samples randomly drawn from the training set is input into the student network; the m-th image sample is represented by the convolutional layers as:
F^s_m = f_s(x_m; θ_s)
S52: the classification score obtained after the features of the image sample pass through the fully-connected layer is:
z^s_m = g_s(F^s_m; W_s) = W_s^T F^s_m
where W_s = [w^s_1, ..., w^s_C] collects the weight vectors of the fully-connected layer; there are C of them, consistent with the number of image classification classes;
S53: the Softmax layer converts the classification score into the probability of belonging to the c-th class as:
p_s(y_m = c | x_m) = exp(z^s_{m,c}) / Σ_{j=1..C} exp(z^s_{m,j})
where z^s_{m,j} is the j-th component of the classification score z^s_m, obtained from the j-th weight vector w^s_j of the classifier parameter matrix W_s;
S54: the cross entropy loss function between the classification probability output and the true label is:
L^s_CE = −(1/M) Σ_{m=1..M} log p_s(y_m = c | x_m), with c the index of the true label y_m,
where p_s(y_m = c | x_m) is the classification probability output of the student network, w^s_j is the j-th weight vector of the fully-connected parameter matrix W_s, and M is the number of image samples;
S55: the group relation distillation loss, based on the Frobenius norm between the relation matrices R^t and R^s, is:
L_GR = || R^t − R^s ||²_F
S56: the total loss function for student network optimization is:
L = α · L^s_CE + β · L_GR
where α and β are adjustable parameters; the parameters θ_s and W_s of the student network are optimized with this total loss, completing the model compression process.
Step six: and (5) testing the student network.
S61: parameter θ in fixed student networksAnd Ws(ii) a Randomly extracting any image in the test set and inputting the image into the student network, wherein the ith image sample is represented as xi, and the characteristic expression of passing through the convolution layer in the student network is as follows:
s62: the classification score obtained after the characteristics of the image sample pass through the full connection layer is expressed as:
whereinRepresenting weight vectors in the full-connection layer, wherein the total number of the weight vectors is C, and the weight vectors are consistent with the number of image classification categories;
s63: the calculation formula for converting the Softmax layer classification score into the probability output value belonging to the c category is as follows:
whereinScore a classificationThe (j) th component of (a),is the jth weight vector in the classifier parameter matrix Ws.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. A model compression method based on group relation knowledge distillation is characterized by comprising the following steps:
the method comprises the following steps: processing image data;
step two: training a teacher network;
randomly initialize a large-capacity convolutional neural network consisting of several convolutional layers, several fully-connected layers and a Softmax layer; the convolutional layers and the fully-connected layer are represented as f_t(·; θ_t) and g_t(·; W_t) respectively, where θ_t are the convolutional-layer parameters and W_t is the fully-connected-layer parameter matrix; randomly draw M image samples from the training set and input them into the teacher network; the m-th image sample is represented by the convolutional layers as F^t_m = f_t(x_m; θ_t) and by the fully-connected layer as z^t_m = g_t(F^t_m; W_t); finally, the Softmax layer converts the score into the probability output p_t(y_m = c | x_m) of belonging to the c-th class; the cross entropy objective function L^t_CE between the probability output p_t(y_m = c | x_m) and the true label y_m of the sample is then used to optimize the parameters of the teacher network;
step three: constructing teacher network group relation knowledge;
randomly draw M image samples from the training set, input them into the teacher network to extract features, and cluster the features of the M image samples into K groups with the K-means algorithm, where the k-th group contains N_k samples, the feature of the i-th sample being denoted F^{t,k}_i, and the l-th group contains N_l samples, the feature of the j-th sample being denoted F^{t,l}_j; compute the relation coefficient R^t_{kl} between any two groups with the maximum mean discrepancy, finally obtaining the relation matrix R^t;
Step four: constructing student network group relation knowledge;
randomly initialize a small-capacity convolutional neural network consisting of a small number of convolutional layers, a small number of fully-connected layers and a Softmax layer; the convolutional layers and the fully-connected layer are represented as f_s(·; θ_s) and g_s(·; W_s) respectively, where θ_s are the convolutional-layer parameters and W_s is the fully-connected-layer parameter matrix; the same batch of M image samples randomly drawn from the training set is input into the student network to extract features, and the features of the M image samples are clustered into K groups with the K-means algorithm, where the k-th group contains N_k samples, the feature of the i-th sample being denoted F^{s,k}_i, and the l-th group contains N_l samples, the feature of the j-th sample being denoted F^{s,l}_j; compute the relation coefficient R^s_{kl} between any two groups with the maximum mean discrepancy, finally obtaining the relation matrix R^s;
Step five: training a student network;
for the same batch of M image samples randomly drawn from the training set, the m-th image sample is input into the student network to obtain the probability output p_s(y_m = c | x_m) of belonging to the c-th class; compute the cross entropy objective function L^s_CE between the probability output p_s(y_m = c | x_m) and the true label y_m of the sample; at the same time compute the Frobenius norm between the relation matrices R^t and R^s as the group relation distillation loss L_GR; take the weighted sum of L^s_CE and L_GR as the total loss function and optimize the parameters of the student network;
step six: testing a student network;
fix the parameters θ_s and W_s of the student network; randomly draw any image from the test set and input it into the student network; the i-th image sample, denoted x_i, passes through the convolutional layers, the fully-connected layer and the Softmax layer of the student network to finally obtain the probability output of belonging to the c-th class, expressed as p_s(y_i = c | x_i).
2. The method for compressing a model based on knowledge distillation of group relationships according to claim 1, wherein the image data in the first step is processed by the following specific steps:
s11: for a given image data set, randomly dividing the given image data set into three subsets, namely a training set, a verification set and a test set, and respectively using the three subsets for training a model, verifying a hyper-parameter and testing the performance of the model;
S12: the number of image classification classes is C; the m-th image sample in the training set is denoted x_m, with corresponding label y_m; the i-th image sample in the test set is denoted x_i.
3. The group-relationship-knowledge-distillation-based model compression method as claimed in claim 1, wherein the training of the teacher network in the second step comprises the specific steps of:
S21: randomly initialize a large-capacity convolutional neural network consisting of several convolutional layers, several fully-connected layers and a Softmax layer; the convolutional layers are represented as f_t(·; θ_t) and extract the features of image samples, where θ_t are the convolutional-layer parameters; the fully-connected layer is represented as g_t(·; W_t) and transforms and classifies the features, where W_t is the fully-connected-layer parameter matrix; the Softmax layer converts classification scores into classification probability outputs p_t(y_m = c | x_m);
S22: randomly draw M image samples from the training set and input them into the teacher network; the m-th image sample is represented by the convolutional layers as:
F^t_m = f_t(x_m; θ_t)
S23: the classification score obtained after the features of the image sample pass through the fully-connected layer is:
z^t_m = g_t(F^t_m; W_t) = W_t^T F^t_m
where W_t = [w^t_1, ..., w^t_C] collects the weight vectors of the teacher network's fully-connected layer; there are C of them, consistent with the number of image classification classes;
S24: the Softmax layer converts the classification score into the probability of belonging to the c-th class as:
p_t(y_m = c | x_m) = exp(z^t_{m,c}) / Σ_{j=1..C} exp(z^t_{m,j})
where z^t_{m,j} is the j-th component of the classification score z^t_m, obtained from the j-th weight vector w^t_j of the classifier parameter matrix W_t;
S25: the cross entropy loss function between the classification probability output and the true label is:
L^t_CE = −(1/M) Σ_{m=1..M} log p_t(y_m = c | x_m), with c the index of the true label y_m.
4. The group relationship knowledge distillation-based model compression method as claimed in claim 1, wherein the construction of teacher network group relationship knowledge in the third step comprises the following specific steps:
s31: randomly extracting M image samples in a training set, inputting the M image samples into a teacher network to extract features, and clustering the features of the M image samples into K groups by using a K-means algorithm;
S32: suppose the k-th group contains N_k samples, with the feature of the i-th sample denoted F^{t,k}_i, and the l-th group contains N_l samples, with the feature of the j-th sample denoted F^{t,l}_j; then the relation coefficient between the k-th and l-th groups of sample features, computed from the maximum mean discrepancy, is:
R^t_{kl} = || (1/N_k) Σ_{i=1..N_k} F^{t,k}_i − (1/N_l) Σ_{j=1..N_l} F^{t,l}_j ||²
and the coefficients for all group pairs form the teacher relation matrix R^t.
5. The group relationship knowledge distillation-based model compression method as claimed in claim 1, wherein the construction of the group relationship knowledge of the student network in the fourth step comprises the following specific steps:

S41: randomly initialize a small-capacity convolutional neural network as the student network;

S42: input the same batch of M image samples randomly drawn from the training set into the student network to extract features, and cluster the features of the M image samples into K groups using the K-means algorithm;

S43: the k-th group contains $N_k$ samples, the feature of its i-th sample being denoted $f_s^{k,i}$; the l-th group contains $N_l$ samples, the feature of its j-th sample being denoted $f_s^{l,j}$; the relationship coefficient between the k-th and l-th groups of sample features, based on the maximum mean discrepancy, is then computed as:

$$R_s(k,l) = \left\lVert \frac{1}{N_k}\sum_{i=1}^{N_k} f_s^{k,i} - \frac{1}{N_l}\sum_{j=1}^{N_l} f_s^{l,j} \right\rVert_2^2$$

the coefficients over all group pairs form the student relation matrix $R_s \in \mathbb{R}^{K \times K}$.
6. The group relationship knowledge distillation-based model compression method according to claim 1, wherein the training of the student network in the fifth step comprises the following specific steps:

S51: for the same batch of M image samples randomly drawn from the training set and input into the student network, the feature of the m-th image sample extracted by the convolutional layers is:

$$f_s^m = f_s(x_m;\theta_s)$$

S52: the classification score obtained after the feature of the image sample passes through the fully-connected layer is:

$$z_s^m = W_s^{\top} f_s^m = \left[(w_1^s)^{\top} f_s^m,\ \dots,\ (w_C^s)^{\top} f_s^m\right]$$

where $w_j^s$ represents the j-th weight vector in the student network's fully-connected layer; there are C weight vectors in total, consistent with the number of image classification categories;

S53: the formula by which the Softmax layer converts the classification score into the probability output value of belonging to the c-th category is:

$$p_s^c(x_m) = \frac{\exp\!\left(z_s^m[c]\right)}{\sum_{j=1}^{C}\exp\!\left(z_s^m[j]\right)}$$

where $z_s^m[j]$ is the j-th component of the classification score $z_s^m$, and $w_j^s$ is the j-th weight vector of the classifier parameter matrix $W_s$;

S54: the cross-entropy loss function between the classification output probability values and the true labels is:

$$L_{ce}^s = -\frac{1}{M}\sum_{m=1}^{M}\sum_{c=1}^{C} y_m^c \log p_s^c(x_m)$$

where $p_s^c(x_m)$ is the classification probability output value of the student network and M is the number of image samples;

S55: based on the Frobenius norm between the relation matrices $R_t$ and $R_s$, the group relation distillation loss function $L_{grd}$ is computed as:

$$L_{grd} = \lVert R_t - R_s \rVert_F^2$$

S56: the total loss function for optimizing the student network is:

$$L_{total} = \alpha L_{ce}^s + \beta L_{grd}$$

where α and β are adjustable parameters; based on the above total loss function, the parameters $\theta_s$ and $W_s$ in the student network are optimized, completing the model compression process.
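The distillation and total losses of the last two steps can be sketched as follows (a minimal NumPy sketch with illustrative relation matrices and weights; in practice the total loss would drive gradient updates of the student parameters):

```python
import numpy as np

def group_distillation_loss(R_t, R_s):
    # Squared Frobenius norm of the difference between the
    # teacher and student group relation matrices.
    return ((R_t - R_s) ** 2).sum()

def total_loss(ce_loss, R_t, R_s, alpha=1.0, beta=0.1):
    # Weighted sum of the student's cross-entropy loss and the
    # group relation distillation loss.
    return alpha * ce_loss + beta * group_distillation_loss(R_t, R_s)

# Illustrative 2x2 relation matrices and an illustrative CE loss value.
R_t = np.array([[0.0, 2.0], [2.0, 0.0]])
R_s = np.array([[0.0, 1.0], [1.0, 0.0]])
L = total_loss(0.7, R_t, R_s, alpha=1.0, beta=0.1)  # 0.7 + 0.1 * 2.0 = 0.9
```

As the student's relation matrix approaches the teacher's, the distillation term vanishes and only the classification loss remains.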
7. The group relationship knowledge distillation-based model compression method as claimed in claim 1, wherein the concrete steps of the test of the student network in the sixth step are as follows:
S61: fix the parameters $\theta_s$ and $W_s$ of the student network; randomly draw any image from the test set and input it into the student network; denoting the i-th image sample as $x_i$, its feature after the convolutional layers of the student network is:

$$f_s^i = f_s(x_i;\theta_s)$$

S62: the classification score obtained after the feature of the image sample passes through the fully-connected layer is:

$$z_s^i = W_s^{\top} f_s^i$$

where $w_j^s$ represents the weight vectors in the fully-connected layer; there are C in total, consistent with the number of image classification categories;

S63: the formula by which the Softmax layer converts the classification score into the probability output value of belonging to the c-th category is:

$$p_s^c(x_i) = \frac{\exp\!\left(z_s^i[c]\right)}{\sum_{j=1}^{C}\exp\!\left(z_s^i[j]\right)}$$

the category with the largest probability output value is taken as the classification result for the test image.
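At test time the steps above amount to a linear scoring with the frozen classifier weights followed by an argmax; a minimal NumPy sketch with illustrative weights and features (Softmax can be skipped here because it is monotonic and does not change the argmax):

```python
import numpy as np

def predict(features, W_s):
    # Classification scores from the frozen fully-connected layer:
    # one score per class, then pick the highest-scoring class.
    scores = features @ W_s.T
    return scores.argmax(axis=1)

# Illustrative frozen parameters: C=3 weight vectors over 2-dim features.
W_s = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])
x_feats = np.array([[2.0, 0.1],   # feature of one test image
                    [0.2, 3.0]])  # feature of another
preds = predict(x_feats, W_s)
```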
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210030247.2A CN114626504A (en) | 2022-01-11 | 2022-01-11 | Model compression method based on group relation knowledge distillation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114626504A (en) | 2022-06-14 |
Family
ID=81899061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210030247.2A Withdrawn CN114626504A (en) | 2022-01-11 | 2022-01-11 | Model compression method based on group relation knowledge distillation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114626504A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115511059A (en) * | 2022-10-12 | 2022-12-23 | 北华航天工业学院 | Network lightweight method based on convolutional neural network channel decoupling |
CN115511059B (en) * | 2022-10-12 | 2024-02-09 | 北华航天工业学院 | Network light-weight method based on convolutional neural network channel decoupling |
CN117058437A (en) * | 2023-06-16 | 2023-11-14 | 江苏大学 | Flower classification method, system, equipment and medium based on knowledge distillation |
CN117058437B (en) * | 2023-06-16 | 2024-03-08 | 江苏大学 | Flower classification method, system, equipment and medium based on knowledge distillation |
Similar Documents
Publication | Title |
---|---|
CN114626504A (en) | Model compression method based on group relation knowledge distillation |
CN109801621A (en) | A kind of audio recognition method based on residual error gating cycle unit |
CN112733866A (en) | Network construction method for improving text description correctness of controllable image |
CN108345866B (en) | Pedestrian re-identification method based on deep feature learning |
CN111985581A (en) | Sample-level attention network-based few-sample learning method |
CN111695456A (en) | Low-resolution face recognition method based on active discriminability cross-domain alignment |
CN114157539B (en) | Data-aware dual-drive modulation intelligent identification method |
CN110598552A (en) | Expression recognition method based on improved particle swarm optimization convolutional neural network optimization |
CN112766378B (en) | Cross-domain small sample image classification model method focusing on fine granularity recognition |
CN106971180A (en) | A kind of micro-expression recognition method based on the sparse transfer learning of voice dictionary |
Chen et al. | A semisupervised deep learning framework for tropical cyclone intensity estimation |
CN112633154A (en) | Method and system for converting heterogeneous face feature vectors |
Huang et al. | Design and Application of Face Recognition Algorithm Based on Improved Backpropagation Neural Network |
CN116452862A (en) | Image classification method based on domain generalization learning |
CN115731595A (en) | Fuzzy rule-based multi-level decision fusion emotion recognition method |
CN109034192B (en) | Track-vehicle body vibration state prediction method based on deep learning |
CN114387474A (en) | Small sample image classification method based on Gaussian prototype classifier |
CN114329031A (en) | Fine-grained bird image retrieval method based on graph neural network and deep hash |
CN116543269B (en) | Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof |
CN109165576A (en) | A kind of moving state identification method and device |
CN116246305A (en) | Pedestrian retrieval method based on hybrid component transformation network |
CN112784800B (en) | Face key point detection method based on neural network and shape constraint |
CN114997331A (en) | Small sample relation classification method and system based on metric learning |
CN112001222B (en) | Student expression prediction method based on semi-supervised learning |
CN114357166A (en) | Text classification method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220614 | |