CN115409157A - Data-free knowledge distillation method based on student feedback - Google Patents

Data-free knowledge distillation method based on student feedback

Info

Publication number
CN115409157A
Authority
CN
China
Prior art keywords
student
model
task
student model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211028120.3A
Other languages
Chinese (zh)
Inventor
王灿 (Wang Can)
罗诗雅 (Luo Shiya)
陈德仿 (Chen Defang)
冯雁 (Feng Yan)
史麒豪 (Shi Qihao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211028120.3A priority Critical patent/CN115409157A/en
Publication of CN115409157A publication Critical patent/CN115409157A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/764: Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/776: Validation; performance evaluation
    • G06V 10/778: Active pattern-learning, e.g. online learning of image or video features
    • G06V 10/82: Recognition or understanding using pattern recognition or machine learning, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A data-free knowledge distillation method based on student feedback, and in particular a data-free knowledge distillation method based on student feedback for image classification. The method comprises the following steps. S1: initialize a student model and add an auxiliary classifier after the feature extractor of the student model. S2: use the auxiliary classifier to feed back the current learning ability of the student model, and jointly train a noise vector and a generator according to the student-feedback and teacher-feedback loss functions, thereby obtaining optimal synthetic pictures. S3: train the student model by knowledge distillation using the synthetic pictures obtained in S2, while independently training the auxiliary classifier to learn an auxiliary task. S4: repeat S2 and S3 until the student model is trained to convergence. Without any original training data, the invention adaptively adjusts the content of the synthetic pictures according to the current state of the student model and tailors them to it, thereby training the student model more effectively and improving its final performance.

Description

Data-free knowledge distillation method based on student feedback
Technical Field
The invention relates to the technical field of knowledge distillation, and in particular to a data-free knowledge distillation method based on student feedback.
Background
Convolutional neural networks have achieved remarkable success in a variety of practical applications in recent years, but their expensive storage and computational costs make it difficult to deploy such models on mobile devices. Hinton et al. therefore proposed knowledge distillation to achieve model compression, the main idea being to transfer dark knowledge from a pre-trained heavyweight teacher model to a lightweight student model.
Typical knowledge distillation methods rest on a strong premise: the original data used to train the teacher model can be used directly to train the student model. In some practical scenarios, however, data are not shared publicly for reasons of privacy, intellectual property, or the sheer size of the data sets, and data-free knowledge distillation has been proposed to address this problem. Existing related work mainly uses feedback from the teacher model to synthesize pictures, then uses the synthesized pictures in place of the original ones to carry out the knowledge distillation process.
However, existing work does not explicitly account for the student's learning ability during picture synthesis. The synthesized pictures may become too simple relative to the student's current ability, so the student model learns no new knowledge, impairing its final performance.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a data-free knowledge distillation method based on student feedback, which estimates the student's current learning ability using a self-supervision-augmented auxiliary task, adaptively adjusts the content of the synthetic pictures accordingly, and generates samples that are difficult for the student model, ensuring that the student model continuously acquires new knowledge and improving its final performance.
The invention adopts the following technical scheme:
a distillation method without data knowledge based on student feedback comprises the following steps:
s1: initializing a student model, and adding an auxiliary classifier behind a feature extractor of the student model;
s2: feeding back the current learning ability of the student model by using the auxiliary classifier, and simultaneously training a noise vector and a generator in a combined manner according to loss functions fed back by students and teachers so as to obtain an optimal synthetic picture;
s3: training a student model by knowledge distillation by using the synthetic picture obtained in the S2, and simultaneously independently training an auxiliary classifier to learn an auxiliary task;
s4: s2 and S3 are repeated until the student model is trained to converge.
Specifically, in S2, the auxiliary classifier is used to feed back the current learning ability of the student model, and the specific process comprises:
A noise vector $z$ is randomly generated and input into the generator network $G$ to obtain a synthetic picture $\hat{x} = G(z)$. The synthetic picture $\hat{x}$ is then rotated by a certain angle to obtain the rotated picture $\hat{x}_k$, which is input into the student model feature extractor $f_S$. The resulting feature representation $f_S(\hat{x}_k)$ is input into the auxiliary classifier $c$, and the output $c(f_S(\hat{x}_k))$ of the auxiliary classifier is used to calculate a loss function that quantifies the current learning ability of the student model, i.e., the student-feedback loss:
$$\mathcal{L}_{sf} = \mathrm{CE}\big(c(f_S(\hat{x}_k)),\, k\big)$$
where $\mathrm{CE}$ denotes cross entropy and $k$ denotes the class label of the self-supervision-augmented task, which treats the self-supervised rotation task and the original image classification task as one joint task.
Specifically, the classes of the self-supervision-augmented task are defined as follows:
Let the total number of classes of the original image classification task be $N$ and the total number of classes of the self-supervised rotation task be $M$. Suppose a synthetic picture $\hat{x}$ belongs to class $n$ in the image classification task and class $m$ in the self-supervised rotation task; then in the self-supervision-augmented task it belongs to class $k = n \times M + m$, out of $N \times M$ joint classes.
Specifically, the teacher-feedback loss function in S2 is:
$$\mathcal{L}_{tf} = \mathcal{L}_{cls} + \mathcal{L}_{bn}$$
where $\mathcal{L}_{cls}$ is the cross entropy between the output $f_T(\hat{x})$ of the teacher model $f_T$ and the predefined label $\hat{y}$ of the image classification task, formulated as:
$$\mathcal{L}_{cls} = \mathrm{CE}\big(f_T(\hat{x}),\, \hat{y}\big)$$
and $\mathcal{L}_{bn}$ is the $\ell_2$-norm distance between the feature statistics of the synthetic images and those of the real images, formulated as:
$$\mathcal{L}_{bn} = \sum_{l} \Big( \big\| \mu_l(\hat{x}) - \mu_l \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \sigma_l^2 \big\|_2 \Big)$$
where $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are respectively the mean and variance of the feature map of the synthetic image $\hat{x}$ at the $l$-th layer of the teacher model, and $\mu_l$ and $\sigma_l^2$ are the mean and variance stored in the $l$-th layer of the teacher model, representing the feature statistics of the real images.
Specifically, in S2 the noise vector and the generator are jointly trained according to the student-feedback and teacher-feedback loss functions, where the total loss function is:
$$\mathcal{L}_{gen} = \mathcal{L}_{tf} - \alpha\, \mathcal{L}_{sf}$$
where $\alpha$ is a hyperparameter weight used to balance the two loss terms; the student-feedback term carries a negative sign because $\mathcal{L}_{sf}$ is to be enlarged so as to produce difficult samples.
Specifically, in S3 the overall loss function for training the student model by knowledge distillation is:
$$\mathcal{L}_{kd} = \mathcal{L}_{ce} + \mathcal{L}_{kl} + \beta\, \mathcal{L}_{fm}$$
where $\beta$ is a hyperparameter weight used to balance the three loss terms; $\mathcal{L}_{ce}$ is the conventional loss term of the original image classification task, the cross entropy between the student model output and the predefined label; $\mathcal{L}_{kl}$ is the KL divergence between the teacher and student outputs, formulated as:
$$\mathcal{L}_{kl} = \tau^2\, \mathrm{KL}\Big( \sigma\big(f_T(\hat{x})/\tau\big) \,\Big\|\, \sigma\big(f_S(\hat{x})/\tau\big) \Big)$$
where $\sigma(\cdot)$ is the softmax function and $\tau$ is a hyperparameter that smooths the output distributions; and $\mathcal{L}_{fm}$ is the mean square error between the feature map $F_T$ of the last layer of the teacher model and the feature map $F_S$ of the last layer of the student model, formulated as:
$$\mathcal{L}_{fm} = \big\| F_T - r(F_S) \big\|_2^2$$
where $r(\cdot)$ is a mapping operation that aligns the dimensions of the feature maps.
Specifically, in S3, independently training the auxiliary classifier comprises: after the student completes each training iteration, the parameters of the student model are fixed, and the parameters of the auxiliary classifier are updated by training with the loss function $\mathcal{L}_{sf}$.
From the above description, the advantages of the invention over the prior art are as follows:
In the picture synthesis process, the student model also acts as a contributor: the content of the synthesized pictures is adaptively adjusted according to the current ability fed back by the student, generating samples that are difficult relative to that ability. This prevents the student model from failing to learn new knowledge because the samples are too simple, and trains the student more effectively, improving the final performance.
Without any original training data, the invention adaptively adjusts the content of the synthetic pictures according to the current state of the student model and tailors them to it, thereby training the student model more effectively and improving its final performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The process of the present invention is further described below with reference to the accompanying drawings:
As shown in FIG. 1, a data-free knowledge distillation method based on student feedback comprises the following steps:
S1: initializing a student model, and adding an auxiliary classifier after the feature extractor of the student model;
in a specific implementation, the auxiliary classifier is composed of two fully connected layers.
S2: using the auxiliary classifier to feed back the current learning ability of the student model, and jointly training a noise vector and a generator according to the student-feedback and teacher-feedback loss functions, thereby obtaining optimal synthetic pictures;
The specific process of using the auxiliary classifier to feed back the current learning ability of the student model is as follows:
A noise vector $z$ is randomly generated and input into the generator network $G$ to obtain a synthetic picture $\hat{x} = G(z)$. The synthetic picture $\hat{x}$ is then rotated by a certain angle to obtain the rotated picture $\hat{x}_k$, which is input into the student model feature extractor $f_S$. The resulting feature representation $f_S(\hat{x}_k)$ is input into the auxiliary classifier $c$, and the output $c(f_S(\hat{x}_k))$ is used to calculate a loss function that quantifies the current learning ability of the student model, i.e., the student-feedback loss:
$$\mathcal{L}_{sf} = \mathrm{CE}\big(c(f_S(\hat{x}_k)),\, k\big)$$
where $k$ denotes the class label of the self-supervision-augmented task, which treats the self-supervised rotation task and the original image classification task as one joint task.
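A minimal PyTorch sketch of this student-feedback computation follows, assuming NCHW image batches, a rotation task with M = 4 angles (0°, 90°, 180°, 270°), and averaging of the loss over the angles; none of these details is fixed by the patent.

```python
import torch
import torch.nn.functional as F

def student_feedback_loss(x_hat, labels, student_features, aux_classifier,
                          num_rotations=4):
    """L_sf = CE(c(f_S(x_hat_k)), k) for synthetic pictures rotated by
    0/90/180/270 degrees (a sketch, not the patent's exact code)."""
    losses = []
    for m in range(num_rotations):
        x_rot = torch.rot90(x_hat, k=m, dims=(2, 3))   # rotate by m * 90 degrees
        feats = student_features(x_rot)                # student feature representation
        logits = aux_classifier(feats.flatten(1))      # joint-task prediction
        k = labels * num_rotations + m                 # joint label k = n * M + m
        losses.append(F.cross_entropy(logits, k))
    return torch.stack(losses).mean()

# usage: z = torch.randn(B, latent_dim); x_hat = generator(z)
# loss = student_feedback_loss(x_hat, labels, f_S, c)
```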
Pictures need to be synthesized adaptively according to the current ability of the student model. A model's ability to capture semantic information serves well as an index of the student model's ability, and an auxiliary task can indirectly reflect how well the student model understands that semantic information. If the self-supervised rotation task alone were used as the auxiliary task, the ability evaluation could be inaccurate: for example, the digit "6" rotated by 180° is indistinguishable from the digit "9" rotated by 0°. The method therefore adopts the self-supervision-augmented task as its auxiliary task, so that the model identifies the class while identifying the rotation angle.
In the picture synthesis process, the optimization objective is to enlarge $\mathcal{L}_{sf}$ so as to generate difficult samples, i.e., samples whose semantic information the student model has difficulty understanding.
The classes of the self-supervision-augmented task are defined as follows:
Let the total number of classes of the original image classification task be $N$ and that of the self-supervised rotation task be $M$. Suppose a synthetic picture $\hat{x}$ belongs to class $n$ in the image classification task and class $m$ in the self-supervised rotation task; then in the self-supervision-augmented task it belongs to class $k = n \times M + m$, out of $N \times M$ joint classes.
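For instance, with N = 10 image classes and M = 4 rotation angles, the joint label can be computed as in the following trivial sketch of the assumed mapping:

```python
def joint_label(n: int, m: int, num_rotations: int = 4) -> int:
    """Map an image class n and a rotation class m to the joint
    self-supervision-augmented label k = n * M + m."""
    return n * num_rotations + m

# e.g. class 9 rotated by 180 degrees (rotation class 2) -> joint class 38
assert joint_label(9, 2) == 38
```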
However, if picture synthesis were based on student feedback alone, the distribution of the synthesized pictures would stray far from that of the real pictures for lack of prior knowledge of the original data. To account for the quality of the synthesized pictures, synthesis must therefore also be based on teacher feedback, whose loss function is:
$$\mathcal{L}_{tf} = \mathcal{L}_{cls} + \mathcal{L}_{bn}$$
$\mathcal{L}_{cls}$ expresses a one-hot assumption: if a synthetic picture follows the same distribution as the original training pictures, the teacher model's output for it will resemble a one-hot vector. $\mathcal{L}_{cls}$ is therefore defined as the cross entropy between the output $f_T(\hat{x})$ of the teacher model $f_T$ and the predefined label $\hat{y}$ of the image classification task:
$$\mathcal{L}_{cls} = \mathrm{CE}\big(f_T(\hat{x}),\, \hat{y}\big)$$
$\mathcal{L}_{bn}$ makes effective use of the statistics stored in the batch-normalization layers of the teacher model as prior information about the data. $\mathcal{L}_{bn}$ is therefore defined as the $\ell_2$-norm distance between the feature statistics of the synthetic images and those of the real images:
$$\mathcal{L}_{bn} = \sum_{l} \Big( \big\| \mu_l(\hat{x}) - \mu_l \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \sigma_l^2 \big\|_2 \Big)$$
where $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are respectively the mean and variance of the feature map of the synthetic image $\hat{x}$ at the $l$-th layer of the teacher model, and $\mu_l$ and $\sigma_l^2$ are the mean and variance stored in the $l$-th layer of the teacher model, representing the feature statistics of the real images.
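One plausible way to compute $\mathcal{L}_{bn}$ is with forward hooks on the teacher's batch-normalization layers, as sketched below; the hook mechanics are an illustrative assumption rather than the patent's stated implementation.

```python
import torch
import torch.nn as nn

def bn_statistics_loss(teacher: nn.Module, x_hat: torch.Tensor) -> torch.Tensor:
    """L_bn: l2 distance between the batch statistics of the synthetic
    images and the running statistics stored in the teacher's
    BatchNorm layers (a sketch)."""
    losses = []

    def make_hook(bn):
        def hook(module, inputs, output):
            feat = inputs[0]
            mu = feat.mean(dim=(0, 2, 3))                  # batch mean per channel
            var = feat.var(dim=(0, 2, 3), unbiased=False)  # batch variance per channel
            losses.append(torch.norm(mu - bn.running_mean, 2)
                          + torch.norm(var - bn.running_var, 2))
        return hook

    handles = [m.register_forward_hook(make_hook(m))
               for m in teacher.modules() if isinstance(m, nn.BatchNorm2d)]
    teacher(x_hat)
    for h in handles:
        h.remove()
    return torch.stack(losses).sum()
```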
Therefore, in the picture synthesis process, the noise vector and the generator are jointly trained according to the student-feedback and teacher-feedback loss functions, with total loss:
$$\mathcal{L}_{gen} = \mathcal{L}_{tf} - \alpha\, \mathcal{L}_{sf}$$
where $\alpha$ is a hyperparameter weight used to balance the two loss terms, the student-feedback term carrying a negative sign because it is to be enlarged. In the specific implementation, $\alpha$ is set to 10.
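Putting the pieces together, one synthesis step might be sketched as follows, reusing student_feedback_loss and bn_statistics_loss from the sketches above; the optimizer choice, learning rate, step count and label sampling are all assumptions.

```python
import torch
import torch.nn.functional as F

def synthesis_step(generator, teacher, student_features, aux_classifier,
                   batch_size=128, latent_dim=100, num_classes=10,
                   steps=200, alpha=10.0):
    """Jointly optimize a noise vector and the generator so that the
    synthetic pictures satisfy teacher feedback while remaining hard
    for the student: L_gen = L_tf - alpha * L_sf (a sketch)."""
    z = torch.randn(batch_size, latent_dim, requires_grad=True)
    y_hat = torch.randint(0, num_classes, (batch_size,))  # predefined labels
    opt = torch.optim.Adam([z] + list(generator.parameters()), lr=1e-3)
    for _ in range(steps):
        x_hat = generator(z)
        l_cls = F.cross_entropy(teacher(x_hat), y_hat)    # one-hot assumption term
        l_bn = bn_statistics_loss(teacher, x_hat)         # BN statistics term (sketched above)
        l_sf = student_feedback_loss(x_hat, y_hat,
                                     student_features, aux_classifier)
        loss = l_cls + l_bn - alpha * l_sf                # enlarge the student-feedback loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z).detach(), y_hat
```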
S3: training the student model by knowledge distillation using the synthetic pictures obtained in S2, and simultaneously independently training the auxiliary classifier to learn the auxiliary task;
The overall loss function for training the student model by knowledge distillation is:
$$\mathcal{L}_{kd} = \mathcal{L}_{ce} + \mathcal{L}_{kl} + \beta\, \mathcal{L}_{fm}$$
where $\beta$ is a hyperparameter weight used to balance the three loss terms; in the specific implementation, $\beta$ is set to 30. $\mathcal{L}_{ce}$ is the conventional loss term of the original image classification task, the cross entropy between the student model output and the predefined label. $\mathcal{L}_{kl}$ is the KL divergence between the teacher and student outputs, formulated as:
$$\mathcal{L}_{kl} = \tau^2\, \mathrm{KL}\Big( \sigma\big(f_T(\hat{x})/\tau\big) \,\Big\|\, \sigma\big(f_S(\hat{x})/\tau\big) \Big)$$
where $\sigma(\cdot)$ is the softmax function and $\tau$ is a hyperparameter that smooths the output distributions; in the specific implementation, $\tau$ is set to 20. $\mathcal{L}_{fm}$ is the mean square error between the feature map $F_T$ of the last layer of the teacher model and the feature map $F_S$ of the last layer of the student model, formulated as:
$$\mathcal{L}_{fm} = \big\| F_T - r(F_S) \big\|_2^2$$
where $r(\cdot)$ is a mapping operation that aligns the dimensions of the feature maps; in the specific implementation, it consists of three convolution blocks of sizes 1 × 1, 3 × 3 and 1 × 1.
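A hedged PyTorch sketch of this distillation loss follows; the reductions used for each term and the placement of the single weight β are assumptions, as is the channel layout handled by the mapping r(·).

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, student_feat, teacher_feat,
            labels, mapper, beta=30.0, tau=20.0):
    """L_kd = L_ce + L_kl + beta * L_fm (a sketch of the overall
    distillation objective)."""
    l_ce = F.cross_entropy(student_logits, labels)        # classification term
    # temperature-smoothed KL(teacher || student), scaled by tau^2
    l_kl = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                    F.softmax(teacher_logits / tau, dim=1),
                    reduction="batchmean") * tau ** 2
    # feature matching after mapping the student features with r(.)
    l_fm = F.mse_loss(mapper(student_feat), teacher_feat)
    return l_ce + l_kl + beta * l_fm

def make_mapper(c_in, c_out):
    """r(.): three convolution blocks (1x1, 3x3, 1x1) aligning the
    student's last feature map with the teacher's dimensions."""
    return torch.nn.Sequential(
        torch.nn.Conv2d(c_in, c_out, kernel_size=1),
        torch.nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        torch.nn.Conv2d(c_out, c_out, kernel_size=1),
    )
```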
Independently training the auxiliary classifier specifically comprises: after the student completes each training iteration, the parameters of the student model are fixed, and the parameters of the auxiliary classifier are trained and updated with the loss function $\mathcal{L}_{sf}$, improving the evaluation ability of the auxiliary classifier so that the student's learning ability is estimated more accurately during picture synthesis.
S4: s2 and S3 are repeated until the student model is trained to converge.
Experiments with the student-feedback-based data-free knowledge distillation method were performed on two public image classification data sets, CIFAR10 and CIFAR100, whose pictures are all 32 × 32. CIFAR10 comprises 10 classes, each with 5000 training pictures and 1000 test pictures; CIFAR100 comprises 100 classes, each with 500 training pictures and 100 test pictures. The training pictures are used only to train the teacher model, yielding a pre-trained teacher that is invisible to the student model; the test pictures are used to evaluate prediction accuracy. The teacher model uses the WRN-40-2 network structure and the student model uses WRN-16-1.
First, to demonstrate the superiority of the invention, it was compared with other prior-art methods. The experimental results are shown in Table 1: the invention clearly outperforms the other methods and obtains a better-performing student model.
TABLE 1 Prediction accuracy of each method's model
[Table 1 is provided as an image in the original publication; it reports the prediction accuracy of DFAL, ZSKT, ADI, CMI and the present invention.]
Here, DFAL is from Chen, Hanting, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian, Data-Free Learning of Student Networks;
ZSKT is from Micaelli, Paul, Amos J. Storkey, Zero-Shot Knowledge Transfer via Adversarial Belief Matching;
ADI is from Yin, Hongxu, Pavlo Molchanov, Zhizhong Li, José Manuel Álvarez, Arun Mallya, Derek Hoiem, Niraj Kumar Jha, Jan Kautz, Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion;
CMI is from Fang, Gongfan, Jie Song, Xinchao Wang, Chen Shen, Xingen Wang, Mingli Song, Contrastive Model Inversion for Data-Free Knowledge Distillation.
To further demonstrate the importance of student feedback in the picture synthesis process, an ablation experiment was conducted: during data synthesis, when training the noise vector and the generator, the student-feedback loss $\mathcal{L}_{sf}$ was removed so that synthesis relied on teacher feedback alone. The experimental results are shown in Table 2, which indicates that student feedback yields a better-performing student model.
Table 2 Ablation experimental results
[Table 2 is provided as an image in the original publication; it compares synthesis with and without the student-feedback loss.]
Another embodiment of the invention provides a method for image classification using the student-feedback-based data-free knowledge distillation method, comprising:
Obtaining a pre-trained teacher model: taken from a model already trained on the training set of an image classification data set.
Picture synthesis: jointly training a noise vector and a generator according to feedback from the student model and the teacher model; after training, the noise vector is input into the generator, whose output is the required synthetic pictures.
Knowledge distillation: training the student model by knowledge distillation using the synthesized pictures, while independently training the auxiliary classifier used to feed back the student's state in the picture synthesis stage. The picture synthesis and knowledge distillation processes alternate until the student model converges.
Image classification: inputting the picture to be predicted into the trained student model; the output is the class of the picture.
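A short usage sketch of this final classification step, assuming the trained student follows the feature-extractor-plus-head convention used in the sketches above:

```python
import torch

@torch.no_grad()
def classify(student_features, student_head, picture: torch.Tensor) -> int:
    """Predict the class of a single 32x32 picture with the trained
    student model (a usage sketch)."""
    logits = student_head(student_features(picture.unsqueeze(0)).flatten(1))
    return int(logits.argmax(dim=1).item())
```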

Claims (7)

1. A data-free knowledge distillation method based on student feedback, comprising the following steps:
S1: initializing a student model, and adding an auxiliary classifier after the feature extractor of the student model;
S2: using the auxiliary classifier to feed back the current learning ability of the student model, and jointly training a noise vector and a generator according to the student-feedback and teacher-feedback loss functions, thereby obtaining optimal synthetic pictures;
S3: training the student model by knowledge distillation using the synthetic pictures obtained in S2, and simultaneously independently training the auxiliary classifier to learn an auxiliary task;
S4: repeating S2 and S3 until the student model is trained to convergence.
2. The data-free knowledge distillation method based on student feedback according to claim 1, wherein using the auxiliary classifier in S2 to feed back the current learning ability of the student model comprises:
randomly generating a noise vector $z$ and inputting it into the generator network $G$ to obtain a synthetic picture $\hat{x} = G(z)$; rotating the synthetic picture $\hat{x}$ by a certain angle to obtain the rotated picture $\hat{x}_k$; inputting $\hat{x}_k$ into the student model feature extractor $f_S$; inputting the resulting feature representation $f_S(\hat{x}_k)$ into the auxiliary classifier $c$; and using the output $c(f_S(\hat{x}_k))$ of the auxiliary classifier to calculate a loss function quantifying the current learning ability of the student model, i.e., the student-feedback loss:
$$\mathcal{L}_{sf} = \mathrm{CE}\big(c(f_S(\hat{x}_k)),\, k\big)$$
wherein $k$ denotes the class label of the self-supervision-augmented task, which treats the self-supervised rotation task and the original image classification task as one joint task.
3. The data-free knowledge distillation method based on student feedback according to claim 2, wherein the classes of the self-supervision-augmented task are defined as follows:
the total number of classes of the original image classification task is given as $N$ and that of the self-supervised rotation task as $M$; supposing a synthetic picture $\hat{x}$ belongs to class $n$ in the image classification task and class $m$ in the self-supervised rotation task, it belongs to class $k = n \times M + m$ in the self-supervision-augmented task.
4. The data-free knowledge distillation method based on student feedback according to claim 3, wherein the teacher-feedback loss function in S2 is specifically:
$$\mathcal{L}_{tf} = \mathcal{L}_{cls} + \mathcal{L}_{bn}$$
wherein $\mathcal{L}_{cls}$ is the cross entropy between the output $f_T(\hat{x})$ of the teacher model $f_T$ and the predefined label $\hat{y}$ of the image classification task, formulated as:
$$\mathcal{L}_{cls} = \mathrm{CE}\big(f_T(\hat{x}),\, \hat{y}\big)$$
and $\mathcal{L}_{bn}$ is the $\ell_2$-norm distance between the feature statistics of the synthetic images and those of the real images, formulated as:
$$\mathcal{L}_{bn} = \sum_{l} \Big( \big\| \mu_l(\hat{x}) - \mu_l \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \sigma_l^2 \big\|_2 \Big)$$
wherein $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are respectively the mean and variance of the feature map of the synthetic image $\hat{x}$ at the $l$-th layer of the teacher model, and $\mu_l$ and $\sigma_l^2$ are the mean and variance stored in the $l$-th layer of the teacher model, representing the feature statistics of the real images.
5. The data-free knowledge distillation method based on student feedback according to claim 4, wherein in S2 the noise vector and the generator are jointly trained according to the student-feedback and teacher-feedback loss functions, the total loss function being:
$$\mathcal{L}_{gen} = \mathcal{L}_{tf} - \alpha\, \mathcal{L}_{sf}$$
wherein $\alpha$ is a hyperparameter weight used to balance the two loss terms.
6. The data-free knowledge distillation method based on student feedback according to claim 5, wherein in S3 the overall loss function for training the student model by knowledge distillation is:
$$\mathcal{L}_{kd} = \mathcal{L}_{ce} + \mathcal{L}_{kl} + \beta\, \mathcal{L}_{fm}$$
wherein $\beta$ is a hyperparameter weight used to balance the three loss terms; $\mathcal{L}_{ce}$ is the conventional loss term of the original image classification task, the cross entropy between the student model output and the predefined label; $\mathcal{L}_{kl}$ is the KL divergence between the teacher and student outputs, formulated as:
$$\mathcal{L}_{kl} = \tau^2\, \mathrm{KL}\Big( \sigma\big(f_T(\hat{x})/\tau\big) \,\Big\|\, \sigma\big(f_S(\hat{x})/\tau\big) \Big)$$
wherein $\sigma(\cdot)$ is the softmax function and $\tau$ is a hyperparameter smoothing the output distributions; and $\mathcal{L}_{fm}$ is the mean square error between the feature map $F_T$ of the last layer of the teacher model and the feature map $F_S$ of the last layer of the student model, formulated as:
$$\mathcal{L}_{fm} = \big\| F_T - r(F_S) \big\|_2^2$$
wherein $r(\cdot)$ is a mapping operation that aligns the dimensions of the feature maps.
7. The data-free knowledge distillation method based on student feedback according to claim 6, wherein in S3 the independent training of the auxiliary classifier specifically comprises: after the student completes each training iteration, fixing the parameters of the student model and then training and updating the parameters of the auxiliary classifier according to the loss function $\mathcal{L}_{sf}$.
CN202211028120.3A 2022-08-25 2022-08-25 Data-free knowledge distillation method based on student feedback Pending CN115409157A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211028120.3A CN115409157A (en) Data-free knowledge distillation method based on student feedback

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211028120.3A CN115409157A (en) Data-free knowledge distillation method based on student feedback

Publications (1)

Publication Number Publication Date
CN115409157A true CN115409157A (en) 2022-11-29

Family

ID=84162036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211028120.3A Pending CN115409157A (en) Data-free knowledge distillation method based on student feedback

Country Status (1)

Country Link
CN (1) CN115409157A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630724A (en) * 2023-07-24 2023-08-22 美智纵横科技有限责任公司 Data model generation method, image processing method, device and chip
CN116630724B (en) * 2023-07-24 2023-10-10 美智纵横科技有限责任公司 Data model generation method, image processing method, device and chip
CN117576518A (en) * 2024-01-15 2024-02-20 第六镜科技(成都)有限公司 Image distillation method, apparatus, electronic device, and computer-readable storage medium
CN117576518B (en) * 2024-01-15 2024-04-23 第六镜科技(成都)有限公司 Image distillation method, apparatus, electronic device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN110837870B (en) Sonar image target recognition method based on active learning
CN111695467B (en) Spatial spectrum full convolution hyperspectral image classification method based on super-pixel sample expansion
CN115409157A (en) Data-free knowledge distillation method based on student feedback
CN111340738B (en) Image rain removing method based on multi-scale progressive fusion
CN108288035A (en) The human motion recognition method of multichannel image Fusion Features based on deep learning
Xu et al. Open-ended visual question answering by multi-modal domain adaptation
CN110705591A (en) Heterogeneous transfer learning method based on optimal subspace learning
CN111008650B (en) Metallographic structure automatic grading method based on deep convolution antagonistic neural network
CN114357221B (en) Self-supervision active learning method based on image classification
CN112200797B (en) Effective training method based on PCB noise labeling data
CN114049515A (en) Image classification method, system, electronic device and storage medium
CN113591978A (en) Image classification method, device and storage medium based on confidence penalty regularization self-knowledge distillation
CN114998602A (en) Domain adaptive learning method and system based on low confidence sample contrast loss
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
Ye et al. Learning cross-domain representations by vision transformer for unsupervised domain adaptation
CN117078656A (en) Novel unsupervised image quality assessment method based on multi-mode prompt learning
CN117011515A (en) Interactive image segmentation model based on attention mechanism and segmentation method thereof
CN115457299B (en) Matching method of sensor chip projection photoetching machine
CN116935438A (en) Pedestrian image re-recognition method based on autonomous evolution of model structure
CN116720106A (en) Self-adaptive motor imagery electroencephalogram signal classification method based on transfer learning field
CN109145749B (en) Cross-data-set facial expression recognition model construction and recognition method
CN116188428A (en) Bridging multi-source domain self-adaptive cross-domain histopathological image recognition method
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN115273135A (en) Gesture image classification method based on DC-Res2Net and feature fusion attention module
CN114821219A (en) Unsupervised multi-source field self-adaption method based on deep joint semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination