CN113112020B - Model network extraction and compression method based on generation network and knowledge distillation - Google Patents

Model network extraction and compression method based on generation network and knowledge distillation Download PDF

Info

Publication number
CN113112020B
CN113112020B CN202110320646.8A CN202110320646A CN113112020B
Authority
CN
China
Prior art keywords
network
teacher
trained
generated
student
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110320646.8A
Other languages
Chinese (zh)
Other versions
CN113112020A (en)
Inventor
曾一锋
林晓晴
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202110320646.8A priority Critical patent/CN113112020B/en
Publication of CN113112020A publication Critical patent/CN113112020A/en
Application granted granted Critical
Publication of CN113112020B publication Critical patent/CN113112020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/027Frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a model network extraction and compression method based on a generation network and knowledge distillation, comprising the following steps: training the loss function of the generation network with a trained teacher network to obtain a trained generation network; generating a number of pictures with the generation network; inputting the generated pictures into the trained teacher network and a student network, and performing knowledge distillation on the student network; and updating the student network. When facing a large network, the method can learn only the classification knowledge of specific categories in the large network, according to the task at hand, and migrate it to a smaller network. At the same time, the method depends less on data: knowledge distillation is carried out without any real data, reducing the dependence of conventional knowledge distillation on real data.

Description

Model network extraction and compression method based on generation network and knowledge distillation
Technical Field
The invention relates to the field of neural network compression, in particular to a model network extraction and compression method based on a generation network and knowledge distillation.
Background
In the field of artificial intelligence, ever more complex network structures have been proposed to solve different problems, and network sizes keep growing. In practical projects, however, limits on hardware resources and computing capacity make it difficult to deploy a large, well-performing network, so methods such as knowledge distillation are used to compress and accelerate trained large-scale networks. At the same time, for a trained network, the desired task may not cover all task targets of the original network but only some of them; for example, a large network may implement a 1000-class classification task on ImageNet, while the actual application needs only 10 of those classes.
Several classical network compression and acceleration technologies have been studied and improved. Some researchers proposed a hashing scheme for neural networks that accelerates computation by hash-mapping the parameters, with parameters in the same hash bucket sharing one weight value. Model pruning determines which filters and fully connected neurons to remove by evaluating the filters and neurons of a trained network. In addition, kernel sparsification applies regularization to the weight updates so that kernel weights become sparser and easier to prune. Beyond pruning a trained model to reduce its capacity, Hinton proposed the concept of knowledge distillation: by making the output labels of a student model as close as possible to those of a teacher model, the knowledge learned by the teacher network is transferred to a smaller student network. Compared with pruning methods, distillation frees model compression from the constraints of the original model structure.
At present, several classical network compression methods exist: some prune the original model, some accelerate the network through kernel sparsification, and some compress by redesigning a smaller network and training it through distillation.
Although distillation lets the compressed model break free of the structure of the original trained model, the distillation process depends strongly on the original data set. Moreover, the task targets of the network are unchanged before and after distillation, so a chosen part of the network's knowledge cannot be migrated on its own.
Disclosure of Invention
The invention mainly aims to overcome the above defects in the prior art by providing a model network extraction and compression method based on a generation network and knowledge distillation, so that, when facing a large network, only the classification knowledge of specific categories in the large network is learned, according to the task at hand, and transferred to a smaller network; at the same time, the method depends less on data, performing knowledge distillation without real data and reducing the dependence of conventional knowledge distillation on real data.
The invention adopts the following technical scheme:
A model network extraction and compression method based on generation network and knowledge distillation comprises the following steps:
training a loss function of the generated network by using the trained teacher network to obtain a trained generated network;
generating a plurality of generated pictures according to the generation network;
inputting the generated pictures into a trained teacher network and a trained student network, and carrying out knowledge distillation on the student network;
and updating the student network.
Specifically, training the loss function of the generation network with the trained teacher network to obtain a trained generation network comprises:
using the classification output of the trained teacher network on the pictures produced by the generation network as feedback;
calculating the loss function of the generation network from this feedback;
calculating the gradient of the loss function and updating the parameters of the generator network; when the teacher network's outputs on the generated pictures and its classification outputs on real pictures meet the set requirements, the trained generation network is obtained.
Specifically, a trained teacher network is used to train the loss function of the generation network to obtain the trained generation network, where the loss function is:

$$\mathcal{L}_G = \alpha\,\mathcal{L}_{ce} + \beta\,\mathcal{L}_{ie} + \gamma\,\mathcal{L}_{p} + \delta\,\mathcal{L}_{BN}$$

where $\mathcal{L}_{ce}$ is the cross-entropy loss, for the generator, of the teacher network on the generated pictures; $\mathcal{L}_{ie}$ is the information entropy of the output over the target task; $\mathcal{L}_{p}$ involves the probability that a generated image is judged to belong to a target category by the teacher network; $\mathcal{L}_{BN}$ is the distance between network output feature maps; and $\alpha$, $\beta$, $\gamma$, $\delta$ are the weights of the four loss terms, each ranging from 0 to 1.
In particular, the loss term $\mathcal{L}_{ce}$ is specifically:

$$\mathcal{L}_{ce} = \frac{1}{n}\sum_{i=1}^{n} \mathcal{H}\big(y_i^{T}, t_i\big)$$

where $y_i^{T}$ is the output of the teacher network on the $i$-th generated picture, $t_i$ is the pseudo label obtained from that output, $\mathcal{H}$ is the cross-entropy, and $n$ is the number of pictures the generator produces in one batch.
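As a concrete illustration, the pseudo-label cross-entropy term just described — the teacher's own argmax on a generated batch serving as the label — can be sketched in NumPy. The function and variable names below are editorial, not from the patent:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pseudo_label_ce(teacher_logits):
    """Cross-entropy of the teacher's argmax 'pseudo labels' against its
    own softmax outputs, averaged over the generated batch."""
    probs = softmax(teacher_logits)      # teacher output on generated pictures
    pseudo = probs.argmax(axis=-1)       # pseudo label from the teacher output
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), pseudo]).mean()

# Confident teacher outputs give a small loss; flat outputs a large one,
# so minimizing this loss pushes the generator toward pictures the teacher
# classifies decisively.
confident = np.array([[8.0, 0.0, 0.0], [0.0, 9.0, 0.0]])
flat = np.zeros((2, 3))
print(pseudo_label_ce(confident) < pseudo_label_ce(flat))  # True
```

For perfectly flat logits the loss equals $\log K$ for $K$ classes, the worst case under this objective.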
In particular, the loss terms $\mathcal{L}_{ie}$ and $\mathcal{L}_{p}$ are specifically:

$$\mathcal{L}_{ie} = \sum_{i=1}^{M} p_i \log p_i, \qquad \mathcal{L}_{p} = -\sum_{i=1}^{M} p_i$$

where $N$ is the total number of task categories of the trained model; $M$ is the number of task categories of the target part, $M < N$; and $p_i$ is the frequency with which the teacher network assigns the $n$ generated pictures to the $i$-th category.
In particular, the loss term $\mathcal{L}_{BN}$ is specifically:

$$\mathcal{L}_{BN} = \sum_{l} \big\| \mu_l(\hat{x}) - \mu_l(x) \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \sigma_l^2(x) \big\|_2$$

where a real image is defined as $x \in \mathcal{X}$, an image produced by the generator is defined as $\hat{x}$, $\mu_l(\hat{x})$ is the mean of the generated pictures' features, $\sigma_l^2(\hat{x})$ is their variance, and $l$ indexes the $l$-th layer of the network.
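The feature-statistics distance can be sketched as follows, using a BN layer's stored running mean and variance as a stand-in for real-data statistics, as the document describes later. This is a minimal NumPy sketch for one layer; the toy shapes and values are assumptions:

```python
import numpy as np

def bn_stat_distance(feat, bn_mean, bn_var):
    """L2 distance between the batch statistics of a layer's features on
    generated images and the running mean/variance stored in that layer's
    BN parameters (a proxy for the real-data statistics)."""
    mu = feat.mean(axis=0)
    var = feat.var(axis=0)
    return float(np.linalg.norm(mu - bn_mean) + np.linalg.norm(var - bn_var))

rng = np.random.default_rng(0)
# Pretend the BN layer saw real data with mean 1.0 and variance 4.0 per channel.
bn_mean, bn_var = np.full(8, 1.0), np.full(8, 4.0)
matched = rng.normal(1.0, 2.0, size=(4096, 8))   # generated features that match
shifted = rng.normal(5.0, 2.0, size=(4096, 8))   # generated features that do not
print(bn_stat_distance(matched, bn_mean, bn_var) <
      bn_stat_distance(shifted, bn_mean, bn_var))  # True
```

Minimizing this distance over all layers pulls the generator's outputs toward inputs whose intermediate statistics look like those of real data.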
Specifically, inputting the generated pictures into the teacher network and the student network and performing knowledge distillation on the student network comprises:

a set of $n$ random vectors $z_1, z_2, \dots, z_n$ is input into the generation network $G$, whose output is

$$\hat{x}_i = G(z_i), \quad i = 1, \dots, n;$$

the generated pictures are input into the teacher network and the student network respectively, giving the teacher output $y_i^{T}$ and the student output $y_i^{S}$; with knowledge distillation, the optimization objective of the student network is

$$W_S^{*} = \arg\min_{W_S} \frac{1}{n}\sum_{i=1}^{n} \mathcal{H}\big(y_i^{S}, y_i^{T}\big)$$

where $W_S$ denotes the parameters of the student network and $\mathcal{H}$ is the cross-entropy.
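The distillation objective — the student matching the teacher's soft outputs on the generated pictures — can be sketched in NumPy. Names and toy logits here are editorial assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits):
    """Cross-entropy between the teacher's soft outputs and the student's
    outputs on the generated batch (the distillation objective)."""
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    return float(-(t * np.log(s + 1e-12)).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.0]])
aligned = np.array([[4.0, 1.0, 0.0]])      # student matches the teacher
misaligned = np.array([[0.0, 1.0, 4.0]])   # student contradicts the teacher
print(kd_loss(aligned, teacher) < kd_loss(misaligned, teacher))  # True
```

When the student's distribution equals the teacher's, the loss reduces to the entropy of the teacher's distribution, its minimum for a fixed teacher.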
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
(1) The model network extraction and compression method based on a generation network and knowledge distillation trains the loss function of the generation network with a trained teacher network to obtain a trained generation network; generates a number of pictures with the generation network; inputs the generated pictures into the trained teacher network and a student network and performs knowledge distillation on the student network; and updates the student network. The invention combines the picture-generation capability of a generation network with the knowledge-distillation technique of network compression: from all the category knowledge learned by a large network, it can purposefully distill only the part of the target knowledge that is of interest into a smaller network. At the same time, the generation network is used to design a loss function that matches the classification distribution inside the original teacher network, reducing the dependence on real data when training the small network.
Drawings
FIG. 1 is a flowchart of the model network extraction and compression method based on a generation network and knowledge distillation provided by an embodiment of the invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
The invention is further described below by means of specific embodiments.
Knowledge distillation, proposed by Hinton et al., is a process that enables knowledge to be learned between networks, which may have different structures or similar structures with different capacities. Conventional knowledge distillation requires a trained, well-performing network as the teacher network, which is typically complex, and a smaller network designed around the task requirements as the student network. Knowledge distillation holds that the output of the last layer of the teacher network contains the rich knowledge learned by the model, reflected in its output distribution; the student network's output is therefore trained to imitate the distribution output by the teacher network's last layer, thereby transferring the teacher network's knowledge to the smaller student network.
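In Hinton's formulation the teacher's last-layer distribution is typically softened with a temperature before the student matches it; the patent text does not mention a temperature explicitly, so the following NumPy sketch is a general illustration of why the soft output distribution carries more information than a hard label:

```python
import numpy as np

def soften(logits, T=4.0):
    # Higher temperature T flattens the output distribution, exposing the
    # relative similarity the network sees between non-target classes.
    z = logits / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([10.0, 5.0, 1.0])
hard = soften(logits, T=1.0)   # ordinary softmax
soft = soften(logits, T=4.0)   # softened distribution
# Softening keeps the class ranking but makes the distribution less peaked.
print(hard.argmax() == soft.argmax())   # True
print(soft.max() < hard.max())          # True
```

The "dark knowledge" in the ratios between the smaller probabilities is exactly what a hard argmax label throws away.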
The invention combines the picture-generation capability of a generation network with the knowledge-distillation technique of network compression: from all the category knowledge learned by a large network, it can purposefully distill only the part of the target knowledge that is of interest into a smaller network. At the same time, the generation network is used to design a loss function that matches the classification distribution inside the original teacher network, reducing the dependence on real data when training the small network.
Referring to fig. 1, a flowchart of a model network extraction and compression method based on a generated network and knowledge distillation provided in an embodiment of the present invention specifically includes the following steps:
s101: training a loss function of the generated network by using the trained teacher network to obtain a trained generated network;
Given a trained teacher network, the teacher network is assumed to contain valuable information acquired during training, and this knowledge is expressed in the teacher network's outputs as input data flow through it. The goal is to let the generator learn the output behavior of the teacher network, so that pictures produced by the generator are more likely to be considered "normal" images by the teacher network and can be successfully recognized as categories of the small task target, thereby completing the process of knowledge extraction and transfer. The output of the teacher network on synthesized images can therefore serve as the key signal for the generator's learning, so that the outputs of generated images in the teacher network, and even its intermediate-layer results, approach the results real images produce as they flow through the teacher network;
The learning process of the generation network is as follows: the parameters of the generation network are trained using the teacher network's outputs on the generated pictures as feedback, so that the teacher network's outputs on the generated pictures are as close as possible to its outputs on real pictures. The labels of the generated pictures are obtained from the teacher network's outputs.
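The feedback loop just described — teacher outputs on generated pictures driving the generator's parameter updates — can be sketched end-to-end on a toy problem. Everything below (a linear "teacher", a linear "generator", finite-difference gradients in place of backpropagation) is an illustrative assumption, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: the "teacher" is a fixed linear classifier over 2-D inputs
# with 3 classes; the "generator" maps a random code z to an input via a
# trainable matrix W_gen.
W_teacher = rng.normal(size=(2, 3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def teacher(x):
    return softmax(x @ W_teacher)

def gen_loss(W_gen, z):
    # Pseudo-label cross-entropy: the teacher's argmax on the generated
    # batch serves as the label, so minimizing this drives the generator
    # toward inputs the teacher classifies confidently.
    probs = teacher(z @ W_gen)
    pseudo = probs.argmax(axis=-1)
    n = len(pseudo)
    return -np.log(probs[np.arange(n), pseudo] + 1e-12).mean()

# Train by central finite-difference gradient descent (a crude stand-in
# for backpropagation, small enough to verify by hand).
W_gen = rng.normal(size=(4, 2)) * 0.1
z = rng.normal(size=(32, 4))
eps, lr = 1e-4, 0.1
start = gen_loss(W_gen, z)
for _ in range(300):
    grad = np.zeros_like(W_gen)
    for i in range(W_gen.shape[0]):
        for j in range(W_gen.shape[1]):
            d = np.zeros_like(W_gen); d[i, j] = eps
            grad[i, j] = (gen_loss(W_gen + d, z) - gen_loss(W_gen - d, z)) / (2 * eps)
    W_gen -= lr * grad
print(round(start, 3), round(gen_loss(W_gen, z), 3))
```

The loss should fall from roughly its flat-output value toward zero as the generator learns codes-to-inputs that the fixed teacher classifies with confidence, mirroring the feedback mechanism in the patent at toy scale.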
Training the loss function of the generation network with the trained teacher network yields the trained generation network, where the loss function is specifically:

$$\mathcal{L}_G = \alpha\,\mathcal{L}_{ce} + \beta\,\mathcal{L}_{ie} + \gamma\,\mathcal{L}_{p} + \delta\,\mathcal{L}_{BN}$$

where $\mathcal{L}_{ce}$ is the cross-entropy loss, for the generator, of the teacher network on the generated pictures; $\mathcal{L}_{ie}$ is the information entropy of the output over the target task; $\mathcal{L}_{p}$ involves the probability that a generated image is judged to belong to a target category by the teacher network; $\mathcal{L}_{BN}$ is the distance between network output feature maps; and $\alpha$, $\beta$, $\gamma$, $\delta$ are the weights of the four loss terms, each ranging from 0 to 1.
In the loss function, $\mathcal{L}_{ce}$ is specifically:

$$\mathcal{L}_{ce} = \frac{1}{n}\sum_{i=1}^{n} \mathcal{H}\big(y_i^{T}, t_i\big)$$

where $y_i^{T}$ is the output of the teacher network on the $i$-th generated picture, $t_i$ is the pseudo label obtained from that output, $\mathcal{H}$ is the cross-entropy, and $n$ is the number of pictures the generator produces in one batch.
In the loss function, $\mathcal{L}_{ie}$ and $\mathcal{L}_{p}$ are specifically:

$$\mathcal{L}_{ie} = \sum_{i=1}^{M} p_i \log p_i, \qquad \mathcal{L}_{p} = -\sum_{i=1}^{M} p_i$$

where $N$ is the total number of task categories of the trained model; $M$ is the number of task categories of the target part, $M < N$; and $p_i$ is the frequency with which the teacher network assigns the $n$ generated pictures to the $i$-th category.
In the loss function, $\mathcal{L}_{BN}$ is specifically:

$$\mathcal{L}_{BN} = \sum_{l} \big\| \mu_l(\hat{x}) - \mu_l(x) \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \sigma_l^2(x) \big\|_2$$

where a real image is defined as $x \in \mathcal{X}$, an image produced by the generator is defined as $\hat{x}$, $\mu_l(\hat{x})$ is the mean of the generated pictures' features, $\sigma_l^2(\hat{x})$ is their variance, and $l$ indexes the $l$-th layer of the network;
Specifically, the loss function designed for the generator is:

$$\mathcal{L}_G = \alpha\,\mathcal{L}_{ce} + \beta\,\mathcal{L}_{ie} + \gamma\,\mathcal{L}_{p} + \delta\,\mathcal{L}_{BN}$$

The components of the loss function each have their own optimization objective. The cross-entropy term pushes the generated image toward a synthetic image that is fully accepted by the teacher network at the output layer; in other words, through it the generator learns how to make generated pictures that the teacher network recognizes successfully, treating the pseudo label given by the teacher network as if it were a real label. The entropy-related terms approach the problem from the angle of information entropy, computed from the teacher network's output distribution over the target categories: in information theory, how much required information a network output carries is expressed by quantifying the probability distribution of the output. If an image produced by the generator is judged with high probability to belong to a target category when it enters the teacher network, then, by the information-entropy theory, the uncertainty of that output is small, the information content is small, and the corresponding loss value is small. This alone is not sufficient, however: when class imbalance occurs, a minimum can still be reached while the numbers of generated pictures per class are unbalanced. In view of this, the balance term is introduced: its value is minimal when the frequency distribution over the categories is uniform, i.e., the generation network produces images of each category of the task target with equal probability, achieving balanced generation across image categories.
The terms $\mathcal{L}_{ie}$ and $\mathcal{L}_{p}$ take the specific forms:

$$\mathcal{L}_{ie} = \sum_{i=1}^{M} p_i \log p_i, \qquad \mathcal{L}_{p} = -\sum_{i=1}^{M} p_i$$
To account for the quality of the generated pictures, a regularization term $\mathcal{L}_{BN}$ for the image is added to the generator's loss function. Suppose a real image is defined as $x \in \mathcal{X}$ and an image produced by the generator as $\hat{x}$. To ensure that the features extracted from generated images in the middle layers of the teacher network resemble those of real images, the target problem is converted into minimizing the distance between the feature maps of generated and real images in the middle layers. Assuming the features extracted by an intermediate layer follow a Gaussian distribution, this regularization can be defined through the per-layer means and variances. When real images are not available for this computation, the mean and variance of the real-data distribution can be obtained from the outputs of the BN (batch normalization) layers of the teacher network: although how the teacher network was trained is unknown, a network that uses batch normalization captures the mean and variance of its batched inputs, so the network's statistics with respect to the real data can be obtained approximately. The distance of the network output feature maps can therefore be defined as:

$$\mathcal{L}_{BN} = \sum_{l} \big\| \mu_l(\hat{x}) - \mu_l(x) \big\|_2 + \big\| \sigma_l^2(\hat{x}) - \sigma_l^2(x) \big\|_2$$

where $\mu_l(x)$ and $\sigma_l^2(x)$ are taken from the running statistics of the $l$-th BN layer.
s102: generating a plurality of generated pictures according to the generation network;
s103: inputting the generated pictures into a trained teacher network and a trained student network, and carrying out knowledge distillation on the student network;
Knowledge distillation, proposed by Hinton et al., is a process that enables knowledge to be learned between networks, which may have different structures or similar structures with different capacities. Traditional knowledge distillation requires a trained, well-performing network as the teacher network, which is typically complex, while a smaller network designed around the task requirements serves as the student network. Knowledge distillation holds that the output of the last layer of the teacher network contains the rich knowledge learned by the model, reflected in its output distribution, so the student network's output is trained to imitate the output distribution of the teacher network's last layer.
In the technology of the invention, the distillation process is as follows: a set of $n$ random vectors $z_1, z_2, \dots, z_n$ is input into the generation network $G$, whose output is

$$\hat{x}_i = G(z_i), \quad i = 1, \dots, n.$$

The generated pictures are input into the teacher network and the student network respectively, giving the teacher output $y_i^{T}$ and the student output $y_i^{S}$. With knowledge distillation, the optimization objective of the student network is

$$W_S^{*} = \arg\min_{W_S} \frac{1}{n}\sum_{i=1}^{n} \mathcal{H}\big(y_i^{S}, y_i^{T}\big)$$

where $W_S$ denotes the parameters of the student network.
S104: and updating the student network.
Experiments with the model network knowledge extraction and compression technique, based on the combination of a generation network and knowledge distillation, were performed on three common image classification data sets: cifar10, cifar100 and Natural Scene Image Classification. The cifar10 and cifar100 images are 32 × 32 × 3 in size and the Natural Scene Image Classification images are 112 × 112 × 3. The task target of the trained model is image classification, and the task target of the smaller network model is to classify some of the image classes in the data set. The teacher network uses a trained Resnet34 structure and the student network a Resnet18 structure. The results are shown in the following table:
[Results table: rendered as images in the original publication; not reproducible here.]
The results show that using a generation network to migrate directly the required part of the task knowledge from the trained knowledge of a large teacher network yields an effective partial knowledge-distillation process with respect to the original model's accuracy on the partial task targets. Moreover, migrating partial task knowledge with different numbers of classes from the original model also works well.
The network knowledge extraction and compression technology based on the combination of a generation network and knowledge distillation provided by the invention can, when facing a large network, learn only the classification knowledge of specific categories in the large network according to the task at hand and migrate it to a smaller network. At the same time, the method depends less on data: knowledge distillation is performed without real data, reducing the dependence of conventional knowledge distillation on real data.
In addition, the method focuses on model compression and the extraction of task-target categories. With the growing computing capacity of high-speed computing equipment and the convenience of sharing network resources, trained networks are increasingly easy to obtain; how to extract part of the task knowledge in such a network and move it into a smaller network is, however, a practical problem. The method solves both problems at once and can adapt flexibly to different practical application requirements. Starting from a trained large-scale network, it lowers the cost of training a small-scale network that classifies part of the task targets well, and can be applied more conveniently in various systems.
The above description is only one embodiment of the present invention, but the design concept of the invention is not limited thereto; any insubstantial modification made using this design concept falls within the scope of the invention.

Claims (5)

1. A model network extraction and compression method based on generation network and knowledge distillation is characterized by comprising the following steps:
training the loss function of the generation network with a trained teacher network using the cifar10, cifar100 and Natural Scene Image Classification image data sets to obtain a trained generation network, wherein the task target of the trained teacher network is image classification;
generating a plurality of generated pictures according to the generation network;
inputting the generated pictures into a trained teacher network and a trained student network, and carrying out knowledge distillation on the student network;
updating the student network;
wherein training the loss function of the generation network with the trained teacher network to obtain the trained generation network uses the loss function:

$$\mathcal{L}_G = \alpha\,\mathcal{L}_{ce} + \beta\,\mathcal{L}_{ie} + \gamma\,\mathcal{L}_{p} + \delta\,\mathcal{L}_{BN}$$

wherein $\mathcal{L}_{ce}$ is the cross-entropy loss, for the generator, of the teacher network on the generated pictures; $\mathcal{L}_{ie}$ is the information entropy of the output over the target task; $\mathcal{L}_{p}$ involves the probability that a generated image is judged to belong to a target category by the teacher network; $\mathcal{L}_{BN}$ is the distance between network output feature maps; and $\alpha$, $\beta$, $\gamma$, $\delta$ are the weights of the four loss terms in the generator's loss function, each ranging from 0 to 1;
wherein inputting the generated pictures into the teacher network and the student network and performing knowledge distillation on the student network specifically comprises:

inputting a set of $n$ random vectors $z_1, z_2, \dots, z_n$ into the generation network $G$, whose output is

$$\hat{x}_i = G(z_i), \quad i = 1, \dots, n;$$

inputting the generated pictures into the teacher network and the student network respectively to obtain the teacher output $y_i^{T}$ and the student output $y_i^{S}$; with knowledge distillation, the optimization objective function of the student network being

$$W_S^{*} = \arg\min_{W_S} \frac{1}{n}\sum_{i=1}^{n} \mathcal{H}\big(y_i^{S}, y_i^{T}\big)$$

wherein $W_S$ is a parameter of the student network.
2. The model network extraction and compression method based on a generation network and knowledge distillation according to claim 1, wherein training the loss function of the generation network with the trained teacher network to obtain the trained generation network specifically comprises:
using the classification output of the trained teacher network on the pictures produced by the generation network as feedback;
calculating the loss function of the generation network from this feedback;
calculating the gradient of the loss function and updating the parameters of the generator network; and obtaining the trained generation network when the teacher network's outputs on the generated pictures and its classification outputs on real pictures meet the set requirements.
3. The model network extraction and compression method based on a generation network and knowledge distillation according to claim 1, wherein in the loss function $\mathcal{L}_{ce}$ is specifically:

$$\mathcal{L}_{ce} = \frac{1}{m}\sum_{i=1}^{m} \mathcal{H}\big(y_i^{T}, t_i\big)$$

wherein $y_i^{T}$ is the output of the teacher network on the $i$-th generated picture, $t_i$ is the pseudo label obtained from that output, and $m$ is the number of pictures the generator produces in one batch.
4. The model network extraction and compression method based on a generation network and knowledge distillation according to claim 3, wherein in the loss function $\mathcal{L}_{ie}$ and $\mathcal{L}_{p}$ are specifically:

$$\mathcal{L}_{ie} = \sum_{i=1}^{M} p_i \log p_i, \qquad \mathcal{L}_{p} = -\sum_{i=1}^{M} p_i$$

wherein $N$ is the total number of task categories of the trained model; $M$ is the number of task categories of the target part, $M < N$; and $p_i$ is the frequency with which the teacher network assigns the $m$ generated pictures to the $i$-th category.
5. The model network extraction and compression method based on a generation network and knowledge distillation as claimed in claim 3, wherein the loss function L_BNS specifically comprises:

L_BNS = Σ_l ( ||μ_l(x̂) − μ_l(x)||² + ||σ²_l(x̂) − σ²_l(x)||² )

where the real image is denoted x ∈ χ and the image produced by the generator is denoted x̂; μ_l(x̂) is the mean of the generated pictures at layer l; σ²_l(x̂) is the variance of the generated pictures at layer l; and l denotes the l-th layer of the network.
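A schematic version of the claim-5 statistic-matching loss: per layer l, squared differences between the generated pictures' mean/variance and the corresponding real-image statistics are summed. Treating the per-layer statistics as scalars (rather than per-channel vectors) is a simplification for illustration.

```python
def bn_statistics_loss(gen_stats, real_stats):
    """Sum over layers l of (mu_l(x_hat) - mu_l(x))**2 + (var_l(x_hat) - var_l(x))**2.
    Each entry is a (mean, variance) pair for one layer of the network."""
    loss = 0.0
    for (mu_g, var_g), (mu_r, var_r) in zip(gen_stats, real_stats):
        loss += (mu_g - mu_r) ** 2 + (var_g - var_r) ** 2
    return loss

# identical statistics give zero loss; any mismatch adds a squared penalty
matched = bn_statistics_loss([(0.0, 1.0), (0.5, 2.0)], [(0.0, 1.0), (0.5, 2.0)])
mismatched = bn_statistics_loss([(1.0, 1.0)], [(0.0, 1.0)])
```

In practice the real-image statistics need not come from real data at all: the means and variances stored in the teacher's batch-normalization layers can serve as the reference, which is what makes this loss usable in a data-free setting.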
CN202110320646.8A 2021-03-25 2021-03-25 Model network extraction and compression method based on generation network and knowledge distillation Active CN113112020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110320646.8A CN113112020B (en) 2021-03-25 2021-03-25 Model network extraction and compression method based on generation network and knowledge distillation


Publications (2)

Publication Number Publication Date
CN113112020A CN113112020A (en) 2021-07-13
CN113112020B true CN113112020B (en) 2022-06-28

Family

ID=76712144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110320646.8A Active CN113112020B (en) 2021-03-25 2021-03-25 Model network extraction and compression method based on generation network and knowledge distillation

Country Status (1)

Country Link
CN (1) CN113112020B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792606B (en) * 2021-08-18 2024-04-26 清华大学 Low-cost self-supervision pedestrian re-identification model construction method based on multi-target tracking
CN113688990B (en) * 2021-09-09 2024-08-16 贵州电网有限责任公司 Data-free quantitative training method for power edge calculation classification neural network
CN114095447B (en) * 2021-11-22 2024-03-12 成都中科微信息技术研究院有限公司 Communication network encryption flow classification method based on knowledge distillation and self-distillation
CN114897155A (en) * 2022-03-30 2022-08-12 北京理工大学 Integrated model data-free compression method for satellite
CN115564024B (en) * 2022-10-11 2023-09-15 清华大学 Characteristic distillation method, device, electronic equipment and storage medium for generating network
CN116594994B (en) * 2023-03-30 2024-02-23 重庆师范大学 Application method of visual language knowledge distillation in cross-modal hash retrieval

Citations (5)

Publication number Priority date Publication date Assignee Title
CN111160533A (en) * 2019-12-31 2020-05-15 中山大学 Neural network acceleration method based on cross-resolution knowledge distillation
CN111709476A (en) * 2020-06-17 2020-09-25 浪潮集团有限公司 Knowledge distillation-based small classification model training method and device
CN111967534A (en) * 2020-09-03 2020-11-20 福州大学 Incremental learning method based on generation of confrontation network knowledge distillation
CN112116030A (en) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector standardization and knowledge distillation
CN112465111A (en) * 2020-11-17 2021-03-09 大连理工大学 Three-dimensional voxel image segmentation method based on knowledge distillation and countertraining


Non-Patent Citations (4)

Title
Data-Free Learning of Student Networks; Hanting Chen et al.; arXiv; 2019-12-31; full text *
Densely Distilled Flow-Based Knowledge Transfer in Teacher-Student Framework for Image Classification; Ji-Hoon Bae et al.; IEEE Transactions on Image Processing; 2020-04-06; Vol. 29; full text *
Super-resolution convolutional neural network compression method based on knowledge distillation; Gao Qinquan et al.; Journal of Computer Applications; 2019-11-18; Vol. 39, No. 10; pp. 2802-2808 *
Research on model compression methods based on quantized convolutional neural networks; Hao Liyang; China Masters' Theses Full-text Database, Information Science and Technology; 2020-07-15; No. 7; pp. I138-1277 *


Similar Documents

Publication Publication Date Title
CN113112020B (en) Model network extraction and compression method based on generation network and knowledge distillation
CN108564029B (en) Face attribute recognition method based on cascade multitask learning deep neural network
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN112446423B (en) Fast hybrid high-order attention domain confrontation network method based on transfer learning
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN109816032A (en) Zero sample classification method and apparatus of unbiased mapping based on production confrontation network
CN112418351B (en) Zero sample learning image classification method based on global and local context sensing
CN113487629B (en) Image attribute editing method based on structured scene and text description
CN109635140B (en) Image retrieval method based on deep learning and density peak clustering
CN108710894A (en) A kind of Active Learning mask method and device based on cluster representative point
CN113569895A (en) Image processing model training method, processing method, device, equipment and medium
CN112862015A (en) Paper classification method and system based on hypergraph neural network
CN115937774A (en) Security inspection contraband detection method based on feature fusion and semantic interaction
CN109947948B (en) Knowledge graph representation learning method and system based on tensor
CN112017255A (en) Method for generating food image according to recipe
CN114357307B (en) News recommendation method based on multidimensional features
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
Fan et al. A global and local surrogate-assisted genetic programming approach to image classification
CN114202021A (en) Knowledge distillation-based efficient image classification method and system
CN116957304A (en) Unmanned aerial vehicle group collaborative task allocation method and system
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
Zhu et al. Incremental classifier learning based on PEDCC-loss and cosine distance
CN112990336B (en) Deep three-dimensional point cloud classification network construction method based on competitive attention fusion
He et al. ECS-SC: Long-tailed classification via data augmentation based on easily confused sample selection and combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant