CN113807214B - Small-target face recognition method based on DeiT auxiliary-network knowledge distillation - Google Patents


Info

Publication number
CN113807214B
CN113807214B (application CN202111015756.XA)
Authority
CN
China
Prior art keywords
network
teacher
DeiT
characteristic
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111015756.XA
Other languages
Chinese (zh)
Other versions
CN113807214A (en)
Inventor
宋尧哲
孟方舟
舒子婷
吴萌萌
童官军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS filed Critical Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202111015756.XA priority Critical patent/CN113807214B/en
Publication of CN113807214A publication Critical patent/CN113807214A/en
Application granted granted Critical
Publication of CN113807214B publication Critical patent/CN113807214B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a small-target face recognition method based on knowledge distillation with a DeiT auxiliary network, comprising the following steps: constructing a DeiT network as the student network, constructing a teacher network and appending a residual connection module to it, and training the student network on high-resolution face images with the teacher network; inputting a small-target face image into the trained student network to obtain a second classification feature and a second distillation feature; inputting an image of the same identity, but without downsampling, into the teacher network to obtain a second teacher feature; constructing a third loss function from the second classification feature and the ground-truth label, constructing a fourth loss function from the second distillation feature and the second teacher feature, and adding the two to obtain a second total loss; and performing secondary training of the trained DeiT network under the second total loss. The invention can effectively recognize small-target face images.

Description

Small-target face recognition method based on DeiT auxiliary-network knowledge distillation
Technical Field
The invention relates to the technical field of computer vision, and in particular to a small-target face recognition method based on DeiT auxiliary-network knowledge distillation.
Background
With the continual development of deep learning algorithms and of the corresponding large-scale datasets, face recognition has advanced greatly. Under the conditions of a fixed face pose (frontal face), clear images, and a closed-set environment (no "unknown" category), face recognition accuracy can exceed 99%.
In surveillance environments, however, owing to practical problems such as low camera resolution, distant face targets, and relative-motion blur, the small-target face images actually collected often exhibit varied poses (e.g., side face, raised head), low resolution (below 32×32 pixels), and noise interference. Moreover, since not every detected face target can be matched to an identity in the database under field surveillance conditions, small-target face recognition also becomes an open-set problem.
For these reasons, face recognition algorithms that perform excellently with fixed poses, clear images, and closed-set conditions tend to degrade drastically in real environments. The degradation appears in two ways: an algorithm trained on high-resolution face images performs far worse when tested directly on small-target face images from surveillance footage, and even an algorithm trained on such small-target surveillance images performs poorly when tested on the same kind of images. The reason is that training on a high-resolution dataset and testing on small-target images causes "domain shift", since the two datasets follow different distributions, and the model overfits; while training directly on small-target images makes features hard to extract because the resolution is too low (below 32×32 pixels), and no large-scale real-world low-resolution face recognition dataset exists among the public datasets, so it is difficult to train a network with discriminative power.
To address the difficulty of small-target face recognition in real environments, the two best-performing current algorithms both perform knowledge distillation based on a CNN, as follows. The teacher network is a CNN model pre-trained on high-resolution face images; its parameters are frozen during training and it serves only as a feature extractor. The student network has the same architecture as the teacher and is trained. During training, a high-resolution face image is fed to the teacher, and the small-target face image obtained by downsampling the same high-resolution image is fed to the student. A loss function is designed to pull the student's penultimate-layer features toward the teacher's corresponding layer, so that through distillation the student acquires the feature-extraction knowledge the teacher learned from high-resolution images, while the student's classification loss lets it learn from the small-target images themselves. In designing the distillation loss, earlier algorithms fed the high- and low-resolution face images directly into the loss function, which hurt high-resolution recognition accuracy; the improved algorithms therefore add a parallel feature-layer input to the loss. Because the teacher's feature layer is highly discriminative on high-resolution face images, designing the loss over the teacher and student feature layers lets the student learn the same discriminative features on downsampled images of the same identity, improving the student network's discriminative power on low-resolution faces.
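The feature-space distillation described above can be sketched as follows. This is an illustrative reconstruction, not the patent's exact loss: the helper name `kd_feature_loss`, the L2 feature term, and the weighting `alpha` are assumptions for the sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_feature_loss(student_feat, teacher_feat, student_logits, label, alpha=0.5):
    """Classification cross-entropy on the low-resolution input, plus an L2
    term pulling the student's penultimate-layer features toward the frozen
    teacher's features extracted from the high-resolution image."""
    ce = -np.log(softmax(student_logits)[label] + 1e-12)   # classification loss
    feat = np.mean((student_feat - teacher_feat) ** 2)     # feature-imitation loss
    return (1 - alpha) * ce + alpha * feat
```

When the student's features already match the teacher's, only the classification term remains, so the loss rewards both correct identity prediction and teacher-like features.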
Disclosure of Invention
The technical problem to be solved by the invention is to provide a small-target face recognition method based on DeiT auxiliary-network knowledge distillation that can effectively recognize small-target face images.
The technical solution adopted to solve the above technical problem is a small-target face recognition method based on DeiT auxiliary-network knowledge distillation, comprising the following steps:
step (1): constructing a DeiT network as the student network, preprocessing a selected training set, and inputting the preprocessed training set into the student network to obtain a first classification feature and a first distillation feature;
step (2): selecting a teacher network pre-trained on the dataset, and inputting the preprocessed training set into the teacher network to obtain a first teacher feature;
step (3): adding a residual connection module after the last discrimination layer of the teacher network, the residual connection module participating in training;
step (4): constructing a first loss function from the first classification feature and the ground-truth label, constructing a second loss function from the first distillation feature and the first teacher feature, and adding the first and second loss functions to obtain a first total loss;
training the student network on first face images with the teacher network under the first total loss;
step (5): inputting a second face image into the trained student network to obtain a second classification feature and a second distillation feature;
the pixel resolution of the first face images being higher than that of the second face images;
step (6): inputting an image of the same identity as the second face image, but not downsampled, into the teacher network to obtain a second teacher feature;
step (7): constructing a third loss function from the second classification feature and the ground-truth label, constructing a fourth loss function from the second distillation feature and the second teacher feature, and adding the third and fourth loss functions to obtain a second total loss;
performing secondary training on the trained student network under the second total loss;
step (8): recognizing newly input second face images with the secondarily trained student network.
In step (1), the selected training set is preprocessed and then input into the student network, specifically: each image in the training set is resized to 224×224 by interpolation and cut into 14×14 = 196 image blocks of size 16×16, and the image blocks are input into the student network.
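The resize-and-patchify preprocessing can be sketched with plain array reshaping. The helper `to_patches` is hypothetical (the patent does not give implementation details), but the shapes match the description: a 224×224 image yields 14×14 = 196 patches, each flattened to 16×16×3 = 768 values.

```python
import numpy as np

def to_patches(img, patch=16):
    """Split an H x W x C image (H and W assumed divisible by `patch`)
    into a (num_patches, patch*patch*C) sequence of flattened blocks,
    as done before feeding the DeiT student."""
    h, w, c = img.shape
    g_h, g_w = h // patch, w // patch
    x = img.reshape(g_h, patch, g_w, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)                 # (g_h, g_w, patch, patch, c)
    return x.reshape(g_h * g_w, patch * patch * c)

seq = to_patches(np.zeros((224, 224, 3)))          # 196 patches of dimension 768
```

Note that 768 is exactly the patch-token dimension mentioned later in the embodiment.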
In step (1), VGGFace2 high-resolution face images are used as the training set.
Step (2) specifically comprises: selecting an SE-ResNet network pre-trained on the dataset as the teacher network, fixing its parameters so that it acts purely as a feature extractor, and inputting the preprocessed training set into the teacher network to obtain the first teacher feature.
In step (5), a second face image is input into the trained student network, specifically: the input to the trained student network is downsampled to 16×16 and then enlarged by interpolation, yielding a downsampled 224×224 second face image.
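The downsample-then-enlarge degradation can be sketched as below. Nearest-neighbour striding and pixel repetition stand in for the interpolation the patent uses; a real pipeline would use bilinear or bicubic resampling, so this is only an illustration of the shape-preserving degradation.

```python
import numpy as np

def degrade(img, low=16, high=224):
    """Shrink a high x high image to low x low, then blow it back up to
    high x high, so the student still receives a 224 x 224 input while
    the actual detail corresponds to only low x low pixels."""
    step = high // low
    small = img[::step, ::step]                              # 224 -> 16 by striding
    return np.repeat(np.repeat(small, step, axis=0), step, axis=1)
```

The output keeps the 224×224 shape but carries at most 16×16 = 256 distinct pixel values, which is the information loss the distillation is designed to compensate.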
Advantageous effects
Owing to the above technical solution, the invention has the following advantages and positive effects over the prior art. The student network adopts a transformer architecture rather than a CNN as its model backbone; the transformer's non-local attention mechanism relates every pixel of the input image to the information of all other pixels, so the network learns holistic image features, and after pre-training its performance loss on low-resolution images is far smaller than that of CNN backbones, avoiding the performance loss and overfitting caused by interpolating downsampled images up to the dimensions of high-resolution images. By adding an auxiliary residual connection module to the teacher network, the invention parameterizes "what knowledge the teacher should teach", avoiding the "model capacity gap" problem; the knowledge distillation becomes combined online-offline distillation, a stable and easily converged model is obtained adaptively, and the student network absorbs useful information from the teacher network. With this DeiT-based auxiliary-network knowledge distillation, the invention reaches 71.1% accuracy on the test set of TinyFace, a native low-resolution face dataset, which is the highest accuracy among end-to-end face recognition algorithms that do not enhance the test set.
Drawings
FIG. 1 is a schematic diagram of the DeiT network of an embodiment of the invention;
FIG. 2 is a schematic diagram of the residual connection module of an embodiment of the invention;
FIG. 3 is a schematic diagram of the overall architecture of the teacher network of an embodiment of the invention.
Detailed Description
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.
The embodiment of the invention relates to a small-target face recognition method based on DeiT auxiliary-network knowledge distillation, comprising the following steps:
1. Construct a DeiT network as the student network (see fig. 1), select VGGFace2 high-resolution face images as the training set, resize each image to 224×224 by interpolation, cut it into 14×14 = 196 image blocks of size 16×16, and input the image blocks into the DeiT network to obtain the first classification feature and the first distillation feature.
In fig. 1, the patch tokens are 768-dimensional features obtained by linear-layer encoding of the 16×16 image blocks; the class token and the distillation token are each a learnable embedding vector of the same dimension as the patch tokens. The class token feeds the discrimination layer whose loss is computed against the ground-truth label, and the distillation token feeds the discrimination layer whose loss is computed against the teacher network's output.
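The assembly of the DeiT input sequence can be sketched as below. The random values stand in for learned parameters and encoded patches; only the shapes reflect the description above (two learnable tokens prepended to 196 patch tokens of dimension 768).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 768                                       # token dimension

# Learnable embedding vectors, same dimension as the patch tokens.
class_token = rng.normal(size=(1, D))         # trained against the ground-truth label
dist_token = rng.normal(size=(1, D))          # trained against the teacher's output

# 14*14 = 196 patch tokens after linear-layer encoding of the 16x16 blocks.
patch_tokens = rng.normal(size=(196, D))

# DeiT-style input sequence: [class; distillation; patches].
sequence = np.concatenate([class_token, dist_token, patch_tokens], axis=0)
```

After the transformer layers, the outputs at the first two sequence positions are read out as the classification feature and the distillation feature, respectively.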
2. Select the SE-ResNet network pre-trained on the VGGFace2 dataset as the teacher network, fix its parameters so that it acts purely as a feature extractor, and input the same face images as in step 1 into the teacher network to obtain the first teacher feature.
3. Construct the first loss function from the first classification feature of step 1 and the ground-truth label, construct the second loss function from the first distillation feature of step 1 and the first teacher feature of step 2, add the first and second loss functions to obtain the first total loss, and train the DeiT network on high-resolution face images with the teacher network under the first total loss. The ground-truth label here is the person identity ID before downsampling, i.e., the ground-truth label of step 1.
Steps 1 to 3 pre-train the network on high-resolution face image information before it is trained on downsampled face images, so that the network learns basic features for face recognition. The features learned from high-resolution images can then be exploited in the subsequent training with low-resolution images, avoiding the convergence difficulties that arise when the model is trained directly on low-resolution face images and the task is too complex.
4. Add the auxiliary residual connection module after the last discrimination layer of the teacher network of step 2. The last discrimination layer and everything before it remain frozen as a feature extractor; only the newly added residual connection module participates in training. Fig. 3 shows the overall architecture of the teacher network incorporating the residual connection module.
Fig. 2 shows the auxiliary residual connection module. By adding it, "what knowledge the teacher should teach" is parameterized, which avoids the "model capacity gap" problem; the knowledge distillation becomes combined online-offline distillation, a stable and easily converged model is obtained adaptively, and the student network can absorb useful information from the teacher network.
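One common way to realize such a module, sketched below under assumptions (the patent's exact layer sizes are given only in fig. 2, so the class name, MLP shape, and initialization scale here are hypothetical): a small trainable MLP whose output is added back to its input through a skip connection, so the module starts near the identity mapping and learns only a correction to the frozen teacher's features.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ResidualAdapter:
    """Hypothetical auxiliary residual connection module appended after the
    frozen teacher's last discrimination layer. Small initial weights keep
    the module close to the identity at the start of training."""
    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=1e-3, size=(dim, hidden))
        self.w2 = rng.normal(scale=1e-3, size=(hidden, dim))

    def __call__(self, feat):
        # skip connection: output = input + learned correction
        return feat + relu(feat @ self.w1) @ self.w2
```

Because only this module trains while the teacher stays frozen, the teacher's output can adapt to what the student can absorb, which is the online-offline combination described above.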
5. Input into the DeiT model trained in step 3 the small-target face images obtained by downsampling to 16×16 and then enlarging by interpolation to 224×224, to obtain the second classification feature and the second distillation feature.
6. Input an image of the same identity as in step 5, but not downsampled, into the teacher network of step 4 to obtain the second teacher feature.
7. Construct the third loss function from the second classification feature of step 5 and the ground-truth label, construct the fourth loss function from the second distillation feature of step 5 and the second teacher feature of step 6, add the two to obtain the second total loss, and perform secondary training on the trained DeiT network under the second total loss. The ground-truth label is the person identity ID after downsampling; since the person's identity is unchanged by downsampling, it is the same label as in steps 1 and 3.
Further, the first and second total losses of steps 3 and 7 can be expressed as:

L_global = (1 - λ) · L_CE(ψ(Z_s), y) + λ · τ² · KL(ψ(Z_s / τ), ψ(Z_t / τ))

where λ is the weighting coefficient balancing the terms of the total loss (0.5 in this embodiment); ψ(·) is the softmax function; Z_s is the output of the trained DeiT (student) network and Z_t is the output of the teacher network; y is the ground-truth label, i.e., the person identity ID of the face image; and τ is the knowledge-distillation temperature coefficient (1.25 in this embodiment). Dividing Z_s and Z_t by the temperature τ softens the outputs of the teacher and student networks, which aids distillation; ψ(Z_s/τ) and ψ(Z_t/τ) are the softened student and teacher outputs after softmax.
L_CE(·) is the cross-entropy loss; for a predicted distribution p and a one-hot label y it can be expressed as L_CE(p, y) = -Σ_i y_i · log(p_i).
KL(·) is the Kullback-Leibler divergence; for distributions p and q it can be expressed as KL(p, q) = Σ_i p_i · log(p_i / q_i).
8. Recognize the input small-target face images with the secondarily trained DeiT network.
9. Tested on the public TinyFace dataset, the invention achieves a Rank-1 accuracy of 71.1%, the highest accuracy among current algorithms on this dataset; specific results are shown in Table 1.
Table 1. Comparison of experimental results on TinyFace (%)

Model            Rank-1   Rank-20   mAP
DeepId2          17.4     25.2      12.1
SphereFace       22.3     35.5      16.2
VGGFace          30.4     40.4      23.1
CenterFace       32.1     44.5      24.6
CSRI             45.2     60.2      39.9
T-C              58.6     73.0      52.7
Shi              63.9     /         /
SafwanKhalid     70.4     82.2      63.2
This embodiment  71.13    84.09     64.58
In summary, the student network adopts a transformer architecture rather than a CNN as its backbone; using the transformer's non-local attention mechanism, every pixel of the input image is related to the information of all other pixels, so the network learns holistic image features, and after pre-training its performance loss on low-resolution images is far smaller than that of CNN backbones. By adding the auxiliary residual connection module to the teacher network, the knowledge distillation becomes combined online-offline distillation, a stable and easily converged model is obtained adaptively, and the student network absorbs useful information from the teacher network.

Claims (5)

1. A small-target face recognition method based on DeiT auxiliary-network knowledge distillation, characterized by comprising the following steps:
step (1): constructing a DeiT network as the student network, preprocessing a selected training set, and inputting the preprocessed training set into the student network to obtain a first classification feature and a first distillation feature;
step (2): selecting a teacher network pre-trained on the dataset, and inputting the preprocessed training set into the teacher network to obtain a first teacher feature;
step (3): adding a residual connection module after the last discrimination layer of the teacher network, the residual connection module participating in training;
step (4): constructing a first loss function from the first classification feature and the ground-truth label, constructing a second loss function from the first distillation feature and the first teacher feature, and adding the first and second loss functions to obtain a first total loss;
training the student network on first face images with the teacher network under the first total loss;
step (5): inputting a second face image into the trained student network to obtain a second classification feature and a second distillation feature;
the pixel resolution of the first face images being higher than that of the second face images;
step (6): inputting an image of the same identity as the second face image, but not downsampled, into the teacher network to obtain a second teacher feature;
step (7): constructing a third loss function from the second classification feature and the ground-truth label, constructing a fourth loss function from the second distillation feature and the second teacher feature, and adding the third and fourth loss functions to obtain a second total loss;
performing secondary training on the trained student network under the second total loss;
step (8): recognizing newly input second face images with the secondarily trained student network.
2. The small-target face recognition method based on DeiT auxiliary-network knowledge distillation according to claim 1, characterized in that in step (1) the selected training set is preprocessed and then input into the student network, specifically: each image in the training set is resized to 224×224 by interpolation and cut into 14×14 = 196 image blocks of size 16×16, and the image blocks are input into the student network.
3. The small-target face recognition method based on DeiT auxiliary-network knowledge distillation according to claim 1, characterized in that VGGFace2 high-resolution face images are used as the training set in step (1).
4. The small-target face recognition method based on DeiT auxiliary-network knowledge distillation according to claim 1, characterized in that step (2) specifically comprises: selecting an SE-ResNet network pre-trained on the dataset as the teacher network, fixing its parameters so that it acts purely as a feature extractor, and inputting the preprocessed training set into the teacher network to obtain the first teacher feature.
5. The small-target face recognition method based on DeiT auxiliary-network knowledge distillation according to claim 1, characterized in that in step (5) a second face image is input into the trained student network, specifically: the input to the trained student network is downsampled to 16×16 and then enlarged by interpolation, yielding a downsampled 224×224 second face image.
CN202111015756.XA 2021-08-31 2021-08-31 Small-target face recognition method based on DeiT auxiliary-network knowledge distillation Active CN113807214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111015756.XA CN113807214B (en) 2021-08-31 2021-08-31 Small-target face recognition method based on DeiT auxiliary-network knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111015756.XA CN113807214B (en) 2021-08-31 2021-08-31 Small-target face recognition method based on DeiT auxiliary-network knowledge distillation

Publications (2)

Publication Number Publication Date
CN113807214A CN113807214A (en) 2021-12-17
CN113807214B true CN113807214B (en) 2024-01-05

Family

ID=78894457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111015756.XA Active CN113807214B (en) 2021-08-31 2021-08-31 Small-target face recognition method based on DeiT auxiliary-network knowledge distillation

Country Status (1)

Country Link
CN (1) CN113807214B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947801B (en) * 2021-12-21 2022-07-26 中科视语(北京)科技有限公司 Face recognition method and device and electronic equipment
CN114743243B (en) * 2022-04-06 2024-05-31 平安科技(深圳)有限公司 Human face recognition method, device, equipment and storage medium based on artificial intelligence

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN110458765A (en) * 2019-01-25 2019-11-15 西安电子科技大学 The method for enhancing image quality of convolutional network is kept based on perception
CN110674688A (en) * 2019-08-19 2020-01-10 深圳力维智联技术有限公司 Face recognition model acquisition method, system and medium for video monitoring scene
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111160533A (en) * 2019-12-31 2020-05-15 中山大学 Neural network acceleration method based on cross-resolution knowledge distillation
CN111291637A (en) * 2020-01-19 2020-06-16 中国科学院上海微系统与信息技术研究所 Face detection method, device and equipment based on convolutional neural network
CN111444760A (en) * 2020-02-19 2020-07-24 天津大学 Traffic sign detection and identification method based on pruning and knowledge distillation
CN112116030A (en) * 2020-10-13 2020-12-22 浙江大学 Image classification method based on vector standardization and knowledge distillation
CN112465138A (en) * 2020-11-20 2021-03-09 平安科技(深圳)有限公司 Model distillation method, device, storage medium and equipment
WO2021047286A1 (en) * 2019-09-12 2021-03-18 华为技术有限公司 Text processing model training method, and text processing method and apparatus
CN112613303A (en) * 2021-01-07 2021-04-06 福州大学 Knowledge distillation-based cross-modal image aesthetic quality evaluation method
CN112784964A (en) * 2021-01-27 2021-05-11 西安电子科技大学 Image classification method based on bridging knowledge distillation convolution neural network
WO2021118737A1 (en) * 2019-12-11 2021-06-17 Microsoft Technology Licensing, Llc Sentence similarity scoring using neural network distillation
CN112988975A (en) * 2021-04-09 2021-06-18 北京语言大学 Viewpoint mining method based on ALBERT and knowledge distillation
CN113205002A (en) * 2021-04-08 2021-08-03 南京邮电大学 Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN113240580A (en) * 2021-04-09 2021-08-10 暨南大学 Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN113257361A (en) * 2021-05-31 2021-08-13 中国科学院深圳先进技术研究院 Method, device and equipment for realizing self-adaptive protein prediction framework

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10872596B2 (en) * 2017-10-19 2020-12-22 Baidu Usa Llc Systems and methods for parallel wave generation in end-to-end text-to-speech
US11620515B2 (en) * 2019-11-07 2023-04-04 Salesforce.Com, Inc. Multi-task knowledge distillation for language model

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830813A (en) * 2018-06-12 2018-11-16 Fujian Dishi Information Technology Co., Ltd. Image super-resolution enhancement method based on knowledge distillation
CN110458765A (en) * 2019-01-25 2019-11-15 Xidian University Image quality enhancement method based on perception-preserving convolutional network
CN110674688A (en) * 2019-08-19 2020-01-10 Shenzhen Liwei Zhilian Technology Co., Ltd. Face recognition model acquisition method, system and medium for video monitoring scenes
WO2021047286A1 (en) * 2019-09-12 2021-03-18 Huawei Technologies Co., Ltd. Text processing model training method, and text processing method and apparatus
CN110674714A (en) * 2019-09-13 2020-01-10 Southeast University Joint face and face key point detection method based on transfer learning
WO2021118737A1 (en) * 2019-12-11 2021-06-17 Microsoft Technology Licensing, Llc Sentence similarity scoring using neural network distillation
CN111160533A (en) * 2019-12-31 2020-05-15 Sun Yat-sen University Neural network acceleration method based on cross-resolution knowledge distillation
CN111291637A (en) * 2020-01-19 2020-06-16 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Face detection method, device and equipment based on convolutional neural network
CN111444760A (en) * 2020-02-19 2020-07-24 Tianjin University Traffic sign detection and identification method based on pruning and knowledge distillation
CN112116030A (en) * 2020-10-13 2020-12-22 Zhejiang University Image classification method based on vector standardization and knowledge distillation
CN112465138A (en) * 2020-11-20 2021-03-09 Ping An Technology (Shenzhen) Co., Ltd. Model distillation method, device, storage medium and equipment
CN112613303A (en) * 2021-01-07 2021-04-06 Fuzhou University Knowledge distillation-based cross-modal image aesthetic quality evaluation method
CN112784964A (en) * 2021-01-27 2021-05-11 Xidian University Image classification method based on bridging knowledge distillation convolution neural network
CN113205002A (en) * 2021-04-08 2021-08-03 Nanjing University of Posts and Telecommunications Low-definition face recognition method, device, equipment and medium for unconstrained video monitoring
CN112988975A (en) * 2021-04-09 2021-06-18 Beijing Language and Culture University Viewpoint mining method based on ALBERT and knowledge distillation
CN113240580A (en) * 2021-04-09 2021-08-10 Jinan University Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN113257361A (en) * 2021-05-31 2021-08-13 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Method, device and equipment for realizing self-adaptive protein prediction framework

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
"A Seismic-Based Feature Extraction Algorithm for Robust Ground Target Classification"; Qianwei Zhou et al.; IEEE Signal Processing Letters; Vol. 19; full text *
"Contact Angle of an Evaporating Droplet of Binary Solution on a Super Wetting Surface"; Mengmeng Wu et al.; arXiv; full text *
"On the Demystification of Knowledge Distillation: A Residual Network Perspective"; Nandan Kumar Jha et al.; arXiv; full text *
"TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition"; Ji Won Yoon et al.; IEEE/ACM Transactions on Audio, Speech, and Language Processing; Vol. 29; full text *
"Object Detection Algorithm Based on Improved YOLOv3"; Zhao Qiong et al.; Laser & Optoelectronics Progress; Vol. 57, No. 12; full text *
"Face Restoration and Expression Recognition Based on Generative Adversarial Networks and Knowledge Distillation"; Jiang Huiming; China Master's Theses Full-text Database; full text *
"A Survey of Object Detection and Recognition Algorithms for Large Fields of View"; Li Tangwei et al.; Laser & Optoelectronics Progress; Vol. 57, No. 12; full text *
"Pre-trained Models for Natural Language Processing: A Survey"; Qiu XiPeng, Sun TianXiang, Xu YiGe, Shao YunFan, Dai Ning, Huang XuanJing; Science China (Technological Sciences); No. 10; full text *
"Ultra-High-Definition Video Quality Enhancement Technology and Its Chip-Based Implementation"; Gao Xinbo, Lu Wen, Zha Lin, Hui Zheng, Qi Tongshuai, Jiang Jiande; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); No. 5; full text *

Also Published As

Publication number Publication date
CN113807214A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
CN112926396B (en) Action identification method based on double-current convolution attention
CN111652202B (en) Method and system for solving video question-answer problem by improving video-language representation learning through self-adaptive space-time diagram model
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
CN113807214B (en) Small target face recognition method based on deit affiliated network knowledge distillation
CN114492574A (en) Unsupervised adversarial domain-adaptive image classification method with pseudo-label loss based on a Gaussian-uniform mixture model
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN112307982A (en) Human behavior recognition method based on staggered attention-enhancing network
CN111598167B (en) Small sample image identification method and system based on graph learning
CN112085055A (en) Black-box attack method based on transfer-model Jacobian matrix eigenvector perturbation
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN114842343A (en) ViT-based aerial image identification method
CN116452862A (en) Image classification method based on domain generalization learning
CN111291705A (en) Cross-multi-target-domain pedestrian re-identification method
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN116543250A (en) Model compression method based on class attention transmission
CN113159071B (en) Cross-modal image-text association anomaly detection method
CN115661539A (en) Less-sample image identification method embedded with uncertainty information
CN115630361A (en) Attention distillation-based federal learning backdoor defense method
CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
CN114780767A (en) Large-scale image retrieval method and system based on deep convolutional neural network
CN114549958A (en) Night and disguised target detection method based on context information perception mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant