CN117351279A - Self-distillation realization method for space-time distillation fusion - Google Patents

Self-distillation realization method for space-time distillation fusion

Info

Publication number
CN117351279A
Authority
CN
China
Prior art keywords
distillation
network
mean
training
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311305326.0A
Other languages
Chinese (zh)
Inventor
朱隆熙
徐锋
刘宁钟
汪俊杰
谭健
王淑君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Lemote Information Technology Co ltd
Nanjing University of Aeronautics and Astronautics
Original Assignee
Jiangsu Lemote Information Technology Co ltd
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Lemote Information Technology Co ltd and Nanjing University of Aeronautics and Astronautics
Priority to CN202311305326.0A
Publication of CN117351279A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision and provides a self-distillation realization method for space-time distillation fusion, which comprises the following steps: S1, acquiring a CIFAR data set, dividing it into a training set and a testing set, and applying data augmentation; S2, constructing a distillation framework neural network: a residual network is used as the backbone, the features of its four stages are taken as branches, and a bottleneck layer and an FC layer are added to each branch as the prediction of a student network, while the last layer serves as the teacher network for distillation; meanwhile, the prediction result of the previous training round is used as guidance in the time dimension, so that the model is trained under the supervision of softened hard labels; S3, sending the augmented and divided CIFAR data set into the distillation framework neural network and training until the network converges, obtaining a weight file; and S4, measuring the classification accuracy on the test images with the trained distillation framework neural network and the weight file. The invention improves the accuracy of the model under distillation.

Description

Self-distillation realization method for space-time distillation fusion
Technical Field
The invention relates to the technical field of computer vision, in particular to a self-distillation realization method for space-time distillation fusion.
Background
Deep learning has made great progress in recent years, but its huge computation and parameter counts make it difficult to deploy on resource-constrained devices. To make deep models more efficient, researchers have explored knowledge distillation. In 2006, Bucilua et al. first proposed the idea of migrating the knowledge of a large model into a small model. In 2015, Hinton et al. formally introduced the now well-known concept of knowledge distillation. Its main idea is that the student model attains accuracy comparable to the teacher model by imitating it; a key problem is how to transfer the knowledge of the teacher model to the student model.
Traditional knowledge distillation can be categorized into response-based and feature-based knowledge distillation. Response-based knowledge generally refers to the neural response of the last output layer of the teacher model, and the main idea is to directly mimic the teacher model's final prediction. Response-based knowledge distillation is a simple and effective model-compression method and is widely applied across different tasks and applications.
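As a concrete illustration of the response-based objective, a minimal PyTorch sketch of the Hinton-style temperature-softened loss follows; the temperature T and weight alpha are illustrative choices, not values from this patent.

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Response-based distillation: the student mimics the teacher's
    temperature-softened output distribution (Hinton et al., 2015)."""
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
    # Standard cross-entropy against the hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```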
Feature-based knowledge distillation draws on the intermediate layers and is a natural extension of response-based knowledge: the feature maps of intermediate layers are used as knowledge to supervise the training of the student model. The most direct idea is to match the activation values of intermediate features; in particular, Zagoruyko and Komodakis (2017) propose representing knowledge with attention maps, and, to match semantic information between teacher and student, Chen et al. (2021) propose cross-layer KD, which adaptively assigns layers of the teacher network to layers of the student network through attention allocation. However, these two classical approaches have two disadvantages. The first is inefficient knowledge transfer, meaning the student model exploits little or none of the knowledge in the teacher model; it is still rare for a student model to perform better than its teacher. The second is the difficulty of designing and training a suitable teacher model: existing distillation frameworks require considerable effort and experimentation to find the best teacher architecture, which takes a long time; for example, the conventional distillation method takes 14.67 hours to train the teacher network ResNet152 on CIFAR100, and another 12.31 hours to train the student network ResNet50 in the second step.
Disclosure of Invention
The invention aims to provide a self-distillation realization method for space-time distillation fusion.
The invention aims to solve the problems in the prior art that pre-training the teacher network of a distillation framework neural network is time-consuming, and that the large scale difference between the teacher network and the student network leads to poor student accuracy.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
a method for implementing self-distillation of space-time distillation fusion, comprising: s1, acquiring a CIFAR data set, and dividing the CIFAR data set into a training set, a testing set and data augmentation; s2, constructing a distillation frame neural network, using a residual network as a backbone network, taking the characteristics of four stages as branches when constructing the network, adding a bottleck layer and an FC layer as predictions of a student network, and using the last layer as a teacher network for distillation; meanwhile, the prediction result of the previous round of model is used as a guide in the time dimension, so that model training is performed under the guidance of the softening hard tag; s3, sending the CIFAR data set subjected to data augmentation and division into the distillation framework neural network for training until the distillation framework neural network converges, and obtaining a weight file; and S4, detecting the classification accuracy in the test image by using the trained distillation framework neural network and the weight file.
As a further improvement, in the step S1, the CIFAR10 and CIFAR100 data sets are used, and the training and test sets are divided at a ratio of five to one.
As a further improvement, in the step S2, a residual network is used as the backbone, and the target convolutional neural network is first divided into several shallow segments according to its depth and original structure; conceptually, the shallow networks are regarded as student models and the deep network is regarded as the teacher model.
As a further improvement, the step S2 includes: S21, regarding the prediction results of the different shallow networks in the residual network as student networks, and inserting after each shallow block a bottleneck layer and a fully connected layer that are used only during training and can be removed at inference; S22, sorting the feature predictions of the last layer for each sample, regarding the top-1 category at each position as one vote, aggregating the votes into a histogram, ranking the categories by their frequency of occurrence in the histogram, and using this frequency information to guide the learning of the shallow layers, as sketched below.
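The voting scheme of S22 can be sketched as follows; this is an interpretation under stated assumptions: the per-position classifier (here a 1x1 convolution supplied by the caller), the function names, and the softening temperature are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def frequency_soft_label(deep_feats, position_classifier, num_classes, T=4.0):
    """Each spatial position of the deepest feature map casts a top-1 vote;
    the per-class vote histogram, softened, guides the shallow branches."""
    b = deep_feats.size(0)
    logits = position_classifier(deep_feats)       # (B, num_classes, H, W)
    votes = logits.flatten(2).argmax(dim=1)        # (B, H*W) class indices
    hist = torch.zeros(b, num_classes, device=votes.device)
    # Aggregate the per-position votes into a per-sample class histogram.
    hist.scatter_add_(1, votes, torch.ones_like(votes, dtype=hist.dtype))
    return F.softmax(hist / T, dim=1)              # frequency soft label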
As a further improvement, the step S3 includes: S31, processing the CIFAR data set with the data enhancement methods of random cropping and random horizontal flipping, chosen for the size of the targets in the CIFAR data set; S32, optimizing with stochastic gradient descent and decaying the learning rate twice from its initial value, so that the neural network achieves a better distillation result; S33, trying different training hyperparameters on the neural network, augmenting the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, obtaining the self-distilled network file and weight file.
As a further improvement, the step S31 includes: before training the distillation framework neural network, assume the CIFAR100 dataset consists of n samples, denoted x1, x2, ..., xn. The mean of the dataset is the sum of all sample values divided by the number of samples, i.e. mean = (x1 + x2 + ... + xn) / n; the difference between each data point and the mean is calculated: (x1 - mean), (x2 - mean), ..., (xn - mean); the differences are squared: (x1 - mean)^2, (x2 - mean)^2, ..., (xn - mean)^2; the mean of the squared differences is calculated: [(x1 - mean)^2 + (x2 - mean)^2 + ... + (xn - mean)^2] / n; the data is then normalized by converting it into a distribution with mean 0 and standard deviation 1, that is, normalized value = (original value - mean) / standard deviation.
As a further improvement, the step S32 includes: random weights are used as the initial weights, and the learning rate, number of iterations, batch_size, and other hyperparameters are set; at rounds 100 and 150 the learning rate is decayed from its initial value, so that the distillation framework neural network achieves a better detection result.
As a further improvement, the step S33 includes: augmenting the input images and training; when the loss function converges or the maximum number of iterations is reached, training stops and the distilled weight file is obtained.
As a further improvement, the step S4 includes: S41, sending the test image into the improved residual network backbone and obtaining the convolution features of the four stages; S42, performing weighted averaging and prediction on the convolution features of the four stages respectively; S43, obtaining the ensemble prediction of the four stages through a simple weighted average, comparing the four stage results with the ensemble result (five results in total), and selecting the one with the highest prediction accuracy as the final result.
The beneficial effects of the invention are as follows:
according to the invention, on the basis of the residual network backbone network, the deep network is used as a teacher network to distill the shallow student network, so that the shallow student network can learn deeper semantic information, and the classification accuracy of the model is enhanced.
The improved distillation loss uses decoupled knowledge distillation, so that the dark knowledge contained in the non-target categories can be exploited more effectively, improving the accuracy of target picture classification.
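The decoupled distillation loss referred to here follows the DKD formulation of Zhao et al. (2022); a condensed sketch, with the weights alpha, beta and temperature T as illustrative hyperparameters rather than the patent's settings:

```python
import torch
import torch.nn.functional as F

def dkd_loss(s_logits, t_logits, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: split the KL objective into a target-class term (TCKD)
    and a non-target-class term (NCKD) so the dark knowledge among
    non-target categories can be weighted independently."""
    gt = F.one_hot(target, s_logits.size(1)).bool()
    p_s = F.softmax(s_logits / T, dim=1)
    p_t = F.softmax(t_logits / T, dim=1)
    # Binary target / non-target probability masses for TCKD.
    pt_s = torch.stack([(p_s * gt).sum(1), (p_s * ~gt).sum(1)], dim=1)
    pt_t = torch.stack([(p_t * gt).sum(1), (p_t * ~gt).sum(1)], dim=1)
    tckd = F.kl_div(pt_s.log(), pt_t, reduction="batchmean") * T * T
    # Distribution over non-target classes only (target logit masked out).
    ns = F.log_softmax(s_logits / T - 1000.0 * gt, dim=1)
    nt = F.softmax(t_logits / T - 1000.0 * gt, dim=1)
    nckd = F.kl_div(ns, nt, reduction="batchmean") * T * T
    return alpha * tckd + beta * nckd
```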
The method addresses the problems that pre-training the teacher network in existing distillation frameworks is time-consuming and that the accuracy of small models falls short, improving the accuracy of the model under distillation.
Drawings
FIG. 1 is a schematic diagram of a method for realizing self-distillation of space-time distillation fusion according to an embodiment of the invention.
Fig. 2 is a schematic diagram of test results of a self-distillation implementation method of space-time distillation fusion according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Referring to fig. 1, a method for realizing self-distillation of space-time distillation fusion comprises the following steps: S1, acquiring a CIFAR data set, dividing it into a training set and a testing set, and applying data augmentation; S2, constructing a distillation framework neural network: a residual network is used as the backbone, the features of its four stages are taken as branches, and a bottleneck layer and an FC layer are added to each branch as the prediction of a student network, while the last layer serves as the teacher network for distillation; meanwhile, the prediction result of the previous training round is used as guidance in the time dimension, so that the model is trained under the supervision of softened hard labels; S3, sending the augmented and divided CIFAR data set into the distillation framework neural network and training until the network converges, obtaining a weight file; and S4, measuring the classification accuracy on the test images with the trained distillation framework neural network and the weight file.
In the step S1, the CIFAR10 and CIFAR100 data sets are used, and the training and test sets are divided at a ratio of five to one.
In the step S2, a residual network is used as the backbone, and the target convolutional neural network is first divided into several shallow segments according to its depth and original structure; conceptually, the shallow networks are regarded as student models and the deep network as the teacher model. Meanwhile, the prediction result of the previous round's model is used as a time-dimension teacher, and the valuable information in this teacher is used to soften the hard labels. On the basis of the residual network structure, the four stage features are used to build the space-dimension teacher, and the previous round's prediction serves as the time-dimension teacher, forming a new distillation framework neural network.
The step S2 includes: s21, regarding prediction results of different shallow networks in a residual network as student networks, and setting a bottleneck layer and a full connection layer which are only used for training and can be removed in reasoning after each shallow block; s22, utilizing the prediction result of the previous round of model as a guide in the time dimension to capture beneficial information contained in the model, and further adjusting the learning target and the label of the model in a soft guide mode in the training process, thereby optimizing the performance and the generalization capability of the model.
In the step S2, four branches are added to extract features; the bottleneck layer then refines these features more effectively, and finally predictions are made through the FC layer.
The step S2 includes: extracting features from the first to third layers of the residual network and adding attention so that the network learns the important features; refining the features with a bottleneck layer; predicting from the refined features with an FC layer; and finally mixing the previous round's sample predictions with the current hard labels to guide the learning of rich information, as sketched below.
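A minimal sketch of this time-dimension guidance, assuming a per-sample prediction bank updated once per epoch; the class name TemporalTeacher, the mixing weight lam, and the temperature T are assumptions for illustration. In use, target() is called when building each batch's training labels and update() at the end of each forward pass.

```python
import torch
import torch.nn.functional as F

class TemporalTeacher:
    """Stores last epoch's softened predictions and mixes them with the
    one-hot hard labels to form the current round's training target."""
    def __init__(self, num_samples, num_classes, lam=0.3, T=4.0):
        self.bank = torch.zeros(num_samples, num_classes)
        self.lam, self.T = lam, T

    def target(self, idx, labels):
        hard = F.one_hot(labels, self.bank.size(1)).float()
        prev = self.bank[idx]
        # Bank rows are zero before the first update; fall back to hard labels.
        has_prev = prev.sum(dim=1, keepdim=True) > 0
        return torch.where(has_prev,
                           (1.0 - self.lam) * hard + self.lam * prev, hard)

    @torch.no_grad()
    def update(self, idx, logits):
        self.bank[idx] = F.softmax(logits.detach().cpu() / self.T, dim=1)
```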
In the step S2, a residual network is constructed, consisting of the following parts: 1. convolution layers: the residual network starts with three convolution layers, responsible for extracting features from the input image; 2. residual blocks: each residual block consists of three convolution layers and realizes residual learning through skip connections; 3. pooling layers: the pooling layers in the residual network reduce the size of the feature maps and the number of parameters, improving computational efficiency; 4. fully connected layer: the output of the average pooling layer is connected to the predefined number of categories to perform the final classification task. According to the four residual blocks, the network is divided into four feature levels of different depths; the more residual blocks a feature has passed through, the deeper it is. The features of the four stages are taken as branches, a bottleneck layer and an FC layer are added as the predictions of the student networks, and the last layer is used as the teacher network for distillation; meanwhile, the deep features are sorted and averaged, generating frequency soft labels that guide the learning of the shallow layers.
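A structural sketch of this branched network in PyTorch; the backbone is torchvision's resnet18 and the head widths are assumptions for illustration, not the patent's exact design. Because the auxiliary heads are detachable, inference can keep only the selected branch (or the ensemble of S4) at no extra deployment cost.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SelfDistillNet(nn.Module):
    """Four residual stages, each feeding a bottleneck + FC student head;
    the deepest head serves as the in-network teacher."""
    def __init__(self, num_classes=100):
        super().__init__()
        r = resnet18(num_classes=num_classes)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stages = nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])
        # One 1x1-conv bottleneck + FC head per stage; removable at inference.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, 512, 1), nn.BatchNorm2d(512),
                          nn.ReLU(inplace=True), nn.AdaptiveAvgPool2d(1),
                          nn.Flatten(), nn.Linear(512, num_classes))
            for c in (64, 128, 256, 512)
        ])

    def forward(self, x):
        x = self.stem(x)
        logits = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            logits.append(head(x))
        return logits  # [stage1 .. stage4]; the last acts as the teacher
```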
the step S3 includes: s31, aiming at the size of the target in the CIFAR data set, processing the CIFAR data set by using a data enhancement method of random clipping and random horizontal overturn; s32, optimizing by using a random gradient descent method, and attenuating the learning rate twice and attenuating from an initial value, so that a neural network can achieve a better distillation result; s33, different training super parameters are tried on the neural network, data augmentation and training are carried out on the input image, and when the loss function converges or the maximum iteration number is reached, training is stopped to obtain a self-distilled network file and a self-distilled weight file.
In the step S31, the original image is randomly cropped with a padding size of 4.
The step S31 includes:
before training the distillation framework neural network, assume the CIFAR100 dataset consists of n samples, denoted x1, x2, ..., xn; the mean of the CIFAR100 dataset is the sum of all sample values divided by the number of samples,
mean = (x1 + x2 + ... + xn) / n;
the difference between each data point and the mean is calculated: (x1 - mean), (x2 - mean), ..., (xn - mean);
the differences are squared: (x1 - mean)^2, (x2 - mean)^2, ..., (xn - mean)^2;
the mean of the squared differences is calculated: [(x1 - mean)^2 + (x2 - mean)^2 + ... + (xn - mean)^2] / n;
the data is normalized by converting it into a distribution with mean 0 and standard deviation 1, that is, normalized value = (original value - mean) / standard deviation.
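A sketch of this statistics computation together with the S31 augmentation pipeline, assuming torchvision is used; the dataset path is illustrative.

```python
import torch
import torchvision.transforms as T
from torchvision.datasets import CIFAR100

raw = CIFAR100(root="./data", train=True, download=True,
               transform=T.ToTensor())
stack = torch.stack([img for img, _ in raw])          # (50000, 3, 32, 32)
# Per-channel mean, then the root of the mean squared deviation,
# exactly as in the formulas above.
mean = stack.mean(dim=(0, 2, 3))
var = ((stack - mean[None, :, None, None]) ** 2).mean(dim=(0, 2, 3))
std = var.sqrt()

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),       # random cropping with padding 4
    T.RandomHorizontalFlip(),          # random horizontal flip
    T.ToTensor(),
    T.Normalize(mean.tolist(), std.tolist()),  # zero mean, unit std
])
```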
The step S32 includes: random weights are used as the initial weights, and the learning rate, number of iterations, batch_size, and other hyperparameters are set; at rounds 100 and 150 the learning rate is decayed from its initial value, so that the distillation framework neural network achieves a better detection result.
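As a sketch of this schedule: the initial lr, momentum, and decay factor below are typical CIFAR values and assumptions here; weight_decay=0.0001 is the value stated later in this description.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 100)    # stand-in for the distillation network

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.0001)
# Decay the learning rate twice from its initial value, at rounds 100 and 150.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # ... forward pass, loss, backward, optimizer.step() for one epoch ...
    scheduler.step()
```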
The step S33 includes: augmenting the input images and training; when the loss function converges or the maximum number of iterations is reached, training stops and the distilled weight file is obtained.
The step S4 includes: S41, sending the test image into the improved residual network backbone and obtaining the convolution features of the four stages; S42, performing weighted averaging and prediction on the convolution features of the four stages respectively; S43, obtaining the ensemble prediction of the four stages through a simple weighted average, comparing the four stage results with the ensemble result (five results in total), and selecting the one with the highest prediction accuracy as the final result.
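A sketch of this evaluation, assuming a model that returns the four stage logits as a list (as in the architecture sketch above); the branch weights are illustrative assumptions.

```python
import torch

@torch.no_grad()
def evaluate(model, loader, weights=(0.1, 0.2, 0.3, 0.4)):
    """Weighted-average the four stage predictions into an ensemble, then
    report five accuracies (four stages + ensemble) to pick the best."""
    correct = torch.zeros(5)
    total = 0
    for images, labels in loader:
        stage_logits = model(images)            # list of four logit tensors
        ensemble = sum(w * l for w, l in zip(weights, stage_logits))
        for i, logits in enumerate(stage_logits + [ensemble]):
            correct[i] += (logits.argmax(dim=1) == labels).sum()
        total += labels.size(0)
    return correct / total                      # Acc1..Acc4, Ensemble
```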
Fig. 2 shows the detection results of the method of the present invention. Training and testing are performed on a TITAN Xp graphics card; during distillation the temperature is set to 4.0, and the weight decay in the stochastic gradient descent algorithm is set to 0.0001. The value of the loss function is printed at the terminal in every training round, making it easy to observe the overall convergence, and the test set is used for verification at the end of each round. The prediction result of each branch is also output during training: Acc1-4 denote the prediction results of the four branch layers, and Ensemble denotes the result of weighting and averaging the different branches. For verification accuracy, the classification result of the fourth layer of the residual network is compared, and if the current verification result exceeds the historical best accuracy, the weights are updated. Verification shows that a classification accuracy of 78.94% is achieved on CIFAR100.
The above examples are only for illustrating the technical scheme of the present invention and are not limiting. It will be understood by those skilled in the art that any modifications and equivalents that do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.

Claims (9)

1. A method for implementing self-distillation of space-time distillation fusion, comprising:
s1, acquiring a CIFAR data set, and dividing the CIFAR data set into a training set, a testing set and data augmentation;
s2, constructing a distillation frame neural network, using a residual network as a backbone network, taking the characteristics of four stages as branches when constructing the network, adding a bottleck layer and an FC layer as predictions of a student network, and using the last layer as a teacher network for distillation; meanwhile, the prediction result of the previous round of model is used as a guide in the time dimension, so that model training is performed under the guidance of the softening hard tag;
s3, sending the CIFAR data set subjected to data augmentation and division into the distillation framework neural network for training until the distillation framework neural network converges, and obtaining a weight file;
and S4, detecting the classification accuracy in the test image by using the trained distillation framework neural network and the weight file.
2. A method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein in step S1,
the CIFAR10 and CIFAR100 data sets are used and the training and testing sets are partitioned according to a five to one ratio.
3. A method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein in step S2,
a residual network is used as the backbone, the target convolutional neural network is first divided into several shallow segments according to its depth and original structure, and conceptually the shallow networks are regarded as student models while the deep network is regarded as the teacher model.
4. The method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein said step S2 comprises:
s21, regarding prediction results of different shallow networks in a residual network as student networks, and setting a bottleneck layer and a full connection layer which are only used for training and can be removed in reasoning after each shallow block;
s22, sequencing feature predictions of the last layer of the sample, regarding individual top-1 categories of each position as one vote, aggregating the votes into a histogram, ranking the categories according to the occurrence frequency of the votes in the histogram, and guiding learning the shallow frequency by using frequency information.
5. The method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein said step S3 comprises:
s31, aiming at the size of the target in the CIFAR data set, processing the CIFAR data set by using a data enhancement method of random clipping and random horizontal overturn;
s32, optimizing by using a random gradient descent method, and attenuating the learning rate twice and attenuating from an initial value, so that a neural network can achieve a better distillation result;
s33, different training super parameters are tried on the neural network, data augmentation and training are carried out on the input image, and when the loss function converges or the maximum iteration number is reached, training is stopped to obtain a self-distilled network file and a self-distilled weight file.
6. The method for realizing self-distillation of temporal-spatial distillation fusion according to claim 5, wherein said step S31 comprises:
before training the distillation framework neural network, assume the CIFAR100 dataset consists of n samples, denoted x1, x2, ..., xn; the mean of the CIFAR100 dataset is the sum of all sample values divided by the number of samples,
mean = (x1 + x2 + ... + xn) / n;
the difference between each data point and the mean is calculated: (x1 - mean), (x2 - mean), ..., (xn - mean);
the differences are squared: (x1 - mean)^2, (x2 - mean)^2, ..., (xn - mean)^2;
the mean of the squared differences is calculated: [(x1 - mean)^2 + (x2 - mean)^2 + ... + (xn - mean)^2] / n;
the data is normalized by converting it into a distribution with mean 0 and standard deviation 1, that is, normalized value = (original value - mean) / standard deviation.
7. The method for realizing self-distillation of temporal-spatial distillation fusion according to claim 5, wherein said step S32 comprises:
random weights are used as the initial weights, and the learning rate, number of iterations, batch_size, and other hyperparameters are set; at rounds 100 and 150 the learning rate is decayed from its initial value, so that the distillation framework neural network achieves a better detection result.
8. The method for realizing self-distillation of temporal-spatial distillation fusion according to claim 5, wherein said step S33 comprises:
augmenting the input images and training; when the loss function converges or the maximum number of iterations is reached, training stops and the distilled weight file is obtained.
9. The method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein said step S4 comprises:
s41, sending the test image into an improved residual error network backbone network, and acquiring convolution characteristics of four stages;
s42, respectively carrying out weighted average and prediction on the convolution characteristics of the four stages;
s43, obtaining the prediction results of the four stage sets through simple weighted average, and comparing the results of the four stages with the results of the five stages, and selecting the final result with high prediction accuracy.
CN202311305326.0A 2023-10-10 2023-10-10 Self-distillation realization method for space-time distillation fusion Pending CN117351279A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311305326.0A CN117351279A (en) 2023-10-10 2023-10-10 Self-distillation realization method for space-time distillation fusion

Publications (1)

Publication Number Publication Date
CN117351279A true CN117351279A (en) 2024-01-05

Family

ID=89366157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311305326.0A Pending CN117351279A (en) 2023-10-10 2023-10-10 Self-distillation realization method for space-time distillation fusion

Country Status (1)

Country Link
CN (1) CN117351279A (en)

Similar Documents

Publication Publication Date Title
JP2022058915A (en) Method and device for training image recognition model, method and device for recognizing image, electronic device, storage medium, and computer program
CN106547871A Search result recall method and apparatus based on a neural network
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN115170874A (en) Self-distillation implementation method based on decoupling distillation loss
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
CN115761408A (en) Knowledge distillation-based federal domain adaptation method and system
CN112148994B (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN112084936B (en) Face image preprocessing method, device, equipment and storage medium
CN111858879B (en) Question and answer method and system based on machine reading understanding, storage medium and computer equipment
CN117556369A (en) Power theft detection method and system for dynamically generated residual error graph convolution neural network
CN117351279A (en) Self-distillation realization method for space-time distillation fusion
CN115829029A (en) Channel attention-based self-distillation implementation method
CN117315400A (en) Self-distillation realization method based on characteristic frequency
CN115130003A (en) Model processing method, device, equipment and storage medium
CN113705873B (en) Construction method of film and television work score prediction model and score prediction method
CN116610770B (en) Judicial field case pushing method based on big data
CN114462391B (en) Nested entity identification method and system based on contrast learning
CN117197613B (en) Image quality prediction model training method and device and image quality prediction method and device
CN114882558B (en) Learning scene real-time identity authentication method based on face recognition technology
CN116503618B (en) Method and device for detecting remarkable target based on multi-mode and multi-stage feature aggregation
CN116416212B (en) Training method of road surface damage detection neural network and road surface damage detection neural network
CN112966569B (en) Image processing method and device, computer equipment and storage medium
CN113569136A (en) Video recommendation method and device, electronic equipment and storage medium
CN115759225A (en) Self-distillation implementation method based on comparative learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination