CN117351279A - Self-distillation realization method for space-time distillation fusion - Google Patents
- Publication number
- CN117351279A
- Authority
- CN
- China
- Prior art keywords
- distillation
- network
- mean
- training
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the technical field of computer vision and provides a self-distillation realization method for space-time distillation fusion, comprising the following steps: S1, acquiring a CIFAR data set, dividing it into a training set and a test set, and applying data augmentation; S2, constructing a distillation framework neural network that uses a residual network as the backbone network, takes the features of four stages as branches, adds a bottleneck layer and a fully connected (FC) layer to each branch to produce the predictions of the student networks, and uses the last layer as the teacher network for distillation; at the same time, the prediction results of the previous training round are used as guidance in the time dimension, so that the model is trained under softened hard labels; S3, feeding the augmented and divided CIFAR data set into the distillation framework neural network and training until the network converges, obtaining a weight file; and S4, measuring the classification accuracy on test images using the trained distillation framework neural network and the weight file. The invention improves the accuracy of the model under distillation.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a self-distillation realization method for space-time distillation fusion.
Background
Deep learning has made great progress in recent years, but its large computational cost and parameter count make it difficult to deploy on resource-constrained devices. To make deep models more efficient, researchers have explored knowledge distillation. In 2006, Buciluǎ et al. first proposed transferring the knowledge of a large model to a small model. In 2015, Hinton et al. formally proposed the well-known concept of knowledge distillation. The main idea is that the student model achieves accuracy comparable to that of the teacher model by imitating it; the key problem is how to transfer the teacher model's knowledge to the student model.
Traditional knowledge distillation can be categorized into response-based and feature-based knowledge distillation. Response-based knowledge generally refers to the neural response of the last output layer of the teacher model; the main idea is to directly mimic the teacher model's final prediction. Response-based knowledge distillation is a simple and effective model compression method and is widely used across different tasks and applications.
Feature-based knowledge comes from the intermediate layers and is a good extension of response-based knowledge: the feature maps of the intermediate layers can be used as knowledge to supervise the training of the student model. The most direct idea is to match the activations of intermediate features. In particular, Zagoruyko and Komodakis (2017) proposed representing knowledge with attention maps; to match semantic information between teacher and student, Chen et al. (2021) proposed cross-layer KD, which adaptively assigns teacher layers to each student layer through attention allocation. However, these two classical approaches have two disadvantages. The first is that knowledge transfer is inefficient: the student model makes little use of the knowledge in the teacher model, and it is still rare for a student model to outperform its teacher. The second is the difficulty of designing and training a suitable teacher model. Existing distillation frameworks require considerable effort and experimentation to find the best teacher architecture, which takes a relatively long time. For example, a conventional distillation method takes 14.67 hours to train the teacher network ResNet152 on CIFAR100 and a further 12.31 hours to train the student network ResNet50 in the second step.
Disclosure of Invention
The invention aims to provide a self-distillation realization method for space-time distillation fusion.
The invention aims to solve two problems of the prior art: pre-training the teacher network in a distillation framework neural network is time-consuming, and the large difference in scale between the teacher network and the student network leads to poor student accuracy.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
A self-distillation realization method for space-time distillation fusion, comprising: S1, acquiring a CIFAR data set, dividing it into a training set and a test set, and applying data augmentation; S2, constructing a distillation framework neural network that uses a residual network as the backbone network, takes the features of four stages as branches, adds a bottleneck layer and an FC layer to each branch to produce the predictions of the student networks, and uses the last layer as the teacher network for distillation; at the same time, the prediction results of the previous training round are used as guidance in the time dimension, so that the model is trained under softened hard labels; S3, feeding the augmented and divided CIFAR data set into the distillation framework neural network and training until the network converges, obtaining a weight file; and S4, measuring the classification accuracy on test images using the trained distillation framework neural network and the weight file.
As a further improvement, in the step S1, the CIFAR10 and CIFAR100 data sets are used, and the training set and the test set are divided according to a ratio of five to one.
As a further improvement, in the step S2, a residual network is used as the backbone network, and the target convolutional neural network is first divided into several shallow sections according to its depth and original structure; conceptually, the shallow networks are regarded as student models and the deep network is regarded as the teacher model.
As a further improvement, the step S2 includes: S21, regarding the prediction results of the different shallow networks in the residual network as student networks, and placing after each shallow block a bottleneck layer and a fully connected layer that are used only during training and can be removed at inference; S22, sorting the feature predictions of the last layer for each sample, treating the top-1 category at each position as one vote, aggregating the votes into a histogram, ranking the categories by how often they occur in the histogram, and using this frequency information to guide the learning of the shallow layers.
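The voting procedure of step S22 can be illustrated with a minimal sketch. The function name `frequency_soft_label` and the choice to normalize vote counts into a distribution are assumptions made for illustration; the patent only states that top-1 votes are aggregated into a histogram and ranked by frequency.

```python
from collections import Counter

def frequency_soft_label(position_scores, num_classes):
    """Turn per-position class scores into a ranked vote histogram.

    position_scores: one list of class scores per spatial position.
    Returns (ranking, soft_label); the normalization is an assumption.
    """
    # Each position casts one vote for its top-1 class.
    votes = [max(range(num_classes), key=lambda k: scores[k])
             for scores in position_scores]
    hist = Counter(votes)
    # Rank classes by how often they were voted for.
    ranking = [cls for cls, _ in hist.most_common()]
    # Hypothetical soft label: vote share per class.
    soft_label = [hist.get(k, 0) / len(votes) for k in range(num_classes)]
    return ranking, soft_label

scores = [[0.1, 0.7, 0.2], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1], [0.2, 0.6, 0.2]]
ranking, soft_label = frequency_soft_label(scores, num_classes=3)
print(ranking, soft_label)
```

In this toy input, three of the four positions vote for class 1 and one votes for class 0, so class 1 is ranked first.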
As a further improvement, the step S3 includes: S31, according to the size of the targets in the CIFAR data set, processing the data set with random cropping and random horizontal flipping as data augmentation; S32, optimizing with stochastic gradient descent and decaying the learning rate twice from its initial value, so that the neural network achieves a better distillation result; S33, trying different training hyperparameters on the neural network, augmenting the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, to obtain the self-distilled network file and weight file.
As a further improvement, the step S31 includes: before training the distillation framework neural network, assuming that the CIFAR100 data set contains n samples, denoted $x_1, x_2, \ldots, x_n$, the mean of the data set is the sum of all sample values divided by the number of samples, i.e., $\mathrm{mean} = (x_1 + x_2 + \cdots + x_n)/n$; calculating the difference between each data point and the mean: $(x_1 - \mathrm{mean}), (x_2 - \mathrm{mean}), \ldots, (x_n - \mathrm{mean})$; calculating the square of each difference: $(x_1 - \mathrm{mean})^2, (x_2 - \mathrm{mean})^2, \ldots, (x_n - \mathrm{mean})^2$; calculating the mean of the squared differences: $[(x_1 - \mathrm{mean})^2 + (x_2 - \mathrm{mean})^2 + \cdots + (x_n - \mathrm{mean})^2]/n$, whose square root is the standard deviation; and standardizing the data by converting it to a distribution with a mean of 0 and a standard deviation of 1, that is, normalized value = (original value - mean) / standard deviation.
As a further improvement, the step S32 includes: using random weights as the initial weights, and setting the learning rate, the number of iterations, batch_size, and so on; at rounds 100 and 150, the learning rate is decayed from its initial value, so that the distillation framework neural network achieves a better detection result.
As a further improvement, the step S33 includes: augmenting the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, to obtain the distilled weight file.
As a further improvement, the step S4 includes: S41, feeding the test image into the improved residual backbone network and obtaining the convolutional features of the four stages; S42, performing weighted averaging and prediction on the convolutional features of the four stages respectively; S43, obtaining the ensemble prediction of the four stages through a simple weighted average, comparing the results of the four stages with the results of the five stages, and selecting the result with the higher prediction accuracy as the final result.
The beneficial effects of the invention are as follows:
On the basis of the residual backbone network, the invention uses the deep network as the teacher network to distill the shallow student networks, so that the shallow student networks learn deeper semantic information and the classification accuracy of the model is improved.
The invention improves the distillation loss by using decoupled knowledge distillation, so that the dark knowledge contained in non-target categories is used more effectively and the accuracy of image classification is improved.
The method solves the problems that pre-training the teacher network in existing distillation frameworks is time-consuming and that the accuracy of small models falls short, and improves the accuracy of the model under distillation.
Drawings
FIG. 1 is a schematic diagram of a method for realizing self-distillation of space-time distillation fusion according to an embodiment of the invention.
Fig. 2 is a schematic diagram of test results of a self-distillation implementation method of space-time distillation fusion according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Referring to Fig. 1, a self-distillation realization method for space-time distillation fusion comprises the following steps: S1, acquiring a CIFAR data set, dividing it into a training set and a test set, and applying data augmentation; S2, constructing a distillation framework neural network that uses a residual network as the backbone network, takes the features of four stages as branches, adds a bottleneck layer and an FC layer to each branch to produce the predictions of the student networks, and uses the last layer as the teacher network for distillation; at the same time, the prediction results of the previous training round are used as guidance in the time dimension, so that the model is trained under softened hard labels; S3, feeding the augmented and divided CIFAR data set into the distillation framework neural network and training until the network converges, obtaining a weight file; and S4, measuring the classification accuracy on test images using the trained distillation framework neural network and the weight file.
In the step S1, the data sets of CIFAR10 and CIFAR100 are used, and the training set and the test set are divided according to a ratio of five to one.
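A five-to-one division can be sketched as below. The helper `split_five_to_one` and the fixed seed are hypothetical. The standard CIFAR10 and CIFAR100 releases already ship as 50,000 training and 10,000 test images, which is the same ratio, so in practice the predefined split may simply be used.

```python
import random

def split_five_to_one(indices, seed=0):
    """Shuffle sample indices and split them 5:1 into (train, test)."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 6  # one part in six goes to the test set
    return shuffled[n_test:], shuffled[:n_test]

train_idx, test_idx = split_five_to_one(range(60000))
print(len(train_idx), len(test_idx))
```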
In the step S2, a residual network is used as the backbone network, and the target convolutional neural network is first divided into several shallow sections according to its depth and original structure; conceptually, the shallow networks can be regarded as student models and the deep network as the teacher model. At the same time, the prediction results of the previous round's model are used as a teacher model in the time dimension, and the valuable information they contain is used to soften the hard labels. On the basis of the residual network structure, the features of the four stages are used to construct a teacher in the space dimension, and the prediction results of the previous round's model are used as a teacher in the time dimension, forming a new distillation framework neural network.
The step S2 includes: S21, regarding the prediction results of the different shallow networks in the residual network as student networks, and placing after each shallow block a bottleneck layer and a fully connected layer that are used only during training and can be removed at inference; S22, using the prediction results of the previous round's model as guidance in the time dimension to capture the useful information they contain, and adjusting the learning targets and labels of the model through soft guidance during training, thereby improving the performance and generalization ability of the model.
In the step S2, four branches are added to extract features, the bottleneck layer is then used to extract features more effectively, and finally prediction is performed through the FC layer.
The step S2 includes: extracting the features of the first to third layers of the residual network and adding attention so that the network learns the important features; extracting features with the bottleneck layer; predicting from the extracted features with the FC layer; and finally mixing each sample's prediction from the previous round with its current hard label to guide the learning of richer information.
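Mixing the previous round's prediction with the hard label can be sketched as a convex combination. The mixing coefficient `alpha` and the function name `soften_label` are assumptions; the patent does not state how the two are weighted.

```python
def soften_label(hard_class, prev_probs, alpha=0.3):
    """Blend a one-hot label with last round's predicted distribution.

    alpha is a hypothetical mixing weight for the previous prediction.
    """
    num_classes = len(prev_probs)
    one_hot = [1.0 if k == hard_class else 0.0 for k in range(num_classes)]
    return [(1.0 - alpha) * h + alpha * p for h, p in zip(one_hot, prev_probs)]

# Previous round predicted class 2 with probability 0.6; the true class is 2.
target = soften_label(2, [0.1, 0.2, 0.6, 0.1], alpha=0.3)
print(target)
```

Because both inputs sum to one, the softened target is still a valid probability distribution, and the true class keeps the largest share as long as `alpha` stays below 0.5.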
In the step S2, a residual network is constructed, which consists of the following parts: 1. convolution layers: the residual network starts with three convolution layers, which are responsible for extracting features from the input image; 2. residual blocks: each residual block consists of three convolution layers and implements residual learning through skip connections; 3. pooling layer: the pooling layer in the residual network reduces the size of the feature maps and the number of parameters, improving computational efficiency; 4. fully connected layer: the output of the average pooling layer is connected to a predefined number of categories to perform the final classification task. The neural network is divided into features at four different depths according to the four residual blocks; the more residual blocks a feature passes through, the deeper it is. The features of the four stages are taken as branches, a bottleneck layer and an FC layer are added to each branch to produce the predictions of the student networks, and the last layer is used as the teacher network for distillation. At the same time, the deep features are sorted and averaged, and frequency soft labels are generated to guide the learning of the shallow layers.
The step S3 includes: S31, according to the size of the targets in the CIFAR data set, processing the data set with random cropping and random horizontal flipping as data augmentation; S32, optimizing with stochastic gradient descent and decaying the learning rate twice from its initial value, so that the neural network achieves a better distillation result; S33, trying different training hyperparameters on the neural network, augmenting the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, to obtain the self-distilled network file and weight file.
In the step S31, the original image is randomly cropped with a padding size of 4.
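The augmentation in step S31 can be sketched on a single-channel image stored as nested lists. Zero padding and a flip probability of 0.5 are assumptions (they are common defaults, but the patent only gives the padding size of 4), and `random_crop_flip` is a hypothetical name.

```python
import random

def random_crop_flip(image, pad=4, rng=random):
    """Pad by `pad` pixels, crop back to the original size, maybe flip.

    image: H x W nested list of pixel values (one channel for brevity).
    """
    h, w = len(image), len(image[0])
    # Zero-pad every side by `pad` pixels (zero fill is an assumption).
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    # Crop a random H x W window out of the padded image.
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    crop = [row[left:left + w] for row in padded[top:top + h]]
    # Mirror left-to-right with probability 0.5 (assumed).
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop

image = [[1] * 32 for _ in range(32)]
augmented = random_crop_flip(image, pad=4, rng=random.Random(0))
print(len(augmented), len(augmented[0]))
```

The output keeps the 32 x 32 input size; only the visible window and orientation change between calls.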
The step S31 includes:
before training the distillation framework neural network, assuming that the CIFAR100 data set contains n samples, denoted $x_1, x_2, \ldots, x_n$, the mean of the data set is the sum of all sample values divided by the number of samples,
$\mathrm{mean} = (x_1 + x_2 + \cdots + x_n)/n$;
calculating the difference between each data point and the mean: $(x_1 - \mathrm{mean}), (x_2 - \mathrm{mean}), \ldots, (x_n - \mathrm{mean})$;
calculating the square of each difference: $(x_1 - \mathrm{mean})^2, (x_2 - \mathrm{mean})^2, \ldots, (x_n - \mathrm{mean})^2$;
calculating the mean of the squared differences: $[(x_1 - \mathrm{mean})^2 + (x_2 - \mathrm{mean})^2 + \cdots + (x_n - \mathrm{mean})^2]/n$, whose square root is the standard deviation;
standardizing the data by converting it to a distribution with a mean of 0 and a standard deviation of 1, that is, normalized value = (original value - mean) / standard deviation.
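The standardization steps above translate directly into code. This is a minimal sketch on a flat list of values; the name `standardize` is hypothetical, and in practice the statistics would typically be computed per color channel over the training images.

```python
import math

def standardize(values):
    """Return (normalized values, mean, standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    # Mean of the squared differences (population variance).
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    return [(x - mean) / std for x in values], mean, std

normalized, mean, std = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, std, normalized)
```

For this sample the mean is 5 and the standard deviation is 2, so the value 2.0 maps to -1.5 and the value 9.0 maps to 2.0.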
The step S32 includes: using random weights as the initial weights, and setting the learning rate, the number of iterations, batch_size, and so on; at rounds 100 and 150, the learning rate is decayed from its initial value, so that the distillation framework neural network achieves a better detection result.
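The two-step decay at rounds 100 and 150 can be written as a small schedule function. The initial learning rate of 0.1 and the decay factor of 0.1 are assumptions, since the patent gives only the two milestones; the same rule is what a multi-step scheduler such as PyTorch's `MultiStepLR` provides.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Step schedule: multiply base_lr by gamma at each milestone passed.

    base_lr and gamma are hypothetical values for illustration.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

for epoch in (0, 99, 100, 149, 150):
    print(epoch, learning_rate(epoch))
```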
The step S33 includes: augmenting the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, to obtain the distilled weight file.
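The stopping rule of step S33 can be sketched as a loop that ends either on convergence or at the iteration limit. The convergence test used here (loss change below `tol` for `patience` consecutive rounds) is an assumption, as the patent does not define convergence; `loss_per_epoch` is a hypothetical stand-in for one round of real training.

```python
def train_until_converged(loss_per_epoch, max_epochs=200, tol=1e-4, patience=3):
    """Run rounds until the loss stalls or max_epochs is reached.

    loss_per_epoch: callable mapping a round index to that round's loss.
    Returns (rounds run, reason for stopping).
    """
    previous = float("inf")
    stalled = 0
    for epoch in range(max_epochs):
        loss = loss_per_epoch(epoch)
        # Count consecutive rounds whose loss barely changed.
        stalled = stalled + 1 if abs(previous - loss) < tol else 0
        if stalled >= patience:
            return epoch + 1, "converged"
        previous = loss
    return max_epochs, "max_epochs"

# Toy loss that falls linearly and then plateaus at 0.5.
result = train_until_converged(lambda e: max(1.0 - 0.1 * e, 0.5))
print(result)
```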
The step S4 includes: S41, feeding the test image into the improved residual backbone network and obtaining the convolutional features of the four stages; S42, performing weighted averaging and prediction on the convolutional features of the four stages respectively; S43, obtaining the ensemble prediction of the four stages through a simple weighted average, comparing the results of the four stages with the results of the five stages, and selecting the result with the higher prediction accuracy as the final result.
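The weighted averaging of branch predictions in steps S42 and S43 can be sketched as follows. Equal branch weights are an assumption (the patent does not give the weights), and the helper names are hypothetical.

```python
def weighted_ensemble(branch_probs, weights=None):
    """Weighted average of per-branch class-probability lists."""
    if weights is None:
        weights = [1.0] * len(branch_probs)  # equal weights (assumed)
    total = sum(weights)
    num_classes = len(branch_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, branch_probs)) / total
        for k in range(num_classes)
    ]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda k: values[k])

# Four stage branches, three classes; the shallowest branch disagrees.
branches = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
    [0.2, 0.6, 0.2],
]
ensemble = weighted_ensemble(branches)
print(argmax(ensemble), ensemble)
```

Here the ensemble predicts class 1 even though the first branch alone would predict class 0, which is the kind of case in which comparing the ensemble with individual branches matters.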
Fig. 2 shows the test results of the method of the invention. Training and testing are performed on a Titan-series XP graphics card. During distillation, the distillation temperature is set to 4.0, and the weight decay of the stochastic gradient descent algorithm is set to 0.0001. In each training round, the value of the loss function is printed to the terminal so that overall convergence can be observed, and the test set is used for verification at the end of each round. The prediction result of each branch is also output during training: for example, Acc1-4 denotes the prediction result of the first-layer branch among the current four layers, and Ensemble denotes the result of a weighted average over the different branches. When verifying accuracy, the classification result of the fourth layer of the residual network is compared: if the current verification result is higher than the best accuracy so far, the weights are updated. Verification shows that a classification accuracy of 78.94% can be achieved on CIFAR100.
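The role of the distillation temperature can be illustrated with the classical temperature-softened distillation loss of Hinton et al. This is not the decoupled variant that the invention states it uses; it is a simpler stand-in to show what a temperature of 4.0 does. The scaling by the squared temperature follows the classical formulation and is an assumption here.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [3.0, 1.0, 0.2]
print(softmax(teacher, 1.0))  # sharp distribution
print(softmax(teacher, 4.0))  # flatter distribution at T = 4.0
print(kd_loss([1.0, 2.0, 0.5], teacher))
```

A higher temperature flattens the teacher distribution, so the relative probabilities of non-target classes contribute more to the loss.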
The above examples are only for illustrating the technical scheme of the present invention and are not limiting. It will be understood by those skilled in the art that any modifications and equivalents that do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.
Claims (9)
1. A self-distillation realization method for space-time distillation fusion, comprising:
S1, acquiring a CIFAR data set, dividing it into a training set and a test set, and applying data augmentation;
S2, constructing a distillation framework neural network that uses a residual network as the backbone network, takes the features of four stages as branches, adds a bottleneck layer and an FC layer to each branch to produce the predictions of the student networks, and uses the last layer as the teacher network for distillation; at the same time, the prediction results of the previous training round are used as guidance in the time dimension, so that the model is trained under softened hard labels;
S3, feeding the augmented and divided CIFAR data set into the distillation framework neural network and training until the network converges, obtaining a weight file;
and S4, measuring the classification accuracy on test images using the trained distillation framework neural network and the weight file.
2. A method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein in step S1,
the CIFAR10 and CIFAR100 data sets are used and the training and testing sets are partitioned according to a five to one ratio.
3. A method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein in step S2,
the residual network is used as the backbone network, and the target convolutional neural network is first divided into several shallow sections according to its depth and original structure; conceptually, the shallow networks are regarded as student models and the deep network is regarded as the teacher model.
4. The method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein said step S2 comprises:
S21, regarding the prediction results of the different shallow networks in the residual network as student networks, and placing after each shallow block a bottleneck layer and a fully connected layer that are used only during training and can be removed at inference;
S22, sorting the feature predictions of the last layer for each sample, treating the top-1 category at each position as one vote, aggregating the votes into a histogram, ranking the categories by how often they occur in the histogram, and using this frequency information to guide the learning of the shallow layers.
5. The method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein said step S3 comprises:
S31, according to the size of the targets in the CIFAR data set, processing the data set with random cropping and random horizontal flipping as data augmentation;
S32, optimizing with stochastic gradient descent and decaying the learning rate twice from its initial value, so that the neural network achieves a better distillation result;
S33, trying different training hyperparameters on the neural network, augmenting the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, to obtain the self-distilled network file and weight file.
6. The method for realizing self-distillation of space-time distillation fusion according to claim 5, wherein said step S31 comprises:
before training the distillation framework neural network, assuming that the CIFAR100 data set contains n samples, denoted $x_1, x_2, \ldots, x_n$, the mean of the data set is the sum of all sample values divided by the number of samples,
$\mathrm{mean} = (x_1 + x_2 + \cdots + x_n)/n$;
calculating the difference between each data point and the mean: $(x_1 - \mathrm{mean}), (x_2 - \mathrm{mean}), \ldots, (x_n - \mathrm{mean})$;
calculating the square of each difference: $(x_1 - \mathrm{mean})^2, (x_2 - \mathrm{mean})^2, \ldots, (x_n - \mathrm{mean})^2$;
calculating the mean of the squared differences: $[(x_1 - \mathrm{mean})^2 + (x_2 - \mathrm{mean})^2 + \cdots + (x_n - \mathrm{mean})^2]/n$, whose square root is the standard deviation;
standardizing the data by converting it to a distribution with a mean of 0 and a standard deviation of 1, that is, normalized value = (original value - mean) / standard deviation.
7. The method for realizing self-distillation of space-time distillation fusion according to claim 5, wherein said step S32 comprises:
using random weights as the initial weights, and setting the learning rate, the number of iterations, batch_size, and so on; at rounds 100 and 150, the learning rate is decayed from its initial value, so that the distillation framework neural network achieves a better detection result.
8. The method for realizing self-distillation of space-time distillation fusion according to claim 5, wherein said step S33 comprises:
performing data augmentation on the input images and training, and stopping training when the loss function converges or the maximum number of iterations is reached, to obtain the distilled weight file.
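The stopping rule of step S33 can be sketched as follows. The claim does not define convergence, so this sketch assumes the loss has converged when it changes by less than `tolerance` for `patience` consecutive iterations; both parameters, the function name, and the `step_fn` callback are hypothetical.

```python
def train_until_converged(step_fn, max_iterations, tolerance=1e-4, patience=3):
    """Call step_fn(iteration) -> loss until the loss stabilizes or the budget ends.

    Returns (iterations run, last loss, reason for stopping).
    """
    previous_loss = None
    stable_steps = 0
    for iteration in range(1, max_iterations + 1):
        loss = step_fn(iteration)
        if previous_loss is not None and abs(previous_loss - loss) < tolerance:
            stable_steps += 1
            if stable_steps >= patience:
                return iteration, loss, "converged"
        else:
            stable_steps = 0
        previous_loss = loss
    return max_iterations, previous_loss, "max_iterations"
```

In a real training run `step_fn` would perform one optimization pass and return the loss; saving the weight file would follow the return of this function.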
9. The method for realizing self-distillation of space-time distillation fusion according to claim 1, wherein said step S4 comprises:
S41, feeding the test image into the improved residual network backbone and obtaining the convolutional features of the four stages;
S42, performing weighted averaging and prediction on the convolutional features of each of the four stages;
S43, obtaining the ensemble prediction of the four stages by simple weighted averaging, comparing the results of the four stages with the results of the five stages, and selecting the one with the higher prediction accuracy as the final result.
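One possible reading of steps S42 and S43 is sketched below. It assumes each stage produces a vector of class probabilities per test image, that the ensemble is a weighted average of those vectors, and that the candidates compared are the four individual stages plus the ensemble; the claim does not spell out these details, and all function names are hypothetical.

```python
def weighted_average(stage_probs, weights):
    """Weighted average of per-stage class probability vectors for one image."""
    total = sum(weights)
    num_classes = len(stage_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, stage_probs)) / total
        for c in range(num_classes)
    ]


def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def select_best(samples, labels, weights):
    """samples[i][s] holds the class probabilities of image i at stage s.

    Returns the name of the most accurate candidate and all accuracies.
    """
    accuracies = {}
    for s in range(len(weights)):
        preds = [argmax(sample[s]) for sample in samples]
        accuracies[f"stage_{s + 1}"] = accuracy(preds, labels)
    ensemble_preds = [argmax(weighted_average(sample, weights)) for sample in samples]
    accuracies["ensemble"] = accuracy(ensemble_preds, labels)
    best = max(accuracies, key=accuracies.get)
    return best, accuracies
```

With equal weights this reduces to a plain average of the four stage outputs; on ties, `max` returns the first candidate in insertion order, so an individual stage is preferred over the ensemble when their accuracies are equal.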
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311305326.0A CN117351279A (en) | 2023-10-10 | 2023-10-10 | Self-distillation realization method for space-time distillation fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311305326.0A CN117351279A (en) | 2023-10-10 | 2023-10-10 | Self-distillation realization method for space-time distillation fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117351279A (en) | 2024-01-05 |
Family
ID=89366157
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311305326.0A Pending CN117351279A (en) | 2023-10-10 | 2023-10-10 | Self-distillation realization method for space-time distillation fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117351279A (en) |
- 2023
  - 2023-10-10 CN CN202311305326.0A patent/CN117351279A/en active Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP2022058915A (en) | Method and device for training image recognition model, method and device for recognizing image, electronic device, storage medium, and computer program | |
CN106547871A (en) | Method and apparatus for recalling search results based on a neural network | |
CN111738169B (en) | Handwriting formula recognition method based on end-to-end network model | |
CN115170874A (en) | Self-distillation implementation method based on decoupling distillation loss | |
CN113628059A (en) | Associated user identification method and device based on multilayer graph attention network | |
CN112819024B (en) | Model processing method, user data processing method and device and computer equipment | |
CN115761408A (en) | Knowledge distillation-based federated domain adaptation method and system | |
CN112148994B (en) | Information push effect evaluation method and device, electronic equipment and storage medium | |
CN112084936B (en) | Face image preprocessing method, device, equipment and storage medium | |
CN111858879B (en) | Question and answer method and system based on machine reading understanding, storage medium and computer equipment | |
CN117556369A (en) | Power theft detection method and system based on a dynamically generated residual graph convolutional neural network | |
CN117351279A (en) | Self-distillation realization method for space-time distillation fusion | |
CN115829029A (en) | Channel attention-based self-distillation implementation method | |
CN117315400A (en) | Self-distillation realization method based on characteristic frequency | |
CN115130003A (en) | Model processing method, device, equipment and storage medium | |
CN113705873B (en) | Construction method of film and television work score prediction model and score prediction method | |
CN116610770B (en) | Judicial field case pushing method based on big data | |
CN114462391B (en) | Nested entity recognition method and system based on contrastive learning | |
CN117197613B (en) | Image quality prediction model training method and device and image quality prediction method and device | |
CN114882558B (en) | Learning scene real-time identity authentication method based on face recognition technology | |
CN116503618B (en) | Method and device for salient object detection based on multi-modal and multi-stage feature aggregation | |
CN116416212B (en) | Training method of road surface damage detection neural network and road surface damage detection neural network | |
CN112966569B (en) | Image processing method and device, computer equipment and storage medium | |
CN113569136A (en) | Video recommendation method and device, electronic equipment and storage medium | |
CN115759225A (en) | Self-distillation implementation method based on contrastive learning | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||