CN114972839A - Generalized continuous classification method based on online contrast distillation network - Google Patents
- Publication number: CN114972839A
- Application number: CN202210326319.8A
- Authority: CN (China)
- Prior art keywords: model, feature, data, student, student model
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F18/2431 — Pattern recognition; classification techniques relating to the number of classes; multiple classes
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N3/047 — Neural networks; architecture; probabilistic or stochastic networks
- G06N3/08 — Neural networks; learning methods
Abstract
The invention discloses a generalized continuous classification method based on an online contrast distillation network. A classification model based on knowledge distillation is established, comprising a teacher model and a student model; a buffer is established and updated by reservoir sampling; S samples are randomly drawn from the buffer and input into the teacher model and the student model to obtain the classification outputs and feature embeddings of each. A quality score is computed from the teacher model's classification output and used to adjust the per-sample weight of the knowledge distillation loss function, giving the online distillation loss; the feature embeddings of the two models are compared to compute the contrastive relation distillation loss; the self-supervised loss and the supervised contrastive loss of the student model are computed; the cross-entropy classification loss of the student model is computed; the weighted sum of these losses forms the total optimization objective used to optimize the parameters of the student model. The parameters of the teacher model are then updated from the parameters of the student model. The invention achieves good classification accuracy on both new and old tasks.
Description
Technical Field
The invention relates to a generalized continuous classification method, in particular to a generalized continuous classification method based on an online contrast distillation network.
Background
In recent years, deep learning has achieved strong results on computer vision tasks such as image classification, object detection and semantic segmentation. However, when a neural network trained on an old task is trained directly on a new task, the new task can severely interfere with performance on the old task, a problem known as Catastrophic Forgetting. Retraining a neural network from scratch consumes additional time and computing resources, and data from previous tasks cannot necessarily be acquired again, owing to privacy and other concerns. Humans, by contrast, are capable of continual learning: they can quickly acquire new knowledge on the basis of old knowledge without undermining the stability of what was learned before. Neural networks are expected to have this ability to learn continually, and Continual Learning (also called Incremental Learning) was proposed to overcome catastrophic forgetting. Much recent continual learning work adopts the idea of Experience Replay: samples from part of the old tasks are stored, and the stored samples are replayed while training new tasks, thereby alleviating the catastrophic forgetting problem.
Existing continual learning techniques often assume that the categories of different tasks are mutually disjoint, i.e. categories in a new task do not appear in any old task, and that clear task boundaries exist between tasks; such prior knowledge is very unlikely to hold in real-world tasks. Many existing techniques exploit this unrealistic prior knowledge to simplify the continual learning problem. For example, when the old model's outputs on old samples are used to regularize the current model's outputs on those samples to alleviate catastrophic forgetting, the arrival of new classes makes the output dimension of the old model inconsistent with that of the new model, and only under the assumption that classes are disjoint between tasks can the new model's output be matched against the corresponding part of the old model's output. Continual learning methods that rely on task classes being mutually disjoint therefore cannot be applied in the generalized continual learning setting. For this reason, General Continual Learning, which addresses catastrophic forgetting in realistic scenarios, has been attracting attention. The goal of generalized continual learning is to consolidate learned knowledge from a non-stationary, effectively infinite data stream while quickly learning new knowledge. Under generalized continual learning, the categories of different tasks may intersect and new samples of old categories may appear in new tasks, so conventional methods that depend on prior knowledge that does not necessarily exist in the real world are difficult to apply.
Generalized continual learning is a general continual learning scenario that subsumes the classic Class Incremental Learning, Task Incremental Learning and Domain Incremental Learning scenarios. However, the specific prior knowledge available in these classical scenarios cannot be exploited to alleviate catastrophic forgetting when performing image classification under generalized continual learning. This means that experience replay must mine some inherent, non-scene-specific information in order to consolidate old-task knowledge.
Disclosure of Invention
The invention provides a generalized continuous classification method based on an online contrast distillation network for solving the technical problems in the prior art.
The technical solution adopted by the invention to solve the technical problems in the prior art is as follows: a generalized continuous classification method based on an online contrast distillation network, comprising the following steps:
Step 1, establish a classification model based on knowledge distillation, comprising a teacher model and a student model, each provided with a feature encoder, a classifier and a feature mapper; set the optimization objective of the student model; initialize the parameters of the teacher model and the student model, and allocate a buffer of fixed size;
Step 2, when a batch data stream containing R samples arrives, count the number of samples encountered so far and update the buffer using reservoir sampling;
Step 3, randomly sample S samples from the buffer and input them into the teacher model and the student model; the respective feature encoders and classifiers produce the classification output data of the teacher model and the student model for the S samples, and the respective feature encoders and feature mappers produce the feature embedding data of the teacher model and the student model for the S samples;
Step 4, compute the quality scores of the teacher model's classification output data, adjust the weights of the online knowledge distillation loss function for different samples according to these quality scores, and thereby compute the online distillation loss between the teacher model and the student model;
Step 5, compare the feature embedding data of the teacher model and the student model, and compute the contrastive relation distillation loss between the two models;
Step 6, use self-supervised learning and supervised contrastive learning to help the student model extract discriminative features, computing the self-supervised loss and the supervised contrastive loss of the student model;
Step 7, compute the cross-entropy classification loss of the student model based on experience replay;
Step 8, compute the total optimization objective of the student model as the weighted sum of the above losses, where α1 to α3 are the hyperparameters weighting the corresponding loss terms; optimize the parameters of the student model using stochastic gradient descent;
Step 9, update the parameters of the teacher model directly from the parameters of the student model.
Further, in step 2, assume the non-stationary data stream consists of n tasks {T_1, T_2, ..., T_n} with mutually disjoint samples; the training set of each task T_n consists of labelled data, where m is the number of samples in the training set of T_n, x_i is the i-th image sample in the training set of T_n, and y_i is the category labelled for x_i. The buffer B has a fixed capacity |B|; x_j is the j-th image sample in the buffer and y_j is its labelled category. Reservoir sampling proceeds as follows:
Step A1, compare the number num of samples encountered so far with the buffer capacity |B|; if num ≤ |B|, store the sample (x_i, y_i) directly in the buffer B;
Step A2, if num > |B|, generate a random integer rand_num with minimum value 0 and maximum value num − 1; if rand_num < |B|, use the sample (x_i, y_i) to replace the sample (x_rand_num, y_rand_num) in the buffer, where x_rand_num is the image sample at index rand_num in the buffer B and y_rand_num is its label.
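Steps A1 and A2 above can be sketched as follows (an illustrative sketch, not the patent's implementation; function and variable names are assumptions):

```python
import random

def reservoir_update(buffer, capacity, sample, num_seen):
    """Update a fixed-capacity buffer with reservoir sampling.

    num_seen is the 1-based index of the incoming sample in the stream.
    Every sample seen so far remains in the buffer with equal probability
    capacity / num_seen.
    """
    if len(buffer) < capacity:                      # Step A1: buffer not full
        buffer.append(sample)
    else:                                           # Step A2: buffer full
        rand_num = random.randint(0, num_seen - 1)  # uniform in [0, num_seen-1]
        if rand_num < capacity:
            buffer[rand_num] = sample               # replace stored sample

buffer, capacity = [], 3
stream = [("x%d" % k, k) for k in range(10)]
for i, sample in enumerate(stream, start=1):
    reservoir_update(buffer, capacity, sample, i)
print(len(buffer))  # 3
```

Because replacement only happens once the buffer is full, the buffer size never exceeds the capacity, and each stored item is always an element of the stream seen so far.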
Further, in step 4, the quality score of the teacher model's classification output data is computed as follows. Let B denote the buffer of capacity |B|; x_j is the j-th image sample in the buffer and y_j is its labelled category; r^t(x_j) denotes the classification output data obtained after sample x_j is processed in turn by the feature encoder and classifier of the teacher model; ω(x_j) is the quality score of the teacher model's classification output for sample x_j. The score is the temperature-scaled softmax probability that the teacher assigns to the labelled category:
ω(x_j) = exp(r^t(x_j)[y_j] / ρ) / Σ_{c=1}^{C} exp(r^t(x_j)[c] / ρ)
where ρ is a temperature coefficient; C is the number of all possible categories; exp(·) is the exponential function with base e; r^t(x_j)[y_j] is the component of the classification output r^t(x_j) corresponding to category y_j; and r^t(x_j)[c] is the component corresponding to each category c.
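The quality score ω(x_j) can be sketched in numpy as follows (an illustrative sketch; the original formula image is not reproduced in this text, so the exact form is an assumption consistent with the symbol definitions above):

```python
import numpy as np

def quality_score(teacher_logits, label, rho=1.0):
    """Temperature-scaled softmax confidence of the teacher on the
    labelled class, used to weight the per-sample distillation loss.

    teacher_logits: shape (C,) classification output; label: int class
    index y_j; rho: temperature coefficient."""
    scaled = teacher_logits / rho
    scaled -= scaled.max()                       # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return float(probs[label])

logits = np.array([2.0, 0.5, -1.0])
print(round(quality_score(logits, 0), 3))  # 0.786
```

A confidently correct teacher output yields a weight near 1, while an uncertain or wrong output yields a small weight.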
Further, in step 4, let r^s(x_j) denote the classification output data obtained after sample x_j is processed in turn by the feature encoder and classifier of the student model. The online distillation loss between the teacher model and the student model is the quality-weighted expected squared l_2 distance between the two softmax-normalised classification outputs:
online distillation loss = E_{x_j} [ ω(x_j) · || softmax(r^t(x_j)) − softmax(r^s(x_j)) ||_2^2 ]
where ||·||_2 denotes the l_2 norm and E[·] denotes the mathematical expectation.
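The quality-weighted online distillation loss can be sketched as follows (an illustrative sketch in numpy; the original loss formula is an image not reproduced here, so the exact weighting form is an assumption built from the stated ingredients: quality score, l_2 norm, expectation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def online_distillation_loss(teacher_logits, student_logits, labels, rho=1.0):
    """Quality-weighted squared l2 distance between teacher and student
    class distributions, averaged over the batch."""
    t_prob = softmax(teacher_logits / rho)
    s_prob = softmax(student_logits / rho)
    weights = t_prob[np.arange(len(labels)), labels]   # omega(x_j)
    sq_dist = ((t_prob - s_prob) ** 2).sum(axis=1)     # ||.||_2^2
    return float((weights * sq_dist).mean())           # expectation
```

When the student matches the teacher exactly the loss is zero; samples the teacher classifies confidently and correctly contribute more to the gradient.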
Further, in step 5, let: B denote the buffer of capacity |B|; x_j the j-th image sample in the buffer and y_j its labelled category; z^t_j the feature embedding data of the teacher model obtained by passing x_j through the teacher's feature encoder and feature mapper; z^s_j the feature embedding data of the student model obtained by passing x_j through the student's feature encoder and feature mapper; z^t the set of all teacher model feature embeddings for the current batch; z^s the set of all student model feature embeddings for the current batch; z̃^s a feature embedding sampled from z^s; z^{t+} the set of teacher feature embeddings carrying the same class label as z̃^s; z̃^{t+} a feature embedding sampled from z^{t+}; and z̃^t a feature embedding sampled from z^t. The contrastive relation distillation loss between the teacher model and the student model takes the form of a noise-contrastive estimation:
contrastive relation distillation loss = −E[ log ( h(z̃^s, z̃^{t+}) / Σ_{z̃^t ∈ z^t} h(z̃^s, z̃^t) ) ]
where the judgment function h(z_1, z_2) = exp( z_1^T z_2 / (||z_1||_2 ||z_2||_2 τ) ) judges whether the feature embeddings z_1 and z_2 originate from their joint distribution; E[·] denotes the mathematical expectation; ||·||_2 is the l_2 norm; log(·) is the natural logarithm with base e; (·)^T denotes transpose; exp(·) is the exponential function with base e; and τ is a temperature coefficient.
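The contrastive relation distillation between teacher and student embeddings can be sketched as an InfoNCE-style objective (an illustrative sketch; the original formula is an image not reproduced here, so the exact aggregation over positives is an assumption):

```python
import numpy as np

def h(z1, z2, tau=0.1):
    """Judgment function: exponential of the temperature-scaled cosine
    similarity, larger when the pair plausibly comes from the joint
    distribution of teacher and student embeddings."""
    cos = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2))
    return np.exp(cos / tau)

def contrast_relation_loss(z_s, z_t, labels, tau=0.1):
    """Pull each student embedding towards teacher embeddings of the same
    class, contrasted against all teacher embeddings in the batch."""
    loss = 0.0
    for i in range(len(z_s)):
        pos_idx = [j for j in range(len(z_t)) if labels[j] == labels[i]]
        denom = sum(h(z_s[i], z_t[j], tau) for j in range(len(z_t)))
        num = sum(h(z_s[i], z_t[j], tau) for j in pos_idx) / len(pos_idx)
        loss += -np.log(num / denom)
    return loss / len(z_s)
```

Minimising this loss transfers the teacher's pairwise similarity structure (its "relations") to the student, rather than matching embeddings coordinate-wise.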
Further, step 6 includes the following sub-steps:
Step B1, let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s be the feature encoder, classifier and feature mapper of the student model. Each training sample (x, y) of the student model undergoes one random geometric transformation to obtain an augmented training sample (x̃, ỹ), where x is an image sample, y is the category labelled for x, x̃ is the geometrically transformed image sample and ỹ is the label of the geometric transformation applied. The augmented training sample x̃ is input into the student model and processed by the student's feature encoder and feature mapper to obtain the corresponding student model feature data F_s and feature embedding data z̃_s.
Step B2, the obtained student model feature data F_s is input into a multilayer perceptron that predicts which kind of geometric transformation was applied to the training sample x̃. Let the output of the multilayer perceptron be S_s; the self-supervised loss is the expected cross entropy between the softmax of S_s and the transformation label ỹ, where softmax(·) denotes the softmax function and l(·) denotes the cross-entropy loss function.
Step B4, let: B denote the buffer of capacity |B|; x_j the j-th image sample in the buffer and y_j its labelled category; z^s_j the feature embedding data of the student model obtained by passing x_j through the student's feature encoder and feature mapper; Z the set of all student feature embeddings, original and augmented; z̃ a feature embedding sampled from Z; Z^+ the set of student feature embeddings carrying the same class label as z̃; z̃^+ a feature embedding sampled from Z^+; and z̃^- a feature embedding sampled from Z. Based on the original and the augmented feature embedding data, supervised contrastive learning is performed within the student model, with loss
supervised contrastive loss = −E[ log ( exp( z̃^T z̃^+ / (||z̃||_2 ||z̃^+||_2 τ) ) / Σ_{z̃^- ∈ Z} exp( z̃^T z̃^- / (||z̃||_2 ||z̃^-||_2 τ) ) ) ]
where E[·] is the mathematical expectation; ||·||_2 is the l_2 norm; log(·) is the natural logarithm with base e; exp(·) is the exponential function with base e; (·)^T denotes transpose; and τ is a temperature coefficient.
Step B5, the self-supervised loss and the supervised contrastive loss are combined to give the cooperative contrast loss, which helps the student model better extract discriminative features; the cooperative contrast loss is computed as the combination of the self-supervised loss and the supervised contrastive loss.
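The supervised contrastive part of the cooperative contrast loss can be sketched as follows (an illustrative sketch in numpy; the original formula is an image not reproduced here, and the treatment of multiple positives per anchor is an assumption):

```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over l2-normalised student embeddings
    (original plus augmented); positives are the other embeddings that
    share the anchor's class label."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = np.exp(z @ z.T / tau)           # pairwise exp(cosine / tau)
    n = len(z)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        denom = sim[i].sum() - sim[i, i]  # all pairs except the anchor itself
        total += -np.mean([np.log(sim[i, j] / denom) for j in positives])
    return total / n
```

In a cooperative combination, this term pulls same-class embeddings together while the self-supervised transformation-prediction term forces the encoder to retain geometric detail; both operate on the same student features.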
further, in step B1, the geometric transformation includes rotating, scaling and adjusting the aspect ratio of the image.
Further, in step 7, assume the non-stationary data stream consists of n tasks {T_1, T_2, ..., T_n} with mutually disjoint samples; let x denote an image sample drawn from task T_n together with the buffer B, and y the category labelled for x. The cross-entropy classification loss of the student model is
cross-entropy classification loss = E_{(x,y)} [ l( softmax(r^s(x)), y ) ]
where softmax(·) denotes the softmax function; l(·) denotes the cross-entropy loss function; and r^s(x) denotes the classification output data obtained after the image sample x is processed in turn by the feature encoder and classifier of the student model.
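The replay-based cross-entropy term can be sketched as follows (an illustrative sketch; in practice the batch concatenates current-task samples with samples drawn from the buffer B):

```python
import numpy as np

def replay_cross_entropy(student_logits, labels):
    """Cross-entropy over a batch drawn from the current task plus the
    replay buffer (experience replay).

    student_logits: shape (N, C); labels: shape (N,) integer classes."""
    z = student_logits - student_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With uniform logits over C classes the loss equals log C, and it approaches zero as the student grows confident on the correct classes of both new and replayed samples.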
Further, in step 9, the specific method for updating the parameters of the teacher model from the parameters of the student model is as follows:
Let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s be the feature encoder, classifier and feature mapper of the student model; the parameters of the teacher model are updated as:
Θ t ←mΘ t +(1-m)[(1-X)Θ t +XΘ s ];
Φ t ←mΦ t +(1-m)[(1-X)Φ t +XΦ s ];
Ψ t ←mΨ t +(1-m)[(1-X)Ψ t +XΨ s ];
where m represents a momentum factor and X obeys a Bernoulli distribution (also known as a 0-1 distribution), defined as:
P(X=k)=p k (1-p) 1-k ,k={0,1};
the value range of the Bernoulli probability p is (0, 1), and the update frequency of the teacher model is controlled through the Bernoulli probability p.
Further, the calculation formula of the momentum factor m is as follows:
m=min(itera/(itera+1),η);
where itera is the number of iterations of the current student model, min (itera/(itera +1), η) represents the smaller of itera/(itera +1) and η, and η is a constant and is typically set to 0.999.
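The stochastically-gated momentum update above can be sketched for a single parameter tensor as follows (an illustrative sketch; function names are assumptions):

```python
import numpy as np

def update_teacher(theta_t, theta_s, itera, p=0.5, eta=0.999, rng=None):
    """One gated EMA update of a teacher parameter tensor.

    With probability p (a Bernoulli draw X=1) the teacher absorbs the
    student weights via momentum m = min(itera / (itera + 1), eta);
    with probability 1 - p (X=0) the teacher is left unchanged, since
    m*theta_t + (1 - m)*theta_t = theta_t."""
    rng = rng or np.random.default_rng()
    m = min(itera / (itera + 1), eta)
    x = rng.binomial(1, p)                      # Bernoulli gate X
    return m * theta_t + (1 - m) * ((1 - x) * theta_t + x * theta_s)
```

Early in training m is small, so the teacher tracks the student closely; as itera grows, m saturates at η and the teacher becomes a slow-moving average, while p controls how often an update happens at all.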
The invention has the following advantages and positive effects. The generalized continuous classification method based on an online contrast distillation network uses the teacher-student framework of online knowledge distillation to effectively consolidate old-task knowledge, so the model attains good classification accuracy on both new and old tasks. In the training stage, the training strategy of contrastive learning is introduced into online knowledge distillation: the teacher model integrates the student model's weights across time to accumulate knowledge, while the student model distils classification output data and contrastive relations from the teacher model to alleviate catastrophic forgetting. The teacher and student models cooperate, so the student model retains performance on old tasks, the teacher model accumulates weights that are more balanced between old and new tasks, and the teacher model can better guide the student model to consolidate old-task knowledge while the student model trains on new tasks. In the testing stage, the teacher model is used for testing; because it integrates the strengths of the student models that were good at distinguishing different classes at different moments, the teacher model achieves good classification performance across all classes. The invention thus effectively integrates the advantages of the student network and improves the classification accuracy of the teacher network at test time.
Drawings
FIG. 1 is a flow chart of the generalized continuous classification method based on an online contrast distillation network according to the present invention.
Detailed Description
For further understanding of the contents, features and effects of the present invention, the following embodiments are enumerated in conjunction with the accompanying drawings, and the following detailed description is given:
Referring to FIG. 1, a generalized continuous classification method based on an online contrast distillation network includes the following steps:
Step 1, establish a classification model based on knowledge distillation, comprising a teacher model and a student model, each provided with a feature encoder, a classifier and a feature mapper; set the optimization objective of the student model; initialize the parameters of the teacher model and the student model, and allocate a buffer of fixed size.
Step 2, when a batch data stream containing R samples arrives, count the number of samples encountered so far and update the buffer using reservoir sampling.
Step 3, randomly sample S samples from the buffer and input them into the teacher model and the student model, namely: the S samples are processed in turn by the feature encoder and classifier of the teacher model to obtain the classification output data set of the teacher model; processed in turn by the feature encoder and classifier of the student model to obtain the classification output data set of the student model; processed in turn by the feature encoder and feature mapper of the teacher model to obtain the feature embedding data set of the teacher model; and processed in turn by the feature encoder and feature mapper of the student model to obtain the feature embedding data set of the student model.
Step 4, compute the quality scores of the teacher model's classification output data, adjust the weights of the online knowledge distillation loss function for different samples according to these quality scores, and thereby compute the online distillation loss between the teacher model and the student model.
Step 5, compare the feature embedding data of the teacher model and the student model, and compute the contrastive relation distillation loss between the two models.
Step 6, use self-supervised learning and supervised contrastive learning to help the student model extract discriminative features, computing the self-supervised loss and the supervised contrastive loss of the student model.
Step 7, compute the cross-entropy classification loss of the student model based on experience replay.
Step 8, compute the total optimization objective of the student model as the weighted sum of the above losses, where α1 to α3 are the hyperparameters weighting the corresponding loss terms; optimize the parameters of the student model using stochastic gradient descent.
Step 9, update the parameters of the teacher model directly from the parameters of the student model.
Preferably, in step 2, it can be assumed that the non-stationary data stream consists of n tasks {T_1, T_2, ..., T_n} with mutually disjoint samples; the training set of each task T_n consists of labelled data, where m is the number of samples in the training set of T_n, x_i is the i-th image sample in the training set of T_n, and y_i is the category labelled for x_i. The buffer B has capacity |B|; x_j is the j-th image sample in the buffer and y_j is its labelled category. Reservoir sampling may comprise the following steps:
Step A1, compare the number num of samples encountered so far with the buffer capacity |B|; if num ≤ |B|, store the sample (x_i, y_i) directly in the buffer B.
Step A2, if num > |B|, generate a random integer rand_num with minimum value 0 and maximum value num − 1; if rand_num < |B|, use the sample (x_i, y_i) to replace the sample (x_rand_num, y_rand_num) in the buffer, where x_rand_num is the image sample at index rand_num in the buffer B and y_rand_num is its label.
Preferably, in step 4, the quality score of the teacher model's classification output data may be computed as follows. Let B denote the buffer of capacity |B|; x_j is the j-th image sample in the buffer and y_j is its labelled category; r^t(x_j) denotes the classification output data obtained after sample x_j is processed in turn by the feature encoder and classifier of the teacher model; ω(x_j) is the quality score of the teacher model's classification output for sample x_j, computed as
ω(x_j) = exp(r^t(x_j)[y_j] / ρ) / Σ_{c=1}^{C} exp(r^t(x_j)[c] / ρ)
where ρ is a temperature coefficient; C is the number of all possible categories; exp(·) is the exponential function with base e; r^t(x_j)[y_j] is the component of the classification output corresponding to category y_j; and r^t(x_j)[c] is the component corresponding to each category c.
Preferably, in step 4, let r^s(x_j) denote the classification output data obtained after sample x_j is processed in turn by the feature encoder and classifier of the student model; the online distillation loss between the teacher model and the student model may be computed as
online distillation loss = E_{x_j} [ ω(x_j) · || softmax(r^t(x_j)) − softmax(r^s(x_j)) ||_2^2 ]
where ||·||_2 denotes the l_2 norm; E[·] denotes the mathematical expectation; and exp(·), within the softmax, is the exponential function with base e.
Preferably, in step 5, let: B denote the buffer of capacity |B|; x_j the j-th image sample in the buffer and y_j its labelled category; z^t_j the feature embedding data of the teacher model obtained by passing x_j through the teacher's feature encoder and feature mapper; z^s_j the feature embedding data of the student model obtained by passing x_j through the student's feature encoder and feature mapper; z^t the set of all teacher model feature embeddings for the current batch; z^s the set of all student model feature embeddings for the current batch; z̃^s a feature embedding sampled from z^s; z^{t+} the set of teacher feature embeddings carrying the same class label as z̃^s; z̃^{t+} a feature embedding sampled from z^{t+}; and z̃^t a feature embedding sampled from z^t. The contrastive relation distillation loss between the teacher model and the student model may be computed as
contrastive relation distillation loss = −E[ log ( h(z̃^s, z̃^{t+}) / Σ_{z̃^t ∈ z^t} h(z̃^s, z̃^t) ) ]
where the judgment function h(z_1, z_2) = exp( z_1^T z_2 / (||z_1||_2 ||z_2||_2 τ) ) judges whether the feature embeddings z_1 and z_2 originate from their joint distribution; E[·] denotes the mathematical expectation; ||·||_2 is the l_2 norm; log(·) is the natural logarithm with base e; (·)^T denotes transpose; exp(·) is the exponential function with base e; and τ is a temperature coefficient.
Preferably, step 6 may comprise the following sub-steps:
in step B1, Θ can be set t ,Φ t ,Ψ t Feature encoder, classifier and feature mapper for the teacher model s ,Φ s ,Ψ s Corresponding to the feature encoder, the classifier and the feature mapper of the teacher model, each training sample (x, y) of the student model is subjected to one-time random geometric transformation to obtain an amplified training sampleWhere x represents an image sample, y is the class labeled by image sample x,for the image samples after the geometric transformation,a label that is geometrically transformed; amplifying the training sampleInputting the data into the student model, processing the data by the feature encoder and the feature mapper of the student model to obtain corresponding student model feature data F s And feature embedded dataWherein:
Step B2, the obtained student-model feature data F_s may be input into a multilayer perceptron Φ_mlp used to judge which kind of geometric transformation was applied to the training sample; the output of the multilayer perceptron may be denoted S_s, computed as S_s = Φ_mlp(F_s).
Step B3, the self-supervision loss may be computed as L_self = E[ l( softmax(S_s), ỹ ) ], where E denotes the mathematical expectation function, softmax(·) the softmax function, and l(·) the cross-entropy loss function;
Step B4, the following may be set: M denotes a buffer of capacity B; x_j is the j-th image sample in the buffer and y_j its labelled class; z_s^j is the student-model feature embedding obtained, after x_j is input into the student model, through its feature encoder and feature mapper; z_s^all denotes the set of all student-model feature embeddings, original z_s^j and augmented z̃_s^j alike; ẑ_s denotes a feature embedding sampled from z_s^all; z_s+ denotes the set of student-model feature embeddings sharing the class label of ẑ_s; z̃_s+ denotes a feature embedding sampled from z_s+; ẑ'_s denotes a feature embedding sampled from z_s^all. Based on the original and the augmented feature embeddings in the student model, supervised contrastive learning may be performed, with loss function
L_sc = −E[ log( d(ẑ_s, z̃_s+) / Σ_{ẑ'_s ∈ z_s^all, ẑ'_s ≠ ẑ_s} d(ẑ_s, ẑ'_s) ) ];
in the formula: E denotes the mathematical expectation; ‖·‖₂ denotes the l2 norm; log(·) denotes the natural logarithm with base e; d(ẑ_s, z̃_s+) denotes the distance between the feature embeddings ẑ_s and z̃_s+, and d(ẑ_s, ẑ'_s) the distance between ẑ_s and ẑ'_s, both of the form d(a, b) = exp( a^T b / (‖a‖₂ ‖b‖₂ τ) ); exp(·) denotes the exponential function with base e; (·)^T denotes transposition; τ denotes a temperature coefficient.
Step B5, the self-supervision loss L_self and the supervised contrastive learning loss L_sc may be combined to obtain the collaborative contrast loss L_cc = L_self + L_sc, which helps the student model better extract discriminant features.
preferably, in step B1, the geometric transformation may include rotating, scaling and adjusting the aspect ratio of the image.
Preferably, in step 7, it can be assumed that the non-stationary data stream is formed by n sample-disjoint tasks {T_1, T_2, …, T_n}; let x denote an image sample from task T_n or from the buffer M, and y the class labelled for x; the cross-entropy classification loss of the student model may be calculated as
L_ce = E_{(x,y)} [ l( softmax(r_s(x)), y ) ];
in the formula: E denotes the mathematical expectation function; softmax(·) denotes the softmax function; l(·) denotes the cross-entropy loss function; and r_s(x) denotes the classification output of image sample x after successive processing by the student model's feature encoder and classifier.
Preferably, in step 9, a specific method for updating the parameters of the teacher model by using the parameters of the student model may be as follows:
Let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s the feature encoder, classifier and feature mapper of the student model; the method for updating the parameters of the teacher model can be as follows:
Θ t ←mΘ t +(1-m)[(1-X)Θ t +XΘ s ];
Φ t ←mΦ t +(1-m)[(1-X)Φ t +XΦ s ];
Ψ t ←mΨ t +(1-m)[(1-X)Ψ t +XΨ s ];
where m represents a momentum factor and X obeys a Bernoulli distribution (also known as a 0-1 distribution), which can be defined as:
P(X=k)=p k (1-p) 1-k ,k={0,1};
the value range of the Bernoulli probability p is (0, 1), and the updating frequency of the teacher model can be controlled through the Bernoulli probability p.
Preferably, the calculation formula of the momentum factor m can be as follows:
m=min(itera/(itera+1),η);
where itera is the number of iterations of the current student model, min (itera/(itera +1), η) represents the smaller of itera/(itera +1) and η, η is a constant and can be set to 0.999.
The working process and working principle of the present invention are further explained by a preferred embodiment of the present invention as follows:
The generalized continuous learning is to consolidate old knowledge and accumulate new knowledge from a non-stationary data stream, and finally complete classification prediction for all the classes seen so far. Assume that the non-stationary data stream is formed by N sample-disjoint tasks {T_1, T_2, ..., T_N}, where the training set of each task T_n consists of labelled data {(x_i, y_i)}_{i=1}^{m}; here m is the number of samples in the training set of task T_n, x_i is the i-th image sample of the training set, and y_i is the class labelled for x_i. In the testing stage, the generalized continuous learning method can complete the classification task over all classes currently seen. The test set of each task T_n likewise consists of labelled data {(x_q, y_q)}_{q=1}^{p}, where p is the number of samples in the test set of task T_n, x_q is the q-th image sample of the test set, and y_q is the class labelled for x_q. The generalized continuous learning task is to perform class prediction on the test sets of all tasks {T_1, T_2, ..., T_n} trained so far.
FIG. 1 is a flow chart of the generalized continuous classification method based on an online contrast distillation network according to the present invention. In FIG. 1, M denotes a buffer of capacity B, x_j is the j-th image sample in the buffer, and y_j is the class labelled for x_j; L_od denotes the online distillation loss and L_crd the contrastive relation distillation loss. Let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s the feature encoder, classifier and feature mapper of the student model.
the invention relates to a generalized continuous classification method based on an online contrast distillation network, which comprises the following steps:
Step 1, before any task starts, first initialize the parameters of the teacher model and the student model and allocate a buffer M of fixed capacity B: Θ_t = Θ_s, Φ_t = Φ_s, Ψ_t = Ψ_s.
Step 2, when a batch data stream containing bsz samples arrives, count the number num of samples encountered so far and update the buffer M using the reservoir sampling method; this ensures that all samples have an equal probability of being stored in the buffer. For a specific sample, reservoir sampling proceeds as follows:
(1) compare the number num of samples encountered so far with the buffer capacity B; if num ≤ B, store the sample (x_i, y_i) directly into the buffer;
(2) if num > B, generate a random integer rand_num whose minimum value is 0 and maximum value is num − 1; if rand_num < B, replace the sample (x_rand_num, y_rand_num) in the buffer with the sample (x_i, y_i), where x_rand_num denotes the image sample at index rand_num in the buffer and y_rand_num its label.
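The reservoir-sampling update in steps (1)–(2) above can be sketched as follows; this is a minimal illustration, and the function name and the list-of-pairs buffer representation are assumptions, not part of the patent:

```python
import random

def reservoir_update(buffer, sample, num, capacity):
    """Reservoir-sampling buffer update, following steps (1)-(2) above.

    buffer   -- list of (image, label) pairs, at most `capacity` long
    sample   -- the incoming (x_i, y_i) pair
    num      -- number of samples encountered so far, including this one
    capacity -- the fixed buffer capacity B
    """
    if num <= capacity:
        # Buffer not yet full: store the sample directly.
        buffer.append(sample)
    else:
        # Draw rand_num uniformly from [0, num - 1]; replace only if it
        # falls inside the buffer, so each stream sample is retained
        # with equal probability B / num.
        rand_num = random.randint(0, num - 1)
        if rand_num < capacity:
            buffer[rand_num] = sample
    return buffer
```

Over a stream of num samples this keeps each one with probability B/num, which is exactly the equal-probability property claimed above.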
Step 3, randomly sample S samples x_j from the buffer M to consolidate old knowledge, and input the S samples x_j into the teacher model and the student model respectively. The classification outputs of the teacher and student models obtained through their feature encoders and classifiers are, respectively, r_t(x_j) = Φ_t(Θ_t(x_j)) and r_s(x_j) = Φ_s(Θ_s(x_j)); the feature embeddings of the teacher and student models obtained through their feature encoders and feature mappers are, respectively, z_t^j = Ψ_t(Θ_t(x_j)) and z_s^j = Ψ_s(Θ_s(x_j)).
Step 4, set: r_t(x_j) denotes the classification output obtained after sample x_j is processed successively by the teacher model's feature encoder and classifier; ω(x_j) is the quality score of the teacher-model classification output for x_j; r_t^c(x_j) denotes the entry of r_t(x_j) for class c; r_s(x_j) denotes the classification output obtained after x_j is processed successively by the student model's feature encoder and classifier.
The quality score ω(x_j) of each sample's classification output is calculated as:
ω(x_j) = exp( r_t^{y_j}(x_j) / ρ ) / Σ_{c=1}^{C} exp( r_t^{c}(x_j) / ρ )    (1);
where ρ is a temperature coefficient, C denotes the number of all possible classes, and exp(·) denotes the exponential function with base e.
The online distillation loss of the teacher and student models is then:
L_od = E[ ω(x_j) · ‖ r_t(x_j) − r_s(x_j) ‖₂² ]    (2);
where ‖·‖₂ denotes the l2 norm and E denotes the mathematical expectation function. By weighting the difference between the outputs of the teacher model and the student model by ω(x_j), the student model focuses more attention on samples with high quality scores.
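A numerical sketch of the quality score and the quality-weighted online distillation loss described in step 4 (NumPy is used for illustration; the squared l2 distance and the per-batch mean are assumptions about the exact aggregation):

```python
import numpy as np

def quality_score(r_t, y, rho=1.0):
    """omega(x): temperature-rho softmax of the teacher's logits r_t
    (a C-dim vector), evaluated at the labelled class y."""
    e = np.exp((r_t - r_t.max()) / rho)  # shift by max for stability
    return e[y] / e.sum()

def online_distillation_loss(R_t, R_s, ys, rho=1.0):
    """Quality-weighted distillation between teacher outputs R_t and
    student outputs R_s (both shape [S, C]) for buffer labels ys: each
    sample's teacher-student output gap is weighted by omega(x_j)."""
    losses = [quality_score(r_t, y, rho) * np.sum((r_t - r_s) ** 2)
              for r_t, r_s, y in zip(R_t, R_s, ys)]
    return float(np.mean(losses))
```

When teacher and student agree exactly the loss is zero; samples the teacher classifies confidently and correctly (high ω) dominate the gradient.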
Step 5, compare the feature embeddings between the teacher model and the student model, and calculate the contrastive relation distillation loss L_crd according to formulas (3) and (4):
L_crd = −E[ log( h(ẑ_s, z̃_t+) / Σ_{ẑ_t ∈ z_t} h(ẑ_s, ẑ_t) ) ]    (3);
where E denotes the mathematical expectation function and log(·) the natural logarithm with base e; z_t^j is the teacher-model feature embedding obtained, after x_j is input into the teacher model, through its feature encoder and feature mapper; z_s^j is the student-model feature embedding obtained, after x_j is input into the student model, through its feature encoder and feature mapper; z_t is the set of all teacher-model feature embeddings z_t^j of the current batch, and z_s the set of all student-model feature embeddings z_s^j; ẑ_s denotes a feature embedding sampled from z_s; z_t+ denotes the set of teacher-model feature embeddings sharing the class label of ẑ_s; z̃_t+ denotes a feature embedding sampled from z_t+; ẑ_t denotes a feature embedding sampled from z_t.
h(ẑ_s, z̃_t+) is a judgment function deciding whether the feature embeddings ẑ_s and z̃_t+ originate from their joint distribution, calculated as:
h(ẑ_s, z̃_t+) = exp( ẑ_s^T z̃_t+ / (‖ẑ_s‖₂ ‖z̃_t+‖₂ τ) )    (4);
where exp(·) denotes the exponential function with base e, ‖·‖₂ the l2 norm, (·)^T transposition, and τ a temperature coefficient.
h(ẑ_s, ẑ_t), the judgment function deciding whether the feature embeddings ẑ_s and ẑ_t originate from their joint distribution, is calculated in the same way.
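The judgment function and the contrastive relation distillation loss of step 5 can be sketched as below. This is an InfoNCE-style reading of the formulas; averaging over all teacher-side positives of each student embedding is an assumption about a detail the text leaves implicit:

```python
import numpy as np

def h(a, b, tau=0.1):
    """Judgment function: exp of the cosine similarity between two
    embeddings, scaled by the temperature tau."""
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.exp(cos / tau)

def contrastive_distillation_loss(Z_s, Z_t, ys, tau=0.1):
    """L_crd sketch: each student embedding is pulled toward teacher
    embeddings that share its class label (the set z_t+) and contrasted
    against all teacher embeddings of the batch (the set z_t)."""
    total, count = 0.0, 0
    for i, z_s in enumerate(Z_s):
        denom = sum(h(z_s, z_t, tau) for z_t in Z_t)
        for j, z_t in enumerate(Z_t):
            if ys[j] == ys[i]:               # positive: same class label
                total += -np.log(h(z_s, z_t, tau) / denom)
                count += 1
    return total / count
```

Because the gradient flows only through the student embeddings at training time, the teacher side would be detached in a full implementation.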
Step 6, use self-supervised learning and supervised contrastive learning to help the student model extract discriminant features. The specific steps are:
(1) subject each training sample (x, y) of the student model to one random geometric transformation to obtain an augmented training sample (x̃, ỹ), where x denotes an image sample, y the class labelled for x, x̃ the geometrically transformed image sample, and ỹ the label of the transformation applied. The geometric transformations include rotating, scaling and adjusting the aspect ratio of the image. This doubles the number of training images for the student model. Input the geometrically transformed images into the student network to obtain the corresponding student-model features and feature embeddings: F_s = Θ_s(x̃), z̃_s = Ψ_s(F_s).
(2) input the obtained student-model features into a multilayer perceptron Φ_mlp that judges which kind of geometric transformation was applied to the training sample: S_s = Φ_mlp(F_s).
(3) compute the self-supervision loss L_self = E[ l( softmax(S_s), ỹ ) ], where E denotes the mathematical expectation function, softmax(·) the softmax function, and l(·) the cross-entropy loss function.
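Steps (1)–(3) of the self-supervision branch can be illustrated as follows. The concrete transformation set and the logits stand-in for the perceptron output are illustrative assumptions (the patent names rotation, scaling and aspect-ratio changes):

```python
import numpy as np

# Illustrative pool of geometric transformations with integer labels.
TRANSFORMS = [
    lambda im: im,               # 0: identity
    lambda im: np.rot90(im, 1),  # 1: rotate 90 degrees
    lambda im: np.rot90(im, 2),  # 2: rotate 180 degrees
    lambda im: im[:, ::-1],      # 3: mirror (stand-in for an aspect change)
]

def augment(image, rng=np.random):
    """Apply one random geometric transformation; return the transformed
    image (x tilde) together with the transformation label (y tilde)."""
    k = rng.randint(len(TRANSFORMS))
    return TRANSFORMS[k](image), k

def self_supervision_loss(logits, k):
    """Cross-entropy between softmax(S_s) and the transformation label k,
    where `logits` plays the role of the perceptron output S_s."""
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p = p / p.sum()
    return float(-np.log(p[k]))
```

Predicting the applied transformation forces the feature encoder to retain geometric structure, which is the discriminative signal the self-supervision branch contributes.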
(4) Let z_s^j be the student-model feature embedding obtained, after sample x_j is input into the student model, through its feature encoder and feature mapper; let z_s^all denote the set of all student-model feature embeddings, original z_s^j and augmented z̃_s^j alike; ẑ_s a feature embedding sampled from z_s^all; z_s+ the set of student-model feature embeddings sharing the class label of ẑ_s; z̃_s+ a feature embedding sampled from z_s+; and ẑ'_s a feature embedding sampled from z_s^all. Based on the original and the augmented feature embeddings in the student model, perform supervised contrastive learning with loss function
L_sc = −E[ log( d(ẑ_s, z̃_s+) / Σ_{ẑ'_s ∈ z_s^all, ẑ'_s ≠ ẑ_s} d(ẑ_s, ẑ'_s) ) ];
in the formula: E denotes the mathematical expectation; log(·) the natural logarithm with base e; d(ẑ_s, z̃_s+) the distance between the feature embeddings ẑ_s and z̃_s+, and d(ẑ_s, ẑ'_s) the distance between ẑ_s and ẑ'_s, both of the form d(a, b) = exp( a^T b / (‖a‖₂ ‖b‖₂ τ) ); exp(·) the exponential function with base e; ‖·‖₂ the l2 norm; (·)^T transposition; and τ a temperature coefficient.
(5) combine the self-supervision loss L_self and the supervised contrastive learning loss L_sc to obtain the collaborative contrast loss L_cc = L_self + L_sc, which helps the student model better extract discriminant features.
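Steps (4)–(5) can be sketched as below; excluding the anchor itself from the denominator and summing the two losses without extra weights are assumptions about details the text leaves implicit:

```python
import numpy as np

def d(a, b, tau=0.1):
    """Similarity kernel used by the supervised contrastive loss:
    exp of the temperature-scaled cosine similarity."""
    return np.exp(float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) * tau))

def sup_con_loss(Z, ys, tau=0.1):
    """Supervised contrastive loss over the pooled original and augmented
    student embeddings Z with class labels ys."""
    total, count = 0.0, 0
    n = len(Z)
    for i in range(n):
        denom = sum(d(Z[i], Z[j], tau) for j in range(n) if j != i)
        for j in range(n):
            if j != i and ys[j] == ys[i]:    # positives share the label
                total += -np.log(d(Z[i], Z[j], tau) / denom)
                count += 1
    return total / max(count, 1)

def collaborative_loss(l_self, l_sc):
    """L_cc: combination of the self-supervision and supervised
    contrastive losses (an unweighted sum is assumed)."""
    return l_self + l_sc
```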
Step 7, calculate the cross-entropy classification loss of the student model based on experience replay:
L_ce = E_{(x,y)} [ l( softmax(r_s(x)), y ) ];
where x denotes an image sample from task T_n or from the buffer M and y the class labelled for x; E denotes the mathematical expectation function; softmax(·) the softmax function; l(·) the cross-entropy loss function; and r_s(x) the classification output of image sample x after successive processing by the student model's feature encoder Θ_s and classifier Φ_s: r_s(x) = Φ_s(Θ_s(x)).
Step 8, calculate the total optimization objective of the student model and optimize the parameters of the student model using a stochastic gradient descent algorithm:
L_total = L_ce + α_1·L_od + α_2·L_crd + α_3·L_cc    (20);
where α_1, α_2 and α_3 are hyper-parameters weighting the corresponding loss terms.
Step 9, directly update the parameters of the teacher model using the parameters of the student model, without any gradient back-propagation. Let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s the feature encoder, classifier and feature mapper of the student model. The updating method is as follows:
Θ t ←mΘ t +(1-m)[(1-X)Θ t +XΘ s ] (21);
Φ t ←mΦ t +(1-m)[(1-X)Φ t +XΦ s ] (22);
Ψ t ←mΨ t +(1-m)[(1-X)Ψ t +XΨ s ] (23);
where m represents a momentum factor and X obeys a Bernoulli distribution (also known as a 0-1 distribution), defined as:
P(X=k)=p k (1-p) 1-k ,k={0,1} (24);
the value range of the Bernoulli probability p is (0, 1), and the updating frequency of the teacher model is controlled through the Bernoulli probability p.
In order for the teacher model to quickly learn new knowledge in the early stage of model training, the momentum factor m is designed as follows:
m=min(itera/(itera+1),η) (25);
where itera is the number of iterations of the current student model, min (itera/(itera +1), η) represents the smaller of itera/(itera +1) and η, and η is a constant and is typically set to 0.999.
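The stochastic momentum update of formulas (21)–(25) can be sketched as follows; representing the parameters as plain dicts of floats is an illustrative simplification:

```python
import random

def momentum_factor(itera, eta=0.999):
    """m = min(itera / (itera + 1), eta), per formula (25): small early in
    training (teacher absorbs new knowledge quickly), saturating at eta."""
    return min(itera / (itera + 1), eta)

def update_teacher(teacher, student, itera, p=0.5, eta=0.999):
    """Bernoulli-gated EMA update of the teacher parameters, per formulas
    (21)-(24): with probability p the teacher takes a momentum step toward
    the student, otherwise it is left unchanged."""
    m = momentum_factor(itera, eta)
    x = 1 if random.random() < p else 0   # X ~ Bernoulli(p)
    for key in teacher:
        teacher[key] = m * teacher[key] + (1 - m) * (
            (1 - x) * teacher[key] + x * student[key])
    return teacher
```

Note that when x = 0 the update reduces to an identity, so the Bernoulli probability p directly controls how often the teacher is refreshed.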
The generalized continuous classification method based on the online contrast distillation network can be tested at any time. In the testing stage, the teacher model is used. The reason is that student models at different times are good at classifying different categories, and a teacher model that learns from the student models can accumulate their respective strengths. The teacher model therefore has a stronger ability to distinguish between all the classes seen than any single student model.
The above-mentioned embodiments are only for illustrating the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and to carry out the same, and the present invention shall not be limited to the embodiments, i.e. the equivalent changes or modifications made within the spirit of the present invention shall fall within the scope of the present invention.
Claims (10)
1. A generalized continuous classification method based on an online comparative distillation network is characterized by comprising the following steps:
step 1, establishing a knowledge-distillation-based classification model comprising a teacher model and a student model, each provided with a feature encoder, a classifier and a feature mapper; setting the optimization objective of the student model; initializing the parameters of the teacher model and the student model and allocating a buffer of fixed size;
step 2, when a batch data stream containing R samples arrives, counting the number of the samples encountered currently, and updating a buffer area by using a reservoir sampling method;
step 3, randomly sampling S samples from the buffer and inputting them into the teacher model and the student model respectively; obtaining the classification output data of the teacher model and the student model for the S samples through the processing of each model's feature encoder and classifier; and obtaining the feature embedding data of the teacher model and the student model for the S samples through the processing of each model's feature encoder and feature mapper;
step 4, calculating the quality score of the teacher-model classification output data, adjusting the weight of the online knowledge distillation loss function for different samples according to that quality score, and further calculating the online distillation loss of the teacher model and the student model;
step 5, comparing the feature embedding data between the teacher model and the student model, and calculating the contrastive relation distillation loss of the teacher model and the student model;
step 6, using self-supervised learning and supervised contrastive learning to help the student model extract discriminant features, and calculating the self-supervision loss and the supervised contrastive learning loss of the student model;
step 7, calculating the cross-entropy classification loss of the student model based on experience replay;
step 8, calculating the total optimization objective of the student model, in which α_1 to α_3 are the hyper-parameters of the corresponding loss functions, and optimizing the parameters of the student model using a stochastic gradient descent algorithm;
and 9, directly utilizing the parameters of the student model to update the parameters of the teacher model.
2. The method for the generalized continuous classification based on the online contrast distillation network as claimed in claim 1, wherein in step 2, the non-stationary data stream is assumed to be formed by n sample-disjoint tasks {T_1, T_2, ..., T_n}; the training set of each task T_n consists of labelled data {(x_i, y_i)}_{i=1}^{m}, where m is the number of samples in the training set of task T_n, x_i is the i-th image sample of the training set, and y_i is the class labelled for x_i; the buffer M has capacity B, x_j is the j-th image sample in the buffer, and y_j is the class labelled for x_j; the reservoir sampling method comprises:
step A1, comparing the number num of samples currently encountered with the buffer capacity B; if num ≤ B, storing the sample (x_i, y_i) directly into the buffer M;
step A2, if num > B, generating a random integer rand_num whose minimum value is 0 and maximum value is num − 1; if rand_num < B, replacing the sample (x_rand_num, y_rand_num) in the buffer with the sample (x_i, y_i), where x_rand_num is the image sample at index rand_num in the buffer M and y_rand_num is its label.
3. The generalized continuous classification method based on the online contrast distillation network as claimed in claim 1, wherein in step 4, the method for calculating the quality score of the teacher-model classification output data is as follows:
setting: M denotes a buffer of capacity B; x_j is the j-th image sample in the buffer and y_j its labelled class; r_t(x_j) denotes the classification output obtained after sample x_j is processed successively by the teacher model's feature encoder and classifier; ω(x_j) is the quality score of the teacher-model classification output for sample x_j, calculated as:
ω(x_j) = exp( r_t^{y_j}(x_j) / ρ ) / Σ_{c=1}^{C} exp( r_t^{c}(x_j) / ρ );
in the formula: ρ denotes a temperature coefficient; C denotes the number of all possible classes; exp(·) denotes the exponential function with base e; and r_t^{y_j}(x_j) is the entry of the classification output r_t(x_j) corresponding to class y_j.
4. The generalized continuous classification method based on the online contrast distillation network as claimed in claim 3, wherein in step 4, r_s(x_j) is set to denote the classification output obtained after sample x_j is processed successively by the student model's feature encoder and classifier; the online distillation loss of the teacher and student models is calculated as:
L_od = E[ ω(x_j) · ‖ r_t(x_j) − r_s(x_j) ‖₂² ];
where E denotes the mathematical expectation function and ‖·‖₂ denotes the l2 norm.
5. The generalized continuous classification method based on the online contrast distillation network according to claim 1, wherein in step 5, the following is set: M denotes a buffer of capacity B; x_j is the j-th image sample in the buffer and y_j its labelled class; z_t^j is the teacher-model feature embedding obtained, after x_j is input into the teacher model, through its feature encoder and feature mapper; z_s^j is the student-model feature embedding obtained, after x_j is input into the student model, through its feature encoder and feature mapper; z_t is the set of all teacher-model feature embeddings z_t^j of the current batch, and z_s the set of all student-model feature embeddings z_s^j; ẑ_s denotes a feature embedding sampled from z_s; z_t+ denotes the set of teacher-model feature embeddings sharing the class label of ẑ_s; z̃_t+ denotes a feature embedding sampled from z_t+; ẑ_t denotes a feature embedding sampled from z_t; the contrastive distillation loss between the teacher model and the student model is calculated as:
L_crd = −E[ log( h(ẑ_s, z̃_t+) / Σ_{ẑ_t ∈ z_t} h(ẑ_s, ẑ_t) ) ];
in the formula: E denotes the mathematical expectation function; log(·) denotes the natural logarithm with base e; h(ẑ_s, z̃_t+) is a judgment function deciding whether the feature embeddings ẑ_s and z̃_t+ originate from their joint distribution, and h(ẑ_s, ẑ_t) the analogous judgment function for ẑ_s and ẑ_t, both of the form h(a, b) = exp( a^T b / (‖a‖₂ ‖b‖₂ τ) ); (·)^T denotes transposition; exp(·) denotes the exponential function with base e; ‖·‖₂ denotes the l2 norm; τ denotes a temperature coefficient.
6. The broad continuous classification method based on the online contrast distillation network according to claim 5, wherein the step 6 comprises the following sub-steps:
step B1, let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s the feature encoder, classifier and feature mapper of the student model; each training sample (x, y) of the student model is subjected to one random geometric transformation to obtain an augmented training sample (x̃, ỹ), where x denotes an image sample, y the class labelled for x, x̃ the geometrically transformed image sample, and ỹ the label of the transformation applied; the augmented training sample (x̃, ỹ) is input into the student model and processed by its feature encoder and feature mapper to obtain the corresponding student-model feature data F_s and feature embedding z̃_s, where F_s = Θ_s(x̃) and z̃_s = Ψ_s(F_s);
step B2, the obtained student-model feature data F_s is input into a multilayer perceptron Φ_mlp used to judge which kind of geometric transformation was applied to the training sample; the output of the multilayer perceptron is denoted S_s, calculated as S_s = Φ_mlp(F_s);
step B3, the self-supervision loss is calculated as L_self = E[ l( softmax(S_s), ỹ ) ], where E denotes the mathematical expectation function, softmax(·) denotes the softmax function, and l(·) denotes the cross-entropy loss function;
step B4, setting: M denotes a buffer of capacity B; x_j is the j-th image sample in the buffer and y_j its labelled class; z_s^j is the student-model feature embedding obtained, after x_j is input into the student model, through its feature encoder and feature mapper; z_s^all denotes the set of all student-model feature embeddings, original z_s^j and augmented z̃_s^j alike; ẑ_s denotes a feature embedding sampled from z_s^all; z_s+ denotes the set of student-model feature embeddings sharing the class label of ẑ_s; z̃_s+ denotes a feature embedding sampled from z_s+; ẑ'_s denotes a feature embedding sampled from z_s^all; based on the original and the augmented feature embeddings in the student model, supervised contrastive learning is performed, with loss function:
L_sc = −E[ log( d(ẑ_s, z̃_s+) / Σ_{ẑ'_s ∈ z_s^all, ẑ'_s ≠ ẑ_s} d(ẑ_s, ẑ'_s) ) ];
in the formula: E denotes the mathematical expectation; log(·) denotes the natural logarithm with base e; d(ẑ_s, z̃_s+) denotes the distance between the feature embeddings ẑ_s and z̃_s+, and d(ẑ_s, ẑ'_s) the distance between ẑ_s and ẑ'_s, both of the form d(a, b) = exp( a^T b / (‖a‖₂ ‖b‖₂ τ) ); exp(·) denotes the exponential function with base e; ‖·‖₂ denotes the l2 norm; (·)^T denotes transposition; τ denotes a temperature coefficient;
step B5, the self-supervision loss L_self and the supervised contrastive learning loss L_sc are combined to obtain the collaborative contrast loss L_cc = L_self + L_sc, which helps the student model better extract discriminant features.
7. The generalized continuous classification method based on the online contrast distillation network according to claim 6, wherein in step B1, the geometric transformation includes rotating, scaling and adjusting the aspect ratio of the image.
8. The method for the generalized continuous classification based on the online contrast distillation network as claimed in claim 1, wherein in step 7, the non-stationary data stream is assumed to be formed by n sample-disjoint tasks {T_1, T_2, ..., T_n}; let x denote an image sample from task T_n or from the buffer M, and y the class labelled for x; the cross-entropy classification loss of the student model is calculated as:
L_ce = E_{(x,y)} [ l( softmax(r_s(x)), y ) ];
in the formula: E denotes the mathematical expectation function; softmax(·) denotes the softmax function; l(·) denotes the cross-entropy loss function; and r_s(x) denotes the classification output of image sample x after successive processing by the student model's feature encoder and classifier.
9. The generalized continuous classification method based on the online contrast distillation network as claimed in claim 1, wherein the specific method for updating the parameters of the teacher model by using the parameters of the student model in step 9 is as follows:
let Θ_t, Φ_t, Ψ_t be the feature encoder, classifier and feature mapper of the teacher model, and Θ_s, Φ_s, Ψ_s the feature encoder, classifier and feature mapper of the student model; the method for updating the parameters of the teacher model is as follows:
Θ t ←mΘ t +(1-m)[(1-X)Θ t +XΘ s ];
Φ t ←mΦ t +(1-m)[(1-X)Φ t +XΦ s ];
Ψ t ←mΨ t +(1-m)[(1-X)Ψ t +XΨ s ];
where m represents a momentum factor and X obeys a Bernoulli distribution (also known as a 0-1 distribution), defined as:
P(X=k)=p k (1-p) 1-k ,k={0,1};
the value range of the Bernoulli probability p is (0, 1), and the updating frequency of the teacher model is controlled through the Bernoulli probability p.
10. The generalized continuous classification method based on the online contrast distillation network according to claim 9, wherein the momentum factor m is calculated as follows:
m=min(itera/(itera+1),η);
where itera is the number of iterations of the current student model, min (itera/(itera +1), η) represents the smaller of itera/(itera +1) and η, and η is a constant and is typically set to 0.999.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210326319.8A CN114972839A (en) | 2022-03-30 | 2022-03-30 | Generalized continuous classification method based on online contrast distillation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114972839A true CN114972839A (en) | 2022-08-30 |
Family
ID=82976151
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115457042A (en) * | 2022-11-14 | 2022-12-09 | 四川路桥华东建设有限责任公司 | Method and system for detecting surface defects of thread bushing based on distillation learning |
CN115511059A (en) * | 2022-10-12 | 2022-12-23 | 北华航天工业学院 | Network lightweight method based on convolutional neural network channel decoupling |
CN116502621A (en) * | 2023-06-26 | 2023-07-28 | 北京航空航天大学 | Network compression method and device based on self-adaptive comparison knowledge distillation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086658B (en) | Sensor data generation method and system based on generation countermeasure network | |
CN111126386B (en) | Sequence domain adaptation method based on countermeasure learning in scene text recognition | |
CN114972839A (en) | Generalized continuous classification method based on online contrast distillation network | |
CN111274398B (en) | Method and system for analyzing comment emotion of aspect-level user product | |
CN110929034A (en) | Commodity comment fine-grained emotion classification method based on improved LSTM | |
CN111414461A (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN111582397A (en) | CNN-RNN image emotion analysis method based on attention mechanism | |
CN111460157A (en) | Cyclic convolution multitask learning method for multi-field text classification | |
CN113626589A (en) | Multi-label text classification method based on mixed attention mechanism | |
CN106339718A (en) | Classification method based on neural network and classification device thereof | |
CN111813939A (en) | Text classification method based on representation enhancement and fusion | |
Das et al. | Determining attention mechanism for visual sentiment analysis of an image using svm classifier in deep learning based architecture | |
CN114973226A (en) | Training method for text recognition system in natural scene of self-supervision contrast learning | |
CN114780723A (en) | Portrait generation method, system and medium based on guide network text classification | |
CN111611375B (en) | Text emotion classification method based on deep learning and turning relation | |
CN113627550A (en) | Image-text emotion analysis method based on multi-mode fusion | |
Liu et al. | Audio and video bimodal emotion recognition in social networks based on improved alexnet network and attention mechanism | |
Saha et al. | The Corporeality of Infotainment on Fans Feedback Towards Sports Comment Employing Convolutional Long-Short Term Neural Network | |
CN116662924A (en) | Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism | |
CN113792541B (en) | Aspect-level emotion analysis method introducing mutual information regularizer | |
CN106447691A (en) | Weighted extreme learning machine video target tracking method based on weighted multi-example learning | |
Ouanan et al. | Development of deep learning-based facial recognition system | |
Soujanya et al. | A CNN based approach for handwritten character identification of Telugu guninthalu using various optimizers | |
CN113626537B (en) | Knowledge graph construction-oriented entity relation extraction method and system | |
Zhu | Neural architecture search for deep face recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||