CN115952851A - Self-supervision continuous learning method based on information loss mechanism - Google Patents

Self-supervision continuous learning method based on information loss mechanism

Info

Publication number
CN115952851A
CN115952851A (application CN202211375805.5A)
Authority
CN
China
Prior art keywords
model
self
image
feature
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211375805.5A
Other languages
Chinese (zh)
Other versions
CN115952851B (en)
Inventor
潘力立
杨帆
张亮
赵江伟
吴庆波
李宏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202211375805.5A
Publication of CN115952851A
Application granted
Publication of CN115952851B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a self-supervised continuous learning method based on an information loss mechanism, which comprises: (1) an unsupervised continuous learning framework based on information loss, which makes the model learn only the important feature representations on continuous tasks; (2) an InfoDrop loss term based on the self-supervised learning paradigm, which helps the model still extract the important feature representations of test samples after the InfoDrop mechanism is removed in the testing stage. In addition, the unsupervised continuous learning framework proposed by the invention can be used together with most continuous learning strategies. By discarding unimportant image information, the model focuses only on the feature representations of the important image information, which relieves the limitation of model capacity and improves the performance of the self-supervised model without introducing samples of historical tasks or parameter information of historical models.

Description

Self-supervision continuous learning method based on information loss mechanism
Technical Field
The invention belongs to the field of image processing and mainly aims to improve the performance of self-supervised continuous learning models; the method is mainly applied to the field of image classification.
Background
In recent years, Deep Learning (DL) has achieved remarkable success in machine learning, natural language processing, and related fields. The focus of DL is to develop Deep Neural Networks (DNNs) by offline training on fixed or predefined data sets, on which they exhibit significant performance for the corresponding task. However, DNNs are also limited: a trained DNN is fixed, and the parameters inside the network do not change during operation, which means that the DNN remains static after deployment and cannot adapt to a changing environment. Real-world applications are not all static; in particular, applications involving autonomous agents must process continuously changing data. Over time, the data or tasks faced by the model may change, and a static model performs poorly in such scenarios. One possible solution is to retrain the network whenever the data distribution changes; however, complete retraining on the expanded data set is computationally intensive and infeasible in real-world resource-constrained environments, which creates the need for new algorithms that enable continuous learning with efficient use of resources.
Continuous learning has needs and challenges in many real-world scenarios: a robot needs to autonomously learn new behavioral specifications as the environment changes, so as to adapt to the new environment and complete new tasks; an automatic driving program needs to adapt to different environments, for example moving from rural roads to highways or from well-lit locations to dim environments; intelligent dialogue systems need to adapt to different users and situations; smart medical applications need to adapt to new cases, new hospitals and inconsistent medical conditions.
Continuous Learning (CL) studies the problem of learning from non-stationary data streams. It aims to expand the adaptive capacity of a model, so that the model can learn the corresponding knowledge on different tasks while remembering the features learned on historical tasks. According to whether the input data are labeled, continuous learning can be divided into Supervised Continuous Learning (SCL) and Unsupervised Continuous Learning (UCL). Supervised continuous learning usually concentrates on a series of related tasks in which the input data carry artificially assigned labels, so that the task information and task-boundary information needed for generalization are available; this setting no longer matches real situations, where task labels are unknown, task boundaries are undefined, and large amounts of class-labeled data are unavailable, which leads to unsupervised and self-supervised continuous learning methods. Self-supervised learning is a part of unsupervised learning; it aims to remove the need for manual annotation in representation learning and learns representations of data from the unannotated raw information. A true self-supervised continuous learning algorithm can use a continuously arriving data stream that is not independently and identically distributed to learn a robust and adaptive model without forgetting the knowledge already obtained.
In recent years, research on CL has focused mainly on SCL, and those results generally cannot be extended to practical application scenarios with biased data distributions. Research on UCL, which does not rely on manual annotation or supervision information, has therefore received increasing attention. Although research in the UCL field is recent, the problems are complex and results are still few, existing work has shown that relying on manually annotated data is not essential for continuous learning, that unsupervised visual representations can alleviate the problem of catastrophic forgetting, and that UCL can exhibit better performance than SCL. Reference: Madaan, D., Yoon, J., Li, Y., Liu, Y., & Hwang, S. J., "Representational Continuity for Unsupervised Continual Learning," International Conference on Learning Representations, 2021. To further improve the performance of unsupervised models, a lightweight, model-independent method, information loss (InfoDrop), has attracted attention; it improves the robustness and interpretability of the model by reducing the texture bias of Convolutional Neural Networks (CNNs). Reference: Shi, B., Zhang, D., Dai, Q., Zhu, Z., Mu, Y., & Wang, J., "Informative Dropout for Robust Representation Learning: A Shape-bias Perspective," International Conference on Machine Learning, 2020. The invention aims to combine the information loss mechanism with an unsupervised continuous learning framework, improve the performance of the model, construct a more robust and reasonable continuous learning model, and promote the forward development of unsupervised continuous learning technology.
Disclosure of Invention
The invention relates to a self-supervised continuous learning method which, by introducing an InfoDrop mechanism into a self-supervised model, leads the model to extract the important image features in continuous learning tasks. The method selects and discards unimportant image information by computing the self-information of image patches, guiding the model to attend to the important regions of the image and thereby improving the performance of the self-supervised model.
The method first constructs a self-supervised continuous learning framework based on the information loss mechanism, divides the CIFAR-10 data set into 5 tasks, trains the model on the corresponding data set in the order in which the tasks arrive, and tests the accuracy of the model with the KNN algorithm. The method is characterized in that an information loss mechanism is introduced into the self-supervised learning framework to improve the performance of the model. From the perspective of model capacity, the invention mainly does the following work: 1) constructing a self-supervised learning model and a self-supervised continuous learning paradigm; 2) establishing an information loss mechanism based on self-information and the Dropout method, which helps the model drop unimportant features in the image and keep the important ones, and integrating this mechanism into the self-supervised continuous learning framework; 3) combining an InfoDrop loss term with the self-supervised loss paradigm, which avoids having to remove the InfoDrop mechanism and fine-tune the model before testing; 4) training on the CIFAR-10 data set, testing the accuracy of the model on the test set with the KNN classification algorithm, evaluating the performance of the model, and comparing with various continuous learning strategies. Through this work, the unsupervised continuous learning method is applicable to various continuous learning strategies, can improve the performance of models under different strategies, and shows high applicability.
To facilitate the description of the present disclosure, certain terms are first defined.
Definition 1: Residual convolutional neural network (ResNet). By adding "residual connections" to the convolutional network, ResNet solves the degradation phenomenon of deep networks during training and greatly increases the trainable depth of the neural network; compared with a traditional convolutional neural network, the residual network trains better and is easier to optimize. In the present invention, the residual convolutional neural network used is the ResNet18 network.
Definition 2: Adaptive average pooling layer. The adaptive average pooling layer compresses the spatial dimensions by taking the average of the data in the corresponding dimension and adaptively outputs a result of the specified size, which can also suppress some useless features to a certain extent.
Definition 3: SimSiam. SimSiam is a simple Siamese (twin) network model that maximizes the similarity between two augmentations of one image; it learns representations without negative sample pairs, large batches, or a momentum encoder.
Definition 4: Dropout method. Dropout is a regularization method for the over-fitting problem of neural networks: a drop probability is set for the neurons of a certain layer of the network, and during training some neurons are randomly discarded according to this probability.
Definition 5: Image patch. A patch can be understood as an image block: during the operation of a convolutional neural network, the network divides the picture into many small blocks and a convolution kernel looks at only one small block at a time; such a small block is called a patch.
Definition 6: ReLU activation layer. Also called the rectified linear unit, it is a commonly used activation function in artificial neural networks, usually referring to the nonlinear function represented by the ramp function and its variants, with expression f(x) = max(0, x).
The technical scheme of the invention is a continuous image feature extraction method based on an information loss mechanism, which comprises the following steps:
step 1: preprocessing the data set;
acquiring real-world object images, labeling the real images according to the categories of the objects they contain, normalizing the pixel values of all pictures, scaling and cropping the pictures, and dividing the images into a plurality of data sets, each data set containing images of different categories;
Step 2: constructing the self-supervised learning model;
The self-supervised learning model consists of a feature encoder f_Θ and a feature prediction head h; the feature encoder f_Θ is formed by cascading a feature extraction module f_b and a feature projection module f_g:
f_Θ = f_g ∘ f_b
The feature extraction module is constructed with the residual convolutional neural network ResNet18: its first layer is a convolutional network block, its second to fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer; the feature projection module is formed by cascading two linear layers. The input of the feature encoder f_Θ is an image x and its output is the feature representation z = f_Θ(x) of the image. The feature prediction head h is formed by cascading two linear layers; its input is the image feature z and its output is the prediction p = h(z) of the feature representation.
The structure of the convolutional network block is shown in FIG. 1, the structure of the residual network block is shown in FIG. 2, and the structure of the residual convolutional neural network ResNet18 is shown in FIG. 3;
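For illustration, a minimal PyTorch sketch of this model is given below. The description fixes only the overall structure (a ResNet18 backbone with adaptive average pooling as f_b, two linear layers for f_g, two linear layers for h); the hidden widths (2048 and 512) and the ReLU between the linear layers are assumptions, not part of the disclosure.

```python
import torch.nn as nn
import torchvision

class SelfSupervisedModel(nn.Module):
    """Feature encoder f_Theta = f_g o f_b plus feature prediction head h.

    Layer widths (2048 / 512) and the ReLU between the two linear layers
    are assumptions; only the overall structure follows the description.
    """
    def __init__(self, proj_dim=2048, pred_hidden=512):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()               # keep conv + residual blocks + adaptive average pooling
        self.f_b = backbone                       # feature extraction module (outputs 512-d features)
        self.f_g = nn.Sequential(                 # feature projection module: two linear layers
            nn.Linear(512, proj_dim),
            nn.ReLU(inplace=True),
            nn.Linear(proj_dim, proj_dim),
        )
        self.h = nn.Sequential(                   # feature prediction head: two linear layers
            nn.Linear(proj_dim, pred_hidden),
            nn.ReLU(inplace=True),
            nn.Linear(pred_hidden, proj_dim),
        )

    def forward(self, x):
        z = self.f_g(self.f_b(x))                 # feature representation z = f_Theta(x)
        p = self.h(z)                             # prediction p = h(z)
        return z, p
```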
Step 3: constructing the self-supervised continuous learning paradigm;
Self-supervised continuous learning aims to learn feature representations of images on a series of unlabeled tasks {T_t}, t = 1, ..., T, that arrive in order, where each task has a data set D_t with a different distribution. In general, an image x is randomly sampled from the data set, and two image transformation operations are applied to x to obtain two correlated views x_1 and x_2 of the image. One view x_1 is encoded by the feature encoder to obtain its feature z_1 = f_Θ(x_1); similarly, the feature z_2 = f_Θ(x_2) of the other view x_2 is obtained. The goal of self-supervised continuous learning is that, at any time τ during training, the model can learn the image representations of the historical tasks {T_1, ..., T_{τ-1}} and of the current task T_τ:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) ]
where Ê, computed on a mini-batch B_t of samples drawn from the data set D_t, t = 1, ..., τ, approximates the expectation operator E, and x_{i,t} denotes the i-th sample of the mini-batch randomly sampled from D_t. The loss term L_SSL(·) is the self-supervised learning loss; here the self-supervised loss of SimSiam is used:
L_SSL(x; Θ) = (1/2) D(p_1, stopgrad(z_2)) + (1/2) D(p_2, stopgrad(z_1)),
D(p, z) = − (p / ||p||_2) · (z / ||z||_2)
where z_k = f_Θ(x_k) is the feature representation of view x_k produced by the feature encoder, p_k = h(z_k) is the prediction of that feature representation produced by the feature prediction head, stopgrad(·) denotes stopping the back-propagation of the gradient through the variable, and ||·||_2 is the two-norm operator;
However, achieving this goal of self-supervised learning is challenging: in the continuous learning setting it is usually assumed that the data of historical tasks are not available, i.e., the optimal parameters Θ* of the model on the data sets D_t, t = 1, ..., τ, must be solved without access to the data sets D_t, t = 1, ..., τ−1. Therefore, continuous learning strategies need to be introduced to help the model maintain its performance on the historical tasks while learning the current task;
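As a minimal sketch of the loss just described (assuming the standard SimSiam form with negative cosine similarity and stop-gradient), the per-batch self-supervised term could be computed as follows; `model` is the SelfSupervisedModel sketched after step 2.

```python
import torch
import torch.nn.functional as F

def D(p, z):
    """Negative cosine similarity with stop-gradient applied to z (SimSiam)."""
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def ssl_loss(model, x1, x2):
    """Symmetric self-supervised loss L_SSL over two augmented views of the same images."""
    z1, p1 = model(x1)
    z2, p2 = model(x2)
    return 0.5 * D(p1, z2) + 0.5 * D(p2, z1)
```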
Step 4: establishing the information loss mechanism;
The InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard the unimportant features in the image and keep only the important ones. If the image patch fed to a neuron contains little information, the InfoDrop mechanism sets the output of that neuron to zero with a higher probability; otherwise it keeps the output of the neuron. Specifically, under a Boltzmann distribution, the dropping coefficient of the output a_j^{l,c} of the j-th neuron of the c-th channel in the l-th layer of the neural network is computed as:
P(drop a_j^{l,c}) ∝ exp( − I(x_j^l) / T ),   I(x_j^l) = − log p(x_j^l)
where x_j^l is the input patch of the j-th neuron of the c-th channel in the l-th layer, and I(x_j^l) is defined as its self-information. When the self-information of the input patch of a neuron is low, the output of that neuron is discarded with a higher probability, which prompts the neural network to pay less attention to the low-information regions of the image. T is a temperature coefficient and acts as a "soft threshold" of the InfoDrop mechanism: when T becomes small, i.e., the threshold decreases, most patches are kept and only the few patches with the lowest self-information are dropped; when T becomes infinite, i.e., the threshold becomes high, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are dropped with equal probability. p(·) is the probability distribution of x_j^l;
To approximate the distribution p(x_j^l), the InfoDrop mechanism assumes that the patches in the neighborhood of x_j^l are all sampled from p(·); when the patches near x_j^l repeat its pattern, x_j^l has a higher p(x_j^l) and therefore low self-information. The estimate of the distribution p(x_j^l) is defined as:
p̂(x_j^l) ∝ Σ_{x_k^l ∈ N_R(x_j^l)} exp( − ||x_j^l − x_k^l||² / (2h²) )
where R denotes the Manhattan radius of the neighborhood N_R(x_j^l) of x_j^l, ||·|| denotes the Euclidean distance, and h is the bandwidth. From this estimate it can be observed that the more the patch x_j^l differs from the patches in its neighborhood, the more self-information it contains, i.e., its output will be set to zero with a lower probability;
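A rough sketch of this dropping step is shown below. It estimates the self-information of the 3×3 input patch at every spatial position from the patches in a Manhattan-radius neighborhood and zeroes low-information positions more often; the toroidal shifts, the per-position (rather than per-neuron) mask and the normalization to an average drop rate `drop_rate` are simplifications and assumptions, not the exact procedure of the description.

```python
import torch
import torch.nn.functional as F

def infodrop(x, kernel_size=3, radius=2, bandwidth=1.0, temperature=0.1, drop_rate=0.2):
    """Sketch of InfoDrop masking for one convolutional layer.

    x: feature map of shape (B, C, H, W). For every spatial position the
    self-information of its k x k patch is estimated by a kernel density over
    the patches in a Manhattan-radius-R neighborhood; positions whose patches
    carry little information are zeroed with higher probability ~ exp(-I / T).
    """
    B, C, H, W = x.shape
    patches = F.unfold(x, kernel_size, padding=kernel_size // 2)   # (B, C*k*k, H*W)
    patches = patches.transpose(1, 2).reshape(B, H, W, -1)         # one flattened patch per position

    density = torch.zeros(B, H, W, device=x.device)
    for dy in range(-radius, radius + 1):                          # neighborhood of Manhattan radius R
        for dx in range(-radius, radius + 1):
            if (dy == 0 and dx == 0) or abs(dy) + abs(dx) > radius:
                continue
            neighbour = torch.roll(patches, shifts=(dy, dx), dims=(1, 2))  # toroidal shift (simplification)
            dist2 = ((patches - neighbour) ** 2).sum(-1)           # squared Euclidean distance
            density = density + torch.exp(-dist2 / (2 * bandwidth ** 2))

    self_info = -torch.log(density + 1e-6)                         # I(x_j) = -log p_hat(x_j)
    boltzmann = torch.exp(-self_info / temperature)                # drop coefficient ~ exp(-I / T)
    drop_prob = drop_rate * boltzmann / (boltzmann.mean(dim=(1, 2), keepdim=True) + 1e-6)
    keep = torch.bernoulli(1.0 - drop_prob.clamp(0.0, 1.0))        # low-information positions dropped more often
    return x * keep.unsqueeze(1)                                   # broadcast the mask over channels
```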
Step 5: constructing the self-supervised continuous learning framework based on the information loss mechanism;
The model is expected to learn, on the data set of the current task, the feature representations of the regions of the image that carry important information, and to ignore the features of the unimportant regions, so that at least the key feature representations can be learned under the limited model capacity. In general, the InfoDrop mechanism is applied while the neural network model is optimized on the training set and removed when the performance of the model is verified on the test set; however, because the InfoDrop mechanism discards most of the regions of low self-information in the image, a large distribution deviation appears between the training data set and the test data set, which affects the performance of the model on the test set. Therefore, the model with the InfoDrop mechanism removed is usually optimized a second time on the training set before the model is tested. This second optimization consumes additional training time and also re-introduces the influence of the unimportant information regions of the image on the model. To avoid the adverse effects brought by the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model. When the model is trained on task T_τ, an InfoDrop loss is introduced on top of the self-supervised loss term, and the following self-supervised learning paradigm with the InfoDrop mechanism is constructed:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) + L_InfoDrop(x_{i,t}; Θ) ]
This self-supervised learning paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term L_InfoDrop, which measures the discrepancy between the features produced with and without the InfoDrop mechanism. Here f̂_Θ denotes the model with the InfoDrop mechanism, the feature representation of x_{i,t} produced by f̂_Θ is denoted ẑ_{i,t}, and f̂_Θ shares the network weights with f_Θ. By minimizing the InfoDrop regularization term, the feature z_{i,t} of the model f_Θ without the InfoDrop mechanism is made to approximate the feature ẑ_{i,t} of the model f̂_Θ with the InfoDrop mechanism, which prompts the model f_Θ to actively capture the features of the regions with important information and to ignore the unimportant features even when the InfoDrop mechanism is not applied. A schematic diagram of the framework of the method is shown in FIG. 4.
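The description specifies only that the InfoDrop regularization term pulls the feature of the model without InfoDrop towards the feature of the weight-sharing model with InfoDrop; the concrete distance and the weighting coefficient `lam` in the sketch below are therefore assumptions.

```python
def training_loss(model, model_infodrop, ssl_loss_fn, x1, x2, lam=1.0):
    """Self-supervised loss plus an InfoDrop regularization term.

    model          : encoder/predictor without the InfoDrop mechanism (f_Theta)
    model_infodrop : the same network with the InfoDrop mask enabled, sharing weights with `model`
    ssl_loss_fn    : the self-supervised loss, e.g. the SimSiam-style loss sketched after step 3
    The two-norm distance and the weight `lam` are illustrative assumptions.
    """
    loss_ssl = ssl_loss_fn(model, x1, x2)         # first term: original self-supervised loss
    z1, _ = model(x1)                             # feature z of the model without InfoDrop
    z1_hat, _ = model_infodrop(x1)                # feature z_hat of the model with InfoDrop
    reg = (z1 - z1_hat).norm(dim=-1).mean()       # regularization term pulling z towards z_hat
    return loss_ssl + lam * reg
```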
Step 6: (1) process the data set according to step 1 to obtain the data sets of a plurality of tasks; (2) construct the self-supervised learning model according to step 2; (3) train the model on the training set of each task in the order in which the tasks arrive;
Step 7: evaluating the performance of the model with the KNN algorithm;
On task T_t, the accuracy of the model f_Θ is tested with the KNN classification algorithm:
(1) Convert the training set {(x_i, y_i)} of task T_t into a feature bank {(v_i, y_i)}, where v_i = f_Θ(x_i);
(2) Based on the feature bank, predict the label ŷ_i of each sample x_i of the test set of task T_t:
a) Compute the similarity between the feature f_i of the test sample x_i and each feature in the feature bank: s_ij = cos(f_i, v_j);
b) Take the K entries with the largest s_ij as the K-nearest-neighbor set N_K of the test sample x_i and compute the scores of the test sample x_i over the C categories; the category with the highest score is the predicted classification of the test sample. The score of the test sample x_i on the j-th category is computed as:
score_i(j) = Σ_{(v_k, y_k) ∈ N_K} exp(s_ik / T) · 1(y_k = j)
where T is a temperature parameter and 1(·) is the indicator function; the predicted category of the test sample x_i is ŷ_i = argmax_j score_i(j);
c) Compute the test accuracy of the model f_Θ on task T_t:
Acc_t = (1 / N_t) Σ_i 1(ŷ_i = y_i)
where N_t is the number of samples in the test set of task T_t;
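A sketch of this evaluation protocol in PyTorch might look as follows; `encoder` stands for the network used to extract the features compared in the KNN test (f_Θ as written in step 7, or the feature extraction module f_b as in step 8), and the default values of K and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knn_accuracy(encoder, train_loader, test_loader, num_classes, k=200, temperature=0.1, device="cuda"):
    """KNN evaluation of step 7: build a feature bank from the training set, then
    score each test feature against its K nearest bank entries with exp(cos/T) weights."""
    encoder.eval()
    bank, bank_labels = [], []
    for x, y in train_loader:                                   # (1) feature bank v_i
        bank.append(F.normalize(encoder(x.to(device)), dim=1))
        bank_labels.append(y.to(device))
    bank = torch.cat(bank)                                      # (N, d)
    bank_labels = torch.cat(bank_labels)

    correct = total = 0
    for x, y in test_loader:                                    # (2) predict test labels
        f = F.normalize(encoder(x.to(device)), dim=1)
        sim = f @ bank.t()                                      # s_ij = cos(f_i, v_j)
        topk_sim, topk_idx = sim.topk(k, dim=1)                 # K nearest neighbours
        weights = (topk_sim / temperature).exp()
        one_hot = F.one_hot(bank_labels[topk_idx], num_classes).float()
        scores = (weights.unsqueeze(-1) * one_hot).sum(1)       # class scores
        pred = scores.argmax(1)
        correct += (pred == y.to(device)).sum().item()
        total += y.size(0)
    return correct / total                                      # (c) test accuracy
```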
Step 8: after the model has been trained on each task, the feature extraction module f_b in the feature encoder f_Θ of the model is used to characterize the images of the test set of each task, and the validity of the model's representations is then evaluated with the KNN classification algorithm. The test results are shown in Table 1.
The innovations here are:
(1) The invention establishes, on the basis of the InfoDrop mechanism, a framework that prompts the self-supervised model to extract important features on continuous tasks. On a continuous learning task the model, because of its limited capacity, must trade off between preserving the feature-representation ability for past tasks and learning the feature-representation ability for the current task. By discarding unimportant image information, the framework makes the model attend only to the feature representations of the important image information, which relieves the limitation of model capacity and improves the performance of the self-supervised model without introducing samples of historical tasks or parameter information of historical models.
(2) The invention designs an InfoDrop loss term based on the self-supervised loss model; by optimizing this loss term, the model retains the ability to directly extract the important feature representations of test samples after the InfoDrop mechanism is removed in the testing stage, so fine-tuning of the model is avoided.
Drawings
FIG. 1 is a structural diagram of the convolutional network block of the method of the present invention
FIG. 2 is a structural diagram of the residual network block of the method of the present invention
FIG. 3 is a structural diagram of the residual convolutional neural network ResNet18 of the method of the present invention
FIG. 4 is a schematic diagram of the framework of the method of the present invention
Detailed Description
Step 1: preprocessing the data set;
The CIFAR-10 data set (http://www.cs.toronto.edu/~kriz/cifar.html) is downloaded; it contains real-world color pictures of 10 categories, each category containing 5000 training pictures and 1000 test pictures with an image resolution of 32 × 32. The CIFAR-10 data set is divided into 5 tasks, the data set of each task containing two randomly selected image categories, and the image categories of the data sets of different tasks do not overlap.
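A sketch of this task split (assuming torchvision's CIFAR-10 loader; the normalization statistics and the seed handling are assumptions) could be:

```python
import random
from torchvision import datasets, transforms
from torch.utils.data import Subset

def make_task_datasets(root="./data", num_tasks=5, seed=0):
    """Split CIFAR-10 into 5 disjoint tasks of 2 randomly chosen classes each."""
    tf = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),   # assumed CIFAR-10 statistics
                             (0.2470, 0.2435, 0.2616)),
    ])
    train = datasets.CIFAR10(root, train=True, download=True, transform=tf)
    test = datasets.CIFAR10(root, train=False, download=True, transform=tf)

    classes = list(range(10))
    random.Random(seed).shuffle(classes)
    tasks = []
    for t in range(num_tasks):
        cls = set(classes[2 * t: 2 * t + 2])              # two non-overlapping classes per task
        tr_idx = [i for i, y in enumerate(train.targets) if y in cls]
        te_idx = [i for i, y in enumerate(test.targets) if y in cls]
        tasks.append((Subset(train, tr_idx), Subset(test, te_idx)))
    return tasks
```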
Step 2: constructing the self-supervised learning model;
The self-supervised learning model consists of a feature encoder f_Θ and a feature prediction head h. The feature encoder f_Θ is formed by cascading a feature extraction module f_b and a feature projection module f_g:
f_Θ = f_g ∘ f_b
The feature extraction module is constructed with the residual convolutional neural network ResNet18: its first layer is a convolutional network block, its second to fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer; the feature projection module is formed by cascading two linear layers. The input of the feature encoder f_Θ is an image x and its output is the feature representation z = f_Θ(x) of the image. The feature prediction head h is formed by cascading two linear layers; its input is the image feature z and its output is the prediction p = h(z) of the feature representation.
The structure of the convolutional network block is shown in FIG. 1, the structure of the residual network block is shown in FIG. 2, and the structure of the residual convolutional neural network ResNet18 is shown in FIG. 3.
Step 3: constructing the self-supervised continuous learning paradigm;
Self-supervised continuous learning aims to learn feature representations of images on a series of unlabeled tasks {T_t}, t = 1, ..., T, that arrive in order, where each task has a data set D_t with a different distribution. In general, an image x is randomly sampled from the data set D_t, and two image transformation operations are applied to x to obtain two correlated views x_1 and x_2 of the image. One view x_1 is encoded by the feature encoder to obtain its feature z_1 = f_Θ(x_1); similarly, the feature z_2 = f_Θ(x_2) of the other view x_2 is obtained. The goal of self-supervised continuous learning is that, at any time τ during training, the model can learn the image representations of the historical tasks {T_1, ..., T_{τ-1}} and of the current task T_τ:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) ]
where Ê, computed on a mini-batch B_t of samples drawn from the data set D_t, t = 1, ..., τ, approximates the expectation operator E, and x_{i,t} denotes the i-th sample of the mini-batch randomly sampled from D_t. The loss term L_SSL(·) is the self-supervised learning loss; here the self-supervised loss of SimSiam is used:
L_SSL(x; Θ) = (1/2) D(p_1, stopgrad(z_2)) + (1/2) D(p_2, stopgrad(z_1)),
D(p, z) = − (p / ||p||_2) · (z / ||z||_2)
where z_k = f_Θ(x_k) is the feature representation of view x_k produced by the feature encoder, p_k = h(z_k) is the prediction of that feature representation produced by the feature prediction head, stopgrad(·) denotes stopping the back-propagation of the gradient through the variable, and ||·||_2 is the two-norm operator.
However, achieving this goal of self-supervised learning is challenging: in the continuous learning setting it is usually assumed that the data of historical tasks are not available, i.e., the optimal parameters Θ* of the model on the data sets D_t, t = 1, ..., τ, must be solved without access to the data sets D_t, t = 1, ..., τ−1. Therefore, continuous learning strategies need to be introduced to help the model keep its performance on the historical tasks while learning the current task.
Step 4: establishing the information loss mechanism;
The InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard the unimportant features in the image and keep only the important ones. If the image patch fed to a neuron contains little information, the InfoDrop mechanism sets the output of that neuron to zero with a higher probability; otherwise it keeps the output of the neuron. Specifically, under a Boltzmann distribution, the dropping coefficient of the output a_j^{l,c} of the j-th neuron of the c-th channel in the l-th layer of the neural network is computed as:
P(drop a_j^{l,c}) ∝ exp( − I(x_j^l) / T ),   I(x_j^l) = − log p(x_j^l)
where x_j^l is the input patch of the j-th neuron of the c-th channel in the l-th layer, and I(x_j^l) is defined as its self-information. When the self-information of the input patch of a neuron is low, the output of that neuron is discarded with a higher probability, which prompts the neural network to pay less attention to the low-information regions of the image. T is a temperature coefficient and acts as a "soft threshold" of the InfoDrop mechanism: when T becomes small, i.e., the threshold decreases, most patches are kept and only the few patches with the lowest self-information are dropped; when T becomes infinite, i.e., the threshold becomes high, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are dropped with equal probability. p(·) is the probability distribution of x_j^l.
To approximate the distribution p(x_j^l), the InfoDrop mechanism assumes that the patches in the neighborhood of x_j^l are all sampled from p(·); when the patches near x_j^l repeat its pattern, x_j^l has a higher p(x_j^l) and therefore low self-information. The estimate of the distribution p(x_j^l) is defined as:
p̂(x_j^l) ∝ Σ_{x_k^l ∈ N_R(x_j^l)} exp( − ||x_j^l − x_k^l||² / (2h²) )
where R denotes the Manhattan radius of the neighborhood N_R(x_j^l) of x_j^l, ||·|| denotes the Euclidean distance, and h is the bandwidth. From this estimate it can be observed that the more the patch x_j^l differs from the patches in its neighborhood, the more self-information it contains, i.e., its output will be set to zero with a lower probability.
Step 5: constructing the self-supervised continuous learning framework based on the information loss mechanism;
The model is expected to learn, on the data set of the current task, only the feature representations of the regions of the image that carry important information, and to ignore the features of the unimportant regions, so that at least the key feature representations can be learned under the limited model capacity. In general, the InfoDrop mechanism is applied while the neural network model is optimized on the training set and removed when the performance of the model is verified on the test set; however, because the InfoDrop mechanism discards most of the regions of low self-information in the image, a large distribution deviation appears between the training data set and the test data set, which affects the performance of the model on the test set. Therefore, the model with the InfoDrop mechanism removed is usually optimized a second time on the training set before the model is tested. This second optimization consumes additional training time and also re-introduces the influence of the unimportant information regions of the image on the model. To avoid the adverse effects brought by the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model. When the model is trained on task T_τ, an InfoDrop loss is introduced on top of the self-supervised loss term, and the following self-supervised learning paradigm with the InfoDrop mechanism is constructed:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) + L_InfoDrop(x_{i,t}; Θ) ]
This self-supervised learning paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term L_InfoDrop, which measures the discrepancy between the features produced with and without the InfoDrop mechanism. Here f̂_Θ denotes the model with the InfoDrop mechanism, the feature representation of x_{i,t} produced by f̂_Θ is denoted ẑ_{i,t}, and f̂_Θ shares the network weights with f_Θ. By minimizing the InfoDrop regularization term, the feature z_{i,t} of the model f_Θ without the InfoDrop mechanism is made to approximate the feature ẑ_{i,t} of the model f̂_Θ with the InfoDrop mechanism, which prompts the model f_Θ to actively capture the features of the regions with important information and to ignore the unimportant features even when the InfoDrop mechanism is not applied. A schematic diagram of the framework of the method is shown in FIG. 4.
Step 6: process the data set according to step 1 to obtain the data sets of a plurality of tasks; construct the self-supervised learning model according to step 2, and train the model on the training set of each task in the order in which the tasks arrive.
Step 7: evaluating the performance of the model with the KNN algorithm;
On task T_t, the accuracy of the model f_Θ is tested with the KNN classification algorithm:
(1) Convert the training set {(x_i, y_i)} of task T_t into a feature bank {(v_i, y_i)}, where v_i = f_b(x_i);
(2) Based on the feature bank, predict the label ŷ_i of each sample x_i of the test set of task T_t:
a) Compute the similarity between the feature f_i of the test sample x_i and each feature in the feature bank: s_ij = cos(f_i, v_j);
b) Take the K entries with the largest s_ij as the K-nearest-neighbor set N_K of the test sample x_i and compute the scores of the test sample x_i over the C categories; the category with the highest score is the predicted category of the test sample. The score of the test sample x_i on the j-th category is computed as:
score_i(j) = Σ_{(v_k, y_k) ∈ N_K} exp(s_ik / T) · 1(y_k = j)
where T is a temperature parameter and 1(·) is the indicator function; the predicted category of the test sample x_i is ŷ_i = argmax_j score_i(j).
c) Compute the test accuracy of the model f_Θ on task T_t:
Acc_t = (1 / N_t) Σ_i 1(ŷ_i = y_i)
where N_t is the number of samples in the test set of task T_t.
Step 8: after the model has been trained on each task, the feature extraction module f_b in the feature encoder f_Θ of the model is used to characterize the images of the test set of each task, and the validity of the model's representations is then evaluated with the KNN classification algorithm. The test results are shown in Table 1. The superiority of the proposed self-supervised continuous learning framework based on the information loss mechanism is verified on 5 typical continuous learning strategies: FINETUNE, DER, SI, LUMP and CASSLE. It can be seen from Table 1 that the self-supervised continuous learning framework provided by the invention significantly alleviates the catastrophic forgetting phenomenon and improves the accuracy of the model on each task.
Picture size: 32 × 32 × 3
Picture categories: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks.
Learning rate: 0.003
Training batch size N: 256
Number of iterations: 200
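Collected into a single configuration, these reported settings might be expressed as the dictionary below; only the listed values come from the description, and whether the 200 iterations are epochs per task is not specified.

```python
# Hypothetical training configuration assembled from the reported settings.
config = {
    "dataset": "CIFAR-10",                 # 32x32x3 color images, 10 classes
    "num_tasks": 5,                        # two classes per task
    "learning_rate": 0.003,
    "batch_size": 256,                     # training batch size N
    "iterations": 200,                     # "iteration times" as reported above
    "evaluation": "KNN classifier on test-set features",
}
```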
Table 1 shows the experimental results of the method of the present invention.

Claims (1)

1. A continuous image feature extraction method based on an information loss mechanism, comprising the following steps:
step 1: preprocessing the data set;
acquiring real-world object images, labeling the real images according to the categories of the objects they contain, normalizing the pixel values of all pictures, scaling and cropping the pictures, and dividing the images into a plurality of data sets, each data set containing images of different categories;
Step 2: constructing the self-supervised learning model;
the self-supervised learning model consists of a feature encoder f_Θ and a feature prediction head h; the feature encoder f_Θ is formed by cascading a feature extraction module f_b and a feature projection module f_g:
f_Θ = f_g ∘ f_b
the feature extraction module is constructed with the residual convolutional neural network ResNet18: its first layer is a convolutional network block, its second to fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer; the feature projection module is formed by cascading two linear layers; the input of the feature encoder f_Θ is an image x and its output is the feature representation z = f_Θ(x) of the image; the feature prediction head h is formed by cascading two linear layers, its input is the image feature z and its output is the prediction p = h(z) of the feature representation;
And step 3: constructing a self-supervision continuous learning paradigm;
self-supervised continuous learning addresses unlabeled tasks in a series of ordered arrivals
Figure FDA0003926543730000014
Feature representation of the upper learning image with a different distribution of data sets ≥ on each task>
Figure FDA0003926543730000015
Generally, an image x is randomly sampled from a data set, and then two image transformation operations are respectively performed on the image x to obtain images x of two related view angles 1 And x 2 (ii) a One view x of an image using a feature encoder 1 Performing feature encoding to obtain its feature z 1 =f(x 1 ) Similarly, another view x can be obtained 2 Characteristic z of 2 =f(x 2 ) (ii) a The goal of self-supervised continuous learning is to allow the model to learn about the historical task T at any time τ in the training 1 ,...,T τ-1 And the current task T τ The image of (1) represents:
Figure FDA0003926543730000016
wherein in small batches of samples
Figure FDA0003926543730000017
Up-count loss term->
Figure FDA0003926543730000018
To approximate the desired operator
Figure FDA0003926543730000019
x i,t Represents slave data set->
Figure FDA00039265437300000110
Sampling an ith sample in the small batch of samples obtained by up-random sampling; loss term->
Figure FDA00039265437300000111
For the purpose of self-supervised learning loss, the self-supervised loss calculation formula in simsim is used here:
Figure FDA00039265437300000112
Figure FDA00039265437300000113
wherein
Figure FDA0003926543730000021
Is that the feature encoder is for->
Figure FDA0003926543730000022
Is greater than or equal to>
Figure FDA0003926543730000023
Is that the characteristic prediction header relates to>
Figure FDA0003926543730000024
Prediction of feature representation of
Figure FDA0003926543730000025
Stopgrad (. Cndot.) denotes stopping the gradient back propagation of the variable; i | · | purple wind 2 Is a two-norm operator;
however, achieving the goal of self-supervised learning is challenging; since in a continuous learning setting it is usually assumed that data from historical tasks is not available, i.e. required in inaccessible data sets
Figure FDA0003926543730000026
While solving for the model at the data set->
Figure FDA0003926543730000027
Optimum parameter theta of (2) * (ii) a Therefore, some continuous learning strategies need to be introduced to help the model to keep its performance on the historical task while learning the current task;
Step 4: establishing the information loss mechanism;
the InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard the unimportant features in the image and keep only the important ones; if the image patch fed to a neuron contains little information, the InfoDrop mechanism sets the output of that neuron to zero with a higher probability, otherwise it keeps the output of the neuron; specifically, under a Boltzmann distribution, the dropping coefficient of the output a_j^{l,c} of the j-th neuron of the c-th channel in the l-th layer of the neural network is computed as:
P(drop a_j^{l,c}) ∝ exp( − I(x_j^l) / T ),   I(x_j^l) = − log p(x_j^l)
wherein x_j^l is the input patch of the j-th neuron of the c-th channel in the l-th layer, and I(x_j^l) is defined as its self-information; when the self-information of the input patch of a neuron is low, the output of that neuron is discarded with a higher probability, which prompts the neural network to pay less attention to the low-information regions of the image; T is a temperature coefficient and acts as a "soft threshold" of the InfoDrop mechanism: when T becomes small, i.e., the threshold decreases, most patches are kept and only the few patches with the lowest self-information are dropped; when T becomes infinite, i.e., the threshold becomes high, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are dropped with equal probability; p(·) is the probability distribution of x_j^l;
to approximate the distribution p(x_j^l), the InfoDrop mechanism assumes that the patches in the neighborhood of x_j^l are all sampled from p(·); when the patches near x_j^l repeat its pattern, x_j^l has a higher p(x_j^l) and therefore low self-information; the estimate of the distribution p(x_j^l) is defined as:
p̂(x_j^l) ∝ Σ_{x_k^l ∈ N_R(x_j^l)} exp( − ||x_j^l − x_k^l||² / (2h²) )
wherein R denotes the Manhattan radius of the neighborhood N_R(x_j^l) of x_j^l, ||·|| denotes the Euclidean distance, and h is the bandwidth; from this estimate it can be observed that the more the patch x_j^l differs from the patches in its neighborhood, the more self-information it contains, i.e., its output will be set to zero with a lower probability;
Step 5: constructing the self-supervised continuous learning framework based on the information loss mechanism;
the model is expected to learn, on the data set of the current task, the feature representations of the regions of the image that carry important information, and to ignore the features of the unimportant regions, so that at least the key feature representations can be learned under the limited model capacity; in general, the InfoDrop mechanism is applied while the neural network model is optimized on the training set and removed when the performance of the model is verified on the test set; however, because the InfoDrop mechanism discards most of the regions of low self-information in the image, a large distribution deviation appears between the training data set and the test data set, which affects the performance of the model on the test set; therefore, the model with the InfoDrop mechanism removed is usually optimized a second time on the training set before the model is tested; this second optimization consumes additional training time and also re-introduces the influence of the unimportant information regions of the image on the model; to avoid the adverse effects brought by the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model; when the model is trained on task T_τ, an InfoDrop loss is introduced on top of the self-supervised loss term, and the following self-supervised learning paradigm with the InfoDrop mechanism is constructed:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) + L_InfoDrop(x_{i,t}; Θ) ]
this self-supervised learning paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term L_InfoDrop, which measures the discrepancy between the features produced with and without the InfoDrop mechanism; f̂_Θ denotes the model with the InfoDrop mechanism, the feature representation of x_{i,t} produced by f̂_Θ is denoted ẑ_{i,t}, and f̂_Θ shares the network weights with f_Θ; by minimizing the InfoDrop regularization term, the feature z_{i,t} of the model f_Θ without the InfoDrop mechanism is made to approximate the feature ẑ_{i,t} of the model f̂_Θ with the InfoDrop mechanism, which prompts the model f_Θ to actively capture the features of the regions with important information and to ignore the unimportant features even when the InfoDrop mechanism is not applied;
Step 6: (1) process the data set according to step 1 to obtain the data sets of a plurality of tasks; (2) construct the self-supervised learning model according to step 2; (3) train the model on the training set of each task in the order in which the tasks arrive;
Step 7: evaluating the performance of the model with the KNN algorithm;
on task T_t, the accuracy of the model f_Θ is tested with the KNN classification algorithm:
(1) convert the training set {(x_i, y_i)} of task T_t into a feature bank {(v_i, y_i)}, where v_i = f_Θ(x_i);
(2) based on the feature bank, predict the label ŷ_i of each sample x_i of the test set of task T_t:
a) compute the similarity between the feature f_i of the test sample x_i and each feature in the feature bank: s_ij = cos(f_i, v_j);
b) take the K entries with the largest s_ij as the K-nearest-neighbor set N_K of the test sample x_i and compute the scores of the test sample x_i over the C categories; the category with the highest score is the predicted category of the test sample; the score of the test sample x_i on the j-th category is computed as:
score_i(j) = Σ_{(v_k, y_k) ∈ N_K} exp(s_ik / T) · 1(y_k = j)
wherein T is a temperature parameter and 1(·) is the indicator function; the predicted category of the test sample x_i is ŷ_i = argmax_j score_i(j);
c) compute the test accuracy of the model f_Θ on task T_t:
Acc_t = (1 / N_t) Σ_i 1(ŷ_i = y_i)
wherein N_t is the number of samples in the test set of task T_t;
Step 8: after the model has been trained on each task, the feature extraction module f_b in the feature encoder f_Θ of the model is used to characterize the images of the test set of each task, and the validity of the model's representations is evaluated with the KNN classification algorithm.
CN202211375805.5A 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism Active CN115952851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211375805.5A CN115952851B (en) 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211375805.5A CN115952851B (en) 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism

Publications (2)

Publication Number Publication Date
CN115952851A true CN115952851A (en) 2023-04-11
CN115952851B CN115952851B (en) 2024-10-01

Family

ID=87288106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211375805.5A Active CN115952851B (en) 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism

Country Status (1)

Country Link
CN (1) CN115952851B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109690576A (en) * 2016-07-18 2019-04-26 渊慧科技有限公司 Training machine learning models on multiple machine learning tasks
CN114612847A (en) * 2022-03-31 2022-06-10 长沙理工大学 Method and system for detecting distortion of Deepfake video
CN114758195A (en) * 2022-05-10 2022-07-15 西安交通大学 Human motion prediction method capable of realizing continuous learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALESSANDRO ACHILLE et al.: "Information Dropout: Learning Optimal Representations Through Noisy Computation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31 December 2018 (2018-12-31), pages 2897-2905, XP011698769, DOI: 10.1109/TPAMI.2017.2784440 *
莫建文 et al.: "Incremental Learning Based on Neuron Regularization and Resource Release", Journal of South China University of Technology (Natural Science Edition), vol. 50, no. 6, 30 June 2022 (2022-06-30), pages 71-80 *

Also Published As

Publication number Publication date
CN115952851B (en) 2024-10-01

Similar Documents

Publication Publication Date Title
Ghosh et al. Structured variational learning of Bayesian neural networks with horseshoe priors
CN111444878B (en) Video classification method, device and computer readable storage medium
CN108960086B (en) Multi-pose human body target tracking method based on generation of confrontation network positive sample enhancement
CN114492574A (en) Pseudo label loss unsupervised countermeasure domain adaptive picture classification method based on Gaussian uniform mixing model
CN113449864A (en) Feedback type pulse neural network model training method for image data classification
CN110443372B (en) Transfer learning method and system based on entropy minimization
CN116312782B (en) Spatial transcriptome spot region clustering method fusing image gene data
CN113378937B (en) Small sample image classification method and system based on self-supervision enhancement
CN107945210A (en) Target tracking algorism based on deep learning and environment self-adaption
CN115331284A (en) Self-healing mechanism-based facial expression recognition method and system in real scene
CN116883751A (en) Non-supervision field self-adaptive image recognition method based on prototype network contrast learning
CN114417975A (en) Data classification method and system based on deep PU learning and class prior estimation
CN114048843A (en) Small sample learning network based on selective feature migration
CN118097228A (en) Multi-teacher auxiliary instance self-adaptive DNN-based mobile platform multi-target classification method
Zhang et al. Learning to search efficient densenet with layer-wise pruning
CN117079017A (en) Credible small sample image identification and classification method
Singh et al. Deep active transfer learning for image recognition
Hindarto Comparative Analysis VGG16 Vs MobileNet Performance for Fish Identification
CN115952851A (en) Self-supervision continuous learning method based on information loss mechanism
Połap et al. Meta-heuristic algorithm as feature selector for convolutional neural networks
CN112989088B (en) Visual relation example learning method based on reinforcement learning
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN115019342A (en) Endangered animal target detection method based on class relation reasoning
CN113553917A (en) Office equipment identification method based on pulse transfer learning
CN114120447A (en) Behavior recognition method and system based on prototype comparison learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant