CN115952851A - Self-supervision continuous learning method based on information loss mechanism - Google Patents

Self-supervision continuous learning method based on information loss mechanism

Info

Publication number
CN115952851A
CN115952851A (application CN202211375805.5A)
Authority
CN
China
Prior art keywords
model
self
image
feature
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211375805.5A
Other languages
Chinese (zh)
Other versions
CN115952851B (en)
Inventor
潘力立
杨帆
张亮
赵江伟
吴庆波
李宏亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202211375805.5A
Publication of CN115952851A
Application granted
Publication of CN115952851B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a self-supervised continuous learning method based on an information loss mechanism, which comprises: (1) an unsupervised continuous learning framework based on information loss, which makes the model learn only the important feature representations on continuous tasks; (2) an InfoDrop loss term based on the self-supervised learning paradigm, which helps the model still extract the important feature representations of test samples after the InfoDrop mechanism is removed in the testing stage. In addition, the unsupervised continuous learning framework proposed by the invention can be used together with most continuous learning strategies. By discarding unimportant image information, the model focuses only on the feature representations of the important image information, which relieves the limitation of model capacity and improves the performance of the self-supervised model without introducing samples of historical tasks or parameter information of historical models.

Description

Self-supervision continuous learning method based on information loss mechanism
Technical Field
The invention belongs to the field of image processing and mainly aims to improve the performance of self-supervised continuous learning models; the method is mainly applied to the field of image classification.
Background
In recent years, Deep Learning (DL) has achieved remarkable success in machine learning, natural language processing, and related fields. The focus of DL is to develop Deep Neural Networks (DNNs) by offline training on fixed or predefined data sets, on which they exhibit significant performance for the corresponding task. However, DNNs are also limited: a trained DNN is fixed, and the parameters inside the network do not change during operation, which means that the DNN remains static after deployment and cannot adapt to a changing environment. Real-world applications are not all static; in particular, applications involving autonomous agents must process continuously changing data. Over time, the data or tasks faced by the model may change, and a static model performs poorly in such scenarios. One possible solution is to retrain the network whenever the data distribution changes; however, complete retraining on the expanded data set is computationally intensive and infeasible in real-world resource-constrained environments, which creates the need for new algorithms that enable continuous learning with efficient use of resources.
Continuous learning has needs and challenges in many real-world scenarios: a robot needs to autonomously learn new behavioral specifications as the environment changes, so as to adapt to the new environment and complete new tasks; an automatic driving program needs to adapt to different environments, for example moving from rural roads to highways or from well-lit locations to dim environments; intelligent dialogue systems need to adapt to different users and situations; smart medical applications need to adapt to new cases, new hospitals and inconsistent medical conditions.
Continuous Learning (CL) studies the problem of learning from non-stationary data streams. It aims to expand the adaptive capacity of a model, so that the model can learn the corresponding knowledge on different tasks while remembering the features learned on historical tasks. According to whether the input data are labeled, continuous learning can be divided into Supervised Continuous Learning (SCL) and Unsupervised Continuous Learning (UCL). Supervised continuous learning usually concentrates on a series of related tasks in which the input data carry artificially assigned labels, so that the task information and task-boundary information needed for generalization are available; this setting no longer matches real situations, where task labels are unknown, task boundaries are undefined, and large amounts of class-labeled data are unavailable, which leads to unsupervised and self-supervised continuous learning methods. Self-supervised learning is a part of unsupervised learning; it aims to remove the need for manual annotation in representation learning and learns representations of data from the unannotated raw information. A true self-supervised continuous learning algorithm can use a continuously arriving data stream that is not independently and identically distributed to learn a robust and adaptive model without forgetting the knowledge already obtained.
In recent years, research on CL has focused mainly on SCL, and those results generally cannot be extended to practical application scenarios with biased data distributions. Research on UCL, which does not rely on manual annotation or supervision information, has therefore received increasing attention. Although research in the UCL field is recent, the problems are complex and results are still few, existing work has shown that relying on manually annotated data is not essential for continuous learning, that unsupervised visual representations can alleviate the problem of catastrophic forgetting, and that UCL can exhibit better performance than SCL. Reference: Madaan, D., Yoon, J., Li, Y., Liu, Y., & Hwang, S. J., "Representational Continuity for Unsupervised Continual Learning," International Conference on Learning Representations, 2021. To further improve the performance of unsupervised models, a lightweight, model-independent method, information loss (InfoDrop), has attracted attention; it improves the robustness and interpretability of the model by reducing the texture bias of Convolutional Neural Networks (CNNs). Reference: Shi, B., Zhang, D., Dai, Q., Zhu, Z., Mu, Y., & Wang, J., "Informative Dropout for Robust Representation Learning: A Shape-bias Perspective," International Conference on Machine Learning, 2020. The invention aims to combine the information loss mechanism with an unsupervised continuous learning framework, improve the performance of the model, construct a more robust and reasonable continuous learning model, and promote the forward development of unsupervised continuous learning technology.
Disclosure of Invention
The invention relates to a self-supervised continuous learning method which, by introducing an InfoDrop mechanism into a self-supervised model, leads the model to extract the important image features in continuous learning tasks. The method selects and discards unimportant image information by computing the self-information of image patches, guiding the model to attend to the important regions of the image and thereby improving the performance of the self-supervised model.
The method first constructs a self-supervised continuous learning framework based on the information loss mechanism, divides the CIFAR-10 data set into 5 tasks, trains the model on the corresponding data set in the order in which the tasks arrive, and tests the accuracy of the model with the KNN algorithm. The method is characterized in that an information loss mechanism is introduced into the self-supervised learning framework to improve the performance of the model. From the perspective of model capacity, the invention mainly does the following work: 1) constructing a self-supervised learning model and a self-supervised continuous learning paradigm; 2) establishing an information loss mechanism based on self-information and the Dropout method, which helps the model drop unimportant features in the image and keep the important ones, and integrating this mechanism into the self-supervised continuous learning framework; 3) combining an InfoDrop loss term with the self-supervised loss paradigm, which avoids having to remove the InfoDrop mechanism and fine-tune the model before testing; 4) training on the CIFAR-10 data set, testing the accuracy of the model on the test set with the KNN classification algorithm, evaluating the performance of the model, and comparing with various continuous learning strategies. Through this work, the unsupervised continuous learning method is applicable to various continuous learning strategies, can improve the performance of models under different strategies, and shows high applicability.
To facilitate the description of the present disclosure, certain terms are first defined.
Definition 1: Residual convolutional neural network (ResNet). By adding "residual connections" to the convolutional network, ResNet solves the degradation phenomenon of deep networks during training and greatly increases the trainable depth of the neural network; compared with a traditional convolutional neural network, the residual network trains better and is easier to optimize. In the present invention, the residual convolutional neural network used is the ResNet18 network.
Definition 2: Adaptive average pooling layer. The adaptive average pooling layer compresses the spatial dimensions by taking the average of the data in the corresponding dimension and adaptively outputs a result of the specified size, which can also suppress some useless features to a certain extent.
Definition 3: SimSiam. SimSiam is a simple Siamese (twin) network model that maximizes the similarity between two augmentations of one image; it learns representations without negative sample pairs, large batches, or a momentum encoder.
Definition 4: Dropout method. Dropout is a regularization method for the over-fitting problem of neural networks: a drop probability is set for the neurons of a certain layer of the network, and during training some neurons are randomly discarded according to this probability.
Definition 5: Image patch. A patch can be understood as an image block: during the operation of a convolutional neural network, the network divides the picture into many small blocks and a convolution kernel looks at only one small block at a time; such a small block is called a patch.
Definition 6: ReLU activation layer. Also called the rectified linear unit, it is a commonly used activation function in artificial neural networks, usually referring to the nonlinear function represented by the ramp function and its variants, with expression f(x) = max(0, x).
The technical scheme of the invention is a continuous image feature extraction method based on an information loss mechanism, which comprises the following steps:
step 1: preprocessing the data set;
acquiring real-world object images, labeling the real images according to the categories of the objects they contain, normalizing the pixel values of all pictures, scaling and cropping the pictures, and dividing the images into a plurality of data sets, each data set containing images of different categories;
Step 2: constructing the self-supervised learning model;
The self-supervised learning model consists of a feature encoder f_Θ and a feature prediction head h; the feature encoder f_Θ is formed by cascading a feature extraction module f_b and a feature projection module f_g:
f_Θ = f_g ∘ f_b
The feature extraction module is constructed with the residual convolutional neural network ResNet18: its first layer is a convolutional network block, its second to fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer; the feature projection module is formed by cascading two linear layers. The input of the feature encoder f_Θ is an image x and its output is the feature representation z = f_Θ(x) of the image. The feature prediction head h is formed by cascading two linear layers; its input is the image feature z and its output is the prediction p = h(z) of the feature representation.
The structure of the convolutional network block is shown in FIG. 1, the structure of the residual network block is shown in FIG. 2, and the structure of the residual convolutional neural network ResNet18 is shown in FIG. 3;
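For illustration, a minimal PyTorch sketch of this model is given below. The description fixes only the overall structure (a ResNet18 backbone with adaptive average pooling as f_b, two linear layers for f_g, two linear layers for h); the hidden widths (2048 and 512) and the ReLU between the linear layers are assumptions, not part of the disclosure.

```python
import torch.nn as nn
import torchvision

class SelfSupervisedModel(nn.Module):
    """Feature encoder f_Theta = f_g o f_b plus feature prediction head h.

    Layer widths (2048 / 512) and the ReLU between the two linear layers
    are assumptions; only the overall structure follows the description.
    """
    def __init__(self, proj_dim=2048, pred_hidden=512):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()               # keep conv + residual blocks + adaptive average pooling
        self.f_b = backbone                       # feature extraction module (outputs 512-d features)
        self.f_g = nn.Sequential(                 # feature projection module: two linear layers
            nn.Linear(512, proj_dim),
            nn.ReLU(inplace=True),
            nn.Linear(proj_dim, proj_dim),
        )
        self.h = nn.Sequential(                   # feature prediction head: two linear layers
            nn.Linear(proj_dim, pred_hidden),
            nn.ReLU(inplace=True),
            nn.Linear(pred_hidden, proj_dim),
        )

    def forward(self, x):
        z = self.f_g(self.f_b(x))                 # feature representation z = f_Theta(x)
        p = self.h(z)                             # prediction p = h(z)
        return z, p
```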
Step 3: constructing the self-supervised continuous learning paradigm;
Self-supervised continuous learning aims to learn feature representations of images on a series of unlabeled tasks {T_t}, t = 1, ..., T, that arrive in order, where each task has a data set D_t with a different distribution. In general, an image x is randomly sampled from the data set, and two image transformation operations are applied to x to obtain two correlated views x_1 and x_2 of the image. One view x_1 is encoded by the feature encoder to obtain its feature z_1 = f_Θ(x_1); similarly, the feature z_2 = f_Θ(x_2) of the other view x_2 is obtained. The goal of self-supervised continuous learning is that, at any time τ during training, the model can learn the image representations of the historical tasks {T_1, ..., T_{τ-1}} and of the current task T_τ:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) ]
where Ê, computed on a mini-batch B_t of samples drawn from the data set D_t, t = 1, ..., τ, approximates the expectation operator E, and x_{i,t} denotes the i-th sample of the mini-batch randomly sampled from D_t. The loss term L_SSL(·) is the self-supervised learning loss; here the self-supervised loss of SimSiam is used:
L_SSL(x; Θ) = (1/2) D(p_1, stopgrad(z_2)) + (1/2) D(p_2, stopgrad(z_1)),
D(p, z) = − (p / ||p||_2) · (z / ||z||_2)
where z_k = f_Θ(x_k) is the feature representation of view x_k produced by the feature encoder, p_k = h(z_k) is the prediction of that feature representation produced by the feature prediction head, stopgrad(·) denotes stopping the back-propagation of the gradient through the variable, and ||·||_2 is the two-norm operator;
However, achieving this goal of self-supervised learning is challenging: in the continuous learning setting it is usually assumed that the data of historical tasks are not available, i.e., the optimal parameters Θ* of the model on the data sets D_t, t = 1, ..., τ, must be solved without access to the data sets D_t, t = 1, ..., τ−1. Therefore, continuous learning strategies need to be introduced to help the model maintain its performance on the historical tasks while learning the current task;
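As a minimal sketch of the loss just described (assuming the standard SimSiam form with negative cosine similarity and stop-gradient), the per-batch self-supervised term could be computed as follows; `model` is the SelfSupervisedModel sketched after step 2.

```python
import torch
import torch.nn.functional as F

def D(p, z):
    """Negative cosine similarity with stop-gradient applied to z (SimSiam)."""
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def ssl_loss(model, x1, x2):
    """Symmetric self-supervised loss L_SSL over two augmented views of the same images."""
    z1, p1 = model(x1)
    z2, p2 = model(x2)
    return 0.5 * D(p1, z2) + 0.5 * D(p2, z1)
```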
Step 4: establishing the information loss mechanism;
The InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard the unimportant features in the image and keep only the important ones. If the image patch fed to a neuron contains little information, the InfoDrop mechanism sets the output of that neuron to zero with a higher probability; otherwise it keeps the output of the neuron. Specifically, under a Boltzmann distribution, the dropping coefficient of the output a_j^{l,c} of the j-th neuron of the c-th channel in the l-th layer of the neural network is computed as:
P(drop a_j^{l,c}) ∝ exp( − I(x_j^l) / T ),   I(x_j^l) = − log p(x_j^l)
where x_j^l is the input patch of the j-th neuron of the c-th channel in the l-th layer, and I(x_j^l) is defined as its self-information. When the self-information of the input patch of a neuron is low, the output of that neuron is discarded with a higher probability, which prompts the neural network to pay less attention to the low-information regions of the image. T is a temperature coefficient and acts as a "soft threshold" of the InfoDrop mechanism: when T becomes small, i.e., the threshold decreases, most patches are kept and only the few patches with the lowest self-information are dropped; when T becomes infinite, i.e., the threshold becomes high, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are dropped with equal probability. p(·) is the probability distribution of x_j^l;
To approximate the distribution p(x_j^l), the InfoDrop mechanism assumes that the patches in the neighborhood of x_j^l are all sampled from p(·); when the patches near x_j^l repeat its pattern, x_j^l has a higher p(x_j^l) and therefore low self-information. The estimate of the distribution p(x_j^l) is defined as:
p̂(x_j^l) ∝ Σ_{x_k^l ∈ N_R(x_j^l)} exp( − ||x_j^l − x_k^l||² / (2h²) )
where R denotes the Manhattan radius of the neighborhood N_R(x_j^l) of x_j^l, ||·|| denotes the Euclidean distance, and h is the bandwidth. From this estimate it can be observed that the more the patch x_j^l differs from the patches in its neighborhood, the more self-information it contains, i.e., its output will be set to zero with a lower probability;
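A rough sketch of this dropping step is shown below. It estimates the self-information of the 3×3 input patch at every spatial position from the patches in a Manhattan-radius neighborhood and zeroes low-information positions more often; the toroidal shifts, the per-position (rather than per-neuron) mask and the normalization to an average drop rate `drop_rate` are simplifications and assumptions, not the exact procedure of the description.

```python
import torch
import torch.nn.functional as F

def infodrop(x, kernel_size=3, radius=2, bandwidth=1.0, temperature=0.1, drop_rate=0.2):
    """Sketch of InfoDrop masking for one convolutional layer.

    x: feature map of shape (B, C, H, W). For every spatial position the
    self-information of its k x k patch is estimated by a kernel density over
    the patches in a Manhattan-radius-R neighborhood; positions whose patches
    carry little information are zeroed with higher probability ~ exp(-I / T).
    """
    B, C, H, W = x.shape
    patches = F.unfold(x, kernel_size, padding=kernel_size // 2)   # (B, C*k*k, H*W)
    patches = patches.transpose(1, 2).reshape(B, H, W, -1)         # one flattened patch per position

    density = torch.zeros(B, H, W, device=x.device)
    for dy in range(-radius, radius + 1):                          # neighborhood of Manhattan radius R
        for dx in range(-radius, radius + 1):
            if (dy == 0 and dx == 0) or abs(dy) + abs(dx) > radius:
                continue
            neighbour = torch.roll(patches, shifts=(dy, dx), dims=(1, 2))  # toroidal shift (simplification)
            dist2 = ((patches - neighbour) ** 2).sum(-1)           # squared Euclidean distance
            density = density + torch.exp(-dist2 / (2 * bandwidth ** 2))

    self_info = -torch.log(density + 1e-6)                         # I(x_j) = -log p_hat(x_j)
    boltzmann = torch.exp(-self_info / temperature)                # drop coefficient ~ exp(-I / T)
    drop_prob = drop_rate * boltzmann / (boltzmann.mean(dim=(1, 2), keepdim=True) + 1e-6)
    keep = torch.bernoulli(1.0 - drop_prob.clamp(0.0, 1.0))        # low-information positions dropped more often
    return x * keep.unsqueeze(1)                                   # broadcast the mask over channels
```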
Step 5: constructing the self-supervised continuous learning framework based on the information loss mechanism;
The model is expected to learn, on the data set of the current task, the feature representations of the regions of the image that carry important information, and to ignore the features of the unimportant regions, so that at least the key feature representations can be learned under the limited model capacity. In general, the InfoDrop mechanism is applied while the neural network model is optimized on the training set and removed when the performance of the model is verified on the test set; however, because the InfoDrop mechanism discards most of the regions of low self-information in the image, a large distribution deviation appears between the training data set and the test data set, which affects the performance of the model on the test set. Therefore, the model with the InfoDrop mechanism removed is usually optimized a second time on the training set before the model is tested. This second optimization consumes additional training time and also re-introduces the influence of the unimportant information regions of the image on the model. To avoid the adverse effects brought by the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model. When the model is trained on task T_τ, an InfoDrop loss is introduced on top of the self-supervised loss term, and the following self-supervised learning paradigm with the InfoDrop mechanism is constructed:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) + L_InfoDrop(x_{i,t}; Θ) ]
This self-supervised learning paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term L_InfoDrop, which measures the discrepancy between the features produced with and without the InfoDrop mechanism. Here f̂_Θ denotes the model with the InfoDrop mechanism, the feature representation of x_{i,t} produced by f̂_Θ is denoted ẑ_{i,t}, and f̂_Θ shares the network weights with f_Θ. By minimizing the InfoDrop regularization term, the feature z_{i,t} of the model f_Θ without the InfoDrop mechanism is made to approximate the feature ẑ_{i,t} of the model f̂_Θ with the InfoDrop mechanism, which prompts the model f_Θ to actively capture the features of the regions with important information and to ignore the unimportant features even when the InfoDrop mechanism is not applied. A schematic diagram of the framework of the method is shown in FIG. 4.
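The description specifies only that the InfoDrop regularization term pulls the feature of the model without InfoDrop towards the feature of the weight-sharing model with InfoDrop; the concrete distance and the weighting coefficient `lam` in the sketch below are therefore assumptions.

```python
def training_loss(model, model_infodrop, ssl_loss_fn, x1, x2, lam=1.0):
    """Self-supervised loss plus an InfoDrop regularization term.

    model          : encoder/predictor without the InfoDrop mechanism (f_Theta)
    model_infodrop : the same network with the InfoDrop mask enabled, sharing weights with `model`
    ssl_loss_fn    : the self-supervised loss, e.g. the SimSiam-style loss sketched after step 3
    The two-norm distance and the weight `lam` are illustrative assumptions.
    """
    loss_ssl = ssl_loss_fn(model, x1, x2)         # first term: original self-supervised loss
    z1, _ = model(x1)                             # feature z of the model without InfoDrop
    z1_hat, _ = model_infodrop(x1)                # feature z_hat of the model with InfoDrop
    reg = (z1 - z1_hat).norm(dim=-1).mean()       # regularization term pulling z towards z_hat
    return loss_ssl + lam * reg
```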
Step 6: (1) process the data set according to step 1 to obtain the data sets of a plurality of tasks; (2) construct the self-supervised learning model according to step 2; (3) train the model on the training set of each task in the order in which the tasks arrive;
Step 7: evaluating the performance of the model with the KNN algorithm;
On task T_t, the accuracy of the model f_Θ is tested with the KNN classification algorithm:
(1) Convert the training set {(x_i, y_i)} of task T_t into a feature bank {(v_i, y_i)}, where v_i = f_Θ(x_i);
(2) Based on the feature bank, predict the label ŷ_i of each sample x_i of the test set of task T_t:
a) Compute the similarity between the feature f_i of the test sample x_i and each feature in the feature bank: s_ij = cos(f_i, v_j);
b) Take the K entries with the largest s_ij as the K-nearest-neighbor set N_K of the test sample x_i and compute the scores of the test sample x_i over the C categories; the category with the highest score is the predicted classification of the test sample. The score of the test sample x_i on the j-th category is computed as:
score_i(j) = Σ_{(v_k, y_k) ∈ N_K} exp(s_ik / T) · 1(y_k = j)
where T is a temperature parameter and 1(·) is the indicator function; the predicted category of the test sample x_i is ŷ_i = argmax_j score_i(j);
c) Compute the test accuracy of the model f_Θ on task T_t:
Acc_t = (1 / N_t) Σ_i 1(ŷ_i = y_i)
where N_t is the number of samples in the test set of task T_t;
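A sketch of this evaluation protocol in PyTorch might look as follows; `encoder` stands for the network used to extract the features compared in the KNN test (f_Θ as written in step 7, or the feature extraction module f_b as in step 8), and the default values of K and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knn_accuracy(encoder, train_loader, test_loader, num_classes, k=200, temperature=0.1, device="cuda"):
    """KNN evaluation of step 7: build a feature bank from the training set, then
    score each test feature against its K nearest bank entries with exp(cos/T) weights."""
    encoder.eval()
    bank, bank_labels = [], []
    for x, y in train_loader:                                   # (1) feature bank v_i
        bank.append(F.normalize(encoder(x.to(device)), dim=1))
        bank_labels.append(y.to(device))
    bank = torch.cat(bank)                                      # (N, d)
    bank_labels = torch.cat(bank_labels)

    correct = total = 0
    for x, y in test_loader:                                    # (2) predict test labels
        f = F.normalize(encoder(x.to(device)), dim=1)
        sim = f @ bank.t()                                      # s_ij = cos(f_i, v_j)
        topk_sim, topk_idx = sim.topk(k, dim=1)                 # K nearest neighbours
        weights = (topk_sim / temperature).exp()
        one_hot = F.one_hot(bank_labels[topk_idx], num_classes).float()
        scores = (weights.unsqueeze(-1) * one_hot).sum(1)       # class scores
        pred = scores.argmax(1)
        correct += (pred == y.to(device)).sum().item()
        total += y.size(0)
    return correct / total                                      # (c) test accuracy
```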
Step 8: after the model has been trained on each task, the feature extraction module f_b in the feature encoder f_Θ of the model is used to characterize the images of the test set of each task, and the validity of the model's representations is then evaluated with the KNN classification algorithm. The test results are shown in Table 1.
The innovations here are:
(1) The invention establishes, on the basis of the InfoDrop mechanism, a framework that prompts the self-supervised model to extract important features on continuous tasks. On a continuous learning task the model, because of its limited capacity, must trade off between preserving the feature-representation ability for past tasks and learning the feature-representation ability for the current task. By discarding unimportant image information, the framework makes the model attend only to the feature representations of the important image information, which relieves the limitation of model capacity and improves the performance of the self-supervised model without introducing samples of historical tasks or parameter information of historical models.
(2) The invention designs an InfoDrop loss term based on the self-supervised loss model; by optimizing this loss term, the model retains the ability to directly extract the important feature representations of test samples after the InfoDrop mechanism is removed in the testing stage, so fine-tuning of the model is avoided.
Drawings
FIG. 1 is a structural diagram of the convolutional network block of the method of the present invention
FIG. 2 is a structural diagram of the residual network block of the method of the present invention
FIG. 3 is a structural diagram of the residual convolutional neural network ResNet18 of the method of the present invention
FIG. 4 is a schematic diagram of the framework of the method of the present invention
Detailed Description
Step 1: preprocessing the data set;
The CIFAR-10 data set (http://www.cs.toronto.edu/~kriz/cifar.html) is downloaded; it contains real-world color pictures of 10 categories, each category containing 5000 training pictures and 1000 test pictures with an image resolution of 32 × 32. The CIFAR-10 data set is divided into 5 tasks, the data set of each task containing two randomly selected image categories, and the image categories of the data sets of different tasks do not overlap.
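A sketch of this task split (assuming torchvision's CIFAR-10 loader; the normalization statistics and the seed handling are assumptions) could be:

```python
import random
from torchvision import datasets, transforms
from torch.utils.data import Subset

def make_task_datasets(root="./data", num_tasks=5, seed=0):
    """Split CIFAR-10 into 5 disjoint tasks of 2 randomly chosen classes each."""
    tf = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),   # assumed CIFAR-10 statistics
                             (0.2470, 0.2435, 0.2616)),
    ])
    train = datasets.CIFAR10(root, train=True, download=True, transform=tf)
    test = datasets.CIFAR10(root, train=False, download=True, transform=tf)

    classes = list(range(10))
    random.Random(seed).shuffle(classes)
    tasks = []
    for t in range(num_tasks):
        cls = set(classes[2 * t: 2 * t + 2])              # two non-overlapping classes per task
        tr_idx = [i for i, y in enumerate(train.targets) if y in cls]
        te_idx = [i for i, y in enumerate(test.targets) if y in cls]
        tasks.append((Subset(train, tr_idx), Subset(test, te_idx)))
    return tasks
```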
Step 2: constructing the self-supervised learning model;
The self-supervised learning model consists of a feature encoder f_Θ and a feature prediction head h. The feature encoder f_Θ is formed by cascading a feature extraction module f_b and a feature projection module f_g:
f_Θ = f_g ∘ f_b
The feature extraction module is constructed with the residual convolutional neural network ResNet18: its first layer is a convolutional network block, its second to fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer; the feature projection module is formed by cascading two linear layers. The input of the feature encoder f_Θ is an image x and its output is the feature representation z = f_Θ(x) of the image. The feature prediction head h is formed by cascading two linear layers; its input is the image feature z and its output is the prediction p = h(z) of the feature representation.
The structure of the convolutional network block is shown in FIG. 1, the structure of the residual network block is shown in FIG. 2, and the structure of the residual convolutional neural network ResNet18 is shown in FIG. 3.
Step 3: constructing the self-supervised continuous learning paradigm;
Self-supervised continuous learning aims to learn feature representations of images on a series of unlabeled tasks {T_t}, t = 1, ..., T, that arrive in order, where each task has a data set D_t with a different distribution. In general, an image x is randomly sampled from the data set D_t, and two image transformation operations are applied to x to obtain two correlated views x_1 and x_2 of the image. One view x_1 is encoded by the feature encoder to obtain its feature z_1 = f_Θ(x_1); similarly, the feature z_2 = f_Θ(x_2) of the other view x_2 is obtained. The goal of self-supervised continuous learning is that, at any time τ during training, the model can learn the image representations of the historical tasks {T_1, ..., T_{τ-1}} and of the current task T_τ:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) ]
where Ê, computed on a mini-batch B_t of samples drawn from the data set D_t, t = 1, ..., τ, approximates the expectation operator E, and x_{i,t} denotes the i-th sample of the mini-batch randomly sampled from D_t. The loss term L_SSL(·) is the self-supervised learning loss; here the self-supervised loss of SimSiam is used:
L_SSL(x; Θ) = (1/2) D(p_1, stopgrad(z_2)) + (1/2) D(p_2, stopgrad(z_1)),
D(p, z) = − (p / ||p||_2) · (z / ||z||_2)
where z_k = f_Θ(x_k) is the feature representation of view x_k produced by the feature encoder, p_k = h(z_k) is the prediction of that feature representation produced by the feature prediction head, stopgrad(·) denotes stopping the back-propagation of the gradient through the variable, and ||·||_2 is the two-norm operator.
However, achieving this goal of self-supervised learning is challenging: in the continuous learning setting it is usually assumed that the data of historical tasks are not available, i.e., the optimal parameters Θ* of the model on the data sets D_t, t = 1, ..., τ, must be solved without access to the data sets D_t, t = 1, ..., τ−1. Therefore, continuous learning strategies need to be introduced to help the model keep its performance on the historical tasks while learning the current task.
Step 4: establishing the information loss mechanism;
The InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard the unimportant features in the image and keep only the important ones. If the image patch fed to a neuron contains little information, the InfoDrop mechanism sets the output of that neuron to zero with a higher probability; otherwise it keeps the output of the neuron. Specifically, under a Boltzmann distribution, the dropping coefficient of the output a_j^{l,c} of the j-th neuron of the c-th channel in the l-th layer of the neural network is computed as:
P(drop a_j^{l,c}) ∝ exp( − I(x_j^l) / T ),   I(x_j^l) = − log p(x_j^l)
where x_j^l is the input patch of the j-th neuron of the c-th channel in the l-th layer, and I(x_j^l) is defined as its self-information. When the self-information of the input patch of a neuron is low, the output of that neuron is discarded with a higher probability, which prompts the neural network to pay less attention to the low-information regions of the image. T is a temperature coefficient and acts as a "soft threshold" of the InfoDrop mechanism: when T becomes small, i.e., the threshold decreases, most patches are kept and only the few patches with the lowest self-information are dropped; when T becomes infinite, i.e., the threshold becomes high, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are dropped with equal probability. p(·) is the probability distribution of x_j^l.
To approximate the distribution p(x_j^l), the InfoDrop mechanism assumes that the patches in the neighborhood of x_j^l are all sampled from p(·); when the patches near x_j^l repeat its pattern, x_j^l has a higher p(x_j^l) and therefore low self-information. The estimate of the distribution p(x_j^l) is defined as:
p̂(x_j^l) ∝ Σ_{x_k^l ∈ N_R(x_j^l)} exp( − ||x_j^l − x_k^l||² / (2h²) )
where R denotes the Manhattan radius of the neighborhood N_R(x_j^l) of x_j^l, ||·|| denotes the Euclidean distance, and h is the bandwidth. From this estimate it can be observed that the more the patch x_j^l differs from the patches in its neighborhood, the more self-information it contains, i.e., its output will be set to zero with a lower probability.
Step 5: constructing the self-supervised continuous learning framework based on the information loss mechanism;
The model is expected to learn, on the data set of the current task, only the feature representations of the regions of the image that carry important information, and to ignore the features of the unimportant regions, so that at least the key feature representations can be learned under the limited model capacity. In general, the InfoDrop mechanism is applied while the neural network model is optimized on the training set and removed when the performance of the model is verified on the test set; however, because the InfoDrop mechanism discards most of the regions of low self-information in the image, a large distribution deviation appears between the training data set and the test data set, which affects the performance of the model on the test set. Therefore, the model with the InfoDrop mechanism removed is usually optimized a second time on the training set before the model is tested. This second optimization consumes additional training time and also re-introduces the influence of the unimportant information regions of the image on the model. To avoid the adverse effects brought by the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model. When the model is trained on task T_τ, an InfoDrop loss is introduced on top of the self-supervised loss term, and the following self-supervised learning paradigm with the InfoDrop mechanism is constructed:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) + L_InfoDrop(x_{i,t}; Θ) ]
This self-supervised learning paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term L_InfoDrop, which measures the discrepancy between the features produced with and without the InfoDrop mechanism. Here f̂_Θ denotes the model with the InfoDrop mechanism, the feature representation of x_{i,t} produced by f̂_Θ is denoted ẑ_{i,t}, and f̂_Θ shares the network weights with f_Θ. By minimizing the InfoDrop regularization term, the feature z_{i,t} of the model f_Θ without the InfoDrop mechanism is made to approximate the feature ẑ_{i,t} of the model f̂_Θ with the InfoDrop mechanism, which prompts the model f_Θ to actively capture the features of the regions with important information and to ignore the unimportant features even when the InfoDrop mechanism is not applied. A schematic diagram of the framework of the method is shown in FIG. 4.
Step 6: process the data set according to step 1 to obtain the data sets of a plurality of tasks; construct the self-supervised learning model according to step 2, and train the model on the training set of each task in the order in which the tasks arrive.
Step 7: evaluating the performance of the model with the KNN algorithm;
On task T_t, the accuracy of the model f_Θ is tested with the KNN classification algorithm:
(1) Convert the training set {(x_i, y_i)} of task T_t into a feature bank {(v_i, y_i)}, where v_i = f_b(x_i);
(2) Based on the feature bank, predict the label ŷ_i of each sample x_i of the test set of task T_t:
a) Compute the similarity between the feature f_i of the test sample x_i and each feature in the feature bank: s_ij = cos(f_i, v_j);
b) Take the K entries with the largest s_ij as the K-nearest-neighbor set N_K of the test sample x_i and compute the scores of the test sample x_i over the C categories; the category with the highest score is the predicted category of the test sample. The score of the test sample x_i on the j-th category is computed as:
score_i(j) = Σ_{(v_k, y_k) ∈ N_K} exp(s_ik / T) · 1(y_k = j)
where T is a temperature parameter and 1(·) is the indicator function; the predicted category of the test sample x_i is ŷ_i = argmax_j score_i(j).
c) Compute the test accuracy of the model f_Θ on task T_t:
Acc_t = (1 / N_t) Σ_i 1(ŷ_i = y_i)
where N_t is the number of samples in the test set of task T_t.
Step 8: after the model has been trained on each task, the feature extraction module f_b in the feature encoder f_Θ of the model is used to characterize the images of the test set of each task, and the validity of the model's representations is then evaluated with the KNN classification algorithm. The test results are shown in Table 1. The superiority of the proposed self-supervised continuous learning framework based on the information loss mechanism is verified on 5 typical continuous learning strategies: FINETUNE, DER, SI, LUMP and CASSLE. It can be seen from Table 1 that the self-supervised continuous learning framework provided by the invention significantly alleviates the catastrophic forgetting phenomenon and improves the accuracy of the model on each task.
Picture size: 32 × 32 × 3
Picture categories: airplanes, cars, birds, cats, deer, dogs, frogs, horses, boats, and trucks.
Learning rate: 0.003
Training batch size N: 256
Number of iterations: 200
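Collected into a single configuration, these reported settings might be expressed as the dictionary below; only the listed values come from the description, and whether the 200 iterations are epochs per task is not specified.

```python
# Hypothetical training configuration assembled from the reported settings.
config = {
    "dataset": "CIFAR-10",                 # 32x32x3 color images, 10 classes
    "num_tasks": 5,                        # two classes per task
    "learning_rate": 0.003,
    "batch_size": 256,                     # training batch size N
    "iterations": 200,                     # "iteration times" as reported above
    "evaluation": "KNN classifier on test-set features",
}
```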
Table 1 shows the experimental results of the method of the present invention.

Claims (1)

1. A continuous image feature extraction method based on an information loss mechanism, comprising the following steps:
step 1: preprocessing the data set;
acquiring real-world object images, labeling the real images according to the categories of the objects they contain, normalizing the pixel values of all pictures, scaling and cropping the pictures, and dividing the images into a plurality of data sets, each data set containing images of different categories;
Step 2: constructing the self-supervised learning model;
the self-supervised learning model consists of a feature encoder f_Θ and a feature prediction head h; the feature encoder f_Θ is formed by cascading a feature extraction module f_b and a feature projection module f_g:
f_Θ = f_g ∘ f_b
the feature extraction module is constructed with the residual convolutional neural network ResNet18: its first layer is a convolutional network block, its second to fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer; the feature projection module is formed by cascading two linear layers; the input of the feature encoder f_Θ is an image x and its output is the feature representation z = f_Θ(x) of the image; the feature prediction head h is formed by cascading two linear layers, its input is the image feature z and its output is the prediction p = h(z) of the feature representation;
And step 3: constructing a self-supervision continuous learning paradigm;
self-supervised continuous learning addresses unlabeled tasks in a series of ordered arrivals
Figure FDA0003926543730000014
Feature representation of the upper learning image with a different distribution of data sets ≥ on each task>
Figure FDA0003926543730000015
Generally, an image x is randomly sampled from a data set, and then two image transformation operations are respectively performed on the image x to obtain images x of two related view angles 1 And x 2 (ii) a One view x of an image using a feature encoder 1 Performing feature encoding to obtain its feature z 1 =f(x 1 ) Similarly, another view x can be obtained 2 Characteristic z of 2 =f(x 2 ) (ii) a The goal of self-supervised continuous learning is to allow the model to learn about the historical task T at any time τ in the training 1 ,...,T τ-1 And the current task T τ The image of (1) represents:
Figure FDA0003926543730000016
wherein in small batches of samples
Figure FDA0003926543730000017
Up-count loss term->
Figure FDA0003926543730000018
To approximate the desired operator
Figure FDA0003926543730000019
x i,t Represents slave data set->
Figure FDA00039265437300000110
Sampling an ith sample in the small batch of samples obtained by up-random sampling; loss term->
Figure FDA00039265437300000111
For the purpose of self-supervised learning loss, the self-supervised loss calculation formula in simsim is used here:
Figure FDA00039265437300000112
Figure FDA00039265437300000113
wherein
Figure FDA0003926543730000021
Is that the feature encoder is for->
Figure FDA0003926543730000022
Is greater than or equal to>
Figure FDA0003926543730000023
Is that the characteristic prediction header relates to>
Figure FDA0003926543730000024
Prediction of feature representation of
Figure FDA0003926543730000025
Stopgrad (. Cndot.) denotes stopping the gradient back propagation of the variable; i | · | purple wind 2 Is a two-norm operator;
however, achieving the goal of self-supervised learning is challenging; since in a continuous learning setting it is usually assumed that data from historical tasks is not available, i.e. required in inaccessible data sets
Figure FDA0003926543730000026
While solving for the model at the data set->
Figure FDA0003926543730000027
Optimum parameter theta of (2) * (ii) a Therefore, some continuous learning strategies need to be introduced to help the model to keep its performance on the historical task while learning the current task;
Step 4: establishing the information loss mechanism;
the InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard the unimportant features in the image and keep only the important ones; if the image patch fed to a neuron contains little information, the InfoDrop mechanism sets the output of that neuron to zero with a higher probability, otherwise it keeps the output of the neuron; specifically, under a Boltzmann distribution, the dropping coefficient of the output a_j^{l,c} of the j-th neuron of the c-th channel in the l-th layer of the neural network is computed as:
P(drop a_j^{l,c}) ∝ exp( − I(x_j^l) / T ),   I(x_j^l) = − log p(x_j^l)
wherein x_j^l is the input patch of the j-th neuron of the c-th channel in the l-th layer, and I(x_j^l) is defined as its self-information; when the self-information of the input patch of a neuron is low, the output of that neuron is discarded with a higher probability, which prompts the neural network to pay less attention to the low-information regions of the image; T is a temperature coefficient and acts as a "soft threshold" of the InfoDrop mechanism: when T becomes small, i.e., the threshold decreases, most patches are kept and only the few patches with the lowest self-information are dropped; when T becomes infinite, i.e., the threshold becomes high, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are dropped with equal probability; p(·) is the probability distribution of x_j^l;
to approximate the distribution p(x_j^l), the InfoDrop mechanism assumes that the patches in the neighborhood of x_j^l are all sampled from p(·); when the patches near x_j^l repeat its pattern, x_j^l has a higher p(x_j^l) and therefore low self-information; the estimate of the distribution p(x_j^l) is defined as:
p̂(x_j^l) ∝ Σ_{x_k^l ∈ N_R(x_j^l)} exp( − ||x_j^l − x_k^l||² / (2h²) )
wherein R denotes the Manhattan radius of the neighborhood N_R(x_j^l) of x_j^l, ||·|| denotes the Euclidean distance, and h is the bandwidth; from this estimate it can be observed that the more the patch x_j^l differs from the patches in its neighborhood, the more self-information it contains, i.e., its output will be set to zero with a lower probability;
Step 5: constructing the self-supervised continuous learning framework based on the information loss mechanism;
the model is expected to learn, on the data set of the current task, the feature representations of the regions of the image that carry important information, and to ignore the features of the unimportant regions, so that at least the key feature representations can be learned under the limited model capacity; in general, the InfoDrop mechanism is applied while the neural network model is optimized on the training set and removed when the performance of the model is verified on the test set; however, because the InfoDrop mechanism discards most of the regions of low self-information in the image, a large distribution deviation appears between the training data set and the test data set, which affects the performance of the model on the test set; therefore, the model with the InfoDrop mechanism removed is usually optimized a second time on the training set before the model is tested; this second optimization consumes additional training time and also re-introduces the influence of the unimportant information regions of the image on the model; to avoid the adverse effects brought by the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model; when the model is trained on task T_τ, an InfoDrop loss is introduced on top of the self-supervised loss term, and the following self-supervised learning paradigm with the InfoDrop mechanism is constructed:
Θ* = argmin_Θ Σ_{t=1}^{τ} Ê_{x_{i,t} ∈ B_t} [ L_SSL(x_{i,t}; Θ) + L_InfoDrop(x_{i,t}; Θ) ]
this self-supervised learning paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term L_InfoDrop, which measures the discrepancy between the features produced with and without the InfoDrop mechanism; f̂_Θ denotes the model with the InfoDrop mechanism, the feature representation of x_{i,t} produced by f̂_Θ is denoted ẑ_{i,t}, and f̂_Θ shares the network weights with f_Θ; by minimizing the InfoDrop regularization term, the feature z_{i,t} of the model f_Θ without the InfoDrop mechanism is made to approximate the feature ẑ_{i,t} of the model f̂_Θ with the InfoDrop mechanism, which prompts the model f_Θ to actively capture the features of the regions with important information and to ignore the unimportant features even when the InfoDrop mechanism is not applied;
Step 6: (1) process the data set according to step 1 to obtain the data sets of a plurality of tasks; (2) construct the self-supervised learning model according to step 2; (3) train the model on the training set of each task in the order in which the tasks arrive;
Step 7: evaluating the performance of the model with the KNN algorithm;
on task T_t, the accuracy of the model f_Θ is tested with the KNN classification algorithm:
(1) convert the training set {(x_i, y_i)} of task T_t into a feature bank {(v_i, y_i)}, where v_i = f_Θ(x_i);
(2) based on the feature bank, predict the label ŷ_i of each sample x_i of the test set of task T_t:
a) compute the similarity between the feature f_i of the test sample x_i and each feature in the feature bank: s_ij = cos(f_i, v_j);
b) take the K entries with the largest s_ij as the K-nearest-neighbor set N_K of the test sample x_i and compute the scores of the test sample x_i over the C categories; the category with the highest score is the predicted category of the test sample; the score of the test sample x_i on the j-th category is computed as:
score_i(j) = Σ_{(v_k, y_k) ∈ N_K} exp(s_ik / T) · 1(y_k = j)
wherein T is a temperature parameter and 1(·) is the indicator function; the predicted category of the test sample x_i is ŷ_i = argmax_j score_i(j);
c) compute the test accuracy of the model f_Θ on task T_t:
Acc_t = (1 / N_t) Σ_i 1(ŷ_i = y_i)
wherein N_t is the number of samples in the test set of task T_t;
Step 8: after the model has been trained on each task, the feature extraction module f_b in the feature encoder f_Θ of the model is used to characterize the images of the test set of each task, and the validity of the model's representations is evaluated with the KNN classification algorithm.
CN202211375805.5A 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism Active CN115952851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211375805.5A CN115952851B (en) 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211375805.5A CN115952851B (en) 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism

Publications (2)

Publication Number Publication Date
CN115952851A true CN115952851A (en) 2023-04-11
CN115952851B CN115952851B (en) 2024-10-01

Family

ID=87288106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211375805.5A Active CN115952851B (en) 2022-11-04 2022-11-04 Self-supervision continuous learning method based on information loss mechanism

Country Status (1)

Country Link
CN (1) CN115952851B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109690576A (en) * 2016-07-18 2019-04-26 渊慧科技有限公司 Training machine learning models on multiple machine learning tasks
CN114612847A (en) * 2022-03-31 2022-06-10 长沙理工大学 Method and system for detecting distortion of Deepfake video
CN114758195A (en) * 2022-05-10 2022-07-15 西安交通大学 Human motion prediction method capable of realizing continuous learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ALESSANDRO ACHILLE et al.: "Information Dropout: Learning Optimal Representations Through Noisy Computation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 31 December 2018 (2018-12-31), pages 2897-2905, XP011698769, DOI: 10.1109/TPAMI.2017.2784440 *
莫建文 et al.: "Incremental Learning Based on Neuron Regularization and Resource Release", Journal of South China University of Technology (Natural Science Edition), vol. 50, no. 6, 30 June 2022 (2022-06-30), pages 71-80 *

Also Published As

Publication number Publication date
CN115952851B (en) 2024-10-01

Similar Documents

Publication Publication Date Title
Ghosh et al. Structured variational learning of Bayesian neural networks with horseshoe priors
CN111444878B (en) Video classification method, device and computer readable storage medium
CN108960086B (en) Multi-pose human body target tracking method based on generation of confrontation network positive sample enhancement
CN114492574A (en) Pseudo label loss unsupervised countermeasure domain adaptive picture classification method based on Gaussian uniform mixing model
CN113449864A (en) Feedback type pulse neural network model training method for image data classification
CN110443372B (en) Transfer learning method and system based on entropy minimization
CN116312782B (en) Spatial transcriptome spot region clustering method fusing image gene data
CN113378937B (en) Small sample image classification method and system based on self-supervision enhancement
CN107945210A (en) Target tracking algorism based on deep learning and environment self-adaption
CN115331284A (en) Self-healing mechanism-based facial expression recognition method and system in real scene
CN116883751A (en) Non-supervision field self-adaptive image recognition method based on prototype network contrast learning
CN114417975A (en) Data classification method and system based on deep PU learning and class prior estimation
CN114048843A (en) Small sample learning network based on selective feature migration
CN118097228A (en) Multi-teacher auxiliary instance self-adaptive DNN-based mobile platform multi-target classification method
Zhang et al. Learning to search efficient densenet with layer-wise pruning
CN117079017A (en) Credible small sample image identification and classification method
Singh et al. Deep active transfer learning for image recognition
Hindarto Comparative Analysis VGG16 Vs MobileNet Performance for Fish Identification
CN115952851A (en) Self-supervision continuous learning method based on information loss mechanism
Połap et al. Meta-heuristic algorithm as feature selector for convolutional neural networks
CN112989088B (en) Visual relation example learning method based on reinforcement learning
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN115019342A (en) Endangered animal target detection method based on class relation reasoning
CN113553917A (en) Office equipment identification method based on pulse transfer learning
CN114120447A (en) Behavior recognition method and system based on prototype comparison learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant