CN115952851A - Self-supervision continuous learning method based on information loss mechanism - Google Patents


Info

Publication number
CN115952851A
Authority
CN (China)
Prior art keywords
model, self, feature, image, learning
Legal status
Granted
Application number
CN202211375805.5A
Other languages
Chinese (zh)
Other versions
CN115952851B (en)
Inventor
潘力立 (Pan Lili)
杨帆 (Yang Fan)
张亮 (Zhang Liang)
赵江伟 (Zhao Jiangwei)
吴庆波 (Wu Qingbo)
李宏亮 (Li Hongliang)
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China

Application filed by University of Electronic Science and Technology of China
Priority to CN202211375805.5A (patent/CN115952851B/en)
Publication of CN115952851A (patent/CN115952851A/en)
Application granted
Publication of CN115952851B (patent/CN115952851B/en)
Legal status: Active

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a self-supervised continual learning method based on an information loss mechanism, comprising: (1) an unsupervised continual learning framework based on information loss, which makes the model learn only important feature representations on sequential tasks; and (2) an InfoDrop loss term built on the self-supervised learning paradigm, which lets the model still extract the important feature representations of test samples after the InfoDrop mechanism is removed at test time. In addition, the proposed unsupervised continual learning framework can be combined with most existing continual learning strategies. By discarding unimportant image information, the model focuses only on representing important image information, which relieves the limitation of model capacity and improves the performance of the self-supervised model without reintroducing samples from historical tasks or parameters of historical models.

Description

A self-supervised continual learning method based on an information loss mechanism

Technical Field

The present invention belongs to the field of image processing. It is mainly used to improve the performance of self-supervised continual learning models and is mainly applied to image classification.

Background Art

In recent years, deep learning (DL) has achieved remarkable success in fields such as machine learning and natural language processing. DL focuses on developing deep neural networks (DNNs) through offline training on fixed or predefined datasets, and DNNs perform remarkably well on the corresponding tasks. However, DNNs have a limitation: once trained, a DNN is fixed and its internal parameters no longer change during operation, so it remains static after deployment and cannot adapt to a changing environment. Real-world applications are rarely confined to a single task; applications involving autonomous agents in particular must process continuously changing data. Over time, the data or tasks the model faces change, and static models perform poorly in this setting. One possible solution is to retrain the network whenever the data distribution changes. However, full retraining on the expanded dataset is computationally intensive and infeasible in real-world environments with limited computing resources. A new algorithm is therefore needed that can learn continually while using resources efficiently.

Continual learning is both needed and challenging in many real-world scenarios: robots must autonomously learn new behaviors as their environment changes in order to adapt and complete new tasks; autonomous driving systems must adapt to different environments, such as moving from rural roads to highways or from well-lit to dim surroundings; intelligent dialogue systems must adapt to different users and contexts; and smart medical applications must adapt to new cases, new hospitals, and inconsistent medical conditions.

Continual learning (CL) studies learning from non-stationary data streams. Its goal is to extend the adaptability of a model so that it can learn the knowledge of different tasks while retaining the features learned on historical tasks. Depending on whether the input data are labeled, CL can be divided into supervised continual learning (SCL) and unsupervised continual learning (UCL). SCL usually focuses on a series of related tasks; because humans attach labels to the input data, task information and the task boundaries needed for generalization are available. This setting does not match real-world scenarios, where task labels are unknown, task boundaries are ill-defined, and large amounts of class-labeled data are unavailable. This motivates unsupervised and self-supervised continual learning methods. Self-supervised learning is a branch of unsupervised learning that aims to eliminate the need for manual annotation in representation learning; it learns data representations from unlabeled raw data. A true self-supervised continual learning algorithm can exploit a continuously arriving, non-i.i.d. data stream to learn a robust, adaptive model without forgetting previously acquired knowledge.

In recent years, CL research has focused mainly on SCL, and these results usually do not extend to practical scenarios with biased data distributions. UCL, which relies on neither manual annotation nor supervisory information, has therefore attracted growing attention. Although the field is young, its problems are complex, and results are still few, existing work shows that manually labeled data are not necessary for continual learning: unsupervised visual representations can alleviate catastrophic forgetting, and UCL can outperform SCL. Reference: Madaan, D., Yoon, J., Li, Y., Liu, Y., & Hwang, S. J. (2021, September). Representational continuity for unsupervised continual learning. In International Conference on Learning Representations. To improve the performance of unsupervised models, a lightweight, model-agnostic method, informative dropout (InfoDrop, referred to here as the information loss mechanism), has attracted attention; it improves the robustness and interpretability of convolutional neural networks (CNNs) by reducing their texture bias. Reference: Shi, B., Zhang, D., Dai, Q., Zhu, Z., Mu, Y., & Wang, J. (2020, November). Informative dropout for robust representation learning: A shape-bias perspective. In International Conference on Machine Learning (pp. 8828-8839). PMLR. Unsupervised continual learning has high research value and is one of the key technologies for building truly intelligent agents. The present invention combines the information loss mechanism with an unsupervised continual learning framework to improve model performance, build a more robust continual learning model, and advance unsupervised continual learning.

Summary of the Invention

The present invention is a self-supervised continual learning method that introduces the InfoDrop mechanism into a self-supervised model so that the model extracts important image features in continual learning tasks. The method computes the self-information of image patches to decide which unimportant image information to discard, guiding the model to focus on informative image regions and thereby improving the performance of the self-supervised model.

The method first constructs a self-supervised continual learning framework based on the information loss mechanism, splits the CIFAR-10 dataset into 5 tasks, trains the model on each task's data in the order in which the tasks arrive, and tests the model's accuracy with the KNN algorithm. The core of the method is introducing the information loss mechanism into the self-supervised learning framework to improve performance. Approaching the problem from the perspective of model capacity, the invention mainly: 1) constructs a self-supervised learning model and a self-supervised continual learning paradigm; 2) establishes an information loss mechanism based on information content (self-information) and Dropout, which helps the model discard unimportant image features and retain important ones, and integrates this mechanism into the self-supervised continual learning framework; 3) adds an InfoDrop loss term to the self-supervised loss, which removes the need to fine-tune the model after the InfoDrop mechanism is removed for testing; and 4) trains on CIFAR-10, measures test-set accuracy with the KNN classification algorithm to evaluate performance, and compares against a variety of continual learning strategies. This work verifies that the invention is compatible with a variety of continual learning strategies and improves model performance under each of them, making it a broadly applicable unsupervised continual learning method.

To describe the invention conveniently, some terms are first defined.

Definition 1: Residual convolutional neural network (ResNet). By adding residual connections to a convolutional network, ResNet solves the degradation problem that deep networks exhibit during training and greatly increases the trainable depth of neural networks. Compared with conventional convolutional neural networks, residual networks are easier to train and optimize. The residual network used in the present invention is ResNet18.

Definition 2: Adaptive average pooling layer. An adaptive average pooling layer compresses the spatial dimensions by averaging the values within each region and outputs a result of a specified size regardless of the input size; to some extent this suppresses uninformative features.
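The pooling rule in Definition 2 can be sketched in plain Python as below. The bin boundaries (start = floor(i·L/out), end = ceil((i+1)·L/out)) follow the convention of PyTorch's adaptive average pooling layers and are an assumption here, not something the patent specifies; the function names are hypothetical. With a 1×1 output the layer reduces to global average pooling, which is how it is used at the end of ResNet18.

```python
from typing import List


def adaptive_avg_pool1d(values: List[float], output_size: int) -> List[float]:
    """Average `values` into `output_size` bins.

    Bin i covers indices [floor(i*L/out), ceil((i+1)*L/out)); bins may overlap
    when L is not a multiple of out (assumed PyTorch-style rule).
    """
    length = len(values)
    if length == 0 or output_size <= 0:
        raise ValueError("need a non-empty input and a positive output size")
    pooled = []
    for i in range(output_size):
        start = (i * length) // output_size
        end = ((i + 1) * length + output_size - 1) // output_size  # integer ceil
        window = values[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


def adaptive_avg_pool2d(feature_map: List[List[float]], out_h: int, out_w: int) -> List[List[float]]:
    """2D version: the mean over a rectangular bin equals the mean of row means
    of equal width, so we pool along the width first, then along the height."""
    rows = [adaptive_avg_pool1d(row, out_w) for row in feature_map]
    cols = [adaptive_avg_pool1d([row[j] for row in rows], out_h) for j in range(out_w)]
    # Transpose back to an out_h x out_w grid.
    return [[cols[j][i] for j in range(out_w)] for i in range(out_h)]
```

For example, pooling a 4×4 map to 2×2 averages each 2×2 quadrant, and pooling any map to 1×1 returns its global mean.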

Definition 3: SimSiam (Simple Siamese network). SimSiam is a Siamese network model that maximizes the similarity between two augmentations of one image and learns representations without negative sample pairs, large batches, or momentum encoders.

Definition 4: Dropout. Dropout is a regularization method that assigns a drop probability to the neurons of a network layer and, during training, randomly drops neurons with that probability, which mitigates overfitting.
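A minimal sketch of Definition 4 in plain Python, assuming "inverted" dropout: kept activations are scaled by 1/(1-p) during training so no rescaling is needed at test time. That scaling is a common implementation choice, not part of the definition above, and `dropout` is a hypothetical helper name.

```python
import random
from typing import List, Optional, Sequence


def dropout(activations: Sequence[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Zero each activation with probability p during training (inverted dropout)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # identity at test time
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / (1.0 - p)  # keeps the expected activation unchanged
    return [0.0 if rng.random() < p else a * scale for a in activations]
```

Every neuron is dropped with the same probability p, which is exactly the behavior InfoDrop generalizes by making the probability depend on the information content of the neuron's input patch.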

Definition 5: Image patch. A patch is an image block. When a convolutional network runs, the image is effectively divided into many small blocks, and the convolution kernel looks at only one block at a time; each such block is called a patch.
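To make the notion of a patch concrete, the hypothetical helper below cuts a single-channel image into k×k patches with stride 1 and no padding (both assumptions) and flattens each patch into a vector, the form in which patches are compared with one another in the InfoDrop step of the method.

```python
from typing import List


def extract_patches(image: List[List[float]], k: int) -> List[List[List[float]]]:
    """Return grid[r][c] = flattened k x k patch whose top-left pixel is (r, c)."""
    height, width = len(image), len(image[0])
    if not 1 <= k <= min(height, width):
        raise ValueError("patch size must fit inside the image")
    return [
        [[image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
         for c in range(width - k + 1)]
        for r in range(height - k + 1)
    ]
```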

Definition 6: ReLU activation layer. The rectified linear unit is an activation function commonly used in artificial neural networks; the term usually refers to the ramp function and its variants, given by f(x) = max(0, x).

The technical solution of the present invention is a continual image feature extraction method based on an information loss mechanism, comprising:

Step 1: Preprocess the dataset;

Obtain images of real-world objects and label each image with the category of the object it contains; normalize the pixel values of all images and scale and crop them; then divide the images into multiple datasets, each containing a different set of categories;
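Step 1 can be sketched as follows for the CIFAR-10 setting described in the summary (10 classes split into 5 tasks). Grouping consecutive class pairs into tasks and scaling pixels to [0, 1] are assumptions made for illustration; the patent does not state the class order or the normalization constants, and the function names are hypothetical.

```python
from typing import List, Sequence


def split_into_tasks(labels: Sequence[int], num_classes: int = 10,
                     num_tasks: int = 5) -> List[List[int]]:
    """Return, for each task, the indices of the samples whose class belongs to it.

    Hypothetical ordering: classes are assigned to tasks in consecutive groups,
    so with 10 classes and 5 tasks, task 0 gets {0, 1}, task 1 gets {2, 3}, ...
    """
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")
    classes_per_task = num_classes // num_tasks
    tasks: List[List[int]] = [[] for _ in range(num_tasks)]
    for index, label in enumerate(labels):
        tasks[label // classes_per_task].append(index)
    return tasks


def normalize_pixels(pixels: Sequence[int], mean: float = 0.0, std: float = 1.0) -> List[float]:
    """Map 8-bit pixel values to [0, 1], then standardize with (mean, std)."""
    return [(p / 255.0 - mean) / std for p in pixels]
```

Because the class sets of the tasks are disjoint, each task's dataset has a different distribution, which is the premise of Step 3.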

Step 2: Build the self-supervised learning model;

The self-supervised learning model consists of two parts: a feature encoder $f_\Theta$ and a feature prediction head $h$. The feature encoder $f_\Theta$ is a cascade of a feature extraction module $f_b$ and a feature projection module $f_g$:
$$f_\Theta = f_g \circ f_b$$
The feature extraction module is built from the residual convolutional neural network ResNet18: its first layer is a convolutional block, its second to fifth layers are residual blocks, and its last layer is an adaptive average pooling layer. The feature projection module consists of two connected linear layers. The input of the feature encoder $f_\Theta$ is an image $x$, and its output is the feature representation of the image, $z = f_\Theta(x)$. The feature prediction head $h$ also consists of two connected linear layers; its input is the image feature $z$ and its output is the prediction of the image feature, $p = h(z)$. See Figure 1 for the structure of the convolutional block, Figure 2 for the structure of the residual block, and Figure 3 for the structure of ResNet18;
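The composition f_Θ = f_g ∘ f_b and the prediction head h can be illustrated with a dependency-free toy. Everything here is a hypothetical stand-in: `toy_backbone` only performs the final global average pooling instead of running ResNet18, the layer widths are arbitrary, and the ReLU between the two linear layers is an assumption, since the text only says the layers are connected.

```python
import random
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    """A fully connected layer y = W x + b with small random weights."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

    return apply


def two_linear_layers(in_dim: int, hidden_dim: int, out_dim: int, rng: random.Random) -> Layer:
    """Two connected linear layers; the ReLU between them is an assumption."""
    first = linear(in_dim, hidden_dim, rng)
    second = linear(hidden_dim, out_dim, rng)
    return lambda x: second([max(0.0, v) for v in first(x)])


def toy_backbone(channels: List[List[List[float]]]) -> Vector:
    """Hypothetical stand-in for f_b: global average pooling of each channel map
    (the real f_b runs convolutional and residual blocks before this step)."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in channels]


def compose(outer: Layer, inner: Callable) -> Callable:
    """Function composition: compose(f_g, f_b)(x) == f_g(f_b(x))."""
    return lambda x: outer(inner(x))


rng = random.Random(0)
f_b = toy_backbone
f_g = two_linear_layers(3, 8, 4, rng)  # projection module (toy widths)
f_theta = compose(f_g, f_b)            # feature encoder f_Theta = f_g o f_b
h = two_linear_layers(4, 8, 4, rng)    # prediction head (toy widths)
```

A 3-channel input x yields z = f_theta(x) and p = h(z), both 4-dimensional in this toy configuration.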

Step 3: Construct the self-supervised continual learning paradigm;

Self-supervised continual learning aims to learn image feature representations over a sequence of unlabeled tasks $\mathcal{T}_{1:T} = (\mathcal{T}_1, \ldots, \mathcal{T}_T)$ that arrive in order, where each task has a dataset $\mathcal{D}_t$, $t = 1, \ldots, T$, with a different distribution. Typically, an image $x$ is randomly sampled from a dataset and two image transformations are applied to it, giving two correlated views $x^1$ and $x^2$. The feature encoder maps view $x^1$ to its feature $z^1 = f_\Theta(x^1)$, and likewise maps view $x^2$ to $z^2 = f_\Theta(x^2)$. The goal of self-supervised continual learning is that, at any training time $\tau$, the model has learned image representations for both the historical tasks $\{\mathcal{T}_1, \ldots, \mathcal{T}_{\tau-1}\}$ and the current task $\mathcal{T}_\tau$:
$$\min_{\Theta} \sum_{t=1}^{\tau} \mathbb{E}_{x_{i,t} \sim \mathcal{D}_t}\left[\mathcal{L}_t(x_{i,t}; \Theta)\right]$$

Here the expectation operator $\mathbb{E}_{x_{i,t} \sim \mathcal{D}_t}[\cdot]$ is approximated by the mean of the loss term $\mathcal{L}_t$ over mini-batches $\{x_{i,t}\}_{i=1}^{B}$, $t = 1, \ldots, \tau$, where $x_{i,t}$ is the $i$-th sample of a mini-batch randomly sampled from dataset $\mathcal{D}_t$ and $B$ is the mini-batch size. The loss term $\mathcal{L}_t$ is the self-supervised learning loss; here the self-supervised loss of SimSiam is used:
$$\mathcal{L}_t(x_{i,t};\Theta) = \frac{1}{2}\, d\left(p^1_{i,t}, \operatorname{stopgrad}\left(z^2_{i,t}\right)\right) + \frac{1}{2}\, d\left(p^2_{i,t}, \operatorname{stopgrad}\left(z^1_{i,t}\right)\right)$$
$$d(p, z) = -\frac{p}{\|p\|_2} \cdot \frac{z}{\|z\|_2}$$
where $z^1_{i,t} = f_\Theta(x^1_{i,t})$ is the feature encoder's output for view $x^1_{i,t}$, and $p^1_{i,t} = h\left(f_\Theta(x^1_{i,t})\right)$ is the feature prediction head's prediction of the feature representation of $x^1_{i,t}$ ($z^2_{i,t}$ and $p^2_{i,t}$ are defined in the same way for $x^2_{i,t}$); $\operatorname{stopgrad}(\cdot)$ stops gradients from back-propagating through its argument; and $\|\cdot\|_2$ is the L2 norm;
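The SimSiam loss above can be checked numerically with the sketch below, assuming plain Python lists as feature vectors. Because there is no autograd here, `stopgrad` only copies its input and marks where a framework would block gradients (for example `tensor.detach()` in PyTorch); the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def stopgrad(v: Sequence[float]) -> Vector:
    """Placeholder for stop-gradient: plain Python has no autograd, so this
    only copies the value."""
    return list(v)


def neg_cosine(p: Sequence[float], z: Sequence[float], eps: float = 1e-12) -> float:
    """d(p, z) = -(p / ||p||_2) . (z / ||z||_2), with stop-gradient applied to z."""
    z = stopgrad(z)
    p_norm = max(math.sqrt(sum(a * a for a in p)), eps)
    z_norm = max(math.sqrt(sum(b * b for b in z)), eps)
    return -sum(a * b for a, b in zip(p, z)) / (p_norm * z_norm)


def simsiam_loss(p1: Vector, z1: Vector, p2: Vector, z2: Vector) -> float:
    """Symmetrized SimSiam loss for one sample with two augmented views."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)


def minibatch_loss(batch: Sequence[Tuple[Vector, Vector, Vector, Vector]]) -> float:
    """Approximate the expectation over D_t by the mean over one mini-batch."""
    return sum(simsiam_loss(*sample) for sample in batch) / len(batch)
```

The loss reaches its minimum of -1 when each prediction points in the same direction as the other view's feature; vector lengths do not matter because both sides are normalized.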

However, this goal is challenging to achieve, because the continual learning setting usually assumes that data from historical tasks are unavailable: the optimal parameters $\Theta^*$ on datasets $\mathcal{D}_t$, $t = 1, \ldots, \tau$, must be found without access to datasets $\mathcal{D}_t$, $t = 1, \ldots, \tau - 1$. Continual learning strategies therefore need to be introduced to help the model maintain its performance on historical tasks while learning the current task;

Step 4: Establish the information loss mechanism;

The InfoDrop mechanism, an information-based Dropout method, is introduced to help the continual learning model discard unimportant image features and retain only important ones. If the image patch fed to a neuron contains little information, InfoDrop sets that neuron's output to zero with high probability; otherwise it keeps the output. Specifically, the drop coefficient of the output $o^{(\ell)}_{c,j}$ of the $j$-th neuron in the $c$-th channel of the $\ell$-th layer of the network is computed under a Boltzmann distribution:
$$r^{(\ell)}_{c,j} \propto \exp\left(-\frac{-\log q\left(\mathbf{x}^{(\ell)}_{c,j}\right)}{T}\right)$$
where $\mathbf{x}^{(\ell)}_{c,j}$ is the input patch of the $j$-th neuron in the $c$-th channel of the $\ell$-th layer, and $-\log q(\mathbf{x}^{(\ell)}_{c,j})$ is its self-information. When the self-information of a neuron's input patch is low, the neuron's output is dropped with higher probability, which makes the network pay less attention to low-information image regions. $T$ is the temperature coefficient, which acts as a "soft threshold" for InfoDrop: as $T$ decreases (a lower threshold), most patches are retained and only the few patches with the lowest self-information are dropped; as $T \to \infty$ (a higher threshold), InfoDrop degenerates into conventional Dropout and all patches are dropped with equal probability. $q(\cdot)$ is the probability distribution of $\mathbf{x}^{(\ell)}_{c,j}$;
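The Boltzmann drop coefficient can be sketched as follows, assuming the coefficients are normalized over the neurons (spatial positions) of one channel so that they sum to 1; the patent only states the proportionality, and the helper names are hypothetical. How the coefficients are turned into per-neuron drop decisions is not detailed in the text and is left out. Since exp(-SI/T) with SI = -log q equals q^(1/T), the two temperature limits described above follow directly.

```python
import math
from typing import List, Sequence


def self_information(q_values: Sequence[float]) -> List[float]:
    """-log q(x) for each neuron's input patch."""
    return [-math.log(q) for q in q_values]


def drop_coefficients(self_info: Sequence[float], temperature: float) -> List[float]:
    """r_j proportional to exp(-SI_j / T), normalized to sum to 1 (assumption)."""
    logits = [-s / temperature for s in self_info]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

With q = (0.5, 0.1, 0.01) the most "repetitive" patch (q = 0.5) gets the largest drop coefficient; a very large T makes the three coefficients nearly equal, as in ordinary Dropout, while a small T concentrates almost all of the drop mass on that single patch.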

To approximate the distribution $q(\mathbf{x}^{(\ell)}_{c,j})$, InfoDrop assumes that the patches in the neighborhood $\mathcal{N}_R(\mathbf{x}^{(\ell)}_{c,j})$ of $\mathbf{x}^{(\ell)}_{c,j}$ are all sampled from $q$. When $\mathbf{x}^{(\ell)}_{c,j}$ repeats the patterns of the patches in its neighborhood, $q(\mathbf{x}^{(\ell)}_{c,j})$ is high and its self-information is therefore low. The distribution $q$ is estimated as:
$$\hat{q}\left(\mathbf{x}^{(\ell)}_{c,j}\right) = \frac{1}{\left|\mathcal{N}_R\left(\mathbf{x}^{(\ell)}_{c,j}\right)\right|} \sum_{\mathbf{x}' \in \mathcal{N}_R\left(\mathbf{x}^{(\ell)}_{c,j}\right)} K\left(\mathbf{x}^{(\ell)}_{c,j}, \mathbf{x}'\right)$$
$$K(\mathbf{x}, \mathbf{x}') = \frac{1}{\sqrt{2\pi}\, h} \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2h^2}\right)$$
where $R$ is the Manhattan radius of the neighborhood of $\mathbf{x}^{(\ell)}_{c,j}$, $\|\cdot\|$ is the Euclidean distance, and $h$ is the kernel bandwidth. The formula for $\hat{q}$ shows that the more $\mathbf{x}^{(\ell)}_{c,j}$ differs from the patches in its neighborhood $\mathcal{N}_R(\mathbf{x}^{(\ell)}_{c,j})$, the more self-information it carries, and the lower the probability that the output $o^{(\ell)}_{c,j}$ is set to zero;
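The kernel density estimate of q can be sketched over a 2D grid of flattened patches (for example, the patches produced by a k×k sliding window). The sketch assumes that the neighborhood excludes the patch itself and is clipped at the grid border; both choices, and the helper names, are assumptions rather than details given in the patent.

```python
import math
from typing import List, Sequence

Patch = Sequence[float]


def gaussian_kernel(a: Patch, b: Patch, bandwidth: float) -> float:
    """K(x, x') = exp(-||x - x'||^2 / (2 h^2)) / (sqrt(2 pi) h)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / (math.sqrt(2.0 * math.pi) * bandwidth)


def estimate_q(grid: List[List[Patch]], i: int, j: int, radius: int, bandwidth: float) -> float:
    """q-hat at grid position (i, j): mean kernel value over the neighbors within
    Manhattan radius R (the center patch itself is excluded, an assumption)."""
    rows, cols = len(grid), len(grid[0])
    total, count = 0.0, 0
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            if (di, dj) == (0, 0) or abs(di) + abs(dj) > radius:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:  # clip at the border
                total += gaussian_kernel(grid[i][j], grid[ni][nj], bandwidth)
                count += 1
    if count == 0:
        raise ValueError("empty neighborhood; increase the radius")
    return total / count


def self_information(q: float) -> float:
    """-log q: high when a patch differs from its neighbors."""
    return -math.log(q)
```

On a uniform background with one distinct patch in the middle, the middle patch gets a much smaller q-hat (higher self-information) than a background patch, so its neuron is less likely to be dropped.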

Step 5: Construct the self-supervised continual learning framework based on the information loss mechanism;

The aim is for the model to learn, on the current task's dataset, only the feature representations of informative image regions and to ignore the features of unimportant regions, so that with limited model capacity it at least learns the key feature representations. Typically, InfoDrop is applied when optimizing the network on the training set and disabled when evaluating it on the test set. However, because InfoDrop discards most low-self-information regions of the image, it widens the distribution gap between training and test data, which hurts test-set performance. Therefore, before testing, the model with InfoDrop removed is usually optimized a second time on the training set. This second optimization costs extra training time and also reintroduces the influence of unimportant image regions on the model. To avoid these drawbacks, an information loss mechanism suited to self-supervised continual learning is built on the self-supervised learning paradigm: when the model is trained on task $\mathcal{T}_\tau$, an InfoDrop loss is added to the self-supervised loss term, giving the following self-supervised learning paradigm with the InfoDrop mechanism:
$$\mathcal{L}^{\mathrm{InfoDrop}}_\tau(\Theta) = \mathbb{E}_{x_{i,\tau} \sim \mathcal{D}_\tau}\left[\mathcal{L}_\tau(x_{i,\tau};\Theta) + \mathcal{R}\left(z_{i,\tau}, \tilde{z}_{i,\tau}\right)\right]$$
This paradigm has two terms: the first, $\mathcal{L}_\tau$, is the original self-supervised loss, and the second, $\mathcal{R}$, is the InfoDrop regularization term, which measures the discrepancy between $z_{i,\tau} = f_\Theta(x_{i,\tau})$ and $\tilde{z}_{i,\tau}$. Here $\tilde{z}_{i,\tau} = \tilde{f}_\Theta(x_{i,\tau})$ is the output of the model $\tilde{f}_\Theta$ with the InfoDrop mechanism, and $\tilde{f}_\Theta$ shares its network weights with $f_\Theta$. Minimizing the InfoDrop regularization term makes the output $z_{i,\tau}$ of the model $f_\Theta$ without InfoDrop approximate the output $\tilde{z}_{i,\tau}$ of the model $\tilde{f}_\Theta$ with InfoDrop, which encourages $f_\Theta$ to actively capture the features of informative regions and ignore unimportant features even when the InfoDrop mechanism is not applied. See Figure 4 for a schematic diagram of the method framework.
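One way to instantiate the two-term objective is sketched below. The patent does not give the exact form of the InfoDrop regularization term, so this sketch assumes, hypothetically, that it reuses the SimSiam negative cosine distance between z and a stop-gradient copy of z̃, weighted by a hypothetical coefficient `weight`; the inputs are plain vectors standing in for network outputs, and which model produces the view predictions is left open.

```python
import math
from typing import Sequence


def neg_cosine(p: Sequence[float], z: Sequence[float]) -> float:
    """-(p / ||p||) . (z / ||z||); z is treated as a stop-gradient target."""
    p_norm = math.sqrt(sum(a * a for a in p))
    z_norm = math.sqrt(sum(b * b for b in z))
    return -sum(a * b for a, b in zip(p, z)) / (p_norm * z_norm)


def total_loss(p1: Sequence[float], z1: Sequence[float],
               p2: Sequence[float], z2: Sequence[float],
               z: Sequence[float], z_tilde: Sequence[float],
               weight: float = 1.0) -> float:
    """SimSiam loss plus a hypothetical InfoDrop regularizer.

    p1, z1, p2, z2: predictions and features of the two augmented views.
    z: output of f_Theta without InfoDrop.
    z_tilde: output of the weight-sharing model with InfoDrop enabled,
    used as a stop-gradient target (hypothetical choice).
    """
    ssl = 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
    infodrop_reg = neg_cosine(z, z_tilde)  # pulls z towards z_tilde
    return ssl + weight * infodrop_reg
```

Treating z̃ as the fixed target means only the InfoDrop-free path is pushed to mimic the InfoDrop path, matching the stated purpose of removing InfoDrop at test time without fine-tuning.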

Step 6: (1) Process the data according to Step 1 to obtain the datasets of the multiple tasks; (2) build the self-supervised learning model according to Step 2; (3) train the model on each task's training set in the order in which the tasks arrive;
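Step 6 amounts to the sequential loop below. It is a hypothetical sketch: `train_step` stands in for one optimizer update on the loss of Step 5, and the epoch count and batch size are free parameters rather than values from the patent. The point it illustrates is that while task t is being trained, only the mini-batches of task t are visited.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def iterate_minibatches(data: Sequence[S], batch_size: int,
                        rng: random.Random) -> Iterator[List[S]]:
    """Shuffle one task's dataset and yield it in mini-batches."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


def train_continually(task_datasets: Sequence[Sequence[S]],
                      train_step: Callable[[int, List[S]], float],
                      epochs_per_task: int, batch_size: int,
                      seed: int = 0) -> List[Tuple[int, float]]:
    """Train on tasks in arrival order; returns (task_id, loss) per update."""
    rng = random.Random(seed)
    history: List[Tuple[int, float]] = []
    for task_id, dataset in enumerate(task_datasets):
        # Only the current task's data is visited; earlier datasets are not revisited.
        for _ in range(epochs_per_task):
            for batch in iterate_minibatches(dataset, batch_size, rng):
                history.append((task_id, train_step(task_id, batch)))
    return history
```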

Step 7: Evaluate the model's performance with the KNN algorithm;

The steps for testing the accuracy of model $f_\Theta$ on task $\mathcal{T}_t$ with the KNN classification algorithm are as follows:

(1) Convert the training set $\mathcal{D}^{\mathrm{train}}_t = \{(x_i, y_i)\}_{i=1}^{N}$ of task $\mathcal{T}_t$ into a feature bank $\mathcal{V} = \{(v_i, y_i)\}_{i=1}^{N}$, where $v_i = f_\Theta(x_i)$;

(2) Based on the feature bank, predict the category $\hat{y}_i$ of each test sample $x_i$ in the test set $\mathcal{D}^{\mathrm{test}}_t = \{(x_i, y_i)\}_{i=1}^{M}$ of task $\mathcal{T}_t$:

a) Compute the similarities $\mathbf{s}_i = (s_{i1}, \ldots, s_{iN})$ between the feature representation $f_i = f_\Theta(x_i)$ of test sample $x_i$ and each representation in the feature bank, where $s_{ij} = \cos(f_i, v_j)$;

b) Take the $K$ largest entries of $\mathbf{s}_i$ as the $K$-nearest-neighbor set $\mathcal{N}_K(x_i)$ of test sample $x_i$. A weighted vote computes the score of $x_i$ on each of the $C$ categories, and the highest-scoring category is the predicted class of the test sample. The score of test sample $x_i$ on the $j$-th category is:
$$\mathrm{score}_{ij} = \sum_{k \in \mathcal{N}_K(x_i)} \exp\left(\frac{s_{ik}}{T}\right) \cdot \mathbb{1}\left(y_k = j\right)$$
where $T$ is the temperature parameter and $\mathbb{1}(\cdot)$ is the indicator function. The predicted category of test sample $x_i$ is $\hat{y}_i = \arg\max_{j} \mathrm{score}_{ij}$;

c)计算模型fΘ在任务

Figure BDA00039265437400000627
上的测试准确率:
Figure BDA00039265437400000628
c) Computational model f Θ in task
Figure BDA00039265437400000627
Test accuracy on:
Figure BDA00039265437400000628
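The KNN evaluation in steps (1) and (2) can be illustrated with a small pure-Python sketch. It assumes that each neighbor votes with weight $\exp(s/T)$, and it uses hypothetical function names and toy two-dimensional features in place of real outputs of $f_\Theta$; it is an illustration, not the patent's implementation.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def knn_predict(feature, bank, k=2, temperature=0.1):
    """Weighted KNN vote over a feature bank of (vector, label) pairs.

    Assumes each of the k most similar bank entries votes for its label
    with weight exp(s / T), where s is its cosine similarity to the query.
    """
    sims = [(cosine(feature, v), y) for v, y in bank]
    neighbours = sorted(sims, key=lambda t: t[0], reverse=True)[:k]
    scores = defaultdict(float)
    for s, y in neighbours:
        scores[y] += math.exp(s / temperature)
    return max(scores, key=scores.get)  # category with the highest score


def knn_accuracy(test_set, bank, k=2, temperature=0.1):
    """Fraction of (feature, label) test pairs that are classified correctly."""
    correct = sum(knn_predict(f, bank, k, temperature) == y for f, y in test_set)
    return correct / len(test_set)


# Toy feature bank with two classes (illustrative values only).
bank = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
test = [([0.8, 0.2], 0), ([0.2, 0.8], 1)]
print(knn_accuracy(test, bank))  # 1.0
```

Sorting all similarities is the simplest way to take the top K; for a large feature bank a partial selection such as `heapq.nlargest` would avoid the full sort.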

Step 8: After the model has been trained on each task, use the feature extraction module $f_b$ of the model's feature encoder $f_\Theta$ to compute feature representations of the test-set images of every task, then use the KNN classification algorithm to evaluate the effectiveness of these representations. The test results are shown in Table 1.

The innovations of the present invention are as follows:

(1) Based on the InfoDrop mechanism, the present invention establishes a framework that encourages a self-supervised model to extract important features across sequential tasks. In continuous learning, the model's limited capacity forces a trade-off between retaining its ability to represent the features of past tasks and learning to represent the features of the current task. By discarding unimportant image information, the framework makes the model focus only on representing important image information, which relieves the capacity constraint and improves the performance of the self-supervised model without replaying samples from historical tasks or storing the parameters of historical models.

(2) Based on the self-supervised loss paradigm, the present invention designs an InfoDrop loss term. Optimizing this term enables the model, once the InfoDrop mechanism is removed at test time, to extract the important feature representations of test samples directly, which avoids a further round of fine-tuning.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a structural diagram of the convolutional network block used in the method of the present invention;

FIG. 2 is a structural diagram of the residual convolutional network block used in the method of the present invention;

FIG. 3 is a structural diagram of the residual convolutional neural network Resnet18 used in the method of the present invention;

FIG. 4 is a schematic diagram of the framework of the method of the present invention.

DETAILED DESCRIPTION

Step 1: Preprocess the dataset;

Download the CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html). CIFAR-10 contains real-world color images from 10 categories; each category has 5,000 training images and 1,000 test images, and each image is 32×32×3 (32×32 pixels, 3 color channels). Split CIFAR-10 into 5 tasks: the dataset of each task contains image samples from two randomly chosen categories, and the categories of different tasks do not overlap;
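The class-disjoint task split can be sketched as follows. This minimal version assumes that CIFAR-10 labels are the integers 0 to 9 and uses hypothetical helper names; loading the actual images is omitted.

```python
import random


def split_into_tasks(classes, num_tasks, seed=0):
    """Randomly partition class labels into disjoint, equally sized tasks."""
    rng = random.Random(seed)
    shuffled = list(classes)
    rng.shuffle(shuffled)
    per_task = len(shuffled) // num_tasks
    return [shuffled[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]


def filter_by_task(samples, task_classes):
    """Keep only the (image, label) pairs whose label belongs to the task."""
    allowed = set(task_classes)
    return [(x, y) for x, y in samples if y in allowed]


tasks = split_into_tasks(range(10), num_tasks=5, seed=42)
print(tasks)  # five lists of two class indices each, with no class repeated
```

Fixing the seed makes the split reproducible, so the same two classes define each task across repeated runs.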

Step 2: Build the self-supervised learning model;

The self-supervised learning model consists of two parts: a feature encoder $f_\Theta$ and a feature prediction head $h$. The feature encoder $f_\Theta$ is a cascade of a feature extraction module $f_b$ and a feature projection module $f_g$:

$$f_\Theta = f_g \circ f_b$$

The feature extraction module is built on the residual convolutional neural network Resnet18: its first layer is a convolutional network block, its second through fifth layers are residual network blocks, and its last layer is an adaptive average pooling layer. The feature projection module consists of two connected linear layers. The input of the feature encoder $f_\Theta$ is an image $x$, and its output is the feature representation of the image, $z = f_\Theta(x)$. The feature prediction head $h$ also consists of two connected linear layers; its input is the image feature $z$, and its output is the prediction of the image feature, $p = h(z)$. See FIG. 1 for the structure of the convolutional network block, FIG. 2 for the residual convolutional network block, and FIG. 3 for Resnet18;

Step 3: Construct the self-supervised continuous learning paradigm;

Self-supervised continuous learning aims to learn image feature representations on a sequence of unlabeled tasks $\mathcal{T}_1, \dots, \mathcal{T}_T$ that arrive in order, where each task has a dataset $\mathcal{D}_t$, $t = 1, \dots, T$, with a different distribution. Typically, an image $x$ is randomly sampled from a dataset $\mathcal{D}_t$, and two image transformations are applied to it to obtain two correlated views, $x_1$ and $x_2$. The feature encoder encodes view $x_1$ into the feature $z_1 = f_\Theta(x_1)$; likewise, view $x_2$ is encoded into $z_2 = f_\Theta(x_2)$. The goal of self-supervised continuous learning is that, at any time $\tau$ during training, the model has learned representations of the images in both the historical tasks $\{\mathcal{T}_1, \dots, \mathcal{T}_{\tau-1}\}$ and the current task $\mathcal{T}_\tau$:

$$\Theta^* = \arg\min_{\Theta} \sum_{t=1}^{\tau} \mathbb{E}_{x \sim \mathcal{D}_t}\left[\mathcal{L}_{SSL}(x; \Theta)\right]$$

Here, the expectation operator $\mathbb{E}_{x \sim \mathcal{D}_t}$ is approximated by the mean of the loss term $\mathcal{L}_{SSL}$ over mini-batches $\{x_{i,t}\}_{i=1}^{N}$, $t = 1, \dots, \tau$, where $x_{i,t}$ denotes the i-th sample of a mini-batch drawn at random from dataset $\mathcal{D}_t$. The loss term $\mathcal{L}_{SSL}$ is the self-supervised learning loss; the self-supervised loss of SimSiam is used here:

$$\mathcal{L}_{SSL} = \frac{1}{2} D\left(p_1, \mathrm{stopgrad}(z_2)\right) + \frac{1}{2} D\left(p_2, \mathrm{stopgrad}(z_1)\right)$$

$$D(p, z) = -\frac{p}{\|p\|_2} \cdot \frac{z}{\|z\|_2}$$

where $z_1$ and $z_2$ are the outputs of the feature encoder for $x_1$ and $x_2$, and $p_1 = h(z_1)$ and $p_2 = h(z_2)$ are the feature prediction head's predictions of the feature representations of $x_1$ and $x_2$. $\mathrm{stopgrad}(\cdot)$ stops gradients from back-propagating through its argument, and $\|\cdot\|_2$ is the L2-norm operator.
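The SimSiam loss and its mini-batch approximation can be sketched in plain Python. This is an illustration that assumes the predictions $p$ and features $z$ are already available as lists of floats (in a real setup they come from $h$ and $f_\Theta$), and the helper names are hypothetical. Because plain floats carry no gradients, stopgrad has no effect here and only the loss value is computed.

```python
import math


def neg_cosine(p, z):
    """D(p, z) = -(p / ||p||_2) . (z / ||z||_2)."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, p2, z1, z2):
    """Symmetric SimSiam loss for one image.

    In an autograd framework z1 and z2 would be wrapped in stopgrad
    (for example detached); here they are plain numbers.
    """
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)


def batch_objective(batches):
    """Sum over tasks of the mini-batch mean loss, approximating the expectation.

    `batches` maps a task index t to a list of (p1, p2, z1, z2) tuples.
    """
    return sum(
        sum(simsiam_loss(*item) for item in batch) / len(batch)
        for batch in batches.values()
    )


print(simsiam_loss([1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]))  # -1.0
```

The minimum value of the loss is -1, reached when each prediction points in the same direction as the other view's feature. `batch_objective` mirrors the sum over $t = 1, \dots, \tau$ in the objective; under the continuous learning assumption only the current task's batches would actually be available.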

However, achieving this goal is challenging. In the continuous learning setting, data from historical tasks are usually assumed to be unavailable; that is, the optimal parameters $\Theta^*$ of the model over the datasets $\mathcal{D}_t$, $t = 1, \dots, \tau$, must be found while the datasets $\mathcal{D}_t$, $t = 1, \dots, \tau-1$, are inaccessible. Continuous learning strategies therefore need to be introduced to help the model preserve its performance on historical tasks while it learns the current task.

Step 4: Establish the information loss mechanism;

The InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard unimportant features in the image and retain only the important ones. If the image patch fed to a neuron contains little information, InfoDrop sets that neuron's output to zero with a high probability; otherwise it keeps the output. Specifically, the drop coefficient $r_c^{(l)}(x_j)$ of the output of the j-th neuron in the c-th channel of layer $l$ of the neural network is computed under a Boltzmann distribution:

$$P\left(r_c^{(l)}(x_j) = 0\right) \propto \exp\left(-\frac{I\left(X_c^{(l)}(x_j)\right)}{T}\right)$$

where $X_c^{(l)}(x_j)$ is the input patch of the j-th neuron in the c-th channel of layer $l$, and $I\left(X_c^{(l)}(x_j)\right) = -\log p\left(X_c^{(l)}(x_j)\right)$ is defined as its self-information, with $p(\cdot)$ denoting the probability distribution of $X_c^{(l)}(x_j)$. When the self-information of a neuron's input patch is low, the neuron's output is dropped with a higher probability, which makes the network pay less attention to low-information regions of the image. T is the temperature coefficient and acts as a "soft threshold" of InfoDrop: as T decreases (the threshold is lowered), most patches are retained and only the few patches with the lowest self-information are dropped; as T tends to infinity (the threshold is raised), InfoDrop degenerates into conventional Dropout, and all patches are dropped with equal probability.
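The effect of the temperature on the Boltzmann drop distribution can be checked numerically. This sketch assumes that the drop probabilities are normalized over the positions of one channel and takes illustrative self-information values as input; the function name is hypothetical.

```python
import math


def drop_distribution(self_info, temperature):
    """Boltzmann distribution over positions: P(drop j) is proportional to exp(-I_j / T).

    Lower self-information gives a higher drop probability.
    """
    # Shifting by the minimum keeps exp() from underflowing; the
    # normalised result is unchanged by the shift.
    m = min(self_info)
    weights = [math.exp(-(i - m) / temperature) for i in self_info]
    total = sum(weights)
    return [w / total for w in weights]


info = [0.5, 2.0, 4.0]  # illustrative self-information values
cold = drop_distribution(info, temperature=0.1)
hot = drop_distribution(info, temperature=1e6)
print([round(p, 3) for p in cold])  # [1.0, 0.0, 0.0]
print([round(p, 3) for p in hot])   # [0.333, 0.333, 0.333]
```

At a small temperature almost all of the drop probability falls on the lowest-information position, while at a very large temperature the distribution becomes uniform, matching the degeneration to conventional Dropout described above.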

To approximate the distribution $p\left(X_c^{(l)}(x_j)\right)$, InfoDrop assumes that the patches in the neighborhood $\Omega_j$ of $X_c^{(l)}(x_j)$ are all samples drawn from this distribution. When $X_c^{(l)}(x_j)$ repeats the pattern of the patches in its neighborhood, $p\left(X_c^{(l)}(x_j)\right)$ is high and the self-information is therefore low. The distribution $p$ is estimated as:

$$\hat{p}\left(X_c^{(l)}(x_j)\right) = \frac{1}{|\Omega_j|}\sum_{x_k \in \Omega_j} K\left(X_c^{(l)}(x_j) - X_c^{(l)}(x_k)\right)$$

$$\Omega_j = \left\{x_k : \|k - j\|_1 \le R\right\}$$

where R is the Manhattan radius of the neighborhood of $X_c^{(l)}(x_j)$, $\|\cdot\|$ denotes the Euclidean distance, and h and b are bandwidth parameters. The estimate shows that the more $X_c^{(l)}(x_j)$ differs from the patches in its neighborhood, the more self-information it carries, and hence the lower the probability that the corresponding neuron output is set to zero.
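A small kernel density sketch shows how a distinctive patch receives high self-information. It is an illustration under stated assumptions: the kernel is taken to be Gaussian with a single bandwidth h (the second bandwidth parameter b named in the text is not modelled), the neighborhood includes the position itself and is clipped at the grid border, and patches are given as short lists of floats on a 2-D grid.

```python
import math


def self_information_map(patches, radius=1, bandwidth=1.0):
    """Estimate I = -log p_hat at every position of a 2-D grid of patch vectors.

    p_hat(r, c) averages an assumed Gaussian kernel exp(-||d||^2 / (2 h^2))
    over all positions within Manhattan distance `radius` of (r, c),
    the position itself included, clipped at the grid border.
    """
    rows, cols = len(patches), len(patches[0])
    info = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if abs(dr) + abs(dc) > radius:
                        continue  # outside the Manhattan neighbourhood
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # outside the grid
                    sq = sum((a - b) ** 2 for a, b in zip(patches[r][c], patches[rr][cc]))
                    total += math.exp(-sq / (2 * bandwidth ** 2))
                    count += 1
            # -log(total / count), written as log(count / total) to avoid -0.0
            info[r][c] = math.log(count / total)
    return info


# A flat background with one distinctive patch in the centre.
grid = [[[0.0], [0.0], [0.0]],
        [[0.0], [5.0], [0.0]],
        [[0.0], [0.0], [0.0]]]
info = self_information_map(grid)
print(round(info[1][1], 2), round(info[0][0], 2))  # 1.61 0.0
```

Including the position itself keeps the estimate strictly positive, so the logarithm is always defined even when every neighbor is very different.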

Step 5: Construct the self-supervised continuous learning framework based on the information loss mechanism;

The aim is for the model to learn, on the dataset of the current task, only the feature representations of image regions that carry important information and to ignore the features of unimportant regions, so that under its limited capacity it can at least learn the key feature representations. Usually, InfoDrop is applied while the network is optimized on the training set and removed when its performance is evaluated on the test set. However, because InfoDrop discards most of the low-self-information regions of the image, this widens the distribution gap between the training and test data and degrades the model's performance on the test set. Therefore, before testing, the model with InfoDrop removed is usually optimized a second time on the training set. This second optimization costs extra training time and reintroduces the influence of unimportant image regions on the model. To avoid these drawbacks, an information loss mechanism suited to self-supervised continuous learning is built on the self-supervised learning paradigm. When the model is trained on task $\mathcal{T}_t$, an InfoDrop loss is added to the self-supervised loss term, giving the following self-supervised learning paradigm with the InfoDrop mechanism:

$$\mathcal{L} = \mathcal{L}_{SSL} + \mathcal{L}_{InfoDrop}$$

This paradigm contains two terms: the first is the original self-supervised loss term and the second is the InfoDrop regularization term. Here, $\tilde{z} = \tilde{f}_\Theta(x)$ denotes the output of the model with InfoDrop, $\tilde{f}_\Theta$, which shares its network weights with $f_\Theta$. Minimizing the InfoDrop regularization term makes the output $z = f_\Theta(x)$ of the model without InfoDrop approximate the output $\tilde{z}$ of the model with InfoDrop, which encourages $f_\Theta$ to capture the features of information-rich regions on its own, without InfoDrop, and to ignore unimportant features. See FIG. 4 for a schematic diagram of the framework.
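The two-term objective and the weight sharing between the plain and InfoDrop branches can be sketched on a toy linear encoder. The text does not state which distance the InfoDrop regularizer uses, so squared Euclidean distance is assumed here; the mask, the coefficient `weight`, and all helper names are hypothetical.

```python
def linear(weights, x):
    """Toy encoder layer: a matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def encode(weights, x, keep_mask=None):
    """f(x) when keep_mask is None, otherwise the InfoDrop branch f~(x).

    Both branches use the same `weights`, mirroring the weight sharing
    between the two models. keep_mask holds 1 (keep) or 0 (drop) for the
    activations feeding the layer.
    """
    if keep_mask is not None:
        x = [xi * m for xi, m in zip(x, keep_mask)]
    return linear(weights, x)


def infodrop_regulariser(z, z_tilde):
    """Assumed squared Euclidean distance; z_tilde is treated as a fixed target."""
    return sum((a - b) ** 2 for a, b in zip(z, z_tilde))


def total_loss(ssl_loss, z, z_tilde, weight=1.0):
    """L = L_SSL + weight * L_InfoDrop, with `weight` a hypothetical coefficient."""
    return ssl_loss + weight * infodrop_regulariser(z, z_tilde)


W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
x = [1.0, 2.0, 1.0]
z = encode(W, x)                             # [1.5, 2.5]
z_tilde = encode(W, x, keep_mask=[1, 1, 0])  # [1.0, 2.0]
print(total_loss(-0.9, z, z_tilde))  # -0.4
```

Because both branches read the same `W`, pulling `z` toward `z_tilde` during training changes the single shared encoder, which is what allows InfoDrop to be dropped at test time.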

Step 6: Process the dataset as in Step 1 to obtain the datasets of multiple tasks; build the self-supervised learning model as in Step 2; and train the model on the training set of each task in the order in which the tasks arrive.

Step 7: Evaluate the performance of the model with the KNN algorithm;

Steps for testing the accuracy of model $f_\Theta$ on task $\mathcal{T}_t$ with the KNN classification algorithm:

(1) Convert the training set $\mathcal{D}_t^{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N_t}$ of task $\mathcal{T}_t$ into a feature bank $\mathcal{V}_t = \{(v_i, y_i)\}_{i=1}^{N_t}$, where $v_i = f_b(x_i)$;

(2) Based on the feature bank, predict the category $\hat{y}_i$ of each test sample $x_i$ in the test set $\mathcal{D}_t^{\text{test}} = \{(x_i, y_i)\}_{i=1}^{M_t}$ of task $\mathcal{T}_t$:

a) Compute the similarities $\{s_{ij}\}_{j=1}^{N_t}$ between the feature representation $f_i$ of test sample $x_i$ and each representation in the feature bank, where $s_{ij} = \cos(f_i, v_j)$;

b) Take the K largest entries of $\{s_{ij}\}_{j=1}^{N_t}$ as the K-nearest-neighbor set $\mathcal{N}_K(x_i)$ of test sample $x_i$, and use weighted voting to compute the score of $x_i$ on each of the C categories; the category with the highest score is the predicted category of the test sample. The score of test sample $x_i$ on the j-th category is computed as:

$$S_{ij} = \sum_{(v_k, y_k) \in \mathcal{N}_K(x_i)} \exp\left(\frac{s_{ik}}{T}\right) \mathbb{1}\left[y_k = j\right]$$

where T is the temperature parameter. The predicted category of test sample $x_i$ is $\hat{y}_i = \arg\max_j S_{ij}$;

c) Compute the test accuracy of model $f_\Theta$ on task $\mathcal{T}_t$:

$$\mathrm{Acc}_t = \frac{1}{M_t}\sum_{i=1}^{M_t} \mathbb{1}\left[\hat{y}_i = y_i\right]$$

Step 8: After the model has been trained on each task, use the feature extraction module $f_b$ of the model's feature encoder $f_\Theta$ to compute feature representations of the test-set images of every task, then use the KNN classification algorithm to evaluate the effectiveness of these representations. The test results are shown in Table 1. The present invention verifies the advantage of the self-supervised continuous learning framework based on the information loss mechanism on five typical continuous learning strategies: FINETUNE, DER, SI, LUMP, and CASSLE. As Table 1 shows, the proposed framework significantly alleviates catastrophic forgetting and improves the model's accuracy on each task.
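Steps 6 and 8 amount to a sequential train-then-evaluate loop. A minimal sketch, in which `train_on_task` and `evaluate_on_task` are hypothetical callables standing in for the training of Steps 3 to 5 and the KNN accuracy of Step 7:

```python
def continual_run(tasks, train_on_task, evaluate_on_task):
    """Train on tasks in arrival order; after each one, evaluate on every task.

    Returns acc, where acc[i][j] is the accuracy on task j measured
    right after training on task i.
    """
    acc = []
    for task in tasks:
        train_on_task(task)  # only the current task's data is used
        acc.append([evaluate_on_task(t) for t in tasks])
    return acc


# Dummy stand-ins: the "model" scores 1.0 on tasks it has already seen.
seen = set()
acc = continual_run(
    ["T1", "T2"],
    train_on_task=seen.add,
    evaluate_on_task=lambda t: 1.0 if t in seen else 0.0,
)
print(acc)  # [[1.0, 0.0], [1.0, 1.0]]
```

Reading down a column of `acc` shows how accuracy on one task changes as later tasks are learned, which is where catastrophic forgetting would appear.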

Image size: 32×32×3

Image categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

Learning rate: 0.003

Training batch size N: 256

Iterations: 200

Table 1 shows the experimental results of the method of the present invention.

[Table 1: experimental results]

Claims (1)

1. An image feature continuous extraction method based on an information loss mechanism, comprising the following steps:
Step 1: preprocessing the dataset;
acquiring images of real-world objects, labeling each image according to the category of the object it contains, normalizing the pixel values of all images, scaling and cropping the images, and dividing the images into a plurality of datasets, the datasets containing images of different categories;
Step 2: constructing a self-supervised learning model;
the self-supervised learning model consists of a feature encoder $f_\Theta$ and a feature prediction head $h$; the feature encoder $f_\Theta$ is a cascade of a feature extraction module $f_b$ and a feature projection module $f_g$:
$$f_\Theta = f_g \circ f_b$$
the feature extraction module is built with the residual convolutional neural network Resnet18, whose first layer is a convolutional network block, whose second through fifth layers are residual network blocks, and whose last layer is an adaptive average pooling layer; the feature projection module consists of two connected linear layers; the input of the feature encoder $f_\Theta$ is an image $x$ and its output is the feature representation of the image, $z = f_\Theta(x)$; the feature prediction head $h$ consists of two connected linear layers, its input is the image feature $z$, and its output is the prediction of the image feature, $p = h(z)$;
Step 3: constructing a self-supervised continuous learning paradigm;
self-supervised continuous learning learns image feature representations on a sequence of unlabeled tasks $\mathcal{T}_1, \dots, \mathcal{T}_T$ that arrive in order, each task having a dataset $\mathcal{D}_t$, $t = 1, \dots, T$, with a different distribution; generally, an image $x$ is randomly sampled from a dataset, and two image transformations are applied to it to obtain two correlated views $x_1$ and $x_2$; the feature encoder encodes view $x_1$ into its feature $z_1 = f_\Theta(x_1)$, and likewise the feature $z_2 = f_\Theta(x_2)$ of view $x_2$ is obtained; the goal of self-supervised continuous learning is that, at any time $\tau$ during training, the model learns the representations of the images in the historical tasks $\{\mathcal{T}_1, \dots, \mathcal{T}_{\tau-1}\}$ and the current task $\mathcal{T}_\tau$:
$$\Theta^* = \arg\min_{\Theta} \sum_{t=1}^{\tau} \mathbb{E}_{x \sim \mathcal{D}_t}\left[\mathcal{L}_{SSL}(x; \Theta)\right]$$
wherein the expectation operator $\mathbb{E}_{x \sim \mathcal{D}_t}$ is approximated by the mean of the loss term $\mathcal{L}_{SSL}$ over mini-batches $\{x_{i,t}\}_{i=1}^{N}$, $t = 1, \dots, \tau$, with $x_{i,t}$ denoting the i-th sample of a mini-batch randomly sampled from dataset $\mathcal{D}_t$; the loss term $\mathcal{L}_{SSL}$ is the self-supervised learning loss, for which the self-supervised loss of SimSiam is used:
$$\mathcal{L}_{SSL} = \frac{1}{2} D\left(p_1, \mathrm{stopgrad}(z_2)\right) + \frac{1}{2} D\left(p_2, \mathrm{stopgrad}(z_1)\right)$$
$$D(p, z) = -\frac{p}{\|p\|_2} \cdot \frac{z}{\|z\|_2}$$
wherein $z_1$ and $z_2$ are the outputs of the feature encoder for $x_1$ and $x_2$; $p_1 = h(z_1)$ and $p_2 = h(z_2)$ are the feature prediction head's predictions of the feature representations of $x_1$ and $x_2$; $\mathrm{stopgrad}(\cdot)$ stops gradient back-propagation through its argument; and $\|\cdot\|_2$ is the L2-norm operator;
however, achieving this goal is challenging: in the continuous learning setting, data from historical tasks are usually assumed to be unavailable, i.e., the optimal parameters $\Theta^*$ of the model on datasets $\mathcal{D}_t$, $t = 1, \dots, \tau$, must be solved for while datasets $\mathcal{D}_t$, $t = 1, \dots, \tau-1$, are inaccessible; therefore, continuous learning strategies need to be introduced to help the model maintain its performance on historical tasks while learning the current task;
Step 4: establishing the information loss mechanism;
the InfoDrop mechanism, an information-based Dropout method, is introduced to help the continuous learning model discard unimportant features in the image and retain only the important ones; if the image patch input to a neuron contains little information, InfoDrop sets the neuron's output to zero with a higher probability, and otherwise keeps the output; specifically, the drop coefficient $r_c^{(l)}(x_j)$ of the output of the j-th neuron of the c-th channel in layer $l$ of the neural network is computed under a Boltzmann distribution:
$$P\left(r_c^{(l)}(x_j) = 0\right) \propto \exp\left(-\frac{I\left(X_c^{(l)}(x_j)\right)}{T}\right)$$
wherein $X_c^{(l)}(x_j)$ is the input patch of the j-th neuron of the c-th channel in layer $l$; $I\left(X_c^{(l)}(x_j)\right) = -\log p\left(X_c^{(l)}(x_j)\right)$ is defined as the self-information, $p(\cdot)$ being the probability distribution of $X_c^{(l)}(x_j)$; when the self-information of a neuron's input patch is low, the neuron's output is discarded with a higher probability, which makes the neural network pay less attention to low-information regions of the image; T is a temperature coefficient acting as a "soft threshold" of the InfoDrop mechanism: as T decreases, i.e. the threshold is lowered, most patches are retained and only a few patches with low self-information are discarded; as T tends to infinity, i.e. the threshold is raised, the InfoDrop mechanism degenerates into the conventional Dropout mechanism and all patches are discarded with equal probability;
to approximate the distribution $p\left(X_c^{(l)}(x_j)\right)$, the InfoDrop mechanism assumes that the patches in the neighborhood $\Omega_j$ of $X_c^{(l)}(x_j)$ are all sampled from this distribution; when $X_c^{(l)}(x_j)$ repeats the pattern of the patches in its neighborhood, $p\left(X_c^{(l)}(x_j)\right)$ is high and the self-information is therefore low; the estimate of $p$ is defined as:
$$\hat{p}\left(X_c^{(l)}(x_j)\right) = \frac{1}{|\Omega_j|}\sum_{x_k \in \Omega_j} K\left(X_c^{(l)}(x_j) - X_c^{(l)}(x_k)\right)$$
$$\Omega_j = \left\{x_k : \|k - j\|_1 \le R\right\}$$
wherein R is the Manhattan radius of the neighborhood $\Omega_j$, $\|\cdot\|$ denotes the Euclidean distance, and h and b are bandwidth parameters; it can be seen from the formula for $\hat{p}$ that the more the patches within the neighborhood $\Omega_j$ differ from $X_c^{(l)}(x_j)$, the more self-information $X_c^{(l)}(x_j)$ contains, i.e., the lower the probability that the corresponding neuron output is set to zero;
Step 5: constructing a self-supervised continuous learning framework based on the information loss mechanism;
the model is expected to learn, on the dataset of the current task, only the feature representations of image regions with important information and to ignore the features of unimportant regions, so that it can at least learn the key feature representations under limited model capacity; generally, the InfoDrop mechanism is applied when the neural network model is optimized on the training set and removed when the model's performance is verified on the test set, but because InfoDrop discards most low-self-information regions of the image, a larger distribution deviation arises between the training and test datasets, which affects the model's performance on the test set; therefore, before testing, the model with InfoDrop removed is usually optimized a second time on the training set; however, the second optimization consumes additional training time and also reintroduces the influence of unimportant image regions on the model; to avoid the adverse effects of the second optimization, an information loss mechanism suited to self-supervised continuous learning is constructed on the basis of the self-supervised learning model; when the model is trained on task $\mathcal{T}_t$, an InfoDrop loss is introduced on top of the self-supervised loss term, constructing the following self-supervised learning paradigm with the InfoDrop mechanism:
$$\mathcal{L} = \mathcal{L}_{SSL} + \mathcal{L}_{InfoDrop}$$
the self-supervised learning paradigm comprises two terms, the first being the original self-supervised loss term and the second being the InfoDrop regularization term; wherein $\tilde{z} = \tilde{f}_\Theta(x)$ denotes the output of the model $\tilde{f}_\Theta$ with the InfoDrop mechanism, and $\tilde{f}_\Theta$ shares network weights with $f_\Theta$; minimizing the InfoDrop regularization term makes the output $z$ of model $f_\Theta$ without the InfoDrop mechanism approximate the output $\tilde{z}$ of model $\tilde{f}_\Theta$ with the InfoDrop mechanism, so that model $f_\Theta$ actively captures the features of regions with important information without the InfoDrop mechanism and ignores unimportant features;
Step 6: (1) processing the dataset according to Step 1 to obtain the datasets of a plurality of tasks; (2) constructing the self-supervised learning model according to Step 2; (3) training the model on the training set of each task in the order in which the tasks arrive;
Step 7: evaluating the performance of the model with the KNN algorithm;
testing the accuracy of model $f_\Theta$ on task $\mathcal{T}_t$ with the KNN classification algorithm:
(1) converting the training set $\mathcal{D}_t^{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N_t}$ of task $\mathcal{T}_t$ into a feature bank $\mathcal{V}_t = \{(v_i, y_i)\}_{i=1}^{N_t}$, wherein $v_i = f_\Theta(x_i)$;
(2) predicting, based on the feature bank, the category $\hat{y}_i$ of each test sample $x_i$ in the test set $\mathcal{D}_t^{\text{test}} = \{(x_i, y_i)\}_{i=1}^{M_t}$ of task $\mathcal{T}_t$:
a) computing the similarities $\{s_{ij}\}_{j=1}^{N_t}$ between the feature representation $f_i$ of test sample $x_i$ and each representation in the feature bank, wherein $s_{ij} = \cos(f_i, v_j)$;
b) taking the K largest entries of $\{s_{ij}\}_{j=1}^{N_t}$ as the K-nearest-neighbor set $\mathcal{N}_K(x_i)$ of test sample $x_i$, and computing the scores of $x_i$ on the C categories by weighted voting, the category with the highest score being the predicted category of the test sample; the score of test sample $x_i$ on the j-th category is computed as:
$$S_{ij} = \sum_{(v_k, y_k) \in \mathcal{N}_K(x_i)} \exp\left(\frac{s_{ik}}{T}\right) \mathbb{1}\left[y_k = j\right]$$
wherein T is a temperature parameter; the predicted category of test sample $x_i$ is $\hat{y}_i = \arg\max_j S_{ij}$;
c) computing the test accuracy of model $f_\Theta$ on task $\mathcal{T}_t$:
$$\mathrm{Acc}_t = \frac{1}{M_t}\sum_{i=1}^{M_t} \mathbb{1}\left[\hat{y}_i = y_i\right]$$
Step 8: after the model has been trained on each task, using the feature extraction module $f_b$ in the feature encoder $f_\Theta$ of the model to compute feature representations of the test-set images of each task, and then evaluating the effectiveness of the model's feature representations with the KNN classification algorithm.
CN202211375805.5A 2022-11-04 2022-11-04 A self-supervised continuous learning method based on information loss mechanism Active CN115952851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211375805.5A CN115952851B (en) 2022-11-04 2022-11-04 A self-supervised continuous learning method based on information loss mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211375805.5A CN115952851B (en) 2022-11-04 2022-11-04 A self-supervised continuous learning method based on information loss mechanism

Publications (2)

Publication Number Publication Date
CN115952851A 2023-04-11
CN115952851B 2024-10-01

Family

ID=87288106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211375805.5A Active CN115952851B (en) 2022-11-04 2022-11-04 A self-supervised continuous learning method based on information loss mechanism

Country Status (1)

Country Link
CN (1) CN115952851B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109690576A (en) * 2016-07-18 2019-04-26 DeepMind Technologies Ltd. Train machine learning models on multiple machine learning tasks
CN114612847A (en) * 2022-03-31 2022-06-10 Changsha University of Science and Technology Method and system for detecting distortion of Deepfake video
CN114758195A (en) * 2022-05-10 2022-07-15 Xi'an Jiaotong University Human motion prediction method capable of realizing continuous learning

Non-Patent Citations (2)

Title
ALESSANDRO ACHILLE et al.: "Information Dropout: Learning Optimal Representations Through Noisy Computation", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 31 December 2018 (2018-12-31), pages 2897 - 2905, XP011698769, DOI: 10.1109/TPAMI.2017.2784440 *
莫建文 et al.: "Incremental learning based on neuron regularization and resource release", Journal of South China University of Technology (Natural Science Edition), vol. 50, no. 6, 30 June 2022 (2022-06-30), pages 71 - 80 *

Also Published As

Publication number Publication date
CN115952851B (en) 2024-10-01

Similar Documents

Publication Publication Date Title
CN114492574B (en) Unsupervised adversarial domain adaptation image classification method based on pseudo-label loss of Gaussian uniform mixture model
CN113705769B (en) Neural network training method and device
Ghosh et al. Structured variational learning of Bayesian neural networks with horseshoe priors
CN109766992B (en) Anomaly detection and attack classification method for industrial control based on deep learning
US20200097818A1 (en) Method and system for training binary quantized weight and activation function for deep neural networks
Dong et al. Research and discussion on image recognition and classification algorithm based on deep learning
CN114241282A (en) Knowledge distillation-based edge equipment scene identification method and device
WO2021218470A1 (en) Neural network optimization method and device
CN112464743B (en) A small-sample object detection method based on multi-scale feature weighting
CN111079847A (en) Remote sensing image automatic labeling method based on deep learning
CN117070741B (en) Control method and system of pickling line
CN116012880A (en) Pedestrian re-identification method, system and device for distributed edge collaborative reasoning
CN117079017A (en) Credible small sample image identification and classification method
CN116681128A (en) A neural network model training method and device for noisy multi-label data
CN114612450A (en) Image detection segmentation method and system based on data augmentation machine vision and electronic equipment
CN114048843A (en) Small sample learning network based on selective feature migration
CN119151869A (en) Intelligent industrial quality inspection method and system based on AI visual recognition
CN118628717A (en) UAV swarm target detection method, system, electronic equipment, medium and product
CN117154256B (en) Electrochemical repair method of lithium battery
Yang et al. NAM net: meta-network with normalization-based attention for few-shot learning
CN116630816B (en) SAR target recognition method, device, equipment and medium based on prototype comparison learning
CN115952851B (en) A self-supervised continuous learning method based on information loss mechanism
CN117853596A (en) Unmanned aerial vehicle remote sensing mapping method and system
CN117705059A (en) Positioning method and system for remote sensing mapping image of natural resource
Połap et al. Meta-heuristic algorithm as feature selector for convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant