CN113657561A - Semi-supervised night image classification method based on multi-task decoupling learning

Info

Publication number: CN113657561A
Application number: CN202111220897.5A
Authority: CN
Inventors: 章依依; 郑影; 朱亚光; 徐晓刚; 王军; 虞舒敏
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2021-11-16
Anticipated expiration: 2041-10-20
Also published as: CN113657561B

Abstract

The invention discloses a semi-supervised nighttime image classification method based on multi-task decoupling learning. Labeled daytime samples and unlabeled nighttime samples are input together into a feature extraction network. The feature vectors extracted from the daytime samples are input into a classification network head and supervised with the cross-entropy loss function. The feature vectors extracted from the nighttime samples are first input into the classification network head to obtain pseudo-labels; positive and negative sample pairs are then constructed according to the pseudo-labels and input into a self-supervised network head, which is trained under the supervision of an angle contrast loss function. After multi-task training of the model is complete, a small number of labeled samples from the nighttime dataset are input into the feature extraction network and the classification network head for iterative self-distillation learning, so that the nighttime dataset can finally be classified effectively.

Description

A semi-supervised nighttime image classification method based on multi-task decoupling learning

Technical Field

The invention relates to multi-task learning in the technical field of computer vision recognition, and in particular to a semi-supervised nighttime image classification method based on multi-task decoupling learning.

Background

Domain transfer is a pressing problem in computer vision. In this problem the source and target domains share the same task, while their data are different but related. The core of this type of learning is handling the difference between the data distributions of the two domains. Current general-purpose image recognition algorithms are trained on supervised datasets and achieve high performance on images with a similar distribution. When they are transferred to images from another target domain, however, performance often drops sharply because of the distribution difference between the source and target domains. For example, when a network trained on a daytime dataset predicts nighttime images, recognition performance tends to drop substantially.

There are many open-source daytime image classification datasets, such as PASCAL VOC, but labeled nighttime image classification datasets are scarce. We therefore want to train a network on daytime image datasets and have it transfer effectively to nighttime image classification, thereby improving nighttime classification performance.

Self-supervised learning uses auxiliary tasks to mine supervision signals from large-scale unlabeled data and trains the network with this constructed supervision, thereby learning representations that are valuable for downstream tasks. This approach has been shown to capture discriminative image features and is an effective solution for tasks that lack labeled data. Self-supervised learning on a large number of unlabeled nighttime images lets the network learn the feature distribution of nighttime images and thus improves the accuracy of nighttime image classification.

Therefore, by decoupling nighttime image classification into a supervised classification task on daytime images and a self-supervised task on nighttime images, and training the two tasks with multi-task learning, the model can both extract discriminative features for each class and adapt to the data distribution of nighttime images. In multi-task learning, however, the tasks compete with one another. Making the two tasks reinforce rather than constrain each other requires an effective loss function design.

Knowledge distillation has become a popular topic in recent years. It transfers knowledge by introducing soft targets produced by a teacher network as part of the loss that guides the training of a student network. Self-distillation means that a network learns from itself: soft targets derived from the network itself guide the training of its next generation. This usually improves robustness and avoids overfitting, so it is suitable for further improving model performance on nighttime images.

Summary of the Invention

To overcome the deficiencies of the prior art and improve nighttime image recognition performance, the invention adopts the following technical solution:

A semi-supervised nighttime image classification method based on multi-task decoupling learning, comprising the following steps:

S1, constructing a labeled daytime image classification dataset D and a nighttime image classification dataset A, in which only some of the nighttime image samples have labels and the remaining samples have no class labels;

S2, inputting the labeled samples from the daytime image dataset and the unlabeled samples from the nighttime image dataset together into a feature extraction network, which outputs daytime image feature vectors and nighttime image feature vectors; the feature extraction network is a deep residual convolutional network;

S3, connecting a multi-task learning network after the feature extraction network layer, the network consisting of a supervised classification network head and a self-supervised network head;

S4, for the daytime image feature vectors, performing loss-supervised training through the classification network head; for the nighttime image feature vectors, predicting their classes as pseudo-labels through the same classification network head, and constructing positive and negative sample pairs of nighttime images according to the pseudo-labels; the classification network head consists of a global average pooling layer and a fully connected layer;

S5, the self-supervised network head normalizes the positive and negative sample pairs of nighttime images according to the weight parameters of the classification network head to obtain normalized feature vectors, and uses a contrast loss to guide the learning of the feature space so that positive samples are similar and negative samples are effectively distinguished;

S6, training under the joint supervision of the classification loss and the contrast loss;

S7, inputting the labeled samples in the nighttime image dataset into the trained feature extraction network and classification network head, fixing the weights of the feature extraction network, and performing loss-supervised training through the classification network head so that the head adapts to the feature distribution of nighttime images; then entering a self-distillation learning stage in which multiple iterative updates are performed, with the classification predictions from the previous round of loss-supervised training serving as soft targets that participate in supervision together with the true labels;

S8, in the inference stage, inputting the nighttime image to be tested into the trained feature extraction network and classification network head, and outputting the image classification result.

Further, in S4, the daytime image feature vectors are input into the classification network head, which outputs the class of each daytime sample, and training is supervised by the cross-entropy loss function:

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N} y^{i}\log\hat{y}^{i}$$

where N is the total number of labeled samples in the daytime image dataset, y^i is the true label of the i-th sample, and ŷ^i is the predicted class probability of the i-th sample.

Further, in S4, the nighttime image feature vectors are input into the classification network head to obtain predicted pseudo-labels, and positive and negative sample pairs {k, k₊, k₋}_m of nighttime images are constructed according to the pseudo-labels, where k₊ is a positive sample of k and has the same label as k, k₋ is a negative sample of k and has a different label from k, and m is the number of sample pairs.

Further, in S5, an angle normalization Λ(x, W, y) is applied to the features of the positive and negative sample pairs, where x is the input feature vector, ||x|| is the norm of x, y is the label to which x belongs, and W_y is the parameter in row y of the fully connected layer in the classification network head. The angle normalization is computed for each sample feature vector in the pairs {k, k₊, k₋}_m, giving the normalized feature vectors {Λk, Λk₊, Λk₋}_m:

Λk = Λ(k, W, y)

Λk₊ = Λ(k₊, W, y)

Λk₋ = Λ(k₋, W, y).

Further, in S5, a contrast loss guides the learning of the feature space so that positive samples are similar and negative samples are effectively distinguished. In this loss function, y^k, y^k₊ and y^k₋ are the true labels of samples k, k₊ and k₋ in a sample pair, η is a hyperparameter giving the minimum distance threshold between samples of different classes, and the loss is evaluated with a similarity function.

Further, the cosine similarity function is used to compare the normalized feature vectors {Λk, Λk₊, Λk₋}_m:

$$\mathrm{sim}(A, B) = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}}\,\sqrt{\sum_{i} B_i^{2}}}$$

where A_i and B_i are the components of vectors A and B; the similarity of a positive pair is 1 and the similarity of a negative pair is -1.

Further, the total loss function of S6 jointly applies the cross-entropy loss and the contrast loss, and training stops when the number of training epochs reaches the specified value.

Further, in S7, the labeled samples in the nighttime image dataset are input into the trained feature extraction network and classification network head, the weights of the feature extraction network are fixed, and the classification network head is supervised with the cross-entropy loss function:

$$L_{ce} = -\frac{1}{N'}\sum_{i=1}^{N'} y^{i}\log\hat{y}^{i}$$

where N' is the total number of labeled samples in the nighttime image dataset, y^i is the true label of the i-th sample, and ŷ^i is the predicted class probability of the i-th sample.

Further, in S7, a self-distillation learning stage follows in which multiple iterative updates are performed: the classification predictions from the previous round of loss-supervised training serve as soft targets and participate in supervision together with the true label y, where λ is the weight given to the soft-target loss. Self-distillation training is complete after multiple iterative updates.

A semi-supervised nighttime image classification method based on multi-task decoupling learning, in which the image to be tested is input into the trained feature extraction network and classification network head, and the image classification result is output.

The advantages and beneficial effects of the invention are as follows:

The invention is the first to combine multi-task learning with knowledge distillation for nighttime image classification. Unlabeled nighttime images are used for self-supervised learning, so that the network adaptively learns the feature distribution of nighttime images while learning the class features of daytime images. Self-supervised learning with an angle-normalized loss function reduces the competition between the self-supervised loss and the supervised loss. Self-distillation on a small amount of labeled nighttime data keeps the network from overfitting to the target domain and losing its ability to generalize, while still adapting the model further to nighttime data.

Description of Drawings

Fig. 1 is a schematic flow chart of the method of the invention.

Fig. 2 is a schematic diagram of the multi-task decoupling learning stage of the invention.

Fig. 3 is an example of a positive and negative sample pair in the invention.

Fig. 4 is a schematic diagram of the self-distillation learning stage of the invention.

Detailed Description

Specific embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described here serve only to illustrate and explain the invention and do not limit it.

By combining self-supervised learning on nighttime data with supervised learning on daytime data, the invention trains a feature extraction network capable of domain adaptation. The image recognition network then undergoes further self-distillation learning on a small number of labeled samples from the nighttime dataset, so that the classification network head shifts toward the distribution of nighttime data, thereby improving nighttime image recognition performance.

As shown in Fig. 1 and Fig. 2, the semi-supervised nighttime image classification method based on multi-task decoupling learning of the invention comprises the following steps:

Step 1: Construct a labeled daytime image classification dataset and a nighttime image classification dataset in which only a small number of nighttime samples are labeled. This embodiment uses the 12 classes of the open-source Exclusively Dark (ExDARK) dataset: bicycle, boat, bottle, bus, car, cat, chair, cup, dog, motorcycle, person and table. For each of these 12 classes, 800 corresponding images are selected from the public COCO dataset to form the daytime image classification dataset D. The ExDARK dataset is divided into three parts: 400 images are drawn from each class to build the unlabeled nighttime image dataset A; 10 images are drawn from each class to form the small labeled nighttime image dataset T; and the remaining images form the validation set V, which is used to evaluate nighttime classification performance and thus the effectiveness of the algorithm.
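The split of the nighttime data in step 1 can be sketched as follows. This is only an illustration in plain Python: the function name `split_night_dataset`, the dictionary layout of `images_by_class`, the file names and the random seed are assumptions made for the example, and the source does not say how the images are actually drawn.

```python
import random

def split_night_dataset(images_by_class, n_unlabeled=400, n_labeled=10, seed=0):
    """Split per-class image lists into A (unlabeled), T (labeled) and V (validation)."""
    rng = random.Random(seed)
    set_a, set_t, set_v = [], [], []
    for label, images in images_by_class.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Set A keeps the images only: its labels are not used during training.
        set_a.extend(shuffled[:n_unlabeled])
        # Set T keeps (image, label) pairs for the few-shot adaptation stage.
        set_t.extend((img, label) for img in shuffled[n_unlabeled:n_unlabeled + n_labeled])
        # Everything left over becomes the labeled validation set V.
        set_v.extend((img, label) for img in shuffled[n_unlabeled + n_labeled:])
    return set_a, set_t, set_v

# Hypothetical toy input: two classes with 500 placeholder file names each.
demo = {c: [f"{c}_{i}.jpg" for i in range(500)] for c in ("bicycle", "boat")}
set_a, set_t, set_v = split_night_dataset(demo)
```

With 500 images per class, each class contributes 400 images to A, 10 to T and the remaining 90 to V.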

Step 2: The labeled image samples from daytime dataset D and the unlabeled image samples from nighttime dataset A are input together into the feature extraction network, which outputs a feature vector for each sample. The feature extraction network is a deep residual convolutional network. In this embodiment a ResNet50 network is used, and a feature vector of dimension 2048 is output at the conv5_x layer. The network uses the same input size for all image samples, and random cropping and horizontal flipping are applied as image augmentation to increase sample diversity. Each input batch contains 32 daytime image samples and 32 nighttime image samples, and training runs in parallel on 8 GPUs.

Step 3: A multi-task decoupling learning network is connected after the feature extraction network layer. It consists of a supervised classification network head and a self-supervised network head.

Step 4: Construct the classification network head, which consists of a global average pooling layer and a fully connected layer. This embodiment uses an average_pool layer and a fully connected layer of dimension [2048, 12], where 12 is the number of output classes.
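To make the shape of the classification head concrete, here is a minimal plain-Python sketch of global average pooling followed by a fully connected layer. The function names, the nested-list tensor layout and the choice to store one weight row per class are assumptions for illustration. A real implementation would use 2048 channels and 12 classes inside a deep learning framework; the toy example uses 2 channels and 3 classes.

```python
def global_average_pool(feature_map):
    """Average each channel (an H x W nested list) down to a single number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def fully_connected(x, weights, bias):
    """Return one logit per class; weights holds one row of len(x) values per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def classification_head(feature_map, weights, bias):
    """Global average pooling followed by a fully connected layer."""
    return fully_connected(global_average_pool(feature_map), weights, bias)

# Toy input: 2 channels of size 2 x 2, mapped to 3 classes.
demo_map = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
demo_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
demo_b = [0.0, 0.0, 0.5]
demo_logits = classification_head(demo_map, demo_w, demo_b)
```

The pooled vector here is [4.0, 2.0], so the three logits are 4.0, 2.0 and 6.5.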

Step 4.1: The feature vectors extracted from the daytime samples in step 2 are input into the classification network head, and the class with the highest probability is taken as the class prediction for that feature point. Training is supervised by the cross-entropy loss function, calculated as follows:

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N} y^{i}\log\hat{y}^{i}$$

where N is the total number of samples, y^i is the true label of the i-th sample, and ŷ^i is the predicted class probability of the i-th sample.
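As a rough illustration of the supervised branch in step 4.1, the sketch below computes a softmax cross-entropy over a batch of logits in plain Python. The function names and the encoding of labels as integer class indices are assumptions made for the example; the source only states that a cross-entropy loss is used.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean negative log-probability assigned to the true class over the batch."""
    losses = [-math.log(softmax(logits)[y]) for logits, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# Two equal logits give probability 0.5 for the true class, so the loss is ln 2.
demo_loss = cross_entropy([[0.0, 0.0]], [0])
```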

Step 4.2: The feature vectors extracted from the nighttime samples in step 2 are input into the classification network head to obtain a pseudo-label for each sample, and positive and negative sample pairs {k, k₊, k₋}_m are constructed from the pseudo-labels: k₊ is a positive sample of k, that is, it has the same label as k; k₋ is a negative sample of k, that is, it has a different label from k; and m is the number of sample pairs. The construction proceeds as follows. Among the 32 nighttime sample vectors, a class C1 is first selected at random and its samples are randomly paired two by two, giving a set of positive pairs C1{…}. One sample is then randomly chosen from the other classes and combined with each positive pair in C1{…}, giving several positive and negative sample pairs. Next, a class C2 is selected from the remaining classes and the procedure is repeated until 16 positive and negative sample pairs are obtained. In the extreme case where fewer than 16 can be formed because all samples come from the same class, the self-supervised network receives no input for that batch. In most cases, therefore, m = 16. Fig. 3 shows an example of a positive and negative sample pair in this embodiment.
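The pairing procedure of step 4.2 might be sketched as below. This is one possible reading under stated assumptions: the function name `build_triplets`, the use of sample indices instead of feature vectors, the skipping of an unpaired leftover sample, and the random seed are all choices made for the example rather than details given in the source.

```python
import random
from collections import defaultdict

def build_triplets(pseudo_labels, max_triplets=16, seed=0):
    """Return index triplets (k, k_pos, k_neg) built from predicted pseudo-labels."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(pseudo_labels):
        by_class[label].append(idx)
    if len(by_class) < 2:
        return []  # every sample shares one pseudo-label, so no negatives exist
    classes = list(by_class)
    rng.shuffle(classes)  # visit the classes in random order (C1, C2, ...)
    triplets = []
    for c in classes:
        members = by_class[c][:]
        rng.shuffle(members)
        others = [i for i, l in enumerate(pseudo_labels) if l != c]
        # Pair members two by two; an odd leftover sample is skipped (assumption).
        for anchor, positive in zip(members[0::2], members[1::2]):
            triplets.append((anchor, positive, rng.choice(others)))
            if len(triplets) == max_triplets:
                return triplets
    return triplets

# A batch of 32 pseudo-labels spread evenly over 4 classes yields 16 triplets.
demo_labels = [i % 4 for i in range(32)]
demo_triplets = build_triplets(demo_labels)
```

When all 32 samples receive the same pseudo-label, the function returns an empty list, which corresponds to the case where the self-supervised head gets no input for that batch.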

Step 5: Construct the self-supervised network head. The positive and negative sample pairs {k, k₊, k₋}_m obtained in step 4.2 and the weight parameter W of the classification network head are input into the self-supervised network head. The sample features are first angle-normalized by Λ(x, W, y), where x is the input feature vector, ||x|| is the norm of x, y is the label to which x belongs, and W_y is the parameter in row y of the fully connected layer in the classification network head. Angle normalization eases the competition between the auxiliary task and the main task in multi-task learning, that is, it reduces the negative effect of the self-supervised task on the supervised task.

The angle normalization is computed for each sample feature vector in the pairs {k, k₊, k₋}_m, giving the normalized feature vectors {Λk, Λk₊, Λk₋}_m:

Λk = Λ(k, W, y)

Λk₊ = Λ(k₊, W, y)

Λk₋ = Λ(k₋, W, y)

Step 5.1: The cosine similarity function is used to compare {Λk, Λk₊, Λk₋}_m. The similarity function is calculated as follows:

$$\mathrm{sim}(A, B) = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}}\,\sqrt{\sum_{i} B_i^{2}}}$$

where A_i and B_i are the components of vectors A and B. The similarity of a positive pair should be 1 and the similarity of a negative pair should be -1.
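Cosine similarity itself is standard and can be written in a few lines of plain Python. The function name `cosine_similarity` is chosen for this example only.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
opposite_direction = cosine_similarity([1.0, 0.0], [-3.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score 1, opposite vectors score -1, and orthogonal vectors score 0, which matches the targets stated for positive and negative pairs.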

Step 5.2: A contrast loss guides the learning of the feature space so that positive samples are similar and negative samples are effectively distinguished. In its loss function, y^k, y^k₊ and y^k₋ are the true labels of samples k, k₊ and k₋ in a sample pair, and η is a hyperparameter: the distance between samples of different classes should exceed this value.
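The exact contrast loss formula is not reproduced in this text, so the following is only a generic margin-based contrastive loss of a common kind, shown to illustrate the role of η. Everything in it is an assumption: the use of cosine distance (1 minus cosine similarity), the additive form, the averaging over triplets, and the value 0.5 for `eta`. It should not be read as the patent's formula.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity, so identical directions have distance 0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def margin_contrast_loss(triplets, eta=0.5):
    """ASSUMED form: pull positives together, push negatives at least eta apart."""
    total = 0.0
    for k, k_pos, k_neg in triplets:
        total += cosine_distance(k, k_pos)                  # positive pair: distance toward 0
        total += max(0.0, eta - cosine_distance(k, k_neg))  # negative pair: penalized if closer than eta
    return total / len(triplets)

# A well-separated triplet costs nothing; a badly arranged one is penalized.
good = margin_contrast_loss([([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0])])
bad = margin_contrast_loss([([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])])
```

In this sketch a negative sample contributes to the loss only while it lies closer than η to the anchor, which is the usual way a minimum-distance threshold enters such a loss.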

Step 6: Using the loss functions of step 4.1 and step 5.2, the feature extraction network and the multi-task decoupling learning network are trained under joint supervision. In this embodiment the SGD optimizer is used with an initial learning rate of 0.01. When training reaches epoch 70, the learning rate is reduced to 0.001, and training stops after 100 epochs.
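The learning-rate schedule described for step 6 can be expressed as a small step function. The function name and the zero-based epoch numbering are assumptions; the source does not say whether epochs are counted from 0 or 1.

```python
def learning_rate(epoch, base_lr=0.01, drop_epoch=70, dropped_lr=0.001):
    """Return the learning rate for a zero-based epoch index (numbering is an assumption)."""
    return base_lr if epoch < drop_epoch else dropped_lr

TOTAL_EPOCHS = 100
# One entry per training epoch: 70 epochs at 0.01, then 30 epochs at 0.001.
schedule = [learning_rate(epoch) for epoch in range(TOTAL_EPOCHS)]
```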

Step 7: The small number of labeled samples in the nighttime dataset are input into the trained feature extraction network and classification network head. With the weights of the feature extraction network fixed, the classification network head is trained further under the supervision of the cross-entropy loss function so that it adapts to the data distribution of nighttime image features. The loss is calculated as follows:

$$L_{ce} = -\frac{1}{N'}\sum_{i=1}^{N'} y^{i}\log\hat{y}^{i}$$

where N' is the total number of samples, y^i is the true label of the i-th sample, and ŷ^i is the predicted class probability of the i-th sample.

Step 7.1: As shown in Fig. 4, in the self-distillation learning stage the classification predictions from the previous round serve as soft targets and participate in supervision together with the true label y. In the loss function, λ is the weight given to the soft-target loss; in this embodiment the model performs best when λ = 0.5. The network is back-propagated on this loss with a learning rate of 0.005, and the network parameters are updated continuously by batch gradient descent.

Step 7.2: Step 7.1 is repeated. After 10 iterative updates, the difference between two successive losses of the model is less than 0.1, and training of the self-distillation network is complete.
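The self-distillation loss formula is not reproduced in this text. The sketch below shows one common way to mix a hard-label term with a soft-target term using a weight λ, purely as an illustration. The (1 - λ)/λ weighting, the function names and the stopping helper (which treats the 10 rounds and the 0.1 loss difference reported above as stopping conditions) are assumptions, not the patent's definitions.

```python
import math

def distillation_loss(probs, true_label, soft_target, lam=0.5):
    """ASSUMED weighting: (1 - lam) * hard-label CE + lam * soft-target CE."""
    hard = -math.log(probs[true_label])
    # Cross-entropy against the previous round's predicted distribution.
    soft = -sum(q * math.log(p) for q, p in zip(soft_target, probs))
    return (1.0 - lam) * hard + lam * soft

def should_stop(loss_history, max_rounds=10, tol=0.1):
    """Stop after max_rounds rounds, or once two successive losses differ by less than tol."""
    if len(loss_history) >= max_rounds:
        return True
    return len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < tol

# Uniform predictions against a uniform soft target: both terms equal ln 2.
demo_distill = distillation_loss([0.5, 0.5], 0, [0.5, 0.5])
```

In each round the soft targets would be the class probabilities produced by the head at the end of the previous round, so the head is regularized toward its own earlier predictions while still fitting the few true labels.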

Step 8: In the inference stage, the nighttime image to be tested is input into the feature extraction network and classification network head, which output the image classification result. In this example both training and inference run on a GPU server with GEFORCE RTX 2080 Ti cards.
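At inference time the classification result is simply the class with the highest score from the classification head. A minimal sketch, assuming a hypothetical ordering of the 12 class names (the source lists the classes but does not fix their index order):

```python
# Hypothetical index order for the 12 classes; the real mapping is not given in the source.
CLASSES = ["bicycle", "boat", "bottle", "bus", "car", "cat",
           "chair", "cup", "dog", "motorcycle", "person", "table"]

def predict(logits):
    """Return the class name whose logit is largest."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return CLASSES[best]

# Toy logits with the largest value at index 5.
demo_scores = [0.1, 0.0, -0.2, 0.3, 0.2, 2.5, 0.0, 0.1, 1.9, 0.0, 0.4, -1.0]
demo_prediction = predict(demo_scores)
```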

The invention decouples nighttime image classification into a supervised classification task on daytime images and a self-supervised task on nighttime images, trains a feature extraction network capable of domain adaptation through multi-task learning, and then applies further self-distillation learning to the image recognition network using a small number of labeled nighttime samples, so that the representations learned by the classification network head shift toward nighttime image features and nighttime image recognition performance improves. On the validation dataset V used in this embodiment, the ResNet50 baseline reaches a classification accuracy of 83.8%, while the algorithm of the invention reaches 89.2%, an improvement of 5.4 percentage points over the baseline, which demonstrates the practical benefit and application value of the invention.

The above embodiments serve only to illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in those embodiments can still be modified, and that some or all of their technical features can be replaced by equivalents, without the essence of the resulting technical solutions departing from the scope of the technical solutions of the embodiments of the invention.

Claims

1. A semi-supervised night image classification method based on multitask decoupling learning is characterized by comprising the following steps:

S1, constructing a daytime image classification dataset and a nighttime image classification dataset, wherein the daytime image classification dataset consists of sample images with class labels, and only some of the sample images in the nighttime image classification dataset are labeled;

S2, inputting the labeled sample images in the daytime image classification dataset and the unlabeled sample images in the nighttime image classification dataset together into a feature extraction network, and outputting daytime image feature vectors and nighttime image feature vectors;

S3, connecting a multi-task decoupling learning network after the feature extraction network layer, wherein the network consists of a supervised classification network head and a self-supervised network head;

S4, for the daytime image feature vectors, performing loss-supervised training through the classification network head; for the nighttime image feature vectors, predicting their classes as pseudo-labels through the classification network head, and constructing positive and negative sample pairs of nighttime images according to the pseudo-labels;

S5, the self-supervised network head normalizing the positive and negative sample pairs of nighttime images according to the weight parameters of the classification network head to obtain normalized feature vectors, and using a contrast loss to guide the learning of the feature space so that positive samples are similar and negative samples are effectively distinguished;

S6, performing multi-task training on the classification network head and the self-supervised network head;

S7, inputting the labeled samples in the nighttime image dataset into the trained feature extraction network and classification network head, fixing the weights of the feature extraction network, and performing loss-supervised training through the classification network head so that the classification network head adapts to the feature distribution of nighttime images; entering a self-distillation learning stage, performing multiple iterative updates on the weight parameters of the classification network head, and using the classification predictions of the previous loss-supervised training as soft targets that participate in supervision together with the true labels;

S8, in the inference stage, inputting the nighttime image to be tested into the trained feature extraction network and classification network head, and outputting an image classification result.

2. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 1, wherein in S4 the daytime image feature vectors are input into the classification network head, the predicted sample classes are output, and supervision is performed through a cross-entropy loss function:

$$L_{ce} = -\frac{1}{N}\sum_{i=1}^{N} y^{i}\log\hat{y}^{i}$$

wherein N represents the total number of labeled samples in the daytime dataset, y^i represents the true label of the i-th sample, and ŷ^i represents the predicted class probability of the i-th sample.

3. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 1, wherein in S4 the nighttime image feature vectors are input into the classification network head for calculation to obtain predicted pseudo-labels, and positive and negative sample pairs {k, k₊, k₋}_m are constructed according to the pseudo-labels, wherein k₊ is a positive sample of k and has the same label as k, k₋ is a negative sample of k and has a different label from k, and m represents the number of sample pairs.

4. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 3, wherein in S5 an angle normalization Λ(x, W, y) is applied to the features of the positive and negative sample pairs, wherein x represents the input feature vector, ||x|| represents the norm of the feature vector x, y represents the label to which the vector x belongs, and W_y represents the parameter in row y of the fully connected layer in the classification network head; the angle normalization is computed for each sample feature vector in the positive and negative sample pairs {k, k₊, k₋}_m to obtain the normalized feature vectors {Λk, Λk₊, Λk₋}_m:

Λk = Λ(k, W, y)

Λk₊ = Λ(k₊, W, y)

Λk₋ = Λ(k₋, W, y).

5. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 4, wherein in S5 a contrast loss function is adopted in which y^k, y^k₊ and y^k₋ respectively represent the true labels of samples k, k₊ and k₋ in a sample pair, η is a hyperparameter representing the minimum distance threshold between samples of different classes, and the loss is evaluated with a similarity function.

6. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 5, wherein the normalized feature vectors {Λk, Λk₊, Λk₋}_m are compared using a cosine similarity function:

$$\mathrm{sim}(A, B) = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^{2}}\,\sqrt{\sum_{i} B_i^{2}}}$$

wherein A_i and B_i represent the components of vectors A and B respectively, the similarity of a positive pair is 1, and the similarity of a negative pair is -1.

7. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 1, wherein the total loss function of S6 jointly applies the loss of S4 and the contrast loss of S5, and training stops when the number of training epochs reaches the specified value.

8. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 1, wherein in S7 the labeled samples in the nighttime image dataset are input into the trained feature extraction network and classification network head, the weights of the feature extraction network are fixed, and the classification network head is supervised using a cross-entropy loss function:

$$L_{ce} = -\frac{1}{N'}\sum_{i=1}^{N'} y^{i}\log\hat{y}^{i}$$

wherein N' represents the total number of labeled samples in the nighttime image dataset, y^i represents the true label of the i-th sample, and ŷ^i represents the predicted class probability of the i-th sample.

9. The semi-supervised night image classification method based on multi-task decoupling learning as claimed in claim 1, wherein in S7 a self-distillation learning stage is entered and multiple iterative updates are performed, the classification predictions of the previous loss-supervised training being used as soft targets that participate in supervision together with the true label y, wherein λ represents the weight of the soft-target loss, and the self-distillation training is completed after multiple iterative updates.