CN115641483A - An unsupervised low-light domain adaptive training method and detection method - Google Patents
An unsupervised low-light domain adaptive training method and detection method
- Publication number: CN115641483A (application CN202211129606.6A)
- Authority: CN (China)
- Prior art keywords: model, training, low, illumination, layer
- Prior art date: 2022-09-16
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06V10/20—Image preprocessing
- G06V10/48—Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- Y02T10/40—Engine management systems
Description
Technical Field
The present invention belongs to the fields of low-light digital image enhancement and machine vision, and relates to an unsupervised low-light domain adaptive training method and detection method based on a deep concave curve.
Background
Low light is a common form of image degradation; insufficient illumination is usually caused by dim shooting environments, camera malfunctions, incorrect parameter settings, and similar factors. Vision tasks in low-light environments, including object classification, face detection, action recognition, and optical flow estimation, have long drawn attention from both academia and industry. Conventional training of low-light vision models requires large-scale annotated training sets, yet low-light data is difficult to annotate, and the industry already holds abundant normal-light training datasets and pre-trained models; building a new low-light dataset and retraining a model from scratch duplicates the expenditure of manpower and material resources. How to make full use of existing annotated normal-light training data and normal-light pre-trained models, and to obtain a model usable in low-light environments without introducing additional low-light annotations, that is, to transfer a normal-light pre-trained model to the low-light domain through unsupervised domain adaptation, is therefore of broad practical significance and application value.
Traditional unsupervised low-light domain adaptation methods fall into three categories. Brightening-based methods brighten low-light images so as to improve the performance of models trained on normal-light images. Feature-transfer-based methods align the features of normal-light and low-light images through contrastive learning, enabling the model to operate in low-light environments. Adversarial-learning-based methods synthesize dark images with generative adversarial networks and use pseudo-labels to transfer the model to the low-light domain.
However, brightening-based methods ignore the differences between human vision and machine vision; feature-transfer-based methods ignore the importance of pixel-level adjustment; and adversarial-learning-based methods require data from multiple domains and ignore the characteristics of the input image itself. Existing unsupervised domain adaptation methods perform poorly and fail to meet the needs of practical applications.
Summary of the Invention
In view of the above problems, the purpose of the present invention is to provide an unsupervised low-light domain adaptive training method and detection method based on a deep concave curve. The invention uses a self-supervised training strategy to train a deep concave curve model for brightness enhancement, comprehensively improving model performance in low-light environments.
The technical solution adopted by the present invention is as follows:
An unsupervised low-light domain adaptive training method, comprising the steps of:
1) Collecting annotated normal-light training data, unannotated low-light training data, and a pre-trained model, the pre-trained model being a visual task model trained on the annotated normal-light training data; connecting a multi-layer perceptron after the feature extractor of the pre-trained model to obtain a first model, the multi-layer perceptron serving to map the features extracted by the feature extractor into the representation space of the self-supervised task. In the rotation-jigsaw self-supervised learning adopted in this solution, the output of the multi-layer perceptron is a 30-dimensional vector representing the lexicographic index of the shuffled image among all patch permutation schemes.
2) Training the first model with the annotated normal-light training data, locking the parameters of the feature extractor during training and training only the multi-layer perceptron.
3) Constructing a deep concave curve model that predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model before the feature extractor of the first model to obtain a second model.
4) Training the second model with the low-light training data, locking the parameters of the feature extractor and of the multi-layer perceptron during training and training only the deep concave curve model.
5) Brightening the low-light training data with the trained deep concave curve model, feeding the result into the pre-trained model, and predicting labels for the low-light training data; taking the predicted labels as pseudo-labels of the low-light training data.
6) Training the pre-trained model with the annotated normal-light training data and the pseudo-labeled low-light training data to obtain a fine-tuned pre-trained model.
Further, the multi-layer perceptron adopts a "fully connected layer - batch normalization layer - rectified linear unit - fully connected layer" structure. The first model is trained on the normal-light training data as follows: a normal-light training sample is first rotated and then divided into patches, yielding multiple image patches; the patch order is shuffled, the patches are fed into the feature extractor for feature extraction, and the extracted features are passed to the multi-layer perceptron, which predicts the order of the patches from the input features. The loss function used to train the first model is L = L_C(ô_N, o_N), where L_C is the cross-entropy loss function, o_N is the lexicographic index, among all patch permutation schemes, of the shuffled order of the normal-light image N, and ô_N is the jigsaw order predicted by the multi-layer perceptron.
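A minimal sketch of the head described above, assuming a PyTorch implementation; the names `JigsawHead` and `feat_dim` are illustrative rather than taken from the patent, and the 512-dimensional feature width is an assumption matching ResNet-18's pooled features:

```python
import torch
import torch.nn as nn

class JigsawHead(nn.Module):
    """Fully connected -> batch norm -> ReLU -> fully connected head that maps
    extractor features to the 30-way permutation-index space of the jigsaw task."""
    def __init__(self, feat_dim: int = 512, num_permutations: int = 30):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.BatchNorm1d(feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_permutations),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mlp(features)

# Step 2 freezes the extractor and optimizes only the head:
# logits = head(frozen_extractor(shuffled_images))
# loss = nn.CrossEntropyLoss()(logits, permutation_indices)
```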
Further, the deep concave curve model comprises, in order, a downsampling layer, a U-net network, a convolutional layer, a global pooling layer, and a fully connected layer. The downsampling layer downsamples the input image and feeds it into the U-net network; the U-net network extracts features from the input and passes them to the convolutional layer; the convolutional layer performs further feature extraction and feeds the extracted features through the global pooling layer and the fully connected layer in turn to obtain the prediction.
Further, the second model is trained on the low-light training data as follows: the deep concave curve model first brightens a low-light training sample to obtain a brightened image; the brightened image is then rotated and divided into patches, yielding multiple image patches; the patch order is shuffled, the patches are fed into the feature extractor for feature extraction, and the extracted features are passed to the multi-layer perceptron, which predicts the order of the patches from the input features. The loss function used to train the second model is L = L_C(ô_L, o_L), where L_C is the cross-entropy loss function, o_L is the lexicographic index, among all patch permutation schemes, of the shuffled order of the low-light image L, and ô_L is the jigsaw order predicted by the multi-layer perceptron.
Further, the deep concave curve model includes two convolutional layers; that is, it comprises, in order, a downsampling layer, a U-net network, a first convolutional layer, a second convolutional layer, a global pooling layer, and a fully connected layer.
Further, for the classification task the pre-trained model is ResNet-18; for the face detection task, DSFD; for the action recognition task, I3D; and for the optical flow estimation task, PWC-Net.
An unsupervised low-light domain image visual task detection method, comprising the steps of:
1) Collecting annotated normal-light training data, unannotated low-light training data, and a pre-trained model, the pre-trained model being a visual task model trained on the annotated normal-light training data; connecting a multi-layer perceptron after the feature extractor of the pre-trained model to obtain a first model, the multi-layer perceptron adopting a "fully connected layer - batch normalization layer - rectified linear unit - fully connected layer" structure.
2) Training the first model with the annotated normal-light training data, locking the parameters of the feature extractor during training and training only the multi-layer perceptron.
3) Constructing a deep concave curve model that predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model before the feature extractor of the first model to obtain a second model.
4) Training the second model with the low-light training data, locking the parameters of the feature extractor and of the multi-layer perceptron during training and training only the deep concave curve model.
5) Brightening the low-light training data with the trained deep concave curve model, feeding the result into the pre-trained model, and predicting labels for the low-light training data; taking the predicted labels as pseudo-labels of the low-light training data.
6) Training the pre-trained model with the annotated normal-light training data and the pseudo-labeled low-light training data to obtain a fine-tuned pre-trained model.
7) For a low-light image to be processed, feeding it into the trained deep concave curve model for brightening and then into the fine-tuned pre-trained model, which outputs the corresponding visual task detection result.
A server, characterized in that it comprises a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the above method.
A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the positive effects of the present invention are as follows:
The present invention significantly improves the performance of normal-light models in low-light environments. On the CODaN low-light classification benchmark, it raises the accuracy of the generic classification model ResNet-18 from 60.96% to 63.92%; on the DarkFace low-light face detection benchmark, it raises the mean Average Precision of the generic Dual Shot Face Detector from 44.44 to 46.91; on the ARID low-light action recognition benchmark, it raises recognition accuracy from 50.18% to 52.13%; and on the VBOF low-light optical flow estimation benchmark, it reduces the end-point error from 8.99 to 7.44.
Description of Drawings
Figure 1 is a structural diagram of the deep concave curve model.
Figure 2 is a flow chart of training the deep concave curve model.
Figure 3 is a flow chart of transferring the pre-trained model to the low-light domain.
Detailed Description
To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
This embodiment discloses an unsupervised low-light domain adaptation method applied to a low-light classification task, described as follows:
Step 1: Collect annotated normal-light images to form the training dataset {X_N, Y_N}; collect low-light images to form the low-light training dataset {X_L}. Samples X_N in the normal-light training set must carry category information Y_N; samples in the low-light training set need not. Obtain a model pre-trained on normal-light images, which contains a feature extractor. The pre-trained model here is the residual convolutional network ResNet-18, although other pre-trained models may be used: for the classification task, ResNet-18; for the face detection task, DSFD; for the action recognition task, I3D; for the optical flow estimation task, PWC-Net.
Step 2: Build and train the multi-layer perceptron, which adopts a "fully connected layer - batch normalization layer - rectified linear unit - fully connected layer" structure. Fix the parameters of the feature extractor obtained in step 1, attach the multi-layer perceptron after the feature extractor, and train the multi-layer perceptron on the normal-light dataset {X_N} with a self-supervised training method. The self-supervision can use a rotation-jigsaw strategy: first rotate the image, then divide it into nine 3×3 patches, shuffle the patch order, and train the multi-layer perceptron to recover the original patch order. The loss function term for this step is:
L = L_C(ô_N, o_N), where L_C is the cross-entropy loss function, o_N is the lexicographic index, among all patch permutation schemes, of the shuffled order of the normal-light image N, and ô_N is the jigsaw order predicted by the multi-layer perceptron. The training batch size is 64; training first runs 150,000 iterations at learning rate 0.01 and then 150,000 iterations at learning rate 0.001.
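The following sketch illustrates one way to prepare a rotation-jigsaw training sample, assuming a fixed dictionary of 30 permutations consistent with the 30-dimensional head; the permutation-sampling scheme and the helper name `make_jigsaw_sample` are assumptions, not taken from the patent:

```python
import itertools
import random
import torch

# A fixed, lexicographically sorted dictionary of 30 patch permutations
# (this particular selection scheme is illustrative only).
_rng = random.Random(0)
PERMUTATIONS = sorted(_rng.sample(list(itertools.permutations(range(9))), 30))

def make_jigsaw_sample(image: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Rotate the image, split it into a 3x3 grid, shuffle the patches, and
    return the reassembled image together with the permutation index."""
    image = torch.rot90(image, random.randint(0, 3), dims=(-2, -1))
    _, h, w = image.shape
    ph, pw = h // 3, w // 3
    patches = [image[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(3) for j in range(3)]      # row-major 3x3 patches
    label = random.randrange(len(PERMUTATIONS))           # target for the head
    shuffled = [patches[p] for p in PERMUTATIONS[label]]
    rows = [torch.cat(shuffled[r * 3:(r + 1) * 3], dim=-1) for r in range(3)]
    return torch.cat(rows, dim=-2), label
```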
Step 3: Construct the deep concave curve model, which takes an unannotated low-light image as input and predicts the mapping g, where g maps pixel values of the original image to pixel values of the new image. For example, for an 8-bit grayscale image the color domain has 256 values, so g is a 256-dimensional vector. The output of the deep concave curve model is the negation of the discrete second derivative of g before normalization, a 255-dimensional vector; g is recovered from this output by integration and normalization. For an 8-bit three-channel color image, the model predicts g for each of the three channels separately, so the output is a 765-dimensional vector. The last layer of the model is a rectified linear function, guaranteeing a non-negative output and hence that g is a concave curve. The detailed structure of the deep concave curve model is shown in Figure 1: in order, a downsampling layer, a U-net network, two 3×3 convolutional layers, a global pooling layer, and a fully connected layer. The downsampling layer reduces the resolution of the input image to 16×16; the U-net network takes the downsampled output as input, extracts features, and produces an output equal in size to its input; the two convolutional layers take the U-net output as input and extract further features; the global pooling layer and the fully connected layer take the convolutional output as input and produce the model's prediction, the 765-dimensional vector.
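As a concrete illustration of the integrate-and-normalize step, the sketch below recovers a single-channel curve g from the model's 255-dimensional non-negative output; fixing the endpoints at g(0) = 0 and g(255) = 255 is an assumption about the normalization that the patent leaves implicit:

```python
import torch

def build_concave_curve(h: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """h: (255,) non-negative model output, the negated discrete second
    derivative of g. Returns g: (256,) monotone, concave lookup table."""
    # A reverse cumulative sum gives non-increasing slopes, since consecutive
    # slope differences equal -h <= 0; eps keeps every slope positive.
    slopes = eps + torch.flip(torch.cumsum(torch.flip(h, dims=[0]), dim=0), dims=[0])
    g = torch.cat([h.new_zeros(1), torch.cumsum(slopes, dim=0)])  # second integration
    return 255.0 * g / g[-1]  # normalize so the curve maps [0, 255] onto [0, 255]

def apply_curve(channel_u8: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Brighten one 8-bit channel by table lookup through g."""
    return g[channel_u8.long()]
```

Because the slopes are positive and non-increasing, the resulting lookup table is guaranteed monotone and concave, which is exactly the property the final rectified linear layer is designed to enforce.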
Step 4: Train the deep concave curve model; the flow chart is shown in Figure 2. In this step the parameters of the feature extractor and the multi-layer perceptron remain fixed and only the deep concave curve model is trained. Training uses the low-light dataset {X_L} under the self-supervised paradigm. The self-supervision can again use the rotation-jigsaw strategy: first rotate the image, then divide it into nine 3×3 patches, shuffle the patch order, and train the model to recover the original order. The loss function term for this step is:
L = L_C(ô_L, o_L), where L_C is the cross-entropy loss function, o_L is the lexicographic index, among all patch permutation schemes, of the shuffled order of the low-light image L, and ô_L is the jigsaw order predicted by the multi-layer perceptron. The training batch size is 64, the initial learning rate is 0.01, and training runs for 20,000 iterations in total; the learning rate decays by a factor of 0.1 after the 5,000th and 10,000th iterations.
Step 5: Obtain pseudo-labels for the low-light training data. First feed the low-light dataset {X_L} into the deep concave curve model to obtain the brightened low-light dataset {E(X_L)}; then feed {E(X_L)} into the pre-trained model obtained in step 1 to predict labels Ŷ_L. Among the obtained labels, those with confidence below 0.98 are discarded. This step yields the pseudo-labeled low-light dataset {X_L, Ŷ_L}.
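A hedged sketch of this step, with `curve_model` and `classifier` standing in for the trained deep concave curve model and the pre-trained model respectively:

```python
import torch

@torch.no_grad()
def pseudo_label(curve_model, classifier, low_light_images, threshold=0.98):
    """Brighten, classify, and keep only predictions at or above the
    confidence threshold; returns the retained images and their pseudo-labels."""
    enhanced = curve_model(low_light_images)            # E(X_L)
    probs = torch.softmax(classifier(enhanced), dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold                      # discard below 0.98
    return low_light_images[keep], labels[keep]
```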
Step 6: Using the labeled normal-light dataset collected in step 1 and the pseudo-labeled low-light dataset obtained in step 5, transfer the pre-trained model to the low-light domain; the flow chart is shown in Figure 3. Training of the pre-trained model uses the cross-entropy loss function with batch size 64 for a total of 7,000 iterations; the initial learning rate is 0.001 and decays by a factor of 0.1 after the 2,000th, 4,000th, and 6,000th iterations. Training uses the SGD optimizer with momentum 0.9 and weight decay 0.00001. The data augmentation methods used include random cropping, horizontal flipping, color jittering, and random rotation.
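The schedule above corresponds to a standard SGD setup; a minimal sketch follows, where `model` and `loader` (yielding mixed real- and pseudo-labeled batches) are placeholders rather than names from the patent:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2000, 4000, 6000], gamma=0.1)  # decay per iteration
criterion = torch.nn.CrossEntropyLoss()

for step, (images, labels) in zip(range(7000), loader):
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()   # the scheduler is stepped every iteration, not every epoch
```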
Step 7: In the inference stage, a low-light image to be classified is first brightened using the deep concave curve model trained in step 4 and then fed into the low-light classification model trained in step 6 (i.e., the fine-tuned pre-trained model), yielding the prediction result: a vector whose size equals the number of classes in the dataset.
For the face detection task, the pre-trained model is the Dual Shot Face Detector DSFD; a low-light image to be detected is first brightened using the deep concave curve model trained in step 4 and then fed into the DSFD fine-tuned in step 6, yielding the predicted face bounding-box coordinates.
For the action recognition task, the pre-trained model is the two-stream inflated 3D convolutional network I3D; a low-light video to be recognized is first brightened using the deep concave curve model trained in step 4 and then fed into the I3D fine-tuned in step 6, yielding an action recognition prediction for each frame of the video: a vector whose size equals the number of action categories in the dataset.
For the optical flow estimation task, the pre-trained model is the pyramid-warping-cost-volume optical flow estimation network PWC-Net; a low-light image to be processed is first brightened using the deep concave curve model trained in step 4 and then fed into the PWC-Net fine-tuned in step 6, yielding the positional offset of each pixel at the next time step.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. A person of ordinary skill in the art may modify or equivalently substitute the technical solution of the present invention without departing from its spirit and scope, and the scope of protection of the present invention shall be determined by the claims.
Claims (9)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129606.6A CN115641483A (en) | 2022-09-16 | 2022-09-16 | An unsupervised low-light domain adaptive training method and detection method |
PCT/CN2022/130218 WO2024055398A1 (en) | 2022-09-16 | 2022-11-07 | Unsupervised low-illumination-domain adaptive training method and detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129606.6A CN115641483A (en) | 2022-09-16 | 2022-09-16 | An unsupervised low-light domain adaptive training method and detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115641483A true CN115641483A (en) | 2023-01-24 |
Family
ID=84941611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211129606.6A Pending CN115641483A (en) | 2022-09-16 | 2022-09-16 | An unsupervised low-light domain adaptive training method and detection method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115641483A (en) |
WO (1) | WO2024055398A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119830393A (en) * | 2024-12-10 | 2025-04-15 | 中国建筑第五工程局有限公司 | Swin transform neural network-based three-dimensional surface lighting performance prediction method for building |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12293293B2 (en) * | 2018-09-28 | 2025-05-06 | Aarish Technologies | Machine learning using structurally regularized convolutional neural network architecture |
CN112069921A (en) * | 2020-08-18 | 2020-12-11 | 浙江大学 | A Small-Sample Visual Object Recognition Method Based on Self-Supervised Knowledge Transfer |
CN112508815A (en) * | 2020-12-09 | 2021-03-16 | 中国科学院深圳先进技术研究院 | Model training method and device, electronic equipment and machine-readable storage medium |
CN114693545A (en) * | 2022-02-15 | 2022-07-01 | 北京大学 | Low-illumination enhancement method and system based on curve family function |
- 2022-09-16: CN application CN202211129606.6A filed; published as CN115641483A (status: pending)
- 2022-11-07: PCT application PCT/CN2022/130218 filed; published as WO2024055398A1
Also Published As
Publication number | Publication date |
---|---|
WO2024055398A1 (en) | 2024-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639692B (en) | Shadow detection method based on attention mechanism | |
Yadav et al. | Using deep learning to classify X-ray images of potential tuberculosis patients | |
CN112308860A (en) | Earth observation image semantic segmentation method based on self-supervision learning | |
CN110322495A (en) | A kind of scene text dividing method based on Weakly supervised deep learning | |
CN114841972A (en) | Transmission line defect identification method based on saliency map and semantic embedding feature pyramid | |
CN114936605A (en) | A neural network training method, equipment and storage medium based on knowledge distillation | |
CN115147632B (en) | Method and device for automatic labeling of image categories based on density peak clustering algorithm | |
CN109509156B (en) | Image defogging processing method based on generation countermeasure model | |
CN108229346B (en) | Video summarization using signed foreground extraction and fusion | |
CN111667027B (en) | Multi-modal image segmentation model training method, image processing method and device | |
EP4214687A1 (en) | Systems and methods of contrastive point completion with fine-to-coarse refinement | |
CN113870286B (en) | Foreground segmentation method based on multi-level feature and mask fusion | |
WO2022218012A1 (en) | Feature extraction method and apparatus, device, storage medium, and program product | |
CN110827265A (en) | Image anomaly detection method based on deep learning | |
CN111127360A (en) | Gray level image transfer learning method based on automatic encoder | |
CN110335299A (en) | An Implementation Method of Monocular Depth Estimation System Based on Adversarial Network | |
WO2024081778A1 (en) | A generalist framework for panoptic segmentation of images and videos | |
CN114596477A (en) | Foggy day train fault detection method based on field self-adaption and attention mechanism | |
CN108446627A (en) | A kind of Aerial Images matching process based on partial-depth Hash | |
CN116091764A (en) | Cloud image segmentation method based on fusion transformation network | |
CN117710645A (en) | Dynamic scene VSLAM optimization method based on fusion attention mechanism and lightweight neural network | |
CN115641483A (en) | An unsupervised low-light domain adaptive training method and detection method | |
CN118982676A (en) | A feature extraction method based on feature point extraction network fused with dual attention in SLAM system | |
CN113343979A (en) | Method, apparatus, device, medium and program product for training a model | |
Li et al. | Perceptually-calibrated synergy network for night-time image quality assessment with enhancement booster and knowledge cross-sharing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
- PB01 | Publication | |
- SE01 | Entry into force of request for substantive examination | |