CN115641483A - An unsupervised low-light domain adaptive training method and detection method - Google Patents
An unsupervised low-light domain adaptive training method and detection method
- Publication number: CN115641483A (application CN202211129606.6A)
- Authority: CN (China)
- Prior art keywords: model, training, low, illumination, layer
- Prior art date: 2022-09-16
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06V10/20—Image preprocessing
- G06V10/48—Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- Y02T10/40—Engine management systems
Description
Technical Field
The present invention belongs to the fields of low-light digital image enhancement and machine vision, and relates to an unsupervised low-light domain adaptive training method and detection method based on a deep concave curve.
Background
Low light is a common form of image degradation; insufficient illumination is usually caused by dim shooting environments, camera malfunctions, incorrect parameter settings, and similar factors. Vision tasks in low-light environments, including object classification, face detection, action recognition, and optical flow estimation, have long drawn attention from both academia and industry. Conventional training of low-light vision models requires large-scale annotated training sets, yet low-light data is difficult to annotate, and the industry already holds abundant normal-light training datasets and pre-trained models; building a new low-light dataset and retraining a model from scratch duplicates the expenditure of manpower and material resources. How to make full use of existing annotated normal-light training data and normal-light pre-trained models, and to obtain a model usable in low-light environments without introducing additional low-light annotations, that is, to transfer a normal-light pre-trained model to the low-light domain through unsupervised domain adaptation, is therefore of broad practical significance and application value.
Traditional unsupervised low-light domain adaptation methods fall into three categories. Brightening-based methods brighten low-light images so as to improve the performance of models trained on normal-light images. Feature-transfer-based methods align the features of normal-light and low-light images through contrastive learning, enabling the model to operate in low-light environments. Adversarial-learning-based methods synthesize dark images with generative adversarial networks and use pseudo-labels to transfer the model to the low-light domain.
However, brightening-based methods ignore the differences between human vision and machine vision; feature-transfer-based methods ignore the importance of pixel-level adjustment; and adversarial-learning-based methods require data from multiple domains and ignore the characteristics of the input image itself. Existing unsupervised domain adaptation methods perform poorly and fail to meet the needs of practical applications.
Summary of the Invention
In view of the above problems, the purpose of the present invention is to provide an unsupervised low-light domain adaptive training method and detection method based on a deep concave curve. The invention uses a self-supervised training strategy to train a deep concave curve model for brightness enhancement, comprehensively improving model performance in low-light environments.
The technical solution adopted by the present invention is as follows:
An unsupervised low-light domain adaptive training method, comprising the steps of:
1) Collecting annotated normal-light training data, unannotated low-light training data, and a pre-trained model, the pre-trained model being a visual task model trained on the annotated normal-light training data; connecting a multi-layer perceptron after the feature extractor of the pre-trained model to obtain a first model, the multi-layer perceptron serving to map the features extracted by the feature extractor into the representation space of the self-supervised task. In the rotation-jigsaw self-supervised learning adopted in this solution, the output of the multi-layer perceptron is a 30-dimensional vector representing the lexicographic index of the shuffled image among all patch permutation schemes.
2) Training the first model with the annotated normal-light training data, locking the parameters of the feature extractor during training and training only the multi-layer perceptron.
3) Constructing a deep concave curve model that predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model before the feature extractor of the first model to obtain a second model.
4) Training the second model with the low-light training data, locking the parameters of the feature extractor and of the multi-layer perceptron during training and training only the deep concave curve model.
5) Brightening the low-light training data with the trained deep concave curve model, feeding the result into the pre-trained model, and predicting labels for the low-light training data; taking the predicted labels as pseudo-labels of the low-light training data.
6) Training the pre-trained model with the annotated normal-light training data and the pseudo-labeled low-light training data to obtain a fine-tuned pre-trained model.
Further, the multi-layer perceptron adopts a "fully connected layer - batch normalization layer - rectified linear unit - fully connected layer" structure. The first model is trained on the normal-light training data as follows: a normal-light training sample is first rotated and then divided into patches, yielding multiple image patches; the patch order is shuffled, the patches are fed into the feature extractor for feature extraction, and the extracted features are passed to the multi-layer perceptron, which predicts the order of the patches from the input features. The loss function used to train the first model is L = L_C(ô_N, o_N), where L_C is the cross-entropy loss function, o_N is the lexicographic index, among all patch permutation schemes, of the shuffled order of the normal-light image N, and ô_N is the jigsaw order predicted by the multi-layer perceptron.
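A minimal sketch of the head described above, assuming a PyTorch implementation; the names `JigsawHead` and `feat_dim` are illustrative rather than taken from the patent, and the 512-dimensional feature width is an assumption matching ResNet-18's pooled features:

```python
import torch
import torch.nn as nn

class JigsawHead(nn.Module):
    """Fully connected -> batch norm -> ReLU -> fully connected head that maps
    extractor features to the 30-way permutation-index space of the jigsaw task."""
    def __init__(self, feat_dim: int = 512, num_permutations: int = 30):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.BatchNorm1d(feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_permutations),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mlp(features)

# Step 2 freezes the extractor and optimizes only the head:
# logits = head(frozen_extractor(shuffled_images))
# loss = nn.CrossEntropyLoss()(logits, permutation_indices)
```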
Further, the deep concave curve model comprises, in order, a downsampling layer, a U-net network, a convolutional layer, a global pooling layer, and a fully connected layer. The downsampling layer downsamples the input image and feeds it into the U-net network; the U-net network extracts features from the input and passes them to the convolutional layer; the convolutional layer performs further feature extraction and feeds the extracted features through the global pooling layer and the fully connected layer in turn to obtain the prediction.
Further, the second model is trained on the low-light training data as follows: the deep concave curve model first brightens a low-light training sample to obtain a brightened image; the brightened image is then rotated and divided into patches, yielding multiple image patches; the patch order is shuffled, the patches are fed into the feature extractor for feature extraction, and the extracted features are passed to the multi-layer perceptron, which predicts the order of the patches from the input features. The loss function used to train the second model is L = L_C(ô_L, o_L), where L_C is the cross-entropy loss function, o_L is the lexicographic index, among all patch permutation schemes, of the shuffled order of the low-light image L, and ô_L is the jigsaw order predicted by the multi-layer perceptron.
Further, the deep concave curve model includes two convolutional layers; that is, it comprises, in order, a downsampling layer, a U-net network, a first convolutional layer, a second convolutional layer, a global pooling layer, and a fully connected layer.
Further, for the classification task the pre-trained model is ResNet-18; for the face detection task, DSFD; for the action recognition task, I3D; and for the optical flow estimation task, PWC-Net.
An unsupervised low-light domain image visual task detection method, comprising the steps of:
1) Collecting annotated normal-light training data, unannotated low-light training data, and a pre-trained model, the pre-trained model being a visual task model trained on the annotated normal-light training data; connecting a multi-layer perceptron after the feature extractor of the pre-trained model to obtain a first model, the multi-layer perceptron adopting a "fully connected layer - batch normalization layer - rectified linear unit - fully connected layer" structure.
2) Training the first model with the annotated normal-light training data, locking the parameters of the feature extractor during training and training only the multi-layer perceptron.
3) Constructing a deep concave curve model that predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model before the feature extractor of the first model to obtain a second model.
4) Training the second model with the low-light training data, locking the parameters of the feature extractor and of the multi-layer perceptron during training and training only the deep concave curve model.
5) Brightening the low-light training data with the trained deep concave curve model, feeding the result into the pre-trained model, and predicting labels for the low-light training data; taking the predicted labels as pseudo-labels of the low-light training data.
6) Training the pre-trained model with the annotated normal-light training data and the pseudo-labeled low-light training data to obtain a fine-tuned pre-trained model.
7) For a low-light image to be processed, feeding it into the trained deep concave curve model for brightening and then into the fine-tuned pre-trained model, which outputs the corresponding visual task detection result.
A server, characterized in that it comprises a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the above method.
A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the positive effects of the present invention are as follows:
The present invention significantly improves the performance of normal-light models in low-light environments. On the CODaN low-light classification benchmark, it raises the accuracy of the generic classification model ResNet-18 from 60.96% to 63.92%; on the DarkFace low-light face detection benchmark, it raises the mean Average Precision of the generic Dual Shot Face Detector from 44.44 to 46.91; on the ARID low-light action recognition benchmark, it raises recognition accuracy from 50.18% to 52.13%; and on the VBOF low-light optical flow estimation benchmark, it reduces the end-point error from 8.99 to 7.44.
Description of Drawings
Figure 1 is a structural diagram of the deep concave curve model.
Figure 2 is a flow chart of training the deep concave curve model.
Figure 3 is a flow chart of transferring the pre-trained model to the low-light domain.
Detailed Description
To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
This embodiment discloses an unsupervised low-light domain adaptation method applied to a low-light classification task, described as follows:
Step 1: Collect annotated normal-light images to form the training dataset {X_N, Y_N}; collect low-light images to form the low-light training dataset {X_L}. Samples X_N in the normal-light training set must carry category information Y_N; samples in the low-light training set need not. Obtain a model pre-trained on normal-light images, which contains a feature extractor. The pre-trained model here is the residual convolutional network ResNet-18, although other pre-trained models may be used: for the classification task, ResNet-18; for the face detection task, DSFD; for the action recognition task, I3D; for the optical flow estimation task, PWC-Net.
Step 2: Build and train the multi-layer perceptron, which adopts a "fully connected layer - batch normalization layer - rectified linear unit - fully connected layer" structure. Fix the parameters of the feature extractor obtained in step 1, attach the multi-layer perceptron after the feature extractor, and train the multi-layer perceptron on the normal-light dataset {X_N} with a self-supervised training method. The self-supervision can use a rotation-jigsaw strategy: first rotate the image, then divide it into nine 3×3 patches, shuffle the patch order, and train the multi-layer perceptron to recover the original patch order. The loss function term for this step is:
L = L_C(ô_N, o_N), where L_C is the cross-entropy loss function, o_N is the lexicographic index, among all patch permutation schemes, of the shuffled order of the normal-light image N, and ô_N is the jigsaw order predicted by the multi-layer perceptron. The training batch size is 64; training first runs 150,000 iterations at learning rate 0.01 and then 150,000 iterations at learning rate 0.001.
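The following sketch illustrates one way to prepare a rotation-jigsaw training sample, assuming a fixed dictionary of 30 permutations consistent with the 30-dimensional head; the permutation-sampling scheme and the helper name `make_jigsaw_sample` are assumptions, not taken from the patent:

```python
import itertools
import random
import torch

# A fixed, lexicographically sorted dictionary of 30 patch permutations
# (this particular selection scheme is illustrative only).
_rng = random.Random(0)
PERMUTATIONS = sorted(_rng.sample(list(itertools.permutations(range(9))), 30))

def make_jigsaw_sample(image: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Rotate the image, split it into a 3x3 grid, shuffle the patches, and
    return the reassembled image together with the permutation index."""
    image = torch.rot90(image, random.randint(0, 3), dims=(-2, -1))
    _, h, w = image.shape
    ph, pw = h // 3, w // 3
    patches = [image[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(3) for j in range(3)]      # row-major 3x3 patches
    label = random.randrange(len(PERMUTATIONS))           # target for the head
    shuffled = [patches[p] for p in PERMUTATIONS[label]]
    rows = [torch.cat(shuffled[r * 3:(r + 1) * 3], dim=-1) for r in range(3)]
    return torch.cat(rows, dim=-2), label
```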
Step 3: Construct the deep concave curve model, which takes an unannotated low-light image as input and predicts the mapping g, where g maps pixel values of the original image to pixel values of the new image. For example, for an 8-bit grayscale image the color domain has 256 values, so g is a 256-dimensional vector. The output of the deep concave curve model is the negation of the discrete second derivative of g before normalization, a 255-dimensional vector; g is recovered from this output by integration and normalization. For an 8-bit three-channel color image, the model predicts g for each of the three channels separately, so the output is a 765-dimensional vector. The last layer of the model is a rectified linear function, guaranteeing a non-negative output and hence that g is a concave curve. The detailed structure of the deep concave curve model is shown in Figure 1: in order, a downsampling layer, a U-net network, two 3×3 convolutional layers, a global pooling layer, and a fully connected layer. The downsampling layer reduces the resolution of the input image to 16×16; the U-net network takes the downsampled output as input, extracts features, and produces an output equal in size to its input; the two convolutional layers take the U-net output as input and extract further features; the global pooling layer and the fully connected layer take the convolutional output as input and produce the model's prediction, the 765-dimensional vector.
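As a concrete illustration of the integrate-and-normalize step, the sketch below recovers a single-channel curve g from the model's 255-dimensional non-negative output; fixing the endpoints at g(0) = 0 and g(255) = 255 is an assumption about the normalization that the patent leaves implicit:

```python
import torch

def build_concave_curve(h: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """h: (255,) non-negative model output, the negated discrete second
    derivative of g. Returns g: (256,) monotone, concave lookup table."""
    # A reverse cumulative sum gives non-increasing slopes, since consecutive
    # slope differences equal -h <= 0; eps keeps every slope positive.
    slopes = eps + torch.flip(torch.cumsum(torch.flip(h, dims=[0]), dim=0), dims=[0])
    g = torch.cat([h.new_zeros(1), torch.cumsum(slopes, dim=0)])  # second integration
    return 255.0 * g / g[-1]  # normalize so the curve maps [0, 255] onto [0, 255]

def apply_curve(channel_u8: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Brighten one 8-bit channel by table lookup through g."""
    return g[channel_u8.long()]
```

Because the slopes are positive and non-increasing, the resulting lookup table is guaranteed monotone and concave, which is exactly the property the final rectified linear layer is designed to enforce.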
Step 4: Train the deep concave curve model; the flow chart is shown in Figure 2. In this step the parameters of the feature extractor and the multi-layer perceptron remain fixed and only the deep concave curve model is trained. Training uses the low-light dataset {X_L} under the self-supervised paradigm. The self-supervision can again use the rotation-jigsaw strategy: first rotate the image, then divide it into nine 3×3 patches, shuffle the patch order, and train the model to recover the original order. The loss function term for this step is:
L = L_C(ô_L, o_L), where L_C is the cross-entropy loss function, o_L is the lexicographic index, among all patch permutation schemes, of the shuffled order of the low-light image L, and ô_L is the jigsaw order predicted by the multi-layer perceptron. The training batch size is 64, the initial learning rate is 0.01, and training runs for 20,000 iterations in total; the learning rate decays by a factor of 0.1 after the 5,000th and 10,000th iterations.
Step 5: Obtain pseudo-labels for the low-light training data. First feed the low-light dataset {X_L} into the deep concave curve model to obtain the brightened low-light dataset {E(X_L)}; then feed {E(X_L)} into the pre-trained model obtained in step 1 to predict labels Ŷ_L. Among the obtained labels, those with confidence below 0.98 are discarded. This step yields the pseudo-labeled low-light dataset {X_L, Ŷ_L}.
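A hedged sketch of this step, with `curve_model` and `classifier` standing in for the trained deep concave curve model and the pre-trained model respectively:

```python
import torch

@torch.no_grad()
def pseudo_label(curve_model, classifier, low_light_images, threshold=0.98):
    """Brighten, classify, and keep only predictions at or above the
    confidence threshold; returns the retained images and their pseudo-labels."""
    enhanced = curve_model(low_light_images)            # E(X_L)
    probs = torch.softmax(classifier(enhanced), dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold                      # discard below 0.98
    return low_light_images[keep], labels[keep]
```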
Step 6: Using the labeled normal-light dataset collected in step 1 and the pseudo-labeled low-light dataset obtained in step 5, transfer the pre-trained model to the low-light domain; the flow chart is shown in Figure 3. Training of the pre-trained model uses the cross-entropy loss function with batch size 64 for a total of 7,000 iterations; the initial learning rate is 0.001 and decays by a factor of 0.1 after the 2,000th, 4,000th, and 6,000th iterations. Training uses the SGD optimizer with momentum 0.9 and weight decay 0.00001. The data augmentation methods used include random cropping, horizontal flipping, color jittering, and random rotation.
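The schedule above corresponds to a standard SGD setup; a minimal sketch follows, where `model` and `loader` (yielding mixed real- and pseudo-labeled batches) are placeholders rather than names from the patent:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2000, 4000, 6000], gamma=0.1)  # decay per iteration
criterion = torch.nn.CrossEntropyLoss()

for step, (images, labels) in zip(range(7000), loader):
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()   # the scheduler is stepped every iteration, not every epoch
```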
Step 7: In the inference stage, a low-light image to be classified is first brightened using the deep concave curve model trained in step 4 and then fed into the low-light classification model trained in step 6 (i.e., the fine-tuned pre-trained model), yielding the prediction result: a vector whose size equals the number of classes in the dataset.
For the face detection task, the pre-trained model is the Dual Shot Face Detector DSFD; a low-light image to be detected is first brightened using the deep concave curve model trained in step 4 and then fed into the DSFD fine-tuned in step 6, yielding the predicted face bounding-box coordinates.
For the action recognition task, the pre-trained model is the two-stream inflated 3D convolutional network I3D; a low-light video to be recognized is first brightened using the deep concave curve model trained in step 4 and then fed into the I3D fine-tuned in step 6, yielding an action recognition prediction for each frame of the video: a vector whose size equals the number of action categories in the dataset.
For the optical flow estimation task, the pre-trained model is the pyramid-warping-cost-volume optical flow estimation network PWC-Net; a low-light image to be processed is first brightened using the deep concave curve model trained in step 4 and then fed into the PWC-Net fine-tuned in step 6, yielding the positional offset of each pixel at the next time step.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. A person of ordinary skill in the art may modify or equivalently substitute the technical solution of the present invention without departing from its spirit and scope, and the scope of protection of the present invention shall be determined by the claims.
Claims (9)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129606.6A CN115641483A (en) | 2022-09-16 | 2022-09-16 | An unsupervised low-light domain adaptive training method and detection method |
PCT/CN2022/130218 WO2024055398A1 (en) | 2022-09-16 | 2022-11-07 | Unsupervised low-illumination-domain adaptive training method and detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129606.6A CN115641483A (en) | 2022-09-16 | 2022-09-16 | An unsupervised low-light domain adaptive training method and detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115641483A true CN115641483A (en) | 2023-01-24 |
Family
ID=84941611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211129606.6A Pending CN115641483A (en) | 2022-09-16 | 2022-09-16 | An unsupervised low-light domain adaptive training method and detection method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115641483A (en) |
WO (1) | WO2024055398A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119830393A (en) * | 2024-12-10 | 2025-04-15 | 中国建筑第五工程局有限公司 | Swin transform neural network-based three-dimensional surface lighting performance prediction method for building |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12293293B2 (en) * | 2018-09-28 | 2025-05-06 | Aarish Technologies | Machine learning using structurally regularized convolutional neural network architecture |
CN112069921A (en) * | 2020-08-18 | 2020-12-11 | 浙江大学 | A Small-Sample Visual Object Recognition Method Based on Self-Supervised Knowledge Transfer |
CN112508815A (en) * | 2020-12-09 | 2021-03-16 | 中国科学院深圳先进技术研究院 | Model training method and device, electronic equipment and machine-readable storage medium |
CN114693545A (en) * | 2022-02-15 | 2022-07-01 | 北京大学 | Low-illumination enhancement method and system based on curve family function |
- 2022-09-16: CN application CN202211129606.6A filed; published as CN115641483A (status: pending)
- 2022-11-07: PCT application PCT/CN2022/130218 filed; published as WO2024055398A1
Also Published As
Publication number | Publication date |
---|---|
WO2024055398A1 (en) | 2024-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111639692B (en) | Shadow detection method based on attention mechanism | |
Yadav et al. | Using deep learning to classify X-ray images of potential tuberculosis patients | |
CN112308860A (en) | Earth observation image semantic segmentation method based on self-supervision learning | |
CN110322495A (en) | A kind of scene text dividing method based on Weakly supervised deep learning | |
CN114841972A (en) | Transmission line defect identification method based on saliency map and semantic embedding feature pyramid | |
CN114936605A (en) | A neural network training method, equipment and storage medium based on knowledge distillation | |
CN115147632B (en) | Method and device for automatic labeling of image categories based on density peak clustering algorithm | |
CN109509156B (en) | Image defogging processing method based on generation countermeasure model | |
CN108229346B (en) | Video summarization using signed foreground extraction and fusion | |
CN111667027B (en) | Multi-modal image segmentation model training method, image processing method and device | |
EP4214687A1 (en) | Systems and methods of contrastive point completion with fine-to-coarse refinement | |
CN113870286B (en) | Foreground segmentation method based on multi-level feature and mask fusion | |
WO2022218012A1 (en) | Feature extraction method and apparatus, device, storage medium, and program product | |
CN110827265A (en) | Image anomaly detection method based on deep learning | |
CN111127360A (en) | Gray level image transfer learning method based on automatic encoder | |
CN110335299A (en) | An Implementation Method of Monocular Depth Estimation System Based on Adversarial Network | |
WO2024081778A1 (en) | A generalist framework for panoptic segmentation of images and videos | |
CN114596477A (en) | Foggy day train fault detection method based on field self-adaption and attention mechanism | |
CN108446627A (en) | A kind of Aerial Images matching process based on partial-depth Hash | |
CN116091764A (en) | Cloud image segmentation method based on fusion transformation network | |
CN117710645A (en) | Dynamic scene VSLAM optimization method based on fusion attention mechanism and lightweight neural network | |
CN115641483A (en) | An unsupervised low-light domain adaptive training method and detection method | |
CN118982676A (en) | A feature extraction method based on feature point extraction network fused with dual attention in SLAM system | |
CN113343979A (en) | Method, apparatus, device, medium and program product for training a model | |
Li et al. | Perceptually-calibrated synergy network for night-time image quality assessment with enhancement booster and knowledge cross-sharing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
- PB01 | Publication | |
- SE01 | Entry into force of request for substantive examination | |