CN115641483A - Unsupervised low-illumination-area self-adaptive training method and detection method - Google Patents
Unsupervised low-illumination-area self-adaptive training method and detection method
- Publication number: CN115641483A
- Application number: CN202211129606.6A
- Authority
- CN
- China
- Prior art keywords
- model
- training
- low
- illumination
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/088—Non-supervised learning, e.g. competitive learning
- G06V10/20—Image preprocessing
- G06V10/48—Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
- G06V10/764—Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- Y02T10/40—Engine management systems
Abstract
The invention discloses an unsupervised low-illumination-area self-adaptive training method and a detection method. The method comprises the following steps: 1) collecting labeled normal illumination training data, unlabeled low illumination training data and a pre-training model, and connecting a multilayer perceptron behind the feature extractor of the pre-training model to obtain a first model; 2) training the multilayer perceptron in the first model by using the normal illumination data; 3) constructing a deep concave curve model and placing it in front of the feature extractor in the first model to obtain a second model; 4) training the deep concave curve model in the second model by using the low-illumination data; 5) brightening the low-illumination data with the deep concave curve model, inputting the result into the pre-training model, and using the predicted labels as pseudo labels for the low-illumination data; 6) fine-tuning the pre-training model with the normal illumination data and the pseudo-labeled low illumination data; 7) brightening the low-illumination image to be processed, inputting it into the fine-tuned pre-training model, and outputting the corresponding detection result.
Description
Technical Field
The invention belongs to the fields of digital-image low-illumination enhancement and machine vision, and relates to an unsupervised low-illumination-area self-adaptive training method and a detection method based on a deep concave curve.
Background
Low illumination is a common form of image degradation; insufficient light is usually caused by dark shooting environments, camera failures, incorrect parameter settings and the like. Visual tasks in low-light environments, including object classification, face detection, behavior recognition and optical flow estimation, have been receiving attention from both academia and industry. Traditional training of low-illumination visual task models requires large-scale annotation of the training set, but data captured in low-light environments are difficult to label. At the same time, industry already holds a large number of labeled normal illumination training datasets and pre-trained models, so building new low-illumination training sets and retraining models from scratch repeatedly consumes manpower and material resources. Making full use of the existing labeled normal illumination training datasets and normal illumination pre-trained models, and training a model applicable to low-light environments without introducing additional low-illumination labels, that is, transferring a normal illumination pre-trained model to the low-light domain by an unsupervised domain-adaptive method, therefore has broad practical significance and application value.
Traditional unsupervised low-illumination domain-adaptive methods fall into three categories. Brightening-based methods lighten the low-illumination image so that a model trained on normal illumination images performs better on it. Feature-transfer-based methods align the features of normal-light and low-light images through contrastive learning, so that the model can be applied in low-light environments. Adversarial-learning-based methods generate dim-light images with a generative adversarial network and migrate the model to the low-light domain using pseudo labels.
However, brightening-based methods ignore the difference between human vision and machine vision; feature-transfer-based methods ignore the importance of pixel-level adjustment; and adversarial-learning-based methods require data from multiple domains while ignoring the characteristics of the input image itself. Existing unsupervised domain-adaptive methods therefore perform poorly and cannot meet the requirements of practical applications.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an unsupervised low-illumination domain-adaptive training method and a detection method based on a deep concave curve. The invention trains the deep concave curve model for brightening with a self-supervised training strategy, thereby comprehensively improving model performance in low-light environments.
The technical scheme adopted by the invention is as follows:
an unsupervised low-illumination-area self-adaptive training method comprises the following steps:
1) Collecting labeled normal illumination training data, unlabeled low illumination training data and a pre-training model; the pre-training model is a visual task model trained by the labeled normal illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-training model to obtain a first model; the multilayer perceptron is used for mapping the features extracted by the feature extractor to a representation space of the self-supervised task; in the rotated-jigsaw self-supervised learning adopted by this scheme, the output of the multilayer perceptron is a 30-dimensional vector representing the dictionary index of the shuffled image-block order among all image block permutation schemes;
2) Training the first model by using the marked normal illumination training data, locking parameters of the feature extractor in the training process, and only training the multilayer perceptron;
3) Constructing a deep concave curve model which predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) Training the second model using the low-light training data; in the training process, parameters of the feature extractor and parameters in the multilayer perceptron are locked, and only the deep concave curve model is trained;
5) Brightening the low-illumination training data by using the trained deep concave curve model, inputting the brightening result into the pre-training model, and predicting to obtain a label of the low-illumination training data; taking the predicted label as a pseudo label of the low-light training data;
6) Training the pre-training model by using the marked normal illumination training data and the low illumination training data with the pseudo label to obtain the fine-tuned pre-training model.
Furthermore, the multilayer perceptron adopts a network structure of full connection layer-batch normalization layer-rectified linear unit-full connection layer; the method for training the first model by using the normal illumination training data is: firstly, rotating and then partitioning the normal illumination training data to obtain a plurality of image blocks; then shuffling the order of the image blocks, inputting the result into the feature extractor for feature extraction, and sending the extracted features to the multilayer perceptron; the multilayer perceptron predicts the order of the image blocks according to the input feature data. The loss function used to train the first model is $\mathcal{L}_1 = L_C(\hat{y}_N, y_N)$, where $L_C$ is the cross-entropy loss function, $y_N$ is the dictionary index, among all image block permutation schemes, of the shuffled order of the normal illumination image $N$, and $\hat{y}_N$ is the block order predicted by the multilayer perceptron.
Further, the deep concave curve model sequentially comprises a down-sampling layer, a U-net network, a convolutional layer, a global pooling layer and a full-connection layer; the down-sampling layer down-samples the input image and inputs the result into the U-net network; the U-net network performs feature extraction on the input data and feeds the result to the convolutional layer; and the convolutional layer performs further feature extraction on the input feature data, with the extracted features passing sequentially through the global pooling layer and the full-connection layer to obtain the prediction result.
Further, the method for training the second model by using the low-light training data is: firstly, brightening the low-illumination training data with the deep concave curve model to obtain brightened images; then rotating and partitioning the brightened images to obtain a plurality of image blocks; then shuffling the order of the image blocks, inputting the result into the feature extractor for feature extraction, and sending the extracted features to the multilayer perceptron; the multilayer perceptron predicts the order of the image blocks according to the input feature data. The loss function used to train the second model is $\mathcal{L}_2 = L_C(\hat{y}_L, y_L)$, where $L_C$ is the cross-entropy loss function, $y_L$ is the dictionary index, among all image block permutation schemes, of the shuffled order of the low-illumination image $L$, and $\hat{y}_L$ is the block order predicted by the multilayer perceptron.
Further, the deep concave curve model comprises two convolution layers; namely, the deep concave curve model sequentially comprises a down-sampling layer, a U-net network, a first convolution layer, a second convolution layer, a global pooling layer and a full-connection layer.
Further, for the classification task, the pre-training model adopts ResNet-18; for the face detection task, the pre-training model adopts DSFD; for the behavior recognition task, the pre-training model adopts I3D; for the optical flow estimation task, the pre-training model adopts PWC-Net.
An unsupervised low-illumination-area image visual task detection method comprises the following steps:
1) Collecting labeled normal illumination training data, unlabeled low illumination training data and a pre-training model; the pre-training model is a visual task model trained by marked normal illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-training model to obtain a first model; the multilayer perceptron adopts a network structure of a full connection layer, a batch normalization layer, a rectified linear unit and a full connection layer;
2) Training the first model by using the marked normal illumination training data, locking parameters of the feature extractor in the training process, and only training the multilayer perceptron;
3) Constructing a deep concave curve model which predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) Training the second model using the low-light training data; locking parameters of the feature extractor and parameters in the multilayer perceptron in a training process, and only training the deep concave curve model;
5) Brightening the low-illumination training data by using the trained deep concave curve model, inputting the brightening result into the pre-training model, and predicting to obtain a label of the low-illumination training data; taking the predicted label as a pseudo label of the low-light training data;
6) Training the pre-training model by using the marked normal illumination training data and the low illumination training data with the pseudo label to obtain a fine-tuned pre-training model;
7) Inputting the low-illumination image to be processed into the trained deep concave curve model for brightening, then inputting the brightened image into the fine-tuned pre-training model, and outputting the corresponding visual task detection result.
A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the invention has the following positive effects:
the invention obviously improves the performance of the normal illumination model in a low illumination environment, and can improve the accuracy of the universal classification model ResNet-18 from 60.96% to 63.92% on the CODAN low illumination classification benchmark test set; on a Dark Face low-illumination Face detection reference test set, the Average accuracy (mean of Average Precision) of a universal Face Detector (Dual Shot Face Detector) can be improved from 44.44 to 46.91; on a low-illumination behavior recognition benchmark test set ARID, the recognition accuracy can be improved from 50.18% to 52.13%; on the low luminous flux estimation reference tester VBOF, the end-point error (end-point error) can be reduced from 8.99 to 7.44.
Drawings
Fig. 1 is a structural view of a deep concave curve model.
FIG. 2 is a training flowchart of the deep concave curve model.
FIG. 3 is a flow chart of migrating a pre-trained model to a low light domain.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
The embodiment discloses an unsupervised low-illumination domain self-adaptive method applied to a low-light classification task, which is specifically described as follows:
step 1: searching and collecting normal illumination images with labels to form a training data set { X N ,Y N }; collecting low-illumination images to form a low-illumination training data set { X L }. Wherein the normal illumination illuminates sample X in the training dataset N Need to include category information Y N Samples in the low-light training dataset need not contain category information. And acquiring a pre-training model on the normal illumination image, wherein the model comprises a feature extractor. The pre-training model adopts a residual convolution network ResNet-18, and other pre-training models can also be adopted. For the classification task, the pre-training model adopts ResNet-18; for theIn the face detection task, a pre-training model adopts DSFD; for the behavior recognition task, the pre-training model adopts I3D; for the optical flow estimation task, the pre-training model adopts PWC-Net.
Step 2: construct and train the multilayer perceptron. The multilayer perceptron adopts the network structure "full connection layer-batch normalization layer-rectified linear unit-full connection layer". Fix the parameters of the feature extractor obtained in step 1, connect the multilayer perceptron behind the feature extractor, and train the multilayer perceptron on the normal illumination dataset $\{X_N\}$ by a self-supervised training method. The self-supervised training can use the rotated-jigsaw strategy: first rotate the image, then divide it into 9 image blocks in a 3 × 3 grid, shuffle the block order, and train the multilayer perceptron to recover the original order. The loss term for this step is $\mathcal{L}_1 = L_C(\hat{y}_N, y_N)$, where $L_C$ is the cross-entropy loss function, $y_N$ is the dictionary index, among all image block permutation schemes, of the shuffled order of the normal illumination image $N$, and $\hat{y}_N$ is the block order predicted by the multilayer perceptron. The training batch size is 64; the first 150000 iterations use a learning rate of 0.01 and the next 150000 iterations use 0.001.
Step 3: construct the deep concave curve model, which takes the unlabeled low-light image as input and predicts the mapping g, where g maps original pixel values to new pixel values. For example, for an 8-bit grayscale image the color domain has 256 values, so g is a 256-dimensional vector. The output of the deep concave curve model is the negation of the discrete second derivative of g before normalization, a 255-dimensional vector, from which g can be obtained by integration and normalization. For an 8-bit three-channel color image, the deep concave curve model predicts a separate g for each of the three channels, i.e., it outputs a 765-dimensional vector. The final layer of the deep concave curve model is a rectified linear unit, which guarantees that the output is non-negative and hence that g is a concave curve. The detailed structure of the deep concave curve model is shown in Fig. 1: in sequence, a down-sampling layer, a U-net network, two 3 × 3 convolutional layers, a global pooling layer and a full connection layer. The down-sampling layer reduces the resolution of the input image to 16 × 16; the U-net network takes the down-sampled image as input and extracts features, with output equal in size to its input; the two convolutional layers take the U-net output as input and extract further features; and the global pooling layer and the full connection layer take the convolutional output as input and produce the prediction of the deep concave curve model, i.e., the 765-dimensional vector.
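As an illustration of this architecture and of the integrate-and-normalize step, here is a sketch assuming PyTorch. The U-net is passed in as a placeholder module assumed to map the 3 input channels to `feat_ch` channels while preserving spatial size, and the base slope of 1 in `recover_curve` is an assumed boundary condition: the patent states only that the output is the negated discrete second derivative of g and that g is recovered by integration and normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepConcaveCurveModel(nn.Module):
    def __init__(self, unet: nn.Module, feat_ch: int = 32):
        super().__init__()
        self.unet = unet                           # 3 -> feat_ch channels, same size
        self.conv1 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.fc = nn.Linear(feat_ch, 3 * 255)      # 765-dim output for RGB input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, size=(16, 16), mode="bilinear", align_corners=False)
        x = F.relu(self.conv2(F.relu(self.conv1(self.unet(x)))))
        x = x.mean(dim=(2, 3))                     # global average pooling
        return F.relu(self.fc(x))                  # non-negative => g is concave

def recover_curve(out: torch.Tensor) -> torch.Tensor:
    """Integrate the non-negative output (= -g'' per channel) twice into
    concave, increasing curves over pixel values 0..255, normalized to
    [0, 255]; the base slope of 1 is an assumption."""
    neg_d2 = out.view(out.shape[0], 3, 255)
    # slope[i] = 1 + sum_{j >= i} neg_d2[j]: positive and non-increasing,
    # so its running sum g is increasing and concave.
    slope = 1.0 + torch.flip(torch.cumsum(torch.flip(neg_d2, [-1]), -1), [-1])
    g = torch.cumsum(slope, dim=-1)
    g = torch.cat([torch.zeros_like(g[..., :1]), g], dim=-1)  # g(0) = 0
    return 255.0 * g / g[..., -1:].clamp(min=1e-6)            # g(255) = 255
```

Brightening then reduces to a per-channel lookup: each 8-bit pixel value v is replaced by g[v].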
Step 4: train the deep concave curve model; the flow chart is shown in Fig. 2. In this step the parameters of the feature extractor and the multilayer perceptron are kept unchanged, and only the deep concave curve model is trained, using the low-light dataset $\{X_L\}$ under the same self-supervised paradigm. The self-supervised training can use the rotated-jigsaw strategy: first rotate the image, then divide it into 9 image blocks in a 3 × 3 grid, shuffle the block order, and train the model to recover the original order. The loss term for this step is $\mathcal{L}_2 = L_C(\hat{y}_L, y_L)$, where $L_C$ is the cross-entropy loss function, $y_L$ is the dictionary index, among all image block permutation schemes, of the shuffled order of the low-illumination image $L$, and $\hat{y}_L$ is the block order predicted by the multilayer perceptron. The training batch size is 64, the initial learning rate is 0.01, and training runs for 20000 iterations in total; the learning rate decays by a factor of 0.1 after the 5000th and 10000th iterations.
Step 5: acquire pseudo labels for the low-light training data. First input the low-light image dataset $\{X_L\}$ into the deep concave curve model to obtain the brightened low-light dataset $\{E(X_L)\}$; then input $\{E(X_L)\}$ into the pre-training model obtained in step 1 and predict labels. Among the obtained labels, those with confidence below 0.98 are discarded. This step yields a low-light dataset with pseudo labels $\{X_L, \hat{Y}_L\}$.
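A sketch of the pseudo-labeling pass under the stated 0.98 confidence threshold, assuming PyTorch; `pretrained_model`, `curve_model` and the loader are assumed from context.

```python
import torch

pseudo_images, pseudo_labels = [], []
pretrained_model.eval()
curve_model.eval()
with torch.no_grad():
    for low_img in low_light_loader:
        bright = apply_curve(low_img, recover_curve(curve_model(low_img)))  # E(X_L)
        probs = torch.softmax(pretrained_model(bright), dim=1)
        conf, label = probs.max(dim=1)
        keep = conf >= 0.98            # discard low-confidence predictions
        pseudo_images.append(low_img[keep])
        pseudo_labels.append(label[keep])
```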
Step 6: migrate the pre-training model to the low-light domain using the labeled normal-light dataset collected in step 1 and the pseudo-labeled low-light dataset obtained in step 5. The specific flow chart is shown in Fig. 3. The pre-training model is trained with a cross-entropy loss function, batch size 64, for 7000 iterations in total; the initial learning rate is 0.001 and decays by a factor of 0.1 after the 2000th, 4000th and 6000th iterations. Training uses an SGD optimizer with momentum set to 0.9 and weight decay set to 0.00001, and data augmentation including random cropping, horizontal flipping, color jittering and random rotation.
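A sketch of the fine-tuning setup following the stated hyperparameters, assuming PyTorch and torchvision; the crop size, rotation range, jitter strengths and the mixed-batch iterator are illustrative assumptions.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([       # applied per sample in the dataset pipeline
    transforms.RandomCrop(224, padding=8),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.RandomRotation(15),
])

opt = torch.optim.SGD(pretrained_model.parameters(), lr=0.001,
                      momentum=0.9, weight_decay=1e-5)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[2000, 4000, 6000],
                                             gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for step in range(7000):
    img, lbl = next(mixed_iter)      # hypothetical iterator mixing labeled
                                     # normal-light batches with pseudo-labeled,
                                     # brightened low-light batches
    loss = criterion(pretrained_model(img), lbl)
    opt.zero_grad(); loss.backward(); opt.step(); sched.step()
```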
Step 7: inference. For a low-illumination image to be classified, first brighten it with the deep concave curve model trained in step 4, then input it into the low-illumination classification model obtained in step 6 (namely, the fine-tuned pre-training model) to obtain the prediction result, i.e., a vector whose size equals the number of classification categories in the dataset.
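A sketch of the two-stage inference, assuming PyTorch and the helpers above.

```python
import torch

@torch.no_grad()
def classify_low_light(img: torch.Tensor) -> torch.Tensor:
    """Brighten with the trained deep concave curve model, then classify with
    the fine-tuned model; returns one logit per category."""
    curve_model.eval()
    pretrained_model.eval()
    bright = apply_curve(img, recover_curve(curve_model(img)))
    return pretrained_model(bright)

pred = classify_low_light(low_img).argmax(dim=1)   # predicted category indices
```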
For the face detection task, the pre-training model adopts the Dual Shot Face Detector (DSFD); for a low-illumination image to be detected, first brighten it with the deep concave curve model trained in step 4, then input it into the DSFD fine-tuned in step 6 to obtain the predicted face detection box coordinates.
For the behavior recognition task, the pre-training model adopts the two-stream Inflated 3D ConvNet (I3D); for a low-illumination video to be recognized, first brighten it with the deep concave curve model trained in step 4, then input it into the I3D fine-tuned in step 6 to obtain the behavior recognition prediction for each frame of the video, i.e., a vector whose size equals the number of behavior categories in the dataset.
For the optical flow estimation task, the pre-training model adopts PWC-Net, an optical flow estimation network built on pyramidal processing, warping and cost volumes; for a low-illumination image pair to be processed, first brighten it with the deep concave curve model trained in step 4, then input it into the PWC-Net fine-tuned in step 6 to obtain the position offset of each pixel at the next moment.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.
Claims (9)
1. An unsupervised low-illumination-area self-adaptive training method, comprising the following steps:
1) Collecting labeled normal illumination training data, unlabeled low illumination training data and a pre-training model; the pre-training model is a visual task model trained by marked normal illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-training model to obtain a first model; the multilayer perceptron is used for mapping the features extracted by the feature extractor to a representation space of the self-supervised task;
2) Training the first model by using the marked normal illumination training data, locking the parameters of the feature extractor in the training process, and only training the multilayer perceptron;
3) Constructing a deep concave curve model which predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) Training the second model using the low-light training data; locking parameters of the feature extractor and parameters in the multilayer perceptron in a training process, and only training the deep concave curve model;
5) Brightening the low-illumination training data by using the trained deep concave curve model, inputting the brightening result into the pre-training model, and predicting to obtain a label of the low-illumination training data; taking the predicted label as a pseudo label of the low-illumination training data;
6) And training the pre-training model by using the marked normal illumination training data and the low illumination training data with the pseudo label to obtain the fine-tuned pre-training model.
2. The method according to claim 1, wherein the multilayer perceptron adopts a network structure of full connection layer-batch normalization layer-rectified linear unit-full connection layer; the method for training the first model by using the normal illumination training data is: firstly, rotating and then partitioning the normal illumination training data to obtain a plurality of image blocks; then shuffling the order of the image blocks, inputting the result into the feature extractor for feature extraction, and sending the extracted features to the multilayer perceptron; the multilayer perceptron predicts the order of the image blocks according to the input feature data; wherein the loss function used to train the first model is $\mathcal{L}_1 = L_C(\hat{y}_N, y_N)$, where $L_C$ is the cross-entropy loss function, $y_N$ is the dictionary index, among all image block permutation schemes, of the shuffled order of the normal illumination image $N$, and $\hat{y}_N$ is the block order predicted by the multilayer perceptron.
3. The method according to claim 1 or 2, wherein the deep concave curve model sequentially comprises a down-sampling layer, a U-net network, a convolutional layer, a global pooling layer and a full connection layer; the down-sampling layer down-samples the input image and inputs the result into the U-net network; the U-net network performs feature extraction on the input data and feeds the result to the convolutional layer; and the convolutional layer performs further feature extraction on the input feature data, with the extracted features passing sequentially through the global pooling layer and the full connection layer to obtain the prediction result.
4. The method of claim 3, wherein the method for training the second model by using the low-light training data is: firstly, brightening the low-illumination training data with the deep concave curve model to obtain brightened images; then rotating and partitioning the brightened images to obtain a plurality of image blocks; then shuffling the order of the image blocks, inputting the result into the feature extractor for feature extraction, and sending the extracted features to the multilayer perceptron; the multilayer perceptron predicts the order of the image blocks according to the input feature data; wherein the loss function used to train the second model is $\mathcal{L}_2 = L_C(\hat{y}_L, y_L)$, where $L_C$ is the cross-entropy loss function, $y_L$ is the dictionary index, among all image block permutation schemes, of the shuffled order of the low-illumination image $L$, and $\hat{y}_L$ is the block order predicted by the multilayer perceptron.
5. The method of claim 3, wherein the deep concave curve model comprises two convolution layers; namely, the deep concave curve model sequentially comprises a down-sampling layer, a U-net network, a first convolution layer, a second convolution layer, a global pooling layer and a full-connection layer.
6. The method of claim 1, wherein for classification tasks, the pre-trained model employs ResNet-18; for the face detection task, the pre-training model adopts DSFD; for the behavior recognition task, the pre-training model adopts I3D; for the optical flow estimation task, the pre-training model adopts PWC-Net.
7. An unsupervised low-illumination-area image visual task detection method comprises the following steps:
1) Collecting labeled normal illumination training data, unlabeled low illumination training data and a pre-training model; the pre-training model is a visual task model trained by marked normal illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-training model to obtain a first model; the multilayer perceptron adopts a network structure of a full connection layer, a batch normalization layer, a rectified linear unit and a full connection layer;
2) Training the first model by using the marked normal illumination training data, locking the parameters of the feature extractor in the training process, and only training the multilayer perceptron;
3) Constructing a deep concave curve model which predicts, for each pixel value of the input image, the corresponding pixel value after brightness enhancement; placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) Training the second model using the low-light training data; in the training process, parameters of the feature extractor and parameters in the multilayer perceptron are locked, and only the deep concave curve model is trained;
5) Brightening the low-illumination training data by using the trained deep concave curve model, inputting the brightening result into the pre-training model, and predicting to obtain a label of the low-illumination training data; taking the predicted label as a pseudo label of the low-light training data;
6) Training the pre-training model by using the marked normal illumination training data and the low illumination training data with the pseudo label to obtain a fine-tuned pre-training model;
7) Inputting the low-illumination image to be processed into the trained deep concave curve model for brightening, then inputting the brightened image into the fine-tuned pre-training model, and outputting the corresponding visual task detection result.
8. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method of any one of claims 1 to 7.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129606.6A CN115641483A (en) | 2022-09-16 | 2022-09-16 | Unsupervised low-illumination-area self-adaptive training method and detection method |
PCT/CN2022/130218 WO2024055398A1 (en) | 2022-09-16 | 2022-11-07 | Unsupervised low-illumination-domain adaptive training method and detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129606.6A CN115641483A (en) | 2022-09-16 | 2022-09-16 | Unsupervised low-illumination-area self-adaptive training method and detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115641483A (en) | 2023-01-24
Family
ID=84941611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211129606.6A Pending CN115641483A (en) | 2022-09-16 | 2022-09-16 | Unsupervised low-illumination-area self-adaptive training method and detection method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115641483A (en) |
WO (1) | WO2024055398A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020065403A1 (en) * | 2018-09-28 | 2020-04-02 | Sinha Pavel | Machine learning using structurally regularized convolutional neural network architecture |
CN112069921A (en) * | 2020-08-18 | 2020-12-11 | 浙江大学 | Small sample visual target identification method based on self-supervision knowledge migration |
CN112508815A (en) * | 2020-12-09 | 2021-03-16 | 中国科学院深圳先进技术研究院 | Model training method and device, electronic equipment and machine-readable storage medium |
CN114693545A (en) * | 2022-02-15 | 2022-07-01 | 北京大学 | Low-illumination enhancement method and system based on curve family function |
2022
- 2022-09-16 CN CN202211129606.6A patent/CN115641483A/en active Pending
- 2022-11-07 WO PCT/CN2022/130218 patent/WO2024055398A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024055398A1 (en) | 2024-03-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |