CN115641483A - Unsupervised low-illumination-domain adaptive training method and detection method - Google Patents

Unsupervised low-illumination-domain adaptive training method and detection method

Info

Publication number
CN115641483A
Authority
CN
China
Prior art keywords
model
training
low
illumination
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211129606.6A
Other languages
Chinese (zh)
Inventor
刘家瑛
罗润冬
汪文靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202211129606.6A priority Critical patent/CN115641483A/en
Priority to PCT/CN2022/130218 priority patent/WO2024055398A1/en
Publication of CN115641483A publication Critical patent/CN115641483A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised low-illumination-domain adaptive training method and a detection method. The method comprises the following steps: 1) collecting labeled normal-illumination training data, unlabeled low-illumination training data, and a pre-trained model; connecting a multilayer perceptron behind the feature extractor of the pre-trained model to obtain a first model; 2) training the multilayer perceptron in the first model with the normal-illumination data; 3) constructing a deep concave curve model and placing it in front of the feature extractor in the first model to obtain a second model; 4) training the deep concave curve model in the second model with the low-illumination data; 5) brightening the low-illumination data with the deep concave curve model, feeding the result to the pre-trained model, and using the predicted labels as pseudo labels for the low-illumination data; 6) fine-tuning the pre-trained model with the normal-illumination data and the pseudo-labeled low-illumination data; 7) brightening a low-illumination image to be processed, feeding it to the fine-tuned pre-trained model, and outputting the corresponding detection result.

Description

Unsupervised low-illumination-domain adaptive training method and detection method
Technical Field
The invention belongs to the fields of low-illumination digital image enhancement and machine vision, and relates to an unsupervised low-illumination-domain adaptive training method and a detection method based on a deep concave curve.
Background
Low illumination is a common form of image degradation; insufficient light is usually caused by dark shooting environments, camera failures, incorrect parameter settings, and the like. Visual tasks in low-light environments, including object classification, face detection, behavior recognition, and optical flow estimation, have been receiving attention from both academia and industry. Conventional training of low-illumination visual task models requires large-scale annotation of the training set, but data captured in low-light environments are difficult to annotate. At the same time, industry already possesses a large number of labeled normal-illumination training datasets and pre-trained models, so building new low-illumination training sets and retraining models from scratch repeatedly consumes manpower and material resources. Making full use of the existing labeled normal-illumination training data and normal-illumination pre-trained models, and training a model applicable to low-light environments without introducing additional low-illumination labels, that is, migrating a normal-illumination pre-trained model to the low-illumination domain by an unsupervised domain-adaptive method, therefore has broad practical significance and application value.
Conventional unsupervised low-illumination domain-adaptive methods fall into three categories. Brightening-based methods enhance the low-illumination image so that a model trained on normal-illumination images performs better. Feature-migration-based methods align the features of normal-illumination and low-illumination images through contrastive learning so that the model can be applied in low-light environments. Adversarial-learning-based methods generate dark images with a generative adversarial network and migrate the model to the low-light domain using pseudo labels.
However, brightening-based methods ignore the difference between human vision and machine vision; feature-migration-based methods ignore the importance of pixel-level adjustment; and adversarial-learning-based methods require data from multiple domains and ignore the characteristics of the input image itself. Existing unsupervised domain-adaptive methods therefore perform poorly and cannot meet the requirements of practical applications.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an unsupervised low-illumination-domain adaptive training method and a detection method based on a deep concave curve. The invention trains the deep concave curve model for brightness enhancement with a self-supervised training strategy, thereby comprehensively improving model performance in low-light environments.
The technical scheme adopted by the invention is as follows:
An unsupervised low-illumination-domain adaptive training method comprises the following steps:
1) Collecting labeled normal-illumination training data, unlabeled low-illumination training data, and a pre-trained model; the pre-trained model is a visual task model trained on the labeled normal-illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-trained model to obtain a first model; the multilayer perceptron maps the features extracted by the feature extractor into the representation space of a self-supervised task; in the rotation-jigsaw self-supervised learning adopted by this scheme, the output of the multilayer perceptron is a 30-dimensional vector representing the dictionary index of the shuffled image among all image-block permutation schemes;
2) Training the first model with the labeled normal-illumination training data, locking the parameters of the feature extractor during training, and training only the multilayer perceptron;
3) Constructing a deep concave curve model that predicts, for each pixel value of the input image, the pixel value after brightness enhancement; placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) Training the second model with the low-illumination training data; locking the parameters of the feature extractor and of the multilayer perceptron during training, and training only the deep concave curve model;
5) Brightening the low-illumination training data with the trained deep concave curve model, feeding the brightened result to the pre-trained model, and predicting labels for the low-illumination training data; taking the predicted labels as pseudo labels of the low-illumination training data;
6) Training the pre-trained model with the labeled normal-illumination training data and the pseudo-labeled low-illumination training data to obtain the fine-tuned pre-trained model.
Furthermore, the multilayer perceptron adopts a fully connected layer - batch normalization layer - linear rectification function - fully connected layer network structure. The first model is trained with the normal-illumination training data as follows: the normal-illumination training data are first rotated and partitioned to obtain a plurality of image blocks; the order of the image blocks is then shuffled, the shuffled image is fed to the feature extractor for feature extraction, and the extracted features are sent to the multilayer perceptron, which predicts the order of the image blocks from the input features. The loss function used to train the first model is

$L = L_C(\hat{o}_N, o_N)$

where $L_C$ is the cross-entropy loss function, $o_N$ is the dictionary index, among all image-block permutation schemes, of the order into which the normal-illumination image $N$ was shuffled, and $\hat{o}_N$ is the image-block order predicted by the multilayer perceptron.
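For concreteness, a minimal PyTorch sketch of this first-model composition follows. It is illustrative only: an ImageNet-pretrained torchvision ResNet-18 stands in for the normal-illumination pre-trained model, the hidden width of 512 is an assumption, and the names feature_extractor, jigsaw_head, and first_model are ours, not the patent's.

```python
import torch.nn as nn
from torchvision.models import resnet18

# Stand-in for the pre-trained visual task model of step 1.
backbone = resnet18(weights="IMAGENET1K_V1")

# Feature extractor: everything up to (but excluding) the classification head.
feature_extractor = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten(1))

# Multilayer perceptron: fully connected - batch norm - ReLU - fully connected,
# ending in 30 outputs, one per entry of the tile-permutation dictionary.
jigsaw_head = nn.Sequential(
    nn.Linear(512, 512),
    nn.BatchNorm1d(512),
    nn.ReLU(inplace=True),
    nn.Linear(512, 30),
)

first_model = nn.Sequential(feature_extractor, jigsaw_head)
```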
Further, the deep concave curve model comprises, in sequence, a down-sampling layer, a U-net network, a convolutional layer, a global pooling layer, and a fully connected layer. The down-sampling layer down-samples the input image and feeds it to the U-net network; the U-net network extracts features from the input data and feeds them to the convolutional layer; the convolutional layer performs further feature extraction and feeds the extracted features, in sequence, to the global pooling layer and the fully connected layer to obtain the prediction result.
Further, the second model is trained with the low-illumination training data as follows: the low-illumination training data are first brightened with the deep concave curve model to obtain brightened images; the brightened images are then rotated and partitioned to obtain a plurality of image blocks; the order of the image blocks is shuffled, the shuffled image is fed to the feature extractor for feature extraction, and the extracted features are sent to the multilayer perceptron, which predicts the order of the image blocks from the input features. The loss function used to train the second model is

$L = L_C(\hat{o}_L, o_L)$

where $L_C$ is the cross-entropy loss function, $o_L$ is the dictionary index, among all image-block permutation schemes, of the order into which the low-illumination image $L$ was shuffled, and $\hat{o}_L$ is the image-block order predicted by the multilayer perceptron.
Further, the deep concave curve model contains two convolutional layers; that is, the deep concave curve model comprises, in sequence, a down-sampling layer, a U-net network, a first convolutional layer, a second convolutional layer, a global pooling layer, and a fully connected layer.
Further, for the classification task, the pre-trained model adopts ResNet-18; for the face detection task, DSFD; for the behavior recognition task, I3D; and for the optical flow estimation task, PWC-Net.
An unsupervised low-illumination-domain image visual task detection method comprises the following steps:
1) Collecting labeled normal-illumination training data, unlabeled low-illumination training data, and a pre-trained model; the pre-trained model is a visual task model trained on the labeled normal-illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-trained model to obtain a first model; the multilayer perceptron adopts a fully connected layer - batch normalization layer - linear rectification function - fully connected layer network structure;
2) Training the first model with the labeled normal-illumination training data, locking the parameters of the feature extractor during training, and training only the multilayer perceptron;
3) Constructing a deep concave curve model that predicts, for each pixel value of the input image, the pixel value after brightness enhancement; placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) Training the second model with the low-illumination training data; locking the parameters of the feature extractor and of the multilayer perceptron during training, and training only the deep concave curve model;
5) Brightening the low-illumination training data with the trained deep concave curve model, feeding the brightened result to the pre-trained model, and predicting labels for the low-illumination training data; taking the predicted labels as pseudo labels of the low-illumination training data;
6) Training the pre-trained model with the labeled normal-illumination training data and the pseudo-labeled low-illumination training data to obtain a fine-tuned pre-trained model;
7) Feeding a low-illumination image to be processed to the trained deep concave curve model for brightening, feeding the brightened image to the fine-tuned pre-trained model, and outputting the corresponding visual task detection result.
A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the invention has the following positive effects:
the invention obviously improves the performance of the normal illumination model in a low illumination environment, and can improve the accuracy of the universal classification model ResNet-18 from 60.96% to 63.92% on the CODAN low illumination classification benchmark test set; on a Dark Face low-illumination Face detection reference test set, the Average accuracy (mean of Average Precision) of a universal Face Detector (Dual Shot Face Detector) can be improved from 44.44 to 46.91; on a low-illumination behavior recognition benchmark test set ARID, the recognition accuracy can be improved from 50.18% to 52.13%; on the low luminous flux estimation reference tester VBOF, the end-point error (end-point error) can be reduced from 8.99 to 7.44.
Drawings
Fig. 1 is a structure diagram of the deep concave curve model.
Fig. 2 is a flowchart of training the deep concave curve model.
Fig. 3 is a flowchart of migrating the pre-trained model to the low-light domain.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
This embodiment discloses an unsupervised low-illumination-domain adaptive method applied to a low-light classification task, described in detail as follows:
step 1: searching and collecting normal illumination images with labels to form a training data set { X N ,Y N }; collecting low-illumination images to form a low-illumination training data set { X L }. Wherein the normal illumination illuminates sample X in the training dataset N Need to include category information Y N Samples in the low-light training dataset need not contain category information. And acquiring a pre-training model on the normal illumination image, wherein the model comprises a feature extractor. The pre-training model adopts a residual convolution network ResNet-18, and other pre-training models can also be adopted. For the classification task, the pre-training model adopts ResNet-18; for theIn the face detection task, a pre-training model adopts DSFD; for the behavior recognition task, the pre-training model adopts I3D; for the optical flow estimation task, the pre-training model adopts PWC-Net.
Step 2: Construct and train a multilayer perceptron. The multilayer perceptron adopts a "fully connected layer - batch normalization layer - linear rectification function - fully connected layer" network structure. Fix the parameters of the feature extractor obtained in step 1, connect the multilayer perceptron to the feature extractor, and train the multilayer perceptron on the normal-illumination dataset $\{X_N\}$ by a self-supervised training method. The self-supervised training can use the rotation-jigsaw strategy: first rotate the image, then divide it into nine image blocks in a 3 × 3 grid, shuffle the blocks, and train the multilayer perceptron to recover the original block order. The loss function term for this step is:
$L = L_C(\hat{o}_N, o_N)$

where $L_C$ is the cross-entropy loss function, $o_N$ is the dictionary index, among all image-block permutation schemes, of the order into which the normal-illumination image $N$ was shuffled, and $\hat{o}_N$ is the image-block order predicted by the multilayer perceptron. The training batch size is 64, with 150000 iterations at a learning rate of 0.01 followed by 150000 iterations at a learning rate of 0.001.
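The following sketch shows one way to build the rotation-jigsaw samples and a single training step for the multilayer perceptron, reusing feature_extractor and jigsaw_head from the earlier sketch. The patent fixes the dictionary size (30) but not how the 30 permutations are chosen, so random sampling here is an assumption, and make_jigsaw_sample is a hypothetical helper name.

```python
import random
import torch
import torch.nn.functional as F

NUM_PERMUTATIONS = 30
random.seed(0)  # a fixed, shared permutation dictionary
PERMUTATIONS = [tuple(random.sample(range(9), 9)) for _ in range(NUM_PERMUTATIONS)]

def make_jigsaw_sample(image: torch.Tensor):
    """image: (C, H, W). Returns the shuffled image and the permutation's dictionary index."""
    image = torch.rot90(image, k=random.randint(0, 3), dims=(1, 2))  # random 90-degree rotation
    c, h, w = image.shape
    th, tw = h // 3, w // 3
    tiles = [image[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(3) for j in range(3)]                    # 3 x 3 grid of blocks
    perm_idx = random.randrange(NUM_PERMUTATIONS)
    shuffled = [tiles[p] for p in PERMUTATIONS[perm_idx]]
    rows = [torch.cat(shuffled[r * 3:(r + 1) * 3], dim=2) for r in range(3)]
    return torch.cat(rows, dim=1), perm_idx

def mlp_step(batch, optimizer):
    """One step of step 2; assumes feature_extractor params were frozen beforehand."""
    shuffled, perms = zip(*[make_jigsaw_sample(img) for img in batch])
    x, target = torch.stack(shuffled), torch.tensor(perms)
    logits = jigsaw_head(feature_extractor(x))   # only jigsaw_head receives gradients
    loss = F.cross_entropy(logits, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```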
Step 3: Construct a deep concave curve model, which takes an unlabeled low-illumination image as input and predicts a mapping g, where g maps original pixel values to new pixel values. For example, for an 8-bit grayscale image, g is a 256-dimensional vector, since the color domain has 256 values. The output of the deep concave curve model is the negative of the discrete second derivative of g before normalization, a 255-dimensional vector; g can be recovered from this output by integration and normalization. For an 8-bit three-channel color image, the deep concave curve model predicts the g of each of the three channels, i.e., it outputs a 765-dimensional vector. The final layer of the deep concave curve model is a rectified linear function, which ensures that the output is non-negative and hence that g is a concave curve. The detailed structure of the deep concave curve model is shown in Fig. 1; it comprises, in sequence, a down-sampling layer, a U-net network, two 3 × 3 convolutional layers, a global pooling layer, and a fully connected layer. The down-sampling layer reduces the resolution of the input image to 16 × 16; the U-net network takes the output of the down-sampling layer as input and extracts features, its output being equal in size to its input; the two convolutional layers take the output of the U-net network as input and extract further features; and the global pooling layer and the fully connected layer take the output of the convolutional layers as input and output the prediction of the deep concave curve model, i.e., the 765-dimensional vector.
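A minimal PyTorch sketch of this structure follows. The interior of the U-net is not specified in the text, so a small convolutional stand-in is used, and the curve recovery in enhance() (a reverse cumulative sum to obtain a non-increasing slope, a second cumulative sum to obtain g, then normalization) is one concrete reading of "integration and normalization", not the patent's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepConcaveCurve(nn.Module):
    """Predicts, per channel, the negated discrete second derivative of the curve g."""
    def __init__(self, bins=255, channels=3):
        super().__init__()
        self.bins, self.channels = bins, channels
        self.unet = nn.Sequential(            # stand-in for the U-net; output size = input size
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, channels, 3, padding=1),
        )
        self.convs = nn.Sequential(           # the two 3x3 convolutional layers
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1),
        )
        self.fc = nn.Linear(32, bins * channels)

    def forward(self, x):
        d = F.interpolate(x, size=(16, 16), mode="bilinear", align_corners=False)
        feat = self.convs(self.unet(d))
        pooled = F.adaptive_avg_pool2d(feat, 1).flatten(1)            # global pooling
        # Final rectification keeps -g'' non-negative, which makes g concave.
        return F.relu(self.fc(pooled)).view(-1, self.channels, self.bins)

    def enhance(self, x):
        """Brighten x (B, C, H, W, values in [0, 1]) with the predicted concave curve."""
        neg_d2g = self.forward(x)                                     # (B, C, 255)
        # Reverse cumulative sum gives a non-increasing slope, hence a concave g;
        # a small epsilon keeps g strictly increasing.
        slope = torch.flip(torch.cumsum(torch.flip(neg_d2g, [2]), 2), [2]) + 1e-3
        g = F.pad(torch.cumsum(slope, dim=2), (1, 0))                 # g(0) = 0, 256 bins
        g = g / g[:, :, -1:]                                          # normalize to [0, 1]
        idx = (x.clamp(0, 1) * 255).round().long()
        b, c, h, w = idx.shape
        return torch.gather(g, 2, idx.view(b, c, -1)).view(b, c, h, w)  # per-pixel lookup
```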
Step 4: Train the deep concave curve model; the flowchart is shown in Fig. 2. In this step, the parameters of the feature extractor and of the multilayer perceptron are kept unchanged, and only the deep concave curve model is trained. Training uses the low-illumination dataset $\{X_L\}$ under a self-supervised paradigm. The self-supervised training can use the rotation-jigsaw strategy: first rotate the image, then divide it into nine image blocks in a 3 × 3 grid, shuffle the blocks, and train the model to recover the original block order. The loss function term for this step is:
$L = L_C(\hat{o}_L, o_L)$

where $L_C$ is the cross-entropy loss function, $o_L$ is the dictionary index, among all image-block permutation schemes, of the order into which the low-illumination image $L$ was shuffled, and $\hat{o}_L$ is the image-block order predicted by the multilayer perceptron. The training batch size is 64, the initial learning rate is 0.01, and training runs for 20000 iterations in total; the learning rate decays by a factor of 0.1 after the 5000th and 10000th iterations.
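A sketch of this training stage, reusing DeepConcaveCurve, make_jigsaw_sample, feature_extractor, and jigsaw_head from the sketches above. Batch size, learning rate, iteration count, and decay points follow the text; the use of plain SGD is an assumption, since the optimizer for this step is not named, and low_light_loader is a hypothetical loader assumed to repeat until 20000 steps are reached.

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

curve = DeepConcaveCurve()
for module in (feature_extractor, jigsaw_head):   # freeze everything but the curve model
    module.eval()                                 # also freeze batch-norm statistics
    for p in module.parameters():
        p.requires_grad_(False)

optimizer = SGD(curve.parameters(), lr=0.01)      # optimizer choice is an assumption
scheduler = MultiStepLR(optimizer, milestones=[5000, 10000], gamma=0.1)

for step, batch in enumerate(low_light_loader):   # (B, 3, H, W) low-light tensors
    if step == 20000:
        break
    brightened = curve.enhance(batch)             # gradients flow through the curve lookup
    shuffled, perms = zip(*[make_jigsaw_sample(img) for img in brightened])
    logits = jigsaw_head(feature_extractor(torch.stack(shuffled)))
    loss = F.cross_entropy(logits, torch.tensor(perms))
    optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```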
Step 5: Acquire pseudo labels for the low-illumination training data. In this step, the low-illumination image dataset $\{X_L\}$ is first fed to the deep concave curve model to obtain the brightened low-illumination dataset $\{E(X_L)\}$; the brightened dataset $\{E(X_L)\}$ is then fed to the pre-trained model obtained in step 1, which predicts labels $\hat{Y}_L$. Among the obtained labels, those with a confidence below 0.98 are discarded. This step yields a low-illumination dataset containing pseudo labels, $\{X_L, \hat{Y}_L\}$.
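A sketch of the pseudo-labeling pass for the classification task, reusing curve and the backbone stand-in from the earlier sketches. The 0.98 threshold comes from the text; reading "confidence" as the softmax maximum is an assumption, and low_light_loader is hypothetical.

```python
import torch

pretrained = backbone.eval()                      # stand-in for the normal-light classifier
pseudo_images, pseudo_labels = [], []
with torch.no_grad():
    for batch in low_light_loader:
        probs = torch.softmax(pretrained(curve.enhance(batch)), dim=1)
        confidence, label = probs.max(dim=1)
        keep = confidence >= 0.98                 # discard low-confidence predictions
        pseudo_images.append(batch[keep])
        pseudo_labels.append(label[keep])
```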
Step 6: Migrate the pre-trained model to the low-light domain using the labeled normal-illumination dataset collected in step 1 and the pseudo-labeled low-illumination dataset obtained in step 5. The flowchart is shown in Fig. 3. Training of the pre-trained model uses a cross-entropy loss with a batch size of 64 and runs for 7000 iterations; the initial learning rate is 0.001 and decays by a factor of 0.1 after the 2000th, 4000th, and 6000th iterations. Training uses an SGD optimizer with momentum 0.9 and weight decay 0.00001, and applies data augmentation including random cropping, horizontal flipping, color jittering, and random rotation.
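A sketch of this fine-tuning setup. The loss, batch regime, learning-rate schedule, and optimizer settings follow the text; the crop size, jitter strengths, rotation range, and the mixed_loader over labeled normal-light plus pseudo-labeled low-light data are illustrative assumptions.

```python
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torchvision import transforms

augment = transforms.Compose([                    # augmentation parameters are assumptions
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.RandomRotation(15),
])

optimizer = SGD(backbone.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-5)
scheduler = MultiStepLR(optimizer, milestones=[2000, 4000, 6000], gamma=0.1)

for step, (x, y) in enumerate(mixed_loader):      # normal-light + pseudo-labeled low-light
    if step == 7000:
        break
    loss = F.cross_entropy(backbone(augment(x)), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```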
Step 7: In the inference stage, a low-illumination image to be classified is first brightened with the deep concave curve model trained in step 4 and then fed to the low-illumination classification model trained in step 6 (i.e., the fine-tuned pre-trained model) to obtain the prediction result, a vector whose size equals the number of classification categories in the dataset.
For the face detection task, the pre-trained model adopts the Dual Shot Face Detector DSFD; a low-illumination image to be detected is first brightened with the deep concave curve model trained in step 4 and then fed to the DSFD fine-tuned in step 6 to obtain the predicted face bounding-box coordinates.
For the behavior recognition task, the pre-trained model adopts the two-stream inflated 3D convolutional network I3D; a low-illumination video to be recognized is first brightened with the deep concave curve model trained in step 4 and then fed to the I3D fine-tuned in step 6 to obtain a behavior recognition prediction for each frame of the video, i.e., a vector whose size equals the number of behavior categories in the dataset.
For the optical flow estimation task, the pre-trained model adopts the pyramid-warping-cost-volume optical flow estimation network PWC-Net; a low-illumination image to be processed is first brightened with the deep concave curve model trained in step 4 and then fed to the PWC-Net fine-tuned in step 6 to obtain the positional offset of each pixel at the next moment.
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it; a person skilled in the art may modify the technical solution of the present invention or substitute equivalents without departing from its spirit and scope, and the protection scope of the present invention shall be determined by the claims.

Claims (9)

1. An unsupervised low-illumination-domain adaptive training method, comprising the following steps:
1) collecting labeled normal-illumination training data, unlabeled low-illumination training data, and a pre-trained model, wherein the pre-trained model is a visual task model trained on the labeled normal-illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-trained model to obtain a first model, wherein the multilayer perceptron maps the features extracted by the feature extractor into the representation space of a self-supervised task;
2) training the first model with the labeled normal-illumination training data, locking the parameters of the feature extractor during training, and training only the multilayer perceptron;
3) constructing a deep concave curve model that predicts, for each pixel value of the input image, the pixel value after brightness enhancement, and placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) training the second model with the low-illumination training data, locking the parameters of the feature extractor and of the multilayer perceptron during training, and training only the deep concave curve model;
5) brightening the low-illumination training data with the trained deep concave curve model, feeding the brightened result to the pre-trained model, predicting labels for the low-illumination training data, and taking the predicted labels as pseudo labels of the low-illumination training data; and
6) training the pre-trained model with the labeled normal-illumination training data and the pseudo-labeled low-illumination training data to obtain the fine-tuned pre-trained model.
2. The method according to claim 1, wherein the multilayer perceptron adopts a fully connected layer - batch normalization layer - linear rectification function - fully connected layer network structure; and the first model is trained with the normal-illumination training data by: first rotating and partitioning the normal-illumination training data to obtain a plurality of image blocks; then shuffling the order of the image blocks, feeding the shuffled image to the feature extractor for feature extraction, and sending the extracted features to the multilayer perceptron, which predicts the order of the image blocks from the input features; wherein the loss function used to train the first model is $L_C(\hat{o}_N, o_N)$, where $L_C$ is the cross-entropy loss function, $o_N$ is the dictionary index, among all image-block permutation schemes, of the order into which the normal-illumination image $N$ was shuffled, and $\hat{o}_N$ is the image-block order predicted by the multilayer perceptron.
3. The method according to claim 1 or 2, wherein the deep concave curve model comprises, in sequence, a down-sampling layer, a U-net network, a convolutional layer, a global pooling layer, and a fully connected layer; the down-sampling layer down-samples the input image and feeds it to the U-net network, the U-net network extracts features from the input data and feeds them to the convolutional layer, and the convolutional layer performs further feature extraction and feeds the extracted features, in sequence, to the global pooling layer and the fully connected layer to obtain the prediction result.
4. The method of claim 3, wherein the second model is trained with the low-illumination training data by: first brightening the low-illumination training data with the deep concave curve model to obtain brightened images; then rotating and partitioning the brightened images to obtain a plurality of image blocks; then shuffling the order of the image blocks, feeding the shuffled image to the feature extractor for feature extraction, and sending the extracted features to the multilayer perceptron, which predicts the order of the image blocks from the input features; wherein the loss function used to train the second model is $L_C(\hat{o}_L, o_L)$, where $L_C$ is the cross-entropy loss function, $o_L$ is the dictionary index, among all image-block permutation schemes, of the order into which the low-illumination image $L$ was shuffled, and $\hat{o}_L$ is the image-block order predicted by the multilayer perceptron.
5. The method of claim 3, wherein the deep concave curve model comprises two convolutional layers; that is, the deep concave curve model comprises, in sequence, a down-sampling layer, a U-net network, a first convolutional layer, a second convolutional layer, a global pooling layer, and a fully connected layer.
6. The method of claim 1, wherein for the classification task the pre-trained model adopts ResNet-18; for the face detection task, DSFD; for the behavior recognition task, I3D; and for the optical flow estimation task, PWC-Net.
7. An unsupervised low-illumination-domain image visual task detection method, comprising the following steps:
1) collecting labeled normal-illumination training data, unlabeled low-illumination training data, and a pre-trained model, wherein the pre-trained model is a visual task model trained on the labeled normal-illumination training data; connecting a multilayer perceptron behind the feature extractor of the pre-trained model to obtain a first model, wherein the multilayer perceptron adopts a fully connected layer - batch normalization layer - linear rectification function - fully connected layer network structure;
2) training the first model with the labeled normal-illumination training data, locking the parameters of the feature extractor during training, and training only the multilayer perceptron;
3) constructing a deep concave curve model that predicts, for each pixel value of the input image, the pixel value after brightness enhancement, and placing the deep concave curve model in front of the feature extractor in the first model to obtain a second model;
4) training the second model with the low-illumination training data, locking the parameters of the feature extractor and of the multilayer perceptron during training, and training only the deep concave curve model;
5) brightening the low-illumination training data with the trained deep concave curve model, feeding the brightened result to the pre-trained model, predicting labels for the low-illumination training data, and taking the predicted labels as pseudo labels of the low-illumination training data;
6) training the pre-trained model with the labeled normal-illumination training data and the pseudo-labeled low-illumination training data to obtain a fine-tuned pre-trained model; and
7) feeding a low-illumination image to be processed to the trained deep concave curve model for brightening, feeding the brightened image to the fine-tuned pre-trained model, and outputting the corresponding visual task detection result.
8. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method of any one of claims 1 to 7.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202211129606.6A 2022-09-16 2022-09-16 Unsupervised low-illumination-area self-adaptive training method and detection method Pending CN115641483A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211129606.6A CN115641483A (en) 2022-09-16 2022-09-16 Unsupervised low-illumination-area self-adaptive training method and detection method
PCT/CN2022/130218 WO2024055398A1 (en) 2022-09-16 2022-11-07 Unsupervised low-illumination-domain adaptive training method and detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211129606.6A CN115641483A (en) 2022-09-16 2022-09-16 Unsupervised low-illumination-area self-adaptive training method and detection method

Publications (1)

Publication Number Publication Date
CN115641483A true CN115641483A (en) 2023-01-24

Family

ID=84941611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211129606.6A Pending CN115641483A (en) 2022-09-16 2022-09-16 Unsupervised low-illumination-area self-adaptive training method and detection method

Country Status (2)

Country Link
CN (1) CN115641483A (en)
WO (1) WO2024055398A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020065403A1 (en) * 2018-09-28 2020-04-02 Sinha Pavel Machine learning using structurally regularized convolutional neural network architecture
CN112069921A (en) * 2020-08-18 2020-12-11 浙江大学 Small sample visual target identification method based on self-supervision knowledge migration
CN112508815A (en) * 2020-12-09 2021-03-16 中国科学院深圳先进技术研究院 Model training method and device, electronic equipment and machine-readable storage medium
CN114693545A (en) * 2022-02-15 2022-07-01 北京大学 Low-illumination enhancement method and system based on curve family function

Also Published As

Publication number Publication date
WO2024055398A1 (en) 2024-03-21

Similar Documents

Publication Publication Date Title
CN111814854B (en) Target re-identification method without supervision domain adaptation
CN113505792B (en) Multi-scale semantic segmentation method and model for unbalanced remote sensing image
CN111612010B (en) Image processing method, device, equipment and computer readable storage medium
CN111611847A (en) Video motion detection method based on scale attention hole convolution network
CN113361645B (en) Target detection model construction method and system based on meta learning and knowledge memory
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN113283356B (en) Multistage attention scale perception crowd counting method
CN111932431A (en) Visible watermark removing method based on watermark decomposition model and electronic equipment
US20230154005A1 (en) Panoptic segmentation with panoptic, instance, and semantic relations
CN116704431A (en) On-line monitoring system and method for water pollution
CN111242870B (en) Low-light image enhancement method based on deep learning knowledge distillation technology
CN115082781A (en) Ship image detection method and device and storage medium
CN117994573A (en) Infrared dim target detection method based on superpixel and deformable convolution
CN114170422A (en) Coal mine underground image semantic segmentation method
CN110768864B (en) Method and device for generating images in batches through network traffic
CN116935438A (en) Pedestrian image re-recognition method based on autonomous evolution of model structure
CN116824140A (en) Small sample segmentation method for test scene non-mask supervision
CN116630694A (en) Target classification method and system for partial multi-label images and electronic equipment
CN115641483A (en) Unsupervised low-illumination-area self-adaptive training method and detection method
CN110647917A (en) Model multiplexing method and system
CN114067155B (en) Image classification method, device, product and storage medium based on meta learning
CN115546689A (en) Video time sequence abnormal frame detection method based on unsupervised frame correlation
CN113627342A (en) Method, system, device and storage medium for video depth feature extraction optimization
CN115131844A (en) Unsupervised low-illumination face detection model training method and detection method
CN113837243A (en) RGB-D camera dynamic visual odometer method based on edge information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination