CN114022732A - Extremely dark light object detection method based on RAW image - Google Patents


Info

Publication number
CN114022732A
CN114022732A (application CN202111294930.9A)
Authority
CN
China
Prior art keywords
image
inverse
object detection
dim light
noise
Prior art date
Legal status
Granted
Application number
CN202111294930.9A
Other languages
Chinese (zh)
Other versions
CN114022732B (en)
Inventor
Fu Ying (付莹)
Hong Yang (洪阳)
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202111294930.9A priority Critical patent/CN114022732B/en
Priority claimed from CN202111294930.9A external-priority patent/CN114022732B/en
Publication of CN114022732A publication Critical patent/CN114022732A/en
Application granted granted Critical
Publication of CN114022732B publication Critical patent/CN114022732B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an extremely-dim-light object detection method based on RAW images. For low-light images acquired by a conventional image sensor, the method establishes a dim-light synthesis pipeline that follows the physical imaging process, constructs a high-quality simulated dim-light object detection data set from existing normal-light object detection data set resources, and trains an accurate dim-light object detection network. The invention can perform high-quality detection of extremely dim objects using existing, ordinary image acquisition equipment. It significantly reduces the image acquisition and human labeling resources needed to build an extremely-dim-light object detection data set while achieving efficient, high-precision detection, improves detection accuracy, broadens the application scenarios of object detectors, and addresses a bottleneck in the object detection field. The invention can be used in fields such as deep space exploration, deep sea exploration, biomedicine, and near-Earth exploration.

Description

Extremely dark light object detection method based on RAW image
Technical Field
The invention relates to a dim-light image detection data generation method for detection under extreme conditions, and in particular to a method for obtaining high-quality, highly realistic extremely dim images, belonging to the technical field of computational vision.
Background
Extremely-dim-light object detection is a technology that accomplishes the object detection task under low-light conditions: it can effectively detect target objects in dim images with low brightness, obvious noise, and a low signal-to-noise ratio.
In a low-light scene with limited light sources, the image sensor receives few photons within the exposure time and is limited by its physical characteristics, so captured images exhibit low brightness, obvious noise, and a low signal-to-noise ratio, which severely degrades the information they contain. Moreover, the 8-bit quantized JPEG images captured by existing acquisition devices are RGB images (R for red, G for green, B for blue) that have been processed by a built-in ISP (Image Signal Processor) pipeline. Information is lost in this process, so weak signals in the output image are usually severely distorted, and scene information is sometimes even permanently lost. Such degradation harms not only the visual quality of low-light images but also the performance of all downstream algorithms applied to them.
Therefore, retaining and mining more information from low-light scenes and images is one of the most important problems in extremely-dim-light detection.
To detect target objects effectively in a low-light scene or image, a common solution is to apply dim-light enhancement and image denoising before detection, so that missing scene information is restored to some extent by the enhancement algorithm and the image becomes more likely to be recognized by the subsequent detector. However, existing learning-based dim-light enhancement and denoising algorithms struggle to obtain good results on extremely dim images with limited information, and they lack a targeted design for extracting the semantic features needed by the subsequent object detection algorithm. In addition, because of the extra enhancement step, the cascaded two-stage "enhancement + detection" approach incurs additional computational burden, greatly increases the latency of the low-light detection task, and seriously hinders deployment on resource-limited but widely used embedded or mobile devices.
In recent years, learning-based end-to-end dedicated dim-light object detection networks have emerged, using convolutional neural networks to automatically extract features from noisy dim-light data. However, existing learning-based methods generally rely on training data synthesized by simply darkening normal-light images with gamma or linear transforms, possibly with a simple additive noise model. Although these methods give good results on synthetic data, they still perform poorly on real data and cannot be properly evaluated, because real, labeled extremely-dim-light object detection data is lacking.
Currently there are two approaches to this problem. The first is to collect paired real data for training and evaluating an extremely-dim-light detection network, as in normal-light detection; but collecting and producing a real low-light data set is expensive, time-consuming, and hard to label, so building a high-quality dim-light object detection data set requires substantial manpower and material resources. The second is to generate realistic simulated data, which both exploits the abundant normal-light object detection data resources and saves a large amount of data production cost; here the key is the accuracy of the simulation pipeline that synthesizes low-light images. Gaussian and Poisson noise are common simulated noise models, yet real dim images produced at different photon levels contain more complex noise. Furthermore, images captured by existing cameras and public image resources are almost all RGB images processed by a built-in ISP pipeline, in which information has already been lost; because of the physical characteristics of the image sensor and the complexity of the imaging process, simply adding noise to, or directly darkening, the images of existing normal-light data sets is inconsistent with how noise actually arises during real imaging. Inaccurate noise descriptions and noise injection patterns severely degrade the quality of the simulated data set.
Disclosure of Invention
The invention aims to provide an extremely-dim-light object detection method based on RAW (unprocessed sensor) images, addressing the deficiencies of the prior art: existing two-stage methods need extra computing resources, end-to-end algorithms depend on real paired labeled data sets, and compression of ordinary RGB-format images causes information loss.
The innovations of the invention are as follows. The image signal processing and imaging pipeline of the acquisition device is analyzed; combined with the RAW format, which maximally preserves the limited information of a dim scene, a dim-light RAW image synthesis pipeline based on normal-light RGB images is established according to the essential characteristics of extremely dim images. The parameters of each component of the synthesis pipeline are customized for the given image acquisition device, a high-quality simulated dim-light object detection data set is constructed, an object detection neural network is trained, and all labeled images of an existing object detection data set are converted into low-light counterparts. This significantly saves the image acquisition and human resources needed to build an extremely-dim-light object detection data set while achieving efficient, high-precision detection and improving detection accuracy.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme.
An extremely-dim-light object detection method based on RAW images comprises the following steps:
Step 1: perform degradation processing on the RGB-format image to generate a RAW-format image, eliminating the influence of the image processing pipeline on the subsequent dim-light noise modeling. The degradation comprises: inverse tone mapping, inverse gamma compression, inverse color correction, inverse white balance, and inverse demosaicing.
Step 2: perform noise injection. Establish a physics-based additive noise model and additively inject realistic noise into the unprocessed RAW image to simulate a noisy image captured in an extremely low-light environment.
Step 3: input a clean normal-light object detection data set, perform the inverse-ISP degradation together with the physical additive noise model, and generate noisy extremely-dim-light image data by simulation. In this way, a realistic synthetic RAW-format extremely-dim-light object detection data set can be constructed that reuses the labeled data of the normal-light data set.
Step 4: based on the data set generated in step 3, select a base object detection convolutional neural network, and establish a dual-branch extremely-dim-light object detection network and its training objective function, improving the network's feature extraction and enhancement capability and its classification/regression accuracy on information-limited extremely dim images; perform feature computation and prediction, and output the class and location information of targets.
Step 5: input an extremely dim test image, and use the dual-branch extremely-dim-light object detection network and training objective established in step 4 to detect and box target-class objects in the real noisy extremely dim image, achieving efficient, high-precision extremely-dim-light detection and improving detection quality.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
1. For common image sensor devices, the method analyzes the device's image signal processing pipeline and, exploiting the advantages of RAW-format image data, establishes a dim-light RAW image synthesis pipeline based on normal-light RGB images according to the essential characteristics of extremely dim images. The parameters of each pipeline component are customized for the given acquisition device, and a physical noise model matching the real noise distribution is injected, so the imaging process of the sensor in an extremely dim scene is effectively simulated and the fidelity of the realistic synthetic extremely-dim-light detection data set is further improved.
2. The method takes a clean normal-light object detection data set as input and, using the constructed dim-light RAW synthesis pipeline and physical noise model, simulates extremely-dim-light detection images from abundant existing normal-light data, converting the images into low-light counterparts while directly reusing the existing labels to build a realistic synthetic data set. Constructing this data set requires neither capturing extremely dim images nor later manual object labeling, significantly saving the image acquisition and human resources needed to build an extremely-dim-light object detection data set.
3. The method uses an optimized convolutional neural network to extract features from extremely dim images and to predict their location and class information. Without adding extra computation or runtime, it improves the generalization of the convolutional neural network and the quality of extremely-dim-light detection, broadens the application scenarios of object detectors, and ensures the stability and robustness of the detection network.
4. The method achieves high detection quality and is suitable for fields such as deep space exploration, deep sea exploration, biomedicine, and near-Earth exploration.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the internal details of constructing the high-quality realistic synthetic dim-light object detection data set in the method of the present invention.
FIG. 3 is a schematic representation of the detection of an object in extreme darkness by the method of the present invention.
FIG. 4 is a schematic diagram of the internal details of the dim light enhancement submodule in the dim light object detection network structure according to the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in FIG. 1, an extremely-dim-light object detection method based on RAW images comprises the following steps:
Step 1: perform degradation processing on the RGB-format image to generate a RAW-format image.
Specifically, for an input sRGB image, its RAW-format counterpart is synthesized by inverting the camera's image signal processing (including inverse tone mapping, inverse gamma compression, inverse color correction, inverse white balance, and inverse demosaicing), eliminating the influence of the image processing pipeline on the subsequent dim-light noise modeling.
Step 1 comprises the following sub-steps:
Step 1.1: for the input sRGB image, first perform the inverse tone mapping operation. Assume tone mapping is performed with a smoothstep curve, and use the inverse of this curve when generating synthetic data.
The tone-mapping step of a conventional sequential ISP pipeline is modeled by the mapping function of Equation 1, matching the characteristic curve of the image; on this basis, the inverse tone mapping is implemented by Equation 2, the inverse of Equation 1:
smoothstep(x) = 3x² - 2x³   (1)
smoothstep⁻¹(y) = 1/2 - sin(arcsin(1 - 2y) / 3)   (2)
where smoothstep() is the mapping function, smoothstep⁻¹() is the inverse tone-mapping function, x is the normalized input image of the tone mapping operation in the forward ISP flow, and y is the normalized input image of the inverse tone mapping operation.
Note that before the mapping operation the input image must be normalized by pixel value, i.e., its pixel values scaled to the range [0, 1].
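For illustration, the forward and inverse smoothstep curves of Equations 1 and 2 can be sketched in NumPy as follows; the closed-form inverse below is the standard inverse of the smoothstep cubic, and the function names and the commented usage are illustrative rather than taken from the patent.

```python
import numpy as np

def smoothstep(x):
    """Forward tone mapping of Equation 1: smoothstep(x) = 3x^2 - 2x^3."""
    x = np.clip(x, 0.0, 1.0)
    return 3.0 * x ** 2 - 2.0 * x ** 3

def inverse_smoothstep(y):
    """Inverse tone mapping of Equation 2 (closed-form inverse of Equation 1)."""
    y = np.clip(y, 0.0, 1.0)
    return 0.5 - np.sin(np.arcsin(1.0 - 2.0 * y) / 3.0)

# Hypothetical usage: normalize an 8-bit sRGB image to [0, 1] first (step 1.1),
# then undo the tone mapping:
# srgb = image_uint8.astype(np.float32) / 255.0
# linearized = inverse_smoothstep(srgb)
```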
Step 1.2: and (3) carrying out inverse gamma compression operation on the output result of the inverse tone mapping operation in the step 1.1.
Since the human eye is more sensitive to gradual changes in dark areas of the image, the resulting gamma compression in the conventional sequential ISP flow is typically used to allocate more dynamic range bits to low intensity pixels. Typically, the operation uses a standard gamma curve as shown in equation 3, with a minimum e 10-8And performing truncation processing on the input of the gamma curve to prevent the numerical value from being unstable during subsequent training. Based on this, the inverse gamma compression operation is implemented using an inverse approximation of equation 4, i.e., equation 3 (the inverse of gamma compression is implemented as an approximate implementation due to the presence of the parameter ∈).
Γ(x′)=max(x′,∈)1/2.2 (3)
Γ-1(y′)=max(y′,∈)2.2 (4)
Wherein Γ () is a gamma compression operation, Γ-1() For the inverse gamma compression operation, x 'is the normalized input image of the gamma compression operation in the forward ISP flow, and y' is the normalized input image of the inverse gamma compression operation, i.e. the output of the last inverse tone mapping operation (i.e. the output result of y after inverse tone mapping operation in step 1.1).
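A minimal sketch of Equations 3 and 4 follows, using the truncation constant ε = 10⁻⁸ from the text; the function names are illustrative.

```python
import numpy as np

EPS = 1e-8  # truncation constant epsilon, preventing numerical instability

def gamma_compress(x):
    """Gamma compression of Equation 3: max(x, eps)^(1/2.2)."""
    return np.maximum(x, EPS) ** (1.0 / 2.2)

def inverse_gamma_compress(y):
    """Approximate inverse gamma compression of Equation 4: max(y, eps)^2.2."""
    return np.maximum(y, EPS) ** 2.2
```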
Step 1.3: and (3) applying the inverse of a sampling Color Correction Matrix (CCM) to cancel the color correction effect on the output result y' of the inverse gamma compression operation in the step 1.2, so that the sRGB image is restored into the camera space RGB image, and the inverse color correction operation is carried out.
Step 1.4: sampling the inverse gain of the digital and white balance, performing inverse gain on each channel of the output result of the inverse color correction operation obtained in the step 1.3 by using the sampling result, recovering the influence of the white balance algorithm on the image illumination condition in the ISP flow, and performing inverse white balance operation.
Specifically, in order to correctly process the intensity of the saturated image, the inverse white balance operation is performed using equations 5 and 6 for the output result obtained in step 1.3. Here, instead of simply applying the inverse gain 1/g to a certain image intensity Θ by multiplicative operation, the intensity Θ of the composite image is reduced while keeping highlights with a highlight preservation transformation f (Θ, g).
Figure BDA0003336224630000061
Figure BDA0003336224630000062
Where α () represents a highlight retention factor, g is a white balance gain value, and t is a gain threshold. For example, for a threshold t of 0.9, the highlight preservation transform f (Θ, g) is linear when g ≦ 1 or Θ ≦ t. That is, when Θ ≦ t, f (Θ, g) is x/g; f (theta, g) is 1 when g is less than or equal to 1, and f (theta, g) is continuously differentiable; however, when g >1 and Θ > t, then there is a cubic transformation.
Step 1.5: the target data set follows the convention of performing demosaicing using bilinear interpolation, reversing the process for each pixel of the result output at step 1.4.
Each pixel in a conventional camera sensor is covered by a red, green, or blue filter arranged in a bayer pattern (e.g., R-G-B). According to the bayer filter pattern, two of the three color values thereof are omitted, thereby completely implementing the inverse demosaicing operation.
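A sketch of the inverse demosaicing follows, assuming an R-G-G-B Bayer layout; the actual layout depends on the target sensor.

```python
import numpy as np

def mosaic_rggb(img):
    """Inverse demosaicing (step 1.5): keep the single Bayer-covered color
    value at each pixel of an HxWx3 image and discard the other two."""
    h, w, _ = img.shape
    raw = np.zeros((h, w), dtype=img.dtype)
    raw[0::2, 0::2] = img[0::2, 0::2, 0]  # R at even rows / even cols
    raw[0::2, 1::2] = img[0::2, 1::2, 1]  # G at even rows / odd cols
    raw[1::2, 0::2] = img[1::2, 0::2, 1]  # G at odd rows / even cols
    raw[1::2, 1::2] = img[1::2, 1::2, 2]  # B at odd rows / odd cols
    return raw
```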
After the operations of step 1, the inverse-ISP degradation of the RGB image is complete: the RGB image is converted into the RAW format originally acquired by the camera, eliminating the interference of the various nonlinear operations in the camera's ISP.
Step 2: inject noise. Realistic noise is additively injected into the unprocessed RAW image to simulate a noisy image captured in a very low-light environment.
Compared with the commonly used Gaussian or heteroscedastic Gaussian noise models, the method adopts a noise model based on physical characteristics to simulate the complex structure of real noise, yielding synthesized extremely dim images closer to reality. By analyzing how photons pass through the electron, voltage, and digital-number stages of the imaging process, the model considers photon shot noise, read noise, banding (stripe) pattern noise, and quantization noise, modeling them respectively with a Poisson distribution, a long-tailed Tukey-lambda distribution, a Gaussian distribution, and a uniform distribution. Combined with parameter calibration, it builds a joint noise-parameter/system-gain probability distribution, accurately representing the real noise structure and producing simulated extremely-dim-light synthetic data closer to the real noise distribution.
Specifically, noise is injected into clean RAW-format normal-light image data according to the linear model of Equation 7:
D = K·I + N   (7)
where I is the number of photons, K is the system gain, and N is the additive noise.
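The sketch below illustrates the additive structure of Equation 7 with the four noise components named above. Every distribution parameter, gain range, and bit depth here is an illustrative placeholder; in the method these values come from the calibrated joint noise-parameter/system-gain distribution of the specific sensor.

```python
import numpy as np
from scipy import stats

def inject_physical_noise(clean_raw, rng):
    """Sketch of D = K*I + N (Equation 7) on a clean single-channel RAW
    image normalized to [0, 1]."""
    K = np.exp(rng.uniform(-2.0, 1.0))  # placeholder system-gain range

    # Photon shot noise: Poisson in the photon/electron domain.
    photons = np.maximum(clean_raw / K, 0.0)
    shot = rng.poisson(photons) * K

    # Read noise: zero-mean, long-tailed Tukey-lambda distribution.
    read = stats.tukeylambda.rvs(0.1, loc=0.0, scale=0.02,  # placeholder params
                                 size=clean_raw.shape, random_state=rng)

    # Banding (stripe) pattern noise: one Gaussian offset shared per row.
    row = rng.normal(0.0, 0.005, size=(clean_raw.shape[0], 1))

    # Quantization noise: uniform over one quantization step (14-bit assumed).
    q = 1.0 / (2 ** 14 - 1)
    quant = rng.uniform(-q / 2, q / 2, size=clean_raw.shape)

    return shot + read + row + quant
```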
And step 3: inputting a clean normal light object detection data set, and simulating to generate a noisy extremely dim light image by using the inverse ISP degradation processing operation designed in the step 1 and the physical noise additive model established in the step 2. As shown in fig. 2.
All the labeled data resources of the normal light object detection data set, such as object coordinate frames, labels and the like, can be used for constructing the RAW format real synthetic extreme dim light object detection data set.
The process of constructing the real synthetic data set does not need to collect noisy extremely dim light images and do not need to carry out later manual object labeling work, and image collection resources and human resources used for constructing the extremely dim light object detection data set are obviously saved.
And 4, step 4: as shown in the training stage flow of fig. 3, a basic object detection convolutional neural network is selected in combination with the real synthetic data pipeline in step 3, as shown in fig. 4, a two-branch extreme dim light object detection network and a training target function thereof are established, the feature extraction enhancement capability and the classification regression accuracy of the network on the extreme dim light image with limited information are improved, feature calculation and prediction are performed, and the category information and the position information of the target are output.
The training objective function Loss_total of the extremely-dim-light object detection network is:
Loss_total = λ_LRM·L_LRM + λ_cla·L_cla + λ_loc·L_loc   (8)
In Equation 8, λ_LRM, λ_cla and λ_loc are balance weights for the corresponding loss terms; L_LRM is the loss function of the branch responsible for feature extraction and enhancement; L_cla is the loss function for predicting target class information; and L_loc is the loss function for predicting target location information.
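The patent specifies only the weighted three-term form of Equation 8. In the PyTorch-style sketch below, the concrete per-term losses (L1 restoration, cross-entropy classification, smooth-L1 box regression) and the unit balance weights are common choices assumed for illustration.

```python
import torch.nn.functional as F

def total_loss(restored, clean_target, cls_logits, cls_labels,
               box_pred, box_target,
               lambda_lrm=1.0, lambda_cla=1.0, lambda_loc=1.0):
    """Equation 8: Loss_total = λ_LRM·L_LRM + λ_cla·L_cla + λ_loc·L_loc.
    The per-term losses and unit weights are illustrative placeholders."""
    l_lrm = F.l1_loss(restored, clean_target)        # feature extraction / enhancement branch
    l_cla = F.cross_entropy(cls_logits, cls_labels)  # target class prediction
    l_loc = F.smooth_l1_loss(box_pred, box_target)   # target location regression
    return lambda_lrm * l_lrm + lambda_cla * l_cla + lambda_loc * l_loc
```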
Step 5: input an extremely dim test image, and use the dual-branch extremely-dim-light object detection network established in step 4 and its training objective function to detect and box target-class objects in the real noisy extremely dim image, achieving efficient, high-precision detection and improving detection quality.
Specifically, the data-driven dual-branch end-to-end extremely-dim-light object detection network model targets two tasks, dim-light image restoration and target recognition. Through multi-task learning, backbone sharing, and knowledge transfer, during the training stage the model optimizes the internal parameters of the detection network using the latent scene information and semantic features of the restored images, transferring and sharing the knowledge learned by the two tasks. After training, only the target recognition branch needs to be computed at test time to obtain the extremely-dim-light detection result, so the detection performance of the base object detection convolutional neural network on extremely dim images is greatly improved without extra computation or runtime.
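Structurally, the train/test asymmetry described above can be sketched as follows; the module contents are placeholders, and only the branch wiring reflects the text.

```python
import torch.nn as nn

class DualBranchDetector(nn.Module):
    """Shared backbone feeding a restoration branch (training-time
    supervision only) and a detection branch (train and test time)."""

    def __init__(self, backbone, restore_head, detect_head):
        super().__init__()
        self.backbone = backbone
        self.restore_head = restore_head
        self.detect_head = detect_head

    def forward(self, noisy_raw):
        feats = self.backbone(noisy_raw)
        detections = self.detect_head(feats)
        if self.training:
            # The restoration branch only contributes to the training loss,
            # so inference adds no extra computation or runtime.
            return detections, self.restore_head(feats)
        return detections
```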
Preferably, the network training of step 4 and the extremely-dim-image detection of step 5 can be performed on a GPU, and the cuDNN library can be used to accelerate the convolutional neural network.
Examples
To illustrate the effect of the invention, this example compares various methods under identical experimental conditions.
1. Experimental conditions
The hardware test conditions were: a P40 GPU with 24 GB of video memory. The extremely-dim-light object detection data used for testing is a real extremely-dim-light object detection data set manually labeled by experts in the relevant field.
2. Results of the experiment
The disclosed method is compared with different extremely-dim-light object detection schemes to verify its effectiveness from multiple angles.
Table 1 Comparison of extremely-dim-light object detection schemes
[Table 1 is reproduced as an image in the original publication; it reports the AP scores of the compared schemes at each darkness level.]
As the results in Table 1 show, the disclosed method achieves a very good detection effect on top of an existing object detection network: its AP on real extremely dim images of various darkness levels is higher than that of the existing comparison schemes, whether direct detection or the two-stage "enhancement + detection" scheme. The darkness levels are divided into six grades (×10, ×20, ×30, ×40, ×50, and ×100) by changing the camera's exposure time and ISO parameters; the AP value is a model evaluation metric widely used in object detection tasks, mainly evaluating the confidence-based prediction accuracy of the model's class and location outputs. The disclosed method therefore attains better detection precision on real images across darkness levels and outperforms the other methods.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An extremely-dim-light object detection method based on RAW images, characterized by comprising the following steps:
step 1: performing degradation processing on an RGB-format image to generate a RAW-format image;
step 2: performing noise injection;
establishing a physics-based additive noise model, and additively injecting realistic noise into the unprocessed RAW image to simulate a noisy image captured in an extremely low-light environment;
step 3: inputting a clean normal-light object detection data set, performing inverse-ISP degradation processing with the physical additive noise model, and generating noisy extremely-dim-light image data by simulation;
step 4: selecting a base object detection convolutional neural network according to the data set generated in step 3, establishing a dual-branch extremely-dim-light object detection network and its training objective function, improving the network's feature extraction and enhancement capability and its classification/regression accuracy on information-limited extremely dim images, performing feature computation and prediction, and outputting class and location information of targets;
step 5: inputting an extremely dim test image, and using the dual-branch extremely-dim-light object detection network established in step 4 and its training objective function to detect and box target-class objects in the real noisy extremely dim image.
2. The method according to claim 1, characterized in that the degradation processing of step 1 comprises: inverse tone mapping, inverse gamma compression, inverse color correction, inverse white balance, and inverse demosaicing.
3. The method according to claim 2, characterized in that the degradation processing is implemented as follows:
step 1.1: for the input sRGB image, first performing an inverse tone mapping operation;
assuming tone mapping is performed with a smoothstep curve, and using the inverse of this curve when generating synthetic data;
the tone-mapping step of a conventional sequential ISP pipeline is modeled by the mapping function of Equation 1, matching the characteristic curve of the image, and the inverse tone mapping is implemented by Equation 2, the inverse of Equation 1:
smoothstep(x) = 3x² - 2x³   (1)
smoothstep⁻¹(y) = 1/2 - sin(arcsin(1 - 2y) / 3)   (2)
where smoothstep() is the mapping function, smoothstep⁻¹() is the inverse tone-mapping function, x is the normalized input image of the tone mapping in the forward ISP flow, and y is the normalized input image of the inverse tone mapping;
step 1.2: performing an inverse gamma compression operation on the output of the inverse tone mapping of step 1.1;
using the standard gamma curve of Equation 3, with a minimum value ε = 10⁻⁸ truncating the input of the gamma curve, and implementing the inverse gamma compression with Equation 4, an approximate inverse of Equation 3:
Γ(x′) = max(x′, ε)^(1/2.2)   (3)
Γ⁻¹(y′) = max(y′, ε)^2.2   (4)
where Γ() is the gamma compression operation, Γ⁻¹() is the inverse gamma compression operation, x′ is the normalized input image of the gamma compression in the forward ISP flow, and y′ is the normalized input image of the inverse gamma compression, i.e., the output of the preceding inverse tone mapping;
step 1.3: applying the inverse of a sampled color correction matrix to the output y′ of the inverse gamma compression of step 1.2 to cancel the color-correction effect, restoring the sRGB image to a camera-space RGB image, i.e., performing the inverse color correction operation;
step 1.4: sampling inverse digital and white-balance gains, applying them to each channel of the output of the inverse color correction of step 1.3, undoing the effect of the white-balance algorithm on the image illumination in the ISP flow, i.e., performing the inverse white balance operation;
step 1.5: following the convention that the target data set is demosaiced using bilinear interpolation, reversing that process for each pixel of the output of step 1.4.
4. A method as claimed in claim 3, wherein the input image is normalized by scaling the pixel values of the input image to between 0 and 1 before the mapping operation of step 1.1.
5. The method according to claim 3, characterized in that in step 1.4, to correctly handle saturated image intensities, the inverse white balance is performed on the output of step 1.3 using Equations 5 and 6; a highlight preservation transform f(Θ, g) reduces the intensity Θ of the synthesized image while keeping highlights:
α(Θ) = (max(Θ - t, 0) / (1 - t))²   (5)
f(Θ, g) = max(Θ/g, (1 - α(Θ))·(Θ/g) + α(Θ)·Θ)   (6)
where α() is the highlight preservation factor, g is the white-balance gain value, and t is the gain threshold; the transform is linear, f(Θ, g) = Θ/g, when g ≤ 1 or Θ ≤ t, and f is continuously differentiable; when g > 1 and Θ > t, f becomes a cubic function of Θ.
6. The method according to claim 1, characterized in that in step 2, a noise model based on physical characteristics is used to simulate the complex structure of real noise;
the model is built from a Poisson distribution, a long-tailed Tukey-lambda distribution, a Gaussian distribution, and a uniform distribution, combined with parameter calibration to construct a joint noise-parameter/system-gain probability distribution, accurately representing the real noise structure and yielding simulated extremely-dim-light synthetic data closer to the real noise distribution;
noise is injected into clean RAW-format normal-light image data according to the linear model of Equation 7:
D = K·I + N   (7)
where I is the number of photons, K is the system gain, and N is the additive noise.
7. The method according to claim 1, characterized in that in step 3, the realistic synthetic RAW-format extremely-dim-light object detection data set is constructed by reusing all labeled data resources of the normal-light object detection data set.
8. The method according to claim 1, characterized in that in step 4, the training objective function Loss_total of the extremely-dim-light object detection network is:
Loss_total = λ_LRM·L_LRM + λ_cla·L_cla + λ_loc·L_loc   (8)
where λ_LRM, λ_cla and λ_loc are balance weights for the corresponding loss terms, L_LRM is the loss function of the branch responsible for feature extraction and enhancement, L_cla is the loss function for predicting target class information, and L_loc is the loss function for predicting target location information.
9. The method as claimed in claim 1, wherein the step 4 network training process and the step 5 dark light image detection process are performed by using a GPU.
10. The method of claim 1, wherein a cuDNN library is used to speed up the operation of the convolutional neural network.
CN202111294930.9A 2021-11-03 Ultra-dim light object detection method based on RAW image Active CN114022732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111294930.9A CN114022732B (en) 2021-11-03 Ultra-dim light object detection method based on RAW image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111294930.9A CN114022732B (en) 2021-11-03 Ultra-dim light object detection method based on RAW image

Publications (2)

Publication Number Publication Date
CN114022732A true CN114022732A (en) 2022-02-08
CN114022732B CN114022732B (en) 2024-05-31



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090297056A1 (en) * 2008-05-28 2009-12-03 Micron Technology, Inc. Method and apparatus for extended depth-of-field image restoration
CN111161170A (en) * 2019-12-18 2020-05-15 江苏科技大学 Underwater image comprehensive enhancement method for target recognition
CN111080555A (en) * 2019-12-22 2020-04-28 北京理工大学 Hyperspectral image noise reduction method based on three-dimensional quasi-recurrent neural network
CN113159019A (en) * 2021-03-08 2021-07-23 北京理工大学 Dark light video enhancement method based on optical flow transformation

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063434A (en) * 2022-05-12 2022-09-16 北京理工大学 Low-low-light image instance segmentation method and system based on feature denoising
CN114972100A (en) * 2022-06-02 2022-08-30 Oppo广东移动通信有限公司 Noise model estimation method and device, and image processing method and device
CN115082357A (en) * 2022-07-20 2022-09-20 深圳思谋信息科技有限公司 Video denoising data set generation method and device, computer equipment and storage medium
CN115082357B (en) * 2022-07-20 2022-11-25 深圳思谋信息科技有限公司 Video denoising data set generation method and device, computer equipment and storage medium
CN116402724A (en) * 2023-06-08 2023-07-07 江苏游隼微电子有限公司 RYB format RAW image color restoration method
CN116402724B (en) * 2023-06-08 2023-08-11 江苏游隼微电子有限公司 RYB format RAW image color restoration method

Similar Documents

Publication Publication Date Title
Brooks et al. Unprocessing images for learned raw denoising
Liang et al. Cameranet: A two-stage framework for effective camera isp learning
CN108229526B (en) Network training method, network training device, image processing method, image processing device, storage medium and electronic equipment
CN111741211B (en) Image display method and apparatus
CN110276767A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN111079764B (en) Low-illumination license plate image recognition method and device based on deep learning
CN107292830B (en) Low-illumination image enhancement and evaluation method
CN110675328A (en) Low-illumination image enhancement method and device based on condition generation countermeasure network
CN113822830B (en) Multi-exposure image fusion method based on depth perception enhancement
CN111915525A (en) Low-illumination image enhancement method based on improved depth separable generation countermeasure network
WO2023005818A1 (en) Noise image generation method and apparatus, electronic device, and storage medium
WO2023086194A1 (en) High dynamic range view synthesis from noisy raw images
CN111985314B (en) Smoke detection method based on ViBe and improved LBP
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
CN113409355A (en) Moving target identification system and method based on FPGA
CN116681636A (en) Light infrared and visible light image fusion method based on convolutional neural network
CN111428730B (en) Weak supervision fine-grained object classification method
Huang et al. Underwater image enhancement based on color restoration and dual image wavelet fusion
CN110555877B (en) Image processing method, device and equipment and readable medium
CN112819858B (en) Target tracking method, device, equipment and storage medium based on video enhancement
CN110415816B (en) Skin disease clinical image multi-classification method based on transfer learning
CN115358962B (en) End-to-end visual odometer method and device
CN114022732B (en) Ultra-dim light object detection method based on RAW image
CN114022732A (en) Extremely dark light object detection method based on RAW image
CN115131340A (en) Power plant pulverized coal leakage identification method, device and equipment and scale storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant