CN114022732B - Ultra-dim light object detection method based on RAW image
- Publication number: CN114022732B
- Application number: CN202111294930.9A
- Authority: CN (China)
- Prior art keywords: image, object detection, inverse, extremely, noise
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses an extremely dim light object detection method based on RAW images. For low-light images acquired by a conventional image sensor, a dim-light synthesis pipeline is established according to the physical imaging process, a high-quality simulated dim-light object detection data set is constructed from existing normal-light object detection data resources, and an accurate dim-light object detection network is trained. The invention performs extremely dim light object detection with high quality using existing common image acquisition equipment; it significantly saves the image-acquisition and human resources needed to construct an extremely dim light object detection data set while realizing efficient and accurate detection, improves detection precision, expands the application scenarios of object detectors, and breaks through a bottleneck in the object detection field. The invention can be applied in many fields, such as deep space exploration, deep sea exploration, biomedicine and near-earth exploration.
Description
Technical Field
The invention relates to a method for generating dim-light image detection data for detection under extreme conditions, and in particular to a method for obtaining high-quality, high-fidelity simulated dim-light images; it belongs to the technical field of computational vision.
Background
Extremely dim light object detection is a technique that accomplishes the object detection task under very low light conditions, effectively detecting target objects in dim images with low brightness, obvious noise and a low signal-to-noise ratio.
In low-light scenes with limited light sources, the image sensor receives fewer photons within the exposure time and is limited by its physical characteristics, so the captured image has low brightness, obvious noise and a low signal-to-noise ratio, which severely degrades the information contained in the image. At present, the 8-bit quantized JPEG image output by common capture devices is an RGB image (R for Red, G for Green, B for Blue) processed by a built-in ISP (Image Signal Processing) algorithm. Semantic information is lost in this process, so weak signals in the output image are often severely distorted, and scene information is sometimes even permanently lost. This degradation not only has a serious negative impact on the visual quality of low-light images, but also on the performance of all downstream image algorithms.
Therefore, preserving and mining more information from low-light scenes and images is one of the most important tasks in extremely low-light detection.
To effectively detect target objects in a low-light scene or image, a common solution is to apply dim-light enhancement and image denoising to the target scene or image before detection, so that missing scene information is restored to some extent by the enhancement algorithm and the image contains more information that a subsequent detector can recognize. However, existing learning-based dim-light enhancement and denoising algorithms struggle to obtain good results on extremely dim images with limited information, and lack a targeted design for extracting the semantic features required by the subsequent object detection algorithm. In addition, because of the extra enhancement step, the cascaded two-stage "enhancement + detection" approach incurs additional computational burden, greatly increases the time consumption of low-light detection tasks, and seriously hinders deployment on resource-limited but widely used embedded or mobile devices.
In recent years, learning-based end-to-end dedicated dim-light object detection networks have emerged, which use convolutional neural networks to automatically extract features from noisy dim-light data. However, existing learning-based methods generally rely on training data sets synthesized by simply darkening natural-illumination images with a gamma transformation or a linear scaling, possibly with a simple additive noise model. Although these methods achieve good results on synthetic data, they still perform poorly on real data, owing to the lack of labeled real extremely dim light object detection data for training and evaluation.
Currently, there are two approaches to this problem. The first is to acquire paired real data to train and evaluate an extremely dim light detection network, analogous to normal-light detection; however, collecting and producing a real low-light data set is expensive, time-consuming and hard to annotate, and a high-quality dim-light object detection data set requires a large amount of manpower and material resources. The second is to generate realistic simulated data, which both fully exploits the existing rich normal-light object detection data resources and saves a large amount of data-production cost; here the key is the accuracy of the simulation pipeline used to synthesize low-light images. Gaussian and Poisson noise are common simulated noise models, but in practice dim images produced at different photon levels contain more complex noise. Moreover, images captured by existing cameras and publicly available image resources are almost all RGB images processed by a built-in ISP algorithm, in which semantic information has been lost; because of the physical characteristics of the image sensor and the complexity of the imaging process, simply raising or lowering the brightness of existing normal-light images is inconsistent with the noise generation process of real imaging. Both inaccurate noise description and inaccurate noise addition have a significant negative impact on the quality of the simulated data set.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, in particular that existing two-stage methods require extra computing resources, that end-to-end algorithms depend on real paired labeled data sets, and that information is lost when an ordinary RGB image is compressed, and creatively provides an extremely dim light object detection method based on RAW (unprocessed) images.
The innovation of the invention is as follows: the image signal processing and imaging pipeline of the image acquisition device is analyzed; the RAW format is used to retain, to the maximum extent, the limited information available in a dark scene; and a dim-light RAW image synthesis pipeline based on normal-light RGB images is established according to the intrinsic characteristics of extremely dim images. For a given image acquisition device, the parameters of each component of the synthesis pipeline are set accordingly, a high-quality simulated dim-light object detection data set is constructed, and an object detection neural network is trained; every annotated image in an existing object detection data set can thus be converted into its low-light counterpart. The method significantly saves the image-acquisition and human resources needed to construct an extremely dim light object detection data set, realizes efficient and accurate extremely dim light object detection, and improves detection precision.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme.
An extremely dim light object detection method based on RAW images comprises the following steps:
Step 1: perform degradation processing on the RGB-format image to generate a RAW-format image, eliminating the influence of the image processing pipeline on subsequent dim-light noise modeling. The degradation processing includes: inverse tone mapping, inverse gamma compression, inverse color correction, inverse white balance and inverse demosaicing operations.
Step 2: perform noise injection. A physics-based additive noise model is built, and realistic noise is injected additively into the unprocessed RAW image to simulate a noisy image captured in an extremely low light environment.
Step 3: input a clean normal-light object detection data set, and apply the inverse ISP degradation with the physical additive noise model to simulate noisy extremely dim image data. Together with the annotation data of the normal-light object detection data set, a realistic synthetic extremely dim light object detection data set in RAW format can be constructed.
Step 4: based on the data set generated in step 3, select a basic object detection convolutional neural network and establish a dual-branch extremely dim light object detection network and its training objective function, improving the network's feature extraction and enhancement capability and its classification and regression accuracy on information-limited extremely dim images; perform feature computation and prediction, and output the category and position information of the target.
Step 5: input an extremely dim test image, and use the dual-branch extremely dim light object detection network and training objective function established in step 4 to detect and frame objects of the target classes in the real extremely dim noisy image, realizing efficient and accurate extremely dim light object detection and improving detection quality.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
1. For common image sensor devices, the method analyzes their image signal processing pipeline, exploits the advantages of RAW-format image data, and establishes a dim-light RAW image synthesis pipeline based on normal-light RGB images according to the intrinsic characteristics of dim images. For a given image acquisition device, the parameters of each component of the synthesis pipeline are set accordingly, and a physical noise model conforming to the real noise distribution is injected, which effectively simulates the imaging process of the image sensor in a dim scene and thereby improves the fidelity of the realistic synthetic dim-light detection data set.
2. The method takes a clean normal-light object detection data set as input and, using the established dim-light RAW image synthesis pipeline and physical noise model, simulates extremely dim object detection images from the rich existing normal-light data resources, converting each image into its low-light counterpart while reusing the existing annotation data to construct a realistic synthetic data set. Constructing this data set requires neither capturing extremely dim images nor later manual object annotation, which significantly saves the image-acquisition and human resources needed to construct an extremely dim light object detection data set.
3. The method uses an optimized convolutional neural network to extract features from extremely dim images and to predict object positions and categories; it improves the generalization of the convolutional neural network without adding extra computation or run time, improves the quality of extremely dim light detection, expands the application scenarios of object detectors, and ensures the stability and robustness of the extremely dim light object detection network.
4. The method has high detection quality and is suitable for many fields, such as deep space exploration, deep sea exploration, biomedicine and near-earth exploration.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic illustration of the internal details of the method of the present invention for constructing a high quality true synthetic extremely dark object detection dataset.
FIG. 3 is a schematic representation of the method of the present invention for extremely dim light object detection.
FIG. 4 is a schematic diagram showing the internal details of the dim-light enhancement module in the extremely dim light object detection network of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, an extremely dim light object detection method based on a RAW image includes the following steps:
Step 1: perform degradation processing on the RGB-format image to generate a RAW-format image.
Specifically, for an input sRGB image, its corresponding RAW-format image is synthesized by inverting the camera image signal processing (including tone mapping, gamma compression, color correction, white balance and demosaicing), thereby eliminating the effect of the image processing pipeline on subsequent dim-light noise modeling.
Step 1 comprises the following steps:
Step 1.1: for an input sRGB image, an inverse tone mapping operation is first performed. Tone mapping is performed with a smooth step curve, and the inverse of this curve is used when generating the synthetic data.
The tone mapping step in the conventional sequential ISP flow is assumed to be implemented with the mapping function shown in equation 1, so as to match the characteristic curve of the image; on this basis, the inverse tone mapping operation is implemented with equation 2, i.e. the inverse of equation 1:

smoothstep(x) = 3x² - 2x³ (1)

smoothstep⁻¹(y) = 1/2 - sin(arcsin(1 - 2y)/3) (2)

where smoothstep() represents the mapping function, smoothstep⁻¹() represents the inverse tone mapping function, x is the normalized input image of the tone mapping operation in the forward ISP flow, and y is the normalized input image of the inverse tone mapping operation.
Further, before the mapping operation, the input image needs to be subjected to pixel value normalization processing, that is, the pixel value of the image is scaled to be between 0 and 1.
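As an illustrative sketch (not a definitive implementation; it assumes NumPy arrays with pixel values already normalized to [0, 1], and the function names are placeholders), the forward and inverse tone mapping of step 1.1 can be written as:

```python
import numpy as np

def smoothstep(x):
    """Forward tone mapping (equation 1): smoothstep(x) = 3x^2 - 2x^3."""
    x = np.clip(x, 0.0, 1.0)
    return 3.0 * x**2 - 2.0 * x**3

def inverse_smoothstep(y):
    """Inverse tone mapping (equation 2): the closed-form inverse of
    smoothstep on [0, 1], i.e. 1/2 - sin(arcsin(1 - 2y) / 3)."""
    y = np.clip(y, 0.0, 1.0)
    return 0.5 - np.sin(np.arcsin(1.0 - 2.0 * y) / 3.0)

# Stand-in for a normalized sRGB image (step 1.1 normalization assumed done).
srgb = np.random.rand(256, 256, 3).astype(np.float32)
linearized = inverse_smoothstep(srgb)
```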
Step 1.2: and (3) performing inverse gamma compression operation on the output result of the inverse tone mapping operation in the step 1.1.
Since the human eye is more sensitive to gradual changes in the dark areas of an image, gamma compression in a conventional sequential ISP scheme is typically used to assign more dynamic-range bits to low-intensity pixels. This operation usually uses the standard gamma curve shown in equation 3, with the input to the gamma curve truncated at a minimum value ε = 10⁻⁸ to prevent numerical instability during subsequent training. On this basis, the inverse gamma compression operation is implemented with equation 4, the approximate inverse of equation 3 (the inverse is approximate because of the parameter ε).
Γ(x′) = max(x′, ε)^(1/2.2) (3)

Γ⁻¹(y′) = max(y′, ε)^2.2 (4)

where Γ() is the gamma compression operation, Γ⁻¹() is the inverse gamma compression operation, x′ is the normalized input image of the gamma compression operation in the forward ISP flow, and y′ is the normalized input image of the inverse gamma compression operation, i.e. the output of the preceding inverse tone mapping operation (the result y after the inverse tone mapping of step 1.1).
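A corresponding sketch of equations 3 and 4, under the same assumptions as above:

```python
import numpy as np

EPS = 1e-8  # the truncation constant from equations 3 and 4

def gamma_compress(x):
    """Forward gamma compression (equation 3)."""
    return np.maximum(x, EPS) ** (1.0 / 2.2)

def inverse_gamma(y):
    """Approximate inverse gamma compression (equation 4)."""
    return np.maximum(y, EPS) ** 2.2
```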
Step 1.3: apply the inverse of a sampled color correction matrix (CCM) to the output y′ of the inverse gamma compression operation of step 1.2 to cancel the color correction effect, restoring the sRGB image to a camera-space RGB image; this is the inverse color correction operation.
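An illustrative sketch of this inverse color correction; the CCM values below are hypothetical placeholders (in practice the matrix is sampled for the given device):

```python
import numpy as np

# Hypothetical example CCM (rows sum to 1); sampled per device in step 1.3.
ccm = np.array([[ 1.7, -0.5, -0.2],
                [-0.3,  1.6, -0.3],
                [-0.1, -0.6,  1.7]], dtype=np.float32)

def inverse_color_correction(srgb, ccm):
    """Map sRGB back to camera-space RGB by applying the inverse CCM
    to each pixel's 3-vector."""
    inv = np.linalg.inv(ccm)
    return srgb @ inv.T  # (H, W, 3) @ (3, 3), applied per pixel
```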
Step 1.4: sample the inverse of the digital and white-balance gains, apply the inverse gain to each channel of the output of the inverse color correction operation of step 1.3, and thereby undo the effect of the white balance algorithm on the image illumination in the ISP flow; this is the inverse white balance operation.
Specifically, in order to correctly process the intensities of saturated pixels, the inverse white balance operation is performed on the output result obtained in step 1.3 using equations 5 and 6. Instead of simply applying the inverse gain 1/g to an image intensity Θ by multiplication, a highlight-preserving transformation f(Θ, g) is used, which reduces the intensity of the synthesized image while keeping highlights:

f(Θ, g) = max(Θ/g, (1 - α(Θ))·(Θ/g) + α(Θ)·Θ) (5)

α(Θ) = (max(Θ - t, 0)/(1 - t))² (6)

where α() represents the highlight-preserving factor, g is the white balance gain value, and t is the gain threshold. For example, with threshold t = 0.9: when g ≤ 1 or Θ ≤ t, the highlight-preserving transformation f(Θ, g) is linear, i.e. f(Θ, g) = Θ/g; f(1, g) = 1, and f(Θ, g) is continuously differentiable; when g > 1 and Θ > t, a cubic transformation is applied.
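A sketch of the highlight-preserving inverse gain, assuming the quadratic highlight factor of equation 6 reconstructed above (parameter values are illustrative):

```python
import numpy as np

def inverse_white_balance(img, gains, t=0.9):
    """Highlight-preserving inverse white-balance gains (equations 5 and 6).

    img:   (H, W, 3) image with values in [0, 1].
    gains: per-channel white-balance gains g sampled in step 1.4.
    """
    out = np.empty_like(img)
    for c, g in enumerate(gains):
        x = img[..., c]
        alpha = (np.maximum(x - t, 0.0) / (1.0 - t)) ** 2          # equation 6
        out[..., c] = np.maximum(x / g,
                                 (1.0 - alpha) * (x / g) + alpha * x)  # equation 5
    return out
```

Note that for g ≤ 1 the max() automatically selects the linear branch Θ/g, so no special case is needed in the code.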
Step 1.5: the target data set follows the convention of demosaicing with bilinear interpolation; this process is reversed for each pixel of the output of step 1.4.

Each pixel of a conventional camera sensor is covered by a red, green or blue filter arranged in a Bayer pattern (e.g. R-G-G-B). Following the Bayer filter pattern, two of the three color values of each pixel are omitted, which completes the inverse demosaicing operation.
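A sketch of the inverse demosaicing, assuming an RGGB Bayer layout and even image dimensions:

```python
import numpy as np

def mosaic_rggb(rgb):
    """Inverse demosaicing (step 1.5): keep one color value per pixel
    according to an RGGB Bayer pattern and omit the other two."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B
    return raw
```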
After the operations of step 1 are completed, the inverse ISP degradation of the RGB image is realized: the RGB image is converted into the RAW-format image originally acquired by the camera, eliminating the interference of the various nonlinear operations performed during the camera's ISP processing.
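Putting the five sub-steps together, a sketch of the whole inverse ISP of step 1, reusing the helper functions sketched above:

```python
def unprocess(srgb, ccm, gains):
    """Compose the inverse ISP of step 1 (illustrative; assumes the helper
    sketches defined earlier are in scope)."""
    x = inverse_smoothstep(srgb)           # step 1.1: inverse tone mapping
    x = inverse_gamma(x)                   # step 1.2: inverse gamma compression
    x = inverse_color_correction(x, ccm)   # step 1.3: inverse color correction
    x = inverse_white_balance(x, gains)    # step 1.4: inverse white balance
    return mosaic_rggb(x)                  # step 1.5: inverse demosaicing
```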
Step 2: noise injection. Realistic noise is injected additively into the unprocessed RAW image to simulate a noisy image captured in an extremely low light environment.
Compared with the Gaussian or heteroscedastic Gaussian noise models commonly used today, the method adopts a noise model based on physical characteristics to simulate the complex structure of real noise, thereby obtaining synthetic extremely dim images closer to reality. The model analyzes how photons pass through the electron, voltage and digital-number stages of the imaging process, and considers photon shot noise, read noise, stripe (banding) pattern noise and quantization noise, modeled respectively by a Poisson distribution, a long-tailed Tukey-lambda distribution, a Gaussian distribution and a uniform distribution. Combined with parameter calibration, a joint probability distribution of noise parameters and system gain is constructed, so that the real noise structure can be represented accurately and simulated extremely dim synthetic data close to the real noise distribution is obtained.
Specifically, the noisy data D is generated from the clean RAW-format normal-light image by the linear noise model of equation 7:
D=KI+N (7)
where I represents the number of photons, K represents the system gain, and N represents the noise model.
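An illustrative sampling sketch of this model; all parameter values below are placeholders, whereas the method draws them from the calibrated noise-parameter and system-gain joint distribution:

```python
import numpy as np
from scipy.stats import tukeylambda

def inject_physical_noise(clean_raw, K=0.25, lam=-0.1, sigma_read=0.01,
                          sigma_row=0.005, q=1.0 / (2**10 - 1)):
    """Sample from the physics-based model D = K*I + N (equation 7).

    clean_raw: (H, W) clean RAW image with values in [0, 1].
    """
    # Photon shot noise: the photon count I follows a Poisson distribution.
    photons = np.random.poisson(np.maximum(clean_raw, 0.0) / K)
    d = K * photons.astype(np.float64)
    # Read noise: long-tailed Tukey-lambda distribution.
    d += tukeylambda.rvs(lam, scale=sigma_read, size=clean_raw.shape)
    # Stripe (banding) pattern noise: one Gaussian sample shared per row.
    d += sigma_row * np.random.randn(clean_raw.shape[0], 1)
    # Quantization noise: uniform over one quantization step.
    d += np.random.uniform(-0.5 * q, 0.5 * q, size=clean_raw.shape)
    return d
```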
Step 3: input a clean normal-light object detection data set, and use the inverse ISP degradation designed in step 1 together with the physical additive noise model established in step 2 to simulate noisy extremely dim images, as shown in fig. 2.

All annotation resources of the normal-light object detection data set, such as object bounding boxes and labels, can be reused to construct the RAW-format realistic synthetic extremely dim light object detection data set.

In constructing the realistic synthetic data set, no noisy extremely dim images need to be captured and no later manual object annotation is required, which significantly saves the image-acquisition and human resources used to construct an extremely dim light object detection data set.
Step 4: combining the realistic synthetic data pipeline of step 3 (training-stage flow shown in fig. 3), select a basic object detection convolutional neural network and establish a dual-branch extremely dim light object detection network and its training objective function (shown in fig. 4), improving the network's feature extraction and enhancement capability and its classification and regression accuracy on information-limited extremely dim images; perform feature computation and prediction, and output the category and position information of the target.
The training objective function Loss_total of the extremely dim light object detection network is:

Loss_total = λ_LRM·L_LRM + λ_cla·L_cla + λ_loc·L_loc (8)

In equation 8, λ_LRM, λ_cla and λ_loc are the loss-balancing weights; L_LRM represents the loss function responsible for feature extraction and enhancement in the extremely dim light object detection network, L_cla represents the loss function responsible for predicting the target category information, and L_loc represents the loss function responsible for predicting the target position information.
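A minimal sketch of equation 8; the default weights are placeholders, as the balance weights are obtained during training:

```python
def loss_total(loss_lrm, loss_cla, loss_loc,
               lambda_lrm=1.0, lambda_cla=1.0, lambda_loc=1.0):
    """Weighted sum of the three branch losses (equation 8)."""
    return lambda_lrm * loss_lrm + lambda_cla * loss_cla + lambda_loc * loss_loc
```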
Step 5: input an extremely dim test image, and use the dual-branch extremely dim light object detection network and training objective function established in step 4 to detect and frame objects of the target classes in the real extremely dim noisy image, realizing efficient and accurate extremely dim light object detection and improving detection quality.
Specifically, the data-driven dual-branch end-to-end extremely dim light object detection network model addresses two tasks respectively: dim image restoration and target recognition. Through multi-task learning, backbone-network sharing and knowledge transfer, the algorithm can, during the training stage, use the latent scene information and semantic features of the restored image to optimize the internal parameters of the detection network, transferring and sharing the knowledge learned in the two tasks. After training is completed, only the target recognition branch needs to be computed in the test stage to obtain the extremely dim light detection result directly, which greatly improves the detection performance of the basic object detection convolutional neural network on extremely dim images without adding extra computation or run time.
Preferably, the network training of step 4 and the extremely dim image detection of step 5 can be performed on a GPU, and the cuDNN library can be used to accelerate the convolutional neural network.
Examples
To illustrate the effect of the invention, this example compares various methods under the same experimental conditions.
1. Experimental conditions
The hardware test conditions of this experiment were: a P40 GPU with 24 GB of memory. The extremely dim light detection data used for testing is a real extremely dim light object detection data set manually annotated by experts in the relevant field.
2. Experimental results
Different extremely dim light object detection schemes are compared to verify the effectiveness of the disclosed detection method from multiple angles and in all respects.
Table 1: Comparison of extremely dim light object detection schemes
From the results in Table 1, the disclosed method achieves a very good detection effect on top of an existing object detection network; existing comparison schemes, such as direct detection and the two-stage "enhancement + detection" scheme, score lower than the method in AP on real extremely dim images of various darkness levels. The darkness levels are obtained mainly by changing the exposure time and ISO parameters of the camera, dividing the dim images into six levels (×10, ×20, ×30, ×40, ×50 and ×100); the AP value, based mainly on confidence, measures the prediction accuracy of the detection model's category and position information and is a model evaluation metric widely used in object detection tasks. The disclosed method therefore achieves better detection precision on real images of various darkness levels and is superior to the other methods.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Claims (7)
1. The method for detecting the extremely dim light object based on the RAW image is characterized by comprising the following steps of:
step 1: performing degradation processing on the RGB format image to generate a RAW format image;
the degradation processing operation includes: inverse tone mapping, inverse gamma compression, inverse color correction, inverse white balance and inverse demosaicing; the specific implementation method is as follows:
Step 1.1: for an input sRGB image, firstly performing inverse tone mapping operation;
tone mapping is performed with a smooth step curve, and the inverse of the curve is used in generating the synthetic data;
the tone mapping step in the conventional sequential ISP flow is set to be implemented using the mapping function shown in equation 1, so as to match the characteristic curve of the image; on this basis, the inverse tone mapping operation is implemented using equation 2, i.e. the inverse of equation 1:

smoothstep(x) = 3x² - 2x³ (1)

smoothstep⁻¹(y) = 1/2 - sin(arcsin(1 - 2y)/3) (2)

wherein smoothstep() represents the mapping function, smoothstep⁻¹() represents the inverse tone mapping function, x is the normalized input image of the tone mapping operation in the forward ISP flow, and y is the normalized input image of the inverse tone mapping operation;
step 1.2: performing inverse gamma compression operation on the output result of the inverse tone mapping operation in the step 1.1;
using the standard gamma curve shown in equation 3, truncating the input of the gamma curve at a minimum value ε = 10⁻⁸, and implementing the inverse gamma compression operation using equation 4, i.e. the approximate inverse of equation 3:

Γ(x′) = max(x′, ε)^(1/2.2) (3)

Γ⁻¹(y′) = max(y′, ε)^2.2 (4)

wherein Γ() is the gamma compression operation, Γ⁻¹() is the inverse gamma compression operation, x′ is the normalized input image of the gamma compression operation in the forward ISP flow, and y′ is the normalized input image of the inverse gamma compression operation, i.e. the output of the preceding inverse tone mapping operation;
step 1.3: applying the inverse of the sampling color correction matrix to the output result y' of the inverse gamma compression operation in the step 1.2 to cancel the color correction effect, so that the sRGB image is restored to the camera space RGB image, and performing the inverse color correction operation;
step 1.4: sampling the inverse gain of the digital and white balance, and performing inverse gain on each channel of the output result of the inverse color correction operation obtained in the step 1.3 by using the sampling result, recovering the influence of the white balance algorithm on the image illumination condition in the ISP flow, and performing inverse white balance operation;
step 1.5: the target dataset follows the convention of performing demosaicing using bilinear interpolation, reversing the process for each pixel of the result output by step 1.4;
step 2: performing noise injection;
establishing a physics-based additive noise model, and injecting realistic noise additively into the unprocessed RAW image to simulate a noisy image captured in an extremely low light environment;
modeling the noise components with a Poisson distribution, a long-tailed Tukey-lambda distribution, a Gaussian distribution and a uniform distribution respectively, and constructing a joint probability distribution of noise parameters and system gain by combining parameter calibration, so as to accurately represent the real noise structure and obtain simulated extremely dim synthetic data close to the real noise distribution;
according to equation 7, the linear injection noise model for the clean RAW format normal light image data D is specifically as follows:
D=KI+N (7)
Wherein I represents the number of photons, K represents the system gain, and N represents the noise model;
Step 3: inputting a clean normal light object detection data set, and performing inverse ISP degradation treatment by using a physical noise additive model to simulate and generate noisy extremely dim light image data;
Step 4: according to the data set generated in the step 3, a basic object detection convolutional neural network is selected, a double-branch extremely-dark object detection network and a training target function thereof are established, the characteristic extraction enhancement capability and the classification regression precision of the network on an extremely-dark image with limited information are improved, characteristic calculation and prediction are carried out, and the category information and the position information of a target are output;
step 5: inputting an extremely dark light image to be tested, and detecting and framing an object of a target class from the real extremely dark light noisy image by utilizing the double-branch extremely dark light object detection network established in the step 4 and the training objective function thereof.
2. The RAW image-based ultra-dim object detection method according to claim 1, wherein the input image is subjected to pixel value normalization processing, i.e., the image is scaled to a pixel value between 0 and 1, prior to the mapping operation in step 1.1.
3. The method for detecting an extremely dim light object based on a RAW image according to claim 1, wherein in step 1.4, in order to correctly process the intensities of saturated pixels, the output result obtained in step 1.3 is subjected to the inverse white balance operation using equations 5 and 6; a highlight-preserving transformation f(Θ, g) is used, reducing the intensity Θ of the synthesized image while preserving highlights:

f(Θ, g) = max(Θ/g, (1 - α(Θ))·(Θ/g) + α(Θ)·Θ) (5)

α(Θ) = (max(Θ - t, 0)/(1 - t))² (6)

wherein α() represents the highlight-preserving factor, g is the white balance gain value, and t is the gain threshold; when g ≤ 1 or Θ ≤ t, f(Θ, g) = Θ/g (linear); f(1, g) = 1 and f(Θ, g) is continuously differentiable; when g > 1 and Θ > t, a cubic transformation is applied.
4. The method for detecting an extremely dark object based on a RAW image according to claim 1, wherein in step 3, all labeling data resources of the normal light object detection data set are used to construct a RAW format true synthetic extremely dark object detection data set.
5. The method for detecting an extremely dim light object based on RAW images as claimed in claim 1, wherein in step 4, the training objective function Loss_total of the extremely dim light object detection network is:

Loss_total = λ_LRM·L_LRM + λ_cla·L_cla + λ_loc·L_loc (8)

wherein λ_LRM, λ_cla and λ_loc are the loss-balancing weights; L_LRM represents the loss function responsible for feature extraction and enhancement in the extremely dim light object detection network, L_cla represents the loss function responsible for predicting the target category information, and L_loc represents the loss function responsible for predicting the target position information.
6. The RAW image-based ultra-dim object detection method according to claim 1, wherein the training process of the network in step 4 and the detection process of the ultra-dim image in step 5 are completed using a GPU.
7. The method for detecting an extremely dim light object based on a RAW image according to claim 1, wherein the operation speed of the convolutional neural network is increased by using cuDNN libraries.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111294930.9A | 2021-11-03 | 2021-11-03 | Ultra-dim light object detection method based on RAW image
Publications (2)

Publication Number | Publication Date
---|---
CN114022732A | 2022-02-08
CN114022732B | 2024-05-31
Family

ID=80060566

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202111294930.9A (CN114022732B, Active) | Ultra-dim light object detection method based on RAW image | 2021-11-03 | 2021-11-03

Country: CN
Families Citing this family (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN115063434B | 2022-05-12 | 2024-06-04 | 北京理工大学 | Low-light image instance segmentation method and system based on feature denoising
CN114972100A | 2022-06-02 | 2022-08-30 | Oppo广东移动通信有限公司 | Noise model estimation method and device, and image processing method and device
CN115082357B | 2022-07-20 | 2022-11-25 | 深圳思谋信息科技有限公司 | Video denoising data set generation method and device, computer equipment and storage medium
CN116402724B | 2023-06-08 | 2023-08-11 | 江苏游隼微电子有限公司 | RYB format RAW image color restoration method
Citations (3)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111161170A | 2019-12-18 | 2020-05-15 | 江苏科技大学 | Underwater image comprehensive enhancement method for target recognition
CN111080555A | 2019-12-22 | 2020-04-28 | 北京理工大学 | Hyperspectral image noise reduction method based on three-dimensional quasi-recurrent neural network
CN113159019A | 2021-03-08 | 2021-07-23 | 北京理工大学 | Dark light video enhancement method based on optical flow transformation

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US8131097B2 | 2008-05-28 | 2012-03-06 | Aptina Imaging Corporation | Method and apparatus for extended depth-of-field image restoration
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant