CN110517203B - Defogging method based on reference image reconstruction - Google Patents

Defogging method based on reference image reconstruction

Info

Publication number
CN110517203B
Authority
CN
China
Prior art keywords
image
foggy
defogging
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910815133.7A
Other languages
Chinese (zh)
Other versions
CN110517203A (en)
Inventor
李晋江
李桂会
范辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Technology and Business University
Original Assignee
Shandong Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Technology and Business University filed Critical Shandong Technology and Business University
Priority to CN201910815133.7A priority Critical patent/CN110517203B/en
Publication of CN110517203A publication Critical patent/CN110517203A/en
Application granted granted Critical
Publication of CN110517203B publication Critical patent/CN110517203B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a defogging method based on reference image reconstruction. The method comprises the following steps: firstly, acquiring haze-free color images and depth-image data sets of different scenes under different lighting with a depth camera, and performing preliminary preprocessing; setting an atmospheric light value and a transmissivity, synthesizing foggy images from the fog-free images, and performing noise-reduction preprocessing on the foggy data set; then selecting, for each foggy image, an image with similar content as its reference image; and finally, constructing an end-to-end convolutional neural network model in which the network successively extracts haze features, removes the haze features, adaptively migrates the textures of the reference image and reconstructs a high-resolution defogged image. Unlike most end-to-end deep-learning methods, the method does not incorporate an explicit atmospheric physical model; instead, it removes haze implicitly by constructing a defogging network and introducing a reference image, enhancing the details of the image while achieving image defogging.

Description

Defogging method based on reference image reconstruction
Technical Field
The invention relates to an image processing method, in particular to an image defogging method based on deep learning.
Background
Mist is an atmospheric phenomenon formed by water droplets or a large number of minute particles suspended in the air. Images photographed in such an environment generally suffer from color distortion, severe degradation of contrast, loss of scene details, and the like. These problems can severely impact the performance of systems that rely on optical imaging instruments, such as urban traffic systems, outdoor monitoring systems and target recognition systems. It has therefore become increasingly necessary to take effective measures to remove haze from images and restore their sharpness.
The image defogging process takes a hazy image as input, eliminates the degradation effects and finally restores an image without haze. Currently, methods for processing foggy-day images fall mainly into three categories. The first is algorithms based on image processing. Such algorithms essentially use image-processing operations to change contrast or brightness and thereby improve the visual effect of the image. However, this approach pays no attention to the cause of image degradation, so defogging is often incomplete and the image is easily distorted. The second category is algorithms based on image restoration. These are grounded in the atmospheric scattering principle and restore the foggy image by inverting the degradation process. Images restored this way look realistic, stay closer to the original scene, handle complex scenes well and preserve details more completely. The third category is deep-learning-based methods. Research on image defogging based on deep learning is currently very active; some methods build on an atmospheric physical model while others learn the mapping directly. These methods can automatically learn complex input-output relationships from data, capturing patterns that are imperceptible to humans. Although they have achieved satisfactory results, they rest on strong assumptions and require a variety of parameters related to image formation that are not always available. Because scene conditions are unpredictable, these methods fail when their assumptions do not hold, such as in underwater environments, high-light or low-light environments, or scenes where the haze is not entirely white.
Disclosure of Invention
(I) Object of the invention
The invention provides a defogging method based on reference image reconstruction, aiming to solve the problems of the low resolution of results produced by traditional defogging methods and the low operation efficiency caused by introducing too many parameters.
(II) technical scheme
In order to achieve the above purpose, the present invention adopts the following technical scheme:
firstly, collecting a data set of a foggy image, taking a high-resolution image which is similar to the image in the data set as a reference image, then constructing an end-to-end convolutional neural network model, adaptively migrating textures of the reference image according to the texture similarity of the foggy image and the reference image, and enhancing details of the image while defogging the image.
The method comprises the following specific steps:
Step 1, creating a synthetic foggy image data set and a real foggy image data set.
The step 1 specifically comprises the following steps:
1.1) Acquiring ground-truth clean images with different brightness in different scenes and the corresponding depth maps with a depth camera; 5000 pairs of data are acquired in total. The acquired scenes are mainly divided into indoor and outdoor scenes, and the scene brightness is divided into high brightness and low brightness;
1.2) Preprocessing the acquired paired images, including aligning the depth map with the color image and fixing them to a uniform size;
1.3) Given the haze-free image J(x), the scene depth map d(x), the atmospheric light value A and the atmospheric scattering coefficient β, the transmission map is calculated as the ground-truth transmission map according to the formula t(x) = e^(−βd(x)). The foggy image is then synthesized according to the atmospheric physical model, the obtained foggy image being expressed as:
(1) I(x) = J(x)t(x) + A(1 − t(x))
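The synthesis of step 1.3 follows the standard atmospheric scattering model; a minimal NumPy sketch (the constants A and β here are illustrative values, not the dataset's actual settings):

```python
import numpy as np

def synthesize_haze(J, depth, A=0.8, beta=1.0):
    """Synthesize a hazy image from a clean image and its depth map.

    J     : float array in [0, 1], shape (H, W, 3) -- haze-free image
    depth : float array, shape (H, W) -- scene depth
    A     : global atmospheric light value
    beta  : atmospheric scattering coefficient
    """
    t = np.exp(-beta * depth)        # transmission map t(x) = e^(-beta d(x))
    t3 = t[..., None]                # broadcast over the color channels
    I = J * t3 + A * (1.0 - t3)      # atmospheric scattering model, eq. (1)
    return I, t

# Example: a tiny 2x2 scene with increasing depth
J = np.full((2, 2, 3), 0.5)
depth = np.array([[0.0, 1.0], [2.0, 3.0]])
I, t = synthesize_haze(J, depth, A=1.0, beta=0.5)
```

At zero depth the transmission is 1 and the pixel keeps its clean value; as depth grows, the pixel is pulled toward the atmospheric light A.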
1.4) 5000 real hazy images are collected online using Google Images to serve as the real hazy image data set.
Step 2, selecting a training set and a test set and performing rapid noise-reduction preprocessing on each.
The step 2 specifically includes the following steps:
2.1) Randomly selecting 3000 pairs from the synthetic foggy data set as a training set and, to avoid dependence of the trained network model on a single data set, randomly selecting a further 1000 pairs from the NYU2 Depth data set for the training set; the test set comprises 2000 images from the synthetic foggy data set, 950 images from the NYU2 Depth data set and the 5000 real foggy images collected online;
2.2) The foggy images are denoised with the FFDNet network: each foggy image is input into the FFDNet denoising network and the denoised foggy image is output.
Step 3, creating a reference image data set for the foggy data set.
The step 3 specifically includes the following steps:
3.1) For each picture in the foggy data set, manually and rapidly retrieving 5 corresponding similar high-resolution reference images using the Baidu image-recognition function;
3.2) Batch-resizing the reference images and the corresponding haze-free images to 256×256 pictures.
Step 4, constructing a defogging network model; the whole defogging network structure consists of two parts: the first part realizes the basic image defogging function and the second part realizes the texture-detail enhancement and recovery function.
The step 4 specifically includes the following steps:
4.1) The first part adopts an encoder-functional layer-decoder network structure. The encoder is responsible for feature extraction. It consists of 4 downsampling layers, each comprising a convolution layer, a normalization layer and an activation layer. A convolution layer with a stride of 2 is used for downsampling in place of the fixed pooling layer of a conventional convolutional network. The output feature of each convolution layer is calculated as:
(2) F = W * X + b
where X represents the image matrix, W the convolution kernel, * the convolution operation and b the bias value. A batch normalization layer is added after each convolution layer in the network; it normalizes the values of each feature over all the samples, stabilizing model training and accelerating convergence. Batch normalization is defined as:
(3) x̂ = (x − μ) / √(σ² + ε),  y = γx̂ + δ
where x denotes the input feature map, μ the mean of x and σ² its variance; the 2 learnable parameters γ and δ of each layer realize scaling and translation, changing the value interval. To speed up training of the convolutional neural network, the ReLU function is used for activation. The ReLU function is defined as:
(4) f(x) = max(0, x)
The intermediate functional layer is responsible for fog removal. It consists of 3 residual blocks and a skip connection. Each residual block is a two-bypass residual block with 3×3 and 5×5 filters respectively, the different bypasses using different convolution kernels. Its operation may be defined as:
(5) x_n = C(W_3^n * x_{n-1} + b_3^n, W_5^n * x_{n-1} + b_5^n)
where W and b denote weight and bias respectively, the superscript denotes the layer they are in, the subscript denotes the size of the convolution kernel used in that layer, C(·,·) denotes the cascade (concatenation) operation, and x_{n-1} and x_n denote the input and output of the residual block. The haze layer is finally removed by element-by-element subtraction;
The reconstruction network is the decoder corresponding to the encoder; it is responsible for recovering the defogged image and consists of 4 upsampling layers. Each upsampling layer consists of a deconvolution layer with a stride of 2, a batch normalization layer and a nonlinear activation layer. The feature map input to each upsampling unit is doubled in size after the deconvolution process;
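The size bookkeeping of the encoder-decoder — four stride-2 convolutions halving the feature map and four stride-2 deconvolutions doubling it back — can be checked with the standard output-size formulas. Kernel size 3, padding 1 and output padding 1 are illustrative assumptions; the patent does not state them:

```python
def conv_out_size(n, k=3, s=2, p=1):
    """Output size of a strided convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def deconv_out_size(n, k=3, s=2, p=1, op=1):
    """Output size of a transposed convolution: (n - 1)*s - 2p + k + op."""
    return (n - 1) * s - 2 * p + k + op

n = 256                          # input pictures are resized to 256x256
for _ in range(4):               # 4 downsampling layers of the encoder
    n = conv_out_size(n)         # 256 -> 128 -> 64 -> 32 -> 16
m = n
for _ in range(4):               # 4 upsampling layers of the decoder
    m = deconv_out_size(m)       # 16 -> 32 -> 64 -> 128 -> 256
```

With these (assumed) parameters the decoder exactly restores the 256×256 input resolution, consistent with "doubled in size after the deconvolution process".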
4.2) The network structure of the second part is composed of the same encoders; the layers of the two encoders are connected by feature matching blocks, and the matched features are respectively cascaded to the decoder, so that an image with richer details is obtained on the defogging basis. The similarity used for the feature exchange is defined as:
(6) D_{i,j} = ⟨B_i / ‖B_i‖, B_j / ‖B_j‖⟩
where B_i and B_j denote the i-th and j-th patches sampled from the neural feature maps, B denotes the neural feature space and D_{i,j} denotes the similarity between the i-th and j-th patches. With T denoting the exchanged feature map, the feature exchange may be represented by the following formula:
(7) T_i = B_{j*},  j* = argmax_j D_{i,j}
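The patch matching behind the feature exchange of step 4.2 — normalized inner-product similarity followed by an argmax over reference patches — can be sketched with flattened patches (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def swap_features(src_patches, ref_patches):
    """For each source patch, return the most similar reference patch.

    src_patches : (n, d) patches sampled from the foggy feature map
    ref_patches : (m, d) patches sampled from the reference feature map
    Similarity is the inner product of L2-normalized patches (eq. 6);
    the exchange takes the best-matching reference patch (eq. 7).
    """
    s = src_patches / np.linalg.norm(src_patches, axis=1, keepdims=True)
    r = ref_patches / np.linalg.norm(ref_patches, axis=1, keepdims=True)
    D = s @ r.T                    # D[i, j]: similarity of patches i and j
    best = D.argmax(axis=1)        # j* = argmax_j D_{i,j}
    return ref_patches[best], best

src = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
T, idx = swap_features(src, ref)   # each src patch picks its nearest ref patch
```

In the full network the swapped patches T are re-assembled into a feature map and concatenated into the decoder; this sketch shows only the matching step.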
Step 5, training the network model and testing with the test set.
The step 5 specifically includes the following steps:
5.1) The network is trained with the synthetic foggy training set. The training objective function represents the average error between the defogged image estimated by the network and the ground-truth fog-free image. Let Ĵ denote the defogged image estimated by the network and J the ground-truth fog-free image; the objective is given by the formula:
(8) L = (1/N) Σ_{i=1}^{N} ‖Ĵ_i − J_i‖²
where Ĵ_i is the defogged image estimated by the network for the i-th foggy image of the training set and J_i is the ground-truth fog-free image of the i-th foggy image of the training set;
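The average-error training objective of step 5.1 can be sketched as a mean squared error in NumPy. Whether the error is averaged per pixel or per image is not specified in the text, so this sketch (function name illustrative) averages the squared L2 norm per image over the batch:

```python
import numpy as np

def defog_loss(J_hat, J):
    """Mean over the batch of the squared L2 error between estimated
    defogged images J_hat and ground-truth fog-free images J.

    J_hat, J : float arrays of shape (N, H, W, C)
    """
    N = J_hat.shape[0]
    diff = (J_hat - J).reshape(N, -1)          # flatten each image
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy batch of two 2x2 RGB images
J_hat = np.zeros((2, 2, 2, 3))
J = np.full((2, 2, 2, 3), 0.5)
loss = defog_loss(J_hat, J)                    # 12 pixels * 0.25 each = 3.0
```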
5.2) The performance of the network is tested with the test data set and evaluated subjectively and with the objective evaluation indexes PSNR and SSIM.
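Of the two objective evaluation indexes, PSNR is straightforward to compute (SSIM involves local window statistics and is omitted here); for images with values in [0, 1]:

```python
import numpy as np

def psnr(x, y, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float('inf')                  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)                     # uniform error of 0.1
p = psnr(a, b)                               # MSE = 0.01 -> 20 dB
```

Higher PSNR means the defogged image is closer to the ground-truth fog-free image; identical images give an infinite PSNR.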
The invention has the beneficial effects that:
(1) The invention realizes end-to-end image defogging and avoids the problem of low operation efficiency caused by introducing too many hyperparameters;
(2) The invention uses similar reference images to recover the details of the defogged image, thereby enhancing the target visualization of the image;
(3) The method can process images of different scenes, namely indoor and outdoor synthetic foggy images as well as real-world foggy images of brighter and darker areas, and the network model has a better defogging effect than traditional neural network methods.
Description of the drawings:
FIG. 1 is a schematic overall flow diagram of an image defogging method based on a residual error network of the present invention;
FIG. 2 is a schematic diagram of an image defogging network constructed in accordance with the present invention;
FIG. 3 is a defogging result image obtained in a brighter scene according to the present invention;
FIG. 4 is a defogging result image obtained in a darker scene according to the present invention;
FIG. 5 is a defogging result image obtained under an indoor scene according to the present invention;
fig. 6 is a defogging result image obtained in an outdoor scene according to the present invention.
The specific embodiment is as follows:
the invention is further described with reference to the drawings and examples;
as shown in fig. 1, the following steps are included.
1) Creating a composite foggy image dataset and a real foggy image dataset:
1.1) Acquiring ground-truth clean images with different brightness in different scenes and the corresponding depth maps with a depth camera; 5000 pairs of data are acquired in total. The acquired scenes are mainly divided into indoor and outdoor scenes, and the scene brightness is divided into high brightness and low brightness;
1.2) Preprocessing the acquired paired images, including aligning the depth map with the color image and fixing them to a uniform size;
1.3) Given the haze-free image J(x), the scene depth map d(x), the atmospheric light value A and the atmospheric scattering coefficient β, the transmission map is calculated as the ground-truth transmission map according to the formula t(x) = e^(−βd(x)). The foggy image is then synthesized according to the atmospheric physical model, the obtained foggy image being expressed as:
(1) I(x) = J(x)t(x) + A(1 − t(x))
1.4) 5000 real hazy images are collected online using Google Images to serve as the real hazy image data set.
2) Selecting a training set and a testing set, and respectively carrying out rapid noise reduction pretreatment on the training set and the testing set:
2.1) Randomly selecting 3000 pairs from the synthetic foggy data set as a training set and, to avoid dependence of the trained network model on a single data set, randomly selecting a further 1000 pairs from the NYU2 Depth data set for the training set; the test set comprises 2000 images from the synthetic foggy data set, 950 images from the NYU2 Depth data set and the 5000 real foggy images collected online;
2.2) The foggy images are denoised with the FFDNet network: each foggy image is input into the FFDNet denoising network and the denoised foggy image is output.
3) For a foggy dataset, a reference image dataset is made:
3.1) For each picture in the foggy data set, manually and rapidly retrieving 5 corresponding similar high-resolution reference images using the Baidu image-recognition function;
3.2) Batch-resizing the reference images and the corresponding haze-free images to 256×256 pictures.
4) Constructing a defogging network model, wherein the whole defogging network structure consists of two parts, the first part realizes a basic image defogging function, and the second part realizes a texture detail enhancement recovery function;
4.1) The first part adopts an encoder-functional layer-decoder network structure. The encoder is responsible for feature extraction. It consists of 4 downsampling layers, each comprising a convolution layer, a normalization layer and an activation layer. A convolution layer with a stride of 2 is used for downsampling in place of the fixed pooling layer of a conventional convolutional network. The output feature of each convolution layer is calculated as:
(2) F = W * X + b
where X represents the image matrix, W the convolution kernel, * the convolution operation and b the bias value. A batch normalization layer is added after each convolution layer in the network; it normalizes the values of each feature over all the samples, stabilizing model training and accelerating convergence. Batch normalization is defined as:
(3) x̂ = (x − μ) / √(σ² + ε),  y = γx̂ + δ
where x denotes the input feature map, μ the mean of x and σ² its variance; the 2 learnable parameters γ and δ of each layer realize scaling and translation, changing the value interval. To speed up training of the convolutional neural network, the ReLU function is used for activation. The ReLU function is defined as:
(4) f(x) = max(0, x)
The intermediate functional layer is responsible for fog removal. It consists of 3 residual blocks and a skip connection. Each residual block is a two-bypass residual block with 3×3 and 5×5 filters respectively, the different bypasses using different convolution kernels. Its operation may be defined as:
(5) x_n = C(W_3^n * x_{n-1} + b_3^n, W_5^n * x_{n-1} + b_5^n)
where W and b denote weight and bias respectively, the superscript denotes the layer they are in, the subscript denotes the size of the convolution kernel used in that layer, C(·,·) denotes the cascade (concatenation) operation, and x_{n-1} and x_n denote the input and output of the residual block. The haze layer is finally removed by element-by-element subtraction;
The reconstruction network is the decoder corresponding to the encoder; it is responsible for recovering the defogged image and consists of 4 upsampling layers. Each upsampling layer consists of a deconvolution layer with a stride of 2, a batch normalization layer and a nonlinear activation layer. The feature map input to each upsampling unit is doubled in size after the deconvolution process;
4.2) The network structure of the second part is composed of the same encoders; the layers of the two encoders are connected by feature matching blocks, and the matched features are respectively cascaded to the decoder, so that an image with richer details is obtained on the defogging basis. The similarity used for the feature exchange is defined as:
(6) D_{i,j} = ⟨B_i / ‖B_i‖, B_j / ‖B_j‖⟩
where B_i and B_j denote the i-th and j-th patches sampled from the neural feature maps, B denotes the neural feature space and D_{i,j} denotes the similarity between the i-th and j-th patches. With T denoting the exchanged feature map, the feature exchange may be represented by the following formula:
(7) T_i = B_{j*},  j* = argmax_j D_{i,j}
5) Training a network model, and testing by using a testing set:
5.1) The network is trained with the synthetic foggy training set. The training objective function represents the average error between the defogged image estimated by the network and the ground-truth fog-free image. Let Ĵ denote the defogged image estimated by the network and J the ground-truth fog-free image; the objective is given by the formula:
(8) L = (1/N) Σ_{i=1}^{N} ‖Ĵ_i − J_i‖²
where Ĵ_i is the defogged image estimated by the network for the i-th foggy image of the training set and J_i is the ground-truth fog-free image of the i-th foggy image of the training set;
5.2) The performance of the network is tested with the test data set and evaluated subjectively and with the objective evaluation indexes PSNR and SSIM; partial test results are shown in figs. 3, 4, 5 and 6.
Fig. 3 shows the processing result for a foggy image in a bright scene: fig. 3(a) is the foggy image and fig. 3(b) is the processing result of the present invention. Fig. 4 shows the processing result for a foggy image in a dark scene: fig. 4(a) is the foggy image and fig. 4(b) is the processing result of the present invention. Fig. 5 shows the processing result for a foggy image in an indoor scene: fig. 5(a) is the foggy image and fig. 5(b) is the processing result of the present invention. Fig. 6 shows the processing result for a foggy image in an outdoor scene: fig. 6(a) is the foggy image and fig. 6(b) is the processing result of the present invention. The results of figs. 3-6 show that the invention has universality and performs well even on images of special scenes.
In summary, the invention discloses a defogging method based on reference image reconstruction. Unlike most end-to-end deep-learning methods, the method is not combined with an explicit atmospheric physical model; instead it removes haze implicitly by constructing a defogging network and introducing a reference image, thereby avoiding the low operation efficiency caused by estimating many hyperparameters and enhancing the details of the image while defogging it. While the foregoing detailed description of the embodiments of the present invention has been presented in conjunction with the drawings, it is not intended to limit the scope of the invention, and it should be understood that those skilled in the art can make various modifications or variations without inventive effort within the scope of the invention described herein.

Claims (4)

1. A defogging method based on reference image reconstruction, the method comprising:
step 1) manufacturing a synthetic foggy image data set and a real foggy image data set;
step 2) selecting a training set and a testing set, and respectively performing rapid noise reduction pretreatment on the training set and the testing set;
step 3) for the foggy training set, a reference image data set is manufactured;
step 4) constructing a defogging network model, wherein the whole defogging network structure consists of two parts, the first part realizes an image defogging function, and the second part realizes a texture detail enhancement recovery function;
step 5) training a network model, and testing by using a testing set;
step 3) for the foggy data set, a reference image data set is manufactured, and the steps are as follows:
3.1) for each picture in the foggy data set, manually and rapidly retrieving 5 corresponding similar high-resolution reference images using the Baidu image-recognition function;
3.2) batch-resizing the pictures to a size of 256×256;
the step 4) is to construct an image defogging network model, the whole defogging network structure is composed of two parts, the first part is used for realizing the image defogging function, and the second part is used for realizing the texture detail enhancement recovery function, and the method comprises the following steps:
4.1) a first part adopts an encoder-functional layer-decoder network structure; wherein the encoder is responsible for realizing the feature extraction function; it consists of 4 downsampling layers, wherein each downsampling layer comprises a convolution layer, a normalization layer and an activation layer; a convolution layer with a stride of 2 is adopted for downsampling in place of the fixed pooling layer of a traditional convolutional network; the middle functional layer is responsible for realizing the fog-elimination function; it is composed of residual blocks and a skip connection; each residual block is a two-bypass residual block with filters of 3×3 and 5×5 respectively, and different bypasses use different convolution kernels; the reconstruction network is the decoder corresponding to the encoder and is responsible for recovering the defogged image, and consists of 4 up-sampling layers; each up-sampling layer consists of a deconvolution layer with a stride of 2, a batch normalization layer and a nonlinear activation layer; the feature map input to each up-sampling unit is doubled in size after the deconvolution process;
wherein each residual block is a two-bypass residual block with 3×3 and 5×5 filters respectively, different bypasses use different convolution kernels, and the operation is defined as:
x_n = C(W_3^n * x_{n-1} + b_3^n, W_5^n * x_{n-1} + b_5^n)
where W and b represent weight and bias respectively, the superscript represents the number of the layer they are in, the subscript represents the size of the convolution kernel used in that layer, C(·,·) represents the cascade operation, and x_{n-1} and x_n represent the input and output of the residual block; the background layer is finally removed by element-by-element subtraction to realize the haze-removal function;
4.2 The network structure of the second part is composed of the same encoders, the layers of the two encoders are connected by adopting a feature matching block, and the matched features are respectively cascaded to the decoder, so that an image with richer details is obtained on the basis of defogging;
the similar-feature exchange is defined as:

D_{i,j} = \langle B_i / \|B_i\|, \; B_j / \|B_j\| \rangle

wherein B_i and B_j represent the i-th and j-th patches sampled from the neural feature maps, \|\cdot\| denotes the norm in the neural feature space, and D_{i,j} represents the similarity between the i-th and the j-th patch; T represents the exchanged feature map, in which each patch is replaced by its most similar counterpart, which can be represented by the following formula:

T_i = B_{j^*}, \quad j^* = \arg\max_j D_{i,j}
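For illustration, a small NumPy sketch of this patch matching (our own reading of the exchange: normalized inner-product similarity D followed by a hard swap of each patch for its best match; the 3×3 patch size and the feature-map shapes are assumptions):

```python
import numpy as np

def extract_patches(feat, k=3):
    """Densely sample k x k patches from a 2D feature map, one flattened
    patch per row."""
    h, w = feat.shape
    patches = [feat[i:i + k, j:j + k].ravel()
               for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.stack(patches)

def match_and_swap(feat_a, feat_b, k=3):
    """For each patch of feat_a, find the most similar patch of feat_b
    under normalized inner-product similarity D[i, j], and return the
    swapped patches T together with the similarity matrix D."""
    A = extract_patches(feat_a, k)
    B = extract_patches(feat_b, k)
    An = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-8)
    Bn = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-8)
    D = An @ Bn.T                 # D[i, j] = <A_i/|A_i|, B_j/|B_j|>
    best = D.argmax(axis=1)       # j* = argmax_j D[i, j]
    T = B[best]                   # exchanged patches
    return T, D

rng = np.random.default_rng(1)
fa = rng.standard_normal((8, 8))
T, D = match_and_swap(fa, fa)     # matching a map against itself
print(D.shape)  # (36, 36): (8 - 3 + 1)^2 patches on each side
```

Matching a feature map against itself is a useful sanity check: the cosine similarity of each patch with itself is maximal, so the swap returns the original patches.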
2. The defogging method based on reference image reconstruction of claim 1, wherein step 1) collects real clear ground-truth images of different scenes at different brightness levels together with the corresponding depth maps, and preprocesses the collected paired images, comprising the following steps:
1.1 Data acquisition is carried out by using a depth camera, 5000 pairs of data are acquired altogether, the acquired scenes are divided into indoor scenes and outdoor scenes, and the brightness of the scenes is divided into high brightness and low brightness;
1.2 Preprocessing the acquired paired images, and carrying out alignment and fixed-size processing on the depth map and the color image;
1.3 Giving a foggy image, a scene depth map, atmospheric illuminance and an atmospheric scattering coefficient, and synthesizing the foggy image according to an atmospheric physical model formula;
1.4 5000 real hazy images are collected on line by using Google pictures to serve as a real hazy image data set.
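The synthesis in step 1.3) follows the standard atmospheric scattering model I = J·t + A·(1 − t) with transmission t = e^(−β·d); a minimal NumPy sketch for illustration (the model formula is standard, while the concrete values of A and β here are ours):

```python
import numpy as np

def synthesize_haze(clear, depth, A=0.8, beta=1.0):
    """Synthesize a hazy image from a clear image and its depth map using
    the atmospheric scattering model:
        I = J * t + A * (1 - t),   t = exp(-beta * depth)
    clear: (H, W[, C]) in [0, 1]; depth: (H, W); A: atmospheric light;
    beta: atmospheric scattering coefficient."""
    t = np.exp(-beta * depth)
    if clear.ndim == 3:               # broadcast t over colour channels
        t = t[..., None]
    return clear * t + A * (1.0 - t)

J = np.full((4, 4, 3), 0.5)           # mid-grey clear image
d = np.zeros((4, 4))                  # depth 0 -> t = 1 -> no haze
print(np.allclose(synthesize_haze(J, d), J))  # True
```

At zero depth the transmission is 1 and the image is unchanged; as depth grows, t decays to 0 and every pixel converges to the atmospheric light A.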
3. The defogging method based on reference image reconstruction of claim 1, wherein step 2) selects a training set and a test set and performs rapid noise-reduction preprocessing on each of them, comprising the following steps:
2.1 Randomly selecting 3000 pairs from the synthetic foggy dataset as a training set, and randomly selecting 1000 pairs from the NYU2 Depth dataset as a training set in order to avoid the dependence of a trained network model on the dataset; the test set comprises 2000 pieces of synthesized foggy data set, 950 pieces of NYU2 Depth data set, and 5000 real foggy images collected on line;
2.2 The foggy image is subjected to noise reduction by using an FFDNet network, the foggy image is input into an FFDNet denoising network, and the denoised foggy image is output.
4. The defogging method based on reference image reconstruction of claim 1, wherein step 5) trains the network model and tests it with the test set, comprising the following steps:
5.1) The network is trained with the synthetic foggy training set, the objective function of the training representing the average error between the defogged image estimated by the network and the ground-truth fog-free image;
5.2) The performance of the network is tested with the test data set and evaluated with the objective evaluation indexes PSNR and SSIM.
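Of the two indexes in step 5.2), PSNR reduces to 10·log10(MAX²/MSE); a minimal NumPy implementation for illustration (SSIM requires a windowed implementation, e.g. scikit-image's structural_similarity, and is omitted here):

```python
import numpy as np

def psnr(ref, est, max_val=1.0):
    """Peak signal-to-noise ratio between a reference image and an
    estimate, both with pixel values in [0, max_val]. Higher is better;
    identical images give infinity."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8))
est = np.full((8, 8), 0.1)   # uniform error of 0.1 -> MSE = 0.01
print(psnr(ref, est))        # ~20 dB, since 10 * log10(1 / 0.01) = 20
```

A uniform pixel error of 0.1 on a [0, 1] image thus corresponds to 20 dB, a convenient mental anchor when reading reported PSNR values.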
CN201910815133.7A 2019-08-30 2019-08-30 Defogging method based on reference image reconstruction Active CN110517203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910815133.7A CN110517203B (en) 2019-08-30 2019-08-30 Defogging method based on reference image reconstruction

Publications (2)

Publication Number Publication Date
CN110517203A CN110517203A (en) 2019-11-29
CN110517203B true CN110517203B (en) 2023-06-23

Family

ID=68629510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910815133.7A Active CN110517203B (en) 2019-08-30 2019-08-30 Defogging method based on reference image reconstruction

Country Status (1)

Country Link
CN (1) CN110517203B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210394A (en) * 2020-01-03 2020-05-29 北京智云视图科技有限公司 Image enhancement technology based on deep decomposition synthesis network
CN113139909B (en) * 2020-01-19 2022-08-02 杭州喔影网络科技有限公司 Image enhancement method based on deep learning
CN111539885B (en) * 2020-04-21 2023-09-19 西安交通大学 Image enhancement defogging method based on multi-scale network
CN113570613A (en) * 2020-04-29 2021-10-29 阿里巴巴集团控股有限公司 Image processing method and device
CN111539896B (en) * 2020-04-30 2022-05-27 华中科技大学 Domain-adaptive-based image defogging method and system
CN112150395A (en) * 2020-10-15 2020-12-29 山东工商学院 Encoder-decoder network image defogging method combining residual block and dense block
CN113689343B (en) * 2021-03-31 2024-06-18 西安理工大学 Single image defogging method for Resnet to calculate Veil
CN113808039B (en) * 2021-09-09 2023-06-27 中山大学 Migration learning defogging method and system based on Gaussian process mapping

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9288458B1 (en) * 2015-01-31 2016-03-15 Hrl Laboratories, Llc Fast digital image de-hazing methods for real-time video processing
CN107977932A (en) * 2017-12-28 2018-05-01 北京工业大学 Face image super-resolution reconstruction method based on a generative adversarial network with discriminable attribute constraints
CN108564549A (en) * 2018-04-20 2018-09-21 福建帝视信息科技有限公司 Image defogging method based on a multi-scale densely connected network
CN108665432A (en) * 2018-05-18 2018-10-16 百年金海科技有限公司 Single image defogging method based on a generative adversarial network
CN108961350A (en) * 2018-07-17 2018-12-07 北京工业大学 Painting style transfer method based on saliency matching
CN109146810A (en) * 2018-08-08 2019-01-04 国网浙江省电力有限公司信息通信分公司 Image defogging method based on end-to-end deep learning
CN109300090A (en) * 2018-08-28 2019-02-01 哈尔滨工业大学(威海) Single image defogging method based on a sub-pixel and conditional generative adversarial network
CN109598695A (en) * 2017-09-29 2019-04-09 南京大学 No-reference image fog-level estimation method based on a deep learning network
CN109783655A (en) * 2018-12-07 2019-05-21 西安电子科技大学 Cross-modal retrieval method and apparatus, computer device and storage medium
CN109801232A (en) * 2018-12-27 2019-05-24 北京交通大学 Single image defogging method based on deep learning
CN109949242A (en) * 2019-03-19 2019-06-28 内蒙古工业大学 Image defogging model generation method and apparatus, and image defogging method and apparatus
CN110097609A (en) * 2019-04-04 2019-08-06 上海凌笛数码科技有限公司 Refined embroidery texture transfer method based on sample domain

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101568971B1 (en) * 2011-08-03 2015-11-13 인디안 인스티튜트 오브 테크놀로지, 카라그푸르 Method and system for removal of fog, mist or haze from images and videos
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Deep Network for Simultaneous Stereo Matching and Dehazing; Taeyong Song et al.; 29th British Machine Vision Conference, BMVC 2018; 20190101; pp. 1-12 *
Deep Video Dehazing With Semantic Segmentation; Wenqi Ren et al.; IEEE TRANSACTIONS ON IMAGE PROCESSING; 20190430; vol. 28, no. 4; pp. 1895-1908 *
Underwater image quality improvement method based on convolutional neural networks; Ling Mei; China Master's Theses Full-text Database, Information Science and Technology; 20190715 (no. 7); pp. I138-988 *
Research on objective quality evaluation of dehazed images based on synthetic hazy images; Li Yufei et al.; China Master's Theses Full-text Database, Information Science and Technology; 20180415 (no. 4); pp. I138-2778 *

Similar Documents

Publication Publication Date Title
CN110517203B (en) Defogging method based on reference image reconstruction
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN108230264B (en) Single image defogging method based on ResNet neural network
CN110544213B (en) Image defogging method based on global and local feature fusion
CN111161360B (en) Image defogging method of end-to-end network based on Retinex theory
CN109360155A (en) Single-frame images rain removing method based on multi-scale feature fusion
CN110378849B (en) Image defogging and rain removing method based on depth residual error network
CN112907479A (en) Residual single image rain removing method based on attention mechanism
CN110533614B (en) Underwater image enhancement method combining frequency domain and airspace
CN116579945B (en) Night image restoration method based on diffusion model
CN113284061B (en) Underwater image enhancement method based on gradient network
CN114219722A (en) Low-illumination image enhancement method by utilizing time-frequency domain hierarchical processing
CN112419163A (en) Single image weak supervision defogging method based on priori knowledge and deep learning
CN111553856A (en) Image defogging method based on depth estimation assistance
CN110807744A (en) Image defogging method based on convolutional neural network
CN114022392A (en) Serial attention-enhancing UNet + + defogging network for defogging single image
CN113052776A (en) Unsupervised image defogging method based on multi-scale depth image prior
CN113256538B (en) Unsupervised rain removal method based on deep learning
CN113379861B (en) Color low-light-level image reconstruction method based on color recovery block
CN114187210A (en) Multi-mode dense fog removing method based on visible light-far infrared image
CN114140361A (en) Generation type anti-network image defogging method fusing multi-stage features
CN113689346A (en) Compact deep learning defogging method based on contrast learning
CN116128768B (en) Unsupervised image low-illumination enhancement method with denoising module
CN111626943B (en) Total variation image denoising method based on first-order forward and backward algorithm
CN109360169B (en) Signal processing method for removing rain and mist of single image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant