CN112884680A - Single image defogging method using end-to-end neural network - Google Patents

Single image defogging method using end-to-end neural network

Info

Publication number
CN112884680A
Authority
CN
China
Prior art keywords
module
network
attention
image
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110326940.XA
Other languages
Chinese (zh)
Inventor
胡彬
顾铭岑
岳壮壮
李金航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University
Priority to CN202110326940.XA
Publication of CN112884680A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a single image defogging method using an end-to-end neural network, which comprises the following steps: constructing a grid attention network model in which the input is an image to be defogged; the image is first fed into a shallow feature extraction convolution layer, then into a GridNet module and an Attention module, and finally the features are passed to a reconstruction part and a global residual learning structure, which output a clear image. The invention combines a grid network with an attention mechanism: in a traditional multi-scale network or encoder-decoder network, the hierarchical structure often causes the information flow to suffer from a bottleneck effect, whereas the grid network avoids this problem by using up-sampling and down-sampling blocks densely connected across different scales. In addition, channel and pixel attention mechanisms give the network extra flexibility in processing different types of information and expand the representational capability of CNNs.

Description

Single image defogging method using end-to-end neural network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a single image defogging method using an end-to-end neural network.
Background
Haze is a common atmospheric phenomenon produced by small particles such as dust and smoke floating in the air, which strongly absorb and scatter light and thereby reduce image quality. Under the influence of haze, practical applications such as video surveillance, remote sensing and automatic driving are easily compromised, and high-level computer vision tasks such as detection and recognition become difficult to complete. Image defogging (dehazing) has therefore become an increasingly important technology.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a single image defogging method using an end-to-end neural network, which combines a grid network with an attention mechanism, equips the network with channel and pixel attention, and thereby provides extra flexibility for processing different types of information.
To solve the above technical problem, an embodiment of the present invention provides a single image defogging method using an end-to-end neural network, including the following steps:
S1, constructing a grid attention network model: the input is an image to be defogged, which is first fed into a shallow feature extraction convolution layer and then into a GridNet module and an Attention module; finally the features are passed to a reconstruction part and a global residual learning structure, and a clear image is output;
S2, model training: the network is trained on the RESIDE dataset with the smooth L1 loss function:
L_SL1 = (1/N) Σ_x Σ_{i=1..3} F_SL1( Ĵ_i(x) − J_i(x) ),
F_SL1(e) = 0.5·e², if |e| < 1; |e| − 0.5, otherwise;
wherein N refers to the total number of image pixels, Ĵ_i(x) and J_i(x) refer to the pixel values at position x on the i-th channel (RGB, three channels in total), Ĵ_i(x) being the value computed by the network and J_i(x) the actual (ground-truth) value;
in step S1, the GridNet module has four rows and six columns, each row corresponds to a different feature scale, and is composed of five basic attention convolution modules ABD, which combine a skip connection and an attention module, and each column is a bridge connecting adjacent scales through up-sampling and down-sampling blocks; in each upsampling module, the size of the feature map is reduced by a factor of 2, while the number of feature maps is increased by a factor of 2, and the downsampling doubles the size of the features.
Further, the attention convolution module ABD in the GridNet module consists of local residual learning and an attention module; the local residual learning allows less important information, such as the low-frequency regions of the input features, to be bypassed through a skip connection.
In the Attention module of step S1, global average pooling is first adopted:
g_c = H_p(F_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} X_c(i, j),
wherein H_p represents the global average pooling function and X_c(i, j) the value of the c-th channel of the input at position (i, j);
SA1 is then obtained after processing by a convolution layer, ReLU, a convolution layer and a sigmoid activation function:
SA1 = σ(Conv(δ(Conv(g_c)))),
wherein σ represents the sigmoid function and δ represents the ReLU function;
the input F_c is multiplied element-wise by SA1 to obtain
F_c* = F_c ⊗ SA1;
SA2 is then obtained by passing F_c* through a convolution layer, ReLU, a convolution layer and a sigmoid activation function:
SA2 = σ(Conv(δ(Conv(F_c*))));
and the final output is
F̃ = F_c* ⊗ SA2.
the technical scheme of the invention has the following beneficial effects: the invention provides a single image defogging method by utilizing an end-to-end neural network, which combines a grid network and an attention mechanism, wherein in a traditional multi-scale network or a coding and decoding network, due to a hierarchical structure, information flow is often influenced by a bottleneck effect, and the grid network avoids the problem by using up-sampling and down-sampling blocks and by densely connecting the grid network and the coding and decoding network in different scales. In addition, the attention mechanism is given to a channel and pixel of the network, which can provide extra flexibility to process different types of information, and the attention mechanism also enables the network to expand the characterization capability of the CNNs.
Drawings
FIG. 1 is a block diagram of the grid attention network model of the present invention;
FIG. 2 is a block diagram of the GridNet module of the present invention;
FIG. 3 is a block diagram of an attention convolution module ABD of the present invention;
FIG. 4 is a structural diagram of an Attention module in the present invention;
FIG. 5 is a comparison of before and after image defogging according to the first embodiment of the present invention;
FIG. 6 is a comparison of before and after image defogging according to the second embodiment of the present invention;
FIG. 7 is a comparison of before and after image defogging according to the third embodiment of the present invention;
FIG. 8 is a comparison of before and after image defogging according to the fourth embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a single image defogging method using an end-to-end neural network, which comprises the following steps:
S1, constructing the grid attention network model shown in FIG. 1: the input is an image to be defogged, which is first fed into a shallow feature extraction convolution layer and then into a GridNet module and an Attention module; finally the features are passed to a reconstruction part and a global residual learning structure, and a clear image is output.
The structure of the GridNet module is shown in FIG. 2. The GridNet module has four rows and six columns; each row corresponds to a different feature scale and consists of five basic attention convolution modules ABD, each combining a skip connection and an attention module, and each column is a bridge connecting adjacent scales through up-sampling and down-sampling blocks. In each down-sampling block, the size of the feature maps is reduced by a factor of 2 while their number is doubled; each up-sampling block does the reverse.
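For concreteness, a minimal sketch of such sampling blocks, assuming a stride-2 convolution for down-sampling and a stride-2 transposed convolution for up-sampling (the patent does not fix these operator choices):

```python
import torch.nn as nn

class DownSample(nn.Module):
    """Halves the spatial size of the feature maps and doubles their number."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 2, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x))


class UpSample(nn.Module):
    """Doubles the spatial size of the feature maps and halves their number."""
    def __init__(self, channels):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(channels, channels // 2, kernel_size=4, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.deconv(x))
```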
The attention convolution module ABD in the GridNet module (structure shown in FIG. 3) consists of local residual learning and an attention module; the local residual learning allows less important information, such as the low-frequency regions of the input features, to be bypassed through a skip connection.
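A minimal sketch of such an ABD block, assuming two 3x3 convolutions in the main path (the exact layer count is not specified above); the Attention module it calls is sketched after the FIG. 4 formulas below:

```python
import torch.nn as nn

class ABD(nn.Module):
    """Attention convolution block: convolutions followed by attention, with a local
    residual (skip) connection that lets less important, low-frequency information
    bypass the main path."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attention = Attention(channels)   # channel + pixel attention, sketched below

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        out = self.attention(out)
        return x + out   # local residual learning
```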
The structure of the Attention module in the attention convolution module ABD is shown in FIG. 4. In the Attention module, global average pooling is first adopted:
g_c = H_p(F_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} X_c(i, j),
wherein H_p represents the global average pooling function and X_c(i, j) the value of the c-th channel of the input at position (i, j);
SA1 is then obtained after processing by a convolution layer, ReLU, a convolution layer and a sigmoid activation function:
SA1 = σ(Conv(δ(Conv(g_c)))),
wherein σ represents the sigmoid function and δ represents the ReLU function;
the input F_c is multiplied element-wise by SA1 to obtain
F_c* = F_c ⊗ SA1;
SA2 is then obtained by passing F_c* through a convolution layer, ReLU, a convolution layer and a sigmoid activation function:
SA2 = σ(Conv(δ(Conv(F_c*))));
and the final output is
F̃ = F_c* ⊗ SA2.
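The two attention steps above can be sketched in PyTorch as follows; the 1x1 kernels and the channel-reduction ratio r are assumptions, since the description only fixes the Conv-ReLU-Conv-sigmoid ordering:

```python
import torch.nn as nn

class Attention(nn.Module):
    """Channel attention (SA1) followed by pixel attention (SA2), mirroring the formulas above."""
    def __init__(self, channels, r=8):
        super().__init__()
        # channel attention: global average pooling -> Conv -> ReLU -> Conv -> sigmoid
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # g_c = H_p(F_c)
            nn.Conv2d(channels, channels // r, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),
            nn.Sigmoid(),                             # SA1
        )
        # pixel attention: Conv -> ReLU -> Conv -> sigmoid, one spatial map shared by all channels
        self.pa = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, 1, 1),
            nn.Sigmoid(),                             # SA2
        )

    def forward(self, f):
        f_star = f * self.ca(f)          # F_c* = F_c (x) SA1
        return f_star * self.pa(f_star)  # output = F_c* (x) SA2
```

Producing the pixel attention map with a single output channel and broadcasting it over all channels is one common design choice; the patent text does not state the output dimensionality of SA2.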
s2, model training: by smoothing L1Loss function:
Figure BDA0002995000480000057
Figure BDA0002995000480000058
training in a data set RESIDE;
wherein N refers to the total number of image pixels,
Figure BDA0002995000480000059
and Ji(x) Refers to the pixel value of x on the ith channel (a total of RGB3 channels),
Figure BDA00029950004800000510
refers to a value calculated over a network, Ji(x) The actual value is represented by the value of,
Figure BDA00029950004800000511
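A minimal sketch of this loss, assuming the network output and ground truth are (B, 3, H, W) tensors with values in [0, 1]:

```python
import torch

def smooth_l1_loss(pred, target):
    """Smooth L1 loss: sum over the three RGB channels, average over pixels and batch."""
    diff = (pred - target).abs()
    elementwise = torch.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return elementwise.sum(dim=1).mean()
```

PyTorch's built-in torch.nn.functional.smooth_l1_loss applies the same element-wise function with a mean reduction over all elements, which differs from the channel sum above only by a constant factor.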
the purpose of model training is: and training the neural network by using a sample set consisting of the foggy images and the corresponding clean images to obtain an end-to-end network model. When the image is restored, the trained model is used to input the foggy image and output the clean image, so that no intermediate result exists. Or the method is mainly embodied in the construction of the network structure. After the network is constructed, the sample set can be used for training, and the training result can be directly taken for image defogging.
S3, test results: the results are measured with the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) and are better than those of conventional methods, reaching a PSNR of 36.69 with an SSIM of 0.9900 and a PSNR of 33.89 with an SSIM of 0.9865 on the indoor test set of RESIDE. Partial visualization results are shown in FIGS. 5-8.
Fig. 5a, 6a, 7a and 8a are images before defogging, and fig. 5b, 6b, 7b and 8b are images after defogging.
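For reference, PSNR and SSIM can be computed as in the following sketch (assuming scikit-image >= 0.19 and test samples given as (3, H, W) tensors in [0, 1]; the evaluation loop itself is an illustration, not part of the patent):

```python
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(model, dataset, device="cuda"):
    """Average PSNR and SSIM of the dehazed outputs against the clean ground truth."""
    model.to(device).eval()
    psnr_sum, ssim_sum = 0.0, 0.0
    with torch.no_grad():
        for hazy, clean in dataset:
            out = model(hazy.unsqueeze(0).to(device)).squeeze(0).clamp(0, 1)
            out = out.permute(1, 2, 0).cpu().numpy()    # (H, W, 3), values in [0, 1]
            gt = clean.permute(1, 2, 0).numpy()
            psnr_sum += peak_signal_noise_ratio(gt, out, data_range=1.0)
            ssim_sum += structural_similarity(gt, out, channel_axis=2, data_range=1.0)
    return psnr_sum / len(dataset), ssim_sum / len(dataset)
```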
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (4)

1. A single image defogging method using an end-to-end neural network, characterized by comprising the following steps:
S1, constructing a grid attention network model: the input is an image to be defogged, which is first fed into a shallow feature extraction convolution layer and then into a GridNet module and an Attention module; finally the features are passed to a reconstruction part and a global residual learning structure, and a clear image is output;
S2, model training: the network is trained on the RESIDE dataset with the smooth L1 loss function:
L_SL1 = (1/N) Σ_x Σ_{i=1..3} F_SL1( Ĵ_i(x) − J_i(x) ),
F_SL1(e) = 0.5·e², if |e| < 1; |e| − 0.5, otherwise;
wherein N refers to the total number of image pixels, Ĵ_i(x) and J_i(x) refer to the pixel values at position x on the i-th channel, Ĵ_i(x) being the value computed by the network and J_i(x) the actual value.
2. The method for defogging a single image using an end-to-end neural network as claimed in claim 1, wherein in step S1 the GridNet module has four rows and six columns; each row corresponds to a different feature scale and is composed of five basic attention convolution modules ABD, each combining a skip connection and an attention module, and each column is a bridge connecting adjacent scales through up-sampling and down-sampling blocks; in each down-sampling block, the size of the feature maps is reduced by a factor of 2 while their number is doubled, and each up-sampling block does the reverse.
3. The method for defogging a single image using an end-to-end neural network as claimed in claim 1 or 2, wherein the attention convolution module ABD in the GridNet module consists of local residual learning and an attention module, and the local residual learning allows less important information, such as the low-frequency regions of the input features, to be bypassed through a skip connection.
4. The method for defogging a single image using an end-to-end neural network as claimed in claim 1, wherein in the Attention module of step S1, global average pooling is first adopted:
g_c = H_p(F_c) = (1/(H×W)) Σ_{i=1..H} Σ_{j=1..W} X_c(i, j),
wherein H_p represents the global average pooling function and X_c(i, j) the value of the c-th channel of the input at position (i, j);
SA1 is then obtained after processing by a convolution layer, ReLU, a convolution layer and a sigmoid activation function:
SA1 = σ(Conv(δ(Conv(g_c)))),
wherein σ represents the sigmoid function and δ represents the ReLU function;
the input F_c is multiplied element-wise by SA1 to obtain
F_c* = F_c ⊗ SA1;
SA2 is then obtained by passing F_c* through a convolution layer, ReLU, a convolution layer and a sigmoid activation function:
SA2 = σ(Conv(δ(Conv(F_c*))));
and the final output is
F̃ = F_c* ⊗ SA2.
CN202110326940.XA 2021-03-26 2021-03-26 Single image defogging method using end-to-end neural network Pending CN112884680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110326940.XA CN112884680A (en) 2021-03-26 2021-03-26 Single image defogging method using end-to-end neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110326940.XA CN112884680A (en) 2021-03-26 2021-03-26 Single image defogging method using end-to-end neural network

Publications (1)

Publication Number Publication Date
CN112884680A true CN112884680A (en) 2021-06-01

Family

ID=76042583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110326940.XA Pending CN112884680A (en) 2021-03-26 2021-03-26 Single image defogging method using end-to-end neural network

Country Status (1)

Country Link
CN (1) CN112884680A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832348A (en) * 2019-04-17 2020-10-27 中国科学院宁波材料技术与工程研究所 Pedestrian re-identification method based on pixel and channel attention mechanism
AU2020100274A4 (en) * 2020-02-25 2020-03-26 Huang, Shuying DR A Multi-Scale Feature Fusion Network based on GANs for Haze Removal
CN111598814A (en) * 2020-05-26 2020-08-28 北京理工大学 Single image defogging method based on extreme scattering channel
CN111814753A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Target detection method and device under foggy weather condition
CN112184577A (en) * 2020-09-17 2021-01-05 西安理工大学 Single image defogging method based on multi-scale self-attention generation countermeasure network
CN112365414A (en) * 2020-11-04 2021-02-12 天津大学 Image defogging method based on double-path residual convolution neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOHONG LIU et al.: "GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing", IEEE, pages 7313-7322 *
XU QIN et al.: "FFA-Net: Feature Fusion Attention Network for Single Image Dehazing", arXiv, pages 1-8 *
殷帅; 胡越黎; 刘思齐; 燕明: "Data Acquisition and Annotation Based on YOLO Network" (基于YOLO网络的数据采集与标注), 仪表技术 (Instrument Technique), no. 12 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450273A (en) * 2021-06-18 2021-09-28 暨南大学 Image defogging method and system based on multi-scale multi-stage neural network
CN114022371A (en) * 2021-10-22 2022-02-08 中国科学院长春光学精密机械与物理研究所 Defogging device and defogging method based on space and channel attention residual error network
CN114022371B (en) * 2021-10-22 2024-04-05 中国科学院长春光学精密机械与物理研究所 Defogging device and defogging method based on space and channel attention residual error network
CN114529470A (en) * 2022-02-21 2022-05-24 南通大学 Single image rain removing method based on end-to-end neural network
CN114529470B (en) * 2022-02-21 2024-09-20 南通大学 Single image rain removing method based on end-to-end neural network
CN115018786A (en) * 2022-06-02 2022-09-06 国网江苏省电力有限公司电力科学研究院 Cable defect identification method and device based on grid network

Similar Documents

Publication Publication Date Title
CN112884680A (en) Single image defogging method using end-to-end neural network
CN110059772B (en) Remote sensing image semantic segmentation method based on multi-scale decoding network
CN111539887B (en) Channel attention mechanism and layered learning neural network image defogging method based on mixed convolution
CN113469094A (en) Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN110570396A (en) industrial product defect detection method based on deep learning
CN113837938B (en) Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN112365414B (en) Image defogging method based on double-path residual convolution neural network
CN112488025B (en) Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion
CN112862774B (en) Accurate segmentation method for remote sensing image building
CN110378344B (en) Spectral dimension conversion network-based convolutional neural network multispectral image segmentation method
CN107749048B (en) Image correction system and method, and color blindness image correction system and method
CN113269685A (en) Image defogging method integrating multi-attention machine system
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN113409355A (en) Moving target identification system and method based on FPGA
CN115546046A (en) Single image defogging method fusing frequency and content characteristics
CN113052776A (en) Unsupervised image defogging method based on multi-scale depth image prior
CN116523875A (en) Insulator defect detection method based on FPGA pretreatment and improved YOLOv5
CN111242053B (en) Power transmission line flame detection method and system
CN117036182A (en) Defogging method and system for single image
CN115656444B (en) Method for reconstructing concentration of carbon dioxide field in large-scale venue
CN116823775A (en) Display screen defect detection method based on deep learning
CN114049274B (en) Defogging method for single image
CN115564647A (en) Novel super-division module and up-sampling method for image semantic segmentation
CN113409321B (en) Cell nucleus image segmentation method based on pixel classification and distance regression
CN115578256A (en) Unmanned aerial vehicle aerial insulator infrared video panorama splicing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination