CN113344806A - Image defogging method and system based on global feature fusion attention network - Google Patents
- Publication number
- CN113344806A (application number CN202110576365.9A)
- Authority
- CN
- China
- Prior art keywords
- feature
- image
- attention
- attention module
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T 5/00 — Image enhancement or restoration
- G06T 5/77 — Retouching; inpainting; scratch removal
- G06T 5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N 3/02 — Neural networks; G06N 3/04 — Architecture, e.g. interconnection topology; G06N 3/045 — Combinations of networks
- G06N 3/08 — Learning methods
- G06T 2207/20081 — Training; learning
- G06T 2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses an image defogging method based on a global feature fusion attention network. The method comprises the following steps: first, a foggy image is input, image features are extracted with a feature fusion attention network, the features extracted by each group architecture are fed into a level attention module for processing, and a fogless image is generated; second, pairs of foggy and fogless images are used as a training set, the loss between the real fogless image and the generated fogless image is calculated with a loss function, and the parameters of the network model are updated to obtain a trained model; finally, a foggy image to be defogged is input, and the trained model calculates and outputs the fogless image. The invention also discloses an image defogging system based on the global feature fusion attention network, computer equipment, and a computer-readable storage medium. The invention combines a feature attention module with a level attention module into a global feature fusion attention module that concentrates on the important information of the training images, thereby recovering more texture details of the foggy image.
Description
Technical Field
The invention relates to the fields of computer vision, image processing and image defogging, in particular to an image defogging method, system, equipment and storage medium based on a global feature fusion attention network.
Background
Fog is caused by the scattering of atmospheric light and scene light by small droplets or particles suspended in the air. Images taken in fog tend to suffer from degraded visual quality, such as color distortion, blurring, and low contrast, and on such images computer vision tasks such as object detection, object recognition, tracking, and segmentation become very difficult. It is therefore important to improve the quality of images captured in foggy scenes and to recover scene details, i.e., an image defogging method is required. Image defogging is an important problem in image processing and computer vision. Its goal is to recover sharp image detail and texture from a hazy image, serving as a pre-processing step for high-level vision tasks. Existing defogging methods generally describe a foggy image with the atmospheric scattering model I(x) = J(x)t(x) + A(1 − t(x)), where I(x) denotes the foggy image, A the atmospheric light, t(x) the transmittance map, and J(x) the fogless image.
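As a concrete illustration, the atmospheric scattering model above can be sketched in a few lines of NumPy; the function name and the toy values chosen for J, t, and A below are illustrative, not part of the invention:

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Apply the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)).

    J: clean image in [0, 1], shape (H, W, 3)
    t: transmittance map in [0, 1], shape (H, W)
    A: scalar global atmospheric light in [0, 1]
    """
    t = t[..., np.newaxis]        # broadcast the transmittance over the color channels
    return J * t + A * (1.0 - t)

# Toy example: a uniform grey scene under uniform thin fog.
J = np.full((4, 4, 3), 0.5)
t = np.full((4, 4), 0.8)
I = synthesize_haze(J, t, A=1.0)
print(I[0, 0])   # every pixel moves toward the atmospheric light: 0.5*0.8 + 1.0*0.2 = 0.6
```

Lower transmittance (denser fog) pushes each pixel further toward the atmospheric light A, which is exactly the degradation the defogging network must invert.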
With the rise of deep learning, it has also been successfully applied to image defogging with excellent results. In contrast to conventional methods, deep-learning methods attempt to directly regress the intermediate transmittance map or the haze-free image, and with large-scale data they achieve superior performance and robustness.
One prior-art technique is an integrated defogging method based on a convolutional neural network, comprising the following steps: (1) modify the original atmospheric scattering model by integrating the transmittance map t(x) and the atmospheric light A into a new variable K(x), giving the new model J(x) = K(x)I(x) − K(x) + b; (2) construct an end-to-end CNN-based network structure, mainly used to estimate the value of K(x), built mainly from five convolution layers that form multi-scale features by fusing convolution kernels of different sizes, while a concat layer connects the coarse-scale network features with the intermediate layers of a fine-scale network; (3) train the network model on paired foggy/fogless images using a simple mean squared error (MSE) loss function; (4) finally, input the foggy images into the trained network; the output is the defogged image. The disadvantages are: (1) the network treats channel features and pixel features equally; it does not consider that fog is unevenly distributed over the image, that pixels in thin-fog and dense-fog regions should carry different weights, or that features in different channels differ in importance during training; (2) although the network structure is concise, the feature information is not fully exploited.
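The passage above does not spell out the integrated variable K(x). Assuming the standard AOD-Net choice K(x) = ((I − A)/t + (A − b)) / (I − 1), which is not stated in the text and is used here only as an illustration, a quick numerical check confirms that the reformulated model recovers the same clean pixel value as solving the original scattering model directly:

```python
# Hypothetical single-pixel values; b is the constant bias of the reformulated model.
I, A, t, b = 0.7, 1.0, 0.5, 1.0

# Original scattering model solved for the clean image: J = (I - A(1 - t)) / t
J_direct = (I - A * (1.0 - t)) / t

# AOD-Net-style integrated variable (an assumption, not given in the text above):
K = ((I - A) / t + (A - b)) / (I - 1.0)

# Reformulated model from the passage: J(x) = K(x) I(x) - K(x) + b
J_reformulated = K * I - K + b

print(J_direct, J_reformulated)   # both evaluate to 0.4
```

The point of the reformulation is that a single network output K(x) absorbs both t(x) and A, avoiding separate estimation of the two quantities.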
Another prior-art technique is a defogging algorithm based on an attention network, comprising the following steps: (1) build a network whose basic structure consists of a local residual network structure and an attention module; (2) after building the model, construct pairs of foggy and fogless images as a training set, and input the foggy images into the model; (3) the network first extracts shallow initial features of the foggy image, then feeds them into N group architectures with multi-hop links, and fuses the features of the N group architectures through the proposed attention module; the features are passed to a reconstruction part and a global residual learning structure to obtain the fogless image; (4) in addition, each group architecture combines M basic block structures with local residual learning modules, and each basic block combines a skip connection with an attention module; (5) calculate the L1 loss between the generated fogless image and the real fogless image, then back-propagate to update the weights of the network model, and repeat multiple times to obtain a trained defogging model; (6) input the image to be defogged into the trained defogging model; the reconstructed image is the defogging result. The disadvantage is: the channel attention mechanism acts independently on features of different levels, without taking into account the correlation between the features of the respective levels.
Disclosure of Invention
The invention aims to overcome the defects of existing methods and provides an image defogging method based on a global feature fusion attention network. The invention addresses three main problems: first, existing methods do not consider that different channel features carry different weight information and that fog is unevenly distributed over image pixels, so the network cannot attend to important high-frequency information such as edges and textures, and the defogging effect is poor; second, the feature information is not fully exploited; third, the channel attention mechanism acts independently on features of different levels, without considering the correlation among the level features.
In order to solve the above problem, the present invention provides an image defogging method based on a global feature fusion attention network, including:
constructing a feature attention network structure, wherein the first layer of convolution of the structure is used for extracting shallow layer initial features, the structure subsequently comprises N group architectures, each group architecture comprises M basic block structures, each basic block structure consists of a local residual error learning and feature attention module, and each feature attention module consists of a channel attention module and a pixel attention module;
inputting the feature map after global average pooling operation into two convolution layers and an activation function of the channel attention module, outputting weights of different channels of the feature map, multiplying the weights and the feature map by elements, and operating all the channels to obtain a feature map after channel weighting;
inputting the feature map weighted by the channel into two convolution layers and an activation function of the pixel attention module to obtain pixel weight, multiplying the pixel weight by the feature map weighted by the channel according to elements, outputting the feature map of the feature attention module, inputting the feature map into a next basic block, and outputting a final feature map after the N group architectures and one layer of convolution;
adding a hierarchical attention module combined with the feature attention module to construct a global feature fusion attention network model, inputting an intermediate feature group FG, formed by combining the intermediate features extracted by the N group architectures, into the hierarchical attention module, calculating the correlation among different layers, and outputting a hierarchical attention module feature map;
integrating the final feature map, the level attention module feature map and the shallow layer initial features to obtain a feature map of a fog-free image, and recovering the fog-free image through a layer of convolution operation;
constructing a data set comprising indoor and outdoor foggy images synthesized from fogless images, and using the foggy images and the corresponding fogless images as image pairs forming the training set of the global feature fusion attention network model;
performing data enhancement on the training set, inputting one foggy image block at a time into the global feature fusion attention network model to obtain a defogged image, calculating the loss between the generated fogless image and the real fogless image by using a loss function, then reversely updating the weights of the global feature fusion attention network model, and repeating this step to obtain the trained global feature fusion attention network model;
and inputting the fog images into the trained global feature fusion attention network model, wherein the output images are the defogged images.
Preferably, the constructing of the feature attention network structure, in which the first layer of convolution extracts shallow initial features and is followed by N group architectures, each group architecture comprising M basic block structures, each basic block structure consisting of a local residual learning module and a feature attention module, and each feature attention module consisting of a channel attention module and a pixel attention module, is specifically:
the first convolution of the feature attention network structure is used to extract the shallow initial features F_0; the structure subsequently comprises N group architectures, where N is set to 3 in the invention;
each group architecture further comprises M basic block structures combined with a local residual learning module, where M is set to 19 in the invention; a recovery part consisting of a two-layer convolution is added at the tail of the feature attention network;
the basic block structure consists of a local residual learning module and a feature attention module; multiple local residual connections allow less important information to be bypassed, so that the feature attention network focuses on important information; the feature attention module consists of a channel attention module and a pixel attention module.
Preferably, the step of applying global average pooling to the feature map, inputting the result into the two convolution layers and activation functions of the channel attention module, outputting the weights of the different channels of the feature map, multiplying the weights element-wise with the feature map, and performing this operation on all channels to obtain the channel-weighted feature map is specifically:
the feature map is first passed through a global average pooling operation:
g_c = H_p(F_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)
and the pooled result is passed through the two convolution layers, with ReLU and sigmoid activation functions, of the channel attention module to obtain the weights of the different channels of the feature map:
CA_c = σ(Conv(δ(Conv(g_c))))
where X_c(i, j) denotes the value of the c-th channel of the feature map at position (i, j), the feature map size is H × W, H_p denotes the global average pooling operation, σ denotes the sigmoid function, and δ denotes the ReLU function;
the obtained c-th channel weight CA_c is multiplied element-wise with the c-th channel F_c of the original feature map:
F*_c = CA_c ⊗ F_c
after all channels have undergone the above operation, the channel-weighted feature map F* is obtained.
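A minimal NumPy sketch of the channel-attention computation described above, with the two 1×1 convolutions modelled as small dense layers acting on the pooled channel vector; the weight matrices and the channel-reduction ratio r are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    # g_c = H_p(F_c): global average pooling over the spatial dimensions
    g = F.mean(axis=(1, 2))                       # shape (C,)
    # CA = sigma(Conv(delta(Conv(g)))): on a 1x1 spatial map the two
    # convolutions reduce to dense layers over the channel vector
    ca = sigmoid(W2 @ np.maximum(W1 @ g, 0.0))    # shape (C,), values in (0, 1)
    # F*_c = CA_c (x) F_c: reweight each channel of the original feature map
    return ca[:, None, None] * F

C, H, W, r = 8, 5, 5, 2                           # r: assumed channel-reduction ratio
F = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C))
W2 = rng.standard_normal((C, C // r))
F_star = channel_attention(F, W1, W2)
```

Because the sigmoid keeps every channel weight in (0, 1), the module can only attenuate channels, giving relatively more emphasis to the channels it leaves nearly untouched.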
Preferably, the step of inputting the channel-weighted feature map into the two convolution layers and activation functions of the pixel attention module to obtain the pixel weights, multiplying the pixel weights element-wise with the channel-weighted feature map, outputting the feature map of the feature attention module, feeding it into the next basic block, and outputting the final feature map after the N group architectures and one layer of convolution is specifically:
the pixel attention module takes the channel-weighted feature map F* as input;
the pixel weight PA, of size 1 × H × W, is obtained through two convolution layers with ReLU and sigmoid activation functions:
PA = σ(Conv(δ(Conv(F*))))
the pixel weight PA is multiplied element-wise with the channel-weighted feature map F* to obtain the feature map F̃ of the feature attention module:
F̃ = F* ⊗ PA
the feature map F̃ of the feature attention module is fed into the next basic block as the output of the current basic block, and after the N group architectures and one layer of convolution the final feature map FG_out is output.
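The pixel-attention step can be sketched the same way; here the two 1×1 convolutions are modelled with einsum contractions over the channel axis, and all shapes and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pixel_attention(F_star, K1, K2):
    # two 1x1 convolutions across channels (contractions over the channel axis),
    # with ReLU in between, reduce F* to a single-channel weight map
    mid = np.maximum(np.einsum('oc,chw->ohw', K1, F_star), 0.0)
    pa = sigmoid(np.einsum('oc,chw->ohw', K2, mid))   # PA, shape (1, H, W)
    # multiply the pixel weights element-wise with the channel-weighted map
    return pa * F_star

C, H, W, r = 8, 5, 5, 2
F_star = rng.standard_normal((C, H, W))
K1 = rng.standard_normal((C // r, C))
K2 = rng.standard_normal((1, C // r))
out = pixel_attention(F_star, K1, K2)
```

Unlike channel attention, which learns one scalar per channel, this map learns one scalar per spatial location, letting the network weight dense-fog pixels differently from thin-fog pixels.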
Preferably, the step of adding a hierarchical attention module combined with the feature attention module to construct a global feature fusion attention network model, inputting the intermediate feature group FG formed by combining the intermediate features extracted by the N group architectures into the hierarchical attention module, calculating the correlation among different layers, and outputting the hierarchical attention module feature map is specifically:
the intermediate feature group FG, formed by combining the intermediate features extracted by the N group architectures and of dimension N × H × W × C, is input into the hierarchical attention module;
the feature group is reshaped into a two-dimensional matrix of size N × HWC, and the correlation among different layers is then calculated by transposition and matrix multiplication, yielding the N × N relation matrix W_la:
W_la = μ(φ(FG) · φ(FG)^T)
where μ(·) denotes the Softmax function, φ(·) denotes the reshape operation, and w_{i,j} denotes the correlation index between the features of the i-th and j-th layers;
the reshaped feature group is multiplied by the relation matrix W_la, scaled, and added to the original input feature group to output the hierarchical attention module feature map F_L:
F_L = α (W_la · φ(FG)) + FG
where α is a scale factor initialized to 0 and assigned automatically by the network in subsequent training.
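A NumPy sketch of the hierarchical attention computation described above (reshape to N × HWC, N × N softmax relation matrix, scale by α, residual add to the input); note that with α initialized to 0 the module initially passes its input through unchanged:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def level_attention(FG, alpha=0.0):
    N = FG.shape[0]
    flat = FG.reshape(N, -1)                 # reshape N x H x W x C -> N x HWC
    W_la = softmax(flat @ flat.T, axis=-1)   # N x N relation matrix between levels
    out = alpha * (W_la @ flat) + flat       # scale by alpha, then residual add
    return out.reshape(FG.shape)

# Toy feature group from N = 3 group architectures (shapes are illustrative).
FG = rng.standard_normal((3, 4, 4, 8))
F_L = level_attention(FG, alpha=0.0)         # alpha starts at 0 during training
```

Starting from α = 0 means the module begins as an identity mapping and only gradually learns how strongly to mix the levels, which keeps early training stable.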
Preferably, the step of integrating the final feature map, the hierarchical attention module feature map, and the shallow initial features to obtain the feature map of the fog-free image, and recovering the fog-free image through one layer of convolution, is specifically:
the final feature map FG_out, the hierarchical attention module feature map F_L, and the shallow initial features F_0 are integrated by element-wise summation to obtain the feature map of the fog-free image;
the fog-free image I_haze-free is recovered through one layer of convolution:
I_haze-free = Conv(F_L + FG_out + F_0).
Preferably, the step of performing data enhancement on the training set, inputting one foggy image block at a time into the global feature fusion attention network model to obtain a defogged image, calculating the loss between the generated fogless image and the real fogless image with a loss function, back-propagating to update the weights of the global feature fusion attention network model, and repeating this step to obtain the trained global feature fusion attention network model is specifically:
data enhancement is performed on the training set, and 240 × 240 foggy image blocks are input into the constructed network model one at a time to obtain defogged images;
the loss between the generated fogless image and the real fogless image is calculated with a loss function, and the weights of the global feature fusion attention network model are then updated by back-propagation; the loss function is the L1 loss:
L(Θ) = (1/N) Σ_{i=1}^{N} ‖ I_gt^i − HFFA(I_haze^i) ‖_1
where HFFA(·) denotes the network model of the invention, Θ denotes the parameters of the model, I_gt denotes a real fogless image, and I_haze denotes an input foggy image;
this step is repeated to obtain the trained global feature fusion attention network model.
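The L1 training loss described above reduces to a mean absolute error between the generated and real fog-free images; a minimal sketch with toy 2×2 "images" (the values are illustrative):

```python
import numpy as np

def l1_loss(generated, ground_truth):
    """Mean absolute error between the generated fog-free image and the real
    one, matching the L1 loss used to train the network."""
    return np.abs(generated - ground_truth).mean()

gen = np.array([[0.2, 0.4], [0.6, 0.8]])
gt  = np.array([[0.3, 0.4], [0.5, 0.8]])
print(l1_loss(gen, gt))   # (0.1 + 0.0 + 0.1 + 0.0) / 4 = 0.05
```

Compared with the MSE loss used by the CNN-based prior art, the L1 loss penalizes large residuals less severely and is commonly preferred in restoration tasks for preserving edges.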
Correspondingly, the invention also provides an image defogging system based on the global feature fusion attention network, which comprises:
the model building unit is used for constructing a feature attention network structure in which each group architecture comprises basic block structures, each basic block structure consisting of a local residual learning module and a feature attention module, and for adding a hierarchical attention module combined with the feature attention module to construct the global feature fusion attention network model;
the model training unit is used for constructing a data set for data enhancement, calculating loss by using a loss function, then reversely updating the weight of the global feature fusion attention network model, and repeating the step to obtain the trained global feature fusion attention network model;
and the defogging display unit is used for inputting the foggy image into the trained global feature fusion attention network model; the output image is the defogged image.
Correspondingly, the invention also provides computer equipment which comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the steps of the image defogging method based on the global feature fusion attention network.
Accordingly, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, implements the steps of the above-mentioned image defogging method based on the global feature fusion attention network.
The implementation of the invention has the following beneficial effects:
firstly, the invention constructs a novel global feature fusion attention module to model information features among hierarchical layers, channels and pixels, so as to jointly improve the attention degree of a network to important information and realize a method for restoring a foggy image into a fogless image; secondly, the global feature fusion attention module comprises a channel attention module and a pixel attention module, the feature weight can be learned in a self-adaptive mode, more weight is given to important features, and when different types of information are processed, the module can provide extra flexibility, so that the network can concentrate more attention on pixels with stronger fog and more important channel information; third, the present invention introduces a hierarchical attention module (LAM) in the global feature fusion attention module to learn the weights of the hierarchical features by considering the correlation between the multi-scale hierarchies.
Drawings
FIG. 1 is a general flowchart of an image defogging method based on a global feature fusion attention network according to an embodiment of the present invention;
FIG. 2 is a diagram of a feature fusion attention network architecture according to an embodiment of the present invention;
FIG. 3 is a basic block diagram of an embodiment of the present invention;
FIG. 4 is a feature attention block diagram of an embodiment of the present invention;
FIG. 5 is a block diagram of a global feature fusion attention network according to an embodiment of the present invention;
FIG. 6 is a block diagram of a hierarchical attention module in accordance with an embodiment of the present invention;
fig. 7 is a structural diagram of an image defogging system based on a global feature fusion attention network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a general flowchart of an image defogging method based on a global feature fusion attention network according to an embodiment of the present invention, as shown in fig. 1, the method includes:
s1, constructing a feature attention network structure, wherein the first layer of convolution of the structure is used for extracting shallow initial features; the structure subsequently comprises N group architectures, each group architecture comprises M basic block structures, each basic block structure consists of a local residual learning module and a feature attention module, and each feature attention module consists of a channel attention module and a pixel attention module;
s2, inputting the feature map after global average pooling operation into two convolution layers and an activation function of the channel attention module, outputting weights of different channels of the feature map, multiplying the weights and the feature map by elements, and operating all the channels to obtain a feature map after channel weighting;
s3, inputting the feature graph after the channel weighting into two convolution layers and an activation function of the pixel attention module to obtain pixel weight, multiplying the pixel weight by the feature graph after the channel weighting according to elements, outputting the feature graph of the feature attention module, inputting the feature graph into the next basic block, and outputting a final feature graph after the N group architectures and one layer of convolution;
s4, adding a hierarchical attention module and combining it with the feature attention module to construct a global feature fusion attention network model, inputting an intermediate feature group FG formed by combining the intermediate features extracted by the N group architectures into the hierarchical attention module, calculating the correlation among different layers, and outputting a hierarchical attention module feature map;
s5, integrating the final feature map, the hierarchical attention module feature map and the shallow layer initial feature to obtain a feature map of a fog-free image, and recovering the fog-free image through a layer of convolution operation;
s6, constructing a data set comprising indoor and outdoor foggy images synthesized from fogless images, and taking the foggy images and the corresponding fogless images as image pairs for the training set of the global feature fusion attention network model;
s7, performing data enhancement on the training set, inputting the foggy image blocks into the global feature fusion attention network model every time to obtain a defogged image, calculating the loss of the generated fogless image and the real fogless image by using a loss function, then reversely updating the weight of the global feature fusion attention network model, and repeating the steps to obtain the trained global feature fusion attention network model;
and S8, inputting the fog image into the trained global feature fusion attention network model, wherein the output image is the image after defogging.
Step S1 is specifically as follows:
S1-1: as shown in FIG. 2, the feature attention network structure, namely the "skeleton" part of the model of the invention, is constructed; this part follows the network architecture of the feature fusion attention network defogging method (FFA-Net), where Hazy Image denotes the input foggy image and Haze-free Image denotes the output fogless image.
S1-2: the first convolution of the structure is used to extract the shallow initial features F_0. Subsequently, the invention includes N Group Architectures, with N set to 3; each group architecture further includes M Basic Block Structures combined with a local residual learning module, with M set to 19 in the invention. At the end of the network, the invention adds a recovery part that uses a two-layer convolution; the features extracted by the one-layer convolution after the N group architectures are denoted FG_out.
S1-3: the Basic Block Structure is shown in fig. 3 and is composed of a local residual learning module and a feature attention module. Local residual learning can bypass less important information through multiple local residual connections, allowing the network to focus on important information.
Step S2 is specifically as follows:
s2-1: the feature attention module was constructed, and its structure is shown in fig. 4. The operation of the channel attention module may be represented by the following process. Firstly, the output of the feature graph after the global Average Pooling (Average Pooling) operation is put into two convolution layers, sigmoid and ReLu activation functions to obtain the weights of different channels of the feature graph, as shown in formulas (1) and (2),
CAc=σ(Conv(δ(Conv(gc)))), (1)
wherein, Xc(i, j) represents the value of the c channel of the feature map at the (i, j) position, and the feature map size is H × W, HpRepresents the global average pooling operation, σ represents the sigmoid function, and δ represents the ReLu function. Then, the obtained c channel weight CAcC channel F of original characteristic diagramcMultiplication by elements, as shown in formula (3),
after all the channels are subjected to the above operation, we obtain a characteristic diagram F after the channels are weighted*。
Step S3 is specifically as follows:
S3-1: the pixel attention module takes F* as input; the pixel weight PA, of size 1 × H × W, is obtained through two convolution layers with sigmoid and ReLU activation functions, and PA is multiplied element-wise with the feature map F* to obtain the feature attention module output F̃, as shown in formulas (4) and (5):
PA = σ(Conv(δ(Conv(F*)))), (4)
F̃ = F* ⊗ PA, (5)
S3-2: the feature map F̃ of the feature attention module is fed into the next basic block as the output of the current basic block, and after the N group architectures and one layer of convolution the final feature map FG_out is output.
Step S4 is specifically as follows:
s4-1: as shown in fig. 5, a hierarchical attention module (LAM) is added, and combined with the built feature attention module, a Global Feature Fusion Attention Module (GFFAM) is constructed herein. In the invention, the network structure of the global feature fusion attention network is divided into an upper part and a lower part based on a large residual error network, the lower part is a skeleton of the network, and feature extraction is mainly carried out based on a feature attention module; the top part is the level attention module, which is responsible for weighting the features of different levels.
S4-2: as shown in fig. 6, a hierarchical attention module structure is input with the extracted intermediate feature group FG for the N cluster architectures, whose dimensions are N × H × W × C. Then, the feature group reshape is formed into a two-dimensional matrix of NxHWC, and then, the correlation among different layers is calculated by applying operations such as transposition, matrix multiplication and the like to obtain an NxN relation matrixAs shown in the formula (6) in detail,
wherein μ (·) represents Softmax,representing a reshape operation. Finally, multiplying the characteristic group after reshape with the incidence matrix, adding a scale factor alpha, and adding the original input characteristic to obtain the final output characteristic FLAs shown in the formula (7),
where α is initialized to 0 and is automatically updated by the network during subsequent training. For the convenience of later feature fusion, FL also needs to be reshaped into a feature of size H × W × NC. After the features extracted by the group architectures of different layers are weighted by the hierarchical attention module, the network can pay more attention to the group architectures with richer feature information.
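Formulas (6)–(7) can be sketched as follows; applying Softmax row-wise over the correlation matrix is an assumption about how μ(·) is meant here:

```python
import numpy as np

def layer_attention(FG, alpha):
    """Hierarchical (layer) attention, formulas (6)-(7).

    FG: intermediate feature group of shape (N, H, W, C).
    Reshapes FG to N x HWC, builds the N x N inter-layer correlation
    matrix with a row-wise softmax, then applies the scaled attention
    plus a residual connection back to the input.
    """
    N, H, W, C = FG.shape
    flat = FG.reshape(N, -1)                           # reshape to N x HWC
    corr = flat @ flat.T                               # inter-layer correlations
    e = np.exp(corr - corr.max(axis=1, keepdims=True)) # numerically stable softmax
    Wla = e / e.sum(axis=1, keepdims=True)             # formula (6), rows sum to 1
    out = alpha * (Wla @ flat) + flat                  # formula (7)
    return out.reshape(N, H, W, C)

rng = np.random.default_rng(2)
FG = rng.standard_normal((3, 4, 4, 8))
FL = layer_attention(FG, alpha=0.0)  # alpha starts at 0, so initially FL equals FG
```

With α initialized to 0, the module is an identity at the start of training, and the network gradually learns how much inter-layer mixing to apply.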
Step S5 is specifically as follows:
S5-1: At the end of the network, the method integrates, by element-wise summation, the feature FL extracted by the hierarchical attention module, the final feature FGout processed by the N group architectures, and the shallow initial feature F0 extracted by the first convolution.
S5-2: after all the characteristics are integrated, a fog-free image characteristic diagram can be obtained, and a fog-free image I can be recovered through a layer of convolution operationhaze-freeAs shown in the formula (8),
Ihaze-free = Conv(FL + FGout + F0), (8)
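The fusion and recovery step of formula (8) reduces to an element-wise sum followed by a channel-mixing 1 × 1 convolution; a minimal sketch, with the final convolution modeled as an assumed (3, C) matrix Wout:

```python
import numpy as np

def recover_image(FL, FGout, F0, Wout):
    """Formula (8): sum the three feature maps element-wise and map the
    C channels to a 3-channel image; the 1x1 convolution is modeled
    here as the channel-mixing matrix Wout of shape (3, C)."""
    fused = FL + FGout + F0                     # element-wise feature integration
    C, H, W = fused.shape
    return (Wout @ fused.reshape(C, -1)).reshape(3, H, W)

rng = np.random.default_rng(3)
FL, FGout, F0 = (rng.standard_normal((8, 4, 4)) for _ in range(3))
Wout = rng.standard_normal((3, 8))
img = recover_image(FL, FGout, F0, Wout)
```

Because the fusion is a plain sum, the order of the three feature maps does not matter; what matters is that shallow (F0), backbone (FGout) and layer-weighted (FL) information all reach the final convolution.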
step S6 is specifically as follows:
S6-1: A data set is constructed; the RESIDE data set is selected, which contains indoor and outdoor foggy images synthesized from fog-free images, with the global atmospheric light ranging from 0.8 to 1.0 and the scattering coefficient ranging from 0.04 to 0.2. The foggy images and the corresponding fog-free images are paired and used as the training set of the network model.
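RESIDE-style pairs are synthesized with the standard atmospheric scattering model I(x) = J(x)·t(x) + A·(1 − t(x)) with t(x) = e^(−β·d(x)); the sketch below assumes the 0.04–0.2 range refers to the scattering coefficient β and uses a random stand-in depth map:

```python
import numpy as np

def synthesize_hazy(J, depth, beta, A):
    """Atmospheric scattering model: I = J*t + A*(1 - t), t = exp(-beta*depth).

    J: clean image (H, W, 3) with values in [0, 1]; depth: (H, W);
    A drawn from [0.8, 1.0] and beta from [0.04, 0.2] per the text.
    """
    t = np.exp(-beta * depth)[..., None]  # per-pixel transmission map, broadcast over RGB
    return J * t + A * (1.0 - t)

rng = np.random.default_rng(4)
J = rng.uniform(0.0, 1.0, size=(4, 4, 3))
depth = rng.uniform(0.0, 50.0, size=(4, 4))
I_hazy = synthesize_hazy(J, depth, beta=0.1, A=0.9)
```

At zero depth the transmission is 1 and the synthesized image equals the clean one; as depth grows, pixels fade toward the atmospheric light A.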
Step S7 is specifically as follows:
S7-1: First, data enhancement is performed on the training set. Each time, 240 × 240 foggy image blocks are input into the built network model to obtain dehazed images; the loss between the generated fog-free image and the real fog-free image is calculated with the constructed loss function, the weights of the network model are then updated by back-propagation, and this step is repeated 1000 times to obtain the trained defogging model.
S7-2: the loss function of the present invention uses the L1 loss function, as shown in equation (8),
where HFFA(·) represents the network model of the invention, Θ represents the parameters of the model, Igt represents the real fog-free image, and Ihaze represents the input foggy image.
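The L1 objective of formula (9) is a mean absolute error between the network output and the ground truth; a two-line sketch (mean reduction assumed):

```python
import numpy as np

def l1_loss(dehazed, gt):
    """Formula (9): mean absolute error between the dehazed output and
    the ground-truth fog-free image (arrays of the same shape)."""
    return np.abs(dehazed - gt).mean()

loss = l1_loss(np.array([1.0, 2.0]), np.array([0.0, 0.0]))  # (|1| + |2|) / 2 = 1.5
```

In a training loop this scalar would be the quantity back-propagated through the network to update Θ.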
Correspondingly, the present invention further provides an image defogging system based on the global feature fusion attention network, as shown in fig. 7, including:
the model building unit 1 builds a feature attention network structure, each group structure comprises a basic block structure, the basic block structure is composed of a local residual error learning module and a feature attention module, and a layer attention module and a feature attention module are added to be combined to build a global feature fusion attention network model.
Specifically, a feature attention network structure is built. The first convolution layer of the structure extracts the shallow initial features, and it is followed by N group architectures, each comprising M basic block structures. Each basic block structure is composed of a local residual learning module and a feature attention module, and each feature attention module is composed of a channel attention module and a pixel attention module, which weight the different channels and different pixels of the feature map so as to highlight important feature information. A hierarchical attention module is added and combined with the feature attention modules to construct the global feature fusion attention network model; it weights the feature maps extracted by the group architectures of each layer so that the network focuses on the group architecture layers with richer feature information. The intermediate feature group FG, formed by combining the intermediate features extracted by the N group architectures, is input into the hierarchical attention module to calculate the correlation between different layers and to output the hierarchical attention module feature map, which is then used in an integrating convolution to recover the fog-free image.
The model training unit 2 constructs a data set, performs data enhancement, calculates the loss with the loss function, then updates the weights of the global feature fusion attention network model by back-propagation, and repeats these steps to obtain the trained global feature fusion attention network model.
Specifically, the data set comprises indoor and outdoor foggy images synthesized from fog-free images, which serve as the training set of the global feature fusion attention network model. Data enhancement is performed on the training set, and foggy image blocks are input into the global feature fusion attention network model each time to obtain dehazed images; the loss between the generated fog-free image and the real fog-free image is calculated with the loss function, the weights of the global feature fusion attention network model are then updated by back-propagation, and these steps are repeated to obtain the trained model.
The defogging display unit 3 is used for inputting the foggy image into the trained global feature fusion attention network model; the output image is the defogged image.
Specifically, the image with fog is input into the trained global feature fusion attention network model, and the output image is the image after defogging.
Therefore, the invention constructs a novel global feature fusion attention module to model the feature information across layers, channels and pixels, jointly improving the network's attention to important information and realizing the restoration of a foggy image into a fog-free image. Meanwhile, the global feature fusion attention module comprises a channel attention module and a pixel attention module, so feature weights can be learned adaptively and more weight is given to important features; when processing different types of information, the module provides extra flexibility, allowing the network to concentrate more attention on pixels with heavier fog and on more important channel information. Finally, the invention introduces a hierarchical attention module (LAM) into the global feature fusion attention module to learn the weights of hierarchical features by considering the correlation between multi-scale layers.
Correspondingly, the invention also provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the image defogging method based on the global feature fusion attention network. The invention also provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps of the above image defogging method.
The image defogging method, system, device and storage medium based on the global feature fusion attention network provided by the embodiment of the invention are described in detail, and a specific example is applied to explain the principle and the implementation of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (13)
1. An image defogging method based on a global feature fusion attention network is characterized by comprising the following steps:
constructing a feature attention network structure, wherein the first convolution layer of the structure is used for extracting shallow initial features and is followed by N group architectures, each group architecture comprises M basic block structures, each basic block structure consists of a local residual learning module and a feature attention module, and each feature attention module consists of a channel attention module and a pixel attention module;
inputting the feature map after global average pooling operation into two convolution layers and an activation function of the channel attention module, outputting weights of different channels of the feature map, multiplying the weights and the feature map by elements, and operating all the channels to obtain a feature map after channel weighting;
inputting the feature map weighted by the channel into two convolution layers and an activation function of the pixel attention module to obtain pixel weight, multiplying the pixel weight by the feature map weighted by the channel according to elements, outputting the feature map of the feature attention module, inputting the feature map into a next basic block, and outputting a final feature map after the N group architectures and one layer of convolution;
adding a hierarchical attention module combined with the feature attention module to construct a global feature fusion attention network model, inputting an intermediate feature group FG formed by combining the intermediate features extracted by the N group architectures into the hierarchical attention module, calculating the correlation between different layers, and outputting a hierarchical attention module feature map;
integrating the final feature map, the level attention module feature map and the shallow layer initial features to obtain a feature map of a fog-free image, and recovering the fog-free image through a layer of convolution operation;
constructing a data set, wherein the data set comprises indoor and outdoor foggy images synthesized from fog-free images, and the foggy images and the corresponding fog-free images are paired to serve as the training set of the global feature fusion attention network model;
performing data enhancement on the training set, inputting a foggy image block into the global feature fusion attention network model each time to obtain a defogged image, calculating the loss between the generated fog-free image and the real fog-free image by using a loss function, then reversely updating the weights of the global feature fusion attention network model, and repeating the steps to obtain the trained global feature fusion attention network model;
and inputting the fog images into the trained global feature fusion attention network model, wherein the output images are the defogged images.
2. The image defogging method based on the global feature fusion attention network as claimed in claim 1, wherein the feature attention network structure is constructed, a first layer convolution of the structure is used for extracting shallow initial features, subsequently, the structure includes N group architectures, each group architecture includes M basic block structures, the basic block structure is composed of a local residual learning and feature attention module, the feature attention module is composed of a channel attention module and a pixel attention module, and specifically:
the first convolution of the feature attention network structure is used for extracting the shallow initial feature F0; it is followed by N group architectures, and N is set to 3;
each group architecture further comprises M basic block structures combined with a local residual learning module; M is set to 19 in the invention, and a recovery part consisting of two convolution layers is added at the end of the feature attention network;
the basic block structure is composed of a local residual learning module and a feature attention module; multiple local residual connections allow less important information to be bypassed so that the feature attention network focuses on important information, and the feature attention module is composed of a channel attention module and a pixel attention module.
3. The image defogging method based on the global feature fusion attention network as claimed in claim 1, wherein the feature map is input into the two convolution layers and the activation function of the channel attention module after being subjected to the global average pooling operation, the weights of different channels of the feature map are output, the weights and the feature map are multiplied by elements, and all the channels are operated to obtain the weighted feature map, specifically:
after the feature map is subjected to the global average pooling operation, the result is passed through the two convolution layers and the sigmoid and ReLU activation functions of the channel attention module to obtain the weights of the different channels of the feature map, as shown in the formulas:
gc = Hp(Fc) = (1/(H × W)) Σi=1..H Σj=1..W Xc(i, j)

CAc = σ(Conv(δ(Conv(gc))))
wherein Xc(i, j) represents the value of the c-th channel of the feature map at position (i, j), the feature map size is H × W, Hp represents the global average pooling operation, σ represents the sigmoid function, and δ represents the ReLU function;
multiplying the obtained c-th channel weight CAc element-wise with the c-th channel Fc of the original feature map, as shown in the formula:

F*c = CAc ⊗ Fc

after all the channels are subjected to the above operation, the channel-weighted feature map F* is obtained.
4. The image defogging method based on the global feature fusion attention network as claimed in claim 3, wherein the feature map weighted by the channel is input into two convolution layers and an activation function of the pixel attention module to obtain a pixel weight, the pixel weight is multiplied by the feature map weighted by the channel according to elements to output the feature map of the feature attention module, the feature map is input into a next basic block, and a final feature map is output after the convolution of the N group architectures and one layer, specifically:
the pixel attention module takes the channel-weighted feature map F* as input;
obtaining a pixel weight PA of size 1 × H × W through two convolution layers and the sigmoid and ReLU activation functions;

multiplying the pixel weight PA element-wise with the channel-weighted feature map F* to obtain the feature map F̃ of the feature attention module, as shown in the formulas:

PA = σ(Conv(δ(Conv(F*))))

F̃ = F* ⊗ PA.
5. The image defogging method based on the global feature fusion attention network as claimed in claim 1, wherein the add level attention module is combined with the feature attention module to construct a global feature fusion attention network model, an intermediate feature group FG formed by combining intermediate features extracted from the N group architectures is input to the level attention module to calculate the correlation between different layers, and a level attention module feature map is output, specifically:
inputting the intermediate feature group FG formed by combining the intermediate features extracted by the N group architectures, whose dimensions are N × H × W × C, into the hierarchical attention module;
reshaping the feature group into a two-dimensional matrix of N × HWC, and then applying transposition, matrix multiplication and similar operations to calculate the correlation between different layers, obtaining the N × N relation matrix Wla, the specific formula being:

Wla = μ(φ(FG) × φ(FG)ᵀ)
wherein μ (·) denotes a Softmax function,denotes a reshape operation, wi,jRepresenting a correlation index between the ith and jth layer features;
multiplying the reshaped feature group by the correlation matrix Wla, scaling by a factor α and adding the original input feature group, and outputting the hierarchical attention module feature map FL, as shown in the formula:

FL = α (Wla × φ(FG)) + FG

where α is a scale factor initialized to 0 and automatically updated by the network in subsequent training.
6. The method according to claim 2, 4 or 5, wherein the final feature map, the hierarchical attention module feature map and the shallow layer initial features are integrated to obtain a feature map of a fog-free image, and the fog-free image is restored through a layer of convolution operation, specifically:
integrating, by element-wise summation, the final feature map FGout, the hierarchical attention module feature map FL and the shallow initial feature F0 to obtain the feature map of the fog-free image;

recovering the fog-free image Ihaze-free through one layer of convolution, as shown in the formula,

Ihaze-free = Conv(FL + FGout + F0).
7. the image defogging method based on the global feature fusion attention network as claimed in claim 1, wherein the data enhancement is performed on the training set, each time a foggy image block is taken and input into the global feature fusion attention network model to obtain a defogged image, the loss of the generated fogless image and the loss of the real fogless image are calculated by using a loss function, then the weight of the global feature fusion attention network model is reversely updated, and the step is repeated to obtain the trained global feature fusion attention network model, specifically:
performing data enhancement on the training set, and inputting 240 × 240 foggy image blocks into the built network model each time to obtain dehazed images;
calculating the loss between the generated fog-free image and the real fog-free image with the loss function, and then reversely updating the weights of the global feature fusion attention network model, the loss function being the L1 loss function, as shown in the formula:

L(Θ) = ||Igt − HFFA(Ihaze; Θ)||1

wherein HFFA(·) represents the network model of the invention, Θ represents the parameters of the model, Igt represents the real fog-free image, and Ihaze represents the input foggy image;
and repeating the steps to obtain the trained global feature fusion attention network model.
8. An image defogging system based on a global feature fusion attention network, characterized in that the system comprises:
the model building unit is used for building a feature attention network structure, wherein each group architecture comprises basic block structures, each basic block structure consists of a local residual learning module and a feature attention module, and a hierarchical attention module is added and combined with the feature attention module to build a global feature fusion attention network model;
the model training unit is used for constructing a data set for data enhancement, calculating loss by using a loss function, then reversely updating the weight of the global feature fusion attention network model, and repeating the step to obtain the trained global feature fusion attention network model;
and the defogging display unit is used for inputting the defogged image into the trained global feature fusion attention network model, and the output image is the defogged image.
9. The image defogging system based on the global feature fusion attention network as claimed in claim 8, wherein the model building unit builds a feature attention network structure, the first convolution layer of the structure is used for extracting shallow initial features and is followed by N group architectures, each group architecture comprises M basic block structures, each basic block structure consists of a local residual learning module and a feature attention module, and each feature attention module consists of a channel attention module and a pixel attention module, which weight different channels and different pixels of a feature map and highlight important feature information; a hierarchical attention module is added and combined with the feature attention module to build the global feature fusion attention network model, and the feature maps extracted by each layer of group architecture are weighted so that the network focuses on the group architecture layers with richer feature information; an intermediate feature group FG formed by combining the intermediate features extracted by the N group architectures is input into the hierarchical attention module, the correlation between different layers is calculated, a hierarchical attention module feature map is output, and an integrating convolution is performed by using the feature map to recover a fog-free image.
10. The image defogging system based on the global feature fusion attention network as claimed in claim 8, wherein in the model training unit, the data set comprises indoor and outdoor foggy images synthesized from fog-free images, which serve as the training set of the global feature fusion attention network model; data enhancement is performed on the training set, a foggy image block is input into the global feature fusion attention network model each time to obtain a defogged image, the loss between the generated fog-free image and the real fog-free image is calculated by using a loss function, the weights of the global feature fusion attention network model are then reversely updated, and the steps are repeated to obtain the trained global feature fusion attention network model.
11. The image defogging system according to claim 8, wherein the defogging display unit inputs the foggy image into the trained global feature fusion attention network model, and the output image is the defogged image.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110576365.9A CN113344806A (en) | 2021-07-23 | 2021-07-23 | Image defogging method and system based on global feature fusion attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113344806A true CN113344806A (en) | 2021-09-03 |
Family
ID=77471476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110576365.9A Pending CN113344806A (en) | 2021-07-23 | 2021-07-23 | Image defogging method and system based on global feature fusion attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113344806A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111445418A (en) * | 2020-03-31 | 2020-07-24 | 联想(北京)有限公司 | Image defogging method and device and computer equipment |
CN111539888A (en) * | 2020-04-21 | 2020-08-14 | 温州大学 | Neural network image defogging method based on pyramid channel feature attention |
CN111539887A (en) * | 2020-04-21 | 2020-08-14 | 温州大学 | Neural network image defogging method based on mixed convolution channel attention mechanism and layered learning |
CN111915531A (en) * | 2020-08-06 | 2020-11-10 | 温州大学 | Multi-level feature fusion and attention-guided neural network image defogging method |
Non-Patent Citations (2)
Title |
---|
BEN NIU ET AL: "Single Image Super-Resolution via a Holistic Attention Network", 《ECCV 2020》 * |
XU QIN ET AL: "FFA-Net: Feature Fusion Attention Network for Single Image Dehazing", 《THE THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-20)》 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113689517A (en) * | 2021-09-08 | 2021-11-23 | 云南大学 | Image texture synthesis method and system of multi-scale channel attention network |
CN113689517B (en) * | 2021-09-08 | 2024-05-21 | 云南大学 | Image texture synthesis method and system for multi-scale channel attention network |
US11663705B2 (en) * | 2021-09-17 | 2023-05-30 | Nanjing University Of Posts And Telecommunications | Image haze removal method and apparatus, and device |
US20230089280A1 (en) * | 2021-09-17 | 2023-03-23 | Nanjing University Of Posts And Telecommunications | Image haze removal method and apparatus, and device |
CN113888430A (en) * | 2021-09-30 | 2022-01-04 | 北京达佳互联信息技术有限公司 | Image processing method and device and model training method and device |
CN114022371A (en) * | 2021-10-22 | 2022-02-08 | 中国科学院长春光学精密机械与物理研究所 | Defogging device and defogging method based on space and channel attention residual error network |
CN114022371B (en) * | 2021-10-22 | 2024-04-05 | 中国科学院长春光学精密机械与物理研究所 | Defogging device and defogging method based on space and channel attention residual error network |
CN114078230A (en) * | 2021-11-19 | 2022-02-22 | 西南交通大学 | Small target detection method for self-adaptive feature fusion redundancy optimization |
CN114078230B (en) * | 2021-11-19 | 2023-08-25 | 西南交通大学 | Small target detection method for self-adaptive feature fusion redundancy optimization |
CN114119420A (en) * | 2021-12-01 | 2022-03-01 | 昆明理工大学 | Fog image defogging method in real scene based on fog migration and feature aggregation |
CN114648467A (en) * | 2022-05-18 | 2022-06-21 | 中山大学深圳研究院 | Image defogging method and device, terminal equipment and computer readable storage medium |
CN115063304B (en) * | 2022-05-19 | 2023-08-25 | 湖南师范大学 | Multi-size fused pyramid neural network image defogging method and system |
CN115063304A (en) * | 2022-05-19 | 2022-09-16 | 湖南师范大学 | End-to-end multi-size fusion-based pyramid neural network image defogging method and system |
WO2024113945A1 (en) * | 2022-12-01 | 2024-06-06 | 北京航天自动控制研究所 | Neural network having efficient channel attention (eca) mechanism |
CN115860271A (en) * | 2023-02-21 | 2023-03-28 | 杭州唛扑网络科技有限公司 | System and method for managing art design scheme |
CN115908206A (en) * | 2023-03-13 | 2023-04-04 | 中国石油大学(华东) | Remote sensing image defogging method based on dynamic characteristic attention network |
CN117576536A (en) * | 2024-01-18 | 2024-02-20 | 佛山科学技术学院 | Foggy image fusion model and method |
CN117576536B (en) * | 2024-01-18 | 2024-04-23 | 佛山科学技术学院 | Foggy image fusion model and method |
CN117951749A (en) * | 2024-03-27 | 2024-04-30 | 青岛文达通科技股份有限公司 | Federal multitask learning method based on dynamic guiding attention |
CN117951749B (en) * | 2024-03-27 | 2024-06-07 | 青岛文达通科技股份有限公司 | Federal multitask learning method based on dynamic guiding attention |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113344806A (en) | Image defogging method and system based on global feature fusion attention network | |
Zhang et al. | Deep dense multi-scale network for snow removal using semantic and depth priors | |
CN112396607B (en) | Deformable convolution fusion enhanced street view image semantic segmentation method | |
CN111915530B (en) | End-to-end-based haze concentration self-adaptive neural network image defogging method | |
CN113673590B (en) | Rain removing method, system and medium based on multi-scale hourglass dense connection network | |
CN112132023A (en) | Crowd counting method based on multi-scale context enhanced network | |
CN112365414B (en) | Image defogging method based on double-path residual convolution neural network | |
CN112991350B (en) | RGB-T image semantic segmentation method based on modal difference reduction | |
CN110570363A (en) | Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator | |
CN109753959B (en) | Road traffic sign detection method based on self-adaptive multi-scale feature fusion | |
CN112686119B (en) | License plate motion blurred image processing method based on self-attention generation countermeasure network | |
CN112581409B (en) | Image defogging method based on end-to-end multiple information distillation network | |
CN114742719A (en) | End-to-end image defogging method based on multi-feature fusion | |
CN116052016A (en) | Fine segmentation detection method for remote sensing image cloud and cloud shadow based on deep learning | |
CN113128424A (en) | Attention mechanism-based graph convolution neural network action identification method | |
CN114820408A (en) | Infrared and visible light image fusion method based on self-attention and convolutional neural network | |
CN114627269A (en) | Virtual reality security protection monitoring platform based on degree of depth learning target detection | |
CN117576402B (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
CN114494699A (en) | Image semantic segmentation method and system based on semantic propagation and foreground and background perception | |
CN113628143A (en) | Weighted fusion image defogging method and device based on multi-scale convolution | |
CN111311732B (en) | 3D human body grid acquisition method and device | |
CN110782503B (en) | Face image synthesis method and device based on two-branch depth correlation network | |
CN113763417A (en) | Target tracking method based on twin network and residual error structure | |
CN117115616A (en) | Real-time low-illumination image target detection method based on convolutional neural network | |
Zhou et al. | Multi-scale and attention residual network for single image dehazing |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210903 |