CN117495718A - Multi-scale self-adaptive remote sensing image defogging method - Google Patents

Multi-scale self-adaptive remote sensing image defogging method

Info

Publication number
CN117495718A
Authority
CN
China
Prior art keywords
remote sensing
image
sensing image
different
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311463096.0A
Other languages
Chinese (zh)
Inventor
王新华
蒿乾坤
李壮
曹金鹏
宋向阳
杨俊杰
郑慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Electric Power University
Original Assignee
Northeast Dianli University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Dianli University filed Critical Northeast Dianli University
Priority to CN202311463096.0A
Publication of CN117495718A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/13 - Satellite images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10032 - Satellite or aerial image; Remote sensing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep-learning-based, multi-scale adaptive remote sensing image defogging method. It aims to recover clear image detail and scene information from remote sensing images blurred by atmospheric scattering. The method uses dilated convolutions to extract multi-scale features, obtaining more comprehensive feature information, and introduces an adaptive attention mechanism that efficiently extracts the important information at each scale, thereby producing higher-quality defogged remote sensing images. The method effectively restores the detail and scene information of remote sensing images, has strong robustness and generalization ability, and is applicable to remote sensing images of different scenes. Its technical contribution applies not only to remote sensing image defogging but also provides strong support for downstream image processing tasks such as remote sensing image target detection and target recognition.

Description

Multi-scale self-adaptive remote sensing image defogging method
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a multi-scale adaptive remote sensing image defogging method.
Background
Over the past decades, researchers have proposed many methods and techniques for defogging remote sensing images, which fall into the following categories. Methods based on physical models estimate the transmittance and restore the image by modeling the physical properties of haze with an atmospheric scattering model. Methods based on prior knowledge use image statistics and priors to estimate the transmittance and restore the image. In recent years, deep learning methods have made remarkable progress on the remote sensing image defogging task: they learn an image mapping function with a convolutional neural network (CNN) or a generative adversarial network (GAN) to remove haze and restore a clear image. Compared with defogging based on the atmospheric scattering model and prior knowledge, deep-learning-based defogging has several advantages. It learns feature representations and mapping functions from large amounts of data, so it is strongly adaptive: complex image characteristics and haze removal patterns are learned from the training samples, which suits different scenes and different degrees of haze. It supports end-to-end learning, generating the defogged image directly from the input hazy image without relying on an atmospheric scattering model or prior knowledge, which improves processing efficiency and better preserves image details and structures. A well-trained deep learning model also generalizes well and can defog new remote sensing images. In addition, transfer learning and pre-trained models can accelerate training and improve the defogging effect.
In summary, domestic and international research on deep-learning-based remote sensing image defogging has produced solid theoretical results, but several problems remain to be solved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-scale adaptive remote sensing image defogging method to solve the above technical problems. The proposed method can defog remote sensing images in different scenes, handles multi-scale feature extraction of remote sensing images more effectively, and improves the quality of the reconstructed remote sensing image.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the utility model provides a multiscale self-adaptive remote sensing image defogging method, which comprises the following steps:
step 1) preprocessing a remote sensing image dataset:
Taking remote sensing images collected from Google Earth as an example, the targets comprise 10 classes: aircraft, baseball field, basketball court, bridge, intersection, track and field ground, harbor, ship, tennis court, and car. The sample size is relatively uniform across classes, each class having about 4000 samples.
Paired haze remote sensing images are generated from the collected images by simulation: according to the atmospheric scattering model, different values of parameters such as atmospheric concentration, scattering coefficient and visibility are set to simulate remote sensing images with different degrees of haze.
The specific parameter settings vary with the actual requirements. Atmospheric concentration: a value between 0.01 (lower concentration) and 0.1 (higher concentration), indicating the concentration of particles in the atmosphere. Scattering coefficient: a value between 0.1 (weaker scattering) and 0.9 (stronger scattering), indicating the scattering intensity of light in the atmosphere. Visibility: a value between 100 (better visibility) and 1000 (worse visibility), indicating the visibility in the atmosphere.
Step 2) setting a training set, a validation set and a test set:
From the 10 categories of the paired remote sensing dataset, 1000 images are selected per category, and haze images are synthesized with the different atmospheric concentration, scattering coefficient and visibility parameters of step 1). The images are divided in a 7:2:1 ratio into a 7000-image training set, a 2000-image validation set and a 1000-image test set.
Step 3) a multi-scale self-adaptive feature extraction algorithm:
the expansion convolution with different expansion ratios is used for realizing multi-scale feature extraction, and an adaptive channel attention module is introduced after each expansion convolution with different sizes, wherein the module comprises a channel attention mechanism, a spatial attention mechanism and a pixel attention mechanism.
The channel attention mechanism adjusts the feature importance of the input feature map adaptively in the channel dimension. The spatial attention module adjusts the attention weights of different spatial locations adaptively. The pixel attention module adjusts the attention weights of different pixels adaptively. The three aims to improve the capability of the network in the aspects of sensing an atomization area and reserving detailed textures, thereby improving the quality of remote sensing images.
Step 4) local color consistency loss function:
an RGB color space is selected to divide the image into a plurality of regions so as to calculate a color difference within each region. First, for each region, the color difference between its internal pixels is calculated. Second, converting the color differences of the regions into loss values, the different color differences may be weighted by normalizing, squaring, etc. the color differences.
Step 5) residual learning method:
in the network model, a residual connection mode is adopted, jump connection is introduced into the network, and the original input and the output of the middle layer are added, so that the gradient can be directly transmitted back to the earlier layer, the problems of gradient disappearance and gradient explosion are avoided, and the convergence process is accelerated. In addition, the residual connection can pass the original input information directly to the subsequent layer, preserving more detail and feature information. This helps to improve the network's awareness of the details and reduce the loss of information.
Further, the preprocessing of the remote sensing image dataset in step 1) comprises the following steps:
The original images are processed with the designed atmospheric scattering model to output the corresponding haze images, which are stored in a new folder; each generated picture is named <original image name>_dehaze. The atmospheric scattering model formula is:
I(x)=J(x)t(x)+A(1-t(x));
where I(x) is the synthesized hazy image, J(x) is the target haze-free image, t(x) is the medium transmittance map, which depends on unknown depth information, and A is the global atmospheric light value.
Further, the step 2) setting of the training, validation and test sets includes:
(1) The 10,000 remote sensing images of the 10 selected categories are divided according to the ratio training set : validation set : test set = 7:2:1.
(2) The order of the images is randomly shuffled, to ensure the randomness of the dataset and avoid any effect of ordering on model training and evaluation.
(3) Using Python, all remote sensing images are cropped to 512x512 and divided into the corresponding sets according to the ratio.
(4) The training, validation and test sets are mutually exclusive; the same image never appears in more than one set.
Further, the multi-scale adaptive feature extraction algorithm of step 3) includes:
(1) Dilated convolutions with dilation rates of 1, 2 and 4 are selected and applied to the input feature map; a ReLU activation after each convolution layer introduces a nonlinear transformation, which improves the model's expressive capacity and lets it learn more complex and abstract feature representations.
(2) Global average pooling reduces each channel to a single value; the pooled result is fed into convolution layers that learn the channel, spatial and pixel attention weights; a Sigmoid function limits the weights to between 0 and 1; and the weights are multiplied with the feature map to adaptively adjust the importance of its different channels, spatial locations and pixels.
(3) Each of the three groups of dilated convolutions with different dilation rates is combined with its own adaptive attention module, so that each group focuses on the important feature information at its own scale.
Further, the local color consistency loss function of step 4) includes:
(1) The input remote sensing image is divided into 32x32 local regions.
(2) The mean pixel value is computed within each corresponding region of the generated image and the label image.
(3) The per-region pixel means are converted into loss values.
(4) The per-region loss values are averaged to obtain the loss value for the whole image.
(5) During model training, this loss constrains the color of the generated remote sensing image.
Further, the residual learning method of step 5) includes:
(1) The three groups of dilated convolution modules use residual connections to better capture multi-scale features and improve the model's performance on complex tasks.
(2) A residual connection is also used between the input image and the final output; by connecting shallow features to deep features, the information of the shallow layers is passed directly to the convolution layer before the output layer, introducing more low-level detail and local features so that the model can better capture fine differences between images.
The multi-scale self-adaptive remote sensing image defogging method provided by the invention has the following beneficial effects:
Firstly, remote sensing image preprocessing generates a paired training dataset for coping with complex remote sensing scenes. Secondly, the multi-scale adaptive feature extraction algorithm uses dilated convolutions with different dilation rates to extract richer feature information, and the adaptive attention module introduced after each group of dilated convolutions focuses on the important features at each scale, benefiting subsequent processing. Then, during training, the local color consistency loss averages each divided local region, converts it into a loss value and constrains the model, so that the color of the generated image is closer to that of the label image. Finally, residual learning alleviates the vanishing gradient problem, accelerates training convergence, allows greater network depth and complexity, optimizes parameter updates, enhances feature reuse, and offers good interpretability.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a multi-scale adaptive remote sensing image defogging method;
FIG. 2 is a haze remote sensing image synthesized by an atmospheric scattering model;
FIG. 3 is a haze remote sensing image histogram;
FIG. 4 is a diagram of the overall architecture of the network;
FIG. 5 is a diagram of an adaptive attention module;
FIG. 6 is a schematic diagram of defogging results of a remote sensing image;
FIG. 7 is a defogging result histogram of the remote sensing image.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown.
The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
The hardware environment for implementing the method is as follows: a Tesla T4 graphics card with 16 GB of video memory; the development environment is PyCharm, and the PyTorch framework is used.
As shown in fig. 1-7, the invention provides a multi-scale self-adaptive remote sensing image defogging method, which comprises the following steps:
step 1) preprocessing a remote sensing image dataset:
Taking the HRRSD dataset as an example: it contains 55,740 target instances across 13 classes, with a relatively uniform sample size of about 4000 per class. The 13 classes are: aircraft, baseball field, basketball court, bridge, crossroad, track and field ground, harbor, parking lot, ship, storage tank, T-junction, tennis court, and car.
Haze images are generated from HRRSD by simulation: according to the atmospheric scattering model, different values of parameters such as atmospheric concentration, scattering coefficient and visibility are set to simulate remote sensing images with different degrees of haze.
The specific parameter settings vary with the actual requirements. Atmospheric concentration: a value between 0.01 (lower concentration) and 0.1 (higher concentration), indicating the concentration of particles in the atmosphere. Scattering coefficient: a value between 0.1 (weaker scattering) and 0.9 (stronger scattering), indicating the scattering intensity of light in the atmosphere. Visibility: a value between 100 (better visibility) and 1000 (worse visibility), indicating the visibility in the atmosphere.
Step 2) setting a training set, a validation set and a test set:
From the 13 categories of the HRRSD dataset, ten categories are selected, with 1000 images each. Haze images are synthesized with the different atmospheric concentration, scattering coefficient and visibility parameters of step 1). The images are divided into a 7000-image training set, a 2000-image validation set and a 1000-image test set.
Step 3) a multi-scale self-adaptive feature extraction algorithm:
the expansion convolution with different expansion ratios is used for realizing multi-scale feature extraction, and an adaptive channel attention module is introduced after each expansion convolution with different sizes, wherein the module comprises a channel attention mechanism, a spatial attention mechanism and a pixel attention mechanism.
The channel attention mechanism adjusts the feature importance of the input feature map adaptively in the channel dimension. The spatial attention module adjusts the attention weights of different spatial locations adaptively. The pixel attention module adjusts the attention weights of different pixels adaptively. The three aims to improve the capability of the network in the aspects of sensing an atomization area and reserving detailed textures, thereby improving the quality of remote sensing images.
Step 4) local color consistency loss function:
an RGB color space is selected to divide the image into a plurality of regions so as to calculate a color difference within each region. First, for each region, the color difference between its internal pixels is calculated. Second, converting the color differences of the regions into loss values, the different color differences may be weighted by normalizing, squaring, etc. the color differences.
Step 5) residual learning method:
in the network model, a residual connection mode is adopted, jump connection is introduced into the network, and the original input and the output of the middle layer are added, so that the gradient can be directly transmitted back to the earlier layer, the problems of gradient disappearance and gradient explosion are avoided, and the convergence process is accelerated. In addition, the residual connection can pass the original input information directly to the subsequent layer, preserving more detail and feature information. This helps to improve the network's awareness of the details and reduce the loss of information.
Further, the preprocessing of the remote sensing image dataset in step 1) comprises the following steps:
The original images are processed with the designed atmospheric scattering model to output the corresponding haze images, which are stored in a new folder; each generated picture is named <original image name>_dehaze. The atmospheric scattering model formula is:
I(x)=J(x)t(x)+A(1-t(x));
where I(x) is the synthesized hazy image, J(x) is the target haze-free image, t(x) is the medium transmittance map, which depends on unknown depth information, and A is the global atmospheric light value.
The pseudocode of the haze image synthesis algorithm based on the atmospheric scattering model is as follows:
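The pseudocode itself is not reproduced in the text; the following minimal Python sketch illustrates the synthesis step, in which the function name, the uniform-depth simplification of t(x) and the default parameter values are illustrative assumptions rather than values fixed by the patent:

```python
import os

import cv2
import numpy as np

def synthesize_haze(image_path, out_dir, beta=0.5, atmospheric_light=0.8, depth=1.0):
    """Apply the atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)).

    beta (scattering coefficient), atmospheric_light (A) and the uniform
    scene depth are illustrative defaults; the patent gives only ranges."""
    J = cv2.imread(image_path).astype(np.float32) / 255.0  # clear image J(x)
    t = np.exp(-beta * depth)          # transmittance t(x); uniform depth assumed
    I = J * t + atmospheric_light * (1.0 - t)              # hazy image I(x)

    # Store under a new folder, named "<original name>_dehaze" as in step 1).
    name, ext = os.path.splitext(os.path.basename(image_path))
    os.makedirs(out_dir, exist_ok=True)
    cv2.imwrite(os.path.join(out_dir, f"{name}_dehaze{ext}"),
                (I * 255.0).clip(0, 255).astype(np.uint8))
```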
further, the step 2) setting a training set, a verification set and a test set includes:
(1) A total of 10000 remote sensing images of 10 selected categories are calculated according to a training set: verification set: test set = 7:2: 1.
(2) The order of the images is randomly disturbed. This is to ensure randomness of the data set, avoiding the impact of ordering on model training and evaluation.
(3) Using the python programming language, all telemetry images are cropped to a 512x512 size and the images are divided into corresponding sets according to scale.
(4) The images between the training set, the verification set and the test set are mutually exclusive, and the same image does not appear in different sets at the same time.
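A compact Python sketch of this preprocessing; the directory layout, file pattern, center crop and fixed random seed are illustrative assumptions, since the patent specifies only the shuffle, the 512x512 crop and the 7:2:1 ratio:

```python
import random
from pathlib import Path

import cv2

def split_dataset(image_dir, out_root, seed=42):
    """Shuffle, center-crop to 512x512 and split 7:2:1 into disjoint sets."""
    paths = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)          # step (2): randomize the order

    n = len(paths)
    splits = {"train": paths[: int(0.7 * n)],
              "val":   paths[int(0.7 * n): int(0.9 * n)],
              "test":  paths[int(0.9 * n):]}    # steps (1)/(4): disjoint 7:2:1

    for name, subset in splits.items():
        out_dir = Path(out_root) / name
        out_dir.mkdir(parents=True, exist_ok=True)
        for p in subset:
            img = cv2.imread(str(p))
            h, w = img.shape[:2]
            y, x = (h - 512) // 2, (w - 512) // 2
            cv2.imwrite(str(out_dir / p.name),
                        img[y:y + 512, x:x + 512])  # step (3): 512x512 crop
```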
Further, the multi-scale adaptive feature extraction algorithm of step 3) includes:
(1) Dilated convolutions with dilation rates of 1, 2 and 4 are selected and applied to the input feature map; a ReLU activation after each convolution layer introduces a nonlinear transformation, which improves the model's expressive capacity and lets it learn more complex and abstract feature representations. When l = 1, the dilated convolution performs the same operation as an ordinary convolution. Let F be the feature map, p the coordinates of a point on the feature map, k a (2r+1)x(2r+1) convolution kernel and l the dilation rate, with (F *_l k)(p) denoting the value at point p after convolving the feature map with kernel k; the dilated convolution is then:
(F *_l k)(p) = Σ_{u=-r}^{r} Σ_{v=-r}^{r} F(p + l·(u, v)) k(u, v);
(2) Global average pooling reduces each channel to a single value; the pooled result is fed into convolution layers that learn the channel, spatial and pixel attention weights; a Sigmoid function limits the weights to between 0 and 1; and the weights are multiplied with the feature map to adaptively adjust the importance of its different channels, spatial locations and pixels.
(3) Each of the three groups of dilated convolutions with different dilation rates is combined with its own adaptive attention module, so that each group focuses on the important feature information at its own scale (sketches of these building blocks follow this list).
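The following PyTorch sketches illustrate these building blocks. The channel counts, the parallel-branch arrangement and the bottleneck reduction factor are illustrative assumptions; the text specifies only the dilation rates, the ReLU activations and the pooling-convolution-Sigmoid attention pattern:

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """3x3 dilated convolutions with rates 1, 2 and 4, each followed by a
    ReLU, as in step 3) (1). Whether the three groups run in parallel or in
    sequence is not stated in the text; parallel branches are assumed."""

    def __init__(self, channels=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding = dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in (1, 2, 4)
        ])

    def forward(self, x):
        # One feature map per scale; an attention module follows each branch.
        return [branch(x) for branch in self.branches]

class ChannelAttention(nn.Module):
    """Channel attention as in step 3) (2): global average pooling, a small
    convolutional bottleneck, a Sigmoid gate, then channel-wise rescaling."""

    def __init__(self, channels=64, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # one value per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # weights in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)  # adaptively reweight the channels

class PixelAttention(nn.Module):
    """Pixel attention: a 1x1 convolution plus Sigmoid yields one gate per
    pixel; the spatial attention module follows the same gating pattern."""

    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)
```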
Further, the local color consistency loss function of step 4) includes:
(1) The input remote sensing image is divided into 32x32 local regions.
(2) The mean pixel value is computed within each corresponding region of the generated image and the label image.
(3) The per-region pixel means are converted into loss values.
(4) The per-region loss values are averaged to obtain the loss value for the whole image.
(5) During model training, this loss constrains the color of the generated remote sensing image.
Assembled from the steps above (the formula itself is not reproduced in the text), the expression can be written as:
L_color = (1/N) Σ_{i=1}^{N} (μ_i^G - μ_i^T)²;
where N is the number of 32x32 regions, μ_i^G is the mean pixel value of region i in the generated image and μ_i^T is the mean of the same region in the label image.
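A minimal PyTorch sketch of this loss, assuming the per-region means are taken directly over image pixels (claim 3 instead takes them over an intermediate feature map):

```python
import torch
import torch.nn.functional as F

def local_color_consistency_loss(generated: torch.Tensor,
                                 target: torch.Tensor,
                                 region: int = 32) -> torch.Tensor:
    """Average pixels over each 32x32 region of the generated and label
    images, compare the per-region means with a squared error, and average
    over all regions, following steps (1)-(4) of step 4)."""
    # avg_pool2d with kernel == stride == region yields one mean per region
    mu_gen = F.avg_pool2d(generated, region, stride=region)
    mu_tgt = F.avg_pool2d(target, region, stride=region)
    return F.mse_loss(mu_gen, mu_tgt)  # squared differences, then the mean
```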
further, the residual learning method in step 5) includes:
(1) And the three groups of expansion convolution modules use residual connection to better capture multi-scale characteristics and improve the performance of the model on complex tasks.
(2) Residual connection is used between an input image and the final output, and through residual connection of shallow layer features and deep layer features, information of the shallow layer features is directly transferred to a convolution layer before an output layer, more low-level details and local features are introduced, so that a model can better capture fine differences of images.
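A sketch of the residual pattern described here, with illustrative layer sizes (the patent specifies the connections but not the exact layer configuration):

```python
import torch
import torch.nn as nn

class ResidualDilatedGroup(nn.Module):
    """A group of dilated convolutions wrapped in a residual connection, as
    in step 5) (1): the input is added back to the group's output so that
    gradients and low-level detail flow straight through."""

    def __init__(self, channels=64, dilation=2, blocks=2):
        super().__init__()
        layers = []
        for _ in range(blocks):
            layers += [nn.Conv2d(channels, channels, 3,
                                 padding=dilation, dilation=dilation),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)  # skip connection: output = input + F(input)
```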
To test the reliability of the invention, experiments were performed on data from two different remote sensing scenes; a representation of the remote sensing images is provided in FIG. 6. The two sets of data were input into the network model separately. Before the experiment, each group of remote sensing data underwent the same preprocessing so that both had the same form.
For the overall evaluation and comparison of the experimental results, two common image quality metrics are used: Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM). PSNR measures the difference between the reconstructed image and the original image via their mean squared error: the higher the PSNR, the smaller the difference and the better the image quality. However, because PSNR considers only the mean squared error and ignores human visual perception, it may not accurately reflect the true quality of an image in some cases. SSIM jointly considers brightness, contrast and structural similarity, computing the similarity between two images from these three kinds of information. SSIM lies between 0 and 1, with values closer to 1 indicating higher similarity between the reconstructed and original images. Unlike PSNR, SSIM better captures differences perceived by the human eye because it accounts for the eye's sensitivity to image structure. Using PSNR and SSIM together evaluates the results from different angles: PSNR focuses on pixel-level differences, while SSIM focuses on structural similarity, so considering both gives a more comprehensive assessment of the method's performance.
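A short sketch of how these two metrics might be computed with scikit-image; the patent does not publish its evaluation code (recall PSNR = 10·log10(MAX²/MSE)):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dehazed: np.ndarray, reference: np.ndarray):
    """Return (PSNR, SSIM) for an 8-bit RGB dehazed image against its
    haze-free reference."""
    psnr = peak_signal_noise_ratio(reference, dehazed, data_range=255)
    ssim = structural_similarity(reference, dehazed, data_range=255,
                                 channel_axis=-1)  # last axis holds RGB
    return psnr, ssim
```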
The experimental results are shown in FIG. 7. FIG. 7 shows the effect of dilated-convolution multi-scale feature extraction on remote sensing defogging. It can be observed that multi-scale feature extraction has a significant advantage over single-scale extraction, which uses only feature information at one fixed scale and cannot fully exploit the detail information present at different scales in an image; multi-scale feature extraction can therefore effectively restore the details and structure of an image.
FIG. 7 also shows the effect of the adaptive attention mechanism on remote sensing image defogging. By introducing the attention mechanism, the model processes haze regions more accurately, extracts the key features of the remote sensing image, and generates a clearer and more realistic reconstructed image.
FIG. 7 further shows the effect of residual connections on training speed: with residual connections, the convergence of the model is accelerated.
In conclusion, compared with traditional methods, the proposed method clearly improves visual quality. It does not depend on the traditional atmospheric scattering model or on prior knowledge, is suitable for a variety of complex scenes, and has good applicability and robustness.
The foregoing description covers only the preferred embodiments of the present application and illustrates the technical principles employed. Persons skilled in the art will appreciate that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above; it is intended to cover other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example embodiments in which the above features are replaced by (but not limited to) technical features with similar functions disclosed in this application.
Technical features other than those described in this specification are known to those skilled in the art and, to highlight the innovative features of the invention, are not described here in detail.

Claims (3)

1. A multi-scale self-adaptive remote sensing image defogging method is characterized by comprising the following steps:
step 1) preprocessing a remote sensing image dataset:
taking remote sensing images acquired from Google Earth as an example, the images comprise 10 classes of targets: aircraft, baseball field, basketball court, bridge, intersection, track and field ground, harbor, ship, tennis court, and car; the sample size is relatively balanced across classes, each class having about 4000 samples;
paired haze remote sensing images are generated from the acquired images by simulation, with values of parameters such as atmospheric concentration, scattering coefficient and visibility set according to the atmospheric scattering model to simulate remote sensing images with different degrees of haze;
the specific parameter settings vary with the actual requirements: atmospheric concentration may be set between 0.01 and 0.1, indicating the concentration of particles in the atmosphere; the scattering coefficient may be set between 0.1 and 0.9, indicating the scattering intensity of light in the atmosphere; and visibility may be set between 100 and 1000, indicating the visibility in the atmosphere;
step 2) setting a training set, a validation set and a test set:
from the 10 categories of the paired remote sensing dataset, 1000 images are selected per category, haze images are synthesized with the different atmospheric concentration, scattering coefficient and visibility parameters of step 1), and the images are divided in a 7:2:1 ratio into a 7000-image training set, a 2000-image validation set and a 1000-image test set;
step 3) a multi-scale self-adaptive feature extraction algorithm:
dilated convolutions with different dilation rates are adopted to realize multi-scale feature extraction, and an adaptive attention module is introduced after each dilated convolution of each size, the module comprising a channel attention mechanism, a spatial attention mechanism and a pixel attention mechanism;
the channel attention mechanism adaptively adjusts feature importance along the channel dimension of the input feature map, the spatial attention module adaptively adjusts the attention weights of different spatial locations, and the pixel attention module adaptively adjusts the attention weights of individual pixels; together the three improve the network's ability to perceive hazy regions and preserve detailed textures, thereby improving the quality of the remote sensing image;
step 4) local color consistency loss function:
the RGB color space is selected and the image is divided into several regions so that a color difference can be computed within each region; first, for each region, the color differences between its interior pixels are computed; second, the regional color differences are converted into loss values, and the different color differences can be weighted by normalizing, squaring or similar operations;
step 5) residual learning method:
in the network model, residual connections are adopted: skip connections are introduced into the network and the original input is added to the output of the intermediate layers, so that gradients can flow directly back to earlier layers, avoiding the vanishing and exploding gradient problems and accelerating convergence; in addition, residual connections pass the original input information directly to subsequent layers, preserving more detail and feature information, which improves the network's perception of detail and reduces information loss.
2. The multi-scale adaptive remote sensing image defogging method according to claim 1, wherein the multi-scale adaptive feature extraction algorithm of step 3) comprises:
(1) Multi-scale feature information is extracted by dilated convolutions with dilation rates of 1, 2 and 4, each group comprising 6 dilated convolution blocks;
(2) An adaptive attention module is introduced after each group of dilated convolutions to efficiently extract important information at different scales;
(3) A channel attention module is defined, which computes an attention weight for each channel by convolution to adjust the input feature map;
(4) A spatial attention module is defined, which adjusts the feature map by attending to different spatial locations;
(5) A pixel attention module is defined, which treats each pixel as an independent attention unit and adjusts it according to the importance of its own features.
3. The multi-scale adaptive remote sensing image defogging method according to claim 1, wherein the local color consistency loss function of step 4) comprises:
(1) The feature map of one intermediate layer is selected as the feature representing color information;
(2) Local regions are defined on the feature map, using 32x32 local blocks;
(3) In each local region, the mean of the feature-map pixels is computed to obtain the average feature of that region;
(4) For the label image and the generated image, the means of the feature-map pixels in the same local region are computed separately, giving the average features of both in that region;
(5) The difference between the average features of the label image and the generated image in each local region is computed with the mean squared error;
(6) The differences of all local regions are averaged to obtain the local color consistency loss of the whole image.
CN202311463096.0A 2023-11-06 2023-11-06 Multi-scale self-adaptive remote sensing image defogging method Pending CN117495718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311463096.0A CN117495718A (en) 2023-11-06 2023-11-06 Multi-scale self-adaptive remote sensing image defogging method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311463096.0A CN117495718A (en) 2023-11-06 2023-11-06 Multi-scale self-adaptive remote sensing image defogging method

Publications (1)

Publication Number Publication Date
CN117495718A true CN117495718A (en) 2024-02-02

Family

ID=89682330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311463096.0A Pending CN117495718A (en) 2023-11-06 2023-11-06 Multi-scale self-adaptive remote sensing image defogging method

Country Status (1)

Country Link
CN (1) CN117495718A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726550A (en) * 2024-02-18 2024-03-19 成都信息工程大学 Multi-scale gating attention remote sensing image defogging method and system
CN117726550B (en) * 2024-02-18 2024-04-30 成都信息工程大学 Multi-scale gating attention remote sensing image defogging method and system

Similar Documents

Publication Publication Date Title
CN111080629B (en) Method for detecting image splicing tampering
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN111709902B (en) Infrared and visible light image fusion method based on self-attention mechanism
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN111275637A (en) Non-uniform motion blurred image self-adaptive restoration method based on attention model
CN113313657A (en) Unsupervised learning method and system for low-illumination image enhancement
CN111626993A (en) Image automatic detection counting method and system based on embedded FEFnet network
CN113052057A (en) Traffic sign identification method based on improved convolutional neural network
CN116596792B (en) Inland river foggy scene recovery method, system and equipment for intelligent ship
CN117495718A (en) Multi-scale self-adaptive remote sensing image defogging method
CN116645569A (en) Infrared image colorization method and system based on generation countermeasure network
CN116563693A (en) Underwater image color restoration method based on lightweight attention mechanism
Cheng et al. A highway traffic image enhancement algorithm based on improved GAN in complex weather conditions
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN112464844A (en) Human behavior and action recognition method based on deep learning and moving target detection
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
Li et al. An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network
Zheng et al. Overwater image dehazing via cycle-consistent generative adversarial network
CN116452469B (en) Image defogging processing method and device based on deep learning
CN116664446A (en) Lightweight dim light image enhancement method based on residual error dense block
CN116977917A (en) Infrared image pedestrian detection method
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
CN110633666A (en) Gesture track recognition method based on finger color patches
Guan et al. DiffWater: Underwater Image Enhancement Based on Conditional Denoising Diffusion Probabilistic Model

Legal Events

Date Code Title Description
PB01 Publication