CN114511785A - Remote sensing image cloud detection method and system based on bottleneck attention module - Google Patents

Remote sensing image cloud detection method and system based on bottleneck attention module

Info

Publication number
CN114511785A
CN114511785A
Authority
CN
China
Prior art keywords
remote sensing
layer
network
sensing image
cloud detection
Prior art date
Legal status
Pending
Application number
CN202210151693.9A
Other languages
Chinese (zh)
Inventor
姚正
马雷
万玲
程健
Current Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Original Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Zhongke Nanjing Artificial Intelligence Innovation Research Institute and Institute of Automation of Chinese Academy of Science
Priority to CN202210151693.9A
Publication of CN114511785A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/045: Combinations of networks
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention provides a remote sensing image cloud detection method and system based on a bottleneck attention module, belonging to the technical field of remote sensing image processing. The method comprises the following steps: step 1, obtaining and processing a sample set; step 2, building the remote sensing image cloud detection network framework; step 3, iteratively training the remote sensing image cloud detection network; and step 4, outputting the remote sensing image cloud detection prediction result. The detection system comprises a sample acquisition module, a network framework building module, a training module and an output module. The cloud detection network constructed by the invention adopts a bottleneck attention structure that combines the respective advantages of lightweight convolution, a linear inverted residual structure and a coordinate attention mechanism. The structure chiefly addresses the loss of feature information and accuracy caused by lightweight convolution, realizing a cloud detection system with a low parameter count and low computation that achieves an accurate, lightweight on-orbit cloud detection effect.

Description

Remote sensing image cloud detection method and system based on bottleneck attention module
Technical Field
The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image cloud detection method and system based on a bottleneck attention module.
Background
Modern remote sensing satellite images are widely used in production and daily life; however, because of cloud occlusion, the usable ground-object information in downlinked satellite images is often scarce or entirely absent. With the continuing development of artificial intelligence, deep-learning-based cloud detection has brought new progress to remote sensing image cloud detection and, through its strong feature generalization and expression capability, has become a mainstream research method in many areas of image processing. At present, three families of methods dominate remote sensing cloud detection: rule-based cloud detection, machine-learning-based cloud detection and deep-learning-based cloud detection.
Rule-based cloud detection sets thresholds adapted to the imaging characteristics of different sensors, or applies graphics and image-processing techniques, to separate cloud from ground; with reasonably chosen thresholds or multiple processing passes it can generally achieve high-precision cloud detection.
Machine-learning-based cloud detection combines rule-based detection with techniques such as random forests, support vector machines and neural networks. These methods can further improve detection accuracy, but the tedious manual feature-selection process costs considerable effort and time, and they cannot meet the requirements of on-orbit cloud detection on remote sensing satellites.
Deep-learning-based cloud detection usually adopts deep and wide convolutional neural networks and can produce high-precision remote sensing image cloud detection models. However, as network depth grows, the parameter count and computation grow sharply, severely affecting the on-orbit detection speed and the storage footprint of the remote sensing satellite system. Reducing the computation and parameter count of the network therefore becomes a key problem. Current lightweight convolutional neural networks mainly rely on lightweight convolutions to achieve their small size, but the special convolution process of lightweight convolution reduces the feature capacity of the network and markedly degrades accuracy, so a high-precision on-orbit cloud detection effect cannot be achieved.
Disclosure of Invention
Purpose of the invention: to provide a remote sensing image cloud detection method based on a bottleneck attention module, together with a system implementing the method, so as to effectively solve the above problems in the prior art.
In a first aspect, a remote sensing image cloud detection method based on a lightweight neural network is provided, and the method comprises the following steps:
step 1, obtaining a sample set;
step 2, building a cloud detection network framework of the remote sensing image;
step 3, iterative training;
and 4, outputting a prediction result and carrying out comparative analysis.
In a further embodiment of the first aspect, the process of obtaining the sample set further comprises:
step 1-1, downloading the required sample set from the Chinese resource satellite application center, removing samples that contain no cloud or too little cloud, and outlining the cloud-containing regions of each qualified sample with the labelme annotation tool to serve as its label, so as to make the total sample set N:

N = {(I_1, L_1), (I_2, L_2), …, (I_N, L_N)}

where I_n represents the nth image and L_n represents the label of the nth image;

step 1-2, forming the training sample set P from n remote sensing images and labels randomly selected from N:

P = {(I_1, L_1), (I_2, L_2), …, (I_{n-1}, L_{n-1}), (I_n, L_n)}

where I_n represents the nth training sample and L_n represents the label of the nth training image;

step 1-3, forming the test sample set T from the remaining N−n remote sensing images and labels:

T = {(I_1, L_1), (I_2, L_2), …, (I_{N−n−1}, L_{N−n−1}), (I_{N−n}, L_{N−n})}

where I_{N−n} represents the (N−n)th test image and L_{N−n} represents its label; wherein P ≫ T.

In a further embodiment of the first aspect, the process of constructing the cloud detection network framework for remote sensing images further comprises:
step 2-1, building a main frame consisting of an encoder, a decoder, a bottom layer connecting layer and a combined module;
step 2-2, defining the remote sensing image cloud detection loss function:

L = L1 + L2

where L is the total loss function, L1 is the binary cross-entropy function, and L2 is the set-similarity (Dice) function.

The binary cross-entropy function L1 is expressed as follows:

L1 = −(1/N) · Σ_{i=1}^{N} [ T_i · log(P_i) + (1 − T_i) · log(1 − P_i) ]

P_i = exp(Z_i) / Σ_{j=1}^{k} exp(Z_j)

where T_i denotes the label of sample i, with 1 for the positive class and 0 for the negative class; N is the total number of samples; and P_i is the probability of class i computed by the Softmax classifier, whose main role is to turn the network output into a probability distribution over the classes; k is the number of classes, and Z_i, Z_j are the network's predicted outputs for classes i and j.

The Dice loss function L2 is expressed as follows:

L2 = 1 − 2·TP / (2·TP + FN + FP)

where TP is the number of positive samples predicted by the network as positive, FN is the number of positive samples predicted by the network as negative, and FP is the number of negative samples predicted by the network as positive.
In a further embodiment of the first aspect, the process of iteratively training the cloud detection network of remote sensing images further includes:
step 3-1, adopting a freeze-training mode, setting the initial number of iterations to 300 and the initial learning rate to 10⁻⁴; when the number of iterations reaches 300, the learning rate is switched to 10⁻⁵; the maximum number of iterations is set to 600;
step 3-2, inputting the training sample set into the remote sensing image cloud detection network in batches to carry out a forward propagation process, and obtaining a network prediction result x of each time after passing through an intermediate hidden layer;
and 3-3, performing feature learning on the network by the back-propagation algorithm: the loss function computes the error ε between each iteration's prediction and the corresponding labels of the training batch, the Adam optimizer then updates the network convolution kernel weight parameters and the intermediate-layer connection parameters to reduce ε, and the model weight parameters are finally obtained after 600 iterations.
In a further embodiment of the first aspect, the process of outputting a remote sensing image cloud detection prediction result further comprises:
and inputting the image test set m into the trained remote sensing image cloud detection network for prediction to obtain a final model prediction result.
The second aspect provides a remote sensing image cloud detection system which comprises a sample acquisition module, a network framework building module, a training module and an output module. The sample acquisition module is used for acquiring a sample set; the network frame building module is used for building a remote sensing image cloud detection network frame; the training module is used for carrying out iterative training on the remote sensing image cloud detection network; the output module is used for outputting a remote sensing image cloud detection prediction result.
The image cloud detection network framework built by the network framework building module consists of an encoder, a decoder, a bottom connection layer and bottleneck attention modules. The encoder contains four blocks; each encoder block contains two 3 × 3 lightweight convolutional layers and one down-sampling layer, and the input of each block is the down-sampled output of the block above it. The decoder likewise contains four blocks; each block contains two lightweight convolutional layers and one up-sampling layer, and the input of each block is the up-sampled output of the previous block concatenated with the output of the bottleneck attention module at the same level. The encoder and the decoder are connected through the bottleneck attention modules. The bottleneck attention module combines a linear inverted residual with a coordinate attention mechanism, and its input is the output of each encoder block after its two lightweight convolutional layers. The bottom connection layer contains two 3 × 3 lightweight convolutional layers; its input is the down-sampled output of the fourth encoder block, and its output is connected through an up-sampling layer to the bottleneck attention module of each level to realize feature fusion.
The down-sampling module comprises a convolutional layer, a batch normalization layer, a ReLU activation function and a max-pooling layer; the up-sampling module comprises a convolutional layer, a batch normalization layer, a ReLU activation function and an up-sampling layer; the lightweight module comprises a depthwise separable convolution, a ReLU activation function, a Ghost module and a batch normalization layer. The bottleneck attention module includes a linear inverted residual module and a coordinate attention mechanism module. The linear inverted residual comprises two pointwise convolutional layers, one lightweight convolutional layer and a residual module. The coordinate attention mechanism module comprises global max-pooling layers in two directions, a fully connected layer, a batch normalization layer, a convolutional layer and a sigmoid function layer.
In a third aspect, a cloud detection device is provided, which includes: at least one processor and memory; the memory stores computer-executable instructions; the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method of cloud detection of a remotely sensed image as described in the first aspect.
In a fourth aspect, a readable storage medium is provided, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for cloud detection of remote sensing images according to the first aspect is implemented.
Beneficial effects: the remote sensing image cloud detection network constructed by the invention adopts a bottleneck attention module to compensate for the feature and accuracy loss introduced by lightweight convolution. The bottleneck attention module combines the advantages of lightweight convolution, the linear inverted residual module and the attention mechanism, enabling the network to extract deep features. The coordinate attention mechanism decomposes channel attention into two global average pooling operations along different spatial directions, so the attention module can capture long-range dependencies along one spatial direction while preserving precise position information along the other. This helps the network locate objects of interest more accurately, and capturing long-range spatial interactions with precise position information strengthens the connections among channels, ultimately improving the cloud detection accuracy of remote sensing images.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a schematic diagram of a cloud detection network for remote sensing images according to the invention.
Fig. 3 is a schematic structural diagram of a downsampling module constructed by the present invention.
Fig. 4 is a schematic structural diagram of an upsampling module constructed by the present invention.
FIG. 5 is a schematic diagram of the general structure of a lightweight convolution module constructed according to the present invention.
FIG. 6 is a schematic structural diagram of a combined module constructed by the present invention.
FIG. 7 is a schematic diagram of the coordinate attention mechanism constructed in accordance with the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
The embodiment provides a remote sensing image cloud detection method based on a lightweight neural network, which is realized by a remote sensing image cloud detection system, wherein the system comprises a sample acquisition module, a network framework building module, a training module and an output module. The sample acquisition module is used for acquiring a sample set; the network frame building module is used for building a remote sensing image cloud detection network frame; the training module is used for carrying out iterative training on the remote sensing image cloud detection network; the output module is used for outputting a remote sensing image cloud detection prediction result.
The image cloud detection network framework built by the network framework building module consists of an encoder, a decoder, a bottom connection layer and bottleneck attention modules. The encoder contains four blocks; each encoder block contains two 3 × 3 lightweight convolutional layers and one down-sampling layer, and the input of each block is the down-sampled output of the block above it. The decoder likewise contains four blocks; each block contains two lightweight convolutional layers and one up-sampling layer, and the input of each block is the up-sampled output of the previous block concatenated with the output of the bottleneck attention module at the same level. The encoder and the decoder are connected through the bottleneck attention modules. The bottleneck attention module is a combination of lightweight convolution, a linear inverted residual and a coordinate attention mechanism, and its input is the output of each encoder block after its two lightweight convolutional layers. The bottom connection layer contains two 3 × 3 lightweight convolutional layers; its input is the down-sampled output of the fourth encoder block, and its output is connected through an up-sampling layer to the bottleneck attention module of each level to realize feature fusion.
The down-sampling module comprises a convolutional layer, a batch normalization layer, a ReLU activation function and a max-pooling layer; the up-sampling module comprises a convolutional layer, a batch normalization layer, a ReLU activation function and an up-sampling layer; the lightweight module comprises a depthwise separable convolution, a ReLU activation function, a Ghost module and a batch normalization layer. The bottleneck attention module includes a linear inverted residual module and a coordinate attention mechanism module. The linear inverted residual includes two pointwise convolutional layers and one lightweight convolutional layer. The coordinate attention mechanism module comprises global max-pooling layers in two directions, a fully connected layer, a batch normalization layer, a convolutional layer and a sigmoid function layer.
Specifically, the remote sensing image cloud detection method based on the lightweight neural network comprises the following steps, as shown in fig. 1:
Step one, acquiring a training sample set and a test sample set:
the data set downloads a required sample set from a Chinese resource satellite application center, and 3168 remote sensing images with labels and containing a large number of cloud layer areas are screened from the sample set. Firstly, forming 3168 remote sensing images into a sample set, and manually labeling a cloud-containing area by using labelme label making software. And (4) randomly extracting 2068 images containing labels as a training sample set to be input into the model for iterative training. And the remaining 1100 remote sensing images containing the labels are used as a sample set for testing, wherein the remote sensing images and the labels have the same size and are binary images. The area with a pixel value of 0 in the label is classified as a background ground feature, and the point with a pixel value of 255 in the label area is classified as a cloud. Considering the network training problem and reducing the processing time of each scene, accelerating the learning process of global information, reducing the size of each scene image to 1320 × 1200, the spatial resolution to 160 meters, and normalizing the data to be between 0-1 during training.
Step two, constructing a remote sensing image cloud detection network:
2a) As shown in fig. 2, the framework is composed of an encoder, a decoder, a bottom connection layer and bottleneck attention modules. The encoder contains four blocks; each encoder block contains two lightweight convolutional layers with 3 × 3 kernels and one down-sampling layer, and the input of each block is the down-sampled output of the block above it. The decoder likewise contains four blocks; each block contains two lightweight convolutional layers and one up-sampling layer, and the input of each block is the up-sampled output of the previous block concatenated with the output of the bottleneck attention module at the same level. The encoder and the decoder are connected through the bottleneck attention modules. The bottleneck attention module combines a linear inverted residual with a coordinate attention mechanism, and its input is the output of each encoder block after its two lightweight convolutional layers. The bottom connection layer contains two lightweight convolutional layers with 3 × 3 kernels; its input is the down-sampled output of the fourth encoder block, and its output is connected through an up-sampling layer to the bottleneck attention module of each level to realize feature fusion.
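A quick size walk through the four encoder and four decoder blocks illustrates how the resolution changes, assuming each down-sampling halves and each up-sampling doubles the spatial size (the 256 × 256 demo size is an assumption; the real 1320 × 1200 scenes are not divisible by 16, so padding or cropping would be needed in practice).

```python
def encoder_decoder_sizes(h, w, blocks=4):
    """Track the feature-map size through a 4-down / 4-up encoder-decoder."""
    sizes = [(h, w)]
    for _ in range(blocks):          # encoder: four down-sampling steps
        h, w = h // 2, w // 2
        sizes.append((h, w))
    for _ in range(blocks):          # decoder: four up-sampling steps
        h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes

sizes = encoder_decoder_sizes(256, 256)
# The bottom connection layer sees the 16x-reduced map (16, 16),
# and the decoder restores the input resolution (256, 256).
```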
As shown in fig. 3, the down-sampling module includes a convolutional layer, a batch normalization layer, a ReLU activation function and a max-pooling layer; as shown in fig. 4, the up-sampling module includes a convolutional layer, a batch normalization layer, a ReLU activation function and an up-sampling layer. The lightweight convolutional layer is shown in fig. 5; the module contains a depthwise separable convolutional layer, a ReLU activation function, a Ghost module and a batch normalization layer. A convolutional layer with 3 × 3 kernels first separates the RGB channels of the image, convolves each channel separately to generate feature layers, and finally combines them into a new feature layer.
The Ghost module consists of two parts. The first part performs feature extraction on the feature map using a convolutional layer, a batch normalization layer and a ReLU activation function. The second part performs a cheap linear mapping of the feature map through another convolutional layer, a batch normalization layer and a ReLU activation function; a hyper-parameter S is set, and changing the value of S changes the dimensionality of the linear mapping. A concatenation layer then splices the mapped feature maps directly onto the feature maps produced by the first part. This dimension-reducing convolution scheme reduces the computation of the network.
The bottleneck attention module is shown in fig. 6; it combines the respective advantages of lightweight convolution, the linear inverted residual module and the coordinate attention mechanism. The overall structure of the linear inverted residual module comprises two pointwise convolutional layers and one lightweight convolutional layer. The structure first expands the input to a high-dimensional space through a pointwise convolution with 1 × 1 kernels, then extracts features with a depthwise separable convolution with 3 × 3 kernels, and finally projects back down to a low-dimensional space with another pointwise convolution with 1 × 1 kernels, so that the network can extract deeper feature information in the high-dimensional space while adding only a small amount of computation. Meanwhile, to control the scale of the dimension transformation, a hyper-parameter μ is introduced to adjust the expansion ratio of the different dimensions; it is usually set to 6. The overall structure of the coordinate attention mechanism module is shown in fig. 7; it comprises global max-pooling layers in two directions, a fully connected layer, a batch normalization layer, a convolutional layer and a sigmoid function layer, where the convolutional layer, max-pooling layer, batch normalization layer and ReLU activation layer are cascaded in sequence.
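A minimal NumPy sketch of a coordinate-attention-style gate may clarify the two-direction pooling idea. This is not the patented module: it uses average pooling (as in the beneficial-effects description) rather than the full layer stack, and the channel-mixing weights `w_h`, `w_w` are hypothetical stand-ins for the module's convolutional layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w_h, w_w):
    """Toy coordinate-attention-style gate (NumPy only).

    x:   feature map of shape (C, H, W)
    w_h: (C, C) weight acting on the H-direction descriptor (hypothetical)
    w_w: (C, C) weight acting on the W-direction descriptor (hypothetical)
    """
    # Pool along each spatial direction separately, keeping the other axis:
    # one descriptor per row (C, H) and one per column (C, W).
    desc_h = x.mean(axis=2)          # (C, H): pooled over W
    desc_w = x.mean(axis=1)          # (C, W): pooled over H
    # 1x1-conv-like channel mixing followed by a sigmoid gate.
    att_h = sigmoid(w_h @ desc_h)    # (C, H)
    att_w = sigmoid(w_w @ desc_w)    # (C, W)
    # Re-weight the input with both direction-aware attention maps.
    return x * att_h[:, :, None] * att_w[:, None, :]

out = coordinate_attention(np.ones((2, 3, 4)), np.eye(2), np.eye(2))
```

Pooling along one axis at a time is what lets the gate encode position along each direction, instead of collapsing all spatial information into a single channel descriptor.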
The max-pooling layer reduces dimensionality, removes redundant information, compresses features, lowers network complexity and reduces computation; the convolutional layer extracts features and raises the dimensionality; adding the batch normalization layer avoids gradient vanishing and gradient explosion during training; and introducing the ReLU activation function improves the generalization ability of the model while alleviating the vanishing-gradient problem.
The standard convolution parameter equation is as follows:
DK×DK×M×N
the depth separable convolution parameter quantity calculation formula is as follows:
DK×DK×M+M×N
The Ghost module parameter formula is as follows:

(N/S) × D_K × D_K × M + (S − 1) × (N/S) × D_K × D_K

The ratio of lightweight to standard convolution parameters is as follows:

(D_K × D_K × M + M × N) / (D_K × D_K × M × N) = 1/N + 1/D_K²
The standard convolution calculated quantity formula is as follows:
DF·DF·DK·DK·M·N
the depth separable convolution calculated quantity formula is as follows:
DK·DK·M·DF·DF+M·N·DF·DF
The calculation amount formula of the Ghost module is as follows:

(N/S) · D_F · D_F · D_K · D_K · M + (S − 1) · (N/S) · D_F · D_F · D_K · D_K

The ratio of the total computation of the lightweight convolution to that of the standard convolution is as follows:

(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_F · D_F · D_K · D_K · M · N) = 1/N + 1/D_K²
D_K × D_K × M × N is the parameter-count formula of the conventional convolution; D_K × D_K × M + M × N is the parameter-count formula of the depthwise lightweight convolution; D_F · D_F · D_K · D_K · M · N is the total-computation formula of the conventional convolution; and D_K · D_K · M · D_F · D_F + M · N · D_F · D_F is the total-computation formula of the depthwise lightweight convolution.
where D_K represents the convolution kernel size, D_F the feature-layer size, M the number of input channels, N the number of output channels, and S a hyper-parameter that controls the compression ratio.
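The parameter and computation formulas above can be checked numerically; the helper names below are illustrative, not from the patent.

```python
# Parameter and FLOP counts for standard vs. depthwise separable convolution,
# following the formulas above (D_K kernel size, D_F feature size,
# M input channels, N output channels).
def standard_conv_params(dk, m, n):
    return dk * dk * m * n

def dw_separable_params(dk, m, n):
    return dk * dk * m + m * n

def standard_conv_flops(dk, df, m, n):
    return df * df * dk * dk * m * n

def dw_separable_flops(dk, df, m, n):
    return dk * dk * m * df * df + m * n * df * df

# Example: 3x3 kernel, 64 -> 128 channels, 80x80 feature map.
dk, df, m, n = 3, 80, 64, 128
ratio = dw_separable_params(dk, m, n) / standard_conv_params(dk, m, n)
# The ratio equals 1/N + 1/D_K^2, i.e. roughly a 9x parameter reduction here.
assert abs(ratio - (1 / n + 1 / dk ** 2)) < 1e-12
```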
Step three, performing iterative training on the remote sensing image cloud detection network:
step 3-1, adopting a freezing training mode, setting the initial iteration times to be 300, and setting the initial learning rate to be 10-4When the number of iterations reaches 300, the learning rate is converted into 10-5The maximum number of iterations is set to 600;
3-2, inputting the training sample set into the remote sensing image cloud detection network in batches for forward propagation, and obtaining each network prediction result after passing through an intermediate hidden layer;
3-3, performing feature learning on the network by adopting a back propagation algorithm, calculating an error epsilon between a prediction result of each iteration of the network and corresponding label of the batch of training samples by adopting a loss function, then updating a network convolution kernel weight parameter and a middle layer connection parameter by using an Adam optimizer, reducing the error epsilon between the network prediction result and a label, and finally obtaining a model weight parameter after 600 iterations;
1) adopting a freezing training mode, setting the initial iteration number to be 300 and the initial learning rate to be 10-4When the number of iterations reaches 300, the learning rate is converted into 10-5The maximum number of iterations is set to 600;
2) inputting a training set sample into a remote sensing image cloud detection network for forward propagation training, extracting multi-scale characteristic information through continuous down-sampling at an encoding module stage, and obtaining a network prediction result x of each time after passing through an intermediate hidden layer;
3) performing feature learning on the network with a back-propagation algorithm: a loss function calculates the error ε between the network's prediction at each iteration and the corresponding labels of the batch of training samples, an Adam optimizer then updates the network convolution-kernel weight parameter ω_t and the intermediate-layer connection parameter v_t to reduce the error ε between the network prediction and the labels, and the model weight parameters are finally obtained after 600 iterations; the update formulas are respectively:
ω_{t+1} = ω_t − η · ∂ε/∂ω_t

v_{t+1} = v_t − η · ∂ε/∂v_t

where η is the step size, usually η = 1×10^-5; ω_{t+1} and v_{t+1} denote the update results of ω_t and v_t, respectively; and ∂ε/∂ω_t, ∂ε/∂v_t denote the partial derivatives of the error ε with respect to the parameters.
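The per-step parameter update above can be sketched in a few lines. This is a plain gradient-descent illustration (the patent uses the Adam optimizer, whose moment estimates are omitted here), and the gradient values are made up for demonstration:

```python
def update_params(omega_t, v_t, grad_omega, grad_v, eta=1e-5):
    """One gradient-descent update of the convolution-kernel weight
    omega_t and the intermediate-layer connection parameter v_t,
    following the update formulas in the text."""
    omega_next = omega_t - eta * grad_omega
    v_next = v_t - eta * grad_v
    return omega_next, v_next

# Hypothetical current parameters and gradients of the error epsilon
w1, v1 = update_params(0.5, 0.2, grad_omega=10.0, grad_v=-4.0)
print(w1, v1)  # w1 steps slightly below 0.5, v1 slightly above 0.2
```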
Fourthly, obtaining a cloud detection prediction result of the remote sensing image:
Inputting the remote sensing image test set m into the trained remote sensing image cloud detection network for testing to obtain the final prediction result. The prediction result set is a cloud detection result image giving the probability that each pixel is cloud; each pixel of the result image represents the detection result of the corresponding pixel of the input image: if a pixel is classified as cloud, its color is set to white with pixel value 255; otherwise it is background, set to black with pixel value 0.
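A minimal sketch of this post-processing step, assuming the network outputs a per-pixel cloud-probability map and using a 0.5 decision threshold (the threshold value is an assumption, not stated in the text):

```python
import numpy as np

def probability_to_mask(prob_map, threshold=0.5):
    """Convert a per-pixel cloud-probability map into the black/white
    detection image described in the text: cloud pixels -> 255 (white),
    background pixels -> 0 (black)."""
    return np.where(prob_map >= threshold, 255, 0).astype(np.uint8)

probs = np.array([[0.9, 0.2],
                  [0.4, 0.7]])
mask = probability_to_mask(probs)
print(mask)  # [[255   0]
             #  [  0 255]]
```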
A remote sensing image cloud detection system comprises a sample acquisition module, a network framework building module, a training module and an output module. The sample acquisition module is used for acquiring a sample set; the network frame building module is used for building a remote sensing image cloud detection network frame; the training module is used for carrying out iterative training on the remote sensing image cloud detection network; the output module is used for outputting a remote sensing image cloud detection prediction result.
The image cloud detection network framework built by the network-framework building module consists of an encoder, a decoder, a bottom connection layer and bottleneck attention modules. The encoder comprises 4 levels, each containing 2 lightweight convolution layers with 3×3 kernels and a down-sampling layer; the input of each level is the down-sampled output of the level above. The decoder likewise comprises four levels, each containing 2 lightweight convolution layers and an up-sampling layer; the input of each level is the up-sampled output of the previous level concatenated with the output of the bottleneck attention module of the same level. The encoder and the decoder are connected through the bottleneck attention modules. The bottleneck attention module combines a linear inverted residual with a coordinate attention mechanism; its input is the output of each encoder level after the two lightweight convolution layers. The bottom connection layer comprises 2 lightweight convolution layers with 3×3 kernels; its input is the down-sampled output of the fourth encoder level, and its output is connected through an up-sampling layer to the bottleneck attention module of each level to realize feature fusion.
The down-sampling module comprises a convolution layer, a batch-normalization layer, a ReLU activation function and a max-pooling layer; the up-sampling module comprises a convolution layer, a batch-normalization layer, a ReLU activation function and an up-sampling layer; the lightweight module comprises a depthwise separable convolution, a ReLU activation function, a Ghost module and a batch-normalization layer. The bottleneck attention module comprises a lightweight convolution module, a linear inverted residual module and a coordinate attention mechanism module. The linear inverted residual comprises two pointwise convolution layers and one lightweight convolution layer. The coordinate attention mechanism module comprises global max-pooling layers in two directions, a fully connected layer, a batch-normalization layer, a convolution layer and a sigmoid function layer.
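To make the encoder's structure concrete, the sketch below traces feature-map shapes through the four levels, assuming a 256×256 input, 2× max-pool down-sampling, and the common U-Net channel widths 64/128/256/512 (the channel counts are an assumption; the patent does not specify them):

```python
def encoder_shapes(h, w, c_in=3, base=64, depth=4):
    """Sketch of the 4-level encoder's feature-map shapes: each level
    applies two 3x3 lightweight convolutions (channels -> base * 2**i)
    and then a 2x down-sampling."""
    shapes = []
    c = c_in
    for i in range(depth):
        c = base * (2 ** i)        # channels after the two conv layers
        shapes.append((c, h, w))   # feature fed to the attention module
        h, w = h // 2, w // 2      # max-pool down-sampling
    return shapes, (h, w)

shapes, bottom = encoder_shapes(256, 256)
print(shapes)  # [(64, 256, 256), (128, 128, 128), (256, 64, 64), (512, 32, 32)]
print(bottom)  # (16, 16): spatial size entering the bottom connection layer
```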
Specific experiments are as follows:
in the experiments, two evaluation indexes, the Dice coefficient and the intersection-over-union (IoU), are used as the binary cloud detection evaluation standards; meanwhile, to quantify the practical improvement brought by each module, the total parameter count (Total Parameters) and the average per-image prediction speed (Speed) are introduced as references.
The Dice coefficient is defined as follows:

Dice = 2 · |X_img ∩ Y_label| / (|X_img| + |Y_label|)

where |X_img ∩ Y_label| denotes the intersection between the prediction result and the label; the larger the intersection, the stronger the similarity between the two samples, and the numerator is multiplied by 2 to ensure the result finally falls in [0, 1]. The intersection-over-union is the ratio of correctly classified pixels to the union of actual and predicted pixels, computed as:

IoU = P_ii / (P_ii + P_ij + P_ji)

where P_ii is the number of correctly classified pixels, P_ij the number of cloud pixels classified as non-cloud, and P_ji the number of non-cloud pixels classified as cloud.
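Both metrics can be computed directly from binary masks; a minimal sketch following the formulas above:

```python
import numpy as np

def dice_and_iou(pred, label):
    """Dice coefficient and IoU for binary cloud masks (0/1 arrays),
    following the formulas in the text."""
    inter = np.logical_and(pred == 1, label == 1).sum()  # P_ii
    fn = np.logical_and(pred == 0, label == 1).sum()     # cloud -> non-cloud
    fp = np.logical_and(pred == 1, label == 0).sum()     # non-cloud -> cloud
    dice = 2 * inter / (pred.sum() + label.sum())
    iou = inter / (inter + fn + fp)
    return dice, iou

pred = np.array([[1, 1], [0, 0]])
label = np.array([[1, 0], [0, 0]])
d, i = dice_and_iou(pred, label)
print(d, i)  # ≈ 0.667 and 0.5
```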
Table 1 comparison of the modules
(Table 1 was rendered as an image in the original; its values are not recoverable from the extraction.)
In the above table, U-NetA denotes the result of training the U-Net model with lightweight convolutions in place of standard convolutions. U-NetB denotes the experiment that, on top of the lightweight convolutions, introduces the linear inverted residual module. U-NetC denotes the experiment that, on top of the lightweight convolutions, introduces the coordinate attention mechanism. The method herein trains the lightweight convolution together with both modules. Replacing standard convolutions with lightweight convolutions greatly reduces the number of network parameters but lowers accuracy; introducing the linear inverted residual and the coordinate attention mechanism separately demonstrates the contribution of each module, and the combined result shows that the two modules work best together.
TABLE 2 comparison of cloud detection methods
(Table 2 was rendered as an image in the original; its values are not recoverable from the extraction.)
As the table shows, the method improves on U-Net and, in the field of remote sensing image cloud detection, outperforms other semantic-segmentation cloud detection methods in speed, accuracy and parameter count. Compared with the original remote sensing cloud detection method, a large reduction in parameters is achieved at the cost of only a small loss in detection speed, while accuracy improves, which demonstrates the effectiveness of the combined module.
In summary, the invention provides a high-precision remote sensing image cloud detection method based on a bottleneck attention module, and in particular a combined module uniting lightweight convolution, a linear inverted residual module and a coordinate attention mechanism, mainly to address the reduced feature capacity and reduced accuracy that follow from adopting lightweight convolution in a network. The bottleneck attention module can also be inserted into other network structures and is applicable to all remote sensing image pixel-segmentation tasks to directly improve detection accuracy.
As noted above, while the present invention has been shown and described with reference to certain preferred embodiments, it is not to be construed as limited thereto. Various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. The remote sensing image cloud detection method based on the bottleneck attention module is characterized by comprising the following steps:
step 1, obtaining a sample set and processing the sample set;
step 2, building a cloud detection network overall framework of the remote sensing image;
step 3, iterative training;
step 4, outputting a prediction result and comparing.
2. The remote sensing image cloud detection method according to claim 1, wherein step 1 further comprises:
step 1-1, downloading a preset sample set, eliminating samples that contain no cloud or too little cloud, and for each qualified sample outlining the cloud-containing region with the labelme annotation software as its label, to produce the total sample set N:

N = {(I_1, L_1), (I_2, L_2), …, (I_N, L_N)}

where I_N represents the Nth image and L_N the label of the Nth image;

step 1-2, forming a training sample set P from n remote sensing images and labels randomly selected from N:

P = {(I_1, L_1), (I_2, L_2), …, (I_n, L_n)}

where I_n represents the nth training sample and L_n the label of the nth training image;

step 1-3, forming a test sample set T from the remaining N−n remote sensing images and labels:

T = {(I_1, L_1), (I_2, L_2), …, (I_{N−n}, L_{N−n})}

where I_{N−n} represents the (N−n)th test image and L_{N−n} its label; and P >> T.
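A minimal sketch of steps 1-1 to 1-3, assuming image/label pairs and an 80/20 split (the exact proportion is not given in the claim, only that P >> T):

```python
import random

def split_samples(samples, train_fraction=0.8, seed=0):
    """Randomly split the total sample set N into a training set P and
    a test set T, with P much larger than T as required by the claim."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = int(len(shuffled) * train_fraction)
    return shuffled[:n], shuffled[n:]

# Hypothetical file names standing in for (image, label) pairs
pairs = [(f"img{i}.tif", f"lab{i}.png") for i in range(10)]
P, T = split_samples(pairs)
print(len(P), len(T))  # 8 2
```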
3. The remote sensing image cloud detection method according to claim 1, wherein step 2 further comprises:
step 2-1, building a main frame consisting of an encoder, a decoder, a bottom layer connecting layer and a combined module;
step 2-2, defining a cloud detection loss function of the remote sensing image:
L = L1 + L2

where L is the total loss function, L1 is the binary cross-entropy function, and L2 is the set-similarity (Dice) function;

wherein the binary cross-entropy function L1 is expressed as follows:

L1 = −(1/N) Σ_{i=1}^{N} [T_i · log(P_i) + (1 − T_i) · log(1 − P_i)]

P_i = e^{Z_i} / Σ_{j=1}^{k} e^{Z_j}

in the formulas, T_i is the label of sample i, with the positive class as 1 and the negative class as 0; N is the total number of samples; P_i is the probability of class i computed by the Softmax classifier, whose main role is to turn the network output into a probability distribution over the classes; k is the number of classes; and Z_i, Z_j are the network's predicted outputs for classes i and j;

the Dice loss function L2 is expressed as follows:

L2 = 1 − 2TP / (2TP + FN + FP)

in the formula, TP is the case where a positive sample is predicted by the network as a positive sample; FN is the case where a positive sample is predicted as a negative sample; and FP is the case where a negative sample is predicted as a positive sample.
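The total loss L = L1 + L2 can be sketched with NumPy. This soft-Dice variant treats predicted probabilities as soft counts of TP, FN and FP, which is a common implementation choice assumed here rather than taken from the claim:

```python
import numpy as np

def combined_loss(probs, labels):
    """Total loss L = L1 + L2: binary cross-entropy plus Dice loss,
    following the formulas in the text. probs are predicted cloud
    probabilities in (0, 1); labels are 0/1."""
    eps = 1e-7
    p = np.clip(probs, eps, 1 - eps)   # avoid log(0)
    l1 = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    tp = np.sum(p * labels)            # soft true positives
    fn = np.sum((1 - p) * labels)      # soft false negatives
    fp = np.sum(p * (1 - labels))      # soft false positives
    l2 = 1 - 2 * tp / (2 * tp + fn + fp)
    return l1 + l2

probs = np.array([0.9, 0.8, 0.2, 0.1])
labels = np.array([1, 1, 0, 0])
print(combined_loss(probs, labels))  # ≈ 0.314
```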
4. The remote sensing image cloud detection method according to claim 1, wherein step 3 further comprises:
step 3-1, adopting a freezing training mode, setting the initial number of iterations to 300 and the initial learning rate to 10^-4; when the number of iterations reaches 300, the learning rate is reduced to 10^-5, and the maximum number of iterations is set to 600;
3-2, inputting the training sample set into the remote sensing image cloud detection network in batches to perform a forward propagation process, and obtaining a network prediction result x of each time after passing through the middle layer;
step 3-3, performing feature learning on the network with a back-propagation algorithm: a loss function calculates the error ε between the network's prediction at each iteration and the corresponding labels of the batch of training samples, an Adam optimizer updates the network convolution-kernel weight parameters and the intermediate-layer connection parameters to reduce the error ε between the network prediction and the labels, and the model weight parameters are finally obtained after 600 iterations.
5. The remote sensing image cloud detection method according to claim 1, wherein step 4 further comprises:
inputting the image test set m into the trained remote sensing image cloud detection network for prediction to obtain a final model prediction result as follows:
step 4-1, inputting a test sample set into network models which are respectively added into different modules for prediction, classifying the network prediction into cloud pixel points, and setting the pixel values to be 255 and white; classifying the pixel points as background, and setting the pixel value to be 0 and black;
step 4-2, comparing the prediction result with the real label, and analyzing the actual effect of each module after being respectively added into the network or combined with the network;
step 4-3, comparing the remote sensing image cloud detection network with the added bottleneck attention module against other remote sensing image cloud detection methods.
6. Remote sensing image cloud detecting system, its characterized in that, the system includes:
the sample acquisition module is used for acquiring and processing a sample set;
the network framework building module is used for building a remote sensing image cloud detection network framework;
the training module is used for carrying out iterative training on the remote sensing image cloud detection network;
and the output module is used for outputting the remote sensing image cloud detection network prediction result.
7. The remote sensing image cloud detection system of claim 6,
the image cloud detection network framework built by the network framework building module further comprises:
at least four-layer encoder, each layer of said encoder comprising at least two 3 x 3 lightweight convolutional layers and at least one downsampled layer, with the input of each layer being the output of the downsampling of the upper layer;
the decoder comprises at least four layers, each layer comprises at least two lightweight convolutional layers and at least one upsampling layer, and the input of each layer is the upsampling output result of the previous layer and is cascaded with the output of the combining module of the same layer; the encoder and the decoder are connected through a bottleneck attention module;
the bottom layer-connecting layer comprises at least two 3 x 3 lightweight convolution layers, the input is the output result of the encoder after the fourth layer of downsampling, and the output is connected with the combined module of each layer through the upsampling layer to realize feature fusion;
a bottleneck attention module for connecting the encoder and the decoder; the bottleneck attention module combines a linear inverted residual with a coordinate attention mechanism, and its input is the output of each encoder layer after the two lightweight convolution layers.
8. Cloud detection equipment, its characterized in that includes: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory to cause the at least one processor to perform the remote sensing image cloud detection method of any of claims 1-5.
9. A readable storage medium, wherein the readable storage medium stores computer executable instructions, and when a processor executes the computer executable instructions, the remote sensing image cloud detection method according to any one of claims 1 to 5 is implemented.
CN202210151693.9A 2022-02-18 2022-02-18 Remote sensing image cloud detection method and system based on bottleneck attention module Pending CN114511785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151693.9A CN114511785A (en) 2022-02-18 2022-02-18 Remote sensing image cloud detection method and system based on bottleneck attention module


Publications (1)

Publication Number Publication Date
CN114511785A true CN114511785A (en) 2022-05-17


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424131A (en) * 2022-07-19 2022-12-02 南京航空航天大学 Remote sensing image cloud detection optimal threshold selection method based on absolute pixels, cloud detection method and system
CN115424059A (en) * 2022-08-24 2022-12-02 珠江水利委员会珠江水利科学研究院 Remote sensing land use classification method based on pixel level comparison learning
CN115424059B (en) * 2022-08-24 2023-09-01 珠江水利委员会珠江水利科学研究院 Remote sensing land utilization classification method based on pixel level contrast learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination