CN117036984B

CN117036984B - Cascade U-shaped network cloud detection method and system integrating attention mechanisms

Info

Publication number: CN117036984B
Application number: CN202311293882.0A
Authority: CN
Inventors: 李星华; 李澳
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2024-01-09
Anticipated expiration: 2043-10-09
Also published as: CN117036984A

Abstract

The invention discloses a cascade U-shaped network cloud detection method and system integrating an attention mechanism. The cloud detection method can accurately detect the cloud in the image, and effectively avoid missing detection and false detection, so that the utilization rate of the remote sensing image is improved, and references are provided for remote sensing applications such as subsequent cloud removal, change detection and target tracking.

Description

Cascade U-shaped network cloud detection method and system integrating attention mechanisms

Technical Field

The invention belongs to the technical field of remote sensing image processing, and particularly relates to a cascade U-shaped network cloud detection method and system integrating an attention mechanism.

Background

The remote sensing image has great effect in research fields such as natural disaster detection, agricultural resource environment management, town investigation and the like, but in the optical earth observation satellite image, the cloud can shield objects on the ground, so that information which is difficult to recover is lost, difficulty is caused to interpretation and application of the remote sensing image, and the utilization rate of the image is reduced. Therefore, accurately identifying cloud coverage on a remote sensing image is an important link in remote sensing information processing.

The remote sensing image cloud detection refers to extracting an area covered by cloud from an image. On an optical remote sensing image, the ground object is imaged mainly through transmitting and reflecting electromagnetic wave information, and different characteristics of the ground object are different in transmitting and reflecting characteristics, and different characteristics are shown on the image, so that a basis is provided for distinguishing or identifying cloud layers. Although clouds show different morphological differences in different conditions such as atmosphere, height, thickness, shape and the like, the clouds basically have the characteristics of high reflectivity, low temperature and the like, and can be distinguished from ground objects.

The conventional Shan Jingyun detection method is mostly based on pixel threshold values, statistical analysis, simple pattern recognition and other methods. The pixel threshold-based method is to analyze with a single pixel on the image, and the method is simple in calculation as a whole, but often ignores the spatial information and texture characteristics of the cloud, so that the detection result is easy to depend on a specific wave band, such as a thermal infrared wave band. The statistical analysis method mainly uses some statistical properties of the cloud, such as spectrum, texture, geometric properties, etc., but mainly uses low-level features of images, and has certain limitations when facing complex and diverse surface environments and diversified clouds.

Along with the development of computers, pattern recognition technology is paid attention to by vast scholars, and comprises methods such as clustering, fuzzy clustering, artificial neural networks, support Vector Machines (SVM) and the like. The methods such as clustering, support vector machines and the like are early in development, a mature system is formed at present, and the detection accuracy is limited. Artificial neural networks are a research hotspot with artificial intelligence, and have made a major breakthrough in recent years. The device has flexible structural design, can fully extract deep features of images, has universality in time and space and high recognition precision, but has thin cloud recognition and edge detail capturing capability to be improved.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a cascade U-shaped network cloud detection method and system integrating an attention mechanism, which capture complex semantic features of cloud in a remote sensing image through two serially connected U-shaped networks to obtain higher cloud detection precision.

In order to achieve the above purpose, the present invention provides a cascade U-shaped network cloud detection method integrating an attention mechanism, comprising the following steps:

step 1, preprocessing multispectral remote sensing image data;

step 1.1, performing radiation correction and geometric correction on a multispectral image;

step 1.2, normalizing all pixel values of the remote sensing image data subjected to radiation correction and geometric correction to a range of 0-1;

step 1.3, carrying out horizontal or vertical random overturn and noise addition on the normalized multispectral remote sensing image data to realize data enhancement;

and 1.4, cutting the multispectral remote sensing image data after data enhancement according to the constructed cascade U-shaped network convolution model fusing the attention mechanisms.

Step 2, constructing a sample data set and a corresponding sample label set for remote sensing image cloud detection;

step 3, constructing a cascade U-shaped network integrating an attention mechanism;

step 4, inputting training sample data sets with labels into the cascade U-shaped network built in the step 3 in batches, setting a loss function, training and optimizing network parameters by using an Adam gradient descent algorithm, verifying by using a verification sample data set, and realizing network convergence after multiple complete training to obtain a trained network model;

step 5, inputting the test sample data set into the cascade U-shaped network trained in the step 4, calculating the probability that each pixel in the test sample is cloud, and outputting a cloud prediction probability value of the test sample data;

and 6, carrying out threshold selection, binarization and data stitching on the test result to form a complete image, and outputting a final predictive label value of the test sample data set to obtain a cloud detection result.

In step 1.1, the number of bits isNThe normalized formula is as follows:

（1）

in the method, in the process of the invention,xrepresenting the pixel value of one pixel in the multispectral remote sensing image data,Nthe number of bits of the image is represented,representing the pixel value of the normalized pixel.

In the step 2, a multispectral remote sensing image cloud detection data set is constructed by utilizing the preprocessed multispectral remote sensing image data, a corresponding sample label set is manufactured, the cloud pixel label corresponds to 1, and the non-cloud pixel label corresponds to 0. The multispectral remote sensing image sample data set and the sample label set are divided into a training set part and a testing set part, wherein the training set part covers different types of land covers and clouds with different shapes. Before performing the network training, the training set portion is divided into a training sample data set and a verification sample data set, and the test set portion includes only the test sample data set.

Furthermore, the cascade U-network with the attention mechanism in step 3 includes two U-networks, and the two U-networks are connected by a convolution. Each U-shaped network comprises three parts, namely a contracted path, a jump connection and an expanded path, which are respectively used for downsampling to extract image features, saving semantic details and upsampling to recover cloud attributes. The contracted and expanded paths are connected by a Bridge layer that contains two convolutions and a ReLU activation function, and in the second convolution a Dropout parameter is set to prevent overfitting. The first U-shaped network has five layers, and an attention mechanism is added in the jump connection process, and the second U-shaped network has four layers, so that no additional structure is added for reducing the calculation amount.

Inputting the preprocessed multispectral remote sensing image into a first layer of a first U-shaped network contraction path, setting a residual error structure comprising two parts of a feature extraction branch and a convolution connection branch in each downsampling of the contraction path, wherein the feature extraction branch and the convolution connection branch are connected in parallel, the feature extraction branch is two continuous convolutions and is used for extracting features from the image, and the convolution connection branch is connected with the input of the first layer through a convolution with a smaller convolution kernel and is used for storing context information. Obtained by adding the two branch resultsOutput to the jump connection, add and maximize the result of pooling downsampling +.>Output to the next layer of the shrink path, the process is expressed as follows:

（2）

in the method, in the process of the invention,representing the first U-shaped network shrink path +.>Characteristics of layer output->Indicating the convolution kernel size in the feature extraction branch in the first U-network downsampling layer,/->Indicating the convolution kernel size in the convolution connection branches in the downsampling layer of the first U-network,/>Representing a first U-shaped network contracted pathThe size of the pooled convolution, +.>Indicating the first U-shaped network contracted path convolution step length,/->Features that indicate that the first U-shaped network shrink path is output to the next layer,representing the first U-shaped network contracted pathiLayer output to jump feature, +.>Representing the processing operations of each layer of the first U-shaped network shrink path.

Employing self-connecting unified featuresThe dimension of each layer in the maximal pooling unified contraction path is utilized, and the dimension sequentially passes through a channel attention module and a space attention module, and the process is expressed as follows:

（3）

in the method, in the process of the invention,is pointed to in the first U-shaped network contracted pathiLayer output to jump feature, +.>Jumping connection to first U-shaped networkiOutput of layer->Representing the processing operations of each layer of the first U-shaped network hops.

In each up-sampling layer of the expansion path, the information extracted from each previous layer is up-sampled by deconvolution, and then is spliced and transmitted to a decoder, so that semantic details are restored by utilizing multi-scale features of the image, and the process is expressed as follows:

（4）

in the method, in the process of the invention,representing the first U-shaped network expansion pathiCharacteristics of layer output->Representing the first U-shaped network expansion pathiThe characteristics of each layer output in front of the layer, < >>Jumping connection to first U-shaped networkiOutput of layer->And->The convolution kernel sizes of convolution and deconvolution in the sampling layer on the first U-shaped network are respectively represented,/->Step length of deconvolution of expansion path of first U-shaped network is indicated, < >>Representing the processing operations of each layer of the first U-shaped network expansion path.

Expanding the first U-shaped network to output of the first layerBy a convolution to get->Will->The method comprises the steps of inputting the characteristic extraction branches and the convolution connection branches into a first layer of a second U-shaped network contraction path, wherein the contraction path of the second U-shaped network is identical to that of the first U-shaped network, residual error structures comprising the characteristic extraction branches and the convolution connection branches are arranged in each downsampling layer of the contraction path, the characteristic extraction branches are connected in parallel, the characteristic extraction branches are two continuous convolutions and are used for extracting characteristics from images, and the convolution connection branches are connected with the input of the convolution connection branches through a convolution with a smaller convolution kernel and are used for storing context information. Adding the two branch results to each other>Output to the jump connection, add and maximize the result of pooling downsampling +.>Output to the next layer of the shrink path, the process is expressed as follows:

（5）

in the method, in the process of the invention,representing a second U-shaped network shrink path +.>Characteristics of layer output->Indicating the convolution kernel size in the feature extraction branch in the downsampling layer of the second U-network,/->Indicating the convolution kernel size in the convolution connection branches in the downsampling layer of the second U-network,/->Representing the size of the second U-shaped network contracted path pooling convolution, +.>Indicating the convolution step length of the second U-shaped network contracted path, +.>Representing the characteristics of the second U-shaped network contracted path output to the next layer, +.>Representing a second U-shaped network contracted pathiCharacteristics of layer output to expansion path, +.>Representing the processing operations of each layer of the second U-shaped network shrink path.

The second U-shaped network expansion path firstly carries out deconvolution up-sampling on the output of the upper layer, then splices with the output result of the layer contraction path and transmits the result to a decoder, thereby recovering semantic details, and the process is expressed as follows:

（6）

in the method, in the process of the invention,、/>respectively represent the second U-shaped network expansion pathiLayer and->Characteristics of layer output->Shrink path for second U-shaped networkiOutput of layer->And->Separate tableConvolution kernel size showing convolution and deconvolution in the sampling layer on the second U-network,/->Referring to the step size of the deconvolution of the second U-shaped network expansion path,representing the processing operations of each layer of the second U-shaped network expansion path.

Expanding the second U-shaped network to output of the first layerBy a convolution to get->Then will、/>Adding and convolving again, and finally outputting the values to a continuous range of 0-1 by using sigmoid as an activation function, thereby completing the construction of the cascade U-shaped network fusing the attention mechanisms.

And in the step 4, a training sample data set of the multispectral remote sensing image and corresponding sample label data are selected and input into the cascade U-shaped network built in the step 3 in batches, the output value of each neuron of the convolutional neural network is calculated in the forward direction, the cloud prediction probability value of the training sample is calculated and output, and the loss function of the mixed convolutional neural network is calculated and counter-propagated.

The specific calculation mode of the loss function is as follows:

（7）

in the method, in the process of the invention,a loss value representing a cloud prediction probability value and a tag value,ta mask representing a true cloud of the earth's surface,ythe cloud detection result is represented by a set of images,Nrepresents the total number of real cloud mask pixels, +.>Weight ratio for recall and precision, +.>Is a minimum value to avoid division by 0.

And selecting an Adam optimizer to perform weight optimization, setting an initial learning rate, applying a learning rate attenuation strategy in a training stage, and completing training when the learning rate is reduced to a set threshold value. Model training and accuracy verification are carried out simultaneously, the accuracy of each training of the model is evaluated by using a verification sample data set, network model parameters are adjusted according to the evaluation accuracy of the verification set, the accuracy of each training of the network model is recorded, and the optimal network parameter model is selected according to the accuracy of the model.

And in the step 5, the test sample data set is input into the cascade U-shaped network trained in the step 4 in batches, cloud probability prediction is carried out pixel by pixel, and cloud prediction probability values of batch test sample data are output. And outputting the probability that each pixel in the test sample data set is cloud after all the test sample data sets are predicted.

Moreover, the cloud probability threshold is set in the step 6 asThe cloud prediction probability value in the test sample data set is larger thanAnd (3) regarding the pixels of the images as cloud, binarizing each image according to the threshold, assigning a value of 1 to cloud pixels and 0 to non-cloud pixels to obtain a detection result of each image, selecting small images cut from the same image according to file names, performing data splicing on the small images to obtain a prediction tag value of the whole image, and outputting the prediction tag value as a final cloud detection result.

The invention also provides a cascade U-shaped network cloud detection system integrating the attention mechanism, which is used for realizing the cascade U-shaped network cloud detection method integrating the attention mechanism.

Furthermore, the method comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute the cascade U-shaped network cloud detection method integrating the attention mechanism.

Or, the method comprises a readable storage medium, wherein the readable storage medium is stored with a computer program, and when the computer program is executed, the method for detecting the cascade U-shaped network cloud with the integrated attention mechanism is realized.

Compared with the prior art, the invention has the following advantages:

according to the invention, two U-shaped network structures are connected in series to construct a cascade U-shaped network integrating an attention mechanism, the multi-scale characteristics of the image are fully integrated in the first network, the network retains and focuses on important characteristics by using dense connection and the attention mechanism, and irrelevant characteristics are restrained, so that cloud detection precision is improved, and the second network is used for optimizing a preliminary cloud detection result and supplementing detail information and cloud boundary information, so that the position and the edge of the cloud in the image can be detected more accurately. Compared with the existing remote sensing image cloud detection method, the method can obtain higher precision and better visual effect.

Drawings

Fig. 1 is a flowchart of a cloud detection method according to an embodiment of the present invention.

Fig. 2 is a diagram of a cascaded U-shaped network framework incorporating an attention mechanism in accordance with an embodiment of the present invention.

FIG. 3 is a structural framework diagram of a U-shaped network feature extraction branch and a convolution connection branch according to an embodiment of the present invention.

Fig. 4 (a) to 4 (e) are graphs comparing the Cloud detection effect of the method proposed by the present invention with that of the U-Net and Cloud-Net networks, wherein fig. 4 (a) is an original multispectral remote sensing image, fig. 4 (b) is a Cloud detection truth image set in an experiment, fig. 4 (c) is a Cloud detection effect graph of the method proposed by the present invention, fig. 4 (d) is a Cloud detection effect graph of the U-Net network, and fig. 4 (e) is a Cloud detection effect graph of the Cloud-Net network.

Detailed Description

The invention provides a cascade U-shaped network cloud detection method and system integrating attention mechanisms, and the technical scheme of the invention is further described below with reference to the accompanying drawings.

Example 1

As shown in fig. 1, the embodiment of the invention provides a cascade U-shaped network cloud detection method integrating an attention mechanism, which comprises the following steps:

and step 1, preprocessing multispectral remote sensing image data.

The method comprises the steps of inputting multispectral remote sensing image data to be processed, firstly carrying out radiation correction and geometric correction, and then carrying out pretreatment such as data normalization, data enhancement, data clipping and the like.

And 1.1, performing radiation correction and geometric correction on the multispectral image, and eliminating radiation distortion and geometric distortion caused by factors such as atmospheric transmission, a sensor, earth curvature and the like.

Step 1.2, normalizing all pixel values of the remote sensing image data subjected to radiation correction and geometric correction to a range of 0-1, wherein the number of bits isNThe normalized formula is as follows:

（1）

And 1.3, carrying out horizontal or vertical random overturning, noise adding and other processing on the normalized multispectral remote sensing image data to realize data enhancement and increase the cloud area characteristics of the image.

And 1.4, cutting the multispectral remote sensing image data subjected to data enhancement to a proper size according to the constructed cascade U-shaped network convolution model fusing the attention mechanisms.

And 2, constructing a sample data set and a corresponding sample label set for remote sensing image cloud detection.

And constructing a multispectral remote sensing image cloud detection data set by utilizing the preprocessed multispectral remote sensing image data, and manufacturing a corresponding sample label set, wherein the cloud pixel label corresponds to 1, and the non-cloud pixel label corresponds to 0. The multispectral remote sensing image sample data set and the sample label set are divided into a training set part and a testing set part, wherein the training set part covers different types of land covers and clouds with different shapes. The present embodiment divides the training set portion into 80% training sample data set and 20% validation sample data set prior to performing the network training, the test set portion comprising only the test sample data set, derived from the image and label data other than the training set portion.

And 3, constructing a cascade U-shaped network integrating the attention mechanisms.

The cascade U-shaped network integrating the attention mechanism comprises two U-shaped networks, and one U-shaped network is used by the two U-shaped networksIs connected by convolutions of (a). Each U-shaped network comprises three parts, namely a contracted path, a jump connection and an expanded path, which are respectively used for downsampling to extract image characteristics, preserving semantic details and upsampling to recover cloud attributes, wherein the contracted path and the expanded path are connected through a Bridge layer, and the Bridge layer comprises two parts>Is activated by the convolution and ReLU of (A) and is at the second +.>Setting the Dropout parameter in the convolution prevents overfitting. The first U-shaped network has five layers, and an attention mechanism is added in the jump connection process, and the second U-shaped network has four layers, so that no additional structure is added for reducing the calculation amount.

And 3.1, building a first U-shaped network.

Inputting the pretreated multispectral remote sensing image into a first U-shaped network contracted pathAnd the first layer is provided with a residual structure comprising two parts of a characteristic extraction branch and a convolution connection branch in each downsampling layer of the contraction path, wherein the characteristic extraction branch and the convolution connection branch are connected in parallel. The feature extraction branch is two continuous convolutions and is used for extracting features from the image, and the convolution connection branch is connected with the input of the convolution connection branch through a convolution with smaller convolution kernel and is used for storing context information. Obtained by adding the two branch resultsOutput to the jump connection, add and maximize the result of pooling downsampling +.>Output to the next layer of the shrink path, the process is expressed as follows:

（2）

in the method, in the process of the invention,representing the first U-shaped network shrink path +.>Characteristics of layer output->The convolution kernel size in the feature extraction branch in the first U-shaped network downsampling layer is represented, and the value of the convolution kernel size in the first U-shaped network downsampling layer is +.>，/>The convolution kernel size in the convolution connection branch in the first U-shaped network downsampling layer is represented, and the value of the convolution kernel size in the first U-shaped network downsampling layer is +.>，/>Representing the size of the first U-shaped network contracted path pooling convolution, the value of this embodiment is +.>，/>Indicating the first U-shaped network contracted path convolution step, the value of this example is 2,/->Features that indicate that the first U-shaped network shrink path is output to the next layer,representing the first U-shaped network contracted pathiLayer output to jump feature, +.>Representing the processing operations of each layer of the first U-shaped network shrink path.

（3）

（4）

in the method, in the process of the invention,representing the first U-shaped network expansion pathiCharacteristics of layer output->Representing the first U-shaped network expansion pathiThe characteristics of each layer output in front of the layer, < >>Jumping connection to first U-shaped networkiOutput of layer->And->The convolution kernel sizes of convolution and deconvolution in the sampling layer on the first U-shaped network are respectively represented, and the values of the convolution kernel sizes are respectively +.>And->，/>The step length of deconvolution of the expansion path of the first U-shaped network is indicated, and the value of the step length is 2 in the embodiment>Representing the processing operations of each layer of the first U-shaped network expansion path.

And 3.2, building a second U-shaped network.

Expanding the first U-shaped network to output of the first layerBy a +.>Convolution gets +.>Will beAnd inputting the residual error structure into a first layer of a second U-shaped network contraction path, wherein the contraction path of the second U-shaped network is identical to that of the first U-shaped network, and each downsampling layer of the contraction path is provided with a residual error structure comprising two parts of a characteristic extraction branch and a convolution connection branch which are connected in parallel. The feature extraction branch is two continuous convolutions and is used for extracting features from the image, and the convolution connection branch is connected with the input of the convolution connection branch through a convolution with smaller convolution kernel and is used for storing context information. Adding the two branch results to each other>Output to the jump connection, add and maximize the result of pooling downsampling +.>Output to the next layer of the shrink path, the process is expressed as follows:

（5）

in the method, in the process of the invention,representing a second U-shaped network shrink path +.>Characteristics of layer output->The convolution kernel size in the feature extraction branch in the second U-shaped network downsampling layer is represented, and the value of the convolution kernel size in the second U-shaped network downsampling layer is +.>，/>The convolution kernel size in the convolution connection branch in the second U-shaped network downsampling layer is represented, and the value of the convolution kernel size in the second U-shaped network downsampling layer is +.>，/>Representing the size of the second U-shaped network contracted path pooling convolution, the value of this embodiment is +.>，/>Representing the convolution step of the second U-shaped network contracted path, the value of this embodiment is 2,/for this embodiment>Representing the characteristics of the second U-shaped network contracted path output to the next layer, +.>Representing a second U-shaped network contracted pathiCharacteristics of layer output to expansion path, +.>Representing the processing operations of each layer of the second U-shaped network shrink path.

（6）

in the method, in the process of the invention,、/>respectively represent the second U-shaped network expansion pathiLayer and->Characteristics of layer output->Shrink path for second U-shaped networkiOutput of layer->And->The convolution kernel sizes of convolution and deconvolution in the sampling layer on the second U-shaped network are respectively represented, and the values of the convolution kernel sizes are respectively +.>And->，/>The step length of deconvolution of the expansion path of the second U-shaped network is indicated, and the value of the step length is 2 in the embodiment>Representing the processing operations of each layer of the second U-shaped network expansion path.

Step 3.3, setting upThrough one of the two U-shaped networksThe convolution series connection of (3) is used for completing the construction of the cascade U-shaped network fusing the attention mechanisms.

Expanding the second U-shaped network to output of the first layerBy a +.>Convolution gets +.>Then ∈>、/>Add and go through again one +.>And finally, outputting the value to a continuous range of 0-1 by using sigmoid as an activation function, thereby completing the construction of the cascade U-shaped network integrating the attention mechanism.

And 4, inputting training sample data sets with labels into the cascade U-shaped network built in the step 3 in batches, setting a loss function, training and optimizing network parameters by using an Adam gradient descent algorithm, verifying by adopting a verification sample data set, and realizing network convergence after multiple complete training to obtain a trained network model.

Selecting a training sample data set of the multispectral remote sensing image and corresponding sample label data, inputting the training sample data set and the corresponding sample label data into the cascade U-shaped network built in the step 3 in batches, and initializing the network weight in the cascade U-shaped networkForward calculating the output value of each neuron of the convolutional neural network, calculating and outputting the cloud prediction probability value of the training sample, and calculating the mixed convolutional neural networkThe loss function of the complex and back-propagation.

The specific calculation mode of the loss function is as follows:

（7）

in the method, in the process of the invention,a loss value representing a cloud prediction probability value and a tag value,ta mask representing a true cloud of the earth's surface,ythe cloud detection result is represented by a set of images,Nrepresents the total number of real cloud mask pixels, +.>For the weight ratio of recall and precision, the value of this example is 2, < ->Is a minimum value for avoiding division by 0, the value of this embodiment is +.>。

Selecting an Adam optimizer to perform weight optimization, and setting an initial learning rate asApplying learning rate attenuation strategy in training stage, setting attenuation rate to 0.7, and tolerance factor to 15, when learning rate is reduced to +.>At that time, training is completed. Model training and accuracy verification are carried out simultaneously, the accuracy of each training of the model is evaluated by using a verification sample data set, network model parameters are adjusted according to the evaluation accuracy of the verification set, the accuracy of each training of the network model is recorded, and the optimal network parameter model is selected according to the accuracy of the model.

And 5, inputting the test sample data set into the cascade U-shaped network trained in the step 4, calculating the probability that each pixel in the test sample is cloud, and outputting a cloud prediction probability value of the test sample data.

And (3) inputting the test sample data set into the cascade U-shaped network trained in the step (4) in batches, carrying out cloud probability prediction pixel by pixel, and outputting cloud prediction probability values of the batch test sample data. And outputting the probability that each pixel in the test sample data set is cloud after all the test sample data sets are predicted.

Setting cloud probability threshold in this embodimentIs->The cloud prediction probability value in the test sample data set is larger thanAnd (3) regarding the pixels of the image as cloud, binarizing each image according to the threshold, assigning a cloud pixel value of 1 and assigning a non-cloud pixel value of 0, and obtaining a detection result of each image. And selecting small images cut from the same image according to the file name, performing data splicing on the small images to obtain a prediction tag value of the whole scene image, and outputting the prediction tag value as a final cloud detection result.

Selecting 38-Cloud data set for Cloud detection experiment, wherein the data set image contains red, green, blue and near infrared bands, and each view image is cut intoAnd selecting the U-Net and the Cloud-Net networks as comparison methods. Fig. 4 (a) to 4 (e) are graphs comparing the Cloud detection effect of the method of the present invention with that of the U-Net and Cloud-Net networks. Visual effect shows that the cloud detection result obtained by the method provided by the invention is closer to a true value than that obtained by other two main stream cloud detection methods. On quantitative evaluation, the method provided by the invention achieves the following aims at the accuracy, recall rate, cross-over ratio and overall accuracy88.58%, 91.1%, 80.94% and 96.72% higher than U-Net/Cloud-Net by 10.31/7.78, 1.37/1.27, 8.07/8.24, 2.4/1.4% respectively.

Example 2

Based on the same inventive concept, the invention also provides a cascade U-shaped network cloud detection system integrating the attention mechanism, which comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute the cascade U-shaped network cloud detection method integrating the attention mechanism.

Example 3

Based on the same inventive concept, the invention also provides a cascade U-shaped network cloud detection system integrating the attention mechanism, which comprises a readable storage medium, wherein the readable storage medium is stored with a computer program, and the cascade U-shaped network cloud detection method integrating the attention mechanism is realized when the computer program is executed.

In particular, the method according to the technical solution of the present invention may be implemented by those skilled in the art using computer software technology to implement an automatic operation flow, and a system apparatus for implementing the method, such as a computer readable storage medium storing a corresponding computer program according to the technical solution of the present invention, and a computer device including the operation of the corresponding computer program, should also fall within the protection scope of the present invention.

The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims

1. A cascade U-shaped network cloud detection method integrating attention mechanisms is characterized by comprising the following steps:

step 1, preprocessing multispectral remote sensing image data;

the cascade U-shaped network integrating the attention mechanism comprises two U-shaped networks, and the two U-shaped networks are connected by using one convolution; each U-shaped network comprises three parts, namely a contracted path, a jump connection and an expanded path, which are respectively used for downsampling to extract image features, saving semantic details and upsampling to recover cloud attributes; the contracted path and the expanded path are connected through a Bridge layer, the Bridge layer comprises two convolutions and a ReLU activation function, and a Dropout parameter is set in the second convolution to prevent overfitting; the first U-shaped network has five layers, and an attention mechanism is added in the jump connection process, and the second U-shaped network has four layers, so that no additional structure is added for reducing the calculated amount;

selecting a training sample data set of the multispectral remote sensing image and corresponding sample label data, inputting the training sample data set and the corresponding sample label data into the cascade U-shaped network built in the step 3 in batches, forward calculating an output value of each neuron of the convolutional neural network, calculating and outputting a cloud prediction probability value of the training sample, calculating a loss function of the hybrid convolutional neural network, and carrying out back propagation, wherein the specific calculation mode of the loss function is as follows:

（7）

in the method, in the process of the invention,a loss value representing a cloud prediction probability value and a tag value,ta mask representing a true cloud of the earth's surface,ythe cloud detection result is represented by a set of images,Nrepresents the total number of real cloud mask pixels, +.>Weight ratio for recall and precision, +.>Is a minimum value for avoiding division by 0;

2. The method for detecting the cascade U-shaped network cloud by fusing attention mechanisms according to claim 1, wherein the method comprises the following steps: the pretreatment in the step 1 comprises the following steps:

（1）

in the method, in the process of the invention,xrepresenting the pixel value of one pixel in the multispectral remote sensing image data,Nthe number of bits of the image is represented,representing the pixel value of the normalized pixel;

3. The method for detecting the cascade U-shaped network cloud by fusing attention mechanisms according to claim 1, wherein the method comprises the following steps: in the step 2, a multispectral remote sensing image cloud detection data set is constructed by utilizing the preprocessed multispectral remote sensing image data, a corresponding sample label set is manufactured, the cloud pixel label corresponds to 1, and the non-cloud pixel label corresponds to 0; the multi-spectrum remote sensing image sample data set and the sample label set are divided into a training set part and a testing set part, wherein the training set part covers different types of land covers and clouds with different shapes, before network training, the training set part is divided into a training sample data set and a verification sample data set, and the testing set part only comprises the testing sample data set.

4. The method for detecting the cascade U-shaped network cloud by fusing attention mechanisms according to claim 1, wherein the method comprises the following steps: step 3, inputting the preprocessed multispectral remote sensing image into a first layer of a first U-shaped network contraction path, setting a residual structure comprising two parts of a feature extraction branch and a convolution connection branch in each downsampling layer of the contraction path, wherein the feature extraction branch and the convolution connection branch are connected in parallel, the feature extraction branch is two continuous convolutions and is used for extracting features from the image, and the convolution connection branch is connected with the input of the first convolution and then is used for storing context information; obtained by adding the two branch resultsOutput to the jump connection, add and maximize the result of pooling downsampling +.>The output to the next layer of the shrink path is represented as follows:

（2）

in the method, in the process of the invention,representing the first U-shaped network shrink path +.>Characteristics of layer output->Indicating the convolution kernel size in the feature extraction branch in the first U-network downsampling layer,/->Indicating the convolution kernel size in the convolution connection branches in the downsampling layer of the first U-network,/>Representing the size of the first U-shaped network contracted path pooling convolution, +.>Indicating the first U-shaped network contracted path convolution step length,/->Representing the characteristics of the first U-shaped network contracted path output to the next layer, +.>Representing the first U-shaped network contracted pathiLayer output to jump feature, +.>Representing the processing operation of each layer of the first U-shaped network contracted path;

employing self-connecting unified featuresAnd utilize the largest dimensionThe feature map of each layer in the pooled unified contraction path sequentially passes through a channel attention module and a space attention module, and the process is expressed as follows:

（3）

in the method, in the process of the invention,is pointed to in the first U-shaped network contracted pathiLayer output to jump feature, +.>Jumping connection to first U-shaped networkiOutput of layer->Representing a processing operation of each layer of the first U-shaped network jump connection;

in each up-sampling layer of the expansion path, the information extracted from each previous layer is up-sampled through deconvolution, and then is spliced and transmitted to a decoder, so that semantic details are restored by utilizing multi-scale features of the image, and the process is expressed as follows:

（4）

in the method, in the process of the invention,representing the first U-shaped network expansion pathiCharacteristics of layer output->Representing the first U-shaped network expansion pathiThe characteristics of each layer output in front of the layer, < >>For the first U-shaped netConnection of the network hopsiOutput of layer->And->The convolution kernel sizes of convolution and deconvolution in the sampling layer on the first U-shaped network are respectively represented,/->Step length of deconvolution of expansion path of first U-shaped network is indicated, < >>Representing the processing operations of each layer of the first U-shaped network expansion path.

5. The method for detecting the cascade U-shaped network cloud with integrated attention mechanism as in claim 4, wherein the method comprises the steps of: step 3, outputting the first layer of the expansion path of the first U-shaped networkBy a convolution to get->Will->The method comprises the steps that the method comprises the steps of inputting the method to a first layer of a second U-shaped network contraction path, wherein the contraction path of the second U-shaped network is identical to that of the first U-shaped network, each downsampling layer of the contraction path is provided with a residual error structure comprising two parts, namely a characteristic extraction branch and a convolution connection branch, the characteristic extraction branch is connected in parallel with the convolution connection branch, the characteristic extraction branch is two continuous convolutions and is used for extracting characteristics from an image, and the convolution connection branch is connected with the input of the first convolution and then used for storing context information; adding the two branch results to each other>Output to the jump connection, add and maximize the result of pooling downsampling +.>The output to the next layer of the shrink path is represented as follows:

（5）

in the method, in the process of the invention,representing a second U-shaped network shrink path +.>Characteristics of layer output->Indicating the convolution kernel size in the feature extraction branch in the downsampling layer of the second U-network,/->Indicating the convolution kernel size in the convolution connection branches in the downsampling layer of the second U-network,/->Representing the size of the second U-shaped network contracted path pooling convolution, +.>Indicating the convolution step length of the second U-shaped network contracted path, +.>Representing the characteristics of the second U-shaped network contracted path output to the next layer, +.>Representation ofSecond U-shaped network contracted pathiCharacteristics of layer output to expansion path, +.>Representing processing operations of each layer of the second U-shaped network shrink path;

（6）

in the method, in the process of the invention,、/>respectively represent the second U-shaped network expansion pathiLayer and->The characteristics of the layer output are such that,shrink path for second U-shaped networkiOutput of layer->And->The convolution kernel sizes of convolution and deconvolution in the sampling layer on the second U-shaped network are denoted, respectively,/->Step length of deconvolution of expansion path of second U-shaped network, +.>Representing the processing operations of each layer of the second U-shaped network expansion path.

6. The method for detecting the cascade U-shaped network cloud with integrated attention mechanism as in claim 5, wherein the method comprises the steps of: step 3, outputting the first layer of the expansion path of the second U-shaped networkBy a convolution to get->Then ∈>、/>Adding and convolving again, and finally outputting the values to a continuous range of 0-1 by using sigmoid as an activation function, thereby completing the construction of the cascade U-shaped network fusing the attention mechanisms.

7. The method for detecting the cascade U-shaped network cloud by fusing attention mechanisms according to claim 1, wherein the method comprises the following steps: step 4, selecting an Adam optimizer to perform weight optimization, setting an initial learning rate, applying a learning rate attenuation strategy in a training stage, and completing training when the learning rate is reduced to a set threshold value; model training and accuracy verification are carried out simultaneously, the accuracy of each training of the model is evaluated by using a verification sample data set, network model parameters are adjusted according to the evaluation accuracy of the verification set, the accuracy of each training of the network model is recorded, and the optimal network parameter model is selected according to the accuracy of the model.

8. The method for detecting the cascade U-shaped network cloud by fusing attention mechanisms according to claim 1, wherein the method comprises the following steps: setting the cloud probability threshold as in step 6The cloud prediction probability value in the test sample data set is greater than +.>And (3) regarding the pixels of the images as cloud, binarizing each image according to the threshold, assigning a value of 1 to cloud pixels and 0 to non-cloud pixels to obtain a detection result of each image, selecting small images cut from the same image according to file names, performing data splicing on the small images to obtain a prediction tag value of the whole image, and outputting the prediction tag value as a final cloud detection result.

9. A cascade U-shaped network cloud detection system with integrated attention mechanism, comprising a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute a cascade U-shaped network cloud detection method with integrated attention mechanism according to any one of claims 1-8.

10. A cascade U-shaped network cloud detection system incorporating an attention mechanism, comprising a readable storage medium having a computer program stored thereon, which when executed, implements a cascade U-shaped network cloud detection method incorporating an attention mechanism as claimed in any one of claims 1 to 8.