CN115131674A - Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network

Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network

Info

Publication number: CN115131674A
Authority: CN (China)
Prior art keywords: rank, network, cloud, low, temporal
Legal status: Pending
Application number: CN202210729041.9A
Other languages: Chinese (zh)
Inventors: Zhang Hongyan (张洪艳), Huang Qi (黄琪), Xia Yu (夏宇), Zhang Liangpei (张良培)
Current Assignee: Wuhan University (WHU)
Original Assignee: Wuhan University (WHU)
Application filed by Wuhan University; priority to CN202210729041.9A, published as CN115131674A

Classifications

    • G06V 20/13 — Scenes; scene-specific elements; terrestrial scenes; satellite images
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/806 — Pattern recognition or machine learning; fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 20/70 — Labelling scene content, e.g. deriving syntactic or semantic representations


Abstract

The invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network. Exploiting the spatial-temporal-spectral low-rank property of multi-temporal remote sensing images, the method learns and reconstructs clear-sky cloud-free background features in a data-driven manner. A feature extraction network is built in stages by combining the ideas of twin differencing and non-local attention; single-scene spatial-spectral features and multi-temporal difference features are extracted and fused respectively, and cloud coverage detection for multi-temporal remote sensing images is realized intelligently, successfully addressing the scarcity and weak interpretability of existing deep-learning-based multi-temporal cloud detection algorithms. The deep low-rank network model is applied to multi-temporal Landsat remote sensing image cloud detection; Landsat-8 experiments show that, compared with existing remote sensing image cloud detection methods, the proposed method is faster and more accurate and remains more stable across different scenes and different cloud coverage conditions.

Description

Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network
Technical Field
The invention relates to the field of multi-temporal optical remote sensing image cloud detection, and in particular to an automatic cloud detection method for multi-temporal optical remote sensing images based on a deep low-rank network, which combines the low-rank physical prior of multi-temporal images with the feature expression capability of neural networks to realize intelligent cloud coverage detection for multi-temporal remote sensing images.
Background
Optical remote sensing imagery is an essential means of earth resource detection, ecological environment monitoring, and the like. It offers short imaging periods and wide coverage, provides rich ground-feature information, visually reflects the spatial distribution of ground features, and records their reflection spectra. It is widely applied in environment monitoring, geological survey, urban planning, disaster assessment, and other fields, providing effective support for major strategic decisions.
Common optical remote sensing satellites include the Landsat series, the Sentinel series, the domestic GF series, QuickBird, MODIS, and Hyperion; existing optical satellites come in a wide variety. However, global cloud data published by the International Satellite Cloud Climatology Project (ISCCP) show that clouds cover about 66% of the earth's surface, so cloud contamination in optical remote sensing images is an unavoidable and serious problem. Developing a high-precision remote sensing image cloud detection method therefore helps restore image quality and improve image utilization, and is of great significance and value for the subsequent development and application of remote sensing imagery in the aspects of population, environment, and geography.
Aiming at the problem of optical remote sensing image cloud detection, scholars at home and abroad have carried out a great deal of research and developed various classic cloud detection algorithms. By the number of images used, these can be divided into single-scene and multi-temporal cloud detection algorithms. Single-scene algorithms are currently the most studied and most widely applied; they build cloud detection models from different hypothesis models and prior knowledge, combining the physical mechanism of cloud formation with statistical characteristics. In contrast, the basic principle of multi-temporal algorithms is to comprehensively exploit the spatial-temporal-spectral features of remote sensing images and detect clouds from the difference information between multi-temporal images: given a cloud-free background, cloud pixels can be identified by the difference between the actual observation and the cloud-free value. Multi-temporal algorithms introduce time-dimension information on top of a single scene and can generally use images of the same location at different times to generate a more accurate cloud mask.
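As a toy illustration of this difference-based principle (not taken from the patent; the arrays and the 0.2 threshold are made-up values), a pixel whose observed reflectance rises well above its cloud-free background value is flagged as cloud:

import numpy as np

# target-date reflectance of four pixels vs. their cloud-free background
observed   = np.array([0.12, 0.55, 0.10, 0.48])
background = np.array([0.11, 0.13, 0.09, 0.12])
cloud_mask = (observed - background) > 0.2   # illustrative threshold only
print(cloud_mask)   # [False  True False  True]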
Existing multi-temporal cloud detection algorithms have developed considerably in recent years. First, an iterative haze optimized transformation algorithm (IHOT) improved from the HOT algorithm was proposed; by introducing a cloud-free reference image for regression iteration, it successfully overcomes the confusion between bright surfaces and clouds. Second, an automatic time series analysis algorithm (ATSA) was proposed for multi-temporal optical image cloud detection; it detects cloud pixels through a time-series cloud index and suits regions with few cloud-free observations. Third, a multi-temporal integrated cloud z-score algorithm (MTICZ) was proposed for cloud and cloud shadow detection; it locates clouds by analyzing the target image and the time series of cloud indices. Finally, a model combining a U-Net deep convolutional neural network with a long short-term memory (LSTM) network was proposed for time-series MSG/SEVIRI image cloud detection and achieved high-precision results.
Although much research has been devoted to multi-temporal optical remote sensing image cloud detection, critical problems remain unsolved, mainly: (1) traditional multi-temporal cloud detection algorithms depend on specific thresholds, and most lack generality across different underlying surfaces and cloud types; (2) deep-learning multi-temporal cloud detection algorithms are underdeveloped, do not consider the intrinsic physical characteristics of multi-temporal images, and are weakly interpretable.
Therefore, it is highly significant to propose a multi-temporal remote sensing image cloud detection method based on a deep low-rank network that accounts for the spatial-temporal-spectral low-rank property of multi-temporal images.
Disclosure of Invention
The invention aims to remedy the scarcity and poor interpretability of existing deep-learning-based multi-temporal cloud detection algorithms, and provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network.
The technical scheme of the invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network, which comprises the following steps:
step 1, extracting the spatial-spectral features of clouds in an original image to be detected;
step 2, extracting the low-rank background features of the multi-temporal images;
step 3, simultaneously inputting the original image and the background features obtained in step 2 into a multi-temporal cloud detection network, extracting their change information with a weight-sharing twin differential structure combined with the idea of non-local attention, and obtaining cloud features based on temporal change, namely a multi-temporal change feature map;
and step 4, fusing the spatial-spectral features of the cloud from step 1 with the temporal-change-based cloud features from step 3 to predict a cloud mask, training the network according to the loss function until convergence, and optimizing all network parameters to obtain a final prediction map. How the four steps connect is sketched below.
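The following Keras sketch shows how the four steps could be wired into one trainable model. It is a structural skeleton under assumed input shapes, with the three sub-networks of steps 1-3 passed in as placeholders; it is not the patent's actual implementation:

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cloud_detector(single_scene_net, low_rank_module, siamese_net,
                         image_shape=(512, 512, 4),
                         stack_shape=(512, 512, 64)):
    x = layers.Input(image_shape)   # step 1 input: original image
    y = layers.Input(stack_shape)   # step 2 input: multi-temporal stack
    f_s = single_scene_net(x)                  # step 1: spatial-spectral features
    background = low_rank_module(y)            # step 2: clear-sky background
    # step 3: temporal-change features; the background is assumed to be
    # channel-compatible with whatever siamese_net expects
    f_t = siamese_net([x, background])
    fused = layers.Concatenate()([f_s, f_t])   # step 4: fuse both cues
    fused = layers.Conv2D(32, 3, padding='same', activation='relu')(fused)
    mask = layers.Conv2D(1, 1, activation='sigmoid')(fused)  # cloud mask
    return Model([x, y], mask)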
Moreover, in step 1, the spatial-spectral features of clouds in the original image to be detected are extracted by designing a single-scene cloud detection network, implemented as follows.

Record a given original single-scene image as $X \in \mathbb{R}^{w \times h \times c}$, where w and h denote the length and width of the image and c denotes the number of bands. The input of the single-scene cloud detection network is $X$ and the output is $F_s \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the output feature map. The single-scene cloud detection network mainly consists of a contraction path and an expansion path. The contraction path is formed by stacking six encoding modules. Each encoding module is divided into a left branch and a right branch: the left branch is a feature extraction branch that automatically extracts features by 2D convolution with two 3 × 3 convolution kernels, each convolution layer followed by a ReLU activation function for nonlinear mapping; the right branch is a context-preserving branch that extracts the most important feature information with a 1 × 1 2D convolution and adopts an identity mapping operation to accelerate backward propagation of the gradient flow while retaining low-level feature information. After all features are aggregated, a 2 × 2 max pooling operation finally downsamples the features and reduces network parameters to avoid over-fitting. Through the encoding modules, multi-scale features of the image are extracted automatically during network training. The expansion path is formed by stacking five decoding modules. In a decoding module, the original image size is gradually restored based on one deconvolution operation; two convolution operations then extract feature information fusing low and high layers, and shortcut connections help the network store and exploit the context information learned in early layers, so that the network captures more cloud features while the training process is accelerated by preventing gradient vanishing during back-propagation. Based on the decoding modules, the network fuses multi-level features while training stably and gradually recovers the original image size. In addition, skip connections between the contraction and expansion paths of the single-scene cloud detection network fuse shallow and deep features, which helps generate a more accurate cloud mask. In the encoding and decoding modules, the core convolution and nonlinear activation operations are expressed as formula (1):

$F_{out} = \max(0, K * F_{in} + B)$   (1)

where $F_{in}$ and $F_{out}$ denote the input and output feature maps, K is the convolution kernel, B is the bias matrix, * denotes the convolution operation, and max(0, ·) denotes the nonlinear activation operation.
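A minimal Keras sketch of one encoding module and one decoding module as described above; the filter widths and the additive aggregation of the two branches are assumptions the patent text does not fix:

import tensorflow as tf
from tensorflow.keras import layers

def encoding_module(x, filters):
    # left branch: two 3x3 convolutions, each followed by ReLU
    left = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    left = layers.Conv2D(filters, 3, padding='same', activation='relu')(left)
    # right branch: 1x1 convolution keeps the most salient low-level context
    # and acts as an identity-style shortcut for the gradient flow
    right = layers.Conv2D(filters, 1, padding='same')(x)
    merged = layers.Add()([left, right])      # aggregate both branches
    down = layers.MaxPooling2D(2)(merged)     # 2x2 max pooling downsampling
    return down, merged                       # `merged` feeds the skip connection

def decoding_module(x, skip, filters):
    # one deconvolution gradually restores the spatial size
    up = layers.Conv2DTranspose(filters, 2, strides=2, padding='same')(x)
    x = layers.Concatenate()([up, skip])      # shortcut to early-layer context
    # two convolutions fuse low-layer and high-layer feature information
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x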
In step 2, a low-rank feature extraction module is designed to extract the low-rank background features of the multi-temporal images. The input of the module is the multi-temporal image stack, recorded as $Y \in \mathbb{R}^{w \times h \times tc}$, where t denotes the number of time phases; the output is a low-rank feature map of the same size as the input, i.e., a low-rank background map with sparse noise removed. First, several 2D convolution layers are stacked to downsample the channel dimension by a factor of 8 (i.e., the rank-reduction ratio of the module is 8), and a dimension transformation yields the coefficient matrix $U \in \mathbb{R}^{wh \times k}$, where k is the upper rank limit. Then, the same convolution and dimension-transformation operations followed by a matrix transposition yield $G \in \mathbb{R}^{k \times wh}$. The original matrix Y likewise passes through a 1 × 1 convolution and dimension transformation to give $D \in \mathbb{R}^{wh \times tc}$, and multiplying $G$ and $D$ yields the basis matrix $V = GD \in \mathbb{R}^{k \times tc}$. Considering the unboundedness of convolution outputs, the module normalizes $U^{(i)} = \{u^{(i,1)}, \ldots, u^{(i,wh)}\}$ and $V^{(i)} = \{v^{(i,1)}, \ldots, v^{(i,k)}\}$, which benefits the stability of network training. To preserve the semantic meaning of $v^{(i,j)}$ as a basis, $\ell_2$-norm normalization retains its directionality, as shown by the following equation:

$\hat{v}^{(i,j)} = \dfrac{v^{(i,j)}}{\lVert v^{(i,j)} \rVert_2 + \varepsilon}$

where ε is set to $10^{-6}$ to avoid a zero divisor. The coefficient vectors $u^{(i,j)}$ are normalized with the softmax function, yielding a corresponding probability for each basis, as shown below:

$\hat{u}^{(i,j)}_m = \dfrac{\exp(u^{(i,j)}_m)}{\sum_{n=1}^{k} \exp(u^{(i,j)}_n)}$

Finally, the low-rank features are obtained as the product of $\hat{U}$ and $\hat{V}$. After a matrix dimension change of the low-rank features, the reconstructed low-rank feature map is obtained, and a final 1 × 1 convolution produces the reconstructed clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{w \times h \times tc}$.
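A schematic TensorFlow implementation of this decomposition-and-reconstruction process; collapsing the stacked rank-reduction convolutions into single 1 × 1 convolutions and the exact wiring of the three branches are simplifying assumptions:

import tensorflow as tf

class LowRankFeatureExtractor(tf.keras.layers.Layer):
    def __init__(self, channels, rank=8, eps=1e-6):
        super().__init__()
        self.rank, self.eps = rank, eps
        self.coef_conv = tf.keras.layers.Conv2D(rank, 1)      # coefficient branch U
        self.base_conv = tf.keras.layers.Conv2D(rank, 1)      # transposed branch G
        self.feat_conv = tf.keras.layers.Conv2D(channels, 1)  # feature branch D
        self.out_conv = tf.keras.layers.Conv2D(channels, 1)

    def call(self, y):
        s = tf.shape(y)
        b, h, w = s[0], s[1], s[2]
        c = y.shape[-1]
        u = tf.reshape(self.coef_conv(y), (b, h * w, self.rank))
        g = tf.reshape(self.base_conv(y), (b, h * w, self.rank))
        d = tf.reshape(self.feat_conv(y), (b, h * w, c))
        v = tf.matmul(g, d, transpose_a=True)       # bases V = G^T D, (b, k, c)
        u = tf.nn.softmax(u, axis=-1)               # a probability per basis
        v = tf.math.l2_normalize(v, axis=-1, epsilon=self.eps)  # keep direction
        low_rank = tf.matmul(u, v)                  # reconstruction, (b, wh, c)
        low_rank = tf.reshape(low_rank, (b, h, w, c))
        return self.out_conv(low_rank)              # clear-sky background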
In step 3, the low-rank background feature map $B_{lr} \in \mathbb{R}^{w \times h \times tc}$ obtained by the low-rank feature extraction module and the original image $X \in \mathbb{R}^{w \times h \times c}$ are input together into the multi-temporal cloud detection network based on a twin differential structure to extract differential feature information; that is, the difference between the original image and the low-rank background feature map is compared, and the cloud mask is extracted from the change information. The specific steps are as follows.

Features are extracted from the original image and the low-rank background features through weight-shared convolution and pooling layers, and a differencing operation is applied to the equal-sized feature maps extracted at each layer. Then, starting from the deepest differential feature map, upsampling is performed with deconvolution blocks, and each upsampled feature map is concatenated with the differential feature map of the previous level, so that the original image size is gradually recovered from multi-scale differential semantic features while the deep semantic features and shallow detail information of the network are retained. In addition, a global context block (GCB) is introduced after the three deepest differential feature maps; its main function is to capture relationships between long-distance pixels, breaking the limitation that conventional convolution has only a local receptive field. For an input feature map of size C × H × W, the GCB first applies a 1 × 1 convolution and a softmax function to obtain attention weights and then performs attention pooling to obtain a global context feature of size C × 1 × 1; next, a 1 × 1 convolution performs feature transformation to obtain inter-channel dependence. To reduce the parameter count, the transformation adopts a bottleneck form, which increases optimization difficulty, so layer normalization is added before the ReLU activation function as a regularizer to reduce the optimization difficulty and improve generalization. Finally, the C × 1 × 1 global context feature is aggregated onto the features at every position for global context modeling. The network finally outputs the multi-temporal change feature map $F_t \in \mathbb{R}^{w \times h \times f}$.
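The global context block just described can be sketched as follows; the bottleneck reduction ratio is an assumption:

import tensorflow as tf

class GlobalContextBlock(tf.keras.layers.Layer):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = tf.keras.layers.Conv2D(1, 1)          # attention logits
        self.squeeze = tf.keras.layers.Conv2D(channels // reduction, 1)
        self.norm = tf.keras.layers.LayerNormalization()  # placed before ReLU
        self.excite = tf.keras.layers.Conv2D(channels, 1)

    def call(self, x):
        s = tf.shape(x)
        b, h, w = s[0], s[1], s[2]
        c = x.shape[-1]
        logits = tf.reshape(self.attn(x), (b, h * w, 1))
        weights = tf.nn.softmax(logits, axis=1)           # softmax over positions
        feat = tf.reshape(x, (b, h * w, c))
        context = tf.matmul(weights, feat, transpose_a=True)  # attention pooling
        context = tf.reshape(context, (b, 1, 1, c))       # C x 1 x 1 context
        t = self.squeeze(context)                         # bottleneck transform
        t = self.excite(tf.nn.relu(self.norm(t)))         # LN -> ReLU -> expand
        return x + t               # aggregate the context onto every position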
In step 4, the inputs are the multi-temporal difference feature map $F_t \in \mathbb{R}^{w \times h \times f}$ and the single-scene spatial-spectral feature map $F_s \in \mathbb{R}^{w \times h \times f}$; the final prediction map $P \in \mathbb{R}^{w \times h}$ is obtained through cascaded convolution operations and a sigmoid activation function.
In addition, the method adopts binary cross-entropy loss to optimize all network parameters in steps 1-4; the loss expression is shown in the following formula:

$Loss = -\dfrac{1}{N}\sum_{i=1}^{N}\left[\hat{y}^{(i)}\log p^{(i)} + \left(1-\hat{y}^{(i)}\right)\log\left(1-p^{(i)}\right)\right]$

where N denotes the number of pixels; $\hat{y}^{(i)}$ denotes the true label of the i-th pixel, taking the value 0 or 1, where 0 denotes a non-cloud pixel and 1 a cloud pixel; and $p^{(i)}$ denotes the probability that the i-th pixel predicted by the network is a cloud pixel. The model is saved when the loss is optimized to its lowest state.
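A small NumPy check of this loss (the labels and probabilities below are made up for illustration):

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    p = np.clip(p_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

labels = np.array([1, 1, 0, 0])           # 1 = cloud pixel, 0 = non-cloud
probs  = np.array([0.9, 0.8, 0.3, 0.1])   # predicted cloud probabilities
print(binary_cross_entropy(labels, probs))   # ~0.198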
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network, which addresses the problems that most existing multi-temporal cloud detection methods rely on physical thresholds and lack universality, and that deep-learning-based multi-temporal methods are few and weakly interpretable. Compared with existing remote sensing image cloud detection methods, the proposed method is faster and more accurate, and more robust and stable across different scenes and different cloud coverage conditions. It therefore has not only important academic value but also important practical significance.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a diagram of the single-scene cloud detection network of the present invention;
FIG. 3 is a block diagram of the encoding module in the single-scene cloud detection network of the present invention;
FIG. 4 is a block diagram of the decoding module in the single-scene cloud detection network of the present invention;
FIG. 5 is a block diagram of the low-rank feature extraction module of the present invention;
FIG. 6 is a diagram of the multi-temporal cloud detection network architecture of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more clearly understood, the deep low-rank network cloud detection method according to an embodiment of the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Aiming at the scarcity and weak interpretability of deep-learning-based multi-temporal cloud detection algorithms, the invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network.
The present invention may be implemented using computer software technology. The specific steps of the deep low-rank network cloud detection method of this embodiment are described in detail below with reference to fig. 1.
Step 1, designing a single-scene cloud detection network and, as shown in figs. 2-4, inputting the original image to be detected to extract the spatial-spectral features of clouds;
step 2, designing a low-rank feature extraction module and, as shown in fig. 5, inputting the multi-temporal images into the module, so that the network learns the low-rank background features of the multi-temporal images through a matrix decomposition and reconstruction process;
step 3, designing a multi-temporal cloud detection network and, as shown in fig. 6, inputting the original image and the background features obtained in step 2 simultaneously, extracting their change information with a weight-sharing twin differential structure combined with the idea of non-local attention, and obtaining cloud features based on temporal change;
and step 4, fusing the spatial-spectral features of the cloud from step 1 with the temporal-change-based cloud features from step 3 to predict a cloud mask, training the network according to the loss function until convergence, and optimizing all network parameters to obtain a final prediction map.
In step 1, the original single-scene image is recorded as $X \in \mathbb{R}^{w \times h \times c}$, where w and h denote the length and width of the image and c denotes the number of bands. The input of the single-scene cloud detection network is $X$ and the output is $F_s \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the output feature map. The single-scene cloud detection network mainly consists of a contraction path and an expansion path. The contraction path is formed by stacking six encoding modules. Each encoding module is divided into a left branch and a right branch: the left branch is a feature extraction branch that automatically extracts features by 2D convolution with two 3 × 3 convolution kernels, each convolution layer followed by a ReLU activation function for nonlinear mapping; the right branch is a context-preserving branch that extracts the most important feature information with a 1 × 1 2D convolution and adopts an identity mapping operation to accelerate backward propagation of the gradient flow while retaining low-level feature information. After all features are aggregated, a 2 × 2 max pooling operation finally downsamples the features and reduces network parameters to avoid over-fitting. Through the encoding modules, multi-scale features of the image are extracted automatically during network training. The expansion path is formed by stacking five decoding modules. In a decoding module, the original image size is gradually restored based on one deconvolution operation; two convolution operations then extract feature information fusing low and high layers, and shortcut connections help the network store and exploit the context information learned in early layers, so that the network captures more cloud features while the training process is accelerated by preventing gradient vanishing during back-propagation. Based on the decoding modules, the network fuses multi-level features while training stably and gradually recovers the original image size. In addition, skip connections between the contraction and expansion paths of the single-scene cloud detection network fuse shallow and deep features, which helps generate a more accurate cloud mask. In the encoding and decoding modules, the core convolution and nonlinear activation operations are expressed as formula (1):

$F_{out} = \max(0, K * F_{in} + B)$   (1)

where $F_{in}$ and $F_{out}$ denote the input and output feature maps, K is the convolution kernel, B is the bias matrix, * denotes the convolution operation, and max(0, ·) denotes the nonlinear activation operation.
In this embodiment, the images are Landsat data of size 512 × 512 with 4 bands, namely the red, green, blue, and near-infrared bands. The single-scene cloud detection network finally outputs the spatial-spectral features $F_s \in \mathbb{R}^{512 \times 512 \times f}$.
In step 2, the input of the low-rank feature extraction module is the multi-temporal image stack, recorded as $Y \in \mathbb{R}^{w \times h \times tc}$, where t denotes the number of time phases; the output of the module is a low-rank feature map of the same size as the input, i.e., a low-rank background map with sparse noise removed. First, several 2D convolution layers are stacked to downsample the channel dimension by a factor of 8 (i.e., the rank-reduction ratio of the module is 8), and a dimension transformation yields the coefficient matrix $U \in \mathbb{R}^{wh \times k}$, where k is the upper rank limit. Then, the same convolution and dimension-transformation operations followed by a matrix transposition yield $G \in \mathbb{R}^{k \times wh}$. The original matrix Y likewise passes through a 1 × 1 convolution and dimension transformation to give $D \in \mathbb{R}^{wh \times tc}$, and multiplying $G$ and $D$ yields the basis matrix $V = GD \in \mathbb{R}^{k \times tc}$. Considering the unboundedness of convolution outputs, the module normalizes $U^{(i)} = \{u^{(i,1)}, \ldots, u^{(i,wh)}\}$ and $V^{(i)} = \{v^{(i,1)}, \ldots, v^{(i,k)}\}$, which benefits the stability of network training. To preserve the semantic meaning of $v^{(i,j)}$ as a basis, $\ell_2$-norm normalization retains its directionality, as shown by the following equation:

$\hat{v}^{(i,j)} = \dfrac{v^{(i,j)}}{\lVert v^{(i,j)} \rVert_2 + \varepsilon}$

where ε is set to $10^{-6}$ to avoid a zero divisor. The coefficient vectors $u^{(i,j)}$ are normalized with the softmax function, yielding a corresponding probability for each basis, as shown below:

$\hat{u}^{(i,j)}_m = \dfrac{\exp(u^{(i,j)}_m)}{\sum_{n=1}^{k} \exp(u^{(i,j)}_n)}$

Finally, the low-rank features are obtained as the product of $\hat{U}$ and $\hat{V}$. After a matrix dimension change of the low-rank features, the reconstructed low-rank feature map is obtained, and a final 1 × 1 convolution produces the reconstructed clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{w \times h \times tc}$.
The multi-temporal images in this embodiment include 16 time phases, i.e., t = 16. For an input image block $Y \in \mathbb{R}^{512 \times 512 \times 64}$, the rank-reduction operation sets the rank to k = 8. In addition, ε is set to $10^{-6}$ to avoid a zero divisor. The finally output clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{512 \times 512 \times 64}$ are consistent with the original image dimensions.
In step 3, the low-rank background feature map $B_{lr}$ obtained by the low-rank feature extraction module and the feature map of the original image $X$ are input together into the multi-temporal cloud detection network based on a twin differential structure to extract differential feature information; that is, the difference between the original image and the low-rank background feature map is compared, and the cloud mask is extracted from the change information. The specific steps are as follows.

Features are extracted from the original image and the low-rank background features through weight-shared convolution and pooling layers, and a differencing operation is applied to the equal-sized feature maps extracted at each layer. Then, starting from the deepest differential feature map, upsampling is performed with deconvolution blocks, and each upsampled feature map is concatenated with the differential feature map of the previous level, so that the original image size is gradually recovered from multi-scale differential semantic features while the deep semantic features and shallow detail information of the network are retained. In addition, a global context block (GCB) is introduced after the three deepest differential feature maps; its main function is to capture relationships between long-distance pixels, breaking the limitation that conventional convolution has only a local receptive field. For an input feature map of size C × H × W, the GCB first applies a 1 × 1 convolution and a softmax function to obtain attention weights and then performs attention pooling to obtain a global context feature of size C × 1 × 1; next, a 1 × 1 convolution performs feature transformation to obtain inter-channel dependence. To reduce the parameter count, the transformation adopts a bottleneck form, which increases optimization difficulty, so layer normalization is added before the ReLU activation function as a regularizer to reduce the optimization difficulty and improve generalization. Finally, the C × 1 × 1 global context feature is aggregated onto the features at every position for global context modeling. The network finally outputs the multi-temporal change feature map $F_t \in \mathbb{R}^{w \times h \times f}$.
In this embodiment, the original image $X$ and the background features $B_{lr}$ are input simultaneously, feature differencing is performed through the weight sharing of the twin network, and the finally output multi-temporal change feature map $F_t$ keeps the same dimensions as the single-scene spatial-spectral features $F_s$.
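The weight-shared differencing can be sketched as follows; the number of levels and the filter widths are assumptions, and both inputs must carry the same channel count so that the shared layers apply to each:

import tensorflow as tf
from tensorflow.keras import layers

def siamese_difference_features(x, background, widths=(32, 64, 128)):
    diffs = []
    for width in widths:
        conv = layers.Conv2D(width, 3, padding='same', activation='relu')
        pool = layers.MaxPooling2D(2)
        x = conv(x)                      # the SAME layer object processes both
        background = conv(background)    # inputs, so the weights are shared
        diffs.append(layers.Subtract()([x, background]))  # per-level difference
        x, background = pool(x), pool(background)
    return diffs    # multi-scale differential feature maps, deepest last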
In step 4, the inputs are the multi-temporal difference features $F_t$ and the single-scene spatial-spectral features $F_s$; the final prediction map $P$ is obtained through cascaded convolution operations and a sigmoid activation function.

In addition, the method adopts binary cross-entropy loss to optimize all network parameters in steps 1-4; the loss expression is shown in the following formula:

$Loss = -\dfrac{1}{N}\sum_{i=1}^{N}\left[\hat{y}^{(i)}\log p^{(i)} + \left(1-\hat{y}^{(i)}\right)\log\left(1-p^{(i)}\right)\right]$

where N denotes the number of pixels; $\hat{y}^{(i)}$ denotes the true label of the i-th pixel, taking the value 0 or 1, where 0 denotes a non-cloud pixel and 1 a cloud pixel; and $p^{(i)}$ denotes the probability that the i-th pixel predicted by the network is a cloud pixel. The model is saved when the loss is optimized to its lowest state.
The input features in this embodiment are $F_s \in \mathbb{R}^{512 \times 512 \times f}$ and $F_t \in \mathbb{R}^{512 \times 512 \times f}$, and the finally output binary prediction map is $P \in \mathbb{R}^{512 \times 512}$.
The whole network is trained under the Keras deep learning framework. The batch size during training is set to 4, the optimizer is the Adam gradient descent method, and the initial learning rate of model training is set to 1e-4. An adaptive learning-rate decay strategy is adopted during training, with a decay rate of 0.7 and a patience factor of 15, and the strategy continues until the learning rate reaches 1e-8. In practice, those skilled in the art can adjust the hyper-parameters of the network according to the specific images used.
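This training setup maps onto Keras roughly as follows; `model`, `train_ds`, and `val_ds` are assumed to exist (with the batch size of 4 set when building `train_ds`), and the epoch count is arbitrary:

import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy')
# adaptive decay: multiply the learning rate by 0.7 after 15 stagnant
# epochs (the patience factor), never dropping below 1e-8
decay = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.7,
                                             patience=15, min_lr=1e-8)
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[decay])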
The embodiment of the invention adopts Landsat remote sensing images, but is not limited to them. The method is broadly applicable to other multi-temporal optical remote sensing images, regardless of the intensity of cloud contamination, the spatial resolution, or the number of bands, and is little constrained by objective factors. Landsat-8 experiments show that the overall accuracy of the method reaches 96.4% with a kappa coefficient of 0.93; compared with classical cloud detection algorithms such as Fmask, the false detection rate over bright surfaces is effectively reduced, the cloud detection accuracy is improved, and multi-scene universal intelligent cloud detection is realized.
It is to be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Accordingly, the scope of the claimed subject matter is not limited by any of the specific exemplary teachings provided.

Claims (8)

1. A multi-temporal remote sensing image cloud detection method based on a deep low-rank network, characterized by comprising the following steps:
step 1, extracting the spatial-spectral features of clouds in an original image to be detected;
step 2, extracting the low-rank background features of the multi-temporal images;
step 3, simultaneously inputting the original image and the background features obtained in step 2 into a multi-temporal cloud detection network, extracting their change information with a weight-sharing twin differential structure combined with the idea of non-local attention, and obtaining cloud features based on temporal change, namely a multi-temporal change feature map;
and step 4, fusing the spatial-spectral features of the cloud from step 1 with the temporal-change-based cloud features from step 3 to predict a cloud mask, training the network according to the loss function until convergence, and optimizing all network parameters to obtain a final prediction map.
2. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: in step 1, a single-scene cloud detection network is designed to extract the spatial-spectral features of clouds in the original image to be detected, implemented as follows;
record the given original single-scene image as $X \in \mathbb{R}^{w \times h \times c}$, where w and h denote the length and width of the image and c denotes the number of bands; the input of the single-scene cloud detection network is $X$ and the output is $F_s \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the output feature map;
the single-scene cloud detection network consists of a contraction path and an expansion path; the contraction path is formed by stacking six encoding modules, each divided into a left branch and a right branch: the left branch is a feature extraction branch that automatically extracts features by 2D convolution with two 3 × 3 convolution kernels, each convolution layer followed by a ReLU activation function for nonlinear mapping; the right branch is a context-preserving branch that extracts the most important feature information with a 1 × 1 2D convolution and adopts an identity mapping operation to accelerate backward propagation of the gradient flow while retaining low-level feature information; after all features are aggregated, a 2 × 2 max pooling operation finally downsamples the features and reduces network parameters to avoid over-fitting; multi-scale features of the image are extracted automatically through the encoding modules;
the expansion path is formed by stacking five decoding modules; in a decoding module, the original image size is gradually restored based on one deconvolution operation, two convolution operations then extract feature information fusing low and high layers, and shortcut connections help the network store and exploit the context information learned in early layers, so that the network captures more cloud features while the training process is accelerated by preventing gradient vanishing during back-propagation; based on the decoding modules, the network fuses multi-level features while training stably and gradually recovers the original image size.
3. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 2, characterized in that: skip connections between the contraction path and the expansion path of the single-scene cloud detection network fuse shallow and deep features, which helps generate a more accurate cloud mask; in the encoding and decoding modules, the core convolution and nonlinear activation operations are expressed as formula (1):

$F_{out} = \max(0, K * F_{in} + B)$   (1)

where $F_{in}$ and $F_{out}$ denote the input and output feature maps, K is the convolution kernel, B is the bias matrix, * denotes the convolution operation, and max(0, ·) denotes the nonlinear activation operation.
4. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: in step 2, a low-rank feature extraction module is designed to extract the low-rank background features of the multi-temporal images; the input of the module is the multi-temporal image stack, recorded as $Y \in \mathbb{R}^{w \times h \times tc}$, where w and h denote the length and width of the image, c denotes the number of bands, and t denotes the number of time phases; the output of the module is a low-rank feature map of the same size as the input, i.e., a low-rank background map with sparse noise removed; the processing procedure of the low-rank feature extraction module is as follows;
first, several 2D convolution layers are stacked to downsample the channel dimension by a factor of 8, i.e., the rank-reduction ratio of the module is 8, and a dimension transformation yields the coefficient matrix $U \in \mathbb{R}^{wh \times k}$, where k is the upper rank limit; then, the same convolution and dimension-transformation operations followed by a matrix transposition yield $G \in \mathbb{R}^{k \times wh}$; the original matrix Y likewise passes through a 1 × 1 convolution and dimension transformation to give $D \in \mathbb{R}^{wh \times tc}$, and multiplying $G$ and $D$ yields the basis matrix $V = GD \in \mathbb{R}^{k \times tc}$; considering the unboundedness of convolution outputs, the low-rank feature extraction module normalizes $U^{(i)} = \{u^{(i,1)}, \ldots, u^{(i,wh)}\}$ and $V^{(i)} = \{v^{(i,1)}, \ldots, v^{(i,k)}\}$ to obtain $\hat{U}$ and $\hat{V}$ respectively, which benefits the stability of network training; finally, the low-rank features are obtained as the product of $\hat{U}$ and $\hat{V}$; after a matrix dimension change of the low-rank features, the reconstructed low-rank feature map is obtained, and a final 1 × 1 convolution produces the reconstructed clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{w \times h \times tc}$.
5. The multi-temporal remote sensing image cloud detection method based on the deep low-rank network according to claim 4, characterized in that: to preserve the semantic meaning of $v^{(i,j)}$ as a basis, $\ell_2$-norm normalization retains its directionality, as shown by the following equation:

$\hat{v}^{(i,j)} = \dfrac{v^{(i,j)}}{\lVert v^{(i,j)} \rVert_2 + \varepsilon}$

where ε is set to $10^{-6}$ to avoid a zero divisor; the coefficient vectors $u^{(i,j)}$ are normalized with the softmax function, yielding a corresponding probability for each basis, as shown below:

$\hat{u}^{(i,j)}_m = \dfrac{\exp(u^{(i,j)}_m)}{\sum_{n=1}^{k} \exp(u^{(i,j)}_n)}$
6. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: in step 3, the low-rank background feature map $B_{lr} \in \mathbb{R}^{w \times h \times tc}$ obtained by the low-rank feature extraction module and the feature map of the original image $X \in \mathbb{R}^{w \times h \times c}$ are input together into the multi-temporal cloud detection network based on a twin differential structure to extract differential feature information, i.e., the difference between the original image and the low-rank background feature map is compared and the cloud mask is extracted from the change information, where w and h denote the length and width of the image and c denotes the number of bands; the specific steps are as follows:
features are extracted from the original image and the low-rank background features through weight-shared convolution and pooling layers, and a differencing operation is applied to the equal-sized feature maps extracted at each layer; then, starting from the deepest differential feature map, upsampling is performed with deconvolution blocks and each upsampled feature map is concatenated with the differential feature map of the previous level, so that the original image size is gradually recovered from multi-scale differential semantic features while the deep semantic features and the shallow detail information of the network are retained;
a global context block (GCB) is introduced after the three deepest differential feature maps to capture relationships between long-distance pixels, breaking the limitation that conventional convolution has only a local receptive field; for an input feature map of size C × H × W, the GCB first applies a 1 × 1 convolution and a softmax function to obtain attention weights and then performs attention pooling to obtain a global context feature of size C × 1 × 1; next, a 1 × 1 convolution performs feature transformation to obtain inter-channel dependence; to reduce the parameter count, the transformation adopts a bottleneck form, which increases optimization difficulty, so layer normalization is added before the ReLU activation function as a regularizer to reduce the optimization difficulty and improve generalization; finally, the C × 1 × 1 global context feature is aggregated onto the features at every position for global context modeling; the multi-temporal cloud detection network finally outputs the multi-temporal change feature map $F_t \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the feature map.
7. The multi-temporal remote sensing image cloud detection method based on the deep low-rank network according to claim 1, characterized in that: in step 4, the spatial-spectral features of the cloud from step 1 and the temporal-change-based cloud features from step 3 are fused, and the final prediction map $P \in \mathbb{R}^{w \times h}$ is finally obtained through cascaded convolution operations and a sigmoid activation function, where w and h denote the length and width of the image.
8. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: binary cross-entropy loss is adopted to optimize all network parameters in steps 1-4, with the loss expression shown in the following formula:

$Loss = -\dfrac{1}{N}\sum_{i=1}^{N}\left[\hat{y}^{(i)}\log p^{(i)} + \left(1-\hat{y}^{(i)}\right)\log\left(1-p^{(i)}\right)\right]$

where N denotes the number of pixels; $\hat{y}^{(i)}$ denotes the true label of the i-th pixel, taking the value 0 or 1, where 0 denotes a non-cloud pixel and 1 a cloud pixel; and $p^{(i)}$ denotes the probability that the i-th pixel predicted by the network is a cloud pixel; the model is saved when the loss is optimized to its lowest state.
CN202210729041.9A (priority date 2022-06-24; filing date 2022-06-24) Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network — Pending — CN115131674A (en)

Priority Applications (1)

Application Number: CN202210729041.9A
Priority Date / Filing Date: 2022-06-24
Title: Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network

Publications (1)

Publication Number: CN115131674A
Publication Date: 2022-09-30

Family: ID=83380410
Country: CN — CN115131674A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731243A (en) * 2022-11-29 2023-03-03 北京长木谷医疗科技有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN115731243B (en) * 2022-11-29 2024-02-09 北京长木谷医疗科技股份有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN116245757A (en) * 2023-02-08 2023-06-09 北京艾尔思时代科技有限公司 Multi-scene universal remote sensing image cloud restoration method and system for multi-mode data
CN116245757B (en) * 2023-02-08 2023-09-19 北京艾尔思时代科技有限公司 Multi-scene universal remote sensing image cloud restoration method and system for multi-mode data
CN115984714A (en) * 2023-03-21 2023-04-18 山东科技大学 Cloud detection method based on double-branch network model
CN115984714B (en) * 2023-03-21 2023-05-23 山东科技大学 Cloud detection method based on dual-branch network model
CN116612333A (en) * 2023-07-17 2023-08-18 山东大学 Medical hyperspectral image classification method based on rapid full convolution network
CN116612333B (en) * 2023-07-17 2023-09-29 山东大学 Medical hyperspectral image classification method based on rapid full convolution network
CN117372702A (en) * 2023-12-08 2024-01-09 江西师范大学 Cloud layer removing method and device combining self-supervision deep learning and model method
CN117372702B (en) * 2023-12-08 2024-02-06 江西师范大学 Cloud layer removing method and device combining self-supervision deep learning and model method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination