CN115131674A - Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network

Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network

Info

Publication number: CN115131674A
Authority: CN (China)
Prior art keywords: rank, network, cloud, low, temporal
Legal status: Pending
Application number: CN202210729041.9A
Other languages: Chinese (zh)
Inventors: Zhang Hongyan (张洪艳), Huang Qi (黄琪), Xia Yu (夏宇), Zhang Liangpei (张良培)
Current Assignee: Wuhan University (WHU)
Original Assignee: Wuhan University (WHU)
Application filed by Wuhan University; priority to CN202210729041.9A, published as CN115131674A

Classifications

    • G06V 20/13 — Scenes; scene-specific elements; terrestrial scenes; satellite images
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/806 — Pattern recognition or machine learning; fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 20/70 — Labelling scene content, e.g. deriving syntactic or semantic representations


Abstract

The invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network. Exploiting the spatial-temporal-spectral low-rank property of multi-temporal remote sensing images, the method learns and reconstructs clear-sky cloud-free background features in a data-driven manner. A feature extraction network is built in stages by combining the ideas of twin differencing and non-local attention; single-scene spatial-spectral features and multi-temporal difference features are extracted and fused respectively, and cloud coverage detection for multi-temporal remote sensing images is realized intelligently, successfully addressing the scarcity and weak interpretability of existing deep-learning-based multi-temporal cloud detection algorithms. The deep low-rank network model is applied to multi-temporal Landsat remote sensing image cloud detection; Landsat-8 experiments show that, compared with existing remote sensing image cloud detection methods, the proposed method is faster and more accurate and remains more stable across different scenes and different cloud coverage conditions.

Description

Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network
Technical Field
The invention relates to the field of multi-temporal optical remote sensing image cloud detection, and in particular to an automatic cloud detection method for multi-temporal optical remote sensing images based on a deep low-rank network, which combines the low-rank physical prior of multi-temporal images with the feature expression capability of neural networks to realize intelligent cloud coverage detection for multi-temporal remote sensing images.
Background
Optical remote sensing imagery is an essential means of earth resource detection, ecological environment monitoring, and the like. It offers short imaging periods and wide coverage, provides rich ground-feature information, visually reflects the spatial distribution of ground features, and records their reflection spectra. It is widely applied in environment monitoring, geological survey, urban planning, disaster assessment, and other fields, providing effective support for major strategic decisions.
Common optical remote sensing satellites include the Landsat series, the Sentinel series, the domestic GF series, QuickBird, MODIS, and Hyperion; existing optical satellites come in a wide variety. However, global cloud data published by the International Satellite Cloud Climatology Project (ISCCP) show that clouds cover about 66% of the earth's surface, so cloud contamination in optical remote sensing images is an unavoidable and serious problem. Developing a high-precision remote sensing image cloud detection method therefore helps restore image quality and improve image utilization, and is of great significance and value for the subsequent development and application of remote sensing imagery in the aspects of population, environment, and geography.
Aiming at the problem of optical remote sensing image cloud detection, scholars at home and abroad have carried out a great deal of research and developed various classic cloud detection algorithms. By the number of images used, these can be divided into single-scene and multi-temporal cloud detection algorithms. Single-scene algorithms are currently the most studied and most widely applied; they build cloud detection models from different hypothesis models and prior knowledge, combining the physical mechanism of cloud formation with statistical characteristics. In contrast, the basic principle of multi-temporal algorithms is to comprehensively exploit the spatial-temporal-spectral features of remote sensing images and detect clouds from the difference information between multi-temporal images: given a cloud-free background, cloud pixels can be identified by the difference between the actual observation and the cloud-free value. Multi-temporal algorithms introduce time-dimension information on top of a single scene and can generally use images of the same location at different times to generate a more accurate cloud mask.
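As a toy illustration of this difference-based principle (not taken from the patent; the arrays and the 0.2 threshold are made-up values), a pixel whose observed reflectance rises well above its cloud-free background value is flagged as cloud:

import numpy as np

# target-date reflectance of four pixels vs. their cloud-free background
observed   = np.array([0.12, 0.55, 0.10, 0.48])
background = np.array([0.11, 0.13, 0.09, 0.12])
cloud_mask = (observed - background) > 0.2   # illustrative threshold only
print(cloud_mask)   # [False  True False  True]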
Existing multi-temporal cloud detection algorithms have developed considerably in recent years. First, an iterative haze optimized transformation algorithm (IHOT) improved from the HOT algorithm was proposed; by introducing a cloud-free reference image for regression iteration, it successfully overcomes the confusion between bright surfaces and clouds. Second, an automatic time series analysis algorithm (ATSA) was proposed for multi-temporal optical image cloud detection; it detects cloud pixels through a time-series cloud index and suits regions with few cloud-free observations. Third, a multi-temporal integrated cloud z-score algorithm (MTICZ) was proposed for cloud and cloud shadow detection; it locates clouds by analyzing the target image and the time series of cloud indices. Finally, a model combining a U-Net deep convolutional neural network with a long short-term memory (LSTM) network was proposed for time-series MSG/SEVIRI image cloud detection and achieved high-precision results.
Although much research has been devoted to multi-temporal optical remote sensing image cloud detection, critical problems remain unsolved, mainly: (1) traditional multi-temporal cloud detection algorithms depend on specific thresholds, and most lack generality across different underlying surfaces and cloud types; (2) deep-learning multi-temporal cloud detection algorithms are underdeveloped, do not consider the intrinsic physical characteristics of multi-temporal images, and are weakly interpretable.
Therefore, it is highly significant to propose a multi-temporal remote sensing image cloud detection method based on a deep low-rank network that accounts for the spatial-temporal-spectral low-rank property of multi-temporal images.
Disclosure of Invention
The invention aims to remedy the scarcity and poor interpretability of existing deep-learning-based multi-temporal cloud detection algorithms, and provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network.
The technical scheme of the invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network, which comprises the following steps:
step 1, extracting the spatial-spectral features of clouds in an original image to be detected;
step 2, extracting the low-rank background features of the multi-temporal images;
step 3, simultaneously inputting the original image and the background features obtained in step 2 into a multi-temporal cloud detection network, extracting their change information with a weight-sharing twin differential structure combined with the idea of non-local attention, and obtaining cloud features based on temporal change, namely a multi-temporal change feature map;
and step 4, fusing the spatial-spectral features of the cloud from step 1 with the temporal-change-based cloud features from step 3 to predict a cloud mask, training the network according to the loss function until convergence, and optimizing all network parameters to obtain a final prediction map. How the four steps connect is sketched below.
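The following Keras sketch shows how the four steps could be wired into one trainable model. It is a structural skeleton under assumed input shapes, with the three sub-networks of steps 1-3 passed in as placeholders; it is not the patent's actual implementation:

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cloud_detector(single_scene_net, low_rank_module, siamese_net,
                         image_shape=(512, 512, 4),
                         stack_shape=(512, 512, 64)):
    x = layers.Input(image_shape)   # step 1 input: original image
    y = layers.Input(stack_shape)   # step 2 input: multi-temporal stack
    f_s = single_scene_net(x)                  # step 1: spatial-spectral features
    background = low_rank_module(y)            # step 2: clear-sky background
    # step 3: temporal-change features; the background is assumed to be
    # channel-compatible with whatever siamese_net expects
    f_t = siamese_net([x, background])
    fused = layers.Concatenate()([f_s, f_t])   # step 4: fuse both cues
    fused = layers.Conv2D(32, 3, padding='same', activation='relu')(fused)
    mask = layers.Conv2D(1, 1, activation='sigmoid')(fused)  # cloud mask
    return Model([x, y], mask)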
Moreover, in step 1, the spatial-spectral features of clouds in the original image to be detected are extracted by designing a single-scene cloud detection network, implemented as follows.

Record a given original single-scene image as $X \in \mathbb{R}^{w \times h \times c}$, where w and h denote the length and width of the image and c denotes the number of bands. The input of the single-scene cloud detection network is $X$ and the output is $F_s \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the output feature map. The single-scene cloud detection network mainly consists of a contraction path and an expansion path. The contraction path is formed by stacking six encoding modules. Each encoding module is divided into a left branch and a right branch: the left branch is a feature extraction branch that automatically extracts features by 2D convolution with two 3 × 3 convolution kernels, each convolution layer followed by a ReLU activation function for nonlinear mapping; the right branch is a context-preserving branch that extracts the most important feature information with a 1 × 1 2D convolution and adopts an identity mapping operation to accelerate backward propagation of the gradient flow while retaining low-level feature information. After all features are aggregated, a 2 × 2 max pooling operation finally downsamples the features and reduces network parameters to avoid over-fitting. Through the encoding modules, multi-scale features of the image are extracted automatically during network training. The expansion path is formed by stacking five decoding modules. In a decoding module, the original image size is gradually restored based on one deconvolution operation; two convolution operations then extract feature information fusing low and high layers, and shortcut connections help the network store and exploit the context information learned in early layers, so that the network captures more cloud features while the training process is accelerated by preventing gradient vanishing during back-propagation. Based on the decoding modules, the network fuses multi-level features while training stably and gradually recovers the original image size. In addition, skip connections between the contraction and expansion paths of the single-scene cloud detection network fuse shallow and deep features, which helps generate a more accurate cloud mask. In the encoding and decoding modules, the core convolution and nonlinear activation operations are expressed as formula (1):

$F_{out} = \max(0, K * F_{in} + B)$   (1)

where $F_{in}$ and $F_{out}$ denote the input and output feature maps, K is the convolution kernel, B is the bias matrix, * denotes the convolution operation, and max(0, ·) denotes the nonlinear activation operation.
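A minimal Keras sketch of one encoding module and one decoding module as described above; the filter widths and the additive aggregation of the two branches are assumptions the patent text does not fix:

import tensorflow as tf
from tensorflow.keras import layers

def encoding_module(x, filters):
    # left branch: two 3x3 convolutions, each followed by ReLU
    left = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    left = layers.Conv2D(filters, 3, padding='same', activation='relu')(left)
    # right branch: 1x1 convolution keeps the most salient low-level context
    # and acts as an identity-style shortcut for the gradient flow
    right = layers.Conv2D(filters, 1, padding='same')(x)
    merged = layers.Add()([left, right])      # aggregate both branches
    down = layers.MaxPooling2D(2)(merged)     # 2x2 max pooling downsampling
    return down, merged                       # `merged` feeds the skip connection

def decoding_module(x, skip, filters):
    # one deconvolution gradually restores the spatial size
    up = layers.Conv2DTranspose(filters, 2, strides=2, padding='same')(x)
    x = layers.Concatenate()([up, skip])      # shortcut to early-layer context
    # two convolutions fuse low-layer and high-layer feature information
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x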
In step 2, a low-rank feature extraction module is designed to extract the low-rank background features of the multi-temporal images. The input of the module is the multi-temporal image stack, recorded as $Y \in \mathbb{R}^{w \times h \times tc}$, where t denotes the number of time phases; the output is a low-rank feature map of the same size as the input, i.e., a low-rank background map with sparse noise removed. First, several 2D convolution layers are stacked to downsample the channel dimension by a factor of 8 (i.e., the rank-reduction ratio of the module is 8), and a dimension transformation yields the coefficient matrix $U \in \mathbb{R}^{wh \times k}$, where k is the upper rank limit. Then, the same convolution and dimension-transformation operations followed by a matrix transposition yield $G \in \mathbb{R}^{k \times wh}$. The original matrix Y likewise passes through a 1 × 1 convolution and dimension transformation to give $D \in \mathbb{R}^{wh \times tc}$, and multiplying $G$ and $D$ yields the basis matrix $V = GD \in \mathbb{R}^{k \times tc}$. Considering the unboundedness of convolution outputs, the module normalizes $U^{(i)} = \{u^{(i,1)}, \ldots, u^{(i,wh)}\}$ and $V^{(i)} = \{v^{(i,1)}, \ldots, v^{(i,k)}\}$, which benefits the stability of network training. To preserve the semantic meaning of $v^{(i,j)}$ as a basis, $\ell_2$-norm normalization retains its directionality, as shown by the following equation:

$\hat{v}^{(i,j)} = \dfrac{v^{(i,j)}}{\lVert v^{(i,j)} \rVert_2 + \varepsilon}$

where ε is set to $10^{-6}$ to avoid a zero divisor. The coefficient vectors $u^{(i,j)}$ are normalized with the softmax function, yielding a corresponding probability for each basis, as shown below:

$\hat{u}^{(i,j)}_m = \dfrac{\exp(u^{(i,j)}_m)}{\sum_{n=1}^{k} \exp(u^{(i,j)}_n)}$

Finally, the low-rank features are obtained as the product of $\hat{U}$ and $\hat{V}$. After a matrix dimension change of the low-rank features, the reconstructed low-rank feature map is obtained, and a final 1 × 1 convolution produces the reconstructed clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{w \times h \times tc}$.
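A schematic TensorFlow implementation of this decomposition-and-reconstruction process; collapsing the stacked rank-reduction convolutions into single 1 × 1 convolutions and the exact wiring of the three branches are simplifying assumptions:

import tensorflow as tf

class LowRankFeatureExtractor(tf.keras.layers.Layer):
    def __init__(self, channels, rank=8, eps=1e-6):
        super().__init__()
        self.rank, self.eps = rank, eps
        self.coef_conv = tf.keras.layers.Conv2D(rank, 1)      # coefficient branch U
        self.base_conv = tf.keras.layers.Conv2D(rank, 1)      # transposed branch G
        self.feat_conv = tf.keras.layers.Conv2D(channels, 1)  # feature branch D
        self.out_conv = tf.keras.layers.Conv2D(channels, 1)

    def call(self, y):
        s = tf.shape(y)
        b, h, w = s[0], s[1], s[2]
        c = y.shape[-1]
        u = tf.reshape(self.coef_conv(y), (b, h * w, self.rank))
        g = tf.reshape(self.base_conv(y), (b, h * w, self.rank))
        d = tf.reshape(self.feat_conv(y), (b, h * w, c))
        v = tf.matmul(g, d, transpose_a=True)       # bases V = G^T D, (b, k, c)
        u = tf.nn.softmax(u, axis=-1)               # a probability per basis
        v = tf.math.l2_normalize(v, axis=-1, epsilon=self.eps)  # keep direction
        low_rank = tf.matmul(u, v)                  # reconstruction, (b, wh, c)
        low_rank = tf.reshape(low_rank, (b, h, w, c))
        return self.out_conv(low_rank)              # clear-sky background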
In step 3, the low-rank background feature map $B_{lr} \in \mathbb{R}^{w \times h \times tc}$ obtained by the low-rank feature extraction module and the original image $X \in \mathbb{R}^{w \times h \times c}$ are input together into the multi-temporal cloud detection network based on a twin differential structure to extract differential feature information; that is, the difference between the original image and the low-rank background feature map is compared, and the cloud mask is extracted from the change information. The specific steps are as follows.

Features are extracted from the original image and the low-rank background features through weight-shared convolution and pooling layers, and a differencing operation is applied to the equal-sized feature maps extracted at each layer. Then, starting from the deepest differential feature map, upsampling is performed with deconvolution blocks, and each upsampled feature map is concatenated with the differential feature map of the previous level, so that the original image size is gradually recovered from multi-scale differential semantic features while the deep semantic features and shallow detail information of the network are retained. In addition, a global context block (GCB) is introduced after the three deepest differential feature maps; its main function is to capture relationships between long-distance pixels, breaking the limitation that conventional convolution has only a local receptive field. For an input feature map of size C × H × W, the GCB first applies a 1 × 1 convolution and a softmax function to obtain attention weights and then performs attention pooling to obtain a global context feature of size C × 1 × 1; next, a 1 × 1 convolution performs feature transformation to obtain inter-channel dependence. To reduce the parameter count, the transformation adopts a bottleneck form, which increases optimization difficulty, so layer normalization is added before the ReLU activation function as a regularizer to reduce the optimization difficulty and improve generalization. Finally, the C × 1 × 1 global context feature is aggregated onto the features at every position for global context modeling. The network finally outputs the multi-temporal change feature map $F_t \in \mathbb{R}^{w \times h \times f}$.
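The global context block just described can be sketched as follows; the bottleneck reduction ratio is an assumption:

import tensorflow as tf

class GlobalContextBlock(tf.keras.layers.Layer):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = tf.keras.layers.Conv2D(1, 1)          # attention logits
        self.squeeze = tf.keras.layers.Conv2D(channels // reduction, 1)
        self.norm = tf.keras.layers.LayerNormalization()  # placed before ReLU
        self.excite = tf.keras.layers.Conv2D(channels, 1)

    def call(self, x):
        s = tf.shape(x)
        b, h, w = s[0], s[1], s[2]
        c = x.shape[-1]
        logits = tf.reshape(self.attn(x), (b, h * w, 1))
        weights = tf.nn.softmax(logits, axis=1)           # softmax over positions
        feat = tf.reshape(x, (b, h * w, c))
        context = tf.matmul(weights, feat, transpose_a=True)  # attention pooling
        context = tf.reshape(context, (b, 1, 1, c))       # C x 1 x 1 context
        t = self.squeeze(context)                         # bottleneck transform
        t = self.excite(tf.nn.relu(self.norm(t)))         # LN -> ReLU -> expand
        return x + t               # aggregate the context onto every position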
In step 4, the inputs are the multi-temporal difference feature map $F_t \in \mathbb{R}^{w \times h \times f}$ and the single-scene spatial-spectral feature map $F_s \in \mathbb{R}^{w \times h \times f}$; the final prediction map $P \in \mathbb{R}^{w \times h}$ is obtained through cascaded convolution operations and a sigmoid activation function.
In addition, the method adopts binary cross-entropy loss to optimize all network parameters in steps 1-4; the loss expression is shown in the following formula:

$Loss = -\dfrac{1}{N}\sum_{i=1}^{N}\left[\hat{y}^{(i)}\log p^{(i)} + \left(1-\hat{y}^{(i)}\right)\log\left(1-p^{(i)}\right)\right]$

where N denotes the number of pixels; $\hat{y}^{(i)}$ denotes the true label of the i-th pixel, taking the value 0 or 1, where 0 denotes a non-cloud pixel and 1 a cloud pixel; and $p^{(i)}$ denotes the probability that the i-th pixel predicted by the network is a cloud pixel. The model is saved when the loss is optimized to its lowest state.
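A small NumPy check of this loss (the labels and probabilities below are made up for illustration):

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    p = np.clip(p_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

labels = np.array([1, 1, 0, 0])           # 1 = cloud pixel, 0 = non-cloud
probs  = np.array([0.9, 0.8, 0.3, 0.1])   # predicted cloud probabilities
print(binary_cross_entropy(labels, probs))   # ~0.198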
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network, which addresses the problems that most existing multi-temporal cloud detection methods rely on physical thresholds and lack universality, and that deep-learning-based multi-temporal methods are few and weakly interpretable. Compared with existing remote sensing image cloud detection methods, the proposed method is faster and more accurate, and more robust and stable across different scenes and different cloud coverage conditions. It therefore has not only important academic value but also important practical significance.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a diagram of the single-scene cloud detection network of the present invention;
FIG. 3 is a block diagram of the encoding module in the single-scene cloud detection network of the present invention;
FIG. 4 is a block diagram of the decoding module in the single-scene cloud detection network of the present invention;
FIG. 5 is a block diagram of the low-rank feature extraction module of the present invention;
FIG. 6 is a diagram of the multi-temporal cloud detection network architecture of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention more clearly understood, the deep low-rank network cloud detection method according to an embodiment of the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Aiming at the scarcity and weak interpretability of deep-learning-based multi-temporal cloud detection algorithms, the invention provides a multi-temporal remote sensing image cloud detection method based on a deep low-rank network.
The present invention may be implemented using computer software technology. The specific steps of the deep low-rank network cloud detection method of this embodiment are described in detail below with reference to fig. 1.
Step 1, designing a single-scene cloud detection network and, as shown in figs. 2-4, inputting the original image to be detected to extract the spatial-spectral features of clouds;
step 2, designing a low-rank feature extraction module and, as shown in fig. 5, inputting the multi-temporal images into the module, so that the network learns the low-rank background features of the multi-temporal images through a matrix decomposition and reconstruction process;
step 3, designing a multi-temporal cloud detection network and, as shown in fig. 6, inputting the original image and the background features obtained in step 2 simultaneously, extracting their change information with a weight-sharing twin differential structure combined with the idea of non-local attention, and obtaining cloud features based on temporal change;
and step 4, fusing the spatial-spectral features of the cloud from step 1 with the temporal-change-based cloud features from step 3 to predict a cloud mask, training the network according to the loss function until convergence, and optimizing all network parameters to obtain a final prediction map.
In step 1, the original single-scene image is recorded as $X \in \mathbb{R}^{w \times h \times c}$, where w and h denote the length and width of the image and c denotes the number of bands. The input of the single-scene cloud detection network is $X$ and the output is $F_s \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the output feature map. The single-scene cloud detection network mainly consists of a contraction path and an expansion path. The contraction path is formed by stacking six encoding modules. Each encoding module is divided into a left branch and a right branch: the left branch is a feature extraction branch that automatically extracts features by 2D convolution with two 3 × 3 convolution kernels, each convolution layer followed by a ReLU activation function for nonlinear mapping; the right branch is a context-preserving branch that extracts the most important feature information with a 1 × 1 2D convolution and adopts an identity mapping operation to accelerate backward propagation of the gradient flow while retaining low-level feature information. After all features are aggregated, a 2 × 2 max pooling operation finally downsamples the features and reduces network parameters to avoid over-fitting. Through the encoding modules, multi-scale features of the image are extracted automatically during network training. The expansion path is formed by stacking five decoding modules. In a decoding module, the original image size is gradually restored based on one deconvolution operation; two convolution operations then extract feature information fusing low and high layers, and shortcut connections help the network store and exploit the context information learned in early layers, so that the network captures more cloud features while the training process is accelerated by preventing gradient vanishing during back-propagation. Based on the decoding modules, the network fuses multi-level features while training stably and gradually recovers the original image size. In addition, skip connections between the contraction and expansion paths of the single-scene cloud detection network fuse shallow and deep features, which helps generate a more accurate cloud mask. In the encoding and decoding modules, the core convolution and nonlinear activation operations are expressed as formula (1):

$F_{out} = \max(0, K * F_{in} + B)$   (1)

where $F_{in}$ and $F_{out}$ denote the input and output feature maps, K is the convolution kernel, B is the bias matrix, * denotes the convolution operation, and max(0, ·) denotes the nonlinear activation operation.
In this embodiment, the images are Landsat data of size 512 × 512 with 4 bands, namely the red, green, blue, and near-infrared bands. The single-scene cloud detection network finally outputs the spatial-spectral features $F_s \in \mathbb{R}^{512 \times 512 \times f}$.
In step 2, the input of the low-rank feature extraction module is the multi-temporal image stack, recorded as $Y \in \mathbb{R}^{w \times h \times tc}$, where t denotes the number of time phases; the output of the module is a low-rank feature map of the same size as the input, i.e., a low-rank background map with sparse noise removed. First, several 2D convolution layers are stacked to downsample the channel dimension by a factor of 8 (i.e., the rank-reduction ratio of the module is 8), and a dimension transformation yields the coefficient matrix $U \in \mathbb{R}^{wh \times k}$, where k is the upper rank limit. Then, the same convolution and dimension-transformation operations followed by a matrix transposition yield $G \in \mathbb{R}^{k \times wh}$. The original matrix Y likewise passes through a 1 × 1 convolution and dimension transformation to give $D \in \mathbb{R}^{wh \times tc}$, and multiplying $G$ and $D$ yields the basis matrix $V = GD \in \mathbb{R}^{k \times tc}$. Considering the unboundedness of convolution outputs, the module normalizes $U^{(i)} = \{u^{(i,1)}, \ldots, u^{(i,wh)}\}$ and $V^{(i)} = \{v^{(i,1)}, \ldots, v^{(i,k)}\}$, which benefits the stability of network training. To preserve the semantic meaning of $v^{(i,j)}$ as a basis, $\ell_2$-norm normalization retains its directionality, as shown by the following equation:

$\hat{v}^{(i,j)} = \dfrac{v^{(i,j)}}{\lVert v^{(i,j)} \rVert_2 + \varepsilon}$

where ε is set to $10^{-6}$ to avoid a zero divisor. The coefficient vectors $u^{(i,j)}$ are normalized with the softmax function, yielding a corresponding probability for each basis, as shown below:

$\hat{u}^{(i,j)}_m = \dfrac{\exp(u^{(i,j)}_m)}{\sum_{n=1}^{k} \exp(u^{(i,j)}_n)}$

Finally, the low-rank features are obtained as the product of $\hat{U}$ and $\hat{V}$. After a matrix dimension change of the low-rank features, the reconstructed low-rank feature map is obtained, and a final 1 × 1 convolution produces the reconstructed clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{w \times h \times tc}$.
The multi-temporal images in this embodiment include 16 time phases, i.e., t = 16. For an input image block $Y \in \mathbb{R}^{512 \times 512 \times 64}$, the rank-reduction operation sets the rank to k = 8. In addition, ε is set to $10^{-6}$ to avoid a zero divisor. The finally output clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{512 \times 512 \times 64}$ are consistent with the original image dimensions.
In step 3, the low-rank background feature map $B_{lr}$ obtained by the low-rank feature extraction module and the feature map of the original image $X$ are input together into the multi-temporal cloud detection network based on a twin differential structure to extract differential feature information; that is, the difference between the original image and the low-rank background feature map is compared, and the cloud mask is extracted from the change information. The specific steps are as follows.

Features are extracted from the original image and the low-rank background features through weight-shared convolution and pooling layers, and a differencing operation is applied to the equal-sized feature maps extracted at each layer. Then, starting from the deepest differential feature map, upsampling is performed with deconvolution blocks, and each upsampled feature map is concatenated with the differential feature map of the previous level, so that the original image size is gradually recovered from multi-scale differential semantic features while the deep semantic features and shallow detail information of the network are retained. In addition, a global context block (GCB) is introduced after the three deepest differential feature maps; its main function is to capture relationships between long-distance pixels, breaking the limitation that conventional convolution has only a local receptive field. For an input feature map of size C × H × W, the GCB first applies a 1 × 1 convolution and a softmax function to obtain attention weights and then performs attention pooling to obtain a global context feature of size C × 1 × 1; next, a 1 × 1 convolution performs feature transformation to obtain inter-channel dependence. To reduce the parameter count, the transformation adopts a bottleneck form, which increases optimization difficulty, so layer normalization is added before the ReLU activation function as a regularizer to reduce the optimization difficulty and improve generalization. Finally, the C × 1 × 1 global context feature is aggregated onto the features at every position for global context modeling. The network finally outputs the multi-temporal change feature map $F_t \in \mathbb{R}^{w \times h \times f}$.
In this embodiment, the original image $X$ and the background features $B_{lr}$ are input simultaneously, feature differencing is performed through the weight sharing of the twin network, and the finally output multi-temporal change feature map $F_t$ keeps the same dimensions as the single-scene spatial-spectral features $F_s$.
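The weight-shared differencing can be sketched as follows; the number of levels and the filter widths are assumptions, and both inputs must carry the same channel count so that the shared layers apply to each:

import tensorflow as tf
from tensorflow.keras import layers

def siamese_difference_features(x, background, widths=(32, 64, 128)):
    diffs = []
    for width in widths:
        conv = layers.Conv2D(width, 3, padding='same', activation='relu')
        pool = layers.MaxPooling2D(2)
        x = conv(x)                      # the SAME layer object processes both
        background = conv(background)    # inputs, so the weights are shared
        diffs.append(layers.Subtract()([x, background]))  # per-level difference
        x, background = pool(x), pool(background)
    return diffs    # multi-scale differential feature maps, deepest last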
In step 4, the inputs are the multi-temporal difference features $F_t$ and the single-scene spatial-spectral features $F_s$; the final prediction map $P$ is obtained through cascaded convolution operations and a sigmoid activation function.

In addition, the method adopts binary cross-entropy loss to optimize all network parameters in steps 1-4; the loss expression is shown in the following formula:

$Loss = -\dfrac{1}{N}\sum_{i=1}^{N}\left[\hat{y}^{(i)}\log p^{(i)} + \left(1-\hat{y}^{(i)}\right)\log\left(1-p^{(i)}\right)\right]$

where N denotes the number of pixels; $\hat{y}^{(i)}$ denotes the true label of the i-th pixel, taking the value 0 or 1, where 0 denotes a non-cloud pixel and 1 a cloud pixel; and $p^{(i)}$ denotes the probability that the i-th pixel predicted by the network is a cloud pixel. The model is saved when the loss is optimized to its lowest state.
The input features in this embodiment are $F_s \in \mathbb{R}^{512 \times 512 \times f}$ and $F_t \in \mathbb{R}^{512 \times 512 \times f}$, and the finally output binary prediction map is $P \in \mathbb{R}^{512 \times 512}$.
The whole network is trained under the Keras deep learning framework. The batch size during training is set to 4, the optimizer is the Adam gradient descent method, and the initial learning rate of model training is set to 1e-4. An adaptive learning-rate decay strategy is adopted during training, with a decay rate of 0.7 and a patience factor of 15, and the strategy continues until the learning rate reaches 1e-8. In practice, those skilled in the art can adjust the hyper-parameters of the network according to the specific images used.
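This training setup maps onto Keras roughly as follows; `model`, `train_ds`, and `val_ds` are assumed to exist (with the batch size of 4 set when building `train_ds`), and the epoch count is arbitrary:

import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy')
# adaptive decay: multiply the learning rate by 0.7 after 15 stagnant
# epochs (the patience factor), never dropping below 1e-8
decay = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.7,
                                             patience=15, min_lr=1e-8)
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[decay])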
The embodiment of the invention adopts Landsat remote sensing images, but is not limited to them. The method is broadly applicable to other multi-temporal optical remote sensing images, regardless of the intensity of cloud contamination, the spatial resolution, or the number of bands, and is little constrained by objective factors. Landsat-8 experiments show that the overall accuracy of the method reaches 96.4% with a kappa coefficient of 0.93; compared with classical cloud detection algorithms such as Fmask, the false detection rate over bright surfaces is effectively reduced, the cloud detection accuracy is improved, and multi-scene universal intelligent cloud detection is realized.
It is to be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as claimed in the appended claims. Accordingly, the scope of the claimed subject matter is not limited by any of the specific exemplary teachings provided.

Claims (8)

1. A multi-temporal remote sensing image cloud detection method based on a deep low-rank network, characterized by comprising the following steps:
step 1, extracting the spatial-spectral features of clouds in an original image to be detected;
step 2, extracting the low-rank background features of the multi-temporal images;
step 3, simultaneously inputting the original image and the background features obtained in step 2 into a multi-temporal cloud detection network, extracting their change information with a weight-sharing twin differential structure combined with the idea of non-local attention, and obtaining cloud features based on temporal change, namely a multi-temporal change feature map;
and step 4, fusing the spatial-spectral features of the cloud from step 1 with the temporal-change-based cloud features from step 3 to predict a cloud mask, training the network according to the loss function until convergence, and optimizing all network parameters to obtain a final prediction map.
2. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: in step 1, a single-scene cloud detection network is designed to extract the spatial-spectral features of clouds in the original image to be detected, implemented as follows;
record the given original single-scene image as $X \in \mathbb{R}^{w \times h \times c}$, where w and h denote the length and width of the image and c denotes the number of bands; the input of the single-scene cloud detection network is $X$ and the output is $F_s \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the output feature map;
the single-scene cloud detection network consists of a contraction path and an expansion path; the contraction path is formed by stacking six encoding modules, each divided into a left branch and a right branch: the left branch is a feature extraction branch that automatically extracts features by 2D convolution with two 3 × 3 convolution kernels, each convolution layer followed by a ReLU activation function for nonlinear mapping; the right branch is a context-preserving branch that extracts the most important feature information with a 1 × 1 2D convolution and adopts an identity mapping operation to accelerate backward propagation of the gradient flow while retaining low-level feature information; after all features are aggregated, a 2 × 2 max pooling operation finally downsamples the features and reduces network parameters to avoid over-fitting; multi-scale features of the image are extracted automatically through the encoding modules;
the expansion path is formed by stacking five decoding modules; in a decoding module, the original image size is gradually restored based on one deconvolution operation, two convolution operations then extract feature information fusing low and high layers, and shortcut connections help the network store and exploit the context information learned in early layers, so that the network captures more cloud features while the training process is accelerated by preventing gradient vanishing during back-propagation; based on the decoding modules, the network fuses multi-level features while training stably and gradually recovers the original image size.
3. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 2, characterized in that: skip connections between the contraction path and the expansion path of the single-scene cloud detection network fuse shallow and deep features, which helps generate a more accurate cloud mask; in the encoding and decoding modules, the core convolution and nonlinear activation operations are expressed as formula (1):

$F_{out} = \max(0, K * F_{in} + B)$   (1)

where $F_{in}$ and $F_{out}$ denote the input and output feature maps, K is the convolution kernel, B is the bias matrix, * denotes the convolution operation, and max(0, ·) denotes the nonlinear activation operation.
4. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: in step 2, a low-rank feature extraction module is designed to extract the low-rank background features of the multi-temporal images; the input of the module is the multi-temporal image stack, recorded as $Y \in \mathbb{R}^{w \times h \times tc}$, where w and h denote the length and width of the image, c denotes the number of bands, and t denotes the number of time phases; the output of the module is a low-rank feature map of the same size as the input, i.e., a low-rank background map with sparse noise removed; the processing procedure of the low-rank feature extraction module is as follows;
first, several 2D convolution layers are stacked to downsample the channel dimension by a factor of 8, i.e., the rank-reduction ratio of the module is 8, and a dimension transformation yields the coefficient matrix $U \in \mathbb{R}^{wh \times k}$, where k is the upper rank limit; then, the same convolution and dimension-transformation operations followed by a matrix transposition yield $G \in \mathbb{R}^{k \times wh}$; the original matrix Y likewise passes through a 1 × 1 convolution and dimension transformation to give $D \in \mathbb{R}^{wh \times tc}$, and multiplying $G$ and $D$ yields the basis matrix $V = GD \in \mathbb{R}^{k \times tc}$; considering the unboundedness of convolution outputs, the low-rank feature extraction module normalizes $U^{(i)} = \{u^{(i,1)}, \ldots, u^{(i,wh)}\}$ and $V^{(i)} = \{v^{(i,1)}, \ldots, v^{(i,k)}\}$ to obtain $\hat{U}$ and $\hat{V}$ respectively, which benefits the stability of network training; finally, the low-rank features are obtained as the product of $\hat{U}$ and $\hat{V}$; after a matrix dimension change of the low-rank features, the reconstructed low-rank feature map is obtained, and a final 1 × 1 convolution produces the reconstructed clear-sky cloud-free background features $B_{lr} \in \mathbb{R}^{w \times h \times tc}$.
5. The multi-temporal remote sensing image cloud detection method based on the deep low-rank network according to claim 4, characterized in that: to preserve the semantic meaning of $v^{(i,j)}$ as a basis, $\ell_2$-norm normalization retains its directionality, as shown by the following equation:

$\hat{v}^{(i,j)} = \dfrac{v^{(i,j)}}{\lVert v^{(i,j)} \rVert_2 + \varepsilon}$

where ε is set to $10^{-6}$ to avoid a zero divisor; the coefficient vectors $u^{(i,j)}$ are normalized with the softmax function, yielding a corresponding probability for each basis, as shown below:

$\hat{u}^{(i,j)}_m = \dfrac{\exp(u^{(i,j)}_m)}{\sum_{n=1}^{k} \exp(u^{(i,j)}_n)}$
6. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: in step 3, the low-rank background feature map $B_{lr} \in \mathbb{R}^{w \times h \times tc}$ obtained by the low-rank feature extraction module and the feature map of the original image $X \in \mathbb{R}^{w \times h \times c}$ are input together into the multi-temporal cloud detection network based on a twin differential structure to extract differential feature information, i.e., the difference between the original image and the low-rank background feature map is compared and the cloud mask is extracted from the change information, where w and h denote the length and width of the image and c denotes the number of bands; the specific steps are as follows:
features are extracted from the original image and the low-rank background features through weight-shared convolution and pooling layers, and a differencing operation is applied to the equal-sized feature maps extracted at each layer; then, starting from the deepest differential feature map, upsampling is performed with deconvolution blocks and each upsampled feature map is concatenated with the differential feature map of the previous level, so that the original image size is gradually recovered from multi-scale differential semantic features while the deep semantic features and the shallow detail information of the network are retained;
a global context block (GCB) is introduced after the three deepest differential feature maps to capture relationships between long-distance pixels, breaking the limitation that conventional convolution has only a local receptive field; for an input feature map of size C × H × W, the GCB first applies a 1 × 1 convolution and a softmax function to obtain attention weights and then performs attention pooling to obtain a global context feature of size C × 1 × 1; next, a 1 × 1 convolution performs feature transformation to obtain inter-channel dependence; to reduce the parameter count, the transformation adopts a bottleneck form, which increases optimization difficulty, so layer normalization is added before the ReLU activation function as a regularizer to reduce the optimization difficulty and improve generalization; finally, the C × 1 × 1 global context feature is aggregated onto the features at every position for global context modeling; the multi-temporal cloud detection network finally outputs the multi-temporal change feature map $F_t \in \mathbb{R}^{w \times h \times f}$, where f denotes the channel dimension of the feature map.
7. The multi-temporal remote sensing image cloud detection method based on the deep low-rank network according to claim 1, characterized in that: in step 4, the spatial-spectral features of the cloud from step 1 and the temporal-change-based cloud features from step 3 are fused, and the final prediction map $P \in \mathbb{R}^{w \times h}$ is finally obtained through cascaded convolution operations and a sigmoid activation function, where w and h denote the length and width of the image.
8. The deep low-rank network-based multi-temporal remote sensing image cloud detection method according to claim 1, characterized in that: binary cross-entropy loss is adopted to optimize all network parameters in steps 1-4, with the loss expression shown in the following formula:

$Loss = -\dfrac{1}{N}\sum_{i=1}^{N}\left[\hat{y}^{(i)}\log p^{(i)} + \left(1-\hat{y}^{(i)}\right)\log\left(1-p^{(i)}\right)\right]$

where N denotes the number of pixels; $\hat{y}^{(i)}$ denotes the true label of the i-th pixel, taking the value 0 or 1, where 0 denotes a non-cloud pixel and 1 a cloud pixel; and $p^{(i)}$ denotes the probability that the i-th pixel predicted by the network is a cloud pixel; the model is saved when the loss is optimized to its lowest state.
CN202210729041.9A (priority date 2022-06-24; filing date 2022-06-24) Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network — Pending — CN115131674A (en)

Priority Applications (1)

Application Number: CN202210729041.9A
Priority Date / Filing Date: 2022-06-24
Title: Multi-temporal optical remote sensing image cloud detection method based on deep low-rank network

Publications (1)

Publication Number: CN115131674A
Publication Date: 2022-09-30

Family: ID=83380410
Country: CN — CN115131674A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115731243A (en) * 2022-11-29 2023-03-03 北京长木谷医疗科技有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN115731243B (en) * 2022-11-29 2024-02-09 北京长木谷医疗科技股份有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN116245757A (en) * 2023-02-08 2023-06-09 北京艾尔思时代科技有限公司 Multi-scene universal remote sensing image cloud restoration method and system for multi-mode data
CN116245757B (en) * 2023-02-08 2023-09-19 北京艾尔思时代科技有限公司 Multi-scene universal remote sensing image cloud restoration method and system for multi-mode data
CN115984714A (en) * 2023-03-21 2023-04-18 山东科技大学 Cloud detection method based on double-branch network model
CN115984714B (en) * 2023-03-21 2023-05-23 山东科技大学 Cloud detection method based on dual-branch network model
CN116612333A (en) * 2023-07-17 2023-08-18 山东大学 Medical hyperspectral image classification method based on rapid full convolution network
CN116612333B (en) * 2023-07-17 2023-09-29 山东大学 Medical hyperspectral image classification method based on rapid full convolution network
CN117372702A (en) * 2023-12-08 2024-01-09 江西师范大学 Cloud layer removing method and device combining self-supervision deep learning and model method
CN117372702B (en) * 2023-12-08 2024-02-06 江西师范大学 Cloud layer removing method and device combining self-supervision deep learning and model method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination