CN111798461B

CN111798461B - Pixel-level remote sensing image cloud area detection method for guiding deep learning by coarse-grained label

Info

Publication number: CN111798461B
Application number: CN202010563344.9A
Authority: CN
Inventors: 李彦胜; 陈蔚; 张永军
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2020-06-19
Filing date: 2020-06-19
Publication date: 2022-04-01
Anticipated expiration: 2040-06-19
Also published as: CN111798461A

Abstract

The invention discloses a pixel-level remote sensing image cloud area detection method based on deep learning, which is based on the realistic problem that a cloud area detection method based on deep learning needs large cost of marking, and comprises the steps of firstly training a depth network model with good robustness and generating a cloud activation map by using easily-obtained image block-level labels and combining a local pooling layer pruning strategy and a global convolution pooling layer under the constraint of a remote sensing image block data set of the coarse-grained image block-level labels, and then obtaining a final cloud mask map through threshold segmentation. The invention can greatly reduce the marking work, simultaneously realize the pixel-level accurate cloud area detection of the remote sensing image, and effectively improve the efficiency and the performance of the cloud area detection of the remote sensing image.

Description

Pixel-level remote sensing image cloud area detection method for guiding deep learning by coarse-grained label

Technical Field

The invention belongs to the technical field of remote sensing and artificial intelligence, relates to a remote sensing image cloud area detection method for deep learning, and particularly relates to a pixel-level remote sensing image cloud area detection method for deep learning guided by coarse-grained labels.

Background

Cloud region detection is a key problem in remote sensing image interpretation and application, and a large amount of cloud cover influences usability of remote sensing image data and increases difficulty in remote sensing image interpretation. Cloud region detection detects a cloud-containing region in a remote sensing image through various methods in the fields of remote sensing and computer vision, and saves transmission bandwidth and storage space and reduces waste of resources by not issuing images containing more clouds on the aspect of on-satellite application; and on the aspect of ground application, data preparation is provided for applications such as cloud removal and image recovery, subsequent large-range continuous mapping, dynamic monitoring based on remote sensing images and the like.

In recent years, a large number of cloud region detection methods based on artificial structural features have been proposed in academia. With the continuous development of the related art, the method based on deep learning is also used for solving the cloud area detection problem in a large quantity. In general, the performance of a remote sensing image cloud area detection task can be remarkably improved by the deep learning-based method. However, the superior performance of deep learning relies on a large number of accurate pixel-level labels, and the labeling of labels is time-consuming and labor-intensive. Considering that different types of satellites often have great difference in spectral and spatial resolutions, for each remote sensing satellite image, a deep learning-based method needs a corresponding pixel-level labeling data set to operate, and further needs a great deal of manpower for labeling. Therefore, it is of great significance to explore an advanced cloud area detection method capable of reducing the labeling workload.

Disclosure of Invention

The invention provides a pixel-level remote sensing image cloud area detection method based on deep learning, which aims to solve the problem that a large amount of marking cost is needed in a cloud area detection method based on deep learning.

The technical scheme adopted by the invention is as follows: a pixel-level remote sensing image cloud region detection method for deep learning guided by coarse-grained labels comprises the following steps:

step 1: inputting a remote sensing image data set D { (b) with a coarse-grained label_n，y_n) 1,2, …, N, where b is_nRepresenting the nth remote sensing image block, y, of the data set D_nThe coarse-grained remote sensing image block level label corresponding to the nth remote sensing image block is represented, and N represents the total number of the remote sensing image blocks in the data set D;

step 2: learning hyper-parameters of a deep convolutional network model on the data set D in the step 1, wherein the hyper-parameters comprise a convolution weight C, a global convolution pooling weight G and a cloud activation weight W;

and step 3: inputting an image I needing cloud area detection, and calculating a cloud activation map MI of the image I by using the I, the convolution weight C, the global convolution pooling weight G and the cloud activation weight W in the step 2;

and 4, step 4: cloud of image I in step 3Activation map M^IPerforming threshold segmentation to calculate the cloud mask image S of the image I^I；

Further, y in step 1_nHas two forms, y_n＝[1，0]Representing the nth remote sensing image block b in the data set D_nThe label of (A) is cloud-containing, y_n＝[0，1]Representing the nth remote sensing image block b in the data set D_nThe tag of (a) is cloud-free;

further, the specific implementation of step 2 includes the following sub-steps:

step 2.1: one remote sensing image block b of the data set D in the step 1_nInputting the feature map into a deep convolutional network model, and outputting a feature map according to the following formula:

wherein f is_nAn output feature map representing the last convolutional layer in the deep convolutional network model.

Representing the integral representation of operations such as convolution, activation operation and the like in the deep convolutional network model, and C represents the convolution weight in the deep convolutional network model;

step 2.2: for step 2.1 f_nThe global convolution pooling is carried out on each channel, and the activation value of each channel is calculated, wherein the formula is as follows:

wherein the content of the first and second substances,

denotes f_nThe (c) th channel of (a),

represents

Activation value, G, at the k channel after global convolution pooling^KE G represents the weight of the global convolution pooling weight G at the kth channel,

represents f_nPerforming spatial convolution operation with each channel of G;

step 2.3: the hyper-parameters C, G, W of the deep convolutional network model are learned using a cross-entropy loss function based on softmax. The formula is as follows:

wherein the content of the first and second substances,

representing that the cloud activation weight value is in the kth channel, and the weight value of the category c is; class c corresponds to b_nLabel of (a), y_n＝[0，1]When c is 1, y_n＝[1，0]When c is 0; d represents the total number of channels.

Step 2.4: and (3) repeating the step 2.1 to the step 2.3 for each remote sensing image block in the data set D until all data participate in deep convolutional network model training, repeating iteration for 10 times until the network converges, and acquiring a deep convolutional network model and a hyper-parameter thereof: a convolution weight C, a global convolution pooling weight G and a cloud activation weight W;

further, the specific implementation of step 3 includes the following sub-steps:

step 3.1: inputting an image I needing cloud region detection, and cutting the image I into overlapped image blocks { a ] by using a sliding window algorithm₁，a₂，…，a_m}；

Step 3.2: for a certain image block a in step 3.1, inputting the certain image block a into the deep convolutional network model in step 2.4 to output a characteristic map f, wherein the formula is as follows:

wherein f represents the output characteristic diagram of the last convolution layer in the deep convolution network model.

Representing the overall representation of operations such as convolution, activation operations and the like in the deep convolutional network model;

step 3.3: for the kth band f of the output feature f in step 3.2^kCalculating the adjusted feature map T^kThe formula is as follows:

wherein, T^kIs the k-th band f^kAnd (5) adjusting the characteristic diagram.

Is calculated using f in equation 2.1^kActivation value at k channel, G^KE G represents the weight of the global convolution pooling weight G obtained in the step 2.4 in the k channel, tau (f)^k) Is f^kA statistical value of (a), including a mean or median;

step 3.4: repeating the steps 3.2 and 3.3 for each channel of the image block a in the step 3.2, and calculating a cloud activation map M^aThe formula is as follows:

wherein the content of the first and second substances,

k is 1,2, …, d represents the value of the cloud activation weight in the k channel;

step 3.5: for all image blocks { a ] in step 3.1₁，a₂，…，a_mRepeating the steps 3.2, 3.3 and 3.4, and calculating the cloud activation corresponding to each image blockDrawing (A)

The cloud activation image M of the image I can be obtained after all the cloud activation images are spliced^I。

Further, the specific implementation of step 4 includes the following sub-steps:

step 4.1: for all image blocks b in the data set D in step 1 without clouds₁ ^-，b₂ ^-，…，b_t ^-And calculating a cloud activation image corresponding to each image block

Calculate out

The mean μ and standard deviation σ of;

step 4.2: calculating a segmentation threshold h by using the mean value mu and the standard deviation sigma of the cloud activation map in the step 4.1, wherein the formula is as follows:

h ═ μ + kxoσ (formula seven);

wherein k is a coefficient;

step 4.3: cloud activation map M of video I using segmentation threshold h calculated in step 4.2^IPerforming threshold segmentation to calculate the cloud mask image S of the image I^IThe formula is as follows:

wherein, (i, j) is a cloud activation map M^IOr cloud mask map S^IThe horizontal and vertical coordinates of (1);

furthermore, the network structure of the deep convolutional network model contains 10 convolutional layers, 1 global convolutional pooling layer, 1 full-link layer and 1 softmax classification layer.

Furthermore, the size of a convolution kernel in the convolution layer is 3 × 3, the sliding step of convolution is 1 × 1, and a ReLU nonlinear active layer is connected behind the convolution layer.

Further, the window size of the global convolution pooling layer is 230 × 230.

Further, in step 4.2, k is 0.7.

The invention has the following advantages: the local pooling layer pruning strategy used in the method can greatly improve the resolution of the network output characteristic diagram, and can be applied to other tasks with high requirements on the resolution of the characteristic diagram, such as small target detection and the like; the global pooling convolutional layer used in the method can better extract the spatial variation information in the feature map, thereby improving the quality of the output feature map. Compared with the existing cloud detection method, the method can complete the training of the depth network only by the image block-level label, and greatly reduces the required labeling cost while realizing accurate pixel-level cloud detection.

Drawings

FIG. 1 is a schematic diagram of a deep convolutional network structure according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating a global convolutional pooling operation according to an embodiment of the present invention; wherein (a) is a forward propagation process and (b) is a backward propagation process.

FIG. 3 is a schematic diagram of a cloud activation map generation process according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating a process of generating a cloud activation map for a test image according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a cloud detection result generation process according to an embodiment of the present invention; the method comprises the following steps of (a) obtaining an original image, (b) obtaining a reference cloud area image, (c) obtaining a cloud activation image, and (d) obtaining a cloud mask image.

Detailed Description

In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.

The invention provides a remote sensing image scene classification method based on fault-tolerant deep learning, which comprises the following steps of:

step 1: inputting remote sensing image data set with coarse-grained labelD＝{(b_n，y_n) 1,2, …, N, where b is_nRepresenting the nth remote sensing image block in the data set D; y is_nCoarse-grained remote-sensing image block-level label, y, corresponding to nth remote-sensing image block_nHas two forms, y_n＝[1，0]Representing the nth remote sensing image block b in the data set D_nThe label of (A) is cloud-containing, y_n＝[0，1]Representing the nth remote sensing image block b in the data set D_nThe tag of (a) is cloud-free; n represents the total number of remotely sensed image blocks in data set D.

Step 2: and (3) learning hyper-parameters of the deep convolutional network model on the data set D in the step 1, wherein the hyper-parameters comprise a convolution weight C, a global convolution pooling weight G and a cloud activation weight W. The method specifically comprises the following substeps:

please see fig. 1, step 2.1: one remote sensing image block b of the data set D in the step 1_nInputting the feature map into a deep convolutional network model, and outputting a feature map according to the following formula:

Represents the overall representation of operations such as convolution, activation operations and the like in the deep convolutional network model, and C represents the convolution weight in the deep convolutional network model.

Please see fig. 2, step 2.2: for step 2.1 f_nThe global convolution pooling is carried out on each channel, and the activation value of each channel is calculated, wherein the formula is as follows:

wherein the content of the first and second substances,

denotes f_nThe (c) th channel of (a),

represents

represents f_nAnd the spatial convolution operation between the G channels.

wherein the content of the first and second substances,

Step 2.4: and (3) repeating the step 2.1 to the step 2.3 for each remote sensing image block in the data set D until all data participate in deep convolutional network model training, repeating iteration for 10 times until the network converges, and acquiring a deep convolutional network model and a hyper-parameter thereof: a convolution weight C, a global convolution pooling weight G, and a cloud activation weight W.

And step 3: inputting an image I needing cloud area detection, and calculating a cloud activation map M of the image I by using the I, the convolution weight C, the global convolution pooling weight G and the cloud activation weight W in the step 2^I. The method specifically comprises the following substeps:

step 3.1: inputting an image I needing cloud region detection, and segmenting the image I into overlapped images by using a sliding window algorithmImage block { a₁，a₂，…，a_m}。

Please see fig. 3, step 3.2: for a certain image block a in step 3.1, inputting the certain image block a into the deep convolutional network model in step 2.4 to output a characteristic map f, wherein the formula is as follows:

Representing an overall representation of the operations of convolution, activation operations, etc. in a deep convolutional network model.

wherein, T^kIs the k-th band f^kAnd (5) adjusting the characteristic diagram.

Is calculated using f in equation 2.1^kActivation value at k channel, G^KE G represents the weight of the global convolution pooling weight G obtained in the step 2.4 in the k channel, tau (f)^k) Is f^kA statistical value such as a mean or median value.

wherein，

k is 1,2, …, d represents the value of the cloud activation weight value on the k channel, d represents the total channel number

Please see fig. 4, step 3.5: for all image blocks { a ] in step 3.1₁，a₂，…，a_mRepeating the steps 3.2, 3.3 and 3.4, and calculating a cloud activation graph corresponding to each image block

Please see fig. 5, step 4: cloud activation map M for image I in step 3^IPerforming threshold segmentation to calculate the cloud mask image S of the image I^I. The method specifically comprises the following substeps:

step 4.1: for all image blocks b in the data set D in step 1 without clouds₁ ^-，b₂ ^-，…，b_t ^-Repeating the steps 3.2, 3.3 and 3.4, and calculating a cloud activation graph corresponding to each image block

Calculate out

Mean μ and standard deviation σ of.

h ═ μ + kxoσ (formula seven);

wherein k is an empirical coefficient obtained through experiments.

Step 4.3: using the segmentation threshold h calculated in step 4.2 to the cloud activation map M of the image I obtained in step 3.5^IPerforming threshold segmentation to calculate the cloud mask image S of the image I^IThe formula is as follows:

wherein, (i, j) is a cloud activation map M^IOr cloud mask map S^IThe abscissa and ordinate of (a).

TABLE 1 network architecture configuration of deep convolutional network model used in the method

Table 1 shows the network structure of the deep convolutional network model used in the method, and the size of the input image processed by the network structure is 500 × 500 × 4. In table 1, "convolution kernel" specifies the size of the convolution kernel reception field, the dimension dim of the input data, and the number num of convolution kernels, which are expressed as size × size × dim × num by a formula; "step size" means the sliding step size of the convolution; "ReLU nonlinear activation" means that a ReLU nonlinear activation layer is connected after the convolution layer; the "window size" represents the window size of the global convolution pooling layer. As shown in table 1, the network structure contains 10 convolutional layers, 1 global convolutional pooling layer, 1 fully-connected layer and 1 softmax classification layer.

In order to analyze the effect of the experimental coefficient k in step 4.2 on the deep learning, table 2 shows the performance indexes of the method under the settings of different coefficients k. At k 0.7, the process achieves the best performance.

TABLE 2 several performance indexes of the method under different coefficients k

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A pixel-level remote sensing image cloud region detection method for deep learning guided by coarse-grained labels is characterized by comprising the following steps:

step 1: inputting a remote sensing image data set D { (b) with a coarse-grained label_n,y_n) 1,2, …, N, where b is_nRepresenting the nth remote sensing image block, y, of the data set D_nThe coarse-grained remote sensing image block level label corresponding to the nth remote sensing image block is represented, and N represents the total number of the remote sensing image blocks in the data set D;

the specific implementation of the step 2 comprises the following substeps:

wherein f is_nAn output characteristic diagram representing the last convolutional layer in the deep convolutional network model;

representing the integral representation of convolution and activation operation in the deep convolution network model, and C representing the convolution weight in the deep convolution network model;

step 2.2: to the step of2.1 in f_nThe global convolution pooling is carried out on each channel, and the activation value of each channel is calculated, wherein the formula is as follows:

wherein the content of the first and second substances,

denotes f_nThe (c) th channel of (a),

represents

represents f_nPerforming spatial convolution operation with each channel of G;

step 2.3: learning the hyper-parameters C, G, W of the deep convolutional network model using a cross entropy loss function based on softmax, the formula is as follows:

wherein the content of the first and second substances,

representing that the cloud activation weight value is in the kth channel, and the weight value of the category c is; class c corresponds to b_nLabel of (a), y_n＝[0,1]When c is 1, y_n＝[1,0]When c is 0; d represents the total number of channels;

step 2.4: and (3) repeating the step 2.1 to the step 2.3 for each remote sensing image block in the data set D until all data participate in deep convolutional network model training, repeating iteration for a plurality of times until the network converges, and acquiring a deep convolutional network model and a hyper-parameter thereof: a convolution weight C, a global convolution pooling weight G and a cloud activation weight W;

and step 3: inputting an image I needing cloud area detection, and calculating a cloud activation map M of the image I by using the I, the convolution weight C, the global convolution pooling weight G and the cloud activation weight W in the step 2^I；

The specific implementation of the step 3 comprises the following substeps:

step 3.1: inputting an image I needing cloud region detection, and cutting the image I into overlapped image blocks { a ] by using a sliding window algorithm₁,a₂,…,a_m}；

Step 3.2: and (3) inputting a certain image block a in the step 3.1 into the deep convolution network model to output a characteristic map f, wherein the formula is as follows:

wherein f represents the output characteristic diagram of the last convolution layer in the deep convolution network model,

representing an overall representation of convolution, activation arithmetic operations in a deep convolutional network model;

wherein, T^kIs the k-th band f^kAfter the feature map is adjusted,

is to use step 2Equation 1 for equation f^kActivation value at k channel, G^KE G represents the weight of the global convolution pooling weight G obtained in the step 2.4 in the k channel, tau (f)^k) Is f^kA statistical value of (a), including a mean or median;

wherein the content of the first and second substances,

representing the value of the cloud activation weight in the kth channel;

step 3.5: for all image blocks { a ] in step 3.1₁,a₂,…,a_mRepeating the steps 3.2, 3.3 and 3.4, and calculating a cloud activation graph corresponding to each image block

Splicing all the cloud activation maps to obtain a cloud activation map M of the image I^I；

And 4, step 4: cloud activation map M for image I in step 3^IPerforming threshold segmentation to calculate the cloud mask image S of the image I^I。

2. The pixel-level remote sensing image cloud region detection method for coarse-grained label guided deep learning according to claim 1, characterized in that: y in step 1_nHas two forms, y_n＝[1,0]Representing the nth remote sensing image block b in the data set D_nThe label of (A) is cloud-containing, y_n＝[0,1]Representing the nth remote sensing image block b in the data set D_nThe tag of (1) is cloud-free.

3. The pixel-level remote sensing image cloud region detection method for coarse-grained label guided deep learning according to claim 1, characterized in that: the specific implementation of the step 4 comprises the following substeps:

step 4.1: all the cloud-free image blocks in the data set D in step 1 are recorded as { b }₁ ^-,b₂ ^-,…,b_t ^-And calculating a cloud activation image corresponding to each image block

Calculate out

The mean μ and standard deviation σ of;

h ═ μ + kxoσ (formula seven);

wherein k is a coefficient;

4. The pixel-level remote sensing image cloud region detection method for coarse-grained label guided deep learning according to claim 1, characterized in that: the network structure of the deep convolutional network model comprises 10 convolutional layers, 1 global convolutional pooling layer, 1 full-link layer and 1 softmax classification layer.

5. The pixel-level remote sensing image cloud region detection method for coarse-grained label guided deep learning according to claim 4, characterized in that: the size of a convolution kernel in the convolution layer is 3 multiplied by 3, the sliding step length of convolution is 1 multiplied by 1, and a ReLU nonlinear activation layer is connected behind the convolution layer.

6. The pixel-level remote sensing image cloud region detection method for coarse-grained label guided deep learning according to claim 4, characterized in that: the window size of the global convolution pooling layer is 230 × 230.

7. The pixel-level remote sensing image cloud region detection method for coarse-grained label guided deep learning according to claim 3, characterized in that: in step 4.2, k is 0.7.