CN111079683B - Remote sensing image cloud and snow detection method based on convolutional neural network - Google Patents

Remote sensing image cloud and snow detection method based on convolutional neural network

Info

Publication number
CN111079683B
CN111079683B CN201911350284.6A
Authority
CN
China
Prior art keywords
feature
convolution
feature map
resolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911350284.6A
Other languages
Chinese (zh)
Other versions
CN111079683A (en)
Inventor
李坤 (Li Kun)
杜洪才 (Du Hongcai)
郭建华 (Guo Jianhua)
杨敬钰 (Yang Jingyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201911350284.6A priority Critical patent/CN111079683B/en
Publication of CN111079683A publication Critical patent/CN111079683A/en
Application granted granted Critical
Publication of CN111079683B publication Critical patent/CN111079683B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention belongs to the field of image processing and aims to provide a remote sensing image cloud and snow detection method that is reasonable in design and achieves high recognition accuracy by performing multi-scale, multi-path feature fusion over multiple convolutional layers of a convolutional neural network, so that images are accurately classified at the pixel level. To this end, the technical scheme adopted by the invention is a convolutional-neural-network-based cloud and snow detection method for remote sensing images, comprising: an encoding part of the network, which performs feature encoding on the input image information; and a decoding part of the network, which recovers original-image-resolution information from the basic depth features encoded by the encoding structure through a multi-scale fusion module, generating a cloud and snow detection result at the same resolution as the original image. The method is mainly applied to meteorological image processing.

Description

Remote sensing image cloud and snow detection method based on convolutional neural network
Technical Field
The invention belongs to the field of computer vision. In particular, it relates to a remote sensing image cloud and snow detection method that extracts and fuses multi-scale, multi-path semantic information from multiple feature layers of a convolutional neural network.
Background
Remote sensing images are widely used for land monitoring, target detection, geographical mapping and the like, and the distribution of cloud and snow in an image strongly affects its spectral content. Improving the accuracy of cloud and snow detection in remote sensing images has therefore become a goal for many remote sensing applications: cloud and snow present in a remote sensing image can adversely affect tasks such as atmospheric correction, target identification and target detection.
Accordingly, improving the cloud and snow detection accuracy of optical satellite remote sensing images is of great significance. Cloud and snow detection in remote sensing images is a multi-class classification problem, and machine learning methods such as artificial neural networks (ANNs) and support vector machines (SVMs) have been applied to remote sensing image classification. These methods rely on hand-crafted features and binary classifiers and do not exploit high-level features. Today, convolutional neural networks (CNNs) have become a major focus of research. As a representative deep learning algorithm, CNNs have been applied to classification, object recognition and object detection, and the CNN framework is widely used in speech recognition and semantic image analysis because it can accurately extract semantic information from large numbers of input images. Many deep CNNs for cloud and snow detection make pixel-wise predictions.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a remote sensing image cloud and snow detection method that is reasonable in design and achieves high recognition accuracy by performing multi-scale, multi-path feature fusion over multiple convolutional layers of a convolutional neural network, so that images are accurately classified at the pixel level. To this end, the technical scheme adopted by the invention, a convolutional-neural-network-based cloud and snow detection method for remote sensing images, comprises the following steps:
the encoding part of the network: performing feature encoding on the input image information;
the decoding part of the network: recovering original-image-resolution information from the basic depth features encoded by the encoding structure through a multi-scale fusion module, and generating a cloud and snow detection result at the same resolution as the original image.
The combination of cross-entropy loss and mean-squared-error loss is used as the loss function; the proposed network is trained against this combined objective, and network performance is evaluated using pixel accuracy and mean intersection over union (mIoU).
The basic depth features comprise global information and local information and are obtained by fusing output features of different convolution layers.
The encoding part of the network, i.e. feature encoding of the input image information, specifically comprises the following steps:
(1) Process the input image to a uniform size of 256×256, take a 50-layer residual network (ResNet-50) as the encoding part of the proposed network, and, following the 5-stage structure of ResNet-50, decompose it into a preprocessing unit plus 4 residual blocks;
(2) Feed the uniformly sized image into the ResNet-50 structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature map with the following resolutions: residual block 1, 32×32; residual block 2, 16×16; residual block 3, 16×16; residual block 4, 16×16, giving 4 residual-block output features in total.
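The resolutions above can be sanity-checked with a short sketch (not the authors' code; the downsampling factors are inferred from the stated sizes, which imply an output stride of 8 at residual block 1 and 16 at residual blocks 2 to 4, i.e. the later blocks keep the 16×16 resolution, e.g. via dilation):

```python
# Hypothetical sketch: map each residual block's downsampling factor
# (inferred from the stated resolutions, not given explicitly in the text)
# to its output resolution for a 256x256 input.
def encoder_output_sizes(input_size=256):
    factors = {"block1": 8, "block2": 16, "block3": 16, "block4": 16}
    return {name: input_size // f for name, f in factors.items()}
```

For a 256×256 input this yields 32×32 for residual block 1 and 16×16 for residual blocks 2 to 4, matching the resolutions listed above.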
The decoding part of the network, i.e. the process of fusing the features generated in step 1 with the multi-scale fusion module, specifically comprises the following steps:
(1) Apply a 3×3 convolution with stride 1 to the 32×32 feature output by residual block 1 to obtain a 32×32 feature map, denoted feature map 1;
(2) Apply a 3×3 dilated convolution with stride 1 and dilation rate 6 to the 16×16 feature output by residual block 2 to obtain a 16×16 feature map, denoted feature map 2;
(3) Apply a 3×3 dilated convolution with stride 1 and dilation rate 12 to the 16×16 feature output by residual block 3 to obtain a 16×16 feature map, denoted feature map 3;
(4) Apply a 3×3 dilated convolution with stride 1 and dilation rate 18 to the 16×16 feature output by residual block 4 to obtain a 16×16 feature map, denoted feature map 4;
(5) Pass the 16×16 feature output by residual block 4 through a global average pooling layer followed by a 3×3 convolution with stride 1 to obtain a 16×16 feature map, denoted feature map 5;
(6) Upsample feature maps 2, 3, 4 and 5 by a factor of 2 to generate feature maps 2a, 3a, 4a and 5a respectively;
(7) Concatenate feature map 1 with feature maps 2a, 3a, 4a and 5a to obtain feature cascade map A;
(8) Apply a 1×1 convolution with stride 1 to feature cascade map A to obtain a 32×32 feature map, denoted feature fusion map B;
(9) Upsample feature fusion map B by a factor of 2 to generate feature fusion map C with resolution 64×64;
(10) Apply a 1×1 convolution with stride 1 to the 64×64 feature map output by the second residual unit in residual block 1 to obtain a 64×64 feature map, and concatenate it with the 64×64 feature fusion map C to obtain feature fusion map D;
(11) Apply a 3×3 convolution with stride 1 to feature fusion map D to obtain a 64×64 feature map, and upsample it by a factor of 4 to generate a 256×256 detection map.
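Steps (2) to (4) use 3×3 dilated (atrous) convolutions with rates 6, 12 and 18. A minimal single-channel NumPy sketch (an illustration, not the patent's implementation) shows how "same" zero padding preserves the 16×16 resolution for any dilation rate, while the effective kernel size grows to rate*(k-1)+1:

```python
import numpy as np

# Illustrative sketch only (not the patent's code): a single-channel 3x3
# dilated convolution with stride 1 and "same" zero padding, as in steps
# (2)-(4) above with dilation rates 6, 12 and 18.
def dilated_conv2d(x, kernel, rate=1):
    k = kernel.shape[0]                 # kernel size (3)
    k_eff = rate * (k - 1) + 1          # effective kernel size
    pad = k_eff // 2                    # "same" padding for stride 1
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the padded input on a dilated grid centred at (i, j)
            patch = xp[i:i + k_eff:rate, j:j + k_eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

# With "same" padding, the 16x16 resolution of residual blocks 2-4 is
# preserved regardless of the dilation rate.
x = np.random.rand(16, 16)
for rate in (6, 12, 18):
    assert dilated_conv2d(x, np.ones((3, 3)) / 9.0, rate).shape == (16, 16)
```

Increasing the rate widens the receptive field without extra parameters or loss of resolution, which is why the three parallel paths capture context at different scales.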
Training the proposed network specifically comprises the following steps:
(1) Compute the cross-entropy loss and mean-squared-error loss between the predicted detection map and the labeled detection map, and update the weights with the back-propagation algorithm;
(2) After training is complete, measure the prediction performance using pixel accuracy and mean intersection over union (mIoU).
The invention has the characteristics and beneficial effects that:
the invention has reasonable design, fully considers local information and global information, adopts a characteristic fusion method of a multi-scale multi-convolution layer to improve the calculation detection accuracy of the image, trains the proposed network by taking the combination of cross entropy loss and mean square error loss as a minimum target loss function, and effectively improves the cloud and snow detection accuracy of the image remote sensing image.
Description of the drawings:
fig. 1 is a network overall structure proposed by the present invention.
Fig. 2 is an input image of a test of the present invention.
Fig. 3 is a cloud-snow label of an input image tested by the present invention.
FIG. 4 shows the results of the detection of the present invention.
Detailed Description
To reduce the influence of external factors such as uncertainty and ambiguity, and to make full use of the information in an image to obtain a better feature representation, the invention provides a remote sensing image cloud and snow detection method that extracts and fuses multi-scale, multi-path semantic information over multiple convolutional layers of a convolutional neural network. By fusing features across multiple scales, paths and convolutional layers, the invention improves cloud and snow detection performance on remote sensing images.
The technical scheme of the invention is realized by the following steps:
1. performing feature coding on input image information;
2. extracting basic depth features, fusing the basic depth features by a multi-scale fusion module, recovering original image resolution information, and generating a cloud and snow object detection result consistent with the original image resolution;
3. using the combination of cross-entropy loss and mean-squared-error loss as the loss function, training the proposed network against this combined objective, and evaluating network performance with pixel accuracy and mIoU.
The three steps are related as follows: step 2 processes the encoded information obtained in step 1 and generates the result, so step 1 is the basis of step 2; steps 1 and 2 together form the complete network structure of the invention, and step 3 is the training method of the network.
The specific implementation method of the step 1 comprises the following steps:
(1) Process the input image to a uniform size of 256×256, take the ResNet-50 network structure as the encoding part of the proposed network, and, following the 5-stage structure of ResNet-50, decompose it into a preprocessing unit plus 4 residual blocks.
(2) Feed the uniformly sized image into the ResNet-50 structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature map with the following resolutions: residual block 1, 32×32; residual block 2, 16×16; residual block 3, 16×16; residual block 4, 16×16, giving 4 residual-block output features in total.
The specific implementation method of the step 2 comprises the following steps:
(1) Apply a 3×3 convolution with stride 1 to the 32×32 feature output by residual block 1 to obtain a 32×32 feature map, denoted feature map 1.
(2) Apply a 3×3 dilated convolution with stride 1 and dilation rate 6 to the 16×16 feature output by residual block 2 to obtain a 16×16 feature map, denoted feature map 2.
(3) Apply a 3×3 dilated convolution with stride 1 and dilation rate 12 to the 16×16 feature output by residual block 3 to obtain a 16×16 feature map, denoted feature map 3.
(4) Apply a 3×3 dilated convolution with stride 1 and dilation rate 18 to the 16×16 feature output by residual block 4 to obtain a 16×16 feature map, denoted feature map 4.
(5) Pass the 16×16 feature output by residual block 4 through a global average pooling layer followed by a 3×3 convolution with stride 1 to obtain a 16×16 feature map, denoted feature map 5.
(6) Upsample feature maps 2, 3, 4 and 5 by a factor of 2 to generate feature maps 2a, 3a, 4a and 5a respectively.
(7) Concatenate feature map 1 with feature maps 2a, 3a, 4a and 5a to obtain feature cascade map A.
(8) Apply a 1×1 convolution with stride 1 to feature cascade map A to obtain a 32×32 feature map, denoted feature fusion map B.
(9) Upsample feature fusion map B by a factor of 2 to generate feature fusion map C with resolution 64×64.
(10) Apply a 1×1 convolution with stride 1 to the 64×64 feature map output by the second residual unit in residual block 1 to obtain a 64×64 feature map, and concatenate it with the 64×64 feature fusion map C to obtain feature fusion map D.
(11) Apply a 3×3 convolution with stride 1 to feature fusion map D to obtain a 64×64 feature map, and upsample it by a factor of 4 to generate a 256×256 detection map.
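The 2× and 4× upsampling steps above can be sketched as follows (the text does not specify the interpolation method; bilinear is typical, and nearest-neighbour is used here only to keep the sketch short):

```python
import numpy as np

# Nearest-neighbour upsampling sketch (the interpolation method is an
# assumption; the source does not state which is used).
def upsample(x, factor):
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

b = np.random.rand(32, 32)   # feature fusion map B (32x32)
c = upsample(b, 2)           # feature fusion map C (64x64)
d = upsample(c, 4)           # final detection map (256x256)
```

The chain 32×32 to 64×64 to 256×256 matches the resolutions of fusion map B, fusion map C and the final detection map.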
The specific implementation method of the step 3 comprises the following steps:
(1) Compute the cross-entropy loss and mean-squared-error loss between the predicted detection map and the labeled detection map, and update the weights with the back-propagation algorithm.
(2) After training is complete, measure the prediction performance using pixel accuracy and mean intersection over union (mIoU).
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Aiming at the problem of how to fully exploit global and local information in cloud and snow detection of remote sensing images, the invention provides a detection method that uses a multi-path, multi-scale feature fusion module. As shown in FIG. 1, the proposed network extracts the output features of each residual block of the ResNet-50 structure, convolves them with multi-scale dilated convolutions, fuses the results, unifies their sizes through an upsampling layer, concatenates them with low-level features, applies a further convolution, and finally upsamples the result to restore the original image resolution, making the classification result more reliable. This is equivalent to a feature extraction stage in which different feature layers undergo feature extraction at different scales; because convolution kernels of different scales have different receptive fields, each path captures information at a different scale, yielding a series of features ranging from local to global. The fused result therefore fully accounts for both local and global information.
The output of the network is a detection map at the same resolution as the original image; detection accuracy is computed using the existing image labels, and the proposed network is trained with the objective of minimizing the cross-entropy and mean-squared-error losses.
In this embodiment, a remote sensing image cloud and snow detection method based on a convolutional neural network includes the following steps:
Step S1: perform feature encoding on the input image information. The specific processing of this step is as follows:
step S1.1 processes the input image to 256×256 with a uniform size, takes a ResNet-50 network structure as a pre-trained basic convolutional neural network, and decomposes the input image into a preprocessing unit plus 4 residual blocks according to the 5-stage processing structure of ResNet-50.
Step S1.2: feed the uniformly sized image into the ResNet-50 structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature map with the following resolutions: residual block 1, 32×32; residual block 2, 16×16; residual block 3, 16×16; residual block 4, 16×16, giving 4 residual-block output features in total.
Step S2: the multi-scale fusion module recovers original-image-resolution information from the encoded information obtained in step S1 and generates a cloud and snow detection result at the same resolution as the original image. The specific processing of this step is as follows:
S2.1 Apply a 3×3 convolution with stride 1 to the 32×32 feature output by residual block 1 to obtain a 32×32 feature map, denoted feature map 1.
S2.2 Apply a 3×3 dilated convolution with stride 1 and dilation rate 6 to the 16×16 feature output by residual block 2 to obtain a 16×16 feature map, denoted feature map 2.
S2.3 Apply a 3×3 dilated convolution with stride 1 and dilation rate 12 to the 16×16 feature output by residual block 3 to obtain a 16×16 feature map, denoted feature map 3.
S2.4 Apply a 3×3 dilated convolution with stride 1 and dilation rate 18 to the 16×16 feature output by residual block 4 to obtain a 16×16 feature map, denoted feature map 4.
S2.5 Pass the 16×16 feature output by residual block 4 through a global average pooling layer followed by a 3×3 convolution with stride 1 to obtain a 16×16 feature map, denoted feature map 5.
S2.6 Upsample feature maps 2, 3, 4 and 5 by a factor of 2 to generate feature maps 2a, 3a, 4a and 5a respectively.
S2.7 Concatenate feature map 1 with feature maps 2a, 3a, 4a and 5a to obtain feature cascade map A.
S2.8 Apply a 1×1 convolution with stride 1 to feature cascade map A to obtain a 32×32 feature map, denoted feature fusion map B.
S2.9 Upsample feature fusion map B by a factor of 2 to generate feature fusion map C with resolution 64×64.
S2.10 Apply a 1×1 convolution with stride 1 to the 64×64 feature map output by the second residual unit in residual block 1 to obtain a 64×64 feature map, and concatenate it with the 64×64 feature fusion map C to obtain feature fusion map D.
S2.11 Apply a 3×3 convolution with stride 1 to feature fusion map D to obtain a 64×64 feature map, and upsample it by a factor of 4 to generate a 256×256 detection map.
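The "cascade" operations above are channel-wise concatenations of feature maps at equal spatial resolution. A short sketch (the channel counts are illustrative assumptions; the source does not specify them):

```python
import numpy as np

# Sketch of the cascade (channel-wise concatenation) used above: feature
# maps with equal spatial resolution are stacked along the channel axis.
# The channel count of 256 per map is an illustrative assumption.
f1 = np.random.rand(32, 32, 256)                       # feature map 1
f2a, f3a, f4a, f5a = (np.random.rand(32, 32, 256) for _ in range(4))
A = np.concatenate([f1, f2a, f3a, f4a, f5a], axis=-1)  # feature cascade map A
```

The subsequent 1×1 convolution then mixes the concatenated channels back down to a compact fused representation.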
Step S3: train the network proposed in steps S1 and S2 with the combined cross-entropy and mean-squared-error loss as the objective, and evaluate network performance with pixel accuracy and mIoU. The specific processing of this step is as follows:
S3.1 Compute the cross-entropy loss and mean-squared-error loss between the predicted detection map and the labeled detection map, and update the weights with the back-propagation algorithm.
S3.2 After training is complete, measure the prediction performance using pixel accuracy and mean intersection over union (mIoU).
The following experiments were conducted with the method of the invention to illustrate its recognition performance.
Test environment: Python 2.7; TensorFlow (neural network library); Ubuntu 16.04 operating system; NVIDIA GTX 1080Ti GPU.
Test data: the selected dataset is a cloud and snow detection image dataset built from cloud and snow images of the ZY-3 (Ziyuan-3) satellite, comprising 21290 satellite remote sensing images.
Test metrics: the invention uses pixel accuracy and mIoU as performance evaluation metrics. Pixel accuracy is the fraction of correctly classified pixels. mIoU is the mean, over all classes, of the ratio of the intersection to the union of the predicted and ground-truth pixel sets for each class. The same metrics were computed for several currently popular algorithms and the results compared, demonstrating that the method achieves better results in cloud and snow detection for remote sensing images.
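The two metrics can be sketched as follows (a plain NumPy illustration, not the authors' evaluation code):

```python
import numpy as np

# Pixel accuracy: fraction of correctly classified pixels.
def pixel_accuracy(pred, gt):
    return float(np.mean(pred == gt))

# mIoU: average, over the classes present (e.g. cloud, snow, background),
# of the intersection-over-union between predicted and ground-truth pixels.
def mean_iou(pred, gt, num_classes):
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:              # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

For example, with a 2×2 ground truth of one cloud row and one snow row and a prediction that misclassifies one pixel, pixel accuracy is 0.75 while mIoU, which weights each class equally, is lower.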
The test results were as follows:
As can be seen from the comparison data, the pixel accuracy and mIoU of the invention are clearly improved over existing methods.
It should be emphasized that the examples described herein are illustrative rather than limiting; therefore the invention includes, but is not limited to, the examples given in the detailed description, and other embodiments derived by a person skilled in the art from the technical solutions of the invention likewise fall within its scope.

Claims (4)

1. A remote sensing image cloud and snow detection method based on a convolutional neural network is characterized by comprising the following steps:
the coding part of the network is: performing feature coding on input image information;
the decoding part of the network is: extracting original image resolution information from basic depth features coded by a coding structure through a multi-scale fusion module to generate a cloud and snow object detection result consistent with the original image resolution;
the coding part of the network, namely the step of carrying out feature coding on the input image information, specifically comprises the following steps:
(1) Process the input image to a uniform size of 256×256, take a 50-layer residual network (ResNet-50) as the encoding part of the proposed network, and, following the 5-stage structure of ResNet-50, decompose it into a preprocessing unit plus 4 residual blocks;
(2) Feed the uniformly sized image into the ResNet-50 structure; after a series of convolution, batch normalization, pooling and ReLU operations, each residual block outputs a feature map with the following resolutions: residual block 1, 32×32; residual block 2, 16×16; residual block 3, 16×16; residual block 4, 16×16, giving 4 residual-block output features in total;
the encoding part of the network, namely a process of fusing the characteristics generated in the step 1 by using a multi-scale fusion module, specifically comprises the following steps:
(1) The feature output by the residual block 1 with the feature resolution of 32×32 is subjected to 3×3 convolution, the convolution step length of the convolution is 1, and a feature map with the feature resolution of 32×32 is obtained and is recorded as a feature map 1;
(2) The feature output by the residual block 2 with the resolution of 16×16 is subjected to 3×3 expansion convolution, the convolution step length of the expansion convolution is 1, the expansion rate is 6, and a feature map with the resolution of 16×16 is obtained and is recorded as a feature map 2;
(3) The feature output by the residual block 3 with the resolution of 16×16 is subjected to 3×3 expansion convolution, the convolution step length of the expansion convolution is 1, the expansion rate is 12, and a feature map with the resolution of 16×16 is obtained and is recorded as a feature map 3;
(4) The feature output by the residual block 4 with the resolution of 16×16 is subjected to 3×3 expansion convolution, the convolution step length of the expansion convolution is 1, the expansion rate is 18, and a feature map with the resolution of 16×16 is obtained and is recorded as a feature map 4;
(5) The feature output by the residual block 4 with the output feature resolution of 16×16 is subjected to global average pooling layer and then is subjected to convolution of 3×3, the convolution step length of the convolution is 1, and a feature map of 16×16 is obtained and is recorded as a feature map 5;
(6) Respectively carrying out up-sampling on the feature map 2, the feature map 3, the feature map 4 and the feature map 5 by 2 times to generate a feature map 2a, a feature map 3a, a feature map 4a and a feature map 5a;
(7) Cascading the feature map 1 with the feature map 2a, the feature map 3a, the feature map 4a and the feature map 5a to obtain a feature cascading diagram A;
(8) The feature cascade diagram A is subjected to convolution of 1 multiplied by 1, the convolution step length of the convolution is 1, a feature diagram of 32 multiplied by 32 is obtained, and the feature cascade diagram A is recorded as a feature fusion diagram B;
(9) The feature fusion diagram B is subjected to double up-sampling to generate a feature fusion diagram C with the resolution of 64 multiplied by 64;
(10) The feature map with the resolution of 64 multiplied by 64 output by the second residual unit in the residual block 1 is subjected to convolution of 1 multiplied by 1, the convolution step length of the convolution is 1, a feature map with the resolution of 64 multiplied by 64 is obtained, and the feature map is cascaded with a feature fusion map C with the resolution of 64 multiplied by 64, so that a feature fusion map D is obtained;
(11) The feature fusion map D is subjected to 3×3 convolution, the convolution step length of the convolution is 1, a 64×64 feature map is obtained, and the feature map is subjected to 4-time up-sampling to generate a 256×256 detection map.
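The resolution bookkeeping in the steps above can be verified with the standard convolution output-size formula. The sketch below is pure Python; the padding values are assumptions, since the claim does not state them, but padding equal to the dilation rate is what makes a 3×3 dilated convolution resolution-preserving, and padding 1 does the same for the final 3×3 convolution.

```python
import math

def conv2d_out(size, kernel, stride=1, pad=0, dilation=1):
    """Output size of a convolution along one spatial dimension."""
    return math.floor((size + 2 * pad - dilation * (kernel - 1) - 1) / stride) + 1

# Steps (2)-(4): 3x3 dilated convolutions with stride 1 preserve the 16x16
# resolution when padding equals the dilation rate (assumed here).
for rate in (6, 12, 18):
    assert conv2d_out(16, kernel=3, stride=1, pad=rate, dilation=rate) == 16

# Steps (6)-(9): 2x upsample of the 16x16 maps, concatenation with the
# 32x32 map, 1x1 convolution, then another 2x upsample.
size_a = 16 * 2                        # maps 2a-5a and concatenation map A: 32
size_b = conv2d_out(size_a, kernel=1)  # feature fusion map B: 32
size_c = size_b * 2                    # feature fusion map C: 64

# Steps (10)-(11): concatenate with the 64x64 skip feature, 3x3 convolution
# (padding 1 assumed), then 4x upsample to the detection map.
size_d = conv2d_out(size_c, kernel=3, stride=1, pad=1)  # 64
detection = size_d * 4                                   # 256
print(size_c, size_d, detection)
```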
2. The remote sensing image cloud and snow detection method based on a convolutional neural network according to claim 1, wherein a combination of cross-entropy loss and mean squared error loss is used as the loss function, the proposed network is trained with this combined loss function, and network performance is evaluated using pixel accuracy and mean intersection over union (mIoU).
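As a concrete illustration of the combined loss in claim 2, a minimal pure-Python sketch for a per-pixel probability formulation follows. The binary (single-class) setting and the equal weighting of the two terms are assumptions; the claim specifies neither.

```python
import math

def cross_entropy(pred, label, eps=1e-12):
    """Per-pixel binary cross-entropy, averaged over pixels."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(pred, label)) / len(pred)

def mse(pred, label):
    """Per-pixel mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)

def combined_loss(pred, label, alpha=1.0):
    # alpha balances the two terms; equal weighting (alpha = 1) is an
    # assumption, as the claim does not give the weighting.
    return cross_entropy(pred, label) + alpha * mse(pred, label)

pred, label = [0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]
print(round(combined_loss(pred, label), 4))
```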
3. The remote sensing image cloud and snow detection method based on a convolutional neural network according to claim 1, wherein the basic depth features comprise both global and local information and are obtained by fusing the output features of different convolutional layers.
4. The remote sensing image cloud and snow detection method based on a convolutional neural network according to claim 1, wherein training the proposed network specifically comprises the following steps:
(1) Computing the cross-entropy loss and mean squared error loss between the predicted detection map and the annotated detection map, and updating the weights with the back-propagation algorithm;
(2) After training is complete, measuring the network's prediction performance using pixel accuracy and mean intersection over union (mIoU).
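The two evaluation metrics in step (2) can be sketched in a few lines of pure Python over flattened per-pixel class labels. The three-class labeling (background/cloud/snow) and the choice to skip classes absent from both maps are illustrative assumptions.

```python
def pixel_accuracy(pred, label):
    """Fraction of pixels whose predicted class matches the label."""
    return sum(p == l for p, l in zip(pred, label)) / len(pred)

def mean_iou(pred, label, num_classes):
    """Mean Intersection over Union, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, l in zip(pred, label) if p == c and l == c)
        union = sum(1 for p, l in zip(pred, label) if p == c or l == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Flattened 6-pixel example: 0 = background, 1 = cloud, 2 = snow (assumed labels).
pred  = [0, 0, 1, 1, 2, 2]
label = [0, 0, 1, 2, 2, 2]
print(pixel_accuracy(pred, label))         # 5/6
print(round(mean_iou(pred, label, 3), 4))  # (1 + 1/2 + 2/3) / 3
```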
CN201911350284.6A 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network Active CN111079683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911350284.6A CN111079683B (en) 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911350284.6A CN111079683B (en) 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111079683A CN111079683A (en) 2020-04-28
CN111079683B true CN111079683B (en) 2023-12-12

Family

ID=70317348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911350284.6A Active CN111079683B (en) 2019-12-24 2019-12-24 Remote sensing image cloud and snow detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111079683B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553289A (en) * 2020-04-29 2020-08-18 中国科学院空天信息创新研究院 Remote sensing image cloud detection method and system
CN111611932B (en) * 2020-05-22 2023-07-11 哈尔滨工业大学(深圳) Remote sensing image cloud detection method, terminal and storage medium based on full convolution network
CN111627012B (en) * 2020-05-28 2021-12-21 华北电力大学(保定) Deep neural network surface defect detection method based on feature fusion
CN111883181A (en) * 2020-06-30 2020-11-03 海尔优家智能科技(北京)有限公司 Audio detection method and device, storage medium and electronic device
CN112418165B (en) * 2020-12-07 2023-04-07 武汉工程大学 Small-size target detection method and device based on improved cascade neural network
CN112597882A (en) * 2020-12-22 2021-04-02 自然资源部国土卫星遥感应用中心 Remote sensing image snow detection method based on deep convolutional neural network
CN112734642B (en) * 2021-01-12 2023-03-10 武汉工程大学 Remote sensing satellite super-resolution method and device of multi-scale texture transfer residual error network
CN113240589A (en) * 2021-04-01 2021-08-10 重庆兆光科技股份有限公司 Image defogging method and system based on multi-scale feature fusion
CN113936204B (en) * 2021-11-22 2023-04-07 安徽师范大学 High-resolution remote sensing image cloud and snow identification method and device fusing terrain data and deep neural network
CN114821121B (en) * 2022-05-09 2023-02-03 盐城工学院 Image classification method based on RGB three-component grouping attention weighted fusion
CN115546076B (en) * 2022-12-05 2023-04-07 耕宇牧星(北京)空间科技有限公司 Remote sensing image thin cloud removing method based on convolutional network
CN116229272B (en) * 2023-03-14 2023-10-31 中国人民解放军陆军军事交通学院镇江校区 High-precision remote sensing image detection method and system based on representative point representation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018045269A1 (en) * 2016-09-02 2018-03-08 Ohio State Innovation Foundation System and method of otoscopy image analysis to diagnose ear pathology
CN108596184A (en) * 2018-04-25 2018-09-28 清华大学深圳研究生院 Training method, readable storage medium storing program for executing and the electronic equipment of image, semantic parted pattern
CN108710830A (en) * 2018-04-20 2018-10-26 浙江工商大学 A kind of intensive human body 3D posture estimation methods for connecting attention pyramid residual error network and equidistantly limiting of combination
CN108846446A (en) * 2018-07-04 2018-11-20 国家新闻出版广电总局广播科学研究院 The object detection method of full convolutional network is merged based on multipath dense feature
CN109190752A (en) * 2018-07-27 2019-01-11 国家新闻出版广电总局广播科学研究院 The image, semantic dividing method of global characteristics and local feature based on deep learning
CN109685842A (en) * 2018-12-14 2019-04-26 电子科技大学 A kind of thick densification method of sparse depth based on multiple dimensioned network
CN110110719A (en) * 2019-03-27 2019-08-09 浙江工业大学 A kind of object detection method based on attention layer region convolutional neural networks
CN110119728A (en) * 2019-05-23 2019-08-13 哈尔滨工业大学 Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN110163836A (en) * 2018-11-14 2019-08-23 宁波大学 Based on deep learning for the excavator detection method under the inspection of high-altitude
CN110188720A (en) * 2019-06-05 2019-08-30 上海云绅智能科技有限公司 A kind of object detection method and system based on convolutional neural networks
CN110276316A (en) * 2019-06-26 2019-09-24 电子科技大学 A kind of human body critical point detection method based on deep learning
CN110543890A (en) * 2019-07-22 2019-12-06 杭州电子科技大学 Deep neural network image matching method based on characteristic pyramid

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133034A1 (en) * 2017-01-20 2018-07-26 Intel Corporation Dynamic emotion recognition in unconstrained scenarios


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tan Mingming, Fan Yingle, Wu Wei, She Qingshan, Gan Haitao. Contour perception with multi-path convolutional neural networks. Journal of Image and Graphics, 2019, Vol. 24, No. 10, pp. 1750-1760. *

Also Published As

Publication number Publication date
CN111079683A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
CN111079683B (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN111738124B (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN111428718B (en) Natural scene text recognition method based on image enhancement
CN110598600A (en) Remote sensing image cloud detection method based on UNET neural network
CN109871749B (en) Pedestrian re-identification method and device based on deep hash and computer system
CN111242377A (en) Short-term wind speed prediction method integrating deep learning and data denoising
CN116152611B (en) Multistage multi-scale point cloud completion method, system, equipment and storage medium
CN116758544B (en) Wafer code recognition system based on image processing
CN114140831B (en) Human body posture estimation method and device, electronic equipment and storage medium
CN117523645B (en) Face key point detection method and device, electronic equipment and storage medium
CN113139618B (en) Robustness-enhanced classification method and device based on integrated defense
CN114202473A (en) Image restoration method and device based on multi-scale features and attention mechanism
CN115984714B (en) Cloud detection method based on dual-branch network model
CN111209886B (en) Rapid pedestrian re-identification method based on deep neural network
CN115953394B (en) Ocean mesoscale vortex detection method and system based on target segmentation
CN116630610A (en) ROI region extraction method based on semantic segmentation model and conditional random field
CN115995040A (en) SAR image small sample target recognition method based on multi-scale network
CN114463734A (en) Character recognition method and device, electronic equipment and storage medium
CN108154107B (en) Method for determining scene category to which remote sensing image belongs
CN112560719A (en) High-resolution image water body extraction method based on multi-scale convolution-multi-core pooling
CN117274971B (en) Image processing method applied to water meter data extraction and electronic equipment
CN112597882A (en) Remote sensing image snow detection method based on deep convolutional neural network
CN116129280B (en) Method for detecting snow in remote sensing image
CN117237644B (en) Forest residual fire detection method and system based on infrared small target detection
CN113705489B (en) Remote sensing image fine-granularity airplane identification method based on priori regional knowledge guidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant