CN111985487A - Remote sensing image target extraction method, electronic equipment and storage medium - Google Patents

Remote sensing image target extraction method, electronic equipment and storage medium

Info

Publication number
CN111985487A
CN111985487A (application no. CN202010899790.7A)
Authority
CN
China
Prior art keywords
layer
convolution
image
convolutional
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010899790.7A
Other languages
Chinese (zh)
Other versions
CN111985487B (en)
Inventor
张效康
潘文安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong CUHK
Original Assignee
Chinese University of Hong Kong CUHK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong CUHK filed Critical Chinese University of Hong Kong CUHK
Priority to CN202010899790.7A priority Critical patent/CN111985487B/en
Publication of CN111985487A publication Critical patent/CN111985487A/en
Application granted granted Critical
Publication of CN111985487B publication Critical patent/CN111985487B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image target extraction method comprising the following steps: performing object-oriented segmentation on the acquired post-event remote sensing image to obtain current image objects; extracting pre-event and post-event shallow features of the current image objects; inputting the post-event shallow features of the current image objects and the post-event image into a fully convolutional neural network model, which is constructed to integrate object-oriented shallow features with image depth features and trained on selected samples, to obtain post-event image depth features; inputting the pre-event shallow features of the current image objects and the pre-event image into the fully convolutional neural network model using transfer learning to obtain pre-event image depth features; performing change vector analysis on the pre-event and post-event image depth features to generate a change intensity feature map; and dividing each pixel of the remote sensing image into target and non-target pixels to complete target extraction. The method extracts target features more finely and reduces noise.

Description

Remote sensing image target extraction method, electronic equipment and storage medium
Technical Field
The present invention relates to the technical field of image processing, and in particular to a remote sensing image target extraction method, an electronic device, and a storage medium.
Background
Automatic extraction and detection of ground-object targets (such as landslides, floods, mountain fires, and house collapses) from pre-event and post-event high-resolution remote sensing images has become an effective means of disaster monitoring and emergency rescue. At present, fully convolutional neural network models are widely applied to remote sensing image classification, target detection, and semantic segmentation.
However, although such models extract high-level semantic features, they also lose much detail: as the hierarchy deepens, the learned features become increasingly abstract, so existing fully convolutional neural network models have difficulty capturing the accurate contours of different objects at the pixel level, and their target extraction results contain considerable noise.
Disclosure of Invention
The main purpose of the present invention is to provide a remote sensing image target extraction method, an electronic device, and a storage medium, so as to solve the technical problems that existing fully convolutional neural networks have difficulty capturing the accurate contours of different objects at the pixel level and produce noisy target extraction results.
To achieve the above object, a first aspect of the present invention provides a remote sensing image target extraction method, including: performing object-oriented segmentation on the acquired current remote sensing image to obtain current image objects; extracting pre-event and post-event shallow features of the current image objects; inputting the post-event shallow features and the post-event image of the current image objects into a pre-trained fully convolutional neural network model to obtain the post-event image depth features output by the model, the model having been trained with original remote sensing images and ground reference data; inputting the pre-event shallow features and the pre-event image of the current image objects into the fully convolutional neural network model using transfer learning to obtain the pre-event image depth features output by the model; performing change vector analysis on the pre-event and post-event image depth features to generate a change intensity feature map; and dividing each pixel of the remote sensing image into target pixels and non-target pixels with an unsupervised K-means clustering method to complete target extraction.
Further, the fully convolutional neural network model includes: 17 convolutional layers, 4 max-pooling layers, 4 transposed convolutional layers, 4 cascade (concatenation) layers, and 1 object feature layer; 15 of the convolutional layers use 3×3 kernels and the other two use 1×1 kernels, the max-pooling layers use a 2×2 sampling window, and the transposed convolutional layers use a 2×2 window.
Further, the fully convolutional neural network model is constructed in the following manner: a first convolutional layer, a second convolutional layer, a first max-pooling layer, a third convolutional layer, a fourth convolutional layer, a second max-pooling layer, a fifth convolutional layer, a sixth convolutional layer, a third max-pooling layer, a seventh convolutional layer, an eighth convolutional layer, a fourth max-pooling layer, a ninth convolutional layer, a first transposed convolutional layer, a first cascade layer, a tenth convolutional layer, an eleventh convolutional layer, a second transposed convolutional layer, a second cascade layer, a twelfth convolutional layer, a thirteenth convolutional layer, a third transposed convolutional layer, a third cascade layer, a fourteenth convolutional layer, a fifteenth convolutional layer, and a fourth transposed convolutional layer are arranged in sequence and pass data in that order. The fourth transposed convolutional layer and the object feature layer jointly pass data to the fourth cascade layer, which is followed in sequence by the sixteenth and seventeenth convolutional layers; the seventeenth convolutional layer is the output of the fully convolutional neural network model and the first convolutional layer is its input. In addition, the fourth convolutional layer also passes data to the third cascade layer, the sixth convolutional layer passes data directly to the second cascade layer, and the eighth convolutional layer passes data to the first cascade layer. The convolution kernels of the sixteenth and seventeenth convolutional layers are 1×1. The 17 convolutional layers, 4 max-pooling layers, 4 transposed convolutional layers, 4 cascade layers, and 1 object feature layer each have their own dimension: the first and second convolutional layers are 16, the third and fourth 32, the fifth and sixth 64, the seventh and eighth 128, the ninth and tenth 128, the eleventh and twelfth 64, the thirteenth and fourteenth 32, the fifteenth and sixteenth 16, and the seventeenth 2; the first to fourth max-pooling layers are 16, 32, 64, and 128; the first to fourth cascade layers are 256, 128, 64, and 31; the first to fourth transposed convolutional layers are 128, 64, 32, and 16; and the object feature layer is 15.
Further, the training method of the fully convolutional neural network model is as follows: randomly crop 400 sample remote sensing images of 160×160 pixels, together with sample reference data, from the original remote sensing images and ground reference data; transpose and rotate the sample reference data to form training samples; perform object-oriented segmentation on the training samples to obtain sample image objects; extract the pre-event and post-event shallow features of the sample image objects and construct the object feature layer from them; and input the training samples into the fully convolutional neural network model for training, where the optimization function of the model is the adaptive moment estimation (Adam) algorithm, the learning rate is 1×10⁻⁴, the loss function is binary cross entropy, and the batch size is 150.
Further, the shallow features of the current image and the sample images include: the gray-level co-occurrence matrix (GLCM) texture features of the image, namely homogeneity, contrast, and angular second moment, in the four directions 0°, 45°, 90°, and 135°.
Further, the object-oriented segmentation method includes: using a fractal net evolution approach, a watershed segmentation algorithm, or a mean-shift segmentation algorithm.
A third aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any one of the remote sensing image target extraction methods described above.
A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the remote sensing image target extraction method described above.
The remote sensing image target extraction method, electronic device, and storage medium provided by the invention have the following beneficial effects: a fully convolutional neural network model with a new structure is trained with original remote sensing images and ground reference data, and extraction of shallow features such as spectrum and texture is fused with the deep feature extraction of existing fully convolutional networks, so that target features are extracted more finely, the detail information of the target object is retained, and noise in the target extraction result is reduced.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method for extracting a target from a remote sensing image according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of the structure of the fully convolutional neural network model used in the remote sensing image target extraction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of the structure of an electronic device according to an embodiment of the invention.
Detailed Description
To make the objects, features, and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, a remote sensing image target extraction method includes: S1, performing object-oriented segmentation on the acquired current remote sensing image to obtain current image objects; S2, extracting pre-event and post-event shallow features of the current image objects; S3, inputting the post-event shallow features and the post-event image of the current image objects into a pre-trained fully convolutional neural network model to obtain the post-event image depth features output by the model, the model having been trained with original remote sensing images and ground reference data; S4, inputting the pre-event shallow features and the pre-event image of the current image objects into the fully convolutional neural network model using transfer learning to obtain the pre-event image depth features output by the model; S5, performing change vector analysis on the pre-event and post-event image depth features to generate a change intensity feature map; and S6, dividing each pixel of the remote sensing image into target pixels and non-target pixels with an unsupervised K-means clustering method to complete target extraction.
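For steps S5 and S6, change vector analysis reduces to the per-pixel Euclidean norm of the depth-feature difference vector, and K-means with k = 2 then separates target from non-target pixels. The sketch below illustrates this, assuming the pre-event and post-event depth features are available as (H, W, C) NumPy arrays; the function names and the rule that the higher-intensity cluster is the target class are illustrative assumptions, not taken from the patent.

```python
# Sketch of steps S5-S6: change vector analysis (CVA) followed by
# unsupervised K-means clustering (names and cluster rule assumed).
import numpy as np
from sklearn.cluster import KMeans

def change_intensity(feat_pre: np.ndarray, feat_post: np.ndarray) -> np.ndarray:
    """CVA: per-pixel Euclidean norm of the depth-feature difference vector."""
    diff = feat_post.astype(np.float64) - feat_pre.astype(np.float64)
    return np.sqrt((diff ** 2).sum(axis=-1))  # (H, W) change intensity map

def extract_target_mask(intensity: np.ndarray) -> np.ndarray:
    """Split pixels into target / non-target with k=2 K-means."""
    flat = intensity.reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(flat)
    # Assumption: the cluster with the higher mean change intensity is the target.
    means = [flat[labels == k].mean() for k in (0, 1)]
    return (labels == int(np.argmax(means))).reshape(intensity.shape)
```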
A fully convolutional neural network model trained with original remote sensing images and ground reference data adds shallow-feature extraction on top of the deep semantic feature extraction of existing fully convolutional networks, so that target features are extracted more finely and noise in the target extraction result is reduced.
Referring to FIG. 2, the fully convolutional neural network model includes: 17 convolutional layers, 4 max-pooling layers, 4 transposed convolutional layers, 4 cascade (concatenation) layers, and 1 object feature layer; 15 of the convolutional layers use 3×3 kernels and the other two use 1×1 kernels, the max-pooling layers use a 2×2 sampling window, and the transposed convolutional layers use a 2×2 window.
The first to seventeenth convolutional layers correspond to convolutional layers 1 to 17 in FIG. 2, the first to fourth max-pooling layers to max-pooling layers 1 to 4, the first to fourth transposed convolutional layers to transposed convolutional layers 1 to 4, and the first to fourth cascade layers to cascade layers 1 to 4; the numbers in parentheses in FIG. 2 denote dimensions.
The fully convolutional neural network model is constructed as follows: a first convolutional layer, a second convolutional layer, a first max-pooling layer, a third convolutional layer, a fourth convolutional layer, a second max-pooling layer, a fifth convolutional layer, a sixth convolutional layer, a third max-pooling layer, a seventh convolutional layer, an eighth convolutional layer, a fourth max-pooling layer, a ninth convolutional layer, a first transposed convolutional layer, a first cascade layer, a tenth convolutional layer, an eleventh convolutional layer, a second transposed convolutional layer, a second cascade layer, a twelfth convolutional layer, a thirteenth convolutional layer, a third transposed convolutional layer, a third cascade layer, a fourteenth convolutional layer, a fifteenth convolutional layer, and a fourth transposed convolutional layer are arranged in sequence and pass data in that order. The fourth transposed convolutional layer and the object feature layer jointly pass data to the fourth cascade layer, which is followed in sequence by the sixteenth and seventeenth convolutional layers; the seventeenth convolutional layer is the output of the fully convolutional neural network model and the first convolutional layer is its input. In addition, the fourth convolutional layer also passes data to the third cascade layer, the sixth convolutional layer passes data directly to the second cascade layer, and the eighth convolutional layer passes data to the first cascade layer.
The convolution kernels of the first to fifteenth convolutional layers are 3×3, and those of the sixteenth and seventeenth convolutional layers are 1×1.
The 17 convolutional layers, 4 max-pooling layers, 4 transposed convolutional layers, 4 cascade layers, and 1 object feature layer each have their own dimension, as follows.
The dimensions of the first convolutional layer and the second convolutional layer are 16, the dimensions of the third convolutional layer and the fourth convolutional layer are 32, the dimensions of the fifth convolutional layer and the sixth convolutional layer are 64, the dimensions of the seventh convolutional layer and the eighth convolutional layer are 128, the dimensions of the ninth convolutional layer and the tenth convolutional layer are 128, the dimensions of the eleventh convolutional layer and the twelfth convolutional layer are 64, the dimensions of the thirteenth convolutional layer and the fourteenth convolutional layer are 32, the dimensions of the fifteenth convolutional layer and the sixteenth convolutional layer are 16, and the dimension of the seventeenth convolutional layer is 2.
The first max-pooling layer has a dimension of 16, the second 32, the third 64, and the fourth 128.
The first cascade layer has a dimension of 256, the second 128, the third 64, and the fourth 31.
The first transposed convolutional layer has a dimension of 128, the second transposed convolutional layer has a dimension of 64, the third transposed convolutional layer has a dimension of 32, and the fourth transposed convolutional layer has a dimension of 16.
The dimension of the object feature layer is 15.
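To make the topology concrete, the following Keras sketch wires up the layer sequence and skip connections described above. The patent does not specify activations, padding, or the number of input bands; ReLU activations, 'same' padding, and a 3-band input are assumptions, while the layer dimensions, window sizes, and the 31-channel fourth cascade layer (16 + 15) follow the text.

```python
# Sketch of the described topology in Keras. ReLU, 'same' padding, and a
# 3-band input are assumptions; dimensions and window sizes follow the text.
from tensorflow.keras import layers, Model

def conv3(x, dim):
    """3x3 convolution block (convolutional layers 1-15)."""
    return layers.Conv2D(dim, 3, padding="same", activation="relu")(x)

def build_model(h=160, w=160, bands=3, obj_dim=15):
    img = layers.Input((h, w, bands), name="image")
    obj = layers.Input((h, w, obj_dim), name="object_features")  # shallow object features

    c2 = conv3(conv3(img, 16), 16)                            # conv 1-2 (16)
    c4 = conv3(conv3(layers.MaxPooling2D(2)(c2), 32), 32)     # pool 1, conv 3-4 (32)
    c6 = conv3(conv3(layers.MaxPooling2D(2)(c4), 64), 64)     # pool 2, conv 5-6 (64)
    c8 = conv3(conv3(layers.MaxPooling2D(2)(c6), 128), 128)   # pool 3, conv 7-8 (128)
    c9 = conv3(layers.MaxPooling2D(2)(c8), 128)               # pool 4, conv 9 (128)

    u1 = layers.Conv2DTranspose(128, 2, strides=2)(c9)        # transposed conv 1
    x = layers.Concatenate()([u1, c8])                        # cascade 1 (256)
    x = conv3(conv3(x, 128), 64)                              # conv 10 (128), conv 11 (64)
    u2 = layers.Conv2DTranspose(64, 2, strides=2)(x)          # transposed conv 2
    x = layers.Concatenate()([u2, c6])                        # cascade 2 (128)
    x = conv3(conv3(x, 64), 32)                               # conv 12 (64), conv 13 (32)
    u3 = layers.Conv2DTranspose(32, 2, strides=2)(x)          # transposed conv 3
    x = layers.Concatenate()([u3, c4])                        # cascade 3 (64)
    x = conv3(conv3(x, 32), 16)                               # conv 14 (32), conv 15 (16)
    u4 = layers.Conv2DTranspose(16, 2, strides=2)(x)          # transposed conv 4
    x = layers.Concatenate()([u4, obj])                       # cascade 4 (16 + 15 = 31)
    x = layers.Conv2D(16, 1, activation="relu")(x)            # conv 16 (1x1, 16)
    out = layers.Conv2D(2, 1, activation="softmax")(x)        # conv 17 (1x1, 2)
    return Model([img, obj], out)
```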
The fully convolutional neural network model is trained as follows: randomly crop 400 sample remote sensing images of 160×160 pixels, together with sample reference data, from the original remote sensing images and ground reference data; transpose and rotate the sample reference data to form training samples; perform object-oriented segmentation on the training samples to obtain sample image objects; extract the pre-event and post-event shallow features of the sample image objects and construct the object feature layer from them; and input the training samples into the fully convolutional neural network model for training. The optimization function of the model is the adaptive moment estimation (Adam) algorithm, the learning rate is 1×10⁻⁴, the loss function is binary cross entropy, and the batch size is 150.
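A minimal training sketch under the stated configuration (Adam, learning rate 1×10⁻⁴, binary cross entropy, batch size 150) might look as follows; build_model is the architecture sketch above, the random arrays merely stand in for the 400 cropped 160×160 sample patches and their reference labels, and the epoch count is an assumption.

```python
# Training sketch under the stated configuration; random arrays stand in
# for the 400 cropped 160x160 patches, and the epoch count is assumed.
import numpy as np
import tensorflow as tf

model = build_model()  # architecture sketch above
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

x_img = np.random.rand(400, 160, 160, 3).astype("float32")   # image patches
x_obj = np.random.rand(400, 160, 160, 15).astype("float32")  # object feature layer
y_fg = np.random.randint(0, 2, (400, 160, 160, 1)).astype("float32")
y = np.concatenate([1.0 - y_fg, y_fg], axis=-1)              # two-class ground reference

model.fit([x_img, x_obj], y, batch_size=150, epochs=20)      # epochs assumed
```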
The shallow features of the current image and the sample images include: the gray-level co-occurrence matrix (GLCM) texture features of the image, namely homogeneity, contrast, and angular second moment, in the four directions 0°, 45°, 90°, and 135°.
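These GLCM textures can be computed, for example, with scikit-image, as in the sketch below; the distance offset (1 pixel) and the number of gray levels (32) are assumptions, while the three properties and four directions follow the text (scikit-image calls the angular second moment 'ASM').

```python
# GLCM texture sketch with scikit-image; distance 1 and 32 gray levels
# are assumptions, the properties and directions follow the text.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_patch: np.ndarray, levels: int = 32) -> np.ndarray:
    """Return 12 values: {homogeneity, contrast, ASM} x {0, 45, 90, 135} deg."""
    q = (gray_patch.astype(float) / gray_patch.max() * (levels - 1)).astype(np.uint8)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # 0, 45, 90, 135 degrees
    glcm = graycomatrix(q, distances=[1], angles=angles,
                        levels=levels, symmetric=True, normed=True)
    props = ["homogeneity", "contrast", "ASM"]  # ASM = angular second moment
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```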
The object-oriented segmentation method uses a fractal net evolution approach, a watershed segmentation algorithm, or a mean-shift segmentation algorithm.
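As one illustration of the watershed option, object segmentation can be sketched with scikit-image; the Sobel gradient, marker count, and compactness value below are assumptions rather than parameters from the patent.

```python
# Watershed option sketched with scikit-image; gradient choice, marker
# count, and compactness are assumptions, not parameters from the patent.
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import sobel
from skimage.segmentation import watershed

def segment_objects(rgb_image: np.ndarray, n_markers: int = 500) -> np.ndarray:
    """Label map of image objects from a watershed on the intensity gradient."""
    gradient = sobel(rgb2gray(rgb_image))
    # scikit-image accepts an integer as the desired number of seed markers.
    return watershed(gradient, markers=n_markers, compactness=0.001)
```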
By cascading shallow features with deep semantic features, the fully convolutional neural network model designed in this embodiment can effectively extract multi-level features from the remote sensing image. Because shallow object features are used, more high-resolution information of the original image, such as spectrum and texture, is propagated and fused in the deep feature space, enriching the high-resolution feature information; the model can therefore preserve the accurate boundary of the target object and effectively suppress noise inside and outside the object.
In addition, with respect to the completeness, correctness, quality, overall accuracy, and computation time of the target extraction results, an experiment was also conducted to demonstrate the superiority of this embodiment over the prior-art fully convolutional neural network model; the specific data are shown in Table 1:
TABLE 1
(Table 1 appears as an image in the original publication and cannot be reproduced here.)
Table 1 lists the parameters of the target extraction results of the prior-art fully convolutional neural network model and of this embodiment; as Table 1 shows, every parameter of the model provided in this embodiment of the application is superior to that of the prior-art fully convolutional neural network model.
In addition, with transfer learning the fully convolutional neural network model can be applied directly to depth feature extraction from remote sensing images of other temporal phases, so in practical applications the model needs no retraining or parameter tuning; this greatly improves target extraction efficiency for high-resolution remote sensing imagery, especially when continuously monitoring surface targets.
The remote sensing image target extraction method is suitable for automatic identification and continuous monitoring of targets such as deforestation, forest fires, landslides, and floods in high-resolution remote sensing images.
An embodiment of the present application further provides an electronic device. Referring to FIG. 3, the electronic device includes a memory 601, a processor 602, and a computer program stored in the memory 601 and executable on the processor 602; when the processor 602 executes the computer program, the remote sensing image target extraction method described above is implemented.
Further, the electronic device further includes: at least one input device 603 and at least one output device 604.
The memory 601, the processor 602, the input device 603, and the output device 604 are connected by a bus 605.
The input device 603 may be a camera, a touch panel, a physical button, a mouse, or the like. The output device 604 may be embodied as a display screen.
The memory 601 may be a high-speed random access memory (RAM) or a non-volatile memory, such as disk storage. The memory 601 is used to store a set of executable program code, and the processor 602 is coupled to the memory 601.
Further, an embodiment of the present application also provides a computer-readable storage medium, which may be disposed in the electronic device in the foregoing embodiments, and the computer-readable storage medium may be the memory 601 in the foregoing embodiments. The computer readable storage medium has stored thereon a computer program which, when executed by the processor 602, implements the method for extracting a target of a remote sensing image described in the foregoing method embodiments.
Further, the computer-readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, for simplicity of description, the above method embodiments are presented as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, as some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily all required by the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above describes the remote sensing image target extraction method, electronic device, and storage medium provided by the present invention. Those skilled in the art may vary the specific implementation and application scope according to the ideas of the embodiments of the present invention; in summary, the contents of this specification should not be construed as limiting the present invention.

Claims (8)

1. A remote sensing image target extraction method is characterized by comprising the following steps:
performing object-oriented segmentation on the acquired current remote sensing image to obtain current image objects;
extracting pre-event and post-event shallow features of the current image objects;
inputting the post-event shallow features and the post-event image of the current image objects into a pre-trained fully convolutional neural network model to obtain the post-event image depth features output by the fully convolutional neural network model, wherein the fully convolutional neural network model is trained with original remote sensing images and ground reference data;
inputting the pre-event shallow features and the pre-event image of the current image objects into the fully convolutional neural network model using transfer learning to obtain the pre-event image depth features output by the fully convolutional neural network model;
performing change vector analysis on the pre-event image depth features and the post-event image depth features to generate a change intensity feature map; and
dividing each pixel of the remote sensing image into target pixels and non-target pixels with an unsupervised K-means clustering method to complete target extraction.
2. The method for extracting a target from a remote sensing image according to claim 1,
the fully convolutional neural network model includes:
17 convolutional layers, 4 max-pooling layers, 4 transposed convolutional layers, 4 cascade layers, and 1 object feature layer;
wherein 15 of the convolutional layers use 3×3 kernels and the other two use 1×1 kernels, the max-pooling layers use a 2×2 sampling window, and the transposed convolutional layers use a 2×2 window.
3. The method for extracting a target from a remote sensing image according to claim 2,
the fully convolutional neural network model is constructed in the following manner:
a first convolutional layer, a second convolutional layer, a first max-pooling layer, a third convolutional layer, a fourth convolutional layer, a second max-pooling layer, a fifth convolutional layer, a sixth convolutional layer, a third max-pooling layer, a seventh convolutional layer, an eighth convolutional layer, a fourth max-pooling layer, a ninth convolutional layer, a first transposed convolutional layer, a first cascade layer, a tenth convolutional layer, an eleventh convolutional layer, a second transposed convolutional layer, a second cascade layer, a twelfth convolutional layer, a thirteenth convolutional layer, a third transposed convolutional layer, a third cascade layer, a fourteenth convolutional layer, a fifteenth convolutional layer, and a fourth transposed convolutional layer are arranged in sequence and pass data in that order; the fourth transposed convolutional layer and the object feature layer jointly pass data to the fourth cascade layer; the sixteenth and seventeenth convolutional layers are arranged in sequence thereafter and pass data; the seventeenth convolutional layer is the output of the fully convolutional neural network model and the first convolutional layer is its input; the fourth convolutional layer also passes data to the third cascade layer, the sixth convolutional layer passes data directly to the second cascade layer, and the eighth convolutional layer passes data to the first cascade layer;
the convolution kernels of the sixteenth and seventeenth convolutional layers are 1×1;
the 17 convolutional layers, 4 max-pooling layers, 4 transposed convolutional layers, 4 cascade layers, and 1 object feature layer each have their own dimension;
the dimensions of the first convolutional layer and the second convolutional layer are 16, the dimensions of the third convolutional layer and the fourth convolutional layer are 32, the dimensions of the fifth convolutional layer and the sixth convolutional layer are 64, the dimensions of the seventh convolutional layer and the eighth convolutional layer are 128, the dimensions of the ninth convolutional layer and the tenth convolutional layer are 128, the dimensions of the eleventh convolutional layer and the twelfth convolutional layer are 64, the dimensions of the thirteenth convolutional layer and the fourteenth convolutional layer are 32, the dimensions of the fifteenth convolutional layer and the sixteenth convolutional layer are 16, and the dimension of the seventeenth convolutional layer is 2;
the dimension of the first max-pooling layer is 16, the dimension of the second max-pooling layer is 32, the dimension of the third max-pooling layer is 64, and the dimension of the fourth max-pooling layer is 128;
the dimension of the first cascade layer is 256, the dimension of the second cascade layer is 128, the dimension of the third cascade layer is 64, and the dimension of the fourth cascade layer is 31;
the dimension of the first transposed convolutional layer is 128, the dimension of the second transposed convolutional layer is 64, the dimension of the third transposed convolutional layer is 32, and the dimension of the fourth transposed convolutional layer is 16;
the dimension of the object feature layer is 15.
4. The method for extracting a target from a remote sensing image according to claim 1,
the training method of the fully convolutional neural network model is as follows:
randomly cropping 400 sample remote sensing images of 160×160 pixels, together with sample reference data, from the original remote sensing images and ground reference data;
transposing and rotating the sample reference data to form training samples;
performing object-oriented segmentation on the training samples to obtain sample image objects;
extracting pre-event and post-event shallow features of the sample image objects, and constructing the object feature layer from the pre-event and post-event shallow features of the sample images;
inputting the training samples into the fully convolutional neural network model for training, wherein the optimization function of the fully convolutional neural network model is an adaptive moment estimation algorithm, the learning rate is 1×10⁻⁴, the loss function is binary cross entropy, and the batch size is 150.
5. The method for extracting a target from a remote sensing image according to claim 4,
the shallow features of the current image and the sample images include: the gray-level co-occurrence matrix (GLCM) texture features of the image, namely homogeneity, contrast, and angular second moment, in the four directions 0°, 45°, 90°, and 135°.
6. The method for extracting a target from a remote sensing image according to claim 1,
the object-oriented segmentation method comprises:
using a fractal net evolution approach, a watershed segmentation algorithm, or a mean-shift segmentation algorithm.
7. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 6 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 6.
CN202010899790.7A 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium Active CN111985487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010899790.7A CN111985487B (en) 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010899790.7A CN111985487B (en) 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111985487A true CN111985487A (en) 2020-11-24
CN111985487B CN111985487B (en) 2024-03-19

Family

ID=73446907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010899790.7A Active CN111985487B (en) 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111985487B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336431A1 (en) * 2017-05-16 2018-11-22 Nec Laboratories America, Inc. Pruning filters for efficient convolutional neural networks for image recognition of environmental hazards
CN109389051A (en) * 2018-09-20 2019-02-26 华南农业大学 A kind of building remote sensing images recognition methods based on convolutional neural networks
CN109801293A (en) * 2019-01-08 2019-05-24 平安科技(深圳)有限公司 Remote Sensing Image Segmentation, device and storage medium, server
WO2020244261A1 (en) * 2019-06-05 2020-12-10 中国科学院长春光学精密机械与物理研究所 Scene recognition system for high-resolution remote sensing image, and model generation method
CN111476170A (en) * 2020-04-09 2020-07-31 首都师范大学 Remote sensing image semantic segmentation method combining deep learning and random forest
US20230215166A1 (en) * 2021-12-30 2023-07-06 Wuhan University Few-shot urban remote sensing image information extraction method based on meta learning and attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIDING YANG et al., "M-FCN: Effective Fully Convolutional Network-Based Airplane Detection Framework," IEEE Geoscience and Remote Sensing Letters, pp. 1-5 *
李文斌 (LI Wenbin) et al., "Aircraft target detection in remote sensing images based on deep neural network" (基于深度神经网络的遥感图像飞机目标检测), Computer Engineering (《计算机工程》), vol. 46, no. 7, pp. 268-276 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408462A (en) * 2021-06-29 2021-09-17 西南交通大学 Landslide remote sensing information extraction method based on convolutional neural network and classification thermodynamic diagram
CN113723464A (en) * 2021-08-02 2021-11-30 北京大学 Remote sensing image classification method and device
CN113723464B (en) * 2021-08-02 2023-10-03 北京大学 Remote sensing image classification method and device

Also Published As

Publication number Publication date
CN111985487B (en) 2024-03-19

Similar Documents

Publication Publication Date Title
Lei et al. Coupled adversarial training for remote sensing image super-resolution
CN112750140B (en) Information mining-based disguised target image segmentation method
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN111553406B (en) Target detection system, method and terminal based on improved YOLO-V3
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN111462126A (en) Semantic image segmentation method and system based on edge enhancement
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
US20220222918A1 (en) Image retrieval method and apparatus, storage medium, and device
CN111325271B (en) Image classification method and device
Zhang et al. CNN cloud detection algorithm based on channel and spatial attention and probabilistic upsampling for remote sensing image
CN109299303B (en) Hand-drawn sketch retrieval method based on deformable convolution and depth network
CN108932455B (en) Remote sensing image scene recognition method and device
CN110532413B (en) Information retrieval method and device based on picture matching and computer equipment
CN112989995B (en) Text detection method and device and electronic equipment
Feng et al. Bag of visual words model with deep spatial features for geographical scene classification
CN112329771B (en) Deep learning-based building material sample identification method
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN111985487B (en) Remote sensing image target extraction method, electronic equipment and storage medium
CN113095333A (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN110135435B (en) Saliency detection method and device based on breadth learning system
Zhang et al. An image denoising method based on BM4D and GAN in 3D shearlet domain
CN113011506B (en) Texture image classification method based on deep fractal spectrum network
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN105844299B (en) A kind of image classification method based on bag of words
CN107423739A (en) Image characteristic extracting method and device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant