CN111985487B - Remote sensing image target extraction method, electronic equipment and storage medium

Remote sensing image target extraction method, electronic equipment and storage medium

Info

Publication number
CN111985487B
Authority
CN
China
Prior art keywords
layer
convolution
image
layers
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010899790.7A
Other languages
Chinese (zh)
Other versions
CN111985487A (en)
Inventor
张效康
潘文安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202010899790.7A
Publication of CN111985487A
Application granted
Publication of CN111985487B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Abstract

The invention discloses a remote sensing image target extraction method comprising the following steps: performing object-oriented segmentation on the post-event remote sensing image to obtain current image objects; extracting the pre-event and post-event shallow features of the current image objects; inputting the post-event shallow features of the current image objects and the post-event image into a fully convolutional neural network model, where the constructed network integrates the object-oriented shallow features with deep image features and samples are selected for model training, to obtain the deep features of the post-event image; using transfer learning, inputting the pre-event shallow features of the current image objects and the pre-event image into the fully convolutional neural network model to obtain the deep features of the pre-event image; performing change vector analysis on the deep features of the pre-event image and the deep features of the post-event image to generate a change intensity feature map; and dividing each pixel of the remote sensing image into target and non-target pixels to complete target extraction. The method makes the extraction of target features finer and reduces noise.

Description

Remote sensing image target extraction method, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a remote sensing image target extraction method, an electronic device, and a storage medium.
Background
Automatically extracting and detecting ground object targets (such as landslides, floods, forest fires, and house collapses) from pre-event and post-event high-resolution remote sensing images has become an effective means of disaster monitoring and emergency rescue. At present, fully convolutional neural network models are widely applied to remote sensing image classification, target detection, and semantic segmentation.
However, while fully convolutional neural network models extract high-level semantic features, they also lose much detail: as the layers deepen, the learned features become more abstract, so existing fully convolutional neural network models struggle to capture the accurate outlines of different objects at the pixel level, and their target extraction results contain considerable noise.
Disclosure of Invention
The invention mainly aims to provide a remote sensing image target extraction method, an electronic device, and a storage medium, so as to solve the technical problem that existing fully convolutional neural networks are difficult to capture the accurate outlines of different objects at the pixel level and produce target extraction results with considerable noise.
In order to achieve the above object, a first aspect of the present invention provides a remote sensing image target extraction method, including: performing object-oriented segmentation on the acquired post-event remote sensing image to obtain current image objects; extracting the pre-event and post-event shallow features of each current image object; inputting the post-event shallow features of the current image objects and the post-event image into a pre-trained fully convolutional neural network model to obtain the deep features of the post-event image output by the model, wherein the model is trained using an original remote sensing image and ground reference data; using transfer learning, inputting the pre-event shallow features of the current image objects and the pre-event image into the same model to obtain the deep features of the pre-event image; performing change vector analysis on the deep features of the pre-event image and the deep features of the post-event image to generate a change intensity feature map; and dividing each pixel of the remote sensing image into target and non-target pixels using unsupervised K-means clustering to complete target extraction.
Further, the fully convolutional neural network model includes: 17 convolutional layers, 4 max pooling layers, 4 transposed convolutional layers, 4 concatenation layers, and 1 object feature layer; 15 of the convolutional layers have 3×3 kernels and the other two have 1×1 kernels, the max pooling layers use a 2×2 sampling window, and the transposed convolutional layers use a 2×2 window.
Further, the fully convolutional neural network model is constructed as follows: the first convolutional layer, the second convolutional layer, the first max pooling layer, the third convolutional layer, the fourth convolutional layer, the second max pooling layer, the fifth convolutional layer, the sixth convolutional layer, the third max pooling layer, the seventh convolutional layer, the eighth convolutional layer, the fourth max pooling layer, the ninth convolutional layer, the first transposed convolutional layer, the first concatenation layer, the tenth convolutional layer, the eleventh convolutional layer, the second transposed convolutional layer, the second concatenation layer, the twelfth convolutional layer, the thirteenth convolutional layer, the third transposed convolutional layer, the third concatenation layer, the fourteenth convolutional layer, the fifteenth convolutional layer, and the fourth transposed convolutional layer are arranged in sequence and pass data in that order; the fourth transposed convolutional layer and the object feature layer jointly pass data to the fourth concatenation layer; the fourth concatenation layer, the sixteenth convolutional layer, and the seventeenth convolutional layer are arranged in sequence and pass data in that order; the seventeenth convolutional layer is the output of the fully convolutional neural network model and the first convolutional layer is its input; in addition, the fourth convolutional layer also passes data to the third concatenation layer, the sixth convolutional layer also passes data to the second concatenation layer, and the eighth convolutional layer also passes data to the first concatenation layer; the kernel sizes of the sixteenth and seventeenth convolutional layers are 1×1; the 17 convolutional layers, the 4 max pooling layers, the 4 transposed convolutional layers, the 4 concatenation layers, and the 1 object feature layer have the following dimensions: the dimensions of the first and second convolutional layers are 16, those of the third and fourth are 32, those of the fifth and sixth are 64, those of the seventh and eighth are 128, those of the ninth and tenth are 128, those of the eleventh and twelfth are 64, those of the thirteenth and fourteenth are 32, those of the fifteenth and sixteenth are 16, and that of the seventeenth is 2; the dimension of the first max pooling layer is 16, that of the second is 32, that of the third is 64, and that of the fourth is 128; the dimension of the first concatenation layer is 256, that of the second is 128, that of the third is 64, and that of the fourth is 31; the dimension of the first transposed convolutional layer is 128, that of the second is 64, that of the third is 32, and that of the fourth is 16; the dimension of the object feature layer is 15.
Further, the fully convolutional neural network model is trained as follows: 400 sample remote sensing images of 160×160 pixels and the corresponding sample reference data are randomly cropped from the original remote sensing image and ground reference data; the sample images and sample reference data are transposed and rotated to form training samples; object-oriented segmentation is performed on the training samples to obtain sample image objects; the pre-event and post-event shallow features of the sample image objects are extracted, and the object feature layer is constructed from them; the training samples are input into the fully convolutional neural network model for training, with the adaptive moment estimation (Adam) algorithm as the optimization function, a learning rate of 1×10⁻⁴, binary cross entropy as the loss function, and a batch size of 150.
Further, the shallow features of the current image and the sample image include: the mean spectral value of the image in each band, and the homogeneity, contrast, and angular second moment texture features of the gray-level co-occurrence matrix in the four directions 0°, 45°, 90°, and 135°.
Further, the object-oriented segmentation uses the fractal net evolution algorithm, the watershed segmentation algorithm, or the mean shift segmentation algorithm.
A third aspect of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above remote sensing image target extraction method when executing the computer program.
A fourth aspect of the present invention provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the remote sensing image target extraction method according to any one of the above.
The remote sensing image target extraction method, electronic device, and storage medium provided by the invention have the following beneficial effects: a fully convolutional neural network model with a new structure is trained using the original remote sensing image and ground reference data, and the extraction of shallow features such as spectrum and texture is fused with the deep feature extraction of existing fully convolutional networks, so that target feature extraction is finer, the detailed information of target objects is preserved, and noise in the target extraction results is reduced.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the invention, and that a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a remote sensing image target extraction method according to an embodiment of the invention;
FIG. 2 is a schematic block diagram of a full convolution neural network model of a remote sensing image target extraction method according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features, and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described clearly below in conjunction with the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1, the remote sensing image target extraction method includes: S1, performing object-oriented segmentation on the acquired post-event remote sensing image to obtain current image objects; S2, extracting the pre-event and post-event shallow features of each current image object; S3, inputting the post-event shallow features of the current image objects and the post-event image into a pre-trained fully convolutional neural network model to obtain the deep features of the post-event image output by the model, wherein the model is trained using an original remote sensing image and ground reference data; S4, using transfer learning, inputting the pre-event shallow features of the current image objects and the pre-event image into the fully convolutional neural network model to obtain the deep features of the pre-event image; S5, performing change vector analysis on the deep features of the pre-event image and the deep features of the post-event image to generate a change intensity feature map; S6, dividing each pixel of the remote sensing image into target and non-target pixels using unsupervised K-means clustering to complete target extraction.
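To make steps S5 and S6 concrete, the following is a minimal sketch of the change vector analysis and unsupervised K-means clustering in Python. It assumes `feat_pre` and `feat_post` are the per-pixel deep feature maps produced by the trained model for the pre-event and post-event images (height × width × channels); the function and variable names are illustrative, not taken from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_targets(feat_pre: np.ndarray, feat_post: np.ndarray) -> np.ndarray:
    """Change vector analysis (S5) followed by 2-class K-means clustering (S6)."""
    # S5: per-pixel Euclidean magnitude of the difference between the
    # pre-event and post-event deep feature vectors.
    change_intensity = np.sqrt(((feat_post - feat_pre) ** 2).sum(axis=-1))

    # S6: two-cluster K-means splits pixels into target and non-target classes.
    flat = change_intensity.reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(flat)

    # Interpret the higher-intensity cluster as the target class.
    target = int(np.argmax([flat[labels == k].mean() for k in (0, 1)]))
    return (labels == target).reshape(change_intensity.shape)
```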
The fully convolutional neural network model, trained with the original remote sensing image and ground reference data, adds shallow feature extraction to the deep semantic feature extraction of existing fully convolutional networks, which makes target feature extraction finer and reduces noise in the target extraction results.
Referring to fig. 2, the fully convolutional neural network model includes: 17 convolutional layers, 4 max pooling layers, 4 transposed convolutional layers, 4 concatenation layers, and 1 object feature layer; 15 of the convolutional layers have 3×3 kernels and the other two have 1×1 kernels, the max pooling layers use a 2×2 sampling window, and the transposed convolutional layers use a 2×2 window.
Here the first to seventeenth convolutional layers correspond to convolutional layers 1 to 17 in fig. 2, the first to fourth max pooling layers correspond to max pooling layers 1 to 4, the first to fourth transposed convolutional layers correspond to transposed convolutional layers 1 to 4, and the first to fourth concatenation layers correspond to cascade layers 1 to 4; the numerals in parentheses in fig. 2 denote dimensions.
The fully convolutional neural network model is constructed as follows: the first convolutional layer, the second convolutional layer, the first max pooling layer, the third convolutional layer, the fourth convolutional layer, the second max pooling layer, the fifth convolutional layer, the sixth convolutional layer, the third max pooling layer, the seventh convolutional layer, the eighth convolutional layer, the fourth max pooling layer, the ninth convolutional layer, the first transposed convolutional layer, the first concatenation layer, the tenth convolutional layer, the eleventh convolutional layer, the second transposed convolutional layer, the second concatenation layer, the twelfth convolutional layer, the thirteenth convolutional layer, the third transposed convolutional layer, the third concatenation layer, the fourteenth convolutional layer, the fifteenth convolutional layer, and the fourth transposed convolutional layer are arranged in sequence and pass data in that order; the fourth transposed convolutional layer and the object feature layer jointly pass data to the fourth concatenation layer; the fourth concatenation layer, the sixteenth convolutional layer, and the seventeenth convolutional layer are arranged in sequence and pass data in that order; the seventeenth convolutional layer is the output of the fully convolutional neural network model and the first convolutional layer is its input; in addition, the fourth convolutional layer also passes data to the third concatenation layer, the sixth convolutional layer also passes data to the second concatenation layer, and the eighth convolutional layer also passes data to the first concatenation layer.
The first to fifteenth convolutional layers have 3×3 kernels, and the sixteenth and seventeenth convolutional layers have 1×1 kernels.
The 17 convolutional layers, the 4 max pooling layers, the 4 transposed convolutional layers, the 4 concatenation layers, and the 1 object feature layer have the following dimensions.
The dimensions of the first and second convolutional layers are 16, those of the third and fourth are 32, those of the fifth and sixth are 64, those of the seventh and eighth are 128, those of the ninth and tenth are 128, those of the eleventh and twelfth are 64, those of the thirteenth and fourteenth are 32, those of the fifteenth and sixteenth are 16, and that of the seventeenth is 2.
The dimension of the first max pooling layer is 16, that of the second is 32, that of the third is 64, and that of the fourth is 128.
The dimension of the first concatenation layer is 256, that of the second is 128, that of the third is 64, and that of the fourth is 31 (the 16 channels of the fourth transposed convolutional layer plus the 15 channels of the object feature layer).
The dimension of the first transposed convolutional layer is 128, that of the second is 64, that of the third is 32, and that of the fourth is 16.
The dimension of the object feature layer is 15.
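As an illustration only, the architecture described above can be sketched with the Keras functional API as follows. The layer counts, kernel sizes, dimensions, and skip connections follow the text and fig. 2; the 160×160 input size (borrowed from the training samples described below), the three image bands, the ReLU activations, and the softmax over the two output channels are assumptions rather than statements from the patent.

```python
from tensorflow.keras import layers, Model

def conv3x3(x, filters):
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_model(size=160, bands=3, object_channels=15):
    image = layers.Input((size, size, bands))               # input image
    obj_feat = layers.Input((size, size, object_channels))  # object feature layer (15)

    # Encoder: paired 3x3 convolutions with 2x2 max pooling.
    c2 = conv3x3(conv3x3(image, 16), 16)                         # conv 1-2
    c4 = conv3x3(conv3x3(layers.MaxPooling2D(2)(c2), 32), 32)    # conv 3-4
    c6 = conv3x3(conv3x3(layers.MaxPooling2D(2)(c4), 64), 64)    # conv 5-6
    c8 = conv3x3(conv3x3(layers.MaxPooling2D(2)(c6), 128), 128)  # conv 7-8
    c9 = conv3x3(layers.MaxPooling2D(2)(c8), 128)                # conv 9

    # Decoder: 2x2 transposed convolutions, each concatenated with the
    # encoder output of matching resolution (concatenation layers 1-3).
    u1 = layers.Concatenate()([layers.Conv2DTranspose(128, 2, 2)(c9), c8])   # 256
    c11 = conv3x3(conv3x3(u1, 128), 64)                          # conv 10-11
    u2 = layers.Concatenate()([layers.Conv2DTranspose(64, 2, 2)(c11), c6])   # 128
    c13 = conv3x3(conv3x3(u2, 64), 32)                           # conv 12-13
    u3 = layers.Concatenate()([layers.Conv2DTranspose(32, 2, 2)(c13), c4])   # 64
    c15 = conv3x3(conv3x3(u3, 32), 16)                           # conv 14-15

    # Concatenation layer 4 fuses the upsampled deep features (16 channels)
    # with the object-oriented shallow features (15 channels): 31 in total.
    u4 = layers.Concatenate()([layers.Conv2DTranspose(16, 2, 2)(c15), obj_feat])
    c16 = layers.Conv2D(16, 1, activation="relu")(u4)            # conv 16, 1x1
    out = layers.Conv2D(2, 1, activation="softmax")(c16)         # conv 17, 1x1

    return Model([image, obj_feat], out)
```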
The fully convolutional neural network model is trained as follows: 400 sample remote sensing images of 160×160 pixels and the corresponding sample reference data are randomly cropped from the original remote sensing image and ground reference data; the sample images and sample reference data are transposed and rotated to form training samples; object-oriented segmentation is performed on the training samples to obtain sample image objects; the pre-event and post-event shallow features of the sample image objects are extracted, and the object feature layer is constructed from them; the training samples are input into the fully convolutional neural network model for training, with the adaptive moment estimation (Adam) algorithm as the optimization function, a learning rate of 1×10⁻⁴, binary cross entropy as the loss function, and a batch size of 150.
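A minimal training sketch under these hyperparameters might look as follows; `train_images`, `train_obj_feats`, and `train_labels` stand in for the 400 augmented 160×160 samples (labels one-hot over the two output classes), and the epoch count is an assumption, not a value from the patent.

```python
from tensorflow.keras.optimizers import Adam

model = build_model()  # the architecture sketch above
model.compile(optimizer=Adam(learning_rate=1e-4),  # adaptive moment estimation
              loss="binary_crossentropy")
model.fit([train_images, train_obj_feats], train_labels,
          batch_size=150, epochs=50)  # epoch count is an assumption
```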
The shallow features of the current image and the sample image include: the mean spectral value of the image in each band, and the homogeneity, contrast, and angular second moment texture features of the gray-level co-occurrence matrix in the four directions 0°, 45°, 90°, and 135°.
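For illustration, these shallow features (per-band spectral means plus three gray-level co-occurrence texture measures in four directions) could be computed with scikit-image as sketched below. Computing the texture on a single grayscale band is an assumption made so that the count matches the 15-channel object feature layer (3 means + 3 properties × 4 directions = 15); the function itself is hypothetical.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def shallow_features(bands, gray):
    """bands: list of three 2-D uint8 arrays; gray: 2-D uint8 grayscale patch."""
    feats = [float(b.mean()) for b in bands]           # 3 per-band spectral means
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # 0, 45, 90, 135 degrees
    glcm = graycomatrix(gray, distances=[1], angles=angles, levels=256)
    for prop in ("homogeneity", "contrast", "ASM"):    # ASM = second moment
        feats.extend(graycoprops(glcm, prop).ravel())  # 3 props x 4 directions
    return np.asarray(feats)                           # 15 features in total
```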
The object-oriented segmentation uses the fractal net evolution algorithm, the watershed segmentation algorithm, or the mean shift segmentation algorithm.
By cascading shallow features with deep semantic features, the fully convolutional neural network model designed in this embodiment can effectively extract multi-level features of remote sensing images. Because shallow object features are used, more high-resolution information such as the spectrum and texture of the original image is transmitted into and fused with the deep feature space, enriching the high-resolution feature information, so that the generated results keep accurate boundaries of the target objects while noise inside and outside the objects is effectively reduced.
In addition, experiments were conducted on the completeness, correctness, quality, overall accuracy, and computation time of the target extraction results to demonstrate the superiority of this embodiment over the prior-art fully convolutional neural network model; the specific data are shown in Table 1:
TABLE 1 (comparison of target extraction results between the prior-art model and this embodiment; table data not reproduced)
As shown in Table 1, every metric of target extraction by the fully convolutional neural network model provided in this embodiment is superior to that of the prior-art fully convolutional neural network model.
In addition, by using transfer learning, the fully convolutional neural network model can be applied directly to deep feature extraction for remote sensing images of other events, so the model does not need to be retrained or re-tuned in practical applications; this greatly improves target extraction efficiency in high-resolution remote sensing image target extraction, and especially in continuous monitoring of ground surface targets.
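A sketch of this reuse, continuing the Keras example above: the deep feature maps for steps S3 and S4 are read from an intermediate layer of the trained model, with no retraining. Taking the penultimate 16-channel layer as the deep feature map, and the input array names, are assumptions.

```python
from tensorflow.keras import Model

# Expose an intermediate layer of the trained model as the feature extractor.
feature_model = Model(model.inputs, model.layers[-2].output)

feat_post = feature_model.predict([post_image, post_obj_feats])  # S3
feat_pre = feature_model.predict([pre_image, pre_obj_feats])     # S4, via transfer
target_mask = extract_targets(feat_pre[0], feat_post[0])         # S5-S6 sketch above
```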
The remote sensing image target extraction method provided by the embodiments of the present application is suitable for automatic identification and continuous monitoring of targets such as forest cutting, forest fires, landslides, and floods in high-resolution remote sensing images.
An embodiment of the present application further provides an electronic device; referring to fig. 3, it includes a memory 601, a processor 602, and a computer program stored in the memory 601 and executable on the processor 602, and the processor 602 implements the remote sensing image target extraction method described above when executing the computer program.
Further, the electronic device further includes: at least one input device 603 and at least one output device 604.
The memory 601, the processor 602, the input device 603, and the output device 604 are connected via a bus 605.
The input device 603 may be a camera, a touch panel, a physical key, a mouse, or the like. The output device 604 may be, in particular, a display screen.
The memory 601 may be a high-speed random access memory (RAM) or a non-volatile memory, such as a disk memory. The memory 601 is used for storing a set of executable program code, and the processor 602 is coupled to the memory 601.
Further, the present application also provides a computer readable storage medium, which may be provided in the electronic device in each of the above embodiments, and the computer readable storage medium may be the memory 601 in the above embodiments. The computer readable storage medium stores a computer program which, when executed by the processor 602, implements the remote sensing image target extraction method described in the foregoing method embodiments.
Further, the computer-readable medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as series of action combinations, but those skilled in the art should understand that the present invention is not limited by the order of actions described, as some steps may be performed in another order or simultaneously. Further, those skilled in the art should appreciate that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the present invention.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing describes the remote sensing image target extraction method, electronic device, and storage medium provided by the present invention; the above description of the embodiments is intended only to help understand the method of the present invention and its core ideas, and should not be construed as limiting the scope of the present invention.

Claims (6)

1. A remote sensing image target extraction method, characterized by comprising the following steps: performing object-oriented segmentation on the acquired post-event remote sensing image to obtain current image objects; extracting the pre-event and post-event shallow features of each current image object; inputting the post-event shallow features of the current image objects and the post-event image into a pre-trained fully convolutional neural network model to obtain the deep features of the post-event image output by the model, wherein the model is trained using an original remote sensing image and ground reference data; using transfer learning, inputting the pre-event shallow features of the current image objects and the pre-event image into the fully convolutional neural network model to obtain the deep features of the pre-event image output by the model; performing change vector analysis on the deep features of the pre-event image and the deep features of the post-event image to generate a change intensity feature map; and dividing each pixel of the remote sensing image into target and non-target pixels using unsupervised K-means clustering to complete target extraction;
wherein the fully convolutional neural network model includes: 17 convolutional layers, 4 max pooling layers, 4 transposed convolutional layers, 4 concatenation layers, and 1 object feature layer; 15 of the convolutional layers have 3×3 kernels and the other two have 1×1 kernels, the max pooling layers use a 2×2 sampling window, and the transposed convolutional layers use a 2×2 window;
and wherein the fully convolutional neural network model is constructed as follows: the first convolutional layer, the second convolutional layer, the first max pooling layer, the third convolutional layer, the fourth convolutional layer, the second max pooling layer, the fifth convolutional layer, the sixth convolutional layer, the third max pooling layer, the seventh convolutional layer, the eighth convolutional layer, the fourth max pooling layer, the ninth convolutional layer, the first transposed convolutional layer, the first concatenation layer, the tenth convolutional layer, the eleventh convolutional layer, the second transposed convolutional layer, the second concatenation layer, the twelfth convolutional layer, the thirteenth convolutional layer, the third transposed convolutional layer, the third concatenation layer, the fourteenth convolutional layer, the fifteenth convolutional layer, and the fourth transposed convolutional layer are arranged in sequence and pass data in that order; the fourth transposed convolutional layer and the object feature layer jointly pass data to the fourth concatenation layer; the fourth concatenation layer, the sixteenth convolutional layer, and the seventeenth convolutional layer are arranged in sequence and pass data in that order; the seventeenth convolutional layer is the output of the fully convolutional neural network model and the first convolutional layer is its input; in addition, the fourth convolutional layer also passes data to the third concatenation layer, the sixth convolutional layer also passes data to the second concatenation layer, and the eighth convolutional layer also passes data to the first concatenation layer; the kernel sizes of the sixteenth and seventeenth convolutional layers are 1×1; the 17 convolutional layers, the 4 max pooling layers, the 4 transposed convolutional layers, the 4 concatenation layers, and the 1 object feature layer have the following dimensions: the dimensions of the first and second convolutional layers are 16, those of the third and fourth are 32, those of the fifth and sixth are 64, those of the seventh and eighth are 128, those of the ninth and tenth are 128, those of the eleventh and twelfth are 64, those of the thirteenth and fourteenth are 32, those of the fifteenth and sixteenth are 16, and that of the seventeenth is 2; the dimension of the first max pooling layer is 16, that of the second is 32, that of the third is 64, and that of the fourth is 128; the dimension of the first concatenation layer is 256, that of the second is 128, that of the third is 64, and that of the fourth is 31; the dimension of the first transposed convolutional layer is 128, that of the second is 64, that of the third is 32, and that of the fourth is 16; the dimension of the object feature layer is 15.
2. The remote sensing image target extraction method according to claim 1, wherein the fully convolutional neural network model is trained as follows: 400 sample remote sensing images of 160×160 pixels and the corresponding sample reference data are randomly cropped from the original remote sensing image and ground reference data; the sample images and sample reference data are transposed and rotated to form training samples; object-oriented segmentation is performed on the training samples to obtain sample image objects; the pre-event and post-event shallow features of the sample image objects are extracted, and the object feature layer is constructed from them; and the training samples are input into the fully convolutional neural network model for training, with the adaptive moment estimation algorithm as the optimization function, a learning rate of 1×10⁻⁴, binary cross entropy as the loss function, and a batch size of 150.
3. The method of claim 2, wherein the shallow features of the current image and the sample image comprise: the mean spectral value of the image in each band, and the homogeneity, contrast, and angular second moment texture features of the gray-level co-occurrence matrix in the four directions 0°, 45°, 90°, and 135°.
4. The remote sensing image target extraction method according to claim 1, wherein the object-oriented segmentation uses the fractal net evolution algorithm, the watershed segmentation algorithm, or the mean shift segmentation algorithm.
5. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1 to 4.
CN202010899790.7A 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium Active CN111985487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010899790.7A CN111985487B (en) 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010899790.7A CN111985487B (en) 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111985487A CN111985487A (en) 2020-11-24
CN111985487B 2024-03-19

Family

ID=73446907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010899790.7A Active CN111985487B (en) 2020-08-31 2020-08-31 Remote sensing image target extraction method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111985487B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408462B (en) * 2021-06-29 2023-05-02 Southwest Jiaotong University Landslide remote sensing information extraction method based on convolutional neural network and class heat maps
CN113723464B (en) * 2021-08-02 2023-10-03 Peking University Remote sensing image classification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389051A (en) * 2018-09-20 2019-02-26 South China Agricultural University Building remote sensing image recognition method based on convolutional neural networks
CN109801293A (en) * 2019-01-08 2019-05-24 Ping An Technology (Shenzhen) Co., Ltd. Remote sensing image segmentation method, device, storage medium, and server
CN111476170A (en) * 2020-04-09 2020-07-31 Capital Normal University Remote sensing image semantic segmentation method combining deep learning and random forest
WO2020244261A1 (en) * 2019-06-05 2020-12-10 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences Scene recognition system for high-resolution remote sensing image, and model generation method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10796169B2 (en) * 2017-05-16 2020-10-06 Nec Corporation Pruning filters for efficient convolutional neural networks for image recognition of environmental hazards
CN114283345A (en) * 2021-12-30 2022-04-05 Wuhan University Small-sample urban remote sensing image information extraction method based on meta-learning and attention

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389051A (en) * 2018-09-20 2019-02-26 South China Agricultural University Building remote sensing image recognition method based on convolutional neural networks
CN109801293A (en) * 2019-01-08 2019-05-24 Ping An Technology (Shenzhen) Co., Ltd. Remote sensing image segmentation method, device, storage medium, and server
WO2020244261A1 (en) * 2019-06-05 2020-12-10 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences Scene recognition system for high-resolution remote sensing image, and model generation method
CN111476170A (en) * 2020-04-09 2020-07-31 Capital Normal University Remote sensing image semantic segmentation method combining deep learning and random forest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
M-FCN: Effective Fully Convolutional Network-Based Airplane Detection Framework; Yiding Yang, et al.; IEEE Geoscience and Remote Sensing Letters; pp. 1-5 *
Aircraft target detection in remote sensing images based on deep neural networks; Li Wenbin, et al.; Computer Engineering; Vol. 46, No. 7, pp. 268-276 *

Also Published As

Publication number Publication date
CN111985487A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
Lei et al. Coupled adversarial training for remote sensing image super-resolution
CN112750140B (en) Information mining-based disguised target image segmentation method
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN106997380B Imaging spectrum secure retrieval method based on DCGAN deep network
CN108108764B (en) Visual SLAM loop detection method based on random forest
CN111462126A (en) Semantic image segmentation method and system based on edge enhancement
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN107229918A SAR image object detection method based on fully convolutional neural networks
CN111985487B (en) Remote sensing image target extraction method, electronic equipment and storage medium
CN109753996B (en) Hyperspectral image classification method based on three-dimensional lightweight depth network
CN110135435B (en) Saliency detection method and device based on breadth learning system
Holail et al. AFDE-Net: Building Change Detection Using Attention-Based Feature Differential Enhancement for Satellite Imagery
CN105844299B Image classification method based on bag of words
Xu et al. A two-stage noise level estimation using automatic feature extraction and mapping model
CN114266955A (en) Remote sensing image scene classification method
CN111062275A (en) Multi-level supervision crowd counting method, device, medium and electronic equipment
Li et al. Remote sensing image scene classification via regional growth-based key area fine location and multilayer feature fusion
CN115578624A (en) Agricultural disease and pest model construction method, detection method and device
CN113011506B (en) Texture image classification method based on deep fractal spectrum network
Tao et al. Learning modal and spatial features with lightweight 3D convolution for RGB guided depth completion
CN112101084B (en) Automatic polarized SAR building earthquake hazard information extraction method based on convolutional neural network
Yu et al. Underwater image enhancement method based on the generative adversarial network
Cai et al. Image Blur Assessment with Feature Points.
CN111652246A Adaptive sparse image representation method and device based on deep learning
CN106650678A (en) Face identification method through utilization of Gabor wavelet sub-band related structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant