CN116012709A - High-resolution remote sensing image building extraction method and system - Google Patents

High-resolution remote sensing image building extraction method and system

Info

Publication number
CN116012709A
Authority
CN
China
Prior art keywords
attention
remote sensing
data
image
depth space
Prior art date
Legal status
Granted
Application number
CN202310017827.2A
Other languages
Chinese (zh)
Other versions
CN116012709B (en)
Inventor
刘耀辉
苏飞
郑国强
付萍杰
毕京学
王晋
于明洋
周洁
靳奉祥
Current Assignee
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date
Filing date
Publication date
Application filed by Shandong Jianzhu University filed Critical Shandong Jianzhu University
Priority to CN202310017827.2A priority Critical patent/CN116012709B/en
Publication of CN116012709A publication Critical patent/CN116012709A/en
Application granted granted Critical
Publication of CN116012709B publication Critical patent/CN116012709B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a high-resolution remote sensing image building extraction method and system. The method comprises the following steps: acquiring aerial image data in a research area; performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a test set and a verification set; constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer; constructing a depth space attention network from the spatial information encoder, the semantic encoder, the dual-attention extractor and the attention feature definer, and training the depth space attention network on the training set; testing and verifying the depth space attention network with the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model; and inputting data to be detected into the remote sensing image building extraction model to obtain a detection result. The invention improves the accuracy of building extraction from remote sensing images.

Description

High-resolution remote sensing image building extraction method and system
Technical Field
The invention relates to the technical field of image processing, in particular to a high-resolution remote sensing image building extraction method and system.
Background
With the rapid development of economic activities, the number of buildings has increased greatly and their spatial distribution has become wider, while building outlines have tended to become larger and more complex. Therefore, to effectively and reasonably grasp regional economic development and population distribution, and to better arrange public service points such as infrastructure and service sites, efficient and accurate extraction of buildings from high-resolution remote sensing images is a problem to be solved.
In the field of computer image processing, deep learning can effectively extract target objects after adequate training and offers excellent computational performance. However, building segmentation is a binary classification task involving building and non-building elements, and extracting buildings from high-resolution remote sensing images with only the basic convolutional neural networks of standard deep learning models presents several problems: extraction accuracy is low in complex ground-object scenes, boundaries are incomplete or blurred under vegetation occlusion, and the extracted building features become scattered after many stacked convolution layers. These problems make it difficult to achieve efficient and accurate building extraction from high-resolution remote sensing images.
Therefore, a method that improves the accuracy of building extraction from remote sensing images is needed in the prior art to solve the above problems.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a method and a system that, in the complex scenes of high-resolution aerial imagery, take noisy remote sensing images of suitable size as input data and construct a depth space attention network based on the attention mechanism of deep learning, so as to effectively extract buildings from high-resolution remote sensing images.
In order to achieve the above object, the present invention provides the following solutions:
a high-resolution remote sensing image building extraction method comprises the following steps:
acquiring aerial image data in a research area;
performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a testing set and a verification set;
respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer;
constructing a depth space attention network according to the spatial information encoder, the semantic encoder, the dual-attention extractor and the attention feature definer, and training the depth space attention network according to the training set;
testing and verifying the depth space attention network by using the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model;
and inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result.
Preferably, the acquiring aerial image data in the research area includes:
conducting an aerial survey of the research area with an unmanned aerial vehicle along a set flight route, so as to acquire a remote sensing data set of the research area;
generating a remote sensing image of the research area from the geo-referenced remote sensing data set captured by the unmanned aerial vehicle;
and adjusting the spatial resolution of the remote sensing image, and producing the aerial image data from the remote sensing image whose spatial resolution reaches a preset resolution threshold.
Preferably, the image preprocessing and data enhancement are performed on the aerial image data, and the processed data are divided into a training set, a testing set and a verification set, including:
determining a building sample in the aerial image data;
determining a building label from the building sample;
performing image cutting on the aerial image data and the corresponding building label to obtain a cut image;
carrying out data enhancement on the cut image to obtain an expanded image;
and randomly dividing the expanded image according to a preset proportion to obtain the training set, the test set and the verification set.
Preferably, the spatial information encoder comprises three stages, each stage comprising a two-layer shallow structure; each shallow layer consists of a convolution operation, batch normalization and a parametric rectified linear unit (PReLU); the first layer and the second layer of each stage have the same number of filters and output feature map size; the semantic encoder comprises a channel split-and-recombination module and three downsampling modules; the dual-attention extractor is constructed from a kernel attention mechanism and a dot-product-based channel attention mechanism; the attention feature definer comprises a projection layer and an upsampling module, each layer of which consists of a convolution layer, a BN layer and a PReLU.
Preferably, constructing a depth space attention network according to the spatial information encoder, the semantic encoder, the dual attention extractor and the attention feature definer, and training the depth space attention network according to the training set includes:
constructing a spatial information coding network branch network according to the spatial information coder;
constructing a high-dimensional feature extraction branch network according to the semantic encoder, the dual-attention extractor and the attention feature definer;
constructing the depth space attention network according to the space information coding network branch network, the high-dimensional feature extraction branch network and a preset feature fusion device;
and inputting the training set into the depth space attention network, and training after adjusting the super parameters to obtain a trained depth space attention network.
Preferably, the testing set and the verification set are used for testing and verifying the depth space attention network respectively to obtain an optimal remote sensing image building extraction model, which comprises the following steps:
initializing training parameters of the depth space attention network;
testing the depth space attention network according to the test set, and evaluating the tested depth space attention network according to a preset evaluation index to obtain a tested depth space attention network;
and verifying the tested depth space attention network according to the verification set to obtain the optimal remote sensing image building extraction model.
A high resolution remote sensing image building extraction system comprising:
the image acquisition module is used for acquiring aerial image data in the research area;
the data processing module is used for performing image preprocessing and data enhancement on the aerial image data and dividing the processed data into a training set, a test set and a verification set;
the module construction module is used for respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer;
the network training module is used for constructing a depth space attention network according to the space information encoder, the semantic encoder, the dual attention extractor and the attention feature definer, and training the depth space attention network according to the training set;
the model optimizing module is used for testing and verifying the depth space attention network by utilizing the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model;
and the detection module is used for inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
The invention provides a high-resolution remote sensing image building extraction method and system. The method comprises the following steps: acquiring aerial image data in a research area; performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a test set and a verification set; constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer; constructing a depth space attention network from the spatial information encoder, the semantic encoder, the dual-attention extractor and the attention feature definer, and training the depth space attention network on the training set; testing and verifying the depth space attention network with the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model; and inputting data to be detected into the remote sensing image building extraction model to obtain a detection result. The invention improves the accuracy of building extraction from remote sensing images.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of the method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the building extraction process according to an embodiment of the present invention;
FIG. 3 shows the data set provided by an embodiment of the present invention;
FIG. 4 is a framework diagram of the depth space attention model provided by an embodiment of the present invention;
FIG. 5 shows experimental results provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The terms "first," "second," "third," "fourth" and the like in the description, the claims and the drawings of this application are used to distinguish different objects rather than to describe a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, product or apparatus comprising a list of steps is not limited to the listed steps, but may include steps not listed or other steps inherent to such a process, method, product or apparatus.
The invention aims to provide a high-resolution remote sensing image building extraction method and system, which can improve the remote sensing image building extraction accuracy.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
FIG. 1 is a flowchart of the method provided by an embodiment of the present invention. As shown in FIG. 1, the present invention provides a high-resolution remote sensing image building extraction method, including:
step 100: acquiring aerial image data in a research area;
step 200: performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a testing set and a verification set;
step 300: respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention characteristic definer;
step 400: constructing a depth space attention network according to the spatial information encoder, the semantic encoder, the dual-attention extractor and the attention feature definer, and training the depth space attention network according to the training set;
step 500: testing and verifying the depth space attention network by using the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model;
step 600: and inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result.
Preferably, the acquiring aerial image data in the research area includes:
conducting an aerial survey of the research area with an unmanned aerial vehicle along a set flight route, so as to acquire a remote sensing data set of the research area;
generating a remote sensing image of the research area from the geo-referenced remote sensing data set captured by the unmanned aerial vehicle;
and adjusting the spatial resolution of the remote sensing image, and producing the aerial image data from the remote sensing image whose spatial resolution reaches a preset resolution threshold.
As shown in FIG. 2, the building extraction flow in the present embodiment includes the following steps:
Step 1: The Weinan aerial image data set is produced from high-resolution aerial images acquired by an unmanned aerial vehicle, comprising the following sub-steps:
Step 1.1: On the basis of field investigation, Helan Village, Fuping County, Weinan, Shaanxi Province is selected as the research area; an aerial survey of the research area is conducted with an unmanned aerial vehicle, and high-resolution remote sensing images of the area are acquired to produce the data set.
Step 1.2: The experimental data used in this study come from a small four-rotor unmanned aerial vehicle (UAV) whose flight path within the research area can be planned autonomously. The flight route runs from the southeast corner to the northwest corner of the research area, and photographs are taken along an S-shaped route.
Step 1.3: After the geo-referenced remote sensing images captured by the UAV are imported into the relevant software as a whole, a remote sensing image of the entire research area can be generated; the image is then resampled to adjust its spatial resolution, and the UAV data with a spatial resolution of 2.3 cm are used to produce the Weinan aerial image data.
Preferably, the image preprocessing and data enhancement are performed on the aerial image data, and the processed data are divided into a training set, a testing set and a verification set, including:
determining a building sample in the aerial image data;
determining a building label from the building sample;
performing image cutting on the aerial image data and the corresponding building label to obtain a cut image;
carrying out data enhancement on the cut image to obtain an expanded image;
and randomly dividing the expanded image according to a preset proportion to obtain the training set, the test set and the verification set.
Further, step 2 in this embodiment is: based on the Weinan aerial image data produced in step 1, the images are first preprocessed and augmented and then divided proportionally into a training set, a test set and a verification set, comprising the following sub-steps:
Step 2.1: In ArcGIS 10.2, building samples covering the various building types in the research area are selected manually; the annotated vector file is converted into an image label in "tif" format with the "vector to raster" tool, completing the building labels for the imagery, as shown in FIG. 3.
Step 2.2: To facilitate the deep learning computation, the original image and the corresponding labels are cut: the whole remote sensing image of the research area and its labels are uniformly cut in ArcGIS 10.2 into small tiles of 512×512 pixels.
Step 2.3: Data enhancement is applied to the small tiles to expand the data volume: each image and its label are rotated by 90°, 180° and 270°, and random horizontal and vertical flips are applied with a probability of 0.5, generating 830 images of 512×512 pixels.
Step 2.4: The 830 images of 512×512 pixels are randomly divided into a training set, a test set and a verification set at a ratio of 6:2:2.
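For illustration, the following is a minimal Python sketch of the augmentation and 6:2:2 split in steps 2.3 and 2.4, assuming the 512×512 tiles and their labels are handled as Pillow images; the function names and the fixed random seed are illustrative assumptions, not taken from the patent.

```python
import random
from PIL import ImageOps

def augment_pair(image, label):
    """Return rotated and randomly flipped copies of an image/label tile pair."""
    pairs = []
    for angle in (90, 180, 270):
        # Rotate the image and its label identically so they stay aligned.
        pairs.append((image.rotate(angle, expand=True), label.rotate(angle, expand=True)))
    if random.random() < 0.5:  # random horizontal flip with probability 0.5
        pairs.append((ImageOps.mirror(image), ImageOps.mirror(label)))
    if random.random() < 0.5:  # random vertical flip with probability 0.5
        pairs.append((ImageOps.flip(image), ImageOps.flip(label)))
    return pairs

def split_dataset(tile_names, ratios=(0.6, 0.2, 0.2), seed=42):
    """Randomly split tile names into training, test and verification sets (6:2:2)."""
    names = list(tile_names)
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * ratios[0])
    n_test = int(len(names) * ratios[1])
    return names[:n_train], names[n_train:n_train + n_test], names[n_train + n_test:]
```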
Preferably, the spatial information encoder comprises three stages, each stage comprising a two-layer shallow structure; each layer of shallow structure consists of convolution operation, batch normalization and parameter correction linear units; the first layer and the second layer of each stage have the same number of filters and output feature map sizes; the semantic encoder comprises a channel segmentation and repartition module and 3 downsampling modules; the dual-attention extractor is constructed based on a kernel attention mechanism and a channel attention mechanism of dot product; the attention characteristic definer comprises a project layer and an upsampling module; each layer of the attention profile definer is composed of a convolution layer, a BN layer, and a pralu.
Specifically, step 3 in this embodiment constructs the four components required for the depth space attention network and comprises the following sub-steps:
Step 3.1: The spatial information encoder SIEN is constructed to effectively combine high-level semantics with low-level details and improve semantic segmentation performance. A shallow SIEN with a total of six layers (three stages of two layers each) is designed, where each layer consists of a convolution operation (Conv), batch normalization (BN) and a parametric rectified linear unit (PReLU). The first and second layers of each stage have the same number of filters (stride 2) and the same output feature map size. SIEN therefore extracts features at 1/8 of the original input resolution, increasing the channel capacity and enriching the spatial detail.
Step 3.2: The semantic encoder SEN is constructed. An encoder that sets the input-resolution ratio by applying five downsampling operations produces a final output only 1/32 of the input image size, which can cause a significant loss of spatial detail. SEN is instead based on the channel split-and-recombination module FDSS_NBT and applies three downsampling operations (the output resolution is 1/8 of the original image resolution, with 32, 64 and 128 channels). In stages 3 and 4, dilated convolutions are introduced to enlarge the receptive field.
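A heavily simplified sketch of SEN follows. The internals of the FDSS_NBT module are not described in the text, so plain Conv-BN-PReLU blocks stand in for it, and the dilation rates (2 and 4) are assumptions; only the three downsamplings, the 32/64/128 channel widths and the four output feature maps are taken from the description.

```python
import torch.nn as nn

class DownBlock(nn.Module):
    """Strided 3x3 Conv-BN-PReLU block, optionally dilated."""
    def __init__(self, in_ch, out_ch, dilation=1):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.PReLU(out_ch),
        )

    def forward(self, x):
        return self.down(x)

class SEN(nn.Module):
    """Semantic encoder: three downsamplings (1/2, 1/4, 1/8) with 32, 64, 128 channels."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.stage1 = DownBlock(in_ch, 32)            # 1/2
        self.stage2 = DownBlock(32, 64)               # 1/4
        self.stage3 = DownBlock(64, 128, dilation=2)  # 1/8, dilated
        self.stage4 = nn.Sequential(                  # stays at 1/8, larger dilation
            nn.Conv2d(128, 128, 3, padding=4, dilation=4, bias=False),
            nn.BatchNorm2d(128),
            nn.PReLU(128),
        )

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        return f1, f2, f3, f4  # four spatially reduced feature maps
```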
Step 3.3: The dual attention extractor DAB is constructed. A kernel-based attention mechanism (KAM) is designed for the spatial dimension of the image. In the channel dimension, a dot-product-based channel attention mechanism (CAM) is used: the number of input channels C is usually much smaller than the number of pixels in the feature map, so the softmax over the channels is far less costly. The dual attention module (DAM) combines KAM, which models position dependencies, with CAM, which models channel dependencies, to strengthen the ability of the features extracted from each layer of feature maps to discriminate objects.
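The sketch below shows one possible reading of the dual attention block. The channel attention follows the usual dot-product formulation over a C×C channel affinity matrix; because the exact kernel function of KAM is not specified in the text, a softmax-kernel factorization with linear complexity in the number of pixels is used as a stand-in, and fusing the two branches by element-wise addition is likewise an assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Dot-product channel attention; the affinity matrix is only C x C."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)                                     # B x C x N
        affinity = torch.softmax(flat @ flat.transpose(1, 2), dim=-1)  # B x C x C
        out = (affinity @ flat).view(b, c, h, w)
        return self.gamma * out + x

class KernelAttention(nn.Module):
    """Kernel-style spatial attention: separate softmaxes on Q and K give linear cost."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = torch.softmax(self.q(x).view(b, c, -1), dim=1)   # softmax over channels
        k = torch.softmax(self.k(x).view(b, c, -1), dim=-1)  # softmax over positions
        v = self.v(x).view(b, c, -1)
        context = k @ v.transpose(1, 2)                       # B x C x C
        out = (context.transpose(1, 2) @ q).view(b, c, h, w)
        return self.gamma * out + x

class DualAttention(nn.Module):
    """DAM: combine kernel (spatial) and channel attention on the same feature map."""
    def __init__(self, channels):
        super().__init__()
        self.kam = KernelAttention(channels)
        self.cam = ChannelAttention()

    def forward(self, x):
        return self.kam(x) + self.cam(x)
```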
Step 3.4: The attention feature definer AFRB is constructed; it consists of a projection layer and an upsampling module, where each layer is composed of a convolution layer, a BN layer and a PReLU. The number of input channels is set equal to the number of output channels of the dual attention module DAM in step 3.3, and the number of output channels is set according to the following formula:
AFRB = Conv3(Conv3(F_att))
where F_att denotes the output channels of the DAM, Conv3 denotes the third-layer fusion convolution, and AFRB denotes the number of building-class output channels.
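A minimal sketch of AFRB under these assumptions: two 3×3 Conv-BN-PReLU layers project the DAM output F_att down to the building-class channels, and bilinear upsampling restores the 1/8-resolution map to the input size. Treating the second convolution as the class projection and using bilinear upsampling are illustrative choices, not details given in the text.

```python
import torch.nn as nn

class AFRB(nn.Module):
    """Attention feature definer: projection (Conv3 -> Conv3) followed by upsampling."""
    def __init__(self, in_channels, num_classes=2, up_factor=8):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.PReLU(in_channels),
            nn.Conv2d(in_channels, num_classes, 3, padding=1, bias=False),
            nn.BatchNorm2d(num_classes),
            nn.PReLU(num_classes),
        )
        self.up = nn.Upsample(scale_factor=up_factor, mode="bilinear", align_corners=False)

    def forward(self, f_att):
        # AFRB = Conv3(Conv3(F_att)), then upsample back to the input resolution.
        return self.up(self.project(f_att))
```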
Preferably, constructing a depth space attention network according to the spatial information encoder, the semantic encoder, the dual attention extractor and the attention feature definer, and training the depth space attention network according to the training set includes:
constructing a spatial information coding network branch network according to the spatial information coder;
constructing a high-dimensional feature extraction branch network according to the semantic encoder, the dual-attention extractor and the attention feature definer;
constructing the depth space attention network according to the space information coding network branch network, the high-dimensional feature extraction branch network and a preset feature fusion device;
and inputting the training set into the depth space attention network, and training after adjusting the super parameters to obtain a trained depth space attention network.
Further, step 4 in this embodiment is: the four components constructed in step 3 are combined into the remote sensing image building extraction algorithm, namely the depth space attention network, which is trained on the building extraction training set, comprising the following sub-steps:
Step 4.1: The depth space attention network architecture is outlined in FIG. 4. Its two branch networks are a spatial information coding branch and a high-dimensional feature extraction branch. The spatial information coding branch is composed of the spatial information encoder SIEN (FIG. 4a). The high-dimensional feature extraction branch takes the semantic encoder SEN (FIG. 4b) as its main building block, together with the feature fusion module FFM (FIG. 4c), the dual attention extractor DAB (FIG. 4d) and the attention feature definer AFRB (FIG. 4e).
Step 4.2: The input image is fed into the SEN module, which generates four spatially reduced feature maps (F_h,1, F_h,2, F_h,3, F_h,4). Feature maps F_h,3 and F_h,4 have the same number of channels but use convolution filters with different dilation rates to enlarge the receptive field; a 1/8 downsampling strategy is adopted, and the high-resolution features F_h,3 and F_h,4 are combined with the spatial features to obtain the final model.
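Assembling the sketches above gives the following hedged outline of the full depth space attention network. The feature fusion module (FFM) is not detailed in the text, so concatenation followed by a 1×1 Conv-BN-PReLU stands in for it, and combining F_h,3 and F_h,4 by element-wise addition before the dual attention block is likewise an assumption.

```python
import torch
import torch.nn as nn

class DepthSpaceAttentionNet(nn.Module):
    """Two-branch network: SIEN spatial branch + SEN/DAM/AFRB semantic branch."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.sien = SIEN()             # spatial branch, 256 channels at 1/8 (assumed width)
        self.sen = SEN()               # semantic branch, 128 channels at 1/8
        self.dab = DualAttention(128)  # dual attention on the high-level features
        self.ffm = nn.Sequential(      # stand-in feature fusion module
            nn.Conv2d(256 + 128, 128, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.PReLU(128),
        )
        self.afrb = AFRB(128, num_classes=num_classes, up_factor=8)

    def forward(self, x):
        spatial = self.sien(x)         # B x 256 x H/8 x W/8
        _, _, f3, f4 = self.sen(x)     # F_h,3 and F_h,4 at 1/8 resolution
        semantic = self.dab(f3 + f4)   # combine, then apply dual attention
        fused = self.ffm(torch.cat([spatial, semantic], dim=1))
        return self.afrb(fused)        # B x num_classes x H x W

# DepthSpaceAttentionNet()(torch.randn(1, 3, 512, 512)).shape -> torch.Size([1, 2, 512, 512])
```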
Step 4.3: The images and corresponding labels in the "images" and "labels" folders of the training set are input into the constructed depth space attention network model, and training is carried out after adjusting the hyper-parameters to obtain an effective trained parameter model.
Preferably, the testing set and the verification set are used for testing and verifying the depth space attention network respectively to obtain an optimal remote sensing image building extraction model, which comprises the following steps:
initializing training parameters of the depth space attention network;
testing the depth space attention network according to the test set, and evaluating the tested depth space attention network according to a preset evaluation index to obtain a tested depth space attention network;
and verifying the tested depth space attention network according to the verification set to obtain the optimal remote sensing image building extraction model.
Specifically, step 5 in this embodiment sets up the experiments for training and evaluating the depth space attention network, and comprises the following sub-steps:
Step 5.1: The experiments are carried out with the PyTorch deep learning framework; the experimental environment is built with PyCharm and implemented in the Python language.
Step 5.2: Following the common evaluation criteria for deep learning semantic segmentation, five widely used evaluation indexes are adopted for quality assessment: Overall Accuracy (OA), Precision, Recall, F1 score (F1) and Intersection over Union (IoU). The specific evaluation indexes are as follows:
OA = (TP + TN) / (P + N)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
IoU = TP / (TP + FP + FN)
where P denotes the number of positive samples, N the number of negative samples, TP the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
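A small Python sketch of these five metrics, computed from binary prediction and ground-truth masks (building = 1, non-building = 0); the epsilon term guarding against division by zero is an implementation convenience, not part of the definitions.

```python
import numpy as np

def building_metrics(pred, gt, eps=1e-7):
    """Compute OA, Precision, Recall, F1 and IoU for binary building masks."""
    pred, gt = np.asarray(pred).astype(bool), np.asarray(gt).astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    oa = (tp + tn) / (tp + tn + fp + fn + eps)  # (TP + TN) / (P + N)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return {"OA": oa, "Precision": precision, "Recall": recall, "F1": f1, "IoU": iou}
```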
Step 5.3: All experiments are carried out on a server equipped with an Intel Xeon CPU E5-2630 v4 (2.20 GHz) and two NVIDIA GeForce GTX 1080 Ti (11 GB) GPUs.
Step 5.4: The input-data parameters are set to 100 epochs with a batch size of 16; the optimizer is Adam with an initial learning rate of 0.0001 and a weight decay of 0.0001; the loss function is the cross-entropy loss; and the network parameters that minimize the loss function are solved on the basis of a stochastic gradient descent algorithm.
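A hedged sketch of this training configuration in PyTorch; the dataset object, device handling and the saved file name are illustrative assumptions, while the epoch count, batch size, Adam settings and cross-entropy loss follow step 5.4.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_dataset, device="cuda", epochs=100, batch_size=16):
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()
    model = model.to(device)
    model.train()
    for epoch in range(epochs):
        running = 0.0
        for images, labels in loader:                # labels: B x H x W class indices
            images, labels = images.to(device), labels.to(device).long()
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # logits: B x num_classes x H x W
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch + 1}: mean loss {running / max(len(loader), 1):.4f}")
    # Step 6 names the saved model after the epoch/batch/learning-rate setting.
    torch.save(model.state_dict(), "building_extraction_model.pth")
```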
Further, step 6 in this embodiment is: the images and corresponding labels in the training set are input into the model built in Python for training, using the input-data parameters of the depth space attention network model set in step 5; the resulting parameter model is named "epoch100_batch16_learning0.0001.pth".
Step 6.2: Using this parameter model, building extraction is predicted directly on the high-resolution remote sensing images of the verification set without inputting the labels, and the accuracy of the model is then evaluated with the five evaluation indexes.
Step 6.3: If the accuracy of the parameter model does not reach the expected level (an IoU of 85% or higher), the hyper-parameters of the model are modified, the training set data are input again, and the test set data are used for testing; the parameter model with the best accuracy is saved and evaluated on the verification set images, and the parameter model with the best evaluation indexes is finally saved as the required depth space attention network model. The high-accuracy results of building extraction from the high-resolution remote sensing images are shown in FIG. 5.
Corresponding to the above method, the embodiment further provides a high-resolution remote sensing image building extraction system, which includes:
the image acquisition module is used for acquiring aerial image data in the research area;
the data processing module is used for performing image preprocessing and data enhancement on the aerial image data and dividing the processed data into a training set, a test set and a verification set;
the module construction module is used for respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer;
the network training module is used for constructing a depth space attention network according to the space information encoder, the semantic encoder, the dual attention extractor and the attention feature definer, and training the depth space attention network according to the training set;
the model optimizing module is used for testing and verifying the depth space attention network by utilizing the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model;
and the detection module is used for inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. Since the system disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
Specific examples are used herein to explain the principles and implementations of the present invention; the above description of the embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, modifications to the specific implementations and the application scope made by those of ordinary skill in the art in accordance with the idea of the present invention fall within the scope of the invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (7)

1. The high-resolution remote sensing image building extraction method is characterized by comprising the following steps of:
acquiring aerial image data in a research area;
performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a testing set and a verification set;
respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer;
constructing a depth space attention network according to the spatial information encoder, the semantic encoder, the dual-attention extractor and the attention feature definer, and training the depth space attention network according to the training set;
testing and verifying the depth space attention network by using the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model;
and inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result.
2. The method of claim 1, wherein the acquiring aerial image data in the investigation region comprises:
conducting an aerial survey of the research area with an unmanned aerial vehicle along a set flight route, so as to acquire a remote sensing data set of the research area;
generating a remote sensing image of the research area from the geo-referenced remote sensing data set captured by the unmanned aerial vehicle;
and adjusting the spatial resolution of the remote sensing image, and producing the aerial image data from the remote sensing image whose spatial resolution reaches a preset resolution threshold.
3. The method of claim 1, wherein performing image preprocessing and data enhancement on the aerial image data and dividing the processed data into a training set, a test set and a verification set comprises:
determining a building sample in the aerial image data;
determining a building label from the building sample;
performing image cutting on the aerial image data and the corresponding building label to obtain a cut image;
carrying out data enhancement on the cut image to obtain an expanded image;
and randomly dividing the expanded image according to a preset proportion to obtain the training set, the test set and the verification set.
4. The method of claim 1, wherein the spatial information encoder comprises three stages, each stage comprising a two-layer shallow structure; each shallow layer consists of a convolution operation, batch normalization and a parametric rectified linear unit (PReLU); the first layer and the second layer of each stage have the same number of filters and output feature map size; the semantic encoder comprises a channel split-and-recombination module and three downsampling modules; the dual-attention extractor is constructed from a kernel attention mechanism and a dot-product-based channel attention mechanism; and the attention feature definer comprises a projection layer and an upsampling module, each layer of which consists of a convolution layer, a BN layer and a PReLU.
5. The method of claim 1, wherein constructing a depth space attention network from the spatial information encoder, the semantic encoder, the dual attention extractor, and the attention feature definer, and training the depth space attention network from the training set, comprises:
constructing a spatial information coding network branch network according to the spatial information coder;
constructing a high-dimensional feature extraction branch network according to the semantic encoder, the dual-attention extractor and the attention feature definer;
constructing the depth space attention network according to the space information coding network branch network, the high-dimensional feature extraction branch network and a preset feature fusion device;
and inputting the training set into the depth space attention network, and training after adjusting the super parameters to obtain a trained depth space attention network.
6. The method for extracting a building from a high-resolution remote sensing image according to claim 1, wherein the testing and verifying the deep space attention network by using the testing set and the verifying set respectively, to obtain an optimal remote sensing image building extraction model, comprises:
initializing training parameters of the depth space attention network;
testing the depth space attention network according to the test set, and evaluating the tested depth space attention network according to a preset evaluation index to obtain a tested depth space attention network;
and verifying the tested depth space attention network according to the verification set to obtain the optimal remote sensing image building extraction model.
7. A high resolution remote sensing image building extraction system, comprising:
the image acquisition module is used for acquiring aerial image data in the research area;
the data processing module is used for performing image preprocessing and data enhancement on the aerial image data and dividing the processed data into a training set, a test set and a verification set;
the module construction module is used for respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention feature definer;
the network training module is used for constructing a depth space attention network according to the space information encoder, the semantic encoder, the dual attention extractor and the attention feature definer, and training the depth space attention network according to the training set;
the model optimizing module is used for testing and verifying the depth space attention network by utilizing the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model;
and the detection module is used for inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result.
CN202310017827.2A 2023-01-06 2023-01-06 High-resolution remote sensing image building extraction method and system Active CN116012709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310017827.2A CN116012709B (en) 2023-01-06 2023-01-06 High-resolution remote sensing image building extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310017827.2A CN116012709B (en) 2023-01-06 2023-01-06 High-resolution remote sensing image building extraction method and system

Publications (2)

Publication Number Publication Date
CN116012709A true CN116012709A (en) 2023-04-25
CN116012709B CN116012709B (en) 2023-07-18

Family

ID=86022908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310017827.2A Active CN116012709B (en) 2023-01-06 2023-01-06 High-resolution remote sensing image building extraction method and system

Country Status (1)

Country Link
CN (1) CN116012709B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2018101336A4 (en) * 2018-09-12 2018-10-11 Hu, Yuan Miss Building extraction application based on machine learning in Urban-Suburban-Integration Area
CN109685716A (en) * 2018-12-14 2019-04-26 大连海事大学 A kind of image super-resolution rebuilding method of the generation confrontation network based on Gauss encoder feedback

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIE WAN: "DA-RoadNet: A Dual-Attention Network for Road Extraction From High Resolution Satellite Imagery", IEEE, pages 6302-6315 *
于明洋 (YU Mingyang): "Building extraction from high-resolution remote sensing images fusing grid attention gates and a feature pyramid structure", Journal of Geo-Information Science (《地球信息科学学报》), pages 1768-1802 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955668A (en) * 2023-08-28 2023-10-27 深圳巨湾科技有限公司 Batch construction method for space-time database

Also Published As

Publication number Publication date
CN116012709B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN110136170B (en) Remote sensing image building change detection method based on convolutional neural network
CN113780296B (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN111027547A (en) Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN110175613A (en) Street view image semantic segmentation method based on Analysis On Multi-scale Features and codec models
CN110705457A (en) Remote sensing image building change detection method
CN114092832B (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN113989662B (en) Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN112233129B (en) Deep learning-based parallel multi-scale attention mechanism semantic segmentation method and device
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN113033520A (en) Tree nematode disease wood identification method and system based on deep learning
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN112906662B (en) Method, device and equipment for detecting change of remote sensing image and storage medium
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN116012709B (en) High-resolution remote sensing image building extraction method and system
CN115424017B (en) Building inner and outer contour segmentation method, device and storage medium
CN113269224A (en) Scene image classification method, system and storage medium
CN114283285A (en) Cross consistency self-training remote sensing image semantic segmentation network training method and device
CN114972759A (en) Remote sensing image semantic segmentation method based on hierarchical contour cost function
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN1472634A (en) High spectrum remote sensing image combined weighting random sorting method
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
CN116612382A (en) Urban remote sensing image target detection method and device
CN114418003B (en) Double-image recognition and classification method based on attention mechanism and multi-size information extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant