CN116012709B

CN116012709B - High-resolution remote sensing image building extraction method and system

Info

Publication number: CN116012709B
Application number: CN202310017827.2A
Authority: CN
Inventors: 刘耀辉; 苏飞; 郑国强; 付萍杰; 毕京学; 王晋; 于明洋; 周洁; 靳奉祥
Original assignee: Shandong Jianzhu University
Current assignee: Shandong Jianzhu University
Priority date: 2023-01-06
Filing date: 2023-01-06
Publication date: 2023-07-18
Anticipated expiration: 2043-01-06
Also published as: CN116012709A

Abstract

The invention provides a high-resolution remote sensing image building extraction method and a system, wherein the method comprises the following steps: acquiring aerial image data in a research area; performing image preprocessing and data enhancement on the aviation image data, and dividing the processed data into a training set, a test set and a verification set; respectively constructing a spatial information encoder, a semantic encoder, a dual-attention extractor and an attention characteristic definer; constructing a depth space attention network according to the space information encoder, the semantic encoder, the dual attention extractor and the attention feature definer, and training the depth space attention network according to the training set; respectively testing and verifying the depth space attention network by using the test set and the verification set to obtain an optimal remote sensing image building extraction model; and inputting the data to be detected into a remote sensing image building extraction model to obtain a detection result. The invention can improve the extraction accuracy of the remote sensing image building.

Description

A method and system for extracting buildings from high-resolution remote sensing images

技术领域technical field

本发明涉及图像处理技术领域，特别是涉及一种高分辨率遥感影像建筑物提取方法及系统。The invention relates to the technical field of image processing, in particular to a method and system for extracting buildings from high-resolution remote sensing images.

背景技术Background technique

随着经济活动的快速发展，建筑物的数量大幅增加，构筑建筑物的地区分布范围也更加广泛。与此同时，建筑物外部轮廓形状也呈现大型化、复杂化趋势。因此，为有效、合理的把握区域的经济发展、人口分布状况，更好地对基础设施、服务站点等公共服务点进行安排，高效、准确的提取出高分辨率遥感影像中的建筑物是亟待解决的问题。With the rapid development of economic activities, the number of buildings has increased significantly, and the area where buildings are built has become more widely distributed. At the same time, the external contours of buildings are also showing a trend of becoming larger and more complex. Therefore, in order to effectively and reasonably grasp the regional economic development and population distribution, better arrange public service points such as infrastructure and service stations, and efficiently and accurately extract buildings from high-resolution remote sensing images is an urgent need. solved problem.

在计算机图像处理领域，深度学习经过良好的训练能够有效地提取出目标物，并展现出极好的运算性能。然而，建筑物分割是一项包含建筑物和非建筑物元素的二元分类任务，仅通过深度学习的基本模型中的基础卷积神经网络进行高分辨率遥感影像的建筑物提取还存在若干问题。如面对复杂地物场景时存在一定范围内建筑物精确率不高，面对植被遮挡时存在边界不完整或边界模糊，面对现下模型中的“卷积层”大量累加后提取建筑物的特征过于分散等问题。这些都给高效、准确地完成高分辨率遥感影像建筑物提取造成了极大困难。In the field of computer image processing, deep learning can effectively extract objects after good training, and exhibit excellent computing performance. However, building segmentation is a binary classification task that includes building and non-building elements, and there are still several problems in extracting buildings from high-resolution remote sensing images only through the basic convolutional neural network in the basic model of deep learning . For example, in the face of complex ground object scenes, the accuracy of buildings within a certain range is not high, and in the face of vegetation occlusion, there are incomplete or blurred boundaries. In the face of the "convolutional layer" in the current model, a large number of accumulations are used to extract the buildings. Features are too scattered and so on. All of these have caused great difficulties in efficiently and accurately extracting buildings from high-resolution remote sensing images.

因此，现有技术需要一种提高遥感影像建筑物提取准确率的方法，以解决以上所提的问题。Therefore, the prior art needs a method for improving the accuracy of extracting buildings from remote sensing images, so as to solve the above-mentioned problems.

发明内容Contents of the invention

为了克服现有技术的不足，本发明的目的是提供一种在高分辨率航空影像的复杂场景中，以含噪声、适当尺寸的遥感影像为输入数据，基于深度学习中的注意力机制构建的基于深度空间注意力网络，实现高分辨率遥感影像中建筑物有效提取的方法和系统。In order to overcome the deficiencies of the prior art, the purpose of the present invention is to provide a remote sensing image with noise and appropriate size as input data in complex scenes of high-resolution aerial images, based on the attention mechanism in deep learning. A method and system for effectively extracting buildings from high-resolution remote sensing images based on a deep spatial attention network.

为实现上述目的，本发明提供了如下方案：To achieve the above object, the present invention provides the following scheme:

一种高分辨率遥感影像建筑物提取方法，包括：A method for extracting buildings from high-resolution remote sensing images, comprising:

获取研究区内的航空影像数据；Obtain aerial image data in the study area;

对所述航空影像数据进行图像预处理和数据增强，并将处理后的数据划分为训练集、测试集和验证集；Carrying out image preprocessing and data enhancement to the aerial image data, and dividing the processed data into a training set, a test set and a verification set;

分别构建空间信息编码器、语义编码器、双注意力提取器和注意力特征定义器；Construct spatial information encoder, semantic encoder, dual attention extractor and attention feature definer respectively;

根据所述空间信息编码器、所述语义编码器、所述双注意力提取器和所述注意力特征定义器构建深度空间注意力网络，并根据所述训练集对所述深度空间注意力网络进行训练；Construct a deep spatial attention network according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and perform the deep spatial attention network according to the training set conduct training;

分别利用所述测试集和所述验证集对所述深度空间注意力网络进行测试和验证，得到最优的遥感影像建筑物提取模型；Using the test set and the verification set to test and verify the deep spatial attention network respectively, to obtain the optimal remote sensing image building extraction model;

将待检测数据输入至所述遥感影像建筑物提取模型中，得到检测结果。The data to be detected is input into the remote sensing image building extraction model to obtain a detection result.

优选地，所述获取研究区内的航空影像数据，包括：Preferably, said acquisition of aerial image data in the research area includes:

利用无人机在所述研究区内按照设定飞行路线进行航飞，以采集所述研究区中的遥感数据集；Utilizing the unmanned aerial vehicle to fly in the research area according to the set flight route, so as to collect the remote sensing data sets in the research area;

根据所述无人机拍摄的带有地理坐标的所述遥感数据集生成所述研究区域的遥感影像；Generating a remote sensing image of the research area according to the remote sensing data set with geographic coordinates taken by the drone;

将所述遥感影像进行空间分辨率的调整，并根据空间分辨率达到预设分辨率阈值的遥感影像制作成所述航空影像数据。The spatial resolution of the remote sensing image is adjusted, and the aerial image data is produced according to the remote sensing image whose spatial resolution reaches a preset resolution threshold.

优选地，对所述航空影像数据进行图像预处理和数据增强，并将处理后的数据划分为训练集、测试集和验证集，包括：Preferably, image preprocessing and data enhancement are performed on the aerial image data, and the processed data are divided into training set, test set and verification set, including:

确定所述航空影像数据中的建筑样本；identifying building samples in said aerial imagery data;

根据所述建筑样本确定建筑标签；determining a building label based on said building sample;

对所述航空影像数据和对应的所述建筑标签进行影响裁剪，得到裁剪后的影像；Performing impact clipping on the aerial image data and the corresponding building tags to obtain a clipped image;

将所述裁剪后的影像进行数据增强，得到扩充处理后的图像；performing data enhancement on the cropped image to obtain an expanded image;

按照预设比例将所述扩充后的图像进行随机划分，得到所述训练集、所述测试集和所述验证集。The expanded image is randomly divided according to a preset ratio to obtain the training set, the test set and the verification set.

优选地，所述空间信息编码器包括三个阶段，每个阶段包含有两层浅层结构；每层所述浅层结构由卷积运算、批量归一化和参数校正线性单元组成；每个阶段的第一层和第二层具有相同数量的过滤器和输出特征图大小；所述语义编码器包括通道分割和重划分模块和3个下采样模块；所述双注意力提取器是基于核注意力机制和点积的通道注意机制构建得到的；所述注意力特征定义器包括Projection层和上采样模块；所述注意力特征定义器的每一层均由卷积层、BN层和PReLU组成。Preferably, the spatial information encoder includes three stages, each stage contains two layers of shallow structure; each layer of the shallow structure is composed of convolution operation, batch normalization and parameter correction linear unit; each The first and second layers of the stage have the same number of filters and output feature map size; the semantic encoder includes channel segmentation and re-partitioning modules and 3 downsampling modules; the dual-attention extractor is based on kernel The channel attention mechanism of attention mechanism and dot product is constructed; The attention feature definer includes a Projection layer and an upsampling module; Each layer of the attention feature definer consists of a convolutional layer, a BN layer, and a PReLU composition.

优选地，根据所述空间信息编码器、所述语义编码器、所述双注意力提取器和所述注意力特征定义器构建深度空间注意力网络，并根据所述训练集对所述深度空间注意力网络进行训练，包括：Preferably, a deep spatial attention network is constructed according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and the depth spatial attention is analyzed according to the training set. The attention network is trained, including:

根据所述空间信息编码器构建空间信息编码网络分支网络；Constructing a spatial information encoding network branch network according to the spatial information encoder;

根据所述语义编码器、所述双注意力提取器和所述注意力特征定义器构建高维特征提取分支网络；Constructing a high-dimensional feature extraction branch network according to the semantic coder, the double-attention extractor and the attention feature definer;

根据所述空间信息编码网络分支网络、所述高维特征提取分支网络和预设的特征融合器构建所述深度空间注意力网络；Constructing the deep spatial attention network according to the spatial information encoding network branch network, the high-dimensional feature extraction branch network and the preset feature fusion device;

将所述训练集输入到所述所述深度空间注意力网络中，并调整超参数后进行训练，得到训练好的深度空间注意力网络。The training set is input into the deep spatial attention network, and hyperparameters are adjusted for training to obtain a trained deep spatial attention network.

优选地，分别利用所述测试集和所述验证集对所述深度空间注意力网络进行测试和验证，得到最优的遥感影像建筑物提取模型，包括：Preferably, the depth spatial attention network is tested and verified by using the test set and the verification set respectively to obtain an optimal remote sensing image building extraction model, including:

对所述深度空间注意力网络的训练参数进行初始化；Initialize the training parameters of the deep spatial attention network;

根据所述测试集对所述深度空间注意力网络进行测试，并根据预设的评估指标对测试的深度空间注意力网络进行评估，以得到测试好的深度空间注意力网络；Test the deep spatial attention network according to the test set, and evaluate the tested deep spatial attention network according to preset evaluation indicators, so as to obtain the tested deep spatial attention network;

根据所述验证集对测试好的深度空间注意力网络进行验证，得到所述最优的遥感影像建筑物提取模型。The tested deep spatial attention network is verified according to the verification set, and the optimal remote sensing image building extraction model is obtained.

一种高分辨率遥感影像建筑物提取系统，包括：A system for extracting buildings from high-resolution remote sensing images, including:

影像获取模块，用于获取研究区内的航空影像数据；Image acquisition module, used to acquire aerial image data in the research area;

数据处理模块，用于对所述航空影像数据进行图像预处理和数据增强，并将处理后的数据划分为训练集、测试集和验证集；A data processing module, for performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a test set and a verification set;

模块构建模块，用于分别构建空间信息编码器、语义编码器、双注意力提取器和注意力特征定义器；Module building blocks for building spatial information encoder, semantic encoder, dual attention extractor and attentional feature definer respectively;

网络训练模块，用于根据所述空间信息编码器、所述语义编码器、所述双注意力提取器和所述注意力特征定义器构建深度空间注意力网络，并根据所述训练集对所述深度空间注意力网络进行训练；A network training module for constructing a deep spatial attention network according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and according to the training set for all The deep spatial attention network described above is trained;

模型寻优模块，用于分别利用所述测试集和所述验证集对所述深度空间注意力网络进行测试和验证，得到最优的遥感影像建筑物提取模型；The model optimization module is used to test and verify the deep spatial attention network by using the test set and the verification set respectively, so as to obtain the optimal remote sensing image building extraction model;

检测模块，用于将待检测数据输入至所述遥感影像建筑物提取模型中，得到检测结果。The detection module is used to input the data to be detected into the remote sensing image building extraction model to obtain detection results.

根据本发明提供的具体实施例，本发明公开了以下技术效果：According to the specific embodiments provided by the invention, the invention discloses the following technical effects:

本发明提供了一种高分辨率遥感影像建筑物提取方法及系统，方法包括：获取研究区内的航空影像数据；对所述航空影像数据进行图像预处理和数据增强，并将处理后的数据划分为训练集、测试集和验证集；分别构建空间信息编码器、语义编码器、双注意力提取器和注意力特征定义器；根据所述空间信息编码器、所述语义编码器、所述双注意力提取器和所述注意力特征定义器构建深度空间注意力网络，并根据所述训练集对所述深度空间注意力网络进行训练；分别利用所述测试集和所述验证集对所述深度空间注意力网络进行测试和验证，得到最优的遥感影像建筑物提取模型；将待检测数据输入至所述遥感影像建筑物提取模型中，得到检测结果。本发明能够提高遥感影像建筑物提取准确率。The invention provides a method and system for extracting buildings from high-resolution remote sensing images. The method includes: acquiring aerial image data in the research area; performing image preprocessing and data enhancement on the aerial image data, and converting the processed data to Divided into training set, test set and verification set; Build spatial information encoder, semantic encoder, double attention extractor and attention feature definer respectively; According to described spatial information encoder, described semantic encoder, described Two attention extractors and the attention feature definer construct depth space attention network, and according to described training set, described depth space attention network is trained; The deep spatial attention network is tested and verified to obtain the optimal remote sensing image building extraction model; the data to be detected is input into the remote sensing image building extraction model to obtain the detection result. The invention can improve the accuracy rate of remote sensing image building extraction.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the accompanying drawings required in the embodiments. Obviously, the accompanying drawings in the following description are only some of the present invention. Embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without paying creative labor.

图1为本发明实施例提供的方法流程图；Fig. 1 is the flow chart of the method provided by the embodiment of the present invention;

图2为本发明实施例提供的建筑物提取流程示意图；FIG. 2 is a schematic diagram of a building extraction process provided by an embodiment of the present invention;

图3为本发明实施例提供的数据集展示图；Fig. 3 is a data set display diagram provided by the embodiment of the present invention;

图4为本发明实施例提供的深度注意力模型的框架图；4 is a framework diagram of a deep attention model provided by an embodiment of the present invention;

图5为本发明实施例提供的实验结果图。Fig. 5 is a graph of experimental results provided by the embodiment of the present invention.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

在本文中提及“实施例”意味着，结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例，也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是，本文所描述的实施例可以与其它实施例相结合。Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象，而不是用于描述特定顺序。此外，术语“包括”和“具有”以及它们任何变形，意图在于覆盖不排他的包含。例如包含了一系列步骤、过程、方法等没有限定于已列出的步骤，而是可选地还包括没有列出的步骤，或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤元。The terms "first", "second", "third" and "fourth" in the specification and claims of the present application and the drawings are used to distinguish different objects, rather than to describe a specific order . Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, including a series of steps, processes, methods, etc. are not limited to the listed steps, but optionally also include steps that are not listed, or optionally also include inherent aspects of these processes, methods, products or equipment. other steps.

本发明的目的是提供一种高分辨率遥感影像建筑物提取方法及系统，能够提高遥感影像建筑物提取准确率。The purpose of the present invention is to provide a high-resolution remote sensing image building extraction method and system, which can improve the accuracy of remote sensing image building extraction.

为使本发明的上述目的、特征和优点能够更加明显易懂，下面结合附图和具体实施方式对本发明作进一步详细的说明。In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

图1为本发明实施例提供的方法流程图，如图1所示，本发明提供了一种高分辨率遥感影像建筑物提取方法，包括：Fig. 1 is a flow chart of the method provided by the embodiment of the present invention. As shown in Fig. 1, the present invention provides a method for extracting buildings from high-resolution remote sensing images, including:

步骤100：获取研究区内的航空影像数据；Step 100: Obtain aerial image data in the research area;

步骤200：对所述航空影像数据进行图像预处理和数据增强，并将处理后的数据划分为训练集、测试集和验证集；Step 200: Perform image preprocessing and data enhancement on the aerial image data, and divide the processed data into training set, test set and verification set;

步骤300：分别构建空间信息编码器、语义编码器、双注意力提取器和注意力特征定义器；Step 300: Constructing a spatial information encoder, a semantic encoder, a double-attention extractor and an attention feature definer respectively;

步骤400：根据所述空间信息编码器、所述语义编码器、所述双注意力提取器和所述注意力特征定义器构建深度空间注意力网络，并根据所述训练集对所述深度空间注意力网络进行训练；Step 400: Construct a deep spatial attention network according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and classify the depth spatial attention according to the training set attention network for training;

步骤500：分别利用所述测试集和所述验证集对所述深度空间注意力网络进行测试和验证，得到最优的遥感影像建筑物提取模型；Step 500: Using the test set and the verification set to test and verify the deep spatial attention network respectively, to obtain the optimal remote sensing image building extraction model;

步骤600：将待检测数据输入至所述遥感影像建筑物提取模型中，得到检测结果。Step 600: Input the data to be detected into the remote sensing image building extraction model to obtain a detection result.

如图2所示，本实施例中的建筑物提取流程包括以下步骤：As shown in Figure 2, the building extraction process in this embodiment includes the following steps:

步骤1，通过无人机采集的高分辨率航空影像，制作Weinan航空影像数据，包括以下子步骤：Step 1, making Weinan aerial image data through the high-resolution aerial image collected by UAV, including the following sub-steps:

步骤1.1，在实地调查的基础上，选取陕西省渭南市富平县贺兰村作为研究区域，通过无人机对研究区航飞，采集研究区中的高分辨率遥感影像后制作数据集。Step 1.1. On the basis of field investigation, select Helan Village, Fuping County, Weinan City, Shaanxi Province as the research area, and use drones to fly over the research area to collect high-resolution remote sensing images in the research area to create a dataset.

步骤1.2，本研究中使用的实验数据来自小型四旋翼无人机(UAV)，可用于自主规划研究区域内的飞行路径。其中，飞行路线是从研究区域的东南角到西北角，沿“S路线”拍摄照片。In step 1.2, the experimental data used in this study comes from a small quadrotor unmanned aerial vehicle (UAV), which can be used to autonomously plan flight paths within the study area. Among them, the flight route is from the southeast corner to the northwest corner of the research area, and the photos are taken along the "S route".

步骤1.3，将无人机拍摄的带有地理坐标的遥感影像整体导入相关软件后，可生成整片研究区域的遥感影像，再将影像进行空间分辨率的调整，使用空间分辨率达到2.3厘米的无人机数据制作成Weinan航空影像数据。Step 1.3, after importing the remote sensing images with geographical coordinates taken by the UAV into the relevant software, the remote sensing images of the entire research area can be generated, and then the spatial resolution of the images is adjusted, using a camera with a spatial resolution of 2.3 cm UAV data is made into Weinan aerial image data.

进一步地，本实施例中的步骤2为：基于步骤1制作的Weinan航空影像数据，先进行图像预处理和数据增强后再将其按比例划分成训练集、测试集和验证集，包括以下子步骤：Further, step 2 in this embodiment is: based on the Weinan aerial image data produced in step 1, first perform image preprocessing and data enhancement and then divide it into training set, test set and verification set in proportion, including the following step:

步骤2.1，在ArcGIS 10.2的软件中，通过手动方式建筑样本的选取，其中包含研究区域内的各种类型的建筑，将标注的矢量文件通过“矢量转栅格”工具转化为“.GIF”格式的影像标签label，进而完成影像标注的建筑标签，如图3所示。Step 2.1, in the ArcGIS 10.2 software, manually select architectural samples, which contain various types of buildings in the study area, and convert the marked vector files into ".GIF" format through the "Vector to Raster" tool The image label label, and then complete the building label of image annotation, as shown in Figure 3.

步骤2.2，为了便于深度学习计算，对原始图像及对应的标签label进行影像裁剪，在ArcGIS 10.2的软件中将研究区的整幅遥感影像及对应的标签label进行均匀裁剪，裁剪成尺寸为512×512像素的小尺寸影像。In step 2.2, in order to facilitate deep learning calculations, image cropping is performed on the original image and the corresponding label, and the entire remote sensing image of the research area and the corresponding label are uniformly cropped in the ArcGIS 10.2 software, and the cropped size is 512× Small image size of 512 pixels.

步骤2.3，将小尺寸影像再进行数据增强以扩充数据量，对每幅影像及对应的标签label进行旋转90°、180°和270°，再进行概率为0.5的随机水平和垂直翻转的操作，共生成了830幅512×512像素的图像。Step 2.3: Carry out data enhancement on small-sized images to expand the amount of data, rotate each image and the corresponding label label by 90°, 180°, and 270°, and then perform random horizontal and vertical flip operations with a probability of 0.5. A total of 830 images of 512 × 512 pixels were generated.

步骤2.4，将830幅512×512像素的图像按照6:2:2的比例随机划分到训练集、测试集和验证集中。In step 2.4, 830 images of 512 × 512 pixels are randomly divided into training set, test set and verification set according to the ratio of 6:2:2.

具体的，本实施例中的步骤3为构建深度空间注意力网络所需的四大组件，包括以下子步骤：Concretely, step 3 in the present embodiment is the four major components required for constructing the deep spatial attention network, including the following sub-steps:

步骤3.1，构建空间信息编码器SIEN，有效地结合高层语义和底层细节，提高了语义分割的性能。设计了具有三个阶段共6层的浅层SIEN，其中每层由卷积运算(Conv)、批量归一化(BN)和参数校正线性单元(PReLU)组成。每个阶段的第一层和第二层具有相同数量的过滤器(步长为2)和输出特征图大小。因此，SIEN提取了原始输入的1/8，提高了信道容量，丰富了空间细节。In step 3.1, the spatial information encoder SIEN is constructed, which effectively combines high-level semantics and low-level details to improve the performance of semantic segmentation. A shallow SIEN with three stages and six layers is designed, where each layer consists of convolution operation (Conv), batch normalization (BN) and parameter rectified linear unit (PReLU). The first and second layers of each stage have the same number of filters (with a stride of 2) and output feature map size. Therefore, SIEN extracts 1/8 of the original input, improves the channel capacity and enriches the spatial details.

步骤3.2，构建语义编码器SEN，通过应用5个下采样来设置输入图像分辨率的比率，最终输出的大小仅为输入图像大小的1/32，这可能导致空间细节的显著损失。SEN基于通道分割和重划分模块FDSS_nbt模块并应用3个下采样(输出分辨率仅为原始图像分辨率的1/8，具有32、64和128个通道)。在第3阶段和第4阶段，引入特殊卷积以增加感受野。Step 3.2, constructing the semantic encoder SEN, by applying 5 downsampling to set the ratio of input image resolution, the size of the final output is only 1/32 of the input image size, which may lead to significant loss of spatial details. SEN is based on the channel segmentation and resegmentation module FDSS_nbt module and applies 3 downsampling (the output resolution is only 1/8 of the original image resolution with 32, 64 and 128 channels). In stages 3 and 4, special convolutions are introduced to increase the receptive field.

步骤3.3，构建了双注意力提取器DAB，在影像的空间维度方面，设计了一种基于核注意力机制(KAM)。在通道维度上，使用基于点积的通道注意机制(CAM)，输入通道C的数量通常远小于特征图中包含的像素数量，多个通道的Softmax函数的复杂度不高。双注意块(DAM)组合了使用模拟位置依赖性的KAM和模拟通道依赖性的CAM，可以增强每一层特征图的提取特征对目标的辨识能力。In step 3.3, the dual-attention extractor DAB is constructed, and a kernel-based attention mechanism (KAM) is designed in terms of the spatial dimension of the image. In the channel dimension, using the dot product-based channel attention mechanism (CAM), the number of input channels C is usually much smaller than the number of pixels contained in the feature map, and the complexity of the Softmax function for multiple channels is not high. The Dual Attention Block (DAM) combines the KAM that simulates the position dependence and the CAM that simulates the channel dependence, which can enhance the ability of the extracted features of each layer feature map to recognize the target.

步骤3.4，构建注意力特征定义器AFRB，具体包含Projection层和上采样，每一层都由卷积层、BN层和PReLU组成。设置输入通道数等于步骤3.3中的双注意力提取器DAM的输出通道数，设置输出通道数为以下公式：Step 3.4, build the attention feature definer AFRB, which specifically includes the Projection layer and upsampling, each layer consists of a convolutional layer, a BN layer, and a PReLU. Set the number of input channels equal to the number of output channels of the double-attention extractor DAM in step 3.3, and set the number of output channels to the following formula:

AFRB＝conv3(conv3(F_att))；AFRB = conv3(conv3(F _att ));

其中，F_att表示DAM的输出通道数，conv3表示第三层的融合卷积，AFRB表示为建筑物的数量。Among them, F _att represents the number of output channels of DAM, conv3 represents the fusion convolution of the third layer, and AFRB represents the number of buildings.

进一步地，本实施例中步骤4为：通过步骤3中构建的四大组件组合成遥感影像建筑物提取算法——基于深度空间注意力网络，再对建筑物提取的训练集进行训练，包括以下子步骤：Further, step 4 in this embodiment is: combine the four major components constructed in step 3 into a remote sensing image building extraction algorithm based on a deep spatial attention network, and then train the training set for building extraction, including the following Substeps:

步骤4.1，如图4所示的深度空间注意力网络架构的概述，其中的两个分支网络是空间信息编码网络分支和高维特征提取分支，空间信息编码网络分支由空间信息编码器SIEN(图4中a所示)构成，高维特征提取分支由语义编码器SEN(图4中b所示)作为构建高级特征提取分支的主要构建块，SEN由特征融合器FFM(图4中c所示)、双注意力提取器DAB(图4中d所示)和注意力特征定义器AFRB(图4中e所示)组成。Step 4.1, an overview of the deep spatial attention network architecture shown in Figure 4, where the two branch networks are the spatial information encoding network branch and the high-dimensional feature extraction branch, and the spatial information encoding network branch is composed of the spatial information encoder SIEN (Fig. shown in a in Figure 4), the high-dimensional feature extraction branch is composed of the semantic encoder SEN (shown in b in Figure 4) as the main building block for constructing the advanced feature extraction branch, and SEN is composed of the feature fuser FFM (shown in c in Figure 4 ), a dual attention extractor DAB (shown in d in Figure 4) and an attention feature definer AFRB (shown in e in Figure 4).

步骤4.2，将输入图像输入到SEN模块，生成4个空间分辨率降低的特征图(F_h,1，F_h,2，F_h,3，F_h,4)。特征映射F_h,3，F_h,4具有相同数量的通道，具有不同的扩张率，以扩大感受野卷积滤波器，采用1/8降采样策略，将高分辨率特征图F_h,3和F_h,4的空间特征相结合，得到最终的模型。Step 4.2, input the input image to the SEN module to generate 4 feature maps with reduced spatial resolution (F _h,1 , F _h,2 , F _h,3 , F _h,4 ). The feature maps F _h,3 and F _h,4 have the same number of channels with different expansion rates to expand the receptive field convolution filter, and adopt a 1/8 downsampling strategy to convert the high-resolution feature map F _h,3 Combined with the spatial features of F _h,4 , the final model is obtained.

步骤4.3，将训练集中的images和labels两个文件夹里的影像和对应的标签输入到构建的深度空间注意力网络模型中，调整超参数后进行训练，可得到一个有效的训练参数模型。In step 4.3, the images and corresponding labels in the images and labels folders in the training set are input into the constructed deep spatial attention network model, and training is performed after adjusting the hyperparameters to obtain an effective training parameter model.

具体的，本实施例中的步骤5为：利用深度学习语义分割模型对测试集数据进行随机测试，直至随机测试达到满意效果，包括以下子步骤：Specifically, step 5 in this embodiment is: use the deep learning semantic segmentation model to perform random testing on the test set data until the random testing achieves satisfactory results, including the following sub-steps:

步骤5.1，实验使用PyTorch深度学习框架进行，实验环境通过PyCharm软件搭建，在Python语言中进行构建。Step 5.1, the experiment is carried out using the PyTorch deep learning framework, and the experimental environment is built by the PyCharm software, which is constructed in the Python language.

步骤5.2，按照深度学习语义分割的常用评价标准，采用了五个最常见的评估指标进行质量评估，包括“总体准确性(OA)”、“精度(Precision)”、“召回率(Recall)”、“F1分数(F1)”和“交并集(IoU)”，具体的评价指标的公式如下：Step 5.2, according to the common evaluation criteria of deep learning semantic segmentation, five most common evaluation indicators are used for quality evaluation, including "overall accuracy (OA)", "precision (Precision)", "recall rate (Recall)" , "F1 Score (F1)" and "Intersection Union (IoU)", the specific evaluation index formula is as follows:

其中，P表示阳性样本数，N表示阴性样本数，TP表示真阳性数，TN表示真阴性数，FP表示假阳性数，FN表示假阴性数。Among them, P represents the number of positive samples, N represents the number of negative samples, TP represents the number of true positives, TN represents the number of true negatives, FP represents the number of false positives, and FN represents the number of false negatives.

步骤5.3，所有实验均在配备CPU E5-2630v4(2.20GHz)和两台NVIDIAGeForce GTX 1080Ti(11GB)的服务器上进行。Step 5.3, all experiments are equipped with CPU E5-2630v4 (2.20GHz) and two NVIDIA GeForce GTX 1080Ti (11GB) servers.

步骤5.4，深度空间注意力网络模型训练时输入数据的参数设置epoch为100，batches为16，优化器选择Adam，初始学习率为0.0001，权重衰减为0.0001，损失函数为交叉熵损失函数，求解最小化损失函数的网络参数是基于随机梯度下降算法。Step 5.4, when training the deep spatial attention network model, set the parameters of the input data to epoch to 100, batches to 16, the optimizer to select Adam, the initial learning rate to 0.0001, the weight decay to 0.0001, and the loss function to be the cross-entropy loss function to find the minimum The network parameterization loss function is based on the stochastic gradient descent algorithm.

进一步地，本实施例中步骤6为按照步骤5设定的深度空间注意力网络模型训练时输入数据的参数，将训练集中的images和对应的labels输入到由Python软件构建的模型中进行训练，得到的参数模型命名为“epoch100_batches16_learning0.0001.pth”。Further, in the present embodiment, step 6 is to input data parameters according to the deep spatial attention network model training set in step 5, and the images and corresponding labels in the training set are input into the model constructed by Python software for training, The resulting parametric model is named "epoch100_batches16_learning0.0001.pth".

步骤6.2，使用这个参数模型对验证集中不输入标签label的高分辨率遥感影像直接进行建筑物提取的预测，再根据五个评估指标进行模型的精度评估。In step 6.2, use this parameter model to directly predict the building extraction for the high-resolution remote sensing images that do not input the label label in the verification set, and then evaluate the accuracy of the model according to the five evaluation indicators.

步骤6.3，如果此参数模型的精度未达到预期的效果(IoU达到85％及以上)，则需要修改模型的超参数内容，重新输入训练集的数据，使用测试集的数据进行测试并保存精度最好的参数模型，再对验证集的影像进行精度评估，最后保存下评价指标达到最好的参数模型即是所需的深度空间注意力网络模型，如图5所示是提取出的高分辨率遥感影像建筑物的高精度结果。Step 6.3, if the accuracy of this parameter model does not meet the expected effect (IoU reaches 85% and above), you need to modify the hyperparameter content of the model, re-input the data of the training set, use the data of the test set for testing and save the highest accuracy A good parameter model, then evaluate the accuracy of the images in the verification set, and finally save the parameter model with the best evaluation index is the required deep spatial attention network model, as shown in Figure 5 is the extracted high-resolution High-precision results of remote sensing image buildings.

对应上述方法，本实施例还提供了一种高分辨率遥感影像建筑物提取系统，包括：Corresponding to the above method, this embodiment also provides a high-resolution remote sensing image building extraction system, including:

本说明书中各个实施例采用递进的方式描述，每个实施例重点说明的都是与其他实施例的不同之处，各个实施例之间相同相似部分互相参见即可。对于实施例公开的系统而言，由于其与实施例公开的方法相对应，所以描述的比较简单，相关之处参见方法部分说明即可。Each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same and similar parts of each embodiment can be referred to each other. As for the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and for the related information, please refer to the description of the method part.

本文中应用了具体个例对本发明的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本发明的思想，在具体实施方式及应用范围上均会有改变之处。综上所述，本说明书内容不应理解为对本发明的限制。In this paper, specific examples have been used to illustrate the principle and implementation of the present invention. The description of the above embodiments is only used to help understand the method of the present invention and its core idea; meanwhile, for those of ordinary skill in the art, according to the present invention Thoughts, there will be changes in specific implementation methods and application ranges. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims

1. A method for extracting buildings from high-resolution remote sensing images, comprising:

Obtain aerial image data in the study area;

Carrying out image preprocessing and data enhancement to the aerial image data, and dividing the processed data into a training set, a test set and a verification set;

Construct spatial information encoder, semantic encoder, dual attention extractor and attention feature definer respectively;

Construct a deep spatial attention network according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and perform the deep spatial attention network according to the training set conduct training;

Using the test set and the verification set to test and verify the deep spatial attention network respectively, to obtain the optimal remote sensing image building extraction model;

Inputting the data to be detected into the remote sensing image building extraction model to obtain a detection result;

The spatial information encoder includes three stages, each stage contains two layers of shallow structure; each layer of the shallow structure is composed of convolution operation, batch normalization and parameter correction linear unit; the first layer of each stage The first layer and the second layer have the same number of filters and output feature map size; the semantic encoder includes channel segmentation and repartitioning modules and 3 downsampling modules; the dual attention extractor is based on the kernel attention mechanism and the channel attention mechanism of dot product are constructed; the attention feature definer includes a Projection layer and an upsampling module; each layer of the attention feature definer is composed of a convolutional layer, a BN layer and a PReLU;

Construct a deep spatial attention network according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and perform the deep spatial attention network according to the training set Conduct training, including:

Constructing a spatial information encoding network branch network according to the spatial information encoder;

Constructing a high-dimensional feature extraction branch network according to the semantic coder, the double-attention extractor and the attention feature definer;

Constructing the deep spatial attention network according to the spatial information encoding network branch network, the high-dimensional feature extraction branch network and the preset feature fusion device;

The training set is input into the deep spatial attention network, and hyperparameters are adjusted for training to obtain a trained deep spatial attention network.

2. The high-resolution remote sensing image building extraction method according to claim 1, wherein said acquisition of aerial image data in the study area includes:

Utilizing the unmanned aerial vehicle to fly in the research area according to the set flight route, so as to collect the remote sensing data sets in the research area;

Generating a remote sensing image of the research area according to the remote sensing data set with geographic coordinates taken by the drone;

The spatial resolution of the remote sensing image is adjusted, and the aerial image data is produced according to the remote sensing image whose spatial resolution reaches a preset resolution threshold.

3. the high-resolution remote sensing image building extraction method according to claim 1, is characterized in that, image preprocessing and data enhancement are carried out to described aerial image data, and the data after processing is divided into training set, test set and a validation set, including:

identifying building samples in said aerial imagery data;

determining a building label based on said building sample;

Performing impact clipping on the aerial image data and the corresponding building tags to obtain a clipped image;

performing data enhancement on the cropped image to obtain an expanded image;

The expanded image is randomly divided according to a preset ratio to obtain the training set, the test set and the verification set.

4. the high-resolution remote sensing image building extraction method according to claim 1, is characterized in that, utilizes described test set and described verification set to carry out test and verification to described deep space attention network respectively, obtains optimal The remote sensing image building extraction model, including:

Initialize the training parameters of the deep spatial attention network;

Test the deep spatial attention network according to the test set, and evaluate the tested deep spatial attention network according to preset evaluation indicators, so as to obtain the tested deep spatial attention network;

The tested deep spatial attention network is verified according to the verification set, and the optimal remote sensing image building extraction model is obtained.

5. A high-resolution remote sensing image building extraction system, characterized in that it comprises:

Image acquisition module, used to acquire aerial image data in the research area;

A data processing module, for performing image preprocessing and data enhancement on the aerial image data, and dividing the processed data into a training set, a test set and a verification set;

Module building blocks for building spatial information encoder, semantic encoder, dual attention extractor and attentional feature definer respectively;

A network training module for constructing a deep spatial attention network according to the spatial information encoder, the semantic encoder, the double-attention extractor and the attention feature definer, and according to the training set for all The deep spatial attention network described above is trained;

The model optimization module is used to test and verify the deep spatial attention network by using the test set and the verification set respectively, so as to obtain the optimal remote sensing image building extraction model;

A detection module, configured to input the data to be detected into the remote sensing image building extraction model to obtain a detection result;