CN108764039A

CN108764039A - Building extracting method, medium and the computing device of neural network, remote sensing image

Info

Publication number: CN108764039A
Application number: CN201810373725.3A
Authority: CN
Inventors: 李祥; 彭玲; 胡媛; 肖莎
Original assignee: Institute of Remote Sensing and Digital Earth of CAS
Current assignee: Institute of Remote Sensing and Digital Earth of CAS
Priority date: 2018-04-24
Filing date: 2018-04-24
Publication date: 2018-11-06
Anticipated expiration: 2038-04-24
Also published as: CN108764039B

Abstract

The invention discloses a neural network, a building extraction method of a remote sensing image, a medium and a computing device. The disclosed neural network is used for building extraction of remote sensing images, including: the input layer in the VGG network, the first to fifth convolutional layers, and the first to fourth pooling layers; the first single-scale fusion layer, the second The input end of a single-scale fusion layer is connected to the output end of the first convolutional layer; the second to fifth single-scale fusion layers, and the input ends of the second to fifth single-scale fusion layers are respectively connected to the second to fifth volumes The output end of the product layer; the first to fourth upsampling layers, the input ends of the first to fourth upsampling layers are respectively connected to the output ends of the second to fifth single-scale fusion layers; multi-scale splicing fusion layer, multi-scale The input end of the splicing and fusion layer is connected to the output ends of the first single-scale fusion layer and the first to fourth upsampling layers; an output layer. The disclosed neural network can effectively process buildings with dense distribution and various scales, and improve the precision of automatic building extraction.

Description

Neural network, building extraction method, medium and computing equipment of remote sensing image

技术领域technical field

本发明涉及神经网络和图像处理领域，尤其涉及一种神经网络、遥感影像的建筑物提取方法、介质及计算设备。The invention relates to the fields of neural network and image processing, in particular to a neural network, a method for extracting buildings from remote sensing images, a medium and a computing device.

背景技术Background technique

随着传感器技术的飞速发展，遥感影像的空间分辨率不断提高。受到计算机视觉领域深度学习算法的启发，目前学者多采用卷积神经网络来实现遥感图像的语义分割任务。虽然一些最前沿的方法已经在遥感影像语义分割任务上取得了不错的效果，但是都没有考虑遥感影像自身的一些特点。首先，常规计算机视觉语义分割任务中，待检测图像上一般只有少数几个到几十个目标，目标之间分布较为松散，见图1(a)。然而在遥感影像中，建筑分布一般较为密集，特别是在居民地区域，见图1(b)。其次，常规计算机视觉语义分割任务中，待检测目标尺寸一般较大，长宽一般在几十到几百像素，而遥感影像中建筑物尺寸一般要小很多，尺度(不同建筑物的影像自身所对应的像素数)变化也较大，见图1(c)。With the rapid development of sensor technology, the spatial resolution of remote sensing images has been continuously improved. Inspired by deep learning algorithms in the field of computer vision, scholars currently use convolutional neural networks to implement semantic segmentation tasks for remote sensing images. Although some cutting-edge methods have achieved good results in semantic segmentation tasks of remote sensing images, they have not considered some characteristics of remote sensing images themselves. First of all, in conventional computer vision semantic segmentation tasks, there are generally only a few to dozens of targets on the image to be detected, and the distribution of targets is relatively loose, as shown in Figure 1(a). However, in remote sensing images, the distribution of buildings is generally dense, especially in residential areas, as shown in Figure 1(b). Secondly, in conventional computer vision semantic segmentation tasks, the size of the target to be detected is generally large, and the length and width are generally tens to hundreds of pixels, while the size of buildings in remote sensing images is generally much smaller, and the scale (the image itself of different buildings) The corresponding number of pixels) also changes greatly, as shown in Figure 1(c).

为了保证语义分割的准确性，首先要保证建筑物(特征)提取的准确性。尽管现有技术中已经存在一些结合卷积神经网络来提取遥感影像中的某些特定目标的技术方案。例如，公开号为CN107025440A的专利申请中就公开了一种基于全卷积神经网络的遥感图像道路提取方法，所公开的技术方案使用全卷积神经网络实现结构性输出，可以充分挖掘遥感图像中道路的二维几何结构相关性。然而，现有技术中还没有能够充分利用卷积神经网络来提取遥感影像中不同尺度的建筑物的特征信息的有效方法。In order to ensure the accuracy of semantic segmentation, the accuracy of building (feature) extraction must first be ensured. Although there are already some technical solutions in the prior art that combine convolutional neural networks to extract some specific targets in remote sensing images. For example, the patent application with the publication number CN107025440A discloses a method for extracting roads from remote sensing images based on fully convolutional neural networks. The disclosed technical solution uses fully convolutional neural networks to achieve structural output, which can fully tap the 2D geometry dependency of roads. However, there is no effective method in the prior art that can make full use of convolutional neural networks to extract feature information of buildings of different scales in remote sensing images.

因此，需要提出新的技术方案来结合卷积神经网络对不同尺度下的影像特征进行融合，从而有效提高不同尺度建筑物的自动化提取的精度。Therefore, it is necessary to propose a new technical solution to combine the convolutional neural network to fuse image features at different scales, so as to effectively improve the accuracy of automatic extraction of buildings of different scales.

发明内容Contents of the invention

根据本发明的神经网络系统，用于遥感影像的建筑物提取，包括：According to the neural network system of the present invention, it is used for building extraction of remote sensing images, including:

VGG网络中的输入层、第一至第五卷积层、第一至第四池化层；The input layer, the first to fifth convolutional layers, and the first to fourth pooling layers in the VGG network;

第一单尺度融合层，第一单尺度融合层的输入端连接至第一卷积层的输出端，用于融合第一卷积层所输出的第一尺度多通道特征图并输出融合后的第一尺度融合单通道特征图；The first single-scale fusion layer, the input end of the first single-scale fusion layer is connected to the output end of the first convolutional layer, which is used to fuse the first-scale multi-channel feature map output by the first convolutional layer and output the fused First-scale fusion of single-channel feature maps;

第二至第五单尺度融合层，第二至第五单尺度融合层的输入端分别连接至第二至第五卷积层的输出端，用于分别融合第二至第五卷积层所输出的第二至第五尺度多通道特征图并分别输出融合后的第二至第五尺度融合单通道特征图；The second to fifth single-scale fusion layers, the input ends of the second to fifth single-scale fusion layers are respectively connected to the output ends of the second to fifth convolutional layers, for respectively fusing the second to fifth convolutional layers Output the second to fifth scale multi-channel feature maps and output the fused second to fifth scale single-channel feature maps respectively;

第一至第四上采样层，第一至第四上采样层的输入端分别连接至第二至第五单尺度融合层的输出端；The first to fourth upsampling layers, the input ends of the first to fourth upsampling layers are respectively connected to the output ends of the second to fifth single-scale fusion layers;

多尺度拼接融合层，多尺度拼接融合层的输入端连接至第一单尺度融合层、第一至第四上采样层的输出端，用于融合第一单尺度融合层、第一至第四上采样层所输出的特征图并输出融合后的多尺度融合单通道特征图；Multi-scale splicing fusion layer, the input end of the multi-scale splicing fusion layer is connected to the output end of the first single-scale fusion layer, the first to the fourth upsampling layer, and is used to fuse the first single-scale fusion layer, the first to the fourth The feature map output by the upsampling layer and output the fused multi-scale fusion single-channel feature map;

输出层，输出层的输入端连接至多尺度拼接融合层的输出端，用于基于多尺度融合单通道特征图输出建筑物特征图，The output layer, the input end of the output layer is connected to the output end of the multi-scale splicing and fusion layer, which is used to output the building feature map based on the multi-scale fusion single-channel feature map,

其中，第一单尺度融合层、第一至第四上采样层的输出端及多尺度拼接融合层的输出端所输出的是与遥感影像的分辨率相同的二维单通道特征图。Among them, the outputs of the first single-scale fusion layer, the output terminals of the first to fourth up-sampling layers, and the output terminals of the multi-scale splicing and fusion layer are two-dimensional single-channel feature maps with the same resolution as the remote sensing image.

根据本发明的神经网络系统，还包括：According to the neural network system of the present invention, it also includes:

第一至第四裁剪层，第一至第四裁剪层分别设置在第一至第四上采样层和多尺度拼接融合层之间，用于分别将第一至第四上采样层所输出的特征图裁剪至与原始输入影像相同的分辨率。The first to fourth clipping layers, the first to fourth clipping layers are respectively set between the first to fourth upsampling layers and the multi-scale splicing and fusion layer, for respectively outputting the first to fourth upsampling layers The feature maps are cropped to the same resolution as the original input image.

根据本发明的神经网络系统，在第一至第五卷积层之后还包括下列层：According to the neural network system of the present invention, the following layers are also included after the first to fifth convolutional layers:

第一至第五ReLU层、第一至第五Batch Normalization层、第一至第五Dropout层，用于避免过拟合，提高神经网络系统的泛化能力。The first to fifth ReLU layers, the first to fifth Batch Normalization layers, and the first to fifth Dropout layers are used to avoid overfitting and improve the generalization ability of the neural network system.

根据本发明的用于遥感影像的建筑物提取方法，包括：The building extraction method for remote sensing images according to the present invention includes:

构建经训练的如上文所述的神经网络系统；Construct a trained neural network system as described above;

使用经训练的神经网络系统来获取遥感影像所对应的建筑物特征图。The trained neural network system is used to obtain the building feature map corresponding to the remote sensing image.

根据本发明的用于遥感影像的建筑物提取方法，在构建经训练的如上文所述的神经网络系统的步骤之前，还包括：According to the building extraction method for remote sensing images of the present invention, before the step of constructing the trained neural network system as described above, it also includes:

使用包含建筑物的遥感训练影像及与遥感训练影像对应的标签影像的数据集对神经网络系统进行训练，以得到经训练的神经网络系统。The neural network system is trained by using a data set including remote sensing training images of buildings and label images corresponding to the remote sensing training images to obtain a trained neural network system.

根据本发明的用于遥感影像的建筑物提取方法，在获取遥感影像所对应的建筑物特征图之后，使用阈值法得到最终建筑物分布图。According to the building extraction method for remote sensing images of the present invention, after obtaining the building feature map corresponding to the remote sensing image, a threshold method is used to obtain the final building distribution map.

根据本发明的用于遥感影像的建筑物提取方法，在对神经网络系统进行训练时，使用Sigmoid Cross Entropy Loss损失函数并使用随机梯度下降算法。According to the building extraction method for remote sensing images of the present invention, the Sigmoid Cross Entropy Loss loss function and stochastic gradient descent algorithm are used when training the neural network system.

根据本发明的计算机可读存储介质，该存储介质上存储有计算机程序，程序被处理器执行时实现如上文所述的方法的步骤。According to the computer-readable storage medium of the present invention, a computer program is stored on the storage medium, and when the program is executed by a processor, the steps of the method as described above are realized.

根据本发明的计算设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，处理器执行程序时实现如上文所述的方法的步骤。The computing device according to the present invention includes a memory, a processor, and a computer program stored on the memory and operable on the processor. When the processor executes the program, the steps of the method described above are realized.

根据本发明的上述技术方案，直接利用深层卷积神经网络中的多尺度信息，能够有效处理分布密集且尺度多样的建筑物，提高建筑物自动化提取的精度。另外，根据本发明的上述技术方案使用整幅影像作为输入，直接输出分割(即，建筑物提取)结果而不需要进行有重叠的切片，大幅提高了建筑物提取的效率。According to the above technical solution of the present invention, the multi-scale information in the deep convolutional neural network can be directly used to effectively process buildings with dense distribution and various scales, and improve the accuracy of automatic building extraction. In addition, according to the above technical solution of the present invention, the entire image is used as input, and the segmentation (ie, building extraction) result is directly output without overlapping slices, which greatly improves the efficiency of building extraction.

附图说明Description of drawings

并入到说明书中并且构成说明书的一部分的附图示出了本发明的实施例，并且与相关的文字描述一起用于解释本发明的原理。在这些附图中，类似的附图标记用于表示类似的要素。下面描述中的附图是本发明的一些实施例，而不是全部实施例。对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，可以根据这些附图获得其他的附图。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the embodiments of the invention and together with the related description serve to explain the principles of the invention. In the drawings, like reference numerals are used to denote like elements. The drawings in the following description are some, but not all, embodiments of the present invention. Those skilled in the art can obtain other drawings based on these drawings without creative efforts.

图1示出了常规待检测图像和本发明所要检测的遥感影像的示意图。Fig. 1 shows a schematic diagram of a conventional image to be detected and a remote sensing image to be detected by the present invention.

图2示例性地示出了根据本发明的神经网络系统的示意结构图。Fig. 2 exemplarily shows a schematic structural diagram of a neural network system according to the present invention.

图3示例性地示出了根据本发明的用于遥感影像的建筑物提取方法的示意流程图。Fig. 3 exemplarily shows a schematic flowchart of a method for extracting buildings from remote sensing images according to the present invention.

图4示例性地示出了图2所示的神经网络系统的各个层输出的不同影像图。FIG. 4 exemplarily shows different image maps output by each layer of the neural network system shown in FIG. 2 .

图5示例性地示出了一幅原始卫星遥感影像、其对应的真实标签影像图、以及根据本发明的技术方案所实际输出的建筑物特征图。Fig. 5 exemplarily shows an original satellite remote sensing image, its corresponding real label image map, and a building feature map actually output according to the technical solution of the present invention.

图6示例性地示出了根据本发明的技术方案在不同松弛系数下的准确率-召回率曲线。FIG. 6 exemplarily shows the precision-recall curves under different relaxation coefficients according to the technical solution of the present invention.

具体实施方式Detailed ways

为使本发明实施例的目的、技术方案和优点更加清楚，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明的一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。需要说明的是，在不冲突的情况下，本申请中的实施例及实施例中的特征可以相互任意组合。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments It is a part of embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention. It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined arbitrarily with each other.

如背景技术部分结合图1所做的描述，由于遥感影像与常规图像之间存在上述区别，需要提出新的技术方案来结合卷积神经网络对不同尺度下的影像特征进行融合，从而有效提高不同尺度的建筑物的自动化提取的精度。As described in the background technology section in conjunction with Figure 1, due to the above-mentioned differences between remote sensing images and conventional images, it is necessary to propose a new technical solution to combine convolutional neural networks to fuse image features at different scales, so as to effectively improve the image quality of different images. Accuracy of automated extraction of scale buildings.

如图2所示，根据本发明的神经网络系统，用于遥感影像的建筑物提取，包括：As shown in Figure 2, according to the neural network system of the present invention, it is used for the building extraction of remote sensing images, including:

VGG网络中的输入层(对应于图2中的“input image”)、第一至第五卷积层(分别对应于图2中的“Conv1_2、Conv2_2、Conv3_3、Conv4_3、Conv5_3”所在的层集)、第一至第四池化层(分别对应于图2中的“Pool1、Pool2、Pool3、Pool4”层)；The input layer in the VGG network (corresponding to "input image" in Figure 2), the first to fifth convolutional layers (respectively corresponding to the layer set of "Conv1_2, Conv2_2, Conv3_3, Conv4_3, Conv5_3" in Figure 2 ), the first to fourth pooling layers (respectively corresponding to the "Pool1, Pool2, Pool3, Pool4" layers in Figure 2);

第一单尺度融合层(对应于图2中水平分布的左侧的第1个“Conv”)，第一单尺度融合层的输入端连接至第一卷积层的输出端，用于融合第一卷积层所输出的第一尺度多通道特征图并输出融合后的第一尺度融合单通道特征图；The first single-scale fusion layer (corresponding to the first "Conv" on the left side of the horizontal distribution in Figure 2), the input end of the first single-scale fusion layer is connected to the output end of the first convolutional layer for fusing the first A first-scale multi-channel feature map output by a convolutional layer and output a fused first-scale fused single-channel feature map;

第二至第五单尺度融合层(分别对应于图2中水平分布的第2至第5个“Conv”)，第二至第五单尺度融合层的输入端分别连接至第二至第五卷积层的输出端，用于分别融合第二至第五卷积层所输出的第二至第五尺度多通道特征图并分别输出融合后的第二至第五尺度融合单通道特征图；The second to fifth single-scale fusion layers (respectively corresponding to the second to fifth "Conv" in the horizontal distribution in Figure 2), the input ends of the second to fifth single-scale fusion layers are respectively connected to the second to fifth The output end of the convolutional layer is used to respectively fuse the second to fifth scale multi-channel feature maps output by the second to fifth convolutional layers and respectively output the fused second to fifth scale single-channel feature maps;

第一至第四上采样层(分别对应于图2中“2×Upsampling”、“4×Upsampling”、“8×Upsampling”、“16×Upsampling”)，第一至第四上采样层的输入端分别连接至第二至第五单尺度融合层的输出端；The first to fourth upsampling layers (respectively corresponding to "2×Upsampling", "4×Upsampling", "8×Upsampling", "16×Upsampling" in Figure 2), the input of the first to fourth upsampling layers The terminals are respectively connected to the output terminals of the second to the fifth single-scale fusion layers;

多尺度拼接融合层(对应于图2中的“Concat”层)，多尺度拼接融合层的输入端连接至第一单尺度融合层、第一至第四上采样层的输出端，用于融合第一单尺度融合层、第一至第四上采样层所输出的特征图并输出融合后的多尺度融合单通道特征图；Multi-scale splicing and fusion layer (corresponding to the "Concat" layer in Figure 2), the input of the multi-scale splicing and fusion layer is connected to the output of the first single-scale fusion layer and the first to fourth upsampling layers for fusion The feature map output by the first single-scale fusion layer, the first to the fourth upsampling layer, and output the fused multi-scale fusion single-channel feature map;

输出层(对应于图2中的“P”上方的“Conv”)，输出层的输入端连接至多尺度拼接融合层的输出端，用于基于多尺度融合单通道特征图输出建筑物特征图(对应于图2中的“P”)，The output layer (corresponding to "Conv" above "P" in Figure 2), the input end of the output layer is connected to the output end of the multi-scale splicing and fusion layer, which is used to output the building feature map based on the multi-scale fusion single-channel feature map ( corresponds to “P” in Figure 2),

在上述技术方案中，对于第一至第五单尺度融合层而言，尽管它们所对应的输入特征图的通道数(即，数量)C不同(如图2所示，分别为64、128、256、512、512)，然而，由于它们分别使用维度为1*(64、128、256、512、512)*1*1的不同卷积核来融合各自尺度下的所有输入特征图，所以最终都能够输出1个单通道特征图(即，第一至第五尺度融合单通道特征图，它们是分辨率分别为2562、1282、642、322、162的5个不同分辨率的单通道特征图)。In the above technical solution, for the first to fifth single-scale fusion layers, although the number of channels (that is, the number) C of their corresponding input feature maps is different (as shown in Figure 2, they are 64, 128, 256, 512, 512), however, since they use different convolution kernels with dimensions of 1*(64, 128, 256, 512, 512)*1*1 to fuse all input feature maps at their respective scales, the final Both can output a single-channel feature map (that is, the first to fifth scale fusion single-channel feature maps, which are 5 different resolutions of single-channel feature maps with resolutions of 2562, 1282, 642, 322, and 162 ).

对于多尺度拼接融合层而言，与上述第一至第五单尺度融合层的融合方法类似，其输入特征图的通道数C(即，数量)为5(包括第一单尺度融合层输出的特征图、以及使用上采样的方式得到的第一至第四上采样层所输出的4个新特征图，它们的分辨率均与原始遥感影像的分辨率相同)。因此，可以将这5个特征图拼接为一个5通道概率图(即，特征图)，并使用维度为1*5*1*1的卷积核得到单通道预测图(即，多尺度融合单通道特征图)。For the multi-scale splicing fusion layer, similar to the above-mentioned fusion method of the first to fifth single-scale fusion layers, the number of channels C (that is, the number) of the input feature map is 5 (including the output of the first single-scale fusion layer feature map, and the four new feature maps output by the first to fourth upsampling layers obtained by upsampling, and their resolutions are the same as those of the original remote sensing image). Therefore, these 5 feature maps can be spliced into a 5-channel probability map (i.e., feature map), and a single-channel prediction map (i.e., multi-scale fusion single channel feature map).

尽管图2所示的技术方案不需要对第一至第四上采样层进行裁剪，然而，可选地，上述神经网络系统还可以包括：Although the technical solution shown in FIG. 2 does not need to clip the first to fourth upsampling layers, however, optionally, the above-mentioned neural network system may also include:

第一至第四裁剪层(分别对应于图2中的“P2”至“P5”所在层集中的各个“Crop”)，第一至第四裁剪层分别设置在第一至第四上采样层和多尺度拼接融合层之间，用于分别将第一至第四上采样层所输出的特征图裁剪至与原始输入影像相同的分辨率。以自动适配遥感影像分辨率与第一至第四上采样层输出特征图的分辨率不一致的情况。The first to fourth clipping layers (respectively corresponding to each "Crop" in the layer set of "P2" to "P5" in Figure 2), the first to fourth clipping layers are respectively set in the first to fourth upsampling layers Between the multi-scale splicing and fusion layer, it is used to crop the feature maps output by the first to fourth upsampling layers to the same resolution as the original input image. To automatically adapt to the situation where the resolution of the remote sensing image is inconsistent with the resolution of the output feature maps of the first to fourth upsampling layers.

可选地，在第一至第五卷积层之后，上述神经网络系统还包括下列层(在图2中未示出)：Optionally, after the first to fifth convolutional layers, the above-mentioned neural network system also includes the following layers (not shown in Figure 2):

如图2所示的网络浅层生成具有精细空间分辨率但低层次语义信息的特征图，如图2所示的深层生成具有高级语义信息的粗糙特征图，如图2所示的中间层的特征映射对应于某些中间级特征。上述技术方案能够集成这些不同的特征图，因此，可以有效地提取具有不同外观或遮挡的建筑物。The shallow layer of the network shown in Figure 2 generates feature maps with fine spatial resolution but low-level semantic information, the deep layer shown in Figure 2 generates rough feature maps with high-level semantic information, and the middle layer shown in Figure 2 Feature maps correspond to some intermediate-level features. The above technical solution is able to integrate these different feature maps, thus, buildings with different appearances or occlusions can be extracted efficiently.

步骤S302：构建经训练的如上文所述的神经网络系统；Step S302: building a trained neural network system as described above;

步骤S304：使用经训练的神经网络系统来获取遥感影像(即，图2中的“inputimage”)所对应的建筑物概率图(即，如上文所述的建筑物特征图，即建筑物提取预测图“P”)。Step S304: Use the trained neural network system to obtain the building probability map (ie, the building feature map as described above, that is, the building extraction prediction Figure "P").

可选地，用于遥感影像的建筑物提取方法，在步骤S302之前还包括：Optionally, the method for extracting buildings from remote sensing images also includes before step S302:

步骤S302’：使用包含建筑物的遥感训练影像(即，图2中的“input image”)及与遥感训练影像对应的标签影像(即，图2中的“input map”)的数据集对神经网络系统进行训练，以得到经训练的神经网络系统。Step S302': Use the data set containing the remote sensing training image of the building (ie, "input image" in Figure 2) and the label image corresponding to the remote sensing training image (ie, "input map" in Figure 2) to neural network The network system is trained to obtain a trained neural network system.

可选地，在步骤S304和步骤S302’中，使用阈值法得到上述建筑物特征图(最终建筑提取结果)。Optionally, in step S304 and step S302', the threshold method is used to obtain the above-mentioned building feature map (final building extraction result).

可选地，在步骤S302’中，在对神经网络系统进行训练时，使用Sigmoid CrossEntropy Loss损失函数(即，图2中的“Loss”所对应的计算函数)和随机梯度下降算法。Optionally, in step S302', when training the neural network system, use the Sigmoid CrossEntropy Loss loss function (that is, the calculation function corresponding to "Loss" in Figure 2) and the stochastic gradient descent algorithm.

为了使本领域技术人员更好地理解本发明的有益技术效果，下面将结合具体实施例进行说明。In order to enable those skilled in the art to better understand the beneficial technical effect of the present invention, the following will be described in conjunction with specific embodiments.

如图4所示，图4(a)是一幅(第一尺度的)选自Massachusetts遥感数据集的原始卫星遥感影像(即，图2中的“input image”)，图4(b)是具有小感受野的第二尺度特征图经过插值后的特征图(即，图2中的“P2”)，基于该特征图能够提取原始卫星遥感影像的边缘和角点等低级特征。图4(c)是具有较大感受野的第三尺度特征图经过插值后的特征图(即，图2中的“P3”)，该特征图能够勾画出建筑物的初步轮廓。图4(d)是具有更大感受野的第四尺度特征图经过插值后的特征图(即，图2中的“P4”)，基于该特征图能够识别出湖泊等非建筑物区域。图4(e)是具有最大感受野的第五尺度特征图经过插值后的特征图(即，图2中的“P5”)，基于该特征图能够识别出湖泊和裸地等非建筑区域。最终，整合多层次的语义信息和空间信息获得了一个可靠的预测(上文所述的多尺度融合单通道特征图，即，图2中的“P”)，如图4(f)所示。As shown in Figure 4, Figure 4(a) is a (first scale) original satellite remote sensing image selected from the Massachusetts remote sensing dataset (that is, the “input image” in Figure 2), and Figure 4(b) is The interpolated feature map of the second-scale feature map with a small receptive field (ie, "P2" in Figure 2), based on this feature map, low-level features such as edges and corners of the original satellite remote sensing image can be extracted. Figure 4(c) is the interpolated feature map of the third-scale feature map with a larger receptive field (ie, "P3" in Fig. 2), which can outline the preliminary outline of the building. Figure 4(d) is the interpolated feature map of the fourth-scale feature map with a larger receptive field (ie, "P4" in Figure 2), based on which non-building areas such as lakes can be identified. Figure 4(e) is the interpolated feature map of the fifth-scale feature map with the largest receptive field (ie, "P5" in Figure 2), based on which non-building areas such as lakes and bare land can be identified. Finally, integrating multi-level semantic information and spatial information obtains a reliable prediction (the multi-scale fused single-channel feature map described above, i.e., “P” in Fig. 2), as shown in Fig. 4(f) .

图5示例性地示出了原始卫星遥感影像、其对应的真实标签影像图、以及根据本发明的技术方案所实际输出的建筑物特征图。Fig. 5 exemplarily shows the original satellite remote sensing image, its corresponding real label image map, and the actually output building feature map according to the technical solution of the present invention.

如图5所示，图5(a)是选自Massachusetts遥感数据集的一幅原始卫星遥感影像，图5(b)是其真实标签影像图，图5(c)是预测标签图(即，根据本发明的技术方案所实际输出的建筑物特征图)。从目视效果看，根据本发明的技术方案能够很好地预测建筑分布情况，且建筑边界准确。As shown in Figure 5, Figure 5(a) is an original satellite remote sensing image selected from the Massachusetts remote sensing dataset, Figure 5(b) is its real label image map, and Figure 5(c) is the predicted label map (ie, According to the actual output building feature map of the technical solution of the present invention). From the perspective of visual effects, the technical solution according to the invention can well predict the distribution of buildings, and the boundaries of buildings are accurate.

将准确率定义为检测出来的像素在真实像素的相邻ρ个像素范围以内的比例，将召回率定义为真实像素在检测出来的像素的相邻ρ个像素范围以内的比例。图6(a)为根据本发明的技术方案在ρ＝3时的准确率-召回率曲线，其对应的模型精度约为0.9668(对应图6(a)中用符号×表示的、准确率和召回率相等的点breakeven)。图6(b)为根据本发明的技术方案在ρ＝0时的准确率-召回率曲线，其对应的模型精度约为0.8424(对应图6(b)中用符号×表示的、准确率和召回率相等的点breakeven)。The accuracy rate is defined as the proportion of the detected pixels within the range of ρ adjacent pixels to the real pixel, and the recall rate is defined as the proportion of the detected pixels within the range of ρ adjacent pixels of the detected pixel. Fig. 6 (a) is the accuracy rate-recall rate curve when ρ=3 according to the technical solution of the present invention, and its corresponding model accuracy is about 0.9668 (corresponding to Fig. 6 (a) represented by symbol ×, accuracy rate and The point where the recall is equal breakeven). Fig. 6 (b) is the accuracy rate-recall rate curve when ρ=0 according to the technical solution of the present invention, and its corresponding model precision is about 0.8424 (corresponding to Fig. 6 (b) represented by symbol ×, accuracy rate and The point where the recall is equal breakeven).

表1给出了包括Mnih.V在其博士论文“Machine learning for aerial imagelabeling，Doctoral(2013)”中所公开的Mnih-CNN方案和Mnih-CNN+CRF方案、以及Saito在“Multiple object extraction from aerial imagery with convolutional neuralnetworks”中所公开的Saito-multi-MA方案和Saito-multi-MA&CIS方案、以及本发明的技术方案在内的不同方案之间的建筑提取性能对比。Table 1 shows the Mnih-CNN scheme and the Mnih-CNN+CRF scheme disclosed by Mnih.V in his doctoral thesis "Machine learning for aerial imagelabeling, Doctoral (2013)", and Saito in "Multiple object extraction from aerial Imagery with convolutional neuralnetworks "Saito-multi-MA scheme and Saito-multi-MA&CIS scheme disclosed in "imagery with convolutional neural networks", and the architectural extraction performance comparison between different schemes including the technical scheme of the present invention.

表1不同技术方案之间的性能对比Table 1 Performance comparison between different technical solutions

模型Model Breakeven(ρ＝3)Breakeven (ρ=3) Breakeven(ρ＝0)Breakeven (ρ=0) 预测时间(s)Prediction time (s) Mnih-CNNMnih-CNN 0.92710.9271 0.76610.7661 8.78.7 Mnih-CNN+CRFMnih-CNN+CRF 0.92820.9282 0.76380.7638 26.626.6 Saito-multi-MASaito-multi-MA 0.95030.9503 0.78730.7873 67.7267.72 Saito-multi-MA&CISSaito-multi-MA&CIS 0.95090.9509 0.78720.7872 67.8467.84 本发明的技术方案Technical scheme of the present invention 0.96680.9668 0.84240.8424 2.052.05

注：预测时间为单幅1500*1500大小测试影像预测所需平均时间，使用的显卡型号为NVIDIA TITAN X。Note: The prediction time is the average time required for prediction of a single 1500*1500 test image, and the graphics card model used is NVIDIA TITAN X.

从表1的结果可以看出，无论是在不同松弛系数(ρ＝3和ρ＝0)下的模型精度方面，还是在预测时间方面，根据本发明的上述技术方案都具有更好的技术效果。不仅能够显着提高提取精度，还可以减少运算时间。As can be seen from the results in Table 1, no matter in terms of model accuracy under different relaxation coefficients (ρ=3 and ρ=0), or in terms of prediction time, the above-mentioned technical scheme according to the present invention has better technical effects . Not only can the extraction accuracy be significantly improved, but also the operation time can be reduced.

结合根据本发明的上述技术方案，还提出了一种计算机可读存储介质，存储介质上存储有计算机程序，程序被处理器执行时实现如图3所示方法的步骤。In combination with the above-mentioned technical solution according to the present invention, a computer-readable storage medium is also proposed, on which a computer program is stored, and when the program is executed by a processor, the steps of the method shown in FIG. 3 are implemented.

结合根据本发明的上述技术方案，还提出了一种计算设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，处理器执行程序时实现如图3所示方法的步骤。In combination with the above-mentioned technical solution according to the present invention, a computing device is also proposed, including a memory, a processor, and a computer program stored on the memory and operable on the processor. When the processor executes the program, the method shown in Figure 3 is implemented. A step of.

根据本发明的上述技术方案，使用VGG网络作为基础结构，抽取网络中各个分辨率特征图中的最后一层，使用卷积操作融合为单通道的特征图。通过重采样和特征图拼接，得到最终的预测结果。According to the above technical solution of the present invention, the VGG network is used as the basic structure, the last layer of each resolution feature map in the network is extracted, and the convolution operation is used to fuse into a single-channel feature map. Through resampling and feature map splicing, the final prediction result is obtained.

根据本发明的上述技术方案，还具有以下优点：1)能够融合各个分辨率特征图，从而提取输入影像多尺度信息，实现建筑物的精准提取；2)由于在预测(即，提取)时不需要进行模型集成，也不需要后处理操作，因而大幅提高了建筑物提取效率；3)由于使用全卷积网络，因而在显存允许的情况下能够接受任意尺寸的输入影像。According to the above technical solution of the present invention, it also has the following advantages: 1) It is possible to fuse each resolution feature map, thereby extracting the multi-scale information of the input image, and realizing the precise extraction of buildings; Model integration is required, and post-processing operations are not required, thus greatly improving the efficiency of building extraction; 3) Due to the use of a fully convolutional network, it can accept input images of any size as long as the video memory allows.

另外，根据本发明的上述技术方案，直接使用整幅影像作为输入，通过一次网络前向传播就可以获得分割(即，建筑物提取)结果，而不需要通过有重叠切片的方式来进行模型集成，也不需要进行后处理操作，大幅提高了建筑物提取效率。通过基于Massachusetts遥感数据集进行的上述对比测试结果可以表明，根据本发明的上述技术方案无论是在精度还是效率上都显著优于其他方法。In addition, according to the above technical solution of the present invention, the entire image is directly used as input, and the segmentation (ie, building extraction) result can be obtained through one network forward propagation, without the need for model integration by overlapping slices , and no post-processing operation is required, which greatly improves the efficiency of building extraction. The above comparison test results based on the Massachusetts remote sensing data set can show that the above technical solution according to the present invention is significantly better than other methods in terms of accuracy and efficiency.

上面描述的内容可以单独地或者以各种方式组合起来实施，而这些变型方式都在本发明的保护范围之内。The content described above can be implemented alone or combined in various ways, and these variants are all within the protection scope of the present invention.

本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中，在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分；例如，一个物理组件可以具有多个功能，或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器，如数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, the functional modules/units in the system, and the device can be implemented as software, firmware, hardware, and an appropriate combination thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be composed of several physical components. Components cooperate to execute. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As known to those of ordinary skill in the art, the term computer storage media includes both volatile and nonvolatile media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. permanent, removable and non-removable media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or can Any other medium used to store desired information and which can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media .

最后应说明的是：以上实施例仅用以说明本发明的技术方案，而非对其限制。尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例的技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, rather than to limit them. Although the present invention has been described in detail with reference to the aforementioned embodiments, those of ordinary skill in the art should understand that: it can still modify the technical solutions described in the aforementioned embodiments, or perform equivalent replacements for some of the technical features; and these The modification or replacement does not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the various embodiments of the present invention.

Claims

1. a kind of nerve network system, which is characterized in that the building for remote sensing image extracts, including：

Input layer, the first to the 5th convolutional layer, first to fourth pond layer in VGG networks；

First single scale fused layer, the input terminal of the first single scale fused layer are connected to the output of first convolutional layer End, for merging the first scale multi-channel feature figure that first convolutional layer is exported and the first scale exported after fusion melts Close single channel characteristic pattern；

The input terminal of second to the 5th single scale fused layer, the described second to the 5th single scale fused layer is respectively connected to described The output end of two to the 5th convolutional layers, the second to the 5th scale exported for merging the described second to the 5th convolutional layer respectively The second to the 5th scale after multi-channel feature figure and respectively output fusion merges single channel characteristic pattern；

The input terminal of first to fourth up-sampling layer, the first to fourth up-sampling layer is respectively connected to described second to the 5th The output end of single scale fused layer；

Multiple dimensioned splicing fused layer, it is described it is multiple dimensioned splicing fused layer input terminal be connected to the first single scale fused layer, The output end of the first to fourth up-sampling layer, for merging the first single scale fused layer, described first to fourth Characteristic pattern that sample level is exported simultaneously exports the Multiscale Fusion single channel characteristic pattern after fusion；

Output layer, the input terminal of the output layer is connected to the output end of the multiple dimensioned splicing fused layer, for based on described Multiscale Fusion single channel characteristic pattern exports building feature figure,

Wherein, the first single scale fused layer, the output end of the first to fourth up-sampling layer and the multiple dimensioned splicing What the output end of fused layer was exported is two-dimentional single channel characteristic pattern identical with the resolution ratio of the remote sensing image.

2. nerve network system as described in claim 1, which is characterized in that further include：

First to fourth cuts layer, and the first to fourth cutting layer is separately positioned on the first to fourth up-sampling layer and institute Between stating multiple dimensioned splicing fused layer, for respectively by described first to fourth up-sample the characteristic pattern that is exported of layer be cut to It is originally inputted the identical resolution ratio of image.

3. nerve network system as claimed in claim 1 or 2, which is characterized in that after the described first to the 5th convolutional layer Further include following layers：

First to the 5th ReLU layers, the first to the 5th Normalization layers of Batch, first to the 5th Dropout layers, be used for Over-fitting is avoided, the generalization ability of the nerve network system is improved.

4. a kind of building extracting method for remote sensing image, which is characterized in that including：

Build the housebroken nerve network system as described in any one of claim 1-3；

The building feature figure corresponding to remote sensing image is obtained using housebroken nerve network system.

5. being used for the building extracting method of remote sensing image as claimed in claim 4, which is characterized in that in the structure through instruction Before the step of experienced nerve network system as described in any one of claim 1-3, further include：

Use the remote sensing training image comprising building and the data set pair of label image corresponding with remote sensing training image The nerve network system is trained, to obtain the housebroken nerve network system.

6. being used for the building extracting method of remote sensing image as described in claim 4 or 5, which is characterized in that described in acquisition After the building feature figure corresponding to remote sensing image, final building distribution map is obtained using threshold method.

7. being used for the building extracting method of remote sensing image as claimed in claim 5, which is characterized in that the nerve net When network system is trained, calculated using Sigmoid Cross Entropy Loss loss functions and using stochastic gradient descent Method.

8. a kind of computer readable storage medium, which is characterized in that be stored with computer program, the journey on the storage medium The step of any one of claim 4 to 7 the method is realized when sequence is executed by processor.

9. a kind of computing device, which is characterized in that including memory, processor and be stored on the memory and can be described The computer program run on processor, the processor realize any one of claim 4 to 7 institute when executing described program The step of stating method.