
CN113920147A - Remote sensing image building extraction method and device based on deep learning

Info

Publication number: CN113920147A
Application number: CN202111522605.3A
Authority: CN
Inventors: 陈欢欢; 江贻芳; 朱云慧; 黄恩兴; 黄不了; 高健; 于娜; 王力; 李建平
Original assignee: Stargis Tianjin Technology Development Co ltd; University of Science and Technology of China USTC
Current assignee: Stargis Tianjin Technology Development Co ltd; University of Science and Technology of China USTC
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-01-11
Anticipated expiration: 2041-12-14
Also published as: CN113920147B

Abstract

The invention provides a deep-learning-based method and device for extracting buildings from remote sensing images. The method comprises the following steps: extracting image features of the remote sensing image, and generating a building instance detection frame according to the obtained image features; constructing an initial contour of the building instance according to the detection frame; generating a building polygon according to the initial contour and the image features corresponding to the initial contour in the remote sensing image; and extracting a target building instance in the remote sensing image based on the building polygon. By combining the initial contour with the image features of the building, the invention can accurately describe the geometric shape of the building instance and automatically achieve accurate extraction of the building instance.

Description

Remote sensing image building extraction method and device based on deep learning

Technical Field

The invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image building extraction method and device based on deep learning.

Background

Accurate building extraction plays a crucial role in many applications, such as building reconstruction, mapping, population estimation, and illegal building monitoring, while manually delineating individual buildings using tools such as ArcGIS consumes a great deal of time and effort. With the development of remote sensing technology, high-resolution images have become widely available, making automatic extraction of buildings possible. However, since buildings have complex backgrounds, are densely distributed, and are diverse in appearance, developing an effective automatic building extraction method remains challenging.

Conventional building extraction methods aim to identify building regions using hand-crafted features such as shapes, colors, textures, and shadows. For example, Cui et al. extract a rough building region using the geometric and color features of a building, and then, exploiting the fact that buildings generally have regular outlines, detect straight lines with the Hough transform to extract the building outline more finely. Sirmacek et al. transform the building detection problem into a set of sub-graph matching problems by combining SIFT with graph theory tools. Ok et al. identify building areas using the spatial relationship between buildings and their shadows. However, the ability of these methods to automatically extract building areas is limited, because such empirically designed features are only effective for particular types of buildings under particular conditions.

With the maturing of deep learning technology, current research on building extraction focuses on carefully designing convolutional neural network (CNN) architectures that generate pixel-level predictions for an input remote sensing image. For example, Paisitkriangkrai et al. train convolutional neural networks to extract features from remote sensing images rescaled to different sizes, and supplement the features extracted by the networks with hand-crafted features to further improve the accuracy of building region identification. Kampffmeyer et al. propose replacing the loss in a fully convolutional network (FCN) with a loss weighted by median class frequency to achieve accurate segmentation of small objects. Hamaguchi et al. construct a multi-task model based on convolutional neural networks to handle the large variation in the area of building instances; the model distinguishes buildings of different sizes, treats the extraction of buildings of each size as a separate task, and uses road information to assist building identification. Hamaguchi et al. also propose aggregating local information through dilated (atrous) convolutions with progressively decreasing dilation rates to improve segmentation performance on small, densely distributed building instances. Yuan proposes improving performance at building boundaries by using a representation based on the distance from each pixel to the building boundary.

It can be seen that, although existing deep-learning-based methods for extracting buildings from remote sensing images have made great progress over traditional methods, pixel-by-pixel prediction makes it difficult to reason about the geometric characteristics of a building and cannot produce accurate outlines and regular shapes, which may result in blob-like segmentation and poor segmentation performance at building boundaries. Therefore, how to provide a deep-learning-based remote sensing image building extraction method that accurately depicts the geometric shape of a building instance, and that extracts buildings from remote sensing images based on this geometric shape, is a problem that urgently needs to be solved.

Disclosure of Invention

The invention provides a remote sensing image building extraction method and device based on deep learning, which accurately describe the geometric shape of a building instance by combining the initial contour and the image features of the building, and automatically achieve accurate extraction of the building instance.

In one aspect of the invention, a remote sensing image building extraction method based on deep learning is provided, and the method comprises the following steps:

extracting image features of the remote sensing image, and generating a building example detection frame according to the obtained image features;

constructing an initial outline of the building example according to the detection frame;

generating a building polygon according to the initial contour and the image characteristics corresponding to the initial contour in the remote sensing image;

and extracting a target building example in the remote sensing image based on the building polygon.

Optionally, the extracting an image feature of the remote sensing image, and generating a building instance detection frame according to the obtained image feature includes:

estimating building examples existing in the remote sensing image and a plurality of estimated areas of each building example according to the image characteristics of the remote sensing image;

generating a plurality of corresponding recommendation areas for the corresponding building examples according to the plurality of estimated areas of the building examples;

and selecting a target recommendation area from the plurality of recommendation areas of each building example, and taking the target recommendation area as a detection frame of the building example.

Optionally, constructing an initial contour of the building instance according to the detection frame includes:

connecting the midpoints of the four boundaries of the detection frame to form a diamond-shaped contour;

predicting the target offset between each vertex of the diamond-shaped contour and the corresponding target extreme point (target pole), adjusting the vertices of the diamond-shaped contour according to the target offsets, and taking the adjusted vertices as the target extreme points;

taking each target extreme point as a center, extending a preset length toward both ends along the boundary of the detection frame on which the target extreme point lies, to obtain four target boundaries;

and connecting the end points of the obtained target boundaries in sequence to obtain the initial contour of the building instance.

Optionally, the generating a building polygon according to the initial contour and the image feature corresponding to the initial contour in the remote sensing image includes:

S031, generating an initial building polygon according to the initial contour and the image features corresponding to the initial contour in the remote sensing image;

S032, predicting missing vertex information in the initial building polygon according to the initial building polygon and the image features corresponding to the initial building polygon in the remote sensing image;

S033, performing vertex adjustment on the initial building polygon according to the obtained vertex information to obtain a second building polygon;

S034, taking the second building polygon as the initial building polygon and iterating steps S031-S033, performing vertex adjustment according to the missing vertex information predicted for the current building polygon until a preset number of iterations is reached, and taking the vertex-adjusted polygon obtained after the last iteration as the building polygon.

Optionally, the generating an initial building polygon according to the initial contour and the image feature corresponding to the initial contour in the remote sensing image includes:

selecting an initial point set for constructing an initial building polygon from the initial contour;

acquiring image characteristics of the position of each initial point in the initial point set according to a preset characteristic mapping relation;

based on a preset combined prediction network model, predicting the probability that each initial point in an initial point set is a real building vertex and the offset of the initial point from the real building vertex according to the acquired image characteristics of the position of each initial point;

and selecting candidate building vertexes from the initial point set according to the probability that each initial point in the initial point set is a real building vertex and the offset of the initial point from the real building vertex, and sequentially connecting the selected candidate building vertexes to generate an initial building polygon.

Optionally, the method further comprises:

pre-constructing the joint prediction network model;

optimizing the joint prediction network model based on a supervised learning strategy.

Optionally, the predicting, according to the initial building polygon and the image features of the initial building polygon in the remote sensing image, missing vertex information in the initial building polygon includes:

selecting a second initial point set for constructing the building polygon from the initial polygons by taking the initial building polygon as the initial polygon;

acquiring image characteristics of the position of each second initial point in the second initial point set according to a preset characteristic mapping relation;

predicting the probability that each second initial point in the second initial point set is a real building vertex and the offset of the second initial point from the real building vertex according to the acquired image characteristics of the position of each second initial point on the basis of a preset missing vertex prediction network;

and selecting the missing vertex information in the initial building polygon from the second initial point set according to the probability that each second initial point in the second initial point set is the real building vertex and the offset of the second initial point from the real building vertex.

Optionally, the method further comprises:

pre-constructing the missing vertex prediction network model;

optimizing the missing vertex prediction network model based on a supervised learning strategy.

Furthermore, the invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as described above.

In addition, the invention also provides remote sensing image building extraction equipment based on deep learning, which comprises a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor realizes the steps of the method when executing the computer program.

According to the remote sensing image building extraction method and device based on deep learning provided by the invention, the initial contour of a building is constructed according to a building instance detection frame, a building polygon is generated according to the obtained initial contour and the image features corresponding to the initial contour in the remote sensing image, and the target building instance in the remote sensing image is extracted based on the building polygon. By combining the initial contour with the image features of the building, the invention can accurately describe the geometric shape of the building instance and automatically achieve accurate extraction of the building instance.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

Fig. 1 is a schematic flowchart of a remote sensing image building extraction method based on deep learning according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of step S03 in the remote sensing image building extraction method based on deep learning according to an embodiment of the present invention;

Fig. 3 is a flowchart of generating label values for the initial vertex heat map and the initial offsets according to an embodiment of the present invention;

Fig. 4 is a flowchart of generating label values for the second vertex heat map according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Fig. 1 schematically shows a flowchart of a remote sensing image building extraction method based on deep learning according to an embodiment of the present invention. Referring to Fig. 1, the method provided by the embodiment of the invention specifically includes steps S01-S04, as follows:

S01, extracting the image features of the remote sensing image, and generating a building instance detection frame according to the obtained image features.

Specifically, for a given input remote sensing image, image features are extracted by an object detector, and a set of building instance detection frames is generated according to the image features.

In this embodiment, the extracting of the image features of the remote sensing image in step S01, and generating the building instance detection frame according to the obtained image features specifically include: estimating building examples existing in the remote sensing image and a plurality of estimated areas of each building example according to the image characteristics of the remote sensing image; generating a plurality of corresponding recommendation areas for the corresponding building examples according to the plurality of estimated areas of the building examples; and selecting a target recommendation area from the plurality of recommendation areas of each building example, and taking the target recommendation area as a detection frame of the building example.

S02, constructing an initial contour of the building instance according to the detection frame.

In this embodiment, constructing the initial contour of the building instance according to the detection frame in step S02 specifically includes: connecting the midpoints of the four boundaries of the detection frame to form a diamond-shaped contour; predicting the target offset between each vertex of the diamond-shaped contour and the corresponding target extreme point (target pole), adjusting the vertices of the diamond-shaped contour according to the target offsets, and taking the adjusted vertices as the target extreme points; taking each target extreme point as a center, extending a preset length toward both ends along the boundary of the detection frame on which the target extreme point lies, to obtain four target boundaries; and connecting the end points of the obtained target boundaries in sequence to obtain the initial contour of the building instance.

Specifically, since the octagonal contour extended from the four extreme points of the building (i.e., the uppermost, leftmost, lowermost, and rightmost points) can tightly enclose the building instance, the present invention takes this octagon as the initial contour. To obtain the positions of the four extreme points, after a detection frame is generated for each building instance by an object detection algorithm, the midpoints of the four boundaries of the detection frame are connected to form a diamond-shaped contour. Taking the diamond-shaped contour as input, a preset extreme point prediction network model predicts the offsets between the vertices of the diamond and the extreme points; the vertices of the diamond-shaped contour are adjusted accordingly, and the adjusted vertices are taken as the target extreme points (target poles). After the target extreme points are obtained, for the uppermost and lowermost target extreme points, a line segment is extended horizontally toward both ends, with the point as center, by a preset fraction of the length of the horizontal side of the detection frame; for the leftmost and rightmost target extreme points, a line segment is extended vertically toward both ends by the preset fraction of the length of the vertical side of the detection frame. Any part of an extended line segment that exceeds the boundary of the detection frame is truncated, and the end points of the four resulting line segments are connected to obtain an octagonal contour, namely the initial contour of the building instance. The extreme point prediction network model comprises 8 circular convolution (CirConv) layers with a convolution kernel size of 9 and a standard one-dimensional convolution layer with a convolution kernel size of 1; the generated feature maps are concatenated into a feature with 1280 channels, and a one-dimensional convolution with a convolution kernel size of 1 then predicts the point offsets. The preset length may be 1/5 to 1/2 of the corresponding side length, and is preferably 1/4.
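The geometric part of this construction can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the extreme points are taken as already predicted, image coordinates are assumed (y grows downward), and the preset length is interpreted as a fraction `ratio` of the side length extended in each direction.

```python
def diamond_from_box(box):
    """Midpoints of the four sides of a detection frame (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    # Order: top, left, bottom, right (image coordinates, y grows downward).
    return [(cx, y_min), (x_min, cy), (cx, y_max), (x_max, cy)]


def octagon_from_extremes(box, extremes, ratio=0.25):
    """Build the octagonal initial contour from four extreme points.

    extremes = (top, left, bottom, right). Each segment is centered on its
    extreme point, extends ratio * side length to both ends (an assumption
    about how the preset length is applied), and is clipped to the frame.
    """
    x_min, y_min, x_max, y_max = box
    dx = ratio * (x_max - x_min)
    dy = ratio * (y_max - y_min)
    top, left, bottom, right = extremes

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    # Walk around the contour: top, right, bottom, left segments in turn.
    return [
        (clip(top[0] - dx, x_min, x_max), top[1]),
        (clip(top[0] + dx, x_min, x_max), top[1]),
        (right[0], clip(right[1] - dy, y_min, y_max)),
        (right[0], clip(right[1] + dy, y_min, y_max)),
        (clip(bottom[0] + dx, x_min, x_max), bottom[1]),
        (clip(bottom[0] - dx, x_min, x_max), bottom[1]),
        (left[0], clip(left[1] + dy, y_min, y_max)),
        (left[0], clip(left[1] - dy, y_min, y_max)),
    ]
```

In the method itself the diamond vertices would first be shifted by the offsets from the extreme point prediction network; passing the unshifted diamond, as in `octagon_from_extremes(box, diamond_from_box(box))`, only exercises the geometry.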

S03, generating a building polygon according to the initial contour and the image features corresponding to the initial contour in the remote sensing image.

S04, extracting a target building instance in the remote sensing image based on the building polygon.

The remote sensing image building extraction method based on deep learning provided by the embodiment of the invention constructs the initial contour of the building according to the building instance detection frame, generates the building polygon according to the obtained initial contour and the image features corresponding to the initial contour in the remote sensing image, and extracts the target building instance in the remote sensing image based on the building polygon. By combining the initial contour with the image features of the building, it can accurately describe the geometric shape of the building instance and automatically achieve accurate extraction of the building instance.

In the embodiment of the present invention, as shown in Fig. 2, the step S03 of generating a building polygon according to the initial contour and the image features corresponding to the initial contour in the remote sensing image specifically includes the following steps:

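S031, generating an initial building polygon according to the initial contour and the image features corresponding to the initial contour in the remote sensing image;

S032, predicting missing vertex information in the initial building polygon according to the initial building polygon and the image features corresponding to the initial building polygon in the remote sensing image;

S033, performing vertex adjustment on the initial building polygon according to the obtained vertex information to obtain a second building polygon;

S034, taking the second building polygon as the initial building polygon and iterating steps S031-S033, performing vertex adjustment according to the missing vertex information predicted for the current building polygon until a preset number of iterations is reached, and taking the vertex-adjusted polygon obtained after the last iteration as the building polygon.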
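The iteration over a preset number of rounds can be outlined as below. This is a control-flow sketch only: `predict_missing` and `adjust` are hypothetical callables standing in for the missing vertex prediction network and the vertex adjustment step, and `n_iters` is an assumed name for the preset iteration count.

```python
def refine_polygon(initial_polygon, predict_missing, adjust, n_iters=3):
    """Repeat missing-vertex prediction and vertex adjustment n_iters times.

    predict_missing(polygon) -> missing vertex information (hypothetical)
    adjust(polygon, missing) -> vertex-adjusted polygon     (hypothetical)
    """
    polygon = list(initial_polygon)
    for _ in range(n_iters):
        missing = predict_missing(polygon)  # predict missing vertices
        polygon = adjust(polygon, missing)  # adjust the polygon vertices
    # The polygon after the last iteration is taken as the building polygon.
    return polygon
```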

According to the embodiment of the invention, the geometric characteristics of buildings are integrated into a general polygon deformation algorithm to improve building extraction accuracy. Considering that a building instance can be represented as a polygon composed of adjacent building vertices, a subset of the initial points is adaptively selected as candidate building vertices and connected in sequence to generate an initial building polygon. Specifically, for each building instance, the polygon deformation process includes two stages: initial polygon generation and missing vertex recovery. In the first stage, an initial contour consisting of N points is first constructed from the detection frame. Then, taking the initial contour and the corresponding image features as input, a preset vertex and offset joint prediction network model outputs a vertex heat map and point offsets, from which an initial building polygon is generated. In the second stage, missing building vertices are iteratively predicted, using the predicted building polygon and the corresponding image features as input, to construct the final building boundary.

Wherein, the step S031 of generating an initial building polygon according to the initial contour and the image features corresponding to the initial contour in the remote sensing image further includes: selecting an initial point set for constructing an initial building polygon from the initial contour; acquiring image characteristics of the position of each initial point in the initial point set according to a preset characteristic mapping relation; based on a preset combined prediction network model, predicting the probability that each initial point in an initial point set is a real building vertex and the offset of the initial point from the real building vertex according to the acquired image characteristics of the position of each initial point; and selecting candidate building vertexes from the initial point set according to the prediction result, and sequentially connecting the selected candidate building vertexes to generate an initial building polygon.
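One simple way to turn the predicted probabilities and offsets into a polygon is sketched below. The source does not state the selection rule, so the fixed probability `threshold` is an assumption, as is the function name.

```python
def select_vertices(points, heat, offsets, threshold=0.5):
    """Keep points whose vertex probability reaches the threshold,
    shift them by their predicted offsets, and connect them in order.

    points:  [(x, y), ...] initial points in contour order
    heat:    [p, ...]      predicted vertex probabilities
    offsets: [(dx, dy), ...] predicted offsets to the real vertices
    threshold is a hypothetical selection rule, not taken from the source.
    """
    polygon = []
    for (x, y), prob, (dx, dy) in zip(points, heat, offsets):
        if prob >= threshold:
            polygon.append((x + dx, y + dy))
    return polygon
```

Because the points are visited in contour order, the returned list is already the ordered vertex sequence of the initial building polygon.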

In the embodiment of the invention, based on a preset joint prediction network model, the probability that each initial point in an initial point set is a real building vertex and the offset of the initial point from the real building vertex are predicted according to the acquired image characteristics of the position of each initial point, and the method specifically comprises the following steps:

constructing the input feature $f_i$ of each initial point as follows:

$$f_i = F(p_i) \circ p_i,$$

wherein $p_i$ denotes the $i$-th initial point in the initial point set, the symbol $\circ$ denotes the concatenation operation, and $F(p_i)$ denotes the image feature extracted from the preset feature map $F$ at the position of the initial point $p_i$; $F$ is the image feature captured by the CNN feature extraction network from the remote sensing image.

The input feature of each initial point is input into the preset joint prediction network model to predict an initial vertex heat map and initial point offsets for the initial point set, wherein the initial vertex heat map represents the probability that each initial point is a real building vertex, and the initial point offsets represent the offset of each initial point from the real building vertex. The joint prediction network model comprises, in sequence, circular convolution layers, a standard one-dimensional convolution layer, and two one-dimensional convolution layers arranged in parallel.
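Building the per-point input feature can be illustrated as follows. This is a sketch under assumptions: the source only says the feature is taken from a preset feature map at the point position, so the bilinear interpolation, the `[channel][row][column]` nested-list layout, and the function names are all illustrative choices.

```python
import math


def bilinear_sample(fmap, x, y):
    """Sample a feature map at a real-valued position (x, y).

    fmap is indexed as fmap[channel][row][column]; bilinear interpolation
    is an assumed sampling scheme, not one specified by the source.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    feature = []
    for channel in fmap:
        top = channel[y0][x0] * (1 - ax) + channel[y0][x1] * ax
        bottom = channel[y1][x0] * (1 - ax) + channel[y1][x1] * ax
        feature.append(top * (1 - ay) + bottom * ay)
    return feature


def input_feature(fmap, point):
    """Concatenate the sampled image feature with the point coordinates."""
    x, y = point
    return bilinear_sample(fmap, x, y) + [x, y]
```

For a feature map with C channels, each point therefore yields a vector of length C + 2.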

In this embodiment, after the initial contour is constructed, N = 128 initial points $\{p_i \in \mathbb{R}^2 \mid i = 1, \dots, N\}$ are uniformly sampled along the initial contour, where $\mathbb{R}^2$ denotes the space of two-dimensional real vectors. The vertex and offset joint prediction network model then jointly predicts the initial vertex heat map $H$ and the initial point offsets $O$. The former represents the probability that each initial point is a real building polygon vertex, and the latter represents the offset from each initial point to the real building vertex. First, the input feature $f_i$ of each initial point $p_i$ is constructed. Then, the input features are passed through 8 circular convolution (CirConv) layers with a convolution kernel size of 9 and a standard one-dimensional convolution layer with a convolution kernel size of 1, and the generated feature maps are concatenated into a feature with 1280 channels. Finally, two one-dimensional convolutions with a convolution kernel size of 1 predict the initial vertex heat map and the initial point offsets, respectively.
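The defining property of a circular convolution is that the closed contour wraps around, so the first and last points are neighbors. A minimal single-channel sketch is given below; the real layers are multi-channel and learned, and the function name and the cross-correlation convention (as used by common deep learning frameworks) are assumptions made for illustration.

```python
def circular_conv1d(signal, kernel):
    """Single-channel 1-D convolution over a closed sequence.

    Indices wrap around modulo len(signal), so the output has the same
    length as the input and no padding is needed. The kernel is applied
    without flipping (cross-correlation convention).
    """
    n, k = len(signal), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * signal[(i + j - half) % n] for j in range(k))
        for i in range(n)
    ]
```

With N = 128 contour points and a kernel of size 9, each output value aggregates a point and its four neighbors on either side along the contour.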

By generating a polygon composed of building vertices to represent a building instance, the invention makes the extracted building outline more regular and better fitted to the geometric shape of the building. Moreover, the unique geometric characteristics of buildings are integrated into a general polygon deformation process, which makes the deformation process more reliable and improves the performance of polygon prediction.

In the embodiment of the invention, the joint prediction network model needs to be constructed in advance. To guide the feature extraction network toward learning effective feature representations, the joint prediction network model is optimized based on a supervised learning strategy: first, label values for the initial vertex heat map and the initial offsets are generated; then, the loss functions for network optimization are determined; finally, the optimized joint prediction network model is obtained.

In this embodiment, optimizing the joint prediction network model based on a supervised learning strategy includes generating label values for the vertex heat map and the offsets, as shown in Fig. 3, which is specifically implemented as follows:

S31, selecting a target point set along the edge of the building in the remote sensing image, and determining a target index set of the real building vertices in the target point set, wherein the target point set contains every real building vertex. Specifically, as many target points as there are initial points in the initial point set are uniformly sampled along the edge of the building in the remote sensing image to obtain the target point set.

S32, determining the initial points in the initial vertex heat map whose indices belong to the target index set as positive samples, setting the probability of the positive samples in the vertex heat map to 1, and determining the probability of the initial points that are not positive samples according to a two-dimensional Gaussian distribution centered on the indices of the positive samples. In this embodiment, the initial point set and the target point set have the same size, 128 points each, and K denotes the set of indices of the real building vertices in the target point set. The initial points are expected to be as close as possible to the target points; that is, the initial points whose indices belong to K are regarded as positive samples so that a higher probability is predicted for them.

During training, to ensure that every real building vertex is contained in the target points, a target point set $\{q_i \in \mathbb{R}^2 \mid i = 1, \dots, N\}$ of N = 128 points is uniformly sampled along the edge of the building, with the number of target points on each edge allocated according to the edge length. Let $K = \{k_j \mid j = 1, \dots, M\}$, M = 14, where M is the number of real building vertices in the target point set and K is the set of indices of the real building vertices. In the target vertex heat map $H^{*}$, every initial point whose index belongs to K is regarded as a positive sample and its value is set to 1. Meanwhile, considering that target points near the building vertices may also form polygons of similar shape, for an initial point $p_i$ whose index $i$ is close to some $k \in K$, the value $H^{*}_i$ is not set directly to 0 but to the value given by a two-dimensional Gaussian centered at $k$.
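The sampling and labelling described here can be sketched as follows. Several details are assumptions rather than statements from the source: points are allocated to edges by a largest-remainder rule, each edge contributes its start vertex plus evenly spaced interior points, the Gaussian is evaluated on the spatial distance to the nearest real vertex, and `sigma` is a hypothetical parameter.

```python
import math


def sample_target_points(vertices, n_points):
    """Sample n_points along a closed polygon so that every vertex is included.

    Each edge receives a share of the points proportional to its length.
    Returns (points, vertex_idx); vertex_idx plays the role of the index set K.
    """
    m = len(vertices)
    if n_points < m:
        raise ValueError("need at least one point per vertex")
    lengths = [math.dist(vertices[e], vertices[(e + 1) % m]) for e in range(m)]
    total = sum(lengths)
    # One point per edge is reserved for its start vertex; the rest are shared out.
    shares = [(n_points - m) * length / total for length in lengths]
    counts = [1 + int(s) for s in shares]
    leftover = n_points - sum(counts)
    by_fraction = sorted(range(m), key=lambda e: shares[e] - int(shares[e]), reverse=True)
    for e in by_fraction[:leftover]:
        counts[e] += 1

    points, vertex_idx = [], []
    for e in range(m):
        (x0, y0), (x1, y1) = vertices[e], vertices[(e + 1) % m]
        vertex_idx.append(len(points))
        for s in range(counts[e]):
            t = s / counts[e]
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points, vertex_idx


def heatmap_labels(points, vertex_idx, sigma=1.0):
    """Label 1 at real vertices, Gaussian falloff (hypothetical sigma) elsewhere."""
    labels = []
    for px, py in points:
        best = 0.0
        for k in vertex_idx:
            kx, ky = points[k]
            d2 = (px - kx) ** 2 + (py - ky) ** 2
            best = max(best, math.exp(-d2 / (2.0 * sigma ** 2)))
        labels.append(best)
    return labels
```

Points at a real vertex receive exactly 1, and the label decays smoothly with distance from the nearest vertex instead of dropping straight to 0.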

S33, setting the offset weight label to 1 for each initial point whose probability in the initial vertex heat map is greater than a preset high value, and setting the offset weight label to a preset value less than 1 for each initial point whose probability is less than the preset high value. The preset value may be 0.1. The goal of offset prediction is to move the initial points with high values on the vertex heat map toward the real building vertices. The other initial points do not directly participate in constructing the final building polygon and, strictly speaking, need not participate in the point regression process. However, since these points are densely distributed, they can provide additional contextual features that facilitate learning; they are therefore still given a lower weight of 0.1, rather than 0, so that their positions are also optimized. Notably, these points are not required to reach high regression accuracy, since they are only expected to provide context information rather than to form the building polygon.
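The weight labels can be derived directly from the heat map labels, as in this short sketch. The value 0.1 comes from the text; the function name and the default `high_value` threshold are hypothetical, since the source does not give the preset high value.

```python
def offset_weight_labels(heat_labels, high_value=0.9, low_weight=0.1):
    """Weight 1 for points with a high heat map value, low_weight otherwise.

    high_value is an assumed threshold; low_weight = 0.1 follows the text.
    """
    return [1.0 if h > high_value else low_weight for h in heat_labels]
```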

In this embodiment, optimizing the joint prediction network model based on a supervised learning strategy further includes determining a loss function for network optimization, and the specific implementation is as follows:

A Focal loss function is adopted to construct a first loss function corresponding to the vertex heat map, and the joint prediction network model is optimized based on the first loss function. In the first loss function, α = 2 and β = 4, M is the number of real building vertices in the target point set, and N is the number of initial points in the initial point set; the predicted term is the probability that the i-th initial point is a real building vertex, and the label term is the probability that the i-th target point is a real building vertex.
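As a hedged illustration, a loss consistent with the quantities listed above may be sketched as follows. The exact expression is assumed here to be the penalty-reduced focal loss commonly used for key point heat maps, summed over the N points and normalized by the number M of positive examples; the function name `vertex_focal_loss` and the clipping constant `eps` are hypothetical.

```python
import numpy as np

def vertex_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss over N points (assumed form).

    pred   : predicted probability that each initial point is a real vertex.
    target : heat map label of each target point (1 for positive examples,
             Gaussian-decayed values in [0, 1) elsewhere).
    """
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    pos = target == 1.0
    # Positive examples: down-weight points that are already well predicted.
    pos_loss = ((1.0 - pred[pos]) ** alpha) * np.log(pred[pos])
    # Other points: the (1 - target)^beta factor softens the penalty
    # for points lying close to a real vertex.
    neg_loss = ((1.0 - target[~pos]) ** beta) * (pred[~pos] ** alpha) \
        * np.log(1.0 - pred[~pos])
    m = max(int(pos.sum()), 1)  # M: number of real building vertices
    return float(-(pos_loss.sum() + neg_loss.sum()) / m)
```

Under this form, a point near a real vertex that is predicted with a high probability is penalized far less than a distant point with the same prediction, which matches the purpose of the Gaussian labels.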

A smooth L1 loss function is adopted to construct a second loss function corresponding to the point offsets, and the joint prediction network model is optimized based on the second loss function. In the second loss function, N is the number of initial points in the initial point set; the loss of the i-th initial point is scaled by its weight label, which is 1 or 0.1; and the remaining terms are the i-th target point, the i-th initial point, and the offset of the i-th initial point from the corresponding real building vertex.
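A minimal sketch of such a weighted smooth L1 loss is given below for illustration. It assumes that the regression target of each point is the difference between the target point and the initial point, that the smooth L1 transition point is 1, and that the sum is averaged over N; these details and the function names are assumptions, not statements of the embodiment.

```python
import numpy as np

def smooth_l1(x):
    """Element-wise smooth L1 with the transition at |x| = 1."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def offset_loss(initial_pts, target_pts, pred_offsets, weights):
    """Weighted smooth L1 loss between predicted and true offsets.

    initial_pts, target_pts, pred_offsets : (N, 2) arrays.
    weights : (N,) weight labels (for example 1 or 0.1).
    """
    initial_pts = np.asarray(initial_pts, dtype=float)
    target_pts = np.asarray(target_pts, dtype=float)
    pred_offsets = np.asarray(pred_offsets, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Offset that would move each initial point onto its target point.
    true_offsets = target_pts - initial_pts
    per_point = smooth_l1(pred_offsets - true_offsets).sum(axis=1)
    return float(np.sum(weights * per_point) / len(initial_pts))
```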

Using the predicted vertex heatmap and offsets pointing to the building vertices, the initial building polygons may be generated. However, since only some initial points are selected as polygon vertices, rather than using all initial points to construct polygon boundaries, the performance of the algorithm may be degraded if some building vertices are missing. Therefore, the invention provides a missing vertex recovery strategy, which is realized by adopting an iterative calculation mode because all the missing vertices are difficult to recover in one traversal.
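For illustration only, generating an initial building polygon from the predicted vertex heat map and offsets may be sketched as follows. The selection threshold of 0.5 and the function name `build_polygon` are hypothetical; the embodiment does not state how high-probability points are selected.

```python
import numpy as np

def build_polygon(points, probs, offsets, threshold=0.5):
    """Select high-probability points and shift them by their offsets.

    points  : (N, 2) initial points ordered along the contour.
    probs   : (N,) predicted probability of being a real building vertex.
    offsets : (N, 2) predicted offsets towards the real building vertices.
    """
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    keep = probs > threshold
    # Boolean indexing preserves the order of the points along the contour,
    # so connecting the returned vertices in sequence yields the polygon.
    return points[keep] + offsets[keep]
```

If a real vertex receives a probability below the threshold, it is absent from the returned polygon, which is the situation the missing vertex recovery strategy addresses.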

In step S032, predicting the vertex information missing from the initial building polygon according to the initial building polygon and the corresponding image features of the initial building polygon in the remote sensing image further includes: taking the initial building polygon as the initial polygon, and selecting from the initial polygon a second initial point set for constructing the building polygon; acquiring the image feature at the position of each second initial point in the second initial point set according to a preset feature mapping relation; predicting, based on a preset missing vertex prediction network and according to the acquired image feature at the position of each second initial point, the probability that each second initial point in the second initial point set is a real building vertex and the offset of the second initial point from the real building vertex; and selecting, from the second initial point set and according to the prediction result, the vertex information missing from the initial building polygon.

Predicting, based on the preset missing vertex prediction network and according to the acquired image feature at the position of each second initial point, the probability that each second initial point in the second initial point set is a real building vertex and the offset of the second initial point from the real building vertex specifically includes the following steps:

constructing an input feature for each second initial point by joining (concatenating) two parts, wherein the first part is the image feature extracted from the preset feature map at the position of the i-th second initial point in the second initial point set, the preset feature map being the image features captured by the CNN feature extraction network from the remote sensing image, and the second part is a binary value indicating whether the second initial point is a vertex of the initial polygon;

inputting the input feature of each second initial point into the preset missing vertex prediction network model, and predicting a second initial vertex heat map and second point offsets of the second initial point set, wherein the second initial vertex heat map represents the probability that each second initial point is a real building vertex, and the second point offsets represent the offset of each second initial point from the real building vertex. The missing vertex prediction network model is similar in structure to the joint prediction network model, but takes an additional input indicating the position of each vertex of the initial polygon.
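As a non-limiting sketch, the input feature construction described above may be illustrated as follows. Nearest-pixel sampling of the feature map, the (channels, height, width) layout, and the function name `point_input_features` are assumptions made for the example; the embodiment only requires that the image feature be taken at the position of each point.

```python
import numpy as np

def point_input_features(feature_map, points, is_vertex):
    """Concatenate sampled image features with a binary vertex flag.

    feature_map : (C, H, W) feature map from the CNN feature extractor.
    points      : (N, 2) point positions given as (x, y).
    is_vertex   : (N,) booleans, True if the point is an initial polygon vertex.
    Returns an (N, C + 1) array of input features.
    """
    feature_map = np.asarray(feature_map, dtype=float)
    pts = np.rint(np.asarray(points, dtype=float)).astype(int)
    h, w = feature_map.shape[1:]
    # Clamp positions so that points on the border stay inside the map.
    xs = np.clip(pts[:, 0], 0, w - 1)
    ys = np.clip(pts[:, 1], 0, h - 1)
    sampled = feature_map[:, ys, xs].T              # (N, C)
    flag = np.asarray(is_vertex, dtype=float)[:, None]  # (N, 1)
    return np.concatenate([sampled, flag], axis=1)
```

The last column carries the additional input that tells the network which sampled points already are vertices of the initial polygon.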

In this embodiment, in the t-th iteration of the recovery strategy, the building polygon generated in the previous stage is used as the initial polygon of the current stage. To recover the missing vertices of the initial polygon, N = 128 points are first sampled along the polygon outline, and some of them are selected as new polygon vertices to construct the new building polygon for the next stage. To this end, the features of the N = 128 sampling points are used as input, and a plurality of convolution layers are adopted to jointly learn a new vertex heat map and new point offsets. The former represents the probability that any sampling point becomes a new polygon vertex, and the latter represents the offset of any sampling point from the new polygon vertex, which is used to adjust the position of the polygon vertex.
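For illustration, sampling N points along the polygon outline may be sketched as below. Equal arc-length spacing starting from the first vertex is an assumption; the embodiment states only that N = 128 points are sampled along the outline, and the function name is hypothetical.

```python
import numpy as np

def sample_along_polygon(vertices, n=128):
    """Sample n points at equal arc-length spacing along a closed polygon.

    vertices : (Q, 2) polygon vertices in order; the last vertex is
               implicitly connected back to the first.
    """
    v = np.asarray(vertices, dtype=float)
    nxt = np.roll(v, -1, axis=0)                 # end point of each edge
    seg_len = np.linalg.norm(nxt - v, axis=1)    # length of each edge
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    perimeter = cum[-1]
    # Arc-length position of each sample, measured from the first vertex.
    s = np.arange(n) * perimeter / n
    # Edge on which each sample falls (zero-length edges are skipped).
    edge = np.searchsorted(cum, s, side="right") - 1
    edge = np.clip(edge, 0, len(v) - 1)
    t = (s - cum[edge]) / seg_len[edge]
    return v[edge] + t[:, None] * (nxt[edge] - v[edge])
```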

In the embodiment of the invention, the missing vertex prediction network needs to be constructed in advance and optimized based on a supervised learning strategy: the second initial vertex heat map label values are generated first, and the loss function for network optimization is then determined, so as to obtain the optimized missing vertex prediction network.

In this embodiment, optimizing the missing vertex prediction network model based on a supervised learning strategy includes generating a second initial vertex heatmap label value, as shown in fig. 4, which is specifically implemented as follows:

S41, calculating, by a dynamic time warping algorithm, the Euclidean distances between each point in a first sequence and each point in a second sequence, wherein the first sequence consists of the initial polygon vertices and the second sequence consists of the real building vertices;

S42, searching the second sequence for the target building vertex with the minimum Euclidean distance from each polygon vertex in the first sequence, thereby matching the initial polygon vertices with the target building vertices;

S43, selecting a second target point set along the target building vertices in the remote sensing image, and determining a second target index set of the real building vertices in the second target point set, wherein the second target point set comprises any real building vertex that is not connected with any polygon vertex in the first sequence;

S44, determining the second initial points whose indexes in the second initial vertex heat map belong to the second target index set as positive samples, setting the probability of these points in the second initial vertex heat map to 1, and determining the probability of the remaining second initial points in the second initial vertex heat map according to a two-dimensional Gaussian distribution centered on the indexes of the positive samples.

In this embodiment, in order to generate reliable targets for the new vertex heat map and the new point offsets, the first step is to match each vertex of the initial polygon with a real building vertex in the current iteration. Considering the dependency between polygon vertices, the invention employs a variant of the Dynamic Time Warping (DTW) algorithm to implement the vertex matching. For an initial polygon with Q = 9 vertices and a building instance with M = 14 vertices, the vertex sequences are represented as a first sequence A and a second sequence B, wherein the first vertices of the two sequences are chosen as the pair with the smallest Euclidean distance among all vertex pairs formed by a vertex of A and a vertex of B. The DTW algorithm is used to calculate the minimum distance between the first sequence A and the second sequence B, under which any vertex of A and any vertex of B may be matched with one or more consecutive vertices of sequence B and sequence A, respectively. However, since a vertex of the initial polygon can be matched with only one real building vertex, a unique target vertex is selected for each vertex of A according to the following rules: first, the target vertices not connected with other vertices in A are considered; then, each target vertex is weighted according to the area of the region enclosed by the vertex and its two adjacent vertices; finally, only the target vertex with the highest weight is selected.
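The alignment step described above may be illustrated with the following sketch of standard dynamic time warping between two vertex sequences. It is not the variant used by the invention: the choice of starting vertices and the area-based selection of a unique target vertex are omitted, and the function name `dtw_match` is hypothetical.

```python
import numpy as np

def dtw_match(a, b):
    """Standard DTW between vertex sequences a (Q, 2) and b (M, 2).

    Returns the minimum accumulated Euclidean distance and the alignment
    path as a list of (index_in_a, index_in_b) pairs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    q, m = len(a), len(b)
    # Pairwise Euclidean distances between all vertices of a and b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((q + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, q + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end to recover which vertices were aligned.
    path = []
    i, j = q, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
        k = int(np.argmin(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[q, m]), path
```

In the returned path, one vertex of the first sequence may appear with several consecutive vertices of the second sequence, which is why a further rule is needed to keep a single target vertex per polygon vertex.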

After the two vertex sequences are matched, N = 128 points are sampled along the real building to obtain the second target point set. Specifically, for any edge of the initial polygon, the target vertices matched with the two end points of the edge are determined first, and the number of target points between these two target vertices is fixed. If any real building vertex exists between the two matched target vertices, the sampling points are distributed over the corresponding building edges according to edge length. Let the second target index set consist of the indexes, in the t-th iteration, of the real building vertices in the second target point set that are not connected with any vertex of the initial polygon. For the target vertex heat map, if the index i belongs to the second target index set, the corresponding point is considered a positive example and its value is set to 1. At the same time, considering that target points near the building vertices may also form polygons of similar shape, for a second initial point whose index i is close to any index in the second target index set, the value is not directly set to 0 but is given by a two-dimensional Gaussian centered on the corresponding positive example.
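As one possible illustration of distributing sampling points over building edges according to edge length, the following sketch splits a fixed number of points proportionally and resolves rounding with a largest-remainder rule. The rounding rule and the function name `allocate_points` are assumptions; the embodiment does not specify how fractional counts are handled.

```python
def allocate_points(edge_lengths, total):
    """Split `total` sampling points across edges in proportion to length."""
    perimeter = float(sum(edge_lengths))
    if perimeter <= 0 or total <= 0:
        return [0] * len(edge_lengths)
    # Ideal (fractional) share of each edge.
    exact = [total * length / perimeter for length in edge_lengths]
    counts = [int(e) for e in exact]  # round down first
    # Hand the leftover points to the edges with the largest remainders.
    remaining = total - sum(counts)
    order = sorted(range(len(exact)),
                   key=lambda k: exact[k] - counts[k], reverse=True)
    for k in order[:remaining]:
        counts[k] += 1
    return counts
```

The counts always sum to the requested total, so the number of target points between two matched target vertices stays fixed regardless of how many building edges lie between them.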

In this embodiment, optimizing the missing vertex prediction network model based on a supervised learning strategy includes determining a loss function for network optimization, which is specifically implemented as follows:

A Focal loss function is adopted to construct a third loss function corresponding to the second initial vertex heat map, and the missing vertex prediction network model is optimized based on the third loss function. In the third loss function, α = 2 and β = 4, and N is the number of second initial points in the second initial point set; the predicted term is the probability that the i-th second initial point is a real building vertex, and the label term is the probability that the i-th second target point is a real building vertex; a binary indicator takes the value 0 when the i-th second target point is close to a vertex whose index belongs to the second target index set, and 1 otherwise; L is the number of vertices for which the associated term equals 1; and t is the index of the current iteration of the missing vertex recovery process.

In the t-th iteration, a smooth L1 loss function is adopted to construct a fourth loss function corresponding to the second point offsets, and the missing vertex prediction network model is optimized based on the fourth loss function. In the fourth loss function, N is the number of second initial points in the second initial point set; the loss of the i-th second initial point is scaled by a weight label, which is set according to whether the index of the i-th second initial point corresponds to a real building vertex; and the remaining terms are the i-th second target point, the i-th second initial point, and the offset of the i-th second initial point from the corresponding real building vertex.

In this embodiment, the total loss function of the missing vertex recovery strategy is defined over all T iterations, where T is the total number of iterations and may be chosen to be 3.

The missing vertex recovery strategy provided by the invention iteratively recovers and refines the polygon vertices, making the algorithm more robust to inaccurate predictions and complex building shapes. It should be noted that this strategy may also be applied to other contour-based methods to improve their polygon prediction capability, and the present invention is not limited in this respect.

For simplicity of explanation, the method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.

In addition, the embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the remote sensing image building extraction method based on deep learning as described above.

All or part of the flow of the method embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals.

In the specific implementation process of the present embodiment, reference may be made to the foregoing embodiments, which have corresponding technical effects.

In addition, an embodiment of the present invention further provides a remote sensing image building extraction device based on deep learning, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps in each of the embodiments of the remote sensing image building extraction method based on deep learning, for example, S11 to S14 shown in fig. 1.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units can be a series of instruction segments of a computer program capable of achieving specific functions, and the instruction segments are used for describing the execution process of the computer program in the remote sensing image building extraction device based on deep learning.

The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the electronic device and connects the various parts of the overall electronic device using various interfaces and lines.

The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the electronic device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

Those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, any of the embodiments claimed herein may be used in any combination.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A remote sensing image building extraction method based on deep learning is characterized by comprising the following steps:

extracting image features of the remote sensing image, and generating a building instance detection frame according to the image features;

2. The method of claim 1, wherein the extracting image features of the remote sensing image and generating the building instance detection frame according to the image features comprises:

and selecting a target recommendation area from the plurality of recommendation areas, and using the target recommendation area as the detection frame of the building instance.

3. The method of claim 1, wherein constructing an initial outline of the building instance from the detection box comprises:

predicting the target offset between each vertex of the diamond-shaped contour and the target pole, adjusting the vertex of the diamond-shaped contour according to the target offset, and taking the adjusted vertex as the target pole;

and sequentially connecting the obtained end points of the four target boundaries to obtain the initial outline of the building instance.

4. The method of claim 1, wherein generating a building polygon from the initial contour and corresponding image features of the initial contour in the remote sensing imagery comprises:

s033, performing vertex adjustment on the initial building polygon according to the vertex information to obtain a second building polygon;

5. The method of claim 4, wherein generating an initial building polygon from the initial contour and corresponding image features of the initial contour in the remote sensing imagery comprises:

based on a preset joint prediction network model, predicting the probability that each initial point in the initial point set is a real building vertex and the offset of the initial point from the real building vertex according to the image features of the position of each initial point;

and selecting candidate building vertexes from the initial point set according to the probability that each initial point in the initial point set is a real building vertex and the offset of the initial point from the real building vertex, and sequentially connecting the candidate building vertexes to generate an initial building polygon.

6. The method of claim 5, further comprising:

pre-constructing the joint prediction network model;

7. The method of claim 4, wherein predicting missing vertex information in the initial building polygon from the initial building polygon and corresponding image features of the initial building polygon in a remote sensing image comprises:

predicting the probability that each second initial point in the second initial point set is a real building vertex and the offset of the second initial point from the real building vertex according to the image characteristics of the position of each second initial point on the basis of a preset missing vertex prediction network;

8. The method of claim 7, further comprising:

pre-constructing the missing vertex prediction network model;

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.

10. A remote sensing image building extraction device based on deep learning, comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.