CN111652250B

CN111652250B - Remote sensing image building extraction method and device based on polygons and storage medium

Info

Publication number: CN111652250B
Application number: CN202010517418.5A
Authority: CN
Inventors: 朱云慧; 黄恩兴; 陈欢欢; 江贻芳; 黄不了; 于娜; 高健
Original assignee: Stargis Tianjin Technology Development Co ltd; University of Science and Technology of China USTC
Current assignee: Stargis Tianjin Technology Development Co ltd; University of Science and Technology of China USTC
Priority date: 2020-06-09
Filing date: 2020-06-09
Publication date: 2023-05-26
Anticipated expiration: 2040-06-09
Also published as: CN111652250A

Abstract

The invention provides a polygon-based method and device for extracting buildings from remote sensing images, together with a storage medium and electronic equipment. The method comprises the following steps: predicting the remote sensing image with a pre-constructed feature extraction network model to obtain a predicted vertex map and a predicted direction map of a target building in the image; performing regression on each predicted vertex in the vertex map with a preset network regression model to obtain candidate vertices of the target building; generating a vector polygon corresponding to the target building from the candidate vertices and the predicted direction map; and extracting the target building instance from the remote sensing image based on the vector polygon. The invention represents each building instance as an adaptive polygon composed of building vertices. Because the polygon output naturally follows the geometric shape of the building, building coverage and geometric similarity improve, which in turn improves the accuracy of building extraction from remote sensing images.

Description

Remote sensing image building extraction method and device based on polygons and storage medium

Technical Field

The invention relates to the technical field of remote sensing image processing, in particular to a polygon-based remote sensing image building extraction method, a polygon-based remote sensing image building extraction device, a storage medium and electronic equipment.

Background

Extracting building instances from remote sensing images is important for urban management and planning, post-disaster rescue, population research and similar work. Because buildings are densely distributed and highly diverse, manually delineating building instances across large numbers of remote sensing images consumes a great deal of time and effort, so automatic extraction of building instances is urgently needed.

Over the past decades, traditional methods have attempted to identify buildings from texture, lines, shadows and more complex hand-crafted features. Because buildings in remote sensing images vary greatly, such methods do not generalize to complex scenes and have not achieved automated application. In recent years, deep learning has driven a new round of research. Most current deep-learning-based building instance extraction methods fall into three categories: pixel-wise segmentation, building-by-building (instance-level) segmentation, and structured contour delineation. However, existing methods lack an efficient way to infer the overall contour of a building, which makes accurate extraction of building instances difficult.

Disclosure of Invention

The invention provides a polygon-based remote sensing image building extraction method, device, storage medium and electronic equipment that exploit the geometric features of buildings to delineate building instance outlines accurately and to extract building instances automatically.

In one aspect of the invention, a method for extracting a remote sensing image building based on polygons is provided, and the method comprises the following steps:

predicting the remote sensing image with a pre-constructed feature extraction network model to obtain a predicted vertex map and a predicted direction map of a target building in the remote sensing image;

performing regression on each predicted vertex in the vertex map with a preset network regression model to obtain candidate vertices of the target building;

generating a vector polygon corresponding to the target building from the candidate vertices and the predicted direction map;

and extracting a target building instance in the remote sensing image based on the vector polygon.

Optionally, generating the vector polygon corresponding to the target building from the candidate vertices and the predicted direction map includes:

constructing a directed graph set from the candidate vertices and the predicted direction map, wherein the directed graph set comprises the polygons that any number of candidate vertices can form according to the predicted direction map;

calculating, with a preset calculation model, the weight of each edge formed by connecting any two vertices in the directed graph set;

and selecting the polygon in the directed graph set whose edges have the largest sum of weights as the vector polygon corresponding to the target building.

Optionally, selecting the polygon in the directed graph set whose edges have the largest sum of weights as the vector polygon corresponding to the target building includes:

sorting the edges in the directed graph set in descending order of weight;

constructing a new directed graph set that contains polygons composed of at least 3 vertices;

and selecting, from the newly constructed directed graph set, the polygon whose edges have the largest average weight.

Optionally, before the remote sensing image is predicted according to the pre-constructed feature extraction network model, the method further comprises the step of constructing the feature extraction network model;

the construction of the feature extraction network model specifically comprises the following steps:

constructing a feature extraction network by adopting a U-NET network with an encoder/decoder structure and based on RESNET-18;

and optimizing the feature extraction network based on a supervised learning strategy to obtain an optimized feature extraction network model.

Optionally, the optimizing the feature extraction network based on the supervised learning strategy includes:

constructing a first loss function for the vertex map according to the confidence that each predicted vertex in the predicted vertex map is a true vertex, and optimizing the feature extraction network based on the first loss function.

Optionally, the optimizing the feature extraction network based on the supervised learning strategy further includes:

constructing a second loss function for the direction map according to the error between the predicted direction map and the direction map label, and optimizing the feature extraction network based on the second loss function.

Optionally, performing regression on each predicted vertex in the vertex map with the preset network regression model includes:

pre-building the network regression model, which comprises two network branches and a convolution layer, each branch consisting of two fully connected layers with 1024 units;

cropping, for each predicted vertex, a local block of preset size from the encoder part and from the decoder part of the feature extraction network model, and using the two cropped regions as the inputs of the two network branches of the network regression model, respectively;

and summing, in the network regression model, the two 1024-channel vectors produced by the two branches to obtain a fused vector, and processing the fused vector with the convolution layer of the network regression model to obtain the predicted regression vector of the predicted vertex.

Optionally, after the constructing the network regression model, the method further includes:

optimizing the network regression model based on a supervised learning strategy to obtain an optimized network regression model;

the optimizing the network regression model based on the supervised learning strategy includes:

generating a corresponding target regression vector for each predicted vertex in the predicted vertex map according to the coordinate value of the target building vertex, the coordinate value of the predicted building vertex and the side length of the local block area corresponding to the predicted vertex;

training the network regression model by optimizing the smooth L1 loss of Fast R-CNN between the predicted regression vectors and the target regression vectors.

In another aspect of the present invention, there is provided a polygon-based remote sensing image building extraction apparatus, the apparatus comprising:

the prediction unit is configured to predict the remote sensing image with the pre-constructed feature extraction network model to obtain a predicted vertex map and a predicted direction map of a target building in the remote sensing image;

the regression unit is used for carrying out regression calculation on each predicted vertex in the vertex map by utilizing a preset network regression model to obtain a candidate vertex of the target building;

a generating unit, configured to generate a vector polygon corresponding to the target building from the candidate vertices and the predicted direction map;

and the extraction unit is used for extracting the target building instance in the remote sensing image based on the vector polygon.

Optionally, the generating unit includes:

a configuration subunit, configured to construct a directed graph set from the candidate vertices and the predicted direction map, where the directed graph set includes the polygons that any number of candidate vertices can form according to the predicted direction map;

the computing subunit is configured to compute, with a preset calculation model, the weight of each edge formed by connecting any two vertices in the directed graph set;

and the selecting subunit is used for selecting the polygon with the largest sum of the weights corresponding to each side in the directed graph set as the vector polygon corresponding to the target building.

Optionally, the selecting subunit is specifically configured to: sort the edges in the directed graph set in descending order of weight; construct a new directed graph set containing polygons composed of at least 3 vertices; and select, from the newly constructed directed graph set, the polygon whose edges have the largest average weight.

Optionally, the apparatus further comprises:

the model training unit is configured to construct the feature extraction network model before the remote sensing image is predicted with the pre-constructed feature extraction network model;

the model training unit specifically comprises:

a training subunit for constructing a feature extraction network using a RESNET-18 based U-NET network having an encoder/decoder architecture;

and the optimizing subunit is used for optimizing the feature extraction network based on the supervised learning strategy to obtain an optimized feature extraction network model.

Optionally, the optimizing subunit is specifically configured to construct a first loss function for the vertex map according to the confidence that each predicted vertex in the predicted vertex map is a true vertex, and to optimize the feature extraction network based on the first loss function;

the optimizing subunit is further configured to construct a second loss function for the direction map according to the error between the predicted direction map and the direction map label, and to optimize the feature extraction network based on the second loss function.

Optionally, the regression unit includes:

a building subunit, configured to pre-build the network regression model, where the network regression model includes two network branches and a convolution layer, and each branch consists of two fully connected layers with 1024 units;

a configuration subunit, configured to crop, for each predicted vertex, a local block of preset size from the encoder part and from the decoder part of the feature extraction network model, and to use the two cropped regions as the inputs of the two network branches of the network regression model, respectively;

and the regression subunit is configured to sum, in the network regression model, the two 1024-channel vectors produced by the two branches to obtain a fused vector, and to process the fused vector with the convolution layer of the network regression model to obtain the predicted regression vector of the predicted vertex.

Optionally, the regression unit further includes:

the regression model optimizing subunit is configured to optimize the network regression model based on a supervised learning strategy after the building subunit has built it, so as to obtain an optimized network regression model;

the regression model optimization subunit includes:

the generation module is configured to generate a corresponding target regression vector for each predicted vertex in the predicted vertex map according to the coordinates of the target building vertex, the coordinates of the predicted building vertex, and the side length of the local block corresponding to the predicted vertex;

an optimization training module, configured to train the network regression model by optimizing the smooth L1 loss of Fast R-CNN between the predicted regression vectors and the target regression vectors.

The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method as described above.

The invention further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the steps of the method as described above when executing the program.

According to the polygon-based remote sensing image building extraction method, device, storage medium and electronic equipment provided by the embodiments of the invention, each building instance is represented by an adaptive polygon composed of building vertices. Because the polygon output naturally follows the geometric shape of the building, building coverage and geometric similarity are improved, which further improves the accuracy of building extraction from remote sensing images.

The foregoing is only an overview of the technical solution of the present invention. To make the technical means of the invention clearer so that it can be implemented according to this specification, and to make the above and other objects, features and advantages of the invention more readily apparent, specific embodiments of the invention are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

Fig. 1 is a schematic flowchart of a polygon-based remote sensing image building extraction method according to an embodiment of the invention;

Fig. 2 is a remote sensing image to be extracted according to an embodiment of the invention;

Fig. 3 is the predicted vertex map corresponding to the remote sensing image to be extracted according to an embodiment of the invention;

Fig. 4 is the predicted direction map corresponding to the remote sensing image to be extracted according to an embodiment of the invention;

Fig. 5 is a schematic flowchart of the implementation of step S12 in the polygon-based remote sensing image building extraction method according to an embodiment of the invention;

Fig. 6 is a schematic structural diagram of a polygon-based remote sensing image building extraction device according to an embodiment of the invention;

Fig. 7 is a schematic diagram of the internal structure of the regression unit in the polygon-based remote sensing image building extraction device according to an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Fig. 1 schematically illustrates a flowchart of a polygon-based remote sensing image building extraction method according to an embodiment of the present invention. Referring to fig. 1, the remote sensing image building extraction method based on polygons according to the embodiment of the present invention specifically includes steps S11 to S14, as follows:

S11, predicting the remote sensing image with the pre-constructed feature extraction network model to obtain a predicted vertex map and a predicted direction map of the target building in the remote sensing image (see Figs. 2 to 4).

S12, carrying out regression calculation on each predicted vertex in the vertex map by using a preset network regression model to obtain candidate vertices of the target building.

S13, generating a vector polygon corresponding to the target building from the candidate vertices and the predicted direction map.

S14, extracting a target building instance in the remote sensing image based on the vector polygon.

With the polygon-based remote sensing image building extraction method provided by the embodiments of the invention, each building instance is represented by an adaptive polygon formed by building vertices. Because the polygon output naturally follows the geometric shape of the building, building coverage and geometric similarity are improved, which further improves the accuracy of building extraction from remote sensing images.

In the embodiment of the invention, before the remote sensing image is predicted according to the pre-constructed feature extraction network model, the method further comprises the step of constructing the feature extraction network model.

Specifically, an implementation manner of constructing the feature extraction network model specifically includes the following steps: constructing a feature extraction network by adopting a U-NET network with an encoder/decoder structure and based on RESNET-18; and optimizing the feature extraction network based on a supervised learning strategy to obtain an optimized feature extraction network model.

In practice, a building instance with 6 building vertices is first represented as a polygon of 6 ordered vertices, in which every two adjacent vertices (including the last and the first) are connected by a building edge. Building instance extraction therefore becomes the task of finding the building polygon. In one specific example, this task is broken down into two steps: locating the building vertices and connecting them in order. To this end, two maps are predicted for each building instance: a vertex map, which shows the possible locations of the building vertices, and a direction map, which helps determine whether two vertices are connected.
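As a minimal illustration of this representation (a sketch, not the patent's code), the ordered vertex list alone fixes the edges: each vertex connects to its successor, and the last vertex wraps around to the first. The function name `polygon_edges` and the L-shaped coordinates below are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_edges(vertices: List[Point]) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) joining each vertex to the next one,
    including the closing edge from the last vertex back to the first."""
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return [(i, (i + 1) % n) for i in range(n)]


# Hypothetical L-shaped footprint with 6 ordered (x, y) vertices.
l_shape = [(10, 10), (60, 10), (60, 40), (35, 40), (35, 80), (10, 80)]
print(polygon_edges(l_shape))
# [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
```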

Precisely locating vertices and identifying buildings requires both high-resolution low-level features and semantically strong high-level features. A U-shaped network structure is therefore introduced to combine high resolution with strong semantics. A ResNet-18-based U-Net with an encoder/decoder architecture is used as the feature extractor, constructed as follows:

Constructing the encoder network. The Conv1 to Conv5 blocks of ResNet-18 form the encoder network, and the feature map output by the final encoder stage has one-sixteenth of the input resolution in each dimension, i.e. 112×112 / 16 = 7×7.

Constructing the decoder network. A structure symmetric to the encoder forms the main part of the decoder, so that the output feature map again has the 112×112 resolution of the input image. The feature map produced by four repeated 3×3 convolution layers is then used as a shared feature and split into two branches: one branch outputs the direction map through a single 1×1 convolution layer, and the other outputs the vertex map through one 3×3 convolution layer followed by one 1×1 convolution layer. Both the vertex map and the direction map have a resolution of 112×112.

To guide the feature extraction network to learn effective feature representations, this embodiment optimizes it with a supervised learning strategy, as follows: a first loss function for the vertex map is constructed according to the confidence that each predicted vertex in the predicted vertex map is a true vertex, and the feature extraction network is optimized based on it; a second loss function for the direction map is constructed according to the error between the predicted direction map and the direction map label, and the feature extraction network is optimized based on it.

The predicted vertex map gives the probability that each pixel is a building vertex. For accurate localization, the network should predict high values at the 6 building vertices and low values elsewhere. Because the number of positive samples (the 6 building vertices) is far smaller than the number of negative samples ($112 \times 112 - 6$), the loss function must overcome this extreme positive/negative class imbalance. The first loss function $L_{M_V}$ of the vertex map is defined in terms of the following quantities:

$M_V$ is the predicted vertex map, $p$ denotes any pixel of the vertex map, $p_v$ belongs to the set of real building vertices, $N_{min}$ denotes the minimum number of vertices, $\gamma$ is a scalar that affects the number of predicted vertices, and the actual number of vertices of a building is the size of the real vertex set.

Specifically, $N_{min}$ is set to $3 \times 10$ and $\gamma$ is set to 10.

Given a set of polygon vertices, i.e. the locations of the 6 building vertices, different polygons can be generated by connecting the vertices in different orders. To select the polygon that best fits the building shape, the direction map must be able to indicate whether an edge exists between any two polygon vertices. The direction map therefore encodes the location and direction of the building boundary. The building boundary region can be expressed as:

$$A = \bigcup_{(i,j)} e_{ij}$$

where $A$ denotes the region covered by the building boundary, i.e. the union of the polygon edges $e_{ij}$.

The second loss function $L_{M_D}$ of the direction map is:

$$L_{M_D} = \sum_{P} \left\| M_D(P) - M_D^*(P) \right\|_2^2$$

The invention trains the network with an L2 loss between the direction map label and the predicted value, where $M_D$ denotes the predicted direction map, $P$ is any pixel of the direction map, $M_D(P)$ is the predicted value at pixel $P$, and $M_D^*(P)$ is the label value at pixel $P$:

$$M_D^*(P) = \begin{cases} u_{ij}, & P \in e_{ij} \\ 0, & \text{otherwise} \end{cases}$$

Here $u_{ij}$ is the unit vector in the direction of the line segment from building vertex $v_i$ to building vertex $v_j$, and edge $e_{ij}$ is the line segment, three pixels thick, between $v_i$ and $v_j$.
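The label construction described above can be sketched in NumPy as follows. This is an illustrative reading of the text, not the authors' implementation: the names `direction_label` and `direction_loss`, the (x, y) vertex convention, sampling at integer pixel centres, and the rule that later edges overwrite earlier ones where two edges meet at a corner are all assumptions. The loss is written as an unnormalized sum of squared errors.

```python
import numpy as np


def direction_label(vertices, size=112, thickness=3):
    """Rasterize the direction-map label M_D* for one building polygon.

    Pixels within thickness/2 of edge v_i -> v_{i+1} receive the edge's unit
    vector u_ij; all other pixels stay zero. Returns an (size, size, 2) array of
    (ux, uy) per pixel and the boolean boundary mask A.
    """
    label = np.zeros((size, size, 2), dtype=np.float32)
    mask = np.zeros((size, size), dtype=bool)
    ys, xs = np.mgrid[0:size, 0:size]  # row (y) and column (x) index of each pixel
    n = len(vertices)
    for i in range(n):
        p0 = np.asarray(vertices[i], dtype=np.float32)
        p1 = np.asarray(vertices[(i + 1) % n], dtype=np.float32)
        d = p1 - p0
        length = float(np.hypot(d[0], d[1]))
        if length == 0:
            continue
        u = d / length                      # unit direction of edge v_i -> v_j
        rx, ry = xs - p0[0], ys - p0[1]
        t = rx * u[0] + ry * u[1]           # distance along the edge
        dist = np.abs(rx * u[1] - ry * u[0])  # perpendicular distance to the edge
        on_edge = (t >= 0) & (t <= length) & (dist <= thickness / 2)
        label[on_edge] = u                  # later edges overwrite shared corners
        mask |= on_edge
    return label, mask


def direction_loss(pred, label):
    """Sum of squared L2 errors between predicted and label direction vectors."""
    return float(np.sum((pred - label) ** 2))


square = [(20, 20), (80, 20), (80, 80), (20, 80)]
label, mask = direction_label(square)
print(label[20, 50], label[50, 80])
```

With `thickness=3`, an axis-aligned edge covers the pixel row it lies on plus one row on each side, matching the three-pixel-thick segment in the text.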

Low-level features are critical for accurately locating building vertices. However, predicting the vertex map also requires semantic features to distinguish building instances from the background, which may interfere with learning low-level features and thus reduce localization accuracy. To further improve the accuracy of building vertex prediction, regression is performed on each predicted building vertex (the number of predicted vertices is about 6 × 10 = 60).
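One way to obtain the roughly 60 predicted vertices is to keep the highest-scoring pixels of the vertex map. The sketch below assumes the count is N = max(N_min, γ·n) for n true vertices, which matches the figures quoted in the text (6 × 10 = 60, N_min = 3 × 10, γ = 10) but is not stated there as a formula; `top_candidates` is a hypothetical helper, and at inference time n would have to be replaced by some fixed estimate (also an assumption).

```python
import numpy as np


def top_candidates(vertex_map, n_vertices, gamma=10, n_min=30):
    """Return (row, col) indices of the N highest-scoring pixels, sorted by
    descending score, where N = max(n_min, gamma * n_vertices) (assumed rule)."""
    n = min(max(n_min, gamma * n_vertices), vertex_map.size)
    flat = vertex_map.ravel()
    idx = np.argpartition(flat, -n)[-n:]    # top-N indices, unordered
    idx = idx[np.argsort(flat[idx])[::-1]]  # order them by descending score
    rows, cols = np.unravel_index(idx, vertex_map.shape)
    return np.stack([rows, cols], axis=1)


rng = np.random.default_rng(0)
vertex_map = rng.random((112, 112))  # stand-in for a predicted vertex map
cands = top_candidates(vertex_map, n_vertices=6)
print(cands.shape)  # (60, 2)
```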

In one embodiment of the present invention, as shown in fig. 5, in S12, regression calculation is performed on each predicted vertex in the vertex map by using a preset network regression model, and the method specifically includes the following steps:

S121, pre-building the network regression model, which comprises two network branches and a convolution layer, each branch consisting of two fully connected layers with 1024 units. In this embodiment, the network regression model (the vertex regressor) needs to consider the finer details around each vertex, and it additionally uses the high-level features learned by the feature extractor to filter out interfering information in those details. Specifically, the vertex regressor has two sibling branches that take as input the low-level features cropped from the encoder and the high-level features cropped from the decoder, respectively.

S122, cropping the local block around each predicted vertex, at a preset size, from the encoder part and from the decoder part of the feature extraction network model, and using the two cropped regions as the inputs of the two network branches of the network regression model, respectively.

S123, summing, in the network regression model, the two 1024-channel vectors produced by the two branches to obtain a fused vector, and processing the fused vector with the convolution layer of the network regression model to obtain the predicted regression vector R = (R_x, R_y) of the predicted vertex.

In this embodiment, each cropped region is a 28 × 28 local block around a predicted vertex. The two inputs first pass through their respective branches, each consisting of two fully connected layers with 1024 units. The two resulting 1024-channel vectors are summed into a fused vector, and a 1 × 1 convolution layer then produces the 2-dimensional regression vector.
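The regressor described in S121 to S123 can be sketched as a NumPy forward pass. Assumptions beyond the text: ReLU after each fully connected layer, both crops taken from feature maps at the 112 × 112 input resolution, zero padding at image borders, random placeholder weights, and the hypothetical names `crop_patch` and `VertexRegressorSketch`. Channel counts are kept small so the example runs quickly.

```python
import numpy as np


def crop_patch(feature, center, size=28):
    """Crop a size x size window from an (H, W, C) feature map around (row, col),
    zero-padding wherever the window leaves the map."""
    half = size // 2
    padded = np.pad(feature, ((half, half), (half, half), (0, 0)))
    r, c = int(center[0]), int(center[1])
    # Pixel (r, c) sits at (r + half, c + half) in `padded`, so this window
    # spans rows r - half .. r + half - 1 of the original map.
    return padded[r:r + size, c:c + size, :]


class VertexRegressorSketch:
    """Two sibling branches (two 1024-unit FC layers each), summed, then a
    1x1 convolution to a 2-d offset. Weights are random placeholders."""

    def __init__(self, c_low, c_high, size=28, hidden=1024, seed=0):
        rng = np.random.default_rng(seed)

        def fc(n_in, n_out):
            w = rng.normal(0.0, 0.01, (n_in, n_out)).astype(np.float32)
            return w, np.zeros(n_out, dtype=np.float32)

        self.low = [fc(size * size * c_low, hidden), fc(hidden, hidden)]
        self.high = [fc(size * size * c_high, hidden), fc(hidden, hidden)]
        # A 1x1 convolution on a 1x1 map with 1024 channels is equivalent
        # to a linear map from 1024 inputs to 2 outputs.
        self.head = fc(hidden, 2)

    @staticmethod
    def _branch(x, layers):
        for w, b in layers:
            x = np.maximum(x @ w + b, 0.0)  # FC + ReLU (ReLU is an assumption)
        return x

    def __call__(self, low_patch, high_patch):
        fused = (self._branch(low_patch.reshape(-1), self.low)
                 + self._branch(high_patch.reshape(-1), self.high))
        w, b = self.head
        return fused @ w + b  # predicted regression vector R = (R_x, R_y)


rng = np.random.default_rng(1)
enc_feat = rng.random((112, 112, 4))  # stand-in for an encoder feature map
dec_feat = rng.random((112, 112, 4))  # stand-in for a decoder feature map
model = VertexRegressorSketch(c_low=4, c_high=4)
r = model(crop_patch(enc_feat, (50, 60)), crop_patch(dec_feat, (50, 60)))
print(r.shape)  # (2,)
```

Summing rather than concatenating the two branch outputs keeps the fused vector at 1024 channels, which is what the text describes.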

Further, after the network regression model is constructed, the network regression model is optimized based on a supervised learning strategy, and an optimized network regression model is obtained.

Specifically, optimizing the network regression model with a supervised learning strategy comprises: generating a corresponding target regression vector for each predicted vertex in the predicted vertex map according to the coordinates of the target building vertex, the coordinates of the predicted building vertex, and the side length of the local block corresponding to the predicted vertex; and training the network regression model by optimizing the smooth L1 loss of Fast R-CNN between the predicted regression vectors and the target regression vectors.

Specifically, a target regression vector is generated for each predicted vertex in the predicted vertex map and used as the label of that predicted vertex. It is calculated as:

$$R^* = \left(R_x^*, R_y^*\right) = \left(\frac{x^* - T_x}{S}, \frac{y^* - T_y}{S}\right)$$

where $x^*$ and $y^*$ are the x and y coordinates of the target building vertex, $T_x$ and $T_y$ are the coordinates of the predicted building vertex, and $S$ is the side length of the local block corresponding to the predicted vertex, here $S = 28$.
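Assuming the target regression vector is ((x* − T_x)/S, (y* − T_y)/S), i.e. the offset from the predicted vertex to the true vertex normalized by the block side length S, encoding and its inverse (which would refine a predicted vertex from a predicted offset) look like this. `encode_target` and `decode` are hypothetical names, and the inverse mapping T + S·R is inferred from the encoding rather than stated in the text.

```python
import numpy as np

S = 28  # side length of the local block (value given in the text)


def encode_target(true_xy, pred_xy, s=S):
    """Target regression vector ((x* - T_x) / S, (y* - T_y) / S)."""
    return (np.asarray(true_xy, float) - np.asarray(pred_xy, float)) / s


def decode(pred_xy, r, s=S):
    """Refine a predicted vertex by inverting the encoding: T + S * R."""
    return np.asarray(pred_xy, float) + s * np.asarray(r, float)


r_star = encode_target((50.0, 31.0), (43.0, 38.0))
print(r_star)                        # [ 0.25 -0.25]
print(decode((43.0, 38.0), r_star))  # [50. 31.]
```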

The network regression model is trained by optimizing the smooth L1 loss of Fast R-CNN, applied to the difference between the predicted regression vector $R = (R_x, R_y)$ and the target regression vector $R^*$, where

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

The loss is evaluated on $R(k) - R^*(k)$ for every predicted vertex $k$, where $k$ is the index of the predicted vertex and $R(k)$ and $R^*(k)$ are the predicted value and the true value of the regression vector of the $k$-th predicted vertex.
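The smooth L1 function itself is standard (Fast R-CNN, with threshold 1). The sketch below sums it over the x and y components of all predicted vertices; whether the patent sums or averages over k is not stated, so the aggregation is an assumption, as are the names `smooth_l1` and `regression_loss`.

```python
import numpy as np


def smooth_l1(x, beta=1.0):
    """Smooth L1 as in Fast R-CNN: 0.5 x^2 if |x| < 1, else |x| - 0.5 (beta = 1)."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)


def regression_loss(r_pred, r_true):
    """Sum of smooth L1 over the (R_x, R_y) components of every predicted
    vertex k. r_pred and r_true have shape (K, 2)."""
    return float(np.sum(smooth_l1(np.asarray(r_pred) - np.asarray(r_true))))


loss = regression_loss([[0.1, -0.2], [2.0, 0.0]], np.zeros((2, 2)))
print(loss)  # approximately 1.525
```

The quadratic zone keeps gradients small for nearly correct vertices, while the linear zone limits the influence of badly misplaced ones.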

The implementation of building polygon generation is described in detail below by way of specific embodiments.

To construct a polygon, both the locations and the ordering of the vertices are needed. The vertex map and the regression vectors determine candidate locations for the building vertices: non-maximum suppression within a 3 × 3 neighborhood is applied to the 60 predicted candidate vertices, leaving 6 candidate vertices. Every possible ordering of these vertices yields a polygon, and the candidate polygon most similar to the building boundary is selected as the final building polygon.
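Non-maximum suppression over a 3 × 3 neighborhood can be sketched as below: a candidate survives only if no pixel in its 3 × 3 window scores higher. `nms_3x3` is a hypothetical name, and keeping ties is one of several reasonable conventions, not something the text specifies.

```python
import numpy as np


def nms_3x3(vertex_map, candidates):
    """Keep candidates (row, col) whose score is the maximum of their 3x3
    neighborhood in the vertex map."""
    padded = np.pad(vertex_map, 1, constant_values=-np.inf)
    kept = []
    for r, c in candidates:
        window = padded[r:r + 3, c:c + 3]  # 3x3 window centred on (r, c)
        if vertex_map[r, c] >= window.max():
            kept.append((int(r), int(c)))
    return kept


vm = np.zeros((10, 10))
vm[2, 2], vm[2, 3], vm[7, 7] = 0.9, 0.8, 0.7
print(nms_3x3(vm, [(2, 2), (2, 3), (7, 7)]))  # [(2, 2), (7, 7)]
```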

First, a directed graph $G$ is constructed from the candidate vertices and the predicted direction map:

$$G = (V, E)$$

where $V = \{v_i\}_{i=1,\dots,n}$ is the set of $n$ candidate vertices (in this specific example, a set of 6 candidate vertices), and

$$E = \{(v_i, v_j) \mid v_i, v_j \in V,\ i \neq j\}$$

is the set of directed edges between any two vertices, $v_i$ being the $i$-th predicted vertex and $v_j$ the $j$-th predicted vertex.

Each connection between two vertices is an edge, and the weight of an edge measures the confidence that the edge exists. After $G$ is obtained, the weight of the edge formed by connecting any two vertices in $G$ is therefore calculated from the predicted direction map over $P_{ij}$, the set of pixels on the line segment from $v_i$ to $v_j$.

Finally, the polygon in $G$ whose edges have the largest sum of weights is selected as the vector polygon corresponding to the target building.

Selecting the polygon in $G$ whose edges have the largest sum of weights is implemented as follows: sort the edges of $G$ in descending order of weight; build a new directed graph $G'$ from these edges until $G'$ contains a cycle of at least 3 vertices; and select the polygon in $G'$ with the largest average edge weight.

In the embodiments of the invention, the polygon most similar to the building boundary is the one whose edges have the highest average confidence. The problem of selecting the most similar polygon is therefore recast as finding the cycle of greatest weight in graph $G$. To this end, the invention sorts the edges of $G$ in descending order of weight and adds them one by one to gradually build a new graph $G'$, until $G'$ contains a cycle of at least 3 vertices. Finally, the cycle with the highest average weight in $G'$ is taken as the final predicted building polygon.
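The greedy cycle search can be sketched in plain Python: edges are added in descending weight order until the partial graph G′ contains a directed cycle with at least 3 vertices, and the cycle with the highest average weight is returned. Enumerating all simple cycles is exponential in general but cheap for the handful of candidate vertices per building (6 in the example). `simple_cycles` and `best_polygon` are hypothetical names, and the input is assumed to be a dict mapping directed edges (i, j) to weights.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def simple_cycles(adj: Dict[int, List[int]], min_len: int = 3) -> List[List[int]]:
    """Enumerate simple directed cycles with at least `min_len` vertices.
    Each cycle is reported once, starting from its smallest vertex index."""
    cycles: List[List[int]] = []

    def dfs(start: int, node: int, path: List[int], on_path: set) -> None:
        for nxt in adj.get(node, []):
            if nxt == start and len(path) >= min_len:
                cycles.append(path[:])
            elif nxt > start and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                dfs(start, nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for s in sorted(adj):
        dfs(s, s, [s], {s})
    return cycles


def best_polygon(weights: Dict[Edge, float]) -> Optional[List[int]]:
    """Grow G' edge by edge (highest weight first) until it contains a cycle of
    at least 3 vertices, then return the cycle with the highest mean weight."""
    def avg_weight(cycle: List[int]) -> float:
        edges = zip(cycle, cycle[1:] + cycle[:1])
        return sum(weights[e] for e in edges) / len(cycle)

    adj: Dict[int, List[int]] = {}
    for (i, j), _w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        adj.setdefault(i, []).append(j)
        cycles = simple_cycles(adj)
        if cycles:
            return max(cycles, key=avg_weight)
    return None


# Square with vertices 0..3: forward edges score high, reversed edges low.
weights: Dict[Edge, float] = {}
for i in range(4):
    for j in range(4):
        if i == j:
            continue
        if j == (i + 1) % 4:
            weights[(i, j)] = 0.9
        elif i == (j + 1) % 4:
            weights[(i, j)] = -0.9
        else:
            weights[(i, j)] = 0.1
print(best_polygon(weights))  # [0, 1, 2, 3]
```

Two-vertex loops such as 0 → 1 → 0 are ignored by `min_len=3`, matching the requirement that the cycle have at least 3 vertices.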

The present invention represents each building instance by an adaptive polygon composed of building vertices. Because the output polygon naturally follows the geometry of the building, building coverage and geometric similarity can be improved.

The positions and arrangement of the predicted building vertices are supervised by the corresponding label data, so the learning of image features is optimized directly, which further improves the feature representation of the model.

By adjusting the position of each predicted vertex, the invention introduces the detailed structure around each vertex, compensating for the loss of low-level features caused by jointly learning building recognition and boundary localization. The prediction accuracy of the building polygon can therefore be further improved.

For simplicity of explanation, the method embodiments are described as a series of acts. However, those of ordinary skill in the art will understand that the embodiments of the present invention are not limited by the described order of acts, since according to these embodiments some acts may be performed in other orders or concurrently. Those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the invention.

Fig. 6 schematically illustrates the structure of a polygon-based remote sensing image building extraction device according to an embodiment of the present invention. Referring to Fig. 6, the device specifically includes a prediction unit 201, a regression unit 202, a generation unit 203, and an extraction unit 204, where:

the prediction unit 201 is configured to predict the remote sensing image using a pre-constructed feature extraction network model, so as to obtain the predicted vertex map and direction map of a target building in the remote sensing image;

the regression unit 202 is configured to perform regression on each predicted vertex in the vertex map using a preset network regression model, so as to obtain the candidate vertices of the target building;

the generation unit 203 is configured to generate a vector polygon corresponding to the target building from the candidate vertices and the predicted direction map; and

the extraction unit 204 is configured to extract the target building instance from the remote sensing image based on the vector polygon.
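The following minimal Python sketch illustrates how units 201 to 204 can be chained so that each unit's output feeds the next. The class name `BuildingExtractor`, the callable signatures, and the stub lambdas are hypothetical. They show only the data flow and do not represent the patent's actual interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Vertex = Tuple[int, int]


@dataclass
class BuildingExtractor:
    """Hypothetical wiring of units 201-204; each unit is supplied as a callable."""
    predict: Callable[[Any], Tuple[Any, Any]]              # 201: image -> (vertex map, direction map)
    regress: Callable[[Any], List[Vertex]]                 # 202: vertex map -> candidate vertices
    generate: Callable[[List[Vertex], Any], List[Vertex]]  # 203: (candidates, direction map) -> polygon
    extract: Callable[[Any, List[Vertex]], Any]            # 204: (image, polygon) -> building instance

    def __call__(self, image: Any) -> Any:
        vertex_map, direction_map = self.predict(image)
        candidates = self.regress(vertex_map)
        polygon = self.generate(candidates, direction_map)
        return self.extract(image, polygon)


# Stub units, only to demonstrate the data flow.
extractor = BuildingExtractor(
    predict=lambda img: ("vertex_map", "direction_map"),
    regress=lambda vm: [(0, 0), (0, 5), (5, 5)],
    generate=lambda cands, dm: cands,
    extract=lambda img, poly: {"polygon": poly},
)
result = extractor("image")
print(result)  # {'polygon': [(0, 0), (0, 5), (5, 5)]}
```

Passing the units in as callables keeps each stage independently replaceable, which mirrors the division of the device into separate functional units.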

In the embodiment of the present invention, the generating unit 203 includes a configuration subunit, a calculation subunit, and a selection subunit, where:

In the embodiment of the invention, the selection subunit is specifically configured to: sort the edges in the directed graph in descending order of weight; construct a new directed graph that contains a polygon composed of at least 3 vertices; and select, from the newly constructed directed graph, the polygon whose sides have the largest average weight.

In the embodiment of the present invention, the apparatus further includes a model training unit, which is not shown in the drawings, and the model training unit is configured to construct the feature extraction network model before the remote sensing image is predicted according to the pre-constructed feature extraction network model.

The model training unit specifically comprises a training subunit and an optimizing subunit, wherein:

Further, the optimization subunit is specifically configured to construct a first loss function for the vertex map according to the confidence that each predicted vertex in the predicted vertex map is a true vertex, and to optimize the feature extraction network based on the first loss function;

the optimization subunit is further specifically configured to construct a second loss function for the direction map according to the error between the predicted direction map and the direction map label, and to optimize the feature extraction network based on the second loss function.
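One plausible instantiation of the two loss functions is sketched below in Python. The text specifies only their inputs: vertex confidences versus true vertices for the first loss, and the error between predicted and labeled direction maps for the second. The concrete forms here are therefore assumptions: per-pixel binary cross-entropy for the first loss, and mean squared error over 2-D direction vectors for the second. Both functions operate on flattened pixel lists.

```python
import math
from typing import Sequence, Tuple


def vertex_loss(pred: Sequence[float], label: Sequence[int], eps: float = 1e-7) -> float:
    """ASSUMED first loss: mean binary cross-entropy between vertex confidences and 0/1 labels."""
    total = 0.0
    for p, y in zip(pred, label):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pred)


def direction_loss(pred: Sequence[Tuple[float, float]], label: Sequence[Tuple[float, float]]) -> float:
    """ASSUMED second loss: mean squared error between predicted and labelled direction vectors."""
    total = 0.0
    for (pr, pc), (lr, lc) in zip(pred, label):
        total += (pr - lr) ** 2 + (pc - lc) ** 2
    return total / len(pred)


print(direction_loss([(0.0, 0.0)], [(1.0, 1.0)]))  # 2.0
```

In training, the two losses would typically be summed (possibly with weights) before back-propagating through the shared feature extraction network. That weighting is also an assumption.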

In the embodiment of the present invention, referring to Fig. 7, the regression unit 202 includes a construction subunit 2021, a configuration subunit 2022, and a regression subunit 2023, where:

the construction subunit 2021 is configured to pre-build the network regression model, which comprises two network branches and one convolution layer, each branch consisting of two fully connected layers with 1024 units;

the configuration subunit 2022 is configured to crop, at a preset size, the local patch corresponding to each predicted vertex from the encoder part and from the decoder part of the feature extraction network model, and to use the two resulting cropped regions as the inputs of the two network branches of the network regression model, respectively;

the regression subunit 2023 is configured to sum the two 1024-channel vectors produced by the two network branches of the network regression model to obtain a fusion vector, and to process the fusion vector with the convolution layer of the network regression model to obtain the predicted regression vector for the predicted vertex.
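The cropping and two-branch fusion described above can be sketched in pure Python as follows. Several simplifying assumptions apply. The encoder and decoder features are single-channel maps at the same resolution. The layers use ReLU activations and have no bias terms. The weights are random rather than trained. The output is a 2-D offset `(dr, dc)`. The convolution layer applied to the fused vector is modeled as a linear projection, which is what a 1*1 convolution reduces to on a 1*1 feature map. The names `crop_patch` and `VertexRegressor` are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def crop_patch(feature: Matrix, r: int, c: int, size: int) -> List[float]:
    """Flattened size x size patch around (r, c); pixels outside the map are zero-padded."""
    h, w = len(feature), len(feature[0])
    top, left = r - size // 2, c - size // 2
    return [
        feature[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0.0
        for rr in range(top, top + size)
        for cc in range(left, left + size)
    ]


def linear(x: List[float], weights: Matrix) -> List[float]:
    """Fully connected layer without bias: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


class VertexRegressor:
    """Two branches of two FC layers each, summed element-wise, then a linear head."""

    def __init__(self, patch_len: int, hidden: int = 1024, out_dim: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.enc_branch = [init_layer(hidden, patch_len, rng), init_layer(hidden, hidden, rng)]
        self.dec_branch = [init_layer(hidden, patch_len, rng), init_layer(hidden, hidden, rng)]
        self.head = init_layer(out_dim, hidden, rng)  # stands in for the 1x1 convolution

    @staticmethod
    def _branch(x: List[float], layers: List[Matrix]) -> List[float]:
        for layer in layers:
            x = relu(linear(x, layer))
        return x

    def __call__(self, enc_patch: List[float], dec_patch: List[float]) -> List[float]:
        a = self._branch(enc_patch, self.enc_branch)
        b = self._branch(dec_patch, self.dec_branch)
        fused = [x + y for x, y in zip(a, b)]  # fusion vector: element-wise sum of both branches
        return linear(fused, self.head)        # predicted regression vector


feature = [[float(4 * r + c) for c in range(4)] for r in range(4)]
patch = crop_patch(feature, 0, 0, 3)
model = VertexRegressor(patch_len=9, hidden=8)
offset = model(patch, patch)
```

The default `hidden=1024` matches the layer width stated in the text but is slow in pure Python, so the example uses `hidden=8`. A real implementation would use a deep learning framework and multi-channel feature maps.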

Further, the regression unit 202 further includes a regression model optimization subunit not shown in the drawing, and the regression model optimization subunit is configured to optimize the network regression model based on a supervised learning strategy after the construction subunit constructs the network regression model, to obtain an optimized network regression model.

Specifically, the regression model optimization subunit comprises a generation module and an optimization training module, wherein:

the optimization training module is configured to train the network regression model by optimizing, according to the predicted regression vector and the target regression vector, the Smooth L1 loss function used in Fast R-CNN.
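The Smooth L1 loss from Fast R-CNN is quadratic for small residuals and linear for large ones, which makes it less sensitive to outliers than an L2 loss. A minimal Python version applied to a regression vector follows. Summing over the vector components mirrors how Fast R-CNN sums over box coordinates. The two-component `(dr, dc)` vector in the example is an assumption.

```python
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Smooth L1 as defined in Fast R-CNN: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def regression_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of Smooth L1 over the components of the predicted vs. target regression vector."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


print(regression_loss([0.5, 3.0], [0.0, 0.0]))  # 2.625
```

The two branches of `smooth_l1` meet at |x| = 1, where both the values (0.5) and the slopes (±1) agree, so the loss is continuous and differentiable there.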

Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.

In the polygon-based remote sensing image building extraction method and device provided by the embodiments of the invention, each building instance is represented by an adaptive polygon formed by the building vertices. Because the output polygon naturally follows the geometry of the building, building coverage and geometric similarity are improved, which in turn improves the accuracy of building extraction from remote sensing images.

Furthermore, embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements the steps of the method as described above.

In this embodiment, if the integrated modules/units of the polygon-based remote sensing image building extraction device are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may also be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

The electronic device provided by the embodiment of the invention comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the above embodiments of the polygon-based remote sensing image building extraction method, for example S11-S14 shown in Fig. 1. Alternatively, when executing the computer program, the processor may implement the functions of the modules/units in the above embodiments of the polygon-based remote sensing image building extraction device, for example the prediction unit 201, the regression unit 202, the generation unit 203, and the extraction unit 204 shown in Fig. 6.

For example, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which describe how the computer program executes in the polygon-based remote sensing image building extraction device. For example, the computer program may be divided into the prediction unit 201, the regression unit 202, the generation unit 203, and the extraction unit 204.

The electronic device may be a mobile computer, a notebook, a palmtop computer, a mobile phone, or similar equipment. The electronic device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the electronic device in this embodiment may include more or fewer components, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and so on.

The processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the electronic device and connects the various parts of the whole device through various interfaces and lines.

The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the electronic device by running or executing the computer program and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and so on; the data storage area may store data created through use of the handset (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.

Those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A polygon-based remote sensing image building extraction method, characterized by comprising the following steps:

2. The method of claim 1, wherein generating a vector polygon corresponding to a target building from the candidate vertices and the predicted direction map comprises:

3. The method according to claim 2, wherein selecting the polygon with the greatest sum of weights corresponding to the sides in the set of directed graphs as the vector polygon corresponding to the target building comprises:

arranging each edge in the directed graph set in descending weight order;

4. The method of claim 1, further comprising the step of constructing the feature extraction network model prior to predicting the remote sensing image from the pre-constructed feature extraction network model;

5. The method of claim 4, wherein optimizing the feature extraction network based on the supervised learning strategy comprises:

6. The method of claim 4 or 5, wherein the optimizing the feature extraction network based on a supervised learning strategy further comprises:

7. The method of claim 1, wherein performing regression calculation on each predicted vertex in the vertex map using a predetermined network regression model comprises:

8. The method of claim 7, wherein after the constructing the network regression model, the method further comprises:

training the network regression model by optimizing, according to the predicted regression vector and the corresponding target regression vector, the Smooth L1 loss function used in Fast R-CNN.

9. A remote sensing image building extraction device based on polygons, the device comprising:

10. The apparatus of claim 9, wherein the generating unit comprises:

11. The apparatus of claim 10, wherein the selection subunit is specifically configured to: sort the edges in the directed graph in descending order of weight; construct a new directed graph that contains a polygon composed of at least 3 vertices; and select, from the newly constructed directed graph, the polygon whose sides have the largest average weight.

12. The apparatus of claim 9, wherein the apparatus further comprises:

The model training unit specifically comprises:

13. The apparatus according to claim 12, wherein the optimizing subunit is specifically configured to construct a first loss function corresponding to the vertex map according to a confidence that a predicted vertex in the predicted value of the vertex map is a true vertex, and optimize the feature extraction network based on the first loss function;

14. The apparatus of claim 9, wherein the regression unit comprises:

15. The apparatus of claim 14, wherein the regression unit further comprises:

the regression model optimization subunit includes:

16. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-8.

17. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-8 when executing the program.