CN113658203A - Method and device for extracting three-dimensional outline of building and training neural network - Google Patents

Method and device for extracting three-dimensional outline of building and training neural network

Info

Publication number
CN113658203A
Authority
CN
China
Prior art keywords: building, network, determining, initial, segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110963024.7A
Other languages
Chinese (zh)
Inventor
孟令宣 (Meng Lingxuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110963024.7A
Publication of CN113658203A
Legal status: Withdrawn

Classifications

    • G06T 7/13 — Image analysis; Segmentation; Edge detection
    • G06N 3/045 — Neural networks; Architecture; Combinations of networks
    • G06N 3/08 — Neural networks; Learning methods
    • G06T 7/11 — Image analysis; Segmentation; Region-based segmentation
    • G06T 7/12 — Image analysis; Segmentation; Edge-based segmentation
    • G06T 2207/20081 — Indexing scheme for image analysis; Special algorithmic details; Training; Learning
    • G06T 2207/20084 — Indexing scheme for image analysis; Special algorithmic details; Artificial neural networks [ANN]

Abstract

Embodiments of the disclosure provide a method and device for extracting the three-dimensional contour of a building and for training a neural network. A neural network obtains an offset vector prediction result for a building, from which the building's height is determined; the offset vector prediction result then assists the semantic segmentation result in extracting the building's three-dimensional contour. This can improve the accuracy of building three-dimensional contour extraction.

Description

Method and device for extracting three-dimensional outline of building and training neural network
Technical Field
The disclosure relates to the technical field of computer vision, in particular to a method and a device for extracting a three-dimensional outline of a building and training a neural network.
Background
Obtaining the three-dimensional outline of a building from remote sensing images is one of the important requirements of surveying and mapping, and can provide important data support for disaster assessment, environmental supervision, resident management, city planning, and the like. However, for tall buildings, it is difficult for related-art three-dimensional contour extraction methods to accurately extract the building's three-dimensional contour.
Disclosure of Invention
The disclosure provides a method and device for extracting the three-dimensional contour of a building and for training a neural network.
In a first aspect, an embodiment of the present disclosure provides a method for extracting a three-dimensional outline of a building, where the method includes: processing a single-scene image comprising a building through a neural network to obtain a semantic segmentation result and an offset vector prediction result, wherein the semantic segmentation result is used for determining a three-dimensional area on the building, and the offset vector prediction result is used for determining the height of the building; and extracting the three-dimensional contour of the building based on the semantic segmentation result and the offset vector prediction result of the building to obtain the three-dimensional contour of the building.
In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges, and region segmentation results for determining regions on the building; the offset vector prediction results comprise offset vectors of pixel points of the roof area and the side of the building.
In some embodiments, the semantic segmentation result further comprises at least one of: an edge direction segmentation result for determining an edge direction of the building; a base segmentation result for determining a base area of the building.
In some embodiments, the offset vector prediction result further comprises at least one of: an average offset direction angle of the buildings in the single-view image; and offset vectors of pixel points in the base area of the building, where the offset vector of any second pixel point in the base area is the vector pointing from that second pixel point to its corresponding first pixel point in the roof area.
In some embodiments, the semantic segmentation results include a base segmentation result for determining a base region of a building, and a region segmentation result for determining regions on the building, the three-dimensional region including a roof region of the building; the offset vector prediction result comprises an offset vector of a first pixel point, and the first pixel point is a pixel point of the roof area; the neural network obtains the base segmentation result based on the following modes: for each first pixel point, translating the first pixel point based on the offset vector of the first pixel point to obtain a second pixel point corresponding to the first pixel point; and determining the base segmentation result based on the second pixel points corresponding to the first pixel points.
In some embodiments, the neural network comprises a plurality of semantic segmentation networks and a plurality of prediction networks, the plurality of semantic segmentation networks comprising: an edge segmentation network for determining building edges; a region segmentation network for determining regions on the building; an edge direction segmentation network for determining the edge direction of the building; and a base segmentation network for determining the base area of the building; the plurality of prediction networks comprising: a first vector prediction network for determining offset vectors of pixel points of the roof area and side of the building; a second vector prediction network for determining offset vectors of pixel points of the base area of the building; and a direction angle prediction network for determining the average offset direction angle of the buildings in the single-view image.
In some embodiments, the method further comprises: processing a sample image through an initial neural network, wherein the initial neural network comprises a plurality of initial segmentation networks and a plurality of initial prediction networks, the plurality of initial segmentation networks comprise an initial edge segmentation network, an initial region segmentation network, an initial edge direction segmentation network and an initial base segmentation network, and the initial prediction networks comprise an initial first vector prediction network, an initial second vector prediction network and an initial direction angle prediction network; respectively acquiring the segmentation loss of each initial segmentation network and the prediction loss of each initial prediction network; summing the segmentation loss of each initial segmentation network and the prediction loss of each initial prediction network to obtain the loss of the initial neural network; and training the initial neural network based on the loss of the initial neural network to obtain the neural network.
In some embodiments, the method further comprises: determining a template and a target domain based on the semantic segmentation result; determining a translation vector based on the offset vector prediction result, translating the template based on the translation vector, and determining the matching degree between the translated template and the target domain; optimizing the offset vector prediction result based on the matching degree to obtain an optimized offset vector prediction result; the three-dimensional contour extraction of the building based on the semantic segmentation result and the offset vector prediction result of the building comprises the following steps: and extracting the three-dimensional contour of the building based on the semantic segmentation result of the building and the optimized offset vector prediction result.
In some embodiments, the offset vector prediction result comprises an offset vector for each pixel point of the roof region; determining the translation vector based on the offset vector prediction result comprises: determining a length range of the translation vector based on an average length of the offset vectors of the pixel points of the roof region, the average length being within the length range; and determining a direction of the translation vector based on an average direction angle of the offset vectors of the pixel points of the roof region.
In some embodiments, the offset vector prediction result comprises an average offset direction angle of the buildings in the single-view image; determining the direction of the translation vector based on the average direction angle of the offset vectors of the pixel points of the roof region comprises: determining the average offset direction angle of the buildings in the single-view image as the direction of the translation vector when the average length is smaller than a preset length; and determining the average direction angle of the offset vectors of the pixel points of the roof region as the direction of the translation vector when the average length is greater than or equal to the preset length.
In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges; the template is the edge between the roof area and the side of the building, and the target domain is the edge between the base area and the side of the building; and/or the semantic segmentation results include edge segmentation results for determining building edges; the template is the edge between the roof area and the building side, and the target domain is the edge between the building side and the non-building area; and/or the semantic segmentation results include region segmentation results for determining regions on the building; the template is the roof area and the target domain is the base area.
In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges and edge direction segmentation results for determining the edge direction of the building; the method further comprises: performing polygonization processing on the building edge based on the edge direction segmentation result; and the three-dimensional contour extraction of the building based on the semantic segmentation result and the offset vector prediction result comprises: performing three-dimensional contour extraction on the building based on the polygonized semantic segmentation result and the offset vector prediction result.
In some embodiments, the single-view images include a first single-view image of a pre-disaster area to be processed and a second single-view image of the post-disaster area to be processed, and the three-dimensional outlines of the buildings include a first three-dimensional outline of each building in the area to be processed before a disaster and a second three-dimensional outline of each building in the area to be processed after the disaster; the method further comprises the following steps: for each building in the area to be processed, determining the similarity of a first three-dimensional contour and a second three-dimensional contour of the building; acquiring the number of buildings with the similarity lower than a preset similarity threshold in the area to be processed; and determining the disaster degree of the area to be processed based on the number of buildings with the similarity lower than a preset similarity threshold.
In some embodiments, the method further comprises: obtaining the area of the roof region in the three-dimensional profile of the building; and determining that the building is an illegal building when the area of the building's roof region is larger than a preset area threshold.
In a second aspect, an embodiment of the present disclosure provides a method for training a neural network, the method including: processing a sample image through an initial neural network, wherein the initial neural network comprises an initial segmentation network and an initial prediction network, the initial segmentation network is used for obtaining a semantic segmentation result, the semantic segmentation result is used for determining a three-dimensional area on a building, the initial prediction network is used for obtaining an offset vector prediction result, and the offset vector prediction result is used for determining the height of the building; respectively obtaining the loss of the initial segmentation network and the loss of the initial prediction network, and determining the loss of the initial neural network based on the loss of the initial segmentation network and the loss of the initial prediction network; training the initial neural network based on the loss of the initial neural network.
In some embodiments, the semantic segmentation network comprises: an edge segmentation network for determining building edges; and a region segmentation network for determining regions on the building; the prediction network comprises: a first vector prediction network for determining offset vectors of pixel points of the roof area and the side of the building.
In some embodiments, the semantic segmentation network further comprises at least one of: an edge direction segmentation network for determining the edge direction of the building; and a base segmentation network for determining the base area of a building.
In some embodiments, the prediction network further comprises at least one of: a second vector prediction network for determining offset vectors of pixel points of the base area of the building; and a direction angle prediction network for determining the average offset direction angle of the buildings in the single-view image.
In a third aspect, an embodiment of the present disclosure provides an apparatus for extracting the three-dimensional contour of a building, the apparatus comprising: a first processing module configured to process a single-view image including a building through a neural network to obtain a semantic segmentation result and an offset vector prediction result, the semantic segmentation result being used to determine three-dimensional areas on the building and the offset vector prediction result being used to determine the height of the building; and a contour extraction module configured to extract the three-dimensional contour of the building based on the semantic segmentation result and the offset vector prediction result of the building, obtaining the three-dimensional contour of the building.
In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges, and region segmentation results for determining regions on the building; the offset vector prediction results comprise offset vectors of pixel points of the roof area and the side of the building.
In some embodiments, the semantic segmentation result further comprises at least one of: an edge direction segmentation result for determining an edge direction of the building; a base segmentation result for determining a base area of the building.
In some embodiments, the offset vector prediction result further comprises at least one of: an average offset direction angle of the buildings in the single-view image; and offset vectors of pixel points in the base area of the building, where the offset vector of any second pixel point in the base area is the vector pointing from that second pixel point to its corresponding first pixel point in the roof area.
In some embodiments, the semantic segmentation results include a base segmentation result for determining a base region of a building, and a region segmentation result for determining regions on the building, the three-dimensional region including a roof region of the building; the offset vector prediction result comprises an offset vector of a first pixel point, and the first pixel point is a pixel point of the roof area; the neural network obtains the base segmentation result based on the following modes: for each first pixel point, translating the first pixel point based on the offset vector of the first pixel point to obtain a second pixel point corresponding to the first pixel point; and determining the base segmentation result based on the second pixel points corresponding to the first pixel points.
In some embodiments, the neural network comprises a plurality of semantic segmentation networks and a plurality of prediction networks, the plurality of semantic segmentation networks comprising: an edge segmentation network for determining building edges; a region segmentation network for determining regions on the building; an edge direction segmentation network for determining the edge direction of the building; and a base segmentation network for determining the base area of the building; the plurality of prediction networks comprising: a first vector prediction network for determining offset vectors of pixel points of the roof area and side of the building; a second vector prediction network for determining offset vectors of pixel points of the base area of the building; and a direction angle prediction network for determining the average offset direction angle of the buildings in the single-view image.
In some embodiments, the apparatus further comprises: a sample processing module configured to process a sample image through an initial neural network, the initial neural network comprising a plurality of initial segmentation networks and a plurality of initial prediction networks, the plurality of initial segmentation networks comprising an initial edge segmentation network, an initial region segmentation network, an initial edge direction segmentation network, and an initial base segmentation network, and the initial prediction networks comprising an initial first vector prediction network, an initial second vector prediction network, and an initial direction angle prediction network; a loss acquisition module configured to respectively acquire the segmentation loss of each initial segmentation network and the prediction loss of each initial prediction network; a summing module configured to sum the segmentation losses and prediction losses to obtain the loss of the initial neural network; and a training module configured to train the initial neural network based on the loss of the initial neural network to obtain the neural network.
In some embodiments, the apparatus further comprises: a first determining module for determining a template and a target domain based on the semantic segmentation result; the translation module is used for determining a translation vector based on the offset vector prediction result, translating the template based on the translation vector and determining the matching degree between the translated template and the target domain; the optimization module is used for optimizing the offset vector prediction result based on the matching degree to obtain an optimized offset vector prediction result; the contour extraction module is to: and extracting the three-dimensional contour of the building based on the semantic segmentation result of the building and the optimized offset vector prediction result.
In some embodiments, the offset vector prediction result comprises an offset vector for each pixel point of the roof region; the translation module comprises: a first determining unit configured to determine a length range of the translation vector based on an average length of the offset vectors of the pixel points of the roof region, the average length being within the length range; and a second determining unit configured to determine the direction of the translation vector based on an average direction angle of the offset vectors of the pixel points of the roof region.
In some embodiments, the offset vector prediction result comprises an average offset direction angle of the buildings in the single-view image; the second determining unit is configured to: determine the average offset direction angle of the buildings in the single-view image as the direction of the translation vector when the average length is smaller than a preset length; and determine the average direction angle of the offset vectors of the pixel points of the roof region as the direction of the translation vector when the average length is greater than or equal to the preset length.
In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges; the template is the edge between the roof area and the side of the building, and the target domain is the edge between the base area and the side of the building; and/or the semantic segmentation results include edge segmentation results for determining building edges; the template is the edge between the roof area and the building side, and the target domain is the edge between the building side and the non-building area; and/or the semantic segmentation results include region segmentation results for determining regions on the building; the template is the roof area and the target domain is the base area.
In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges and edge direction segmentation results for determining the edge direction of the building; the device further comprises: a polygonization processing module configured to perform polygonization processing on the building edge based on the edge direction segmentation result; and the contour extraction module is configured to: perform three-dimensional contour extraction on the building based on the polygonized semantic segmentation result and the offset vector prediction result.
In some embodiments, the single-view images include a first single-view image of a pre-disaster area to be processed and a second single-view image of the post-disaster area to be processed, and the three-dimensional outlines of the buildings include a first three-dimensional outline of each building in the area to be processed before a disaster and a second three-dimensional outline of each building in the area to be processed after the disaster; the device further comprises: the similarity determining module is used for determining the similarity of a first three-dimensional contour and a second three-dimensional contour of each building in the to-be-processed area; the quantity obtaining module is used for obtaining the quantity of the buildings with the similarity lower than a preset similarity threshold in the area to be processed; and the disaster degree determining module is used for determining the disaster degree of the area to be processed based on the number of the buildings with the similarity lower than a preset similarity threshold.
In some embodiments, the apparatus further comprises: the area acquisition module is used for acquiring the area of a roof region in the three-dimensional outline of the building; and the illegal building determining module is used for determining that the building is the illegal building under the condition that the area of the roof area of the building is larger than a preset area threshold value.
In a fourth aspect, an embodiment of the present disclosure provides an apparatus for training a neural network, the apparatus comprising: a processing module configured to process a sample image through an initial neural network, the initial neural network comprising an initial segmentation network and an initial prediction network, the initial segmentation network being used to obtain a semantic segmentation result that determines three-dimensional areas on a building, and the initial prediction network being used to obtain an offset vector prediction result that determines the height of the building; a loss obtaining module configured to respectively obtain the loss of the initial segmentation network and the loss of the initial prediction network, and to determine the loss of the initial neural network based on them; and a training module configured to train the initial neural network based on the loss of the initial neural network.
In some embodiments, the semantic segmentation network comprises: an edge segmentation network for determining building edges; and a region segmentation network for determining regions on the building; the prediction network comprises: a first vector prediction network for determining offset vectors of pixel points of the roof area and the side of the building.
In some embodiments, the semantic segmentation network further comprises at least one of: an edge direction segmentation network for determining the edge direction of the building; and a base segmentation network for determining the base area of a building.
In some embodiments, the prediction network further comprises at least one of: a second vector prediction network for determining offset vectors of pixel points of the base area of the building; and a direction angle prediction network for determining the average offset direction angle of the buildings in the single-view image.
In a fifth aspect, the embodiments of the present disclosure provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to any of the embodiments.
In a sixth aspect, the embodiments of the present disclosure provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any of the embodiments when executing the program.
According to the embodiments of the disclosure, the neural network obtains an offset vector prediction result for the building, from which the building's height is determined; the offset vector prediction result then assists the semantic segmentation result in extracting the building's three-dimensional contour, which can improve the accuracy of the extraction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a building three-dimensional contour extraction method according to an embodiment of the present disclosure.
Fig. 2 is a schematic illustration of a building edge and building area of an embodiment of the disclosure.
Fig. 3 is a schematic diagram of an offset vector of an embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of a neural network according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a template matching process of an embodiment of the present disclosure.
Fig. 6 is a flowchart of a training method of a neural network of an embodiment of the present disclosure.
Fig. 7 is a block diagram of a building three-dimensional contour extraction device according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of a training apparatus of a neural network of an embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining", depending on the context.
In order to make the technical solutions in the embodiments of the present disclosure better understood and make the above objects, features and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
Three-dimensional contour extraction of a building generally refers to extracting three-dimensional areas of the building, such as the roof area, base area, and sides, from a remote sensing image. Extracting the three-dimensional outline of a building is one of the important requirements of surveying and mapping, and can provide important data support for disaster assessment, environmental supervision, resident management, urban planning, and the like.
At present, there are many methods for identifying buildings in remote sensing images or segmenting such images at the pixel level; for example, extracting buildings using an object detection model such as the Faster Region-based Convolutional Neural Network (Faster R-CNN), a semantic segmentation model such as U-Net, or an instance segmentation model such as the Mask Region-based Convolutional Neural Network (Mask R-CNN). Although these methods can fairly accurately segment a whole building from the image, they cannot further accurately extract three-dimensional areas such as the roof area, base area, and sides of the building. For short buildings, whose base and roof areas almost overlap in the image, these methods can obtain relatively accurate base contour predictions. For high-rise buildings, however, the base and roof areas in a non-orthographic remote sensing image are significantly offset, the base area is severely occluded, and the roof area and sides have a certain similarity in the image, so it is difficult for these methods to accurately extract the three-dimensional contour of a high-rise building.
Based on this, an embodiment of the disclosure provides a method for extracting the three-dimensional contour of a building. As shown in Fig. 1, the method includes:
step 101: processing a single-scene image comprising a building through a neural network to obtain a semantic segmentation result and an offset vector prediction result, wherein the semantic segmentation result is used for determining a three-dimensional area on the building, and the offset vector prediction result is used for determining the height of the building;
step 102: and extracting the three-dimensional contour of the building based on the semantic segmentation result and the offset vector prediction result of the building to obtain the three-dimensional contour of the building.
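A minimal sketch of this two-step flow; model and extract_3d_contours are placeholders for the network and post-processing detailed in the following paragraphs, not components defined by the disclosure:

```python
import numpy as np

def extract_building_contours(single_view_image: np.ndarray, model, extract_3d_contours):
    # Step 101: the network yields semantic segmentation results and offset
    # vector prediction results for the single-view image.
    seg_results, offset_results = model(single_view_image)
    # Step 102: combine both results to extract each building's 3D contour.
    return extract_3d_contours(seg_results, offset_results)
```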
In step 101, a single-view image is a remote sensing image of the area where building contour extraction is required, acquired from a single viewing angle. Traditional building three-dimensional contour extraction generally captures images of the same area from multiple viewing angles to obtain an image group, and extracts the three-dimensional contour based on that group; the processing cost and complexity of this approach are high. The embodiments of the disclosure instead extract the three-dimensional contour from a single-view image, that is, only one remote sensing image needs to be acquired for each area where contour extraction is required. To compensate for the information lost by using a single view, the embodiments combine the semantic segmentation result with the offset vector prediction result, improving the accuracy of the extraction result obtained from the single-view image.
The semantic segmentation result can be obtained by performing semantic segmentation on the single-view image. In some embodiments, the semantic segmentation results include edge segmentation results for determining building edges, and region segmentation results for determining regions on the building. Segmenting the building edges and the regions on the building can improve the accuracy of three-dimensional contour extraction. Here, an edge is the boundary between two different semantics. As shown in Fig. 2, in the single-view image 200 the edges may include at least one of: the edges AB, BC between the roof area 201 of the building and the non-building areas of the image; the edges CD, AD between the roof area 201 and the sides 202 of the building; the left and right edges AF, GD, and CH of the sides 202; and the bottom edges of the sides 202, i.e., the edges FG, GH between the sides 202 and the base area 203 of the building; the remaining regions of the image, besides these edges, form a background (non-edge) category. The roof area 201 is bounded by ABCD, the sides 202 by ADGF and DCHG, and the base area 203 by EFGH. Dotted lines indicate occluded edges that are invisible in the image. A building region refers to part of the exterior surface of the building; the regions can generally be divided into the roof area, the side areas, and the other areas besides these two kinds of building areas.
In some embodiments, the semantic segmentation results further comprise edge direction segmentation results for determining the edge direction of the building. The edge direction prediction result may include N edge direction categories plus a background category, N+1 categories in total, where N is a positive integer. Each edge direction category corresponds to an angle range: dividing 0-360° into N intervals, together with the background category for the regions other than the edges, yields the N+1 categories. By predicting the edge direction, the building edge can be processed into a polygon with a regular shape (e.g., a quadrangle).
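A minimal sketch of this angle binning, assuming N direction bins over 0-360° with label 0 reserved for background; the label layout is an illustrative assumption, not fixed by the text:

```python
def edge_direction_category(angle_deg: float, n_bins: int = 8) -> int:
    """Map an edge direction angle to one of n_bins categories (1..n_bins);
    category 0 is reserved for the background class."""
    bin_width = 360.0 / n_bins
    return int((angle_deg % 360.0) // bin_width) + 1
```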
In some embodiments, the semantic segmentation results further comprise base segmentation results for determining a base area of the building. The base segmentation result may include two categories, namely a category that belongs to the building base area and a category that does not belong to the building base area. By obtaining the base segmentation result, the accuracy of three-dimensional contour extraction can be further improved.
During base segmentation, for each pixel point in the roof area (called a first pixel point), the first pixel point can be translated based on its offset vector to obtain a corresponding second pixel point; the base segmentation result is then determined from the second pixel points corresponding to the first pixel points. The base area is often invisible in the image because it is occluded; by directly translating the roof area to obtain the base area in this way, the accuracy of base segmentation can be improved.
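A minimal sketch of this translation, assuming roof_mask is an (H, W) boolean array and offset_field an (H, W, 2) array of per-pixel (dx, dy) offsets in array coordinates (x = column, y = row); since the disclosure's y-axis points up, a sign flip on dy may be needed depending on convention:

```python
import numpy as np

def base_mask_from_roof(roof_mask: np.ndarray, offset_field: np.ndarray) -> np.ndarray:
    h, w = roof_mask.shape
    base_mask = np.zeros_like(roof_mask)
    rows, cols = np.nonzero(roof_mask)                 # first pixel points
    dx = offset_field[rows, cols, 0].round().astype(int)
    dy = offset_field[rows, cols, 1].round().astype(int)
    # Base pixel = roof pixel minus offset vector, since the offset points
    # from the base area toward the roof area.
    cols2 = np.clip(cols - dx, 0, w - 1)
    rows2 = np.clip(rows - dy, 0, h - 1)               # second pixel points
    base_mask[rows2, cols2] = True
    return base_mask
```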
The offset vector prediction results comprise offset vectors of pixel points of the roof area and the side of the building. The building offset is defined at the pixel level. Referring to Fig. 3, the single-view image 300 includes a building 301. For a pixel point in the roof area or on the side, taking pixel point P1 on side 3011 as an example, the offset vector of P1 is the vector pointing from pixel point P2 to P1, where P2 is the projection of P1 onto the base area. The offset vector of the background area outside the building is 0. For any building, the directions of the offset vectors of all pixel points on the building are consistent; the lengths of the offset vectors increase gradually from the base area, and the offset vectors of pixel points in the roof area are the longest. An offset vector can be decomposed into an offset component in the x direction and an offset component in the y direction, measured in pixels. The x direction is orthogonal to the y direction; as shown in Fig. 3, the positive x and y directions are the rightward and upward directions of the image, respectively. Both offset components of the background area are 0. Offset vector prediction is a pixel-level regression over two channels, representing the x and y offset components, respectively.
In some embodiments, the offset vector prediction result further comprises the average offset direction angle of the buildings in the single-view image. Since the offset direction angles of most buildings in a single image are basically the same, building offset direction angle prediction is an image-level prediction task, labeled with the average of the offset direction angles of all buildings in the image. For short buildings, the direction of the offset vector is generally predicted inaccurately; by obtaining the average offset direction angle of the buildings, the direction of a short building's offset vector can be optimized based on that average, making its direction prediction more accurate. Here, a short building is one whose offset vector length is less than a preset length threshold; a building whose offset vector length is greater than or equal to the threshold may be called a high-rise building.
In some embodiments, the offset vector prediction result further includes the offset vectors of pixel points of the base area of the building, where the offset vector of each second pixel point of the base area is defined as the vector from that second pixel point to its corresponding first pixel point in the roof area. For example, if pixel point P3 (a first pixel point) in the roof area in Fig. 3 corresponds to pixel point P2 (a second pixel point) in the base area, i.e., the projection of P3 onto the base area is P2, then the offset vector of P2 equals the offset vector of P3. Offset vector prediction for base-area pixel points is likewise a two-channel pixel-level regression, representing the x and y offset components, respectively. Adding this offset vector prediction task for base-area pixel points can improve the overall effect of the building's offset vector prediction.
The neural network can obtain the semantic segmentation results and offset vector prediction results in a multi-task learning manner. In some embodiments, the neural network is configured to perform at least one semantic segmentation task to obtain semantic segmentation results, and at least one offset vector prediction task to obtain offset vector prediction results. Different tasks may be performed by different sub-networks of the neural network. In some embodiments, the sub-networks comprise a plurality of semantic segmentation networks and a plurality of prediction networks. The semantic segmentation networks include: an edge segmentation network N_edge for determining building edges; a region segmentation network N_roof for determining regions on the building; an edge direction segmentation network N_orient for determining the edge direction of the building; and a base segmentation network N_foot for determining the base area of the building. The prediction networks include: a first vector prediction network N_field_a for determining the offset vectors of pixel points of the roof area and side of the building; a second vector prediction network N_field_b for determining the offset vectors of pixel points of the base area of the building; and a direction angle prediction network N_angv for determining the average offset direction angle of the buildings in the single-view image. Fig. 4 is a schematic structural diagram of a neural network according to an embodiment of the present disclosure.
The neural network in this embodiment learns the 4 semantic segmentation tasks and the 3 offset tasks simultaneously, and the tasks can promote one another, so that key elements of a building such as the roof area, sides, and base area are learned better. In different application scenarios, at least one semantic segmentation task and at least one prediction task can be executed, and the number and type of sub-networks in the neural network can be adjusted according to the tasks executed.
In some embodiments, the neural network may further include a feature extraction network for extracting features from the input single-view image. Optionally, the feature extraction network may be a convolutional neural network, shared by all of the semantic segmentation networks and prediction networks described above. Each semantic segmentation network and prediction network may include a head structure (head) related to the task that network executes; the head performs further feature extraction on the features output by the shared feature extraction network, extracting task-relevant features that are passed to the network's subsequent layers, so that the network outputs the corresponding processing result.
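A minimal sketch of such a shared backbone with per-task heads, written with PyTorch; the layer shapes, channel counts, and class counts (e.g., 8 direction bins, 36 angle classes) are illustrative assumptions, not values from the disclosure:

```python
import torch.nn as nn

class BuildingNet(nn.Module):
    def __init__(self, feat_ch: int = 64, n_dir: int = 9):
        super().__init__()
        self.backbone = nn.Sequential(            # shared feature extraction network
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        def head(out_ch):                          # one task-specific head per sub-network
            return nn.Sequential(
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, out_ch, 1),
            )
        self.edge = head(2)        # N_edge: edge / non-edge (illustrative class layout)
        self.roof = head(3)        # N_roof: roof, side, other areas
        self.orient = head(n_dir)  # N_orient: N direction bins + background
        self.foot = head(2)        # N_foot: base / non-base
        self.field_a = head(2)     # N_field_a: (dx, dy) for roof and side pixels
        self.field_b = head(2)     # N_field_b: (dx, dy) for base pixels
        self.angv = nn.Sequential( # N_angv: image-level direction angle class
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 36))

    def forward(self, x):
        f = self.backbone(x)
        return {k: getattr(self, k)(f) for k in
                ("edge", "roof", "orient", "foot", "field_a", "field_b", "angv")}
```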
The neural network may be obtained by training an initial neural network. Specifically, a sample image is processed by an initial neural network comprising a plurality of initial segmentation networks and a plurality of initial prediction networks. The initial segmentation networks comprise an initial edge segmentation network N_edge_o, an initial region segmentation network N_roof_o, an initial edge direction segmentation network N_orient_o, and an initial base segmentation network N_foot_o; the initial prediction networks comprise an initial first vector prediction network N_field_ao, an initial second vector prediction network N_field_bo, and an initial direction angle prediction network N_angv_o. The segmentation loss of each initial segmentation network and the prediction loss of each initial prediction network may be obtained respectively; these losses are summed to obtain the loss of the initial neural network, and the initial neural network is trained based on this loss to obtain the neural network.
In some embodiments, the 4 semantic segmentation tasks described above may all use cross-entropy loss. L_orient, L_edge, L_roof, and L_foot denote the losses of the edge direction segmentation network, the edge segmentation network, the region segmentation network, and the base segmentation network, respectively. The building offset direction angle prediction task is a classification task and also uses cross-entropy loss. Each of these losses has the form shown in formula (1):

$L_{seg} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C} y_{i,c}\log p(y_{i,c})$ (1)

where, when the 4 semantic segmentation tasks all use cross-entropy loss, L_seg stands for any of L_orient, L_edge, L_roof, and L_foot; n denotes the total number of pixel points in the single-view image; C denotes the total number of classes of the classification task (for example, when L_seg stands for L_orient, C equals N+1, i.e., the N edge direction categories plus the background category; when L_seg stands for L_roof, C equals 3, i.e., the roof area, the side, and the other areas besides these two building areas); and y_{i,c} and p(y_{i,c}) denote, respectively, the binary indicator and the predicted probability for pixel point i belonging to category c. When pixel point i belongs to category c, y_{i,c} is 1; otherwise y_{i,c} is 0.
The total loss L_sem of the four semantic segmentation tasks is shown in formula (2):

$L_{sem} = L_{orient} + \alpha_1 L_{roof} + \alpha_2 L_{foot} + \alpha_3 L_{edge}$ (2)

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are weights.
Building offset vector prediction and offset prediction at the base area are regression tasks and use mean square error (MSE) loss, as shown in formula (3). L_angv, L_field_a, and L_field_b denote the losses of the direction angle prediction network, the first vector prediction network, and the second vector prediction network, respectively.

$L_{field} = \frac{1}{n}\sum_{i=1}^{n}\left\|\hat{v}_i - v_i\right\|_2^2$ (3)

where $\hat{v}_i$ denotes the predicted offset vector of pixel point i, $v_i$ denotes the actual offset vector, n denotes the total number of pixel points in the single-view image, L_field stands for either L_field_a or L_field_b, and $\|\cdot\|_2$ denotes the two-norm.
The loss L_angv of the direction angle prediction task is shown in formula (4):

$L_{angv} = -\sum_{k=1}^{K} y_k \log p(y_k)$ (4)

where K denotes the total number of direction angle categories; y_k is a binary indicator; and p(y_k) denotes the probability that the direction angle corresponding to the single-view image belongs to category k. When the direction angle category corresponding to the single-view image is the same as the true direction category of the image, y_k is 1; otherwise y_k is 0.
The total loss L_off of the prediction tasks is shown in formula (5):

$L_{off} = L_{angv} + L_{field\_a} + L_{field\_b}$ (5)
the total loss of the neural network is the sum of the total loss of the semantic segmentation task and the total loss of the prediction task, as shown in formula (6):
L=Lsem+Loff (6)。
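A minimal sketch of formulas (1)-(6) as a training loss, assuming head outputs shaped like those of the BuildingNet sketch above and matching ground-truth tensors; the default weight values are illustrative assumptions:

```python
import torch.nn.functional as F

def total_loss(out, gt, a1=1.0, a2=1.0, a3=1.0):
    # Semantic segmentation tasks: cross-entropy, formulas (1) and (2).
    l_sem = (F.cross_entropy(out["orient"], gt["orient"])
             + a1 * F.cross_entropy(out["roof"], gt["roof"])
             + a2 * F.cross_entropy(out["foot"], gt["foot"])
             + a3 * F.cross_entropy(out["edge"], gt["edge"]))
    # Prediction tasks: MSE over the two offset fields, formula (3), plus
    # cross-entropy for the image-level direction angle class, formulas (4)-(5).
    l_off = (F.mse_loss(out["field_a"], gt["field_a"])
             + F.mse_loss(out["field_b"], gt["field_b"])
             + F.cross_entropy(out["angv"], gt["angv"]))
    return l_sem + l_off                            # formula (6)
```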
the offset vector between the base area and the roof area is obtained directly from the prediction result of the building offset vector, and the accuracy is not high. Considering that the present disclosure predicts the semantic edge of a building, the edge of the roof area and sides of a building, and the base area of a building at the same time, the prediction results of different tasks can be integrated, further optimizing the offset vector through post-processing. The post-processing method is mainly template matching.
When template matching is carried out, a template and a target domain can be determined based on the semantic segmentation result; determining a translation vector based on the offset vector prediction result, translating the template based on the translation vector, and determining the matching degree between the translated template and the target domain; and optimizing the offset vector prediction result based on the matching degree to obtain the optimized offset vector prediction result. And then, extracting the three-dimensional contour of the building based on the semantic segmentation result of the building and the optimized offset vector prediction result.
For each building, V denotes the translation vector used during template matching. The angle and magnitude interval of V are determined by the building offset direction angle prediction and the building offset vector prediction. In some embodiments, the length range of the translation vector may be determined based on the average length len of the offset vectors of the pixel points of the roof area, such that the average length falls within the range. For example, a ratio pair (r_1, r_2) may be predefined; the length range of the translation vector is then [r_1·len, r_2·len].
The direction of the translation vector may be determined based on the average direction angle of the offset vectors of the pixel points of the roof area. Further, when the average length is smaller than a preset length, the average offset direction angle of the buildings in the single-view image is determined as the direction of the translation vector; when the average length is greater than or equal to the preset length, the average direction angle of the offset vectors of the pixel points of the roof area is determined as the direction of the translation vector.
In this embodiment, the translation vector V used in template matching is determined from both the offset vector prediction result and the direction angle prediction result. Because the direction prediction for a short building (a building whose average offset length is smaller than the preset length) is generally inaccurate, using the whole-image average direction angle as the direction of the translation vector improves the accuracy of template matching, and hence of the extracted three-dimensional contour. For a tall building (average length greater than or equal to the preset length), the predicted direction is accurate, so it can be used directly as the direction of the translation vector.
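A minimal sketch of this direction-selection rule; all names are illustrative, and len_thresh stands for the preset length from the text:

```python
import numpy as np

def translation_direction(roof_offsets: np.ndarray, image_avg_angle: float,
                          len_thresh: float) -> float:
    """roof_offsets: (K, 2) array of (dx, dy) offsets of roof pixels."""
    avg_len = np.linalg.norm(roof_offsets, axis=1).mean()
    if avg_len < len_thresh:            # short building: trust the image-level angle
        return image_avg_angle
    dx, dy = roof_offsets.mean(axis=0)  # tall building: trust the per-pixel field
    return float(np.degrees(np.arctan2(dy, dx)))
```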
In the template matching process, the template is moved along the angle of V, with the shift length increasing gradually from r_1·len to r_2·len. Each time the template is moved by one pixel, the matching degree between the template and the target domain is computed; the shift with the maximum matching degree is the final offset. In some embodiments, the matching degree may be computed as the Intersection over Union (IoU) of the two. Optionally, template matching works best with r_1 = 0.7 and r_2 = 1.5.
In some embodiments, referring to fig. 5, the template matching task includes at least one of the following 3 sub-matching tasks:
(1) Edge matching task 1: since the roof-side edge and the side-bottom edge have similar shapes, the template for this task is the edge between the roof area and the building side, and the target domain is the edge between the base area and the building side. Here, the roof-side edge is the edge between the roof area and the side, and the side-bottom edge is the edge between the side and the base area.
(2) Edge matching task 2: considering that the roof side edge is similar in shape to the side background edge, the template for this task is the edge between the roof area and the building side, and the target domain is the edge between the building side and the non-building area. The side background edge is an edge between the side and a background area of the single-scene image except the building.
(3) Roof-base matching tasks: in view of the similarity between the roof area and the base area, the template for this task is the roof area and the target domain is the base area.
When all 3 sub-matching tasks are used, each computes an IoU at every shift, and the total IoU is the sum of the 3 IoUs. The shift with the largest total IoU gives the final optimized offset.
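A minimal sketch of the matching loop, assuming binary masks for each (template, target domain) pair, e.g. the three sub-tasks of Fig. 5; the one-pixel stepping, the wrap-around shifting via np.roll, and the screen-coordinate handling of the angle are simplifying assumptions:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_offset(pairs, angle_deg, avg_len, r1=0.7, r2=1.5):
    """pairs: list of (template_mask, target_mask); returns the best (dx, dy)."""
    ux, uy = np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))
    best, best_shift = -1.0, (0, 0)
    for step in range(int(r1 * avg_len), int(r2 * avg_len) + 1):  # 1-pixel steps
        dx, dy = int(round(ux * step)), int(round(uy * step))
        total = sum(iou(np.roll(np.roll(t, dy, 0), dx, 1), tgt)
                    for t, tgt in pairs)           # total IoU over the sub-tasks
        if total > best:
            best, best_shift = total, (dx, dy)
    return best_shift
```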
In some embodiments, the building edge may be further subjected to polygonization processing based on the edge direction segmentation result, and the building may be subjected to three-dimensional contour extraction based on the polygonized semantic segmentation result and the offset vector prediction result.
Firstly, the region segmentation result of the building can be converted into the edge contour of the roof area, and the coordinates of each pixel point on that contour determined by dense sampling. For each pixel point on the edge contour of the roof area, if the difference between its edge direction category and that of its adjacent pixel point is greater than a given difference threshold, the pixel point is kept as a vertex; otherwise it is deleted. Then, a graph-neural-network-based vertex correction method is used to predict the deviation between each vertex and its corresponding true vertex, further optimizing the polygon vertices and yielding the regularized polygon of the roof area.
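A minimal sketch of the vertex-selection step, assuming an ordered, closed contour with a per-point edge-direction category; the neighbour choice and the threshold default are illustrative assumptions, and the graph-neural-network vertex correction is not sketched:

```python
import numpy as np

def select_vertices(contour_pts: np.ndarray, dir_cats: np.ndarray,
                    diff_thresh: int = 0) -> np.ndarray:
    """contour_pts: (M, 2) ordered contour coords; dir_cats: (M,) categories."""
    keep = []
    m = len(contour_pts)
    for i in range(m):
        prev_cat = dir_cats[(i - 1) % m]      # neighbour on the closed contour
        if abs(int(dir_cats[i]) - int(prev_cat)) > diff_thresh:
            keep.append(contour_pts[i])       # direction changes here: a vertex
    return np.asarray(keep)
```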
The base region can then be determined from the optimized offset and the regularized polygon of the roof region, yielding the regularized three-dimensional contour of the whole building. Specifically, the coordinates of a pixel point in the base region are obtained by subtracting the offset vector from the coordinates of the corresponding pixel point in the roof region.
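As a one-line illustration of that subtraction (the vertex list and the optimized offset are the assumed outputs of the previous steps):

```python
def base_polygon(roof_polygon, offset):
    """Each base vertex is the corresponding roof vertex minus the optimized offset."""
    dy, dx = offset
    return [(y - dy, x - dx) for (y, x) in roof_polygon]
```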
Traditional segmentation or detection methods can obtain only the segmentation result of the whole building, or the three-dimensional contour of a short building. The method of the embodiments of the present disclosure accurately obtains the three-dimensional contour of both short and tall buildings, with robustness and accuracy exceeding those of traditional methods.
In addition, some related techniques acquire the three-dimensional information of buildings from additional data sources such as lidar data and digital elevation models. Such data are expensive to acquire, and the locations and times at which they can be collected are limited, so the application scenarios of those methods are narrow. The method of the embodiments of the present disclosure obtains the three-dimensional contour of a building from a monoscopic remote sensing image alone, without lidar or digital elevation model data, at low cost and with a wide range of applications.
In some application scenarios, a first monoscopic image of a to-be-processed area before a disaster and a second monoscopic image of the same area after the disaster may be acquired. For each building in the area, the method of any of the foregoing embodiments may be applied to the first and second monoscopic images respectively, yielding a first three-dimensional contour of the building before the disaster and a second three-dimensional contour after it. The similarity between the first and second three-dimensional contours of each building can then be determined, the number of buildings in the area whose similarity is lower than a preset similarity threshold counted, and the disaster degree of the area determined based on that number.
For example, when the number of buildings whose similarity is lower than the preset similarity threshold exceeds a preset number threshold, the disaster degree of the area is determined to be a first level; when the number of buildings whose similarity is not lower than the preset similarity threshold exceeds the preset number threshold, the disaster degree is determined to be a second level, the first level being higher, i.e., corresponding to more serious damage, than the second. Because the three-dimensional contour of the embodiments of the present disclosure distinguishes the different regions of a building, such as the roof region and the sides, the similarity between contours, and hence the disaster degree, can be determined more accurately.
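A minimal sketch of this grading flow, assuming the contours are rendered as binary masks and using mask IoU as the similarity measure; the function names and threshold values are illustrative, not values from the disclosure:

```python
import numpy as np

def contour_similarity(pre_mask, post_mask):
    """Illustrative similarity: IoU of the pre- and post-disaster contour masks."""
    union = np.logical_or(pre_mask, post_mask).sum()
    return np.logical_and(pre_mask, post_mask).sum() / union if union > 0 else 0.0

def disaster_level(pre_masks, post_masks, sim_threshold=0.8, count_threshold=10):
    """Grade the area: first level (more severe) when many buildings changed."""
    damaged = sum(1 for pre, post in zip(pre_masks, post_masks)
                  if contour_similarity(pre, post) < sim_threshold)
    return 1 if damaged > count_threshold else 2
```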
In other application scenarios, the area of the roof region in the three-dimensional contour of a building can be obtained, and the building is determined to be a violation building when the area of its roof region is larger than a preset area threshold. Because the three-dimensional contour of the embodiments of the present disclosure distinguishes the different regions of a building, such as the roof region and the sides, cases where a side is misjudged as part of the roof region and the area is therefore measured inaccurately are reduced, improving the accuracy of identifying violation buildings.
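As an illustration, the check can be run directly on the regularized roof polygon via the shoelace formula; the polygon input and the threshold value are assumptions for the sketch:

```python
def is_violation(roof_polygon, area_threshold=500.0):
    """Return True when the roof polygon's area exceeds the threshold
    (shoelace formula over the polygon's (x, y) vertices)."""
    area = 0.0
    n = len(roof_polygon)
    for i in range(n):
        x1, y1 = roof_polygon[i]
        x2, y2 = roof_polygon[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0 > area_threshold
```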
Besides the above application scenarios, extracting the three-dimensional contours of buildings also yields their distribution and storey heights, providing data support for estimating the resident population and for urban planning. The extracted three-dimensional contours can be used in further scenarios that are not enumerated in this disclosure.
As shown in fig. 6, an embodiment of the present disclosure further provides a training method of a neural network, where the method includes:
step 601: processing a sample image through an initial neural network, wherein the initial neural network comprises an initial segmentation network and an initial prediction network, the initial segmentation network is used for obtaining a semantic segmentation result for determining a three-dimensional region on a building, and the initial prediction network is used for obtaining an offset vector prediction result for determining the height of the building;
step 602: respectively obtaining the loss of the initial segmentation network and the loss of the initial prediction network, and determining the loss of the initial neural network based on the loss of the initial segmentation network and the loss of the initial prediction network;
step 603: training the initial neural network based on the loss of the initial neural network.
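A hedged PyTorch-style sketch of steps 601 to 603: compute the two branch losses, sum them into the loss of the initial neural network, and back-propagate. The module names, loss choices, and tensor shapes are assumptions for illustration, not the disclosure's reference implementation.

```python
import torch
import torch.nn.functional as F

def train_step(seg_net, pred_net, optimizer, image, seg_target, vec_target):
    seg_logits = seg_net(image)   # initial segmentation branch: per-pixel class logits
    vec_pred = pred_net(image)    # initial prediction branch: per-pixel offset vectors
    seg_loss = F.cross_entropy(seg_logits, seg_target)  # loss of the segmentation network
    vec_loss = F.l1_loss(vec_pred, vec_target)          # loss of the prediction network
    loss = seg_loss + vec_loss                          # loss of the initial neural network
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```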
In some embodiments, the semantic segmentation network comprises: an edge segmentation network for determining building edges; and a region segmentation network for determining regions on the building. The prediction network comprises: a first vector prediction network for determining offset vectors for pixel points of the roof region and sides of the building.
In some embodiments, the semantic segmentation network further comprises at least one of: an edge direction segmentation network for determining an edge direction of the building; and a base segmentation network for determining a base region of the building.
In some embodiments, the prediction network further comprises at least one of: a second vector prediction network for determining offset vectors for pixel points of the base region of the building; and a direction angle prediction network for determining an average offset direction angle of the buildings in the monoscopic image.
The neural network obtained by this training method can be used, in the building three-dimensional contour extraction method of any of the foregoing embodiments, to process a monoscopic image including a building and obtain the semantic segmentation result and the offset vector prediction result. The details of the training method are described in the embodiments of the building three-dimensional contour extraction method and are not repeated here.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict execution order or any limitation on implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
As shown in fig. 7, an embodiment of the present disclosure further provides a building three-dimensional contour extraction apparatus, where the apparatus includes:
a first processing module 701, configured to process a monoscopic image including a building through a neural network to obtain a semantic segmentation result and an offset vector prediction result, where the semantic segmentation result is used to determine a three-dimensional area on the building, and the offset vector prediction result is used to determine a height of the building;
and the contour extraction module 702 is configured to perform three-dimensional contour extraction on the building based on the semantic segmentation result and the offset vector prediction result of the building, so as to obtain a three-dimensional contour of the building.
As shown in fig. 8, an embodiment of the present disclosure further provides a training apparatus for a neural network, where the apparatus includes:
a second processing module 801, configured to process a sample image through an initial neural network, where the initial neural network includes an initial segmentation network and an initial prediction network, the initial segmentation network is configured to obtain a semantic segmentation result for determining a three-dimensional region on a building, and the initial prediction network is configured to obtain an offset vector prediction result for determining the height of the building;
a loss obtaining module 802, configured to obtain a loss of the initial segmented network and a loss of the initial predicted network, and determine a loss of the initial neural network based on the loss of the initial segmented network and the loss of the initial predicted network;
a training module 803, configured to train the initial neural network based on the loss of the initial neural network.
In some embodiments, the functions of, or the modules included in, the apparatuses provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation, refer to the description of those method embodiments, which is not repeated here for brevity.
Embodiments of the present specification also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any of the foregoing embodiments when executing the program.
Fig. 9 is a schematic diagram of a more specific hardware structure of a computer device according to an embodiment of the present disclosure. The device may include a processor 901, a memory 902, an input/output interface 903, a communication interface 904, and a bus 905, where the processor 901, the memory 902, the input/output interface 903, and the communication interface 904 are communicatively connected to one another within the device via the bus 905.
The processor 901 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification. The processor 901 may further include a graphics card, such as an NVIDIA Titan X or a 1080 Ti.
The Memory 902 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 902 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 902 and called by the processor 901 for execution.
The input/output interface 903 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 904 is used for connecting a communication module (not shown in the figure) to realize communication interaction between the device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 905 includes a pathway to transfer information between various components of the device, such as processor 901, memory 902, input/output interface 903, and communication interface 904.
It should be noted that although the above-mentioned device only shows the processor 901, the memory 902, the input/output interface 903, the communication interface 904 and the bus 905, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method of any of the foregoing embodiments.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the embodiments of the present specification may, in essence or in part, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in parts of the embodiments, of the present specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, and the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more software and/or hardware when implementing the embodiments of the present disclosure. And part or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing describes only specific embodiments of the present disclosure. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principles of the embodiments of the present disclosure, and such modifications and refinements shall also fall within the protection scope of the embodiments of the present disclosure.

Claims (20)

1. A method for extracting a three-dimensional outline of a building, which is characterized by comprising the following steps:
processing a monoscopic image comprising a building through a neural network to obtain a semantic segmentation result and an offset vector prediction result, wherein the semantic segmentation result is used for determining a three-dimensional area on the building, and the offset vector prediction result is used for determining the height of the building;
and extracting the three-dimensional contour of the building based on the semantic segmentation result and the offset vector prediction result of the building to obtain the three-dimensional contour of the building.
2. The method of claim 1, wherein the semantic segmentation results comprise edge segmentation results for determining building edges and region segmentation results for determining regions on a building;
the offset vector prediction results comprise offset vectors of pixel points of the roof area and the side of the building.
3. The method of claim 2, wherein the semantic segmentation results further comprise at least one of:
an edge direction segmentation result for determining an edge direction of the building;
a base segmentation result for determining a base area of the building;
and/or
The offset vector predictor further includes at least one of:
the average deviation direction angle of the building in the single-scene image;
and the offset vector of the pixel points in the base area of the building is the vector of any second pixel point in the base area pointing to the first pixel point corresponding to the second pixel point in the roof area.
4. The method of any one of claims 1-3, wherein the semantic segmentation results include a base segmentation result for determining a base region of the building, and a region segmentation result for determining regions on the building, the three-dimensional region including a roof region of the building; the offset vector prediction result comprises an offset vector of a first pixel point, and the first pixel point is a pixel point of the roof area;
the neural network obtains the base segmentation result based on the following modes:
for each first pixel point, translating the first pixel point based on the offset vector of the first pixel point to obtain a second pixel point corresponding to the first pixel point;
and determining the base segmentation result based on the second pixel points corresponding to the first pixel points.
5. The method of any of claims 1-4, wherein the neural network comprises a plurality of semantic segmentation networks and a plurality of prediction networks, the plurality of semantic segmentation networks comprising:
an edge segmentation network for determining building edges;
a region segmentation network for determining regions on the building;
an edge direction segmentation network for determining an edge direction of the building;
a base segmentation network for determining a base area of the building;
the plurality of prediction networks comprising:
a first vector prediction network for determining offset vectors for pixel points of the roof area and the side of the building;
a second vector prediction network for determining an offset vector for pixel points of the base area of the building;
a direction angle prediction network for determining an average offset direction angle of buildings in the monoscopic image.
6. The method of claim 5, further comprising:
processing a sample image through an initial neural network, wherein the initial neural network comprises a plurality of initial segmentation networks and a plurality of initial prediction networks, the plurality of initial segmentation networks comprise an initial edge segmentation network, an initial region segmentation network, an initial edge direction segmentation network and an initial base segmentation network, and the initial prediction networks comprise an initial first vector prediction network, an initial second vector prediction network and an initial direction angle prediction network;
respectively acquiring the segmentation loss of each initial segmentation network and the prediction loss of each initial prediction network;
summing the segmentation loss of each initial segmentation network and the prediction loss of each initial prediction network to obtain the loss of the initial neural network;
and training the initial neural network based on the loss of the initial neural network to obtain the neural network.
7. The method according to any one of claims 1-6, further comprising:
determining a template and a target domain based on the semantic segmentation result;
determining a translation vector based on the offset vector prediction result, translating the template based on the translation vector, and determining the matching degree between the translated template and the target domain;
optimizing the offset vector prediction result based on the matching degree to obtain an optimized offset vector prediction result;
the three-dimensional contour extraction of the building based on the semantic segmentation result and the offset vector prediction result of the building comprises the following steps:
and extracting the three-dimensional contour of the building based on the semantic segmentation result of the building and the optimized offset vector prediction result.
8. The method of claim 7, wherein the offset vector predictor comprises an offset vector for each pixel of the rooftop region; the determining a translation vector based on the offset vector predictor comprises:
determining a length range of the translation vector based on an average length of offset vectors of respective pixel points of the rooftop region, the average length being within the length range;
determining a direction of the translation vector based on an average direction angle of the offset vectors of the respective pixel points of the rooftop region.
9. The method of claim 8, wherein the offset vector predictor comprises an average offset direction angle of a building in the monoscopic image; the determining the direction of the translation vector based on the average direction angle of the offset vectors of the pixel points of the roof region comprises:
determining the average offset direction angle of buildings in the monoscopic image as the direction of the translation vector under the condition that the average length is smaller than a preset length;
and determining the average direction angle of the offset vector of each pixel point of the roof area as the direction of the translation vector under the condition that the average length is greater than or equal to the preset length.
10. The method according to any of claims 7-9, wherein the semantic segmentation results comprise edge segmentation results for determining building edges; the template is an edge between a roof area and a side of the building, and the target domain is an edge between a base area and a side of the building; and/or
the semantic segmentation results comprise edge segmentation results for determining building edges; the template is an edge between a roof area and a building side, and the target domain is an edge between a building side and a non-building area; and/or
the semantic segmentation results comprise a region segmentation result used for determining each region on the building; the template is a roof area and the target domain is a base area.
11. The method according to any one of claims 1-10, wherein the semantic segmentation results comprise edge segmentation results for determining building edges and edge direction segmentation results for determining the edge direction of the building; the method further comprises the following steps:
performing polygonization processing on the building edge based on the edge direction segmentation result;
the three-dimensional contour extraction of the building based on the semantic segmentation result and the offset vector prediction result of the building comprises the following steps:
and performing three-dimensional contour extraction on the building based on the semanteme segmentation result subjected to the polygonization processing and the offset vector prediction result.
12. The method according to any one of claims 1 to 11, wherein the monoscopic images comprise a first monoscopic image of a to-be-processed area before a disaster and a second monoscopic image of the to-be-processed area after the disaster, and the three-dimensional contours of the buildings comprise a first three-dimensional contour of each building in the to-be-processed area before the disaster and a second three-dimensional contour of each building in the to-be-processed area after the disaster; the method further comprises:
for each building in the area to be processed, determining the similarity of a first three-dimensional contour and a second three-dimensional contour of the building;
acquiring the number of buildings with the similarity lower than a preset similarity threshold in the area to be processed;
and determining the disaster degree of the area to be processed based on the number of buildings with the similarity lower than a preset similarity threshold.
13. The method according to any one of claims 1-12, further comprising:
obtaining an area of a roof region in a three-dimensional profile of the building;
and under the condition that the area of the roof area of the building is larger than a preset area threshold value, determining that the building is a violation building.
14. A method of training a neural network, the method comprising:
processing a sample image through an initial neural network, wherein the initial neural network comprises an initial segmentation network and an initial prediction network, the initial segmentation network is used for obtaining a semantic segmentation result, the semantic segmentation result is used for determining a three-dimensional area on a building, the initial prediction network is used for obtaining an offset vector prediction result, and the offset vector prediction result is used for determining the height of the building;
respectively obtaining the loss of the initial segmentation network and the loss of the initial prediction network, and determining the loss of the initial neural network based on the loss of the initial segmentation network and the loss of the initial prediction network;
training the initial neural network based on the loss of the initial neural network.
15. The method of claim 14, wherein the semantic segmentation network comprises:
an edge segmentation network for determining building edges;
a region segmentation network for determining regions on the building;
and the prediction network comprises:
a first vector prediction network for determining offset vectors for pixel points of the roof area and the side of the building.
16. The method of claim 15, wherein the semantic segmentation network further comprises at least one of:
an edge direction segmentation network for determining an edge direction of the building;
a base segmentation network for determining a base area of the building;
and/or
the prediction network further comprises at least one of:
a second vector prediction network for determining an offset vector for pixel points of the base area of the building; and
a direction angle prediction network for determining an average offset direction angle of buildings in the monoscopic image.
17. An apparatus for extracting a three-dimensional outline of a building, the apparatus comprising:
the system comprises a first processing module, a second processing module and a third processing module, wherein the first processing module is used for processing a single-scene image comprising a building through a neural network to obtain a semantic segmentation result and an offset vector prediction result, the semantic segmentation result is used for determining a three-dimensional area on the building, and the offset vector prediction result is used for determining the height of the building;
and the contour extraction module is used for extracting the three-dimensional contour of the building based on the semantic segmentation result and the offset vector prediction result of the building to obtain the three-dimensional contour of the building.
18. An apparatus for training a neural network, the apparatus comprising:
the system comprises a first processing module, a second processing module and a third processing module, wherein the first processing module is used for processing a sample image through an initial neural network, the initial neural network comprises an initial segmentation network and an initial prediction network, the initial segmentation network is used for obtaining a semantic segmentation result, the semantic segmentation result is used for determining a three-dimensional area on a building, the initial prediction network is used for obtaining an offset vector prediction result, and the offset vector prediction result is used for determining the height of the building;
a loss obtaining module, configured to obtain a loss of the initial segmented network and a loss of the initial predicted network, respectively, and determine a loss of the initial neural network based on the loss of the initial segmented network and the loss of the initial predicted network;
a training module to train the initial neural network based on the loss of the initial neural network.
19. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 16.
20. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 16 when executing the program.
CN202110963024.7A 2021-08-20 2021-08-20 Method and device for extracting three-dimensional outline of building and training neural network Withdrawn CN113658203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963024.7A CN113658203A (en) 2021-08-20 2021-08-20 Method and device for extracting three-dimensional outline of building and training neural network


Publications (1)

Publication Number Publication Date
CN113658203A true CN113658203A (en) 2021-11-16

Family

ID=78491844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110963024.7A Withdrawn CN113658203A (en) 2021-08-20 2021-08-20 Method and device for extracting three-dimensional outline of building and training neural network

Country Status (1)

Country Link
CN (1) CN113658203A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115797592A (en) * 2023-02-08 2023-03-14 航天宏图信息技术股份有限公司 Method and device for automatically generating building block based on oblique photography three-dimensional model
CN115797592B (en) * 2023-02-08 2023-04-14 航天宏图信息技术股份有限公司 Method and device for automatically generating building block based on oblique photography three-dimensional model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20211116)