CN117152604A - Building contour extraction method and device, electronic equipment and storage medium - Google Patents

Building contour extraction method and device, electronic equipment and storage medium

Info

Publication number
CN117152604A
Authority
CN
China
Prior art keywords
building
model
cunet
extracting
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310952456.7A
Other languages
Chinese (zh)
Inventor
李少丹
朱梓萌
付诗雨
张小霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Normal University filed Critical Hebei Normal University
Priority to CN202310952456.7A priority Critical patent/CN117152604A/en
Publication of CN117152604A publication Critical patent/CN117152604A/en
Pending legal-status Critical Current

Classifications

    All classifications fall under G — PHYSICS, G06 — COMPUTING; CALCULATING OR COUNTING:
    • G06V20/176 — Scenes; scene-specific elements; terrestrial scenes; urban or other man-made structures
    • G06N3/0455 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks; auto-encoder networks, encoder-decoder networks
    • G06N3/0464 — Neural networks; architecture; convolutional networks [CNN, ConvNet]
    • G06N3/082 — Neural networks; learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V10/44 — Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/454 — Local feature extraction by matching or filtering; biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters, with interaction between the filter responses; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/806 — Processing image or video features in feature spaces; fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/70 — Scenes; scene-specific elements; labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The application provides a building contour extraction method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: constructing a new Faster R-CNN model and an RH-CUnet model; preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images; extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data. The application can rapidly and accurately identify buildings in large-extent remote sensing images.

Description

Building contour extraction method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of image processing, and in particular to a building contour extraction method and device, electronic equipment and a storage medium.
Background
With the continuous progress of remote sensing technology, remote sensing imagery has gradually become a main data source for building extraction. Rural buildings are a principal element of rural areas, and information about them is important for estimating and predicting rural population, advancing rural revitalization, and accelerating urbanization. Unlike building distribution in urban areas, rural settlements and buildings are scattered and often interleaved with vegetation, farmland and the like. Most current research targets urban settlements and buildings; less attention is paid to rural areas, the research concentrates on small-extent images, and the extraction of rural settlements from large-extent remote sensing imagery remains immature. Previous studies found that the spectral residual method achieves good results in extracting small-extent rural settlements, but the results are not ideal at large extents.
Disclosure of Invention
The application aims to solve the technical problem of providing a building contour extraction method and device, electronic equipment and a storage medium that address the deficiencies of the prior art.
The technical solution of the application to the above technical problem is as follows: a building contour extraction method comprises the following steps:
s1, constructing a model, which comprises the following steps:
performing model construction based on an original Faster R-CNN model and a Spectral Residual model to obtain a new Faster R-CNN model;
performing model construction based on an original U-Net model, a CBAM dual-attention mechanism module, an RCF edge detection network and a Harris corner detection operator to obtain an RH-CUnet model;
s2, building contour extraction, which comprises the following steps:
preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images;
extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data.
Another technical solution of the application to the above technical problem is as follows: a building contour extraction device comprises a model construction module and a building contour extraction module:
the model construction module is used for performing model construction based on an original Faster R-CNN model and a Spectral Residual model to obtain a new Faster R-CNN model;
the model construction module is further used for performing model construction based on an original U-Net model, a CBAM dual-attention mechanism module, an RCF edge detection network and a Harris corner detection operator to obtain an RH-CUnet model;
the building contour extraction module is used for preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images;
the building contour extraction module is further used for extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data.
Another technical solution of the application to the above technical problem is as follows: an electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the building contour extraction method described above.
Another technical solution of the application to the above technical problem is as follows: a computer-readable storage medium storing a computer program which, when executed by a processor, implements the building contour extraction method described above.
The beneficial effects of the application are as follows: the new Faster R-CNN model and the RH-CUnet model are built by the model construction module; the building is coarsely and then finely located through the new Faster R-CNN model, which improves positioning accuracy; and the building contour extraction module further attends to building corner points, further refining the contour.
Drawings
Fig. 1 is a flow chart of a method for extracting a building contour according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of image processing by the Faster R-CNN model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of coarse positioning and fine extraction according to an embodiment of the present application;
fig. 4 is a schematic diagram of a bounding box-based extraction flow of a Spectral Residual model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a CUnet model according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a RH-CUnet method for extracting a building according to an embodiment of the present application;
fig. 7 is a block diagram of a device for extracting a building contour according to an embodiment of the present application.
Detailed Description
The principles and features of the present application are described below with reference to the drawings; the examples are provided solely to illustrate the application and are not to be construed as limiting its scope.
Example 1:
as shown in fig. 1, a method for extracting a building contour includes the following steps:
s1, constructing a model, which comprises the following steps:
performing model construction based on an original Faster R-CNN model and a Spectral Residual model to obtain a new Faster R-CNN model;
performing model construction based on an original U-Net model, a CBAM dual-attention mechanism module, an RCF edge detection network and a Harris corner detection operator to obtain an RH-CUnet model;
s2, building contour extraction, which comprises the following steps:
preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images;
extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data.
It should be understood that the target image is a remote sensing image, and that the new Faster R-CNN model and the RH-CUnet model together constitute the model framework.
In this embodiment, the new Faster R-CNN model and the RH-CUnet model are built through the model construction module; the building is coarsely and then finely located through the new Faster R-CNN model, improving positioning accuracy; and the building contour extraction module further attends to building corner points, further refining the contour.
Based on the above embodiment, in the step S1, a new Faster R-CNN model is obtained by performing model construction based on the original Faster R-CNN model and the Spectral Residual model, specifically:
connecting the output of the original Faster R-CNN model with the input of the Spectral Residual model;
in the step S2, buildings in the target image are preliminarily located through the new Faster R-CNN model to obtain building bounding boxes, and the bounding boxes are refined to obtain building region images, specifically:
buildings in the target image are preliminarily located through the original Faster R-CNN model within the new Faster R-CNN model to obtain building bounding boxes, and the bounding boxes are refined through the Spectral Residual model within the new Faster R-CNN model to obtain building region images.
In this step, the spectral residual method is combined with a deep neural network to extract settlements from large-extent remote sensing imagery, by embedding the Spectral Residual model into the Faster R-CNN model. The method has two stages: coarse positioning and fine extraction. Candidate built-up-area bounding boxes are first coarsely located on the large-extent remote sensing image by training the modified Faster R-CNN; the exact boundary of each built-up area is then obtained from its bounding box using the Spectral Residual model.
As shown in fig. 2, 3 and 4, preliminary positioning corresponds to coarse positioning. In fig. 2, the Faster R-CNN network first feeds the image into a convolutional network for feature extraction, obtaining a feature map from the last shared convolutional layer. The feature map is input to an RPN to select candidate regions, and RoIAlign resizes each proposal to a uniform size, generating proposal regions and ensuring that the fully connected layers receive feature vectors of the same (fixed) size. The result finally enters fully connected layers feeding two branches: one branch passes through a fully connected layer to a softmax classifier that outputs a target class vector, giving the target class; the other passes through a fully connected layer to a bounding-box regressor that outputs the bounding-box position information.
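As a minimal illustration of this coarse-positioning stage, the following sketch runs a stock torchvision Faster R-CNN over a single image to obtain candidate bounding boxes. The pretrained weights, the score threshold and the helper name candidate_boxes are stand-in assumptions for illustration only; this is not the modified, building-trained network of the application.

```python
import torch
import torchvision

# Stock detector as a stand-in for the application's modified Faster R-CNN
# (an assumption for illustration; the patent trains its own network).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def candidate_boxes(image_chw: torch.Tensor, score_thresh: float = 0.5) -> torch.Tensor:
    """Return candidate (x1, y1, x2, y2) boxes for one CxHxW image scaled to [0, 1]."""
    with torch.no_grad():
        # Internally: shared conv features -> RPN proposals -> RoIAlign ->
        # fully connected heads for class scores and box regression.
        detections = model([image_chw])[0]
    keep = detections["scores"] > score_thresh
    return detections["boxes"][keep]
```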
Repositioning, i.e. accurate extraction, then follows: after a settlement detection box is obtained by the Faster R-CNN method, the settlement boundary is accurately extracted using the SR model (Spectral Residual model). As shown in fig. 4, the colour image I(x) is first converted into a grey-scale image G(x); the grey-scale image is then Fourier-transformed into the frequency domain; the corresponding saliency map S(x) is then reconstructed in the spatial domain from the residual spectrum via the inverse Fourier transform; finally, a binary image is generated from the saliency map using the Otsu threshold method, and this binary image is applied as a mask to the original input image to obtain the final settlement extraction result (i.e. the building region image).
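A minimal NumPy/OpenCV sketch of this repositioning step follows; the 3×3 averaging filter and the Gaussian smoothing parameters are conventional choices from the original spectral residual saliency method and are assumptions here, not values stated in the patent.

```python
import cv2
import numpy as np

def spectral_residual_mask(image_bgr: np.ndarray) -> np.ndarray:
    """Binary settlement mask via the Spectral Residual (SR) saliency method."""
    # 1. Convert the colour image I(x) to a grey-scale image G(x).
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    # 2. Fourier transform into the frequency domain.
    spectrum = np.fft.fft2(gray)
    log_amp = np.log1p(np.abs(spectrum))
    phase = np.angle(spectrum)
    # 3. Spectral residual = log amplitude minus its local average (3x3, assumed).
    residual = log_amp - cv2.blur(log_amp, (3, 3))
    # 4. Reconstruct the saliency map S(x) by the inverse Fourier transform.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)
    saliency = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # 5. Otsu thresholding yields the binary image used to mask the input.
    _, mask = cv2.threshold(saliency, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```

Masking the original image with the returned binary image, e.g. cv2.bitwise_and(image_bgr, image_bgr, mask=mask), then gives the building region image.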
To demonstrate the effectiveness of the new Faster R-CNN model, the method was used to extract settlements from three large-extent images and evaluated both qualitatively and quantitatively; the method was found to extract settlements of different sizes from remote sensing imagery effectively and accurately.
Based on the above embodiment, in the step S1, model construction is performed based on the original U-Net model, the CBAM dual-attention mechanism module, the RCF edge detection network and the Harris corner detection operator to obtain the RH-CUnet model, specifically:
S11, adding a CBAM dual-attention mechanism module to the skip-connection part of the original U-Net model to obtain a CUnet model.
It should be appreciated that the classical U-Net network model was developed from the fully convolutional neural network: a "U"-shaped model consisting of an encoder and a decoder. The encoder on the left is a stack of convolutions and max pooling used to extract image features, progressively compressing the spatial dimensions of the feature map while expanding its channels; the decoder on the right restores the spatial dimensions and detail of the image while compressing the feature-map channels. To address the false detections, missed detections and incomplete extraction to which the U-Net network is prone when extracting buildings, the application combines the CBAM dual-attention mechanism with the classical U-Net network model (hereinafter the CUnet model); the improved network model (the CUnet model) is shown in figure 5.
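For concreteness, below is a minimal two-level U-Net-style encoder-decoder in PyTorch; the reduced depth and channel widths are assumptions for illustration and do not reproduce the full model of figure 5.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU: the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Encoder compresses spatial size and expands channels; decoder restores
    spatial detail and compresses channels, using skip connections."""
    def __init__(self, in_ch: int = 3, num_classes: int = 2):
        super().__init__()
        self.enc1 = conv_block(in_ch, 64)
        self.enc2 = conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        s1 = self.enc1(x)                   # skip feature, full resolution
        s2 = self.enc2(self.pool(s1))       # skip feature, 1/2 resolution
        b = self.bottleneck(self.pool(s2))  # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), s2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))  # skip connection
        return self.head(d1)
```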
CBAM is a dual-attention module consisting of a channel attention mechanism and a spatial attention mechanism. Channel attention extracts globally salient feature textures, reducing the influence of complex image backgrounds on model training; spatial attention further screens salient features at the spatial scale, so the features of traditional village buildings are learned more effectively. This embodiment adds CBAM to the skip connections, i.e. before feature fusion, as indicated by the arrows in fig. 5. Introducing the CBAM attention mechanism into the semantic segmentation network U-Net lets the model focus, among many inputs, on the information most critical to the current task (traditional village buildings), lowers the attention paid to other information such as roads, vegetation and rivers, and even filters out irrelevant information, improving the efficiency and accuracy of the task.
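A minimal PyTorch sketch of a CBAM block as described here (channel attention followed by spatial attention, applied to a skip feature before fusion) is given below; the reduction ratio of 16 and the 7×7 spatial kernel are conventional CBAM defaults assumed for illustration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Shared MLP over global average- and max-pooled channel descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(nn.functional.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(nn.functional.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Convolution over channel-wise average and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.max(dim=1, keepdim=True).values
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention then spatial attention, applied as multiplicative gates."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)
        return x * self.sa(x)
```

In a CUnet-style forward pass, each skip feature would be passed through a CBAM block before the torch.cat in the decoder of the earlier U-Net sketch.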
And S12, connecting the input of the RCF edge detection network and the input of the CUnet model in parallel, connecting the output of the CUnet model with a feature fusion module of the RCF edge detection network to obtain an RCF-CUnet model, and taking the output of the RCF edge detection network as the output of the RCF-CUnet model.
Specifically, the RCF network structure comprises three modules: a backbone module, a deep-supervision module and a feature fusion module. The RCF backbone adopts all convolution layers of VGG16, organised into 5 stages; this part automatically extracts edge features from the feature maps. Because the backbone's many parameters make convergence during training less than ideal, the model adds a deep-supervision module that applies deeply supervised learning to the 5 stages. Finally, the feature fusion module superposes the output feature edge map of each stage so as to capture a variety of mixed information. To address the blurred, unsmooth target edges produced when the semantic segmentation network U-Net extracts traditional villages, this embodiment proposes a semantic segmentation model combining edge detection on the basis of CUnet (the RCF-CUnet model). Attaching the edge recognition network raises the model's attention to building edges and provides accurate, rich edge information for semantic segmentation, making the extracted building edges smoother and more accurate and thereby improving building extraction precision.
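The sketch below is a reduced PyTorch illustration of this structure under stated simplifications: the five VGG16 convolution stages form the backbone, each stage produces one side output (standing in for the deep-supervision branches), and a 1×1 convolution fuses the upsampled side outputs. The published RCF taps every convolution layer inside each stage; that is collapsed here to one tap per stage.

```python
import torch
import torch.nn as nn
import torchvision

class MiniRCF(nn.Module):
    """Reduced RCF sketch: VGG16 stages, per-stage side outputs, 1x1 fusion."""
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        # The 5 VGG16 stages used by RCF (pooling layers included).
        self.stages = nn.ModuleList(
            [vgg[0:4], vgg[4:9], vgg[9:16], vgg[16:23], vgg[23:30]]
        )
        # One side (deep-supervision) output per stage.
        self.side = nn.ModuleList(nn.Conv2d(c, 1, 1) for c in (64, 128, 256, 512, 512))
        self.fuse = nn.Conv2d(5, 1, 1)  # feature fusion over the 5 side maps

    def forward(self, x):
        h, w = x.shape[-2:]
        side_maps = []
        for stage, side in zip(self.stages, self.side):
            x = stage(x)
            e = side(x)  # per-stage edge map
            side_maps.append(
                nn.functional.interpolate(e, size=(h, w), mode="bilinear", align_corners=False)
            )
        # Superpose the per-stage edge maps into one fused edge probability map.
        return torch.sigmoid(self.fuse(torch.cat(side_maps, dim=1)))
```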
As shown in the feature fusion module of fig. 6, semantic information is extracted by the CUnet network and edge information by the RCF network; the two features are initially fused in the feature fusion module using Concat fusion, a 4-layer convolution operation is applied to the fused result, and softmax classification is finally performed on it to obtain the final building segmentation result.
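A minimal sketch of this fusion head follows; the intermediate channel widths of the 4-layer convolution are assumptions for illustration, as the patent does not specify them.

```python
import torch
import torch.nn as nn

class EdgeSemanticFusion(nn.Module):
    """Concat fusion of CUnet semantic features and RCF edge features,
    followed by a 4-layer convolution head and softmax classification."""
    def __init__(self, sem_ch: int, edge_ch: int, num_classes: int = 2):
        super().__init__()
        layers, ch = [], sem_ch + edge_ch
        for out_ch in (128, 64, 32, num_classes):  # 4 conv layers (widths assumed)
            layers += [nn.Conv2d(ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            ch = out_ch
        self.head = nn.Sequential(*layers[:-1])  # drop the ReLU before classification

    def forward(self, sem_feat: torch.Tensor, edge_feat: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([sem_feat, edge_feat], dim=1)  # Concat fusion
        return torch.softmax(self.head(fused), dim=1)    # per-pixel class probabilities
```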
And S13, connecting the output of the RCF-CUnet model with the input of a Harris corner detection operator to obtain an RH-CUnet model, wherein the output of the Harris corner detection operator is used as the output of the RH-CUnet model.
On the basis of the above embodiment, in S2, image semantic features are extracted from the building region image through the RH-CUnet model, building edge features are extracted from the building region image, the image semantic features and the building edge features are fused to obtain edge fusion features, building corner points are extracted from the edge fusion features, and building contours are output from the extracted building corner point data, specifically:
extracting image semantic features from the building region image through the CUnet model, and extracting building edge features from the building region image through the RCF edge detection network;
fusing the image semantic features and the building edge features through the feature fusion module of the RCF edge detection network, and outputting an edge fusion feature map;
extracting building corner points from the edge fusion feature map through the RH-CUnet model, and outputting building contours from the building corner point data.
In the above embodiment, the proposed RH-CUnet model comprises 4 parts in total, as shown in FIG. 6. The first part introduces CBAM into the skip-connection part of the U-Net model, i.e. before feature fusion. The second part connects the RCF edge detection network in parallel on that basis; this network learns building edge information, raises the model's attention to building edges, and addresses the blurred, unsmooth target edges of rural buildings. The third part fuses the extracted semantic features with the edge features using the feature fusion module. The fourth part introduces the Harris corner detection algorithm (i.e. the Harris corner detection operator) to raise the model's attention to building corners, addressing low corner-positioning accuracy and irregular corner shapes.
On the basis of the above embodiment, extracting building corner points from the edge fusion features through the RH-CUnet model and outputting building contours from the extracted building corner point data specifically comprises:
establishing a local detection window according to the Harris corner detection operator, and setting movement parameters for the local detection window;
moving the local detection window across the edge fusion feature map according to the movement parameters, an energy change being generated at each move; computing the energy change to obtain an energy change value, extracting building corner points according to the energy change value, and outputting building contours from the extracted building corner point data.
On the basis of the above embodiment, calculating the energy change to obtain an energy change value and extracting building corner data according to the energy change value specifically comprises:
calculating the energy change according to a detection formula of the Harris corner detection operator, wherein the detection formula is as follows:
R = λ₁λ₂ - k(λ₁ + λ₂)²,
where k is a constant, taking the value 0.04, and λ₁ and λ₂ are the curvatures of the local autocorrelation function;
and comparing the R value with a threshold value, and extracting a central pixel point of the local detection window as building corner data if the R value is larger than the threshold value and is a maximum value in a neighborhood.
It should be understood that, when extracting traditional village buildings, the U-Net network tends to suffer from low corner-positioning accuracy, polygon vertices drifting off building edges, and irregular extracted corners; a Harris corner detection operator is therefore introduced on top of the RCF-CUnet network to raise the model's attention to building corner points (RH-CUnet) and to provide accurate, rich corner information for semantic segmentation. The principle of the Harris corner detection operator is as follows: a local detection window is placed in the image and shifted slightly in all directions; when the energy change value exceeds a set threshold, the central pixel of the window is extracted as a building corner point. The threshold is set to 0.05.
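A minimal OpenCV sketch of this corner-extraction step is given below. Treating the 0.05 threshold as a fraction of the maximum response R, and using a 3×3 neighbourhood for the local-maximum test, are assumptions; the response formula R = λ₁λ₂ - k(λ₁ + λ₂)² with k = 0.04 follows the text.

```python
import cv2
import numpy as np

def harris_building_corners(edge_fusion_map: np.ndarray,
                            k: float = 0.04,
                            thresh: float = 0.05) -> list[tuple[int, int]]:
    """Return (x, y) corner candidates from the edge fusion feature map."""
    img = np.float32(edge_fusion_map)
    # Per-pixel Harris response R = lam1*lam2 - k*(lam1 + lam2)^2.
    response = cv2.cornerHarris(img, blockSize=2, ksize=3, k=k)
    # Local-maximum test: a pixel must equal the max of its 3x3 neighbourhood.
    local_max = cv2.dilate(response, np.ones((3, 3), np.uint8))
    corners = (response > thresh * response.max()) & (response >= local_max)
    ys, xs = np.nonzero(corners)
    return list(zip(xs.tolist(), ys.tolist()))
```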
To demonstrate the effectiveness of the RH-CUnet model in extracting rural buildings, the model was validated on a self-made traditional village building dataset and the public WHU building dataset. The finally extracted rural buildings show reduced missed detection, false detection and building incompleteness to a certain extent, with smooth, complete building contour shapes and sharp, clear corners, demonstrating that the RH-CUnet model is effective for rural building extraction.
Example 2:
as shown in fig. 7, a building contour extraction device comprises a model construction module and a building contour extraction module:
the model construction module is used for performing model construction based on an original Faster R-CNN model and a Spectral Residual model to obtain a new Faster R-CNN model;
the model construction module is further used for performing model construction based on an original U-Net model, a CBAM dual-attention mechanism module, an RCF edge detection network and a Harris corner detection operator to obtain an RH-CUnet model;
the building contour extraction module is used for preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images;
the building contour extraction module is further used for extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data.
On the basis of the above embodiment, in the model construction module, model construction is performed based on the original Faster R-CNN model and the Spectral Residual model to obtain the new Faster R-CNN model, specifically:
connecting the output of the original Faster R-CNN model with the input of the Spectral Residual model;
in the building contour extraction module, buildings in the target image are preliminarily located through the new Faster R-CNN model to obtain building bounding boxes, and the bounding boxes are refined to obtain building region images, specifically:
buildings in the target image are preliminarily located through the original Faster R-CNN model within the new Faster R-CNN model to obtain building bounding boxes, and the bounding boxes are refined through the Spectral Residual model within the new Faster R-CNN model to obtain building region images.
In the model construction module, model construction is performed based on the original U-Net model, the CBAM dual-attention mechanism module, the RCF edge detection network and the Harris corner detection operator to obtain the RH-CUnet model, specifically:
adding a CBAM dual-attention mechanism module to the skip-connection part of the original U-Net model to obtain a CUnet model;
connecting an input of an RCF edge detection network and an input of a CUnet model in parallel, and connecting an output of the CUnet model with a feature fusion module of the RCF edge detection network to obtain an RCF-CUnet model, wherein the output of the RCF edge detection network is used as an output of the RCF-CUnet model;
and connecting the output of the RCF-CUnet model with the input of a Harris corner detection operator to obtain an RH-CUnet model, wherein the output of the Harris corner detection operator is used as the output of the RH-CUnet model.
In this embodiment, the new Faster R-CNN model and the RH-CUnet model are built through the model construction module; the building is coarsely and then finely located through the new Faster R-CNN model, improving positioning accuracy; and the building contour extraction module further attends to building corner points, further refining the contour.
The application can be applied to urban and rural planning and construction, provides basic building data for urban and rural planning under China's rural revitalization strategy, and is of great significance for rural revitalization and for accelerating urban construction.
On the basis of the above embodiment, in the building contour extraction module, image semantic features are extracted from the building region image through the RH-CUnet model, building edge features are extracted from the building region image, the image semantic features and the building edge features are fused to obtain edge fusion features, building corner points are extracted from the edge fusion features, and building contours are output from the extracted building corner point data, specifically:
extracting image semantic features from the building region image through the CUnet model, and extracting building edge features from the building region image through the RCF edge detection network;
fusing the image semantic features and the building edge features through the feature fusion module of the RCF edge detection network, and outputting an edge fusion feature map;
extracting building corner points from the edge fusion feature map through the RH-CUnet model, and outputting building contours from the building corner point data.
On the basis of the above embodiment, extracting building corner points from the edge fusion features through the RH-CUnet model and outputting building contours from the extracted building corner point data specifically comprises:
establishing a local detection window according to the Harris corner detection operator, and setting movement parameters for the local detection window;
moving the local detection window across the edge fusion feature map according to the movement parameters, an energy change being generated at each move; computing the energy change to obtain an energy change value, extracting building corner points according to the energy change value, and outputting building contours from the extracted building corner point data.
On the basis of the above embodiment, calculating the energy change to obtain an energy change value and extracting building corner data according to the energy change value specifically comprises:
calculating the energy change according to a detection formula of the Harris corner detection operator, wherein the detection formula is as follows:
R = λ₁λ₂ - k(λ₁ + λ₂)²,
where k is a constant with a value of 0.04, and λ₁ and λ₂ are the curvatures of the local autocorrelation function;
and comparing the R value with a threshold value, and if the R value is larger than the threshold value and is the maximum value in the neighborhood, extracting the central pixel point of the local detection window as building corner data, wherein the threshold value is 0.05.
Example 3:
an electronic device comprising a memory and a processor, the memory having stored therein a computer program, which when executed by the processor implements a method of extracting a building contour as described above.
Example 4:
a computer readable storage medium storing a computer program, characterized in that the method of extracting a building contour as described above is implemented when the computer program is executed by a processor.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or as the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.

Claims (10)

1. A method for extracting a building contour, characterized by comprising the following steps:
s1, constructing a model, which comprises the following steps:
performing model construction based on an original Faster R-CNN model and a Spectral Residual model to obtain a new Faster R-CNN model;
performing model construction based on an original U-Net model, a CBAM dual-attention mechanism module, an RCF edge detection network and a Harris corner detection operator to obtain an RH-CUnet model;
s2, building contour extraction, which comprises the following steps:
preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images;
extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data.
2. The method for extracting a building contour according to claim 1, wherein in S1, model construction is performed based on the original Faster R-CNN model and the Spectral Residual model to obtain the new Faster R-CNN model, specifically:
connecting the output of the original Faster R-CNN model with the input of the Spectral Residual model;
in the step S2, buildings in the target image are preliminarily located through the new Faster R-CNN model to obtain building bounding boxes, and the bounding boxes are refined to obtain building region images, specifically:
buildings in the target image are preliminarily located through the original Faster R-CNN model within the new Faster R-CNN model to obtain building bounding boxes, and the bounding boxes are refined through the Spectral Residual model within the new Faster R-CNN model to obtain building region images.
3. The method for extracting a building contour according to claim 1, wherein in S1, model construction is performed based on the original U-Net model, the CBAM dual-attention mechanism module, the RCF edge detection network and the Harris corner detection operator to obtain the RH-CUnet model, specifically:
adding a CBAM dual-attention mechanism module to the skip-connection part of the original U-Net model to obtain a CUnet model;
connecting an input of an RCF edge detection network and an input of a CUnet model in parallel, and connecting an output of the CUnet model with a feature fusion module of the RCF edge detection network to obtain an RCF-CUnet model, wherein the output of the RCF edge detection network is used as an output of the RCF-CUnet model;
and connecting the output of the RCF-CUnet model with the input of a Harris corner detection operator to obtain an RH-CUnet model, wherein the output of the Harris corner detection operator is used as the output of the RH-CUnet model.
4. The method for extracting a building contour according to claim 3, wherein in S2, image semantic features are extracted from the building region image through the RH-CUnet model, building edge features are extracted from the building region image, the image semantic features and the building edge features are fused to obtain edge fusion features, building corner points are extracted from the edge fusion features, and building contours are output from the extracted building corner point data, specifically:
extracting image semantic features from the building region image through the CUnet model, and extracting building edge features from the building region image through the RCF edge detection network;
fusing the image semantic features and the building edge features through the feature fusion module of the RCF edge detection network, and outputting an edge fusion feature map;
extracting building corner points from the edge fusion feature map through the RH-CUnet model, and outputting building contours from the building corner point data.
5. The method for extracting a building contour according to claim 3, wherein building corner points are extracted from the edge fusion features through the RH-CUnet model and building contours are output from the extracted building corner point data, specifically:
establishing a local detection window according to the Harris corner detection operator, and setting movement parameters for the local detection window;
moving the local detection window across the edge fusion feature map according to the movement parameters, an energy change being generated at each move; computing the energy change to obtain an energy change value, extracting building corner points according to the energy change value, and outputting building contours from the extracted building corner point data.
6. The method for extracting a building contour according to claim 5, wherein calculating the energy change to obtain an energy change value and extracting building corner data according to the energy change value comprises:
calculating the energy change according to a detection formula of the Harris corner detection operator, wherein the detection formula is as follows:
R = λ₁λ₂ - k(λ₁ + λ₂)²,
where k is a constant, and λ₁ and λ₂ are the curvatures of the local autocorrelation function;
and comparing the R value with a threshold value, and extracting a central pixel point of the local detection window as building corner data if the R value is larger than the threshold value and is a maximum value in a neighborhood.
7. A device for extracting a building contour, characterized by comprising a model construction module and a building contour extraction module:
the model construction module is used for performing model construction based on an original Faster R-CNN model and a Spectral Residual model to obtain a new Faster R-CNN model;
the model construction module is further used for performing model construction based on an original U-Net model, a CBAM dual-attention mechanism module, an RCF edge detection network and a Harris corner detection operator to obtain an RH-CUnet model;
the building contour extraction module is used for preliminarily locating buildings in a target image through the new Faster R-CNN model to obtain building bounding boxes, and refining the bounding boxes to obtain building region images;
the building contour extraction module is further used for extracting image semantic features from the building region images through the RH-CUnet model, extracting building edge features from the building region images, fusing the image semantic features and the building edge features to obtain edge fusion features, extracting building corner points from the edge fusion features, and outputting building contours from the extracted building corner point data.
8. The device for extracting a building contour according to claim 7, wherein in the model construction module, model construction is performed based on the original U-Net model, the CBAM dual-attention mechanism module, the RCF edge detection network and the Harris corner detection operator to obtain the RH-CUnet model, specifically:
adding a CBAM dual-attention mechanism module to the skip-connection part of the original U-Net model to obtain a CUnet model;
connecting an input of an RCF edge detection network and an input of a CUnet model in parallel, and connecting an output of the CUnet model with a feature fusion module of the RCF edge detection network to obtain an RCF-CUnet model, wherein the output of the RCF edge detection network is used as an output of the RCF-CUnet model;
and connecting the output of the RCF-CUnet model with the input of a Harris corner detection operator to obtain an RH-CUnet model, wherein the output of the Harris corner detection operator is used as the output of the RH-CUnet model.
9. An electronic device comprising a memory and a processor, the memory having a computer program stored therein, the processor, when executing the computer program, implementing the method of extracting a building contour according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that the method of extracting a building contour according to any one of claims 1 to 6 is implemented when the computer program is executed by a processor.
CN202310952456.7A 2023-07-31 2023-07-31 Building contour extraction method and device, electronic equipment and storage medium Pending CN117152604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310952456.7A CN117152604A (en) 2023-07-31 2023-07-31 Building contour extraction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310952456.7A CN117152604A (en) 2023-07-31 2023-07-31 Building contour extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117152604A 2023-12-01

Family

ID=88908959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310952456.7A Pending CN117152604A (en) 2023-07-31 2023-07-31 Building contour extraction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117152604A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351180A (en) * 2023-12-04 2024-01-05 中建科工集团绿色科技有限公司 Modular building augmented reality auxiliary construction method and device based on deep learning
CN117351180B (en) * 2023-12-04 2024-03-29 中建科工集团绿色科技有限公司 Modular building augmented reality auxiliary construction method and device based on deep learning
CN117351368A (en) * 2023-12-06 2024-01-05 福瑞莱环保科技(深圳)股份有限公司 Natural village boundary acquisition method and device, electronic equipment and storage medium
CN117351368B (en) * 2023-12-06 2024-02-23 福瑞莱环保科技(深圳)股份有限公司 Natural village boundary acquisition method and device, electronic equipment and storage medium
CN117649608A (en) * 2024-01-29 2024-03-05 阿坝州林业和草原科学技术研究所 Pine wood nematode disease identification system and method based on remote sensing monitoring
CN117649608B (en) * 2024-01-29 2024-03-29 阿坝州林业和草原科学技术研究所 Pine wood nematode disease identification system and method based on remote sensing monitoring


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination