CN116052019B - High-quality detection method suitable for built-up area of large-area high-resolution satellite image - Google Patents

High-quality detection method suitable for built-up area of large-area high-resolution satellite image

Info

Publication number
CN116052019B
CN116052019B (application CN202310331820.8A)
Authority
CN
China
Prior art keywords
built
image
area
pixel
grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310331820.8A
Other languages
Chinese (zh)
Other versions
CN116052019A (en)
Inventor
陈一祥
姚帅
陈学业
李胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center
Original Assignee
Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center filed Critical Shenzhen Planning And Natural Resources Data Management Center Shenzhen Spatial Geographic Information Center
Priority to CN202310331820.8A priority Critical patent/CN116052019B/en
Publication of CN116052019A publication Critical patent/CN116052019A/en
Application granted granted Critical
Publication of CN116052019B publication Critical patent/CN116052019B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G06V 20/176 Urban or other man-made structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a high-quality built-up area detection method suitable for large-area high-resolution satellite images, which comprises the following steps: S1) constructing a sample set of built-up areas and non-built-up areas; S2) constructing a lightweight convolutional neural network model fusing multi-level features and training the model; S3) partitioning the study-area image under multiple grid modes; S4) using the model constructed and trained in S2 to predict the image blocks from S3 block by block, obtaining the membership degree of each image block to the built-up area under each partition mode; S5) integrating the membership degrees of the image blocks covering each pixel to determine the pixel's class label, yielding the final pixel-level high-quality built-up area detection result. The invention is based on an image partitioning and ensemble classification framework using multiple grid modes; it can exploit multi-scale and multi-directional context information for each pixel and achieve pixel-level classification by integrating the class labels of the multi-mode blocks, so that the built-up area detection result is finer and more complete.

Description

High-quality detection method suitable for built-up area of large-area high-resolution satellite image
Technical Field
The invention belongs to the field of remote sensing image processing, and particularly relates to a high-quality built-up area detection method suitable for large-area high-resolution satellite images.
Background
Satellite remote sensing, with its wide coverage and periodic observation capability, has become an important supporting technology for rapidly acquiring and updating geographic information such as the location and extent of urban built-up areas. The wide availability of high-resolution satellite data makes finer-scale mapping of built-up areas possible. Compared with medium- and low-resolution imagery, however, high-resolution satellite imagery contains many kinds of detailed features and complex scenes, which makes image processing and target detection more difficult. In particular, an urban built-up area is a large-extent heterogeneous geographic object whose interior contains diverse ground features at different locations and exhibits complex spatial structure patterns. At the same time, the increase in spatial resolution also means a considerable increase in the volume of image data to be processed, which inevitably adds computational expense to built-up area detection. All these factors pose great challenges to extracting and mapping built-up areas from high-resolution satellite images. Furthermore, because the urban built-up area covers a wide spatial range, accurately locating its boundaries and maintaining the integrity of the target shape remains a problem to be solved.
Conventional built-up area extraction methods mainly use hand-crafted feature extraction algorithms to represent the texture, structure and other characteristics of built-up areas, and then apply supervised or unsupervised classification to extract them. Such methods are suitable for high-resolution images of small areas or simple scenes; for large-area high-resolution images, the spatial heterogeneity of ground features and the growth of data volume make it difficult to design a hand-crafted feature extraction algorithm with good generality, so the extraction results of the prior art are often unsatisfactory.
In recent years, the emergence of deep learning has provided new ideas for the automatic extraction of built-up areas from high-resolution remote sensing images. Through operations such as convolution and pooling, deep learning can automatically learn multi-scale features of scenes and objects from sample data, overcoming the deficiencies of conventional hand-crafted features. In the field of computer vision and pattern recognition, various deep convolutional neural network (DCNN) models such as VGG-Net, GoogLeNet, ResNet and DenseNet have been developed, and these networks and their improved structures have been widely used in remote sensing image processing tasks including scene classification, land-use mapping and object detection. However, most of these networks are designed for complex multi-class classification tasks; they have complicated model structures and huge numbers of parameters, their performance depends on large numbers of labeled samples, and their training generally requires heavy computation.
Built-up area extraction from high-resolution imagery is a binary classification problem, for which a lightweight network is sufficient. Under a deep learning framework, built-up area extraction can be achieved by semantic segmentation or by scene classification. The former requires pixel-level samples, which are generally time-consuming and laborious to obtain, whereas the latter requires only block-level samples, which are easier to obtain; block-based processing also offers higher processing efficiency and better feature representation for large-area images. Existing block-based approaches, however, typically use a regular grid of fixed size to generate image blocks, which can break the spatial relationships inside the built-up area, lead to considerable false alarms and omissions, and produce severely jagged boundaries.
Disclosure of Invention
The invention aims to provide a high-quality built-up area detection method suitable for large-area high-resolution satellite images, so as to solve the problems in the prior art.
In order to achieve the above purpose, the present invention provides the following technical solution: a high-quality built-up area detection method suitable for large-area high-resolution satellite images, comprising the following steps:
S1, determining the size of the sample image according to the relationship between the size of the buildings to be detected and the image resolution, and producing a block-level sample set of high-resolution built-up and non-built-up areas covering diversified scenes;
S2, constructing a lightweight convolutional neural network fusing multi-level features, taking the image blocks of the two classes of samples obtained in step S1 as input and their corresponding labels as output, and training the network to obtain an image block membership representation model;
S3, partitioning the image to be detected under multiple grid modes to obtain the corresponding image block data set for each mode;
S4, performing feature calculation on the image blocks of the multiple grid modes of the region to be detected with the image block membership representation model obtained in step S2, to obtain the membership degree of each image block to the built-up area;
S5, for each pixel of the image of the region to be detected, traversing the image blocks of the multiple grid modes and determining the pixel's class label by combining the membership degrees of the image blocks to the built-up area, to obtain the final pixel-level high-quality built-up area detection result.
Further, in step S1, the size of the sample image is determined according to the relationship between the detectable building size and the image resolution, as follows: let the preset sample image size be s × s, the typical ground length of a building be L, and the image resolution be R; then the formula
s = L/R
determines the size of the sample image;
the block-level sample set of high-resolution built-up and non-built-up areas covering diversified scenes is expressed as:
{(X_i, Y_i) | Y_i = 0, 1; i = 1, 2, …, N},
where X_i is a sample image block and Y_i its corresponding label, the value 1 corresponding to the built-up area class and the value 0 to the non-built-up area class, and N is the number of samples.
Further, in step S2, the lightweight convolutional neural network model fusing multi-level features comprises a 3×3 convolutional layer, a 2×2 max pooling layer, three feature extraction modules, two feature fusion modules, a global average pooling layer, and a fully connected layer for generating the classification result.
Further, the multi-mode partitioning of the image of the region to be detected in step S3 comprises the following sub-steps:
S3.1, defining an original grid, that is, a regular square grid covering the image of the region to be detected, with a preset square cell size;
S3.2, rotating the original grid 45 degrees clockwise or anticlockwise around its center to obtain a rotated grid;
S3.3, cutting the image of the region to be detected with the original grid and the rotated grid respectively, obtaining original-grid image blocks and rotated-grid image blocks;
S3.4, translating the original grid and the rotated grid t times at equal intervals along the horizontal direction of the image, with translation interval s/t; after each translation, performing one regular grid partitioning of the covered image, the resulting image blocks serving as the original-grid horizontally translated image blocks and the rotated-grid horizontally translated image blocks respectively;
S3.5, likewise translating the original grid and the rotated grid downward along the vertical direction of the image and partitioning it into blocks, obtaining the original-grid vertically translated image blocks and the rotated-grid vertically translated image blocks.
Further, in the above high-quality built-up area detection method suitable for large-area high-resolution satellite images, each of the three feature extraction modules comprises two paths: the first path is a 1×1 convolutional layer that receives the low-level features; the second path comprises three sequentially connected depthwise separable convolutions. After the feature maps output by the two paths are concatenated along the channel dimension, they pass through a 2×2 max pooling layer to produce the final features.
Further, in the above high-quality built-up area detection method suitable for large-area high-resolution satellite images, the feature fusion module passes the input low-level features sequentially through a 1×1 convolution layer, a global max pooling layer, a fully connected layer and a sigmoid function, and then obtains the fused features by element-wise multiplication with the input high-level features.
Further, in the above high-quality built-up area detection method suitable for large-area high-resolution satellite images, the three sequentially connected depthwise separable convolutions comprise one 1×1 convolution and two 3×3 convolutions for extracting features of different levels.
Further, the step S4 includes the following sub-steps:
S4.1, for each pixel of the image of the region to be detected, traversing the image blocks of the various grid modes; if an image block contains the pixel, the image block is said to cover the pixel, and the set of all image blocks covering the pixel is defined as the context image blocks of that pixel;
S4.2, sorting all membership degrees of the pixel's context image blocks and determining their maximum and minimum values;
S4.3, based on a preset membership threshold, judging whether the minimum membership obtained in step S4.2 is greater than the threshold: if so, marking the pixel as the built-up area class; otherwise, judging whether the maximum membership obtained in step S4.2 is smaller than the threshold: if so, marking the pixel as the non-built-up area class; otherwise, marking the pixel as a boundary pixel;
S4.4, for the boundary pixels, performing 2×2 up-sampling on each of the context image blocks to obtain four image blocks of size s/2 × s/2, then obtaining the membership degree of each up-sampled context image block as in step S3, and returning to steps S4.1 to S4.3 to re-label the up-sampled image blocks and update the boundary pixels;
S4.5, calculating the membership degree of any updated boundary pixel from its corresponding up-sampled context image blocks by distance weighting, according to the following formula:
v = Σ_i v_i × exp(-d_i^2) / Σ_i exp(-d_i^2),
where v_i is the membership degree of the i-th context image block of the boundary pixel, d_i is the distance between the central pixel of the i-th context image block and the boundary pixel, and v is the weighted membership degree of the boundary pixel;
S4.6, judging whether the weighted membership of any updated boundary pixel is greater than the preset membership threshold: if so, marking the boundary pixel as the built-up area class; otherwise, marking it as the non-built-up area class;
S4.7, merging the built-up and non-built-up pixels obtained in step S4.3 with those obtained in step S4.6 to obtain the final pixel-level high-quality built-up area detection result of the study-area image.
Further, in the above high-quality built-up area detection method suitable for large-area high-resolution satellite images, the preset membership threshold is 0.5.
Compared with the prior art, the invention has the following beneficial effects. The invention designs a lightweight convolutional neural network fusing multi-level features, which overcomes the long training time and large sample requirements caused by the excessive parameters of general convolutional neural networks, while the fusion of multi-level features gives the network higher detection accuracy. The invention further provides an image partitioning and ensemble classification framework based on multiple grid modes, which can exploit multi-scale and multi-directional context information for each pixel and achieve pixel-level classification by integrating the class labels of the multi-mode blocks, so that the built-up area detection result is finer and more complete.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
FIG. 2 is a block diagram of a lightweight convolutional neural network incorporating multi-level features.
Detailed Description
For a better understanding of the technical content of the present invention, specific examples are set forth below, along with the accompanying drawings.
Aspects of the invention are described herein with reference to the drawings, in which many illustrative embodiments are shown. The embodiments of the present invention are not limited to those shown in the drawings. It is to be understood that the invention can be carried out through any of the various concepts and embodiments described above and detailed below, since the disclosed concepts and embodiments are not limited to any particular implementation. In addition, some aspects of the disclosure may be used alone or in any suitable combination with other aspects of the disclosure.
As shown in the flow chart of fig. 1, the high-quality built-up area detection method suitable for large-area high-resolution satellite images comprises the following steps S1 to S5:
s1, determining the size of a sample image according to the relation between the size of a building to be detected and the resolution of an image, and manufacturing a high-resolution built-in area and a non-built-in area sample set which cover a diversified scene at the block level and corresponding labels.
Let the preset sample image size be s × s, the typical ground length of a building be L, and the image resolution be R; then the formula
s = L/R
determines the size of the sample image.
For example, if the average length of a building in a city is 80-120 m and the image resolution is 2 m, then s preferably lies in the range of 40-60 pixels.
The block-level sample set of high-resolution built-up and non-built-up areas covering diversified scenes is expressed as:
{(X_i, Y_i) | Y_i = 0, 1; i = 1, 2, …, N},
where X_i is a sample image block and Y_i its corresponding label, the value 1 corresponding to the built-up area class and the value 0 to the non-built-up area class, and N is the number of samples.
The built-up area class of the sample set covers various types of artificial geographic elements such as residential, commercial and industrial areas, and the non-built-up area class covers various types of natural surface elements such as water bodies, vegetation, farmland and bare land. Operations such as multi-angle rotation, multi-scale resampling, local replacement and recombination are applied to the sample images to further increase sample diversity, as sketched below.
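As an illustration of these augmentation operations, a minimal Python sketch follows. It is an assumption-laden example rather than the patent's prescribed implementation: the sample block is taken to be a square H × W × C numpy array, the helper name augment_block is hypothetical, and the local replacement and recombination operations are omitted for brevity.

```python
# Hedged sketch: random rotation, mirroring and multi-scale resampling of one
# sample image block, per the augmentation operations named in the text.
import numpy as np
from scipy.ndimage import zoom

def augment_block(block: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly augmented copy of a square H x W x C sample block."""
    out = np.rot90(block, k=int(rng.integers(0, 4)))      # multi-angle rotation
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)                        # horizontal mirror
    scale = rng.uniform(0.8, 1.2)                         # multi-scale resampling
    out = zoom(out, (scale, scale, 1), order=1)
    s = block.shape[0]                                    # restore size s x s
    if out.shape[0] >= s:
        off = (out.shape[0] - s) // 2
        out = out[off:off + s, off:off + s, :]
    else:
        pad = s - out.shape[0]
        out = np.pad(out, ((0, pad), (0, pad), (0, 0)), mode="reflect")
    return out

# usage sketch:
# rng = np.random.default_rng(0)
# augmented = [augment_block(x, rng) for x in sample_blocks]
```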
S2, constructing a lightweight convolutional neural network fusing multi-level features, taking the image blocks of the two classes of samples obtained in step S1 as input and their corresponding labels as output, and training the network to obtain an image block membership representation model.
the lightweight convolutional neural network structure integrating the multi-level features is shown in fig. 2:
the lightweight convolutional neural network model fusing the multi-level features comprises a 3X 3 convolutional layer, a 2X 2 max pooling layer, three feature extraction modules, two feature fusion modules, a global average pooling layer and a full connection layer for generating classification results.
Each of the three feature extraction modules comprises two paths: the first path is a 1×1 convolutional layer that receives the low-level features; the second path comprises three sequentially connected depthwise separable convolutions. After the feature maps output by the two paths are concatenated along the channel dimension, they pass through a 2×2 max pooling layer to produce the final features.
The feature fusion module passes the input low-level features sequentially through a 1×1 convolution layer, a global max pooling layer, a fully connected layer and a sigmoid function, and then obtains the fused features by element-wise multiplication with the input high-level features.
The three sequentially connected depthwise separable convolutions comprise one 1×1 convolution and two 3×3 convolutions for extracting features of different levels. A code sketch of the whole network is given below.
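For concreteness, a minimal PyTorch sketch of this architecture follows. It is a non-authoritative illustration: the channel widths (32/64/128/256) and the activation choices are assumptions of ours, since the patent does not specify layer sizes.

```python
# Hedged sketch of the lightweight multi-level-feature-fusion network of fig. 2.
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: depthwise k x k then pointwise 1 x 1."""
    def __init__(self, cin, cout, k):
        super().__init__()
        self.dw = nn.Conv2d(cin, cin, k, padding=k // 2, groups=cin)
        self.pw = nn.Conv2d(cin, cout, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pw(self.dw(x)))

class FeatureExtraction(nn.Module):
    """Path 1: a 1x1 conv on the incoming features; path 2: three sequential
    depthwise separable convs (one 1x1, two 3x3); the two outputs are
    concatenated on channels and then 2x2 max-pooled."""
    def __init__(self, cin, cout):
        super().__init__()
        half = cout // 2
        self.path1 = nn.Conv2d(cin, half, 1)
        self.path2 = nn.Sequential(DSConv(cin, half, 1),
                                   DSConv(half, half, 3),
                                   DSConv(half, half, 3))
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        return self.pool(torch.cat([self.path1(x), self.path2(x)], dim=1))

class FeatureFusion(nn.Module):
    """Low-level features -> 1x1 conv -> global max pool -> FC -> sigmoid,
    then element-wise multiplication with the high-level features."""
    def __init__(self, c_low, c_high):
        super().__init__()
        self.conv = nn.Conv2d(c_low, c_high, 1)
        self.fc = nn.Linear(c_high, c_high)

    def forward(self, low, high):
        w = torch.amax(self.conv(low), dim=(2, 3))        # global max pooling
        w = torch.sigmoid(self.fc(w))[:, :, None, None]   # channel weights
        return high * w

class LightweightBuiltUpNet(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                  nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.fe1 = FeatureExtraction(32, 64)
        self.fe2 = FeatureExtraction(64, 128)
        self.fe3 = FeatureExtraction(128, 256)
        self.fuse1 = FeatureFusion(64, 128)
        self.fuse2 = FeatureFusion(128, 256)
        self.head = nn.Linear(256, 1)   # scalar membership of the built-up class

    def forward(self, x):
        f1 = self.fe1(self.stem(x))
        f2 = self.fuse1(f1, self.fe2(f1))
        f3 = self.fuse2(f2, self.fe3(f2))
        g = f3.mean(dim=(2, 3))         # global average pooling
        return torch.sigmoid(self.head(g))
```

Trained with binary cross-entropy on the block-level sample set of step S1, the scalar sigmoid output would then serve directly as a block's membership degree in step S4.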
S3, partitioning the image to be detected under multiple grid modes to obtain the corresponding image block data set for each mode. The multi-mode partitioning of the region to be detected includes:
defining an original grid, that is, a regular square grid covering the image of the region to be detected, with a preset square cell size;
rotating the original grid 45 degrees clockwise or anticlockwise around its center to obtain a rotated grid;
cutting the image of the region to be detected with the original grid and the rotated grid respectively, obtaining original-grid image blocks and rotated-grid image blocks;
translating the original grid and the rotated grid t times at equal intervals along the horizontal direction of the image, with translation interval s/t; after each translation, performing one regular grid partitioning of the covered image, the resulting image blocks serving as the original-grid horizontally translated image blocks and the rotated-grid horizontally translated image blocks respectively;
likewise translating the original grid and the rotated grid downward along the vertical direction of the image and partitioning it into blocks, obtaining the original-grid vertically translated image blocks and the rotated-grid vertically translated image blocks. A sketch of this partitioning follows.
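A minimal sketch of this multi-mode partitioning might read as follows. It is an illustration under stated assumptions: the rotated grid is realised here by rotating the image itself by 45 degrees and applying a regular grid, which is geometrically equivalent but not the patent's literal wording, and block positions would still have to be mapped back to the original frame for the per-pixel integration of step S4.

```python
# Hedged sketch: original grid, 45-degree rotated grid, and their equidistant
# horizontal/vertical translations with interval s/t.
import numpy as np
from scipy.ndimage import rotate

def grid_blocks(img, s, dx=0, dy=0):
    """Cut an H x W x C image into s x s blocks on a grid shifted by (dx, dy)."""
    H, W = img.shape[:2]
    return [(y, x, img[y:y + s, x:x + s])
            for y in range(dy, H - s + 1, s)
            for x in range(dx, W - s + 1, s)]

def multi_mode_blocks(img, s, t):
    """Collect the image blocks of all grid modes, keyed by (mode, dx, dy)."""
    modes = {}
    rot = rotate(img, 45, reshape=True, order=1)  # counterpart of the rotated grid
    step = s // t                                 # translation interval s/t
    for name, im in (("original", img), ("rotated", rot)):
        modes[(name, 0, 0)] = grid_blocks(im, s)
        for k in range(1, t):  # the shift k = t would coincide with the base grid
            modes[(name, k * step, 0)] = grid_blocks(im, s, dx=k * step)
            modes[(name, 0, k * step)] = grid_blocks(im, s, dy=k * step)
    return modes
```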
S4, performing feature calculation on the image blocks of the multiple grid modes of the region to be detected with the image block membership representation model obtained in step S2, to obtain the membership degree of each image block to the built-up area. This specifically comprises the following sub-steps:
S4.1, for each pixel of the image of the region to be detected, traversing the image blocks of the various grid modes; if an image block contains the pixel, the image block is said to cover the pixel, and the set of all image blocks covering the pixel is defined as the context image blocks of that pixel;
S4.2, sorting all membership degrees of the pixel's context image blocks and determining their maximum and minimum values;
S4.3, with the preset membership threshold set to 0.5, judging whether the minimum membership obtained in step S4.2 is greater than 0.5: if so, marking the pixel as the built-up area class; otherwise, judging whether the maximum membership obtained in step S4.2 is smaller than 0.5: if so, marking the pixel as the non-built-up area class; otherwise, marking the pixel as a boundary pixel;
S4.4, for the boundary pixels, performing 2×2 up-sampling on each of the context image blocks to obtain four image blocks of size s/2 × s/2, then obtaining the membership degree of each up-sampled context image block as in step S3, and returning to steps S4.1 to S4.3 to re-label the up-sampled image blocks and update the boundary pixels;
S4.5, calculating the membership degree of any updated boundary pixel from its corresponding up-sampled context image blocks by distance weighting, according to the following formula:
v = Σ_i v_i × exp(-d_i^2) / Σ_i exp(-d_i^2),
where v_i is the membership degree of the i-th context image block of the boundary pixel, d_i is the distance between the central pixel of the i-th context image block and the boundary pixel, and v is the weighted membership degree of the boundary pixel;
S4.6, judging whether the weighted membership of any updated boundary pixel is greater than the preset membership threshold: if so, marking the boundary pixel as the built-up area class; otherwise, marking it as the non-built-up area class;
S4.7, merging the built-up and non-built-up pixels obtained in step S4.3 with those obtained in step S4.6 to obtain the final pixel-level high-quality built-up area detection result of the study-area image. A sketch of this integration follows.
S5, for each pixel of the image of the region to be detected, traversing the image blocks of the multiple grid modes and determining the pixel's class label by combining the membership degrees of the image blocks to the built-up area, to obtain the final pixel-level high-quality built-up area detection result.
While the invention has been described in terms of preferred embodiments, it is not intended to be limiting. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the present invention. Accordingly, the scope of the invention is defined by the appended claims.

Claims (7)

1. A method for high quality detection of a built-up area suitable for large area high resolution satellite images, comprising the steps of:
S1, determining the size of the sample image according to the relationship between the size of the buildings to be detected and the image resolution, and producing a block-level sample set of high-resolution built-up and non-built-up areas covering diversified scenes;
S2, constructing a lightweight convolutional neural network fusing multi-level features, taking the image blocks of the two classes of samples obtained in step S1 as input and their corresponding labels as output, and training the network to obtain an image block membership representation model;
S3, partitioning the image to be detected under multiple grid modes to obtain the corresponding image block data set for each mode, specifically comprising the following sub-steps:
S3.1, defining an original grid, that is, a regular square grid covering the image of the region to be detected, with a preset square cell size;
S3.2, rotating the original grid 45 degrees clockwise or anticlockwise around its center to obtain a rotated grid;
S3.3, cutting the image of the region to be detected with the original grid and the rotated grid respectively, obtaining original-grid image blocks and rotated-grid image blocks;
S3.4, translating the original grid and the rotated grid t times at equal intervals along the horizontal direction of the image, with translation interval s/t; after each translation, performing one regular grid partitioning of the covered image, the resulting image blocks serving as the original-grid horizontally translated image blocks and the rotated-grid horizontally translated image blocks respectively;
S3.5, likewise translating the original grid and the rotated grid downward along the vertical direction of the image and partitioning it into blocks, obtaining the original-grid vertically translated image blocks and the rotated-grid vertically translated image blocks;
S4, performing feature calculation on the image blocks of the multiple grid modes of the region to be detected with the image block membership representation model obtained in step S2, to obtain the membership degree of each image block to the built-up area; traversing, for each pixel of the image of the region to be detected, the image blocks of the multiple grid modes, and determining the pixel's class label by combining the membership degrees of the image blocks to the built-up area, to obtain the final pixel-level high-quality built-up area detection result; specifically comprising the following sub-steps:
S4.1, for each pixel of the image of the region to be detected, traversing the image blocks of the various grid modes; if an image block contains the pixel, the image block is said to cover the pixel, and the set of all image blocks covering the pixel is defined as the context image blocks of that pixel;
S4.2, sorting all membership degrees of the pixel's context image blocks and determining their maximum and minimum values;
S4.3, based on a preset membership threshold, judging whether the minimum membership obtained in step S4.2 is greater than the threshold: if so, marking the pixel as the built-up area class; otherwise, judging whether the maximum membership obtained in step S4.2 is smaller than the threshold: if so, marking the pixel as the non-built-up area class; otherwise, marking the pixel as a boundary pixel;
S4.4, for the boundary pixels, performing 2×2 up-sampling on each of the context image blocks to obtain four image blocks of size s/2 × s/2, then obtaining the membership degree of each up-sampled context image block as in step S3, and returning to steps S4.1 to S4.3 to re-label the up-sampled image blocks and update the boundary pixels;
S4.5, calculating the membership degree of any updated boundary pixel from its corresponding up-sampled context image blocks by distance weighting, according to the following formula:
v = Σ_i v_i × exp(-d_i^2) / Σ_i exp(-d_i^2),
where v_i is the membership degree of the i-th context image block of the boundary pixel, d_i is the distance between the central pixel of the i-th context image block and the boundary pixel, and v is the weighted membership degree of the boundary pixel;
S4.6, judging whether the weighted membership of any updated boundary pixel is greater than the preset membership threshold: if so, marking the boundary pixel as the built-up area class; otherwise, marking it as the non-built-up area class;
S4.7, merging the built-up and non-built-up pixels obtained in step S4.3 with those obtained in step S4.6 to obtain the final pixel-level high-quality built-up area detection result of the study-area image.
2. The high-quality built-up area detection method suitable for large-area high-resolution satellite images according to claim 1, wherein in step S1 the size of the sample image is determined according to the relationship between the detectable building size and the image resolution, as follows: let the preset sample image size be s × s, the typical ground length of a building be L, and the image resolution be R; then the formula
s = L/R
determines the size of the sample image;
the block-level sample set of high-resolution built-up and non-built-up areas covering diversified scenes is expressed as:
{(X_i, Y_i) | Y_i = 0, 1; i = 1, 2, …, N},
where X_i is a sample image block and Y_i its corresponding label, the value 1 corresponding to the built-up area class and the value 0 to the non-built-up area class, and N is the number of samples.
3. The method according to claim 2, wherein in step S2 the lightweight convolutional neural network model fusing multi-level features comprises a 3×3 convolutional layer, a 2×2 max pooling layer, three feature extraction modules, two feature fusion modules, a global average pooling layer, and a fully connected layer for generating the classification result.
4. The high-quality built-up area detection method suitable for large-area high-resolution satellite images according to claim 3, wherein each of the three feature extraction modules comprises two paths: the first path is a 1×1 convolutional layer that receives the low-level features; the second path comprises three sequentially connected depthwise separable convolutions; after the feature maps output by the two paths are concatenated along the channel dimension, they pass through a 2×2 max pooling layer to produce the final features.
5. The high-quality built-up area detection method suitable for large-area high-resolution satellite images according to claim 3, wherein the feature fusion module passes the input low-level features sequentially through a 1×1 convolution layer, a global max pooling layer, a fully connected layer and a sigmoid function, and then obtains the fused features by element-wise multiplication with the input high-level features.
6. The high-quality built-up area detection method suitable for large-area high-resolution satellite images according to claim 4, wherein the three sequentially connected depthwise separable convolutions comprise one 1×1 convolution and two 3×3 convolutions for extracting features of different levels.
7. The high-quality built-up area detection method suitable for large-area high-resolution satellite images according to claim 1, wherein the preset membership threshold is 0.5.
CN202310331820.8A 2023-03-31 2023-03-31 High-quality detection method suitable for built-up area of large-area high-resolution satellite image Active CN116052019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310331820.8A CN116052019B (en) 2023-03-31 2023-03-31 High-quality detection method suitable for built-up area of large-area high-resolution satellite image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310331820.8A CN116052019B (en) 2023-03-31 2023-03-31 High-quality detection method suitable for built-up area of large-area high-resolution satellite image

Publications (2)

Publication Number Publication Date
CN116052019A (en) 2023-05-02
CN116052019B (en) 2023-07-25

Family

ID=86131610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310331820.8A Active CN116052019B (en) 2023-03-31 2023-03-31 High-quality detection method suitable for built-up area of large-area high-resolution satellite image

Country Status (1)

Country Link
CN (1) CN116052019B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389051A (en) * 2018-09-20 2019-02-26 华南农业大学 A kind of building remote sensing images recognition methods based on convolutional neural networks
CN109446992A (en) * 2018-10-30 2019-03-08 苏州中科天启遥感科技有限公司 Remote sensing image building extracting method and system, storage medium, electronic equipment based on deep learning
CN111968088A (en) * 2020-08-14 2020-11-20 西安电子科技大学 Building detection method based on pixel and region segmentation decision fusion
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824309B (en) * 2014-03-12 2017-01-18 武汉大学 Automatic extracting method of urban built-up area border
CN104217440B (en) * 2014-09-28 2017-03-01 民政部国家减灾中心 A kind of method extracting built-up areas from remote sensing images
CN108564588B (en) * 2018-03-21 2020-07-10 华中科技大学 Built-up area automatic extraction method based on depth features and graph segmentation method
CN108764039B (en) * 2018-04-24 2020-12-01 中国科学院遥感与数字地球研究所 Neural network, building extraction method of remote sensing image, medium and computing equipment
CN110276270B (en) * 2019-05-30 2022-09-16 南京邮电大学 High-resolution remote sensing image building area extraction method
EP3770799A1 (en) * 2019-07-24 2021-01-27 Ordnance Survey Limited A method of identifying topographic features
CN111325165B (en) * 2020-02-26 2023-05-05 中南大学 Urban remote sensing image scene classification method considering spatial relationship information
CN112232328A (en) * 2020-12-16 2021-01-15 南京邮电大学 Remote sensing image building area extraction method and device based on convolutional neural network
CN113505670B (en) * 2021-06-29 2023-06-23 西南交通大学 Remote sensing image weak supervision building extraction method based on multi-scale CAM and super-pixels
CN113361496B (en) * 2021-08-09 2021-12-17 深圳市勘察研究院有限公司 City built-up area statistical method based on U-Net
CN113963270A (en) * 2021-10-20 2022-01-21 南宁桂电电子科技研究院有限公司 High resolution remote sensing image building detection method
CN115205704A (en) * 2022-07-22 2022-10-18 中国地质大学(武汉) High-resolution remote sensing image small sample high-precision building segmentation and extraction method and device
CN115577294B (en) * 2022-11-22 2023-03-24 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) Urban area classification method based on interest point spatial distribution and semantic information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389051A (en) * 2018-09-20 2019-02-26 华南农业大学 A kind of building remote sensing images recognition methods based on convolutional neural networks
CN109446992A (en) * 2018-10-30 2019-03-08 苏州中科天启遥感科技有限公司 Remote sensing image building extracting method and system, storage medium, electronic equipment based on deep learning
CN111968088A (en) * 2020-08-14 2020-11-20 西安电子科技大学 Building detection method based on pixel and region segmentation decision fusion
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Improved Swin Transformer-Based Semantic Segmentation of Postearthquake Dense Buildings in Urban Areas Using Remote Sensing Images; L. Cui et al.; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; vol. 16; 369-385 *
Research on building segmentation in remote sensing images based on the MRF model; 张彦; Microcomputer & Its Applications (2013, No. 2); 44-47 *
Target detection and classification recognition method based on optical remote sensing images; 姬晓飞, 秦宁丽; Journal of Shenyang Aerospace University (No. 1); 23-31 *

Also Published As

Publication number Publication date
CN116052019A (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN111986099B (en) Tillage monitoring method and system based on convolutional neural network with residual error correction fused
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
Zhang et al. Scale Sequence Joint Deep Learning (SS-JDL) for land use and land cover classification
Chen et al. Adaptive effective receptive field convolution for semantic segmentation of VHR remote sensing images
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
US20200065968A1 (en) Joint Deep Learning for Land Cover and Land Use Classification
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN113449594B (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN111783523B (en) Remote sensing image rotating target detection method
CN109858450B (en) Ten-meter-level spatial resolution remote sensing image town extraction method and system
CN110414509B (en) Port docking ship detection method based on sea-land segmentation and characteristic pyramid network
CN114694038A (en) High-resolution remote sensing image classification method and system based on deep learning
CN112766184A (en) Remote sensing target detection method based on multi-level feature selection convolutional neural network
Ling et al. Learning-based superresolution land cover mapping
CN114596500A (en) Remote sensing image semantic segmentation method based on channel-space attention and DeeplabV3plus
CN113223042A (en) Intelligent acquisition method and equipment for remote sensing image deep learning sample
CN113628180B (en) Remote sensing building detection method and system based on semantic segmentation network
CN116452850A (en) Road ponding area identification method based on data mining and deep learning
CN115810149A (en) High-resolution remote sensing image building extraction method based on superpixel and image convolution
CN114266947A (en) Classification method and device based on fusion of laser point cloud and visible light image
CN113378642A (en) Method for detecting illegal occupation buildings in rural areas
CN116052019B (en) High-quality detection method suitable for built-up area of large-area high-resolution satellite image
CN114445726B (en) Sample library establishing method and device based on deep learning
Ruiz-Lendínez et al. Deep learning methods applied to digital elevation models: state of the art

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant