CN112990041B - Remote sensing image building extraction method based on improved U-net - Google Patents


Info

Publication number
CN112990041B
CN112990041B (application CN202110319351.9A)
Authority
CN
China
Prior art keywords: improved, net, building extraction, remote sensing, sensing image
Prior art date
Legal status
Active
Application number
CN202110319351.9A
Other languages: Chinese (zh)
Other versions: CN112990041A (en)
Inventor
李林宜
姚远
孟令奎
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Application filed by Wuhan University (WHU)
Priority: CN202110319351.9A
Publication of CN112990041A
Application granted
Publication of CN112990041B
Legal status: Active
Anticipated expiration


Classifications

    • G06V20/176 — Terrestrial scenes: urban or other man-made structures
    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/04 — Neural networks: architecture, e.g. interconnection topology
    • G06N3/08 — Neural networks: learning methods
    • G06T3/4007 — Image scaling based on interpolation, e.g. bilinear interpolation
    • G06T3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T3/4046 — Image scaling using neural networks
    • G06V10/40 — Extraction of image or video features
    • G06T2200/32 — Indexing scheme involving image mosaicing


Abstract

The invention provides a remote sensing image building extraction method based on an improved U-net. A training sample image set and label set are used to train an improved U-net network, which introduces a residual module, an intermediate transition bridge module and an Inception-type max-pooling module on the basis of the U-net deep neural network. The trained improved U-net network performs building extraction on the preprocessed multispectral remote sensing image to obtain a building extraction probability map; a gray-level threshold T is then set according to a preset value and the probability map is binarized to obtain the building extraction result. By exploiting the spatial and spectral characteristics of buildings in remote sensing images, the improved U-net network structure has better building extraction capability, and the accuracy and effect of building extraction from remote sensing images are high.

Description

Remote sensing image building extraction method based on improved U-net
Technical Field
The invention lies at the intersection of remote sensing technology and computer vision, and relates to a remote sensing image building extraction method based on an improved U-net.
Background
Buildings are the main environment in which people work and live, and they occupy a large share of urban land, so building monitoring is important for urban development and daily life. Building extraction is a key technology for building monitoring. With the continuous development of remote sensing, extracting buildings from remote sensing images has become an important and efficient technical means; however, the uncertainty of remote sensing imagery and the complexity of buildings make extraction difficult and limit its accuracy. Remote sensing building extraction is of great significance for urban planning, ecological environment protection, resource development and related fields.
Disclosure of Invention
The invention provides a remote sensing image building extraction method based on an improved U-net, aiming to solve the problems of high difficulty and low accuracy of building extraction in the remote sensing field.
The technical scheme of the invention is a remote sensing image building extraction method based on improved U-net, which comprises the following steps:
step 1, performing improved U-net network training by using a training sample image set and a label set to obtain a trained improved U-net network;
the improved U-net network introduces a residual module, an intermediate transition bridge module and an Inception-type max-pooling module on the basis of the U-net deep neural network;
step 2, building extraction is carried out on the processed multispectral remote sensing image by utilizing the improved U-net network trained in the step 1, and a building extraction probability result graph is obtained;
and step 3, binarizing the building extraction result according to a preset threshold T to obtain the building extraction result, implemented as follows:
given the building extraction result probability map obtained after the processing in step 2, pixels with gray level greater than T are marked 1 (building) and pixels with gray level less than T are marked 0 (non-building); binarizing the result yields the building extraction result.
In step 1, the training images and labels are preprocessed and uniformly scaled to M × M pixels, giving training samples adapted to the network architecture;
the training samples are then input for improved U-net network training to obtain the trained improved U-net network.
Moreover, the image scaling in the preprocessing of step 1 uses bilinear interpolation, and the value of M is 128.
Moreover, the input characteristic dimension of the improved U-net network is 128 × 128 × 3, which corresponds to the RGB three bands of the training image, and the output characteristic dimension is 128 × 128 × 1.
Moreover, the improved U-net network adopts a three-layer network architecture, and comprises 3 down-sampling processes and 3 up-sampling processes.
Moreover, the improved U-net network uses an Inception-type max-pooling module in downsampling: max-pooling operations with pool size = 2, 3, 5 and 7 are performed respectively, the resulting 4 feature maps are concatenated (Concat), and the merged feature map is restored to the original size by a 1 × 1 convolution with a ReLU activation function.
Moreover, the improved U-net network applies a 3-convolution residual module in the encoding process: with input features denoted input, two rounds of 3 × 3 convolution, batch normalization and ReLU activation are first applied to obtain features x; input and x are then added; finally one more 3 × 3 convolution, batch normalization and ReLU activation are applied to the sum.
Moreover, in the intermediate transition bridge the improved U-net network obtains its output through 4 dilated convolutions combined with dense connections: a dilated convolution with dilation rate = 1 is first applied to the input features, followed by batch normalization and ReLU activation; the result is added to the input features and a dilated convolution with dilation rate = 2 is applied; dilated convolutions with dilation rates 4 and 8, each with batch normalization and ReLU activation, follow the same rule; finally the 4 resulting features are added to the input features to obtain the final output features.
Furthermore, step 2 is implemented as follows:
the remote sensing image is first preprocessed (the preprocessing includes an image normalization operation), the processed image is input into the trained network block by block, and the per-block outputs are stitched together to obtain the building extraction probability map.
In step 3, the threshold T used for building extraction binarization is preset to be 0.5.
Exploiting the spatial form and spectral characteristics of buildings, the invention uses the improved U-net deep learning network to extract buildings effectively through several steps and obtains an ideal extraction result: building extraction from remote sensing images is accurate and effective. The method fully considers the various problems arising in the building extraction process and addresses them through the improved U-net deep neural network and related image processing techniques, yielding more accurate and complete building extraction results.
Drawings
Fig. 1 is a flow chart of a remote sensing image building extraction method based on improved U-net according to an embodiment of the present invention.
Fig. 2 is a flow chart of the steps of the improved U-net network training of the embodiment of the present invention.
Fig. 3 is a flow chart of the improved U-net building extraction steps of an embodiment of the present invention.
Fig. 4 is a flowchart of the building identification result binarization step according to the embodiment of the invention.
Fig. 5 is an overall architecture diagram of an improved U-net network according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of an inclusion-type max-pooling module in the improved U-net according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of residual modules in the improved U-net according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of an intermediate transition bridge module in the improved U-net according to the embodiment of the present invention.
Detailed Description
The invention provides a remote sensing image building extraction method and system based on an improved U-net, which take the morphological and spectral characteristics of buildings in remote sensing images into account. U-Net is one of the earlier algorithms to use a fully convolutional network for semantic segmentation. Its symmetric U-shaped structure of a contracting path and an expanding path was innovative at the time and influenced the design of many later segmentation networks; the network takes its name from this U shape. However, applying U-net directly to the technical problem addressed by the present invention is not effective, and further research is required.
To solve the problems of high difficulty and low accuracy of building extraction in the remote sensing field, the invention provides a remote sensing image building extraction method based on an improved U-net. Using the spatial and spectral characteristics of buildings in remote sensing images together with the strong learning and generalization capability of deep neural networks, an improved U-net network structure suited to building extraction is designed; the network has better building extraction capability, and its extraction accuracy and effect on remote sensing images are high.
Referring to fig. 1, the embodiment of the remote sensing image building extraction method based on the improved U-net takes buildings in the Pearl River basin as an example; the process of the invention is described in detail as follows:
step 1, performing improved U-net network training by using a training sample image set and a label set to obtain a trained improved U-net network;
after the training sample image set and label set are selected, U-net is taken as the basic network architecture and optimized and improved to obtain the improved U-net network;
the training images and labels are preprocessed and uniformly scaled to a preset pixel size (denoted M × M pixels), giving training samples adapted to the network architecture;
finally, the training samples are input for improved U-net deep neural network training, yielding the trained improved U-net deep neural network.
Preferably, the image scaling interpolation method in the training image and label preprocessing is bilinear interpolation, and the value of M is 128.
The specific procedures of the examples are illustrated below:
First, the selected training sample images and labels are scaled with bilinear interpolation to a uniform size of 128 × 128, giving training images and labels adapted to the network. The training samples are then fed into the improved U-net deep neural network, whose architecture and training parameters have been set, to obtain the trained network; the overall architecture of the improved U-net deep neural network is shown in fig. 5. The invention introduces a residual module, an intermediate transition bridge module and an Inception-type max-pooling module on the basis of the existing U-net deep neural network. Preferably, the improved U-net adopts a three-layer network architecture, i.e., 3 downsampling and 3 upsampling processes are performed.
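As a concrete illustration of the preprocessing above, the bilinear scaling to 128 × 128 can be sketched in NumPy. This is a minimal sketch: the align-corners sampling convention and the function name are assumptions, and a real pipeline would typically use an image library such as Pillow or OpenCV.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear resize of an (H, W, C) image using the align-corners
    mapping (one common convention; libraries differ on this detail)."""
    H, W, C = img.shape
    ys = np.linspace(0, H - 1, out_h)        # source row coordinates
    xs = np.linspace(0, W - 1, out_w)        # source column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None, None]            # fractional row weights
    wx = (xs - x0)[None, :, None]            # fractional column weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

In the patent's setting each training image and label would be passed through `bilinear_resize(img, 128, 128)` before training.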
In terms of network architecture, the input feature dimension of the improved U-net deep neural network is 128 × 128 × 3, corresponding to the three RGB bands of the training image, and the output feature dimension is 128 × 128 × 1; the three-layer design follows the label image size. Based on the idea of the Inception network design, the input image is subjected to multi-scale max-pooling operations with pool size = 2, 3, 5 and 7; the resulting 4 feature maps are concatenated (Concat), and the merged feature map is restored to the original size by a 1 × 1 convolution with a ReLU activation function, completing the Inception-based downsampling operation. This downsampling process integrates multi-scale features and helps extract more comprehensive image features; the specific architecture of the Inception-type max-pooling module is shown in fig. 6. Regarding the deep learning terms involved, those skilled in the art will understand that pool size refers to the pooling window size and that Inception is a network model.
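A minimal NumPy sketch of the Inception-type max-pooling module just described. Function names and the stride-2 "same"-padding convention are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def max_pool_same(x, pool, stride=2):
    """Max-pool an (H, W, C) feature map with 'same' padding so every
    pool size yields the same ceil(H/stride) output grid."""
    H, W, C = x.shape
    out_h, out_w = -(-H // stride), -(-W // stride)
    pad_h = max((out_h - 1) * stride + pool - H, 0)
    pad_w = max((out_w - 1) * stride + pool - W, 0)
    xp = np.pad(x, ((pad_h // 2, pad_h - pad_h // 2),
                    (pad_w // 2, pad_w - pad_w // 2), (0, 0)),
                constant_values=-np.inf)
    out = np.empty((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = xp[i*stride:i*stride+pool,
                           j*stride:j*stride+pool].max(axis=(0, 1))
    return out

def inception_max_pool(x, w_1x1):
    """Multi-scale max pooling (pool sizes 2, 3, 5, 7), channel Concat,
    then a 1x1 convolution + ReLU to restore the channel count."""
    branches = [max_pool_same(x, k) for k in (2, 3, 5, 7)]
    merged = np.concatenate(branches, axis=-1)             # (H/2, W/2, 4*C)
    fused = np.tensordot(merged, w_1x1, axes=([-1], [0]))  # 1x1 convolution
    return np.maximum(fused, 0.0)                          # ReLU
```

For an input with C channels, `w_1x1` has shape (4·C, C) so the fused map returns to C channels at half the spatial resolution.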
Residual module processing is then carried out. The residual module in the improved U-net comprises 3 convolution operations: with input features denoted input, two rounds of 3 × 3 convolution, batch normalization (BN) and ReLU activation are first applied to obtain features x; input and x are then added (ADD); finally one more 3 × 3 convolution, batch normalization and ReLU activation are applied to the sum, completing the residual module. The residual module effectively alleviates gradient vanishing and network degradation; its specific architecture is shown in fig. 7.
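The residual module pattern can likewise be sketched in NumPy. The naive convolution is for clarity only, and BN here is inference-style per-channel normalization without learned scale/shift, an assumption made for the sketch.

```python
import numpy as np

def conv3x3(x, w):
    """'Same'-padded 3x3 convolution: x is (H, W, Cin), w is (3, 3, Cin, Cout)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += np.tensordot(xp[i:i+H, j:j+W], w[i, j], axes=([-1], [0]))
    return out

def bn(x, eps=1e-5):
    """Inference-style batch normalization per channel (no learned params)."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=(0, 1), keepdims=True) + eps)

def residual_block(x, w1, w2, w3):
    """Residual module: two (conv3x3 + BN + ReLU) stages, a skip addition
    with the input, then one more conv3x3 + BN + ReLU on the sum."""
    h = np.maximum(bn(conv3x3(x, w1)), 0.0)
    h = np.maximum(bn(conv3x3(h, w2)), 0.0)
    s = x + h                          # skip connection (channel counts must match)
    return np.maximum(bn(conv3x3(s, w3)), 0.0)
```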
After the Inception-based downsampling and residual module processing are repeated 3 times on the input image, output features of sizes 64 × 64 × 64, 32 × 32 × 128 and 16 × 16 × 256 are obtained, and the features then enter the intermediate transition bridge module. In this bridge the improved U-net obtains its output through 4 dilated convolutions combined with dense connections: a dilated convolution with dilation rate = 1 is first applied to the input features, followed by batch normalization and ReLU activation; the result is added to the input features and a dilated convolution with dilation rate = 2 is applied; dilated convolutions with dilation rates 4 and 8, each with batch normalization and ReLU activation, follow the same rule; finally the 4 resulting features are added to the input features to give output features of 16 × 16 × 256. Combining dilated convolution with dense connections enlarges the receptive field, improving object recognition and segmentation, and makes the propagation of features and gradients more effective so that the network is easier to train. The specific architecture of the intermediate transition bridge module based on dilated convolution and dense connections is shown in fig. 8.
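A NumPy sketch of the intermediate transition bridge. The exact wiring of the dense connection between stages is an interpretation of the text above: each stage consumes the running sum of the input and all previous stage outputs.

```python
import numpy as np

def bn(x, eps=1e-5):
    """Inference-style per-channel normalization (sketch stand-in for BN)."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=(0, 1), keepdims=True) + eps)

def dilated_conv3x3(x, w, rate):
    """'Same'-padded 3x3 dilated convolution: x is (H, W, Cin),
    w is (3, 3, Cin, Cout); padding equals the dilation rate."""
    H, W, _ = x.shape
    xp = np.pad(x, ((rate, rate), (rate, rate), (0, 0)))
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += np.tensordot(xp[i*rate:i*rate+H, j*rate:j*rate+W],
                                w[i, j], axes=([-1], [0]))
    return out

def transition_bridge(x, weights):
    """Four dilated convolutions (rates 1, 2, 4, 8), each followed by
    BN + ReLU; the final output adds all four stage outputs to the input."""
    acc, feats = x, []
    for w, rate in zip(weights, (1, 2, 4, 8)):
        a = np.maximum(bn(dilated_conv3x3(acc, w, rate)), 0.0)
        feats.append(a)
        acc = acc + a              # dense connection feeding the next stage
    return x + sum(feats)
```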
The output features are concatenated (Concat) with the image features of the corresponding downsampling layer and upsampled to obtain 32 × 32 × 256 features; two rounds of 3 × 3 convolution with ReLU activation give 32 × 32 × 128 features. Repeating the splicing and upsampling yields 64 × 64 × 128 features, and two further 3 × 3 convolutions with ReLU activation give 64 × 64 × 64 features. A final round of splicing and upsampling yields 128 × 128 × 64 features; after two 3 × 3 convolutions with 32 and 2 filters and ReLU activation, a 1 × 1 convolution with a single filter and Sigmoid activation produces the 128 × 128 × 1 output features, i.e. the final probability output of the improved U-net.
Preferably, regarding the training parameters: since building and non-building targets in remote sensing images are imbalanced, and Dice loss helps address this class imbalance, the improved U-net uses Dice loss as the loss function during training; the Adam optimizer is adopted for its fast convergence; accuracy is used as the evaluation metric; batch size = 4 and epochs (number of iterations) = 16.
Dice loss is a loss function widely used in image segmentation to measure classification accuracy; its expression is:
Dice loss = 1 − (2 · Σ_{i=1}^{N} p_i g_i) / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} g_i)
where p_i is the predicted probability for the i-th pixel of the image, g_i is the ground-truth value of the i-th pixel, and N is the total number of pixels in the image.
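A direct NumPy implementation of the Dice loss above. The smoothing term is a common numerical-stability addition and an assumption here; the patent states only the basic Dice form.

```python
import numpy as np

def dice_loss(p, g, smooth=1.0):
    """Dice loss between a predicted probability map p and a binary
    ground-truth map g; 'smooth' avoids division by zero on empty masks."""
    p, g = np.ravel(p), np.ravel(g)
    inter = (p * g).sum()
    return 1.0 - (2.0 * inter + smooth) / (p.sum() + g.sum() + smooth)
```

A perfect binary prediction gives a loss of 0, and fully disjoint masks push the loss toward 1.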
As shown in fig. 2, the specific process is: design the improved U-net architecture and training parameters, preprocess the training images and labels to obtain training samples adapted to the network, and input the training samples for improved U-net training to obtain the trained improved U-net deep learning network.
Step 2, firstly, image preprocessing is carried out on the multispectral remote sensing image to be extracted, then the processed multispectral remote sensing image is input into the improved U-net network trained in the step 1 in a blocking mode to carry out building extraction, and then image splicing operation is carried out, so that a building extraction probability result graph is obtained;
the specific procedures of the examples are illustrated below:
Taking building extraction in the Pearl River basin as an example, the remote sensing image to be processed is first preprocessed, including an image normalization operation performed as X_nor = X / 255. The normalized image is then fed into the trained improved U-net network in 128 × 128 blocks, and all block outputs are stitched together to obtain the complete building extraction probability map.
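The normalization, 128 × 128 blocking and stitching just described can be sketched as follows. Here `model` stands for the trained network, and padding the image to a multiple of the tile size is left out for brevity (an assumption of this sketch).

```python
import numpy as np

def predict_by_tiles(image, model, tile=128):
    """Normalize an (H, W, 3) uint8 image as X_nor = X / 255, run the
    model tile by tile, and stitch the per-tile probability maps back
    together.  Assumes H and W are multiples of the tile size."""
    x = image.astype(np.float32) / 255.0
    H, W, _ = x.shape
    prob = np.zeros((H, W), dtype=np.float32)
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            prob[r:r+tile, c:c+tile] = model(x[r:r+tile, c:c+tile])
    return prob
```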
As shown in fig. 3, the specific process is: preprocess the multispectral remote sensing image of the Pearl River basin, perform improved U-net building recognition, and finally obtain the building recognition probability map.
Step 3, after the building extraction probability graph obtained in the step 2 is obtained, setting a threshold value, and carrying out building extraction result binarization to obtain a building extraction result;
the specific procedures of the examples are illustrated below:
Taking building extraction in the Pearl River basin as an example, given the building extraction result probability map obtained after the processing of step 2, a gray-level threshold T is set: pixels with gray level greater than T are marked 1 (building) and pixels with gray level less than T are marked 0 (non-building). Binarizing the result yields the building extraction result for the Pearl River basin; the threshold T is preferably 0.5.
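The thresholding step is a one-liner in NumPy, shown here with T = 0.5 as in the preferred embodiment.

```python
import numpy as np

def binarize(prob_map, T=0.5):
    """Threshold the probability map: pixels with probability greater
    than T become 1 (building), the rest 0."""
    return (prob_map > T).astype(np.uint8)
```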
As shown in fig. 4, the specific process is: set a threshold, binarize the building recognition probability map, and finally obtain the building extraction result for the Pearl River basin.
In specific implementation, a person skilled in the art can automate the above process with computer software. System devices implementing the method, such as a computer-readable storage medium storing a corresponding computer program according to the technical solution of the invention, and a computer device containing and running such a program, also fall within the scope of the present invention.
In some possible embodiments, the remote sensing image building extraction system based on the improved U-net is provided, and comprises the following modules,
the first module is used for carrying out improved U-net network training by utilizing the training sample image set and the label set to obtain a trained improved U-net network;
the improved U-net network introduces a residual module, an intermediate transition bridge module and an Inception-type max-pooling module on the basis of the U-net deep neural network;
the second module is used for extracting the buildings from the processed multispectral remote sensing images by utilizing the improved U-net network trained by the first module to obtain a building extraction probability result graph;
a third module for binarizing the building extraction result according to a preset threshold T to obtain the building extraction result, implemented as follows:
given the building extraction result probability map produced by the second module, pixels with gray level greater than T are marked 1 (building) and pixels with gray level less than T are marked 0 (non-building); binarizing the result yields the building extraction result.
In some possible embodiments, the remote sensing image building extraction system based on the improved U-net comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute the remote sensing image building extraction method based on the improved U-net.
In some possible embodiments, a remote sensing image building extraction system based on the improved U-net is provided, which includes a readable storage medium storing a computer program that, when executed, implements the remote sensing image building extraction method based on the improved U-net as described above.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (10)

1. A remote sensing image building extraction method based on improved U-net is characterized by comprising the following steps:
step 1, performing improved U-net network training by using a training sample image set and a label set to obtain a trained improved U-net network;
the improved U-net network introduces a residual module, an intermediate transition bridge module and an Inception-type max-pooling module on the basis of the U-net deep neural network;
the improved U-net network adopts the Inception-type max-pooling module in downsampling;
the improved U-net network adopts the residual module in the encoding process;
the improved U-net network adopts a three-layer network architecture: the input image is processed 3 times in succession by the Inception-type max-pooling module and the residual module, yielding corresponding output features, and then enters the intermediate transition bridge module; the intermediate transition bridge module obtains its output through 4 dilated convolutions combined with dense connections;
step 2, carrying out building extraction on the processed multispectral remote sensing image by using the improved U-net network trained in the step 1 to obtain a building extraction probability result graph;
and step 3, carrying out building extraction result binarization according to a preset threshold value T to obtain a building extraction result, wherein the realization method comprises the following steps,
and (3) given the building extraction result probability map obtained after the processing in step 2, pixels with gray level greater than T are marked 1 (building) and pixels with gray level less than T are marked 0 (non-building); binarizing the result yields the building extraction result.
2. The remote sensing image building extraction method based on the improved U-net according to claim 1, characterized in that: in the step 1, preprocessing a training image and a label, uniformly scaling the training image and the label to the size of M multiplied by M pixels to obtain a training sample adaptive to a network architecture;
and inputting a training sample to carry out improved U-net network training to obtain a trained improved U-net network.
3. The remote sensing image building extraction method based on the improved U-net as claimed in claim 2, wherein: the image scaling interpolation mode adopted in the training image and label preprocessing in the step 1 is bilinear interpolation, and the value of M is 128.
4. The remote sensing image building extraction method based on the improved U-net according to claim 3, characterized in that: the input characteristic dimension of the improved U-net network is 128 multiplied by 3, the input characteristic dimension corresponds to RGB three bands of the training image respectively, and the output characteristic dimension is 128 multiplied by 1.
5. The remote sensing image building extraction method based on the improved U-net according to claim 1, 2, 3 or 4, characterized in that: the improved U-net network adopts a three-layer network architecture comprising 3 down-sampling stages and 3 up-sampling stages.
6. The remote sensing image building extraction method based on the improved U-net according to claim 5, characterized in that: the improved U-net network uses an Inception-style max pooling module for down-sampling: max pooling is performed with pool sizes of 2, 3, 5 and 7 respectively, the resulting 4 feature maps are concatenated (Concat), and a 1×1 convolution with ReLU activation restores the merged feature map to its original dimension.
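The multi-branch pooling of claim 6 can be sketched with NumPy as follows. `max_pool2d` is a hypothetical helper (the claim does not fix the padding scheme; padding with −∞ is one assumption that makes every pool size produce the same downsampled output), and the 1×1 convolution that restores the channel dimension is omitted:

```python
import numpy as np

def max_pool2d(x, pool, stride=2):
    # pad with -inf so every pool size yields the same downsampled size
    pad = pool - stride
    l, r = pad // 2, pad - pad // 2
    xp = np.pad(x, ((l, r), (l, r)), mode='constant', constant_values=-np.inf)
    h_out = (xp.shape[0] - pool) // stride + 1
    w_out = (xp.shape[1] - pool) // stride + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            out[i, j] = xp[i * stride:i * stride + pool,
                           j * stride:j * stride + pool].max()
    return out

x = np.arange(64, dtype=float).reshape(8, 8)
# four pooling branches with pool sizes 2, 3, 5, 7, then channel-wise stacking
pooled = np.stack([max_pool2d(x, p) for p in (2, 3, 5, 7)], axis=0)
```

Each branch halves the spatial resolution, so the four 4×4 maps stack into a 4-channel tensor that a 1×1 convolution would then project back to the original channel count.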
7. The remote sensing image building extraction method based on the improved U-net according to claim 5, characterized in that: the improved U-net network applies a convolutional residual module 3 times during encoding: with the input feature denoted input, first two rounds of 3×3 convolution, batch normalization and ReLU activation are applied to input, obtaining a feature x; then input and x are added; finally a 3×3 convolution, batch normalization and ReLU activation are applied to the summed feature.
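The data flow of the residual module in claim 7 can be sketched as below. Only the structure is illustrated: `conv` is an identity placeholder standing in for the 3×3 convolution, and `batch_norm` is a single-map normalization stand-in, both hypothetical simplifications:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, eps=1e-5):
    # per-feature-map normalization stand-in for batch normalization
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def residual_block(inp, conv=lambda t: t):
    # two rounds of conv + batch norm + ReLU -> feature x
    x = relu(batch_norm(conv(inp)))
    x = relu(batch_norm(conv(x)))
    # skip connection: add the input feature to x
    s = inp + x
    # final conv + batch norm + ReLU on the summed feature
    return relu(batch_norm(conv(s)))

out = residual_block(np.random.default_rng(0).normal(size=(8, 8)))
```

The skip addition lets gradients bypass the two convolution stages, which is the usual motivation for residual blocks in deep encoders.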
8. The remote sensing image building extraction method based on the improved U-net according to claim 5, characterized in that: the improved U-net network obtains the output of the intermediate transition bridging stage by combining 4 dilated convolutions with dense connections: the input feature of the bridging module first undergoes a dilated convolution with dilation rate 1 followed by batch normalization and ReLU activation; the result is added to the input feature and passed through a dilated convolution with dilation rate 2; following the same rule, dilated convolutions with dilation rates 4 and 8 are applied, each with batch normalization and ReLU activation; finally the 4 resulting features and the input feature are summed to obtain the final output feature.
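One way to see the benefit of the bridging module in claim 8: stacking dilated convolutions with rates 1, 2, 4 and 8 grows the receptive field far faster than plain convolutions. A small sketch, assuming 3×3 kernels (the claim does not state the kernel size explicitly):

```python
def receptive_field(dilation_rates, kernel=3):
    # each stacked dilated convolution adds (kernel - 1) * rate pixels of context
    rf = 1
    for rate in dilation_rates:
        rf += (kernel - 1) * rate
    return rf

# four stacked dilated convolutions with rates 1, 2, 4, 8
bridge_rf = receptive_field([1, 2, 4, 8])
# four plain (rate-1) convolutions for comparison
plain_rf = receptive_field([1, 1, 1, 1])
```

Under these assumptions the bridge covers a 31×31 context versus 9×9 for plain convolutions, which helps the network see whole building footprints at the bottleneck.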
9. The remote sensing image building extraction method based on the improved U-net according to claim 1, 2, 3 or 4, characterized in that: step 2 is implemented as follows,
first, the remote sensing image is preprocessed; the processed image is fed into the trained network block by block, and the block outputs are stitched together to obtain the building extraction probability map; the preprocessing includes an image normalization operation.
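The block-wise inference of claim 9 can be sketched as follows. `predict` is a hypothetical stand-in for the trained network, the normalization is assumed to be a simple division by 255, and the image size is assumed to be a multiple of the 128-pixel tile:

```python
import numpy as np

def extract_buildings(img, predict, tile=128):
    # normalize the image to [0, 1] (the preprocessing step of claim 9)
    img = img.astype(np.float32) / 255.0
    h, w, _ = img.shape
    prob = np.zeros((h, w), dtype=np.float32)
    # feed the image into the network block by block, then stitch the outputs
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            prob[i:i + tile, j:j + tile] = predict(img[i:i + tile, j:j + tile])
    return prob

# usage with a dummy 'network' that averages the three bands of each block
dummy_net = lambda block: block.mean(axis=-1)
probability_map = extract_buildings(
    np.full((256, 256, 3), 255, dtype=np.uint8), dummy_net)
```

Tiling keeps memory use bounded regardless of scene size, at the cost of possible seams at block borders (overlapped tiling would mitigate this, but the claim describes plain stitching).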
10. The remote sensing image building extraction method based on the improved U-net according to claim 1, 2, 3 or 4, characterized in that: in step 3, the preset threshold T used for binarization is 0.5.
CN202110319351.9A 2021-03-25 2021-03-25 Remote sensing image building extraction method based on improved U-net Active CN112990041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319351.9A CN112990041B (en) 2021-03-25 2021-03-25 Remote sensing image building extraction method based on improved U-net

Publications (2)

Publication Number Publication Date
CN112990041A CN112990041A (en) 2021-06-18
CN112990041B true CN112990041B (en) 2022-10-18

Family

ID=76334540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319351.9A Active CN112990041B (en) 2021-03-25 2021-03-25 Remote sensing image building extraction method based on improved U-net

Country Status (1)

Country Link
CN (1) CN112990041B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309620A (en) * 2023-03-15 2023-06-23 深圳康易世佳科技有限公司 Image self-adaptive segmentation method and system based on artificial intelligence

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020093042A1 (en) * 2018-11-02 2020-05-07 Deep Lens, Inc. Neural networks for biomedical image analysis
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN111104850B (en) * 2019-10-30 2023-09-26 中国四维测绘技术有限公司 Remote sensing image building automatic extraction method and system based on residual error network
CN111047551B (en) * 2019-11-06 2023-10-31 北京科技大学 Remote sensing image change detection method and system based on U-net improved algorithm
CN111192245B (en) * 2019-12-26 2023-04-07 河南工业大学 Brain tumor segmentation network and method based on U-Net network
CN111460936A (en) * 2020-03-18 2020-07-28 中国地质大学(武汉) Remote sensing image building extraction method, system and electronic equipment based on U-Net network
CN111862136A (en) * 2020-06-22 2020-10-30 南开大学 Multi-modal nuclear magnetic image ischemic stroke lesion segmentation method based on convolutional neural network

Also Published As

Publication number Publication date
CN112990041A (en) 2021-06-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant