CN111582194B - Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network - Google Patents


Info

Publication number
CN111582194B
CN111582194B (application CN202010395467.6A)
Authority
CN
China
Prior art keywords
building
image
remote sensing
lstm network
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010395467.6A
Other languages
Chinese (zh)
Other versions
CN111582194A
Inventor
顾玲嘉 (Gu Lingjia)
王钰涵 (Wang Yuhan)
任瑞治 (Ren Ruizhi)
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority to CN202010395467.6A
Publication of CN111582194A
Application granted
Publication of CN111582194B

Classifications

    • G06V 20/176: Scenes; terrestrial scenes; urban or other man-made structures
    • G06N 3/044: Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Neural networks; combinations of networks
    • G06T 7/33: Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
    • G06V 10/25: Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06T 2207/10032: Image acquisition modality; satellite or aerial image; remote sensing

Abstract

The invention discloses a multi-temporal high-resolution remote sensing image building extraction method based on a multi-feature LSTM network, belonging to the technical field of satellite remote sensing image processing and application. The method aims to solve the problems of existing methods, including the low accuracy, high error rate, and fuzzy boundaries of building extraction results. The invention adopts multiple multi-temporal Gaofen-2 (GF-2) high-resolution remote sensing images as the data source. Spectral features of buildings are extracted with a method based on HSI color transformation, shape features with a method combining graph segmentation and conditional random field post-processing, texture features with a method based on Gabor wavelet transformation, and index features with a method based on the DSBI index. The extracted spectral, shape, texture, and index features of the multi-temporal buildings form a building feature set with 60 feature bands. Prepared building samples and labels are fed into an LSTM network to obtain a coarse building extraction result, and the final result is obtained after morphological processing.

Description

Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network
Technical Field
The invention belongs to the technical field of satellite remote sensing image processing and application.
Background
Buildings are among the most widely distributed ground objects in cities and are therefore an important indicator for evaluating urban economic development. As cities develop, buildings continually increase in number, and their construction materials and external shapes change day by day, so the accuracy of building extraction results obtained with traditional methods is poor. For a long time, urban buildings were extracted mainly by surveyors manually measuring and drawing urban building topographic maps. Although this approach has a certain fidelity, its time and labor costs are enormous, and results can differ because surveyors differ in their ability to recognize ground features. With the rapid development of remote sensing and deep learning technology, the spatial and temporal resolution of remote sensing images has continuously improved and deep learning network models have been continuously optimized; combining manual surveying and mapping with high-resolution remote sensing data and deep learning yields high-accuracy building extraction results more efficiently. At present, building extraction methods for remote sensing images fall mainly into two categories, namely methods based on spectral and morphological indices and methods based on deep learning; representative research is as follows:
Among studies using spectral and morphological indices, Jin and Davis proposed extracting buildings from the spectral features of their bright areas by combining image spectral information with morphological detail information (see "Automated Building Extraction from High-Resolution Satellite Imagery in Urban Areas Using Structural, Contextual, and Spectral Information"). Huang et al. then proposed extracting building roofs in high-resolution images with the Morphological Building Index (MBI), which integrates spectral, geometric, and contextual information, and obtained satisfactory roof extraction results on this basis (see "A New Building Extraction Postprocessing Framework for High-Spatial-Resolution Remote-Sensing Imagery"). A further domestic study combined the morphological index with an object-oriented method, taking building objects as the processing unit, to extract buildings from high-resolution images and finally obtained excellent results (see "Object-oriented morphological building index and its application to building extraction from high-resolution remote sensing images").
In deep-learning-based building extraction research, Kaiqiang Chen et al. designed a 27-layer deep convolutional neural network with convolution and deconvolution layers to extract buildings pixel by pixel, addressing the varied and complex appearances of buildings (see "Semantic Segmentation of Aerial Images With Shuffling Convolutional Neural Networks"). Building on the U-Net model, Wu and Chen designed an improved fully convolutional neural network that improves building extraction from aerial remote sensing images (see "Aerial image building detection based on U-shaped convolutional neural networks"). Yongyang Xu combined ResNet with a guided filter to complete the building extraction task well, verifying the effectiveness of ResNet for building extraction (see "Building Extraction in Very High Resolution Remote Sensing Imagery Using Deep Learning and Guided Filters").
To date, scholars at home and abroad have proposed many building extraction methods, but obvious shortcomings remain: (1) because the spectral characteristics of buildings and roads are similar, roads and buildings cannot be effectively distinguished, so the accuracy of building extraction results is low and boundaries are fuzzy; (2) because of interference from building shadows and vegetation, many non-buildings are misclassified as buildings, so the building extraction error rate is high; (3) most data sources in current deep-learning-based building extraction research are early single-temporal data sets containing only red, green, and blue channel information, ignoring the spectral and multi-temporal characteristics of remote sensing images.
The abundant ground-object detail in high-resolution remote sensing images provides a reliable data source for accurate building extraction and can reduce the influence of the phenomenon of different objects sharing the same spectrum. Multi-temporal remote sensing images containing time-series information have advantages in urban change detection and building extraction, and using multi-temporal high-resolution images as the data source can reduce the influence of building shadows and vegetation. Unlike traditional deep-learning building extraction based on single-temporal remote sensing images, the method uses a long short-term memory (LSTM) network, which is sensitive to time-series change, to extract buildings; the network can fully exploit the spectral and multi-temporal characteristics of multi-temporal high-resolution remote sensing images, which benefits accurate building extraction.
Disclosure of Invention
The method aims to solve the problems of low accuracy, high error rate, and fuzzy boundaries in building extraction from single-temporal high-resolution remote sensing images. Buildings are extracted with a multi-feature LSTM network combined with Q (Q ≥ 3) multi-temporal high-resolution remote sensing images. Spectral features of buildings are extracted with a method based on HSI color transformation, shape features with a method combining graph segmentation and conditional random field post-processing, texture features with a method based on Gabor wavelet transformation, and index features with a method based on the DSBI index; 10 feature bands are extracted in total per image. The extracted spectral, shape, texture, and index features of the Q multi-temporal images form a building feature set with Q × 10 feature bands, from which building samples and labels are prepared and fed into an LSTM network to obtain a coarse building extraction result; an accurate building extraction result is obtained after morphological processing, providing a reliable reference for urban planning and construction.
The technical scheme adopted by the invention comprises the following specific steps:
step one, image preprocessing: firstly, Q (Q is more than or equal to 3) high-resolution binary (GF-2) data images are respectively subjected to independent radiometric calibration, image registration, atmospheric correction and image fusion, then multi-temporal remote sensing data are subjected to unified registration, and finally, images are cut to select required image areas.
(a) Radiometric calibration: radiometric calibration converts the digital number (DN) of an image into physical quantities such as radiance, reflectance, or surface temperature. The calibration parameters are normally stored in a metadata file, and the Radiometric Calibration tool of the ENVI remote sensing image processing platform can read them automatically from the metadata file to complete the calibration.
(b) Image registration: images acquired from different sensors and angles can be misaligned; image registration matches and superimposes two images from different viewing angles or sensors to bring them into a unified frame. The image from the GF-2 panchromatic sensor is selected as the reference image, and the Image Registration tool of ENVI is used to automatically register the GF-2 panchromatic and multispectral images.
(c) Atmospheric correction: the FLAASH atmospheric correction tool of ENVI is used to remove radiometric errors caused by atmospheric effects and to invert the true surface reflectance of ground objects.
(d) Image fusion: a GF-2 scene comprises a multispectral image and a panchromatic image with different spatial resolutions. Fusion improves the precision and quality of the image so that the processed remote sensing image has both the detailed texture of the high-resolution panchromatic image and the distinct spectral characteristics of the multispectral image. A fused image with 1 m spatial resolution is obtained with the NNDiffuse Pan Sharpening tool of ENVI.
(e) Multi-temporal data image registration: images acquired at different times can be misaligned; multi-temporal registration matches and superimposes two or more images from different times into a unified frame. An image from one time phase is selected as the reference image, the images of the other time phases are registered to it, and automatic registration of the Q (Q ≥ 3) multi-temporal data images is achieved with the Image Registration tool of ENVI.
(f) Image cropping is used to obtain a region of interest (ROI).
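The DN-to-radiance conversion in step (a) above is a simple linear model. A minimal Python sketch with made-up gain and offset coefficients follows (in practice ENVI reads the real coefficients from the image's metadata file):

```python
import numpy as np

def calibrate_dn_to_radiance(dn, gain, offset):
    """Convert raw digital numbers (DN) to at-sensor radiance.

    Uses the standard linear calibration model L = gain * DN + offset;
    the gain and offset values here are illustrative placeholders.
    """
    return gain * dn.astype(np.float64) + offset

# Toy example with made-up calibration coefficients
dn = np.array([[100, 200], [300, 400]], dtype=np.uint16)
radiance = calibrate_dn_to_radiance(dn, gain=0.05, offset=1.2)
```

The remaining preprocessing steps (registration, FLAASH, pan sharpening) are performed with ENVI's interactive tools and are not reproduced here.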
Step two, multi-temporal building feature extraction: building features are very important influencing factors in building extraction research. After the acquired high-resolution remote sensing images are preprocessed, building spectral features are extracted with a method based on HSI color transformation, shape features with a method combining graph segmentation and conditional random field post-processing, texture features with a method based on Gabor wavelet transformation, and index features with a method based on the DSBI index.
(a) HSI color transformation: the RGB model of a single GF-2 image is converted to the HSI model, where H is hue, S is saturation, and I is intensity (brightness). With R, G, B normalized to the [0, 1] interval before calculation, the HSI color transformation is:

H = θ, if B ≤ G; H = 360° − θ, if B > G,
where θ = arccos{ [(R − G) + (R − B)] / [2·√((R − G)² + (R − B)(G − B))] } ………(1)

S = 1 − 3·min(R, G, B)/(R + G + B) ………(2)

I = (R + G + B)/3 ………(3)

where R, G, and B in formulas (1), (2), and (3) represent the red, green, and blue bands of the original GF-2 image, respectively.
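The HSI transformation of formulas (1)–(3) can be sketched as a minimal NumPy implementation of the standard transform (hue is returned in radians rather than degrees, and a small epsilon guards against division by zero):

```python
import numpy as np

def rgb_to_hsi(r, g, b, eps=1e-10):
    """Convert normalized RGB bands (values in [0, 1]) to H, S, I.

    I is the mean brightness, S measures distance from gray,
    and H is the hue angle in radians.
    """
    i = (r + g + b) / 3.0
    s = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b <= g, theta, 2.0 * np.pi - theta)
    return h, s, i

# A pure red pixel: hue near 0, fully saturated, intensity 1/3
h, s, i = rgb_to_hsi(np.array([1.0]), np.array([0.0]), np.array([0.0]))
```

The same function applies band-wise to whole images, since the operations are element-wise NumPy arithmetic.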
(b) Graph segmentation combined with conditional random field post-processing: after HSI transformation of the raw data, different types of ground objects change significantly. Building shape features can be extracted by segmenting the S (saturation) and I (intensity) images with a graph-based segmentation method. Graph-based segmentation operates directly on data points in feature space, implementing traditional single-link clustering by generating a minimum spanning tree of the data points and then cutting edges whose lengths exceed a given threshold. The remote sensing image G to be segmented is abstracted as a weighted graph G = (V, E) with vertex set V and edge set E; each pixel (or region) of the image is treated as an independent vertex v ∈ V, and each edge connecting a pair of vertices (v_i, v_j) carries a weight w(v_i, v_j) equal to the dissimilarity between the two vertices it connects. The specific graph segmentation procedure is:
(1) Compute the dissimilarity, i.e., the edge weight, between each pixel and its 4-neighborhood (excluding diagonal pixels) or 8-neighborhood.
(2) Sort the edges between pairs of vertices by dissimilarity in ascending order to obtain e_1, e_2, ..., e_N, and select the edge with the smallest dissimilarity to merge into one partition.
(3) Judge whether the currently selected edge e_n (n = 2, 3, ..., N) should trigger a merge. Let the vertices it connects be (v_i, v_j); the following merge conditions must be satisfied:
① v_i and v_j do not belong to the same region, i.e., Id(v_i) ≠ Id(v_j), where Id(v_i) and Id(v_j) denote the regions containing v_i and v_j;
② the edge weight is not greater than the minimum internal difference of the two regions, i.e., w(v_i, v_j) ≤ MInt(C_i, C_j);
where Int(C) is the intra-class (internal) difference:

Int(C) = max_{e ∈ MST(C, E)} w(e) ………(4)

MST denotes the minimum spanning tree; MInt(C_i, C_j) = min(Int(C_i) + τ(C_i), Int(C_j) + τ(C_j)) is the minimum internal difference, where C_i and C_j are any two regions and τ(C) = k/|C| is a size-dependent tolerance with constant k.
(4) Update the threshold and class label numbers:
Update class label numbers: set the label numbers of Id(v_i) and Id(v_j) both to Id(v_i).
Update the dissimilarity threshold of this label number as:

T(C) = Int(C) + k/|C| ………(5)

where |C| is the number of pixels in the merged region.
(5) If n ≤ N (n = 2, 3, ..., N), keep the edges in sorted order, select the next edge, and go to step (3); otherwise, stop, yielding the graph-based segmentation result S.
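Steps (1)–(5) can be sketched as a compact union-find implementation (a simplified Felzenszwalb-Huttenlocher-style segmentation; constructing the vertex and edge lists from the image and choosing the constant k are left to the caller):

```python
import numpy as np

class DisjointSet:
    """Union-find tracking, per region, its size and internal difference Int(C)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.int_diff = [0.0] * n  # max MST edge weight inside the region

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        a, b = self.find(a), self.find(b)
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.int_diff[a] = w  # valid because edges arrive in ascending order

def graph_segment(n_vertices, edges, k=1.0):
    """Merge regions while w <= MInt, with tau(C) = k / |C|.

    edges: list of (weight, i, j) tuples; steps (2)-(5) of the text.
    """
    ds = DisjointSet(n_vertices)
    for w, i, j in sorted(edges):           # step (2): ascending weights
        ri, rj = ds.find(i), ds.find(j)
        if ri == rj:
            continue                        # condition 1: already same region
        mint = min(ds.int_diff[ri] + k / ds.size[ri],
                   ds.int_diff[rj] + k / ds.size[rj])
        if w <= mint:                       # condition 2: w <= MInt
            ds.union(ri, rj, w)
    return ds

# 4 pixels in a row with one large dissimilarity gap between pixels 1 and 2
seg = graph_segment(4, [(0.1, 0, 1), (5.0, 1, 2), (0.1, 2, 3)], k=0.5)
labels = [seg.find(v) for v in range(4)]
```

The large-weight edge is rejected by the MInt test, so the four pixels split into two regions.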
Because the distribution of ground-object classes is complex and some areas are over-segmented, a conditional random field (CRF) is introduced to post-process the segmented image and improve segmentation accuracy. The CRF keeps the classification result consistent within connected regions and protects the edge structure of ground objects. The classification probability of a CRF is determined jointly by a unary potential function and a pairwise potential function. First, denote the segmentation classes of the remote sensing image S (i.e., the graph segmentation result) as A = {a_1, a_2, ..., a_l} and the labeling result as B = {b_1, b_2, ..., b_g}, where B is a probabilistic undirected graph. The essence of combining graph segmentation with CRF post-processing is to take the graph segmentation image S as given, use the graph segmentation result as the unary potential of the conditional random field, and solve the conditional probabilities P(B = a_k | S), k = 1, 2, ..., m of the m classes for all pixels. P(B | S) can be expressed as:

P(B | S) = (1/Z) ∏_μ ψ_μ(B_μ | S) ………(6)

where μ ranges over all maximal cliques of the undirected graph B and Z is a normalization factor. Since ψ_μ(B_μ | S) satisfies a Gibbs distribution, the above equation can be converted to:

P(B | S) = (1/Z) exp( −Σ_i φ_1(b_i | S) − Σ_{i<j} φ_2(b_i, b_j | S) ) ………(7)
where φ_1 is the unary potential function of the segmentation result, obtained by considering the information of each pixel independently; the graph segmentation result B serves as its estimate. φ_2 is the pairwise potential function, which expresses the influence of the relationships between the position information and color information of the pixels on the segmentation, and is given by:
φ_2(b_i, b_j) = λ(b_i, b_j)·[ ω^(1)·exp(−|p_i − p_j|²/(2α²) − |θ_i − θ_j|²/(2β²)) + ω^(2)·exp(−|p_i − p_j|²/(2γ²)) ] ………(8)

where p_i and p_j are the physical positions of the i-th and j-th pixels, and θ_i and θ_j are their color vectors. ω^(1) and ω^(2) are energy weights, and λ(b_i, b_j) is a classification-compatibility weight; the specific values of ω^(1), ω^(2), and λ must be determined through extensive sample training. α and β control the sensitivity to spatial similarity and color approximation between the segmented image and the original image; both are set in the range 0 to 100, preferably 80. γ is a smoothing-kernel parameter, generally set to 1.
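The pairwise potential of formula (8) can be sketched for a single pixel pair as follows (a NumPy illustration; the weights w1, w2 and the compatibility value are placeholder constants, since the text notes that ω^(1), ω^(2), and λ must be learned from samples):

```python
import numpy as np

def pairwise_potential(p_i, p_j, c_i, c_j,
                       w1=1.0, w2=1.0, alpha=80.0, beta=80.0, gamma=1.0,
                       compat=1.0):
    """Dense-CRF style pairwise potential between two pixels.

    The first (appearance) kernel couples nearby pixels with similar
    color; the second (smoothness) kernel couples nearby pixels
    regardless of color. Parameters mirror formula (8): w1 = omega^(1),
    w2 = omega^(2), compat = lambda(b_i, b_j).
    """
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_col = np.sum((np.asarray(c_i, float) - np.asarray(c_j, float)) ** 2)
    appearance = w1 * np.exp(-d_pos / (2 * alpha ** 2) - d_col / (2 * beta ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * gamma ** 2))
    return compat * (appearance + smoothness)

# Identical neighboring pixels: both kernels are maximal (w1 + w2 here)
k_same = pairwise_potential((0, 0), (0, 0), (10, 20, 30), (10, 20, 30))
# Distant pixels with very different colors: both kernels decay
k_far = pairwise_potential((0, 0), (50, 50), (10, 20, 30), (200, 0, 0))
```

A full CRF inference would sum this potential over all pixel pairs; the sketch only shows how one term of formula (8) is evaluated.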
(c) Gabor wavelet transformation: in image processing, Gabor filters are often used to analyze image texture and edge information; a Gabor filter can detect whether an image has specific frequency content in a specific direction within a specific region, so the Gabor wavelet transform is well suited to extracting building texture features. Gabor filters obtain texture and edge feature information in different directions in the remote sensing image. The preprocessed GF-2 red, green, and blue three-band image data are converted to grayscale and denoted Y = {y(i, j) | 1 ≤ i ≤ M, 1 ≤ j ≤ N}, where M is the height of the image, N is the width of the image, and y(i, j) is an image pixel. The convolution of the grayscale remote sensing image Y with the Gabor wavelet function can be expressed as:

G_{v,u}(z) = Y(z) * g_{v,u}(z) ………(9)

where z = (i, j) is the pixel coordinate of the image and * is the convolution operator. g_{v,u}(z) represents the multi-channel Gabor wavelet function, where v and u represent the scale and direction of the Gabor wavelet function, respectively; the obtained result G_{v,u}(z) is the extracted Gabor feature data of the image.
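The Gabor feature extraction of formula (9) can be illustrated with a hand-rolled kernel and a naive convolution (NumPy only; the kernel parameters below are illustrative, not the scales and directions used by the invention):

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lam, psi=0.0, gamma=0.5):
    """Real part of a Gabor kernel: Gaussian envelope times a cosine wave.

    theta sets the orientation u, while sigma and lam control the
    scale v referenced in formula (9).
    """
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + gamma ** 2 * y_t ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x_t / lam + psi)

def convolve2d_same(img, kernel):
    """Naive 'same'-size 2-D convolution with zero padding, NumPy only."""
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=float)
    flipped = kernel[::-1, ::-1]
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * flipped)
    return out

# A horizontally oriented filter responds to a vertical intensity edge
img = np.zeros((9, 9))
img[:, 5:] = 1.0
response = convolve2d_same(img, gabor_kernel(7, sigma=2.0, theta=0.0, lam=4.0))
```

A filter bank over several values of theta and sigma would produce the multi-direction, multi-scale texture bands described in the text.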
(d) Building DSBI index extraction: because of spectral aliasing, other ground objects contained in the remote sensing image interfere with building extraction. Exploiting the property that, in a 4-band high-resolution remote sensing image, the difference between the blue and red bands and the difference between the blue and green bands maximize the gap between buildings and background as much as possible, the invention uses the difference spectral building index (DSBI) for the building extraction experiment, reducing interference from other ground objects. The formula is:

DSBI = 0.5(Blue − Green) + 0.5(Blue − Red) ………(10)

where Blue, Red, and Green in formula (10) represent the spectral reflectance values of the blue, red, and green bands, respectively.
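Formula (10) is directly computable per pixel; a minimal NumPy sketch with made-up reflectance values:

```python
import numpy as np

def dsbi(blue, green, red):
    """Difference spectral building index, formula (10):
    DSBI = 0.5*(Blue - Green) + 0.5*(Blue - Red)."""
    return 0.5 * (blue - green) + 0.5 * (blue - red)

# Blue-dominant bright-roof pixel vs. green-dominant vegetation pixel
# (illustrative reflectance values, not measured data)
roof = dsbi(np.array([0.30]), np.array([0.20]), np.array([0.18]))
veg = dsbi(np.array([0.05]), np.array([0.25]), np.array([0.08]))
```

The index is higher for the roof-like pixel than for the vegetation-like pixel, which is the separation the text relies on.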
Step three, construction of the multi-temporal multi-feature building data set: for each GF-2 original satellite image, the red, green, blue, and near-infrared (NIR) bands together with the HSI color transformation method yield the building spectral feature bands; the method combining graph segmentation and conditional random field post-processing yields the building shape feature band; the Gabor wavelet transformation method yields the building texture feature band; and the DSBI-based method yields the building index feature band; 10 feature bands are extracted in total. After the Q scenes of GF-2 original satellite data are processed, the bands are arranged in the chronological order of satellite acquisition to form new data with 1 m spatial resolution and Q × 10 multi-temporal building feature bands.
Step four, selection of the training set: buildings and non-buildings have different characteristics, and training samples and labels are selected separately for buildings and non-buildings. Taking pixels as the unit, 20% of the pixels are selected for training and 20% for validation, and all pixels are used to test the accuracy of the network model.
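The pixel-level split of step four can be sketched as follows (the 20/20/all proportions follow the text; the random seed is an arbitrary illustrative choice):

```python
import numpy as np

def split_pixels(n_pixels, train_frac=0.2, val_frac=0.2, seed=0):
    """Random pixel-level split: 20% train, 20% validation, and the
    full pixel set kept for testing the trained model."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_pixels)
    n_train = int(n_pixels * train_frac)
    n_val = int(n_pixels * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = np.arange(n_pixels)  # accuracy is reported on all pixels
    return train, val, test

train, val, test = split_pixels(1000)
```

In practice the split would be stratified per class (building / non-building), matching the separate sample selection described above.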
Step five, building extraction based on the optimal-unit LSTM network: an LSTM network can model temporal correlation and mine temporal trends from time series.
The optimal-unit LSTM network consists of n LSTM units, each followed by a Dropout random-deactivation layer. After the n LSTM units, the network contains a fully connected layer to improve generalization. The last layer of the model is the output layer, which contains two neurons corresponding to the probabilities of the two classes (building and non-building). The multi-feature LSTM network is composed of multiple LSTM units; each LSTM unit adds a memory cell to every neural unit of the hidden layer, and as information propagates through the hidden layer, an input gate, a forget gate, and an output gate control how much previous and current information is remembered or forgotten, giving the LSTM network a long-term memory function. The LSTM unit updates the memory state of the network with the outputs of the forget gate and the input gate: the forget gate decides which information should be discarded or kept during training, the input gate processes the input at the current sequence position and judges whether it is retained, and the output gate outputs the finally retained information. The specific construction of the building extraction model based on the optimal-unit LSTM network is as follows:
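The gate mechanics described above follow the standard LSTM cell equations; a minimal NumPy forward step is sketched below (an illustration of the input, forget, and output gates only, not the invention's trained network):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W, U, b stack the four gate parameter sets along the first axis in
    the order [input, forget, cell-candidate, output]; H is hidden size.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # (4H,) pre-activations
    i = sigmoid(z[0:H])                # input gate: keep new info?
    f = sigmoid(z[H:2 * H])            # forget gate: discard old state?
    g = np.tanh(z[2 * H:3 * H])        # candidate cell state
    o = sigmoid(z[3 * H:4 * H])        # output gate: what to expose?
    c_t = f * c_prev + i * g           # updated memory state
    h_t = o * np.tanh(c_t)             # hidden output
    return h_t, c_t

# Tiny example: input size 3 (one band vector per step), hidden size 2,
# all-zero weights so the result is easy to reason about
D, H = 3, 2
h, c = lstm_cell_step(np.ones(D), np.zeros(H), np.zeros(H),
                      W=np.zeros((4 * H, D)), U=np.zeros((4 * H, H)),
                      b=np.zeros(4 * H))
```

In the network of step five, such a cell would be applied across the Q time steps of the Q × 10 feature sequence of each pixel.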
(a) Part 1 of the LSTM network: part 1 is the input layer; the input size is m × (Q × 10) × 1, where m is the batch_size, i.e., m pixels are drawn from the training samples for each training step (m is generally set to 100), and Q × 10 is the total number of bands of the input data.
(b) Part 2 of the LSTM network: one LSTM unit followed by one Dropout random-deactivation layer forms a combination, and this combination is repeated n times to form part 2 of the network structure. The number of filters of a single LSTM unit is 32, and the Dropout layer randomly deactivates 50% of the units in the network, which effectively mitigates overfitting and provides a degree of regularization.
(c) Part 3 of the LSTM network: part 3 is a fully connected (FC) layer with 64 filters. The data must be flattened before being passed to the fully connected layer; the input of the fully connected layer is m × 320 and the output is m × 64.
(d) Part 4 of the LSTM network: part 4 uses a Softmax classifier as the output layer; the output size is m × 2, dividing the extraction results into the two classes of building and non-building.
(e) The multi-feature LSTM network outputs the coarse extraction result of buildings from the multi-temporal high-resolution remote sensing images.
Finally, the optimal number of units is determined to be n = 2, 3, or 4: n = 3 or 4 is selected when the building environment is complex, and n = 2 when the building environment is simple, yielding a coarse building extraction result with high accuracy.
Step six, post-processing of the coarse extraction result: the coarse building extraction obtained with the multi-feature LSTM network still contains a small number of building edges that need refinement. The invention post-processes the multi-feature LSTM coarse extraction result with morphological vertical-line or horizontal-line erosion.
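Line-shaped morphological erosion can be sketched in NumPy as follows (a 3-pixel structuring element is assumed for illustration; the invention does not specify the element size):

```python
import numpy as np

def erode_line(mask, length=3, vertical=True):
    """Binary erosion with a 1-pixel-wide line structuring element.

    A pixel survives only if the whole vertical (or horizontal) line of
    `length` pixels centered on it is foreground, so thin misclassified
    strips such as roads and shadow fringes are removed.
    """
    half = length // 2
    pad = ((half, half), (0, 0)) if vertical else ((0, 0), (half, half))
    padded = np.pad(mask.astype(bool), pad, constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for k in range(length):
        if vertical:
            out &= padded[k:k + mask.shape[0], :]
        else:
            out &= padded[:, k:k + mask.shape[1]]
    return out

# A 1-pixel-high road-like strip disappears under vertical erosion,
# while the interior of a thicker building block survives
mask = np.zeros((5, 5), dtype=bool)
mask[2, :] = True          # thin horizontal strip
mask[0:3, 0:3] = True      # small building block
eroded = erode_line(mask, length=3, vertical=True)
```

Applying the vertical and horizontal variants in sequence approximates the post-processing described in step six.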
The invention has the beneficial effects that:
the invention uses multi-temporal high-resolution remote sensing images as data sources and extracts buildings by using invariance and variability of certain ground features under time sequence change. The important spectral features, shape features, texture features and index features of the multi-temporal building are obtained by fully utilizing HSI transformation, combining graph-based segmentation with conditional random field post-processing, Gabor wavelet transformation and calculating DSBI building indexes. The method for organically combining the multi-feature LSTM network and the multi-temporal data obtains a building crude extraction result with good building profile, and removes the wrongly-divided roads and shadows by utilizing a post-processing algorithm, so that the building extraction result with the average accuracy of 90.5% and the average overall accuracy of 93.1% is finally obtained. The method can accurately extract the buildings from the high-resolution remote sensing images, and provides reliable analysis data for planning and designing cities and building smart cities.
Drawings
FIG. 1 is an investigation region used in the present invention.
FIG. 2 is a flow chart of the multi-temporal high-resolution remote sensing image building extraction method based on the multi-feature LSTM network.
Fig. 3 is a diagram of 5 experimental buildings in an investigation region used by the present invention.
Fig. 4 is a spectral feature diagram of a building obtained by HSI transformation according to embodiment 1 of the present invention.
Fig. 5 is a graph of the shape characteristics of the building obtained by the method based on graph segmentation and conditional random field post-processing in embodiment 1 of the present invention.
Fig. 6 is a graph of the texture features of the building obtained by Gabor wavelet transform according to embodiment 1 of the present invention.
FIG. 7 is a graph of building index features obtained by calculating the DSBI building index in embodiment 1 of the present invention.
Fig. 8 is a block diagram of a multi-feature LSTM network of embodiment 1 of the present invention.
Fig. 9 shows the coarse extraction result of multi-temporal high-resolution remote sensing image buildings based on the multi-feature LSTM network in embodiment 1 of the present invention.
FIG. 10 is a graph showing the results of post-processing the coarse building extraction in embodiment 1 of the present invention.
Detailed Description
The technical solution of the invention is further explained and illustrated in the form of specific embodiments.
Example 1:
Six GaoFen-2 (GF-2) multi-temporal high-resolution remote sensing images, acquired on January 2, 2015, May 15, 2015, September 20, 2015, November 28, 2015, March 25, 2016 and February 28, 2017, are adopted as the data source. The 6 scenes of original data are preprocessed, and the building features of the 6 scenes of multi-temporal data are extracted using HSI color transformation, a segmentation method combining graph segmentation with conditional random field post-processing, Gabor wavelet transformation, and the DSBI index calculation method. The extracted multi-temporal building feature bands and the four bands of the original data are then arranged in order of the acquisition time of the original satellite data to form a multi-temporal building feature set with 60 bands. This feature set is used as the input data of the multi-feature LSTM network to train a multi-temporal building extraction model and obtain a coarse building extraction result; after morphological processing, a building extraction result with an average accuracy of 90.5% is finally obtained. The experimental area is located in the central campus of Jilin University in Changchun City, Jilin Province (FIG. 1). The campus contains both regular-shaped and irregular-shaped buildings, such as rectangular, L-shaped, circular and annular ones, and the roofs are made of various building materials including cement, asphalt, glass brick and color steel plate. In addition, the experimental area changes markedly across the four seasons, which helps the multi-feature LSTM network mine latent features under time-series change.
As shown in Table 1, 6 GF-2 high-resolution remote sensing images with different acquisition times and a spatial resolution of 1 m were taken as experimental data; the GF-2 original data have 4 bands, namely the near-infrared, red, green and blue bands (Table 1). Unmanned aerial vehicle data are used as ground-truth verification data to verify the building extraction accuracy of the method; the overall flow chart of this embodiment is shown in FIG. 2.
TABLE 1
(Table 1 appears as an image in the original document; it lists the acquisition dates of the 6 GF-2 scenes, each with a spatial resolution of 1 m and four bands.)
The method comprises the following steps: image preprocessing
The 6 obtained multi-temporal GF-2 scenes are each subjected to radiometric calibration, image registration, atmospheric correction and image fusion; the processed remote sensing images are then co-registered across time phases and cropped to select the required image area.
(a) Radiometric calibration: the conversion from digital number (DN) values to radiance values is completed using the Radiometric Calibration tool of the ENVI remote sensing image processing platform.
(b) Image registration: the image of the GF-2 panchromatic sensor is selected as the reference image, and automatic registration of the GF-2 panchromatic and multispectral sensor images is achieved using the Image Registration tool of ENVI.
(c) Atmospheric correction: the FLAASH atmospheric correction tool of ENVI is used to eliminate radiation errors caused by atmospheric effects.
(d) Image fusion: the multispectral and panchromatic images of different spatial resolutions in the GF-2 data are fused using the NNDiffuse Pan Sharpening tool of ENVI.
(e) Multi-temporal image registration: an image of one time phase is first selected as the reference image, the images of the other time phases are registered to it, and automatic registration of the 6 multi-temporal images is achieved using the Image Registration tool of ENVI.
(f) Image cropping: the region of interest (ROI) is obtained by image cropping. Finally, 5 representative building extraction experimental sites in the central campus of Jilin University, Changchun City, Jilin Province were obtained; the sizes of the remote sensing images are 206 × 225, 176 × 185, 279 × 296, 318 × 337 and 250 × 300 pixels respectively (FIG. 3). Experimental sites 1 to 3 contain irregular buildings of different shapes, and experimental sites 4 and 5 contain both regular-shaped and irregular-shaped buildings. The roof material in all experimental sites is mainly cement and asphalt; the lower part of experimental site 3 contains a color-steel-roof building, and the roofs in experimental sites 4 and 5 also include glass buildings. These complex experimental settings help verify the effectiveness of the building extraction method of the invention.
Step two, extracting multi-temporal building multi-feature indexes:
after the acquired high-resolution remote sensing images are preprocessed, building spectral features are extracted using a method based on HSI color transformation to distinguish buildings from roads. Buildings and roads are highlighted after the H (hue) transformation, the difference between buildings and roads is obvious after the S (saturation) and I (brightness) transformations, and the building outlines are well preserved. Considering the S (saturation) and I (brightness) images comprehensively, they are selected as the building spectral features (FIG. 4).
After HSI transformation of the original data, different types of ground features change markedly. The shape features of buildings after the S (saturation) and I (brightness) transformations are extracted using a method combining graph segmentation with conditional random field post-processing, which preserves the shape features of both regular-shaped and irregular-shaped buildings (FIG. 5).
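The graph-based merging rule used in this segmentation step can be sketched in NumPy. This is an illustrative minimal Felzenszwalb-style implementation on a small grayscale array, not the patent's actual code; the threshold constant `k` and the 4-neighborhood edge construction follow the general technique, and the example image is a placeholder:

```python
import numpy as np

def felzenszwalb_like(img, k=1.0):
    """Minimal graph-based segmentation sketch (Felzenszwalb-style merging rule).

    Edges between 4-neighbours are sorted by weight; two regions are merged
    when the edge weight does not exceed the minimum internal difference
    MInt(Ci, Cj) = min(Int(Ci) + k/|Ci|, Int(Cj) + k/|Cj|).
    """
    h, w = img.shape
    n = h * w
    parent = np.arange(n)
    size = np.ones(n, dtype=int)
    internal = np.zeros(n)  # Int(C): largest merged edge weight inside each region

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # build 4-neighbourhood edges (right and down); weight = intensity difference
    edges = []
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                edges.append((abs(float(img[i, j]) - float(img[i, j + 1])),
                              i * w + j, i * w + j + 1))
            if i + 1 < h:
                edges.append((abs(float(img[i, j]) - float(img[i + 1, j])),
                              i * w + j, (i + 1) * w + j))
    edges.sort()

    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and wgt <= min(internal[ra] + k / size[ra],
                                   internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = max(internal[ra], internal[rb], wgt)

    return np.array([find(x) for x in range(n)]).reshape(h, w)

# two flat regions separated by a large intensity step remain separate segments
img = np.array([[0, 0, 10, 10],
                [0, 0, 10, 10]], dtype=float)
labels = felzenszwalb_like(img, k=1.0)
```

In practice a library implementation such as `skimage.segmentation.felzenszwalb` would normally be used instead of hand-rolled code.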
Gabor transformation extracts important features such as texture information and building edges from the satellite image, and at the same time removes much of the interference of shadows on building extraction. The Gabor transform obtains different texture edge features at different orientations and scales. Based on repeated experiments, the invention selects v = 8 and u = 8 as the scale and orientation parameters of the Gabor function to extract the texture features of buildings (FIG. 6).
The DSBI building index maximizes the gap between buildings and the background and reduces interference from other ground features; the index features of buildings are extracted with the DSBI index method (FIG. 7).
Step three: constructing a multi-temporal multi-feature building data set:
the near-infrared, red, green and blue bands of the 6 scenes of GF-2 original satellite data are processed as follows: HSI color transformation yields the S (saturation) and I (brightness) transformation spectral features of buildings; the method combining graph segmentation with conditional random field post-processing yields the shape feature bands of the S (saturation) and I (brightness) transformations; the Gabor wavelet transformation method yields the texture feature band of buildings; and the DSBI index method yields the index feature band of buildings. These data are arranged in order of satellite acquisition time to form new data with a spatial resolution of 1 m and 60 multi-temporal building feature bands; the band combination is shown in Table 2.
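The band stacking described above can be sketched as follows. The array shapes match one of the experimental sites (250 × 300 pixels, 6 scenes, 4 original bands + 6 feature bands per scene); the random arrays are placeholders for the real bands:

```python
import numpy as np

H, W = 250, 300            # example scene size (experimental site 5)
n_scenes = 6               # GF-2 acquisitions, kept in shooting order

rng = np.random.default_rng(0)
stacked = []
for t in range(n_scenes):
    original = rng.random((H, W, 4))   # NIR, red, green, blue (placeholder data)
    features = rng.random((H, W, 6))   # S, I, S-shape, I-shape, Gabor texture, DSBI
    stacked.append(np.concatenate([original, features], axis=-1))

cube = np.concatenate(stacked, axis=-1)    # (H, W, 60) multi-temporal feature set
pixels = cube.reshape(-1, cube.shape[-1])  # one 60-band vector per pixel for the LSTM
```

Each pixel thus carries a 60-element sequence (6 time steps × 10 bands) that the LSTM consumes as one training sample.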
TABLE 2
(Table 2 appears as an image in the original document; it lists the combination of the 60 multi-temporal building feature bands.)
Step four: and (4) selecting a training set.
The remote sensing images are divided into two classes, building and non-building; samples and labels are made for each class, and the selected training samples meet the requirements in both quality and quantity. Building extraction is performed on image pixels: 20% of the pixels of each experimental area image are selected for training, another 20% for validation, and the entire experimental area image is used as the test set to verify model accuracy.
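The 20%/20%/whole-image split described above can be sketched as a pixel-index sampling routine; the function name and random seed are illustrative, not part of the patent:

```python
import numpy as np

def split_pixels(n_pixels, train_frac=0.2, val_frac=0.2, seed=0):
    """Sample disjoint training/validation pixel indices; the full image is the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pixels)
    n_train = int(n_pixels * train_frac)
    n_val = int(n_pixels * val_frac)
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = np.arange(n_pixels)  # the whole image is used to evaluate the model
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_pixels(46350)  # experimental site 1: 206 x 225
```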
Step five: multi-feature LSTM algorithm
The multi-feature LSTM network of the present invention comprises an input layer, n LSTM units each followed by a Dropout layer, 1 fully connected layer, and an output layer for the building and non-building classes (FIG. 8). The forget gate, input gate and output gate give the LSTM network time-series memory: unchanged buildings are extracted by finding the continuously changing shadows around them in the multi-temporal data, and interference from roads with spectra similar to buildings is reduced by finding the difference before and after snow and ice melt in winter. By comprehensively considering the multi-temporal characteristics of the high-resolution remote sensing images and the multiple features of buildings, buildings of different shapes built from various materials can be accurately extracted, improving the accuracy and effectiveness of building extraction.
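The gate structure described above can be illustrated with a minimal NumPy LSTM cell. The batch size (m = 100), hidden size (32 filters) and 6 time steps of 10 bands follow the description; the random weights and inputs are placeholders, and this is a sketch of the standard LSTM equations rather than the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step: the forget, input and output gates give the network
    its time-series memory over the acquisition dates."""
    z = x @ W + h_prev @ U + b               # (batch, 4*hidden) pre-activations
    hidden = h_prev.shape[1]
    f = sigmoid(z[:, :hidden])               # forget gate
    i = sigmoid(z[:, hidden:2 * hidden])     # input gate
    o = sigmoid(z[:, 2 * hidden:3 * hidden]) # output gate
    g = np.tanh(z[:, 3 * hidden:])           # candidate cell state
    c = f * c_prev + i * g                   # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c

batch, n_features, hidden = 100, 10, 32      # m = 100 pixels per batch, 32 filters
rng = np.random.default_rng(0)
W = rng.standard_normal((n_features, 4 * hidden)) * 0.1
U = rng.standard_normal((hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)

h = np.zeros((batch, hidden))
c = np.zeros((batch, hidden))
# unroll over the 6 acquisition dates, each carrying 10 feature bands per pixel
for t in range(6):
    x_t = rng.standard_normal((batch, n_features))
    h, c = lstm_cell(x_t, h, c, W, U, b)
```

In the full model the final hidden state would feed the 64-filter fully connected layer and the Softmax output described in the claims.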
TABLE 3
(Table 3 appears as an image in the original document; the parameters given in the text are 10000 iterations, a learning rate of 0.001, 32 filters per LSTM unit, a 50% Dropout rate, and a batch size of m = 100.)
The image sizes of the 5 experimental sites are 206 × 225, 176 × 185, 279 × 296, 318 × 337 and 250 × 300 pixels, giving 46350, 32560, 82584, 107166 and 75000 pixels respectively. The data of the experimental areas have 60 bands in total; a single pixel with 60 bands of information is the minimum unit for training the LSTM network, every m pixels are input as one batch, and feature learning and result output proceed pixel by pixel. The training accuracy of the LSTM algorithm varies with the number of iterations; the number of iterations is finally set to 10000, and the highest model accuracy is obtained with a learning rate of 0.001. The parameter settings of the LSTM algorithm are shown in Table 3.
The coarse building extraction result of the multi-feature LSTM network under the combination of the 60 feature bands of the multi-temporal high-resolution remote sensing images is shown in FIG. 9. The extraction result is divided into two land-cover classes, building and non-building; white pixels represent extracted buildings and black pixels represent non-buildings.
Step six: post-processing of the results of the crude extraction
The coarse extraction result obtained with the multi-feature LSTM network still contains some roads misclassified as buildings that need to be removed. Based on the shape characteristics of roads, the invention post-processes the multi-feature LSTM coarse extraction result with morphological vertical-line or horizontal-line erosion, removing the misclassified roads and obtaining a building extraction result with higher accuracy (FIG. 10).
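The line-erosion post-processing can be sketched in NumPy as follows. This is an illustrative implementation, not the patent's code; the structuring-element length of 5 and the toy mask are assumptions. A thin road strip is removed by eroding perpendicular to its long axis, while the interior of a solid building block survives:

```python
import numpy as np

def line_erosion(mask, length=5, vertical=False):
    """Binary erosion with a 1 x `length` line structuring element.

    A pixel survives only if the whole line of neighbours is foreground,
    so strips thinner than `length` in the erosion direction are removed
    from the coarse building mask.
    """
    if vertical:
        mask = mask.T
    h, w = mask.shape
    pad = length // 2
    padded = np.zeros((h, w + 2 * pad), dtype=bool)
    padded[:, pad:pad + w] = mask.astype(bool)
    out = np.ones((h, w), dtype=bool)
    for k in range(length):        # AND over the sliding line window
        out &= padded[:, k:k + w]
    return out.T if vertical else out

mask = np.zeros((12, 12), dtype=bool)
mask[1:7, 1:7] = True   # a solid 6 x 6 "building"
mask[9:11, :] = True    # a 2-pixel-tall horizontal "road" strip

# vertical-line erosion removes the thin road; the building interior remains
eroded = line_erosion(mask, length=5, vertical=True)
```

A production pipeline would typically use a library routine such as `scipy.ndimage.binary_erosion` with a line-shaped structuring element.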
The LSTM network in the above steps is implemented by calling the LSTM library of the Python programming language. The method exploits the advantages of multi-temporal remote sensing images in urban change detection and building extraction, and reduces the influence of shadows and vegetation on building extraction. Multi-temporal remote sensing images are selected as the data source; the saturation and brightness spectral features of multi-temporal buildings are acquired using HSI transformation; the shape features of buildings are acquired using the method combining graph segmentation with conditional random field post-processing; the texture features of buildings are obtained using Gabor wavelet transformation; and the DSBI index of buildings is calculated to obtain their index features. The 24 original bands of the 6 scenes of original data and the 36 acquired building feature bands are arranged in satellite acquisition order to form a building feature set with 60 bands; the prepared building samples and labels are fed into the multi-feature LSTM network to obtain a coarse building extraction result, and an accurate building extraction result is obtained after morphological processing. Real unmanned aerial vehicle data are used to verify the building extraction results obtained by the method: the algorithm extracts buildings of various shapes (regular and irregular) and various building materials (cement, asphalt, glass and color steel) in the experimental areas, with an average overall accuracy of 93.1% and an average building extraction accuracy of 90.5%.
Experimental results: to illustrate the effectiveness of the multi-feature LSTM network method for extracting buildings from multi-temporal high-resolution remote sensing images, four traditional building extraction methods (based on spectral indices, the VGG network, the U-Net network and the ResNet network) were selected and compared with the method of the invention. Although the traditional methods can obtain fair building extraction results, they still suffer from low accuracy, high error rates and poor outlines. The multi-temporal high-resolution remote sensing image building extraction method based on the multi-feature LSTM network achieves higher accuracy, a lower misclassification rate and more complete building outlines; it benefits urban planning and development and can provide reliable data sources and a theoretical basis for smart-city construction.

Claims (5)

1. The method for extracting the buildings from the multi-temporal high-resolution remote sensing images based on the multi-feature LSTM network is characterized by comprising the following specific steps of:
step one, image preprocessing: firstly, Q GaoFen-2 (GF-2) high-resolution data images are each independently subjected to radiometric calibration, image registration, atmospheric correction and image fusion; then the selected remote sensing image data are co-registered; and finally the images are cropped to select the required image area, where Q ≥ 3;
step two, extracting the multi-temporal building features: after preprocessing the obtained high-resolution remote sensing image, extracting the spectral characteristics of the building by using a method based on HSI color transformation, extracting the shape characteristics of the building by using a method based on the combination of graph segmentation and conditional random field post-processing, extracting the texture information characteristics of the building by using a method based on Gabor wavelet transformation and extracting the index characteristics of the building by using a method based on DSBI index;
step three, constructing a multi-temporal multi-feature building data set:
obtaining the S (saturation) and I (brightness) transformation spectral features of buildings from the red, green, blue and near-infrared bands of each scene of GF-2 original satellite data by the HSI color transformation method; obtaining the S-transformation and I-transformation shape feature bands of buildings by the method combining graph segmentation with conditional random field post-processing; obtaining the texture feature band of buildings by the Gabor wavelet transformation method; and obtaining the index feature band of buildings by the DSBI index method, so that, together with the 4 original bands, 10 feature bands are obtained per scene; after the Q scenes of GF-2 original satellite data are processed, the data are arranged in order of satellite acquisition time to form new data with a spatial resolution of 1 m and Q × 10 multi-temporal building feature bands;
step four, selection of the training set: buildings and non-buildings have different characteristics, and training samples and labels are selected separately for the two classes; taking pixels as units, 20% of the pixels are selected for training, another 20% for validation, and all pixels are used to test the accuracy of the network model;
step five, constructing an LSTM network with the optimal number of units, training the LSTM network with the training set, and extracting buildings from the multi-temporal multi-feature building data set with the trained LSTM network:
(a) constructing part 1 of the LSTM network: part 1 is the input layer; the size of the input image is m × (Q × 10) × 1, where m is the batch_size, i.e. m pixels are selected from the training samples as one batch for each round of network training, m is set to 100, and Q × 10 is the total number of bands of the input data;
(b) constructing part 2 of the LSTM network: firstly, 1 Dropout layer is connected after 1 LSTM unit network structure to form a combination, and this combination is repeated n times to form part 2 of the LSTM network structure; the number of filters of a single LSTM unit network is 32, and the Dropout layer randomly deactivates 50% of the units in the network;
(c) constructing part 3 of the LSTM network: part 3 is a fully connected layer with 64 filters; its input is m × 320 and its output is m × 64;
(d) constructing part 4 of the LSTM network: a Softmax classifier is used as the output layer; the output size is m × 2, and the extraction results are divided into two classes, building and non-building;
step six, post-processing of the coarse extraction result: the coarse building extraction result is acquired using the multi-feature LSTM network, and the multi-feature LSTM coarse extraction result is post-processed by morphological vertical-line or horizontal-line erosion to obtain the final optimized result.
2. The method for extracting buildings based on multi-temporal high-resolution remote sensing images of multi-feature LSTM network as claimed in claim 1,
in step two, the HSI color transformation method is as follows: the RGB model of a single GF-2 image is converted into the HSI model, where H represents hue, S represents saturation and I represents brightness; the HSI color transformation formulas are:
H = arccos{ [(R − G) + (R − B)] / [ 2((R − G)² + (R − B)(G − B))^(1/2) ] }, with H = 360° − H when B > G ………(1)
S = 1 − 3·min(R, G, B)/(R + G + B) ………(2)
I = (R + G + B)/3 ………(3)
wherein R, G, B in formulas (1), (2) and (3) represent the red, green and blue bands of the original GF-2 image, respectively.
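Formulas (1)–(3) correspond to the standard RGB-to-HSI conversion and can be sketched in NumPy as follows; the function name, the assumption of reflectance inputs in [0, 1], hue in degrees, and the small `eps` guard against division by zero are all illustrative choices:

```python
import numpy as np

def rgb_to_hsi(R, G, B, eps=1e-12):
    """Standard RGB -> HSI conversion sketch (H in degrees, S and I in [0, 1])."""
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    H = np.where(B > G, 360.0 - theta, theta)       # hue
    S = 1.0 - 3.0 * np.minimum(np.minimum(R, G), B) / (R + G + B + eps)  # saturation
    I = (R + G + B) / 3.0                           # brightness (intensity)
    return H, S, I

# a pure red pixel: hue near 0 degrees, fully saturated, intensity 1/3
H, S, I = rgb_to_hsi(np.array([1.0]), np.array([0.0]), np.array([0.0]))
```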
3. The building extraction method of the multi-temporal high-resolution remote sensing image based on the multi-feature LSTM network as claimed in claim 1, wherein the method based on the combination of graph segmentation and conditional random field post-processing in the second step is as follows:
first step, graph segmentation:
(1) calculating the degree of dissimilarity between each pixel and its 4-neighborhood or 8-neighborhood, i.e. the edge weight between each pixel and its neighbors;
(2) sorting the edges between pairs of vertices by dissimilarity from small to large to obtain e_1, e_2, ..., e_N, then selecting the edge with the smallest dissimilarity and merging its vertices into one partition;
(3) judging whether the currently selected edge e_n (n = 2, 3, ..., N) can be merged; let the vertices it connects be (v_i, v_j); then the following merging conditions need to be satisfied:
① v_i and v_j do not belong to the same region, i.e. Id(v_i) ≠ Id(v_j), where Id(v_i) and Id(v_j) denote the regions in which v_i and v_j are located;
② the dissimilarity should not be greater than the minimum internal difference between the two regions, i.e. w(v_i, v_j) ≤ MInt(C_i, C_j);
wherein w(v_i, v_j) is the weight of edge (v_i, v_j), equal to the difference between the two vertices connected by the edge, and represents the dissimilarity between the vertices; Int(C) is the intra-class difference,
Int(C) = max_{e ∈ MST(C, E)} w(e)
where MST is a minimum spanning tree; MInt(C_i, C_j) denotes the minimum intra-class difference, and C_i and C_j denote any two regions;
(4) updating the threshold and the class label number:
firstly, updating the class label number: set the label numbers of Id(v_i) and Id(v_j) both to Id(v_i);
secondly, updating the dissimilarity threshold of this label number as follows:
Int(Id(v_i)) = w(v_i, v_j) + k/|Id(v_i)|, where k is a constant parameter and |Id(v_i)| is the number of pixels in the merged region;
(5) if n ≤ N (n = 2, 3, ..., N), select the next edge in the sorted order and go to step (3); otherwise end, obtaining the graph-based segmentation result S;
secondly, conditional random field post-processing:
first, the segmentation classes of the segmentation result S are denoted A = {a_1, a_2, ..., a_l}, and the segmentation result is recorded as B = {b_1, b_2, ..., b_g}; B is a probabilistic undirected graph; the essence of the method combining graph segmentation and CRF post-processing is that, taking the graph segmentation image S as the premise, the graph segmentation result is used as the unary potential value of the conditional random field, and the conditional probability values P(B = a_k | S), k = 1, 2, ..., m, of the m different classes of all pixels are solved;
P (B | S) can be represented as:
P(B|S) = (1/Z) ∏_μ ψ_μ(B_μ|S) ………(4)
Z = Σ_B ∏_μ ψ_μ(B_μ|S) ………(5)
in the formula, μ ranges over all maximal cliques of the undirected graph B, and Z is a normalization factor; ψ_μ(B_μ|S) is a function satisfying the Gibbs distribution, so the above formula can be converted as follows:
P(B|S) = (1/Z) exp(−E(B|S)) ………(6)
E(B|S) = Σ_i φ_1(b_i|S) + Σ_{i&lt;j} φ_2(b_i, b_j|S) ………(7)
wherein φ_1 is the unary potential function obtained by considering the information of each pixel independently, with the graph segmentation result used as its estimated value; φ_2 is the pairwise potential function, representing the influence of the relationships between the position and color information of the pixels on the segmentation, and can be obtained as:
φ_2(b_i, b_j|S) = λ(b_i, b_j)·k(f_i, f_j) ………(8)
in the formula,
k(f_i, f_j) = ω^(1) exp(−‖p_i − p_j‖²/(2α²) − ‖θ_i − θ_j‖²/(2β²)) + ω^(2) exp(−‖p_i − p_j‖²/(2γ²))
where p_i and p_j denote the physical positions of the i-th and j-th pixels respectively; θ_i and θ_j denote the color vectors of the i-th and j-th pixels respectively; ω^(1) and ω^(2) are energy weights and λ(b_i, b_j) is the classification compatibility weight, whose specific values need to be determined by extensive sample training; α and β denote the similarity and approximation degree between the segmented image and the original image respectively, and are set values in the range 0–100; γ is the smoothing kernel parameter and is set to 1.
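As an illustration of the pairwise term in equation (8), the following NumPy sketch evaluates a dense-CRF-style two-kernel potential (an appearance kernel over position plus color, and a smoothness kernel over position only). The exact kernel form and the default parameter values `w1`, `w2`, `alpha`, `beta`, `gamma` here are assumptions for illustration, since the patent gives the formula only as an image:

```python
import numpy as np

def pairwise_kernel(p_i, p_j, th_i, th_j,
                    w1=1.0, w2=1.0, alpha=60.0, beta=20.0, gamma=1.0):
    """Dense-CRF-style pairwise kernel sketch: appearance term + smoothness term.

    Nearby pixels with similar colors receive a large kernel value, encouraging
    them to take the same label during CRF post-processing.
    """
    d_pos = np.sum((p_i - p_j) ** 2)   # squared position distance
    d_col = np.sum((th_i - th_j) ** 2) # squared color-vector distance
    appearance = w1 * np.exp(-d_pos / (2 * alpha ** 2) - d_col / (2 * beta ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * gamma ** 2))
    return appearance + smoothness

# two adjacent pixels with nearly identical colors vs. very different colors
e_same = pairwise_kernel(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                         np.array([100.0, 100.0, 100.0]),
                         np.array([102.0, 100.0, 100.0]))
e_diff = pairwise_kernel(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                         np.array([100.0, 100.0, 100.0]),
                         np.array([20.0, 30.0, 40.0]))
```

Similar-looking neighbors get a larger pairwise value than dissimilar ones, which is what drives the label smoothing.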
4. The building extraction method of the multi-temporal high-resolution remote sensing image based on the multi-feature LSTM network according to claim 1, wherein in the second step, the Gabor wavelet transform method is as follows:
the red, green and blue band image data of the preprocessed GF-2 image are converted to grayscale, and the result is denoted Y: Y = {Y(i, j) | 1 ≤ i ≤ M, 1 ≤ j ≤ N}, where M is the height of the image and N is its width; Y(i, j) is an image pixel, and the convolution of the grayscale remote sensing image Y with the Gabor wavelet function can be expressed as:
G_{v,u}(z) = Y(z) * g_{v,u}(z) ………(9)
where z = (i, j) is the pixel coordinate of the image and * is the convolution operator; g_{v,u}(z) denotes the multichannel Gabor wavelet function, v and u denote the scale and orientation of the Gabor wavelet function respectively, and the result G_{v,u}(z) is the Gabor feature data extracted from the image.
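A Gabor filter bank with the 8 scales and 8 orientations used in the description can be generated as follows. This is a generic real-valued Gabor kernel sketch; the kernel size, sigma/wavelength schedules and aspect ratio `gamma` are illustrative choices, not the patent's parameters:

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: a Gaussian envelope modulating a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / lam + psi)

# 8 scales x 8 orientations, as selected in the description (v = 8, u = 8)
bank = [gabor_kernel(size=15, sigma=2.0 + v, theta=u * np.pi / 8, lam=4.0 + v)
        for v in range(8) for u in range(8)]
```

Convolving the grayscale image with each kernel in the bank yields the 64 texture response maps from which the texture feature band is derived.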
5. The building extraction method of the multi-temporal high-resolution remote sensing image based on the multi-feature LSTM network as claimed in claim 1, wherein the building DSBI index extraction method comprises the following steps:
DSBI=0.5(Blue-Green)+0.5(Blue-Red)………………………(10)
wherein "Blue", "Red" and "Green" in formula (10) represent spectral reflectance values in Blue, Red and Green wavelength bands, respectively.
CN202010395467.6A 2020-05-12 2020-05-12 Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network Active CN111582194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010395467.6A CN111582194B (en) 2020-05-12 2020-05-12 Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010395467.6A CN111582194B (en) 2020-05-12 2020-05-12 Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network

Publications (2)

Publication Number Publication Date
CN111582194A CN111582194A (en) 2020-08-25
CN111582194B true CN111582194B (en) 2022-03-29

Family

ID=72110770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010395467.6A Active CN111582194B (en) 2020-05-12 2020-05-12 Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network

Country Status (1)

Country Link
CN (1) CN111582194B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633292A (en) * 2020-09-01 2021-04-09 广东电网有限责任公司 Method for measuring temperature of oxide layer on metal surface
CN112132006B (en) * 2020-09-21 2022-08-26 西南交通大学 Intelligent forest land and building extraction method for cultivated land protection
CN113063741B (en) * 2021-03-12 2022-09-09 中国科学院空天信息创新研究院 Urban building material extraction method and device based on spectral characteristics
CN113221445B (en) * 2021-04-21 2023-01-17 山东师范大学 Method and system for estimating soil salinity by using joint characteristics of remote sensing images
CN113420619A (en) * 2021-06-07 2021-09-21 核工业北京地质研究院 Remote sensing image building extraction method
CN113505650B (en) * 2021-06-13 2023-06-16 北京林业大学 Topographic feature line extraction method, device and equipment
CN113792648A (en) * 2021-09-13 2021-12-14 杨凌龙翔数字科技有限公司 Satellite remote sensing orchard classification method based on multiple time phases
CN113822255B (en) * 2021-11-24 2022-03-01 腾讯科技(深圳)有限公司 Water body identification method and related device
CN115797775B (en) * 2022-12-14 2024-04-26 中国铁塔股份有限公司重庆市分公司 Intelligent illegal building identification method and system based on near-to-ground video image
CN115761518B (en) * 2023-01-10 2023-04-11 云南瀚哲科技有限公司 Crop classification method based on remote sensing image data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104049245A (en) * 2014-06-13 2014-09-17 中原智慧城市设计研究院有限公司 Urban building change detection method based on LiDAR point cloud spatial difference analysis
CN104778715A (en) * 2015-05-04 2015-07-15 福建师范大学 Remote sensing image building area detection method based on gradient features
CN107247938A (en) * 2017-06-08 2017-10-13 中国科学院遥感与数字地球研究所 A kind of method of high-resolution remote sensing image City Building function classification
CN107730496A (en) * 2017-10-26 2018-02-23 武汉大学 A kind of multi-temporal remote sensing image building change detecting method based on image blocks
CN108596103A (en) * 2018-04-26 2018-09-28 吉林大学 High resolution ratio satellite remote-sensing image building extracting method based on optimal spectrum Index selection
CN110136170A (en) * 2019-05-13 2019-08-16 武汉大学 A kind of remote sensing image building change detecting method based on convolutional neural networks
CN110287869A (en) * 2019-06-25 2019-09-27 吉林大学 High-resolution remote sensing image Crop classification method based on deep learning
JP2019175139A (en) * 2018-03-28 2019-10-10 株式会社パスコ Architectural structure extraction system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140081605A1 (en) * 2011-06-09 2014-03-20 Kyoto University Dtm estimation method, dtm estimation program, dtm estimation device, and method for creating 3-dimensional building model, and region extraction method, region extraction program, and region extraction device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104049245A (en) * 2014-06-13 2014-09-17 中原智慧城市设计研究院有限公司 Urban building change detection method based on LiDAR point cloud spatial difference analysis
CN104778715A (en) * 2015-05-04 2015-07-15 福建师范大学 Remote sensing image building area detection method based on gradient features
CN107247938A (en) * 2017-06-08 2017-10-13 中国科学院遥感与数字地球研究所 Method for functional classification of urban buildings in high-resolution remote sensing images
CN107730496A (en) * 2017-10-26 2018-02-23 武汉大学 Multi-temporal remote sensing image building change detection method based on image blocks
JP2019175139A (en) * 2018-03-28 2019-10-10 株式会社パスコ Architectural structure extraction system
CN108596103A (en) * 2018-04-26 2018-09-28 吉林大学 High-resolution satellite remote sensing image building extraction method based on optimal spectral index selection
CN110136170A (en) * 2019-05-13 2019-08-16 武汉大学 Remote sensing image building change detection method based on convolutional neural networks
CN110287869A (en) * 2019-06-25 2019-09-27 吉林大学 High-resolution remote sensing image crop classification method based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CRF Learning with CNN Features for Hyperspectral Image Segmentation; Fahim Irfan Alam et al.; 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS); 2016-11-03; pp. 6890–6893 *
Remote Sensing Image Change Detection Based on Information Transmission and Attention Mechanism; Ruochen Liu et al.; IEEE Access; 2019-10-14; Vol. 7; pp. 156349–156359 *
Stripe Noise Removal Method for MODIS Remote Sensing Imagery; Ruizhi Ren et al.; 2010 2nd International Conference on Computer Engineering and Technology; 2010-06-17; pp. V1-565–V1-569 *
Review of building extraction methods for high spatial resolution remote sensing imagery; Zhang Qingyun et al.; Geomatics & Spatial Information Technology (测绘与空间地理信息); 2015-04-25; Vol. 38, No. 4; pp. 74-78 *

Also Published As

Publication number Publication date
CN111582194A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111582194B (en) Multi-temporal high-resolution remote sensing image building extraction method based on multi-feature LSTM network
CN107292339B (en) Unmanned aerial vehicle low-altitude remote sensing image high-resolution landform classification method based on feature fusion
Wharton A Spectral-Knowledge-Based Approach for Urban Land-Cover Discrimination
CN110263717B (en) Method for determining land utilization category of street view image
CN113625363B (en) Mineral exploration method and device for pegmatite-type lithium ore, computer equipment and medium
CN110705449A (en) Land utilization change remote sensing monitoring analysis method
Zhang et al. Spatial domain bridge transfer: An automated paddy rice mapping method with no training data required and decreased image inputs for the large cloudy area
Dibs et al. Multi-fusion algorithms for detecting land surface pattern changes using multi-high spatial resolution images and remote sensing analysis
CN112017160A (en) Multi-strategy combination-based multi-source remote sensing image road material fine extraction method
Wang et al. Building extraction in multitemporal high-resolution remote sensing imagery using a multifeature LSTM network
CN113780307A (en) Method for extracting blue-green space information with maximum regional year
CN114241321A (en) Rapid and accurate identification method for high-resolution remote sensing image flat-topped building
Xu et al. Feature-based constraint deep CNN method for mapping rainfall-induced landslides in remote regions with mountainous terrain: An application to Brazil
Li et al. A new method for surface water extraction using multi-temporal Landsat 8 images based on maximum entropy model
Lin et al. A novel convolutional neural network architecture of multispectral remote sensing images for automatic material classification
Colkesen et al. Multi-seasonal evaluation of hybrid poplar (P. Deltoides) plantations using Worldview-3 imagery and state-of-the-art ensemble learning algorithms
Engstrom et al. Evaluating the Relationship between Contextual Features Derived from Very High Spatial Resolution Imagery and Urban Attributes: A Case Study in Sri Lanka
CN111882573A (en) Cultivated land plot extraction method and system based on high-resolution image data
Zhao et al. Improving object-oriented land use/cover classification from high resolution imagery by spectral similarity-based post-classification
Li et al. Measuring detailed urban vegetation with multisource high-resolution remote sensing imagery for environmental design and planning
Hashim et al. Multi-level image segmentation for urban land-cover classifications
Fawzy Urban Feature Extraction From High Resolution Satellite Images
Aryal et al. Boundary Aware U-Net for Glacier Segmentation
Bentir et al. Optimization of object—based image segmentation in classifying water region
Alzahrani et al. On Satellite Imagery of Land Cover Classification for Agricultural Development.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant