CN111259797A - Iterative remote sensing image road extraction method based on points - Google Patents
- Publication number
- CN111259797A CN111259797A CN202010046338.6A CN202010046338A CN111259797A CN 111259797 A CN111259797 A CN 111259797A CN 202010046338 A CN202010046338 A CN 202010046338A CN 111259797 A CN111259797 A CN 111259797A
- Authority
- CN
- China
- Prior art keywords
- road
- point
- segmentation
- remote sensing
- sensing image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/176 — Urban or other man-made structures
- G06F18/24 — Classification techniques
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners, strokes or intersections; connectivity analysis
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V20/182 — Network patterns, e.g. roads or rivers
Abstract
An iterative, point-based road extraction method for remote sensing images. Automatically extracting the road centerline graph from remote sensing imagery makes map collection faster and cheaper. To improve road connectivity while keeping the extracted road graph accurately aligned with the true road centerline, the invention proposes an iterative road-graph exploration method that is point-based, guided by segmentation cues, and explores with variable step lengths and trajectories. The segmentation cues take the form of centerline segmentation and intersection segmentation used as supervision information in the neural network; the variable step length means the network is trained with an adjustable step length at road intersections, road end points, and connection points; and the trajectory exploration method means that a single pass over the remote sensing image yields a time-ordered set of next drop points starting from the image center.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method for extracting a remote sensing image road by using a neural network and application thereof.
Background
Extracting roads from remote sensing images is a long-standing research topic in the remote sensing field. Conventional methods construct road maps with a variety of techniques, such as prior knowledge about nearby buildings and vehicles (Hinz et al.) [ISPRS Journal of Photogrammetry and Remote Sensing, 2003, 58(1-2):83-98] or shape factors (Song et al.) [Photogrammetric Engineering & Remote Sensing, 2004, 70(12):1365-...]. Minimum spanning trees have also been used to model road topology (Türetken et al.) [IEEE Conference on Computer Vision and Pattern Recognition, 2012, 566-...].
Since 2010, deep learning, with its strong classification and regression performance, has been the main research direction for extracting road maps of higher quality. Mnih et al. [Springer European Conference on Computer Vision, 2010, 210-223] first applied a restricted Boltzmann machine to the road extraction task, preprocessing the input data by dimensionality reduction and further applying post-processing to remove spurious roads or fill holes in broken roads. Saito et al. [Electronic Imaging, 2016(10):1-9] generated road segmentation directly from raw remote sensing images with convolutional neural networks (CNNs), without preprocessing, and extracted road centerlines with a cascaded neural network. Zhang et al. [IEEE Geoscience and Remote Sensing Letters, 2018, 749-...] proposed a deep residual U-Net for road extraction. The D-LinkNet of Zhou et al. [IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, 182-186] combines dilated convolutions with LinkNet to enlarge the receptive field for extracting roads from high-resolution satellite images.
To generate a vectorized road map, the connectivity of the generated road network and its alignment accuracy with real roads must both be considered. Currently there are two main deep-learning frameworks for obtaining road maps: one extracts a road segmentation and then post-processes it into a vectorized road map; the other converts the remote sensing image directly into a vectorized road map through iterative exploration.
Improvements within the first framework mainly concern the neural network used for road segmentation; the post-processing pipeline is essentially the same: first the road segmentation is binarized with a threshold, then morphological thinning is applied to obtain a single-pixel-wide road skeleton in which each activated pixel can be regarded as a vertex, and finally the Ramer-Douglas-Peucker algorithm removes redundant vertices from the map. Máttyus et al. [IEEE International Conference on Computer Vision, 2017, 3438-...] supervised a lightweight CNN with a soft-IoU loss during training and, after conversion to a road map, treated further post-processing as a shortest-path problem, using the A* algorithm to remove short edges and recover lost connections. Batra et al. [IEEE Conference on Computer Vision and Pattern Recognition, 2019, 10385-...] introduced direction learning and erasure-refinement learning into the neural network: direction learning gives the network the ability to model connections between pixels, while erasure-refinement learning refines the road segmentation output of the first network; the resulting road graph shows better connectivity on the average path length similarity (APLS) metric.
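The Ramer-Douglas-Peucker step mentioned above can be sketched in a few lines. This is a minimal illustrative implementation of the classical algorithm (not code from the patent): it keeps the interior vertex farthest from the chord between the endpoints whenever that distance exceeds a tolerance, and discards the rest.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker polyline simplification.

    Recursively keeps the interior point farthest from the chord between
    the two endpoints whenever its perpendicular distance exceeds
    `epsilon`; otherwise collapses the run to its endpoints.
    """
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance from p to the chord (x1,y1)-(x2,y2).
        px, py = p
        num = abs((y2 - y1) * px - (x2 - x1) * py + x2 * y1 - y2 * x1)
        den = math.hypot(x2 - x1, y2 - y1) or 1e-12
        return num / den

    idx, dmax = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
                    key=lambda t: t[1])
    if dmax > epsilon:
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right   # drop the duplicated split vertex
    return [points[0], points[-1]]
```

Applied to a skeleton traced into vertex chains, this removes the per-pixel redundancy while preserving corners such as intersections.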
The second framework employs an iterative exploration algorithm to generate the road map directly. In the training phase, starting points are generated from the ground-truth road-map labels; in the inference phase, Bastani et al. [IEEE Conference on Computer Vision and Pattern Recognition, 2018, 4720-...] generate starting points from an additional road segmentation network. The cropped remote sensing image centered on the current point is then fed to the neural network iteratively. In general, a road graph G is a vectorized representation of the road label map, containing a vertex set V = {v1, v2, …, vn} and an edge set E = {e1, e2, …, en}. The road graph is built by iteratively searching for a new vertex v along the road, adding it to the existing graph G, and adding a new edge e between the two vertices. Specifically, the iterative search starts from a set of starting vertices S, typically obtained from peak points of a road segmentation or road-intersection segmentation, and the vertex set V of G is initialized as a copy of S. At each step a vertex v is popped from S; the neural network takes the remote sensing image centered on v as input and predicts the next vertex set V'. If a predicted vertex v' ∈ V' has a matching vertex in the same region of V, the matching vertex is taken as the newly obtained vertex. The new vertex and the edge between the current vertex and the new vertex are then added to V and E respectively, G is updated, and S is updated to S ∪ V'. A new starting vertex is then taken from S and the search continues until a termination condition is met (for example, a predicted stop action, or S becoming empty). Other work uses polygons to fit the shapes of roads and buildings, with a polygon-based CNN-RNN structure that cyclically extracts road-geometry keypoints and applies the right-hand rule to outline roads.
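The generic search loop just described can be sketched as follows. This is an illustrative stub, not the patented network: `predict_next` stands in for the neural network (returning the predicted next vertices for the crop centered at `v`), and the matching radius is an assumed parameter.

```python
from collections import deque

def explore_road_graph(start_points, predict_next, match_radius=0.5,
                       max_steps=10000):
    """Iterative road-graph exploration loop.

    V, E are the growing vertex and edge sets; S is the frontier of
    vertices still to be expanded, initialised from the starting points.
    """
    V = list(start_points)
    E = []
    S = deque(start_points)
    steps = 0
    while S and steps < max_steps:
        v = S.popleft()
        for v_new in predict_next(v):
            # If the prediction lands near an existing vertex, snap to it
            # instead of creating a duplicate; this is what closes loops.
            match = next((u for u in V
                          if (u[0] - v_new[0]) ** 2 + (u[1] - v_new[1]) ** 2
                          <= match_radius ** 2), None)
            if match is None:
                V.append(v_new)
                S.append(v_new)   # only genuinely new vertices re-enter S
                match = v_new
            if v != match and (v, match) not in E:
                E.append((v, match))
        steps += 1
    return V, E
```

An empty prediction terminates a branch, and the loop ends when S is exhausted, mirroring the termination condition above.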
Over the evolution of road-map extraction methods, road segmentation has grown steadily stronger and road connectivity has received increasing attention. Existing segmentation-based methods generally lack connectivity constraints, while road maps generated by iterative-exploration methods are often poorly aligned with the road centerline. Fully exploiting both the centerline-fitting ability of road segmentation and the connectivity of iterative exploration is the key to improving road-map construction.
Disclosure of Invention
The invention aims to solve the problem that existing methods struggle to achieve both graph-level connectivity and pixel-level accuracy, and provides an iterative road-graph exploration method guided by road segmentation and assisted by dynamic step lengths and trajectory exploration.
The technical scheme of the invention is as follows:
A point-based iterative remote sensing image road extraction method comprises the following steps:
a. using a point as the position representation for the next exploration iteration, jointly constraining the movement direction and step-length information. Concretely: a two-dimensional Gaussian distribution centered at the coordinates of the next drop point serves as the point representation; it is stored in a two-dimensional map of the same size as the input remote sensing image, used as pixel-wise supervision during neural network training, and constrains the network output to point form in the inference stage;
b. using a variable exploration step length. Concretely: at road intersections, road end points, and connection points a variable step length is used, and in ordinary cases a fixed step length is used;
c. using road segmentation and intersection segmentation as information cues to guide the generation of drop points during iterative exploration. Concretely: road segmentation and intersection segmentation serve as supervision information for the neural network;
d. using a trajectory exploration method with one-pass multi-step prediction. Concretely: the features obtained from a single input of the remote sensing image are reused to generate a sequence of multi-step drop-point predictions starting from the current iteration center;
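The Gaussian point representation in step a can be sketched as follows. This is a minimal illustration under assumed parameters (map size and sigma are not specified in the text): the supervision target is a heatmap of the same spatial size as the input crop, with its peak on the next drop point.

```python
import numpy as np

def gaussian_point_map(shape, center, sigma=2.0):
    """Render the next drop point as a 2D Gaussian heatmap.

    The map is used as the pixel-wise supervision target; both movement
    direction and step length are encoded by where the peak sits relative
    to the crop center.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
    return g / g.max()   # peak normalised to 1 at the drop point
```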
the invention has the advantages and beneficial effects that:
the invention uses variable iterative exploration step length to realize dynamic alignment of a training stage and a road, uses road segmentation and intersection segmentation as information clues to guide exploration and realize better alignment of the road and the intersection, uses a road trunk characteristic sum as input through a neural network detector part in a network architecture, fuses the characteristics through rapid down-sampling and up-sampling, and amplifies and refines high-level and low-level road information through a decoder part to accurately generate predicted distribution. The finally generated road map has both map-level connectivity and pixel-level accuracy, and given the remote sensing image and the searched track, the predicted track can be effectively generated end to end, and the road map is formed.
Drawings
Fig. 1 is a schematic diagram of variable step size and segmentation cues in the present invention, wherein the first row is (a) fixed and (b) variable step size, and the second row is (c) no segmentation cue and (d) with segmentation cue.
Fig. 2 shows a next prediction representation in the present invention, wherein (a) is direction + fixed step size, (b) is point + fixed step size, and (c) is point + variable step size.
Fig. 3 shows a specific case of the variable step size in the present invention, wherein (a) the intersection of the road, (b) the end of the road, and (c) the connection point with the existing route.
FIG. 4 is a diagram of a neural network architecture in the present invention.
FIG. 5 is a schematic diagram of road-map inference results, where the lettered images are visualized road maps obtained with the technique: (a) input image, (b) ground-truth labels, (c) road segmentation, (d) intersection segmentation, (e) result with basic features only, (f) with intersection segmentation guidance, (g) with road segmentation guidance, and (h) with both intersection segmentation and road segmentation guidance.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The iterative exploration framework builds a road map by continually predicting next moves and incorporating the predictions into the existing road map. The invention adopts this framework and proposes several schemes to improve the quality of the constructed road map. First, a point-based representation of the next move is used; it jointly represents movement angle and distance, so multiple constraints can be applied during training without multiple forms of supervision. Second, a variable-step-size detection technique is proposed that benefits from the point-based representation: by simply moving the drop point in the supervision information, the training phase is aligned dynamically with the road. Road and intersection segmentation cues are also exploited to guide exploration and achieve better road and intersection alignment. Finally, a trajectory exploration technique improves efficiency by generating multi-channel outputs, each representing a time-ordered set of drop points. These schemes are unified in the proposed neural network to generate road maps with high connectivity and precise alignment.
Iterative point-based exploration: in this work, the position of the next move is represented as a point, which unifies the angle and distance of the move, as shown in Fig. 2(b). A two-dimensional Gaussian distribution centered at the coordinates of the next drop point serves as the point representation; it is stored in a two-dimensional map of the same size as the input remote sensing crop, used as pixel-wise supervision during neural network training, and constrains the network output to point form in the inference stage. In the training phase, the supervision of the next move is always placed on the road centerline, so the inference output can be expected to track the actual road iteratively. With point-based exploration posed as a pixel-level prediction task, the network can accurately predict the next move on the road centerline. During inference, the coordinates of the next step are read off the peak of the predicted distribution. Multiple constraints (e.g., direction and step size) can then be applied to the point representation during training without complex forms of supervision.
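Reading the next move off the predicted distribution can be sketched as follows; this is an illustrative decoder (names and the degree convention are assumptions), showing how the single peak yields the drop point and, implicitly, the equivalent angle and step length, which is why no separate direction or distance branch is needed.

```python
import math
import numpy as np

def decode_next_move(pred_map, center):
    """Decode a predicted point distribution into (drop point, step, angle).

    The argmax of the map gives the drop-point coordinates directly; step
    length and movement angle fall out of the same representation.
    """
    y, x = np.unravel_index(int(pred_map.argmax()), pred_map.shape)
    cy, cx = center
    step = math.hypot(y - cy, x - cx)
    angle = math.degrees(math.atan2(y - cy, x - cx))
    return (y, x), step, angle
```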
Variable step size of exploration: as shown in Fig. 2(a), previous methods detect the next move with a fixed-step angle classifier that divides 360 degrees into 64 equally spaced bins, outputs only the bin with the highest classification probability, and takes the bin center as the final direction. As shown in Fig. 3, roads contain various "non-trivial points", such as road intersections, road end points, and connection points. The road length between the current position and a nearby non-trivial point is rarely an integer multiple of a fixed step. As shown in Fig. 1(a), a fixed-step detector that meets an intersection on its next move may produce a path that does not coincide with the real road. To keep roads and intersections accurately aligned, the method designs a variable-step scheme. In the training phase, exploration is performed on an empty graph under the supervision of the ground-truth label graph; at each exploration step, the ground-truth annotation is tracked dynamically to generate the next-move supervision. Denote the fixed step by s; the variable step is designed as an adjustable length between 0.5×s and 1.5×s. Specifically, when a non-trivial point lies within 1.5×s of the current vertex, the Gaussian supervision is generated exactly on that point. With variable step sizes, non-trivial points (e.g., intersections) are handled easily, so the generated road-map structure aligns with the real road. This also helps to enhance graph connectivity: a special case is shown in Fig. 3(c), where the network output can easily match and connect previously interrupted probe points thanks to the variable step length.
When there is no non-trivial point in the next exploration area, a new Gaussian supervision is generated with a fixed-step strategy starting from the current point. In summary, fixed-step supervision is used in the middle of roads, switching to variable steps around non-trivial points. A conventional angle-learning method would need a carefully designed additional step-length branch to achieve such variable steps. In contrast, thanks to the proposed "point" representation, the point-based detector can learn variable step sizes through training without adding a movement-distance branch, as shown in Fig. 2(c), where the movement distance is encoded directly into the point representation.
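The choice of supervision target described in the last two paragraphs can be sketched as follows. This is an illustrative simplification (function names are assumptions, and only the 1.5×s upper bound is checked; the text's 0.5×s lower bound arises because supervision only snaps near non-trivial points).

```python
import math

def supervision_target(current, direction, nontrivial_points, s):
    """Pick the next-step supervision point with a variable step length.

    If a non-trivial point (intersection, end point, or connection point)
    lies within 1.5*s of the current vertex, the Gaussian supervision is
    placed exactly on the nearest one; otherwise a fixed step of length s
    is taken along the road direction.
    """
    near = [p for p in nontrivial_points
            if math.dist(current, p) <= 1.5 * s]
    if near:
        return min(near, key=lambda p: math.dist(current, p))
    dx, dy = direction
    norm = math.hypot(dx, dy) or 1.0
    return (current[0] + s * dx / norm, current[1] + s * dy / norm)
```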
Track exploration: in the framework of iterative exploration, each step may introduce slight errors. Inspired by techniques such as long-term reward and experience replay, the method can predict the trajectory of the movement by outputting a plurality of sequential movements given one input. We do this by sending down-sampled next step motion predictions back periodically to the neural network detector section (hourglass module) up to T times. It should be noted that we extract image features only once, given the remote sensing image as input. By using a recursive mechanism, the neural network will obtain the ability to predict longer distances for future trajectories and reduce the overall error.
Segmentation cues: unlike exploration mechanisms that focus on the local next step, segmentation gives the neural network a more global view. Comparing the results of Bastani et al. in Fig. 1(c) with ours in (d), it can be seen that, over longer ranges, a method that does not know where exploration should go will drift away from the real road at road segments and intersections.
Road segmentation cues: the goal of road segmentation is to extract the road centerline from the remote sensing image. As shown in Fig. 1(b), the road centerline represents the topology of the road map well from a macroscopic perspective. Road segmentation serves two main purposes in the method. First, iterative exploration focuses on the location of the local next move but lacks a comprehensive understanding of the road area (i.e., where the real road lies); as seen in Fig. 5(e), (g), and (h), guidance from the segmented road regions reduces misalignment with real roads. Second, since the next move is predicted as a point, road segmentation is a natural source of candidate search points and therefore provides reasonable centerline guidance prior to exploration.
Intersection segmentation cues: intersection segmentation suits the variable-step method, as it can guide the next-move prediction ahead of an intersection. Since an intersection in a remote sensing image is usually a region, intersection segmentation cues help the network learn the exact intersection location during training. Without them, when several road segments meet in an intersection region, the exploration method may produce too few or too many intersections because of its short-sighted nature. As shown in Fig. 5(e) and (g), without intersection segmentation it is often difficult to generate a single consistent intersection at a complex junction, whereas under intersection-segmentation guidance the prediction is more reasonable. As with road segmentation, intersection segmentation gives prior information on intersection positions, helping the network learn distance patterns and choose step sizes that localize intersections precisely.
Neural network architecture: as shown in Fig. 4, VGG is adopted as the backbone, the side-branch outputs of the feature pyramid are fused, and basic features are extracted at one quarter of the remote-sensing image scale. To make the network aware of the direction already travelled, an explored-path segmentation map is generated from the intermediate road-map result obtained during inference and used as an additional input. The fused side-branch features are used to generate road and intersection segmentation predictions, which are supervised by the road and intersection labels to guide the backbone to learn a basic road representation. The hourglass module is a pyramid-shaped feature autoencoder: it takes the road backbone features (together with the explored-path input) and fuses them by fast down-sampling and up-sampling. When segmentation cues are used as implicit guidance, the input to the hourglass module additionally merges the joint segmentation-cue features; for simplicity, the same symbols denote the mid-network segmentation features and the output predictions according to context. These features serve as inputs to the hourglass module and lend good interpretability to the design. After the hourglass module, the ground-truth Gaussian label maps provide side supervision to ensure a coarse Gaussian distribution for the next step. The decoder part of the network magnifies and refines high-level and low-level road information, helping to generate the predicted distribution accurately. Finally, supervision on the decoder output ensures a fine next-step prediction. Moreover, thanks to multi-task joint learning, the method is end to end, and no separate network is needed to obtain starting points.
To predict T steps recurrently, the final prediction is down-sampled to quarter scale and reused by concatenating it with the input of the hourglass module described above. A zero-initialized placeholder keeps the number of feature channels consistent at the first time step. Given the trajectory labels, the next-move probability label map is obtained with the same down-sampling operation D(·), so the network can be looped over T time steps. If the explored segment meets an existing one before T time steps, say at time step t = k with k < T, the time steps after k + 1 are ignored when computing the loss, since supervision beyond t = k + 1 would be ambiguous for the connection of vertices.
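The per-time-step loss masking just described can be written as a small helper; this is an illustrative sketch (the function name is an assumption) of keeping min(k + 1, T) supervised steps.

```python
def timestep_loss_weights(T, k=None):
    """Per-time-step loss mask for trajectory supervision.

    If the explored segment meets an existing one at time step k (k < T),
    supervision for steps after k + 1 would be ambiguous, so those terms
    are dropped; min(k + 1, T) steps are kept. With k=None all T steps
    are supervised.
    """
    kept = T if k is None else min(k + 1, T)
    return [1.0 if t < kept else 0.0 for t in range(T)]
```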
As for architectural details, the hourglass module consists of 4 levels of down-sampling and 4 levels of up-sampling with residual connections, where each level contains two Conv-ReLU layers with convolution kernel size 3. Each decoder block sums the 32-channel backbone feature computed by the previous block with the 32-channel next-move feature, followed by two 3×3 convolutional layers. The standard binary cross-entropy loss is optimized separately for the coarse and fine next-step predictions. The total loss is the sum of the road-segmentation and intersection-segmentation losses plus, over the supervised time steps, λ1 times the coarse next-step losses and λ2 times the fine next-step losses, where L(X, Y) denotes the binary cross-entropy loss between a prediction matrix X and the ground-truth label matrix Y, U(·) denotes an up-sampling function that brings the coarse prediction to label resolution, the number of supervised time steps is determined by min(k + 1, T), and λ1 and λ2 are both set to 1.
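The total loss just described can be sketched numerically; this is an illustrative reconstruction under assumptions (symbol and argument names are not from the patent), combining the two segmentation terms with the masked per-time-step coarse and fine next-step terms.

```python
import numpy as np

def bce(X, Y, eps=1e-7):
    """Binary cross-entropy between prediction map X and label map Y."""
    X = np.clip(X, eps, 1.0 - eps)
    return float(-np.mean(Y * np.log(X) + (1.0 - Y) * np.log(1.0 - X)))

def total_loss(road_pred, road_gt, ints_pred, ints_gt,
               coarse_preds, fine_preds, step_gts, weights,
               lam1=1.0, lam2=1.0):
    """Sum of the supervision terms described above.

    Segmentation terms plus the per-time-step coarse and fine next-step
    terms; `weights` masks out the ambiguous steps after an early
    termination (0 entries drop those terms).
    """
    seg = bce(road_pred, road_gt) + bce(ints_pred, ints_gt)
    steps = sum(w * (lam1 * bce(c, g) + lam2 * bce(f, g))
                for c, f, g, w in zip(coarse_preds, fine_preds,
                                      step_gts, weights))
    return seg + steps
```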
Table 1 shows performance on three evaluation metrics under different configurations, where P-F1 denotes pixel-level road accuracy, J-F1 denotes intersection (junction) matching accuracy, and APLS measures graph connectivity.
| Variable step size | Road segmentation | Intersection segmentation | Trajectory exploration | P-F1 | J-F1 | APLS |
| --- | --- | --- | --- | --- | --- | --- |
|  |  |  |  | 53.28 | 36.25 | 34.69 |
| √ |  |  |  | 56.42 | 43.83 | 46.22 |
| √ | √ |  |  | 68.28 | 56.21 | 49.46 |
| √ |  | √ |  | 61.28 | 55.49 | 50.75 |
| √ | √ | √ |  | 69.81 | 59.42 | 57.28 |
| √ | √ | √ | √ | 73.43 | 62.26 | 58.40 |
TABLE 1
As can be seen from Table 1:
1. introducing the variable step length greatly improves intersection accuracy, which improves the quality of the input to iterative exploration and indirectly improves both road-map connectivity and pixel accuracy;
2. introducing road segmentation and intersection segmentation greatly improves, respectively, the pixel-level accuracy and the connectivity of the road map, demonstrating the effectiveness of segmentation guidance;
3. introducing trajectory exploration improves both pixel-level accuracy and road-map connectivity, demonstrating that trajectory exploration gives the method a better overall perception of the explored trajectory and strengthens its road-map construction ability.
Claims (7)
1. A point-based iterative remote sensing image road extraction method is characterized by comprising the following steps:
a. using points as the position representation for iterative exploration, jointly constraining the movement direction and step-length information;
b. using a variable step length during iterative exploration to achieve dynamic alignment with the road in the training stage;
c. using road segmentation and intersection segmentation as information cues to guide exploration and achieve better alignment with roads and intersections;
d. generating multi-channel output with a trajectory exploration method that predicts multiple steps at once;
the above steps are consolidated into a single neural network architecture to generate road graphs with high connectivity and precise alignment.
2. The point-based iterative remote sensing image road extraction method according to claim 1, characterized in that: a point is used as the position representation for iterative exploration; a two-dimensional Gaussian distribution centred on the coordinates of the next landing point serves as the representation of the point, is stored as a two-dimensional map, and is used as pixel-wise supervision during neural network training.
3. The point-based iterative remote sensing image road extraction method according to claim 1, characterized in that: a variable step length is used at road intersections, road end points and connection points, and a fixed step length is used in all other cases.
4. The point-based iterative remote sensing image road extraction method according to claim 1, characterized in that: road segmentation and intersection segmentation are used as supervisory information for the neural network.
5. The point-based iterative remote sensing image road extraction method according to claim 1, characterized in that: the trajectory exploration method reuses the features obtained from a single input of the remote sensing image to generate multi-step landing-point predictions starting from the current iteration centre point.
6. The point-based iterative remote sensing image road extraction method according to claim 1, characterized in that: the network architecture comprises a neural network detector part and a decoder part; VGG is used as the backbone network, the side-branch outputs of the neural network's feature pyramid are fused, and basic features of the remote sensing image are extracted at quarter scale; an exploration-path segmentation generated from the inferred intermediate road-graph result serves as an intermediate input, and road and intersection segmentation predictions are generated from the fused side-branch output features; the neural network detector part fuses features through fast down-sampling and up-sampling together with the road trunk features taken as input, and the decoder part is designed to amplify and refine high-level and low-level road information so as to generate the predicted distribution accurately.
7. The point-based iterative remote sensing image road extraction method according to claim 6, characterized in that: the neural network detector part is constructed from 4 levels of down-sampling and 4 levels of up-sampling with residual connections, each level containing two Conv-ReLU layers with a convolution kernel size of 3; each decoder block sums the 32-channel trunk feature computed from the previous block and the 32-channel next-motion feature, followed by two 3 × 3 convolutional layers.
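The point representation of claim 2 can be sketched as follows; `sigma`, the spread of the Gaussian, is an assumed parameter not fixed by the claims:

```python
import numpy as np

def point_heatmap(h, w, cy, cx, sigma=2.0):
    """Render claim 2's point representation: a two-dimensional Gaussian
    centred on the next landing point (cy, cx), stored as an (h, w) map
    and usable as pixel-wise supervision for the network."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
```

The map peaks at 1.0 exactly on the landing point and decays smoothly around it, which gives a denser training signal than a single one-hot pixel.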
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010046338.6A CN111259797A (en) | 2020-01-16 | 2020-01-16 | Iterative remote sensing image road extraction method based on points |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111259797A (en) | 2020-06-09 |
Family
ID=70952142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010046338.6A Pending CN111259797A (en) | 2020-01-16 | 2020-01-16 | Iterative remote sensing image road extraction method based on points |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111259797A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080247669A1 (en) * | 2007-04-04 | 2008-10-09 | National Central University | Method of Ortho-Rectification for high-resolution remote sensing image |
CN101714211A (en) * | 2009-12-04 | 2010-05-26 | 西安电子科技大学 | Detection method of high-resolution remote sensing image street center line |
CN101770581A (en) * | 2010-01-08 | 2010-07-07 | 西安电子科技大学 | Semi-automatic detecting method for road centerline in high-resolution city remote sensing image |
US20170140245A1 (en) * | 2015-11-16 | 2017-05-18 | Orbital Insight, Inc. | Moving vehicle detection and analysis using low resolution remote sensing imagery |
CN108830844A (en) * | 2018-06-11 | 2018-11-16 | 北华航天工业学院 | A kind of facilities vegetable extracting method based on multidate high-resolution remote sensing image |
CN110059758A (en) * | 2019-04-24 | 2019-07-26 | 海南长光卫星信息技术有限公司 | A kind of remote sensing image culture pond detection method based on semantic segmentation |
CN110175574A (en) * | 2019-05-28 | 2019-08-27 | 中国人民解放军战略支援部队信息工程大学 | A kind of Road network extraction method and device |
CN110543885A (en) * | 2019-08-13 | 2019-12-06 | 武汉大学 | method for interactively extracting high-resolution remote sensing image road and generating road network |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767810A (en) * | 2020-06-18 | 2020-10-13 | 哈尔滨工程大学 | Remote sensing image road extraction method based on D-LinkNet |
CN111767810B (en) * | 2020-06-18 | 2022-08-02 | 哈尔滨工程大学 | Remote sensing image road extraction method based on D-LinkNet |
CN111882882A (en) * | 2020-07-31 | 2020-11-03 | 浙江东鼎电子股份有限公司 | Method for detecting cross-lane driving behavior of automobile in dynamic flat-plate scale weighing area |
CN113065594A (en) * | 2021-04-01 | 2021-07-02 | 中科星图空间技术有限公司 | Road network extraction method and device based on Beidou data and remote sensing image fusion |
CN113191213A (en) * | 2021-04-12 | 2021-07-30 | 桂林电子科技大学 | High-resolution remote sensing image newly-added building detection method |
CN112801075A (en) * | 2021-04-15 | 2021-05-14 | 速度时空信息科技股份有限公司 | Automatic rural road boundary line extraction method based on aerial image |
CN112801075B (en) * | 2021-04-15 | 2021-07-27 | 速度时空信息科技股份有限公司 | Automatic rural road boundary line extraction method based on aerial image |
CN113435833A (en) * | 2021-06-11 | 2021-09-24 | 泰瑞数创科技(北京)有限公司 | City three-dimensional model collaborative management method and system for smart community |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
2024-04-19 | AD01 | Patent right deemed abandoned | Effective date of abandoning: 20240419 |