CN112488046A - Lane line extraction method based on high-resolution images of unmanned aerial vehicle - Google Patents


Info

Publication number
CN112488046A
Authority
CN
China
Prior art keywords
image
unmanned aerial
sliding window
lane line
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011479306.1A
Other languages
Chinese (zh)
Other versions
CN112488046B (en)
Inventor
余卓渊
吕可晶
张颖超
石智杰
严虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Geographic Sciences and Natural Resources of CAS filed Critical Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN202011479306.1A priority Critical patent/CN112488046B/en
Publication of CN112488046A publication Critical patent/CN112488046A/en
Application granted granted Critical
Publication of CN112488046B publication Critical patent/CN112488046B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/182Network patterns, e.g. roads or rivers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention discloses a lane line extraction method based on high-resolution images of an unmanned aerial vehicle, which comprises the following steps: s1, constructing and training a U-Net network model; s2, segmenting the aerial image of the unmanned aerial vehicle by using the trained U-Net network model to obtain an ROI mask; s3, extracting gradient features and color features of the aerial images of the unmanned aerial vehicle to obtain extracted images, and filtering background noise of the extracted images by using an ROI mask; s4, performing threshold segmentation on the extracted image with the background noise filtered by using an Otsu algorithm, and extracting lane line features to obtain a binarized feature image; s5, performing quality enhancement on the characteristic image by using an image filtering algorithm; s6, determining the number of lane lines in the feature image and the initial positions of the lane lines in the feature image by using the feature histogram; and S7, carrying out positioning detection on the lane lines by using a sliding window algorithm and extracting each lane line in the characteristic image by polynomial fitting.

Description

Lane line extraction method based on high-resolution images of unmanned aerial vehicle
Technical Field
The invention relates to the field of image processing, in particular to a lane line extraction method based on high-resolution images of unmanned aerial vehicles.
Background
With the rise of automatic driving and other novel location-based services and industries, new requirements are placed on the accuracy, content structure and computation mode of maps, so that the service object of a map is no longer only a human user but is gradually shifting to machines. The traditional navigation map model generally only depicts the position and shape of a road and does not reflect detailed road information. To meet these new demands, the concepts of high-precision navigation maps and holographic position maps have been developed. In such map models, lane lines are both basic data and an important component of the road data model.
Traditional lane line detection systems are mainly applied in the field of intelligent vehicle driving; they acquire lane line information based on sensors such as vehicle-mounted monocular or binocular cameras and then extract and detect the lane lines. Most existing visual lane line detection algorithms are based on ground image data. These algorithms usually show good adaptability to single-frame image data, but when they are applied to large-scale mapping, efficiency and precision become problems. The application of unmanned aerial vehicles brings a new perspective to large-range lane line detection: a UAV can acquire road surface structure information with centimeter-level spatial resolution, and road surface marking information can be obtained efficiently and clearly, so the extraction requirements of lane lines can be met. However, compared with ground vehicle-mounted vision sensors, UAV high-resolution remote sensing images have a larger scale and a different viewing angle; the larger imaging range brings efficiency but also more noise, and the detection target often appears very small in the image. Direct extraction from UAV remote sensing images suffers from problems such as sample imbalance and background noise, and the repeated convolution operations of deep learning often cause loss of detail, which makes lane line information easy to lose. For these reasons, there is currently little research on extracting lane lines from high-resolution remote sensing images.
Disclosure of Invention
Aiming at the above problems, the invention provides a lane line extraction method based on high-resolution images of an unmanned aerial vehicle, which can improve large-range mapping efficiency and mapping precision.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a lane line extraction method based on high-resolution images of unmanned aerial vehicles comprises the following steps:
s1, constructing and training a U-Net network model;
s2, segmenting the aerial image of the unmanned aerial vehicle by using the trained U-Net network model to obtain an ROI mask;
s3, extracting gradient features and color features of the aerial images of the unmanned aerial vehicle to obtain extracted images, and filtering background noise of the extracted images by using an ROI mask;
s4, performing threshold segmentation on the extracted image with the background noise filtered by using an Otsu algorithm, and extracting lane line features to obtain a binarized feature image;
s5, performing quality enhancement on the characteristic image by using an image filtering algorithm;
s6, determining the number of lane lines in the feature image and the initial positions of the lane lines in the feature image by using the feature histogram;
and S7, carrying out positioning detection on the lane lines by using a sliding window algorithm and extracting each lane line in the characteristic image by polynomial fitting.
Further, in the step S1, a binary cross entropy loss function is used to perform U-Net network model training; the binary cross entropy loss function is defined as follows:
Loss(X, Y) = -(1/(W·H)) · Σ_{i=1..W} Σ_{j=1..H} [ y_ij·log(x_ij) + (1 - y_ij)·log(1 - x_ij) ]
wherein: X and Y respectively represent the predicted image and the real label; W and H are the width and height of the predicted image, respectively; x_ij and y_ij respectively represent the pixel values of the predicted image and the real label at (i, j).
Further, the ROI mask obtained by the segmentation is subjected to hole filling by using a dilation algorithm in step S2.
Further, when the gradient feature of the aerial image of the unmanned aerial vehicle is obtained in step S3, the aerial image of the unmanned aerial vehicle is convolved by using two convolution kernels respectively, so as to obtain a gradient Gx along the width and a gradient Gy along the height, where the gradient value G is calculated by the following formula:
G = sqrt(Gx² + Gy²);
for a discrete gray-scale image, the gradient value G is calculated by the formula:
G=|Gx|+|Gy|;
the calculation formula of the gradient direction is as follows:
θ = arctan(Gy / Gx).
further, in the step S3, when the color feature of the aerial image of the unmanned aerial vehicle is obtained, an HSL color model is used for obtaining the color feature; the calculation formula is as follows:
Vmax←max(R,G,B);
Vmin←min(R,G,B);
L ← (Vmax + Vmin) / 2;
S ← (Vmax - Vmin) / (Vmax + Vmin) if L ≤ 0.5, S ← (Vmax - Vmin) / (2 - Vmax - Vmin) if L > 0.5;
H ← 60·(G - B)/(Vmax - Vmin) if Vmax = R, H ← 60·(2 + (B - R)/(Vmax - Vmin)) if Vmax = G, H ← 60·(4 + (R - G)/(Vmax - Vmin)) if Vmax = B, with H ← H + 360 if H < 0;
wherein: l is more than or equal to 0 and less than or equal to 1, S is more than or equal to 0 and less than or equal to 1, H is more than or equal to 0 and less than or equal to 360, and L, S and H respectively represent each channel value of the HSL color space.
Further, in the step S3, the gradient feature and the color feature are combined and then synchronously extracted; the joint formula is as follows:
(Gx ∩ Gdir ∩ Gmag) ∪ HSLl;
wherein: Gx represents the transverse gradient binary matrix of the unmanned aerial vehicle aerial image, Gdir represents the gradient direction threshold binary matrix of the unmanned aerial vehicle aerial image, Gmag represents the gradient magnitude threshold binary matrix of the unmanned aerial vehicle aerial image, and HSLl represents the threshold binary matrix of the S channel of the HSL color space.
Further, in step S6, the number of non-zero pixels in each column in the feature histogram is counted, then the local maximum value of the feature histogram is obtained according to the pixel distance between the lane lines, and the number of the lane lines in the feature image and the initial position of the lane lines in the feature image are determined according to the local maximum value of the feature histogram.
Further, the step S7 includes the following steps:
s71, setting the size of the sliding window and the minimum number of nonzero pixels in the sliding window, and calculating the number n of the sliding window according to the height of the sliding window and the size of the aerial image of the unmanned aerial vehicle;
s72, respectively taking the position of each lane line as the middle point of the lower boundary of the initial sliding window, and storing the coordinates of all non-zero pixel points in the initial sliding window;
s73, counting the number of non-zero pixel points in the current sliding window and comparing it with the threshold; if the number is larger than the threshold, taking the mean of the horizontal coordinates of all non-zero pixel points in the window as the midpoint of the lower edge of the next sliding window, with the vertical coordinate of the upper edge of the current window always used as the vertical coordinate of the lower edge of the next window; repeating these steps until the number of sliding windows reaches n, thus completing the sliding window detection;
and S74, after the sliding window detection is completed, performing polynomial fitting according to the coordinates of the nonzero pixel points stored in the sliding window, and respectively extracting each lane line in the characteristic image.
Compared with the prior art, the invention has the advantages and positive effects that:
Firstly, semantic segmentation is performed on the high-resolution unmanned aerial vehicle image with a classical U-Net network to obtain an ROI mask that filters out a large amount of background noise. Meanwhile, the combined gradient and color features of the image are extracted without repeated convolution operations, which avoids the loss of detail information and the long running time of large-range extraction. The number and initial positions of the lane lines are determined by detecting local maxima of the feature histogram, and a sliding window algorithm realizes the detection and extraction of multiple lane lines. The method is simple, convenient and fast to operate: lane line extraction can be performed from aerial images of an unmanned aerial vehicle, and compared with collecting lane line information with a ground vehicle-mounted vision sensor, it offers high large-range mapping efficiency and high precision, pointing a direction for road information collection work in mapping.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a block diagram of the framework of the present invention;
FIG. 2 is a schematic diagram of the architecture of a U-Net network model;
FIG. 3 is a schematic diagram of a feature histogram and local maxima;
FIG. 4 is a block flow diagram of a sliding window detection framework;
fig. 5 is a diagram showing the effect of lane line extraction.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments of the present invention by a person skilled in the art without any creative effort, should be included in the protection scope of the present invention.
The embodiment discloses a lane line extraction method based on high-resolution images of an unmanned aerial vehicle; the extraction process is shown in fig. 1. Firstly, a U-Net network model is constructed and trained, and the unmanned aerial vehicle aerial image is segmented with the trained U-Net network to obtain the road area as an ROI mask. Then, the gradient and color features of the image are extracted, background noise is filtered out by means of the ROI mask, and a binary feature image is obtained with the Otsu algorithm. Image filtering algorithms are then comprehensively applied to enhance the quality of the feature image, the number and initial positions of the lane lines are determined by means of the feature histogram, and a sliding window algorithm is used to locate the lane lines and perform polynomial fitting.
U-Net network model construction
The U-Net network is built on the basis of the FCN network; originally developed for medical image processing, it is now also widely applied to remote sensing ground object extraction. As shown in fig. 2, like most CNN-type networks, the U-Net network is mainly composed of an input layer, hidden layers and an output layer, and has a U-shaped encoder-decoder structure. The encoding stage (left side of fig. 2) is similar to the VGG network and mainly consists of simple convolution and pooled downsampling. The invention uses convolution kernels of 3 x 3 and 1 x 1 to perform convolution operations: 3 x 3 for extracting features and 1 x 1 for changing dimensions. In addition, the invention performs maximum pooling with a size of 2 x 2, resulting in a total of 5 scales of feature maps including the input. In the decoding stage (right side of fig. 2), upsampling and multi-scale feature fusion are performed, the spatial dimension is gradually restored, and the detailed information of the object is recovered.
The present invention performs upsampling by transposed convolution and then performs feature fusion. This differs from the skip-level structure of the FCN, which simply sums the outputs of earlier layers with the final output in order to add detail information to the segmentation result and obtain a finer result.
The U-Net network adopted by the invention copies the feature information of each scale in the encoding stage, superposes it onto the corresponding feature map in the decoding stage, and concatenates it along the channel dimension to form thicker high-dimensional features, thereby reducing the information loss of the down-sampling process, adding detail information to the feature map and improving segmentation precision.
As shown in fig. 2, as the number of network layers increases, the distribution of the inputs to the activation functions gradually shifts during training, so that the gradients of the lower layers of the network vanish during back propagation. To address this gradient vanishing problem, Batch Normalization (BN) is used to fix the input distribution of each layer of the network and thereby alleviate gradient diffusion.
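As a minimal illustrative sketch of the building block described above (3 x 3 convolutions with Batch Normalization followed by 2 x 2 max pooling), the following PyTorch snippet shows one encoder stage; the module name and channel sizes are assumptions chosen for illustration, not values fixed by the invention.

    import torch.nn as nn

    class DoubleConvBN(nn.Module):
        """Two 3x3 convolutions, each followed by Batch Normalization and ReLU."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),   # fixes the input distribution of each layer
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.block(x)

    # One encoder stage: the output is kept for the skip connection, then downsampled.
    down1 = DoubleConvBN(3, 64)          # channel sizes are illustrative only
    pool = nn.MaxPool2d(kernel_size=2)   # 2 x 2 max pooling halves the spatial size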
U-Net network model training
To ensure that the parameters converge quickly during back-propagation training and to avoid the gradient vanishing problem, the invention, as in most image segmentation tasks, avoids a quadratic loss function and adopts a binary cross entropy (BCE) loss; the minimized binary cross entropy loss function is used for U-Net network training and is defined as follows:
Loss(X, Y) = -(1/(W·H)) · Σ_{i=1..W} Σ_{j=1..H} [ y_ij·log(x_ij) + (1 - y_ij)·log(1 - x_ij) ]
wherein: X and Y respectively represent the predicted image and the real label; W and H are the width and height of the image, respectively; x_ij and y_ij respectively represent the pixel values of the predicted image and the real label at (i, j).
Although the U-Net network can show strong segmentation performance on small data sets, its performance as a deep learning network depends heavily on the training data set. The invention augments the data set by random rotation, cropping, mirroring and similar operations. Owing to hardware limitations, the batch size is set to 6, and the RMSprop optimization algorithm is used for three training stages with 40, 40 and 20 epochs and corresponding learning rates of 0.01, 0.005 and 0.0001. Semantic segmentation is then performed on the aerial image with the trained network model, the quality of the segmented ROI is enhanced, and holes in the ROI are filled with a dilation algorithm.
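A rough sketch of the training configuration described above (binary cross entropy loss, RMSprop, batch size 6, three stages with decreasing learning rates), assuming a stand-in model and random tensors in place of the real U-Net and data set, which the invention does not specify in code:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in model and data so the sketch runs; the real U-Net and data set replace these.
    model = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1), nn.Sigmoid())
    images = torch.rand(12, 3, 320, 320)                          # tiles resized to 320 x 320
    masks = (torch.rand(12, 1, 320, 320) > 0.5).float()
    train_loader = DataLoader(TensorDataset(images, masks), batch_size=6, shuffle=True)

    criterion = nn.BCELoss()                                      # binary cross entropy loss

    # Three training stages with learning rates 0.01 / 0.005 / 0.0001
    # (40, 40 and 20 epochs in the experiment; shortened to 1 each here).
    for epochs, lr in [(1, 0.01), (1, 0.005), (1, 0.0001)]:
        optimizer = torch.optim.RMSprop(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, y in train_loader:
                loss = criterion(model(x), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()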
Feature extraction
China has explicit regulations on the line type, color and other properties of lane lines. Compared with the surrounding road surface, lane lines have obvious edge characteristics such as gradient and gray level, so their detection and positioning, and further the identification and positioning of roads, can be realized from these characteristics. Traditional methods, however, mostly rely on a single feature, whereas the invention uses gradient features and color features simultaneously to detect lane lines.
Many gradient-feature-based lane line extraction algorithms in visual information processing first apply the Canny edge extraction algorithm to obtain the edge features of the image and then complete lane line detection with the Hough transform. However, while the Canny algorithm extracts road features, it also extracts edge features in other directions and other irrelevant gradient features, such as the outlines of shadows and vehicles on the road; these irrelevant edges become noise for subsequent lane line detection and are difficult to remove. Compared with the Canny operator, the Sobel operator can calculate the transverse and longitudinal gradients separately and obtain the gradient direction from them, so the Sobel operator is used to extract the edge gradient features of the road. Solving the image gradients with the Sobel operator can be understood as taking first-order partial derivatives in the x and y directions of the image; to simplify the calculation, a discrete difference operator is often used to approximate the gradient.
Sobel kernel for Gx: [-1 0 +1; -2 0 +2; -1 0 +1]; Sobel kernel for Gy: [-1 -2 -1; 0 0 0; +1 +2 +1].
Specifically, on a two-dimensional image, the image is convolved with the two kernels respectively to obtain the gradient Gx along the width and the gradient Gy along the height, from which the Sobel gradient magnitude G is conveniently obtained:
G = sqrt(Gx² + Gy²);
For discrete gray scale images, the invention uses absolute value addition as the value of G in order to simplify the calculation process.
G=|Gx|+|Gy|;
Due to the structural characteristics of lane lines, abrupt gradient changes at lane line edges often occur in the horizontal direction. Calculating the transverse gradient and the gradient direction of the image and applying threshold binarization can therefore effectively extract the lane edge features while avoiding the introduction of other noise. Since the Sobel directional gradients are essentially partial derivatives in the horizontal and vertical directions, the gradient direction can be obtained by the following formula:
θ = arctan(Gy / Gx).
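A minimal OpenCV sketch of the gradient-feature step, assuming a grayscale UAV image loaded from an illustrative path; the threshold intervals are assumed values, not parameters stated in the invention.

    import cv2
    import numpy as np

    gray = cv2.imread("uav_tile.png", cv2.IMREAD_GRAYSCALE)   # illustrative file name

    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)            # transverse (x) gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)            # longitudinal (y) gradient

    mag = np.abs(gx) + np.abs(gy)                              # G = |Gx| + |Gy|
    direction = np.arctan2(np.abs(gy), np.abs(gx))             # gradient direction in radians

    # Threshold binarization of the gradient features (interval bounds are examples).
    gx_bin = ((np.abs(gx) >= 20) & (np.abs(gx) <= 180)).astype(np.uint8)
    mag_bin = ((mag >= 30) & (mag <= 255)).astype(np.uint8)
    dir_bin = ((direction >= 0.7) & (direction <= 1.3)).astype(np.uint8)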
The Sobel operator extracts grayscale edge features; according to their color characteristics, edge features can be divided into grayscale edges and color edges. Studies have shown that about 90% of the edges in a color image coincide with their corresponding edges in the grayscale image, while about 10% of edge features cannot be detected using the grayscale image alone. Common color space models include RGB, HSV, HSI, HSL and YCbCr. Most current research is based on the RGB color space, but this space is not easy to threshold and is computationally intensive; moreover, RGB is a psychophysical color system that poorly matches the human visual perception of color. The HSL color model is a reprojection of RGB and an intuitive, perception-based color system. HSL stands for hue, saturation and lightness; the color space is described with a cone model, handles changes of hue, lightness and saturation well, defines and describes a color more intuitively, and is well suited to processing images with obvious light and shade variation. The conversion from RGB is:
Vmax←max(R,G,B);
Vmin←min(R,G,B);
L ← (Vmax + Vmin) / 2;
S ← (Vmax - Vmin) / (Vmax + Vmin) if L ≤ 0.5, S ← (Vmax - Vmin) / (2 - Vmax - Vmin) if L > 0.5;
H ← 60·(G - B)/(Vmax - Vmin) if Vmax = R, H ← 60·(2 + (B - R)/(Vmax - Vmin)) if Vmax = G, H ← 60·(4 + (R - G)/(Vmax - Vmin)) if Vmax = B, with H ← H + 360 if H < 0;
Wherein: l is more than or equal to 0 and less than or equal to 1, S is more than or equal to 0 and less than or equal to 1, and H is more than or equal to 0 and less than or equal to 360;
Since the linear characteristic of the lane is obvious, the mode of the gradient directions can be counted and a rotation of that angle applied to the image, so that the lane lines are perpendicular to the transverse edge of the image. Because the lane line direction is then perpendicular to the x direction, the gradient in the x direction and the gradient magnitude are selected as the gradient features. Experiments clearly show that, among the color features, the S channel is more suitable for lane line detection. ROI masking is then applied to these features to filter out non-road-region noise, and threshold segmentation with the Otsu algorithm extracts the lane line features.
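A sketch of the color-feature step, using OpenCV's HLS conversion as a stand-in for the HSL model described above (OpenCV orders the channels H, L, S), with the ROI mask applied before Otsu thresholding of the S channel; file names are illustrative.

    import cv2

    bgr = cv2.imread("uav_tile.png")                               # illustrative file name
    roi_mask = cv2.imread("roi_mask.png", cv2.IMREAD_GRAYSCALE)    # U-Net segmentation result

    hls = cv2.cvtColor(bgr, cv2.COLOR_BGR2HLS)                     # channels: H, L, S
    s_channel = hls[:, :, 2]

    # Keep only the road region, then let the Otsu algorithm pick the binarization threshold.
    s_masked = cv2.bitwise_and(s_channel, s_channel, mask=roi_mask)
    _, s_bin = cv2.threshold(s_masked, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)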
Lane line detection
As described above, the Sobel operator can acquire the gradient features of the image (the gradient direction, gradient magnitude and transverse gradient) well, but shadows or other edge features on the road surface are also extracted and generate noise, so the invention performs a logical AND operation on the gradient features. In addition, feature extraction using the color threshold alone is also limited by the complexity of road surface conditions, so the gradient and color features are combined. The specific combination is:
(Gx ∩ Gmag) ∪ HSLl
wherein: Gx represents the transverse gradient binary matrix of the image, Gmag represents the gradient magnitude threshold binary matrix of the image, and HSLl represents the threshold binary matrix of the S channel of the HSL color space.
Although the ROI mask obtained by U-Net segmentation has removed a large amount of non-road background noise, some other noise remains in these feature maps, mostly discrete small white dots, and the lane lines may be broken. To eliminate the influence of this noise, the quality of the feature map needs to be enhanced to improve detection precision. Specifically, small objects (such as burrs and isolated points) are eliminated with an opening operation under the premise that object areas do not change significantly, and a closing operation is then used to fill concave closed holes and cracks.
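The opening and closing operations described above map directly onto OpenCV's morphology functions; the kernel size is an assumption, and `combined` is the feature image from the previous sketch.

    import cv2
    import numpy as np

    kernel = np.ones((3, 3), np.uint8)                              # structuring element (illustrative size)
    opened = cv2.morphologyEx(combined, cv2.MORPH_OPEN, kernel)     # remove burrs and isolated points
    cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)     # fill small holes and cracks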
The invention adopts a sliding window algorithm to further detect the lane lines, for which the initial positions and the number of lane lines must first be determined. As shown in fig. 3, the number of non-zero pixels in each column of the feature map is counted to form the feature histogram, and, according to the pixel distance between lane lines, the local maxima of the histogram are located to determine the number of lane lines and their initial positions in the image.
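A sketch of the histogram step, assuming `cleaned` is the binary feature image from the previous sketch; the minimum peak height and the expected pixel distance between lane lines are assumed parameters.

    import numpy as np
    from scipy.signal import find_peaks

    histogram = np.sum(cleaned > 0, axis=0)        # non-zero pixel count per column

    min_lane_gap = 80                              # assumed pixel distance between lane lines
    peaks, _ = find_peaks(histogram, height=20, distance=min_lane_gap)

    n_lanes = len(peaks)                           # number of lane lines
    lane_start_x = peaks                           # initial column position of each lane line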
With the initial positions of the lane lines obtained from the feature statistics histogram, the lane lines can be accurately detected using a sliding window algorithm. As shown in fig. 4, parameters such as the sliding window size and the minimum number of non-zero pixels in a window are first set, and the number of sliding windows n is calculated from the window height and the size of the aerial image. The approximate position of each lane line is taken as the midpoint of the lower boundary of its initial sliding window, the coordinates of all non-zero pixels in the window are stored, and their number is compared with the threshold; if the number is larger than the threshold, the mean of the horizontal coordinates of all non-zero pixels in the window is taken as the midpoint of the lower boundary of the next window, with the vertical coordinate of the upper boundary of the current window always used as the vertical coordinate of the lower boundary of the next window. This is repeated until the number of sliding windows reaches n. After sliding window detection is completed, polynomial fitting is performed on the stored coordinates of the non-zero pixels in the windows to extract each lane line instance (as shown in fig. 5).
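A compact sketch of the sliding-window search and polynomial fitting; the window size, pixel threshold and polynomial degree are assumptions, and `cleaned` and `lane_start_x` come from the previous sketches.

    import numpy as np

    h, w = cleaned.shape
    win_h, win_half_w, min_pix = 40, 50, 50        # assumed window height, half-width, pixel threshold
    n_windows = h // win_h                         # window count from window height and image size
    ys, xs = np.nonzero(cleaned)                   # coordinates of all non-zero pixels

    lane_fits = []
    for start_x in lane_start_x:
        cx, lane_idx = int(start_x), []
        for i in range(n_windows):
            y_low, y_high = h - (i + 1) * win_h, h - i * win_h
            x_low, x_high = cx - win_half_w, cx + win_half_w
            inside = np.where((ys >= y_low) & (ys < y_high) &
                              (xs >= x_low) & (xs < x_high))[0]
            lane_idx.append(inside)
            if len(inside) > min_pix:              # recenter the next window on the mean x
                cx = int(np.mean(xs[inside]))
        lane_idx = np.concatenate(lane_idx)
        if len(lane_idx) > 2:
            # fit x as a 2nd-order polynomial of y for this lane line
            lane_fits.append(np.polyfit(ys[lane_idx], xs[lane_idx], 2))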
Experimental data
The training data used in the experiment mainly come from the WHU Building Dataset; the unmanned aerial vehicle imagery covers 450 km² with a spatial resolution of 0.075 m and includes urban, suburban and rural areas. Considering hardware performance and for convenience of training, the complete data set was not used: 273 images were selected from it, together with some self-collected aerial photography data from Hubei (mainly suburban or rural areas). All images were uniformly resized to 320 x 320 to form the complete experimental data, which was split into training and validation data at a ratio of 7:3. The remaining self-collected data are used as test data for the subsequent operations.
Results and analysis of the experiments
The invention performs semantic segmentation on the aerial images with the U-Net network to obtain the ROI; compared with common supervised classification methods for remote sensing images, the U-Net network can achieve higher segmentation accuracy. To evaluate the segmentation precision of U-Net, ROI segmentation was also performed on the test data using supervised classification methods commonly used in ENVI: maximum likelihood, Mahalanobis distance and binary coding. For the two-class problem of ROI segmentation, a confusion matrix can be used to quantitatively evaluate segmentation precision, with accuracy and recall as evaluation indexes; the results are shown in Table 1.
TABLE 1
Method                  Accuracy/%    Recall/%
Maximum likelihood      95.25         76.97
Mahalanobis distance    65.49         92.61
Binary coding           76.56         85.65
Accuracy = TP / (TP + FP);
Recall = TP / (TP + FN);
wherein: TP (true positive) indicates the number of correctly segmented ROI pixels, FP (false positive) indicates the number of pixels that are not ROI pixels but are segmented as ROI, and FN (false negative) indicates the number of pixels that are ROI pixels but are segmented as non-ROI.
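For reference, the two evaluation indexes reduce to simple ratios of the confusion-matrix counts; the counts below are placeholders, not the experimental values.

    # Placeholder confusion-matrix counts (not the values reported in Table 1).
    tp, fp, fn = 9000, 500, 1200

    accuracy = tp / (tp + fp)    # proportion of predicted ROI pixels that are truly ROI
    recall = tp / (tp + fn)      # proportion of true ROI pixels that are recovered
    print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")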
The present invention was evaluated using accuracy and recall for the lane lines; with the width of a lane line defined as 10 pixels, the accuracy and recall were calculated to be 79.01% and 83.12%, respectively. At present, most lane line detection is based on vehicle-mounted image data, so the method cannot be directly compared with other methods because the data sources differ. However, for reference, on the ground data set KITTI the traditional Hough transform detection method achieves an accuracy of 63.47% and a recall of 59.04%, while LaneNet, based on deep learning, achieves 92.81% and 93.19%. Compared with lane line extraction based on vehicle-mounted equipment, the method provided by the invention can realize better precision by detecting and extracting lane lines directly from the high-resolution remote sensing images of an unmanned aerial vehicle, which further proves the feasibility of the proposed method.

Claims (8)

1. A lane line extraction method based on high-resolution images of unmanned aerial vehicles is characterized by comprising the following steps: the method comprises the following steps:
s1, constructing and training a U-Net network model;
s2, segmenting the aerial image of the unmanned aerial vehicle by using the trained U-Net network model to obtain an ROI mask;
s3, extracting gradient features and color features of the aerial images of the unmanned aerial vehicle to obtain extracted images, and filtering background noise of the extracted images by using an ROI mask;
s4, performing threshold segmentation on the extracted image with the background noise filtered by using an Otsu algorithm, and extracting lane line features to obtain a binarized feature image;
s5, performing quality enhancement on the characteristic image by using an image filtering algorithm;
s6, determining the number of lane lines in the feature image and the initial positions of the lane lines in the feature image by using the feature histogram;
and S7, carrying out positioning detection on the lane lines by using a sliding window algorithm and extracting each lane line in the characteristic image by polynomial fitting.
2. The lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 1, wherein: in the step S1, a binary cross entropy loss function is used for training a U-Net network model; the binary cross entropy loss function is defined as follows:
Loss(X, Y) = -(1/(W·H)) · Σ_{i=1..W} Σ_{j=1..H} [ y_ij·log(x_ij) + (1 - y_ij)·log(1 - x_ij) ]
wherein: X and Y respectively represent the predicted image and the real label; W and H are the width and height of the predicted image, respectively; x_ij and y_ij respectively represent the pixel values of the predicted image and the real label at (i, j).
3. The lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 2, wherein: the ROI mask obtained by segmentation in step S2 is subjected to hole filling by using a dilation algorithm.
4. The lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 3, wherein: when the gradient feature of the aerial image of the unmanned aerial vehicle is obtained in step S3, two convolution kernels are respectively used to perform convolution processing on the aerial image of the unmanned aerial vehicle, so as to obtain a gradient Gx along the width and a gradient Gy along the height direction, and a calculation formula of a gradient value G is as follows:
G = sqrt(Gx² + Gy²);
for a discrete gray-scale image, the gradient value G is calculated by the formula:
G=|Gx|+|Gy|;
the calculation formula of the gradient direction is as follows:
θ = arctan(Gy / Gx).
5. the lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 4, wherein: in the step S3, when the color features of the aerial image of the unmanned aerial vehicle are obtained, an HSL color model is used for obtaining the color features; the calculation formula is as follows:
Vmax←max(R,G,B);
Vmin←min(R,G,B);
L ← (Vmax + Vmin) / 2;
S ← (Vmax - Vmin) / (Vmax + Vmin) if L ≤ 0.5, S ← (Vmax - Vmin) / (2 - Vmax - Vmin) if L > 0.5;
H ← 60·(G - B)/(Vmax - Vmin) if Vmax = R, H ← 60·(2 + (B - R)/(Vmax - Vmin)) if Vmax = G, H ← 60·(4 + (R - G)/(Vmax - Vmin)) if Vmax = B, with H ← H + 360 if H < 0;
wherein: l is more than or equal to 0 and less than or equal to 1, S is more than or equal to 0 and less than or equal to 1, H is more than or equal to 0 and less than or equal to 360, and L, S and H respectively represent each channel value of the HSL color space.
6. The lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 5, wherein: in the step S3, the gradient feature and the color feature are combined and then synchronously extracted; the joint formula is as follows:
(Gx ∩ Gdir ∩ Gmag) ∪ HSLl
wherein: Gx represents the transverse gradient binary matrix of the unmanned aerial vehicle aerial image, Gdir represents the gradient direction threshold binary matrix of the unmanned aerial vehicle aerial image, Gmag represents the gradient magnitude threshold binary matrix of the unmanned aerial vehicle aerial image, and HSLl represents the threshold binary matrix of the S channel of the HSL color space.
7. The lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 6, wherein: in step S6, the number of non-zero pixels in each column in the feature histogram is counted, then the local maximum of the feature histogram is obtained according to the pixel distance between the lane lines, and the number of the lane lines in the feature image and the initial position of the lane lines in the feature image are determined according to the local maximum of the feature histogram.
8. The lane line extraction method based on high-resolution images of unmanned aerial vehicles according to claim 7, wherein: the step S7 includes the steps of:
s71, setting the size of the sliding window and the minimum number of nonzero pixels in the sliding window, and calculating the number n of the sliding window according to the height of the sliding window and the size of the aerial image of the unmanned aerial vehicle;
s72, respectively taking the position of each lane line as the middle point of the lower boundary of the initial sliding window, and storing the coordinates of all non-zero pixel points in the initial sliding window;
s73, counting the number of non-zero pixel points in the current sliding window and comparing it with the threshold; if the number is larger than the threshold, taking the mean of the horizontal coordinates of all non-zero pixel points in the window as the midpoint of the lower edge of the next sliding window, with the vertical coordinate of the upper edge of the current window always used as the vertical coordinate of the lower edge of the next window; repeating these steps until the number of sliding windows reaches n, thus completing the sliding window detection;
and S74, after the sliding window detection is completed, performing polynomial fitting according to the coordinates of the nonzero pixel points stored in the sliding window, and respectively extracting each lane line in the characteristic image.
CN202011479306.1A 2020-12-15 2020-12-15 Lane line extraction method based on high-resolution images of unmanned aerial vehicle Active CN112488046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011479306.1A CN112488046B (en) 2020-12-15 2020-12-15 Lane line extraction method based on high-resolution images of unmanned aerial vehicle


Publications (2)

Publication Number Publication Date
CN112488046A true CN112488046A (en) 2021-03-12
CN112488046B CN112488046B (en) 2021-07-16

Family

ID=74917562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011479306.1A Active CN112488046B (en) 2020-12-15 2020-12-15 Lane line extraction method based on high-resolution images of unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN112488046B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389026A (en) * 2017-08-09 2019-02-26 三星电子株式会社 Lane detection method and equipment
CN107944377A (en) * 2017-11-20 2018-04-20 中交信息技术国家工程实验室有限公司 A kind of traffic infrastructure method for safety monitoring and system
CN109409205A (en) * 2018-09-07 2019-03-01 东南大学 Video road driveway line detecting method of taking photo by plane based on line pitch characteristics point cluster
CN110781756A (en) * 2019-09-29 2020-02-11 北京化工大学 Urban road extraction method and device based on remote sensing image
CN111126306A (en) * 2019-12-26 2020-05-08 江苏罗思韦尔电气有限公司 Lane line detection method based on edge features and sliding window
CN111444778A (en) * 2020-03-04 2020-07-24 武汉理工大学 Lane line detection method
CN111914749A (en) * 2020-07-31 2020-11-10 博康智能信息技术有限公司 Lane line recognition method and system based on neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Le-Anh Tran, et al.: "Robust U-Net-based Road Lane Markings Detection for Autonomous Driving", 2019 International Conference on System Science and Engineering (ICSSE) *
Wang Qiusheng, et al.: "A Survey of Lane Line Detection Methods Based on UAV Aerial Images", Unmanned Systems Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220076038A1 (en) * 2020-12-25 2022-03-10 Beijing Baidu Netcom Science Technology Co., Ltd. Method for controlling vehicle and electronic device
CN113160052A (en) * 2021-04-01 2021-07-23 华南理工大学 Offshore culture area image splicing method based on non-uniform precision
CN113160052B (en) * 2021-04-01 2022-10-25 华南理工大学 Offshore culture area image splicing method based on non-uniform precision
CN114089786A (en) * 2021-09-29 2022-02-25 北京航空航天大学杭州创新研究院 Autonomous inspection system based on unmanned aerial vehicle vision and along mountain highway
CN114998770A (en) * 2022-07-06 2022-09-02 中国科学院地理科学与资源研究所 Highway identifier extraction method and system
CN117037007A (en) * 2023-10-09 2023-11-10 浙江大云物联科技有限公司 Aerial photographing type road illumination uniformity checking method and device
CN117037007B (en) * 2023-10-09 2024-02-20 浙江大云物联科技有限公司 Aerial photographing type road illumination uniformity checking method and device

Also Published As

Publication number Publication date
CN112488046B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN112488046B (en) Lane line extraction method based on high-resolution images of unmanned aerial vehicle
CN112270249B (en) Target pose estimation method integrating RGB-D visual characteristics
CN111274976B (en) Lane detection method and system based on multi-level fusion of vision and laser radar
CN104778721B (en) The distance measurement method of conspicuousness target in a kind of binocular image
CN107330376B (en) Lane line identification method and system
CN107610141B (en) Remote sensing image semantic segmentation method based on deep learning
CN110348376B (en) Pedestrian real-time detection method based on neural network
CN108388896B (en) License plate identification method based on dynamic time sequence convolution neural network
CN107230202B (en) Automatic identification method and system for road surface disease image
EP2811423B1 (en) Method and apparatus for detecting target
CN109657632B (en) Lane line detection and identification method
CN110599537A (en) Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system
CN105184779B (en) One kind is based on the pyramidal vehicle multiscale tracing method of swift nature
CN108416292B (en) Unmanned aerial vehicle aerial image road extraction method based on deep learning
CN107506765B (en) License plate inclination correction method based on neural network
CN105426861A (en) Method and device for determining lane line
CN108052904B (en) Method and device for acquiring lane line
CN111027497B (en) Weak and small target rapid detection method based on high-resolution optical remote sensing image
CN112766136B (en) Space parking space detection method based on deep learning
CN108647664B (en) Lane line detection method based on look-around image
CN108921120B (en) Cigarette identification method suitable for wide retail scene
CN110263635B (en) Marker detection and identification method based on structural forest and PCANet
CN113313031B (en) Deep learning-based lane line detection and vehicle transverse positioning method
CN111160328A (en) Automatic traffic marking extraction method based on semantic segmentation technology
CN106407951A (en) Monocular vision-based nighttime front vehicle detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant