CN112613392A - Lane line detection method, device and system based on semantic segmentation and storage medium - Google Patents

Lane line detection method, device and system based on semantic segmentation and storage medium

Info

Publication number
CN112613392A
CN112613392A (application number CN202011508480.4A)
Authority
CN
China
Prior art keywords
lane line
layer
image
output
lane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011508480.4A
Other languages
Chinese (zh)
Inventor
宋聚宝
潘定海
原诚寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing New Energy Vehicle Technology Innovation Center Co Ltd
Original Assignee
Beijing New Energy Vehicle Technology Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing New Energy Vehicle Technology Innovation Center Co Ltd
Priority to CN202011508480.4A
Publication of CN112613392A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588: Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/155: Segmentation; Edge detection involving morphological operators
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/187: Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of lane line detection, and in particular to a lane line detection method, device, system, and storage medium based on semantic segmentation. The method comprises: performing image binarization semantic segmentation on feature data of an input image to obtain the pixel points of the lane lines, predicting the output category of each lane line in combination with a data set, and calculating the classification probability; dynamically adjusting the weight coefficient and the sphere radius of a clustering algorithm according to the classification probability and changes in lane line direction; classifying the pixel points through the clustering algorithm; and parameterizing each lane line with a curve-fitting algorithm. For complex traffic environments, the embodiments combine semantic segmentation with an adaptive clustering method to achieve pixel-level recognition of lane lines, meeting an urgent need of autonomous driving; such a high-performance detection algorithm is of great significance to the practical application value and safety of autonomous driving.

Description

Lane line detection method, device and system based on semantic segmentation and storage medium
Technical Field
The invention relates to the field of lane line detection, and in particular to a lane line detection method, device, system, and storage medium based on semantic segmentation.
Background
With the increasing popularity of automobiles, the incidence of road traffic accidents keeps rising, seriously endangering people's lives and property. Autonomous driving can effectively reduce traffic accidents, and lane line detection and recognition are important components of an autonomous driving system. Traditional lane line detection methods are mainly based on edge feature detection or image segmentation; they are easily disturbed by illumination changes, moving vehicles, road damage, and the like, which degrades the robustness of the algorithm, and they cannot reach the required detection accuracy in bad weather and complex environments.
Lane line detection is a basic module of autonomous driving, and various detection algorithms based on traditional image processing exist. These methods mainly apply edge detection, thresholding, and curve fitting to the road image: first an inverse perspective transformation is applied to the region of interest (ROI), then direction-specific Gaussian filtering and thresholding are performed, and finally a line or curve is fitted to the result, using methods such as least squares, the median-intercept method, the Hough transform, or RANSAC fitting, after which the fitted curve is located and extended. Such algorithms require the camera parameters, demand clearly visible markings in the road image, and have poor robustness and generalization capability. The lane line, however, is a traffic marking that wears easily: with increasing service time it becomes damaged or even blurred, so many traditional detection algorithms struggle to extract and detect its features. Moreover, lane lines come in many types, and different cities differ more or less from one another owing to their actual traffic conditions, so accurately detecting lane lines in different places becomes a great challenge.
As research progresses, the scenes that the lane line detection task must handle grow ever more diverse, gradually moving beyond the low-level understanding of "white and yellow lines". Detecting the semantic position of lane lines from images in which the lines are blurred, glared, or even completely occluded is nearly impossible with traditional image processing, and has become a problem to be solved in the field.
Deep learning methods use a network model to learn target features automatically, have stronger generalization capability, and can effectively improve the accuracy of target detection. Owing to the excellent performance of convolutional neural networks, many researchers have in recent years tackled computer vision tasks with them, and convolutional neural networks have achieved great success in machine vision applications; object detection and image segmentation are hot directions among these. In the prior art, however, convolutional neural networks are generally developed for general purposes, and there is no research on using them to detect lane lines in complex traffic environments.
Disclosure of Invention
In view of the defects in the prior art, embodiments of the present invention provide a lane line detection method, device, system, and storage medium based on semantic segmentation, which overcome, or at least partially solve, the above problems.
As an aspect of the embodiments of the present invention, there is provided a lane line detection method based on semantic segmentation, the method including:
marking a lane line of the driving image, and constructing a data set of the lane line;
extracting feature data of an input image, performing image binarization semantic segmentation to obtain the pixel points of the lane lines, predicting the output category of each lane line in combination with the data set, and calculating the classification probability;
filtering and connecting the pixel points of the lane lines, dynamically adjusting the weight coefficient and sphere radius of a clustering algorithm according to the classification probability and changes in lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines;
and traversing the connected domains of the clustered lane lines and counting the corresponding pixel points to fit a parameter equation.
Further, the steps of extracting the feature data of the input image, performing image binarization semantic segmentation to obtain the pixel points of the lane lines, predicting the category of each lane line in combination with the data set, and calculating the classification probability include:
creating, based on the FCN-VGG16 network, an encoder network for extracting the feature data of the input image, the encoder network including a first preset number of convolution blocks and pooling layers and a second preset number of full convolution layers, wherein the convolution layers in the convolution blocks adopt a linear rectification function (ReLU) as the activation function and the full convolution layers include dropout layers for preventing overfitting; and classifying and predicting, through a prediction network, the feature image output by the encoder network and the intermediate images output by the intermediate pooling layers, to obtain the lane line categories of the input image.
The method further comprises deconvolving, through a decoder network, the feature image output by the encoder network to restore the image size, wherein the layer jump (skip) structure of the decoder network has a first stage and a second stage: the first stage combines the output of a first preset intermediate pooling layer with the output of the final full convolution layer, and the second stage combines the output of a second preset intermediate pooling layer with the output of the first stage.
Further, the prediction network comprises prediction layers, deconvolution layers, crop layers, point-by-point addition layers, and a softmax layer;
the output of the last full convolution layer of the encoder network is input into a first prediction layer; the output of the first prediction layer is input into a first preset deconvolution layer to enlarge the feature image, and the output of that deconvolution layer is input into a first crop layer; the output of the second preset intermediate pooling layer is input into a second prediction layer, the first stage of the layer jump strategy is set after the second prediction layer, and the outputs of the first crop layer and the second prediction layer are input into a first point-by-point addition layer;
the output of the first point-by-point addition layer is input into a second preset deconvolution layer, and the output of the second preset deconvolution layer is input into a second crop layer; the output of the first preset intermediate pooling layer is input into a third prediction layer, and the second stage of the layer jump strategy is set after the third prediction layer, the second stage further comprising inputting the outputs of the second crop layer and the third prediction layer into a second point-by-point addition layer; and/or
the prediction network is provided with a third preset number of output categories, and the classification probability of each output category of the lane lines is calculated through the softmax layer.
Further, the steps of filtering and connecting the pixel points of the lane lines, dynamically adjusting the weight coefficient and sphere radius of the clustering algorithm according to the classification probability and the changes in lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines include:
segmenting independent lane line pixel points and connecting adjacent pixel points through morphological operations;
traversing the pixel points of the lane line connected domains, and filtering and determining the type of each connected domain according to the lane line identifier (id) and the number of pixel points; and/or
outputting the classification probability through the FCN-VGG16 network and determining the weight coefficient of the clustering algorithm; analyzing the change trend of the pixel points to judge the lane line direction, determining the sphere radius, and thereby constructing an adaptive Mean Shift clustering algorithm;
and classifying the pixel points of each lane line with the adaptive Mean Shift clustering algorithm.
Further, the step of marking the lane line of the driving image comprises:
forming a closed area over the lane line region through a multi-point connecting line, and marking the contour line of the closed area;
drawing the occluded part of a lane line by hypothesizing points of the partially visible lane line shown in the driving image, and connecting the points of the same lane line for marking;
and marking unclear lane lines in the driving image by adjusting the zoom scale.
Further, the method further comprises:
collecting driving videos under different road, weather, illumination, and traffic conditions to obtain images to be processed;
correcting the images to be processed to obtain driving images;
and preprocessing the marked driving images to increase the sample size of the data set, wherein the preprocessing includes one or more of style transfer, brightness transformation, symmetric transformation, and mirror transformation.
As another aspect of the embodiments of the present invention, there is also provided a lane line detection apparatus based on semantic segmentation, including:
the sample library construction module is used for marking the lane line of the driving image and constructing a data set of the lane line;
the semantic segmentation module is used for extracting the feature data of an input image, performing image binarization semantic segmentation to obtain the pixel points of the lane lines, predicting the output category of each lane line in combination with the data set, and calculating the classification probability;
the clustering module is used for filtering and connecting the pixel points of the lane lines, dynamically adjusting the weight coefficient and sphere radius of a clustering algorithm according to the classification probability and the changes in lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines;
and the fitting module is used for traversing the connected domains of the clustered lane lines and counting the corresponding pixel points to fit a parameter equation.
As a further aspect of the embodiments of the present invention, there is provided a lane line detection system based on semantic segmentation, including: a memory, a processor, a communication bus, and a lane line detection program based on semantic segmentation stored on the memory,
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the semantic segmentation-based lane line detection program to implement the steps of the semantic segmentation-based lane line detection method according to any one of the above embodiments.
As another aspect of the embodiments of the present invention, there is also provided a storage medium having at least a semantic segmentation based lane line detection program stored thereon, wherein the program, when executed by a processor, implements the steps of the semantic segmentation based lane line detection method according to any one of the above embodiments.
The embodiments of the invention achieve at least some of the following technical effects:
the embodiment combines semantic segmentation with a self-adaptive clustering method aiming at complex traffic environment, realizes pixel-level identification of lane lines, can meet urgent needs of automatic driving, and has very important significance on the practical application value and safety value of automatic driving by a high-performance detection algorithm.
Furthermore, the FCN-VGG16 fully convolutional neural network is adopted and optimized so as to better adapt to segmenting the lane lines in road images; together with a suitable image enhancement method, a high-quality sample library, an optimized adaptive clustering framework, and a lane line fitting method, the robustness and generalization performance are stronger. The adaptive clustering algorithm constructed in the embodiments determines the weight coefficient of an image from the FCN network features and dynamically changes the Mean Shift sphere radius with the changing lane line direction, which greatly improves the clustering effect and works well in real scenes where the lane line pixels vary greatly.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for detecting lane lines based on semantic segmentation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an overall network structure of a lane line detection method based on semantic segmentation according to an embodiment of the present invention;
FIG. 3 is a flow chart of an adaptive Mean Shift algorithm according to an embodiment of the present invention;
FIG. 4 is a flow chart of a point set filtering algorithm and a connecting method according to an embodiment of the invention;
FIG. 5 is a flow chart of the adaptive Mean Shift algorithm for sphere radius calculation in an embodiment of the present invention;
fig. 6 is a schematic diagram of a lane line detection apparatus based on semantic segmentation according to an embodiment of the present invention.
Detailed Description
In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.
The figures and the following description depict alternative embodiments of the invention to teach those skilled in the art how to make and use the invention. Some conventional aspects have been simplified or omitted for the purpose of teaching the present invention. Those skilled in the art will appreciate that variations or substitutions from these embodiments will fall within the scope of the invention. Those skilled in the art will appreciate that the features described below can be combined in various ways to form multiple variations of the invention. Thus, the present invention is not limited to the following alternative embodiments, but is only limited by the claims and their equivalents.
In one embodiment, as shown in fig. 1, there is provided a lane line detection method based on semantic segmentation, the method including:
S11, marking the lane lines of the driving image and constructing a lane line data set;
S12, extracting the feature data of an input image, performing image binarization semantic segmentation to obtain the pixel points of the lane lines, predicting the output category of each lane line in combination with the data set, and calculating the classification probability;
S13, filtering and connecting the pixel points of the lane lines, dynamically adjusting the weight coefficient and sphere radius of a clustering algorithm according to the classification probability and the changes in lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines;
S14, traversing the connected domains of the clustered lane lines and counting the corresponding pixel points to fit a parameter equation.
In this embodiment, referring to the overall network structure diagram in fig. 2, and building on pixel-level semantic segmentation and instance segmentation research, the lane line detection problem is converted into an instance segmentation problem whose output is each lane line instance in the road image. The semantic segmentation of this embodiment can be based on a VGG-16 network, and the instance segmentation consists of two parts: road image binarization semantic segmentation and lane line pixel clustering. The image binarization semantic segmentation binarizes a road image containing various kinds of information to obtain the pixel points of the lane lines; lane line pixel clustering then clusters the obtained pixel points into different lane line instances. This solves the failure of detection algorithms when the number of lane lines in the road image changes, improves the running speed of the algorithm, and meets the real-time requirement of lane line detection. Once the different lane line instances are detected, each lane line is parameterized with a curve fitting algorithm, where the fitted mathematical model can include a polynomial model, a spline curve model, or a parabolic model. To ensure efficient computation and fitting, the image is preferably converted into a bird's-eye view by a perspective transformation before curve fitting, and after the fitted curve is generated it is projected back into the original image by the inverse transformation matrix.
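As a purely illustrative overview, the flow just described (binarized segmentation, clustering into lane instances, and per-instance curve fitting in a bird's-eye view) might be organized as in the following Python sketch; every function name here is a hypothetical placeholder rather than anything specified by the patent:

```python
import numpy as np

def detect_lane_lines(image, segmenter, cluster, fit_curve,
                      warp_to_birdseye, warp_to_original):
    """High-level lane detection pipeline (illustrative sketch only).

    segmenter, cluster, fit_curve, and the warp_* callables are
    hypothetical stand-ins for the binarized FCN segmentation, the
    adaptive Mean Shift clustering, the least-squares polynomial
    fitting, and the perspective transforms described in the text.
    """
    # 1. Binarized semantic segmentation: lane pixels vs. background.
    binary_mask, class_probs = segmenter(image)

    # 2. Cluster lane pixels into separate lane-line instances.
    ys, xs = np.nonzero(binary_mask)
    labels = cluster(np.stack([xs, ys], axis=1), class_probs)

    # 3. Fit each instance in the bird's-eye view, then project back.
    curves = []
    for lane_id in np.unique(labels):
        pts = np.stack([xs[labels == lane_id], ys[labels == lane_id]], axis=1)
        top_view_pts = warp_to_birdseye(pts)
        coeffs = fit_curve(top_view_pts)   # e.g. polynomial least squares
        curves.append(warp_to_original(coeffs))
    return curves
```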
In this embodiment, lane line detection may be performed with a deep learning fully convolutional network (FCN). Image segmentation is essentially classification at the pixel level, and segmenting the lane lines means correctly classifying the pixels that belong to them. The semantic segmentation of this embodiment can be an FCN-based image segmentation network that completes lane line detection with a pixel-level classification network and makes the output feature map the same size as the original image.
Image semantic segmentation works at the pixel level; image binarization semantic segmentation outputs a binary segmentation image, dividing the image pixels into lane lines and background.
in the prior art, the convolutional neural network CNN loses image details in the convolutional convolution and pooling posing processes, that is, the feature map size becomes gradually smaller, so that the specific contour of an object cannot be well pointed out, the specific object to which each pixel belongs cannot be pointed out, and accurate segmentation cannot be achieved. The embodiment may be image semantic segmentation based on full probabilistic Networks (FCN), where the basic structure of FCN can be regarded as being composed of two parts, the first half is a convolution pooling layer, and is the same as a conventional convolution neural network, and is used for capturing context and semantic information of an image to express image features. The second half is an upsampling layer, which is used to restore the feature map extracted from the first half of the FCN to the original image size. For the original image, after the convolution and pooling operations, the feature map size may become 1/2, 1/4, 1/8, 1/16 through 1/32 the original image size. The feature map is then up-sampled by a factor of 32 to restore the original image size, thus completing the image segmentation. In order to obtain better segmentation effect, the segmentation result of the front layer and the segmentation result of the back layer can be effectively fused through a jump structure, for example, based on three different structures of FCN, such as FCN-32, FCN-16 and FCN-8. The three FCN structures differ in the step size of the last convolution. It is worth noting that all three FCN network architectures share the same convolutional pooling layer, but the way of upsampling and hopping connections differs.
In addition, FCN network structures differ significantly depending on the encoder network used. Known classification models, including AlexNet, VGG-16, GoogLeNet, and ResNet, can all be converted into fully convolutional models; the core operation is to replace all fully connected layers with convolution layers and output spatial maps instead of classification scores. These maps are upsampled using small-stride fractional convolution (also known as deconvolution) to produce dense pixel-level labels. FCN thus classifies the image at the pixel level to achieve semantic-level image segmentation: it uses deconvolution layers to upsample the feature map of the last convolution layer back to the size of the input image, so that every pixel is predicted while the spatial information of the original input image is preserved.
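To make the "replace fully connected layers with convolution layers" operation concrete, the following PyTorch sketch shows one common way a VGG-style classifier head can be convolutionalized; the layer widths and the 6-channel output are illustrative assumptions based on the class count used later in this document, not the patent's exact configuration:

```python
import torch.nn as nn

# Convolutionalized VGG-style head: 7x7 and 1x1 convolutions replace the
# linear layers, so the network outputs a spatial score map instead of a
# single classification score vector.
fully_conv_head = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=7, padding=3),  # replaces first Linear
    nn.ReLU(inplace=True),
    nn.Dropout2d(0.5),
    nn.Conv2d(4096, 4096, kernel_size=1),            # replaces second Linear
    nn.ReLU(inplace=True),
    nn.Dropout2d(0.5),
    nn.Conv2d(4096, 6, kernel_size=1),               # per-pixel class scores
)
```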
In S14, the lane line parameter fitting may proceed as follows: after traversing all connected domains and counting the corresponding pixel points, a parameter equation is fitted to each lane line, using the least squares method to solve for a curve y = P(x) that approximates the true curve y = f(x) from the m known coordinate points (x_i, y_i) (i = 0, 1, …, m). An n-th order approximating polynomial through the m coordinate points is given as follows:

h_θ(x) = θ_0 + θ_1·x + θ_2·x² + … + θ_n·xⁿ

where θ = (θ_0, θ_1, θ_2, …, θ_n) are the parameters. The goal is to find a set of θ that minimizes the sum of squared residuals, i.e. to solve

min_θ Σ_{i=0}^{m} (h_θ(x_i) − y_i)².
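As a minimal sketch of this least-squares step, assuming the clustered pixel coordinates of one lane line are available as arrays (the sample values below are made up for illustration), numpy's polyfit solves exactly this residual-minimization problem:

```python
import numpy as np

# Hypothetical pixel coordinates of one clustered lane line.
x = np.array([10.0, 60.0, 120.0, 200.0, 300.0, 420.0])
y = np.array([500.0, 430.0, 350.0, 280.0, 220.0, 180.0])

# Fit a 2nd-order polynomial y = theta0 + theta1*x + theta2*x^2 by
# minimizing the sum of squared residuals, as in the text.
theta = np.polyfit(x, y, deg=2)            # highest-order coefficient first
residuals = np.polyval(theta, x) - y
print(theta, np.sum(residuals ** 2))       # parameters and residual sum
```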
In one embodiment, the S12 includes:
creating, based on the FCN-VGG16 network, an encoder network for extracting the feature data of the input image, the encoder network including a first preset number of convolution blocks and pooling layers and a second preset number of full convolution layers, wherein the convolution layers in the convolution blocks adopt a linear rectification function (ReLU) as the activation function and the full convolution layers include dropout layers for preventing overfitting; and classifying and predicting, through a prediction network, the feature image output by the encoder network and the intermediate images output by the intermediate pooling layers, to obtain the lane line categories of the input image.
In this embodiment, FCN semantic segmentation is optimized and adapted for different scenes and different semantic environments, with the point cloud semantic segmentation scene modified on the basis of FCN-VGG16. The main function of the encoder network in semantic segmentation is to create an abstract representation of the input feature map, and the encoder network is usually constructed from an existing classification network; through supervised learning, it can learn features for semantic segmentation by itself. This embodiment designs an encoder network based on VGG16: VGG16 adopts many 3 × 3 convolution kernels, and stacking small kernels combined with nonlinear layers allows more complex patterns to be learned as the network deepens.
Preferably, the encoder network in this embodiment is used to extract the feature data of the image; its specific structure may be as shown in Table 1. Several stacked convolution layers are referred to as a convolution block, and the entire encoder network includes 5 convolution blocks. All convolution layers in the network use 3 × 3 kernels, and all pooling layers use 2 × 2 max pooling. In the feature extraction stage, two-dimensional gridding is used to organize the point cloud data features, and the input of the encoder network is a tensor of size 512 × 512 × 3, i.e. an input image 512 pixels wide and high with three RGB channels. The convolution layers in Table 1 may use the linear rectification function as the activation function. The stride of the pooling layers is 2 × 2, so each pooling halves the width and height of the feature map. The back end of the encoder network replaces the two fully connected layers of the original VGG16 with two full convolution layers, and retains the two dropout layers of VGG16 to prevent model overfitting.
Table 1 (the layer-by-layer structure of the encoder network; rendered as an image in the original document).
After feature extraction by the encoder network, the input feature map is reduced to a size of 16 × 16: each pooling halves the map owing to the pooling stride, and the encoder network in this embodiment preferably pools 5 times. Prediction networks are applied to the final output of the encoder network and to the outputs of the intermediate pooling layers; these prediction networks perform classification, predicting the lane line categories of the input image, such as solid lines and dashed lines, and the prediction output maps are combined and upsampled to obtain a final prediction output map of size 512 × 512.
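The encoder just described can be sketched as follows in PyTorch (five blocks of 3 × 3 convolutions with ReLU, 2 × 2 max pooling after each block, and two full convolution layers with dropout replacing the fully connected layers); the channel counts follow standard VGG16 and are an assumption wherever Table 1 is not reproduced:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    # n_convs 3x3 conv layers with ReLU, followed by 2x2 max pooling.
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# 512x512x3 input -> 16x16 feature map after five 2x2 poolings.
encoder = nn.Sequential(
    conv_block(3, 64, 2),     # block 1 -> 256x256
    conv_block(64, 128, 2),   # block 2 -> 128x128
    conv_block(128, 256, 3),  # block 3 -> 64x64  (maxpool-3 tap for skips)
    conv_block(256, 512, 3),  # block 4 -> 32x32  (maxpool-4 tap for skips)
    conv_block(512, 512, 3),  # block 5 -> 16x16
    # Two full convolution layers with dropout replace the FC layers.
    nn.Conv2d(512, 4096, 7, padding=3), nn.ReLU(inplace=True), nn.Dropout2d(0.5),
    nn.Conv2d(4096, 4096, 1), nn.ReLU(inplace=True), nn.Dropout2d(0.5),  # fc-2
)
```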
In one embodiment, the method further comprises deconvolving, through a decoder network, the feature image output by the encoder network to restore the image size, wherein the layer jump structure of the decoder network has a first stage that combines the output of the first preset intermediate pooling layer with the output of the last full convolution layer, and a second stage that combines the output of the second preset intermediate pooling layer with the output of the first stage.
In this embodiment, the decoder network mainly performs deconvolution operations and element-wise summation (pointwise add) operations. Since the heatmap retains only high-level features and contains no detail about the segmentation boundary, this embodiment can preferably adopt the layer jump structure of FCN-8s to refine the boundary. The heatmap is the feature map produced by the convolutions in the encoder: the encoding network extracts the various features of the input image while the feature map keeps shrinking, and the decoding network then deconvolves the reduced image to restore its size. The layer jump structure of FCN-8s combines the prediction outputs of the first preset intermediate pooling layer (maxpool-4), the second preset intermediate pooling layer (maxpool-3), and the heatmap respectively; the layer jump structure and the prediction network in this embodiment may be coupled together, i.e. prediction classification is performed before the skipped layers are superimposed. The layer jump structure comprises two stages: the first stage combines the output of the maxpool-4 layer with the final output of the encoder network (the fc-2 layer output); the second stage combines the output of the maxpool-3 layer with the output of the first stage.
In one embodiment, the prediction network comprises prediction layers, deconvolution layers, crop layers, point-by-point addition layers, and a softmax layer;
the output of the last full convolution layer of the encoder network is input into a first prediction layer; the output of the first prediction layer is input into a first preset deconvolution layer to enlarge the feature image, and the output of that deconvolution layer is input into a first crop layer; the output of the second preset intermediate pooling layer is input into a second prediction layer, the first stage of the layer jump strategy is set after the second prediction layer, and the outputs of the first crop layer and the second prediction layer are input into a first point-by-point addition layer;
the output of the first point-by-point addition layer is input into a second preset deconvolution layer, and the output of the second preset deconvolution layer is input into a second crop layer; the output of the first preset intermediate pooling layer is input into a third prediction layer, and the second stage of the layer jump strategy is set after the third prediction layer, the second stage further comprising inputting the outputs of the second crop layer and the third prediction layer into a second point-by-point addition layer; and/or
the prediction network is provided with a third preset number of output categories, and the classification probability of each output category of the lane lines is calculated through the softmax layer.
In this embodiment, 1 × 1 convolution kernels can be used; the specific structures of the decoder network and the prediction network are shown in Table 2. The output of the fc-2 layer of the encoder network is first input into a prediction convolution layer score-1, whose kernel size is 1 × 1 and whose number of output channels is 6. The output prediction map of the score-1 layer is then input into a deconvolution layer deconv-1, which enlarges the heatmap to 32 × 32, the output size of the maxpool-4 layer, so that the two prediction maps can be superimposed; deconv-1 can use a 4 × 4 convolution kernel with a stride of 2 × 2.
since the deconvolution operation is the inverse of the convolution, the output has a size that is the sum of the input size of the normal convolution and the fill size, W in the following equationoutAnd houtThe width and height of the output graph are calculated according to the formula (1-1) and the formula (1-2) in the deconvolution process.
wout=(win-1)×s+f (1-1);
hout=(hin-1)×s+f (1-2);
where s is the stride and f is the size of the convolution kernel. By this formula the output size of the deconv-1 layer is 34 × 34, so to keep the sizes consistent the output map of deconv-1 is input into the crop layer crop-1, which cuts away 2 rows and 2 columns to obtain an output map of size 32 × 32 × 6. Next, a prediction layer score-2 is added after maxpool-4, likewise performing prediction classification with a 1 × 1 convolution kernel to obtain an output prediction map of size 32 × 32 × 6. Here the first skip is added: the output maps of crop-1 and score-2 are summed point by point to obtain an output map of the same size, 32 × 32 × 6. Such a skip preserves both the deep features and the features of a relatively shallow layer (the fourth pooling layer), completing the first-stage layer jump.
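A quick check of equations (1-1) and (1-2) against the stated deconv-1 configuration (16 × 16 input, 4 × 4 kernel, 2 × 2 stride):

```python
def deconv_output_size(w_in, stride, kernel):
    # w_out = (w_in - 1) * s + f, per equations (1-1)/(1-2).
    return (w_in - 1) * stride + kernel

print(deconv_output_size(16, 2, 4))  # 34 -> cropped to 32 by crop-1
```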
Preferably, the layer jump deconvolution is carried out three times. The basic logic is that the image shrinks during encoding, so the encoded outputs are deconvolved three times, i.e. layer jumps restore the image, after which fusion is performed; the resulting image has the same size as the input while still retaining sufficient high-level features.
Table 2 (the layer-by-layer structure of the decoder and prediction networks; rendered as an image in the original document).
The second-stage layer jump is realized in the same way, as shown by deconvolution layer deconv-2 in the table: the output of the first-stage layer jump is first deconvolved, with deconv-2 using a 4 × 4 convolution kernel and a 2 × 2 stride to obtain an output map of size 66 × 66 × 6, further upsampling the prediction map; this is cropped in the same way to an output map of size 64 × 64 × 6. The output of maxpool-3 is then classified by prediction, and the outputs of crop-2 and score-3 are added point by point to complete the second-stage layer jump, adding the features of an even shallower layer (the third pooling layer) to the final prediction output.
After the two-stage layer jump is completed, this embodiment restores the prediction map to the input size (512 × 512 × 6) by deconvolution and cropping. As shown by deconvolution layer deconv-3 in Table 2, the 64 × 64 × 6 prediction map is upsampled to 520 × 520 × 6 using a large 16 × 16 convolution kernel with an 8 × 8 stride, and finally the crop layer crop-3 cuts away 8 rows and 8 columns to obtain a prediction output map of size 512 × 512 × 6.
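The two-stage layer jump and final 8x upsampling can be condensed into the following PyTorch sketch; the crop amounts and the 6 output channels follow the text, while details such as the absence of padding are assumptions:

```python
import torch.nn as nn

class FCN8sDecoder(nn.Module):
    # Prediction (score) layers are 1x1 convolutions to 6 classes;
    # deconv-1/2 use 4x4 kernels with stride 2, deconv-3 a 16x16 kernel
    # with stride 8, each followed by cropping, as described in the text.
    def __init__(self, num_classes=6):
        super().__init__()
        self.score1 = nn.Conv2d(4096, num_classes, 1)  # on fc-2 output
        self.score2 = nn.Conv2d(512, num_classes, 1)   # on maxpool-4 output
        self.score3 = nn.Conv2d(256, num_classes, 1)   # on maxpool-3 output
        self.deconv1 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2)
        self.deconv2 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2)
        self.deconv3 = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=8)

    @staticmethod
    def crop_to(x, h, w):
        # Crop from the top-left, dropping surplus rows/columns (crop-1/2/3).
        return x[:, :, :h, :w]

    def forward(self, fc2_out, pool4_out, pool3_out):
        x = self.deconv1(self.score1(fc2_out))                    # 16 -> 34
        x = self.crop_to(x, 32, 32) + self.score2(pool4_out)      # skip stage 1
        x = self.deconv2(x)                                       # 32 -> 66
        x = self.crop_to(x, 64, 64) + self.score3(pool3_out)      # skip stage 2
        x = self.deconv3(x)                                       # 64 -> 520
        return self.crop_to(x, 512, 512)                          # 512x512x6
```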
Preferably, the output categories of this embodiment may include solid straight line, dashed straight line, solid curve, dashed curve, stop line, and unknown type, the third preset number thus being 6 output categories. To obtain the classification probability of each category, a softmax layer is added at the end of the network; it calculates the probability that each grid cell belongs to each category and then yields the category it belongs to. This expresses, for example, how likely a lane in the image is a straight lane, i.e. the probability of an accurate description.
In one embodiment, with reference to fig. 3, the S13 includes:
S21, segmenting independent lane line pixel points and connecting adjacent pixel points through morphological operations;
S22, traversing the pixel points of the lane line connected domains, and filtering and determining the type of each connected domain according to the lane line identifier and the number of pixel points;
S23, outputting the classification probability through the FCN-VGG16 network and determining the weight coefficient of the clustering algorithm; analyzing the change trend of the pixel points to judge the lane line direction, determining the sphere radius, and constructing the adaptive Mean Shift clustering algorithm;
S24, classifying the pixel points of each lane line with the adaptive Mean Shift clustering algorithm.
The binary image obtained by semantic segmentation of the lane lines filters out much of the noise caused by road surface shadows and stains, but interference from vehicle edges and roadside guardrails cannot be avoided. This embodiment uses a clustering algorithm to effectively reduce noise and improve the fitting precision of the lane lines. To separate the pixel points in the road image into different lane lines, the network loss function is set so that the distance between pixel points belonging to the same lane line is smaller and the distance between pixel points of different lane lines is larger; the constituent elements of the loss function include variance and distance. The purpose of clustering is to determine which points belong to the same lane line: two adjacent lane lines, for example, have pixels that lie very close together, and clustering classifies the pixel points of each lane line separately.
A convolutional neural network can classify each lane line well, and the classification result can serve as the input of the quadratic curve fitting of the lane lines. However, the network's lane line segmentation result may be inaccurate; that is, the class id of pixel points belonging to a second lane line within a segmented lane line region may be wrong. In an image, a lane line generally takes the form of a straight line or an approximate parabola. This embodiment combines this morphological rule with the fact that the near and far directions of a lane line can be judged from its thickness to construct an adaptive clustering algorithm: image quality is improved with image morphology and a connected region matching method, and the weight and ball size of the clustering algorithm are adjusted by predicting the direction of lane line pixel change. The algorithm uses Mean Shift clustering.
In this embodiment, S21 performs image morphology processing and connected region verification. Image morphology operations change the shape of objects (for example, erosion "thins" and dilation "fattens"); their functions are mainly to eliminate noise, to segment independent image elements and connect adjacent elements, and to find conspicuous maximum or minimum regions in the image. After the morphological operations, the point set of the lane line segmentation result should be filtered. The convolutional neural network model's segmentation yields instance-level lane line pixels, from which all connected domains are found. After all connected domains in the segmentation result are found, every pixel in each connected domain is traversed, the lane line id (i.e. the lane line number) of each pixel is checked, and the number of pixel points per lane line id is counted; if the same connected domain contains several lane line ids, those ids need to be merged into the lane line id with the largest number of pixel points. The filtering method of this embodiment adopts connected domain matching. With reference to fig. 4, the point set filtering algorithm and connecting method proceed as follows (a code sketch follows the steps below):
S31, finding the connected domains in the semantic segmentation result;
S32, counting the lane line types and the numbers of pixel points;
S33, judging whether the same connected domain contains several lane line ids; if yes, going to S34; if not, going to S35;
S34, merging the connected domain types according to the lane line id with the largest number of pixel points;
S35, judging whether the traversal of the connected domains is finished; if yes, going to S36; if not, going to S32;
S36, saving the classification result of each connected domain.
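A minimal OpenCV/NumPy sketch of steps S31-S36, assuming the instance segmentation result is available as a 2-D array of per-pixel lane ids with 0 as background (an assumption about the data layout, not something the text specifies):

```python
import numpy as np
import cv2

def filter_and_merge(lane_id_map):
    """S31-S36: unify each connected domain to its majority lane id."""
    binary = (lane_id_map > 0).astype(np.uint8)
    n_labels, labels = cv2.connectedComponents(binary)        # S31
    result = np.zeros_like(lane_id_map)
    for comp in range(1, n_labels):                           # S35: traverse all
        mask = labels == comp
        ids, counts = np.unique(lane_id_map[mask], return_counts=True)  # S32
        if len(ids) > 1:                                      # S33: several ids
            majority = ids[np.argmax(counts)]                 # S34: merge to it
        else:
            majority = ids[0]
        result[mask] = majority                               # S36: save result
    return result
```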
In S23, an adaptive Mean Shift algorithm is constructed on the basis of the standard Mean Shift algorithm, which is a kernel density estimation algorithm usable for clustering, image segmentation, tracking, and so on. In d-dimensional space, a point k is chosen arbitrarily and a high-dimensional ball of radius h is drawn around it as the center; each point falling inside the ball defines a vector whose start point is the ball center and whose end point is that point. Adding these vectors gives the Mean Shift vector M_h. The Mean Shift algorithm solves for this vector so that the ball center always moves toward the direction of maximum data density, and at each iteration the mean position of the points inside the ball is taken as the new center. The Mean Shift vector is calculated as in (2-1):
M_h = (1/K) Σ_{m_i ∈ S_h} (m_i − k) (2-1)

where M_h is the Mean Shift vector; k is the center of the high-dimensional ball; m_i are the vector points inside the ball of radius h; K is the number of vector points in the high-dimensional ball; and S_h is the high-dimensional ball region of radius h, consisting of the set of vector points n satisfying relation (2-2):

S_h(k) = { n : (n − k)^T (n − k) ≤ h² } (2-2)
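A plain NumPy sketch of the iteration implied by equations (2-1) and (2-2), using a flat kernel; the convergence tolerance and iteration cap are assumptions, and the adaptive weighting described next is omitted here:

```python
import numpy as np

def mean_shift(points, h, n_iters=50, tol=1e-3):
    """Shift each point toward the local density maximum (naive O(N^2) sketch)."""
    centers = points.astype(float)
    for _ in range(n_iters):
        moved = 0.0
        for i, k in enumerate(centers):
            in_ball = points[np.sum((points - k) ** 2, axis=1) < h ** 2]  # S_h(k)
            if len(in_ball) == 0:
                continue
            new_k = in_ball.mean(axis=0)      # k + M_h, per eq. (2-1)
            moved = max(moved, np.linalg.norm(new_k - k))
            centers[i] = new_k
        if moved < tol:                       # all centers have converged
            break
    return centers
```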
The standard Mean Shift algorithm defines a kernel function; the adaptive Mean Shift algorithm of this embodiment adds a weight coefficient on top of it. The kernel function makes a sample's contribution to the shift vector vary with its distance from the shifted point, while the weight coefficient gives different samples different weights. In this embodiment the weight coefficient is adjusted dynamically according to the circumstances, and the adaptive Mean Shift clustering algorithm is constructed together with the sphere radius. The weight coefficient is obtained from the characteristic parameters activated by the FCN network: each feature carries a corresponding coefficient when activated, obtained through big-data statistics and practical experience. Each image processed by the FCN yields a list of activated characteristic parameters, from which the Mean Shift weight coefficient for that image can be calculated. The specific process is as follows:
S41, acquiring the FCN activation characteristic parameters, where a characteristic parameter may be the classification probability;
S42, calculating the Mean Shift weight coefficient of the image by combining the coefficients of the individual features.
The sphere radius is determined from the change of lane line direction in the image, much as a driver judges the distance of a following car in the rearview mirror. The change trend of the pixels is analyzed to judge how far away the lane line is: for the same lane line of equal physical width, fewer occupied pixel points mean it is farther from the observer, and more occupied pixel points mean it is closer. With reference to fig. 5, the specific flow is as follows (an illustrative sketch follows the steps below):
S51, inputting an image;
S52, calculating the X-direction pixels in the image;
S53, determining the X direction;
S54, calculating the Y-direction pixels in the image;
S55, determining the Y direction;
S56, calculating the direction angle of the current lane line;
S57, determining the current lane line direction in combination with the lane line polynomial;
S58, calculating the sphere size.
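An illustrative sketch of S51-S58, estimating the lane direction from the pixel extents in x and y and scaling the sphere radius accordingly; the base radius and the scaling rule are pure assumptions, since the text leaves these to big-data statistics and practical experience:

```python
import numpy as np

def adaptive_radius(lane_pixels, base_radius=20.0):
    """lane_pixels: (N, 2) array of (x, y) coordinates of one lane line."""
    xs, ys = lane_pixels[:, 0], lane_pixels[:, 1]
    dx = xs.max() - xs.min()                 # S52-S53: x-direction extent
    dy = ys.max() - ys.min()                 # S54-S55: y-direction extent
    angle = np.arctan2(dy, dx)               # S56: current direction angle
    # S57-S58: steep, near-vertical segments (close to the observer) occupy
    # wider pixel bands, so enlarge the ball; flat, distant ones shrink it.
    scale = 0.5 + np.sin(angle)              # hypothetical scaling rule
    return base_radius * scale
```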
In one embodiment, the step of labeling the lane line of the driving image in S11 includes:
forming a closed area over the lane line region through a multi-point connecting line, and marking the contour line of the closed area;
drawing the occluded part of a lane line by hypothesizing points of the partially visible lane line shown in the driving image, and connecting the points of the same lane line for marking;
and marking unclear lane lines in the driving image by adjusting the zoom scale.
In this embodiment, marking a lane line is not simply tracing what is visible in the driving image: the same lane line is marked continuously whether it is solid or dashed. Where the line is occluded, the hidden part is drawn according to the visible parts before and after it; for images in which some lane lines are unclear, the marking software can zoom the image in and out so the lane line can be marked conveniently and completely along its contour. The adopted marking approach, together with lane line marking software modified from open source code, achieves pixel-level marking of the images. In actual operation the images can be marked in order from left to right: to mark a certain lane line, one only needs to select its region with several connected points to form a closed area and then double-click the corresponding label on the right side of the software. Marking in this way lets the lane line segmentation model extract and learn features under adverse conditions, ensures that the subsequent model copes better with occlusion, unclear lane lines, and the like, and provides a certain predictive capability.
In this embodiment the marking comprises at least two aspects: first, the contours of special lane lines such as curves, ramps, and roundabouts are marked with multi-point connecting lines; second, unclear and occluded lines are also actually marked, so that not only the lane lines displayed in the picture are labeled but also their blurred and missing parts.
In this embodiment, marking refers to the action and operation of annotating images to form the sample library of the data set; the subsequent semantic model then uses this sample library, for example by processing the input image through a series of steps and calling the sample library data through certain logic for comparison and calibration before outputting a result.
The marking quality of the data set strongly influences the accuracy and robustness of the model, so adopting a reasonable marking scheme and accurately marking and classifying the lane lines is significant for the lane line detection model. The marking scheme greatly affects the final segmentation effect of the whole model; this embodiment's scheme ensures that the subsequent semantic model is more robust to adverse conditions such as occluded or unclear lane lines, so that the model can predict lane lines even in some extreme situations.
When the training data set of lane lines is constructed and the lane lines in an image are occluded by vehicles or not clearly visible, points of the lane lines can be hypothesized and connected; during manual labeling these hypothesized points supplement the blurred lane lines.
In one embodiment, the method further comprises:
collecting driving videos under different road, weather, illumination, and traffic conditions to obtain images to be processed;
correcting the images to be processed to obtain driving images;
and preprocessing the marked driving images to increase the sample size of the data set, wherein the preprocessing includes one or more of style transfer, brightness transformation, symmetric transformation, and mirror transformation.
In this embodiment, the preprocessing may involve a region of interest (ROI), i.e. in image processing a region to be processed is outlined in the image as a rectangle, circle, ellipse, irregular polygon, or the like. For example, if the image is large but the lane lines occupy only one corner while the rest is grass and so on, this reduces the amount of computation.
Because actual traffic environments are complex and changeable and the traffic rules of different countries and regions differ in many ways, the types and designs of traffic signs and lane lines differ more or less as well. Meanwhile, weather and illumination make lane line detection much harder during actual driving: detection mostly relies on cameras, and rain, snow, and fog bring polluted lane lines, blocked reflections, and the like, while sudden illumination changes, overly strong light, or road surface reflections can temporarily disable the cameras. This embodiment therefore multiplies the labeled data set through style transfer, brightness transformation, symmetric transformation, and similar operations on the images, and collects driving videos under different road, weather, illumination, and traffic conditions to build the lane line recognition data set. Increasing the lane line data across scenes, and within a given scene, helps the semantic segmentation model extract the lane line features of those scenes: the more such data, the higher the model's detection accuracy in the scene and the better its robustness.
In this embodiment, data is enhanced through a series of augmentation means to make the most of the limited training data. The mirror transformation is equivalent to changing the view of one's own lane into the view of the opposite lane; to a computer, the mirrored pictures are different pictures containing different lane line feature information, so the subsequent semantic segmentation model can learn more lane line features.
For example, brightness transformation can simulate scenes under different lighting, symmetric and mirror transformations can simulate different angles, and a single shot can thus yield several times the material.
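A small OpenCV/NumPy sketch of the brightness and mirror transformations named above (style transfer would require a separately trained model and is omitted); the gain values are arbitrary examples:

```python
import cv2
import numpy as np

def augment(image, label_mask):
    """Yield brightness- and mirror-transformed copies of a labeled sample."""
    # Brightness transformation: simulate different lighting conditions.
    for gain in (0.6, 1.4):
        bright = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
        yield bright, label_mask
    # Mirror transformation: the own-lane view becomes the opposite-lane view.
    yield cv2.flip(image, 1), cv2.flip(label_mask, 1)
```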
To a computer a digital image is a matrix, and different image types have different matrix dimensions: a grayscale image is a two-dimensional matrix, while an RGB color image is a three-dimensional one. The data enhancement described above can reduce the error rates of the subsequent algorithm model during both training and testing, and is particularly beneficial when training models on small data sets, especially convolutional neural networks.
Because the camera lens tends to deform the edges of the image, each image to be processed is corrected to ensure imaging accuracy. Camera imaging also produces a perspective effect that depends on object distance: the lane lines we see converge to a point in the distance, whereas from a bird's-eye view the lane lines are parallel, as they are in the real world. In this embodiment, the images are therefore processed in advance so that the lane lines become parallel; this allows the coefficients of the quadratic curve to be analyzed correctly and the curvature radius to be calculated later, so that curved lane lines can finally be identified.
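A hedged sketch of this correction and bird's-eye transformation follows (Python/OpenCV); the intrinsic parameters, distortion coefficients, and source/destination points are hypothetical and would in practice come from camera calibration and the mounting geometry:

import cv2
import numpy as np

# Hypothetical intrinsics and distortion coefficients; real values
# come from cv2.calibrateCamera on calibration-pattern images.
camera_matrix = np.array([[1000.0, 0.0, 640.0],
                          [0.0, 1000.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.25, 0.05, 0.0, 0.0, 0.0])

def to_birds_eye(frame):
    """Undistort the frame, then warp it so the lane lines become
    approximately parallel, as seen from above."""
    undistorted = cv2.undistort(frame, camera_matrix, dist_coeffs)
    h, w = undistorted.shape[:2]
    src = np.float32([[w * 0.45, h * 0.63], [w * 0.55, h * 0.63],
                      [w * 0.90, h - 1], [w * 0.10, h - 1]])
    dst = np.float32([[w * 0.2, 0], [w * 0.8, 0],
                      [w * 0.8, h - 1], [w * 0.2, h - 1]])
    m = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(undistorted, m, (w, h))

def curvature_radius(ys, xs, y_eval):
    """Fit x = a*y^2 + b*y + c to one lane's pixels in the bird's-eye
    image and evaluate the curvature radius at y = y_eval."""
    a, b, _ = np.polyfit(ys, xs, 2)
    return (1 + (2 * a * y_eval + b) ** 2) ** 1.5 / abs(2 * a)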
Based on the same inventive concept, the embodiments of the invention also provide a lane line detection device based on semantic segmentation, a lane line detection system based on semantic segmentation, and a storage medium. Since the principle by which these solve the problem is similar to that of the lane line detection method based on semantic segmentation of the previous embodiments, their implementation can refer to the implementation of the method, and repeated parts are not described again.
In one embodiment, there is also provided a lane line detection apparatus based on semantic segmentation, which, with reference to fig. 6, includes the following modules (a structural sketch follows the list):
the sample library construction module 11 is used for marking a lane line of the driving image and constructing a data set of the lane line;
the semantic segmentation module 12 is used for extracting characteristic data of an input image, performing image binarization semantic segmentation to obtain pixel points of a lane line, predicting the output category of the lane line by combining the data set, and calculating classification probability;
the clustering module 13 is used for filtering and connecting the pixel points of the lane lines, dynamically adjusting the weight coefficients and the sphere radius of a clustering algorithm according to the classification probability and the change of the lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines;
and the fitting module 14 is used for traversing the connected domain of the clustered lane lines, counting corresponding pixel points and fitting a parameter equation.
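A structural sketch of how these modules might compose at inference time is given below (Python); all class and method names are illustrative assumptions, and the FCN-VGG16 inference and the adaptive Mean Shift internals are stubbed out rather than implemented:

import numpy as np

class SemanticSegmentationModule:
    def predict(self, image):
        """Return a binary lane mask and per-pixel class probabilities
        (an FCN-VGG16 forward pass would go here)."""
        raise NotImplementedError

class ClusteringModule:
    def cluster(self, mask, probs):
        """Filter and connect lane pixels, adapt the Mean Shift weight
        coefficients and sphere radius from the probabilities and lane
        direction, and return one pixel set per lane line."""
        raise NotImplementedError

class FittingModule:
    def fit(self, lane_pixel_sets):
        """Fit a quadratic x = a*y^2 + b*y + c per clustered lane."""
        return [np.polyfit(ys, xs, 2) for ys, xs in lane_pixel_sets]

def detect_lane_lines(image, seg, clu, fit):
    mask, probs = seg.predict(image)
    lanes = clu.cluster(mask, probs)
    return fit.fit(lanes)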
In one embodiment, there is provided a semantic segmentation based lane line detection system, including: a memory, a processor, a communication bus, and a lane line detection program based on semantic segmentation stored on the memory,
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the semantic segmentation-based lane line detection program to implement the steps of the semantic segmentation-based lane line detection method according to any one of the above embodiments.
In one embodiment, a storage medium is further provided, the storage medium having at least a semantic segmentation based lane line detection program stored thereon, the semantic segmentation based lane line detection program, when executed by a processor, implementing the steps of the semantic segmentation based lane line detection method according to any one of the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A lane line detection method based on semantic segmentation is characterized by comprising the following steps:
marking a lane line of the driving image, and constructing a data set of the lane line;
extracting characteristic data of an input image, performing image binarization semantic segmentation to obtain pixel points of a lane line, predicting the output category of the lane line by combining the data set, and calculating classification probability;
filtering and connecting pixel points of the lane lines, dynamically adjusting weight coefficients and sphere radiuses of a clustering algorithm according to classification probability and changes of the lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines;
and traversing the connected domain of the clustered lane lines, counting corresponding pixel points, and fitting a parameter equation.
2. The method for detecting the lane line based on the semantic segmentation as claimed in claim 1, wherein the steps of extracting the feature data of the input image, performing the image binarization semantic segmentation to obtain the pixel points of the lane line, predicting the category of the lane line by combining the data set, and calculating the classification probability comprise:
creating an encoder network for extracting feature data of an input image based on the FCN-VGG16 network, the encoder network comprising a first preset number of convolution blocks and pooling layers and a second preset number of full convolution layers; and the convolution layers in the convolution blocks adopt a linear rectification function as the activation function, the full convolution layers comprise a dropout layer for preventing overfitting, and the feature image output by the encoder network and the intermediate images output by the intermediate pooling layers are classified and predicted through a prediction network to obtain the lane line category of the input image.
3. The method according to claim 2, wherein the method further comprises deconvolving, through a decoder network, the feature image output from the encoder network to recover the image size; the layer-skipping structure of the decoder network has a first stage and a second stage, the first stage combining the output of the first preset intermediate pooling layer with the output of the last full convolution layer, and the second stage combining the output of the second preset intermediate pooling layer with the output of the first stage.
4. The method according to claim 3, wherein the prediction network comprises a prediction layer, a deconvolution layer, a shear layer, a point-by-point addition layer and a softmax layer;
inputting the output of the last full convolution layer of the decoder network into a first prediction layer, inputting the output of the first prediction layer into a first preset deconvolution layer to amplify the feature image, and inputting the output of the deconvolution layer into a first shear layer; inputting a second preset intermediate pooling layer into a second prediction layer, setting the first stage of the layer jump strategy after the second prediction layer, and inputting the output of the first shear layer and the output of the second prediction layer into a first point-by-point addition layer;
the output of the point-by-point addition layer is input into a second preset deconvolution layer, and the output of the second preset deconvolution layer is input into a second shear layer; inputting the output of the first preset intermediate pooling layer into a third prediction layer, and setting a second stage of a layer jump strategy behind the third prediction layer, wherein the second stage further comprises inputting the output of the second shear layer and the output of the third prediction layer into a second point-by-point addition layer; and/or
The prediction network is provided with a third preset number of output categories, and the classification probability of the output categories of the lane lines is calculated through the softmax layer.
5. The method according to claim 1, wherein the step of filtering and connecting the pixel points of the lane line, dynamically adjusting the weight coefficient and the sphere radius of the clustering algorithm according to the classification probability and the change of the lane line direction, and classifying the pixel points by the clustering algorithm to obtain the clustered lane line comprises:
segmenting independent lane line pixel points and connecting adjacent pixel points through morphological operations;
traversing pixel points of the lane line connected domain, and filtering and determining the type of the connected domain according to the lane line identification and the number of the pixel points; and/or
outputting the classification probability through the FCN-VGG16 network and determining the weight coefficient of the clustering algorithm; analyzing the change trend of the pixel points to judge the lane line direction, determining the sphere radius, and constructing an adaptive Mean Shift clustering algorithm; and classifying the pixel points of each lane line by using the adaptive Mean Shift clustering algorithm.
6. The method for detecting lane lines based on semantic segmentation as claimed in claim 1, wherein the step of labeling the lane lines of the driving image comprises:
forming a closed area in the lane line area through a multipoint connection line, and marking the contour line of the closed area;
drawing the lane line of the occluded portion based on the partially visible lane line points displayed in the driving image, and connecting points belonging to the same lane line for marking;
and marking the unclear lane lines in the driving image by adjusting the scale.
7. The method for lane line detection based on semantic segmentation according to any one of claims 1-6, further comprising:
collecting driving videos of different road conditions, different weather conditions, different illumination conditions and different traffic conditions to obtain images to be processed;
correcting the image to be processed to obtain a driving image;
and preprocessing the marked driving image to increase the sample size of the data set, wherein the preprocessing mode comprises one or more of style migration, brightness transformation, symmetric transformation and mirror image transformation.
8. A lane line detection apparatus based on semantic segmentation, the apparatus comprising:
the sample library construction module is used for marking the lane line of the driving image and constructing a data set of the lane line;
the semantic segmentation module is used for extracting characteristic data of an input image, performing image binarization semantic segmentation to obtain pixel points of the lane line, predicting the output category of the lane line by combining the data set and calculating classification probability;
the clustering module is used for filtering and connecting pixel points of the lane lines, dynamically adjusting weight coefficients and sphere radiuses of a clustering algorithm according to the classification probability and the change of the lane line direction, and classifying the pixel points through the clustering algorithm to obtain the clustered lane lines;
and the fitting module is used for traversing the connected domain of the clustered lane lines, counting corresponding pixel points and fitting a parameter equation.
9. A semantic segmentation based lane line detection system, comprising: a memory, a processor, a communication bus, and a lane line detection program based on semantic segmentation stored on the memory,
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the semantic segmentation based lane line detection program to implement the steps of the semantic segmentation based lane line detection method according to any one of claims 1 to 7.
10. A storage medium having stored thereon at least a semantic segmentation based lane line detection program, which when executed by a processor implements the steps of the semantic segmentation based lane line detection method according to any one of claims 1 to 7.
CN202011508480.4A 2020-12-18 2020-12-18 Lane line detection method, device and system based on semantic segmentation and storage medium Pending CN112613392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011508480.4A CN112613392A (en) 2020-12-18 2020-12-18 Lane line detection method, device and system based on semantic segmentation and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011508480.4A CN112613392A (en) 2020-12-18 2020-12-18 Lane line detection method, device and system based on semantic segmentation and storage medium

Publications (1)

Publication Number Publication Date
CN112613392A true CN112613392A (en) 2021-04-06

Family

ID=75240809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011508480.4A Pending CN112613392A (en) 2020-12-18 2020-12-18 Lane line detection method, device and system based on semantic segmentation and storage medium

Country Status (1)

Country Link
CN (1) CN112613392A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019173481A1 (en) * 2018-03-06 2019-09-12 Phantom AI, Inc. Lane line reconstruction using future scenes and trajectory
CN109740465A (en) * 2018-12-24 2019-05-10 南京理工大学 A kind of lane detection algorithm of Case-based Reasoning segmentation neural network framework
US20200218948A1 (en) * 2019-01-03 2020-07-09 Beijing Jingdong Shangke Information Technology Co., Ltd. Thundernet: a turbo unified network for real-time semantic segmentation
CN110084095A (en) * 2019-03-12 2019-08-02 浙江大华技术股份有限公司 Method for detecting lane lines, lane detection device and computer storage medium
CN110472508A (en) * 2019-07-15 2019-11-19 天津大学 Lane line distance measuring method based on deep learning and binocular vision
CN110363182A (en) * 2019-07-24 2019-10-22 北京信息科技大学 Method for detecting lane lines based on deep learning
CN111460921A (en) * 2020-03-13 2020-07-28 华南理工大学 Lane line detection method based on multitask semantic segmentation
CN111582083A (en) * 2020-04-25 2020-08-25 华南理工大学 Lane line detection method based on vanishing point estimation and semantic segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
申泽邦 (SHEN, Zebang): "Research on high-precision map optimization and positioning technology for autonomous driving", China Master's Theses Full-text Database, Information Science and Technology, no. 9, 15 September 2019 (2019-09-15), pages 136-724 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360656A (en) * 2021-06-29 2021-09-07 未鲲(上海)科技服务有限公司 Abnormal data detection method, device, equipment and storage medium
CN113449648A (en) * 2021-06-30 2021-09-28 北京纵目安驰智能科技有限公司 Method, system, equipment and computer readable storage medium for detecting indicator line
CN113449648B (en) * 2021-06-30 2024-06-14 北京纵目安驰智能科技有限公司 Method, system, equipment and computer readable storage medium for detecting indication line
CN116128717A (en) * 2023-04-17 2023-05-16 四川观想科技股份有限公司 Image style migration method based on neural network
CN116128717B (en) * 2023-04-17 2023-06-23 四川观想科技股份有限公司 Image style migration method based on neural network

Similar Documents

Publication Publication Date Title
CN111460926B (en) Video pedestrian detection method fusing multi-target tracking clues
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
CN108171112B (en) Vehicle identification and tracking method based on convolutional neural network
CN112686812B (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CN109255350B (en) New energy license plate detection method based on video monitoring
CN112613392A (en) Lane line detection method, device and system based on semantic segmentation and storage medium
Nandi et al. Traffic sign detection based on color segmentation of obscure image candidates: a comprehensive study
CN111914698B (en) Human body segmentation method, segmentation system, electronic equipment and storage medium in image
CN109919026B (en) Surface unmanned ship local path planning method
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
Van Pham et al. Front-view car detection and counting with occlusion in dense traffic flow
Naufal et al. Preprocessed mask RCNN for parking space detection in smart parking systems
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN113408584A (en) RGB-D multi-modal feature fusion 3D target detection method
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
Gu et al. Embedded and real-time vehicle detection system for challenging on-road scenes
Asgarian Dehkordi et al. Vehicle type recognition based on dimension estimation and bag of word classification
Harianto et al. Data augmentation and faster rcnn improve vehicle detection and recognition
Liu Learning full-reference quality-guided discriminative gradient cues for lane detection based on neural networks
CN116740572A (en) Marine vessel target detection method and system based on improved YOLOX
Hua et al. Onboard monocular pedestrian detection by combining spatio-temporal hog with structure from motion algorithm
Tokmurzina Road marking condition monitoring and classification using deep learning for city of Helsinki.
CN113642553B (en) Method for accurately positioning unconstrained license plate by combining whole and part target detection
Afrose et al. Road sign segmentation and recognition under bad illumination condition using modified fuzzy c-means clustering

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100176 floor 10, building 1, zone 2, yard 9, Taihe 3rd Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Beijing National New Energy Vehicle Technology Innovation Center Co.,Ltd.

Address before: 100089 1705 100176, block a, building 1, No. 10, Ronghua Middle Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant before: BEIJING NEW ENERGY VEHICLE TECHNOLOGY INNOVATION CENTER Co.,Ltd.