CN108846328B - Lane detection method based on geometric regularization constraint - Google Patents

Lane detection method based on geometric regularization constraint

Info

Publication number
CN108846328B
Authority
CN
China
Prior art keywords
lane
feature
layer
lane line
image
Prior art date
Legal status
Active
Application number
CN201810527769.7A
Other languages
Chinese (zh)
Other versions
CN108846328A (en)
Inventor
徐奕
倪冰冰
张杰
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201810527769.7A
Publication of CN108846328A
Application granted
Publication of CN108846328B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention provides a lane detection method based on geometric regularization constraint, which comprises the following steps: step S1, extracting features from an input driving scene image to obtain a preliminary lane detection result and a lane line detection result; step S2, cross-comparing the preliminary lane detection result and the lane line detection result, correcting erroneously detected regions, and outputting the final lane detection result; and step S3, optimizing the detection results with a loss function that combines structural information with cross entropy loss, and training the network. The invention is an efficient, high-precision method for segmenting the drivable area: by introducing the inherent geometric information of roads in a traffic scene as a constraint on a conventional lane detection model, it effectively suppresses environmental interference and improves the accuracy of lane detection. The invention requires no image preprocessing or post-processing and realizes end-to-end lane detection. Experimental results show that the detection accuracy of the invention is greatly improved compared with classical detection methods.

Description

Lane detection method based on geometric regularization constraint
Technical Field
The invention relates to the technical field of lane detection based on visual images, in particular to a lane detection method based on geometric regularization constraint.
Background
Lane detection based on visual images is one of the important problems of intelligent driving, and is mainly used for detecting a current drivable lane area from a traffic scene image. Based on the lane detection result, the intelligent driving system can perform path planning and driving behavior decision. However, at present, the lane detection method still has various limitations in terms of accuracy and applicable scenes.
The existing lane detection methods can be divided into three main types. The first is based mainly on texture features: self-similar regions in the traffic scene are fused using methods such as region growing to obtain the lane region. However, this approach has difficulty handling dissimilar areas within the lane region and is therefore overly sensitive to shadows and other disturbances. The second is based on lane edge information: edges are extracted with a high-pass filter or image gradients, the final lane edge curves are fitted with a curve-fitting algorithm, and the lane edges are used to frame the final lane area. However, because of edge occlusion and object interference in real scenes, the robustness of its detection results is poor. The last type adopts deep learning: abstract features of the traffic scene are first extracted by a semantic segmentation network, and a pixel-level lane region probability map is then reconstructed from these features to detect lanes. Although deep learning can roughly detect the lane area, it performs poorly on details and is strongly affected by complex scenes.
In summary, existing lane detection methods consider only part of the information in a traffic scene and therefore lack both high accuracy and strong robustness.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a lane detection method based on internal geometric constraint, which takes the internal geometric constraints of the lane into account on the basis of existing research. The invention establishes a multi-target neural network model, with lane detection and lane line detection as the two targets, so that the neural network can learn the internal relation between the two targets. On this basis, the two targets are connected through the feature extraction network, which further realizes the mutual constraint between the two targets. In addition, the invention also provides a loss function based on geometric constraint to guide network training.
The invention is realized by the following technical scheme.
A lane detection method based on geometric regularization constraint comprises the following steps:
step S1, extracting characteristics of the input driving scene image to obtain a preliminary lane detection result and a lane line detection result;
and step S2, cross-comparing the preliminary lane detection result and the lane line detection result, correcting the detection error area and outputting the final lane detection result and the lane line detection result.
Preferably, the step S1 includes the following sub-steps:
step S11, constructing a feature extraction network by using a plurality of convolution layers and downsampling layers, and extracting image features of an input driving scene image; wherein:
the input of the feature extraction network is the input driving scene image reduced in size by a down-sampling layer; through the convolution layers, the feature extraction network extracts image features layer by layer, from concrete to abstract;
the network structure of the feature extraction network is as follows: B-CR (32) -CR (32) -M-CR (64) -CR (64) -M-CR (128) -CR (128) -CR (128) -M-CR (256) -CR (256) -CR (256) -M-CR (512) -CR (512) -CR (512); wherein B represents a batch normalization layer, C represents a convolution layer, R represents an activation layer ReLU, and M represents a downsampling layer; the number in brackets indicates the number of convolutional layer output channels; the active layer ReLU is defined as:
ReLU(x) = max(0, x)
where x is the input to the active layer ReLU;
the image feature output by the feature extraction network is feIn order to ensure the tensor dimension is not changed, the image characteristic feOn the basis of (a) is linked with an image feature feAll zero tensor zero with the same size, and the image feature output by the final feature extraction network is fezDefined as:
fez=[fe,zero]k
wherein]kDenotes feAnd zero are linked along the kth dimension;
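Read as a convolutional encoder, the feature extraction network of step S11 can be sketched roughly as follows. This is a minimal PyTorch reading of the B-CR-M notation, not the patented implementation: the kernel size (3), padding (1) and the class and variable names are assumptions, since the text only specifies layer types and channel counts.

```python
import torch
import torch.nn as nn

def cr(in_ch, out_ch):
    # "CR": a convolution followed by a ReLU activation (kernel size 3 and padding 1 are assumed)
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True)]

class FeatureExtractor(nn.Module):
    """Sketch of B-CR(32)-CR(32)-M-...-CR(512): batch norm, conv+ReLU stages, four max-pooling layers."""
    def __init__(self):
        super().__init__()
        layers = [nn.BatchNorm2d(3)]                       # B: batch normalization over the RGB input
        stages = [(3, 32, 2), (32, 64, 2), (64, 128, 3), (128, 256, 3), (256, 512, 3)]
        for i, (cin, cout, reps) in enumerate(stages):
            layers += cr(cin, cout)
            for _ in range(reps - 1):
                layers += cr(cout, cout)
            if i < 4:                                      # M: down-sample between the five stages
                layers.append(nn.MaxPool2d(2))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        f_e = self.body(x)                                 # f_e: (w/16) x (h/16) spatially, 512 channels
        zero = torch.zeros_like(f_e)                       # all-zero tensor of the same size as f_e
        f_ez = torch.cat([f_e, zero], dim=1)               # f_ez = [f_e, zero]_k, k = channel axis in NCHW
        return f_e, f_ez
```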
step S12, performing preliminary lane area detection on the input driving scene image with the extracted image feature fez, using a pixel classification network consisting of deconvolution layers and up-sampling layers; step S13, performing preliminary lane line detection on the input driving scene image with the extracted image feature fez, using a pixel classification network consisting of deconvolution layers and up-sampling layers;
wherein step S12 and step S13 use the same image feature fez, but two separate pixel classification networks are used to realize lane detection and lane line detection respectively.
The image feature fez extracted in step S11 is passed through each pixel classification network, consisting of up-sampling layers and deconvolution layers, to obtain a feature map with the same resolution as the input driving scene image, and the feature map is used to classify the category of each pixel;
the pixel classification network is mirror-symmetric to the feature extraction network; the network structure of the pixel classification network is: DR(512)-DR(512)-DR(512)-U-DR(256)-DR(256)-DR(256)-U-DR(128)-DR(128)-DR(128)-U-DR(64)-DR(64)-U-DR(32)-DS(z); where D denotes a deconvolution layer, U denotes an up-sampling layer, and S denotes the activation layer Sigmoid; the number in parentheses indicates the number of deconvolution output channels; an output value of 1 from the last deconvolution layer (which has z output channels) indicates that the pixel belongs to a lane area or lane line, and an output value of 0 indicates that the pixel does not belong to a lane area or lane line;
the active layer Sigmoid is defined as the following function:
Sigmoid(x) = 1/(1 + e^(-x))
wherein, x is the input of the active layer Sigmoid;
through the up-sampling layers with the same number as the down-sampling layers, the pixel classification network restores the feature map to the resolution which is the same as that of the input driving scene image, so that the feature map and the pixel points are in one-to-one correspondence; and classifying the pixel points by the Sigmoid function of the activation layer in a probability mode, and finally outputting a probability graph to show the probability that each pixel point belongs to a lane area or a lane line, so as to obtain a preliminary lane detection result and a lane line detection result.
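A matching sketch of the mirror-symmetric pixel classification network described above, under the same caveats: the transposed-convolution kernel size, the bilinear up-sampling mode and the class name PixelClassifier are assumptions; only the layer sequence and channel counts come from the text.

```python
import torch.nn as nn

def dr(in_ch, out_ch):
    # "DR": a deconvolution (transposed convolution) followed by a ReLU (kernel size 3, padding 1 assumed)
    return [nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True)]

class PixelClassifier(nn.Module):
    """Sketch of DR(512)x3-U-DR(256)x3-U-DR(128)x3-U-DR(64)x2-U-DR(32)-DS(1).

    Takes the 1024-channel feature f_ez (or f_el / f_em in the correction stage) and
    returns a per-pixel probability map at the resolution of the input image.
    """
    def __init__(self, in_ch=1024):
        super().__init__()
        layers = []
        stages = [(in_ch, 512, 3), (512, 256, 3), (256, 128, 3), (128, 64, 2), (64, 32, 1)]
        for i, (cin, cout, reps) in enumerate(stages):
            layers += dr(cin, cout)
            for _ in range(reps - 1):
                layers += dr(cout, cout)
            if i < 4:                                      # U: up-sample, mirroring the four M layers
                layers.append(nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))
        layers += [nn.ConvTranspose2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid()]   # DS(1)
        self.body = nn.Sequential(*layers)

    def forward(self, f):
        return self.body(f)   # probability that each pixel belongs to a lane area / lane line
```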
Preferably, the step S2 includes the following sub-steps:
step S21, correcting the lane detection result based on the image feature fe and the preliminary lane line detection result, by extracting the internal geometric constraint of the lane lines;
step S22, correcting the lane line detection result based on the image feature fe and the preliminary lane detection result, by extracting the geometric constraint of the lane edges.
Preferably, the step S21 includes the following sub-steps:
step S211, extracting lane line correction characteristics by using the preliminary lane line detection result, and carrying out geometric constraint on lane detection; wherein:
in order to extract the lane line correction feature and fuse it with the image feature fe obtained in step S11, the lane line correction feature fmr output by the correction feature extraction network must have the same size as fe; based on this, the network structure of the correction feature extraction network is: B-CR(32)-CR(32)-M-CR(64)-CR(64)-M-CR(128)-CR(128)-CR(128)-M-CR(256)-CR(256)-CR(256)-M-CR(512)-CR(512)-CR(512); wherein B denotes a batch normalization layer, C a convolution layer, R an activation layer, and M a down-sampling layer; the number in parentheses indicates the number of convolution output channels;
the modified feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network in the step S13 to perform feature re-extraction;
step S212, correcting the lane detection result using the lane line correction feature fmr and generating an accurate lane detection result; wherein:
the lane line correction feature fmr obtained in step S211 is concatenated with the image feature fe obtained in step S11 to obtain the input feature fel finally used for lane detection, defined as:
fel=[fe,fmr]k
the input feature fel is fed into the pixel classification network defined in step S12, and lane detection is performed with the same network parameters, finally yielding an accurate lane detection result constrained by the geometric relationship of the lane lines.
Preferably, the step S22 includes the following sub-steps:
step S221, extracting lane correction characteristics by using the preliminary lane detection result, and carrying out geometric constraint on lane line detection; wherein:
in order to extract the lane correction feature and fuse it with the image feature fe obtained in step S11, the lane correction feature flr output by the correction feature extraction network must have the same size as fe; based on this, the network structure of the correction feature extraction network is: B-CR(32)-CR(32)-M-CR(64)-CR(64)-M-CR(128)-CR(128)-CR(128)-M-CR(256)-CR(256)-CR(256)-M-CR(512)-CR(512)-CR(512); wherein B denotes a batch normalization layer, C a convolution layer, R an activation layer, and M a down-sampling layer; the number in parentheses indicates the number of convolution output channels;
the modified feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network in the step S12 to perform feature re-extraction;
step S222, correcting the lane line detection result using the lane correction feature flr and generating an accurate lane line detection result; wherein:
the lane correction feature flr obtained in step S221 is concatenated with the image feature fe obtained in step S11 to obtain the input feature fem finally used for lane line detection, defined as:
fem=[fe,flr]k
the input feature fem is fed into the pixel classification network defined in step S13, and lane line detection is performed with the same network parameters, finally yielding an accurate lane line detection result constrained by the geometric relationship of the lane.
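Taken together, steps S21 and S22 route the two preliminary outputs back through correction feature extractors and reuse the same pixel classification networks with unchanged parameters. A rough sketch of that data flow is given below; it reuses the FeatureExtractor and PixelClassifier sketches above, and CorrectionExtractor is a simplified stand-in (for brevity it is fed the preliminary probability maps rather than the penultimate decoder feature maps described in the text).

```python
import torch
import torch.nn as nn

class CorrectionExtractor(nn.Module):
    # Simplified stand-in for the correction feature extraction network: a few strided
    # convolutions that map a full-resolution, single-channel map to a tensor of the same
    # size as f_e ((w/16) x (h/16), 512 channels). The patented network instead mirrors
    # the B-CR-...-CR(512) encoder and reads the penultimate decoder feature map.
    def __init__(self, in_ch=1):
        super().__init__()
        chans = [in_ch, 32, 64, 128, 256]
        blocks = []
        for cin, cout in zip(chans, chans[1:]):
            blocks += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        blocks += [nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*blocks)

    def forward(self, x):
        return self.body(x)

encoder      = FeatureExtractor()
lane_head    = PixelClassifier()             # lane-area decoder (step S12)
line_head    = PixelClassifier()             # lane-line decoder (step S13)
lane_refiner = CorrectionExtractor()         # produces f_lr from the lane branch
line_refiner = CorrectionExtractor()         # produces f_mr from the lane-line branch

image = torch.randn(1, 3, 256, 512)          # a reduced-size driving scene image (h=256, w=512)

# Stage 1: preliminary detection from f_ez = [f_e, zero]
f_e, f_ez = encoder(image)
lane_prelim = lane_head(f_ez)
line_prelim = line_head(f_ez)

# Stage 2: cross-correction, where each branch's correction feature replaces the zero half
f_mr = line_refiner(line_prelim)             # lane line correction feature, same size as f_e
f_lr = lane_refiner(lane_prelim)             # lane correction feature, same size as f_e
f_el = torch.cat([f_e, f_mr], dim=1)         # input for refined lane detection
f_em = torch.cat([f_e, f_lr], dim=1)         # input for refined lane line detection
lane_final = lane_head(f_el)                 # same decoder weights as in stage 1
line_final = line_head(f_em)
```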
Preferably, any one or more of the following features are also included:
-the reduced size of the driving scene image is w × h × 3; wherein w is the image width, h is the image height, and the number of image channels is 3;
-the image feature fe has a size of (w/16) × (h/16) × 512;
-the image feature fez has a size of (w/16) × (h/16) × 1024;
In step S12, the classification of the category to which each pixel belongs using the feature map includes: a lane area and a non-lane area;
in step S13, the classification of the category to which each pixel belongs using the feature map includes: the lane line area and the non-lane line area.
Preferably, any one or more of the following features are also included:
-the lane line correction feature fmr has a size of (w/16) × (h/16) × 512;
-the input feature fel has a size of (w/16) × (h/16) × 1024;
Wherein w is the width of the driving scene image after the size reduction, and h is the height of the driving scene image after the size reduction.
Preferably, any one or more of the following features are also included:
-the lane correction feature flr has a size of (w/16) × (h/16) × 512;
-the input feature fem has a size of (w/16) × (h/16) × 1024;
Wherein w is the width of the driving scene image after the size reduction, and h is the height of the driving scene image after the size reduction.
Preferably, the method further includes step S3, optimizing the lane detection result and the lane line detection result by combining the structure information-based loss function and the cross entropy loss function, and training all the networks simultaneously end to end.
Preferably, the step S3 is specifically:
for lane detection results:
a loss function based on boundary consistency is adopted, and the boundary consistency is measured by the intersection-over-union (IoU), yielding an IoU-based loss function that optimizes the lane detection result; the loss function based on boundary consistency assumes that the lane and the lane lines have an inherent consistency at the boundary; the IoU-based loss function lba is defined as follows:
IoU = Σi p(xi)·y(xi) / (Σi p(xi) + Σi y(xi) - Σi p(xi)·y(xi))
lba=1-IoU
where xi is a pixel of the input driving scene image, p(xi) is the probability value output by the activation layer Sigmoid at pixel xi, y(xi) is the actual class of pixel xi, and · denotes pixel-level multiplication;
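A minimal sketch of this IoU-based loss as a differentiable PyTorch function, assuming p and y are a probability map and a binary mask of the same shape; the smoothing constant eps is an addition to avoid division by zero and is not part of the patent text.

```python
import torch

def iou_loss(p, y, eps=1e-6):
    """l_ba = 1 - IoU, with the IoU computed from soft pixel-level products.

    p: predicted lane probability map (Sigmoid output), shape (N, 1, H, W)
    y: ground-truth lane mask with values in {0, 1}, same shape
    eps: small constant to avoid division by zero (an addition, not in the patent text)
    """
    inter = (p * y).sum(dim=(1, 2, 3))                                  # sum_i p(x_i) * y(x_i)
    union = p.sum(dim=(1, 2, 3)) + y.sum(dim=(1, 2, 3)) - inter
    return (1.0 - inter / (union + eps)).mean()
```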
for lane line detection results:
optimizing a lane line detection result by using a loss function based on the region; wherein the region-based loss function is defined as follows:
Figure BDA0001676055120000061
where the constraint term G(xi) = 1 denotes all pixels in the lane area, and Ir(xi) denotes the probability values of the lane area recovered from the lane line detection result;
the method for recovering the lane area from the lane line detection result relies on the spatial correlation among pixels, i.e., the most relevant pixels should contribute the same information; the probability value of a recovered lane-area pixel is therefore the same as the probability value of the nearest lane line pixel, and Ir(xi) is defined as follows:
Ir(xi)=Ib(x′j)
x′j = argmin_mj d(xi, mj)
where d(xi, mj) is the Euclidean distance between pixels xi and mj, Ib(x′j) is the lane line probability at pixel x′j, and argmin_mj denotes the pixel position that minimizes the function following it; the resulting region-based loss function laa is therefore defined as follows:
Figure BDA0001676055120000062
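A sketch of the lane-area recovery and the region-based penalty. The nearest-lane-line lookup x′j = argmin_mj d(xi, mj) is implemented here with a Euclidean distance transform over a lane line mask (whether mj ranges over ground-truth or detected lane line pixels is not stated, so the ground truth is assumed); the exact expression of laa is not reproduced here, so the negative-log penalty over the lane area is likewise an assumed form.

```python
import torch
from scipy.ndimage import distance_transform_edt

def area_loss(line_p, line_mask, lane_mask, eps=1e-6):
    """Assumed form of the region-based loss l_aa.

    line_p:    predicted lane line probability map, torch tensor (H, W), requires grad
    line_mask: lane line mask, numpy array (H, W) in {0, 1} (assumed here to be the
               ground truth, from which the nearest lane line pixels m_j are taken)
    lane_mask: ground-truth lane area mask G, numpy array (H, W) in {0, 1}
    """
    # For every pixel x_i, find the nearest lane line pixel x'_j = argmin_mj d(x_i, m_j)
    # via a Euclidean distance transform of the complement of the lane line mask.
    _, (iy, ix) = distance_transform_edt(line_mask == 0, return_indices=True)
    iy, ix = torch.from_numpy(iy).long(), torch.from_numpy(ix).long()

    # Recovered lane-area probability: I_r(x_i) = I_b(x'_j), gathered from the prediction.
    I_r = line_p[iy, ix]

    # Penalize low recovered probability over the lane area (pixels with G(x_i) = 1).
    G = torch.from_numpy(lane_mask).float()
    return -(G * torch.log(I_r + eps)).sum() / (G.sum() + eps)
```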
the four different loss functions are added by weight to obtain a loss function l for training the whole network, which is defined as follows
l = llce + lmce + λ1·lba + λ2·laa
where llce is the loss function of the lane detection target, lmce is the loss function of the lane line detection target, λ1 is the weight of the IoU-based loss function lba, and λ2 is the weight of the region-based loss function laa.
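A compact sketch of the combined training objective, assuming llce and lmce are per-pixel binary cross-entropy terms and reusing the iou_loss and area_loss sketches above; the weights λ1 and λ2 are left as placeholder hyperparameters because their values are not given in this section.

```python
import torch.nn.functional as F

def total_loss(lane_p, line_p, lane_gt, line_gt, l_ba, l_aa, lam1=0.1, lam2=0.1):
    # l = l_lce + l_mce + λ1·l_ba + λ2·l_aa ; the λ values here are placeholders only
    l_lce = F.binary_cross_entropy(lane_p, lane_gt)    # lane detection target
    l_mce = F.binary_cross_entropy(line_p, line_gt)    # lane line detection target
    return l_lce + l_mce + lam1 * l_ba + lam2 * l_aa
```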
The invention provides a lane detection method based on geometric regularization constraint, which is a method for performing lane detection through mutual constraint of sub-networks. Specifically, the method constructs a multi-target network structure, learns the internal geometric relation between lanes and lane lines, and realizes the mutual optimization of detection results among targets through the feature extraction network, so that better detection results can be obtained under complex scenes and interference compared with the common method. In addition, on the basis of the existing loss function, the invention provides a loss function based on geometric constraint to guide network training and improve detection precision.
Compared with the prior art, the invention has the following beneficial effects:
the invention can effectively utilize the lane area information with high consistency and the lane line information containing the curve edge in the traffic scene. Compared with the existing method, the method simultaneously uses a plurality of image characteristics, overcomes the limitation of the existing method under certain interference, can use different scenes and has stronger robustness.
The invention adds information transmission among targets on the basis of a simple multi-target network, thereby forming a two-stage lane detection network. By extracting the characteristics of the primary detection result, the invention enhances the information sharing effect of the multi-target network.
According to the invention, a loss function based on internal geometric constraint is introduced in the process of training the network, and the geometric constraint is explicitly introduced for network training, so that the detection precision is further improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a diagram of the network framework for lane detection based on geometric regularization constraint according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a loss function based on boundary a priori knowledge in an embodiment of the present invention, in which (a) is a schematic diagram comparing lane detection with an actual lane area, and (b) is a schematic diagram of the loss function for measuring the boundary consistency between lane detection and the actual lane area.
Fig. 3 is a schematic diagram of a loss function based on area prior knowledge in an embodiment of the present invention, in which (a) is a schematic diagram of lane line detection and actual lane line comparison, and (b) is a schematic diagram of a lane area generated based on a lane line detection result; in the figure, I1And I2The middle solid line is a lane line detection result, and the dotted line is a missed lane line; a is any position in the lane area, P1And P2The foot being a perpendicular to two lane lines。
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
Referring to fig. 1, a lane detection method based on geometric regularization constraint includes the following steps:
step S1, extracting characteristics of the input image (driving scene image) to obtain a preliminary lane detection result and a lane line detection result;
step S2, cross-compares the preliminary lane detection and lane line detection results, corrects the detection error region, and outputs the final lane detection result.
The lane detection method based on the geometric regularization constraint successfully realizes the mutual constraint of the network and obtains a high-quality lane detection result.
Preferably, the step S1 includes the following sub-steps:
step S11, constructing a feature extraction network by using a plurality of convolution layers and downsampling layers, and extracting image features of an input driving scene image;
the input of the feature extraction network is a driving scene image reduced in size by a down-sampling layer, with size w × h × 3, where w is the image width, h is the image height, and the number of image channels is 3; through the convolution layers, the feature extraction network can extract image features layer by layer, from concrete to abstract, while the down-sampling layer ensures on the one hand that the computation does not grow explosively with network depth and on the other hand extracts the most salient features of the image, preventing the loss of key information during down-sampling;
the specific network structure of the feature extraction network is as follows: B-CR (32) -CR (32) -M-CR (64) -CR (64) -M-CR (128) -CR (128) -CR (128) -M-CR (256) -CR (256) -CR (256) -M-CR (512) -CR (512) -CR (512); wherein B represents a Batch Normalization Layer (Batch), C represents a Convolution Layer (Convolution Layer), R represents an activation Layer (ReLU), and M represents a downsampling Layer (Max Pooling Layer); the number in brackets indicates the number of convolutional layer output channels;
the active layer (ReLU) is defined as:
ReLU(x) = max(0, x)
where x is an input to the activation layer (ReLU);
the image feature output by the feature extraction network is fe,feA size of
Figure BDA0001676055120000082
To ensure the tensor dimensions are unchanged, an all-zero tensor zero of the same size as fe is concatenated to the image feature fe, and the image feature finally output by the feature extraction network is fez, defined as:
fez=[fe,zero]k
where [ ]k denotes the concatenation of the two tensors along the k-th dimension; preferably, k is 3; the image feature fez has a size of (w/16) × (h/16) × 1024.
Step S12, extracting the image feature feezAdopting a pixel classification network consisting of an deconvolution layer and an upper sampling layer to perform preliminary lane area detection on an input driving scene image;
the method specifically comprises the following steps:
the image feature fe extracted in step S11 is combinedezObtaining a feature map with the same resolution as the original input driving scene image through a pixel classification network consisting of an up-sampling layer and a deconvolution layer, and classifying the category of each pixel point by using the feature map; because the resolution of the image needs to be restored, the network structure of the pixel classification network and the feature extraction network which are in mirror symmetry is adopted;
the specific network structure of the pixel classification network is as follows: DR (512) -DR (512) -DR (512) -U-DR (256) -DR (256) -DR (256) -U-DR (128) -DR (128) -DR (128) -U-DR (64) -DR (64) -U-DR (32) -DS (1); wherein D represents a Deconvolution Layer (Deconvolition Layer), U represents an Up-sampling Layer (Up-sample Layer), S represents an activation Layer (Sigmoid), and the number in the brackets represents the number of output channels of the Deconvolution Layer, and it is noted that the number of output channels of the last Deconvolution Layer is 1, so as to distinguish whether a pixel is a lane region;
the activation layer (Sigmoid) is defined as the following function:
Sigmoid(x) = 1/(1 + e^(-x))
wherein x is an input of an activation layer (Sigmoid);
through the up-sampling layers with the same number as the down-sampling layers, the pixel classification network can restore the feature map to the resolution which is the same as that of the input image, so that the feature map and the pixel points are in one-to-one correspondence; classifying the pixel points in a probability mode by an active layer (Sigmoid) function, and finally outputting a probability graph to show the probability that each pixel point belongs to a lane area, so as to obtain a primary lane detection result;
step S13, performing preliminary lane line detection on the input driving scene image with the extracted image feature fez, using a pixel classification network consisting of deconvolution layers and up-sampling layers;
the method specifically comprises the following steps:
the image feature f extracted in step S11 is subjected to the same network structure (DR (512) -DR (512) -DR (512) -U-DR (256) -DR (256) -DR (256) -U-DR (128) -DR (128) -U-DR (64) -DR (64) -U-DR (32) -DS (1)) as in step S12ezObtaining a feature map with the same resolution as the original input driving scene image through a pixel classification network consisting of an up-sampling layer and a deconvolution layer, and classifying the category of each pixel point by using the feature map; like step S12, step S13 inputs the same image feature fez
Preferably, the step S2 includes the following sub-steps:
step S21, correcting the lane detection result based on the image feature fe and the preliminary lane line detection result, by extracting the internal geometric constraint of the lane lines;
step S22, correcting the lane line detection result based on the image feature fe and the preliminary lane detection result, by extracting the geometric constraint of the lane edges.
Preferably, the step S21 includes the following sub-steps:
step S211, extracting lane line correction characteristics by using the preliminary lane line detection result, and carrying out geometric constraint on lane detection;
the method specifically comprises the following steps:
in order to extract the lane line correction feature and fuse it with the image feature fe from step S11, the tensor fmr output by the correction feature extraction network (i.e., the lane line correction feature fmr) must have the same size as fe; the network structure of the correction feature extraction network is therefore the same as that of step S11 (B-CR(32)-M-CR(64)-M-CR(128)-M-CR(256)-M-CR(512)); the lane line correction feature fmr finally output by the correction feature extraction network has a size of (w/16) × (h/16) × 512;
Meanwhile, in order to improve the efficiency of feature extraction and accelerate the convergence of the network, the correction feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network in step S13, rather than the output of the last deconvolution layer, for feature re-extraction;
step S212, correcting the lane detection result using the lane line correction feature fmr and generating an accurate lane detection result;
the method specifically comprises the following steps:
the lane line correction feature fmr obtained in step S211 is concatenated with the image feature fe obtained in step S11 to obtain the input feature fel finally used for lane detection, defined as:
fel=[fe,fmr]k
preferably, k is 3; the resulting input feature fel has a size of (w/16) × (h/16) × 1024.
The input feature fel is fed into the pixel classification network defined in step S12, and lane detection is performed with the same network parameters; finally, an accurate lane detection result constrained by the geometric relationship of the lane lines is obtained.
Preferably, the step S22 includes the following sub-steps:
step S221, extracting lane correction characteristics by using the preliminary lane detection result, and carrying out geometric constraint on lane line detection;
the method specifically comprises the following steps:
in order to extract the lane correction feature and fuse it with the image feature fe from step S11, the tensor flr output by the correction feature extraction network (i.e., the lane correction feature flr) must have the same size as fe; the network structure of the correction feature extraction network is therefore the same as that of step S11 (B-CR(32)-M-CR(64)-M-CR(128)-M-CR(256)-M-CR(512)); the lane correction feature flr finally output by the correction feature extraction network has a size of (w/16) × (h/16) × 512;
Meanwhile, in order to improve the efficiency of feature extraction and accelerate the convergence of the network, the correction feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network in step S12, rather than the output of the last deconvolution layer, for feature re-extraction;
step S222, correcting the lane line detection result using the lane correction feature flr and generating an accurate lane line detection result;
the method specifically comprises the following steps:
the lane correction feature flr obtained in step S221 is concatenated with the image feature fe obtained in step S11 to obtain the input feature fem finally used for lane line detection, defined as:
fem=[fe,flr]3
the resulting input feature fem has a size of (w/16) × (h/16) × 1024.
The input feature fem is fed into the pixel classification network defined in step S13, and lane line detection is performed with the same network parameters; finally, an accurate lane line detection result constrained by the geometric relationship of the lane is obtained.
Preferably, the lane detection method based on geometric regularization constraint further includes step S3, where the lane detection result and the lane line detection result are optimized by combining the loss function based on the structural information with the cross entropy loss function, and a network is trained.
The step S3 specifically includes:
referring to fig. 2, for the lane detection result:
a loss function based on boundary consistency is adopted, and the boundary consistency is measured by the intersection-over-union (IoU), yielding an IoU-based loss function that optimizes the lane detection result; the loss function based on boundary consistency assumes that the lane and the lane lines have an inherent consistency at the boundary; the IoU-based loss function lba is defined as follows:
IoU = Σi p(xi)·y(xi) / (Σi p(xi) + Σi y(xi) - Σi p(xi)·y(xi))
lba=1-IoU
where xi is a pixel of the input driving scene image, p(xi) is the probability value output by the activation layer Sigmoid at pixel xi, y(xi) is the actual class of pixel xi, and · denotes pixel-level multiplication;
referring to fig. 3, for lane line detection results:
optimizing a lane line detection result by using a loss function based on the region; wherein the region-based loss function is defined as follows:
Figure BDA0001676055120000121
where the constraint term G(xi) = 1 denotes all pixels in the lane area, and Ir(xi) denotes the probability values of the lane area recovered from the lane line detection result;
the method for recovering the lane area from the lane line detection result relies on the spatial correlation among pixels, i.e., the most relevant pixels should contribute the same information, so the probability value of a recovered lane-area pixel is the same as the probability value of the nearest lane line pixel; Ir(xi) is defined as follows:
Ir(xi)=Ib(x′j)
x′j = argmin_mj d(xi, mj)
where d(xi, mj) is the Euclidean distance between pixels xi and mj, Ib(x′j) is the lane line probability at pixel x′j, and argmin_mj denotes the pixel position that minimizes the function following it; the resulting region-based loss function laa is therefore defined as follows:
Figure BDA0001676055120000122
the four different loss functions are added by weight to obtain a loss function l for training the whole network, which is defined as follows
l = llce + lmce + λ1·lba + λ2·laa
where llce is the loss function of the lane detection target, lmce is the loss function of the lane line detection target, λ1 is the weight of the IoU-based loss function lba, and λ2 is the weight of the region-based loss function laa.
The lane detection method based on the geometric regularization constraint is used for solving the problem of drivable area segmentation in an intelligent driving scene, and is a high-efficiency and high-precision drivable area segmentation method. The method comprises the following steps: step S1, performing preliminary lane detection and lane line detection on the input image, and segmenting to obtain a preliminary lane detection result; step S2, cross-comparing the preliminary lane and lane line detection results, correcting the detection error area and outputting the final lane detection result; on the basis of the existing lane detection model, the method effectively eliminates environmental interference and improves the accuracy of lane detection by introducing the inherent geometric information of the road in the traffic scene as constraint. The invention does not need to carry out preprocessing and post-processing on the image and can realize end-to-end lane detection. The experimental result shows that compared with the classical detection method, the detection accuracy of the invention is greatly improved.
The lane detection method based on geometric regularization constraint provided above is described in detail below with respect to the design principle and implementation steps of the method.
Unlike ordinary semantic segmentation, lane detection not only needs to segment different types of objects in a scene, but also needs to distinguish different lanes to obtain a lane region with high precision. In order to better overcome the influence of self-similarity between adjacent lanes on the detection effect, the invention provides a multi-target network structure, learns the internal geometric relation between the lanes and the lane lines, and realizes the mutual optimization of the detection results among the targets through a feature extraction network, thereby obtaining better detection results under complex scenes and interference compared with the common method.
1. Preliminary lane and lane line detection
Firstly, feature extraction is carried out on the original image; the size of the input image is w × h × 3, where w is the image width, h is the image height, and the number of image channels is 3. Through the convolution layers, the feature extraction network can extract image features layer by layer, from concrete to abstract, while the down-sampling ensures on the one hand that the computation does not grow explosively with network depth and on the other hand extracts the most salient features of the image, preventing the loss of key information during down-sampling.
The specific feature extraction network structure is: B-CR(32)-CR(32)-M-CR(64)-CR(64)-M-CR(128)-CR(128)-CR(128)-M-CR(256)-CR(256)-CR(256)-M-CR(512)-CR(512)-CR(512). Where B denotes a Batch Normalization Layer, C denotes a Convolution Layer, R denotes an activation layer (ReLU), and M denotes a down-sampling layer (Max Pooling Layer). The numbers in brackets indicate the number of convolutional layer output channels. The activation layer ReLU is defined as:
ReLU(x) = max(0, x)
where x is the input to the activation layer ReLU. The image feature output by the feature extraction network is fe, and fe has a size of (w/16) × (h/16) × 512.
Since the correction feature needs to be extracted in step S2 and merged with fe, the number of input channels of the deconvolution kernels in step S2 is not equal to the number of channels of fe. To ensure that the network can be trained end to end, the invention concatenates to the feature fe an all-zero tensor zero of the same size as fe, and the final output feature is fez, defined as:
fez=[fe,zero]3
where [ ]k denotes the joining of the two tensors along the k-th dimension. The final feature fez has a size of (w/16) × (h/16) × 1024.
The feature fez is then passed through a pixel classification network consisting of up-sampling layers and deconvolution layers to obtain a feature map with the same resolution as the original image, and the feature map is used to classify the category of each pixel. Since the image resolution needs to be restored, a network structure mirror-symmetric to the feature extraction network is adopted here.
The specific network structure is DR(512)-DR(512)-DR(512)-U-DR(256)-DR(256)-DR(256)-U-DR(128)-DR(128)-DR(128)-U-DR(64)-DR(64)-U-DR(32)-DS(1). Where D denotes a Deconvolution Layer, U denotes an Up-sample Layer, and S denotes an activation layer (Sigmoid). The numbers in brackets indicate the deconvolution layer output channels; note that the number of output channels of the last deconvolution layer is 1.
The active layer Sigmoid is defined as:
Sigmoid(x) = 1/(1 + e^(-x))
where x is the input of the active layer Sigmoid. Through the up-sampling layers with the same number as the down-sampling layers, the pixel classification network can restore the feature map to the resolution ratio same as that of the input image, and therefore one-to-one correspondence between the feature map and the pixel points is achieved. And the Sigmoid function classifies the pixel points in a probability form and finally outputs a probability graph.
The loss function corresponding to the active layer Sigmoid is a cross entropy function defined as:
lce = -Σi [ y(xi)·log p(xi) + (1 - y(xi))·log(1 - p(xi)) ]
where xi is a pixel of the image, p(xi) is the probability value output by the activation layer Sigmoid at pixel xi, and y(xi) is the actual class of pixel xi; in the present invention, y(xi) is 1 if xi belongs to the lane or lane line, and 0 otherwise.
The invention provides a multi-target network, so that two independent pixel classification networks are needed to carry out lane detection and lane line detection respectively, the two networks use respective convolution kernels respectively, and the two pixel classification networks are updated independently according to respective detection results in the training process. And because the feature extraction network is shared by the two sub-networks, the feature extraction network is jointly updated under the influence of the two detection results. The loss function of the training is:
l=llce+lmce
where llce is the loss function of the lane detection target and lmce is the loss function of the lane line detection target; the two terms are given the same weight during training.
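One stage-one training step under this multi-target scheme might look roughly like the sketch below: both pixel classification networks receive the same fez, each is driven by its own cross-entropy term, and the shared feature extraction network is updated by both. FeatureExtractor and PixelClassifier refer to the sketches above; the optimizer choice and learning rate are assumptions.

```python
import itertools
import torch
import torch.nn.functional as F

encoder   = FeatureExtractor()
lane_head = PixelClassifier()
line_head = PixelClassifier()
optimizer = torch.optim.Adam(
    itertools.chain(encoder.parameters(), lane_head.parameters(), line_head.parameters()),
    lr=1e-4)                                   # optimizer choice and learning rate are assumptions

def train_step(image, lane_gt, line_gt):
    """image: (N, 3, H, W); lane_gt / line_gt: (N, 1, H, W) binary masks."""
    _, f_ez = encoder(image)                   # shared features, zero-padded to 1024 channels
    lane_p = lane_head(f_ez)                   # preliminary lane probability map
    line_p = line_head(f_ez)                   # preliminary lane line probability map
    l_lce = F.binary_cross_entropy(lane_p, lane_gt)
    l_mce = F.binary_cross_entropy(line_p, line_gt)
    loss = l_lce + l_mce                       # equal weights, as stated above
    optimizer.zero_grad()
    loss.backward()                            # each head receives gradient only from its own term;
    optimizer.step()                           # the shared feature extraction network receives both
    return loss.item()
```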
2. Lane and lane line detection correction
On the basis of the first step, the method extracts correction features from the preliminary lane line detection result and applies a geometric constraint to lane detection. In order to fuse the extracted correction features with the feature fe, a correction feature extraction network is used, and the final correction feature has the same size as fe, namely (w/16) × (h/16) × 512.
Meanwhile, in order to reduce the number of training parameters and accelerate the convergence of the network, the correction feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network, rather than the output of the last deconvolution layer, for feature re-extraction. Finally, the two correction feature extraction networks receive the preliminary lane and lane line detection results respectively and output the correction features flr and fmr.
The correction features flr and fmr are concatenated with the feature fe to obtain the input features fel and fem finally used for lane detection and lane line detection, defined as:
fel=[fe,fmr]3
fem=[fe,flr]3
The resulting features fel and fem have a size of (w/16) × (h/16) × 1024.
To achieve end-to-end network training and detection and to reduce network parameters, the features fel and fem are input into the pixel classification networks defined in the first step, and the weights are shared during forward and backward propagation. Since half of the features received by the pixel classification networks in the first step are all-zero features, this part of the weights plays no role in the first step and does not participate in back propagation there; these weights only take part in back propagation in this step. The training method of this step is described below.
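The claim that the weights acting on the all-zero half of the input take no part in stage-one back-propagation can be checked with a toy experiment: a convolution weight's gradient is a sum of products with its input values, so a kernel slice that only ever sees zeros receives an exactly zero gradient. The snippet below is such a check, not part of the patent.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=4, out_channels=2, kernel_size=3, padding=1)
f_e  = torch.randn(1, 2, 8, 8)                 # "real" feature channels
zero = torch.zeros(1, 2, 8, 8)                 # all-zero padding channels
conv(torch.cat([f_e, zero], dim=1)).sum().backward()

# The kernel slices that multiply the zero channels receive exactly zero gradient,
# so those weights are not updated during stage-one training.
print(conv.weight.grad[:, :2].abs().sum())     # non-zero: weights acting on f_e
print(conv.weight.grad[:, 2:].abs().sum())     # tensor(0.): weights acting on the zero half
```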
3. Structural loss function definition
In order to explicitly introduce geometric constraint of the lane, the invention adopts a loss function based on structural information and a cross entropy loss function which are combined for optimizing a detection result and training a network.
For lane detection results, most false detections appear in the form of regions, and a simple cross entropy loss function cannot measure how far the detected lane deviates geometrically. For lane detection the invention therefore uses a loss function based on boundary consistency, which rests on the assumption that lanes and lane lines have an inherent consistency at the boundary. Because a naive boundary comparison may produce very large loss values and make network training difficult, the invention measures boundary consistency with the intersection-over-union (IoU), obtaining an IoU-based loss function to optimize the lane detection result. The IoU-based loss function lba is defined as follows:
IoU = Σi p(xi)·y(xi) / (Σi p(xi) + Σi y(xi) - Σi p(xi)·y(xi))
lba=1-IoU
where xi is a pixel of the image, p(xi) is the probability value output by the activation layer Sigmoid at pixel xi, y(xi) is the actual class of pixel xi, and · denotes pixel-level multiplication. Since the loss function is computed with pixel-level operations, it is differentiable throughout, so end-to-end training can be carried out.
The lane line detection result is more easily affected by a low signal-to-noise ratio, which causes missed detections; for the lane line detection result the invention therefore adopts a region-based loss function, defined as follows:
Figure BDA0001676055120000153
where the constraint term G(xi) = 1 denotes all pixels in the lane area, and Ir(xi) denotes the probability values of the lane area recovered from the lane line detection result.
The method for recovering the lane area through the lane line detection result mainly depends on the spatial correlation among the pixel points, namely the most relevant pixel points should contribute the same information. The probability value of the recovered lane area is the same as the probability value of the pixel point on the lane line nearest to the recovered lane area, and is defined as follows:
Ir(xi)=Ib(x′j)
x′j = argmin_mj d(xi, mj)
where d(xi, mj) is the Euclidean distance between pixels xi and mj, Ib(x′j) is the lane line probability at pixel x′j, and argmin_mj denotes the pixel position that minimizes the function following it. The resulting region-based loss function laa is therefore defined as follows:
Figure BDA0001676055120000161
in order to train the whole network, the invention adds four different loss functions by weight, and the final loss function l is defined as follows
l = llce + lmce + λ1·lba + λ2·laa
Specific examples
In this embodiment, two databases, KITTI and RVD, are selected for experiments to evaluate the performance of this embodiment; it is compared with the best existing methods and the experimental results are analyzed. To compare the performance differences between different models, the network is trained using only the training images in the database, with no additional data.
The KITTI database comprises 289 training pictures and 290 testing pictures, which respectively comprise three different road scenes: single lane roads, multi-lane roads, and lane-less roads. A single lane road is defined as a road with only two opposite direction lanes, whereas a multi-lane road has multiple lanes in one driving direction, and a lane-free road does not have obvious lane markings. Since the lane-free road has the problem that some lanes are difficult to define, in the actual training and testing, a small number of lane-free roads are excluded.
The RVD database contains over 10 hours of traffic scene images acquired by using multiple cameras, and over 10000 manually labeled images, which contain different weather and different road conditions, including highway scenes, urban road scenes, rainy day scenes and nighttime scenes.
Effects of the implementation
Compared with existing methods, the precision (P), recall (R), F1 score and intersection-over-union (IoU) of the invention are greatly improved on the KITTI database. As shown in Table 1, the invention achieves better performance in single-lane, multi-lane and lane-free scenes, which indicates that it adapts better to changes of scene and environment and is more robust.
Table 1 is the experimental results and comparison on the KITTI dataset:
Figure BDA0001676055120000171
it is noted that, as shown in table 2, the modified feature and the structure penalty function significantly improve performance compared to a purely multitasking network. Compared with a multitask network, the correction feature improves the intersection ratio of the verification results by 1.3%, and the F1score by 0.007. And the structure loss function improves the intersection ratio of the verification results by 0.9 percent and improves the F1score by 0.005. And meanwhile, the model with the addition of the correction feature and the structural loss function achieves the best performance, the intersection ratio is improved by 2.1%, and the F1score is improved by 0.012. Therefore, both the correction feature and the structure loss function play a crucial role in performance improvement.
On the RVD database, the invention also achieves better results, as shown in Table 2. It is noted that, unlike other methods, the precision, recall, F1 score and IoU of the invention do not fluctuate dramatically with scene changes, especially in nighttime scenes. Other methods have difficulty with lane lines that become hard to identify under changing illumination, but thanks to the correction stage the invention can correct the detected lanes a second time at night and therefore obtains better results.
Table 2 is the experimental results and comparison on the RVD data set:
Figure BDA0001676055120000172
the foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (9)

1. A lane detection method based on geometric regularization constraint is characterized by comprising the following steps:
step S1, extracting characteristics of the input driving scene image to obtain a preliminary lane detection result and a lane line detection result;
step S2, cross-comparing the preliminary lane detection result and the lane line detection result, correcting the detection error area and outputting the final lane detection result and the lane line detection result;
the step S1 includes the following sub-steps:
step S11, constructing a feature extraction network by using a plurality of convolution layers and downsampling layers, and extracting image features of an input driving scene image; wherein:
the input of the feature extraction network is an input driving scene image after the size of a down-sampling layer is reduced; extracting image features from concrete to abstract layer by layer through a convolutional layer and a feature extraction network;
the network structure of the feature extraction network is as follows: B-CR (32) -CR (32) -M-CR (64) -CR (64) -M-CR (128) -CR (128) -CR (128) -M-CR (256) -CR (256) -CR (256) -M-CR (512) -CR (512) -CR (512); wherein B represents a batch normalization layer, C represents a convolution layer, R represents an activation layer ReLU, and M represents a downsampling layer; the number in brackets indicates the number of convolutional layer output channels; the active layer ReLU is defined as:
ReLU(x) = max(0, x)
where x is the input to the active layer ReLU;
the image feature output by the feature extraction network is fe; in order to keep the tensor dimensions unchanged, an all-zero tensor zero of the same size as fe is concatenated to the image feature fe, and the image feature finally output by the feature extraction network is fez, defined as:
fez=[fe,zero]k
where [ ]k denotes the concatenation of two tensors along the k-th dimension, here denoting that fe and zero are concatenated along the k-th dimension;
step S12, performing preliminary lane area detection on the input driving scene image with the extracted image feature fez, using a pixel classification network consisting of deconvolution layers and up-sampling layers; step S13, performing preliminary lane line detection on the input driving scene image with the extracted image feature fez, using a pixel classification network consisting of deconvolution layers and up-sampling layers;
wherein step S12 and step S13 use the same image feature fez, but two separate pixel classification networks are used to realize lane detection and lane line detection respectively;
the image feature fez extracted in step S11 is passed through each pixel classification network, consisting of up-sampling layers and deconvolution layers, to obtain a feature map with the same resolution as the input driving scene image, and the feature map is used to classify the category of each pixel;
the pixel classification network is mirror-symmetric to the feature extraction network; the network structure of the pixel classification network is: DR(512)-DR(512)-DR(512)-U-DR(256)-DR(256)-DR(256)-U-DR(128)-DR(128)-DR(128)-U-DR(64)-DR(64)-U-DR(32)-DS(z); where D denotes a deconvolution layer, U denotes an up-sampling layer, and S denotes the activation layer Sigmoid; the number in parentheses indicates the number of deconvolution output channels; an output value of 1 from the last deconvolution layer (which has z output channels) indicates that the pixel belongs to a lane area or lane line, and an output value of 0 indicates that the pixel does not belong to a lane area or lane line;
the active layer Sigmoid is defined as the following function:
Sigmoid(x) = 1/(1 + e^(-x))
wherein, x is the input of the active layer Sigmoid;
through the up-sampling layers with the same number as the down-sampling layers, the pixel classification network restores the feature map to the resolution which is the same as that of the input driving scene image, so that the feature map and the pixel points are in one-to-one correspondence; and classifying the pixel points by the Sigmoid function of the activation layer in a probability mode, and finally outputting a probability graph to show the probability that each pixel point belongs to a lane area or a lane line, so as to obtain a preliminary lane detection result and a lane line detection result.
2. The geometric regularization constraint-based lane detection method according to claim 1, wherein said step S2 includes the sub-steps of:
step S21, correcting the lane detection result based on the image feature fe and the preliminary lane line detection result, by extracting the internal geometric constraint of the lane lines;
step S22, correcting the lane line detection result based on the image feature fe and the preliminary lane detection result, by extracting the geometric constraint of the lane edges.
3. The geometric regularization constraint-based lane detection method according to claim 2, wherein said step S21 includes the sub-steps of:
step S211, extracting lane line correction characteristics by using the preliminary lane line detection result, and carrying out geometric constraint on lane detection; wherein:
in order to extract the lane line correction feature and fuse it with the image feature fe obtained in step S11, the lane line correction feature fmr output by the correction feature extraction network must have the same size as fe; based on this, the network structure of the correction feature extraction network is: B-CR(32)-CR(32)-M-CR(64)-CR(64)-M-CR(128)-CR(128)-CR(128)-M-CR(256)-CR(256)-CR(256)-M-CR(512)-CR(512)-CR(512); wherein B denotes a batch normalization layer, C a convolution layer, R an activation layer, and M a down-sampling layer; the number in parentheses indicates the number of convolution output channels;
the modified feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network in the step S13 to perform feature re-extraction;
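A minimal sketch of the B-CR-M correction feature extraction network under the same assumed framework (3×3 convolutions, 2×2 max-pooling for the M layers, and a 32-channel input taken from the penultimate deconvolution layer are assumptions, since the claim fixes only the layer sequence and channel counts):

    import torch.nn as nn

    def CR(in_ch, out_ch):
        # C: convolutional layer, R: activation layer (assumed ReLU)
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                             nn.ReLU(inplace=True))

    class CorrectionFeatureNet(nn.Module):
        """B - CR(32)x2 - M - CR(64)x2 - M - CR(128)x3 - M - CR(256)x3 - M - CR(512)x3."""
        def __init__(self, in_ch=32):
            super().__init__()
            M = lambda: nn.MaxPool2d(2)   # M: down-sampling layer
            self.body = nn.Sequential(
                nn.BatchNorm2d(in_ch),    # B: batch normalization layer
                CR(in_ch, 32), CR(32, 32), M(),
                CR(32, 64), CR(64, 64), M(),
                CR(64, 128), CR(128, 128), CR(128, 128), M(),
                CR(128, 256), CR(256, 256), CR(256, 256), M(),
                CR(256, 512), CR(512, 512), CR(512, 512))

        def forward(self, x):
            # four M layers reduce the spatial resolution by a factor of 16,
            # so the output f_mr can match the size of the image feature f_e
            return self.body(x)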
step S212, using the lane line correction feature f_mr to correct the lane detection result and generate an accurate lane detection result; wherein:
the lane line correction feature f_mr obtained in step S211 and the image feature f_e obtained in step S11 are concatenated to obtain the input feature f_el finally used for lane detection, defined as:
f_el = [f_e, f_mr]_k
where [·]_k denotes concatenation of tensors along the k-th dimension, i.e., the two tensors f_e and f_mr are concatenated along the k-th dimension;
the input feature f_el is fed into the pixel classification network defined in step S12, and lane detection is performed with the same network parameters, finally obtaining an accurate lane detection result constrained by the geometric relationship of the lane lines.
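In code terms, step S212 is a channel-wise concatenation followed by a second pass through the step-S12 classifier; a sketch under the same PyTorch assumption (NCHW tensors, so the k-th dimension is dim=1) is:

    import torch

    def corrected_lane_detection(f_e, f_mr, pixel_classifier, k=1):
        # f_el = [f_e, f_mr]_k : concatenate along the k-th dimension, then classify
        # with the network of step S12 (its first layer must accept the enlarged
        # channel count of the concatenated feature)
        f_el = torch.cat([f_e, f_mr], dim=k)
        return pixel_classifier(f_el)

The symmetric step S222 of claim 4 follows the same pattern, concatenating f_e with f_lr and feeding the result to the step-S13 classifier.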
4. The geometric regularization constraint-based lane detection method according to claim 2, wherein said step S22 includes the sub-steps of:
step S221, extracting the lane correction feature from the preliminary lane detection result, and applying a geometric constraint to lane line detection; wherein:
in order to extract the lane correction feature and fuse it with the image feature f_e obtained in step S11, the lane correction feature f_lr output by the correction feature extraction network is required to have the same size as the image feature f_e; based on this, the network structure of the correction feature extraction network is: B-CR(32)-CR(32)-M-CR(64)-CR(64)-M-CR(128)-CR(128)-CR(128)-M-CR(256)-CR(256)-CR(256)-M-CR(512)-CR(512)-CR(512); where B denotes a batch normalization layer, C denotes a convolutional layer, R denotes an activation layer, and M denotes a down-sampling layer; the number in parentheses is the number of convolutional output channels;
the correction feature extraction network receives the feature map output by the penultimate deconvolution layer of the pixel classification network of step S12 and re-extracts features from it;
step S222, using the lane correction feature f_lr to correct the lane line detection result and generate an accurate lane line detection result; wherein:
the lane correction feature f_lr obtained in step S221 and the image feature f_e obtained in step S11 are concatenated to obtain the input feature f_em finally used for lane line detection, defined as:
f_em = [f_e, f_lr]_k
where [·]_k denotes concatenation of tensors along the k-th dimension, i.e., the two tensors f_e and f_lr are concatenated along the k-th dimension;
the input feature f_em is fed into the pixel classification network defined in step S13, and lane line detection is performed with the same network parameters, finally obtaining an accurate lane line detection result constrained by the geometric relationship of the lane.
5. The geometric regularization constraint-based lane detection method according to claim 1, further comprising any one or more of the following features:
- the size of the driving scene image after size reduction is w × h × 3, where w is the image width, h is the image height, and 3 is the number of image channels;
- the image feature f_e is of a size of (formula image, not reproduced in the text);
- the image feature f_ez is of a size of (formula image, not reproduced in the text);
- in step S12, the categories into which each pixel is classified using the feature map are: lane area and non-lane area;
- in step S13, the categories into which each pixel is classified using the feature map are: lane line area and non-lane line area.
6. The geometric regularization constraint-based lane detection method according to claim 3, further comprising any one or more of the following features:
- the lane line correction feature f_mr is of a size of (formula image, not reproduced in the text);
- the input feature f_el is of a size of (formula image, not reproduced in the text);
Where w1 is the driving scene image width after the size reduction, and h1 is the driving scene image height after the size reduction.
7. The geometric regularization constraint-based lane detection method according to claim 4, further comprising any one or more of the following features:
- the lane correction feature f_lr is of a size of (formula image, not reproduced in the text);
- the input feature f_em is of a size of (formula image, not reproduced in the text);
Where w1 is the driving scene image width after the size reduction, and h1 is the driving scene image height after the size reduction.
8. The geometric regularization constraint-based lane detection method according to any one of claims 1 to 7, further comprising a step S3 of optimizing lane detection results and lane line detection results by combining a loss function based on structural information with a cross entropy loss function, and training all the networks simultaneously end to end.
9. The geometric regularization constraint-based lane detection method according to claim 8, wherein said step S3 is specifically:
for lane detection results:
a loss function based on boundary consistency is adopted, and the boundary consistency is measured by the intersection-over-union (IoU) ratio, giving an IoU-based loss function that optimizes the lane detection result; the boundary-consistency loss function assumes that the lane and the lane lines are inherently consistent at the boundary; the IoU-based loss function l_ba is defined as follows:
IoU = Σ_i (p(x_i) · y(x_i)) / (Σ_i p(x_i) + Σ_i y(x_i) - Σ_i p(x_i) · y(x_i))
l_ba = 1 - IoU
where x_i is a pixel of the input driving scene image, p(x_i) is the probability value output by the Sigmoid activation layer at pixel x_i, y(x_i) is the actual class of pixel x_i, and · denotes pixel-wise multiplication;
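Written as a differentiable "soft" IoU over the probability map, the loss l_ba can be sketched as follows (the eps term is an added numerical-stability assumption, not part of the claim):

    import torch

    def iou_boundary_loss(p, y, eps=1e-6):
        # l_ba = 1 - IoU, with p(x_i) the predicted probabilities and y(x_i) the labels
        inter = (p * y).sum()                  # pixel-wise multiplication, then sum
        union = p.sum() + y.sum() - inter
        return 1.0 - inter / (union + eps)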
for lane line detection results:
the lane line detection result is optimized using a region-based loss function; the region-based loss function is defined as follows:
(formula image, not reproduced in the text)
where the constraint term G(x_i) = 1 denotes all pixels in the lane area, and I_r(x_i) denotes the probability values of the lane area recovered from the lane line detection result;
the lane area is recovered from the lane line detection result by relying on the spatial correlation among pixels, i.e., the most correlated pixels contribute the same information, so the probability value of a recovered lane-area pixel equals the probability value of the nearest lane line pixel; I_r(x_i) is therefore defined as follows:
I_r(x_i) = I_b(x'_j)
x'_j = argmin_{m_j} d(x_i, m_j)
where d(x_i, m_j) denotes the Euclidean distance between pixel x_i and lane line pixel m_j, and I_b(x'_j) is the lane line probability at pixel x'_j;
and argmin denotes the pixel position at which the function following it attains its minimum; the resulting region-based loss function l_aa is therefore defined as follows:
(formula image, not reproduced in the text)
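The recovery of I_r can be implemented with a Euclidean distance transform that returns, for every pixel, the coordinates of its nearest lane line pixel. The sketch below uses SciPy for that step; because the exact functional form of l_aa is given by a formula image that is not reproduced in this text, the final penalty shown here (a mean absolute difference over the lane region G(x_i) = 1) is only an assumption:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def recover_lane_region(lane_line_prob, lane_line_mask):
        # I_r(x_i) = I_b(x'_j), with x'_j = argmin_{m_j} d(x_i, m_j):
        # each pixel takes the lane line probability of its nearest lane line pixel
        background = ~lane_line_mask.astype(bool)
        _, (iy, ix) = distance_transform_edt(background, return_indices=True)
        return lane_line_prob[iy, ix]

    def region_loss(lane_prob, lane_line_prob, lane_line_mask, lane_region_mask):
        # sketch of l_aa: compare lane probabilities with the recovered map I_r
        # over the lane region G(x_i) = 1 (absolute-difference form is assumed)
        I_r = recover_lane_region(lane_line_prob, lane_line_mask)
        G = lane_region_mask.astype(bool)
        return float(np.mean(np.abs(lane_prob[G] - I_r[G])))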
The four loss functions are added with weights to obtain the loss function l used for training the whole network, defined as follows:
l = l_lce + l_mce + λ_1 · l_ba + λ_2 · l_aa
where l_lce is the loss function for the lane detection target, l_mce is the loss function for the lane line detection target, λ_1 is the weight of the IoU-based loss function l_ba, and λ_2 is the weight of the region-based loss function l_aa.
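Putting the terms together (the binary cross-entropy form for l_lce and l_mce follows claim 8's cross-entropy wording; the default λ values below are placeholders, not values taken from the patent):

    import torch.nn.functional as F

    def total_loss(lane_prob, lane_gt, line_prob, line_gt, l_ba, l_aa, lam1=1.0, lam2=1.0):
        # l = l_lce + l_mce + lambda_1 * l_ba + lambda_2 * l_aa
        l_lce = F.binary_cross_entropy(lane_prob, lane_gt)   # lane detection target
        l_mce = F.binary_cross_entropy(line_prob, line_gt)   # lane line detection target
        return l_lce + l_mce + lam1 * l_ba + lam2 * l_aa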
CN201810527769.7A 2018-05-29 2018-05-29 Lane detection method based on geometric regularization constraint Active CN108846328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810527769.7A CN108846328B (en) 2018-05-29 2018-05-29 Lane detection method based on geometric regularization constraint

Publications (2)

Publication Number Publication Date
CN108846328A (en) 2018-11-20
CN108846328B (en) 2020-10-16

Family

ID=64207991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810527769.7A Active CN108846328B (en) 2018-05-29 2018-05-29 Lane detection method based on geometric regularization constraint

Country Status (1)

Country Link
CN (1) CN108846328B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209777A (en) * 2018-11-21 2020-05-29 北京市商汤科技开发有限公司 Lane line detection method and device, electronic device and readable storage medium
CN110148148A (en) * 2019-03-01 2019-08-20 北京纵目安驰智能科技有限公司 A kind of training method, model and the storage medium of the lower edge detection model based on target detection
CN110163077A (en) * 2019-03-11 2019-08-23 重庆邮电大学 A kind of lane recognition method based on full convolutional neural networks
CN109839937B (en) * 2019-03-12 2023-04-07 百度在线网络技术(北京)有限公司 Method, device and computer equipment for determining automatic driving planning strategy of vehicle
CN110009090B (en) * 2019-04-02 2022-12-02 北京市商汤科技开发有限公司 Neural network training and image processing method and device
CN111832368A (en) * 2019-04-23 2020-10-27 长沙智能驾驶研究院有限公司 Training method and device for travelable region detection model and application
CN110427860B (en) * 2019-07-26 2022-03-25 武汉中海庭数据技术有限公司 Lane line identification method and device and storage medium
CN112651328B (en) * 2020-12-23 2022-09-13 浙江中正智能科技有限公司 Iris segmentation method based on geometric position relation loss function
CN114463720B (en) * 2022-01-25 2022-10-21 杭州飞步科技有限公司 Lane line detection method based on line segment intersection ratio loss function
CN115496941B (en) * 2022-09-19 2024-01-09 哈尔滨工业大学 Structural health diagnosis method based on knowledge enhanced computer vision
CN116682087B (en) * 2023-07-28 2023-10-31 安徽中科星驰自动驾驶技术有限公司 Self-adaptive auxiliary driving method based on space pooling network lane detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007218705A (en) * 2006-02-15 2007-08-30 Mitsubishi Electric Corp White line model measurement system, measuring truck, and white line model measuring device
CN105488492A (en) * 2015-12-25 2016-04-13 北京大学深圳研究生院 Color image preprocessing method, road identification method and related device
CN108009524A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of method for detecting lane lines based on full convolutional network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene; Jun Li et al.; IEEE Transactions on Neural Networks and Learning Systems; 2017-03-31; Vol. 28, No. 3; pp. 690-703 *
A multi-lane detection method in traffic surveillance scenes; Wang Zhenbo et al.; Computer Engineering and Applications; 2012-12-31; Vol. 48, No. 12; pp. 14-18 *

Also Published As

Publication number Publication date
CN108846328A (en) 2018-11-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant