CN114463721A - Lane line detection method based on spatial feature interaction - Google Patents

Lane line detection method based on spatial feature interaction

Info

Publication number
CN114463721A
CN114463721A CN202210113686.XA CN202210113686A CN114463721A CN 114463721 A CN114463721 A CN 114463721A CN 202210113686 A CN202210113686 A CN 202210113686A CN 114463721 A CN114463721 A CN 114463721A
Authority
CN
China
Prior art keywords
feature
lane line
interaction
spatial
line detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210113686.XA
Other languages
Chinese (zh)
Inventor
宋立新
焦守文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202210113686.XA priority Critical patent/CN114463721A/en
Publication of CN114463721A publication Critical patent/CN114463721A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4084 Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lane line detection method based on spatial feature interaction. The lane line detection problem is defined as position selection and classification in the row direction on top of spatial feature interaction. The aim is to keep a fast detection speed while, through the interaction of spatial features, letting every position perceive all of the spatial information in the same feature map, which alleviates the poor detection results caused by factors such as vehicle occlusion, worn road markings and lighting conditions. In addition, the invention provides, in the upsampling stage, bilateral upsampling that combines coarse-grained and fine-grained features, so that a low-resolution feature map can be accurately restored to pixel-level predictions. The method comprises the following steps. Step one: processing the training data; step two: constructing a lane line detection network based on spatial feature interaction; step three: training the lane line detection model; step four: testing the lane line detection model. The invention belongs to the technical field of automatic driving.

Description

Lane line detection method based on spatial feature interaction
Technical Field
The invention relates to the technical field of automobile driver assistance and automatic driving, and in particular to a lane line detection method.
Background
Lane line detection is the process of automatically sensing the shape and position of marked lane lines and is a key component of an automatic driving system. As a basic module of automatic driving, it plays an important role in applications such as real-time vehicle localization, driving route planning, lane keeping assistance and adaptive cruise control. However, lane line detection still faces many challenges owing to severe occlusion, bad weather conditions, blurred road surfaces, and the inherently thin and elongated shape of the lane lines themselves.
Conventional lane line detection methods typically rely on hand-crafted features followed by post-processing to fit the shape of the lane line. However, such methods cannot remain robust in real scenes, because manually designed models cannot cope with the diversity of lane lines across different scenes.
In recent years, most research on lane line detection has focused on deep learning. Early deep-learning methods detected lane lines by segmentation, but the strict real-time requirements of autonomous driving make a fast detection speed essential for lane line detection algorithms. For this purpose, row-wise detection methods define lane line detection as finding the set of positions of the lane lines in certain rows of the image, i.e., position selection and classification along the row direction. Although such methods are fast, the slender shape of lane lines means that the number of annotated lane pixels is far smaller than the number of background pixels, and fine lane line features are often difficult to extract, so detection performance is low. It is even more challenging when a lane line is almost completely occluded by crowded cars and can only be inferred from common-sense cues. As a result, the low-quality features extracted by a plain CNN tend to weaken the fine lane line features, and performance degrades in complex scenes. In fact, lane lines are highly correlated with one another, and studying how to capture this correlation promises more accurate lane line detection in complex scenes with weak visual cues.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a lane line detection method based on spatial feature interaction. The method defines the lane line detection problem as position selection and classification in the row direction on top of spatial feature interaction. The aim is to keep a fast detection speed while, through the interaction of spatial features, alleviating the poor detection results caused by factors such as vehicle occlusion, worn road markings and lighting conditions, thereby effectively improving the accuracy and robustness of lane line detection.
The above object is achieved by the following technical solution:
a lane line detection method based on spatial feature interaction comprises the following steps:
step one: processing the training data;
step two: constructing a lane line detection network based on spatial feature interaction;
step three: training the network model established in the second step by using the data processed in the first step, performing parameter learning on the model by using an Adam optimization strategy, and storing a final training model;
step four: and testing the final network model in the third step.
The lane line detection method based on the spatial feature interaction is characterized in that the first step comprises the following processes:
first, the original image is resized to 288 × 800. Then, in order to improve generalization capability, a data enhancement method combining rotation, vertical movement and horizontal movement is applied to the scaled image. In addition, since the edge of the enhanced image may be vacant, in order to maintain the lane structure, the lane line is extended to the boundary of the image.
The lane line detection method based on the spatial feature interaction is characterized in that the second step comprises the following processes:
(1) integral structure of lane line detection network
The invention defines the lane line detection problem as position selection and classification in the row direction on top of spatial feature interaction. The whole network structure consists mainly of three parts: a feature extractor, a spatial feature interaction module and a classification-based predictor. In addition, an auxiliary segmentation module is provided; note that the auxiliary segmentation task is used only in the training stage and is removed in the testing stage.
(2) Feature extractor
At this stage, preliminary features are extracted, and ResNet with the fully connected layer removed is used as the feature extractor. The feature extractor consists of 17 convolution layers, each followed by a batch normalization layer and a ReLU activation layer.
(3) Spatial feature interaction module
The feature map extracted by the feature extractor is fed into the spatial feature interaction module, which achieves the interaction of spatial information by shifting sliced feature maps in the vertical and horizontal directions. In each iteration, the sliced feature map is shifted in 4 directions, passing information vertically and horizontally. In total, K iterations are needed so that every position can receive information from the whole feature map. Specifically, consider a three-dimensional feature map tensor X of size C × H × W, where C, H and W denote the number of channels, rows and columns, respectively.
Let $X^{k}_{c,i,j}$ denote the value of the feature map X at the k-th iteration, where c, i and j are the channel, row and column indices, respectively. The forward calculation formulas (1), (2), (3) and (4) of the spatial feature interaction module are then as follows:
$$Z_{c,i,j}^{k}=\sum_{m}\sum_{n}F_{m,c,n}\cdot X_{m,\,(i+s_{k})\bmod H,\,j+n-1}^{k}\tag{1}$$

$$Z_{c,i,j}^{k}=\sum_{m}\sum_{n}F_{m,c,n}\cdot X_{m,\,i+n-1,\,(j+s_{k})\bmod W}^{k}\tag{2}$$

$$X_{c,i,j}^{\prime\,k}=X_{c,i,j}^{k}+f\!\left(Z_{c,i,j}^{k}\right)\tag{3}$$

$$s_{k}=2^{\,k-1},\qquad k=1,2,\ldots,K\tag{4}$$
where K is the total number of iterations and K = log2 L, L being the number of slices traversed in the corresponding direction (H for the vertical transfer in formula (1) and W for the horizontal transfer in formula (2)). f is a nonlinear activation function; the invention uses ReLU. X marked with a prime (X') denotes the updated element. s_k is the shift step in the k-th iteration. Formula (1) and formula (2) are the vertical and horizontal information transfer formulas, respectively. F is a set of one-dimensional convolution kernels, where m, c and n are the indices of the input channel, output channel and convolution kernel width, respectively; here both the number of input channels and the number of output channels equal C. Z in formulas (1) and (2) is the intermediate result of information transfer. Note that the feature map X is divided into H slices in the horizontal direction and W slices in the vertical direction. The shift step s_k is determined dynamically by the iteration index k and controls the information transfer distance.
Information is transferred in four directions; the invention uses "bottom-to-top" and "top-to-bottom" as vertical information interaction, and "left-to-right" and "right-to-left" as horizontal information interaction. By repeatedly shifting the sliced feature map in the vertical and horizontal directions, every position can interact with and perceive all the spatial information in the same feature map.
(4) Classification-based predictor
In pursuit of a faster detection speed, the prediction part of the network selects and classifies the lane line position on each predefined row. h predefined rows are selected according to the training data, and each predefined row is divided into (w + 1) small cells. For classification-based prediction, the rich feature map learned by the spatial feature interaction module is mapped, through two fully connected layers, to a feature map of size m × h × (w + 1), the dimensions required for row-by-row classification, where m is the number of lane lines. A (w + 1)-way classification is then performed on each of the h predefined rows. Once the lane line positions on all predefined rows are found, the whole lane line is predicted.
(5) Auxiliary segmentation module
Because a segmentation network gives finer predictions of lane line edges but has a large computation cost, the invention uses the segmentation task only in the training stage to help the main network train a better model. Thus, even though an extra segmentation task is added, the detection speed is not affected. In the auxiliary segmentation task, the feature map processed by the spatial feature interaction module and two feature maps of different scales extracted by the feature extractor are first unified to the same size and concatenated; the concatenated feature map is passed through a convolution layer and then upsampled to the size of the original image by bilateral upsampling before segmentation prediction. Bilateral upsampling has two parts: one part relies on bilinear interpolation to obtain coarse-grained upsampled features; the other part relies on transposed convolution to compensate for the loss of fine detail in the coarse-grained part. The results of the two parts are fused by an addition operation.
The lane line detection method based on the spatial feature interaction is characterized in that the third step comprises the following processes:
and taking the processed lane line image as the input of the network, and training the model by using an Adam optimization algorithm to minimize a composite loss function. The recombination loss is: l istotal=Lcls+βLseg. Wherein L isclsTo classify the loss, LsegFor the segmentation loss, β is the loss coefficient. The present invention uses focus loss as classification loss and cross entropy as auxiliary segmentation loss. L isclsAnd LsegAs shown in formulas (5) and (6):
$$L_{cls}=-\alpha\,(1-p)^{\gamma}\log(p)\tag{5}$$

where p ∈ [0, 1] is the model's predicted probability for the label y = 1, α ∈ [0, 1] is a balance factor, and (1 − p)^γ is the sample-difficulty weight modulation factor.
$$L_{seg}=-\left[y\log(p)+(1-y)\log(1-p)\right]\tag{6}$$

where p ∈ [0, 1] is the model's predicted probability for the label y = 1.
An early-stopping strategy is adopted to prevent overfitting during model training, and the final trained model is saved after training is finished.
The lane line detection method based on the spatial feature interaction is characterized in that the fourth step comprises the following processes:
the original image is first resized to 288 x 800. And taking the processed lane line image as the input of the network, loading the trained model, and obtaining the detection result of the lane line through forward propagation.
The invention has the following beneficial effects:
compared with the existing method, the lane line detection method based on the spatial feature interaction has the advantages that the robustness is enhanced, and the lane line detection method can be better suitable for the road conditions of complex roads, different light conditions and the like. By continuously moving the slice feature map in the vertical and horizontal directions, all spatial information in the same feature map can be interacted and perceived at each position. Lane line detection is a task that is highly dependent on surrounding cues. If one lane line is occluded or worn but has strong shape priors, it can be inferred from other lanes, car direction, road shape, or other visual cues by capturing the spatial relationship of the pixels between rows and columns. In addition, the present invention provides a bilateral upsampling combining coarse-grained and fine-grained features at the upsampling stage, which can accurately restore the low-resolution feature map to a pixel-level prediction. Finally, the detection method of the invention selects and classifies the lane line position on each predefined line. Because the predefined line number is far smaller than the height of the image, the lane line detection method can achieve higher detection speed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a lane line detection network according to the present invention;
FIG. 3 is a schematic diagram of a spatial feature interaction module according to the present invention;
FIG. 4 is a schematic diagram of the spatial feature interaction "from right to left" message delivery in accordance with the present invention;
FIG. 5 is a schematic diagram of a bilateral upsampling structure of the present invention;
fig. 6 is a diagram illustrating the lane line detection effect of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
As shown in fig. 1, the lane line detection method based on spatial feature interaction according to the present invention includes the following steps: firstly, processing training data; secondly, constructing a lane line detection network based on spatial feature interaction; thirdly, training the network model established in the second step by using the data processed in the first step, performing parameter learning on the model by using an Adam optimization strategy, and storing a final training model; and fourthly, testing the final network model in the third step.
Step one: processing the training data;
the present embodiment uses the cuiane data set, collected by cameras mounted on six different vehicles driven by different drivers in beijing. The CULane dataset collected over 55 hours of video and extracted 133,235 frames. Wherein the training set size is 88880 frames, the validation set size is 9675 frames, and the test set size is 34680 frames. The data set contains 9 different scenes including normal, crowded, curved, glare, night, no lane, shadow, intersection and downtown arrow scenes.
First, to balance detection speed, the images of the original dataset are resized to 288 × 800. Then, to improve generalization, a data enhancement method combining rotation, vertical shift and horizontal shift is applied to the resized images. In addition, since the edges of the augmented image may be left empty, the lane lines are extended to the boundary of the image in order to preserve the lane structure.
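For illustration, one possible implementation of this preprocessing step is sketched below in Python with OpenCV; the rotation and shift ranges, and the omission of the matching transform of the lane labels, are assumptions made for brevity rather than values fixed by the invention.

```python
import cv2
import numpy as np

def preprocess(image, angle_range=6.0, max_shift=(100, 50)):
    """Resize to 288 x 800 and apply random rotation plus horizontal/vertical shift.
    The same affine transform would also be applied to the lane annotations, which
    are then extended to the image boundary to preserve the lane structure."""
    img = cv2.resize(image, (800, 288))  # cv2 expects (width, height)

    # Random rotation about the image centre.
    angle = np.random.uniform(-angle_range, angle_range)
    rot = cv2.getRotationMatrix2D((400, 144), angle, 1.0)
    img = cv2.warpAffine(img, rot, (800, 288))

    # Random horizontal and vertical shift.
    tx = np.random.uniform(-max_shift[0], max_shift[0])
    ty = np.random.uniform(-max_shift[1], max_shift[1])
    shift = np.float32([[1, 0, tx], [0, 1, ty]])
    img = cv2.warpAffine(img, shift, (800, 288))
    return img
```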
Step two: constructing a lane line detection network based on spatial feature interaction;
(1) integral structure of lane line detection network
The invention defines the lane line detection problem as position selection and classification in the row direction on top of spatial feature interaction. The whole network structure is shown in FIG. 2 and consists mainly of three parts: a feature extractor, a spatial feature interaction module and a classification-based predictor. In addition, an auxiliary segmentation module is provided; note that the auxiliary segmentation task is used only in the training stage and is removed in the testing stage.
(2) Feature extractor
At this stage, preliminary features are extracted, and ResNet with the fully connected layer removed is used as the feature extractor. The feature extractor consists of 17 convolution layers, each followed by a batch normalization layer and a ReLU activation layer.
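A minimal sketch of such a truncated backbone is given below, assuming the torchvision ResNet-18 implementation; the invention only specifies a ResNet with the fully connected layer removed, so the specific depth and library are assumptions.

```python
import torch.nn as nn
import torchvision

class FeatureExtractor(nn.Module):
    """ResNet-18 backbone with the average-pooling and fully connected layers removed."""
    def __init__(self, pretrained=True):
        super().__init__()
        resnet = torchvision.models.resnet18(pretrained=pretrained)
        # Keep conv1 ... layer4 (17 convolution layers); drop avgpool and fc.
        self.body = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, x):
        # Input:  N x 3 x 288 x 800
        # Output: N x 512 x 9 x 25 feature map (overall stride 32).
        return self.body(x)
```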
(3) Spatial feature interaction module
The feature map extracted by the feature extractor is fed into the spatial feature interaction module, which achieves the interaction of spatial information by shifting sliced feature maps in the vertical and horizontal directions. In each iteration, the sliced feature map is shifted in 4 directions, passing information vertically and horizontally. In total, K iterations are needed so that every position can receive information from the whole feature map. Specifically, consider a three-dimensional feature map tensor X of size C × H × W, where C, H and W denote the number of channels, rows and columns, respectively.
Let $X^{k}_{c,i,j}$ denote the value of the feature map X at the k-th iteration, where c, i and j are the channel, row and column indices, respectively. The forward calculation formulas (1), (2), (3) and (4) of the spatial feature interaction module are then as follows:
$$Z_{c,i,j}^{k}=\sum_{m}\sum_{n}F_{m,c,n}\cdot X_{m,\,(i+s_{k})\bmod H,\,j+n-1}^{k}\tag{1}$$

$$Z_{c,i,j}^{k}=\sum_{m}\sum_{n}F_{m,c,n}\cdot X_{m,\,i+n-1,\,(j+s_{k})\bmod W}^{k}\tag{2}$$

$$X_{c,i,j}^{\prime\,k}=X_{c,i,j}^{k}+f\!\left(Z_{c,i,j}^{k}\right)\tag{3}$$

$$s_{k}=2^{\,k-1},\qquad k=1,2,\ldots,K\tag{4}$$
where K is the total number of iterations and K = log2 L, L being the number of slices traversed in the corresponding direction (H for the vertical transfer in formula (1) and W for the horizontal transfer in formula (2)). f is a nonlinear activation function; the invention uses ReLU. X marked with a prime (X') denotes the updated element. s_k is the shift step in the k-th iteration. Formula (1) and formula (2) are the vertical and horizontal information transfer formulas, respectively. F is a set of one-dimensional convolution kernels, where m, c and n are the indices of the input channel, output channel and convolution kernel width, respectively; here both the number of input channels and the number of output channels equal C. Z in formulas (1) and (2) is the intermediate result of information transfer. Note that the feature map X is divided into H slices in the horizontal direction and W slices in the vertical direction, as shown in FIG. 3(a) and FIG. 3(b). The shift step s_k is determined dynamically by the iteration index k and controls the information transfer distance.
Information is transferred in four directions; the invention uses "bottom-to-top" (as shown in FIG. 3(a)) and "top-to-bottom" as vertical information interaction, and "left-to-right" and "right-to-left" (as shown in FIG. 3(b)) as horizontal information interaction. By repeatedly shifting the sliced feature map in the vertical and horizontal directions, every position can interact with and perceive all the spatial information in the same feature map. The "right-to-left" information transfer is taken as an illustration here, as shown in FIG. 4. When the iteration index k = 1, s_1 = 1 and each X_i receives the features of X_{i+1}. Because the shift is circular, the column at the end also receives features from the other side, i.e., X_{w-1} receives the features of X_0. When k = 2, s_2 = 2 and each X_i receives the features of X_{i+2}. Taking X_0 as an example, X_0 receives X_2 in the second iteration; considering that X_0 already received information from X_1 in the previous iteration and that X_2 had received information from X_3, X_0 has now received information from X_0, X_1, X_2 and X_3 in only two iterations. Subsequent iterations proceed in the same way. After all K iterations, i.e., when the iteration index k = K, every X_i can perceive the information of the entire feature map. A sketch of this shift-and-aggregate process is given below.
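One possible PyTorch realization of this process, consistent with formulas (1) to (4), follows; the 1-D kernel width, the use of a circular shift via torch.roll, and the shift schedule s_k = 2^(k-1) are illustrative assumptions rather than details fixed by the invention.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialFeatureInteraction(nn.Module):
    """Shifts sliced feature maps in four directions over K = log2(L) iterations so
    that every position aggregates information from the whole feature map."""
    def __init__(self, channels, height, width, kernel=9):
        super().__init__()
        self.iters_v = int(math.ceil(math.log2(height)))  # vertical iterations, L = H
        self.iters_h = int(math.ceil(math.log2(width)))   # horizontal iterations, L = W
        # One-dimensional convolution kernels F, applied along the axis orthogonal to the shift.
        self.conv_v = nn.Conv2d(channels, channels, (1, kernel), padding=(0, kernel // 2), bias=False)
        self.conv_h = nn.Conv2d(channels, channels, (kernel, 1), padding=(kernel // 2, 0), bias=False)

    def forward(self, x):
        # Vertical interaction: "bottom-to-top" (+s_k) and "top-to-bottom" (-s_k).
        for k in range(self.iters_v):
            s_k = 2 ** k                                              # s_1 = 1, s_2 = 2, ...
            for sign in (1, -1):
                z = self.conv_v(torch.roll(x, sign * s_k, dims=2))    # Z of formula (1)
                x = x + F.relu(z)                                     # X' = X + f(Z), formula (3)
        # Horizontal interaction: "left-to-right" and "right-to-left".
        for k in range(self.iters_h):
            s_k = 2 ** k
            for sign in (1, -1):
                z = self.conv_h(torch.roll(x, sign * s_k, dims=3))    # Z of formula (2)
                x = x + F.relu(z)
        return x
```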
(4) Classification-based predictor
In pursuit of a faster detection speed, the prediction part of the network selects and classifies the lane line position on each predefined row. h predefined rows are selected according to the training data, and each predefined row is divided into (w + 1) small cells. For classification-based prediction, the rich feature map learned by the spatial feature interaction module is mapped, through two fully connected layers, to a feature map of size m × h × (w + 1), the dimensions required for row-by-row classification, where m is the number of lane lines. A (w + 1)-way classification is then performed on each of the h predefined rows. Once the lane line positions on all predefined rows are found, the whole lane line is predicted.
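A hedged sketch of such a classification head follows; the hidden width of the fully connected layers and the default values of m, h and (w + 1) (taken from the CULane embodiment, with one extra cell assumed for the "no lane in this row" case) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RowWisePredictor(nn.Module):
    """Two fully connected layers map the interacted features to an m x h x (w+1)
    tensor, on which a (w+1)-way classification is performed for each predefined row."""
    def __init__(self, in_dim, num_lanes=4, num_rows=33, num_cells=201, hidden=2048):
        # num_rows = 33 and 200 location cells follow the CULane embodiment;
        # the extra cell (201st) for "no lane in this row" is an assumption.
        super().__init__()
        self.out_shape = (num_lanes, num_rows, num_cells)   # (m, h, w + 1)
        self.fc = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_lanes * num_rows * num_cells),
        )

    def forward(self, feat):
        x = torch.flatten(feat, start_dim=1)                 # N x in_dim
        x = self.fc(x)
        return x.view(-1, *self.out_shape)                   # N x m x h x (w + 1)
```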
(5) Auxiliary segmentation module
Because a segmentation network gives finer predictions of lane line edges but has a large computation cost, the invention uses the segmentation task only in the training stage to help the main network train a better model. Thus, even though an extra segmentation task is added, the detection speed is not affected. In the auxiliary segmentation task, the feature map processed by the spatial feature interaction module and two feature maps of different scales extracted by the feature extractor are first unified to the same size and concatenated; the concatenated feature map is passed through a convolution layer and then upsampled to the size of the original image by bilateral upsampling before segmentation prediction.
Bilateral upsampling consists of a coarse-grained branch and a fine-grained branch, whose structure is shown in FIG. 5. The coarse-grained branch quickly obtains coarse upsampled features from the previous layer: the number of channels is first reduced to 1/2 of the input feature map by a 1 × 1 convolution, and the feature map is then upsampled directly with bilinear interpolation. The fine-grained branch is used to compensate for the loss of fine detail in the coarse-grained branch, and its path is deeper. The feature map is upsampled by a transposed convolution with stride 2, which also reduces the number of channels to 1/2. Two non-bottleneck blocks are then stacked; each non-bottleneck block consists of 4 convolutions of size 3 × 1 and 1 × 3 with BN and ReLU, which preserves the shape of the feature map and extracts information efficiently through factorization. Finally, the two branches are fused by an addition operation.
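The following sketch shows one way the bilateral upsampling and the non-bottleneck blocks could be realized in PyTorch; the transposed-convolution kernel size and the exact ordering of BN and ReLU inside the non-bottleneck block are assumptions, since the text above only fixes the overall structure.

```python
import torch.nn as nn
import torch.nn.functional as F

class NonBottleneck1D(nn.Module):
    """Four factorized 3x1 / 1x3 convolutions with BN and ReLU; shape preserving."""
    def __init__(self, ch):
        super().__init__()
        layers = []
        for k in ((3, 1), (1, 3), (3, 1), (1, 3)):
            pad = (k[0] // 2, k[1] // 2)
            layers += [nn.Conv2d(ch, ch, k, padding=pad), nn.BatchNorm2d(ch), nn.ReLU(inplace=True)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class BilateralUpsample(nn.Module):
    """Coarse branch: 1x1 conv halving the channels + bilinear 2x upsampling.
    Fine branch: stride-2 transposed conv halving the channels + two non-bottleneck
    blocks. The two branches are fused by element-wise addition."""
    def __init__(self, in_ch):
        super().__init__()
        out_ch = in_ch // 2
        self.coarse = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.fine_up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fine_refine = nn.Sequential(NonBottleneck1D(out_ch), NonBottleneck1D(out_ch))

    def forward(self, x):
        coarse = F.interpolate(self.coarse(x), scale_factor=2, mode='bilinear', align_corners=False)
        fine = self.fine_refine(self.fine_up(x))
        return coarse + fine
```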
Step three: training the network model established in the second step by using the data processed in the first step, performing parameter learning on the model by using an Adam optimization strategy, and storing a final training model;
for the CULane dataset, the present invention uses the rows defined by the dataset. Specifically, the range of rows of the CULane dataset with an image height of 590 is 260 to 580, with a step size of 10. The number of cells on each predefined row is set to 200. In the optimization process, the processed lane line image is used as the input of a network, an Adam optimization algorithm is used for enabling a composite loss function to be minimum to train a model, the momentum is 0.9, the cosine attenuation learning rate is initialized by 4e-4, the batch size is 16, and the training iteration number is 50. The recombination loss is: l istotal=Lcls+βLseg. Wherein L isclsTo classify the loss, LsegFor the division loss, β is a loss coefficient, where β is set to 1. The present invention uses focus loss as classification loss and cross entropy as auxiliary segmentation loss. L isclsAnd LsegAs shown in formulas (5) and (6):
$$L_{cls}=-\alpha\,(1-p)^{\gamma}\log(p)\tag{5}$$

where p ∈ [0, 1] is the model's predicted probability for the label y = 1, α ∈ [0, 1] is a balance factor, and (1 − p)^γ is the sample-difficulty weight modulation factor.
$$L_{seg}=-\left[y\log(p)+(1-y)\log(1-p)\right]\tag{6}$$

where p ∈ [0, 1] is the model's predicted probability for the label y = 1.
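As an illustration, formulas (5) and (6) can be combined into the composite loss as follows; the focusing parameter γ and the balance factor α are not given numerically by the invention, so the defaults below are assumptions.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, alpha=1.0, gamma=2.0):
    """Focal loss of formula (5): cross entropy re-weighted by alpha * (1 - p)^gamma,
    where p is the predicted probability of the true class (class dim must be dim 1)."""
    ce = F.cross_entropy(logits, target, reduction='none')  # equals -log(p)
    p = torch.exp(-ce)                                      # probability of the true class
    return (alpha * (1.0 - p) ** gamma * ce).mean()

def total_loss(cls_logits, cls_target, seg_logits, seg_target, beta=1.0):
    """Composite loss L_total = L_cls + beta * L_seg (formulas (5) and (6)).
    For the row-wise head, cls_logits must be permuted so the (w + 1) class
    dimension is dimension 1 before calling this function."""
    l_cls = focal_loss(cls_logits, cls_target)
    l_seg = F.cross_entropy(seg_logits, seg_target)         # cross entropy of formula (6)
    return l_cls + beta * l_seg
```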
An early-stopping strategy is adopted to prevent overfitting during model training, and the final trained model is saved after training is finished.
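A minimal training-loop sketch combining the optimizer settings above with this early-stopping strategy is given below; the patience value, the validation helper and the data-loader interface are assumptions, and the 50 training iterations are interpreted here as epochs.

```python
import torch

def train(model, train_loader, val_loader, evaluate, epochs=50, patience=5):
    """Adam with beta1 = 0.9, initial learning rate 4e-4 with cosine decay, and
    early stopping on the validation loss; `evaluate` is a hypothetical helper
    that returns the validation loss, and `total_loss` is the sketch above."""
    optimizer = torch.optim.Adam(model.parameters(), lr=4e-4, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    best_val, bad_epochs = float('inf'), 0
    for epoch in range(epochs):
        model.train()
        for images, cls_target, seg_target in train_loader:
            optimizer.zero_grad()
            cls_logits, seg_logits = model(images)
            loss = total_loss(cls_logits, cls_target, seg_logits, seg_target, beta=1.0)
            loss.backward()
            optimizer.step()
        scheduler.step()

        val_loss = evaluate(model, val_loader)
        if val_loss < best_val:                  # keep the best model so far
            best_val, bad_epochs = val_loss, 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            bad_epochs += 1
            if bad_epochs >= patience:           # early stopping
                break
```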
Step four: and testing the final network model in the third step.
The original image is first resized to 288 × 800. The processed lane line image is taken as the input of the network, the trained model is loaded, and the lane line detection result is obtained by forward propagation. As can be seen from FIG. 6, the method of the invention can accurately detect lane lines even in crowded night scenes. For the CULane dataset, each lane line is regarded as a line 30 pixels wide. If the intersection over union (IoU) between a predicted lane line and the ground truth is greater than the threshold (0.5), the prediction is counted as a true positive (TP). The F1 score is used as the evaluation metric, as shown in formula (7):
$$F1=\frac{2\times Precision\times Recall}{Precision+Recall}\tag{7}$$

where

$$Precision=\frac{TP}{TP+FP},\qquad Recall=\frac{TP}{TP+FN}$$
FP and FN denote false positives and false negatives, respectively. The test results on the CULane dataset are shown in Table 1.
Table 1 results of the experiment:
the above-described embodiments are merely illustrative of the present invention and are not limited to the scope thereof, and those skilled in the art can make modifications to the parts thereof without departing from the spirit and scope of the present invention.

Claims (9)

1. A lane line detection method based on spatial feature interaction is characterized by comprising the following steps:
step one: processing the training data;
step two: constructing a lane line detection network based on spatial feature interaction;
step three: training the network model established in the second step by using the data processed in the first step, performing parameter learning on the model by using an Adam optimization strategy, and storing a final training model;
step four: and testing the final network model in the third step.
2. The method as claimed in claim 1, wherein the processing of the training data in the first step comprises the following steps:
first, the original image is resized to 288 × 800; then, to improve generalization, a data enhancement method combining rotation, vertical shift and horizontal shift is applied to the resized image; in addition, since the edges of the augmented image may be left empty, the lane lines are extended to the boundary of the image in order to preserve the lane structure.
3. The lane line detection method based on spatial feature interaction according to claim 1, wherein the lane line detection network in the second step consists mainly of three parts: a feature extractor, a spatial feature interaction module and a classification-based predictor; in addition, an auxiliary segmentation module is provided, the auxiliary segmentation task being used only in the training stage and removed in the testing stage.
4. The method according to claim 3, wherein the feature extractor is obtained by removing the fully connected layer from ResNet; the feature extractor consists of 17 convolution layers, each followed by a batch normalization layer and a ReLU activation layer.
5. The method of claim 3, wherein the spatial feature interaction module achieves the interaction of the spatial information by moving the sliced feature map in vertical and horizontal directions;
the information transmission has four directions, the invention uses 'from bottom to top' and 'from top to bottom' as vertical information interaction, and 'from left to right' and 'from right to left' as horizontal information interaction; continuously moving the slice characteristic diagram in the vertical and horizontal directions to carry out K times of iteration so that all spatial information in the same characteristic diagram can be interacted and sensed at each position; specifically, a three-dimensional feature map tensor X is provided, of size C H W, where C, H and W represent the number of channels, rows and columns respectively,
let $X^{k}_{c,i,j}$ denote the value of the feature map X at the k-th iteration, where c, i and j are the channel, row and column indices, respectively; the forward calculation formulas (1), (2), (3) and (4) of the spatial feature interaction module are then as follows:
$$Z_{c,i,j}^{k}=\sum_{m}\sum_{n}F_{m,c,n}\cdot X_{m,\,(i+s_{k})\bmod H,\,j+n-1}^{k}\tag{1}$$

$$Z_{c,i,j}^{k}=\sum_{m}\sum_{n}F_{m,c,n}\cdot X_{m,\,i+n-1,\,(j+s_{k})\bmod W}^{k}\tag{2}$$

$$X_{c,i,j}^{\prime\,k}=X_{c,i,j}^{k}+f\!\left(Z_{c,i,j}^{k}\right)\tag{3}$$

$$s_{k}=2^{\,k-1},\qquad k=1,2,\ldots,K\tag{4}$$
where K is the number of iterations and K = log2 L, L being the number of slices traversed in the corresponding direction (H for the vertical transfer in formula (1) and W for the horizontal transfer in formula (2)); f is a nonlinear activation function; X marked with a prime (X') denotes the updated element; s_k is the shift step in the k-th iteration; formula (1) and formula (2) are the vertical and horizontal information transfer formulas, respectively; F is a set of one-dimensional convolution kernels, where m, c and n denote the indices of the input channel, output channel and convolution kernel width, respectively, and both the number of input channels and the number of output channels equal C; Z in formulas (1) and (2) is the intermediate result of information transfer; the feature map X is divided into H slices in the horizontal direction and W slices in the vertical direction; the shift step s_k is determined dynamically by the iteration index k and controls the information transfer distance.
6. The method of claim 3, wherein the classification-based predictor selects and classifies the lane line position on each predefined row;
firstly, h predefined rows are selected according to the training data, each predefined row being divided into (w + 1) small cells; for classification-based prediction, the rich feature map learned by the spatial feature interaction module is mapped, through two fully connected layers, to a feature map of size m × h × (w + 1), the dimensions required for row-by-row classification, where m denotes the number of lane lines; a (w + 1)-way classification is then performed on each of the h predefined rows, the lane line positions on all predefined rows are found, and the whole lane line is thereby predicted.
7. The method according to claim 3, characterized in that the auxiliary segmentation module is used only during the training stage, comprising the following process:
in the auxiliary segmentation task, the feature map processed by the spatial feature interaction module and two feature maps of different scales extracted by the feature extractor are first unified to the same size and concatenated; the concatenated feature map is passed through a convolution layer and then upsampled to the size of the original image by bilateral upsampling before segmentation prediction; bilateral upsampling has two parts: one part relies on bilinear interpolation to obtain coarse-grained upsampled features, and the other part relies on transposed convolution to compensate for the loss of fine detail in the coarse-grained part.
8. The method of claim 7, wherein the bilateral upsampling consists of a coarse-grained branch and a fine-grained branch;
the coarse-grained branch quickly obtains coarse upsampled features from the previous layer: the number of channels is first reduced to 1/2 of the input feature map by a 1 × 1 convolution, and bilinear interpolation is then used directly to upsample the input feature map; the fine-grained branch is used to compensate for the loss of fine detail in the coarse-grained branch and has a deeper path: the feature map is upsampled by a transposed convolution with stride 2, which also reduces the number of channels to 1/2, after which two non-bottleneck blocks are stacked, each consisting of 4 convolutions of size 3 × 1 and 1 × 3 with BN and ReLU, which preserves the shape of the feature map and extracts information efficiently through factorization; finally, the two branches are fused by an addition operation.
9. The lane line detection method based on spatial feature interaction according to claim 1, wherein in step three the network model is trained with an Adam optimization strategy, with momentum 0.9, a cosine-decay learning rate initialized to 4e-4, a batch size of 16 and 50 training iterations; model training adopts an early-stopping strategy to prevent overfitting.
CN202210113686.XA 2022-01-30 2022-01-30 Lane line detection method based on spatial feature interaction Pending CN114463721A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210113686.XA CN114463721A (en) 2022-01-30 2022-01-30 Lane line detection method based on spatial feature interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210113686.XA CN114463721A (en) 2022-01-30 2022-01-30 Lane line detection method based on spatial feature interaction

Publications (1)

Publication Number Publication Date
CN114463721A true CN114463721A (en) 2022-05-10

Family

ID=81412498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210113686.XA Pending CN114463721A (en) 2022-01-30 2022-01-30 Lane line detection method based on spatial feature interaction

Country Status (1)

Country Link
CN (1) CN114463721A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294548A (en) * 2022-07-28 2022-11-04 烟台大学 Lane line detection method based on position selection and classification method in row direction
CN115376091A (en) * 2022-10-21 2022-11-22 松立控股集团股份有限公司 Lane line detection method assisted by image segmentation

Similar Documents

Publication Publication Date Title
CN109740465B (en) Lane line detection algorithm based on example segmentation neural network framework
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN111563909B (en) Semantic segmentation method for complex street view image
CN111222396B (en) All-weather multispectral pedestrian detection method
CN111368846B (en) Road ponding identification method based on boundary semantic segmentation
CN110929578A (en) Anti-blocking pedestrian detection method based on attention mechanism
CN113627228B (en) Lane line detection method based on key point regression and multi-scale feature fusion
CN114463721A (en) Lane line detection method based on spatial feature interaction
CN111368830B (en) License plate detection and recognition method based on multi-video frame information and kernel correlation filtering algorithm
CN111652081B (en) Video semantic segmentation method based on optical flow feature fusion
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN112613343B (en) River waste monitoring method based on improved YOLOv4
CN110910413A (en) ISAR image segmentation method based on U-Net
CN113762209A (en) Multi-scale parallel feature fusion road sign detection method based on YOLO
CN114120069B (en) Lane line detection system, method and storage medium based on direction self-attention
CN112651423A (en) Intelligent vision system
CN114120272A (en) Multi-supervision intelligent lane line semantic segmentation method fusing edge detection
CN112766056A (en) Method and device for detecting lane line in low-light environment based on deep neural network
CN115035295A (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN115527096A (en) Small target detection method based on improved YOLOv5
CN116486080A (en) Lightweight image semantic segmentation method based on deep learning
CN116503709A (en) Vehicle detection method based on improved YOLOv5 in haze weather
CN114463205A (en) Vehicle target segmentation method based on double-branch Unet noise suppression
CN111881914B (en) License plate character segmentation method and system based on self-learning threshold

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination