CN112149526B - Lane line detection method and system based on long-distance information fusion

Info

Publication number: CN112149526B (granted publication of application CN202010928733.7A)
Authority: CN (China)
Prior art keywords: module, lane line, feature, convolution, image
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN112149526A (first publication, 2020-12-29)
Inventors: 李松斌, 唐计刚, 刘鹏
Assignee: Nanhai Research Station, Institute of Acoustics, Chinese Academy of Sciences
Filing and priority date: 2020-09-07; grant published: 2023-11-28

Classifications

    • G06V20/588: Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Fusion techniques of extracted features
    • Y02T10/40: Engine management systems

Abstract

The invention discloses a lane line detection method and system based on long-distance information fusion. The method comprises: preprocessing a picture to be detected; and inputting the preprocessed image into a pre-trained lane line detection model to obtain a detection image in which the lane lines are segmented from the background, white representing the lane lines and black the background. The lane line detection model comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module. The feature dimension reduction module reduces the dimension of the preprocessed image to be detected and extracts low-layer image features; the long-distance feature association module enhances the association between distant lane line feature points according to the particular shape of lane lines; the global information fusion module computes global relevance and performs global information fusion on the image; the feature reconstruction module produces a reconstructed image; and the patch discrimination module judges the confidence of the reconstructed image.

Description

Lane line detection method and system based on long-distance information fusion
Technical Field
The invention relates to the technical field of machine vision and deep learning, in particular to a lane line detection method and system based on long-distance information fusion.
Background
Autonomous driving is a technological goal that has long been pursued. Its implementation faces many difficulties, the most important being environmental perception. As an important component of a vehicle's visual perception of its environment, lane line detection is a key step in realizing automated driving.
For many years, attempts have been made to build accurate and efficient lane line detection methods on a variety of sensor devices, such as lidar and cameras. Among these, the high cost of radar equipment limits its practical value. Owing to the low cost of cameras, the richness of the information in the images they acquire, and the progress of image processing technology, vision-based lane line detection methods have been widely studied.
Through development over recent decades, the pipeline of vision-based lane line detection has become largely standardized. It can be divided into the following stages: image acquisition and preprocessing, region-of-interest cropping, feature extraction, feature fusion and lane line fitting. Such methods typically rely on manually chosen feature types whose suitability depends on the scene assumed during preprocessing, so they carry inherent scene limitations that restrict their practical application. Enabling a detection method to select image features adaptively is therefore a key step in bringing vision-based lane line detection to practical use. With the development of big data technology, methods for extracting features of interest from large datasets have been widely studied; among them, the most effective for large-scale image feature extraction is the convolutional neural network.
Convolutional neural networks have driven much of the progress in computer vision, including image classification, semantic segmentation and object detection. The convolutional neural network evolved from the multilayer perceptron; its core idea is to imitate the learning mechanism of human vision and to build a deep neural network model from stacked hierarchical layers. Through this design and learning mechanism, a convolutional neural network can adaptively extract target features during training, a great improvement over manual, heuristic feature selection.
As an application of image processing, vision-based lane line detection has advanced greatly with the development of image processing technology, and its feature extraction stage has gradually been taken over by convolutional neural networks. Compared with manual feature selection, existing lane line detection methods based on convolutional neural network models are considerably more accurate. However, how to design network structures and strategies that better suit this specific task remains a question requiring intensive research.
Disclosure of Invention
The invention aims to overcome the technical defects of the existing lane line detection method and provides a lane line detection method and system based on long-distance information fusion.
In order to achieve the above object, the present invention provides a lane line detection method based on long-distance information fusion, the method comprising:
preprocessing a picture to be detected;
inputting the preprocessed image into a pre-trained lane line detection model to obtain a detection image of lane line and background segmentation, wherein white represents the lane line and black represents the background;
the lane line detection model includes: the system comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module; wherein,
the feature dimension reduction module is used for reducing dimension of the input preprocessed image to be detected and extracting low-layer image features;
the long-distance characteristic association module is used for expanding the low-layer image characteristics and enhancing the association between the long-distance characteristic points of the lane lines according to the specificity of the lane lines;
the global information fusion module is used for calculating global relevance between the deep lane line features and the non-lane line features and carrying out global information fusion on the images;
the feature reconstruction module is used for reconstructing the features, long-distance associated features and low-layer image features after the global information fusion into image input dimensions to obtain a reconstructed image with the same size as the input image;
the patch judging module is used for judging the confidence level of the reconstructed image.
As an improvement of the above method, the preprocessing specifically comprises: resizing the picture to be detected by bilinear interpolation to obtain an image of size 256×512×3.
As an improvement of the method, the characteristic dimension reduction module comprises a first convolution block and a second convolution block which are connected in sequence; wherein,
the first convolution block comprises 2 layers of 3×3 convolution with stride 1 and a ReLU activation function;
the second convolution block comprises 3 layers of 3×3 convolution with stride 1 and a ReLU activation function.
As an improvement of the above method, the long-distance feature correlation module includes an asymmetric convolution unit and a splicing unit; wherein,
the asymmetric convolution unit comprises a first branch and a second branch in parallel; the first branch is an n×1 convolution layer with stride 1; the second branch is a 1×n convolution layer with stride 1; both convolution layers use ReLU as the activation function; n is an integer greater than 1;
the inputs of the splicing unit are the output x_1 of the first branch and the output x_2 of the second branch, and its output is the expanded feature map X, obtained as:
X = concat(x_1, x_2).
as an improvement of the method, the global information fusion module comprises a position attention module and a channel attention module which are connected in parallel; wherein,
the position attention module comprises 3 1×1 convolution blocks, and the specific processing of the position attention module is as follows: calculating the correlation of different pixel points, and carrying out weighted summation on the characteristics;
the correlation s_ji between different pixel points is expressed as:
s_ji = exp(B_i · C_j) / Σ_{i=1..N} exp(B_i · C_j)
the weighted sum E_j is expressed as:
E_j = α · Σ_{i=1..N} (s_ji · D_i) + A_j
wherein B, C and D respectively denote the output matrices obtained by passing the input matrix A through the 3 1×1 convolution layers, each of dimension C×H×W, where C, H and W denote the number of channels, the height and the width of the feature map; B_i, C_j and D_i respectively denote the i-th row vector of matrix B, the j-th column vector of matrix C and the i-th row vector of matrix D; s_ji denotes the (j,i)-th element of the correlation matrix S between global pixel points; E_j denotes the j-th column vector of the feature map E; A_j denotes the j-th column vector of the input matrix A; α denotes a weight coefficient; N denotes the number of pixel points in the feature map; and Σ denotes summation;
the specific processing of the channel attention module is as follows: calculating the correlation of different channels of the feature map, and carrying out weighted summation on the features;
the correlation G_qk between different channels is expressed as:
G_qk = exp(A_k · A_q) / Σ_{k=1..C} exp(A_k · A_q)
wherein G_qk denotes the (q,k)-th element of the channel correlation matrix G, A_k denotes the k-th row vector of the input matrix A, and A_q denotes the q-th column vector of the input matrix A;
the weighted sum F_q is expressed as:
F_q = β · Σ_{k=1..C} (G_qk · A_k) + A_q
wherein F_q denotes the q-th column vector of the feature map F, and β denotes a weight coefficient;
element-level addition is performed on two parallel branch output feature graphs E and F, and the feature graph X is expressed as follows:
X=E+F。
as an improvement of the method, the feature reconstruction module comprises 5 transpose convolution layers with step length of 2 and 5 batch normalization layers which are cascaded; the transpose convolution layer is used for realizing feature mapping of the output result of the high-level abstract features to the output layer and amplifying the size of the feature map by two times;
the batch normalization layer is used for normalizing the feature images.
As an improvement of the method, the patch discrimination module comprises 4 cascaded convolution blocks followed by 1 1×1 convolution block with stride 1; each of the first 4 convolution blocks comprises 1 3×3 convolution layer with stride 2, 1 batch normalization layer and 1 LeakyReLU activation function; the 1×1 convolution block with stride 1 converts the feature map from dimension 16×32×512 into a 16×32×1 confidence map.
As an improvement of the above method, the method further includes a training step of the lane line detection model; the method specifically comprises the following steps:
step 1), selecting pictures, and establishing a training set after preprocessing;
step 2) labeling the pictures of the training set to obtain a labeled training set, and randomly dividing it into U groups of equal size;
step 3) randomly selecting a group of pictures to input a lane line detection model, and outputting a group of detection images;
step 4) calculating the error between the detection result and the real label using the weighted cross entropy:
H(ŷ, y) = - Σ_{r=1..m} q(x_r) · log p(x_r)
wherein H(ŷ, y) denotes the average error between the real label ŷ and the detection result y, p(x_r) denotes the model's forward output for the r-th sample, q(x_r) denotes the true label of the r-th sample, and m denotes the number of classes;
step 5) adjusting all parameters of the model based on a gradient descent method to obtain a new model parameter combination, and turning to step 3); continuously repeating until all the U groups of pictures are input into the model, and entering the step 6);
step 6) re-shuffling the pictures of the training set and returning to step 1); repeating until the optimal parameter combination of the model has been trained, which yields the trained lane line detection model.
A lane line detection system based on long-distance information fusion, the system comprising: the lane line detection system comprises a preprocessing module, a detection module and a trained lane line detection model; wherein,
the preprocessing module is used for preprocessing the picture to be detected, resizing it by bilinear interpolation to obtain a picture of size 256×512×3;
the detection module is used for inputting the preprocessed picture into a trained lane line detection model to obtain a detection image of lane line and background segmentation, wherein white represents the lane line and black represents the background;
the lane line detection model includes: the system comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module; wherein,
the feature dimension reduction module is used for reducing dimension of the input preprocessed image to be detected and extracting low-layer image features;
the long-distance characteristic association module is used for expanding the low-layer image characteristics and enhancing the association between the long-distance characteristic points of the lane lines according to the specificity of the lane lines;
the global information fusion module is used for calculating global relevance between the deep lane line features and the non-lane line features and carrying out global information fusion on the images;
the feature reconstruction module is used for reconstructing the features, long-distance associated features and low-layer image features after the global information fusion into image input dimensions to obtain a reconstructed image with the same size as the input image;
the patch judging module is used for judging the confidence level of the reconstructed image.
Compared with the prior art, the invention has the advantages that:
1. the method not only improves the performance of lane line detection but also achieves a good detection effect under complex conditions;
2. the method of the invention proposes the idea of strengthening the long-distance dependence of the algorithm specifically for lane line detection. The model structure is designed to address the lane line edge divergence seen in the detection results of existing methods, so that the model mines and learns lane line features in a data-driven manner, effectively improving the performance of lane line detection;
3. the constructed lane line detection model can automatically extract and learn lane line features, and these features better characterize lane line edges and the long-distance relationships between lane line points;
4. in the lane line detection model constructed by the method, asymmetric convolution expands the linear features of the lane lines and an attention mechanism computes correlations between pixel points, so the deep learning model is more specific to the linear shape of lane lines and generalizes well to conditions such as lane occlusion; on benchmark datasets the method achieves performance exceeding existing methods, and it can therefore meet the requirements of practical lane line detection applications.
Drawings
FIG. 1 is a schematic view of a lane line detection model of the present invention;
FIG. 2 (a) is a schematic diagram of a location attention structure of the global information fusion module of the present invention;
FIG. 2 (b) is a schematic diagram of the channel attention structure of the global information fusion module of the present invention;
FIG. 3 is a schematic diagram of a patch determination module according to the present invention;
FIG. 4 is a schematic diagram of the global architecture of the method of the present invention;
FIG. 5 is a flowchart of the lane line detection model training step of the present invention.
Detailed Description
The method of the invention comprises the following steps:
preprocessing a picture to be detected;
inputting the preprocessed image into a trained preliminary feature extraction network to obtain preliminary lane image features; the preliminary lane features are then used as the input of a pre-trained long-distance information fusion network, which strengthens long-distance information fusion among the lane feature points; finally, a detection image of lane line and background segmentation is obtained, wherein white represents the lane line and black represents the background.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings and examples.
Example 1
The embodiment 1 of the invention provides a lane line detection method based on long-distance information fusion, which comprises the following steps:
Step 1) establishing a lane line detection model;
as shown in fig. 1, the lane line detection model includes: the system comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module;
the feature dimension reduction module is used for reducing dimension of the input picture features and extracting lane line features; the module consists of 5 convolution blocks, containing two types: a first convolution block and a second convolution block. The first convolution block consists of two layers of convolution with the size of 3 multiplied by 3 and the step length of 1 and a ReLU activation function; the second convolution block consists of three layers of convolutions of size 3 x 3, step size 1 and a ReLU function. The dimension of the output characteristic diagram of each convolution block is respectively as follows: 128×256×64, 64×128×128, 32×64×256, 16×32× 512,8 ×16×512. Feature maps of different sizes and different depths extract different lane line features from the input image.
The long-distance feature association module operates on the feature map output by the third convolution block of the feature dimension reduction module. The module consists of two parallel asymmetric convolutions forming two branches: the first branch is a 5×1 convolution layer with stride 1 and padding 2; the second branch is a 1×5 convolution layer with stride 1 and padding 2; both convolution layers use ReLU as the activation function.
splicing the feature graphs generated by the two parallel branches according to the channel dimension; the splicing mode is expressed as follows:
X=concat(x 1 ,x 2 )
wherein X denotes the feature map generated by splicing, concat(·) denotes the splicing operation, and x_1, x_2 denote the output feature maps of the first and second branches respectively; x_1 and x_2 each have size 32×64×256, and X has size 32×64×512. Finally, X is taken as one of the inputs of the third convolution block of the feature reconstruction module to perform feature reconstruction.
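A corresponding sketch of the long-distance feature association module. The patent states padding 2 for both branches; interpreting this as padding along the long kernel axis only (so that the 32×64 spatial size is preserved) is an assumption:

```python
class LongDistanceAssociation(nn.Module):
    """Parallel 5x1 and 1x5 stride-1 convolutions with ReLU, spliced
    along the channel dimension: 32x64x256 -> 32x64x512."""
    def __init__(self, channels=256):
        super().__init__()
        self.branch1 = nn.Sequential(  # 5x1 kernel, padding 2 on the height axis
            nn.Conv2d(channels, channels, kernel_size=(5, 1), stride=1, padding=(2, 0)),
            nn.ReLU(inplace=True))
        self.branch2 = nn.Sequential(  # 1x5 kernel, padding 2 on the width axis
            nn.Conv2d(channels, channels, kernel_size=(1, 5), stride=1, padding=(0, 2)),
            nn.ReLU(inplace=True))

    def forward(self, a):
        x1, x2 = self.branch1(a), self.branch2(a)
        return torch.cat([x1, x2], dim=1)  # X = concat(x_1, x_2)
```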
The global information fusion module is used to further strengthen global dependencies between pixels, as shown in fig. 2 (a) and fig. 2 (b). This module acts on the output of the fifth convolution block of the feature dimension reduction module, whose feature map dimension is 8×16×512. The module comprises two attention modules connected in parallel: a position attention module and a channel attention module. The position attention module consists of 3 1×1 convolutions; its function is to calculate the correlation of different pixel points and to compute a weighted sum of the features.
The correlation s_ji between different pixel points is expressed as:
s_ji = exp(B_i · C_j) / Σ_{i=1..N} exp(B_i · C_j)
The weighted sum E_j is expressed as:
E_j = α · Σ_{i=1..N} (s_ji · D_i) + A_j
wherein B, C and D respectively denote the output matrices obtained by passing the input matrix A through the 3 1×1 convolution layers, each of dimension C×H×W, where C, H and W denote the number of channels, the height and the width of the feature map, here 512, 8 and 16 respectively. B_i, C_j and D_i respectively denote the i-th row vector of matrix B, the j-th column vector of matrix C and the i-th row vector of matrix D; s_ji denotes the (j,i)-th element of the correlation matrix S between global pixel points; E_j denotes the j-th column vector of the output feature map E of the position attention module; A_j denotes the j-th column vector of the input matrix A; α denotes a weight coefficient; N denotes the number of pixel points in the feature map; and Σ denotes summation.
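The position attention computation above can be sketched as follows. Keeping the full 512 channels for B and C is an assumption; implementations of this attention form often reduce the query/key channels to save memory:

```python
class PositionAttention(nn.Module):
    """Position attention on an 8x16x512 map: B, C, D come from three
    1x1 convolutions; s = softmax of pixel-pair similarities over the
    N = H*W positions; E = alpha * (weighted sum of D rows) + A."""
    def __init__(self, channels=512):
        super().__init__()
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_c = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_d = nn.Conv2d(channels, channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable weight coefficient

    def forward(self, a):
        n, c, h, w = a.shape                        # N = h * w pixel points
        b = self.conv_b(a).view(n, c, h * w)
        cm = self.conv_c(a).view(n, c, h * w)
        d = self.conv_d(a).view(n, c, h * w)
        s = torch.softmax(torch.bmm(b.transpose(1, 2), cm), dim=-1)  # (n, N, N)
        e = torch.bmm(d, s.transpose(1, 2)).view(n, c, h, w)
        return self.alpha * e + a                   # E_j = alpha * sum_i s_ji D_i + A_j
```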
the channel attention module is formed by modifying a position attention module and is used for calculating the correlations of different channels of the characteristic diagram and carrying out weighted summation on the characteristics.
The correlation G_qk between different channels is expressed as:
G_qk = exp(A_k · A_q) / Σ_{k=1..C} exp(A_k · A_q)
wherein G_qk denotes the (q,k)-th element of the channel correlation matrix G, A_k denotes the k-th row vector of the input matrix A, and A_q denotes the q-th column vector of the input matrix A.
The weighted sum F_q is expressed as:
F_q = β · Σ_{k=1..C} (G_qk · A_k) + A_q
wherein F_q denotes the q-th column vector of the feature map F, and β denotes a weight coefficient; the size of the feature map F is 8×16×512.
Element-level addition is performed on the feature graphs E and F of two parallel branch outputs, and the summation mode is expressed as follows:
X=E+F
finally, the size of the feature map X is 8×16×512. The feature map is used as an input of a first convolution block of the feature reconstruction module to perform feature reconstruction.
The feature reconstruction module comprises 5 cascaded transposed convolution modules. Each consists of a transposed convolution layer and a batch normalization layer. The transposed convolution layer maps the high-level abstract features toward the output layer and doubles the spatial size of the feature map. The batch normalization layer normalizes the feature map, which helps the network converge quickly and prevents the loss from diverging.
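One reconstruction stage might look as follows; the 4×4 kernel (a common choice for exact 2× upsampling with stride 2) and the ReLU after batch normalization are assumptions, since the text fixes only the stride and the normalization:

```python
class ReconstructionBlock(nn.Module):
    """Transposed convolution (stride 2, doubling H and W) followed by
    batch normalization."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)  # kernel size assumed
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.up(x)))  # activation assumed
```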
The patch discrimination module comprises 4 cascaded convolution blocks, as shown in fig. 3. Each convolution block consists of a convolution layer with kernel size 3×3 and stride 2, a batch normalization layer and a LeakyReLU activation function. Finally, a convolution with kernel size 1×1 and stride 1 converts the feature map from 512 channels to 1 channel, yielding a patch confidence map of size 16×32×1.
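A sketch of the patch discrimination module. The intermediate channel widths and the input channel count (which depends on how the detection image is concatenated with the input image in step 2-7 below) are assumptions; the text fixes only the block structure, the final 16×32×512 feature map and the 16×32×1 confidence map:

```python
class PatchDiscriminator(nn.Module):
    """4 cascaded (3x3 conv, stride 2, BN, LeakyReLU) blocks, then a
    1x1 stride-1 convolution: 256x512 input -> 16x32x1 confidence map."""
    def __init__(self, in_ch=5):  # assumed: 2-channel detection output + 3-channel input image
        super().__init__()
        chs = [in_ch, 64, 128, 256, 512]  # assumed channel progression
        blocks = []
        for i in range(4):
            blocks += [nn.Conv2d(chs[i], chs[i + 1], kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(chs[i + 1]),
                       nn.LeakyReLU(0.2, inplace=True)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Conv2d(512, 1, kernel_size=1, stride=1)

    def forward(self, x):
        return self.head(self.features(x))  # one confidence value per 16x32 patch cell
```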
As shown in fig. 4, a global structural diagram of the method of the present invention is shown.
Step 2) training the established lane line detection model by using the pictures of the training set, as shown in fig. 5, including:
step 2-1) resizing all pictures of the training set to 256×512×3 by bilinear interpolation;
step 2-2) randomly dividing the pictures of the training set into N groups of equal size;
step 2-3) randomly reading a group of pictures, inputting the group of pictures into a feature dimension reduction module for dimension reduction and extracting low-layer features;
step 2-4) inputting the feature map obtained by the third convolution block of the feature dimension reduction module into a long-distance feature association module to realize long-distance association among shallow features and obtain a feature map of long-distance association after the third convolution block;
step 2-5), inputting the feature map obtained by the fifth convolution block of the feature dimension reduction module into a global information fusion module to realize global information fusion of deep features;
step 2-6) the feature map obtained by the global information fusion module is input to the first reconstruction convolution block of the feature reconstruction module; the inputs of the 2nd, 4th and 5th reconstruction convolution blocks are the concatenation of the previous reconstruction convolution block's output with the output of the corresponding feature dimension reduction convolution block; the input of the 3rd reconstruction convolution block is the concatenation of the 2nd reconstruction convolution block's output with the output of the long-distance feature association module. After feature reconstruction, the output result is mapped onto the required feature dimension and a group of detection images is output (the overall wiring is sketched in code after step 2-8-3);
step 2-7) splicing the detection image and the input image to be used as the input of the patch judging module. The patch discriminating module determines the authenticity of the detected image.
Step 2-8) updating parameters in the model by adopting a gradient descent method; repeating iteration until the optimal parameter combination of the model is trained; comprising the following steps:
step 2-8-1) calculating the error between the result output by the model and the real label; the error calculation adopts the cross entropy, expressed as:
H(ŷ, y) = - Σ_i q(x_i) · log p(x_i)
wherein H(ŷ, y) denotes the average error between the set of real labels ŷ and the set of detection results y, p(x_i) denotes the forward output of the model, q(x_i) denotes the real labels, and Σ denotes summation (a code sketch of this loss also follows step 2-8-3);
step 2-8-2) taking the parameters obtained in step 2-8-1) as the weights for this iteration; randomly selecting a group of pictures from the remaining groups and obtaining a new parameter combination through steps 2-3) to 2-8-1); repeating until one full iteration over all groups is completed;
step 2-8-3) re-shuffling the training pictures and turning to step 2-2); repeating until the optimal parameter combination of the model has been trained.
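Putting the pieces together, step 2-6) implies the following decoder wiring; the per-block output channel widths are assumptions chosen so that every concatenation matches the encoder dimensions stated earlier:

```python
class LaneLineDetectionModel(nn.Module):
    """Steps 2-3 to 2-6: encoder, long-distance association on the 3rd
    encoder block, global information fusion on the 5th, and a decoder
    whose blocks 2, 4, 5 concatenate encoder skips and whose block 3
    concatenates the long-distance association map."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = FeatureDimReduction()
        self.assoc = LongDistanceAssociation(256)   # on the 32x64x256 map
        self.fusion = GlobalInformationFusion(512)  # on the 8x16x512 map
        self.rec1 = ReconstructionBlock(512, 512)        # 8x16 -> 16x32
        self.rec2 = ReconstructionBlock(512 + 512, 256)  # + 4th encoder map
        self.rec3 = ReconstructionBlock(256 + 512, 128)  # + association map X
        self.rec4 = ReconstructionBlock(128 + 128, 64)   # + 2nd encoder map
        self.rec5 = ReconstructionBlock(64 + 64, num_classes)  # + 1st encoder map

    def forward(self, x):
        f1, f2, f3, f4, f5 = self.encoder(x)
        r1 = self.rec1(self.fusion(f5))
        r2 = self.rec2(torch.cat([r1, f4], dim=1))
        r3 = self.rec3(torch.cat([r2, self.assoc(f3)], dim=1))
        r4 = self.rec4(torch.cat([r3, f2], dim=1))
        return self.rec5(torch.cat([r4, f1], dim=1))  # 256x512 per-pixel class scores
```

And the error of step 2-8-1) maps directly onto a standard cross-entropy call; the optional per-class weights (e.g. to compensate for the rarity of lane pixels) are an assumption about how the weighted variant of the loss is realized:

```python
def detection_loss(logits, labels, class_weights=None):
    # logits: (n, m, H, W) model forward output; labels: (n, H, W) true class indices
    return nn.functional.cross_entropy(logits, labels, weight=class_weights)

# example: background (class 0) down-weighted relative to lane pixels (class 1)
# loss = detection_loss(model(images), labels, torch.tensor([0.4, 1.6]))
```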
Step 3) resizing the picture to be detected by bilinear interpolation so that it meets the input size required by the lane line detection model, 256×512×3; then inputting it into the lane line detection model with the optimal parameter combination and outputting the lane line detection result for the road image.
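For step 3), the preprocessing and forward pass might look like this; the NumPy HxWx3 input format is an assumption of the sketch:

```python
import numpy as np

def detect_lane_lines(model, image: np.ndarray):
    """Resize a road image to the required 256x512x3 input by bilinear
    interpolation, run the model and return the per-pixel class map
    (lane line vs. background)."""
    x = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x = nn.functional.interpolate(x, size=(256, 512),
                                  mode='bilinear', align_corners=False)
    model.eval()
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1).squeeze(0)  # class with maximum probability per pixel
```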
Verifying the trained lane line detection model by using a group of pictures:
for each picture to be detected, its size is adjusted by bilinear interpolation to the 256×512×3 required by the model input;
inputting the adjusted picture into a lane line detection model with optimal parameters, and obtaining a prediction result through model forward transmission;
comparing the class prediction map corresponding to the maximum probability of the output result with the real label map: where they agree the prediction is correct, otherwise it is wrong;
By executing the above steps, all pictures to be detected are processed; the comparison results show that the method achieves high detection accuracy.
Example 2
Based on the above method, embodiment 2 of the present invention proposes a lane line detection system based on long-distance information fusion. The system comprises: the system comprises a trained lane line detection model, a preprocessing module and a detection module; wherein,
the preprocessing module is used for preprocessing the picture to be detected, resizing it by bilinear interpolation to obtain a picture of size 256×512×3;
the detection module is used for inputting the preprocessed picture into a pre-trained lane line detection model to obtain a detection image of lane line and background segmentation, wherein white represents the lane line and black represents the background.
The lane line detection model includes: the system comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module; wherein,
the feature dimension reduction module is used for reducing dimension of the input preprocessed image to be detected and extracting low-layer image features;
the long-distance characteristic association module is used for expanding the low-layer image characteristics and enhancing the association between the long-distance characteristic points of the lane lines according to the specificity of the lane lines;
the global information fusion module is used for calculating global relevance between the deep lane line features and the non-lane line features and carrying out global information fusion on the images;
the feature reconstruction module is used for reconstructing the features, long-distance associated features and low-layer image features after the global information fusion into image input dimensions to obtain a reconstructed image with the same size as the input image;
the patch judging module is used for judging the confidence level of the reconstructed image.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the appended claims.

Claims (7)

1. A lane line detection method based on long-distance information fusion, the method comprising:
preprocessing a picture to be detected;
inputting the preprocessed image into a pre-trained lane line detection model to obtain a detection image of lane line and background segmentation, wherein white represents the lane line and black represents the background;
the lane line detection model includes: the system comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module; wherein,
the feature dimension reduction module is used for reducing dimension of the input preprocessed image to be detected and extracting low-layer image features;
the long-distance characteristic association module is used for expanding the low-layer image characteristics and enhancing the association between the long-distance characteristic points of the lane lines according to the specificity of the lane lines;
the global information fusion module is used for calculating global relevance between the deep lane line features and the non-lane line features and carrying out global information fusion on the images;
the feature reconstruction module is used for reconstructing the features, long-distance associated features and low-layer image features after the global information fusion into image input dimensions to obtain a reconstructed image with the same size as the input image;
the patch judging module is used for judging the confidence level of the reconstructed image;
the long-distance characteristic association module comprises an asymmetric convolution unit and a splicing unit; wherein,
the asymmetric convolution unit comprises a first branch and a second branch in parallel; the first branch is an n×1 convolution layer with stride 1; the second branch is a 1×n convolution layer with stride 1; both convolution layers use ReLU as the activation function; n is an integer greater than 1;
the inputs of the splicing unit are the output x_1 of the first branch and the output x_2 of the second branch, and its output is the expanded feature map X, obtained as:
X = concat(x_1, x_2);
the global information fusion module comprises a position attention module and a channel attention module which are connected in parallel; wherein,
the position attention module comprises 3 1×1 convolution blocks, and the specific processing of the position attention module is as follows: calculating the correlation of different pixel points, and carrying out weighted summation on the characteristics;
the correlation s_ji between different pixel points is expressed as:
s_ji = exp(B_i · C_j) / Σ_{i=1..N} exp(B_i · C_j)
the weighted sum E_j is expressed as:
E_j = α · Σ_{i=1..N} (s_ji · D_i) + A_j
wherein B, C and D respectively denote the output matrices obtained by passing the input matrix A through the 3 1×1 convolution layers, each of dimension C×H×W, where C, H and W denote the number of channels, the height and the width of the feature map; B_i, C_j and D_i respectively denote the i-th row vector of matrix B, the j-th column vector of matrix C and the i-th row vector of matrix D; s_ji denotes the (j,i)-th element of the correlation matrix S between global pixel points; E_j denotes the j-th column vector of the feature map E; A_j denotes the j-th column vector of the input matrix A; α denotes a weight coefficient; N denotes the number of pixel points in the feature map; and Σ denotes summation;
the specific processing of the channel attention module is as follows: calculating the correlation of different channels of the feature map, and carrying out weighted summation on the features;
the correlation G_qk between different channels is expressed as:
G_qk = exp(A_k · A_q) / Σ_{k=1..C} exp(A_k · A_q)
wherein G_qk denotes the (q,k)-th element of the channel correlation matrix G, A_k denotes the k-th row vector of the input matrix A, and A_q denotes the q-th column vector of the input matrix A;
the weighted sum F_q is expressed as:
F_q = β · Σ_{k=1..C} (G_qk · A_k) + A_q
wherein F_q denotes the q-th column vector of the feature map F, and β denotes a weight coefficient;
element-level addition is performed on two parallel branch output feature graphs E and F, and the feature graph X is expressed as follows:
X=E+F。
2. The lane line detection method based on long-distance information fusion according to claim 1, wherein the preprocessing specifically comprises: resizing the picture to be detected by bilinear interpolation to obtain an image of size 256×512×3.
3. The lane line detection method based on long-distance information fusion according to claim 2, wherein the feature dimension reduction module comprises a first convolution block and a second convolution block which are sequentially connected; wherein,
the first convolution block comprises 2 layers of 3×3 convolution with stride 1 and a ReLU function;
the second convolution block comprises 3 layers of 3×3 convolution with stride 1 and a ReLU function.
4. The lane line detection method based on long-distance information fusion according to claim 3, wherein the feature reconstruction module comprises 5 cascaded transposed convolution layers with stride 2 and 5 batch normalization layers; each transposed convolution layer maps the high-level abstract features toward the output layer and doubles the spatial size of the feature map;
the batch normalization layer is used for normalizing the feature images.
5. The lane line detection method based on long-distance information fusion according to claim 4, wherein the patch discrimination module comprises 4 cascaded convolution blocks and 1 1×1 convolution block with stride 1; each of the first 4 convolution blocks comprises 1 3×3 convolution layer with stride 2, 1 batch normalization layer and 1 LeakyReLU activation function; the 1×1 convolution block with stride 1 converts the feature map from dimension 16×32×512 into a 16×32×1 confidence map.
6. The lane line detection method based on long-distance information fusion according to claim 5, further comprising a training step of a lane line detection model; the method specifically comprises the following steps:
step 1), selecting pictures, and establishing a training set after preprocessing;
step 2) labeling the pictures of the training set to obtain a labeled training set, and randomly dividing it into U groups of equal size;
step 3) randomly selecting a group of pictures to input a lane line detection model, and outputting a group of detection images;
step 4) calculating the error between the detection result and the real label using the weighted cross entropy:
H(ŷ, y) = - Σ_{r=1..m} q(x_r) · log p(x_r)
wherein H(ŷ, y) denotes the average error between the real label ŷ and the detection result y, p(x_r) denotes the model's forward output for the r-th sample, q(x_r) denotes the true label of the r-th sample, and m denotes the number of classes;
step 5) adjusting all parameters of the model based on a gradient descent method to obtain a new model parameter combination, and turning to step 3); continuously repeating until all the U groups of pictures are input into the model, and entering the step 6);
step 6) re-shuffling the pictures of the training set and returning to step 1); repeating until the optimal parameter combination of the model has been trained, which yields the trained lane line detection model.
7. A lane line detection system based on long-distance information fusion, the system comprising: the lane line detection system comprises a preprocessing module, a detection module and a trained lane line detection model; wherein,
the preprocessing module is used for preprocessing the picture to be detected, resizing it by bilinear interpolation to obtain a picture of size 256×512×3;
the detection module is used for inputting the preprocessed picture into a trained lane line detection model to obtain a detection image of lane line and background segmentation, wherein white represents the lane line and black represents the background;
the lane line detection model includes: the system comprises a feature dimension reduction module, a long-distance feature association module, a global information fusion module, a feature reconstruction module and a patch discrimination module; wherein,
the feature dimension reduction module is used for reducing dimension of the input preprocessed image to be detected and extracting low-layer image features;
the long-distance characteristic association module is used for expanding the low-layer image characteristics and enhancing the association between the long-distance characteristic points of the lane lines according to the specificity of the lane lines;
the global information fusion module is used for calculating global relevance between the deep lane line features and the non-lane line features and carrying out global information fusion on the images;
the feature reconstruction module is used for reconstructing the features, long-distance associated features and low-layer image features after the global information fusion into image input dimensions to obtain a reconstructed image with the same size as the input image;
the patch judging module is used for judging the confidence level of the reconstructed image;
the long-distance characteristic association module comprises an asymmetric convolution unit and a splicing unit; wherein,
the asymmetric convolution unit comprises a first branch and a second branch in parallel; the first branch is an n×1 convolution layer with stride 1; the second branch is a 1×n convolution layer with stride 1; both convolution layers use ReLU as the activation function; n is an integer greater than 1;
the inputs of the splicing unit are the output x_1 of the first branch and the output x_2 of the second branch, and its output is the expanded feature map X, obtained as:
X = concat(x_1, x_2);
the global information fusion module comprises a position attention module and a channel attention module which are connected in parallel; wherein,
the position attention module comprises 3 1×1 convolution blocks, and the specific processing of the position attention module is as follows: calculating the correlation of different pixel points, and carrying out weighted summation on the characteristics;
the correlation s_ji between different pixel points is expressed as:
s_ji = exp(B_i · C_j) / Σ_{i=1..N} exp(B_i · C_j)
the weighted sum E_j is expressed as:
E_j = α · Σ_{i=1..N} (s_ji · D_i) + A_j
wherein B, C and D respectively denote the output matrices obtained by passing the input matrix A through the 3 1×1 convolution layers, each of dimension C×H×W, where C, H and W denote the number of channels, the height and the width of the feature map; B_i, C_j and D_i respectively denote the i-th row vector of matrix B, the j-th column vector of matrix C and the i-th row vector of matrix D; s_ji denotes the (j,i)-th element of the correlation matrix S between global pixel points; E_j denotes the j-th column vector of the feature map E; A_j denotes the j-th column vector of the input matrix A; α denotes a weight coefficient; N denotes the number of pixel points in the feature map; and Σ denotes summation;
the specific processing of the channel attention module is as follows: calculating the correlation of different channels of the feature map, and carrying out weighted summation on the features;
the correlation G_qk between different channels is expressed as:
G_qk = exp(A_k · A_q) / Σ_{k=1..C} exp(A_k · A_q)
wherein G_qk denotes the (q,k)-th element of the channel correlation matrix G, A_k denotes the k-th row vector of the input matrix A, and A_q denotes the q-th column vector of the input matrix A;
the weighted sum F_q is expressed as:
F_q = β · Σ_{k=1..C} (G_qk · A_k) + A_q
wherein F_q denotes the q-th column vector of the feature map F, and β denotes a weight coefficient;
element-level addition is performed on two parallel branch output feature graphs E and F, and the feature graph X is expressed as follows:
X=E+F。