CN112785610B - Lane line semantic segmentation method integrating low-level features - Google Patents


Info

Publication number
CN112785610B
Authority
CN
China
Prior art keywords
lane line
image
low
semantic segmentation
training
Prior art date
Legal status
Active
Application number
CN202110049820.XA
Other languages
Chinese (zh)
Other versions
CN112785610A (en)
Inventor
姜立标
周文超
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202110049820.XA
Publication of CN112785610A
Application granted
Publication of CN112785610B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention provides a lane line semantic segmentation method that fuses low-level features, comprising the following steps: collecting video during the driving of an unmanned vehicle and screening out the images containing lane lines to form a lane line data set; preprocessing the data set to obtain the training set required for training; inputting the lane line images of the training set into a network model for training, where the network model extracts the second-layer, third-layer and last-layer image features and performs multiple low-level feature fusion operations; and up-sampling the semantic information image fused with the low-level features by a preset multiple to obtain the lane line semantic segmentation image. The method can realize semantic segmentation of lane line images and identify the lane lines in a driving scene; it offers high accuracy, strong anti-interference capability and good robustness, and can be applied to unmanned vehicles.

Description

Lane line semantic segmentation method integrating low-level features
Technical Field
The invention relates to the field of computer vision, in particular to a lane line image semantic segmentation method, and more particularly to a lane line semantic segmentation method that fuses low-level features.
Background
Lane line semantic segmentation is an application of computer vision in the field of unmanned vehicles: the external environment around the vehicle is perceived through a vehicle-mounted camera, helping the unmanned vehicle understand the surrounding world. The core of lane line semantic segmentation is to classify the category of every pixel in the lane line images acquired while the vehicle is driving.
The main problem of current lane line detection technology is that accuracy and real-time performance cannot be achieved at the same time. Traditional lane line detection methods rely on predefined lane line models and are therefore susceptible to changes in environmental factors. In complex environments, traditional road detection methods are not robust and are easily affected by road conditions such as illumination intensity, road width changes, blurred lane lines and degraded lane line edges; some methods can detect lane lines accurately but execute too slowly to meet practical requirements, while others improve real-time performance at the cost of accuracy. During automatic driving, accurately identifying lane line information from images plays a key role. Therefore, designing an efficient, real-time semantic segmentation algorithm can promote the rapid development of unmanned vehicles.
Disclosure of Invention
In order to solve the problems of traditional lane line detection methods described in the background, the invention provides a lane line semantic segmentation method that fuses low-level features; it can detect lane line information in images efficiently and in real time, has good algorithmic stability, and can be applied to unmanned driving systems.
The invention provides a lane line semantic segmentation method fusing low-level features, which comprises the following steps:
1) Collecting video data in the running process of the unmanned vehicle, and screening out images containing lane lines from the video data to form a data set;
2) Image preprocessing is carried out on images in the data set, and a training set required by training is obtained;
3) Inputting the images in the training set into a network model for training, extracting the second layer image features, the third layer image features and the last layer image features in the network model, and carrying out multiple low layer feature fusion operations to obtain semantic information images fused with the low layer features;
4) And carrying out up-sampling operation of a preset multiple on the semantic information image fused with the low-level features to obtain a final lane line semantic segmentation image.
Further, in the step 1), the lane line data set collection process specifically includes the following steps:
1.1 Horizontally mounting the vehicle-mounted camera on a vehicle, driving in various urban road scenes, and shooting video at a preset acquisition frame rate in the driving process to obtain video data comprising lane lines;
1.2 Extracting the last frame image of each second to form a road scene data set;
1.3 Deleting images which do not contain lane lines in the road scene data set, wherein the rest images form the lane line data set, and the images in the data set contain a plurality of urban road scenes.
Further, the urban road scenes include highways, tunnels, urban areas, suburbs and campuses.
Further, in the step 2), the preprocessing of the image in the dataset comprises the following specific steps:
2.1 Cutting the images in the data set respectively, and extracting the interested areas of the images in the data set;
2.2 Data enhancement processing is carried out on the data obtained in the step 2.1), more images containing lane lines are obtained, and a lane line training set required by training is formed.
Further, the data enhancement processing in step 2.2) means that for a photo in the training set, one of brightness, contrast, scaling, rotation is randomly selected, and the attribute value of the image is randomly changed.
Further, in the step 3), the multiple low-level feature fusion operations are performed, and the specific process is as follows:
3.2) Performing an up-sampling operation of a preset multiple on the last-layer image features, performing a 1×1 convolution operation on the third-layer image features, and fusing the two layers of image features with a fusion operation to obtain the semantic information image of the first fusion of low-level features;
3.3) Performing an up-sampling operation of a preset multiple on the semantic information image of the first fusion of low-level features, performing a 1×1 convolution operation on the second-layer image features, and fusing the two with a fusion operation to obtain the semantic information image of the second fusion of low-level features, i.e. the semantic information image fused with low-level features.
Further, the network model in the step 3) is a DeepLabV3+ network model.
Further, in the step 3), the step of inputting the images in the training set into the network model for training specifically includes: the network model adopts the poly learning strategy to determine the initial learning rate and the end learning rate, and during training of the network model the learning rate is multiplied by

$$\left(1-\frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}}$$

where iter is the number of iterations and max_iter is the maximum number of iterations.
Further, in the step 3), the network model needs to calculate the cross-entropy loss function during each training step, and then adopts a stochastic gradient descent method to optimize the training;
wherein the cross-entropy loss function is:

$$L=-\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right]$$

where L denotes the cross-entropy loss value, y is the true sample label, and $\hat{y}$ is the predicted output value.
Further, after the final lane line semantic segmentation image is obtained in the step 4), the pixel accuracy and the mean intersection over union are adopted as evaluation indexes of the lane line semantic segmentation result,
wherein the pixel accuracy (PA) is:

$$PA=\frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$

and the mean intersection over union (MIoU) is:

$$MIoU=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$

where k+1 is the number of semantic segmentation classes, comprising k target classes and 1 background class; $p_{ii}$ denotes the number of pixels that belong to class i and are also predicted as class i, $p_{ij}$ the number of pixels that belong to class i but are predicted as class j, and $p_{ji}$ the number of pixels that belong to class j but are predicted as class i.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with traditional lane line detection methods, the method fuses the low-level feature information of the image multiple times to obtain more lane line boundary information, and fuses this low-level boundary information with the high-level lane line semantic information, which improves the segmentation precision of the lane lines. The method offers good real-time performance, high accuracy and good robustness, can detect lane lines in a variety of scenes, and generalises well.
(2) The disclosed method fuses the boundary information of the lane lines in the image multiple times, so that the lane line detection process is not affected by lighting and the lane lines in the image can be accurately identified and segmented even when they are blurred; the method therefore has strong anti-interference capability.
(3) Considering that lane lines vary widely in appearance and are easily disturbed by the external environment, DeepLabV3+ is selected as the network model. This network has strong feature extraction capability and can adapt to multi-scale shape characteristics; combined with the image enhancement method, it improves the generalization capability of the network model.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of the present invention.
Fig. 2 is a network configuration diagram of an embodiment of the present invention.
Fig. 3 is a road scene image.
Fig. 4 is a semantic segmentation result image according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
Referring to fig. 1-4, the lane line semantic segmentation method with fusion of low-level features provided in this embodiment includes the following steps:
step S1: video data in the running process of the unmanned vehicle are collected, and images containing lane lines are screened out to form a data set.
Specifically, the last frame of each second of video is extracted to form a road scene data set, all pictures that do not contain lane line information are deleted, and the remaining images form a lane line data set covering a plurality of urban road scenes such as highways, tunnels, urban areas, suburbs and campuses.
The lane line data set acquisition process comprises the following specific processes:
1.1 Horizontally mounting the vehicle-mounted camera on a vehicle, driving in various urban road scenes, and shooting video in the driving process to obtain video data containing lane lines;
1.2) The video is captured at 30 frames per second; since the road scene changes little within the 30 frames of a given second, only the last frame of each second is extracted to form the road scene data set (a minimal sketch of this frame-extraction step is given after step 1.3));
1.3) Images that do not contain lane lines are deleted from the road scene data set; the remaining images form the lane line data set and cover a plurality of urban road scenes such as highways, tunnels, urban areas, suburbs and campuses.
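A minimal sketch of this frame-extraction step is given below, assuming the video files are readable by OpenCV; the file paths, function name and output naming scheme are illustrative and not taken from the embodiment above.

```python
import cv2
from pathlib import Path

def extract_last_frame_per_second(video_path: str, out_dir: str) -> int:
    """Keep only the last frame of every second of video (e.g. frames 30, 60, ... at 30 fps)."""
    cap = cv2.VideoCapture(video_path)
    fps = int(round(cap.get(cv2.CAP_PROP_FPS))) or 30   # fall back to 30 fps if metadata is missing
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        if idx % fps == 0:                               # last frame of the current second
            cv2.imwrite(f"{out_dir}/{Path(video_path).stem}_{idx:06d}.jpg", frame)
            saved += 1
    cap.release()
    return saved
```

Images without visible lane lines would then be removed by hand, leaving the lane line data set described in step 1.3).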
Step S2: and carrying out image preprocessing on the images in the data set to obtain a training set required by training.
Preprocessing of images in a data set, wherein the specific process is as follows:
2.1) The resolution of the images in the lane line data set is 1920×1080. The lane lines of interest occupy the lower half of the image while the upper half is of no interest, so each image is cropped and a region of interest of size 1920×720 is extracted;
2.2) Data enhancement is applied to the images obtained in step 2.1), adjusting brightness, contrast, scaling, rotation and so on to obtain more images containing lane lines and form the lane line training set required for training. In a specific implementation, the data enhancement operation is as follows: for a photo in the training set, one of brightness, contrast, scaling and rotation is randomly selected and the corresponding attribute of the image is randomly changed, where the brightness weight takes a random value between -1 and 1, the contrast weight between 0.1 and 0.3, the scaling weight between 0 and 1, and the rotation weight between -1 and 1. A sketch of this cropping and augmentation step follows.
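The sketch below assumes 1920×1080 BGR images loaded with OpenCV; the mapping of the quoted weight ranges onto concrete pixel operations (additive brightness shift, contrast gain, resize factor, rotation angle) is an interpretation for illustration, not the embodiment's own implementation.

```python
import random
import cv2
import numpy as np

def crop_roi(img: np.ndarray) -> np.ndarray:
    """Keep the lower 1920x720 region of a 1920x1080 frame, where the lane lines lie."""
    return img[360:1080, 0:1920]

def random_augment(img: np.ndarray) -> np.ndarray:
    """Randomly pick one of brightness / contrast / scaling / rotation and apply it."""
    h, w = img.shape[:2]
    choice = random.choice(["brightness", "contrast", "scale", "rotate"])
    if choice == "brightness":                # weight in [-1, 1], applied as an additive shift
        beta = random.uniform(-1.0, 1.0) * 50
        return cv2.convertScaleAbs(img, alpha=1.0, beta=beta)
    if choice == "contrast":                  # weight in [0.1, 0.3], applied as a gain around 1
        alpha = 1.0 + random.uniform(0.1, 0.3)
        return cv2.convertScaleAbs(img, alpha=alpha, beta=0)
    if choice == "scale":                     # weight in (0, 1], image resized back to the original size
        s = random.uniform(0.5, 1.0)
        small = cv2.resize(img, (int(w * s), int(h * s)))
        return cv2.resize(small, (w, h))
    angle = random.uniform(-1.0, 1.0) * 10    # rotation weight in [-1, 1], mapped here to +-10 degrees
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, M, (w, h))
```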
Step S3: and (2) taking the training set obtained in the step (S2) as the input of a network model, extracting the second layer image features, the third layer image features and the last layer image features in the network model, and carrying out multiple low layer feature fusion operations to obtain the semantic information image fused with the low layer features.
In this embodiment, the network model adopts the DeepLabV3+ network model; referring to fig. 1, the specific process is as follows:
3.1) Extracting the second-layer image features, the third-layer image features and the last-layer image features from the DeepLabV3+ network model respectively;
3.2) The last-layer image features are up-sampled by a preset multiple (2 times in this embodiment) to gradually recover the image resolution; at the same time, in order to reduce the number of model parameters and speed up lane line detection, a 1×1 convolution is applied to the third-layer image features, and the two sets of features are fused by a fusion operation to obtain the semantic information image of the first fusion of low-level features;
3.3) The semantic information image of the first fusion is up-sampled by a preset multiple (2 times in this embodiment); at the same time a 1×1 convolution is applied to the second-layer image features, and the two are fused by a fusion operation to obtain the semantic information image of the second fusion of low-level features. A sketch of this fusion decoder is given below.
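The fusion decoder can be sketched in PyTorch as follows. The channel counts, module names and the use of channel concatenation as the "fusion operation" are illustrative assumptions; the steps above only specify the 2x up-sampling, the 1×1 convolutions and the two fusions, and compatible spatial sizes of the feature maps after up-sampling are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowLevelFusionDecoder(nn.Module):
    """Fuse DeepLabV3+ last-layer features with third- and second-layer low-level features."""
    def __init__(self, c_high=256, c_layer3=128, c_layer2=64, num_classes=2):
        super().__init__()
        self.reduce3 = nn.Conv2d(c_layer3, 48, kernel_size=1)   # 1x1 conv on third-layer features
        self.reduce2 = nn.Conv2d(c_layer2, 48, kernel_size=1)   # 1x1 conv on second-layer features
        self.fuse1 = nn.Conv2d(c_high + 48, 256, kernel_size=3, padding=1)
        self.fuse2 = nn.Conv2d(256 + 48, 256, kernel_size=3, padding=1)
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, feat_last, feat_layer3, feat_layer2):
        # First fusion: 2x up-sample the last-layer features, concatenate the reduced third-layer features
        x = F.interpolate(feat_last, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.relu(self.fuse1(torch.cat([x, self.reduce3(feat_layer3)], dim=1)))
        # Second fusion: 2x up-sample again, concatenate the reduced second-layer features
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.relu(self.fuse2(torch.cat([x, self.reduce2(feat_layer2)], dim=1)))
        # Final 4x up-sample back to the input resolution (step S4) and per-pixel classification
        return F.interpolate(self.classifier(x), scale_factor=4,
                             mode="bilinear", align_corners=False)
```

Element-wise addition of the reduced features would be an equally valid reading of the "fusion operation"; concatenation followed by a convolution is simply the variant used in the standard DeepLabV3+ decoder.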
In this embodiment, the training method of the network model is as follows. The network model employs the "poly" learning strategy. To avoid the model failing to converge because the learning rate is too large, or converging too slowly because it is too small, the initial learning rate is set to 0.007 and the end learning rate to 1e-6; during training the learning rate is multiplied by

$$\left(1-\frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}}$$

where power is 0.9, max_iter is 30000 and iter is the current iteration number; the model is trained for 43 epochs in total. A sketch of this schedule follows.
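A sketch of this poly schedule with the quoted values (initial rate 0.007, power 0.9, 30000 maximum iterations) is given below; clamping the result at the 1e-6 end learning rate is an assumption about how the two quoted rates interact.

```python
def poly_lr(iteration: int,
            base_lr: float = 0.007,
            end_lr: float = 1e-6,
            max_iter: int = 30000,
            power: float = 0.9) -> float:
    """Poly learning-rate policy: base_lr * (1 - iter/max_iter)^power, floored at end_lr."""
    factor = (1.0 - min(iteration, max_iter) / max_iter) ** power
    return max(base_lr * factor, end_lr)
```

For example, poly_lr(0) returns 0.007 and poly_lr(30000) returns 1e-6, matching the quoted initial and end learning rates.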
The network model calculates the cross-entropy loss function at each training step and then optimizes the training with stochastic gradient descent. The cross-entropy loss function adopted in this embodiment is

$$L=-\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right]$$

where L denotes the cross-entropy loss value, y is the true sample label, and $\hat{y}$ is the predicted output value. The closer the predicted output value $\hat{y}$ is to the true label y, the smaller the value of the loss function L. A sketch of one training step follows.
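One training step can be sketched as follows in PyTorch; it reuses the poly_lr schedule from the sketch above, and uses the multi-class cross-entropy over lane and background logits, which reduces to the quoted binary form for two classes. The momentum and weight-decay values in the usage note are assumptions not quoted above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               images: torch.Tensor, labels: torch.Tensor, iteration: int) -> float:
    """One SGD step: poly-scheduled learning rate, pixel-wise cross-entropy loss, backward pass."""
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(iteration)          # schedule from the sketch above
    optimizer.zero_grad()
    logits = model(images)                        # (N, num_classes, H, W)
    loss = F.cross_entropy(logits, labels)        # labels: (N, H, W) integer class indices
    loss.backward()
    optimizer.step()
    return loss.item()

# Typical usage, assuming `model` is the DeepLabV3+ network with the fusion decoder:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.007, momentum=0.9, weight_decay=5e-4)
# loss_value = train_step(model, optimizer, images, labels, iteration)
```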
Step S4: and (3) carrying out up-sampling operation of a preset multiple (4 times in the embodiment) on the semantic information image fused with the low-level features, and recovering the resolution of the image to the original image size to obtain a final lane line semantic segmentation result. And then using the pixel precision and the average intersection ratio as evaluation indexes of the lane line semantic segmentation results to evaluate the precision of the semantic segmentation results.
The pixel accuracy (PA) is

$$PA=\frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$

and the mean intersection over union (MIoU) is

$$MIoU=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$

where k+1 is the number of semantic segmentation classes, comprising k target classes and 1 background class; $p_{ii}$ denotes the number of pixels that belong to class i and are also predicted as class i, $p_{ij}$ the number of pixels that belong to class i but are predicted as class j, and $p_{ji}$ the number of pixels that belong to class j but are predicted as class i. A sketch of how these metrics can be computed follows.
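Both metrics can be computed from a confusion matrix as sketched below in NumPy, with k = 1 here, i.e. one lane-line class plus the background class; the function names are illustrative.

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2) -> np.ndarray:
    """Entry [i, j] counts the pixels whose true class is i and predicted class is j (p_ij)."""
    mask = (gt >= 0) & (gt < num_classes)
    return np.bincount(num_classes * gt[mask].astype(int) + pred[mask].astype(int),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm: np.ndarray) -> float:
    """PA: correctly classified pixels over all pixels."""
    return float(np.diag(cm).sum() / cm.sum())

def mean_iou(cm: np.ndarray) -> float:
    """MIoU: per-class intersection over union, averaged over the k+1 classes."""
    iou = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))
    return float(np.nanmean(iou))
```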
In this embodiment there are 5812 experimental pictures, examples of which are shown in fig. 3; 4650 of them are used for training, with the various road scene types evenly distributed in the data set, and the remaining 1162 lane line images serve as the validation set. The image size is 1920×1080.
During training, the image input size of the network model is first set to 1920×720, the learning rate follows the poly learning strategy of step S3, and the final network model parameters are obtained after training for 43 epochs.
The trained network model is then tested on the lane line validation set to obtain the prediction results. The result of the final network model on the lane line training set is shown in fig. 4, where the lane lines are clearly detected. The method can therefore realize semantic segmentation of lane line images and identify the lane lines in a driving scene; it offers high accuracy, strong anti-interference capability and good robustness, and can be applied to unmanned vehicles.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.

Claims (9)

1. The lane line semantic segmentation method integrating the low-level features is characterized by comprising the following steps of:
1) Collecting video data in the running process of the unmanned vehicle, and screening out images containing lane lines from the video data to form a data set;
2) Image preprocessing is carried out on images in the data set, and a training set required by training is obtained;
3) Inputting the images in the training set into a network model for training, extracting the second layer image features, the third layer image features and the last layer image features in the network model, and carrying out multiple low layer feature fusion operations to obtain semantic information images fused with the low layer features;
4) Carrying out up-sampling operation of preset multiples on the semantic information image fused with the low-level features to obtain a final lane line semantic segmentation image;
in the step 3), the multiple low-level feature fusion operations are performed, and the specific process is as follows:
carrying out up-sampling operation of a preset multiple on the image features of the last layer, carrying out convolution operation of 1 multiplied by 1 on the image features of the third layer, and fusing the image features of the two layers by using fusion operation to obtain a semantic information image of the first fused low-layer features;
and carrying out up-sampling operation of a preset multiple on the semantic information image of the first fusion low-level feature, carrying out convolution operation of 1 multiplied by 1 on the second-level image feature, and fusing the semantic information image and the second-level image feature by using fusion operation to obtain the semantic information image of the second fusion low-level feature, namely the semantic information image of the fusion low-level feature.
2. The lane line semantic segmentation method fused with low-level features according to claim 1, wherein the method comprises the following steps: in the step 1), the lane line data set acquisition process specifically includes the following steps:
1.1 Horizontally mounting the vehicle-mounted camera on a vehicle, driving in various urban road scenes, and shooting video at a preset acquisition frame rate in the driving process to obtain video data comprising lane lines;
1.2 Extracting the last frame image of each second to form a road scene data set;
1.3 Deleting images which do not contain lane lines in the road scene data set, wherein the rest images form the lane line data set, and the images in the data set contain a plurality of urban road scenes.
3. The lane line semantic segmentation method fused with low-level features according to claim 2, wherein: the urban road scenes include highways, tunnels, urban areas, suburbs and campuses.
4. The lane line semantic segmentation method fused with low-level features according to claim 1, wherein the method comprises the following steps: in the step 2), preprocessing of the image in the data set comprises the following specific processes:
2.1 Cutting the images in the data set respectively, and extracting the interested areas of the images in the data set;
2.2 Data enhancement processing is carried out on the data obtained in the step 2.1), more images containing lane lines are obtained, and a lane line training set required by training is formed.
5. The lane line semantic segmentation method fused with low-level features according to claim 4, wherein the method comprises the following steps: the data enhancement processing in step 2.2) refers to randomly selecting one of a plurality of attributes for a photo in a training set, and randomly changing the value of the selected attribute of the image, wherein the attributes include brightness, contrast, scaling and rotation.
6. The lane line semantic segmentation method fused with low-level features according to claim 1, wherein the method comprises the following steps: the network model in the step 3) is a DeepLabV3+ network model.
7. The lane line semantic segmentation method fused with low-level features according to claim 1, wherein the method comprises the following steps: the step 3) of inputting the images in the training set into the network model for training specifically includes: the network model adopts the poly learning strategy to determine an initial learning rate and an end learning rate, and during training of the network model the learning rate is multiplied by

$$\left(1-\frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}}$$

where iter is the number of iterations and max_iter is the maximum number of iterations.
8. The lane line semantic segmentation method fused with low-level features according to claim 1, wherein the method comprises the following steps: in the step 3), the network model needs to calculate a cross-entropy loss function during each training step, and then a stochastic gradient descent method is adopted to optimize the training;
wherein the cross-entropy loss function is:

$$L=-\left[y\log\hat{y}+(1-y)\log(1-\hat{y})\right]$$

where L denotes the cross-entropy loss value, y is the true sample label, and $\hat{y}$ is the predicted output value.
9. The lane line semantic segmentation method fused with low-level features according to any one of claims 1 to 8, wherein: after the final lane line semantic segmentation image is obtained in the step 4), the pixel accuracy and the mean intersection over union are adopted as evaluation indexes of the lane line semantic segmentation result,
wherein the pixel accuracy (PA) is:

$$PA=\frac{\sum_{i=0}^{k}p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k}p_{ij}}$$

and the mean intersection over union (MIoU) is:

$$MIoU=\frac{1}{k+1}\sum_{i=0}^{k}\frac{p_{ii}}{\sum_{j=0}^{k}p_{ij}+\sum_{j=0}^{k}p_{ji}-p_{ii}}$$

where k+1 is the number of semantic segmentation classes, comprising k target classes and 1 background class; $p_{ii}$ denotes the number of pixels that belong to class i and are also predicted as class i, $p_{ij}$ the number of pixels that belong to class i but are predicted as class j, and $p_{ji}$ the number of pixels that belong to class j but are predicted as class i.
CN202110049820.XA 2021-01-14 2021-01-14 Lane line semantic segmentation method integrating low-level features Active CN112785610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110049820.XA CN112785610B (en) 2021-01-14 2021-01-14 Lane line semantic segmentation method integrating low-level features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110049820.XA CN112785610B (en) 2021-01-14 2021-01-14 Lane line semantic segmentation method integrating low-level features

Publications (2)

Publication Number Publication Date
CN112785610A CN112785610A (en) 2021-05-11
CN112785610B true CN112785610B (en) 2023-05-23

Family

ID=75756750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110049820.XA Active CN112785610B (en) 2021-01-14 2021-01-14 Lane line semantic segmentation method integrating low-level features

Country Status (1)

Country Link
CN (1) CN112785610B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609980A (en) * 2021-08-04 2021-11-05 东风悦享科技有限公司 Lane line sensing method and device for automatic driving vehicle
CN116229379B (en) * 2023-05-06 2024-02-02 浙江大华技术股份有限公司 Road attribute identification method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902748A (en) * 2019-03-04 2019-06-18 中国计量大学 A kind of image, semantic dividing method based on the full convolutional neural networks of fusion of multi-layer information
WO2020172875A1 (en) * 2019-02-28 2020-09-03 深圳市大疆创新科技有限公司 Method for extracting road structure information, unmanned aerial vehicle, and automatic driving system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860349A (en) * 2020-07-23 2020-10-30 上海交通大学 Intelligent vehicle lane line semantic segmentation method and system
CN112215073A (en) * 2020-09-10 2021-01-12 华蓝设计(集团)有限公司 Traffic marking line rapid identification and tracking method under high-speed motion scene

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020172875A1 (en) * 2019-02-28 2020-09-03 深圳市大疆创新科技有限公司 Method for extracting road structure information, unmanned aerial vehicle, and automatic driving system
CN109902748A (en) * 2019-03-04 2019-06-18 中国计量大学 A kind of image, semantic dividing method based on the full convolutional neural networks of fusion of multi-layer information

Also Published As

Publication number Publication date
CN112785610A (en) 2021-05-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant