CN111860349A - Intelligent vehicle lane line semantic segmentation method and system - Google Patents

Intelligent vehicle lane line semantic segmentation method and system

Info

Publication number
CN111860349A
CN111860349A
Authority
CN
China
Prior art keywords
feature extraction
image
convolution
layer
intelligent vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010716801.3A
Other languages
Chinese (zh)
Inventor
刘冶
张希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010716801.3A
Publication of CN111860349A
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588: Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The invention provides a method and a system for semantic segmentation of lane lines for intelligent vehicles. The method comprises the following steps: selecting one frame from the video captured by an intelligent vehicle while driving as the input picture; feeding the input picture into a feature extraction network and obtaining the feature maps output by the first and last layers of that network; performing a preset-factor upsampling operation on the last-layer feature map, merging the result with the first-layer feature map, and performing a convolution operation to obtain the convolved image information; performing a preset-factor upsampling operation on the feature map output by the last layer, merging the result with the first-layer feature map and the convolved image information, and performing a convolution operation to obtain the second convolved image information; and performing a preset-factor upsampling operation on the second convolved image data to output the semantic segmentation image of the lane lines. The method makes full use of existing hardware resources and ensures per-pixel prediction accuracy without increasing hardware cost.

Description

Intelligent vehicle lane line semantic segmentation method and system
Technical Field
The invention relates to the field of computer vision, in particular to a method and system for lane line semantic segmentation for intelligent vehicles, and more particularly to a method that achieves lane line semantic segmentation based on redundant feature extraction information.
Background
Semantic segmentation of lane lines is an application of computer vision in the intelligent vehicle field: it helps an intelligent vehicle identify lane lines while driving and supports the construction and updating of high-precision maps.
The core of lane line semantic segmentation is to classify the category of each pixel in a frame captured by the intelligent vehicle while driving.
The success of convolution in feature extraction has driven the adoption of convolutional neural networks for semantic segmentation. Encoder-decoder networks are now widely applied to the image pixel classification problem: the encoder extracts the important feature information of the picture through a series of convolution operations, and the decoder reconstructs the final semantic segmentation output from that information through a series of upsampling operations.
Although encoder-decoder networks solve the pixel classification problem to a certain extent, pixel classification errors remain an obstacle to the wider application of semantic segmentation in the intelligent vehicle field.
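As a toy illustration of this encode-decode flow (not the patent's network: average pooling stands in for the convolutional encoder and nearest-neighbor repetition stands in for the upsampling decoder):

```python
import numpy as np

# Illustrative only: a real encoder uses learned convolutions and a real
# decoder uses learned or bilinear upsampling. Here 4x average pooling and
# 4x repetition just show resolution shrinking, then being restored.
x = np.random.rand(3, 512, 512)                       # input image, C x H x W
enc = x.reshape(3, 128, 4, 128, 4).mean(axis=(2, 4))  # "encoder": 4x downsample
dec = enc.repeat(4, axis=1).repeat(4, axis=2)         # "decoder": 4x upsample
assert enc.shape == (3, 128, 128)
assert dec.shape == x.shape
```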
Patent document CN109766878A (application number: 201910287099.0) discloses a lane line detection method and device in the automatic driving field, aimed at the low accuracy and poor robustness of existing lane line detection. In that method, the maximum height value, the average reflection intensity, and the point-cloud count density of each grid cell in a bird's-eye-view feature map are used as inputs to darknet for feature extraction; lane line point feature information is determined by fusing the high resolution of low-level features with the high-level semantic information of high-level features through an FPN; the lane line points in the point cloud map corresponding to those in the bird's-eye-view feature map are determined from this feature information; the lane line points whose reflection intensity exceeds the average reflection intensity threshold are taken as lane line feature points, and geometric model fitting on these feature points determines the lane lines.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and a system for lane line semantic segmentation for intelligent vehicles.
The method for semantically segmenting the lane lines of the intelligent vehicle, provided by the invention, comprises the following steps:
step M1: selecting a frame of picture in a video acquired by an intelligent vehicle in the driving process as an input picture;
step M2: inputting the input picture into a feature extraction network for feature extraction, and acquiring a first-layer feature extraction picture and a last-layer feature extraction picture of the feature extraction network;
step M3: performing a preset-factor upsampling operation on the last-layer feature extraction map, merging the result with the first-layer feature extraction map, and performing a convolution operation to obtain the convolved image information;
step M4: performing a preset-factor upsampling operation on the feature extraction map output by the last layer, merging the result with the first-layer feature extraction map and the convolved image information, and performing a convolution operation to obtain the second convolved image information;
step M5: performing a preset-factor upsampling operation on the second convolved image data and outputting the semantic segmentation image of the lane lines.
Preferably, the feature extraction network comprises x convolutional layers and progressively extracts the inter-pixel information features of the image.
Preferably, the feature extraction information output by the first layer and the last layer of the feature extraction network each undergo a convolution operation with a 1 × 1 kernel.
Preferably, the preset-factor upsampling operation comprises a bilinear interpolation operation.
Preferably, the merging comprises stitching the feature data tensors along the channel dimension.
Preferably, the convolution operation does not change the height and width of the image data.
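The two properties above, merging by stitching along the channel dimension and convolutions that preserve height and width, can be sketched in NumPy. The channel counts are illustrative and a random 1 × 1 projection stands in for a learned convolution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two feature maps in C x H x W layout (channel counts are illustrative).
up_last = rng.random((256, 128, 128))   # upsampled last-layer features
first = rng.random((48, 128, 128))      # first-layer features

# "Merging" = stitching tensors along the channel axis; H and W must match.
merged = np.concatenate([up_last, first], axis=0)
assert merged.shape == (304, 128, 128)

# A 1x1 convolution is a per-pixel linear map over channels: it changes the
# channel count but never the height or width.
w = rng.standard_normal((256, 304))     # 256 output channels, 304 input
out = np.tensordot(w, merged, axes=([1], [0]))
assert out.shape == (256, 128, 128)
```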
The invention provides a system for semantically segmenting the lane lines of an intelligent vehicle, comprising the following modules:
module M1: selecting a frame of picture in a video acquired by an intelligent vehicle in the driving process as an input picture;
module M2: inputting the input picture into a feature extraction network for feature extraction, and acquiring a first-layer feature extraction picture and a last-layer feature extraction picture of the feature extraction network;
module M3: performing a preset-factor upsampling operation on the last-layer feature extraction map, merging the result with the first-layer feature extraction map, and performing a convolution operation to obtain the convolved image information;
module M4: performing a preset-factor upsampling operation on the feature extraction map output by the last layer, merging the result with the first-layer feature extraction map and the convolved image information, and performing a convolution operation to obtain the second convolved image information;
module M5: performing a preset-factor upsampling operation on the second convolved image data and outputting the semantic segmentation image of the lane lines.
Preferably, the feature extraction network comprises x convolutional layers and progressively extracts the inter-pixel information features of the image.
Preferably, the preset-factor upsampling operation comprises a bilinear interpolation operation.
Preferably, the merging comprises stitching the feature data tensors along the channel dimension;
the convolution operation does not change the height and width of the image data.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention makes full use of existing hardware resources, ensuring per-pixel prediction accuracy without increasing hardware overhead;
2. The invention makes full use of existing software resources, without adding software development difficulty;
3. Compared with the prior art, the invention improves per-pixel prediction accuracy while reducing computation cost;
4. The invention facilitates practical deployment and commercialization;
5. The method fully accounts for the importance of the information contained in the first-layer feature map and the final quadruply upsampled feature map, and improves the semantic segmentation accuracy of the image through the redundant merging of feature maps.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the method of the present invention
FIG. 2 is a network framework diagram of the present invention
FIG. 3 is a perspective transformed image of the present invention
FIG. 4 is a semantic segmentation output image of the present invention
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention; all such changes and modifications fall within the scope of the present invention.
To improve the accuracy of image pixel classification, the invention provides a lane line semantic segmentation method based on redundant feature extraction information: the image feature information extracted by the encoder of the neural network is applied multiple times in the decoder. This effectively improves the accuracy of pixel classification and meets the requirements of intelligent vehicle driving and of high-precision map construction and updating.
Example 1
The method for semantically segmenting the lane lines of the intelligent vehicle, provided by the invention, comprises the following steps:
step M1: selecting a frame of picture in a video acquired by an intelligent vehicle in the driving process as an input picture; carrying out corresponding perspective transformation on the input image to obtain a road surface picture at a proper angle;
step M2: inputting the road surface picture after perspective transformation into a feature extraction network for feature extraction, and acquiring a first layer feature extraction picture and a last layer feature extraction picture of the feature extraction network;
step M3: performing a quadruple upsampling operation on the last-layer feature extraction map, merging the result with the first-layer feature extraction map, and performing a convolution operation to obtain the convolved image information;
step M4: performing a quadruple upsampling operation on the feature extraction map output by the last layer, merging the result with the first-layer feature extraction map and the convolved image information, and performing a convolution operation to obtain the second convolved image information;
step M5: performing a quadruple upsampling operation on the second convolved image data and outputting the semantic segmentation image of the lane lines.
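Step M1's perspective transformation can be sketched with a direct linear transform (DLT) homography estimated from four point correspondences. The corner coordinates below are purely illustrative assumptions, not values from the patent:

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 matrix H with dst ~ H @ src from 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of this 8x9 system.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def warp_point(H, x, y):
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Illustrative corners: a road trapezoid in the camera image mapped to a
# rectangular bird's-eye view.
src = [(200, 300), (440, 300), (600, 480), (40, 480)]
dst = [(0, 0), (512, 0), (512, 512), (0, 512)]
H = homography(src, dst)
u, v = warp_point(H, 600, 480)
assert abs(u - 512) < 1e-3 and abs(v - 512) < 1e-3
```

In practice the dense image warp itself would be done by an image library; this sketch only shows how the transform matrix is obtained and applied to a point.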
The feature extraction network is a convolutional neural network composed of convolution, ReLU activation, batch normalization, and similar operations, and is used to extract the feature information in the picture.
Specifically, the feature extraction network comprises x layers of convolution layers and progressively extracts information features among pixels in the image.
Specifically, the feature extraction information output by the first layer and the last layer of the feature extraction network each undergo a convolution operation with a 1 × 1 kernel, which reduces the computational workload and the memory overhead.
In particular, the quadruple upsampling operation comprises a bilinear interpolation operation or other alternative quadruple upsampling operation.
Specifically, the merging includes stitching the feature data tensors using the number of channels as a standard.
Specifically, the convolution operation does not change the height and width of the image data.
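A minimal NumPy sketch of the quadruple bilinear upsampling mentioned above (one common interpolation convention; the patent does not pin down the exact variant):

```python
import numpy as np

def bilinear_upsample(x, factor):
    """Upsample a C x H x W tensor by an integer factor with bilinear
    interpolation (align_corners=False convention, edges replicated).
    A minimal sketch, not the patent's implementation."""
    c, h, w = x.shape
    H, W = h * factor, w * factor
    # Source coordinates of each target pixel center.
    ys = (np.arange(H) + 0.5) / factor - 0.5
    xs = (np.arange(W) + 0.5) / factor - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[None, :, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, None, :]
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

feat = np.random.rand(8, 32, 32)
up = bilinear_upsample(feat, 4)
assert up.shape == (8, 128, 128)
```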
The invention provides a system for semantically segmenting the lane lines of an intelligent vehicle, comprising the following modules:
module M1: selecting a frame of picture in a video acquired by an intelligent vehicle in the driving process as an input picture; carrying out corresponding perspective transformation on the input image to obtain a road surface picture at a proper angle;
module M2: inputting the road surface picture after perspective transformation into a feature extraction network for feature extraction, and acquiring a first layer feature extraction picture and a last layer feature extraction picture of the feature extraction network;
module M3: performing a quadruple upsampling operation on the last-layer feature extraction map, merging the result with the first-layer feature extraction map, and performing a convolution operation to obtain the convolved image information;
module M4: performing a quadruple upsampling operation on the feature extraction map output by the last layer, merging the result with the first-layer feature extraction map and the convolved image information, and performing a convolution operation to obtain the second convolved image information;
module M5: performing a quadruple upsampling operation on the second convolved image data and outputting the semantic segmentation image of the lane lines.
The feature extraction network is a convolutional neural network composed of convolution, ReLU activation, batch normalization, and similar operations, and is used to extract the feature information in the picture.
Specifically, the feature extraction network comprises x layers of convolution layers and progressively extracts information features among pixels in the image.
Specifically, the feature extraction information output by the first layer and the last layer of the feature extraction network each undergo a convolution operation with a 1 × 1 kernel, which reduces the computational workload and the memory overhead.
In particular, the quadruple upsampling operation comprises a bilinear interpolation operation or other alternative quadruple upsampling operation.
Specifically, the merging includes stitching the feature data tensors using the number of channels as a standard.
Specifically, the convolution operation does not change the height and width of the image data.
Example 2
Example 2 is a variation of example 1.
The invention provides a method for lane line semantic segmentation for intelligent vehicles, which makes full use of the importance and redundancy of the feature information from the first and last layers of the feature extraction network to improve lane line semantic segmentation accuracy. This embodiment takes the semantic segmentation of a single image as an example; as shown in figs. 1 to 4, the method comprises the following steps:
Step 1: one frame captured from video during intelligent vehicle driving is taken as the original image, and a perspective transformation is applied to it to obtain an input image of the road information of interest. The image size is adjusted during the perspective transformation.
Step 2: the perspective-transformed image obtained in step 1 is input into the feature extraction network; the feature extraction network consists of six convolutional layers and extracts the important information among pixels in the image.
Step 3: the image data of the first layer and the last layer of the feature extraction network are obtained from step 2.
Step 4: the last-layer image data has size 256 × 32 × 32 after the convolution operation, and size 256 × 128 × 128 after the quadruple upsampling operation.
Step 5: the first-layer image data has size 48 × 128 × 128 after the convolution operation.
Step 6: the image data of step 4 and step 5 are spliced, giving data of size 304 × 128 × 128.
Step 7: the image data of step 6 has size 256 × 128 × 128 after the convolution operation.
Step 8: the image data of step 7 is spliced with the image data of step 4 and step 5, giving data of size 560 × 128 × 128.
Step 9: the image data of step 8 has size 256 × 128 × 128 after the convolution operation.
Step 10: the image data of step 9 has size 9 × 128 × 128 after the convolution operation.
Step 11: the image data of step 10 has size 9 × 512 × 512 after the quadruple upsampling operation.
Step 12: the semantically segmented output picture is obtained from the image data of step 11.
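The tensor bookkeeping of steps 4 to 11 can be checked with a NumPy sketch, assuming channels × height × width layout (e.g. 256 × 32 × 32). Nearest-neighbor repetition stands in for the quadruple upsampling and random 1 × 1 projections stand in for the convolutions, so only the shapes, not the values, are meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, c_out):
    # Stand-in for a convolution: random per-pixel channel projection.
    w = rng.standard_normal((c_out, x.shape[0]))
    return np.tensordot(w, x, axes=([1], [0]))

def up4(x):
    # Stand-in for quadruple upsampling (nearest neighbor).
    return x.repeat(4, axis=1).repeat(4, axis=2)

last = rng.random((256, 32, 32))     # step 4 input: last-layer features
first = rng.random((48, 128, 128))   # step 5 output: first-layer features

s4 = up4(last)                                 # step 4
s6 = np.concatenate([s4, first], axis=0)       # step 6
s7 = conv1x1(s6, 256)                          # step 7
s8 = np.concatenate([s7, s4, first], axis=0)   # step 8
s9 = conv1x1(s8, 256)                          # step 9
s10 = conv1x1(s9, 9)                           # step 10
s11 = up4(s10)                                 # step 11

assert s4.shape == (256, 128, 128)
assert s6.shape == (304, 128, 128)
assert s8.shape == (560, 128, 128)
assert s11.shape == (9, 512, 512)
```

Note how the channel counts add up under the redundant merges: 256 + 48 = 304 at step 6 and 256 + 256 + 48 = 560 at step 8.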
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A method for semantic segmentation of lane lines of an intelligent vehicle is characterized by comprising the following steps:
step M1: selecting a frame of picture in a video acquired by an intelligent vehicle in the driving process as an input picture;
step M2: inputting the input picture into a feature extraction network for feature extraction, and acquiring a first-layer feature extraction picture and a last-layer feature extraction picture of the feature extraction network;
step M3: performing a preset-factor upsampling operation on the last-layer feature extraction map, merging the result with the first-layer feature extraction map, and performing a convolution operation to obtain the convolved image information;
step M4: performing a preset-factor upsampling operation on the feature extraction map output by the last layer, merging the result with the first-layer feature extraction map and the convolved image information, and performing a convolution operation to obtain the second convolved image information;
step M5: performing a preset-factor upsampling operation on the second convolved image data and outputting the semantic segmentation image of the lane lines.
2. The method for intelligent vehicle lane line semantic segmentation according to claim 1, wherein the feature extraction network comprises x convolutional layers and progressively extracts the inter-pixel information features of the image.
3. The method for intelligent vehicle lane line semantic segmentation as claimed in claim 1, wherein the feature extraction information output by the first layer and the last layer of the feature extraction network respectively performs convolution operation with convolution kernel of 1x 1.
4. The method of intelligent vehicle lane line semantic segmentation as claimed in claim 1, wherein the preset-factor upsampling operation comprises a bilinear interpolation operation.
5. The method for semantically segmenting the lane lines of the intelligent vehicle according to claim 1, wherein the merging comprises stitching the feature data tensors along the channel dimension, both when the preset-factor upsampled last-layer feature extraction map is merged with the first-layer feature extraction map before the first convolution operation, and when the preset-factor upsampled last-layer feature extraction map is merged with the first-layer feature extraction map and the convolved image information before the second convolution operation.
6. The method of intelligent vehicle lane line semantic segmentation as claimed in claim 1 wherein the convolution operation does not change the height and width of the image data.
7. A system for intelligent vehicle lane line semantic segmentation, comprising:
module M1: selecting a frame of picture in a video acquired by an intelligent vehicle in the driving process as an input picture;
module M2: inputting the input picture into a feature extraction network for feature extraction, and acquiring a first-layer feature extraction picture and a last-layer feature extraction picture of the feature extraction network;
module M3: performing a preset-factor upsampling operation on the last-layer feature extraction map, merging the result with the first-layer feature extraction map, and performing a convolution operation to obtain the convolved image information;
module M4: performing a preset-factor upsampling operation on the feature extraction map output by the last layer, merging the result with the first-layer feature extraction map and the convolved image information, and performing a convolution operation to obtain the second convolved image information;
module M5: performing a preset-factor upsampling operation on the second convolved image data and outputting the semantic segmentation image of the lane lines.
8. The system for intelligent vehicle lane line semantic segmentation according to claim 7, wherein the feature extraction network comprises x convolutional layers and progressively extracts the inter-pixel information features of the image.
9. The system for intelligent vehicle lane line semantic segmentation as claimed in claim 7, wherein the preset-factor upsampling operation comprises a bilinear interpolation operation.
10. The system for semantically segmenting the lane lines of the intelligent vehicle according to claim 7, wherein the merging comprises stitching the feature data tensors along the channel dimension, both when the preset-factor upsampled last-layer feature extraction map is merged with the first-layer feature extraction map before the first convolution operation, and when the preset-factor upsampled last-layer feature extraction map is merged with the first-layer feature extraction map and the convolved image information before the second convolution operation;
the convolution operation does not change the height and width of the image data.
CN202010716801.3A 2020-07-23 2020-07-23 Intelligent vehicle lane line semantic segmentation method and system Pending CN111860349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010716801.3A CN111860349A (en) 2020-07-23 2020-07-23 Intelligent vehicle lane line semantic segmentation method and system

Publications (1)

Publication Number Publication Date
CN111860349A true CN111860349A (en) 2020-10-30

Family

ID=72949344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010716801.3A Pending CN111860349A (en) 2020-07-23 2020-07-23 Intelligent vehicle lane line semantic segmentation method and system

Country Status (1)

Country Link
CN (1) CN111860349A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112785610A (en) * 2021-01-14 2021-05-11 华南理工大学 Lane line semantic segmentation method fusing low-level features

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034162A (en) * 2018-07-13 2018-12-18 南京邮电大学 A kind of image, semantic dividing method
CN110188768A (en) * 2019-05-09 2019-08-30 南京邮电大学 Realtime graphic semantic segmentation method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU Guosheng; ZHANG Weiwei; WU Xuncheng; SU Jinya; GUO Zenggao: "Lane line semantic segmentation algorithm based on convolutional neural network", Journal of Electronic Measurement and Instrumentation, no. 07 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination