CN115082888B - Lane line detection method and device - Google Patents

Lane line detection method and device

Info

Publication number
CN115082888B
CN115082888B (application CN202210993629.5A)
Authority
CN
China
Prior art keywords
feature map
lane line
network
offset
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210993629.5A
Other languages
Chinese (zh)
Other versions
CN115082888A (en)
Inventor
张永昌
贺翔翔
何哲琪
张雨
邱忠营
赵富旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qingzhou Zhihang Intelligent Technology Co ltd
Original Assignee
Beijing Qingzhou Zhihang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qingzhou Zhihang Intelligent Technology Co ltd filed Critical Beijing Qingzhou Zhihang Intelligent Technology Co ltd
Priority to CN202210993629.5A priority Critical patent/CN115082888B/en
Publication of CN115082888A publication Critical patent/CN115082888A/en
Application granted granted Critical
Publication of CN115082888B publication Critical patent/CN115082888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/20Drawing from basic elements, e.g. lines or circles
    • G06T11/203Drawing of straight lines or curves
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The embodiment of the invention relates to a lane line detection method and device, wherein the method comprises the following steps: acquiring a first image; scaling the first image to generate a second image; performing feature extraction on the second image based on a backbone network to generate a first feature map; performing lane line starting point detection on the first feature map based on a key point detection network to generate a first starting point coordinate set; performing lane line foreground point detection on the first feature map based on a binary segmentation network to generate a first foreground point coordinate set; performing per-pixel lane line offset voting on the first feature map based on a regression voting network to obtain a first offset feature map; performing lane line semantic feature labeling on the pixel points of the first feature map; drawing lane lines on the first feature map; and outputting the first feature map on which the lane lines have been drawn. The invention reduces the amount of computation needed for lane line detection and, while outputting a visualized lane line detection image, adds lane line semantic features to the image that classify the lane lines.

Description

Lane line detection method and device
Technical Field
The invention relates to the technical field of data processing, in particular to a lane line detection method and a lane line detection device.
Background
Lane lines need to be detected while an autonomous vehicle is driving. Conventional lane line detection schemes are mostly anchor-based: edge pixels on the left, right and lower boundaries of a two-dimensional perception image are used as anchor points, ray proposals in different directions are cast from each anchor point, and lane line feature regression is computed for every ray proposal of every anchor point. The amount of regression computation in such a scheme is determined by the number of anchor points and the number of ray proposals; even for a common perception image the number of anchor points runs into the thousands and each anchor point can have more than ten ray proposals, so the scheme is computationally very expensive and consumes a large amount of computing resources.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a lane line detection method, a lane line detection device, electronic equipment and a computer-readable storage medium. A basic feature map is first generated by extracting basic features of a perception image through a backbone network; then lane line starting point detection, lane line foreground point detection, and learning of the x- and y-direction offset features from each foreground point to its starting point are performed on the basic feature map by three networks (a key point detection network, a binary segmentation network and a regression voting network) respectively; then, according to the outputs of the three networks, lane line semantic feature labeling is performed on the pixel points of the basic feature map; and the lane lines are then drawn according to the lane line semantic features of the pixel points. In this way, the amount of computation for lane line detection can be greatly reduced, and lane line semantic features that classify the lane lines can be added to the image while a visualized lane line detection image is output.
In order to achieve the above object, a first aspect of the embodiments of the present invention provides a lane line detection method, where the method includes:
acquiring a first image;
carrying out image scaling processing on the first image according to a preset image size to generate a corresponding second image;
performing feature extraction processing on the second image based on a backbone network to generate a corresponding first feature map;
performing lane line starting point detection processing on the first feature map based on a key point detection network to generate a corresponding first starting point coordinate set; the first set of origin coordinates comprises a plurality of first origin coordinates;
performing lane line foreground point detection processing on the first feature map based on a binary segmentation network to generate a corresponding first foreground point coordinate set; the first set of foreground point coordinates comprises a plurality of first foreground point coordinates;
performing pixel point lane offset voting processing on the first feature map based on a regression voting network to obtain a corresponding first offset feature map;
performing lane line semantic feature labeling processing on pixel points of the first feature map according to the first starting point coordinate set, the first foreground point coordinate set and the first offset feature map;
drawing lane lines on the first feature map; and outputting the first feature map with the lane lines drawn as the lane line detection result.
Preferably, the preset image size is W0×H0, where W0 and H0 are respectively the width and the height of the preset image size;
the size of the second image is W0×H0;
the backbone network is a three-level feature pyramid network that uses a residual network as its feature extraction network;
the key point detection network comprises a first convolution network unit and a second convolution network unit; the first convolution network unit is connected with the second convolution network unit;
the binary segmentation network comprises a third convolution network unit, a fourth convolution network unit, a first fully-connected network unit and a second fully-connected network unit; the third convolution network unit is connected with the fourth convolution network unit, the fourth convolution network unit is connected with the first fully-connected network unit, and the first fully-connected network unit is connected with the second fully-connected network unit; the first fully-connected network unit consists of a fully-connected network layer and a normalization network layer; the second fully-connected network unit consists of a fully-connected network layer and a softmax classification network layer;
the regression voting network comprises a fifth convolution network unit, a sixth convolution network unit and a third full-connection network unit; the fifth convolution network unit is connected with the sixth convolution network unit, and the sixth convolution network unit is connected with the third fully-connected network unit; the third fully connected network unit consists of two layers of fully connected networks.
Preferably, the feature extraction processing performed on the second image based on the backbone network to generate a corresponding first feature map specifically includes:
inputting the second image into the backbone network for feature extraction to obtain three levels of output feature maps of different sizes, namely a primary feature map, a secondary feature map and a tertiary feature map; and taking the primary feature map as the first feature map; the size of the first feature map is W1×H1 and its feature dimension is D1, where W1 and H1 are respectively the width and the height of the first feature map, W1=W0/2, H1=H0/2, and D1=64.
Preferably, the performing, by the keypoint detection network, lane start point detection processing on the first feature map to generate a corresponding first start point coordinate set specifically includes:
inputting the first feature map into the first convolution network unit; performing feature extraction on the first feature map by the first convolution network unit using a preset first convolution filter with a step size of 1 and a padding of 1 to generate a corresponding second feature map; the first convolution filter consists of a first number n1 of 3×3 convolution kernels; the size of the second feature map is W2×H2 and its feature dimension is D2, where W2 and H2 are respectively the width and the height of the second feature map, W2=W1, H2=H1, and D2=n1; the first number n1 defaults to 64;
inputting the second feature map into the second convolution network unit; performing feature extraction on the second feature map by the second convolution network unit using a preset second convolution filter with a step size of 1 and a padding of 1 to generate a corresponding third feature map; the second convolution filter consists of 1 convolution kernel of size 3×3; the size of the third feature map is W3×H3 and its feature dimension is D3, where W3 and H3 are respectively the width and the height of the third feature map, W3=W2=W1, H3=H2=H1, and D3=1;
Performing non-maximum suppression processing on the third feature map, specifically: sliding a 3×3 window over the third feature map with a sliding step size of 1; each time the window slides, extracting the maximum pixel value in the current window as the corresponding current maximum value, and resetting to zero the pixel values of the pixel points in the current window whose values are not the current maximum value;
extracting the coordinates of the pixel points of which the pixel values exceed a preset pixel threshold value on the third characteristic image as corresponding coordinates of the first starting point; and forming a corresponding first starting point coordinate set by all the obtained first starting point coordinates.
Preferably, the performing, based on the binary segmentation network, the lane line foreground point detection processing on the first feature map to generate a corresponding first foreground point coordinate set specifically includes:
inputting the first feature map into the third convolution network unit; performing feature extraction on the first feature map by the third convolution network unit using a preset third convolution filter with a step size of 1 and a padding of 1 to generate a corresponding fourth feature map; the third convolution filter consists of a second number n2 of 3×3 convolution kernels; the size of the fourth feature map is W4×H4 and its feature dimension is D4, where W4 and H4 are respectively the width and the height of the fourth feature map, W4=W1, H4=H1, and D4=n2; the second number n2 defaults to 128;
inputting the fourth feature map into the fourth convolution network unit; performing feature extraction on the fourth feature map by the fourth convolution network unit using a preset fourth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding fifth feature map; the fourth convolution filter consists of a third number n3 of 3×3 convolution kernels; the size of the fifth feature map is W5×H5 and its feature dimension is D5, where W5 and H5 are respectively the width and the height of the fifth feature map, W5=W4=W1, H5=H4=H1, and D5=n3; the third number n3 defaults to 64;
converting the fifth feature map into a one-dimensional first feature vector of length W5×H5×D5;
inputting the first feature vector into the first fully-connected network element; performing feature regression operation on the first feature vector by the first fully-connected network unit to generate a corresponding second feature vector, and performing normalization processing on the second feature vector to generate a corresponding third feature vector;
inputting the third feature vector into the second fully-connected network unit; performing a feature regression operation on the third feature vector by the second fully-connected network unit to generate a corresponding fourth feature vector, and performing two-class classification processing on the fourth feature vector to generate a corresponding first classification tensor; the first classification tensor comprises W5×H5 first classification vectors a(x,y) of length 2, 1≤x≤W5, 1≤y≤H5; each first classification vector a(x,y) contains a foreground point probability and a background point probability and corresponds to one pixel point of the first feature map;
traversing each of the first classification vectors a(x,y); recording the currently traversed first classification vector a(x,y) as the current vector; recording the larger of the two probabilities in the current vector as the first probability; if the first probability is the foreground point probability, extracting the pixel point coordinates (x, y) corresponding to the current vector as a corresponding first foreground point coordinate; and when the traversal is finished, forming the corresponding first foreground point coordinate set from all the obtained first foreground point coordinates.
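A minimal PyTorch-style sketch of the binary segmentation network described above. The two 3×3 convolution units, the flattening step and the two fully-connected units follow the structure stated in this section; the ReLU activations, the use of LayerNorm for the normalization layer, the assignment of channel 0 to the foreground probability and the tiny example resolution are assumptions made only for illustration (at real feature-map sizes the fully-connected layers become very large).

```python
import torch
import torch.nn as nn

class BinarySegNet(nn.Module):
    """Sketch of the binary segmentation network: two 3x3 conv units
    (stride 1, padding 1), flatten, then two fully-connected units ending
    in a softmax over {foreground, background} for every pixel."""
    def __init__(self, w, h, in_ch=64, n2=128, n3=64):
        super().__init__()
        self.conv3 = nn.Sequential(nn.Conv2d(in_ch, n2, 3, 1, 1), nn.ReLU())  # third conv unit
        self.conv4 = nn.Sequential(nn.Conv2d(n2, n3, 3, 1, 1), nn.ReLU())     # fourth conv unit
        flat = w * h * n3                                                      # length of the first feature vector
        self.fc1 = nn.Sequential(nn.Linear(flat, flat // 4),                   # first FC unit:
                                 nn.LayerNorm(flat // 4))                      # FC layer + normalization layer
        self.fc2 = nn.Linear(flat // 4, w * h * 2)                             # second FC unit (before softmax)
        self.w, self.h = w, h

    def forward(self, feat):                           # feat: (B, in_ch, h, w) = first feature map
        x = self.conv4(self.conv3(feat))               # fifth feature map
        x = self.fc1(x.flatten(1))                     # third feature vector, length w*h*n3/4
        logits = self.fc2(x).view(-1, self.h, self.w, 2)
        return logits.softmax(dim=-1)                  # first classification tensor a(x,y)

# foreground points are the pixels whose foreground probability wins
# (assuming channel 0 holds the foreground probability)
probs = BinarySegNet(w=20, h=12)(torch.randn(1, 64, 12, 20))
fg_yx = (probs[0, ..., 0] > probs[0, ..., 1]).nonzero()   # (y, x) first foreground point coordinates
```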
Preferably, the performing, based on the regression voting network, pixel point lane offset voting processing on the first feature map to obtain a corresponding first offset feature map specifically includes:
inputting the first feature map into the fifth convolution network unit; performing feature extraction on the first feature map by the fifth convolution network unit using a preset fifth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding sixth feature map; the fifth convolution filter consists of a fourth number n4 of 3×3 convolution kernels; the size of the sixth feature map is W6×H6 and its feature dimension is D6, where W6 and H6 are respectively the width and the height of the sixth feature map, W6=W1, H6=H1, and D6=n4; the fourth number n4 defaults to 128;
inputting the sixth feature map into the sixth convolution network unit; performing feature extraction on the sixth feature map by the sixth convolution network unit using a preset sixth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding seventh feature map; the sixth convolution filter consists of a fifth number n5 of 3×3 convolution kernels; the size of the seventh feature map is W7×H7 and its feature dimension is D7, where W7 and H7 are respectively the width and the height of the seventh feature map, W7=W6=W1, H7=H6=H1, and D7=n5; the fifth number n5 defaults to 16;
converting the seventh feature map into a one-dimensional first seed vector of length W7×H7×D7;
inputting the first seed vector into the third fully-connected network unit; learning, by the third fully-connected network unit through its two fully-connected layers, the x-direction and y-direction pixel coordinate offsets from each lane line pixel point on the seventh feature map corresponding to the first seed vector to the starting point of the lane line it belongs to, to generate a corresponding first offset tensor; the first offset tensor comprises W7×H7 first offset vectors b(x,y) of length 2, 1≤x≤W7, 1≤y≤H7; each first offset vector b(x,y) contains a first offset Δx and a second offset Δy and corresponds to one pixel point of the first feature map; if both offsets of a first offset vector b(x,y) are 0, the corresponding pixel point on the first feature map is a non-lane-line pixel point; if the two offsets are not both 0, the corresponding pixel point on the first feature map is a lane line pixel point, and the first offset Δx and the second offset Δy are respectively the x-direction and y-direction pixel coordinate offsets from the current lane line pixel point to the starting point of the lane line it belongs to;
performing feature map conversion on the first offset tensor to generate a corresponding first offset feature map; the size of the first offset feature map is W8×H8 and its feature dimension is 2, where W8 and H8 are respectively the width and the height of the first offset feature map, W8=W7=W1, H8=H7=H1; the first offset feature map comprises W8×H8 first pixel points; each first pixel point contains two feature data, namely the first offset Δx and the second offset Δy.
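A minimal PyTorch-style sketch of the regression voting network described above, under the stated layer counts. The ReLU activations, the hidden width of the two-layer fully-connected unit and the class/parameter names are not specified in the text and are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RegressionVotingNet(nn.Module):
    """Sketch of the regression voting network: two 3x3 conv units
    (stride 1, padding 1) and a two-layer fully-connected unit that
    regresses a per-pixel (dx, dy) offset towards the lane line start
    point (both zero for non-lane-line pixels)."""
    def __init__(self, w, h, in_ch=64, n4=128, n5=16, hidden=1024):   # hidden width is an assumption
        super().__init__()
        self.conv5 = nn.Sequential(nn.Conv2d(in_ch, n4, 3, 1, 1), nn.ReLU())  # fifth conv unit
        self.conv6 = nn.Sequential(nn.Conv2d(n4, n5, 3, 1, 1), nn.ReLU())     # sixth conv unit
        self.fc = nn.Sequential(nn.Linear(w * h * n5, hidden), nn.ReLU(),     # third FC unit:
                                nn.Linear(hidden, w * h * 2))                 # two fully-connected layers
        self.w, self.h = w, h

    def forward(self, feat):                        # feat: (B, in_ch, h, w) = first feature map
        x = self.conv6(self.conv5(feat))            # seventh feature map
        offsets = self.fc(x.flatten(1))             # first offset tensor, one (dx, dy) per pixel
        return offsets.view(-1, self.h, self.w, 2)  # first offset feature map

offset_map = RegressionVotingNet(w=20, h=12)(torch.randn(1, 64, 12, 20))   # (1, 12, 20, 2)
```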
Preferably, before performing pixel point lane offset voting on the first feature map based on the regression voting network to obtain a corresponding first offset feature map, the method further includes training the regression voting network; the training of the regression voting network specifically includes:
step 71, acquiring a first training image; taking the pixel points covered by all the lane lines on the first training image as first foreground points, taking the pixel points outside all the lane lines as first background points, and taking the center point of the starting position of each lane line as a first starting point;
wherein the size of the first training image is W0×H0;
Step 72, constructing a first comparison characteristic diagram; taking second pixel points of the pixel point coordinates on the first comparison characteristic image corresponding to the first foreground points as second foreground points, taking the second pixel points of the pixel point coordinates corresponding to the first background points as second background points, and taking the second pixel points of the pixel point coordinates corresponding to the first initial points as second initial points;
wherein the size of the first comparison feature map is W0×H0 and its feature dimension is 2; the first comparison feature map comprises W0×H0 second pixel points, and each second pixel point contains two feature data;
step 73, setting two feature data of each second background point and each second starting point on the first comparison feature map as 0;
step 74, traversing each second foreground point on the first comparison feature map; taking the currently traversed second foreground point as the current foreground point; calculating the x-direction and y-direction pixel coordinate offsets from the current foreground point to the second starting point of the lane line on which the current foreground point is located, to generate the corresponding x-direction offset Δx1 and y-direction offset Δy1; and taking the x-direction offset Δx1 and the y-direction offset Δy1 as the two feature data of the current foreground point;
step 75, performing feature extraction processing on the first training image based on the backbone network to generate a corresponding first training feature map;
step 76, inputting the first training feature map into the regression voting network; performing, by the fifth convolution network unit, feature extraction on the first training feature map using the fifth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding second training feature map; performing, by the sixth convolution network unit, feature extraction on the second training feature map using the sixth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding third training feature map; converting the third training feature map into a one-dimensional first training seed vector; learning, by the third fully-connected network unit through its two fully-connected layers, the x-direction and y-direction pixel coordinate offsets from each lane line pixel point on the third training feature map corresponding to the first training seed vector to the starting point of the lane line it belongs to, to generate a corresponding first training offset tensor; and performing feature map conversion on the first training offset tensor to generate a corresponding first training offset feature map;
wherein the size of the first training offset feature map is W0×H0 and its feature dimension is 2; the first training offset feature map comprises W0×H0 third pixel points, and each third pixel point contains two feature data, namely the x-direction offset Δx2 and the y-direction offset Δy2; the third pixel points correspond one-to-one with the second pixel points according to the pixel coordinates (x, y), where x∈[1, W0], y∈[1, H0];
Step 77, construct L1 loss function F as
Figure 877540DEST_PATH_IMAGE001
Step 78, calculating the average absolute errors of all corresponding second and third pixel points in the first comparison feature map and the first training offset feature map according to the L1 loss function F to generate a corresponding first loss value;
step 79, judging the first loss value based on a preset reasonable loss interval; if the first loss value meets the reasonable loss interval, determining that the training loss value of the first training image reaches the standard and turning to step 81; if the first loss value does not meet the reasonable loss interval, go to step 80;
step 80, solving the network parameters of the regression voting network which enable the L1 loss function F to reach the minimum value to generate a corresponding first network parameter set; modulating the network parameters of the regression voting network according to the first network parameter set; after the modulation is completed, returning to step 76 to continue training based on the first training feature map;
and step 81, going to step 71 to acquire a new first training image again for training until the training loss values of the specified number of first training images reach the standard.
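A minimal sketch of the loss computation in steps 77-78, assuming the first comparison feature map and the first training offset feature map are both (H0, W0, 2) tensors of per-pixel (Δx, Δy) labels and predictions; the commented optimizer lines only illustrate the parameter update of step 80 and are not the patent's exact procedure.

```python
import torch

def l1_offset_loss(pred_offsets, label_offsets):
    """Mean absolute error between the first training offset feature map
    (predictions) and the first comparison feature map (labels), both
    shaped (H0, W0, 2); this is the first loss value of step 78."""
    return (pred_offsets - label_offsets).abs().mean()

# illustrative update step (optimizer choice and learning rate are assumptions):
# loss = l1_offset_loss(voting_net(train_feature_map), comparison_feature_map)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```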
Preferably, the performing, according to the first start point coordinate set, the first foreground point coordinate set, and the first offset feature map, lane line semantic feature labeling processing on the pixel points of the first feature map specifically includes:
distributing a corresponding lane line identifier for each first starting point coordinate;
taking the first pixel points corresponding to the first foreground point coordinates on the first offset characteristic diagram as first lane line pixel points;
traversing each first lane line pixel point; during the traversal, taking the currently traversed first lane line pixel point as the current lane line pixel point; extracting the x-direction coordinate and y-direction coordinate of the current lane line pixel point as the corresponding first abscissa and first ordinate, extracting the first offset Δx and the second offset Δy of the current lane line pixel point as the corresponding first transverse offset and first longitudinal offset, generating the corresponding first starting point abscissa x0 from the sum of the first abscissa and the first transverse offset, generating the corresponding first starting point ordinate y0 from the sum of the first ordinate and the first longitudinal offset, and forming the corresponding second starting point coordinates from the first starting point abscissa x0 and the first starting point ordinate y0; calculating the straight-line distances from the second starting point coordinates to all the first starting point coordinates to obtain a plurality of first distances; taking the first starting point coordinate corresponding to the shortest of the plurality of first distances as the matching starting point coordinate corresponding to the current lane line pixel point; and taking the lane line identifier corresponding to the matching starting point coordinate as the matching lane line identifier corresponding to the current lane line pixel point;
adding a lane line semantic feature for each pixel point on the first feature map, and initializing feature values of the lane line semantic features of all the pixel points as invalid identifications; setting the lane line semantic features of the pixel points corresponding to the first starting point coordinates as corresponding lane line identifications; and setting the lane line semantic features of the pixel points corresponding to the first lane line pixel points as the corresponding matching lane line identifiers.
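A minimal sketch of this labeling step in plain Python, assuming `start_points` is the first starting point coordinate set, `foreground_points` the first foreground point coordinate set, and `offset_map[y][x]` the (Δx, Δy) pair read from the first offset feature map; all names are illustrative.

```python
import math

def label_lane_pixels(start_points, foreground_points, offset_map):
    """Each foreground pixel votes a predicted start point (its own
    coordinates plus its regressed offsets) and is assigned the lane line
    identifier of the nearest detected start point; pixels absent from the
    returned dict keep the invalid identifier."""
    lane_ids = {pt: i for i, pt in enumerate(start_points)}     # lane line identifiers
    labels = {}
    for (x, y) in foreground_points:
        dx, dy = offset_map[y][x]
        vx, vy = x + dx, y + dy                                 # second starting point coordinates
        nearest = min(start_points,
                      key=lambda p: math.hypot(p[0] - vx, p[1] - vy))
        labels[(x, y)] = lane_ids[nearest]                      # matching lane line identifier
    for pt in start_points:
        labels[pt] = lane_ids[pt]                               # start points keep their own identifier
    return labels
```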
Preferably, the drawing of lane lines on the first feature map specifically includes:
assigning a corresponding lane line color pixel value to each lane line identifier;
traversing each pixel point on the first characteristic diagram; during the traversal, marking the currently traversed pixel as a current pixel, and extracting the lane line semantic features of the current pixel as corresponding current lane line identifiers; and if the current lane line identifier is not an invalid identifier, setting the pixel value of the current pixel point as the lane line color pixel value corresponding to the current lane line identifier.
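A minimal sketch of the drawing step, assuming `labels` is the pixel-to-lane-line-identifier mapping produced above, `palette` maps each lane line identifier to a lane line color pixel value, and `image` is the map to be colored rendered as an H×W×3 array; all names are illustrative.

```python
import numpy as np

def draw_lane_lines(image, labels, palette):
    """Recolor every pixel whose lane line semantic feature is a valid
    lane line identifier with that lane line's color pixel value."""
    out = image.copy()
    for (x, y), lane_id in labels.items():   # pixels with the invalid identifier are absent
        out[y, x] = palette[lane_id]
    return out

# example: two lane lines drawn in different colors on a blank canvas
palette = {0: (255, 0, 0), 1: (0, 255, 0)}
canvas = draw_lane_lines(np.zeros((288, 800, 3), dtype=np.uint8),
                         {(10, 20): 0, (11, 20): 1}, palette)
```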
A second aspect of an embodiment of the present invention provides an apparatus for implementing the lane line detection method according to the first aspect, where the apparatus includes: the system comprises an acquisition module, an image preprocessing module, a backbone network processing module, a key point detection network processing module, a binary segmentation network processing module, a regression voting network processing module and a lane line detection output module;
the acquisition module is used for acquiring a first image;
the image preprocessing module is used for carrying out image scaling processing on the first image according to a preset image size to generate a corresponding second image;
the backbone network processing module is used for performing feature extraction processing on the second image based on a backbone network to generate a corresponding first feature map;
the key point detection network processing module is used for carrying out lane line starting point detection processing on the first feature map based on a key point detection network to generate a corresponding first starting point coordinate set; the first set of origin coordinates comprises a plurality of first origin coordinates;
the binary segmentation network processing module is used for carrying out lane line foreground point detection processing on the first feature map based on a binary segmentation network to generate a corresponding first foreground point coordinate set; the first set of foreground point coordinates comprises a plurality of first foreground point coordinates;
the regression voting network processing module is used for performing pixel point lane deviation voting processing on the first feature map based on a regression voting network to obtain a corresponding first deviation feature map;
the lane line detection output module is used for marking the semantic feature of the pixel points of the first feature map according to the first starting point coordinate set, the first foreground point coordinate set and the first offset feature map; drawing lane lines on the first characteristic diagram; and outputting the first characteristic diagram which is used for finishing the drawing of the lane line as a lane line detection result.
A third aspect of an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;
the processor is configured to be coupled to the memory, read and execute instructions in the memory, so as to implement the method steps of the first aspect;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.
The embodiment of the invention provides a lane line detection method, a lane line detection device, electronic equipment and a computer-readable storage medium. A basic feature map is generated by extracting basic features of a perception image through a backbone network; then the starting points of the lane lines are detected on the basic feature map based on a key point detection network, the lane line foreground points are detected on the basic feature map based on a binary segmentation network, and the x-direction and y-direction offset features from the lane line foreground points to the starting points are learned based on a regression voting network; a corresponding lane line identifier is assigned to each detected lane line starting point; the pixel points on each lane line of the basic feature map are then labeled with corresponding lane line semantic features according to the detected lane line starting point coordinates, the lane line foreground point coordinates and the offset features from the foreground points to the starting points, each lane line semantic feature corresponding to a lane line identifier and a lane line color pixel value; and the pixel points belonging to different lane lines on the basic feature map are then colored according to the lane line color pixel values corresponding to their lane line semantic features. On the one hand, the invention greatly reduces the amount of computation for lane line detection; on the other hand, it outputs a visualized lane line detection image and adds lane line semantic features that classify the lane lines to the image.
Drawings
Fig. 1 is a schematic view of a lane line detection method according to a first embodiment of the present invention;
fig. 2 is a block diagram of a lane line detection apparatus according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As shown in fig. 1, which is a schematic view of a lane line detection method provided in an embodiment of the present invention, the method mainly includes the following steps:
step 1, a first image is obtained.
Here, the first image is a two-dimensional perception image of the environment where the vehicle is located captured by the vehicle-mounted camera.
Step 2, carrying out image scaling processing on the first image according to a preset image size to generate a corresponding second image;
wherein the preset image size is W0×H0, where W0 and H0 are respectively the width and the height of the preset image size; the size of the second image is W0×H0.
Here, the preset image size is the size that the subsequently used backbone network expects for its input image; for example, if the backbone network expects an input image of size 800×288 (width×height), then the width W0 of the preset image size is 800 and the height H0 is 288. The sizes of first images, which are perception images output by the vehicle-mounted camera, are not uniform, so before the backbone network is used, each first image needs to be scaled according to the preset image size to obtain an image of width-height size W0×H0, namely the second image.
Step 3, performing feature extraction processing on the second image based on the backbone network to generate a corresponding first feature map;
the method specifically comprises the following steps: inputting the second image into a main network for feature extraction to obtain three-level output feature graphs with different sizes, namely a first-level feature graph, a second-level feature graph and a third-level feature graph; taking the primary feature map as a first feature map;
the main network is a three-level characteristic pyramid network which takes a residual error network as a characteristic extraction network; dimension W of the first feature map 1 *H 1 Characteristic dimension of D 1 ,W 1 、H 1 Width and height, W, of the first profile, respectively 1 =W 0 /2、H 1 =H 0 /2、D 1 =64。
Here, the embodiment of the present invention uses a three-level Feature Pyramid Network (FPN) that takes a Residual network (ResNets) as a Feature extraction network to construct a backbone network, and performs basic Feature learning on an input second image according to basic features of known multiple types of objects (such as lane lines, buildings, pedestrians, vehicles, animals, plants, bicycles, tricycles, and other objects) through the backbone network, so as to output three-scale Feature maps with rich semantic information, i.e., primary, secondary, and tertiary Feature maps; and selecting a primary feature map as a basic feature map, namely a first feature map.
Further, the network structure of the three-level feature pyramid network comprises a bottom-up down-sampling residual network side and a top-down up-sampling feature extraction network side; the down-sampling residual network side comprises, from bottom to top, a first-level sampling network layer, a second-level sampling network layer and a third-level sampling network layer, which are connected in sequence; the up-sampling feature extraction network side comprises, from top to bottom, a third-level feature extraction layer, a second-level feature extraction layer and a first-level feature extraction layer, which are connected in sequence; in addition, the first-level sampling network layer is also connected with the first-level feature extraction layer, the second-level sampling network layer is also connected with the second-level feature extraction layer, and the third-level sampling network layer is also connected with the third-level feature extraction layer; the first-, second- and third-level sampling network layers are implemented with reference to the first three stages (conv1, conv2_x, conv3_x) of the ResNet101 network published in the technical paper "Deep Residual Learning for Image Recognition".
Further, based on the detailed network structure of the backbone network, when the second image is input into the backbone network for feature extraction, the specific steps of the embodiment of the present invention are as follows: the first-level sampling network layer performs down-sampling and feature-dimension expansion on the second image according to the specified feature dimension to generate a corresponding first-level sampling map; the second-level sampling network layer performs a down-sampling residual operation on the first-level sampling map to generate a corresponding second-level sampling map; the third-level sampling network layer performs a down-sampling residual operation on the second-level sampling map to generate a corresponding third-level sampling map; the third-level feature extraction layer performs feature extraction processing on the third-level sampling map to generate the corresponding tertiary feature map; the second-level feature extraction layer up-samples the tertiary feature map, fuses the up-sampled image with the second-level sampling map, and performs feature extraction processing on the fused image to generate the corresponding secondary feature map; the first-level feature extraction layer up-samples the secondary feature map, fuses the up-sampled image with the first-level sampling map, and performs feature extraction processing on the fused image to generate the corresponding primary feature map; and the obtained primary, secondary and tertiary feature maps are output.
Here, when performing stage-by-stage down-sampling through the first-, second- and third-level sampling network layers, the embodiment of the present invention scales the width and height of each stage's input image by 1/2, with reference to the stage-by-stage down-sampling scheme of ResNet101, so that the overall sizes of the obtained primary, secondary and tertiary feature maps are 1/4, 1/16 and 1/32 of the original input image, namely the second image, and their widths/heights are 1/2, 1/4 and 1/16 of the width/height of the second image respectively; therefore, the width of the first feature map (primary feature map) obtained in the current step is W1=W0/2 and its height is H1=H0/2. Further, from the feature dimensions of the output feature maps of the first three stages (conv1, conv2_x, conv3_x) of ResNet101, the feature dimensions of the primary, secondary and tertiary feature maps should be 64, 128 and 256 respectively, so that the feature dimension of the first feature map (primary feature map) obtained in the current step is D1=64.
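A minimal PyTorch-style sketch of the three-level feature pyramid backbone described above. The stride-2 convolutions stand in for the ResNet101 conv1/conv2_x/conv3_x stages, and the fused pyramid maps are all reduced to 64 channels here for brevity; only the primary output's shape (half the input resolution, 64 channels) matters for the later steps, and the exact lateral and smoothing layers are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeLevelFPN(nn.Module):
    """Sketch of the three-level feature pyramid backbone. Stride-2
    convolutions stand in for the ResNet101 conv1 / conv2_x / conv3_x
    stages on the bottom-up side; the top-down side upsamples, fuses with
    the lateral connection, and smooths with a 3x3 convolution."""
    def __init__(self):
        super().__init__()
        # bottom-up (down-sampling) side: each stage halves width and height
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        # top-down (up-sampling) side: lateral 1x1 convs unify channels before fusion
        self.lat1, self.lat2 = nn.Conv2d(64, 64, 1), nn.Conv2d(128, 64, 1)
        self.smooth1 = nn.Conv2d(64, 64, 3, padding=1)
        self.smooth2 = nn.Conv2d(64, 64, 3, padding=1)
        self.smooth3 = nn.Conv2d(256, 64, 3, padding=1)

    def forward(self, x):                      # x: (B, 3, H0, W0) second image
        c1 = self.stage1(x)                    # (B, 64,  H0/2, W0/2)
        c2 = self.stage2(c1)                   # (B, 128, H0/4, W0/4)
        c3 = self.stage3(c2)                   # (B, 256, H0/8, W0/8)
        p3 = self.smooth3(c3)                                                  # tertiary feature map
        p2 = self.smooth2(self.lat2(c2) + F.interpolate(p3, scale_factor=2))   # secondary feature map
        p1 = self.smooth1(self.lat1(c1) + F.interpolate(p2, scale_factor=2))   # primary = first feature map
        return p1, p2, p3

p1, _, _ = ThreeLevelFPN()(torch.randn(1, 3, 288, 800))   # p1: (1, 64, 144, 400)
```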
It should be further noted that, in the embodiment of the present invention, before performing the feature extraction processing on the second image based on the backbone network, the backbone network needs to be trained to perform corresponding network parameter setting on all the sampling network layers and the feature extraction layers in the foregoing steps. The training mode is similar to the training mode of a feature pyramid network conventionally used for target detection (such as lane lines, buildings, pedestrians, vehicles, animals, plants, bicycles, tricycles, and the like), and is not further described herein.
Step 4, performing lane line initial point detection processing on the first characteristic diagram based on the key point detection network to generate a corresponding first initial point coordinate set;
the key point detection network comprises a first convolution network unit and a second convolution network unit; the first convolution network unit is connected with the second convolution network unit; the first set of start point coordinates comprises a plurality of first start point coordinates;
the method specifically comprises the following steps: step 41, inputting the first feature map into the first convolution network unit; performing feature extraction on the first feature map by using a preset first convolution filter according to the mode that the step length is 1 and the filling is 1 by the first convolution network unit to generate a corresponding second feature map;
wherein the first convolution filter consists of a first number n1 of 3×3 convolution kernels; the size of the second feature map is W2×H2 and its feature dimension is D2, where W2 and H2 are respectively the width and the height of the second feature map, W2=W1, H2=H1, and D2=n1; the first number n1 defaults to 64;
here, the Convolution filter (Convolution filter) is a filter constructed based on the principle of Convolution operation, and is mainly used for feature extraction; the filtering principle of the convolution filter is known from the disclosed technical implementation scheme, and the essence of the convolution filter is that a convolution operation is carried out on an input characteristic diagram based on a convolution kernel with a given size according to an appointed convolution step length and a convolution filling mode, and a corresponding filtering characteristic diagram is generated according to a convolution operation result; in the filtering process, if the characteristic dimension of the input characteristic diagram is 1, performing convolution operation on the input characteristic diagram based on the convolution kernel with the set size according to the appointed convolution step size and the convolution filling mode to obtain a filtering characteristic diagram; if the characteristic dimension of the input characteristic diagram is not 1, dividing the input characteristic diagram into a plurality of single-layer characteristic diagrams according to the characteristic dimension, performing convolution operation on each single-layer characteristic diagram respectively according to an agreed convolution step size and a convolution filling mode based on the convolution kernel with the set size to obtain a plurality of single-layer filtering characteristic diagrams, and fusing the single-layer filtering characteristic diagrams according to a channel characteristic adding mode to obtain a filtering characteristic diagram fused with multi-layer filtering characteristics;
based on the filtering principle of the convolution filter, in step 41 the specific step of performing feature extraction on the first feature map by the first convolution network unit using the preset first convolution filter with a step size of 1 and a padding of 1 to generate the corresponding second feature map is as follows: the first convolution network unit uses n1 convolution kernels of size 3×3 to filter the first feature map, each with a step size of 1 and a padding of 1, to obtain n1 filtered feature maps with feature dimension 1 and the same width and height as the first feature map, and then concatenates the n1 filtered feature maps to obtain the second feature map, which has the same width and height as the first feature map and feature dimension n1;
here, the first convolutional network element is inThe appointed convolution step length in the filtering process is the step length of 1, and the appointed convolution filling mode is the filling of 1; the convolution filling method with 1 filling is actually to add 1 pixel point with a set value (default 0) on each of the four sides of the first feature map according to the up-down and left-right symmetrical relationship, so that the size of the second feature map obtained by performing convolution with a3 × 3 convolution kernel in a step size 1 manner should be equal to the size (W) of the first feature map 1 *H 1 ) Keeping consistent; in addition, as can be seen from the above description of the convolution filter, the feature dimension of the second feature map is determined by the number of convolution kernels of the first convolution filter, that is, the first number n 1 Determining; therefore, the width W of the second feature map 2 =W 1 Height H 2 =H 1 Characteristic dimension D 2 =n 1
Step 42, inputting the second feature map into the second convolution network unit; performing feature extraction on the second feature map by the second convolution network unit using a preset second convolution filter with a step size of 1 and a padding of 1 to generate a corresponding third feature map;
wherein the second convolution filter consists of 1 convolution kernel of size 3×3; the size of the third feature map is W3×H3 and its feature dimension is D3, where W3 and H3 are respectively the width and the height of the third feature map, W3=W2=W1, H3=H2=H1, and D3=1;
Here, the filtering process of the second convolution network unit is similar to that in step 41 and is not described further; because the convolution step size agreed for the second convolution network unit in the filtering process is 1, the agreed convolution padding mode is a padding of 1, and the second convolution filter used has only 1 convolution kernel, the size of the filtered third feature map remains consistent with the size of the second feature map (W2×H2) and its feature dimension is 1, i.e.: the width of the third feature map W3=W2=W1, its height H3=H2=H1, and its feature dimension D3=1;
Here, as is apparent from the above description of steps 41 to 42, the key point detection network of the embodiment of the present invention actually performs heat map regression on the first feature map with the lane line starting points as the key points; the obtained third feature map is actually a heat map, the pixel value of each pixel point on the third feature map is a heat value, and the heat value represents the probability that the current pixel point is recognized as a lane line starting point;
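A minimal PyTorch-style sketch of this two-unit key point detection head; the ReLU after the first unit, the absence of a final activation and the class name are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class StartPointHead(nn.Module):
    """Key point detection network: two 3x3 convolution units (stride 1,
    padding 1) that map the 64-channel first feature map to a 1-channel
    heat map of lane line start point scores (the third feature map)."""
    def __init__(self, in_ch=64, n1=64):   # n1 = first number, default 64
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, n1, 3, 1, 1), nn.ReLU())  # first conv unit
        self.conv2 = nn.Conv2d(n1, 1, 3, 1, 1)                                # second conv unit

    def forward(self, feat):          # feat: (B, 64, H1, W1)
        x = self.conv1(feat)          # second feature map: (B, n1, H1, W1)
        return self.conv2(x)          # third feature map (heat map): (B, 1, H1, W1)

heatmap = StartPointHead()(torch.randn(1, 64, 144, 400))   # e.g. H1 = 288/2, W1 = 800/2
```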
step 43, performing non-maximum suppression processing on the third feature map;
the method comprises the following specific steps: sliding on the third feature map by using a3 × 3 sliding window with a sliding step size of 1; extracting the maximum pixel value in the current sliding window to generate a corresponding current maximum value every time sliding is carried out once, and resetting the pixel value of the pixel point of which the pixel value in the current sliding window is not the current maximum value;
here, through the Non-Maximum Suppression (Non-Maximum Suppression) operation, at most one pixel having a pixel value not 0 is reserved in each 3 × 3 pixel region on the third feature map;
step 44, extracting the coordinates of the pixel points of which the pixel values exceed the preset pixel threshold value on the third characteristic image as corresponding first initial point coordinates; and forming a corresponding first starting point coordinate set by all the obtained first starting point coordinates.
Here, the preset pixel threshold is a preset heat map pixel threshold, i.e. the minimum pixel value for a pixel point to be regarded as a lane line starting point; in the current step, the pixel points remaining on the third feature map with non-zero pixel values are further screened, the pixel points whose pixel values are higher than the preset pixel threshold are taken as the finally recognized lane line starting points, the pixel coordinates of each lane line starting point are extracted as the corresponding first starting point coordinates, and the one or more obtained first starting point coordinates form the first starting point coordinate set.
In summary, the process of performing the lane starting point detection processing on the first feature map based on the keypoint detection network in the current step 4 consists of the above steps 41 to 44; step 41-42 is to perform thermodynamic diagram regression on the first characteristic diagram by taking the starting point of the lane line as a key point based on the key point detection network, and obtain a thermodynamic diagram as a third characteristic diagram; step 43, performing non-maximum suppression on the third feature map based on a3 × 3 sliding window, which is equivalent to performing noise reduction processing on the third feature map once; and step 44, based on a preset pixel threshold value, performing lane line starting point identification on the third feature map after noise reduction is completed, and forming a first starting point coordinate set by pixel point coordinates of one or more identified lane line starting points, namely first starting point coordinates.
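A sketch of steps 43-44 using the usual max-pooling formulation of 3×3, stride-1 non-maximum suppression; the threshold value is an input and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def extract_start_points(heatmap, thresh):
    """heatmap: (1, 1, H, W) third feature map. Keeps, in every 3x3
    neighbourhood, only the local maximum (step 43), then returns the
    (x, y) coordinates of peaks above the pixel threshold (step 44)."""
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (heatmap == pooled)            # zero every non-maximum pixel
    ys, xs = torch.nonzero(peaks[0, 0] > thresh, as_tuple=True)
    return list(zip(xs.tolist(), ys.tolist()))       # first starting point coordinate set

starts = extract_start_points(torch.rand(1, 1, 144, 400), thresh=0.9)
```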
It should be noted that, in the embodiment of the present invention, before the detection processing of the lane start point is performed on the first feature map based on the key point detection network, the key point detection network needs to be trained; the method for training the key point detection network comprises the following steps:
step A1, acquiring a training image with a lane line as a key point training image, and marking the starting point position of the lane line on the key point training image to generate a plurality of first marking points;
step A2, constructing a heat map with the same size as the key point training image as the first label heat map, setting the pixel values of the pixel points corresponding to the first marking points on the first label heat map to 1, and setting the pixel values of the remaining pixel points to 0; on the first label heat map, a corresponding two-dimensional Gaussian distribution region (namely the region formed by each center and its four adjacent neighborhoods) is constructed with the pixel point corresponding to each first marking point as the center, and the pixel values of the pixel points in that region are set according to the corresponding two-dimensional Gaussian distribution (a minimal sketch of this label construction is given after step A6 below);
step A3, constructing a corresponding network loss function based on the focal loss function;
a4, performing feature extraction processing on the key point training images based on a backbone network to generate corresponding key point training feature maps;
step A5, inputting the key point training feature map into the key point detection network, and obtaining a corresponding first training heat map through the operations of the first convolution network unit and the second convolution network unit; calculating the loss between the first training heat map and the first label heat map based on the network loss function to generate a corresponding training loss value;
step A6, judging whether the training loss value meets a preset reasonable loss range; if so, turning to the step A1 to obtain the next training image for training until the training loss values of the specified number of training images all meet the reasonable loss range; if not, solving the network parameters of the key point detection network which enables the network loss function to reach the minimum value to generate a corresponding key point network parameter set; modulating the key point detection network according to the key point network parameter set; after the modulation is completed, returning to the step A5 to continue training based on the known key point training feature map.
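A minimal sketch of the label heat map construction from step A2, as referenced there; the Gaussian spread and the neighborhood radius are assumptions made only for illustration.

```python
import numpy as np

def make_label_heatmap(h, w, start_points, sigma=1.0, radius=2):
    """Builds the first label heat map: zeros everywhere, with a small 2-D
    Gaussian bump (value 1.0 at the centre) around each marked lane line
    start point given as an (x, y) pixel coordinate."""
    heat = np.zeros((h, w), dtype=np.float32)
    for cx, cy in start_points:
        for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
                g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                heat[y, x] = max(heat[y, x], g)      # keep the larger value where bumps overlap
    return heat

label = make_label_heatmap(288, 800, [(120, 250), (500, 260)])
```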
Step 5, performing lane line foreground point detection processing on the first feature map based on a binary segmentation network to generate a corresponding first foreground point coordinate set;
the binary segmentation network comprises a third convolution network unit, a fourth convolution network unit, a first full-connection network unit and a second full-connection network unit; the third convolution network unit is connected with the fourth convolution network unit, the fourth convolution network unit is connected with the first full-connection network unit, and the first full-connection network unit is connected with the second full-connection network unit; the first full-connection network unit consists of a full-connection network layer and a normalization network layer; the second full-connection network unit consists of a full-connection network layer and a softmax classification network layer; the first foreground point coordinate set comprises a plurality of first foreground point coordinates;
the method specifically comprises the following steps: step 51, inputting the first feature map into a third convolution network unit; performing feature extraction on the first feature map by using a preset third convolution filter by using a third convolution network unit according to a mode that the step length is 1 and the filling is 1 to generate a corresponding fourth feature map;
wherein the third convolution filter consists of a second number n2 of 3×3 convolution kernels; the size of the fourth feature map is W4×H4 and its feature dimension is D4, where W4 and H4 are respectively the width and the height of the fourth feature map, W4=W1, H4=H1, and D4=n2; the second number n2 defaults to 128;
Here, the filtering process of the third convolution network unit is also similar to that in step 41 and is not described further; because the convolution step size agreed for the third convolution network unit in the filtering process is 1, the agreed convolution padding mode is a padding of 1, and the third convolution filter used has a second number n2 of convolution kernels, the size of the filtered fourth feature map remains consistent with the size of the first feature map (W1×H1) and its feature dimension is the second number n2, i.e.: the width of the fourth feature map W4=W1, its height H4=H1, and its feature dimension D4=n2;
Step 52, inputting the fourth feature map into the fourth convolution network unit; performing, by the fourth convolution network unit, feature extraction on the fourth feature map using a preset fourth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding fifth feature map;
wherein the fourth convolution filter consists of a third number n3 of 3×3 convolution kernels; the fifth feature map has a size of W5*H5 and a feature dimension of D5, where W5 and H5 are respectively the width and height of the fifth feature map, W5=W4=W1, H5=H4=H1, D5=n3; the third number n3 defaults to 64;
here, the filtering performed by the fourth convolution network unit is similar to that in step 41 and is not described again; because the convolution step size agreed for the fourth convolution network unit is 1, the agreed convolution padding is 1, and the number of convolution kernels in the fourth convolution filter is the third number n3, the size of the filtered fifth feature map stays equal to the size (W4*H4) of the fourth feature map and its feature dimension is the third number n3, namely: the width of the fifth feature map W5=W4=W1, the height H5=H4=H1, and the feature dimension D5=n3.
Step 53, converting the fifth feature map into a one-dimensional first feature vector with a length of W5*H5*D5;
step 54, inputting the first feature vector into a first full-connection network unit; performing feature regression operation on the first feature vector by the first full-connection network unit to generate a corresponding second feature vector, and performing normalization processing on the second feature vector to generate a corresponding third feature vector;
here, it is known that the first full-connection network unit is composed of a fully-connected network layer and a normalization network layer; the input vector length of the first full-connection network unit is W5*H5*D5 and its output vector length is W5*H5*(D5/4); when the first full-connection network unit receives the first feature vector, the first feature vector of length W5*H5*D5 is input into the fully-connected network layer for a feature regression operation to obtain the second feature vector, and the second feature vector is then input into the normalization network layer for normalization to obtain the third feature vector of length W5*H5*(D5/4);
step 55, inputting the third feature vector into the second full-connection network unit; performing, by the second full-connection network unit, a feature regression operation on the third feature vector to generate a corresponding fourth feature vector, and performing binary classification processing on the fourth feature vector to generate a corresponding first classification tensor;
wherein the first classification tensor comprises W5*H5 first classification vectors a_{x,y} of length 2, 1≤x≤W5, 1≤y≤H5; each first classification vector a_{x,y} comprises a foreground point probability and a background point probability; each first classification vector a_{x,y} corresponds to one pixel point of the first feature map;
here, it is known that the second full-connection network unit comprises a fully-connected network layer and a softmax classification network layer; the input vector length of the second full-connection network unit is W5*H5*(D5/4) and its output tensor shape is W5*H5*2; when the second full-connection network unit receives the third feature vector, the third feature vector of length W5*H5*(D5/4) is input into the fully-connected network layer for a feature regression operation to obtain the fourth feature vector, and the fourth feature vector is then input into the softmax classification network layer for binary classification of foreground points and background points to obtain the first classification tensor formed by W5*H5 first classification vectors a_{x,y}; the first classification vectors a_{x,y} obtained here correspond to the pixel points in the fifth feature map, and since the fifth feature map has the same size as the first feature map, the first classification vectors a_{x,y} also correspond one-to-one with the pixel points in the first feature map;
step 56, traversing each first classification vector a_{x,y}; during the traversal, recording the currently traversed first classification vector a_{x,y} as the current vector; recording the larger of the two probabilities in the current vector as a first probability; if the first probability is the foreground point probability, extracting the pixel point coordinate (x, y) corresponding to the current vector as a corresponding first foreground point coordinate; and when the traversal is finished, forming a corresponding first foreground point coordinate set from all the obtained first foreground point coordinates.
In summary, the process of performing the lane line foreground point detection processing on the first feature map based on the binary segmentation network in the current step 5 consists of the above steps 51 to 56; in the steps 51-52, the first feature map is subjected to foreground and background feature learning based on two convolution units by taking the lane line pixel points as foreground points and taking the non-lane line pixel points as background points to obtain a corresponding feature map, namely a fifth feature map; step 53, converting the fifth feature map into a one-dimensional vector to be input into a subsequent first fully-connected network unit; step 54-55, classifying the pixel points on the fifth feature map by using two full-connection network units to obtain a first classification tensor; step 56 is to extract the pixel point coordinates of the pixel points of the foreground point type on the fifth feature map as the first foreground point coordinates, and form a first foreground point coordinate set by all the obtained first foreground point coordinates.
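To make the structure of steps 51-56 concrete, the following is a minimal PyTorch-style sketch under stated assumptions: the normalization layer is taken to be LayerNorm, channel 0 of the softmax output is taken to be the foreground probability, and the class and function names are hypothetical; the text itself fixes only the layer sizes (n2=128, n3=64) and the connection order.

```python
import torch
import torch.nn as nn

class BinarySegmentationNet(nn.Module):
    """Sketch of the binary segmentation branch: conv3 -> conv4 -> FC+norm -> FC+softmax."""
    def __init__(self, w1, h1, d1=64, n2=128, n3=64):
        super().__init__()
        self.conv3 = nn.Conv2d(d1, n2, kernel_size=3, stride=1, padding=1)  # step 51: fourth feature map
        self.conv4 = nn.Conv2d(n2, n3, kernel_size=3, stride=1, padding=1)  # step 52: fifth feature map
        flat = w1 * h1 * n3
        self.fc1 = nn.Sequential(nn.Linear(flat, flat // 4),
                                 nn.LayerNorm(flat // 4))                   # step 54: first FC unit (norm layer assumed)
        self.fc2 = nn.Linear(flat // 4, w1 * h1 * 2)                        # step 55: second FC unit
        self.w1, self.h1 = w1, h1

    def forward(self, first_feature_map):                     # (1, d1, h1, w1)
        x = self.conv4(self.conv3(first_feature_map))         # fifth feature map
        x = x.flatten(1)                                       # step 53: first feature vector
        x = self.fc1(x)                                        # third feature vector
        x = self.fc2(x).view(-1, self.h1, self.w1, 2)          # fourth feature vector, one 2-vector per pixel
        return torch.softmax(x, dim=-1)                        # first classification tensor a_{x,y}

def first_foreground_points(classification_tensor):
    """Step 56: keep the coordinates of pixels whose foreground probability is the larger one."""
    fg = classification_tensor[0, :, :, 0] > classification_tensor[0, :, :, 1]  # channel 0 assumed = foreground
    ys, xs = torch.nonzero(fg, as_tuple=True)
    return list(zip(xs.tolist(), ys.tolist()))                 # first foreground point coordinates (x, y)

# Tiny smoke test with an assumed 16x16 first feature map (D1=64) to keep the FC layers small.
net = BinarySegmentationNet(w1=16, h1=16)
coords = first_foreground_points(net(torch.randn(1, 64, 16, 16)))
```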
It should be noted that, in the embodiment of the present invention, before performing the detection processing of the lane line foreground point on the first feature map based on the binary segmentation network, the binary segmentation network needs to be trained; the method for training the binary segmentation network comprises the following steps:
step B1, acquiring a training image containing lane lines as a binary segmentation training image, and marking the center lines of all lane lines on the binary segmentation training image to obtain a plurality of first center lines; widening each first center line by a specified pixel width s to the left and to the right to obtain a corresponding first lane line region;
here, each first lane line region corresponds to one first center line, and the width of each first lane line region is 2s;
step B2, constructing a binary image with the same size as the binary segmentation training image as a first label binary image, setting the pixel values of the pixel points of each first lane line region on the first label binary image to 1 and the pixel values of all other pixel points to 0; carrying out one-hot vector coding on the first label binary image to obtain a corresponding first label vector (a sketch of this label construction is given after step B6 below);
step B3, constructing a corresponding classification loss function based on the Focal Loss function, and constructing a corresponding lane line offset loss function based on the L1 Loss function;
step B4, performing feature extraction processing on the binary segmentation training image based on the backbone network to generate a corresponding binary segmentation training feature map;
step B5, inputting the binary segmentation training feature map into a binary segmentation network to obtain a corresponding training classification tensor; carrying out one-hot vector coding on the training classification tensor to obtain a corresponding first training vector; performing binary image conversion on the training classification tensor to generate a corresponding first training binary image; calculating the classification loss of the first training vector and the first label vector based on a classification loss function to generate a corresponding classification loss value, and calculating the lane line offset loss of the first training binary image and the first label binary image based on a lane line offset loss function to generate a corresponding lane line offset loss value; calculating the sum of the classification loss value and the lane line offset loss value to generate a corresponding first loss sum;
step B6, judging whether the first loss sum falls within a preset reasonable loss range; if so, going to step B1 to obtain the next training image for training, until the first loss sums of a specified number of training images all fall within the reasonable loss range; if not, solving for the network parameters of the binary segmentation network that minimize the classification loss function and the lane line offset loss function to generate a corresponding binary segmentation network parameter set; adjusting the binary segmentation network according to the binary segmentation network parameter set; and after the adjustment is completed, returning to step B5 to continue training based on the known binary segmentation training feature map.
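As referenced in step B2, the label construction of steps B1-B2 can be sketched as follows. This is an illustrative sketch rather than the patented procedure itself: the center-line input format and the interpretation of "leftwards and rightwards" as widening along the x direction are assumptions.

```python
import numpy as np

def build_first_label_binary_map(w0, h0, center_lines, s=2):
    """center_lines: list of lists of (x, y) center-line pixel coordinates (hypothetical input format)."""
    label = np.zeros((h0, w0), dtype=np.uint8)
    for line in center_lines:
        for x, y in line:
            x_lo, x_hi = max(0, x - s), min(w0 - 1, x + s)
            label[y, x_lo:x_hi + 1] = 1              # first lane line region of width 2s (steps B1-B2)
    return label

def one_hot_label_vector(label):
    """One-hot vector coding of the first label binary image (first label vector of step B2)."""
    return np.eye(2, dtype=np.float32)[label.reshape(-1)]   # shape (W0*H0, 2): [background, foreground]
```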
Step 6, performing pixel point lane offset voting processing on the first feature map based on a regression voting network to obtain a corresponding first offset feature map;
the regression voting network comprises a fifth convolution network unit, a sixth convolution network unit and a third full-connection network unit; the fifth convolution network unit is connected with the sixth convolution network unit, and the sixth convolution network unit is connected with the third full-connection network unit; the third fully-connected network unit consists of two layers of fully-connected networks;
the method specifically comprises the following steps: step 61, inputting the first feature map into the fifth convolution network unit; performing, by the fifth convolution network unit, feature extraction on the first feature map using a preset fifth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding sixth feature map;
wherein the fifth convolution filter consists of a fourth number n4 of 3×3 convolution kernels; the sixth feature map has a size of W6*H6 and a feature dimension of D6, where W6 and H6 are respectively the width and height of the sixth feature map, W6=W1, H6=H1, D6=n4; the fourth number n4 defaults to 128;
here, the filtering performed by the fifth convolution network unit is similar to that in step 41 and is not described again; because the convolution step size agreed for the fifth convolution network unit is 1, the agreed convolution padding is 1, and the number of convolution kernels in the fifth convolution filter is the fourth number n4, the size of the filtered sixth feature map stays equal to the size (W1*H1) of the first feature map and its feature dimension is the fourth number n4, namely: the width of the sixth feature map W6=W1, the height H6=H1, and the feature dimension D6=n4.
Step 62, inputting the sixth feature map into the sixth convolution network unit; performing, by the sixth convolution network unit, feature extraction on the sixth feature map using a preset sixth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding seventh feature map;
wherein the sixth convolution filter consists of a fifth number n5 of 3×3 convolution kernels; the seventh feature map has a size of W7*H7 and a feature dimension of D7, where W7 and H7 are respectively the width and height of the seventh feature map, W7=W6=W1, H7=H6=H1, D7=n5; the fifth number n5 defaults to 16;
here, the filtering performed by the sixth convolution network unit is similar to that in step 41 and is not described again; because the convolution step size agreed for the sixth convolution network unit is 1, the agreed convolution padding is 1, and the number of convolution kernels in the sixth convolution filter is the fifth number n5, the size of the filtered seventh feature map stays equal to the size (W6*H6) of the sixth feature map and its feature dimension is the fifth number n5, namely: the width of the seventh feature map W7=W6=W1, the height H7=H6=H1, and the feature dimension D7=n5.
Step 63, converting the seventh feature map into a one-dimensional first seed vector with a length of W7*H7*D7;
step 64, inputting the first seed vector into the third full-connection network unit; learning, by the third full-connection network unit through its two fully-connected network layers, the x-direction and y-direction pixel coordinate offsets from each lane line pixel point on the seventh feature map corresponding to the first seed vector to the starting point of the lane line it belongs to, so as to generate a corresponding first offset tensor;
wherein the first offset tensor comprises W7*H7 first offset vectors b_{x,y} of length 2, 1≤x≤W7, 1≤y≤H7; each first offset vector b_{x,y} comprises a first offset Δx and a second offset Δy; each first offset vector b_{x,y} corresponds to one pixel point of the first feature map; if both offsets of a first offset vector b_{x,y} are 0, the corresponding pixel point on the first feature map is a non-lane-line pixel point; if the two offsets are not both 0, the corresponding pixel point on the first feature map is a lane line pixel point, and the first offset Δx and the second offset Δy are respectively the x-direction and y-direction coordinate offsets from that lane line pixel point to the starting point of the lane line it belongs to;
here, the third full-connection network unit is a regression network formed by two fully-connected networks; the first fully-connected network has an input vector length of W7*H7*D7 and an output vector length of W7*H7*(D7/2), and the second fully-connected network has an input vector length of W7*H7*(D7/2) and an output vector length of W7*H7*2; the first offset vectors b_{x,y} obtained here correspond to the pixel points in the seventh feature map, and since the seventh feature map has the same size as the first feature map, the first offset vectors b_{x,y} also correspond one-to-one with the pixel points in the first feature map;
step 65, performing feature map conversion on the first offset tensor to generate a corresponding first offset feature map;
wherein the first offset feature map has a size of W8*H8 and a feature dimension of 2, where W8 and H8 are respectively the width and height of the first offset feature map, W8=W7=W1, H8=H7=H1; the first offset feature map comprises W8*H8 first pixel points; each first pixel point comprises two feature data, namely the first offset Δx and the second offset Δy.
To sum up, the process of performing pixel point lane offset voting processing on the first feature map based on the regression voting network in the current step 6 consists of the above steps 61-65; in steps 61-62, feature extraction is performed on the first feature map based on two convolution units to obtain a corresponding feature map, namely the seventh feature map; step 63 converts the seventh feature map into a one-dimensional vector to be input into the subsequent third full-connection network unit; step 64 learns, through two fully-connected networks, the x-direction and y-direction coordinate offsets from each lane line pixel point on the seventh feature map to the starting point of the lane line it belongs to, obtaining the first offset tensor; step 65 converts the first offset tensor into an offset feature map consistent with the size of the first feature map, namely the first offset feature map.
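The regression voting branch of steps 61-65 can likewise be sketched in PyTorch under stated assumptions: the text fixes the layer sizes (n4=128, n5=16) and the two-layer fully-connected regression head, while the activation between the two fully-connected layers and the names used below are hypothetical.

```python
import torch.nn as nn

class RegressionVotingNet(nn.Module):
    """Sketch of the regression voting branch: conv5 -> conv6 -> two-layer FC regression head."""
    def __init__(self, w1, h1, d1=64, n4=128, n5=16):
        super().__init__()
        self.conv5 = nn.Conv2d(d1, n4, kernel_size=3, stride=1, padding=1)   # step 61: sixth feature map
        self.conv6 = nn.Conv2d(n4, n5, kernel_size=3, stride=1, padding=1)   # step 62: seventh feature map
        flat = w1 * h1 * n5
        self.fc = nn.Sequential(                                             # step 64: third FC unit
            nn.Linear(flat, flat // 2),
            nn.ReLU(),                                                       # assumed non-linearity
            nn.Linear(flat // 2, w1 * h1 * 2),
        )
        self.w1, self.h1 = w1, h1

    def forward(self, first_feature_map):                      # (1, d1, h1, w1)
        x = self.conv6(self.conv5(first_feature_map))
        x = x.flatten(1)                                        # step 63: first seed vector
        offsets = self.fc(x)                                    # first offset tensor b_{x,y}
        return offsets.view(-1, self.h1, self.w1, 2)            # step 65: first offset feature map (Δx, Δy)
```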
It should be noted that, in the embodiment of the present invention, before the pixel point lane offset voting processing is performed on the first feature map based on the regression voting network, the regression voting network needs to be trained; the method for training the regression voting network comprises the following steps:
step C1, acquiring a first training image; taking pixel points covered by all lane lines on the first training image as first foreground points, taking pixel points outside all lane lines as first background points, and taking a central point of an initial position of any lane line as a first initial point;
wherein the first training image has a size of W0*H0;
Step C2, constructing a first comparison characteristic diagram; taking second pixel points of the pixel point coordinates on the first comparison characteristic image corresponding to each first foreground point as second foreground points, taking second pixel points of the pixel point coordinates corresponding to each first background point as second background points, and taking second pixel points of the pixel point coordinates corresponding to each first starting point as second starting points;
wherein the first comparison feature map has a size of W0*H0 and a feature dimension of 2; the first comparison feature map comprises W0*H0 second pixel points, and each second pixel point comprises two feature data;
step C3, setting two feature data of each second background point and each second starting point on the first comparison feature map as 0;
step C4, traversing each second foreground point on the first comparison feature map; taking the currently traversed second foreground point as the current foreground point; calculating the x-direction and y-direction pixel coordinate offsets from the current foreground point to the second starting point corresponding to the lane line where the current foreground point is located, so as to generate a corresponding x-direction offset Δx1 and y-direction offset Δy1; and taking the x-direction offset Δx1 and the y-direction offset Δy1 as the two feature data of the current foreground point;
step C5, performing feature extraction processing on the first training image based on the backbone network to generate a corresponding first training feature map;
step C6, inputting the first training feature map into a regression voting network; performing feature extraction on the first training feature map by using a fifth convolution filter by using a fifth convolution network unit according to a mode that the step length is 1 and the filling is 1 to generate a corresponding second training feature map; performing feature extraction on the second training feature map by using a sixth convolution filter according to the mode that the step length is 1 and the filling is 1 by using a sixth convolution network unit to generate a corresponding third training feature map; converting the third training characteristic diagram into a one-dimensional first training seed vector; a third full-connection network unit learns x and y direction pixel point coordinate offsets from pixel points on each lane line to the initial point of the lane line on which the pixel points are located on a third training feature map corresponding to the first training seed vector through two layers of full-connection networks to generate a corresponding first training offset tensor; performing characteristic diagram conversion on the first training offset tensor to generate a corresponding first training offset characteristic diagram;
wherein the first training offset feature map has a size of W0*H0 and a feature dimension of 2; the first training offset feature map comprises W0*H0 third pixel points, and each third pixel point comprises two feature data, namely an x-direction offset Δx2 and a y-direction offset Δy2; the third pixel points correspond one-to-one with the second pixel points according to the pixel point coordinates (x, y), where x∈[1, W0], y∈[1, H0];
Step C7, constructing an L1 loss function F as

F = (1/(W0*H0)) * Σ_{(x,y)} ( |Δx1 − Δx2| + |Δy1 − Δy2| )

where the sum runs over all pixel point coordinates (x, y), Δx1 and Δy1 are the feature data of the second pixel point at (x, y), and Δx2 and Δy2 are the feature data of the corresponding third pixel point;
Step C8, calculating average absolute errors of all corresponding second and third pixel points in the first comparison characteristic diagram and the first training offset characteristic diagram according to the L1 loss function F to generate corresponding first loss values;
step C9, judging a first loss value based on a preset reasonable loss interval; if the first loss value meets the reasonable loss interval, determining that the training loss value of the first training image reaches the standard and turning to the step C11; if the first loss value does not meet the reasonable loss interval, turning to the step C10;
step C10, solving for the network parameters of the regression voting network that minimize the L1 loss function F to generate a corresponding first network parameter set; adjusting the network parameters of the regression voting network according to the first network parameter set; and after the adjustment is completed, returning to step C6 to continue training based on the known first training feature map;
and step C11, returning to the step C1 to obtain new first training images again for training until the training loss values of the first training images in the specified number reach the standard.
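The label construction and loss of steps C2-C8 can be illustrated with a short sketch. The lane-line input format below is an assumption (each lane line is given as a list of (x, y) pixel coordinates whose first element is the start point); only the offset definition and the mean-absolute-error loss follow the text.

```python
import numpy as np

def build_first_comparison_map(w0, h0, lane_lines):
    """lane_lines: list of lists of (x, y) foreground pixels; element 0 is the lane start point (assumed format)."""
    cmp_map = np.zeros((h0, w0, 2), dtype=np.float32)   # background and start points stay 0 (step C3)
    for line in lane_lines:
        x0, y0 = line[0]                                  # second starting point
        for x, y in line[1:]:                             # second foreground points (step C4)
            cmp_map[y, x, 0] = x0 - x                     # x-direction offset Δx1
            cmp_map[y, x, 1] = y0 - y                     # y-direction offset Δy1
    return cmp_map

def l1_offset_loss(training_offset_map, comparison_map):
    """Steps C7-C8: average absolute offset error over all corresponding pixel points."""
    diff = np.abs(training_offset_map - comparison_map)   # |Δx1-Δx2| and |Δy1-Δy2| per pixel
    return float(diff.sum(axis=-1).mean())                # averaged over the W0*H0 pixel points
```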
Step 7, performing lane line semantic feature marking processing on the pixel points of the first feature map according to the first initial point coordinate set, the first foreground point coordinate set and the first offset feature map;
the method specifically comprises the following steps: step 71, allocating a corresponding lane line identifier to each first starting point coordinate;
here, each first starting point coordinate corresponds to one real lane, so allocating a lane line identifier to each first starting point coordinate amounts to allocating a corresponding lane instance identifier to each real lane;
step 72, taking first pixel points corresponding to the first foreground point coordinates on the first offset characteristic diagram as first lane line pixel points;
step 73, traversing each first lane line pixel point; during the traversal, taking the currently traversed first lane line pixel point as the current lane line pixel point; extracting the x-direction and y-direction coordinates of the current lane line pixel point as a corresponding first abscissa and first ordinate, extracting the first offset Δx and the second offset Δy of the current lane line pixel point as a corresponding first transverse offset and first longitudinal offset, generating a corresponding first starting point abscissa x0 from the sum of the first abscissa and the first transverse offset, generating a corresponding first starting point ordinate y0 from the sum of the first ordinate and the first longitudinal offset, and forming a corresponding second starting point coordinate from the first starting point abscissa x0 and the first starting point ordinate y0; calculating the straight-line distances between the second starting point coordinate and all first starting point coordinates to obtain a plurality of first distances; taking the first starting point coordinate corresponding to the shortest of the plurality of first distances as the matching starting point coordinate corresponding to the current lane line pixel point; and taking the lane line identifier corresponding to the matching starting point coordinate as the matching lane line identifier corresponding to the current lane line pixel point;
here, the pixel coordinates (first abscissa, first ordinate) of each first lane line pixel point on the first offset feature map plus the corresponding two-dimensional offset (first offset Δ x, second offset Δ y) is the second start point coordinates (first abscissa + Δ x, first ordinate + Δ y) of the lane line start point corresponding to the first lane line pixel point predicted by the regression voting network; each first initial point coordinate in the first initial point coordinate set is an initial point coordinate of each lane line predicted by the key point detection network; calculating the straight-line distance from the second starting point coordinate to each first starting point coordinate, wherein the shortest straight-line distance is the first starting point coordinate matched with the current second starting point coordinate; in this way, a first start point coordinate corresponding to each first lane line pixel point can be obtained, that is, all the first lane line pixel points on the first offset feature map can be classified according to the first start point coordinate, and the corresponding classification identifier is the lane line identifier;
step 74, adding a lane line semantic feature for each pixel point on the first feature map, and initializing feature values of the lane line semantic features of all the pixel points as invalid identifications; setting the lane line semantic features of the pixel points corresponding to the first initial point coordinates as corresponding lane line identifications; and setting the lane line semantic features of the pixel points corresponding to the first lane line pixel points as corresponding matched lane line identifications.
Here, the initial first feature map has a size of W1*H1 and a feature dimension of D1; after the lane line semantic feature is added, the first feature map has a size of W1*H1 and a feature dimension of D1+1; the lane line semantic feature of every pixel point not on a lane line on the first feature map is the invalid identifier (for example, 0), and the lane line semantic feature of any pixel point on a lane line is the lane line identifier of the lane line that pixel point belongs to; in this way, instance classification is completed for the lane lines detected on the first feature map.
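The matching of steps 71-73 can be sketched as follows: every detected foreground pixel votes for a lane start point through its predicted offsets and receives the identifier of the nearest detected first starting point. The function and variable names are hypothetical, and lane line identifiers are simply taken to be the indices of the start points; only the voting-plus-nearest-start-point logic follows the text.

```python
import numpy as np

def assign_lane_line_ids(first_start_points, first_foreground_coords, first_offset_map):
    """first_start_points: list of (x, y) first starting point coordinates (one per detected lane line);
    first_foreground_coords: list of (x, y) first foreground point coordinates;
    first_offset_map: array of shape (H, W, 2) holding (Δx, Δy) per pixel."""
    starts = np.asarray(first_start_points, dtype=np.float32)
    lane_ids = {}                                              # (x, y) -> matching lane line identifier
    for x, y in first_foreground_coords:
        dx, dy = first_offset_map[y, x]
        voted = np.array([x + dx, y + dy], dtype=np.float32)   # second starting point coordinate (step 73)
        dists = np.linalg.norm(starts - voted, axis=1)         # first distances to all first starting points
        lane_ids[(x, y)] = int(np.argmin(dists))               # identifier of the nearest starting point
    return lane_ids
```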
Step 8, drawing lane lines on the first characteristic diagram; outputting a first characteristic diagram which is drawn by the lane line as a lane line detection result;
wherein the drawing of lane lines on the first feature map specifically includes: allocating a corresponding lane line color pixel value to each lane line identifier; traversing each pixel point on the first feature map; during the traversal, recording the currently traversed pixel point as the current pixel point, and extracting the lane line semantic feature of the current pixel point as the corresponding current lane line identifier; and if the current lane line identifier is not the invalid identifier, setting the pixel value of the current pixel point to the lane line color pixel value corresponding to the current lane line identifier.
Here, by setting different pixel values for the pixel points belonging to different lane line identifications, it is possible to visually distinguish different lane line instances on the first feature map, thereby obtaining a visual lane line detection image with lane line instance information.
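The coloring of step 8 amounts to a per-pixel lookup of the color assigned to each lane line identifier; the following sketch uses an assumed color palette and output image layout.

```python
import numpy as np

def draw_lane_lines(h, w, lane_ids, palette=None):
    """lane_ids: dict mapping (x, y) -> lane line identifier, e.g. from the matching sketch above."""
    if palette is None:
        palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]  # assumed lane line color pixel values
    canvas = np.zeros((h, w, 3), dtype=np.uint8)                          # invalid identifier stays black
    for (x, y), lane_id in lane_ids.items():
        canvas[y, x] = palette[lane_id % len(palette)]
    return canvas
```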
Fig. 2 is a block diagram of a lane line detection apparatus according to a second embodiment of the present invention, where the apparatus is a terminal device or a server that implements the foregoing method embodiment, and may also be an apparatus that enables the foregoing terminal device or server to implement the foregoing method embodiment, for example, the apparatus may be an apparatus or a chip system of the foregoing terminal device or server. As shown in fig. 2, the apparatus includes: the system comprises an acquisition module 201, an image preprocessing module 202, a backbone network processing module 203, a key point detection network processing module 204, a binary segmentation network processing module 205, a regression voting network processing module 206 and a lane line detection output module 207.
The acquisition module 201 is configured to acquire a first image.
The image preprocessing module 202 is configured to perform image scaling processing on the first image according to a preset image size to generate a corresponding second image.
The backbone network processing module 203 is configured to perform feature extraction processing on the second image based on a backbone network to generate a corresponding first feature map.
The key point detection network processing module 204 is configured to perform lane line starting point detection processing on the first feature map based on a key point detection network to generate a corresponding first starting point coordinate set; the first set of start point coordinates comprises a plurality of first start point coordinates.
The binary segmentation network processing module 205 is configured to perform lane line foreground point detection processing on the first feature map based on a binary segmentation network to generate a corresponding first foreground point coordinate set; the first set of foreground point coordinates includes a plurality of first foreground point coordinates.
The regression voting network processing module 206 is configured to perform pixel point lane deviation voting on the first feature map based on a regression voting network to obtain a corresponding first deviation feature map.
The lane line detection output module 207 is configured to perform lane line semantic feature labeling processing on the pixel points of the first feature map according to the first start point coordinate set, the first foreground point coordinate set, and the first offset feature map; drawing lane lines on the first characteristic diagram; and outputting the first characteristic diagram which is used for finishing the drawing of the lane line as a lane line detection result.
The lane line detection device provided by the embodiment of the invention can execute the method steps in the method embodiment, and the implementation principle and the technical effect are similar, so that the detailed description is omitted.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the obtaining module may be a processing element separately set up, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a processing element of the apparatus calls and executes the functions of the determining module. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as: one or more Application Specific Integrated Circuits (ASICs), or one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), etc. For another example, when some of the above modules are implemented in the form of a Processing element scheduler code, the Processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or other processor that can invoke the program code. As another example, these modules may be integrated together and implemented in the form of a System-on-a-chip (SOC).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the foregoing method embodiments are generated in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, bluetooth, microwave, etc.) means.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. The electronic device may be the terminal device or the server, or may be a terminal device or a server connected to the terminal device or the server and implementing the method according to the embodiment of the present invention. As shown in fig. 3, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls transceiving operation of the transceiver 303. Various instructions may be stored in memory 302 for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device according to an embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripherals.
The system bus 305 mentioned in fig. 3 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 3, but that does not indicate only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM), and may further include a Non-Volatile Memory (Non-Volatile Memory), such as at least one disk Memory.
The Processor may be a general-purpose Processor, and includes a central Processing Unit CPU, a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; but also a digital signal processor DSP, an application specific integrated circuit ASIC, a field programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components.
It should be noted that the embodiment of the present invention also provides a computer-readable storage medium, in which instructions are stored, and when the computer-readable storage medium runs on a computer, the computer is caused to execute the method and the processing procedure provided in the above embodiment.
The embodiment of the present invention further provides a chip for executing the instructions, where the chip is configured to execute the processing steps described in the foregoing method embodiment.
The embodiment of the invention provides a lane line detection method, a lane line detection device, electronic equipment and a computer readable storage medium, wherein a basic feature graph is generated by extracting basic features of a perception image through a backbone network; then, detecting a lane line starting point of the basic feature map based on a key point detection network, detecting lane line foreground spots of the basic feature map based on a binary segmentation network, and learning the x-direction and y-direction offset features from the lane line foreground spots to the starting point based on a regression voting network; distributing a corresponding lane line mark for each detected lane line starting point; marking corresponding lane line semantic features on pixel points on each lane line of the basic feature map according to the detected lane line starting point coordinates, the detected lane line foreground point coordinates and the detected offset features from the lane line foreground points to the starting point, wherein the lane line semantic features correspond to a lane line identifier and a lane line color pixel value; and then according to the lane line color pixel values corresponding to the lane line semantic features, dyeing the pixel points belonging to different lane lines on the basic feature map. According to the invention, on one hand, the calculated amount of lane line detection is greatly reduced; on the other hand, the visual lane line detection image is output, and the lane line semantic features for lane line classification are added to the image.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A lane line detection method, the method comprising:
acquiring a first image;
performing image scaling processing on the first image according to a preset image size to generate a corresponding second image;
performing feature extraction processing on the second image based on a backbone network to generate a corresponding first feature map;
performing lane line starting point detection processing on the first feature map based on a key point detection network to generate a corresponding first starting point coordinate set; the first set of origin coordinates comprises a plurality of first origin coordinates;
performing lane line foreground point detection processing on the first feature map based on a binary segmentation network to generate a corresponding first foreground point coordinate set; the first set of foreground point coordinates comprises a plurality of first foreground point coordinates;
performing pixel point lane offset voting processing on the first feature map based on a regression voting network to obtain a corresponding first offset feature map;
performing lane line semantic feature labeling processing on pixel points of the first feature map according to the first starting point coordinate set, the first foreground point coordinate set and the first offset feature map;
drawing lane lines on the first characteristic diagram; outputting the first characteristic diagram which is drawn by the lane line as a lane line detection result;
wherein the preset image size is W0*H0, W0 and H0 being respectively the width and the height of the preset image size;
the second image has a size of W0*H0;
The backbone network is a three-level characteristic pyramid network taking a residual error network as a characteristic extraction network;
the key point detection network comprises a first convolution network unit and a second convolution network unit; the first convolution network unit is connected with the second convolution network unit;
the binary segmentation network comprises a third convolution network unit, a fourth convolution network unit, a first full-connection network unit and a second full-connection network unit; the third convolution network unit is connected with the fourth convolution network unit, the fourth convolution network unit is connected with the first fully-connected network unit, and the first fully-connected network unit is connected with the second fully-connected network unit; the first fully-connected network unit consists of a fully-connected network layer and a normalized network layer; the second fully-connected network unit consists of a fully-connected network layer and a softmax classified network layer;
the regression voting network comprises a fifth convolution network unit, a sixth convolution network unit and a third full-connection network unit; the fifth convolution network unit is connected with the sixth convolution network unit, and the sixth convolution network unit is connected with the third fully-connected network unit; the third fully connected network unit consists of two layers of fully connected networks;
the feature extraction processing on the second image based on the backbone network to generate a corresponding first feature map specifically includes:
inputting the second image into the backbone network for feature extraction to obtain three levels of output feature maps with different sizes, namely a first-level feature map, a second-level feature map and a third-level feature map; and using the first-level feature map as the first feature map; the first feature map has a size of W1*H1 and a feature dimension of D1, where W1 and H1 are respectively the width and height of the first feature map, W1=W0/2, H1=H0/2, D1=64;
The processing of the lane offset voting of the pixel points on the first feature map based on the regression voting network to obtain a corresponding first offset feature map specifically includes:
inputting the first feature map into the fifth convolution network unit; performing, by the fifth convolution network unit, feature extraction on the first feature map using a preset fifth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding sixth feature map; the fifth convolution filter consists of a fourth number n4 of 3×3 convolution kernels; the sixth feature map has a size of W6*H6 and a feature dimension of D6, where W6 and H6 are respectively the width and height of the sixth feature map, W6=W1, H6=H1, D6=n4; the fourth number n4 defaults to 128;
inputting the sixth feature map into the sixth convolution network unit; performing, by the sixth convolution network unit, feature extraction on the sixth feature map using a preset sixth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding seventh feature map; the sixth convolution filter consists of a fifth number n5 of 3×3 convolution kernels; the seventh feature map has a size of W7*H7 and a feature dimension of D7, where W7 and H7 are respectively the width and height of the seventh feature map, W7=W6=W1, H7=H6=H1, D7=n5; the fifth number n5 defaults to 16;
converting the seventh feature map into a one-dimensional first seed vector with a length of W7*H7*D7;
inputting the first seed vector into the third fully-connected network unit; learning, by the third fully-connected network unit through two layers of fully-connected networks, the x-direction and y-direction pixel coordinate offsets from each lane line pixel point on the seventh feature map corresponding to the first seed vector to the starting point of the lane line it belongs to, so as to generate a corresponding first offset tensor; the first offset tensor comprises W7*H7 first offset vectors b_{x,y} of length 2, 1≤x≤W7, 1≤y≤H7; each first offset vector b_{x,y} comprises a first offset Δx and a second offset Δy; each first offset vector b_{x,y} corresponds to one pixel point of the first feature map; if both offsets of a first offset vector b_{x,y} are 0, the corresponding pixel point on the first feature map is a non-lane-line pixel point; if the two offsets are not both 0, the corresponding pixel point on the first feature map is a lane line pixel point, and the first offset Δx and the second offset Δy are respectively the x-direction and y-direction pixel coordinate offsets from that lane line pixel point to the starting point of the lane line it belongs to;
performing feature map conversion on the first offset tensor to generate a corresponding first offset feature map; the first offset feature map has a size of W8*H8 and a feature dimension of 2, where W8 and H8 are respectively the width and height of the first offset feature map, W8=W7=W1, H8=H7=H1; the first offset feature map comprises W8*H8 first pixel points; each first pixel point comprises two feature data, namely the first offset Δx and the second offset Δy.
2. The method according to claim 1, wherein the performing, by the network based on keypoint detection, processing of lane start point detection on the first feature map to generate a corresponding first start point coordinate set specifically includes:
inputting the first feature map into the first convolution network unit; performing, by the first convolution network unit, feature extraction on the first feature map using a preset first convolution filter with a step size of 1 and a padding of 1 to generate a corresponding second feature map; the first convolution filter consists of a first number n1 of 3×3 convolution kernels; the second feature map has a size of W2*H2 and a feature dimension of D2, where W2 and H2 are respectively the width and height of the second feature map, W2=W1, H2=H1, D2=n1; the first number n1 defaults to 64;
inputting the second feature map into the second convolution network unit; performing, by the second convolution network unit, feature extraction on the second feature map using a preset second convolution filter with a step size of 1 and a padding of 1 to generate a corresponding third feature map; the second convolution filter consists of one 3×3 convolution kernel; the third feature map has a size of W3*H3 and a feature dimension of D3, where W3 and H3 are respectively the width and height of the third feature map, W3=W2=W1, H3=H2=H1, D3=1;
Performing non-maximum suppression processing on the third feature map, specifically: sliding a 3×3 sliding window over the third feature map with a sliding step of 1; at each sliding position, extracting the maximum pixel value in the current sliding window to generate a corresponding current maximum value, and resetting the pixel values of the pixel points in the current sliding window whose pixel values are not the current maximum value;
extracting the coordinates of the pixel points of which the pixel values exceed a preset pixel threshold value on the third characteristic image as corresponding coordinates of the first starting point; and forming a corresponding first starting point coordinate set by all the obtained first starting point coordinates.
3. The method according to claim 1, wherein the performing, based on a binary segmentation network, a lane line foreground point detection process on the first feature map to generate a corresponding first foreground point coordinate set specifically includes:
inputting the first feature map into the third convolution network unit; performing, by the third convolution network unit, feature extraction on the first feature map using a preset third convolution filter with a step size of 1 and a padding of 1 to generate a corresponding fourth feature map; the third convolution filter consists of a second number n2 of 3×3 convolution kernels; the fourth feature map has a size of W4*H4 and a feature dimension of D4, where W4 and H4 are respectively the width and height of the fourth feature map, W4=W1, H4=H1, D4=n2; the second number n2 defaults to 128;
inputting the fourth feature map into the fourth convolution network unit; performing, by the fourth convolution network unit, feature extraction on the fourth feature map using a preset fourth convolution filter with a step size of 1 and a padding of 1 to generate a corresponding fifth feature map; the fourth convolution filter consists of a third number n3 of 3×3 convolution kernels; the fifth feature map has a size of W5*H5 and a feature dimension of D5, where W5 and H5 are respectively the width and height of the fifth feature map, W5=W4=W1, H5=H4=H1, D5=n3; the third number n3 defaults to 64;
converting the fifth feature map into a one-dimensional first feature vector with a length of W5*H5*D5;
inputting the first feature vector into the first fully-connected network unit; performing feature regression operation on the first feature vector by the first fully-connected network unit to generate a corresponding second feature vector, and performing normalization processing on the second feature vector to generate a corresponding third feature vector;
inputting the third feature vector into the second fully-connected network unit; performing, by the second fully-connected network unit, a feature regression operation on the third feature vector to generate a corresponding fourth feature vector, and performing binary classification processing on the fourth feature vector to generate a corresponding first classification tensor; the first classification tensor comprises W5*H5 first classification vectors a_{x,y} of length 2, 1≤x≤W5, 1≤y≤H5; each first classification vector a_{x,y} comprises a foreground point probability and a background point probability; each first classification vector a_{x,y} corresponds to one pixel point of the first feature map;
traversing each first classification vector a_{x,y}; during the traversal, recording the currently traversed first classification vector a_{x,y} as the current vector; recording the larger of the two probabilities in the current vector as a first probability; if the first probability is the foreground point probability, extracting the pixel point coordinate (x, y) corresponding to the current vector as the corresponding first foreground point coordinate; and when the traversal is finished, forming a corresponding first foreground point coordinate set from all the obtained first foreground point coordinates.
4. The lane line detection method according to claim 1, wherein before performing pixel point lane offset voting on the first feature map based on a regression voting network to obtain a corresponding first offset feature map, the method further comprises training the regression voting network; the training of the regression voting network specifically includes:
step 71, acquiring a first training image; taking pixel points covered by all lane lines on the first training image as first foreground points, taking pixel points outside all lane lines as first background points, and taking a central point of an initial position of any lane line as a first initial point;
wherein the first training image has a size of W0*H0;
Step 72, constructing a first comparison characteristic diagram; taking second pixel points of the pixel point coordinates on the first comparison characteristic image corresponding to the first foreground points as second foreground points, taking second pixel points of the pixel point coordinates corresponding to the first background points as second background points, and taking the second pixel points of the pixel point coordinates corresponding to the first starting points as second starting points;
wherein the first comparison feature map has a size of W0*H0 and a feature dimension of 2; the first comparison feature map comprises W0*H0 second pixel points, and each second pixel point comprises two feature data;
step 73, setting two feature data of each second background point and each second starting point on the first comparison feature map as 0;
step 74, traversing each second foreground point on the first comparison feature map; taking the currently traversed second foreground point as the current foreground point; calculating the x-direction and y-direction pixel coordinate offsets from the current foreground point to the second starting point corresponding to the lane line where the current foreground point is located, so as to generate a corresponding x-direction offset Δx1 and y-direction offset Δy1; and taking the x-direction offset Δx1 and the y-direction offset Δy1 as the two feature data of the current foreground point;
step 75, performing feature extraction processing on the first training image based on the backbone network to generate a corresponding first training feature map;
step 76, inputting the first training feature map into the regression voting network; performing, by the fifth convolution network unit, feature extraction on the first training feature map by using the fifth convolution filter in a manner of step size 1 and padding to 1 to generate a corresponding second training feature map; performing feature extraction on the second training feature map by using the sixth convolution filter according to a mode that the step length is 1 and the filling is 1 by the sixth convolution network unit to generate a corresponding third training feature map; converting the third training feature map into a one-dimensional first training seed vector; learning the coordinate offset of x-direction pixel points and y-direction pixel points from pixel points on each lane line to the starting point of the lane line on which the pixel points are located on the third training feature map corresponding to the first training seed vector by the third full-connection network unit through two layers of full-connection networks to generate corresponding first training offset tensors; performing characteristic diagram conversion on the first training offset tensor to generate a corresponding first training offset characteristic diagram;
wherein the first training offset feature map has a size of W0*H0 and a feature dimension of 2; the first training offset feature map comprises W0*H0 third pixel points, and each third pixel point comprises two feature data, namely an x-direction offset Δx2 and a y-direction offset Δy2; the third pixel points correspond one-to-one with the second pixel points according to the pixel point coordinates (x, y), x∈[1, W0], y∈[1, H0];
step 77, constructing an L1 loss function F as

F = (1 / (W0*H0)) * Σ ( |Δx1(x, y) − Δx2(x, y)| + |Δy1(x, y) − Δy2(x, y)| ),

where the summation runs over all pixel coordinates (x, y), x ∈ [1, W0], y ∈ [1, H0];
step 78, calculating, according to the L1 loss function F, the average absolute error over all corresponding second pixel points and third pixel points in the first comparison feature map and the first training offset feature map to generate a corresponding first loss value;
step 79, evaluating the first loss value against a preset reasonable loss interval; if the first loss value falls within the reasonable loss interval, determining that the training loss value of the first training image reaches the standard and proceeding to step 81; if the first loss value does not fall within the reasonable loss interval, proceeding to step 80;
step 80, solving for the network parameters of the regression voting network that minimize the L1 loss function F to generate a corresponding first network parameter set; adjusting the network parameters of the regression voting network according to the first network parameter set; and after the adjustment is completed, returning to step 76 to continue training based on the first training feature map;
and step 81, returning to step 71 to acquire a new first training image for training, until the training loss values of a specified number of first training images reach the standard (steps 72 to 78 are sketched in code below).
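As a non-authoritative illustration of steps 72 to 78 (building the comparison feature map and computing the training loss), the following Python sketch uses NumPy arrays and hypothetical function names; the offset sign convention (starting point = pixel coordinate + offset) follows claim 5, and the loss normalization constant is only assumed up to a scale factor.

```python
import numpy as np

def build_comparison_feature_map(lane_masks, start_points, W0, H0):
    """Sketch of steps 72-74: a W0 x H0 map with two feature data per pixel.

    lane_masks   : list of (H0, W0) boolean arrays, one per lane line,
                   True where the lane line covers a pixel (foreground).
    start_points : list of (x0, y0) lane line starting point coordinates.
    Background points and starting points keep the value 0 (step 73);
    every other foreground point stores its (delta-x1, delta-y1) offset
    to the starting point of its own lane line (step 74).
    """
    target = np.zeros((H0, W0, 2), dtype=np.float32)
    for mask, (x0, y0) in zip(lane_masks, start_points):
        ys, xs = np.nonzero(mask)
        target[ys, xs, 0] = x0 - xs   # x-direction offset delta-x1
        target[ys, xs, 1] = y0 - ys   # y-direction offset delta-y1
        target[y0, x0] = 0.0          # the starting point itself stays 0
    return target

def l1_loss(predicted_offsets, target_offsets):
    """Sketch of steps 77-78: mean absolute error over all pixel pairs."""
    return float(np.mean(np.abs(predicted_offsets - target_offsets)))
```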
5. The method according to claim 1, wherein the performing, according to the first starting point coordinate set, the first foreground point coordinate set and the first offset feature map, lane line semantic feature labeling processing on the pixel points of the first feature map specifically comprises:
allocating a corresponding lane line identifier to each first starting point coordinate;
taking the first pixel points on the first offset feature map whose coordinates correspond to the first foreground point coordinates as first lane line pixel points;
traversing each first lane line pixel point; during the traversal, taking the currently traversed first lane line pixel point as the current lane line pixel point; extracting the x-direction coordinate and the y-direction coordinate of the current lane line pixel point as a corresponding first abscissa and first ordinate; extracting the first offset Δx and the second offset Δy of the current lane line pixel point as a corresponding first transverse offset and first longitudinal offset; generating a corresponding first starting point abscissa x0 from the sum of the first abscissa and the first transverse offset, and generating a corresponding first starting point ordinate y0 from the sum of the first ordinate and the first longitudinal offset; forming a corresponding second starting point coordinate from the first starting point abscissa x0 and the first starting point ordinate y0; calculating the straight-line distances between the second starting point coordinate and all the first starting point coordinates to obtain a plurality of first distances; taking the first starting point coordinate corresponding to the shortest of the plurality of first distances as the matching starting point coordinate of the current lane line pixel point; and taking the lane line identifier corresponding to the matching starting point coordinate as the matching lane line identifier of the current lane line pixel point;
adding a lane line semantic feature to each pixel point on the first feature map, and initializing the feature values of the lane line semantic features of all pixel points as an invalid identifier; setting the lane line semantic feature of the pixel point corresponding to each first starting point coordinate to the corresponding lane line identifier; and setting the lane line semantic feature of each pixel point corresponding to a first lane line pixel point to its corresponding matching lane line identifier.
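The labeling procedure of claim 5 can be sketched as follows; this is an illustrative reading rather than the granted implementation, and the function name, array layouts, and the use of the start-point index as the lane line identifier are assumptions.

```python
import numpy as np

def label_lane_pixels(foreground_coords, offset_map, start_coords):
    """Illustrative sketch of claim 5: vote a starting point for every
    foreground pixel and inherit the identifier of the nearest detected
    first starting point.

    foreground_coords : iterable of (x, y) first foreground point coordinates.
    offset_map        : (H, W, 2) first offset feature map, channels (dx, dy).
    start_coords      : (N, 2) array of (x, y) first starting point coordinates;
                        here the lane line identifier of starting point i is i.
    Returns a dict mapping (x, y) -> matching lane line identifier.
    """
    start_coords = np.asarray(start_coords, dtype=np.float32)
    labels = {}
    for x, y in foreground_coords:
        dx, dy = offset_map[y, x]
        voted = np.array([x + dx, y + dy], dtype=np.float32)  # second starting point coordinate
        dists = np.linalg.norm(start_coords - voted, axis=1)  # the plurality of first distances
        labels[(x, y)] = int(np.argmin(dists))                # nearest starting point's identifier
    return labels
```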
6. The lane line detection method according to claim 1, wherein the step of performing lane line drawing on the first feature map specifically includes:
allocating a corresponding lane line color pixel value to each lane line identifier;
traversing each pixel point on the first feature map; during the traversal, recording the currently traversed pixel point as the current pixel point, and extracting the lane line semantic feature of the current pixel point as the corresponding current lane line identifier; and if the current lane line identifier is not an invalid identifier, setting the pixel value of the current pixel point to the lane line color pixel value corresponding to the current lane line identifier.
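A minimal drawing sketch for claim 6, assuming the labels produced by the claim-5 sketch above and a hypothetical per-identifier color palette:

```python
import numpy as np

def draw_lane_lines(feature_map_rgb, labels, palette):
    """Illustrative sketch of claim 6: color every labelled lane pixel.

    feature_map_rgb : (H, W, 3) uint8 image to draw on (a copy is returned).
    labels          : dict mapping (x, y) -> lane line identifier (claim 5 output).
    palette         : dict mapping lane line identifier -> (R, G, B) color pixel value.
    Pixels whose semantic feature is the invalid identifier never appear in
    `labels`, so they keep their original pixel value.
    """
    canvas = feature_map_rgb.copy()
    for (x, y), lane_id in labels.items():
        canvas[y, x] = palette[lane_id]
    return canvas
```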
7. An apparatus for implementing the lane line detection method according to any one of claims 1 to 6, the apparatus comprising: an acquisition module, an image preprocessing module, a backbone network processing module, a key point detection network processing module, a binary segmentation network processing module, a regression voting network processing module and a lane line detection output module;
the acquisition module is used for acquiring a first image;
the image preprocessing module is used for carrying out image scaling processing on the first image according to a preset image size to generate a corresponding second image;
the backbone network processing module is used for performing feature extraction processing on the second image based on a backbone network to generate a corresponding first feature map;
the key point detection network processing module is used for performing lane line starting point detection processing on the first feature map based on a key point detection network to generate a corresponding first starting point coordinate set; the first starting point coordinate set comprises a plurality of first starting point coordinates;
the binary segmentation network processing module is used for performing lane line foreground point detection processing on the first feature map based on a binary segmentation network to generate a corresponding first foreground point coordinate set; the first foreground point coordinate set comprises a plurality of first foreground point coordinates;
the regression voting network processing module is used for performing pixel point lane line offset voting processing on the first feature map based on a regression voting network to obtain a corresponding first offset feature map;
the lane line detection output module is used for performing lane line semantic feature labeling processing on the pixel points of the first feature map according to the first starting point coordinate set, the first foreground point coordinate set and the first offset feature map, performing lane line drawing on the first feature map, and outputting the first feature map on which the lane line drawing has been completed as a lane line detection result.
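One possible wiring of the modules in claim 7 is sketched below; each processing module is passed in as a callable, and all names and signatures are illustrative assumptions rather than the patented implementation.

```python
from typing import Callable, Sequence, Tuple
import numpy as np

def detect_lane_lines(
    first_image: np.ndarray,
    preprocess: Callable[[np.ndarray], np.ndarray],                      # image preprocessing module
    backbone: Callable[[np.ndarray], np.ndarray],                        # backbone network processing module
    keypoint_net: Callable[[np.ndarray], Sequence[Tuple[int, int]]],     # key point detection network
    binary_seg_net: Callable[[np.ndarray], Sequence[Tuple[int, int]]],   # binary segmentation network
    regression_net: Callable[[np.ndarray], np.ndarray],                  # regression voting network
) -> dict:
    """Chain the processing modules and collect their intermediate outputs."""
    second_image = preprocess(first_image)
    first_feature_map = backbone(second_image)
    return {
        "feature_map": first_feature_map,
        "start_points": keypoint_net(first_feature_map),         # first starting point coordinate set
        "foreground_points": binary_seg_net(first_feature_map),  # first foreground point coordinate set
        "offset_map": regression_net(first_feature_map),         # first offset feature map
    }
```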
8. An electronic device, comprising: a memory, a processor, and a transceiver;
the processor is coupled to the memory, and is used for reading and executing the instructions in the memory to implement the method steps of any one of claims 1 to 6;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
9. A computer-readable storage medium having stored thereon computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-6.
CN202210993629.5A 2022-08-18 2022-08-18 Lane line detection method and device Active CN115082888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210993629.5A CN115082888B (en) 2022-08-18 2022-08-18 Lane line detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210993629.5A CN115082888B (en) 2022-08-18 2022-08-18 Lane line detection method and device

Publications (2)

Publication Number Publication Date
CN115082888A CN115082888A (en) 2022-09-20
CN115082888B true CN115082888B (en) 2022-10-25

Family

ID=83244319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210993629.5A Active CN115082888B (en) 2022-08-18 2022-08-18 Lane line detection method and device

Country Status (1)

Country Link
CN (1) CN115082888B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229406B (en) * 2023-05-09 2023-08-25 华东交通大学 Lane line detection method, system, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11449061B2 (en) * 2016-02-29 2022-09-20 AI Incorporated Obstacle recognition method for autonomous robots
CN108875603B (en) * 2018-05-31 2021-06-04 上海商汤智能科技有限公司 Intelligent driving control method and device based on lane line and electronic equipment
CN110084095B (en) * 2019-03-12 2022-03-25 浙江大华技术股份有限公司 Lane line detection method, lane line detection apparatus, and computer storage medium
CN112560717B (en) * 2020-12-21 2023-04-21 青岛科技大学 Lane line detection method based on deep learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009524A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of method for detecting lane lines based on full convolutional network
CN108734105A (en) * 2018-04-20 2018-11-02 东软集团股份有限公司 Method for detecting lane lines, device, storage medium and electronic equipment
CN109829365A (en) * 2018-12-20 2019-05-31 南京理工大学 More scenes based on machine vision adapt to drive the method for early warning that deviates and turn
WO2021063228A1 (en) * 2019-09-30 2021-04-08 上海商汤临港智能科技有限公司 Dashed lane line detection method and device, and electronic apparatus
CN111582083A (en) * 2020-04-25 2020-08-25 华南理工大学 Lane line detection method based on vanishing point estimation and semantic segmentation
WO2022126377A1 (en) * 2020-12-15 2022-06-23 中国科学院深圳先进技术研究院 Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN114511832A (en) * 2022-04-21 2022-05-17 深圳比特微电子科技有限公司 Lane line analysis method and device, electronic device and storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Lane departure warning systems and lane line detection methods based on image processing and semantic segmentation: A review; Weiwei Chen et al.; 《Journal of Traffic and Transportation Engineering》; 2020-12-31; vol. 07, no. 06; full text *
Real-time line detection through an improved Hough transform voting scheme; Leandro A. F. Fernandes et al.; 《Pattern Recognition》; 2008-01-31; vol. 41, no. 01; full text *
Tensor voting based road lane recognition algorithm with geometric constraints for lane departure warning system; Hanbing Wei et al.; 《Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering》; 2018-01-23; vol. 233, no. 04; full text *
Red-header document recognition algorithm based on Hough transform; Zhang Yu et al.; 《计算机与数字工程》 (Computer and Digital Engineering); 2019-06-20; vol. 47, no. 06; full text *
Improved lane line detection algorithm based on OpenCV; Zhang Luyao et al.; 《信息技术与信息化》 (Information Technology and Informatization); 2019-12-25; no. 12; full text *
Research on video-based detection of vehicles crossing the lines of parking spaces; Zhu Guo'an et al.; 《内江科技》 (Neijiang Science and Technology); 2020-01-25; no. 01; full text *
Gap detection algorithm for tube-to-tubesheet weld seam images based on Hough transform; Wang Yu; 《软件》 (Software); 2017-05-15; no. 05; full text *

Also Published As

Publication number Publication date
CN115082888A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN110414387B (en) Lane line multi-task learning detection method based on road segmentation
US10984659B2 (en) Vehicle parking availability map systems and methods
CN110598512B (en) Parking space detection method and device
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN110008833B (en) Target ship detection method based on optical remote sensing image
CN116645592B (en) Crack detection method based on image processing and storage medium
CN113435240A (en) End-to-end table detection and structure identification method and system
CN115082888B (en) Lane line detection method and device
CN111259710B (en) Parking space structure detection model training method adopting parking space frame lines and end points
CN113901972A (en) Method, device and equipment for detecting remote sensing image building and storage medium
CN111400572A (en) Content safety monitoring system and method for realizing image feature recognition based on convolutional neural network
CN115937552A (en) Image matching method based on fusion of manual features and depth features
CN114037640A (en) Image generation method and device
CN114782787A (en) Processing method and device for carrying out feature fusion on point cloud and image data
CN115100652A (en) Electronic map automatic generation method based on high-resolution remote sensing image
CN110210467A (en) A kind of formula localization method, image processing apparatus, the storage medium of text image
CN113191204A (en) Multi-scale blocking pedestrian detection method and system
EP3764335A1 (en) Vehicle parking availability map systems and methods
JP2021184141A (en) Code decoding device, code decoding method, and program
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
CN111753573B (en) Two-dimensional code image recognition method and device, electronic equipment and readable storage medium
CN114565764A (en) Port panorama sensing system based on ship instance segmentation
CN114332814A (en) Parking frame identification method and device, electronic equipment and storage medium
CN109871910A (en) A kind of hand-written character recognition method and device
CN111191759A (en) Two-dimensional code generation method and positioning and decoding method based on GPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant