CN114463724A - Lane extraction and recognition method based on machine vision - Google Patents

Lane extraction and recognition method based on machine vision

Info

Publication number
CN114463724A
Authority
CN
China
Prior art keywords
lane
extraction
vehicle
network
machine vision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210370999.3A
Other languages
Chinese (zh)
Inventor
倪树新 (Ni Shuxin)
陈启光 (Chen Qiguang)
李宇盛 (Li Yusheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Huizhu Information Technology Research Institute Co ltd
Original Assignee
Nanjing Huizhu Information Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Huizhu Information Technology Research Institute Co ltd filed Critical Nanjing Huizhu Information Technology Research Institute Co ltd
Priority to CN202210370999.3A
Publication of CN114463724A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The invention provides a machine-vision-based lane extraction and recognition method comprising the following steps: S1, vehicle detection: detect vehicles with the mature YOLOv5 network; S2, vehicle trajectory extraction: fit the center coordinate information of the target boxes detected in step S1 into vehicle trajectory information; S3, automatic lane extraction: use the vehicle trajectories generated in step S2 as the initial condition of a graph cut algorithm to complete automatic lane extraction. The lane extraction and recognition method based on machine vision has the advantage of extracting lanes directly and automatically.

Description

Lane extraction and recognition method based on machine vision
Technical Field
The invention relates to the technical field of lane extraction, and in particular to a lane extraction and recognition method based on machine vision.
Background
Lane extraction in complex scenes is an important application of computer vision and plays an important role in traffic management, urban planning, lane monitoring and the like, for example in traffic flow statistics, detection and recognition of roadside signboards, vehicle detection and tracking, and detection of abnormal falling objects on lanes. Extracting the lane limits the range from which information must be extracted and lays a foundation for detecting abnormal behavior on the lane. In recent years, lane extraction has widely used deep learning algorithms, but these are limited by the size of lane sample sets: if a class of objects is absent from the training data set, the deep convolutional neural network does not learn its basic features, so extraction performance for that class is poor, and the amount of computation is also large. The threshold method is one of the most classical image segmentation algorithms; exploiting differences in gray values, it can extract the lane area through manual or adaptive threshold selection.
At present, methods such as the threshold method are suitable only for images in which the region of interest and the background have a clear boundary and the background pixels are simple, and are not suitable for lane extraction in complex scenes. A graph cut algorithm can search for and specify seed points in the region of interest of the lane data and obtain a good extraction result by defining corresponding "foreground" and "background", but it requires manual interaction, which is time-consuming and labor-intensive.
Therefore, it is necessary to provide a lane extraction and recognition method based on machine vision to solve the above technical problems.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a lane extraction and recognition method based on machine vision that can extract lanes directly and automatically.
The lane extraction and recognition method based on machine vision provided by the invention comprises the following steps:
S1, vehicle detection: complete the detection of vehicles using the mature YOLOv5 network;
S2, vehicle trajectory extraction: fit the center coordinate information of the target boxes detected in step S1 into vehicle trajectory information;
S3, automatic lane extraction: use the vehicle trajectories generated in step S2 as the initial condition of the graph cut algorithm to complete automatic lane extraction.
In order to achieve the effects of completing model training with a large amount of data, giving the model stronger generalization ability and robustness, and suiting it to complex and changeable real scenes, YOLO in YOLOv5 is used: a target detection algorithm based on a convolutional neural network that can complete model training with a large amount of data. The network model of the YOLO vehicle detection algorithm is divided into four parts: an input end, a feature extraction network, a multi-scale feature fusion module and a prediction output end.
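As an illustrative sketch of step S1 (not part of the patent text): the open-source YOLOv5 release can be loaded through PyTorch Hub and restricted to vehicle classes, with no further training. The file name frame.jpg and the yolov5s variant are placeholders.

```python
import torch

# Load the open-source pretrained YOLOv5 model (no complex training needed)
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.classes = [2, 5, 7]  # keep only COCO vehicle classes: car, bus, truck

results = model("frame.jpg")  # one camera frame (placeholder path)
# Each detection row: x1, y1, x2, y2, confidence, class id
for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
    w, h = x2 - x1, y2 - y1
    print(f"vehicle box at ({x1:.0f}, {y1:.0f}), size {w:.0f}x{h:.0f}, conf {conf:.2f}")
```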
In order to conveniently enrich the data set and increase the generality of the network, the input end comprises three parts: data enhancement, adaptive anchor box calculation and adaptive picture scaling. Pictures are spliced by random cropping, random scaling and random arrangement, which enriches the data set and increases the generality of the network; the adaptive anchor box calculation adaptively computes the optimal anchor box values for different training sets; the adaptive picture scaling unifies images to a standard size.
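The adaptive picture scaling can be sketched as follows; the 640-pixel standard size and the padding value 114 are assumptions for illustration, not values fixed by the patent.

```python
import cv2
import numpy as np

def letterbox(img, new_size=640, pad_value=114):
    """Adaptive picture scaling: resize with unchanged aspect ratio and pad
    the remainder so that every image reaches the same standard size."""
    h, w = img.shape[:2]
    r = new_size / max(h, w)
    nh, nw = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((new_size, new_size, 3), pad_value, dtype=np.uint8)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```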
In order to facilitate feature extraction, in the feature extraction network the Focus structure slices the input image, and a 304 × 304 × 32 feature map is obtained after 32 convolution kernels; YOLOv5 borrows the CSP structure from the YOLOv4 backbone network.
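A minimal sketch of the Focus slicing; the 608 × 608 input resolution is assumed so that the output matches the 304 × 304 × 32 feature map named above, and the BatchNorm/SiLU choices follow the open-source YOLOv5 rather than anything stated here.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into four pixel-interleaved halves, concatenate them
    along the channel axis, then apply the 32 convolution kernels."""
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))

print(Focus()(torch.randn(1, 3, 608, 608)).shape)  # torch.Size([1, 32, 304, 304])
```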
In order to conveniently fuse feature information, the multi-scale feature fusion module uses an FPN (feature pyramid network) plus PAN (path aggregation network) structure: the FPN passes and fuses high-level feature information from top to bottom to transfer strong semantic features, while the PAN passes strong localization features from bottom to top; using both structures together strengthens the feature fusion capability of the network.
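A toy sketch of the top-down plus bottom-up fusion; the unified channel width and the simple additive fusion are assumptions for clarity (YOLOv5 itself fuses by concatenation through CSP blocks).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FpnPanNeck(nn.Module):
    """Minimal FPN (top-down) + PAN (bottom-up) fusion over three scales."""
    def __init__(self, c3, c4, c5, c=128):
        super().__init__()
        self.l3 = nn.Conv2d(c3, c, 1)  # lateral 1x1 convs unify channel widths
        self.l4 = nn.Conv2d(c4, c, 1)
        self.l5 = nn.Conv2d(c5, c, 1)
        self.down3 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # bottom-up downsampling
        self.down4 = nn.Conv2d(c, c, 3, stride=2, padding=1)

    def forward(self, p3, p4, p5):
        # FPN: top-down pass carries strong semantic features downward
        t5 = self.l5(p5)
        t4 = self.l4(p4) + F.interpolate(t5, scale_factor=2, mode="nearest")
        t3 = self.l3(p3) + F.interpolate(t4, scale_factor=2, mode="nearest")
        # PAN: bottom-up pass carries strong localization features upward
        o3 = t3
        o4 = t4 + self.down3(o3)
        o5 = t5 + self.down4(o4)
        return o3, o4, o5

# p3/p4/p5 stand for backbone maps at strides 8/16/32 (sizes assumed)
p3, p4, p5 = (torch.randn(1, c, s, s) for c, s in [(64, 76), (128, 38), (256, 19)])
print([o.shape for o in FpnPanNeck(64, 128, 256)(p3, p4, p5)])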
In order to conveniently measure target detection accuracy, the prediction output end comprises a bounding box loss function and non-maximum suppression; the loss calculation of the YOLO series is based on confidence loss, classification loss and localization loss.
In order to conveniently extract the trajectory information of the vehicle, the vehicle trajectory extraction in step S2 comprises obtaining the center point of the target box and determining the motion trajectory. To obtain the center point, the target box is selected according to the obtained vehicle ID number, and the position information of the vehicle target box is recorded as x, y, w and h, where x and y are the coordinates of the upper-left corner of the box and w and h are its width and height; the center coordinates of the target box are then (x + w/2, y + h/2).
In order to conveniently determine the motion trajectory of the vehicle, the determination of the motion trajectory comprises: tracking each frame to obtain the center coordinates while building a new point-set list, and putting the position information of consecutive frames of the same vehicle target into the corresponding point-set list, thereby building a database of position information for multiple vehicles; finally, the position center points are fitted into a motion trajectory by an algorithm.
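A sketch of the point-set lists described above; the tracker supplying the vehicle ID numbers is assumed to exist, and the polynomial fit stands in for the unspecified fitting algorithm.

```python
from collections import defaultdict
import numpy as np

# track_id -> ordered list of target-box centres, one entry per frame
trajectories = defaultdict(list)

def update_trajectories(frame_detections):
    """frame_detections: iterable of (track_id, x, y, w, h) for one frame,
    with (x, y) the upper-left corner of the target box."""
    for track_id, x, y, w, h in frame_detections:
        cx, cy = x + w / 2.0, y + h / 2.0  # centre coordinates of the box
        trajectories[track_id].append((cx, cy))

def fit_track(points, degree=2):
    """Fit the stored centre points of one vehicle into a smooth motion
    trajectory (a polynomial fit is one possible choice of algorithm)."""
    pts = np.asarray(points, dtype=float)
    return np.poly1d(np.polyfit(pts[:, 0], pts[:, 1], degree))
```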
In order to conveniently extract lane information, the automatic lane extraction in step S3 combines the motion trajectories with the graph cut algorithm: the obtained motion trajectory information is processed, and optimal trajectories are selected to initialize the lane and background seed points of the graph cut algorithm, thereby realizing automatic extraction of the lane area.
In order to conveniently and automatically extract lane information, the automatic lane extraction selects optimal vehicle trajectories. The length of each vehicle trajectory is calculated mathematically from its center coordinates (x + w/2, y + h/2) in the first frame i and the last frame j. The lengths of the trajectories are averaged, shorter trajectories are eliminated by comparison with the average, and all retained trajectories are traversed to find the longest, leftmost and rightmost trajectories as the optimal trajectories. To initialize the graph cut algorithm, the three selected optimal trajectories are translated by an appropriate amount into the non-lane area; the longest trajectory is marked as the "lane" seed points and the translated trajectories are defined as background seed points, i.e. Long-tracklist is the lane pixel subset and Left-tracklist and Right-tracklist are the background pixel subsets. On the basis of the initial seed points, graph cut builds a weighted graph of the similarity between each pixel and the foreground/background, iterates continuously, computes the global optimum of the minimum cut, and extracts the lane automatically.
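A sketch of the trajectory selection; reading "comparison with the average" as keeping tracks at or above the mean length is an assumption, and leftmost/rightmost are judged by horizontal image coordinate.

```python
import math

def select_optimal_tracks(trajectories):
    """trajectories: dict of track_id -> list of (cx, cy) centres.
    Drops tracks shorter than the mean length, then returns the longest,
    leftmost and rightmost survivors (Long/Left/Right-tracklist)."""
    lengths = {tid: math.dist(pts[0], pts[-1])  # first-frame vs last-frame centre
               for tid, pts in trajectories.items() if len(pts) >= 2}
    mean_len = sum(lengths.values()) / len(lengths)
    kept = {tid: trajectories[tid]
            for tid, length in lengths.items() if length >= mean_len}
    longest = max(kept, key=lambda t: lengths[t])
    leftmost = min(kept, key=lambda t: min(p[0] for p in kept[t]))
    rightmost = max(kept, key=lambda t: max(p[0] for p in kept[t]))
    return kept[longest], kept[leftmost], kept[rightmost]
```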
Compared with the related art, the lane extraction and recognition method based on machine vision provided by the invention has the following beneficial effects:
1. The invention provides an automatic lane extraction algorithm combining YOLOv5 vehicle detection with a graph cut method: the vehicle position information detected by YOLOv5 is fitted into motion trajectories, the graph cut algorithm is initialized with this trajectory information, and the final extraction result is obtained. The vehicle position information comes from the open-source YOLOv5 network model, so accurate vehicle detection is achieved without a complex training process, and the algorithm can effectively and automatically extract the lane area in a scene with high accuracy and robustness. This overcomes the problems that methods such as thresholding are suitable only for images in which the region of interest and the background have a clear boundary and the background pixels are simple, and so do not suit lane extraction in complex scenes, and that a graph cut algorithm, although able to search for and specify seed points in the region of interest of the lane data and obtain a better extraction result by defining corresponding "foreground" and "background", requires time-consuming and labor-intensive manual interaction;
2. The invention completes automatic lane extraction by using the generated vehicle trajectories as the initial condition of the graph cut algorithm. The automatic lane extraction selects optimal vehicle trajectories: the length of each vehicle trajectory is calculated mathematically from its center coordinates (x + w/2, y + h/2) in the first frame i and the last frame j; the lengths of the trajectories are averaged and shorter trajectories are removed by comparison with the average; all retained trajectories are traversed to find the longest, leftmost and rightmost trajectories as the optimal trajectories. To initialize the graph cut algorithm, the three selected optimal trajectories are translated by an appropriate amount into the non-lane area; the longest trajectory is marked as the "lane" seed points and the translated trajectories are defined as background seed points, i.e. Long-tracklist is the lane pixel subset and Left-tracklist and Right-tracklist are the background pixel subsets. On the basis of the initial seed points, graph cut builds a weighted graph of the similarity between each pixel and the foreground/background, iterates continuously, computes the global optimum of the minimum cut, and extracts the lane automatically.
Drawings
Fig. 1 is a flowchart of a lane extraction and identification method based on machine vision according to a preferred embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
Please refer to fig. 1, which is a flowchart of the lane extraction and recognition method based on machine vision according to a preferred embodiment of the present invention. The method comprises the following steps:
S1, vehicle detection: complete the detection of vehicles using the mature YOLOv5 network;
S2, vehicle trajectory extraction: fit the center coordinate information of the target boxes detected in step S1 into vehicle trajectory information;
S3, automatic lane extraction: use the vehicle trajectories generated in step S2 as the initial condition of the graph cut algorithm to complete automatic lane extraction.
In a specific implementation process, as shown in fig. 1, YOLO in YOLOv5 is a target detection algorithm based on a convolutional neural network that can complete model training with a large amount of data; the network model of the YOLO vehicle detection algorithm is divided into four parts: an input end, a feature extraction network, a multi-scale feature fusion module and a prediction output end.
The input end comprises three parts: data enhancement, adaptive anchor box calculation and adaptive picture scaling. Pictures are spliced by random cropping, random scaling and random arrangement, which enriches the data set and increases the generality of the network; the adaptive anchor box calculation adaptively computes the optimal anchor box values for different training sets; the adaptive picture scaling unifies images to a standard size.
In the feature extraction network, the Focus structure slices the input image, and a 304 × 304 × 32 feature map is obtained after 32 convolution kernels; YOLOv5 borrows the CSP structure from the YOLOv4 backbone network.
The multi-scale feature fusion module uses an FPN (feature pyramid network) plus PAN (path aggregation network) structure: the FPN passes and fuses high-level feature information from top to bottom to transfer strong semantic features, the PAN passes strong localization features from bottom to top, and using both together strengthens the feature fusion capability of the network.
The prediction output end comprises a bounding box loss function and non-maximum suppression; the loss calculation of the YOLO series is based on confidence loss, classification loss and localization loss.
It should be noted that: with the input end, feature extraction network, multi-scale feature fusion module and prediction output end configured as described above, GIoU Loss is used as the bounding box loss. GIoU is a distance metric whose inputs are the coordinates of the predicted box and the real box and whose output is the loss value; it evaluates the position loss between the target box and the predicted box and thereby measures target detection accuracy. The video collected by a camera serves as the input of the YOLOv5 network, the vehicle target information in the scene is extracted, and the target positions are calibrated with detection boxes.
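A minimal sketch of the GIoU loss described above, assuming axis-aligned boxes in (x1, y1, x2, y2) form; it returns 1 - GIoU so that better-aligned boxes give lower loss.

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    """GIoU loss for boxes in (x1, y1, x2, y2) format; returns 1 - GIoU."""
    # Intersection of predicted and real boxes
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    # Union
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter + eps
    iou = inter / union
    # Smallest enclosing box (the GIoU penalty term)
    cx1 = torch.min(pred[..., 0], target[..., 0])
    cy1 = torch.min(pred[..., 1], target[..., 1])
    cx2 = torch.max(pred[..., 2], target[..., 2])
    cy2 = torch.max(pred[..., 3], target[..., 3])
    c_area = (cx2 - cx1) * (cy2 - cy1) + eps
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```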
Referring to fig. 1, the vehicle trajectory extraction in step S2 comprises obtaining the center point of the target box and determining the motion trajectory. To obtain the center point, the target box is selected according to the obtained vehicle ID number, and the position information of the vehicle target box is recorded as x, y, w and h, where x and y are the coordinates of the upper-left corner of the box and w and h are its width and height; the center coordinates of the target box are then (x + w/2, y + h/2).
Determination of the motion trajectory: each frame is tracked to obtain the center coordinates while a new point-set list is built, and the position information of consecutive frames of the same vehicle target is put into the corresponding point-set list, thereby building a database of position information for multiple vehicles; finally, the position center points are fitted into a motion trajectory by an algorithm.
It should be noted that: recording the target box position as x, y, w and h and taking (x + w/2, y + h/2) as its center makes it convenient to obtain the center point information of the target box, to build the database of position information for multiple vehicles, and finally to fit the position center points into motion trajectories by an algorithm.
Referring to fig. 1, the automatic lane extraction in step S3 combines the motion trajectories with the graph cut algorithm: the obtained motion trajectory information is processed, and optimal trajectories are selected to initialize the lane and background seed points of the graph cut algorithm, thereby realizing automatic extraction of the lane area.
The automatic lane extraction selects optimal vehicle trajectories. The length of each vehicle trajectory is calculated mathematically from its center coordinates (x + w/2, y + h/2) in the first frame i and the last frame j; the lengths of the trajectories are averaged, shorter trajectories are eliminated by comparison with the average, and all retained trajectories are traversed to find the longest, leftmost and rightmost trajectories as the optimal trajectories. To initialize the graph cut algorithm, the three selected optimal trajectories are translated by an appropriate amount into the non-lane area; the longest trajectory is marked as the "lane" seed points and the translated trajectories are defined as background seed points, i.e. Long-tracklist is the lane pixel subset and Left-tracklist and Right-tracklist are the background pixel subsets. On the basis of the initial seed points, graph cut builds a weighted graph of the similarity between each pixel and the foreground/background, iterates continuously, computes the global optimum of the minimum cut, and extracts the lane automatically.
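A sketch of the trajectory-seeded graph cut; OpenCV's GrabCut is used here as the graph-cut solver, and the translation amount, stroke thickness and iteration count are assumptions.

```python
import cv2
import numpy as np

def extract_lane(img, long_track, left_track, right_track, shift=80, iters=5):
    """Seed a graph-cut segmentation from vehicle tracks: the longest track
    marks "lane" pixels, the side tracks (translated into non-lane area by
    `shift` pixels) mark background pixels."""
    mask = np.full(img.shape[:2], cv2.GC_PR_BGD, np.uint8)

    def draw(track, dx, label):
        pts = np.int32([(x + dx, y) for x, y in track]).reshape(-1, 1, 2)
        cv2.polylines(mask, [pts], False, int(label), thickness=15)

    draw(long_track, 0, cv2.GC_FGD)        # Long-tracklist: lane seed points
    draw(left_track, -shift, cv2.GC_BGD)   # Left-tracklist: background seeds
    draw(right_track, shift, cv2.GC_BGD)   # Right-tracklist: background seeds

    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, None, bgd, fgd, iters, cv2.GC_INIT_WITH_MASK)
    lane = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return (lane * 255).astype(np.uint8)
```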
It should be noted that: eliminating shorter trajectories against the average length and keeping the three longest, leftmost and rightmost trajectories as the optimal ones makes it convenient to compute the global optimum of the minimum cut and to extract the lane automatically.
It should be noted that: the scheme provided by the application not only realizes automatic lane extraction but also achieves a good lane extraction effect in complex environments. In order to evaluate the segmentation performance, four metrics are adopted:
(1) Intersection over union (IoU): the ratio of the overlap area between the segmentation result and the real label to the area of their union:
IoU = |A ∩ B| / |A ∪ B|
(2) Dice coefficient: a set-similarity metric, generally used to calculate the similarity of two samples, with value range [0, 1], where 1 indicates maximum similarity between prediction and ground truth:
Dice = 2|A ∩ B| / (|A| + |B|)
(3) Pixel accuracy P (precision): the percentage of correctly classified pixels in the image, that is, the proportion of correctly classified pixels to the total number of pixels N:
P = (|A ∩ B| + |Aᶜ ∩ Bᶜ|) / N
(4) Relative volume error (RVE): the error between the segmentation map and the real labels; a smaller RVE indicates better algorithm performance:
RVE = | |A| - |B| | / |B|
where A is the background set produced by the segmentation algorithm and B is the real background set from manual segmentation.
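The four metrics can be computed on boolean masks as follows (a sketch; pred and truth play the roles of the sets A and B above):

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """pred, truth: boolean masks, i.e. the algorithm's segmentation (A)
    and the manually labelled ground truth (B)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return {
        "IoU":  inter / union,
        "Dice": 2 * inter / (pred.sum() + truth.sum()),
        "P":    (pred == truth).mean(),                         # pixel accuracy
        "RVE":  abs(int(pred.sum()) - int(truth.sum())) / truth.sum(),
    }
```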
Compared with the segmentation results of GrabCut and the threshold method, which show low pixel accuracy and high relative volume error (the pixel accuracy of threshold segmentation in particular falls below 0.5, which cannot meet the requirements of road extraction), the segmentation results of the algorithm of this application are relatively accurate and stable: its accuracy is better and its effect more stable. The segmentation results of the different algorithms are compared in Table 1.
[Table 1: comparison of the segmentation results of the different algorithms]
The working principle of the lane extraction and recognition method based on machine vision provided by the invention is as follows:
In use, vehicle detection is first completed with the mature YOLOv5 network. YOLO is a target detection algorithm based on a convolutional neural network that can complete model training with a large amount of data; its network model is divided into an input end, a feature extraction network, a multi-scale feature fusion module and a prediction output end. The input end comprises data enhancement, adaptive anchor box calculation and adaptive picture scaling: pictures are spliced by random cropping, random scaling and random arrangement, enriching the data set and increasing the generality of the network; the adaptive anchor box calculation adaptively computes the optimal anchor box values for different training sets; the adaptive picture scaling unifies images to a standard size. In the feature extraction network, the Focus structure slices the input image, and a 304 × 304 × 32 feature map is obtained after 32 convolution kernels; YOLOv5 borrows the CSP structure from the YOLOv4 backbone network. The multi-scale feature fusion module uses an FPN (feature pyramid network) plus PAN (path aggregation network) structure: the FPN passes and fuses high-level feature information from top to bottom to transfer strong semantic features, the PAN passes strong localization features from bottom to top, and using both together strengthens the feature fusion capability of the network. The prediction output end comprises a bounding box loss function and non-maximum suppression, with the loss calculation of the YOLO series based on confidence loss, classification loss and localization loss. Next, the center coordinate information of the detected target boxes is fitted into vehicle trajectory information. Finally, the generated vehicle trajectories are used as the initial condition of the graph cut algorithm to complete automatic lane extraction: the length of each vehicle trajectory is calculated mathematically from its center coordinates (x + w/2, y + h/2) in the first frame i and the last frame j; the lengths are averaged and shorter trajectories are eliminated by comparison with the average; all retained trajectories are traversed to find the longest, leftmost and rightmost trajectories as the optimal trajectories. The three selected optimal trajectories are translated by an appropriate amount into the non-lane area to initialize the graph cut algorithm: the longest trajectory is marked as the "lane" seed points and the translated trajectories are defined as background seed points, i.e. Long-tracklist is the lane pixel subset and Left-tracklist and Right-tracklist are the background pixel subsets. On the basis of the initial seed points, graph cut builds a weighted graph of the similarity between each pixel and the foreground/background, iterates continuously, computes the global optimum of the minimum cut, and extracts the lane automatically.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A lane extraction and recognition method based on machine vision, characterized by comprising the following steps:
S1, vehicle detection: complete the detection of vehicles using the mature YOLOv5 network;
S2, vehicle trajectory extraction: fit the center coordinate information of the target boxes detected in step S1 into vehicle trajectory information;
S3, automatic lane extraction: use the vehicle trajectories generated in step S2 as the initial condition of the graph cut algorithm to complete automatic lane extraction.
2. The machine-vision-based lane extraction and recognition method according to claim 1, wherein YOLO in YOLOv5 is a target detection algorithm based on a convolutional neural network capable of completing model training with a large amount of data, and the network model of the YOLO vehicle detection algorithm is divided into four parts: an input end, a feature extraction network, a multi-scale feature fusion module and a prediction output end.
3. The machine-vision-based lane extraction and recognition method according to claim 2, wherein the input end comprises three parts: data enhancement, adaptive anchor box calculation and adaptive picture scaling; pictures are spliced by random cropping, random scaling and random arrangement, which enriches the data set and increases the generality of the network; the adaptive anchor box calculation adaptively computes the optimal anchor box values for different training sets; and the adaptive picture scaling unifies images to a standard size.
4. The machine-vision-based lane extraction and recognition method according to claim 2, wherein in the feature extraction network the Focus structure slices the input image, a 304 × 304 × 32 feature map is obtained after 32 convolution kernels, and YOLOv5 borrows the CSP structure from the YOLOv4 backbone network.
5. The machine-vision-based lane extraction and recognition method according to claim 2, wherein the multi-scale feature fusion module uses an FPN (feature pyramid network) plus PAN (path aggregation network) structure, wherein the FPN passes and fuses high-level feature information from top to bottom to transfer strong semantic features, the PAN passes strong localization features from bottom to top, and using both together strengthens the feature fusion capability of the network.
6. The machine-vision-based lane extraction and recognition method according to claim 2, wherein the prediction output end comprises a bounding box loss function and non-maximum suppression, and the loss calculation of the YOLO series is based on confidence loss, classification loss and localization loss.
7. The machine-vision-based lane extraction and recognition method according to claim 1, wherein the vehicle trajectory extraction in step S2 comprises obtaining the center point of the target box and determining the motion trajectory; to obtain the center point, the target box is selected according to the obtained vehicle ID number, and the position information of the vehicle target box is recorded as x, y, w and h, where x and y are the coordinates of the upper-left corner of the box and w and h are its width and height, so that the center coordinates of the target box are (x + w/2, y + h/2).
8. The machine-vision-based lane extraction and recognition method according to claim 7, wherein the determination of the motion trajectory comprises: tracking each frame to obtain the center coordinates while building a new point-set list, and putting the position information of consecutive frames of the same vehicle target into the corresponding point-set list, thereby building a database of position information for multiple vehicles; finally, the position center points are fitted into a motion trajectory by an algorithm.
9. The machine-vision-based lane extraction and recognition method according to claim 1, wherein the automatic lane extraction in step S3 combines the motion trajectories with the graph cut algorithm: the obtained motion trajectory information is processed, and optimal trajectories are selected to initialize the lane and background seed points of the graph cut algorithm, thereby realizing automatic extraction of the lane area.
10. The machine-vision-based lane extraction and recognition method according to claim 1, wherein the automatic lane extraction selects optimal vehicle trajectories: the length of each vehicle trajectory is calculated mathematically from its center coordinates (x + w/2, y + h/2) in the first frame i and the last frame j; the lengths of the trajectories are averaged and shorter trajectories are eliminated by comparison with the average; all retained trajectories are traversed to find the longest, leftmost and rightmost trajectories as the optimal trajectories; to initialize the graph cut algorithm, the three selected optimal trajectories are translated by an appropriate amount into the non-lane area, the longest trajectory is marked as the "lane" seed points and the translated trajectories are defined as background seed points, i.e. Long-tracklist is the lane pixel subset and Left-tracklist and Right-tracklist are the background pixel subsets; on the basis of the initial seed points, graph cut builds a weighted graph of the similarity between each pixel and the foreground/background, iterates continuously, computes the global optimum of the minimum cut, and extracts the lane automatically.
CN202210370999.3A 2022-04-11 2022-04-11 Lane extraction and recognition method based on machine vision Pending CN114463724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210370999.3A CN114463724A (en) 2022-04-11 2022-04-11 Lane extraction and recognition method based on machine vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210370999.3A CN114463724A (en) 2022-04-11 2022-04-11 Lane extraction and recognition method based on machine vision

Publications (1)

Publication Number Publication Date
CN114463724A (en) 2022-05-10

Family

ID=81417155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210370999.3A Pending CN114463724A (en) 2022-04-11 2022-04-11 Lane extraction and recognition method based on machine vision

Country Status (1)

Country Link
CN (1) CN114463724A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060210116A1 (en) * 2005-03-18 2006-09-21 Honda Elesys Co., Ltd. Lane recognition apparatus
CN103903019A (en) * 2014-04-11 2014-07-02 北京工业大学 Automatic generating method for multi-lane vehicle track space-time diagram
CN105631880A (en) * 2015-12-31 2016-06-01 百度在线网络技术(北京)有限公司 Lane line segmentation method and apparatus
CN109325386A (en) * 2017-07-31 2019-02-12 株式会社理光 Method for detecting lane lines, equipment and computer readable storage medium
CN108230254A (en) * 2017-08-31 2018-06-29 北京同方软件股份有限公司 A kind of full lane line automatic testing method of the high-speed transit of adaptive scene switching
CN109871752A (en) * 2019-01-04 2019-06-11 北京航空航天大学 A method of lane line is extracted based on monitor video detection wagon flow
CN111273305A (en) * 2020-02-18 2020-06-12 中国科学院合肥物质科学研究院 Multi-sensor fusion road extraction and indexing method based on global and local grid maps
CN113221716A (en) * 2021-05-06 2021-08-06 西华大学 Unsupervised traffic abnormal behavior detection method based on foreground object detection

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998744A (en) * 2022-07-18 2022-09-02 中国农业大学 Agricultural machinery track field segmentation method based on motion and vision dual-feature fusion
CN115641763A (en) * 2022-09-12 2023-01-24 中南迅智科技有限公司 Memory recitation auxiliary system
CN115641763B (en) * 2022-09-12 2023-12-19 中南迅智科技有限公司 Memory recitation auxiliary system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220510)