CN115620259A - Lane line detection method based on traffic off-site law enforcement scene - Google Patents

Lane line detection method based on traffic off-site law enforcement scene

Info

Publication number
CN115620259A
CN115620259A (application number CN202211099628.2A)
Authority
CN
China
Prior art keywords
lane line
image
lane
pixel point
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211099628.2A
Other languages
Chinese (zh)
Inventor
李万清
陈超强
林永杰
刘俊
袁友伟
陈鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211099628.2A priority Critical patent/CN115620259A/en
Publication of CN115620259A publication Critical patent/CN115620259A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 — Scenes; Scene-specific elements
    • G06V 20/50 — Context or environment of the image
    • G06V 20/56 — Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 — Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 — Complex mathematical operations
    • G06F 17/18 — Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/20 — Image preprocessing
    • G06V 10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lane line detection method based on a traffic off-site law enforcement scene. The method builds a multi-feedback feature pyramid module from ResNet50 and FPN, then uses pixel-level anchor-free regression combined with hierarchical prediction and centrality calculation to filter out low-quality, meaningless regression boxes, improving the lane line prediction rate and obtaining the coordinates of the rectangles enclosing the lane lines; the final lane line detection result in the image to be detected is then obtained through spatial position relation filtering, clustering and straight line fitting in sequence. The invention detects lane lines well in various complex environments such as shadow, uneven illumination, multi-vehicle interference, road surface rainwater, dirt and reflection.

Description

Lane line detection method based on traffic off-site law enforcement scene
Technical Field
The invention belongs to the field of image processing, and particularly relates to a lane line detection method based on a traffic off-site law enforcement scene.
Background
The illegal behavior corresponding to lane lines in a traffic off-site law enforcement scene is illegal lane changing, and judging this violation is an important part of artificial-intelligence-assisted adjudication. To judge an illegal lane change, the system must know the relationship between the vehicle's position and the lane line and whether the vehicle moves from one side of the lane line to the other, so the position of the lane line must be known. Illegal lane changes seriously affect the normal driving of other vehicles; such violations are numerous and are captured by many law enforcement cameras, and every additional camera means additional labeling workload, so lane line detection in the traffic off-site scene is extremely necessary.
Currently, most research on lane line detection targets autonomous driving scenes, and research on lane line detection in traffic off-site scenes is very rare. However, the two scenes differ greatly, and methods designed for autonomous driving cannot be applied directly to the traffic off-site scene, so a dedicated algorithm needs to be developed for it.
In summary, the lane line detection under the traffic off-site law enforcement scene has several difficulties as follows:
(1) Passing vehicles can occlude the lane lines; when large vehicles pass, the lane lines may be completely invisible.
(2) The lane line is a slender structure with few distinctive features and is easily confused with other objects.
(3) The actual traffic conditions are very complex, such as shadows and uneven illumination, road surface rainwater, stains, light reflection and the like, which all interfere with the detection of lane lines.
How to solve the problem of lane line detection in traffic off-site scenes is a technical problem to be solved urgently at present.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a lane line detection method based on a traffic off-site law enforcement scene. The invention adopts the following specific technical scheme:
a lane line detection method based on traffic off-site law enforcement scene comprises the following steps:
s1, obtaining a marked training data set, wherein each image sample comprises an image which is shot by a law enforcement camera and contains lane lines, endpoints at two ends of a center line of each lane line and lane line deviation are marked in the image in advance, and the lane line deviation is used for judging whether the lane lines are in main diagonal lines or auxiliary diagonal lines in an outer rectangular frame of the lane lines; the image samples in the training data set belong to different intersection scenes, and all the image samples are divided into a day image subset shot in the day and a night image subset shot at night according to shooting time;
s2, aiming at each image sample in the training data set, integrating two retention principles of less vehicles and more priority of daytime images in preference to nighttime images, screening and filtering all image samples under the same intersection scene by combining the gray value of the images and the number of the vehicles in the images, and respectively rejecting the image samples exceeding the threshold number aiming at each intersection scene;
s3, training a lane line detection network by using the training data set filtered by the S2 with the minimum loss function as a target;
the lane line detection network comprises an input module, a multi-feedback characteristic pyramid module, a classification regression module and an output module;
the input module is used for inputting the original image of the lane line to be detected into the network;
the multi-feedback characteristic pyramid module is formed by cascading a plurality of ResNet 50-based characteristic pyramid networks; the first characteristic pyramid model based on ResNet50 takes an original image as the only input, and the ResNet50 is used as a backbone network to extract multi-level characteristics from bottom to top and then outputs four characteristic graphs with different scales through the characteristic pyramid network; except for the first ResNet 50-based feature pyramid model, in each of the rest ResNet 50-based feature pyramid models, an original image is used as a first input, four different-scale feature maps output in the previous ResNet 50-based feature pyramid model are simultaneously received as a second input, in the process of extracting the feature maps from bottom to top from the first input by a ResNet50 backbone network, after each feature map of one scale is extracted, the feature maps of the corresponding scale in the second input are connected, and then are convolved by 3 x 3 holes to be used for extracting the feature map of the next scale, and after four different-scale feature maps output by the feature pyramid network in the current ResNet 50-based feature pyramid model are fused with the second input in the corresponding levels, four different-scale feature maps are finally output to be used as the final output of the current ResNet 50-based feature pyramid model;
four feature graphs with different scales finally output in the multi-feedback feature pyramid module are used as the input of the classification regression module; the classification regression module comprises a classification network and a regression network, wherein the classification network is used for carrying out secondary classification on each pixel point in each characteristic graph and outputting a classification label and a confidence coefficient of whether a mapping area of each pixel point in an original image belongs to a lane line; the regression network is used for obtaining 6 prediction parameters of each pixel point in each characteristic graph through regression, and the prediction parameters comprise distance values from the center of a mapping area of each pixel point in an original image to four boundaries of a lane line outsourcing rectangular frame, lane line deviation and centrality;
in the output module, firstly, the two classification results of the classification network are utilized to carry out primary screening on pixel points in each feature map, pixel points which do not belong to a lane line are eliminated, then, the remaining pixel points which belong to the lane line are screened for the second time, the pixel points of which the distance values exceed the threshold range are eliminated, then, the remaining pixel points are subjected to non-maximum value inhibition based on the corresponding confidence degrees, finally, the prediction result of a lane line outer-wrapped rectangular frame in an original image is obtained, and each outer-wrapped rectangular frame in the prediction result is combined with the lane line deviation to determine the diagonal line of the corresponding outer-wrapped rectangular frame to serve as the final output lane line prediction result;
s4, inputting the image to be detected containing the lane line into the trained lane line detection network, and outputting a lane line prediction result in the image to be detected;
s5, filtering the lane line prediction result in the image to be detected, and removing line segments which do not meet the spatial position relation of the lane line in the image;
s6, dividing each lane line which is filtered in the S5 into a series of points, clustering the points of all the lane lines together, and clustering the points belonging to the same lane line to the same cluster;
and S7, performing straight line fitting on the points in each cluster, and taking the line segment obtained by fitting as a final lane line detection result in the image to be detected.
Preferably, the specific method of S2 is as follows:
s21, converting each image sample in the training data set from an RGB image into a gray image, then calculating the gray mean value of all pixels in each image sample, and then calculating the mean value of the gray mean values of all image samples in each subset aiming at a day image subset and a night image subset respectively to be used as the average brightness of the corresponding subset; taking the average value of the average brightness of the two subsets as a brightness distinguishing threshold value for distinguishing daytime and night;
s22, carrying out vehicle detection on each image sample in the training data set by using the trained target detection model to obtain the number of vehicles in each image sample, then calculating the average number of vehicles in all the image samples in the training data set, and finally calculating the vehicle weight of each image sample as the ratio of the number of vehicles in the image sample to the average number of vehicles multiplied by the average brightness of the daytime image subset;
s23, calculating the quality weight of each image sample in the training data set according to the brightness distinguishing threshold and the vehicle weight as weight = 255 + λ·α·gray − β·carWeight, wherein gray represents the gray mean of all pixels in the currently calculated image sample, carWeight represents the vehicle weight corresponding to the currently calculated image sample, α and β are two weights respectively, and λ is a weight determined by the brightness distinguishing threshold bound and gray: λ = λ1 if gray ≥ bound, λ = λ2 if gray < bound, with λ1 + λ2 = 1 and λ1 > λ2;
s24, aiming at all image samples under each intersection scene in the training data set, sequencing the image samples according to respective quality weights, if the number of the image samples under one intersection scene exceeds the threshold number, keeping the image samples meeting the threshold number from large to small according to the quality weights, and if the number of the image samples under one intersection scene does not exceed the threshold number, keeping all the image samples.
Preferably, the weights α and β are 1 and 2, respectively, and the weights λ 1 and λ 2 are 0.6 and 0.4, respectively.
Preferably, the centrality of each pixel point is calculated by the distance value from the center of the mapping area of the pixel point in the original image to four boundaries of the lane line outsourcing rectangular frame, and the calculation formula is as follows:
centra = sqrt( (min(l, r) / max(l, r)) * (min(t, b) / max(t, b)) )
in the formula: centra represents the centrality of a pixel point, and (l, t, r, b) is the distance value from the center of a mapping area of the pixel point in an original image to four boundaries of a lane line outer-wrapping rectangular frame; the range of the centrality of one pixel point is 0 to 1, the larger the centrality of the pixel point is, the larger the corresponding confidence coefficient is, and the smaller the centrality of the pixel point is, the smaller the corresponding confidence coefficient is.
Preferably, the loss function of the lane line detection network is as follows:
Loss = (1/N)·Σ_(x,y) L_cls(p(x,y), c(x,y)) + (λ/N)·Σ_(x,y) 1{c(x,y)>0}·L_reg(t(x,y), t*(x,y)) + (μ/N)·Σ_(x,y) 1{c(x,y)>0}·L_ctr(s(x,y), s*(x,y)) + (ν/N)·Σ_(x,y) 1{c(x,y)>0}·L_lean(l(x,y), l*(x,y))
in the formula: N represents the number of classification labels; λ, μ and ν respectively represent the weight values occupied by the regression, centrality and deviation losses; L_cls represents a focal loss used to calculate the class prediction error, where p(x,y) represents the prediction box class at position (x, y) and c(x,y) represents the real box class at position (x, y), whose value is 1 when the position belongs to a lane line and 0 otherwise; L_reg represents the IOU loss used to calculate the error between the prediction box and the real box; 1{c(x,y)>0} is a decision function whose value is 1 if c(x,y) > 0 and 0 otherwise; t(x,y) represents the 4 distances from position (x, y) to the 4 boundaries of the real box and t*(x,y) represents the 4 distances from position (x, y) to the 4 boundaries of the prediction box; L_ctr represents the binary cross entropy loss of the position centrality, used to calculate the deviation between the prediction box and the real box with respect to the center point, where s(x,y) is the centrality of position (x, y) with respect to the real box and s*(x,y) is the centrality of position (x, y) with respect to the prediction box; L_lean represents the binary cross entropy loss of the inclination deviation, used to calculate the deviation of the lane line within the outer-wrapping rectangular frame, where l*(x,y) represents the predicted deviation value at position (x, y) and l(x,y) represents the real deviation value at position (x, y); the deviation value is 1 when the lane line lies on the main diagonal of the outer-wrapping rectangular frame and 0 when it lies on the secondary diagonal.
Preferably, in the output module, the remaining pixel points belonging to the lane line are screened for the second time, and a specific method for eliminating the pixel points whose distance value exceeds the threshold range is as follows:
five thresholds d_0, d_1, d_2, d_3 and d_4 are set to 0, 64, 128, 256 and ∞ in sequence; for the i-th layer feature map F_i (i = 1, 2, 3, 4) output by the multi-feedback feature pyramid module, if the (l^(i), t^(i), r^(i), b^(i)) calculated for a pixel point in F_i satisfies max(l^(i), t^(i), r^(i), b^(i)) < d_(i-1) or max(l^(i), t^(i), r^(i), b^(i)) > d_i, then the pixel point is eliminated.
Preferably, the multi-feedback feature pyramid module is formed by cascading 3 feature pyramid networks based on ResNet 50.
Preferably, in S5, when filtering the result of predicting the lane line in the image to be detected, the spatial position relationship that the lane line should satisfy in the image is that the included angle between the lane line and the horizontal line is not within the range of 45 to 90 degrees.
Preferably, in S6, a specific method of collectively clustering the points of all the lane lines includes:
s61, putting all pixel points into a first set initialized to be empty;
s62, randomly taking out a pixel point from the current first set and adding the pixel point into the initialized empty second set;
s63, traversing all the pixel points in the first set, judging whether a pixel point with a distance smaller than the maximum clustering distance exists in the second set, and if so, adding the current traversed pixel point into the second set; the maximum clustering distance is the maximum distance value allowed between adjacent pixel points on one lane line in the image;
s64, continuously repeating the step S63 until no new pixel points are added into the second set, and taking all the pixel points in the second set as a cluster type cluster, wherein the pixel points in the cluster type cluster belong to the same lane line;
and S65, continuously repeating S62-S64 until all the pixel points in the first set are divided into clustering clusters to obtain pixel points corresponding to each lane line.
Preferably, in S7, the straight line fitting is implemented by using a RANSAC algorithm.
Compared with the prior art, the invention has the following beneficial effects:
1. The method has a good effect on detecting lane lines in complex scenes. During construction of the data set, the invention calculates quality weights for pictures of the same scene and screens out pictures with few vehicles (fewer vehicles occlude the lane lines) and good lighting (daytime lighting is better than nighttime lighting) to participate in training.
2. The invention detects and identifies lane lines with very high accuracy. Because lane lines are easily occluded by passing vehicles, the invention performs lane line detection simultaneously on multiple pictures of the same scene and then clusters and fits the detection results of all pictures, which greatly improves the detection accuracy in that scene.
Drawings
FIG. 1 is an exemplary diagram of an illegal lane change image in an embodiment.
FIG. 2 is an unmarked original image in an embodiment.
FIG. 3 is a center line drawing in the example.
Fig. 4 is a set of reordering pictures according to an embodiment.
FIG. 5 is an architecture diagram of the ODL-Net network model in the embodiment.
Fig. 6 is a diagram of a model architecture of a ResNet50 network in an embodiment.
Fig. 7 is an embodiment of a ResNet50 based FPN network.
Fig. 8 is an MFP module in the embodiment.
FIG. 9 is an expanded view of the MFP process in the embodiment.
Fig. 10 is a diagram of a feedback connection process between FPNs in the embodiment.
FIG. 11 is a pixel prediction bounding box in an embodiment.
Fig. 12 is a graph of model prediction results in the example.
Fig. 13 is a RANSAC fitting straight line diagram in the example.
FIG. 14 is a diagram showing the detection results of the first ODL-Net algorithm in the example.
FIG. 15 is a graph showing the detection results of the second ODL-Net algorithm in the example.
FIG. 16 is a diagram showing the detection results of the third ODL-Net algorithm in the example.
FIG. 17 is a diagram showing the detection results of the fourth ODL-Net algorithm in the embodiment.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. The technical characteristics in the embodiments of the present invention can be combined correspondingly without mutual conflict.
In the description of the present invention, it should be understood that when an element is referred to as being "connected" to another element, it can be directly connected to the other element or be indirectly connected to the other element, i.e., intervening elements may be present. In contrast, when an element is referred to as being "directly connected to" another element, there are no intervening elements present.
In the description of the present invention, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
In a preferred embodiment of the present invention, a lane line detection method based on a traffic off-site law enforcement scenario is provided, and specific steps are shown in S1 to S7, and specific implementation manners of the steps are described in detail below.
S1, obtaining a marked training data set, wherein each image sample comprises an image which is shot by a law enforcement camera and contains lane lines, endpoints at two ends of a center line of each lane line and lane line deviation are marked in the image in advance, and the lane line deviation is used for judging whether the lane lines are in main diagonal lines or auxiliary diagonal lines in an outer rectangular frame of the lane lines; the image samples in the training data set belong to different intersection scenes, and all the image samples are divided into a day image subset shot in the day and a night image subset shot at night according to the shooting time.
S2, aiming at each image sample in the training data set, two retention principles are applied, namely that images with fewer vehicles are preferred and that daytime images are preferred over nighttime images; all image samples under the same intersection scene are screened and filtered by combining the gray value of the images and the number of vehicles in the images, and for each intersection scene the image samples exceeding the threshold number are removed.
In the present invention, the specific method of S2 is as follows:
s21, converting each image sample in the training data set from an RGB image into a gray image, then calculating the gray mean value of all pixels in each image sample, and then calculating the mean value of the gray mean values of all image samples in each subset aiming at a day image subset and a night image subset respectively to be used as the average brightness of the corresponding subset; the average of the average luminances of the two subsets is used as a luminance distinguishing threshold for distinguishing daytime and night.
S22, performing vehicle detection on each image sample in the training data set by using the trained target detection model to obtain the number of vehicles in each image sample, then calculating the average number of vehicles in all the image samples in the training data set, and finally calculating the vehicle weight of each image sample as the ratio of the number of vehicles in the image sample to the average number of vehicles multiplied by the average brightness of the daytime image subset.
And S23, calculating the quality weight of each image sample in the training data set according to the brightness distinguishing threshold and the vehicle weight as weight = 255 + λ·α·gray − β·carWeight, wherein gray represents the gray mean of all pixels in the currently calculated image sample, carWeight represents the vehicle weight corresponding to the currently calculated image sample, α and β are two weights respectively, and λ is a weight determined by the brightness distinguishing threshold bound and gray: λ = λ1 if gray ≥ bound, λ = λ2 if gray < bound, with λ1 + λ2 = 1 and λ1 > λ2.
In the present invention, the above-mentioned weights α and β are preferably 1 and 2, respectively, and the weights λ 1 and λ 2 are preferably 0.6 and 0.4, respectively.
S24, sequencing all image samples in each intersection scene in the training data set according to respective quality weights, if the number of the image samples in one intersection scene exceeds the threshold number, keeping the image samples meeting the threshold number in the sequence from large to small according to the quality weights, and if the number of the image samples in one intersection scene does not exceed the threshold number, keeping all the image samples.
And S3, training the lane line detection network by using the training data set filtered by the S2 with the minimum loss function as a target.
The lane line detection network comprises an input module, a multiple feedback characteristic pyramid module, a classification regression module and an output module, and specific functions and implementation forms of the four modules are described below.
In the invention, the input module is used for inputting the original image of the lane line to be detected into the network, the original image input in the model training stage is the image sample, and the original image input in the inference or actual application stage is the image to be detected.
In the invention, the multi-feedback characteristic pyramid module is formed by cascading a plurality of characteristic pyramid networks based on ResNet 50; the first characteristic pyramid model based on ResNet50 takes an original image as the only input, and ResNet50 is used as a backbone network to extract multi-level characteristics from bottom to top and then outputs four characteristic graphs with different scales through the characteristic pyramid network; in addition to the first characteristic pyramid model based on ResNet50, in each of the rest characteristic pyramid models based on ResNet50, an original image is used as a first input, four different-scale characteristic diagrams output from the previous characteristic pyramid model based on ResNet50 are simultaneously received as a second input, in the process of extracting the characteristic diagrams from bottom to top from the first input by a ResNet50 backbone network, after each characteristic diagram of one scale is extracted, the characteristic diagram is connected with the characteristic diagram of the corresponding scale in the second input and then is convolved by a 3-by-3 hole to be used for extracting the characteristic diagram of the next scale, and after four different-scale characteristic diagrams output by the characteristic pyramid network in the current characteristic pyramid model based on ResNet50 are fused with the second input in corresponding levels, four different-scale characteristic diagrams are finally output to be used as the final output of the current characteristic pyramid model based on ResNet 50.
Preferably, the multi-feedback feature pyramid module is formed by cascading 3 feature pyramid networks based on the ResNet 50.
It should be noted that, when the four feature maps of different scales output by the previous ResNet50-based feature pyramid model are spliced with the feature maps extracted bottom-up by the following ResNet50 backbone network, the correspondence of scales needs to be maintained so that the two feature maps can be concatenated (Concat). Similarly, when the four feature maps of different scales output by the feature pyramid network in the current ResNet50-based feature pyramid model are fused with the four feature maps in the second input, the correspondence of scales also needs to be considered so that the two feature maps can be fused (Fusion).
In the invention, four different-scale feature maps finally output in the multi-feedback feature pyramid module are used as the input of the classification regression module; the classification regression module comprises a classification network and a regression network, wherein the classification network is used for carrying out secondary classification on each pixel point in each feature map and outputting a classification label and a confidence coefficient of whether a mapping area of each pixel point in an original image belongs to a lane line; the regression network is used for obtaining 6 prediction parameters of each pixel point in each characteristic graph through regression, and the prediction parameters comprise distance values from the center of a mapping area of each pixel point in an original image to four boundaries of a lane line outsourcing rectangular frame, lane line deviation and centrality.
In the present invention, both the classification network and the regression network can be realized by four cascaded 1×1 convolution layers. The classification network outputs a 1-channel feature map through four cascaded 1×1 convolutions, indicating whether each pixel lies within a lane line. The regression network outputs a 6-channel feature map through four cascaded 1×1 convolutions; each pixel obtains 6 parameters, which respectively represent the distance values (l, t, r, b) from the center of the pixel's mapping area in the original image to the four boundaries of the lane line outer-wrapping rectangular frame, the lane line deviation (marking whether the lane line lies on the main or secondary diagonal of the rectangular frame) and the centrality.
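As an illustration, the following is a minimal PyTorch-style sketch of such a head pair; the 256-channel input, the sigmoid activations and the module names are assumptions made for the example and are not specified by the invention.

```python
import torch
import torch.nn as nn

class ClsRegHead(nn.Module):
    """Per-level head: four cascaded 1x1 convs for classification (1 channel)
    and four cascaded 1x1 convs for regression (6 channels: l, t, r, b, bias, centrality)."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        def branch(out_channels: int) -> nn.Sequential:
            layers, ch = [], in_channels
            for _ in range(3):                                   # three intermediate 1x1 convs
                layers += [nn.Conv2d(ch, ch, kernel_size=1), nn.ReLU(inplace=True)]
            layers.append(nn.Conv2d(ch, out_channels, kernel_size=1))  # final 1x1 conv
            return nn.Sequential(*layers)
        self.cls_branch = branch(1)   # lane / not-lane score per pixel
        self.reg_branch = branch(6)   # (l, t, r, b), lane-line bias, centrality

    def forward(self, feat: torch.Tensor):
        cls_map = torch.sigmoid(self.cls_branch(feat))           # confidence in [0, 1]
        reg_map = self.reg_branch(feat)
        ltrb = torch.relu(reg_map[:, :4])                        # distances must be non-negative
        bias = torch.sigmoid(reg_map[:, 4:5])                    # main vs. secondary diagonal
        centerness = torch.sigmoid(reg_map[:, 5:6])              # centrality in [0, 1]
        return cls_map, ltrb, bias, centerness
```

The same head is applied to each of the four feature-map scales, which matches the per-level classification and regression described above.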
In the invention, the centrality of each pixel point is calculated by the distance value from the center of the mapping area of the pixel point in the original image to four boundaries of the lane line external rectangular frame, and the calculation formula is as follows:
centra = sqrt( (min(l, r) / max(l, r)) * (min(t, b) / max(t, b)) )
in the formula: centra represents the centrality of a pixel point, and (l, t, r, b) is the distance value from the center of a mapping area of the pixel point in an original image to four boundaries of a lane line outer-wrapping rectangular frame; the range of the centrality of one pixel point is 0 to 1, the larger the centrality of the pixel point is, the larger the corresponding confidence is, and the smaller the centrality of the pixel point is, the smaller the corresponding confidence is.
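A one-line sketch of this computation is given below; it follows the formula reconstructed above (a standard center-ness form) and the small epsilon guard is an assumption added only to avoid division by zero.

```python
import math

def centrality(l: float, t: float, r: float, b: float, eps: float = 1e-6) -> float:
    """Centrality of a pixel from its distances to the four sides of the enclosing
    rectangle: 1.0 at the exact centre, approaching 0 near the box edges."""
    return math.sqrt((min(l, r) / max(l, r, eps)) * (min(t, b) / max(t, b, eps)))

print(centrality(10, 5, 10, 5))   # 1.0: the point sits at the centre of the box
print(centrality(2, 5, 18, 5))    # much smaller: the point is close to one edge
```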
In the invention, in the output module, firstly, the binary result of the classification network is utilized to carry out primary screening on pixel points in each characteristic diagram, pixel points which do not belong to a lane line are eliminated, then, the remaining pixel points which belong to the lane line are screened for the second time, the pixel points of which the distance value exceeds the threshold range are eliminated, then, the remaining pixel points are subjected to non-maximum value inhibition based on the corresponding confidence coefficient, finally, the prediction result of a lane line outer-wrapped rectangular frame in an original image is obtained, and each outer-wrapped rectangular frame in the prediction result is combined with the lane line deviation to determine the corresponding outer-wrapped rectangular frame diagonal line to serve as the final output lane line prediction result.
In the present invention, in the output module, the specific method of performing the second screening on the remaining pixel points belonging to the lane line and rejecting the pixel points whose distance value exceeds the threshold range is as follows:
Five thresholds d_0, d_1, d_2, d_3 and d_4 are set to 0, 64, 128, 256 and ∞ in sequence; for the i-th layer feature map F_i (i = 1, 2, 3, 4) output by the multi-feedback feature pyramid module, if the (l^(i), t^(i), r^(i), b^(i)) calculated for a pixel point in F_i satisfies max(l^(i), t^(i), r^(i), b^(i)) < d_(i-1) or max(l^(i), t^(i), r^(i), b^(i)) > d_i, then the pixel point is eliminated.
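A small sketch of this per-level filtering is given below, assuming the predicted distances of one level are available as a NumPy array; the function name and array layout are illustrative.

```python
import numpy as np

# Per-level regression ranges: level i keeps pixels whose largest predicted
# distance lies in [d_(i-1), d_i], as described above.
D = [0, 64, 128, 256, float("inf")]

def filter_by_level(ltrb: np.ndarray, level: int) -> np.ndarray:
    """ltrb: (N, 4) predicted distances (l, t, r, b) for pixels of feature map F_level,
    level in 1..4. Returns a boolean mask of the pixels to keep."""
    m = ltrb.max(axis=1)
    return (m >= D[level - 1]) & (m <= D[level])

ltrb = np.array([[10, 3, 20, 5], [200, 40, 180, 90]], dtype=float)
print(filter_by_level(ltrb, 1))   # [ True False]: only small boxes survive on level 1
```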
In the training process, the loss function of the lane line detection network is as follows:
Loss = (1/N)·Σ_(x,y) L_cls(p(x,y), c(x,y)) + (λ/N)·Σ_(x,y) 1{c(x,y)>0}·L_reg(t(x,y), t*(x,y)) + (μ/N)·Σ_(x,y) 1{c(x,y)>0}·L_ctr(s(x,y), s*(x,y)) + (ν/N)·Σ_(x,y) 1{c(x,y)>0}·L_lean(l(x,y), l*(x,y))
in the formula: N represents the number of classification labels; λ, μ and ν respectively represent the weight values occupied by the regression, centrality and deviation losses; L_cls represents a focal loss used to calculate the class prediction error, where p(x,y) represents the prediction box class at position (x, y) and c(x,y) represents the real box class at position (x, y), whose value is 1 when the position belongs to a lane line and 0 otherwise; L_reg represents the IOU loss used to calculate the error between the prediction box and the real box; 1{c(x,y)>0} is a decision function whose value is 1 if c(x,y) > 0 and 0 otherwise; t(x,y) represents the 4 distances from position (x, y) to the 4 boundaries of the real box and t*(x,y) represents the 4 distances from position (x, y) to the 4 boundaries of the prediction box; L_ctr represents the binary cross entropy loss of the position centrality, used to calculate the deviation between the prediction box and the real box with respect to the center point, where s(x,y) is the centrality of position (x, y) with respect to the real box and s*(x,y) is the centrality of position (x, y) with respect to the prediction box; L_lean represents the binary cross entropy loss of the inclination deviation, used to calculate the deviation of the lane line within the outer-wrapping rectangular frame, where l*(x,y) represents the predicted deviation value at position (x, y) and l(x,y) represents the real deviation value at position (x, y); the deviation value is 1 when the lane line lies on the main diagonal of the outer-wrapping rectangular frame and 0 when it lies on the secondary diagonal.
Preferably, the weight values λ, μ, and ν are all 1.
It should be noted that the prediction frame and the real frame refer to the lane line outsourcing rectangle frame obtained by prediction and actually labeled respectively.
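A condensed PyTorch sketch of how the four terms could be combined is given below, assuming sigmoid outputs and float labels. The focal-loss parameters (alpha = 0.25, gamma = 2), the -log(IOU) form of the IOU loss and the normalisation by the number of positive labels are common defaults assumed for the example, not values taken from the patent.

```python
import torch
import torch.nn.functional as F

def focal_loss(p, c, alpha=0.25, gamma=2.0):
    """Binary focal loss between predicted probability p and float label c in {0, 1}."""
    ce = F.binary_cross_entropy(p, c, reduction="none")
    p_t = p * c + (1 - p) * (1 - c)
    a_t = alpha * c + (1 - alpha) * (1 - c)
    return a_t * (1 - p_t) ** gamma * ce

def iou_loss(t_pred, t_true):
    """IOU loss between boxes described by distances (l, t, r, b) from the same point."""
    lp, tp, rp, bp = t_pred.unbind(-1)
    lt, tt, rt, bt = t_true.unbind(-1)
    area_p = (lp + rp) * (tp + bp)
    area_t = (lt + rt) * (tt + bt)
    inter = (torch.min(lp, lt) + torch.min(rp, rt)) * (torch.min(tp, tt) + torch.min(bp, bt))
    iou = inter / (area_p + area_t - inter + 1e-6)
    return -torch.log(iou + 1e-6)

def total_loss(p, c, t_pred, t_true, s_pred, s_true, l_pred, l_true,
               lam=1.0, mu=1.0, nu=1.0):
    """p/c: class score and label per location, t_*: (.., 4) distances,
    s_*: centrality, l_*: diagonal bias. Regression/centrality/bias terms are
    only evaluated at positive (lane-line) locations."""
    n = c.sum().clamp(min=1.0)                 # number of positive labels
    pos = c > 0                                # decision function 1{c > 0}
    loss_cls = focal_loss(p, c).sum() / n
    loss_reg = iou_loss(t_pred[pos], t_true[pos]).sum() / n
    loss_ctr = F.binary_cross_entropy(s_pred[pos], s_true[pos], reduction="sum") / n
    loss_lean = F.binary_cross_entropy(l_pred[pos], l_true[pos], reduction="sum") / n
    return loss_cls + lam * loss_reg + mu * loss_ctr + nu * loss_lean
```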
And S4, inputting the image to be detected containing the lane line into the trained lane line detection network, and outputting a lane line prediction result in the image to be detected.
And S5, filtering the lane line prediction result in the image to be detected, and removing line segments which do not meet the spatial position relation of the lane line in the image.
In the invention, when the lane line prediction result in the image to be detected is filtered, the spatial position relation which the lane line should meet in the image is that the included angle between the lane line and the horizontal line is not in the range of 45-90 degrees.
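The following small sketch applies that rule directly, keeping only segments whose angle with the horizontal falls outside the 45–90 degree range; the function and variable names and the example coordinates are illustrative.

```python
import math

def keep_segment(x1: float, y1: float, x2: float, y2: float) -> bool:
    """Keep a detected segment only if its angle with the horizontal is NOT
    within [45, 90] degrees, per the spatial position relation stated above."""
    angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))   # 0..90 degrees
    return not (45.0 <= angle <= 90.0)

# Example: filter a list of predicted segments given as (x1, y1, x2, y2)
predicted = [(10, 200, 600, 220), (300, 50, 320, 400)]
kept = [s for s in predicted if keep_segment(*s)]   # the near-vertical segment is dropped
```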
And S6, dividing each lane line which is filtered in the S5 into a series of points, clustering the points of all the lane lines together, and clustering the points belonging to the same lane line into the same cluster.
In the invention, the specific method for clustering all the points of the lane lines together comprises the following steps:
s61, putting all pixel points into a first set initialized to be empty;
s62, randomly taking out a pixel point from the current first set and adding the pixel point into the initialized empty second set;
s63, traversing all the pixel points in the first set, judging whether a pixel point with a distance smaller than the maximum clustering distance exists in the second set, and if so, adding the current traversed pixel point into the second set; the maximum clustering distance is the maximum distance value allowed between adjacent pixel points on one lane line in the image;
s64, continuously repeating the step S63 until no new pixel points are added into the second set, taking all the pixel points in the second set as a cluster type cluster, wherein the pixel points in the cluster type cluster belong to the same lane line;
and S65, continuously repeating S62-S64 until all the pixel points in the first set are divided into clustering clusters to obtain pixel points corresponding to each lane line.
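A direct sketch of S61-S65 is given below, assuming the lane-line points are (x, y) tuples and that the maximum clustering distance is chosen by the caller.

```python
import math

def cluster_points(points, max_dist):
    """Group lane-line points so that a point joins a cluster whenever its distance
    to some existing member of that cluster is below max_dist (steps S61-S65)."""
    first = list(points)                 # S61: all points start in the first set
    clusters = []
    while first:                         # S65: repeat until the first set is empty
        second = [first.pop()]           # S62: seed the second set with one point
        grown = True
        while grown:                     # S63/S64: grow until no new point joins
            grown = False
            for p in first[:]:
                if any(math.dist(p, q) < max_dist for q in second):
                    second.append(p)
                    first.remove(p)
                    grown = True
        clusters.append(second)          # one cluster corresponds to one lane line
    return clusters
```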
And S7, performing straight line fitting on the points in each cluster, and taking the line segment obtained by fitting as a final lane line detection result in the image to be detected.
In the invention, the straight line fitting is realized by using RANSAC algorithm.
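A compact RANSAC line-fit sketch over one cluster is shown below; the iteration count and inlier tolerance are illustrative values, and the final y = kx + m refit assumes the cluster is not near-vertical (which the angle filter in S5 is intended to guarantee).

```python
import random
import numpy as np

def ransac_line(points, n_iters=200, tol=3.0):
    """Fit a line to a cluster of (x, y) points with RANSAC and return the endpoints
    of the fitted segment over the inliers' x-range (needs at least 2 points)."""
    pts = np.asarray(points, dtype=float)
    best = None
    for _ in range(n_iters):
        i, j = random.sample(range(len(pts)), 2)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2          # line: a*x + b*y + c = 0
        norm = np.hypot(a, b)
        if norm == 0:
            continue
        dist = np.abs(a * pts[:, 0] + b * pts[:, 1] + c) / norm
        inliers = pts[dist < tol]
        if best is None or len(inliers) > len(best):
            best = inliers
    if best is None or len(best) < 2:
        best = pts
    k, m = np.polyfit(best[:, 0], best[:, 1], 1)               # least-squares refit on inliers
    x_lo, x_hi = best[:, 0].min(), best[:, 0].max()
    return (x_lo, k * x_lo + m), (x_hi, k * x_hi + m)
```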
The lane line detection method based on the traffic off-site law enforcement scene shown in S1 to S7 is applied to a specific example to show a specific implementation process and an available technical effect thereof.
Examples
In this embodiment, the lane line detection method based on the traffic off-site law enforcement scenario shown in S1 to S7 is specifically implemented through the following processes:
step 1. Making a data set
The image data targeted by the embodiment is real data in a traffic off-site law enforcement scene, and all the data are illegal vehicle data shot by traffic monitoring equipment (law enforcement camera). The data was obtained from 968 devices, for a total of 5247 images, and a typical lane-change violation image containing the lane line, as captured by a law enforcement camera, is shown in fig. 1. The invention provides a lane marking method different from the existing method aiming at traffic off-site law enforcement scenes.
In this embodiment, the tool used for labeling the image samples is Labelme. This differs from existing labeling methods: the existing target detection labeling method usually uses a LabelImg tool, framing the target with a rectangular box fitted tightly to the target's edges, while the existing semantic segmentation labeling method labels many points along the edge of the target so that the points form a closed area wrapping the target, achieving pixel-level labeling, which is quite troublesome.
In this embodiment, the lane line to be detected is a straight line, but the straight line cannot be labeled by using a rectangular frame for target detection, and it is too cumbersome to use a semantic segmentation method, so in this embodiment, a line segment form is used to label the straight line. The centerline of the lane line is marked using the line segment marking of the marking tool Labelme. The reason that the central line is marked instead of the whole lane line segment is that the lane line in the real world does not have thickness as the line segment in the concept, and the lane line is a slender quadrangle instead of a line, so that only the middle points of the short line segments at the two ends of the lane line need to be marked during actual marking, and the connection of the two middle points is the central line representing the lane line. Fig. 2 is an unmarked original traffic image, and fig. 3 is a marked image after marking the center line. It can be seen that the labeling is very simple, requiring only two points.
After the annotation is completed, Labelme automatically saves the annotation result into a JSON (JavaScript Object Notation) file.
In this embodiment, the lane line is detected by adopting the idea of target detection, that is, the lane line is detected as a diagonal line of the rectangular frame, so that the existing line segment label needs to be converted into a rectangular frame label. Firstly, reading information of two points from a JSON file, then sequencing the two points, and setting a point with a smaller abscissa as (x 1, y 1) and a point with a larger abscissa as (x 2, y 2); then, the slope of the line segment is calculated according to the following formula:
slope = (y2 − y1) / (x2 − x1) (1)
if the slope is positive, the rectangular box coordinates [ top, left, bottom, right ] are [ y1, x2, y2, x1], and the set angle is biased to 0. On the contrary, [ y2, x2, y1, x1], the angular deviation is set to 1. In order to prevent the straight line from being vertical and the slope cannot be calculated, when the abscissa of two points is the same, the slope is not calculated, and the angle deviation is set to 1. Because the rectangular frame has two diagonal lines, the angle is biased to 0 when the line segment is the major diagonal line of the rectangular frame and biased to 1 when the line segment is the minor diagonal line of the rectangular frame. The angle deviation represents the deviation of the lane line in the outsourcing rectangular frame, and when the outsourcing rectangular frame is obtained through subsequent detection, whether the lane line is positioned on the main diagonal line or the auxiliary diagonal line of the rectangle can be determined according to the deviation mark.
And then regenerating the converted labeling information into a labeling file. The final markup document content includes an image name, a target category, a list of location information (top, left, bottom, right), and lane line bias.
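A sketch of this segment-to-box conversion is shown below. It assumes a Labelme JSON file containing a single two-point line annotation, forms the box from the min/max coordinates (which should coincide with the case analysis above) and derives the bias from the slope sign in image coordinates; the function name is illustrative.

```python
import json

def segment_to_box(json_path: str):
    """Convert a Labelme two-point line annotation into a box label
    [top, left, bottom, right] plus a diagonal bias (0 = main diagonal, 1 = secondary)."""
    with open(json_path, "r", encoding="utf-8") as f:
        (xa, ya), (xb, yb) = json.load(f)["shapes"][0]["points"]
    (x1, y1), (x2, y2) = sorted([(xa, ya), (xb, yb)])            # sort by abscissa
    box = [min(y1, y2), min(x1, x2), max(y1, y2), max(x1, x2)]   # top, left, bottom, right
    if x1 == x2:                        # vertical segment: slope undefined
        bias = 1
    else:
        slope = (y2 - y1) / (x2 - x1)
        bias = 0 if slope > 0 else 1    # positive slope in image coords -> main diagonal
    return box, bias
```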
Step 2, data preprocessing
The scene targeted by this embodiment is a traffic off-site scene, in which each image sample is obtained by a law enforcement camera mounted above an intersection and shooting it from overhead. Under the same law enforcement camera, therefore, the background of the image data is always the same and only the positions of people and vehicles differ. Each illegal vehicle corresponds to three pictures, so a large amount of traffic image data with the same background can be obtained under the same device. Against this background, the influence on lane line detection of complex conditions such as vehicle occlusion, night, shadow and uneven illumination, road surface rainwater, dirt and reflection is considered; image data captured under relatively ideal conditions, with little vehicle occlusion and good weather, should therefore be selected as far as possible when building the data set, so the data set needs to be weighted and screened to improve the lane line detection rate.
In order to better judge the image quality, the embodiment combines the gray value of the image and the number of vehicles in the image, adopts a quality weight formula, calculates the quality values of all traffic images in the same equipment through the quality weight formula, the higher the quality value is, the higher the image quality is, then sorts the images through the image quality, and preferentially uses the high-quality images to detect the lane lines on the premise of enough image number. The quality of the pictures is screened through the quality weight, and the detection rate of the lane line can be improved.
Step 2.1 calculating day and night data thresholds
Firstly, dividing all image data in a data set into two batches of data, wherein one batch is day image data (called day image subset) and one batch is night image data (called night image subset), respectively calculating the average brightness of the two batches of image data, and then obtaining a brightness distinguishing threshold value capable of distinguishing day from night by calculating the arithmetic average value of the day image data and the night image data, wherein the calculation formula is as follows:
Gray=R*0.299+G*0.587+B*0.114 (2)
gray = (1 / (h·w)) · Σ_{i=1..h} Σ_{j=1..w} Gray(i, j) (3)
bright_D = (1 / n) · Σ_{img∈D} gray_img (4)
bound = (bright_day + bright_night) / 2 (5)
wherein R, G, B in formula (2) represent the three channels of the image; a single pixel's gray value Gray is calculated from the values of its three channels. Formula (3) calculates the average gray level gray of all pixels in an image, where h and w are the height and width of the image. Formula (4) calculates the average luminance of a batch of image data (i.e., the mean of the gray means of all images), where D represents a data subset (either the daytime batch or the nighttime batch) and n represents the number of image samples in the subset. In formula (5), bright_day represents the average brightness of the daytime image subset and bright_night that of the nighttime image subset, both calculated by formula (4), and bound represents the luminance discrimination threshold used to distinguish day from night.
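A small sketch of formulas (2)-(5) using OpenCV is given below; OpenCV's grayscale conversion already applies the 0.299/0.587/0.114 weights, and the function names and file-path arguments are illustrative.

```python
import cv2
import numpy as np

def image_gray_mean(path: str) -> float:
    """Formulas (2)/(3): mean gray level of one image."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # 0.299 R + 0.587 G + 0.114 B
    return float(gray.mean())

def day_night_threshold(day_paths, night_paths) -> float:
    """Formulas (4)/(5): average brightness of each subset, then their mean as 'bound'."""
    bright_day = np.mean([image_gray_mean(p) for p in day_paths])
    bright_night = np.mean([image_gray_mean(p) for p in night_paths])
    return (bright_day + bright_night) / 2.0
```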
Step 2.2 calculating vehicle weight threshold based on vehicle quantity
And (3) performing transfer learning on the scene of the invention by using the yolov3 pre-training model to obtain a target detection model applicable to the scene of the invention. And detecting the image data by using the model to obtain the number of vehicles and the coordinates of the vehicles in the image data, and storing the coordinates into a JSON file. In the JSON file format, the first two values in each list represent the top left coordinate of the vehicle, and the last two values represent the bottom right coordinate of the vehicle.
And calculating the vehicle weight according to the number of the image vehicles, wherein the calculation formula is as follows:
avg = (1 / n) · Σ_{img∈S} carNum_img (6)
carWeight = (carNum / avg) · bright_day (7)
wherein formula (6) calculates the average number of vehicles avg over the data set, n represents the number of images in the data set, and S represents the data set. Formula (7) calculates the vehicle weight of an image: carWeight represents the vehicle weight of the current picture, carNum represents the number of vehicles in the current picture, and bright_day represents the average brightness of the daytime image subset; the daytime gray mean bright_day is used here to amplify the influence of the vehicle weight on the image quality.
And 2.3, finally, calculating the mass weight of each image sample in the training data set according to the brightness distinguishing threshold and the vehicle weight, wherein the calculation formula is shown as the following formula (8):
weight = 255 + λ·α·gray − β·carWeight (8)
λ = 0.6 if gray ≥ bound; λ = 0.4 if gray < bound (9)
where α and β are the weight occupied by the brightness and the weight occupied by the number of vehicles, respectively, and these two values are set to 1 and 2 in the present embodiment, because the influence of the number of vehicles on the lane lines is much greater than the influence of the daytime and the nighttime, the vehicle weight ratio is increased. gray represents the average gray value of the image, carWeight represents the vehicle weight of the image calculated by the formula (7), lambda is changed according to the value of the gray, and when the value of the gray is greater than or equal to the boundary value of the day and the night, the image is considered to be the day, and the value of the lambda is 0.6; when the value of gray is smaller than the boundary value, the image is considered to be in the dark, and the value of λ is 0.4, which is mainly to increase the weight of the daytime image, but since the influence of the daytime and dark is not so large, the daytime weight is only increased to 0.6, and the dark weight is set to 0.4.
In this embodiment, image data shot under the same law enforcement camera, that is, the same intersection scene, is sequenced through the quality weight, and therefore images with poor quality are sequenced instead of being directly rejected by setting a quality weight threshold, because there are sometimes only images at night or only images with a large number of vehicles, in this case, once the quality weight threshold is rejected, all the images are rejected, and lane marking detection cannot be performed. Therefore, the invention adopts a sequencing mode to obtain high-quality images by selecting the images with higher quality weight so as to improve the detection effect of the lane line.
In this embodiment, after the quality weights of all images in a scene are calculated, the image samples are sorted by quality weight; the first 6 image samples in descending order of quality weight are then retained and the remaining image samples are deleted from the data set. The threshold of 6 retained image samples was obtained through multiple experiments and ensures the highest lane line recognition rate and the lowest error rate. If there are fewer than 6 images, no quality sorting is needed and all images are retained.
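The whole screening step (formulas (7)-(9) plus the per-camera top-6 selection) can be sketched as follows, assuming the gray mean and vehicle count of each image have already been computed; the parameter defaults mirror the values given above and the data layout is illustrative.

```python
def quality_weight(gray: float, car_num: int, avg_cars: float,
                   bright_day: float, bound: float,
                   alpha: float = 1.0, beta: float = 2.0) -> float:
    """Formulas (7)-(9): carWeight = (carNum / avg) * bright_day and
    weight = 255 + lambda * alpha * gray - beta * carWeight,
    with lambda = 0.6 for daytime images (gray >= bound) and 0.4 otherwise."""
    car_weight = (car_num / avg_cars) * bright_day
    lam = 0.6 if gray >= bound else 0.4
    return 255 + lam * alpha * gray - beta * car_weight

def select_images(samples, avg_cars, bright_day, bound, keep=6):
    """samples: list of (path, gray_mean, car_count) for one camera. Keep the 'keep'
    best images by quality weight; keep everything if there are fewer than 'keep'."""
    if len(samples) <= keep:
        return samples
    ranked = sorted(samples,
                    key=lambda s: quality_weight(s[1], s[2], avg_cars, bright_day, bound),
                    reverse=True)
    return ranked[:keep]
```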
Fig. 4 shows a group of pictures used to check the image quality ranking effect of this step: (a) a daytime image with few vehicles, (b) a nighttime image with few vehicles, (c) a daytime image with many vehicles, and (d) a nighttime image with many vehicles. Since it is difficult to find images satisfying all 4 conditions at the same time on the same device, the 4 images come from 4 different devices. Applying the quality weight formula to these images shows that the formula favors images with fewer vehicles.
Step 3, building of lane line detection network ODL-Net based on multiple feedback characteristic pyramid
The lane line detection network based on the multi-feedback characteristic pyramid is named as ODL-Net, and the network model architecture diagram is shown in FIG. 5.
The backbone network adopted in this embodiment is ResNet50; to fit the subsequent processing, the final FC layer of ResNet50 is removed. The structure of the ResNet50 model is shown in FIG. 6. ResNet50 adds residual connections, which solve the problems that the parameters of a deep network cannot be learned and that gradients vanish, without adding extra parameters or computational complexity, so the network can be made deeper and the feature extraction effect improves. Meanwhile, the lane lines detected in this embodiment are slender, continuous structures with strong spatial relationships but few appearance cues, and ordinary convolution often loses spatial hierarchy information and thus image detail features. To extract features better, in this embodiment all standard 3×3 convolutions in ResNet50 are replaced by a 3×3 dilated (hole) convolution module, formed by superposing the result of a dilated convolution with dilation rate 1 and the result of a dilated convolution with dilation rate 3; this enlarges the receptive field of each pixel in the feature map, greatly improves the performance of the feature extraction network, and ultimately improves the lane line recognition rate.
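A minimal PyTorch sketch of such a replacement module is shown below; interpreting "superposing" as element-wise addition of the two branches is an assumption, and the class name is illustrative.

```python
import torch
import torch.nn as nn

class HoleConv3x3(nn.Module):
    """Drop-in replacement for a standard 3x3 convolution: sums a dilation-1 branch
    and a dilation-3 branch to enlarge the receptive field at the same resolution."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, dilation=1, bias=False)
        self.branch3 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=3, dilation=3, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.branch1(x) + self.branch3(x)   # element-wise superposition of the two results
```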
Step 3.1 Multi-feedback feature pyramid Module construction
To extract the linear features of lane lines more effectively, this embodiment proposes a multi-feedback feature pyramid (MFP) module, which extracts features from the picture several times and performs multi-scale feature fusion, so that the context information of the lane lines is extracted more thoroughly and deeply and the detection effect is greatly improved.
The multi-feedback feature pyramid (MFP) module proposed in this embodiment is built on a Feature Pyramid Network (FPN) with a ResNet50 backbone; the structure of the ResNet50-based FPN is shown in fig. 7. By adding feedback connections from the FPN into the bottom-up backbone layers, a feature extraction network, i.e. the MFP module, is obtained, as shown in fig. 8. It can extract features from the image several times and thereby gains a stronger feature extraction capability. Similar to RFP, the features received by the feedback connections directly from the top layer of the network are fed into the bottom layer of the bottom-up ResNet50 to speed up training and improve performance. The MFP module thus realises repeated feature extraction: the backbone and the FPN are run bottom-up several times, and the final output features depend on the features of the earlier passes. The unrolled MFP process is shown in FIG. 9, and the feedback connections between passes are shown in fig. 10. In this embodiment the MFP uses only a triple feature pyramid: using more than three pyramids improves the accuracy only marginally while increasing the number of parameters and slowing down training and detection. The specific data processing flow in the triple feature pyramid is as follows:
the first ResNet50-based feature pyramid model takes the original image as its only input; ResNet50 serves as the backbone network and extracts multi-level features bottom-up, and the feature pyramid network then outputs four feature maps of different scales. Every other ResNet50-based feature pyramid model takes the original image as its first input and at the same time receives, as its second input, the four feature maps of different scales output by the previous ResNet50-based feature pyramid model. While the ResNet50 backbone extracts feature maps bottom-up from the first input, each feature map of one scale is, after extraction, connected with the feature map of the corresponding scale in the second input and then passed through a 3 × 3 hole convolution used to extract the feature map of the next scale. After the four feature maps of different scales output by the feature pyramid network in the current ResNet50-based feature pyramid model are fused level by level with the second input, four feature maps of different scales are finally produced as the final output of the current ResNet50-based feature pyramid model;
in this embodiment, the standard 3 × 3 convolutions of all the pyramids are replaced with the hole convolution module, so that spatial information is captured better and the lane line detection probability is improved; meanwhile, BN layers can be added between every two pyramids to speed up learning convergence and thus training.
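The sketch below outlines the triple-pass data flow described above in PyTorch. It is a simplified assumption of the structure: the feedback is injected through 1 × 1 adapter convolutions and element-wise addition rather than the concatenation plus 3 × 3 hole convolution of the actual module, and the backbone stages and channel widths are supplied by the caller.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Toy FPN head: turns four backbone maps (C2..C5) into four pyramid maps (P2..P5)."""
    def __init__(self, in_chs, out_ch=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in in_chs)

    def forward(self, feats):                            # feats = [C2, C3, C4, C5]
        lat = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(lat) - 1, 0, -1):             # top-down pathway
            lat[i - 1] = lat[i - 1] + F.interpolate(lat[i], size=lat[i - 1].shape[-2:])
        return [s(p) for s, p in zip(self.smooth, lat)]  # [P2..P5]

class MultiFeedbackPyramid(nn.Module):
    """Runs backbone + FPN several times; each pass feeds the previous pass's
    pyramid maps back into the matching backbone stages (simplified sketch)."""
    def __init__(self, backbone_stages, in_chs, out_ch=256, passes=3):
        super().__init__()
        self.stages = nn.ModuleList(backbone_stages)     # 4 stage modules, stem included in stage 1
        self.fpns = nn.ModuleList(SimpleFPN(in_chs, out_ch) for _ in range(passes))
        self.adapt = nn.ModuleList(nn.Conv2d(out_ch, c, 1) for c in in_chs)
        self.passes = passes

    def forward(self, image):
        prev = None                                      # pyramid maps from the previous pass
        for p in range(self.passes):
            feats, x = [], image                         # every pass starts from the original image
            for i, stage in enumerate(self.stages):
                x = stage(x)
                if prev is not None:                     # feedback connection from the previous pass
                    x = x + F.interpolate(self.adapt[i](prev[i]), size=x.shape[-2:])
                feats.append(x)
            outs = self.fpns[p](feats)
            prev = outs if prev is None else [a + b for a, b in zip(outs, prev)]  # level-wise fusion
        return prev                                      # four fused multi-scale feature maps
```

For example, with a torchvision ResNet50 one could take stage 1 as the stem plus layer1 and stages 2 to 4 as layer2 to layer4, giving in_chs = [256, 512, 1024, 2048]; this pairing is an illustration, not a statement of the original configuration.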
Step 3.2 regression prediction
In this embodiment, coordinate regression of the target is performed in an anchor-free manner. Currently, most target detection models, such as SSD, RefineDet, the YOLO series, the RCNN series and RetinaNet, use anchors or proposals to predict targets; that is, predefined anchors or proposals are computed in advance to match targets, and a prediction is made for each anchor or proposal.
For lane line detection in the scenario of this embodiment, anchors cause the following problems: their size and aspect ratio are fixed, so when the detection target varies greatly the anchors harm the detection effect and the generalisation ability; the large number of anchor boxes produces a large number of negative samples, so positive and negative samples become unbalanced; and a large amount of IoU computation is required, which hurts model performance, slows down training and occupies a great deal of memory to no benefit. This embodiment therefore does not detect lane lines with preset anchors and is accordingly simpler than existing anchor-based detection models.
Instead, each pixel on the feature map is predicted directly: the target bounding box of the object corresponding to each pixel is regressed directly, and each pixel on the feature map is mapped back to the input image.
The size of the input original image is 800 × 800, and the i-th real bounding box label of the input image is defined as box_i = (x_0^(i), y_0^(i), x_1^(i), y_1^(i), lean^(i), c^(i)). Here (x_0^(i), y_0^(i)) are the coordinates of the upper-left corner of the real bounding rectangle and (x_1^(i), y_1^(i)) the coordinates of its lower-right corner; lean^(i) is the lane line deviation of the real bounding box, taking the value 0 or 1, i.e. whether the lane line lies on the main diagonal or the secondary diagonal; c^(i) is the category of the real bounding box, and since only one category is used in this embodiment, c is 1.
Let F_i be the i-th layer feature map output by the backbone network, and let d be the down-sampling factor of the current feature map. Each pixel of the feature map is now mapped back to a location in the original image. If the coordinates of the pixel are (x_0, y_0), the corresponding coordinates after mapping back to the original image are:

(x, y) = (x_0 · d + ⌊d/2⌋, y_0 · d + ⌊d/2⌋)

A pixel of the feature map corresponds to a d × d region (d² positions) in the original image; this embodiment maps it back to the most central position of that region, which is why d/2 is added to both x and y.
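In code form the mapping amounts to the following sketch, where d is the down-sampling factor of the feature map:

```python
def map_to_image(x_f, y_f, d):
    """Map a feature-map cell (x_f, y_f) back to the original image, at the centre
    of the d x d image region the cell covers."""
    return x_f * d + d // 2, y_f * d + d // 2
```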
During model training, the target bounding box is not predicted directly; instead, the distances (l, t, r, b) from the pixel coordinates (x, y) to the 4 edges of the bounding box are predicted, and the final target bounding box is then computed from these 4 variables. The relationship between (l, t, r, b) and (x, y) is shown in FIG. 11. Since each pixel point corresponds to one real bounding box, when a pixel falls into several real target boxes it must be decided which target box it should predict. This ambiguity would still hurt model performance, so this embodiment introduces hierarchical prediction to solve the problem.
The MFP module produces several feature maps, which can be used to predict targets of different sizes hierarchically. The prediction in this embodiment regresses the 4 distances from a pixel to the 4 edges of the target bounding box; the larger the target, the larger these 4 distances, so the size of the predicted target box can be controlled by bounding the distances, which resolves the problem of one point falling inside several rectangular boxes. The regression distances (l, t, r, b) are computed for every position on all the feature maps, and a regression range is defined for each feature map: the ResNet50 backbone produces 4 feature maps, F_1 to F_4, and the distances d_0 to d_4 are set to 0, 64, 128, 256 and ∞ in order. If a position's prediction satisfies max(l^(i), t^(i), r^(i), b^(i)) < d_{i-1} or max(l^(i), t^(i), r^(i), b^(i)) > d_i, the bounding box is no longer regressed at this level, because it exceeds the box size that a pixel of this feature map should predict. In the scenario of this embodiment the detection targets are lane lines; a certain distance exists between lane lines and their overlap is small, so limiting the maximum regression distance removes this ambiguity in most cases, and even when a point does fall inside several real bounding boxes, the real bounding box with the smallest area is simply chosen as the regression target. Hierarchical prediction therefore largely solves the problem.
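The level-assignment rule can be sketched as follows, with levels 0 to 3 corresponding to F_1 to F_4 and the distance thresholds (0, 64, 128, 256, ∞) described above; the function name is illustrative.

```python
LEVEL_RANGES = [(0, 64), (64, 128), (128, 256), (256, float("inf"))]  # (d_{i-1}, d_i) per level

def assign_level(l, t, r, b):
    """Return the pyramid level (0..3) allowed to regress this target from this
    location, or None if the location is rejected at every level."""
    m = max(l, t, r, b)
    for level, (lo, hi) in enumerate(LEVEL_RANGES):
        if lo <= m <= hi:
            return level
    return None
```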
Finally, since the area covered by one target box is large and every pixel inside it produces a predicted box, predictions made at positions far from the centre of the target bounding box have large errors, generating many low-quality, meaningless bounding boxes that degrade the prediction. To eliminate these boxes, this embodiment adds a variable that predicts the centrality of a location, describing the normalised distance from the location to the centre of its regression target. The centrality is computed as:
centra = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) )
The centrality reduces the confidence of bounding boxes that are far from the target centre; these low-confidence boxes are then filtered out by Non-Maximum Suppression (NMS), improving detection performance. The centrality ranges from 0 to 1, so it can be trained with a binary cross-entropy loss, which is added to the loss function.
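A small sketch of how the centrality could be computed and used for scoring is shown below; it follows the formula reconstructed above, which is itself an inference from the surrounding description rather than a quotation of the original publication.

```python
def centrality(l, t, r, b, eps=1e-6):
    """Normalised distance of a location to the centre of its regression target,
    in [0, 1]; larger values mean the location is closer to the centre."""
    return ((min(l, r) / (max(l, r) + eps)) * (min(t, b) / (max(t, b) + eps))) ** 0.5

# Before NMS, the classification confidence of a candidate box can be multiplied
# by its centrality so that off-centre, low-quality boxes are suppressed first.
def scored_confidence(cls_conf, l, t, r, b):
    return cls_conf * centrality(l, t, r, b)
```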
In this embodiment, both the classification network and the regression network are implemented as a cascade of four 1 × 1 convolutions. The classification network outputs a 1-channel feature map through the four cascaded 1 × 1 convolutions, indicating whether a pixel lies within the range of a lane line. The regression network outputs a 6-channel feature map through four cascaded 1 × 1 convolutions; each pixel thus obtains 6 parameters, namely the distance values (l, t, r, b) from the centre of the pixel's mapping area in the original image to the four boundaries of the lane line's bounding rectangle, the lane line deviation (marking whether the lane line lies on the main or the secondary diagonal of the bounding rectangle) and the centrality.
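The two heads can be sketched as below; the hidden width of 256 channels (matching the pyramid output) and the ReLU activations between the 1 × 1 convolutions are assumptions not stated in the text.

```python
import torch.nn as nn

def make_head(out_channels, in_channels=256, width=256):
    """Four cascaded 1x1 convolutions ending in `out_channels` output maps."""
    return nn.Sequential(
        nn.Conv2d(in_channels, width, 1), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 1), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 1), nn.ReLU(inplace=True),
        nn.Conv2d(width, out_channels, 1),
    )

cls_head = make_head(1)   # 1 channel: is this pixel inside a lane line?
reg_head = make_head(6)   # 6 channels: (l, t, r, b), lane-line lean, centrality
```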
In summary, each pixel outputs 4 position coordinates, a lane line deviation, a centrality, and a binary classification label with its confidence, the confidence being associated with the centrality. That is, this embodiment treats the target bounding box directly as the training sample. The predicted boundary distances are combined with the position in the original image to obtain the predicted bounding box, the orientation of the lane line is determined from the deviation, and the coordinates of the two end points of the lane line are then derived.
Step 3.3 loss function
The loss function for ODL-Net network model training is defined as follows:
L = (1/N) · Σ_{x,y} L_cls(c′_{x,y}, c_{x,y}) + (λ/N) · Σ_{x,y} 1{c_{x,y} > 0} · L_reg(t′_{x,y}, t_{x,y}) + (μ/N) · Σ_{x,y} 1{c_{x,y} > 0} · L_ctr(s′_{x,y}, s_{x,y}) + (ν/N) · Σ_{x,y} 1{c_{x,y} > 0} · L_lean(l′_{x,y}, l_{x,y})

In the formula: N represents the number of classification labels; λ, μ and ν respectively represent the weight values of the three weighted loss terms; L_cls is the focal loss, used to compute the class prediction error; c′_{x,y} denotes the predicted box class at position (x, y) and c_{x,y} the real box class at position (x, y), whose value is 1 when the position belongs to a lane line and 0 when it does not; L_reg is the IoU loss, used to compute the error between the predicted box and the real box; 1{c_{x,y} > 0} is an indicator function whose value is 1 if c_{x,y} > 0 and 0 otherwise; t_{x,y} denotes the 4 distances from position (x, y) to the 4 boundaries of the real box and t′_{x,y} the 4 distances from position (x, y) to the 4 boundaries of the predicted box; L_ctr is the binary cross-entropy loss of the position centrality, used to compute the distance between the centres of the predicted box and the real box; s_{x,y} is the centrality of position (x, y) with respect to the centre of the real box and s′_{x,y} the centrality with respect to the centre of the predicted box; L_lean is the binary cross-entropy loss of the lean deviation, used to compute the deviation of the lane line within its bounding rectangle; l′_{x,y} denotes the predicted deviation value at position (x, y) and l_{x,y} the real deviation value at position (x, y), which is 1 when the lane line lies on the main diagonal of the bounding rectangle and 0 when it lies on the secondary diagonal. The weight values λ, μ and ν in this embodiment are all 1.
Step 3.4 model training and prediction
The data set constructed above is divided into a training set, a validation set and a test set according to the proportion of 7. When the decrease of the training loss over 50 epochs falls below 0.01 and the validation loss shows a tendency to start rising, training ends and the model is saved for later reuse. In the experiments of this embodiment the model performance is best when the training loss has fallen to 0.43 and the validation loss to 0.56.
The trained model is then called to predict the pictures in the test set in order to test its effect. The model prediction results are shown in fig. 12. It can be seen that the model is well trained and that the predictions detect the lane lines well.
Step 4 lane line post-processing
Data preprocessing is carried out on several images under the same device, and the model is then called to detect the preprocessed image data, giving the lane line detection results of these images. The detection results of the multiple images are merged into one set and the line segments are post-processed to obtain the final lane lines of the scene. The post-processing is described in detail below.
Step 4.1 line segment Filtering
After the model has been called on multiple images, many line segments are produced: some are generated by detection errors, and even a real lane line gives rise to several line segments (one per image).
Statistics of the lane line angles in the data set show that the lane lines are mostly close to perpendicular to the x-axis, while the stop lines are usually perpendicular to the y-axis; even where the angles deviate slightly, they do not change much, so the angle of a stop line with the horizontal can be taken to lie within 10 degrees. Lane lines make a larger angle with the horizontal than stop lines: most are perpendicular to the x-axis and only a small part deviates, and not by much, so the angle of a lane line can be taken to lie between 45 and 90 degrees. This is because the traffic images of this embodiment are shot by a law enforcement camera; if the camera angle deviated too much, the height of a vehicle would block the lane line, making it difficult for the traffic police to judge whether the vehicle crosses the line and affecting the violation judgement. This is a characteristic of the scene in this embodiment. Using these angle ranges of the stop line and the lane line, the line segments in the ODL-Net detection results that make an angle of 10 to 45 degrees with the horizontal can therefore be filtered out as detection errors; a sketch of this filter follows.
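The sketch below shows such an angle filter; the segment representation as two end points and the function names are illustrative.

```python
import math

def segment_angle_deg(p0, p1):
    """Acute angle (0-90 degrees) between the segment p0-p1 and the horizontal axis."""
    return math.degrees(math.atan2(abs(p1[1] - p0[1]), abs(p1[0] - p0[0])))

def keep_segment(p0, p1, stop_max_deg=10.0, lane_min_deg=45.0):
    """Keep stop-line candidates (<= ~10 deg) and lane-line candidates (45-90 deg);
    drop the 10-45 degree band, which this scene treats as detection errors."""
    angle = segment_angle_deg(p0, p1)
    return angle <= stop_max_deg or angle >= lane_min_deg
```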
Step 4.2 Lane line clustering
After a large number of lane line detection results have been obtained, the true lane lines must be extracted from them, which requires clustering the line segments. Clustering line segments directly is difficult, and because the detection results still contain some erroneous segments, segments of different kinds may be interwoven and everything could end up clustered into a single class.
The line segments are therefore converted into points as follows: each segment is divided every 10 pixels; for example, a segment 100 pixels long is converted into 11 points spaced 10 pixels apart. In this way all segments are converted into points, and all the points are then clustered together.
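A sketch of this segment-to-points conversion (end points included, roughly 10-pixel spacing) might look like this:

```python
import numpy as np

def segment_to_points(p0, p1, step=10.0):
    """Split the segment p0 -> p1 into points about `step` pixels apart;
    a 100-pixel segment yields 11 points."""
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    n = max(1, int(round(np.linalg.norm(p1 - p0) / step)))
    return [tuple(p0 + (p1 - p0) * i / n) for i in range(n + 1)]
```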
The specific steps of the clustering algorithm based on point-distance calculation are as follows (a code sketch follows the list):
1) Putting all pixel points into a first set A initialized to be empty;
2) Randomly taking out a pixel point from the current first set A and adding the pixel point into the second set B initialized to be empty;
3) Traversing all the pixel points in the first set A; for each traversed pixel point, judging whether the second set B contains a pixel point whose distance to the currently traversed pixel point is less than the maximum clustering distance, and if so, adding the currently traversed pixel point into the second set B; the maximum clustering distance is the maximum distance allowed between adjacent pixel points of one lane line in the image;
4) Repeating the step 3) continuously until no new pixel points are added into the second set B, and taking all the pixel points in the second set B as a cluster class cluster, wherein the pixel points in the cluster class cluster belong to the same lane line;
5) And repeating 2) -4) continuously until all the pixel points in the first set A are divided into clustering clusters to obtain the pixel points corresponding to each lane line. Finally, n cluster types are obtained, and each cluster type generally comprises dozens of or hundreds of points.
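A direct Python sketch of steps 1)-5) above is given below; the maximum clustering distance of 20 pixels is an illustrative assumption, and the seed point is simply the last element of the set rather than a truly random one.

```python
import math

def cluster_points(points, max_dist=20.0):
    """Group (x, y) points so that every point is within `max_dist` of some other
    point of its cluster; each resulting cluster corresponds to one lane line."""
    remaining = list(points)                  # first set A
    clusters = []
    while remaining:
        cluster = [remaining.pop()]           # second set B, seeded with one point
        grew = True
        while grew:                           # repeat step 3) until B stops growing
            grew = False
            for p in remaining[:]:
                if any(math.dist(p, q) < max_dist for q in cluster):
                    cluster.append(p)
                    remaining.remove(p)
                    grew = True
        clusters.append(cluster)              # one cluster = one lane line
    return clusters
```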
Step 4.3 lane line fitting
After clustering, the points of each lane line form one class, and the lane line is obtained by fitting the points of the same class. Because each class may contain a small number of points coming from erroneously detected segments, least-squares fitting would give a poor result, so this embodiment fits the lane line with the random sample consensus algorithm (RANSAC). RANSAC iteratively estimates the parameters of a mathematical model from a set of data containing outliers; with enough iterations it obtains the desired result with high probability and is not influenced by the outliers. In this embodiment the detections include some erroneous segments which, once converted into points, are exactly such outliers; the RANSAC algorithm can recover the true lane line from data containing them, whereas a line fitted by least squares would be pulled away from its true position by the outliers. The fitted lane line is also trimmed and its redundant parts are cut off. The result of fitting a line with the RANSAC algorithm is shown in FIG. 13.
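A minimal RANSAC line fit for one cluster could look like the sketch below; the iteration count and inlier tolerance are illustrative assumptions, and returning the two extreme inliers corresponds to the trimming step mentioned above.

```python
import numpy as np

def ransac_line(points, iters=200, inlier_tol=3.0, seed=0):
    """Fit a line to one cluster of lane-line points with RANSAC and return the two
    extreme inlier points, usable as the lane-line end points."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_inliers, best_dir = None, None
    for _ in range(iters):
        a, b = pts[rng.choice(len(pts), size=2, replace=False)]
        d = b - a
        length = np.linalg.norm(d)
        if length < 1e-6:
            continue
        # perpendicular distance of every point to the candidate line through a and b
        dist = np.abs(d[0] * (pts[:, 1] - a[1]) - d[1] * (pts[:, 0] - a[0])) / length
        inliers = pts[dist < inlier_tol]
        if best_inliers is None or len(inliers) > len(best_inliers):
            best_inliers, best_dir = inliers, d / length
    proj = best_inliers @ best_dir            # trim: keep only the extreme inliers
    return tuple(best_inliers[np.argmin(proj)]), tuple(best_inliers[np.argmax(proj)])
```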
At this point a complete lane line has been detected. A detected lane line can be represented by the coordinates of its two end points, so the detection result can be stored in a JSON file; during manual adjustment the lane lines are visualised on the image background, and the two end points of a lane line to be corrected are dragged manually to correct it.
In conclusion, the method works well for lane line detection in complex scenes. When the data set is constructed, weights are computed for the pictures of the same scene, and pictures with few vehicles (fewer vehicles occlude the lane lines less) and a good light environment (daytime light is better than night light) are selected for training, which largely avoids the influence of data captured under poor conditions on the model. FIG. 14, FIG. 15, FIG. 16 and FIG. 17 show detection results for, respectively, a dark environment with a worn lane line, rain and accumulated dirt at night, a lane line affected by reflections at night, and many cars with accumulated light at night.
Because lane lines are easily occluded by passing vehicles, the invention performs lane line detection on several pictures of the same scene at the same time and then clusters and fits the detection results of these pictures, which greatly improves the detection accuracy in this scene. As shown in Tables 1 and 2, the detection accuracy of the invention reaches 96.98% and 95.49% respectively under the evaluation indexes of the two widely used lane line evaluation data sets TuSimple and CULane.
Table 1 comparison table of different model performances under TuSimple evaluation index
(Table 1 is reproduced as an image in the original publication.)
Table 2 comparison table of different model performances under curane evaluation index
(Table 2 is reproduced as an image in the original publication.)
Therefore, the method has very high accuracy in detecting and identifying the lane lines.
The above-described embodiments are merely preferred embodiments of the present invention, and are not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical solutions obtained by means of equivalent substitution or equivalent transformation all fall within the protection scope of the present invention.

Claims (10)

1. A lane line detection method based on a traffic off-site law enforcement scene is characterized by comprising the following steps:
s1, acquiring a marked training data set, wherein each image sample comprises an image containing lane lines shot from above by a law enforcement camera, the image being pre-marked with the end points at the two ends of the centre line of each lane line and with the lane line deviation, the lane line deviation indicating whether the lane line lies on the main diagonal or the secondary diagonal of the lane line's outer rectangular frame; the image samples in the training data set belong to different intersection scenes, and all the image samples are divided into a daytime image subset shot in the daytime and a night image subset shot at night according to shooting time;
s2, aiming at each image sample in the training data set, applying two retention principles, namely that images with fewer vehicles are preferred and that daytime images are preferred over night-time images, screening and filtering all image samples under the same intersection scene in combination with the grey value of the images and the number of vehicles in the images, and for each intersection scene rejecting the image samples exceeding the threshold number;
s3, training a lane line detection network by using the training data set filtered by the S2 with the minimum loss function as a target;
the lane line detection network comprises an input module, a multiple feedback characteristic pyramid module, a classification regression module and an output module;
the input module is used for inputting the original image of the lane line to be detected into the network;
the multi-feedback characteristic pyramid module is formed by cascading a plurality of characteristic pyramid networks based on ResNet 50; the first characteristic pyramid model based on ResNet50 takes an original image as the only input, and the ResNet50 is used as a backbone network to extract multi-level characteristics from bottom to top and then outputs four characteristic graphs with different scales through the characteristic pyramid network; in addition to the first characteristic pyramid model based on ResNet50, in each of the rest characteristic pyramid models based on ResNet50, an original image is used as a first input, four different-scale characteristic diagrams output in the previous characteristic pyramid model based on ResNet50 are simultaneously received as a second input, in the process of extracting the characteristic diagrams from bottom to top from the first input by a ResNet50 backbone network, after each characteristic diagram of one scale is extracted, the characteristic diagram is connected with the characteristic diagram of the corresponding scale in the second input and then is subjected to 3-by-3 hole convolution for extracting the characteristic diagram of the next scale, and after four different-scale characteristic diagrams output by the characteristic pyramid network in the current characteristic pyramid model based on ResNet50 are subjected to corresponding level fusion with the second input, four different-scale characteristic diagrams are finally output as the final output of the current characteristic pyramid model based on ResNet 50;
four different-scale feature graphs finally output in the multi-feedback feature pyramid module are used as the input of the classification regression module; the classification regression module comprises a classification network and a regression network, wherein the classification network is used for carrying out binary classification on each pixel point in each characteristic graph and outputting a classification label and a confidence coefficient of whether a mapping area of each pixel point in an original image belongs to a lane line; the regression network is used for obtaining 6 prediction parameters of each pixel point in each characteristic graph through regression, and the prediction parameters comprise distance values from the center of a mapping area of each pixel point in an original image to four boundaries of a lane line outer-wrapping rectangular frame, lane line deviation and centrality;
in the output module, firstly, the two classification results of the classification network are utilized to carry out primary screening on pixel points in each characteristic graph, pixel points which do not belong to the lane line are eliminated, then, the remaining pixel points which belong to the lane line are screened for the second time, the pixel points of which the distance values exceed the threshold range are eliminated, then, the remaining pixel points are subjected to non-maximum value inhibition based on the corresponding confidence coefficient, the prediction result of the lane line outsourcing rectangular frame in the original image is finally obtained, and each outsourcing rectangular frame in the prediction result is combined with the lane line deflection to determine the diagonal line of the corresponding outsourcing rectangular frame to serve as the final output lane line prediction result;
s4, inputting the image to be detected containing the lane line into the trained lane line detection network, and outputting a lane line prediction result in the image to be detected;
s5, filtering lane line prediction results in the image to be detected, and removing line segments which do not meet the spatial position relation of the lane lines in the image;
s6, dividing each lane line which is filtered in the S5 into a series of points, clustering the points of all the lane lines together, and clustering the points belonging to the same lane line to the same cluster;
and S7, performing straight line fitting on the points in each cluster, and taking the line segment obtained by fitting as a final lane line detection result in the image to be detected.
2. The method for detecting the lane line based on the traffic off-site law enforcement scenario as claimed in claim 1, wherein the specific method of S2 is as follows:
s21, converting each image sample in the training data set from an RGB image into a gray level image, then calculating the gray level mean value of all pixels in each image sample, and then calculating the mean value of the gray level mean values of all image samples in each subset aiming at a day image subset and a night image subset respectively to be used as the mean brightness of the corresponding subset; taking the average value of the average brightness of the two subsets as a brightness distinguishing threshold value for distinguishing daytime and night;
s22, carrying out vehicle detection on each image sample in the training data set by using the trained target detection model to obtain the number of vehicles in each image sample, then calculating the average number of vehicles in all the image samples in the training data set, and finally calculating the vehicle weight of each image sample as the ratio of the number of vehicles in the image sample to the average number of vehicles multiplied by the average brightness of the daytime image subset;
s23, calculating a quality weight = 255 + λ·α·gray − β·carWeight for each image sample in the training data set according to the brightness discrimination threshold and the vehicle weight, wherein gray represents the grey-level mean of all pixels in the image sample currently being calculated, carWeight represents the vehicle weight corresponding to that image sample, α and β are two weights respectively, λ is a weight determined by the brightness discrimination threshold bound and gray: λ = λ1 if gray is greater than or equal to bound and λ = λ2 if gray is less than bound, with λ1 + λ2 = 1 and λ1 > λ2;
s24, sequencing all image samples in each intersection scene in the training data set according to respective quality weights, if the number of the image samples in one intersection scene exceeds the threshold number, keeping the image samples meeting the threshold number in the sequence from large to small according to the quality weights, and if the number of the image samples in one intersection scene does not exceed the threshold number, keeping all the image samples.
3. The method for detecting the lane line according to claim 2, wherein the weights α and β are 1 and 2, respectively, and the weights λ1 and λ2 are 0.6 and 0.4, respectively.
4. The method for detecting the lane line based on the traffic non-onsite law enforcement scene as claimed in claim 1, wherein the centrality of each pixel point is calculated from the distance value from the center of the mapping area of the pixel point in the original image to the four borders of the lane line outer-covering rectangular frame, and the calculation formula is as follows:
centra = sqrt( (min(l, r) / max(l, r)) × (min(t, b) / max(t, b)) )
in the formula: centra represents the centrality of a pixel point, and (l, t, r, b) is the distance value from the center of a mapping area of the pixel point in an original image to four boundaries of a lane line outer-wrapping rectangular frame; the range of the centrality of one pixel point is 0 to 1, the larger the centrality of the pixel point is, the larger the corresponding confidence is, and the smaller the centrality of the pixel point is, the smaller the corresponding confidence is.
5. The lane line detection method based on traffic off-site law enforcement scenario of claim 1, wherein the loss function of the lane line detection network is:
L = (1/N) · Σ_{x,y} L_cls(c′_{x,y}, c_{x,y}) + (λ/N) · Σ_{x,y} 1{c_{x,y} > 0} · L_reg(t′_{x,y}, t_{x,y}) + (μ/N) · Σ_{x,y} 1{c_{x,y} > 0} · L_ctr(s′_{x,y}, s_{x,y}) + (ν/N) · Σ_{x,y} 1{c_{x,y} > 0} · L_lean(l′_{x,y}, l_{x,y})

in the formula: N represents the number of classification tags; λ, μ and ν respectively represent the weight values of the three weighted loss terms; L_cls represents a focal loss used for calculating the class prediction error; c′_{x,y} represents the prediction box class at position (x, y) and c_{x,y} represents the real box class at position (x, y), the class value being 1 when it belongs to the lane line and 0 when it does not; L_reg represents the IoU loss, used for calculating the error between the prediction box and the real box; 1{c_{x,y} > 0} is a judgment function whose value is 1 if c_{x,y} > 0 and 0 otherwise; t_{x,y} represents the 4 distances from the position (x, y) to the 4 boundaries of the real box and t′_{x,y} represents the 4 distances from the position (x, y) to the 4 boundaries of the prediction box; L_ctr represents the binary cross-entropy loss function of the position centrality, used for calculating the distance between the centres of the prediction box and the real box, s_{x,y} represents the centrality of the position (x, y) to the centre point of the real box and s′_{x,y} represents the centrality of the position (x, y) to the centre point of the prediction box; L_lean represents the binary cross-entropy loss function of the lean deviation, used for calculating the deviation of the lane line in the outer-wrapping rectangular frame; l′_{x,y} represents the predicted deviation value for the position (x, y) and l_{x,y} represents the real deviation value corresponding to the position (x, y), the deviation value being 1 when the lane line is located on the main diagonal of the outer-wrapping rectangular frame and 0 when it is located on the secondary diagonal.
6. The method for detecting the lane line based on the traffic off-site law enforcement scenario as claimed in claim 1, wherein the output module performs a second screening on the remaining pixels belonging to the lane line, and the specific method for eliminating the pixels with the distance value exceeding the threshold range is as follows:
five thresholds d_0, d_1, d_2, d_3, d_4 are set to 0, 64, 128, 256 and ∞ in sequence; for the i-th layer feature map F_i (i = 1, 2, 3, 4) output by the multi-feedback feature pyramid module, if the four distance values (l^(i), t^(i), r^(i), b^(i)) corresponding to a pixel point in F_i satisfy max(l^(i), t^(i), r^(i), b^(i)) < d_{i-1} or max(l^(i), t^(i), r^(i), b^(i)) > d_i, the pixel point is eliminated.
7. The method for detecting the lane line based on the traffic off-site law enforcement scenario of claim 1, wherein the multi-feedback feature pyramid module is formed by cascading 3 feature pyramid networks based on ResNet 50.
8. The method for detecting the lane line based on the traffic off-site law enforcement scene as claimed in claim 1, wherein in step S5, when filtering the lane line prediction results in the image to be detected, the spatial position relationship that a lane line should satisfy in the image is that the included angle between the lane line and the horizontal line is within the range of 45 to 90 degrees, and line segments whose included angle with the horizontal line is not within this range are removed.
9. The method for detecting lane lines in a scene based on traffic off-site law enforcement according to claim 1, wherein in S6, the specific method for clustering all the lane line points together is as follows:
s61, putting all pixel points into a first set initialized to be empty;
s62, randomly taking out a pixel point from the current first set and adding the pixel point into the initialized empty second set;
s63, traversing all the pixel points in the first set; for each traversed pixel point, judging whether the second set contains a pixel point whose distance to the currently traversed pixel point is smaller than the maximum clustering distance, and if so, adding the currently traversed pixel point into the second set; the maximum clustering distance is the maximum distance value allowed between adjacent pixel points on a lane line in the image;
s64, continuously repeating the step S63 until no new pixel points are added into the second set, and taking all the pixel points in the second set as a cluster type cluster, wherein the pixel points in the cluster type cluster belong to the same lane line;
and S65, continuously repeating S62-S64 until all the pixel points in the first set are divided into clustering clusters to obtain pixel points corresponding to each lane line.
10. The method for detecting the lane line based on the traffic off-site law enforcement scenario of claim 1, wherein in S7, the straight line fitting is implemented by using RANSAC algorithm.
CN202211099628.2A 2022-09-07 2022-09-07 Lane line detection method based on traffic off-site law enforcement scene Pending CN115620259A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211099628.2A CN115620259A (en) 2022-09-07 2022-09-07 Lane line detection method based on traffic off-site law enforcement scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211099628.2A CN115620259A (en) 2022-09-07 2022-09-07 Lane line detection method based on traffic off-site law enforcement scene

Publications (1)

Publication Number Publication Date
CN115620259A true CN115620259A (en) 2023-01-17

Family

ID=84857919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211099628.2A Pending CN115620259A (en) 2022-09-07 2022-09-07 Lane line detection method based on traffic off-site law enforcement scene

Country Status (1)

Country Link
CN (1) CN115620259A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543365A (en) * 2023-07-06 2023-08-04 广汽埃安新能源汽车股份有限公司 Lane line identification method and device, electronic equipment and storage medium
CN116543365B (en) * 2023-07-06 2023-10-10 广汽埃安新能源汽车股份有限公司 Lane line identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108446617B (en) Side face interference resistant rapid human face detection method
CN109977812B (en) Vehicle-mounted video target detection method based on deep learning
CN110059694B (en) Intelligent identification method for character data in complex scene of power industry
CN106599792B (en) Method for detecting hand driving violation behavior
CN110059581A (en) People counting method based on depth information of scene
CN110866593B (en) Highway severe weather identification method based on artificial intelligence
CN103679187B (en) Image-recognizing method and system
CN107705254B (en) City environment assessment method based on street view
CN109801297B (en) Image panorama segmentation prediction optimization method based on convolution
CN110866926B (en) Infrared remote sensing image rapid and fine sea-land segmentation method
CN111027475A (en) Real-time traffic signal lamp identification method based on vision
CN107038416A (en) A kind of pedestrian detection method based on bianry image modified HOG features
CN112489142B (en) Color recognition method, device, equipment and storage medium
CN107578011A (en) The decision method and device of key frame of video
CN112149476A (en) Target detection method, device, equipment and storage medium
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN110276318A (en) Nighttime road rains recognition methods, device, computer equipment and storage medium
CN115620259A (en) Lane line detection method based on traffic off-site law enforcement scene
CN107832732B (en) Lane line detection method based on treble traversal
CN110334703B (en) Ship detection and identification method in day and night image
CN104809438B (en) A kind of method and apparatus for detecting electronic eyes
CN114677670B (en) Method for automatically identifying and positioning identity card tampering
CN111178275A (en) Fire detection method based on convolutional neural network
CN110956156A (en) Deep learning-based red light running detection system
CN115223112A (en) Method for detecting to-be-performed area based on semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination