CN112949398A - Lane line detection method, distance measurement method and corresponding device

Info

Publication number
CN112949398A
CN112949398A
Authority
CN
China
Prior art keywords
lane line
lane
pixel
image
vehicle
Prior art date
Legal status
Granted
Application number
CN202110129993.2A
Other languages
Chinese (zh)
Other versions
CN112949398B (en)
Inventor
聂荣佶
Current Assignee
Chengdu Anzhijie Technology Co ltd
Original Assignee
Chengdu Anzhijie Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Anzhijie Technology Co., Ltd.
Priority to CN202110129993.2A
Publication of CN112949398A
Application granted
Publication of CN112949398B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06V 20/588 - Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
    • G06F 18/253 - Fusion techniques of extracted features
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of image processing, and provides a lane line detection method, a distance measurement method and corresponding devices. The lane line detection method comprises the following steps: determining a region-of-interest image containing a lane line from a road image acquired by a vehicle-mounted camera; processing the region-of-interest image with a semantic segmentation network to obtain a segmentation mask of the lane lines in the image, and converting the segmentation mask into a corresponding top view; determining sets of lane line pixel points corresponding to different lane lines in the top view, and transforming the coordinates of the lane line pixel points in each set from top-view coordinates into coordinates in the road image or in a world coordinate system; and fitting a lane line equation from each set of coordinate-transformed lane line pixel points. The lane line detection method is beneficial to improving the accuracy of lane line detection.

Description

Lane line detection method, distance measurement method and corresponding device
Technical Field
The invention relates to the technical field of image processing, in particular to a lane line detection method, a distance measurement method and a corresponding device.
Background
In advanced driver assistance systems (ADAS) and autonomous driving, detecting lane lines in images captured by a vehicle-mounted camera is a basic requirement. Based on the detected lane lines, functions such as lane departure warning, lane keeping and lane change assistance can be realized. However, existing methods do not detect lane lines with high accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide a lane line detection method, a distance measurement method and corresponding devices, so as to address the above technical problem.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides a lane line detection method, including: determining a region-of-interest image containing a lane line from a road image acquired by a vehicle-mounted camera; processing the region-of-interest image with a semantic segmentation network to obtain a segmentation mask of the lane lines in the image, and converting the segmentation mask into a corresponding top view; determining sets of lane line pixel points corresponding to different lane lines in the top view, and transforming the coordinates of the lane line pixel points in each set from top-view coordinates into coordinates in the road image or in a world coordinate system; and fitting a lane line equation from each set of coordinate-transformed lane line pixel points.
Compared with the prior art, the lane line detection method has at least the following advantages. First, a deep learning method (a semantic segmentation network) is used to segment the lane lines, which helps improve the lane line segmentation precision and, in turn, the lane line detection precision in the subsequent steps. Second, lane line detection is not performed directly on the segmentation mask; instead, the segmentation mask is first converted into a corresponding top view, and detection is then performed on the top view. In the segmentation mask, originally parallel lane lines are likely to converge in the distance because of the perspective effect, which makes it difficult for the algorithm that searches for lane line pixel points in subsequent steps (such as the sliding frame algorithm) to distinguish different lane lines. After the segmentation mask is converted into the top view, the perspective effect is eliminated and the parallel relation between different lane lines is preserved, so the accuracy of lane line detection can be improved.
In an implementation manner of the first aspect, the processing the image of the region of interest by using a semantic segmentation network to obtain a segmentation mask of a lane line in the image includes: extracting the multi-scale features of the region-of-interest image by using a backbone network in the semantic segmentation network; fusing the multi-scale features by using a feature fusion network in the semantic segmentation network, and outputting segmentation results aiming at different lane line categories; converting the segmentation result into the segmentation mask.
In the implementation mode, the multi-scale features are extracted and fused through the semantic segmentation network to obtain the segmentation mask, so that the lane line segmentation precision is improved.
In an implementation manner of the first aspect, the extracting, by using a backbone network in the semantic segmentation network, the multi-scale features of the region-of-interest image includes: extracting the multi-scale features by utilizing a plurality of bottleneck modules in the backbone network, wherein the bottleneck modules are convolution modules in MobileNet; the fusing the multi-scale features by using the feature fusion network in the semantic segmentation network and outputting segmentation results aiming at different lane line categories comprises the following steps: convolving the features of each scale, adding the convolved features of each scale to the fusion features of the same scale, and upsampling the addition result through deconvolution to obtain the fusion features of the previous (larger) scale; the features of the minimum scale are, after convolution, directly used as an addition result, and the fusion features of the previous scale calculated from the features of the maximum scale are used as the segmentation result.
In this implementation, the backbone network adopts the bottleneck module of the lightweight network MobileNet (e.g., MobileNetV2), so that the network computation cost is low while the segmentation accuracy requirement is satisfied. In the feature fusion network, the features are upsampled and additively fused step by step in order of feature scale from small to large. Since features of different scales contain different semantic information and correspond to different receptive fields, feature fusion can significantly improve the expressive power of the features and thereby improve the lane line segmentation precision.
In an implementation manner of the first aspect, the converting the segmentation mask into a corresponding top view includes: determining a region to be detected in the segmentation mask by using the parameters of the vehicle-mounted camera, wherein the region to be detected represents a near region of a lane where the vehicle is located; and converting the part of the segmentation mask in the area to be detected into the top view.
In many practical applications, only the lane where the host vehicle is located (corresponding to two lane lines) needs to be concerned, and in addition, the lane lines are usually regularly extended (for example, extended in a straight line), and it is sufficient to fit the lane line equation only by searching a part of the lane line pixel points. Therefore, based on these two considerations, it is possible to perform lane line detection only in the near region (corresponding to the region to be detected in the segmentation mask) of the lane in which the host vehicle is located, so as to save the amount of computation and hardly affect the lane line detection effect.
In one implementation form of the first aspect, the top view is a binary image. In the process of detecting the lane lines based on the top view, the categories of the lane lines may not be concerned, so that the use of the binary map as the top view may simplify the subsequent detection steps (e.g., statistical histogram).
In an implementation manner of the first aspect, processing the region-of-interest image by using a semantic segmentation network, obtaining a segmentation mask of a lane line in the image, and converting the segmentation mask into a corresponding top view includes: determining, from the region-of-interest image, a far-end partial image containing the far-end lane lines and a near-end partial image containing the near-end lane lines; respectively processing the far-end partial image and the near-end partial image by utilizing the semantic segmentation network to obtain a segmentation mask of the lane lines in the far-end partial image and a segmentation mask of the lane lines in the near-end partial image; and converting the two segmentation masks into the top view to which they jointly correspond.
In the region-of-interest image, the far-end lane lines occupy fewer pixel points and the near-end lane lines occupy more pixel points because of the perspective effect. If the region-of-interest image is processed directly by the semantic segmentation network, it is likely to be scaled (usually scaled down, since road images typically have a higher resolution) to the size required by the network at the input, so the far-end lane lines occupy only a few pixel points in the scaled image. The semantic segmentation network then has difficulty segmenting them effectively, and that part of the lane lines is difficult to detect subsequently.
In this implementation, before the lane lines are segmented by the semantic segmentation network, the region-of-interest image is not scaled as a whole; instead, the far-end partial image and the near-end partial image are scaled separately. Because the size of the far-end partial image is significantly smaller than that of the region-of-interest image, its reduction ratio is small (it may not need to be reduced at all), so the far-end lane lines occupy more pixel points in the scaled image. The semantic segmentation network can therefore segment them effectively, and the lane lines can be effectively detected subsequently.
In an implementation manner of the first aspect, the determining a set of lane line pixel points corresponding to different lane lines in the top view includes: counting the number of lane line pixel points at each abscissa in the top view to obtain a histogram; determining a positioning base point of the lane line in the top view according to the abscissa of the peak in the histogram; the positioning base points are located at the bottom of the top view, and each positioning base point corresponds to one lane line; and searching along the longitudinal direction of the top view by taking the positioning base point as an initial position to obtain a lane line pixel point set corresponding to different lane lines in the top view.
The peak in the histogram, that is, the position where the pixel points of the lane line are most densely distributed, is naturally most likely to be the position where the lane line is located, so that the positioning base point of the lane line can be quickly and accurately found through histogram statistics, and the subsequent search process for the pixel points of the lane line is simplified.
In an implementation manner of the first aspect, the counting the number of lane line pixel points at each abscissa in the top view to obtain a histogram includes: and counting the number of lane line pixel points at each abscissa in the area with the designated height at the bottom of the top view to obtain the histogram.
When computing the histogram, only a small area at the bottom of the top view needs to be used. On the one hand, since the positioning base points lie at the bottom of the top view, they have little to do with the distant lane lines, and searching within this small area is sufficient; this saves computation and yields a more accurate base point position. On the other hand, because this small area is closer to the camera, the lane lines are less inclined there, so the positions of the positioning base points can be determined more accurately.
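As a sketch of this statistic, the histogram can be computed by summing lane-line pixels column by column over the bottom strip of the top view. In the example below, the top view is assumed to be a binary numpy array, and the strip height is an assumed, configurable parameter rather than a value fixed by this application.

```python
import numpy as np

def column_histogram(top_view: np.ndarray, strip_height: int = 100) -> np.ndarray:
    """Count lane-line pixels at each abscissa in the bottom strip of a binary top view."""
    strip = top_view[-strip_height:, :]   # area of the designated height at the bottom
    return strip.sum(axis=0)              # number of lane-line pixels per column (abscissa)
```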
In one implementation manner of the first aspect, the determining a positioning base point of a lane line in the top view according to an abscissa of a peak in the histogram includes: if a known positioning base point exists, searching a peak in a preset range near the abscissa of the known positioning base point in the histogram, and determining the positioning base point according to the found abscissa of the peak; if no known positioning base point exists, searching a peak in the whole abscissa range of the histogram, and determining the positioning base point according to the found abscissa of the peak; wherein the known positioning base point is a positioning base point determined in a process of performing lane line detection on a preamble image of the road image.
The detection of the lane lines is likely to be a continuous process, in which road images are continuously collected and detected while the vehicle is driving. If the position of a positioning base point (referred to as a known positioning base point) has already been determined when performing lane line detection on a preceding image (for example, the previous frame or several previous frames) of the current road image, then, given the continuity of the vehicle motion and of the lane line itself, a positioning base point in the top view corresponding to the current road image, if it exists, will not deviate far from the known positioning base point. Searching near the abscissa of the known positioning base point in the histogram therefore allows the base point position to be determined more efficiently. However, if the position of the positioning base point could not be determined when performing lane line detection on the preceding image (for example, there is no lane line at all in the preceding image, or lane line detection starts from the current road image), the positioning base point of the lane line can only be searched for within the entire abscissa range of the histogram.
In one implementation manner of the first aspect, the searching for a peak in the histogram within a preset range near the abscissa of the known positioning base point, and determining the positioning base point according to the found abscissa of the peak, includes: searching for peaks within a preset range near the abscissas of two known positioning base points in the histogram; if two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1 - x2) < thresh_max, determining x1 and x2 as the abscissas of two positioning base points, wherein thresh_min is a preset minimum width of a lane line, thresh_max is a preset maximum width of the lane line, abs denotes the absolute value, and the two positioning base points correspond to the two lane lines of the lane where the vehicle is located.
In the above implementation, if the abscissas of the two peaks satisfy the condition thresh_min < abs(x1 - x2) < thresh_max, it indicates that x1 and x2 respectively correspond to the positions of the two lane lines of the lane in which the host vehicle is located.
In one implementation manner of the first aspect, the searching for a peak in the entire abscissa range of the histogram and determining the positioning base point according to the found abscissa of the peak includes: searching for peaks within the abscissa ranges [0, L/2] and [L/2, L] of the histogram, respectively, wherein L is the maximum value of the abscissa of the histogram; if two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1 - x2) < thresh_max, determining x1 and x2 as the abscissas of two positioning base points; if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2 × thresh_min < abs(x1 - x2) < 2 × thresh_max and abs(x1 - L/2) < abs(x2 - L/2), determining x1 and (x1 + x2)/2 as the abscissas of two positioning base points; if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2 × thresh_min < abs(x1 - x2) < 2 × thresh_max and abs(x1 - L/2) > abs(x2 - L/2), determining x2 and (x1 + x2)/2 as the abscissas of two positioning base points; wherein thresh_min is a preset minimum width of a lane line, thresh_max is a preset maximum width of the lane line, abs denotes the absolute value, and the two positioning base points correspond to the two lane lines of the lane where the vehicle is located.
In this implementation, if the abscissas of the two peaks satisfy the condition thresh_min < abs(x1 - x2) < thresh_max, there are two lane lines in the top view, and x1 and x2 respectively correspond to the positions of the two lane lines of the lane where the host vehicle is located. If the abscissas of the two peaks satisfy the condition 2 × thresh_min < abs(x1 - x2) < 2 × thresh_max, there are three lane lines in the top view (even though the size of the region to be detected is chosen, when converting to the top view, to contain only two lane lines, three lane lines may still appear because lanes of different types are not of equal width), and the positioning base points corresponding to the two lane lines of the lane where the host vehicle is located should be further calculated.
In that case, if abs(x1 - L/2) < abs(x2 - L/2), the vehicle is in the left lane (the three lane lines together form two lanes), and x1 and (x1 + x2)/2 respectively correspond to the positions of the two lane lines of the lane where the vehicle is located (the two leftmost of the three lane lines); if abs(x1 - L/2) > abs(x2 - L/2), the vehicle is in the right lane, and x2 and (x1 + x2)/2 respectively correspond to the positions of the two lane lines of the lane where the vehicle is located (the two rightmost of the three lane lines).
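A minimal sketch of this peak-to-base-point logic is given below, assuming a column histogram `hist` (as in the earlier sketch) and preset lane-width bounds thresh_min and thresh_max in pixels; the search half-width around a known base point is an additional assumption, not a value given in this application.

```python
import numpy as np

def locate_base_points(hist, thresh_min, thresh_max, known=None):
    L = len(hist) - 1
    if known is not None:
        # Search only within a window around each known base point from the preceding frame.
        win = thresh_min                                    # assumed search half-width
        peaks = []
        for kx in known:
            lo = max(kx - win, 0)
            peaks.append(lo + int(np.argmax(hist[lo:kx + win])))
        x1, x2 = peaks
        return (x1, x2) if thresh_min < abs(x1 - x2) < thresh_max else None
    # No known base points: search the left and right halves of the whole histogram.
    x1 = int(np.argmax(hist[: L // 2]))
    x2 = L // 2 + int(np.argmax(hist[L // 2:]))
    if thresh_min < abs(x1 - x2) < thresh_max:              # two lane lines in view
        return x1, x2
    if 2 * thresh_min < abs(x1 - x2) < 2 * thresh_max:      # three lane lines in view
        mid = (x1 + x2) // 2
        # Keep the peak closer to the image centre together with the interpolated middle line.
        return (x1, mid) if abs(x1 - L / 2) < abs(x2 - L / 2) else (x2, mid)
    return None
```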
In an implementation manner of the first aspect, the searching along a longitudinal direction of the top view with the positioning base point as a starting position to obtain a set of lane line pixel points corresponding to different lane lines in the top view includes: determining the initial position of the sliding frame according to the positioning base point; starting from the initial position, moving the sliding frame to the top of the top view to search until the position of the sliding frame reaches the top of the top view, and obtaining the lane line pixel point set corresponding to the positioning base point; and after the sliding frame moves to a position, judging whether the sliding frame is effective at the position, and if the sliding frame is effective, adding the lane line pixel points in the sliding frame into the lane line pixel point set corresponding to the positioning base point.
The above implementation mode may be called sliding frame search, starting from each positioning base point, by continuously moving the position of the sliding frame, the pixel point set belonging to the lane line corresponding to the positioning base point is gradually searched, the search process is simple, and the position of the sliding frame can be flexibly adjusted in the search process, so that the search result is accurate.
In one implementation manner of the first aspect, the determining whether the slide frame is valid at the position includes: and judging whether the number of the lane line pixel points in the sliding frame at the position exceeds a first threshold value, if so, determining that the sliding frame is valid, and otherwise, determining that the sliding frame is invalid.
The number of lane line pixel points in the sliding frame is large (exceeding the first threshold), which indicates that the position of the sliding frame is matched with the position of the lane line, otherwise, the lane line pixel points in the sliding frame may be false detection noise or the sliding frame deviates from the position of the lane line.
In an implementation manner of the first aspect, if the sliding frame is valid, the method further includes: and calculating the mean value of the abscissa of the lane line pixel points in the sliding frame, and determining a new position to which the sliding frame is to be moved according to the calculation result.
And calculating the new position of the sliding frame according to the abscissa mean value of the lane line pixel points in the sliding frame, so that the sliding frame can move according to the extending trend of the lane line.
In one implementation form of the first aspect, the method further comprises: if the sliding frame is invalid, directly moving the sliding frame to a new position for continuous searching; wherein, the mode of moving the sliding frame comprises one of the following modes: keeping the abscissa of the sliding frame unchanged, and changing the ordinate of the sliding frame to enable the sliding frame to move towards the top of the top view; and determining the extension trend of the lane line corresponding to the positioning base point according to the searched lane line pixel points, and moving the sliding frame to the top of the top view according to the extension trend.
An invalid sliding frame can be a normal occurrence; for example, a break or a sharp curve in the lane line may cause the sliding frame to fail to match the lane line position. When the sliding frame is invalid, it should keep moving so as to try to find new lane line pixel points, rather than stopping the search. One way is to keep moving the sliding frame upwards; for example, if there is a break in the lane line, this quickly reaches the next lane line segment. Another way is to move the sliding frame along the extension trend of the lane line, which, owing to the continuity of the lane line, largely ensures that the sliding frame keeps following the lane line; for example, if there is a sharp curve in the lane line, this quickly finds the new lane line position.
In one implementation form of the first aspect, the method further comprises: after the search is finished, judging whether the total number of valid sliding frames obtained in the search process exceeds a second threshold, and if so, accepting the obtained lane line pixel point set, otherwise not accepting it.
If the number of valid sliding frames is large (exceeds the second threshold), the obtained lane line pixel point set is highly reliable and should be accepted; otherwise it should not be. This judgment improves the accuracy of lane line detection and avoids false detections.
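The following is a minimal sketch of the sliding frame search and the validity checks described above, for a binary top view; the window count, window half-width and the two thresholds are assumed hyper-parameters rather than values given in this application.

```python
import numpy as np

def sliding_frame_search(top_view, base_x, n_windows=20, half_width=40,
                         min_pixels=50, min_valid_windows=5):
    """Collect the pixel set of one lane line, starting from its positioning base point."""
    H, W = top_view.shape
    win_h = H // n_windows
    ys, xs = np.nonzero(top_view)              # all lane-line pixel coordinates
    lane_idx, valid_count, cur_x = [], 0, base_x
    for i in range(n_windows):                 # move from the bottom towards the top
        y_hi = H - i * win_h
        y_lo = y_hi - win_h
        inside = np.where((ys >= y_lo) & (ys < y_hi) &
                          (xs >= cur_x - half_width) & (xs < cur_x + half_width))[0]
        if len(inside) > min_pixels:           # the frame is "valid" at this position
            lane_idx.extend(inside)
            valid_count += 1
            cur_x = int(xs[inside].mean())     # recentre on the mean abscissa
        # if invalid, keep the abscissa and simply continue upwards
    if valid_count <= min_valid_windows:       # too few valid frames: reject the set
        return None
    return np.column_stack((xs[lane_idx], ys[lane_idx]))
```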
In a second aspect, an embodiment of the present application provides a ranging method, including: acquiring a road image acquired by a vehicle-mounted camera; obtaining a lane line equation of a lane where the vehicle is located in the road image by using the lane line detection method provided by the first aspect or any one of the possible implementation manners of the first aspect; calculating a first pixel width according to the lane line equation, wherein the first pixel width is the pixel width of a lane where the vehicle is located at the blind area boundary of the vehicle-mounted camera in the road image; calculating a second pixel width according to the lane line equation, wherein the second pixel width refers to the pixel width of a lane where the vehicle is located in the road image at the position of the target to be detected; calculating a first ratio between the first pixel width and the second pixel width; and substituting the first ratio into a distance measurement relation equation of the first ratio and the distance value, and calculating to obtain the distance value between the target to be measured and the vehicle.
In the prior art, when distance measurement is performed based on a road image, the distance between a vehicle ahead and the host vehicle is generally calculated from the width of the lower edge of that vehicle's target detection frame. This approach assumes that all vehicles are of equal width, but the assumption does not hold in practice: the width of the lower edge of the target detection frame may represent different actual widths rather than a fixed value, which results in low accuracy of the distance measurement result.
The ranging method provided by the application performs ranging based on the assumption of equal lane width. The general principle is as follows: the lane at the blind-zone boundary and the lane at the target to be measured have equal physical width, but, because of the perspective effect, their pixel widths in the road image differ, namely the first pixel width and the second pixel width. The first ratio between the first pixel width and the second pixel width changes with the distance between the target to be measured and the host vehicle, and the relation between the two quantities satisfies an equation (the ranging relation equation), so the ranging result is obtained by substituting the first ratio into that equation.
The assumption of equal lane width is established in most cases, so the distance measuring method is very reliable in basis and high in distance measuring precision. Moreover, the distance measurement relation equation in the method is established between the first ratio and the actual distance, which is equivalent to eliminating the influence of the actual width of the lane on the distance measurement result (for example, if the relation equation is established between the second pixel width and the distance measurement result, the actual width of the lane has influence on the distance measurement result), so that the method can be applied to the lane with any width, and the distance measurement process is simple and efficient.
In one implementation manner of the second aspect, the calculating the first pixel width according to the lane line equation includes: determining the vertical coordinate of the bottom of the road image as the vertical coordinate of the blind area boundary; and calculating to obtain the first pixel width according to the longitudinal coordinate of the blind area boundary and the lane line equation.
The above implementation gives one possible way of calculating the first pixel width.
In one implementation manner of the second aspect, the calculating the second pixel width according to the lane line equation includes: detecting a target in the road image; determining a target located in a lane where the vehicle is located in the detected targets as the target to be detected according to the detection frame of the target and the lane line equation; determining the vertical coordinate of the target to be detected in the road image according to the detection frame of the target to be detected; and calculating to obtain the second pixel width according to the ordinate of the target to be detected and the lane line equation.
The above implementation gives one possible way of calculating the second pixel width. It should be noted that although the distance measurement method can also measure the distance of the target to be measured in the lane other than the vehicle, the distance measurement precision may be reduced due to the perspective effect, and the distance measurement result is more accurate for the target to be measured in the lane in which the vehicle is located.
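Putting the above together, a minimal sketch of the ranging computation is given below. It assumes the two lane lines of the ego lane have been fitted as straight lines x = a*y + b in road-image coordinates, that the blind-zone boundary is taken as the bottom row of the image, and that the ranging relation equation is a polynomial in the first ratio whose coefficients come from calibration (see the calibration sketch further below); the ratio convention (blind-zone width divided by width at the target) is likewise an assumption.

```python
import numpy as np

def lane_pixel_width(left_fit, right_fit, y):
    """Pixel width of the ego lane at image row y, from two line fits x = a*y + b."""
    return abs(np.polyval(right_fit, y) - np.polyval(left_fit, y))

def estimate_distance(left_fit, right_fit, image_height, target_y, relation_coeffs):
    w_blind = lane_pixel_width(left_fit, right_fit, image_height - 1)  # first pixel width (blind-zone boundary)
    w_target = lane_pixel_width(left_fit, right_fit, target_y)         # second pixel width (target row)
    ratio = w_blind / w_target                                         # first ratio
    return np.polyval(relation_coeffs, ratio)                          # ranging relation equation
```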
In one implementation form of the second aspect, the method further comprises: acquiring a calibration road image acquired by the vehicle-mounted camera; obtaining a third pixel width, wherein the third pixel width is the pixel width of a lane where the vehicle is located in the blind area boundary of the vehicle-mounted camera measured in the calibrated road image; obtaining a plurality of fourth pixel widths, wherein the fourth pixel widths are pixel widths of lanes where the vehicle is located and at a preset distance from the vehicle, which are measured in the calibrated road image; wherein, each fourth pixel width corresponds to a different preset distance; calculating a plurality of second ratios between the plurality of fourth pixel widths and the third pixel widths, and forming a plurality of data points consisting of the corresponding preset distances and the second ratios; and solving the parameters of the ranging relation equation according to the plurality of data points.
The above implementation gives a process of solving the parameters of the ranging relation equation, which is also referred to as a calibration process. The number of data points used for calibration is related to the number of parameters to be solved, e.g. if the quadratic curve equation has three parameters, three data points need to be used.
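A sketch of this calibration step under the same assumptions as the ranging sketch above: the third and fourth pixel widths are measured in a calibration road image, ratios are formed against the blind-zone width, and np.polyfit solves for the coefficients of an assumed quadratic ranging relation equation. The numbers in the usage example are placeholders, not calibration data from this application.

```python
import numpy as np

def calibrate_relation(blind_zone_width_px, widths_px_at_known_distances):
    """widths_px_at_known_distances: {distance_in_metres: lane pixel width measured at that distance}."""
    distances = np.array(sorted(widths_px_at_known_distances))
    ratios = np.array([blind_zone_width_px / widths_px_at_known_distances[d] for d in distances])
    # A quadratic relation has three parameters, so at least three data points are needed.
    return np.polyfit(ratios, distances, deg=2)

# Example with made-up measurements (placeholder values):
coeffs = calibrate_relation(400.0, {10.0: 200.0, 20.0: 133.0, 40.0: 80.0})
```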
In one implementation manner of the second aspect, the lane line equation is a linear equation, and the distance measurement relation equation is a linear equation or a quadratic curve equation.
The inventor researches and finds that the distance measuring method performs well on a scene with a straight lane line, and the distance measuring relation equation has enough distance measuring precision and small calculation amount when a straight line equation or a quadratic curve equation is adopted.
In a third aspect, an embodiment of the present application provides a lane line detection apparatus, including: a region-of-interest extraction module, used for determining a region-of-interest image containing a lane line from a road image acquired by the vehicle-mounted camera; a lane line segmentation module, used for processing the region-of-interest image by utilizing a semantic segmentation network to obtain a segmentation mask of a lane line in the image and converting the segmentation mask into a corresponding top view; a pixel point detection module, used for determining a set of lane line pixel points corresponding to different lane lines in the top view and converting coordinates of the lane line pixel points in the set from coordinates in the top view into coordinates in the road image or coordinates in a world coordinate system; and a lane line fitting module, used for fitting a lane line equation from the set of coordinate-transformed lane line pixel points.
In a fourth aspect, an embodiment of the present application provides a distance measuring device, including: the image acquisition module is used for acquiring a road image acquired by the vehicle-mounted camera; a lane line detection module, configured to obtain a lane line equation of a lane where a vehicle is located in the road image by using a lane line detection method provided in the first aspect or any one of possible implementation manners of the first aspect; the first width calculation module is used for calculating a first pixel width according to the lane line equation, wherein the first pixel width is the pixel width of a lane where the vehicle is located in the blind area boundary of the vehicle-mounted camera in the road image; the second width calculation module is used for calculating a second pixel width according to the lane line equation, wherein the second pixel width refers to the pixel width of a lane where the vehicle is located in the road image at the position of the target to be detected; a ratio calculation module for calculating a first ratio between the first pixel width and the second pixel width; and the distance calculation module is used for substituting the first ratio into a distance measurement relation equation of the first ratio and the distance value to calculate and obtain the distance value between the target to be measured and the vehicle.
In a fifth aspect, the present application provides a computer-readable storage medium, where computer program instructions are stored, and when the computer program instructions are read and executed by a processor, the computer program instructions perform the method provided in any one of the possible implementation manners of the first aspect, the second aspect, or both.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a memory in which computer program instructions are stored, and a processor, wherein the computer program instructions, when read and executed by the processor, perform the method provided by any one of the possible implementations of the first aspect, the second aspect, or both.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 illustrates a flow of a lane line detection method provided in an embodiment of the present application;
FIG. 2 illustrates a region of interest image provided by an embodiment of the present application;
FIG. 3 illustrates a segmentation mask provided by an embodiment of the present application;
FIG. 4 illustrates a semantic segmentation network provided by an embodiment of the present application;
FIG. 5 illustrates a region to be detected in a segmented mask provided by an embodiment of the present application;
FIG. 6 is a top view of a split mask provided in an embodiment of the present application;
FIG. 7 illustrates a distal portion image and a proximal portion image provided by an embodiment of the present application;
FIG. 8 illustrates a top view based on a segmentation mask corresponding to the distal portion image and a segmentation mask corresponding to the proximal portion image;
FIG. 9 illustrates regions for histogram statistics in a top view provided by an embodiment of the present application;
FIG. 10 illustrates a histogram provided by an embodiment of the present application;
FIG. 11 illustrates a slider search result provided by an embodiment of the present application;
FIG. 12 shows lane line fitting results provided by embodiments of the present application;
fig. 13 illustrates an operation principle of a ranging method provided in an embodiment of the present application;
fig. 14 illustrates a flow of a ranging method provided in an embodiment of the present application;
fig. 15 illustrates a ranging process of a ranging method provided in an embodiment of the present application;
FIG. 16 illustrates a calibration procedure of a ranging method provided by an embodiment of the present application;
fig. 17 shows a structure of a lane line detection apparatus according to an embodiment of the present application;
fig. 18 shows a structure of a distance measuring device according to an embodiment of the present application;
fig. 19 shows a structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that like reference numbers and letters refer to like items in the following figures; thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element defined by the phrase "comprising" does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element. The terms "first," "second," "third," and the like are used solely to distinguish one item from another and are not to be construed as indicating or implying relative importance.
Fig. 1 shows a flow of a lane line detection method provided in an embodiment of the present application. The method may be, but is not limited to being, performed by an electronic device, the structure of which is described in detail below with respect to fig. 19. Referring to fig. 1, the method includes:
step S110: and determining an interested area image containing the lane line from the road image acquired by the vehicle-mounted camera.
The vehicle-mounted camera in step S110 is a camera mounted on the vehicle, and the specific mounting position is not limited, for example, the vehicle head and the vehicle tail may be used, and hereinafter, the case where the vehicle-mounted camera is mounted on the vehicle head is mainly taken as an example, and the road image is an image in front of the vehicle. Hereinafter, the concept of "own vehicle" is sometimes referred to, and "own vehicle" and a road image have a correspondence relationship, and for a specific road image, the "own vehicle" refers to a vehicle where a camera which captures the image is located.
The lane line detection method provided by the embodiment of the application has multiple application modes: for example, a road image can be acquired by a vehicle-mounted camera in real time and detected by vehicle-mounted equipment in real time; for another example, a data set consisting of road images captured by a vehicle-mounted camera may be collected, detected on a PC or server, and so forth.
The region of interest in step S110 is a region of the road image that contains the lane lines, and the region-of-interest image is the portion of the road image located within the region of interest; obviously, once the region of interest is determined, the region-of-interest image can easily be cropped from the road image. Note that "containing" here should be understood to mean that if there is a lane line in the road image, the lane line will appear in the region of interest; it should not be understood to mean that the road image necessarily has a lane line in the region of interest (because some roads have no lane lines).
Since the road is always at the bottom of the road image, in a relatively simple implementation a fixed area at the bottom of the road image (e.g., the bottom third of the full image) may be directly taken as the region of interest. Alternatively, since the mounting position, angle, and the like of the vehicle-mounted camera are generally fixed, the vanishing point in the road image can be calculated and the area below the vanishing point taken as the region of interest, and so on.
Referring to fig. 2, the area below the white horizontal line is an area of interest, and the image corresponding to the area is an area of interest image, which is easy to see that the area of interest image includes the lane line to be detected.
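A minimal sketch of the simpler strategy just mentioned, taking a fixed bottom fraction of the road image as the region of interest; the fraction of one third is the example value from the text and is configurable.

```python
def extract_roi(road_image, bottom_fraction=1 / 3):
    """Return the bottom part of the road image (a numpy array) as the region-of-interest image."""
    h = road_image.shape[0]
    return road_image[int(h * (1 - bottom_fraction)):, :]
```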
Step S120: and processing the image of the region of interest by utilizing a semantic segmentation network, obtaining a segmentation mask of the lane line in the image, and converting the segmentation mask into a corresponding top view.
The semantic segmentation network is a trained convolutional neural network, and the network takes an image of a region of interest or an image of the region of interest after preprocessing (such as scaling, normalization and the like) as an input and outputs a segmentation mask of a lane line in a road image.
The division mask may be regarded as a representation of a lane line division result, and the division mask may include category information of the lane line and position information of the lane line for each category. In the scheme of the application, the segmentation mask has pixel-level precision, namely the resolution is the same as the input image of the semantic segmentation network, and each pixel point in the segmentation mask indicates the segmentation result of the pixel point at the corresponding position in the input image. The inventor finds that the high-precision lane line segmentation is beneficial to improving the lane line detection precision in the subsequent steps.
The lane line has various classification modes, for example:
if the classification is 11 classes, the categories may be:
[ background (non-lane line), white solid line, solid-yellow line, solid-white line, solid-yellow line, dotted-white line, dotted-yellow line ]
If the classification is 6 classes, the categories may be:
[ background, Single solid line, Single dashed line, double solid line, double dashed line, solid dashed line ]
If classified into 3 categories, the categories may be:
[ background, Single line, double line ]
If classified into 2 categories, the categories may be:
[ background, lane line ]
The segmentation mask may have different forms according to different designs:
for example, fig. 3 shows a lane line segmentation mask, which is a gray scale map, in which pixel points represent different types of lane lines by different gray scale values, for example, black represents a background, white represents a single dotted line, gray represents a single solid line, and the positions of regions with different gray scale values in the segmentation mask represent the positions of lane lines with different types.
For another example, the segmentation mask of the lane lines may be an RGB map, where the pixels in the map represent different RGB values to represent different lane line categories, for example, black represents the background, green represents a single dotted line, blue represents a single solid line, and the positions of the regions with different RGB values in the segmentation mask represent the positions of the lane lines in different categories.
For another example, the segmentation mask of the lane lines may be a multi-value graph, in which pixel points take different enumerated values to represent different lane line categories, for example, 0 represents a background, 1 represents a single dotted line, 2 represents a single solid line, and the positions of regions with different enumerated values in the segmentation mask represent the positions of lane lines with different categories.
In the above three forms of the division mask, the RGB map is relatively suitable for external display, so that the user can visually see the lane line division result, and even if the division mask is in the form of a gray scale map or a multi-value map, the division mask can be converted into the RGB map and then displayed externally. It should be understood that the implementation form of the division mask is not limited to the above three forms.
The specific structure of the semantic segmentation network is not limited, and for example, the architectures such as FCN, SegNet, UNet and the like in the prior art can be adopted, and the present application also proposes a new semantic segmentation network architecture, which is specifically described below, and the network structure diagram in fig. 4 is mainly used as a reference during the description:
the semantic segmentation network comprises a backbone network and a feature fusion network connected with the backbone network, wherein the backbone network is used for extracting multi-scale features of an image of an interested area, the feature fusion network fuses the multi-scale features extracted by the backbone network and outputs segmentation results aiming at different lane line categories, and certainly, the segmentation results may have certain differences with segmentation masks in form and need to be further converted into the segmentation masks. In the semantic segmentation network, the segmentation mask is obtained by extracting and fusing the multi-scale features, and the fused features have strong expression capability, so that the lane line segmentation precision is improved. For example, in fig. 4, the boxes in the leftmost column represent the backbone network, and the boxes in the right two columns represent the feature fusion network.
When designing the semantic segmentation network, the network structure can be designed according to the above-mentioned functional description of the backbone network and the feature fusion network.
For example, in one design, the backbone network may further include a plurality of bottleneck modules, and some of the features extracted by these bottleneck modules may be selected as the multi-scale features. The bottleneck module is a convolution module in MobileNet (e.g., MobileNetV2); its internal structure can be found in the MobileNet literature and is omitted here. Because MobileNet is a lightweight network that makes extensive use of depthwise separable convolution, implementing the backbone network with MobileNet bottleneck modules reduces the computation of the semantic segmentation network while meeting the segmentation accuracy requirement, thereby improving the real-time performance of lane line segmentation.
Referring to fig. 4, the backbone network in fig. 4 includes 1 convolutional layer (Conv2d) and 7 bottleneck modules connected in series (the numbers in the boxes of the backbone network represent the feature size input to the current module; the three numbers represent the width, height and number of channels, respectively). The features output by the 4th, 6th and 7th bottleneck modules (with scales 52 × 28, 26 × 14 and 13 × 7, i.e., the results of downsampling the input image by 8×, 16× and 32×) are selected as the output of the backbone network; which bottleneck module outputs to select as the backbone output can be determined by experiment.
In one design, the feature fusion network convolves the feature of each scale output by the backbone network and adds it to the fusion feature of the same scale, and the addition result is upsampled by deconvolution to obtain the fusion feature of the previous (larger) scale, which can then be fused with the feature of the previous scale output by the backbone network. There are two special cases: the minimum-scale feature output by the backbone network is, after convolution, directly used as the addition result (because there is no fusion feature at a smaller scale), and the fusion feature of the previous scale calculated from the maximum-scale feature of the backbone network is directly used as the segmentation result.
Referring to fig. 4, as mentioned above, the 4th, 6th and 7th bottleneck modules of the backbone network output features with scales 52 × 28, 26 × 14 and 13 × 7, respectively, where 52 × 28 is the previous scale of 26 × 14, 26 × 14 is the previous scale of 13 × 7, 52 × 28 is the maximum scale, and 13 × 7 is the minimum scale.
The feature with scale 13 × 7 was processed by convolution layer (conv of 13 × 7 × 11) and directly used as the additive result, and after the additive result was up-sampled by deconvolution layer (Deconv of 26 × 14 × 11), the scale of the resulting fused feature was 26 × 14, which was the same as the feature of the previous scale. The feature with the scale of 26 × 14 is processed by a convolution layer (conv of 26 × 14 × 11), then added with the fusion feature with the scale of 26 × 14 pixel by pixel, and the addition result is up-sampled by a deconvolution layer (Deconv of 52 × 28 × 11), so that the scale of the obtained fusion feature is 52 × 28 and is the same as the feature of the previous scale. The feature with the scale of 52 × 28 is processed by the convolution layer (conv of 52 × 28 × 11), and then added with the fusion feature with the scale of 52 × 28 pixel by pixel, and after the addition result is up-sampled by the deconvolution layer (Deconv of 416 × 224), the obtained fusion feature has the scale of 416 × 224, and has the same resolution as the input image of the semantic segmentation network, and the fusion feature is also the segmentation result of the semantic segmentation network. It should be noted that the numbers in the boxes of the feature fusion network represent the feature sizes of the output current modules, which is different from the meaning of the numbers in the backbone network.
It can be seen that, in the above feature fusion network, the features are up-sampled and additively fused step by step according to the sequence of the feature scale from small to large, and since the features of different scales contain different semantic information and correspond to different receptive fields, the expression capability of the features can be significantly improved through feature fusion, and then the lane line segmentation accuracy is improved.
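For illustration, a sketch of the fusion head just described is given below in PyTorch; the choice of framework is an assumption, not something specified by this application. The 1 × 1 lateral convolutions, the backbone channel counts and the deconvolution kernel sizes are likewise assumptions; only the scales (13 × 7 → 26 × 14 → 52 × 28 → 416 × 224), the 11 output channels, and the add-then-deconvolve structure follow fig. 4.

```python
import torch
import torch.nn as nn

class LaneFusionHead(nn.Module):
    def __init__(self, c3, c4, c5, num_classes=11):
        super().__init__()
        self.lat3 = nn.Conv2d(c3, num_classes, kernel_size=1)   # 52 x 28 branch
        self.lat4 = nn.Conv2d(c4, num_classes, kernel_size=1)   # 26 x 14 branch
        self.lat5 = nn.Conv2d(c5, num_classes, kernel_size=1)   # 13 x 7  branch
        self.up5 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2)  # 13x7  -> 26x14
        self.up4 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2)  # 26x14 -> 52x28
        self.up3 = nn.ConvTranspose2d(num_classes, num_classes, 8, stride=8)  # 52x28 -> 416x224

    def forward(self, f3, f4, f5):
        p5 = self.lat5(f5)                  # smallest scale: conv output used directly as the sum
        p4 = self.lat4(f4) + self.up5(p5)   # add, then upsample by deconvolution
        p3 = self.lat3(f3) + self.up4(p4)
        return self.up3(p3)                 # 416 x 224 x 11 segmentation result

# Example with dummy feature maps (channel counts are placeholders):
head = LaneFusionHead(c3=32, c4=96, c5=320)
out = head(torch.randn(1, 32, 28, 52), torch.randn(1, 96, 14, 26), torch.randn(1, 320, 7, 13))
print(out.shape)  # torch.Size([1, 11, 224, 416])
```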
Further, in fig. 4, the segmentation result has 11 channels (416 × 224 × 11) representing 11 lane line categories, wherein each channel is used to indicate a segmentation status of a lane line. For example, the pixel value in the kth channel (k is an integer from 1 to 11) may be a confidence level, which represents the probability that the corresponding pixel belongs to the kth lane line category, and which pixel belongs to the kth lane line category may be determined by setting a threshold for the kth channel. However, according to the foregoing, the segmentation mask may be only a single-channel image (grayscale image, multi-value image) or a three-channel image (RGB image), and therefore, after obtaining the segmentation result, the segmentation result needs to be converted into the segmentation mask.
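A small sketch of this conversion, assuming the 11-channel segmentation result is given as an array of shape (11, H, W): a per-pixel argmax (or the per-channel thresholding mentioned above) yields a single-channel multi-value mask in which 0 denotes background.

```python
import numpy as np

def result_to_mask(logits: np.ndarray) -> np.ndarray:
    """Convert an (11, H, W) segmentation result into a single-channel multi-value mask."""
    return logits.argmax(axis=0).astype(np.uint8)
```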
Although it is possible to directly detect the lane line based on the division mask, in the embodiment of the present application, the lane line detection is not directly performed based on the division mask, but the division mask is first converted into a corresponding top view, and then the lane line detection is performed based on the top view.
The reason for this is that the lane line detection process includes a step of searching for lane line pixel points (step S130), and the search result directly determines the accuracy of the lane line fitting (step S140; the fitted lane line equation is the lane line detection result). In the segmentation mask, however, originally parallel lane lines are likely to converge in the distance because of the perspective effect, so certain algorithms for searching for lane line pixel points (for example, the sliding frame algorithm mentioned later) have difficulty distinguishing pixel points belonging to different lane lines near the convergence position, making the lane line detection result inaccurate. After conversion into a top view, the perspective effect present in the segmentation mask is eliminated and the parallel relation between different lane lines is preserved, so the algorithm that searches for lane line pixel points obtains a more accurate result and the accuracy of lane line detection is improved. The conversion to the top view may employ an inverse perspective transformation.
In converting the top view, the entire division mask may be converted into the top view. However, the inventor has found that, on the one hand, in many applications regarding lane lines, only the lane in which the host vehicle is located (corresponding to two lane lines) or mainly the lane, on the other hand, the lane lines are usually regularly extended (for example, extended in a straight line), and it is sufficient to fit the lane line equation in step S140 even if only a part of the lane line pixel points are searched in step S130. Therefore, based on these two considerations, in some implementations, an area to be detected may be determined from the segmentation mask by using internal and external parameters (which are calibrated in advance) of the vehicle-mounted camera, and only a portion of the segmentation mask located in the area to be detected may be converted into a top view. The region to be detected represents a near region of the lane where the vehicle is located, for example, rectangular regions with 4 meters left and right and 100 meters vertical of the camera. Because the lane line detection is not required to be carried out based on the whole mask image, the method can save the calculation amount and hardly influences the lane line detection effect.
Because of the perspective effect, the region to be detected may appear as a trapezoid region in the segmentation mask, as shown in fig. 5, and the top view obtained by converting the region to be detected is shown in fig. 6, it is easily seen that in fig. 6, the two lane lines are approximately parallel.
Further, in some implementations, when the segmentation mask is converted into the top view, it may also be converted into a binary map (if the segmentation mask is not originally a binary map), for example, in the binary map, a background may be represented by a pixel value of 0, and a lane line may be represented by a pixel value of 1 (without distinguishing which lane line is). The motivation is that in the process of detecting lane lines based on the top view (see the following steps in detail), the lane line category obtained by the semantic segmentation network may not be used, and thus the subsequent detection steps (for example, statistical histograms or the like) may be simplified by using the binary map as the top view. Referring to fig. 6, fig. 6 shows a visualization result of a binary map (0 is mapped to black, and 1 is mapped to white), and although one of the two lane lines is a single dotted line and the other is a single solid line, both lane lines are shown in white in fig. 6 and are not distinguished.
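As an illustrative, non-authoritative sketch of the conversion described above, the following Python snippet warps the region to be detected into a binary top view using OpenCV's perspective transform; the trapezoid corner coordinates, the top-view size and the function names are placeholder assumptions that would in practice be derived from the calibrated internal and external parameters of the vehicle-mounted camera.

```python
import cv2
import numpy as np

# Corner points of the trapezoidal region to be detected in the segmentation mask
# (pixel coordinates) and of the rectangular top view; both are placeholder values.
SRC_TRAPEZOID = np.float32([[150, 120], [266, 120], [400, 224], [16, 224]])
TOP_VIEW_SIZE = (80, 180)  # (width, height) of the top view in pixels
DST_RECT = np.float32([[0, 0], [80, 0], [80, 180], [0, 180]])

def mask_to_top_view(mask: np.ndarray) -> np.ndarray:
    """Warp the region to be detected into a binary top view (0 = background, 1 = lane line)."""
    binary = (mask > 0).astype(np.uint8)  # drop lane line categories, keep lane vs. background
    M = cv2.getPerspectiveTransform(SRC_TRAPEZOID, DST_RECT)
    return cv2.warpPerspective(binary, M, TOP_VIEW_SIZE, flags=cv2.INTER_NEAREST)
```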
As mentioned above, the region-of-interest image may be scaled before being input into the semantic segmentation network. With the improvement of camera performance, it is a common situation that the road image has a relatively high resolution while the input resolution of the semantic segmentation network is relatively low, that is, the region-of-interest image is reduced by a relatively large proportion and then input into the semantic segmentation network for processing.
In such a case, the inventors found the following problem: in the region-of-interest image, due to the perspective effect, the far-end lane lines occupy fewer pixel points and the near-end lane lines occupy more pixel points, so that in the input image generated after the region-of-interest image is reduced, the far-end lane lines occupy only very few pixel points. As a result, the semantic segmentation network has difficulty effectively segmenting this part of the lane lines, and this part of the lane lines is difficult to detect effectively in the subsequent steps.
In some implementations, to improve this problem, the following may be done:
first, a far-end partial image containing the far-end lane lines and a near-end partial image containing the near-end lane lines are determined from the region-of-interest image. For example, the far-end partial image and the near-end partial image may be cut from the region-of-interest image according to the position of the vanishing point and a preset proportion. Referring to fig. 7, two black boxes are shown in fig. 7, the upper one representing the far-end partial image and the lower one representing the near-end partial image; due to the perspective effect, the near-end partial image may be sized larger than the far-end partial image so that it contains a complete lane line.
Then, the far-end partial image and the near-end partial image are respectively processed by utilizing the semantic segmentation network to obtain a segmentation mask of the lane lines in the far-end partial image and a segmentation mask of the lane lines in the near-end partial image. The manner in which the segmentation masks are computed is similar to that described above for a single input image, and the resulting two segmentation masks are also similar to that of fig. 3, which will not be repeated here.
Finally, the segmentation mask of the lane lines in the far-end partial image and the segmentation mask of the lane lines in the near-end partial image are converted into a top view corresponding to both; the method for converting into the top view has been described above, the only difference being that two segmentation masks need to be mapped into the same top view. Before the conversion, a first region to be detected and a second region to be detected may be determined from the two segmentation masks respectively, and only the contents of the segmentation masks within the two regions are converted into the top view; the first region to be detected and the second region to be detected together correspond to the aforementioned region to be detected (the trapezoid frame in fig. 5).
The far-end partial image and the near-end partial image also need to be scaled before being input into the semantic segmentation network so as to meet the requirement of the semantic segmentation network on the input resolution. However, since the far-end partial image is only a portion of the region-of-interest image and its size is smaller than that of the region-of-interest image, even if the far-end partial image is reduced, the reduction ratio is not as high as when the region-of-interest image is reduced directly; depending on the input resolution of the semantic segmentation network, the far-end partial image may not need to be reduced at all, or may even be enlarged. Therefore, in the scaled far-end partial image, the far-end lane lines can occupy more pixel points, so that the semantic segmentation network can segment the image effectively and the far-end lane lines can also be detected effectively in the subsequent steps.
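Purely as an illustration of the splitting described in this implementation, the sketch below crops a far-end and a near-end partial image around the vanishing point and segments each one separately; the crop proportions, the `vanishing_y` parameter and the `seg_net` callable are assumptions introduced only for the example.

```python
import cv2
import numpy as np

def split_and_segment(roi_image, vanishing_y, seg_net, input_size=(416, 224)):
    """Split the region-of-interest image into a far-end and a near-end partial image
    around the vanishing point and segment each one separately (illustrative sketch)."""
    h, w = roi_image.shape[:2]
    # far-end crop: a narrow band around the vanishing point, where lane lines are small
    far = roi_image[max(vanishing_y - h // 8, 0): vanishing_y + h // 8, w // 4: 3 * w // 4]
    # near-end crop: everything below the vanishing point
    near = roi_image[vanishing_y:, :]
    far_mask = seg_net(cv2.resize(far, input_size))
    near_mask = seg_net(cv2.resize(near, input_size))
    return far_mask, near_mask
```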
Referring to fig. 8, fig. 8 shows a top view obtained in these implementations (using the first region to be detected and the second region to be detected). It is easy to see that, compared with fig. 6, the number of lane line pixel points in fig. 8 is increased, especially the pixel points in the upper part of the top view which represent the far-end lane lines. Therefore, more lane line pixel points can be searched from the top view in the subsequent step, and a more accurate lane line equation can be obtained through fitting. In the following, for simplicity, the description is continued by taking the top view in fig. 6 as an example.
Step S130: determining a set of lane line pixel points corresponding to different lane lines in the top view, and converting the coordinates of the lane line pixel points in the set from coordinates in the top view into coordinates in the road image or coordinates in a world coordinate system.
In the top view, each lane line is composed of some pixels, these pixels are called lane line pixels, a set composed of these pixels is called a lane line pixel set, each lane line corresponds to its own lane line pixel set, and the main task of step S130 is to search out the lane line pixel set corresponding to each lane line in the top view. For example, in fig. 6, two sets of lane line pixels should be ideally searched, and the two sets of lane line pixels correspond to a single dotted line and a single solid line in the figure respectively.
In order to search for the set of lane line pixel points, a starting search position needs to be determined, and the search then proceeds from this starting position. In some implementations, the intersection point of a lane line and the bottom edge of the top view may be taken as the starting search position; this intersection point is called the positioning base point of the lane line, and obviously each lane line corresponds to one positioning base point. After the position of the positioning base point is determined, a certain search algorithm is used to search along the longitudinal direction of the top view, so that the set of lane line pixel points corresponding to the different lane lines in the top view can be obtained.
Of course, the position of the positioning base point in the top view is unknown and needs to be obtained through calculation. For example, the number of the lane line pixel points at each abscissa in the top view may be counted to obtain a histogram, and then the positioning base point of the lane line in the top view may be determined according to the abscissa of the peak in the histogram.
The pixels representing the lane lines and the pixels representing the background in the top view have different values (for example, when the top view is a binary map, the lane line pixels take the value 1 and the background pixels take the value 0), so this statistic is feasible. In the histogram obtained by the statistics, a peak necessarily corresponds to a position where the lane line pixel points are most densely distributed, which is naturally most likely to be the position where a lane line is located; therefore, the abscissa of the peak is used as the abscissa of the positioning base point (since the positioning base point is located at the bottom of the top view, its ordinate is known and does not need to be calculated), and the position of the positioning base point can be obtained quickly and accurately.
Optionally, when computing the histogram, only a region of specified height at the bottom of the top view may be taken for the statistics instead of the whole top view. On the one hand, since the positioning base point is defined as the intersection point between the lane line and the bottom edge of the top view, it has little to do with the lane line far away, so searching in this small region is sufficient; this not only saves computation but also yields a more accurate position of the positioning base point. On the other hand, since this small region is closer to the camera, the inclination of the lane line in it is smaller, which is also favorable for determining the position of the positioning base point more accurately.
For example, for the top view in fig. 6, assuming that the resolution is 80 × 180, when counting the number of lane line pixels at each abscissa, the counting may be performed only in the region corresponding to 20 meters near the bottom (20 meters may be converted into the corresponding pixel height, for example, 50, according to the internal and external parameters of the vehicle-mounted camera), and the region is marked by a white box in fig. 9. Fig. 10 shows the corresponding statistical results, where the horizontal axis x of the histogram in fig. 10 represents the horizontal coordinate of the top view, and the vertical axis m represents the number of the lane line pixel points, so that it is easy to see that the histogram includes two distinct peaks, which respectively correspond to the horizontal coordinates of the two positioning base points in fig. 6. In an alternative scheme, for the histogram obtained by direct statistics, filtering may be performed first, and then peaks are searched in the filtered histogram, and the filtering operation is performed such that only one peak is generated in the histogram for each lane line.
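A minimal sketch of this statistic, assuming a binary top view and an assumed bottom region height of 50 pixels (standing in for the roughly 20 meters mentioned above), is given below; the simple moving-average filter plays the role of the optional filtering so that each lane line ideally yields a single peak.

```python
import numpy as np

def column_histogram(top_view: np.ndarray, bottom_height: int = 50) -> np.ndarray:
    """Count lane-line pixels per abscissa over only the bottom rows of a binary
    top view, then smooth the counts so each lane line produces a single peak."""
    counts = top_view[-bottom_height:, :].sum(axis=0).astype(float)
    kernel = np.ones(5) / 5.0                     # simple moving-average filter
    return np.convolve(counts, kernel, mode="same")
```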
Different strategies can be adopted for searching peaks from the histogram, and the following description is continued:
detecting lane lines is likely to be a continuous process, for example, the vehicle-mounted camera continuously captures road images and lane lines are detected in them while the vehicle is traveling. A road image acquired before the current road image is called a preceding image of the current road image; the preceding images are usually also used for lane line detection, and the detection frequency may be preset, for example every frame or every three frames. For the most recent preceding image used for lane line detection, if the positions of the positioning base points were already determined during that detection (these positioning base points are called known positioning base points), then, considering the continuity of the vehicle motion and the continuity of the lane lines, a positioning base point of the lane line in the top view corresponding to the current road image, if it exists, will not deviate too far from the position of the known positioning base point. Thus, one alternative peak search strategy is (hereafter strategy A): searching for a peak in a preset range near the abscissa of the known positioning base point in the histogram, which can significantly improve the efficiency of determining the position of the positioning base point. The preset range may be 1 meter to the left and right of a known positioning base point (the 1 meter needs to be converted into a corresponding pixel width according to the internal and external parameters of the vehicle-mounted camera).
For a preceding image used for lane line detection, the position of a positioning base point may not have been determined during that detection (for example, there is no lane line in the preceding image); or lane line detection may start from the current road image, so that no detection was performed on the preceding images; or the current road image may be the first frame collected by the camera, so that there is no preceding image at all. All of these situations result in there being no known positioning base point. In this case, only another peak search strategy (hereinafter strategy B) can be adopted: searching for peaks over the full abscissa range of the histogram.
How to determine two positioning base points corresponding to the lane where the vehicle is located is further described below by combining strategies a and B:
if there are two known positioning base points (corresponding to the two lane lines of the lane where the host vehicle is located), then, according to the above explanation, strategy A should be adopted, that is, peaks are searched in the preset ranges near the abscissas of the two known positioning base points in the histogram, respectively. If two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1-x2) < thresh_max, then x1 and x2 are determined as the abscissas of the two positioning base points. Here thresh_min is a preset minimum lane width (e.g., the pixel width corresponding to 2.5 meters), thresh_max is a preset maximum lane width (e.g., the pixel width corresponding to 3.75 meters), and abs denotes taking the absolute value; the condition is used to check whether the distance abs(x1-x2) between x1 and x2 is reasonable, because if they are too close or too far apart they cannot represent a real lane.
If there is no known positioning base point, then, as explained above, strategy B should be employed, and peaks can be searched within the abscissa ranges [0, L/2] and [L/2, L] of the histogram respectively, where L is the maximum value of the abscissa of the histogram (e.g., 80), so this search is equivalent to a full-range search.
If two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1-x2) < thresh_max, x1 and x2 are determined as the abscissas of the two positioning base points;
if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2 × thresh_min < abs(x1-x2) < 2 × thresh_max and abs(x1-L/2) < abs(x2-L/2), x1 and (x1+x2)/2 are determined as the abscissas of the two positioning base points;
if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2 × thresh_min < abs(x1-x2) < 2 × thresh_max and abs(x1-L/2) > abs(x2-L/2), x2 and (x1+x2)/2 are determined as the abscissas of the two positioning base points.
Since lanes of different types have different widths, even if the size of the region to be detected is set with the intention of covering only two lane lines when converting to the top view, three lane lines may in fact be included. For example, the lane width may be between 2.5 meters and 3.75 meters; if the region to be detected extends 4 meters to the left and right of the vehicle-mounted camera, then for a lane 3.75 meters wide it basically contains only two lane lines, whereas for a lane 2.5 meters wide it is likely to contain three lane lines.
If the abscissas of the two peaks satisfy the condition thresh_min < abs(x1-x2) < thresh_max, this indicates that there are two lane lines in the top view, and x1 and x2 respectively correspond to the positions of the two lane lines of the lane where the host vehicle is located, so the positioning base points can be determined directly from x1 and x2.
If the abscissas of the two peaks satisfy the condition 2 × thresh_min < abs(x1-x2) < 2 × thresh_max, this indicates that there are three lane lines in the top view, and the positioning base points corresponding to the two lane lines of the lane where the host vehicle is located should be calculated further.
In this case, if abs(x1-L/2) < abs(x2-L/2), the host vehicle is in the lane to the left (the three lane lines together form two lanes), and x1 and (x1+x2)/2 respectively correspond to the positions of the two lane lines of the lane where the host vehicle is located (the two leftmost of the three lane lines), i.e., the abscissas of the two positioning base points to be determined; if abs(x1-L/2) > abs(x2-L/2), the host vehicle is in the lane to the right, and x2 and (x1+x2)/2 respectively correspond to the positions of the two lane lines of the lane where the host vehicle is located (the two rightmost of the three lane lines), i.e., the abscissas of the two positioning base points to be determined.
After the positioning base point of the lane line is determined, searching may be performed along the longitudinal direction of the top view from the positioning base point to obtain a set of lane line pixel points corresponding to different lane lines in the top view, and the specific searching manner is not limited, and a sliding frame searching algorithm is described below as an example.
For each positioning base point obtained in the previous step, the following steps are performed:
(1) determining the initial position of a sliding frame according to the positioning base point, wherein the sliding frame is a rectangular frame with a preset size;
(2) continuously moving the sliding frame to the top of the top view from the initial position for searching, judging whether the sliding frame is effective at the position according to a preset rule after the sliding frame is moved to one position, and adding lane line pixel points in the sliding frame into a lane line pixel point set corresponding to the positioning base point if the sliding frame is effective;
(3) and (3) repeating the step (2) until the sliding frame reaches the top of the top view, wherein the obtained lane line pixel point set is the lane line pixel point set corresponding to the positioning base point.
The sliding frame search algorithm is simple to implement, takes the sliding frame as a basic search unit, has smaller search granularity, and ensures that the search result gives consideration to both efficiency and accuracy. In addition, the algorithm also allows the position of the sliding frame to be flexibly adjusted according to the search result in the search process, and is favorable for further improving the accuracy of the search result. Fig. 11 shows a moving track of a sliding frame (from bottom to top) in a sliding frame search process, and it can be seen that the sliding frame effectively covers a pixel point area (white area) of a lane line, and an intersection does not exist between two sets of sliding frames corresponding to two lane lines, so that a search result is accurate.
The algorithm steps are further explained below:
in step (1), if the positioning base point is located on the bottom side of the top view, the positioning base point may be used as the middle point of the bottom side of the slide frame, so as to determine the start position of the slide frame.
In step (2), the preset rule may be: and judging whether the number of the lane line pixel points in the sliding frame exceeds a first threshold value at the current position, if so, determining that the sliding frame is effective, otherwise, determining that the sliding frame is invalid. The starting point for setting the rule is that if the number of lane line pixel points in the sliding frame is large (exceeds a first threshold), it indicates that the position of the sliding frame is matched with the position of the lane line, the searched lane line pixel points should be effective search results, otherwise, the lane line pixel points in the sliding frame may be false detection noise or the sliding frame deviates from the position of the lane line. It is understood that the preset rule may take other rules.
In step (2), the slide frame is only required to move towards the top of the top view as a whole, but it is not mandatory that the slide frame must move towards the top of the top view each time, for example, the vertical coordinate of the slide frame is allowed to remain unchanged and the position of the slide frame is adjusted laterally.
In some implementations, the next position of the slider can be calculated from the current search results. For example, when the sliding frame is valid, the mean of the abscissa of the pixel points of the lane line in the sliding frame is calculated, and the new position to which the sliding frame is to be moved is determined according to the calculation result, for example, the mean of the abscissa is taken as the abscissa of the new position, the current ordinate of the sliding frame is increased (or decreased, depending on the direction of the vertical axis) by the height of one sliding frame and then taken as the ordinate of the new position, and the midpoint of the bottom edge of the sliding frame coincides with the new position after the sliding frame is moved. In the implementation modes, the position of the sliding frame is dynamically calculated according to the abscissa of the lane line pixel point in the sliding frame, so that the sliding frame is favorable for moving along with the extension trend of the lane line, and a better search result is obtained.
When the sliding frame is invalid, the above method for calculating the mean value of the abscissa can no longer be used to determine the new position of the sliding frame, but the sliding frame should be moved continuously to try to search for new lane line pixel points, rather than stopping the search, because the sliding frame is invalid, which may be a "normal" condition: for example, the lane line is originally a dotted line, and there is a break in the middle, so that there are no or only few lane line pixels in the sliding frame after the sliding frame moves; for another example, the lane line is changed drastically (e.g., greatly bent), and the sliding frame cannot effectively move along with the lane line, so that there are no or only few lane line pixels therein, and so on.
There are many possible ways of handling this. One possible way is to keep the abscissa of the current slide constant and continue moving the slide toward the top of the top view (e.g., the height of one slide may be moved), for example, if there is a break in the lane line, this may cause the slide to quickly match the next lane line. Another possible way is to first calculate the extension trend of the lane line according to the searched lane line pixel points (i.e. the current set of lane line pixel points). For example, a temporary lane line equation may be fitted to the searched lane line pixel points or a vector representing the lane line extension trend may be calculated, and so on. Then, the sliding frame is moved along the extension trend of the lane line, and the calculated extension trend is better predicted for the position of the lane line due to the continuity of the lane line, so that the sliding frame can be ensured to move along with the lane line to a great extent, for example, if the lane line is changed violently, the sliding frame can be matched to a new lane line position quickly.
For step (3), in some implementations, after the sliding frame search is finished, it may be further determined whether the total number of valid sliding frames obtained in the search process exceeds a second threshold; if it does, the obtained set of lane line pixel points is accepted, otherwise it is not accepted (that is, the search for the set of lane line pixel points from the current positioning base point fails). The starting point of these implementations is: if the number of valid sliding frames is large (exceeds the second threshold), the obtained set of lane line pixel points is highly reliable and should be accepted; otherwise it should not be. This judgment can improve the accuracy of lane line detection and avoid false detections.
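For illustration, a minimal sketch of the sliding frame search from one positioning base point is given below. The frame size and the two thresholds are assumed values; when a frame is invalid, the sketch keeps the abscissa unchanged, which corresponds to the first of the handling options described above.

```python
import numpy as np

def sliding_frame_search(top_view, base_x, frame_w=12, frame_h=15,
                         min_pixels=30, min_valid_frames=4):
    """Collect lane-line pixel coordinates by sliding a frame from the base point
    to the top of the top view; returns None if too few frames were valid."""
    h, w = top_view.shape
    xs, ys = [], []
    valid_frames = 0
    cx, bottom = base_x, h                     # frame bottom-centre starts at the base point
    while bottom > 0:
        top = max(bottom - frame_h, 0)
        left = max(cx - frame_w // 2, 0)
        right = min(cx + frame_w // 2, w)
        py, px = np.nonzero(top_view[top:bottom, left:right])   # lane pixels in the frame
        if len(px) > min_pixels:               # first threshold: the frame is valid
            valid_frames += 1
            xs.extend(px + left)
            ys.extend(py + top)
            cx = left + int(np.mean(px))       # follow the lane line's extension trend
        bottom = top                           # move one frame height towards the top
    if valid_frames < min_valid_frames:        # second threshold: accept or reject the set
        return None
    return np.array(xs), np.array(ys)
```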
After the search process in step S130 is completed, the set of lane line pixel points corresponding to different lane lines in the top view has been obtained, but the coordinates of the pixel points are still the coordinates in the top view, and need to be converted into the coordinates in the road image or the coordinates in the world coordinate system, i.e., the coordinates in a certain common view. Whether the coordinates in the road image are to be converted (the road image also corresponds to an image coordinate system) or the coordinates in the world coordinate system can be determined according to the subsequent use requirements of the lane line equation. For example, if the lane line equation is used for lane departure warning, lane keeping, lane change assistance, and the like, the lane line equation may be converted into coordinates in a world coordinate system, perspective transformation may be used during the conversion, and coordinate calculation may be performed by combining a correspondence between a pixel distance in a top view and a real distance; for another example, to use lane-line equations for ranging (see the step in FIG. 14 in detail), it can be transformed into coordinates in the road image. Of course, the coordinates in the road image and the coordinates in the world coordinate system may be converted to each other.
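Assuming the top view was produced with a perspective matrix M (as in the earlier warping sketch), the searched coordinates can be mapped back by applying the inverse transform; the snippet below is a sketch of that idea and yields coordinates in whatever image the warp was taken from, with any further conversion to world coordinates left to the camera calibration.

```python
import cv2
import numpy as np

def top_view_to_image_coords(xs, ys, M):
    """Map lane-line pixel coordinates from the top view back to the source image,
    where M is the 3x3 perspective matrix used to generate the top view."""
    pts = np.float32(np.stack([xs, ys], axis=1)).reshape(-1, 1, 2)
    M_inv = np.linalg.inv(M)
    return cv2.perspectiveTransform(pts, M_inv).reshape(-1, 2)
```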
Step S140: and fitting to form a lane line equation according to the set of the lane line pixel points after the coordinates are transformed.
Depending on the coordinate system to which the pixel points belong in step S130, the lane line equation fitted in step S140 may be a lane line equation in the road image or a lane line equation in the world coordinate system, and fitting an equation to a set of points can be implemented with the prior art, which is not described in detail here. For example, if the fitted lane line equation is a lane line equation in the road image, with x representing the abscissa and y the ordinate in the road image, the equations of the two lane lines of the lane where the host vehicle is located may be written as x = f1(y) and x = f2(y), respectively. Here f1 and f2 are curves that can be selected according to actual requirements; in most cases the accuracy of a straight line or a quadratic curve is sufficient. Fig. 12 shows one possible lane line detection result with two black straight lines.
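A minimal fitting sketch, assuming the transformed pixel coordinates are available as an (N, 2) array of (x, y) points, is shown below; degree 1 corresponds to a straight line and degree 2 to a quadratic curve.

```python
import numpy as np

def fit_lane_line(points: np.ndarray, degree: int = 2) -> np.ndarray:
    """Fit x = f(y) for one lane line; returns polynomial coefficients so that
    x can later be evaluated as np.polyval(coeffs, y)."""
    xs, ys = points[:, 0], points[:, 1]
    return np.polyfit(ys, xs, degree)
```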
The following briefly summarizes the advantages of the lane line detection method provided by the embodiment of the present application over the existing methods:
firstly, a deep learning method (the semantic segmentation network) is used to segment the lane lines at pixel precision, which helps to improve the segmentation precision of the lane lines and in turn the detection precision of the lane lines in the subsequent steps. In some implementations, the semantic segmentation network can be designed to be lightweight and to fuse multi-scale features, improving its segmentation performance.
Secondly, the lane line detection is not directly performed based on the segmentation mask, but the segmentation mask is converted into a corresponding top view, and then the lane line detection is performed based on the top view, so that the parallel relation between different lane lines can be kept in the top view as much as possible, and the accuracy of the lane line detection can be improved. In addition, when the detection is carried out based on the top view, methods of searching a positioning base point based on the histogram, searching a lane line pixel point set based on a sliding frame and the like are further provided, and the method is also beneficial to improving the lane line detection precision.
The use of the detected lane lines is not limited; they may be used, for example, for lane departure warning, lane keeping, lane change assistance, and the like, and these functions are correspondingly enhanced once the lane line detection accuracy is improved. In addition, the present application also proposes a method for ranging based on the detected lane line equations, which can be used to measure (or predict) the distance between the host vehicle and a target to be measured on the road, such as a vehicle, a pedestrian, or a rider, most commonly a vehicle.
In the prior art, there are methods for ranging based on road images acquired by a vehicle-mounted camera: generally, target detection is performed on the road image, and the distance between the host vehicle and the vehicle ahead is then calculated using the width of the lower edge of that vehicle's target detection frame, on the assumption that all vehicles have the same width. This assumption does not actually hold; statistics show that vehicle widths generally vary between 1.5 meters and 3 meters, that is, the width of the lower edge of the target detection frame may represent different actual widths rather than a fixed value, so the accuracy of the ranging result of this method is not high, and research shows that the accuracy is only about 30%. The ranging method provided by the present application measures distance on the assumption that a lane has the same width everywhere, that is, the two lane lines of the same lane always remain parallel; this assumption holds in most cases, so the theoretical basis of the ranging method is very reliable, and the ranging precision is significantly improved compared with the existing method.
In the following, some basic concepts related to the ranging method are explained with reference to fig. 13, then the basic principle of the ranging method is described, and finally the specific steps of the method are described.
Referring to fig. 13, a rectangular area above the x-axis represents a road image (or the x-axis is at a position on the bottom side of the road image), and the x-axis represents the abscissa in the road image in units of pixels.
Meanwhile, the x axis is also a boundary position of a blind area of the vehicle-mounted camera, namely the so-called blind area, which is an area where the camera cannot collect images. In fig. 13, a rectangular area below the x-axis represents a blind area, and the in-vehicle camera is mounted at O. The point O also represents a ranging origin, that is, the distance between the target to be measured and the vehicle is the distance between the target to be measured and the point O.
Four lane lines, namely LO', RO', L'O' and R'O', are included in the road image (for simplicity in explaining the principle, the case where the lane lines are straight lines is taken as an example), where O' represents the vanishing point, i.e., the position where the lane lines converge due to the perspective effect. Of course, the vanishing point may be outside the visible range of the lane lines in the road image, so the near portion of the lane lines in fig. 13 is shown with solid lines, indicating that it is visible in the road image, and the far portion is shown with dashed lines, indicating that it is not. Among the four lane lines, LO' and RO' belong to the same lane, L'O' and R'O' belong to the same lane, and the widths of the two lanes are different.
With continued reference to fig. 13, for convenience of explaining the ranging principle, it is assumed that the vehicle runs along the center of the road and that the vehicle-mounted camera is also installed at the center of the vehicle head, so that the line OO' is the central axis of the lane. A line segment that is perpendicular to the central axis of the lane and intersects the lane lines is drawn in the road image; this line segment is called an equal-width line, and for the same lane, equal-width lines at different positions correspond to the same actual width. For example, the equal-width lines at A, B and C may all correspond to a lane width of 3 meters. The width of an equal-width line in the road image (referred to as its pixel width) is denoted as W_; due to the perspective effect, the pixel widths of the equal-width lines at different positions are different, as can also be seen intuitively in fig. 13. In particular, the pixel width of the equal-width line at the blind-zone boundary is denoted as W.
In fig. 13, the s-axis represents world coordinates (the origin of coordinates is point O) of the position of the equal-width line (which may be understood as the position of the intersection of the equal-width line and the center axis of the lane, for example, point A, B, C), that is, the actual distance between the equal-width line and the host vehicle (point O). A coordinate system is formed by taking the s-axis as the vertical axis and the x-axis as the horizontal axis.
The basic principle of the distance measuring method provided by the application is as follows:
the lane at the blind-zone boundary and the lane at a certain target to be measured have the same actual width, but due to the perspective effect, the pixel widths of the corresponding equal-width lines in the road image are different, namely W and W_. The ratio of the two is denoted r, that is, r = W_/W (of course, the numerator and denominator can be exchanged). The value of r changes with the distance between the target to be measured and the host vehicle, and the relation between the two satisfies an equation s = f(r), whose parameters can be determined by calibration in advance; for the same camera, the parameter values remain constant after calibration, so the distance s between the target to be measured and the host vehicle can be obtained by directly substituting r into the equation. The distance between the target to be measured and the host vehicle is the world coordinate of the position of the equal-width line corresponding to the target to be measured.
For example, in the road image in fig. 13, assuming that the target to be measured is located at C, let the pixel width of the corresponding equal-width line be W_C; the distance s_C between the target to be measured and the host vehicle (the world coordinate of C) can then be obtained by calculating the ratio r_C = W_C/W and substituting r_C into the equation s = f(r).
Further, in fig. 13, taking point A as an example, there is the relationship AA'/PR = AA"/PR', so that the ratio r = W_/W calculated at the same equal-width-line position is always the same for lanes of different widths; that is, the ranging result of the ranging method is not affected by the actual width of the lane, or in other words the method can adapt to lanes of different widths.
Fig. 14 shows a flow of a ranging method provided in an embodiment of the present application, and the basic principle of the ranging method is described above. The method may be, but is not limited to being, performed by an electronic device, the structure of which is described in detail below with respect to fig. 17. Referring to fig. 14, the method includes:
step S210: and acquiring a road image acquired by the vehicle-mounted camera.
The distance measuring method in fig. 14 has a plurality of application modes corresponding to different acquisition modes of road images: for example, a vehicle-mounted camera can acquire a road image in real time and a vehicle-mounted device can perform distance measurement in real time, and the road image is acquired in real time; for another example, a data set consisting of road images captured by a vehicle-mounted camera may be collected, and distance measurement may be performed on a PC or a server, where the road images are obtained by reading from the data set, and so on.
Step S220: the lane line detection method provided by the embodiment of the application is utilized to obtain the lane line equation of the lane where the vehicle is located in the road image.
The specific implementation of step S220 has been explained above and is not repeated here. Assume, without loss of generality, that the equations of the two lane lines of the lane where the host vehicle is located are x = f1(y) and x = f2(y), respectively.
Step S230: the first pixel width is calculated according to a lane line equation.
The first pixel width is the pixel width of the lane where the host vehicle is located at the blind-area boundary of the vehicle-mounted camera in the road image (i.e., the width of the equal-width line at the blind-area boundary), and, continuing the foregoing notation, is denoted by W. As mentioned in the introduction of fig. 13, the blind-area boundary can be regarded as the bottom edge of the road image, so the ordinate y_W of the blind-area boundary is known; substituting it into the lane line equations in step S220 gives the abscissas f1(y_W) and f2(y_W) of the two end points of the equal-width line at the blind-area boundary, and W can be obtained by calculating the difference abs(f1(y_W) - f2(y_W)) between the two abscissas. The pixel width represented by W is indicated in fig. 15, where the black lines represent the fitted lane line equations.
Step S240: and calculating the second pixel width according to the lane line equation.
The second pixel width is the pixel width of the lane where the host vehicle is located at the target to be measured in the road image (i.e., the width of the equal-width line at the target to be measured), and is denoted as W_.
To calculate W _, firstly, an object in the road image needs to be detected, for example, object detection can be realized through a neural network model such as Mask-RCNN, YOLOv3, and the detection result includes the category information of the object and the position of the detection frame.
Then, the target to be measured, i.e., the target whose distance needs to be measured, needs to be determined from all the detected targets. For example, all detected targets may be taken as targets to be measured. As another example, considering that a practical use of ranging is to avoid collision between the host vehicle and surrounding targets, targets that are at risk of collision with the host vehicle may be selected from all targets as targets to be measured, such as targets whose detection frames are large and not blocked by other detection frames. As yet another example, the target to be measured may be selected based on the ranging characteristics of the method itself: the inventors have studied the ranging method proposed in the present application and found that, although the method can also measure the distance to a target in a lane other than the one where the host vehicle is located, the ranging accuracy there is lowered by the perspective effect, so only targets detected in the lane where the host vehicle is located may be taken as targets to be measured. For targets in other lanes, if ranging is needed, other methods such as radar ranging may be adopted. With the detection frame of a target and the lane line equations known, it can be determined whether the target is located in the lane where the host vehicle is located.
Taking the case that the target to be measured is located in the lane where the host vehicle is located as an example, the ordinate y_W_ of the target to be measured in the road image can be determined from its detection frame; for example, the ordinate of the lower edge of the detection frame can be taken as the ordinate of the target to be measured. Substituting this ordinate into the lane line equations in step S220 gives the abscissas f1(y_W_) and f2(y_W_) of the two end points of the equal-width line at the target to be measured, and W_ can be obtained by calculating the difference abs(f1(y_W_) - f2(y_W_)) between the two abscissas. The position of the detection frame and the pixel width represented by W_ are indicated in fig. 15.
Step S250: a first ratio between the first pixel width and the second pixel width is calculated.
The first ratio is denoted r, r = W_/W (although the numerator and denominator may be exchanged).
Step S260: and substituting the first ratio into a distance measurement relation equation of the first ratio and the distance value, and calculating to obtain the distance value between the target to be measured and the vehicle.
Substituting r obtained in step S250 into the ranging relation equation, i.e., s = f(r), yields s, the distance value between the target to be measured and the host vehicle.
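Putting steps S230 to S260 together, a minimal sketch is given below; it assumes the lane line equations are available as polynomial coefficient arrays (as in the earlier fitting sketch) and that the calibrated parameters of s = f(r) are stored, highest degree first, in ranging_coeffs. The names and the use of np.polyval are illustrative, not mandated by the method.

```python
import numpy as np

def measure_distance(f1, f2, y_blind, y_target, ranging_coeffs):
    """Estimate the distance to a target from the fitted lane line equations.

    f1, f2: coefficient arrays so that x = np.polyval(f, y) in the road image.
    y_blind: ordinate of the blind-area boundary (bottom edge of the road image).
    y_target: ordinate of the lower edge of the target's detection frame.
    ranging_coeffs: calibrated parameters of the ranging relation s = f(r).
    """
    W = abs(np.polyval(f1, y_blind) - np.polyval(f2, y_blind))     # first pixel width
    W_ = abs(np.polyval(f1, y_target) - np.polyval(f2, y_target))  # second pixel width
    r = W_ / W                                                     # first ratio
    return np.polyval(ranging_coeffs, r)                           # s = f(r)
```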
The advantages of the ranging method are briefly summarized as follows: one is that the assumption of equal lane width is established under most conditions, so the theoretical basis is very reliable, and the distance measurement precision is high. Secondly, the distance measurement relation equation in the method is established between the first ratio and the actual distance, which is equivalent to eliminating the influence of the actual width of the lane on the distance measurement result (because the first ratio is not related to the actual width of the lane), so that the method can be applied to the lane with any width, and the distance measurement process is simple and efficient. In contrast, if the distance measurement relation equation is established between the second pixel width and the distance measurement result, the actual lane width has an influence on the distance measurement result, so that the distance measurement process becomes complicated and inaccurate.
Further, the inventor has found through research that the ranging method performs well in scenes where the lane lines are straight; the ranging relation equation provides sufficient ranging precision when a straight-line equation or a quadratic-curve equation is adopted, with a small amount of calculation, the quadratic curve giving the higher precision. Therefore, the ranging method may be used only when the detected lane lines are straight, and other methods may be used for ranging when the lane lines are curved.
The calibration process of the distance measuring method is described below with reference to fig. 16, and the purpose of the calibration process is to solve the parameters in the distance measuring relation equation, so that the parameters are fixed before the steps S210 to S260 are performed. The steps of the calibration process include:
step a: and acquiring a calibration road image acquired by the vehicle-mounted camera.
The calibrated road image is collected by a vehicle-mounted camera to be calibrated, the road environment for collecting the calibrated road image can be as shown in fig. 16, an open road is selected, the vehicle is driven to the center of a certain lane with the same width (if the camera is also arranged at the center of the vehicle head, the camera is positioned on the central axis of the lane where the vehicle is located at the moment), and the vehicle head is opposite to the front, so that the optical axis of the camera is parallel to the lane line.
Step b: a third pixel width is obtained.
The third pixel width is the pixel width, measured in the calibration road image, of the lane where the host vehicle is located at the blind-area boundary of the vehicle-mounted camera. The definition of the third pixel width is similar to that of the first pixel width, and the explanation is not repeated. The third pixel width is denoted as W_0.
In the calibration stage, the lane line equations (black lines in fig. 16) can be marked manually in the calibration road image, and W_0 can then be measured manually (of course, coordinate calculation can also be used).
Step c: a plurality of fourth pixel widths are obtained.
The fourth pixel width is the pixel width of the lane where the vehicle is located at a preset distance from the vehicle, which is measured in the calibrated road image, that is, the equal-width line width at the preset distance. Each fourth pixel width corresponds to a different preset distance, namely an equal-width line position. For example, 3 preset distances, 20 meters, 50 meters and 100 meters are shown in fig. 16, the positions of the equal-width lines corresponding to these preset distances being known ( positions 1, 2, 3), and may be marked in the figure, for example, manually, or markers may be placed on the road surface at these distances in advance to determine the positions thereof.
In the calibration stage, the lane line equations (black straight lines in fig. 16) may be marked manually in the calibration road image, and the equal-width line width at each preset distance may then be measured manually (of course, coordinate calculation may also be used). For example, 3 equal-width line widths are measured in fig. 16 and are respectively denoted as W_1, W_2 and W_3.
Step d: and calculating a plurality of second ratios between the plurality of fourth pixel widths and the third pixel widths, and forming a plurality of data points consisting of the corresponding preset distances and the second ratios.
The number of data points used for calibration is related to the number of parameters to be solved: for example, if the ranging relation equation is the quadratic curve s = a × r^2 + b × r + c, there are 3 parameters, so 3 data points need to be used and 3 fourth pixel widths need to be measured in step c; as another example, if the ranging relation equation is the straight line s = a × r + b, there are 2 parameters, so 2 data points need to be used and likewise 2 fourth pixel widths need to be measured in step c.
Taking fig. 16 as an example, 3 second ratios can be calculated, which are respectively recorded as:
r1=W_1/W_0
r2=W_2/W_0
r3=W_3/W_0
further, 3 data points P1(20, r1), P2(50, r2) and P3(100, r3) can be formed.
Step e: and solving parameters of the ranging relation equation according to the plurality of data points.
The method of solving equation parameters from known data points can refer to the prior art and is not described in detail here. For example, if the ranging relation equation is s = a × r^2 + b × r + c, the parameters a, b and c can be solved by substituting P1, P2 and P3 into it; the form of the ranging relation equation is thereby determined, and the subsequent distance calculation can be performed.
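For illustration, the calibration can be reproduced with a short numerical sketch; the measured widths below are placeholder values, and fitting a degree-2 polynomial through exactly three data points recovers the quadratic ranging relation exactly.

```python
import numpy as np

# Measured in the calibration road image (placeholder values): the equal-width line
# widths at the blind-area boundary and at 20 m / 50 m / 100 m.
W_0, W_1, W_2, W_3 = 78.0, 31.0, 14.0, 7.5
distances = np.array([20.0, 50.0, 100.0])
ratios = np.array([W_1, W_2, W_3]) / W_0

# Fit the quadratic ranging relation s = a*r**2 + b*r + c through the three data points.
a, b, c = np.polyfit(ratios, distances, 2)
ranging_coeffs = np.array([a, b, c])   # usable directly with np.polyval, as in measure_distance
```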
Fig. 15 is a functional block diagram of a lane line detection apparatus 300 according to an embodiment of the present application. Referring to fig. 15, the lane line detection apparatus 300 includes:
the region-of-interest extracting module 310 is configured to determine a region-of-interest image containing a lane line from a road image acquired by a vehicle-mounted camera;
the lane line segmentation module 320 is configured to process the region-of-interest image by using a semantic segmentation network, obtain a segmentation mask of a lane line in the image, and convert the segmentation mask into a corresponding top view;
a pixel point detection module 330, configured to determine a set of lane line pixel points corresponding to different lane lines in the top view, and transform coordinates of the lane line pixel points in the set from coordinates in the top view to coordinates in the road image or coordinates in a world coordinate system;
and the lane line fitting module 340 is configured to form a lane line equation according to the set of lane line pixel points after the coordinate transformation.
In one implementation of the lane line detection apparatus 300, the lane line segmentation module 320 processes the image of the region of interest by using a semantic segmentation network to obtain a segmentation mask of a lane line in the image, including: extracting the multi-scale features of the region-of-interest image by using a backbone network in the semantic segmentation network; fusing the multi-scale features by using a feature fusion network in the semantic segmentation network, and outputting segmentation results aiming at different lane line categories; converting the segmentation result into the segmentation mask.
In one implementation of the lane line detection apparatus 300, the lane line segmentation module 320 extracts the multi-scale features of the image of the region of interest by using a backbone network in the semantic segmentation network, including: extracting the multi-scale features by utilizing a plurality of bottleneck modules in the backbone network, wherein the bottleneck modules are convolution modules in MobileNet; the lane line segmentation module 320 fuses the multi-scale features by using a feature fusion network in the semantic segmentation network, and outputs segmentation results for different lane line categories, including: convolving the features of each scale, adding the convolved features of each scale to the fusion features of the scale, and upsampling the addition result through deconvolution to obtain the fusion features of the previous scale; wherein the features of the smallest scale are, after convolution, directly used as an addition result, and the fusion features of the previous scale calculated using the features of the largest scale are used as the segmentation result.
In one implementation of the lane line detection apparatus 300, the lane line dividing module 320 converts the dividing mask into a corresponding top view, including: determining a region to be detected in the segmentation mask by using the parameters of the vehicle-mounted camera, wherein the region to be detected represents a near region of a lane where the vehicle is located; and converting the part of the segmentation mask in the area to be detected into the top view.
In one implementation of the lane line detection apparatus 300, the top view is a binary image.
In an implementation manner of the lane line detection apparatus 300, the determining, by the pixel point detection module 330, a set of lane line pixel points corresponding to different lane lines in the top view includes: counting the number of lane line pixel points at each abscissa in the top view to obtain a histogram; determining a positioning base point of the lane line in the top view according to the abscissa of the peak in the histogram; the positioning base points are located at the bottom of the top view, and each positioning base point corresponds to one lane line; and searching along the longitudinal direction of the top view by taking the positioning base point as an initial position to obtain a lane line pixel point set corresponding to different lane lines in the top view.
In an implementation manner of the lane line detection apparatus 300, the pixel point detection module 330 counts the number of the lane line pixel points at each abscissa in the top view to obtain a histogram, which includes: and counting the number of lane line pixel points at each abscissa in the area with the designated height at the bottom of the top view to obtain the histogram.
In an implementation manner of the lane line detection apparatus 300, the determining, by the pixel point detection module 330, a positioning base point of the lane line in the top view according to an abscissa of a peak in the histogram includes: if a known positioning base point exists, searching a peak in a preset range near the abscissa of the known positioning base point in the histogram, and determining the positioning base point according to the found abscissa of the peak; if no known positioning base point exists, searching a peak in the whole abscissa range of the histogram, and determining the positioning base point according to the found abscissa of the peak; wherein the known positioning base point is a positioning base point determined in a process of performing lane line detection on a preamble image of the road image.
In one implementation manner of the lane line detecting device 300, the searching a peak in the histogram within a preset range near the abscissa of the known positioning base point by the pixel point detecting module 330, and determining the positioning base point according to the found abscissa of the peak, includes: searching peaks in preset ranges near the abscissas of two known positioning base points in the histogram respectively, and if two peaks are found and their abscissas x1 and x2 meet the condition thresh_min < abs(x1-x2) < thresh_max, determining x1 and x2 as the abscissas of the two positioning base points; wherein thresh_min is a preset minimum lane width, thresh_max is a preset maximum lane width, abs represents absolute value calculation, and the two positioning base points correspond to the two lane lines of the lane where the vehicle is located.
In an implementation manner of the lane line detection apparatus 300, the pixel point detection module 330 searches peaks in the whole abscissa range of the histogram, and determines the positioning base point according to the found abscissa of the peak, including: searching peaks in the abscissa ranges [0, L/2] and [L/2, L] of the histogram respectively, wherein L is the maximum value of the abscissa of the histogram; if two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1-x2) < thresh_max, determining x1 and x2 as the abscissas of the two positioning base points; if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2 × thresh_min < abs(x1-x2) < 2 × thresh_max and abs(x1-L/2) < abs(x2-L/2), determining x1 and (x1+x2)/2 as the abscissas of the two positioning base points; if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2 × thresh_min < abs(x1-x2) < 2 × thresh_max and abs(x1-L/2) > abs(x2-L/2), determining x2 and (x1+x2)/2 as the abscissas of the two positioning base points; wherein thresh_min is a preset minimum lane width, thresh_max is a preset maximum lane width, abs represents absolute value calculation, and the two positioning base points correspond to the two lane lines of the lane where the vehicle is located.
In an implementation manner of the lane line detecting device 300, the searching, by the pixel point detecting module 330, along the longitudinal direction of the top view, with the positioning base point as an initial position, to obtain a set of lane line pixel points corresponding to different lane lines in the top view, includes: determining the initial position of the sliding frame according to the positioning base point; starting from the initial position, moving the sliding frame to the top of the top view to search until the position of the sliding frame reaches the top of the top view, and obtaining the lane line pixel point set corresponding to the positioning base point; and after the sliding frame moves to a position, judging whether the sliding frame is effective at the position, and if the sliding frame is effective, adding the lane line pixel points in the sliding frame into the lane line pixel point set corresponding to the positioning base point.
In one implementation of the lane line detection apparatus 300, the determining whether the sliding frame is valid at the position by the pixel point detection module 330 includes: and judging whether the number of the lane line pixel points in the sliding frame at the position exceeds a first threshold value, if so, determining that the sliding frame is valid, and otherwise, determining that the sliding frame is invalid.
In an implementation manner of the lane line detection apparatus 300, the pixel point detection module 330 is further configured to: and when the sliding frame is effective, calculating the mean value of the abscissa of the lane line pixel points in the sliding frame, and determining a new position to which the sliding frame is to be moved according to the calculation result.
In one implementation of the lane line detection apparatus 300, the pixel point detection module 330 is further configured to: when the sliding frame is invalid, move the sliding frame directly to a new position and continue the search; wherein the sliding frame is moved in one of the following ways: keeping the abscissa of the sliding frame unchanged and changing its ordinate so that the sliding frame moves toward the top of the top view; or determining the extension trend of the lane line corresponding to the positioning base point according to the lane line pixel points already found, and moving the sliding frame toward the top of the top view according to that extension trend.
In one implementation of the lane line detection apparatus 300, the pixel point detection module 330 is further configured to: after the search is finished, judge whether the total number of valid sliding frames obtained during the search exceeds a second threshold; if so, the obtained lane line pixel point set is accepted, otherwise it is discarded.
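The sliding-frame search described in the preceding paragraphs can be sketched roughly as follows; the binary top-view mask, the window count, the margin and both thresholds are assumed values rather than parameters given in the patent.

```python
import numpy as np

def sliding_window_search(mask, base_x, n_windows=10, margin=50,
                          min_pixels=50, min_valid_windows=4):
    """Collect lane-line pixel coordinates starting from one positioning base point."""
    h = mask.shape[0]
    win_h = h // n_windows
    ys, xs = np.nonzero(mask)             # coordinates of all lane-line pixels
    x_center = base_x
    lane_xs, lane_ys = [], []
    valid_windows = 0
    for i in range(n_windows):
        y_hi = h - i * win_h               # window bottom (image y grows downward)
        y_lo = y_hi - win_h                # window top
        in_win = ((ys >= y_lo) & (ys < y_hi) &
                  (xs >= x_center - margin) & (xs < x_center + margin))
        if in_win.sum() > min_pixels:      # window is valid: enough lane pixels
            valid_windows += 1
            lane_xs.append(xs[in_win])
            lane_ys.append(ys[in_win])
            x_center = int(xs[in_win].mean())  # re-centre on the mean abscissa
        # otherwise keep the abscissa unchanged and simply move up one window
    if valid_windows <= min_valid_windows:
        return None                        # too few valid windows: discard the set
    return np.concatenate(lane_xs), np.concatenate(lane_ys)
```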
The implementation principle and technical effects of the lane line detection device 300 provided in the embodiments of the present application have been described in the foregoing method embodiments; for brevity, details not repeated in this device embodiment can be found in the corresponding content of the method embodiments.
Fig. 18 is a functional block diagram of a distance measuring device 400 according to an embodiment of the present disclosure. Referring to fig. 18, the ranging apparatus 400 includes:
the image acquisition module 410 is used for acquiring a road image acquired by the vehicle-mounted camera;
the lane line detection module 420 is configured to obtain a lane line equation of a lane where the vehicle is located in the road image by using the lane line detection method provided in the embodiment of the present application;
the first width calculation module 430 is configured to calculate a first pixel width according to the lane line equation, where the first pixel width is a pixel width of a lane where the vehicle is located in a blind area boundary of the vehicle-mounted camera in the road image;
a second width calculation module 440, configured to calculate a second pixel width according to the lane line equation, where the second pixel width is a pixel width of a lane where the vehicle is located in the road image at the target to be detected;
a ratio calculation module 450, configured to calculate a first ratio between the first pixel width and the second pixel width;
and the distance calculation module 460 is configured to substitute the first ratio into a distance measurement relation equation between the first ratio and the distance value, and calculate to obtain the distance value between the target to be measured and the vehicle.
In one implementation of the distance measuring device 400, the first width calculating module 430 calculates the first pixel width according to the lane line equation, including: determining the vertical coordinate of the bottom of the road image as the vertical coordinate of the blind area boundary; and calculating to obtain the first pixel width according to the longitudinal coordinate of the blind area boundary and the lane line equation.
In one implementation of the distance measuring device 400, the second width calculating module 440 calculates the second pixel width according to the lane line equation, including: detecting a target in the road image; determining a target located in a lane where the vehicle is located in the detected targets as the target to be detected according to the detection frame of the target and the lane line equation; determining the vertical coordinate of the target to be detected in the road image according to the detection frame of the target to be detected; and calculating to obtain the second pixel width according to the ordinate of the target to be detected and the lane line equation.
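A minimal sketch of the two width computations above, assuming the lane line equation is fitted in the image as a straight line x = a·y + b; the model form and the helper names are illustrative assumptions, not taken from the patent.

```python
def lane_x(line, y):
    """Evaluate a lane-line equation x = a*y + b at image ordinate y (assumed model)."""
    a, b = line
    return a * y + b

def pixel_width(left_line, right_line, y):
    """Pixel width of the host lane at ordinate y."""
    return abs(lane_x(right_line, y) - lane_x(left_line, y))

def first_and_second_width(left_line, right_line, image_height, target_box_bottom_y):
    # first pixel width: at the blind-area boundary, taken as the bottom image row
    w1 = pixel_width(left_line, right_line, image_height - 1)
    # second pixel width: at the ordinate of the target's detection-box bottom
    w2 = pixel_width(left_line, right_line, target_box_bottom_y)
    return w1, w2
```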
In one implementation of ranging device 400, the device further comprises:
a calibration module configured to: acquire a calibration road image captured by the vehicle-mounted camera; obtain a third pixel width, the third pixel width being the pixel width, measured in the calibration road image, of the lane where the vehicle is located at the blind area boundary of the vehicle-mounted camera; obtain a plurality of fourth pixel widths, the fourth pixel widths being the pixel widths, measured in the calibration road image, of the lane where the vehicle is located at preset distances from the vehicle, wherein each fourth pixel width corresponds to a different preset distance; calculate a plurality of second ratios between the plurality of fourth pixel widths and the third pixel width, and form a plurality of data points each consisting of the corresponding preset distance and second ratio; and solve the parameters of the ranging relation equation from the plurality of data points.
In one implementation of the distance measuring device 400, the lane line equation is a linear equation and the distance measuring relation equation is a linear equation or a quadratic curve equation.
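A rough sketch of the calibration and ranging steps, assuming the ranging relation is fitted as a quadratic polynomial of the width ratio by least squares; the numbers and the ratio orientation (width at the target divided by width at the blind-area boundary) are placeholders, not calibration values from the patent.

```python
import numpy as np

def fit_ranging_relation(third_width, fourth_widths, distances, degree=2):
    """Fit distance = f(ratio) from calibration data points by least squares."""
    ratios = np.asarray(fourth_widths, dtype=float) / float(third_width)
    return np.polyfit(ratios, distances, degree)   # polynomial coefficients

def estimate_distance(coeffs, ratio):
    """Substitute a width ratio into the fitted ranging relation."""
    return float(np.polyval(coeffs, ratio))

# Example usage with made-up calibration measurements (pixels and metres):
coeffs = fit_ranging_relation(third_width=400,
                              fourth_widths=[400, 200, 133, 100],
                              distances=[5, 10, 15, 20])
print(estimate_distance(coeffs, ratio=0.5))        # estimated distance for the assumed data
```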
The implementation principle and technical effects of the distance measuring device 400 provided in the embodiments of the present application have likewise been described in the foregoing method embodiments; for brevity, details not repeated in this device embodiment can be found in the corresponding content of the method embodiments.
Fig. 19 shows a possible structure of an electronic device 500 provided in an embodiment of the present application. Referring to fig. 19, the electronic device 500 includes: a processor 510, a memory 520, and a communication interface 530, which are interconnected and in communication with each other via a communication bus 540 and/or other form of connection mechanism (not shown).
The processor 510 includes one or more processors (only one is shown), which may be integrated circuit chips with signal processing capability. The processor 510 may be a general-purpose processor, including a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Network Processor (NP), or another conventional processor; it may also be a special-purpose processor, including a Graphics Processing Unit (GPU), a Neural-network Processing Unit (NPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Moreover, when there are multiple processors 510, some of them may be general-purpose processors and the others special-purpose processors.
The memory 520 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 510, and possibly other components, may access, read, and/or write data to the memory 520. For example, one or more computer program instructions may be stored in the memory 520 and read and executed by the processor 510 to implement the lane line detection method and/or the ranging method provided in the embodiments of the present application.
The communication interface 530 includes one or more interfaces (only one is shown), which may be used to communicate directly or indirectly with other devices for data interaction. The communication interface 530 may include an interface for wired and/or wireless communication.
It will be appreciated that the configuration shown in FIG. 19 is merely illustrative and that electronic device 500 may include more or fewer components than shown in FIG. 19 or have a different configuration than shown in FIG. 19. The components shown in fig. 19 may be implemented in hardware, software, or a combination thereof. The electronic device 500 may be a physical device, such as an in-vehicle device, a PC, a laptop, a tablet, a mobile phone, a server, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 500 is not limited to a single device, and may be a combination of a plurality of devices or a cluster including a large number of devices.
The embodiment of the present application further provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor of a computer, the lane line detection method and/or the distance measurement method provided in the embodiment of the present application are executed. For example, the computer-readable storage medium may be embodied as the memory 520 in the electronic device 500 in fig. 19.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A lane line detection method is characterized by comprising the following steps:
determining a region-of-interest image containing a lane line from a road image acquired by a vehicle-mounted camera;
processing the region-of-interest image by using a semantic segmentation network to obtain a segmentation mask of the lane line in the image, and converting the segmentation mask into a corresponding top view;
determining a set of lane line pixel points corresponding to different lane lines in the top view, and transforming coordinates of the lane line pixel points in the set from coordinates in the top view into coordinates in the road image or coordinates in a world coordinate system;
and fitting to form a lane line equation according to the set of the lane line pixel points after the coordinates are transformed.
2. The method according to claim 1, wherein the processing the image of the region of interest by using the semantic segmentation network to obtain a segmentation mask of the lane line in the image comprises:
extracting the multi-scale features of the region-of-interest image by using a backbone network in the semantic segmentation network;
fusing the multi-scale features by using a feature fusion network in the semantic segmentation network, and outputting segmentation results aiming at different lane line categories;
converting the segmentation result into the segmentation mask.
3. The method according to claim 2, wherein the extracting the multi-scale features of the image of the region of interest by using a backbone network in the semantic segmentation network comprises:
extracting the multi-scale features by utilizing a plurality of bottleneck modules in the backbone network, wherein the bottleneck modules are convolution modules in MobileNet;
the fusing the multi-scale features by using the feature fusion network in the semantic segmentation network and outputting segmentation results aiming at different lane line categories comprises the following steps:
convolving the features of each scale, adding the convolved features of each scale to the fusion features of that scale, and up-sampling the addition result through deconvolution to obtain the fusion features of the previous scale; wherein the features of the minimum scale, after convolution, are directly used as the addition result, and the fusion features of the previous scale calculated from the features of the maximum scale are used as the segmentation result.
4. The lane line detection method of any of claims 1-3, wherein said converting the segmentation mask into a corresponding top view comprises:
determining a region to be detected in the segmentation mask by using the parameters of the vehicle-mounted camera, wherein the region to be detected represents a near region of a lane where the vehicle is located;
and converting the part of the segmentation mask in the area to be detected into the top view.
5. The lane line detection method according to any one of claims 1 to 3, wherein processing the region-of-interest image using a semantic segmentation network to obtain a segmentation mask of the lane lines in the image and converting the segmentation mask into a corresponding top view comprises:
determining a distal portion image containing a distal lane line and a proximal portion image containing a proximal lane line from the region of interest image;
respectively processing the far-end partial image and the near-end partial image by utilizing the semantic segmentation network to obtain a segmentation mask of a lane line in the far-end part image and a segmentation mask of a lane line in the near-end part image;
and converting the two segmentation masks into the top view which is jointly corresponding to the two segmentation masks.
6. The method of claim 4, wherein the determining the set of lane line pixels corresponding to different lane lines in the top view comprises:
counting the number of lane line pixel points at each abscissa in the area with the designated height at the bottom of the top view to obtain a histogram;
determining a positioning base point of the lane line in the top view according to the abscissa of the peak in the histogram; the positioning base points are located at the bottom of the top view, and each positioning base point corresponds to one lane line;
and searching along the longitudinal direction of the top view by taking the positioning base point as an initial position to obtain a lane line pixel point set corresponding to different lane lines in the top view.
7. The lane line detection method according to claim 6, wherein determining a positioning base point of the lane line in the top view from an abscissa of a peak in the histogram comprises:
if a known positioning base point exists, searching a peak in a preset range near the abscissa of the known positioning base point in the histogram, and determining the positioning base point according to the found abscissa of the peak;
if no known positioning base point exists, searching a peak in the whole abscissa range of the histogram, and determining the positioning base point according to the found abscissa of the peak;
wherein the known positioning base point is a positioning base point determined in a process of performing lane line detection on a preamble image of the road image.
8. The lane line detection method according to claim 7, wherein the searching for a peak in the histogram within a preset range located near an abscissa of the known positioning base point and determining the positioning base point according to the found abscissa of the peak includes:
searching for peaks in the histogram within preset ranges near the abscissas of the two known positioning base points, respectively, and, if two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1-x2) < thresh_max, determining x1 and x2 as the abscissas of the two positioning base points; wherein thresh_min is a preset minimum width of a lane line, thresh_max is a preset maximum width of a lane line, abs represents an absolute value calculation, and the two positioning base points correspond to the two lane lines of the lane where the host vehicle is located;
searching peaks in the whole abscissa range of the histogram, and determining the positioning base point according to the found abscissas of the peaks, including:
searching for peaks in the abscissa ranges [0, L/2] and [L/2, L] of the histogram, respectively; wherein L is the maximum value of the abscissa of the histogram;
if two peaks are found and their abscissas x1 and x2 satisfy the condition thresh_min < abs(x1-x2) < thresh_max, determining x1 and x2 as the abscissas of the two positioning base points;
if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2×thresh_min < abs(x1-x2) < 2×thresh_max and abs(x1-L/2) < abs(x2-L/2), determining x1 and (x1+x2)/2 as the abscissas of the two positioning base points;
if two peaks are found and their abscissas x1 and x2 satisfy the conditions 2×thresh_min < abs(x1-x2) < 2×thresh_max and abs(x1-L/2) > abs(x2-L/2), determining x2 and (x1+x2)/2 as the abscissas of the two positioning base points;
wherein thresh_min is the preset minimum width of the lane line, thresh_max is the preset maximum width of the lane line, abs represents the absolute value calculation, and the two positioning base points correspond to the two lane lines of the lane where the vehicle is located.
9. The method according to claim 6, wherein the searching along the longitudinal direction of the top view with the positioning base point as a starting position to obtain a set of lane line pixel points corresponding to different lane lines in the top view comprises:
determining the initial position of the sliding frame according to the positioning base point;
starting from the initial position, moving the sliding frame to the top of the top view to search until the position of the sliding frame reaches the top of the top view, and obtaining the lane line pixel point set corresponding to the positioning base point;
after the sliding frame moves to a position, judging whether the number of lane line pixel points in the sliding frame at the position exceeds a first threshold value, if so, determining that the sliding frame is valid, otherwise, determining that the sliding frame is invalid;
if the sliding frame is effective, adding the lane line pixel points in the sliding frame into the lane line pixel point set corresponding to the positioning base point, calculating the abscissa mean value of the lane line pixel points in the sliding frame, and determining a new position to which the sliding frame is to be moved according to the calculation result;
if the sliding frame is invalid, directly moving the sliding frame to a new position for continuing searching, wherein the sliding frame moving mode comprises one of the following modes: keeping the abscissa of the sliding frame unchanged, and changing the ordinate of the sliding frame to enable the sliding frame to move towards the top of the top view; determining the extension trend of the lane line corresponding to the positioning base point according to the searched lane line pixel points, and moving the sliding frame to the top of the top view according to the extension trend;
the method further comprises the following steps:
and after the search is finished, judging whether the total number of valid sliding frames obtained during the search exceeds a second threshold; if so, the obtained lane line pixel point set is accepted, otherwise it is discarded.
10. A method of ranging, comprising:
acquiring a road image acquired by a vehicle-mounted camera;
obtaining a lane line equation of a lane where the host vehicle is located in the road image by using the lane line detection method according to any one of claims 1 to 9;
calculating a first pixel width according to the lane line equation, wherein the first pixel width is the pixel width of a lane where the vehicle is located at the blind area boundary of the vehicle-mounted camera in the road image;
calculating a second pixel width according to the lane line equation, wherein the second pixel width refers to the pixel width of a lane where the vehicle is located in the road image at the position of the target to be detected;
calculating a first ratio between the first pixel width and the second pixel width;
and substituting the first ratio into a distance measurement relation equation of the first ratio and the distance value, and calculating to obtain the distance value between the target to be measured and the vehicle.
Wherein the calculating a first pixel width according to the lane line equation comprises:
determining the vertical coordinate of the bottom of the road image as the vertical coordinate of the blind area boundary;
calculating to obtain the first pixel width according to the longitudinal coordinate of the blind area boundary and the lane line equation;
the calculating a second pixel width according to the lane line equation includes:
detecting a target in the road image;
determining a target located in a lane where the vehicle is located in the detected targets as the target to be detected according to the detection frame of the target and the lane line equation;
determining the vertical coordinate of the target to be detected in the road image according to the detection frame of the target to be detected;
and calculating to obtain the second pixel width according to the ordinate of the target to be detected and the lane line equation.
11. The ranging method of claim 10, further comprising:
acquiring a calibration road image acquired by the vehicle-mounted camera;
obtaining a third pixel width, wherein the third pixel width is the pixel width of a lane where the vehicle is located in the blind area boundary of the vehicle-mounted camera measured in the calibrated road image;
obtaining a plurality of fourth pixel widths, wherein the fourth pixel widths are pixel widths of lanes where the vehicle is located and at a preset distance from the vehicle, which are measured in the calibrated road image; wherein, each fourth pixel width corresponds to a different preset distance;
calculating a plurality of second ratios between the plurality of fourth pixel widths and the third pixel width, and forming a plurality of data points consisting of the corresponding preset distances and second ratios;
and solving the parameters of the ranging relation equation according to the plurality of data points.
12. A ranging method according to claim 10 or 11, characterized in that the lane line equation is a straight line equation and the ranging relation equation is a straight line equation or a quadratic curve equation.
13. A lane line detection apparatus, comprising:
the region-of-interest extraction module is used for determining a region-of-interest image containing a lane line from a road image acquired by the vehicle-mounted camera;
the lane line segmentation module is used for processing the image of the region of interest by utilizing a semantic segmentation network to obtain a segmentation mask of a lane line in the image and converting the segmentation mask into a corresponding top view;
the pixel point detection module is used for determining a set of lane line pixel points corresponding to different lane lines in the top view and converting coordinates of the lane line pixel points in the set from coordinates in the top view into coordinates in the road image or coordinates in a world coordinate system;
and the lane line fitting module is used for fitting to form a lane line equation according to the set of the lane line pixel points after the coordinates are transformed.
14. A ranging apparatus, comprising:
the image acquisition module is used for acquiring a road image acquired by the vehicle-mounted camera;
a lane line detection module, configured to obtain a lane line equation of a lane in which the host vehicle is located in the road image by using the lane line detection method according to any one of claims 1 to 9;
the first width calculation module is used for calculating a first pixel width according to the lane line equation, wherein the first pixel width is the pixel width of a lane where the vehicle is located in the blind area boundary of the vehicle-mounted camera in the road image;
the second width calculation module is used for calculating a second pixel width according to the lane line equation, wherein the second pixel width refers to the pixel width of a lane where the vehicle is located in the road image at the position of the target to be detected;
a ratio calculation module for calculating a first ratio between the first pixel width and the second pixel width;
the distance calculation module is used for substituting the first ratio into a distance measurement relation equation of the first ratio and the distance value to calculate and obtain the distance value between the target to be measured and the vehicle;
wherein the first width calculation module calculates a first pixel width according to the lane line equation, including:
determining the vertical coordinate of the bottom of the road image as the vertical coordinate of the blind area boundary;
calculating to obtain the first pixel width according to the longitudinal coordinate of the blind area boundary and the lane line equation;
the second width calculation module calculates a second pixel width according to the lane line equation, including:
detecting a target in the road image;
determining a target located in a lane where the vehicle is located in the detected targets as the target to be detected according to the detection frame of the target and the lane line equation;
determining the vertical coordinate of the target to be detected in the road image according to the detection frame of the target to be detected;
and calculating to obtain the second pixel width according to the ordinate of the target to be detected and the lane line equation.
15. A computer-readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method of any one of claims 1-12.
16. An electronic device comprising a memory and a processor, the memory having stored therein computer program instructions that, when read and executed by the processor, perform the method of any of claims 1-12.
CN202110129993.2A 2021-01-29 2021-01-29 Lane line detection method, ranging method and corresponding device Active CN112949398B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110129993.2A CN112949398B (en) 2021-01-29 2021-01-29 Lane line detection method, ranging method and corresponding device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110129993.2A CN112949398B (en) 2021-01-29 2021-01-29 Lane line detection method, ranging method and corresponding device

Publications (2)

Publication Number Publication Date
CN112949398A true CN112949398A (en) 2021-06-11
CN112949398B CN112949398B (en) 2023-05-05

Family

ID=76240158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110129993.2A Active CN112949398B (en) 2021-01-29 2021-01-29 Lane line detection method, ranging method and corresponding device

Country Status (1)

Country Link
CN (1) CN112949398B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2759959A2 (en) * 2013-01-25 2014-07-30 Ricoh Company, Ltd. Method and system for detecting multi-lanes
CN107862290A (en) * 2017-11-10 2018-03-30 智车优行科技(北京)有限公司 Method for detecting lane lines and system
CN110399762A (en) * 2018-04-24 2019-11-01 北京四维图新科技股份有限公司 A kind of method and device of the lane detection based on monocular image
US20200394422A1 (en) * 2019-06-14 2020-12-17 Fujitsu Limited Lane detection apparatus and method and electronic device
CN110647850A (en) * 2019-09-27 2020-01-03 福建农林大学 Automatic lane deviation measuring method based on inverse perspective principle
CN111179220A (en) * 2019-12-09 2020-05-19 安徽奇点智能新能源汽车有限公司 Lane marking line quality detection method, system and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴益鹏 (Wu Yipeng): "Research on Lane Line Detection Methods Based on Convolutional Neural Networks", China Master's Theses Full-text Database (Engineering Science and Technology II) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449629A (en) * 2021-06-25 2021-09-28 重庆卡佐科技有限公司 Lane line false and true identification device, method, equipment and medium based on driving video
CN113449629B (en) * 2021-06-25 2022-10-28 重庆卡佐科技有限公司 Lane line false and true identification device, method, equipment and medium based on driving video
CN113269171A (en) * 2021-07-20 2021-08-17 魔视智能科技(上海)有限公司 Lane line detection method, electronic device and vehicle
CN113269171B (en) * 2021-07-20 2021-10-12 魔视智能科技(上海)有限公司 Lane line detection method, electronic device and vehicle
CN113622263A (en) * 2021-08-30 2021-11-09 武汉极目智能技术有限公司 Road roller with road edge early warning device and road edge early warning method of road roller
CN115223131A (en) * 2021-11-09 2022-10-21 广州汽车集团股份有限公司 Adaptive cruise following target vehicle detection method and device and automobile
WO2023103204A1 (en) * 2021-12-06 2023-06-15 江苏航天大为科技股份有限公司 Vehicle illegal lane change recognition method based on two stages
CN116524473A (en) * 2023-07-03 2023-08-01 深圳安智杰科技有限公司 Obstacle target track prediction system and method in lane line defect scene
CN116846789A (en) * 2023-09-01 2023-10-03 国网四川省电力公司信息通信公司 Operation and maintenance management system for communication link
CN116846789B (en) * 2023-09-01 2023-11-14 国网四川省电力公司信息通信公司 Operation and maintenance management system for communication link
CN117745793A (en) * 2024-01-30 2024-03-22 北京交通发展研究院 Method, system and equipment for measuring width of slow-going road

Also Published As

Publication number Publication date
CN112949398B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN112949398B (en) Lane line detection method, ranging method and corresponding device
CN106951879B (en) Multi-feature fusion vehicle detection method based on camera and millimeter wave radar
CN111582083B (en) Lane line detection method based on vanishing point estimation and semantic segmentation
CN112927283B (en) Distance measurement method and device, storage medium and electronic equipment
CN107272021B (en) Object detection using radar and visually defined image detection areas
CN107679520B (en) Lane line visual detection method suitable for complex conditions
EP2168079B1 (en) Method and system for universal lane boundary detection
Lee et al. Stereo vision–based vehicle detection using a road feature and disparity histogram
CN110287779A (en) Detection method, device and the equipment of lane line
CN110929655B (en) Lane line identification method in driving process, terminal device and storage medium
CN107909047B (en) Automobile and lane detection method and system applied to automobile
CN104282020A (en) Vehicle speed detection method based on target motion track
CN107944354B (en) Vehicle detection method based on deep learning
Choi et al. Crosswalk and traffic light detection via integral framework
CN110163039B (en) Method, apparatus, storage medium, and processor for determining vehicle driving state
CN106128121A (en) Vehicle queue length fast algorithm of detecting based on Local Features Analysis
CN112699267B (en) Vehicle type recognition method
CN111178193A (en) Lane line detection method, lane line detection device and computer-readable storage medium
CN112132131A (en) Measuring cylinder liquid level identification method and device
CN114998317A (en) Lens occlusion detection method and device, camera device and storage medium
Wang et al. An improved hough transform method for detecting forward vehicle and lane in road
CN113221739B (en) Monocular vision-based vehicle distance measuring method
WO2017161544A1 (en) Single-camera video sequence matching based vehicle speed measurement method and system
Gu et al. Vision-based multi-scaled vehicle detection and distance relevant mix tracking for driver assistance system
JP5189556B2 (en) Lane detection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant