CN116630917A - Lane line detection method - Google Patents

Lane line detection method

Info

Publication number
CN116630917A
CN116630917A (application CN202310504571.8A)
Authority
CN
China
Prior art keywords
lane line
feature map
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310504571.8A
Other languages
Chinese (zh)
Inventor
张锋威
尚磊
陈松乐
孙红波
黄茹玥
吴雨欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310504571.8A priority Critical patent/CN116630917A/en
Publication of CN116630917A publication Critical patent/CN116630917A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 - Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Abstract

The application discloses a lane line detection method, which comprises the following steps: labeling the collected lane line detection data set with rotatable bounding boxes; obtaining feature maps through an improved Swin-Transformer algorithm; processing the feature maps with a feature fusion network to obtain fused feature maps; inputting the fused feature maps into a detection head for detection; and outputting the final prediction result using a prediction box improved based on the circular smooth label, and fitting the lane line with a Bezier curve polynomial. The application better highlights the angle information and feature information of the target, so the position and shape of the lane line are framed more accurately; it achieves a higher detection speed and recognition accuracy, better helps the vehicle perceive road track information, and has good practical feasibility.

Description

Lane line detection method
Technical Field
The application relates to the technical field of automatic driving, in particular to a lane line detection method.
Background
With the rapid development of artificial intelligence technology, intelligent transportation has advanced rapidly. Intelligent driving technology, as a branch of the intelligent transportation field, integrates multiple technologies such as computing, sensing and artificial intelligence to realize intelligent information exchange among people, vehicles and roads, giving the vehicle environment-sensing capability and providing the driver with a safer and more reliable driving strategy, thereby further improving the safety and comfort of driving.
Lane line detection is a key link in intelligent driving technology. It rapidly and effectively detects lane lines in road-condition images and plays a very important role in driving path planning, lane departure warning and traffic accident avoidance.
The traditional target detection pipeline mainly comprises three steps: 1) region selection; 2) feature extraction; 3) classification. The entire image is first traversed with an exhaustive sliding window to locate regions where the target may appear. The exhaustive method has high time complexity, generates a large number of redundant windows, is inflexible in the scale of the sliding window, and cannot detect targets with large size variation well. For the feature extraction part, traditional target detection algorithms express objects with manually designed features, such as the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG) and speeded-up robust features (SURF). However, in real scenes the target background is complex, poses vary, and illumination, occlusion and viewing angle all have an influence, so it is difficult to design hand-crafted features that are universal and suitable for various complex backgrounds. After the features are extracted, a classifier determines whether the region contains a target; common classification algorithms are SVM, AdaBoost and the like. Such techniques work well in certain environments, but their performance degrades when the driving and surrounding environment changes, so they are not suitable for practical road scene applications. The development of sensor technology, of hardware capable of large-capacity high-speed processing, and of various deep learning algorithms, such as computer vision methods based on convolutional neural networks (CNNs) and the YOLO algorithm, has significantly improved lane detection and recognition performance.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The present application has been made in view of the above-described problems occurring in the prior art.
Therefore, the lane line detection method provided by the application solves the problems of low detection speed and low recognition accuracy in the prior art.
In order to solve the technical problems, the application provides the following technical scheme:
labeling the collected lane line detection data set with rotatable bounding boxes;
obtaining feature maps through an improved Swin-Transformer algorithm;
processing the feature maps with a feature fusion network to obtain fused feature maps;
inputting the fused feature maps into a detection head for detection;
and outputting the final prediction result using a prediction box improved based on the circular smooth label, and fitting the lane line with a Bezier curve polynomial.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the target lane lines in the images of the collected lane line detection data set are labeled with rotatable bounding boxes in a rotation labeling tool, and corresponding files are generated; the information labeled by the rotatable bounding box comprises the coordinates xywh of the rectangular bounding box of the target lane line and the included angle between the side of the bounding box and the horizontal line.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the improved Swin-Transformer algorithm obtains the feature map by adding global attention to the Swin-Transformer Block module; a global attention operation is added after the window attention operation, that is, after a feature sequence x is obtained through the window attention operation, x is converted by three weight matrices W_q, W_k and W_v into a Query vector, a Key vector and a Value vector respectively; the dot product of the Query and Key vectors gives a weight matrix, and multiplying the weight matrix by the Value vector gives the multi-head attention output vector x, expressed as:
x = Softmax(Q·Kᵀ/√d_k)·V
where d_k is the length of the global attention head and Softmax is the normalized exponential function.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the improved Swin-Transformer algorithm obtains the feature maps through the following steps:
an input lane line color image X = [H, W, 3], where H is the image height, W is the image width and 3 is the number of image channels, is split into non-overlapping patches of equal size by a Patch Splitting module, and each patch is flattened into a token vector;
the token vectors are input into a linear embedding layer for processing, and the dimension is projected to an arbitrary dimension C, a typical value of C being 96;
the token vectors processed by the linear embedding layer are fed into several Swin-Transformer Blocks with improved self-attention for computation;
the improved Swin-Transformer Block module applies a shifted-window-based multi-head self-attention module and the global attention operation, and the linear embedding layer together with the improved Swin-Transformer Block module serves as the first processing stage;
the feature vectors of the image obtained after the first processing stage are input into a Patch Merging module, which achieves an effect similar to a downsampling operation, and the feature map out1 is obtained after processing by an improved Swin-Transformer Block module;
the Patch Merging and improved Swin-Transformer Block modules serve as the second and subsequent processing stages to process the image features, yielding the feature map out1, the feature map out2 and the feature map out3, where the size of out1 is [H/8, W/8, 2C], the size of out2 is [H/16, W/16, 4C] and the size of out3 is [H/32, W/32, 8C], thereby extracting multi-scale feature maps.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the feature maps are processed by the feature fusion network to obtain fused feature maps, where the feature fusion network is the neck network of YOLOv5; the obtained feature map out3 is convolved by a CBS module to obtain the high-level feature map f3;
the high-level feature map f3 is up-sampled and spliced with the feature map out2, and the result is processed by a C3II_1 module and convolved by a CBS module to obtain the middle-level feature map f2;
the middle-level feature map f2 is up-sampled and spliced with the feature map out1, and the result is processed by a C3II_1 module to obtain the bottom-level feature map f1, which is output to the detection head;
the bottom-level feature map f1 is processed by a CBS module and spliced with the middle-level feature map f2, and the spliced feature map is processed by a C3II_1 module to obtain a new middle-level feature map f2', which is output to the detection head;
the new middle-level feature map f2' is convolved by a CBS module and spliced with the high-level feature map f3, and the spliced feature map is processed by a C3II_1 module to obtain a new high-level feature map f3', which is output to the detection head.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the CBS module consists of a Conv function, a BN function and a SiLU function, where the Conv function performs the convolution operation on the feature maps, the BN function performs batch normalization on the data, and the SiLU function serves as the final activation function.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the C3 module consists of 3 CBS modules and several Bottleneck modules and learns residual features; the structure comprises two branches, where the first branch uses the specified number of stacked Bottleneck modules together with the standard CBS modules, the second branch passes through only one CBS module, and finally the two branches are spliced.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the fused feature maps are input into the detection head for detection, and a feature pyramid network is combined with a path aggregation network; the feature pyramid network works from top to bottom, transferring strong high-level semantic features and enhancing the semantic information of the whole pyramid, but it does not transfer localization information; the path aggregation network adds a bottom-up pyramid after the feature pyramid network to transfer strong low-level localization features; the multi-scale fused feature maps in the path aggregation network are used directly as the input to the detection head for detection.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the improved prediction box is optimized based on the circular smooth label, the angle is predicted by classification rather than regression, and a suitable window function is selected to avoid the angle periodicity problem; the circular smooth label is expressed as:
CSL(x) = g(x), if θ − r < x < θ + r; otherwise CSL(x) = 0,
where g(x) represents the window function, r is the radius of the window function, and θ represents the angle of the current bounding box.
As a preferable embodiment of the lane line detection method according to the present application, wherein: the lane line information represented by the prediction box is fitted with a Bezier curve polynomial, where the general formula of an order-n Bezier curve is:
B(t) = Σ_{i=0}^{n} C(n, i)·P_i·(1 − t)^(n−i)·t^i, 0 ≤ t ≤ 1,
where P_i uniformly denotes the abscissa or the ordinate of a given point, n represents the order of the Bezier curve polynomial, and i represents the i-th term of the polynomial.
The application has the beneficial effects that: the application uses a rotatable detection box instead of the traditional horizontal detection box to extract the image features of the lane lines, which better highlights the angle information of the target, so the position and shape of the lane line are framed more accurately. The improved Swin-Transformer algorithm extracts image features with high performance, and the multi-scale feature maps are fused through the YOLOv5 neck network; compared with other existing methods, the application achieves a higher detection speed and recognition accuracy, better helps the vehicle perceive road track information, and has good practical feasibility.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a basic flow diagram of a lane line detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the improved Swin-Transformer Block module structure of a lane line detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the network model of a lane line detection method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the C3 module of a lane line detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the prediction information of the improved prediction box of a lane line detection method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the circular smooth label of a lane line detection method according to an embodiment of the present application.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1-6, for one embodiment of the present application, a lane line detection method is provided, as shown in fig. 1, including the steps of:
S1: labeling the collected lane line detection data set with rotatable bounding boxes;
Further, the rotatable bounding boxes of the roLabelImg rotation labeling tool are used on the images of the collected lane line detection data set to label the target lane lines in the images and generate corresponding files; the information labeled by the rotatable bounding box comprises the coordinates xywh of the rectangular bounding box of the target lane line and the included angle between the side of the bounding box and the horizontal line; compared with the traditional horizontal bounding box, the rotatable bounding box better highlights the angle information of the target;
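As a simple illustration of the information carried by such a rotated label (the function and the numeric values below are hypothetical, not the labeling tool's file format), a box given as center coordinates, width, height and angle can be expanded into its four corner points:

```python
import math

def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Return the four corner points of a rotated bounding box.

    (cx, cy) is the box center, (w, h) its width and height, and theta_deg
    the included angle between the box side and the horizontal line.
    """
    theta = math.radians(theta_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Half-extents along the box's own axes.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta and translate to the center.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]

# Example label: a lane line segment centered at (320, 400),
# 12 px wide, 180 px long, tilted 75 degrees from the horizontal.
print(rotated_box_corners(320, 400, 12, 180, 75))
```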
S2: obtaining feature maps through the improved Swin-Transformer algorithm;
Further, the data set for which the labeling work has been completed is divided into a training set, a validation set and a test set. A preprocessing operation is performed on the images so that the model can process the data set conveniently, and the images in the data set are resized to a multiple of 32 to accommodate the convolution and pooling operations. In this way the feature maps still keep enough resolution and information after repeated convolution and pooling, preventing information loss and degradation of model performance;
Further, as shown in FIG. 2, the improved Swin-Transformer algorithm adds Global Attention to the Swin-Transformer Block module. Since the attention computation in the original Swin-Transformer Block module is restricted to each window, information exchange is insufficient and part of the information is lost, so a global attention operation (Global Attention) is added after the window attention operation is computed; that is, after the feature sequence x is obtained through the window attention operation, x is converted by three weight matrices W_q, W_k and W_v into a Query vector, a Key vector and a Value vector respectively, the dot product of the Query and Key vectors gives a weight matrix, and multiplying the weight matrix by the Value vector gives the multi-head attention output vector x, computed as:
x = Softmax(Q·Kᵀ/√d_k)·V
where d_k is the length of the global attention head and Softmax is the normalized exponential function; by adding the global attention operation, the global information is fully exchanged;
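A minimal PyTorch sketch of the global attention step described above is given below; the head count, the projection layout and the final projection layer are illustrative assumptions rather than the exact configuration of the application:

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Sketch of global multi-head attention applied to the whole token sequence."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.w_q = nn.Linear(dim, dim)   # W_q
        self.w_k = nn.Linear(dim, dim)   # W_k
        self.w_v = nn.Linear(dim, dim)   # W_v
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                # x: [B, N, dim], output of the window attention
        b, n, dim = x.shape
        def split(t):                    # [B, N, dim] -> [B, heads, N, head_dim]
            return t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        # Softmax(Q K^T / sqrt(d_k)) V, computed over the whole sequence.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, dim)
        return self.proj(out)
```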
Further, as shown in FIG. 3, which gives the overall structure of the network model, the process of extracting image features with the improved Swin-Transformer algorithm is as follows:
A1: an input lane line color image X = [H, W, 3], where H is the image height, W is the image width and 3 is the number of image channels, is split into non-overlapping patches of equal size by the Patch Splitting module, and each patch is then flattened into a token vector;
A2: the token vectors are input into the linear embedding layer (Linear Embedding) for processing, and the dimension is projected to an arbitrary dimension C, a typical value of C being 96;
A3: the processed tokens are fed into several Swin-Transformer Blocks with improved self-attention for the related operations;
A4: the improved Swin-Transformer Block module mainly applies a shifted-window-based multi-head self-attention module (W-MSA/SW-MSA) and the global attention operation (Global Attention), so that the model captures long-range dependencies more easily and the global information is fully exchanged; the linear embedding layer and the improved Swin-Transformer Block module serve as the first processing stage;
A5: the feature vectors of the image obtained after the first processing stage are input into the Patch Merging module, which achieves an effect similar to a downsampling operation, and the feature map out1 is obtained after processing by an improved Swin-Transformer Block module;
A6: the Patch Merging and improved Swin-Transformer Block modules then serve as the second and subsequent processing stages to process the image features, yielding the feature map out1 (of size [H/8, W/8, 2C]), the feature map out2 (of size [H/16, W/16, 4C]) and the feature map out3 (of size [H/32, W/32, 8C]); multi-scale feature maps can thus be extracted, which helps the model capture targets and detail information at different scales and improves the performance and robustness of the model;
For example, the images in the lane line detection data set are resized from a width × height × channel size of 1280 × 720 × 3 to 640 × 640 × 3, where width and height are in pixels. The preprocessed training set pictures are input into the improved Swin-Transformer structure, and three feature maps are output after processing, namely out1 (of size 80 × 80 × 192), out2 (of size 40 × 40 × 384) and out3 (of size 20 × 20 × 768).
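The following short sketch only traces the expected tensor shapes for this example, assuming the 640 × 640 × 3 input above, an embedding dimension C = 96, and a patch size of 4 (the patch size is an assumption):

```python
# Shape bookkeeping for the backbone outputs of this example.
H, W, C, patch = 640, 640, 96, 4

tokens = (H // patch) * (W // patch)  # 160 * 160 = 25600 tokens after patch splitting
print("tokens after patch splitting:", tokens)

for name, stride, channels in [("out1", 8, 2 * C), ("out2", 16, 4 * C), ("out3", 32, 8 * C)]:
    print(name, (H // stride, W // stride, channels))
# out1 (80, 80, 192), out2 (40, 40, 384), out3 (20, 20, 768)
```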
S3: processing the feature map based on a feature fusion network to obtain a fused feature map;
Further, the feature fusion network is the neck network of YOLOv5, and the feature map out3 is convolved by a CBS module to obtain the high-level feature map f3;
the high-level feature map f3 is up-sampled and then spliced with the feature map out2; the resulting feature map is processed by a C3II_1 module and convolved by a CBS module to obtain the middle-level feature map f2;
the middle-level feature map f2 is up-sampled and then spliced with the feature map out1; after processing by a C3II_1 module, the bottom-level feature map f1 is obtained and output to the detection head;
then the bottom-level feature map f1 is processed by a CBS module and spliced with the middle-level feature map f2; the spliced feature map is processed by a C3II_1 module to obtain a new middle-level feature map f2', which is output to the detection head;
finally, the new middle-level feature map f2' is convolved by a CBS module and then spliced with the high-level feature map f3; the spliced feature map is processed by a C3II_1 module to obtain a new high-level feature map f3', which is output to the detection head;
Further, the CBS module consists of a Conv function, a BN function and a SiLU function, where the Conv function performs the convolution operation on the feature maps, the BN function performs batch normalization on the data, and the SiLU function serves as the final activation function;
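A minimal PyTorch sketch of such a CBS block is shown below; the kernel size and stride defaults are illustrative assumptions:

```python
import torch.nn as nn

class CBS(nn.Module):
    """CBS block: Conv -> BatchNorm -> SiLU."""
    def __init__(self, in_ch, out_ch, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```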
Further, as shown in the schematic diagram of the C3 module in FIG. 4, the C3 module consists of 3 CBS modules and several Bottleneck modules; the C3 module mainly learns residual features, and its structure comprises two branches: the first branch uses the specified number of stacked Bottleneck modules together with the standard CBS modules, the second branch passes through only one CBS module, and finally the two branches are spliced, as sketched below;
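Reusing the CBS class sketched above, the two-branch structure can be written as follows; the hidden-channel split and the residual shortcut inside the Bottleneck are assumptions based on the description, not the application's exact layout:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual bottleneck used inside C3 (shortcut assumed to be kept)."""
    def __init__(self, ch):
        super().__init__()
        self.cv1 = CBS(ch, ch, k=1)
        self.cv2 = CBS(ch, ch, k=3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C3(nn.Module):
    """Two branches: one with n stacked Bottlenecks, one with a single CBS;
    the branches are concatenated (spliced) and fused by a final CBS."""
    def __init__(self, in_ch, out_ch, n=1):
        super().__init__()
        hidden = out_ch // 2
        self.branch1 = nn.Sequential(CBS(in_ch, hidden, k=1),
                                     *[Bottleneck(hidden) for _ in range(n)])
        self.branch2 = CBS(in_ch, hidden, k=1)
        self.fuse = CBS(2 * hidden, out_ch, k=1)

    def forward(self, x):
        return self.fuse(torch.cat([self.branch1(x), self.branch2(x)], dim=1))
```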
For example: the feature map of size 20 × 20 × 768 output by the improved Swin-Transformer structure is convolved by a CBS module for dimension reduction to obtain the high-level feature map f3 (of size 20 × 20 × 512);
the high-level feature map f3 (of size 20 × 20 × 512) is up-sampled (to 40 × 40 × 512) and spliced with the feature map out2 obtained from the backbone network; the resulting feature map is processed by a C3II_1 module and then reduced in dimension by a CBS module convolution to obtain the middle-level feature map f2 (of size 40 × 40 × 256);
the middle-level feature map f2 (of size 40 × 40 × 256) is up-sampled (to 80 × 80 × 256) and spliced with the feature map out1 obtained from the backbone network; after processing by a C3II_1 module, the bottom-level feature map f1 (of size 80 × 80 × 256) is obtained and output to the detection head.
The bottom-level feature map f1 (of size 80 × 80 × 256) is down-sampled (to 40 × 40 × 256) and then spliced with the middle-level feature map f2 (of size 40 × 40 × 256); the spliced feature map is processed by a C3II_1 module to obtain a new middle-level feature map f2' (of size 40 × 40 × 512), which is output to the detection head;
the new middle-level feature map f2' (of size 40 × 40 × 512) is down-sampled (to 20 × 20 × 512) and then spliced with the high-level feature map f3 (of size 20 × 20 × 512); the spliced feature map is processed by a C3II_1 module to obtain a new high-level feature map f3' (of size 20 × 20 × 1024), which is output to the detection head;
S4: inputting the fused feature maps into the detection head for detection;
Further, a feature pyramid network (FPN) and a path aggregation network (PANet) are combined in the model. The feature pyramid network works from top to bottom, transferring strong high-level semantic features and enhancing the semantic information of the whole pyramid, but it does not transfer localization information; the path aggregation network adds a bottom-up pyramid after the feature pyramid network as a supplement, transferring strong low-level localization features, and combining the two yields a better feature fusion effect. The multi-scale fused feature maps in the path aggregation network are then directly input into the detection head for detection.
S5: outputting the final prediction result using the prediction box improved based on the circular smooth label, and fitting the lane line with a Bezier curve polynomial.
As shown in FIG. 5, compared with the original horizontal detection box, which only displays the coordinates of the rectangular bounding box containing the object, the prediction information of the improved prediction box highlights bθ_ij, the important angle information of the target, so that the improved prediction box better represents the target information and more accurately frames the shape and position of the target.
Furthermore, if the angle is predicted by regression, the boundary discontinuity problem arises, which causes a sudden increase in the loss value computed by the model at the boundary and greatly affects the detection result.
As shown in the schematic diagram of the circular smooth label in FIG. 6, the improved prediction box processes the predicted angle and the true angle of the target with a window function; since the circular smooth label formed by the window-function processing avoids abrupt changes in the error between the angle information predicted by the model and the true angle information, this processing makes the model training process more stable.
The improved prediction box is optimized based on the circular smooth label (CSL), the angle is predicted by classification rather than regression, and a suitable window function is selected to avoid the angle periodicity problem; the circular smooth label (CSL) is calculated as:
CSL(x) = g(x), if θ − r < x < θ + r; otherwise CSL(x) = 0,
where g(x) represents the window function, r is the radius of the window function, and θ represents the angle of the current bounding box. A Gaussian function may be selected as the window function, where the Gaussian function formula is:
f(x) = a·exp(−(x − b)²/(2c²))
The one-dimensional graph of the Gaussian function has a symmetric "bell curve" shape, where a is the height of the curve peak, b is the coordinate of the peak center, and c, called the standard deviation, characterizes the width of the bell; the label value is continuous at the boundary, and applying the Gaussian function as the window function achieves the classification of angles and yields a good prediction result;
furthermore, the lane line information represented by the prediction frame is fitted to the lane line by adopting a Bezier curve polynomial, wherein the Bezier curve of the n-order has the following general formula:
p in the formula i Synchronization is expressed as the abscissa or ordinate of a given point, n represents the order of the Bezier curve polynomial, and i represents the ith term of the polynomial. And taking the second-order Bezier curve as a fitting function, calculating coefficients of the Bezier curve according to the data points of the lane line and the order of the Bezier curve, and drawing the lane line curve by using the calculated coefficients of the Bezier curve.
Example 2
Referring to table 1, for one embodiment of the present application, a lane line detection method is provided, and in order to verify the beneficial effects, the comparison results of the present scheme and the other two schemes on the Caltech Lanes data set are provided.
Scheme one: the lane line data set labeled with horizontal bounding boxes is input into the YOLO algorithm without any improvement, trained, and a model is output;
Scheme two: the original images in the lane line detection data set are first converted into bird's-eye views. This can be achieved by an inverse perspective transformation, with the general procedure being: first calibrate the road region in the image to determine the region of interest (ROI), then calculate the perspective transformation matrix from the coordinate data of the calibrated ROI; after the perspective transformation matrix is obtained, the transformation from the original images in the data set to bird's-eye views is completed.
After the bird's-eye views are obtained, the lane line targets in them are labeled with horizontal bounding boxes and then input into the YOLO algorithm for training, and a model is output.
The present scheme: a rotatable detection box is used instead of the traditional horizontal detection box to extract the image features of the lane lines, which better highlights the angle information of the target and more accurately frames the position and shape of the lane line. The improved Swin-Transformer algorithm extracts image features with high performance, the multi-scale feature maps are fused through the YOLOv5 neck network, and finally the multi-scale fused feature maps are input into the detection head for detection.
Table 1 Comparison table

                           Scheme one                      Scheme two                      The present scheme
Recognition accuracy       84.4%                           88.44%                          90.16%
Angle information          Not shown                       Not shown                       Shown
Output prediction box      Cannot characterize lane shape  Cannot characterize lane shape  Can characterize lane shape
As can be seen from Table 1, the processing and division of lane line detection in the present scheme are more detailed: the collected lane line detection data set is labeled with rotatable bounding boxes; feature maps are obtained through the improved Swin-Transformer algorithm; the feature maps are processed by the YOLOv5 neck network to obtain fused feature maps; the fused feature maps are input into the detection head for detection; the final prediction result is output using the prediction box improved based on the circular smooth label, and the lane line is fitted with a Bezier curve polynomial. The recognition accuracy is therefore higher, the angle information is clearer, the output prediction box can characterize the shape of the lane line, and the vehicle is better helped to perceive road track information.
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims (10)

1. A lane line detection method, characterized by comprising:
labeling the collected lane line detection data set with rotatable bounding boxes;
obtaining feature maps through an improved Swin-Transformer algorithm;
processing the feature maps with a feature fusion network to obtain fused feature maps;
inputting the fused feature maps into a detection head for detection;
and outputting the final prediction result using a prediction box improved based on the circular smooth label, and fitting the lane line with a Bezier curve polynomial.
2. The lane line detection method according to claim 1, wherein: the target lane lines in the images of the collected lane line detection data set are labeled with rotatable bounding boxes in a rotation labeling tool, and corresponding files are generated; the information labeled by the rotatable bounding box comprises the coordinates xywh of the rectangular bounding box of the target lane line and the included angle between the side of the bounding box and the horizontal line.
3. The lane line detection method according to claim 2, wherein: the improved Swin-Transformer algorithm obtains the feature map by adding global attention to the Swin-Transformer Block module; a global attention operation is added after the window attention operation, that is, after a feature sequence x is obtained through the window attention operation, x is converted by three weight matrices W_q, W_k and W_v into a Query vector, a Key vector and a Value vector respectively; the dot product of the Query and Key vectors gives a weight matrix, and multiplying the weight matrix by the Value vector gives the multi-head attention output vector x, expressed as:
x = Softmax(Q·Kᵀ/√d_k)·V
where d_k is the length of the global attention head and Softmax is the normalized exponential function.
4. The lane line detection method according to claim 3, wherein: the improved Swin-Transformer algorithm obtains the feature maps through the following steps:
an input lane line color image X = [H, W, 3], where H is the image height, W is the image width and 3 is the number of image channels, is split into non-overlapping patches of equal size by a Patch Splitting module, and each patch is flattened into a token vector;
the token vectors are input into a linear embedding layer for processing, and the dimension is projected to an arbitrary dimension C, a typical value of C being 96;
the token vectors processed by the linear embedding layer are fed into several Swin-Transformer Blocks with improved self-attention for computation;
the improved Swin-Transformer Block module applies a shifted-window-based multi-head self-attention module and the global attention operation, and the linear embedding layer together with the improved Swin-Transformer Block module serves as the first processing stage;
the feature vectors of the image obtained after the first processing stage are input into a Patch Merging module, which achieves an effect similar to a downsampling operation, and the feature map out1 is obtained after processing by an improved Swin-Transformer Block module;
the Patch Merging and improved Swin-Transformer Block modules serve as the second and subsequent processing stages to process the image features, yielding the feature map out1, the feature map out2 and the feature map out3, where the size of out1 is [H/8, W/8, 2C], the size of out2 is [H/16, W/16, 4C] and the size of out3 is [H/32, W/32, 8C], thereby extracting multi-scale feature maps.
5. The lane line detection method as claimed in claim 4, wherein: the feature maps are processed by the feature fusion network to obtain fused feature maps, where the feature fusion network is the neck network of YOLOv5; the obtained feature map out3 is convolved by a CBS module to obtain the high-level feature map f3;
the high-level feature map f3 is up-sampled and spliced with the feature map out2, and the result is processed by a C3II_1 module and convolved by a CBS module to obtain the middle-level feature map f2;
the middle-level feature map f2 is up-sampled and spliced with the feature map out1, and the result is processed by a C3II_1 module to obtain the bottom-level feature map f1, which is output to the detection head;
the bottom-level feature map f1 is processed by a CBS module and spliced with the middle-level feature map f2, and the spliced feature map is processed by a C3II_1 module to obtain a new middle-level feature map f2', which is output to the detection head;
the new middle-level feature map f2' is convolved by a CBS module and spliced with the high-level feature map f3, and the spliced feature map is processed by a C3II_1 module to obtain a new high-level feature map f3', which is output to the detection head.
6. The lane line detection method according to claim 5, wherein: the CBS module consists of a Conv function, a BN function and a SiLU function, where the Conv function performs the convolution operation on the feature maps, the BN function performs batch normalization on the data, and the SiLU function serves as the final activation function.
7. The lane line detection method according to any one of claims 4 to 6, wherein: the C3 module consists of 3 CBS modules and several Bottleneck modules and learns residual features; the structure comprises two branches, where the first branch uses the specified number of stacked Bottleneck modules together with the standard CBS modules, the second branch passes through only one CBS module, and finally the two branches are spliced.
8. The lane line detection method according to claim 7, wherein: the fused feature maps are input into the detection head for detection, and a feature pyramid network is combined with a path aggregation network; the path aggregation network adds a bottom-up pyramid after the feature pyramid network to transfer strong low-level localization features, and the multi-scale fused feature maps in the path aggregation network are input into the detection head for detection.
9. The lane line detection method as claimed in claim 8, wherein: the improved prediction box is optimized based on the circular smooth label, the angle is predicted by classification rather than regression, and a suitable window function is selected to avoid the angle periodicity problem; the circular smooth label is expressed as:
CSL(x) = g(x), if θ − r < x < θ + r; otherwise CSL(x) = 0,
where g(x) represents the window function, r is the radius of the window function, and θ represents the angle of the current bounding box.
10. The lane line detection method according to claim 8 or 9, wherein: the lane line information represented by the prediction box is fitted with a Bezier curve polynomial, where the general formula of an order-n Bezier curve is:
B(t) = Σ_{i=0}^{n} C(n, i)·P_i·(1 − t)^(n−i)·t^i, 0 ≤ t ≤ 1,
where P_i uniformly denotes the abscissa or the ordinate of a given point, n represents the order of the Bezier curve polynomial, and i represents the i-th term of the polynomial.
CN202310504571.8A 2023-05-06 2023-05-06 Lane line detection method Pending CN116630917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310504571.8A CN116630917A (en) 2023-05-06 2023-05-06 Lane line detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310504571.8A CN116630917A (en) 2023-05-06 2023-05-06 Lane line detection method

Publications (1)

Publication Number Publication Date
CN116630917A true CN116630917A (en) 2023-08-22

Family

ID=87620427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310504571.8A Pending CN116630917A (en) 2023-05-06 2023-05-06 Lane line detection method

Country Status (1)

Country Link
CN (1) CN116630917A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710755A (en) * 2024-02-04 2024-03-15 江苏未来网络集团有限公司 Vehicle attribute identification system and method based on deep learning

Similar Documents

Publication Publication Date Title
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
US20240037926A1 (en) Segmenting objects by refining shape priors
CN108830171B (en) Intelligent logistics warehouse guide line visual detection method based on deep learning
CN111612807A (en) Small target image segmentation method based on scale and edge information
CN111008632B (en) License plate character segmentation method based on deep learning
CN110781744A (en) Small-scale pedestrian detection method based on multi-level feature fusion
CN111914838A (en) License plate recognition method based on text line recognition
CN114155527A (en) Scene text recognition method and device
CN113095152B (en) Regression-based lane line detection method and system
CN112132013B (en) Vehicle key point detection method
CN116052026B (en) Unmanned aerial vehicle aerial image target detection method, system and storage medium
CN113903028A (en) Target detection method and electronic equipment
CN114120289A (en) Method and system for identifying driving area and lane line
CN116630917A (en) Lane line detection method
CN115238758A (en) Multi-task three-dimensional target detection method based on point cloud feature enhancement
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
CN112949500A (en) Improved YOLOv3 lane line detection method based on spatial feature coding
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
CN117037119A (en) Road target detection method and system based on improved YOLOv8
CN113408413B (en) Emergency lane identification method, system and device
CN116052149A (en) CS-ABCNet-based electric power tower plate detection and identification method
CN115294548A (en) Lane line detection method based on position selection and classification method in row direction
CN114332814A (en) Parking frame identification method and device, electronic equipment and storage medium
CN114882449B (en) Car-Det network model-based vehicle detection method and device
CN117392392B (en) Rubber cutting line identification and generation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination