CN115497069A - Lane line detection and classification method and system based on bidirectional separation attention - Google Patents

Lane line detection and classification method and system based on bidirectional separation attention

Info

Publication number
CN115497069A
CN115497069A
Authority
CN
China
Prior art keywords
lane line
module
bidirectional
attention
lightweight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211066541.5A
Other languages
Chinese (zh)
Inventor
孔斌
张露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN202211066541.5A priority Critical patent/CN115497069A/en
Publication of CN115497069A publication Critical patent/CN115497069A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention provides a lane line detection and classification method and system based on bidirectional separation attention. The method comprises the following steps: constructing a lightweight lane line detection and classification network LNet, which comprises symmetric units and dense symmetric blocks, the dense symmetric blocks densely connecting at least 2 symmetric units; and constructing a bidirectional separation attention mechanism to build dependencies between long-distance pixels and acquire global position information and local detail information. The invention solves the technical problems of a large parameter count and poor segmentation performance.

Description

Lane line detection and classification method and system based on bidirectional separation attention
Technical Field
The invention relates to the field of computer vision, in particular to a lane line detection and classification method and system based on bidirectional separation attention.
Background
Image segmentation is a pixel-level multi-classification problem over deep features: by assigning each pixel a class label of the corresponding attribute, it groups pixels in an image that share specific semantics and presents distinct class regions consistent with visual judgment. The invention therefore applies this segmentation idea to the lane line detection and classification task. A vision-based segmentation method can provide the environment perception of an autonomous vehicle with information such as lane line positions and types in a road scene, and plays a very important role in the safety of the autonomous vehicle while driving.
In early image segmentation tasks, features such as color, gradient, and geometric shape were usually selected to segment images. Segmentation based on thresholds, edges, clustering, graph theory, and regions are all common segmentation algorithms. In recent years, CNNs have been applied to many image segmentation fields owing to their strong feature extraction and semantic understanding capabilities. Although semantic segmentation networks such as FCN, UNet, SegNet, and the DeepLab series achieve good segmentation performance, their parameter counts are large. Subsequently, to address the large number of network parameters, networks such as ENet, ERFNet, and the BiSeNet series achieve a balance between segmentation performance and parameter count by constructing lightweight modules. However, since objects in an image are correlated, attending only to individual pixels while ignoring the dependencies between pixels reduces the segmentation performance of the network. The prior patent document CN110688971A, "method, device and equipment for detecting a dashed lane line", discloses: performing feature extraction on a road image to be detected to obtain a feature map of the road image; determining a lane line region in the road image and endpoint pixel points in the road image according to the feature map, the endpoint pixel points being pixels of possible endpoints of dashed lane lines in the road image; and determining a dashed lane line in the road image based on the lane line region and the endpoint pixel points. As detailed in that prior art, an FCN serves as the feature extraction network: a high-dimensional feature map of a road sample image is obtained after multiple convolutions and down-samplings, and the high-dimensional feature map conv1 is then deconvolved and up-sampled to obtain the image feature us_conv1, which is input to the region prediction network and the endpoint prediction network. Thus, for the down-sampling and up-sampling process, no feasible scheme is further disclosed for problems such as image distortion and loss of feature precision in the conventional scheme, so the precision of lane line feature extraction cannot be guaranteed; meanwhile, that prior art does not fully consider the correlations between targets in the image, which reduces network segmentation performance. The prior patent document CN112287912A, "method and apparatus for detecting lane lines based on deep learning", discloses: acquiring a first picture to be detected; inputting the first picture into a target neural network model to obtain a target feature map, the target neural network model comprising a neural network model generated based on a multi-scale attention mechanism and a depth-separable convolution model, and the target feature map representing the probability that each pixel in the first picture is a lane line pixel; and performing image post-processing on the target feature map to obtain a target detection result.
The embodiments of that prior specification describe: performing convolution operations on the first feature map, with different convolution kernels fusing the information of lane line elements under different receptive-field scales; outputting three feature maps, specifically two second-class feature maps and one first-class feature map; computing the correlation between elements to determine the importance of each pixel position in the feature maps to global inference; and counting the value of each position element along the channel dimension with a statistical method. It follows that, although the correlation between elements in the feature map is considered, the disclosed statistical processing logic can only eliminate connected domains not belonging to the lane line and identify conventional lanes; it cannot sufficiently fuse local and global features in the feature image, so the image segmentation performance of the network model remains constrained in the specific application scenario in which lane lines are occluded. Moreover, the network model adopted in that prior art has a large parameter count, which increases the difficulty of operation.
In summary, the prior art has the technical problems of a large parameter count and poor segmentation performance.
Disclosure of Invention
The technical problem to be solved by the invention is how to overcome the large parameter count and poor segmentation performance of the prior art.
The invention adopts the following technical scheme to solve the technical problems: a lane line detection and classification method based on bidirectional separation attention comprises the following steps:
S1, constructing a lightweight lane line detection and classification network LNet, wherein the network comprises symmetric units and dense symmetric blocks, a dense symmetric block densely connecting no fewer than 2 symmetric units; step S1 comprises:
S11, processing the input image by down-sampling with an encoder of the lightweight lane line detection and classification network LNet to reduce the resolution of the input image step by step so as to obtain a low-resolution feature map;
S12, processing the last convolutional feature map of the encoder with a decoder to restore the low-resolution feature map to the original size of the input image, wherein the output features of the decoder and the encoder are fused to obtain fusion features;
s13, training a lightweight lane line detection classification network LNet by using a weighted cross entropy loss function;
s2, constructing a bidirectional separation attention mechanism, so as to construct a dependency relationship between long-distance pixels and acquire global position information and local detail information, wherein the step S2 comprises the following steps:
s21, adding the bidirectional separation attention module TSA into the lightweight lane line detection and classification network LNet so as to construct a bidirectional attention lightweight lane line detection and classification network TSANet;
and S22, integrating the target texture and target position information from horizontal pixels and vertical pixels respectively by using the bidirectional separation attention module TSA, taking the fusion features as an input feature map, and processing the input feature map with the TSA to obtain an applicable mapping relation as the dependency relation between long-distance pixels.
The present invention redesigns the encoder-decoder model based on the idea of semantic segmentation to produce a lightweight network (LNet) with fewer parameters. Second, to make the model robust in challenging environments, bidirectional separation attention (TSA) is introduced to build dependencies between long-distance pixels. Finally, the TSA is fused into the LNet, yielding more accurate lane line detection and classification results. The TSANet adopted by the invention improves lane line detection accuracy on datasets such as TuSimple, optimizes the image segmentation effect, and is suitable for real-time lane line detection and classification in intelligent driving scenarios.
Compared with the sub-units of LLNet adopted in the prior art, the symmetric unit uses one-dimensional convolutions with smaller kernels, requiring less computation and fewer parameters.
In a more specific technical scheme, in step S1, with the bidirectional separation attention module TSA placed after each densely connected block in the encoder, the bidirectional attention lightweight lane line detection and classification network TSANet extracts target features and semantic information.
By adopting dense connections, the invention obtains information at more scales, alleviating the vanishing-gradient problem in the network and preventing overfitting.
In a more specific technical solution, in step S11, the lightweight lane line detection and classification network LNet fuses the multi-scale features extracted at the encoder stage during up-sampling to progressively increase the output resolution and make up for information loss.
The present invention constructs a lightweight lane line detection and classification network (LNet) having symmetric units and dense symmetric blocks, constructs a bidirectional separation attention mechanism (TSA) to build dependencies between long-distance pixels, and adds the TSA module to the LNet to construct a bidirectional-attention lightweight lane line detection and classification network (TSANet), enhancing the network's ability to locate the target region and capture dependencies between long-distance pixels.
In a more specific embodiment, step S12 comprises:
S121, fusing the multi-scale features output by the first dense symmetric block DS1 and the seventh dense symmetric block DS7;
S122, fusing the multi-scale features output by the second dense symmetric block DS2 and the sixth dense symmetric block DS6;
and S123, fusing the multi-scale features output by the third dense symmetric block DS3 and the fifth dense symmetric block DS5.
The LNet adopted by the invention fuses the multi-scale features extracted at the encoder stage during up-sampling to progressively increase the output resolution, combining encoder-stage features with decoder-stage features that contain rich global information. In this way, partial information loss is compensated: low-level features are fused into the decoder to make up for the information lost as the network deepens, so the final result carries global information while retaining local detail information.
The invention only considers DS blocks composed of two or three symmetric units, limiting the rapid growth in channel count brought by dense connections, improving segmentation precision and ensuring the real-time performance of the network at a given network depth.
In a more specific technical solution, in step S13, the lightweight lane line detection and classification network LNet is trained with the following weighted cross entropy loss function:

L = -(1/n) Σ_{i=1}^{n} w_{y_i} log(p_i)

where n is the number of pixels, y_i is the id of the ith pixel (the id denoting a particular lane line or the background), w_{y_i} is the weight of class y_i, and p_i is the class prediction probability of the ith pixel.
In a more specific technical solution, an L2 regularization term constrains the weight parameters to an applicable region to reduce the probability of overfitting.
The invention trains the network using a weighted cross-entropy loss function, where L2 regularization constrains the weight parameters to a smaller region, reducing the probability of overfitting.
In a more specific technical solution, the bidirectional separation attention module TSA in step S22 is an independent module.
The TSA adopted in the invention is an independent module that can be placed at any position in the network, improving the applicability of the system.
In a more specific embodiment, step S22 comprises:
S221, performing maximum pooling and average pooling on the input feature map along the horizontal and vertical directions respectively, according to the following logic, to obtain a horizontal feature map H_n and a vertical feature map W_n:

H_n = MP_h(F_n) + AP_h(F_n)

W_n = MP_v(F_n) + AP_v(F_n)

where MP and AP denote maximum pooling and average pooling respectively, and the subscripts h and v denote the pooling direction;
S222, transmitting the horizontal feature map and the vertical feature map to a sharing module and processing them to obtain a horizontal shared feature and a vertical shared feature, and processing these with a Sigmoid activation function according to the following logic to obtain a horizontal activation feature H_n' and a vertical activation feature W_n':

H_n' = f(Θ(H_n))

W_n' = f(Θ(W_n))

where f is the Sigmoid activation function and Θ denotes the sharing module;
S223, multiplying the input feature map by the horizontal and vertical activation features according to the following logic to obtain the attention feature, so as to construct the dependency relationship between long-distance pixels:

A_n = F_n × H_n' × W_n'.
the TSA adopted by the invention can effectively integrate the texture and the position information of the target from the horizontal and longitudinal pixels respectively, without introducing a large amount of overhead, thereby improving the segmentation efficiency. The TSA of the present invention can feature encode image pixels in both the horizontal and vertical directions, respectively, to efficiently integrate spatial coordinate information into the generated attention features. Thus, the TSA can efficiently integrate coordinate information into the generated attention features and construct long-distance dependencies with spatial positions between pixels.
In a more specific technical solution, the sharing module in step S22 includes: 2 convolutional layers.
In a more specific technical solution, a lane line detection and classification system based on bidirectional separation attention includes:
a lightweight lane line detection and classification module, used for constructing the lightweight lane line detection and classification network LNet, wherein the network comprises symmetric units and dense symmetric blocks, a dense symmetric block densely connecting no fewer than 2 symmetric units; the lightweight lane line detection and classification module comprises:
the resolution reduction module is used for processing the input image by down-sampling of an encoder of the lightweight lane line detection classification network LNet so as to reduce the resolution of the input image step by step and obtain a low-resolution feature map;
the feature fusion module, used for restoring the low-resolution feature map to the original size of the input image by processing the last convolutional feature map of the encoder in a decoder, wherein the output features of the decoder and the encoder are fused to obtain fusion features, the feature fusion module being connected with the resolution reduction module;
the network training module is used for training the lightweight lane line detection classification network LNet by using a weighted cross entropy loss function;
the position detail information module is used for constructing a bidirectional separation attention mechanism, constructing a dependency relationship between long-distance pixels and acquiring global position information and local detail information, and is connected with the lightweight lane line detection and classification module, and the position detail information module comprises:
the bidirectional attention network construction module, used for adding the bidirectional separation attention module TSA into the lightweight lane line detection and classification network LNet so as to construct the bidirectional attention lightweight lane line detection and classification network TSANet;
and the dependency relationship acquisition module, used for integrating the target texture and target position information from horizontal pixels and vertical pixels respectively by using the bidirectional separation attention module TSA, taking the fusion features as an input feature map, and processing the input feature map with the TSA to obtain an applicable mapping relationship as the dependency relationship among long-distance pixels, the dependency relationship acquisition module being connected with the bidirectional attention network construction module.
Compared with the prior art, the invention has the following advantages: the invention redesigns the encoder-decoder model based on the idea of semantic segmentation to produce a lightweight network (LNet) with fewer parameters. Second, to make the model robust in challenging environments, bidirectional separation attention (TSA) is introduced to build dependencies between long-distance pixels. Finally, the TSA is fused into the LNet, yielding more accurate lane line detection and classification results. The TSANet adopted by the invention improves lane line detection accuracy on datasets such as TuSimple, optimizes the image segmentation effect, and is suitable for real-time lane line detection and classification in intelligent driving scenarios.
Compared with the sub-units of LLNet adopted in the prior art, the symmetric unit uses one-dimensional convolutions with smaller kernels, requiring less computation and fewer parameters.
By adopting dense connections, the invention obtains information at more scales, alleviating the vanishing-gradient problem in the network and preventing overfitting.
The present invention constructs a lightweight lane line detection and classification network (LNet) having symmetric units and dense symmetric blocks, constructs a bidirectional separation attention mechanism (TSA) to build dependencies between long-distance pixels, and adds the TSA module to the LNet to construct a bidirectional-attention lightweight lane line detection and classification network (TSANet), enhancing the network's ability to locate the target region and capture dependencies between long-distance pixels.
The LNet adopted by the invention fuses the multi-scale features extracted at the encoder stage during up-sampling to progressively increase the output resolution, combining encoder-stage features with decoder-stage features that contain rich global information. In this way, partial information loss is compensated: low-level features are fused into the decoder to make up for the information lost as the network deepens, so the final result carries global information while retaining local detail.
The invention only considers DS blocks composed of two or three symmetric units, limiting the rapid growth in channel count brought by dense connections, improving segmentation precision and ensuring the real-time performance of the network at a given network depth.
The invention trains the network with a weighted cross-entropy loss function, where L2 regularization constrains the weight parameters to a smaller region, reducing the probability of overfitting.
The TSA adopted in the invention is an independent module that can be placed at any position in the network, improving the applicability of the system.
The TSA adopted by the invention can effectively integrate the texture and position information of the target from horizontal and vertical pixels respectively, without introducing a large amount of overhead, thereby improving segmentation efficiency. The TSA feature-encodes image pixels in the horizontal and vertical directions respectively, effectively integrating spatial coordinate information into the generated attention features; thus it efficiently integrates coordinate information into the attention features and constructs long-distance dependencies from the spatial positions between pixels.
Drawings
Fig. 1 is a schematic diagram of an overall network structure of a TSANet in a lane line detection and classification method based on bidirectional separation attention in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the structure of a symmetric unit in embodiment 1 of the present invention;
FIG. 3a is a schematic diagram of a first composition structure of a dense symmetric block in embodiment 1 of the present invention;
FIG. 3b is a schematic diagram of a second composition structure of a dense symmetric block in embodiment 1 of the present invention;
FIG. 4 is a schematic structural diagram of the dimension reduction layer in embodiment 1 of the present invention;
fig. 5 is an architecture diagram of the LNet according to embodiment 1 of the present invention;
FIG. 6 is a schematic structural diagram of bidirectional separation attention in embodiment 1 of the present invention;
FIG. 7 shows visualized features under different combination patterns of dense symmetric blocks according to embodiment 2 of the present invention;
fig. 8 is a schematic diagram of the detection results of lane line detection networks on the TuSimple dataset according to embodiment 3 of the present invention;
fig. 9 is a schematic diagram of the detection results of lane line detection networks on the self-built dataset according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, the present invention constructs a lightweight lane line detection and classification network (LNet) having symmetric units and dense symmetric blocks. Secondly, in order to construct the dependency relationship between long-distance pixels, a bidirectional separation attention mechanism (TSA) is constructed. In order to enhance the ability of the network to acquire the position of a target region and the dependency relationship between long-distance pixels, a bidirectional separate attention module (TSA) is added to the LNet, and a lightweight lane line detection and classification network (TSANet) based on bidirectional attention is constructed. The TSA is a stand-alone module that can be placed anywhere on the network. In order to enhance the ability of the network to extract target features and semantic information, in the network constructed by the present invention, TSA is placed after each densely-connected block in the encoder.
1. Symmetric unit and dense symmetric block:
as shown in fig. 2, the present invention proposes a symmetric unit having a symmetric structure. Compared with sub-units of LLNet, the symmetric unit adopts one-dimensional convolution with smaller convolution kernel, only needs less calculation amount, and also reduces parameter amount. The composition of the symmetric cell is given in fig. 2, and consists of 3 × 1 and 1 × 3 convolutional layers.
As shown in figs. 3a and 3b, dense connections can acquire information at more scales, alleviating the vanishing-gradient problem in the network and preventing overfitting. The present invention therefore constructs the dense symmetric block (DS). A DS block is formed by connecting several symmetric units through dense connections to enhance information propagation between layers. A deeper network yields higher segmentation precision, but the real-time performance of the network must also be ensured, so the invention only considers DS blocks composed of two or three symmetric units to limit the rapid growth in channel count brought by dense connections; in the figures, ∪ denotes concatenation of feature maps.
As shown in fig. 4, since DS blocks adopt dense connections, the dimensionality of the feature maps grows considerably, increasing the amount of computation. The invention therefore applies a dimension reduction layer after each DS block. To avoid the information loss caused by pooling, the dimension reduction layer uses no pooling, only a 1 × 1 convolutional layer, with the reduction factor set to 0.5. Assuming the input dimension is h × w × c, the channel dimension after reduction is h × w × (c × 0.5).
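Under the same caveat, a minimal sketch of a DS block and its dimension reduction layer might look as follows, reusing the SymmetricUnit sketch above; the 1×1 projections that keep each unit at a fixed width are an assumed reading of the dense-connection description:

```python
import torch
import torch.nn as nn

# Reuses the SymmetricUnit class from the sketch above.

class DSBlock(nn.Module):
    """Dense symmetric block: 2 or 3 symmetric units, densely connected.

    Each unit sees the channel-wise concatenation (the U in figs. 3a/3b)
    of the block input and all earlier unit outputs.
    """
    def __init__(self, channels: int, num_units: int = 2):
        super().__init__()
        self.units = nn.ModuleList()
        self.squeeze = nn.ModuleList()
        for i in range(num_units):
            # Project the growing concatenation back to `channels` (an assumption).
            self.squeeze.append(nn.Conv2d(channels * (i + 1), channels, kernel_size=1))
            self.units.append(SymmetricUnit(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for proj, unit in zip(self.squeeze, self.units):
            dense_in = torch.cat(feats, dim=1)    # dense connection
            feats.append(unit(proj(dense_in)))
        return torch.cat(feats, dim=1)            # concatenated block output

class ReductionLayer(nn.Module):
    """Dimension reduction layer: a single 1x1 convolution, no pooling."""
    def __init__(self, in_channels: int, factor: float = 0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, int(in_channels * factor), kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)   # h x w x c -> h x w x (c * 0.5)

block = DSBlock(64, num_units=3)
reduce = ReductionLayer(64 * 4)                   # 4 = block input + 3 unit outputs
y = reduce(block(torch.randn(1, 64, 32, 64)))     # -> (1, 128, 32, 64)
```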
2. Architecture of LNet:
As shown in fig. 5, the LNet is constructed based on the symmetric units and dense symmetric blocks designed above.
First, the LNet encoder reduces the resolution of the input image step by step through down-sampling operations; working on low-resolution feature maps reduces the computational load of the network. The decoder then up-samples the last convolutional feature map of the encoder, restoring the low-resolution feature map to the size of the original input image. However, the feature map of the last convolutional layer in the encoder has an extremely small resolution and carries very little detail information: directly up-sampling it back to the resolution of the original input image loses much information and yields low segmentation accuracy.
To make up for this partial loss of information, LNet progressively increases the output resolution by fusing the multi-scale features extracted at the encoder stage during up-sampling; this operation is represented by the black dashed lines in fig. 5. The features output by DS1 and DS7 are fused, the features output by DS2 and DS6 are fused, and the features output by DS3 and DS5 are fused. The invention thereby combines encoder-stage features with decoder-stage features that contain rich global information. In this way, low-level features are fused into the decoder to make up for the information lost as the network deepens, so the final result carries global information while retaining local detail information.
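For illustration, this fusion step could be sketched as below; element-wise addition after bilinear resizing is an assumption (the text only states that the features are fused, so concatenation plus a 1×1 convolution would be an equally plausible reading), and the stage names in the comments are hypothetical:

```python
import torch
import torch.nn.functional as F

def fuse(decoder_feat: torch.Tensor, encoder_feat: torch.Tensor) -> torch.Tensor:
    """Fuse a decoder stage output with its encoder counterpart (DS1-DS7 etc.)."""
    decoder_feat = F.interpolate(decoder_feat, size=encoder_feat.shape[2:],
                                 mode="bilinear", align_corners=False)
    return decoder_feat + encoder_feat   # addition assumed as the fusion operator

# Schematic use inside a hypothetical LNet.forward:
#   d5 = fuse(ds5(bottleneck), e3)   # decoder DS5 fused with encoder DS3
#   d6 = fuse(ds6(d5), e2)           # decoder DS6 fused with encoder DS2
#   d7 = fuse(ds7(d6), e1)           # decoder DS7 fused with encoder DS1
e1 = torch.randn(1, 64, 128, 256)
d7 = fuse(torch.randn(1, 64, 64, 128), e1)   # -> (1, 64, 128, 256)
```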
The network is trained with a weighted cross entropy loss function, computed as shown in equation (1), where n is the number of pixels, y_i is the id (a particular lane line or the background) of the ith pixel, w_{y_i} is the weight of class y_i, and p_i is the class prediction probability of the ith pixel. L2 regularization constrains the weight parameters to a smaller region to reduce the probability of overfitting.

L = -(1/n) Σ_{i=1}^{n} w_{y_i} log(p_i)   (1)
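In PyTorch terms, equation (1) corresponds to a class-weighted cross-entropy, with the L2 term applied through the optimizer's weight decay; the class-weight values and the stand-in model below are illustrative assumptions, not the patented configuration:

```python
import torch
import torch.nn as nn

# Illustrative class weights w_{y_i} (background + 4 lane-line classes);
# the actual values used in the patent are not disclosed.
class_weights = torch.tensor([0.4, 1.0, 1.0, 1.0, 1.0])

# Mean-reduced, class-weighted cross-entropy; matches equation (1) up to
# PyTorch's normalization by the sum of the selected class weights.
criterion = nn.CrossEntropyLoss(weight=class_weights)

# A 1x1 conv stands in for LNet so the example runs on its own.
model = nn.Conv2d(3, 5, kernel_size=1)

# L2 regularization of the weight parameters via the optimizer's weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

images = torch.randn(2, 3, 64, 128)           # a batch of road images
labels = torch.randint(0, 5, (2, 64, 128))    # per-pixel class ids y_i
loss = criterion(model(images), labels)       # scalar loss as in eq. (1)
loss.backward()
optimizer.step()
```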
3. Bidirectional separation attention
In order to obtain global position information and local detail information of the target region and establish dependencies between long-distance pixels, the invention proposes bidirectional separation attention (TSA), in which pixel-level multiplication is denoted by × and the Sigmoid function by f. The TSA can efficiently integrate texture and position information of the target from horizontal and vertical pixels respectively, without introducing a large amount of overhead.
As shown in FIG. 6, assume the feature map of the n-th layer is F_n ∈ h_n × w_n × c_n. The input of the TSA is F_n and the output is A_n ∈ h_n × w_n × c_n; the TSA finds an optimal mapping Γ:

Γ: F_n ∈ h_n × w_n × c_n → A_n ∈ h_n × w_n × c_n   (2)

First, max pooling (Max-pooling) and average pooling (Avg-pooling) operations are performed on the input feature map F_n along the horizontal and vertical directions respectively, yielding a horizontal feature map H_n ∈ h_n × 1 × c_n and a vertical feature map W_n ∈ 1 × w_n × c_n, as shown in equations (3) and (4), where MP and AP denote maximum and average pooling respectively and the subscripts h and v denote the pooling direction:

H_n = MP_h(F_n) + AP_h(F_n)   (3)

W_n = MP_v(F_n) + AP_v(F_n)   (4)
Then, H_n and W_n are fed into a shared module consisting of two convolutional layers, where K_h = [1, w] and K_w = [h, 1]. As shown in equations (5) and (6), the outputs of the shared module in the two directions are further passed through a Sigmoid activation function to generate H_n' and W_n' respectively. In addition, to reduce the number of channels in the network, the TSA adds a 1 × 1 convolutional layer with dimensionality reduction coefficient r.

H_n' = f(Θ(H_n))   (5)

W_n' = f(Θ(W_n))   (6)

where f is the Sigmoid activation function and Θ denotes the shared module.

Finally, the activation feature maps H_n' and W_n' are multiplied with the input feature map F_n to obtain the attention feature A_n, as shown in equation (7):

A_n = F_n × H_n' × W_n'   (7)
The TSA thus feature-encodes image pixels in the horizontal and vertical directions respectively, effectively integrating spatial coordinate information into the generated attention features; in this way the TSA efficiently integrates coordinate information into the attention features and constructs long-distance dependencies from the spatial positions between pixels.
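Putting equations (2)-(7) together, a self-contained sketch of the TSA module could read as follows; combining the max- and average-pooled maps by addition, and realizing the shared module Θ with 1×1 convolutions applied to the already-pooled maps (one reading of the K_h = [1, w] and K_w = [h, 1] kernels), are assumptions where the text is ambiguous:

```python
import torch
import torch.nn as nn

class TSA(nn.Module):
    """Bidirectional separation attention: a sketch of equations (2)-(7)."""
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        reduced = max(channels // r, 1)
        # Shared module Theta: two convolutions with a 1x1 reduction (coefficient r).
        self.shared = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, kernel_size=1),
        )
        self.sigmoid = nn.Sigmoid()   # the activation f

    def forward(self, f_n: torch.Tensor) -> torch.Tensor:
        # Eq. (3): pool along the horizontal direction -> H_n of shape (n, c, h, 1)
        h_n = f_n.amax(dim=3, keepdim=True) + f_n.mean(dim=3, keepdim=True)
        # Eq. (4): pool along the vertical direction -> W_n of shape (n, c, 1, w)
        w_n = f_n.amax(dim=2, keepdim=True) + f_n.mean(dim=2, keepdim=True)
        # Eqs. (5)-(6): shared module followed by the Sigmoid f
        h_act = self.sigmoid(self.shared(h_n))   # H_n'
        w_act = self.sigmoid(self.shared(w_n))   # W_n'
        # Eq. (7): A_n = F_n x H_n' x W_n' via broadcast pixel-wise products
        return f_n * h_act * w_act

tsa = TSA(channels=64, r=4)
a_n = tsa(torch.randn(1, 64, 32, 64))   # A_n keeps the input shape: (1, 64, 32, 64)
```

Because the module keeps the input shape, it can be dropped in after any DS block without changing the rest of the network, which matches the stand-alone placement described above.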
Example 2
As shown in fig. 7, an ablation experiment on dense symmetric blocks was performed; in this embodiment LNet was selected as the reference network in order to find the optimum number of symmetric units composing each dense symmetric block. To eliminate the influence of TSA, TSA and LNet were not combined. Five modes were designed, namely 2222, 2223, 2233, 2333, and 3333, as shown in fig. 7; mode 2222 indicates that the numbers of symmetric units in DS1, DS2, DS3, and DS4 are each 2. As can be seen from fig. 7, with mode 2223 the lane line features in the feature map are clearer and more detailed than those acquired by the other modes. Furthermore, mode 2223 achieves the best detection accuracy (94.43%) in table 1.
TABLE 1. Comparison of results for different combination patterns of dense symmetric blocks on the TuSimple dataset
Example 3
An ablation experiment on TSA was performed; in this embodiment a TSA is used after each DS block in the encoder. The feasibility and rationality of the design were demonstrated by analyzing the performance of adding TSA at different locations. As shown in table 2, combining each DS block in the encoder with a TSA yields the best performance, with an accuracy of up to 96.53%, demonstrating that the layout placing a TSA behind every DS block is superior to the other layouts.
TABLE 2. Comparison of results for TSA at different positions in LNet on the TuSimple dataset
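Concretely, the preferred layout above (a TSA after every DS block in the encoder) corresponds to an encoder stage like the following sketch, which reuses the hypothetical DSBlock, ReductionLayer, and TSA classes from the sketches in embodiment 1; the stage composition and the max-pooling downsampling operator are assumptions, since the patent does not specify how downsampling is implemented:

```python
import torch
import torch.nn as nn

# DSBlock, ReductionLayer, and TSA as sketched in embodiment 1.

class EncoderStage(nn.Module):
    """One assumed TSANet encoder stage: DS block -> TSA -> 1x1 reduction -> downsample."""
    def __init__(self, channels: int, num_units: int):
        super().__init__()
        out_ch = channels * (num_units + 1)   # channels after dense concatenation
        self.block = DSBlock(channels, num_units)
        self.tsa = TSA(out_ch)                # TSA placed right after the DS block
        self.reduce = ReductionLayer(out_ch)  # halve the channel count
        self.down = nn.MaxPool2d(2)           # downsampling operator is an assumption

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.reduce(self.tsa(self.block(x))))

stage = EncoderStage(channels=64, num_units=2)   # "2" as in mode 2223's early stages
y = stage(torch.randn(1, 64, 64, 128))           # -> (1, 96, 32, 64)
```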
Comparing and analyzing the performance of lane line detection networks, the data in table 3 show that TSANet is superior to the other lane detection methods. While TSANet has detection accuracy comparable to SCNN and SE+FG+BS+Push, it takes less time than both methods; SCNN takes 20 times as long as TSANet. Although LaneATT with ResNet18 as the backbone network processes a single frame in 6.06 ms, its accuracy is 0.963, lower than TSANet's; moreover, the parameter count of LaneATT (ResNet18) is 6 times that of TSANet. Both ESA and TSA can extract features along two directions of the image: adding ESA and TSA to ERFNet raised the accuracy of ERFNet-TSA and ERFNet-ESA by 0.85% and 0.92% respectively over the ERFNet baseline, so extracting bidirectional features from the image benefits the detection result. Therefore, TSANet can extract local texture and global position information simultaneously, has fewer parameters, and achieves competitive performance.
TABLE 3. Comparison and analysis of lane line detection network performance on the TuSimple dataset
Fig. 8 shows the lane line detection results in different scenes on the TuSimple dataset. As can be seen from fig. 8, TSANet is more robust to ground obstacles, occlusion, and other challenging scenes. This is because TSANet obtains local information and position information along two directions, avoiding false and missed detections to some extent, which favors better detection results.
As shown in fig. 9, the invention not only verifies the performance of TSANet on the TuSimple dataset, but also tests the robustness of TSANet on a self-built dataset. Eight scenes were selected from the self-built dataset: (a) and (b) show roads without traffic-sign interference and with clear lane lines; (c) shows water accumulation on the road surface; (d), (e), (f) and (g) show road surfaces with interference from different traffic markings; and (h) contains a zigzag lane line. The results show that TSANet achieves better detection performance by constructing dependencies between long-distance pixels.
Comparison and analysis of lane line classification network performance
As can be seen from the results in table 4, TSANet achieves an mIoU of 69.7% and an F1-score of 95.3%, with a parameter count only 0.21M higher than LNet. The outputs of Cascade-CNN, LNet, and TSANet are all semantic-category result maps, and TSANet is competitive in mIoU, F1-score, real-time performance, and parameter count.
TABLE 4. Comparison and analysis of lane line classification network performance on the TuSimple dataset
In summary, the present invention redesigns the encoder-decoder model based on the idea of semantic segmentation to produce a lightweight network (LNet) with fewer parameters. Second, to make the model robust in challenging environments, bidirectional separation attention (TSA) is introduced to build dependencies between long-distance pixels. Finally, the TSA is fused into the LNet, yielding more accurate lane line detection and classification results. The TSANet adopted by the invention improves lane line detection accuracy on datasets such as TuSimple, optimizes the image segmentation effect, and is suitable for real-time lane line detection and classification in intelligent driving scenarios.
Compared with the sub-units of LLNet adopted in the prior art, the symmetric unit uses one-dimensional convolutions with smaller kernels, requiring less computation and fewer parameters.
By adopting dense connections, the invention obtains information at more scales, alleviating the vanishing-gradient problem in the network and preventing overfitting.
The present invention constructs a lightweight lane line detection and classification network (LNet) having symmetric units and dense symmetric blocks, constructs a bidirectional separation attention mechanism (TSA) to build dependencies between long-distance pixels, and adds the TSA module to the LNet to construct a bidirectional-attention lightweight lane line detection and classification network (TSANet), enhancing the network's ability to locate the target region and capture dependencies between long-distance pixels.
The LNet adopted by the invention fuses the multi-scale features extracted at the encoder stage during up-sampling to progressively increase the output resolution, combining encoder-stage features with decoder-stage features that contain rich global information. In this way, partial information loss is compensated: low-level features are fused into the decoder to make up for the information lost as the network deepens, so the final result carries global information while retaining local detail.
The invention only considers DS blocks composed of two or three symmetric units, limiting the rapid growth in channel count brought by dense connections, improving segmentation precision and ensuring the real-time performance of the network at a given network depth.
The invention trains the network with a weighted cross-entropy loss function, where L2 regularization constrains the weight parameters to a smaller region, reducing the probability of overfitting.
The TSA adopted in the invention is an independent module that can be placed at any position in the network, improving the applicability of the system.
The TSA adopted by the invention can effectively integrate the texture and position information of the target from horizontal and vertical pixels respectively, without introducing a large amount of overhead, thereby improving segmentation efficiency. The TSA feature-encodes image pixels in the horizontal and vertical directions respectively, effectively integrating spatial coordinate information into the generated attention features; thus it efficiently integrates coordinate information into the attention features and constructs long-distance dependencies from the spatial positions between pixels. The invention solves the technical problems of a large parameter count and poor segmentation performance in the prior art.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. The lane line detection and classification method based on bidirectional separation attention is characterized by comprising the following steps:
S1, constructing a lightweight lane line detection and classification network LNet, wherein the network comprises: symmetric units and dense symmetric blocks, at least 2 symmetric units being densely connected by the dense symmetric blocks, and the step S1 comprises the following steps:
S11, processing an input image by down-sampling with an encoder of the lightweight lane line detection and classification network LNet to reduce the resolution of the input image step by step so as to obtain a low-resolution feature map;
S12, processing the last convolutional feature map of the encoder with a decoder to restore the low-resolution feature map to the original size of the input image, wherein the output features of the decoder and the encoder are fused to obtain a fusion feature;
S13, training the lightweight lane line detection and classification network LNet with a weighted cross entropy loss function;
S2, constructing a bidirectional separation attention mechanism, so as to construct a dependency relationship between long-distance pixels and acquire global position information and local detail information, wherein the step S2 comprises the following steps:
S21, adding a bidirectional separation attention module TSA into the lightweight lane line detection and classification network LNet so as to construct a bidirectional attention lightweight lane line detection and classification network TSANet;
and S22, integrating object texture and object position information from horizontal pixels and vertical pixels respectively by using the bidirectional separation attention module TSA, taking the fusion feature as an input feature map, and processing the input feature map with the TSA to obtain an applicable mapping relation as the dependency relation between the long-distance pixels.
2. The method according to claim 1, wherein in step S1, with the bidirectional separation attention module TSA placed after each densely connected block in the encoder, the bidirectional attention lightweight lane line detection and classification network TSANet uses the TSA to extract target features and semantic information.
3. The method for detecting and classifying lane lines based on bidirectional separation attention as claimed in claim 1, wherein in step S11, the lightweight lane line detection and classification network LNet fuses the multi-scale features extracted at the encoder stage during up-sampling to progressively increase the output resolution and make up for information loss.
4. The method for detecting and classifying lane lines based on bidirectional separation attention according to claim 1, wherein the step S12 comprises:
S121, fusing the multi-scale features output by the first dense symmetric block DS1 and the seventh dense symmetric block DS7;
S122, fusing the multi-scale features output by the second dense symmetric block DS2 and the sixth dense symmetric block DS6;
and S123, fusing the multi-scale features output by the third dense symmetric block DS3 and the fifth dense symmetric block DS5.
5. The method for detecting and classifying lane lines based on bidirectional separation attention of claim 1, wherein in step S13, the lightweight lane line detection and classification network LNet is trained with the following weighted cross entropy loss function:

L = -(1/n) Σ_{i=1}^{n} w_{y_i} log(p_i)

where n is the number of pixels, y_i is the id of the ith pixel (the id denoting a particular lane line or the background), w_{y_i} is the weight of class y_i, and p_i is the class prediction probability of the ith pixel.
6. The method of claim 5, wherein an L2 regularization term constrains the weight parameters to an applicable region to reduce the overfitting probability.
7. The method for detecting and classifying lane lines based on bidirectional separation attention as claimed in claim 1, wherein said bidirectional separation attention module TSA in step S22 is an independent module.
8. The method for detecting and classifying lane lines based on bidirectional separation attention according to claim 1, wherein the step S22 comprises:
S221, performing maximum pooling and average pooling operations on the input feature map along the horizontal and vertical directions respectively, according to the following logic, to obtain a horizontal feature map and a vertical feature map:

H_n = MP_h(F_n) + AP_h(F_n)

W_n = MP_v(F_n) + AP_v(F_n)

where MP and AP denote maximum pooling and average pooling respectively, and the subscripts h and v denote the pooling direction;
S222, transmitting the horizontal feature map and the vertical feature map to a sharing module, processing them to obtain a horizontal shared feature and a vertical shared feature, and processing these with a Sigmoid activation function according to the following logic to obtain a horizontal activation feature H_n' and a vertical activation feature W_n':

H_n' = f(Θ(H_n))

W_n' = f(Θ(W_n))

where f is the Sigmoid activation function and Θ denotes the sharing module;
S223, multiplying the input feature map by the horizontal and vertical activation features according to the following logic to obtain the attention feature, so as to construct the dependency relationship between the long-distance pixels:

A_n = F_n × H_n' × W_n'.
9. the method for detecting and classifying lane lines based on bidirectional distraction according to claim 1, wherein the sharing module in step S22 comprises: 2 convolutional layers.
10. Lane line detection and classification system based on bidirectional separation attention, characterized in that the system comprises:
a lightweight lane line detection and classification module, used for constructing the lightweight lane line detection and classification network LNet, wherein the network comprises symmetric units and dense symmetric blocks, at least 2 symmetric units being densely connected by the dense symmetric blocks, and the lightweight lane line detection and classification module comprises:
the resolution reduction module is used for processing an input image by down-sampling of an encoder of the lightweight lane line detection and classification network LNet so as to reduce the resolution of the input image step by step and obtain a low-resolution feature map;
the feature fusion module, used for restoring the low-resolution feature map to the original size of the input image by processing the last convolutional feature map of the encoder in a decoder, wherein the output features of the decoder and the encoder are fused to obtain fusion features, the feature fusion module being connected with the resolution reduction module;
the network training module is used for training the lightweight lane line detection classification network LNet by using a weighted cross entropy loss function;
the position detail information module is used for constructing a bidirectional separation attention mechanism, constructing a dependency relationship between long-distance pixels and acquiring global position information and local detail information, and is connected with the lightweight lane line detection and classification module, and the position detail information module comprises:
the bidirectional attention network construction module, used for adding the bidirectional separation attention module TSA into the lightweight lane line detection and classification network LNet so as to construct the bidirectional attention lightweight lane line detection and classification network TSANet;
a dependency relationship obtaining module, configured to integrate the target texture and the target position information from the horizontal pixels and the vertical pixels respectively by using the bidirectional separation attention module TSA, take the fusion feature as an input feature map, process the input feature map with the TSA, and obtain an applicable mapping relationship as the dependency relationship between the long-distance pixels, wherein the dependency relationship obtaining module is connected to the bidirectional attention network construction module.
CN202211066541.5A 2022-09-01 2022-09-01 Lane line detection and classification method and system based on bidirectional separation attention Pending CN115497069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211066541.5A CN115497069A (en) 2022-09-01 2022-09-01 Lane line detection and classification method and system based on bidirectional separation attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211066541.5A CN115497069A (en) 2022-09-01 2022-09-01 Lane line detection and classification method and system based on bidirectional separation attention

Publications (1)

Publication Number Publication Date
CN115497069A true CN115497069A (en) 2022-12-20

Family

ID=84469118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211066541.5A Pending CN115497069A (en) 2022-09-01 2022-09-01 Lane line detection and classification method and system based on bidirectional separation attention

Country Status (1)

Country Link
CN (1) CN115497069A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372983A (en) * 2023-10-18 2024-01-09 北京化工大学 Low-calculation-force automatic driving real-time multitasking sensing method and device


Similar Documents

Publication Publication Date Title
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN111915592B (en) Remote sensing image cloud detection method based on deep learning
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN111582201A (en) Lane line detection system based on geometric attention perception
CN107025440A (en) A kind of remote sensing images method for extracting roads based on new convolutional neural networks
CN114445430B (en) Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN108038486A (en) A kind of character detecting method
CN116453121B (en) Training method and device for lane line recognition model
Cai et al. X-distill: Improving self-supervised monocular depth via cross-task distillation
CN113095152A (en) Lane line detection method and system based on regression
CN115205672A (en) Remote sensing building semantic segmentation method and system based on multi-scale regional attention
CN116385326A (en) Multispectral image fusion method, device and equipment based on multi-target segmentation
CN113505634A (en) Double-flow decoding cross-task interaction network optical remote sensing image salient target detection method
CN115601723A (en) Night thermal infrared image semantic segmentation enhancement method based on improved ResNet
CN112766056A (en) Method and device for detecting lane line in low-light environment based on deep neural network
CN116740439A (en) Crowd counting method based on trans-scale pyramid convertors
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN115497069A (en) Lane line detection and classification method and system based on bidirectional separation attention
CN115527096A (en) Small target detection method based on improved YOLOv5
CN115115973A (en) Weak and small target detection method based on multiple receptive fields and depth characteristics
CN112801021B (en) Method and system for detecting lane line based on multi-level semantic information
Wang Remote sensing image semantic segmentation algorithm based on improved ENet network
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium
CN110503049B (en) Satellite video vehicle number estimation method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination