CN112241728B - Real-time lane line detection method and system for learning context information by adopting attention mechanism - Google Patents

Real-time lane line detection method and system for learning context information by adopting attention mechanism

Info

Publication number
CN112241728B
CN112241728B
Authority
CN
China
Prior art keywords
feature map
lane line
layer
pixel
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011193555.4A
Other languages
Chinese (zh)
Other versions
CN112241728A (en)
Inventor
孔斌
张露
杨静
王灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN202011193555.4A priority Critical patent/CN112241728B/en
Publication of CN112241728A publication Critical patent/CN112241728A/en
Application granted granted Critical
Publication of CN112241728B publication Critical patent/CN112241728B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a real-time lane line detection method and system that learn context information by means of an attention mechanism. In the encoding process, a fusion module establishes correlations between each convolutional layer of the feature extraction network and the attention mechanism to learn context information. Then, in the decoding process, the two branches are fused synchronously, so that the detail features and context information of the feature extraction network better supplement the up-sampling operation, compensating for lost detail information and producing precise points on the lanes. A real-time lane line detection algorithm that learns context information with an attention mechanism is thus realized; the accuracy of the algorithm is effectively improved, and the processing time per image is only 10 ms. The method is therefore well suited to lane line detection in intelligent driving scenarios.

Description

Real-time lane line detection method and system for learning context information by adopting attention mechanism
Technical Field
The invention relates to the technical field of intelligent automobile driving safety, in particular to a real-time lane line detection method and a real-time lane line detection system for learning context information by adopting an attention mechanism.
Background
In the environment perception of an intelligent vehicle, lane line detection is a basic functional module and a prerequisite for the vehicle to drive correctly according to traffic regulations. Correct lane line detection enables the intelligent vehicle to make further decisions and judgments about its own position and state, based on which the vehicle control system adjusts the vehicle's position and keeps it driving in a safe state. At present, lane line detection algorithms fall mainly into two categories: algorithms based on traditional vision and algorithms based on neural networks. Traditional vision-based lane line detection algorithms extract inherent features of the lane line, such as color, texture and edge information, then convert the image into a bird's-eye view using perspective transformation, and finally detect the lane line with a Hough transform. Deep-learning-based lane line detection algorithms learn features autonomously from a large number of training samples and are therefore robust in complex driving environments such as illumination change, shadow, night and curves. Most lane line detection algorithms adopt an end-to-end scheme that divides the task into two modules: a lane locating module and a post-processing module. The lane locating module outputs the category of each pixel in the image (background or lane line) without considering the dependency or structure between pixels. A post-processing module is therefore required to filter false detections, and a clustering algorithm is used to cluster the located lane line pixels into the final lanes.
Existing deep-learning-based lane line detection algorithms are constrained by their training labels and can only detect a fixed number of lane lines, whereas the number of lane lines changes constantly during driving due to merging and similar maneuvers. In addition, pooling operations in the network reduce the resolution of the features, causing loss of the information they carry.
One existing system comprises a backbone network; a semantic segmentation branch and a geometric distance embedding branch arranged behind the backbone network; an attention information transmission module acting between the up-sampling layers of two adjacent decoder stages, i.e. between the semantic segmentation branch and the geometric distance embedding branch of the whole lane line; a geometric attention sensing module arranged at the ends of the semantic segmentation branch and the geometric distance embedding branch; and a skip pyramid fusion up-sampling module connecting the backbone network and the geometric attention sensing module. That system adopts a multi-task branch network structure: besides the lane line segmentation task, a geometric distance embedding branch is added, which guides lane line segmentation by learning a continuous distance representation from the center of a lane line to its boundary, thereby alleviating the problem that lane lines cannot be detected effectively in complex road scenes because of heavy dependence on sparse lane line annotation. Although that system can, to a certain extent, handle lane line detection in complex road scenes, it only uses semantic information to assist classification, fuses the geometric information of a single branch once, and applies attention only between the two branches. It does not link the convolutional layer features with the attention mechanism, nor does it fuse the two branches synchronously, so the detail features and context information of the feature extraction network are not supplemented comprehensively.
Disclosure of Invention
The invention aims to solve the technical problem of how to provide a lane line detection method suitable for complex scenes.
The invention solves the technical problems through the following technical means:
the real-time lane line detection method for learning context information by adopting an attention mechanism comprises the following steps of:
S01, an encoding process: inputting an original image into a convolutional neural network, the first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map; fusing the first feature map and the second feature map to output a third feature map; taking the third feature map as the input of the next layer of the convolutional neural network and executing the same operations as in the previous layer, repeating this until the last layer, and outputting a fourth feature map;
S02, a decoding process: dividing the fourth feature map into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the fourth feature map extracted in the encoding process, performing a concat operation on the obtained feature map and the feature obtained by applying a 1 × 1 convolution to the third feature map output by the previous layer of the convolutional neural network, taking the result as the input of the next layer, and then executing the same operations as in the previous layer, and so on; after the feature map extracted by the 1st layer in the encoding stage is fused with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation once more to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and S03, clustering the pixel positions to different lanes to obtain lane points belonging to a specific lane.
Further, it is characterized in that: in the step S01, CBAM is used as the attention model, and after the first feature map is input into the attention model, the channel attention feature map is calculated by using the formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
Further, in step S02, a weighted cross entropy loss function is used to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
Further, it is characterized in that: the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
f(x)=max(0,x)
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
Further, it is characterized in that: in the step S03, after clustering, a clustering effect is also verified by polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
The invention also provides a real-time lane line detection system for learning the context information by adopting the attention mechanism, which comprises
The encoding module, used for inputting the original image into a convolutional neural network, the first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map; fusing the first feature map and the second feature map to output a third feature map; taking the third feature map as the input of the next layer of the convolutional neural network and executing the same operations as in the previous layer, repeating this until the last layer, and outputting a fourth feature map;
The decoding module, which divides the fourth feature map into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the fourth feature map extracted in the encoding process, performing a concat operation on the obtained feature map and the feature obtained by applying a 1 × 1 convolution to the third feature map output by the previous layer of the convolutional neural network, taking the result as the input of the next layer, and then executing the same operations as in the previous layer, and so on; after the feature map extracted by the 1st layer in the encoding stage is fused with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation once more to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and the post-processing module is used for clustering the pixel positions to different lanes so as to obtain lane points belonging to a specific lane.
Furthermore, in the encoding module, CBAM is used as an attention model, after the first feature map is input into the attention model, the channel attention feature map is calculated by formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
Further, in the decoding module, a weighted cross entropy loss function is used to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
Further, the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
Further, in the post-processing module, after clustering, a clustering effect is verified through polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
The invention has the advantages that:
the invention adopts an encoding-decoding structure, takes the lane line detection task as an example segmentation task, and can detect any number of lane lines.
Specifically, in encoding, the invention designs a fusion module that learns context information by establishing correlations between each convolutional layer of the feature extraction network and the attention mechanism. Then, in decoding, the two branches are fused synchronously, so that the detail features and context information of the feature extraction network better supplement the up-sampling operation, compensating for the lost detail information and generating precise points on the lanes. A real-time lane line detection algorithm that learns context information with an attention mechanism is thus realized; the accuracy of the algorithm is effectively improved, and the processing time per image is only 10 ms. The method is therefore well suited to lane line detection in intelligent driving scenarios.
By combining the variance loss function and the distance loss function, the center of each instance is kept close to the origin, and the instance segmentation feature map can be learned better.
Through polynomial fitting, the clustering effect can be verified, and the detection accuracy is further ensured.
Drawings
FIG. 1 is a block diagram of a lane line detection process according to an embodiment of the present invention;
FIG. 2 is a diagram of a feature extraction network structure in the encoding stage according to an embodiment of the present invention;
FIG. 3 is a block diagram of an attention model CBAM according to an embodiment of the present invention;
FIG. 4 is an expanded flow diagram of the steps of FIG. 1;
fig. 5 is a lane line detection result diagram of the lane line detection method in different scenes in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a real-time lane line detection method that learns context information with an attention mechanism. The method adopts an encoding-decoding structure, where the encoding structure extracts the feature information of the lane lines and the decoding structure obtains the instance segmentation feature map of the lane lines. The method specifically comprises the following steps:
step 1, encoding process:
As shown in fig. 1, the feature extraction network of the present embodiment is based on VGG16. VGG16 is a classification network that ultimately outputs a 1000-dimensional feature vector representing the scores of 1000 classes. If the last fully connected layers are replaced by convolutional layers, the finally obtained feature map is only 1/32 the size of the input image. Because the network structure of VGG16 contains 5 max pooling layers, the resolution of the feature map is halved each time it passes through a max pooling layer, and information carried by the feature map is lost. Therefore, this embodiment constructs a fusion module that establishes correlations between the convolutional features and the attention-mechanism features to learn context information. In this embodiment, a skip pyramid fusion module is used for the fusion.
In neural networks, convolutional layers obtain output features through linear combinations of convolution kernels and input features. Since a convolution kernel is local, convolutional layers are usually stacked to enlarge the receptive field, which is in fact not an efficient way of processing. An attention mechanism, by capturing global information, can obtain a larger receptive field and context information. Since CBAM is a lightweight general-purpose module, it can be integrated into various convolutional neural networks for end-to-end training. This embodiment therefore adopts CBAM as the attention module. Fig. 2 is a schematic structural diagram of CBAM.
The encoding process is specifically as follows: the original image is input into the convolutional neural network VGG16, and the first layer of VGG16 performs a convolution operation on the original image to obtain a first feature map; the first feature map is then input into the attention model, which outputs a second feature map; the first feature map and the second feature map are fused to output a third feature map; the third feature map is taken as the input of the next layer of the convolutional neural network, which executes the same operations as the previous layer, and so on until the last layer, which outputs a fourth feature map.
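For illustration only, a minimal sketch of this per-stage fusion in Python/PyTorch follows. The class and variable names are illustrative assumptions, the backbone stages are assumed to be exposed as a list of modules, and the fusion of the first and second feature maps is realised here as element-wise addition, which this description does not specify; the attention module used is the CBAM block sketched after formulas (1) and (2) below.

```python
import torch
import torch.nn as nn

class FusionEncoder(nn.Module):
    """Hypothetical per-stage encoder fusion: convolution stage -> attention -> fuse."""
    def __init__(self, backbone_stages, attention_blocks):
        super().__init__()
        self.stages = nn.ModuleList(backbone_stages)   # e.g. the five VGG16 stages
        self.attn = nn.ModuleList(attention_blocks)    # one attention block per stage

    def forward(self, x):
        skips = []                        # fused feature maps, reused by the decoder
        for stage, attn in zip(self.stages, self.attn):
            first = stage(x)              # first feature map (convolution output)
            second = attn(first)          # second feature map (attention output)
            x = first + second            # third feature map (fusion, assumed additive)
            skips.append(x)
        return x, skips                   # x is the last-stage (fourth) feature map
```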
As can be seen from fig. 2, for the first feature map, the channel attention feature map is first obtained with formula (1) and multiplied by the input feature map to obtain an intermediate feature map. The spatial attention feature map is then calculated with formula (2) and multiplied by the intermediate feature map to obtain the finally output second feature map.
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)

Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
Where f represents the feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
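A sketch of an attention block implementing formulas (1) and (2) is given below. The reduction ratio, the ReLU between w0 and w1, and the 7 × 7 spatial convolution follow the common CBAM formulation and are assumptions rather than values stated in this description.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (formula (1)) followed by spatial attention (formula (2))."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # shared MLP: w0 reduces the channel dimension, w1 restores it
        self.w0 = nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False)
        self.w1 = nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # spatial attention convolution over the concatenated pooled maps
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, f):
        # formula (1): channel attention from average- and max-pooled descriptors
        ap = f.mean(dim=(2, 3), keepdim=True)             # AP(f)
        mp = f.amax(dim=(2, 3), keepdim=True)             # MP(f)
        mc = torch.sigmoid(self.w1(self.relu(self.w0(ap))) +
                           self.w1(self.relu(self.w0(mp))))
        f = f * mc                                         # intermediate feature map
        # formula (2): spatial attention from channel-wise pooling and concat
        ap_s = f.mean(dim=1, keepdim=True)
        mp_s, _ = f.max(dim=1, keepdim=True)
        ms = torch.sigmoid(self.spatial(torch.cat([ap_s, mp_s], dim=1)))
        return f * ms                                      # second feature map
```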
Step 2, decoding process:
This embodiment divides the output of the last encoding layer into a binarization branch and an embeddable branch. The binarization branch locates the positions of the lane lines, while the embeddable branch acquires the instance information of each lane line. To ensure that the output image is the same size as the input image, up-sampling layers are used in the decoding stage. However, extracting features directly through up-sampling layers loses much detail. Therefore, this embodiment fuses the features extracted in the encoding stage with the up-sampling results, as shown in fig. 3. The features extracted by each layer in the encoding stage first undergo a 1 × 1 convolution for dimension reduction. For the binarization branch, an up-sampling operation is first performed on the features extracted by the last (fifth) layer of the encoding stage, a concat operation is performed on the obtained features and the features after the 1 × 1 convolution, and the result is sent to the next layer. By analogy, after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a 3 × 3 convolution operation is performed, yielding the binarization-branch feature map with the same size as the original image. The embeddable branch follows the same flow, except that after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a further 3 × 3 up-sampling operation is performed to obtain the embeddable-branch feature map. In this way, the features extracted in the encoding stage are fully utilized during feature extraction to compensate for the detail information lost in the up-sampling layers.
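One decoding step of the fusion described above can be sketched as follows; channel sizes, module names and the bilinear up-sampling mode are illustrative assumptions. Stacking such steps over the encoder stages and ending with a 3 × 3 convolution (binarization branch) or a further up-sampling (embeddable branch) yields the two branch feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStep(nn.Module):
    """One decoding stage: upsample, then concat with the 1x1-reduced encoder feature."""
    def __init__(self, in_ch, skip_ch, out_ch, reduced_ch=64):
        super().__init__()
        self.reduce = nn.Conv2d(skip_ch, reduced_ch, kernel_size=1)          # 1x1 dimension reduction
        self.fuse = nn.Conv2d(in_ch + reduced_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        # upsample to the spatial size of the corresponding encoder feature map
        x = F.interpolate(x, size=skip.shape[2:], mode='bilinear', align_corners=False)
        x = torch.cat([x, self.reduce(skip)], dim=1)                          # concat operation
        return self.fuse(x)
```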
The lane lines are elongated in shape and occupy a small area in the overall image. Background information is much larger than the lane line information, which can cause category imbalance. In order to eliminate the influence of class imbalance, a weighted cross entropy loss function is adopted to train a binary segmentation network, as shown in formula (3).
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
Wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
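A sketch of the weighted cross entropy of formula (3) follows. How the per-pixel weights wi are computed is not specified here, so the weight map is simply passed in as an input; the function name is illustrative.

```python
import torch

def weighted_bce_loss(p, y, w):
    """Formula (3): p, y, w are flat tensors with the predicted probability,
    the 0/1 label and the per-pixel weight, respectively."""
    eps = 1e-7
    p = p.clamp(eps, 1.0 - eps)                      # avoid log(0)
    return -(w * (y * torch.log(p) + (1.0 - y) * torch.log(1.0 - p))).mean()
```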
To better learn the instance segmentation feature map, a combination of variance loss and distance loss is adopted. The variance loss Lv minimizes the distance between pixels belonging to the same lane line, and the distance loss Ld maximizes the distance between pixels that do not belong to the same lane line. The two losses are defined as follows:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
Where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The specific combination of the variance loss and the distance loss is as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss, respectively. Meanwhile, to ensure that the center μ of each instance does not lie too far from the origin of the image, a regularization term α is added, giving the final loss function L = a1 × Lv + a2 × Ld + α. In this embodiment, a1 = a2 = 1.0 and α = 0.005.
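A sketch of the combined loss under the assumptions stated above follows. The hinge margins delta_v and delta_d are assumed hyper-parameters (the description defines f(x) = max(0, x) but does not name the margins), and the regularization term is interpreted here as α times the mean norm of the instance centers, which is one plausible reading of the "+ α" term; function and parameter names are illustrative.

```python
import torch

def instance_embedding_loss(emb, labels, delta_v=0.5, delta_d=3.0,
                            a1=1.0, a2=1.0, alpha=0.005):
    """Variance + distance loss over an embedding map.

    emb:    (D, H, W) pixel embeddings from the embeddable branch
    labels: (H, W) integer instance labels, 0 = background, 1..M = lane lines
    """
    lane_ids = [l for l in labels.unique().tolist() if l != 0]
    if not lane_ids:
        return emb.sum() * 0.0

    centres, l_v = [], 0.0
    for l in lane_ids:
        pix = emb[:, labels == l]                      # (D, N_i) pixels of lane i
        mu = pix.mean(dim=1)                           # centre mu_i
        centres.append(mu)
        dist = (pix - mu.unsqueeze(1)).norm(dim=0)     # ||mu_i - p_j||
        l_v = l_v + torch.clamp(dist - delta_v, min=0).pow(2).mean()
    l_v = l_v / len(lane_ids)

    l_d = 0.0
    if len(centres) > 1:
        c = torch.stack(centres)                       # (M, D)
        for i in range(len(centres)):
            for j in range(len(centres)):
                if i != j:
                    l_d = l_d + torch.clamp(delta_d - (c[i] - c[j]).norm(), min=0).pow(2)
        l_d = l_d / (len(centres) * (len(centres) - 1))

    # regularization: pull each instance centre towards the origin, weighted by alpha
    l_reg = torch.stack(centres).norm(dim=1).mean()
    return a1 * l_v + a2 * l_d + alpha * l_reg
```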
Step 3, post-treatment process
The segmentation feature maps of the two branches can be obtained by the previous encoding process and decoding process. Then, elements of the semantic segmentation graph generated by the binary branch are multiplied by elements of the segmentation graph generated by the embeddable branch. This allows a more accurate pixel position belonging to the lane line. The results are then sent to a post-processing module. The post-processing module comprises a clustering algorithm, and the pixels are clustered to different lanes by generally adopting a mean clustering algorithm, so as to obtain lane points belonging to a specific lane. The lane line detection flow is shown in fig. 4. In order to verify the clustering effect, after clustering, the clustering effect is also verified by quadratic polynomial fitting.
To verify the experimental effect of this embodiment, it was compared with other methods. As can be seen from Table 1, this embodiment not only achieves the highest accuracy but also consumes the least time, and thus better satisfies the real-time requirement of intelligent driving scenarios.
TABLE 1 Experimental results of different algorithms

Method      Accuracy   FPR     FNR     Time (ms)
Densenet    0.871      0.122   0.575   42
Enet        0.899      0.094   0.436   86
Erfnet      0.940      0.058   0.174   53
Mobilenet   0.719      0.285   0.035   50
Ours        0.949      0.049   0.093   10
Fig. 5 shows the detection effect of this embodiment in different scenarios. It can be seen that the method detects lane lines in different scenes with a good detection effect; the final result images are the results obtained by the method of this embodiment, and the ground-truth images are the actual results.
In the encoding process, this embodiment first establishes correlations between the features extracted by convolution and the features extracted by the attention mechanism to learn context information, and this context information helps in cases where lane features are not obvious. Second, in the decoding module, the context information learned by the encoding module is fused with the features extracted by the up-sampling module to compensate for the lost detail information. The validity of this embodiment was verified on the public TuSimple data set. The results show that this embodiment can handle an uncertain number of lanes, that the accuracy of the algorithm is effectively improved, and that the processing time per image is only 10 ms. The method is therefore well suited to lane line detection in intelligent driving scenarios.
The present embodiment further provides a real-time lane line detection system for learning context information by using an attention mechanism, which includes:
the coding module:
As shown in fig. 1, the feature extraction network of the present embodiment is based on VGG16. VGG16 is a classification network that ultimately outputs a 1000-dimensional feature vector representing the scores of 1000 classes. If the last fully connected layers are replaced by convolutional layers, the finally obtained feature map is only 1/32 the size of the input image. Because the network structure of VGG16 contains 5 max pooling layers, the resolution of the feature map is halved each time it passes through a max pooling layer, and information carried by the feature map is lost. Therefore, this embodiment constructs a fusion module that establishes correlations between the convolutional features and the attention-mechanism features to learn context information. In this embodiment, a skip pyramid fusion module is used for the fusion.
In neural networks, convolutional layers obtain output features through linear combinations of convolution kernels and input features. Since a convolution kernel is local, convolutional layers are usually stacked to enlarge the receptive field, which is in fact not an efficient way of processing. An attention mechanism, by capturing global information, can obtain a larger receptive field and context information. Since CBAM is a lightweight general-purpose module, it can be integrated into various convolutional neural networks for end-to-end training. This embodiment therefore adopts CBAM as the attention module. Fig. 2 is a schematic structural diagram of CBAM.
The encoding process is specifically as follows: the original image is input into the convolutional neural network VGG16, and the first layer of VGG16 performs a convolution operation on the original image to obtain a first feature map; the first feature map is then input into the attention model, which outputs a second feature map; the first feature map and the second feature map are fused to output a third feature map; the third feature map is taken as the input of the next layer of the convolutional neural network, which executes the same operations as the previous layer, and so on until the last layer, which outputs a fourth feature map.
As can be seen from fig. 2, for the first feature map, the channel attention feature map is first obtained with formula (1) and multiplied by the input feature map to obtain an intermediate feature map. The spatial attention feature map is then calculated with formula (2) and multiplied by the intermediate feature map to obtain the finally output second feature map.
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)

Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
Where f represents the feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
A decoding module:
This embodiment divides the output of the last encoding layer into a binarization branch and an embeddable branch. The binarization branch locates the positions of the lane lines, while the embeddable branch acquires the instance information of each lane line. To ensure that the output image is the same size as the input image, up-sampling layers are used in the decoding stage. However, extracting features directly through up-sampling layers loses much detail. Therefore, this embodiment fuses the features extracted in the encoding stage with the up-sampling results, as shown in fig. 3. The features extracted by each layer in the encoding stage first undergo a 1 × 1 convolution for dimension reduction. For the binarization branch, an up-sampling operation is first performed on the features extracted by the last (fifth) layer of the encoding stage, a concat operation is performed on the obtained features and the features after the 1 × 1 convolution, and the result is sent to the next layer. By analogy, after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a 3 × 3 convolution operation is performed, yielding the binarization-branch feature map with the same size as the original image. The embeddable branch follows the same flow, except that after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a further 3 × 3 up-sampling operation is performed to obtain the embeddable-branch feature map. In this way, the features extracted in the encoding stage are fully utilized during feature extraction to compensate for the detail information lost in the up-sampling layers.
The lane lines are elongated in shape and occupy a small area in the overall image. Background information is much larger than the lane line information, which can cause category imbalance. In order to eliminate the influence of class imbalance, a weighted cross entropy loss function is adopted to train a binary segmentation network, as shown in formula (3).
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
Wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
To better learn the instance segmentation feature map, a combination of variance loss and distance loss is adopted. The variance loss Lv minimizes the distance between pixels belonging to the same lane line, and the distance loss Ld maximizes the distance between pixels that do not belong to the same lane line. The two losses are defined as follows:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
Where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The specific combination of the variance loss and the distance loss is as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss, respectively. Meanwhile, to ensure that the center μ of each instance does not lie too far from the origin of the image, a regularization term α is added, giving the final loss function L = a1 × Lv + a2 × Ld + α. In this embodiment, a1 = a2 = 1.0 and α = 0.005.
A post-processing module:
The segmentation feature maps of the two branches are obtained by the preceding encoding and decoding processes. The semantic segmentation map generated by the binarization branch is then multiplied element-wise with the segmentation map generated by the embeddable branch, which yields more accurate pixel positions belonging to the lane lines. The result is then sent to the post-processing module. The post-processing module contains a clustering algorithm; a mean clustering algorithm is generally adopted to cluster the pixels into different lanes, thereby obtaining the lane points belonging to each specific lane. The lane line detection flow is shown in fig. 4. To verify the clustering effect, quadratic polynomial fitting is also applied after clustering.
To verify the experimental effect of this embodiment, it was compared with other methods. As can be seen from Table 1, this embodiment not only achieves the highest accuracy but also consumes the least time, and thus better satisfies the real-time requirement of intelligent driving scenarios.
Table 1 Experimental results of different algorithms

Method      Accuracy   FPR     FNR     Time (ms)
Densenet    0.871      0.122   0.575   42
Enet        0.899      0.094   0.436   86
Erfnet      0.940      0.058   0.174   53
Mobilenet   0.719      0.285   0.035   50
Ours        0.949      0.049   0.093   10
Fig. 5 shows the detection effect of this embodiment in different scenarios. It can be seen that the method detects lane lines in different scenes with a good detection effect; the final result images are the results obtained by the method of this embodiment, and the ground-truth images are the actual results.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. The real-time lane line detection method for learning context information by adopting an attention mechanism is characterized by comprising the following steps of: the method comprises the following steps:
S01, an encoding process: inputting an original image into a convolutional neural network, a first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map, fusing the first feature map and the second feature map to output a third feature map, taking the third feature map as the input of the next layer of the convolutional neural network, executing the same operation as the previous layer, and repeating the steps until the last layer, and outputting a third feature map;
S02, a decoding process: dividing the third feature map output by the last layer into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the third feature map output by the last layer extracted in the encoding process, performing a concat operation on the obtained feature map and the features of the third feature map output by the previous layer after a 1 × 1 convolution operation, taking the obtained result as the input of the next layer, then executing the same operation as the previous layer, and so on, and after fusing the third feature map extracted by the 1st layer in the encoding stage with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation again to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and S03, clustering the pixel positions to different lanes to obtain lane points belonging to a specific lane.
2. The method of claim 1, wherein the method comprises the steps of: in the step S01, CBAM is used as the attention model, and after the first feature map is input into the attention model, the channel attention feature map is calculated by using the formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
3. The method of claim 1, wherein the method comprises the steps of: in step S02, a weighted cross entropy loss function is used to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
4. The method of claim 1, wherein the method comprises the steps of: the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins;
the variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
5. The method of claim 4, wherein the method comprises the steps of: in step S03, after clustering, a clustering effect is verified by polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
6. A real-time lane line detection system for learning context information by adopting an attention mechanism, characterized in that it comprises:
the encoding module, which is used for inputting the original image into a convolutional neural network, the first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map, fusing the first feature map and the second feature map to output a third feature map, taking the third feature map as the input of the next layer of the convolutional neural network, executing the same operation as the previous layer, and repeating the steps until the last layer, and outputting a third feature map;
the decoding module, which divides the third feature map output by the last layer into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the third feature map output by the last layer extracted in the encoding process, performing a concat operation on the obtained feature map and the features of the third feature map output by the previous layer after a 1 × 1 convolution operation, taking the obtained result as the input of the next layer, then executing the same operation as the previous layer, and so on, and after fusing the third feature map extracted by the 1st layer in the encoding stage with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation again to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and the post-processing module is used for clustering the pixel positions to different lanes so as to obtain lane points belonging to a specific lane.
7. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: in the encoding module, CBAM is used as the attention model, and after the first feature map is input into the attention model, the channel attention feature map is calculated by using formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
8. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: in the decoding module, a weighted cross entropy loss function is adopted to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
9. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins;
the variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
10. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: in the post-processing module, after clustering, a clustering effect is verified through polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
CN202011193555.4A 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism Active CN112241728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011193555.4A CN112241728B (en) 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011193555.4A CN112241728B (en) 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism

Publications (2)

Publication Number Publication Date
CN112241728A CN112241728A (en) 2021-01-19
CN112241728B true CN112241728B (en) 2023-04-07

Family

ID=74170352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011193555.4A Active CN112241728B (en) 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism

Country Status (1)

Country Link
CN (1) CN112241728B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966569B (en) * 2021-02-09 2022-02-11 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium
CN113158768B (en) * 2021-03-03 2023-02-24 中山大学 Intelligent vehicle lane line detection method based on ResNeSt and self-attention distillation
CN113158790A (en) * 2021-03-15 2021-07-23 河北工业职业技术学院 Processing edge lane line detection system based on geometric context coding network model
CN113011360B (en) * 2021-03-29 2023-11-24 江苏思玛特科技有限公司 Road traffic sign line detection method and system based on attention capsule network model
CN113468967B (en) * 2021-06-02 2023-08-18 北京邮电大学 Attention mechanism-based lane line detection method, attention mechanism-based lane line detection device, attention mechanism-based lane line detection equipment and attention mechanism-based lane line detection medium
CN113591670B (en) * 2021-07-27 2023-12-01 中国科学院合肥物质科学研究院 Lane line detection method based on convolutional neural network
CN114022863B (en) * 2021-10-28 2022-10-11 广东工业大学 Deep learning-based lane line detection method, system, computer and storage medium
CN114550135B (en) * 2022-02-22 2023-04-18 无锡物联网创新中心有限公司 Lane line detection method based on attention mechanism and feature aggregation
CN115131968B (en) * 2022-06-28 2023-07-11 重庆长安汽车股份有限公司 Matching fusion method based on lane line point set and attention mechanism
CN116363712B (en) * 2023-03-21 2023-10-31 中国矿业大学 Palmprint palm vein recognition method based on modal informativity evaluation strategy
CN116343301B (en) * 2023-03-27 2024-03-08 滨州市沾化区退役军人服务中心 Personnel information intelligent verification system based on face recognition
CN116935349B (en) * 2023-09-15 2023-11-28 华中科技大学 Lane line detection method, system, equipment and medium based on Zigzag transformation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010028598A (en) * 2008-07-23 2010-02-04 Sony Corp Wireless communication system, wireless communication apparatus and wireless communication method, encoding device and encoding method, and computer program
US10311578B1 (en) * 2019-01-23 2019-06-04 StradVision, Inc. Learning method and learning device for segmenting an image having one or more lanes by using embedding loss to support collaboration with HD maps required to satisfy level 4 of autonomous vehicles and softmax loss, and testing method and testing device using the same
CN111242037A (en) * 2020-01-15 2020-06-05 华南理工大学 Lane line detection method based on structural information
CN111582201A (en) * 2020-05-12 2020-08-25 重庆理工大学 Lane line detection system based on geometric attention perception
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9865027B2 (en) * 2014-05-09 2018-01-09 Graphiclead LLC System and method for embedding of a two dimensional code with an image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010028598A (en) * 2008-07-23 2010-02-04 Sony Corp Wireless communication system, wireless communication apparatus and wireless communication method, encoding device and encoding method, and computer program
US10311578B1 (en) * 2019-01-23 2019-06-04 StradVision, Inc. Learning method and learning device for segmenting an image having one or more lanes by using embedding loss to support collaboration with HD maps required to satisfy level 4 of autonomous vehicles and softmax loss, and testing method and testing device using the same
CN111242037A (en) * 2020-01-15 2020-06-05 华南理工大学 Lane line detection method based on structural information
CN111582201A (en) * 2020-05-12 2020-08-25 重庆理工大学 Lane line detection system based on geometric attention perception
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives;Lukas Stappen;《ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》;20190417;全文 *
A new DWT video dynamic watermarking algorithm; Liu Yuanyuan; Journal of Jilin University (Engineering and Technology Edition); 2013-03-31; full text *

Also Published As

Publication number Publication date
CN112241728A (en) 2021-01-19

Similar Documents

Publication Publication Date Title
CN112241728B (en) Real-time lane line detection method and system for learning context information by adopting attention mechanism
CN111967305B (en) Real-time multi-scale target detection method based on lightweight convolutional neural network
CN110222591B (en) Lane line detection method based on deep neural network
CN109726627B (en) Neural network model training and universal ground wire detection method
CN108694386B (en) Lane line detection method based on parallel convolution neural network
CN112380921A (en) Road detection method based on Internet of vehicles
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN104809443A (en) Convolutional neural network-based license plate detection method and system
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
CN113723377B (en) Traffic sign detection method based on LD-SSD network
CN111008632B (en) License plate character segmentation method based on deep learning
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN113313031B (en) Deep learning-based lane line detection and vehicle transverse positioning method
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN111062347B (en) Traffic element segmentation method in automatic driving, electronic equipment and storage medium
CN112766056A (en) Method and device for detecting lane line in low-light environment based on deep neural network
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN111178363A (en) Character recognition method and device, electronic equipment and readable storage medium
CN114359554A (en) Image semantic segmentation method based on multi-receptive-field context semantic information
CN113011308A (en) Pedestrian detection method introducing attention mechanism
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation
CN116883912A (en) Infrared dim target detection method based on global information target enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant