CN112241728B - Real-time lane line detection method and system for learning context information by adopting attention mechanism - Google Patents

Real-time lane line detection method and system for learning context information by adopting attention mechanism

Info

Publication number
CN112241728B
CN112241728B
Authority
CN
China
Prior art keywords
feature map
lane line
layer
pixel
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011193555.4A
Other languages
Chinese (zh)
Other versions
CN112241728A (en)
Inventor
孔斌
张露
杨静
王灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN202011193555.4A priority Critical patent/CN112241728B/en
Publication of CN112241728A publication Critical patent/CN112241728A/en
Application granted granted Critical
Publication of CN112241728B publication Critical patent/CN112241728B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a real-time lane line detection method and system that learn context information by means of an attention mechanism. In the encoding process, a fusion module establishes correlations between each convolutional layer of the feature extraction network and the attention mechanism to learn context information. Then, in the decoding process, the two branches are fused synchronously, so that the detail features and context information of the feature extraction network better supplement the up-sampling operation, compensating for lost detail information and producing precise points on the lanes. A real-time lane line detection algorithm that learns context information with an attention mechanism is thus realized; the accuracy of the algorithm is effectively improved, and the processing time per image is only 10 ms. The method is therefore well suited to lane line detection in intelligent driving scenarios.

Description

Real-time lane line detection method and system for learning context information by adopting attention mechanism
Technical Field
The invention relates to the technical field of intelligent automobile driving safety, in particular to a real-time lane line detection method and a real-time lane line detection system for learning context information by adopting an attention mechanism.
Background
In the environment perception of an intelligent vehicle, lane line detection is a basic functional module and a prerequisite for the vehicle to drive correctly according to traffic regulations. Correct lane line detection enables the intelligent vehicle to make further decisions and judgments about its own position and state, based on which the vehicle control system adjusts the vehicle's position and keeps it driving in a safe state. At present, lane line detection algorithms fall mainly into two categories: algorithms based on traditional vision and algorithms based on neural networks. Traditional vision-based lane line detection algorithms extract inherent features of the lane line, such as color, texture and edge information, then convert the image into a bird's-eye view using perspective transformation, and finally detect the lane line with a Hough transform. Deep-learning-based lane line detection algorithms learn features autonomously from a large number of training samples and are therefore robust in complex driving environments such as illumination change, shadow, night and curves. Most lane line detection algorithms adopt an end-to-end scheme that divides the task into two modules: a lane locating module and a post-processing module. The lane locating module outputs the category of each pixel in the image (background or lane line) without considering the dependency or structure between pixels. A post-processing module is therefore required to filter false detections, and a clustering algorithm is used to cluster the located lane line pixels into the final lanes.
Existing deep-learning-based lane line detection algorithms are constrained by their training labels and can only detect a fixed number of lane lines, whereas the number of lane lines changes constantly during driving due to merging and similar maneuvers. In addition, pooling operations in the network reduce the resolution of the features, causing loss of the information they carry.
One existing system comprises a backbone network; a semantic segmentation branch and a geometric distance embedding branch arranged behind the backbone network; an attention information transmission module acting between the up-sampling layers of two adjacent decoder stages, i.e. between the semantic segmentation branch and the geometric distance embedding branch of the whole lane line; a geometric attention sensing module arranged at the ends of the semantic segmentation branch and the geometric distance embedding branch; and a skip pyramid fusion up-sampling module connecting the backbone network and the geometric attention sensing module. That system adopts a multi-task branch network structure: besides the lane line segmentation task, a geometric distance embedding branch is added, which guides lane line segmentation by learning a continuous distance representation from the center of a lane line to its boundary, thereby alleviating the problem that lane lines cannot be detected effectively in complex road scenes because of heavy dependence on sparse lane line annotation. Although that system can, to a certain extent, handle lane line detection in complex road scenes, it only uses semantic information to assist classification, fuses the geometric information of a single branch once, and applies attention only between the two branches. It does not link the convolutional layer features with the attention mechanism, nor does it fuse the two branches synchronously, so the detail features and context information of the feature extraction network are not supplemented comprehensively.
Disclosure of Invention
The invention aims to solve the technical problem of how to provide a lane line detection method suitable for complex scenes.
The invention solves the technical problems through the following technical means:
the real-time lane line detection method for learning context information by adopting an attention mechanism comprises the following steps of:
S01, an encoding process: inputting an original image into a convolutional neural network, the first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map; fusing the first feature map and the second feature map to output a third feature map; taking the third feature map as the input of the next layer of the convolutional neural network and executing the same operations as in the previous layer, repeating this until the last layer, and outputting a fourth feature map;
S02, a decoding process: dividing the fourth feature map into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the fourth feature map extracted in the encoding process, performing a concat operation on the obtained feature map and the feature obtained by applying a 1 × 1 convolution to the third feature map output by the previous layer of the convolutional neural network, taking the result as the input of the next layer, and then executing the same operations as in the previous layer, and so on; after the feature map extracted by the 1st layer in the encoding stage is fused with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation once more to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and S03, clustering the pixel positions to different lanes to obtain lane points belonging to a specific lane.
Further, it is characterized in that: in the step S01, CBAM is used as the attention model, and after the first feature map is input into the attention model, the channel attention feature map is calculated by using the formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
Further, in step S02, a weighted cross entropy loss function is used to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
Further, it is characterized in that: the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
f(x)=max(0,x)
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
Further, it is characterized in that: in the step S03, after clustering, a clustering effect is also verified by polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
The invention also provides a real-time lane line detection system for learning the context information by adopting the attention mechanism, which comprises
The encoding module, used for inputting the original image into a convolutional neural network, the first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map; fusing the first feature map and the second feature map to output a third feature map; taking the third feature map as the input of the next layer of the convolutional neural network and executing the same operations as in the previous layer, repeating this until the last layer, and outputting a fourth feature map;
The decoding module, which divides the fourth feature map into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the fourth feature map extracted in the encoding process, performing a concat operation on the obtained feature map and the feature obtained by applying a 1 × 1 convolution to the third feature map output by the previous layer of the convolutional neural network, taking the result as the input of the next layer, and then executing the same operations as in the previous layer, and so on; after the feature map extracted by the 1st layer in the encoding stage is fused with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation once more to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and the post-processing module is used for clustering the pixel positions to different lanes so as to obtain lane points belonging to a specific lane.
Furthermore, in the encoding module, CBAM is used as an attention model, after the first feature map is input into the attention model, the channel attention feature map is calculated by formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
Further, in the decoding module, a weighted cross entropy loss function is used to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
Further, the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
Further, in the post-processing module, after clustering, a clustering effect is verified through polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
The invention has the advantages that:
the invention adopts an encoding-decoding structure, takes the lane line detection task as an example segmentation task, and can detect any number of lane lines.
Specifically, in encoding, the invention designs a fusion module that learns context information by establishing correlations between each convolutional layer of the feature extraction network and the attention mechanism. Then, in decoding, the two branches are fused synchronously, so that the detail features and context information of the feature extraction network better supplement the up-sampling operation, compensating for the lost detail information and generating precise points on the lanes. A real-time lane line detection algorithm that learns context information with an attention mechanism is thus realized; the accuracy of the algorithm is effectively improved, and the processing time per image is only 10 ms. The method is therefore well suited to lane line detection in intelligent driving scenarios.
By combining the variance loss function and the distance loss function, the center of each instance is kept close to the origin, and the instance segmentation feature map can be learned better.
Through polynomial fitting, the clustering effect can be verified, and the detection accuracy is further ensured.
Drawings
FIG. 1 is a block diagram of a lane line detection process according to an embodiment of the present invention;
FIG. 2 is a diagram of a feature extraction network structure in the encoding stage according to an embodiment of the present invention;
FIG. 3 is a block diagram of an attention model CBAM according to an embodiment of the present invention;
FIG. 4 is an expanded flow diagram of the steps of FIG. 1;
fig. 5 is a lane line detection result diagram of the lane line detection method in different scenes in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a real-time lane line detection method that learns context information with an attention mechanism. The method adopts an encoding-decoding structure, where the encoding structure extracts the feature information of the lane lines and the decoding structure obtains the instance segmentation feature map of the lane lines. The method specifically comprises the following steps:
step 1, encoding process:
As shown in fig. 1, the feature extraction network of the present embodiment is based on VGG16. VGG16 is a classification network that ultimately outputs a 1000-dimensional feature vector representing the scores of 1000 classes. If the last fully connected layers are replaced by convolutional layers, the finally obtained feature map is only 1/32 the size of the input image. Because the network structure of VGG16 contains 5 max pooling layers, the resolution of the feature map is halved each time it passes through a max pooling layer, and information carried by the feature map is lost. Therefore, this embodiment constructs a fusion module that establishes correlations between the convolutional features and the attention-mechanism features to learn context information. In this embodiment, a skip pyramid fusion module is used for the fusion.
In neural networks, convolutional layers obtain output features through linear combinations of convolution kernels and input features. Since a convolution kernel is local, convolutional layers are usually stacked to enlarge the receptive field, which is in fact not an efficient way of processing. An attention mechanism, by capturing global information, can obtain a larger receptive field and context information. Since CBAM is a lightweight general-purpose module, it can be integrated into various convolutional neural networks for end-to-end training. This embodiment therefore adopts CBAM as the attention module. Fig. 2 is a schematic structural diagram of CBAM.
The encoding process is specifically as follows: the original image is input into the convolutional neural network VGG16, and the first layer of VGG16 performs a convolution operation on the original image to obtain a first feature map; the first feature map is then input into the attention model, which outputs a second feature map; the first feature map and the second feature map are fused to output a third feature map; the third feature map is taken as the input of the next layer of the convolutional neural network, which executes the same operations as the previous layer, and so on until the last layer, which outputs a fourth feature map.
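For illustration only, a minimal sketch of this per-stage fusion in Python/PyTorch follows. The class and variable names are illustrative assumptions, the backbone stages are assumed to be exposed as a list of modules, and the fusion of the first and second feature maps is realised here as element-wise addition, which this description does not specify; the attention module used is the CBAM block sketched after formulas (1) and (2) below.

```python
import torch
import torch.nn as nn

class FusionEncoder(nn.Module):
    """Hypothetical per-stage encoder fusion: convolution stage -> attention -> fuse."""
    def __init__(self, backbone_stages, attention_blocks):
        super().__init__()
        self.stages = nn.ModuleList(backbone_stages)   # e.g. the five VGG16 stages
        self.attn = nn.ModuleList(attention_blocks)    # one attention block per stage

    def forward(self, x):
        skips = []                        # fused feature maps, reused by the decoder
        for stage, attn in zip(self.stages, self.attn):
            first = stage(x)              # first feature map (convolution output)
            second = attn(first)          # second feature map (attention output)
            x = first + second            # third feature map (fusion, assumed additive)
            skips.append(x)
        return x, skips                   # x is the last-stage (fourth) feature map
```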
As can be seen from fig. 2, for the first feature map, the channel attention feature map is first obtained with formula (1) and multiplied by the input feature map to obtain an intermediate feature map. The spatial attention feature map is then calculated with formula (2) and multiplied by the intermediate feature map to obtain the finally output second feature map.
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)

Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
Where f represents the feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
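A sketch of an attention block implementing formulas (1) and (2) is given below. The reduction ratio, the ReLU between w0 and w1, and the 7 × 7 spatial convolution follow the common CBAM formulation and are assumptions rather than values stated in this description.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (formula (1)) followed by spatial attention (formula (2))."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # shared MLP: w0 reduces the channel dimension, w1 restores it
        self.w0 = nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False)
        self.w1 = nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        # spatial attention convolution over the concatenated pooled maps
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, f):
        # formula (1): channel attention from average- and max-pooled descriptors
        ap = f.mean(dim=(2, 3), keepdim=True)             # AP(f)
        mp = f.amax(dim=(2, 3), keepdim=True)             # MP(f)
        mc = torch.sigmoid(self.w1(self.relu(self.w0(ap))) +
                           self.w1(self.relu(self.w0(mp))))
        f = f * mc                                         # intermediate feature map
        # formula (2): spatial attention from channel-wise pooling and concat
        ap_s = f.mean(dim=1, keepdim=True)
        mp_s, _ = f.max(dim=1, keepdim=True)
        ms = torch.sigmoid(self.spatial(torch.cat([ap_s, mp_s], dim=1)))
        return f * ms                                      # second feature map
```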
Step 2, decoding process:
This embodiment divides the output of the last encoding layer into a binarization branch and an embeddable branch. The binarization branch locates the positions of the lane lines, while the embeddable branch acquires the instance information of each lane line. To ensure that the output image is the same size as the input image, up-sampling layers are used in the decoding stage. However, extracting features directly through up-sampling layers loses much detail. Therefore, this embodiment fuses the features extracted in the encoding stage with the up-sampling results, as shown in fig. 3. The features extracted by each layer in the encoding stage first undergo a 1 × 1 convolution for dimension reduction. For the binarization branch, an up-sampling operation is first performed on the features extracted by the last (fifth) layer of the encoding stage, a concat operation is performed on the obtained features and the features after the 1 × 1 convolution, and the result is sent to the next layer. By analogy, after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a 3 × 3 convolution operation is performed, yielding the binarization-branch feature map with the same size as the original image. The embeddable branch follows the same flow, except that after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a further 3 × 3 up-sampling operation is performed to obtain the embeddable-branch feature map. In this way, the features extracted in the encoding stage are fully utilized during feature extraction to compensate for the detail information lost in the up-sampling layers.
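One decoding step of the fusion described above can be sketched as follows; channel sizes, module names and the bilinear up-sampling mode are illustrative assumptions. Stacking such steps over the encoder stages and ending with a 3 × 3 convolution (binarization branch) or a further up-sampling (embeddable branch) yields the two branch feature maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStep(nn.Module):
    """One decoding stage: upsample, then concat with the 1x1-reduced encoder feature."""
    def __init__(self, in_ch, skip_ch, out_ch, reduced_ch=64):
        super().__init__()
        self.reduce = nn.Conv2d(skip_ch, reduced_ch, kernel_size=1)          # 1x1 dimension reduction
        self.fuse = nn.Conv2d(in_ch + reduced_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        # upsample to the spatial size of the corresponding encoder feature map
        x = F.interpolate(x, size=skip.shape[2:], mode='bilinear', align_corners=False)
        x = torch.cat([x, self.reduce(skip)], dim=1)                          # concat operation
        return self.fuse(x)
```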
The lane lines are elongated in shape and occupy a small area in the overall image. Background information is much larger than the lane line information, which can cause category imbalance. In order to eliminate the influence of class imbalance, a weighted cross entropy loss function is adopted to train a binary segmentation network, as shown in formula (3).
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
Wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
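A sketch of the weighted cross entropy of formula (3) follows. How the per-pixel weights wi are computed is not specified here, so the weight map is simply passed in as an input; the function name is illustrative.

```python
import torch

def weighted_bce_loss(p, y, w):
    """Formula (3): p, y, w are flat tensors with the predicted probability,
    the 0/1 label and the per-pixel weight, respectively."""
    eps = 1e-7
    p = p.clamp(eps, 1.0 - eps)                      # avoid log(0)
    return -(w * (y * torch.log(p) + (1.0 - y) * torch.log(1.0 - p))).mean()
```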
To better learn the instance segmentation feature map, a combination of variance loss and distance loss is adopted. The variance loss Lv minimizes the distance between pixels belonging to the same lane line, and the distance loss Ld maximizes the distance between pixels that do not belong to the same lane line. The two losses are defined as follows:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
Where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The specific combination of the variance loss and the distance loss is as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss, respectively. Meanwhile, to ensure that the center μ of each instance does not lie too far from the origin of the image, a regularization term α is added, giving the final loss function L = a1 × Lv + a2 × Ld + α. In this embodiment, a1 = a2 = 1.0 and α = 0.005.
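A sketch of the combined loss under the assumptions stated above follows. The hinge margins delta_v and delta_d are assumed hyper-parameters (the description defines f(x) = max(0, x) but does not name the margins), and the regularization term is interpreted here as α times the mean norm of the instance centers, which is one plausible reading of the "+ α" term; function and parameter names are illustrative.

```python
import torch

def instance_embedding_loss(emb, labels, delta_v=0.5, delta_d=3.0,
                            a1=1.0, a2=1.0, alpha=0.005):
    """Variance + distance loss over an embedding map.

    emb:    (D, H, W) pixel embeddings from the embeddable branch
    labels: (H, W) integer instance labels, 0 = background, 1..M = lane lines
    """
    lane_ids = [l for l in labels.unique().tolist() if l != 0]
    if not lane_ids:
        return emb.sum() * 0.0

    centres, l_v = [], 0.0
    for l in lane_ids:
        pix = emb[:, labels == l]                      # (D, N_i) pixels of lane i
        mu = pix.mean(dim=1)                           # centre mu_i
        centres.append(mu)
        dist = (pix - mu.unsqueeze(1)).norm(dim=0)     # ||mu_i - p_j||
        l_v = l_v + torch.clamp(dist - delta_v, min=0).pow(2).mean()
    l_v = l_v / len(lane_ids)

    l_d = 0.0
    if len(centres) > 1:
        c = torch.stack(centres)                       # (M, D)
        for i in range(len(centres)):
            for j in range(len(centres)):
                if i != j:
                    l_d = l_d + torch.clamp(delta_d - (c[i] - c[j]).norm(), min=0).pow(2)
        l_d = l_d / (len(centres) * (len(centres) - 1))

    # regularization: pull each instance centre towards the origin, weighted by alpha
    l_reg = torch.stack(centres).norm(dim=1).mean()
    return a1 * l_v + a2 * l_d + alpha * l_reg
```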
Step 3, post-treatment process
The segmentation feature maps of the two branches can be obtained by the previous encoding process and decoding process. Then, elements of the semantic segmentation graph generated by the binary branch are multiplied by elements of the segmentation graph generated by the embeddable branch. This allows a more accurate pixel position belonging to the lane line. The results are then sent to a post-processing module. The post-processing module comprises a clustering algorithm, and the pixels are clustered to different lanes by generally adopting a mean clustering algorithm, so as to obtain lane points belonging to a specific lane. The lane line detection flow is shown in fig. 4. In order to verify the clustering effect, after clustering, the clustering effect is also verified by quadratic polynomial fitting.
To verify the experimental effect of this embodiment, it was compared with other methods. As can be seen from Table 1, this embodiment not only achieves the highest accuracy but also consumes the least time, and thus better satisfies the real-time requirement of intelligent driving scenarios.
TABLE 1 Experimental results of different algorithms

Method      Accuracy   FPR     FNR     Time (ms)
Densenet    0.871      0.122   0.575   42
Enet        0.899      0.094   0.436   86
Erfnet      0.940      0.058   0.174   53
Mobilenet   0.719      0.285   0.035   50
Ours        0.949      0.049   0.093   10
Fig. 5 shows the detection effect of this embodiment in different scenarios. It can be seen that the method detects lane lines in different scenes with a good detection effect; the final result images are the results obtained by the method of this embodiment, and the ground-truth images are the actual results.
In the encoding process, this embodiment first establishes correlations between the features extracted by convolution and the features extracted by the attention mechanism to learn context information, and this context information helps in cases where lane features are not obvious. Second, in the decoding module, the context information learned by the encoding module is fused with the features extracted by the up-sampling module to compensate for the lost detail information. The validity of this embodiment was verified on the public TuSimple data set. The results show that this embodiment can handle an uncertain number of lanes, that the accuracy of the algorithm is effectively improved, and that the processing time per image is only 10 ms. The method is therefore well suited to lane line detection in intelligent driving scenarios.
The present embodiment further provides a real-time lane line detection system for learning context information by using an attention mechanism, which includes:
the coding module:
As shown in fig. 1, the feature extraction network of the present embodiment is based on VGG16. VGG16 is a classification network that ultimately outputs a 1000-dimensional feature vector representing the scores of 1000 classes. If the last fully connected layers are replaced by convolutional layers, the finally obtained feature map is only 1/32 the size of the input image. Because the network structure of VGG16 contains 5 max pooling layers, the resolution of the feature map is halved each time it passes through a max pooling layer, and information carried by the feature map is lost. Therefore, this embodiment constructs a fusion module that establishes correlations between the convolutional features and the attention-mechanism features to learn context information. In this embodiment, a skip pyramid fusion module is used for the fusion.
In neural networks, convolutional layers obtain output features through linear combinations of convolution kernels and input features. Since a convolution kernel is local, convolutional layers are usually stacked to enlarge the receptive field, which is in fact not an efficient way of processing. An attention mechanism, by capturing global information, can obtain a larger receptive field and context information. Since CBAM is a lightweight general-purpose module, it can be integrated into various convolutional neural networks for end-to-end training. This embodiment therefore adopts CBAM as the attention module. Fig. 2 is a schematic structural diagram of CBAM.
The encoding process is specifically as follows: the original image is input into the convolutional neural network VGG16, and the first layer of VGG16 performs a convolution operation on the original image to obtain a first feature map; the first feature map is then input into the attention model, which outputs a second feature map; the first feature map and the second feature map are fused to output a third feature map; the third feature map is taken as the input of the next layer of the convolutional neural network, which executes the same operations as the previous layer, and so on until the last layer, which outputs a fourth feature map.
As can be seen from fig. 2, for the first feature map, the channel attention feature map is first obtained with formula (1) and multiplied by the input feature map to obtain an intermediate feature map. The spatial attention feature map is then calculated with formula (2) and multiplied by the intermediate feature map to obtain the finally output second feature map.
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)

Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
Where f represents the feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
A decoding module:
This embodiment divides the output of the last encoding layer into a binarization branch and an embeddable branch. The binarization branch locates the positions of the lane lines, while the embeddable branch acquires the instance information of each lane line. To ensure that the output image is the same size as the input image, up-sampling layers are used in the decoding stage. However, extracting features directly through up-sampling layers loses much detail. Therefore, this embodiment fuses the features extracted in the encoding stage with the up-sampling results, as shown in fig. 3. The features extracted by each layer in the encoding stage first undergo a 1 × 1 convolution for dimension reduction. For the binarization branch, an up-sampling operation is first performed on the features extracted by the last (fifth) layer of the encoding stage, a concat operation is performed on the obtained features and the features after the 1 × 1 convolution, and the result is sent to the next layer. By analogy, after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a 3 × 3 convolution operation is performed, yielding the binarization-branch feature map with the same size as the original image. The embeddable branch follows the same flow, except that after the features extracted by the 1st layer of the encoding stage are fused with the up-sampling result, a further 3 × 3 up-sampling operation is performed to obtain the embeddable-branch feature map. In this way, the features extracted in the encoding stage are fully utilized during feature extraction to compensate for the detail information lost in the up-sampling layers.
The lane lines are elongated in shape and occupy a small area in the overall image. Background information is much larger than the lane line information, which can cause category imbalance. In order to eliminate the influence of class imbalance, a weighted cross entropy loss function is adopted to train a binary segmentation network, as shown in formula (3).
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
Wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
To better learn the instance segmentation feature map, a combination of variance loss and distance loss is adopted. The variance loss Lv minimizes the distance between pixels belonging to the same lane line, and the distance loss Ld maximizes the distance between pixels that do not belong to the same lane line. The two losses are defined as follows:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
Where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins.
The specific combination of the variance loss and the distance loss is as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss, respectively. Meanwhile, to ensure that the center μ of each instance does not lie too far from the origin of the image, a regularization term α is added, giving the final loss function L = a1 × Lv + a2 × Ld + α. In this embodiment, a1 = a2 = 1.0 and α = 0.005.
A post-processing module:
The segmentation feature maps of the two branches are obtained by the preceding encoding and decoding processes. The semantic segmentation map generated by the binarization branch is then multiplied element-wise with the segmentation map generated by the embeddable branch, which yields more accurate pixel positions belonging to the lane lines. The result is then sent to the post-processing module. The post-processing module contains a clustering algorithm; a mean clustering algorithm is generally adopted to cluster the pixels into different lanes, thereby obtaining the lane points belonging to each specific lane. The lane line detection flow is shown in fig. 4. To verify the clustering effect, quadratic polynomial fitting is also applied after clustering.
To verify the experimental effect of this embodiment, it was compared with other methods. As can be seen from Table 1, this embodiment not only achieves the highest accuracy but also consumes the least time, and thus better satisfies the real-time requirement of intelligent driving scenarios.
Table 1 Experimental results of different algorithms

Method      Accuracy   FPR     FNR     Time (ms)
Densenet    0.871      0.122   0.575   42
Enet        0.899      0.094   0.436   86
Erfnet      0.940      0.058   0.174   53
Mobilenet   0.719      0.285   0.035   50
Ours        0.949      0.049   0.093   10
Fig. 5 shows the detection effect of this embodiment in different scenarios. It can be seen that the method detects lane lines in different scenes with a good detection effect; the final result images are the results obtained by the method of this embodiment, and the ground-truth images are the actual results.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. The real-time lane line detection method for learning context information by adopting an attention mechanism is characterized by comprising the following steps of: the method comprises the following steps:
S01, an encoding process: inputting an original image into a convolutional neural network, a first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map, fusing the first feature map and the second feature map to output a third feature map, taking the third feature map as the input of the next layer of the convolutional neural network, executing the same operation as the previous layer, and repeating the steps until the last layer, and outputting a third feature map;
S02, a decoding process: dividing the third feature map output by the last layer into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the third feature map output by the last layer extracted in the encoding process, performing a concat operation on the obtained feature map and the features of the third feature map output by the previous layer after a 1 × 1 convolution operation, taking the obtained result as the input of the next layer, then executing the same operation as the previous layer, and so on, and after fusing the third feature map extracted by the 1st layer in the encoding stage with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation again to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and S03, clustering the pixel positions to different lanes to obtain lane points belonging to a specific lane.
2. The method of claim 1, wherein the method comprises the steps of: in the step S01, CBAM is used as the attention model, and after the first feature map is input into the attention model, the channel attention feature map is calculated by using the formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
3. The method of claim 1, wherein the method comprises the steps of: in step S02, a weighted cross entropy loss function is used to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
4. The method of claim 1, wherein the method comprises the steps of: the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins;
the variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
5. The method of claim 4, wherein the method comprises the steps of: in step S03, after clustering, a clustering effect is verified by polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
6. A real-time lane line detection system for learning context information by adopting an attention mechanism, characterized in that it comprises:
the encoding module, which is used for inputting the original image into a convolutional neural network, the first layer of the convolutional neural network performing a convolution operation on the original image to obtain a first feature map; inputting the first feature map into an attention model, the attention model outputting a second feature map, fusing the first feature map and the second feature map to output a third feature map, taking the third feature map as the input of the next layer of the convolutional neural network, executing the same operation as the previous layer, and repeating the steps until the last layer, and outputting a third feature map;
the decoding module, which divides the third feature map output by the last layer into a binarization branch and an embeddable branch; for the binarization branch, first performing an up-sampling operation on the third feature map output by the last layer extracted in the encoding process, performing a concat operation on the obtained feature map and the features of the third feature map output by the previous layer after a 1 × 1 convolution operation, taking the obtained result as the input of the next layer, then executing the same operation as the previous layer, and so on, and after fusing the third feature map extracted by the 1st layer in the encoding stage with the up-sampling result, performing a 3 × 3 convolution operation to obtain the feature map of the binarization branch; for the embeddable branch, adopting the same method as the binarization branch until the third feature map extracted from the first layer in the encoding stage is fused with the up-sampling result, and then performing the up-sampling operation again to obtain the feature map of the embeddable branch; then multiplying the feature map of the binarization branch element-wise with the feature map of the embeddable branch to obtain the pixel positions belonging to the lane lines;
and the post-processing module is used for clustering the pixel positions to different lanes so as to obtain lane points belonging to a specific lane.
7. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: in the encoding module, CBAM is used as the attention model, and after the first feature map is input into the attention model, the channel attention feature map is calculated by using formula (1),
Mc(f) = σ( w1(w0(AP(f))) + w1(w0(MP(f))) )    (1)
then multiplying the channel attention feature map with the first feature map to obtain an intermediate feature map, and then calculating the spatial attention feature map by using formula (2),
Ms(f) = σ( conv([AP(f) ; MP(f)]) )    (2)
finally, multiplying the spatial attention feature map with the intermediate feature map to obtain the third feature map;
wherein f represents a feature map, AP represents an average pooling layer, MP represents a maximum pooling layer, w1 and w0 represent weight information, the symbol ";" in formula (2) denotes the concat operation, and σ denotes the sigmoid function.
8. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: in the decoding module, a weighted cross entropy loss function is adopted to train the binary segmentation network, and the formula is as follows:
Lb = -(1/N) Σ_{i=1}^{N} wi [ yi log(pi) + (1 - yi) log(1 - pi) ]    (3)
wherein Lb represents the loss function, N denotes the number of pixels, yi ∈ {0, 1} is the class of pixel i (0 indicates that the pixel is background, 1 indicates that the pixel is a lane line), pi represents the predicted probability of the class of pixel i, and wi represents the weight information of pixel i.
9. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: the embeddable segmentation network is trained using a combination of a variance loss function Lv and a distance loss function Ld:
Lv = (1/M) Σ_{i=1}^{M} (1/Ni) Σ_{j=1}^{Ni} f( ||μi - pj|| - δv )²

Ld = (1/(M(M-1))) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} f( δd - ||μMi - μMj|| )²
where M is the number of lane lines, Ni is the number of pixels belonging to lane line i, μi is the mean of all pixels belonging to lane line i, pj is the value of pixel j belonging to lane line i, μMi is the mean of all pixels belonging to lane line Mi, μMj is the mean of all pixels belonging to lane line Mj, and δv and δd are distance margins;
the variance loss function Lv and the distance loss function Ld are combined as follows: weight coefficients a1 and a2 are defined for the two losses and multiplied with the calculated variance loss and distance loss respectively; meanwhile, to keep the center of each instance close to the origin, a regularization term α is added, and the final loss function is
L=a1×Lv+a2×Ld+α。
10. The real-time lane line detection system using an attention mechanism to learn context information of claim 6, wherein: in the post-processing module, after clustering, a clustering effect is verified through polynomial fitting; and the polynomial fitting adopts quadratic polynomial fitting.
CN202011193555.4A 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism Active CN112241728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011193555.4A CN112241728B (en) 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011193555.4A CN112241728B (en) 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism

Publications (2)

Publication Number Publication Date
CN112241728A CN112241728A (en) 2021-01-19
CN112241728B true CN112241728B (en) 2023-04-07

Family

ID=74170352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011193555.4A Active CN112241728B (en) 2020-10-30 2020-10-30 Real-time lane line detection method and system for learning context information by adopting attention mechanism

Country Status (1)

Country Link
CN (1) CN112241728B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966569B (en) * 2021-02-09 2022-02-11 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium
CN113158768B (en) * 2021-03-03 2023-02-24 中山大学 Intelligent vehicle lane line detection method based on ResNeSt and self-attention distillation
CN113158790A (en) * 2021-03-15 2021-07-23 河北工业职业技术学院 Processing edge lane line detection system based on geometric context coding network model
CN113011360B (en) * 2021-03-29 2023-11-24 江苏思玛特科技有限公司 Road traffic sign line detection method and system based on attention capsule network model
CN113468967B (en) * 2021-06-02 2023-08-18 北京邮电大学 Attention mechanism-based lane line detection method, attention mechanism-based lane line detection device, attention mechanism-based lane line detection equipment and attention mechanism-based lane line detection medium
CN113591670B (en) * 2021-07-27 2023-12-01 中国科学院合肥物质科学研究院 Lane line detection method based on convolutional neural network
CN114022863B (en) * 2021-10-28 2022-10-11 广东工业大学 Deep learning-based lane line detection method, system, computer and storage medium
CN114550135B (en) * 2022-02-22 2023-04-18 无锡物联网创新中心有限公司 Lane line detection method based on attention mechanism and feature aggregation
CN115131968B (en) * 2022-06-28 2023-07-11 重庆长安汽车股份有限公司 Matching fusion method based on lane line point set and attention mechanism
CN116363712B (en) * 2023-03-21 2023-10-31 中国矿业大学 Palmprint palm vein recognition method based on modal informativity evaluation strategy
CN116343301B (en) * 2023-03-27 2024-03-08 滨州市沾化区退役军人服务中心 Personnel information intelligent verification system based on face recognition
CN116935349B (en) * 2023-09-15 2023-11-28 华中科技大学 Lane line detection method, system, equipment and medium based on Zigzag transformation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010028598A (en) * 2008-07-23 2010-02-04 Sony Corp Wireless communication system, wireless communication apparatus and wireless communication method, encoding device and encoding method, and computer program
US10311578B1 (en) * 2019-01-23 2019-06-04 StradVision, Inc. Learning method and learning device for segmenting an image having one or more lanes by using embedding loss to support collaboration with HD maps required to satisfy level 4 of autonomous vehicles and softmax loss, and testing method and testing device using the same
CN111242037A (en) * 2020-01-15 2020-06-05 华南理工大学 Lane line detection method based on structural information
CN111582201A (en) * 2020-05-12 2020-08-25 重庆理工大学 Lane line detection system based on geometric attention perception
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9865027B2 (en) * 2014-05-09 2018-01-09 Graphiclead LLC System and method for embedding of a two dimensional code with an image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010028598A (en) * 2008-07-23 2010-02-04 Sony Corp Wireless communication system, wireless communication apparatus and wireless communication method, encoding device and encoding method, and computer program
US10311578B1 (en) * 2019-01-23 2019-06-04 StradVision, Inc. Learning method and learning device for segmenting an image having one or more lanes by using embedding loss to support collaboration with HD maps required to satisfy level 4 of autonomous vehicles and softmax loss, and testing method and testing device using the same
CN111242037A (en) * 2020-01-15 2020-06-05 华南理工大学 Lane line detection method based on structural information
CN111582201A (en) * 2020-05-12 2020-08-25 重庆理工大学 Lane line detection system based on geometric attention perception
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Context Modelling Using Hierarchical Attention Networks for Sentiment and Self-assessed Emotion Detection in Spoken Narratives;Lukas Stappen;《ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》;20190417;全文 *
A new DWT video dynamic watermarking algorithm; Liu Yuanyuan; Journal of Jilin University (Engineering and Technology Edition); 2013-03-31; full text *

Also Published As

Publication number Publication date
CN112241728A (en) 2021-01-19

Similar Documents

Publication Publication Date Title
CN112241728B (en) Real-time lane line detection method and system for learning context information by adopting attention mechanism
CN111967305B (en) Real-time multi-scale target detection method based on lightweight convolutional neural network
CN110222591B (en) Lane line detection method based on deep neural network
CN109726627B (en) Neural network model training and universal ground wire detection method
CN108694386B (en) Lane line detection method based on parallel convolution neural network
CN112380921A (en) Road detection method based on Internet of vehicles
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN104809443A (en) Convolutional neural network-based license plate detection method and system
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
CN113723377B (en) Traffic sign detection method based on LD-SSD network
CN111008632B (en) License plate character segmentation method based on deep learning
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN113313031B (en) Deep learning-based lane line detection and vehicle transverse positioning method
Cho et al. Semantic segmentation with low light images by modified CycleGAN-based image enhancement
CN111062347B (en) Traffic element segmentation method in automatic driving, electronic equipment and storage medium
CN112766056A (en) Method and device for detecting lane line in low-light environment based on deep neural network
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN111178363A (en) Character recognition method and device, electronic equipment and readable storage medium
CN114359554A (en) Image semantic segmentation method based on multi-receptive-field context semantic information
CN113011308A (en) Pedestrian detection method introducing attention mechanism
Cho et al. Modified perceptual cycle generative adversarial network-based image enhancement for improving accuracy of low light image segmentation
CN116883912A (en) Infrared dim target detection method based on global information target enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant