CN114743126A - Lane line marking segmentation method based on graph attention mechanism network - Google Patents

Lane line marking segmentation method based on graph attention mechanism network

Info

Publication number
CN114743126A
CN114743126A (application CN202210224636.9A)
Authority
CN
China
Prior art keywords
road condition
feature map
initial
condition image
marking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210224636.9A
Other languages
Chinese (zh)
Inventor
张雯玮
杜杰儒
赵子铭
何为
李凤荣
张质懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hansuo Information Technology Co ltd
Original Assignee
Shanghai Hansuo Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hansuo Information Technology Co ltd filed Critical Shanghai Hansuo Information Technology Co ltd
Priority to CN202210224636.9A priority Critical patent/CN114743126A/en
Publication of CN114743126A publication Critical patent/CN114743126A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a lane marking segmentation method based on a graph attention mechanism network, comprising the following steps: step S1, constructing a traffic marking segmentation model based on the graph attention mechanism network; step S2, collecting a lane marking data set, and training the traffic marking segmentation model on the collected data set to obtain a trained traffic marking segmentation model; step S3, acquiring an initial road condition image, and preprocessing the acquired image to obtain a preprocessed road condition image; and step S4, inputting the preprocessed road condition image into the trained traffic marking segmentation model, obtaining the segmented lane markings, and marking them in the initial road condition image. The invention fully fuses the global features of the image with the features of each sub-region, both spatially and semantically, while screening features by importance, and can therefore accurately segment lane marking information in complex scenes.

Description

Lane line marking segmentation method based on graph attention mechanism network
Technical Field
The invention relates to the technical field of traffic marking segmentation, and in particular to a lane line marking segmentation method based on a graph attention mechanism network.
Background
The growing number of automobiles increases traffic hazards, in part because drivers and vehicles do not recognize traffic marking information well. Traffic marking segmentation technology can provide the driver with good traffic information and assist the driver's judgment, thereby reducing the traffic accident rate; meanwhile, as an important part of environment perception, lane line recognition can provide good lane marking information to a running vehicle and improve the driving experience.
Current lane marking segmentation methods fall into two categories. The first is based on traditional machine learning algorithms such as AdaBoost and support vector machines; these methods mainly rely on an online-trained classifier to distinguish the target from the background, and then use the classifier to locate the target within candidate regions. The second is based on deep learning algorithms such as convolutional neural networks, which are first trained offline on a large-scale lane marking data set and then used to segment the lane markings. The machine learning methods are highly sensitive to illumination and struggle to achieve good recognition under shadow, low brightness, occlusion and motion blur. Deep learning algorithms, relying on strong feature representation capability, can cope with motion blur, occlusion and similar problems, and far exceed traditional machine learning algorithms in segmentation accuracy.
However, current deep learning algorithms focus only on image feature expression and struggle with the interference caused by illumination changes and occlusion. As a result, when lane markings appear in a complex scene, the boundary between foreground and background is not distinct, and the lane marking information cannot be segmented accurately.
Disclosure of Invention
To solve the above problems in the prior art, the invention provides a lane marking segmentation method based on a graph attention mechanism network that can accurately segment lane marking information in complex scenes.
The lane line marking segmentation method based on a graph attention mechanism network provided by the invention comprises the following steps:
step S1, constructing a traffic marking segmentation model based on the graph attention mechanism network;
step S2, collecting a lane marking data set, and training the traffic marking segmentation model according to the collected lane marking data set to obtain a trained traffic marking segmentation model;
step S3, acquiring an initial road condition image, and preprocessing the acquired initial road condition image to obtain a preprocessed road condition image;
and step S4, inputting the preprocessed road condition image into the trained traffic marking segmentation model, obtaining the segmented lane markings, and marking them in the initial road condition image.
Further, the traffic marking segmentation model comprises an initial feature extraction module, a pyramid pooling module, the graph attention mechanism network, an upsampling module and a feed-forward neural network connected in sequence; the upsampling module is additionally connected with a fusion neural network.
Further, the loss function of the traffic marking segmentation model is expressed as:

$L_{total} = \alpha L_{cls} + \beta L_{seg}$

$L_{cls} = \{l_1, l_2, \ldots, l_N\}, \quad l_i = -w_i \left[ y_i \log x_i + (1 - y_i) \log(1 - x_i) \right]$

$L_{seg} = \{l_1, l_2, \ldots, l_N\}, \quad l_i = -w_i \left[ \hat{y}_i \log \hat{x}_i + (1 - \hat{y}_i) \log(1 - \hat{x}_i) \right]$

where $L_{total}$ is the total loss of the traffic marking segmentation model, $\alpha$ and $\beta$ are weighting coefficients, $L_{cls}$ is the classification loss, and $L_{seg}$ is the segmentation loss; $w_i$ is a weight matrix, $y_i$ is the ground-truth classification result, $x_i$ is the model's classification prediction, $\hat{y}_i$ is the ground-truth segmentation result, and $\hat{x}_i$ is the model's segmentation prediction.
Further, the step S3 includes:
step S31, shooting a road condition video through a vehicle-mounted camera, and extracting frames of the road condition video according to a set frame rate to obtain an initial road condition image;
and step S32, adjusting the size of the initial road condition image, and enhancing the resized image to obtain the preprocessed road condition image.
Further, in step S32, the resized initial road condition image is enhanced using RetinexNet.
Further, the step S4 includes:
step S41, inputting the preprocessed road condition image into the initial feature extraction module to obtain an initial feature map;
step S42, inputting the initial feature map into the pyramid pooling module to obtain a global pooling feature map and a plurality of sub-region pooling feature maps;
step S43, inputting the global pooling feature map and the sub-region pooling feature maps into the graph attention mechanism network to obtain a multi-head attention feature map;
step S44, reducing the dimension of the multi-head attention feature map through the fusion neural network, inputting the reduced feature map into the up-sampling module to obtain an up-sampled feature map, and then concatenating the up-sampled feature map with the initial feature map to obtain a fusion feature map;
and step S45, classifying each pixel of the fusion feature map through the feed-forward neural network to obtain the segmented lane markings.
The invention fully fuses the global features of the image with the features of each sub-region, both spatially and semantically, while screening features by importance, and can therefore accurately segment lane marking information in complex scenes.
Drawings
Fig. 1 is a flowchart of a lane marking segmentation method based on a graph attention mechanism network according to the present invention.
Detailed Description
Preferred embodiments of the present invention are described in detail below in conjunction with the accompanying drawings.
As shown in fig. 1, the lane marking segmentation method based on a graph attention mechanism network provided by the invention comprises the following steps:
and step S1, constructing a traffic sign segmentation model based on the graph attention machine mechanism network. Specifically, the traffic sign segmentation model comprises an initial feature extraction module, a pyramid pooling module, a graph attention machine mechanism network, an upsampling module and a feedforward neural network which are connected in sequence, wherein the upsampling module is connected with the fusion neural network.
The loss function of the traffic marking segmentation model is:

$L_{total} = \alpha L_{cls} + \beta L_{seg}$

where $L_{total}$ is the total loss of the traffic marking segmentation model, $\alpha$ and $\beta$ are weighting coefficients, $L_{cls}$ is the classification loss, and $L_{seg}$ is the segmentation loss.

$L_{cls} = \{l_1, l_2, \ldots, l_N\}, \quad l_i = -w_i \left[ y_i \log x_i + (1 - y_i) \log(1 - x_i) \right]$

$L_{seg} = \{l_1, l_2, \ldots, l_N\}, \quad l_i = -w_i \left[ \hat{y}_i \log \hat{x}_i + (1 - \hat{y}_i) \log(1 - \hat{x}_i) \right]$

where $w_i$ is a weight matrix, $y_i$ is the ground-truth classification result, $x_i$ is the model's classification prediction, $\hat{y}_i$ is the ground-truth segmentation result, and $\hat{x}_i$ is the model's segmentation prediction. Here $\hat{y}_i$ and $\hat{x}_i$ comprise the predicted bounding-box center coordinates, width and height, and the contour information of the detected target.
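For illustration, a minimal PyTorch sketch of such a combined loss, assuming both terms take the weighted binary cross-entropy form given above; the class name, default coefficients, and optional per-element weight are illustrative, not from the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

# A sketch of L_total = alpha * L_cls + beta * L_seg, assuming both terms
# are weighted binary cross-entropy as in the formulas above.
class CombinedLoss(nn.Module):
    def __init__(self, alpha=1.0, beta=1.0):
        super().__init__()
        self.alpha = alpha  # weight of the classification loss L_cls
        self.beta = beta    # weight of the segmentation loss L_seg

    def forward(self, cls_pred, cls_true, seg_pred, seg_true, w=None):
        # l_i = -w_i [ y_i log x_i + (1 - y_i) log(1 - x_i) ]
        l_cls = F.binary_cross_entropy(cls_pred, cls_true, weight=w)
        # same form with the ground-truth and predicted segmentation maps
        l_seg = F.binary_cross_entropy(seg_pred, seg_true, weight=w)
        return self.alpha * l_cls + self.beta * l_seg
```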
Step S2, collecting a lane marking data set, and training the traffic marking segmentation model constructed in step S1 on the collected data set to obtain the trained traffic marking segmentation model. During training, the lane marking data set is automatically divided into a training set, a test set and a validation set according to a preset split ratio, and the traffic marking segmentation model is trained on the resulting sets.
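A minimal sketch of such a split in PyTorch follows; the 8:1:1 ratio stands in for the preset split ratio and is an assumption, not a value from the patent.

```python
import torch
from torch.utils.data import random_split

# Split a lane-marking dataset into train/test/validation subsets.
# The ratios and the fixed seed are illustrative choices.
def split_dataset(dataset, ratios=(0.8, 0.1, 0.1)):
    n = len(dataset)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    n_val = n - n_train - n_test  # remainder goes to validation
    return random_split(dataset, [n_train, n_test, n_val],
                        generator=torch.Generator().manual_seed(0))
```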
Step S3, acquiring an initial road condition image, and preprocessing the acquired initial road condition image to obtain a preprocessed road condition image.
Specifically, step S3 includes:
and step S31, shooting the road condition video through the vehicle-mounted camera, and extracting frames from the road condition video according to a set frame rate to obtain a plurality of images, wherein the images form an initial road condition image.
Step S32, adjusting the size of the initial road condition image, and enhancing the resized image to obtain the preprocessed road condition image.
Specifically, the initial road condition images are resized to a uniform 512 × 512 pixels. In addition, because images captured under weak illumination, such as in fog, rain or at night, are of poor quality, with low contrast, severe loss of detail and visibility that fails to meet requirements, low-illumination images need to be enhanced. In this embodiment the images are enhanced with RetinexNet, an effective dim-light image enhancement tool whose principle is as follows: first, the image is decomposed by a decomposition network; second, on the basis of this decomposition, an enhancement network adjusts the illumination map of the image; finally, the reflectance is denoised by joint denoising.
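A sketch of the resizing and enhancement step; RetinexNet itself is not reproduced here, so the `enhance` callable is a hypothetical stand-in for a pretrained decomposition/enhancement/denoising model.

```python
import cv2

# Resize to 512x512 as in the text, then optionally apply a low-light
# enhancement model (e.g. a RetinexNet wrapper) for fog/rain/night frames.
def preprocess(frame, enhance=None):
    resized = cv2.resize(frame, (512, 512), interpolation=cv2.INTER_LINEAR)
    if enhance is not None:         # `enhance` is a hypothetical stand-in
        resized = enhance(resized)  # low-light enhancement
    return resized
```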
Step S4, inputting the preprocessed road condition image into the trained traffic marking segmentation model, obtaining the segmented lane markings, and marking them in the initial road condition image.
Specifically, step S4 includes:
and step S41, inputting the preprocessed road condition image into an initial feature extraction module of the traffic sign segmentation model to obtain an initial feature map. Specifically, the initial feature extraction module is of a residual network structure, and obtains an extracted initial feature map through multilayer convolution operation and a residual structure, and the initial feature map can be used for representing semantic and spatial feature information of road condition image information and possibly contains edge information or detail feature information of different targets in a road condition.
Step S42, inputting the initial feature map into the pyramid pooling module of the traffic marking segmentation model to obtain a global pooling feature map and a plurality of sub-region pooling feature maps. In the pyramid pooling module, the initial feature map is partitioned into sub-regions at several scales, for example 1 × 1, 2 × 2, 3 × 3, 6 × 6 and 12 × 12, and a global average pooling operation is applied to the sub-regions to obtain the pooled feature map for each scale. The global pooling feature map is the pooled feature map of the entire image and carries the background information of the whole image.
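A minimal PyTorch sketch of this pooling step, using the scales named above; the 1 × 1 branch corresponds to the global pooling feature map.

```python
import torch.nn as nn
import torch.nn.functional as F

# Pyramid pooling over the initial feature map at the scales from the
# text; the pixels of each pooled map later serve as graph nodes.
class PyramidPooling(nn.Module):
    def __init__(self, scales=(1, 2, 3, 6, 12)):
        super().__init__()
        self.scales = scales

    def forward(self, x):
        # x: (B, C, H, W) initial feature map. Each s x s output is the
        # global average pooling of the corresponding sub-regions.
        return [F.adaptive_avg_pool2d(x, output_size=s) for s in self.scales]
```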
Step S43, inputting the global pooling feature map and the sub-region pooling feature maps into the graph attention mechanism network of the traffic marking segmentation model to obtain a multi-head attention feature map that represents the semantics of the targets in each sub-region and the interrelation of the sub-regions' spatial structures.
A graph neural network is a neural network structure that operates on graph-structured data. The graph attention mechanism network introduces a multi-head attention mechanism on top of a graph neural network; its computation proceeds as follows:
1) Extract the pixels of the pooled feature maps as nodes, and apply a feature transformation to all nodes that converts the pooled channel number $F$ into the reduced channel number $F'$:

$z_j = W h_j$

where $z_j$ is the transformed feature, $h_j$ is the original input feature of node $j$ in the graph attention network, and $W \in \mathbb{R}^{F' \times F}$ is a weight matrix that carries out the feature transformation for all nodes.
2) Introduce a multi-head attention mechanism on the graph structure. The graph attention network uses self-attention with a shared attention function, computed as:

$e_{ij} = W_i h_i \,\Vert\, W_j h_j$

where $e_{ij}$ is the degree of contribution of the feature of node $j$ to node $i$, $W_i$ is the weight matrix of node $i$, $h_i$ is the original input feature of node $i$, "$\Vert$" denotes vector concatenation, $W_j$ is the weight matrix of node $j$, and $h_j$ is the original input feature of node $j$.
3) Normalize the contribution of each adjacent node $k$ so that the weights are better distributed:

$\alpha_{ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}(e_{ij})\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}(e_{ik})\right)}$

In this step the contributions are activated with a LeakyReLU activation function. Here $\alpha_{ij}$ is the normalized correlation of node $i$ and node $j$; $N_i$ is the set of first-order neighbor nodes of node $i$ (the nodes directly connected to node $i$); $e_{ik}$ is the correlation of node $i$ and node $k$; and $k$ ranges over the first-order neighbors of node $i$.
4) After the contribution of every neighbor of node $i$ has been computed, the features of all neighbors of node $i$ are summed according to these weights, and the sum is taken as the final output of node $i$. To stabilize the learning of the attention mechanism, a multi-head attention mechanism is adopted:

$h_i' = \big\Vert_{k=1}^{K} \, \sigma\left( \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j \right)$

where $h_i'$ is the output of the graph attention network for node $i$, $K$ is the number of attention heads, "$\Vert$" denotes concatenation of the results of the $K$ heads, $\sigma$ is a sigmoid activation function, $\alpha_{ij}^{k}$ is the attention value (correlation) of the $k$-th head for node $i$ and node $j$, $W^{k}$ is the transformation matrix of the $k$-th head, and $h_j$ is the input to the graph attention network.
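A sketch of one such multi-head graph attention layer in PyTorch. It follows the formulas above, completing the concatenation-based score with a shared attention vector (the standard graph attention formulation) and, for simplicity, letting every node attend to every other node; both choices are assumptions, not details from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Multi-head graph attention: shared linear transform z_j = W h_j,
# concatenation-based scores e_ij, LeakyReLU before softmax, and K
# concatenated heads passed through a sigmoid, as in the text.
class GraphAttentionLayer(nn.Module):
    def __init__(self, f_in, f_out, heads=4):
        super().__init__()
        self.heads = heads
        self.W = nn.Linear(f_in, f_out * heads, bias=False)   # per-head feature transform
        self.a = nn.Parameter(torch.empty(heads, 2 * f_out))  # shared attention function
        nn.init.xavier_uniform_(self.a)

    def forward(self, h):
        # h: (N, F) node features; fully connected adjacency for simplicity.
        n = h.size(0)
        z = self.W(h).view(n, self.heads, -1)                  # (N, K, F')
        zi = z.unsqueeze(1).expand(n, n, self.heads, z.size(-1))
        zj = z.unsqueeze(0).expand(n, n, self.heads, z.size(-1))
        # e_ij from the concatenated pair [z_i || z_j], projected by a
        e = torch.einsum("ijhf,hf->ijh", torch.cat([zi, zj], dim=-1), self.a)
        alpha = torch.softmax(F.leaky_relu(e), dim=1)          # normalize over neighbors j
        out = torch.einsum("ijh,jhf->ihf", alpha, z)           # weighted sum of neighbor features
        return torch.sigmoid(out.reshape(n, -1))               # concatenate the K heads

# Example use (batch size 1 assumed): flatten pooled-map pixels into nodes.
# nodes = torch.cat([p.flatten(2).squeeze(0).t() for p in pooled_maps])  # (N, C)
# gat = GraphAttentionLayer(f_in=nodes.size(1), f_out=32, heads=4)
# out = gat(nodes)
```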
Step S44, reducing the dimension of the multi-head attention feature map obtained in step S43 through the fusion neural network of the traffic marking segmentation model, inputting the reduced feature map into the up-sampling module of the model to obtain an up-sampled feature map of the same size as the initial feature map obtained in step S41, and then concatenating the up-sampled feature map with the initial feature map to obtain a fusion feature map.
The purpose of reducing the dimension of the multi-head attention feature map is to fuse the multi-head attention features, letting the network better learn the relationships among them and extract the important features.
Step S45, classifying each pixel of the fusion feature map through the feed-forward neural network of the traffic marking segmentation model to obtain the segmented lane markings. Specifically, for each pixel of the fusion feature map the network judges whether it belongs to the predicted target; if so, the pixel is labeled as lane marking, otherwise as background. In this way a complete lane marking shape is finally formed.
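A sketch of steps S44 and S45 combined: a 1 × 1 convolution stands in for the fusion neural network, followed by upsampling, concatenation with the initial feature map, and a per-pixel feed-forward classifier. All channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dimension reduction -> upsampling -> fusion -> per-pixel classification.
class FusionHead(nn.Module):
    def __init__(self, attn_channels, init_channels, hidden=64):
        super().__init__()
        self.reduce = nn.Conv2d(attn_channels, hidden, kernel_size=1)  # fusion network (dim reduction)
        self.classifier = nn.Sequential(                               # feed-forward per-pixel head
            nn.Conv2d(hidden + init_channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=1),                       # lane-marking vs background logit
        )

    def forward(self, attn_map, init_map):
        x = self.reduce(attn_map)
        x = F.interpolate(x, size=init_map.shape[2:],                  # match the initial feature map
                          mode="bilinear", align_corners=False)
        fused = torch.cat([x, init_map], dim=1)                        # fusion feature map
        return torch.sigmoid(self.classifier(fused))                   # per-pixel probability
```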
The invention fully fuses the global features of the image with the features of each sub-region, both spatially and semantically, while screening features by importance, and can therefore accurately segment lane marking information in complex scenes. Compared with traditional purely vision-based semantic segmentation methods, the method of the invention does not need to build large amounts of carefully designed, highly task-specific expert knowledge into the algorithm, and is simpler and more convenient to implement. In addition, the invention can be applied to other computer vision scenarios, has strong extensibility, and can be integrated into more general intelligent systems.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit its scope; various changes may be made to them. All simple equivalent changes and modifications made according to the claims and the description of the present application fall within the scope of the claims of this patent application. Conventional technical content is not described in detail herein.

Claims (6)

1. A lane marking segmentation method based on a graph attention mechanism network, characterized by comprising the following steps:
step S1, constructing a traffic marking segmentation model based on the graph attention mechanism network;
step S2, collecting a lane marking data set, and training the traffic marking segmentation model according to the collected lane marking data set to obtain a trained traffic marking segmentation model;
step S3, acquiring an initial road condition image, and preprocessing the acquired initial road condition image to obtain a preprocessed road condition image;
and step S4, inputting the preprocessed road condition image into the trained traffic marking segmentation model, obtaining the segmented lane markings, and marking them in the initial road condition image.
2. The lane marking segmentation method based on the graph attention mechanism network according to claim 1, wherein the traffic marking segmentation model comprises an initial feature extraction module, a pyramid pooling module, the graph attention mechanism network, an upsampling module and a feed-forward neural network connected in sequence; the upsampling module is additionally connected with a fusion neural network.
3. The lane marking segmentation method based on the graph attention mechanism network according to claim 1, wherein the loss function of the traffic marking segmentation model is expressed as:

$L_{total} = \alpha L_{cls} + \beta L_{seg}$

$L_{cls} = \{l_1, l_2, \ldots, l_N\}, \quad l_i = -w_i \left[ y_i \log x_i + (1 - y_i) \log(1 - x_i) \right]$

$L_{seg} = \{l_1, l_2, \ldots, l_N\}, \quad l_i = -w_i \left[ \hat{y}_i \log \hat{x}_i + (1 - \hat{y}_i) \log(1 - \hat{x}_i) \right]$

where $L_{total}$ is the total loss of the traffic marking segmentation model, $\alpha$ and $\beta$ are weighting coefficients, $L_{cls}$ is the classification loss, and $L_{seg}$ is the segmentation loss; $w_i$ is a weight matrix, $y_i$ is the ground-truth classification result, $x_i$ is the model's classification prediction, $\hat{y}_i$ is the ground-truth segmentation result, and $\hat{x}_i$ is the model's segmentation prediction.
4. The lane marking segmentation method based on the graph attention mechanism network according to claim 1, wherein the step S3 includes:
step S31, shooting road condition videos through a vehicle-mounted camera, and extracting frames of the road condition videos according to a set frame rate to obtain initial road condition images;
and step S32, adjusting the size of the initial road condition image, and enhancing the initial road condition image after the size is adjusted to obtain a road condition image after preprocessing.
5. The lane marking segmentation method based on the graph attention mechanism network according to claim 4, wherein in step S32 the initial road condition image is enhanced using RetinexNet.
6. The lane marking segmentation method based on the graph attention mechanism network according to claim 2, wherein the step S4 comprises:
step S41, inputting the preprocessed road condition image into the initial feature extraction module to obtain an initial feature map;
step S42, inputting the initial feature map into the pyramid pooling module to obtain a global pooling feature map and a plurality of sub-region pooling feature maps;
step S43, inputting the global pooling feature map and the sub-region pooling feature maps into the graph attention mechanism network to obtain a multi-head attention feature map;
step S44, reducing the dimension of the multi-head attention feature map through the fusion neural network, inputting the reduced feature map into the up-sampling module to obtain an up-sampled feature map, and then concatenating the up-sampled feature map with the initial feature map to obtain a fusion feature map;
and step S45, classifying each pixel of the fusion feature map through the feed-forward neural network to obtain the segmented lane markings.
CN202210224636.9A 2022-03-09 2022-03-09 Lane line marking segmentation method based on graph attention mechanism network Pending CN114743126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210224636.9A CN114743126A (en) 2022-03-09 Lane line marking segmentation method based on graph attention mechanism network

Publications (1)

Publication Number Publication Date
CN114743126A (en) 2022-07-12

Family

ID=82275305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210224636.9A Pending CN114743126A (en) Lane line marking segmentation method based on graph attention mechanism network

Country Status (1)

Country Link
CN (1) CN114743126A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294548A (en) * 2022-07-28 2022-11-04 烟台大学 Lane line detection method based on position selection and classification method in row direction
CN115294548B (en) * 2022-07-28 2023-05-02 烟台大学 Lane line detection method based on position selection and classification method in row direction
CN116071374A (en) * 2023-02-28 2023-05-05 华中科技大学 Lane line instance segmentation method and system
CN116071374B (en) * 2023-02-28 2023-09-12 华中科技大学 Lane line instance segmentation method and system

Similar Documents

Publication Publication Date Title
CN106845478B (en) A kind of secondary licence plate recognition method and device of character confidence level
Roychowdhury et al. Machine learning models for road surface and friction estimation using front-camera images
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
CN108875608B (en) Motor vehicle traffic signal identification method based on deep learning
CN111461039B (en) Landmark identification method based on multi-scale feature fusion
CN111723693B (en) Crowd counting method based on small sample learning
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN113902915A (en) Semantic segmentation method and system based on low-illumination complex road scene
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN113255659B (en) License plate correction detection and identification method based on MSAFF-yolk 3
CN108416780B (en) Object detection and matching method based on twin-region-of-interest pooling model
CN114743126A (en) 2022-07-12 Lane line marking segmentation method based on graph attention mechanism network
CN111160407A (en) Deep learning target detection method and system
CN112861970B (en) Fine-grained image classification method based on feature fusion
CN111461213A (en) Training method of target detection model and target rapid detection method
CN113344932A (en) Semi-supervised single-target video segmentation method
CN113610144A (en) Vehicle classification method based on multi-branch local attention network
CN112686242B (en) Fine-grained image classification method based on multilayer focusing attention network
CN112801182A (en) RGBT target tracking method based on difficult sample perception
CN110889360A (en) Crowd counting method and system based on switching convolutional network
Liang et al. Cross-scene foreground segmentation with supervised and unsupervised model communication
CN116883650A (en) Image-level weak supervision semantic segmentation method based on attention and local stitching
CN116129291A (en) Unmanned aerial vehicle animal husbandry-oriented image target recognition method and device
CN111368845A (en) Feature dictionary construction and image segmentation method based on deep learning
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination