CN116092034A - Lane line detection method based on improved DeepLabV3+ model - Google Patents

Lane line detection method based on improved DeepLabV3+ model

Info

Publication number
CN116092034A
CN116092034A (application CN202310058401.1A)
Authority
CN
China
Prior art keywords
lane line
deep
model
feature map
attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310058401.1A
Other languages
Chinese (zh)
Inventor
马晨旭
李景昂
韩永华
丁一凡
孙子昂
崔雨欣
余见楚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sci Tech University ZSTU
Original Assignee
Zhejiang Sci Tech University ZSTU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sci Tech University ZSTU filed Critical Zhejiang Sci Tech University ZSTU
Priority to CN202310058401.1A
Publication of CN116092034A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lane line detection method based on an improved DeepLabV3+ model. The method obtains a lane line image dataset, from which a training sample set and final labels are produced; an improved DeepLabV3+ model is constructed, training samples are input into a MobileNetV2 network to obtain shallow features and deep features, the deep features are input into a multi-scale feature enhancement ASPP module to obtain a first fusion feature map, a deep feature map is then obtained through a block-weighted dual attention mechanism module, the deep feature map is up-sampled and fused with the shallow features, and the fused data are passed through the block-weighted dual attention mechanism module and up-sampling in turn to obtain sample prediction data; the improved DeepLabV3+ model is trained on the training sample set with a two-class cross entropy loss function and a set similarity measurement function to obtain a lane line detection model. The method can detect lane lines accurately and in real time.

Description

Lane line detection method based on improved DeepLabV3+ model
Technical Field
The invention belongs to the field of image data processing, and relates to a lane line detection method based on an improved DeepLabV3+ model.
Background
Environment perception is one of the three core technologies of automatic driving: it provides the automatic driving vehicle with correct information about its surroundings, from which the various driving parameters of the vehicle are determined. Lane line detection based on image recognition is a core part of environment perception, so accurate lane line detection plays a vital role in allowing an unmanned vehicle to participate in traffic safely and efficiently.
Before deep learning was applied on a large scale across industries, computers were mainly made to recognize lane lines through traditional methods built on intuitive cues and mathematical models. Traditional methods fall mainly into feature-based methods and model-based methods. Feature-based algorithms classify the pixels in an image by extracting high-order features of the lane lines such as texture, color and shape; model-based algorithms match a preset mathematical model through feature extraction and then determine the parameters of the model, thereby fitting the lane lines. Lane line detection methods based on machine vision face several problems: 1) Because illumination and weather easily change the imaged pixel values of the same scene, traditional methods that operate on pixel values are easily disturbed by illumination and weather. 2) In model-based methods, different models correspond to different numbers of parameters; selecting an unsuitable model can make the model slow to solve, and a lane line shape that does not match the assumed model leads to poor detection results.
Lane line detection methods based on deep learning let the network learn lane line features autonomously, at multiple angles and multiple levels and with greater accuracy, by designing and training a neural network, and are more robust and perform better than traditional methods. Most current image segmentation models build on the fully convolutional neural network, which takes an encoder-decoder structure as its main network structure and accepts input images of arbitrary size. Such a model first uses convolution operations to extract the features of the target object, then uses transposed convolution to up-sample the feature map output by the last convolution layer so that it is gradually restored to the original image size, and finally adds skip connections to achieve higher-precision semantic segmentation. On the basis of this encoder-decoder structure, general semantic segmentation models such as the Segnet model, the PSPnet model and the DeepLab series were subsequently developed. These network models have been further improved by later work and applied to lane line detection.
The literature Moujtahid S, Benmokhtar R, Breheret A, et al. Spatial-UNet: Deep Learning-Based Lane Detection Using Fisheye Cameras for Autonomous Driving [C]// International Conference on Image Analysis and Processing. Springer, Cham, 2022: 576-586 discloses a Spatial-UNet model that combines lane line location information and prior information on the basis of the U-Net model to effectively detect and infer lane lines captured by a fisheye camera. The model achieves a good detection effect, but it does not reconcile lane line detection accuracy with real-time performance well, and it lacks comprehensive consideration of complex cases such as large-curvature lane lines.
Disclosure of Invention
The invention provides a lane line detection method based on an improved DeepLabV3+ model, which can detect lane lines accurately and in real time.
A lane line detection method based on an improved DeepLabV3+ model, comprising:
obtaining a lane line image data set and an initial label thereof, preprocessing the lane line image data set to obtain a training sample set, and reserving a lane line part in the initial label to obtain a final label;
constructing an improved DeepLabV3+ model, wherein the improved DeepLabV3+ model comprises a MobilenetV2 network, a multi-scale feature enhanced ASPP module and a block weighting double-attention mechanism module; inputting training samples into a MobilenetV2 network to obtain shallow features and deep features, inputting the deep features into a multi-scale feature enhancement ASPP module to obtain a first fusion feature map, wherein the block weighted double-attention mechanism module comprises a channel attention mechanism, a partition weighted space attention mechanism and a pixel weighted space attention mechanism, distributing weights to feature channels of the first fusion feature map through the channel attention mechanism to obtain a first distribution weight feature map, carrying out region weight division on the first distribution weight feature map through the partition weighted space attention mechanism to obtain a second distribution weight feature map, carrying out pixel weight division on the second distribution weight feature map through the pixel weighted space attention mechanism to obtain a deep feature map, carrying out up-sampling on the deep feature map, fusing the up-sampling result and the shallow features through layer jump connection to obtain a second fusion feature map, inputting the second fusion feature map into the block weighted double-attention mechanism module again to obtain a final fusion feature map, and carrying out up-sampling on the final fusion feature map to obtain sample prediction data;
constructing a total loss function, wherein the total loss function comprises a two-class cross entropy loss function and a set similarity measurement function, and the two-class cross entropy loss function and the similarity measurement function are respectively constructed based on sample prediction data and a final label; training the improved DeepLabV3+ model through the total loss function based on the training sample set to obtain a lane line detection model;
when the method is applied, the lane line image data are input into a lane line detection model to obtain a lane line detection diagram.
The lane line image dataset includes large curvature lane line image data, lane line breakage image data, long straight lane line image data, and large curvature glare road segment image data.
The preprocessing of the lane line image dataset to obtain a training sample set comprises the following steps: and cutting the lane line image data set to reserve a lane line part to finish the purification of the lane line image data, and rotating, compressing or manually adding noise to the purified result to obtain a training sample set.
The step of reserving the lane line part in the initial label to obtain a final label comprises the following steps: the pixel value of the lane line in the initial label is assigned to 255, and the pixel value of the other traffic rule information is assigned to 0.
The multi-scale feature enhancement ASPP module comprises atrous depthwise separable convolutions with sampling rates of 3, 6, 9, 12, 15, 18 and 24 respectively, a 1×1 convolution and an Image Pooling branch; the deep features are input into each of these branches, and the obtained results are fused to obtain the first fusion feature map.
The step of obtaining a first distribution weight graph by distributing weights to the feature channels of the first fusion feature graph through a channel attention mechanism comprises the following steps:
the channel attention mechanism comprises a maximum pooling layer, a global average pooling layer and a Sigmoid function, the first fusion feature map is respectively input into the maximum pooling layer and the global average pooling layer, the obtained maximum pooling result and the global average pooling result are added and then activated through the Sigmoid function to obtain the weight distributed to each feature channel in the first fusion feature map, and the obtained weight is multiplied with the corresponding first fusion feature map to obtain the first distribution weight feature map.
The step of performing region weight division on the first distribution weight graph through a region weighted spatial attention mechanism comprises the following steps:
the first distribution weight graph is divided into a plurality of areas by reducing the dimension of the characteristic channel into one characteristic channel through average pooling, each area is respectively subjected to maximum pooling and average pooling, the obtained maximum pooling result and the obtained average pooling result are summed, the summed result sequentially passes through a plurality of convolution and activation functions to obtain area weight information of different areas, and the area weight information and the first distribution weight feature graph are multiplied to obtain the second distribution weight feature graph.
The total loss function L is:
$$L = L_{BCE} + L_{dice}$$

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\Big]$$

$$L_{dice} = 1 - \frac{2\sum_{i=1}^{N} y_i\,\hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N}\hat{y}_i}$$

wherein L_BCE is the two-class cross entropy loss function, L_dice is the set similarity measurement function, y_i is the final label of the i-th pixel point, ŷ_i is the prediction for the i-th pixel point, and N is the number of pixel points in the image; lane line pixels are 1 and background pixels are 0. The two-class cross entropy loss is weighted so that the lane line and the background receive different weights, which improves the segmentation precision of the model in lane line detection.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a multi-scale feature enhancement ASPP module, which combines parallel cavity convolution layers with different sampling rates, improves the recognition capability of a model on edge lane lines and far lane lines, secondly utilizes depth separable convolution, reduces the quantity of model parameters, reduces the memory overhead of a computer, and then adds a block weighting dual-attention mechanism module after the multi-scale feature enhancement ASPP module, reasonably adjusts attention resources in two aspects of channels and spaces, so that the feature information with strong image channel and space characterization capability is fully utilized. The method aims to solve the problem of unbalance of positive and negative samples of the lane line data set. The improved deep labV < 3+ > model provided by the invention can give consideration to the average intersection ratio and the detection speed, and has good detection effect on the lane line.
Drawings
FIG. 1 is a flow chart of a lane line detection method based on an improved DeepLabV3+ model provided by an embodiment of the invention;
fig. 2 is a lane line image data diagram provided in an embodiment of the present invention, in which fig. 2 (a) is lane line image data with a large curvature, fig. 2 (b) is lane line broken image data, fig. 2 (c) is long straight lane line image data, and fig. 2 (d) is a large curvature glare road section image data.
Fig. 3 is a label data diagram provided in an embodiment of the present invention, where fig. 3 (a) is an initial label before processing, and fig. 3 (b) is a final label after processing.
FIG. 4 is a structural diagram of the improved DeepLabV3+ model according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating a block-weighted dual attention mechanism module (CBEAM) flow according to an embodiment of the present invention;
FIG. 6 is a block diagram of the channel attention mechanism (CAM) provided by an embodiment of the present invention;
FIG. 7 is a block diagram of a partitioned weighted spatial attention mechanism (BEAM) provided by an embodiment of the present invention;
FIG. 8 is a block diagram of a pixel weighted Spatial Attention Mechanism (SAM) provided by an embodiment of the present invention;
fig. 9 is a lane line detection chart generated by different models provided in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides a lane line detection method based on an improved DeepLabV3+ model, which is shown in figure 1 and comprises the following steps:
(1) Obtaining a training sample set and final labels: this embodiment selects a lane line image dataset from the second-round dataset of the unmanned vehicle lane line detection challenge provided by Baidu, collected on parts of the road networks of the two cities of Beijing and Shanghai. As shown in fig. 2 (a) - 2 (d), it includes large-curvature lane line image data, broken lane line image data, long straight lane line image data and large-curvature glare road section image data. Each image in the lane line image dataset is 3384×1710 pixels in size; the lane lines were marked on a three-dimensional point cloud and then projected onto the two-dimensional plane.
Because the upper part of each image in the lane line image dataset contains a large number of objects irrelevant to lane line detection, such as sky and trees, and in order to speed up training while keeping a sufficient field of view, this embodiment uses cropping to keep roughly the lower 2/3 of the image, takes the resulting 3384×1020-pixel region as the region of interest, and then reduces it proportionally to 1128×340 pixels to prevent GPU memory overflow during training, thereby obtaining the purified lane line image dataset.
This embodiment amplifies the small number of large-curvature lane line images in the purified lane line image dataset by horizontal flipping to balance the sample classes; during training, the images in each batch are randomly cropped and their brightness, contrast and saturation are randomly adjusted, simulating lighting changes during driving while maintaining the training speed. The augmented lane line image dataset contains 11608 images in total and is used as the training sample set.
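For illustration, a minimal sketch of the cropping, scaling and augmentation steps just described is given below, assuming torchvision is available; the crop height of 1020 pixels, the 1128×340 target size and the flip/crop/color-jitter operations follow this embodiment, while the jitter magnitudes, the crop size and the helper name `preprocess` are illustrative assumptions.

```python
import torchvision.transforms as T
from PIL import Image

def preprocess(img_path):
    """Crop the lower road region of a 3384x1710 source image and shrink it 3x,
    following the region-of-interest values reported in this embodiment."""
    img = Image.open(img_path)
    img = img.crop((0, 1710 - 1020, 3384, 1710))   # keep the lower 3384x1020 region
    return img.resize((1128, 340))                  # equal-proportion reduction

# Per-batch augmentation used during training (jitter magnitudes are illustrative).
train_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                                  # balance curved-lane samples
    T.RandomCrop((320, 1080), pad_if_needed=True),                  # random crop inside the ROI
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),    # simulate lighting changes
    T.ToTensor(),
])
```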
The initial labels provided in this embodiment include various markings on the road that are irrelevant to lane line detection, such as arrows and zebra crossings. The traffic rule information represented by these markings is removed to reduce the difficulty of model learning and improve the detection accuracy of the lane lines, so the pixel values of all such classes are assigned 0, i.e. they are treated as background. Fig. 3 (a) shows the labels before processing, which include blue, red, green and other classes; only the lane line classes, i.e. the red and blue labels, are retained and assigned 255, and all remaining labels are assigned 0 as the final label, as shown in fig. 3 (b).
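A possible rendering of this label simplification is sketched below, assuming the initial labels are integer class-index masks; the parameter `lane_ids` is a hypothetical placeholder for whichever class ids encode the lane line (red and blue) classes in the dataset's annotation scheme.

```python
import numpy as np

def binarize_label(label, lane_ids):
    """Keep only the lane-line classes: lane pixels -> 255, everything else -> 0.
    `label` is an integer class-index mask; `lane_ids` is a hypothetical list of the
    class ids that encode lane lines in the dataset's annotation scheme."""
    out = np.zeros_like(label, dtype=np.uint8)
    out[np.isin(label, lane_ids)] = 255
    return out
```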
(2) Construction of the improved DeepLabV3+ model: the improved DeepLabV3+ model comprises an encoding layer and a decoding layer. The encoding layer comprises a MobileNetV2 network, a multi-scale feature enhancement ASPP module and a block-weighted dual attention mechanism module (CBEAM); the decoding layer comprises a block-weighted dual attention mechanism module (CBEAM). As shown in fig. 4, the training samples are input into the MobileNetV2 network for preliminary extraction of lane line feature information to obtain deep features and shallow features: the 4-times down-sampled output is used as the shallow features and the 16-times down-sampled output is used as the deep features. The deep features are then input into the multi-scale feature enhancement ASPP module, where parallel atrous convolution layers with more sampling rates extract features under richer receptive fields, and all feature maps extracted in parallel are concatenated (Concat) along the channel dimension to obtain the first fusion feature map. The first fusion feature map is then input into the block-weighted dual attention mechanism module (CBEAM) to obtain the deep feature map, so that attention is effectively allocated to different features in both the spatial and channel dimensions. The deep feature map is compressed by a 1×1 convolution to reduce the number of channels and the computational complexity, and is then restored by 4-times up-sampling. In view of the importance of lane line edges to lane line detection, the 4-times up-sampled deep feature map and the shallow features are fused through a layer-jump (skip) connection, and the strong local feature characterization capability of the shallow features is used to refine the boundary and alleviate the problem of blurred edge points. The fused result is input into a block-weighted dual attention mechanism module (CBEAM) to obtain the second fusion feature map, i.e. feature channel and spatial attention resources are allocated to the fused result again. Finally, the prediction result of the second fusion feature map is fine-tuned by a 3×3 convolution kernel, and 4-times up-sampling is used to restore the feature map to the size of the input image, realizing pixel-by-pixel segmentation and obtaining the sample prediction data map.
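The encoder-decoder data flow described above can be sketched roughly as follows, assuming a MobileNetV2 backbone that returns the 4x and 16x down-sampled features and module implementations such as those sketched later in this description; the channel sizes (48, 96, 256), the layer choices and the number of output classes are illustrative assumptions rather than the patented configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImprovedDeepLabV3Plus(nn.Module):
    """Encoder: MobileNetV2 -> multi-scale ASPP -> CBEAM.
    Decoder: 1x1 channel reduction -> 4x upsample -> skip fusion with shallow
    features -> CBEAM -> 3x3 conv -> 4x upsample to the input size."""
    def __init__(self, backbone, aspp, cbeam_deep, cbeam_fused,
                 shallow_ch=24, deep_ch=256, num_classes=2):
        super().__init__()
        self.backbone = backbone              # returns (4x shallow, 16x deep) features
        self.aspp = aspp                      # multi-scale feature enhancement ASPP
        self.cbeam_deep = cbeam_deep          # block-weighted dual attention (encoder)
        self.cbeam_fused = cbeam_fused        # block-weighted dual attention (decoder)
        self.reduce = nn.Conv2d(deep_ch, 48, 1)        # 1x1 conv to compress channels
        self.shallow_proj = nn.Conv2d(shallow_ch, 48, 1)
        self.refine = nn.Conv2d(96, num_classes, 3, padding=1)  # 3x3 fine-tuning conv

    def forward(self, x):
        shallow, deep = self.backbone(x)
        deep = self.cbeam_deep(self.aspp(deep))        # first fusion map + attention
        deep = F.interpolate(self.reduce(deep), scale_factor=4,
                             mode='bilinear', align_corners=False)
        fused = torch.cat([deep, self.shallow_proj(shallow)], dim=1)  # layer-jump fusion
        fused = self.cbeam_fused(fused)                # second attention pass
        return F.interpolate(self.refine(fused), scale_factor=4,
                             mode='bilinear', align_corners=False)
```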
The embodiment provides the specific steps of inputting deep features into the multi-scale feature enhanced ASPP module, which are as follows:
the ASPP module provided in this embodiment includes a separable convolution of hole depths, a convolution of 1×1, and an Image Pooling with sampling rates of 3,6,9,12,15,18 and 24, and deep features are input to the separable convolution of hole depths, the convolution of 1×1, and the Image Pooling of 3,6,9,12,15,18 and 24, respectively, and the obtained results are fused to obtain a first fusion feature map. Compared with other detection objects, the shape and distribution of the lane lines are unstable, various types such as long straight lane lines, block-shaped segmented lane lines, blocked lane lines, curve lane lines and the like coexist, and the lane line distribution conditions of expressways, suburban road sections and urban complicated road sections are greatly different. The feature information of more scales of the lane lines is enriched by using a hole convolution layer with more sampling rates on the basis of the original ASPP structure (hole space pyramid pooling) by combining the characteristic that the edge lane lines and the far lane lines are high in omission ratio, and the accuracy of detecting the lane lines by the model is improved by fusing the feature information of different receptive fields of the model under multiple scales. In addition, the common cavity convolution in ASPP is replaced by the cavity depth separable convolution, so that the parameter number is effectively reduced, the model performance is improved, and the real-time performance of the model for detecting the lane line is further met.
The embodiment provides the specific steps of inputting the first fusion feature map to a block weighted dual attention mechanism module (CBEAM) to obtain a deep feature map:
As shown in fig. 5, the block-weighted dual attention mechanism module (CBEAM) provided in this embodiment comprises a channel attention mechanism (CAM, Channel Attention Mechanism), a partition-weighted spatial attention mechanism (BEAM, Block Empowerment Attention Module) and a pixel-weighted spatial attention mechanism (SAM, Spatial Attention Module). In the encoder, the first distribution weight feature map is obtained by assigning weights to the feature channels of the first fusion feature map through the channel attention mechanism (CAM), the second distribution weight feature map is obtained by dividing the first distribution weight feature map into weighted regions through the partition-weighted spatial attention mechanism (BEAM), and the deep feature map is obtained by assigning per-pixel weights to the second distribution weight feature map through the pixel-weighted spatial attention mechanism (SAM).
The embodiment provides that in the encoder, the specific steps of assigning weights to the feature channels of the first fusion feature map by a Channel Attention Mechanism (CAM) to obtain a first assigned weight map are as follows:
as shown in fig. 6, the first fused feature map is input into a maximum pooling layer and a global average pooling layer respectively, the obtained maximum pooling result and the global average pooling result are added and activated by a Sigmoid function to obtain the weight allocated to each feature channel in the first fused feature map, and the obtained weight is multiplied by the corresponding first fused feature map to obtain a first allocation weight feature map, so that the effect of allocating the attention resource among channels is achieved.
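A minimal sketch of this channel attention step, assuming the channel weights come directly from the added max- and average-pooling results passed through a Sigmoid, as described above:

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: per-channel weights from the sum of global max- and average-pooling,
    activated with Sigmoid and multiplied back onto the input feature map."""
    def __init__(self):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: (B, C, H, W) first fusion feature map
        weights = self.sigmoid(self.max_pool(x) + self.avg_pool(x))   # (B, C, 1, 1)
        return x * weights                     # first distribution weight feature map
```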
Similarly, in the spatial dimension, the importance of information at different locations also differs, so a spatial attention mechanism is needed to allocate attention resources. This embodiment provides a partition-weighted spatial attention mechanism module (BEAM) customized for the characteristics of lane line detection input images. When an input image in a lane line detection scene is divided into regions, it can be seen that some regions contain only irrelevant elements such as sky, asphalt road without lane lines, green plants beside the lane or the vehicle hood, and their importance is clearly lower than that of the regions containing lane lines. Based on this, as shown in fig. 7, in the encoder the BEAM first reduces the channel dimension of the output of the channel attention mechanism (CAM) to 1 channel through average pooling, then divides it equally into 16 regions, performs maximum pooling and average pooling on each region respectively, superimposes the results, learns the weight of each region through three 3×3 convolutions, and finally obtains the final weight information through a sigmoid function. The weight information is multiplied with the initial input feature map, giving a block-weighted feature map. After the block weighting, a pixel-weighted spatial attention mechanism (SAM) is connected in series to fine-tune the weights and obtain the deep feature map. The BEAM weights the spatial attention resources by block, so that attention is first allocated over large regions, while the SAM weights the spatial attention of the whole image, so that attention is allocated a second time at the level of individual pixels, improving the utilization of spatial attention resources.
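A sketch of the BEAM under the assumption that the 16 regions form a 4×4 grid (the text only states that the map is divided equally into 16 areas); the ReLU activations between the three 3×3 convolutions are an illustrative choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockEmpowermentAttention(nn.Module):
    """BEAM: block-weighted spatial attention. A 4x4 grid of 16 regions is assumed;
    the description only states that the map is divided equally into 16 areas."""
    def __init__(self, grid=(4, 4)):
        super().__init__()
        self.grid = grid
        self.convs = nn.Sequential(
            nn.Conv2d(1, 1, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1, 1, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1, 1, 3, padding=1),
        )

    def forward(self, x):                              # x: first distribution weight feature map
        squeezed = x.mean(dim=1, keepdim=True)         # average-pool channels down to 1
        region = (F.adaptive_max_pool2d(squeezed, self.grid)
                  + F.adaptive_avg_pool2d(squeezed, self.grid))        # per-region statistics
        weights = torch.sigmoid(self.convs(region))                    # one weight per region
        weights = F.interpolate(weights, size=x.shape[2:], mode='nearest')  # broadcast to pixels
        return x * weights                             # second distribution weight feature map
```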
The pixel-weighted spatial attention mechanism SAM provided in this embodiment is shown in fig. 8. In the encoder, the SAM generates two feature maps through global average pooling and global maximum pooling along the channel dimension; after merging, feature information is extracted through a 7×7 convolution kernel, per-pixel weights are generated with a sigmoid function and applied to the feature map output by the BEAM to obtain the deep feature map, thereby enhancing the model's ability to identify the target area.
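A sketch of the SAM, together with a CBEAM wrapper that chains the three mechanisms in the order described above; it assumes the ChannelAttention and BlockEmpowermentAttention sketches given earlier in this description.

```python
import torch
import torch.nn as nn

class PixelSpatialAttention(nn.Module):
    """SAM: per-pixel weights from channel-wise max and mean maps merged by a 7x7 conv."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                    # x: BEAM output
        max_map, _ = x.max(dim=1, keepdim=True)              # channel-wise maximum
        avg_map = x.mean(dim=1, keepdim=True)                # channel-wise average
        weights = self.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * weights                                   # deep feature map

class CBEAM(nn.Module):
    """Block-weighted dual attention module: CAM -> BEAM -> SAM in series,
    reusing the ChannelAttention and BlockEmpowermentAttention sketches above."""
    def __init__(self):
        super().__init__()
        self.cam = ChannelAttention()
        self.beam = BlockEmpowermentAttention()
        self.sam = PixelSpatialAttention()

    def forward(self, x):
        return self.sam(self.beam(self.cam(x)))
```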
(3) Constructing a total loss function, and training an improved deep V & lt3+ & gt model through the total loss function based on a training sample set to obtain a lane line detection model:
the total loss function provided by the implementation comprises a two-class cross entropy loss function and a set similarity measurement function, and the two-class cross entropy loss function and the set similarity measurement function are respectively constructed based on sample prediction data and a final label.
The total loss function L is:
$$L = L_{BCE} + L_{dice}$$

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\Big]$$

$$L_{dice} = 1 - \frac{2\sum_{i=1}^{N} y_i\,\hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N}\hat{y}_i}$$

wherein L_BCE is the two-class cross entropy loss function, L_dice is the set similarity measurement function, y_i is the final label of the i-th pixel point, ŷ_i is the prediction for the i-th pixel point, and N is the number of pixel points in the image; lane line pixel points (positive class) are 1 and background pixel points (negative class) are 0.
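A compact sketch of this total loss, assuming `pred` already holds lane probabilities in [0, 1]; the `eps` smoothing term is an added numerical safeguard rather than part of the stated formula.

```python
import torch

def total_loss(pred, target, eps=1e-6):
    """L = L_BCE + L_dice for a binary lane mask. `pred` holds lane probabilities
    in [0, 1]; `target` is 1 for lane pixels and 0 for background."""
    pred = pred.reshape(-1)
    target = target.reshape(-1).float()
    bce = -(target * torch.log(pred + eps)
            + (1 - target) * torch.log(1 - pred + eps)).mean()
    dice = 1 - (2 * (pred * target).sum() + eps) / (pred.sum() + target.sum() + eps)
    return bce + dice
```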
When the method is applied, the lane line image data are input into a lane line detection model to obtain a lane line detection diagram.
Example 1
The operating system used in this example 1 was Ubuntu 20.04, the CPU was an Intel(R) Xeon(R) Platinum 8358P CPU @ 2.60GHz, the GPU was an RTX 3090, the memory was 90GB, and the hard disk space was 50GB. The deep learning framework was PyTorch 1.10.0 with CUDA version 11.3.
In this example 1, a step learning rate decay strategy is adopted and the Adam optimizer is used; its momentum term helps the optimization break out of local optima so that a smaller loss value can be reached during training, giving better model convergence. The expanded dataset is divided into a training set, a validation set and a test set in the ratio 8.5:1:0.5. The initial learning rate is set to 0.0005, the minimum learning rate to 5e-6, and the batch size to 24. Experiments are carried out with the two-class cross entropy loss function alone and with the sum of the weighted two-class cross entropy loss function and the set similarity measurement function L_dice, respectively. Each experiment is trained for 100 rounds, and the training loss value and validation loss value of every round are recorded. In addition, the model is evaluated every 5 rounds, and the weights with the best mIoU are saved. Finally, Excel is used to plot the curves of Loss and mIoU against the number of training rounds.
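An illustrative training-loop skeleton following the reported settings (Adam, initial learning rate 0.0005, minimum 5e-6, batch size 24, 100 rounds, evaluation every 5 rounds); the step size and decay factor of the scheduler are not stated in this example and are assumed here.

```python
import torch

# Illustrative skeleton following the reported settings; `model`, `train_loader`,
# `val_loader`, `total_loss` and `evaluate_miou` are assumed to be defined elsewhere,
# and the DataLoaders are assumed to use batch_size=24.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.7)  # step decay (values assumed)

best_miou = 0.0
for epoch in range(100):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        lane_prob = torch.softmax(model(images), dim=1)[:, 1]   # lane-class probability map
        loss = total_loss(lane_prob, labels)
        loss.backward()
        optimizer.step()
    if scheduler.get_last_lr()[0] > 5e-6:        # keep the learning rate above the stated minimum
        scheduler.step()
    if (epoch + 1) % 5 == 0:                     # evaluate every 5 rounds, keep best-mIoU weights
        miou = evaluate_miou(model, val_loader)
        if miou > best_miou:
            best_miou = miou
            torch.save(model.state_dict(), "best_miou.pth")
```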
In this example 1, the average intersection over union mIoU, the Accuracy and the mean pixel accuracy mPA are selected as the main performance evaluation indexes of the experiments.
The mIoU is one of the most important metrics for evaluating the performance of a semantic segmentation model. It is calculated by computing, for every class in the image, the ratio of the intersection to the union of the prediction and the label, and then averaging the intersection-over-union values of all classes. In the two-class problem studied herein, the average intersection over union mIoU is the average of the lane line intersection over union IoU_lane and the background intersection over union IoU_background; the calculation formulas of mIoU and IoU_lane are shown in formulas (6) and (7), respectively.
$$mIoU = \frac{IoU_{lane} + IoU_{background}}{2} \qquad (6)$$

$$IoU_{lane} = \frac{TP}{TP + FP + FN} \qquad (7)$$
The Accuracy is used for quantitatively measuring the accurate condition of model prediction, and the calculation formula is as follows:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
the mPA is the ratio of the number of pixels with positive prediction results of each class to the total number of the pixels, and then the average of each class is calculated, and the calculation formula is as follows:
$$mPA = \frac{1}{M}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$
where M is the number of classes, TP is the number of pixels predicted to be positive and actually positive, TN is the number of pixels predicted to be negative and actually negative, FN is the number of pixels predicted to be negative but actually positive, and FP is the number of pixels predicted to be positive but actually negative.
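The evaluation indexes above can be computed directly from these four counts; the following sketch assumes binary masks with 1 for lane pixels and follows formulas (6)-(7) together with the Accuracy and mPA definitions.

```python
import numpy as np

def lane_metrics(pred, target):
    """Compute IoU_lane, IoU_background, mIoU, Accuracy and mPA from binary masks (1 = lane)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)           # predicted lane, actually lane
    tn = np.sum(~pred & ~target)         # predicted background, actually background
    fp = np.sum(pred & ~target)          # predicted lane, actually background
    fn = np.sum(~pred & target)          # predicted background, actually lane
    iou_lane = tp / (tp + fp + fn)                             # formula (7)
    iou_background = tn / (tn + fn + fp)
    miou = (iou_lane + iou_background) / 2                     # formula (6)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    mpa = (tp / (tp + fn) + tn / (tn + fp)) / 2                # mean per-class pixel accuracy
    return iou_lane, iou_background, miou, accuracy, mpa
```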
To compare the two-class cross entropy loss function alone with the sum of the two-class cross entropy loss function and the Dice loss, two schemes were designed and trained on the basis of the DeepLabV3+ model. The resulting IoU_lane, IoU_background, mIoU, Accuracy, mPA, single-image prediction time and model parameter size are shown in Table 1:
(1) scheme one: using a two-class cross entropy loss function;
(2) Scheme two: using the sum of the two-class cross entropy loss function and the set similarity measurement function L_dice.
TABLE 1 experimental results of different loss functions
As can be seen from Table 1, every index of scheme two is better than that of scheme one, so the sum of the two-class cross entropy loss function and the set similarity measurement function L_dice is selected as the loss function used herein.
To verify the necessity of each of the above improvements to the DeepLabV3+ model, the following ablation experiments were designed:
(1) Scheme one: improve the backbone network module of the DeepLabV3+ model and train;
(2) Scheme two: on the basis of scheme one, add the multi-scale feature extraction enhancement module and train;
(3) Scheme three: on the basis of scheme one, add the dual attention mechanism module CBEAM and train.
All schemes use the sum of the two-class cross entropy loss function and the Dice loss as the loss function. The semantic segmentation performance indexes IoU_lane, IoU_background, mIoU, PA, mPA, single-image prediction time and model parameter size obtained by training each scheme are shown in Table 2:
table 2 ablation experimental results
Compared with the DeepLabV3+ model that uses Xception as the backbone network, scheme one improves the backbone module, changing the parameter size of the backbone from 88M to 3.4M, so the model ensures real-time performance while maintaining accuracy. The model of scheme one is taken as the base model and compared with schemes two and three to verify the necessity of the modules added in those schemes. In scheme two, the multi-scale feature extraction enhancement module is added; even though several additional parallel atrous convolution layers are introduced, the overall parameter size and prediction time of the trained model are still smaller than those of scheme one, and the mIoU is improved because richer feature information is obtained under different receptive fields. In scheme three, after the dual attention mechanism module CBEAM is added, the model's IoU_lane and mIoU are improved, showing that the model can better allocate attention resources in both the spatial and channel dimensions and achieve a better segmentation effect.
To further verify the segmentation performance of the improved DeepLabV3+ model, the model proposed in this embodiment is compared with classical image segmentation models. The experimental schemes are designed as follows:
(1) Scheme one: train using the DeepLabV3+ model;
(2) Scheme two: train using the improved DeepLabV3+ model presented herein;
(3) Scheme three: train using the Unet model with VGG as the backbone network;
(4) Scheme four: train using the PSPnet model with MobileNetV2 as the backbone network.
All four schemes use the sum of the two-class cross entropy loss function and the set similarity measurement function L_dice as the loss function. The IoU_lane, IoU_background, mIoU, Accuracy, mPA, single-image prediction time and model parameter size obtained by training the four schemes are shown in Table 3:
TABLE 3 experimental results of different image segmentation models
As can be seen from Table 3, the lane line intersection over union IoU_lane of the improved DeepLabV3+ model is 1.66 percentage points higher than that of the DeepLabV3+ model, and the average intersection over union mIoU is 0.85 percentage points higher; the single-image prediction time increases by 2.28 ms, but the model parameter size is reduced by 2.24 MB, so the improved DeepLabV3+ model is better suited to lane line detection than the DeepLabV3+ model. The single-image prediction time and parameter size of the improved DeepLabV3+ model are slightly inferior to those of the PSPnet model, but the high real-time rate and low parameter count of PSPnet sacrifice prediction precision, so its prediction effect is poor and would interfere with the judgment of the vehicle decision system in automatic driving. The lane line intersection over union and average intersection over union of the improved DeepLabV3+ model are slightly lower than those of the Unet model, but the high precision of Unet comes at the cost of single-image prediction time and parameter size that are 2.45 times and 4.77 times those of the improved DeepLabV3+ model respectively, so its real-time performance in lane line prediction is poor. In conclusion, the improved DeepLabV3+ model balances the accuracy and real-time performance of lane line detection.
To show the advantages of the proposed model intuitively, test images of road sections from the official dataset of the Baidu unmanned vehicle lane line challenge are selected for prediction, and the resulting lane line detection diagrams are shown in fig. 9. The improved DeepLabV3+ model achieves a good prediction effect under both glare and low-brightness conditions; for edge lane line pixels and far-end lane line pixels, its prediction effect is improved to a certain extent compared with the DeepLabV3+ model and the PSPnet model, owing to the combination of the multi-scale feature enhancement extraction module and the dual attention mechanism module, although there is still some room for improvement compared with the Unet model.
This embodiment addresses the problems that the backbone network Xception of the DeepLabV3+ model is difficult to train, that the ASPP structure cannot reasonably allocate spatial and channel weights, and that accuracy and real-time performance are difficult to balance. By replacing the backbone with MobileNetV2, fusing the dual attention mechanism CBEAM and using atrous depthwise separable convolution, the improved DeepLabV3+ model effectively solves the problem of low recognition accuracy for long straight lane lines and large-curvature lane lines. Experiments show that the detection accuracy of the improved DeepLabV3+ model reaches 99.35%, the average intersection over union reaches 86.08%, single-image prediction takes 22.62 ms, and the model parameter size is 19.9 MB, so the model balances prediction accuracy and inference speed and has strong generalization capability in lane line detection.

Claims (8)

1. A lane line detection method based on an improved DeepLabV3+ model, characterized by comprising the following steps:
obtaining a lane line image data set and an initial label thereof, preprocessing the lane line image data set to obtain a training sample set, and reserving a lane line part in the initial label to obtain a final label;
constructing an improved DeepLabV3+ model, wherein the improved DeepLabV3+ model comprises a MobilenetV2 network, a multi-scale feature enhanced ASPP module and a block weighting double-attention mechanism module; inputting training samples into a MobilenetV2 network to obtain shallow features and deep features, inputting the deep features into a multi-scale feature enhancement ASPP module to obtain a first fusion feature map, wherein the block weighted double-attention mechanism module comprises a channel attention mechanism, a partition weighted space attention mechanism and a pixel weighted space attention mechanism, distributing weights to feature channels of the first fusion feature map through the channel attention mechanism to obtain a first distribution weight feature map, carrying out region weight division on the first distribution weight map through the partition weighted space attention mechanism to obtain a second distribution weight feature map, carrying out pixel weight division on the second distribution weight feature map through the pixel weighted space attention mechanism to obtain a deep feature map, carrying out up-sampling on the deep feature map, fusing an up-sampling result and shallow features through layer jump connection to obtain a second fusion feature map, inputting the second fusion feature map into the block weighted double-attention mechanism module again to obtain a final fusion feature map, and carrying out up-sampling on the final fusion feature map to obtain predicted data;
constructing a total loss function, wherein the total loss function comprises a two-class cross entropy loss function and a set similarity measurement function, and the two-class cross entropy loss function and the similarity measurement function are respectively constructed based on sample prediction data and a final label; training the improved DeepLabV3+ model through the total loss function based on the training sample set to obtain a lane line detection model;
when the method is applied, the lane line image data are input into a lane line detection model to obtain a lane line detection diagram.
2. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein the lane line image dataset includes large curvature lane line image data, lane line breakage image data, long straight lane line image data, and large curvature glare section image data.
3. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein preprocessing the lane line image dataset to obtain a training sample set comprises: cutting the lane line image data set to reserve a lane line part to finish the purification of the lane line image data, and rotating, compressing or manually adding noise to the purified result to obtain a training sample set.
4. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein retaining the lane line portion in the initial label to obtain the final label comprises: assigning the pixel value of the lane line in the initial label to 255, and assigning the pixel value of the other traffic rule information to 0.
5. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein the multi-scale feature enhancement ASPP module comprises atrous depthwise separable convolutions with sampling rates of 3, 6, 9, 12, 15, 18 and 24 respectively, a 1×1 convolution and an Image Pooling branch; the deep features are input into each of these branches respectively, and the obtained results are fused to obtain the first fusion feature map.
6. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein the assigning weights to the feature channels of the first fusion feature map by the channel attention mechanism to obtain a first assigned weight map comprises:
the channel attention mechanism comprises a maximum pooling layer, a global average pooling layer and a Sigmoid function, the first fusion feature map is respectively input into the maximum pooling layer and the global average pooling layer, the obtained maximum pooling result and the global average pooling result are added and then activated through the Sigmoid function to obtain the weight distributed to each feature channel in the first fusion feature map, and the obtained weight is multiplied with the corresponding first fusion feature map to obtain the first distribution weight feature map.
7. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein the performing region weight division on the first distribution weight map by the region weighted spatial attention mechanism comprises:
the first distribution weight graph is divided into a plurality of areas by reducing the dimension of the characteristic channel into one characteristic channel through average pooling, each area is respectively subjected to maximum pooling and average pooling, the obtained maximum pooling result and the obtained average pooling result are summed, the summed result sequentially passes through a plurality of convolution and activation functions to obtain area weight information of different areas, and the area weight information and the first distribution weight feature graph are multiplied to obtain the second distribution weight feature graph.
8. The lane line detection method based on the improved DeepLabV3+ model according to claim 1, wherein the total loss function L is:
$$L = L_{BCE} + L_{dice}$$

$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\Big]$$

$$L_{dice} = 1 - \frac{2\sum_{i=1}^{N} y_i\,\hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N}\hat{y}_i}$$

wherein L_BCE is the two-class cross entropy loss function, L_dice is the set similarity measurement function, y_i is the final label of the i-th pixel point, ŷ_i is the prediction for the i-th pixel point, and N is the number of pixels of the image; the pixels of the lane line are 1, and the pixels of the background are 0.
CN202310058401.1A 2023-01-13 2023-01-13 Lane line detection method based on improved DeepLabV3+ model Pending CN116092034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310058401.1A CN116092034A (en) 2023-01-13 2023-01-13 Lane line detection method based on improved DeepLabV3+ model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310058401.1A CN116092034A (en) 2023-01-13 2023-01-13 Lane line detection method based on improved DeepLabV3+ model

Publications (1)

Publication Number Publication Date
CN116092034A true CN116092034A (en) 2023-05-09

Family

ID=86207936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310058401.1A Pending CN116092034A (en) 2023-01-13 2023-01-13 Lane line detection method based on improved deep V < 3+ > model

Country Status (1)

Country Link
CN (1) CN116092034A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576649A (en) * 2023-12-26 2024-02-20 华东师范大学 Lane line detection method and system based on segmentation points and dual-feature enhancement
CN117576649B (en) * 2023-12-26 2024-04-30 华东师范大学 Lane line detection method and system based on segmentation points and dual-feature enhancement

Similar Documents

Publication Publication Date Title
CN109993082B (en) Convolutional neural network road scene classification and road segmentation method
CN110136170B (en) Remote sensing image building change detection method based on convolutional neural network
CN111126202B (en) Optical remote sensing image target detection method based on void feature pyramid network
CN107609602A (en) A kind of Driving Scene sorting technique based on convolutional neural networks
CN110956094A (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network
JPWO2020181685A5 (en)
CN112489054A (en) Remote sensing image semantic segmentation method based on deep learning
CN113688836A (en) Real-time road image semantic segmentation method and system based on deep learning
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN114821342B (en) Remote sensing image road extraction method and system
CN114359130A (en) Road crack detection method based on unmanned aerial vehicle image
CN111968088A (en) Building detection method based on pixel and region segmentation decision fusion
CN114187520B (en) Building extraction model construction and application method
CN113255678A (en) Road crack automatic identification method based on semantic segmentation
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN116246169A (en) SAH-Unet-based high-resolution remote sensing image impervious surface extraction method
CN114943902A (en) Urban vegetation unmanned aerial vehicle remote sensing classification method based on multi-scale feature perception network
CN113505670A (en) Remote sensing image weak supervision building extraction method based on multi-scale CAM and super-pixels
CN114998744A (en) Agricultural machinery track field segmentation method based on motion and vision dual-feature fusion
CN113569724A (en) Road extraction method and system based on attention mechanism and dilation convolution
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN116092034A (en) Lane line detection method based on improved DeepLabV3+ model
CN115984537A (en) Image processing method and device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination