CN114674338B - Fine recommendation method for road drivable area based on layered input and output and double-attention jump - Google Patents
Fine recommendation method for road drivable area based on layered input and output and double-attention jump
- Publication number
- CN114674338B (application CN202210366807.1A)
- Authority
- CN
- China
- Prior art keywords
- attention
- output
- model
- road
- layered
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G01C21/3461—Preferred or disfavoured areas, e.g. dangerous zones, toll or emission zones, intersections, manoeuvre types, segments such as motorways, toll roads, ferries
- G01C21/30—Map- or contour-matching
- G01C21/3602—Input other than that of destination using image analysis, e.g. detection of road signs, lanes, buildings, real preceding vehicles using a camera
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
Abstract
The invention discloses a fine recommendation method for the road drivable area based on layered input/output and double-attention skip connections. On an encoder-decoder backbone, it constructs an inverted-pyramid multi-scale layered input and layered output structure to effectively fuse the morphological features and semantic information of the road, and a skip-connection structure integrating channel attention and spatial attention to achieve accurate detection of the different driving areas. The method fuses a multi-scale interaction strategy and a dual attention mechanism under an M-shaped deep architecture. For complex roads with fuzzy boundaries and changeable conditions, it finely divides, based on vision, the strongly recommended, weakly recommended and non-recommended drivable areas of the road in complex driving scenes, so as to meet the different detection requirements of intelligent vehicles under normal, emergency and other complex driving conditions. The proposed model balances segmentation accuracy and time efficiency, and shows obvious advantages in the drivable-area detection task on complex roads.
Description
Technical Field
The invention relates to a method for recommending road drivable areas, in particular to a fine recommendation method for the road drivable area based on layered input/output and double-attention skip connections. It belongs to the technical fields of automatic driving and computer vision, and in particular to drivable-area detection.
Background
In vision-based environment perception for automatic driving, an intelligent vehicle detects the road area available for driving by screening traffic-element information in the scene ahead, such as road surfaces, vehicles, pedestrians, and positive and negative obstacles, which provides important support for subsequent path planning and driving decisions. For structured roads with good pavement, clear lanes and distinct boundaries, current automatic driving can detect the drivable area effectively. However, roads with a low degree of structure, such as suburban secondary roads and rural streets, suffer from blurred lanes and boundaries and highly random road participants, so drivable-area detection algorithms designed for structured roads struggle to capture the feature information of unstructured roads, which severely degrades the accuracy and real-time performance of the detection task and may even cause it to fail. More importantly, for safety reasons, a human driver may in an emergency choose to drive over a flat area that would not normally be considered road. Given the particularly complex and changeable conditions of unstructured roads, intelligent vehicles need a capability, comparable to a human driver's, for handling such emergency conditions. Diversified segmentation and refined recommendation of the drivable areas of complex roads, adapted to different driving conditions, is therefore a key task for the driving safety of intelligent vehicles.
Currently, vision-based detection methods for the drivable area of complex roads fall mainly into three types: based on appearance description, based on geometric information, and based on semantic segmentation. Methods based on appearance description rely on a single appearance feature and are easily disturbed by factors such as illumination changes and road-surface shading. Many studies fuse geometric road information with appearance description, but when the geometric constraints of the scene are not satisfied, or the three-dimensional data used are of low quality, the performance of geometry-based methods degrades severely. Most existing data for semantic-segmentation methods were collected in specific foreign driving environments that do not fully match the complex road conditions of China, and because these algorithms segment every object in the scene, they are highly redundant and insufficiently focused, which limits the accuracy of the extracted drivable area. Moreover, whether based on appearance description, geometric information or semantic segmentation, most existing drivable-area detection models extract only a single road region as the drivable area and cannot serve both normal and emergency driving conditions, making them ill-suited to the changeable conditions of complex roads. Current automatic driving tasks therefore need a drivable-area recommendation method that accounts for the complex traffic scenes of China and extracts different road regions so as to cover both normal and emergency driving conditions.
Related patent literature: CN113223313A discloses a lane recommendation method, a lane recommendation device and a vehicle-mounted communication device. The lane recommendation method comprises the following steps: acquiring lane information of the road where a target vehicle is currently located through vehicle-mounted communication technology; receiving vehicle data of vehicles surrounding the target vehicle through vehicle-mounted communication technology; determining the in-lane positions of the surrounding vehicles according to the lane information and the vehicle data; determining the driving parameters of each candidate lane of the target vehicle according to the vehicle data and the in-lane positions of the surrounding vehicles; and determining the time the target vehicle needs to pass through a preset road section according to the driving parameters of each candidate lane, and recommending lanes according to the determined passing time. CN112857381A discloses a path recommendation method, device and readable medium. The method comprises: identifying a target object with congestion characteristics in an acquired image, determining road-condition information of the driving road according to the target object and current navigation data, generating an alternative path according to the road-condition information, and recommending the path, so that navigation obtains more specific and accurate road-condition information in time and an incorrect path can be corrected promptly.
None of the above technologies provides specific guidance on how a road drivable-area recommendation method should solve the detection problems of blurred, aliased road-area boundaries and complex, changeable driving conditions in actual driving tasks, or how to improve the detection accuracy and time efficiency for the road drivable area.
Disclosure of Invention
In view of the above defects of, and improvement demands on, the prior art, the invention aims to provide a fine recommendation method for the road drivable area based on layered input/output and double-attention skip connections. The method focuses on efficient and accurate feature extraction, effectively balances accuracy and real-time performance, solves the detection problems of blurred, aliased road-area boundaries and complex, changeable driving conditions in actual driving tasks, and improves the detection accuracy and time efficiency for the road drivable area.
In order to solve the technical problems, the invention adopts the following technical scheme:
A fine recommendation method for the road drivable area based on layered input/output and double-attention skip connections (also describable as a recommendation method for the road drivable area based on an M-shaped deep architecture) is characterized by comprising the following steps:
step (1): constructing a labeled data set, dividing it into a training set, a validation set and a test set, and preprocessing the data;
further, the preferred technical scheme may be: the step of constructing the labeled dataset in the step (1) is as follows:
step (101): labeling, merging and modifying the existing complex road driving scene image to enable the complex road driving scene image to accord with 4 types of driving area detection tasks of strong recommendation, weak recommendation, non-recommendation and background, wherein the part of the sample is marked as IDD_ unst;
Step (102): the method comprises the steps that an all-terrain intelligent experiment vehicle is utilized, a vehicle-mounted camera is utilized to collect images of complex roads in a closed/semi-closed park in constant-speed running, driving scene images of the complex roads are marked correspondingly, and a sample of the complex roads is marked as Campus_ unst;
step (103): and acquiring and marking complex road driving scene images of suburbs, villages and other places in China by using a vehicle recorder of a common passenger vehicle, wherein the part of the images are marked as China_ unst.
Step (2): based on the U-shaped encoder-decoder structure, construct an M-shaped encoder-decoder network, the M²AttentionNet model, by adding three major structures: multi-scale layered input, double-attention skip connection and multi-scale layered output.
Step (3): construct an inverted-pyramid layered input structure, i.e. a multi-scale layered input structure, at the input end of the model encoder; the multi-scale layered input structure retains shallow features at different scale levels and fuses the shallow features with deep semantics layer by layer;
Further, the preferred technical scheme may be: the multi-scale hierarchical input structure constructed by the input end of the model encoder comprises the following steps:
Step (31): apply successive max-pooling downsampling to the image I to be detected to generate an image inverted pyramid {I, 1/2 I, 1/4 I, 1/8 I} of decreasing scale;
step (32): merge the images of the four scales layer by layer into the corresponding levels of the encoder branch; after activation and feature extraction through Conv, BN and ReLU, concatenate in the channel dimension the features generated at the current level with the feature map passed down from the previous level, and feed the result into the network encoder.
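Steps (31) and (32) above can be sketched as follows. This is a minimal NumPy illustration (the helper names `max_pool_2x2`, `build_inverted_pyramid` and `merge_at_level` are mine, not from the patent), showing how the inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I} arises from repeated 2×2 max pooling and how each scale would be concatenated channel-wise with the feature map arriving at the matching encoder level:

```python
import numpy as np

def max_pool_2x2(img):
    """Downsample an H×W×C array by 2 with max pooling (H, W assumed even)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def build_inverted_pyramid(img, levels=4):
    """Return [I, 1/2 I, 1/4 I, 1/8 I]: successively max-pooled copies of img."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(max_pool_2x2(pyramid[-1]))
    return pyramid

def merge_at_level(scaled_img, encoder_feat):
    """Step (32): channel-dimension concatenation of a pyramid image with the
    feature map at the same encoder level (spatial sizes must match)."""
    assert scaled_img.shape[:2] == encoder_feat.shape[:2]
    return np.concatenate([scaled_img, encoder_feat], axis=-1)

if __name__ == "__main__":
    I = np.random.rand(256, 256, 3)
    pyr = build_inverted_pyramid(I)
    print([p.shape for p in pyr])
    # second pyramid level merged with a hypothetical 64-channel feature map
    merged = merge_at_level(pyr[1], np.random.rand(128, 128, 64))
    print(merged.shape)  # (128, 128, 67)
```

In the actual model the concatenated tensor would then pass through the level's Conv-BN-ReLU blocks; here only the data flow of the layered input is shown.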
Step (4): construct four levels in the M²AttentionNet encoder branch, performing two successive feature extractions at each level using a combination of 3×3 Conv, BN and ReLU operations.
Step (5): keep the resolution unchanged within a level, and use 2×2 max pooling for downsampling between levels.
Step (6): for the decoder branch, each layer uses Conv-BN-ReLU combinations with the same parameters for two successive feature extractions, with 2×2 nearest-neighbor-interpolation upsampling between layers.
Step (7): at the final end of the decoder branch, use 1×1 Conv, BN and Softmax activation to perform four-class classification and generate a prediction result at the same scale as the input image; the 4 categories correspond respectively to the strongly recommended, weakly recommended and non-recommended driving areas and the background area of the driving scene.
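The resolution bookkeeping of steps (4)–(7) can be sketched at the shape level. The sketch below is illustrative only (the 256×256 input size and the helper names are assumptions, not from the patent): 2×2 max pooling halves the resolution between encoder levels, 2×2 nearest-neighbor upsampling doubles it between decoder levels, and the terminal Softmax yields one 4-class probability vector per pixel at the input scale.

```python
import numpy as np

def upsample_nn_2x2(feat):
    """2×2 nearest-neighbor upsampling of an H×W×C feature map."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

if __name__ == "__main__":
    h = w = 256
    # encoder: resolution constant within a level, halved between levels
    encoder_res = [(h >> s, w >> s) for s in range(4)]   # 256, 128, 64, 32
    # decoder mirrors the encoder back up to the input resolution
    decoder_res = encoder_res[::-1]
    print(encoder_res, decoder_res)
    # terminal 1×1 Conv + Softmax over N = 4 classes (strongly / weakly /
    # not recommended / background): one probability vector per pixel
    logits = np.random.rand(h, w, 4)
    probs = softmax(logits)
    assert np.allclose(probs.sum(axis=-1), 1.0)
```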
Step (8): designing an output structure of layered prediction and layered loss at the output end of the model;
further, the preferred technical scheme may be: the specific steps and formula of the output structure of layered prediction and layered loss designed at the model output end (or the specific steps of the multi-scale layered output constructed in the model decoder branch) are as follows:
step (81): output a corresponding drivable-area prediction map R_s (layer index s = 1, 2, 3, 4) at each layer of the decoder branch through a combination of upsampling and convolution (including 1×1 Conv, BN and Softmax activation), and merge the prediction maps of all layers into the final drivable-area prediction result;
Step (82): through one-hot encoding, fuse and calculate the losses of all levels of the decoder branch; the level loss l_s of the s-th layer is defined as:
l_s(I, R_opt; θ) = − Σ_{k=1}^{N} [ ω Σ_{i∈Y_k^+} (1 − x_k(i))^γ log x_k(i) + (1 − ω) Σ_{i∈Y_k^−} x_k(i)^γ log(1 − x_k(i)) ]
where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories, here N = 4; in one-hot form, for class k, Y_k^+ and Y_k^− are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant focusing factor, and ω is a balance factor;
Step (83): compute the total loss function L of the model as the sum of the four decoder level losses l_s: L = Σ_s l_s;
Further, the preferred technical scheme may be: in the formula of the layered prediction and layered loss designed at the model output end in step (8), γ = 2 and ω = 0.55, and the loss function is a focal loss.
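A minimal NumPy sketch of the level loss of step (82) and the total loss of step (83), using the stated values γ = 2 and ω = 0.55. The class-balanced focal form over the positive and negative pixel sets Y_k^+ and Y_k^− follows the symbol definitions given above; the function names are mine, not from the patent.

```python
import numpy as np

def level_focal_loss(probs, onehot, gamma=2.0, omega=0.55, eps=1e-7):
    """Class-balanced focal loss l_s for one decoder level.
    probs:  H×W×N predicted per-class probabilities x_k
    onehot: H×W×N one-hot ground truth (Y_k^+ where 1, Y_k^- where 0)
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    # focal term for pixels labeled positive for class k
    pos = onehot * (1.0 - probs) ** gamma * np.log(probs)
    # focal term for pixels labeled negative for class k
    neg = (1.0 - onehot) * probs ** gamma * np.log(1.0 - probs)
    return -(omega * pos.sum() + (1.0 - omega) * neg.sum())

def total_loss(level_probs, level_onehots):
    """Step (83): total loss L = sum of the four decoder level losses l_s."""
    return sum(level_focal_loss(p, t) for p, t in zip(level_probs, level_onehots))
```

As expected of a focal loss, confident correct predictions contribute almost nothing (the (1 − x_k)^γ factor suppresses easy pixels), so training gradients concentrate on hard, ambiguous boundary pixels.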
Step (9): design the skip-connection part in the middle of the encoder-decoder as a double-attention skip-connection structure; the specific steps (preferred technical scheme) are as follows:
step (91): integrate the dual mechanism of channel attention and spatial attention into the level-wise skip process;
step (92): refine the feature map F_{w×h×c} obtained at each level of the encoder by passing it in sequence through a channel attention module and a spatial attention module;
Step (93): concatenate, in the channel dimension, the feature map adjusted by the double-attention mechanism with the upsampled feature map of the corresponding decoder layer to obtain the final output feature map F'_{w×h×c}.
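Steps (91)–(93) can be sketched in NumPy as below. The patent text does not fully specify the internals of the two attention modules, so this sketch substitutes simple parameter-free gating (global average pooling for the channel gate, cross-channel averaging for the spatial gate) purely to show the data flow; a real implementation would typically use small learned layers inside each module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel by a weight derived from its global average
    (simplified stand-in for a learned channel-attention module)."""
    gate = sigmoid(feat.mean(axis=(0, 1)))             # shape (C,)
    return feat * gate                                  # broadcast over H, W

def spatial_attention(feat):
    """Gate each spatial position by a weight from its cross-channel mean
    (simplified stand-in for a learned spatial-attention module)."""
    gate = sigmoid(feat.mean(axis=-1, keepdims=True))   # shape (H, W, 1)
    return feat * gate

def dual_attention_skip(encoder_feat, decoder_upsampled):
    """Steps (92)-(93): refine the encoder feature map F with channel then
    spatial attention, then concatenate with the decoder's upsampled map."""
    refined = spatial_attention(channel_attention(encoder_feat))
    return np.concatenate([refined, decoder_upsampled], axis=-1)
```

The ordering (channel first, then spatial) follows step (92); the channel-dimension concatenation with the decoder's upsampled map yields the F'_{w×h×c} of step (93).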
Step (10): train the M²AttentionNet model with the training set to obtain a model with well-trained parameters; evaluate the trained model with the test set to obtain the road drivable areas in complex traffic scenes;
further, the preferred technical scheme may be: in step (10), when the model is trained, the specific parameters are set as follows: during training, all convolution-layer parameters are initialized with the Glorot initializer built into Keras, the biases are initialized to 0, and all parameters are updated and optimized with stochastic gradient descent; the batch size is set to 64, the initial learning rate to 1e-4 and the momentum to 0.9, with the learning rate decayed by 1e-6 at each iteration; to prevent overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early-stopping strategy halts training when the validation-set error has not decreased for 20 iteration cycles. In step (10), ten-fold cross-validation is adopted during training, and the samples are augmented with horizontal flipping, brightness adjustment and random-noise preprocessing.
Step (11): acquire real-time traffic scene data during actual driving and feed them into the trained M²AttentionNet model to obtain the recommendation results for the different driving areas.
The invention discloses a drivable-area recommendation method that fuses a multi-scale interaction strategy and a dual attention mechanism under an M-shaped deep architecture. For complex roads with fuzzy boundaries and changeable conditions, the method finely divides, based on vision, the strongly recommended, weakly recommended and non-recommended drivable areas of the road in complex driving scenes, so as to meet the different detection requirements of intelligent vehicles under normal, emergency and other complex driving conditions. First, an inverted-pyramid multi-scale layered input and layered output structure is constructed on the encoder-decoder skeleton to effectively fuse the morphological features and semantic information of the road; second, a skip-connection structure integrating channel attention and spatial attention is constructed to achieve accurate detection of the different driving areas. The method finely segments the strongly recommended, weakly recommended and non-recommended driving areas and the background area in a variety of real driving scenes. Compared with other existing mainstream models, the proposed model balances segmentation accuracy and time efficiency and has obvious advantages in the drivable-area detection task on complex roads.
In general, compared with the prior art, the technical scheme designed by the invention has the following technical characteristics and beneficial effects:
(1) The road drivable-area segmentation model M²AttentionNet, which fuses a multi-scale interaction strategy and a dual attention mechanism, can accurately segment a real road driving scene image into a strongly recommended driving area, a weakly recommended driving area, a non-recommended driving area and a background area; it can cope with special driving conditions such as meeting on narrow roads and emergency avoidance, and effectively adapts to the changeable conditions of different roads.
(2) The invention designs three structures, multi-scale layered input, double-attention skip connection and multi-scale layered output, on an encoder-decoder framework to build an M-shaped deep convolutional neural network architecture. It effectively fuses shallow features with deep semantics, balances the model's prediction bias at different scales, and focuses the learning process on the important features related to road drivability, effectively improving model performance. The invention balances accuracy and real-time performance and achieves good fine detection results in different real scenes, with a mean intersection-over-union of 92.46% and an average detection speed of 22.7 frames per second; it effectively completes the fine detection task for the drivable area of complex roads and generalizes well.
In summary, the invention provides a fine recommendation method for the road drivable area based on layered input/output and double-attention skip connections. Using a convolutional neural network and a dual attention mechanism, it focuses on efficient and accurate feature extraction, effectively balances accuracy and real-time performance, solves the detection problems of blurred, aliased road-area boundaries and complex, changeable driving conditions in actual driving tasks, and improves the detection accuracy and time efficiency for the road drivable area.
Drawings
Fig. 1 is a schematic diagram of the M-shaped architecture of the M²AttentionNet model according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the fine recommendation of the road drivable area according to an embodiment of the present invention, where fig. 2 (a) is a schematic diagram of driving scene 1 and (b) is a schematic diagram of the fine recommendation of its road drivable area.
Fig. 3 is a schematic diagram of the dual-attention skip-connection module according to an embodiment of the present invention.
Fig. 4 compares the detection result of the method according to the embodiment of the present invention with the manual result, where (c) in fig. 4 is an input image (schematic diagram of driving scene 2), (d) is the manual extraction result, and (e) is the extraction result of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
As shown in fig. 1, which is a schematic diagram of the M-shaped architecture of the M²AttentionNet model provided by an embodiment of the present invention, and with the target to be achieved shown in fig. 2, the fine recommendation method for the road drivable area based on layered input/output and double-attention skip connections specifically comprises the following steps:
step (1): constructing a labeled data set, dividing it into a training set, a validation set and a test set, and preprocessing the data;
further, the step of constructing the labeled dataset in the step (1) is as follows:
step (101): labeling, merging and modifying existing road driving scene images so that they conform to the 4-class drivable-area detection task of strongly recommended, weakly recommended, non-recommended and background; this part of the samples is denoted IDD_unst;
Step (102): using an all-terrain intelligent experimental vehicle, collecting images of roads in a closed/semi-closed campus with a vehicle-mounted camera while driving at constant speed, and labeling the driving scene images correspondingly; this part of the samples is denoted Campus_unst;
step (103): collecting and labeling complex-road driving scene images of suburbs, villages and other places in China with the dash cameras of ordinary passenger vehicles; this part of the samples is denoted China_unst.
Step (2): based on the U-shaped encoder-decoder structure, construct an M-shaped encoder-decoder network, the M²AttentionNet model, by adding three major structures: multi-scale layered input, double-attention skip connection and multi-scale layered output.
Step (3): constructing an inverted pyramid type layered input structure at the input end of the model encoder, namely constructing a multi-scale layered input structure, reserving shallow features on different scale levels by the multi-scale layered input structure, and fusing the shallow features and deep semantics layer by layer; the multi-scale hierarchical input structure constructed by the input end of the model encoder comprises the following steps:
Step (31): performing continuous maximum pooling downsampling on the image I to be detected to generate an image inverted pyramid { I,1/2I,1/4I,1/8I } with a decreasing scale;
step (32): the images of the four scales are layered and merged into the corresponding levels of the encoder branch; features are activated and extracted through Conv, BN and ReLU, the features generated at each level are merged with the feature map produced by the previous level by channel-dimension concatenation, and the result is input into the network encoder.
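The inverted image pyramid of step (31) can be sketched with plain NumPy, using repeated 2×2 max pooling as the downsampling operator (a minimal stand-in for the framework's pooling layer; even input dimensions are assumed):

```python
import numpy as np

def max_pool2(img):
    """2x2 max-pool downsampling (assumes even height/width)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def build_pyramid(img, levels=4):
    """Inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I} by successive max pooling."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(max_pool2(pyr[-1]))
    return pyr

I = np.arange(64, dtype=float).reshape(8, 8)
pyr = build_pyramid(I)
print([p.shape for p in pyr])   # [(8, 8), (4, 4), (2, 2), (1, 1)]
```

Each pyramid level would then be fed into the encoder level of matching resolution, as described in step (32).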
Step (4): four levels were constructed at the M 2 AttentionNet encoder arm, with two successive feature extractions at each level using a combination of 3 x 3Conv, BN and ReLU operations.
Step (5): the resolution of the same level is kept unchanged, and 2 x 2 max pooling is used between layers for downsampling.
Step (6): for the decoder branches, each layer uses Conv-BN-ReLU combinations of the same parameters for two consecutive feature extractions, 2 x 2 upsampling for nearest neighbor interpolation between layers.
Step (7): and activating the final terminal of the decoder branch by using 1 multiplied by 1Conv, BN and Softmax to carry out quaternary classification, generating a prediction result of the scale of an input image and the like, wherein 4 categories respectively correspond to a strong recommended driving area, a weak recommended driving area, a non-recommended driving area and a background area in the driving scene.
Step (8): the output structure of the hierarchical prediction and the hierarchical loss is designed at the output end of the model, and the specific steps of the output structure (or the specific steps of the multi-scale hierarchical output constructed by the model decoder branch) are as follows:
step (81): at each layer of the decoder branch, a corresponding drivable-area prediction map R_s (layer index s = 1, 2, 3, 4) is output through an upsampling and convolution combination (including 1×1 Conv, BN and Softmax activation), and the prediction maps of all layers are merged into the final drivable-area prediction result;
Step (82): by single-hot encoding, the loss of all levels of the decoder branches is fused and calculated, and the level loss l s of the s-th layer is defined as:
where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories (here N is 4); in one-hot mode, for class k, Y_k+ and Y_k- are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant factor and ω is a balance factor. In the formula, γ = 2 and ω = 0.55 are possible values, and the loss function is a focal loss.
Step (83): the total loss function L of the calculation model is the sum of the four decoder level losses L s, l= Σl s.
Step (9): the jump connection part in the middle of the encoder-decoder is designed into a double-attention jump connection structure, as shown in fig. 3, and the specific steps are as follows:
step (91): integrating a channel attention and space attention dual mechanism in the hierarchical jump process;
step (92): the feature map F w×h×c obtained by each level of the encoder is subjected to fine adjustment through a channel attention module and a space attention module in sequence;
Step (93): and performing channel dimension splicing on the feature map adjusted by the double-attention mechanism and the up-sampling feature map of the corresponding layer of the decoder to obtain a final output feature map F' w×h×c.
Step (10): training the M 2 AttentionNet model by using a training set to obtain a model with well trained parameters; and detecting the trained model by using the test set to obtain the road drivable area in the complex traffic scene. Step (10), when the model is trained, specific parameters are set as follows: in the training process, using a Glorot tool built in Keras to initialize all the parameters of the convolution layer, initializing the deviation of the parameters to 0, and updating and optimizing all the parameters by using a random gradient descent method; the Batchsize parameter is set to 64, the initial learning rate is 1e-4, the momentum is 0.9, and the increment is reduced by 1e-6 once each iteration; to prevent model overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early stop strategy is employed to stop training in advance when it is detected that the validation set error is no longer decreasing for 20 iteration cycles. And (10) adopting a ten-fold cross-validation method when the model is trained, and amplifying the sample by using a horizontal overturning, brightness adjusting and random noise preprocessing method.
Step (11): and acquiring real-time traffic scene data in actual driving, and inputting the real-time traffic scene data into a trained M 2 AttentionNet model to obtain recommended results of different driving areas.
As shown in FIG. 4, FIG. 4 is a schematic diagram comparing the detection results of the method of the present invention with manual annotation results.
Further, the method of the present invention was used to conduct more extensive detection and extraction experiments on the public dataset IDD and the constructed dataset URDD, including experiments on both structured and unstructured roads, and was quantitatively compared under identical conditions with 9 representative methods (all known techniques, well-known in the art) published in 2015-2021: the FCN, UNet, SegNet, PSPNet, DeepLabV3+, DANet, modified DeepLabV3+, Hierarchical Attention and HR-Net models. The comparison uses 2 pixel-level evaluation indexes, defined in Table 1: the per-class intersection-over-union (IoU) and the mean intersection-over-union (mIoU). IoU is the overlap ratio of a model's detected region R_k for a given class and its ground-truth region R_k^opt, i.e. the ratio of their intersection to their union; mIoU is the average IoU over all classes. The higher the IoU and mIoU values, the stronger the segmentation performance of the model.
Table 1 Algorithm Performance evaluation index
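The IoU and mIoU indexes defined above can be sketched directly in NumPy (`nan` marks a class absent from both prediction and ground truth, which is then ignored in the mean):

```python
import numpy as np

def iou(pred, gt, k):
    """IoU of class k: |R_k ∩ R_k_opt| / |R_k ∪ R_k_opt|."""
    p, g = pred == k, gt == k
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")

def miou(pred, gt, num_classes=4):
    """Mean IoU over the classes, ignoring classes absent from both maps."""
    return np.nanmean([iou(pred, gt, k) for k in range(num_classes)])

gt   = np.array([[0, 1], [1, 2]])
pred = np.array([[0, 1], [2, 2]])
# class 0: 1/1, class 1: 1/2, class 2: 1/2, class 3 absent -> ignored
print(round(miou(pred, gt), 3))   # 0.667
```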
Table 2 shows the accuracy and efficiency achieved by the different models on the URDD dataset. All methods with available source code were run on the same workstation (NVIDIA GTX 3090 GPU) as the method of the present invention. It can be seen that the method of the present invention achieves the best mIoU score, 92.46%, among comparable algorithms. Meanwhile, thanks to the lightweight design of the multi-scale layered input, double-attention jump connection and multi-scale layered output, the method still processes 22.7 frames per second while employing multi-scale interaction and double attention, so the algorithm's efficiency meets real-time requirements.
TABLE 2
Model | Image size | mIoU | Speed/(frame·s⁻¹)
---|---|---|---
FCN | 640×360 | 67.76% | 5.8 |
UNet | 640×360 | 78.23% | 37.1 |
SegNet | 640×360 | 68.34% | 15.2 |
PSPNet | 640×360 | 85.40% | 3.4 |
DeepLabV3+ | 640×360 | 85.90% | 2.4 |
DANet | 640×360 | 84.58% | 8.1 |
modified DeepLabV3+ | 512×512 | 86.75% | 12.6 |
Hierarchical Attention | 640×360 | 88.19% | 15.3 |
HR-Net | 640×360 | 86.56% | 16.2 |
The method of the invention | 640×360 | 92.46% | 22.7 |
Furthermore, to verify the generalization performance of the method in various driving scenes, the model trained on the URDD dataset was also tested directly, without retraining, in segmentation experiments on newly collected dash-cam data (covering both unstructured and structured road scenes) and on the semantic segmentation subset of the public KITTI dataset. The method effectively recommends drivable areas for structured and unstructured road scenes alike, and the overall mIoU score on multiple real-vehicle sample datasets collected in different scenes averages 83.94%, showing that the model has good generalization performance.
Table 3
Experimental results prove that the method has high detection precision, good generalization performance and high time efficiency, effectively solving the difficult problem of drivable-area detection in different road scenes.
In summary, the invention provides a road drivable area fine recommendation method based on layered input and output and double-attention jump connections, which uses a convolutional neural network and a double-attention mechanism to ensure efficient and accurate feature extraction, addresses the detection problems of blurred, aliased road-area boundaries and the complex, changeable driving conditions encountered in actual driving tasks, and improves the detection precision and time efficiency for the road drivable area. The invention balances accuracy and real-time performance and obtains good fine-detection results in different real scenes, with the mean intersection-over-union reaching 92.46% and the average detection speed reaching 22.7 frames/second; it effectively completes the fine-detection task for the drivable area of complex roads and has good generalization performance.
Claims (7)
1. A road drivable area fine recommendation method based on layered input and output and double-attention jump connection is characterized by comprising the following steps:
step (1): constructing a data set with a label, dividing the data set into a training set, a verification set and a test set, and preprocessing the data set;
Step (2): based on the U-shaped encoder-decoder structure, an M-shaped encoder-decoder network, namely an M 2 AttentionNet model, is constructed by adding three structures of multi-scale layered input, double-attention jumper and multi-scale layered output;
Step (3): constructing an inverted pyramid type layered input structure at the input end of the model encoder, namely constructing a multi-scale layered input structure, reserving shallow features on different scale levels by the multi-scale layered input structure, and fusing the shallow features and deep semantics layer by layer;
Step (4): constructing four levels in the M 2 AttentionNet encoder branch, and carrying out continuous feature extraction twice on each level by utilizing the combination operation of 3X 3Conv, BN and ReLU;
Step (5): the resolution of the same level is kept unchanged, and 2 multiplied by 2 max pooling is used between layers for downsampling;
Step (6): for the decoder branch, each layer uses Conv-BN-ReLU combination with the same parameters to perform continuous feature extraction twice, and 2X 2 up-sampling of nearest neighbor interpolation is performed between layers;
step (7): activating the final end of the decoder branch with 1×1 Conv, BN and Softmax to perform four-class classification, generating a prediction result of the same scale as the input image, wherein the 4 categories respectively correspond to the strongly recommended driving area, weakly recommended driving area, non-recommended driving area and background area in the driving scene;
step (8): designing an output structure of layered prediction and layered loss at the output end of the model;
Step (9): designing a double-attention jumper structure at a jumper connection part in the middle of an encoder-decoder;
Step (10): training the M 2 AttentionNet model by using a training set to obtain a model with well trained parameters; detecting the trained model by using the test set to obtain a road drivable area in a traffic scene;
Step (11): and acquiring real-time traffic scene data in actual driving, and inputting the real-time traffic scene data into a trained M 2 AttentionNet model to obtain recommended results of different driving areas.
2. The method for finely recommending a road drivable area based on layered input and output and double-attention jump according to claim 1, wherein the specific steps of the multi-scale layered input structure constructed at the input end of the model encoder in the step (3) are as follows:
Step (31): performing continuous maximum pooling downsampling on the image I to be detected to generate an image inverted pyramid { I,1/2I,1/4I,1/8I } with a decreasing scale;
step (32): the images of the four scales are layered and merged into the corresponding levels of the encoder branch; features are activated and extracted through Conv, BN and ReLU, the features generated at each level are merged with the feature map produced by the previous level by channel-dimension concatenation, and the result is input into the network encoder.
3. The method for finely recommending a road drivable area based on layered input/output and double-attention jump according to claim 1 or 2, wherein the specific steps of designing an output structure of layered prediction and layered loss at the model output end in the step (8) are as follows:
Step (81): each layer of the decoder branch is activated by up-sampling and convolution combination comprising 1×1Conv, BN and Softmax, a corresponding drivable region prediction map R s is output, the layer sequences s=1, 2,3,4, and the prediction maps of all layers are combined into a final drivable region prediction result;
Step (82): by single-hot encoding, the loss of all levels of the decoder branches is fused and calculated, and the level loss l s of the s-th layer is defined as:
where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories (here N is 4); in one-hot mode, for class k, Y_k+ and Y_k- are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant factor, and ω is a balance factor;
step (83): the total loss function L of the model is calculated as the sum of the four decoder level losses l_s: L = Σ_s l_s.
4. The method for fine recommendation of a road drivable area based on hierarchical input/output and dual-attention jump according to claim 3, wherein in the layered prediction and layered loss formula designed at the model output in step (8), γ = 2 and ω = 0.55, and the loss function is a focal loss.
5. The method for fine recommendation of a road drivable area based on hierarchical input/output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein the double-attention jump connection structure designed in step (9) comprises the following specific steps:
step (91): integrating a channel attention and space attention dual mechanism in the hierarchical jump process;
step (92): the feature map F w×h×c obtained by each level of the encoder is subjected to fine adjustment through a channel attention module and a space attention module in sequence;
Step (93): and performing channel dimension splicing on the feature map adjusted by the double-attention mechanism and the up-sampling feature map of the corresponding layer of the decoder to obtain a final output feature map F' w×h×c.
6. The method for fine recommendation of a road drivable area based on hierarchical input/output and double-attention jump connection as set forth in any one of claims 1, 2 and 4, wherein when the model is trained, the specific parameters of step (10) are set as follows: during training, all convolution-layer parameters are initialized with the Glorot initializer built into Keras, biases are initialized to 0, and all parameters are updated and optimized with stochastic gradient descent; the batch size is set to 64, the initial learning rate is 1e-4, the momentum is 0.9, and the learning rate decays by 1e-6 per iteration; to prevent model overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early-stopping strategy stops training in advance when the validation-set error is detected to no longer decrease for 20 iteration cycles.
7. The method for fine recommendation of a road drivable area based on layered input/output and double-attention jump according to any one of claims 1, 2 and 4, wherein step (10) adopts ten-fold cross-validation when training the model, and augments the samples using horizontal flipping, brightness adjustment and random-noise preprocessing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210366807.1A CN114674338B (en) | 2022-04-08 | 2022-04-08 | Fine recommendation method for road drivable area based on layered input and output and double-attention jump |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210366807.1A CN114674338B (en) | 2022-04-08 | 2022-04-08 | Fine recommendation method for road drivable area based on layered input and output and double-attention jump |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114674338A CN114674338A (en) | 2022-06-28 |
CN114674338B true CN114674338B (en) | 2024-05-07 |
Family
ID=82077498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210366807.1A Active CN114674338B (en) | 2022-04-08 | 2022-04-08 | Fine recommendation method for road drivable area based on layered input and output and double-attention jump |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114674338B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345875A (en) * | 2018-04-08 | 2018-07-31 | 北京初速度科技有限公司 | Wheeled region detection model training method, detection method and device |
CN108985194A (en) * | 2018-06-29 | 2018-12-11 | 华南理工大学 | A kind of intelligent vehicle based on image, semantic segmentation can travel the recognition methods in region |
FR3092546A1 (en) * | 2019-02-13 | 2020-08-14 | Safran | Identification of rolling areas taking into account uncertainty by a deep learning method |
CN111882620A (en) * | 2020-06-19 | 2020-11-03 | 江苏大学 | Road drivable area segmentation method based on multi-scale information |
CN112639821A (en) * | 2020-05-11 | 2021-04-09 | 华为技术有限公司 | Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102421855B1 (en) * | 2017-09-28 | 2022-07-18 | 삼성전자주식회사 | Method and apparatus of identifying driving lane |
- 2022-04-08 CN CN202210366807.1A patent/CN114674338B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345875A (en) * | 2018-04-08 | 2018-07-31 | 北京初速度科技有限公司 | Wheeled region detection model training method, detection method and device |
CN108985194A (en) * | 2018-06-29 | 2018-12-11 | 华南理工大学 | A kind of intelligent vehicle based on image, semantic segmentation can travel the recognition methods in region |
FR3092546A1 (en) * | 2019-02-13 | 2020-08-14 | Safran | Identification of rolling areas taking into account uncertainty by a deep learning method |
CN112639821A (en) * | 2020-05-11 | 2021-04-09 | 华为技术有限公司 | Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system |
WO2021226776A1 (en) * | 2020-05-11 | 2021-11-18 | 华为技术有限公司 | Vehicle drivable area detection method, system, and automatic driving vehicle using system |
CN114282597A (en) * | 2020-05-11 | 2022-04-05 | 华为技术有限公司 | Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system |
CN111882620A (en) * | 2020-06-19 | 2020-11-03 | 江苏大学 | Road drivable area segmentation method based on multi-scale information |
Non-Patent Citations (1)
Title |
---|
Semantic segmentation of drivable regions on unstructured roads based on SegNet; Zhang Kaihang; Ji Jie; Jiang Luo; Zhou Xianlin; Journal of Chongqing University; 2020-03-15 (03); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114674338A (en) | 2022-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112200161B (en) | Face recognition detection method based on mixed attention mechanism | |
CN105160309B (en) | Three lanes detection method based on morphological image segmentation and region growing | |
CN108985194B (en) | Intelligent vehicle travelable area identification method based on image semantic segmentation | |
CN104246821B (en) | Three-dimensional body detection device and three-dimensional body detection method | |
US8487991B2 (en) | Clear path detection using a vanishing point | |
US20180124319A1 (en) | Method and apparatus for real-time traffic information provision | |
CN102044151A (en) | Night vehicle video detection method based on illumination visibility identification | |
CN112329533B (en) | Local road surface adhesion coefficient estimation method based on image segmentation | |
Cai et al. | Applying machine learning and google street view to explore effects of drivers’ visual environment on traffic safety | |
Zakaria et al. | Lane detection in autonomous vehicles: A systematic review | |
CN110532961A (en) | A kind of semantic traffic lights detection method based on multiple dimensioned attention mechanism network model | |
KR102377044B1 (en) | Apparatus and method for evaluating walking safety risk, and recording media recorded program realizing the same | |
CN114092917B (en) | MR-SSD-based shielded traffic sign detection method and system | |
CN212009589U (en) | Video identification driving vehicle track acquisition device based on deep learning | |
Kim et al. | Toward explainable and advisable model for self‐driving cars | |
CN111046723B (en) | Lane line detection method based on deep learning | |
CN114973199A (en) | Rail transit train obstacle detection method based on convolutional neural network | |
CN114674338B (en) | Fine recommendation method for road drivable area based on layered input and output and double-attention jump | |
CN115294545A (en) | Complex road surface lane identification method and chip based on deep learning | |
CN113945222B (en) | Road information identification method and device, electronic equipment, vehicle and medium | |
CN114911891A (en) | Method, system, storage medium and device for analyzing space quality of historical street | |
CN114882205A (en) | Target detection method based on attention mechanism | |
CN106354135A (en) | Lane keeping system and method based on Beidou high-precision positioning | |
de Mesquita et al. | Street pavement classification based on navigation through street view imagery | |
Kim | Explainable and Advisable Learning for Self-driving Vehicles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |