CN114674338B - Fine recommendation method for road drivable area based on layered input and output and double-attention jump - Google Patents

Fine recommendation method for road drivable area based on layered input and output and double-attention jump

Info

Publication number
CN114674338B
CN114674338B (application CN202210366807.1A)
Authority
CN
China
Prior art keywords
attention
output
model
road
layered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210366807.1A
Other languages
Chinese (zh)
Other versions
CN114674338A (en)
Inventor
王雪玮
梁晓
李韶华
冯桂珍
闫德立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University filed Critical Shijiazhuang Tiedao University
Priority to CN202210366807.1A
Publication of CN114674338A
Application granted
Publication of CN114674338B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/3453 Special cost functions, i.e. other than distance or default speed limit of road segments
    • G01C21/3461 Preferred or disfavoured areas, e.g. dangerous zones, toll or emission zones, intersections, manoeuvre types, segments such as motorways, toll roads, ferries
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30 Map- or contour-matching
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/36 Input/output arrangements for on-board computers
    • G01C21/3602 Input other than that of destination using image analysis, e.g. detection of road signs, lanes, buildings, real preceding vehicles using a camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. On an encoder-decoder skeleton, the method constructs an inverted-pyramid multi-scale hierarchical input and hierarchical output structure to effectively fuse the morphological features and semantic information of the road, and builds a skip-connection structure integrating channel attention and spatial attention to accurately detect the different drivable areas. The result is a drivable-area recommendation method that combines a multi-scale interaction strategy with a dual attention mechanism under an M-shaped deep architecture. For complex roads with blurred boundaries and changeable conditions, it can finely partition the road, based on vision, into strongly recommended, weakly recommended, and non-recommended drivable areas in complex driving scenes, meeting the different drivable-area detection requirements of intelligent vehicles under normal, emergency, and other complex driving conditions. The proposed model balances segmentation accuracy and time efficiency and shows clear advantages in drivable-area detection on complex roads.

Description

Fine recommendation method for road drivable area based on layered input and output and double-attention jump
Technical Field
The invention relates to a method for recommending road drivable areas, and in particular to a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. It belongs to the technical fields of automatic driving and computer vision, and specifically concerns the detection of road drivable areas.
Background
In vision-based environment perception for automatic driving, an intelligent vehicle detects the road area available for driving by screening traffic-element information in the scene ahead, such as the road surface, vehicles, pedestrians, and positive and negative obstacles, providing important support for subsequent path planning and driving decisions. For structured roads with good pavement, clear lanes, and distinct boundaries, current automatic driving can already detect the drivable area effectively. However, roads with a lower degree of structure, such as non-arterial suburban roads and rural streets, suffer from blurred lanes and boundaries and highly random participant behavior, so drivable-area detection algorithms designed for structured roads struggle to capture the feature information of unstructured roads; the accuracy and real-time performance of the detection task degrade severely, and the task may even fail outright. More importantly, for safety reasons a human driver may, in an emergency, choose to drive over a flat area that is not normally considered part of the road. Given the particularly complex and changeable conditions of unstructured roads, intelligent vehicles need something closer to a human driver's ability to cope with such emergency conditions. Therefore, diversified segmentation and fine-grained recommendation of the drivable areas of complex roads, adapted to different driving conditions, is a key task for the driving safety of intelligent vehicles.
Current vision-based methods for detecting drivable areas on complex roads fall mainly into three classes: appearance-based, geometry-based, and semantic-segmentation-based. Appearance-based methods rely on a single appearance cue and are easily disturbed by illumination changes, road-surface occlusion, and similar factors. Many studies augment appearance description with road geometry, but when the geometric constraints of the scene are not satisfied or the three-dimensional data are of low quality, geometry-based methods degrade severely. The datasets used by existing semantic-segmentation methods are mostly collected in specific driving environments abroad and do not fully match the complex road conditions of China; moreover, these algorithms semantically segment every object in the scene, which is redundant and insufficiently focused, limiting the accuracy of the extracted drivable area. In addition, whether based on appearance, geometry, or semantic segmentation, most existing models extract only a single road region as the drivable area and cannot serve both normal and emergency driving conditions, making it difficult to adapt to the changeable conditions of complex roads. Hence, practical automatic-driving tasks currently need a drivable-area recommendation method that accounts for complex Chinese traffic scenes, extracts different road regions simultaneously, and covers both normal and emergency driving conditions.
Related patent literature: CN113223313A discloses a lane recommendation method and device and a vehicle-mounted communication device. The lane recommendation method comprises: acquiring, via vehicle-mounted communication technology, the lane information of the road on which a target vehicle is currently located; receiving, via vehicle-mounted communication technology, vehicle data of vehicles surrounding the target vehicle; determining the in-lane positions of the surrounding vehicles from the lane information and the vehicle data; determining the driving parameters of each candidate lane of the target vehicle from the vehicle data and the positions of the surrounding vehicles; and determining, from those driving parameters, the time the target vehicle needs to traverse a preset road section, and recommending lanes according to that traversal time. CN112857381A discloses a path recommendation method, device, and readable medium. The method identifies target objects with congestion characteristics in acquired images, determines the road-condition information of the driving road from the target objects and the current navigation data, generates alternative paths from that information, and recommends a path, so that navigation obtains more specific and accurate road-condition information in time and an incorrect path can be corrected promptly.
None of the above technologies offers concrete guidance on how a road-drivable-area recommendation method should handle, in actual driving tasks, the blurred aliasing of road-area boundaries and the complex, changeable driving conditions of a vehicle, or on how to improve the detection accuracy and time efficiency of the road drivable area.
Disclosure of Invention
In view of the shortcomings of, and the need to improve on, the prior art, the invention aims to provide a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. The method focuses on efficient and accurate feature extraction, effectively balances accuracy and real-time performance, solves the detection problems of blurred road-area boundaries and complex, changeable driving conditions in actual driving tasks, and improves the detection accuracy and time efficiency of the road drivable area.
In order to solve the technical problems, the invention adopts the following technical scheme:
A fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections (equivalently, a road-drivable-area recommendation method based on an M-shaped deep architecture), characterized by comprising the following steps:
Step (1): construct a labeled dataset, divide it into a training set, a validation set, and a test set, and preprocess the data;
further, in a preferred technical scheme, the labeled dataset in step (1) is constructed as follows:
Step (101): label, merge, and modify existing complex-road driving-scene images so that they fit the four-class drivable-area detection task of strongly recommended, weakly recommended, non-recommended, and background; this part of the samples is denoted IDD_unst;
Step (102): using an all-terrain intelligent experimental vehicle driving at constant speed, collect images of complex roads in closed/semi-closed campuses with an on-board camera and label the driving-scene images accordingly; this part of the samples is denoted Campus_unst;
Step (103): using the dashboard cameras of ordinary passenger vehicles, collect and label complex-road driving-scene images from suburban, rural, and other areas of China; this part of the samples is denoted China_unst.
Step (2): on the basis of the U-shaped encoder-decoder structure, construct an M-shaped encoder-decoder network, the M²AttentionNet model, by adding three major structures: multi-scale hierarchical input, dual-attention skip connections, and multi-scale hierarchical output.
Step (3): construct an inverted-pyramid hierarchical input structure at the input end of the model encoder, i.e. a multi-scale hierarchical input structure, which preserves shallow features at different scale levels and fuses shallow features with deep semantics layer by layer;
further, in a preferred technical scheme, the multi-scale hierarchical input structure at the encoder input is built as follows:
Step (31): apply successive max-pooling downsampling to the image I to be detected, generating an inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I} of decreasing scale;
Step (32): merge the images at the four scales hierarchically into the corresponding levels of the encoder branch; extract features through Conv, BN, and ReLU activation; fuse, by channel-dimension concatenation, the features generated at each level with the feature map produced by the previous level; and feed the result into the network encoder.
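The inverted-pyramid construction of steps (31) and (32) can be sketched as repeated 2×2 max-pool downsampling. The following is a minimal pure-Python illustration (a list-of-lists stands in for an image; a real implementation would operate on tensors in a deep-learning framework):

```python
# Hedged sketch: builds the inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I}
# of steps (31)-(32) by repeated 2x2 max-pool downsampling.
# Function names here are illustrative, not from the patent.

def max_pool_2x2(img):
    """Downsample a 2D grid (list of lists) by taking the max of each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]

def build_inverted_pyramid(img, levels=4):
    """Return [I, 1/2 I, 1/4 I, 1/8 I] (for levels=4), coarsest last."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = max_pool_2x2(img)
        pyramid.append(img)
    return pyramid
```

Each pyramid level is then concatenated, channel-wise, with the encoder features of the matching resolution, as described in step (32).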
Step (4): construct four levels in the M²AttentionNet encoder branch, performing two successive feature extractions at each level with a combination of 3×3 Conv, BN, and ReLU operations.
Step (5): keep the resolution unchanged within a level, and downsample between levels with 2×2 max pooling.
Step (6): in the decoder branch, each level likewise performs two successive feature extractions with Conv-BN-ReLU combinations of the same parameters, and 2×2 nearest-neighbor-interpolation upsampling is used between levels.
Step (7): at the final end of the decoder branch, perform four-way classification with 1×1 Conv, BN, and Softmax activation, generating a prediction at the same scale as the input image; the 4 categories correspond respectively to the strongly recommended, weakly recommended, and non-recommended drivable areas and the background area of the driving scene.
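The per-pixel four-way classification of step (7) reduces to a softmax over four class scores followed by an argmax. A minimal sketch (the class names and scores are illustrative; the patent applies this after 1×1 Conv and BN):

```python
# Hedged sketch of step (7): per-pixel softmax over the four class scores
# (strongly recommended, weakly recommended, non-recommended, background).
import math

CLASSES = ["strong", "weak", "not_recommended", "background"]

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify_pixel(scores):
    """Return the winning class label and the full probability vector."""
    probs = softmax(scores)
    return CLASSES[probs.index(max(probs))], probs
```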
Step (8): design an output structure with hierarchical prediction and hierarchical loss at the output end of the model;
further, in a preferred technical scheme, the specific steps of the hierarchical-prediction, hierarchical-loss output structure at the model output (i.e. the multi-scale hierarchical output constructed in the model decoder branch) are as follows:
Step (81): at each level of the decoder branch, output a corresponding drivable-area prediction map R_s (level index s = 1, 2, 3, 4) through an upsampling-and-convolution combination (comprising 1×1 Conv, BN, and Softmax activation), and merge the prediction maps of all levels into the final drivable-area prediction result;
Step (82): using one-hot encoding, fuse and compute the losses of all levels of the decoder branch, the level loss l_s of the s-th level being defined as a class-balanced focal loss, where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories (here N = 4); in the one-hot scheme, for class k, Y_k^+ and Y_k^- are the sets of pixels whose ground-truth labels are positive (1) and negative (0) respectively, x_k is the predicted value, γ is the focusing factor, and ω is the balance factor;
Step (83): compute the total loss function L of the model as the sum of the four decoder-level losses l_s, L = Σ_s l_s;
further, in a preferred technical scheme, the hierarchical-prediction and hierarchical-loss formulation at the model output takes γ = 2 and ω = 0.55, and the loss function is a focal loss.
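The level loss of step (82) can be sketched as follows, assuming the standard class-balanced focal-loss form implied by the symbols γ and ω above (γ = 2, ω = 0.55 per the text); the exact patented formula may differ in detail:

```python
# Hedged sketch of the hierarchical loss of steps (81)-(83). pred holds the
# predicted probabilities x_k, onehot the one-hot ground truth for each class.
# The focal-loss form below is an assumption consistent with the symbols in
# the text, not a verbatim copy of the patented formula.
import math

def focal_loss(pred, onehot, gamma=2.0, omega=0.55, eps=1e-7):
    """Per-pixel focal loss summed over the N label classes."""
    loss = 0.0
    for x, y in zip(pred, onehot):
        x = min(max(x, eps), 1.0 - eps)   # clamp to avoid log(0)
        if y == 1:                         # pixel in the positive set Y_k^+
            loss += -omega * (1.0 - x) ** gamma * math.log(x)
        else:                              # pixel in the negative set Y_k^-
            loss += -(1.0 - omega) * x ** gamma * math.log(1.0 - x)
    return loss

def total_loss(level_losses):
    """Step (83): the total loss L is the sum of the four decoder-level losses."""
    return sum(level_losses)
```

The focusing term (1 - x)^γ down-weights already well-classified pixels, which is why confident correct predictions contribute almost nothing to the loss.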
Step (9): design the skip connections between encoder and decoder as a dual-attention skip-connection structure; the specific steps (of the preferred technical scheme) are as follows:
Step (91): integrate the dual mechanism of channel attention and spatial attention into the level-wise skip process;
Step (92): pass the feature map F_{w×h×c} obtained at each encoder level through the channel-attention module and then the spatial-attention module for fine adjustment;
Step (93): concatenate, along the channel dimension, the feature map adjusted by the dual-attention mechanism with the upsampled feature map of the corresponding decoder level, obtaining the final output feature map F′_{w×h×c}.
Step (10): train the M²AttentionNet model with the training set to obtain a model with well-trained parameters; evaluate the trained model on the test set to obtain the road drivable areas in complex traffic scenes;
further, in a preferred technical scheme, the specific parameters during model training in step (10) are set as follows: all convolution-layer parameters are initialized with the Glorot initializer built into Keras and their biases are initialized to 0, and all parameters are updated and optimized by stochastic gradient descent; the batch size is set to 64, the initial learning rate to 1e-4, and the momentum to 0.9, with the learning rate decayed by 1e-6 at each iteration. To prevent overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early-stopping strategy halts training when the validation-set error has not decreased for 20 iteration cycles. Ten-fold cross-validation is adopted during training, and the samples are augmented by horizontal flipping, brightness adjustment, and random-noise preprocessing.
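The training setup of step (10) can be summarized in a configuration sketch. The hyperparameter values come from the text; the Keras-style time-based decay rule below (lr = lr0 / (1 + decay * iteration)) is an assumption about how the per-iteration reduction of 1e-6 is applied:

```python
# Hedged sketch of the step (10) training hyperparameters. TRAIN_CONFIG and
# learning_rate are illustrative names; the decay rule is an assumed
# Keras-style time-based schedule, not confirmed by the patent text.

TRAIN_CONFIG = {
    "batch_size": 64,
    "optimizer": "sgd",          # stochastic gradient descent
    "initial_lr": 1e-4,
    "momentum": 0.9,
    "decay": 1e-6,               # learning-rate reduction per iteration
    "dropout_in": 0.1,           # input-layer dropout rate
    "dropout_out": 0.4,          # output-layer dropout rate
    "early_stop_patience": 20,   # stop if validation error flat for 20 cycles
}

def learning_rate(iteration, cfg=TRAIN_CONFIG):
    """Keras-style time-based decay: lr = lr0 / (1 + decay * iteration)."""
    return cfg["initial_lr"] / (1.0 + cfg["decay"] * iteration)
```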
Step (11): acquire real-time traffic-scene data during actual driving and feed them into the trained M²AttentionNet model to obtain the recommendation results for the different drivable areas.
The invention discloses a drivable-area recommendation method that fuses a multi-scale interaction strategy and a dual attention mechanism under an M-shaped deep architecture. For complex roads with blurred boundaries and changeable conditions, it can finely partition the road, based on vision, into strongly recommended, weakly recommended, and non-recommended drivable areas in complex driving scenes, meeting the different drivable-area detection requirements of intelligent vehicles under normal, emergency, and other complex driving conditions. First, an inverted-pyramid multi-scale hierarchical input and hierarchical output structure is constructed on the encoder-decoder skeleton to effectively fuse the morphological features and semantic information of the road; second, a skip-connection structure integrating channel attention and spatial attention is constructed to accurately detect the different drivable areas. The method achieves fine segmentation of the strongly recommended, weakly recommended, and non-recommended drivable areas and the background area in a variety of real driving scenes. Compared with other existing mainstream models, the proposed model balances segmentation accuracy and time efficiency and shows clear advantages in drivable-area detection on complex roads.
In general, compared with the prior art, the technical scheme designed by the invention has the following technical characteristics and beneficial effects:
(1) The road-drivable-area segmentation model M²AttentionNet, which fuses a multi-scale interaction strategy and a dual attention mechanism, can accurately segment a real driving-scene image of a road into strongly recommended, weakly recommended, and non-recommended drivable areas and a background area; it can cope with special driving conditions such as meeting oncoming traffic on narrow roads and emergency avoidance, and effectively adapts to the changeable conditions of different roads.
(2) On the encoder-decoder skeleton, the invention designs three structures (multi-scale hierarchical input, dual-attention skip connections, and multi-scale hierarchical output) and constructs an M-shaped deep convolutional neural network architecture, effectively fusing shallow features with deep semantics, balancing the model's prediction bias across scales, and focusing the learning process on the important features related to road drivability, which effectively improves model performance. The invention balances accuracy and real-time performance and achieves good fine-grained detection in different real scenes, with a mean intersection-over-union of 92.46% and an average detection speed of 22.7 frames per second; it effectively completes the fine-grained drivable-area detection task on complex roads and generalizes well.
In summary, the invention provides a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. Using a convolutional neural network and a dual attention mechanism, it focuses on efficient and accurate feature extraction, effectively balances accuracy and real-time performance, solves the detection problems of blurred road-area boundaries and complex, changeable driving conditions in actual driving tasks, and improves the detection accuracy and time efficiency of the road drivable area.
Drawings
Fig. 1 is a schematic diagram of the M-shaped architecture of the M²AttentionNet model according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of fine-grained recommendation of road drivable areas according to an embodiment of the present invention; fig. 2(a) shows driving scene 1, and fig. 2(b) shows the fine-grained recommendation of the road drivable areas.
Fig. 3 is a schematic diagram of the dual-attention skip-connection module according to an embodiment of the present invention.
Fig. 4 compares the detection results of the method according to the embodiment of the present invention with manual results: fig. 4(c) is the input image (driving scene 2), (d) is the manual extraction result, and (e) is the extraction result of the method of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
As shown in fig. 1, the schematic diagram of the M-shaped architecture of the M²AttentionNet model provided by an embodiment of the present invention, and with the target to be achieved shown in fig. 2, the fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections specifically comprises the following steps:
Step (1): construct a labeled dataset, divide it into a training set, a validation set, and a test set, and preprocess the data;
further, the labeled dataset in step (1) is constructed as follows:
Step (101): label, merge, and modify existing road driving-scene images so that they fit the four-class drivable-area detection task of strongly recommended, weakly recommended, non-recommended, and background; this part of the samples is denoted IDD_unst;
Step (102): using an all-terrain intelligent experimental vehicle driving at constant speed, collect images of roads in closed/semi-closed campuses with an on-board camera and label the driving-scene images accordingly; this part of the samples is denoted Campus_unst;
Step (103): using the dashboard cameras of ordinary passenger vehicles, collect and label complex-road driving-scene images from suburban, rural, and other areas of China; this part of the samples is denoted China_unst.
Step (2): on the basis of the U-shaped encoder-decoder structure, construct an M-shaped encoder-decoder network, the M²AttentionNet model, by adding three major structures: multi-scale hierarchical input, dual-attention skip connections, and multi-scale hierarchical output.
Step (3): construct an inverted-pyramid hierarchical input structure at the input end of the model encoder, i.e. a multi-scale hierarchical input structure, which preserves shallow features at different scale levels and fuses shallow features with deep semantics layer by layer. The multi-scale hierarchical input structure at the encoder input is built as follows:
Step (31): apply successive max-pooling downsampling to the image I to be detected, generating an inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I} of decreasing scale;
Step (32): merge the images at the four scales hierarchically into the corresponding levels of the encoder branch; extract features through Conv, BN, and ReLU activation; fuse, by channel-dimension concatenation, the features generated at each level with the feature map produced by the previous level; and feed the result into the network encoder.
Step (4): construct four levels in the M²AttentionNet encoder branch, performing two successive feature extractions at each level with a combination of 3×3 Conv, BN, and ReLU operations.
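The four-level layout of steps (4) and (5) can be traced in terms of spatial resolution: the two Conv-BN-ReLU extractions leave the resolution unchanged within a level (assuming padded 3×3 convolutions), and each 2×2 max pooling between levels halves it. A minimal sketch (function name illustrative):

```python
# Hedged sketch of steps (4)-(5): trace the spatial size seen by each of the
# four encoder levels, assuming padded convolutions keep resolution constant
# within a level and 2x2 max pooling halves it between levels.

def encoder_resolutions(h, w, levels=4):
    """Return the (height, width) processed by each encoder level, top-down."""
    sizes = []
    for _ in range(levels):
        sizes.append((h, w))
        h, w = h // 2, w // 2   # 2x2 max pooling between levels
    return sizes
```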
Step (5): keep the resolution unchanged within a level, and downsample between levels with 2×2 max pooling.
Step (6): in the decoder branch, each level likewise performs two successive feature extractions with Conv-BN-ReLU combinations of the same parameters, and 2×2 nearest-neighbor-interpolation upsampling is used between levels.
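The 2×2 nearest-neighbor upsampling of step (6) simply replicates each feature-map element into a 2×2 block, doubling height and width. A pure-Python illustration (a real decoder would do this on tensors):

```python
# Hedged sketch of the decoder's 2x2 nearest-neighbour upsampling (step (6)):
# each element of the feature map is copied into a 2x2 block.

def upsample_nearest_2x2(fmap):
    """Nearest-neighbour upsample a 2D grid (list of lists) by a factor of 2."""
    out = []
    for row in fmap:
        doubled = [v for v in row for _ in range(2)]  # repeat along width
        out.append(doubled)
        out.append(list(doubled))                     # repeat along height
    return out
```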
Step (7): at the final end of the decoder branch, perform four-way classification with 1×1 Conv, BN, and Softmax activation, generating a prediction at the same scale as the input image; the 4 categories correspond respectively to the strongly recommended, weakly recommended, and non-recommended drivable areas and the background area of the driving scene.
Step (8): the output structure of the hierarchical prediction and the hierarchical loss is designed at the output end of the model, and the specific steps of the output structure (or the specific steps of the multi-scale hierarchical output constructed by the model decoder branch) are as follows:
step (81): outputting a corresponding travelable region prediction map R s (layer sequence s=1, 2,3, 4) at each layer of the decoder branch by up-sampling and convolution combination (including 1×1Conv, BN and Softmax activation), and merging the prediction maps of all layers into a final travelable region prediction result;
Step (82): by single-hot encoding, the loss of all levels of the decoder branches is fused and calculated, and the level loss l s of the s-th layer is defined as:
where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories, here N = 4; in one-hot form, for class k, Y_k+ and Y_k− are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant focusing factor, and ω is a balance factor. Here γ = 2 and ω = 0.55 are taken as example values, and the loss function is a focal loss.
Step (83): the total loss function L of the calculation model is the sum of the four decoder level losses L s, l= Σl s.
Step (9): the jump connection part in the middle of the encoder-decoder is designed into a double-attention jump connection structure, as shown in fig. 3, and the specific steps are as follows:
Step (91): integrate the dual mechanism of channel attention and spatial attention into the hierarchical skip process;
Step (92): refine the feature map F (of size w×h×c) obtained at each encoder level by passing it sequentially through a channel attention module and a spatial attention module;
Step (93): and performing channel dimension splicing on the feature map adjusted by the double-attention mechanism and the up-sampling feature map of the corresponding layer of the decoder to obtain a final output feature map F' w×h×c.
Step (10): training the M 2 AttentionNet model by using a training set to obtain a model with well trained parameters; and detecting the trained model by using the test set to obtain the road drivable area in the complex traffic scene. Step (10), when the model is trained, specific parameters are set as follows: in the training process, using a Glorot tool built in Keras to initialize all the parameters of the convolution layer, initializing the deviation of the parameters to 0, and updating and optimizing all the parameters by using a random gradient descent method; the Batchsize parameter is set to 64, the initial learning rate is 1e-4, the momentum is 0.9, and the increment is reduced by 1e-6 once each iteration; to prevent model overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early stop strategy is employed to stop training in advance when it is detected that the validation set error is no longer decreasing for 20 iteration cycles. And (10) adopting a ten-fold cross-validation method when the model is trained, and amplifying the sample by using a horizontal overturning, brightness adjusting and random noise preprocessing method.
Step (11): and acquiring real-time traffic scene data in actual driving, and inputting the real-time traffic scene data into a trained M 2 AttentionNet model to obtain recommended results of different driving areas.
As shown in Fig. 4, which is a schematic diagram comparing the detection results of the method of the present invention with manual detection results.
Further, the method of the present invention was subjected to more extensive detection and extraction experiments on the public dataset IDD and the constructed dataset URDD, covering both structured and unstructured roads, and was quantitatively compared under identical conditions with 9 representative methods published between 2015 and 2021 and well known in the art (all prior techniques): the FCN, UNet, SegNet, PSPNet, DeepLabV3+, DANet, modified DeepLabV3+, Hierarchical Attention, and HR-Net models. The comparison uses 2 pixel-level evaluation indices, the per-class intersection-over-union (IoU) and the overall mean intersection-over-union (mIoU), defined in Table 1. IoU is the overlap ratio between a model's detected region of a given class (R_k) and its ground-truth region (R_k_opt), i.e., the ratio of their intersection to their union; mIoU is the average IoU over all classes. The higher the IoU and mIoU values, the stronger the model's segmentation performance.
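The two indices can be computed directly from predicted and ground-truth label maps (a straightforward sketch; classes absent from both maps are skipped in the average):

```python
import numpy as np

def iou(pred, gt, k):
    """IoU of class k: |R_k ∩ R_k_opt| / |R_k ∪ R_k_opt| over integer label maps."""
    p, g = pred == k, gt == k
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")

def miou(pred, gt, num_classes=4):
    """Mean IoU over all classes present in either map."""
    scores = [iou(pred, gt, k) for k in range(num_classes)]
    scores = [s for s in scores if not np.isnan(s)]
    return sum(scores) / len(scores)
```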
Table 1 Algorithm Performance evaluation index
Table 2 shows the accuracy and efficiency achieved by the different models on the URDD dataset. All methods with available source code were run on the same workstation (NVIDIA GTX 3090 GPU) as the method of the present invention. The method of the present invention achieves an mIoU score of 92.46%, the best among comparable algorithms. Meanwhile, thanks to the lightweight design of the multi-scale layered input, double-attention jump connections, and multi-scale layered output, the method still processes 22.7 frames per second despite employing multi-scale interaction and dual attention, an efficiency that meets real-time requirements.
TABLE 2
Model Image size mIoU Speed (frames/s)
FCN 640×360 67.76% 5.8
UNet 640×360 78.23% 37.1
SegNet 640×360 68.34% 15.2
PSPNet 640×360 85.40% 3.4
DeepLabV3+ 640×360 85.90% 2.4
DANet 640×360 84.58% 8.1
modified DeepLabV3+ 512×512 86.75% 12.6
Hierarchical Attention 640×360 88.19% 15.3
HR-Net 640×360 86.56% 16.2
The method of the invention 640×360 92.46% 22.7
Furthermore, to verify the generalization performance of the method across diverse driving scenes, segmentation experiments without any retraining were carried out, using the model already trained on the URDD dataset, on newly collected dash-cam data (covering both unstructured and structured road scenes) and on the semantic segmentation set of the public KITTI dataset. The method effectively recommends drivable areas for structured and unstructured road scenes alike, and its overall mIoU score on real-vehicle sample data collected across the different scenes averages 83.94%, indicating good generalization performance.
Table 3
The experimental results demonstrate that the method achieves high detection accuracy, strong generalization performance, and high time efficiency, effectively solving the difficult problem of drivable-area detection across different road scenes.
In summary, the invention provides a fine recommendation method for road drivable areas based on layered input and output and double-attention jump connections. It uses a convolutional neural network with a dual-attention mechanism to achieve efficient and accurate feature extraction, addressing the detection problems of blurred, aliased road-area boundaries and the complex, variable driving conditions encountered in actual driving tasks, and improving both the detection accuracy and the time efficiency of road drivable-area detection. The invention balances accuracy and real-time performance, achieving good fine-detection results in different real scenes, with a mean intersection-over-union of 92.46% and an average detection speed of 22.7 frames per second; it effectively accomplishes the fine detection of drivable areas on complex roads and exhibits good generalization performance.

Claims (7)

1. A road drivable area fine recommendation method based on layered input and output and double-attention jump connection is characterized by comprising the following steps:
step (1): constructing a data set with a label, dividing the data set into a training set, a verification set and a test set, and preprocessing the data set;
Step (2): based on the U-shaped encoder-decoder structure, constructing an M-shaped encoder-decoder network, namely the M²AttentionNet model, by adding three structures of multi-scale layered input, double-attention jump connection, and multi-scale layered output;
Step (3): constructing an inverted pyramid type layered input structure at the input end of the model encoder, namely constructing a multi-scale layered input structure, reserving shallow features on different scale levels by the multi-scale layered input structure, and fusing the shallow features and deep semantics layer by layer;
Step (4): constructing four levels in the M²AttentionNet encoder branch, and performing two consecutive feature extractions at each level using a combination of 3×3 Conv, BN, and ReLU operations;
Step (5): keeping the resolution unchanged within each level, and downsampling between levels with 2×2 max-pooling;
Step (6): for the decoder branch, performing two consecutive feature extractions at each level using Conv-BN-ReLU combinations of the same parameters, with 2×2 nearest-neighbor-interpolation upsampling between levels;
Step (7): at the final end of the decoder branch, performing four-way classification using 1×1 Conv, BN, and Softmax activation, generating a prediction of the same scale as the input image, wherein the 4 categories respectively correspond to a strongly recommended drivable area, a weakly recommended drivable area, a non-recommended drivable area, and a background area in the driving scene;
step (8): designing an output structure of layered prediction and layered loss at the output end of the model;
Step (9): designing a double-attention jumper structure at a jumper connection part in the middle of an encoder-decoder;
Step (10): training the M²AttentionNet model with the training set to obtain a model with well-trained parameters; evaluating the trained model on the test set to obtain the road drivable area in the traffic scene;
Step (11): acquiring real-time traffic-scene data during actual driving and feeding it into the trained M²AttentionNet model to obtain recommendation results for the different drivable areas.
2. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to claim 1, wherein the multi-scale layered input structure constructed at the input end of the model encoder in step (3) comprises the following specific steps:
Step (31): performing continuous maximum pooling downsampling on the image I to be detected to generate an image inverted pyramid { I,1/2I,1/4I,1/8I } with a decreasing scale;
step (32): and layering and merging the images with four scales into corresponding levels of the encoder branch, activating and extracting features through Conv, BN and ReLU, merging the features generated by the previous level with the feature map generated by the previous level in a channel dimension splicing mode, and inputting the feature map into a network encoder.
3. The method for finely recommending a road drivable area based on layered input/output and double-attention jump according to claim 1 or 2, wherein the specific steps of designing an output structure of layered prediction and layered loss at the model output end in the step (8) are as follows:
Step (81): each layer of the decoder branch is activated by up-sampling and convolution combination comprising 1×1Conv, BN and Softmax, a corresponding drivable region prediction map R s is output, the layer sequences s=1, 2,3,4, and the prediction maps of all layers are combined into a final drivable region prediction result;
Step (82): by single-hot encoding, the loss of all levels of the decoder branches is fused and calculated, and the level loss l s of the s-th layer is defined as:
wherein I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories, here N = 4; in one-hot form, for class k, Y_k+ and Y_k− are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant focusing factor, and ω is a balance factor;
Step (83): computing the model's total loss function L as the sum of the four decoder-level losses l_s, L = Σ l_s.
4. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to claim 3, wherein in the output structure of layered prediction and layered loss designed at the model output end in step (8), γ = 2, ω = 0.55, and the loss function is a focal loss.
5. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein the double-attention jump connection structure designed in step (9) comprises the following specific steps:
Step (91): integrating the dual mechanism of channel attention and spatial attention into the hierarchical skip process;
Step (92): refining the feature map F (of size w×h×c) obtained at each encoder level by passing it sequentially through a channel attention module and a spatial attention module;
Step (93): and performing channel dimension splicing on the feature map adjusted by the double-attention mechanism and the up-sampling feature map of the corresponding layer of the decoder to obtain a final output feature map F' w×h×c.
6. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein when training the model in step (10), the specific parameters are set as follows: during training, all convolution-layer weights are initialized with the Glorot initializer built into Keras, biases are initialized to 0, and all parameters are updated and optimized by stochastic gradient descent; the batch size is set to 64, the initial learning rate is 1e-4, the momentum is 0.9, and the learning rate is decayed by 1e-6 at each iteration; to prevent model overfitting, dropout is applied at a rate of 0.1 at the input layer and 0.4 at the output layer, and an early-stopping strategy halts training when the validation-set error has not decreased for 20 consecutive epochs.
7. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein step (10) adopts a ten-fold cross-validation method when training the model, and augments the samples with horizontal flipping, brightness adjustment, and random-noise preprocessing.
CN202210366807.1A 2022-04-08 2022-04-08 Fine recommendation method for road drivable area based on layered input and output and double-attention jump Active CN114674338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210366807.1A CN114674338B (en) 2022-04-08 2022-04-08 Fine recommendation method for road drivable area based on layered input and output and double-attention jump


Publications (2)

Publication Number Publication Date
CN114674338A CN114674338A (en) 2022-06-28
CN114674338B true CN114674338B (en) 2024-05-07

Family

ID=82077498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210366807.1A Active CN114674338B (en) 2022-04-08 2022-04-08 Fine recommendation method for road drivable area based on layered input and output and double-attention jump

Country Status (1)

Country Link
CN (1) CN114674338B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102421855B1 (en) * 2017-09-28 2022-07-18 삼성전자주식회사 Method and apparatus of identifying driving lane

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345875A (en) * 2018-04-08 2018-07-31 北京初速度科技有限公司 Wheeled region detection model training method, detection method and device
CN108985194A (en) * 2018-06-29 2018-12-11 华南理工大学 A kind of intelligent vehicle based on image, semantic segmentation can travel the recognition methods in region
FR3092546A1 (en) * 2019-02-13 2020-08-14 Safran Identification of rolling areas taking into account uncertainty by a deep learning method
CN112639821A (en) * 2020-05-11 2021-04-09 华为技术有限公司 Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system
WO2021226776A1 (en) * 2020-05-11 2021-11-18 华为技术有限公司 Vehicle drivable area detection method, system, and automatic driving vehicle using system
CN114282597A (en) * 2020-05-11 2022-04-05 华为技术有限公司 Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system
CN111882620A (en) * 2020-06-19 2020-11-03 江苏大学 Road drivable area segmentation method based on multi-scale information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Semantic segmentation of drivable areas on unstructured roads based on SegNet; Zhang Kaihang; Ji Jie; Jiang Luo; Zhou Xianlin; Journal of Chongqing University; 2020-03-15 (03); full text *

Also Published As

Publication number Publication date
CN114674338A (en) 2022-06-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant