CN114674338B - Fine recommendation method for road drivable area based on layered input and output and double-attention jump - Google Patents

Fine recommendation method for road drivable area based on layered input and output and double-attention jump

Info

Publication number
CN114674338B
CN114674338B (application CN202210366807.1A)
Authority
CN
China
Prior art keywords
attention
output
model
road
layered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210366807.1A
Other languages
Chinese (zh)
Other versions
CN114674338A (en)
Inventor
王雪玮
梁晓
李韶华
冯桂珍
闫德立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang Tiedao University
Original Assignee
Shijiazhuang Tiedao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang Tiedao University filed Critical Shijiazhuang Tiedao University
Priority to CN202210366807.1A
Publication of CN114674338A
Application granted
Publication of CN114674338B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/3453 Special cost functions, i.e. other than distance or default speed limit of road segments
    • G01C21/3461 Preferred or disfavoured areas, e.g. dangerous zones, toll or emission zones, intersections, manoeuvre types, segments such as motorways, toll roads, ferries
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • G01C21/30 Map- or contour-matching
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/36 Input/output arrangements for on-board computers
    • G01C21/3602 Input other than that of destination using image analysis, e.g. detection of road signs, lanes, buildings, real preceding vehicles using a camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. On an encoder-decoder skeleton, the method constructs an inverted-pyramid multi-scale hierarchical input and hierarchical output structure to effectively fuse the morphological features and semantic information of the road, and builds a skip-connection structure integrating channel attention and spatial attention to accurately detect the different drivable areas. The result is a drivable-area recommendation method that combines a multi-scale interaction strategy with a dual attention mechanism under an M-shaped deep architecture. For complex roads with blurred boundaries and changeable conditions, it can finely partition the road, based on vision, into strongly recommended, weakly recommended, and non-recommended drivable areas in complex driving scenes, meeting the different drivable-area detection requirements of intelligent vehicles under normal, emergency, and other complex driving conditions. The proposed model balances segmentation accuracy and time efficiency and shows clear advantages in drivable-area detection on complex roads.

Description

Fine recommendation method for road drivable area based on layered input and output and double-attention jump
Technical Field
The invention relates to a method for recommending road drivable areas, and in particular to a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. It belongs to the technical fields of automatic driving and computer vision, and specifically concerns the detection of road drivable areas.
Background
In vision-based environment perception for automatic driving, an intelligent vehicle detects the road area available for driving by screening traffic-element information in the scene ahead, such as the road surface, vehicles, pedestrians, and positive and negative obstacles, providing important support for subsequent path planning and driving decisions. For structured roads with good pavement, clear lanes, and distinct boundaries, current automatic driving can already detect the drivable area effectively. However, roads with a lower degree of structure, such as non-arterial suburban roads and rural streets, suffer from blurred lanes and boundaries and highly random participant behavior, so drivable-area detection algorithms designed for structured roads struggle to capture the feature information of unstructured roads; the accuracy and real-time performance of the detection task degrade severely, and the task may even fail outright. More importantly, for safety reasons a human driver may, in an emergency, choose to drive over a flat area that is not normally considered part of the road. Given the particularly complex and changeable conditions of unstructured roads, intelligent vehicles need something closer to a human driver's ability to cope with such emergency conditions. Therefore, diversified segmentation and fine-grained recommendation of the drivable areas of complex roads, adapted to different driving conditions, is a key task for the driving safety of intelligent vehicles.
Current vision-based methods for detecting drivable areas on complex roads fall mainly into three classes: appearance-based, geometry-based, and semantic-segmentation-based. Appearance-based methods rely on a single appearance cue and are easily disturbed by illumination changes, road-surface occlusion, and similar factors. Many studies augment appearance description with road geometry, but when the geometric constraints of the scene are not satisfied or the three-dimensional data are of low quality, geometry-based methods degrade severely. The datasets used by existing semantic-segmentation methods are mostly collected in specific driving environments abroad and do not fully match the complex road conditions of China; moreover, these algorithms semantically segment every object in the scene, which is redundant and insufficiently focused, limiting the accuracy of the extracted drivable area. In addition, whether based on appearance, geometry, or semantic segmentation, most existing models extract only a single road region as the drivable area and cannot serve both normal and emergency driving conditions, making it difficult to adapt to the changeable conditions of complex roads. Hence, practical automatic-driving tasks currently need a drivable-area recommendation method that accounts for complex Chinese traffic scenes, extracts different road regions simultaneously, and covers both normal and emergency driving conditions.
Related patent literature: CN113223313A discloses a lane recommendation method and device and a vehicle-mounted communication device. The lane recommendation method comprises: acquiring, via vehicle-mounted communication technology, the lane information of the road on which a target vehicle is currently located; receiving, via vehicle-mounted communication technology, vehicle data of vehicles surrounding the target vehicle; determining the in-lane positions of the surrounding vehicles from the lane information and the vehicle data; determining the driving parameters of each candidate lane of the target vehicle from the vehicle data and the positions of the surrounding vehicles; and determining, from those driving parameters, the time the target vehicle needs to traverse a preset road section, and recommending lanes according to that traversal time. CN112857381A discloses a path recommendation method, device, and readable medium. The method identifies target objects with congestion characteristics in acquired images, determines the road-condition information of the driving road from the target objects and the current navigation data, generates alternative paths from that information, and recommends a path, so that navigation obtains more specific and accurate road-condition information in time and an incorrect path can be corrected promptly.
None of the above technologies offers concrete guidance on how a road-drivable-area recommendation method should handle, in actual driving tasks, the blurred aliasing of road-area boundaries and the complex, changeable driving conditions of a vehicle, or on how to improve the detection accuracy and time efficiency of the road drivable area.
Disclosure of Invention
In view of the shortcomings of, and the need to improve on, the prior art, the invention aims to provide a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. The method focuses on efficient and accurate feature extraction, effectively balances accuracy and real-time performance, solves the detection problems of blurred road-area boundaries and complex, changeable driving conditions in actual driving tasks, and improves the detection accuracy and time efficiency of the road drivable area.
In order to solve the technical problems, the invention adopts the following technical scheme:
A fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections (equivalently, a road-drivable-area recommendation method based on an M-shaped deep architecture), characterized by comprising the following steps:
Step (1): construct a labeled dataset, divide it into a training set, a validation set, and a test set, and preprocess the data;
further, in a preferred technical scheme, the labeled dataset in step (1) is constructed as follows:
Step (101): label, merge, and modify existing complex-road driving-scene images so that they fit the four-class drivable-area detection task of strongly recommended, weakly recommended, non-recommended, and background; this part of the samples is denoted IDD_unst;
Step (102): using an all-terrain intelligent experimental vehicle driving at constant speed, collect images of complex roads in closed/semi-closed campuses with an on-board camera and label the driving-scene images accordingly; this part of the samples is denoted Campus_unst;
Step (103): using the dashboard cameras of ordinary passenger vehicles, collect and label complex-road driving-scene images from suburban, rural, and other areas of China; this part of the samples is denoted China_unst.
Step (2): on the basis of the U-shaped encoder-decoder structure, construct an M-shaped encoder-decoder network, the M²AttentionNet model, by adding three major structures: multi-scale hierarchical input, dual-attention skip connections, and multi-scale hierarchical output.
Step (3): construct an inverted-pyramid hierarchical input structure at the input end of the model encoder, i.e. a multi-scale hierarchical input structure, which preserves shallow features at different scale levels and fuses shallow features with deep semantics layer by layer;
further, in a preferred technical scheme, the multi-scale hierarchical input structure at the encoder input is built as follows:
Step (31): apply successive max-pooling downsampling to the image I to be detected, generating an inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I} of decreasing scale;
Step (32): merge the images at the four scales hierarchically into the corresponding levels of the encoder branch; extract features through Conv, BN, and ReLU activation; fuse, by channel-dimension concatenation, the features generated at each level with the feature map produced by the previous level; and feed the result into the network encoder.
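The inverted-pyramid construction of steps (31) and (32) can be sketched as repeated 2×2 max-pool downsampling. The following is a minimal pure-Python illustration (a list-of-lists stands in for an image; a real implementation would operate on tensors in a deep-learning framework):

```python
# Hedged sketch: builds the inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I}
# of steps (31)-(32) by repeated 2x2 max-pool downsampling.
# Function names here are illustrative, not from the patent.

def max_pool_2x2(img):
    """Downsample a 2D grid (list of lists) by taking the max of each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]

def build_inverted_pyramid(img, levels=4):
    """Return [I, 1/2 I, 1/4 I, 1/8 I] (for levels=4), coarsest last."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = max_pool_2x2(img)
        pyramid.append(img)
    return pyramid
```

Each pyramid level is then concatenated, channel-wise, with the encoder features of the matching resolution, as described in step (32).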
Step (4): construct four levels in the M²AttentionNet encoder branch, performing two successive feature extractions at each level with a combination of 3×3 Conv, BN, and ReLU operations.
Step (5): keep the resolution unchanged within a level, and downsample between levels with 2×2 max pooling.
Step (6): in the decoder branch, each level likewise performs two successive feature extractions with Conv-BN-ReLU combinations of the same parameters, and 2×2 nearest-neighbor-interpolation upsampling is used between levels.
Step (7): at the final end of the decoder branch, perform four-way classification with 1×1 Conv, BN, and Softmax activation, generating a prediction at the same scale as the input image; the 4 categories correspond respectively to the strongly recommended, weakly recommended, and non-recommended drivable areas and the background area of the driving scene.
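The per-pixel four-way classification of step (7) reduces to a softmax over four class scores followed by an argmax. A minimal sketch (the class names and scores are illustrative; the patent applies this after 1×1 Conv and BN):

```python
# Hedged sketch of step (7): per-pixel softmax over the four class scores
# (strongly recommended, weakly recommended, non-recommended, background).
import math

CLASSES = ["strong", "weak", "not_recommended", "background"]

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def classify_pixel(scores):
    """Return the winning class label and the full probability vector."""
    probs = softmax(scores)
    return CLASSES[probs.index(max(probs))], probs
```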
Step (8): design an output structure with hierarchical prediction and hierarchical loss at the output end of the model;
further, in a preferred technical scheme, the specific steps of the hierarchical-prediction, hierarchical-loss output structure at the model output (i.e. the multi-scale hierarchical output constructed in the model decoder branch) are as follows:
Step (81): at each level of the decoder branch, output a corresponding drivable-area prediction map R_s (level index s = 1, 2, 3, 4) through an upsampling-and-convolution combination (comprising 1×1 Conv, BN, and Softmax activation), and merge the prediction maps of all levels into the final drivable-area prediction result;
Step (82): using one-hot encoding, fuse and compute the losses of all levels of the decoder branch, the level loss l_s of the s-th level being defined as a class-balanced focal loss, where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories (here N = 4); in the one-hot scheme, for class k, Y_k^+ and Y_k^- are the sets of pixels whose ground-truth labels are positive (1) and negative (0) respectively, x_k is the predicted value, γ is the focusing factor, and ω is the balance factor;
Step (83): compute the total loss function L of the model as the sum of the four decoder-level losses l_s, L = Σ_s l_s;
further, in a preferred technical scheme, the hierarchical-prediction and hierarchical-loss formulation at the model output takes γ = 2 and ω = 0.55, and the loss function is a focal loss.
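The level loss of step (82) can be sketched as follows, assuming the standard class-balanced focal-loss form implied by the symbols γ and ω above (γ = 2, ω = 0.55 per the text); the exact patented formula may differ in detail:

```python
# Hedged sketch of the hierarchical loss of steps (81)-(83). pred holds the
# predicted probabilities x_k, onehot the one-hot ground truth for each class.
# The focal-loss form below is an assumption consistent with the symbols in
# the text, not a verbatim copy of the patented formula.
import math

def focal_loss(pred, onehot, gamma=2.0, omega=0.55, eps=1e-7):
    """Per-pixel focal loss summed over the N label classes."""
    loss = 0.0
    for x, y in zip(pred, onehot):
        x = min(max(x, eps), 1.0 - eps)   # clamp to avoid log(0)
        if y == 1:                         # pixel in the positive set Y_k^+
            loss += -omega * (1.0 - x) ** gamma * math.log(x)
        else:                              # pixel in the negative set Y_k^-
            loss += -(1.0 - omega) * x ** gamma * math.log(1.0 - x)
    return loss

def total_loss(level_losses):
    """Step (83): the total loss L is the sum of the four decoder-level losses."""
    return sum(level_losses)
```

The focusing term (1 - x)^γ down-weights already well-classified pixels, which is why confident correct predictions contribute almost nothing to the loss.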
Step (9): design the skip connections between encoder and decoder as a dual-attention skip-connection structure; the specific steps (of the preferred technical scheme) are as follows:
Step (91): integrate the dual mechanism of channel attention and spatial attention into the level-wise skip process;
Step (92): pass the feature map F_{w×h×c} obtained at each encoder level through the channel-attention module and then the spatial-attention module for fine adjustment;
Step (93): concatenate, along the channel dimension, the feature map adjusted by the dual-attention mechanism with the upsampled feature map of the corresponding decoder level, obtaining the final output feature map F′_{w×h×c}.
Step (10): train the M²AttentionNet model with the training set to obtain a model with well-trained parameters; evaluate the trained model on the test set to obtain the road drivable areas in complex traffic scenes;
further, in a preferred technical scheme, the specific parameters during model training in step (10) are set as follows: all convolution-layer parameters are initialized with the Glorot initializer built into Keras and their biases are initialized to 0, and all parameters are updated and optimized by stochastic gradient descent; the batch size is set to 64, the initial learning rate to 1e-4, and the momentum to 0.9, with the learning rate decayed by 1e-6 at each iteration. To prevent overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early-stopping strategy halts training when the validation-set error has not decreased for 20 iteration cycles. Ten-fold cross-validation is adopted during training, and the samples are augmented by horizontal flipping, brightness adjustment, and random-noise preprocessing.
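The training setup of step (10) can be summarized in a configuration sketch. The hyperparameter values come from the text; the Keras-style time-based decay rule below (lr = lr0 / (1 + decay * iteration)) is an assumption about how the per-iteration reduction of 1e-6 is applied:

```python
# Hedged sketch of the step (10) training hyperparameters. TRAIN_CONFIG and
# learning_rate are illustrative names; the decay rule is an assumed
# Keras-style time-based schedule, not confirmed by the patent text.

TRAIN_CONFIG = {
    "batch_size": 64,
    "optimizer": "sgd",          # stochastic gradient descent
    "initial_lr": 1e-4,
    "momentum": 0.9,
    "decay": 1e-6,               # learning-rate reduction per iteration
    "dropout_in": 0.1,           # input-layer dropout rate
    "dropout_out": 0.4,          # output-layer dropout rate
    "early_stop_patience": 20,   # stop if validation error flat for 20 cycles
}

def learning_rate(iteration, cfg=TRAIN_CONFIG):
    """Keras-style time-based decay: lr = lr0 / (1 + decay * iteration)."""
    return cfg["initial_lr"] / (1.0 + cfg["decay"] * iteration)
```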
Step (11): acquire real-time traffic-scene data during actual driving and feed them into the trained M²AttentionNet model to obtain the recommendation results for the different drivable areas.
The invention discloses a drivable-area recommendation method that fuses a multi-scale interaction strategy and a dual attention mechanism under an M-shaped deep architecture. For complex roads with blurred boundaries and changeable conditions, it can finely partition the road, based on vision, into strongly recommended, weakly recommended, and non-recommended drivable areas in complex driving scenes, meeting the different drivable-area detection requirements of intelligent vehicles under normal, emergency, and other complex driving conditions. First, an inverted-pyramid multi-scale hierarchical input and hierarchical output structure is constructed on the encoder-decoder skeleton to effectively fuse the morphological features and semantic information of the road; second, a skip-connection structure integrating channel attention and spatial attention is constructed to accurately detect the different drivable areas. The method achieves fine segmentation of the strongly recommended, weakly recommended, and non-recommended drivable areas and the background area in a variety of real driving scenes. Compared with other existing mainstream models, the proposed model balances segmentation accuracy and time efficiency and shows clear advantages in drivable-area detection on complex roads.
In general, compared with the prior art, the technical scheme designed by the invention has the following technical characteristics and beneficial effects:
(1) The road-drivable-area segmentation model M²AttentionNet, which fuses a multi-scale interaction strategy and a dual attention mechanism, can accurately segment a real driving-scene image of a road into strongly recommended, weakly recommended, and non-recommended drivable areas and a background area; it can cope with special driving conditions such as meeting oncoming traffic on narrow roads and emergency avoidance, and effectively adapts to the changeable conditions of different roads.
(2) On the encoder-decoder skeleton, the invention designs three structures (multi-scale hierarchical input, dual-attention skip connections, and multi-scale hierarchical output) and constructs an M-shaped deep convolutional neural network architecture, effectively fusing shallow features with deep semantics, balancing the model's prediction bias across scales, and focusing the learning process on the important features related to road drivability, which effectively improves model performance. The invention balances accuracy and real-time performance and achieves good fine-grained detection in different real scenes, with a mean intersection-over-union of 92.46% and an average detection speed of 22.7 frames per second; it effectively completes the fine-grained drivable-area detection task on complex roads and generalizes well.
In summary, the invention provides a fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections. Using a convolutional neural network and a dual attention mechanism, it focuses on efficient and accurate feature extraction, effectively balances accuracy and real-time performance, solves the detection problems of blurred road-area boundaries and complex, changeable driving conditions in actual driving tasks, and improves the detection accuracy and time efficiency of the road drivable area.
Drawings
Fig. 1 is a schematic diagram of the M-shaped architecture of the M²AttentionNet model according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of fine-grained recommendation of road drivable areas according to an embodiment of the present invention; fig. 2(a) shows driving scene 1, and fig. 2(b) shows the fine-grained recommendation of the road drivable areas.
Fig. 3 is a schematic diagram of the dual-attention skip-connection module according to an embodiment of the present invention.
Fig. 4 compares the detection results of the method according to the embodiment of the present invention with manual results: fig. 4(c) is the input image (driving scene 2), (d) is the manual extraction result, and (e) is the extraction result of the method of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
As shown in fig. 1, the schematic diagram of the M-shaped architecture of the M²AttentionNet model provided by an embodiment of the present invention, and with the target to be achieved shown in fig. 2, the fine-grained recommendation method for road drivable areas based on hierarchical input/output and dual-attention skip connections specifically comprises the following steps:
Step (1): construct a labeled dataset, divide it into a training set, a validation set, and a test set, and preprocess the data;
further, the labeled dataset in step (1) is constructed as follows:
Step (101): label, merge, and modify existing road driving-scene images so that they fit the four-class drivable-area detection task of strongly recommended, weakly recommended, non-recommended, and background; this part of the samples is denoted IDD_unst;
Step (102): using an all-terrain intelligent experimental vehicle driving at constant speed, collect images of roads in closed/semi-closed campuses with an on-board camera and label the driving-scene images accordingly; this part of the samples is denoted Campus_unst;
Step (103): using the dashboard cameras of ordinary passenger vehicles, collect and label complex-road driving-scene images from suburban, rural, and other areas of China; this part of the samples is denoted China_unst.
Step (2): on the basis of the U-shaped encoder-decoder structure, construct an M-shaped encoder-decoder network, the M²AttentionNet model, by adding three major structures: multi-scale hierarchical input, dual-attention skip connections, and multi-scale hierarchical output.
Step (3): construct an inverted-pyramid hierarchical input structure at the input end of the model encoder, i.e. a multi-scale hierarchical input structure, which preserves shallow features at different scale levels and fuses shallow features with deep semantics layer by layer. The multi-scale hierarchical input structure at the encoder input is built as follows:
Step (31): apply successive max-pooling downsampling to the image I to be detected, generating an inverted image pyramid {I, 1/2 I, 1/4 I, 1/8 I} of decreasing scale;
Step (32): merge the images at the four scales hierarchically into the corresponding levels of the encoder branch; extract features through Conv, BN, and ReLU activation; fuse, by channel-dimension concatenation, the features generated at each level with the feature map produced by the previous level; and feed the result into the network encoder.
Step (4): construct four levels in the M²AttentionNet encoder branch, performing two successive feature extractions at each level with a combination of 3×3 Conv, BN, and ReLU operations.
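The four-level layout of steps (4) and (5) can be traced in terms of spatial resolution: the two Conv-BN-ReLU extractions leave the resolution unchanged within a level (assuming padded 3×3 convolutions), and each 2×2 max pooling between levels halves it. A minimal sketch (function name illustrative):

```python
# Hedged sketch of steps (4)-(5): trace the spatial size seen by each of the
# four encoder levels, assuming padded convolutions keep resolution constant
# within a level and 2x2 max pooling halves it between levels.

def encoder_resolutions(h, w, levels=4):
    """Return the (height, width) processed by each encoder level, top-down."""
    sizes = []
    for _ in range(levels):
        sizes.append((h, w))
        h, w = h // 2, w // 2   # 2x2 max pooling between levels
    return sizes
```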
Step (5): keep the resolution unchanged within a level, and downsample between levels with 2×2 max pooling.
Step (6): in the decoder branch, each level likewise performs two successive feature extractions with Conv-BN-ReLU combinations of the same parameters, and 2×2 nearest-neighbor-interpolation upsampling is used between levels.
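The 2×2 nearest-neighbor upsampling of step (6) simply replicates each feature-map element into a 2×2 block, doubling height and width. A pure-Python illustration (a real decoder would do this on tensors):

```python
# Hedged sketch of the decoder's 2x2 nearest-neighbour upsampling (step (6)):
# each element of the feature map is copied into a 2x2 block.

def upsample_nearest_2x2(fmap):
    """Nearest-neighbour upsample a 2D grid (list of lists) by a factor of 2."""
    out = []
    for row in fmap:
        doubled = [v for v in row for _ in range(2)]  # repeat along width
        out.append(doubled)
        out.append(list(doubled))                     # repeat along height
    return out
```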
Step (7): at the final end of the decoder branch, perform four-way classification with 1×1 Conv, BN, and Softmax activation, generating a prediction at the same scale as the input image; the 4 categories correspond respectively to the strongly recommended, weakly recommended, and non-recommended drivable areas and the background area of the driving scene.
Step (8): the output structure of the hierarchical prediction and the hierarchical loss is designed at the output end of the model, and the specific steps of the output structure (or the specific steps of the multi-scale hierarchical output constructed by the model decoder branch) are as follows:
step (81): outputting a corresponding travelable region prediction map R s (layer sequence s=1, 2,3, 4) at each layer of the decoder branch by up-sampling and convolution combination (including 1×1Conv, BN and Softmax activation), and merging the prediction maps of all layers into a final travelable region prediction result;
Step (82): by single-hot encoding, the loss of all levels of the decoder branches is fused and calculated, and the level loss l s of the s-th layer is defined as:
where I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories, here N = 4; in one-hot form, for class k, Y_k+ and Y_k− are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant focusing factor, and ω is a balance factor. Here γ = 2 and ω = 0.55 are taken as example values, and the loss function is a focal loss.
Step (83): the total loss function L of the calculation model is the sum of the four decoder level losses L s, l= Σl s.
Step (9): the jump connection part in the middle of the encoder-decoder is designed into a double-attention jump connection structure, as shown in fig. 3, and the specific steps are as follows:
Step (91): integrate the dual mechanism of channel attention and spatial attention into the hierarchical skip process;
Step (92): refine the feature map F (of size w×h×c) obtained at each encoder level by passing it sequentially through a channel attention module and a spatial attention module;
Step (93): and performing channel dimension splicing on the feature map adjusted by the double-attention mechanism and the up-sampling feature map of the corresponding layer of the decoder to obtain a final output feature map F' w×h×c.
Step (10): training the M 2 AttentionNet model by using a training set to obtain a model with well trained parameters; and detecting the trained model by using the test set to obtain the road drivable area in the complex traffic scene. Step (10), when the model is trained, specific parameters are set as follows: in the training process, using a Glorot tool built in Keras to initialize all the parameters of the convolution layer, initializing the deviation of the parameters to 0, and updating and optimizing all the parameters by using a random gradient descent method; the Batchsize parameter is set to 64, the initial learning rate is 1e-4, the momentum is 0.9, and the increment is reduced by 1e-6 once each iteration; to prevent model overfitting, the input layer uses dropout at a rate of 0.1, the output layer uses dropout at a rate of 0.4, and an early stop strategy is employed to stop training in advance when it is detected that the validation set error is no longer decreasing for 20 iteration cycles. And (10) adopting a ten-fold cross-validation method when the model is trained, and amplifying the sample by using a horizontal overturning, brightness adjusting and random noise preprocessing method.
Step (11): and acquiring real-time traffic scene data in actual driving, and inputting the real-time traffic scene data into a trained M 2 AttentionNet model to obtain recommended results of different driving areas.
As shown in Fig. 4, which is a schematic diagram comparing the detection results of the method of the present invention with manual detection results.
Further, the method of the present invention was subjected to more extensive detection and extraction experiments on the public dataset IDD and the constructed dataset URDD, covering both structured and unstructured roads, and was quantitatively compared under identical conditions with 9 representative methods published between 2015 and 2021 and well known in the art (all prior techniques): the FCN, UNet, SegNet, PSPNet, DeepLabV3+, DANet, modified DeepLabV3+, Hierarchical Attention, and HR-Net models. The comparison uses 2 pixel-level evaluation indices, the per-class intersection-over-union (IoU) and the overall mean intersection-over-union (mIoU), defined in Table 1. IoU is the overlap ratio between a model's detected region of a given class (R_k) and its ground-truth region (R_k_opt), i.e., the ratio of their intersection to their union; mIoU is the average IoU over all classes. The higher the IoU and mIoU values, the stronger the model's segmentation performance.
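The two indices can be computed directly from predicted and ground-truth label maps (a straightforward sketch; classes absent from both maps are skipped in the average):

```python
import numpy as np

def iou(pred, gt, k):
    """IoU of class k: |R_k ∩ R_k_opt| / |R_k ∪ R_k_opt| over integer label maps."""
    p, g = pred == k, gt == k
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")

def miou(pred, gt, num_classes=4):
    """Mean IoU over all classes present in either map."""
    scores = [iou(pred, gt, k) for k in range(num_classes)]
    scores = [s for s in scores if not np.isnan(s)]
    return sum(scores) / len(scores)
```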
Table 1 Algorithm Performance evaluation index
Table 2 shows the accuracy and efficiency achieved by the different models on the URDD dataset. All methods with available source code were run on the same workstation (NVIDIA GTX 3090 GPU) as the method of the present invention. The method of the present invention achieves an mIoU score of 92.46%, the best among comparable algorithms. Meanwhile, thanks to the lightweight design of the multi-scale layered input, double-attention jump connections, and multi-scale layered output, the method still processes 22.7 frames per second despite employing multi-scale interaction and dual attention, an efficiency that meets real-time requirements.
TABLE 2
Model Image size mIoU Speed (frames/s)
FCN 640×360 67.76% 5.8
UNet 640×360 78.23% 37.1
SegNet 640×360 68.34% 15.2
PSPNet 640×360 85.40% 3.4
DeepLabV3+ 640×360 85.90% 2.4
DANet 640×360 84.58% 8.1
modified DeepLabV3+ 512×512 86.75% 12.6
Hierarchical Attention 640×360 88.19% 15.3
HR-Net 640×360 86.56% 16.2
The method of the invention 640×360 92.46% 22.7
Furthermore, to verify the generalization performance of the method across diverse driving scenes, segmentation experiments without any retraining were carried out, using the model already trained on the URDD dataset, on newly collected dash-cam data (covering both unstructured and structured road scenes) and on the semantic segmentation set of the public KITTI dataset. The method effectively recommends drivable areas for structured and unstructured road scenes alike, and its overall mIoU score on real-vehicle sample data collected across the different scenes averages 83.94%, indicating good generalization performance.
Table 3
The experimental results demonstrate that the method achieves high detection accuracy, strong generalization performance, and high time efficiency, effectively solving the difficult problem of drivable-area detection across different road scenes.
In summary, the invention provides a fine recommendation method for road drivable areas based on layered input and output and double-attention jump connections. It uses a convolutional neural network with a dual-attention mechanism to achieve efficient and accurate feature extraction, addressing the detection problems of blurred, aliased road-area boundaries and the complex, variable driving conditions encountered in actual driving tasks, and improving both the detection accuracy and the time efficiency of road drivable-area detection. The invention balances accuracy and real-time performance, achieving good fine-detection results in different real scenes, with a mean intersection-over-union of 92.46% and an average detection speed of 22.7 frames per second; it effectively accomplishes the fine detection of drivable areas on complex roads and exhibits good generalization performance.

Claims (7)

1. A road drivable area fine recommendation method based on layered input and output and double-attention jump connection is characterized by comprising the following steps:
step (1): constructing a data set with a label, dividing the data set into a training set, a verification set and a test set, and preprocessing the data set;
Step (2): based on the U-shaped encoder-decoder structure, constructing an M-shaped encoder-decoder network, namely the M²AttentionNet model, by adding three structures of multi-scale layered input, double-attention jump connection, and multi-scale layered output;
Step (3): constructing an inverted pyramid type layered input structure at the input end of the model encoder, namely constructing a multi-scale layered input structure, reserving shallow features on different scale levels by the multi-scale layered input structure, and fusing the shallow features and deep semantics layer by layer;
Step (4): constructing four levels in the M²AttentionNet encoder branch, and performing two consecutive feature extractions at each level using a combination of 3×3 Conv, BN, and ReLU operations;
Step (5): keeping the resolution unchanged within each level, and downsampling between levels with 2×2 max-pooling;
Step (6): for the decoder branch, performing two consecutive feature extractions at each level using Conv-BN-ReLU combinations of the same parameters, with 2×2 nearest-neighbor-interpolation upsampling between levels;
Step (7): at the final end of the decoder branch, performing four-way classification using 1×1 Conv, BN, and Softmax activation, generating a prediction of the same scale as the input image, wherein the 4 categories respectively correspond to a strongly recommended drivable area, a weakly recommended drivable area, a non-recommended drivable area, and a background area in the driving scene;
step (8): designing an output structure of layered prediction and layered loss at the output end of the model;
Step (9): designing a double-attention jumper structure at a jumper connection part in the middle of an encoder-decoder;
Step (10): training the M²AttentionNet model with the training set to obtain a model with well-trained parameters; evaluating the trained model on the test set to obtain the road drivable area in the traffic scene;
Step (11): acquiring real-time traffic-scene data during actual driving and feeding it into the trained M²AttentionNet model to obtain recommendation results for the different drivable areas.
2. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to claim 1, wherein the multi-scale layered input structure constructed at the input end of the model encoder in step (3) comprises the following specific steps:
Step (31): performing continuous maximum pooling downsampling on the image I to be detected to generate an image inverted pyramid { I,1/2I,1/4I,1/8I } with a decreasing scale;
step (32): and layering and merging the images with four scales into corresponding levels of the encoder branch, activating and extracting features through Conv, BN and ReLU, merging the features generated by the previous level with the feature map generated by the previous level in a channel dimension splicing mode, and inputting the feature map into a network encoder.
3. The method for finely recommending a road drivable area based on layered input/output and double-attention jump according to claim 1 or 2, wherein the specific steps of designing an output structure of layered prediction and layered loss at the model output end in the step (8) are as follows:
Step (81): each layer of the decoder branch is activated by up-sampling and convolution combination comprising 1×1Conv, BN and Softmax, a corresponding drivable region prediction map R s is output, the layer sequences s=1, 2,3,4, and the prediction maps of all layers are combined into a final drivable region prediction result;
Step (82): by single-hot encoding, the loss of all levels of the decoder branches is fused and calculated, and the level loss l s of the s-th layer is defined as:
wherein I is the input image, R_opt is the ground truth, θ denotes the network parameters, and N is the number of label categories, here N = 4; in one-hot form, for class k, Y_k+ and Y_k− are the sets of pixels labeled positive (1) and negative (0) in the ground truth, x_k is the predicted value, γ is a constant focusing factor, and ω is a balance factor;
Step (83): computing the model's total loss function L as the sum of the four decoder-level losses l_s, L = Σ l_s.
4. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to claim 3, wherein in the output structure of layered prediction and layered loss designed at the model output end in step (8), γ = 2, ω = 0.55, and the loss function is a focal loss.
5. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein the double-attention jump connection structure designed in step (9) comprises the following specific steps:
Step (91): integrating the dual mechanism of channel attention and spatial attention into the hierarchical skip process;
Step (92): refining the feature map F (of size w×h×c) obtained at each encoder level by passing it sequentially through a channel attention module and a spatial attention module;
Step (93): and performing channel dimension splicing on the feature map adjusted by the double-attention mechanism and the up-sampling feature map of the corresponding layer of the decoder to obtain a final output feature map F' w×h×c.
6. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein when training the model in step (10), the specific parameters are set as follows: during training, all convolution-layer weights are initialized with the Glorot initializer built into Keras, biases are initialized to 0, and all parameters are updated and optimized by stochastic gradient descent; the batch size is set to 64, the initial learning rate is 1e-4, the momentum is 0.9, and the learning rate is decayed by 1e-6 at each iteration; to prevent model overfitting, dropout is applied at a rate of 0.1 at the input layer and 0.4 at the output layer, and an early-stopping strategy halts training when the validation-set error has not decreased for 20 consecutive epochs.
7. The method for fine recommendation of a road drivable area based on layered input and output and double-attention jump connection according to any one of claims 1, 2 and 4, wherein step (10) adopts a ten-fold cross-validation method when training the model, and augments the samples with horizontal flipping, brightness adjustment, and random-noise preprocessing.
CN202210366807.1A 2022-04-08 2022-04-08 Fine recommendation method for road drivable area based on layered input and output and double-attention jump Active CN114674338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210366807.1A CN114674338B (en) 2022-04-08 2022-04-08 Fine recommendation method for road drivable area based on layered input and output and double-attention jump


Publications (2)

Publication Number Publication Date
CN114674338A CN114674338A (en) 2022-06-28
CN114674338B true CN114674338B (en) 2024-05-07

Family

ID=82077498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210366807.1A Active CN114674338B (en) 2022-04-08 2022-04-08 Fine recommendation method for road drivable area based on layered input and output and double-attention jump

Country Status (1)

Country Link
CN (1) CN114674338B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102421855B1 (en) * 2017-09-28 2022-07-18 삼성전자주식회사 Method and apparatus of identifying driving lane

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345875A (en) * 2018-04-08 2018-07-31 北京初速度科技有限公司 Wheeled region detection model training method, detection method and device
CN108985194A (en) * 2018-06-29 2018-12-11 华南理工大学 A kind of intelligent vehicle based on image, semantic segmentation can travel the recognition methods in region
FR3092546A1 (en) * 2019-02-13 2020-08-14 Safran Identification of rolling areas taking into account uncertainty by a deep learning method
CN112639821A (en) * 2020-05-11 2021-04-09 华为技术有限公司 Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system
WO2021226776A1 (en) * 2020-05-11 2021-11-18 华为技术有限公司 Vehicle drivable area detection method, system, and automatic driving vehicle using system
CN114282597A (en) * 2020-05-11 2022-04-05 华为技术有限公司 Method and system for detecting vehicle travelable area and automatic driving vehicle adopting system
CN111882620A (en) * 2020-06-19 2020-11-03 江苏大学 Road drivable area segmentation method based on multi-scale information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Semantic segmentation of drivable areas on unstructured roads based on SegNet; Zhang Kaihang; Ji Jie; Jiang Luo; Zhou Xianlin; Journal of Chongqing University; 2020-03-15 (03); full text *

Also Published As

Publication number Publication date
CN114674338A (en) 2022-06-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant