CN108876805B - End-to-end unsupervised scene passable area cognition and understanding method - Google Patents


Info

Publication number
CN108876805B
Authority
CN
China
Prior art keywords
network
probability distribution
convolution
prior probability
passable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810636311.5A
Other languages
Chinese (zh)
Other versions
CN108876805A (en)
Inventor
赵祥模
刘占文
樊星
高涛
董鸣
沈超
王润民
连心雨
徐江
张凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Heavy Duty Automobile Co Ltd
Original Assignee
Changan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changan University filed Critical Changan University
Priority to CN201810636311.5A priority Critical patent/CN108876805B/en
Publication of CN108876805A publication Critical patent/CN108876805A/en
Application granted granted Critical
Publication of CN108876805B publication Critical patent/CN108876805B/en
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/13: Edge detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an end-to-end unsupervised scene passable area determination method, which comprises constructing a road position prior probability distribution map and attaching it directly to a convolutional layer as a feature mapping of the detection network, thereby building a convolutional network framework that fuses position prior features; a deep network architecture, the UC-FCN network, is then constructed by combining a fully convolutional network with U-Net, and the constructed passable area position prior probability distribution map is attached to it as a feature mapping to generate the UC-FCN-L network. The passable area is detected by a vanishing point detection method, and the obtained detection result is used as the truth value of the training data set to train the UC-FCN-L network, yielding a deep network model for passable area extraction.

Description

End-to-end unsupervised scene passable area cognition and understanding method
Technical Field
The invention belongs to the technical field of traffic control, and particularly relates to an end-to-end self-supervised method, based on a video data set, for recognizing and understanding the passable area of a scene.
Background
With the development of society, automobiles have become irreplaceable vehicles in daily human life. However, the safety problems they cause have become increasingly prominent. The global road safety status report indicates that traffic accidents cause up to 1.24 million deaths each year, with driver negligence and fatigued driving as the main causes of accidents, so developing intelligent automobile technology is particularly important for alleviating this situation. A structured road surface generally has road edge lines and a uniform surface structure, such as urban arterial roads, expressways, national roads and provincial roads; a semi-structured road surface refers to a general non-standardized road surface whose surface layer varies considerably in color and material, such as parking lots, squares and some branch roads; an unstructured road surface has no structural layer, as in natural road scenes. At present, intelligent automobiles mainly combine radar and cameras to recognize and understand drivable areas; however, radar (lidar, millimeter-wave radar and ultrasonic radar) is generally costly, power-hungry and prone to mutual interference.
Vision-based methods for recognizing and understanding the drivable area mainly obtain basic structural features of the road surface from road surface color, road models, road texture features and the like; from these features they further derive latent information such as vanishing points, road edge lines and the basic direction of the road (straight ahead, left turn, right turn, sharp left turn, sharp right turn); finally the drivable area is extracted from these features using traditional segmentation and extraction methods.
Disclosure of Invention
The invention aims to provide a method for recognizing and understanding a passable area of an end-to-end unsupervised scene, so as to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
an end-to-end unsupervised scene road surface area determination method comprises the following steps:
step 1), constructing a road position prior probability distribution map and attaching it directly to a convolutional layer as a feature mapping of the detection network, thereby constructing a passable area position prior probability distribution map whose position prior information can be applied flexibly in an actual road traffic environment;
step 2), constructing a deep network architecture (UC-FCN) by combining a fully convolutional network with U-Net, to serve as the main network model for detection;
step 3), using the constructed passable area position prior probability distribution map as a feature mapping of the UC-FCN network, determining the optimal attachment position, and attaching the map directly to the full convolution layer at that position to generate the UC-FCN-L network;
and 4), detecting the passable area by a vanishing point detection method, and training the UC-FCN-L network with the obtained detection result as the truth value of the training data set, to obtain a deep network model for passable area extraction.
Further, in step 1), a passable area position prior probability distribution map is constructed statistically, using the distribution regularity of road areas in space and in images.
Further, in step 1), based on the live-action maps and truth maps of urban roads with lanes and urban roads without lanes in the KITTI data set, the passable areas are counted to obtain passable area position prior probability distribution maps under the two road conditions, and the two maps are fused to obtain the final passable area position prior probability distribution map.
Further, based on the live-action maps and truth maps of the KITTI data set, the passable area is counted: the number of times each coordinate position is judged to be passable is tallied and averaged, giving a passable area position prior probability distribution map for each of the two road conditions. In such a probability map, the brightness of each pixel represents the probability that the pixel belongs to the target: the higher the brightness, the greater the probability; conversely, the lower the brightness, the smaller the probability. The passable region can be separated from the scene through this probability image, and the two prior probability distribution maps are fused to obtain the final passable area position prior probability distribution map.
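The statistics step above can be sketched as follows. This is a minimal illustration assuming the truth maps are available as binary masks (1 = passable); the variable names, the synthetic demo data, and the plain averaging used to fuse the two priors are assumptions, since the patent does not name the fusion operator.

```python
import numpy as np

def build_prior_map(masks):
    """Per-pixel prior: the fraction of truth maps in which each coordinate
    position was judged passable (brightness = probability of target)."""
    stack = np.stack([m.astype(np.float32) for m in masks], axis=0)
    return stack.mean(axis=0)  # values in [0, 1]

def fuse_priors(prior_a, prior_b):
    """Fuse the two road-condition priors; plain averaging is an assumption."""
    return 0.5 * (prior_a + prior_b)

# Tiny synthetic demo standing in for the KITTI truth maps.
masks_with_lanes = [np.random.rand(375, 1242) > 0.5 for _ in range(3)]
masks_without_lanes = [np.random.rand(375, 1242) > 0.5 for _ in range(3)]
prior = fuse_priors(build_prior_map(masks_with_lanes),
                    build_prior_map(masks_without_lanes))
```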
Further, in step 2), the UC-FCN network includes a contraction structure and an expansion structure. The contraction structure performs convolution and pooling operations that gradually reduce the spatial dimensions, so the resulting images become smaller and smaller and the resolution lower and lower. The expansion structure then replaces the pooling operation that follows each convolution layer in the contraction structure with an up-sampling operation, and connects the high-resolution features generated in the contraction structure to the convolved results of the expansion structure, increasing the output resolution and gradually restoring the detail and spatial dimensions of the object.
Furthermore, the expansion structure adopts a repeated up-sampling-and-convolution architecture. The up-sampling step is an up-sampling plus ReLU activation structure: the input is up-sampled by a factor of 2 using bilinear interpolation, and ReLU is then applied to alleviate the vanishing gradient problem. After up-sampling, a convolution operation changes the number of channels of the feature map, with 3 x 3 convolution kernels in the convolution layer; the convolved result is fused with the feature map of the corresponding step in the contraction structure, and finally a high-precision identification result is obtained through a softmax layer.
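A minimal PyTorch sketch of one such expansion repetition follows. The channel counts in the usage example and the use of concatenation for the "fusion" step are assumptions; the patent does not spell out the fusion operator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """One expansion repetition: 2x bilinear up-sampling plus ReLU, then a
    3x3 convolution that changes the channel count, with the result fused
    (here by concatenation, an assumption) with the feature map of the
    corresponding contraction step."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = F.relu(x)               # ReLU applied after up-sampling
        x = self.conv(x)            # 3x3 conv changes the number of channels
        return torch.cat([x, skip], dim=1)  # fuse with contraction feature map

# e.g. a 15x15 map up-sampled to 30x30 and fused with a 30x30 skip feature
up = UpBlock(in_ch=1024, out_ch=512)
out = up(torch.randn(1, 1024, 15, 15), torch.randn(1, 512, 30, 30))
```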
Further, in step 3), the passable area position prior probability distribution map is proportionally resized so that it matches the size of the last feature map to which it is connected, and the resized map is added as a feature mapping of the UC-FCN network at the corresponding position to generate the UC-FCN-L network.
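As a sketch of this attachment step, assuming that "adding" the prior map means appending it as an extra feature-map channel (the function name and the concatenation along the channel axis are assumptions):

```python
import torch
import torch.nn.functional as F

def attach_prior(features, prior_map):
    """features: (N, C, H, W) maps at the chosen attachment point;
    prior_map: (H0, W0) passable-area prior. The prior is resized to the
    feature map's spatial size and appended as one more channel."""
    n, _, h, w = features.shape
    prior = prior_map.reshape(1, 1, *prior_map.shape)
    prior = F.interpolate(prior, size=(h, w), mode="bilinear", align_corners=False)
    return torch.cat([features, prior.expand(n, -1, -1, -1)], dim=1)

# e.g. attaching the prior after a 33 x 33 feature map
fused = attach_prior(torch.randn(2, 1024, 33, 33), torch.rand(375, 1242))
```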
Furthermore, passable area detection is performed on the collected training images by the vanishing point method, and the detection result is used as the truth value GT of the training data. During network training, the network parameters are improved continuously to reduce the difference between the network's detection result and the vanishing-point-based detection result, finally yielding a network architecture usable for passable area detection.
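A hedged sketch of this training scheme follows; the per-pixel cross-entropy loss and the SGD settings are assumptions, since the patent only states that the parameter updates reduce the difference between the network output and the vanishing-point result.

```python
import torch
import torch.nn as nn

def train_ucfcnl(model, loader, epochs=10, lr=1e-3):
    """Train against pseudo ground truth produced by vanishing-point
    detection (loss function and optimizer are assumptions)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, vp_pseudo_gt in loader:   # pseudo GT: (N, H, W) class ids
            logits = model(images)            # (N, num_classes, H, W) scores
            loss = loss_fn(logits, vp_pseudo_gt)
            opt.zero_grad()
            loss.backward()
            opt.step()
```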
Further, in the step 4), the UC-FCN-L network is trained in an unsupervised mode, and a deep network model for extracting the travelable area is obtained.
Further, in the unsupervised training mode, the samples are divided into labeled samples and unlabeled samples. The labeled samples form the training sample set D_l = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, whose l class labels are known; the unlabeled samples form the training sample set D_u = {x_{l+1}, x_{l+2}, ..., x_{l+u}}, whose u class labels are unknown, with u much larger than l. Training a model on the labeled samples D_l while leaving the information contained in the unlabeled samples D_u unused is called supervised learning; when the labeled samples D_l are lacking, model learning must instead be realized from the unlabeled samples D_u.
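Restated in standard set notation, the two sample sets are:

```latex
D_l = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}, \qquad
D_u = \{\, x_{l+1}, x_{l+2}, \ldots, x_{l+u} \,\}, \qquad u \gg l
```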
Compared with the prior art, the invention has the following beneficial technical effects:
the invention relates to a method for determining an end-to-end unsupervised scene pavement area, which comprises the steps of constructing a road position prior probability distribution map and directly attaching the road position prior probability distribution map as feature mapping of a detection network to a convolutional layer, thereby constructing a passable area position prior probability distribution map in which position prior information can be flexibly applied in an actual road traffic environment, constructing a convolutional network frame fused with position prior features, and then constructing a deep network architecture-UC-FCN network by combining a full convolutional network and a U-NET to serve as a main network model for realizing detection; the constructed passable area position prior probability distribution map is used as a feature map mapping of a deep network architecture-UC-FCN network to obtain an optimal additional position, and the optimal additional position is directly added to the optimal additional position of the full convolution layer to generate the UC-FCN-L network; the passable area is detected based on the vanishing point detection method, the obtained detection result is used as a true value of a training data set to train the UC-FCN-L network, a deep network model for extracting the travelable area is obtained, a self-supervision learning mode is adopted, the problem that the passable area is difficult to label is solved, the applicability is strong, the passable area can stably work under various road environments, the real-time performance is good, the method can be widely applied to intelligent automobiles and auxiliary driving systems, and compared with the existing travelable area cognition and understanding method, the method is high in detection accuracy, good in adaptability, real-time performance and robustness, and simple and effective.
Furthermore, based on the live-action maps and truth maps of the KITTI data set, the passable areas are counted to obtain passable area position prior probability distribution maps under the two road conditions, and the two maps are fused into the final passable area position prior probability distribution map. This addresses the false detections between foreground and background with similar appearance features that arise because conventional convolutional neural networks are insensitive to position priors.
Further, the contraction structure performs convolution and pooling operations that gradually reduce the spatial dimensions, so the resulting images become smaller and smaller and the resolution lower and lower; the expansion structure then replaces the pooling operation after each convolution layer in the contraction structure with an up-sampling operation, and connects the high-resolution features generated in the contraction structure to the convolved results of the expansion structure, increasing the output resolution and gradually repairing the detail and spatial dimensions of the object. This improves detection speed while achieving high detection precision.
Drawings
FIG. 1 is an overall framework diagram of a scene passable area cognition and understanding method.
FIG. 2 is a schematic diagram of the position prior and position prior features: (a) the spatial distribution of objects in an actual traffic scene; (b) live-action maps and truth maps of urban roads with lanes and urban roads without lanes, collected from the KITTI data set.
Fig. 3 is a schematic diagram of a UC-FCN network architecture.
Fig. 4 compares the passable area position prior probability distribution maps connected at different positions.
Fig. 5 is an overall schematic diagram of the UC-FCN-L network.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
as shown in fig. 1, an end-to-end unsupervised scene road surface area determining method specifically includes the following steps:
1) Using the distribution regularity of road areas in space and in images, a road position prior probability distribution map is constructed statistically and attached directly to a convolutional layer as a feature mapping of the detection network, constructing a passable area position prior probability distribution map whose position prior information can be applied flexibly in an actual road traffic environment;
2) For the passable area recognition and understanding task, a new deep network architecture (UC-FCN) is constructed by combining a Fully Convolutional Network (FCN) and U-Net to solve the road surface detection and segmentation problem, serving as the main network model for detection;
3) The constructed passable area position prior probability distribution map is used as a feature mapping of the UC-FCN network; its optimal attachment position is verified through experiments, and the map is attached directly to the full convolution layer at that position to generate the UC-FCN-L network;
4) Because obtaining pixel-level semantic labels for training data from a self-collected traffic scene video data set is difficult, an unsupervised training method is proposed: the passable area is roughly detected by a traditional vanishing point detection method, and the UC-FCN-L network is trained with the obtained detection result as the truth value of the training data set, giving a deep network model for passable area extraction.
In step 1), to address the false detections between foreground and background with similar appearance features that arise because traditional convolutional neural networks are insensitive to position priors, the passable areas are counted based on the live-action maps and truth maps of urban roads with lanes and urban roads without lanes in the KITTI data set, giving passable area position prior probability distribution maps under the two road conditions; the two prior maps are then fused into the final passable area position prior probability distribution map. As shown in fig. 2(a), the spatial distribution of objects in an actual traffic scene follows certain rules: the sky is distributed at the top of the image, buildings on both sides, and road regions at the bottom. A traditional convolutional neural network is sensitive only to the local appearance features of the target and cannot exploit position prior information; it may identify a building area as a road area with similar appearance features, and such false detections can be effectively eliminated if the position prior is used reasonably. For the position prior information to be applied flexibly in an actual road traffic environment, different input images should share the same position feature expression, so the position prior is attached directly to the convolutional layer as a feature mapping of the detection network. As shown in fig. 2(b), based on the live-action maps and truth maps of urban roads with lanes and urban roads without lanes in the KITTI data set, the passable areas are counted: the number of times each coordinate position is judged passable is tallied and averaged, giving the passable area position prior probability distribution maps under the two road conditions. In these probability maps, the brightness of each pixel represents the probability that the pixel belongs to the target: the higher the brightness, the greater the probability; conversely, the lower the brightness, the smaller the probability. The passable area can be separated from the scene through this probability image. The two prior probability distribution maps are fused to obtain the final passable area position prior probability distribution map.
In step 2), a new deep network architecture, the UC-FCN, is proposed based on the fully convolutional neural network. Since 2012, Convolutional Neural Networks (CNN) have achieved great success and wide application in image classification, image detection and the like. The traditional CNN method uses pixel blocks as the sensing region and can extract only local features, which limits classification performance. To address this problem, Jonathan Long et al. of UC Berkeley proposed Fully Convolutional Networks (FCN) for image segmentation, attempting to recover the class of each pixel from abstract features. The FCN converts the fully-connected layers of a traditional CNN into convolutional layers one by one, so that all layers are convolutional; it is therefore called a fully convolutional network.
Based on an FCN network architecture mode, the method improves and constructs the network:
the UC-FCN network comprises a contraction structure and an expansion structure, wherein the contraction structure performs convolution and pooling operation, the spatial dimension is gradually reduced, the obtained image is smaller and smaller, the resolution is lower and lower, in order to recover the resolution of an original image from a rough image with low resolution, the expansion structure is used, in particular, after a convolution layer, the pooling operation after the convolution layer in the contraction structure is replaced by up-sampling operation, the output resolution is increased, the details and the spatial dimension of an object are gradually restored, in order to use local information, connection is arranged between two modules to help the expansion structure to better restore the details of a target, in particular, the high-resolution feature generated in the contraction structure of the network is connected to the result after convolution of the expansion structure.
The constructed UC-FCN network therefore consists of a contraction structure, based mainly on convolution and pooling operations, that gradually reduces the spatial dimensions and yields smaller and smaller images, and an expansion structure, based mainly on convolution operations, that increases the output resolution and gradually repairs the details and spatial dimensions of the object.
Specifically, as shown in fig. 3, since the height and width of a feature map are smaller than those of the input, the passable area position prior probability distribution map should be resized to match the size of the last feature map to which it is connected. The prior map can be connected after either the 33 x 33 or the 15 x 15 feature map; the former carries more accurate position prior information than the latter, can describe more diversified and irregular shapes, better reflects detail information such as distant roads and small corners, and yields more accurate detection results. The final passable area position prior probability distribution map is added as a feature mapping of the UC-FCN network at the corresponding position to generate the UC-FCN-L network.
the systolic structure is a typical convolutional network architecture, which is a repetitive structure, each repetition has 2 convolutional layers and one pooling layer, the convolutional cores in the convolutional layers are 3 × 3, the activation function uses ReLU, two convolutional layers are followed by one maximum pooling layer with 2 × 2 step size of 2, the number of feature channels is doubled after each downsampling, 5 times of convolution pooling is followed by a full convolutional structure, there are 2 convolutional layers, the improvement of FCN is to change the fully-connected layer of CNN into convolutional layer, FCN uses VGG16 as the basis in the feature extraction stage (systolic structure), the network has 4096 filters in the full convolutional structure, a large number of filters makes the calculation larger, we reduce the number of filters in the full convolutional structure from 4096 to 1024, the size of the filters is changed from 7 to 3 × 3, the parameters of the network are reduced, the calculated amount is correspondingly reduced, the precision is also reduced, and the expansion structure is correspondingly improved for keeping the identification precision of the network;
specifically, the expansion structure adopts a repeated structure of up-sampling convolution, the up-sampling in the repeated structure is an up-sampling plus ReLU activation function structure, the input up-sampling is 2 times by using bilinear interpolation, then the gradient disappearance problem is solved by using ReLU, the size of a feature map is doubled by using the up-sampling each time, after the up-sampling is finished, the number of channels of the feature map is changed by using convolution operation, the size of convolution kernels in a convolution layer is 3 x 3, the result after the convolution is fused with the feature map of the corresponding step in the contraction structure, and finally, the identification result is obtained through a softmax layer.
So that the number of filters in the contraction structure can be reduced without affecting recognition precision, the expansion structure is specifically improved as follows (see the sketch after this list):
1) A conv-Ncl layer is added between the contraction structure and the expansion structure. Its convolution kernels are 1 x 1 in size, and the number of feature map channels passing through it is converted from 1024 to a specific number; to simplify the subsequent classification computation, the number of converted channels is set directly to the number of classes;
2) To match the convolution results of the expansion structure with the channel counts of the contraction structure's feature maps, all architecture layers of the expansion structure use multiple convolution kernels. To avoid a large increase in network parameters, a scalar value C is used as a coefficient on the number of kernels: the expansion part of the new network has C x Ncl convolution kernels, and C is adjusted at each corresponding feature map position to equal the number of kernels of the corresponding contraction stage.
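A small sketch of these two improvements, assuming a binary passable/non-passable classification (so Ncl = 2); the per-stage values of C are assumptions chosen to match the contraction-stage channel counts.

```python
import torch.nn as nn

NCL = 2  # number of classes Ncl; binary segmentation assumed

# 1) conv-Ncl layer: 1x1 kernels converting 1024 channels to the class count
conv_ncl = nn.Conv2d(1024, NCL, kernel_size=1)

# 2) expansion conv layers carry C * Ncl kernels; C is chosen per stage so
#    the output channel count equals that of the matching contraction stage
#    (e.g. C = 256 gives 512 output channels when NCL = 2).
def expansion_conv(in_ch, C):
    return nn.Conv2d(in_ch, C * NCL, kernel_size=3, padding=1)
```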
In step 3), the constructed passable area position prior probability distribution map is used as a feature mapping of the UC-FCN network and attached directly to the full convolution layer for position feature extraction, generating the UC-FCN-L network. As described in step 1), some false detections can be effectively avoided by using the position prior reasonably. Since the height and width of a feature map are smaller than those of the input, the passable area position prior probability distribution map should be resized to match the size of the last feature map to which it is connected. In the UC-FCN network, the feature-map-generating convolution is a repeated structure that repeats 7 times, with output widths and heights of 259, 130, 65, 33, 17, 15 (the sizes of the last two full convolution feature maps are unchanged). The number of convolution layers used to extract features from the prior map changes the generated result: the more convolution layers, the more detailed the extracted feature information; the fewer convolution layers, the coarser the extracted feature information, and the more global information it covers. The passable area position prior probability distribution map serves as auxiliary information for passable area detection and appropriately corrects the detection result to a certain degree, so extracting its features must both preserve contour information and retain detail information. The prior map is therefore connected to the 33 x 33 feature map, where it can describe more diversified and irregular shapes; the extracted features reflect not only contour information such as the general shape and position of the road but also detail information such as distant roads and small corners, yielding more accurate detection results. The final passable area position prior probability distribution map is placed at the corresponding position to obtain the deep network model for travelable area extraction, as shown in fig. 5.
In step 4), the UC-FCN-L network is trained in an unsupervised mode to obtain the deep network model for travelable area extraction. Samples are indispensable in deep learning training and are mainly divided into labeled samples and unlabeled samples. The labeled samples form the training sample set D_l = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, whose l class labels are known; the unlabeled samples form the training sample set D_u = {x_{l+1}, x_{l+2}, ..., x_{l+u}}, whose u class labels are unknown (u is much larger than l). Training a model on the labeled samples D_l while leaving the information contained in the unlabeled samples D_u unused is called supervised learning; when the labeled samples D_l are lacking, learning from the unlabeled samples D_u must be considered, and the training mode that uses only unlabeled samples is called unsupervised learning.
The network architecture for scene passable area cognition and understanding provided by the invention is based on a self-collected traffic scene video data set which, as shown in fig. 4, contains real image data acquired in urban, rural, highway and other scenes; part of this image data is selected for training and testing. The passable area cognition and understanding method is essentially pixel-level segmentation of the image, so obtaining an image segmentation truth value requires pixel-level semantic labels for the training data. However, pixel-level labeling of a large amount of collected real scene data is very difficult, so an unsupervised method must be adopted for network training.
Specifically, passable area detection is performed on the collected training images by the traditional vanishing point method. The vanishing point is the unique intersection point at which a set of parallel lines in space is imaged on the image plane. Vanishing-point-based passable area detection mainly comprises: performing texture analysis at multiple scales using Gabor wavelets and discarding points with insignificant texture; examining the relation between each point and the texture information, and computing each point's score with a texture voting method; and searching for the road edges according to the vanishing point to obtain the road surface area. During network training, the network parameters are improved continuously to reduce the difference between the detection result of the proposed network model and the detection result obtained from the vanishing point, finally yielding a network architecture usable for passable area detection.
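A hedged sketch of the Gabor texture-analysis stage follows; the filter-bank parameters and the confidence threshold are assumptions, and the voting and road-edge-search stages are only summarized in comments.

```python
import cv2
import numpy as np

def dominant_orientation(gray, n_angles=36, ksize=17):
    """Per-pixel dominant texture orientation from a Gabor filter bank,
    discarding pixels with insignificant texture (threshold assumed)."""
    angles = np.linspace(0, np.pi, n_angles, endpoint=False)
    responses = []
    for theta in angles:
        # getGaborKernel(ksize, sigma, theta, lambd, gamma, psi)
        kern = cv2.getGaborKernel((ksize, ksize), 4.0, theta, 10.0, 0.5, 0)
        responses.append(np.abs(cv2.filter2D(gray.astype(np.float32),
                                             cv2.CV_32F, kern)))
    responses = np.stack(responses)                      # (n_angles, H, W)
    strength = responses.max(axis=0)
    confident = strength > np.percentile(strength, 60)   # drop weak texture
    return angles[responses.argmax(axis=0)], confident

# The voting stage would then score each candidate vanishing point by how
# many confident texture directions point toward it, and the road edges are
# searched from the winning point to delimit the road surface area.
orientation, confident = dominant_orientation(np.random.rand(120, 160))
```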

Claims (8)

1. An end-to-end unsupervised scene road surface area determination method is characterized by comprising the following steps:
step 1), constructing a road position prior probability distribution map and attaching it directly to a convolutional layer as a feature mapping of the detection network, thereby constructing a passable area position prior probability distribution map whose position prior information can be applied flexibly in an actual road traffic environment;
step 2), constructing a deep network architecture (UC-FCN) by combining a fully convolutional network with U-Net, to serve as the main network model for detection; the UC-FCN network comprises a contraction structure and an expansion structure, wherein the contraction structure performs convolution and pooling operations that gradually reduce the spatial dimensions, so the resulting images become smaller and smaller and the resolution lower and lower; the expansion structure then replaces the pooling operation after each convolution layer in the contraction structure with an up-sampling operation, and the high-resolution features generated in the contraction structure are connected to the convolved results of the expansion structure, increasing the output resolution and gradually repairing the detail and spatial dimensions of the object; the expansion structure adopts a repeated up-sampling-and-convolution architecture in which the up-sampling step is an up-sampling plus ReLU activation structure: the input is up-sampled by a factor of 2 using bilinear interpolation and ReLU is then applied to alleviate the vanishing gradient problem; after up-sampling, a convolution operation changes the number of channels of the feature map, with 3 x 3 convolution kernels in the convolution layer; the convolved result is fused with the feature map of the corresponding step in the contraction structure, and a high-precision identification result is finally obtained through a softmax layer; a conv-Ncl layer is added between the contraction structure and the expansion structure, with convolution kernels of size 1 x 1, converting the number of feature map channels passing through it from 1024 to a specific number, and to simplify the subsequent classification computation, the number of converted channels is set directly to the number of classes; to match the convolution results of the expansion structure with the channel counts of the contraction structure's feature maps, all architecture layers of the expansion structure use multiple convolution kernels, and to avoid a large increase in network parameters, a scalar value C is used as a coefficient on the number of kernels: the expansion part of the new network has C x Ncl convolution kernels, and C is adjusted at each corresponding feature map position to equal the number of kernels of the corresponding contraction stage;
step 3), using the constructed passable area position prior probability distribution map as a feature mapping of the UC-FCN network, determining the optimal attachment position, and attaching the map directly to the full convolution layer at that position to generate the UC-FCN-L network;
and 4), detecting the passable area by a vanishing point detection method, and training the UC-FCN-L network with the obtained detection result as the truth value of the training data set, to obtain a deep network model for passable area extraction.
2. The end-to-end unsupervised scene road surface area determination method according to claim 1, wherein in step 1), a passable area position prior probability distribution map is constructed statistically, using the distribution regularity of road areas in space and in images.
3. The end-to-end unsupervised scene road surface area determination method according to claim 1 or 2, wherein in step 1), based on the live-action maps and truth maps of urban roads with lanes and urban roads without lanes in the KITTI data set, the passable areas are counted to obtain passable area position prior probability distribution maps under the two road conditions, and the two maps are then fused to obtain the final passable area position prior probability distribution map.
4. The end-to-end unsupervised scene road surface area determination method according to claim 3, wherein the passable area is counted based on the live-action maps and truth maps of urban roads with lanes and urban roads without lanes in the KITTI data set: the number of times each coordinate position is judged passable is tallied and averaged, giving passable area position prior probability distribution maps under the two road conditions; in these probability maps the brightness of each pixel represents the probability that the pixel belongs to the target, the higher the brightness the greater the probability and, conversely, the lower the brightness the smaller the probability; the passable region is separated from the scene through the probability image, and the two prior probability distribution maps are fused to obtain the final passable area position prior probability distribution map.
5. The end-to-end unsupervised scene road surface area determination method according to claim 1, wherein in step 3), the passable area position prior probability distribution map is resized to match the size of the last feature map to which it is connected, and the resized map is added as a feature mapping of the UC-FCN network at the corresponding position to generate the UC-FCN-L network.
6. The end-to-end unsupervised scene road surface area determination method according to claim 1, wherein passable area detection is performed on the collected training images by the vanishing point method and the detection result is used as the truth value GT of the training data; during network training, the network parameters are improved continuously to reduce the difference between the detection result of the proposed network model and the detection result obtained from the vanishing point, finally yielding a network architecture usable for passable area detection.
7. The method for determining an end-to-end unsupervised scene road surface area according to claim 1, wherein in step 4), the UC-FCN-L network is trained in an unsupervised manner to obtain a deep network model for extracting a travelable area.
8. The end-to-end unsupervised scene road surface area determination method according to claim 7, wherein in the unsupervised training the samples are divided into labeled samples and unlabeled samples: the labeled samples form the training sample set D_l = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)}, whose l class labels are known, and the unlabeled samples form the training sample set D_u = {x_{l+1}, x_{l+2}, ..., x_{l+u}}, whose u class labels are unknown, with u much larger than l; training a model on the labeled samples D_l while leaving the information contained in the unlabeled samples D_u unused is called supervised learning, and when the labeled samples D_l are lacking, model learning must be realized from the unlabeled samples D_u.
CN201810636311.5A 2018-06-20 2018-06-20 End-to-end unsupervised scene passable area cognition and understanding method Active CN108876805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810636311.5A CN108876805B (en) 2018-06-20 2018-06-20 End-to-end unsupervised scene passable area cognition and understanding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810636311.5A CN108876805B (en) 2018-06-20 2018-06-20 End-to-end unsupervised scene passable area cognition and understanding method

Publications (2)

Publication Number Publication Date
CN108876805A CN108876805A (en) 2018-11-23
CN108876805B true CN108876805B (en) 2021-07-27

Family

ID=64340750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810636311.5A Active CN108876805B (en) 2018-06-20 2018-06-20 End-to-end unsupervised scene passable area cognition and understanding method

Country Status (1)

Country Link
CN (1) CN108876805B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369566B (en) * 2018-12-25 2023-12-05 杭州海康威视数字技术股份有限公司 Method, device, equipment and storage medium for determining position of pavement blanking point
CN109703569B (en) * 2019-02-21 2021-07-27 百度在线网络技术(北京)有限公司 Information processing method, device and storage medium
CN110415187B (en) * 2019-07-04 2021-07-23 Tcl华星光电技术有限公司 Image processing method and image processing system
US11473927B2 (en) * 2020-02-05 2022-10-18 Electronic Arts Inc. Generating positions of map items for placement on a virtual map
WO2022087853A1 (en) * 2020-10-27 2022-05-05 深圳市深光粟科技有限公司 Image segmentation method and apparatus, and computer-readable storage medium
CN113221826B (en) * 2021-05-31 2023-05-02 浙江工商大学 Road detection method based on self-supervision learning significance estimation pixel embedding

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034862A (en) * 2012-12-14 2013-04-10 北京诚达交通科技有限公司 Road snow and rain state automatic identification method based on feature information classification
CN106650690A (en) * 2016-12-30 2017-05-10 东华大学 Night vision image scene identification method based on deep convolution-deconvolution neural network
CN107492071A (en) * 2017-08-17 2017-12-19 京东方科技集团股份有限公司 Medical image processing method and equipment
CN107808140A (en) * 2017-11-07 2018-03-16 浙江大学 A kind of monocular vision Road Recognition Algorithm based on image co-registration
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034862A (en) * 2012-12-14 2013-04-10 北京诚达交通科技有限公司 Road snow and rain state automatic identification method based on feature information classification
CN106650690A (en) * 2016-12-30 2017-05-10 东华大学 Night vision image scene identification method based on deep convolution-deconvolution neural network
CN107492071A (en) * 2017-08-17 2017-12-19 京东方科技集团股份有限公司 Medical image processing method and equipment
CN107808140A (en) * 2017-11-07 2018-03-16 浙江大学 A kind of monocular vision Road Recognition Algorithm based on image co-registration
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Unsupervised Pre-training for Fully Convolutional Neural Networks";Stiaan Wiehman.etc;《2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference》;20161231;全文 *

Also Published As

Publication number Publication date
CN108876805A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
CN108876805B (en) End-to-end unsupervised scene passable area cognition and understanding method
Wen et al. A deep learning framework for road marking extraction, classification and completion from mobile laser scanning point clouds
Yu et al. A real-time detection approach for bridge cracks based on YOLOv4-FPM
Serna et al. Detection, segmentation and classification of 3D urban objects using mathematical morphology and supervised learning
CN111815776A (en) Three-dimensional building fine geometric reconstruction method integrating airborne and vehicle-mounted three-dimensional laser point clouds and streetscape images
CN111626217A (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
Wulff et al. Early fusion of camera and lidar for robust road detection based on U-Net FCN
CN110503716B (en) Method for generating motor vehicle license plate synthetic data
CN110532961B (en) Semantic traffic light detection method based on multi-scale attention mechanism network model
CN111666805A (en) Category tagging system for autonomous driving
CN108256464A (en) High-resolution remote sensing image urban road extracting method based on deep learning
Balaska et al. Enhancing satellite semantic maps with ground-level imagery
CN110599497A (en) Drivable region segmentation method based on deep neural network
CN111259796A (en) Lane line detection method based on image geometric features
Zang et al. Traffic lane detection using fully convolutional neural network
CN112766136A (en) Space parking space detection method based on deep learning
CN115546768A (en) Pavement marking identification method and system based on multi-scale mechanism and attention mechanism
CN114782729A (en) Real-time target detection method based on laser radar and vision fusion
Cheng et al. Modeling weather and illuminations in driving views based on big-video mining
CN116071747A (en) 3D point cloud data and 2D image data fusion matching semantic segmentation method
Cheng et al. Semantic segmentation of road profiles for efficient sensing in autonomous driving
Meng et al. A block object detection method based on feature fusion networks for autonomous vehicles
CN110909656A (en) Pedestrian detection method and system with integration of radar and camera
CN113326846B (en) Rapid bridge apparent disease detection method based on machine vision
Bai et al. How to build a curb dataset with lidar data for autonomous driving

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240207

Address after: 710200 Jingwei Industrial Park, economic development zone, Xi'an City, Shaanxi Province

Patentee after: SHAANXI HEAVY DUTY AUTOMOBILE Co.,Ltd.

Country or region after: China

Address before: 710064 No. 33, South Second Ring Road, Shaanxi, Xi'an

Patentee before: CHANG'AN University

Country or region before: China

TR01 Transfer of patent right