CN112731436B

CN112731436B - Multi-mode data fusion travelable region detection method based on point cloud up-sampling

Info

Publication number: CN112731436B
Application number: CN202011501003.5A
Authority: CN
Inventors: 金晓; 沈会良
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2024-03-19
Anticipated expiration: 2040-12-17
Also published as: CN112731436A

Abstract

The invention discloses a multi-modal data fusion drivable area detection method based on point cloud upsampling, which mainly includes two parts: spatial point cloud adaptive upsampling and multi-modal data fusion drivable area detection. The camera and lidar are registered through a joint calibration algorithm, the point cloud is projected onto the image plane to obtain a sparse point cloud image, the pixel local window is used to calculate edge intensity information, and the point cloud upsampling scheme is adaptively selected to obtain a dense point cloud image; the obtained Dense point cloud images and RGB images are used for feature extraction and cross-fusion to achieve rapid detection of drivable areas. The detection method of the present invention can realize rapid and accurate detection and segmentation of drivable areas.

Description

Multimodal data fusion drivable area detection method based on point cloud upsampling

技术领域Technical field

本发明涉及一种基于点云上采样的多模态数据融合可行驶区域检测方法，主要包括空间点云的自适应上采样、多模态数据融合可行驶区域检测两部分。The invention relates to a multi-modal data fusion drivable area detection method based on point cloud upsampling, which mainly includes adaptive upsampling of spatial point clouds and multi-modal data fusion drivable area detection.

背景技术Background technique

根据选择的传感器种类不同，当前的可行驶区域检测算法主要存在着以相机为主和以激光雷达为主的两种方案。其中相机具有成本低、帧率高和分辨率高等诸多优点，但是其容易受到天气等因素干扰，鲁棒程度较低。另一方面，激光雷达以三维点云为主要获取数据，其虽然在分辨率和成本上有所不足，但具有很高的三维测量精度以及较强的抗干扰能力，因而在无人系统中得到了普遍应用。对于点云的稀疏性，现有的一些方法采用了诸如联合双边滤波上采样的方式，在局部窗口内进行加权估计，进而获取致密空间信息，但其大多存在着边缘恢复较为模糊、细节保留程度不够等问题。Depending on the type of sensor selected, the current drivable area detection algorithms mainly include two solutions: camera-based and lidar-based. Among them, the camera has many advantages such as low cost, high frame rate and high resolution, but it is easily interfered by factors such as weather and has low robustness. On the other hand, lidar mainly obtains data from three-dimensional point clouds. Although it is insufficient in resolution and cost, it has high three-dimensional measurement accuracy and strong anti-interference ability, so it is widely used in unmanned systems. universally applied. Regarding the sparsity of point clouds, some existing methods use methods such as joint bilateral filtering upsampling to perform weighted estimation within a local window to obtain dense spatial information. However, most of them suffer from blurred edge recovery and poor detail retention. Not enough and other issues.

随着对可行驶区域检测算法精度要求的不断提高，使用单一传感器进行可行驶区域检测虽在部分场景下可以实现较为可靠的检测，但仍存在着一定的局限性。为了获得更好的检测效果，基于图像和点云的融合方法也开始不断出现。As the accuracy requirements for drivable area detection algorithms continue to increase, although using a single sensor to detect drivable areas can achieve more reliable detection in some scenarios, there are still certain limitations. In order to obtain better detection results, fusion methods based on images and point clouds have also begun to appear.

Zhang Y等在文献【"Fusion of LiDAR and Camera by Scanning in LiDARImagery and Image-Guided Diffusion for Urban Road Detection,"[J].2018:579-584.】中提出了一种传统的相机与激光雷达融合方法。该方法在对点云进行初步筛选的基础上利用行和列扫描思想确定可行驶区域离散点云，并以图像为引导实现道路区域的像素级分割。该方法的缺陷在于检测过程没有充分利用图像信息，对于一些结构化程度较差的道路场景不太适用。Zhang Y et al. proposed a traditional camera and lidar fusion in the literature ["Fusion of LiDAR and Camera by Scanning in LiDARImagery and Image-Guided Diffusion for Urban Road Detection,"[J].2018:579-584.] method. This method uses the idea of row and column scanning to determine the discrete point cloud of the drivable area based on the preliminary screening of point clouds, and uses the image as a guide to achieve pixel-level segmentation of the road area. The disadvantage of this method is that the detection process does not fully utilize the image information, and is not suitable for some poorly structured road scenes.

发明内容Contents of the invention

为克服上述缺陷，本发明要解决的技术问题是提供一种基于边缘强度自适应的空间点云上采样方法，增强边缘与细节信息的保留。In order to overcome the above defects, the technical problem to be solved by the present invention is to provide a spatial point cloud upsampling method based on edge intensity adaptation to enhance the retention of edge and detail information.

与此相应，本发明另一个要解决的技术问题是提供一种能充分融合点云和图像特性的可行驶区域检测框架。Correspondingly, another technical problem to be solved by the present invention is to provide a drivable area detection framework that can fully integrate point cloud and image characteristics.

对于智能车可行驶区域检测而言，本发明解决上述技术问题主要包括如下步骤：基于像素边缘强度完成稀疏点云的自适应上采样，然后将同步后的RGB图像与稠密点云图作为输入，进行特征提取、融合，输出检测结果。For smart car drivable area detection, the present invention solves the above technical problems and mainly includes the following steps: completing adaptive upsampling of sparse point clouds based on pixel edge intensity, and then using the synchronized RGB image and dense point cloud image as input to perform Feature extraction, fusion, and output detection results.

本发明具体采用以下技术方案实现：The present invention is specifically implemented by adopting the following technical solutions:

一种基于点云上采样的多模态数据融合可行驶区域检测方法，该方法通过联合标定算法对相机与激光雷达进行标定，将点云投影至图像平面获取稀疏点云图，利用像素局部窗口计算边缘强度信息，自适应选择点云上采样方案，获得稠密点云图；对所得稠密点云图与RGB图像进行特征提取和交叉融合，实现可行驶区域快速检测。A multi-modal data fusion drivable area detection method based on point cloud upsampling. This method uses a joint calibration algorithm to calibrate the camera and lidar, project the point cloud to the image plane to obtain a sparse point cloud map, and use pixel local windows to calculate Based on edge intensity information, the point cloud upsampling scheme is adaptively selected to obtain a dense point cloud image; feature extraction and cross-fusion of the obtained dense point cloud image and RGB image are performed to achieve rapid detection of drivable areas.

上述技术方案中，进一步地，在稀疏点云图的基础上可利用像素局部窗口计算边缘强度信息，进而将像素分为非边缘区域和边缘区域两类，并据此完成自适应上采样。利用像素局部窗口计算边缘强度信息，具体为：对于每一个像素，利用像素局部窗口内的点云距离，按如下公式计算边缘强度信息，当边缘强度信息大于指定阈值τ时，认为该像素处于边缘区域，否则认为该像素处于非边缘区域：In the above technical solution, further, based on the sparse point cloud image, the edge intensity information can be calculated using the pixel local window, and then the pixels are divided into two categories: non-edge areas and edge areas, and adaptive upsampling is completed accordingly. Use the pixel local window to calculate the edge intensity information, specifically: for each pixel, use the point cloud distance within the pixel local window to calculate the edge intensity information according to the following formula. When the edge intensity information is greater than the specified threshold τ, the pixel is considered to be on the edge. area, otherwise the pixel is considered to be in a non-edge area:

其中，σ表示标准差计算，表示窗口内点云的平均距离，λ为一个固定参数。所述的像素局部窗口是指以该像素为中心的邻域窗口，所述的边缘强度信息用于表征该像素处于边缘的可能性。Among them, σ represents the standard deviation calculation, Represents the average distance of the point cloud within the window, and λ is a fixed parameter. The pixel local window refers to the neighborhood window centered on the pixel, and the edge intensity information is used to represent the possibility that the pixel is on the edge.

更进一步地，所述的自适应选择点云上采样方案，具体为：对于非边缘区域像素而言，在其邻域窗口内只利用空间高斯核即可很好完成计算；而对于边缘像素而言，仅依靠空间位置会使边缘恢复趋于模糊，因此引入颜色信息，首先基于颜色和空间位置高斯核计算各点的初始权重，在此基础上根据点云平均深度，将局部窗口内的各点分为前景点和背景点两类，统计两类点的数量、权重和，并据此对各点进行权重的调整，最终完成待计算像素的空间位置信息估计；其中，所述的前景点为小于平均深度信息的点，所述的背景点为大于等于平均深度信息的点。Furthermore, the adaptive selection point cloud upsampling scheme described is as follows: for non-edge area pixels, the calculation can be completed well using only the spatial Gaussian kernel in its neighborhood window; while for edge pixels, In other words, relying only on spatial position will tend to blur the edge recovery. Therefore, color information is introduced. First, the initial weight of each point is calculated based on the color and spatial position Gaussian kernel. On this basis, each point in the local window is calculated based on the average depth of the point cloud. Points are divided into two categories: foreground points and background points. The number and weight sum of the two types of points are counted, and the weight of each point is adjusted accordingly, and finally the spatial position information estimation of the pixel to be calculated is completed; among them, the foreground point is a point with less than the average depth information, and the background point is a point with greater than or equal to the average depth information.

作为本发明的另一个改进，对所得稠密点云图与RGB图像进行特征提取和交叉融合，具体为：将同步后的稠密点云图和RGB图像作为输入，通过多层卷积网络进行特征提取与交叉融合，多层卷积网络同时结合空洞卷积与金字塔池化模块，可快速增大感受野并聚合多尺度的上下文信息。采用重点关注难检测区域与非道路区域检测结果的损失函数，提升检测准确率的同时确保车辆行驶的安全性。As another improvement of the present invention, feature extraction and cross-fusion are performed on the obtained dense point cloud image and RGB image. Specifically, the synchronized dense point cloud image and RGB image are used as input, and feature extraction and intersection are performed through a multi-layer convolution network. Fusion, a multi-layer convolutional network combined with atrous convolution and pyramid pooling modules can quickly increase the receptive field and aggregate multi-scale contextual information. A loss function that focuses on detection results in difficult-to-detect areas and non-road areas is used to improve detection accuracy while ensuring vehicle driving safety.

本发明的有益效果是：The beneficial effects of the present invention are:

相比于传统的联合双边滤波上采样算法，本发明采用的基于边缘强度的自适应点云上采样方法能够更加可靠地恢复场景的细节信息，提升准确率；同时，本发明所采用的RGB图像和稠密点云图融合方法能够有效地融合多模态数据的特性，综合两种传感器的优势，实现可行驶区域快速且准确的检测分割。本发明的多层卷积网络可实现感受野的快速增长及多尺度聚合信息，同时本发明还采用重点关注难检测区域和非道路区域的损失函数，能够准确、可靠地输出可行驶区域的检测结果，实现道路区域的快速检测与分割。Compared with the traditional joint bilateral filtering upsampling algorithm, the adaptive point cloud upsampling method based on edge strength used in the present invention can more reliably restore the detailed information of the scene and improve the accuracy; at the same time, the RGB image used in the present invention and dense point cloud image fusion method can effectively fuse the characteristics of multi-modal data, combine the advantages of the two sensors, and achieve fast and accurate detection and segmentation of drivable areas. The multi-layer convolution network of the present invention can achieve rapid growth of the receptive field and multi-scale aggregation of information. At the same time, the present invention also adopts a loss function that focuses on difficult-to-detect areas and non-road areas, and can accurately and reliably output the detection of drivable areas. As a result, rapid detection and segmentation of road areas are achieved.

附图说明Description of the drawings

下面结合附图对本发明的具体实施方式作进一步详细的说明。The specific embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.

图1是基于点云上采样的多模态数据融合可行驶区域检测方法的流程图；Figure 1 is a flow chart of the multi-modal data fusion drivable area detection method based on point cloud upsampling;

图2(a)是稀疏点云图像，(b)是该场景对应的边缘区域表示；Figure 2(a) is a sparse point cloud image, and (b) is the corresponding edge area representation of the scene;

图3(a)是联合双边滤波上采样结果，(b)是本发明方法上采样结果；Figure 3(a) is the joint bilateral filtering upsampling result, (b) is the upsampling result of the method of the present invention;

图4是三个网络的可行驶区域检测结果对比以及对应的场景图Image和结果真值图Label，三个网络分别是：只输入图像的检测网络RGB、只输入稠密点云的检测网络Lidar、本发明的多模态数据融合检测网络Fusion；Figure 4 is a comparison of the drivable area detection results of the three networks, as well as the corresponding scene graph Image and result truth map Label. The three networks are: the detection network RGB that only inputs images, the detection network Lidar that only inputs dense point clouds, Multi-modal data fusion detection network Fusion of the present invention;

图5是本发明的多层卷积网络结构图。Figure 5 is a structural diagram of the multi-layer convolutional network of the present invention.

具体实施方式Detailed ways

如图1所示，本发明提供的一种基于点云上采样的多模态数据融合可行驶区域检测方法，具体实施方式如下：As shown in Figure 1, the present invention provides a multi-modal data fusion drivable area detection method based on point cloud upsampling. The specific implementation is as follows:

1.相机内参标定，相机和激光雷达外参联合标定，具体如下所示。1. Camera internal parameter calibration, camera and lidar external parameter joint calibration, as shown below.

1.1固定相机与激光雷达位置，基于硬触发机制同步采集点云和图像数据；1.1 Fixed the positions of the camera and lidar, and synchronously collected point cloud and image data based on the hard trigger mechanism;

1.2根据单目标定获取相机内参信息，同时获取标定板在每帧相机和激光雷达坐标系下的平面方程，分别记为a_c，i和a_l，i，其中i表示帧数，c表示相机坐标系，l表示激光雷达坐标系。用θ表示标定板平面法向量，X表示平面上的空间点，d表示坐标系原点至平面的距离，存在平面约束为1.2 Obtain camera internal parameter information based on single-object calibration, and at the same time obtain the plane equation of the calibration plate in the camera and lidar coordinate systems of each frame, which are recorded as a _c,i and a _l,i respectively, where i represents the number of frames and c represents the camera Coordinate system, l represents the lidar coordinate system. Use θ to represent the normal vector of the calibration plate plane, X to represent the space point on the plane, d to represent the distance from the origin of the coordinate system to the plane, and the plane constraint is

a_c，i：θ_c，iX+d_c，i＝0a _{c, i} : θ _{c, i} X+d _{c, i} = 0

a_l，i：θ_l，iX+d_l，i＝0a _{l, i} : θ _{l, i} X+d _{l, i} = 0

1.3构建如下优化方程，求解旋转矩阵R和平移向量t，其中L表示每一帧平面上的点数，num为总帧数。1.3 Construct the following optimization equation to solve for the rotation matrix R and the translation vector t, where L represents the number of points on the plane of each frame, and num is the total number of frames.

2.根据联合标定结果，将激光点云投影至图像平面，获取初始的稀疏点云图。对于每一个像素，利用局部窗口内的点云距离，按如下公式计算边缘强度信息T，当边缘强度大于指定阈值τ(阈值τ可根据需要选择，如取1.1)时，认为该像素处于边缘区域。2. According to the joint calibration results, project the laser point cloud to the image plane to obtain the initial sparse point cloud image. For each pixel, use the point cloud distance within the local window to calculate the edge intensity information T according to the following formula. When the edge intensity is greater than the specified threshold τ (the threshold τ can be selected as needed, such as 1.1), the pixel is considered to be in the edge area. .

其中σ表示标准差计算，表示窗口内点云的平均距离，λ为一个固定参数，此处取值为3。图2中(a)、(b)分别为稀疏点云图和与之对应的边缘图表示。where σ represents the standard deviation calculation, Represents the average distance of the point cloud within the window, λ is a fixed parameter, and the value here is 3. (a) and (b) in Figure 2 are sparse point cloud images and corresponding edge image representations respectively.

3.根据上述边缘强度信息，将图像中各像素划分为非边缘区域、边缘区域两种类型，并据此完成相应的点云上采样，实现稀疏点云的稠密化，获得稠密点云图。3. Based on the above edge intensity information, divide each pixel in the image into two types: non-edge area and edge area, and complete the corresponding point cloud upsampling accordingly to achieve densification of the sparse point cloud and obtain a dense point cloud image.

3.1若像素q处于非边缘区域，则在其邻域N(q)内直接利用空间高斯核计算加权结果，避免因颜色差异过大而导致的点云重建不平滑。3.1 If pixel q is in a non-edge area, directly use the spatial Gaussian kernel to calculate the weighted result in its neighborhood N(q) to avoid uneven point cloud reconstruction caused by excessive color differences.

3.2若q处于边缘区域，为避免边缘恢复过于模糊，参考联合双边滤波上采样方法进行处理，首先利用颜色和空间位置相似性对各点赋予初始权重g(p)，s表示求和计算，目的是对空间和颜色的差异性进行平衡，I表示RGB图像的像素值，具体如3.2 If q is in the edge area, in order to avoid the edge recovery being too blurry, refer to the joint bilateral filtering upsampling method for processing. First, use the similarity of color and spatial position to assign an initial weight g(p) to each point, and s represents the summation calculation. The purpose It is to balance the differences in space and color. I represents the pixel value of the RGB image, specifically as

在此基础上，考虑局部窗口内点云的空间分布相关性，将其按深度信息分成前景点与背景点两类，分别记成F和B，其中，前景点为小于平均深度信息的点，背景点为大于等于平均深度信息的点，c表示邻域点云的类别F或B，m和n表示两个类别的数量及权重和，t_q表示当前像素的边缘强度，按类别计算各点的权重调整因子，整体如On this basis, considering the spatial distribution correlation of the point cloud in the local window, it is divided into two categories: foreground points and background points according to depth information, which are recorded as F and B respectively. Among them, the foreground points are points that are smaller than the average depth information. Background points are points that are greater than or equal to the average depth information. c represents the category F or B of the neighborhood point cloud. m and n represent the number and weight sum of the two categories. t _q represents the edge strength of the current pixel. Each point is calculated by category. The weight adjustment factor of

m_c＝|c|， m _c =|c|,

根据计算所得权重，计算当前像素对应的空间位置信息如According to the calculated weight, the spatial position information corresponding to the current pixel is calculated as follows:

本步骤中，表示待计算像素的空间位置信息，d_p表示邻域内已知的空间点，K表示归一化因子，σ_r和σ_I分别表示空间域和颜色域的标准差。In this step, represents the spatial position information of the pixel to be calculated, d _p represents the known spatial point in the neighborhood, K represents the normalization factor, σ _r and σ _I represent the standard deviation of the spatial domain and color domain respectively.

4.将RGB与步骤3得到的稠密点云图同时作为2个三通道数据输入，构建多模态数据融合可行驶区域检测网络(即多层卷积网络)。如图5所示，多层卷积网络采用双编码器(双编码器结构相同，但不共享参数)和单解码器结构，分别以RGB图像和稠密点云图作为原始输入，同一层的两个特征图通过1×1卷积进行交叉融合，结果作为下一层卷积网络的输入；将编码器所得输出结果作为金字塔池化模块输入，获取最终特征图输出；金字塔池化模块输出结果经过解码器恢复分辨率，并利用Sigmoid函数计算各像素属于可行驶区域的概率，当概率大于设定阈值时判断该像素属于可行驶区域。本发明的多层卷积网络结合了空洞卷积与金字塔池化模块，可以快速地增大感受野并聚合多尺度的上下文信息。4. Use RGB and the dense point cloud image obtained in step 3 as two three-channel data inputs at the same time to construct a multi-modal data fusion drivable area detection network (i.e., a multi-layer convolutional network). As shown in Figure 5, the multi-layer convolutional network uses a dual encoder (the dual encoder structure is the same, but does not share parameters) and a single decoder structure, using RGB images and dense point cloud images as original inputs respectively. The feature map is cross-fused through 1×1 convolution, and the result is used as the input of the next layer of convolutional network; the output result of the encoder is used as the input of the pyramid pooling module to obtain the final feature map output; the output result of the pyramid pooling module is decoded The device restores the resolution and uses the Sigmoid function to calculate the probability that each pixel belongs to the drivable area. When the probability is greater than the set threshold, it is judged that the pixel belongs to the drivable area. The multi-layer convolutional network of the present invention combines atrous convolution and pyramid pooling modules, which can quickly increase the receptive field and aggregate multi-scale contextual information.

在监督学习过程中，本发明设计损失函数如下所示，重点关注难检测区域和非道路区域的检测结果，提升检测准确率的同时确保车辆行驶的安全性。In the process of supervised learning, the present invention designs the loss function as follows, focusing on the detection results of difficult-to-detect areas and non-road areas to improve detection accuracy while ensuring the safety of vehicle driving.

其中y＝1和y＝0分别表示正负样本，正样本为道路区域，负样本为非道路区域，难检测区域是指检测较为困难的区域，对于正样本来说，其检测结果倾向于为非道路；对于负样本来说，检测结果倾向于道路。y′表示检测概率，α和γ为固定常数，此处均取值为2。Among them, y=1 and y=0 represent positive and negative samples respectively. Positive samples are road areas, and negative samples are non-road areas. Difficult detection areas refer to areas where detection is more difficult. For positive samples, the detection results tend to be Non-road; for negative samples, the detection results tend to be roads. y′ represents the detection probability, α and γ are fixed constants, both of which take the value 2 here.

特征图经过解码器恢复分辨率，并利用Sigmoid层计算各像素属于道路的概率，当概率大于设定阈值(如0.5)时判断该像素属于可行驶区域。The resolution of the feature map is restored through the decoder, and the Sigmoid layer is used to calculate the probability that each pixel belongs to the road. When the probability is greater than the set threshold (such as 0.5), the pixel is judged to belong to the drivable area.

实施例1Example 1

本实施例主要对比联合双边滤波上采样算法JBU与本发明中的基于边缘强度信息的自适应上采样方法的性能指标。本实施例通过将深度真值图5倍降采样获得稀疏点云图，对比两种方法的上采样效果。图3的(a)，(b)分别表示JBU上采样结果、本发明方法上采样结果。可以发现，本发明方法能够在减少重建误差的同时，更好地防止边缘模糊。This embodiment mainly compares the performance indicators of the joint bilateral filtering upsampling algorithm JBU and the adaptive upsampling method based on edge intensity information in the present invention. In this embodiment, a sparse point cloud image is obtained by downsampling the depth truth map by 5 times, and the upsampling effects of the two methods are compared. (a) and (b) of Figure 3 respectively represent the JBU upsampling results and the upsampling results of the method of the present invention. It can be found that the method of the present invention can better prevent edge blur while reducing reconstruction errors.

实施例2Example 2

本实例主要通过KITTI数据集对比单一图像数据网络、单一点云数据网络、本发明中多模态数据融合网络的可行驶区域检测性能，三种网络检测结果如图4所示，可以直观地看出，本发明中的多模态数据融合可行驶区域检测方法能够进一步提升道路检测的准确率，在很大程度上避免对车辆的误检测，同时提升边界检测的可靠性。This example mainly uses the KITTI data set to compare the drivable area detection performance of a single image data network, a single point cloud data network, and the multi-modal data fusion network in the present invention. The three network detection results are shown in Figure 4, which can be viewed intuitively It can be seen that the multi-modal data fusion drivable area detection method in the present invention can further improve the accuracy of road detection, avoid false detection of vehicles to a large extent, and at the same time improve the reliability of boundary detection.

以上所述的具体实施方式对本发明的技术方案和有益效果进行了详细说明，应理解的是以上所述仅为本发明的优选实施例子，并不用于限制本发明，凡在本发明的原则范围内所做的任何修改、补充和等同替换等，均应包含在本发明的保护范围之内。The above-described specific embodiments describe in detail the technical solutions and beneficial effects of the present invention. It should be understood that the above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, additions, equivalent substitutions, etc. made within shall be included in the protection scope of the present invention.

Claims

1. A multi-modal data fusion drivable area detection method based on point cloud upsampling, which is characterized in that the method uses a joint calibration algorithm to calibrate the camera and lidar, and projects the point cloud to the image plane to obtain a sparse point cloud map , use the pixel local window to calculate edge intensity information, adaptively select the point cloud upsampling scheme, and obtain a dense point cloud image; perform feature extraction and cross-fusion of the obtained dense point cloud image and RGB image to achieve rapid detection of drivable areas;

The described calculation of edge intensity information using a pixel local window is specifically as follows: for each pixel, the point cloud distance within the pixel local window is used to calculate the edge intensity information according to the following formula. When the edge intensity information is greater than the specified threshold τ, it is considered to be the edge intensity information. The pixel is in the edge area, otherwise the pixel is considered to be in the non-edge area:

Among them, σ represents the standard deviation calculation, Represents the average distance of the point cloud within the window, λ is a fixed parameter;

The described adaptive selection point cloud upsampling scheme is specifically: for non-edge area pixels, the spatial Gaussian kernel is directly used to calculate the weighted result within its local window; for edge area pixels, the spatial and color Gaussian kernels are first jointly calculated separately. The weight of each point in the local window; secondly, the point cloud is divided into two categories: foreground points and background points according to the average depth of the point cloud in the window, the number and weight sum of the two types of points in the local window are counted, and the weight of each point is calculated accordingly. The weights are adjusted; finally, the updated weights are used to weight each point within the local window to complete the calculation of the spatial position information of the pixels to be calculated.

2. The multi-modal data fusion drivable area detection method based on point cloud upsampling according to claim 1, characterized in that feature extraction and cross-fusion are performed on the obtained dense point cloud image and RGB image, specifically: RGB Images and dense point cloud images are used as input at the same time, and multi-layer convolutional networks are used for feature extraction and cross-fusion. The loss function focuses on the detection results of difficult-to-detect areas and non-road areas, and outputs the detection probability of drivable areas;

The loss function is as follows:

Among them, y=1 and y=0 represent positive and negative samples respectively. Positive samples are road areas, and negative samples are non-road areas. Difficult detection areas refer to areas where detection is more difficult. For positive samples, the detection results tend to be Non-road; for negative samples, the detection results tend to be roads; y′ represents the probability of being judged to be a road area, and α and γ are fixed constants.

3. The multi-modal data fusion drivable area detection method based on point cloud upsampling according to claim 2, characterized in that the multi-layer convolution network structure adopts a dual encoder and a single decoder structure. The two encoders have the same structure, but do not share parameters. They use RGB images and dense point cloud images as original inputs respectively. For the two feature maps output by the same layer encoder, 1×1 convolution is used for cross-fusion, and the fusion result is used as the next The input of layer convolution, and the downsampled feature map obtained by the dual encoder is input into the pyramid pooling module to obtain the final feature map output;

The output result of the pyramid pooling module is restored to resolution through the decoder, and the Sigmoid function is used to calculate the probability that each pixel belongs to the drivable area. When the probability is greater than the set threshold, the pixel is judged to belong to the drivable area.