CN114140765B - Obstacle sensing method and device and storage medium - Google Patents
- Publication number
- CN114140765B CN114140765B CN202111338928.7A CN202111338928A CN114140765B CN 114140765 B CN114140765 B CN 114140765B CN 202111338928 A CN202111338928 A CN 202111338928A CN 114140765 B CN114140765 B CN 114140765B
- Authority
- CN
- China
- Prior art keywords
- point cloud
- semantic
- information
- picture
- category information
- Prior art date
- Legal status (assumed, not a legal conclusion): Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/93—Lidar systems specially adapted for specific applications for anti-collision purposes
- G01S17/931—Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The present application discloses an obstacle sensing method, device, and storage medium for reducing the false-positive and misclassification rates of obstacle detection and improving detection accuracy. The disclosed obstacle sensing method includes: obtaining an original point cloud and a camera picture captured at the same moment; obtaining the calibrated intrinsic and extrinsic parameters for projection transformation; performing semantic segmentation on the original point cloud to obtain a second point cloud; performing semantic segmentation on the camera picture to obtain a second picture; projecting the original point cloud onto the second picture according to the intrinsic and extrinsic parameters to obtain a third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture; voxelizing the second semantic category information of the third point cloud, the first semantic category information of the second point cloud, and the feature information of the original point cloud, then inputting the result into an adaptive attention mechanism network for learning to obtain weighted semantic information; and detecting obstacle targets according to the weighted semantic information. The present application also provides an obstacle sensing device and a storage medium.
Description
Technical Field
The present application relates to the field of autonomous driving, and in particular to an obstacle sensing method, device, and storage medium.
Background
With the continuous development of autonomous driving technology, various sensors have become important components of autonomous driving systems. The environment perception part of an autonomous driving system usually needs to gather a large amount of information about the surroundings to ensure that the vehicle correctly understands its environment and makes the corresponding decisions. However, perception with a single sensor has limitations: on the one hand, a single sensing device may have detection blind spots due to constraints on its mounting position; on the other hand, each sensor type has its own inherent shortcomings.
It follows that obstacle perception with a single sensor suffers from limited recognition accuracy.
Summary of the Invention
In view of the above technical problems, embodiments of the present application provide an obstacle sensing method, device, and storage medium to improve the accuracy of obstacle perception.
In a first aspect, an obstacle sensing method provided by an embodiment of the present application includes:
obtaining an original point cloud and a camera picture captured at the same moment;
obtaining the calibrated intrinsic and extrinsic parameters for projection transformation;
performing semantic segmentation on the original point cloud to obtain a second point cloud;
performing semantic segmentation on the camera picture to obtain a second picture;
projecting the original point cloud onto the second picture according to the intrinsic and extrinsic parameters to obtain a third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture;
voxelizing the second semantic category information of the third point cloud, the first semantic category information of the second point cloud, and the feature information of the original point cloud, then inputting the result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
detecting obstacle targets according to the weighted semantic information;
wherein the second point cloud and the third point cloud include obstacle category information.
Preferably, the learning in the adaptive attention mechanism network includes:
learning local features in the adaptive attention mechanism network to obtain learned local features V_i;
learning global features in the adaptive attention mechanism network to obtain a learned global feature V_global;
concatenating the learned global feature V_global onto each local feature V_i to obtain enhanced features V_gl.
Preferably, detecting an obstacle target according to the weighted semantic information includes:
inputting the weighted semantic information into a target detector to detect the obstacle target.
Preferably, obtaining the original point cloud and the camera picture at the same moment includes:
performing software synchronization or hardware synchronization between the point cloud sensor and the camera;
obtaining the original point cloud and the camera picture at the same moment.
Preferably, performing semantic segmentation on the original point cloud to obtain the second point cloud includes:
inputting the original point cloud into a point cloud semantic segmentation network to obtain the second point cloud.
Preferably, performing semantic segmentation on the camera picture to obtain the second picture includes:
inputting the camera picture into a picture semantic segmentation network to obtain the second picture.
Before voxelizing the second semantic category information of the third point cloud, the first semantic category information of the second point cloud, and the feature information of the original point cloud, the method further includes:
converting the first semantic category information and the second semantic category information into One-Hot encoding format.
Preferably, projecting the original point cloud onto the second picture includes:
performing the projection according to the following formula:
P' = Proj(K, M, P),
where Proj is the projection-matrix processing;
K is the intrinsic parameter matrix of the camera;
M is the extrinsic parameter matrix from the camera to the lidar;
P is the set of lidar points;
P' is the lidar point cloud projected into the camera coordinate system.
Learning the local features in the adaptive attention mechanism network to obtain the learned local features V_i includes:
learning the local features according to the following formula:
V_i = max_{i=1,2,…,N} { MLP_l(p_i) },
where V_i is the learned feature of the i-th voxel cell;
p_i is the i-th point of the spatial point cloud;
MLP_l(p_i) is the local-feature multilayer perceptron;
max is a max-pooling operation over all points within one voxel;
C_1 is the number of channels of the local feature map;
N is the number of voxel cells.
Preferably, learning the global features in the adaptive attention mechanism network to obtain the learned global feature V_global includes:
learning the global features according to the following formula:
V_global = max_{i=1,2,…,N} { MLP_g(V_i) },
where MLP_g(V_i) is the global-feature multilayer perceptron;
max is a max-pooling operation over all voxels;
C_2 is the number of channels of the whole feature map;
N is the number of voxel cells;
V_i is the learned feature of the i-th voxel cell.
Obtaining the weighted semantic information includes:
obtaining the weighted semantic information according to the following formula:
where P_a,s and P_a,t are the weighted semantic information;
P_2D is the second semantic information;
P_3D is the first semantic information;
MLP_att is a multilayer perceptron;
σ is the Sigmoid activation function.
With the obstacle sensing method provided by the present invention, the lidar sensor and the camera sensor are fused, exploiting the advantages of the different sensors while compensating for each sensor's individual shortcomings, thereby improving the recognition accuracy of point cloud target detection. Furthermore, this solution uses a deep learning network that combines 3D point cloud semantic segmentation information with 2D picture semantic segmentation information, and exploits the semantic information of the different sensors to reduce the false-positive and misclassification rates for obstacles.
In a second aspect, an embodiment of the present application further provides an obstacle sensing device, including:
a picture acquisition module, configured to acquire camera pictures and to acquire the calibrated intrinsic and extrinsic parameters for projection transformation;
a point cloud acquisition module, configured to acquire the original point cloud;
a picture semantic segmentation module, configured to perform semantic segmentation on the camera picture to obtain a second picture;
a point cloud semantic segmentation module, configured to perform semantic segmentation on the original point cloud to obtain a second point cloud;
a picture semantic projection module, configured to project the original point cloud onto the second picture according to the intrinsic and extrinsic parameters to obtain a third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture;
a semantic fusion module, configured to voxelize the second semantic category information of the third point cloud, the first semantic category information of the second point cloud, and the feature information of the original point cloud, and then input the result into an adaptive attention mechanism network for learning to obtain weighted semantic information;
an obstacle perception module, configured to detect obstacle targets according to the weighted semantic information;
wherein the second point cloud and the second picture include obstacle category information.
In a third aspect, an embodiment of the present application further provides an obstacle sensing device, including a memory, a processor, and a user interface;
the memory is configured to store a computer program;
the user interface is configured to interact with a user;
the processor is configured to read the computer program in the memory, and when the processor executes the computer program, the obstacle sensing method provided by the present invention is implemented.
In a fourth aspect, an embodiment of the present application further provides a processor-readable storage medium storing a computer program, and when a processor executes the computer program, the obstacle sensing method provided by the present invention is implemented.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of obstacle sensing provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of the adaptive attention mechanism provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an obstacle sensing device provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another obstacle sensing device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Some terms used herein are explained below:
1. In the embodiments of the present invention, the term "and/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
2. In the embodiments of the present application, the term "plurality" means two or more, and other quantifiers are interpreted similarly.
3. One-Hot encoding format: One-Hot encoding, also called one-of-K encoding, is often used in classification prediction and is usually represented as a binary vector. The category an object belongs to is first mapped to an integer value and then converted into a binary code in which the dimension of that category has the value 1 and all remaining dimensions have the value 0.
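For illustration only (this code is not part of the disclosed embodiments; the function name and class count are ours), the One-Hot conversion described above can be sketched as:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer class labels to one-hot vectors: 1 in the class dimension, 0 elsewhere."""
    labels = np.asarray(labels)
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0
    return out

# e.g. class 2 out of 4 classes becomes [0, 0, 1, 0]
```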
4. Voxelization divides a three-dimensional point cloud into grid cells of the same resolution (e.g., 0.75 m × 0.75 m × 0.75 m) and assigns each point to a voxel cell according to its spatial position.
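Voxelization as just defined can be sketched as follows (an illustrative helper, not the patent's implementation; the 0.75 m resolution is taken from the example above):

```python
import numpy as np

def voxelize(points, voxel_size=0.75):
    """Assign each 3D point to a cubic voxel cell of equal resolution by its spatial position."""
    cells = np.floor(np.asarray(points)[:, :3] / voxel_size).astype(np.int64)
    voxels = {}
    for cell, point in zip(map(tuple, cells), points):
        voxels.setdefault(cell, []).append(point)
    return voxels
```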
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be noted that the order in which the embodiments of the present application are presented only reflects their sequence and does not reflect the relative merit of the technical solutions they provide.
Embodiment 1
Referring to Fig. 1, a schematic diagram of an obstacle sensing method provided by an embodiment of the present application: as shown in Fig. 1, the method includes steps S101 to S107.
S101: Obtain the original point cloud and the camera picture at the same moment.
In this embodiment of the present invention, the original point cloud is a three-dimensional point cloud that can be acquired by a lidar. The camera picture is acquired by a camera; if multiple cameras are installed, the pictures of all cameras are acquired simultaneously. The acquisition time of the original point cloud is the same as that of the camera picture, i.e., the original point cloud and the camera picture are acquired at the same moment.
As a preferred example, in this step the original point cloud and the camera picture may also be captured at slightly different times, provided the difference between the two acquisition times lies within a predetermined range, for example 0.001 seconds.
As a preferred example, to obtain the original point cloud and the camera picture at the same moment, soft synchronization or hardware synchronization may be performed between the point cloud sensor and the camera. Soft synchronization means providing the same time source to the different sensors, each of which timestamps the data it records; hardware synchronization means using a hardware trigger that fires the different sensors directly through a physical signal, such as PPS timing, and records the time.
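In a soft-synchronization setup, frames are ultimately paired by timestamp; a minimal sketch (the function and tolerance are illustrative, with the 0.001 s tolerance borrowed from the example above):

```python
def match_nearest(lidar_stamps, cam_stamps, max_dt=0.001):
    """Pair each lidar timestamp with the nearest camera timestamp within max_dt seconds."""
    pairs = []
    for t in lidar_stamps:
        nearest = min(cam_stamps, key=lambda s: abs(s - t))
        if abs(nearest - t) <= max_dt:
            pairs.append((t, nearest))
    return pairs
```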
S102: Obtain the calibrated intrinsic and extrinsic parameters for projection transformation.
In this step, the calibrated intrinsic and extrinsic parameters for the projection transformation between the camera and the lidar are obtained.
In this embodiment of the present invention, the intrinsic parameters are the internal parameters of the calibrated camera; the extrinsic parameters are the external parameters between the camera and the lidar; the intrinsic and extrinsic parameters are used for the projection transformation of the point cloud.
Preferably, the intrinsic parameters include, but are not limited to, the distortion coefficients, focal length, and pixel size; the extrinsic parameters include, but are not limited to, the rotation and translation matrices.
It should be noted that, in this embodiment of the present invention, the intrinsic and extrinsic parameters are calibrated and stored in advance.
S103: Perform semantic segmentation on the original point cloud to obtain the second point cloud.
As a preferred example, the original point cloud is input into a point cloud semantic segmentation network.
In this step, a frame of point cloud is fed into the point cloud semantic segmentation network, yielding a second point cloud that contains fine-grained semantic information, i.e., the second point cloud includes the first semantic information.
It should be noted that fine-grained semantic information means that the category information of each point is unambiguous and is not disturbed by other external conditions; for example, it is not affected by the intrinsic or extrinsic parameters.
S104: Perform semantic segmentation on the camera picture to obtain the second picture.
In this step, the camera picture is input into a picture semantic segmentation network to obtain the second picture.
It should be noted that, as a preferred example, in steps S103 and S104 above, the semantic segmentation network may be a Cylinder3D network or the like.
S105: Project the original point cloud onto the second picture according to the intrinsic and extrinsic parameters to obtain the third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture.
In this step, the original point cloud may be projected onto the second picture as follows:
the projection is performed according to the following formula:
P' = Proj(K, M, P),
where Proj is the projection-matrix processing;
K is the intrinsic parameter matrix of the camera;
M is the extrinsic parameter matrix from the camera to the lidar;
P is the set of lidar points;
P' is the lidar point cloud projected into the camera coordinate system.
For example, given the original point cloud P, the intrinsic matrix K, and the extrinsic matrix M, the third point cloud P' is obtained after projection according to the above formula.
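The patent does not spell out Proj internally; a minimal sketch of a standard pinhole projection consistent with the symbols above (assumed shapes: K is 3×3, M is a 4×4 homogeneous transform, P is N×3) is:

```python
import numpy as np

def proj(K, M, P):
    """Project lidar points P into pixel coordinates via extrinsics M and intrinsics K."""
    P_h = np.hstack([P, np.ones((P.shape[0], 1))])  # homogeneous lidar coordinates
    cam = (M @ P_h.T).T[:, :3]                      # lidar frame -> camera frame
    cam = cam[cam[:, 2] > 0]                        # keep points in front of the camera
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                   # perspective divide -> (u, v)
```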
After steps S101 to S105 above, each point in the point cloud carries two kinds of semantic information: the first semantic information from the original point cloud, and the second semantic information from the picture onto which the point cloud was projected. It should be noted that, as a preferred example, the obtained first and second semantic information may also be converted into One-Hot format.
S106: Voxelize the second semantic category information of the third point cloud, the first semantic category information of the second point cloud, and the feature information of the original point cloud, then input the result into the adaptive attention mechanism network for learning to obtain the weighted semantic information.
In this step, the learning in the adaptive attention mechanism network includes:
learning local features in the adaptive attention mechanism network to obtain learned local features V_i;
learning global features in the adaptive attention mechanism network to obtain a learned global feature V_global;
concatenating the learned global feature V_global onto each local feature V_i to obtain enhanced features V_gl.
As a preferred example, learning the local features in the attention mechanism network to obtain the learned local features V_i includes:
learning the local features according to the following formula:
V_i = max_{i=1,2,…,N} { MLP_l(p_i) },
where V_i is the learned feature of the i-th voxel cell;
p_i is the i-th point of the spatial point cloud;
MLP_l(p_i) is the local-feature multilayer perceptron;
max is a max-pooling operation over all points within one voxel;
C_1 is the number of channels of the local feature map;
N is the number of voxel cells.
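The per-voxel local-feature step, a point-wise MLP followed by max-pooling over the points of each voxel, can be sketched as follows (a single ReLU layer stands in for MLP_l; the weights and channel count C_1 = 8 are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
W_l = rng.standard_normal((3, 8))  # stand-in weights for MLP_l: 3 input dims -> C_1 = 8 channels

def local_features(voxels):
    """Apply the point-wise MLP, then max-pool over all points within each voxel."""
    return {cell: np.maximum(np.asarray(pts) @ W_l, 0.0).max(axis=0)
            for cell, pts in voxels.items()}
```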
所述对全局特征在自适应注意力机制网络中进行学习,得到学习后的全局特征Vglobal包括:Learning the global features in the adaptive attention mechanism network to obtain the learned global feature Vglobal includes:
根据以下公式进行全局特征的学习:The global features are learned according to the following formula:
Vglobal = max_{i=1,2,…,N} { MLPg(Vi) }
MLPg(Vi)为全局特征多层感知机;MLPg(Vi) is the global-feature multilayer perceptron;
max为对所有体素进行最大池化操作;max is the max pooling operation over all voxels;
C2为整个特征图的通道数量;C2 is the number of channels of the entire feature map;
N为体素格的数量;N is the number of voxel grids;
Vi为学习得到的第i个立体体素格内的特征。Vi is the learned feature within the i-th voxel grid.
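The global branch and the splicing step can be sketched in the same style: MLPg maps each voxel feature to C2 channels, max pooling over all N voxels yields Vglobal, and Vglobal is tiled onto every Vi. Again a minimal NumPy sketch under assumed layer shapes, not the patent's network.

```python
import numpy as np

def mlp_global(V, Wg, bg):
    # Shared MLP over per-voxel features: (N, C1) -> (N, C2)
    return np.maximum(V @ Wg + bg, 0.0)

def enhance(V, Wg, bg):
    # Vglobal = max over all N voxels of MLPg(Vi), shape (C2,);
    # splice Vglobal onto every local feature Vi -> Vgl of shape (N, C1 + C2).
    Vglobal = mlp_global(V, Wg, bg).max(axis=0)
    return np.concatenate([V, np.tile(Vglobal, (V.shape[0], 1))], axis=1)

rng = np.random.default_rng(1)
N, C1, C2 = 4, 8, 16
V = rng.standard_normal((N, C1))    # learned local features Vi
Wg = rng.standard_normal((C1, C2))
Vgl = enhance(V, Wg, np.zeros(C2))
print(Vgl.shape)  # (4, 24): each row is [Vi ; Vglobal]
```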
作为一种优选示例,根据以下公式得到加权后的语义信息:As a preferred example, the weighted semantic information is obtained according to the following formula:
其中,Pa,s和Pa,t为加权后的语义信息;where Pa,s and Pa,t are the weighted semantic information;
P2D为所述第二语义信息;P2D is the second semantic information;
P3D为所述第一语义信息;P3D is the first semantic information;
MLPatt为多层感知机;MLPatt is a multilayer perceptron;
σ为Sigmoid激活函数。σ is the Sigmoid activation function.
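The weighting formula itself is reproduced in the patent only as an image, so it is not shown in this text. Purely to illustrate how the listed ingredients (MLPatt, the Sigmoid σ, and the two semantic inputs P2D and P3D) could combine, the sketch below applies a sigmoid gate computed by an attention MLP to each branch; the gating form, the shared gate, and all names are assumptions, not the patent's actual formula.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_semantics(P2D, P3D, Wa, ba):
    # Hypothetical gating (assumed form, NOT the patent's exact formula,
    # which appears only as an image): an attention MLP computes a
    # sigmoid gate from the concatenated semantics, applied to each branch.
    gate = sigmoid(np.concatenate([P2D, P3D], axis=1) @ Wa + ba)  # (N, m)
    Pa_s = gate * P2D  # weighted 2D semantic information Pa,s
    Pa_t = gate * P3D  # weighted 3D semantic information Pa,t
    return Pa_s, Pa_t

rng = np.random.default_rng(2)
N, m = 6, 4
P2D = np.eye(m)[rng.integers(0, m, N)]  # one-hot 2D segmentation labels
P3D = np.eye(m)[rng.integers(0, m, N)]  # one-hot 3D segmentation labels
Wa = rng.standard_normal((2 * m, m))
Pa_s, Pa_t = weighted_semantics(P2D, P3D, Wa, np.zeros(m))
print(Pa_s.shape, Pa_t.shape)  # (6, 4) (6, 4)
```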
作为一种优选示例,本步骤的处理过程如图2所示。图2中,输入点云为原始点云,2D语义信息为第二语义信息转换为One-Hot格式后的数据,3D语义信息为第一语义信息转换为One-Hot格式后的数据。处理过程如下:As a preferred example, the processing of this step is shown in FIG. 2. In FIG. 2, the input point cloud is the original point cloud, the 2D semantic information is the second semantic information converted into One-Hot format, and the 3D semantic information is the first semantic information converted into One-Hot format. The process is as follows:
将转化为One-Hot的2D及3D语义信息拼接到原始点云特征信息后,如所需预测类别数量为m,则每个点分别包含3D及2D分割出的语义类别信息为2m,最后再与点云的原始数据信息如XYZ拼接到一块,得到N×(2m+3)维特征向量,经体素化后输入到结合了局部特征与全局特征的自适应注意力机制网络中,获得加权后每个体素格所属类别特征,最后将其输入到目标检测网络中。The One-Hot encoded 2D and 3D semantic information is spliced onto the original point cloud feature information: if the number of categories to predict is m, each point then carries 2m semantic category values from the 3D and 2D segmentation. This is further spliced with the original point cloud data such as XYZ to obtain an N×(2m+3)-dimensional feature vector. After voxelization, the vector is input into the adaptive attention mechanism network that combines local and global features to obtain the weighted category feature of each voxel grid, which is finally input into the target detection network.
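The feature-assembly step described above (one-hot 2D semantics, one-hot 3D semantics, and XYZ spliced into an N×(2m+3) vector) is straightforward to express. A sketch with assumed integer class labels and hypothetical function names:

```python
import numpy as np

def build_point_features(xyz, labels_2d, labels_3d, m):
    # xyz: (N, 3) raw point coordinates; labels_*: (N,) integer class ids
    # from the 2D and 3D segmentation. Splices one-hot 2D semantics,
    # one-hot 3D semantics, and XYZ into an N x (2m + 3) feature vector.
    eye = np.eye(m)
    return np.concatenate([eye[labels_2d], eye[labels_3d], xyz], axis=1)

rng = np.random.default_rng(3)
N, m = 5, 10
feats = build_point_features(rng.standard_normal((N, 3)),
                             rng.integers(0, m, N),
                             rng.integers(0, m, N), m)
print(feats.shape)  # (5, 23), i.e. N x (2m + 3)
```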
S107、根据所述加权后的语义信息,检测障碍物目标;S107, detecting an obstacle target according to the weighted semantic information;
本步骤中,将所述加权后的语义信息,输入到目标检测器中进行障碍物目标的检测。In this step, the weighted semantic information is input into the target detector to detect the obstacle target.
使用本发明提供的障碍物感知方法,将激光雷达传感器以及相机传感器相融合,利用不同传感器的优势,同时也补充了各自传感器自身的不足,提高了点云目标检测的感知识别精度。同时,本方案中,利用三维点云语义分割信息及二维图片语义分割信息相结合的深度学习网络,利用不同传感器的语义信息,降低了障碍物的误检及错检率。With the obstacle perception method provided by the present invention, the lidar sensor and the camera sensor are fused, leveraging the advantages of the different sensors while compensating for their individual shortcomings, which improves the perception and recognition accuracy of point cloud target detection. Meanwhile, this solution uses a deep learning network that combines 3D point cloud semantic segmentation information with 2D image semantic segmentation information, exploiting the semantic information of the different sensors to reduce the false-positive and misclassification rates for obstacles.
实施例二Embodiment 2
基于同一个发明构思,本发明实施例还提供了一种障碍物感知装置,如图3所示,该装置包括:Based on the same inventive concept, an embodiment of the present invention also provides an obstacle sensing device, as shown in FIG. 3 , the device includes:
图片获取模块303,被配置用于获取相机图片,获取投影转换的标定内参和外参;The picture acquisition module 303 is configured to acquire a camera picture and to acquire the calibrated intrinsic and extrinsic parameters for projection transformation;
点云获取模块301,被配置用于获取原始点云;The point cloud acquisition module 301 is configured to acquire an original point cloud;
图片语义分割模块304,被配置用于对所述相机图片进行语义分割,得到第二图片;The picture semantic segmentation module 304 is configured to perform semantic segmentation on the camera picture to obtain a second picture;
点云语义分割模块302,被配置用于对所述原始点云进行语义分割,得到第二点云;The point cloud semantic segmentation module 302 is configured to perform semantic segmentation on the original point cloud to obtain a second point cloud;
图片语义投影模块305,被配置用于根据所述内参和所述外参,进行所述原始点云到所述第二图片的投影,得到第三点云,所述第三点云中的每个点包括所述第二图片对应的第二语义类别信息;The picture semantic projection module 305 is configured to project the original point cloud onto the second picture according to the intrinsic and extrinsic parameters to obtain a third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture;
语义融合模块306,被配置用于对所述第三点云中的第二语义类别信息,所述第二点云中的第一语义类别信息和所述原始点云的特征信息进行体素化后,输入自适应注意力机制网络中进行学习,得到加权后的语义信息;The semantic fusion module 306 is configured to voxelize the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and then input them into an adaptive attention mechanism network for learning to obtain weighted semantic information;
障碍物感知模块307,被配置用于根据加权后的语义信息,检测障碍物目标;The obstacle perception module 307 is configured to detect obstacle targets according to the weighted semantic information;
其中,所述第二点云和第二图片包括障碍物类别信息。Wherein, the second point cloud and the second picture include obstacle category information.
作为一种优选示例,所述图片获取模块303还被配置用于:As a preferred example, the picture acquisition module 303 is further configured to:
获取与所述原始点云同一时刻的相机图片。Obtain a camera image at the same moment as the original point cloud.
具体的,可先对点云和相机进行软件同步或者硬件同步,然后获得与所述原始点云同一时刻的相机图片。Specifically, software or hardware synchronization may first be performed between the point cloud and the camera, and then the camera picture at the same moment as the original point cloud is obtained.
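A minimal software-synchronization sketch: pair each point cloud with the camera frame of nearest timestamp and reject pairs whose gap is too large. The 50 ms threshold and the function name are assumptions for illustration, not part of the patent.

```python
import numpy as np

def nearest_image(cloud_ts, image_ts, max_dt=0.05):
    # Index of the camera frame closest in time to the point cloud;
    # None if the gap exceeds max_dt seconds (threshold is an assumption).
    i = int(np.argmin(np.abs(image_ts - cloud_ts)))
    return i if abs(float(image_ts[i]) - cloud_ts) <= max_dt else None

image_ts = np.array([0.00, 0.10, 0.20, 0.30])  # camera timestamps (seconds)
print(nearest_image(0.21, image_ts))  # 2
print(nearest_image(0.50, image_ts))  # None
```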
作为一种优选示例,图片语义分割模块304还被配置用于:As a preferred example, the picture semantic segmentation module 304 is further configured to:
将所述相机图片输入到图片语义分割网络中,得到第二图片。The camera picture is input into the picture semantic segmentation network to obtain the second picture.
作为一种优选示例,点云语义分割模块302还被配置用于:As a preferred example, the point cloud semantic segmentation module 302 is further configured to:
将所述原始点云输入到点云语义分割网络中,得到第二点云。The original point cloud is input into the point cloud semantic segmentation network to obtain a second point cloud.
作为一种优选示例,图片语义投影模块305还被配置用于根据以下方式进行所述原始点云到所述第二图片的投影:As a preferred example, the picture semantic projection module 305 is further configured to project the original point cloud onto the second picture in the following manner:
根据以下公式进行投影:Projection is performed according to the following formula:
P′=Proj(K,M,P),P′=Proj(K,M,P),
其中,Proj为投影矩阵处理过程;where Proj is the projection matrix operation;
K为相机的内参矩阵;K is the intrinsic parameter matrix of the camera;
M为相机到激光雷达的外参矩阵;M is the extrinsic parameter matrix between the camera and the lidar;
P为激光雷达点云集合;P is the set of lidar point cloud points;
P'为投影到相机坐标系后的激光雷达点云。P' is the lidar point cloud projected into the camera coordinate system.
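The projection P′ = Proj(K, M, P) can be sketched as a standard pinhole projection: transform lidar points into the camera frame with the 4×4 extrinsic matrix, apply the intrinsics K, and divide by depth. The sketch assumes M maps lidar coordinates into the camera frame (if the calibration is stored the other way round, its inverse would be used); the sample intrinsics, identity extrinsics, and function name are placeholders.

```python
import numpy as np

def proj(K, M, P):
    # P: (N, 3) lidar points; M: 4x4 extrinsic matrix assumed to map
    # lidar coordinates into the camera frame; K: 3x3 intrinsic matrix.
    # Returns (N, 2) pixel coordinates. Real code would also discard
    # points with non-positive depth (behind the camera).
    P_h = np.hstack([P, np.ones((P.shape[0], 1))])  # homogeneous (N, 4)
    P_cam = (M @ P_h.T).T[:, :3]                    # camera frame (N, 3)
    uv_h = (K @ P_cam.T).T                          # image plane (N, 3)
    return uv_h[:, :2] / uv_h[:, 2:3]               # perspective divide

K = np.array([[700.0, 0.0, 320.0],   # placeholder intrinsics
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
M = np.eye(4)                        # identity extrinsics for illustration
P = np.array([[0.0, 0.0, 10.0]])     # one point straight ahead
print(proj(K, M, P))                 # [[320. 240.]] (the principal point)
```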
作为一种优选示例,所述输入自适应注意力机制网络中进行学习包括:As a preferred example, the learning in the adaptive attention mechanism network includes:
对局部特征在自适应注意力机制网络中进行学习,得到学习后的局部特征Vi;Learn the local features in the adaptive attention mechanism network to obtain the learned local features Vi;
对全局特征在自适应注意力机制网络中进行学习,得到学习后的全局特征Vglobal;Learn the global features in the adaptive attention mechanism network to obtain the learned global feature Vglobal;
将所述学习后的全局特征Vglobal拼接到每个局部特征Vi上,得到增强特征Vgl。Splice the learned global feature Vglobal onto each local feature Vi to obtain the enhanced feature Vgl.
所述对全局特征在自适应注意力机制网络中进行学习,得到学习后的全局特征Vglobal包括:Learning the global features in the adaptive attention mechanism network to obtain the learned global feature Vglobal includes:
根据以下公式进行全局特征的学习:The global features are learned according to the following formula:
Vglobal = max_{i=1,2,…,N} { MLPg(Vi) }
MLPg(Vi)为全局特征多层感知机;MLPg(Vi) is the global-feature multilayer perceptron;
max为对所有体素进行最大池化操作;max is the max pooling operation over all voxels;
C2为整个特征图的通道数量;C2 is the number of channels of the entire feature map;
N为体素格的数量;N is the number of voxel grids;
Vi为学习得到的第i个立体体素格内的特征。Vi is the learned feature within the i-th voxel grid.
作为一种优选示例,语义融合模块306还被配置用于根据以下公式得到加权后的语义信息:As a preferred example, the semantic fusion module 306 is further configured to obtain the weighted semantic information according to the following formula:
其中,Pa,s和Pa,t为加权后的语义信息;where Pa,s and Pa,t are the weighted semantic information;
P2D为所述第二语义信息;P2D is the second semantic information;
P3D为所述第一语义信息;P3D is the first semantic information;
MLPatt为多层感知机;MLPatt is a multilayer perceptron;
σ为Sigmoid激活函数。σ is the Sigmoid activation function.
需要说明的是,实施例二提供的装置与实施例一提供的方法属于同一个发明构思,解决相同的技术问题,达到相同的技术效果,实施例二提供的装置能实现实施例一的所有方法,相同之处不再赘述。It should be noted that the device provided in Embodiment 2 and the method provided in Embodiment 1 belong to the same inventive concept, solve the same technical problems, and achieve the same technical effects; the device provided in Embodiment 2 can implement all the methods of Embodiment 1, and the common parts will not be repeated.
实施例三Embodiment 3
基于同一个发明构思,本发明实施例还提供了一种障碍物感知装置,如图4所示,该装置包括:Based on the same inventive concept, an embodiment of the present invention also provides an obstacle sensing device, as shown in FIG. 4 , the device includes:
包括存储器402、处理器401和用户接口403;comprising a memory 402, a processor 401, and a user interface 403;
所述存储器402,用于存储计算机程序;The memory 402 is configured to store a computer program;
所述用户接口403,用于与用户实现交互;The user interface 403 is configured to interact with a user;
所述处理器401,用于读取所述存储器402中的计算机程序,所述处理器401执行所述计算机程序时,实现:The processor 401 is configured to read the computer program in the memory 402; when executing the computer program, the processor 401 implements:
获取同一时刻的原始点云和相机图片;Get the original point cloud and camera image at the same moment;
获取投影转换的标定内参和外参;Obtain the calibration internal and external parameters of the projection transformation;
对所述原始点云进行语义分割,得到第二点云;Semantically segment the original point cloud to obtain a second point cloud;
对所述相机图片进行语义分割,得到第二图片;Semantically segment the camera picture to obtain a second picture;
根据所述内参和所述外参,进行所述原始点云到所述第二图片的投影,得到第三点云,所述第三点云中的每个点包括所述第二图片对应的第二语义类别信息;Project the original point cloud onto the second picture according to the intrinsic and extrinsic parameters to obtain a third point cloud, where each point in the third point cloud includes second semantic category information corresponding to the second picture;
对所述第三点云中的第二语义类别信息,所述第二点云中的第一语义类别信息和所述原始点云的特征信息进行体素化后,输入自适应注意力机制网络中进行学习,得到加权后的语义信息;Voxelize the second semantic category information in the third point cloud, the first semantic category information in the second point cloud, and the feature information of the original point cloud, and then input them into the adaptive attention mechanism network for learning to obtain weighted semantic information;
根据所述加权后的语义信息,检测障碍物目标;Detecting an obstacle target according to the weighted semantic information;
其中,所述第二点云和第三点云包括障碍物类别信息。Wherein, the second point cloud and the third point cloud include obstacle category information.
其中,在图4中,总线架构可以包括任意数量的互联的总线和桥,具体由处理器401代表的一个或多个处理器和存储器402代表的存储器的各种电路链接在一起。总线架构还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路链接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口提供接口。处理器401负责管理总线架构和通常的处理,存储器402可以存储处理器401在执行操作时所使用的数据。In FIG. 4, the bus architecture may include any number of interconnected buses and bridges, linking together the various circuits of one or more processors represented by the processor 401 and of the memory represented by the memory 402. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further herein. The bus interface provides an interface. The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 when performing operations.
处理器401可以是CPU、ASIC、FPGA或CPLD,处理器401也可以采用多核架构。The processor 401 may be a CPU, an ASIC, an FPGA, or a CPLD, and the processor 401 may also adopt a multi-core architecture.
处理器401执行存储器402存储的计算机程序时,实现实施例一中的任一障碍物感知方法。When the processor 401 executes the computer program stored in the memory 402, any of the obstacle perception methods in Embodiment 1 is implemented.
需要说明的是,实施例三提供的装置与实施例一提供的方法属于同一个发明构思,解决相同的技术问题,达到相同的技术效果,实施例三提供的装置能实现实施例一的所有方法,相同之处不再赘述。It should be noted that the device provided in Embodiment 3 and the method provided in Embodiment 1 belong to the same inventive concept, solve the same technical problems, and achieve the same technical effects; the device provided in Embodiment 3 can implement all the methods of Embodiment 1, and the common parts will not be repeated.
本申请还提出一种处理器可读存储介质。其中,该处理器可读存储介质存储有计算机程序,所述处理器执行所述计算机程序时实现实施例一中的任一障碍物感知方法。The present application also proposes a processor-readable storage medium. The processor-readable storage medium stores a computer program, and when the processor executes the computer program, any obstacle sensing method in Embodiment 1 is implemented.
需要说明的是,本申请实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。It should be noted that the division of units in the embodiments of the present application is schematic, and is only a logical function division, and other division methods may be used in actual implementation. In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111338928.7A CN114140765B (en) | 2021-11-12 | 2021-11-12 | Obstacle sensing method and device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111338928.7A CN114140765B (en) | 2021-11-12 | 2021-11-12 | Obstacle sensing method and device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114140765A CN114140765A (en) | 2022-03-04 |
CN114140765B true CN114140765B (en) | 2022-06-24 |
Family
ID=80393919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111338928.7A Active CN114140765B (en) | 2021-11-12 | 2021-11-12 | Obstacle sensing method and device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140765B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116416586B (en) * | 2022-12-19 | 2024-04-02 | 香港中文大学(深圳) | Map element sensing method, terminal and storage medium based on RGB point cloud |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10609148B1 (en) * | 2019-09-17 | 2020-03-31 | Ha Q Tran | Smart vehicle |
CN111583337A (en) * | 2020-04-25 | 2020-08-25 | 华南理工大学 | An omnidirectional obstacle detection method based on multi-sensor fusion |
CN111709343A (en) * | 2020-06-09 | 2020-09-25 | 广州文远知行科技有限公司 | Point cloud detection method and device, computer equipment and storage medium |
CN112101092A (en) * | 2020-07-31 | 2020-12-18 | 北京智行者科技有限公司 | Automatic driving environment perception method and system |
CN112560774A (en) * | 2020-12-25 | 2021-03-26 | 广州文远知行科技有限公司 | Obstacle position detection method, device, equipment and storage medium |
CN113095172A (en) * | 2021-03-29 | 2021-07-09 | 天津大学 | Point cloud three-dimensional object detection method based on deep learning |
CN113111887A (en) * | 2021-04-26 | 2021-07-13 | 河海大学常州校区 | Semantic segmentation method and system based on information fusion of camera and laser radar |
CN113128348A (en) * | 2021-03-25 | 2021-07-16 | 西安电子科技大学 | Laser radar target detection method and system fusing semantic information |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050015201A1 (en) * | 2003-07-16 | 2005-01-20 | Sarnoff Corporation | Method and apparatus for detecting obstacles |
CN112419494B (en) * | 2020-10-09 | 2022-02-22 | 腾讯科技(深圳)有限公司 | Obstacle detection and marking method and device for automatic driving and storage medium |
2021-11-12: Application CN202111338928.7A filed; patent granted as CN114140765B (status: Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10609148B1 (en) * | 2019-09-17 | 2020-03-31 | Ha Q Tran | Smart vehicle |
CN111583337A (en) * | 2020-04-25 | 2020-08-25 | 华南理工大学 | An omnidirectional obstacle detection method based on multi-sensor fusion |
CN111709343A (en) * | 2020-06-09 | 2020-09-25 | 广州文远知行科技有限公司 | Point cloud detection method and device, computer equipment and storage medium |
CN112101092A (en) * | 2020-07-31 | 2020-12-18 | 北京智行者科技有限公司 | Automatic driving environment perception method and system |
CN112560774A (en) * | 2020-12-25 | 2021-03-26 | 广州文远知行科技有限公司 | Obstacle position detection method, device, equipment and storage medium |
CN113128348A (en) * | 2021-03-25 | 2021-07-16 | 西安电子科技大学 | Laser radar target detection method and system fusing semantic information |
CN113095172A (en) * | 2021-03-29 | 2021-07-09 | 天津大学 | Point cloud three-dimensional object detection method based on deep learning |
CN113111887A (en) * | 2021-04-26 | 2021-07-13 | 河海大学常州校区 | Semantic segmentation method and system based on information fusion of camera and laser radar |
Non-Patent Citations (4)
Title |
---|
An obstacle-detecting algorithm based on image and 3D point cloud segmentation;Mo J W 等;《2014 International Conference on Artificial Intelligence and Industrial Application》;20141231;第439-448页 * |
Region-proposal Convolutional Network-driven Point Cloud Voxelization and Over-segmentation for 3D Object Detection;Bai L 等;《2019 Chinese Control And Decision Conference (CCDC)》;20190912;第3553-3558页 * |
互注意力融合图像和点云数据的3D目标检测 (3D object detection with mutual-attention fusion of image and point cloud data); 陈俊英 等 (Chen Junying et al.); 《光学精密工程》 (Optics and Precision Engineering); 2021-09-30; pp. 2247-2254 *
结合图像分割和点云分割的障碍物检测算法 (Obstacle detection algorithm combining image segmentation and point cloud segmentation); 莫建文 等 (Mo Jianwen et al.); 《计算机工程与设计》 (Computer Engineering and Design); 2015-07-31; pp. 1855-1858 *
Also Published As
Publication number | Publication date |
---|---|
CN114140765A (en) | 2022-03-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Wu Xinkai; Xu Shaoqing; Wang Pengcheng
Inventor before: Wu Xinkai; Xu Shaoqing; Wang Pengcheng
GR01 | Patent grant | ||