WO2023193400A1 - Point cloud detection and segmentation method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2023193400A1
WO2023193400A1 (PCT/CN2022/117322; CN2022117322W)
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
columnar
voxel
segmentation
features
Prior art date
Application number
PCT/CN2022/117322
Other languages
French (fr)
Chinese (zh)
Inventor
赵天坤
唐佳
Original Assignee
合众新能源汽车股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 合众新能源汽车股份有限公司
Publication of WO2023193400A1

Classifications

    • G06T 7/0002 Image analysis: inspection of images, e.g. flaw detection
    • G06F 18/241 Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Pattern recognition: classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Neural networks: combinations of networks
    • G06N 3/047 Neural networks: probabilistic or stochastic networks
    • G06N 3/048 Neural networks: activation functions
    • G06N 3/08 Neural networks: learning methods
    • G06T 7/73 Image analysis: determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10028 Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20084 Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/20112 Special algorithmic details: image segmentation details

Definitions

  • the present application relates to the field of computer technology, and in particular to point cloud detection and segmentation methods and apparatuses, as well as electronic devices and computer-readable storage media.
  • Point cloud data refers to a set of vectors in a three-dimensional coordinate system: spatial information is recorded in the form of points, and each point contains three-dimensional coordinates. Depending on the capabilities of the collection device, point cloud data may also contain color information (RGB) or reflection intensity information (intensity). Taking point clouds collected by lidar as an example, the data includes the position coordinates and reflection intensity of points in three-dimensional space. Point cloud data is widely used for target detection and recognition in the field of autonomous driving, for example in cars and drones. In these applications, point cloud detection and segmentation technology is usually used to detect target objects and segment the point cloud.
  • point cloud detection technology refers to processing point cloud data to detect the positions of target objects in the scene that the point cloud data represents.
  • point cloud segmentation technology refers to identifying the category of the target object that each point in the point cloud data belongs to, so as to facilitate subsequent automatic driving control.
  • the embodiment of the present application provides a point cloud detection and segmentation method, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • embodiments of the present application provide a point cloud detection and segmentation method, including: performing columnar voxelization on the point cloud to be processed to obtain a number of columnar voxels that constitute it; performing feature extraction and mapping on the columnar voxels to obtain voxel features; mapping the voxel features to a bird's-eye view to obtain bird's-eye view features; and extracting features from the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
  • through the point cloud detection network branch of the multi-task neural network, the target object is detected based on the point cloud feature vector and the point cloud detection result is output; and, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector and the point cloud segmentation result is output.
  • embodiments of the present application provide a point cloud detection and segmentation device, including:
  • a columnar voxelization module used to perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed;
  • a voxel feature acquisition module used to perform feature extraction and mapping on the plurality of columnar voxels, and obtain the voxel features of the point cloud to be processed;
  • a bird's-eye view feature mapping module used to map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed;
  • a point cloud feature extraction module is used to extract features of the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
  • a point cloud detection and segmentation module, configured to perform target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and output point cloud detection results; and to perform point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and output the point cloud segmentation result.
  • embodiments of the present application also disclose an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the point cloud detection and segmentation method described in the embodiments of this application is implemented.
  • embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the point cloud detection and segmentation method disclosed in the embodiments of the present application are implemented.
  • the point cloud detection and segmentation method disclosed in the embodiments of the present application performs columnar voxelization on the point cloud to be processed to obtain a number of columnar voxels that constitute it; it then performs feature extraction and mapping on these columnar voxels to obtain the voxel features of the point cloud, and maps the voxel features to a bird's-eye view to obtain the corresponding bird's-eye view features; finally, the backbone network of a pre-trained multi-task neural network extracts features from the bird's-eye view features to obtain point cloud feature vectors. Through the point cloud detection network branch of the multi-task neural network, target objects are detected based on the point cloud feature vectors and point cloud detection results are output; and, through the point cloud segmentation network branch, point cloud segmentation is performed based on the point cloud feature vectors and the point cloud segmentation result is output. This helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • Figure 1 is a schematic flow chart of the point cloud detection and segmentation method in Embodiment 1 of the present application.
  • Figure 2 is a schematic diagram of the effect of point cloud voxelization processing in Embodiment 1 of the present application.
  • Figure 3 is a schematic structural diagram of the multi-task neural network used in Embodiment 1 of the present application.
  • Figure 4 is a schematic diagram of point cloud segmentation result mapping in Embodiment 1 of the present application.
  • Figure 5 is one of the structural schematic diagrams of the point cloud detection and segmentation device in Embodiment 2 of the present application.
  • Figure 6 is the second structural schematic diagram of the point cloud detection and segmentation device in Embodiment 2 of the present application.
  • Figure 7 schematically shows a block diagram of an electronic device for performing a method according to the present application.
  • Figure 8 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present application.
  • An embodiment of the present application discloses a point cloud detection and segmentation method, as shown in Figure 1.
  • the method includes: steps 110 to 150.
  • Step 110 Perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed.
  • the point cloud to be processed described in the embodiment of this application is: the point cloud in the area of interest in the point cloud collected by a point cloud collection device (such as a lidar sensor).
  • the original point cloud collected by the lidar sensor installed on the vehicle is a data set of unordered points, where each point can be represented by four-dimensional data, for example (x, y, z, i), where x, y, z are the spatial position coordinates of the point and i represents its reflection intensity.
  • for the original point cloud collected by the point cloud acquisition equipment, point cloud preprocessing first needs to be performed to obtain a point set that meets the requirements. For example, nan (null) values can be removed from the original point cloud, or points with very large values can be removed to filter point cloud noise.
  • for details of point cloud preprocessing, please refer to the prior art. The embodiments of this application do not limit the technical solution adopted for point cloud preprocessing, and it is not described again here.
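The preprocessing step described above can be sketched in plain Python. This is only an illustration; the outlier rule and the `max_abs` threshold are assumptions for the sketch, not values taken from the patent:

```python
import math

def preprocess(points, max_abs=200.0):
    """Filter an unordered list of (x, y, z, i) points: drop points with
    nan (null) values and points with implausibly large coordinates,
    as a simple form of point cloud noise filtering."""
    cleaned = []
    for x, y, z, i in points:
        if any(math.isnan(v) for v in (x, y, z, i)):
            continue  # remove nan (null) values
        if max(abs(x), abs(y), abs(z)) > max_abs:
            continue  # remove far-outlier noise points
        cleaned.append((x, y, z, i))
    return cleaned
```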
  • the point cloud collected by point cloud collection equipment is a point cloud in a three-dimensional irregular spatial area.
  • the data of the points in the area of interest within the previously determined large cube area is obtained, to facilitate subsequent point cloud detection and point cloud segmentation on the point cloud of the area of interest.
  • the coordinates of points within the area of interest can be expressed as (x, y, z), where xmin ≤ x ≤ xmax, ymin ≤ y ≤ ymax, zmin ≤ z ≤ zmax, in meters.
  • points in the region of interest are determined based on point cloud quality. For example, if the point cloud far away from the vehicle is sparse and few points hit a vehicle at that distance, the minimum number of points can be set to a small value (for example, 5); the corresponding points are then found based on this number, and a spatial area is determined from the farthest such point. In some embodiments of the present application, for the same point cloud quality (such as point clouds collected by the same point cloud collection device), this distance can be predetermined from the quality of the collected point cloud data and does not change during application.
  • for the method of determining the region of interest, please refer to the methods used in prior-art point cloud detection or point cloud segmentation solutions.
  • the specific implementation of determining the region of interest is not limited here.
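Cropping the collected cloud to the region of interest, as described above, amounts to a coordinate-range filter. A minimal sketch (the limit values passed in the test are illustrative assumptions):

```python
def crop_roi(points, xlim, ylim, zlim):
    """Keep only points whose coordinates fall inside the region of interest:
    xmin <= x <= xmax, ymin <= y <= ymax, zmin <= z <= zmax (in meters)."""
    (xmin, xmax), (ymin, ymax), (zmin, zmax) = xlim, ylim, zlim
    return [p for p in points
            if xmin <= p[0] <= xmax
            and ymin <= p[1] <= ymax
            and zmin <= p[2] <= zmax]
```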
  • the point cloud to be processed is subjected to columnar voxelization to obtain a number of columnar voxels that constitute it; specifically, the points in the point cloud to be processed are divided into several columnar voxels according to their coordinate distribution along the first coordinate axis and the second coordinate axis.
  • the first coordinate axis and the second coordinate axis are two different coordinate axes of a three-dimensional spatial coordinate system
  • the columnar voxels are prismatic voxels. For example, after the point cloud shown on the left of Figure 2 is voxelized, the cuboid voxels (i.e., columnar voxels) 210 shown on the right of Figure 2 are obtained.
  • each voxel can be expressed as [xv, yv, zmax - zmin], where xv is the length of the voxel along the x-axis, yv is the length of the voxel along the y-axis, and zmax - zmin is the height of the voxel along the z-axis, in meters.
  • W × H columnar voxels can be divided, where W = (xmax - xmin) / xv and H = (ymax - ymin) / yv.
  • the area of interest is divided into 512 × 250 columnar voxels. These columnar voxels are subsequently treated as image pixels and used for feature extraction of the region of interest.
  • the point cloud of the area of interest can thus be represented as a voxel image of dimension W × H × 1.
  • the size of the columnar voxels is determined experimentally. For example, several voxel sizes can be preset, point cloud detection and point cloud segmentation experiments conducted for each, the impact of voxel size on detection and segmentation results and performance analyzed, and the optimal voxel size finally determined.
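The division of points into columnar voxels by x/y coordinate distribution can be sketched as follows; the grid is indexed only along x and y, since each pillar spans the full height zmax - zmin:

```python
def voxelize_pillars(points, xmin, ymin, xv, yv, W, H):
    """Assign each (x, y, z, i) point to a columnar voxel (pillar) indexed
    by its x/y grid cell. Returns a dict mapping (col, row) -> point list;
    z is not subdivided, so each voxel spans the full height range."""
    pillars = {}
    for p in points:
        col = int((p[0] - xmin) / xv)   # index along the x-axis, in [0, W)
        row = int((p[1] - ymin) / yv)   # index along the y-axis, in [0, H)
        if 0 <= col < W and 0 <= row < H:
            pillars.setdefault((col, row), []).append(p)
    return pillars
```

With the 0.2 m pillar size used in the test below (an assumed value), two nearby points land in adjacent pillars.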
  • the method further includes: obtaining the first point cloud segmentation labels of the plurality of columnar voxels, wherein the first point cloud segmentation label includes the position information of each columnar voxel. The W × H columnar voxels obtained by division form a voxel image with dimension W × H × 1.
  • the first point cloud segmentation label of this voxel image, i.e., of the above-mentioned W × H columnar voxels, can be represented by a position information table of size W × H, for example expressed as (W, H, 1).
  • the first point cloud segmentation label is used to subsequently determine the segmentation result of the point cloud based on the segmentation result of the columnar voxels.
  • obtaining the first point cloud segmentation labels of the several columnar voxels includes: for each columnar voxel, using the position information of the columnar voxel as the first point cloud segmentation label matching that voxel. This label can be represented by a position information table of size W × H, for example expressed as (W, H, 1).
  • the position information table includes W × H sets of position information, and each set of position information corresponds to a columnar voxel.
  • each set of position information is used to represent the coordinate range of the corresponding columnar voxel on the x-axis and y-axis.
  • each set of position information can also be used to represent the coordinate range of points in the point cloud divided into columnar voxels corresponding to the set of position information.
  • the mapping relationship between the points in the point cloud and the columnar voxel can be established by recording the coordinate range of the corresponding columnar voxel in the position information table.
  • other methods may be used to establish the mapping relationship between points in the point cloud and columnar voxels.
  • the specific expression form of the mapping relationship is not limited.
  • Step 120 Perform feature extraction and mapping on the plurality of columnar voxels to obtain voxel features of the point cloud to be processed.
  • after obtaining a number of columnar voxels that constitute the point cloud to be processed (such as the point cloud of the aforementioned area of interest), the columnar voxels can be regarded as pixels of an image, and feature extraction and mapping can be performed on the voxel image composed of these columnar voxels to obtain its features. Since the features of the voxel image are extracted from the distribution of points within the columnar voxels, they can fully express the features of the point cloud to be processed.
  • performing feature extraction and mapping on the plurality of columnar voxels to obtain the voxel features of the point cloud to be processed includes: for each columnar voxel, obtaining the center point of all points divided into the columnar voxel, and calculating the coordinate distance between each such point and the center point; for each columnar voxel, splicing the point features of all points in the columnar voxel into the voxel feature of that columnar voxel, where the point features of each point include the position coordinates and reflection intensity of the point together with its coordinate distance from the center point; splicing the voxel features of the several columnar voxels to obtain the splicing features of the several columnar voxels; and performing feature mapping on the splicing features to obtain the voxel features of the point cloud to be processed.
  • each columnar voxel will contain a certain number of points. Taking a columnar voxel containing K points as an example, the average coordinate (center point) of these K points is first calculated from their position coordinates in the original point cloud data. The features of the columnar voxel can then be expressed as features of size K × 7, that is, the point features of all included points (the four original dimensions plus the three coordinate offsets from the center point).
  • the voxel characteristics can be obtained by voxelizing the point cloud to be processed.
  • the features of the N columnar voxels obtained after voxelization (such as the aforementioned features of size K × 7) are spliced to obtain a splicing feature of size N × K × 7.
  • a columnar voxel that contains no points can be discarded.
  • the spliced features can be feature-mapped through a pre-trained feature extraction network to obtain features of size N × D, where D is the feature dimension of each columnar voxel. The feature extraction network can be constructed by serially connecting a fully connected layer, a normalization layer, and a one-dimensional maximum pooling layer (MaxPool1D). N × D-dimensional features are output, where D is the dimension of the fully connected layer's output.
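The per-pillar feature construction and pooling above can be sketched in plain Python. This sketch shows only the 7-dimensional point features and the MaxPool1D step; the learned fully connected layer and normalization layer are omitted, which is a simplifying assumption for illustration:

```python
def pillar_point_features(points):
    """Build the K x 7 point features for one columnar voxel (pillar):
    (x, y, z, i) plus the offsets (dx, dy, dz) of each point from the
    pillar's center point (the mean coordinate of its K points)."""
    k = len(points)
    cx = sum(p[0] for p in points) / k
    cy = sum(p[1] for p in points) / k
    cz = sum(p[2] for p in points) / k
    return [[x, y, z, i, x - cx, y - cy, z - cz] for (x, y, z, i) in points]

def max_pool_1d(feats):
    """MaxPool1D over the K points of a pillar, producing one
    fixed-length feature vector per pillar."""
    return [max(row[d] for row in feats) for d in range(len(feats[0]))]
```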
  • Step 130 Map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed.
  • mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed includes: according to the position information of each columnar voxel in the first point cloud segmentation label, obtaining the number of points included in each columnar voxel; and, for each columnar voxel, according to the number of points it includes, mapping the feature corresponding to that columnar voxel in the voxel features to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label, thereby obtaining the bird's-eye view features corresponding to the point cloud to be processed. Specifically: when the number of points included in the columnar voxel is greater than 0, the feature vector corresponding to the columnar voxel in the voxel features is mapped to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label; when the number of points included in the columnar voxel is equal to 0, the feature vector at the corresponding position of the bird's-eye view is set to 0.
  • each columnar voxel corresponds to one piece of label data (i.e., a set of position information) in the first point cloud segmentation label. For example, the first piece of label data corresponds to the columnar voxel whose coordinates range from (0, 0) to (0.2, 0.2).
  • the features of each pixel are represented by a D-dimensional feature vector, and each pixel corresponds to a columnar voxel. Thus, the respective feature vectors of the N columnar voxels included in the voxel features can be mapped to the corresponding positions on the bird's-eye view to obtain a bird's-eye view feature of size W × H × D.
  • some columnar voxels may contain no points. At the positions of the bird's-eye view corresponding to columnar voxels that include no points, the feature vector can be set to a zero vector.
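The scattering of per-pillar feature vectors onto the W × H bird's-eye-view grid, with zero vectors at positions whose pillar contains no points, can be sketched as:

```python
def scatter_to_bev(pillar_vecs, W, H, D):
    """Scatter per-pillar D-dimensional feature vectors into a W x H x D
    bird's-eye-view grid. Grid positions whose columnar voxel contains no
    points keep a zero vector, as described in the text."""
    bev = [[[0.0] * D for _ in range(H)] for _ in range(W)]
    for (col, row), vec in pillar_vecs.items():
        bev[col][row] = list(vec)
    return bev
```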
  • Step 140 Feature extraction is performed on the bird's-eye view features through the backbone network of the pre-trained multi-task neural network to obtain a point cloud feature vector.
  • the multi-task neural network includes: a backbone network 310, a point cloud detection network branch 320, and a point cloud segmentation network branch 330.
  • the backbone network 310 may adopt a convolutional neural network commonly used in the prior art.
  • the backbone network 310 further includes three cascaded feature extraction modules of different scales and a feature concatenation layer (ConCat), where each feature extraction module includes a number of feature mapping modules (CBR), an upsampling layer, and a further feature mapping module (CBR). The number of feature mapping modules (CBR) included in the three feature extraction modules can be 4, 6, and 6 respectively.
  • each feature mapping module (CBR) can be composed of a cascade of a convolution layer, a batch normalization layer, and a ReLU activation function.
  • the features output by these three feature extraction modules have different respective sizes.
  • the feature splicing layer is used to splice the features output by the above three feature extraction modules.
  • the above three feature extraction modules respectively perform convolution, upsampling, normalization, and activation on the input bird's-eye view features; C denotes the number of feature channels of the resulting feature vectors.
  • Step 150 through the point cloud detection network branch of the multi-task neural network, perform target object detection based on the point cloud feature vector, and output the point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network , perform point cloud segmentation based on the point cloud feature vector, and output the point cloud segmentation result.
  • the point cloud feature vectors output by the backbone network 310 are input to the point cloud detection network branch 320 and the point cloud segmentation network branch 330 respectively, and these two network branches each perform the next step of processing.
  • the following describes the execution of the point cloud detection task and the point cloud segmentation task in conjunction with the network structures of the point cloud detection network branch 320 and the point cloud segmentation network branch 330, respectively.
  • the point cloud detection network branch 320 includes four detection heads, which are respectively used to output the heat-map result (whether each position is a key point), the detected target position, the size of the target, and the rotation angle of the target.
  • each detection head included in the point cloud detection network branch 320 is composed of a feature extraction module and a convolutional layer, where the feature extraction module is in turn composed of a convolutional layer, a batch normalization layer, and an activation function.
  • Each detection head performs feature encoding and transformation mapping on the input point cloud feature vector, and finally outputs the corresponding prediction result.
  • for example, the detection head corresponding to the heat map predicts, for each position in the point cloud feature vector, whether the corresponding position is a key point on the heat map; the detection head corresponding to the target position predicts from the point cloud feature vector and outputs the position (x, y, z) of the detected target; the detection head for the target size outputs the size of the target (dx, dy, dz); and the detection head for the rotation angle outputs the rotation angle θ of the target object.
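A hypothetical sketch of how the four head outputs listed above might be combined into detections: grid cells whose heat-map score exceeds a threshold are treated as key points, and the position, size, and angle heads are read out at those cells. The threshold and the per-cell data layout are assumptions for illustration, not details from the patent:

```python
def decode_detections(heatmap, pos, size, angle, thresh=0.5):
    """At every grid cell whose heat-map score exceeds thresh (a key point),
    read out the predicted center (x, y, z), size (dx, dy, dz), and
    rotation angle theta from the other three detection heads."""
    dets = []
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > thresh:
                dets.append({"score": score,
                             "center": pos[r][c],
                             "size": size[r][c],
                             "angle": angle[r][c]})
    return dets
```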
  • the point cloud segmentation network branch 330 is composed of an upsampling module, a feature extraction module, and a convolutional layer, where the feature extraction module is in turn composed of a convolutional layer, a batch normalization layer, and an activation function.
  • the upsampling layer first upsamples the point cloud feature vectors output by the backbone network 310; the convolution layer, batch normalization layer, and activation function then sequentially perform feature conversion and mapping on the upsampled vectors; finally, the segmentation results of the corresponding columnar voxels are output through the convolutional layer.
  • the point cloud segmentation network branch 330 performs upsampling, convolution, batch normalization, activation mapping, and other processing on the input point cloud feature vector, and finally outputs data of size (W, H, n_class).
  • W and H are the dimensions of the output data corresponding to the width and height of the input feature map, and n_class is the number of point cloud semantic categories.
  • the output data size of the point cloud segmentation network branch 330 is 512 × 512 × 11, which means that at each of these 512 × 512 positions there is a set of 11 segmentation prediction values; each of these 11 values lies between 0 and 1, and they sum to 1, representing the probability that the corresponding columnar voxel belongs to each point cloud semantic category. The point cloud semantic category with the maximum probability value can then be taken as the semantic category matching the corresponding columnar voxel.
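Picking the maximum-probability category at each grid position can be sketched as follows, assuming each cell of the W × H output holds a list of per-class probabilities:

```python
def voxel_categories(probs):
    """For each of the W x H grid positions, probs[w][h] is a list of
    per-class probabilities summing to 1; return the index of the class
    with the maximum probability as that columnar voxel's category."""
    return [[max(range(len(cell)), key=cell.__getitem__) for cell in row]
            for row in probs]
```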
  • the point cloud semantic categories are determined according to specific application scenarios.
  • point cloud semantic categories can be defined to include but are not limited to any one or more of the following: buildings, green plants, ground, fences, curbs, lane lines, vehicles, etc.
  • point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation network branch outputs the point cloud segmentation results of the plurality of columnar voxels matched by the point cloud feature vector (that is, all columnar voxels obtained after voxelizing the point cloud to be processed).
  • the segmentation result output by the point cloud segmentation network branch is the segmentation result obtained by semantic segmentation based on the features projected onto the bird's-eye view.
  • the point cloud segmentation result includes the point cloud semantic category matched by each columnar voxel. After point cloud segmentation is performed through the point cloud segmentation network branch of the multi-task neural network based on the point cloud feature vector and the point cloud segmentation result is output, the method also includes: mapping the point cloud semantic category matched by each columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, so as to obtain the segmentation result of the point cloud to be processed.
  • mapping the point cloud semantic category matched by the columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, to obtain the segmentation result of the points in the point cloud to be processed, includes: obtaining, according to the position information of each columnar voxel, the points of the point cloud to be processed contained in that columnar voxel; and, for each columnar voxel, using the point cloud semantic category matched by the columnar voxel as the point cloud semantic category of the points it contains.
  • each columnar voxel corresponds to a position in the bird's-eye view.
  • the segmentation result of the columnar voxel is obtained, which can be considered as the point cloud semantic segmentation result of the columnar area in the point cloud.
  • each box in the bird's-eye view corresponds to a columnar voxel.
  • the segmentation result corresponding to the image position matched by each box in the bird's-eye view can be regarded as the segmentation result of the columnar voxel corresponding to the box.
  • each columnar voxel corresponds to a spatial area in the point cloud to be processed.
  • This spatial area may contain 0 or more points.
  • the segmentation result of each columnar voxel (i.e., the matched point cloud semantic category) is used as the point cloud semantic category of each point included in the columnar voxel.
  • the semantic segmentation of the points in the point cloud is thus completed. For example, for a columnar voxel with coordinates ranging from (0, 0) to (0.2, 0.2), if the segmentation result of the columnar voxel is "kerb", it can be determined that, in the point cloud to be processed, the point cloud semantic category matched by each point whose coordinates fall within (0, 0) to (0.2, 0.2) is "kerb".
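As a minimal sketch of this voxel-to-point mapping (the function and parameter names are illustrative, not taken from the patent), the semantic category of each columnar voxel can be copied to every point whose x/y coordinates fall inside that voxel:

```python
import numpy as np

def map_voxel_labels_to_points(points_xy, voxel_labels, grid_min=0.0, voxel_size=0.2):
    """Assign each point the semantic category of the columnar voxel it falls in.

    points_xy:    (N, 2) array of point x/y coordinates
    voxel_labels: (W, H) array of per-voxel semantic category ids
    grid_min / voxel_size are assumed grid parameters, not from the patent.
    """
    idx = np.floor((points_xy - grid_min) / voxel_size).astype(int)
    ix = np.clip(idx[:, 0], 0, voxel_labels.shape[0] - 1)
    iy = np.clip(idx[:, 1], 0, voxel_labels.shape[1] - 1)
    return voxel_labels[ix, iy]

# Example: the voxel covering (0, 0)-(0.2, 0.2) was segmented as "kerb" (id 5)
labels = np.zeros((10, 10), dtype=int)
labels[0, 0] = 5
pts = np.array([[0.1, 0.05], [0.15, 0.19]])
print(map_voxel_labels_to_points(pts, labels))  # both points get category 5
```

Both sample points lie inside the (0, 0)-(0.2, 0.2) voxel, so both inherit its category, exactly as in the "kerb" example above.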
  • the pre-trained multi-task neural network includes: a backbone network 310, a point cloud detection network branch 320, and a point cloud segmentation network branch 330.
  • before feature extraction is performed on the bird's-eye view features through the backbone network of the neural network to obtain the point cloud feature vector, the method further includes: training a multi-task neural network based on several voxelized point cloud training samples; wherein the voxelized point cloud training samples are constructed based on the columnar voxels obtained by performing columnar voxelization on several point clouds respectively. For each of the voxelized point cloud training samples, the sample data includes: several columnar voxels; the sample label includes: a second point cloud segmentation label matching the corresponding sample data. The second point cloud segmentation label is used to identify the true value of the point cloud semantic category matched by each columnar voxel in the corresponding sample data; the true value of the point cloud semantic category matched by a columnar voxel is: among the point cloud semantic categories covered by the points divided into the corresponding columnar voxel, the point cloud semantic category with the largest coverage.
  • for the specific implementation of generating sample data, refer to the corresponding implementation in the previous steps, such as obtaining the point cloud to be processed and voxelizing it to obtain a number of columnar voxels.
  • the specific implementation will not be described again here.
  • each columnar voxel will contain a certain number of points, and these points are manually labeled with point cloud semantic categories.
  • the point cloud semantic category matched by the largest number of points is taken as the point cloud semantic category of the columnar voxel.
  • for example, a certain columnar voxel includes 3 points, each marked with a point cloud semantic category (such as small car, large car, bicycle, tricycle, pedestrian, cone, green plant, ground, fence, curb, lane line, etc.); assuming the categories are (building, building, green plant), then "building", which covers the largest number of points, is taken as the point cloud semantic category matched by this columnar voxel.
  • the point cloud semantic categories matched by all columnar voxels obtained after voxelization of a certain point cloud are arranged according to voxel positions, thereby obtaining the point cloud semantic category labels matching the sample data generated from that point cloud (i.e., the second point cloud segmentation label).
  • the sample label of the sample data can be expressed as a W × H label matrix. Each element in the label matrix is the identifier of the point cloud semantic category matched by the corresponding columnar voxel.
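A hedged sketch of how the W × H second point cloud segmentation label could be built by majority vote over the annotated points in each voxel (the helper name and the ignore value for empty voxels are assumptions, not from the patent):

```python
import numpy as np
from collections import Counter

def voxel_label_matrix(point_voxel_idx, point_labels, W, H, ignore=-1):
    """Build the W x H second point cloud segmentation label matrix.

    point_voxel_idx: (N, 2) voxel (ix, iy) index of each annotated point
    point_labels:    (N,)   manually annotated per-point semantic category ids
    Each voxel takes the category matched by the largest number of its points;
    voxels containing no points keep the `ignore` value.
    """
    label = np.full((W, H), ignore, dtype=int)
    buckets = {}
    for (ix, iy), c in zip(map(tuple, point_voxel_idx), point_labels):
        buckets.setdefault((ix, iy), []).append(c)
    for (ix, iy), cats in buckets.items():
        label[ix, iy] = Counter(cats).most_common(1)[0][0]  # majority vote
    return label

# 3 points in voxel (0, 0) labeled (building, building, green plant) -> building
m = voxel_label_matrix(np.array([[0, 0], [0, 0], [0, 0]]),
                       np.array([0, 0, 1]), W=4, H=4)
print(m[0, 0])  # 0, i.e. "building"
```

The printed element reproduces the (building, building, green plant) example: "building" covers the most points, so it becomes the voxel's label.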
  • the sample label further includes: a point cloud detection label, which is used to identify the true value of the target detection result in the corresponding sample data. For example, for each point cloud used to generate training samples, the key points of the target object on the heat map, and the spatial position coordinates, stereoscopic size, and rotation angle of the target object in the point cloud are manually annotated, and the annotated information is used as the point cloud detection label of the training sample generated from that point cloud.
  • training a multi-task neural network based on several voxelized point cloud training samples includes: performing the following point cloud detection and segmentation operations for each of the voxelized point cloud training samples to obtain the point cloud detection result prediction value and the point cloud segmentation result prediction value of the corresponding voxelized point cloud training sample: performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample; mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the voxelized point cloud training sample; and performing feature extraction on the bird's-eye view features through the backbone network to obtain point cloud feature vectors.
  • through the point cloud detection network branch, the target object is detected based on the point cloud feature vector, and the point cloud detection result prediction value of the voxelized point cloud training sample is output; for details, refer to the previous description of obtaining the detection result of the point cloud to be processed, which will not be repeated here.
  • through the point cloud segmentation network branch, point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation result prediction value of the voxelized point cloud training sample is output; for details, refer to the previous description of obtaining the segmentation result of the point cloud to be processed, which will not be repeated here.
  • the point cloud detection loss of the multi-task neural network is calculated based on the point cloud detection result prediction value and the corresponding point cloud detection label of each voxelized point cloud training sample.
  • the point cloud detection loss includes four parts, namely: heat map prediction loss, position prediction loss, size prediction loss and rotation angle prediction loss.
  • the position prediction loss, size prediction loss, and rotation angle prediction loss can be expressed by mean square error.
  • the position prediction loss of the multi-task neural network is represented by the mean square error between the predicted values of the target object position (such as spatial position coordinates) over all the voxelized point cloud training samples and the true values of the target object position in the sample labels.
  • the size prediction loss of the multi-task neural network is represented by the mean square error between the predicted values of the target size (such as three-dimensional size) over all the voxelized point cloud training samples and the true values of the target size in the sample labels; the rotation angle prediction loss of the multi-task neural network is represented by the mean square error between the predicted values of the target rotation angle over all the voxelized point cloud training samples and the true values of the target rotation angle in the sample labels.
  • the heat map prediction loss is calculated using a pixel-by-pixel focal loss function.
  • assuming the position of a target object is p, its key point (p_x, p_y) on the heat map is obtained, and the computed values are distributed onto the heat map through a Gaussian kernel; if the Gaussian kernels of multiple targets overlap, the maximum value is taken.
  • the formula of the Gaussian kernel can be expressed as: Y_xyc = exp(−((x − p_x)² + (y − p_y)²) / (2σ_p²))
  • where x and y are the enumerated grid positions in the image to be detected, σ_p is the target scale-adaptive variance, and Y_xyc is the Gaussian heat map value of each key point after Gaussian kernel mapping.
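The key-point splatting described above can be sketched as follows (a CenterNet-style illustration; `draw_gaussian` and its arguments are hypothetical names, and the variance is passed in directly rather than derived adaptively from the target scale):

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat one key point (p_x, p_y) onto the heat map via a Gaussian kernel,
    keeping the element-wise maximum where kernels of multiple targets overlap."""
    H, W = heatmap.shape
    ys, xs = np.mgrid[0:H, 0:W]
    px, py = center
    # Y_xyc = exp(-((x - p_x)^2 + (y - p_y)^2) / (2 * sigma^2))
    g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)  # overlapping kernels: take the max
    return heatmap

hm = np.zeros((8, 8), dtype=np.float32)
draw_gaussian(hm, (2, 2), sigma=1.0)
draw_gaussian(hm, (3, 2), sigma=1.0)  # second target overlapping the first
print(hm[2, 2])  # 1.0 at the first key point despite the overlap
```

Because the overlapping kernels are merged with a maximum rather than a sum, each key point keeps its peak value of 1 on the heat map.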
  • the point cloud segmentation loss of the multi-task neural network is calculated based on the point cloud segmentation result prediction value of each voxelized point cloud training sample and the corresponding second point cloud segmentation label.
  • the point cloud segmentation loss can be expressed by the cross entropy of the point cloud segmentation result prediction value and the corresponding second point cloud segmentation label.
  • the point cloud detection loss and the point cloud segmentation loss are integrated to calculate the loss of the multi-task neural network, and, with the goal of minimizing the loss of the entire network, the network parameters of the backbone network, the point cloud detection network branch, and the point cloud segmentation network branch are optimized to complete the training of the multi-task neural network.
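An illustrative numeric sketch of the loss terms (equal weighting of the detection and segmentation losses is an assumption — the patent only states that the two losses are integrated; the helper names are not from the patent):

```python
import numpy as np

def seg_cross_entropy(logits, labels):
    """Point cloud segmentation loss: cross entropy between the predicted
    per-voxel class scores (N, C) and the second segmentation label (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)           # numeric stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return float(-logp[np.arange(len(labels)), labels].mean())

def mse(pred, gt):
    """Mean square error, used for the position / size / rotation-angle losses."""
    return float(np.mean((np.asarray(pred) - np.asarray(gt)) ** 2))

def network_loss(heatmap_loss, pos_loss, size_loss, rot_loss, seg_loss):
    """Overall training objective: the detection loss (four parts) plus the
    segmentation loss, minimized over backbone and both branch parameters."""
    return heatmap_loss + pos_loss + size_loss + rot_loss + seg_loss

seg = seg_cross_entropy(np.array([[4.0, 0.0], [0.0, 4.0]]), np.array([0, 1]))
total = network_loss(0.5, mse([1.0], [1.5]), 0.1, 0.05, seg)
print(round(seg, 3), round(total, 3))
```

Confident predictions (logit margin 4) give a small cross entropy, and the total simply accumulates the four detection parts with the segmentation part.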
  • the point cloud detection and segmentation method disclosed in the embodiment of the present application performs columnar voxelization processing on the point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed; finally, through the backbone network of a pre-trained multi-task neural network, extracts features from the bird's-eye view features to obtain point cloud feature vectors; through the point cloud detection network branch of the multi-task neural network, detects target objects based on the point cloud feature vectors and outputs point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network, performs point cloud segmentation based on the point cloud feature vectors and outputs the point cloud segmentation result, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • the point cloud feature vector is extracted and mapped through the backbone network of a multi-task neural network, and then input respectively to the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task for point cloud detection and point cloud segmentation, which enables the point cloud detection task and the point cloud segmentation task to share the output of the point cloud feature extraction network as their input. Compared with using two neural networks to independently perform point cloud detection and point cloud segmentation, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
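The shared-backbone design described above can be sketched structurally as follows (`backbone`, `detect_head`, and `segment_head` are placeholders standing in for the patent's actual network layers):

```python
class MultiTaskPointCloudNet:
    """Minimal structural sketch: the bird's-eye-view features are encoded
    once by the shared backbone, then fed to both task branches."""
    def __init__(self, backbone, detect_head, segment_head):
        self.backbone = backbone
        self.detect_head = detect_head
        self.segment_head = segment_head

    def forward(self, bev_features):
        feat = self.backbone(bev_features)  # point cloud feature vector, computed once
        det = self.detect_head(feat)        # point cloud detection branch
        seg = self.segment_head(feat)       # point cloud segmentation branch
        return det, seg

# toy stand-ins demonstrating the single shared feature-extraction pass
calls = {"backbone": 0}
def backbone(x):
    calls["backbone"] += 1
    return x * 2

net = MultiTaskPointCloudNet(backbone, lambda f: f + 1, lambda f: f - 1)
det, seg = net.forward(10)
print(det, seg, calls["backbone"])  # 21 19 1 -> the backbone ran only once
```

Both heads consume the same feature tensor, so the expensive feature extraction runs once instead of twice, which is the source of the computation savings claimed above.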
  • Point cloud detection tasks in the prior art usually include: point cloud preprocessing, feature extraction, and detection head prediction steps.
  • point cloud and its point cloud segmentation labels are converted to a bird's-eye view, and feature extraction, detection and segmentation are performed under the bird's-eye view, which is fast and effective.
  • after the point cloud semantic segmentation results output by the model are converted to each point in the point cloud, the task of point-based semantic segmentation of point clouds is completed, which effectively improves the speed of point cloud segmentation.
  • a point cloud detection and segmentation device disclosed in the embodiment of the present application, as shown in Figure 5, includes:
  • the columnar voxelization module 510 is used to perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed;
  • the voxel feature acquisition module 520 is used to perform feature extraction and mapping on the plurality of columnar voxels, and obtain the voxel features of the point cloud to be processed;
  • a bird's-eye view feature mapping module 530 is used to map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed;
  • the point cloud feature extraction module 540 is used to extract features of the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
  • the point cloud detection and segmentation module 550 is used to perform target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and output the point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network, perform point cloud segmentation based on the point cloud feature vector and output the point cloud segmentation result.
  • the point cloud segmentation result includes: the point cloud semantic category matched by each columnar voxel, and the device further includes:
  • the first point cloud segmentation label acquisition module 511 is used to obtain the first point cloud segmentation label of the plurality of columnar voxels, wherein the first point cloud segmentation label includes: position information of each columnar voxel;
  • the segmentation result conversion module 560 is used to map the point cloud semantic category matched by the columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, so as to obtain the segmentation result of the point cloud to be processed.
  • mapping the point cloud semantic category matched by the columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, so as to obtain the segmentation result of the points in the point cloud to be processed, includes:
  • taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in the columnar voxel.
  • the voxel feature acquisition module 520 is further used to:
  • the point features of all points divided into a columnar voxel are spliced into the voxel feature of that columnar voxel, where the point features of each point include: the position coordinates and reflection intensity information of the point;
  • the bird's-eye view feature mapping module 530 is further used to:
  • mapping, according to the number of points included in each columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label, so as to obtain the bird's-eye view features corresponding to the point cloud to be processed;
  • if the number of points included in a columnar voxel is greater than 0, the feature vector corresponding to the columnar voxel in the voxel features is mapped to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label;
  • if the number of points included in a columnar voxel is 0, the feature vector at the corresponding position of the bird's-eye view matching the first point cloud segmentation label is set to 0.
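A hedged sketch of this scattering step (the helper and argument names are assumptions; the patent does not give an implementation): voxels containing points write their feature vector to their bird's-eye-view position, and positions of empty voxels remain zero.

```python
import numpy as np

def scatter_to_bev(voxel_features, voxel_coords, num_points, W, H):
    """Map per-voxel feature vectors to their bird's-eye-view positions.

    voxel_features: (V, C) feature vector per columnar voxel
    voxel_coords:   (V, 2) bird's-eye-view (ix, iy) position per voxel
    num_points:     number of points contained in each columnar voxel
    """
    C = voxel_features.shape[1]
    bev = np.zeros((C, W, H), dtype=voxel_features.dtype)  # empty voxels stay 0
    for feat, (ix, iy), n in zip(voxel_features, voxel_coords, num_points):
        if n > 0:  # only non-empty columnar voxels are written
            bev[:, ix, iy] = feat
    return bev

feats = np.array([[1.0, 2.0], [3.0, 4.0]])
coords = np.array([[0, 0], [1, 2]])
bev = scatter_to_bev(feats, coords, num_points=[5, 0], W=4, H=4)
print(bev[:, 0, 0], bev[:, 1, 2])  # [1. 2.] and [0. 0.]
```

The second voxel contains 0 points, so its bird's-eye-view position keeps the zero feature vector, matching the rule stated above.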
  • the pre-trained multi-task neural network includes: a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch.
  • the device further includes:
  • a multi-task neural network training module (not shown in the figure) is used to train a multi-task neural network based on several voxelized point cloud training samples;
  • the voxelized point cloud training samples are constructed based on the columnar voxels obtained after columnar voxelization processing of several point clouds respectively; for each of the voxelized point cloud training samples, the sample data includes: several columnar voxels, and the sample label includes: a second point cloud segmentation label matching the corresponding sample data; the second point cloud segmentation label is used to identify the true value of the point cloud semantic category matched by each columnar voxel in the corresponding sample data; the true value of the point cloud semantic category matched by a columnar voxel is: among the point cloud semantic categories covered by the points in the point cloud that are divided into the corresponding columnar voxel, the point cloud semantic category with the largest coverage.
  • the sample label also includes: a point cloud detection label, which is used to identify the true value of the target detection result in the corresponding sample data.
  • training a multi-task neural network based on several voxelized point cloud training samples includes:
  • for each of the voxelized point cloud training samples, the following point cloud detection and segmentation operations are performed respectively to obtain the point cloud detection result prediction value and the point cloud segmentation result prediction value of the corresponding voxelized point cloud training sample:
  • through the point cloud detection network branch, target object detection is performed based on the point cloud feature vector, and the point cloud detection result prediction value of the voxelized point cloud training sample is output; and, through the point cloud segmentation network branch, point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation result prediction value of the voxelized point cloud training sample is output;
  • the point cloud detection and segmentation device disclosed in the embodiment of this application is used to implement the point cloud detection and segmentation method described in Embodiment 1 of this application.
  • the specific implementation of each module of the device will not be described in detail here; please refer to the specific implementation of the corresponding steps in the method embodiment.
  • the point cloud detection and segmentation device disclosed in the embodiment of the present application performs columnar voxelization processing on the point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed; finally, through the backbone network of a pre-trained multi-task neural network, extracts features from the bird's-eye view features to obtain point cloud feature vectors; through the point cloud detection network branch of the multi-task neural network, detects target objects based on the point cloud feature vectors and outputs point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network, performs point cloud segmentation based on the point cloud feature vectors and outputs the point cloud segmentation result, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • the point cloud feature vector is extracted and mapped through the backbone network of a multi-task neural network, and then input respectively to the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task for point cloud detection and point cloud segmentation, which enables the point cloud detection task and the point cloud segmentation task to share the output of the point cloud feature extraction network as their input. Compared with using two neural networks to independently perform point cloud detection and point cloud segmentation, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
  • point cloud and its point cloud segmentation labels are converted to a bird's-eye view, and feature extraction, detection and segmentation are performed under the bird's-eye view, which is fast and effective.
  • after the point cloud semantic segmentation results output by the model are converted to each point in the point cloud, the task of point-based semantic segmentation of point clouds is completed, which effectively improves the speed of point cloud segmentation.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units; that is, they may be located in one location, or they may be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the method without any creative effort.
  • Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components in the electronic device according to embodiments of the present application.
  • the present application may also be implemented as an apparatus or device program (eg, computer program and computer program product) for performing part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, or provided on a carrier signal, or in any other form.
  • Figure 7 shows an electronic device that can implement the method according to the present application.
  • the electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, etc.
  • the electronic device conventionally includes a processor 710, a memory 720, and program code 730 stored on the memory 720 and executable on the processor 710.
  • when the processor 710 executes the program code 730, the point cloud detection and segmentation method described in the above embodiments is implemented.
  • the memory 720 may be a computer program product or a computer-readable medium.
  • Memory 720 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 720 has a storage space 7201 for program code 730 of a computer program for executing any of the method steps described above.
  • the storage space 7201 for the program code 730 may include various computer programs respectively used to implement various steps in the above method.
  • the program code 730 is computer-readable code. Such computer programs can be read from or written into one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks.
  • the computer program includes computer readable code that, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
  • An embodiment of the present application also discloses a computer-readable storage medium on which a computer program is stored.
  • the program is executed by a processor, the steps of the point cloud detection and segmentation method described in Embodiment 1 of the present application are implemented.
  • Such a computer program product may be a computer-readable storage medium, which may have storage segments, storage spaces, etc. arranged similarly to the memory 720 in the electronic device shown in FIG. 7 .
  • the program code may, for example, be compressed and stored in the computer-readable storage medium in a suitable form.
  • the computer-readable storage medium is typically a portable or fixed storage unit as described with reference to FIG. 8 .
  • the storage unit includes computer-readable code 730', that is, code that can be read by a processor; when these codes are executed by the processor, each step in the method described above is implemented.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the application may be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
  • the use of the words first, second, third, etc. does not indicate any order. These words can be interpreted as names.

Abstract

A point cloud detection and segmentation method, relating to the technical field of computers. The method comprises: performing columnar voxelization processing on point cloud to be processed, so as to obtain a plurality of columnar voxels forming said point cloud; performing feature extraction and mapping on the plurality of columnar voxels to obtain the voxel features of said point cloud, and mapping the voxel features to an aerial view so as to obtain aerial view features corresponding to said point cloud; performing feature extraction on the aerial view features by means of a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector; and by means of a point cloud detection network branch and a point cloud segmentation network branch of the multi-task neural network, respectively performing point cloud detection and point cloud segmentation on the basis of the point cloud feature vector. By reducing the operation of repeatedly performing point cloud feature extraction, the efficiency of point cloud detection and point cloud segmentation is improved.

Description

Point cloud detection and segmentation method and apparatus, and electronic device
This application claims priority to the Chinese patent application filed with the China Patent Office on April 6, 2022, with application number 202210353486.1 and the invention title "Point cloud detection and segmentation method and apparatus, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a point cloud detection and segmentation method and apparatus, as well as an electronic device and a computer-readable storage medium.
Background
Point cloud data refers to a set of vectors in a three-dimensional coordinate system. Spatial information is recorded in the form of points, and each point contains three-dimensional coordinates. Depending on the data collection capabilities of the point cloud collection equipment, some point cloud data may also contain color information (RGB) or reflection intensity information (Intensity). Taking point cloud data collected through lidar as an example, the point cloud data includes the position coordinates and reflection intensity information of points in three-dimensional space. Point cloud data is widely used for target object detection and recognition in the field of autonomous driving, for example, in autonomous cars and drones. In the application of point cloud data, point cloud detection and segmentation techniques are usually employed to perform target object detection and point cloud segmentation based on the point cloud data. Point cloud detection refers to processing the point cloud data to detect the position of the target object in the scene matched by the point cloud data, while point cloud segmentation refers to identifying the target object category matched by each point in the point cloud data, so as to facilitate subsequent autonomous driving control.
In the prior art, different network models are usually used to perform the point cloud detection and point cloud segmentation tasks respectively. Since point cloud data is sparse and irregular, the structures of the commonly used detection networks and segmentation networks are relatively complex, resulting in a high amount of computation required to obtain point cloud detection results and point cloud segmentation results.
It can be seen that the point cloud detection and segmentation methods in the prior art still need to be improved.
Summary
The embodiments of the present application provide a point cloud detection and segmentation method, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
In a first aspect, an embodiment of the present application provides a point cloud detection and segmentation method, including:
performing columnar voxelization processing on a point cloud to be processed, and obtaining a number of columnar voxels that constitute the point cloud to be processed;
performing feature extraction and mapping on the plurality of columnar voxels, and obtaining voxel features of the point cloud to be processed;
mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed;
performing feature extraction on the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
performing target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network, and outputting a point cloud detection result; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network, and outputting a point cloud segmentation result.
第二方面,本申请实施例提供了一种点云检测和分割装置,包括:In the second aspect, embodiments of the present application provide a point cloud detection and segmentation device, including:
柱状体素化模块,用于对待处理点云进行柱状体素化处理,获取构成所述待处理点云的若干柱状体素;A columnar voxelization module, used to perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed;
体素特征获取模块,用于对所述若干柱状体素进行特征提取和映射,获取所述待处理点云的体素特征;A voxel feature acquisition module, used to perform feature extraction and mapping on the plurality of columnar voxels, and obtain the voxel features of the point cloud to be processed;
鸟瞰图特征映射模块,用于将所述体素特征映射到鸟瞰图,得到所述待处理点云对应的鸟瞰图特征;A bird's-eye view feature mapping module, used to map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed;
点云特征提取模块,用于通过预先训练的多任务神经网络的主干网络,对所述鸟瞰图特征进行特征提取,得到点云特征向量;A point cloud feature extraction module is used to extract features of the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
点云检测和分割模块，用于通过所述多任务神经网络的点云检测网络分支，基于所述点云特征向量进行目标物检测，输出点云检测结果；以及，通过所述多任务神经网络的点云分割网络分支，基于所述点云特征向量进行点云分割，输出点云分割结果。A point cloud detection and segmentation module is configured to perform target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and output a point cloud detection result; and to perform point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and output a point cloud segmentation result.
第三方面，本申请实施例还公开了一种电子设备，包括存储器、处理器及存储在所述存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现本申请实施例所述的点云检测和分割方法。In a third aspect, embodiments of the present application further disclose an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the point cloud detection and segmentation method described in the embodiments of this application.
第四方面，本申请实施例提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现本申请实施例公开的点云检测和分割方法的步骤。In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the point cloud detection and segmentation method disclosed in the embodiments of this application are implemented.
本申请实施例公开的点云检测和分割方法，通过对待处理点云进行柱状体素化处理，获取构成所述待处理点云的若干柱状体素；之后，对所述若干柱状体素进行特征提取和映射，获取所述待处理点云的体素特征，并将所述体素特征映射到鸟瞰图，得到所述待处理点云对应的鸟瞰图特征；最后，通过预先训练的多任务神经网络的主干网络，对所述鸟瞰图特征进行特征提取，得到点云特征向量；通过所述多任务神经网络的点云检测网络分支，基于所述点云特征向量进行目标物检测，输出点云检测结果；以及，通过所述多任务神经网络的点云分割网络分支，基于所述点云特征向量进行点云分割，输出点云分割结果，有助于提升点云检测和点云分割的效率。In the point cloud detection and segmentation method disclosed in the embodiments of the present application, columnar voxelization is performed on a point cloud to be processed to obtain a number of columnar voxels constituting the point cloud to be processed; then, feature extraction and mapping are performed on the columnar voxels to obtain voxel features of the point cloud to be processed, and the voxel features are mapped to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed; finally, feature extraction is performed on the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector; through the point cloud detection network branch of the multi-task neural network, target object detection is performed based on the point cloud feature vector and a point cloud detection result is output; and, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector and a point cloud segmentation result is output, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
上述说明仅是本申请技术方案的概述，为了能够更清楚了解本申请的技术手段，而可依照说明书的内容予以实施，并且为了让本申请的上述和其它目的、特征和优点能够更明显易懂，以下特举本申请的具体实施方式。The above description is only an overview of the technical solutions of the present application. In order that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more apparent and understandable, specific embodiments of the present application are set forth below.
附图说明Description of the drawings
为使本申请实施例的目的、技术方案和优点更加清楚，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.
图1是本申请实施例一的点云检测和分割方法流程示意图;Figure 1 is a schematic flow chart of the point cloud detection and segmentation method in Embodiment 1 of the present application;
图2是本申请实施例一的点云体素化处理效果示意图；Figure 2 is a schematic diagram of the effect of point cloud voxelization processing in Embodiment 1 of the present application;
图3是本申请实施例一中采用的多任务神经网络结构示意图;Figure 3 is a schematic structural diagram of the multi-task neural network used in Embodiment 1 of the present application;
图4是本申请实施例一中点云分割结果映射示意图;Figure 4 is a schematic diagram of point cloud segmentation result mapping in Embodiment 1 of the present application;
图5是本申请实施例二的点云检测和分割装置结构示意图之一;Figure 5 is one of the structural schematic diagrams of the point cloud detection and segmentation device in Embodiment 2 of the present application;
图6是本申请实施例二的点云检测和分割装置结构示意图之二；Figure 6 is the second structural schematic diagram of the point cloud detection and segmentation device in Embodiment 2 of the present application;
图7示意性地示出了用于执行根据本申请的方法的电子设备的框图;以及Figure 7 schematically shows a block diagram of an electronic device for performing a method according to the present application; and
图8示意性地示出了用于保持或者携带实现根据本申请的方法的程序代码的存储单元。Figure 8 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present application.
具体实施例Specific embodiments
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是 全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.
实施例一Embodiment 1
本申请实施例公开的一种点云检测和分割方法,如图1所示,所述方法包括:步骤110至步骤150。An embodiment of the present application discloses a point cloud detection and segmentation method, as shown in Figure 1. The method includes: steps 110 to 150.
步骤110,对待处理点云进行柱状体素化处理,获取构成所述待处理点云的若干柱状体素。Step 110: Perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed.
本申请实施例中所述的待处理点云为:点云采集设备(如激光雷达传感器)采集到的点云中感兴趣区域内的点云。The point cloud to be processed described in the embodiment of this application is: the point cloud in the area of interest in the point cloud collected by a point cloud collection device (such as a lidar sensor).
以本申请实施例中所述的点云检测和分割方法应用于汽车自动驾驶场景为例，设置在车辆上的激光雷达传感器采集的原始点云是若干无序点的数据集，其中，每个点的数据可以采用维度为4的数据表示，例如，表示为：(x,y,z,i)，其中，x,y,z是点的空间位置坐标，i表示该点的反射强度。Taking the application of the point cloud detection and segmentation method described in the embodiments of this application to an autonomous driving scenario as an example, the original point cloud collected by the lidar sensor installed on the vehicle is a data set of a number of unordered points, where the data of each point can be represented by data of dimension 4, for example, as (x, y, z, i), where x, y, z are the spatial position coordinates of the point, and i represents the reflection intensity of the point.
对于点云采集设备采集的原始点云,首先需要进行点云预处理,以获得符合要求的点集。例如,对于原始点云,去除其中nan值(空值),或者,去除其中数值非常大的点,以过滤点云噪声。点云预处理的具体实施方案可以参见现有技术,本申请实施例中对点云预处理采用的技术方案不做限定,此处亦不再赘述。For the original point cloud collected by point cloud acquisition equipment, point cloud preprocessing first needs to be performed to obtain a point set that meets the requirements. For example, for the original point cloud, remove the nan values (null values), or remove the points with very large values to filter the point cloud noise. For specific implementation solutions of point cloud preprocessing, please refer to the prior art. In the embodiments of this application, the technical solution adopted for point cloud preprocessing is not limited and will not be described again here.
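The preprocessing step described above (removing nan values and filtering out points with very large values) can be sketched in a few lines. The function name and the `max_abs` threshold are illustrative assumptions, not values specified in the source:

```python
import numpy as np

def preprocess_point_cloud(points, max_abs=1000.0):
    """Drop nan-valued points and points with implausibly large coordinates.

    points: (N, 4) array of (x, y, z, i); max_abs is an illustrative
    noise-filtering threshold, not a value given in the source text.
    """
    finite = np.isfinite(points).all(axis=1)                   # remove nan (null) values
    in_range = (np.abs(points[:, :3]) < max_abs).all(axis=1)   # remove very large values
    return points[finite & in_range]
```

Both masks are boolean per-point filters, so the result keeps the original (x, y, z, i) layout.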
点云采集设备（如激光雷达传感器）采集到的点云是三维的不规则空间区域内的点，在对点云进行检测和分割之前，首先需要从中确定一个规则的空间区域内的点云。例如，通过限定x,y和z方向的坐标范围，取一块大的立方体区域中的点云，其余的舍弃，这个立方体区域的大小可以表示为：[xmax-xmin,ymax-ymin,zmax-zmin]，其中，xmax和xmin分别表示x方向的坐标最大值和最小值，ymax和ymin分别表示y方向的坐标最大值和最小值，zmax和zmin分别表示z方向的坐标最大值和最小值。The point cloud collected by a point cloud collection device (such as a lidar sensor) consists of points in an irregular three-dimensional spatial region. Before detecting and segmenting the point cloud, it is first necessary to determine the point cloud within a regular spatial region. For example, by limiting the coordinate ranges in the x, y and z directions, the point cloud within a large cubic region is taken and the rest is discarded. The size of this cubic region can be expressed as [xmax-xmin, ymax-ymin, zmax-zmin], where xmax and xmin respectively represent the maximum and minimum coordinate values in the x direction, ymax and ymin respectively represent the maximum and minimum coordinate values in the y direction, and zmax and zmin respectively represent the maximum and minimum coordinate values in the z direction.
进一步的，获取前文确定的大立方体区域中感兴趣区域内的点的数据，便于后续对感兴趣区域内的点云进行点云检测和点云分割。本申请的一些实施例中，感兴趣区域内的点的坐标可以通过(x,y,z)表示，其中，xmin<x<xmax，ymin<y<ymax，zmin<z<zmax，单位是米。Further, the data of the points in the region of interest within the large cubic region determined above is obtained, to facilitate subsequent point cloud detection and point cloud segmentation of the point cloud in the region of interest. In some embodiments of the present application, the coordinates of a point within the region of interest can be expressed as (x, y, z), where xmin < x < xmax, ymin < y < ymax, zmin < z < zmax, in meters.
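The region-of-interest cropping described above (keeping only points with xmin < x < xmax, ymin < y < ymax, zmin < z < zmax) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def crop_roi(points, xlim, ylim, zlim):
    """Keep only points strictly inside the cuboid region of interest.

    points: (N, 4) array of (x, y, z, i); xlim/ylim/zlim are (min, max)
    pairs in meters.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = ((xlim[0] < x) & (x < xlim[1])
            & (ylim[0] < y) & (y < ylim[1])
            & (zlim[0] < z) & (z < zlim[1]))
    return points[mask]
```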
本申请的一些实施例中，感兴趣区域中的点根据点云质量确定。例如，距离车辆较远位置的点云比较稀疏，打到车上的点数较少，可以设置最小点数为一个较小数值（例如：点数值等于5），然后，根据这个点数找到相应数量的点，并根据一个最大距离的点，确定一个空间区域。本申请的一些实施例中，对于同样的点云质量（如同样的点云采集设备采集的点云），这个距离可以通过采集点云数据的质量预先确定，在应用过程中不再改变。In some embodiments of the present application, the points in the region of interest are determined based on point cloud quality. For example, the point cloud far away from the vehicle is sparse and few points there hit a vehicle, so the minimum number of points can be set to a small value (for example, equal to 5); then, the corresponding number of points is found based on this value, and a spatial region is determined based on the point at the maximum distance. In some embodiments of the present application, for the same point cloud quality (such as point clouds collected by the same point cloud collection device), this distance can be predetermined based on the quality of the collected point cloud data and does not change during application.
感兴趣区域的确定方法可以参见现有技术中点云检测或点云分割方案中采用的确定感兴趣区域的方法,本申请实施例中,对确定感兴趣区域的具体实施方式不做限定。For the method of determining the region of interest, please refer to the method of determining the region of interest used in point cloud detection or point cloud segmentation solutions in the prior art. In the embodiments of this application, the specific implementation method of determining the region of interest is not limited.
由于点云采集设备采集到的点云中包括的点非常多，因此，基于点进行特征提取，用于点云检测和点云分割时，会消耗大量的计算资源，因此，本申请实施例中，首先对原始点云进行体素化处理，后续，基于体素进行特征提取，可以有效减少数据处理量，节省计算资源。Since the point cloud collected by the point cloud collection device contains a very large number of points, performing feature extraction on individual points for point cloud detection and point cloud segmentation would consume a large amount of computing resources. Therefore, in the embodiments of the present application, the original point cloud is first voxelized, and subsequent feature extraction is performed based on voxels, which can effectively reduce the amount of data processing and save computing resources.
本申请的一些实施例中，所述对待处理点云进行柱状体素化处理，获取构成所述待处理点云的若干柱状体素，包括：按照第一坐标轴和第二坐标轴的坐标分布，将所述待处理点云中的点划分至若干柱状体素。本申请的一些实施例中，所述第一坐标轴和第二坐标轴为三维空间坐标系的两个不同坐标轴，所述的柱状体素为棱柱状体素。例如，图2中左侧所示的点云经过体素化处理后，可以得到如图2中右侧所示的长方体体素（即柱状体素）210。In some embodiments of the present application, performing columnar voxelization on the point cloud to be processed to obtain a number of columnar voxels constituting the point cloud to be processed includes: dividing the points in the point cloud to be processed into a number of columnar voxels according to their coordinate distribution along a first coordinate axis and a second coordinate axis. In some embodiments of the present application, the first coordinate axis and the second coordinate axis are two different coordinate axes of a three-dimensional spatial coordinate system, and the columnar voxels are prismatic voxels. For example, after the point cloud shown on the left side of Figure 2 is voxelized, the cuboid voxels (i.e., columnar voxels) 210 shown on the right side of Figure 2 can be obtained.
以第一坐标轴为x轴，第二坐标轴为y轴，可以将感兴趣区域内的点分别沿着x轴和y轴方向，划分成长方体体素，z轴方向不做划分，划分得到的每个体素的大小可以表示为[x_v, y_v, zmax-zmin]，其中，x_v表示体素沿x轴方向的长度，y_v表示体素沿y轴方向的长度，zmax-zmin表示体素沿z轴方向的高度，单位是米。按照前述柱状体素生成方法，对应一个感兴趣区域，将可以划分得到W×H个柱状体素，其中，Taking the first coordinate axis as the x-axis and the second coordinate axis as the y-axis, the points in the region of interest can be divided into cuboid voxels along the x-axis and y-axis directions respectively, with no division along the z-axis direction. The size of each resulting voxel can be expressed as [x_v, y_v, zmax-zmin], where x_v represents the length of the voxel along the x-axis, y_v represents the length of the voxel along the y-axis, and zmax-zmin represents the height of the voxel along the z-axis, in meters. According to the aforementioned columnar voxel generation method, a region of interest can be divided into W×H columnar voxels, where:
W = (xmax - xmin) / x_v,  H = (ymax - ymin) / y_v
以感兴趣区域中x的范围为(0,102.4)，y的范围为(0,50)，z的范围为(0,100)，柱状体素大小为0.2×0.2×100为例，则x轴方向柱状体素个数W等于(102.4-0)/0.2=512，y轴方向柱状体素个数H等于(50-0)/0.2=250，则感兴趣区域被划分为512×250个柱状体素。后续，这些柱状体素被视为图像像素，用于进行感兴趣区域的特征提取。本申请的一些实施例中，经过体素化处理之后，感兴趣区域的点云可以表示为W×H×1的体素图像，体素图像的维度为W×H×1。Taking as an example a region of interest where the range of x is (0, 102.4), the range of y is (0, 50), the range of z is (0, 100), and the columnar voxel size is 0.2×0.2×100, the number of columnar voxels W in the x-axis direction is (102.4-0)/0.2=512, and the number of columnar voxels H in the y-axis direction is (50-0)/0.2=250, so the region of interest is divided into 512×250 columnar voxels. Subsequently, these columnar voxels are treated as image pixels and used for feature extraction of the region of interest. In some embodiments of the present application, after voxelization, the point cloud of the region of interest can be represented as a voxel image of W×H×1, i.e., the dimension of the voxel image is W×H×1.
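The grid-size computation in the worked example above can be reproduced as follows. `pillar_grid_size` and `pillar_index` are illustrative helper names, and the use of floor division to assign a point to its pillar is an assumption about the binning rule:

```python
import math

def pillar_grid_size(xmin, xmax, ymin, ymax, x_v, y_v):
    """Number of pillars along x and y: W = (xmax-xmin)/x_v, H = (ymax-ymin)/y_v."""
    W = int(round((xmax - xmin) / x_v))
    H = int(round((ymax - ymin) / y_v))
    return W, H

def pillar_index(x, y, xmin, ymin, x_v, y_v):
    """Grid cell (ix, iy) that a point falls into (floor-division binning)."""
    return int(math.floor((x - xmin) / x_v)), int(math.floor((y - ymin) / y_v))
```

With the example values, `pillar_grid_size(0, 102.4, 0, 50, 0.2, 0.2)` reproduces the 512×250 grid from the text.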
本申请的一些实施例中，所述柱状体素的大小是通过实验确定的。例如，可以预设一些体素大小，分别做点云检测和点云分割实验，分析体素大小对检测和分割结果及性能的影响，最后确定最优的体素大小。In some embodiments of the present application, the size of the columnar voxels is determined experimentally. For example, several voxel sizes can be preset, point cloud detection and point cloud segmentation experiments can be conducted for each, the impact of voxel size on the detection and segmentation results and performance can be analyzed, and the optimal voxel size can finally be determined.
本申请的一些实施例中，所述对待处理点云进行柱状体素化处理，获取构成所述待处理点云的若干柱状体素之后，还包括：获取所述若干柱状体素的第一点云分割标签，其中，所述第一点云分割标签中包括：每个所述柱状体素的位置信息。对于划分得到的W×H个柱状体素，这些柱状体素组成了体素维度为W×H×1的体素图像，该体素图像的第一点云分割标签，即上述W×H个柱状体素的第一点云分割标签，可以通过一张大小为W×H的位置信息表表示，例如表示为(W,H,1)。所述第一点云分割标签用于后续根据柱状体素的分割结果确定点云的分割结果。In some embodiments of the present application, after performing columnar voxelization on the point cloud to be processed and obtaining a number of columnar voxels constituting the point cloud to be processed, the method further includes: obtaining a first point cloud segmentation label of the columnar voxels, wherein the first point cloud segmentation label includes the position information of each columnar voxel. The W×H columnar voxels obtained by the division form a voxel image with a voxel dimension of W×H×1, and the first point cloud segmentation label of this voxel image, i.e., the first point cloud segmentation label of the above W×H columnar voxels, can be represented by a position information table of size W×H, for example, as (W, H, 1). The first point cloud segmentation label is used subsequently to determine the segmentation result of the point cloud based on the segmentation result of the columnar voxels.
本申请的一些实施例中，所述获取所述若干柱状体素的第一点云分割标签，包括：对于每个所述柱状体素，将所述柱状体素的位置信息，作为所述柱状体素匹配的第一点云分割标签。例如，所述第一点云分割标签可以通过一张大小为W×H的位置信息表表示，例如表示为(W,H,1)。该位置信息表中，包括W×H组位置信息，每组位置信息对应一个柱状体素，例如，每组位置信息用于表示相应柱状体素在x轴和y轴上的坐标范围。由此可见，每组位置信息还可以用于表示点云中划分至与该组位置信息对应的柱状体素中的点的坐标范围。本申请的一些实施例中，可以通过在所述位置信息表中记录相应柱状体素的坐标范围，从而建立点云中的点与该柱状体素之间的映射关系。本申请的另一些实施例中，还可以采用其他方式建立点云中的点与柱状体素之间的映射关系。本申请实施例中，对映射关系的具体表现形式不做限定。In some embodiments of the present application, obtaining the first point cloud segmentation label of the columnar voxels includes: for each columnar voxel, taking the position information of the columnar voxel as the first point cloud segmentation label matched with that columnar voxel. For example, the first point cloud segmentation label can be represented by a position information table of size W×H, for example, as (W, H, 1). The position information table includes W×H sets of position information, and each set of position information corresponds to one columnar voxel; for example, each set of position information represents the coordinate range of the corresponding columnar voxel on the x-axis and y-axis. It follows that each set of position information can also represent the coordinate range of the points in the point cloud that are divided into the columnar voxel corresponding to that set of position information. In some embodiments of the present application, the mapping relationship between the points in the point cloud and a columnar voxel can be established by recording the coordinate range of the corresponding columnar voxel in the position information table. In other embodiments of the present application, the mapping relationship between points in the point cloud and columnar voxels can also be established in other ways. The embodiments of this application do not limit the specific form of the mapping relationship.
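One entry of the W×H position information table described above can be sketched as the x/y coordinate range covered by a pillar; the helper name is hypothetical:

```python
def pillar_coord_range(ix, iy, xmin, ymin, x_v, y_v):
    """One entry of the W x H position table: the x and y coordinate
    ranges covered by pillar (ix, iy), given the grid origin and the
    pillar size (x_v, y_v)."""
    return ((xmin + ix * x_v, xmin + (ix + 1) * x_v),
            (ymin + iy * y_v, ymin + (iy + 1) * y_v))
```

For pillar (0, 0) with a 0.2 m pillar size this yields the (0, 0) to (0.2, 0.2) range used as an example in the text.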
步骤120,对所述若干柱状体素进行特征提取和映射,获取所述待处理点云的体素特征。Step 120: Perform feature extraction and mapping on the plurality of columnar voxels to obtain voxel features of the point cloud to be processed.
在获取到构成待处理点云（如前述感兴趣区域的点云）的若干柱状体素之后，可以将柱状体素看作为图像的像素，对由所述若干柱状体素构成的体素图像进行特征提取和映射，获取所述体素图像的特征，由于所述体素图像的特征是基于柱状体素内点的分布数据提取的，因此，可以充分表达待处理点云的特征。After obtaining the columnar voxels constituting the point cloud to be processed (such as the point cloud of the aforementioned region of interest), the columnar voxels can be regarded as pixels of an image, and feature extraction and mapping can be performed on the voxel image composed of these columnar voxels to obtain the features of the voxel image. Since the features of the voxel image are extracted based on the distribution data of the points within the columnar voxels, they can fully express the features of the point cloud to be processed.
本申请的一些实施例中，所述对所述若干柱状体素进行特征提取和映射，获取所述待处理点云的体素特征，包括：对于每个所述柱状体素，获取划分至所述柱状体素中的所有点的中心点，并计算划分至所述柱状体素中的每个点与所述中心点之间的坐标距离；对于每个所述柱状体素，将划分至所述柱状体素中的所有点的点特征，拼接为所述柱状体素的体素特征，其中，每个所述点的所述点特征包括：所述点的位置坐标和反射强度信息；对所述柱状体素的体素特征进行拼接，得到所述若干柱状体素的拼接特征；对所述拼接特征进行特征映射，获取所述待处理点云的体素特征。In some embodiments of the present application, performing feature extraction and mapping on the columnar voxels to obtain the voxel features of the point cloud to be processed includes: for each columnar voxel, obtaining the center point of all points divided into the columnar voxel, and calculating the coordinate distance between each point divided into the columnar voxel and the center point; for each columnar voxel, concatenating the point features of all points divided into the columnar voxel into the voxel feature of the columnar voxel, wherein the point feature of each point includes the position coordinates and reflection intensity information of the point; concatenating the voxel features of the columnar voxels to obtain a concatenated feature of the columnar voxels; and performing feature mapping on the concatenated feature to obtain the voxel features of the point cloud to be processed.
例如，对于前述步骤中得到的各个柱状体素，每个柱状体素中会包含一定数量的点。以某一柱状体素中包括K个点为例，首先根据这K个点的原始点云数据中的位置坐标，计算这K个点的坐标平均值(x̄, ȳ, z̄)，作为这K个点的中心点坐标；然后，将这K个点的位置坐标分别减去前述坐标平均值，得到x_c = x - x̄、y_c = y - ȳ、z_c = z - z̄，并采用x_c、y_c、z_c表示所述柱状体素中的点与所述中心点之间的坐标距离；之后，将每个柱状体素中的点特征用数据x、y、z、i、x_c、y_c、z_c表示。这样，一个包含K个点的柱状体素的特征可以表示为长度为K×7的特征，即柱状体素的特征可以通过包括的所有点的点特征表示。For example, each columnar voxel obtained in the previous step contains a certain number of points. Taking a columnar voxel containing K points as an example, first compute the coordinate mean (x̄, ȳ, z̄) of these K points from their position coordinates in the original point cloud data, and take it as the center point coordinates of the K points; then subtract this mean from the position coordinates of each of the K points to obtain x_c = x - x̄, y_c = y - ȳ, z_c = z - z̄, where x_c, y_c, z_c represent the coordinate distance between a point in the columnar voxel and the center point; after that, the point feature of each point in a columnar voxel is represented by the data x, y, z, i, x_c, y_c, z_c. In this way, the feature of a columnar voxel containing K points can be represented as a feature of length K×7, that is, the feature of a columnar voxel can be represented by the point features of all the points it contains.
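The per-pillar feature construction described above (mean center point, offsets x_c, y_c, z_c, and the resulting K×7 feature) can be sketched as:

```python
import numpy as np

def pillar_point_features(pts):
    """Augment the K points of one pillar with their offsets from the
    pillar's mean coordinates: (x, y, z, i) -> (x, y, z, i, x_c, y_c, z_c).

    pts: (K, 4) array; returns a (K, 7) array, i.e. a feature of length K x 7.
    """
    center = pts[:, :3].mean(axis=0)   # coordinate mean of the K points
    offsets = pts[:, :3] - center      # x_c, y_c, z_c per point
    return np.hstack([pts, offsets])
```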
进一步的，对于包括N个柱状体素的待处理点云（如前述感兴趣区域的点云），其体素特征可以通过对该待处理点云进行体素化处理后得到的N个柱状体素的特征进行表示。例如，对于包括N个柱状体素的待处理点云，将对其进行体素化处理后得到的N个柱状体素的特征（如前述长度为K×7的特征）进行拼接，得到一个长度为N×K×7的拼接特征。Further, for a point cloud to be processed that includes N columnar voxels (such as the point cloud of the aforementioned region of interest), its voxel features can be represented by the features of the N columnar voxels obtained by voxelizing the point cloud to be processed. For example, for a point cloud to be processed that includes N columnar voxels, the features of the N columnar voxels obtained by voxelization (such as the aforementioned features of length K×7) are concatenated to obtain a concatenated feature of length N×K×7.
本申请的一些实施例中,如果某一柱状体素中没有点,可以将该柱状体素舍弃。In some embodiments of the present application, if there are no points in a certain columnar voxel, the columnar voxel can be discarded.
接下来，进一步对获取的拼接特征进行特征映射，得到所述待处理点云的预设维度的体素特征。例如，对于N个柱状体素的拼接特征，可以通过预先训练的特征提取网络对所述拼接特征进行特征映射，得到长度为N×D的特征，其中，D表示每个柱状体素的特征维度数。本申请的一些实施例中，所述特征提取网络可以由全连接层、归一化层和一维最大池化层MaxPool1D串行连接构建，最后，输出N×D维度的特征，其中D为全连接层输出的维度。Next, feature mapping is further performed on the obtained concatenated feature to obtain voxel features of a preset dimension for the point cloud to be processed. For example, for the concatenated feature of N columnar voxels, feature mapping can be performed on the concatenated feature through a pre-trained feature extraction network to obtain a feature of length N×D, where D represents the number of feature dimensions of each columnar voxel. In some embodiments of the present application, the feature extraction network can be constructed by serially connecting a fully connected layer, a normalization layer and a one-dimensional max-pooling layer MaxPool1D, and finally outputs features of dimension N×D, where D is the output dimension of the fully connected layer.
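A toy stand-in for the feature extraction network described above: a shared fully connected layer plus ReLU applied per point, followed by a max-pool over the K points (standing in for MaxPool1D), mapping N×K×7 to N×D. The random weight and bias are stand-ins for learned parameters, and the normalization layer is omitted for brevity:

```python
import numpy as np

def map_pillar_features(feats, weight, bias):
    """Map (N, K, 7) concatenated pillar features to (N, D).

    feats: (N, K, 7); weight: (7, D); bias: (D,). A shared linear layer
    with ReLU is applied per point, then the maximum over the K points
    of each pillar is taken (mimicking MaxPool1D).
    """
    hidden = np.maximum(feats @ weight + bias, 0.0)   # (N, K, D): fc + ReLU
    return hidden.max(axis=1)                          # (N, D): pool over K
```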
步骤130，将所述体素特征映射到鸟瞰图，得到所述待处理点云对应的鸟瞰图特征。Step 130: Map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed.
本申请的一些实施例中，所述将所述体素特征映射到鸟瞰图，得到所述待处理点云对应的鸟瞰图特征，包括：根据所述第一点云分割标签中每个所述柱状体素的所述位置信息，获取每个所述柱状体素中包括的点的数量；对于每个所述柱状体素，根据所述柱状体素中包括的点的数量，将所述体素特征中与所述柱状体素对应的特征，映射到与所述第一点云分割标签匹配的鸟瞰图的相应位置上，得到所述待处理点云对应的鸟瞰图特征；其中，所述根据所述柱状体素中包括的点的数量，将所述体素特征中与所述柱状体素对应的特征，映射到与所述第一点云分割标签匹配的鸟瞰图的相应位置上，包括：在所述柱状体素中包括的点的数量大于0的情况下，将所述体素特征中与所述柱状体素对应的特征向量，映射到与所述第一点云分割标签匹配的鸟瞰图的相应位置上；在所述柱状体素中包括的点的数量等于0的情况下，将与所述第一点云分割标签匹配的鸟瞰图的相应位置上的特征向量设置为0。In some embodiments of the present application, mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed includes: obtaining, according to the position information of each columnar voxel in the first point cloud segmentation label, the number of points included in each columnar voxel; and, for each columnar voxel, mapping, according to the number of points included in the columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view matching the first point cloud segmentation label, to obtain the bird's-eye view features corresponding to the point cloud to be processed. Mapping, according to the number of points included in the columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view matching the first point cloud segmentation label includes: when the number of points included in the columnar voxel is greater than 0, mapping the feature vector corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view matching the first point cloud segmentation label; and when the number of points included in the columnar voxel is equal to 0, setting the feature vector at the corresponding position of the bird's-eye view matching the first point cloud segmentation label to 0.
如前文所述，每个柱状体素与第一点云分割标签中的一个标签数据（即一组位置信息）对应，例如，长度为W×H的第一点云分割标签中，第一个标签数据对应坐标范围为(0,0)至(0.2,0.2)的柱状体素。在本步骤中，可以初始化一张与第一点云分割标签维度匹配的鸟瞰图，例如，初始化一张大小为W×H的鸟瞰图，使得鸟瞰图上具有W×H个像素，每个像素的特征通过D维特征向量表示，每个像素对应一个柱状体素。这样，对于前述步骤得到的长度为N×D的体素特征，可以将该体素特征中包括的N个柱状体素各自的特征向量，分别映射到所述鸟瞰图上的相应位置，得到大小为W×H×D的鸟瞰图特征。As mentioned above, each columnar voxel corresponds to one item of label data (i.e., one set of position information) in the first point cloud segmentation label. For example, in a first point cloud segmentation label of length W×H, the first item of label data corresponds to the columnar voxel whose coordinate range is (0, 0) to (0.2, 0.2). In this step, a bird's-eye view matching the dimension of the first point cloud segmentation label can be initialized. For example, a bird's-eye view of size W×H is initialized, so that the bird's-eye view has W×H pixels, the feature of each pixel is represented by a D-dimensional feature vector, and each pixel corresponds to one columnar voxel. In this way, for the voxel feature of length N×D obtained in the previous step, the feature vectors of the N columnar voxels included in the voxel feature can be mapped to the corresponding positions on the bird's-eye view, resulting in a bird's-eye view feature of size W×H×D.
本申请的一些实施例中，由于点云的稀疏性，某些柱状体素中可能没有点，这样，在进行特征映射时，所述鸟瞰图上与不包括点的所述柱状体素相对应的位置，可以将其特征向量设置为零向量。In some embodiments of the present application, due to the sparsity of the point cloud, some columnar voxels may contain no points. In this case, when performing feature mapping, the feature vector at the position on the bird's-eye view corresponding to a columnar voxel that contains no points can be set to a zero vector.
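The scattering of pillar features onto the bird's-eye-view canvas, with zero vectors kept at the cells of point-free pillars as described above, can be sketched as:

```python
import numpy as np

def scatter_to_bev(pillar_feats, coords, W, H):
    """Scatter N pillar feature vectors onto a W x H x D bird's-eye view.

    pillar_feats: (N, D) features; coords: (N, 2) integer (ix, iy) cells.
    Cells with no pillar (i.e. pillars containing no points) keep the
    initial zero vector.
    """
    D = pillar_feats.shape[1]
    bev = np.zeros((W, H, D), dtype=pillar_feats.dtype)   # zero-initialized canvas
    bev[coords[:, 0], coords[:, 1]] = pillar_feats        # place each pillar's feature
    return bev
```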
步骤140,通过预先训练的多任务神经网络的主干网络,对所述鸟瞰图特征进行特征提取,得到点云特征向量。Step 140: Feature extraction is performed on the bird's-eye view features through the backbone network of the pre-trained multi-task neural network to obtain a point cloud feature vector.
本申请的一些实施例中，如图3所示，所述多任务神经网络包括：主干网络310、点云检测网络分支320和点云分割网络分支330。In some embodiments of the present application, as shown in Figure 3, the multi-task neural network includes: a backbone network 310, a point cloud detection network branch 320 and a point cloud segmentation network branch 330.
其中，所述主干网络310可以采用现有技术中通用的卷积神经网络。例如，本申请的一些实施例中，如图3所示，所述主干网络310进一步包括：三个不同尺度的级联的特征提取模块和一个特征拼接层(ConCat)，其中，每个特征提取模块包括：不同数量的特征映射模块(CBR)，一个上采样层，以及一个特征映射模块(CBR)。每个特征提取模块包括的特征映射模块(CBR)数量可以分别是4、6、6，特征映射模块(CBR)可以由卷积层、批量归一化层和Relu激活函数级联构成。以输入特征的大小为W×H为例，这三个特征提取模块输出的特征的尺寸分别是
Figure PCTCN2022117322-appb-000004
所述特征拼接层用于将上述三个特征提取模块输出的特征进行拼接。这样，将大小为W×H×D的所述鸟瞰图特征输入至主干网络310之后，上述三个特征提取模块分别对输入的鸟瞰图特征进行卷积运算、上采样、归一化和激活处理，最后通过特征拼接层进行拼接后，得到的特征向量维度为
Figure PCTCN2022117322-appb-000005
其中C是特征通道数。
The backbone network 310 may adopt a convolutional neural network commonly used in the prior art. For example, in some embodiments of the present application, as shown in Figure 3, the backbone network 310 further includes three cascaded feature extraction modules of different scales and a feature concatenation layer (ConCat), where each feature extraction module includes a number of feature mapping modules (CBR), an upsampling layer, and a further feature mapping module (CBR). The numbers of feature mapping modules (CBR) in the three feature extraction modules may be 4, 6 and 6 respectively, and a feature mapping module (CBR) may be formed by cascading a convolution layer, a batch normalization layer and a ReLU activation function. Taking an input feature of size W×H as an example, the sizes of the features output by the three feature extraction modules are respectively
Figure PCTCN2022117322-appb-000004
The feature concatenation layer is used to concatenate the features output by the three feature extraction modules. In this way, after the bird's-eye view feature of size W×H×D is input into the backbone network 310, the three feature extraction modules respectively perform convolution, upsampling, normalization and activation processing on the input bird's-eye view feature, and after concatenation by the feature concatenation layer, the dimension of the resulting feature vector is
Figure PCTCN2022117322-appb-000005
where C is the number of feature channels.
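As a rough illustration of the three-scale backbone and the ConCat layer, the sketch below downsamples the BEV feature map by strides 2, 4 and 8, upsamples each branch back to a common W/2 × H/2 grid, and concatenates along the channel axis. The strides and the common output resolution are assumptions (the patent's exact branch sizes appear in formula images not reproduced here), and simple slicing/repetition stands in for the learned CBR convolutions and upsampling:

```python
import numpy as np

def backbone_sketch(bev):
    """Toy stand-in for the three-scale backbone + ConCat.

    bev: (W, H, D) array with W and H divisible by 8. Each branch is
    'downsampled' by strided slicing, then 'upsampled' by repetition
    back to W/2 x H/2 before channel-wise concatenation.
    """
    outs = []
    for s in (2, 4, 8):
        small = bev[::s, ::s, :]                  # downsample by stride s
        f = s // 2                                # factor back to W/2 x H/2
        up = small if f == 1 else small.repeat(f, axis=0).repeat(f, axis=1)
        outs.append(up)
    return np.concatenate(outs, axis=2)           # ConCat along channels
```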
步骤150，通过所述多任务神经网络的点云检测网络分支，基于所述点云特征向量进行目标物检测，输出点云检测结果；以及，通过所述多任务神经网络的点云分割网络分支，基于所述点云特征向量进行点云分割，输出点云分割结果。Step 150: Through the point cloud detection network branch of the multi-task neural network, perform target object detection based on the point cloud feature vector and output a point cloud detection result; and, through the point cloud segmentation network branch of the multi-task neural network, perform point cloud segmentation based on the point cloud feature vector and output a point cloud segmentation result.
接下来，主干网络310输出的点云特征向量，将分别输入至点云检测网络分支320和点云分割网络分支330，由这两个网络分支分别进行下一步处理。Next, the point cloud feature vector output by the backbone network 310 is input to the point cloud detection network branch 320 and the point cloud segmentation network branch 330 respectively, and these two network branches each perform the next step of processing.
下面分别结合点云检测网络分支320和点云分割网络分支330的网络结构，对点云检测任务和点云分割任务的执行方案进行举例说明。The following describes, by way of example, the execution schemes of the point cloud detection task and the point cloud segmentation task in conjunction with the network structures of the point cloud detection network branch 320 and the point cloud segmentation network branch 330 respectively.
In some embodiments of the present application, the point cloud detection network branch 320 includes four detection heads, which are respectively used to output the heat map detection result, the detected target object position, the target object size, and the target object rotation angle. In some embodiments of the present application, each detection head included in the point cloud detection network branch 320 is composed of a feature extraction module and a convolutional layer in cascade, where the feature extraction module is further composed of a convolutional layer, a batch normalization layer and an activation function. Each detection head performs feature encoding and transformation mapping on the input point cloud feature vector, and finally outputs the corresponding prediction result. For example, the detection head corresponding to the heat map operates on the point cloud feature vector of size
Figure PCTCN2022117322-appb-000006
and predicts, for each position, whether the corresponding position is a key point on the heat map; for another example, the detection head corresponding to the target object position operates on the point cloud feature vector of size
Figure PCTCN2022117322-appb-000007
and outputs the position (x, y, z) of the detected target object; for another example, the detection head corresponding to the target object size outputs the size (dx, dy, dz) of the target object; and the detection head corresponding to the target object rotation angle outputs the rotation angle θ of the target object.
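As a rough illustration of the four-head layout, the sketch below maps a shared feature map to the four prediction targets with hypothetical 1×1-convolution heads (the disclosed heads cascade a CBR feature extraction module with a convolution layer); all shapes and weights here are assumed.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)
C, H, W = 48, 32, 32
feat = rng.random((C, H, W))            # shared point cloud feature map

# One head per prediction target (stand-ins for the real CBR + conv heads):
heads = {
    'heatmap': conv1x1(feat, rng.random((1, C))),   # key-point score per position
    'position': conv1x1(feat, rng.random((3, C))),  # (x, y, z)
    'size': conv1x1(feat, rng.random((3, C))),      # (dx, dy, dz)
    'angle': conv1x1(feat, rng.random((1, C))),     # rotation angle theta
}
for name, out in heads.items():
    print(name, out.shape)
```

Each head reads the same backbone output, so only the small per-head layers differ between the four prediction targets.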
In some embodiments of the present application, as shown in Figure 3, the point cloud segmentation network branch 330 is composed of an upsampling module, a feature extraction module and a convolutional layer in cascade, where the feature extraction module is further composed of a convolutional layer, a batch normalization layer and an activation function. The upsampling layer first upsamples the point cloud feature vector output by the backbone network 310; then the convolutional layer, batch normalization layer and activation function sequentially perform feature transformation and mapping on the upsampled vector; finally, the segmentation results of the corresponding columnar voxels are output through the convolutional layer.
The size of the point cloud feature vector output by the backbone network is
Figure PCTCN2022117322-appb-000008
For example, the point cloud segmentation network branch 330 performs upsampling, convolution, batch normalization, activation mapping and other processing on the input point cloud feature vector, and finally outputs data of dimension (W, H, n_class). Here, W and H indicate that the output dimensions correspond to the width and height of the input feature map, and n_class represents the number of point cloud semantic categories. Taking W=512, H=512 columnar voxels obtained after voxelizing the point cloud to be processed, and 11 point cloud semantic categories as an example, the output data size of the point cloud segmentation network branch 330 is 512×512×11. That is, at each of the 512×512 positions there is a set of 11 segmentation result prediction values; each of these values lies between 0 and 1 and they sum to 1, representing the probability that the corresponding columnar voxel belongs to each point cloud semantic category. Further, the point cloud semantic category corresponding to the maximum probability value can be taken as the point cloud semantic category matched by the corresponding columnar voxel.
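The per-voxel class probabilities and the final category selection can be sketched as follows; the logits are random stand-ins for the segmentation branch's output.

```python
import numpy as np

rng = np.random.default_rng(1)
W, H, n_class = 512, 512, 11

logits = rng.random((W, H, n_class))
# Softmax over the class axis: 11 values per position, each in (0, 1), summing to 1.
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = e / e.sum(axis=-1, keepdims=True)

# The semantic category with the maximum probability is taken as the match
# for the corresponding columnar voxel.
voxel_labels = probs.argmax(axis=-1)      # (512, 512) category indices
print(voxel_labels.shape, float(probs[0, 0].sum()))
```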
In some embodiments of the present application, the point cloud semantic categories are determined according to the specific application scenario. For example, for point clouds collected in autonomous driving applications, the point cloud semantic categories may be defined to include, but not be limited to, any one or more of the following: buildings, green plants, ground, fences, curbs, lane lines, vehicles, etc.
In this way, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation network branch outputs the point cloud segmentation results of the several columnar voxels matched by the point cloud feature vector (that is, all the columnar voxels obtained after voxelizing the point cloud to be processed).
As can be seen from the foregoing description, the segmentation result output by the point cloud segmentation network branch is obtained by semantic segmentation based on features projected onto the bird's-eye view, while subsequent point cloud data processing needs the segmentation results of the points in the point cloud. Therefore, the voxel-based segmentation results need to be converted into segmentation results for the points in the point cloud. In some embodiments of the present application, the point cloud segmentation result includes: the point cloud semantic category matched by each columnar voxel; and after the point cloud segmentation network branch of the multi-task neural network performs point cloud segmentation based on the point cloud feature vector and outputs the point cloud segmentation result, the method further includes: mapping, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed, to obtain the segmentation results of the points in the point cloud to be processed.
In some embodiments of the present application, the mapping, according to the position information of the columnar voxels, of the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed, to obtain the segmentation results of the points in the point cloud to be processed, includes: obtaining, according to the position information of the columnar voxels, the points in the point cloud to be processed contained in each columnar voxel; and, for each columnar voxel, taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in the columnar voxel. As can be seen from the foregoing description, each columnar voxel corresponds to a position in the bird's-eye view; the segmentation result of a columnar voxel obtained through this mapping relationship can be regarded as the point cloud semantic segmentation result of the corresponding columnar region in the point cloud. As shown in Figure 4, each box in the bird's-eye view corresponds to a columnar voxel, and the segmentation result at the image position matched by each box can be regarded as the segmentation result of the columnar voxel corresponding to that box. Further, each columnar voxel corresponds to a spatial region in the point cloud to be processed, and this spatial region may contain zero or more points, so the segmentation result of each columnar voxel (that is, its matched point cloud semantic category) can be taken as the point cloud semantic category of every point included in that columnar voxel, which completes the semantic segmentation of the points in the point cloud. For example, for a columnar voxel with a coordinate range of (0, 0) to (0.2, 0.2), if the segmentation result of the columnar voxel is "curb", it can be determined that in the point cloud to be processed, the points whose coordinates fall within (0, 0) to (0.2, 0.2) all match the point cloud semantic category "curb".
In order to help readers better understand the point cloud detection and segmentation method disclosed in the embodiments of this application, the training scheme of the multi-task neural network is illustrated below with an example.
As mentioned above, the pre-trained multi-task neural network includes: the backbone network 310, the point cloud detection network branch 320, and the point cloud segmentation network branch 330. In some embodiments of the present application, before the backbone network of the pre-trained multi-task neural network performs feature extraction on the bird's-eye view features to obtain the point cloud feature vector, the method further includes: training the multi-task neural network based on several voxelized point cloud training samples. The voxelized point cloud training samples are constructed from the columnar voxels obtained by performing columnar voxelization on several point clouds respectively. For each voxelized point cloud training sample, the sample data includes: several columnar voxels; the sample label includes: a second point cloud segmentation label matching the corresponding sample data. The second point cloud segmentation label is used to identify the ground-truth point cloud semantic category matched by each columnar voxel in the corresponding sample data; the ground-truth point cloud semantic category matched by a columnar voxel is the point cloud semantic category with the largest coverage among the point cloud semantic categories covered by the points divided into that columnar voxel.
When constructing the voxelized point cloud training samples, for the specific implementation of generating the sample data, refer to the corresponding implementations in the foregoing steps, such as obtaining the point cloud to be processed and voxelizing the point cloud to be processed to obtain several columnar voxels; details are not repeated here.
Further, for all the columnar voxels obtained after voxelizing each point cloud, each columnar voxel contains a certain number of points, and these points have been manually annotated with point cloud semantic categories. In some embodiments of this application, by counting the points in a columnar voxel, the point cloud semantic category matched by the largest number of points is taken as the point cloud semantic category of the columnar voxel. For example, suppose a certain columnar voxel includes 3 points, each annotated with a point cloud semantic category (such as small car, large car, bicycle, tricycle, pedestrian, traffic cone, green plant, ground, fence, curb, lane line, etc.); if the annotations are (building, building, green plant), then the most frequent category, building, is taken as the point cloud semantic category matched by this columnar voxel. The point cloud semantic categories matched by all columnar voxels obtained after voxelizing a certain point cloud are arranged according to voxel position, yielding the point cloud semantic category label (that is, the second point cloud segmentation label) matching the sample data generated from that point cloud.
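The majority-vote labelling rule described above can be sketched as:

```python
from collections import Counter

def voxel_label_by_majority(point_labels):
    """Ground-truth label for one columnar voxel: the semantic category
    covering the most points inside the voxel."""
    return Counter(point_labels).most_common(1)[0][0]

# Three points labelled (building, building, green_plant) -> "building" wins.
print(voxel_label_by_majority(['building', 'building', 'green_plant']))
```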
Taking sample data of W×H columnar voxels as an example, the sample label of the sample data can be expressed as a W×H label matrix, where each element in the label matrix is the identifier of the point cloud semantic category matched by the corresponding columnar voxel.
In some embodiments of the present application, the sample label further includes: a point cloud detection label, which is used to identify the ground-truth target object detection results in the corresponding sample data. For example, for each point cloud used to generate training samples, the key points of the target objects on the heat map, and the spatial position coordinates, three-dimensional sizes and rotation angles of the target objects in the point cloud are manually annotated, and the annotated information is used as the point cloud detection label of the training sample generated from that point cloud.
In some embodiments of the present application, training the multi-task neural network based on several voxelized point cloud training samples includes: for each voxelized point cloud training sample, performing the following point cloud detection and segmentation operations to obtain the point cloud detection result prediction value and point cloud segmentation result prediction value of the corresponding voxelized point cloud training sample: performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample; mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the voxelized point cloud training sample; performing feature extraction on the bird's-eye view features through the backbone network to obtain a point cloud feature vector; performing target object detection based on the point cloud feature vector through the point cloud detection network branch, and outputting the point cloud detection result prediction value of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting the point cloud segmentation result prediction value of the voxelized point cloud training sample. Then, the point cloud detection loss of the multi-task neural network is calculated according to the point cloud detection result prediction values of the voxelized point cloud training samples and the corresponding point cloud detection labels, and the point cloud segmentation loss of the multi-task neural network is calculated according to the point cloud segmentation result prediction values of the voxelized point cloud training samples and the corresponding second point cloud segmentation labels; after that, the multi-task neural network is iteratively trained with the goal of optimizing the point cloud detection loss and the point cloud segmentation loss.
For the specific implementation of performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample, refer to the foregoing specific implementation of extracting the voxel features of the point cloud to be processed; details are not repeated here.
For the specific implementation of mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the voxelized point cloud training sample, refer to the relevant descriptions above; details are not repeated here.
For the specific implementation of performing feature extraction on the bird's-eye view features through the backbone network to obtain the point cloud feature vector, refer to the relevant descriptions above; details are not repeated here.
For the specific implementation of performing target object detection based on the point cloud feature vector through the point cloud detection network branch and outputting the point cloud detection result prediction value of the voxelized point cloud training sample, refer to the foregoing description of obtaining the detection result of the point cloud to be processed; details are not repeated here.
For the specific implementation of performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch and outputting the point cloud segmentation result prediction value of the voxelized point cloud training sample, refer to the foregoing description of obtaining the segmentation result of the point cloud to be processed; details are not repeated here.
In the training phase of the multi-task neural network, the point cloud detection loss of the multi-task neural network is calculated according to the point cloud detection result prediction values of the voxelized point cloud training samples and the corresponding point cloud detection labels. The point cloud detection loss includes four parts: heat map prediction loss, position prediction loss, size prediction loss and rotation angle prediction loss.
In some embodiments of the present application, the position prediction loss, the size prediction loss and the rotation angle prediction loss can be expressed as mean squared errors. For example, the position prediction loss of the multi-task neural network is expressed as the mean squared error between the predicted target object positions (such as spatial position coordinates) over all the voxelized point cloud training samples and the ground-truth target object positions in the sample labels; the size prediction loss is expressed as the mean squared error between the predicted target object sizes (such as three-dimensional sizes) and the ground-truth target object sizes in the sample labels; and the rotation angle prediction loss is expressed as the mean squared error between the predicted target object rotation angles and the ground-truth target object rotation angles in the sample labels.
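The mean-squared-error form of the position, size and angle losses, sketched with illustrative values:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error, as used for the position, size and
    rotation-angle prediction losses."""
    return float(np.mean((pred - target) ** 2))

pred_pos = np.array([[1.0, 2.0, 0.5]])   # predicted (x, y, z)
true_pos = np.array([[1.5, 2.0, 0.0]])   # ground-truth (x, y, z)
loss = mse(pred_pos, true_pos)           # (0.25 + 0 + 0.25) / 3
print(loss)
```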
In some embodiments of the present application, the heat map prediction loss is calculated using a pixel-wise focal loss function.
Assume that the position of a target object is p. After downsampling, the key point (p_x, p_y) on the heat map is obtained, and the computed values are distributed onto the heat map through a Gaussian kernel. If the Gaussian kernels of multiple target objects overlap, the maximum value is taken. The Gaussian kernel can be expressed as:
Y_xyc = exp( -((x - p_x)^2 + (y - p_y)^2) / (2σ_p^2) )
where x and y are the enumerated stride block positions in the image to be detected, σ_p is the target scale-adaptive variance, and Y_xyc is the Gaussian heat map representation of each key point after Gaussian kernel mapping.
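A sketch of rendering ground-truth key points with the Gaussian kernel above; the map size, the key-point coordinates, and the fixed sigma (standing in for the target scale-adaptive variance) are assumed values.

```python
import numpy as np

def gaussian_heatmap(shape, centers, sigma=2.0):
    """Render key points onto a heat map:
    Y[y, x] = exp(-((x - px)^2 + (y - py)^2) / (2 * sigma^2)).
    Where kernels of multiple targets overlap, the maximum value is taken."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape)
    for px, py in centers:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)   # max, not sum, where kernels overlap
    return heat

heat = gaussian_heatmap((64, 64), [(10, 20), (12, 20)])
print(float(heat[20, 10]))   # value at a key point
```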
Then, the pixel-wise focal loss function is used to calculate the heat map loss, with the following formula:
L = -(1/M) · Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                    if y_xyc = 1
                     (1 - y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),    otherwise }
where M represents the total number of target objects; Ŷ_xyc represents the possibility, predicted by the network, that a target object is present, with a value range of (0, 1); y_xyc represents the ground truth of whether a target object is present, with a value range of (0, 1); and α and β are hyperparameters whose values are set empirically, for example, α = 2 and β = 4.
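The pixel-wise focal loss can be sketched as follows, consistent with the formula and the α=2, β=4 defaults above; the eps guard on the logarithms and the toy predictions are assumptions.

```python
import numpy as np

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Pixel-wise focal loss over the heat map: the positive branch applies
    at key points (gt == 1), the negative branch elsewhere; the sum is
    normalized by M, the total number of target objects."""
    pos = gt == 1.0
    pos_term = ((1.0 - pred) ** alpha) * np.log(pred + eps) * pos
    neg_term = ((1.0 - gt) ** beta) * (pred ** alpha) * np.log(1.0 - pred + eps) * ~pos
    num_targets = max(int(pos.sum()), 1)   # M
    return float(-(pos_term.sum() + neg_term.sum()) / num_targets)

gt = np.zeros((8, 8)); gt[4, 4] = 1.0              # one target key point
good = np.full((8, 8), 0.01); good[4, 4] = 0.9     # confident, mostly correct
loss_good = heatmap_focal_loss(good, gt)
loss_bad = heatmap_focal_loss(np.full((8, 8), 0.5), gt)
print(loss_good < loss_bad)
```

The better prediction yields a much smaller loss, since the (1 - Ŷ)^α factor down-weights already-confident correct pixels.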
In the training phase of the multi-task neural network, the point cloud segmentation loss of the multi-task neural network is calculated according to the point cloud segmentation result prediction values of the voxelized point cloud training samples and the corresponding second point cloud segmentation labels. For example, the point cloud segmentation loss can be expressed as the cross entropy between the point cloud segmentation result prediction values and the corresponding second point cloud segmentation labels.
Further, the point cloud detection loss and the point cloud segmentation loss are fused to calculate the loss of the multi-task neural network, and with the goal of minimizing the loss of the entire network, the network parameters of the backbone network, the point cloud detection network branch and the point cloud segmentation network branch are optimized, thereby completing the training of the multi-task neural network.
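A sketch of fusing the detection and segmentation losses into one training objective; the cross-entropy form follows the description above, while `seg_weight` and the sample values are assumptions (the disclosure does not specify how the two losses are weighted).

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Segmentation loss: cross entropy between predicted per-voxel class
    probabilities and the second point cloud segmentation label."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def total_loss(det_losses, seg_loss, seg_weight=1.0):
    """Fuse the four detection losses (heat map, position, size, angle)
    with the segmentation loss; `seg_weight` is an assumed balance factor."""
    return sum(det_losses) + seg_weight * seg_loss

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])      # two voxels, three classes
seg = cross_entropy(probs, np.array([0, 1]))
total = total_loss([0.2, 0.1, 0.05, 0.05], seg)
print(round(total, 4))
```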
By jointly training the network branches corresponding to the point cloud detection task and the point cloud segmentation task, not only can the representation ability of the features extracted by the backbone network be improved, but the two tasks can also promote each other during training, thereby improving the accuracy of point cloud detection and segmentation.
The point cloud detection and segmentation method disclosed in the embodiments of this application performs columnar voxelization on a point cloud to be processed to obtain several columnar voxels constituting the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed; finally, through the backbone network of a pre-trained multi-task neural network, feature extraction is performed on the bird's-eye view features to obtain a point cloud feature vector; through the point cloud detection network branch of the multi-task neural network, target object detection is performed based on the point cloud feature vector and a point cloud detection result is output; and, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector and a point cloud segmentation result is output, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
By performing feature extraction and mapping through the backbone network of a single multi-task neural network and then feeding the resulting point cloud feature vector to the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task respectively, the point cloud detection task and the point cloud segmentation task share the point cloud feature extraction network. Compared with using two neural networks to perform point cloud detection and point cloud segmentation independently, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
A point cloud detection task in the prior art usually includes: point cloud preprocessing, feature extraction, and detection head prediction steps; a point cloud segmentation task in the prior art usually includes: point cloud preprocessing, feature extraction, and point cloud segmentation steps. The time and resource consumption (CPU, GPU, etc.) of point cloud preprocessing and feature extraction accounts for about 90% of the consumption of the entire task. Taking time as an example, suppose the detection task takes 20 ms and the segmentation task takes 20 ms, of which point cloud preprocessing and feature extraction occupy 18 ms. If the two tasks use independent network models, 40 ms is consumed in total; whereas with the single-network point cloud detection and segmentation method disclosed in the embodiments of this application, the total time consumed is 18 + 2 + 2 = 22 ms, which greatly improves the efficiency of point cloud detection and segmentation and saves the resources consumed by point cloud preprocessing and feature extraction.
On the other hand, by voxelizing the point cloud to be processed and performing feature extraction and mapping based on columnar voxels to realize point cloud detection and segmentation, the difficulty of feature extraction can be reduced compared with extracting features directly from the point cloud, thereby reducing the complexity of the network model.
Further, the point cloud and its point cloud segmentation labels are converted to the bird's-eye view, and feature extraction, detection and segmentation are performed under the bird's-eye view, which is fast and effective. Finally, by converting the point cloud semantic segmentation results output by the model to each point in the point cloud, the point-based semantic segmentation task is completed, effectively improving the speed of point cloud segmentation.
Embodiment 2
A point cloud detection and segmentation apparatus disclosed in an embodiment of the present application, as shown in Figure 5, includes:
a columnar voxelization module 510, configured to perform columnar voxelization on a point cloud to be processed to obtain several columnar voxels constituting the point cloud to be processed;
a voxel feature acquisition module 520, configured to perform feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed;
a bird's-eye view feature mapping module 530, configured to map the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed;
a point cloud feature extraction module 540, configured to perform feature extraction on the bird's-eye view features through a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
a point cloud detection and segmentation module 550, configured to perform target object detection based on the point cloud feature vector through a point cloud detection network branch of the multi-task neural network and output a point cloud detection result; and to perform point cloud segmentation based on the point cloud feature vector through a point cloud segmentation network branch of the multi-task neural network and output a point cloud segmentation result.
In some embodiments of the present application, as shown in Figure 6, the point cloud segmentation result includes: the point cloud semantic category matched by each columnar voxel, and the apparatus further includes:
a first point cloud segmentation label acquisition module 511, configured to obtain first point cloud segmentation labels of the several columnar voxels, where the first point cloud segmentation labels include: position information of each columnar voxel;
a segmentation result conversion module 560, configured to map, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed, to obtain segmentation results of the points in the point cloud to be processed.
In some embodiments of the present application, mapping the point cloud semantic category matched by each columnar voxel into the point cloud to be processed according to the position information of the columnar voxel, to obtain the segmentation results for the points in the point cloud to be processed, includes:
acquiring, according to the position information of the columnar voxels, the points of the point cloud to be processed that are contained in each of the columnar voxels;
for each of the columnar voxels, taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in that columnar voxel.
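The above conversion from pillar-level categories to point-level results can be sketched as follows; the dictionary-based pillar-to-point index structure is an assumed illustration, not a format prescribed by the present application:

```python
import numpy as np

def pillar_labels_to_points(num_points, pillar_points, pillar_category):
    """Assign each point the semantic category predicted for its pillar.

    pillar_points: dict pillar_key -> indices of the points it contains.
    pillar_category: dict pillar_key -> predicted semantic class id.
    """
    point_labels = np.full(num_points, -1, dtype=int)  # -1 marks unassigned points
    for key, idx in pillar_points.items():
        point_labels[idx] = pillar_category[key]       # every point inherits its pillar's class
    return point_labels

labels = pillar_labels_to_points(
    4,
    {(0, 0): np.array([0, 2]), (1, 0): np.array([1, 3])},
    {(0, 0): 5, (1, 0): 7},
)
# labels -> [5, 7, 5, 7]
```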
In some embodiments of the present application, the voxel feature acquisition module 520 is further used to:
for each of the columnar voxels, obtain the center point of all the points divided into the columnar voxel, and compute the coordinate distance between each point divided into the columnar voxel and the center point;
for each of the columnar voxels, concatenate the point features of all the points divided into the columnar voxel into the voxel feature of that columnar voxel, where the point feature of each point includes the position coordinates and the reflection intensity information of the point;
concatenate the voxel features of the columnar voxels to obtain the concatenated features of the several columnar voxels;
perform feature mapping on the concatenated features to obtain the voxel features of the point cloud to be processed.
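A rough sketch of the per-pillar feature construction described above is given below; the exact feature layout (raw point features augmented with offsets from the center point, then concatenated) is an assumption made for illustration:

```python
import numpy as np

def pillar_feature(points_in_pillar):
    """Build one pillar's voxel feature from the points divided into it.

    points_in_pillar: (M, 4) array of [x, y, z, reflectance].
    Each point is augmented with its coordinate distance from the
    center point of the pillar, then all point features are concatenated.
    """
    center = points_in_pillar[:, :3].mean(axis=0)       # center point of the pillar
    offsets = points_in_pillar[:, :3] - center          # per-point coordinate distance
    augmented = np.hstack([points_in_pillar, offsets])  # (M, 7) per-point features
    return augmented.reshape(-1)                        # concatenated pillar feature

feat = pillar_feature(np.array([[0.0, 0.0, 0.0, 0.5],
                                [2.0, 2.0, 2.0, 0.7]]))
# The center is (1, 1, 1); each point contributes 7 values, so feat has 14 entries.
```

In a full implementation, the concatenated features would then pass through a learned feature-mapping layer; that step is omitted here.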
In some embodiments of the present application, the bird's-eye view feature mapping module 530 is further used to:
acquire the number of points included in each of the columnar voxels according to the position information of each columnar voxel in the first point cloud segmentation labels;
for each of the columnar voxels, map the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels, obtaining the bird's-eye view features corresponding to the point cloud to be processed; where
mapping the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels includes:
when the number of points included in the columnar voxel is greater than 0, mapping the feature vector corresponding to the columnar voxel among the voxel features to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels;
when the number of points included in the columnar voxel is equal to 0, setting the feature vector at the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels to 0.
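The scatter-to-grid behavior just described, including zero-filling for empty pillars, can be sketched as follows; the array shapes and (row, col) coordinate convention are illustrative assumptions:

```python
import numpy as np

def scatter_to_bev(pillar_feats, pillar_counts, coords, H, W, C):
    """Scatter per-pillar feature vectors onto an H x W bird's-eye-view grid.

    pillar_feats: (P, C) mapped feature vectors, one per pillar.
    pillar_counts: (P,) number of points included in each pillar.
    coords: (P, 2) integer (row, col) grid position of each pillar.
    Cells whose pillar contains no points keep the zero vector.
    """
    bev = np.zeros((C, H, W), dtype=pillar_feats.dtype)
    for f, n, (r, c) in zip(pillar_feats, pillar_counts, coords):
        if n > 0:             # only non-empty pillars are written
            bev[:, r, c] = f  # empty positions stay at 0
    return bev

bev = scatter_to_bev(np.ones((2, 3)), np.array([4, 0]),
                     np.array([[0, 0], [1, 1]]), H=2, W=2, C=3)
# Only cell (0, 0) is filled; the empty pillar at (1, 1) remains all zeros.
```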
In some embodiments of the present application, the pre-trained multi-task neural network includes a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch, and the apparatus further includes:
a multi-task neural network training module (not shown in the figure), used to train the multi-task neural network based on a number of voxelized point cloud training samples;
where the voxelized point cloud training samples are constructed from the columnar voxels obtained by separately performing columnar voxelization on a number of point clouds; for each voxelized point cloud training sample, the sample data includes a number of columnar voxels, and the sample label includes a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used to identify the ground-truth point cloud semantic category matched by each columnar voxel in the corresponding sample data; the ground-truth point cloud semantic category matched by a columnar voxel is the point cloud semantic category with the largest coverage among the point cloud semantic categories covered by the points of the point cloud that are divided into that columnar voxel.
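The largest-coverage rule for a pillar's ground-truth semantic category can be sketched in a few lines; the integer class-id encoding is an assumption made for illustration:

```python
import numpy as np

def pillar_category_label(point_classes, num_classes):
    """Ground-truth semantic category of a pillar: the category covering
    the largest share of the points divided into the pillar."""
    counts = np.bincount(point_classes, minlength=num_classes)  # votes per category
    return int(counts.argmax())                                 # largest coverage wins

label = pillar_category_label(np.array([2, 2, 2, 7]), num_classes=10)
# Category 2 covers 3 of the pillar's 4 points, so the pillar's label is 2.
```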
In some embodiments of the present application, the sample label further includes a point cloud detection label used to identify the ground-truth target detection result in the corresponding sample data, and training the multi-task neural network based on the several voxelized point cloud training samples includes:
for each voxelized point cloud training sample, performing the following point cloud detection and segmentation operations to obtain a predicted point cloud detection result and a predicted point cloud segmentation result of the corresponding voxelized point cloud training sample:
performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain voxel features of the voxelized point cloud training sample;
mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the voxelized point cloud training sample;
performing feature extraction on the bird's-eye view features through the backbone network to obtain a point cloud feature vector;
performing target detection based on the point cloud feature vector through the point cloud detection network branch and outputting the predicted point cloud detection result of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch and outputting the predicted point cloud segmentation result of the voxelized point cloud training sample;
computing the point cloud detection loss of the multi-task neural network according to the predicted point cloud detection result of each voxelized point cloud training sample and the corresponding point cloud detection label; computing the point cloud segmentation loss of the multi-task neural network according to the predicted point cloud segmentation result of each voxelized point cloud training sample and the corresponding second point cloud segmentation label; and then iteratively training the multi-task neural network with the objective of optimizing the point cloud detection loss and the point cloud segmentation loss.
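A toy sketch of combining the two losses into one training objective is shown below; the squared-error detection loss and the softmax cross-entropy segmentation loss are stand-ins for whatever losses an actual implementation would use, and no real network is involved:

```python
import numpy as np

def joint_loss(det_pred, det_target, seg_logits, seg_target):
    """Combined objective: point cloud detection loss plus segmentation loss.

    det_pred / det_target: regression outputs and labels for detection.
    seg_logits: (P, K) per-pillar class scores; seg_target: (P,) class ids.
    """
    det_loss = np.mean((det_pred - det_target) ** 2)  # proxy detection loss
    # numerically stable softmax cross-entropy over pillar class logits
    z = seg_logits - seg_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    seg_loss = -np.mean(log_probs[np.arange(len(seg_target)), seg_target])
    return det_loss + seg_loss  # both branches are optimized jointly

loss = joint_loss(np.zeros(3), np.zeros(3),
                  np.array([[10.0, 0.0], [0.0, 10.0]]), np.array([0, 1]))
# Detection error is 0 and the segmentation is nearly certain, so loss is near 0.
```

In training, the gradient of this summed objective updates the shared backbone from both branches at once, which is what allows the two tasks to share feature extraction.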
The point cloud detection and segmentation apparatus disclosed in the embodiments of the present application is used to implement the point cloud detection and segmentation method described in Embodiment 1 of the present application. The specific implementation of each module of the apparatus is not repeated here; refer to the specific implementation of the corresponding steps in the method embodiment.
The point cloud detection and segmentation apparatus disclosed in the embodiments of the present application performs columnar voxelization on a point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed; finally, performs feature extraction on the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector, performs target detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and outputs a point cloud detection result, and performs point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and outputs a point cloud segmentation result, which helps improve the efficiency of point cloud detection and point cloud segmentation.
Feature extraction and mapping are performed through the backbone network of a single multi-task neural network, after which the resulting point cloud feature vector is fed separately into the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task, which perform point cloud detection and point cloud segmentation respectively. The detection task and the segmentation task thus share the point cloud feature extraction network; compared with using two independent neural networks for point cloud detection and point cloud segmentation, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
On the other hand, by voxelizing the point cloud to be processed and performing feature extraction and mapping based on the columnar voxels to realize point cloud detection and segmentation, the difficulty of feature extraction can be reduced compared with extracting features directly from the point cloud, thereby reducing the complexity of the network model.
Further, the point cloud and its point cloud segmentation labels are converted into the bird's-eye view, and feature extraction, detection, and segmentation are performed in the bird's-eye view, which is both fast and effective. Finally, by converting the point cloud semantic segmentation result output by the model onto each point in the point cloud, the point-wise semantic segmentation task on the point cloud is completed, effectively improving the speed of point cloud segmentation.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another. Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The point cloud detection and segmentation method and apparatus provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the electronic device according to the embodiments of the present application. The present application may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Figure 7 shows an electronic device that can implement the method according to the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, and so on. The electronic device conventionally includes a processor 710, a memory 720, and program code 730 stored on the memory 720 and executable on the processor 710; when the processor 710 executes the program code 730, the method described in the above embodiments is implemented. The memory 720 may be a computer program product or a computer-readable medium. The memory 720 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, a hard disk, or ROM. The memory 720 has a storage space 7201 for the program code 730 of a computer program for executing any of the method steps described above. For example, the storage space 7201 for the program code 730 may include individual computer programs respectively used to implement the various steps of the above method. The program code 730 is computer-readable code. These computer programs may be read from or written into one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. The computer program includes computer-readable code which, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
An embodiment of the present application further discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the point cloud detection and segmentation method described in Embodiment 1 of the present application are implemented.
Such a computer program product may be a computer-readable storage medium, which may have storage segments, storage spaces, and the like arranged similarly to the memory 720 in the electronic device shown in Figure 7. The program code may, for example, be compressed and stored in the computer-readable storage medium in a suitable form. The computer-readable storage medium is typically a portable or fixed storage unit as described with reference to Figure 8. Generally, the storage unit includes computer-readable code 730', i.e., code that can be read by a processor; when executed by the processor, the code implements the steps of the method described above.
Reference herein to "one embodiment", "an embodiment", or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. In addition, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
Numerous specific details are set forth in the description provided herein. However, it is understood that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present application may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

  1. A point cloud detection and segmentation method, comprising:
    performing columnar voxelization on a point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed;
    performing feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed;
    mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed;
    performing feature extraction on the bird's-eye view features through a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
    performing target detection based on the point cloud feature vector through a point cloud detection network branch of the multi-task neural network and outputting a point cloud detection result; and performing point cloud segmentation based on the point cloud feature vector through a point cloud segmentation network branch of the multi-task neural network and outputting a point cloud segmentation result.
  2. The method according to claim 1, wherein after performing columnar voxelization on the point cloud to be processed to obtain the several columnar voxels that constitute the point cloud to be processed, the method further comprises:
    acquiring first point cloud segmentation labels of the several columnar voxels, wherein the first point cloud segmentation labels include the position information of each of the columnar voxels;
    the point cloud segmentation result includes the point cloud semantic category matched by each of the columnar voxels, and after performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and outputting the point cloud segmentation result, the method further comprises:
    mapping, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed to obtain segmentation results for the points in the point cloud to be processed.
  3. The method according to claim 2, wherein mapping, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed to obtain the segmentation results for the points in the point cloud to be processed comprises:
    acquiring, according to the position information of the columnar voxels, the points of the point cloud to be processed that are contained in each of the columnar voxels;
    for each of the columnar voxels, taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in that columnar voxel.
  4. The method according to claim 1, wherein performing feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed comprises:
    for each of the columnar voxels, obtaining the center point of all the points divided into the columnar voxel, and computing the coordinate distance between each point divided into the columnar voxel and the center point;
    for each of the columnar voxels, concatenating the point features of all the points divided into the columnar voxel into the voxel feature of that columnar voxel, wherein the point feature of each point includes the position coordinates and the reflection intensity information of the point;
    concatenating the voxel features of the columnar voxels to obtain the concatenated features of the several columnar voxels;
    performing feature mapping on the concatenated features to obtain the voxel features of the point cloud to be processed.
  5. The method according to claim 2, wherein mapping the voxel features to the bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed comprises:
    acquiring the number of points included in each of the columnar voxels according to the position information of each columnar voxel in the first point cloud segmentation labels;
    for each of the columnar voxels, mapping the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels, obtaining the bird's-eye view features corresponding to the point cloud to be processed; wherein
    mapping the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels comprises:
    when the number of points included in the columnar voxel is greater than 0, mapping the feature vector corresponding to the columnar voxel among the voxel features to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels;
    when the number of points included in the columnar voxel is equal to 0, setting the feature vector at the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels to 0.
  6. The method according to any one of claims 1 to 5, wherein the pre-trained multi-task neural network comprises a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch, and before performing feature extraction on the bird's-eye view features through the backbone network of the pre-trained multi-task neural network to obtain the point cloud feature vector, the method further comprises:
    training the multi-task neural network based on a number of voxelized point cloud training samples;
    wherein the voxelized point cloud training samples are constructed from the columnar voxels obtained by separately performing columnar voxelization on a number of point clouds; for each voxelized point cloud training sample, the sample data includes a number of columnar voxels, and the sample label includes a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used to identify the ground-truth point cloud semantic category matched by each columnar voxel in the corresponding sample data; the ground-truth point cloud semantic category matched by a columnar voxel is the point cloud semantic category with the largest coverage among the point cloud semantic categories covered by the points of the point cloud that are divided into that columnar voxel.
  7. The method according to claim 6, wherein the sample labels further include a point cloud detection label, the point cloud detection label being used to identify the ground-truth target detection result in the corresponding sample data, and training the multi-task neural network based on the several voxelized point cloud training samples comprises:
    for each voxelized point cloud training sample, performing the following point cloud detection and segmentation operations to obtain a predicted point cloud detection result and a predicted point cloud segmentation result of the corresponding voxelized point cloud training sample:
    performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain voxel features of the voxelized point cloud training sample;
    mapping the voxel features to a bird's-eye view to obtain bird's-eye-view features corresponding to the voxelized point cloud training sample;
    performing feature extraction on the bird's-eye-view features through the backbone network to obtain a point cloud feature vector;
    performing target detection based on the point cloud feature vector through the point cloud detection network branch, and outputting the predicted point cloud detection result of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting the predicted point cloud segmentation result of the voxelized point cloud training sample;
    calculating a point cloud detection loss of the multi-task neural network according to the predicted point cloud detection result of each voxelized point cloud training sample and the corresponding point cloud detection label, and calculating a point cloud segmentation loss of the multi-task neural network according to the predicted point cloud segmentation result of each voxelized point cloud training sample and the corresponding second point cloud segmentation label; and then iteratively training the multi-task neural network with the goal of optimizing the point cloud detection loss and the point cloud segmentation loss.
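The training objective of claim 7 combines a detection loss and a segmentation loss into one joint objective. A minimal NumPy sketch of that combination is below; the Smooth-L1 form for box regression and per-point cross-entropy for segmentation are illustrative assumptions, as the claim does not prescribe particular loss functions or weights:

```python
import numpy as np

def detection_loss(pred_boxes, gt_boxes):
    """Smooth-L1 (Huber) regression loss on predicted box parameters,
    a common choice for a detection head (assumed here, not specified
    by the claim)."""
    d = np.abs(pred_boxes - gt_boxes)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

def segmentation_loss(pred_probs, gt_labels):
    """Per-point cross-entropy: pred_probs is (N_points, N_classes) of
    class probabilities, gt_labels the ground-truth class per point
    (the "second point cloud segmentation label")."""
    eps = 1e-9
    return -np.log(pred_probs[np.arange(len(gt_labels)), gt_labels] + eps).mean()

def multitask_loss(pred_boxes, gt_boxes, pred_probs, gt_labels,
                   w_det=1.0, w_seg=1.0):
    """Joint objective iteratively minimized during training: a weighted
    sum of the detection and segmentation losses."""
    return (w_det * detection_loss(pred_boxes, gt_boxes)
            + w_seg * segmentation_loss(pred_probs, gt_labels))

# Toy example: one predicted box and one labeled point.
total = multitask_loss(np.array([[0.5, 0.5]]), np.array([[0.0, 1.0]]),
                       np.array([[0.8, 0.2]]), np.array([0]))
print(round(total, 4))
```

Because both task losses share the backbone's point cloud feature vector, one backward pass through the combined loss updates the backbone and both branches together, which is the usual motivation for this multi-task arrangement.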
  8. A point cloud detection and segmentation apparatus, comprising:
    a columnar voxelization module, configured to perform columnar voxelization on a point cloud to be processed to obtain several columnar voxels constituting the point cloud to be processed;
    a voxel feature acquisition module, configured to perform feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed;
    a bird's-eye-view feature mapping module, configured to map the voxel features to a bird's-eye view to obtain bird's-eye-view features corresponding to the point cloud to be processed;
    a point cloud feature extraction module, configured to perform feature extraction on the bird's-eye-view features through a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector; and
    a point cloud detection and segmentation module, configured to perform target detection based on the point cloud feature vector through a point cloud detection network branch of the multi-task neural network and output a point cloud detection result, and to perform point cloud segmentation based on the point cloud feature vector through a point cloud segmentation network branch of the multi-task neural network and output a point cloud segmentation result.
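The front end of the claimed apparatus (columnar voxelization, per-voxel feature extraction, and the mapping to a bird's-eye view) can be sketched as below. This is a simplified illustration only: the grid size, cell size, and the channel-wise max used as the per-pillar feature are assumptions standing in for the learned feature extraction of the claim:

```python
import numpy as np

def columnar_voxelize(points, grid=(4, 4), cell=1.0):
    """Group points (N, 3) into columnar voxels ("pillars") over the x-y
    plane. Each pillar spans the full z extent, so a single (ix, iy) cell
    index identifies it. Returns {cell index: points in that pillar}."""
    pillars = {}
    for p in points:
        ix, iy = int(p[0] // cell), int(p[1] // cell)
        if 0 <= ix < grid[0] and 0 <= iy < grid[1]:
            pillars.setdefault((ix, iy), []).append(p)
    return {k: np.array(v) for k, v in pillars.items()}

def pillar_features(pillars):
    """Toy per-pillar feature: channel-wise max over the pillar's points
    (a stand-in for the claim's learned feature extraction and mapping)."""
    return {k: pts.max(axis=0) for k, pts in pillars.items()}

def scatter_to_bev(features, grid=(4, 4), dim=3):
    """Map each pillar's feature back onto a dense bird's-eye-view grid
    (H, W, C); empty cells stay zero."""
    bev = np.zeros((grid[0], grid[1], dim))
    for (ix, iy), f in features.items():
        bev[ix, iy] = f
    return bev

points = np.array([[0.2, 0.3, 1.0], [0.4, 0.1, 2.0], [2.5, 3.5, 0.5]])
bev = scatter_to_bev(pillar_features(columnar_voxelize(points)))
print(bev.shape)   # (4, 4, 3)
print(bev[0, 0])   # [0.4 0.3 2. ] — max over the two points in pillar (0, 0)
```

The resulting dense bird's-eye-view tensor is what a 2D convolutional backbone can then consume to produce the point cloud feature vector shared by the detection and segmentation branches.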
  9. An electronic device, comprising a memory, a processor, and program code stored in the memory and executable on the processor, wherein the processor, when executing the program code, implements the point cloud detection and segmentation method according to any one of claims 1 to 7.
  10. A computer-readable storage medium having program code stored thereon, wherein the program code, when executed by a processor, implements the steps of the point cloud detection and segmentation method according to any one of claims 1 to 7.
  11. A computer program product comprising computer-readable code which, when run on an electronic device, causes the electronic device to perform the point cloud detection and segmentation method according to any one of claims 1 to 7.
PCT/CN2022/117322 2022-04-06 2022-09-06 Point cloud detection and segmentation method and apparatus, and electronic device WO2023193400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210353486.1 2022-04-06
CN202210353486.1A CN114820463A (en) 2022-04-06 2022-04-06 Point cloud detection and segmentation method and device, and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023193400A1 true WO2023193400A1 (en) 2023-10-12

Family

ID=82533341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117322 WO2023193400A1 (en) 2022-04-06 2022-09-06 Point cloud detection and segmentation method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114820463A (en)
WO (1) WO2023193400A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820463A (en) * 2022-04-06 2022-07-29 合众新能源汽车有限公司 Point cloud detection and segmentation method and device, and electronic equipment
CN115358413A (en) * 2022-09-14 2022-11-18 清华大学 Point cloud multitask model training method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476822A (en) * 2020-04-08 2020-07-31 浙江大学 Laser radar target detection and motion tracking method based on scene flow
CN111862101A (en) * 2020-07-15 2020-10-30 西安交通大学 3D point cloud semantic segmentation method under aerial view coding visual angle
CN114140470A (en) * 2021-12-07 2022-03-04 群周科技(上海)有限公司 Ground object semantic segmentation method based on helicopter airborne laser radar
CN114820463A (en) * 2022-04-06 2022-07-29 合众新能源汽车有限公司 Point cloud detection and segmentation method and device, and electronic equipment

Also Published As

Publication number Publication date
CN114820463A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2023193400A1 (en) Point cloud detection and segmentation method and apparatus, and electronic device
US11037305B2 (en) Method and apparatus for processing point cloud data
US9424493B2 (en) Generic object detection in images
US9147255B1 (en) Rapid object detection by combining structural information from image segmentation with bio-inspired attentional mechanisms
Liu et al. Fg-net: A fast and accurate framework for large-scale lidar point cloud understanding
CN111242122B (en) Lightweight deep neural network rotating target detection method and system
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
WO2021217924A1 (en) Method and apparatus for identifying vehicle type at traffic checkpoint, and device and storage medium
CN112016638B (en) Method, device and equipment for identifying steel bar cluster and storage medium
Shen et al. Vehicle detection in aerial images based on lightweight deep convolutional network and generative adversarial network
Karim et al. A brief review and challenges of object detection in optical remote sensing imagery
CN113762003B (en) Target object detection method, device, equipment and storage medium
Guo et al. DF-SSD: a deep convolutional neural network-based embedded lightweight object detection framework for remote sensing imagery
WO2019100348A1 (en) Image retrieval method and device, and image library generation method and device
Shao et al. Semantic segmentation for free space and lane based on grid-based interest point detection
Zhang et al. Recognition of bird nests on power transmission lines in aerial images based on improved YOLOv4
CN112200191B (en) Image processing method, image processing device, computing equipment and medium
CN114115993A (en) Device for use in a processing apparatus and device and method for an artificial neural network
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
CN115620081A (en) Training method of target detection model, target detection method and device
CN114627183A (en) Laser point cloud 3D target detection method
Marine et al. Pothole Detection on Urban Roads Using YOLOv8
Li et al. An FPGA-based tree crown detection approach for remote sensing images
Yang et al. Improved YOLOv4 based on dilated coordinate attention for object detection
CN116152345B (en) Real-time object 6D pose and distance estimation method for embedded system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22936328

Country of ref document: EP

Kind code of ref document: A1