CN114820463A - Point cloud detection and segmentation method and device, and electronic equipment - Google Patents
- Publication number
- CN114820463A (application CN202210353486.1A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- voxel
- segmentation
- voxels
- cylindrical
- Prior art date
- Legal status: Pending
Classifications
- G06T7/0002—Image analysis; Inspection of images, e.g. flaw detection
- G06F18/241—Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045—Neural networks; Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/08—Neural networks; Learning methods
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20112—Image segmentation details
Abstract
The application discloses a point cloud detection and segmentation method, belongs to the field of computer technologies, and helps to improve the efficiency of point cloud detection and segmentation. The method comprises the following steps: performing cylindrical voxelization on the point cloud to be processed to obtain a plurality of cylindrical voxels forming the point cloud to be processed; performing feature extraction and mapping on the plurality of cylindrical voxels to obtain voxel features of the point cloud to be processed, and mapping the voxel features to a bird's-eye view to obtain bird's-eye-view features corresponding to the point cloud to be processed; performing feature extraction on the bird's-eye-view features through a pre-trained backbone network of a multitask neural network to obtain a point cloud feature vector; and performing point cloud detection and point cloud segmentation based on the point cloud feature vector through the point cloud detection network branch and the point cloud segmentation network branch of the multitask neural network, respectively. By reducing the repeated operations of extracting point cloud features, the efficiency of point cloud detection and point cloud segmentation is improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for point cloud detection and segmentation, and an electronic device and a computer-readable storage medium.
Background
Point cloud data refers to a collection of vectors in a three-dimensional coordinate system. Spatial information is recorded in the form of points, each of which contains three-dimensional coordinates. Depending on the acquisition capability of the point cloud acquisition device, some point cloud data may further include color information (RGB), reflection intensity information (intensity), and the like. Taking point cloud data acquired by a lidar as an example, the data comprise the position coordinates and the reflection intensity of points in three-dimensional space. Point cloud data are widely used for detecting and identifying target objects, for example in the fields of automobile autonomous driving and unmanned vehicles. In such applications, point cloud detection and segmentation techniques are generally used to perform target detection and point cloud segmentation on the point cloud data: point cloud detection locates target objects in the scene represented by the point cloud data, and point cloud segmentation identifies the category of the target object matched with each point in the point cloud data, so that automatic driving control can be performed subsequently.
In the prior art, different network models are usually adopted to perform the point cloud detection task and the point cloud segmentation task separately. Because point cloud data are sparse and irregular, the detection network and the segmentation network usually adopted are relatively complex in structure, so that the amount of computation required to obtain the point cloud detection result and the point cloud segmentation result is very high.
It can be seen that the point cloud detection and segmentation methods in the prior art need to be improved.
Disclosure of Invention
The embodiment of the application provides a point cloud detection and segmentation method which is beneficial to improving the efficiency of point cloud detection and point cloud segmentation.
In a first aspect, an embodiment of the present application provides a point cloud detection and segmentation method, including:
performing cylindrical voxelization processing on the point cloud to be processed to obtain a plurality of cylindrical voxels forming the point cloud to be processed;
extracting and mapping the characteristics of the plurality of columnar voxels to obtain the voxel characteristics of the point cloud to be processed;
mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the point cloud to be processed;
extracting the characteristics of the aerial view through a pre-trained backbone network of a multitask neural network to obtain a point cloud characteristic vector;
performing target object detection based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network, and outputting a point cloud detection result; and performing point cloud segmentation on the basis of the point cloud characteristic vector through a point cloud segmentation network branch of the multitask neural network, and outputting a point cloud segmentation result.
In a second aspect, an embodiment of the present application provides a point cloud detection and segmentation apparatus, including:
the system comprises a cylindrical voxelization module, a processing module and a processing module, wherein the cylindrical voxelization module is used for performing cylindrical voxelization processing on point cloud to be processed to obtain a plurality of cylindrical voxels forming the point cloud to be processed;
the voxel characteristic acquisition module is used for extracting and mapping the characteristics of the plurality of columnar voxels to acquire the voxel characteristics of the point cloud to be processed;
the aerial view characteristic mapping module is used for mapping the voxel characteristics to an aerial view to obtain aerial view characteristics corresponding to the point cloud to be processed;
the point cloud feature extraction module is used for extracting features of the aerial view features through a pre-trained backbone network of a multitask neural network to obtain point cloud feature vectors;
the point cloud detection and segmentation module is used for detecting a target object based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network and outputting a point cloud detection result; and performing point cloud segmentation on the basis of the point cloud characteristic vector through a point cloud segmentation network branch of the multitask neural network, and outputting a point cloud segmentation result.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the point cloud detection and segmentation method described in the embodiment of the present application when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, performs the steps of the point cloud detection and segmentation method disclosed in the embodiments of the present application.
According to the point cloud detection and segmentation method disclosed by the embodiment of the application, a plurality of cylindrical voxels forming the point cloud to be processed are obtained by performing cylindrical voxelization processing on the point cloud to be processed; then, extracting and mapping the characteristics of the plurality of columnar voxels to obtain voxel characteristics of the point cloud to be processed, and mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the point cloud to be processed; finally, extracting the characteristics of the aerial view through a pre-trained backbone network of a multitask neural network to obtain a point cloud characteristic vector; performing target object detection based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network, and outputting a point cloud detection result; and performing point cloud segmentation based on the point cloud characteristic vector through the point cloud segmentation network branch of the multitask neural network, outputting a point cloud segmentation result, and contributing to improving the efficiency of point cloud detection and point cloud segmentation.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a schematic flow chart of a point cloud detection and segmentation method according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a point cloud voxelization processing effect in the first embodiment of the present application
FIG. 3 is a diagram illustrating a multitasking neural network architecture used in one embodiment of the present application;
FIG. 4 is a schematic diagram of mapping a point cloud segmentation result according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a point cloud detection and segmentation apparatus according to a second embodiment of the present application;
FIG. 6 is a second schematic view of the point cloud detection and segmentation apparatus according to the second embodiment of the present application
FIG. 7 schematically shows a block diagram of an electronic device for performing a method according to the present application; and
fig. 8 schematically shows a storage unit for holding or carrying program code implementing a method according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Example one
As shown in fig. 1, a method for point cloud detection and segmentation disclosed in the embodiments of the present application includes: step 110 to step 150.
Step 110, performing cylindrical voxelization processing on the point cloud to be processed to obtain a plurality of cylindrical voxels forming the point cloud to be processed.
The point cloud to be processed in the embodiment of the application refers to the point cloud within a region of interest of the point cloud acquired by a point cloud acquisition device (such as a lidar sensor).
Taking the application of the point cloud detection and segmentation method described in the embodiment of the present application to an automobile autonomous driving scene as an example, the original point cloud collected by a lidar sensor mounted on the vehicle is an unordered set of points, where each point can be represented by 4-dimensional data (x, y, z, i), in which x, y, z are the spatial position coordinates of the point and i represents its reflection intensity.
For an original point cloud acquired by a point cloud acquisition device, point cloud preprocessing is first required to obtain a point set meeting the requirements. For example, NaN values (null values) are removed from the original point cloud, or points with abnormally large values are removed to filter point cloud noise. Specific implementations of point cloud preprocessing can be found in the prior art; the technical scheme adopted for point cloud preprocessing in the embodiment of the application is not limited and is not further described here.
The point cloud collected by the point cloud collection device (such as a lidar sensor) consists of points in an irregular three-dimensional spatial region, and a point cloud in a regular spatial region needs to be determined before the point cloud is detected and segmented. For example, by defining coordinate ranges in the x, y and z directions, the points inside a large cuboid region are kept and the rest are discarded; the size of this cuboid region can be expressed as [xmax − xmin, ymax − ymin, zmax − zmin], where xmax and xmin respectively represent the maximum and minimum coordinates in the x direction, ymax and ymin the maximum and minimum coordinates in the y direction, and zmax and zmin the maximum and minimum coordinates in the z direction.
Further, the data of the points in the region of interest in the large cube region determined in the foregoing is acquired, so that the point cloud in the region of interest can be conveniently subjected to point cloud detection and point cloud segmentation in the following process. In some embodiments of the present application, the coordinates of a point within the region of interest may be represented by (x, y, z), where xmin < x < xmax, ymin < y < ymax, zmin < z < zmax, in meters.
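A minimal illustrative sketch (not part of the original disclosure) of the region-of-interest cropping described above, written with NumPy; the function name and the random test data are assumptions used only for illustration.

```python
import numpy as np

def crop_region_of_interest(points, xmin, xmax, ymin, ymax, zmin, zmax):
    """Keep only the points whose (x, y, z) coordinates fall inside the
    region of interest; `points` is an (N, 4) array of (x, y, z, i)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = (
        (x > xmin) & (x < xmax) &
        (y > ymin) & (y < ymax) &
        (z > zmin) & (z < zmax)
    )
    return points[mask]

# Example with bounds similar to those used later in the text (in meters).
raw_points = np.random.rand(1000, 4).astype(np.float32) * 100.0
roi_points = crop_region_of_interest(raw_points, 0.0, 102.4, 0.0, 50.0, 0.0, 100.0)
```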
In some embodiments of the present application, the points in the region of interest are determined according to the point cloud quality. For example, the point cloud at positions far from the vehicle is sparse and only a few points hit a distant target; a minimum number of points can therefore be set to a small value (for example, 5 points), the corresponding points are found according to this number, and the spatial region is determined according to the point with the largest distance. In some embodiments of the present application, for the same point cloud quality (e.g., point clouds collected by the same point cloud collection device), this distance may be predetermined from the quality of the collected point cloud data and need not change during application.
The method for determining the region of interest may refer to a method for determining the region of interest adopted in a point cloud detection or point cloud segmentation scheme in the prior art, and in the embodiment of the present application, a specific implementation manner for determining the region of interest is not limited.
Because the point cloud collected by the point cloud collection device comprises a large number of points, point-based feature extraction consumes a large amount of computing resources when used for point cloud detection and point cloud segmentation. Therefore, in the embodiment of the application, the original point cloud is first voxelized and feature extraction is then performed on the voxels, which effectively reduces the amount of data to be processed and saves computing resources.
In some embodiments of the present application, the performing a cylindrical voxelization process on the point cloud to be processed to obtain a plurality of cylindrical voxels constituting the point cloud to be processed includes: and dividing points in the point cloud to be processed into a plurality of cylindrical voxels according to the coordinate distribution of the first coordinate axis and the second coordinate axis. In some embodiments of the present application, the first coordinate axis and the second coordinate axis are two different coordinate axes of a three-dimensional space coordinate system, and the cylindrical voxels are prism-shaped voxels. For example, after the point cloud shown on the left side of fig. 2 is subjected to voxelization, a rectangular parallelepiped voxel (i.e., a cylindrical voxel) 210 shown on the right side of fig. 2 can be obtained.
With the first coordinate axis as the x-axis and the second coordinate axis as the y-axis, the points in the region of interest can be divided into cuboid voxels along the x-axis and y-axis directions respectively, while the z-axis direction is not divided. The size of each voxel obtained by the division can be expressed as [x_v, y_v, zmax − zmin], where x_v represents the length of a voxel in the x-axis direction, y_v represents the length of the voxel in the y-axis direction, and zmax − zmin represents the height of the voxel in the z-axis direction, in meters. According to the above method for generating the cylindrical voxels, a region of interest can be divided into W × H cylindrical voxels.
taking the case where x is (0,102.4), y is (0,50), z is (0,100), and the size of the voxel is 0.2 × 0.2 × 100 in the region of interest, the number w of voxels in the x-axis direction is equal to (102.4-0)/0.2 × 512, and the number H of voxels in the y-axis direction is equal to (50-0)/0.2 × 250, and the region of interest is divided into 512 × 250 voxels. Subsequently, these cylindrical voxels are treated as image pixels for feature extraction of the region of interest. In some embodiments of the present application, after the voxelization process, the point cloud of the region of interest may be represented as a W × H × 1 voxel image, with dimensions of the voxel image being W × H × 1.
In some embodiments of the present application, the size of the cylindrical voxels is determined experimentally. For example, several voxel sizes may be preset, point cloud detection and point cloud segmentation experiments may be performed with each of them, the influence of the voxel size on the detection and segmentation results and performance may be analyzed, and the optimal voxel size may be determined.
In some embodiments of the present application, after performing a cylindrical voxelization process on the point cloud to be processed and obtaining a plurality of cylindrical voxels constituting the point cloud to be processed, the method further includes: obtaining a first point cloud segmentation label of the plurality of cylindrical voxels, wherein the first point cloud segmentation label comprises: location information for each of the cylindrical voxels. The first point cloud segmentation label of the voxel image, that is, the first point cloud segmentation label of the W × H voxels may be represented by a position information table having a size of W × H, for example, (W, H, 1). And the first point cloud segmentation label is used for determining the segmentation result of the point cloud according to the segmentation result of the cylindrical voxels.
In some embodiments of the present application, the obtaining a first point cloud segmentation label of the plurality of cylindrical voxels includes: for each of the cylindrical voxels, using the position information of the cylindrical voxel as a first point cloud segmentation label matched with the cylindrical voxel. For example, the first point cloud partition label may be represented by a position information table having a size of W × H, for example, (W, H, 1). The position information table includes W × H sets of position information, each set of position information corresponds to one voxel, and each set of position information is used to indicate a coordinate range of the corresponding voxel on the x axis and the y axis, for example. It follows that each set of location information may also be used to represent a range of coordinates of points in the point cloud divided into the voxels of the histogram corresponding to the set of location information. In some embodiments of the present application, a mapping relationship between a point in the point cloud and a corresponding voxel may be established by recording a coordinate range of the corresponding voxel in the position information table. In other embodiments of the present application, other ways may also be used to establish the mapping relationship between the points in the point cloud and the voxels of the histogram. In the embodiment of the present application, the concrete expression form of the mapping relationship is not limited.
Step 120, extracting and mapping the features of the plurality of cylindrical voxels to obtain the voxel features of the point cloud to be processed.
After a plurality of cylindrical voxels forming a point cloud to be processed (such as the point cloud of the region of interest) are acquired, the cylindrical voxels can be regarded as pixels of an image, a voxel image formed by the plurality of cylindrical voxels is subjected to feature extraction and mapping, and the features of the voxel image are acquired.
In some embodiments of the present application, the extracting and mapping the features of the plurality of cylindrical voxels to obtain the voxel features of the point cloud to be processed includes: for each cylindrical voxel, acquiring the center point of all the points divided into the cylindrical voxel, and calculating the coordinate distance between each point divided into the cylindrical voxel and the center point; for each cylindrical voxel, joining the point features of all the points divided into the cylindrical voxel into the voxel feature of the cylindrical voxel, wherein the point feature of each point includes: the position coordinates and reflection intensity information of the point; splicing the voxel features of the cylindrical voxels to obtain the spliced features of the plurality of cylindrical voxels; and performing feature mapping on the spliced features to obtain the voxel features of the point cloud to be processed.
For example, each of the cylindrical voxels obtained in the previous step contains a certain number of points. Taking a cylindrical voxel containing K points as an example, first the mean of the position coordinates of the K points is calculated from the original point cloud data and used as the coordinates of the center point of the K points; then the coordinates of the center point are subtracted from the position coordinates of each of the K points to obtain the offsets (x_c, y_c, z_c), which represent the coordinate distance between a point in the cylindrical voxel and the center point; each point in the cylindrical voxel is then represented by the point feature (x, y, z, i, x_c, y_c, z_c). Thus, a voxel containing K points can be characterized by a feature of size K × 7, i.e., the voxel feature may be represented by the point features of all the points it contains.
Further, for a point cloud to be processed (such as the point cloud of the foregoing region of interest) including N voxels, voxel characteristics of the point cloud to be processed may be represented by characteristics of the N voxels obtained after the point cloud to be processed is subjected to voxelization. For example, for a point cloud to be processed including N voxels, features (such as the features with the length of K × 7) of the N voxels obtained after the point cloud to be processed is subjected to voxelization processing are spliced, so as to obtain a spliced feature with the length of N × K × 7.
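The following sketch illustrates, under the assumptions above, how the 7-dimensional point features (x, y, z, i, x_c, y_c, z_c) of one cylindrical voxel could be computed; padding or sampling each voxel to a fixed K points is an assumption made for batching, not a requirement of the application.

```python
import numpy as np

def pillar_point_features(pillar_points):
    """pillar_points: (K, 4) array of (x, y, z, i) for one cylindrical voxel.
    Returns a (K, 7) feature (x, y, z, i, x_c, y_c, z_c), where
    (x_c, y_c, z_c) is the offset of each point from the voxel's center point."""
    center = pillar_points[:, :3].mean(axis=0)     # center of the K points
    offsets = pillar_points[:, :3] - center        # (x_c, y_c, z_c)
    return np.concatenate([pillar_points, offsets], axis=1)

# The per-voxel features of N non-empty voxels can then be stacked into an
# (N, K, 7) tensor, padding or sampling each voxel to a fixed K as needed.
K = 32
pillar = np.random.rand(K, 4).astype(np.float32)
features = pillar_point_features(pillar)           # shape (K, 7)
```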
In some embodiments of the present application, a voxel may be discarded if there are no points in the voxel.
Then, feature mapping is further performed on the obtained spliced features to obtain voxel features of a preset dimension for the point cloud to be processed. For example, the spliced features of the N cylindrical voxels may be feature-mapped through a pre-trained feature extraction network to obtain features of size N × D, where D represents the feature dimension of each cylindrical voxel. In some embodiments of the present application, the feature extraction network may be constructed by serially connecting a fully connected layer, a normalization layer, and a one-dimensional max pooling layer (MaxPool1D), finally outputting features of N × D dimensions, where D is the output dimension of the fully connected layer.
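A possible PyTorch sketch of such a feature extraction network (fully connected layer, normalization, max pooling over the K points of each voxel); the class name and the dimensions are assumptions, not values fixed by the application.

```python
import torch
from torch import nn

class PillarFeatureEncoder(nn.Module):
    """Map (N, K, 7) stacked voxel point features to (N, D) voxel features with
    a fully connected layer, batch normalization and a max pool over the K points."""
    def __init__(self, in_dim=7, out_dim=64):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, x):                      # x: (N, K, in_dim)
        x = self.fc(x)                         # (N, K, D)
        x = self.bn(x.transpose(1, 2))         # normalize over the D channels
        x = torch.max(x, dim=2).values         # max over the K points -> (N, D)
        return x

encoder = PillarFeatureEncoder(in_dim=7, out_dim=64)
voxel_features = encoder(torch.rand(1000, 32, 7))   # (1000, 64)
```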
Step 130, mapping the voxel features to a bird's-eye view to obtain the bird's-eye-view features corresponding to the point cloud to be processed.
In some embodiments of the present application, the mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed includes: acquiring the number of points included in each cylindrical voxel according to the position information of each cylindrical voxel in the first point cloud segmentation label; for each columnar voxel, mapping the characteristics corresponding to the columnar voxel in the voxel characteristics to the corresponding position of the aerial view matched with the first point cloud segmentation label according to the number of points included in the columnar voxel to obtain the aerial view characteristics corresponding to the point cloud to be processed; wherein the mapping, according to the number of points included in the voxel histogram, a feature corresponding to the voxel histogram among the voxel features to a corresponding position of the aerial view matched with the first point cloud segmentation label includes: under the condition that the number of points included in the columnar voxels is larger than 0, mapping a feature vector corresponding to the columnar voxels in the voxel features to corresponding positions of a bird's eye view matched with the first point cloud segmentation label; setting a feature vector at a corresponding position of the bird's eye view that matches the first point cloud segmentation label to 0 in a case where the number of points included in the voxel is equal to 0.
As described above, each cylindrical voxel corresponds to one piece of label data (i.e., one set of position information) in the first point cloud segmentation label; for example, in the first point cloud segmentation label of size W × H, the first piece of label data corresponds to the cylindrical voxel whose coordinate range is (0, 0) to (0.2, 0.2). In this step, a bird's-eye view matching the dimension of the first point cloud segmentation label may be initialized, for example a bird's-eye view of size W × H, so that there are W × H pixels on the bird's-eye view, the features of each pixel are represented by a D-dimensional feature vector, and each pixel corresponds to one cylindrical voxel. In this way, for the voxel features of size N × D obtained in the above step, the feature vector of each of the N cylindrical voxels is mapped to the corresponding position on the bird's-eye view, and a bird's-eye-view feature of size W × H × D is obtained.
In some embodiments of the present application, due to the sparsity of the point cloud, some cylindrical voxels may not contain any points; therefore, when performing the feature mapping, the feature vectors of the positions on the bird's-eye view corresponding to cylindrical voxels that contain no points may be set to zero vectors.
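An illustrative scatter of the N × D voxel features onto the W × H × D bird's-eye view, with zero vectors at the positions of empty voxels; the function signature and the random test data are assumptions.

```python
import torch

def scatter_to_bev(voxel_features, pillar_coords, grid_w, grid_h):
    """voxel_features: (N, D) features of the N non-empty cylindrical voxels.
    pillar_coords:  (N, 2) integer (ix, iy) grid coordinates of those voxels.
    Returns a (grid_w, grid_h, D) bird's-eye-view feature map in which the
    positions of empty voxels keep an all-zero feature vector."""
    d = voxel_features.shape[1]
    bev = torch.zeros(grid_w, grid_h, d, dtype=voxel_features.dtype)
    bev[pillar_coords[:, 0], pillar_coords[:, 1]] = voxel_features
    return bev

feats = torch.rand(1000, 64)
coords = torch.randint(0, 250, (1000, 2))
bev_map = scatter_to_bev(feats, coords, grid_w=512, grid_h=250)   # (512, 250, 64)
```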
Step 140, extracting features from the bird's-eye-view features through a pre-trained backbone network of the multitask neural network to obtain a point cloud feature vector.
In some embodiments of the present application, as shown in fig. 3, the multitask neural network comprises: a backbone network 310, a point cloud detection network branch 320, and a point cloud segmentation network branch 330.
The backbone network 310 may adopt a convolutional neural network commonly used in the prior art. For example, in some embodiments of the present application, as shown in fig. 3, the backbone network 310 further includes three cascaded feature extraction modules of different scales and a feature concatenation layer (ConCat), wherein each feature extraction module comprises a number of feature mapping modules (CBR), an upsampling layer, and a further feature mapping module (CBR). The numbers of feature mapping modules (CBR) in the three feature extraction modules can be 4, 6 and 6 respectively, and each feature mapping module (CBR) can be formed by cascading a convolutional layer, a batch normalization layer and a ReLU activation function. Taking an input feature of size W × H as an example, the three feature extraction modules output features at three successively smaller scales, and the feature concatenation layer splices the features output by the three feature extraction modules. After the bird's-eye-view features of size W × H × D are input into the backbone network 310, the three feature extraction modules respectively perform convolution, upsampling, normalization and activation on them; after the outputs are spliced by the feature concatenation layer, a point cloud feature vector with C feature channels is obtained, where C is the number of feature channels.
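A rough PyTorch sketch of a backbone of this style (three cascades of CBR modules at decreasing scales, upsampling back to a common scale, then concatenation); the block counts, channel widths and the placement of the upsampling layers are assumptions and do not reproduce the exact structure of fig. 3.

```python
import torch
from torch import nn

def cbr(in_ch, out_ch, stride=1):
    """Feature mapping module (CBR): convolution + batch normalization + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Backbone(nn.Module):
    """Three feature extraction stages at decreasing scales; each stage stacks
    several CBR blocks, the deeper stages are upsampled back to the first
    stage's scale, and the three outputs are concatenated (ConCat)."""
    def __init__(self, in_ch=64, ch=64):
        super().__init__()
        self.stage1 = nn.Sequential(cbr(in_ch, ch, 2), *[cbr(ch, ch) for _ in range(3)])
        self.stage2 = nn.Sequential(cbr(ch, 2 * ch, 2), *[cbr(2 * ch, 2 * ch) for _ in range(5)])
        self.stage3 = nn.Sequential(cbr(2 * ch, 4 * ch, 2), *[cbr(4 * ch, 4 * ch) for _ in range(5)])
        self.up2 = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.up3 = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)

    def forward(self, bev):                    # bev: (B, D, W, H)
        f1 = self.stage1(bev)                  # 1/2 scale
        f2 = self.stage2(f1)                   # 1/4 scale
        f3 = self.stage3(f2)                   # 1/8 scale
        return torch.cat([f1, self.up2(f2), self.up3(f3)], dim=1)

features = Backbone()(torch.rand(1, 64, 512, 256))   # (1, 448, 256, 128)
```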
Step 150, detecting a target object based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network, and outputting a point cloud detection result; and performing point cloud segmentation on the basis of the point cloud characteristic vector through a point cloud segmentation network branch of the multitask neural network, and outputting a point cloud segmentation result.
Next, the point cloud feature vector output by the backbone network 310 is input to the point cloud detection network branch 320 and the point cloud segmentation network branch 330, and the two network branches perform the subsequent processing.
The following describes the execution of the point cloud detection task and the point cloud segmentation task with reference to the network structures of the point cloud detection network branch 320 and the point cloud segmentation network branch 330, respectively.
In some embodiments of the present application, the point cloud detection network branch 320 includes four detection heads, which respectively output the thermodynamic-diagram prediction of whether a target exists, the position of the detected target object, the size of the target object, and the rotation angle of the target object. In some embodiments of the present application, each detection head included in the point cloud detection network branch 320 is formed by cascading a feature extraction module and a convolutional layer, wherein the feature extraction module further comprises a convolutional layer, a batch normalization layer and an activation function. Each detection head performs feature coding and transformation mapping on the input point cloud feature vector and finally outputs the corresponding prediction result. For example, the detection head for the thermodynamic diagram makes a prediction for each position in the point cloud feature vector and outputs whether the corresponding position is a key point on the thermodynamic diagram; the detection head for the target position makes predictions on the point cloud feature vector and outputs the position (x, y, z) of the detected target object; the detection head for the target size outputs the size (dx, dy, dz) of the target object; and the detection head for the rotation angle outputs the rotation angle θ of the target object.
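A hedged sketch of such a four-head detection branch; the intermediate channel widths, the number of object classes and the sigmoid on the heatmap are assumptions made for illustration.

```python
import torch
from torch import nn

class DetectionHead(nn.Module):
    """One detection head: a feature extraction module (conv + BN + activation)
    followed by a convolution producing `out_ch` prediction channels."""
    def __init__(self, in_ch, out_ch, mid_ch=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1),
        )

    def forward(self, x):
        return self.block(x)

class DetectionBranch(nn.Module):
    """Four heads: keypoint heatmap, object position (x, y, z),
    object size (dx, dy, dz) and rotation angle theta."""
    def __init__(self, in_ch=448, num_classes=3):
        super().__init__()
        self.heatmap = DetectionHead(in_ch, num_classes)
        self.position = DetectionHead(in_ch, 3)
        self.size = DetectionHead(in_ch, 3)
        self.angle = DetectionHead(in_ch, 1)

    def forward(self, x):
        return {
            "heatmap": torch.sigmoid(self.heatmap(x)),
            "position": self.position(x),
            "size": self.size(x),
            "angle": self.angle(x),
        }

preds = DetectionBranch()(torch.rand(1, 448, 256, 128))
```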
In some embodiments of the present application, as shown in fig. 3, the point cloud segmentation network branch 330 is composed of an upsampling module, a feature extraction module and a convolution layer cascade, wherein the feature extraction module is further composed of a convolution layer, a batch normalization layer and an activation function. The upsampling layer performs upsampling processing on the point cloud feature vector output by the backbone network 310, performs feature conversion and mapping on the vector obtained by the upsampling processing by the convolutional layer, the batch normalization layer and the activation function in sequence, and finally outputs a segmentation result corresponding to the cylindrical voxel through the convolutional layer.
Taking the point cloud feature vector output by the backbone network as input, the point cloud segmentation network branch 330 performs upsampling, convolution, batch normalization, activation mapping and the like on it, and finally outputs data of dimension (W, H, n_classes), where W and H are the width and height of the output corresponding to the input feature map, and n_classes is the number of point cloud semantic categories. Taking the case where the point cloud to be processed is voxelized into W × H cylindrical voxels, W is 512, H is 512, and the number of point cloud semantic categories is 11 as an example, the size of the output data of the point cloud segmentation network branch 330 is 512 × 512 × 11: at each of the 512 × 512 positions there is a group of 11 segmentation-result prediction values, each between 0 and 1 and summing to 1, which represent the probability that the corresponding cylindrical voxel belongs to each point cloud semantic category. Further, the point cloud semantic category with the maximum probability value can be taken as the point cloud semantic category matched with the corresponding cylindrical voxel.
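An illustrative sketch of a segmentation branch of this form (upsampling, a conv-BN-activation module, a final convolution, softmax over n_classes, then an argmax per cylindrical voxel); the channel widths and the upsampling factor are assumptions.

```python
import torch
from torch import nn

class SegmentationBranch(nn.Module):
    """Upsampling, a feature extraction module (conv + BN + activation) and a
    final convolution that outputs one score per point cloud semantic category
    for every cylindrical voxel position of the bird's-eye-view grid."""
    def __init__(self, in_ch=448, n_classes=11, mid_ch=64, scale=2):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, n_classes, 1),
        )

    def forward(self, x):                       # x: (B, C, W', H')
        logits = self.block(self.up(x))         # (B, n_classes, W, H)
        return torch.softmax(logits, dim=1)     # per-voxel category probabilities

probs = SegmentationBranch()(torch.rand(1, 448, 256, 128))   # (1, 11, 512, 256)
labels = probs.argmax(dim=1)                                  # category index per voxel
```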
In some embodiments of the present application, the point cloud semantic categories are determined according to the specific application scenario. For example, for point clouds captured in automobile autonomous driving applications, the point cloud semantic categories may include, but are not limited to, any one or more of: buildings, green plants, ground, fences, road edges, lane lines, vehicles, and the like.
In this way, point cloud segmentation processing is performed based on the point cloud feature vector through a point cloud segmentation network branch of the multitask neural network, and the point cloud segmentation network branch outputs a point cloud segmentation result of the plurality of cylindrical voxels matched with the point cloud feature vector (namely, all the cylindrical voxels obtained after the point cloud to be processed is subjected to voxelization processing).
As is apparent from the above description, the segmentation result output by the point cloud segmentation network branch is obtained by performing semantic segmentation on the features projected onto the bird's-eye view, while subsequent point cloud data processing needs the segmentation result of the points in the point cloud; therefore, the voxel-based segmentation result needs to be converted into a segmentation result of the points in the point cloud. In some embodiments of the present application, the point cloud segmentation result comprises the point cloud semantic category matched with each cylindrical voxel, and the performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multitask neural network and outputting the point cloud segmentation result further comprises: mapping the point cloud semantic category matched with each cylindrical voxel onto the point cloud to be processed according to the position information of the cylindrical voxel, to obtain the segmentation result of the points in the point cloud to be processed.
In some embodiments of the present application, the mapping the point cloud semantic category matched with each cylindrical voxel onto the point cloud to be processed according to the position information of the cylindrical voxel to obtain the segmentation result of the points in the point cloud to be processed includes: acquiring, according to the position information of each cylindrical voxel, the points of the point cloud to be processed contained in the cylindrical voxel; and, for each cylindrical voxel, taking the point cloud semantic category matched with the cylindrical voxel as the point cloud semantic category matched with the points contained in the cylindrical voxel. As can be seen from the foregoing description, each cylindrical voxel corresponds to a position under the bird's-eye view, and the segmentation result of the cylindrical voxel obtained from this mapping relationship can be regarded as the point cloud semantic segmentation result of the corresponding columnar area in the point cloud. As shown in fig. 4, each box in the bird's-eye view corresponds to one cylindrical voxel, and the segmentation result of the image position matched with each box in the bird's-eye view can be regarded as the segmentation result of the cylindrical voxel corresponding to the box. As can also be seen from the foregoing description, each cylindrical voxel corresponds to a spatial region in the point cloud to be processed, and this spatial region may contain 0 or more points; the segmentation result of each cylindrical voxel (i.e., the matched point cloud semantic category) can therefore be further used as the point cloud semantic category of each point contained in the cylindrical voxel, completing the semantic segmentation of the points in the point cloud. For example, for a cylindrical voxel with a coordinate range of (0, 0) to (0.2, 0.2), if the segmentation result of the voxel is "road edge", it may be determined that, in the point cloud to be processed, the point cloud semantic categories matched with the points whose coordinates fall within (0, 0) to (0.2, 0.2) are all "road edge".
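A minimal sketch of mapping the per-voxel categories back to the points, assuming the same pillar grid as in the earlier sketches; all names and test data are illustrative.

```python
import numpy as np

def labels_for_points(points, voxel_labels, xmin, ymin, x_v, y_v):
    """Assign every point the point cloud semantic category of the cylindrical
    voxel it falls into.  `voxel_labels` is a (W, H) array of category indices
    produced by the segmentation branch; `points` is (N, 4)."""
    ix = np.floor((points[:, 0] - xmin) / x_v).astype(np.int64)
    iy = np.floor((points[:, 1] - ymin) / y_v).astype(np.int64)
    ix = np.clip(ix, 0, voxel_labels.shape[0] - 1)
    iy = np.clip(iy, 0, voxel_labels.shape[1] - 1)
    return voxel_labels[ix, iy]

voxel_labels = np.random.randint(0, 11, size=(512, 250))
points = np.random.rand(1000, 4) * [102.4, 50.0, 100.0, 1.0]
point_labels = labels_for_points(points, voxel_labels, 0.0, 0.0, 0.2, 0.2)
```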
In order to facilitate the reader to better understand the point cloud detection and segmentation method disclosed in the embodiment of the present application, the following describes an example of the training scheme of the multitask neural network.
As previously mentioned, the pre-trained multitask neural network comprises a backbone network, a point cloud detection network branch and a point cloud segmentation network branch. In some embodiments of the present application, before the extracting features from the bird's-eye-view features through the pre-trained multitask neural network, the method further includes: training the multitask neural network based on a plurality of voxelized point cloud training samples. The voxelized point cloud training samples are constructed from the cylindrical voxels obtained by performing cylindrical voxelization on a plurality of point clouds respectively. For each voxelized point cloud training sample, the sample data comprise a number of cylindrical voxels, and the sample label comprises a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used for identifying the true value of the point cloud semantic category matched with each cylindrical voxel in the corresponding sample data, and the true value of the point cloud semantic category matched with a cylindrical voxel is: among the point cloud semantic categories covered by the points divided into the corresponding cylindrical voxel, the point cloud semantic category with the largest coverage.
When constructing a voxelized point cloud training sample, the specific implementation of generating the sample data refers to the corresponding implementation in the foregoing steps, for example the implementation of obtaining the point cloud to be processed and voxelizing it to obtain a plurality of cylindrical voxels, and is not repeated here.
Further, for all the cylindrical voxels obtained after each point cloud is voxelized, each cylindrical voxel contains a certain number of points whose point cloud semantic categories are manually labeled. For example, if a certain cylindrical voxel contains 3 points, the 3 points are each labeled with a point cloud semantic category (such as car, bicycle, tricycle, pedestrian, cone, green plant, ground, fence, road edge, lane line, etc.); if the labels of the points are (building, building, green plant), then "building", the category with the most points, is taken as the point cloud semantic category matched with the cylindrical voxel. After the point cloud is voxelized, the point cloud semantic categories matched with all the cylindrical voxels are arranged according to the voxel positions to obtain the point cloud semantic category label (namely the second point cloud segmentation label) matched with the sample data generated from the point cloud.
Taking sample data consisting of W × H cylindrical voxels as an example, the sample label of the sample data is a label matrix of size W × H, and each element in the label matrix is the identifier of the point cloud semantic category matched with the corresponding cylindrical voxel.
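A small sketch of the majority-vote rule used to build the second point cloud segmentation label; the category identifiers and the ignore index for empty voxels are assumptions.

```python
import numpy as np

def pillar_label_by_majority(point_labels_in_pillar, ignore_index=255):
    """Ground-truth segmentation label of one cylindrical voxel: the point cloud
    semantic category that covers the most points inside it (majority vote)."""
    if len(point_labels_in_pillar) == 0:
        return ignore_index
    values, counts = np.unique(point_labels_in_pillar, return_counts=True)
    return int(values[np.argmax(counts)])

# Three points labelled (building, building, green plant) -> "building" wins.
BUILDING, GREEN_PLANT = 0, 1
assert pillar_label_by_majority(np.array([BUILDING, BUILDING, GREEN_PLANT])) == BUILDING
```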
In some embodiments of the present application, the sample label further comprises a point cloud detection label used for identifying the true value of the target object detection result in the corresponding sample data. For example, for each point cloud used to generate a training sample, the key points of the target objects on the thermodynamic diagram and the spatial position coordinates, three-dimensional sizes and rotation angles of the target objects in the point cloud are manually annotated, and this annotation information is used as the point cloud detection label of the training sample generated from the point cloud.
In some embodiments of the present application, training a multitask neural network based on a plurality of voxelized point cloud training samples comprises: for each voxelized point cloud training sample, respectively executing the following point cloud detection and segmentation operations to obtain a point cloud detection result predicted value and a point cloud segmentation result predicted value of the corresponding voxelized point cloud training sample: performing feature extraction and mapping on a plurality of columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample; mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the voxel point cloud training sample; extracting the characteristics of the aerial view through the backbone network to obtain a point cloud characteristic vector; through the point cloud detection network branch, performing target object detection based on the point cloud feature vector, and outputting a point cloud detection result prediction value of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting a predicted value of a point cloud segmentation result of the voxelized point cloud training sample; and calculating the point cloud detection loss of the multitask neural network according to the point cloud detection result predicted value and the corresponding point cloud detection label of each voxelized point cloud training sample, calculating the point cloud segmentation loss of the multitask neural network according to the point cloud segmentation result predicted value and the corresponding second point cloud segmentation label of each voxelized point cloud training sample, and then iteratively training the multitask neural network by taking the point cloud detection loss and the point cloud segmentation loss as targets.
And performing feature extraction and mapping on a plurality of cylindrical voxels included in the voxelized point cloud training sample to obtain a specific implementation mode of the voxel features of the voxelized point cloud training sample, which is referred to in the foregoing specific implementation mode of extracting the voxel features of the point cloud to be processed, and is not described herein again.
And mapping the voxel characteristics to a bird's-eye view to obtain a specific embodiment of the bird's-eye view characteristics corresponding to the voxel point cloud training sample, which is described in the foregoing, and is not repeated herein.
And extracting features of the aerial view features through the backbone network to obtain a specific implementation of a point cloud feature vector, which is described in the foregoing, and is not repeated herein.
And through the point cloud detection network branch, performing target object detection based on the point cloud feature vector, and outputting a specific implementation manner of a point cloud detection result prediction value of the voxelized point cloud training sample, which refers to the related description of the detection result of the point cloud to be processed obtained in the foregoing, and is not repeated herein.
The specific implementation of performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch and outputting the predicted value of the point cloud segmentation result of the voxelized point cloud training sample refers to the description related to the point cloud segmentation result to be processed, which is not repeated herein.
And in the training stage of the multitask neural network, calculating the point cloud detection loss of the multitask neural network according to the point cloud detection result prediction value of each voxelized point cloud training sample and the corresponding point cloud detection label. The point cloud detection loss comprises four parts which are respectively: thermodynamic map predicted loss, position predicted loss, size predicted loss, and rotation angle predicted loss.
In some embodiments of the present application, the position prediction loss, the magnitude prediction loss, and the rotation angle prediction loss may be expressed in terms of a mean square error. For example, the position prediction loss of the multitask neural network is represented by the mean square error of the predicted value of the target position (such as the spatial position coordinate) of all the voxelized point cloud training samples and the true value of the target position in the sample label; representing the size prediction loss of the multitask neural network through the predicted value of the size (such as the three-dimensional size) of the target object of all the voxelized point cloud training samples and the mean square error of the true value of the size of the target object in the sample label; and representing the rotation angle prediction loss of the multitask neural network through the prediction value of the rotation angle of the target object of all the voxelized point cloud training samples and the mean square error of the true value of the rotation angle of the target object in the sample label.
In some embodiments of the present application, the thermodynamic diagram prediction loss is calculated using a pixel-wise focal loss function.
Assuming that the position of a target object is p, the corresponding key point (p_x, p_y) on the thermodynamic diagram is obtained after downsampling, and the key point is spread onto the thermodynamic diagram by a Gaussian kernel. If the Gaussian kernels of multiple targets overlap, the maximum value is taken. The Gaussian kernel can be expressed as:

Y_xyc = exp(−((x − p_x)² + (y − p_y)²) / (2 σ_p²))

where x and y are the enumerated positions in the image to be detected, σ_p is the adaptive variance determined by the target scale, and Y_xyc is the Gaussian heatmap value of each key point after mapping by the Gaussian kernel.
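An illustrative NumPy sketch of splatting key points onto the thermodynamic diagram with the Gaussian kernel above, taking the element-wise maximum where targets overlap; the fixed sigma value is a simplification of the scale-adaptive variance.

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat one key point onto the heatmap with a Gaussian kernel; where several
    targets overlap, the element-wise maximum is kept."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = center
    gaussian = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
    np.maximum(heatmap, gaussian, out=heatmap)
    return heatmap

hm = np.zeros((128, 128), dtype=np.float32)
draw_gaussian(hm, center=(40, 60), sigma=2.0)
draw_gaussian(hm, center=(42, 61), sigma=2.0)   # overlapping targets: max is taken
```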
Then, the thermodynamic diagram loss is calculated with the pixel-wise focal loss function:

L_hm = −(1/M) Σ_xyc { (1 − Ŷ_xyc)^α · log(Ŷ_xyc),                if Y_xyc = 1
                      (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc), otherwise }

where M represents the total number of targets; Ŷ_xyc represents the probability of the target object predicted by the network, with a value range of (0, 1); Y_xyc represents the ground-truth value indicating whether a target object exists, with a value range of (0, 1); and α and β are hyper-parameters whose values are set empirically, for example α = 2 and β = 4.
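A hedged PyTorch sketch of the pixel-wise focal loss above; the clamping epsilon and the normalization by the number of key points are implementation assumptions.

```python
import torch

def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Pixel-wise focal loss between the predicted heatmap `pred` (values in
    (0, 1)) and the Gaussian-smoothed ground-truth heatmap `target`,
    normalized by the number of key points (targets)."""
    pred = pred.clamp(eps, 1.0 - eps)
    pos = target.eq(1.0).float()                    # key-point positions
    neg = 1.0 - pos
    pos_loss = pos * torch.log(pred) * (1.0 - pred) ** alpha
    neg_loss = neg * (1.0 - target) ** beta * pred ** alpha * torch.log(1.0 - pred)
    num_targets = pos.sum().clamp(min=1.0)
    return -(pos_loss.sum() + neg_loss.sum()) / num_targets

loss = heatmap_focal_loss(torch.rand(1, 3, 256, 128), torch.zeros(1, 3, 256, 128))
```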
In the training stage of the multitask neural network, the point cloud segmentation loss of the multitask neural network is calculated according to the point cloud segmentation result predicted value of each voxelized point cloud training sample and the corresponding second point cloud segmentation label. For example, the point cloud segmentation loss may be expressed as the cross entropy between the point cloud segmentation result predicted values and the corresponding second point cloud segmentation labels.
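A minimal sketch of the cross-entropy form of the point cloud segmentation loss, assuming the segmentation branch outputs per-voxel class probabilities, could be:

```python
import numpy as np

def segmentation_cross_entropy(pred_probs, gt_labels, eps=1e-6):
    """Cross entropy between predicted per-voxel class probabilities (V, K) and the
    per-voxel true categories (V,) from the second point cloud segmentation label."""
    picked = pred_probs[np.arange(len(gt_labels)), gt_labels]  # probability of the true class
    return -np.mean(np.log(picked + eps))
```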
Further, the point cloud detection loss and the point cloud segmentation loss are fused to calculate the loss of the multitask neural network, and the network parameters of the backbone network, the point cloud detection network branch, and the point cloud segmentation network branch are optimized with the goal of minimizing the loss of the whole network, thereby completing the training of the multitask neural network.
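The fusion of the two losses into one training objective might be sketched as a weighted sum; the weights below are assumptions, since the embodiments do not specify how the losses are combined:

```python
def multitask_loss(hm_loss, pos_loss, size_loss, yaw_loss, seg_loss,
                   det_weight=1.0, seg_weight=1.0):
    """Fuse the point cloud detection loss (four parts) and the point cloud
    segmentation loss into the overall loss to be minimized."""
    detection_loss = hm_loss + pos_loss + size_loss + yaw_loss
    return det_weight * detection_loss + seg_weight * seg_loss
```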
By simultaneously training the network branches corresponding to the point cloud detection task and the point cloud segmentation task, the representation capability of the features extracted by the backbone network can be improved, and the two tasks can promote each other in the training process, thereby improving the precision of point cloud detection and segmentation.
According to the point cloud detection and segmentation method disclosed by the embodiment of the application, a plurality of cylindrical voxels forming the point cloud to be processed are obtained by performing cylindrical voxelization processing on the point cloud to be processed; then, extracting and mapping the characteristics of the plurality of columnar voxels to obtain voxel characteristics of the point cloud to be processed, and mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the point cloud to be processed; finally, extracting the characteristics of the aerial view through a pre-trained backbone network of a multitask neural network to obtain a point cloud characteristic vector; performing target object detection based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network, and outputting a point cloud detection result; and point cloud segmentation is carried out on the basis of the point cloud characteristic vector through the point cloud segmentation network branch of the multitask neural network, and a point cloud segmentation result is output, so that the efficiency of point cloud detection and point cloud segmentation is improved.
That is, the point cloud feature vector is obtained through feature extraction and mapping by the backbone network of the multitask neural network, and is then respectively input into the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task, so that point cloud detection and point cloud segmentation are performed respectively.
A point cloud detection task in the prior art generally includes: point cloud preprocessing, feature extraction, and detection head prediction; a point cloud segmentation task in the prior art generally includes: point cloud preprocessing, feature extraction, and point cloud segmentation; and the time and resource consumption (CPU, GPU, etc.) of point cloud preprocessing and feature extraction account for 90% of the consumption of the whole task. In terms of time, for example, if a detection task takes 20 ms and a segmentation task takes 20 ms, of which point cloud preprocessing and feature extraction take 18 ms, then completing the two tasks with two independent network models requires 40 ms, whereas with the method disclosed in the embodiments of the present application, in which one network performs both point cloud detection and segmentation, the total time consumed is 18 + 2 + 2 = 22 ms, which greatly improves the efficiency of point cloud detection and segmentation and saves the resources consumed by point cloud preprocessing and feature extraction.
On the other hand, the point cloud to be processed is subjected to voxelization processing, and feature extraction and mapping are performed based on the columnar voxels, so that point cloud detection and segmentation are realized.
Further, the point cloud and the point cloud segmentation label are converted into the bird's-eye view, and feature extraction, detection, and segmentation are performed under the bird's-eye view, which is fast and effective. Finally, the point cloud semantic segmentation result output by the model is converted back to each point in the point cloud, thereby completing the point-wise semantic segmentation task on the point cloud and effectively improving the speed of point cloud segmentation.
Example two
The point cloud detecting and segmenting device disclosed in the embodiment of the present application, as shown in fig. 5, includes:
a cylindrical voxelization module 510, configured to perform a cylindrical voxelization process on the point cloud to be processed, and obtain a plurality of cylindrical voxels constituting the point cloud to be processed;
a voxel characteristic obtaining module 520, configured to perform characteristic extraction and mapping on the plurality of cylindrical voxels, and obtain a voxel characteristic of the point cloud to be processed;
the aerial view feature mapping module 530 is configured to map the voxel features to an aerial view to obtain aerial view features corresponding to the point clouds to be processed;
a point cloud feature extraction module 540, configured to perform feature extraction on the aerial view features through a pre-trained backbone network of a multitask neural network to obtain a point cloud feature vector;
a point cloud detection and segmentation module 550, configured to perform target object detection based on the point cloud feature vector through a point cloud detection network branch of the multitask neural network, and output a point cloud detection result; and performing point cloud segmentation on the basis of the point cloud characteristic vector through a point cloud segmentation network branch of the multitask neural network, and outputting a point cloud segmentation result.
In some embodiments of the present application, as shown in fig. 6, the point cloud segmentation result includes: the point cloud semantic category matched with each cylindrical voxel, and the apparatus further comprises:
a first point cloud segmentation label obtaining module 511, configured to obtain a first point cloud segmentation label of the plurality of cylindrical voxels, where the first point cloud segmentation label includes: location information for each of the cylindrical voxels;
and the segmentation result conversion module 560 is configured to map the point cloud semantic category matched with each cylindrical voxel into the point cloud to be processed according to the position information of the cylindrical voxel, so as to obtain a segmentation result of the points in the point cloud to be processed.
In some embodiments of the present application, the mapping the point cloud semantic category matched with each columnar voxel to the point cloud to be processed according to the position information of the columnar voxel to obtain a segmentation result of the points in the point cloud to be processed comprises:
acquiring points in the point cloud to be processed contained in each columnar voxel according to the position information of the columnar voxel;
and regarding each columnar voxel, taking the point cloud semantic category matched with the columnar voxel as the point cloud semantic category matched with the point contained in the columnar voxel.
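As a hedged illustration of the mapping described above, assuming the voxelization step records, for every point, the index of the cylindrical voxel it was divided into, the per-point segmentation result can be recovered as:

```python
import numpy as np

def voxel_labels_to_points(point_voxel_idx, voxel_semantic_category):
    """point_voxel_idx:         (N,) index of the cylindrical voxel containing each point
    voxel_semantic_category: (V,) semantic category matched with each cylindrical voxel
    returns:                 (N,) semantic category of every point in the point cloud"""
    return voxel_semantic_category[point_voxel_idx]
```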
In some embodiments of the present application, the voxel characteristic obtaining module 520 is further configured to:
for each columnar voxel, acquiring the center point of all the points divided into the columnar voxel, and calculating the coordinate distance between each point divided into the columnar voxel and the center point;
for each of the cylindrical voxels, joining point features of all the points classified into the cylindrical voxels into voxel features of the cylindrical voxels, wherein the point features of each of the points include: position coordinates and reflection intensity information of the points;
splicing the voxel characteristics of the columnar voxels to obtain the splicing characteristics of the plurality of columnar voxels;
and performing feature mapping on the splicing features to obtain voxel features of the point cloud to be processed.
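For illustration only, a sketch of the per-voxel feature construction described above, under the assumptions that each cylindrical voxel is padded to a fixed number of points, that the coordinate distances to the center point are concatenated to the point features, and that the final feature mapping is a learned linear layer (stood in for here by a random matrix), might look like:

```python
import numpy as np

def build_voxel_features(voxel_points, out_dim=64, seed=0):
    """voxel_points: (V, P, 4) points per cylindrical voxel, each point = (x, y, z, intensity).

    Per-point features = [x, y, z, intensity, dx, dy, dz], where (dx, dy, dz) is the
    coordinate distance to the center point of the voxel; the point features of each
    voxel are spliced together and passed through an (assumed) linear feature mapping.
    """
    centers = voxel_points[:, :, :3].mean(axis=1, keepdims=True)    # (V, 1, 3) center points
    offsets = voxel_points[:, :, :3] - centers                      # coordinate distances
    point_feats = np.concatenate([voxel_points, offsets], axis=-1)  # (V, P, 7)
    spliced = point_feats.reshape(point_feats.shape[0], -1)         # (V, P * 7) spliced features
    rng = np.random.default_rng(seed)
    mapping = rng.standard_normal((spliced.shape[1], out_dim)) * 0.01  # stand-in for a learned layer
    return spliced @ mapping                                        # (V, out_dim) voxel features
```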
In some embodiments of the present application, the bird's eye view feature mapping module 530 is further configured to:
acquiring the number of points included in each cylindrical voxel according to the position information of each cylindrical voxel in the first point cloud segmentation label;
for each columnar voxel, mapping the characteristics corresponding to the columnar voxel in the voxel characteristics to the corresponding position of the aerial view matched with the first point cloud segmentation label according to the number of points included in the columnar voxel to obtain the aerial view characteristics corresponding to the point cloud to be processed; wherein,
the mapping, according to the number of points included in the voxel column, a feature corresponding to the voxel column to a corresponding position of the aerial view matched with the first point cloud segmentation label includes:
under the condition that the number of points included in the columnar voxels is larger than 0, mapping a feature vector corresponding to the columnar voxels in the voxel features to corresponding positions of a bird's eye view matched with the first point cloud segmentation label;
setting a feature vector at a corresponding position of the bird's eye view that matches the first point cloud segmentation label to 0 in a case where the number of points included in the columnar voxel is equal to 0.
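A minimal sketch of scattering the voxel features onto the bird's-eye view as described above could be written as follows; the BEV grid size and the way each voxel's grid cell is derived from its position information are assumptions:

```python
import numpy as np

def scatter_to_bev(voxel_features, voxel_rows, voxel_cols, points_per_voxel,
                   bev_h=496, bev_w=432):
    """voxel_features:   (V, C) feature vector of each cylindrical voxel
    voxel_rows/cols:  (V,)   BEV grid cell of each voxel (from its position information)
    points_per_voxel: (V,)   number of points divided into each voxel

    Voxels containing more than 0 points write their feature vector to the matching
    BEV position; positions of empty voxels keep the feature vector 0.
    """
    c = voxel_features.shape[1]
    bev = np.zeros((bev_h, bev_w, c), dtype=voxel_features.dtype)
    keep = points_per_voxel > 0
    bev[voxel_rows[keep], voxel_cols[keep]] = voxel_features[keep]
    return bev
```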
In some embodiments of the present application, the pre-trained multitask neural network comprises: a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch, and the apparatus further comprises:
a multitask neural network training module (not shown in the figure) for training a multitask neural network based on a plurality of voxelized point cloud training samples;
the voxel point cloud training sample is constructed according to the cylindrical voxels obtained by respectively performing cylindrical voxel processing on a plurality of point clouds; for each of the voxelized point cloud training samples, the sample data comprises: a number of cylindrical voxels, the exemplar labels comprising: a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used for identifying a point cloud semantic category true value matched with each cylindrical voxel in corresponding sample data; the real value of the point cloud semantic category matched with the cylindrical voxels is as follows: and the point cloud semantic category with the maximum coverage rate is divided into the point cloud semantic categories covered by the points in the corresponding cylindrical voxels.
In some embodiments of the present application, the sample label further comprises: the point cloud detection label is used for identifying the true value of a target object detection result in corresponding sample data, and the training multitask neural network based on a plurality of voxelized point cloud training samples comprises the following steps:
for each voxelized point cloud training sample, respectively executing the following point cloud detection and segmentation operations to obtain a point cloud detection result predicted value and a point cloud segmentation result predicted value of the corresponding voxelized point cloud training sample:
performing feature extraction and mapping on a plurality of columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample;
mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the voxel point cloud training sample;
extracting the characteristics of the aerial view through the backbone network to obtain a point cloud characteristic vector;
through the point cloud detection network branch, performing target object detection based on the point cloud feature vector, and outputting a point cloud detection result prediction value of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting a predicted value of a point cloud segmentation result of the voxelized point cloud training sample;
and calculating the point cloud detection loss of the multitask neural network according to the point cloud detection result predicted value and the corresponding point cloud detection label of each voxelized point cloud training sample, calculating the point cloud segmentation loss of the multitask neural network according to the point cloud segmentation result predicted value and the corresponding second point cloud segmentation label of each voxelized point cloud training sample, and then iteratively training the multitask neural network by taking the point cloud detection loss and the point cloud segmentation loss as targets.
The point cloud detection and segmentation device disclosed in this embodiment of the present application is used for implementing the point cloud detection and segmentation method described in the first embodiment of the present application; the specific implementation of each module of the device is not described again, and reference may be made to the specific implementation of the corresponding steps in the method embodiment.
According to the point cloud detection and segmentation device disclosed by the embodiment of the application, a plurality of cylindrical voxels forming the point cloud to be processed are obtained by performing cylindrical voxel processing on the point cloud to be processed; then, extracting and mapping the characteristics of the plurality of columnar voxels to obtain voxel characteristics of the point cloud to be processed, and mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the point cloud to be processed; finally, extracting the characteristics of the aerial view through a pre-trained backbone network of a multitask neural network to obtain a point cloud characteristic vector; performing target object detection based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network, and outputting a point cloud detection result; and point cloud segmentation is carried out on the basis of the point cloud characteristic vector through the point cloud segmentation network branch of the multitask neural network, and a point cloud segmentation result is output, so that the efficiency of point cloud detection and point cloud segmentation is improved.
That is, the point cloud feature vector is obtained through feature extraction and mapping by the backbone network of the multitask neural network, and is then respectively input into the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task, so that point cloud detection and point cloud segmentation are performed respectively.
On the other hand, the point cloud to be processed is subjected to voxelization processing, and feature extraction and mapping are performed based on the columnar voxels, so that point cloud detection and segmentation are realized.
Further, the point cloud and the point cloud segmentation label are converted into the bird's-eye view, and feature extraction, detection, and segmentation are performed under the bird's-eye view, which is fast and effective. Finally, the point cloud semantic segmentation result output by the model is converted back to each point in the point cloud, thereby completing the point-wise semantic segmentation task on the point cloud and effectively improving the speed of point cloud segmentation.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The point cloud detection and segmentation method and device provided by the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope. In summary, the content of this specification should not be construed as a limitation of the present application.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an electronic device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 7 shows an electronic device that may implement a method according to the present application. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet computer and the like. The electronic device conventionally comprises a processor 710 and a memory 720 and program code 730 stored on said memory 720 and executable on the processor 710, said processor 710 implementing the method described in the above embodiments when executing said program code 730. The memory 720 may be a computer program product or a computer readable medium. The memory 720 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 720 has a storage space 7201 for program code 730 of a computer program for performing any of the method steps of the above-described method. For example, the storage space 7201 for the program code 730 may include respective computer programs for implementing the various steps in the above methods, respectively. The program code 730 is computer readable code. The computer programs may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. The computer program comprises computer readable code which, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
The embodiment of the application also discloses a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the point cloud detection and segmentation method according to the first embodiment of the application.
Such a computer program product may be a computer readable storage medium that may have memory segments, memory spaces, etc. arranged similarly to the memory 720 in the electronic device shown in fig. 7. The program code may be stored in the computer readable storage medium, for example, compressed in a suitable form. The computer readable storage medium is typically a portable or fixed storage unit as described with reference to fig. 8. Typically, the storage unit comprises computer readable code 730', i.e., code readable by a processor, which, when executed by the processor, implements the steps of the method described above.
Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Moreover, it is noted that instances of the word "in one embodiment" are not necessarily all referring to the same embodiment.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. A point cloud detection and segmentation method, comprising:
performing cylindrical voxelization processing on the point cloud to be processed to obtain a plurality of cylindrical voxels forming the point cloud to be processed;
extracting and mapping the characteristics of the plurality of columnar voxels to obtain the voxel characteristics of the point cloud to be processed;
mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the point cloud to be processed;
extracting the characteristics of the aerial view through a pre-trained backbone network of a multitask neural network to obtain a point cloud characteristic vector;
performing target object detection based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network, and outputting a point cloud detection result; and performing point cloud segmentation on the basis of the point cloud characteristic vector through a point cloud segmentation network branch of the multitask neural network, and outputting a point cloud segmentation result.
2. The method according to claim 1, wherein after the performing the cylindrical voxelization processing on the point cloud to be processed to obtain a plurality of cylindrical voxels constituting the point cloud to be processed, the method further comprises:
obtaining a first point cloud segmentation label of the plurality of cylindrical voxels, wherein the first point cloud segmentation label comprises: location information for each of the cylindrical voxels;
the point cloud segmentation result comprises: the point cloud semantic category matched with each cylindrical voxel; and after the performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multitask neural network and outputting the point cloud segmentation result, the method further comprises:
mapping the point cloud semantic category matched with the cylindrical voxels to the point cloud to be processed according to the position information of the cylindrical voxels to obtain a segmentation result of the points in the point cloud to be processed.
3. The method according to claim 2, wherein the mapping the point cloud semantic category matched with each columnar voxel to the point cloud to be processed according to the position information of the columnar voxel to obtain a segmentation result of the points in the point cloud to be processed comprises:
acquiring points in the point cloud to be processed contained in each columnar voxel according to the position information of the columnar voxel;
and regarding each columnar voxel, taking the point cloud semantic category matched with the columnar voxel as the point cloud semantic category matched with the point contained in the columnar voxel.
4. The method according to claim 1, wherein the extracting and mapping the features of the plurality of cylindrical voxels to obtain the voxel features of the point cloud to be processed comprises:
for each columnar voxel, acquiring the center point of all the points divided into the columnar voxel, and calculating the coordinate distance between each point divided into the columnar voxel and the center point;
for each of the cylindrical voxels, joining point features of all the points classified into the cylindrical voxels into voxel features of the cylindrical voxels, wherein the point features of each of the points include: position coordinates and reflection intensity information of the points;
splicing the voxel characteristics of the columnar voxels to obtain the splicing characteristics of the plurality of columnar voxels;
and performing feature mapping on the splicing features to obtain voxel features of the point cloud to be processed.
5. The method of claim 2, wherein the mapping the voxel characteristics to a bird's eye view to obtain bird's eye view characteristics corresponding to the point cloud to be processed comprises:
acquiring the number of points included in each cylindrical voxel according to the position information of each cylindrical voxel in the first point cloud segmentation label;
for each columnar voxel, mapping the characteristics corresponding to the columnar voxel in the voxel characteristics to the corresponding position of the aerial view matched with the first point cloud segmentation label according to the number of points included in the columnar voxel to obtain the aerial view characteristics corresponding to the point cloud to be processed; wherein,
the mapping, according to the number of points included in the columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the aerial view matched with the first point cloud segmentation label comprises:
under the condition that the number of points included in the columnar voxels is larger than 0, mapping a feature vector corresponding to the columnar voxels in the voxel features to corresponding positions of a bird's eye view matched with the first point cloud segmentation label;
setting a feature vector at a corresponding position of the bird's eye view that matches the first point cloud segmentation label to 0 in a case where the number of points included in the columnar voxel is equal to 0.
6. The method of any one of claims 1 to 5, wherein the pre-trained multitask neural network comprises: the backbone network, the point cloud detection network branch, and the point cloud segmentation network branch; and before the extracting the characteristics of the aerial view through the pre-trained backbone network of the multitask neural network to obtain the point cloud characteristic vector, the method further comprises:
training a multitask neural network based on a plurality of voxelized point cloud training samples;
the voxel point cloud training sample is constructed according to the cylindrical voxels obtained by respectively performing cylindrical voxel processing on a plurality of point clouds; for each of the voxelized point cloud training samples, the sample data comprises: a number of cylindrical voxels, the exemplar labels comprising: a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used for identifying a point cloud semantic category true value matched with each cylindrical voxel in corresponding sample data; the real value of the point cloud semantic category matched with the cylindrical voxels is as follows: and the point cloud semantic category with the maximum coverage rate is divided into the point cloud semantic categories covered by the points in the corresponding cylindrical voxels.
7. The method of claim 6, wherein the sample label further comprises: the point cloud detection label is used for identifying the true value of a target object detection result in corresponding sample data, and the training multitask neural network based on a plurality of voxelized point cloud training samples comprises the following steps:
for each voxelized point cloud training sample, respectively executing the following point cloud detection and segmentation operations to obtain a point cloud detection result predicted value and a point cloud segmentation result predicted value of the corresponding voxelized point cloud training sample:
performing feature extraction and mapping on a plurality of columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample;
mapping the voxel characteristics to a bird's-eye view to obtain bird's-eye view characteristics corresponding to the voxel point cloud training sample;
extracting the characteristics of the aerial view through the backbone network to obtain a point cloud characteristic vector;
through the point cloud detection network branch, performing target object detection based on the point cloud feature vector, and outputting a point cloud detection result prediction value of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting a predicted value of a point cloud segmentation result of the voxelized point cloud training sample;
and calculating the point cloud detection loss of the multitask neural network according to the point cloud detection result predicted value of each voxelized point cloud training sample and the corresponding point cloud detection label, calculating the point cloud segmentation loss of the multitask neural network according to the point cloud segmentation result predicted value of each voxelized point cloud training sample and the corresponding second point cloud segmentation label, and then iteratively training the multitask neural network with the goal of minimizing the point cloud detection loss and the point cloud segmentation loss.
8. A point cloud detection and segmentation apparatus, comprising:
the system comprises a cylindrical voxelization module, a processing module and a processing module, wherein the cylindrical voxelization module is used for performing cylindrical voxelization processing on point cloud to be processed to obtain a plurality of cylindrical voxels forming the point cloud to be processed;
the voxel characteristic acquisition module is used for extracting and mapping the characteristics of the plurality of columnar voxels to acquire the voxel characteristics of the point cloud to be processed;
the aerial view characteristic mapping module is used for mapping the voxel characteristics to an aerial view to obtain aerial view characteristics corresponding to the point cloud to be processed;
the point cloud feature extraction module is used for extracting features of the aerial view features through a pre-trained backbone network of a multitask neural network to obtain point cloud feature vectors;
the point cloud detection and segmentation module is used for detecting a target object based on the point cloud characteristic vector through a point cloud detection network branch of the multitask neural network and outputting a point cloud detection result; and performing point cloud segmentation on the basis of the point cloud characteristic vector through a point cloud segmentation network branch of the multitask neural network, and outputting a point cloud segmentation result.
9. An electronic device comprising a memory, a processor, and program code stored on the memory and executable on the processor, wherein the processor implements the point cloud detection and segmentation method of any of claims 1 to 7 when executing the program code.
10. A computer-readable storage medium having stored thereon program code, characterized in that the program code realizes the steps of the point cloud detection and segmentation method of any one of claims 1 to 7 when executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210353486.1A CN114820463A (en) | 2022-04-06 | 2022-04-06 | Point cloud detection and segmentation method and device, and electronic equipment |
PCT/CN2022/117322 WO2023193400A1 (en) | 2022-04-06 | 2022-09-06 | Point cloud detection and segmentation method and apparatus, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210353486.1A CN114820463A (en) | 2022-04-06 | 2022-04-06 | Point cloud detection and segmentation method and device, and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114820463A true CN114820463A (en) | 2022-07-29 |
Family ID=82533341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210353486.1A Pending CN114820463A (en) | 2022-04-06 | 2022-04-06 | Point cloud detection and segmentation method and device, and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114820463A (en) |
WO (1) | WO2023193400A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115358413A (en) * | 2022-09-14 | 2022-11-18 | 清华大学 | Point cloud multitask model training method and device and electronic equipment |
WO2023193400A1 (en) * | 2022-04-06 | 2023-10-12 | 合众新能源汽车股份有限公司 | Point cloud detection and segmentation method and apparatus, and electronic device |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118279486A (en) * | 2024-03-29 | 2024-07-02 | 成都东软学院 | Three-dimensional model construction method based on deep learning |
CN118097123B (en) * | 2024-04-26 | 2024-06-25 | 烟台大学 | Three-dimensional target detection method, system, equipment and medium based on point cloud and image |
CN118279674B (en) * | 2024-05-30 | 2024-09-27 | 无锡学院 | Target classification, segmentation and detection method based on slice-type hierarchical point cloud network |
CN118537497B (en) * | 2024-07-24 | 2024-10-01 | 青岛大学 | Method, equipment and medium for resampling weighted poisson disk of large-scale point cloud |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111476822A (en) * | 2020-04-08 | 2020-07-31 | 浙江大学 | Laser radar target detection and motion tracking method based on scene flow |
WO2020156303A1 (en) * | 2019-01-30 | 2020-08-06 | 广州市百果园信息技术有限公司 | Method and apparatus for training semantic segmentation network, image processing method and apparatus based on semantic segmentation network, and device and storage medium |
CN112149677A (en) * | 2020-09-14 | 2020-12-29 | 上海眼控科技股份有限公司 | Point cloud semantic segmentation method, device and equipment |
US20210012555A1 (en) * | 2019-07-08 | 2021-01-14 | Waymo Llc | Processing point clouds using dynamic voxelization |
US10970518B1 (en) * | 2017-11-14 | 2021-04-06 | Apple Inc. | Voxel-based feature learning network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111862101A (en) * | 2020-07-15 | 2020-10-30 | 西安交通大学 | 3D point cloud semantic segmentation method under aerial view coding visual angle |
CN114140470A (en) * | 2021-12-07 | 2022-03-04 | 群周科技(上海)有限公司 | Ground object semantic segmentation method based on helicopter airborne laser radar |
CN114820463A (en) * | 2022-04-06 | 2022-07-29 | 合众新能源汽车有限公司 | Point cloud detection and segmentation method and device, and electronic equipment |
- 2022-04-06 CN CN202210353486.1A patent/CN114820463A/en active Pending
- 2022-09-06 WO PCT/CN2022/117322 patent/WO2023193400A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023193400A1 (en) | 2023-10-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 314500 988 Tong Tong Road, Wu Tong Street, Tongxiang, Jiaxing, Zhejiang; Applicant after: United New Energy Automobile Co.,Ltd.; Address before: 314500 988 Tong Tong Road, Wu Tong Street, Tongxiang, Jiaxing, Zhejiang; Applicant before: Hozon New Energy Automobile Co., Ltd.