WO2023193400A1 - Point cloud detection and segmentation method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2023193400A1
WO2023193400A1 (PCT/CN2022/117322; CN2022117322W)
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
columnar
voxel
segmentation
features
Prior art date
Application number
PCT/CN2022/117322
Other languages
French (fr)
Chinese (zh)
Inventor
赵天坤
唐佳
Original Assignee
合众新能源汽车股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 合众新能源汽车股份有限公司
Publication of WO2023193400A1

Classifications

    • G06T 7/0002 Image analysis: inspection of images, e.g. flaw detection
    • G06F 18/241 Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Pattern recognition: classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Neural networks: combinations of networks
    • G06N 3/047 Neural networks: probabilistic or stochastic networks
    • G06N 3/048 Neural networks: activation functions
    • G06N 3/08 Neural networks: learning methods
    • G06T 7/73 Image analysis: determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10028 Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20084 Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/20112 Special algorithmic details: image segmentation details

Definitions

  • the present application relates to the field of computer technology, and in particular to point cloud detection and segmentation methods and apparatuses, as well as electronic devices and computer-readable storage media.
  • Point cloud data refers to a set of vectors in a three-dimensional coordinate system: spatial information is recorded in the form of points, and each point contains three-dimensional coordinates. Depending on the capabilities of the collection device, point cloud data may also contain color information (RGB) or reflection intensity information (intensity). Taking point clouds collected by lidar as an example, the data includes the position coordinates and reflection intensity of points in three-dimensional space. Point cloud data is widely used for target detection and recognition in the field of autonomous driving, for example in cars and drones. In these applications, point cloud detection and segmentation technology is usually used to detect target objects and segment the point cloud.
  • point cloud detection technology refers to processing point cloud data to detect the positions of target objects in the scene that the point cloud data represents.
  • point cloud segmentation technology refers to identifying the category of the target object that each point in the point cloud data belongs to, so as to facilitate subsequent automatic driving control.
  • the embodiment of the present application provides a point cloud detection and segmentation method, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • embodiments of the present application provide a point cloud detection and segmentation method, including: performing columnar voxelization on the point cloud to be processed to obtain a number of columnar voxels that constitute it; performing feature extraction and mapping on the columnar voxels to obtain voxel features; mapping the voxel features to a bird's-eye view to obtain bird's-eye view features; and extracting features from the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
  • through the point cloud detection network branch of the multi-task neural network, the target object is detected based on the point cloud feature vector and the point cloud detection result is output; and, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector and the point cloud segmentation result is output.
  • embodiments of the present application provide a point cloud detection and segmentation device, including:
  • a columnar voxelization module used to perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed;
  • a voxel feature acquisition module used to perform feature extraction and mapping on the plurality of columnar voxels, and obtain the voxel features of the point cloud to be processed;
  • a bird's-eye view feature mapping module used to map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed;
  • a point cloud feature extraction module is used to extract features of the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
  • a point cloud detection and segmentation module, configured to perform target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and output point cloud detection results; and to perform point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and output the point cloud segmentation result.
  • embodiments of the present application also disclose an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the point cloud detection and segmentation method described in the embodiments of this application is implemented.
  • embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the point cloud detection and segmentation method disclosed in the embodiments of the present application are implemented.
  • the point cloud detection and segmentation method disclosed in the embodiments of the present application performs columnar voxelization on the point cloud to be processed to obtain a number of columnar voxels that constitute it; it then performs feature extraction and mapping on these columnar voxels to obtain the voxel features of the point cloud, and maps the voxel features to a bird's-eye view to obtain the corresponding bird's-eye view features; finally, the backbone network of a pre-trained multi-task neural network extracts features from the bird's-eye view features to obtain point cloud feature vectors. Through the point cloud detection network branch of the multi-task neural network, target objects are detected based on the point cloud feature vectors and point cloud detection results are output; and, through the point cloud segmentation network branch, point cloud segmentation is performed based on the point cloud feature vectors and the point cloud segmentation result is output. This helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • Figure 1 is a schematic flow chart of the point cloud detection and segmentation method in Embodiment 1 of the present application.
  • Figure 2 is a schematic diagram of the effect of point cloud voxelization processing in Embodiment 1 of the present application.
  • Figure 3 is a schematic structural diagram of the multi-task neural network used in Embodiment 1 of the present application.
  • Figure 4 is a schematic diagram of point cloud segmentation result mapping in Embodiment 1 of the present application.
  • Figure 5 is one of the structural schematic diagrams of the point cloud detection and segmentation device in Embodiment 2 of the present application.
  • Figure 6 is the second structural schematic diagram of the point cloud detection and segmentation device in Embodiment 2 of the present application.
  • Figure 7 schematically shows a block diagram of an electronic device for performing a method according to the present application.
  • Figure 8 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present application.
  • An embodiment of the present application discloses a point cloud detection and segmentation method, as shown in Figure 1.
  • the method includes: steps 110 to 150.
  • Step 110 Perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed.
  • the point cloud to be processed described in the embodiment of this application is: the point cloud in the area of interest in the point cloud collected by a point cloud collection device (such as a lidar sensor).
  • the original point cloud collected by the lidar sensor installed on the vehicle is a data set of unordered points, where each point can be represented by four-dimensional data, for example (x, y, z, i), where x, y, z are the spatial position coordinates of the point and i represents its reflection intensity.
  • for the original point cloud collected by the point cloud acquisition equipment, point cloud preprocessing first needs to be performed to obtain a point set that meets the requirements. For example, nan (null) values can be removed from the original point cloud, or points with very large values can be removed to filter point cloud noise.
  • for details of point cloud preprocessing, please refer to the prior art. The embodiments of this application do not limit the technical solution adopted for point cloud preprocessing, and it is not described again here.
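The preprocessing step described above can be sketched in plain Python. This is only an illustration; the outlier rule and the `max_abs` threshold are assumptions for the sketch, not values taken from the patent:

```python
import math

def preprocess(points, max_abs=200.0):
    """Filter an unordered list of (x, y, z, i) points: drop points with
    nan (null) values and points with implausibly large coordinates,
    as a simple form of point cloud noise filtering."""
    cleaned = []
    for x, y, z, i in points:
        if any(math.isnan(v) for v in (x, y, z, i)):
            continue  # remove nan (null) values
        if max(abs(x), abs(y), abs(z)) > max_abs:
            continue  # remove far-outlier noise points
        cleaned.append((x, y, z, i))
    return cleaned
```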
  • the point cloud collected by point cloud collection equipment is a point cloud in a three-dimensional irregular spatial area.
  • the data of the points in the area of interest within the previously determined large cube area is obtained, to facilitate subsequent point cloud detection and point cloud segmentation on the point cloud of the area of interest.
  • the coordinates of points within the area of interest can be expressed as (x, y, z), where xmin ≤ x ≤ xmax, ymin ≤ y ≤ ymax, zmin ≤ z ≤ zmax, in meters.
  • points in the region of interest are determined based on point cloud quality. For example, if the point cloud far away from the vehicle is sparse and few points hit a vehicle at that distance, the minimum number of points can be set to a small value (for example, 5); the corresponding points are then found based on this number, and a spatial area is determined from the farthest such point. In some embodiments of the present application, for the same point cloud quality (such as point clouds collected by the same point cloud collection device), this distance can be predetermined from the quality of the collected point cloud data and does not change during application.
  • for the method of determining the region of interest, please refer to the methods used in prior-art point cloud detection or point cloud segmentation solutions.
  • the specific implementation of determining the region of interest is not limited here.
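Cropping the collected cloud to the region of interest, as described above, amounts to a coordinate-range filter. A minimal sketch (the limit values passed in the test are illustrative assumptions):

```python
def crop_roi(points, xlim, ylim, zlim):
    """Keep only points whose coordinates fall inside the region of interest:
    xmin <= x <= xmax, ymin <= y <= ymax, zmin <= z <= zmax (in meters)."""
    (xmin, xmax), (ymin, ymax), (zmin, zmax) = xlim, ylim, zlim
    return [p for p in points
            if xmin <= p[0] <= xmax
            and ymin <= p[1] <= ymax
            and zmin <= p[2] <= zmax]
```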
  • the point cloud to be processed is subjected to columnar voxelization to obtain a number of columnar voxels that constitute it; specifically, the points in the point cloud to be processed are divided into several columnar voxels according to their coordinate distribution along the first coordinate axis and the second coordinate axis.
  • the first coordinate axis and the second coordinate axis are two different coordinate axes of a three-dimensional spatial coordinate system
  • the columnar voxels are prismatic voxels. For example, after the point cloud shown on the left of Figure 2 is voxelized, the cuboid voxels (i.e., columnar voxels) 210 shown on the right of Figure 2 are obtained.
  • each voxel can be expressed as [xv, yv, zmax - zmin], where xv is the length of the voxel along the x-axis, yv is the length of the voxel along the y-axis, and zmax - zmin is the height of the voxel along the z-axis, in meters.
  • W × H columnar voxels can be divided, where W = (xmax - xmin) / xv and H = (ymax - ymin) / yv.
  • the area of interest is divided into 512 × 250 columnar voxels. These columnar voxels are subsequently treated as image pixels and used for feature extraction of the region of interest.
  • the point cloud of the area of interest can thus be represented as a voxel image of dimension W × H × 1.
  • the size of the columnar voxels is determined experimentally. For example, several voxel sizes can be preset, point cloud detection and point cloud segmentation experiments conducted for each, the impact of voxel size on detection and segmentation results and performance analyzed, and the optimal voxel size finally determined.
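The division of points into columnar voxels by x/y coordinate distribution can be sketched as follows; the grid is indexed only along x and y, since each pillar spans the full height zmax - zmin:

```python
def voxelize_pillars(points, xmin, ymin, xv, yv, W, H):
    """Assign each (x, y, z, i) point to a columnar voxel (pillar) indexed
    by its x/y grid cell. Returns a dict mapping (col, row) -> point list;
    z is not subdivided, so each voxel spans the full height range."""
    pillars = {}
    for p in points:
        col = int((p[0] - xmin) / xv)   # index along the x-axis, in [0, W)
        row = int((p[1] - ymin) / yv)   # index along the y-axis, in [0, H)
        if 0 <= col < W and 0 <= row < H:
            pillars.setdefault((col, row), []).append(p)
    return pillars
```

With the 0.2 m pillar size used in the test below (an assumed value), two nearby points land in adjacent pillars.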
  • the method further includes: obtaining the first point cloud segmentation labels of the plurality of columnar voxels, wherein the first point cloud segmentation label includes the position information of each columnar voxel. The W × H columnar voxels obtained by division form a voxel image with dimension W × H × 1.
  • the first point cloud segmentation label of this voxel image, i.e., of the above-mentioned W × H columnar voxels, can be represented by a position information table of size W × H, for example expressed as (W, H, 1).
  • the first point cloud segmentation label is used to subsequently determine the segmentation result of the point cloud based on the segmentation result of the columnar voxels.
  • obtaining the first point cloud segmentation labels of the several columnar voxels includes: for each columnar voxel, using the position information of the columnar voxel as the first point cloud segmentation label matching that voxel. This label can be represented by a position information table of size W × H, for example expressed as (W, H, 1).
  • the position information table includes W × H sets of position information, and each set of position information corresponds to a columnar voxel.
  • each set of position information is used to represent the coordinate range of the corresponding columnar voxel on the x-axis and y-axis.
  • each set of position information can also be used to represent the coordinate range of points in the point cloud divided into columnar voxels corresponding to the set of position information.
  • the mapping relationship between the points in the point cloud and the columnar voxel can be established by recording the coordinate range of the corresponding columnar voxel in the position information table.
  • other methods may be used to establish the mapping relationship between points in the point cloud and columnar voxels.
  • the specific expression form of the mapping relationship is not limited.
  • Step 120 Perform feature extraction and mapping on the plurality of columnar voxels to obtain voxel features of the point cloud to be processed.
  • after obtaining a number of columnar voxels that constitute the point cloud to be processed (such as the point cloud of the aforementioned area of interest), the columnar voxels can be regarded as pixels of an image, and feature extraction and mapping can be performed on the voxel image composed of these columnar voxels to obtain its features. Since the features of the voxel image are extracted from the distribution of points within the columnar voxels, they can fully express the features of the point cloud to be processed.
  • performing feature extraction and mapping on the plurality of columnar voxels to obtain the voxel features of the point cloud to be processed includes: for each columnar voxel, obtaining the center point of all points divided into the columnar voxel, and calculating the coordinate distance between each such point and the center point; for each columnar voxel, splicing the point features of all points in the columnar voxel into the voxel feature of that columnar voxel, where the point features of each point include the position coordinates and reflection intensity of the point together with its coordinate distance from the center point; splicing the voxel features of the several columnar voxels to obtain the splicing features of the several columnar voxels; and performing feature mapping on the splicing features to obtain the voxel features of the point cloud to be processed.
  • each columnar voxel will contain a certain number of points. Taking a columnar voxel containing K points as an example, the average coordinate (center point) of these K points is first calculated from their position coordinates in the original point cloud data. The features of the columnar voxel can then be expressed as features of size K × 7, that is, the point features of all included points (the four original dimensions plus the three coordinate offsets from the center point).
  • the voxel characteristics can be obtained by voxelizing the point cloud to be processed.
  • the features of the N columnar voxels obtained after voxelization (such as the aforementioned features of size K × 7) are spliced to obtain a splicing feature of size N × K × 7.
  • a columnar voxel that contains no points can be discarded.
  • the spliced features can be feature-mapped through a pre-trained feature extraction network to obtain features of size N × D, where D is the feature dimension of each columnar voxel. The feature extraction network can be constructed by serially connecting a fully connected layer, a normalization layer, and a one-dimensional maximum pooling layer (MaxPool1D). N × D-dimensional features are output, where D is the dimension of the fully connected layer's output.
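The per-pillar feature construction and pooling above can be sketched in plain Python. This sketch shows only the 7-dimensional point features and the MaxPool1D step; the learned fully connected layer and normalization layer are omitted, which is a simplifying assumption for illustration:

```python
def pillar_point_features(points):
    """Build the K x 7 point features for one columnar voxel (pillar):
    (x, y, z, i) plus the offsets (dx, dy, dz) of each point from the
    pillar's center point (the mean coordinate of its K points)."""
    k = len(points)
    cx = sum(p[0] for p in points) / k
    cy = sum(p[1] for p in points) / k
    cz = sum(p[2] for p in points) / k
    return [[x, y, z, i, x - cx, y - cy, z - cz] for (x, y, z, i) in points]

def max_pool_1d(feats):
    """MaxPool1D over the K points of a pillar, producing one
    fixed-length feature vector per pillar."""
    return [max(row[d] for row in feats) for d in range(len(feats[0]))]
```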
  • Step 130 Map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed.
  • mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed includes: according to the position information of each columnar voxel in the first point cloud segmentation label, obtaining the number of points included in each columnar voxel; and, for each columnar voxel, according to the number of points it includes, mapping the feature corresponding to that columnar voxel in the voxel features to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label, thereby obtaining the bird's-eye view features corresponding to the point cloud to be processed. Specifically: when the number of points included in the columnar voxel is greater than 0, the feature vector corresponding to the columnar voxel in the voxel features is mapped to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label; when the number of points included in the columnar voxel is equal to 0, the feature vector at the corresponding position of the bird's-eye view is set to 0.
  • each columnar voxel corresponds to one piece of label data (i.e., a set of position information) in the first point cloud segmentation label. For example, the first piece of label data corresponds to the columnar voxel whose coordinates range from (0, 0) to (0.2, 0.2).
  • the features of each pixel are represented by a D-dimensional feature vector, and each pixel corresponds to a columnar voxel. Thus, the respective feature vectors of the N columnar voxels included in the voxel features can be mapped to the corresponding positions on the bird's-eye view to obtain a bird's-eye view feature of size W × H × D.
  • some columnar voxels may contain no points. At the positions of the bird's-eye view corresponding to columnar voxels that include no points, the feature vector can be set to a zero vector.
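The scattering of per-pillar feature vectors onto the W × H bird's-eye-view grid, with zero vectors at positions whose pillar contains no points, can be sketched as:

```python
def scatter_to_bev(pillar_vecs, W, H, D):
    """Scatter per-pillar D-dimensional feature vectors into a W x H x D
    bird's-eye-view grid. Grid positions whose columnar voxel contains no
    points keep a zero vector, as described in the text."""
    bev = [[[0.0] * D for _ in range(H)] for _ in range(W)]
    for (col, row), vec in pillar_vecs.items():
        bev[col][row] = list(vec)
    return bev
```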
  • Step 140 Feature extraction is performed on the bird's-eye view features through the backbone network of the pre-trained multi-task neural network to obtain a point cloud feature vector.
  • the multi-task neural network includes: a backbone network 310, a point cloud detection network branch 320, and a point cloud segmentation network branch 330.
  • the backbone network 310 may adopt a convolutional neural network commonly used in the prior art.
  • the backbone network 310 further includes three cascaded feature extraction modules of different scales and a feature concatenation layer (ConCat), where each feature extraction module includes a number of feature mapping modules (CBR), an upsampling layer, and a further feature mapping module (CBR). The number of feature mapping modules (CBR) included in the three feature extraction modules can be 4, 6, and 6 respectively.
  • each feature mapping module (CBR) can be composed of a cascade of a convolution layer, a batch normalization layer, and a ReLU activation function.
  • the features output by these three feature extraction modules have different respective sizes.
  • the feature splicing layer is used to splice the features output by the above three feature extraction modules.
  • the above three feature extraction modules respectively perform convolution, upsampling, normalization, and activation on the input bird's-eye view features; C denotes the number of feature channels of the resulting feature vectors.
  • Step 150 through the point cloud detection network branch of the multi-task neural network, perform target object detection based on the point cloud feature vector, and output the point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network , perform point cloud segmentation based on the point cloud feature vector, and output the point cloud segmentation result.
  • the point cloud feature vectors output by the backbone network 310 are input to the point cloud detection network branch 320 and the point cloud segmentation network branch 330 respectively, and these two network branches each perform the next step of processing.
  • the following describes the execution of the point cloud detection task and the point cloud segmentation task in conjunction with the network structures of the point cloud detection network branch 320 and the point cloud segmentation network branch 330, respectively.
  • the point cloud detection network branch 320 includes four detection heads, which are respectively used to output the heat-map result (whether each position is a key point), the detected target position, the size of the target, and the rotation angle of the target.
  • each detection head included in the point cloud detection network branch 320 is composed of a feature extraction module and a convolutional layer, where the feature extraction module is in turn composed of a convolutional layer, a batch normalization layer, and an activation function.
  • Each detection head performs feature encoding and transformation mapping on the input point cloud feature vector, and finally outputs the corresponding prediction result.
  • for example, the detection head corresponding to the heat map predicts, for each position in the point cloud feature vector, whether the corresponding position is a key point on the heat map; the detection head corresponding to the target position predicts from the point cloud feature vector and outputs the position (x, y, z) of the detected target; the detection head for the target size outputs the size of the target (dx, dy, dz); and the detection head for the rotation angle outputs the rotation angle θ of the target object.
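A hypothetical sketch of how the four head outputs listed above might be combined into detections: grid cells whose heat-map score exceeds a threshold are treated as key points, and the position, size, and angle heads are read out at those cells. The threshold and the per-cell data layout are assumptions for illustration, not details from the patent:

```python
def decode_detections(heatmap, pos, size, angle, thresh=0.5):
    """At every grid cell whose heat-map score exceeds thresh (a key point),
    read out the predicted center (x, y, z), size (dx, dy, dz), and
    rotation angle theta from the other three detection heads."""
    dets = []
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > thresh:
                dets.append({"score": score,
                             "center": pos[r][c],
                             "size": size[r][c],
                             "angle": angle[r][c]})
    return dets
```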
  • the point cloud segmentation network branch 330 is composed of an upsampling module, a feature extraction module, and a convolutional layer, where the feature extraction module is in turn composed of a convolutional layer, a batch normalization layer, and an activation function.
  • the upsampling layer first upsamples the point cloud feature vectors output by the backbone network 310; the convolution layer, batch normalization layer, and activation function then sequentially perform feature conversion and mapping on the upsampled vectors; finally, the segmentation results of the corresponding columnar voxels are output through the convolutional layer.
  • the point cloud segmentation network branch 330 performs upsampling, convolution, batch normalization, activation mapping, and other processing on the input point cloud feature vector, and finally outputs data of size (W, H, n_class).
  • W and H are the dimensions of the output data corresponding to the width and height of the input feature map, and n_class is the number of point cloud semantic categories.
  • the output data size of the point cloud segmentation network branch 330 is 512 × 512 × 11, which means that at each of these 512 × 512 positions there is a set of 11 segmentation prediction values; each of these 11 values lies between 0 and 1, and they sum to 1, representing the probability that the corresponding columnar voxel belongs to each point cloud semantic category. The point cloud semantic category with the maximum probability value can then be taken as the semantic category matching the corresponding columnar voxel.
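Picking the maximum-probability category at each grid position can be sketched as follows, assuming each cell of the W × H output holds a list of per-class probabilities:

```python
def voxel_categories(probs):
    """For each of the W x H grid positions, probs[w][h] is a list of
    per-class probabilities summing to 1; return the index of the class
    with the maximum probability as that columnar voxel's category."""
    return [[max(range(len(cell)), key=cell.__getitem__) for cell in row]
            for row in probs]
```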
  • the point cloud semantic categories are determined according to specific application scenarios.
  • point cloud semantic categories can be defined to include but are not limited to any one or more of the following: buildings, green plants, ground, fences, curbs, lane lines, vehicles, etc.
  • point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation network branch outputs the point cloud segmentation results of the plurality of columnar voxels matched by the point cloud feature vector (that is, all columnar voxels obtained after voxelizing the point cloud to be processed).
  • the segmentation result output by the point cloud segmentation network branch is the segmentation result obtained by semantic segmentation based on the features projected onto the bird's-eye view.
  • the point cloud segmentation result includes the point cloud semantic category matched by each columnar voxel. After point cloud segmentation is performed through the point cloud segmentation network branch of the multi-task neural network based on the point cloud feature vector and the point cloud segmentation result is output, the method also includes: mapping the point cloud semantic category matched by each columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, so as to obtain the segmentation result of the point cloud to be processed.
  • mapping the point cloud semantic category matched by the columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, to obtain the segmentation result of the points in the point cloud to be processed, includes: obtaining, according to the position information of each columnar voxel, the points of the point cloud to be processed contained in that columnar voxel; and, for each columnar voxel, using the point cloud semantic category matched by the columnar voxel as the point cloud semantic category of the points it contains.
  • each columnar voxel corresponds to a position in the bird's-eye view.
  • the segmentation result of the columnar voxel is obtained, which can be considered as the point cloud semantic segmentation result of the columnar area in the point cloud.
  • each box in the bird's-eye view corresponds to a columnar voxel.
  • the segmentation result corresponding to the image position matched by each box in the bird's-eye view can be regarded as the segmentation result of the columnar voxel corresponding to the box.
  • each columnar voxel corresponds to a spatial area in the point cloud to be processed.
  • This spatial area may contain 0 or more points.
  • the segmentation result of each columnar voxel (i.e., the matched point cloud semantic category) is used as the point cloud semantic category of each point included in the columnar voxel.
  • the semantic segmentation of the points in the point cloud is thus completed. For example, for a columnar voxel with coordinates ranging from (0, 0) to (0.2, 0.2), if the segmentation result of the columnar voxel is "kerb", it can be determined that, in the point cloud to be processed, the point cloud semantic category matched by each point whose coordinates fall within (0, 0) to (0.2, 0.2) is "kerb".
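As a minimal sketch of this voxel-to-point mapping (the function and parameter names are illustrative, not taken from the patent), the semantic category of each columnar voxel can be copied to every point whose x/y coordinates fall inside that voxel:

```python
import numpy as np

def map_voxel_labels_to_points(points_xy, voxel_labels, grid_min=0.0, voxel_size=0.2):
    """Assign each point the semantic category of the columnar voxel it falls in.

    points_xy:    (N, 2) array of point x/y coordinates
    voxel_labels: (W, H) array of per-voxel semantic category ids
    grid_min / voxel_size are assumed grid parameters, not from the patent.
    """
    idx = np.floor((points_xy - grid_min) / voxel_size).astype(int)
    ix = np.clip(idx[:, 0], 0, voxel_labels.shape[0] - 1)
    iy = np.clip(idx[:, 1], 0, voxel_labels.shape[1] - 1)
    return voxel_labels[ix, iy]

# Example: the voxel covering (0, 0)-(0.2, 0.2) was segmented as "kerb" (id 5)
labels = np.zeros((10, 10), dtype=int)
labels[0, 0] = 5
pts = np.array([[0.1, 0.05], [0.15, 0.19]])
print(map_voxel_labels_to_points(pts, labels))  # both points get category 5
```

Both sample points lie inside the (0, 0)-(0.2, 0.2) voxel, so both inherit its category, exactly as in the "kerb" example above.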
  • the pre-trained multi-task neural network includes: a backbone network 310, a point cloud detection network branch 320, and a point cloud segmentation network branch 330.
  • before feature extraction is performed on the bird's-eye view features through the backbone network of the neural network to obtain the point cloud feature vector, the method further includes: training a multi-task neural network based on several voxelized point cloud training samples; wherein the voxelized point cloud training samples are constructed based on the columnar voxels obtained by performing columnar voxelization on several point clouds respectively. For each of the voxelized point cloud training samples, the sample data includes: several columnar voxels; the sample label includes: a second point cloud segmentation label matching the corresponding sample data. The second point cloud segmentation label is used to identify the true value of the point cloud semantic category matched by each columnar voxel in the corresponding sample data; the true value of the point cloud semantic category matched by a columnar voxel is: among the point cloud semantic categories covered by the points divided into the corresponding columnar voxel, the point cloud semantic category with the largest coverage.
  • for the specific implementation of generating sample data, refer to the corresponding implementation in the previous steps, such as obtaining the point cloud to be processed and voxelizing it to obtain a number of columnar voxels.
  • the specific implementation will not be described again here.
  • each columnar voxel will contain a certain number of points, and these points are manually labeled with point cloud semantic categories.
  • the point cloud semantic category matched by the largest number of points is taken as the point cloud semantic category of the columnar voxel.
  • for example, a certain columnar voxel includes 3 points, each marked with a point cloud semantic category (such as small car, large car, bicycle, tricycle, pedestrian, cone, green plant, ground, fence, curb, lane line, etc.); assuming the categories are (building, building, green plant), then "building", which covers the largest number of points, is taken as the point cloud semantic category matched by this columnar voxel.
  • the point cloud semantic categories matched by all columnar voxels obtained after voxelization of a certain point cloud are arranged according to voxel positions, thereby obtaining the point cloud semantic category labels matching the sample data generated from that point cloud (i.e., the second point cloud segmentation label).
  • the sample label of the sample data can be expressed as a W × H label matrix. Each element in the label matrix is the identifier of the point cloud semantic category matched by the corresponding columnar voxel.
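A hedged sketch of how the W × H second point cloud segmentation label could be built by majority vote over the annotated points in each voxel (the helper name and the ignore value for empty voxels are assumptions, not from the patent):

```python
import numpy as np
from collections import Counter

def voxel_label_matrix(point_voxel_idx, point_labels, W, H, ignore=-1):
    """Build the W x H second point cloud segmentation label matrix.

    point_voxel_idx: (N, 2) voxel (ix, iy) index of each annotated point
    point_labels:    (N,)   manually annotated per-point semantic category ids
    Each voxel takes the category matched by the largest number of its points;
    voxels containing no points keep the `ignore` value.
    """
    label = np.full((W, H), ignore, dtype=int)
    buckets = {}
    for (ix, iy), c in zip(map(tuple, point_voxel_idx), point_labels):
        buckets.setdefault((ix, iy), []).append(c)
    for (ix, iy), cats in buckets.items():
        label[ix, iy] = Counter(cats).most_common(1)[0][0]  # majority vote
    return label

# 3 points in voxel (0, 0) labeled (building, building, green plant) -> building
m = voxel_label_matrix(np.array([[0, 0], [0, 0], [0, 0]]),
                       np.array([0, 0, 1]), W=4, H=4)
print(m[0, 0])  # 0, i.e. "building"
```

The printed element reproduces the (building, building, green plant) example: "building" covers the most points, so it becomes the voxel's label.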
  • the sample label further includes: a point cloud detection label, which is used to identify the true value of the target detection result in the corresponding sample data. For example, for each point cloud used to generate training samples, the key points of the target object on the heat map, and the spatial position coordinates, stereoscopic size, and rotation angle of the target object in the point cloud are manually annotated, and the annotated information is used as the point cloud detection label of the training sample generated from that point cloud.
  • training a multi-task neural network based on several voxelized point cloud training samples includes: performing the following point cloud detection and segmentation operations for each of the voxelized point cloud training samples to obtain the point cloud detection result prediction value and the point cloud segmentation result prediction value of the corresponding voxelized point cloud training sample: performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample; mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the voxelized point cloud training sample; and performing feature extraction on the bird's-eye view features through the backbone network to obtain point cloud feature vectors.
  • through the point cloud detection network branch, the target object is detected based on the point cloud feature vector, and the point cloud detection result prediction value of the voxelized point cloud training sample is output; for details, refer to the previous description of obtaining the detection result of the point cloud to be processed, which will not be repeated here.
  • through the point cloud segmentation network branch, point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation result prediction value of the voxelized point cloud training sample is output; for details, refer to the previous description of obtaining the segmentation result of the point cloud to be processed, which will not be repeated here.
  • the point cloud detection loss of the multi-task neural network is calculated based on the point cloud detection result prediction value and the corresponding point cloud detection label of each voxelized point cloud training sample.
  • the point cloud detection loss includes four parts, namely: heat map prediction loss, position prediction loss, size prediction loss and rotation angle prediction loss.
  • the position prediction loss, size prediction loss, and rotation angle prediction loss can be expressed by mean square error.
  • the position prediction loss of the multi-task neural network is represented by the mean square error between the predicted values of the target object position (such as spatial position coordinates) over all the voxelized point cloud training samples and the true values of the target object position in the sample labels.
  • the size prediction loss of the multi-task neural network is represented by the mean square error between the predicted values of the target size (such as three-dimensional size) over all the voxelized point cloud training samples and the true values of the target size in the sample labels; the rotation angle prediction loss of the multi-task neural network is represented by the mean square error between the predicted values of the target rotation angle over all the voxelized point cloud training samples and the true values of the target rotation angle in the sample labels.
  • the heat map prediction loss is calculated using a pixel-by-pixel focal loss function.
  • assuming the position of a target object is p, its key point (p_x, p_y) on the heat map is obtained, and the computed values are distributed onto the heat map through a Gaussian kernel; if the Gaussian kernels of multiple targets overlap, the maximum value is taken.
  • the formula of the Gaussian kernel can be expressed as: Y_xyc = exp(−((x − p_x)² + (y − p_y)²) / (2σ_p²))
  • where x and y are the enumerated grid positions in the image to be detected, σ_p is the target scale-adaptive variance, and Y_xyc is the Gaussian heat map value of each key point after Gaussian kernel mapping.
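The key-point splatting described above can be sketched as follows (a CenterNet-style illustration; `draw_gaussian` and its arguments are hypothetical names, and the variance is passed in directly rather than derived adaptively from the target scale):

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat one key point (p_x, p_y) onto the heat map via a Gaussian kernel,
    keeping the element-wise maximum where kernels of multiple targets overlap."""
    H, W = heatmap.shape
    ys, xs = np.mgrid[0:H, 0:W]
    px, py = center
    # Y_xyc = exp(-((x - p_x)^2 + (y - p_y)^2) / (2 * sigma^2))
    g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)  # overlapping kernels: take the max
    return heatmap

hm = np.zeros((8, 8), dtype=np.float32)
draw_gaussian(hm, (2, 2), sigma=1.0)
draw_gaussian(hm, (3, 2), sigma=1.0)  # second target overlapping the first
print(hm[2, 2])  # 1.0 at the first key point despite the overlap
```

Because the overlapping kernels are merged with a maximum rather than a sum, each key point keeps its peak value of 1 on the heat map.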
  • the point cloud segmentation loss of the multi-task neural network is calculated based on the point cloud segmentation result prediction value of each voxelized point cloud training sample and the corresponding second point cloud segmentation label.
  • the point cloud segmentation loss can be expressed by the cross entropy of the point cloud segmentation result prediction value and the corresponding second point cloud segmentation label.
  • the point cloud detection loss and the point cloud segmentation loss are integrated to calculate the loss of the multi-task neural network, and, with the goal of minimizing the loss of the entire network, the network parameters of the backbone network, the point cloud detection network branch, and the point cloud segmentation network branch are optimized to complete the training of the multi-task neural network.
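An illustrative numeric sketch of the loss terms (equal weighting of the detection and segmentation losses is an assumption — the patent only states that the two losses are integrated; the helper names are not from the patent):

```python
import numpy as np

def seg_cross_entropy(logits, labels):
    """Point cloud segmentation loss: cross entropy between the predicted
    per-voxel class scores (N, C) and the second segmentation label (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)           # numeric stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return float(-logp[np.arange(len(labels)), labels].mean())

def mse(pred, gt):
    """Mean square error, used for the position / size / rotation-angle losses."""
    return float(np.mean((np.asarray(pred) - np.asarray(gt)) ** 2))

def network_loss(heatmap_loss, pos_loss, size_loss, rot_loss, seg_loss):
    """Overall training objective: the detection loss (four parts) plus the
    segmentation loss, minimized over backbone and both branch parameters."""
    return heatmap_loss + pos_loss + size_loss + rot_loss + seg_loss

seg = seg_cross_entropy(np.array([[4.0, 0.0], [0.0, 4.0]]), np.array([0, 1]))
total = network_loss(0.5, mse([1.0], [1.5]), 0.1, 0.05, seg)
print(round(seg, 3), round(total, 3))
```

Confident predictions (logit margin 4) give a small cross entropy, and the total simply accumulates the four detection parts with the segmentation part.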
  • the point cloud detection and segmentation method disclosed in the embodiment of the present application performs columnar voxelization processing on the point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed; finally, through the backbone network of a pre-trained multi-task neural network, extracts features from the bird's-eye view features to obtain point cloud feature vectors; through the point cloud detection network branch of the multi-task neural network, detects target objects based on the point cloud feature vectors and outputs point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network, performs point cloud segmentation based on the point cloud feature vectors and outputs the point cloud segmentation result, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • the point cloud feature vector is extracted and mapped through the backbone network of a multi-task neural network, and then input respectively to the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task for point cloud detection and point cloud segmentation, which enables the point cloud detection task and the point cloud segmentation task to share the output of the point cloud feature extraction network as their input. Compared with using two neural networks to independently perform point cloud detection and point cloud segmentation, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
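The shared-backbone design described above can be sketched structurally as follows (`backbone`, `detect_head`, and `segment_head` are placeholders standing in for the patent's actual network layers):

```python
class MultiTaskPointCloudNet:
    """Minimal structural sketch: the bird's-eye-view features are encoded
    once by the shared backbone, then fed to both task branches."""
    def __init__(self, backbone, detect_head, segment_head):
        self.backbone = backbone
        self.detect_head = detect_head
        self.segment_head = segment_head

    def forward(self, bev_features):
        feat = self.backbone(bev_features)  # point cloud feature vector, computed once
        det = self.detect_head(feat)        # point cloud detection branch
        seg = self.segment_head(feat)       # point cloud segmentation branch
        return det, seg

# toy stand-ins demonstrating the single shared feature-extraction pass
calls = {"backbone": 0}
def backbone(x):
    calls["backbone"] += 1
    return x * 2

net = MultiTaskPointCloudNet(backbone, lambda f: f + 1, lambda f: f - 1)
det, seg = net.forward(10)
print(det, seg, calls["backbone"])  # 21 19 1 -> the backbone ran only once
```

Both heads consume the same feature tensor, so the expensive feature extraction runs once instead of twice, which is the source of the computation savings claimed above.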
  • Point cloud detection tasks in the prior art usually include: point cloud preprocessing, feature extraction, and detection head prediction steps.
  • point cloud and its point cloud segmentation labels are converted to a bird's-eye view, and feature extraction, detection and segmentation are performed under the bird's-eye view, which is fast and effective.
  • after the point cloud semantic segmentation results output by the model are converted to each point in the point cloud, the task of point-based semantic segmentation of point clouds is completed, which effectively improves the speed of point cloud segmentation.
  • a point cloud detection and segmentation device disclosed in the embodiment of the present application, as shown in Figure 5, includes:
  • the columnar voxelization module 510 is used to perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed;
  • the voxel feature acquisition module 520 is used to perform feature extraction and mapping on the plurality of columnar voxels, and obtain the voxel features of the point cloud to be processed;
  • a bird's-eye view feature mapping module 530 is used to map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed;
  • the point cloud feature extraction module 540 is used to extract features of the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
  • the point cloud detection and segmentation module 550 is used to perform target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and output the point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network, perform point cloud segmentation based on the point cloud feature vector and output the point cloud segmentation result.
  • the point cloud segmentation result includes: the point cloud semantic category matched by each columnar voxel, and the device further includes:
  • the first point cloud segmentation label acquisition module 511 is used to obtain the first point cloud segmentation label of the plurality of columnar voxels, wherein the first point cloud segmentation label includes: position information of each columnar voxel;
  • the segmentation result conversion module 560 is used to map the point cloud semantic category matched by the columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, so as to obtain the segmentation result of the point cloud to be processed.
  • mapping the point cloud semantic category matched by the columnar voxel to the point cloud to be processed according to the position information of the columnar voxel, so as to obtain the segmentation result of the points in the point cloud to be processed, includes:
  • taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in the columnar voxel.
  • the voxel feature acquisition module 520 is further used to:
  • the point features of all points divided into a columnar voxel are spliced into the voxel feature of that columnar voxel, where the point features of each point include: the position coordinates and reflection intensity information of the point;
  • the bird's-eye view feature mapping module 530 is further used to:
  • mapping, according to the number of points included in each columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label, so as to obtain the bird's-eye view features corresponding to the point cloud to be processed;
  • if the number of points included in a columnar voxel is greater than 0, the feature vector corresponding to the columnar voxel in the voxel features is mapped to the corresponding position of the bird's-eye view that matches the first point cloud segmentation label;
  • if the number of points included in a columnar voxel is 0, the feature vector at the corresponding position of the bird's-eye view matching the first point cloud segmentation label is set to 0.
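A hedged sketch of this scattering step (the helper and argument names are assumptions; the patent does not give an implementation): voxels containing points write their feature vector to their bird's-eye-view position, and positions of empty voxels remain zero.

```python
import numpy as np

def scatter_to_bev(voxel_features, voxel_coords, num_points, W, H):
    """Map per-voxel feature vectors to their bird's-eye-view positions.

    voxel_features: (V, C) feature vector per columnar voxel
    voxel_coords:   (V, 2) bird's-eye-view (ix, iy) position per voxel
    num_points:     number of points contained in each columnar voxel
    """
    C = voxel_features.shape[1]
    bev = np.zeros((C, W, H), dtype=voxel_features.dtype)  # empty voxels stay 0
    for feat, (ix, iy), n in zip(voxel_features, voxel_coords, num_points):
        if n > 0:  # only non-empty columnar voxels are written
            bev[:, ix, iy] = feat
    return bev

feats = np.array([[1.0, 2.0], [3.0, 4.0]])
coords = np.array([[0, 0], [1, 2]])
bev = scatter_to_bev(feats, coords, num_points=[5, 0], W=4, H=4)
print(bev[:, 0, 0], bev[:, 1, 2])  # [1. 2.] and [0. 0.]
```

The second voxel contains 0 points, so its bird's-eye-view position keeps the zero feature vector, matching the rule stated above.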
  • the pre-trained multi-task neural network includes: a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch.
  • the device further includes:
  • a multi-task neural network training module (not shown in the figure) is used to train a multi-task neural network based on several voxelized point cloud training samples;
  • the voxelized point cloud training samples are constructed based on the columnar voxels obtained after columnar voxelization processing of several point clouds respectively; for each of the voxelized point cloud training samples, the sample data includes: several columnar voxels, and the sample label includes: a second point cloud segmentation label matching the corresponding sample data; the second point cloud segmentation label is used to identify the true value of the point cloud semantic category matched by each columnar voxel in the corresponding sample data; the true value of the point cloud semantic category matched by a columnar voxel is: among the point cloud semantic categories covered by the points in the point cloud that are divided into the corresponding columnar voxel, the point cloud semantic category with the largest coverage.
  • the sample label also includes: a point cloud detection label, which is used to identify the true value of the target detection result in the corresponding sample data.
  • training a multi-task neural network based on several voxelized point cloud training samples includes:
  • for each of the voxelized point cloud training samples, the following point cloud detection and segmentation operations are performed respectively to obtain the point cloud detection result prediction value and the point cloud segmentation result prediction value of the corresponding voxelized point cloud training sample:
  • through the point cloud detection network branch, target object detection is performed based on the point cloud feature vector, and the point cloud detection result prediction value of the voxelized point cloud training sample is output; and, through the point cloud segmentation network branch, point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation result prediction value of the voxelized point cloud training sample is output;
  • the point cloud detection and segmentation device disclosed in the embodiment of this application is used to implement the point cloud detection and segmentation method described in Embodiment 1 of this application.
  • the specific implementation of each module of the device will not be described in detail here; please refer to the specific implementation of the corresponding steps in the method embodiment.
  • the point cloud detection and segmentation device disclosed in the embodiment of the present application performs columnar voxelization processing on the point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed; finally, through the backbone network of a pre-trained multi-task neural network, extracts features from the bird's-eye view features to obtain point cloud feature vectors; through the point cloud detection network branch of the multi-task neural network, detects target objects based on the point cloud feature vectors and outputs point cloud detection results; and, through the point cloud segmentation network branch of the multi-task neural network, performs point cloud segmentation based on the point cloud feature vectors and outputs the point cloud segmentation result, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
  • the point cloud feature vector is extracted and mapped through the backbone network of a multi-task neural network, and then input respectively to the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task for point cloud detection and point cloud segmentation, which enables the point cloud detection task and the point cloud segmentation task to share the output of the point cloud feature extraction network as their input. Compared with using two neural networks to independently perform point cloud detection and point cloud segmentation, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
  • point cloud and its point cloud segmentation labels are converted to a bird's-eye view, and feature extraction, detection and segmentation are performed under the bird's-eye view, which is fast and effective.
  • after the point cloud semantic segmentation results output by the model are converted to each point in the point cloud, the task of point-based semantic segmentation of point clouds is completed, which effectively improves the speed of point cloud segmentation.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units; that is, they may be located in one location, or they may be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the method without any creative effort.
  • Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components in the electronic device according to embodiments of the present application.
  • the present application may also be implemented as an apparatus or device program (eg, computer program and computer program product) for performing part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, or provided on a carrier signal, or in any other form.
  • Figure 7 shows an electronic device that can implement the method according to the present application.
  • the electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, etc.
  • the electronic device conventionally includes a processor 710, a memory 720, and program code 730 stored on the memory 720 and executable on the processor 710.
  • when the processor 710 executes the program code 730, the point cloud detection and segmentation method described in the above embodiments is implemented.
  • the memory 720 may be a computer program product or a computer-readable medium.
  • Memory 720 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 720 has a storage space 7201 for program code 730 of a computer program for executing any of the method steps described above.
  • the storage space 7201 for the program code 730 may include various computer programs respectively used to implement various steps in the above method.
  • the program code 730 is computer-readable code. Such computer programs can be read from or written into one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks.
  • the computer program includes computer readable code that, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
  • An embodiment of the present application also discloses a computer-readable storage medium on which a computer program is stored.
  • the program is executed by a processor, the steps of the point cloud detection and segmentation method described in Embodiment 1 of the present application are implemented.
  • Such a computer program product may be a computer-readable storage medium, which may have storage segments, storage spaces, etc. arranged similarly to the memory 720 in the electronic device shown in FIG. 7 .
  • the program code may, for example, be compressed and stored in the computer-readable storage medium in a suitable form.
  • the computer-readable storage medium is typically a portable or fixed storage unit as described with reference to FIG. 8 .
  • the storage unit includes computer-readable code 730', that is, code that can be read by a processor; when these codes are executed by the processor, each step in the method described above is implemented.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the application may be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
  • the use of the words first, second, third, etc. does not indicate any order. These words can be interpreted as names.

Abstract

A point cloud detection and segmentation method, relating to the technical field of computers. The method comprises: performing columnar voxelization processing on point cloud to be processed, so as to obtain a plurality of columnar voxels forming said point cloud; performing feature extraction and mapping on the plurality of columnar voxels to obtain the voxel features of said point cloud, and mapping the voxel features to an aerial view so as to obtain aerial view features corresponding to said point cloud; performing feature extraction on the aerial view features by means of a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector; and by means of a point cloud detection network branch and a point cloud segmentation network branch of the multi-task neural network, respectively performing point cloud detection and point cloud segmentation on the basis of the point cloud feature vector. By reducing the operation of repeatedly performing point cloud feature extraction, the efficiency of point cloud detection and point cloud segmentation is improved.

Description

Point cloud detection and segmentation method and apparatus, and electronic device
This application claims priority to the Chinese patent application filed with the China Patent Office on April 6, 2022, with application number 202210353486.1 and the invention title "Point cloud detection and segmentation method and apparatus, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a point cloud detection and segmentation method and apparatus, as well as an electronic device and a computer-readable storage medium.
Background
Point cloud data refers to a set of vectors in a three-dimensional coordinate system. Spatial information is recorded in the form of points, and each point contains three-dimensional coordinates. Depending on the data collection capabilities of the point cloud collection equipment, some point cloud data may also contain color information (RGB) or reflection intensity information (Intensity). Taking point cloud data collected through lidar as an example, the point cloud data includes the position coordinates and reflection intensity information of points in three-dimensional space. Point cloud data is widely used for target object detection and recognition in the field of autonomous driving, for example, in autonomous cars and drones. In the application of point cloud data, point cloud detection and segmentation techniques are usually employed to perform target object detection and point cloud segmentation based on the point cloud data. Point cloud detection refers to processing the point cloud data to detect the position of the target object in the scene matched by the point cloud data, while point cloud segmentation refers to identifying the target object category matched by each point in the point cloud data, so as to facilitate subsequent autonomous driving control.
In the prior art, different network models are usually used to perform the point cloud detection and point cloud segmentation tasks respectively. Since point cloud data is sparse and irregular, the structures of the commonly used detection networks and segmentation networks are relatively complex, resulting in a high amount of computation required to obtain point cloud detection results and point cloud segmentation results.
It can be seen that the point cloud detection and segmentation methods in the prior art still need to be improved.
Summary
The embodiments of the present application provide a point cloud detection and segmentation method, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
In a first aspect, an embodiment of the present application provides a point cloud detection and segmentation method, including:
performing columnar voxelization processing on a point cloud to be processed, and obtaining a number of columnar voxels that constitute the point cloud to be processed;
performing feature extraction and mapping on the plurality of columnar voxels, and obtaining voxel features of the point cloud to be processed;
mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed;
performing feature extraction on the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
performing target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network, and outputting a point cloud detection result; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network, and outputting a point cloud segmentation result.
第二方面,本申请实施例提供了一种点云检测和分割装置,包括:In the second aspect, embodiments of the present application provide a point cloud detection and segmentation device, including:
柱状体素化模块,用于对待处理点云进行柱状体素化处理,获取构成所述待处理点云的若干柱状体素;A columnar voxelization module, used to perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed;
体素特征获取模块,用于对所述若干柱状体素进行特征提取和映射,获取所述待处理点云的体素特征;A voxel feature acquisition module, used to perform feature extraction and mapping on the plurality of columnar voxels, and obtain the voxel features of the point cloud to be processed;
鸟瞰图特征映射模块,用于将所述体素特征映射到鸟瞰图,得到所述待处理点云对应的鸟瞰图特征;A bird's-eye view feature mapping module, used to map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed;
点云特征提取模块,用于通过预先训练的多任务神经网络的主干网络,对所述鸟瞰图特征进行特征提取,得到点云特征向量;A point cloud feature extraction module is used to extract features of the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
点云检测和分割模块，用于通过所述多任务神经网络的点云检测网络分支，基于所述点云特征向量进行目标物检测，输出点云检测结果；以及，通过所述多任务神经网络的点云分割网络分支，基于所述点云特征向量进行点云分割，输出点云分割结果。A point cloud detection and segmentation module is configured to perform target object detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and output a point cloud detection result; and to perform point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and output a point cloud segmentation result.
第三方面，本申请实施例还公开了一种电子设备，包括存储器、处理器及存储在所述存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时实现本申请实施例所述的点云检测和分割方法。In a third aspect, embodiments of the present application further disclose an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the point cloud detection and segmentation method described in the embodiments of this application.
第四方面，本申请实施例提供了一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现本申请实施例公开的点云检测和分割方法的步骤。In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the point cloud detection and segmentation method disclosed in the embodiments of this application are implemented.
本申请实施例公开的点云检测和分割方法，通过对待处理点云进行柱状体素化处理，获取构成所述待处理点云的若干柱状体素；之后，对所述若干柱状体素进行特征提取和映射，获取所述待处理点云的体素特征，并将所述体素特征映射到鸟瞰图，得到所述待处理点云对应的鸟瞰图特征；最后，通过预先训练的多任务神经网络的主干网络，对所述鸟瞰图特征进行特征提取，得到点云特征向量；通过所述多任务神经网络的点云检测网络分支，基于所述点云特征向量进行目标物检测，输出点云检测结果；以及，通过所述多任务神经网络的点云分割网络分支，基于所述点云特征向量进行点云分割，输出点云分割结果，有助于提升点云检测和点云分割的效率。In the point cloud detection and segmentation method disclosed in the embodiments of the present application, columnar voxelization is performed on a point cloud to be processed to obtain a number of columnar voxels constituting the point cloud to be processed; then, feature extraction and mapping are performed on the columnar voxels to obtain voxel features of the point cloud to be processed, and the voxel features are mapped to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed; finally, feature extraction is performed on the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector; through the point cloud detection network branch of the multi-task neural network, target object detection is performed based on the point cloud feature vector and a point cloud detection result is output; and, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector and a point cloud segmentation result is output, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
上述说明仅是本申请技术方案的概述，为了能够更清楚了解本申请的技术手段，而可依照说明书的内容予以实施，并且为了让本申请的上述和其它目的、特征和优点能够更明显易懂，以下特举本申请的具体实施方式。The above description is only an overview of the technical solutions of the present application. In order that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more apparent and understandable, specific embodiments of the present application are set forth below.
附图说明Description of the drawings
为使本申请实施例的目的、技术方案和优点更加清楚，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.
图1是本申请实施例一的点云检测和分割方法流程示意图;Figure 1 is a schematic flow chart of the point cloud detection and segmentation method in Embodiment 1 of the present application;
图2是本申请实施例一的点云体素化处理效果示意图；Figure 2 is a schematic diagram of the effect of point cloud voxelization processing in Embodiment 1 of the present application;
图3是本申请实施例一中采用的多任务神经网络结构示意图;Figure 3 is a schematic structural diagram of the multi-task neural network used in Embodiment 1 of the present application;
图4是本申请实施例一中点云分割结果映射示意图;Figure 4 is a schematic diagram of point cloud segmentation result mapping in Embodiment 1 of the present application;
图5是本申请实施例二的点云检测和分割装置结构示意图之一;Figure 5 is one of the structural schematic diagrams of the point cloud detection and segmentation device in Embodiment 2 of the present application;
图6是本申请实施例二的点云检测和分割装置结构示意图之二；Figure 6 is the second structural schematic diagram of the point cloud detection and segmentation device in Embodiment 2 of the present application;
图7示意性地示出了用于执行根据本申请的方法的电子设备的框图;以及Figure 7 schematically shows a block diagram of an electronic device for performing a method according to the present application; and
图8示意性地示出了用于保持或者携带实现根据本申请的方法的程序代码的存储单元。Figure 8 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present application.
具体实施例Specific embodiments
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是 全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.
实施例一Embodiment 1
本申请实施例公开的一种点云检测和分割方法,如图1所示,所述方法包括:步骤110至步骤150。An embodiment of the present application discloses a point cloud detection and segmentation method, as shown in Figure 1. The method includes: steps 110 to 150.
步骤110,对待处理点云进行柱状体素化处理,获取构成所述待处理点云的若干柱状体素。Step 110: Perform columnar voxelization processing on the point cloud to be processed, and obtain a number of columnar voxels that constitute the point cloud to be processed.
本申请实施例中所述的待处理点云为:点云采集设备(如激光雷达传感器)采集到的点云中感兴趣区域内的点云。The point cloud to be processed described in the embodiment of this application is: the point cloud in the area of interest in the point cloud collected by a point cloud collection device (such as a lidar sensor).
以本申请实施例中所述的点云检测和分割方法应用于汽车自动驾驶场景为例，设置在车辆上的激光雷达传感器采集的原始点云是若干无序点的数据集，其中，每个点的数据可以采用维度为4的数据表示，例如，表示为：(x,y,z,i)，其中，x,y,z是点的空间位置坐标，i表示该点的反射强度。Taking the application of the point cloud detection and segmentation method described in the embodiments of this application to an autonomous driving scenario as an example, the original point cloud collected by the lidar sensor installed on the vehicle is a data set of a number of unordered points, where the data of each point can be represented by data of dimension 4, for example, as (x, y, z, i), where x, y, z are the spatial position coordinates of the point, and i represents the reflection intensity of the point.
对于点云采集设备采集的原始点云,首先需要进行点云预处理,以获得符合要求的点集。例如,对于原始点云,去除其中nan值(空值),或者,去除其中数值非常大的点,以过滤点云噪声。点云预处理的具体实施方案可以参见现有技术,本申请实施例中对点云预处理采用的技术方案不做限定,此处亦不再赘述。For the original point cloud collected by point cloud acquisition equipment, point cloud preprocessing first needs to be performed to obtain a point set that meets the requirements. For example, for the original point cloud, remove the nan values (null values), or remove the points with very large values to filter the point cloud noise. For specific implementation solutions of point cloud preprocessing, please refer to the prior art. In the embodiments of this application, the technical solution adopted for point cloud preprocessing is not limited and will not be described again here.
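The preprocessing step described above (removing nan values and filtering out points with very large values) can be sketched in a few lines. The function name and the `max_abs` threshold are illustrative assumptions, not values specified in the source:

```python
import numpy as np

def preprocess_point_cloud(points, max_abs=1000.0):
    """Drop nan-valued points and points with implausibly large coordinates.

    points: (N, 4) array of (x, y, z, i); max_abs is an illustrative
    noise-filtering threshold, not a value given in the source text.
    """
    finite = np.isfinite(points).all(axis=1)                   # remove nan (null) values
    in_range = (np.abs(points[:, :3]) < max_abs).all(axis=1)   # remove very large values
    return points[finite & in_range]
```

Both masks are boolean per-point filters, so the result keeps the original (x, y, z, i) layout.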
点云采集设备（如激光雷达传感器）采集到的点云是三维的不规则空间区域内的点，在对点云进行检测和分割之前，首先需要从中确定一个规则的空间区域内的点云。例如，通过限定x,y和z方向的坐标范围，取一块大的立方体区域中的点云，其余的舍弃，这个立方体区域的大小可以表示为：[xmax-xmin,ymax-ymin,zmax-zmin]，其中，xmax和xmin分别表示x方向的坐标最大值和最小值，ymax和ymin分别表示y方向的坐标最大值和最小值，zmax和zmin分别表示z方向的坐标最大值和最小值。The point cloud collected by a point cloud collection device (such as a lidar sensor) consists of points in an irregular three-dimensional spatial region. Before detecting and segmenting the point cloud, it is first necessary to determine the point cloud within a regular spatial region. For example, by limiting the coordinate ranges in the x, y and z directions, the point cloud within a large cubic region is taken and the rest is discarded. The size of this cubic region can be expressed as [xmax-xmin, ymax-ymin, zmax-zmin], where xmax and xmin respectively represent the maximum and minimum coordinate values in the x direction, ymax and ymin respectively represent the maximum and minimum coordinate values in the y direction, and zmax and zmin respectively represent the maximum and minimum coordinate values in the z direction.
进一步的，获取前文确定的大立方体区域中感兴趣区域内的点的数据，便于后续对感兴趣区域内的点云进行点云检测和点云分割。本申请的一些实施例中，感兴趣区域内的点的坐标可以通过(x,y,z)表示，其中，xmin<x<xmax，ymin<y<ymax，zmin<z<zmax，单位是米。Further, the data of the points in the region of interest within the large cubic region determined above is obtained, to facilitate subsequent point cloud detection and point cloud segmentation of the point cloud in the region of interest. In some embodiments of the present application, the coordinates of a point within the region of interest can be expressed as (x, y, z), where xmin < x < xmax, ymin < y < ymax, zmin < z < zmax, in meters.
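The region-of-interest cropping described above (keeping only points with xmin < x < xmax, ymin < y < ymax, zmin < z < zmax) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def crop_roi(points, xlim, ylim, zlim):
    """Keep only points strictly inside the cuboid region of interest.

    points: (N, 4) array of (x, y, z, i); xlim/ylim/zlim are (min, max)
    pairs in meters.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = ((xlim[0] < x) & (x < xlim[1])
            & (ylim[0] < y) & (y < ylim[1])
            & (zlim[0] < z) & (z < zlim[1]))
    return points[mask]
```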
本申请的一些实施例中，感兴趣区域中的点根据点云质量确定。例如，距离车辆较远位置的点云比较稀疏，打到车上的点数较少，可以设置最小点数为一个较小数值（例如：点数值等于5），然后，根据这个点数找到相应数量的点，并根据一个最大距离的点，确定一个空间区域。本申请的一些实施例中，对于同样的点云质量（如同样的点云采集设备采集的点云），这个距离可以通过采集点云数据的质量预先确定，在应用过程中不再改变。In some embodiments of the present application, the points in the region of interest are determined based on point cloud quality. For example, the point cloud far away from the vehicle is sparse and few points there hit a vehicle, so the minimum number of points can be set to a small value (for example, equal to 5); then, the corresponding number of points is found based on this value, and a spatial region is determined based on the point at the maximum distance. In some embodiments of the present application, for the same point cloud quality (such as point clouds collected by the same point cloud collection device), this distance can be predetermined based on the quality of the collected point cloud data and does not change during application.
感兴趣区域的确定方法可以参见现有技术中点云检测或点云分割方案中采用的确定感兴趣区域的方法,本申请实施例中,对确定感兴趣区域的具体实施方式不做限定。For the method of determining the region of interest, please refer to the method of determining the region of interest used in point cloud detection or point cloud segmentation solutions in the prior art. In the embodiments of this application, the specific implementation method of determining the region of interest is not limited.
由于点云采集设备采集到的点云中包括的点非常多，因此，基于点进行特征提取，用于点云检测和点云分割时，会消耗大量的计算资源，因此，本申请实施例中，首先对原始点云进行体素化处理，后续，基于体素进行特征提取，可以有效减少数据处理量，节省计算资源。Since the point cloud collected by the point cloud collection device contains a very large number of points, performing feature extraction on individual points for point cloud detection and point cloud segmentation would consume a large amount of computing resources. Therefore, in the embodiments of the present application, the original point cloud is first voxelized, and subsequent feature extraction is performed based on voxels, which can effectively reduce the amount of data processing and save computing resources.
本申请的一些实施例中，所述对待处理点云进行柱状体素化处理，获取构成所述待处理点云的若干柱状体素，包括：按照第一坐标轴和第二坐标轴的坐标分布，将所述待处理点云中的点划分至若干柱状体素。本申请的一些实施例中，所述第一坐标轴和第二坐标轴为三维空间坐标系的两个不同坐标轴，所述的柱状体素为棱柱状体素。例如，图2中左侧所示的点云经过体素化处理后，可以得到如图2中右侧所示的长方体体素（即柱状体素）210。In some embodiments of the present application, performing columnar voxelization on the point cloud to be processed to obtain a number of columnar voxels constituting the point cloud to be processed includes: dividing the points in the point cloud to be processed into a number of columnar voxels according to their coordinate distribution along a first coordinate axis and a second coordinate axis. In some embodiments of the present application, the first coordinate axis and the second coordinate axis are two different coordinate axes of a three-dimensional spatial coordinate system, and the columnar voxels are prismatic voxels. For example, after the point cloud shown on the left side of Figure 2 is voxelized, the cuboid voxels (i.e., columnar voxels) 210 shown on the right side of Figure 2 can be obtained.
以第一坐标轴为x轴，第二坐标轴为y轴，可以将感兴趣区域内的点分别沿着x轴和y轴方向，划分成长方体体素，z轴方向不做划分，划分得到的每个体素的大小可以表示为[x_v, y_v, zmax-zmin]，其中，x_v表示体素沿x轴方向的长度，y_v表示体素沿y轴方向的长度，zmax-zmin表示体素沿z轴方向的高度，单位是米。按照前述柱状体素生成方法，对应一个感兴趣区域，将可以划分得到W×H个柱状体素，其中，Taking the first coordinate axis as the x-axis and the second coordinate axis as the y-axis, the points in the region of interest can be divided into cuboid voxels along the x-axis and y-axis directions respectively, with no division along the z-axis direction. The size of each resulting voxel can be expressed as [x_v, y_v, zmax-zmin], where x_v represents the length of the voxel along the x-axis, y_v represents the length of the voxel along the y-axis, and zmax-zmin represents the height of the voxel along the z-axis, in meters. According to the aforementioned columnar voxel generation method, a region of interest can be divided into W×H columnar voxels, where:
W = (xmax - xmin) / x_v,  H = (ymax - ymin) / y_v
以感兴趣区域中x的范围为(0,102.4)，y的范围为(0,50)，z的范围为(0,100)，柱状体素大小为0.2×0.2×100为例，则x轴方向柱状体素个数W等于(102.4-0)/0.2=512，y轴方向柱状体素个数H等于(50-0)/0.2=250，则感兴趣区域被划分为512×250个柱状体素。后续，这些柱状体素被视为图像像素，用于进行感兴趣区域的特征提取。本申请的一些实施例中，经过体素化处理之后，感兴趣区域的点云可以表示为W×H×1的体素图像，体素图像的维度为W×H×1。Taking as an example a region of interest where the range of x is (0, 102.4), the range of y is (0, 50), the range of z is (0, 100), and the columnar voxel size is 0.2×0.2×100, the number of columnar voxels W in the x-axis direction is (102.4-0)/0.2=512, and the number of columnar voxels H in the y-axis direction is (50-0)/0.2=250, so the region of interest is divided into 512×250 columnar voxels. Subsequently, these columnar voxels are treated as image pixels and used for feature extraction of the region of interest. In some embodiments of the present application, after voxelization, the point cloud of the region of interest can be represented as a voxel image of W×H×1, i.e., the dimension of the voxel image is W×H×1.
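The grid-size computation in the worked example above can be reproduced as follows. `pillar_grid_size` and `pillar_index` are illustrative helper names, and the use of floor division to assign a point to its pillar is an assumption about the binning rule:

```python
import math

def pillar_grid_size(xmin, xmax, ymin, ymax, x_v, y_v):
    """Number of pillars along x and y: W = (xmax-xmin)/x_v, H = (ymax-ymin)/y_v."""
    W = int(round((xmax - xmin) / x_v))
    H = int(round((ymax - ymin) / y_v))
    return W, H

def pillar_index(x, y, xmin, ymin, x_v, y_v):
    """Grid cell (ix, iy) that a point falls into (floor-division binning)."""
    return int(math.floor((x - xmin) / x_v)), int(math.floor((y - ymin) / y_v))
```

With the example values, `pillar_grid_size(0, 102.4, 0, 50, 0.2, 0.2)` reproduces the 512×250 grid from the text.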
本申请的一些实施例中，所述柱状体素的大小是通过实验确定的。例如，可以预设一些体素大小，分别做点云检测和点云分割实验，分析体素大小对检测和分割结果及性能的影响，最后确定最优的体素大小。In some embodiments of the present application, the size of the columnar voxels is determined experimentally. For example, several voxel sizes can be preset, point cloud detection and point cloud segmentation experiments can be conducted for each, the impact of voxel size on the detection and segmentation results and performance can be analyzed, and the optimal voxel size can finally be determined.
本申请的一些实施例中，所述对待处理点云进行柱状体素化处理，获取构成所述待处理点云的若干柱状体素之后，还包括：获取所述若干柱状体素的第一点云分割标签，其中，所述第一点云分割标签中包括：每个所述柱状体素的位置信息。对于划分得到的W×H个柱状体素，这些柱状体素组成了体素维度为W×H×1的体素图像，该体素图像的第一点云分割标签，即上述W×H个柱状体素的第一点云分割标签，可以通过一张大小为W×H的位置信息表表示，例如表示为(W,H,1)。所述第一点云分割标签用于后续根据柱状体素的分割结果确定点云的分割结果。In some embodiments of the present application, after performing columnar voxelization on the point cloud to be processed and obtaining a number of columnar voxels constituting the point cloud to be processed, the method further includes: obtaining a first point cloud segmentation label of the columnar voxels, wherein the first point cloud segmentation label includes the position information of each columnar voxel. The W×H columnar voxels obtained by the division form a voxel image with a voxel dimension of W×H×1, and the first point cloud segmentation label of this voxel image, i.e., the first point cloud segmentation label of the above W×H columnar voxels, can be represented by a position information table of size W×H, for example, as (W, H, 1). The first point cloud segmentation label is used subsequently to determine the segmentation result of the point cloud based on the segmentation result of the columnar voxels.
本申请的一些实施例中，所述获取所述若干柱状体素的第一点云分割标签，包括：对于每个所述柱状体素，将所述柱状体素的位置信息，作为所述柱状体素匹配的第一点云分割标签。例如，所述第一点云分割标签可以通过一张大小为W×H的位置信息表表示，例如表示为(W,H,1)。该位置信息表中，包括W×H组位置信息，每组位置信息对应一个柱状体素，例如，每组位置信息用于表示相应柱状体素在x轴和y轴上的坐标范围。由此可见，每组位置信息还可以用于表示点云中划分至与该组位置信息对应的柱状体素中的点的坐标范围。本申请的一些实施例中，可以通过在所述位置信息表中记录相应柱状体素的坐标范围，从而建立点云中的点与该柱状体素之间的映射关系。本申请的另一些实施例中，还可以采用其他方式建立点云中的点与柱状体素之间的映射关系。本申请实施例中，对映射关系的具体表现形式不做限定。In some embodiments of the present application, obtaining the first point cloud segmentation label of the columnar voxels includes: for each columnar voxel, taking the position information of the columnar voxel as the first point cloud segmentation label matched with that columnar voxel. For example, the first point cloud segmentation label can be represented by a position information table of size W×H, for example, as (W, H, 1). The position information table includes W×H sets of position information, and each set of position information corresponds to one columnar voxel; for example, each set of position information represents the coordinate range of the corresponding columnar voxel on the x-axis and y-axis. It follows that each set of position information can also represent the coordinate range of the points in the point cloud that are divided into the columnar voxel corresponding to that set of position information. In some embodiments of the present application, the mapping relationship between the points in the point cloud and a columnar voxel can be established by recording the coordinate range of the corresponding columnar voxel in the position information table. In other embodiments of the present application, the mapping relationship between points in the point cloud and columnar voxels can also be established in other ways. The embodiments of this application do not limit the specific form of the mapping relationship.
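One entry of the W×H position information table described above can be sketched as the x/y coordinate range covered by a pillar; the helper name is hypothetical:

```python
def pillar_coord_range(ix, iy, xmin, ymin, x_v, y_v):
    """One entry of the W x H position table: the x and y coordinate
    ranges covered by pillar (ix, iy), given the grid origin and the
    pillar size (x_v, y_v)."""
    return ((xmin + ix * x_v, xmin + (ix + 1) * x_v),
            (ymin + iy * y_v, ymin + (iy + 1) * y_v))
```

For pillar (0, 0) with a 0.2 m pillar size this yields the (0, 0) to (0.2, 0.2) range used as an example in the text.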
步骤120,对所述若干柱状体素进行特征提取和映射,获取所述待处理点云的体素特征。Step 120: Perform feature extraction and mapping on the plurality of columnar voxels to obtain voxel features of the point cloud to be processed.
在获取到构成待处理点云（如前述感兴趣区域的点云）的若干柱状体素之后，可以将柱状体素看作为图像的像素，对由所述若干柱状体素构成的体素图像进行特征提取和映射，获取所述体素图像的特征，由于所述体素图像的特征是基于柱状体素内点的分布数据提取的，因此，可以充分表达待处理点云的特征。After obtaining the columnar voxels constituting the point cloud to be processed (such as the point cloud of the aforementioned region of interest), the columnar voxels can be regarded as pixels of an image, and feature extraction and mapping can be performed on the voxel image composed of these columnar voxels to obtain the features of the voxel image. Since the features of the voxel image are extracted based on the distribution data of the points within the columnar voxels, they can fully express the features of the point cloud to be processed.
本申请的一些实施例中，所述对所述若干柱状体素进行特征提取和映射，获取所述待处理点云的体素特征，包括：对于每个所述柱状体素，获取划分至所述柱状体素中的所有点的中心点，并计算划分至所述柱状体素中的每个点与所述中心点之间的坐标距离；对于每个所述柱状体素，将划分至所述柱状体素中的所有点的点特征，拼接为所述柱状体素的体素特征，其中，每个所述点的所述点特征包括：所述点的位置坐标和反射强度信息；对所述柱状体素的体素特征进行拼接，得到所述若干柱状体素的拼接特征；对所述拼接特征进行特征映射，获取所述待处理点云的体素特征。In some embodiments of the present application, performing feature extraction and mapping on the columnar voxels to obtain the voxel features of the point cloud to be processed includes: for each columnar voxel, obtaining the center point of all points divided into the columnar voxel, and calculating the coordinate distance between each point divided into the columnar voxel and the center point; for each columnar voxel, concatenating the point features of all points divided into the columnar voxel into the voxel feature of the columnar voxel, wherein the point feature of each point includes the position coordinates and reflection intensity information of the point; concatenating the voxel features of the columnar voxels to obtain a concatenated feature of the columnar voxels; and performing feature mapping on the concatenated feature to obtain the voxel features of the point cloud to be processed.
例如，对于前述步骤中得到的各个柱状体素，每个柱状体素中会包含一定数量的点。以某一柱状体素中包括K个点为例，首先根据这K个点的原始点云数据中的位置坐标，计算这K个点的坐标平均值(x̄, ȳ, z̄)，作为这K个点的中心点坐标；然后，将这K个点的位置坐标分别减去前述坐标平均值，得到x_c = x - x̄、y_c = y - ȳ、z_c = z - z̄，并采用x_c、y_c、z_c表示所述柱状体素中的点与所述中心点之间的坐标距离；之后，将每个柱状体素中的点特征用数据x、y、z、i、x_c、y_c、z_c表示。这样，一个包含K个点的柱状体素的特征可以表示为长度为K×7的特征，即柱状体素的特征可以通过包括的所有点的点特征表示。For example, each columnar voxel obtained in the previous step contains a certain number of points. Taking a columnar voxel containing K points as an example, first compute the coordinate mean (x̄, ȳ, z̄) of these K points from their position coordinates in the original point cloud data, and take it as the center point coordinates of the K points; then subtract this mean from the position coordinates of each of the K points to obtain x_c = x - x̄, y_c = y - ȳ, z_c = z - z̄, where x_c, y_c, z_c represent the coordinate distance between a point in the columnar voxel and the center point; after that, the point feature of each point in a columnar voxel is represented by the data x, y, z, i, x_c, y_c, z_c. In this way, the feature of a columnar voxel containing K points can be represented as a feature of length K×7, that is, the feature of a columnar voxel can be represented by the point features of all the points it contains.
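The per-pillar feature construction described above (mean center point, offsets x_c, y_c, z_c, and the resulting K×7 feature) can be sketched as:

```python
import numpy as np

def pillar_point_features(pts):
    """Augment the K points of one pillar with their offsets from the
    pillar's mean coordinates: (x, y, z, i) -> (x, y, z, i, x_c, y_c, z_c).

    pts: (K, 4) array; returns a (K, 7) array, i.e. a feature of length K x 7.
    """
    center = pts[:, :3].mean(axis=0)   # coordinate mean of the K points
    offsets = pts[:, :3] - center      # x_c, y_c, z_c per point
    return np.hstack([pts, offsets])
```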
进一步的，对于包括N个柱状体素的待处理点云（如前述感兴趣区域的点云），其体素特征可以通过对该待处理点云进行体素化处理后得到的N个柱状体素的特征进行表示。例如，对于包括N个柱状体素的待处理点云，将对其进行体素化处理后得到的N个柱状体素的特征（如前述长度为K×7的特征）进行拼接，得到一个长度为N×K×7的拼接特征。Further, for a point cloud to be processed that includes N columnar voxels (such as the point cloud of the aforementioned region of interest), its voxel features can be represented by the features of the N columnar voxels obtained by voxelizing the point cloud to be processed. For example, for a point cloud to be processed that includes N columnar voxels, the features of the N columnar voxels obtained by voxelization (such as the aforementioned features of length K×7) are concatenated to obtain a concatenated feature of length N×K×7.
本申请的一些实施例中,如果某一柱状体素中没有点,可以将该柱状体素舍弃。In some embodiments of the present application, if there are no points in a certain columnar voxel, the columnar voxel can be discarded.
接下来，进一步对获取的拼接特征进行特征映射，得到所述待处理点云的预设维度的体素特征。例如，对于N个柱状体素的拼接特征，可以通过预先训练的特征提取网络对所述拼接特征进行特征映射，得到长度为N×D的特征，其中，D表示每个柱状体素的特征维度数。本申请的一些实施例中，所述特征提取网络可以由全连接层、归一化层和一维最大池化层MaxPool1D串行连接构建，最后，输出N×D维度的特征，其中D为全连接层输出的维度。Next, feature mapping is further performed on the obtained concatenated feature to obtain voxel features of a preset dimension for the point cloud to be processed. For example, for the concatenated feature of N columnar voxels, feature mapping can be performed on the concatenated feature through a pre-trained feature extraction network to obtain a feature of length N×D, where D represents the number of feature dimensions of each columnar voxel. In some embodiments of the present application, the feature extraction network can be constructed by serially connecting a fully connected layer, a normalization layer and a one-dimensional max-pooling layer MaxPool1D, and finally outputs features of dimension N×D, where D is the output dimension of the fully connected layer.
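A toy stand-in for the feature extraction network described above: a shared fully connected layer plus ReLU applied per point, followed by a max-pool over the K points (standing in for MaxPool1D), mapping N×K×7 to N×D. The random weight and bias are stand-ins for learned parameters, and the normalization layer is omitted for brevity:

```python
import numpy as np

def map_pillar_features(feats, weight, bias):
    """Map (N, K, 7) concatenated pillar features to (N, D).

    feats: (N, K, 7); weight: (7, D); bias: (D,). A shared linear layer
    with ReLU is applied per point, then the maximum over the K points
    of each pillar is taken (mimicking MaxPool1D).
    """
    hidden = np.maximum(feats @ weight + bias, 0.0)   # (N, K, D): fc + ReLU
    return hidden.max(axis=1)                          # (N, D): pool over K
```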
步骤130，将所述体素特征映射到鸟瞰图，得到所述待处理点云对应的鸟瞰图特征。Step 130: Map the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed.
本申请的一些实施例中，所述将所述体素特征映射到鸟瞰图，得到所述待处理点云对应的鸟瞰图特征，包括：根据所述第一点云分割标签中每个所述柱状体素的所述位置信息，获取每个所述柱状体素中包括的点的数量；对于每个所述柱状体素，根据所述柱状体素中包括的点的数量，将所述体素特征中与所述柱状体素对应的特征，映射到与所述第一点云分割标签匹配的鸟瞰图的相应位置上，得到所述待处理点云对应的鸟瞰图特征；其中，所述根据所述柱状体素中包括的点的数量，将所述体素特征中与所述柱状体素对应的特征，映射到与所述第一点云分割标签匹配的鸟瞰图的相应位置上，包括：在所述柱状体素中包括的点的数量大于0的情况下，将所述体素特征中与所述柱状体素对应的特征向量，映射到与所述第一点云分割标签匹配的鸟瞰图的相应位置上；在所述柱状体素中包括的点的数量等于0的情况下，将与所述第一点云分割标签匹配的鸟瞰图的相应位置上的特征向量设置为0。In some embodiments of the present application, mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed includes: obtaining, according to the position information of each columnar voxel in the first point cloud segmentation label, the number of points included in each columnar voxel; and, for each columnar voxel, mapping, according to the number of points included in the columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view matching the first point cloud segmentation label, to obtain the bird's-eye view features corresponding to the point cloud to be processed. Mapping, according to the number of points included in the columnar voxel, the feature corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view matching the first point cloud segmentation label includes: when the number of points included in the columnar voxel is greater than 0, mapping the feature vector corresponding to the columnar voxel in the voxel features to the corresponding position of the bird's-eye view matching the first point cloud segmentation label; and when the number of points included in the columnar voxel is equal to 0, setting the feature vector at the corresponding position of the bird's-eye view matching the first point cloud segmentation label to 0.
如前文所述，每个柱状体素与第一点云分割标签中的一个标签数据（即一组位置信息）对应，例如，长度为W×H的第一点云分割标签中，第一个标签数据对应坐标范围为(0,0)至(0.2,0.2)的柱状体素。在本步骤中，可以初始化一张与第一点云分割标签维度匹配的鸟瞰图，例如，初始化一张大小为W×H的鸟瞰图，使得鸟瞰图上具有W×H个像素，每个像素的特征通过D维特征向量表示，每个像素对应一个柱状体素。这样，对于前述步骤得到的长度为N×D的体素特征，可以将该体素特征中包括的N个柱状体素各自的特征向量，分别映射到所述鸟瞰图上的相应位置，得到大小为W×H×D的鸟瞰图特征。As mentioned above, each columnar voxel corresponds to one item of label data (i.e., one set of position information) in the first point cloud segmentation label. For example, in a first point cloud segmentation label of length W×H, the first item of label data corresponds to the columnar voxel whose coordinate range is (0, 0) to (0.2, 0.2). In this step, a bird's-eye view matching the dimension of the first point cloud segmentation label can be initialized. For example, a bird's-eye view of size W×H is initialized, so that the bird's-eye view has W×H pixels, the feature of each pixel is represented by a D-dimensional feature vector, and each pixel corresponds to one columnar voxel. In this way, for the voxel feature of length N×D obtained in the previous step, the feature vectors of the N columnar voxels included in the voxel feature can be mapped to the corresponding positions on the bird's-eye view, resulting in a bird's-eye view feature of size W×H×D.
本申请的一些实施例中，由于点云的稀疏性，某些柱状体素中可能没有点，这样，在进行特征映射时，所述鸟瞰图上与不包括点的所述柱状体素相对应的位置，可以将其特征向量设置为零向量。In some embodiments of the present application, due to the sparsity of the point cloud, some columnar voxels may contain no points. In this case, when performing feature mapping, the feature vector at the position on the bird's-eye view corresponding to a columnar voxel that contains no points can be set to a zero vector.
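The scattering of pillar features onto the bird's-eye-view canvas, with zero vectors kept at the cells of point-free pillars as described above, can be sketched as:

```python
import numpy as np

def scatter_to_bev(pillar_feats, coords, W, H):
    """Scatter N pillar feature vectors onto a W x H x D bird's-eye view.

    pillar_feats: (N, D) features; coords: (N, 2) integer (ix, iy) cells.
    Cells with no pillar (i.e. pillars containing no points) keep the
    initial zero vector.
    """
    D = pillar_feats.shape[1]
    bev = np.zeros((W, H, D), dtype=pillar_feats.dtype)   # zero-initialized canvas
    bev[coords[:, 0], coords[:, 1]] = pillar_feats        # place each pillar's feature
    return bev
```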
步骤140,通过预先训练的多任务神经网络的主干网络,对所述鸟瞰图特征进行特征提取,得到点云特征向量。Step 140: Feature extraction is performed on the bird's-eye view features through the backbone network of the pre-trained multi-task neural network to obtain a point cloud feature vector.
本申请的一些实施例中，如图3所示，所述多任务神经网络包括：主干网络310、点云检测网络分支320和点云分割网络分支330。In some embodiments of the present application, as shown in Figure 3, the multi-task neural network includes: a backbone network 310, a point cloud detection network branch 320 and a point cloud segmentation network branch 330.
其中，所述主干网络310可以采用现有技术中通用的卷积神经网络。例如，本申请的一些实施例中，如图3所示，所述主干网络310进一步包括：三个不同尺度的级联的特征提取模块和一个特征拼接层(ConCat)，其中，每个特征提取模块包括：不同数量的特征映射模块(CBR)，一个上采样层，以及一个特征映射模块(CBR)。每个特征提取模块包括的特征映射模块(CBR)数量可以分别是4、6、6，特征映射模块(CBR)可以由卷积层、批量归一化层和Relu激活函数级联构成。以输入特征的大小为W×H为例，这三个特征提取模块输出的特征的尺寸分别是
Figure PCTCN2022117322-appb-000004
所述特征拼接层用于将上述三个特征提取模块输出的特征进行拼接。这样，将大小为W×H×D的所述鸟瞰图特征输入至主干网络310之后，上述三个特征提取模块分别对输入的鸟瞰图特征进行卷积运算、上采样、归一化和激活处理，最后通过特征拼接层进行拼接后，得到的特征向量维度为
Figure PCTCN2022117322-appb-000005
其中C是特征通道数。
The backbone network 310 may adopt a convolutional neural network commonly used in the prior art. For example, in some embodiments of the present application, as shown in Figure 3, the backbone network 310 further includes three cascaded feature extraction modules of different scales and a feature concatenation layer (ConCat), where each feature extraction module includes a number of feature mapping modules (CBR), an upsampling layer, and a further feature mapping module (CBR). The numbers of feature mapping modules (CBR) in the three feature extraction modules may be 4, 6 and 6 respectively, and a feature mapping module (CBR) may be formed by cascading a convolution layer, a batch normalization layer and a ReLU activation function. Taking an input feature of size W×H as an example, the sizes of the features output by the three feature extraction modules are respectively
Figure PCTCN2022117322-appb-000004
The feature concatenation layer is used to concatenate the features output by the three feature extraction modules. In this way, after the bird's-eye view feature of size W×H×D is input into the backbone network 310, the three feature extraction modules respectively perform convolution, upsampling, normalization and activation processing on the input bird's-eye view feature, and after concatenation by the feature concatenation layer, the dimension of the resulting feature vector is
Figure PCTCN2022117322-appb-000005
where C is the number of feature channels.
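As a rough illustration of the three-scale backbone and the ConCat layer, the sketch below downsamples the BEV feature map by strides 2, 4 and 8, upsamples each branch back to a common W/2 × H/2 grid, and concatenates along the channel axis. The strides and the common output resolution are assumptions (the patent's exact branch sizes appear in formula images not reproduced here), and simple slicing/repetition stands in for the learned CBR convolutions and upsampling:

```python
import numpy as np

def backbone_sketch(bev):
    """Toy stand-in for the three-scale backbone + ConCat.

    bev: (W, H, D) array with W and H divisible by 8. Each branch is
    'downsampled' by strided slicing, then 'upsampled' by repetition
    back to W/2 x H/2 before channel-wise concatenation.
    """
    outs = []
    for s in (2, 4, 8):
        small = bev[::s, ::s, :]                  # downsample by stride s
        f = s // 2                                # factor back to W/2 x H/2
        up = small if f == 1 else small.repeat(f, axis=0).repeat(f, axis=1)
        outs.append(up)
    return np.concatenate(outs, axis=2)           # ConCat along channels
```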
步骤150，通过所述多任务神经网络的点云检测网络分支，基于所述点云特征向量进行目标物检测，输出点云检测结果；以及，通过所述多任务神经网络的点云分割网络分支，基于所述点云特征向量进行点云分割，输出点云分割结果。Step 150: Through the point cloud detection network branch of the multi-task neural network, perform target object detection based on the point cloud feature vector and output a point cloud detection result; and, through the point cloud segmentation network branch of the multi-task neural network, perform point cloud segmentation based on the point cloud feature vector and output a point cloud segmentation result.
接下来，主干网络310输出的点云特征向量，将分别输入至点云检测网络分支320和点云分割网络分支330，由这两个网络分支分别进行下一步处理。Next, the point cloud feature vector output by the backbone network 310 is input to the point cloud detection network branch 320 and the point cloud segmentation network branch 330 respectively, and these two network branches each perform the next step of processing.
下面分别结合点云检测网络分支320和点云分割网络分支330的网络结构，对点云检测任务和点云分割任务的执行方案进行举例说明。The following describes, by way of example, the execution schemes of the point cloud detection task and the point cloud segmentation task in conjunction with the network structures of the point cloud detection network branch 320 and the point cloud segmentation network branch 330 respectively.
In some embodiments of the present application, the point cloud detection network branch 320 includes four detection heads, which are respectively used to output the heat map detection result, the detected target object position, the target object size, and the target object rotation angle. In some embodiments of the present application, each detection head included in the point cloud detection network branch 320 is composed of a feature extraction module and a convolutional layer in cascade, where the feature extraction module is further composed of a convolutional layer, a batch normalization layer and an activation function. Each detection head performs feature encoding and transformation mapping on the input point cloud feature vector, and finally outputs the corresponding prediction result. For example, the detection head corresponding to the heat map operates on the point cloud feature vector of size
Figure PCTCN2022117322-appb-000006
and predicts, for each position, whether the corresponding position is a key point on the heat map; for another example, the detection head corresponding to the target object position operates on the point cloud feature vector of size
Figure PCTCN2022117322-appb-000007
and outputs the position (x, y, z) of the detected target object; for another example, the detection head corresponding to the target object size outputs the size (dx, dy, dz) of the target object; and the detection head corresponding to the target object rotation angle outputs the rotation angle θ of the target object.
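As a rough illustration of the four-head layout, the sketch below maps a shared feature map to the four prediction targets with hypothetical 1×1-convolution heads (the disclosed heads cascade a CBR feature extraction module with a convolution layer); all shapes and weights here are assumed.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(0)
C, H, W = 48, 32, 32
feat = rng.random((C, H, W))            # shared point cloud feature map

# One head per prediction target (stand-ins for the real CBR + conv heads):
heads = {
    'heatmap': conv1x1(feat, rng.random((1, C))),   # key-point score per position
    'position': conv1x1(feat, rng.random((3, C))),  # (x, y, z)
    'size': conv1x1(feat, rng.random((3, C))),      # (dx, dy, dz)
    'angle': conv1x1(feat, rng.random((1, C))),     # rotation angle theta
}
for name, out in heads.items():
    print(name, out.shape)
```

Each head reads the same backbone output, so only the small per-head layers differ between the four prediction targets.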
In some embodiments of the present application, as shown in Figure 3, the point cloud segmentation network branch 330 is composed of an upsampling module, a feature extraction module and a convolutional layer in cascade, where the feature extraction module is further composed of a convolutional layer, a batch normalization layer and an activation function. The upsampling layer first upsamples the point cloud feature vector output by the backbone network 310; then the convolutional layer, batch normalization layer and activation function sequentially perform feature transformation and mapping on the upsampled vector; finally, the segmentation results of the corresponding columnar voxels are output through the convolutional layer.
The size of the point cloud feature vector output by the backbone network is
Figure PCTCN2022117322-appb-000008
For example, the point cloud segmentation network branch 330 performs upsampling, convolution, batch normalization, activation mapping and other processing on the input point cloud feature vector, and finally outputs data of dimension (W, H, n_class). Here, W and H indicate that the output dimensions correspond to the width and height of the input feature map, and n_class represents the number of point cloud semantic categories. Taking W=512, H=512 columnar voxels obtained after voxelizing the point cloud to be processed, and 11 point cloud semantic categories as an example, the output data size of the point cloud segmentation network branch 330 is 512×512×11. That is, at each of the 512×512 positions there is a set of 11 segmentation result prediction values; each of these values lies between 0 and 1 and they sum to 1, representing the probability that the corresponding columnar voxel belongs to each point cloud semantic category. Further, the point cloud semantic category corresponding to the maximum probability value can be taken as the point cloud semantic category matched by the corresponding columnar voxel.
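The per-voxel class probabilities and the final category selection can be sketched as follows; the logits are random stand-ins for the segmentation branch's output.

```python
import numpy as np

rng = np.random.default_rng(1)
W, H, n_class = 512, 512, 11

logits = rng.random((W, H, n_class))
# Softmax over the class axis: 11 values per position, each in (0, 1), summing to 1.
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = e / e.sum(axis=-1, keepdims=True)

# The semantic category with the maximum probability is taken as the match
# for the corresponding columnar voxel.
voxel_labels = probs.argmax(axis=-1)      # (512, 512) category indices
print(voxel_labels.shape, float(probs[0, 0].sum()))
```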
In some embodiments of the present application, the point cloud semantic categories are determined according to the specific application scenario. For example, for point clouds collected in autonomous driving applications, the point cloud semantic categories may be defined to include, but not be limited to, any one or more of the following: buildings, green plants, ground, fences, curbs, lane lines, vehicles, etc.
In this way, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector, and the point cloud segmentation network branch outputs the point cloud segmentation results of the several columnar voxels matched by the point cloud feature vector (that is, all the columnar voxels obtained after voxelizing the point cloud to be processed).
As can be seen from the foregoing description, the segmentation result output by the point cloud segmentation network branch is obtained by semantic segmentation based on features projected onto the bird's-eye view, while subsequent point cloud data processing needs the segmentation results of the points in the point cloud. Therefore, the voxel-based segmentation results need to be converted into segmentation results for the points in the point cloud. In some embodiments of the present application, the point cloud segmentation result includes: the point cloud semantic category matched by each columnar voxel; and after the point cloud segmentation network branch of the multi-task neural network performs point cloud segmentation based on the point cloud feature vector and outputs the point cloud segmentation result, the method further includes: mapping, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed, to obtain the segmentation results of the points in the point cloud to be processed.
In some embodiments of the present application, the mapping, according to the position information of the columnar voxels, of the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed, to obtain the segmentation results of the points in the point cloud to be processed, includes: obtaining, according to the position information of the columnar voxels, the points in the point cloud to be processed contained in each columnar voxel; and, for each columnar voxel, taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in the columnar voxel. As can be seen from the foregoing description, each columnar voxel corresponds to a position in the bird's-eye view; the segmentation result of a columnar voxel obtained through this mapping relationship can be regarded as the point cloud semantic segmentation result of the corresponding columnar region in the point cloud. As shown in Figure 4, each box in the bird's-eye view corresponds to a columnar voxel, and the segmentation result at the image position matched by each box can be regarded as the segmentation result of the columnar voxel corresponding to that box. Further, each columnar voxel corresponds to a spatial region in the point cloud to be processed, and this spatial region may contain zero or more points, so the segmentation result of each columnar voxel (that is, its matched point cloud semantic category) can be taken as the point cloud semantic category of every point included in that columnar voxel, which completes the semantic segmentation of the points in the point cloud. For example, for a columnar voxel with a coordinate range of (0, 0) to (0.2, 0.2), if the segmentation result of the columnar voxel is "curb", it can be determined that in the point cloud to be processed, the points whose coordinates fall within (0, 0) to (0.2, 0.2) all match the point cloud semantic category "curb".
In order to help readers better understand the point cloud detection and segmentation method disclosed in the embodiments of this application, the training scheme of the multi-task neural network is illustrated below with an example.
As mentioned above, the pre-trained multi-task neural network includes: the backbone network 310, the point cloud detection network branch 320, and the point cloud segmentation network branch 330. In some embodiments of the present application, before the backbone network of the pre-trained multi-task neural network performs feature extraction on the bird's-eye view features to obtain the point cloud feature vector, the method further includes: training the multi-task neural network based on several voxelized point cloud training samples. The voxelized point cloud training samples are constructed from the columnar voxels obtained by performing columnar voxelization on several point clouds respectively. For each voxelized point cloud training sample, the sample data includes: several columnar voxels; the sample label includes: a second point cloud segmentation label matching the corresponding sample data. The second point cloud segmentation label is used to identify the ground-truth point cloud semantic category matched by each columnar voxel in the corresponding sample data; the ground-truth point cloud semantic category matched by a columnar voxel is the point cloud semantic category with the largest coverage among the point cloud semantic categories covered by the points divided into that columnar voxel.
When constructing the voxelized point cloud training samples, for the specific implementation of generating the sample data, refer to the corresponding implementations in the foregoing steps, such as obtaining the point cloud to be processed and voxelizing the point cloud to be processed to obtain several columnar voxels; details are not repeated here.
Further, for all the columnar voxels obtained after voxelizing each point cloud, each columnar voxel contains a certain number of points, and these points have been manually annotated with point cloud semantic categories. In some embodiments of this application, by counting the points in a columnar voxel, the point cloud semantic category matched by the largest number of points is taken as the point cloud semantic category of the columnar voxel. For example, suppose a certain columnar voxel includes 3 points, each annotated with a point cloud semantic category (such as small car, large car, bicycle, tricycle, pedestrian, traffic cone, green plant, ground, fence, curb, lane line, etc.); if the annotations are (building, building, green plant), then the most frequent category, building, is taken as the point cloud semantic category matched by this columnar voxel. The point cloud semantic categories matched by all columnar voxels obtained after voxelizing a certain point cloud are arranged according to voxel position, yielding the point cloud semantic category label (that is, the second point cloud segmentation label) matching the sample data generated from that point cloud.
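The majority-vote labelling rule described above can be sketched as:

```python
from collections import Counter

def voxel_label_by_majority(point_labels):
    """Ground-truth label for one columnar voxel: the semantic category
    covering the most points inside the voxel."""
    return Counter(point_labels).most_common(1)[0][0]

# Three points labelled (building, building, green_plant) -> "building" wins.
print(voxel_label_by_majority(['building', 'building', 'green_plant']))
```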
Taking sample data of W×H columnar voxels as an example, the sample label of the sample data can be expressed as a W×H label matrix, where each element in the label matrix is the identifier of the point cloud semantic category matched by the corresponding columnar voxel.
In some embodiments of the present application, the sample label further includes: a point cloud detection label, which is used to identify the ground-truth target object detection results in the corresponding sample data. For example, for each point cloud used to generate training samples, the key points of the target objects on the heat map, and the spatial position coordinates, three-dimensional sizes and rotation angles of the target objects in the point cloud are manually annotated, and the annotated information is used as the point cloud detection label of the training sample generated from that point cloud.
In some embodiments of the present application, training the multi-task neural network based on several voxelized point cloud training samples includes: for each voxelized point cloud training sample, performing the following point cloud detection and segmentation operations to obtain the point cloud detection result prediction value and point cloud segmentation result prediction value of the corresponding voxelized point cloud training sample: performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample; mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the voxelized point cloud training sample; performing feature extraction on the bird's-eye view features through the backbone network to obtain a point cloud feature vector; performing target object detection based on the point cloud feature vector through the point cloud detection network branch, and outputting the point cloud detection result prediction value of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting the point cloud segmentation result prediction value of the voxelized point cloud training sample. Then, the point cloud detection loss of the multi-task neural network is calculated according to the point cloud detection result prediction values of the voxelized point cloud training samples and the corresponding point cloud detection labels, and the point cloud segmentation loss of the multi-task neural network is calculated according to the point cloud segmentation result prediction values of the voxelized point cloud training samples and the corresponding second point cloud segmentation labels; after that, the multi-task neural network is iteratively trained with the goal of optimizing the point cloud detection loss and the point cloud segmentation loss.
For the specific implementation of performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain the voxel features of the voxelized point cloud training sample, refer to the foregoing specific implementation of extracting the voxel features of the point cloud to be processed; details are not repeated here.
For the specific implementation of mapping the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the voxelized point cloud training sample, refer to the relevant descriptions above; details are not repeated here.
For the specific implementation of performing feature extraction on the bird's-eye view features through the backbone network to obtain the point cloud feature vector, refer to the relevant descriptions above; details are not repeated here.
For the specific implementation of performing target object detection based on the point cloud feature vector through the point cloud detection network branch and outputting the point cloud detection result prediction value of the voxelized point cloud training sample, refer to the foregoing description of obtaining the detection result of the point cloud to be processed; details are not repeated here.
For the specific implementation of performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch and outputting the point cloud segmentation result prediction value of the voxelized point cloud training sample, refer to the foregoing description of obtaining the segmentation result of the point cloud to be processed; details are not repeated here.
In the training phase of the multi-task neural network, the point cloud detection loss of the multi-task neural network is calculated according to the point cloud detection result prediction values of the voxelized point cloud training samples and the corresponding point cloud detection labels. The point cloud detection loss includes four parts: heat map prediction loss, position prediction loss, size prediction loss and rotation angle prediction loss.
In some embodiments of the present application, the position prediction loss, the size prediction loss and the rotation angle prediction loss can be expressed as mean squared errors. For example, the position prediction loss of the multi-task neural network is expressed as the mean squared error between the predicted target object positions (such as spatial position coordinates) over all the voxelized point cloud training samples and the ground-truth target object positions in the sample labels; the size prediction loss is expressed as the mean squared error between the predicted target object sizes (such as three-dimensional sizes) and the ground-truth target object sizes in the sample labels; and the rotation angle prediction loss is expressed as the mean squared error between the predicted target object rotation angles and the ground-truth target object rotation angles in the sample labels.
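The mean-squared-error form of the position, size and angle losses, sketched with illustrative values:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error, as used for the position, size and
    rotation-angle prediction losses."""
    return float(np.mean((pred - target) ** 2))

pred_pos = np.array([[1.0, 2.0, 0.5]])   # predicted (x, y, z)
true_pos = np.array([[1.5, 2.0, 0.0]])   # ground-truth (x, y, z)
loss = mse(pred_pos, true_pos)           # (0.25 + 0 + 0.25) / 3
print(loss)
```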
In some embodiments of the present application, the heat map prediction loss is calculated using a pixel-wise focal loss function.
Assume that the position of a target object is p. After downsampling, the key point (p_x, p_y) on the heat map is obtained, and the computed values are distributed onto the heat map through a Gaussian kernel. If the Gaussian kernels of multiple target objects overlap, the maximum value is taken. The Gaussian kernel can be expressed as:
Y_xyc = exp( -((x - p_x)^2 + (y - p_y)^2) / (2σ_p^2) )
where x and y are the enumerated stride block positions in the image to be detected, σ_p is the target scale-adaptive variance, and Y_xyc is the Gaussian heat map representation of each key point after Gaussian kernel mapping.
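A sketch of rendering ground-truth key points with the Gaussian kernel above; the map size, the key-point coordinates, and the fixed sigma (standing in for the target scale-adaptive variance) are assumed values.

```python
import numpy as np

def gaussian_heatmap(shape, centers, sigma=2.0):
    """Render key points onto a heat map:
    Y[y, x] = exp(-((x - px)^2 + (y - py)^2) / (2 * sigma^2)).
    Where kernels of multiple targets overlap, the maximum value is taken."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape)
    for px, py in centers:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)   # max, not sum, where kernels overlap
    return heat

heat = gaussian_heatmap((64, 64), [(10, 20), (12, 20)])
print(float(heat[20, 10]))   # value at a key point
```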
Then, the pixel-wise focal loss function is used to calculate the heat map loss, with the following formula:
L = -(1/M) · Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                    if y_xyc = 1
                     (1 - y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),    otherwise }
where M represents the total number of target objects; Ŷ_xyc represents the possibility, predicted by the network, that a target object is present, with a value range of (0, 1); y_xyc represents the ground truth of whether a target object is present, with a value range of (0, 1); and α and β are hyperparameters whose values are set empirically, for example, α = 2 and β = 4.
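The pixel-wise focal loss can be sketched as follows, consistent with the formula and the α=2, β=4 defaults above; the eps guard on the logarithms and the toy predictions are assumptions.

```python
import numpy as np

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-12):
    """Pixel-wise focal loss over the heat map: the positive branch applies
    at key points (gt == 1), the negative branch elsewhere; the sum is
    normalized by M, the total number of target objects."""
    pos = gt == 1.0
    pos_term = ((1.0 - pred) ** alpha) * np.log(pred + eps) * pos
    neg_term = ((1.0 - gt) ** beta) * (pred ** alpha) * np.log(1.0 - pred + eps) * ~pos
    num_targets = max(int(pos.sum()), 1)   # M
    return float(-(pos_term.sum() + neg_term.sum()) / num_targets)

gt = np.zeros((8, 8)); gt[4, 4] = 1.0              # one target key point
good = np.full((8, 8), 0.01); good[4, 4] = 0.9     # confident, mostly correct
loss_good = heatmap_focal_loss(good, gt)
loss_bad = heatmap_focal_loss(np.full((8, 8), 0.5), gt)
print(loss_good < loss_bad)
```

The better prediction yields a much smaller loss, since the (1 - Ŷ)^α factor down-weights already-confident correct pixels.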
In the training phase of the multi-task neural network, the point cloud segmentation loss of the multi-task neural network is calculated according to the point cloud segmentation result prediction values of the voxelized point cloud training samples and the corresponding second point cloud segmentation labels. For example, the point cloud segmentation loss can be expressed as the cross entropy between the point cloud segmentation result prediction values and the corresponding second point cloud segmentation labels.
Further, the point cloud detection loss and the point cloud segmentation loss are fused to calculate the loss of the multi-task neural network, and with the goal of minimizing the loss of the entire network, the network parameters of the backbone network, the point cloud detection network branch and the point cloud segmentation network branch are optimized, thereby completing the training of the multi-task neural network.
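A sketch of fusing the detection and segmentation losses into one training objective; the cross-entropy form follows the description above, while `seg_weight` and the sample values are assumptions (the disclosure does not specify how the two losses are weighted).

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Segmentation loss: cross entropy between predicted per-voxel class
    probabilities and the second point cloud segmentation label."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def total_loss(det_losses, seg_loss, seg_weight=1.0):
    """Fuse the four detection losses (heat map, position, size, angle)
    with the segmentation loss; `seg_weight` is an assumed balance factor."""
    return sum(det_losses) + seg_weight * seg_loss

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])      # two voxels, three classes
seg = cross_entropy(probs, np.array([0, 1]))
total = total_loss([0.2, 0.1, 0.05, 0.05], seg)
print(round(total, 4))
```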
By jointly training the network branches corresponding to the point cloud detection task and the point cloud segmentation task, not only can the representation ability of the features extracted by the backbone network be improved, but the two tasks can also promote each other during training, thereby improving the accuracy of point cloud detection and segmentation.
The point cloud detection and segmentation method disclosed in the embodiments of this application performs columnar voxelization on a point cloud to be processed to obtain several columnar voxels constituting the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed; finally, through the backbone network of a pre-trained multi-task neural network, feature extraction is performed on the bird's-eye view features to obtain a point cloud feature vector; through the point cloud detection network branch of the multi-task neural network, target object detection is performed based on the point cloud feature vector and a point cloud detection result is output; and, through the point cloud segmentation network branch of the multi-task neural network, point cloud segmentation is performed based on the point cloud feature vector and a point cloud segmentation result is output, which helps to improve the efficiency of point cloud detection and point cloud segmentation.
By performing feature extraction and mapping through the backbone network of a single multi-task neural network and then feeding the resulting point cloud feature vector to the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task respectively, the point cloud detection task and the point cloud segmentation task share the point cloud feature extraction network. Compared with using two neural networks to perform point cloud detection and point cloud segmentation independently, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
A point cloud detection task in the prior art usually includes: point cloud preprocessing, feature extraction, and detection head prediction steps; a point cloud segmentation task in the prior art usually includes: point cloud preprocessing, feature extraction, and point cloud segmentation steps. The time and resource consumption (CPU, GPU, etc.) of point cloud preprocessing and feature extraction accounts for about 90% of the consumption of the entire task. Taking time as an example, suppose the detection task takes 20 ms and the segmentation task takes 20 ms, of which point cloud preprocessing and feature extraction occupy 18 ms. If the two tasks use independent network models, 40 ms is consumed in total; whereas with the single-network point cloud detection and segmentation method disclosed in the embodiments of this application, the total time consumed is 18 + 2 + 2 = 22 ms, which greatly improves the efficiency of point cloud detection and segmentation and saves the resources consumed by point cloud preprocessing and feature extraction.
On the other hand, by voxelizing the point cloud to be processed and performing feature extraction and mapping based on columnar voxels to realize point cloud detection and segmentation, the difficulty of feature extraction can be reduced compared with extracting features directly from the point cloud, thereby reducing the complexity of the network model.
Further, the point cloud and its point cloud segmentation labels are converted to the bird's-eye view, and feature extraction, detection and segmentation are performed under the bird's-eye view, which is fast and effective. Finally, by converting the point cloud semantic segmentation results output by the model to each point in the point cloud, the point-based semantic segmentation task is completed, effectively improving the speed of point cloud segmentation.
Embodiment 2
A point cloud detection and segmentation apparatus disclosed in an embodiment of the present application, as shown in Figure 5, includes:
a columnar voxelization module 510, configured to perform columnar voxelization on a point cloud to be processed to obtain several columnar voxels constituting the point cloud to be processed;
a voxel feature acquisition module 520, configured to perform feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed;
a bird's-eye view feature mapping module 530, configured to map the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed;
a point cloud feature extraction module 540, configured to perform feature extraction on the bird's-eye view features through a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
a point cloud detection and segmentation module 550, configured to perform target object detection based on the point cloud feature vector through a point cloud detection network branch of the multi-task neural network and output a point cloud detection result; and to perform point cloud segmentation based on the point cloud feature vector through a point cloud segmentation network branch of the multi-task neural network and output a point cloud segmentation result.
In some embodiments of the present application, as shown in Figure 6, the point cloud segmentation result includes: the point cloud semantic category matched by each columnar voxel, and the apparatus further includes:
a first point cloud segmentation label acquisition module 511, configured to obtain first point cloud segmentation labels of the several columnar voxels, where the first point cloud segmentation labels include: position information of each columnar voxel;
a segmentation result conversion module 560, configured to map, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed, to obtain segmentation results of the points in the point cloud to be processed.
In some embodiments of the present application, mapping the point cloud semantic category matched by each columnar voxel into the point cloud to be processed according to the position information of the columnar voxel, to obtain the segmentation results for the points in the point cloud to be processed, includes:
acquiring, according to the position information of the columnar voxels, the points of the point cloud to be processed that are contained in each of the columnar voxels;
for each of the columnar voxels, taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in that columnar voxel.
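The above conversion from pillar-level categories to point-level results can be sketched as follows; the dictionary-based pillar-to-point index structure is an assumed illustration, not a format prescribed by the present application:

```python
import numpy as np

def pillar_labels_to_points(num_points, pillar_points, pillar_category):
    """Assign each point the semantic category predicted for its pillar.

    pillar_points: dict pillar_key -> indices of the points it contains.
    pillar_category: dict pillar_key -> predicted semantic class id.
    """
    point_labels = np.full(num_points, -1, dtype=int)  # -1 marks unassigned points
    for key, idx in pillar_points.items():
        point_labels[idx] = pillar_category[key]       # every point inherits its pillar's class
    return point_labels

labels = pillar_labels_to_points(
    4,
    {(0, 0): np.array([0, 2]), (1, 0): np.array([1, 3])},
    {(0, 0): 5, (1, 0): 7},
)
# labels -> [5, 7, 5, 7]
```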
In some embodiments of the present application, the voxel feature acquisition module 520 is further used to:
for each of the columnar voxels, obtain the center point of all the points divided into the columnar voxel, and compute the coordinate distance between each point divided into the columnar voxel and the center point;
for each of the columnar voxels, concatenate the point features of all the points divided into the columnar voxel into the voxel feature of that columnar voxel, where the point feature of each point includes the position coordinates and the reflection intensity information of the point;
concatenate the voxel features of the columnar voxels to obtain the concatenated features of the several columnar voxels;
perform feature mapping on the concatenated features to obtain the voxel features of the point cloud to be processed.
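A rough sketch of the per-pillar feature construction described above is given below; the exact feature layout (raw point features augmented with offsets from the center point, then concatenated) is an assumption made for illustration:

```python
import numpy as np

def pillar_feature(points_in_pillar):
    """Build one pillar's voxel feature from the points divided into it.

    points_in_pillar: (M, 4) array of [x, y, z, reflectance].
    Each point is augmented with its coordinate distance from the
    center point of the pillar, then all point features are concatenated.
    """
    center = points_in_pillar[:, :3].mean(axis=0)       # center point of the pillar
    offsets = points_in_pillar[:, :3] - center          # per-point coordinate distance
    augmented = np.hstack([points_in_pillar, offsets])  # (M, 7) per-point features
    return augmented.reshape(-1)                        # concatenated pillar feature

feat = pillar_feature(np.array([[0.0, 0.0, 0.0, 0.5],
                                [2.0, 2.0, 2.0, 0.7]]))
# The center is (1, 1, 1); each point contributes 7 values, so feat has 14 entries.
```

In a full implementation, the concatenated features would then pass through a learned feature-mapping layer; that step is omitted here.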
In some embodiments of the present application, the bird's-eye view feature mapping module 530 is further used to:
acquire the number of points included in each of the columnar voxels according to the position information of each columnar voxel in the first point cloud segmentation labels;
for each of the columnar voxels, map the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels, obtaining the bird's-eye view features corresponding to the point cloud to be processed; where
mapping the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels includes:
when the number of points included in the columnar voxel is greater than 0, mapping the feature vector corresponding to the columnar voxel among the voxel features to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels;
when the number of points included in the columnar voxel is equal to 0, setting the feature vector at the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels to 0.
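The scatter-to-grid behavior just described, including zero-filling for empty pillars, can be sketched as follows; the array shapes and (row, col) coordinate convention are illustrative assumptions:

```python
import numpy as np

def scatter_to_bev(pillar_feats, pillar_counts, coords, H, W, C):
    """Scatter per-pillar feature vectors onto an H x W bird's-eye-view grid.

    pillar_feats: (P, C) mapped feature vectors, one per pillar.
    pillar_counts: (P,) number of points included in each pillar.
    coords: (P, 2) integer (row, col) grid position of each pillar.
    Cells whose pillar contains no points keep the zero vector.
    """
    bev = np.zeros((C, H, W), dtype=pillar_feats.dtype)
    for f, n, (r, c) in zip(pillar_feats, pillar_counts, coords):
        if n > 0:             # only non-empty pillars are written
            bev[:, r, c] = f  # empty positions stay at 0
    return bev

bev = scatter_to_bev(np.ones((2, 3)), np.array([4, 0]),
                     np.array([[0, 0], [1, 1]]), H=2, W=2, C=3)
# Only cell (0, 0) is filled; the empty pillar at (1, 1) remains all zeros.
```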
In some embodiments of the present application, the pre-trained multi-task neural network includes a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch, and the apparatus further includes:
a multi-task neural network training module (not shown in the figure), used to train the multi-task neural network based on a number of voxelized point cloud training samples;
where the voxelized point cloud training samples are constructed from the columnar voxels obtained by separately performing columnar voxelization on a number of point clouds; for each voxelized point cloud training sample, the sample data includes a number of columnar voxels, and the sample label includes a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used to identify the ground-truth point cloud semantic category matched by each columnar voxel in the corresponding sample data; the ground-truth point cloud semantic category matched by a columnar voxel is the point cloud semantic category with the largest coverage among the point cloud semantic categories covered by the points of the point cloud that are divided into that columnar voxel.
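The largest-coverage rule for a pillar's ground-truth semantic category can be sketched in a few lines; the integer class-id encoding is an assumption made for illustration:

```python
import numpy as np

def pillar_category_label(point_classes, num_classes):
    """Ground-truth semantic category of a pillar: the category covering
    the largest share of the points divided into the pillar."""
    counts = np.bincount(point_classes, minlength=num_classes)  # votes per category
    return int(counts.argmax())                                 # largest coverage wins

label = pillar_category_label(np.array([2, 2, 2, 7]), num_classes=10)
# Category 2 covers 3 of the pillar's 4 points, so the pillar's label is 2.
```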
In some embodiments of the present application, the sample label further includes a point cloud detection label used to identify the ground-truth target detection result in the corresponding sample data, and training the multi-task neural network based on the several voxelized point cloud training samples includes:
for each voxelized point cloud training sample, performing the following point cloud detection and segmentation operations to obtain a predicted point cloud detection result and a predicted point cloud segmentation result of the corresponding voxelized point cloud training sample:
performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain voxel features of the voxelized point cloud training sample;
mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the voxelized point cloud training sample;
performing feature extraction on the bird's-eye view features through the backbone network to obtain a point cloud feature vector;
performing target detection based on the point cloud feature vector through the point cloud detection network branch and outputting the predicted point cloud detection result of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch and outputting the predicted point cloud segmentation result of the voxelized point cloud training sample;
computing the point cloud detection loss of the multi-task neural network according to the predicted point cloud detection result of each voxelized point cloud training sample and the corresponding point cloud detection label; computing the point cloud segmentation loss of the multi-task neural network according to the predicted point cloud segmentation result of each voxelized point cloud training sample and the corresponding second point cloud segmentation label; and then iteratively training the multi-task neural network with the objective of optimizing the point cloud detection loss and the point cloud segmentation loss.
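A toy sketch of combining the two losses into one training objective is shown below; the squared-error detection loss and the softmax cross-entropy segmentation loss are stand-ins for whatever losses an actual implementation would use, and no real network is involved:

```python
import numpy as np

def joint_loss(det_pred, det_target, seg_logits, seg_target):
    """Combined objective: point cloud detection loss plus segmentation loss.

    det_pred / det_target: regression outputs and labels for detection.
    seg_logits: (P, K) per-pillar class scores; seg_target: (P,) class ids.
    """
    det_loss = np.mean((det_pred - det_target) ** 2)  # proxy detection loss
    # numerically stable softmax cross-entropy over pillar class logits
    z = seg_logits - seg_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    seg_loss = -np.mean(log_probs[np.arange(len(seg_target)), seg_target])
    return det_loss + seg_loss  # both branches are optimized jointly

loss = joint_loss(np.zeros(3), np.zeros(3),
                  np.array([[10.0, 0.0], [0.0, 10.0]]), np.array([0, 1]))
# Detection error is 0 and the segmentation is nearly certain, so loss is near 0.
```

In training, the gradient of this summed objective updates the shared backbone from both branches at once, which is what allows the two tasks to share feature extraction.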
The point cloud detection and segmentation apparatus disclosed in the embodiments of the present application is used to implement the point cloud detection and segmentation method described in Embodiment 1 of the present application. The specific implementation of each module of the apparatus is not repeated here; refer to the specific implementation of the corresponding steps in the method embodiment.
The point cloud detection and segmentation apparatus disclosed in the embodiments of the present application performs columnar voxelization on a point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed; then performs feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed, and maps the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed; finally, performs feature extraction on the bird's-eye view features through the backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector, performs target detection based on the point cloud feature vector through the point cloud detection network branch of the multi-task neural network and outputs a point cloud detection result, and performs point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and outputs a point cloud segmentation result, which helps improve the efficiency of point cloud detection and point cloud segmentation.
Feature extraction and mapping are performed through the backbone network of a single multi-task neural network, after which the resulting point cloud feature vector is fed separately into the network branch corresponding to the point cloud detection task and the network branch corresponding to the point cloud segmentation task, which perform point cloud detection and point cloud segmentation respectively. The detection task and the segmentation task thus share the point cloud feature extraction network; compared with using two independent neural networks for point cloud detection and point cloud segmentation, this saves the computation consumed by point cloud feature extraction and effectively improves the efficiency of point cloud detection and point cloud segmentation.
On the other hand, by voxelizing the point cloud to be processed and performing feature extraction and mapping based on the columnar voxels to realize point cloud detection and segmentation, the difficulty of feature extraction can be reduced compared with extracting features directly from the point cloud, thereby reducing the complexity of the network model.
Further, the point cloud and its point cloud segmentation labels are converted into the bird's-eye view, and feature extraction, detection, and segmentation are performed in the bird's-eye view, which is both fast and effective. Finally, by converting the point cloud semantic segmentation result output by the model onto each point in the point cloud, the point-wise semantic segmentation task on the point cloud is completed, effectively improving the speed of point cloud segmentation.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference may be made to one another. Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The point cloud detection and segmentation method and apparatus provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the electronic device according to the embodiments of the present application. The present application may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Figure 7 shows an electronic device that can implement the method according to the present application. The electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, and so on. The electronic device conventionally includes a processor 710, a memory 720, and program code 730 stored on the memory 720 and executable on the processor 710; when the processor 710 executes the program code 730, the method described in the above embodiments is implemented. The memory 720 may be a computer program product or a computer-readable medium. The memory 720 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, a hard disk, or ROM. The memory 720 has a storage space 7201 for the program code 730 of a computer program for executing any of the method steps described above. For example, the storage space 7201 for the program code 730 may include individual computer programs respectively used to implement the various steps of the above method. The program code 730 is computer-readable code. These computer programs may be read from or written into one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. The computer program includes computer-readable code which, when run on an electronic device, causes the electronic device to perform the method according to the above embodiments.
An embodiment of the present application further discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the point cloud detection and segmentation method described in Embodiment 1 of the present application are implemented.
Such a computer program product may be a computer-readable storage medium, which may have storage segments, storage spaces, and the like arranged similarly to the memory 720 in the electronic device shown in Figure 7. The program code may, for example, be compressed and stored in the computer-readable storage medium in a suitable form. The computer-readable storage medium is typically a portable or fixed storage unit as described with reference to Figure 8. Generally, the storage unit includes computer-readable code 730', i.e., code that can be read by a processor; when executed by the processor, the code implements the steps of the method described above.
Reference herein to "one embodiment", "an embodiment", or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. In addition, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
Numerous specific details are set forth in the description provided herein. However, it is understood that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present application may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

  1. A point cloud detection and segmentation method, comprising:
    performing columnar voxelization on a point cloud to be processed to obtain a number of columnar voxels that constitute the point cloud to be processed;
    performing feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed;
    mapping the voxel features to a bird's-eye view to obtain bird's-eye view features corresponding to the point cloud to be processed;
    performing feature extraction on the bird's-eye view features through a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector;
    performing target detection based on the point cloud feature vector through a point cloud detection network branch of the multi-task neural network and outputting a point cloud detection result; and performing point cloud segmentation based on the point cloud feature vector through a point cloud segmentation network branch of the multi-task neural network and outputting a point cloud segmentation result.
  2. The method according to claim 1, wherein after performing columnar voxelization on the point cloud to be processed to obtain the several columnar voxels that constitute the point cloud to be processed, the method further comprises:
    acquiring first point cloud segmentation labels of the several columnar voxels, wherein the first point cloud segmentation labels include the position information of each of the columnar voxels;
    the point cloud segmentation result includes the point cloud semantic category matched by each of the columnar voxels, and after performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch of the multi-task neural network and outputting the point cloud segmentation result, the method further comprises:
    mapping, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed to obtain segmentation results for the points in the point cloud to be processed.
  3. The method according to claim 2, wherein mapping, according to the position information of the columnar voxels, the point cloud semantic categories matched by the columnar voxels into the point cloud to be processed to obtain the segmentation results for the points in the point cloud to be processed comprises:
    acquiring, according to the position information of the columnar voxels, the points of the point cloud to be processed that are contained in each of the columnar voxels;
    for each of the columnar voxels, taking the point cloud semantic category matched by the columnar voxel as the point cloud semantic category matched by the points contained in that columnar voxel.
  4. The method according to claim 1, wherein performing feature extraction and mapping on the several columnar voxels to obtain the voxel features of the point cloud to be processed comprises:
    for each of the columnar voxels, obtaining the center point of all the points divided into the columnar voxel, and computing the coordinate distance between each point divided into the columnar voxel and the center point;
    for each of the columnar voxels, concatenating the point features of all the points divided into the columnar voxel into the voxel feature of that columnar voxel, wherein the point feature of each point includes the position coordinates and the reflection intensity information of the point;
    concatenating the voxel features of the columnar voxels to obtain the concatenated features of the several columnar voxels;
    performing feature mapping on the concatenated features to obtain the voxel features of the point cloud to be processed.
  5. The method according to claim 2, wherein mapping the voxel features to the bird's-eye view to obtain the bird's-eye view features corresponding to the point cloud to be processed comprises:
    acquiring the number of points included in each of the columnar voxels according to the position information of each columnar voxel in the first point cloud segmentation labels;
    for each of the columnar voxels, mapping the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels, obtaining the bird's-eye view features corresponding to the point cloud to be processed; wherein
    mapping the feature corresponding to the columnar voxel among the voxel features, according to the number of points included in the columnar voxel, to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels comprises:
    when the number of points included in the columnar voxel is greater than 0, mapping the feature vector corresponding to the columnar voxel among the voxel features to the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels;
    when the number of points included in the columnar voxel is equal to 0, setting the feature vector at the corresponding position of the bird's-eye view matched with the first point cloud segmentation labels to 0.
  6. The method according to any one of claims 1 to 5, wherein the pre-trained multi-task neural network comprises a backbone network, a point cloud detection network branch, and a point cloud segmentation network branch, and before performing feature extraction on the bird's-eye view features through the backbone network of the pre-trained multi-task neural network to obtain the point cloud feature vector, the method further comprises:
    training the multi-task neural network based on a number of voxelized point cloud training samples;
    wherein the voxelized point cloud training samples are constructed from the columnar voxels obtained by separately performing columnar voxelization on a number of point clouds; for each voxelized point cloud training sample, the sample data includes a number of columnar voxels, and the sample label includes a second point cloud segmentation label matched with the corresponding sample data; the second point cloud segmentation label is used to identify the ground-truth point cloud semantic category matched by each columnar voxel in the corresponding sample data; the ground-truth point cloud semantic category matched by a columnar voxel is the point cloud semantic category with the largest coverage among the point cloud semantic categories covered by the points of the point cloud that are divided into that columnar voxel.
  7. The method according to claim 6, wherein the sample labels further include a point cloud detection label, the point cloud detection label being used to identify the ground-truth target detection result in the corresponding sample data, and training the multi-task neural network based on the several voxelized point cloud training samples comprises:
    for each voxelized point cloud training sample, performing the following point cloud detection and segmentation operations to obtain a predicted point cloud detection result and a predicted point cloud segmentation result of the corresponding voxelized point cloud training sample:
    performing feature extraction and mapping on the several columnar voxels included in the voxelized point cloud training sample to obtain voxel features of the voxelized point cloud training sample;
    mapping the voxel features to a bird's-eye view to obtain bird's-eye-view features corresponding to the voxelized point cloud training sample;
    performing feature extraction on the bird's-eye-view features through the backbone network to obtain a point cloud feature vector;
    performing target detection based on the point cloud feature vector through the point cloud detection network branch, and outputting the predicted point cloud detection result of the voxelized point cloud training sample; and performing point cloud segmentation based on the point cloud feature vector through the point cloud segmentation network branch, and outputting the predicted point cloud segmentation result of the voxelized point cloud training sample;
    calculating a point cloud detection loss of the multi-task neural network according to the predicted point cloud detection result of each voxelized point cloud training sample and the corresponding point cloud detection label, and calculating a point cloud segmentation loss of the multi-task neural network according to the predicted point cloud segmentation result of each voxelized point cloud training sample and the corresponding second point cloud segmentation label; and then iteratively training the multi-task neural network with the goal of optimizing the point cloud detection loss and the point cloud segmentation loss.
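The training objective of claim 7 combines a detection loss and a segmentation loss into one joint objective. A minimal NumPy sketch of that combination is below; the Smooth-L1 form for box regression and per-point cross-entropy for segmentation are illustrative assumptions, as the claim does not prescribe particular loss functions or weights:

```python
import numpy as np

def detection_loss(pred_boxes, gt_boxes):
    """Smooth-L1 (Huber) regression loss on predicted box parameters,
    a common choice for a detection head (assumed here, not specified
    by the claim)."""
    d = np.abs(pred_boxes - gt_boxes)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

def segmentation_loss(pred_probs, gt_labels):
    """Per-point cross-entropy: pred_probs is (N_points, N_classes) of
    class probabilities, gt_labels the ground-truth class per point
    (the "second point cloud segmentation label")."""
    eps = 1e-9
    return -np.log(pred_probs[np.arange(len(gt_labels)), gt_labels] + eps).mean()

def multitask_loss(pred_boxes, gt_boxes, pred_probs, gt_labels,
                   w_det=1.0, w_seg=1.0):
    """Joint objective iteratively minimized during training: a weighted
    sum of the detection and segmentation losses."""
    return (w_det * detection_loss(pred_boxes, gt_boxes)
            + w_seg * segmentation_loss(pred_probs, gt_labels))

# Toy example: one predicted box and one labeled point.
total = multitask_loss(np.array([[0.5, 0.5]]), np.array([[0.0, 1.0]]),
                       np.array([[0.8, 0.2]]), np.array([0]))
print(round(total, 4))
```

Because both task losses share the backbone's point cloud feature vector, one backward pass through the combined loss updates the backbone and both branches together, which is the usual motivation for this multi-task arrangement.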
  8. A point cloud detection and segmentation apparatus, comprising:
    a columnar voxelization module, configured to perform columnar voxelization on a point cloud to be processed to obtain several columnar voxels constituting the point cloud to be processed;
    a voxel feature acquisition module, configured to perform feature extraction and mapping on the several columnar voxels to obtain voxel features of the point cloud to be processed;
    a bird's-eye-view feature mapping module, configured to map the voxel features to a bird's-eye view to obtain bird's-eye-view features corresponding to the point cloud to be processed;
    a point cloud feature extraction module, configured to perform feature extraction on the bird's-eye-view features through a backbone network of a pre-trained multi-task neural network to obtain a point cloud feature vector; and
    a point cloud detection and segmentation module, configured to perform target detection based on the point cloud feature vector through a point cloud detection network branch of the multi-task neural network and output a point cloud detection result, and to perform point cloud segmentation based on the point cloud feature vector through a point cloud segmentation network branch of the multi-task neural network and output a point cloud segmentation result.
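The front end of the claimed apparatus (columnar voxelization, per-voxel feature extraction, and the mapping to a bird's-eye view) can be sketched as below. This is a simplified illustration only: the grid size, cell size, and the channel-wise max used as the per-pillar feature are assumptions standing in for the learned feature extraction of the claim:

```python
import numpy as np

def columnar_voxelize(points, grid=(4, 4), cell=1.0):
    """Group points (N, 3) into columnar voxels ("pillars") over the x-y
    plane. Each pillar spans the full z extent, so a single (ix, iy) cell
    index identifies it. Returns {cell index: points in that pillar}."""
    pillars = {}
    for p in points:
        ix, iy = int(p[0] // cell), int(p[1] // cell)
        if 0 <= ix < grid[0] and 0 <= iy < grid[1]:
            pillars.setdefault((ix, iy), []).append(p)
    return {k: np.array(v) for k, v in pillars.items()}

def pillar_features(pillars):
    """Toy per-pillar feature: channel-wise max over the pillar's points
    (a stand-in for the claim's learned feature extraction and mapping)."""
    return {k: pts.max(axis=0) for k, pts in pillars.items()}

def scatter_to_bev(features, grid=(4, 4), dim=3):
    """Map each pillar's feature back onto a dense bird's-eye-view grid
    (H, W, C); empty cells stay zero."""
    bev = np.zeros((grid[0], grid[1], dim))
    for (ix, iy), f in features.items():
        bev[ix, iy] = f
    return bev

points = np.array([[0.2, 0.3, 1.0], [0.4, 0.1, 2.0], [2.5, 3.5, 0.5]])
bev = scatter_to_bev(pillar_features(columnar_voxelize(points)))
print(bev.shape)   # (4, 4, 3)
print(bev[0, 0])   # [0.4 0.3 2. ] — max over the two points in pillar (0, 0)
```

The resulting dense bird's-eye-view tensor is what a 2D convolutional backbone can then consume to produce the point cloud feature vector shared by the detection and segmentation branches.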
  9. An electronic device, comprising a memory, a processor, and program code stored in the memory and executable on the processor, wherein the processor, when executing the program code, implements the point cloud detection and segmentation method according to any one of claims 1 to 7.
  10. A computer-readable storage medium having program code stored thereon, wherein the program code, when executed by a processor, implements the steps of the point cloud detection and segmentation method according to any one of claims 1 to 7.
  11. A computer program product comprising computer-readable code which, when run on an electronic device, causes the electronic device to perform the point cloud detection and segmentation method according to any one of claims 1 to 7.
PCT/CN2022/117322 2022-04-06 2022-09-06 Point cloud detection and segmentation method and apparatus, and electronic device WO2023193400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210353486.1 2022-04-06
CN202210353486.1A CN114820463A (en) 2022-04-06 2022-04-06 Point cloud detection and segmentation method and device, and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023193400A1 true WO2023193400A1 (en) 2023-10-12

Family

ID=82533341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/117322 WO2023193400A1 (en) 2022-04-06 2022-09-06 Point cloud detection and segmentation method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114820463A (en)
WO (1) WO2023193400A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820463A (en) * 2022-04-06 2022-07-29 合众新能源汽车有限公司 Point cloud detection and segmentation method and device, and electronic equipment
CN115358413A (en) * 2022-09-14 2022-11-18 清华大学 Point cloud multitask model training method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476822A (en) * 2020-04-08 2020-07-31 浙江大学 Laser radar target detection and motion tracking method based on scene flow
CN111862101A (en) * 2020-07-15 2020-10-30 西安交通大学 3D point cloud semantic segmentation method under aerial view coding visual angle
CN114140470A (en) * 2021-12-07 2022-03-04 群周科技(上海)有限公司 Ground object semantic segmentation method based on helicopter airborne laser radar
CN114820463A (en) * 2022-04-06 2022-07-29 合众新能源汽车有限公司 Point cloud detection and segmentation method and device, and electronic equipment

Also Published As

Publication number Publication date
CN114820463A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2023193400A1 (en) Point cloud detection and segmentation method and apparatus, and electronic device
US11037305B2 (en) Method and apparatus for processing point cloud data
US9424493B2 (en) Generic object detection in images
US9147255B1 (en) Rapid object detection by combining structural information from image segmentation with bio-inspired attentional mechanisms
Liu et al. Fg-net: A fast and accurate framework for large-scale lidar point cloud understanding
CN111242122B (en) Lightweight deep neural network rotating target detection method and system
WO2023193401A1 (en) Point cloud detection model training method and apparatus, electronic device, and storage medium
WO2021217924A1 (en) Method and apparatus for identifying vehicle type at traffic checkpoint, and device and storage medium
CN112016638B (en) Method, device and equipment for identifying steel bar cluster and storage medium
Shen et al. Vehicle detection in aerial images based on lightweight deep convolutional network and generative adversarial network
Karim et al. A brief review and challenges of object detection in optical remote sensing imagery
CN113762003B (en) Target object detection method, device, equipment and storage medium
Guo et al. DF-SSD: a deep convolutional neural network-based embedded lightweight object detection framework for remote sensing imagery
WO2019100348A1 (en) Image retrieval method and device, and image library generation method and device
Shao et al. Semantic segmentation for free space and lane based on grid-based interest point detection
Zhang et al. Recognition of bird nests on power transmission lines in aerial images based on improved YOLOv4
CN112200191B (en) Image processing method, image processing device, computing equipment and medium
CN114115993A (en) Device for use in a processing apparatus and device and method for an artificial neural network
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
CN115620081A (en) Training method of target detection model, target detection method and device
CN114627183A (en) Laser point cloud 3D target detection method
Marine et al. Pothole Detection on Urban Roads Using YOLOv8
Li et al. An FPGA-based tree crown detection approach for remote sensing images
Yang et al. Improved YOLOv4 based on dilated coordinate attention for object detection
CN116152345B (en) Real-time object 6D pose and distance estimation method for embedded system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22936328

Country of ref document: EP

Kind code of ref document: A1