WO2022141718A1 - Method and system for assisting point cloud-based object detection - Google Patents


Info

Publication number
WO2022141718A1
Authority
WO
WIPO (PCT)
Prior art keywords
sampling
point cloud
cloud set
target point
point
Application number
PCT/CN2021/074199
Other languages
French (fr)
Chinese (zh)
Inventor
张翔
黄尚锋
杜静
夏启明
陈延行
江文涛
Original Assignee
罗普特科技集团股份有限公司
罗普特(厦门)系统集成有限公司
Application filed by 罗普特科技集团股份有限公司 and 罗普特(厦门)系统集成有限公司
Publication of WO2022141718A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/04Indexing scheme for image data processing or generation, in general involving 3D image data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • The present disclosure belongs to the field of computer technology and in particular relates to a method and system for assisting point cloud target detection.
  • 3D laser scanning technology can continuously, rapidly, and at large scale acquire 3D point data from object surfaces, known as a point cloud.
  • In autonomous driving, a vehicle-mounted lidar rapidly scans the objects in front of the vehicle to obtain a large number of points rich in spatial structure information, and detection of targets in front of the vehicle is then performed on the acquired point cloud data. This technology is widely used in the autonomous driving domain.
  • In the field of artificial intelligence, 3D object detection has attracted increasing attention from researchers. It plays an important role in autonomous driving, robot trajectory planning, virtual reality, and other applications.
  • Depending on the input format, 3D object detection methods are mainly divided into image-based methods, point-cloud-based methods, and methods combining images and point clouds.
  • Point-cloud-based 3D object detection estimates 3D detection boxes directly from vehicle-mounted lidar data; there are currently two main approaches to processing raw point clouds.
  • The first method converts the entire point cloud into voxels and then predicts 3D detection boxes on the voxels.
  • However, such methods not only lose rich spatial structure information but also must rely on 3D CNNs to learn features from the voxels, which is computationally expensive.
  • In 2017, Yin Zhou et al. proposed VoxelNet, which voxelizes the point cloud and then uses a PointNet network to learn point cloud features from the voxels, while SECOND, proposed by Yan Y. et al., replaces PointNet with sparse convolutional layers; to reduce the computation caused by voxelization, PointPillars, proposed by Lang A. H. et al., replaces the voxel grid with voxel pillars, but the computational load remains considerable.
  • The second method uses the raw point cloud directly as input. Because PointNet and PointNet++, proposed by Qi C. R. et al., have been highly successful on point clouds, more and more 3D object detection methods build on them to process point clouds directly.
  • In 2019, Shi S. et al. proposed the two-stage network PointRCNN, which first uses PointNet++ as a semantic segmentation backbone to separate foreground points from background points and then estimates 3D detection boxes from the foreground points.
  • In the same year, the two-stage network STD proposed by Yang Z. et al. also used PointNet++ to learn point-wise features of the point cloud and converted the point features inside candidate boxes from a sparse to a dense representation through its PointsPool module. In both two-stage methods the first-stage semantic segmentation network plays a crucial role and directly affects final performance, but the prohibitive inference time makes them hard to apply in practical autonomous driving systems.
  • Unlike two-stage methods, one-stage methods that consume the point cloud directly are known for their efficiency.
  • In 2020, Shi W. et al. proposed PointGNN, a network that uses a graph neural network (GNN) to extract point-wise features. It fully exploits the GNN's ability to perceive spatial structure and achieves excellent performance on KITTI. A GNN perceives the spatial structure of a point cloud well, but it requires that the point cloud not be downsampled, which means the network must run the GNN over the entire point cloud; its computational cost is therefore high compared with other one-stage methods.
  • The purpose of the embodiments of the present application is to propose a method and system for assisting point cloud target detection that combines the efficiency of a one-stage network with the accuracy of a two-stage network, solving the problems described in the background above.
  • An embodiment of the present application provides a method for assisting point cloud target detection whose steps include: S100, segmenting the initial target point cloud set from the overall point cloud scene; S200, performing feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point and assigning it to the corresponding sampling points to obtain the sampling target point cloud set; S300, expanding the sampling target point cloud set back to the initial target point cloud set through three-point linear interpolation, updating the point features by interpolation, and using a fully connected layer to output the probability that each sampling point is a foreground point or a background point; and S400, feeding the sampling target point cloud set into a multilayer perceptron, outputting the offset to the corresponding real object center point, adding the offset to the features of the sampling target point cloud set to obtain the features of the predicted center point, and generating a three-dimensional target detection box from those features.
  • The feature aggregation of step S200 includes: S201, downsampling the initial target point cloud set to obtain K interest sampling points; S202, selecting from the interest sampling points to obtain K' quality sampling points; S203, weighting the neighboring points around each quality sampling point to obtain the weighted feature matrix of the neighboring points; S204, updating the sampling point features in the initial target point cloud set with the weighted neighboring-point features to obtain the process target point cloud set; and S205, downsampling the process target point cloud set to obtain K new interest sampling points and repeating S202-S204, and S205 itself where needed, until the sampling target point cloud set is obtained.
  • The initial target point cloud set thus undergoes downsample-select-weight-update processing, repeated as many times as required until a satisfactory sampling target point cloud set is obtained. The interest sampling points are the points retained by the downsampling operation; the quality sampling points are the interest sampling points of the same iteration that are scored, after denoising and feature learning, and selected accordingly. This ensures that the sampling target point cloud set contains as many foreground points as possible, obtained through feature-based farthest point sampling, while also preserving the overall shape of the point cloud through distance-based farthest point sampling, laying the foundation for the smooth execution of steps S300 and S400.
  • In some embodiments, the downsampling of step S201 includes: taking the initial target point cloud set as input and downsampling its points by independently applying distance-based farthest point sampling and eigenvalue-based farthest point sampling, yielding K interest sampling points. This lets the K interest sampling points retain as many foreground points as possible while also preserving the shape of the point cloud as much as possible.
  • In some embodiments, the selection of step S202 includes: denoising the K interest sampling points, using the number of neighboring points around each interest sampling point and the distances from those neighbors to it to obtain, through feature learning, a quality score for each interest sampling point, and selecting according to the scores the top K' quality sampling points from the K interest sampling points. This effectively removes sampling points that are themselves noise or whose spatial structure information is sparse, facilitating the subsequent weighting of neighboring points by their contributions.
  • In some embodiments, the weighting of step S203 includes: for each of the K' quality sampling points, randomly sampling m neighboring points within a spherical region of radius r centered on it, taking the features of the quality sampling point, the features of the neighbors, and the relative coordinates of the neighbors with respect to the quality sampling point as input to compute each neighbor's contribution to the quality sampling point, and multiplying each neighbor's feature by its contribution to obtain the weighted feature of that neighbor. This yields the weighted feature matrix of the neighboring points, ready for the subsequent point-update operation.
  • In some embodiments, the contribution in step S203 ranges from 0 to 1, which makes the weighting results more reasonable.
  • In some embodiments, updating the sampling points in the initial target point cloud set in step S204 includes: applying a MaxPooling operation that takes, on each channel of the weighted neighboring-point features, the most salient value to generate a new feature, thereby forming the process target point cloud set. This enables the process target point cloud set to perceive the spatial structure well and makes the prediction of the center point of the final three-dimensional detection box more accurate, while the downsampling operation reduces the number of sampling points to process and improves speed, achieving both efficiency and accuracy.
  • In some embodiments, step S205 operates as follows: the process target point cloud set is taken as input and its points are downsampled by the joint use of distance-based farthest point sampling and eigenvalue-based farthest point sampling to obtain K new interest sampling points; operations S202-S204 are then performed to obtain a second process target point cloud set or the sampling target point cloud set. When a second process target point cloud set is obtained, step S205 is repeated until the sampling target point cloud set is finally obtained. Repeating the operation as needed further improves the prediction of the center point of the final three-dimensional detection box.
  • In some embodiments, step S205 is performed one to four times. The number of points remaining after each downsampling can be set individually, and one to four repetitions distribute the downsampled point counts more reasonably, keeping both the computation time and the prediction accuracy of the final three-dimensional detection box within a good range. Too many repetitions easily lead to excessive runtime, while too few easily lead to insufficient accuracy.
  • In some embodiments, step S300 further includes penalizing mispredicted foreground or background labels with a focal loss function, which as an auxiliary supervision signal effectively supervises the accuracy of the final prediction.
  • In some embodiments, generating the three-dimensional detection box in step S400 includes: first predefining the length, width, and height of the box, then taking the features of the predicted center point as input and using a fully connected layer to output the residuals relative to the predefined length, width, and height together with the rotation angle, thereby obtaining the final three-dimensional detection box. This makes the generated box better match the actual object.
  • The present application also provides a system for assisting point cloud target detection, the system comprising:
  • an extraction module configured to segment the initial target point cloud set from the overall point cloud scene;
  • a feature aggregation module configured to perform feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point, and to assign the spatial structure information to the corresponding sampling points to obtain the sampling target point cloud set;
  • a feature propagation module configured to expand the sampling target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point;
  • a detection box generation module configured to feed the sampling target point cloud set into a multilayer perceptron, output the offset from the sampling target point cloud set to the corresponding real object center point, add the offset to the features of the sampling target point cloud set to obtain the features of the predicted center point, and finally generate a three-dimensional target detection box from the features of the predicted center point.
  • In some embodiments, the feature aggregation module includes:
  • a downsampling module configured to downsample the initial target point cloud set to obtain K interest sampling points;
  • a point selection module configured to select from the interest sampling points to obtain K' quality sampling points;
  • a weighting module configured to weight the neighboring points around each quality sampling point to obtain the weighted feature matrix of the neighboring points;
  • a sampling point update module configured to update the sampling point features in the initial target point cloud set with the weighted neighboring-point features to obtain the process target point cloud set;
  • a loop module configured to downsample the process target point cloud set to obtain K new interest sampling points and repeat operations S202-S204 to obtain a second process target point cloud set or the sampling target point cloud set, repeating step S205 when a second process target point cloud set is obtained until the sampling target point cloud set is finally obtained.
  • The combined action of the modules within the feature aggregation module enables the process target point cloud set to perceive the spatial structure well, makes the prediction of the center point of the final three-dimensional detection box more accurate, and, through the downsampling operation, reduces the number of sampling points to process and improves speed, achieving both efficiency and accuracy.
  • The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the embodiments described above.
  • The method and system for assisting point cloud target detection have the following advantages. 1. Feature aggregation over the initial target point cloud set obtains the spatial structure information of each sampling point and assigns it to the corresponding sampling points, producing the sampling target point cloud set; the downsampling used during aggregation reduces the number of points to process and increases speed, making the method more efficient. 2. The sampling target point cloud set is expanded back to the initial target point cloud set through three-point linear interpolation, the features of the points in the initial set are updated by interpolation, and, based on those features, a fully connected layer outputs the probability that each sampling point is a foreground point or a background point, which sharpens the final semantic segmentation and thus the displayed detection results. 3. The sampling target point cloud set is fed into a multilayer perceptron that outputs the offset to the corresponding real object center point; the offset is added to the features of the sampling target point cloud set to obtain the features of the predicted center point, from which the final three-dimensional detection box is generated, yielding an accurate box position and making the method more precise.
  • FIG. 1 is an exemplary basic flowchart according to an embodiment of the present disclosure;
  • FIG. 2 is a schematic flowchart of a feature aggregation step in a method for assisting point cloud target detection according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a system for assisting point cloud target detection according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a feature aggregation module in a system for assisting point cloud target detection according to an embodiment of the present disclosure.
  • FIG. 1 shows an exemplary basic flow of a method for assisting point cloud target detection of the present disclosure.
  • The basic process includes:
  • Step S100: segment the initial target point cloud set from the overall point cloud scene.
  • In this step, a full-pixel semantic segmentation method is used to segment the initial target point cloud set from the overall point cloud scene; assume the initial target point cloud set is point set A. Semantic segmentation classifies each pixel at the pixel level so that pixels of the same class fall into the same category, which makes extracting the initial target point cloud set easier.
  • Step S200: perform feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point in the initial target point cloud set, and assign the spatial structure information to the corresponding sampling points to obtain the sampling target point cloud set.
  • Step S200 includes steps S201-S205. In step S201, point set A is taken as input and downsampled by independently applying distance-based farthest point sampling and eigenvalue-based farthest point sampling, yielding K interest sampling points. Distance-based farthest point sampling better preserves the shape of the point cloud, while eigenvalue-based farthest point sampling captures as many foreground points as possible; sampling with both independently gives the final K interest sampling points the advantages of each.
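As an illustrative aside (not part of the original disclosure), distance-based farthest point sampling can be sketched in a few lines of NumPy. The eigenvalue-based variant is approximated here by running the same greedy procedure in feature space, which is an assumption about its intended behavior; all names and sizes below are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy farthest point sampling: repeatedly pick the point farthest
    from everything selected so far. `points` is (N, D); pass raw XYZ
    (D=3) for distance-based FPS, or per-point feature vectors for the
    eigenvalue/feature-based variant (an assumed generalization)."""
    n = points.shape[0]
    selected = [0]                              # deterministic seed point
    min_dist = np.full(n, np.inf)
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[selected[-1]], axis=1)
        min_dist = np.minimum(min_dist, d)      # distance to nearest selected point
        selected.append(int(np.argmax(min_dist)))
    return np.asarray(selected)

# Independent sampling as in step S201: run FPS once on coordinates and
# once on features, keeping both index sets of interest sampling points.
xyz = np.random.rand(1024, 3)                   # hypothetical point set A
feats = np.random.rand(1024, 32)                # hypothetical per-point features
idx_dist = farthest_point_sampling(xyz, 64)     # shape-preserving samples
idx_feat = farthest_point_sampling(feats, 64)   # foreground-biased samples
```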
  • In step S202, the K interest sampling points are denoised: using the number of neighboring points around each interest sampling point and the distances from those neighbors to it, a quality score is obtained for each interest sampling point through feature learning; the points are ranked by score and the top K' quality sampling points are selected. The K interest sampling points from S201 may include points with relatively sparse spatial structure information or points that are themselves noise, and this step removes them. Relative to the interest sampling points, the quality sampling points form a set with richer spatial structure information and higher quality scores.
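A minimal sketch of the selection idea, assuming the learned quality score of step S202 is replaced by a hand-written density heuristic built from the same two inputs (neighbor count and neighbor distances); the radius and point counts are hypothetical.

```python
import numpy as np

def quality_scores(points, idx_interest, radius=0.2):
    """Stand-in for the learned scoring of step S202: reward interest
    points with many close neighbors and penalize isolated, likely
    noisy ones via the mean distance to their neighbors."""
    scores = []
    for i in idx_interest:
        d = np.linalg.norm(points - points[i], axis=1)
        nbr = d[(d > 0) & (d < radius)]
        scores.append(len(nbr) - nbr.mean() if len(nbr) else -np.inf)
    return np.asarray(scores)

def select_quality_points(points, idx_interest, k_prime):
    """Rank interest points by quality score and keep the top K'."""
    order = np.argsort(-quality_scores(points, idx_interest))
    return np.asarray(idx_interest)[order[:k_prime]]

pts = np.random.rand(1024, 3)
idx_quality = select_quality_points(pts, np.arange(0, 1024, 16), 32)
```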
  • In step S203, for each of the K' quality sampling points, m neighboring points are randomly sampled within a spherical region of radius r centered on the point. The features of the quality sampling point, the features of the neighbors, and the relative coordinates of the neighbors with respect to the quality sampling point are taken as input to compute each neighbor's contribution to the quality sampling point, and the product of each neighbor's feature with its contribution gives the weighted feature of that neighbor. The contribution ranges from 0 to 1.
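The disclosure specifies the inputs to the contribution computation and its 0-1 range but not its exact form; the sketch below assumes a single linear layer followed by a sigmoid as the simplest mapping with those properties, with random placeholder weights standing in for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weight_neighbors(center_xyz, center_feat, nbr_xyz, nbr_feat, w, b):
    """Step S203 sketch: concatenate [center feature | neighbor feature |
    relative coordinates], map through one linear layer plus a sigmoid so
    each contribution lands in (0, 1), then scale the neighbor features."""
    m = nbr_xyz.shape[0]
    rel = nbr_xyz - center_xyz                            # (m, 3) relative coords
    x = np.hstack([np.tile(center_feat, (m, 1)), nbr_feat, rel])
    contrib = sigmoid(x @ w + b)                          # (m, 1), each in (0, 1)
    return nbr_feat * contrib                             # weighted features (m, C)

C = 32
w = np.random.randn(2 * C + 3, 1) * 0.1                   # placeholder weights
weighted = weight_neighbors(np.zeros(3), np.random.rand(C),
                            np.random.rand(8, 3), np.random.rand(8, C), w, 0.0)
```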
  • In step S204, a MaxPooling operation takes the most salient value on each channel of the weighted neighboring-point features to generate a new feature, thereby forming the process target point cloud set, denoted point set B. Concretely, the network applies MaxPooling to each channel of the weighted feature matrix F' to generate the new features New_F ∈ R^{K'×C}; MaxPooling also resolves the unordered nature of point clouds. New_F is the feature set of point set B.
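A one-line rendering of the S204 update, assuming the weighted neighbor features of all K' quality points are stacked into a single (K', m, C) array:

```python
import numpy as np

# F_prime: weighted neighbor features for K' quality points, (K', m, C).
# Channel-wise MaxPooling over the m neighbors keeps the most salient
# response per channel and is invariant to neighbor order, producing
# New_F of shape (K', C) as the feature set of point set B.
F_prime = np.random.rand(64, 8, 32)        # hypothetical (K', m, C)
New_F = F_prime.max(axis=1)                # (K', C)
```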
  • Step S205 is performed one to four times. The number of points remaining after each downsampling can be set individually, and one to four repetitions distribute the downsampled point counts more reasonably, keeping both the computation time and the prediction accuracy of the final 3D detection box within a good range. Too many repetitions easily lead to excessive runtime, while too few easily lead to insufficient accuracy.
  • In an embodiment, the downsampling of point set B first performs distance-based farthest point sampling to obtain point set C and eigenvalue-based farthest point sampling to obtain point set D, and then applies steps S202-S204 to point sets C and D to obtain their feature sets. Point sets C and D together constitute the second process target point cloud set, which is then downsampled again: point set C, obtained by distance-based farthest point sampling, is further downsampled by distance-based farthest point sampling to obtain point set E, and point set D, obtained by eigenvalue-based farthest point sampling, is further downsampled by eigenvalue-based farthest point sampling to obtain point set G. Steps S202-S204 are then applied to point sets E and G to obtain their feature sets, and point sets E and G together constitute the sampling target point cloud set.
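The two-branch cascade can be wired together as follows, reusing the farthest_point_sampling sketch shown earlier (feature-based FPS again approximated as FPS in feature space); the point counts are hypothetical.

```python
import numpy as np

A_xyz = np.random.rand(1024, 3)                  # point set A coordinates
A_feat = np.random.rand(1024, 32)                # point set A features

idx_C = farthest_point_sampling(A_xyz, 256)      # distance branch: A -> C
idx_D = farthest_point_sampling(A_feat, 256)     # feature branch:  A -> D
# ... steps S202-S204 would refine the features of C and D here ...
idx_E = farthest_point_sampling(A_xyz[idx_C], 64)    # C -> E (distance FPS)
idx_G = farthest_point_sampling(A_feat[idx_D], 64)   # D -> G (feature FPS)
# Point sets E and G together form the sampling target point cloud set.
```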
  • The basic process also includes step S300: expand the sampling target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point.
  • The sampling target point cloud set input at this step mainly uses point set E, obtained by distance-based farthest point sampling. Three-point linear interpolation (an existing technique, not further explained here) expands point set E back to the higher-resolution point set C and updates the features of the points in C; point set C is then expanded by three-point linear interpolation back to the still higher-resolution point set B, whose features are updated; and point set B is expanded back to the original high-resolution point cloud scene, point set A, whose point features are updated. Finally, a fully connected layer outputs the probability that each point is a foreground point or a background point.
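A sketch of the propagation step, assuming the "three-point linear interpolation" is the standard PointNet++-style inverse-distance blend of the three nearest sparse points:

```python
import numpy as np

def three_point_interpolate(src_xyz, src_feat, dst_xyz, eps=1e-8):
    """For every point in the dense set `dst_xyz`, find its 3 nearest
    neighbors in the sparse set `src_xyz` and blend their features with
    normalized inverse-distance weights."""
    out = np.empty((dst_xyz.shape[0], src_feat.shape[1]))
    for i, p in enumerate(dst_xyz):
        d = np.linalg.norm(src_xyz - p, axis=1)
        nn = np.argsort(d)[:3]                 # 3 nearest sparse points
        w = 1.0 / (d[nn] + eps)
        w /= w.sum()                           # normalized weights
        out[i] = w @ src_feat[nn]              # (3,) @ (3, C) -> (C,)
    return out

# E -> C -> B -> A: the same routine is applied at each resolution step.
feat_C = three_point_interpolate(np.random.rand(64, 3), np.random.rand(64, 32),
                                 np.random.rand(256, 3))
```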
  • The focal loss function may also be used to penalize mispredicted foreground or background labels; as an auxiliary supervision signal, it effectively supervises the accuracy of the final prediction.
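Assuming "F local loss" is a garbling of the standard focal loss of Lin et al., a minimal binary version for foreground/background supervision looks like the following; the hyperparameters are the common defaults, not values from the disclosure.

```python
import numpy as np

def focal_loss(p_fg, is_fg, alpha=0.25, gamma=2.0, eps=1e-8):
    """Binary focal loss: well-classified points are down-weighted by
    (1 - p_t)^gamma so training focuses on hard, mislabeled points."""
    p_t = np.where(is_fg, p_fg, 1.0 - p_fg)    # probability of the true class
    a_t = np.where(is_fg, alpha, 1.0 - alpha)  # class-balancing factor
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)))

loss = focal_loss(np.array([0.9, 0.2, 0.6]), np.array([True, False, True]))
```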
  • Step S400: feed the sampling target point cloud set into the multilayer perceptron, output the offset from the sampling target point cloud set to the corresponding real object center point, add the offset to the features of the sampling target point cloud set to obtain the features of the predicted center point, and finally generate a three-dimensional target detection box from the features of the predicted center point.
  • Here the sampling target point cloud set mainly uses point set G, obtained by eigenvalue-based farthest point sampling. The multilayer perceptron outputs the offset to the real object center point, and the predicted center position is obtained by adding the offset to the coordinates of point set G. The features of the predicted center point can be obtained through the neighbor weighting of step S203 and the sampling point update of step S204.
  • The three-dimensional detection box may be generated by predefining its length, width, and height, then taking the features of the predicted center point as input and using a fully connected layer to output the residuals relative to the predefined length, width, and height together with the rotation angle, yielding the final three-dimensional target detection box.
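A sketch of the step S400 heads under stated assumptions: random placeholder weights stand in for the trained multilayer perceptron and fully connected layer, and the predefined box dimensions use a commonly cited car-size anchor purely as an example, not a value from the disclosure.

```python
import numpy as np

def linear(x, w, b):
    return x @ w + b                           # placeholder for a trained layer

C = 32
G_xyz = np.random.rand(64, 3)                  # point set G coordinates
G_feat = np.random.rand(64, C)                 # point set G features

w_off, b_off = np.random.randn(C, 3) * 0.1, np.zeros(3)
offsets = linear(G_feat, w_off, b_off)         # per-point offset to object center
centers = G_xyz + offsets                      # predicted center positions

anchor_lwh = np.array([3.9, 1.6, 1.56])        # example predefined length/width/height
w_box, b_box = np.random.randn(C, 4) * 0.1, np.zeros(4)
deltas = linear(G_feat, w_box, b_box)          # (dl, dw, dh, rotation) per point
boxes_lwh = anchor_lwh + deltas[:, :3]         # final box sizes
yaw = deltas[:, 3]                             # rotation angle of each box
```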
  • The present application also provides a system for assisting point cloud target detection.
  • A system 500 for assisting point cloud target detection includes an extraction module 510, a feature aggregation module 520, a feature propagation module 530, and a detection box generation module 540.
  • The extraction module 510 is used to segment the initial target point cloud set from the overall point cloud scene.
  • The feature aggregation module 520 is configured to perform feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point, and to assign the spatial structure information to the corresponding sampling points to obtain the sampling target point cloud set.
  • The feature aggregation module 520 includes:
  • a downsampling module 521 used to downsample the initial target point cloud set to obtain K interest sampling points;
  • a point selection module 522 used to select from the interest sampling points to obtain K' quality sampling points;
  • a weighting module 523 used to weight the neighboring points around each quality sampling point to obtain the weighted feature matrix of the neighboring points;
  • a sampling point update module 524 used to update the sampling point features in the initial target point cloud set with the weighted neighboring-point features to obtain the process target point cloud set;
  • a loop module 525 used to downsample the process target point cloud set to obtain K new interest sampling points and repeat operations S202-S204 to obtain a second process target point cloud set or the sampling target point cloud set, repeating step S205 when a second process target point cloud set is obtained until the sampling target point cloud set is finally obtained.
  • The system 500 for assisting point cloud target detection further includes a feature propagation module 530, used to expand the sampling target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point.
  • The detection box generation module 540 is used to feed the sampling target point cloud set into the multilayer perceptron, output the offset from the sampling target point cloud set to the corresponding real object center point, add the offset to the features of the sampling target point cloud set to obtain the features of the predicted center point, and finally generate a three-dimensional target detection box from the features of the predicted center point.
  • The processes described above with reference to the flowcharts may be implemented as computer software programs.
  • The computer-readable storage medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • A computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code.
  • Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented languages such as Java, Smalltalk, and C++, as well as conventional procedural languages such as the "C" language or similar.
  • The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • It should also be noted that the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The modules involved in the embodiments of the present application may be implemented in software or in hardware.
  • The described modules may also be provided in a processor; for example, a processor may be described as including an acquisition module, an analysis module, and an output module, where in some cases the names of these modules do not limit the modules themselves.

Abstract

The present disclosure relates to the field of computer technology, and specifically to a method and system for assisting point cloud-based object detection. The method and system provided in embodiments of the present application have the following advantages: 1. the number of sampled points to process is reduced and the calculation speed is increased; 2. based on the features of the points in the initial target point cloud set, a fully connected layer outputs the probability that each sampled point is a foreground point or a background point, sharpening the final semantic segmentation and thus the displayed detection results; 3. a precise three-dimensional bounding box position can be obtained, so the method achieves high precision.

Description

A Method and System for Assisting Point Cloud Target Detection

Related Applications

This application claims priority to Chinese Patent Application No. 202011633104.8, filed on December 31, 2020, the entire contents of which are incorporated herein by reference.
Technical Field

The present disclosure belongs to the field of computer technology and in particular relates to a method and system for assisting point cloud target detection.
Background

3D laser scanning technology can continuously, rapidly, and at large scale acquire 3D point data from object surfaces, known as a point cloud. The field of autonomous driving currently exploits this technology: a vehicle-mounted lidar rapidly scans the objects in front of the vehicle to obtain a large number of points rich in spatial structure information, and detection of targets in front of the vehicle is then performed on the acquired point cloud data. This technology is widely used in the autonomous driving domain.

In the field of artificial intelligence, 3D object detection has attracted increasing attention from researchers. It plays an important role in autonomous driving, robot trajectory planning, virtual reality, and other applications. Depending on the input format, 3D object detection methods are mainly divided into image-based methods, point-cloud-based methods, and methods combining images and point clouds.

Among these, point-cloud-based 3D object detection estimates 3D detection boxes directly from vehicle-mounted lidar data. There are currently two main approaches to processing raw point clouds. The first converts the entire point cloud into voxels and then predicts 3D detection boxes on the voxels. However, such methods not only lose rich spatial structure information but also must rely on 3D CNNs to learn features from the voxels, which is computationally expensive. In 2017, Yin Zhou et al. proposed VoxelNet, which voxelizes the point cloud and then uses a PointNet network to learn point cloud features from the voxels, while SECOND, proposed by Yan Y. et al., replaces PointNet with sparse convolutional layers; these methods all inevitably incur heavy computation. To reduce the computation caused by voxelization, PointPillars, proposed by Lang A. H. et al., replaces the voxel grid with voxel pillars, but the computational load remains considerable.

The second approach uses the raw point cloud directly as input, without altering it. Because PointNet and PointNet++, proposed by Qi C. R. et al., have been highly successful on point clouds, more and more 3D object detection methods build on them to process point clouds directly. Specifically, in 2019, Shi S. et al. proposed the two-stage network PointRCNN, which first uses PointNet++ as a semantic segmentation backbone to separate foreground points from background points and then estimates 3D detection boxes from the foreground points. In the same year, the two-stage network STD proposed by Yang Z. et al. also used PointNet++ to learn point-wise features of the point cloud and converted the point features inside candidate boxes from a sparse to a dense representation through its PointsPool module. A review of these two two-stage methods shows that the first-stage semantic segmentation network plays a crucial role and directly affects final performance, but the unbearable inference time makes such methods difficult to apply in practical autonomous driving systems, so improving on this difficulty became an important task of this work.

Unlike two-stage methods, one-stage methods that consume the point cloud directly are known for their efficiency. In 2020, Shi W. et al. proposed PointGNN, a network that uses a graph neural network (GNN) to extract point-wise features. It fully exploits the GNN's ability to perceive spatial structure and achieves excellent performance on KITTI. A GNN perceives the spatial structure of a point cloud well, but it requires that the point cloud not be downsampled, which means the network must run the GNN over the entire point cloud; its computational cost is therefore high compared with other one-stage methods.
Summary

The purpose of the embodiments of the present application is to propose a method and system for assisting point cloud target detection that combines the efficiency of a one-stage network with the accuracy of a two-stage network, solving the problems described in the background above.
In a first aspect, an embodiment of the present application provides a method for assisting point cloud target detection, the steps of which include:

S100: segmenting the initial target point cloud set from the overall point cloud scene;

S200: performing feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point in the initial target point cloud set, and assigning the spatial structure information to the corresponding sampling points to obtain the sampling target point cloud set;

S300: expanding the sampling target point cloud set back to the initial target point cloud set through three-point linear interpolation, updating the features of the points in the initial target point cloud set by interpolation, and, based on those features, using a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point;

S400: feeding the sampling target point cloud set into a multilayer perceptron, outputting the offset from the sampling target point cloud set to the corresponding real object center point, adding the offset to the features of the sampling target point cloud set to obtain the features of the predicted center point, and finally generating a three-dimensional target detection box from the features of the predicted center point.
In some embodiments, the feature aggregation of step S200 includes:

S201: downsampling the initial target point cloud set to obtain K interest sampling points;

S202: selecting from the interest sampling points to obtain K' quality sampling points;

S203: weighting the neighboring points around each quality sampling point to obtain the weighted feature matrix of the neighboring points;

S204: updating the sampling point features in the initial target point cloud set with the weighted neighboring-point features to obtain the process target point cloud set;

S205: downsampling the process target point cloud set to obtain K new interest sampling points and repeating S202-S204 to obtain a second process target point cloud set or the sampling target point cloud set; when a second process target point cloud set is obtained, repeating step S205 until the sampling target point cloud set is finally obtained.
The initial target point cloud set thus undergoes downsample-select-weight-update processing, repeated as many times as required until a satisfactory sampling target point cloud set is obtained. The interest sampling points are the points retained by the downsampling operation; the quality sampling points are the interest sampling points of the same iteration that are scored, after denoising and feature learning, and selected accordingly. This ensures that the sampling target point cloud set contains as many foreground points as possible, obtained through feature-based farthest point sampling, while also preserving the overall shape of the point cloud through distance-based farthest point sampling, laying the foundation for the smooth execution of steps S300 and S400.
In some embodiments, the downsampling of step S201 includes: taking the initial target point cloud set as input and downsampling its points by independently applying distance-based farthest point sampling and eigenvalue-based farthest point sampling, yielding K interest sampling points. This lets the K interest sampling points retain as many foreground points as possible while also preserving the shape of the point cloud as much as possible.
In some embodiments, the selection of step S202 includes: denoising the K interest sampling points by using the number of neighboring points around each interest sampling point and the distances from those neighbors to it; a quality score for each interest sampling point is obtained through feature learning, and according to the scores the top K' quality sampling points are selected from the K interest sampling points. This effectively removes sampling points that are themselves noise or whose spatial structure information is sparse, facilitating the subsequent weighting of neighboring points by their contributions.
In some embodiments, the weighting of step S203 includes: for each of the K' quality sampling points, randomly sampling m neighboring points within a spherical region of radius r centered on it, taking the features of the quality sampling point, the features of the neighbors, and the relative coordinates of the neighbors with respect to the quality sampling point as input to compute each neighbor's contribution to the quality sampling point, and multiplying each neighbor's feature by its contribution to obtain the weighted feature of that neighbor. This yields the weighted feature matrix of the neighboring points, ready for the subsequent point-update operation.

In some embodiments, the contribution in step S203 ranges from 0 to 1, which makes the weighting results more reasonable.
In some embodiments, updating the sampling points in the initial target point cloud set in step S204 includes: applying a MaxPooling operation that takes, on each channel of the weighted neighboring-point features, the most salient value to generate a new feature, thereby forming the process target point cloud set.

This enables the process target point cloud set to perceive the spatial structure well and makes the prediction of the center point of the final three-dimensional detection box more accurate, while the downsampling operation reduces the number of sampling points to process and improves speed, achieving both efficiency and accuracy.
In some embodiments, step S205 operates as follows: the process target point cloud set is taken as input and its points are downsampled by the joint use of distance-based farthest point sampling and eigenvalue-based farthest point sampling to obtain K new interest sampling points; operations S202-S204 are then performed to obtain a second process target point cloud set or the sampling target point cloud set, and when a second process target point cloud set is obtained, step S205 is repeated until the sampling target point cloud set is finally obtained.

Repeating the operation as needed can further improve the prediction of the center point of the final three-dimensional detection box.
In some embodiments, step S205 is performed one to four times. The number of points remaining after each downsampling can be set individually, and one to four repetitions distribute the downsampled point counts more reasonably, keeping both the computation time and the prediction accuracy of the final three-dimensional detection box within a good range. Too many repetitions easily lead to excessive runtime, while too few easily lead to insufficient accuracy.
In some embodiments, step S300 further includes penalizing mispredicted foreground or background labels with a focal loss function. As an auxiliary supervision signal, the focal loss effectively supervises the accuracy of the final prediction.
In some embodiments, generating the three-dimensional detection box in step S400 includes: first predefining the length, width, and height of the box, then taking the features of the predicted center point as input and using a fully connected layer to output the residuals relative to the predefined length, width, and height together with the rotation angle, thereby obtaining the final three-dimensional detection box. This makes the generated box better match the actual object.
In a second aspect, the present application provides a system for assisting point cloud target detection, the system comprising:

an extraction module configured to segment the initial target point cloud set from the overall point cloud scene;

a feature aggregation module configured to perform feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point in the initial target point cloud set, and to assign the spatial structure information to the corresponding sampling points to obtain the sampling target point cloud set;

a feature propagation module configured to expand the sampling target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point;

a detection box generation module configured to feed the sampling target point cloud set into a multilayer perceptron, output the offset from the sampling target point cloud set to the corresponding real object center point, add the offset to the features of the sampling target point cloud set to obtain the features of the predicted center point, and finally generate a three-dimensional target detection box from the features of the predicted center point.
In some embodiments, the feature aggregation module includes:

a downsampling module configured to downsample the initial target point cloud set to obtain K interest sampling points;

a point selection module configured to select from the interest sampling points to obtain K' quality sampling points;

a weighting module configured to weight the neighboring points around each quality sampling point to obtain the weighted feature matrix of the neighboring points;

a sampling point update module configured to update the sampling point features in the initial target point cloud set with the weighted neighboring-point features to obtain the process target point cloud set;

a loop module configured to downsample the process target point cloud set to obtain K new interest sampling points and repeat operations S202-S204 to obtain a second process target point cloud set or the sampling target point cloud set, repeating step S205 when a second process target point cloud set is obtained until the sampling target point cloud set is finally obtained.
The combined action of the modules within the feature aggregation module enables the process target point cloud set to perceive the spatial structure well, makes the prediction of the center point of the final three-dimensional detection box more accurate, and, through the downsampling operation, reduces the number of sampling points to process and improves speed, achieving both efficiency and accuracy.
In a third aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the embodiments of the first aspect.
本申请实施例提供的一种辅助点云目标检测的方法及系统具有如下优势,1、通过将初始目标点云集进行特征聚合,以获得初始目标点云集中每个采样点的空间结构信息,并将空间结构信息赋值给对应的所述采样点,得到采样目标点云集,聚合过程采用下采样减少了处理的采样点数量,提高运算速度,使本方法更具高效性;2、通过将采样目标点云集通过三点线性插值扩张回初始目标点云集,并用插值法更新初始目标点云集的中点的特征,依据初始目标点云集的中点的特征,使用一层全连接层输出初始目标点云集中每个采样点是前景点还是背景点的概率,使得最终语义分割的效果更明显,从而使点云目标检测的显示结果更明显。3、将采样目标点云集输入到多层感知机中,输出采样目标点云集到对应的真实物体中心点的偏移量,再将偏移量与采样目标点云集的特征相加后得到预测中心点的特征,最后根据预测中心点的特征生成三维目标检测框,能够得到精确的三维目标检测框位置,使本方法更具精确性。The method and system for assisting point cloud target detection provided by the embodiments of the present application have the following advantages: 1. By performing feature aggregation on the initial target point cloud set, the spatial structure information of each sampling point in the initial target point cloud set is obtained, and The spatial structure information is assigned to the corresponding sampling points, and the sampling target point cloud set is obtained. In the aggregation process, downsampling is used to reduce the number of processing sampling points, improve the operation speed, and make the method more efficient; 2. By dividing the sampling target The point cloud set is expanded back to the initial target point cloud set through three-point linear interpolation, and the feature of the midpoint of the initial target point cloud set is updated by interpolation. According to the feature of the midpoint of the initial target point cloud set, a fully connected layer is used to output the initial target point cloud. Concentrating the probability of whether each sampling point is a foreground point or a background point makes the final semantic segmentation effect more obvious, so that the display result of point cloud target detection is more obvious. 3. Input the sampling target point cloud set into the multi-layer perceptron, output the offset of the sampling target point cloud set to the corresponding real object center point, and then add the offset and the characteristics of the sampling target point cloud set to obtain the prediction center Finally, the three-dimensional target detection frame is generated according to the characteristics of the predicted center point, and the accurate position of the three-dimensional target detection frame can be obtained, which makes the method more accurate.
Description of Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is an exemplary basic flowchart according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of the feature aggregation step in a method for assisting point cloud object detection according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a system for assisting point cloud object detection according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of the feature aggregation module in a system for assisting point cloud object detection according to an embodiment of the present disclosure.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related disclosure, not to limit it. It should also be noted that, for convenience of description, only the parts related to the relevant disclosure are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary basic flow of a method for assisting point cloud object detection according to the present disclosure.
As shown in FIG. 1, the basic flow includes:
Step S100: segment the initial target point cloud set from the overall point cloud scene.
In some embodiments of the present disclosure, a full-pixel semantic segmentation method is used to segment the initial target point cloud set from the overall point cloud scene; denote the initial target point cloud set as point set A. Semantic segmentation classifies each pixel at the pixel level, so that pixels of the same class are grouped together, which makes it easier to extract the initial target point cloud set.
Step S200: perform feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point in the initial target point cloud set, and assign the spatial structure information to the corresponding sampling point to obtain the sampled target point cloud set.
As shown in FIG. 2, in some embodiments of the present disclosure, a specific implementation of step S200 includes:
S201: downsample the initial target point cloud set to obtain K interest sampling points. Specifically: input point set A and downsample its sampling points by independent sampling, using farthest point sampling based on distance and farthest point sampling based on feature values; the downsampling yields K interest sampling points. Distance-based farthest point sampling better preserves the shape of the point cloud, while feature-value-based farthest point sampling captures as many foreground points as possible; sampling independently with both gives the resulting K interest sampling points the advantages of both, as in the sketch below.
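The following is a minimal sketch of farthest point sampling as one plausible reading of the downsampling in S201: the same routine gives the distance-based variant when run on xyz coordinates and the feature-value-based variant when run on per-point feature vectors. The function name, the NumPy implementation, and the point counts are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Select k indices from an (N, d) array by repeatedly taking the point
    farthest from the set already chosen (D-FPS when `points` holds xyz
    coordinates, F-FPS when it holds feature vectors)."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)            # distance to nearest chosen point
    chosen[0] = np.random.randint(n)     # arbitrary seed point
    for i in range(1, k):
        delta = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("nd,nd->n", delta, delta))
        chosen[i] = int(np.argmax(dist)) # farthest from the current set
    return chosen

# Independent sampling as described for S201: D-FPS on coordinates and
# F-FPS on per-point features, keeping the two index sets separately.
xyz = np.random.rand(4096, 3).astype(np.float32)     # stand-in for point set A
feats = np.random.rand(4096, 32).astype(np.float32)  # stand-in features
idx_d = farthest_point_sampling(xyz, 512)    # distance-based interest points
idx_f = farthest_point_sampling(feats, 512)  # feature-value-based interest points
```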
S202: select K' quality sampling points from the interest sampling points. Specifically: denoise the K interest sampling points, and, using the number of neighboring points around each interest sampling point and the distances from those neighbors to it, learn a quality score for each interest sampling point through feature learning; according to the scores, select the top K' best-scoring points from the K interest sampling points. The K interest sampling points obtained in S201 may include points whose spatial structure information is sparse or points that are themselves noise; step S202 denoises them, obtains a quality score for each sampling point through feature learning, sorts the sampling points by score, and keeps the top K'. Compared with the interest sampling points, the quality sampling points retain the subset with richer spatial structure information and higher quality scores. A possible realization is sketched below.
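One way the learned quality score of S202 could be realized is to featurize each interest point by neighborhood statistics (neighbor count and mean neighbor distance, the two cues the text names) and score it with a small MLP; the exact statistics, the network shape, and the radius are assumptions for illustration, not specified by the patent.

```python
import torch
import torch.nn as nn

class QualityScorer(nn.Module):
    """Score each interest point from simple neighborhood statistics."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, xyz: torch.Tensor, radius: float = 0.2) -> torch.Tensor:
        # xyz: (K, 3) interest sampling points
        d = torch.cdist(xyz, xyz)                      # (K, K) pairwise distances
        mask = (d < radius) & (d > 0)                  # neighbors within the radius
        count = mask.sum(dim=1, keepdim=True).float()  # number of neighbors
        mean_d = torch.where(mask, d, torch.zeros_like(d)).sum(1, keepdim=True) \
                 / count.clamp(min=1)                  # mean neighbor distance
        return self.mlp(torch.cat([count, mean_d], dim=1)).squeeze(-1)  # (K,)

# Keep the top-K' scoring points as the quality sampling points.
scorer = QualityScorer()
pts = torch.rand(512, 3)
scores = scorer(pts)
quality_idx = scores.topk(256).indices   # K' = 256, an illustrative choice
```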
S203: weight the neighboring points around each quality sampling point to obtain the feature matrix of each weighted neighboring point. Specifically: for each of the K' quality sampling points, taking the point itself as the center, randomly sample m neighboring points within a spherical region of radius r; take the features of the quality sampling point, the features of the neighboring points, and the coordinates of the neighboring points relative to the quality sampling point as input to compute the contribution of each neighboring point to the quality sampling point; the weighted feature of each neighboring point is the product of its feature and its contribution. The contribution takes values in the range 0 to 1, which makes the weighting more reasonable and principled.
In this embodiment, let W ∈ R^(k'×m) be the matrix of contribution degrees of all neighboring points, and let F ∈ R^(k'×m×c) be the set of feature vectors of all neighboring points, where c is the depth of the neighboring point features. The feature matrix obtained after the neighbor-weighting module is F' = W · F, where the scalar weight of each neighbor multiplies its feature vector across all c channels, so that F' ∈ R^(k'×m×c).
S204: update the sampling point features in the initial target point cloud set with the weighted neighboring point features to obtain the process target point cloud set. Specifically: use a MaxPooling operation to take, on each channel of the weighted neighboring point features, the most salient value and generate a new feature, thereby forming the process target point cloud set, denoted point set B.
In this embodiment, for the feature matrix F' ∈ R^(k'×m×c) obtained in the previous step, the network applies a MaxPooling operation to take the most salient feature on each channel of F' and generate a new feature, which can be written as New_F ∈ R^(k'×c). MaxPooling also resolves the unordered nature of the point cloud. New_F is the feature set of point set B. A combined sketch of S203-S204 follows.
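A combined sketch of S203-S204 under stated assumptions: the contribution degree of each neighbor comes from a small MLP over the concatenated center feature, neighbor feature, and relative coordinates, squashed into [0, 1] by a sigmoid, after which F' = W · F is max-pooled over the m neighbors. Only the inputs, the [0, 1] weighting, and the MaxPooling are taken from the text; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class NeighborWeighting(nn.Module):
    """S203-S204 sketch: weight m neighbors per quality point, then max-pool."""
    def __init__(self, c: int):
        super().__init__()
        # Per-neighbor input: center feature (c) + neighbor feature (c) + relative xyz (3).
        self.weight_mlp = nn.Sequential(
            nn.Linear(2 * c + 3, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),   # contribution degree in [0, 1]
        )

    def forward(self, center_feat, nbr_feat, rel_xyz):
        # center_feat: (k', c), nbr_feat: (k', m, c), rel_xyz: (k', m, 3)
        kp, m, c = nbr_feat.shape
        center = center_feat.unsqueeze(1).expand(kp, m, c)
        w = self.weight_mlp(torch.cat([center, nbr_feat, rel_xyz], dim=-1))  # (k', m, 1)
        f_weighted = w * nbr_feat             # F' = W * F, broadcast over channels
        new_f = f_weighted.max(dim=1).values  # MaxPooling over neighbors -> (k', c)
        return new_f                          # New_F, the features of point set B

# Illustrative shapes: k' = 256 quality points, m = 16 neighbors, c = 32 channels.
mod = NeighborWeighting(c=32)
out = mod(torch.rand(256, 32), torch.rand(256, 16, 32), torch.rand(256, 16, 3))
print(out.shape)  # torch.Size([256, 32])
```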
S205: downsample the process target point cloud set to obtain K new interest sampling points, and repeat operations S202-S204 to obtain the second process target point cloud set or the sampled target point cloud set; when the second process target point cloud set is obtained, repeat step S205 until the sampled target point cloud set is finally obtained. Specifically: input the process target point cloud set and downsample its sampling points by joint sampling, using farthest point sampling based on distance together with farthest point sampling based on feature values; then perform operations S202-S204 to obtain the second process target point cloud set or the sampled target point cloud set; when the second process target point cloud set is obtained, repeat step S205 until the sampled target point cloud set is finally obtained.
Step S205 is executed 1 to 4 times. The number of points kept after each downsampling can be configured, and 1 to 4 iterations allocate the ladder of downsampled point counts more reasonably, keeping both the computation time and the prediction accuracy of the final three-dimensional object detection frame in a good range. Too many iterations make the computation too slow; too few leave the accuracy insufficient.
In this embodiment, assume the number of repetitions is 2. Downsampling point set B then proceeds as follows: distance-based farthest point sampling first yields point set C, and feature-value-based farthest point sampling yields point set D; steps S202-S204 are performed on point sets C and D to obtain their feature sets, and point sets C and D together constitute the second process target point cloud set. The second process point cloud set is then downsampled again: point set C, obtained by distance-based farthest point sampling, is sampled again by distance-based farthest point sampling to give point set E, and point set D, obtained by feature-value-based farthest point sampling, is sampled again by feature-value-based farthest point sampling to give point set G; steps S202-S204 are performed on point sets E and G to obtain their feature sets, and point sets E and G together constitute the sampled target point cloud set. A sketch of this two-round cascade follows.
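The two-round cascade could be wired up roughly as below, reusing the farthest_point_sampling function from the sketch after S201 (assumed to be in scope here); the per-stage point counts are illustrative, since the patent leaves each downsampling size configurable, and the S202-S204 refresh is indicated only by a comment.

```python
# Reuses farthest_point_sampling() from the S201 sketch above.
import numpy as np

def cascade(a_xyz, a_feat, sizes=(1024, 256)):
    """B -> (C, D) -> (E, G): D-FPS branch on coordinates, F-FPS branch on features."""
    d_xyz, d_feat = a_xyz, a_feat   # distance-based branch state
    f_xyz, f_feat = a_xyz, a_feat   # feature-value-based branch state
    for k in sizes:
        i_d = farthest_point_sampling(d_xyz, k)   # point set C, then E
        i_f = farthest_point_sampling(f_feat, k)  # point set D, then G
        d_xyz, d_feat = d_xyz[i_d], d_feat[i_d]
        f_xyz, f_feat = f_xyz[i_f], f_feat[i_f]
        # S202-S204 (selection, weighting, max-pool update) would refresh
        # d_feat / f_feat here at every stage.
    return (d_xyz, d_feat), (f_xyz, f_feat)  # (E, G) after two rounds
```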
As shown in FIG. 1, the basic flow further includes step S300: expand the sampled target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point.
In this embodiment, the sampled target point cloud set input to this step is mainly point set E, obtained by distance-based farthest point sampling. Three-point linear interpolation (an existing technique, not elaborated here) expands point set E back to the higher-resolution point set C and updates the features of the points in point set C; point set C is then expanded by three-point linear interpolation back to the still higher-resolution point set B, whose point features are updated; finally, point set B is expanded by three-point linear interpolation back to the original high-resolution point cloud scene, point set A, whose point features are updated. A fully connected layer then outputs the probability that each point is a foreground point or a background point. An illustrative implementation of the interpolation follows.
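A sketch of three-point interpolation for feature propagation, assuming the common inverse-distance-weighted formulation in which each high-resolution point averages the features of its three nearest low-resolution points; the patent names the technique as existing art without giving formulas, so the weighting below is an assumption.

```python
import torch

def three_point_interpolate(dst_xyz, src_xyz, src_feat):
    """Propagate features from a sparse set (src) back to a dense set (dst).

    dst_xyz: (N, 3), src_xyz: (S, 3), src_feat: (S, c) -> (N, c). Each dst
    point is interpolated from its 3 nearest src points with inverse-distance
    weights."""
    d = torch.cdist(dst_xyz, src_xyz)              # (N, S) pairwise distances
    dist3, idx3 = d.topk(3, dim=1, largest=False)  # 3 nearest src points
    w = 1.0 / (dist3 + 1e-8)                       # inverse-distance weights
    w = w / w.sum(dim=1, keepdim=True)             # normalize per dst point
    return (src_feat[idx3] * w.unsqueeze(-1)).sum(dim=1)  # (N, c)

# Expand E -> C -> B -> A, updating features at each stage (shapes illustrative).
e_xyz, e_feat = torch.rand(64, 3), torch.rand(64, 128)
c_xyz = torch.rand(256, 3)
c_feat = three_point_interpolate(c_xyz, e_xyz, e_feat)  # features for point set C
```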
In some implementations of this embodiment, a focal loss function can also be used to penalize incorrectly predicted foreground or background labels. As an auxiliary supervision signal, the focal loss effectively supervises the accuracy of the final prediction; a minimal sketch follows.
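A minimal sketch of a binary focal loss for the foreground/background head, assuming the standard formulation with focusing parameter gamma and balance factor alpha; the patent names the loss but fixes no hyperparameters, so the values below are illustrative.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits: (N,) raw scores; targets: (N,) in {0., 1.} (1 = foreground)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)        # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()  # down-weight easy points
```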
Step S400: input the sampled target point cloud set into a multilayer perceptron, output the offset from the sampled target point cloud set to the corresponding real object center point, add the offset to the features of the sampled target point cloud set to obtain the features of the predicted center point, and finally generate a three-dimensional object detection frame from the features of the predicted center point.
In this embodiment, the sampled target point cloud set used here is mainly point set G, obtained by feature-value-based farthest point sampling. The features of point set G are input into a multilayer perceptron (MLP), which outputs the offset from point set G to the corresponding real object center point; the offset is added to the coordinates of point set G to obtain the predicted center point positions. The features of the predicted center points are obtained by applying the neighbor weighting of step S203 and the sampling point update of step S204 at those positions.
In some implementations of this embodiment, the three-dimensional object detection frame can be generated by first predefining the length, width, and height of the frame, then taking the features of the predicted center point as input and using a fully connected layer to output the deltas relative to the predefined length, width, and height together with a rotation angle, which gives the final three-dimensional object detection frame; a sketch follows.
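A sketch of the center-shift and box heads of S400 under stated assumptions: a shared MLP regresses a 3-D offset for each point of set G, and a fully connected layer regresses the length/width/height deltas and a rotation angle against a predefined anchor size. The layer widths, the anchor values, and the way the re-aggregated center features are passed in are all illustrative.

```python
import torch
import torch.nn as nn

ANCHOR_LWH = torch.tensor([3.9, 1.6, 1.56])  # predefined l, w, h (illustrative)

class CenterAndBoxHead(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.offset_mlp = nn.Sequential(nn.Linear(c, 64), nn.ReLU(), nn.Linear(64, 3))
        self.box_fc = nn.Linear(c, 4)  # (dl, dw, dh, rotation angle)

    def forward(self, g_xyz, g_feat, center_feat):
        # g_xyz: (K, 3) coords of point set G; g_feat: (K, c) its features;
        # center_feat: (K, c) features re-aggregated at the predicted centers
        # (via the S203/S204 weighting-and-pooling around each center).
        offset = self.offset_mlp(g_feat)          # shift toward object centers
        center = g_xyz + offset                   # predicted center positions
        dl_dw_dh_theta = self.box_fc(center_feat)
        lwh = ANCHOR_LWH + dl_dw_dh_theta[:, :3]  # anchor size + regressed delta
        theta = dl_dw_dh_theta[:, 3:4]            # yaw rotation
        return torch.cat([center, lwh, theta], dim=1)  # (K, 7) boxes

head = CenterAndBoxHead(c=128)
boxes = head(torch.rand(64, 3), torch.rand(64, 128), torch.rand(64, 128))
print(boxes.shape)  # torch.Size([64, 7])
```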
To implement the above method, the present application further provides a system for assisting point cloud object detection.
As shown in FIG. 3, a system 500 for assisting point cloud object detection includes an extraction module 510, a feature aggregation module 520, a feature propagation module 530, and a detection frame generation module 540, wherein:
The extraction module 510 is used to segment the initial target point cloud set from the overall point cloud scene.
The feature aggregation module 520 is used to perform feature aggregation on the initial target point cloud set to obtain the spatial structure information of each sampling point in the initial target point cloud set, and assign the spatial structure information to the corresponding sampling point to obtain the sampled target point cloud set.
As shown in FIG. 4, in some embodiments of the present disclosure, the feature aggregation module 520 includes:
A downsampling module 521, used to downsample the initial target point cloud set to obtain K interest sampling points;
A point selection module 522, used to select K' quality sampling points from the interest sampling points;
A weighting module 523, used to weight the neighboring points around each quality sampling point to obtain the feature matrix of each weighted neighboring point;
A sampling point update module 524, used to update the sampling point features in the initial target point cloud set with the weighted neighboring point features to obtain the process target point cloud set;
A loop module 525, used to downsample the process target point cloud set to obtain K new interest sampling points and repeat operations S202-S204 to obtain the second process target point cloud set or the sampled target point cloud set; when the second process target point cloud set is obtained, step S205 is repeated until the sampled target point cloud set is finally obtained.
As shown in FIG. 3, the system 500 for assisting point cloud object detection further includes a feature propagation module 530, used to expand the sampled target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point.
The detection frame generation module 540 is used to input the sampled target point cloud set into a multilayer perceptron, output the offset from the sampled target point cloud set to the corresponding real object center point, add the offset to the features of the sampled target point cloud set to obtain the features of the predicted center point, and finally generate a three-dimensional object detection frame from the features of the predicted center point.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. The computer-readable storage medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including an acquisition module, an analysis module, and an output module, where the names of these modules do not, in some cases, constitute a limitation on the modules themselves.
The above description is only a preferred embodiment of the present application and an illustration of the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this application.

Claims (14)

  1. A method for assisting point cloud object detection, characterized in that the steps of the method include:
    S100: segmenting an initial target point cloud set from an overall point cloud scene;
    S200: performing feature aggregation on the initial target point cloud set to obtain spatial structure information of each sampling point in the initial target point cloud set, and assigning the spatial structure information to the corresponding sampling point to obtain a sampled target point cloud set;
    S300: expanding the sampled target point cloud set back to the initial target point cloud set through three-point linear interpolation, updating the features of the points in the initial target point cloud set by interpolation, and, based on those features, using a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point;
    S400: inputting the sampled target point cloud set into a multilayer perceptron, outputting the offset from the sampled target point cloud set to the corresponding real object center point, adding the offset to the features of the sampled target point cloud set to obtain the features of a predicted center point, and finally generating a three-dimensional object detection frame from the features of the predicted center point.
  2. The method according to claim 1, characterized in that the feature aggregation in step S200 includes:
    S201: downsampling the initial target point cloud set to obtain K interest sampling points;
    S202: selecting K' quality sampling points from the interest sampling points;
    S203: weighting the neighboring points around each quality sampling point to obtain the feature matrix of each weighted neighboring point;
    S204: updating the sampling point features in the initial target point cloud set with the weighted neighboring point features to obtain a process target point cloud set;
    S205: downsampling the process target point cloud set to obtain K new interest sampling points and repeating operations S202-S204 to obtain a second process target point cloud set or the sampled target point cloud set; when the second process target point cloud set is obtained, repeating step S205 until the sampled target point cloud set is finally obtained.
  3. The method according to claim 2, characterized in that the downsampling in step S201 includes: inputting the initial target point cloud set, and downsampling the sampling points in the initial target point cloud set by independent sampling using distance-based farthest point sampling or feature-value-based farthest point sampling, obtaining K interest sampling points after downsampling.
  4. The method according to claim 2, characterized in that the selection in step S202 includes: denoising the K interest sampling points; using the number of neighboring points around each interest sampling point and the distances from those neighbors to the interest sampling point, obtaining a quality score for each interest sampling point through feature learning; and, according to the quality scores, selecting the top K' best-scoring quality sampling points from the K interest sampling points.
  5. The method according to claim 2, characterized in that the weighting in step S203 includes: for each of the K' quality sampling points, taking the point itself as the center, randomly sampling m neighboring points within a spherical region of radius r; taking the features of the quality sampling point, the features of the neighboring points, and the coordinates of the neighboring points relative to the quality sampling point as input to compute the contribution of each neighboring point to the quality sampling point; and multiplying the feature of each neighboring point by its contribution to obtain the weighted feature of each neighboring point.
  6. The method according to claim 5, characterized in that the contribution in step S203 takes values in the range 0 to 1.
  7. The method according to claim 2, characterized in that the updating of the sampling points in the initial target point cloud set in step S204 includes: using a MaxPooling operation to take, on each channel of the weighted neighboring point features, the most salient feature and generate a new feature, thereby forming the process target point cloud set.
  8. The method according to claim 2, characterized in that the specific operation of step S205 is: inputting the process target point cloud set, downsampling its sampling points by joint sampling using distance-based farthest point sampling and feature-value-based farthest point sampling, and then performing operations S202-S204 to obtain the second process target point cloud set or the sampled target point cloud set; when the second process target point cloud set is obtained, repeating step S205 until the sampled target point cloud set is finally obtained.
  9. The method according to claim 8, characterized in that step S205 is executed 1 to 4 times.
  10. The method according to claim 1, characterized in that step S300 further includes: penalizing incorrectly predicted foreground or background labels with a focal loss function.
  11. The method according to claim 1, characterized in that the generation of the three-dimensional object detection frame in step S400 includes: first predefining the length, width, and height of the three-dimensional object detection frame, then taking the features of the predicted center point as input and using a fully connected layer to output the deltas relative to the predefined length, width, and height together with a rotation angle, thereby obtaining the final three-dimensional object detection frame.
  12. A system for assisting point cloud object detection, characterized in that the system includes:
    an extraction module, configured to segment an initial target point cloud set from an overall point cloud scene;
    a feature aggregation module, configured to perform feature aggregation on the initial target point cloud set to obtain spatial structure information of each sampling point in the initial target point cloud set, and assign the spatial structure information to the corresponding sampling point to obtain a sampled target point cloud set;
    a feature propagation module, configured to expand the sampled target point cloud set back to the initial target point cloud set through three-point linear interpolation, update the features of the points in the initial target point cloud set by interpolation, and, based on those features, use a fully connected layer to output the probability that each sampling point in the initial target point cloud set is a foreground point or a background point;
    a detection frame generation module, configured to input the sampled target point cloud set into a multilayer perceptron, output the offset from the sampled target point cloud set to the corresponding real object center point, add the offset to the features of the sampled target point cloud set to obtain the features of a predicted center point, and finally generate a three-dimensional object detection frame from the features of the predicted center point.
  13. The system for assisting point cloud object detection according to claim 12, characterized in that the feature aggregation module includes:
    a downsampling module, configured to downsample the initial target point cloud set to obtain K interest sampling points;
    a point selection module, configured to select K' quality sampling points from the interest sampling points;
    a weighting module, configured to weight the neighboring points around each quality sampling point to obtain the feature matrix of each weighted neighboring point;
    a sampling point update module, configured to update the sampling point features in the initial target point cloud set with the weighted neighboring point features to obtain a process target point cloud set;
    a loop module, configured to downsample the process target point cloud set to obtain K new interest sampling points and repeat operations S202-S204 to obtain a second process target point cloud set or the sampled target point cloud set; when the second process target point cloud set is obtained, step S205 is repeated until the sampled target point cloud set is finally obtained.
  14. A computer-readable storage medium in which a computer program is stored, which, when executed by a processor, implements the method according to any one of claims 1-11.
PCT/CN2021/074199 2020-12-31 2021-01-28 Method and system for assisting point cloud-based object detection WO2022141718A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011633104.8 2020-12-31
CN202011633104.8A CN112734931B (en) 2020-12-31 2020-12-31 Method and system for assisting point cloud target detection

Publications (1)

Publication Number Publication Date
WO2022141718A1 (en) 2022-07-07

Family ID: 75608394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074199 WO2022141718A1 (en) 2020-12-31 2021-01-28 Method and system for assisting point cloud-based object detection

Country Status (2)

Country Link
CN (1) CN112734931B (en)
WO (1) WO2022141718A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284163B (en) * 2021-05-12 2023-04-07 Xi'an Jiaotong University Three-dimensional target self-adaptive detection method and system based on vehicle-mounted laser radar point cloud

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9582939B2 (en) * 2015-06-11 2017-02-28 Nokia Technologies Oy Structure preserved point cloud simplification
CN110969210A (en) * 2019-12-02 2020-04-07 CETC Special Aircraft System Engineering Co., Ltd. Small and slow target identification and classification method, device, equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989604A (en) * 2016-02-18 2016-10-05 Hefei University of Technology Target object three-dimensional color point cloud generation method based on KINECT
US20180276885A1 (en) * 2017-03-27 2018-09-27 3Dflow Srl Method for 3D modelling based on structure from motion processing of sparse 2D images
CN110632608A (en) * 2018-06-21 2019-12-31 Beijing Jingdong Shangke Information Technology Co., Ltd. Target detection method and device based on laser point cloud
CN110032962A (en) * 2019-04-03 2019-07-19 Tencent Technology (Shenzhen) Co., Ltd. Object detection method and apparatus, network device, and storage medium
CN110991468A (en) * 2019-12-13 2020-04-10 Shenzhen SenseTime Technology Co., Ltd. Three-dimensional target detection and intelligent driving method, device and equipment
CN111753698A (en) * 2020-06-17 2020-10-09 Southeast University Multi-mode three-dimensional point cloud segmentation system and method
CN111915746A (en) * 2020-07-16 2020-11-10 Beijing Institute of Technology Weak-labeling-based three-dimensional point cloud target detection method and labeling tool

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051633A (en) * 2022-12-15 2023-05-02 Tsinghua University 3D point cloud target detection method and device based on weighted relation perception
CN116051633B (en) * 2022-12-15 2024-02-13 Tsinghua University 3D point cloud target detection method and device based on weighted relation perception

Also Published As

Publication number Publication date
CN112734931A (en) 2021-04-30
CN112734931B (en) 2021-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912498

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912498

Country of ref document: EP

Kind code of ref document: A1