WO2023072055A1

WO2023072055A1 - Point cloud data processing method and system

Info

Publication number: WO2023072055A1
Application number: PCT/CN2022/127326
Authority: WO
Inventors: 李亚敏; 晋周南; 朱小天
Original assignee: 华为技术有限公司
Priority date: 2021-10-27
Filing date: 2022-10-25
Publication date: 2023-05-04
Also published as: CN116052155A

Abstract

A point cloud data processing method and system, which relate to the technical field of data processing. The method comprises: acquiring a first detection frame, the first detection frame being used for indicating a static object labeled in a first point cloud map, and the first point cloud map being obtained by M point cloud data frames; acquiring a second detection frame, the second detection frame being used for indicating a dynamic object labeled in the first point cloud data, the first point cloud data being obtained by N point cloud data frames, and N<M; fusing single-frame point cloud data among the M point cloud data frames according to the second detection frame to obtain a second point cloud map; and correcting the first detection frame and the second detection frame according to the second point cloud map to obtain a target detection frame. By fusing the detection result of the first point cloud map obtained by multiple point cloud data frames and the detection result of the first point cloud data obtained by some point cloud data frames among the multiple point cloud data frames, the labeling capability of the point cloud data processing system is improved.

Description

A point cloud data processing method and system

Cross References to Related Applications

This application claims the priority of a Chinese patent application filed with the Intellectual Property Office of the People's Republic of China on October 27, 2021, with application number 202111253590.5, and application title "A Method and System for Processing Point Cloud Data", the entire contents of which are incorporated by reference incorporated in this application.

technical field

The embodiments of the present application relate to the technical field of data processing, and in particular to a point cloud data processing method and system.

Background technique

Lidar (lightlaser detection and ranging, Lidar) is the abbreviation of laser detection and ranging system. Compared with traditional sensors such as cameras and ultrasonic sensors, lidar has the characteristics of high measurement accuracy, fast response speed, and strong anti-interference ability. LiDAR has been widely used in the field of intelligent driving and unmanned driving. Among them, the lidar transmits a detection signal to the target object during work (for example, the detection signal can be a laser beam), and after the lidar receives the reflected signal reflected from the target object, the lidar compares the detection signal with the transmitted signal and processed to obtain a point cloud set, which is a set of sampling points obtained after obtaining the spatial coordinates of each sampling point on the surface of the target object. Lidar can simultaneously detect and sample multiple target objects at the same time to obtain a set of point cloud data. The point cloud data can include multiple point cloud sets corresponding to multiple target objects.

At present, in order to reduce the labor cost of object detection on point cloud data and improve the detection accuracy, a point cloud 3D object automatic labeling system is proposed in some designs. The system takes a single frame of point cloud data or several frames of point cloud data as input , by using the 3D detection model for automatic annotation. However, when the target object is occluded or the distance is too far, due to the lack of point cloud data in the single frame point cloud data or several frames of point cloud data, it will lead to problems such as false detection, missed detection, and inaccurate detection frames. The offset in the point cloud data collected by the object at different times will seriously affect the automatic labeling effect of the system due to the smearing phenomenon caused by the point cloud data processing process.

Therefore, how to improve the labeling effect of the point cloud 3D object automatic labeling system is still an important problem that needs to be solved urgently.

Contents of the invention

The embodiment of the present application provides a point cloud data processing method and system, by using the detection result of the first point cloud map obtained from the multi-frame point cloud data, and part of the frame points in the multi-frame point cloud data The detection results of the first point cloud data obtained from the cloud data are fused to improve the ability of the point cloud data processing system to label part of the frame point cloud data.

In the first aspect, the embodiment of the present application provides a point cloud data processing method, which can be implemented by a point cloud data processing system, and the method includes: obtaining a first detection frame, the first detection frame is used to indicate the The static object marked in the point cloud map, the first point cloud map is obtained by M frame point cloud data, M is an integer greater than or equal to 2; obtain the second detection frame, the second detection frame is used to indicate the The dynamic object marked in the first point cloud data, the first point cloud data is obtained from N frames of point cloud data, N is an integer greater than or equal to 1, and N<M; according to the second detection frame, the The single-frame point cloud data in the M frame point cloud data is fused and processed to obtain a second point cloud map; the first detection frame and the second detection frame are corrected according to the second point cloud map to obtain a label A detection frame, the target detection frame is used to indicate the marked static object and/or dynamic object marked in the single frame point cloud data.

Through the above method, the point cloud data processing system can use the M frames of point cloud data and the N frames of point cloud data in the M frames of point cloud data as data to be labeled for different processing, and then process the processed point cloud data Perform fusion and obtain the corrected labeling results of each single frame of point cloud data, so as to use M frame point cloud data to compensate N frame point cloud data, and reduce the tailing phenomenon of N frame point cloud data caused by dynamic objects , and problems such as false detection, missed detection, and inaccurate detection frames caused by too few points in the N-frame point cloud data, so as to improve the labeling ability of the point cloud data processing system.

It should be noted that, in the embodiment of the present application, N<M, the N frames of point cloud data are partial frames of point cloud data of the M frames of point cloud data, wherein, when N=1, the N frames of point cloud data are For a single frame of point cloud data, when N>1, the N frames of point cloud data include several frames of point cloud data. During specific implementation, the point cloud data processing system determines the specific value of N or M according to the configuration information or the application scenario, and the embodiment of the present application does not limit the specific implementation manner.

It should be noted that in this embodiment of the application, static objects and dynamic objects are target objects included in point cloud data or point cloud maps. Static objects are generally point cloud collections collected by detecting static objects, and dynamic objects are generally A collection of point clouds collected for the detection of dynamic objects. Among them, due to the mobility of dynamic objects, the positions of dynamic objects at different times are different. Therefore, in order to eliminate the impact of the mobility of dynamic objects, when splicing point cloud collections collected at different times for the same dynamic object, it will be The phenomenon of listing the same objects is the smear phenomenon mentioned in the embodiment of the present application, which may also be called the smear phenomenon.

With reference to the first aspect, in a possible implementation manner, the acquiring the first detection frame includes: splicing the M frames of point cloud data to obtain the first point cloud map; The first point cloud map performs target detection to obtain the first detection frame, wherein the first detection model is obtained by training using first training data, and the first training data includes the third point cloud map and static Object annotation data.

Through the above method, a point cloud map (for example, the third point cloud map) can be used to train the first detection model for labeling static objects, so as to provide a high-precision static object detection model. Further, the point cloud data processing system can use the first detection model to mark the static objects in the point cloud map to be marked (for example, the first point cloud map), so that the detection results of the static objects in the point cloud map can be used in the In the M-frame point cloud data, the dynamic object is aligned with the relevant features of the static object, and the tailing phenomenon caused by the dynamic object is reduced in the N-frame point cloud data, thereby reducing the difficulty of labeling the dynamic object.

It should be understood that in the embodiment of the present application, the third point cloud map may be the same as or different from the first point cloud map, that is to say, the first point cloud map may be used as data to be labeled or as training data. No limit.

With reference to the first aspect, in a possible implementation manner, the acquiring the second detection frame includes: performing object detection on the first point cloud data according to the second detection model to obtain a third detection frame, wherein the The second detection model is obtained through training using second training data, the second training data includes second point cloud data and dynamic object labeling data, and the third detection frame is used to indicate that in the first point cloud data A marked dynamic object; correcting the third detection frame according to the first detection frame to obtain the second detection frame.

Through the above method, the second point cloud data can be used to train the second detection model for labeling the dynamic object, so as to provide a high-precision dynamic object detection model. Further, the point cloud data processing system can use the second detection model to mark the dynamic objects in the N frames of point cloud data to be marked, so as to obtain the dynamic object detection results in the N frames of point cloud data, and the dynamic object detection results can be used It is used to fuse M frames of point cloud data to present dynamic objects in M frames of point cloud data in a form that eliminates smearing, so as to facilitate the subsequent correction process and improve the detection accuracy of the point cloud data processing system.

With reference to the first aspect, in a possible implementation manner, according to the second detection frame, performing fusion processing on the single frame point cloud data in the M frames of point cloud data to obtain a second point cloud map, Including: according to the second detection frame, removing the dynamic object in the single frame point cloud data of the M frames of point cloud data to obtain the third point cloud data corresponding to the single frame point cloud data; according to the second detection frame Attribute information, associate the second detection frame corresponding to the same dynamic object to obtain the dynamic object association result; based on the dynamic object association result, perform fusion processing on the third point cloud data corresponding to the M frame point cloud data to obtain Second point cloud map.

Through the above method, the point cloud data processing system can use the detection results of the dynamic objects in the N-frame point cloud data to realize the object-level association of the dynamic objects in the M-frame point cloud data, and eliminate the tailing phenomenon caused by the dynamic objects , to obtain a new point cloud map, so that the dynamic objects in the new point cloud map are presented in a form that eliminates the trailing phenomenon, thereby facilitating the subsequent correction process.

With reference to the first aspect, in a possible implementation manner, correcting the first detection frame and the second detection frame according to the second point cloud map to obtain a target detection frame includes: In the two-point cloud map, the first attribute of the first detection frame marked with the static object is corrected, and/or the second attribute of the second detection frame marked with the dynamic object is corrected to obtain the fourth detection frame, the fourth The detection frame is used to indicate the dynamic object and/or static object marked in the second point cloud map; in the single frame point cloud data corresponding to the second point cloud map, the fourth detection frame of the marked dynamic object The third attribute is corrected to obtain the fifth detection frame; the fourth detection frame marked with the static object and the fifth detection frame are used as the target detection frame of the M frames of point cloud data.

Through the above method, the point cloud data processing system can, for example, correct the relevant attributes of the detection frame marked with static objects and/or dynamic objects in the second point cloud map, which helps to improve the detection accuracy of the point cloud data processing system . Wherein, the point cloud data processing system may implement the correction process by manual correction or automatic correction, and the embodiment of the present application does not limit the specific implementation manner.

With reference to the first aspect, in a possible implementation manner, after obtaining the target detection frame, the method further includes: combining the single frame point cloud data in the M frames of point cloud data and the target detection frame As the third training data, in the process of training a plurality of detection models based on the third training data, a sixth detection frame is determined, and the detection result of the sixth detection frame is an error; the second point cloud map and the The target detection frame is used as fourth training data, and a seventh detection frame is determined during the process of training multiple detection models based on the fourth training data; and the seventh detection frame is corrected based on the sixth detection frame.

Through the above method, the point cloud data processing system can use multi-model fusion and point cloud map to generate softening frames, provide higher-quality detection frames and softening frames that are easy to train, and help improve the accuracy of single frames using automatic labeling systems. Or the training accuracy of the model of several frames of point cloud data.

In the second aspect, the embodiment of the present application provides a point cloud data processing system, including: a first acquisition unit, configured to acquire a first detection frame, and the first detection frame is used to indicate static object, the first point cloud map is obtained from M frames of point cloud data, and M is an integer greater than or equal to 2; the second acquisition unit is used to acquire a second detection frame, and the second detection frame is used to indicate The dynamic object marked in the first point cloud data, the first point cloud data is obtained from N frames of point cloud data, N is an integer greater than or equal to 1, and N<M; the processing unit is used for according to the second The detection frame is used to fuse the single-frame point cloud data in the M frames of point cloud data to obtain a second point cloud map; a correction unit is used to correct the first detection frame and the second point cloud map according to the second point cloud map. The second detection frame is corrected to obtain a target detection frame, and the target detection frame is used to indicate the marked static object and/or dynamic object marked in the single frame point cloud data.

It should be noted that, in the embodiment of the present application, the first acquisition unit and the second acquisition unit may be the same unit or different units, and the embodiment of the present application does not limit the product form.

With reference to the second aspect, in a possible implementation manner, the first acquisition unit is configured to: stitch the M frames of point cloud data to obtain the first point cloud map; The first point cloud map is used for target detection to obtain the first detection frame, wherein the first detection model is obtained by training using the first training data, and the first training data includes the third point cloud map and static objects Annotate the data.

With reference to the second aspect, in a possible implementation manner, the second acquisition unit is configured to: perform object detection on the first point cloud data according to a second detection model to obtain a third detection frame, wherein the The second detection model is obtained by training using the second training data, the second training data includes the second point cloud data and dynamic object labeling data, and the third detection frame is used to indicate the labeling in the first point cloud data the dynamic object; correcting the third detection frame according to the first detection frame to obtain the second detection frame.

With reference to the second aspect, in a possible implementation manner, the processing unit is configured to: remove dynamic objects in a single frame of point cloud data in the M frames of point cloud data according to the second detection frame, to obtain a single The third point cloud data corresponding to the frame point cloud data; according to the attribute information of the second detection frame, correlating the second detection frame corresponding to the same dynamic object to obtain a dynamic object association result; based on the dynamic object association result to The third point cloud data corresponding to the M frames of point cloud data are fused to obtain a second point cloud map.

With reference to the second aspect, in a possible implementation manner, the correction unit is configured to: in the second point cloud map, correct the first attribute of the first detection frame marking the static object, and/or , modifying the second attribute of the second detection frame marked with the dynamic object to obtain a fourth detection frame, the fourth detection frame is used to indicate the dynamic object and/or the static object marked in the second point cloud map; in the In the single-frame point cloud data corresponding to the second point cloud map, the third attribute of the fourth detection frame marked with the dynamic object is corrected to obtain the fifth detection frame; the fourth detection frame marked with the static object and the first Five detection frames are used as target detection frames of the M frames of point cloud data.

With reference to the second aspect, in a possible implementation manner, the system further includes a training unit configured to: use the single frame of point cloud data in the M frames of point cloud data and the target detection frame As the third training data, in the process of training a plurality of detection models based on the third training data, a sixth detection frame is determined, and the detection result of the sixth detection frame is an error; the second point cloud map and the The target detection frame is used as fourth training data, and a seventh detection frame is determined during the process of training multiple detection models based on the fourth training data; and the seventh detection frame is corrected based on the sixth detection frame.

In a third aspect, the embodiment of the present application provides a point cloud data processing system, including a memory and a processor, the memory is used to store programs; the processor is used to execute the programs stored in the memory, so that the The device implements the method described in the foregoing first aspect and any possible implementation manner of the first aspect.

In a fourth aspect, the embodiment of the present application provides a point cloud data processing device, including: at least one processor and an interface circuit, the interface circuit is used to provide data or code instructions for the at least one processor, and the at least one processor A processor is configured to implement the method described in the first aspect and any possible implementation manner of the first aspect by using a logic circuit or executing code instructions.

In the fifth aspect, the embodiment of the present application provides a computer-readable storage medium, the computer-readable medium stores program code, and when the program code is run on the computer, the computer executes the above-mentioned first aspect and the first aspect. In one aspect, the method described in any possible implementation manner.

In a sixth aspect, an embodiment of the present application provides a computer program product, which, when the computer program product is run on a computer, enables the computer to execute the above-mentioned first aspect and any possible implementation manner of the first aspect. method.

In a seventh aspect, an embodiment of the present application provides a chip system, the chip system includes a processor, configured to call a computer program or a computer instruction stored in a memory, so that the processor executes the above-mentioned first aspect and the first aspect The method described in any possible implementation.

With reference to the seventh aspect, in a possible implementation manner, the processor is coupled to the memory through an interface.

With reference to the seventh aspect, in a possible implementation manner, the system on a chip further includes a memory, where computer programs or computer instructions are stored in the memory.

In an eighth aspect, the embodiment of the present application provides a terminal device, which can be used to implement the method described in the foregoing first aspect and any possible implementation manner of the first aspect. Exemplarily, the terminal equipment includes but is not limited to: intelligent transportation equipment (such as automobiles, ships, drones, trains, trucks, etc.), intelligent manufacturing equipment (such as robots, industrial equipment, intelligent logistics, intelligent factories, etc.), intelligent terminal (Mobile phones, computers, tablets, PDAs, desktops, headsets, audio, wearable devices, car devices, etc.).

In a ninth aspect, an embodiment of the present application provides a vehicle, which can be used to implement the method described in the first aspect and any possible implementation manner of the first aspect.

In a tenth aspect, the embodiment of the present application provides a server, which can be used to implement the method described in the first aspect and any possible implementation manner of the first aspect.

On the basis of the implementations provided by the foregoing aspects, the embodiments of the present application may be further combined to provide more implementations.

The technical effects that can be achieved by any possible implementation of any one of the above-mentioned second to tenth aspects can be described with reference to the technical effects that can be achieved by any possible implementation of any of the above-mentioned first aspects, Duplication will not be discussed.

Description of drawings

Fig. 1 is the schematic diagram of a kind of point cloud data;

Fig. 2 is a schematic diagram of a group of point cloud data with deviations in detection frame positions;

Figure 3 is a schematic diagram of less point cloud data and trailing phenomenon;

FIG. 4 is a schematic diagram of an application scenario applicable to the point cloud data processing method provided by the embodiment of the present application;

5 is a schematic diagram of a point cloud data processing system provided by an embodiment of the present application;

FIG. 6 is a schematic flow chart of a point cloud data processing method provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of the first detection frame information provided by the embodiment of the present application;

FIG. 8 is a schematic diagram of the second detection frame information provided by the embodiment of the present application;

FIG. 9 is a schematic diagram of obtaining a second point cloud map provided by an embodiment of the present application;

FIG. 10 is a schematic flow diagram of a point cloud data processing method provided in an embodiment of the present application;

11 is a schematic diagram of a point cloud data processing system provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of a point cloud data processing system provided by an embodiment of the present application.

Detailed ways

In order to facilitate the understanding of the embodiments of the present application, the terms related to the embodiments of the present application are introduced below:

1) The point cloud is the sampling point obtained after the object is detected by the measuring device. Point cloud data refers to a set of vectors in a three-dimensional coordinate system.

A collection of point clouds obtained after the measuring device detects and samples the appearance surface of an object may be referred to as a point cloud collection. The measuring device can detect and sample multiple objects at the same time, and the obtained set of point cloud data can include point cloud collections corresponding to multiple objects.

Wherein, besides the geometric position, the point cloud data may also include color information, for example. For example, point cloud data measured based on the principle of laser measurement (also referred to as laser point cloud data) may include information such as three-dimensional coordinates and laser reflection intensity (intensity). The point cloud data obtained based on the principle of photogrammetry may include information such as three-dimensional coordinates and color, wherein the color information may be color data in red, green, blue (RGB) format. Combining the principles of laser measurement and photogrammetry to obtain point cloud data can include information such as three-dimensional coordinates, laser reflection intensity, and color.

In the embodiment of the present application, the measuring device is Lidar (lightlaser detection and ranging, Lidar) as an example, and the point cloud data collected by the Lidar includes at least the three-dimensional coordinates of each point cloud.

It should be noted that, in the embodiment of the present application, one frame (or single frame) point cloud data refers to a group of point cloud data collected at one sampling moment, and the point cloud map refers to the continuous collection of multiple frames The image obtained by splicing point cloud data can be a two-dimensional image or a three-dimensional image.

2) The training set, verification set and test set are different sets of sample data used for model training in machine learning.

Wherein, the training set is a sample set composed of data samples used for model fitting.

The validation set is a set of samples reserved during the model training process, which is used to adjust the hyperparameters of the model and to conduct a preliminary evaluation of the model's capabilities. In the point cloud data processing method provided in the embodiment of the present application, the verification set can be used to adjust the proportion of point cloud data of different data types in the training set, thereby improving the model's ability to fit point cloud data of different data types .

The test set is a sample set used to evaluate the final generalization ability of the model.

In the embodiment of the present application, the sample data used for model training may be referred to as training data, and the training data may include a training set, a verification set, and a test set, and data in the training set, verification set, and test set generally do not overlap. Generally, before building the model, the sample data can be divided according to the preset ratio, for example, according to the ratio of 7:1:1 to obtain the training set, verification set and test set, and based on the training set, verification set and test set training For the target detection model (such as the first detection model and/or the second detection model mentioned below), the embodiment of the present application does not limit the specific implementation manner and training process of the model training.

3) Target detection is an important branch of image processing and computer vision disciplines, and is also the core part of an intelligent monitoring system. Target detection can perform target detection and recognition on the point cloud data, so as to determine the types of objects corresponding to the multiple point cloud sets included in the point cloud data.

For example, when the target detection is applied in the driving scene, the target detection device on the vehicle can perform target detection on the point cloud data collected by the lidar, so as to recognize the vehicles on the lane and the trees and pedestrians on the side of the road during the driving process of the vehicle. , assisting vehicles to realize route planning, obstacle avoidance and other functions, and then realize intelligent driving.

The embodiment of the present application may include two types of target detection models, denoted as a first detection model and a second detection model. Wherein, the first detection model can be obtained by performing model training on point cloud maps and static object annotation data as training data, and the second detection model can be obtained by point cloud data (including single frame point cloud data and/or several frames of point cloud data ) and dynamic object labeling data are used as training data to carry out model training, and the first detection model and the second detection model can be used to detect the point cloud map and point cloud data to be marked respectively, so as to determine the point cloud map , types of objects respectively corresponding to the multiple point cloud sets included in the point cloud data.

It can be understood that in the embodiment of the present application, the data and detection results processed by the first detection model or the second detection model can also be used as training data to further train the first detection model and the second detection model to improve The accuracy of the model is not limited in this embodiment of the present application.

4) Static objects and dynamic objects are target objects contained in point cloud data or point cloud maps. Static objects are generally point cloud collections collected by detecting static objects, and dynamic objects are generally collected by detecting dynamic objects. Collection of point clouds.

For example, in a vehicle driving scene, static objects may include but not limited to roads, trees statically placed on the side of the road, buildings, street lights, road signs, parked vehicles, etc. Dynamic objects may include but not limited to vehicles driving on the road, pedestrians , animals, etc. It can be understood that, in the embodiment of the present application, the dynamic or static distinction of an object only depends on whether the object moves relative to the reference object. For example, a vehicle is a dynamic object in a driving state, and the point cloud set obtained by detecting the vehicle is called As a dynamic object, the vehicle can be considered as a static object in the non-driving state, and the point cloud collection obtained by detecting the vehicle is called a static object.

5) The smearing phenomenon, also known as the smearing phenomenon, is a phenomenon in which the same objects are listed when splicing point cloud collections collected at different times for the same dynamic object. Among them, due to the mobility of dynamic objects, the positions of dynamic objects at different times are different. Therefore, in order to eliminate the impact of the mobility of dynamic objects, when splicing point cloud collections collected at different times for the same dynamic object, it will be The phenomenon that the same objects are listed is formed, that is, the tailing phenomenon.

The embodiments of the present application will be further described below in conjunction with the accompanying drawings.

With the wide application of object detection in driving and other fields, the object detection method based on deep learning has become the mainstream method in object detection due to its accuracy and high efficiency. In the implementation, the target detection model is built and trained based on deep learning, and the point cloud data to be detected is input into the trained target detection model, and the target detection results output by the target detection model can be obtained. For example, the target detection model can output multiple The type of each object and the location information of each object.

In the embodiment of the present application, the point cloud data may be collected by lidar, for example. For example, the laser radar transmits a detection signal to the target object (for example, the detection signal can be a laser beam), and after the laser radar receives the reflection signal reflected from the target object, the laser radar compares the detection signal with the transmitted signal and processes it to obtain The point cloud set, the point cloud set is a set of sampling points obtained after obtaining the spatial coordinates of each sampling point on the surface of the target object. Lidar can detect and sample multiple target objects at the same time to obtain a set of point cloud data. The point cloud data can include multiple point cloud sets corresponding to multiple target objects. Fig. 1 is a schematic diagram of point cloud data, wherein each black dot in the figure represents a point cloud, and each point cloud corresponds to a set of three-dimensional coordinates. A set of point cloud data includes multiple point cloud collections. For example, the point clouds in each detection frame in detection frame A, detection frame B, and detection frame C in Figure 1 constitute a point cloud collection, and each point cloud collection corresponds to a target object.

Since the target object type and the point cloud set corresponding to the target object need to be used as training samples when training the target detection model, so that the target detection model can learn the corresponding relationship between the target object type and the point cloud set, the point cloud used The accuracy of the ensemble can have a large impact on the performance of the object detection model. Usually a set of point cloud data includes multiple point cloud sets. Therefore, before training the target detection model, it is necessary to mark the detection frame on the point cloud data used for training, so as to divide the point cloud belonging to the same object in the point cloud data. for a point cloud collection. For example, the detection frame A, detection frame B, and detection frame C in Figure 1 divide the point cloud data into three point cloud sets, and each point cloud set corresponds to a target object.

When training the target detection model, the accurate position of the detection frame can ensure the accuracy of the point cloud set extracted according to the position of the detection frame. When there is a deviation in the position of the detection frame, the extracted point cloud set will not be accurate enough. For example, Figure 2 is a schematic diagram of a group of point cloud data with deviations in the detection frame positions. In Figure 2, there is a deviation in the positions of the detection frame E and the detection frame F. After dividing the point cloud data according to the detection frame E and detection frame F, If the point cloud sets corresponding to the two target objects cannot be accurately obtained, the target detection model will learn the wrong point cloud set during training, which will affect the performance of the target detection model. In view of the above problems, the current processing method is manual adjustment and verification, and the detection frame position and point cloud data after manual verification are used as the training data of the target detection model. It can be seen that the current point cloud data processing methods are inefficient and difficult to guarantee accuracy.

Based on the above problems, in order to reduce the labor cost of object detection on point cloud data and improve the detection accuracy, a three-dimensional (3Dimensions, 3D) point cloud object automatic labeling system is proposed in some designs. The system uses single frame point cloud data or Several frames of point cloud data are used as input, which are automatically marked by using the 3D detection model.

Although the above design can improve the detection efficiency and accuracy to a certain extent, when the target object is blocked or the distance is too far, due to the lack of point cloud data in the single frame point cloud data or several frames of point cloud data, it will cause False detection, missed detection, inaccurate detection frame, etc., and the offset of dynamic objects in the point cloud data collected at different times will seriously affect the system's automatic Labeling effect. As shown in Figure 3, the rectangular frame G marks the point cloud collection corresponding to the dynamic object (such as the rear of a driving vehicle). The number of point clouds included in the rectangular frame G is small, which makes it impossible to accurately identify the dynamic object. At the same time, after converting the point cloud data of different frames at different times into a unified coordinate system for splicing, the dynamic object will produce smears and cause smearing. As shown in Figure 3, the point cloud set included in the rectangular frame G is in List of different moments. Therefore, how to improve the labeling effect of the 3D point cloud object automatic labeling system is still an important problem that needs to be solved urgently.

The embodiment of the present application provides a point cloud data processing method and system, which can be used to correct the detection frames of a single frame of point cloud data or several frames of point cloud data, thereby improving the labeling effect of the 3D point cloud object automatic labeling system.

FIG. 4 is a schematic diagram of an application scenario where the point cloud data processing method provided in the embodiment of the present application is applicable.

As shown in FIG. 4 , the multi-frame point cloud data in the embodiment of the present application may be data collected by a radar (such as a lidar), and the radar may be located on a vehicle. For example, a radar may be installed on the vehicle 41 shown in FIG. The point cloud data processing method provided by the embodiment of this application. Alternatively, the vehicle 41 may send the collected point cloud data to an independently deployed point cloud data processing system, and the independently deployed point cloud data processing system executes the point cloud data processing method provided in the embodiment of the present application.

Wherein, the independently deployed point cloud data processing system may be the server 42, and the server 42 may execute the point cloud data processing method provided by the embodiment of the present application on the acquired point cloud data. Alternatively, the independently deployed point cloud data processing system may be a terminal device, such as the mobile terminal 43 shown in FIG.

Exemplarily, FIG. 5 is a schematic diagram of a point cloud data processing system provided in an embodiment of the present application. As shown in FIG. 5 , the system 500 may include an acquisition unit 510, a processing unit 520, a correction unit 530, a training unit 540 and an output unit 550.

Wherein, the obtaining unit 510 may be configured to obtain multi-frame point cloud data (for example, M frames, M is an integer greater than or equal to 2), and provide the multi-frame point cloud data to the processing unit 520 . The processing unit 520 can obtain the first detection model and/or the second detection model from the training unit 540, and perform the multi-frame point cloud data processing according to the first detection model and/or the second detection model Processing is performed to obtain a detection result, which can be used to indicate the static object and/or dynamic object marked in the single frame of point cloud data in the multi-frame point cloud data. The processing unit 520 can provide the detection result to the correction unit 530, and the correction unit 530 can correct the detection result to obtain a target detection frame, which can be used as the target detection result of the multi-frame point cloud data via the output from the output unit 550. In a possible implementation manner, the multi-frame point cloud data and the target detection frame information of the multi-frame point cloud data can be provided to the training unit 540 for the training unit to perform model training to improve the model precision.

It can be understood that the above unit modules are only the functional division of the point cloud data processing system 500 , and do not limit the functions of the point cloud data processing system 500 . In other embodiments, the point cloud data processing system 500 may also include other units, and the unit modules in the point cloud data processing system 500 may also be further divided into other naming methods, which is not limited in this embodiment of the present application. For example, the acquiring unit 510 may specifically include a first acquiring unit and a second acquiring unit, and the processing unit 520 may include a first processing unit and a second processing unit, which will not be repeated here.

The point cloud data processing method of the embodiment of the present application is introduced below.

Fig. 6 is a schematic flow chart of the point cloud data processing method provided by the embodiment of the present application, wherein the method can be realized by the point cloud data processing system 500 in Fig. 5 and its functional modules, as shown in Fig. 6 , the point cloud The data processing method may include the following steps:

S610: The point cloud data processing system acquires the first detection frame.

In the embodiment of the present application, the first detection frame is used to indicate the static object marked in the first point cloud map, and the first point cloud map can be obtained from M frames of point cloud data, where M is an integer greater than or equal to 2. Wherein, the M frames of point cloud data can be point cloud data of continuous frames, and the M frames of point cloud data can be collected by a data acquisition device and obtained by an acquisition unit of the point cloud data processing system, and the data acquisition device can for example is the radar on the vehicle 41 , as shown in FIG. 4 . In an example, for the convenience of distinction, the M frames of point cloud data may also be referred to as a point cloud sequence.

For example, the first point cloud map may be a point cloud map obtained by splicing the M frames of point cloud data. During specific implementation, the point cloud data processing system can convert all single-frame point cloud data in the M frame point cloud data to a unified coordinate system (such as the world coordinate system), and convert all single-frame point cloud data in the M frame point cloud data The point cloud collection in the frame point cloud data is spliced into a point cloud map in this coordinate system. Wherein, in the process of splicing the point cloud collection of multi-frame point cloud data into a point cloud map in the unified coordinate system, different The frame point cloud data is spliced; or, it can also be to select a reference object (such as a vehicle positioning attitude), and based on the reference object, after adjusting the point cloud collection in the M frame point cloud data in the coordinate system , sequentially splicing the point cloud sets in different frames of point cloud data according to the collection time, and the embodiment of the present application does not limit the specific implementation manner of the splicing process.

In a possible implementation manner, the first detection frame is a detection result obtained by performing target detection on the first point cloud map, and the first detection frame may be used to indicate that in the first point cloud map Annotated static object.

Exemplarily, the target detection process on the first point cloud map can be realized by the first detection model. Wherein, the first detection model can be obtained by the training unit 540 mentioned above using the first training data for training, the first training data is marked training data, and the first training data can include the third point Cloud map with static object annotation data.

It should be noted that, in the embodiment of the present application, when preparing the first training data, since there are obvious morphological differences between static objects and dynamic objects on the third point cloud map, the training unit 540 may only retain The static object labeling frame is used as a positive sample, that is, static object labeling data, and the first detection model is obtained by using the third point cloud map and the static object labeling data for training. It can be understood that, in the embodiment of the present application, the third point cloud map may be the same as or different from the first point cloud map, that is, M frames of point cloud data or the first point cloud data spliced by the M frames of point cloud data. The point cloud map can be used as data to be labeled or as training data, which is not limited in this application.

In the process of implementing S610, the point cloud data processing system may directly acquire the first point cloud map and related information used to describe the first point cloud map. Alternatively, the point cloud data processing system may also obtain the M frames of point cloud data, and splicing the M frames of point cloud data to obtain the first point cloud map. In the embodiment of the present application, the first point cloud The method of obtaining the map is not limited. Further, the point cloud data processing system can perform target detection on the first point cloud map according to the first detection model, and obtain the first detection frame. Wherein, the first detection frame may be used to label the static object in the first point cloud map, and the attribute information of the first detection frame may be used to describe the static object marked by the first detection frame.

For example, as shown in FIG. 7 , in the (x, y, z) three-dimensional space, the first detection frame may include, for example, detection frame 1, detection frame 2, and detection frame 3, and the attribute information of the first detection frame For example, it may include the position of detection frame 1/detection frame 2/detection frame 3 (such as the position of detection frame 1/detection frame 2/detection frame 3 in the first point cloud map, or the relative position with other detection frames, etc. ), size (such as length, width, radius, diameter, etc.), shape (such as cylinder, cone, cube, cuboid, irregular shape, etc.), based on the attribute information of the first detection frame, the first detection frame can be detected. The position, size, shape, etc. of the static object marked by a detection frame. It should be noted that the first detection frame and the attribute information of the first detection frame are only exemplarily described here. Actually, it needs to be determined according to the detection and identification of the point cloud set corresponding to the target object in the point cloud data. This will not be repeated here.

S620: The point cloud data processing system acquires the second detection frame.

In the embodiment of the present application, the second detection frame is used to indicate the dynamic object marked in the first point cloud data, the first point cloud data is obtained from N frames of point cloud data, N is an integer greater than or equal to 1, and N <M. The N frames of point cloud data are partial frames of point cloud data of the M frames of point cloud data, wherein, when N=1, the N frames of point cloud data are single frame point cloud data, and when N>1, the N frames of point cloud data Cloud data includes several frames of point cloud data. During specific implementation, the point cloud data processing system determines the specific value of N or M according to the configuration information or the application scenario, and the embodiment of the present application does not limit the specific implementation manner. Wherein, in the case where the part of the frame point cloud data is point cloud data of several frames, the first point cloud data can be obtained by splicing the point cloud data of these frames, and the first point cloud data only includes the first Partial information of a point cloud map can also be called a local point cloud map.

In a possible implementation manner, the second detection frame may be a detection result obtained by performing target detection on the first point cloud data, and the second detection frame may be used to indicate that the label in the first point cloud data of dynamic objects.

Exemplarily, the target detection process of the first point cloud data can be realized by the second detection model. Wherein, the second detection model can be obtained by training the aforementioned training unit 540 using the second training data, the second training data is labeled training data, and the second training data can include the second point cloud Data and dynamic objects label data.

It should be noted that, in the embodiment of the present application, when preparing the second training data, the training data may only retain the dynamic object labeling frame as a positive sample, that is, the dynamic object labeling data, and use the second point cloud data and the The dynamic object tag data is trained to obtain the second detection model. It can be understood that in the embodiment of the present application, the second point cloud data can be the same as or different from the first point cloud data, that is, the first point cloud data can be used as data to be labeled or as training data. This is not limited.

In a possible implementation, due to the moving characteristics of dynamic objects, the difficulty of implementing the labeling process for dynamic objects is greater than that for static objects, so the second detection model obtained through training may have a certain detection error , when the second detection model is used for target detection, the detection result may still include a detection frame marked with a static object. In the embodiment of the present application, in order to reduce the detection error, during the implementation of S620, the point cloud data processing system can perform target detection on the first point cloud data according to the second detection model to obtain a third detection frame, according to The first detection frame corrects the third detection frame to obtain the second detection frame. Wherein, the third detection frame is used to indicate the dynamic object marked in the first point cloud data. Through this correction process, the detection result of the static object in the first point cloud map can be fused to the first point cloud map. The false detection result of the static object in the point cloud data, so as to ensure that the obtained second detection frame is only used to indicate the dynamic object marked in the first point cloud data. Wherein, the second detection frame can be used to label the dynamic object in the first point cloud data, and the attribute information of the second detection frame can be used to describe the dynamic object marked by the second detection frame.

As shown in Figure 8, in the (x, y, z) three-dimensional space, the third detection frame may include, for example, detection frame 4, detection frame 5, and detection frame 6, and the attribute information of the third detection frame may include, for example For information such as the position, size, and shape of the detection frame 4/detection frame 5/detection frame 6, please refer to the relevant description above in conjunction with FIG. 7 for details, and will not be repeated here. Among them, the detection frame 4 and the detection frame 5 are used to indicate the dynamic object (for example, the point cloud collection collected by detecting the vehicle, etc.), the detection frame 6 (represented by a dotted line frame, and used to distinguish from the detection frame 4 and the detection frame indicating the dynamic object. The detection frame 5) is used to indicate the static object marked due to the detection error (for example, the point cloud collection collected by detecting trees, road signs, etc.). Since the detection frame 6 is also included in the first detection frame obtained by performing target detection on the first point cloud map, the point cloud data processing system corrects the third detection frame according to the first detection frame, for example, from the third detection frame A detection frame indicating a static object is removed from the detection frame, and a third detection frame remaining after processing is used as the second detection frame. As shown in FIG. 8 , after the detection frame 6 is removed, the remaining detection frame 4 and detection frame 5 are the second detection frame, and the attribute information of the detection frame 4 and the detection frame 5 is the attribute information of the second detection frame.

It can be understood that, in the embodiment of the present application, since the dynamic object is different from the static object, correspondingly, the attribute information of the first detection frame used to mark the static object is different from the attribute information of the second detection frame used to mark the dynamic object They may be the same or different, which is not limited in this embodiment of the present application.

S630: The point cloud data processing system performs fusion processing on the single frame of point cloud data in the M frames of point cloud data according to the second detection frame, to obtain a second point cloud map.

In this embodiment of the present application, the second detection frame may also be referred to as a dynamic object detection result, and the dynamic object detection result may be used to indicate different dynamic objects.

In the embodiment of the present application, due to the movable characteristics of the static objects corresponding to the dynamic objects, in the M frames of point cloud data, the single frame of point cloud data may only include locally collected point cloud sets for the same dynamic object Rather than the point cloud collection collected as a whole for the dynamic object, this will result in too few points in a single frame of point cloud data or points in several frames of point cloud data (called sparse point cloud), and at the same time, for different frames The splicing process of point cloud data may also cause smearing caused by dynamic object splicing.

In view of these problems, in S630, on the one hand, the point cloud data processing system can associate the second detection frames corresponding to the same dynamic object according to the attribute information of the second detection frame, and point cloud data of different frames for The point cloud collection collected by the same dynamic object is spliced at the object level to obtain the dynamic object association result (called dense point cloud). On the other hand, the point cloud data processing system can remove the dynamic object in the single frame point cloud data in the M frame point cloud data according to the second detection frame, and obtain the third point cloud data corresponding to the single frame point cloud data , the third point cloud data only includes static objects. Furthermore, the point cloud data processing system can use the dynamic object association result to perform fusion processing on the third point cloud data corresponding to M frames of point cloud data to obtain a second point cloud map, which includes the dynamic object and/or static objects.

As shown in Figure 9, with the moving vehicle 41 as a dynamic object, the a-frame point cloud data collected at time t1 includes the point cloud collection collected by the head of the detection vehicle 41, and the b-frame point cloud data collected at time t2 Including the collection of point clouds collected by the body of the detection vehicle 41, the b-frame point cloud data collected at time t3 includes the collection of point clouds collected by the rear of the detection vehicle 41, respectively represented by detection frame A, detection frame B, detection frame Marked by C, it is a sparse point cloud at this time. When performing object-level association, the point cloud data processing system can recognize that the detection frame A in the point cloud data of frame a, the detection frame B in the point cloud data of frame b, and the detection frame C in the point cloud data of frame c all correspond to the corresponding For the vehicle 41 described above, the point cloud sets respectively marked by the detection frame A, detection frame B, and detection frame C are associated with the same dynamic object. When performing object-level splicing, the point cloud data processing system can splice point cloud sets corresponding to the same dynamic object in different frames of point cloud data based on the granularity of the dynamic object and according to the point cloud data collection time, to obtain the dynamic The relatively dense point cloud collection of the object, marked by the association box D shown in Figure 9, is a dense point cloud at this time. At the same time, the object-level splicing process can align the same dynamic objects in different frames of point cloud data based on the point acquisition time, so as to eliminate the smearing phenomenon caused by dynamic objects. Furthermore, after the point cloud data processing system performs fusion processing on the single frame point cloud data in the multi-frame point cloud data based on the dynamic object association result, the same dynamic object can be presented in the second point cloud map A relatively dense collection of point clouds.

S640: The point cloud data processing system corrects the first detection frame and the second detection frame according to the second point cloud map to obtain a target detection frame, and the target detection frame is used to indicate that in the single frame Static objects and/or dynamic objects annotated in point cloud data.

Wherein, since the first detection frame is used to indicate the static object, and the second detection frame is used to indicate the dynamic object, and the attributes of the static object and the dynamic object may be the same or may be different, during the correction process in S640, the point cloud The data processing system may correct different attributes of different objects according to the second point cloud map, so as to obtain target detection frames of all single frames of point cloud data.

For example, the point cloud data processing system may, in the second point cloud map, correct the first attribute (such as size, position, shape, etc.) The second attribute (such as size, shape, etc.) of the second detection frame of the object is corrected to obtain a fourth detection frame, and the fourth detection frame is used to indicate the dynamic object marked in the second point cloud map and/or static object; in the single-frame point cloud data corresponding to the second point cloud map, modify the third attribute (such as position, etc.) of the fourth detection frame of the marked dynamic object to obtain the fifth detection frame; to mark the static The fourth detection frame of the object and the fifth detection frame are used as target detection frames of the M frames of point cloud data.

Wherein, the target detection frame is used to mark static objects and/or dynamic objects in a single frame of point cloud data, and the attribute information of the target detection frame is used to describe the static objects and/or dynamic objects marked by the target detection frame. Wherein, the above-mentioned correction to the size may include, for example, the increase and decrease of the size of the inspection frame, and the correction to the position may, for example, include correction to the absolute position and/or relative position of the detection frame (including the detection frame in any direction). The correction of the shape may include, for example, the correction of the shape of the monitoring frame, for example, from a cube to a cuboid, from a cuboid to a cube, from a cone to a cylinder, and so on.

It can be understood that, in S640, the point cloud data processing system may automatically implement the above correction process, or in S640, the above correction process may be implemented manually.

For example, when manual correction is adopted, the point cloud data processing system may output the second point cloud map and the first detection frame and/or Marking the second detection frame of the dynamic object, the labeler can view the second point cloud map, the first detection frame and the second detection frame through the user interface. The annotator can correct the first attributes such as the size, position, and shape of the first detection frame in the second point cloud map, and correct the second attributes such as the size and shape of the second detection frame to obtain the fourth detection frame. box information. Further, the annotator corrects the third attributes such as the position of the fourth detection frame of the marked dynamic object in each single frame of point cloud data corresponding to the second point cloud map to obtain the fifth detection frame.

Thus, using the second point cloud map, the fourth detection frame and the fifth detection frame of the static object are used as the target detection frame (ie, the labeling result) of the single frame point cloud data in the M frame point cloud data, Not only can more point cloud information of each object be obtained in a single frame of point cloud data, but also the ability of the point cloud data processing system to automatically label point cloud collections corresponding to distant objects or severely occluded objects in a single frame of point cloud data can be improved. It can also reduce the smearing phenomenon caused by dynamic objects by aligning dynamic objects in multi-frame point cloud data to the relevant features of static objects. At the same time, based on the auxiliary labeling of point cloud maps containing dynamic objects and static objects, convenient auxiliary corrections can be realized, which facilitates the labeling of accurate object attributes at a lower labor cost.

In addition, in some implementations, after obtaining the target detection frame, the point cloud data processing system can also use multi-model fusion and point cloud map to generate a softening frame to provide a higher-quality detection frame and a softening frame that is easy to train. It helps to improve the training accuracy of models using automatic labeling systems for partial frame point cloud data.

During specific implementation, after S640, the method may also include the following steps: using the single-frame point cloud data in the M frames of point cloud data and the target detection frame as the third training data, based on the third training In the process of data training multiple detection models, the sixth detection frame is determined, and the detection result of the sixth detection frame is an error; the second point cloud map and the target detection frame are used as the fourth training data, based on the Determining a seventh detection frame during the process of training multiple detection models with the fourth training data; correcting the seventh detection frame based on the sixth detection frame. For example, for the same object, the sixth detection frame with an incorrect detection result is used instead of the seventh detection frame with a correct detection result, and the sixth detection frame with a misaligned detection result and the seventh detection frame with an incorrect detection result are eliminated. Wherein, the detection frame obtained after correction (for example, denoted as the eighth detection frame) includes a softening frame for training and attribute information of the softening frame, and the softening frame and its attribute information can also replace the above-mentioned single frame point The target detection frame of the cloud data is used as training data to train the target detection model, thereby further improving the training accuracy of the single-frame point cloud data detection model.

For ease of understanding, the point cloud data processing method will be introduced below by taking a vehicle driving scene as an example in conjunction with the method flow chart shown in FIG. 10 .

Referring to Figure 10, the method may include the following steps:

S1010: The acquisition unit of the point cloud data processing system acquires a point cloud sequence. Wherein, the point cloud sequence can be collected by the radar on the vehicle, for example, and the point cloud sequence includes continuous M frames of point cloud data (M is an integer greater than or equal to 2, and the value of M depends on the point cloud data processing system processing capabilities, configuration information or application scenarios, etc.).

When implementing S1010, the acquisition unit of the point cloud data processing system may acquire the point cloud sequence from the vehicle, it may be that the vehicle actively reports the point cloud sequence, or the vehicle responds to the point cloud data The request of the processing system feeds back the sequence of point clouds. Alternatively, the point cloud data processing system may include a storage unit, the M frames of point cloud data may be stored in the storage unit, and the acquisition unit may read the point cloud sequence from the storage unit. This example does not limit the acquisition method of the point cloud sequence.

S1021: The processing unit of the point cloud data processing system transforms the single frame point cloud data in the M frame point cloud data into the world coordinate system according to the positioning attitude of the vehicle, and stitches the M frame point cloud data in the world coordinate system , get the first point cloud map.

S1022: The processing unit of the point cloud data processing system performs object detection on the first point cloud map according to the first detection model to obtain a first detection frame. For the detailed implementation of S1021-S1022, refer to the relevant introduction in conjunction with S610 above, and will not be repeated here.

S1031 (optional): The processing unit of the point cloud data processing system splices the N frames of point cloud data in the M frames of point cloud data to obtain the first point cloud data. It can be understood that S1031 is only executed when several frames of point cloud data (that is, N≥2) in M frames of point cloud data are combined as the first point cloud data, if the M frames of point cloud data are If a single frame of point cloud data is used as the first point cloud data, S1031 does not need to be executed.

S1032: The processing unit of the point cloud data processing system performs object detection on the first point cloud data according to the second detection model to obtain a third detection frame. Wherein, the third detection frame is used to label dynamic objects in the first point cloud data. It should be understood that within the error range of the second detection model, the third detection frame may also be used to label static objects in the first point cloud data.

S1033: The processing unit of the point cloud data processing system corrects the third detection frame according to the first detection frame to obtain the second detection frame, and the second detection frame is used to label the first point cloud Dynamic objects in data. For the detailed implementation of S1031-S1033, refer to the relevant introduction in connection with S620 above, and will not be repeated here.

S1041: The processing unit of the point cloud data processing system removes the dynamic object in the single frame point cloud data in the M frame point cloud data according to the second detection frame obtained in S1033, and obtains the third point corresponding to the single frame point cloud data cloud data.

S1042: The processing unit of the point cloud data processing system performs splicing processing according to the third point cloud data corresponding to the M frames of point cloud data, to generate a fourth point cloud map.

S1043: The processing unit of the point cloud data processing system uses the second detection frame obtained in S1033 to correspond to the same The second detection frame of the dynamic object is associated.

S1044: The processing unit of the point cloud data processing system performs object-level stitching on the point cloud set marked in the second detection frame corresponding to the same dynamic object obtained in S1043 according to the M frame point cloud data, that is, in different frames of point cloud data The point cloud sets associated with the same dynamic object are aligned and integrated into a dynamic object-level point cloud set at the same time to obtain the dynamic object association result.

S1045: The processing unit of the point cloud data processing system performs fusion processing on the fourth point cloud map obtained in S1042 according to the dynamic object association result obtained in S1044, to obtain a second point cloud map.

For the detailed implementation of S1041-S1045, refer to the relevant introduction in conjunction with S630 above, and will not be repeated here.

S1051: The correction unit of the point cloud data processing system obtains the second point cloud map in S1045 through the user interface output, and the labeler can correct the size, shape, position, etc. of the first detection frame of the static object in the second point cloud map The first attribute, and the second attribute such as the size and shape of the second detection frame marking the dynamic object are used to obtain the fourth detection frame.

S1052: The correction unit of the point cloud data processing system provides the static object and/or dynamic object marked in the second point cloud map by the fourth detection frame obtained after attribute correction to the single frame point corresponding to the second point cloud map The cloud data is output on the user interface, and the annotator can correct the third attribute such as the position of the fourth detection frame of the annotated dynamic object in the single frame point cloud data. Thus, the labeling result of the single frame point cloud data is obtained.

For the detailed implementation of S1051-S1052, refer to the relevant introduction in conjunction with S640 above, and will not be repeated here.

S1061: The training unit of the point cloud data processing system uses the single frame of point cloud data in the M frames of point cloud data and the target detection frame as the third training data, and trains multiple detection models based on the third training data In the process of determining the sixth detection frame, the detection result of the sixth detection frame information is wrong.

S1062: The training unit of the point cloud data processing system uses the second point cloud map and the target detection frame as fourth training data, and determines a seventh detection during the process of training multiple detection models based on the fourth training data frame.

S1063: The correction unit of the point cloud data processing system corrects the seventh detection frame based on the sixth detection frame.

For the detailed implementation of S1061-S1063, please refer to the related introduction of softening technology above, and will not repeat them here.

The present application also provides a point cloud data processing system. FIG. 11 is a schematic structural diagram of a point cloud data processing system 1100 provided in an embodiment of the present application. The point cloud data processing system 1100 can be applied to the system shown in FIG. 4 A server or terminal device in an application scenario. Referring to FIG. 11 , the point cloud data processing system 1100 may include a first acquisition unit 1101 , a second acquisition unit 1102 , a processing unit 1103 and a correction unit 1104 .

Wherein, for example, the first acquisition unit 1101 is configured to acquire a first detection frame, the first detection frame is used to indicate a static object marked in the first point cloud map, and the first point cloud map consists of M frames The point cloud data is obtained, M is an integer greater than or equal to 2; the second acquisition unit 1102 is used to acquire second detection frame information, and the second detection frame is used to indicate the dynamic object marked in the first point cloud data, The first point cloud data is obtained from N frames of point cloud data, N is an integer greater than or equal to 1, and N<M; the processing unit 1103 is configured to process the M frames of point cloud data according to the second detection frame Fusion processing is performed on the single-frame point cloud data in to obtain a second point cloud map; the correction unit 1104 is configured to correct the first detection frame and the second detection frame according to the second point cloud map to obtain A target detection frame, where the target detection frame is used to indicate annotated static objects and/or dynamic objects marked in the single frame point cloud data.

In a possible implementation manner, the first acquisition unit 1101 is configured to: stitch the M frames of point cloud data to obtain the first point cloud map; Object detection is performed on the cloud map to obtain the first detection frame. The first detection model is obtained through training using first training data, and the first training data includes the third point cloud map and static object labeling data.

In a possible implementation manner, the second acquisition unit 1102 is configured to: perform object detection on the first point cloud data according to a second detection model to obtain a third detection frame, wherein the second detection model Obtained by using the second training data for training, the second training data includes the second point cloud data and dynamic object labeling data, and the third detection frame is used to indicate the dynamic object marked in the first point cloud data; Correcting the third detection frame according to the first detection frame to obtain the second detection frame.

In a possible implementation manner, the processing unit 1103 is configured to: according to the second detection frame, remove the dynamic object in the single frame of point cloud data in the M frames of point cloud data to obtain the single frame of point cloud data The corresponding third point cloud data; according to the attribute information of the second detection frame, associate the second detection frame corresponding to the same dynamic object to obtain a dynamic object association result; based on the dynamic object association result, the M frame The third point cloud data corresponding to the point cloud data is fused to obtain the second point cloud map.

In a possible implementation manner, the correcting unit 1104 is configured to: in the second point cloud map, correct the first attribute of the first detection frame marking the static object, and/or correct the dynamic The second attribute of the second detection frame of the object is corrected to obtain the fourth detection frame; in the single frame point cloud data corresponding to the second point cloud map, the third attribute of the fourth detection frame marked with the dynamic object is modified Correction to obtain the fifth detection frame; the fourth detection frame marked with the static object and the fifth detection frame are used as the target detection frame of the M frames of point cloud data.

In a possible implementation manner, the system further includes a training unit. After obtaining the target detection frame, the training unit is configured to perform the following steps: convert the single frame point cloud in the M frames of point cloud data to Data and the target detection frame are used as the third training data, and a sixth detection frame is determined during the process of training multiple detection models based on the third training data, and the detection result of the sixth detection frame is an error; the The second point cloud map and the target detection frame are used as the fourth training data, and the seventh detection frame is determined in the process of training a plurality of detection models based on the fourth training data; the seventh detection frame is corrected based on the sixth detection frame. Seven detection boxes.

The present application also provides a point cloud data processing device 1200. FIG. 12 is a schematic structural diagram of a point cloud data processing device 1200 provided in an embodiment of the present application. The data processing device 1200 can be applied to the scene shown in FIG. 4 server or terminal device in the Referring to FIG. 12 , the point cloud data processing device 1200 includes: a processor 1201 , a memory 1202 and a bus 1203 . Wherein, the processor 1201 and the memory 1202 communicate through the bus 1203 , or communicate through other means such as wireless transmission. The memory 1202 is used to store instructions, and the processor 1201 is used to execute the instructions stored in the memory 1202 . The memory 1202 stores program codes, and the processor 1201 can call the program codes stored in the memory 1202 .

In an optional embodiment of the present application, when the data processing device 1200 is a point cloud data processing device, the processor 1201 is used to execute the above-mentioned method embodiment. For details, please refer to the relevant description above, which will not be repeated here. repeat.

It can be understood that the memory 1202 in FIG. 12 of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. Among them, the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electronically programmable Erase Programmable Read-Only Memory (Electrically EPROM, EEPROM) or Flash. The volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (Static RAM, SRAM), Dynamic Random Access Memory (Dynamic RAM, DRAM), Synchronous Dynamic Random Access Memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous connection dynamic random access memory (Synchlink DRAM, SLDRAM ) and Direct Memory Bus Random Access Memory (Direct Rambus RAM, DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but not be limited to, these and any other suitable types of memory.

Based on the above embodiments, the embodiments of the present application further provide a computer program, which, when the computer program is run on a computer, causes the computer to execute the above method embodiments.

Based on the above embodiments, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a computer, the computer executes the above-mentioned method embodiment. Wherein, the storage medium may be any available medium that can be accessed by a computer. By way of example but not limitation: computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or may be used to carry or store information in the form of instructions or data structures desired program code and any other medium that can be accessed by a computer.

Based on the above embodiments, an embodiment of the present application further provides a chip for reading a computer program stored in a memory to implement the above method embodiments.

Based on the above embodiments, an embodiment of the present application provides a chip system, where the chip system includes a processor, configured to support a computer device to implement the above method embodiments. In a possible design, the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device. The system-on-a-chip may consist of chips, or may include chips and other discrete devices.

Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the present application. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and a combination of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a An apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instructions The device realizes the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, thereby The instructions provide steps for implementing the functions specified in the flow chart or blocks of the flowchart and/or the block or blocks of the block diagrams.

Apparently, those skilled in the art can make various changes and modifications to this application without departing from the protection scope of this application. In this way, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these modifications and variations.

Claims

A method for processing point cloud data, comprising:

Obtain a first detection frame, the first detection frame is used to indicate the static object marked in the first point cloud map, the first point cloud map is obtained from M frames of point cloud data, and M is an integer greater than or equal to 2 ;

Obtain a second detection frame, the second detection frame is used to indicate the dynamic object marked in the first point cloud data, the first point cloud data is obtained from N frames of point cloud data, N is an integer greater than or equal to 1 , N<M;

According to the second detection frame, the single-frame point cloud data in the M frames of point cloud data is fused to obtain a second point cloud map;

Correct the first detection frame and the second detection frame according to the second point cloud map to obtain a target detection frame, and the target detection frame is used to indicate the static marked in the single frame point cloud data object and/or dynamic object.
The method according to claim 1, wherein said obtaining the first detection frame comprises:

Splicing the M frames of point cloud data to obtain the first point cloud map;

Perform target detection on the first point cloud map according to the first detection model to obtain the first detection frame, wherein the first detection model is obtained by training using first training data, and the first training data includes the first Three point cloud maps with static object annotation data.
The method according to claim 1 or 2, wherein said obtaining the second detection frame comprises:

Target detection is performed on the first point cloud data according to the second detection model to obtain a third detection frame, wherein the second detection model is obtained by training using the second training data, and the second training data includes the second point Cloud data and dynamic object labeling data, the third detection frame is used to indicate the dynamic object marked in the first point cloud data;

Correcting the third detection frame according to the first detection frame to obtain the second detection frame.
The method according to any one of claims 1-3, wherein, according to the second detection frame, the single frame point cloud data in the M frames of point cloud data is fused to obtain the first Two point cloud maps, including:

According to the second detection frame, remove the dynamic object in the single-frame point cloud data in the M-frame point cloud data, and obtain the third point cloud data corresponding to the single-frame point cloud data;

According to the attribute information of the second detection frame, associating the second detection frame corresponding to the same dynamic object to obtain a dynamic object association result;

Fusion processing is performed on the third point cloud data corresponding to the M frames of point cloud data based on the dynamic object association result to obtain the two point cloud map.
The method according to any one of claims 1-4, wherein the first detection frame and the second detection frame are corrected according to the second point cloud map to obtain a target detection frame ,include:

In the second point cloud map, the first attribute of the first detection frame marked with the static object is corrected, and/or the second attribute of the second detection frame marked with the dynamic object is corrected to obtain the fourth detected frame, the fourth detection frame is used to indicate the dynamic object and/or static object marked in the second point cloud map;

In the single-frame point cloud data corresponding to the second point cloud map, the third attribute of the fourth detection frame marked with the dynamic object is corrected to obtain the fifth detection frame;

The fourth detection frame and the fifth detection frame marked with a static object are used as target detection frames of the M frames of point cloud data.
The method according to any one of claims 1-5, wherein the method further comprises:

Using the single-frame point cloud data in the M frames of point cloud data and the target detection frame as the third training data, the sixth detection frame is determined in the process of training multiple detection models based on the third training data, so The detection result of the sixth detection frame is wrong;

Using the second point cloud map and the target detection frame as fourth training data, determining a seventh detection frame during the process of training multiple detection models based on the fourth training data;

Correcting the seventh detection frame based on the sixth detection frame.
A point cloud data processing system, characterized in that it comprises:

The first acquisition unit is configured to acquire a first detection frame, the first detection frame is used to indicate the static object marked in the first point cloud map, the first point cloud map is obtained from M frames of point cloud data, and M is an integer greater than or equal to 2;

The second acquisition unit is configured to acquire a second detection frame, the second detection frame is used to indicate the dynamic object marked in the first point cloud data, the first point cloud data is obtained from N frames of point cloud data, N is an integer greater than or equal to 1, N<M;

A processing unit, configured to perform fusion processing on the single frame of point cloud data in the M frames of point cloud data according to the second detection frame, to obtain a second point cloud map;

A correction unit, configured to correct the first detection frame and the second detection frame according to the second point cloud map to obtain a target detection frame, and the target detection frame is used to indicate that the single-frame point cloud Annotated static objects and/or dynamic objects labeled in the data.
The system according to claim 7, wherein the first acquisition unit is used for:

Splicing the M point cloud data to obtain the first point cloud map;

Perform target detection on the first point cloud map according to the first detection model to obtain the first detection frame, wherein the first detection model is obtained by training using first training data, and the first training data includes the first Three point cloud maps with static object annotation data.
The system according to claim 7 or 8, wherein the second acquisition unit is used for:

Target detection is performed on the first point cloud data according to the second detection model to obtain a third detection frame, wherein the second detection model is obtained by training using the second training data, and the second training data includes the second point Cloud data and dynamic object labeling data, the third detection frame is used to indicate the dynamic object marked in the first point cloud data;

Correcting the information of the third detection frame according to the first detection frame to obtain the second detection frame.
The system according to any one of claims 7-9, wherein the processing unit is used for:

According to the second detection frame, remove the dynamic object in the single-frame point cloud data in the M-frame point cloud data, and obtain the third point cloud data corresponding to the single-frame point cloud data;

According to the attribute information of the second detection frame, associating the second detection frame corresponding to the same dynamic object to obtain a dynamic object association result;

Fusion processing is performed on the third point cloud data corresponding to the M frames of point cloud data based on the dynamic object association result to obtain a second point cloud map.
The system according to any one of claims 7-10, wherein the correction unit is used for:

In the second point cloud map, the first attribute of the first detection frame marked with the static object is corrected, and/or the second attribute of the second detection frame marked with the dynamic object is corrected to obtain the fourth detected frame;

In the single-frame point cloud data corresponding to the second point cloud map, the third attribute of the fourth detection frame marked with the dynamic object is corrected to obtain the fifth detection frame;

The fourth detection frame and the fifth detection frame of the marked static object are used as the target detection frame of the M frames of point cloud data.
The system according to any one of claims 7-11, wherein the system further comprises a training unit, the training unit is used for:

Using the single-frame point cloud data in the M frames of point cloud data and the target detection frame as the third training data, the sixth detection frame is determined in the process of training multiple detection models based on the third training data, so The detection result of the sixth detection frame is wrong;

Using the second point cloud map and the target detection frame information as fourth training data, determining a seventh detection frame during the process of training multiple detection models based on the fourth training data;

The correction unit is further configured to: correct the seventh detection frame based on the sixth detection frame.
A point cloud data processing system, characterized in that it includes a memory and a processor,

The memory is used to store programs;

The processor is configured to execute the program stored in the memory, so that the device implements the method according to any one of claims 1-6.
A computer-readable storage medium, characterized in that the computer-readable medium stores program codes, and when the program codes run on the computer, the computer executes the computer program according to any one of claims 1 to 6. method.
A computer program product, characterized in that, when the computer program product is run on a computer, the computer is made to execute the method according to any one of claims 1 to 6.