CN113569877B - Point cloud data processing method and device and electronic equipment - Google Patents


Info

Publication number
CN113569877B
Authority
CN
China
Prior art keywords
convolution
feature
view
range
deconvolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111125622.3A
Other languages
Chinese (zh)
Other versions
CN113569877A (en)
Inventor
杨林
韩志华
郭立群
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Zhitu Technology Co Ltd
Original Assignee
Suzhou Zhitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Zhitu Technology Co Ltd filed Critical Suzhou Zhitu Technology Co Ltd
Priority to CN202111125622.3A
Publication of CN113569877A
Application granted
Publication of CN113569877B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a point cloud data processing method, a point cloud data processing device and electronic equipment. The method includes: acquiring an initial point cloud data set and a front view corresponding to the initial point cloud data set, and performing a convolution operation on the front view to obtain a convolution feature map corresponding to a target object, wherein each feature point in the convolution feature map is determined as follows: a first feature point set corresponding to a convolution range is determined according to the depth of each front-view feature point within the convolution range, and feature aggregation is performed on the first feature point set to obtain the feature point corresponding to the convolution range. Because the depth of each front-view feature point is taken into account when extracting the features corresponding to a convolution range, the feature map of the target object determined on the basis of this depth information avoids mixing in features of other objects at the junctions between objects, which reduces the possibility of erroneous feature extraction, improves the accuracy of the feature map, and enhances the robustness of the features.

Description

Point cloud data processing method and device and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a point cloud data processing method and device and electronic equipment.
Background
With the rapid development of 3D sensor technologies (laser radar, RGB-D cameras, multi-view cameras, etc.) and their gradually decreasing cost, more and more autonomous driving devices use 3D sensors as indispensable perception sensors. Compared with traditional 2D sensors (such as color cameras), 3D sensors can acquire rich geometric position information about the traffic environment and improve the perception performance of unmanned vehicles, so that both the safety and the efficiency of automatic driving equipment are guaranteed. Performing target detection, tracking, segmentation and so on in the traffic environment, on the basis of three-dimensional point cloud data acquired from a laser radar and machine learning (neural network) methods, is therefore an important component of automatic-driving environment perception.
Due to characteristics of point cloud data such as sparsity, disorder and rotation invariance, the traditional 2D convolution cannot be applied directly to point cloud data when learning with a neural network. In the prior art, the point cloud data are therefore converted into a two-dimensional front-view point cloud, and features are extracted from the front-view point cloud directly with a conventional 2D convolution. However, target objects may appear at different apparent sizes on the front view and may occlude one another. Compared with rasterization or the original point cloud data in 3D space, the mainstream 2D convolution feature extraction does not consider the geometric size of the target object or the relative positions between objects, and extracts features over the whole feature region (the front-view point cloud), so that the features of targets with different positions and sizes are mixed at their junctions, which reduces the robustness and accuracy of the features.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for processing point cloud data, and an electronic device, so as to improve robustness and accuracy of features extracted for a target object.
In a first aspect, an embodiment of the present invention provides a point cloud data processing method, where the method includes: acquiring an initial point cloud data set and a front view corresponding to the initial point cloud data set; the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front-view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set; and performing a convolution operation on the front view to obtain a convolution feature map corresponding to the target object, where each feature point in the convolution feature map is determined as follows: a first feature point set corresponding to a convolution range is determined according to the depth of each front-view feature point within the convolution range, and feature aggregation is performed on the first feature point set to obtain the feature point corresponding to the convolution range.
Further, the step of determining the first feature point set corresponding to the convolution range according to the depth of each front-view feature point includes: judging whether the depth difference between a front-view feature point and the front-view feature point at the center of the convolution range is smaller than a first depth difference threshold, and if so, determining that front-view feature point as a feature point participating in the convolution; and collecting all the feature points participating in the convolution within the convolution range to generate the first feature point set corresponding to the convolution range.
Further, the method further comprises: carrying out deconvolution operation on the convolution characteristic graph to obtain a deconvolution characteristic graph; the deconvolution operation corresponds to a deconvolution range that is the same as the convolution range.
Further, the step of performing deconvolution operation on the convolution feature map to obtain a deconvolution feature map includes: expanding the convolution characteristic diagram to obtain an expanded characteristic diagram; the size of the extended feature map is larger than that of the convolution feature map; acquiring a reference characteristic diagram, wherein the reference characteristic diagram is a characteristic diagram which is generated in the process of obtaining a convolution characteristic diagram by convolving the front view and has the same size as the deconvolution characteristic diagram; performing convolution operation on the extended characteristic diagram according to the reference characteristic diagram to obtain a deconvolution characteristic diagram; wherein each feature point in the deconvolution feature map is determined by the following method: determining the depth difference of the depth of each feature point in the expanded feature map in the deconvolution range and the depth of the feature point at the corresponding position in the reference feature map; if the depth difference is smaller than a second depth difference threshold value, determining the feature point in the deconvolution range as a feature point participating in convolution; counting all the characteristic points participating in convolution in the deconvolution range to generate a second characteristic point set; and performing feature aggregation on the second feature point set to obtain feature points corresponding to the deconvolution range.
Further, the first depth difference threshold value and/or the second depth difference threshold value are/is determined by a depth difference prediction neural network trained in advance.
Further, the method further comprises: judging whether the number of times the convolution operation has been executed reaches a convolution count threshold, and if not, continuing to execute the convolution operation on the feature map until the number of executions reaches the convolution count threshold.
Further, the method further comprises: judging whether the number of times the deconvolution operation has been executed reaches a deconvolution count threshold, and if not, continuing to execute the deconvolution operation on the deconvolution feature map until the number of executions reaches the deconvolution count threshold.
Further, the front view is determined by projecting the initial point cloud data set.
Further, the method further comprises: executing a processing task corresponding to the target object according to the feature diagram of the target object; wherein the processing task comprises at least one of: the method comprises a target object detection task, a target object segmentation task and a target object tracking task.
In a second aspect, an embodiment of the present invention further provides a point cloud data processing apparatus, where the apparatus includes: a data acquisition module, configured to acquire an initial point cloud data set and a front view corresponding to the initial point cloud data set; the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front-view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set; and a convolution module, configured to perform a convolution operation on the front view to obtain a convolution feature map corresponding to the target object, where each feature point in the convolution feature map is determined as follows: a first feature point set corresponding to a convolution range is determined according to the depth of each front-view feature point within the convolution range, and feature aggregation is performed on the first feature point set to obtain the feature point corresponding to the convolution range.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the point cloud data processing method according to the first aspect.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium, where computer-executable instructions are stored, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the point cloud data processing method of the first aspect.
According to the point cloud data processing method and apparatus and the electronic device provided by the present invention, the initial point cloud data set and the front view corresponding to the initial point cloud data set are acquired, and a convolution operation is performed on the front view to obtain the convolution feature map of the target object, where each feature point in the convolution feature map is determined as follows: a first feature point set corresponding to a convolution range is determined according to the depth of each front-view feature point within the convolution range, and feature aggregation is performed on the first feature point set to obtain the feature point corresponding to the convolution range. In the process of extracting the features corresponding to each convolution range, the depth of each front-view feature point is taken into account, and the feature points within each convolution range are screened on the basis of depth. Because different objects have different depth information in the front view, they can be distinguished by this depth information, so the feature map of the target object determined on the basis of the depth information avoids mixing in the features of other objects at the junctions between objects, which reduces the possibility of erroneous feature extraction, improves the accuracy of the feature map, and enhances the robustness of the features.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic system according to an embodiment of the present invention;
fig. 2 is a flowchart of a point cloud data processing method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for processing point cloud data according to an embodiment of the present invention;
fig. 4a to fig. 4e are schematic flow diagrams of a method for processing point cloud data in an actual application scenario according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a point cloud data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another point cloud data processing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Current point cloud data processing methods convert the acquired 3D point cloud data into a two-dimensional front view and directly apply a 2D convolution to the front view to obtain a feature map of a target object. Depth information carried by each feature point in the front view is not considered in this process, so at positions where different objects overlap, the features of the target object are easily extracted from the feature points of several objects at once, which affects the accuracy of the feature extraction. Accordingly, embodiments of the present invention provide a point cloud data processing method and apparatus and an electronic device to alleviate the above technical problems.
Referring to fig. 1, a schematic diagram of an electronic system 100 is shown. The electronic system can be used for realizing the point cloud data processing method and device provided by the embodiment of the invention.
As shown in FIG. 1, an electronic system 100 includes one or more processing devices 102, one or more memory devices 104, an input device 106, an output device 108, and one or more image capture devices 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic system 100 shown in fig. 1 are exemplary only, and not limiting, and that the electronic system may have other components and structures as desired.
The processing device 102 may be a server, a smart terminal, or a device containing a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, may process data for other components in the electronic system 100, and may control other components in the electronic system 100 to perform point cloud data processing functions.
Storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (or the like). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by processing device 102 to implement the client functionality (implemented by the processing device) of the embodiments of the invention described below and/or other desired functionality. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
Image capture device 110 may acquire the image to be processed and store the image to be processed in storage 104 for use by other components.
For example, the devices used for implementing the point cloud data processing method, apparatus and electronic device according to the embodiments of the present invention may be integrally disposed, or may be dispersedly disposed, such as integrally disposing the processing device 102, the storage device 104, the input device 106 and the output device 108, and disposing the image capturing device 110 at a designated position where an image can be captured. When the above-described devices in the electronic system are integrally provided, the electronic system may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer, a vehicle-mounted terminal, and the like.
In a possible implementation manner, an embodiment of the present invention provides a method for processing point cloud data, and in particular, the method for processing point cloud data provided by the embodiment of the present invention may be applied to a vehicle controller, or a background server in communication connection with the vehicle controller, and may be used to process point cloud data acquired by a 3D sensor on a vehicle, so as to perform target detection, tracking, segmentation, and the like on a traffic environment, thereby achieving an automatic driving purpose of the vehicle.
Specifically, fig. 2 is a flowchart of a point cloud data processing method according to an embodiment of the present invention, and referring to fig. 2, the method includes:
S202: acquiring an initial point cloud data set and a front view corresponding to the initial point cloud data set; the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front-view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set;
the initial point cloud data is point cloud data generated from information obtained by a 3D sensor, such as a laser radar sensor, an RGB-D camera sensor, a multi-view camera, and the like.
The front view is determined by projecting an initial point cloud data set. Specifically, perspective projection may be performed on the initial point cloud data according to a polar coordinate relationship to form a front view point cloud.
The formed front view can be divided into a lossless form and a lossy form according to the attributes of the initial point cloud data. The front view in a lossless form infers the real emission angle and the specific arrangement condition of each point under the condition that specific parameters (provided by a laser radar manufacturer) of the laser radar emission point cloud are known, so as to infer the projection resolution of the front view, and each point in the same frame can accurately correspond to a specific front view grid to form the lossless front view point cloud.
The front view in a lossy form is a front view size and an angular resolution which are artificially set under the condition that specific parameters of the laser radar emission point cloud are unknown, namely, the projection resolution of the front view cannot be deduced. And reversely pushing out the corresponding polar coordinates from the point cloud under the Cartesian coordinate system, and projecting to a front view according to the polar coordinates.
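For illustration only (this sketch is not part of the patent text), a minimal Python example of such a lossy projection is given below: Cartesian lidar points are converted to azimuth and elevation angles and binned into a front-view grid. The 64 × 512 grid size, the vertical field of view and the single depth channel are assumptions chosen for readability.

```python
import numpy as np

def project_to_front_view(points, h=64, w=512, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) Cartesian point cloud onto an h x w front view.

    Each cell stores the range (depth) of the point that falls into it;
    the grid size and the vertical field of view are illustrative assumptions.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)          # range (depth) of each point
    yaw = np.arctan2(y, x)                          # azimuth angle in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-6))      # elevation angle

    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # normalise the angles and scale them to pixel coordinates
    u = 0.5 * (1.0 - yaw / np.pi) * w               # column index from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * h        # row index from elevation
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    front_view = np.zeros((h, w), dtype=np.float32)  # one depth channel only
    front_view[v, u] = r                             # later points overwrite earlier ones
    return front_view

# usage: front_view = project_to_front_view(np.random.rand(1000, 3) * 50.0)
```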
S204: performing convolution operation on the front view to obtain a convolution characteristic diagram corresponding to the target object;
wherein each feature point in the convolution feature map is determined by the following method:
(1) determining a first feature point set corresponding to the convolution range according to the depth of each front view feature point in the convolution range;
(2) and performing feature aggregation on the first feature point set to obtain feature points corresponding to the convolution range.
After a convolution range is determined, the convolution operation traverses the front view and generates one feature point for each position of the convolution range. For example, for a 10 × 10 front view with a 5 × 5 convolution range, the first window, starting at the upper-left corner, determines feature point a1; the second window, starting at coordinate (0, 1), determines feature point a2; and this process is repeated until the last feature point a36 is obtained. The 36 feature points a1 to a36 then constitute the feature map of the target object.
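To make the window-by-window procedure above concrete, the following NumPy sketch (an illustration only, not the patent's implementation) traverses a single-channel front view, zeroes out the feature values whose depth differs from the window centre by more than a threshold R, and aggregates the remaining values by a weighted sum with assumed pre-trained kernel weights; the kernel weights, the threshold value and the single feature channel are assumptions made for readability.

```python
import numpy as np

def depth_aware_conv(features, depths, weights, r_threshold):
    """Convolve a single-channel front view while screening points by depth.

    features, depths : (H, W) arrays with the feature value and depth of each front-view point
    weights          : (k, k) convolution kernel (k odd), assumed pre-trained
    r_threshold      : depth difference threshold (the "first depth difference threshold")
    Returns the convolved feature map and the depth map propagated from the
    window centres (stride 1, no padding).
    """
    k = weights.shape[0]
    h_out = features.shape[0] - k + 1
    w_out = features.shape[1] - k + 1
    out_feat = np.zeros((h_out, w_out), dtype=np.float32)
    out_depth = np.zeros((h_out, w_out), dtype=np.float32)

    for i in range(h_out):
        for j in range(w_out):
            win_feat = features[i:i + k, j:j + k]
            win_depth = depths[i:i + k, j:j + k]
            center_depth = win_depth[k // 2, k // 2]
            # first feature point set: points whose depth difference to the window
            # centre is below the threshold take part in the convolution
            mask = np.abs(win_depth - center_depth) < r_threshold
            out_feat[i, j] = np.sum(win_feat * mask * weights)  # feature aggregation
            out_depth[i, j] = center_depth                      # depth follows the centre
    return out_feat, out_depth

# usage with the 10 x 10 front view and 5 x 5 convolution range from the example above:
# feat, dep = depth_aware_conv(np.random.rand(10, 10), np.random.rand(10, 10) * 50.0,
#                              np.random.rand(5, 5), r_threshold=2.0)
# feat.shape == (6, 6), i.e. the 36 feature points a1 to a36
```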
According to the point cloud data processing method provided by the embodiment of the present invention, the initial point cloud data set and the front view corresponding to the initial point cloud data set are acquired, and a convolution operation is performed on the front view to obtain the convolution feature map of the target object, where each feature point in the convolution feature map is determined as follows: a first feature point set corresponding to a convolution range is determined according to the depth of each front-view feature point within the convolution range, and feature aggregation is performed on the first feature point set to obtain the feature point corresponding to the convolution range. In the process of extracting the features corresponding to each convolution range, the depth of each front-view feature point is taken into account, and the feature points within each convolution range are screened on the basis of depth. Because different objects have different depth information in the front view, they can be distinguished by this depth information, so the feature map of the target object determined on the basis of the depth information avoids mixing in the features of other objects at the junctions between objects, which reduces the possibility of erroneous feature extraction, improves the accuracy of the feature map, and enhances the robustness of the features.
For convenience of understanding, on the basis of the above method provided by the embodiment of the present invention, fig. 3 further provides another point cloud data processing method provided by the embodiment of the present invention, as shown in fig. 3, the method may include the following steps:
S302: acquiring an initial point cloud data set and a front view corresponding to the initial point cloud data set; the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front-view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set;
S304: performing a convolution operation on the front view to obtain a convolution feature map corresponding to the target object; wherein each feature point in the convolution feature map is determined by the following method:
(1) judging whether the depth difference between the front view feature point and the front view feature point at the center of the convolution range is smaller than a first depth difference threshold value, if so, determining the front view feature point as a feature point participating in convolution;
specifically, before each convolution operation, whether the depth difference between the depth of each foresight image feature point in the convolution range and the depth of the convolution center is smaller than a first depth difference threshold is judged by taking the convolution center as an origin, and the greater the depth difference, the greater the probability that the point and the convolution center belong to different objects is shown, so that a first depth difference threshold can be set, the feature point with the depth difference larger than the first depth difference threshold is determined as the feature point belonging to different objects from the convolution center, and conversely, the feature point with the depth difference smaller than the first depth difference threshold is determined as the feature point belonging to the same object as the convolution center. And for the feature points belonging to the same object, the subsequent convolution operation is participated, and the feature points not belonging to the same object do not participate in the convolution operation, so that the feature points participating in the convolution belong to the same object, and the accuracy of feature extraction is improved.
(2) And counting all the characteristic points participating in convolution in the convolution range to generate a first characteristic point set corresponding to the convolution range.
(3) And performing feature aggregation on the first feature point set to obtain feature points corresponding to the convolution range.
Feature aggregation is performed on the first feature point set, that is, the points in the first feature point set are compressed into a representation smaller than the set itself, which is the feature point corresponding to the convolution range.
S306: judging whether the number of times the convolution operation has been executed reaches a convolution count threshold, and if not, continuing to execute the convolution operation on the feature map until the number of executions reaches the convolution count threshold.
Feature point aggregation is performed for each convolution range to obtain a set consisting of the feature points corresponding to the individual convolution ranges; the current convolution operation finishes once the feature corresponding to the last convolution range has been obtained.
S308: executing a processing task corresponding to the target object according to the feature diagram of the target object; wherein the processing task comprises at least one of: the method comprises a target object detection task, a target object segmentation task and a target object tracking task.
Further, the step S304 may be executed repeatedly, that is, the feature point set obtained for the current convolution layer is used as the initial point cloud set for the next convolution operation, and the convolution operation is repeated to obtain a more accurate and abstract feature map, so as to meet the requirements of target detection, tracking and segmentation tasks of different accuracies.
In some possible embodiments, after obtaining the convolution feature map after the convolution operation, the convolution feature map may be subjected to deconvolution, and the number of times of deconvolution is set according to requirements of different accuracies, so as to obtain a deconvolution feature map of the target accuracy.
Based on this, the method provided by the embodiment of the present invention may further include: carrying out deconvolution operation on the feature map to obtain a deconvolution feature map; the deconvolution operation corresponds to a deconvolution range that is the same as the convolution range.
Specifically, the deconvolution feature map may be determined by:
(1) expanding the convolution characteristic graph to obtain an expanded characteristic graph; the extended feature map is larger in size than the convolved feature map.
The deconvolution operation is the reverse of the convolution operation, i.e. it is equivalent to expanding, from the feature point at the deconvolution center, a set of feature points larger than the deconvolution range. The deconvolution operation may also be referred to as an upsampling operation, and the upsampling strategy used during deconvolution may be any common strategy, such as transposed convolution, nearest-neighbor interpolation, bilinear interpolation or bicubic interpolation; the upsampling strategy is not limited in the embodiments of the present invention (a minimal sketch of one such expansion is given after this list).
(2) Acquiring a reference feature map, wherein the reference feature map is a feature map which is generated in the process of obtaining the convolution feature map by convolving the front view and has the same size as the deconvolution feature map;
(3) performing convolution operation on the extended feature map according to the reference feature map to obtain the deconvolution feature map; wherein each feature point in the deconvolution feature map is determined by: determining a depth difference between the depth of each feature point in the extended feature map within the deconvolution range and the depth of a feature point at a corresponding position in the reference feature map; if the depth difference is smaller than a second depth difference threshold value, determining the feature point in the deconvolution range as a feature point participating in convolution; counting all the characteristic points participating in convolution in the deconvolution range to generate a second characteristic point set; and performing feature aggregation on the second feature point set to obtain feature points corresponding to the deconvolution range.
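To illustrate the expansion in item (1) above, a short sketch follows (an illustration only: nearest-neighbour repetition and a fixed factor of 2 are assumptions, since the patent does not fix the upsampling strategy).

```python
import numpy as np

def expand_feature_map(feature_map, factor=2):
    """Nearest-neighbour expansion: every feature point is repeated factor x factor
    times, so the extended feature map is larger than the convolution feature map."""
    return np.repeat(np.repeat(feature_map, factor, axis=0), factor, axis=1)

# e.g. a 6 x 6 convolution feature map f3 becomes a 12 x 12 extended feature map f3'
```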
The first depth difference threshold and/or the second depth difference threshold may be set manually or determined by a depth difference prediction neural network trained in advance, and the two thresholds may be the same as or different from each other.
For example, suppose feature map f1 is convolved twice to obtain feature maps f2 and f3, and f3 is then deconvolved. f3 is first expanded to obtain an expanded feature map f3' of the same size as f2; the specific expansion may use an interpolation algorithm or another expansion algorithm. f2 is the reference feature map of f3'. Within one deconvolution range, for example a 2 × 2 range, it is judged whether the depth difference between each feature point of f3' and the feature point at the corresponding position of f2 is smaller than the second depth difference threshold: the feature point at position (1, 1) of f3' is compared with the feature point at position (1, 1) of f2, the point at (1, 2) with the point at (1, 2), the point at (2, 1) with the point at (2, 1), and the point at (2, 2) with the point at (2, 2). This determines which of the four feature points participate in the convolution; if, say, the feature points at (2, 1) and (2, 2) are determined to participate, those two feature points are aggregated to obtain the feature point corresponding to this deconvolution range. The deconvolution range is then moved to the next position and the judgment process is repeated.
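The position-by-position comparison of the f2 / f3' example can be sketched as follows (for illustration only; the masked weighted-sum aggregation and the kernel weights are assumptions, and the expanded feature map, its propagated depths and the reference depths are taken as given inputs).

```python
import numpy as np

def depth_guided_deconv(expanded, expanded_depth, ref_depth, weights, r2_threshold):
    """Convolve the expanded map (e.g. f3') while screening its points against the
    reference feature map (e.g. f2) of the same size, position by position.

    expanded, expanded_depth, ref_depth : (H, W) arrays of identical size
    weights                             : (k, k) kernel covering the deconvolution range
    r2_threshold                        : the "second depth difference threshold"
    """
    k = weights.shape[0]
    h_out = expanded.shape[0] - k + 1
    w_out = expanded.shape[1] - k + 1
    out = np.zeros((h_out, w_out), dtype=np.float32)

    for i in range(h_out):
        for j in range(w_out):
            win_feat = expanded[i:i + k, j:j + k]
            win_depth = expanded_depth[i:i + k, j:j + k]
            ref_win = ref_depth[i:i + k, j:j + k]
            # second feature point set: keep the points whose depth difference to the
            # reference feature at the same position is smaller than the threshold
            mask = np.abs(win_depth - ref_win) < r2_threshold
            out[i, j] = np.sum(win_feat * mask * weights)
    return out
```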
In some possible embodiments, different numbers of deconvolution operations may be performed according to different precision requirements, and based on this, the method may further include:
and judging whether the execution times of the deconvolution operation reach a deconvolution time threshold, if not, continuing to execute the deconvolution operation on the deconvolution feature graph until the execution times of the deconvolution operation reach the deconvolution time threshold.
For ease of understanding, based on the above point cloud data processing method, Figs. 4a to 4e show a schematic processing flow of point cloud data. As shown in Fig. 4a, the initial point cloud data set is a point cloud data set, obtained by a lidar sensor, that contains a vehicle and a pedestrian. It can be written as P = {p_1, p_2, ..., p_N}, where N is the number of points in the point cloud and each point p_i = (x_i, y_i, z_i) gives its coordinate values in a Cartesian coordinate system. The method comprises the following steps:
(1) Perspective projection is performed on the original point cloud data set according to the polar coordinate relationship to form a front view, as shown in Fig. 4b.
(2) The feature depth of the front view is obtained and is denoted by r, wherein the feature depth comprises the feature depth of the initial front view and the feature depth of the front view after each convolution, as shown in fig. 4 c.
(3) The feature extraction range of the convolution is determined based on the depth and a threshold value, where the threshold value R is set manually or obtained by learning with a trained deep-learning network. Before each convolution operation, with the convolution center as the origin, within the convolution range:
3-1, for feature points whose depth difference is greater than R, the feature value is set to zero;
3-2, for feature points whose depth difference is less than or equal to R, the feature value is kept unchanged.
As shown in Fig. 4d, taking feature point P1 as the convolution center and a 5 × 5 convolution size as an example, it is judged whether any feature point within the range has a depth difference relative to P1 that is greater than R. Feature points whose depth difference does not exceed R are nearby points belonging to the same vehicle as P1, and their feature values are kept unchanged. After the judgment, the feature points whose depth difference exceeds R have their feature values set to zero, i.e. they do not participate in the convolution feature extraction.
The threshold R may be obtained by manual setting or from a pre-trained neural network. When the R value is obtained through the neural network, a corresponding R value is obtained for each feature, and as the convolution range (convolution kernel) moves, the convolution center uses the R value at the corresponding position to screen the feature points. If the R value is set manually, each convolutional layer may share one R value.
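A minimal sketch of the two options (for illustration only; the one-parameter linear head below merely stands in for whatever depth difference prediction network is actually trained, and its parameters are assumptions):

```python
import numpy as np

# Option 1: a manually set threshold shared by every position of a convolutional layer
R_SHARED = 2.0

# Option 2: one threshold per front-view position, predicted from the local depth
# (the linear head is only a stand-in for the pre-trained prediction network)
def predict_r(depth_map, w=0.05, b=0.5):
    """Predict an R value for every position, e.g. larger thresholds for farther
    (sparser) regions; w and b are assumed learned parameters."""
    return w * depth_map + b

# while the convolution range (kernel) moves, the centre at position (i, j) screens
# its window either with R_SHARED or with predict_r(depth_map)[i, j]
```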
(4) The convolution operation is performed to obtain the feature point corresponding to the convolution range, i.e. the feature point corresponding to P15.
Wherein, the depth of the feature point corresponding to the convolution range is consistent with the depth of P1 before the convolution operation.
In particular, the convolution operation may be a dot product with the feature values using pre-trained convolution weights. The depth value corresponding to the feature after convolution is the original feature depth corresponding to the convolution center.
After all convolution operations of the same layer of the neural network are finished, each convolved feature and the feature depth corresponding to it are obtained.
Further, whether the convolution operation has reached the preset number of convolutions is judged according to a preset sampling strategy; if so, proceed to step (5), otherwise repeat steps (2) to (3).
Specifically, the preset sampling strategy is set according to requirements. For example, the strategy of the classical segmentation network U-Net is to downsample four times and then upsample four times, with several size-preserving feature extraction steps between every two downsampling or upsampling operations, so that the feature map finally output has the same size as the original. As another example, the classical detection network structure YOLO downsamples five times without upsampling, with several size-preserving feature extraction steps between every two downsampling operations, and then directly outputs the target detection result.
The embodiment of the invention does not limit the specific sampling strategy.
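For illustration, a brief sketch of how such a sampling schedule determines the output feature-map size (the factor-of-2 strides and the 64 × 512 front view are assumptions; the patent does not fix a particular strategy):

```python
def output_size(h, w, schedule):
    """schedule is a list of 'down' / 'up' / 'same' operations; every 'down' halves and
    every 'up' doubles the feature-map size (a stride / factor of 2 is assumed)."""
    for op in schedule:
        if op == 'down':
            h, w = h // 2, w // 2
        elif op == 'up':
            h, w = h * 2, w * 2
    return h, w

# U-Net-like segmentation schedule: four downsamplings followed by four upsamplings
print(output_size(64, 512, ['down'] * 4 + ['up'] * 4))   # (64, 512): same size as the input
# YOLO-like detection schedule: five downsamplings and no upsampling
print(output_size(64, 512, ['down'] * 5))                 # (2, 16)
```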
(5) Confirming the characteristic extraction range of the up-sampling operator based on the depth and the threshold value.
The threshold R' may be set manually, may be obtained by learning through a pre-training deep learning network, and may be consistent with R used in corresponding feature extraction downsampling (i.e., downsampling convolution operation with the same feature map size).
Confirm the feature size after upsampling: the size of the feature map after upsampling is consistent with the size of the feature map before the corresponding feature-extraction downsampling operation. That is, if one convolution operation on the original point cloud data set gives feature map A1, and another convolution operation based on A1 gives feature map A2, then the feature map obtained by a deconvolution (i.e., upsampling) operation based on A2 should have the same size as the feature map A2 had before its convolution operation; in other words, the feature map A3 obtained by deconvolving A2 should have the same size as A1.
Before each upsampling operation, taking the upsampling center as the origin, within the upsampling range:
5-1, for feature points whose depth difference is greater than R', the feature value is set to zero;
5-2, for feature points whose depth difference is less than or equal to R', the feature value is kept unchanged.
As shown in Fig. 4e, taking a feature point as the deconvolution center and a 5 × 5 deconvolution range as an example, it is judged whether any corresponding feature point has a depth difference greater than R'. After the judgment, the feature points within the threshold are nearby points belonging to the same pedestrian, and their feature values are kept unchanged; the feature points belonging to the vehicle have their feature values set to zero, i.e. they do not participate in the deconvolution feature extraction.
(6) Feature extraction is performed using the preset upsampling operator.
After steps (2) to (6) are repeated several times, the feature point cloud can be continuously abstracted and aggregated, and the resulting feature map can be used for tasks such as target detection, segmentation and tracking as needed. The number of convolution/deconvolution applications in the embodiment of the present invention can be set freely according to the degree of feature abstraction required by the perception task. For example, the segmentation task requires class information to be output for every point, so the numbers of downsampling and upsampling operations should be equal to ensure that the feature map output by the neural network has the same size as the original front-view point cloud. The numbers of convolutions and deconvolutions are not limited in the embodiment of the present invention.
In some possible embodiments, after the feature map of the target object is obtained by the point cloud data processing method provided by the embodiment of the present invention, a processing task corresponding to the target object may be further executed according to the feature map of the target object; wherein, the processing task corresponding to the target object comprises at least one of the following: the method comprises a target object detection task, a target object segmentation task and a target object tracking task.
In summary, according to the method for processing point cloud data provided by the embodiment of the present invention, in the convolution and deconvolution processes, depth information of feature points of different objects is considered, so that different feature points included in different objects are effectively distinguished, extraction of an error feature is avoided, and the obtained feature is more robust.
Based on the above method embodiment, an embodiment of the present invention further provides a point cloud data processing apparatus, as shown in fig. 5, the apparatus includes:
a data acquisition module 502, configured to acquire an initial point cloud data set and a front view corresponding to the initial point cloud data set; the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front-view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set;
a convolution module 504, configured to perform a convolution operation on the front view to obtain a convolution feature map corresponding to the target object; wherein each feature point in the convolution feature map is determined by the following method: determining a first feature point set corresponding to the convolution range according to the depth of each front view feature point in the convolution range; and performing feature aggregation on the first feature point set to obtain feature points corresponding to the convolution range.
The point cloud data processing apparatus acquires the initial point cloud data set and the front view corresponding to the initial point cloud data set, and performs a convolution operation on the front view to obtain the convolution feature map of the target object, where each feature point in the convolution feature map is determined as follows: a first feature point set corresponding to a convolution range is determined according to the depth of each front-view feature point within the convolution range, and feature aggregation is performed on the first feature point set to obtain the feature point corresponding to the convolution range. In the process of extracting the features corresponding to each convolution range, the depth of each front-view feature point is taken into account, and the feature points within each convolution range are screened on the basis of depth. Because different objects have different depth information in the front view, they can be distinguished by this depth information, so the feature map of the target object determined on the basis of the depth information avoids mixing in the features of other objects at the junctions between objects, which reduces the possibility of erroneous feature extraction, improves the accuracy of the feature map, and enhances the robustness of the features.
The process of determining the first feature point set corresponding to the convolution range according to the feature point depth of each front view includes: judging whether the depth difference between the front view feature point and the front view feature point at the center of the convolution range is smaller than a first depth difference threshold value, if so, determining the front view feature point as a feature point participating in convolution; and counting all the characteristic points participating in convolution in the convolution range to generate a first characteristic point set corresponding to the convolution range.
Based on the above apparatus shown in fig. 5, another point cloud data processing apparatus is further provided in the embodiments of the present invention, as shown in fig. 6, the apparatus further includes, on the basis of the above apparatus:
a deconvolution module 602, configured to perform a deconvolution operation on the convolution feature map to obtain a deconvolution feature map; the deconvolution operation corresponds to a deconvolution range that is the same as the convolution range.
The deconvolution module 602 is further configured to expand the convolution feature map to obtain an expanded feature map; the size of the extended feature map is larger than that of the convolution feature map; acquiring a reference characteristic diagram, wherein the reference characteristic diagram is a characteristic diagram which is generated in the process of obtaining a convolution characteristic diagram by convolving the front view and has the same size as the deconvolution characteristic diagram; performing convolution operation on the extended characteristic diagram according to the reference characteristic diagram to obtain a deconvolution characteristic diagram; wherein each feature point in the deconvolution feature map is determined by the following method: determining the depth difference of the depth of each feature point in the expanded feature map in the deconvolution range and the depth of the feature point at the corresponding position in the reference feature map; if the depth difference is smaller than a second depth difference threshold value, determining the feature point in the deconvolution range as a feature point participating in convolution; counting all the characteristic points participating in convolution in the deconvolution range to generate a second characteristic point set; and performing feature aggregation on the second feature point set to obtain feature points corresponding to the deconvolution range.
The first depth difference threshold value and/or the second depth difference threshold value are/is determined through a depth difference prediction neural network which is trained in advance.
A convolution count judging module 604, configured to judge whether the number of times the convolution operation has been executed reaches a convolution count threshold, and if not, to continue executing the convolution operation on the feature map until the number of executions reaches the convolution count threshold.
A deconvolution count judging module 606, configured to judge whether the number of times the deconvolution operation has been executed reaches a deconvolution count threshold, and if not, to continue executing the deconvolution operation on the deconvolution feature map until the number of executions reaches the deconvolution count threshold.
The front view is determined by projecting an initial point cloud data set.
The processing module 608 is configured to execute a processing task corresponding to the target object according to the feature map of the target object; wherein the processing task comprises at least one of: the method comprises a target object detection task, a target object segmentation task and a target object tracking task.
The implementation principle and technical effects of the point cloud data processing apparatus provided by the embodiment of the present invention are the same as those of the foregoing method embodiment; for the sake of brevity, for any part of the apparatus embodiment not mentioned here, reference may be made to the corresponding contents of the point cloud data processing method embodiment.
An embodiment of the present invention further provides an electronic device, as shown in fig. 7, which is a schematic structural diagram of the electronic device, where the electronic device includes a processor 701 and a memory 702, the memory 702 stores computer-executable instructions that can be executed by the processor 701, and the processor 701 executes the computer-executable instructions to implement the point cloud data processing method.
In the embodiment shown in fig. 7, the electronic device further comprises a bus 703 and a communication interface 704, wherein the processor 701, the communication interface 704 and the memory 702 are connected by the bus 703.
The Memory 702 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 704 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 703 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 703 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.
The processor 701 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware or by instructions in the form of software in the processor 701. The processor 701 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be implemented directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor 701 reads the information in the memory and completes the steps of the point cloud data processing method of the foregoing embodiment in combination with its hardware.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the point cloud data processing method, and specific implementation may refer to the foregoing method embodiment, and is not described herein again.
The computer program product of the point cloud data processing method, the point cloud data processing device and the electronic device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the method described in the foregoing method embodiment, and for specific implementation reference may be made to the method embodiment, which is not repeated here.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes or equivalent substitutions of some technical features, within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A point cloud data processing method, characterized in that the method comprises:
acquiring an initial point cloud data set and a front view corresponding to the initial point cloud data set; wherein the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set;
performing a convolution operation on the front view to obtain a convolution feature map corresponding to the target object; wherein each feature point in the convolution feature map is determined by: determining a first feature point set corresponding to a convolution range according to the depth of each front view feature point in the convolution range; and performing feature aggregation on the first feature point set to obtain the feature point corresponding to the convolution range;
wherein determining the first feature point set corresponding to the convolution range according to the depth of each front view feature point in the convolution range comprises the following steps:
determining whether the depth difference between a front view feature point and the front view feature point at the center of the convolution range is smaller than a first depth difference threshold, and if so, determining the front view feature point as a feature point participating in convolution; and
collecting all the feature points participating in convolution within the convolution range to generate the first feature point set corresponding to the convolution range.
2. The method of claim 1, further comprising:
performing a deconvolution operation on the convolution feature map to obtain a deconvolution feature map; wherein a deconvolution range corresponding to the deconvolution operation is the same as the convolution range.
3. The method of claim 2, wherein the step of performing the deconvolution operation on the convolution feature map to obtain the deconvolution feature map comprises:
expanding the convolution feature map to obtain an extended feature map; wherein the size of the extended feature map is larger than that of the convolution feature map;
acquiring a reference feature map, wherein the reference feature map is a feature map that is generated in the process of convolving the front view to obtain the convolution feature map and that has the same size as the deconvolution feature map; and
performing a convolution operation on the extended feature map according to the reference feature map to obtain the deconvolution feature map; wherein each feature point in the deconvolution feature map is determined by: determining the depth difference between the depth of each feature point in the extended feature map within the deconvolution range and the depth of the feature point at the corresponding position in the reference feature map; if the depth difference is smaller than a second depth difference threshold, determining the feature point in the deconvolution range as a feature point participating in convolution; collecting all the feature points participating in convolution within the deconvolution range to generate a second feature point set; and performing feature aggregation on the second feature point set to obtain the feature point corresponding to the deconvolution range.
4. The method of claim 3, wherein the first depth difference threshold and/or the second depth difference threshold is determined by a pre-trained depth difference prediction neural network.
5. The method of claim 1, further comprising:
determining whether the number of executions of the convolution operation reaches a convolution count threshold, and if not, continuing to perform the convolution operation on the convolution feature map until the number of executions of the convolution operation reaches the convolution count threshold.
6. The method of claim 2, further comprising:
determining whether the number of executions of the deconvolution operation reaches a deconvolution count threshold, and if not, continuing to perform the deconvolution operation on the deconvolution feature map until the number of executions of the deconvolution operation reaches the deconvolution count threshold.
7. The method of any of claims 1-6, wherein the front view is determined by projecting the initial point cloud data set.
8. The method according to any one of claims 1-6, further comprising: executing a processing task corresponding to the target object according to the feature map of the target object; wherein the processing task comprises at least one of: a detection task for the target object, a segmentation task for the target object, and a tracking task for the target object.
9. A point cloud data processing apparatus, characterized in that the apparatus comprises:
a data acquisition module configured to acquire an initial point cloud data set and a front view corresponding to the initial point cloud data set; wherein the data points in the initial point cloud data set are three-dimensional data information of a target object acquired through a 3D sensor, the front view is a two-dimensional image, and the front view feature points in the front view correspond one-to-one to the data points in the initial point cloud data set; and
a convolution module configured to perform a convolution operation on the front view to obtain a convolution feature map corresponding to the target object; wherein each feature point in the convolution feature map is determined by: determining a first feature point set corresponding to a convolution range according to the depth of each front view feature point in the convolution range; and performing feature aggregation on the first feature point set to obtain the feature point corresponding to the convolution range;
wherein, in determining the first feature point set corresponding to the convolution range according to the depth of each front view feature point in the convolution range,
the convolution module is further configured to determine whether the depth difference between a front view feature point and the front view feature point at the center of the convolution range is smaller than a first depth difference threshold, and if so, to determine the front view feature point as a feature point participating in convolution; and
to collect all the feature points participating in convolution within the convolution range to generate the first feature point set corresponding to the convolution range.
10. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of any of claims 1 to 8.
11. A computer-readable storage medium having computer-executable instructions stored thereon which, when invoked and executed by a processor, cause the processor to implement the method of any of claims 1 to 8.
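For illustration only, and not as part of the claims, the following is a minimal Python sketch of the depth-gated selection and aggregation recited in claim 1. It assumes the front view is stored as a 2-D depth array with a matching per-pixel feature array; the function name, the array layout, and the use of mean pooling for feature aggregation are assumptions for illustration rather than the claimed implementation.

```python
import numpy as np

def depth_gated_conv_point(range_view, feature_map, center, k, depth_thresh):
    """One output feature point of the depth-gated convolution sketched from claim 1.

    range_view   -- (H, W) per-pixel depths of the front (range) view
    feature_map  -- (H, W, C) per-pixel features of the front view
    center       -- (row, col) of the convolution window center
    k            -- window half-size, so the convolution range is (2k+1) x (2k+1)
    depth_thresh -- the first depth difference threshold
    """
    r, c = center
    h, w = range_view.shape
    center_depth = range_view[r, c]
    selected = []  # the "first feature point set" of claim 1
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                # keep only front view feature points whose depth is close to the window center
                if abs(range_view[rr, cc] - center_depth) < depth_thresh:
                    selected.append(feature_map[rr, cc])
    # feature aggregation over the selected set; mean pooling is an illustrative assumption
    return np.mean(selected, axis=0)
```

Because the window center always satisfies the depth test, the first feature point set is never empty; points lying across a depth discontinuity, which typically belong to a different object, are excluded before aggregation, which is the effect the claim attributes to the depth gating.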
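Along the same lines, a sketch of the depth-gated deconvolution of claim 3, assuming the extended feature map and the reference feature map each carry a per-point depth channel; again the names and the mean aggregation are assumptions, and claims 5 and 6 would simply repeat the convolution and deconvolution steps until the respective count thresholds are reached.

```python
import numpy as np

def depth_gated_deconv_point(ext_depth, ext_features, ref_depth, center, k, depth_thresh):
    """One output feature point of the depth-gated deconvolution sketched from claim 3.

    ext_depth    -- (H, W) depths of the extended (expanded) convolution feature map
    ext_features -- (H, W, C) features of the extended convolution feature map
    ref_depth    -- (H, W) depths of the reference feature map, an encoder feature map
                    with the same size as the deconvolution output
    center       -- (row, col) of the deconvolution window center
    k            -- window half-size; the deconvolution range equals the convolution range
    depth_thresh -- the second depth difference threshold
    """
    r, c = center
    h, w = ext_depth.shape
    selected = []  # the "second feature point set" of claim 3
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                # gate on the depth difference to the corresponding reference position
                if abs(ext_depth[rr, cc] - ref_depth[rr, cc]) < depth_thresh:
                    selected.append(ext_features[rr, cc])
    if not selected:  # fall back to the window center if no point passes the gate (assumption)
        selected.append(ext_features[r, c])
    return np.mean(selected, axis=0)  # aggregation by mean is an illustrative choice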
CN202111125622.3A 2021-09-26 2021-09-26 Point cloud data processing method and device and electronic equipment Active CN113569877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111125622.3A CN113569877B (en) 2021-09-26 2021-09-26 Point cloud data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111125622.3A CN113569877B (en) 2021-09-26 2021-09-26 Point cloud data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113569877A CN113569877A (en) 2021-10-29
CN113569877B true CN113569877B (en) 2022-02-25

Family

ID=78174445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111125622.3A Active CN113569877B (en) 2021-09-26 2021-09-26 Point cloud data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113569877B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059608B (en) * 2019-04-11 2021-07-06 腾讯科技(深圳)有限公司 Object detection method and device, electronic equipment and storage medium
CN110991468B (en) * 2019-12-13 2023-12-19 深圳市商汤科技有限公司 Three-dimensional target detection and intelligent driving method, device and equipment
CN111476242B (en) * 2020-03-31 2023-10-20 北京经纬恒润科技股份有限公司 Laser point cloud semantic segmentation method and device
CN112396068B (en) * 2021-01-19 2021-04-16 苏州挚途科技有限公司 Point cloud data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN113569877A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN108875723B (en) Object detection method, device and system and storage medium
CN109784250B (en) Positioning method and device of automatic guide trolley
CN112396068B (en) Point cloud data processing method and device and electronic equipment
CN111274943A (en) Detection method, detection device, electronic equipment and storage medium
CN114648640B (en) Target object monomer method, device, equipment and storage medium
CN112489063A (en) Image segmentation method, and training method and device of image segmentation model
CN116188931A (en) Processing method and device for detecting point cloud target based on fusion characteristics
CN112396067B (en) Point cloud data sampling method and device and electronic equipment
CN113569877B (en) Point cloud data processing method and device and electronic equipment
CN113256709A (en) Target detection method, target detection device, computer equipment and storage medium
CN111739025A (en) Image processing method, device, terminal and storage medium
CN117152330A (en) Point cloud 3D model mapping method and device based on deep learning
WO2020237553A1 (en) Image processing method and system, and movable platform
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
CN115601275A (en) Point cloud augmentation method and device, computer readable storage medium and terminal equipment
WO2022017133A1 (en) Method and apparatus for processing point cloud data
CN114241011A (en) Target detection method, device, equipment and storage medium
CN115346184A (en) Lane information detection method, terminal and computer storage medium
CN112036342A (en) Document snapshot method, device and computer storage medium
CN112418244A (en) Target detection method, device and electronic system
CN112150532A (en) Image processing method and device, electronic equipment and computer readable medium
KR20190114924A (en) Method, apparatus, and device for identifying human body and computer readable storage
EP3985554B1 (en) Traffic light recognition method and apparatus
CN113034617B (en) Method, device and equipment for acquiring focal length of camera
CN114119826A (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant