CN112966696A - Method, device and equipment for processing three-dimensional point cloud and storage medium


Info

Publication number
CN112966696A
Authority
CN
China
Prior art keywords
point
points
point cloud
neural network
convolutional neural
Prior art date
Legal status
Granted
Application number
CN202110163660.1A
Other languages
Chinese (zh)
Other versions
CN112966696B (en)
Inventor
乔宇
徐名业
张钧皓
周志鹏
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110163660.1A priority Critical patent/CN112966696B/en
Publication of CN112966696A publication Critical patent/CN112966696A/en
Priority to PCT/CN2021/137305 priority patent/WO2022166400A1/en
Application granted granted Critical
Publication of CN112966696B publication Critical patent/CN112966696B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

The application relates to the field of computer technology and provides a method, an apparatus, a device and a storage medium for processing a three-dimensional point cloud. The method comprises: acquiring point cloud data comprising a plurality of points; inputting the point cloud data into a trained convolutional neural network, which comprises a geometric attention fusion module and a focusing module, to obtain a target feature corresponding to each point; and determining the prediction category corresponding to each point based on the target feature corresponding to each point. The target features extracted in this way contain the important geometric information corresponding to each point, so they are more accurate and effective, and the prediction results obtained when predicting categories from these target features are highly accurate.

Description

Method, device and equipment for processing three-dimensional point cloud and storage medium
Technical Field
The present application belongs to the field of computer technology, and in particular, relates to a method for processing a three-dimensional point cloud, an apparatus for processing a three-dimensional point cloud, a device for processing a three-dimensional point cloud, and a storage medium.
Background
A point cloud (Point Cloud) is a set of data points sampled from the external surface of an object, obtained by a measuring instrument in reverse engineering. In addition to geometric position, point cloud data may carry color and intensity information. The color information is typically obtained by capturing a color image with a camera and then assigning the color (RGB) of the pixel at the corresponding location to the corresponding point in the point cloud. The intensity information is obtained from the echo intensity collected by the receiving device of a laser scanner; it is related to the surface material, roughness and incidence angle of the target, as well as to the emission energy and laser wavelength of the instrument.
However, three-dimensional point cloud data, unlike images, is non-normalized, so current approaches first convert the point cloud data into other data formats before processing it; for example, multi-view projection techniques project the non-normalized three-dimensional point cloud into two-dimensional images, which are then processed as the input of a convolutional neural network. This process has the following disadvantages: (1) due to occlusion, the projection itself causes partial data loss; (2) the data conversion is computationally intensive. Therefore, it is necessary to construct a convolutional neural network that processes the three-dimensional point cloud data directly.
However, conventional convolutional neural networks that can directly process three-dimensional point cloud data cannot accurately extract the feature information of each point, so the prediction result is inaccurate when category prediction is performed on the points.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method, an apparatus, a device and a storage medium for processing a three-dimensional point cloud, so as to solve the problem that conventional convolutional neural networks which directly process three-dimensional point cloud data cannot accurately extract the feature information of each point, making category prediction for the points inaccurate.
A first aspect of an embodiment of the present application provides a method for processing a three-dimensional point cloud, including:
acquiring point cloud data comprising a plurality of points;
inputting the point cloud data into a trained convolutional neural network for processing to obtain target features corresponding to each point, wherein the convolutional neural network comprises a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting local enhancement features of each point, and the focusing module is used for extracting the target features of each point based on the local enhancement features of each point;
and determining the prediction category corresponding to each point based on the target characteristics corresponding to each point.
A second aspect of an embodiment of the present application provides an apparatus for processing a three-dimensional point cloud, including:
an acquisition unit configured to acquire point cloud data including a plurality of points;
the processing unit is used for inputting the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, the convolutional neural network comprises a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting a local enhancement feature of each point, and the focusing module is used for extracting the target feature of each point based on the local enhancement feature of each point;
and the determining unit is used for determining the prediction category corresponding to each point on the basis of the target feature corresponding to each point.
A third aspect of embodiments of the present application provides an apparatus for processing a three-dimensional point cloud, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for processing a three-dimensional point cloud as described in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of processing a three-dimensional point cloud as described in the first aspect above.
A fifth aspect of embodiments of the present application provides a computer program product, which, when run on an apparatus for processing a three-dimensional point cloud, causes the apparatus for processing a three-dimensional point cloud to perform the steps of the method for processing a three-dimensional point cloud described in the first aspect above.
The method for processing the three-dimensional point cloud, the device for processing the three-dimensional point cloud, the equipment for processing the three-dimensional point cloud and the storage medium have the following beneficial effects:
according to the embodiment of the application, the equipment for processing the three-dimensional point cloud processes the point cloud data through the trained convolutional neural network to obtain the target characteristics corresponding to each point, and the prediction category corresponding to each point is determined based on the target characteristics corresponding to each point. When the target features corresponding to each point are extracted, the local enhancement features of each point are extracted based on the geometric attention fusion module included in the convolutional neural network, and then the target features of each point are extracted and obtained based on the focusing module included in the convolutional neural network and the local enhancement features of each point. The target characteristics of each point extracted based on the method comprise important geometric information corresponding to each point, so that the extracted target characteristics of each point are more accurate and effective, and the obtained prediction result is very accurate when the category is predicted according to the target characteristics of each point.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a schematic diagram of a complex point cloud scene difficult to partition;
FIG. 2 is a schematic flow chart diagram of a method for processing a three-dimensional point cloud provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a method for processing a three-dimensional point cloud according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a geometric attention fusion module provided herein;
FIG. 5 is a schematic flow chart diagram illustrating a method for processing a three-dimensional point cloud according to another embodiment of the present application;
FIG. 6 is a schematic flow chart diagram illustrating a method for processing a three-dimensional point cloud according to yet another embodiment of the present application;
FIG. 7 is a schematic view of a focusing module provided herein;
FIG. 8 is a process for evaluating new evaluation criteria for non-partitionable areas as provided herein;
FIG. 9 is a semantic segmentation network for large complex scene point clouds according to the present application;
FIG. 10 is a process of adaptive variation of indistinguishable points in a training process as provided herein;
FIG. 11 provides an application scenario diagram for the present application;
FIG. 12 is a schematic diagram of an apparatus for processing a three-dimensional point cloud according to an embodiment of the present application;
fig. 13 is a schematic diagram of an apparatus for processing a three-dimensional point cloud according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the prior art, a point cloud (Point Cloud) is a set of data points sampled from the external surface of an object, obtained by a measuring instrument in reverse engineering; some point cloud data carry color and intensity information in addition to geometric position. The color information is typically obtained by capturing a color image with a camera and then assigning the color (RGB) of the pixel at the corresponding location to the corresponding point in the point cloud. The intensity information is obtained from the echo intensity collected by the receiving device of a laser scanner; it is related to the surface material, roughness and incidence angle of the target, as well as to the emission energy and laser wavelength of the instrument.
However, three-dimensional point cloud data, unlike images, is non-normalized, so current approaches first convert the point cloud data into other data formats before processing it; for example, multi-view projection techniques project the non-normalized three-dimensional point cloud into two-dimensional images, which are then processed as the input of a convolutional neural network. This process has the following disadvantages: (1) due to occlusion, the projection itself causes partial data loss; (2) the data conversion is computationally intensive, causing large memory consumption, occupying substantial computer resources, and easily losing spatial geometric information during the conversion.
Alternatively, a voxelization method converts the non-normalized point cloud data into spatial voxel data. Although this process reduces the problem of data loss, the converted voxel data is voluminous and highly redundant.
In addition, pointwise (one-dimensional) convolutional neural networks can operate directly on non-normalized point cloud data; the basic idea is to learn a spatial encoding of each point and then aggregate all single-point features into one global representation. However, this design does not fully capture the relationships between points.
An enhanced version of point cloud convolution divides the point cloud into overlapping local regions according to a distance metric of the underlying space and uses two-dimensional convolution to extract local features of the neighborhood structure, capturing fine geometry. However, it only considers the local region of each point and cannot correlate similar local features across the point cloud.
Therefore, it is necessary to directly construct a convolutional neural network to process the three-dimensional point cloud data.
However, conventional convolutional neural networks that can directly process three-dimensional point cloud data cannot accurately extract the feature information of each point, so the prediction result is inaccurate when category prediction is performed on the points. Moreover, existing methods for processing three-dimensional scene point clouds segment hard-to-segment areas poorly; the problems are mainly concentrated on object segmentation edges, the interiors of easily confused objects, and some scattered, confusing small regions.
Referring to fig. 1, fig. 1 is a schematic diagram of a complex point cloud scene that is difficult to segment. As shown in fig. 1, the first type is the complex boundary region, consisting of boundary points (object boundaries and prediction boundaries). In most cases, it is difficult to accurately determine the boundaries between different objects. Since the feature of each point is characterized by information from its local area, predictions for boundary points between objects of different classes that are close to each other in Euclidean space are over-smoothed, so the categories of these points cannot be accurately predicted.
The second type is the obfuscated interior region, which contains interior points from different classes of objects with similar textures and geometries. For example, doors and walls have a similar appearance, are almost flat, and have similar colors. In this case, even for a human being, it is difficult to accurately recognize whether some points belong to a door or a wall.
The third type is isolated small regions, which are scattered and difficult to predict. Furthermore, due to occlusion, objects in the scene are not fully captured by the device. Therefore, for points in an isolated small region, the class to which they belong cannot be predicted accurately.
In view of this, the present application provides a method for processing a three-dimensional point cloud. In an embodiment of the present application, an apparatus for processing a three-dimensional point cloud processes point cloud data through a trained convolutional neural network to obtain the target feature corresponding to each point, and determines the prediction category corresponding to each point based on that target feature. When the target feature corresponding to each point is extracted, the local enhancement feature of each point is first extracted by the geometric attention fusion module included in the convolutional neural network, and the target feature of each point is then extracted by the focusing module included in the convolutional neural network based on the local enhancement feature of each point. The target features extracted in this way contain the important geometric information corresponding to each point, so they are more accurate and effective, and the prediction results obtained when predicting categories from them are highly accurate.
The method for processing the three-dimensional point cloud can be applied to various fields needing analysis of the three-dimensional point cloud, such as the fields of human-computer interaction such as automatic driving (such as obstacle detection, automatic path planning of automatic driving equipment and the like), robots (object detection, route identification and the like of a family service robot) and the like. The description is given for illustrative purposes only and is not intended to be limiting.
Referring to fig. 2, fig. 2 is a schematic flowchart of a method for processing a three-dimensional point cloud according to an embodiment of the present disclosure. The main execution body of the method in this embodiment is an apparatus for processing the three-dimensional point cloud, which includes, but is not limited to, a smartphone, a tablet computer, a personal digital assistant (PDA), a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, an independent server, a distributed server, a server cluster or a cloud server, and may further include a terminal such as a desktop computer. The method for processing the three-dimensional point cloud shown in fig. 2 may include S101 to S103; the specific implementation principle of each step is as follows.
S101: point cloud data comprising a plurality of points is acquired.
Point cloud data comprising a plurality of points may be acquired by the apparatus that processes the three-dimensional point cloud. Specifically, if the apparatus includes a laser device, a stereo camera or a time-of-flight camera, acquisition may be performed by that laser device, stereo camera or time-of-flight camera. For example, point cloud data of a three-dimensional object can be acquired using a data acquisition method based on automatic point cloud stitching: during acquisition, multiple scanning stations are used and their data are stitched together to obtain the point cloud data, and accurate registration of point clouds from different angles is achieved by iteratively optimizing the coordinate transformation parameters.
The method can also be used for acquiring point cloud data through other equipment and transmitting the acquired point cloud data to the equipment for processing the three-dimensional point cloud. The description is given for illustrative purposes only and is not intended to be limiting.
S102: and inputting the point cloud data into a trained convolutional neural network for processing to obtain target features corresponding to each point, wherein the convolutional neural network comprises a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting local enhancement features of each point, and the focusing module is used for extracting the target features of each point based on the local enhancement features of each point.
S103: and determining the prediction category corresponding to each point based on the target characteristics corresponding to each point.
In this embodiment, a device for processing a three-dimensional point cloud stores a convolutional neural network trained in advance. The convolutional neural network is obtained by training an initial convolutional neural network based on a training set and a test set by using a machine learning algorithm. The convolutional neural network comprises a geometric attention fusion module and a focusing module, wherein the geometric attention fusion module is used for extracting local enhanced features of each point, and the focusing module is used for extracting target features of each point based on the local enhanced features of each point. The training set comprises sample point cloud data of a plurality of sample points, and the testing set comprises sample characteristics and sample categories corresponding to each sample point.
It can be understood that the convolutional neural network may be trained in advance by a device for processing the three-dimensional point cloud, or may be transplanted to a device for processing the three-dimensional point cloud after being trained in advance by other devices. That is, the execution agent that trains the convolutional neural network may be the same as or different from the execution agent that uses the convolutional neural network. For example, when the initial convolutional neural network is trained by using other devices, after the training of the initial convolutional neural network is finished by the other devices, the network parameters of the initial convolutional neural network are fixed, and a file corresponding to the convolutional neural network is obtained. The file is then migrated to a device that processes the three-dimensional point cloud.
After acquiring point cloud data of a plurality of points, equipment for processing the three-dimensional point cloud extracts local enhancement features of each point by using a geometric attention fusion module included in a convolutional neural network; and extracting the target characteristics of each point by using a focusing module included in the convolutional neural network based on the local enhancement characteristics of each point.
Determining a prediction probability value corresponding to each category corresponding to each point based on the target characteristics corresponding to each point; and determining the prediction category corresponding to each point based on the prediction probability values corresponding to the categories.
In this embodiment, when extracting the target feature corresponding to each point, the local enhancement feature of each point is extracted based on the geometric attention fusion module included in the convolutional neural network, and then the target feature of each point is extracted and obtained based on the focusing module included in the convolutional neural network and the local enhancement feature of each point. The target characteristics of each point extracted based on the method comprise important geometric information corresponding to each point, so that the extracted target characteristics of each point are more accurate and effective, and the obtained prediction result is very accurate when the category is predicted according to the target characteristics of each point.
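For orientation, the inference flow of S101 to S103 can be sketched as follows; the stand-in model, point layout and class count are illustrative assumptions, not details of this application.

    import torch
    import torch.nn as nn

    # Stand-in for the trained network; a real model would contain the geometric
    # attention fusion module and the focusing module described above.
    model = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 13))

    points = torch.rand(4096, 6)                 # S101: N points, xyz + rgb (assumed layout)
    model.eval()
    with torch.no_grad():
        logits = model(points)                   # S102: per-point target features -> class scores
        probs = torch.softmax(logits, dim=-1)    # prediction probability per category
        pred_class = probs.argmax(dim=-1)        # S103: prediction category per point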
Fig. 3 is a schematic flow chart of a method for processing a three-dimensional point cloud according to another embodiment of the present application, which mainly relates to a possible implementation manner of extracting a local enhancement feature of each point based on a geometric attention fusion module. Referring to fig. 3, the method includes:
s201: for each point in the point cloud data, acquiring a neighboring point of the point in Euclidean space based on the geometric attention fusion module, and determining the neighboring point of the point in feature value space based on the neighboring point of the point in Euclidean space.
For each point in the point cloud data, a K-nearest-neighbor query algorithm acquires the point's neighbor points in Euclidean space, and a feature value graph structure is determined based on those neighbor points; a three-dimensional structure tensor is determined from the feature value graph structure and decomposed to obtain an eigenvalue matrix, and the point's neighbor points in the feature value (eigenvalue) space are determined based on the eigenvalue matrix. Optionally, an eigenvalue tuple (λ1, λ2, λ3) is computed from the original coordinates of each point and used as the input feature of that point.
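The structure-tensor step above can be illustrated with a short sketch. This is not the implementation from this application: the neighborhood size K, the use of the neighborhood covariance matrix as the three-dimensional structure tensor, and the PyTorch calls are all assumptions.

    import torch

    def eigenvalue_features(xyz: torch.Tensor, k: int = 16) -> torch.Tensor:
        """xyz: (N, 3) point coordinates -> (N, 3) eigenvalue tuples (descending)."""
        dist = torch.cdist(xyz, xyz)                      # pairwise Euclidean distances
        knn_idx = dist.topk(k, largest=False).indices     # K nearest neighbors per point
        neighbors = xyz[knn_idx]                          # (N, K, 3) grouped neighbors
        centered = neighbors - neighbors.mean(dim=1, keepdim=True)
        cov = centered.transpose(1, 2) @ centered / k     # (N, 3, 3) structure tensor
        return torch.linalg.eigvalsh(cov).flip(-1)        # (lambda1, lambda2, lambda3)

    xyz = torch.rand(1024, 3)
    lam = eigenvalue_features(xyz)                        # input feature of each point
    # Neighbor points in the feature value space: KNN over the eigenvalue tuples.
    eig_knn_idx = torch.cdist(lam, lam).topk(16, largest=False).indices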
S202: and fusing the neighbor points of the points in the Euclidean space and the neighbor points of the points in the characteristic value space to obtain the corresponding local characteristics of the points.
The local feature corresponding to the point can be obtained by fusing the point's neighbor points in Euclidean space with its neighbor points in the feature value space through the following formula (1):

$f_i^{loc} = \mathcal{M}\big(\hat{f}_i^{E} \oplus \hat{f}_i^{\lambda}\big) \qquad (1)$

In formula (1), $f_i^{loc}$ denotes the local feature corresponding to each point; $\mathcal{M}(\cdot)$ is a set of learnable non-linear functions, which in the geometric attention fusion module of this embodiment is a two-layer two-dimensional convolution; $\oplus$ denotes the cascade (concatenation) operation; the former term $\hat{f}_i^{E}$ represents features in Euclidean space, and the latter term $\hat{f}_i^{\lambda}$ represents features in the feature value space.
S203: and aggregating the local features corresponding to the points to obtain local enhancement features corresponding to the points.
For each point, the local features computed in S202 are aggregated into a single vector $\tilde{f}_i$, i.e. the local enhancement feature corresponding to the point.
Optionally, in a possible implementation manner, the local features corresponding to the point may be aggregated based on the attention pooling manner, so as to obtain the local enhancement features corresponding to the point.
Specifically, the local features corresponding to the point can be aggregated through the following formula (2) to obtain the local enhancement feature corresponding to the point:

$\tilde{f}_i = \mathcal{A}\big(f_i^{loc}\big) \qquad (2)$

In formula (2), $\tilde{f}_i$ denotes the local enhancement feature corresponding to each point, and $\mathcal{A}(\cdot)$ is a set of learnable non-linear functions.
To facilitate understanding of how the geometric attention fusion module extracts the local enhancement feature of each point, please refer to fig. 4, a schematic diagram of the geometric attention fusion module provided in the present application. As shown in fig. 4, the geometric attention fusion module, which may also be referred to as the geometry-based attention fusion module, takes as input the point coordinates, point features and feature roots (eigenvalues) of each point. It performs a K-nearest-neighbor search in the feature root space based on the feature roots, and a K-nearest-neighbor search in Euclidean space based on the point coordinates and point features, obtaining the point's neighbor points in Euclidean space and its neighbor points in the feature value space. The neighbor points from the two spaces are fused to obtain the local feature corresponding to the point, and the local features are then processed by a multilayer perceptron with dot-product and summation to obtain the local enhancement feature corresponding to the point. That is, in the geometry-based attention fusion module, the inputs are point-wise coordinates, point features and feature roots; features in the feature value space and the Euclidean space are aggregated, and attention pooling then generates the output feature of each point.
In this embodiment, to better describe each point, we enhance local features with the feature values at each point; that is, the local enhancement feature of each point is extracted based on the geometric attention fusion module, effectively retaining the geometric information of each point in space. The geometric attention fusion module captures the most important geometric information of each point and fuses it effectively, which is conducive to the subsequent, accurate extraction of the target feature of each point from its local enhancement feature.
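How the module might be realized can be sketched as follows, combining formulas (1) and (2); the linear layers standing in for the two-layer two-dimensional convolution, the attention scoring, and all dimensions are assumptions, not details taken from this application.

    import torch
    import torch.nn as nn

    class GeometricAttentionFusion(nn.Module):
        """Sketch: fuse features grouped from the Euclidean and feature value
        spaces (formula (1)), then aggregate with attention pooling (formula (2))."""
        def __init__(self, in_dim: int, out_dim: int):
            super().__init__()
            self.mlp = nn.Sequential(                  # learnable non-linear function M(.)
                nn.Linear(2 * in_dim, out_dim), nn.ReLU(),
                nn.Linear(out_dim, out_dim))
            self.score = nn.Linear(out_dim, out_dim)   # attention weights for pooling

        def forward(self, feat_euc: torch.Tensor, feat_eig: torch.Tensor) -> torch.Tensor:
            # feat_euc, feat_eig: (N, K, C) features grouped from the two spaces
            local = self.mlp(torch.cat([feat_euc, feat_eig], dim=-1))  # formula (1)
            attn = torch.softmax(self.score(local), dim=1)             # over the K neighbors
            return (attn * local).sum(dim=1)                           # formula (2): (N, C')

    gaf = GeometricAttentionFusion(in_dim=32, out_dim=64)
    out = gaf(torch.rand(1024, 16, 32), torch.rand(1024, 16, 32))      # (1024, 64)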
Fig. 5 is a schematic flow chart of a method for processing a three-dimensional point cloud according to another embodiment of the present application, and mainly relates to a possible implementation manner of extracting a target feature of each point based on a local enhanced feature of each point. Referring to fig. 5, the method includes:
s301: and carrying out local difference on each point based on the local enhancement features of each point to obtain the local difference of each point.
S302: and determining the non-resolvable point in the plurality of points according to the corresponding local difference of each point.
S303: and extracting the target characteristics corresponding to each indistinguishable point by adopting a multilayer perceptron.
The obtained plurality of points include indistinguishable points, i.e., points among the plurality of points whose prediction category is not easily determined: the points in the hard-to-segment areas of the complex point cloud scene shown in fig. 1, namely complex boundary regions, confusable interior regions, and isolated small regions.
Illustratively, the focusing module, which may also be referred to as an Indistinguishable Area Focusing (IAF) module, can adaptively select indistinguishable points and enhance the characteristics of each point.
The IAF module is a new indistinguishable-region model based on hierarchical semantic features, which can adaptively select indistinguishable points. To enhance the features of the indistinguishable points, the IAF module first acquires their fine-grained features and high-level semantic features, and then enhances these features through non-local operations between the indistinguishable points and the corresponding whole point set.
In order to adaptively find the indistinguishable points in the training process, the indistinguishable points can be mined by using low-level geometric information and high-level semantic information.
Local difference refers to the difference between each point and its neighbors. The local difference reflects, to a certain extent, how distinctive each point is; it depends on low-level geometric features, the latent space, and high-level semantic features. We therefore use local differences as the criterion to mine the indistinguishable points. For each point p_i, we get its K neighbors in Euclidean space, compute the local difference of each point in each layer as given below, and accumulate these local differences together. We sort the points in descending order of the accumulated local difference, and then choose a portion of the points with larger local differences as the indistinguishable points, corresponding to the three regions of points mentioned before. These indistinguishable points change dynamically as the network is iteratively updated: at the beginning of training, the indistinguishable points are distributed over areas where the original attributes (coordinates and color) change rapidly, and as training progresses, they move to the indistinguishable regions mentioned in the introduction. We aggregate the intermediate features and the label predictions of the indistinguishable points, and then use a multilayer perceptron to extract the features of the indistinguishable points separately. To enhance the features of these points, the present application implicitly enhances the features of the indistinguishable points by updating the features of all points with a non-local mechanism, using the equations below. In addition, we compute the prediction output of the current layer.
Illustratively, the local difference of each point is obtained from its local enhancement features by formula (3), which computes, at each layer $l$, the difference between the point and its K Euclidean neighbors over the low-level geometric features, the latent features and the high-level semantic features:

$LD_i^{l} = \sum_{j \in \mathcal{N}(i)} \lVert f_i^{l} - f_j^{l} \rVert \qquad (3)$

where $\mathcal{N}(i)$ is the set of K nearest Euclidean neighbors of point $p_i$. These local differences are then accumulated together by formula (4):

$LD_i = \sum_{l} LD_i^{l} \qquad (4)$

We sort the points in descending order of $LD_i$ and then select the uppermost portion of points as the indistinguishable points.
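Under this reading, the selection step can be sketched as follows; the L2 norm as the per-layer difference measure and the 25% selection ratio are assumptions.

    import torch

    def select_indistinguishable(feats_per_layer, knn_idx, ratio: float = 0.25):
        """feats_per_layer: list of (N, C_l) features (geometric / latent / semantic);
        knn_idx: (N, K) Euclidean neighbor indices -> indices of indistinguishable points."""
        n = knn_idx.shape[0]
        ld_total = torch.zeros(n)
        for f in feats_per_layer:
            diff = f[knn_idx] - f.unsqueeze(1)          # (N, K, C_l) point-vs-neighbor
            ld_total += diff.norm(dim=-1).sum(dim=1)    # formula (3), accumulated as in (4)
        order = ld_total.argsort(descending=True)       # descending by local difference
        return order[: int(ratio * n)]                  # top portion = indistinguishable set

    feats = [torch.rand(512, c) for c in (9, 64, 128)]  # illustrative layer widths
    knn_idx = torch.randint(0, 512, (512, 16))          # stand-in neighbor indices
    hard_idx = select_indistinguishable(feats, knn_idx)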
Fig. 6 is a schematic flow chart of a method for processing a three-dimensional point cloud according to still another embodiment of the present application, which mainly relates to a possible implementation manner of extracting a target feature corresponding to each unresolvable point by using a multi-layer sensor. Referring to fig. 6, the method includes:
s401: and acquiring a prediction label corresponding to each indistinguishable point, and acquiring an intermediate feature corresponding to each indistinguishable point.
The prediction label corresponding to each indistinguishable point can be obtained by formula (5), which derives the label prediction of the point from its features at the current layer.
S402: and for each indistinguishable point, aggregating the predicted labels and the intermediate features corresponding to the indistinguishable points to obtain an aggregation result corresponding to the indistinguishable points.
S403: and extracting the target characteristics corresponding to each indistinguishable point by adopting a multilayer perceptron based on the aggregation result corresponding to each indistinguishable point.
For each indistinguishable point, the predicted label and the intermediate feature corresponding to that point can be aggregated through the following formula (6) to obtain the aggregation result corresponding to the point. That is, we aggregate the intermediate features and the label predictions of the indistinguishable points, and then extract the features of the indistinguishable points separately using the multilayer perceptron:

$\hat{f}_j = \mathrm{MLP}\big(f_j^{inter} \oplus \hat{y}_j\big), \quad j \in M_{l-1} \qquad (6)$

where $j \in M_{l-1}$ indicates that the point belongs to the indistinguishable point set, $f_j^{inter}$ is its intermediate feature, $\hat{y}_j$ its label prediction, and $\oplus$ denotes concatenation.
In order to enhance the features of the points, particularly those of the indistinguishable points, the features of all points are updated through formula (7) using a non-local mechanism between each point and the indistinguishable point set, thereby implicitly enhancing the features of the indistinguishable points.
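A sketch of formulas (6) and (7) under the description above is given below; the concatenation order, the scaled dot-product form of the non-local update and all dimensions are assumptions rather than the application's exact formulation.

    import torch
    import torch.nn as nn

    def enhance_indistinguishable(all_feat, inter_feat, pred, idx, mlp):
        # all_feat, inter_feat: (N, C); pred: (N, num_classes); idx: indices of M_{l-1}
        hard = mlp(torch.cat([inter_feat[idx], pred[idx]], dim=-1))        # formula (6): (M, C)
        attn = torch.softmax(all_feat @ hard.t() / all_feat.shape[1] ** 0.5, dim=-1)
        return all_feat + attn @ hard                                      # formula (7): non-local update

    mlp = nn.Sequential(nn.Linear(64 + 13, 64), nn.ReLU(), nn.Linear(64, 64))
    feats = torch.rand(1024, 64)
    preds = torch.softmax(torch.rand(1024, 13), dim=-1)
    hard_idx = torch.topk(torch.rand(1024), k=256).indices   # stand-in indistinguishable set
    enhanced = enhance_indistinguishable(feats, feats, preds, hard_idx, mlp)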
To facilitate understanding of the processing procedure of the focusing module, please refer to fig. 7, a schematic diagram of the focusing module provided in the present application. As shown in fig. 7, the focusing module may also be called the indistinguishable-area focusing processing module. The input features, the features from the corresponding encoding layer, and the previous layer's predicted values are processed by upsampling, multilayer-perceptron learning and other operations; finally, the indistinguishable points and their corresponding target features are extracted, and the prediction output of the current layer is computed.
In this embodiment, a new indistinguishable-area-focused network (IAF-Net) is proposed that adaptively selects indistinguishable points using hierarchical semantic features and enhances the fine-grained features of points, particularly those indistinguishable points. We also introduce multi-stage losses to improve the feature representation in a progressive manner; in terms of network design, a cascade structure is adopted to learn the geometric features of the point cloud data progressively.
Optionally, in a possible implementation, the present application further provides a new evaluation criterion for indistinguishable areas. Whether the prediction category corresponding to each indistinguishable point is accurate can be evaluated based on a preset metric; when the number of indistinguishable points with accurate prediction categories does not meet a preset threshold, training of the convolutional neural network continues.
Specifically, in order to better distinguish the effects of different methods on three-dimensional semantic segmentation, a new evaluation method based on an indistinguishable-point metric is proposed. This evaluation index focuses on the effectiveness of a segmentation method on indistinguishable regions. For the entire point cloud $P = \{p_1, p_2, \ldots, p_N\}$, we have prediction data $Pred = \{Z_i, 1 \le i \le N\}$ and ground-truth values $Label = \{Z_{i,gt}, 1 \le i \le N\}$.

For every point $p_i$ satisfying $Z_i \ne Z_{i,gt}$, take the predictions of its K neighbors in Euclidean space, $\{Z_{i,j}, 1 \le j \le K\}$, and count the number $m_i$ of those neighbors whose predictions are also incorrect. The ratios $m_i/K$ are then divided by thresholds $0 < \zeta_1 < \zeta_2 < 1$ into three parts S1, S2 and S3, and the accuracy on each part is used as the new evaluation criterion, corresponding to the segmentation performance on the three kinds of indistinguishable areas.
For ease of understanding, please refer to fig. 8, which shows the evaluation process of the new evaluation criterion for indistinguishable areas provided by the present application.
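The metric can be sketched as follows; the threshold values for ζ1 and ζ2 are placeholders, since the application does not fix them here.

    import torch

    def ipbm_buckets(pred, gt, knn_idx, zeta1: float = 0.3, zeta2: float = 0.7):
        """pred, gt: (N,) labels; knn_idx: (N, K) Euclidean neighbors ->
        boolean masks S1, S2, S3 over the mispredicted points."""
        wrong = pred != gt                                   # Z_i != Z_{i,gt}
        neighbor_wrong = wrong[knn_idx].float().mean(dim=1)  # m_i / K per point
        r = neighbor_wrong[wrong]                            # only mispredicted points
        return r <= zeta1, (r > zeta1) & (r <= zeta2), r > zeta2

    pred, gt = torch.randint(0, 13, (2048,)), torch.randint(0, 13, (2048,))
    knn_idx = torch.randint(0, 2048, (2048, 16))
    s1, s2, s3 = ipbm_buckets(pred, gt, knn_idx)             # the three parts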
Optionally, in a possible implementation manner, determining a prediction category corresponding to each point based on the target feature corresponding to each point includes: and determining the prediction probability value corresponding to each category corresponding to each indistinguishable point based on the target characteristics corresponding to each indistinguishable point. And determining the prediction category corresponding to each indistinguishable point based on the prediction probability value corresponding to each category.
Illustratively, please refer to fig. 9, and fig. 9 is a semantic segmentation network for a large complex scene point cloud provided by the present application. The semantic segmentation network comprises a feature extraction unit and a segmentation unit.
In the feature extraction unit, we use a hierarchical structure to learn features at each level. The network takes N points as input and extracts the features of the point cloud using the geometric attention fusion module and the indistinguishable-area focusing processing module described above.
For segmentation, the network connects each level and then computes the class prediction probability corresponding to each point in the point cloud. Illustratively, the segmentation unit determines the prediction probability values for the categories corresponding to each indistinguishable point, and determines the prediction category corresponding to each indistinguishable point based on those probability values. For example, if, for a certain indistinguishable point, the predicted probability value for the category "table" is 0.6 and that for the category "book" is 0.9, the prediction category of that point is "book". The description is given for illustrative purposes only and is not intended to be limiting.
In the above embodiment, the indistinguishable points include points located on complex boundaries, points with similar local textures but different categories, and points in isolated small hard regions, which greatly affect the performance of three-dimensional semantic segmentation.
To solve this problem, we propose a new indistinguishable area-focused network (IAF-Net) that adaptively selects indistinguishable points using hierarchical semantic features and enhances the fine-grained features of the points, especially those that are indistinguishable. We also introduce multi-stage losses to improve the characterization in a progressive manner. In addition, in order to analyze the segmentation performance of the indistinguishable region, a new indistinguishable point-based metric method (IPBM) is proposed. Our IAF-Net achieves comparable results to the latest performance on some popular 3D point cloud datasets (such as S3DIS and ScanNet) and is significantly better than other approaches on IPBM.
By constructing a point cloud convolutional neural network that shares local geometric information, the present application processes point cloud data directly, without converting it into other complex data formats, which reduces memory occupation and computer resource consumption and allows rich feature data to be extracted more quickly. It is also more conducive to exploring the geometric features of the overall structure of point cloud edge contours, thereby improving the precision of classification and segmentation tasks.
Optionally, before the point cloud data is input into the trained convolutional neural network to obtain the target feature corresponding to each point, the method for processing a three-dimensional point cloud provided by the present application further includes: acquiring a training set and a test set, wherein the training set includes sample point cloud data of a plurality of sample points, and the test set includes the sample feature and sample category corresponding to each sample point; training the initial convolutional neural network on the training set to obtain a convolutional neural network in training; validating the convolutional neural network in training based on the test set; when the validation result does not meet a preset condition, adjusting the network parameters of the convolutional neural network in training and continuing to train it on the training set; and when the validation result meets the preset condition, stopping training and taking the resulting network as the trained convolutional neural network.
When the training set and the test set are obtained, the device can collect sample point cloud data of a plurality of sample points, or other devices can transmit the collected sample point cloud data to the device. Optionally, whether the point cloud data is acquired by the device or transmitted to it after being acquired by other devices, the point cloud data can be enhanced by rotating points in the point cloud data and/or perturbing the coordinates of points within a predetermined range around them, and/or by randomly deleting points in the point cloud data. Illustratively, a random probability is generated according to a preset maximum random probability, and points in the point cloud data are deleted according to the generated probability. Experiments show that this data enhancement method improves the generalization ability of convolutional neural network learning, further improving accuracy on the test set (point cloud data not used during training).
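The enhancement operations described above can be sketched as follows; the jitter scale and the maximum drop probability are assumed values.

    import math
    import random
    import torch

    def augment(xyz: torch.Tensor, jitter: float = 0.01, max_drop: float = 0.1) -> torch.Tensor:
        theta = random.uniform(0.0, 2.0 * math.pi)            # random rotation angle
        c, s = math.cos(theta), math.sin(theta)
        rot = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        xyz = xyz @ rot.t()                                   # rotate the points
        xyz = xyz + torch.randn_like(xyz) * jitter            # perturb coordinates slightly
        p_drop = random.uniform(0.0, max_drop)                # probability from preset maximum
        keep = torch.rand(xyz.shape[0]) >= p_drop             # random point deletion
        return xyz[keep]

    augmented = augment(torch.rand(4096, 3))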
Before training, the collected three-dimensional point cloud data can be manually classified and screened by category to complete preliminary data preparation. A first portion of the classified point cloud data may be used to train the convolutional neural network; a second portion may be used as validation data to evaluate it. For example, 90% of the data of each category of the three-dimensional point cloud is selected as training data for network training, and the remaining 10% is reserved as experimental validation data for the later evaluation of the model's recognition accuracy and generalization ability.
Illustratively, please refer to fig. 10, fig. 10 is a process of adaptive variation of an unresolvable point in a training process provided by the present application. At the beginning of training, the indistinguishable points are distributed over regions where the original properties (coordinates and color) change rapidly. As the training process progresses, the indistinguishable points are located in the indistinguishable regions mentioned in the introduction.
Optionally, after extracting the features of the point cloud data, the method further comprises: after the geometric feature information is processed by several deconvolution modules, the geometric features of the point cloud can be extracted using a max-K pooling operation for subsequent classification, segmentation or registration. Suppose the features obtained by the multilayer convolution modules form an N x M matrix, where N is the number of points and M is the dimension of each point feature; max-K pooling takes the largest K values of the i-th feature dimension over the N points, finally yielding a K x M global feature vector for the point cloud. The output features of each convolution layer may be combined for the max pooling operation and finally passed through fully connected layers. In addition, the cross-entropy function can be used as the loss function, and the back-propagation algorithm can be used to train and optimize the model. For the segmentation task, on the basis of the obtained global features, the global features and the object class information of the point cloud are combined with the local features of each point to form higher-dimensional point-wise features; after these local features are extracted, segmentation prediction is carried out using the prediction probabilities of object segmentation parts obtained by a multilayer perceptron and a normalized exponential (softmax) function.
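The max-K pooling operation itself is direct; a sketch, with K = 4 chosen arbitrarily:

    import torch

    def max_k_pooling(features: torch.Tensor, k: int = 4) -> torch.Tensor:
        # features: (N, M) per-point features -> (K, M) global feature of the cloud
        return features.topk(k, dim=0).values

    feats = torch.rand(1024, 256)
    global_feat = max_k_pooling(feats)        # (4, 256), flattened before the FC layers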
The method comprises designing a convolutional neural network structure for three-dimensional point cloud classification and segmentation, adjusting the network parameters of the neural network (including but not limited to the learning rate and batch size), and adopting different learning strategies to drive the convolutional neural network to converge toward the optimal network model; finally, the trained network model is tested on the validation data to realize point cloud classification and segmentation. In addition, the geometric-information unwrapping convolution designed by the invention is a module within the neural network that can directly extract features with large and small geometric variations from signals distributed on the point cloud, and can therefore be combined with other modules of a neural network. The numbers of input and output channels and the combination of output channels can be altered to achieve the best results on different tasks, and different neural network structures can be designed using the geometric feature information sharing module.
In addition, experimental verification shows that the point-cloud-oriented feature extraction method described in this application handles the scene segmentation task on large-scale point cloud data (S3DIS, ScanNet). Compared with the current internationally advanced methods, the mIoU on Area-5 is 64.6% and the 6-fold result is 70.3%, so the method has a leading advantage in performance.
The method can be applied to scene segmentation tasks and three-dimensional scene reconstruction tasks in the fields of unmanned driving and robot vision. Referring to fig. 11, fig. 11 provides an application scenario diagram for the present application. Fig. 11 mainly illustrates the application of the present invention to the scene segmentation task of unmanned vehicle and robot vision. By analyzing and processing the three-dimensional point cloud obtained from the scan, the class and location of the object can be obtained, which is the basis for other tasks in this field.
Illustratively, the method for processing the three-dimensional point cloud can be used for the scene segmentation task of an unmanned intelligent robot. First, point cloud data of a scene is collected by a depth camera, and the object categories in the scene point cloud data are labeled. Local features of the point cloud are extracted by the geometric-sharing-based convolutional neural network and used for pixel-level classification; this constitutes the training for scene segmentation. In actual use, the depth camera collects point cloud data of the real scene, the trained neural network extracts local features of the point cloud, and the scene is then segmented. The segmentation results (i.e., the different objects in the scene) are returned to the unmanned vehicle (or intelligent robot) for data storage and further analysis.
Alternatively, in practical applications, the input features may be changed according to different tasks, for example, the input features are replaced or combined by the distance between a point and a nearby point, the color information of the point, the combination of feature vectors, and the local shape context information of the point.
Optionally, the non-partitioned area focusing module in the network is a portable point cloud feature learning module, and can be applied to other tasks related to point clouds, such as three-dimensional point cloud completion, three-dimensional point cloud detection, and the like, as a feature extractor.
Referring to fig. 12, fig. 12 is a schematic view illustrating an apparatus for processing a three-dimensional point cloud according to an embodiment of the present disclosure. The apparatus comprises units for performing the steps in the embodiments corresponding to fig. 2, 3, 5 and 6; please refer to the related descriptions in those embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 12, the apparatus includes:
an acquisition unit 510 for acquiring point cloud data including a plurality of points;
a processing unit 520, configured to input the point cloud data into a trained convolutional neural network for processing, so as to obtain a target feature corresponding to each point, where the convolutional neural network includes a geometric attention fusion module and a focusing module, the geometric attention fusion module is configured to extract a local enhancement feature of each point, and the focusing module is configured to extract a target feature of each point based on the local enhancement feature of each point;
a determining unit 530, configured to determine a prediction category corresponding to each point based on the target feature corresponding to each point.
Optionally, the processing unit 520 is specifically configured to:
for each point in the point cloud data, acquiring the neighboring points of the point in Euclidean space based on the geometric attention fusion module, and determining the neighboring points of the point in feature value space based on the neighboring points in Euclidean space;
fusing the neighboring points of the point in Euclidean space and the neighboring points of the point in feature value space to obtain the local feature corresponding to the point;
and aggregating the local features corresponding to each point to obtain the local enhancement feature corresponding to each point.
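A minimal PyTorch sketch of this dual-neighborhood step is given below. It assumes brute-force nearest-neighbor search, that feature-space neighbors are re-selected from among the Euclidean candidates, and that fusion is plain concatenation; all of these are simplifying assumptions, not the application's exact procedure:

```python
import torch

def dual_space_local_features(xyz, feats, k=16):
    """Gather each point's neighbors in Euclidean and feature space, then fuse.

    xyz: (N, 3) coordinates; feats: (N, D) per-point features.
    Returns an (N, k + k // 2, D) tensor of fused local features.
    """
    # k nearest neighbors in Euclidean space (brute force)
    eu_idx = torch.cdist(xyz, xyz).topk(k, largest=False).indices   # (N, k)

    # among those candidates, re-rank by feature-space distance
    cand = feats[eu_idx]                                            # (N, k, D)
    d_feat = (cand - feats.unsqueeze(1)).norm(dim=-1)               # (N, k)
    fe_idx = torch.gather(eu_idx, 1, d_feat.argsort(dim=-1)[:, : k // 2])

    # fuse the two neighborhoods by concatenating their features
    return torch.cat([feats[eu_idx], feats[fe_idx]], dim=1)
```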
Optionally, the processing unit 520 is further configured to:
and aggregating the local features corresponding to each point in an attention-pooling manner to obtain the local enhancement feature corresponding to each point.
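One common form of attention pooling is sketched below to show what such aggregation could look like; the single linear scoring layer is an illustrative choice rather than the application's specified design:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Aggregate a point's local feature set with learned attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, dim, bias=False)   # per-channel scores

    def forward(self, local):                           # local: (N, K, D)
        # softmax over the K neighbors, then a weighted sum per point
        attn = torch.softmax(self.score(local), dim=1)  # (N, K, D)
        return (attn * local).sum(dim=1)                # (N, D)
```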
Optionally, the plurality of points includes indistinguishable points, i.e., points among the plurality of points whose prediction category is not easily determined; the processing unit 520 is further configured to:
computing a local difference for each point based on the local enhancement feature of each point, to obtain the local difference corresponding to each point;
determining the indistinguishable points among the plurality of points according to the local difference corresponding to each point;
and extracting the target feature corresponding to each indistinguishable point using a multilayer perceptron.
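A minimal sketch of selecting such points follows; taking the mean feature distance to the neighbors as the "local difference" and a fixed top fraction as the selection rule are both assumptions made for illustration:

```python
import torch

def select_indistinguishable(feats, knn_idx, ratio=0.1):
    """Pick the points whose features differ most from their neighborhood.

    feats: (N, D) locally enhanced features; knn_idx: (N, K) neighbor indices.
    Returns indices of the top `ratio` fraction of points by local difference.
    """
    neigh = feats[knn_idx]                                    # (N, K, D)
    diff = (neigh - feats.unsqueeze(1)).norm(dim=-1).mean(1)  # (N,)
    n_hard = max(1, int(ratio * feats.shape[0]))
    return diff.topk(n_hard).indices
```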
Optionally, the processing unit 520 is further configured to:
acquiring the prediction label corresponding to each indistinguishable point and the intermediate feature corresponding to each indistinguishable point;
for each indistinguishable point, aggregating the prediction label and the intermediate feature corresponding to that point, to obtain the aggregation result corresponding to that point;
and extracting the target feature corresponding to each indistinguishable point using a multilayer perceptron, based on the aggregation result corresponding to each indistinguishable point.
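This step is sketched below under stated assumptions: the prediction label is carried as class probabilities, aggregation is concatenation, and the multilayer perceptron is two linear layers with a ReLU; the class name and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class HardPointRefiner(nn.Module):
    """Refine the features of the indistinguishable points with a small MLP."""
    def __init__(self, feat_dim, num_classes, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + num_classes, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, inter_feats, pred_probs, hard_idx):
        # aggregate each hard point's intermediate feature with its
        # predicted label (as probabilities), then extract target features
        agg = torch.cat([inter_feats[hard_idx], pred_probs[hard_idx]], dim=-1)
        return self.mlp(agg)
```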
Optionally, the determining unit 530 is specifically configured to:
determining, based on the target feature corresponding to each indistinguishable point, a prediction probability value for each category for that point;
and determining the prediction category corresponding to each indistinguishable point based on the prediction probability values of the categories.
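In code, this last step could reduce to a softmax over per-class scores followed by an argmax, as in the sketch below; `classifier` is any module producing per-class logits and is an assumption, not a component specified by the application:

```python
import torch

def predict_categories(target_feats, classifier):
    """Map target features to per-class probabilities, then to categories."""
    probs = torch.softmax(classifier(target_feats), dim=-1)  # (M, num_classes)
    return probs, probs.argmax(dim=-1)                       # probs, categories
```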
Optionally, the apparatus further comprises:
the system comprises a sample acquisition unit, a data acquisition unit and a data acquisition unit, wherein the sample acquisition unit is used for acquiring a training set and a test set, the training set comprises sample point cloud data of a plurality of sample points, and the test set comprises sample characteristics and sample categories corresponding to each sample point;
the first training unit is used for training the initial convolutional neural network through the training set to obtain a convolutional neural network in training;
a validation unit for validating the convolutional neural network in training based on the test set;
the adjusting unit is used for adjusting the network parameters of the convolutional neural network in the training when the verification result does not meet the preset condition, and continuing to train the convolutional neural network in the training based on the training set;
and the second training unit, configured to stop training the convolutional neural network when the verification result meets the preset condition, and to take the convolutional neural network at the end of training as the trained convolutional neural network.
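A minimal sketch of this train-validate-adjust loop is shown below; the accuracy-style score, the threshold standing in for the "preset condition", and both helper names are illustrative assumptions:

```python
import torch

def evaluate(model, loader):
    """Hypothetical validation helper: fraction of points classified correctly."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for points, labels in loader:
            pred = model(points).argmax(dim=-1)
            correct += (pred == labels).sum().item()
            total += labels.numel()
    return correct / max(total, 1)

def train_until_condition(model, opt, loss_fn, train_loader, test_loader,
                          target_score=0.9, max_epochs=100):
    """Train and adjust parameters until the verification result meets
    the preset condition (here, a validation-score threshold)."""
    for epoch in range(max_epochs):
        model.train()
        for points, labels in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(points), labels)  # shapes are illustrative
            loss.backward()
            opt.step()                             # adjust network parameters
        if evaluate(model, test_loader) >= target_score:
            break                                  # preset condition met
    return model
```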
Optionally, the apparatus further comprises:
the evaluation unit, configured to evaluate, based on a preset measurement method, whether the prediction category corresponding to each indistinguishable point is accurate;
and the third training unit, configured to continue training the convolutional neural network when it is detected that the number of indistinguishable points with accurate prediction categories does not meet a preset threshold.
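As a sketch of one possible "preset measurement method", where the accuracy criterion and the threshold value are assumptions for illustration:

```python
import torch

def hard_points_accurate(pred, gt, hard_idx, threshold=0.8):
    """Share of indistinguishable points whose predicted category matches
    the ground truth; below `threshold`, training should continue."""
    acc = (pred[hard_idx] == gt[hard_idx]).float().mean().item()
    return acc >= threshold
```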
Referring to fig. 13, fig. 13 is a schematic diagram of an apparatus for processing a three-dimensional point cloud according to another embodiment of the present disclosure. As shown in fig. 13, the apparatus 6 for processing a three-dimensional point cloud of this embodiment includes: a processor 60, a memory 61, and computer instructions 62 stored in the memory 61 and executable on the processor 60. The processor 60, when executing the computer instructions 62, implements the steps in the various method embodiments of processing a three-dimensional point cloud described above, such as S101-S103 shown in fig. 2. Alternatively, the processor 60, when executing the computer instructions 62, implements the functions of the units in the embodiments described above, such as the functions of the units 510 to 530 shown in fig. 12.
Illustratively, the computer instructions 62 may be divided into one or more units that are stored in the memory 61 and executed by the processor 60 to accomplish the present application. The one or more units may be a series of computer instruction segments capable of performing specific functions, which are used to describe the execution of the computer instructions 62 in the apparatus 6 for processing a three-dimensional point cloud. For example, the computer instructions 62 may be divided into an acquisition unit, a processing unit, and a determination unit, each unit functioning specifically as described above.
The apparatus for processing the three-dimensional point cloud may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 13 is merely an example of the apparatus 6 for processing a three-dimensional point cloud and does not constitute a limitation on it; the apparatus may include more or fewer components than those shown, combine some components, or use different components. For example, the apparatus for processing the three-dimensional point cloud may also include input/output devices, network access devices, buses, etc.
The Processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the apparatus for processing the three-dimensional point cloud, such as a hard disk or memory of the apparatus. The memory 61 may also be an external storage device of the apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the apparatus. Further, the memory 61 may include both an internal storage unit and an external storage device of the apparatus. The memory 61 is used to store the computer instructions and the other programs and data required by the apparatus, and may also be used to temporarily store data that has been output or is to be output.
The embodiment of the present application further provides a computer storage medium, which may be nonvolatile or volatile and stores a computer program; when executed by a processor, the computer program implements the steps in the above-mentioned method for processing a three-dimensional point cloud.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (11)

1. A method of processing a three-dimensional point cloud, comprising:
acquiring point cloud data comprising a plurality of points;
inputting the point cloud data into a trained convolutional neural network for processing to obtain target features corresponding to each point, wherein the convolutional neural network comprises a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting local enhancement features of each point, and the focusing module is used for extracting the target features of each point based on the local enhancement features of each point;
and determining the prediction category corresponding to each point based on the target characteristics corresponding to each point.
2. The method of claim 1, wherein said extracting the local enhancement features for each of said points comprises:
for each point in the point cloud data, acquiring a neighboring point of the point in Euclidean space based on the geometric attention fusion module, and determining the neighboring point of the point in feature value space based on the neighboring point of the point in Euclidean space;
fusing the neighboring points of the point in Euclidean space and the neighboring points of the point in feature value space to obtain the local feature corresponding to the point;
and aggregating the local features corresponding to the points to obtain local enhancement features corresponding to the points.
3. The method of claim 2, wherein said aggregating the local features corresponding to the points to obtain local enhancement features corresponding to the points comprises:
and aggregating the local features corresponding to the points in an attention-pooling manner to obtain the local enhancement features corresponding to the points.
4. The method of claim 1, wherein the plurality of points includes indistinguishable points, an indistinguishable point being a point of the plurality of points for which a prediction category is not readily determinable, and wherein extracting the target feature for each of the points based on the locally enhanced feature of each of the points comprises:
computing a local difference for each point based on the local enhancement feature of each point, to obtain the local difference corresponding to each point;
determining the indistinguishable points in the plurality of points according to the local difference corresponding to each point;
and extracting the target characteristics corresponding to each indistinguishable point by adopting a multilayer perceptron.
5. The method of claim 4, wherein extracting the target feature corresponding to each unresolvable point using the multi-layered perceptron comprises:
acquiring a prediction label corresponding to each indistinguishable point and acquiring an intermediate feature corresponding to each indistinguishable point;
for each indistinguishable point, aggregating the predicted labels and the intermediate features corresponding to the indistinguishable points to obtain an aggregation result corresponding to the indistinguishable points;
and extracting the target characteristics corresponding to each indistinguishable point by adopting a multilayer perceptron based on the aggregation result corresponding to each indistinguishable point.
6. The method of claim 4, wherein determining the prediction category for each point correspondence based on the target feature for each point correspondence comprises:
determining, based on the target feature corresponding to each indistinguishable point, a prediction probability value for each category for that point;
and determining the prediction category corresponding to each indistinguishable point based on the prediction probability value corresponding to each category.
7. The method of any one of claims 1 to 6, wherein before inputting the point cloud data into the trained convolutional neural network for processing to obtain the target feature corresponding to each point, the method further comprises:
acquiring a training set and a test set, wherein the training set comprises sample point cloud data of a plurality of sample points, and the test set comprises sample characteristics and sample categories corresponding to each sample point;
training the initial convolutional neural network through the training set to obtain a convolutional neural network in training;
validating the convolutional neural network in training based on the test set;
when the verification result does not meet the preset condition, adjusting the network parameters of the convolutional neural network in training, and continuing to train the convolutional neural network in training based on the training set;
and when the verification result meets the preset condition, stopping training the convolutional neural network, and taking the convolutional neural network at the end of training as the trained convolutional neural network.
8. The method of claim 4, wherein the method further comprises:
evaluating whether the prediction category corresponding to each indistinguishable point is accurate or not based on a preset measurement method;
and when the number of the indistinguishable points with accurate prediction categories is detected not to meet a preset threshold value, continuing to train the convolutional neural network.
9. An apparatus for processing a three-dimensional point cloud, comprising:
an acquisition unit configured to acquire point cloud data including a plurality of points;
the processing unit is used for inputting the point cloud data into a trained convolutional neural network for processing to obtain a target feature corresponding to each point, the convolutional neural network comprises a geometric attention fusion module and a focusing module, the geometric attention fusion module is used for extracting a local enhancement feature of each point, and the focusing module is used for extracting the target feature of each point based on the local enhancement feature of each point;
and the determining unit is used for determining the prediction category corresponding to each point on the basis of the target feature corresponding to each point.
10. An apparatus for processing a three-dimensional point cloud, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202110163660.1A 2021-02-05 2021-02-05 Method, device, equipment and storage medium for processing three-dimensional point cloud Active CN112966696B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110163660.1A CN112966696B (en) 2021-02-05 2021-02-05 Method, device, equipment and storage medium for processing three-dimensional point cloud
PCT/CN2021/137305 WO2022166400A1 (en) 2021-02-05 2021-12-12 Method, apparatus and device for processing three-dimensional point cloud, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110163660.1A CN112966696B (en) 2021-02-05 2021-02-05 Method, device, equipment and storage medium for processing three-dimensional point cloud

Publications (2)

Publication Number Publication Date
CN112966696A true CN112966696A (en) 2021-06-15
CN112966696B CN112966696B (en) 2023-10-27

Family

ID=76274715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110163660.1A Active CN112966696B (en) 2021-02-05 2021-02-05 Method, device, equipment and storage medium for processing three-dimensional point cloud

Country Status (2)

Country Link
CN (1) CN112966696B (en)
WO (1) WO2022166400A1 (en)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311534B (en) * 2022-08-26 2023-07-18 中国铁道科学研究院集团有限公司 Laser radar-based railway perimeter intrusion identification method, device and storage medium
CN115457496B (en) * 2022-09-09 2023-12-08 北京百度网讯科技有限公司 Automatic driving retaining wall detection method and device and vehicle
CN115439694A (en) * 2022-09-19 2022-12-06 南京邮电大学 High-precision point cloud completion method and device based on deep learning
CN115312119B (en) * 2022-10-09 2023-04-07 之江实验室 Method and system for identifying protein structural domain based on protein three-dimensional structure image
CN115661812B (en) * 2022-11-14 2023-04-07 苏州挚途科技有限公司 Target detection method, target detection device and electronic equipment
CN116137059B (en) * 2023-04-17 2024-04-26 宁波大学科学技术学院 Three-dimensional point cloud quality evaluation method based on multi-level feature extraction network model
CN116413740B (en) * 2023-06-09 2023-09-05 广汽埃安新能源汽车股份有限公司 Laser radar point cloud ground detection method and device
CN116524197B (en) * 2023-06-30 2023-09-29 厦门微亚智能科技股份有限公司 Point cloud segmentation method, device and equipment combining edge points and depth network
CN116520289B (en) * 2023-07-04 2023-09-01 东莞市新通电子设备有限公司 Intelligent control method and related device for hardware machining process
CN117152363B (en) * 2023-10-30 2024-02-13 浪潮电子信息产业股份有限公司 Three-dimensional content generation method, device and equipment based on pre-training language model


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3767521A1 (en) * 2019-07-15 2021-01-20 Promaton Holding B.V. Object detection and instance segmentation of 3d point clouds based on deep learning
CN112966696B (en) * 2021-02-05 2023-10-27 中国科学院深圳先进技术研究院 Method, device, equipment and storage medium for processing three-dimensional point cloud

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650278B1 (en) * 2017-07-21 2020-05-12 Apple Inc. Semantic labeling of point clouds using images
CN109655019A (en) * 2018-10-29 2019-04-19 北方工业大学 Cargo volume measurement method based on deep learning and three-dimensional reconstruction
CN111028327A (en) * 2019-12-10 2020-04-17 深圳先进技术研究院 Three-dimensional point cloud processing method, device and equipment
CN111414953A (en) * 2020-03-17 2020-07-14 集美大学 Point cloud classification method and device
CN112287939A (en) * 2020-10-29 2021-01-29 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166400A1 (en) * 2021-02-05 2022-08-11 中国科学院深圳先进技术研究院 Method, apparatus and device for processing three-dimensional point cloud, and storage medium
CN113327279A (en) * 2021-08-04 2021-08-31 腾讯科技(深圳)有限公司 Point cloud data processing method and device, computer equipment and storage medium
CN113327279B (en) * 2021-08-04 2021-09-28 腾讯科技(深圳)有限公司 Point cloud data processing method and device, computer equipment and storage medium
CN113486988A (en) * 2021-08-04 2021-10-08 广东工业大学 Point cloud completion device and method based on adaptive self-attention transformation network
CN113486988B (en) * 2021-08-04 2022-02-15 广东工业大学 Point cloud completion device and method based on adaptive self-attention transformation network
CN114998414A (en) * 2022-06-02 2022-09-02 华侨大学 Point cloud data-based part three-dimensional size measuring method, device and medium
CN117368876A (en) * 2023-10-18 2024-01-09 广州易而达科技股份有限公司 Human body detection method, device, equipment and storage medium
CN117368876B (en) * 2023-10-18 2024-03-29 广州易而达科技股份有限公司 Human body detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2022166400A1 (en) 2022-08-11
CN112966696B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN112966696B (en) Method, device, equipment and storage medium for processing three-dimensional point cloud
US11144889B2 (en) Automatic assessment of damage and repair costs in vehicles
CN110032962B (en) Object detection method, device, network equipment and storage medium
CN110325818B (en) Joint 3D object detection and orientation estimation via multimodal fusion
CN110321910B (en) Point cloud-oriented feature extraction method, device and equipment
US20190205665A1 (en) Method, apparatus, and device for determining lane line on road
CN111028327B (en) Processing method, device and equipment for three-dimensional point cloud
WO2020047420A1 (en) Method and system for facilitating recognition of vehicle parts based on a neural network
CN112529015A (en) Three-dimensional point cloud processing method, device and equipment based on geometric unwrapping
CN110084299B (en) Target detection method and device based on multi-head fusion attention
KR102373753B1 (en) Method, and System for Vehicle Recognition Tracking Based on Deep Learning
CN111738258A (en) Pointer instrument reading identification method based on robot inspection
CN113412505A (en) System and method for ordered representation and feature extraction of point clouds obtained by detection and ranging sensors
WO2019171628A1 (en) Image processing system and image processing method
Garfo et al. Defect detection on 3d print products and in concrete structures using image processing and convolution neural network
Haines et al. Recognising planes in a single image
CN112749726B (en) Training method and device for target detection model, computer equipment and storage medium
US20230072731A1 (en) System and method for panoptic segmentation of point clouds
AU2020272936A1 (en) Methods and systems for crack detection using a fully convolutional network
Jemilda et al. Moving object detection and tracking using genetic algorithm enabled extreme learning machine
Laupheimer et al. The importance of radiometric feature quality for semantic mesh segmentation
CN113436223B (en) Point cloud data segmentation method and device, computer equipment and storage medium
CN113420648A (en) Target detection method and system with rotation adaptability
CN109583584B (en) Method and system for enabling CNN with full connection layer to accept indefinite shape input
CN116466320A (en) Target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant