CN116503552A - Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method - Google Patents

Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method

Info

Publication number
CN116503552A
Authority
CN
China
Prior art keywords
point cloud
coarse
features
local
fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310404149.5A
Other languages
Chinese (zh)
Inventor
张德军
王杨
徐战亚
吴亦奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN202310404149.5A
Publication of CN116503552A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a coarse-to-fine point cloud shape completion method based on multi-scale feature fusion, which comprises the following steps: stage 1, generating a point cloud skeleton: a global feature F_X^G is extracted from the incomplete point cloud X, and a coarse point cloud Y_coarse is completed through a multi-layer perceptron and taken as the skeleton of the complete point cloud; stage 2, refining geometric details: the complete point cloud skeleton Y_coarse and the incomplete point cloud X are used to learn multi-scale local geometric features F_X^P, the global feature F_X^G and the local geometric features F_X^P are fused, and the fused features are gradually upsampled and a fine complete point cloud Y' is generated through a multi-layer perceptron. The beneficial effects of the invention are as follows: the advantages of different features are effectively exploited, improving the accuracy of the point cloud completion neural network; and the network is based on deep-learning graph convolution, which improves the accuracy of the point cloud completion results.

Description

Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method
Technical Field
The invention relates to the field of point clouds, and in particular to a coarse-to-fine point cloud shape completion method based on multi-scale feature fusion.
Background
Traditional three-dimensional shape completion methods usually describe a three-dimensional object with voxels or meshes and then extract features with three-dimensional convolutions, and have achieved great success in tasks such as shape completion and three-dimensional reconstruction. However, these methods share the same drawback: voxel-based methods consume a large amount of memory and cannot generate high-resolution three-dimensional shapes.
To increase resolution, octree structures have been proposed to progressively voxelize specific regions. Owing to the quantization effects of the voxelization operation, recent years have seen a shift of attention to mesh-based three-dimensional reconstruction. Existing mesh representations deform a template mesh into the target mesh and are therefore valid only for specific categories of three-dimensional data. Researchers have thus turned their attention to unstructured point cloud data as a representation of three-dimensional objects: thanks to its data representation, it consumes less computational memory, while at the same data size it offers a more powerful fine-grained representation of three-dimensional shapes.
However, three-dimensional convolution is not suitable for processing point cloud data, because regular convolution operators cannot be applied to unordered data. PointNet and PointNet++ are pioneering efforts that process three-dimensional point clouds directly, providing new ideas for point cloud data processing and inspiring research on many downstream tasks.
Point cloud data is easy to acquire, convenient to store, and cheaper to process than voxel data. In recent years, the shape completion task for point cloud data has attracted increasing attention. Point cloud completion aims to generate a complete three-dimensional point cloud shape from a partially observed point cloud. PCN, TopNet, SA-Net, and ECG are frameworks that use an encoder-decoder to achieve point cloud completion. Most of these methods use a PointNet or PointNet++ encoder to extract global features from the incomplete input point cloud and a decoder to infer the complete point cloud shape from the extracted global features. However, these point cloud completion methods share a common problem: they do not fully exploit the local information of the point cloud, so the generated point clouds lack clear local detail.
PCN first learns global features from the incomplete point cloud, then generates a coarse complete point cloud, and finally upsamples it with a FoldingNet-style decoder to obtain the fine complete point cloud. TopNet predicts the complete point cloud shape using a tree-structured decoder. SA-Net adopts a multi-stage folding structure and skip-attention to repair incomplete point cloud shapes. ECG first uses global features to generate a coarse point cloud shape, and then uses local edge features to generate the fine point cloud shape.
To preserve and recover local geometric detail, local features are used to refine the coarse complete point cloud. NSFA combines known features and predicted missing features to complete the three-dimensional point cloud shape; it assumes that the ratio of known to missing parts is about 1:1 (the observable part is about half of the whole object), a prior condition that is very demanding in most cases. VRCNet focuses on an adversarial-learning and variational auto-encoder framework to improve the realism and consistency of the generated complete shapes. Xie et al. propose a point cloud generator based on style features and differentiable rendering for point cloud completion. Yu et al. convert point cloud completion into a set-to-set translation problem and propose a Transformer encoder-decoder architecture (PoinTr) that better learns structural knowledge and preserves detailed information for point cloud completion. However, these methods have complex network structures that are prone to overfitting, and some require prior information as a premise.
Disclosure of Invention
In order to solve the above problems, the present invention provides a coarse-to-fine point cloud shape completion method based on multi-scale feature fusion (Multi-scale Feature Fusion Network, MFF-Net), mainly comprising:
stage 1, generating a point cloud skeleton:
extracting a global feature F_X^G from the incomplete point cloud X;
completing a coarse point cloud Y_coarse through a multi-layer perceptron and taking it as the skeleton of the complete point cloud;
stage 2, refining geometric details:
learning multi-scale local geometric features F_X^P using the complete point cloud skeleton Y_coarse and the incomplete point cloud X;
fusing the global feature F_X^G and the local geometric features F_X^P;
gradually upsampling the fused features and generating a fine complete point cloud Y' through a multi-layer perceptron.
Further, the set abstraction (SA) module of PointNet++ is adopted to extract the global structure information F_X^G from the incomplete point cloud X.
Further, the SA modules progressively abstract larger and larger local regions in a multi-level hierarchical structure, so that more local neighborhood features can be attended to when acquiring the features of points at different resolutions; the multi-level hierarchical encoder is composed of a series of SA modules; each SA module comprises a sampling layer, a grouping layer, and a PointNet layer.
Further, the sampling layer applies farthest point sampling to the input incomplete point cloud to select a group of points as the centroids of the respective local neighborhoods, the grouping layer constructs local neighborhood sets by searching for adjacent points around the centroids, and the PointNet layer encodes the local neighborhood points into feature vectors using MLPs.
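As a concrete illustration of the sampling layer, the following is a minimal farthest point sampling sketch in NumPy. It is a hedged reconstruction for readability, assuming a random first centroid and the function name farthest_point_sampling; it is not the patent's own implementation.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, m: int) -> np.ndarray:
    """Select m centroid indices from an (N, 3) point array.

    Illustrative sketch: each new centroid is the point farthest from all
    centroids chosen so far, which spreads centroids over the shape.
    """
    n = points.shape[0]
    selected = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)            # squared distance to nearest chosen centroid
    selected[0] = np.random.randint(n)   # arbitrary starting point (an assumption)
    for i in range(1, m):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = int(np.argmax(dist))
    return selected
```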
Further, the global features are decoded by a multi-layer perceptron to generate a coarse completion result, namely the coarse point cloud Y_coarse, which serves as the skeleton of the complete point cloud.
Further, the method adopts a global and local feature fusion (GLF) module with a soft attention mechanism to fuse the global feature F_X^G and the local geometric features F_X^P.
Further, after fusion, a dense block (DB) with graph network functionality is used to aggregate the information of the K adjacent points around each point, which is then concatenated with the original point features to form new local features; the acquired local features are then downsampled by an edge-preserved pooling (EP) module.
Further, a K-nearest neighbor algorithm is used to select the K nearest neighbor points in feature space for each point to construct a neighborhood graph, from which edge features are extracted.
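The neighborhood-graph step can be sketched in PyTorch as follows. The DGCNN-style edge encoding (each center feature concatenated with the offsets to its neighbors) is an assumption about the exact edge-feature form, and each point counting itself among its own nearest neighbors is tolerated for simplicity.

```python
import torch

def knn_edge_features(feat: torch.Tensor, k: int) -> torch.Tensor:
    """feat: (B, C, N) point features -> (B, 2C, N, k) edge features."""
    inner = torch.matmul(feat.transpose(1, 2), feat)        # (B, N, N) dot products
    sq = (feat ** 2).sum(dim=1, keepdim=True)               # (B, 1, N) squared norms
    pairwise = sq.transpose(1, 2) - 2 * inner + sq          # squared feature-space distances
    idx = pairwise.topk(k, dim=-1, largest=False).indices   # (B, N, k) nearest neighbors
    B, C, N = feat.shape
    batch = torch.arange(B, device=feat.device).view(B, 1, 1)
    pts = feat.transpose(1, 2)                              # (B, N, C)
    neighbors = pts[batch, idx]                             # (B, N, k, C)
    center = pts.unsqueeze(2).expand(-1, -1, k, -1)         # (B, N, k, C)
    edge = torch.cat([center, neighbors - center], dim=-1)  # center + offset to neighbor
    return edge.permute(0, 3, 1, 2)                         # (B, 2C, N, k)
```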
The technical scheme provided by the invention has the following beneficial effects: the advantages of different features are effectively exploited, improving the accuracy of the point cloud completion neural network; and the network is based on deep-learning graph convolution, which improves the accuracy of the point cloud completion results.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of the coarse-to-fine point cloud shape completion method based on multi-scale feature fusion in an embodiment of the invention.
Detailed Description
For a clearer understanding of technical features, objects and effects of the present invention, a detailed description of embodiments of the present invention will be made with reference to the accompanying drawings.
The embodiment of the invention provides a coarse-to-fine point cloud shape completion method based on multi-scale feature fusion. The point cloud Y and the point cloud X are defined as a complete three-dimensional point cloud object and its partially observed incomplete point cloud, respectively, where X is not a subset of Y. The point cloud completion task is to predict the complete three-dimensional point cloud shape Y' given the point cloud X. Samples of X and Y can be obtained easily from a large-scale synthetic dataset, so the method is implemented with supervised learning.
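The patent does not name the training objective. For supervised completion pipelines of this family, the symmetric Chamfer distance between the predicted and ground-truth point sets is the customary loss; the sketch below is therefore a hedged illustration, not a statement of the patent's loss.

```python
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """pred: (B, N, 3) predicted points, gt: (B, M, 3) ground-truth points."""
    d = torch.cdist(pred, gt)              # (B, N, M) pairwise Euclidean distances
    fwd = d.min(dim=2).values.mean(dim=1)  # each predicted point to its nearest gt point
    bwd = d.min(dim=1).values.mean(dim=1)  # each gt point to its nearest predicted point
    return (fwd + bwd).mean()              # symmetric, averaged over the batch
```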
Referring to FIG. 1, FIG. 1 is a flowchart of the coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to an embodiment of the present invention, which specifically includes the following.
The method realizes point cloud completion through two-stage subtasks, which respectively generate a point cloud skeleton and refine geometric details. Specifically, MFF-Net consists of two encoder-decoder sub-networks and gradually completes the point cloud shape Y' in a coarse-to-fine manner.
Stage 1: generating a point cloud skeleton. First, global features F_X^G are extracted using the multi-level set abstraction (SA) modules of PointNet++; then a coarse point cloud Y_coarse is completed through multi-layer perceptrons (MLPs) as the skeleton of the complete point cloud.
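A minimal sketch of the stage-1 decoder follows: an MLP maps the global feature to the coarse skeleton. The hidden widths, the output size n_coarse, and the class name are illustrative assumptions; the description only fixes the 1024-dimensional global feature.

```python
import torch
import torch.nn as nn

class CoarseDecoder(nn.Module):
    """Decode a global feature into a coarse point cloud Y_coarse (sketch)."""

    def __init__(self, feat_dim: int = 1024, n_coarse: int = 512):
        super().__init__()
        self.n_coarse = n_coarse
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, n_coarse * 3),   # xyz coordinates for every coarse point
        )

    def forward(self, global_feat: torch.Tensor) -> torch.Tensor:
        """global_feat: (B, 1024) -> (B, n_coarse, 3) coarse skeleton."""
        return self.mlp(global_feat).view(-1, self.n_coarse, 3)
```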
Stage 2: refining the geometric details. First, the coarse skeleton Y_coarse and the incomplete point cloud X are used to learn multi-scale local geometric features F_X^P. ECG fuses F_X^G and F_X^P without explicitly considering the correlation between these two complementary kinds of information. In order to closely combine the local geometric features F_X^P and the global feature F_X^G while retaining their respective information, the method designs a global and local feature fusion (GLF) module with a soft attention mechanism. Finally, the fused features are gradually upsampled and a fine complete point cloud Y' is generated through multi-layer perceptrons.
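A hedged sketch of soft-attention fusion in the spirit of the GLF module: a learned gate weighs the global feature against the local geometric feature channel-wise at each point. The gate architecture and the convex-combination form are assumptions; the patent states only that GLF fuses the two features with a soft attention mechanism while retaining their respective information.

```python
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    """Gate-based fusion of per-point global and local features (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv1d(2 * channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 1), nn.Sigmoid(),
        )

    def forward(self, f_global: torch.Tensor, f_local: torch.Tensor) -> torch.Tensor:
        """Both inputs: (B, C, N). Returns fused features of the same shape."""
        a = self.gate(torch.cat([f_global, f_local], dim=1))  # soft attention map in (0, 1)
        return a * f_global + (1.0 - a) * f_local             # convex combination per channel
```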
Point cloud completion is realized in a coarse-to-fine manner: the learned global features are decoded to generate a coarse completion result that serves as the skeleton of the complete point cloud, and the local geometric details are then gradually recovered. Predicting the shape skeleton before refining the local geometry is necessary for the following reasons: (1) the shape skeleton describes the complete point cloud structure, i.e., it restores the regions missing from the partial observation; (2) the shape skeleton can be regarded as an adaptive three-dimensional anchor for mining local geometric features in the residual point cloud.
The goal of the stage-1 encoder-decoder is to generate the complete point cloud structure; the main challenge is how to recover the complete three-dimensional shape from the incomplete point cloud while preserving certain local geometry. The method uses the SA modules of PointNet++, which progressively abstract larger and larger local regions in a multi-level hierarchical structure and can attend to more local neighborhood features when acquiring the features of points at different resolutions.
The multi-level hierarchical encoder is composed of a series of SA modules, each comprising a sampling layer, a grouping layer, and a PointNet layer. The sampling layer applies farthest point sampling to the input incomplete point cloud to select a group of points as the centroids of the respective local neighborhoods. The grouping layer constructs local neighborhood sets by finding neighboring points around each centroid. The PointNet layer encodes the local neighborhood points (including the centroid) into feature vectors using MLPs. The encoder finally converts the input incomplete point cloud into a 1024-dimensional global feature, from which a decoder composed of MLPs generates the coarse complete point cloud.
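The PointNet layer inside an SA module can be sketched as below: grouped neighborhood features are lifted by a shared MLP (1x1 convolutions) and max-pooled into a single feature per centroid. The channel sizes are assumptions; only the final 1024-dimensional global feature is fixed by the description.

```python
import torch
import torch.nn as nn

class PointNetLayer(nn.Module):
    """Shared MLP plus symmetric max pooling over each neighborhood (sketch)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 1x1 Conv2d acts as a per-point MLP shared across all neighborhoods.
        self.mlp = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, grouped: torch.Tensor) -> torch.Tensor:
        """grouped: (B, C_in, M, K) neighborhoods -> (B, C_out, M) centroid features."""
        return self.mlp(grouped).max(dim=-1).values  # order-invariant max over K neighbors
```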
The goal of the stage-2 encoder-decoder is to further generate a high-quality complete point cloud shape based on the observed incomplete point cloud and the completed coarse point cloud shape. However, efficiently extracting edge geometry from an incomplete, unevenly distributed, and sparse point cloud is challenging. Therefore, the global shape structure of the coarse point cloud Y_coarse and the local geometric details of the incomplete point cloud X are exploited together to avoid this problem. The point cloud shape skeleton can be regarded as an adaptive three-dimensional anchor that facilitates capturing useful local edge features from the flawed point cloud.
Both stage 1 and stage 2 employ an encoder-decoder comprising DB, EP, and GLF modules.
As shown in FIG. 1, the input of stage 2 is the concatenation of the completed coarse point cloud Y_coarse and the original incomplete point cloud X, which provides the structural skeleton of the complete point cloud while retaining the local geometric details of the input point cloud. A dense block (DB) with graph network functionality is then used to aggregate the information of the K adjacent points around each point, which is concatenated with the original point features to form new local features; the acquired local features are downsampled by an edge-preserved pooling (EP) module. This process aggregates the point features within each point's K-neighborhood, yielding lower-resolution point features while preserving more characterizable local features. A K-nearest neighbor algorithm selects the K nearest neighbor points in feature space for each point to construct a neighborhood graph, from which edge features are extracted. Edge-preserved pooling is employed so that, when the input points are resampled, the features of the selected centroid points and of their local neighborhood points are propagated together to the subsequent network layers; thus, as the point set becomes sparser, the receptive field becomes larger, and multi-scale edge features of points are learned more efficiently. Next, GLF fuses the aggregated local features with the global features of the same resolution generated by the stage-1 encoder-decoder, and the fused features serve as the input of the next DB. This feature fusion module combines the complementary advantages of the global and local features and enhances the capability of the network. The above process is repeated to obtain the features of multi-resolution points. Finally, the multi-resolution features generated by the stage-2 encoder-decoder are concatenated with the features obtained by gradually upsampling (US) the global features, and fine point clouds are generated through MLPs.
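A hedged sketch of the edge-preserved pooling (EP) step: the features of each kept centroid's K neighbors (centroids chosen, for example, by the farthest point sampling sketch above, with neighborhoods taken from the k-NN graph) are max-pooled so that the strongest edge responses survive downsampling. The exact EP internals in the patent may differ.

```python
import torch

def edge_preserved_pooling(feat: torch.Tensor, neighbor_idx: torch.Tensor) -> torch.Tensor:
    """feat: (B, C, N) point features; neighbor_idx: (B, M, K) indices of the
    K neighbors of each of the M kept centroids. Returns (B, C, M)."""
    B = feat.shape[0]
    batch = torch.arange(B, device=feat.device).view(B, 1, 1)
    gathered = feat.transpose(1, 2)[batch, neighbor_idx]  # (B, M, K, C) neighbor features
    return gathered.max(dim=2).values.transpose(1, 2)     # strongest response per channel
```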
The beneficial effects of the invention are as follows: the advantages of different features are effectively exploited, improving the accuracy of the point cloud completion neural network; and the network is based on deep-learning graph convolution, which improves the accuracy of the point cloud completion results.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise forms disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (8)

1. A coarse-to-fine point cloud shape completion method based on multi-scale feature fusion, characterized by comprising the following steps:
stage 1, generating a point cloud skeleton:
extracting a global feature F_X^G from the incomplete point cloud X;
completing a coarse point cloud Y_coarse through a multi-layer perceptron and taking it as the skeleton of the complete point cloud;
stage 2, refining geometric details:
learning multi-scale local geometric features F_X^P using the complete point cloud skeleton Y_coarse and the incomplete point cloud X;
fusing the global feature F_X^G and the local geometric features F_X^P;
gradually upsampling the fused features and generating a fine complete point cloud Y' through a multi-layer perceptron.
2. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 1, characterized in that: the set abstraction (SA) module of PointNet++ is used to extract the global structure information F_X^G from the incomplete point cloud X.
3. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 2, characterized in that: the SA modules progressively abstract larger and larger local regions in a multi-level hierarchical structure, attending to more local neighborhood features when acquiring the features of points at different resolutions; the multi-level hierarchical encoder is composed of a series of SA modules, each comprising a sampling layer, a grouping layer, and a PointNet layer.
4. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 3, characterized in that: the sampling layer applies farthest point sampling to the input incomplete point cloud to select a group of points as the centroids of the respective local neighborhoods; the grouping layer constructs local neighborhood sets by searching for adjacent points around the centroids; and the PointNet layer encodes the local neighborhood points into feature vectors using MLPs.
5. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 1, characterized in that: the global features are decoded by a multi-layer perceptron to generate a coarse completion result, namely the coarse point cloud Y_coarse, which serves as the skeleton of the complete point cloud.
6. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 1, characterized in that: the method adopts a global and local feature fusion (GLF) module with a soft attention mechanism to fuse the global feature F_X^G and the local geometric features F_X^P.
7. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 1, characterized in that: after fusion, a dense block (DB) with graph network functionality aggregates the information of the K nearest neighbor points around each point, which is then concatenated with the original point features to form new local features; the acquired local features are downsampled by an edge-preserved pooling (EP) module.
8. The coarse-to-fine point cloud shape completion method based on multi-scale feature fusion according to claim 7, characterized in that: a K-nearest neighbor algorithm is used to select the K nearest neighbor points in feature space for each point to construct a neighborhood graph, from which edge features are extracted.
CN202310404149.5A 2023-04-14 2023-04-14 Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method Pending CN116503552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310404149.5A CN116503552A (en) 2023-04-14 2023-04-14 Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310404149.5A CN116503552A (en) 2023-04-14 2023-04-14 Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method

Publications (1)

Publication Number Publication Date
CN116503552A true CN116503552A (en) 2023-07-28

Family

ID=87321019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310404149.5A Pending CN116503552A (en) 2023-04-14 2023-04-14 Multi-scale feature fusion-based coarse-to-fine point cloud shape completion method

Country Status (1)

Country Link
CN (1) CN116503552A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274454A * 2023-08-29 2023-12-22 Xi'an Jiaotong-Liverpool University Three-dimensional point cloud completion method, device and storage medium based on component information
CN117475107A * 2023-08-29 2024-01-30 Beihang University Relational enhancement point cloud completion method based on deep learning
CN117422645A * 2023-11-14 2024-01-19 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences Confidence aggregation-based radar point cloud shape completion method
CN117274764A * 2023-11-22 2023-12-22 Nanjing University of Posts and Telecommunications Multi-mode feature fusion three-dimensional point cloud completion method
CN117274764B * 2023-11-22 2024-02-13 Nanjing University of Posts and Telecommunications Multi-mode feature fusion three-dimensional point cloud completion method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination