CN113486988B - Point cloud completion device and method based on adaptive self-attention transformation network

Info

Publication number: CN113486988B (application CN202110890669.2A)
Authority: CN (China)
Prior art keywords: point cloud, convolution, point, sampling, vector
Priority date: 2021-08-04
Legal status: Active (granted)
Application number: CN202110890669.2A
Other languages: Chinese (zh)
Other versions: CN113486988A
Inventors: 高子淇 (Gao Ziqi), 刘文印 (Liu Wenyin), 陈俊洪 (Chen Junhong), 梁达勇 (Liang Dayong)
Current assignee: Guangdong University of Technology
Original assignee: Guangdong University of Technology
Filing date: 2021-08-04
Publication date of CN113486988A: 2021-10-08
Publication date of CN113486988B (grant): 2022-02-15
Application filed by Guangdong University of Technology, with priority to CN202110890669.2A

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G06F18/23: Clustering techniques
    • G06N: Computing arrangements based on specific computational models
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Abstract

The invention discloses a point cloud completion device and method based on an adaptive self-attention transformation network. The device comprises: a point cloud sampling module for performing point cloud sampling twice to obtain two layers of fused point cloud spatial features; an adaptive self-attention transformation module for performing adaptive feature fusion on the point cloud spatial features; and a completion module for performing point cloud completion from the output of the adaptive self-attention transformation module. The technical scheme of the invention keeps the computation lightweight while preserving the effectiveness of multi-feature fusion.

Description

Point cloud completion device and method based on adaptive self-attention transformation network
Technical Field
The invention belongs to the technical field of deep learning, and in particular relates to a point cloud completion device and method based on feature adaptivity and a self-attention transformation network.
Background
With the wide application of three-dimensional vision in robotics, autonomous driving and augmented reality, point clouds have become a widely used data representation that is both compact and fine-grained. Point cloud data are typically acquired with a lidar, a binocular stereo camera or a low-resolution RGB-D sensor. However, owing to environmental factors, the acquired point clouds are usually incomplete, so completing partial point clouds has become an important task. Under limited hardware conditions, deep-learning-based point cloud repair and completion is therefore the key to, and the basis of, subsequent point-cloud-related tasks. A point cloud is a massive set of points sampled from a target's surface, describing the surface through the spatial coordinates of each point along the x, y and z axes. During processing, the heavy cost of information computation and feature fusion becomes the central obstacle to real-time point cloud completion. For example, in existing three-dimensional vision a robot's depth camera can only extract the surface features of the part of an object visible from one viewpoint; in autonomous driving, missing points in the visually reconstructed point clouds of surrounding objects prevent accurate estimation of their sizes, volumes and relative distances. In short, existing point cloud completion networks cannot complete, in real time, the missing surface features of objects captured by a robot's camera.
Disclosure of Invention
The invention provides a point cloud completion device and method based on an adaptive self-attention transformation network. A simple and efficient self-attention transformation network is used for point cloud completion, and, to keep the network flexible, adaptive networks with different operators are designed for different features; combining the two keeps the computation lightweight while preserving the effectiveness of multi-feature fusion.
In order to achieve the purpose, the invention adopts the following technical scheme:
a point cloud completion device based on an adaptive attention transformation network comprises:
the point cloud sampling module is used for carrying out point cloud sampling twice to obtain two layers of fused point cloud spatial information characteristics;
the adaptive self-attention transformation module is used for carrying out adaptive feature fusion according to the point cloud spatial information features;
the complementing module is used for complementing cloud points according to the output result of the adaptive self-attention transformation module;
wherein the point cloud refers to a collection of target surface characteristics that represent an apparent surface of an object; the target surface characteristics are object surface characteristics obtained by scanning through a depth camera in the process of robot vision grabbing, or vehicle surface characteristics obtained in the process of scanning surrounding vehicles through the depth camera in an unmanned mode.
Preferably, the point cloud sampling module includes:
the acquisition unit, used for acquiring the relative point cloud coordinates of the object about a central origin;
the mapping unit, used for mapping the N point cloud coordinates to a higher dimension by convolution: the 3-dimensional xyz coordinates of the object are first lifted by a convolution to 64-dimensional vectors, so that each point is represented by one 64-dimensional vector, and these are then lifted by a further convolution to 128-dimensional vectors, yielding global point cloud spatial position information;
the clustering unit, used for obtaining through farthest point sampling the distance from each point to all other points, sorting these distances, selecting with the k-nearest-neighbour algorithm the features of the k points closest to each point, and packing them into a k × 128 matrix to form a group; 1024 groups are obtained the first time the clustering unit is applied and 512 groups the second time;
and the extraction unit, used for fusing features with two identical convolutions to obtain the key information, extracting with a max pooling layer the maximum point cloud spatial feature of each k × 128 matrix over its k rows, and finally forming a 1 × 128 vector.
Preferably, the adaptive self-attention transformation module includes:
the first calculation unit, used for taking the point cloud spatial features extracted by the point cloud sampling module and computing a K vector from the feature information through one convolution layer that reduces the vector dimension to one quarter, computing a Q vector through an identical convolution layer, and computing a V vector through a convolution layer that leaves the dimension unchanged;
the second calculation unit, used for multiplying the Q vector by the K vector to obtain a score vector for each point group, then normalizing the score vectors with a softmax layer into an attention map whose entries sum to 1; the attention map is divided by its column sums for a second normalization and used to weight and sum the V vectors, giving an attention map of fused features; finally, the initial un-fused features are subtracted from the fused features to obtain relative spatial feature information;
the third calculation unit, used for pre-designing n different convolution kernels placed in a convolution pool: the relative spatial feature information obtained by the previous calculation unit is first fed into an independent scoring network, where one convolution layer maps the feature map to n dimensions, generating a weight for each of the n kernels, and a softmax layer normalizes the weights so that they sum to 1; the weight of each kernel obtained from the scoring network is multiplied by that kernel and the results are summed into one complete kernel, with which a final convolution over the relative spatial feature information yields feature-fused spatial information; the point cloud spatial features originally input from the point cloud sampling module are then added to the feature-fused spatial information, giving the complete spatial feature information used for point cloud generation.
Preferably, the completion module uses the fused point cloud features of each layer to generate three segments of completed point cloud through an MLP structure. The sampled features of the first and second point cloud sampling layers serve as the inputs of two layers of adaptive self-attention transformation modules, with final output dimensions of N/4 × 512 and N/2 × 256 respectively. The output derived from the first sampling layer's features through the two layers of adaptive self-attention transformation modules is used for point cloud generation: a convolution reduces the 256 dimensions to 3-dimensional xyz point coordinates, giving the completion of the whole object. The second sampling layer's features yield two outputs through the two layers of adaptive self-attention transformation modules, both used for point cloud generation: a convolution reduces the 512-dimensional spatial coordinates to 3-dimensional xyz point coordinates, giving the completion of local object detail. The three MLP branches generate N/4 × 3, N/4 × 3 and N/2 × 3 points respectively, which are finally fused into the N × 3 output point cloud.
The invention also provides a point cloud completion method based on the self-attention transformation network, which comprises the following steps:
Step S1: perform point cloud sampling twice to obtain two layers of fused point cloud spatial features;
Step S2: perform adaptive feature fusion on the point cloud spatial features;
Step S3: complete the point cloud according to the result of the adaptive feature fusion;
wherein the point cloud refers to a set of target surface characteristics that represents the visible surface of an object; the target surface characteristics are object surface features obtained by depth-camera scanning during robotic visual grasping, or vehicle surface features obtained while scanning surrounding vehicles with a depth camera in unmanned driving.
Preferably, in step S1, the N point cloud coordinates are mapped to a higher dimension by convolution to obtain global spatial position information; the distance from each point to all other points is then obtained by farthest point sampling, and the k points nearest each point are clustered into a group by the k-nearest-neighbour algorithm; the cloud is divided into 1024 groups, then sampled and grouped again into 512 groups; finally, a max pooling layer extracts the key point cloud spatial features.
Preferably, in step S2, the point cloud spatial features are embedded and three convolution layers compute the K, Q and V vectors of each group's spatial features; the Q and K vectors are multiplied to obtain a score vector for each point group, the scores are multiplied with the V vectors to obtain fused features, and the final point cloud convolution operates on the feature difference, with the adaptive network applied to the final convolution layer, specifically as follows: n different convolution kernels are designed in advance and placed in a convolution pool; the grouped spatial features are first fed to a scoring network, where one convolution layer maps the features to n dimensions, producing a weight for each of the n kernels; the weight of each kernel obtained from the scoring network is multiplied by that kernel, and the results are finally summed into one complete kernel used for the convolution.
Preferably, in step S3, three segments of completed point cloud are generated through an MLP structure from the fused point cloud features of each layer; the sampled features of the first and second point cloud sampling layers serve as the inputs of two layers of adaptive self-attention transformation modules, with final output dimensions of N/4 × 512 and N/2 × 256 respectively; the output derived from the first sampling layer's features is used for point cloud generation, a convolution reducing the 256 dimensions to 3-dimensional xyz point coordinates to complete the whole object; and the second sampling layer's features yield two outputs through the two layers of adaptive self-attention transformation modules, both used for point cloud generation, a convolution reducing the 512 dimensions to 3-dimensional xyz point coordinates to complete the local details of the object.
Preferably, in step S3, the three MLP branches generate N/4 × 3, N/4 × 3 and N/2 × 3 points respectively, and the N × 3 point cloud is finally fused and output.
The point cloud completion method and device based on the adaptive self-attention transformation network replace much of the convolution with simple linear matrix operations, reducing network parameters and speeding up computation, while the fusion of adaptive features maintains high completion accuracy.
Drawings
FIG. 1 is a schematic structural diagram of the point cloud completion device based on an adaptive self-attention transformation network;
FIG. 2 is a schematic diagram of the point cloud sampling module;
FIG. 3 is a schematic diagram of the adaptive self-attention transformation module;
FIG. 4 is a schematic diagram of the third calculation unit in the adaptive self-attention transformation module;
FIG. 5 is a schematic structural diagram of the completion module;
FIG. 6 is a flow chart of the point cloud completion method based on feature adaptivity and a self-attention transformation network.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings.
As shown in FIG. 1, the invention provides a point cloud completion device based on feature adaptivity and a self-attention transformation network; its aim is to complete a full point cloud from a partial one. The point cloud refers to a set of target surface characteristics representing the visible surface of an object; the target surface characteristics are object surface features obtained by depth-camera scanning during robotic visual grasping, or vehicle surface features obtained while scanning surrounding vehicles with a depth camera in unmanned driving. The spatial information of the point cloud is represented as [x, y, z], the coordinates of each point in a three-dimensional coordinate system. The device comprises a point cloud sampling module, an adaptive self-attention transformation module and a completion module. The point cloud sampling module samples the point cloud twice to obtain two layers of fused point cloud spatial features; the adaptive self-attention transformation module performs adaptive feature fusion on the point cloud spatial features; and the completion module completes the point cloud from the output of the adaptive self-attention transformation module.
As shown in fig. 2, the point cloud sampling module maps the N point cloud coordinates to a higher dimension by convolution to obtain global spatial position information, then obtains through farthest point sampling the distance from each point to all other points, and clusters the k points nearest each point into a group with the k-nearest-neighbour algorithm. After the cloud has been divided into 1024 groups, sampling and grouping are performed again, giving 512 groups. After each grouping, two layers of convolution are used to fuse the information of the points within each group, and a max pooling layer then extracts the key point cloud feature information, so that each group encodes the spatial features of a neighbourhood of points.
Further, the point cloud sampling module comprises:
the acquisition unit, used for acquiring the relative point cloud coordinates of the object about a central origin, the point cloud being a massive set of points carrying the target's surface characteristics and representing the visible surface of the object;
the mapping unit, used for mapping the N point cloud coordinates to a higher dimension by convolution: the 3-dimensional xyz coordinates of the object are first lifted by a convolution to 64-dimensional vectors, so that each point is represented by one 64-dimensional vector, and these are then lifted by a further convolution to 128-dimensional vectors, at which point global point cloud spatial position information has been obtained;
the clustering unit, which obtains through farthest point sampling the distance from each point to all other points, sorts these distances, selects with the k-nearest-neighbour algorithm the features of the k points closest to each point, and packs them into a k × 128 matrix to form a group; in this way 1024 groups are obtained the first time and 512 groups the second time;
and the extraction unit, which, in order to turn each grouped k × 128 matrix into a 128-dimensional vector after each grouping, fuses features with two identical convolutions to obtain the key information, extracts its maximum over the k rows with a max pooling layer, and finally forms a 1 × 128 vector.
The point cloud sampling module thus samples the point cloud twice to obtain features at different resolutions, and the two fused layers of spatial features are used for the subsequent feature fusion and point cloud generation, as the sketch below illustrates.
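To make the pipeline concrete, here is a minimal PyTorch sketch of the sampling stage under the dimensions given above: a 3 -> 64 -> 128 lift, farthest point sampling, k-nearest-neighbour grouping, two identical fusion convolutions, and max pooling over k. The class and function names, the batch-of-one layout and the choice k = 16 are illustrative assumptions, not part of the patent.

```python
# Hedged sketch of the point cloud sampling module; names and k are assumptions.
import torch
import torch.nn as nn


def farthest_point_sample(xyz: torch.Tensor, m: int) -> torch.Tensor:
    """Greedy farthest point sampling. xyz: (N, 3) -> indices of m centroids."""
    n = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = torch.randint(0, n, (1,)).item()
    for i in range(m):
        idx[i] = farthest
        # shrink each point's distance to its nearest chosen centroid
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)
        farthest = torch.argmax(dist).item()
    return idx


class PointCloudSampler(nn.Module):
    """Lift xyz to 128-D, group the k nearest neighbours of each farthest-point
    centroid, fuse with two identical convolutions, then max-pool over k."""

    def __init__(self, k: int = 16):
        super().__init__()
        self.k = k
        # point-wise convolutions: 3 -> 64 -> 128
        self.lift = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        # two identical convolutions for in-group feature fusion
        self.fuse = nn.Sequential(
            nn.Conv2d(128, 128, 1), nn.ReLU(),
            nn.Conv2d(128, 128, 1), nn.ReLU(),
        )

    def forward(self, xyz: torch.Tensor, n_groups: int) -> torch.Tensor:
        # xyz: (N, 3) coordinates relative to the object's central origin
        feat = self.lift(xyz.t().unsqueeze(0)).squeeze(0).t()    # (N, 128)
        centers = farthest_point_sample(xyz, n_groups)
        d = torch.cdist(xyz[centers], xyz)                       # (m, N)
        knn = d.topk(self.k, largest=False).indices              # (m, k)
        g = feat[knn].permute(2, 0, 1).unsqueeze(0)              # (1, 128, m, k)
        g = self.fuse(g)
        # max over the k rows of each group -> one 128-D vector per group
        return g.max(dim=-1).values.squeeze(0).t()               # (m, 128)
```

Applied twice, e.g. once with 1024 groups and once more with 512 groups, this yields the two resolutions of grouped features that the later modules consume.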
As shown in fig. 3, the adaptive self-attention transformation module applies the self-attention transformation network from natural language processing to point cloud completion, and the feature-adaptive module makes up for the transformation network's weakness in feature fusion. By embedding the point cloud spatial features, three convolution layers compute the K, Q and V vectors of each group's spatial features. The Q and K vectors are multiplied to obtain a score vector for each point group, and the scores are multiplied with the V vectors to obtain fused features, so that each group's features absorb the information of the other groups' spatial features; a final point cloud convolution then operates on the feature difference. Because the small computational footprint of pure matrix operations inevitably limits how thoroughly the data are processed, and in order to give different kinds of point cloud features more flexible and better-adapted convolution processing, the adaptive network is innovatively applied to the final convolution layer. As shown in fig. 4, n different convolution kernels are designed in advance and placed in a convolution pool; the grouped spatial features are first fed to a scoring network, where one convolution layer maps the features to n dimensions, producing a weight for each of the n kernels. Since the weights are obtained directly from the features, they focus solely on how best to process the current group's spatial information. The weight of each kernel obtained from the scoring network is multiplied by that kernel, and the results are finally summed into one complete kernel used for the convolution. Only part of each kernel is therefore used, and the weights can be adjusted to suit groups of features with different distributions, yielding a better-fitting kernel; one convolution layer thus achieves the effect of multi-layer convolution and processes the feature information better. The procedure adds no extra convolution computation, makes the network more flexible and adaptive to the data, and improves the effectiveness of the feature information.
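The attention step just described can be sketched in PyTorch as follows. The shapes assume the 128-dimensional group features; the class name, the batch-of-one layout and the small epsilon guarding the second normalisation are illustrative assumptions.

```python
# Hedged sketch of the offset-style self-attention (first and second units).
import torch
import torch.nn as nn


class OffsetSelfAttention(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # K and Q reduce the channel dimension to one quarter; V keeps it
        self.to_k = nn.Conv1d(dim, dim // 4, 1)
        self.to_q = nn.Conv1d(dim, dim // 4, 1)
        self.to_v = nn.Conv1d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (1, dim, m) grouped spatial features from the sampling module
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        scores = torch.bmm(q.transpose(1, 2), k)        # (1, m, m) group scores
        attn = torch.softmax(scores, dim=-1)            # rows sum to 1
        # second normalisation: divide the map by its column sums
        attn = attn / (attn.sum(dim=1, keepdim=True) + 1e-9)
        fused = torch.bmm(v, attn)                      # weighted sum with V
        # offset: subtract the un-fused input, keeping relative information
        return fused - x
```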
Further, the adaptive self-attention transformation module comprises:
the first calculation unit, used for taking the point cloud feature information extracted by the point cloud sampling module and computing a K vector through one convolution layer that reduces the vector dimension to one quarter, then computing a Q vector from the feature information through an identical convolution layer, and computing a V vector through a convolution layer that leaves the dimension unchanged;
the second calculation unit, used for multiplying the Q vector by the K vector to obtain a score vector for each point group, then normalizing the score vectors with a softmax layer into an attention map whose entries sum to 1; the attention map is divided by its column sums for a second normalization and used to weight and sum the V vectors, giving an attention map of fused features in which each group's features have absorbed the information of the other groups' spatial features; finally, the initial un-fused features are subtracted from the fused features to obtain relative spatial feature information;
the third calculation unit, shown in fig. 4, which improves the last convolution layer of the self-attention mechanism by creating a feature-adaptive convolution: n different convolution kernels are pre-designed and placed in a convolution pool; the relative spatial feature information obtained by the previous calculation unit is fed into an independent scoring network, where one convolution layer maps the feature map to n dimensions, generating a weight for each of the n kernels, and a softmax layer normalizes the weights so that they sum to 1. Each weight determines what share of its kernel's parameters is used, i.e. each kernel's proportion in the finally summed kernel. Since the weights are obtained directly from the previous unit's relative feature map, they focus solely on how best to process the current group's spatial information. The weight of each kernel obtained from the scoring network is multiplied by that kernel, and the results are summed into one complete kernel, with which a final convolution over the relative spatial feature information yields feature-fused spatial information. Finally, the spatial features originally input from the point cloud sampling module are added to the feature-fused information, giving the complete spatial feature information used to generate the point cloud. Because only part of each kernel is used, the weights can be adjusted to suit groups of features with different distributions, so a single convolution layer achieves the effect of multi-layer convolution without extra convolution computation, making the network more flexible and adaptive to the data and the feature information more effective.
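In code, this unit behaves like a dynamic convolution over a kernel pool. Below is a minimal sketch, assuming n point-wise kernels and a scoring convolution whose per-point outputs are averaged into one weight per kernel; the averaging step and all names are our assumptions, since the patent only states that one convolution layer produces the n weights.

```python
# Hedged sketch of the feature-adaptive convolution (kernel pool + scoring).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveConv(nn.Module):
    def __init__(self, dim: int = 128, n_kernels: int = 4):
        super().__init__()
        # convolution pool: n point-wise kernels of shape (dim, dim, 1)
        self.kernels = nn.Parameter(torch.randn(n_kernels, dim, dim, 1) * 0.02)
        # independent scoring network: one convolution to n channels
        self.score = nn.Conv1d(dim, n_kernels, 1)

    def forward(self, rel: torch.Tensor) -> torch.Tensor:
        # rel: (1, dim, m) relative features from the attention step
        w = self.score(rel).mean(dim=-1)                 # (1, n) kernel scores
        w = torch.softmax(w, dim=-1)                     # weights sum to 1
        # mix the pool into one complete kernel (batch of one assumed)
        kernel = (w.view(-1, 1, 1, 1) * self.kernels).sum(dim=0)
        return F.conv1d(rel, kernel)                     # final convolution
```

The full transformation module would then compute x + AdaptiveConv()(OffsetSelfAttention()(x)), matching the residual addition of the sampling features described above.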
The completion module completes the point cloud from the output of the adaptive self-attention transformation module: the point cloud spatial features of the two-layer structure are extracted and fused, and the fused features of each layer generate three segments of completed point cloud through an MLP structure. As shown in fig. 5, the sampled features of the first and second point cloud sampling layers serve as the inputs of the two adaptive self-attention transformation modules, with final output dimensions of N/4 × 512 and N/2 × 256 respectively. The output derived from the first sampling layer's features through the two layers of adaptive self-attention transformation modules is used for point cloud generation: a convolution reduces the 256 dimensions to 3-dimensional xyz point coordinates, completing the whole object. The second sampling layer's features yield two outputs through the two layers of adaptive self-attention transformation modules, both used for point cloud generation: a convolution reduces the 512 dimensions to 3-dimensional xyz point coordinates, completing the local details of the object. The three MLP branches generate N/4 × 3, N/4 × 3 and N/2 × 3 points respectively, and the N × 3 point cloud is finally fused and output.
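A hedged sketch of the three-branch generation head follows, under the reading that the N/2 × 256 branch produces the whole-object segment and the N/4 × 512 branch produces the two local-detail segments; this assignment, the module name and the single-convolution heads are illustrative assumptions.

```python
# Hedged sketch of the three-branch completion head (N/2 + N/4 + N/4 = N).
import torch
import torch.nn as nn


class CompletionHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_head = nn.Conv1d(256, 3, 1)    # 256-D -> xyz, whole object
        self.detail_head_a = nn.Conv1d(512, 3, 1)  # 512-D -> xyz, local detail
        self.detail_head_b = nn.Conv1d(512, 3, 1)  # 512-D -> xyz, local detail

    def forward(self, f_global: torch.Tensor, f_det_a: torch.Tensor,
                f_det_b: torch.Tensor) -> torch.Tensor:
        # f_global: (1, 256, N/2); f_det_a, f_det_b: (1, 512, N/4)
        p1 = self.global_head(f_global)
        p2 = self.detail_head_a(f_det_a)
        p3 = self.detail_head_b(f_det_b)
        # concatenate the three segments into the final (1, N, 3) cloud
        return torch.cat([p1, p2, p3], dim=-1).transpose(1, 2)
```

The three segments of N/2, N/4 and N/4 points concatenate to exactly N points, matching the fused N × 3 output described above.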
As shown in fig. 6, the invention also provides a point cloud completion method based on feature adaptivity and a self-attention transformation network, whose object is to complete a full point cloud from a partial one. The point cloud refers to a set of target surface characteristics representing the visible surface of an object; the target surface characteristics are object surface features obtained by depth-camera scanning during robotic visual grasping, or vehicle surface features obtained while scanning surrounding vehicles with a depth camera in unmanned driving. The completion method comprises the following steps:
Step S1: perform point cloud sampling twice to obtain two layers of fused point cloud spatial features;
Step S2: perform adaptive feature fusion on the point cloud spatial features;
Step S3: complete the point cloud according to the result of the adaptive feature fusion.
Building on existing point cloud completion models, the invention applies a self-attention transformation network to point cloud completion, which greatly reduces the model's parameters and computation and increases processing speed; it adds an adaptive network to improve the transformation network's subsequent feature processing, increasing the flexibility, adaptability and effectiveness with which data are handled; and it maintains high accuracy while reducing the model's computation.
Furthermore, during robotic visual grasping, the incomplete object surface obtained by depth-camera scanning can be completed quickly by the invention, yielding the complete object surface needed to finish the grasp; and in unmanned driving, the incomplete vehicle surface features obtained while scanning surrounding vehicles with the depth camera can likewise be completed quickly.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or modification of the technical solutions and inventive concept of the present invention that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims (7)

1. A point cloud completion device based on an adaptive self-attention transformation network, characterized by comprising:
the point cloud sampling module, used for performing point cloud sampling twice to obtain two layers of fused point cloud spatial features;
the adaptive self-attention transformation module, used for performing adaptive feature fusion on the point cloud spatial features;
the completion module, used for completing the point cloud according to the output of the adaptive self-attention transformation module;
wherein the point cloud refers to a set of target surface characteristics that represents the visible surface of an object; the target surface characteristics are object surface features obtained by depth-camera scanning during robotic visual grasping, or vehicle surface features obtained while scanning surrounding vehicles with a depth camera in unmanned driving;
the adaptive self-attention transformation module comprises:
the first calculation unit, used for taking the point cloud spatial features extracted by the point cloud sampling module and computing a K vector from the feature information through one convolution layer that reduces the vector dimension to one quarter, computing a Q vector through an identical convolution layer, and computing a V vector through a convolution layer that leaves the dimension unchanged;
the second calculation unit, used for multiplying the Q vector by the K vector to obtain a score vector for each point group, then normalizing the score vectors with a softmax layer into an attention map whose entries sum to 1; the attention map is divided by its column sums for a second normalization and used to weight and sum the V vectors, giving an attention map of fused features; finally, the initial un-fused features are subtracted from the fused features to obtain relative spatial feature information;
the third calculation unit, used for pre-designing n different convolution kernels placed in a convolution pool: the relative spatial feature information obtained by the previous calculation unit is first fed into an independent scoring network, where one convolution layer maps the feature map to n dimensions, generating a weight for each of the n kernels, and a softmax layer normalizes the weights so that they sum to 1; the weight of each kernel obtained from the scoring network is multiplied by that kernel and the results are summed into one complete kernel, with which a final convolution over the relative spatial feature information yields feature-fused spatial information; the point cloud spatial features originally input from the point cloud sampling module are then added to the feature-fused spatial information, giving the complete spatial feature information used for point cloud generation.
2. The point cloud completion device based on an adaptive self-attention transformation network of claim 1, wherein the point cloud sampling module comprises:
the acquisition unit, used for acquiring the relative point cloud coordinates of the object about a central origin;
the mapping unit, used for mapping the N point cloud coordinates to a higher dimension by convolution: the 3-dimensional xyz coordinates of the object are first lifted by a convolution to 64-dimensional vectors, so that each point is represented by one 64-dimensional vector, and these are then lifted by a further convolution to 128-dimensional vectors, yielding global point cloud spatial position information;
the clustering unit, used for obtaining through farthest point sampling the distance from each point to all other points, sorting these distances, selecting with the k-nearest-neighbour algorithm the features of the k points closest to each point, and packing them into a k × 128 matrix to form a group, 1024 groups being obtained by the clustering unit the first time and 512 groups the second time;
and the extraction unit, used for turning each grouped k × 128 matrix into a 128-dimensional vector after each grouping by fusing features with two identical convolutions to obtain the key information, extracting with a max pooling layer the maximum point cloud spatial feature of each k × 128 matrix over its k rows, and finally forming a 1 × 128 vector.
3. The point cloud completion device based on an adaptive self-attention transformation network of claim 1, wherein the completion module uses the fused point cloud features of each layer to generate three segments of completed point cloud through an MLP structure; the sampled features of the first and second point cloud sampling layers serve as the inputs of two layers of adaptive self-attention transformation modules, with final output dimensions of N/4 × 512 and N/2 × 256 respectively; the output derived from the first sampling layer's features through the two layers of adaptive self-attention transformation modules is used for point cloud generation, a convolution reducing the 256 dimensions to 3-dimensional xyz point coordinates to complete the whole object; the second sampling layer's features yield two outputs through the two layers of adaptive self-attention transformation modules, both used for point cloud generation, a convolution reducing the 512-dimensional spatial coordinates to 3-dimensional xyz point coordinates to complete the local details of the object; and the three MLP branches generate N/4 × 3, N/4 × 3 and N/2 × 3 points respectively, the N × 3 point cloud being finally fused and output.
4. A point cloud completion method based on a self-attention transformation network, characterized by comprising the following steps:
Step S1: perform point cloud sampling twice to obtain two layers of fused point cloud spatial features;
Step S2: perform adaptive feature fusion on the point cloud spatial features, as follows: the point cloud spatial features are embedded and three convolution layers compute the K, Q and V vectors of each group's spatial features; the Q and K vectors are multiplied to obtain a score vector for each point group, the scores are multiplied with the V vectors to obtain fused features, and the final point cloud convolution operates on the feature difference, with the adaptive network applied to the final convolution layer, specifically: n different convolution kernels are designed in advance and placed in a convolution pool; the grouped spatial features are first fed to a scoring network, where one convolution layer maps the features to n dimensions, producing a weight for each of the n kernels; the weight of each kernel obtained from the scoring network is multiplied by that kernel, and the results are finally summed into one complete kernel used for the convolution;
Step S3: complete the point cloud according to the result of the adaptive feature fusion;
wherein the point cloud refers to a set of target surface characteristics that represents the visible surface of an object; the target surface characteristics are object surface features obtained by depth-camera scanning during robotic visual grasping, or vehicle surface features obtained while scanning surrounding vehicles with a depth camera in unmanned driving.
5. The point cloud completion method based on a self-attention transformation network of claim 4, wherein in step S1 the N point cloud coordinates are mapped to a higher dimension by convolution to obtain global spatial position information; the distance from each point to all other points is then obtained by farthest point sampling, and the k points nearest each point are clustered into a group by the k-nearest-neighbour algorithm; the cloud is divided into 1024 groups, then sampled and grouped again into 512 groups; and finally a max pooling layer extracts the key point cloud spatial features.
6. The point cloud completion method based on a self-attention transformation network of claim 5, wherein in step S3 three segments of completed point cloud are generated through an MLP structure from the fused point cloud features of each layer; the sampled features of the first and second point cloud sampling layers serve as the inputs of two layers of adaptive self-attention transformation modules, with final output dimensions of N/4 × 512 and N/2 × 256 respectively; the output derived from the first sampling layer's features is used for point cloud generation, a convolution reducing the 256 dimensions to 3-dimensional xyz point coordinates to complete the whole object; and the second sampling layer's features yield two outputs through the two layers of adaptive self-attention transformation modules, both used for point cloud generation, a convolution reducing the 512 dimensions to 3-dimensional xyz point coordinates to complete the local details of the object.
7. The method according to claim 6, wherein in step S3 the three MLP branches generate N/4 × 3, N/4 × 3 and N/2 × 3 points respectively, and the N × 3 point cloud is finally fused and output.
Application CN202110890669.2A (priority date 2021-08-04, filed 2021-08-04): Point cloud completion device and method based on adaptive self-attention transformation network. Granted as CN113486988B (Active).

Priority Applications (1)

CN202110890669.2A (priority and filing date 2021-08-04): Point cloud completion device and method based on adaptive self-attention transformation network

Applications Claiming Priority (1)

CN202110890669.2A (priority and filing date 2021-08-04): Point cloud completion device and method based on adaptive self-attention transformation network

Publications (2)

CN113486988A, published 2021-10-08
CN113486988B, published 2022-02-15 (grant)

Family

ID: 77945621

Family Applications (1)

CN202110890669.2A (Active, filed 2021-08-04): Point cloud completion device and method based on adaptive self-attention transformation network

Country Status (1)

CN: CN113486988B (en)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083705B (en) * 2019-05-06 2021-11-02 电子科技大学 Multi-hop attention depth model, method, storage medium and terminal for target emotion classification
CN111489358B (en) * 2020-03-18 2022-06-14 华中科技大学 Three-dimensional point cloud semantic segmentation method based on deep learning
CN113160068B (en) * 2021-02-23 2022-08-05 清华大学 Point cloud completion method and system based on image
CN113052955B (en) * 2021-03-19 2023-06-30 西安电子科技大学 Point cloud completion method, system and application
CN112927359B (en) * 2021-03-22 2024-01-30 南京大学 Three-dimensional point cloud completion method based on deep learning and voxels
CN113052835B (en) * 2021-04-20 2024-02-27 江苏迅捷装具科技有限公司 Medicine box detection method and system based on three-dimensional point cloud and image data fusion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110200710A (en) * 2019-04-17 2019-09-06 广东工业大学 A kind of oral restoration method based on three-dimensional imaging and Real-time modeling set
CN111950467A (en) * 2020-08-14 2020-11-17 清华大学 Fusion network lane line detection method based on attention mechanism and terminal equipment
CN112614071A (en) * 2020-12-29 2021-04-06 清华大学 Self-attention-based diverse point cloud completion method and device
CN112785526A (en) * 2021-01-28 2021-05-11 南京大学 Three-dimensional point cloud repairing method for graphic processing
CN112966696A (en) * 2021-02-05 2021-06-15 中国科学院深圳先进技术研究院 Method, device and equipment for processing three-dimensional point cloud and storage medium
CN113205466A (en) * 2021-05-10 2021-08-03 南京航空航天大学 Incomplete point cloud completion method based on hidden space topological structure constraint

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CVPR 2021 Oral | VRCNet: Variational Relational Point Cloud Completion Network; Pan Liang; SenseTime Academic (商汤学术); 2021-06-08; pp. 1-10 *
Variational Relational Point Completion Network; Liang Pan et al.; arXiv:2104.10154v1; 2021-04-20; pp. 1-15 *
A deep learning inpainting algorithm for locally missing DSM regions; Guan Kai et al.; Journal of Geomatics Science and Technology (测绘科学技术学报); 2020-06-15 (No. 03); full text *


Similar Documents

Publication Publication Date Title
JP6745328B2 (en) Method and apparatus for recovering point cloud data
CN109685842B (en) Sparse depth densification method based on multi-scale network
CN112927357B (en) 3D object reconstruction method based on dynamic graph network
CN110674829B (en) Three-dimensional target detection method based on graph convolution attention network
CN111079685B (en) 3D target detection method
CN113223091B (en) Three-dimensional target detection method, three-dimensional target capture device and electronic equipment
WO2022017131A1 (en) Point cloud data processing method and device, and intelligent driving control method and device
CN113052109A (en) 3D target detection system and 3D target detection method thereof
Badías et al. An augmented reality platform for interactive aerodynamic design and analysis
CN111145253A (en) Efficient object 6D attitude estimation algorithm
JP2019159940A (en) Point group feature extraction device, point group feature extraction method, and program
CN107972027A (en) The localization method and device of robot, robot
CN114067075A (en) Point cloud completion method and device based on generation of countermeasure network
CN114757904A (en) Surface defect detection method based on AI deep learning algorithm
CN115641322A (en) Robot grabbing method and system based on 6D pose estimation
CN115147545A (en) Scene three-dimensional intelligent reconstruction system and method based on BIM and deep learning
CN113724387A (en) Laser and camera fused map construction method
CN114155414A (en) Novel unmanned-driving-oriented feature layer data fusion method and system and target detection method
CN114494594A (en) Astronaut operating equipment state identification method based on deep learning
CN113486988B (en) Point cloud completion device and method based on adaptive self-attention transformation network
TWI731604B (en) Three-dimensional point cloud data processing method
Yao et al. Research of camera calibration based on genetic algorithm BP neural network
CN116152579A (en) Point cloud 3D target detection method and model based on discrete Transformer
WO2022017129A1 (en) Target object detection method and apparatus, electronic device, and storage medium
CN115457539A (en) 3D target detection algorithm based on multiple sensors

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant