WO2022087932A1 - Non-rigid 3d object modeling using scene flow estimation - Google Patents

Non-rigid 3d object modeling using scene flow estimation

Info

Publication number
WO2022087932A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
scene flow
point
estimating
estimated
Prior art date
Application number
PCT/CN2020/124586
Other languages
French (fr)
Inventor
Taketomi TAKAFUMI
Yang Li
Takehara HIKARI
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2020/124586 priority Critical patent/WO2022087932A1/en
Priority to CN202080106889.5A priority patent/CN116391208A/en
Publication of WO2022087932A1 publication Critical patent/WO2022087932A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/56 Particle system, point based geometry or rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Image Processing (AREA)

Abstract

A device for reconstructing a non-rigid object, comprising: a depth sensor (204); and a processor configured to repeat steps of: capturing a point cloud with a depth sensor, wherein the first point cloud is set as a canonical model; estimating a scene flow (403) by using the captured point cloud and a previous point cloud; pre-warping the captured point cloud by using the estimated scene flow; estimating deformation parameters for the captured point cloud; warping the pre-warped point cloud by using the deformation parameters; and merging the warped point cloud into the canonical model. The device achieves accurate and stable modeling of a non-rigid object.

Description

NON-RIGID 3D OBJECT MODELING USING SCENE FLOW ESTIMATION
TECHNICAL FIELD
The present invention relates to a method and a device for reconstructing a non-rigid 3D object.
BACKGROUND
3D object reconstruction using sequential depth images (point clouds) input from camera sensors has been widely investigated in the fields of computer vision and robotics. To reconstruct a 3D object from sequential depth images, input depth images are aligned and fused into a single coordinate system. A challenging problem in this field is non-rigid object reconstruction, in which a reconstruction target is moving and deforming during the scanning process. In this case, the algorithm needs to handle object surface deformations. To achieve non-rigid 3D object reconstruction, non-rigid iterative closest point (ICP) based approaches have been proposed. Non-rigid ICP is a technique for tracking input point clouds that can handle object surface deformations using as-rigid-as-possible (ARAP) constraints. By using these methods, surface deformations of the objects during the scanning process can be tracked, and then the input point clouds are fused into a single coordinate system using the tracked surface deformation information. However, it is difficult to handle large motions of the target object. This is due to the difficulty of the correspondence search in the deformation tracking process. The existing methods use the previous state of the target object to find correspondences between the reconstructed model and the input point cloud data. If there are large motions or large occlusions, the correspondence search process fails. Therefore, improving the surface deformation tracking process is an important problem in non-rigid 3D object reconstruction.
One category of the existing methods is non-rigid ICP using prior models. Specific template 3D models, such as a face or a human body, are used as a prior to track a target object in input data. In these methods, first, the template 3D model is fitted to the input data by estimating model parameters such as joint angles, shape parameters, and landmark positions. Then, the estimated model parameters are used to track the target object. In one method, a 3D face model and a body model are used as a prior to track the target object. In another method, a 3D human model generated from an input color and depth (RGB-D) image using machine-learning-based human model generation is used, and then this model is used to track the reconstruction target. By using these methods, the target object can be tracked and reconstructed stably.
The most important point about these methods is that they are designed for a specific object such as a human body or a face. These methods use the 3D priors for tracking the target object. Therefore, these methods cannot be used for arbitrary non-rigid 3D modeling tasks. Although the latter method does not use the template model explicitly in the reconstruction process, the neural network needs to be trained by using the human model dataset. As a result, the trained network cannot handle non-human objects. These are the limitations of this approach.
Another category of the existing methods is scene flow estimation, which is a technique to track input 3D point clouds. In these methods, a 3D flow of the point clouds is estimated by using machine learning. In one method, the 3D scene flow is trained on a 2D image domain. The method outputs a rigidity mask and then decomposes multiple rigid scene flows from the entire rigid scene flow. As a result, this method can separate camera motions and object motions. In another method, the scene flow is trained in 3D space directly. The network learns flow embeddings that represent point motions. These methods can be applied to a general 3D point cloud tracking task.
A problem of existing 3D scene flow estimation methods is that these methods are not designed for non-rigid object modeling tasks. These methods do not have a regularization for surface deformation. As a result, these methods cannot resolve flow ambiguity. In addition, although the latter method can train the network in 3D space directly, it cannot handle a large point cloud input. Moreover, it only considers the point difference at a single scale. Therefore, it is difficult to estimate a fine, dense scene flow using the latter method. These are the limitations of this approach.
The problem of the point cloud tracking process in non-rigid 3D object reconstruction, which is caused by a large motion of the target object, needs to be solved.
SUMMARY
A device for reconstructing a non-rigid object is provided to model a non-rigid object accurately and stably.
According to a first aspect, a device for reconstructing a non-rigid object is provided, where the device includes: a depth sensor; and a processor configured to repeat steps of: capturing a point cloud with a depth sensor, wherein the first point cloud is set as a canonical model; estimating a scene flow by using the captured point cloud and a previous point cloud; pre-warping the captured  point cloud by using the estimated scene flow; estimating deformation parameters for the captured point cloud; warping the pre-warped point cloud by using the deformation parameters; and merging the warped point cloud into the canonical model.
In a possible implementation manner of the first aspect, the device further includes a display, wherein estimating a scene flow includes repeating the following steps until a user input to end the following steps is received: estimating a scene flow by using the captured point cloud and a previous point cloud; projecting the estimated scene flow on a plane, and displaying it on the display; and receiving a user input to adjust a scene flow estimation parameter.
In a possible implementation manner of the first aspect, estimating a scene flow includes: generating a first point feature from the previous point cloud by using a first point cloud convolution kernel, and feeding the first point feature to the lower level of a first point cloud convolution kernel; generating a second point feature from the captured point cloud by using a second point cloud convolution kernel, and feeding the second point feature to the lower level of a second point cloud convolution kernel; and estimating the scene flow by using the first and the second point features and a scene flow estimated in the lower level.
In a possible implementation manner of the first aspect, estimating the scene flow by using the first and the second point features and a scene flow estimated in the lower level includes: warping the point features using the initial scene flow calculated by using the scene flow estimated in the lower level; calculating 3D cost volume by using as-rigid-as-possible cost; and refining the initial scene flow by using the 3D cost volume to obtain the estimated scene flow.
According to a second aspect, a method for reconstructing a non-rigid object is provided, where the method includes: repeating steps of: capturing a point cloud with a depth sensor, wherein the first point cloud is set as a canonical model; estimating a scene flow by using the captured point cloud and a previous point cloud; pre-warping the captured point cloud by using the estimated scene flow; estimating deformation parameters for the captured point cloud; warping the pre-warped point cloud by using the deformation parameters; and merging the warped point cloud into the canonical model.
According to a third aspect, a storage medium is provided, wherein the storage medium stores a program thereon; when the program is executed by a processor, the program causes the processor to perform the method of the second aspect.
BRIEF DESCRIPTION OF DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly introduces the accompanying drawings required for  describing the embodiments or the prior art. The accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1A illustrates an example of a usage scene of a reconstruction system 101 according to the first embodiment;
FIG. 1B illustrates an example of a usage scene of a reconstruction system 102 according to the first embodiment;
FIG. 2 illustrates a block diagram of a hardware configuration according to the first embodiment;
FIG. 3 illustrates a block diagram of a functional configuration according to the first embodiment;
FIG. 4 illustrates a flow diagram of non-rigid object reconstruction according to the first embodiment;
FIG. 5 illustrates overall architecture of a 3D scene flow estimation network;
FIG. 6 illustrates details of the scene flow estimator of the Estimator-L1 505;
FIG. 7 illustrates a flow diagram of the scene flow estimation parameter adjustment process; and
FIG. 8 illustrates an example of the user interface of the 3D scene flow estimation parameter adjustment.
DESCRIPTION OF EMBODIMENTS
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. The described embodiments are only some but not all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protected scope of the present invention.
FIG. 1A and FIG. 1B illustrate an example of a usage scene of a reconstruction system 101/102 according to the first embodiment of the present invention. The reconstruction system 101 includes a depth sensor connected to a computer, and this depth sensor is fixed. The reconstruction system 102 includes a depth sensor embedded in a mobile device such as a smartphone, and this depth sensor can be moved. The reconstruction target 100, for example, a human, is scanned by the depth sensor. In the scanning process, the reconstruction target 100 moves in front of the fixed  depth sensor to show the entire shape from various directions as shown in FIG. 1A, or the user scans the reconstruction target 100 by holding the depth sensor and moving around the reconstruction target 100 as shown in FIG. 1B.
FIG. 2 illustrates a block diagram of a hardware configuration according to the first embodiment. The reconstruction system includes a CPU (Central Processing Unit) 200, a GPU (Graphics Processing Unit) 201, a RAM (Random Access Memory) 202, a ROM (Read Only Memory) 203, a bus 209, an Input/Output I/F (Interface) 205, a display 207, and a user interface 208. The reconstruction system 101/102 also has a depth sensor 204 and a storage device 206 that are connected to the bus 209 via the Input/Output I/F 205. The CPU 200 controls each component connected through the bus 209 by executing a program. The RAM 202 is used as a main memory of the CPU 200 and so on. The ROM 203 stores an OS (Operating System), programs, device drivers and so on. The depth sensor 204 connected via the Input/Output I/F 205 captures depth maps. In a depth map, depth values are stored, where a depth value is defined as the distance from the depth sensor to the target. The storage device 206 connected via the Input/Output I/F 205 is a storage with a large capacity, for example, a hard disk or a flash memory. The Input/Output I/F 205 transfers data between the reconstruction system 101/102 and the storage device 206. The GPU 201 processes and renders data by executing a program. The display 207 shows captured depth maps, results of intermediate processing, and the user interface. The user interface 208 embedded in the reconstruction system 101/102 accepts an input by the user and transfers it to the CPU 200.
FIG. 3 illustrates a block diagram of a functional configuration according to the first embodiment. When the programs are executed by the CPU 200 and the GPU 201 using the hardware in FIG. 2, this functional configuration is realized. The reconstruction system 101/102 includes a user interface control unit 300, a point cloud capture unit 301, a scene flow estimation unit 302, a scene flow visualization unit 306, a non-rigid ICP unit 303, a point cloud fusion unit 304, and a storage unit 305.
The user interface control unit 300 handles the user’s input. The user interface is shown on the display 207 according to the user’s input.
The point cloud capture unit 301 receives a sequence of depth maps from the depth sensor 204. The received depth maps are converted into point clouds.
The scene flow estimation unit 302 estimates a 3D scene flow of the reconstruction target 100 from the point clouds acquired using the depth sensor 204. The 3D scene flow represents the motion of an object in 3D space with 3D vectors.
The scene flow visualization unit 306 visualizes the estimated scene flow on the display 207.
The non-rigid ICP unit 303 estimates the surface deformation and warps the acquired point cloud using the estimated surface deformation parameters. The non-rigid ICP unit 303 also estimates 3D rigid transformation parameters to align the warped point clouds. The non-rigid ICP unit 303 can be implemented by using any depth-based non-rigid ICP algorithm, such as DynamicFusion (for example, refer to R. Newcombe, D. Fox and S. Seitz, "DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time", in IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 2015) and SurfelWarp (for example, refer to W. Gao and R. Tedrake, "SurfelWarp: Efficient Non-Volumetric Dynamic Reconstruction", in Robotics: Science and Systems, 2018).
The point cloud fusion unit 304 merges the warped point clouds to reconstruct a 3D model of the reconstruction target 100. The point cloud fusion unit 304 can be implemented by using any model fusion algorithm, such as truncated signed distance function (TSDF) based model fusion using the voxel representation and surfel-based model fusion using the surfel representation.
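As one possible illustration of the TSDF-based fusion mentioned above, the sketch below applies the standard weighted running-average update over a voxel grid; the voxel layout, truncation distance, and point-to-plane signed-distance approximation are common choices assumed here for illustration and are not specific details of this disclosure.

```python
import numpy as np
from scipy.spatial import cKDTree

def tsdf_update(tsdf, weights, voxel_centers, points, normals, trunc=0.04):
    """Weighted running-average TSDF update for one warped point cloud.

    tsdf, weights:   (V,) arrays over a flattened voxel grid
    voxel_centers:   (V, 3) world coordinates of the voxel centres
    points, normals: (N, 3) warped point cloud with per-point normals
    The voxel layout, truncation distance, and point-to-plane signed-distance
    approximation are common choices assumed here for illustration only.
    """
    tree = cKDTree(points)
    dist, idx = tree.query(voxel_centers)        # nearest surface point per voxel
    # Signed distance: projection onto the normal of the nearest surface point.
    signed = np.einsum('ij,ij->i', voxel_centers - points[idx], normals[idx])
    sdf = np.clip(signed / trunc, -1.0, 1.0)
    near = dist < trunc                          # only update voxels near the surface
    new_w = weights[near] + 1.0
    tsdf[near] = (tsdf[near] * weights[near] + sdf[near]) / new_w
    weights[near] = new_w
    return tsdf, weights
```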
The storage unit 305 stores the reconstructed 3D model into the storage device 206 for further use.
FIG. 4 illustrates a flow diagram of non-rigid object reconstruction according to the first embodiment. As described above with reference to FIG. 1, the user operates the reconstruction system. The user puts or holds the depth sensor 204 of the reconstruction system 101/102 to dynamically scan a reconstruction target 100. The reconstruction target 100 can be any kind of non-rigid object, such as a human, a face, or a soft toy. The user is supposed to move around the reconstruction target 100, or the reconstruction target 100 is supposed to move to show the entire shape from various directions in front of the depth sensor 204.
From the viewpoint of the hardware configuration, each step of FIG. 4 is executed on the CPU 200 and the GPU 201 and data are stored in the RAM 202 or the storage device 206 and loaded from them as needed. The CPU 200 receives a sequence of point clouds from the depth sensor 204, and stores it in the RAM 202. The point clouds captured from the depth sensor 204 may be stored in the storage device 206. The following describes each step of FIG. 4 from the viewpoint of the functional configuration.
At step 400, the point cloud capture unit 301 acquires a point cloud from the depth sensor 204 or from the storage device 206, which stores previously captured point clouds. Specifically, the point cloud capture unit 301 captures a sequence of depth maps of the reconstruction target 100 using the depth sensor 204. The acquired depth maps are converted into point clouds.
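For illustration, the conversion from a depth map to a point cloud at step 400 can be written as a standard pinhole back-projection; the intrinsic parameters and the function name below are assumptions introduced for this sketch, not values from the disclosure.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in metres) into an (N, 3) point cloud.

    fx, fy, cx, cy are pinhole intrinsics of the depth sensor; they are
    illustrative placeholders, not values given in this disclosure.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth reading
```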
At step 401, it is checked whether the input point cloud is the first point cloud input or not. If it is the first point cloud input, this point cloud is set as a canonical model at step 402. The canonical model is a base model, and subsequent point clouds are merged into the canonical model.
For frames other than the first frame (the first point cloud), the scene flow estimation unit 302 estimates a 3D scene flow at step 403 by using the point clouds t and t-1, where 't' represents the time sequence number at which the point cloud is captured. The details of the 3D scene flow estimation network are explained later.
The estimated scene flow is used at step 404. The non-rigid ICP unit 303 receives the estimated scene flow and the point cloud t. The point cloud t is pre-warped by using the estimated scene flow at step 404.
A non-rigid ICP algorithm is applied to the pre-warped point cloud t to align to the canonical model at step 405. The deformation parameters for the point cloud t are estimated by using the standard non-rigid ICP criteria such as point-to-point, point-to-plane, and as-rigid-as-possible.
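For reference, the point-to-point and point-to-plane criteria named above are conventionally written as follows, where $\hat{p}_i$ is a warped input point, $q_i$ its corresponding point on the canonical model, and $n_i$ the normal at $q_i$; this notation is introduced here for illustration and is not defined in the disclosure.

```latex
E_{\text{point-to-point}} = \sum_{i} \left\lVert \hat{p}_{i} - q_{i} \right\rVert^{2},
\qquad
E_{\text{point-to-plane}} = \sum_{i} \left( n_{i}^{\top} \left( \hat{p}_{i} - q_{i} \right) \right)^{2}
```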
The pre-warped point cloud t is warped by using the estimated deformation parameters, and then the warped point cloud t is fed into the point cloud fusion unit 304. The warped point cloud is merged into the canonical model at step 406.
At step 407, the reconstruction system checks whether the input point cloud is the final frame (the final point cloud input) or not. If the point cloud is not the final point cloud input, the steps 400 to 406 are repeatedly executed.
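The loop formed by steps 400 to 407 can be summarized by the following sketch; the callables stand in for the units of FIG. 3, and their names and signatures are placeholders assumed here, not an API defined by the disclosure.

```python
def reconstruct(point_cloud_stream, estimate_scene_flow, pre_warp,
                non_rigid_icp, warp, fuse):
    """Sketch of steps 400-407: fuse a stream of point clouds into a canonical model.

    All callables are illustrative placeholders standing in for the units of
    FIG. 3; their names and signatures are assumptions, not a defined API.
    """
    canonical = None
    previous = None
    for cloud in point_cloud_stream:                       # step 400
        if canonical is None:                              # steps 401-402
            canonical = cloud
        else:
            flow = estimate_scene_flow(previous, cloud)    # step 403
            pre_warped = pre_warp(cloud, flow)             # step 404
            params = non_rigid_icp(pre_warped, canonical)  # step 405
            warped = warp(pre_warped, params)
            canonical = fuse(canonical, warped)            # step 406
        previous = cloud                                   # loop until the final frame (step 407)
    return canonical
```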
After fusing the final point cloud input, the loop closure process is applied to reduce error accumulation at step 408. Depending on the representation of the reconstructed model, the loop closure process can be implemented by any kinds of algorithms (for example, refer to C.R. Qi, L. Yi, H. Su and L.J. Guibas, "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space" , in International Conference on Neural Information Processing Systems, Long Beach, CA, 2017, and Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni and A. Markham, "RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds" , in IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2020) .
After applying the loop closure, a 3D model is extracted from the fused point cloud at the step 409. Finally, the extracted 3D model is stored in the storage device 206.
In an embodiment of the present invention, a pair of point clouds which are successive frames are fed into the neural network to estimate a 3D scene flow of each point. The quality of the 3D scene flow is assessed by the user by visualizing the color-coded 3D scene flow on a 2D screen. The system asks whether the estimated 3D scene flow is smooth or not. If the estimated 3D flow is smooth enough, the non-rigid 3D point cloud fusion process is applied to reconstruct a 3D model of the target object. The following describes a 3D scene flow estimation network with reference to FIG.  5 and FIG. 6, and describes the scene flow estimation parameter adjustment process with reference to FIG. 7 and FIG. 8.
FIG. 5 illustrates the overall architecture of a 3D scene flow estimation network. An as-rigid-as-possible (ARAP) regularizer is used (specifically, at step 606 and step 607 in FIG. 6) in the 3D scene flow estimation network. The existing scene flow algorithms are not designed for 3D object reconstruction. By adding the ARAP regularizer to the scene flow estimation network, the algorithm can handle surface deformation properly. This is an important factor for non-rigid 3D object reconstruction.
A neural network model is trained in a supervised manner to learn a 3D scene flow that accounts for surface deformation, using a paired point clouds dataset. The dataset contains a collection of dynamic object data, which is composed of sequences of the point clouds of the objects. The neural network is trained to explicitly infer the surface deformation of the point cloud pair using the as-rigid-as-possible (ARAP) constraint. Correspondences between two point clouds can be effectively estimated by using a 3D scene flow estimation network. One embodiment is non-rigid 3D object reconstruction. The estimated 3D scene flow is used to find the corresponding point pairs between point clouds. These correspondences are used in the non-rigid point cloud fusion process.
As shown in FIG. 5, this network includes a point feature pyramid and a coarse-to-fine scene flow estimator. In an embodiment of the present invention, first, two point clouds 500 and 501 (for example, the point clouds t-1 and t at step 403 above) are fed into the network. Then, the point feature pyramids (PC1-L1, PC1-L2, PC1-L3, PC2-L1, PC2-L2, and PC2-L3) are constructed by using point cloud convolution kernels (PC Conv-L1 502, PC Conv-L2 503, and PC Conv-L3 504). In this process, weights are shared between the same-level point cloud convolution layers (the same weights are used at the same level). The point cloud convolution kernel can be implemented by using any kind of algorithm, such as PointNet++ (for example, refer to C.R. Qi, L. Yi, H. Su and L.J. Guibas, "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space", in International Conference on Neural Information Processing Systems, Long Beach, CA, 2017), RandLA-Net (for example, refer to Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni and A. Markham, "RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds", in IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2020), and KPConv (for example, refer to H. Thomas, C.R. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette and L.J. Guibas, "KPConv: Flexible and Deformable Convolution for Point Clouds," in International Conference on Computer Vision, Seoul, Korea, 2019). Scene flow estimators 505, 506, and 507 receive the corresponding level of point features. In addition, each scene flow estimator receives the estimated scene flow from the lower-level estimator. Note that the lowest-level estimator receives the initial flow 508. The initial flow can be set to zero. The highest-level estimator outputs the final result, the estimated scene flow 509.
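As a rough illustration of this coarse-to-fine design, the following sketch uses a plain shared MLP in place of the PointNet++/RandLA-Net/KPConv kernels and folds the warping, cost volume, and refiner of FIG. 6 into a single residual MLP per level; the class names, channel sizes, and the absence of point subsampling between levels are all simplifying assumptions made here for illustration, not details of the disclosed network.

```python
import torch
import torch.nn as nn

class SharedPointConv(nn.Module):
    """Per-level point feature extractor, standing in for PointNet++/RandLA-Net/KPConv.

    A plain shared MLP over points; the channel sizes are illustrative assumptions.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_ch, out_ch), nn.ReLU(),
                                 nn.Linear(out_ch, out_ch))

    def forward(self, x):                      # (B, N, in_ch) -> (B, N, out_ch)
        return self.mlp(x)

class SceneFlowEstimatorLevel(nn.Module):
    """One level of the coarse-to-fine estimator (Estimator-L1/L2/L3 in FIG. 5).

    Simplified: the warping, cost volume, and refiner of FIG. 6 are folded into
    one MLP over the concatenated features and the flow from the level below.
    """
    def __init__(self, feat_ch):
        super().__init__()
        self.refiner = nn.Sequential(nn.Linear(2 * feat_ch + 3, feat_ch), nn.ReLU(),
                                     nn.Linear(feat_ch, 3))

    def forward(self, feat1, feat2, lower_flow):
        x = torch.cat([feat1, feat2, lower_flow], dim=-1)
        return lower_flow + self.refiner(x)    # residual refinement of the flow

class CoarseToFineSceneFlow(nn.Module):
    def __init__(self, levels=3, feat_ch=64):
        super().__init__()
        # One kernel per level, applied to both point clouds so that weights are
        # shared between the two pyramids at the same level (PC Conv-L1/L2/L3).
        self.convs = nn.ModuleList(
            [SharedPointConv(3 if i == 0 else feat_ch, feat_ch) for i in range(levels)])
        self.estimators = nn.ModuleList(
            [SceneFlowEstimatorLevel(feat_ch) for _ in range(levels)])

    def forward(self, pc1, pc2):               # two (B, N, 3) point clouds
        feats1, feats2 = [], []
        f1, f2 = pc1, pc2
        for conv in self.convs:                # build the two feature pyramids
            f1, f2 = conv(f1), conv(f2)
            feats1.append(f1)
            feats2.append(f2)
        flow = torch.zeros_like(pc1)           # initial flow 508 set to zero
        for lvl in range(len(self.convs) - 1, -1, -1):
            flow = self.estimators[lvl](feats1[lvl], feats2[lvl], flow)
        return flow                            # final estimated scene flow 509
```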
FIG. 6 illustrates details of the scene flow estimator of the Estimator-L1 505. Note that the other estimators (Estimator-L2 506 and Estimator-L3 507) have the same architecture as the Estimator-L1 505. The Estimator-L1 receives the point features PC1-L1 600 and PC2-L1 601 as inputs. The Estimator-L1 also receives the Flow-L1 602 from the lower-level Estimator-L2 506. First, the initial scene flow for each point feature PC1-L1 600 and PC2-L1 601 is calculated by a linear blending 603 using the Flow-L1 602. After the linear blending process, the upsampled scene flow 604 is obtained. The warping layer 605 warps the point features using the initial scene flow. After warping, a 3D cost volume is calculated by using the point-to-point cost, the point-to-plane cost, the as-rigid-as-possible (ARAP) cost, and the point-to-point distance cost in a learned feature space at step 606. The ARAP cost is calculated for each point i. At first, the K nearest neighbor points around the point i are searched in the input point cloud. For each nearest neighbor point j (j=1, 2, ..., K), the following ARAP energy is calculated.
ARAP Energy: $\|(t_i - t_j) - R_i(t'_i - t'_j)\|^2$
wherein $t_i$ and $t_j$ are the translation parameters for points i and j, respectively, and $R_i$ is a 3D rotation matrix for the point i. The ARAP cost for the point i is calculated from the K ARAP energies. Finally, the neural network refines the initial 3D scene flow using the 3D cost volume in the scene flow refiner 607. To refine the scene flow, the point cloud convolution kernel is used in the neural network.
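A minimal sketch of the ARAP cost computation described above, assuming the K nearest neighbours are found with a k-d tree and the per-point cost is taken as the mean of the K energies; the function name, the interpretation of the primed translations, and the mean aggregation are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.spatial import cKDTree

def arap_cost(points, t, t_prime, rotations, k=8):
    """Per-point ARAP cost: mean over K neighbours j of ||(t_i - t_j) - R_i(t'_i - t'_j)||^2.

    points:     (N, 3) input point cloud, used only for the neighbour search
    t, t_prime: (N, 3) translation parameters; the meaning of t' is not spelled
                out in the excerpt and is treated here as a second set of
                per-point translations (an assumption)
    rotations:  (N, 3, 3) per-point rotation matrices R_i
    Averaging the K energies is also an assumption; the text only states that
    the cost is calculated from K ARAP energies.
    """
    tree = cKDTree(points)
    _, nbr = tree.query(points, k=k + 1)   # neighbour 0 is the query point itself
    nbr = nbr[:, 1:]
    costs = np.zeros(len(points))
    for i in range(len(points)):
        d = t[i] - t[nbr[i]]                                           # t_i - t_j, shape (K, 3)
        d_prime = (rotations[i] @ (t_prime[i] - t_prime[nbr[i]]).T).T  # R_i (t'_i - t'_j)
        costs[i] = np.mean(np.sum((d - d_prime) ** 2, axis=1))
    return costs
```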
FIG. 7 illustrates a flow diagram of the scene flow estimation parameter adjustment process. This process is an optional process for the non-rigid 3D reconstruction. If the process shown in FIG. 7 is omitted, user interactions are not required. In this process, the user can adjust a frame skip parameter, which is a 3D scene flow estimation parameter, for 3D object reconstruction to control the stability, efficiency, and accuracy of the reconstruction process. First, a 3D scene flow is estimated at step 700 in the same way as at step 403. At step 701, the estimated 3D scene flow is projected on the 2D image plane and displayed on the display 207. For intuitive visualization, the 3D scene flow is colorized and/or rendered in the 3D space. The user can check the estimation result from the qualitative aspect at step 702. If the estimated 3D scene flow is smooth enough, the user ends the parameter adjustment process. If the estimated 3D scene flow is not smooth enough, the user changes the frame skip parameter at step 703. At step 403 above, the successive frames, namely, the point clouds t-1 and t, are used for the 3D scene flow estimation. By adjusting the frame skip parameter, the interval between the two point clouds can be changed to, for example, t-3 and t, t-5 and t, etc. The process goes back to step 700 to estimate a 3D scene flow again with the adjusted parameter, and at step 701, the user can check the estimated scene flow.
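The effect of the frame skip parameter on which pair of point clouds is passed to the scene flow estimation at step 700 can be sketched as follows; the function and variable names are placeholders introduced for illustration.

```python
def select_flow_pair(point_clouds, t, frame_skip=1):
    """Return the pair of point clouds used for scene flow estimation at time t.

    frame_skip = 1 gives (t-1, t); frame_skip = 3 gives (t-3, t), and so on,
    mirroring the parameter adjusted at step 703. Names are placeholders.
    """
    return point_clouds[max(0, t - frame_skip)], point_clouds[t]
```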
FIG. 8 illustrates an example of the user interface of the 3D scene flow estimation parameter adjustment. The user 802 can adjust the frame skip parameter by sliding the bar 801 at the bottom of the screen 800 by using the touch interface (the user interface 208 on the display 207) . The result of visualization at the step 701 is shown on the display 207. The user can interactively check the quality of the 3D scene flow estimation result before and after the parameter adjustment. Although the example shows the usage of the smartphone, the user interface can be implemented by using a display device and a pointing device such as a mouse device.
In another embodiment, the 3D scene flow estimation resulting from existing algorithms (for example, refer to Z. Lv, K. Kim, D. Sun, A.J. Troccoli and J. Kautz, "Learning rigidity of dynamic scenes for three-dimensional scene flow estimation", US Patent US20190057509A1, 16 August 2017, and X. Liu, C.R. Qi and L.J. Guibas, "FlowNet3D: Learning Scene Flow in 3D Point Clouds", in IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, 2019) may be refined by using optimization with a regularizer.
According to the embodiments of the present invention, a non-rigid object can be modeled accurately and stably even though the reconstruction target has a large motion. In addition, the embodiments of the present invention can be applied to various types of non-rigid objects because the algorithm does not use any priors.
What is disclosed above are merely exemplary embodiments of the present invention, and certainly is not intended to limit the protection scope of the present invention. A person of ordinary skill in the art may understand that all or some of processes that implement the foregoing embodiments and equivalent modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (6)

  1. A device for reconstructing a non-rigid object, comprising:
    a depth sensor; and
    a processor configured to repeat steps of:
    capturing a point cloud with a depth sensor, wherein the first point cloud is set as a canonical model;
    estimating a scene flow by using the captured point cloud and a previous point cloud;
    pre-warping the captured point cloud by using the estimated scene flow;
    estimating deformation parameters for the captured point cloud;
    warping the pre-warped point cloud by using the deformation parameters; and
    merging the warped point cloud into the canonical model.
  2. The device according to claim 1, further comprising a display, wherein estimating a scene flow comprises repeating the following steps until a user input to end the following steps is received:
    estimating a scene flow by using the captured point cloud and a previous point cloud;
    projecting the estimated scene flow on a plane, and displaying it on the display; and
    receiving a user input to adjust a scene flow estimation parameter.
  3. The device according to claim 1 or 2, wherein estimating a scene flow comprises:
    generating a first point feature from the previous point cloud by using a first point cloud convolution kernel, and feeding the first point feature to the lower level of a first point cloud convolution kernel;
    generating a second point feature from the captured point cloud by using a second point cloud convolution kernel, and feeding the second point feature to the lower level of a second point cloud convolution kernel; and
    estimating the scene flow by using the first and the second point features and a scene flow estimated in the lower level.
  4. The device according to claim 3, wherein estimating the scene flow by using the first and the second point features and a scene flow estimated in the lower level comprises:
    warping the point features using the initial scene flow calculated by using the scene flow estimated in the lower level;
    calculating 3D cost volume by using as-rigid-as-possible cost; and
    refining the initial scene flow by using the 3D cost volume to obtain the estimated scene flow.
  5. A method for reconstructing a non-rigid object, comprising repeating steps of:
    capturing a point cloud with a depth sensor, wherein the first point cloud is set as a canonical model;
    estimating a scene flow by using the captured point cloud and a previous point cloud;
    pre-warping the captured point cloud by using the estimated scene flow;
    estimating deformation parameters for the captured point cloud;
    warping the pre-warped point cloud by using the deformation parameters; and
    merging the warped point cloud into the canonical model.
  6. A storage medium storing a program thereon, wherein when the program is executed by a processor, the program causes the processor to perform the method according to claim 5.
PCT/CN2020/124586 2020-10-29 2020-10-29 Non-rigid 3d object modeling using scene flow estimation WO2022087932A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/124586 WO2022087932A1 (en) 2020-10-29 2020-10-29 Non-rigid 3d object modeling using scene flow estimation
CN202080106889.5A CN116391208A (en) 2020-10-29 2020-10-29 Non-rigid 3D object modeling using scene flow estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/124586 WO2022087932A1 (en) 2020-10-29 2020-10-29 Non-rigid 3d object modeling using scene flow estimation

Publications (1)

Publication Number Publication Date
WO2022087932A1 (en) 2022-05-05

Family

ID=81383469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124586 WO2022087932A1 (en) 2020-10-29 2020-10-29 Non-rigid 3d object modeling using scene flow estimation

Country Status (2)

Country Link
CN (1) CN116391208A (en)
WO (1) WO2022087932A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833786A (en) * 2010-04-06 2010-09-15 清华大学 Method and system for capturing and rebuilding three-dimensional model
CN102222361A (en) * 2010-04-06 2011-10-19 清华大学 Method and system for capturing and reconstructing 3D model
US20190079536A1 (en) * 2017-09-13 2019-03-14 TuSimple Training and testing of a neural network system for deep odometry assisted by static scene optical flow
US20200211206A1 (en) * 2018-12-27 2020-07-02 Baidu Usa Llc Joint learning of geometry and motion with three-dimensional holistic understanding
CN111476822A (en) * 2020-04-08 2020-07-31 浙江大学 Laser radar target detection and motion tracking method based on scene flow

Also Published As

Publication number Publication date
CN116391208A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
Huang et al. Arch: Animatable reconstruction of clothed humans
Saito et al. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization
Boukhayma et al. 3d hand shape and pose from images in the wild
US10360718B2 (en) Method and apparatus for constructing three dimensional model of object
Tagliasacchi et al. 3d skeletons: A state‐of‐the‐art report
Pradeep et al. MonoFusion: Real-time 3D reconstruction of small scenes with a single web camera
Cohen et al. Inference of human postures by classification of 3D human body shape
JP7448566B2 (en) Scalable 3D object recognition in cross-reality systems
US20190026943A1 (en) Dense visual slam with probabilistic surfel map
Jin et al. 3d reconstruction using deep learning: a survey
Muratov et al. 3DCapture: 3D Reconstruction for a Smartphone
Alexiadis et al. Fast deformable model-based human performance capture and FVV using consumer-grade RGB-D sensors
Manni et al. Snap2cad: 3D indoor environment reconstruction for AR/VR applications using a smartphone device
JP2022123843A (en) Computer-implemented method, data processing apparatus and computer program for generating three-dimensional pose-estimation data
Achenbach et al. Accurate Face Reconstruction through Anisotropic Fitting and Eye Correction.
Prasad et al. A robust head pose estimation system for uncalibrated monocular videos
Li et al. Three-dimensional motion estimation via matrix completion
Kang et al. Appearance-based structure from motion using linear classes of 3-d models
WO2022087932A1 (en) Non-rigid 3d object modeling using scene flow estimation
Jian et al. Realistic face animation generation from videos
Kim et al. Bayesian 3D shape from silhouettes
Guo et al. Photo-realistic face images synthesis for learning-based fine-scale 3D face reconstruction
Choi et al. Tmo: Textured mesh acquisition of objects with a mobile device by using differentiable rendering
Wang et al. Markerless body motion capturing for 3d character animation based on multi-view cameras
Heo et al. In between 3D active appearance models and 3D morphable models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20959095

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20959095

Country of ref document: EP

Kind code of ref document: A1