CN113474818A - Apparatus and method for performing data-driven pairwise registration of three-dimensional point clouds


Info

Publication number
CN113474818A
Authority
CN
China
Prior art keywords
ppf
encoder
local
self
scan
Prior art date
Legal status
Pending
Application number
CN202080013849.6A
Other languages
Chinese (zh)
Inventor
Haowen Deng
Tolga Birdal
Slobodan Ilic
Current Assignee
Siemens AG
Original Assignee
Siemens AG
Priority date
Filing date
Publication date
Application filed by Siemens AG filed Critical Siemens AG
Priority claimed from PCT/EP2020/052128 external-priority patent/WO2020164911A1/en
Publication of CN113474818A publication Critical patent/CN113474818A/en
Pending legal-status Critical Current

Abstract

A method and an apparatus (1) for performing data-driven pairwise registration of three-dimensional 3D point clouds PC, the apparatus comprising: at least one scanner (2) adapted to capture a first local point cloud PC1 in a first scan and a second local point cloud PC2 in a second scan; a PPF derivation unit (3) adapted to process both captured local point clouds (PC1, PC2) to derive associated point pair features (PPF1, PPF2); a PPF autoencoder (4) adapted to process the derived point pair features (PPF1, PPF2) to extract corresponding PPF feature vectors (V_PPF1, V_PPF2); a PC autoencoder (5) adapted to process the captured local point clouds (PC1, PC2) to extract corresponding PC feature vectors (V_PC1, V_PC2); a subtractor (6) adapted to subtract the corresponding PPF feature vectors (V_PPF1, V_PPF2) from the PC feature vectors (V_PC1, V_PC2) to calculate latent difference vectors (LDV1, LDV2) for both captured point clouds (PC1, PC2), which latent difference vectors (LDV1, LDV2) are concatenated into a concatenated latent difference vector (CLDV); and a pose prediction network (8) adapted to calculate a relative pose prediction T between the first scan and the second scan performed by the scanner (2) based on the concatenated latent difference vector (CLDV).

Description

Apparatus and method for performing data-driven pairwise registration of three-dimensional point clouds
The present invention relates to a method and apparatus for performing data-driven pairwise registration of a three-dimensional point cloud generated by a scanner.
Matching local keypoint descriptors is a step in the automatic registration of overlapping three-dimensional scans. Point set registration, also known as point matching, is the process of finding a spatial transformation that aligns two sets of points. The point set or point cloud may comprise raw data from a three-dimensional scanner. In contrast to two-dimensional descriptors, learned three-dimensional descriptors lack any kind of local orientation assignment, and any subsequent pose estimator is therefore forced to rely on nearest-neighbor queries and exhaustive RANSAC iterations to robustly compute the alignment transformation. This is neither reliable nor computationally efficient. The matches may contain outliers that severely hamper scan registration, i.e., the alignment of the scans by computing a six-degree-of-freedom transformation between them. The conventional approach is to run a RANSAC procedure: three corresponding matched pairs are sampled repeatedly, a rigid transformation is computed from them, one scan is transformed onto the other, and the number of inlier keypoints is counted. Such a sampling process is computationally inefficient.
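For context, a minimal sketch of such a conventional RANSAC registration loop over matched keypoints is given below (NumPy; `estimate_rigid` is a hypothetical helper based on the standard Kabsch/Procrustes solution, and the inlier threshold is an illustrative choice, not a value from this patent):

```python
import numpy as np

def estimate_rigid(A, B):
    """Hypothetical Kabsch/Procrustes helper: rigid (R, t) mapping A onto B."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                      # reflection-corrected rotation
    return R, cb - R @ ca

def ransac_register(src, dst, iters=1000, thresh=0.05, seed=0):
    """Repeatedly sample 3 matches, fit a rigid transform, count inliers."""
    rng = np.random.default_rng(seed)
    best = (np.eye(3), np.zeros(3), 0)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = estimate_rigid(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = int((residuals < thresh).sum())
        if inliers > best[2]:
            best = (R, t, inliers)
    return best  # (R, t, inlier count) of the best-scoring hypothesis
```

The cubic space of 3-point samples is exactly the inefficiency the invention avoids, as discussed below.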
The article "3DMatch: Learning the Matching of Local 3D Geometry in Range Scans" by Andy Zeng et al. discloses a 3D descriptor for matching local geometry, focused on partially noisy 3D data obtained from a range of commercial sensors.
It is therefore an object of the present invention to provide a method and an apparatus which achieve a more efficient registration of three-dimensional point clouds.
This object is achieved according to a first aspect of the invention by a device comprising the features of claim 1.
According to a first aspect of the invention, the invention provides an apparatus for performing data-driven pairwise registration of a three-dimensional point cloud, the apparatus comprising: at least one scanner adapted to capture a first local point cloud in a first scan and a second local point cloud in a second scan;
a PPF derivation unit adapted to process both the captured local point clouds to derive associated point pair features;
a PPF autoencoder adapted to process the derived point pair features to extract corresponding PPF feature vectors;
a PC autoencoder adapted to process the captured local point clouds to extract corresponding PC feature vectors;
a subtractor adapted to subtract the corresponding PPF feature vectors from the PC feature vectors to calculate latent difference vectors for both of the captured point clouds, the latent difference vectors being concatenated into a concatenated latent difference vector; and
a pose prediction network adapted to calculate a relative pose prediction between the first scan and the second scan performed by the scanner based on the concatenated latent difference vector.
In a possible implementation of the apparatus according to the first aspect of the invention, the apparatus further comprises a pose selection unit adapted to process the pool of calculated relative pose predictions for selecting a suitable pose prediction.
In a possible implementation of the apparatus according to the first aspect of the present invention, the pose prediction network comprises a multi-layer perceptron MLP rotation network for decoding the concatenated latent difference vector.
In a possible implementation of the apparatus according to the first aspect of the present invention, the PPF autoencoder comprises:
an encoder adapted to encode the point pair features derived by the PPF derivation unit to calculate a latent PPF feature vector, the latent PPF feature vector being supplied to the subtractor; and
a decoder adapted to reconstruct the point pair features from the latent PPF feature vector.
In a possible implementation of the apparatus according to the first aspect of the present invention, the PC autoencoder comprises:
an encoder adapted to encode the captured local point cloud to calculate a latent PC feature vector, the latent PC feature vector being supplied to the subtractor; and
a decoder adapted to reconstruct the local point cloud from the latent PC feature vector.
According to a further second aspect, the invention also provides a data-driven computer-implemented method for pairwise registration of three-dimensional 3D point clouds, comprising the features of claim 6.
According to a second aspect, the invention provides a data-driven computer-implemented method for pairwise registration of three-dimensional point clouds, the method comprising the steps of:
capturing, by at least one scanner, a first local point cloud in a first scan and a second local point cloud in a second scan;
processing both the captured local point clouds to derive associated point pair features;
supplying the point pair features of both captured local point clouds to a PPF autoencoder to provide PPF feature vectors, and supplying the captured local point clouds to a PC autoencoder to provide PC feature vectors;
subtracting the corresponding PPF feature vectors provided by the PPF autoencoder from the PC feature vectors provided by the PC autoencoder to calculate respective latent difference vectors for the captured point clouds; and
concatenating the calculated latent difference vectors to provide a concatenated latent difference vector that is applied to a pose prediction network to calculate a relative pose prediction between the first scan and the second scan.
In a possible implementation of the method according to the second aspect of the invention, a pool of relative pose predictions is generated for a plurality of point cloud pairs, each comprising a first local point cloud and a second local point cloud.
In a further possible implementation of the method according to the second aspect of the invention, the pool of generated relative pose predictions is processed to perform pose verification.
In a further possible implementation of the method according to the second aspect of the present invention, the PPF autoencoder and the PC autoencoder are trained on the basis of a calculated loss function.
In a possible implementation of the method according to the second aspect of the invention, the loss function comprises a reconstruction loss function, a pose prediction loss function and a feature consistency loss function.
In a further possible implementation of the method according to the second aspect of the present invention, the PPF feature vectors provided by the PPF autoencoder comprise rotation-invariant features, and the PC feature vectors provided by the PC autoencoder comprise non-rotation-invariant features.
In the following, possible embodiments of the different aspects of the invention are described in more detail with reference to the drawings.
Fig. 1 shows a block diagram for illustrating a possible exemplary embodiment of an apparatus for performing a data-driven pairwise registration of a three-dimensional point cloud according to a first aspect of the present invention;
FIG. 2 shows a flow diagram illustrating a possible exemplary embodiment of a data-driven computer-implemented method for pairwise registration of three-dimensional point clouds according to further aspects of the invention;
fig. 3 shows a schematic diagram for illustrating a possible exemplary implementation of an apparatus according to the first aspect of the invention;
fig. 4 shows a further schematic diagram for illustrating a further exemplary implementation of the apparatus according to the first aspect of the present invention.
As can be seen in the block diagram of fig. 1, the apparatus 1 for performing data-driven pairwise registration of three-dimensional 3D point clouds PC comprises, in the exemplary embodiment shown, at least one scanner 2 adapted to capture a first local point cloud PC1 in a first scan and a second local point cloud PC2 in a second scan. In the illustrated exemplary embodiment of fig. 1, the apparatus 1 comprises one scanner 2 providing both point clouds PC1, PC2. In an alternative embodiment, two separate scanners may be used, with the first scanner generating the first local point cloud PC1 and the second scanner generating the second local point cloud PC2.
The apparatus 1 shown in fig. 1 comprises a PPF derivation unit 3 adapted to process both captured local point clouds PC1, PC2 to derive the associated point pair features PPF1, PPF2.
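For illustration, a minimal sketch of the standard four-dimensional point pair feature, PPF = (||d||, ∠(n_1, d), ∠(n_2, d), ∠(n_1, n_2)) with d = p_2 − p_1, as such a derivation unit might compute it, is given below (NumPy; pairing every point with the block center and using the averaged normal there are simplifying assumptions, not the patent's exact pairing strategy):

```python
import numpy as np

def _angle(a, b):
    """Angle between two vectors, numerically clamped."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return np.arccos(np.clip(np.dot(a, b) / denom, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """PPF = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)), d = p2 - p1."""
    d = p2 - p1
    return np.array([np.linalg.norm(d),
                     _angle(n1, d), _angle(n2, d), _angle(n1, n2)])

def block_ppfs(points, normals):
    """Pair every point of a local block with the block center (assumed scheme)."""
    c = points.mean(axis=0)
    nc = normals.sum(axis=0)
    nc = nc / (np.linalg.norm(nc) + 1e-12)   # averaged normal at the center
    return np.stack([point_pair_feature(c, nc, p, n)
                     for p, n in zip(points, normals)])
```

Since every entry depends only on distances and relative angles, applying one rotation to both points and normals leaves the output unchanged, which is the rotation invariance exploited by the PPF branch below.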
The apparatus 1 further comprises a PPF autoencoder 4 adapted to process the derived point pair features PPF1, PPF2 output by the PPF derivation unit 3 to extract corresponding PPF feature vectors V_PPF1, V_PPF2, as shown in the block diagram of fig. 1.
The apparatus 1 further comprises a PC autoencoder 5 adapted to process the captured local point clouds PC1, PC2 generated by the scanner 2 to extract corresponding PC feature vectors V_PC1, V_PC2.
The apparatus 1 further comprises a subtractor 6 adapted to subtract the corresponding PPF feature vectors V_PPF1, V_PPF2 from the PC feature vectors V_PC1, V_PC2 to compute latent difference vectors LDV1, LDV2 for both captured point clouds PC1, PC2.
The apparatus 1 comprises a concatenation unit 7 for concatenating the received latent difference vectors LDV1, LDV2 into a single concatenated latent difference vector CLDV, as illustrated in the block diagram of fig. 1.
The apparatus 1 further comprises a pose prediction network 8 adapted to calculate a relative pose prediction T between the first scan and the second scan performed by the scanner 2 based on the received concatenated latent difference vector CLDV. In a possible embodiment, the apparatus 1 further comprises a pose selection unit adapted to process a pool of calculated relative pose predictions T in order to select a suitable pose prediction T. The pose prediction network 8 of the apparatus 1 may, in a possible embodiment, comprise a multi-layer perceptron MLP rotation network for decoding the received concatenated latent difference vector CLDV.
The apparatus 1 shown in fig. 1 comprises two autoencoders, namely the PPF autoencoder 4 and the PC autoencoder 5. The autoencoders 4, 5 may comprise neural networks adapted to copy their input to their output. An autoencoder works by compressing the received input into a latent space representation and then reconstructing the output from this latent space representation. The autoencoders each comprise an encoder and a decoder.
In a possible embodiment, the PPF autoencoder 4 comprises an encoder adapted to encode the point pair features PPF derived by the PPF derivation unit 3 to calculate latent PPF feature vectors V_PPF1, V_PPF2, which are supplied to the subtractor 6 of the apparatus 1. The PPF autoencoder 4 further comprises a decoder adapted to reconstruct the point pair features from the latent PPF feature vectors.
In addition, the PC autoencoder 5 of the apparatus 1 comprises, in a possible embodiment: an encoder adapted to encode the captured local point clouds to compute latent PC feature vectors V_PC1, V_PC2, which are supplied to the subtractor 6; and a decoder adapted to reconstruct the local point cloud PC from the latent PC feature vectors.
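A minimal sketch of such an encoder-decoder pair is given below (PyTorch; the layer sizes and the plain MLP decoder are simplifying assumptions and do not reproduce the folding-based decoder of the networks described later):

```python
import torch
import torch.nn as nn

class PointAutoencoder(nn.Module):
    """Shared per-point MLP + max-pool encoder, MLP decoder (simplified sketch)."""
    def __init__(self, in_dim=3, latent_dim=512, n_out=256):
        super().__init__()
        self.n_out, self.out_dim = n_out, in_dim
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, n_out * in_dim))

    def encode(self, x):                            # x: (B, N, in_dim)
        return self.encoder(x).max(dim=1).values    # (B, latent_dim)

    def forward(self, x):
        z = self.encode(x)
        rec = self.decoder(z).view(-1, self.n_out, self.out_dim)
        return z, rec

# One instance per modality: 3D points (PC) and 4D point pair features (PPF).
pc_ae = PointAutoencoder(in_dim=3)
ppf_ae = PointAutoencoder(in_dim=4)
```

The max-pooling over points makes the latent code independent of point ordering, a standard design choice for point cloud encoders.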
FIG. 2 illustrates a possible exemplary embodiment of a data-driven computer-implemented method for pairwise registration of three-dimensional 3D point clouds according to further aspects of the invention. In the illustrated exemplary embodiment, the data driven computer implemented method includes five main steps S1 to S5.
In a first step S1, a first local point cloud PC1 is captured in a first scan and a second local point cloud PC2 is captured in a second scan by at least one scanner, for example by a single scanner 2 as shown in the block diagram of fig. 1, or by two separate scanners.
In a further step S2, both captured local point clouds PC1, PC2 are processed to derive the associated point pair features PPF1, PPF2.
In a further step S3, the point pair features PPF1, PPF2 derived in step S2 for both captured local point clouds PC1, PC2 are supplied to the PPF autoencoder 4 to provide PPF feature vectors V_PPF1, V_PPF2, and the captured local point clouds PC1, PC2 are supplied to the PC autoencoder 5 to provide PC feature vectors V_PC1, V_PC2.
In a further step S4, the corresponding PPF feature vectors V_PPF1, V_PPF2 provided by the PPF autoencoder 4 are subtracted from the PC feature vectors V_PC1, V_PC2 provided by the PC autoencoder 5 to calculate latent difference vectors LDV1, LDV2 for the captured point clouds PC1, PC2, respectively.
In a further step S5, the two calculated latent difference vectors LDV1, LDV2 are concatenated to provide a concatenated latent difference vector CLDV, which is applied to a pose prediction network to calculate a relative pose prediction T between the first scan and the second scan.
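Putting steps S3 to S5 together, the inference path could look like the following sketch (PyTorch; it reuses the hypothetical `pc_ae`/`ppf_ae` modules sketched above, and the pose network sizes and quaternion output are assumptions for illustration):

```python
import torch
import torch.nn as nn

latent_dim = 512
pose_net = nn.Sequential(                  # pose prediction network (assumed sizes)
    nn.Linear(2 * latent_dim, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 4))                     # quaternion output

def predict_relative_pose(pc1, ppf1, pc2, ppf2):
    """pc*: (B, N, 3) point blocks; ppf*: (B, N, 4) their point pair features."""
    ldv1 = pc_ae.encode(pc1) - ppf_ae.encode(ppf1)   # latent difference, scan 1
    ldv2 = pc_ae.encode(pc2) - ppf_ae.encode(ppf2)   # latent difference, scan 2
    cldv = torch.cat([ldv1, ldv2], dim=1)            # concatenated LDV
    q = pose_net(cldv)
    return q / q.norm(dim=1, keepdim=True)           # normalized quaternion
```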
In a possible implementation, a pool of relative pose predictions T may be generated for a plurality of point cloud pairs, each comprising a first local point cloud PC1 and a second local point cloud PC2. In a possible implementation, the pool of generated relative pose predictions T may be processed to perform pose verification.
In a possible embodiment, the PPF autoencoder 4 and the PC autoencoder 5 may be trained based on a calculated loss function L. In a possible exemplary embodiment, the loss function L may comprise a reconstruction loss function L_rec, a pose prediction loss function L_pose and a feature consistency loss function L_feat.
In a possible embodiment, the PPF feature vectors V_PPF1, V_PPF2 provided by the PPF autoencoder 4 comprise rotation-invariant features, and the PC feature vectors V_PC1, V_PC2 provided by the PC autoencoder 5 comprise non-rotation-invariant features.
With the data-driven computer-implemented method for pairwise registration of three-dimensional point clouds PC, robust local feature descriptors together with the relative transformation between matching local keypoint blocks can be learned in three-dimensional scans. This reduces the computational complexity of estimating the relative transformation between matched keypoints, i.e., of the registration. Furthermore, the computer-implemented method according to the invention is faster and more accurate than the conventional RANSAC process, and it also results in learning more robust keypoints or feature descriptors compared to conventional methods.
The method according to the invention decouples the pose from the intermediate features of a block pair. The method and the apparatus 1 according to the present invention employ a dual architecture comprising the PPF autoencoder 4 and the PC autoencoder 5, wherein each autoencoder comprises an encoder and a decoder, as also shown in the block diagram of fig. 3.
In the illustrated implementation of fig. 3, the autoencoder comprises an encoder ENC and a decoder DEC. The autoencoder AE receives the point cloud PC or the point pair features PPF and compresses the input into a latent feature representation. As also shown in the block diagram of fig. 4, the apparatus may comprise two separate autoencoders AE for each point cloud, each with its own input source. The PPF folding networks (PPF-FoldNet) 4A, 4B and the PC folding networks 5A, 5B may be trained separately and are able to extract rotation-invariant and rotation-variant features, respectively. The features extracted by the PPF folding networks 4A, 4B are rotation invariant, i.e., they are identical for the same local block in different poses, whereas the features extracted by the PC folding networks 5A, 5B change with the pose, i.e., they are non-rotation-invariant. The method and the apparatus 1 therefore use the features extracted by the PPF folding network as canonical features, i.e., the features of the block in a canonical pose. By subtracting the PPF folding network features from the PC folding network features, the remainder contains mainly geometry-free pose information. This geometry-free pose information may be supplied to the pose prediction network 8, which decodes the pose information from the obtained feature differences.
With respect to data preparation, finding a canonical pose for a given local block is not easy. Local reference frames may help but are generally unreliable, because they are strongly affected by noise. Owing to this lack of canonical poses, defining the absolute pose of a local block is challenging. What matters, however, is that a local block from one partial scan can be aligned with its corresponding local block from another partial scan under the same relative transformation. Such ground-truth information is already provided in many available data sets for training. Instead of trying to find the true pose of a local block as a training supervision, the method according to the invention combines the pose features of two corresponding blocks and uses the pose prediction network 8 to recover the relative pose between them, as also shown in the block diagram of fig. 4.
Given that segment pairs or partial scan pairs can be used to train the network to predict the relative pose T, it may be beneficial to use this pair relationship as an additional signal for the PPF folding network to extract better local features. The training of the network can be done in a completely unsupervised manner. The existing pair relationships can be used to ensure that features extracted from the same block are as close as possible, regardless of noise, missing parts or clutter. During training, an additional L2 loss may be imposed on the intermediate features the PPF folding network generates for the block pairs. In this way, the quality of the learned features can be further improved.
For a given partial scan pair, a set of local correspondences may be established using the features extracted from the PPF folding network. Each corresponding pair can generate a hypothesis for the relative pose between the blocks, which also constitutes a vote for the relative pose between the two partial scans. Thus, a pool of hypotheses or relative pose predictions generated by all found correspondences is obtained. Since not all generated hypotheses are correct, in a possible embodiment the hypotheses may be fed into a RANSAC-like pipeline, i.e., each hypothesis is exhaustively verified and scored, and the best-scoring hypothesis is retained as the final prediction.
In a further possible implementation, the hypotheses may be transformed into Hough space to find peaks in the space where most hypotheses cluster together. In general, this relies on the assumption that the correctly predicted subset groups together, which is valid in most cases.
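A minimal sketch of such peak finding, assuming each hypothesis is reduced to a quaternion-plus-translation vector and binned on a coarse grid (the bin sizes and the averaging of the densest bin are illustrative assumptions, not values from the patent):

```python
import numpy as np
from collections import Counter

def hough_peak(hypotheses, rot_bin=0.1, trans_bin=0.05):
    """hypotheses: (N, 7) rows [qw, qx, qy, qz, tx, ty, tz]; find densest bin."""
    H = np.asarray(hypotheses, dtype=float).copy()
    flip = np.sign(H[:, :1]) + (H[:, :1] == 0)   # resolve the q / -q ambiguity
    H[:, :4] *= flip
    keys = [tuple(np.round(h[:4] / rot_bin).astype(int)) +
            tuple(np.round(h[4:] / trans_bin).astype(int)) for h in H]
    peak_key, _ = Counter(keys).most_common(1)[0]
    members = np.stack([h for h, k in zip(H, keys) if k == peak_key])
    pose = members.mean(axis=0)                  # average pose of densest cluster
    pose[:4] /= np.linalg.norm(pose[:4])         # re-normalize the quaternion
    return pose
```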
With the method according to the invention, better local features for establishing local correspondences can be generated. The approach is able to predict the relative pose T given only a single pair of blocks, one from each scan, unlike the RANSAC process, which requires at least three pairs to generate a minimal hypothesis.
Due to the combination of the advanced network structure and the weakly supervised training scheme, better local features can be extracted. A pipeline that recovers relative pose information given a pair of local blocks or point clouds may be incorporated into a robust 3D reconstruction pipeline.
Purely geometric local blocks typically carry two pieces of information, namely structure and motion:

(1) the 3D structure, summarized by the points themselves:

P = {p_i} ∈ R^(N×3), where p = [x, y, z]^T;

(2) the motion, which in this context corresponds to a 3D transformation or pose T_i ∈ SE(3) that globally orients and spatially positions the point set:

T = [ R t ; 0^T 1 ] (1)

where R ∈ SO(3) and t ∈ R^3. The point set P_i of a local block is generally considered to be a transformed copy

P_i = T_i ⊗ P_i^c

of its canonical version P_i^c. Typically, finding such a canonical absolute pose T_i from a single local block involves computing local reference frames, which are known to be unreliable [36]. The invention is based on the premise that a good local (block-level) pose estimate results in a good global rigid alignment of the two segments. By first decoupling the pose component from the structural information, a data-driven predictor network can be designed that regresses the pose for any block and exhibits good generalization properties.
A naive way to achieve tolerance to 3D structure is to train a network for pose prediction conditioned on a database of input blocks, leaving the invariance to the network. Unfortunately, networks trained in this manner either require a very large number of unique local blocks or simply lack generalization. To alleviate this drawback, the structural components are cancelled out by training an invariant-equivariant network pair and using intermediate latent space arithmetic. The invariant function Ψ is characterized by:

Ψ(P) = Ψ(T ⊗ P_c) = g(T) Ψ(P_c) (2)

where g(·) is a function that depends only on the pose. When g(T) = I, Ψ is referred to as T-invariant: for any input P it yields the canonical result Ψ(P) ← Ψ(P_c). When g(T) ≠ I, it can be assumed that the equivariant action of T can be approximated by an additive linear operation:

g(T) Ψ(P_c) ≈ h(T) + Ψ(P_c) (3)

where h(T) is a potentially highly non-linear function of T. Substituting equation (3) into equation (2) yields:

Ψ(P) − Ψ(P_c) ≈ h(T) (4)
that is, the difference in potential space can approximate the pose to a maximum of non-linearity h. Approximating the inverse of h by means of a four-layer MLP network
Figure BDA0003207055400000091
And by regressing the motion (rotation) term:
ρ(f)≈R|t (5)
wherein f ═ Ψ (P) - Ψ (P)c). Note that f only illustrates motion, and can therefore be generalized to any local block structure, yielding a powerful pose predictor under the above assumptions.
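As a sketch, such a four-layer MLP ρ ≈ h^(−1) could be set up as follows (PyTorch; the input and hidden dimensions are assumptions, and this plays the same role as the pose network sketched earlier):

```python
import torch.nn as nn

# rho decodes f = Psi(P) - Psi(P_c) into a rotation; dimensions are assumed.
latent_dim = 512
rho = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 4))   # quaternion output, normalized before use
```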
Note that ρ (·) can be used directly to return the absolute pose to the canonical frame. However, this is undesirable due to the above-mentioned difficulty in defining unique local reference frames. Since a given situation takes into account a pair of scenes, the relative pose can be estimated safely rather than the absolute pose, replacing the prerequisite of a good estimate of the LRF. This also helps to easily make the labels required for training. Therefore, ρ (-) can be modeled as a relative pose predictor network 8, as shown in fig. 1, 4.
If two scenes (i, j) are well registered under a rigid transformation T_ij, their corresponding local structures are also well aligned under T_ij. Therefore, the relative pose between local blocks can easily be obtained by computing the relative pose between the segments.
To achieve generalized relative pose prediction, three key components can be implemented: the invariant network Ψ(P_c), where g(T) = I; the equivariant network Ψ(P), which varies with the input; and the MLP ρ(·). The recent PPF-FoldNet autoencoder is suitable for modeling Ψ(P_c), because it is unsupervised, operates on point blocks and achieves true invariance, since the point pair features (PPF) completely marginalize the motion term. Interestingly, when the same network architecture is preserved but the PPF part is replaced with the 3D points themselves, the intermediate features depend on both the structural and the pose information. This PC folding network is used as the equivariant network Ψ(P) = g(T) Ψ(P_c). By using the PPF folding network and the PC folding network, rotation-invariant and rotation-variant features can be learned separately. As shown in fig. 3, the PPF folding network and the PC folding network share the same architecture while performing different encodings of the local blocks. The difference of the encoder outputs of the two networks, i.e., of the latent features of the PPF folding network and the latent features of the PC folding network, is taken by the subtractor 6, resulting in features that are almost exclusively specific to the pose (motion) information. These features are then fed into the generalized pose prediction network 8 to recover the rigid relative transformation. The overall architecture of the complete relative pose prediction is shown in fig. 4.
Multiple cues, both supervised and unsupervised, may be used to train the network and to guide it in finding the optimal parameters. In particular, the loss function L may comprise three parts:
L = L_rec + λ_1 L_pose + λ_2 L_feat (6)

where L_rec, L_pose and L_feat denote the reconstruction loss, the pose prediction loss and the feature consistency loss, respectively. L_rec reflects the reconstruction fidelity of the PC folding network and the PPF folding network. In order to enable the encoders of the PPF/PC folding networks to generate good features for pose regression as well as for finding robust local correspondences, the two autoencoders AE can, similar to PPF-FoldNet, be trained in an unsupervised way using the chamfer distance as a metric:

L_rec = d(P, P̂) + d(F_ppf, F̂_ppf) (7)

where

d(S, Ŝ) = max{ (1/|S|) Σ_{s ∈ S} min_{ŝ ∈ Ŝ} ||s − ŝ||_2 , (1/|Ŝ|) Σ_{ŝ ∈ Ŝ} min_{s ∈ S} ||s − ŝ||_2 } (8)

the ^ operator refers to the (estimated) reconstructed set, and F_ppf refers to the point pair features computed on the same point set.
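A sketch of the chamfer metric of equation (8) and of the combined objective of equation (6) is given below (PyTorch; the weights λ_1, λ_2 are illustrative assumptions):

```python
import torch

def chamfer(S, S_hat):
    """Symmetric chamfer distance between point sets S (N, D) and S_hat (M, D)."""
    d = torch.cdist(S, S_hat)                       # (N, M) pairwise distances
    return torch.max(d.min(dim=1).values.mean(),    # S -> S_hat direction
                     d.min(dim=0).values.mean())    # S_hat -> S direction

def total_loss(pc, pc_rec, ppf, ppf_rec, l_pose, l_feat, lam1=1.0, lam2=1.0):
    """L = L_rec + lam1 * L_pose + lam2 * L_feat, per equation (6)."""
    l_rec = chamfer(pc, pc_rec) + chamfer(ppf, ppf_rec)
    return l_rec + lam1 * l_pose + lam2 * l_feat
```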
The corresponding two local blocks are centered and normalized before being fed to the PC/PPF folding networks. This eliminates the translational part t ∈ R^3.
The aim of the pose prediction loss L_pose is to enable the pose prediction network to predict the relative rotation R_12 ∈ SO(3) between given blocks. Thus, a preferred choice for L_pose describes the difference between the predicted rotation and the ground-truth rotation:

L_pose = ||q − q*||_2 (9)

Note that the rotation is parameterized by quaternions. This is mainly due to the reduced number of regression parameters and the lightweight projection operation, vector normalization.
The translation t*, conditioned on a hypothesized correspondence (p_1, p_2) and the predicted rotation q*, can be computed as:

t* = p_1 − R* p_2 (10)

where R* is the rotation matrix corresponding to q*.
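A sketch of equation (10), together with a standard quaternion-to-matrix conversion, could look as follows (NumPy; the [w, x, y, z] quaternion convention is an assumption):

```python
import numpy as np

def quat_to_mat(q):
    """Rotation matrix for a quaternion q = [w, x, y, z] (normalized first)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def translation_from_match(p1, p2, q_star):
    """Equation (10): t* = p1 - R* p2 for one hypothesized correspondence."""
    return p1 - quat_to_mat(q_star) @ p2
```

In this way, a single correspondence plus the regressed rotation yields a full 6DoF hypothesis.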
The pose prediction network 8 requires local block pairs for training. This pair information can additionally be utilized as a weak supervisory signal to further facilitate the training of the PPF folding network. Such guidance can improve the quality of the intermediate latent features, which are otherwise trained in a completely unsupervised manner. In particular, corresponding blocks subject to noise, missing data or clutter can generate high reconstruction losses, causing the local features to differ even for the same local block. The additional information helps to ensure that features extracted from the same block are as close as possible in the embedding space, which is very beneficial because local correspondences are established by nearest-neighbor search in the feature space. The feature consistency loss L_feat is expressed as:

L_feat = (1/|C|) Σ_{(p_1, p_2) ∈ C} ||f_{p_1} − f_{p_2}||_2 (11)

where C represents the set of corresponding local blocks, and f_p is the feature extracted at point p by the PPF folding network, f_p ∈ F_ppf.
The full 6DoF pose can be parameterized by the translation (3DoF), conditioned on a matching point pair, and by the 3DoF orientation provided by the pose prediction network. Thus, having a set of correspondences is equivalent to having a set of pre-generated transformation hypotheses. Note that this is in contrast to the standard RANSAC method, in which the pose is parameterized by m = 3 correspondences, so that establishing N correspondences gives rise to

C(N, 3) = N(N − 1)(N − 2)/6

hypotheses to be verified. The small number of hypotheses of the present approach, which is linear in the number of correspondences, makes it possible to exhaustively evaluate the whole set of hypothesis-generating matching pairs for pose verification. The estimate can be refined by recomputing the transformation using the surviving inliers. The hypothesis with the highest score is then retained as the final decision.

Claims (9)

1. An apparatus (1) for performing data-driven pairwise registration of a three-dimensional 3D point cloud PC, the apparatus comprising:
(a) at least one scanner (2), the at least one scanner (2) being adapted to capture a first local point cloud PC1 in a first scan and a second local point cloud PC2 in a second scan, wherein the first scan comprises a first local structure of a first scene and the second scan comprises a second local structure of a second scene, the first local structure of the first scene corresponding to the second local structure of the second scene and having a relative pose to the second local structure of the second scene;
(b) a PPF derivation unit (3), the PPF derivation unit (3) being adapted to process both captured local point clouds (PC1, PC2) to derive associated point pair features (PPF1, PPF2);
(c) a PPF autoencoder (4), the PPF autoencoder (4) being adapted to process the derived point pair features (PPF1, PPF2) to extract corresponding PPF feature vectors (V_PPF1, V_PPF2);
(d) a PC autoencoder (5), the PC autoencoder (5) being adapted to process the captured local point clouds (PC1, PC2) to extract corresponding PC feature vectors (V_PC1, V_PC2);
(e) a subtractor (6), the subtractor (6) being adapted to subtract the corresponding PPF feature vectors (V_PPF1, V_PPF2) from the PC feature vectors (V_PC1, V_PC2) to calculate latent difference vectors (LDV1, LDV2) for both captured point clouds (PC1, PC2), said latent difference vectors (LDV1, LDV2) being concatenated into a concatenated latent difference vector (CLDV); and
(f) a pose prediction network (8), the pose prediction network (8) being adapted to calculate a relative pose prediction T between the first scan and the second scan performed by the scanner (2) based on the concatenated latent difference vector (CLDV),
wherein the PPF feature vectors (V_PPF1, V_PPF2) provided by the PPF autoencoder (4) comprise rotation-invariant features, and
wherein the PC feature vectors (V_PC1, V_PC2) provided by the PC autoencoder (5) comprise non-rotation-invariant features.
2. The apparatus as defined in claim 1, wherein the apparatus (1) further comprises a pose selection unit adapted to process a pool of the calculated relative pose predictions T for selecting a suitable pose prediction T.
3. The apparatus as defined in claim 2, wherein the pose prediction network (8) comprises a multi-layer perceptron MLP rotation network for decoding the concatenated latent difference vector (CLDV).
4. The apparatus of any of the preceding claims 1 to 3, wherein the PPF autoencoder (4) comprises:
an encoder (4A), the encoder (4A) being adapted to encode the point pair features PPF derived by the PPF derivation unit to calculate latent PPF feature vectors (V_PPF1, V_PPF2) supplied to the subtractor; and
a decoder (4B), the decoder (4B) being adapted to reconstruct the point pair features PPF from the latent PPF feature vectors.
5. The apparatus of any of the preceding claims 1 to 4, wherein the PC autoencoder (5) comprises:
an encoder (5A), the encoder (5A) being adapted to encode the captured local point cloud (PC) to calculate latent PC feature vectors (V_PC1, V_PC2) supplied to the subtractor; and
a decoder (5B), the decoder (5B) being adapted to reconstruct the local point cloud PC from the latent PC feature vectors.
6. A data-driven computer-implemented method for pairwise registration of three-dimensional 3D point clouds PC, the method comprising the steps of:
(a) capturing (S1), by at least one scanner, a first local point cloud PC1 in a first scan and a second local point cloud PC2 in a second scan, wherein the first scan includes a first local structure of a first scene and the second scan includes a second local structure of a second scene, the first local structure of the first scene corresponding to the second local structure of the second scene and having a relative pose to the second local structure of the second scene;
(b) processing (S2) both captured local point clouds (PC1, PC2) to derive associated point pair features (PPF1, PPF2);
(c) supplying (S3) the point pair features (PPF1, PPF2) of both captured local point clouds (PC1, PC2) to a PPF autoencoder to provide PPF feature vectors (V_PPF1, V_PPF2), and supplying the captured local point clouds (PC1, PC2) to a PC autoencoder to provide PC feature vectors (V_PC1, V_PC2);
(d) subtracting (S4) the corresponding PPF feature vectors (V_PPF1, V_PPF2) provided by the PPF autoencoder from the PC feature vectors (V_PC1, V_PC2) provided by the PC autoencoder to calculate respective latent difference vectors (LDV1, LDV2) for the captured point clouds (PC1, PC2); and
(e) concatenating (S5) the calculated latent difference vectors (LDV1, LDV2) to provide a concatenated latent difference vector (CLDV) which is applied to a pose prediction network to calculate a relative pose prediction T between the first scan and the second scan,
wherein the PPF feature vectors (V_PPF1, V_PPF2) provided by the PPF autoencoder comprise rotation-invariant features, and
wherein the PC feature vectors (V_PC1, V_PC2) provided by the PC autoencoder comprise non-rotation-invariant features.
7. The method of claim 6, wherein a pool of relative pose predictions T is generated for a plurality of point cloud pairs PC, each comprising a first local point cloud PC1 and a second local point cloud PC2.
8. The method of claim 7, wherein the pool of generated relative pose predictions T is processed to perform pose verification.
9. The method according to any of the preceding claims 6 to 8, wherein the PPF autoencoder and the PC autoencoder are trained based on a calculated loss function L.
CN202080013849.6A 2019-02-11 2020-01-29 Apparatus and method for performing data-driven pairwise registration of three-dimensional point clouds Pending CN113474818A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP19156435.0 2019-02-11
PCT/EP2020/052128 WO2020164911A1 (en) 2019-02-11 2020-01-29 An apparatus and a method for performing a data driven pairwise registration of three-dimensional point clouds

Publications (1)

Publication Number Publication Date
CN113474818A (zh) 2021-10-01

Family

ID=77868575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080013849.6A Pending CN113474818A (en) 2019-02-11 2020-01-29 Apparatus and method for performing data-driven pairwise registration of three-dimensional point clouds

Country Status (1)

Country Link
CN (1) CN113474818A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination