AU2021105153A4 - An unsupervised learning of point cloud denoising - Google Patents
- Publication number
- AU2021105153A4
- Authority
- AU
- Australia
- Prior art keywords
- point
- denoising
- point cloud
- noisy
- points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
Abstract
Depth cameras and laser scanners are widely used to digitize real-world scenes, which are initially represented as point clouds. However, point cloud data is often corrupted by noise and outliers during acquisition, which makes point cloud denoising a fundamental and challenging problem. This patent proposes a denoising algorithm that requires neither clean training data nor pairs of noisy scans; it is implemented by extending the idea of unsupervised image denoising to unstructured 3D point clouds. Unsupervised image denoisers operate under the assumption that a noisy pixel observation is a random realization of a distribution around a clean pixel value, so that appropriate learning on this distribution eventually converges to the correct value. Unfortunately, this assumption does not hold for unstructured points: 3D point clouds are subject to total noise, i.e., deviations in all coordinates, with no reliable pixel grid. An observation can therefore be the realization of an entire manifold of clean 3D points, which makes a naive extension of unsupervised image denoisers to 3D point clouds impractical. To overcome this difficulty, we introduce prior information that guides convergence to the single nearest mode among the many possible modes on the manifold. Our results demonstrate unsupervised denoising performance similar to that of supervised learning with clean data, given enough training examples.
Fig.1: [figure] Panels a)-c) contrasting pixel observations of a distribution p(z|x) with a unique mode, point observations of p(z|S) with non-unique modes, and the prior q(z|y).
Fig.2: [figure] Encoder-decoder levels with points, receptive fields, and feature widths (3/6, 64, 128).
Fig.3: [figure] Panels a)-d) illustrating the effect of the color prior.
Description
1. Background and Purpose

A point cloud is a geometric data type consisting of an unordered collection of 3D points that sample the surfaces of physical objects or entire scenes. Point clouds are becoming increasingly popular due to the availability of instruments such as LiDAR and the interest in exploiting the richness of this geometric representation in challenging applications such as autonomous driving. However, the acquisition process is imperfect, and a significant amount of noise typically affects raw point clouds. Point cloud denoising methods are therefore of paramount importance for improving the performance of downstream tasks such as shape matching, surface reconstruction, object segmentation, and more. Point clouds obtained with 3D scanners or by image-based reconstruction techniques are often corrupted by significant amounts of noise and outliers. Traditional model-based techniques have typically focused on fitting a surface to the noisy data; such techniques work well in low-noise settings but usually suffer from oversmoothing, especially in the presence of heavy noise or geometries with sharp edges. Although machine-learning-based denoising methods perform well, they require a large amount of high-quality point cloud data during training, and clean point clouds are difficult to obtain in practice. Consequently, it is desirable to denoise acquired noisy 3D point clouds using only the noisy data itself. This patent proposes a denoising algorithm that requires neither clean training data nor pairs of noisy scans; it is implemented by extending the idea of unsupervised image denoising to unstructured 3D point clouds.
Unsupervised image denoisers operate under the assumption that a noisy pixel observation is a random realization of a distribution around a clean pixel value, so that appropriate learning on this distribution eventually converges to the correct value. Unfortunately, this assumption does not hold for unstructured points: 3D point clouds are subject to total noise, i.e., deviations in all coordinates, with no reliable pixel grid. An observation can therefore be the realization of an entire manifold of clean 3D points, which makes a naive extension of unsupervised image denoisers to 3D point clouds impractical. To overcome this difficulty, we introduce prior information that guides convergence to the single nearest mode among the many possible modes on the manifold. Our results demonstrate unsupervised denoising performance similar to that of supervised learning with clean data, given enough training examples.
2. Denoising Theory
2.1. Unsupervised and unpaired approach

An observation $y_i$ at pixel $i$ in a noise-corrupted image is a sample of a noise distribution $y_i \sim p(z|x_i)$ around the true value $x_i$. This is shown in Fig.1.a: the black curve is the true signal, and pixels (dotted vertical lines) sample it at fixed positions $i$ (black circles) according to a sampling distribution $p(z|x_i)$ (yellow curve) around the true value (pink circle). In classic supervised denoising, we know both a clean $x_i$ and a noisy value $y \sim p(z|x_i)$ for pixel $i$ and minimize

$\arg\min_\theta \, \mathbb{E}_{y \sim p(z|x_i)} \, \ell(f_\theta(y), x_i)$  (1)
where $f$ is a tunable function with parameters $\theta$, and $\ell$ is a loss such as $L_1$. Here and in the following, we omit the fact that the input to $f$ comprises many $y$ that form an entire image, or at least a patch. We can also learn a mapping from one noisy realization to another: it has been shown that learning

$\arg\min_\theta \, \mathbb{E}_{y_1 \sim p(z|x_i)} \mathbb{E}_{y_2 \sim p(z|x_i)} \, \ell(f_\theta(y_1), y_2)$  (2)
converges to the same value as if it had been learned using the mean of the distribution $p(z|x_i)$ when $\ell$ is $L_2$. In practice, however, it is difficult to obtain the same point cloud corrupted by different noise realizations as a training set. To overcome this, we can learn a mapping from each noisy observation in a single data set (such as one image) to itself:

$\arg\min_\theta \, \mathbb{E}_{y \sim p(z|x_i)} \, \ell(f_\theta(y), y)$  (3)
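The step from Eq. (1) to Eqs. (2)-(3) rests on the Noise2Noise observation that, under an $L_2$ loss, regressing one noisy realization onto another converges to the mean of the noise distribution, i.e. the clean value for zero-mean noise. A minimal numerical sketch (all variable names and constants here are illustrative, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
x_clean = 1.5                                  # hidden clean value x_i
y1 = x_clean + rng.normal(0.0, 0.1, 100_000)   # noisy realizations y1 ~ p(z|x_i)
y2 = x_clean + rng.normal(0.0, 0.1, 100_000)   # independent realizations y2

# The minimizer of E[(c - y2)^2] over a constant predictor c is the mean
# E[y2], which equals x_clean for zero-mean noise -- so training against
# noisy targets (Eq. 2) converges to the same answer as training against
# clean targets (Eq. 1).
c_star = y2.mean()
print(abs(c_star - x_clean) < 0.01)  # True: the mean recovers the clean value
```

With 100,000 samples the standard error of the mean is about 0.0003, so the estimate lands well within 0.01 of the clean value.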
2.2. Unique Modes

Unlike observing pixels at index $i$ in an image (dotted lines in Fig.1.a), which tells us that $y$ is a realization of a hidden value $x_i$ to be inferred, in the unpaired setting it is unknown which hidden surface point is realized when a point is observed. A noisy point observation $y$ can be a realization of $p(z|x_1)$ in the same way as it could be a realization of $p(z|x_2)$. Consequently, the distribution $p(z|S)$ has a manifold of modes (pink line in Fig.1.b). Learning a mapping from a noisy realization to itself will try to converge to this multimodal distribution, since, for the same neighborhood, the network will try to regress different points from this distribution at the same time. Therefore, we regularize the problem by imposing a prior $q(z|y)$ that captures the probability that a given observation $y$ is a realization of the clean point $z$:

$q(z|y) \propto p(z|S) \, k(z - y)$  (4)

$k(d) = \exp\left(-\frac{\|Wd\|^2}{2\sigma^2}\right)$  (5)

where $\sigma$ is the bandwidth of $k$ and $W = \mathrm{diag}(w)$ is a diagonal weight matrix trading off spatial and appearance locality. We use the value $w = 1/(ar)$, with $r$ being 5% of the diameter of the model and $a$ a scaling factor. This results in convergence to the nearest (in space and appearance) mode when optimizing

$\arg\min_\theta \, \mathbb{E}_{y_1 \sim p(z|S)} \mathbb{E}_{y_2 \sim q(z|y_1)} \, \ell(f_\theta(y_1), y_2)$  (6)
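Eqs. (4)-(5) weight candidate clean points by a Gaussian kernel on the weighted offset, so nearby candidates dominate the prior. A small sketch of the kernel with illustrative bandwidth and weights (the function name `kernel` and the constants are ours, not from the patent):

```python
import numpy as np

def kernel(d, sigma, w):
    """k(d) = exp(-||W d||^2 / (2 sigma^2)) with W = diag(w), as in Eq. (5)."""
    wd = w * d                                    # W d for a diagonal W
    return np.exp(-np.dot(wd, wd) / (2.0 * sigma ** 2))

sigma = 0.05                 # bandwidth of k (illustrative)
w = np.full(3, 1.0)          # spatial weights; appearance channels would get their own
near = kernel(np.array([0.01, 0.0, 0.0]), sigma, w)
far  = kernel(np.array([0.20, 0.0, 0.0]), sigma, w)

# Nearby candidates receive far larger weight under q(z|y), which is what
# pulls training toward the single nearest mode on the manifold.
print(near > far)  # True
```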
The effect of this prior is seen in Fig.1.c: of the many modes, only the unique closest one remains.

2.3. Converging to the mode

We train the network to converge to the mode of the prior distribution $q(z|y)$ by using an approximation of the $L_0$ loss function,

$\ell(f_\theta(y), q) = \left(|f_\theta(y) - q| + \epsilon\right)^\gamma$  (7)

where $\epsilon = 10^{-8}$ and $\gamma$ is annealed from 2 to 0 over the course of training.
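The annealed $L_0$ approximation of Eq. (7) can be sketched as follows. At $\gamma = 2$ it behaves like a squared loss; as $\gamma \to 0$ it flattens so that all nonzero residuals cost roughly the same, approximating an $L_0$ count. The linear schedule in `gamma_at` is our illustrative choice, not specified in the patent:

```python
def annealed_l0(residual, gamma, eps=1e-8):
    """Eq. (7): (|f(y) - q| + eps)^gamma with eps = 1e-8."""
    return (abs(residual) + eps) ** gamma

def gamma_at(step, total_steps):
    """Anneal gamma linearly from 2 to 0 over training (illustrative schedule)."""
    return 2.0 * (1.0 - step / total_steps)

# Early in training (gamma = 2) a residual of 0.5 costs ~0.25; late in
# training (gamma near 0) the same residual costs close to 1, like any
# other nonzero residual.
early = annealed_l0(0.5, gamma_at(0, 100))
late = annealed_l0(0.5, gamma_at(99, 100))
print(early < late)  # True
```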
3. Unsupervised 3D Point Cloud Denoiser training
Step 1: Random uniform batch. All operations are defined on batches the size of the point cloud, with coordinates normalized to the range [0, 1].
Step 2: Sample prior batch. To minimize Eq. 6 we need to draw samples according to the prior $q$, which is implemented using rejection sampling: we pick a random point $q$ from the noisy point cloud $Y$ within radius $r$ of $y$, and train on it only if $k(q - y) > \xi$ for a uniform random $\xi \in (0, 1)$. In practice, a single sample is used to estimate this inner expected value over $q(z|y)$.
Step 3: Minimize batch. Denoising a point cloud means learning Eq. 6. We implement $f$ using an unstructured encoder-decoder based on Monte Carlo convolution (Fig.2). This architecture consumes the point cloud, transforms spatial neighborhoods into latent codes defined on a coarser point set (encoder), and up-samples these back to the original point resolution (decoder). The effective receptive field, i.e., the neighborhood from which the regressed points are considered, is 30% of the diameter of the model. In particular, we perform two levels of encoding, the first with a receptive field of 5%, the second at %. The Poisson disk radii for pooling at Level 1 and Level 2 are half the size of the corresponding receptive fields. We use the ADAM optimizer to minimize Eq. 7, with an initial learning rate of 0.005 that is decreased during training.
Step 4: Iterative optimization. Our results improve if the output of the network is fed back as input, but we noticed filament structures forming on the surface after several denoising iterations.
Since the points are only constrained to lie on the surface, there are multiple displacement vectors that bring them equally close to it. Over multiple iterations, the points drift tangentially along the surface and form clusters. Therefore, we introduce a regularization term:

$L_r = \arg\min_\theta \, \mathbb{E}_{y \sim p(z|S)} \, \max_{y' \in n(Y, y)} \|f_\theta(y) - f_\theta(y')\|_2^2$  (8)

where $n(Y, y)$ is the set of points from the noisy point cloud within a patch centered at $y$. Eq. 6 is denoted $L_s$:

$L_s = \arg\min_\theta \, \mathbb{E}_{y_1 \sim p(z|S)} \mathbb{E}_{y_2 \sim q(z|y_1)} \, \ell(f_\theta(y_1), y_2)$  (9)
The full loss function is a weighted combination of the two loss terms:

$L_a = \alpha L_s + (1 - \alpha) L_r$  (10)

Since the second term can be seen as a regularizer, we set $\alpha = 0.99$ in our experiments.
Step 5: Color prior information. 3D point clouds that come with RGB color annotations offer a further opportunity to overcome the limitations of unsupervised training. Without color, in some cases the spatial prior cannot resolve sharp edges and rounds them off. This is not because the network $f$ is unable to resolve them, but because unsupervised training never 'sees' the sharp details. Fig.3 shows how colors resolve this: without RGB, the corners are rounded (Fig.3.a). When color is added, here red and blue (Fig.3.b), the points become separated (Fig.3.c): sampling the prior $q(z|y)$ at a red point will never pick a blue one, and vice versa. Consequently, the learning behaves as if it had seen the sharp detail. The effect of this color prior is seen in Fig.3.d.
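Eqs. (8)-(10) combine the denoising term with the anti-clustering regularizer. A per-point sketch under illustrative names (`outs_patch` stands in for the network outputs over $n(Y, y)$; the arrays are placeholders, not real network outputs):

```python
import numpy as np

def regularizer_term(out_y, outs_patch):
    """Inner term of Eq. (8): max over the patch of ||f(y) - f(y')||_2^2."""
    return max(float(np.sum((out_y - o) ** 2)) for o in outs_patch)

def total_loss(l_s, l_r, alpha=0.99):
    """Eq. (10): L_a = alpha * L_s + (1 - alpha) * L_r."""
    return alpha * l_s + (1.0 - alpha) * l_r

# Placeholder network outputs for one point and two patch neighbours.
out_y = np.array([0.0, 0.0, 0.0])
outs_patch = [np.array([0.1, 0.0, 0.0]), np.array([0.0, 0.3, 0.0])]

l_r = regularizer_term(out_y, outs_patch)  # the farthest drift dominates
l_a = total_loss(l_s=0.5, l_r=l_r)
print(round(l_r, 6))  # 0.09
```

With $\alpha = 0.99$ the regularizer contributes only 1% of the total, matching its role as a mild penalty on tangential drift rather than a competing objective.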
4. Brief Description of the Drawings
Fig.1 illustrates the substantial differences between denoising image data and denoising point cloud data.
Fig.2 is an overview of the algorithm structure.
Fig.3 illustrates the influence of the color prior.
Fig.4 shows denoising results for noise generated by image-based 3D reconstruction techniques.
Fig.5 shows denoising results for real noise.
5. Algorithm experiment

To verify that the proposed algorithm can remove noise, we test our model on two different types of noisy data.
The first experiment involves real point clouds generated by image-based 3D reconstruction techniques. Point clouds obtained by these methods are in general heavily affected by noise and a large number of outliers due to image imperfections. The point cloud on the left of Fig.4 is the noisy reconstruction produced by such an algorithm and is the input to the denoiser; the center image is the result after PointCleanNet denoising, and the right of Fig.4 is the result of the proposed algorithm. Second, we used the Paris-rue-Madame data set, which comprises 20 million points. Fig.5 shows the performance of our algorithm: the left is noisy data from the data set, and the right is the result of the proposed algorithm.
Claims (3)
1. Unsupervised learning of point cloud denoising

Description
1.1 Denoising Theory

Step 1: Unsupervised and unpaired approach. In classic supervised denoising, we know both a clean $x_i$ and a noisy value $y \sim p(z|x_i)$ for pixel $i$ and minimize

$\arg\min_\theta \, \mathbb{E}_{y \sim p(z|x_i)} \, \ell(f_\theta(y), x_i)$  (1)
where $f$ is a tunable function with parameters $\theta$, and $\ell$ is a loss such as $L_1$. Here and in the following, we omit the fact that the input to $f$ comprises many $y$ that form an entire image, or at least a patch. We can also learn a mapping from one noisy realization to another: it has been shown that learning

$\arg\min_\theta \, \mathbb{E}_{y_1 \sim p(z|x_i)} \mathbb{E}_{y_2 \sim p(z|x_i)} \, \ell(f_\theta(y_1), y_2)$  (2)
converges to the same value as if it had been learned using the mean of the distribution $p(z|x_i)$ when $\ell$ is $L_2$. In practice, however, it is difficult to obtain the same point cloud corrupted by different noise realizations as a training set. To overcome this, we can learn a mapping from each noisy observation in a single data set (such as one image) to itself:

$\arg\min_\theta \, \mathbb{E}_{y \sim p(z|x_i)} \, \ell(f_\theta(y), y)$  (3)
Step 2: Unique Modes. Unlike observing pixels at index $i$ in an image, which tells us that $y$ is a realization of a hidden value $x_i$ to be inferred, in the unpaired setting it is unknown which hidden surface point is realized when a point is observed. A noisy point observation $y$ can be a realization of $p(z|x_1)$ in the same way as it could be a realization of $p(z|x_2)$. Consequently, the distribution $p(z|S)$ has a manifold of modes. Learning a mapping from a noisy realization to itself will try to converge to this multimodal distribution, since, for the same neighborhood, the network will try to regress different points from this distribution at the same time. Therefore, we regularize the problem by imposing a prior $q(z|y)$ that captures the probability that a given observation $y$ is a realization of the clean point $z$:

$q(z|y) \propto p(z|S) \, k(z - y)$  (4)

$k(d) = \exp\left(-\frac{\|Wd\|^2}{2\sigma^2}\right)$  (5)

where $\sigma$ is the bandwidth of $k$ and $W = \mathrm{diag}(w)$ is a diagonal weight matrix trading off spatial and appearance locality. We use the value $w = 1/(ar)$, with $r$ being 5% of the diameter of the model and $a$ a scaling factor. This results in convergence to the nearest (in space and appearance) mode when optimizing

$\arg\min_\theta \, \mathbb{E}_{y_1 \sim p(z|S)} \mathbb{E}_{y_2 \sim q(z|y_1)} \, \ell(f_\theta(y_1), y_2)$  (6)
Step 3: Converging to the mode. We train the network to converge to the mode of the prior distribution $q(z|y)$ by using an approximation of the $L_0$ loss function,

$\ell(f_\theta(y), q) = \left(|f_\theta(y) - q| + \epsilon\right)^\gamma$  (7)

where $\epsilon = 10^{-8}$ and $\gamma$ is annealed from 2 to 0 over the course of training.
1.2 Unsupervised 3D Point Cloud Denoiser training

Step 1: Random uniform batch. All operations are defined on batches the size of the point cloud, with coordinates normalized to the range [0, 1].
Step 2: Sample prior batch. To minimize Eq. 6 we need to draw samples according to the prior $q$, which is implemented using rejection sampling: we pick a random point $q$ from the noisy point cloud $Y$ within radius $r$ of $y$, and train on it only if $k(q - y) > \xi$ for a uniform random $\xi \in (0, 1)$. In practice, a single sample is used to estimate this inner expected value over $q(z|y)$.
Step 3: Minimize batch. Denoising a point cloud means learning Eq. 6. We implement $f$ using an unstructured encoder-decoder based on Monte Carlo convolution. This architecture consumes the point cloud, transforms spatial neighborhoods into latent codes defined on a coarser point set (encoder), and up-samples these back to the original point resolution (decoder). The effective receptive field, i.e., the neighborhood from which the regressed points are considered, is 30% of the diameter of the model. In particular, we perform two levels of encoding, the first with a receptive field of 5%, the second at %. The Poisson disk radii for pooling at Level 1 and Level 2 are half the size of the corresponding receptive fields. We use the ADAM optimizer to minimize Eq. 7, with an initial learning rate of 0.005 that is decreased during training.
Step 4: Iterative optimization.
Our results improve if the output of the network is fed back as input, but we noticed filament structures forming on the surface after several denoising iterations. Since the points are only constrained to lie on the surface, there are multiple displacement vectors that bring them equally close to it. Over multiple iterations, the points drift tangentially along the surface and form clusters. Therefore, we introduce a regularization term:

$L_r = \arg\min_\theta \, \mathbb{E}_{y \sim p(z|S)} \, \max_{y' \in n(Y, y)} \|f_\theta(y) - f_\theta(y')\|_2^2$  (8)

where $n(Y, y)$ is the set of points from the noisy point cloud within a patch centered at $y$. Eq. 6 is denoted $L_s$:
$L_s = \arg\min_\theta \, \mathbb{E}_{y_1 \sim p(z|S)} \mathbb{E}_{y_2 \sim q(z|y_1)} \, \ell(f_\theta(y_1), y_2)$  (9)
The full loss function is a weighted combination of the two loss terms:

$L_a = \alpha L_s + (1 - \alpha) L_r$  (10)

Since the second term can be seen as a regularizer, we set $\alpha = 0.99$ in our experiments.
Step 5: Color prior information. 3D point clouds that come with RGB color annotations offer a further opportunity to overcome the limitations of unsupervised training. Without color, in some cases the spatial prior cannot resolve sharp edges and rounds them off. This is not because the network $f$ is unable to resolve them, but because unsupervised training never 'sees' the sharp details. Fig.3 shows how colors resolve this: without RGB, the corners are rounded (Fig.3.a). When color is added, here red and blue (Fig.3.b), the points become separated (Fig.3.c): sampling the prior $q(z|y)$ at a red point will never pick a blue one, and vice versa. Consequently, the learning behaves as if it had seen the sharp detail.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021105153A AU2021105153A4 (en) | 2021-08-09 | 2021-08-09 | An unsupervised learning of point cloud denoising |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021105153A AU2021105153A4 (en) | 2021-08-09 | 2021-08-09 | An unsupervised learning of point cloud denoising |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021105153A4 true AU2021105153A4 (en) | 2021-11-11 |
Family
ID=78480126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2021105153A Ceased AU2021105153A4 (en) | 2021-08-09 | 2021-08-09 | An unsupervised learning of point cloud denoising |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021105153A4 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117876692A (en) * | 2024-03-11 | 2024-04-12 | 中国石油大学(华东) | Feature weighted connection guided single-image remote sensing image denoising method |
CN117876692B (en) * | 2024-03-11 | 2024-05-17 | 中国石油大学(华东) | Feature weighted connection guided single-image remote sensing image denoising method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |