CN112912890A - Method and system for generating synthetic point cloud data using generative models - Google Patents

Method and system for generating synthetic point cloud data using generative models

Info

Publication number
CN112912890A
Authority
CN
China
Prior art keywords
data
projected
point cloud
point clouds
generative model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980058780.6A
Other languages
Chinese (zh)
Inventor
Lucas Page-Caccia
Joelle Pineau
Elmira Amirloo Abolfathi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
The Royal Institution for the Advancement of Learning / McGill University
Huawei Technologies Co Ltd
Original Assignee
The Royal Institution for the Advancement of Learning / McGill University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Royal Institution for the Advancement of Learning / McGill University and Huawei Technologies Co Ltd
Publication of CN112912890A publication Critical patent/CN112912890A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/521Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Optics & Photonics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

Methods and systems for generating synthetic point cloud data are described. A projected 2D data grid is generated by projecting a 3D point cloud into a 2D grid having rotational equivariance. A generative model is learned using the projected 2D data grid, where the generative model is implemented using, for example, flexible convolution and transposed flexible convolution operations in a generative adversarial network. The learned generative model is used to generate a synthetic point cloud.

Description

Method and system for generating synthetic point cloud data using generative models
Technical Field
The present application relates to a system and method of learning a generative model for generating synthetic point cloud data, and a system and method of generating synthetic point cloud data from data sampled from a distribution using the learned generative model.
Background
In many autonomous tasks, understanding the environment plays a key role. The success of an autonomous device (e.g., a robot or vehicle) in performing an autonomous task depends on robust sensory data input and algorithms for processing the sensory data. In many cases, the sensory data is noisy or some sensory data is missing. To be able to handle this situation, the autonomous device needs to be able to "understand" such sensory data. Humans have this ability. For example, if some pixels in some video frames are lost, a human can easily "predict/imagine" the missing data (e.g., mentally generate those missing samples based on the temporal and spatial information they have about those pixels) and still have the same perception of the video.
Furthermore, many autonomous tasks require testing or even training in a simulator environment, as training and testing autonomous devices in a real environment may be difficult, e.g., may be costly and/or less safe (e.g., in the case of autonomous driving). However, many conventional simulators do not provide realistic sensory data for autonomous devices. As a result, autonomous devices trained and tested in a simulator may not operate well in a real environment. Therefore, it is desirable to build simulators that can produce more realistic data.
In order to solve the above problems, an efficient method for generating synthetic data is required. In recent years, researchers have successfully used generative models to generate image and video data, as described, for example, by Goodfellow et al. (Advances in Neural Information Processing Systems, pp. 2672-2680, 2014) and Zhu et al. (arXiv preprint, 2017). However, in most cases, autonomous devices require a three-dimensional (3D) understanding of the real environment in order to operate well, which relies on accurate 3D sensory data (e.g., in the form of a point cloud). Generating point clouds is currently a challenging task.
Generative models are a class of machine learning methods that aim to generate samples from the same distribution as the training data. Generative models are of different types, such as the variational auto-encoder (VAE) and the generative adversarial network (GAN). Many generative models that use convolutional neural networks (CNNs) in their architecture require an inverse convolution operation (e.g., in the decoder of a VAE or in the generator of a GAN). For the conventional convolution, this inverse operation is the transposed convolution. Many deep learning based methods capture local features in an irreversible manner.
Accordingly, there is a need for a system and method for generating synthetic data that addresses at least some of the problems discussed above.
Disclosure of Invention
Methods and systems for generating synthetic data in the form of a point cloud are provided. The disclosed methods and systems learn a generative model from point clouds in a training phase, and generate synthetic point clouds from the learned generative model in an inference phase. In some examples, the disclosed methods and systems may utilize a deep neural network (DNN) architecture. The present application also describes methods and systems for ordering and projecting data points from a point cloud into a mesh-based data structure, which helps improve the efficiency of learning the generative model during the training phase.
According to an aspect of the present application, there is provided a method comprising: obtaining a first batch of point clouds representing a 3D environment; generating a projected two-dimensional (2D) data grid for each point cloud of the first batch by projecting each point cloud of the first batch into a projected 2D data grid having rotational equivariance; and, in a training phase, learning a generative model for generating one or more batches of synthetic point clouds, the generative model being learned by providing the generative model with the projected 2D data grid of each point cloud of the first batch, wherein the generative model comprises a flexible convolution operation and a transposed flexible convolution operation.
According to the preceding aspect, the method further comprises: generating one or more batches of synthetic point clouds from data sampled from a distribution using the learned generative model.
According to any preceding aspect, the generating the projected 2D data grid further comprises: populating the projected 2D data grid by wrapping elements of the grid from one edge to an opposite edge of the projected 2D data grid; and, in the training phase, learning the generative model using the populated projected 2D data grid.
According to any preceding aspect, each point cloud of the first batch is projected onto the projected 2D data grid using the following formula:
[Projection formula reproduced as an image in the original publication]
where x, y, and z represent the 3D coordinates of the data points in the point cloud.
According to the aforementioned aspect, the projected 2D data grid is populated by adding an added leftmost column containing elements from the original rightmost column of the projected 2D data grid and adding an added rightmost column containing elements from the original leftmost column of the projected 2D data grid, wherein the number of populated columns is determined by the size of the convolution kernel.
According to any preceding aspect, each row in the projected 2D data grid corresponds to a respective closed loop in the point cloud.
According to any preceding aspect, the method further comprises: using the one or more batches of synthetic point clouds to supplement any missing data points from the point clouds in the first batch.
According to the aforementioned aspect, the method further comprises: combining the generative model with a sequence model, wherein the sequence model generates synthetic temporal data for predicting any missing data points from the point clouds in the first batch.
According to the foregoing aspect, the learned generative model is a Recurrent Neural Network (RNN).
According to any preceding aspect, the method further comprises: generating one or more batches of synthetic point clouds from data sampled from a distribution, according to actions of the autonomous device, using the learned generative model.
According to an aspect of the present application, there is provided a method comprising: obtaining a first batch of point clouds representing a 3D environment; generating a projected 2D data grid for each point cloud of the first batch by projecting each point cloud of the first batch into a projected 2D data grid having rotational equivariance; and, in a training phase, learning a generative model that generates one or more batches of synthetic point clouds, the generative model being learned by providing the first batch of point clouds to the generative model, wherein the projected 2D data grid is used to identify nearest neighbors for performing a flexible convolution operation and a transposed flexible convolution operation during learning.
According to the foregoing aspect, the method further includes generating one or more batches of synthetic point clouds from data sampled from a distribution using the learned generative model.
According to the preceding aspect, the generating the projected 2D data grid further comprises: populating the projected 2D data grid by wrapping elements of the grid from one edge to an opposite edge of the projected 2D data grid; and, in the training phase, learning the generative model using the populated projected 2D data grid.
According to any preceding aspect, the method further comprises: combining the generative model with a sequence model, wherein the sequence model captures temporal data from the first batch of point clouds, which is used to predict any missing data points.
According to any of the preceding aspects, the learned generative model is a recurrent neural network, RNN.
According to another aspect of the present application, there is provided a processing unit comprising: a processor; and a memory storing computer-executable instructions that, when executed by the processor, cause the processing unit to: obtain a first batch of point clouds representing a 3D environment; generate a projected 2D data grid for each point cloud of the first batch by projecting each point cloud of the first batch into a projected 2D data grid having rotational equivariance; and, in a training phase, learn a generative model for generating one or more batches of synthetic point clouds, the generative model being learned by providing the generative model with the projected 2D data grid of each point cloud of the first batch, wherein the generative model comprises a flexible convolution operation and a transposed flexible convolution operation.
Drawings
Reference will now be made by way of example to the accompanying drawings which illustrate exemplary embodiments of the present application, and in which:
FIG. 1 is a block diagram illustrating some components of an example autonomous vehicle;
FIG. 2 illustrates a block diagram of some components of a processing system for implementing a method for learning a generative model for generating a synthetic point cloud;
FIG. 3 is a diagram conceptually illustrating how 2D convolution is applied to grid-based data;
FIG. 4 is a schematic diagram of 2D image coordinates relative to 3D light detection and ranging (LIDAR) coordinates;
FIG. 5 is a schematic diagram of the population of a projected 2D mesh according to examples described herein;
FIGS. 6A-6C are schematic diagrams of example projections of a point cloud onto a 2D mesh according to examples described herein; and
FIG. 7 is a flow diagram of an example method for learning a generative model for generating a synthetic point cloud.
Like reference numerals may be used to refer to like parts in different figures.
Detailed Description
Some examples of the present application are described in the context of an autonomous vehicle. However, the methods and systems disclosed herein may also be applicable to implementations other than autonomous vehicles, for example in the context of off-board devices and/or semi-autonomous devices. For example, any system or apparatus that requires training and/or testing using point clouds, and/or that may benefit from the ability to synthesize point clouds during operation, may benefit from the present application. Further, examples of the present application may be used to learn a generative model that generates synthetic point clouds in other environments.
Although the examples described herein may refer to automobiles as autonomous vehicles, the teachings of the present application may relate to other forms of autonomous or semi-autonomous equipment including, for example, trams, subways, trucks, buses, surface and underwater vessels, airplanes, drones (also known as Unmanned Aerial Vehicles (UAVs)), warehouse equipment, construction equipment, or farm equipment, and may include manned vehicles as well as unmanned vehicles. The methods and systems disclosed herein may also be associated with off-board devices (e.g., autonomous vacuum cleaners and lawn mowers).
FIG. 1 is a block diagram of certain components of an exemplary autonomous vehicle 100. Although described as autonomous, the vehicle 100 may operate in a fully autonomous, semi-autonomous, or fully user-controlled mode. In the present application, the vehicle 100 is described in an automotive embodiment; however, as noted above, the present application may be implemented in other on-board or off-board machines.
For example, the vehicle 100 includes a sensor system 110, a data analysis system 120, a path planning system 130, a vehicle control system 140, and an electromechanical system 150. Other systems and components may be included in the vehicle 100 as appropriate. The various systems and components of the vehicle may communicate with each other, for example through wired or wireless communication. For example, the sensor system 110 may be in communication with the data analysis system 120, the path planning system 130, and the vehicle control system 140; the data analysis system 120 may be in communication with the path planning system 130 and the vehicle control system 140; the path planning system 130 may be in communication with the vehicle control system 140; and the vehicle control system 140 may be in communication with the electromechanical system 150.
The sensor system 110 includes various sensing units for collecting information about the internal and/or external environment of the vehicle 100. In the exemplary embodiment shown, the sensor system 110 includes a radar unit 112, a LIDAR unit 114, a camera 116, and a Global Positioning System (GPS) unit 118. The sensor system 110 may comprise other sensing units, such as temperature sensors, precipitation sensors or microphones, etc.
In an exemplary embodiment, the LIDAR unit 114 may include one or more LIDAR sensors and may capture data in a wide view (e.g., a 360 ° view) about the vehicle 100. LIDAR data (e.g., raw sensor data acquired by one or more LIDAR sensors) may include 3D information about an environment, and may be processed to form a set of data points in 3D space. In this application, the term "3D point cloud" or "point cloud" will be used to refer to a set of data points having a 3D structure in space. Each data point in the 3D point cloud represents a 3D coordinate (e.g., x, y, and z values) of a sensed object in 3D space. The groups of data points in the 3D point cloud may be irregularly spaced, depending on the sensed environment. In some examples, each data point in the point cloud may contain other information, such as the intensity of the reflected light or the time of detection, in addition to the 3D coordinates.
Using various sensing units 112, 114, 116, 118, the sensor system 110 may collect information about the local environment of the vehicle 100 (e.g., any directly surrounding obstacles) as well as information from a wider vicinity (e.g., the radar unit 112 and the LIDAR unit 114 may collect information from an area around the vehicle 100 up to 100m radius or more). The sensor system 110 may also collect information about the position and orientation of the vehicle 100 relative to a frame of reference (e.g., using the GPS unit 118).
The sensor system 110 communicates with the data analysis system 120 to be able to detect and identify objects in the environment of the vehicle 100, for example to detect and identify stationary obstacles, or a pedestrian or another vehicle. The data analysis system 120 may be implemented using software that may include any number of separate or interconnected modules or functions, including machine learning algorithms and image processing functions, for example. The data analysis system 120 may be implemented using one or more dedicated image processing units, or may be implemented using one or more general purpose processing units of the vehicle 100. The data analysis system 120 may repeatedly (e.g., at regular intervals) receive raw sensor data from the sensor system 110, process the raw sensor data, and perform image analysis in real-time or near real-time. The output of the data analysis system 120 may include, for example, an identification of objects in 2D and/or 3D space, including object categories, object locations, and object boundaries.
The data acquired by the sensor system 110 and processed by the data analysis system 120 may be provided to the path planning system 130. The vehicle control system 140 is used to control the operation of the vehicle 100 based on the targets set by the path planning system 130. The vehicle control system 140 may be used to provide full, partial, or supplemental control of the vehicle 100. The electromechanical system 150 receives control signals from the vehicle control system 140 to operate mechanical and/or electromechanical components of the vehicle 100, such as the engine, transmission, steering system, and braking system.
The sensor system 110, the data analysis system 120, the path planning system 130 and the vehicle control system 140 may be implemented at least partially in one or more processing units of the vehicle 100, either individually or in combination.
Before operating in a real environment, the vehicle 100 may need to be trained and/or tested for the desired operation. During training and/or testing, the data analysis system 120, the path planning system 130, and/or the vehicle control system 140 may be trained and/or tested outside of the environment of the vehicle 100. For example, one or more modules of the data analysis system 120 may be machine learning-based modules that implement models learned using machine learning algorithms such as deep learning. A machine learning-based module may be implemented using a neural network, such as a convolutional neural network (CNN), which may be trained and/or tested. Training and/or testing the neural network may be performed using real data (e.g., obtained by operating the vehicle 100 in a real environment) and/or using synthetic data. Synthetic data is typically generated to simulate the real data that would be received via the sensor system 110.
The vehicle 100 may also utilize synthetic data during real operation. For example, one or more sensors of the sensor system 110 may be blocked or otherwise prevented from obtaining sensed data at some point during real operation (e.g., due to a temporary occlusion of the LIDAR unit 114). The vehicle 100 may generate synthetic data to estimate at least some of the missing data. For example, the sensor system 110 may implement a learned generative model for generating such synthetic data when data is missing.
As an example, FIG. 2 shows an example of a processing unit 200, which may be used for learning generative models from batches of real point clouds in a training phase, and for implementing the learned generative models to generate synthetic data from data sampled from a distribution, as described in further detail below. In some embodiments, the processing unit 200 may be implemented, for example, in the vehicle 100 of FIG. 1, to implement a learned generative model for generating synthetic data during operation of the vehicle 100. The processing unit 200 may also be provided outside the vehicle 100, for example, to implement a learned generative model that generates synthetic data for training and/or testing the vehicle 100 outside of a real environment (e.g., within a simulation).
In this example, the processing unit 200 includes one or more physical processors 210 (e.g., microprocessors, graphics processing units, digital signal processors, or other computing elements) coupled to electronic memory 220 and one or more input and output interfaces or devices 230. The electronic memory 220 may include both tangible memory (e.g., flash memory) and transient memory (e.g., RAM). The tangible memory may store instructions, data, and/or software modules for execution by the processor to perform the examples described herein. The electronic storage 220 may include any suitable volatile and/or non-volatile storage and retrieval device. The electronic memory 220 may include one or more of a Random Access Memory (RAM), a Read Only Memory (ROM), a hard disk, an optical disk, a Subscriber Identity Module (SIM) card, a memory stick, a Secure Digital (SD) memory card, and the like.
In the example of fig. 2, computer instructions and data are stored in the electronic memory 220 of the processing unit 200, which enables the processor 210 to generate synthetic data, as disclosed herein.
The synthetic data may need to be in the form of a 3D point cloud. For example, the raw data acquired by the LIDAR unit is typically in the form of a point cloud, and it may be desirable to generate synthetic data in a similar form.
The success of applying convolution to 2D images has led to a desire to process 3D point clouds using convolution as well. However, unlike 2D images, 3D point clouds are typically unordered (invariant to permutation of their points) and irregularly structured. Studies have been conducted to process point cloud data using deep neural networks (DNNs). PointNet (e.g., as described in Qi et al., Computer Vision and Pattern Recognition (CVPR), IEEE, 2017) is one example of a DNN that uses a permutation-invariant operation (a max operation) that can effectively capture global features; however, PointNet has been found to be less successful at extracting local features. Other DNNs, such as PointNet++ (e.g., as described in Qi et al., Advances in Neural Information Processing Systems, pp. 5099-5108, 2017) and PointCNN (e.g., as described in Li et al., arXiv preprint arXiv:1801.07791, 2018), as well as other CNNs (e.g., as described in Simonovsky et al., CVPR, July 2017), have convolution variants that can extract local and global features.
For generative models, it is desirable that the operators be differentiable with respect to the locations of the data points in the 3D point cloud (e.g., as defined by their x, y, z coordinates). This helps to achieve better gradient flow, so that the generative model can be learned more efficiently during the training phase. Gradient flow is generally important for all machine learning algorithms.
The generative adversarial network (GAN) is a generative model for generating synthetic data. The GAN includes a generator and a discriminator. The generator is used to generate a synthetic point cloud, and the discriminator evaluates the synthetic point cloud. The purpose of the generator is to generate a synthetic point cloud that the discriminator cannot distinguish from a real point cloud (e.g., a point cloud generated from raw LIDAR sensor data). Typically, in the training phase, the GAN is trained using real point clouds until the synthetic point clouds generated by the generator (using the learned generative model) are indistinguishable from the real point clouds by the discriminator. The discriminator may be implemented using a neural network such as a CNN, and the generator may be implemented using another neural network such as, for example, a deconvolutional neural network. Although the present application describes examples of using neural networks to learn and deploy generative models, this is not limiting; other machine learning methods may also be used.
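For readers less familiar with the GAN training procedure summarized above, the following is a minimal sketch of the alternating discriminator/generator updates, written in Python with PyTorch. The tiny fully connected networks, the grid dimensions, the latent size, and the optimizer settings are illustrative assumptions only; in the system described here the generator and discriminator would instead be built from the flexible convolution and transposed flexible convolution operations.

```python
# Minimal GAN training loop (sketch). The small MLPs below stand in for the
# flexible-convolution networks described in this application; H, W, LATENT and the
# optimizer settings are assumptions for illustration.
import torch
import torch.nn as nn

H, W, LATENT = 64, 256, 128                    # assumed projected-grid size and latent dim

generator = nn.Sequential(                     # maps a latent sample to a projected 2D grid
    nn.Linear(LATENT, 512), nn.ReLU(),
    nn.Linear(512, H * W))
discriminator = nn.Sequential(                 # scores a grid as real (1) or synthetic (0)
    nn.Linear(H * W, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_grids):                    # real_grids: (batch, H*W) projected 2D grids
    batch = real_grids.shape[0]
    # discriminator update: real grids labelled 1, generated grids labelled 0
    fake = generator(torch.randn(batch, LATENT)).detach()
    d_loss = bce(discriminator(real_grids), torch.ones(batch, 1)) \
           + bce(discriminator(fake), torch.zeros(batch, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # generator update: try to make the discriminator predict 1 for generated grids
    g_loss = bce(discriminator(generator(torch.randn(batch, LATENT))), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Training would be repeated over batches of projected 2D data grids obtained from real point clouds until the discriminator can no longer distinguish the generated grids from the real ones.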
In the case of a GAN, the gradient flow of the data is critical because the only way the generator learns is by receiving gradients from the discriminator. Among proposed point cloud convolutions, the flexible convolution allows such propagation of the gradient. Groh et al. describe the flexible convolution in an arXiv preprint (arXiv:1803.07289, 2018). The flexible convolution extends the common convolution operator from the 2D grid of an image to arbitrary 3D locations by making the convolution weights a function of the spatial positions of the points. Thus, the flexible convolution provides a method for applying convolution to a 3D point cloud in which the data points may have irregularly spaced neighbors.
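To illustrate why such an operator is differentiable with respect to the point positions, the following is a simplified sketch of a flexible-convolution-style operation in which the filter weight applied to each neighboring point is an affine function of that neighbor's position relative to the center point. The exact parameterization used by Groh et al. may differ; the relative-position form, the array shapes, and the precomputed neighbor indices are assumptions made for this sketch.

```python
# Simplified flexible-convolution sketch: filter weights are an affine function of the
# relative neighbor positions, so the output depends smoothly (differentiably) on the
# point coordinates. Shapes and the neighbor-index input are assumptions.
import numpy as np

def flex_conv(points, features, neighbors, theta, theta_b):
    """points: (N, 3) xyz; features: (N, C_in); neighbors: (N, K) neighbor indices;
    theta: (C_out, C_in, 3) and theta_b: (C_out, C_in) parameterize the weights."""
    out = np.zeros((points.shape[0], theta.shape[0]))
    for i in range(points.shape[0]):
        rel = points[neighbors[i]] - points[i]                 # (K, 3) relative positions
        # weight per (output channel, input channel, neighbor), affine in position
        w = np.einsum('oid,kd->oik', theta, rel) + theta_b[:, :, None]
        out[i] = np.einsum('oik,ki->o', w, features[neighbors[i]])
    return out

# toy usage: 100 points, 4 input channels, 16 output channels, 8 neighbors (incl. self)
pts = np.random.randn(100, 3)
feat = np.random.randn(100, 4)
nbr = np.argsort(((pts[:, None] - pts[None]) ** 2).sum(-1), axis=1)[:, :8]
print(flex_conv(pts, feat, nbr, 0.1 * np.random.randn(16, 4, 3), 0.1 * np.random.randn(16, 4)).shape)
```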
When the flexible convolution is used as a building block of the discriminator of a GAN or of the DNN encoder of a variational auto-encoder (VAE), an inverse operator (similar to the transposed convolution) may be defined in the generator of the GAN or in the decoder of the VAE. Other deep learning-based methods capture local features in an irreversible manner (e.g., grouping neighboring points and convolving each group (referred to as a grouping operation) to extract local features, where it is unclear how to reverse the grouping operation). However, querying all neighboring points in 3D space is computationally expensive when using the flexible convolution. To address this problem, the present application describes a method of projecting the 3D points in a point cloud into a 2D mesh-based data structure.
For a better understanding of the present application, consider the data structure of a 2D image. A 2D image is typically encoded as a 2D mesh-based data structure. A 2D convolution may be applied to a mesh-based data structure, such as an image. In these structures, the order of each element is defined based on the position of each element in the grid. For example, as shown in FIG. 3, the element in row i and column j of the grid is indexed as element a_ij; the element in row i and column j+1 is indexed as element a_i(j+1); and the element in row i+1 and column j is indexed as element a_(i+1)j. This index order is used in the 2D convolution. For example, consider a 3 × 3 2D kernel containing 9 elements. To perform a 2D convolution on the element in the middle of a 3 × 3 patch of the grid, the 8 neighboring elements as well as the central element may be considered, e.g., using a linear convolution calculation.
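As a concrete illustration of this indexing, the short sketch below computes a 2D convolution at a single interior grid element as the weighted sum over that element and its 8 neighbors; the grid contents and the averaging kernel are arbitrary example values.

```python
# 2D convolution at one interior grid element: weighted sum over the element a[i][j]
# and its 8 grid neighbors, using the row/column indexing described above.
import numpy as np

def conv3x3_at(a, kernel, i, j):
    """a: 2D grid; kernel: (3, 3) weights; (i, j): an interior grid position."""
    patch = a[i - 1:i + 2, j - 1:j + 2]        # the element and its 8 neighbors
    return float(np.sum(patch * kernel))

grid = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.full((3, 3), 1.0 / 9.0)            # simple averaging kernel
print(conv3x3_at(grid, kernel, 2, 2))          # average of the 3x3 patch centred at (2, 2)
```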
For data in a 3D point cloud, the points are not ordered in a 2D mesh, and the neighbors of any given point in the 3D point cloud are not easily identified. Considering all possible neighbors of a point in a given 3D point cloud may also require extensive computations. In the present application, such problems are solved by defining a mesh-based structure for the points in the 3D point cloud.
For example, a 3D point cloud produced by Velodyne 64 LIDAR may be decomposed into 64 scans through 360 °. In some embodiments, the horizontal and vertical dimensions may be quantized (e.g., to improve efficiency).
When a 2D convolution is applied to 2D data (e.g., 2D image data), the operation has translational equivariance. Similarly, the mesh-based data structure onto which the point cloud is projected should have rotational equivariance. To achieve this, a polar coordinate system may be used.
For example, consider a 2D coordinate system (e.g., for 2D image data) and a 3D coordinate system (e.g., for LIDAR data), as shown in FIG. 4.
The projection of the 3D point cloud to the 2D mesh-based data structure may be defined as follows:
[Projection formula reproduced as an image in the original publication]
such a projection may be conceptually understood as a single line projecting a closed loop in the point cloud into the 2D mesh. When projected in this manner, the first and last elements in each row are actually adjacent to each other. To ensure that the projected 2D data accurately reflects this, the leftmost and rightmost columns of the 2D data grid may be filled with values from the other side (rather than zero-filling as is typical when processing 2D images). Fig. 5 depicts an example of this. For the projected 2D data grid 510, a populated 2D data grid 520 is generated with an added leftmost column 522 containing values from the rightmost column 514 of the original projected data grid 510 and an added rightmost column 524 containing values from the leftmost column 512 of the projected data grid 510. The populated data grid 520 may be generated by adding the added columns 522, 524 to the projected data grid 510. Alternatively, the filler data grid 520 may be a new data structure created in addition to the projection data grid 510.
FIGS. 6A-6C illustrate an example of how a point cloud may be projected onto a 2D data grid using the above-described method. FIG. 6A is a 2D representation of a 3D point cloud 610. The point cloud 610 includes information in a 360° view, such as a 360° view around a LIDAR sensor. In the 2D representation shown, the sensor would be located at the center of the image. FIG. 6B shows how the point cloud 610 is projected into a 2D data mesh format, as described above. Three closed loops 612, 614, 616 are highlighted in the point cloud 610. It should be noted that the loops 612, 614, 616 need not be circular, but may be irregularly shaped and irregularly spaced. Each closed loop 612, 614, 616 is projected to a respective row 622, 624, 626 in the 2D data grid 620. Although not explicitly shown, the 2D data grid 620 may be populated as described above. FIG. 6C shows a 2D representation of the point cloud 610, together with a visual representation of the 2D data mesh 620b after projection.
When a point cloud is projected onto a 2D data grid, some information may be lost. This problem can be at least partially addressed by generating a populated data grid as described above, which helps to ensure that adjacent elements on the leftmost and rightmost columns reflect the data in the point cloud. Other elements not at the edges of the 2D data grid may be less affected by the projection. Furthermore, the computational efficiency obtained by this approach may be sufficient to make any possible information loss acceptable as a compromise.
In some examples, the 2D data grid may not need to be populated as described above. For example (e.g., where the projected 2D data grid has a large number of columns), information lost by omitting the added leftmost column and the added rightmost column may be considered acceptable.
After the point cloud is projected into the 2D data mesh format, the projected 2D data mesh is used as input to learn a generative model, such as a GAN generative model. Thus, the generative model may be learned to generate a synthetic 3D point cloud.
FIG. 7 illustrates a flow diagram of a method 700 for learning a generative model that generates one or more batches of synthetic point clouds from sampled data.
At 702, a first batch of real point clouds is obtained. The first batch of point clouds may be obtained directly from a LIDAR sensor, such as the LIDAR sensor of the LIDAR unit 114 shown in fig. 1. The first batch of point clouds may also be obtained, for example, from a database or memory that stores point clouds previously obtained from the LIDAR unit 114.
At 704, each point cloud of the first batch is projected to generate a projected 2D data mesh. The 2D data mesh may have a format similar to that of a 2D image. The projection can be done using the formula described above:
[Projection formula reproduced as an image in the original publication]
this operation may help ensure the rotation isomorphism required to process, for example, a LIDAR point cloud. Other formulas may be used for projection, for example any other projection that achieves rotational homogeneity may be used. For example, although the present application describes a projection in which closed loops in a point cloud are projected to rows in a 2D data grid, other methods may project closed loops in a point cloud to columns in a 2D data grid.
Another example method of generating a projected 2D data grid that achieves rotational equivariance is now described. The example method generates a data grid having H rows and W columns. First, the data points in the point cloud are clustered together according to elevation angle. This results in H clusters, corresponding to H LIDAR channels (each channel capturing data at a particular elevation angle). For each cluster, the data points are sorted in increasing order of azimuth (in some cases, this sorting may not be necessary where the raw data from the LIDAR unit has already been sorted). In order for the grid to have a fixed number of elements per row, 360° is divided into a fixed number of bins, resulting in W bins, each defining a range of azimuth angles. The data points within each cluster are placed into the corresponding bins according to azimuth. For each bin, the average of the data points belonging to that bin is calculated to obtain a single representative value. Thus, H × W elements are populated into the 2D data grid. In some implementations, the 2D mesh may have a depth of 3, with each of the x, y, and z spatial coordinates recorded in a respective depth channel.
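A sketch of this grid construction is shown below. It assumes that the elevation-channel index of each point is available (e.g., provided by the LIDAR driver), uses the x and y coordinates to compute the azimuth, leaves empty bins at zero, and uses illustrative values of H and W; these details are assumptions for the sketch rather than requirements of the described method.

```python
# H x W grid construction: group points by elevation channel (row), bin by azimuth
# (column), and represent each bin by the mean of its points. Empty bins remain zero.
import numpy as np

def project_to_grid(points, channel, H=64, W=512):
    """points: (N, 3) xyz; channel: (N,) elevation-channel index in [0, H).
    Returns an (H, W, 3) grid holding the mean xyz of the points in each bin."""
    azimuth = np.arctan2(points[:, 1], points[:, 0])            # in (-pi, pi]
    col = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int) % W
    grid_sum = np.zeros((H, W, 3))
    grid_cnt = np.zeros((H, W, 1))
    np.add.at(grid_sum, (channel, col), points)                 # accumulate xyz per bin
    np.add.at(grid_cnt, (channel, col), 1.0)                    # count points per bin
    return grid_sum / np.maximum(grid_cnt, 1.0)

# toy usage: 1000 random points spread over 64 channels
pts = np.random.randn(1000, 3)
ch = np.random.randint(0, 64, size=1000)
print(project_to_grid(pts, ch).shape)                           # (64, 512, 3)
```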
Optionally, at 706, the projected data grid is populated such that elements at edges of the projected data grid wrap around to opposite edges.
In examples where the projected data grid contains rows corresponding to closed loops in the 3D point cloud, the 2D data grid may be populated by adding an added leftmost column containing elements from the original rightmost column of the projected data grid, and adding an added rightmost column containing elements from the original leftmost column of the projected data grid. The number of columns added at the edges of the resulting 2D data grid depends on the size of the convolution kernel.
The padding may be modified in the case where rings in the 3D point cloud are projected to columns in the 2D data grid. In such examples, the projection data grid may be populated by adding an added top-most row containing elements from the original bottom-most row of the projection data grid, and adding an added bottom-most row containing elements from the original top-most row of the projection data grid. Other such modifications to the fill operation may be used for different projection methods.
The projected data grid may be populated by adding the added columns/rows directly to the projected data grid, or a new data structure may be created for the populated data grid. Populating a data grid in this manner can help the generative model understand the relationships between points near the edges of the data grid.
At 708, the generative model is learned using the projected data grid (with or without padding). The generative model may use flexible convolution operations and transposed flexible convolution operations, and may be learned using a GAN as described above. The learning of the generative model may include repeating steps 702-708 using different batches of real point clouds.
The generative model may be learned by inputting a projection data mesh to the generative model in order to learn the generative model that generates the synthetic point cloud from the sampled 2D data.
In another possible approach, the generative model may be learned by inputting the 3D point cloud to the generative model, in order to learn a generative model that generates a synthetic point cloud from sampled 3D data. As described above, the flexible convolution is used to learn the generative model. In this approach, the projected 2D data mesh is not used directly as input to the generative model, but rather helps perform the flexible convolution. The projected 2D data mesh is used to identify the nearest neighbors of each data point in the point cloud for computing the flexible convolution. For example, for a given data point in the point cloud, the corresponding element in the 2D data grid and its neighboring elements (e.g., its k nearest neighbors) in the 2D data grid are identified. Those neighboring elements identified in the 2D data mesh are then mapped back to the point cloud in order to identify the neighboring data points in the point cloud. The flexible convolution may then be performed using those identified neighboring data points in the point cloud.
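The neighbor lookup described in this approach can be sketched as follows, assuming an auxiliary index grid that stores, for each grid cell, the index of the point projected into it (or -1 for an empty cell); the window size and the wrap-around along the azimuth (column) axis are illustrative assumptions.

```python
# Approximate nearest-neighbor lookup via the projected 2D grid: instead of searching all
# points in 3D space, only points stored in the grid cells around a point's own cell are
# considered. The index_grid layout, window size, and column wrap-around are assumptions.
import numpy as np

def grid_neighbors(index_grid, row, col, window=1):
    """index_grid: (H, W) array of point indices (-1 for empty cells).
    Returns indices of points stored in the (2*window+1)^2 cells around (row, col)."""
    H, W = index_grid.shape
    rows = np.clip(np.arange(row - window, row + window + 1), 0, H - 1)
    cols = np.arange(col - window, col + window + 1) % W        # wrap around the azimuth
    patch = index_grid[np.ix_(rows, cols)].ravel()
    return patch[patch >= 0]                                    # drop empty cells

idx_grid = -np.ones((64, 512), dtype=int)
idx_grid[10, 5], idx_grid[10, 511], idx_grid[11, 0] = 42, 7, 13
print(grid_neighbors(idx_grid, 10, 0))   # finds points 7 and 13 via the wrap-around
```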
After the generative model has been properly learned, a batch of synthetic 3D point clouds may optionally be generated using the generative model, at 710. For example, data sampled from a selected distribution (e.g., a Gaussian (normal) distribution or a multinomial distribution, among other possibilities, depending on the desired application) may be provided as input to the learned generative model for generating the batch of synthetic 3D point clouds.
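A small usage sketch of this inference step is shown below, reusing the illustrative PyTorch assumptions from the earlier training sketch (here with an untrained stand-in generator): latent vectors are drawn from a standard normal distribution and passed through the generator to obtain a batch of synthetic projected grids, which would then be mapped back to 3D point clouds by inverting the projection.

```python
# Inference sketch: sample latent vectors from a chosen distribution and run them through
# the learned generator. The generator architecture, H, W, and LATENT are assumptions;
# in practice the learned parameters would be loaded (e.g., via load_state_dict).
import torch
import torch.nn as nn

H, W, LATENT, BATCH = 64, 256, 128, 16
generator = nn.Sequential(nn.Linear(LATENT, 512), nn.ReLU(), nn.Linear(512, H * W))

z = torch.randn(BATCH, LATENT)                   # samples from a standard normal distribution
with torch.no_grad():
    synthetic_grids = generator(z).reshape(BATCH, H, W)
print(synthetic_grids.shape)                     # each (H, W) grid maps back to a point cloud
```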
In some examples, step 710 may be performed separately from other steps of method 700. For example, a generative model may be learned in one environment (e.g., in a laboratory environment), and the learned generative model may be used to generate a collection of synthetic point clouds in a different environment (e.g., in an autonomous vehicle). The present application thus also describes a method of generating a batch of synthetic point clouds using a generative model that has been learned in the manner described above.
In some examples, steps 702-708 may be performed iteratively. For example, learning the generative model may include generating projected 2D data meshes from a second batch of point clouds and further learning the generative model using the second batch of point clouds and the corresponding projected 2D data meshes. In some examples, steps 702-708 may be performed in parallel with step 710. That is, learning of the generative model may continue over time and may be performed in parallel with generating synthetic data using the generative model.
The method 700 may be used to generate a batch of synthetic point clouds for training and/or testing purposes, such as for training and/or testing autonomous or semi-autonomous devices.
The method 700 may be implemented at least in part in an autonomous device. For example, generating synthetic point cloud data using at least a trained generative model may be implemented in an autonomous vehicle (e.g., to supplement missing sensor data).
In some examples, the method 700 may be performed together on the same processing system. In other examples, portions of the method 700 may be performed separately and by different systems.
The method 700 may be performed, at least in part, by the processing unit 200 of FIG. 2. For example, the processing unit 200 may execute instructions to perform the above-described steps 702-706 to generate a projected 2D data mesh from a point cloud. The processing unit 200 may execute instructions to perform the above-described step 708 to learn the generative model using the projected 2D data mesh as input. The processing unit 200 may perform the above-described step 710 to implement the learned generative model for generating a synthetic point cloud.
In the examples described herein, the present application enables the generation of a synthetic point cloud that more accurately reflects a real point cloud (e.g., data from a LIDAR unit). The synthetic point cloud may be generated from a generative model that is learned in an unsupervised manner.
The synthetic point cloud generated in this manner may be used to supplement missing point clouds and/or missing data points in the point cloud (e.g., a real point cloud obtained using a LIDAR unit).
In some examples, the learned generative model may be combined with a sequence model. The sequence model may be learned to generate a sequence of synthetic point clouds. This may be used to generate synthetic temporal data, for example to predict and/or supplement missing data in a real point cloud. The sequence model may be implemented using a recurrent neural network (RNN) with flexible convolution operations. In such implementations, the learned generative model may be used in the neurons of the RNN. In some embodiments, the RNN may be a learned generative model.
In some examples, when the learned generative model is implemented in an autonomous device (e.g., an autonomous vehicle), a collection of synthetic point clouds may be generated from (or in response to) the actions of the autonomous device. Such actions may be performed in reality or in simulation. For example, an action (e.g., a left turn) of the autonomous device may be fed back to the generative model to enable the generative model to generate a synthetic point cloud reflecting the action.
In some examples, the present application describes a method that includes generating a set of synthetic point cloud data using a generative model implemented using flexible convolutions in a generative adversarial network. The generative model is trained using one or more projected two-dimensional (2D) data meshes, each projected 2D data mesh being generated from a respective set of 3D point cloud data by projecting that set of point cloud data onto the respective projected 2D data mesh while maintaining rotational equivariance.
Although the methods and processes are described herein as having steps in a certain order, one or more of the steps of the methods and processes may be omitted or altered as appropriate. One or more of the steps may be performed in an order different from their described order, as desired.
Although the present application is described, at least in part, in terms of methods, one of ordinary skill in the art will appreciate that the present application also relates to various components, whether by means of hardware components, software, or any combination of both, for performing at least some aspects and features of the described methods. Accordingly, the technical solution of the present application may be embodied in the form of a software product. Suitable software products may be stored in a pre-recorded memory device or other similar non-volatile or non-transitory computer readable medium, including, for example, a DVD, CD-ROM, USB flash drive, removable hard drive, or other storage medium. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, server, or network device) to perform examples of the methods disclosed herein.
This application may be embodied in other specific forms without departing from the subject matter of the claims. The described exemplary embodiments are to be considered in all respects only as illustrative and not restrictive. Features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, and features suitable for such combinations are understood to be included within the scope of the present application.
All values and subranges within the disclosed ranges are also disclosed. Moreover, although the systems, devices, and processes disclosed and illustrated herein may include a particular number of elements/components, the systems, devices, and components may be modified to include additional or fewer of such elements/components. For example, although any elements/components disclosed may be referred to in the singular, the embodiments disclosed herein may be modified to include a plurality of such elements/components. The subject matter described herein is intended to cover and embrace all suitable variations in technology.

Claims (19)

1. A method, comprising:
obtaining a first batch of point clouds representing a three-dimensional (3D) environment;
generating a projected two-dimensional (2D) data grid for each point cloud of the first batch of point clouds by projecting each point cloud of the first batch of point clouds into the projected 2D data grid having rotational equivariance; and
in a training phase, learning a generative model for generating one or more batches of synthetic point clouds, the generative model being learned by providing the generative model with the projected 2D data grid of each point cloud of the first batch, wherein the generative model comprises a flexible convolution operation and a transposed flexible convolution operation.
2. The method of claim 1, further comprising:
generating one or more batches of synthetic point clouds from data sampled from a distribution using the learned generative model.
3. The method of claim 1 or 2, wherein the generating the projected 2D data grid further comprises:
wrapping elements of the projected 2D data grid from one edge of the projected 2D data grid to an opposite edge to populate the projected 2D data grid; and
in the training phase, learning the generative model using the populated projected 2D data grid.
4. The method of any of claims 1 to 3, wherein each point cloud of the first batch of point clouds is projected onto the projected 2D data grid using the following formula:
[Projection formula reproduced as an image in the original publication]
where x, y, and z represent the 3D coordinates of the data points in the point cloud.
5. The method of claim 4, wherein the projected 2D data grid is populated by adding an appended leftmost column containing elements from an original rightmost column of the projected 2D data grid and adding an appended rightmost column containing elements from an original leftmost column of the projected 2D data grid, wherein a number of columns populated is determined by a size of a convolution kernel.
6. The method of claim 5, wherein each row in the projected 2D data grid corresponds to a respective closed loop in the point cloud.
7. The method of any one of claims 1 to 6, further comprising:
supplementing any missing data points from the point clouds in the first batch with the one or more batches of synthesized point clouds.
8. The method of any one of claims 1 to 7, further comprising:
combining the generative model with a sequence model, wherein the sequence model generates synthetic temporal data that is used to predict any missing data points in the first batch of point clouds.
9. The method according to any of claims 1 to 8, characterized in that the learned generative model is a Recurrent Neural Network (RNN).
10. The method of claim 1, further comprising:
generating one or more batches of synthetic point clouds from data sampled from a distribution, according to actions of an autonomous device, using the learned generative model.
11. A processing unit, comprising:
a processor;
a memory storing computer-executable instructions that, when executed by the processor, cause the processing unit to perform the method of any one of claims 1 to 10.
12. A computer readable memory storing instructions, which, when executed by a processor of a processing unit, cause the processing unit to perform the method of any one of claims 1 to 10.
13. A method, comprising:
obtaining a first batch of point clouds representing a 3D environment;
generating a projected 2D data grid for each point cloud of the first batch of point clouds by projecting each point cloud of the first batch of point clouds into the projected 2D data grid having rotational equivariance; and
in a training phase, learning a generative model for generating one or more batches of synthetic point clouds, the generative model being learned by providing the first batch of point clouds to the generative model, wherein the projected 2D data grid is used to identify nearest neighbors for performing a flexible convolution operation and a transposed flexible convolution operation during learning.
14. The method of claim 13, further comprising:
generating one or more batches of synthetic point clouds from data sampled from a distribution using the learned generative model.
15. The method of claim 13 or 14, wherein the generating the projected 2D data grid further comprises:
wrapping elements of the projected 2D data grid from one edge of the projected 2D data grid to an opposite edge to populate the projected 2D data grid; and
in the training phase, learning the generative model using the populated projected 2D data grid.
16. The method of any of claims 13 to 15, further comprising:
combining the generative model with a sequence model, wherein the sequence model captures temporal data used to predict any missing data points from the first batch of point clouds.
17. The method of any one of claims 13 to 16, wherein the learned generative model is a recurrent neural network.
18. A processing unit, comprising:
a processor;
a memory storing computer-executable instructions that, when executed by the processor, cause the processing unit to perform the method of any of claims 13 to 17.
19. A computer readable memory storing instructions, which, when executed by a processor of a processing unit, cause the processing unit to perform the method of any one of claims 13 to 17.
CN201980058780.6A 2018-09-14 2019-09-14 Method and system for generating synthetic point cloud data using generative models Pending CN112912890A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201862731690P 2018-09-14 2018-09-14
US62/731,690 2018-09-14
US16/568,885 US11151734B2 (en) 2018-09-14 2019-09-12 Method and system for generating synthetic point cloud data using a generative model
US16/568,885 2019-09-12
PCT/CN2019/105854 WO2020052678A1 (en) 2018-09-14 2019-09-14 Method and system for generating synthetic point cloud data using a generative model

Publications (1)

Publication Number Publication Date
CN112912890A true CN112912890A (en) 2021-06-04

Family

ID=69773051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980058780.6A Pending CN112912890A (en) 2018-09-14 2019-09-14 Method and system for generating synthetic point cloud data using generative models

Country Status (3)

Country Link
US (1) US11151734B2 (en)
CN (1) CN112912890A (en)
WO (1) WO2020052678A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024119906A1 (en) * 2022-12-07 2024-06-13 上海禾赛科技有限公司 Laser radar, data processing method, and light detection and data acquisition processing apparatus

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK201970115A1 (en) * 2018-11-08 2020-06-09 Aptiv Technologies Limited Deep learning for object detection using pillars
US20200241542A1 (en) * 2019-01-25 2020-07-30 Bayerische Motoren Werke Aktiengesellschaft Vehicle Equipped with Accelerated Actor-Critic Reinforcement Learning and Method for Accelerating Actor-Critic Reinforcement Learning
US11016496B2 (en) 2019-04-10 2021-05-25 Argo AI, LLC Transferring synthetic LiDAR system data to real world domain for autonomous vehicle training applications
US10962630B1 (en) * 2019-10-18 2021-03-30 Toyota Research Institute, Inc. System and method for calibrating sensors of a sensor system
US11295517B2 (en) * 2019-11-15 2022-04-05 Waymo Llc Generating realistic point clouds
CN111683257B (en) * 2020-04-30 2022-04-26 中山大学 Point cloud rapid projection method based on adjacent point projection distribution characteristics
KR102398931B1 (en) * 2020-05-11 2022-05-17 성균관대학교산학협력단 Point autoencoder, dual autoencoder, and dimensional transformation method of point cloud using the same
CN116830155A (en) * 2020-05-11 2023-09-29 康耐视公司 Method and device for extracting contour from three-dimensional image
CN111881790B (en) * 2020-07-14 2023-04-28 武汉中海庭数据技术有限公司 Automatic extraction method and device for road crosswalk in high-precision map making
CN112200055B (en) * 2020-09-30 2024-04-30 深圳市信义科技有限公司 Pedestrian attribute identification method, system and device of combined countermeasure generation network
GB2607598A (en) * 2021-06-07 2022-12-14 Correvate Ltd Aligning 3D datasets
CN113313835B (en) * 2021-07-29 2021-11-09 深圳市数字城市工程研究中心 Building roof automatic modeling method based on airborne LiDAR point cloud
CN113591804B (en) * 2021-09-27 2022-02-22 阿里巴巴达摩院(杭州)科技有限公司 Image feature extraction method, computer-readable storage medium, and computer terminal
CN114418852B (en) * 2022-01-20 2024-04-12 哈尔滨工业大学 Point cloud arbitrary scale up-sampling method based on self-supervision deep learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9349105B2 (en) * 2013-12-18 2016-05-24 International Business Machines Corporation Machine learning with incomplete data sets
CN107025642B (en) 2016-01-27 2018-06-22 百度在线网络技术(北京)有限公司 Vehicle's contour detection method and device based on point cloud data
US10409950B2 (en) 2017-03-10 2019-09-10 General Electric Company Systems and methods for utilizing a 3D CAD point-cloud to automatically create a fluid model
CN106951847B (en) 2017-03-13 2020-09-29 百度在线网络技术(北京)有限公司 Obstacle detection method, apparatus, device and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024119906A1 (en) * 2022-12-07 2024-06-13 上海禾赛科技有限公司 Laser radar, data processing method, and light detection and data acquisition processing apparatus

Also Published As

Publication number Publication date
US20200090357A1 (en) 2020-03-19
US11151734B2 (en) 2021-10-19
WO2020052678A1 (en) 2020-03-19

Similar Documents

Publication Publication Date Title
US11151734B2 (en) Method and system for generating synthetic point cloud data using a generative model
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
US11734918B2 (en) Object identification apparatus, moving body system, object identification method, object identification model learning method, and object identification model learning apparatus
US11556745B2 (en) System and method for ordered representation and feature extraction for point clouds obtained by detection and ranging sensor
Yao et al. Estimating drivable collision-free space from monocular video
CN110325818A (en) Via the joint 3D object detection and orientation estimation of multimodality fusion
KR20210072048A (en) Systems and methods for training machine models with augmented data
Bu et al. Pedestrian planar LiDAR pose (PPLP) network for oriented pedestrian detection based on planar LiDAR and monocular images
CN115049700A (en) Target detection method and device
EP3857510A1 (en) Method and system for deep neural networks using dynamically selected feature-relevant points from point cloud
CN115605918A (en) Spatio-temporal embedding
CN114155414A (en) Novel unmanned-driving-oriented feature layer data fusion method and system and target detection method
CN112906450A (en) Vehicle, system and method for determining an entry of an occupancy map in the vicinity of the vehicle
CN115761552B (en) Target detection method, device and medium for unmanned aerial vehicle carrying platform
CN114648639B (en) Target vehicle detection method, system and device
CN115240168A (en) Perception result obtaining method and device, computer equipment and storage medium
US20230105331A1 (en) Methods and systems for semantic scene completion for sparse 3d data
Nair et al. Modified YOLOv4 for real-time coconut trees detection from an unmanned aerial vehicle
Zhang et al. A Self-Supervised Monocular Depth Estimation Approach Based on UAV Aerial Images
CN114723955B (en) Image processing method, apparatus, device and computer readable storage medium
Zhao Empowering Computer Vision Models with Deep Learning for Robotic Perception and Drone Geolocation
Dudek et al. Cloud Detection System for UAV Sense and Avoid: Challenges and Findings in Flight Experiments
US20230316742A1 (en) Image processing method, apparatus and device, and computer-readable storage medium
CN116778262B (en) Three-dimensional target detection method and system based on virtual point cloud
Hildebrand Uncertainty measurement as a sensor performance metric in adverse conditions

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination