CN116636204A - Mixed tree coding for inter and intra prediction for geometric coding - Google Patents

Mixed tree coding for inter and intra prediction for geometric coding

Info

Publication number
CN116636204A
Authority
CN
China
Prior art keywords
points
prediction
point cloud
octree
processors
Prior art date
Legal status
Pending
Application number
CN202180086668.0A
Other languages
Chinese (zh)
Inventor
B. Ray
A. K. Ramasubramonian
L. Pham Van
G. Van der Auwera
M. Karczewicz
Current Assignee
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date
Filing date
Publication date
Priority claimed from US 17/562,398 (US20220210480A1)
Application filed by Qualcomm Inc
Priority claimed from PCT/US2021/065350 (WO2022147015A1)
Publication of CN116636204A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A device for decoding a bitstream that includes point cloud data is configured to: determine an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode a position of each of the one or more points in the leaf node, wherein to directly decode the position of each of the one or more points in the leaf node, one or more processors of the device are further configured to: generate a prediction of the one or more points; and determine the one or more points based on the prediction.

Description

Mixed tree coding for inter and intra prediction for geometric coding
The present application claims priority from U.S. patent application Ser. No. 17/562,398, filed December 27, 2021, and U.S. provisional patent application Ser. No. 63/131,546, filed December 29, 2020, each of which is incorporated herein by reference in its entirety. U.S. patent application Ser. No. 17/562,398, filed December 27, 2021, claims the benefit of U.S. provisional patent application Ser. No. 63/131,546, filed December 29, 2020.
Technical Field
The present disclosure relates to point cloud encoding and decoding.
Disclosure of Invention
The present disclosure generally describes a hybrid tree coding method that combines octree coding and predictive coding for enhanced inter/intra prediction at the block level of point cloud compression.
In one example, the present disclosure describes a method of encoding a point cloud, the method comprising: determining an octree that defines an octree-based partition of a space containing the point cloud, wherein: a leaf node of the octree contains one or more points of the point cloud, and a position of each of the one or more points in the leaf node is directly signaled; generating a prediction of the one or more points using intra prediction or inter prediction; and coding a syntax element indicating whether the one or more points are predicted using intra prediction or inter prediction.
According to one example of the present disclosure, a device for decoding a bitstream including point cloud data includes: a memory configured to store the point cloud data; and one or more processors coupled to the memory and implemented in circuitry, the one or more processors configured to: determine an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode a position of each of the one or more points in the leaf node, wherein to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to: generate a prediction of the one or more points, and determine the one or more points based on the prediction.
According to another example of the present disclosure, a method of decoding a point cloud includes: determining an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decoding a position of each of the one or more points in the leaf node, wherein directly decoding the position of each of the one or more points in the leaf node comprises: generating a prediction of the one or more points; and determining the one or more points based on the prediction.
According to another example of the present disclosure, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to: determine an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode a position of each of the one or more points in the leaf node, wherein to directly decode the position of each of the one or more points in the leaf node, the instructions cause the one or more processors to: generate a prediction of the one or more points; and determine the one or more points based on the prediction.
According to another example of the present disclosure, a device includes: means for determining an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and means for directly decoding a position of each of the one or more points in the leaf node, wherein the means for directly decoding the position of each of the one or more points in the leaf node comprises: means for generating a prediction of the one or more points; and means for determining the one or more points based on the prediction.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example encoding and decoding system that may perform the techniques of this disclosure.
FIG. 2 is a block diagram illustrating an example geometric point cloud compression (G-PCC) encoder.
Fig. 3 is a block diagram illustrating an example G-PCC decoder.
Fig. 4 is a conceptual diagram illustrating an example octree partitioning for geometric coding.
Fig. 5 is a conceptual diagram illustrating an example of a prediction tree.
Fig. 6 is a conceptual diagram illustrating an example rotated Lidar acquisition model.
Fig. 7 is a conceptual diagram illustrating an example motion estimation flow diagram of inters.
Fig. 8 is a conceptual diagram illustrating an example algorithm for estimating global motion.
Fig. 9 is a conceptual diagram illustrating an example algorithm for estimating local node motion vectors.
Fig. 10 is a conceptual diagram illustrating an example of high-level octree partitioning.
Fig. 11 is a conceptual diagram illustrating an example of local prediction tree generation.
Fig. 12 is a conceptual diagram showing an example current point set (O0 to O12) and reference point set (R0 to R12) (where N = M = 13).
Fig. 13 is a conceptual diagram illustrating an example current point set and motion-compensated reference point set (where N = M = 13).
Fig. 14 is a conceptual diagram illustrating an example range-finding system that may be used with one or more techniques of the present disclosure.
Fig. 15 is a conceptual diagram illustrating an example scenario based on a vehicle in which one or more techniques of the present disclosure may be used.
Fig. 16 is a conceptual diagram illustrating an example augmented reality system in which one or more techniques of the present disclosure may be used.
Fig. 17 is a conceptual diagram illustrating an example mobile device system in which one or more techniques of this disclosure may be used.
Fig. 18 is a flowchart illustrating example operations for decoding a bitstream including point cloud data.
Detailed Description
A point cloud is a collection of points in three-dimensional (3D) space. These points may correspond to points on objects within the three-dimensional space. Thus, a point cloud may be used to represent the physical content of a three-dimensional space. Point clouds may be useful in a variety of situations. For example, a point cloud may be used in the context of an autonomous vehicle to represent the positions of objects on a roadway. In another example, a point cloud may be used to represent the physical content of an environment in order to position virtual objects in an Augmented Reality (AR) or Mixed Reality (MR) application. Point cloud compression is the process of encoding and decoding point clouds. Encoding a point cloud may reduce the amount of data required to store and transmit the point cloud.
There have been two main proposals for signaling the location of points in a point cloud: octree codec and predictive tree codec. As part of encoding the point cloud data using octree encoding, the G-PCC encoder may generate octrees. Each node of the octree corresponds to a cuboid space. Nodes of the octree may have zero or eight child nodes. In other examples, nodes may be partitioned into child nodes according to other tree structures. Child nodes of a parent node correspond to equal sized cuboids within the cuboid corresponding to the parent node. The locations of the various points of the point cloud relative to the origin of the node may be signaled. If a node does not contain any points of the point cloud, the node is said to be unoccupied. If the node is unoccupied, it may not be necessary to signal additional data about the node. Conversely, a node is said to be occupied if it contains one or more points of the point cloud.
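For intuition, the occupancy-driven traversal can be sketched in code. The following C++ fragment is only an illustrative sketch (the node layout and names are assumptions, not the G-PCC reference implementation); it visits an octree whose nodes carry an 8-bit occupancy mask and recurses only into occupied children.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical node layout: an 8-bit mask marks which of the eight child
// cuboids are occupied; leaf nodes carry point positions signaled relative
// to the node origin.
struct OctreeNode {
    uint8_t occupancy = 0;                    // bit i set => child i occupied
    std::vector<OctreeNode> children;         // only occupied children are stored
    std::vector<std::array<int, 3>> points;   // leaf-level points (node-relative)
};

// Visit only occupied children; unoccupied cuboids carry no further data.
template <typename Visitor>
void traverse(const OctreeNode& node, Visitor&& visit) {
    visit(node);
    std::size_t next = 0;
    for (int i = 0; i < 8; ++i) {
        if (node.occupancy & (1u << i)) {
            traverse(node.children[next++], visit);
        }
    }
}
```

A decoder following this structure would entropy-decode each occupancy mask before descending into the corresponding children.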
When encoding the point cloud data using the prediction tree codec, the G-PCC encoder determines a prediction mode for each point of the point cloud. The prediction mode for this point may be one of the following:
● No prediction/zero prediction (0)
● Delta prediction (p0)
● Linear prediction (2*p0 - p1)
● Parallelogram prediction (2*p0 + p1 - p2)
In the case where the prediction mode of the point is "no prediction/zero prediction," the point is considered the root point (i.e., root vertex), and the coordinates (e.g., x, y, z coordinates) of the point are signaled in the bitstream. In the case where the prediction mode of the point is "delta prediction," the G-PCC encoder determines the difference (i.e., delta) between the coordinates of the point and the coordinates of a parent point, such as the root point or other point. In the case where the prediction mode is "linear prediction", the G-PCC encoder uses linear prediction of the coordinates of two parent points to determine the predicted coordinates of the point. The G-PCC encoder signals the difference between the predicted coordinates and the actual coordinates of the point, which are determined using linear prediction. In the case where the prediction mode is "parallelogram prediction", the G-PCC encoder uses three parent points to determine the prediction coordinates. The G-PCC encoder then signals the difference (e.g., "primary residual") between the predicted and actual coordinates of the point. The predictive relationship between points essentially defines a tree of points.
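To make the four predictors concrete, here is a small C++ sketch of how a decoder could evaluate them using the formulas listed above; the types and function names are illustrative assumptions rather than G-PCC syntax.

```cpp
#include <array>
#include <cstdint>

using Point = std::array<int32_t, 3>;  // x, y, z coordinates

enum class PredMode { None = 0, Delta = 1, Linear = 2, Parallelogram = 3 };

// p0, p1, p2 are the parent, grandparent, and great-grandparent of the
// current point in the prediction tree, as described above.
Point predict(PredMode mode, const Point& p0, const Point& p1, const Point& p2) {
    Point pred{0, 0, 0};
    for (int c = 0; c < 3; ++c) {
        switch (mode) {
            case PredMode::None:          pred[c] = 0;                          break;
            case PredMode::Delta:         pred[c] = p0[c];                      break;
            case PredMode::Linear:        pred[c] = 2 * p0[c] - p1[c];          break;
            case PredMode::Parallelogram: pred[c] = 2 * p0[c] + p1[c] - p2[c];  break;
        }
    }
    return pred;
}

// The decoder recovers the point by adding the signaled primary residual.
Point reconstruct(const Point& pred, const Point& residual) {
    return {pred[0] + residual[0], pred[1] + residual[1], pred[2] + residual[2]};
}
```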
It has been observed experimentally that octree coding may be more suitable for dense point clouds than predictive tree coding. Point clouds obtained using 3D modeling are typically dense enough for octree coding to perform better. However, point clouds for automotive applications, e.g., acquired using LiDAR, tend to be sparser, and thus predictive tree coding may be more suitable for those applications.
In some examples, an angular mode may be used, in which the coordinates of points are represented in a spherical coordinate system. Because the conversion process between the spherical and Cartesian (e.g., x, y, z) coordinate systems is not perfect, information may be lost. Because the G-PCC encoder may perform the conversion process, the G-PCC encoder may signal a "secondary residual" for a point, which indicates the difference between the original Cartesian coordinates of the point and the Cartesian coordinates of the point that result from applying the conversion process to the spherical coordinates of the point.
The present disclosure relates to a hybrid codec model in which both octree coding and direct coding are used to code the point cloud. For example, octree coding may initially be used to divide the space into nodes down to a particular level. Nodes at that particular level (as well as other occupied nodes of the octree that are not further partitioned) may be referred to as "leaf nodes." Points within the leaf node volume may be coded using a "direct" codec mode.
When encoding the points of the leaf nodes in a "direct" codec mode, the G-PCC encoder may select an intra-prediction mode for the leaf nodes or an inter-prediction mode for the leaf nodes. The G-PCC encoder may signal whether a point of a leaf node is encoded using intra-prediction mode or inter-prediction mode.
If the G-PCC encoder selects an intra prediction mode for a leaf node, the G-PCC encoder may encode points in the leaf node using a prediction tree codec in much the same manner as described above. That is, the G-PCC encoder may select from the four prediction modes and signal the coordinates of the points accordingly. However, instead of signaling the coordinates of the origin with respect to the entire space associated with the octree, the G-PCC encoder may signal the coordinates of the origin with respect to the leaf node. This may increase the codec efficiency, especially for the root node.
If the G-PCC encoder selects an inter prediction mode for a leaf node, the G-PCC encoder may encode a point in the leaf node relative to a set of points in a reference frame. The reference frame may be a previously decoded frame, similar to a previous frame of video. The G-PCC encoder may perform motion estimation to identify a set of points in the reference frame that have a similar spatial arrangement as the points in the leaf nodes. The motion vector of the leaf node indicates a displacement between the point of the leaf node and the set of points identified in the reference frame.
The G-PCC encoder may signal the parameter set of the leaf node. The parameters of the leaf node may include a reference index identifying the reference frame. The parameters of the leaf node may also include a value indicating the number of points in the leaf node.
The parameters of the leaf node may also include a residual value for each point in the leaf node. The residual value for a point in the leaf node indicates the difference between the actual coordinates of the point and the predicted coordinates of the point (determined by adding the motion vector of the leaf node to the point in the reference frame corresponding to the point in the leaf node). In examples using the angular mode, the G-PCC encoder may also signal the secondary residuals for these points.
In some examples, the parameters of the leaf node further include a Motion Vector Difference (MVD). MVD indicates the difference between the motion vector of the leaf node and the predicted motion vector. The predicted motion vector is a motion vector of a neighboring node of the octree. The parameters of the leaf node may include an index identifying neighboring nodes.
In other examples, similar to merge mode in conventional video coding, the parameters of a leaf node do not include an MVD, and the motion vector of the leaf node may be assumed to be the same as the motion vector of the identified neighboring node.
In some examples, signaling of the residual may be skipped. In some such examples using the angle mode, signaling of the primary residual may be skipped while the secondary residual is still signaled.
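Putting the leaf-level inter parameters together, a decode path could look roughly like the sketch below. All structure and field names are hypothetical, and the sketch assumes the simple case in which the matched reference points are already available, the MVD may be absent (the merge-like case), and the residuals may be skipped.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Vec3 = std::array<int32_t, 3>;

// Hypothetical container for the signaled leaf-node parameters.
struct InterLeafParams {
    int refFrameIdx;            // index identifying the reference frame
    int numPoints;              // number of points in the leaf node
    int neighborIdx;            // neighboring node supplying the MV predictor
    bool hasMvd;                // false in the merge-like case
    Vec3 mvd;                   // motion vector difference (if present)
    std::vector<Vec3> residual; // per-point primary residuals (may be skipped)
};

// refPoints: the matched point set in the reference frame identified by the
// leaf's motion search; predMv: MV of the identified neighboring node.
std::vector<Vec3> decodeInterLeaf(const InterLeafParams& p,
                                  const std::vector<Vec3>& refPoints,
                                  const Vec3& predMv) {
    Vec3 mv = predMv;
    if (p.hasMvd)
        for (int c = 0; c < 3; ++c) mv[c] += p.mvd[c];

    std::vector<Vec3> out(p.numPoints);
    for (int i = 0; i < p.numPoints; ++i) {
        for (int c = 0; c < 3; ++c) {
            int32_t pred = refPoints[i][c] + mv[c];   // motion-compensated prediction
            int32_t res  = p.residual.empty() ? 0 : p.residual[i][c];
            out[i][c] = pred + res;
        }
    }
    return out;
}
```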
Fig. 1 is a block diagram illustrating an example encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to encoding (encoding and/or decoding) point cloud data, i.e., supporting point cloud compression. In general, point cloud data includes any data used to process a point cloud. The codec may effectively compress and/or decompress the point cloud data.
As shown in fig. 1, the system 100 includes a source device 102 and a destination device 116. The source device 102 provides encoded point cloud data to be decoded by the destination device 116. Specifically, in the example of fig. 1, the source device 102 provides point cloud data to the destination device 116 via the computer readable medium 110. The source device 102 and the destination device 116 may comprise any of a variety of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, land or marine vehicles, spacecraft, aircraft, robots, LIDAR devices, satellites, and the like. In some cases, the source device 102 and the destination device 116 may be equipped for wireless communication.
In the example of fig. 1, source device 102 includes a data source 104, a memory 106, a G-PCC encoder 200, and an output interface 108. Destination device 116 includes an input interface 122, a G-PCC decoder 300, a memory 120, and a data consumer 118. In accordance with the present disclosure, the G-PCC encoder 200 of the source device 102 and the G-PCC decoder 300 of the destination device 116 may be configured to apply the techniques of the present disclosure related to a hybrid tree codec method that combines octree codecs and predictive codecs for enhanced inter/intra prediction at the block level of point cloud compression. Thus, the source device 102 represents an example of an encoding device, while the destination device 116 represents an example of a decoding device. In other examples, the source device 102 and the destination device 116 may include other components or arrangements. For example, the source device 102 may receive data (e.g., point cloud data) from an internal or external source. Likewise, the destination device 116 may interface with external data consumers rather than including the data consumers in the same device.
The system 100 as shown in fig. 1 is only one example. In general, other digital encoding and/or decoding devices may perform the techniques of the present disclosure related to a hybrid tree codec method that combines octree codec and predictive codec for enhanced inter/intra prediction at the block level of point cloud compression. The source device 102 and the destination device 116 are merely examples of devices in which the source device 102 generates decoded data for transmission to the destination device 116. The "codec" device of the present disclosure refers to a device that performs codec (encoding and/or decoding) on data. Thus, the G-PCC encoder 200 and the G-PCC decoder 300 represent examples of a codec device, specifically an encoder and a decoder, respectively. In some examples, the source device 102 and the destination device 116 may operate in a substantially symmetrical manner such that each of the source device 102 and the destination device 116 includes encoding and decoding components. Thus, the system 100 may support unidirectional or bidirectional transmission between the source device 102 and the destination device 116, for example, for streaming, playback, broadcasting, telephony, navigation, and other applications.
In general, the data source 104 represents a source of data (i.e., raw, unencoded point cloud data) and may provide a sequential series of "frames" of the data to the G-PCC encoder 200, which encodes the data for the frames. The data source 104 of the source device 102 may include a point cloud capture device, such as any of a variety of cameras or sensors, e.g., a 3D scanner or a light detection and ranging (LIDAR) device, one or more cameras, an archive containing previously captured data, and/or a data feed interface that receives data from a data content provider. Alternatively or additionally, the point cloud data may be computer-generated from a scanner, camera, sensor, or other data. For example, the data source 104 may generate computer graphics-based data as the source data, or produce a combination of live data, archived data, and computer-generated data. In each case, the G-PCC encoder 200 encodes the captured, pre-captured, or computer-generated data. The G-PCC encoder 200 may rearrange the frames from the order in which they were received (sometimes referred to as "display order") into a coding order for coding. The G-PCC encoder 200 may generate one or more bitstreams including the encoded data. The source device 102 may then output the encoded data via the output interface 108 onto the computer-readable medium 110 for reception and/or retrieval by, for example, the input interface 122 of the destination device 116.
The memory 106 of the source device 102 and the memory 120 of the destination device 116 may represent general purpose memory. In some examples, memory 106 and memory 120 may store raw data, e.g., raw data from data source 104 and raw decoded data from G-PCC decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions executable by, for example, G-PCC encoder 200 and G-PCC decoder 300, respectively. Although memory 106 and memory 120 are shown separately from G-PCC encoder 200 and G-PCC decoder 300 in this example, it should be understood that G-PCC encoder 200 and G-PCC decoder 300 may also include internal memories for functionally similar or equivalent purposes. Further, memory 106 and memory 120 may store encoded data, for example, data output from G-PCC encoder 200 and input to G-PCC decoder 300. In some examples, memory 106 and portions of memory 120 may be allocated as one or more buffers, e.g., for storing raw, decoded, and/or encoded data. For example, memory 106 and memory 120 may store data representing a point cloud.
Computer-readable medium 110 may represent any type of medium or device capable of transmitting encoded data from source device 102 to destination device 116. In one example, the computer-readable medium 110 represents a communication medium that enables the source device 102 to send encoded data directly to the destination device 116 in real-time, e.g., via a radio frequency network or a computer-based network. According to a communication standard, such as a wireless communication protocol, output interface 108 may modulate a transmission signal including encoded data, and input interface 122 may demodulate a received transmission signal. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include routers, switches, base stations, or any other equipment that facilitates communication from source device 102 to destination device 116.
In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media such as a hard disk, blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded data.
In some examples, source device 102 may output the encoded data to file server 114 or another intermediate storage device that may store the encoded data generated by source device 102. The destination device 116 may access the stored data from the file server 114 via streaming or download. The file server 114 may be any type of server device capable of storing encoded data and transmitting the encoded data to the destination device 116. File server 114 may represent a web server (e.g., for a website), a File Transfer Protocol (FTP) server, a content delivery network device, or a Network Attached Storage (NAS) device. The destination device 116 may access the encoded data from the file server 114 through any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both that is suitable for accessing the encoded data stored on the file server 114. The file server 114 and the input interface 122 may be configured to operate in accordance with a streaming transmission protocol, a download transmission protocol, or a combination thereof.
Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples in which output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples in which output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee™), a Bluetooth™ standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device that performs the functionality attributed to G-PCC encoder 200 and/or output interface 108, and destination device 116 may include an SoC device that performs the functionality attributed to G-PCC decoder 300 and/or input interface 122.
The techniques of this disclosure may be applied to encoding and decoding to support any of a variety of applications, such as communication between autonomous vehicles, communication between scanners, cameras, sensors, and processing devices (such as local or remote servers), geographic mapping, or other applications.
The input interface 122 of the destination device 116 receives the encoded bitstream from the computer readable medium 110 (e.g., communication medium, storage device 112, file server 114, etc.). The encoded bitstream may include signaling information defined by the G-PCC encoder 200, such as syntax elements having values describing characteristics and/or processing of the codec units (e.g., slices, pictures, groups of pictures, sequences, etc.), which is also used by the G-PCC decoder 300. The data consumer 118 uses the decoded data. For example, the data consumer 118 may use the decoded data to determine the location of the physical object. In some examples, the data consumer 118 may include a display that presents images based on the point cloud.
G-PCC encoder 200 and G-PCC decoder 300 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combination thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the G-PCC encoder 200 and the G-PCC decoder 300 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (codec) in the respective device. The devices that include G-PCC encoder 200 and/or G-PCC decoder 300 may include one or more integrated circuits, microprocessors, and/or other types of devices.
The G-PCC encoder 200 and the G-PCC decoder 300 may operate according to a codec standard, such as a video point cloud compression (V-PCC) standard or a geometric point cloud compression (G-PCC) standard. The present disclosure may generally relate to encoding and decoding (e.g., encoding and decoding) of pictures to include processes of encoding or decoding data. The encoded bitstream typically includes a series of values representing syntax elements of a codec decision (e.g., a codec mode).
The present disclosure may generally relate to "signaling" certain information, such as syntax elements. The term "signaling" may generally relate to the communication of values of syntax elements and/or other data used to decode the encoded data. That is, the G-PCC encoder 200 may signal the value of the syntax element in the bitstream. In general, signaling refers to generating values in a bitstream. As described above, the source device 102 may transmit the bitstream to the destination device 116 in substantially real-time or non-real-time, such as may occur when the syntax elements are stored to the storage device 112 for subsequent retrieval by the destination device 116.
The potential requirements for standardization of point cloud coding technology with compression capabilities that significantly exceed those of current approaches are being investigated by ISO/IEC MPEG (JTC 1/SC 29/WG 11). The group is working together on this exploratory activity in a collaborative effort known as the 3-Dimensional Graphics Team (3DG) to evaluate compression technology designs proposed by experts in this area.
Point cloud compression activities fall into two different approaches. The first approach is "video point cloud compression" (V-PCC), which segments the 3D object and projects the segments into multiple 2D planes (represented as "patches" in 2D frames) that are further coded by conventional 2D video codecs, such as a High Efficiency Video Coding (HEVC) (ITU-T H.265) codec. The second approach is "geometry-based point cloud compression" (G-PCC), which directly compresses the 3D geometry, i.e., the positions of a set of points in 3D space, and the associated attribute values (for each point associated with the 3D geometry). G-PCC addresses the compression of point clouds in both Category 1 (static point clouds) and Category 3 (dynamically acquired point clouds). A recent draft of the G-PCC standard is available in "Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression" (ISO/IEC JTC 1/SC29/WG 7 MDS19617, teleconference, October 2020), and a description of the codec is available in "G-PCC Codec Description" (ISO/IEC JTC 1/SC29/WG 7 MDS19620, teleconference, October 2020).
The point cloud contains a set of points in 3D space and may have attributes associated with the points. The attribute may be color information such as R, G, B or Y, cb, cr, or reflection information, or other attributes. The point cloud may be captured by various cameras or sensors (such as LIDAR sensors and 3D scanners) or may be generated by a computer. The point cloud data is used for a variety of applications including, but not limited to, construction (modeling), graphics (3D models for visualization and animation), and the automotive industry (LIDAR sensors for aiding navigation).
The 3D space occupied by the point cloud data may be enclosed by a virtual bounding box. The positions of the points in the bounding box may be represented with a certain precision; thus, the positions of one or more points may be quantized based on that precision. At the smallest level, the bounding box is split into voxels, which are the smallest units of space represented by a unit cube. A voxel in the bounding box may be associated with zero, one, or more than one point. The bounding box may be split into multiple cube/cuboid regions, which may be called tiles. Each tile may be coded into one or more slices. The partitioning of the bounding box into slices and tiles may be based on the number of points in each partition, or based on other considerations (e.g., a particular region may be coded as a tile). The slice regions may be further partitioned using splitting decisions similar to those in video codecs.
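As a simple illustration of the voxelization step described above, the sketch below quantizes a raw position to integer voxel coordinates inside the bounding box; the function and parameter names are assumptions for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Quantize a raw point position into integer voxel coordinates within the
// bounding box; voxelSize sets the precision (the smallest spatial unit).
std::array<int32_t, 3> toVoxel(const std::array<double, 3>& position,
                               const std::array<double, 3>& boxOrigin,
                               double voxelSize) {
    std::array<int32_t, 3> v{};
    for (int c = 0; c < 3; ++c)
        v[c] = static_cast<int32_t>(std::floor((position[c] - boxOrigin[c]) / voxelSize));
    return v;
}
```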
Fig. 2 provides an overview of G-PCC encoder 200. Fig. 3 provides an overview of a G-PCC decoder 300. The modules shown are logical and do not necessarily correspond one-to-one to the code implemented in the reference implementation of the G-PCC codec, i.e., TMC13 test model software studied by ISO/IEC MPEG (JTC 1/SC 29/WG 11).
In the G-PCC encoder 200 and the G-PCC decoder 300, the point cloud positions are coded first. Attribute coding depends on the decoded geometry. The surface approximation analysis unit 212 and the RAHT unit 218 of Fig. 2, and the surface approximation synthesis unit 310 and the RAHT unit 314 of Fig. 3, are options typically used for Category 1 data. The LOD generation unit 220 and lifting unit 222 of Fig. 2, and the LOD generation unit 316 and inverse lifting unit 318 of Fig. 3, are options typically used for Category 3 data. All the other modules are common between Categories 1 and 3.
For geometry, two different types of coding techniques exist: octree coding and predictive tree coding. In the following, the present disclosure focuses on octree coding. For Category 3 data, the compressed geometry is typically represented as an octree from the root all the way down to a leaf level of individual voxels. For Category 1 data, the compressed geometry is typically represented by a pruned octree (i.e., an octree from the root down to a leaf level of blocks larger than voxels) plus a model that approximates the surface within each leaf of the pruned octree. In this way, Category 1 and Category 3 data share the octree coding mechanism, while Category 1 data may additionally approximate the voxels within each leaf with a surface model. The surface model used is a triangulation comprising 1 to 10 triangles per block, resulting in a triangle soup. The Category 1 geometry codec is therefore known as the Trisoup geometry codec, while the Category 3 geometry codec is known as the Octree geometry codec.
Fig. 4 is a conceptual diagram illustrating an example octree partitioning for geometric coding. Octree 400 includes 8 child nodes. Some of these child nodes, such as node 402, have no child nodes. However, other child nodes, such as node 404, have child nodes, and some of the child nodes of node 404 also have child nodes, and so on.
At each node of the octree, an occupancy is signaled (when not inferred) for one or more of its child nodes (up to eight nodes). Multiple neighborhoods are specified, including (a) nodes that share a face with the current octree node, (b) nodes that share a face, edge, or vertex with the current octree node, and so on. Within each neighborhood, the occupancy of a node and/or its child nodes may be used to predict the occupancy of the current node or its child nodes. For points that are sparsely populated in certain nodes of the octree, the codec also supports a direct coding mode in which the 3D positions of the points are encoded directly. A flag may be signaled to indicate that direct mode is used. At the lowest level, the points associated with the octree nodes/leaf nodes may also be coded.
Once the geometry is encoded, the attributes corresponding to the geometry points are encoded. When there are a plurality of attribute points corresponding to one reconstructed/decoded geometric point, an attribute value representing the reconstructed point may be derived.
There are three attribute coding methods in G-PCC: Region Adaptive Hierarchical Transform (RAHT) coding, interpolation-based hierarchical nearest-neighbor prediction (Predicting Transform), and interpolation-based hierarchical nearest-neighbor prediction with an update/lifting step (Lifting Transform). RAHT and Lifting are typically used for Category 1 data, while Predicting is typically used for Category 3 data. However, either method may be used for any data, and, as with the geometry coding in G-PCC, the attribute coding method used to code the point cloud is specified in the bitstream.
The encoding and decoding of the attributes may be performed in level-of-detail (LOD), wherein with each level of detail a finer representation of the point cloud attributes may be obtained. Each level of detail may be specified based on a distance metric from neighboring nodes or based on a sampling distance.
At the G-PCC encoder 200, the residual obtained as an output of the codec method of the attribute is quantized. The quantized residual may be encoded using context-adaptive arithmetic coding.
In the example of fig. 2, the G-PCC encoder 200 may include a coordinate transformation unit 202, a color transformation unit 204, a voxelization unit 206, an attribute transmission unit 208, an octree analysis unit 210, a surface approximation analysis unit 212, an arithmetic coding unit 214, a geometry reconstruction unit 216, a RAHT unit 218, a LOD generation unit 220, a lifting unit 222, a coefficient quantization unit 224, and an arithmetic coding unit 226.
As shown in the example of fig. 2, G-PCC encoder 200 may receive a set of locations and a set of attributes. These locations may include coordinates of points in the point cloud. The attributes may include information about points in the point cloud, such as colors associated with the points in the point cloud.
The coordinate transformation unit 202 may apply a transformation to the point coordinates to transform the coordinates from an initial domain to a transformation domain. The present disclosure may refer to the transformed coordinates as transformed coordinates. The color transformation unit 204 may apply a transformation to transform the color information of the attribute to a different domain. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space.
Further, in the example of fig. 2, the voxelization unit 206 may voxelize the transformed coordinates. Voxelization of the transformed coordinates may include quantifying and removing some points of the point cloud. In other words, multiple points of a point cloud may be grouped (subsumed) in a single "voxel," which may be considered a point in some aspects hereafter. Further, the octree analysis unit 210 may generate octrees based on the voxelized transformation coordinates. Additionally, in the example of fig. 2, the surface approximation analysis unit 212 may analyze the points to potentially determine a surface representation of the set of points. The arithmetic coding unit 214 may entropy-encode syntax elements representing the information of the surface and/or octree determined by the surface approximation analysis unit 212. The G-PCC encoder 200 may output these syntax elements in a geometric bitstream.
The geometry reconstruction unit 216 may reconstruct transformed coordinates of points in the point cloud based on the octree, data indicative of the surface determined by the surface approximation analysis unit 212, and/or other information. The number of transformed coordinates reconstructed by the geometry reconstruction unit 216 may be different from the original points of the point cloud due to the voxelization and surface approximation. The present disclosure may refer to the resulting points as reconstruction points. The attribute transmission unit 208 may transmit the attribute of the original point of the point cloud to the reconstruction point of the point cloud.
In addition, the RAHT unit 218 may apply RAHT codec to the attributes of the reconstruction point. Alternatively or additionally, the LOD generation unit 220 and the lifting unit 222 may apply LOD processing and lifting, respectively, to the attributes of the reconstruction points. The RAHT unit 218 and the lifting unit 222 may generate coefficients based on the attributes. The coefficient quantization unit 224 may quantize the coefficients generated by the RAHT unit 218 or the lifting unit 222. The arithmetic coding unit 226 may apply arithmetic coding to syntax elements representing quantized coefficients. The G-PCC encoder 200 may output these syntax elements in the attribute bitstream.
In the example of fig. 3, the G-PCC decoder 300 may include a geometric arithmetic decoding unit 302, an attribute arithmetic decoding unit 304, an octree synthesis unit 306, an inverse quantization unit 308, a surface approximation synthesis unit 310, a geometric reconstruction unit 312, a RAHT unit 314, a LoD generation unit 316, an inverse lifting unit 318, an inverse transform coordinate unit 320, and an inverse transform color unit 322.
G-PCC decoder 300 may obtain a geometry bitstream and an attribute bitstream. The geometric arithmetic decoding unit 302 of the G-PCC decoder 300 may apply arithmetic decoding (e.g., context Adaptive Binary Arithmetic Coding (CABAC) or other types of arithmetic decoding) to syntax elements in the geometric bitstream. Similarly, the attribute arithmetic decoding unit 304 may apply arithmetic decoding to syntax elements in the attribute bitstream.
The octree synthesis unit 306 may synthesize octrees based on syntax elements parsed from the geometric bitstream. In the case of using surface approximations in the geometric bitstream, the surface approximation synthesis unit 310 may determine a surface model based on syntax elements parsed from the geometric bitstream and based on octree.
Further, the geometric reconstruction unit 312 may perform reconstruction to determine coordinates of points in the point cloud. The inverse transformation coordinate unit 320 may apply an inverse transformation to the reconstructed coordinates to convert the reconstructed coordinates (positions) of the points in the point cloud from the transformation domain back to the initial domain.
Additionally, in the example of fig. 3, the inverse quantization unit 308 may inverse quantize the attribute values. The attribute value may be based on syntax elements obtained from the attribute bitstream (e.g., including syntax elements decoded by the attribute arithmetic decoding unit 304).
Depending on how the attribute values are encoded, the RAHT unit 314 may perform RAHT codec to determine color values of points in the point cloud based on the inversely quantized attribute values. Alternatively, the LOD generation unit 316 and the inverse boost unit 318 may use a level of detail-based technique to determine color values of points in the point cloud.
Further, in the example of fig. 3, the inverse transform color unit 322 may apply an inverse color transform to the color values. The inverse color transform may be an inverse of the color transform applied by the color transform unit 204 of the G-PCC encoder 200. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space. Accordingly, the inverse color transform unit 322 may transform color information from the YCbCr color space to the RGB color space.
Fig. 2 and 3 illustrate various elements to aid in understanding the operations performed by G-PCC encoder 200 and G-PCC decoder 300. These units may be implemented as fixed function circuits, programmable circuits or a combination thereof. A fixed function circuit refers to a circuit that provides a specific function, and is preset on an operation that can be performed. Programmable circuitry refers to circuitry that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, the programmable circuit may execute software or firmware that causes the programmable circuit to operate in a manner defined by instructions of the software or firmware. Fixed function circuitry may execute software instructions (e.g., receive parameters or output parameters) but the type of operation that fixed function circuitry performs is typically not variable. In some examples, one or more of these units may be different circuit blocks (fixed function or programmable), and in some examples, one or more of these units may be an integrated circuit.
Prediction geometry codec is introduced as an alternative to octree geometry codec, wherein nodes are arranged in a tree structure (which defines a prediction structure), and various prediction strategies are used to predict the coordinates of each node in the tree with respect to its predictor.
Fig. 5 shows an example of a prediction tree 410, represented as a directed graph (directed graph), in which the arrows point in the prediction direction. Node 412 is the root vertex and has no predicted value. Nodes 414A and 414B have two child nodes. The dotted node has 3 child nodes. White filled nodes have one child node and nodes 418A through 418E are leaf nodes without child nodes. Each node has only one parent node.
Four prediction strategies are specified for each node based on its parent (p0), its grandparent (p1), and its great-grandparent (p2):
● No prediction/zero prediction (0)
● Delta prediction (p0)
● Linear prediction (2*p0 - p1)
● Parallelogram prediction (2*p0 + p1 - p2)
G-PCC encoder 200 may employ any algorithm to generate the prediction tree; the algorithm used may be determined based on the application/use case, and several strategies may be used. Some strategies are described in "G-PCC Codec Description" (ISO/IEC JTC 1/SC29/WG 7 MDS19620, teleconference, October 2020).
For each node, the residual coordinate values are encoded in the bitstream in a depth-first manner starting from the root node.
Predictive geometry codec is mainly applicable to class 3 (LIDAR acquired) point cloud data, e.g. for low latency applications.
The angular mode may be used in predictive geometry coding, where the characteristics of the LIDAR sensor may be used to code the prediction tree more efficiently. The position coordinates are converted to (r, φ, i) (radius, azimuth, and laser index), and prediction is performed in this domain (the residuals are coded in the r, φ, i domain). Due to rounding errors, coding in r, φ, i is not lossless, and hence a second set of residuals corresponding to the Cartesian coordinates is coded. A description of the encoding and decoding strategy for the angular mode of predictive geometry coding is given in "G-PCC Codec Description" (ISO/IEC JTC 1/SC29/WG 7 MDS19620, teleconference, October 2020).
This approach focuses on point clouds acquired using a rotating LiDAR model. Here, the LiDAR has N lasers (e.g., N = 16, 32, 64) rotating around the Z axis according to an azimuth angle φ (see Fig. 6). Each laser may have a different elevation angle θ(i) and height ς(i), for i = 1…N. Suppose that laser i hits a point M with Cartesian integer coordinates (x, y, z), defined according to the coordinate system depicted in Fig. 6.
The position of M is modeled with three parameters (r, φ, i), which are computed as follows:
● r = √(x² + y²)
● φ = atan2(y, x)
● i is the index of the laser that acquired the point
More precisely, the method uses quantized versions of (r, φ, i), denoted (r̃, φ̃, i), where the three integers r̃, φ̃, and i are computed from r, φ, and the laser index by quantization, where:
● (q_r, o_r) and (q_φ, o_φ) are quantization parameters controlling the precision of r̃ and φ̃, respectively.
● sign (t) is a function that returns 1 when t is a positive number, otherwise returns (-1).
● |t| is the absolute value of t.
To avoid reconstruction mismatches due to the use of floating-point operations, the values of ς(i), i = 1…N, and tan(θ(i)), i = 1…N, are pre-computed and quantized, where:
● (q_ς, o_ς) and (q_θ, o_θ) are quantization parameters controlling the precision of the quantized ς(i) and tan(θ(i)) values, respectively.
The reconstructed Cartesian coordinates (x̂, ŷ, ẑ) are obtained from the quantized (r̃, φ̃, i) representation, where app_cos(·) and app_sin(·) are approximations of cos(·) and sin(·). The computation may use a fixed-point representation, a lookup table, and linear interpolation.
Note that (x̂, ŷ, ẑ) may be different from (x, y, z) for various reasons:
- quantization
- approximations
- model imprecision
- model parameter imprecision
Let (r_x, r_y, r_z) be the reconstruction residuals, defined as follows:
- r_x = x − x̂
- r_y = y − ŷ
- r_z = z − ẑ
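The conversion, reconstruction, and residual computation described above can be summarized in a short sketch. The formulas below are assumptions based on the commonly described form of the angular mode (2D radius, atan2 azimuth, per-laser elevation/height model); the exact fixed-point operations, offsets, and sign conventions are defined by the standard text and may differ, and app_cos/app_sin are replaced here by the standard library functions.

```cpp
#include <cmath>

// Sign/floor quantization following the pattern described above:
// sign(t) * floor(|t| / q + o).
inline long long quantize(double t, double q, double o) {
    double a = std::floor(std::fabs(t) / q + o);
    return t >= 0 ? static_cast<long long>(a) : -static_cast<long long>(a);
}

struct Spherical { long long r, phi; int laser; };

// Forward conversion of a point (x, y, z) captured by laser `laser` into the
// quantized (r~, phi~, i) representation (assumed form; see text).
Spherical toSpherical(double x, double y, double z, int laser,
                      double qr, double orOff, double qphi, double ophi) {
    double r   = std::hypot(x, y);     // 2D radius in the xy-plane
    double phi = std::atan2(y, x);     // azimuth
    (void)z;                           // z is modeled through the laser index
    return { quantize(r, qr, orOff), quantize(phi, qphi, ophi), laser };
}

// Approximate inverse conversion; tanTheta and height are the per-laser model
// parameters (tan of the elevation angle, sensor height). The sign convention
// of the height term depends on the sensor model and is an assumption here.
void toCartesian(const Spherical& s, double qr, double qphi,
                 double tanTheta, double height,
                 double& xHat, double& yHat, double& zHat) {
    double r   = static_cast<double>(s.r) * qr;
    double phi = static_cast<double>(s.phi) * qphi;
    xHat = std::round(r * std::cos(phi));   // app_cos in the text
    yHat = std::round(r * std::sin(phi));   // app_sin in the text
    zHat = std::round(r * tanTheta - height);
}

// Reconstruction residuals r_x, r_y, r_z as defined above.
void residuals(double x, double y, double z,
               double xHat, double yHat, double zHat,
               double& rx, double& ry, double& rz) {
    rx = x - xHat; ry = y - yHat; rz = z - zHat;
}
```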
In this method, an encoder (e.g., G-PCC encoder 200) operates as follows:
● Encode the model parameters (the quantized tan(θ(i)) and ς(i) values) and the quantization parameters q_r, q_ς, q_θ, and q_φ.
● Apply the geometry prediction scheme described in "Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression" (ISO/IEC JTC 1/SC29/WG 7 MDS19617, teleconference, October 2020) to the representation (r̃, φ̃, i).
A new predictor leveraging the characteristics of the LIDAR may be introduced. For example, the rotation speed of the LIDAR scanner around the z-axis is typically constant. Therefore, the current azimuth φ̃(j) could be predicted as the previously coded azimuth φ̃(j−1) plus n(j) × δ_φ(k), where:
■ (δ_φ(k)), k = 1…K, is a set of potential speeds the encoder may choose from. The index k may be explicitly written to the bitstream, or may be inferred from the context based on deterministic rules applied by both the encoder and the decoder, and
■ n(j) is the number of skipped points, which may be explicitly written to the bitstream or may be inferred from the context based on deterministic rules applied by both the encoder and the decoder. n(j) may also be referred to as a "phi multiplier," and in some implementations may be used only with the delta predictor.
● Encode with each node the reconstruction residuals (r_x, r_y, r_z).
G-PCC decoder 300 operates as follows:
● Decode the model parameters (the quantized tan(θ(i)) and ς(i) values) and the quantization parameters q_r, q_ς, q_θ, and q_φ.
● Decode the (r̃, φ̃, i) parameters associated with the nodes according to the geometry prediction scheme described in "Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression" (ISO/IEC JTC 1/SC29/WG 7 MDS19617, teleconference, October 2020).
● Compute the reconstructed coordinates (x̂, ŷ, ẑ) as described above.
● Decode the residuals (r_x, r_y, r_z). As discussed below, lossy compression may be supported by quantizing the reconstruction residuals (r_x, r_y, r_z).
● Compute the original coordinates as follows: x = x̂ + r_x, y = ŷ + r_y, z = ẑ + r_z.
Lossy compression may be achieved by quantizing the reconstruction residuals (r_x, r_y, r_z) or by discarding points.
The quantized reconstruction residuals are computed as follows:
● r̃_x = sign(r_x) × floor(|r_x| / q_x + o_x)
● r̃_y = sign(r_y) × floor(|r_y| / q_y + o_y)
● r̃_z = sign(r_z) × floor(|r_z| / q_z + o_z)
where (q_x, o_x), (q_y, o_y), and (q_z, o_z) are quantization parameters controlling the precision of r̃_x, r̃_y, and r̃_z, respectively.
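A residual quantizer following the same sign/floor pattern might look like this; the inverse scaling shown for dequantization is an assumption, since only the forward quantization is described above.

```cpp
#include <cmath>

// Quantize one residual component with parameters (q, o):
// sign(r) * floor(|r| / q + o), as described in the text.
inline long long quantizeResidual(double r, double q, double o) {
    double a = std::floor(std::fabs(r) / q + o);
    return r >= 0 ? static_cast<long long>(a) : -static_cast<long long>(a);
}

// Assumed inverse: scale the quantized value back by q. A coarser q lowers
// the bit cost at the expense of distortion, which is how the sequence/
// frame/slice/block-level parameters trade quality against rate.
inline double dequantizeResidual(long long rq, double q) {
    return static_cast<double>(rq) * q;
}
```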
Trellis quantization may be used to further improve rate-distortion (RD) performance.
The quantization parameters may be changed at the sequence/frame/slice/block level to achieve region-adaptive quality and for rate-control purposes.
The G-PCC encoder 200 may be configured to perform motion estimation for inter prediction. The motion estimation (global and local) processes applied in the InterEM software are described below. InterEM is based on an octree-based coding extension for inter prediction. Although motion estimation is applied to octree-based frames, a similar process (or at least a portion thereof) may also be applied to predictive geometry coding.
Two types of motion are involved in the G-PCC InterEM software: a global motion matrix and local node motion vectors. The global motion parameters include a rotation matrix and a translation vector, which may be applied to all points in the predicted (reference) frame. The local node motion vector of a node of the octree is a motion vector that is applied only to points within that node in the predicted (reference) frame. Details of the motion estimation algorithm in InterEM are described below.
Fig. 7 shows a flowchart illustrating a motion estimation process. The inputs to the process are a predicted frame 420 and a current frame 422. G-PCC encoder 200 first estimates global motion at a global scale (424). After applying the estimated global motion to predicted frame 420 (426), G-PCC encoder 200 estimates local motion at a finer scale, the node level, in the octree (428). Finally, G-PCC encoder 200 applies motion compensation using the estimated local node motion and encodes the determined motion vectors and points (430).
Aspects of Fig. 7 are explained in more detail below. G-PCC encoder 200 may perform a process to estimate the global motion matrix and translation vector. In the InterEM software, the global motion matrix is estimated by matching feature points between the predicted (reference) frame and the current frame.
Fig. 8 shows an example of a global motion estimation process that may be performed by G-PCC encoder 200. In the example of fig. 8, G-PCC encoder 200 finds feature points (432), samples the feature points (434), and performs motion estimation using a Least Mean Square (LMS) algorithm (436).
In the algorithm shown in fig. 8, points having a large position change between the predicted frame and the current frame may be defined as feature points. For each point in the current frame, the nearest point in the predicted frame is found, and a point pair is established between the current frame and the predicted frame. If the distance between pairs of points is greater than a threshold, the pairs of points are considered feature points.
After the feature points are found, the feature points are sampled to reduce the scale of the problem (e.g., by selecting a subset of the feature points to reduce the complexity of motion estimation). The LMS algorithm is then applied to derive the motion parameters by attempting to reduce the error between the corresponding feature points in the predicted frame and the current frame.
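Applying the estimated global motion (a rotation matrix plus a translation vector) to every point of the predicted (reference) frame could be expressed as in the following sketch; the data layout is illustrative.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Apply p' = R * p + t to every point of the predicted (reference) frame,
// as done before local (per-node) motion estimation.
std::vector<Vec3> applyGlobalMotion(const std::vector<Vec3>& refPoints,
                                    const Mat3& rotation, const Vec3& translation) {
    std::vector<Vec3> out;
    out.reserve(refPoints.size());
    for (const Vec3& p : refPoints) {
        Vec3 q{};
        for (int i = 0; i < 3; ++i) {
            q[i] = rotation[i][0] * p[0] + rotation[i][1] * p[1] +
                   rotation[i][2] * p[2] + translation[i];
        }
        out.push_back(q);
    }
    return out;
}
```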
Fig. 9 illustrates an example process for estimating local node motion vectors. In the local node estimation algorithm shown in fig. 9, the motion vectors are estimated in a recursive manner. The cost function for selecting the most suitable motion vector may be based on the rate-distortion cost. In fig. 9, a path 440 shows a procedure of a current node that is not divided into 8 child nodes, and a path 442 shows a procedure of a current node that is divided into 8 child nodes.
If the current node is not partitioned into 8 children nodes (440), a motion vector is determined that may result in the lowest cost between the current node and the predicted node. If the current node is divided into 8 sub-nodes (442), a motion estimation algorithm is applied and the total cost under the division condition is obtained by adding the estimated cost value of each sub-node. Determining whether to divide by comparing costs of dividing with costs of non-dividing; each child node is assigned its corresponding motion vector (or may be further divided into its child nodes) if divided and the current node is assigned a motion vector if not divided.
Two parameters that affect motion vector estimation performance are block size (BlockSize) and minimum prediction unit size (MinPUSize). The BlockSize defines an upper limit of the node size to which motion vector estimation is applied, and the MinPUSize defines a lower limit.
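The recursive split-versus-no-split decision of Fig. 9 can be sketched as below. The cost model is a deliberately dummy placeholder for the rate-distortion costs mentioned above; only the control flow (compare the single-MV cost against the summed child costs, never splitting below MinPUSize) reflects the description.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using Vec3 = std::array<int32_t, 3>;

// Minimal stand-in for an octree node in the motion search: its points and
// its (up to eight) occupied children.
struct MotionNode {
    std::vector<Vec3> points;
    std::vector<MotionNode> children;
    int size = 0;                       // edge length of the node's cuboid
};

// Placeholder for the rate-distortion cost of coding the node with a single
// motion vector; a real encoder would search candidate MVs against the
// (motion-compensated) reference frame.
double singleMvCost(const MotionNode& node) {
    return 1.0 + static_cast<double>(node.points.size());   // dummy cost model
}

// Recursive decision mirroring Fig. 9: compare the cost of assigning one MV
// to the node against the summed cost of its children (plus split overhead),
// but never split below MinPUSize.
double estimateLocalMotion(const MotionNode& node, int minPuSize) {
    double noSplit = singleMvCost(node);
    if (node.size / 2 < minPuSize || node.children.empty())
        return noSplit;

    double split = 0.5;                          // dummy split-signaling overhead
    for (const MotionNode& child : node.children)
        split += estimateLocalMotion(child, minPuSize);

    return std::min(noSplit, split);
}
```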
The InterEM software essentially acts as an octree codec that performs occupancy prediction, and it uses the global/local motion information and the reference point cloud when performing occupancy prediction. Thus, the InterEM software does not perform direct motion compensation of points, which may include, for example, applying motion to points in the reference frame to project those points onto the current frame. The differences between the actual points and the predicted points may then be coded, which may make inter prediction more efficient.
One or more of the techniques disclosed herein may be applied independently or in combination. The present disclosure proposes techniques to perform direct motion compensation while still benefiting from flexible octree segmentation-based coding structures. In the following, these techniques are mainly shown in the context of octree partitioning, but can also be extended to OTQTBT (octree-quadtree-binary tree) partitioning scenarios.
The G-PCC encoder 200 and/or the G-PCC decoder 300 may be configured to perform high-level partitioning and to process mode flags. In one example of the present disclosure, G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to perform octree-based partitioning (for occupancy prediction) on the current point cloud. However, octree partitioning may be stopped at some level, and the points within each octree leaf volume are then directly coded (hereinafter referred to as "direct prediction") instead of coding occupancy. A leaf node size or an octree depth value may be signaled to specify the level at which octree partitioning stops and at which the points within the octree leaf volumes are directly coded.
For each such octree node where octree partitioning is stopped and "direct prediction" is activated, a flag may be signaled to indicate whether the set of points within the octree volume is intra-predicted or inter-predicted. The maximum and minimum sizes of octree leaves may be defined in the geometry parameter set.
Fig. 10 is a conceptual diagram illustrating high-level octree partitioning of octree 444. Fig. 10 shows an example of an octree leaf node for direct prediction containing 13 points (O0 to O12). As a special case, the root node of the octree (without any partitioning) may be coded using "direct prediction".
The G-PCC encoder 200 and/or the G-PCC decoder 300 may be configured to perform intra prediction. When the flag value is set to intra prediction, intra prediction is performed for all points within the volume. For this purpose, a "local prediction tree" is generated. The generation of such a tree is non-normative (the points may be traversed in different orders, such as azimuthal, Morton, radial, or other orders). For each point, its prediction mode (0, 1, 2, or 3), number-of-children information, the primary residual, and the secondary residual (if the angular mode is enabled) are signaled. In summary, intra prediction is thus functionally similar to predictive geometry coding.
Alternatively, a single prediction mode may be signaled for all points in the octree leaf volume, which may reduce the associated signaling cost. The radius value (if the angular mode is enabled) or the (x, y, z) value (if the angular mode is disabled) of the zero predictor may be set to, e.g., the top-left point within the octree leaf volume. Alternatively, the zero predictor may be signaled for the octree volume, or an index indicating a point within the octree volume to be used as the zero predictor may be signaled. Furthermore, clipping may be performed after the prediction/reconstruction if a value falls outside the octree leaf volume.
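The following sketch illustrates one possible decoder-side reconstruction of a leaf coded with such intra prediction. It is an assumption-laden simplification: only a zero predictor and a single parent-based predictor are shown (the actual modes 0 to 3 follow predictive geometry coding), and the clipping and traversal order are chosen arbitrarily for the example.

```python
def decode_leaf_intra(decoded_entries, leaf_origin, leaf_size):
    """Reconstruct the points of one leaf from a decoded local prediction tree.
    Each entry is (mode, parent_idx, residual); mode 0 uses the zero predictor,
    here taken to be the leaf's top-left corner."""
    recon = []
    for mode, parent_idx, residual in decoded_entries:
        if mode == 0 or not recon:
            pred = leaf_origin                       # zero predictor
        else:
            pred = recon[parent_idx]                 # predict from an ancestor
        point = [p + r for p, r in zip(pred, residual)]
        # clip back into the leaf volume if prediction + residual overshoots
        point = [min(max(v, o), o + s - 1)
                 for v, o, s in zip(point, leaf_origin, leaf_size)]
        recon.append(tuple(point))
    return recon
```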
The syntax table for intra prediction may be similar to the syntax for the prediction tree described in "Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression" (October 2020, teleconference, ISO/IEC JTC 1/SC 29/WG 7, MDS19617), which is incorporated herein by reference in its entirety.
The G-PCC encoder 200 and/or the G-PCC decoder 300 may be configured to perform inter prediction. Assume that there are N points inside the octree leaf: (O(0), ..., O(N-1)). For inter prediction, motion estimation is performed at the encoder side with the current set of points in the octree leaf volume, and a best match is found with a set of similar points in a reference point cloud frame (where the reference point cloud may be motion uncompensated or globally motion compensated). For inter prediction of an octree leaf, the following is signaled:
i. A reference index (if there are multiple reference point cloud frames to predict from).
ii. A motion vector difference (MVD), i.e., the difference between the actual MV and the predicted MV (as described below with respect to performing MV prediction from neighbors).
iii. The number of points (N) in the octree leaf.
iv. The primary residuals (and also the secondary residuals if the angular mode is enabled) of the N points (R'i), each a tuple giving the difference between 3D coordinates.
In the following, this disclosure describes the motion compensation procedure for a given signaled MV and reference index (if applicable) of an octree node.
a. The top-left point of the current octree leaf is at (X0, Y0, Z0), the dimensions are (Sx, Sy, Sz), and the motion vector is MV = (MVx, MVy, MVz). Thus, the top-left point of the corresponding reference block in the reference point cloud frame is at (Xr, Yr, Zr) = (X0 - MVx, Y0 - MVy, Z0 - MVz), and the reference block has a size of (Sx, Sy, Sz).
b. All points within the reference block are fetched and arranged in a 1D array; the ordering may be predetermined/fixed or signaled (e.g., per octree leaf). Say there are M such points with coordinates (in the reference frame) (R0, ..., R(M-1)), where Ri is a triplet providing the 3D coordinates of the i-th point, for i = 0, ..., (M-1), as shown in fig. 12.
c. All points are motion compensated by applying the signaled MV, and the resulting positions are used as the predicted geometry positions (Pi), i.e., Pi = Ri + MV, as shown in fig. 13.
d. If the angular mode is enabled, the corresponding points are derived for all M points.
In fig. 12, the current point set is marked as O0 to O12, and the reference point set is marked as R0 to R12, where N = M = 13. In fig. 13, the current point set is marked as O0 to O12, and the motion-compensated reference point set is marked as P0 to P12, where N = M = 13.
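Steps a through c above can be summarized in the short sketch below. It assumes integer coordinates, a lexicographic ordering of the fetched points, and a simple list representation of the reference frame; none of these details are mandated by the text.

```python
def motion_compensate_leaf(ref_points, leaf_origin, leaf_size, mv):
    """Fetch the reference points inside the shifted block and apply the MV
    to form the predicted positions Pi = Ri + MV."""
    ref_origin = tuple(o - m for o, m in zip(leaf_origin, mv))   # (Xr, Yr, Zr)
    block = [p for p in ref_points
             if all(ro <= c < ro + s
                    for c, ro, s in zip(p, ref_origin, leaf_size))]
    block.sort()                                   # fixed (lexicographic) order
    return [tuple(c + m for c, m in zip(p, mv)) for p in block]  # Pi = Ri + MV
```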
Now, there are three possible scenarios:
i. N = M (the current octree node and the reference block have the same number of points).
ii. N > M.
iii. N < M.
A first scenario will now be described, in which N = M. The residuals (primary residuals and secondary residuals, if applicable) are added directly to the motion-compensated points to generate the reconstruction: reconstruction = Pi + R'i.
A second scenario will now be described, where N > M. The last value P(M-1) is used to extend the motion-compensated points in the 1D array (Pi), i.e., [P'0, ..., P'(M-1), P'(M), ..., P'(N-1)] = [P0, ..., P(M-1), P(M-1), ..., P(M-1)], and the residuals are then added directly to generate the reconstruction: reconstruction = P'i + R'i. Optionally, a zero predictor is used for the extension.
A third scenario will now be described, where N < M. The residuals are added directly to the first N motion-compensated points, i.e., [P0, ..., P(N-1)], to generate the reconstruction: reconstruction = Pi + R'i.
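A compact sketch of the three reconstruction cases, written against the motion-compensated array produced above, is shown below. Padding with the last predictor (or with a zero predictor when the reference block is empty) reflects the assumptions described for the second scenario.

```python
def reconstruct_leaf_inter(pred_points, residuals):
    """Combine the M motion-compensated predictors with the N decoded residuals
    according to the three cases N == M, N > M, and N < M."""
    n, m = len(residuals), len(pred_points)
    if n > m:                                       # extend with the last value
        filler = pred_points[-1] if m else (0, 0, 0)   # zero-predictor fallback
        pred_points = pred_points + [filler] * (n - m)
    # if n < m, only the first N predictors are used; if n == m, all are used
    return [tuple(p + r for p, r in zip(pred, res))
            for pred, res in zip(pred_points[:n], residuals)]
```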
The G-PCC encoder 200 and/or the G-PCC decoder 300 may be configured to perform MV prediction from neighbors. The MV of the current octree leaf may be predicted from the MVs of spatio-temporal neighboring inter-coded octree leaves, and the corresponding MV difference may be signaled. When there are multiple spatio-temporal candidates, an MV prediction index may be signaled. In addition to the spatio-temporal neighboring candidates, previously used MV candidates may be added to an MV candidate list based on recent history.
MV information may also be merged from the spatio-temporal neighbors by signaling a "merge flag". When there are multiple spatio-temporal candidates, a merge index may be signaled.
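A minimal sketch of such a candidate list is shown below; the candidate ordering, the list size, and the pruning of duplicates are illustrative assumptions, not behavior specified in the text.

```python
def build_mv_candidate_list(spatial_mvs, temporal_mvs, history_mvs,
                            max_candidates=4):
    """Assemble an MV predictor list from spatio-temporal neighbor MVs plus
    recently used MVs, skipping unavailable entries and duplicates."""
    candidates = []
    for mv in list(spatial_mvs) + list(temporal_mvs) + list(history_mvs):
        if mv is not None and mv not in candidates:
            candidates.append(mv)
        if len(candidates) == max_candidates:
            break
    return candidates

# Decoder-side use: mv = candidates[mvp_idx] + mvd (added component-wise), or,
# when a merge flag is set, mv = candidates[merge_idx] with no MVD signaled.
```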
G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to skip the primary residual.
For good inter prediction, the primary residual, which is applicable when the angular mode is enabled, is typically small and may even be close to zero. In such a case, the primary residual may be skipped entirely for all points in the octree leaf volume. Thus, a primary_residual_skip_flag may be signaled for the octree leaf volume. In this case, the difference between the original points and the predicted points is fully encoded in the secondary residuals.
Alternatively, the primary_residual_skip_flag may be signaled for an octree layer that is higher (larger in volume) than the octree leaves and applied to one or more octree leaves associated with that octree layer.
The following table is a syntax table for inter-predicted octree leaves.
[Syntax table reproduced as an image in the original publication.]
Examples in various aspects of the disclosure may be used alone or in any combination.
Fig. 14 is a conceptual diagram illustrating an example ranging system 600 that may be used with one or more techniques of this disclosure. In the example of fig. 14, ranging system 600 includes an illuminator 602 and a sensor 604. The illuminator 602 may emit light 606. In some examples, illuminator 602 can emit light 606 as one or more laser beams. The light 606 may be one or more wavelengths, such as infrared wavelengths or visible wavelengths. In other examples, the light 606 is not a coherent laser. When the light 606 encounters an object, such as object 608, the light 606 generates return light 610. The return light 610 may include back-scattered light and/or reflected light. The return light 610 may pass through a lens 611, the lens 611 directing the return light 610 to create an image 612 of the object 608 on the sensor 604. The sensor 604 generates a signal 618 based on the image 612. The image 612 may include a set of points (e.g., represented by points in the image 612 of fig. 14).
In some examples, the illuminator 602 and the sensor 604 may be mounted on a rotating structure such that the illuminator 602 and the sensor 604 capture a 360 degree view of the environment. In other examples, ranging system 600 may include one or more optical components (e.g., mirrors, collimators, diffraction gratings, etc.) that enable illuminator 602 and sensor 604 to detect objects within a particular range (e.g., up to 360 degrees). Although the example of fig. 14 shows only a single illuminator 602 and sensor 604, ranging system 600 may include multiple illuminators and sensor sets.
In some examples, illuminator 602 generates a structured light pattern. In such an example, ranging system 600 may include a plurality of sensors 604 on which respective images of the structured light pattern are formed. Ranging system 600 may use the differences between the images of the structured light pattern to determine a distance to object 608, from which the structured light pattern is backscattered. Structured-light-based ranging systems may have a high level of accuracy (e.g., sub-millimeter accuracy) when the object 608 is relatively close to the sensor 604 (e.g., 0.2 meters to 2 meters). Such high precision may be useful for facial recognition applications, such as unlocking mobile devices (e.g., cell phones, tablet computers, etc.), and for security applications.
In some examples, ranging system 600 is a time-of-flight (ToF) based system. In some examples where ranging system 600 is a ToF-based system, illuminator 602 generates pulses of light. In other words, the illuminator 602 can modulate the amplitude of the emitted light 606. In such an example, the sensor 604 detects the return light 610 from the light pulses 606 generated by the illuminator 602. Ranging system 600 may then determine a distance to object 608 from which light 606 is backscattered based on a delay between the time light 606 is emitted and detected and a known speed of light in air. In some examples, rather than (or in addition to) modulating the amplitude of the emitted light 606, the illuminator 602 can modulate the phase of the emitted light 606. In such an example, the sensor 604 may detect the phase of the return light 610 from the object 608 and determine a distance from a point on the object 608 using the speed of light and based on a time difference between a time the illuminator 602 generates the light 606 at a particular phase and a time the return light 610 is detected at the sensor 604 at the particular phase.
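As a concrete illustration of the pulse-based ToF computation, the round-trip delay is converted to a one-way distance as in the small sketch below (the in-air speed of light is approximated by the vacuum value).

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_distance_m(emit_time_s, detect_time_s):
    """The pulse travels to the object and back, so the one-way distance is
    half of the speed of light times the measured delay."""
    return 0.5 * SPEED_OF_LIGHT_M_PER_S * (detect_time_s - emit_time_s)
```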
In other examples, the point cloud may be generated without using the illuminator 602. For example, in some examples, sensor 604 of ranging system 600 may include two or more optical cameras. In such examples, ranging system 600 may use the optical cameras to capture stereoscopic images of an environment including object 608. Ranging system 600 (e.g., point cloud generator 620) may then calculate the disparities between corresponding locations in the stereoscopic images and use these disparities to determine the distances to the locations shown in the stereoscopic images. From these distances, point cloud generator 620 may generate a point cloud.
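For the stereo case, the standard pinhole relation converts disparity to depth; the sketch below assumes rectified cameras with focal length f (in pixels) and baseline B (in meters), details that are not specified in the text.

```python
def depth_from_disparity_m(disparity_px, focal_length_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d for a rectified camera pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px
```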
The sensor 604 may also detect other properties of the object 608, such as color and reflectance information. In the example of fig. 14, point cloud generator 620 may generate a point cloud based on signal 618 generated by sensor 604. Ranging system 600 and/or point cloud generator 620 may form part of data source 104 (fig. 1).
Fig. 15 is a conceptual diagram illustrating an example scenario based on a vehicle in which one or more techniques of the present disclosure may be used. In the example of fig. 15, the vehicle 700 includes a laser package (package) 702, such as a LIDAR system. Although not shown in the example of fig. 15, vehicle 700 may also include a data source and a G-PCC encoder, such as G-PCC encoder 200 (fig. 1). In the example of fig. 15, the laser suite 702 emits a laser beam 704, the laser beam 704 being reflected from a pedestrian 706 or other object on the roadway. The data source of the vehicle 700 may generate a point cloud based on the signals generated by the laser suite 702. The G-PCC encoder of vehicle 700 may encode the point cloud to generate bit stream 708. Bit stream 708 may include significantly fewer bits than the uncoded point cloud obtained by the G-PCC encoder. An output interface of the vehicle 700, such as the output interface 108 (fig. 1), may send the bit stream 708 to one or more other devices. Thus, the vehicle 700 is able to send the bit stream 708 to other devices faster than the unencoded point cloud data. Additionally, the bit stream 708 may require less data storage capacity.
In the example of fig. 15, a vehicle 700 may send a bitstream 708 to another vehicle 710. Vehicle 710 may include a G-PCC decoder, such as G-PCC decoder 300 (fig. 1). The G-PCC decoder of vehicle 710 may decode bit stream 708 to reconstruct the point cloud. The vehicle 710 may use the reconstructed point cloud for various purposes. For example, the vehicle 710 may determine that the pedestrian 706 is on a road in front of the vehicle 700 based on the reconstructed point cloud and thus begin decelerating, e.g., even before the driver of the vehicle 710 realizes that the pedestrian 706 is on the road. Thus, in some examples, the vehicle 710 may perform autonomous navigational operations, generate a notification or alert, or perform another action based on the reconstructed point cloud.
Additionally or alternatively, the vehicle 700 may send the bitstream 708 to the server system 712. Server system 712 may use bitstream 708 for a variety of purposes. For example, server system 712 may store bitstream 708 for subsequent reconstruction of the point cloud. In this example, server system 712 may train an autopilot system using the point cloud and other data (e.g., vehicle telemetry data generated by vehicle 700). In other examples, server system 712 may store bitstream 708 for subsequent reconstruction for use in a collision investigation (e.g., if vehicle 700 collides with pedestrian 706), or may send navigation notifications or instructions to vehicle 700 or vehicle 710.
Fig. 16 is a conceptual diagram illustrating an example extended reality system in which one or more techniques of the present disclosure may be used. Extended reality (XR) is a term used to encompass a range of technologies, including Augmented Reality (AR), Mixed Reality (MR), and Virtual Reality (VR). In the example of fig. 16, the user 800 is located at a first location 802. The user 800 wears an XR headset 804. Instead of XR headset 804, user 800 may use a mobile device (e.g., a mobile phone, tablet, etc.). The XR headset 804 includes a depth detection sensor, such as a LIDAR system, that detects the position of points on the object 806 at the first location 802. The data source of the XR headset 804 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 806 at the first location 802. XR headset 804 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode the point cloud to generate bitstream 808.
XR headset 804 may send bit stream 808 (e.g., via a network such as the internet) to XR headset 810 worn by user 812 at second location 814. XR headset 810 may decode bitstream 808 to reconstruct the point cloud. XR headset 810 may use the point cloud to generate an XR visualization (e.g., AR, MR, VR visualization) representative of object 806 at first location 802. Thus, in some examples, such as when XR headset 810 generates a VR visualization, user 812 at location 814 may have a 3D immersive experience of first location 802. In some examples, XR headset 810 may determine a location of the virtual object based on the reconstructed point cloud. For example, XR headset 810 may determine that the environment (e.g., first location 802) includes a flat surface based on the reconstructed point cloud, and then determine that a virtual object (e.g., cartoon character) is to be positioned on the flat surface. XR headset 810 may generate an XR visualization in which the virtual object is in the determined location. For example, XR headset 810 may display a cartoon character sitting on a flat surface.
Fig. 17 is a conceptual diagram illustrating an example mobile device system in which one or more techniques of this disclosure may be used. In the example of fig. 17, a mobile device 900, such as a mobile phone or tablet, includes a depth detection sensor (such as a LIDAR system) that detects the location of a point on an object 902 in the environment of the mobile device 900. The data source of the mobile device 900 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 902. Mobile device 900 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode a point cloud to generate a bitstream 904. In the example of fig. 17, mobile device 900 can send a bitstream to a remote device 906 (such as a server system or other mobile device). The remote device 906 may decode the bitstream 904 to reconstruct the point cloud. Remote device 906 may use the point cloud for various purposes. For example, the remote device 906 may use the point cloud to generate an environment map of the mobile device 900. For example, the remote device 906 may generate a map of the interior of the building based on the reconstructed point cloud. In another example, the remote device 906 may generate an image (e.g., a computer graphic) based on the point cloud. For example, the remote device 906 may use points of the point cloud as vertices of the polygon and color attributes of the points as a basis for coloring the polygon. In some examples, the remote device 906 may perform facial recognition using a point cloud.
Fig. 18 is a flowchart illustrating example operations for decoding a bitstream including point cloud data. As part of the decoding point cloud, G-PCC decoder 300 may perform the operations of fig. 18. In the example of fig. 18, G-PCC decoder 300 determines an octree defining an octree-based partition of a space containing point clouds (1000). Leaf nodes of the octree contain one or more points of the point cloud.
G-PCC decoder 300 directly decodes the location of each of the one or more points in the leaf node (1002). To directly decode the location of each of the one or more points in the leaf node, G-PCC decoder 300 generates a prediction of the one or more points (1004) and determines the one or more points based on the prediction (1006). To directly decode the location of each of the one or more points in a leaf node, G-PCC decoder 300 may be configured to: receive a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra-prediction and a second value of the flag indicates that the prediction of the one or more points is generated by inter-prediction; and decode the one or more points using intra-prediction or inter-prediction based on the value of the flag.
G-PCC decoder 300 may be configured to receive, in a bitstream of the point cloud, an octree leaf volume specifying the volume of the leaf nodes. For example, assume that the complete point cloud is encapsulated in a W × W × W cuboid. The point cloud may be recursively partitioned, and for a given partition depth d, the octree leaf volume is W/2^d × W/2^d × W/2^d. At this level, a (binary) occupancy flag may be signaled; when the occupancy flag is equal to 1, it indicates that the cuboid contains at least one point. When the occupancy flag is 1, another intra/inter flag may then be signaled, indicating whether the points inside the cuboid are intra- or inter-predicted, respectively.
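The following sketch illustrates the corresponding parse order for one leaf cuboid. The bitstream reader and the 0/1 mapping of the intra/inter flag are assumptions made for the example.

```python
def leaf_side_at_depth(w, d):
    """A W x W x W bounding cuboid split d times gives leaves of side W / 2^d
    (assuming W is a power of two, as is typical for octree coding)."""
    return w >> d

def decode_leaf_header(reader):
    """Parse one leaf cuboid: an occupancy flag first, then an intra/inter
    flag only if the cuboid is occupied (mapping assumed: 0 intra, 1 inter)."""
    occupied = reader.read_flag()
    if not occupied:
        return {"occupied": False}
    use_inter = bool(reader.read_flag())
    return {"occupied": True, "inter": use_inter}
```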
To generate predictions of one or more points, G-PCC decoder 300 may be further configured to generate predictions of one or more points using intra-prediction, and to generate predictions of one or more points using intra-prediction, G-PCC decoder 300 may be further configured to determine a local prediction tree of one or more points.
To determine one or more points based on the prediction, G-PCC decoder 300 may be configured to receive at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in a bit stream of the point cloud. To generate predictions of one or more points, G-PCC decoder 300 may be configured to generate predictions of one or more points using inter-prediction, and to generate predictions of one or more points using inter-prediction, G-PCC decoder 300 may be further configured to perform motion estimation with the one or more points to determine a set of similar points in a reference point cloud frame.
To generate a prediction of one or more points, G-PCC decoder 300 may be further configured to generate a prediction of one or more points using inter-prediction, and to generate a prediction of one or more points using inter-prediction, G-PCC decoder 300 may be further configured to perform motion compensation based on the set of points in the reference point cloud frame to predict the one or more points. To perform motion compensation, G-PCC decoder 300 may also be configured to apply a motion vector to a set of points in a reference point cloud frame to determine a prediction of one or more points. G-PCC decoder 300 may be configured to predict a motion vector based on a motion vector of spatio-temporal neighboring inter-frame octree leaves.
G-PCC decoder 300 may also be configured to reconstruct a point cloud from the point cloud data. As part of reconstructing the point cloud, G-PCC decoder 300 may be further configured to determine a location of one or more points of the point cloud based on the planar location.
It should be appreciated that, according to an example, certain acts or events of any of the techniques described herein can be performed in a different order, may be added, combined, or omitted entirely (e.g., not all of the described acts or events are necessary for the practice of the technique). Further, in some examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.
In one or more examples, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. A computer-readable medium may include a computer-readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, and any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Further, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the terms "processor" and "processing circuitry" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Furthermore, in some aspects, the functionality described herein may be provided in dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be implemented entirely in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in various devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit in combination with appropriate software and/or firmware, or provided by a collection of interoperable hardware units, including one or more processors as described above.
The following numbered clauses illustrate one or more aspects of the devices and techniques described in this disclosure.
Clause 1A: a method of encoding a point cloud, the method comprising: determining an octree defining an octree-based segmentation of a space containing point clouds, wherein: the leaf nodes of the octree contain one or more points of the point cloud, and the location of each of the one or more points in the leaf node is directly signaled; generating predictions of one or more points using intra-prediction or inter-prediction; and encoding and decoding a syntax element indicating whether one or more points are predicted using intra-prediction or inter-prediction.
Clause 2A: the method of clause 1A, wherein the octree leaf volume specifying the volume of the leaf node is signaled in the bitstream.
Clause 3A: the method according to clause 1A or 2A, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using intra-prediction, and generating the prediction of the one or more points using intra-prediction includes determining a local prediction tree of the one or more points.
Clause 4A: the method of clause 3A, wherein, for each of the one or more points, at least one of the prediction mode, the primary residual, and the secondary residual is signaled.
Clause 5A: the method according to clause 1A or 2A, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using inter-prediction, and generating the prediction of the one or more points using inter-prediction includes performing motion estimation with the one or more points to determine a set of similar points in the reference point cloud frame.
Clause 6A: the method according to any of clauses 1A, 2A, or 5A, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using inter-prediction, and generating the prediction of the one or more points using inter-prediction includes performing motion compensation based on a set of points in the reference point cloud frame to predict the one or more points.
Clause 7A: the method of clause 6A, wherein performing motion compensation comprises applying a motion vector to a set of points in the reference point cloud frame to determine a prediction of one or more points.
Clause 8A: the method of clause 7A, further comprising predicting the motion vector based on the motion vectors of the spatio-temporal neighboring inter-frame octree leaves.
Clause 9A: the method of any one of clauses 1A to 8A, further comprising generating a point cloud.
Clause 10A: an apparatus for processing a point cloud, the apparatus comprising one or more components for performing the method according to any of clauses 1A-9A.
Clause 11A: the apparatus of clause 10A, wherein the one or more components comprise one or more processors implemented in circuitry.
Clause 12A: the apparatus of any of clauses 10A or 11A, further comprising a memory storing data representing a point cloud.
Clause 13A: the apparatus according to any one of clauses 10A to 12A, wherein the apparatus comprises a decoder.
Clause 14A: the apparatus according to any one of clauses 10A to 13A, wherein the apparatus comprises an encoder.
Clause 15A: the apparatus of any one of clauses 10A to 14A, further comprising means for generating a point cloud.
Clause 16A: the apparatus of any one of clauses 10A to 15A, further comprising a display that presents an image based on the point cloud.
Clause 17A: a computer-readable storage medium having stored therein instructions that, when executed, cause one or more processors to perform the method of any of clauses 1A-9A.
Clause 1B: an apparatus for decoding a bitstream comprising point cloud data, the apparatus comprising: the memory is used for storing the point cloud data; and one or more processors coupled to the memory and implemented in the circuitry, the one or more processors configured to: determining an octree defining an octree-based partition of a space containing point clouds, wherein leaf nodes of the octree contain one or more points of the point clouds; and directly decoding a location of each of the one or more points in the leaf node, wherein to directly decode the location of each of the one or more points in the leaf node, the one or more processors are further configured to: generating predictions of one or more points; and determining one or more points based on the prediction.
Clause 2B: the apparatus of clause 1B, wherein, to directly decode the location of each of the one or more points in the leaf node, the one or more processors are further configured to: receiving a flag, wherein a first value of the flag indicates that a prediction of one or more points is generated by intra-prediction and a second value of the flag indicates that a prediction of one or more points is generated by inter-prediction; and decoding one or more points using intra-prediction or inter-prediction based on the value of the flag.
Clause 3B: the apparatus of clause 1B, wherein the one or more processors are further configured to receive an octree leaf volume specifying a volume of leaf nodes in the bitstream including the point cloud.
Clause 4B: the apparatus according to clause 1B, wherein: the one or more processors are further configured to generate predictions of the one or more points using intra-prediction, and to generate predictions of the one or more points using intra-prediction, the one or more processors are further configured to determine a local prediction tree of the one or more points.
Clause 5B: the apparatus of clause 1B, wherein to determine the one or more points based on the prediction, the one or more processors are further configured to receive at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in a bitstream comprising the point cloud.
Clause 6B: the apparatus according to clause 1B, wherein: the one or more processors are further configured to generate predictions of the one or more points using the inter-prediction, and to perform motion estimation with the one or more points to determine a set of similar points in the reference point cloud frame.
Clause 7B: the apparatus according to clause 1B, wherein: the one or more processors are further configured to generate a prediction of the one or more points using the inter prediction, and to perform motion compensation based on the set of points in the reference point cloud frame to predict the one or more points.
Clause 8B: the apparatus of clause 7B, wherein to perform motion compensation, the one or more processors are further configured to apply the motion vector to a set of points in the reference point cloud frame to determine a prediction of the one or more points.
Clause 9B: the apparatus of clause 8B, wherein the one or more processors are further configured to predict the motion vector based on motion vectors of spatio-temporal neighboring inter-frame octree leaves.
Clause 10B: the apparatus of clause 1B, wherein the one or more processors are further configured to reconstruct a point cloud from the point cloud data.
Clause 11B: the apparatus of clause 10B, wherein the one or more processors are configured to determine, as part of reconstructing the point cloud, a location of one or more points of the point cloud based on the planar location.
Clause 11B: the apparatus of clause 10B, wherein the one or more processors are configured to determine, as part of reconstructing the point cloud, a location of one or more points of the point cloud based on the directly decoded location of each of the one or more points in the leaf node.
Clause 12B: the apparatus of clause 11B, wherein the one or more processors are further configured to generate a map of an interior of the building based on the point cloud.
Clause 13B: the apparatus of clause 11B, wherein the one or more processors are further configured to perform autonomous navigation operations based on the point cloud.
Clause 14B: the apparatus of clause 11B, wherein the one or more processors are further configured to generate the computer graphic based on Yu Dianyun.
Clause 15B: the apparatus of clause 11B, wherein the one or more processors are configured to: determining a location of the virtual object based on the reconstructed point cloud data; and generating an augmented reality (XR) visualization in which the virtual object is in the determined location.
Clause 16B: the apparatus of clause 11B, further comprising a display that presents an image based on the point cloud.
Clause 17B: the device of clause 1B, wherein the device is a mobile phone or tablet computer.
Clause 18B: the apparatus of clause 1B, wherein the apparatus is a vehicle.
Clause 19B: the device of clause 1B, wherein the device is an augmented reality device.
Clause 20B: a method of decoding a point cloud, the method comprising: determining an octree defining an octree-based partition of a space containing point clouds, wherein leaf nodes of the octree contain one or more points of the point clouds; directly decoding a location of each of the one or more points in the leaf node, wherein directly decoding the location of each of the one or more points in the leaf node comprises: generating predictions of one or more points; and determining one or more points based on the prediction.
Clause 21B: the method of clause 20B, wherein directly decoding the location of each of the one or more points in the leaf node further comprises: receiving a flag, wherein a first value of the flag indicates that a prediction of one or more points is generated by intra-prediction and a second value of the flag indicates that a prediction of one or more points is generated by inter-prediction; and decoding one or more points using intra-prediction or inter-prediction based on the value of the flag.
Clause 22B: the method of clause 20B, further comprising receiving in the bit stream of the point cloud an octree leaf volume specifying a volume of leaf nodes.
Clause 23B: the method according to clause 20B, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using intra-prediction, and generating the prediction of the one or more points using intra-prediction includes determining a local prediction tree of the one or more points.
Clause 24B: the method of clause 20B, wherein determining one or more points based on the prediction comprises receiving at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in a bit stream of the point cloud.
Clause 25B: the method according to clause 20B, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using inter-prediction, and generating the prediction of the one or more points using inter-prediction includes performing motion estimation with the one or more points to determine a set of similar points in the reference point cloud frame.
Clause 26B: the method according to clause 20B, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using inter-prediction, and generating the prediction of the one or more points using inter-prediction includes performing motion compensation based on a set of points in the reference point cloud frame to predict the one or more points.
Clause 27B: the method of clause 26B, wherein performing motion compensation comprises applying a motion vector to a set of points in the reference point cloud frame to determine a prediction of one or more points.
Clause 28B: the method of clause 27B, further comprising predicting the motion vector based on the motion vectors of the spatio-temporal neighboring inter-frame octree leaves.
Clause 29B: a computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to: determining an octree defining an octree-based partition of a space containing point clouds, wherein leaf nodes of the octree contain one or more points of the point clouds; and directly decoding a location of each of the one or more points in the leaf node, wherein to directly decode the location of each of the one or more points in the leaf node, the instructions cause the one or more processors to: generating predictions of one or more points; and determining one or more points based on the prediction.
Clause 30B: an apparatus, comprising: means for determining an octree defining an octree-based partition of a space containing a point cloud, wherein leaf nodes of the octree contain one or more points of the point cloud; means for directly decoding a location of each of the one or more points in the leaf node, wherein the means for directly decoding a location of each of the one or more points in the leaf node comprises: means for generating predictions of one or more points; and means for determining one or more points based on the prediction.
Clause 1C: an apparatus for decoding a bitstream comprising point cloud data, the apparatus comprising: the memory is used for storing the point cloud data; and one or more processors coupled to the memory and implemented in the circuitry, the one or more processors configured to: determining an octree defining an octree-based partition of a space containing point clouds, wherein leaf nodes of the octree contain one or more points of the point clouds; and directly decoding a location of each of the one or more points in the leaf node, wherein to directly decode the location of each of the one or more points in the leaf node, the one or more processors are further configured to: generating predictions of one or more points; and determining one or more points based on the prediction.
Clause 2C: the apparatus of clause 1C, wherein, to directly decode the location of each of the one or more points in the leaf node, the one or more processors are further configured to: receiving a flag, wherein a first value of the flag indicates that a prediction of one or more points is generated by intra-prediction and a second value of the flag indicates that a prediction of one or more points is generated by inter-prediction; and decoding one or more points using intra-prediction or inter-prediction based on the value of the flag.
Clause 3C: the apparatus of clause 1C or 2C, wherein the one or more processors are further configured to receive an octree leaf volume specifying a volume of leaf nodes in the bitstream including the point cloud.
Clause 4C: the apparatus according to any one of clauses 1C to 3C, wherein: the one or more processors are further configured to generate predictions of the one or more points using intra-prediction, and to generate predictions of the one or more points using intra-prediction, the one or more processors are further configured to determine a local prediction tree of the one or more points.
Clause 5C: the apparatus of any one of clauses 1C to 4C, wherein, to determine the one or more points based on the prediction, the one or more processors are further configured to receive at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in a bitstream comprising the point cloud.
Clause 6C: the apparatus according to any one of clauses 1C to 5C, wherein: the one or more processors are further configured to generate predictions of the one or more points using the inter-prediction, and to perform motion estimation with the one or more points to determine a set of similar points in the reference point cloud frame.
Clause 7C: the apparatus according to any one of clauses 1C to 5C, wherein: the one or more processors are further configured to generate a prediction of the one or more points using the inter prediction, and to perform motion compensation based on the set of points in the reference point cloud frame to predict the one or more points.
Clause 8C: the apparatus of clause 7C, wherein to perform motion compensation, the one or more processors are further configured to apply the motion vector to a set of points in the reference point cloud frame to determine a prediction of the one or more points.
Clause 9C: the apparatus of clause 8C, wherein the one or more processors are further configured to predict the motion vector based on motion vectors of spatio-temporal neighboring inter-frame octree leaves.
Clause 10C: the apparatus of any of clauses 1C to 9C, wherein the one or more processors are further configured to reconstruct a point cloud from the point cloud data.
Clause 11C: the apparatus of clause 10C, wherein the one or more processors are configured to determine, as part of reconstructing the point cloud, a location of one or more points of the point cloud based on the directly decoded location of each of the one or more points in the leaf node.
Clause 12C: the apparatus of clause 11C, wherein the one or more processors are further configured to generate a map of an interior of the building based on the point cloud.
Clause 13C: the apparatus of clause 11C, wherein the one or more processors are further configured to perform autonomous navigation operations based on the point cloud.
Clause 14C: the apparatus of clause 11C, wherein the one or more processors are further configured to generate the computer graphic based on Yu Dianyun.
Clause 15C: the apparatus of clause 11C, wherein the one or more processors are configured to: determining a location of the virtual object based on the reconstructed point cloud data; and generating an augmented reality (XR) visualization in which the virtual object is in the determined location.
Clause 16C: the apparatus of any of clauses 11C to 15, further comprising a display that presents an image based on the point cloud.
Clause 17C: the device of any of clauses 1C to 16C, wherein the device is a mobile phone or a tablet computer.
Clause 18C: the apparatus according to any one of clauses 1C to 16C, wherein the apparatus is a vehicle.
Clause 19C: the device of any of clauses 1C to 16C, wherein the device is an augmented reality device.
Clause 20C: a method of decoding a point cloud, the method comprising: determining an octree defining an octree-based partition of a space containing point clouds, wherein leaf nodes of the octree contain one or more points of the point clouds; directly decoding the location of each of the one or more points in the leaf node, wherein directly decoding the location of each of the one or more points in the leaf node comprises: generating predictions of one or more points; and determining one or more points based on the prediction.
Clause 21C: the method of clause 20C, wherein directly decoding the location of each of the one or more points in the leaf node further comprises: receiving a flag, wherein a first value of the flag indicates that a prediction of one or more points is generated by intra-prediction and a second value of the flag indicates that a prediction of one or more points is generated by inter-prediction; and decoding one or more points using intra-prediction or inter-prediction based on the value of the flag.
Clause 22C: the method of clause 20C or 21C, further comprising receiving in the bit stream of the point cloud an octree leaf volume specifying a volume of leaf nodes.
Clause 23C: the method according to any one of clauses 20C to 22C, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using intra-prediction, and generating the prediction of the one or more points using intra-prediction includes determining a local prediction tree of the one or more points.
Clause 24C: the method of any of clauses 20C to 23C, wherein determining one or more points based on the prediction comprises receiving at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in a bit stream of the point cloud.
Clause 25C: the method according to any one of clauses 20C to 24C, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using inter-prediction, and generating the prediction of the one or more points using inter-prediction includes performing motion estimation with the one or more points to determine a set of similar points in the reference point cloud frame.
Clause 26C: the method according to any of clauses 20C to 25C, wherein: generating the prediction of the one or more points includes generating the prediction of the one or more points using inter-prediction, and generating the prediction of the one or more points using inter-prediction includes performing motion compensation based on a set of points in the reference point cloud frame to predict the one or more points.
Clause 27C: the method of clause 26C, wherein performing motion compensation comprises applying a motion vector to a set of points in the reference point cloud frame to determine a prediction of one or more points.
Clause 28C: the method of clause 27C, further comprising predicting the motion vector based on the motion vectors of the spatio-temporal neighboring inter-frame octree leaves.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (30)

1. An apparatus for decoding a bitstream comprising point cloud data, the apparatus comprising:
a memory storing the point cloud data; and
one or more processors coupled to the memory and implemented in circuitry, the one or more processors configured to:
determining an octree defining an octree-based partition of a space containing a point cloud, wherein leaf nodes of the octree contain one or more points of the point cloud; and
directly decoding a location of each of the one or more points in the leaf node, wherein to directly decode the location of each of the one or more points in the leaf node, the one or more processors are further configured to:
generating predictions of the one or more points; and
the one or more points are determined based on the prediction.
2. The device of claim 1, wherein to directly decode the location of each of the one or more points in the leaf node, the one or more processors are further configured to:
Receiving a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and
based on the value of the flag, the one or more points are decoded using intra-prediction or inter-prediction.
3. The device of claim 1, wherein the one or more processors are further configured to receive, in the bitstream including the point cloud, an octree leaf volume specifying a volume of the leaf node.
4. The apparatus of claim 1, wherein:
in order to generate the predictions of the one or more points, the one or more processors are further configured to generate the predictions of the one or more points using intra-prediction, and
to generate the predictions for the one or more points using intra-prediction, the one or more processors are further configured to determine a local prediction tree for the one or more points.
5. The device of claim 1, wherein to determine the one or more points based on the prediction, the one or more processors are further configured to receive at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in the bitstream including the point cloud.
6. The apparatus of claim 1, wherein:
in order to generate the predictions of the one or more points, the one or more processors are further configured to generate the predictions of the one or more points using inter-prediction, and
to generate the prediction of the one or more points using inter-prediction, the one or more processors are further configured to perform motion estimation with the one or more points to determine a set of similar points in a reference point cloud frame.
7. The apparatus of claim 1, wherein:
in order to generate the predictions of the one or more points, the one or more processors are further configured to generate the predictions of the one or more points using inter-prediction, and
to generate the prediction of the one or more points using inter-prediction, the one or more processors are further configured to perform motion compensation based on a set of points in a reference point cloud frame to predict the one or more points.
8. The device of claim 7, wherein to perform motion compensation, the one or more processors are further configured to apply a motion vector to the set of points in the reference point cloud frame to determine a prediction of the one or more points.
9. The device of claim 8, wherein the one or more processors are further configured to predict the motion vector based on a motion vector of spatiotemporal neighboring inter-frame octree leaves.
10. The device of claim 1, wherein the one or more processors are further configured to reconstruct the point cloud from the point cloud data.
11. The device of claim 10, wherein the one or more processors are configured to determine, as part of reconstructing the point cloud, a location of one or more points of the point cloud based on the directly decoded location of each of the one or more points in the leaf node.
12. The device of claim 11, wherein the one or more processors are further configured to generate a map of an interior of a building based on the point cloud.
13. The device of claim 11, wherein the one or more processors are further configured to perform autonomous navigation operations based on the point cloud.
14. The device of claim 11, wherein the one or more processors are further configured to generate a computer graphic based on the point cloud.
15. The device of claim 11, wherein the one or more processors are configured to:
determining a position of a virtual object based on the point cloud; and
an extended reality (XR) visualization is generated in which the virtual object is in the determined location of the virtual object.
16. The apparatus of claim 11, further comprising a display to present an image based on the point cloud.
17. The device of claim 1, wherein the device is a mobile phone or tablet.
18. The apparatus of claim 1, wherein the apparatus is a vehicle.
19. The device of claim 1, wherein the device is an augmented reality device.
20. A method of decoding a point cloud, the method comprising:
determining an octree defining an octree-based partition of a space containing the point cloud, wherein leaf nodes of the octree contain one or more points of the point cloud;
directly decoding a location of each of the one or more points in the leaf node, wherein directly decoding the location of each of the one or more points in the leaf node comprises:
Generating predictions of the one or more points; and
the one or more points are determined based on the prediction.
21. The method of claim 20, wherein directly decoding the location of each of the one or more points in the leaf node further comprises:
receiving a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and
based on the value of the flag, the one or more points are decoded using intra-prediction or inter-prediction.
22. The method of claim 20, further comprising receiving an octree leaf volume specifying a volume of the leaf node in a bit stream of the point cloud.
23. The method according to claim 20, wherein:
generating the predictions of the one or more points includes generating the predictions of the one or more points using intra-prediction, and
generating the predictions of the one or more points using intra-prediction includes determining a local prediction tree for the one or more points.
24. The method of claim 20, wherein determining the one or more points based on the prediction comprises receiving at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points in a bitstream of the point cloud.
25. The method according to claim 20, wherein:
generating the predictions of the one or more points includes generating the predictions of the one or more points using inter-frame prediction, and
generating the prediction of the one or more points using inter-prediction includes performing motion estimation with the one or more points to determine a set of similar points in a reference point cloud frame.
26. The method of claim 20, wherein:
generating the predictions of the one or more points includes generating the predictions of the one or more points using inter-prediction, and
generating the prediction of the one or more points using inter-prediction includes performing motion compensation based on a set of points in a reference point cloud frame to predict the one or more points.
27. The method of claim 26, wherein performing motion compensation comprises applying motion vectors to the set of points in the reference point cloud frame to determine a prediction of the one or more points.
28. The method of claim 27, further comprising predicting the motion vectors based on motion vectors of spatio-temporally neighboring inter-coded octree leaves.
29. A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
determine an octree defining an octree-based partition of a space containing a point cloud, wherein leaf nodes of the octree contain one or more points of the point cloud; and
directly decode a location of each of the one or more points in the leaf node, wherein to directly decode the location of each of the one or more points in the leaf node, the instructions cause the one or more processors to:
generate predictions of the one or more points; and
determine the one or more points based on the prediction.
30. An apparatus, comprising:
means for determining an octree defining an octree-based partition of a space containing a point cloud, wherein leaf nodes of the octree contain one or more points of the point cloud;
means for directly decoding a location of each of the one or more points in the leaf node, wherein the means for directly decoding the location of each of the one or more points in the leaf node comprises:
means for generating predictions of the one or more points; and
means for determining the one or more points based on the prediction.
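
As an informal illustration of the decoding flow recited in claims 20, 21, and 24, the following Python sketch walks through one octree leaf whose point locations are directly coded: a flag selects intra-prediction or inter-prediction, a prediction is generated for each point, and a residual read from the bitstream is added to recover each location. This is not code from the patent specification; the Bitstream and LeafNode classes, the symbol layout, and the fallback of predicting each point from the previously decoded point (a one-branch stand-in for the local prediction tree of claim 23) are assumptions made only for the example.

```python
# Hypothetical sketch only; not code from the patent specification.
from dataclasses import dataclass

@dataclass
class Bitstream:
    """Toy bitstream: pre-parsed symbols consumed in order."""
    symbols: list
    pos: int = 0

    def read(self):
        value = self.symbols[self.pos]
        self.pos += 1
        return value

@dataclass
class LeafNode:
    origin: tuple    # minimum corner of the octree leaf volume
    num_points: int  # number of directly coded points in the leaf

def decode_leaf_points(bs, leaf, reference_points):
    """Directly decode the point locations of one octree leaf.

    A flag selects intra-prediction (predict from previously decoded
    points of the leaf) or inter-prediction (predict from
    motion-compensated reference points); a per-point residual is then
    added to each prediction.
    """
    use_inter = bs.read()  # assumed: 0 = intra-prediction, 1 = inter-prediction
    decoded = []
    for i in range(leaf.num_points):
        if use_inter:
            predictor = reference_points[i]   # motion-compensated reference point
        elif decoded:
            predictor = decoded[-1]           # previous point as the tree parent
        else:
            predictor = leaf.origin           # first point: predict from leaf origin
        residual = bs.read()                  # (dx, dy, dz) primary residual
        decoded.append(tuple(p + r for p, r in zip(predictor, residual)))
    return decoded

# Two intra-predicted points in a leaf whose volume starts at (8, 8, 8).
bs = Bitstream([0, (1, 2, 0), (0, 1, 3)])
leaf = LeafNode(origin=(8, 8, 8), num_points=2)
print(decode_leaf_points(bs, leaf, reference_points=[]))
# -> [(9, 10, 8), (9, 11, 11)]
```

A complete decoder would also handle the prediction mode and secondary residual of claim 24; they are omitted here for brevity.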
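
Claims 25 through 27 recite motion estimation and motion compensation against a reference point cloud frame. The sketch below, again an assumed illustration rather than the estimation procedure defined in the patent, matches a leaf's points to the nearest points of a reference frame, derives a single translational motion vector, and applies it to the matched reference points to form the inter prediction.

```python
# Hypothetical sketch only; not the motion search defined in the patent.
import numpy as np

def estimate_motion(leaf_points, reference_points):
    """Find a set of similar reference points and a translational motion
    vector from that set toward the current leaf (claim 25)."""
    leaf_centroid = leaf_points.mean(axis=0)
    dists = np.linalg.norm(reference_points - leaf_centroid, axis=1)
    matched = reference_points[np.argsort(dists)[: len(leaf_points)]]
    mv = np.round(leaf_centroid - matched.mean(axis=0)).astype(int)
    return matched, mv

def motion_compensate(matched_reference_points, mv):
    """Apply the motion vector to the matched reference points to form the
    prediction of the points in the current leaf (claims 26 and 27)."""
    return matched_reference_points + mv

reference = np.array([[10, 10, 10], [11, 10, 10], [10, 11, 10], [40, 40, 40]])
current_leaf = np.array([[12, 10, 10], [13, 10, 10], [12, 11, 10]])
matched, mv = estimate_motion(current_leaf, reference)
print(mv)                              # [2 0 0]
print(motion_compensate(matched, mv))  # prediction near the current leaf points
```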
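
Claim 28 additionally recites predicting the motion vector from the motion vectors of spatio-temporally neighboring inter-coded octree leaves. One plausible, but assumed, predictor is a component-wise median, so that only a motion-vector residual would need to be signaled:

```python
# Hypothetical sketch only; the patent does not fix a particular MV predictor here.
import numpy as np

def predict_motion_vector(neighbour_mvs):
    """Component-wise median of the motion vectors of already-decoded,
    spatio-temporally neighboring inter-coded octree leaves."""
    if not neighbour_mvs:
        return np.zeros(3, dtype=int)
    return np.round(np.median(np.stack(neighbour_mvs), axis=0)).astype(int)

neighbour_mvs = [np.array([2, 0, -1]), np.array([3, 1, -1]), np.array([2, 0, 0])]
mv_residual = np.array([0, 1, 0])  # assumed to be decoded from the bitstream
mv = predict_motion_vector(neighbour_mvs) + mv_residual
print(mv)  # [ 2  1 -1]
```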
CN202180086668.0A 2020-12-29 2021-12-28 Mixed tree coding for inter and intra prediction for geometric coding Pending CN116636204A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/131,546 2020-12-29
US17/562,398 2021-12-27
US17/562,398 US20220210480A1 (en) 2020-12-29 2021-12-27 Hybrid-tree coding for inter and intra prediction for geometry coding
PCT/US2021/065350 WO2022147015A1 (en) 2020-12-29 2021-12-28 Hybrid-tree coding for inter and intra prediction for geometry coding

Publications (1)

Publication Number Publication Date
CN116636204A (en) 2023-08-22

Family

ID=87590680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180086668.0A Pending CN116636204A (en) 2020-12-29 2021-12-28 Mixed tree coding for inter and intra prediction for geometric coding

Country Status (1)

Country Link
CN (1) CN116636204A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination