CN113678466A - Method and apparatus for predicting point cloud attribute encoding - Google Patents


Publication number: CN113678466A
Authority: CN (China)
Legal status: Pending
Application number: CN202080021859.4A
Other languages: Chinese (zh)
Inventors: D. Flynn, S. Lasserre
Assignee (current and original): BlackBerry Ltd
Application filed by BlackBerry Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/20 Contour coding, e.g. using detection of edges
    • G06T 9/40 Tree coding, e.g. quadtree, octree
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4084 Transform-based scaling, e.g. FFT domain scaling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/96 Tree coding, e.g. quad-tree coding

Abstract

Methods and apparatus for attribute encoding for point clouds. In a top-down encoding process, a predicted geometrically weighted sum of attributes is determined for each occupied sub-volume of a volume, based on a prediction operation that uses at least the geometrically weighted sum of attributes of the volume. The prediction operation involves upsampling data from the previous (parent) depth of the coding tree. The predicted geometrically weighted sums of attributes and the corresponding geometrically weighted sums of the original attributes are subtracted, in the attribute domain or in the transform domain, to produce residual AC coefficients, which are encoded in the bitstream. The transform used conforms to the DC coefficient property, i.e. the DC coefficient obtained by transforming the geometrically weighted sums of attributes of the set of child sub-volumes is the geometrically weighted sum of attributes of the volume.

Description

Method and apparatus for predicting point cloud attribute encoding
Technical Field
The present application relates generally to data compression and, in one particular example, to a method and apparatus for point cloud encoding. Methods and apparatus for encoding point cloud attributes using predictive coding are described.
Background
Data compression is used in communications and computer networking to efficiently store, transmit, and reproduce information. Three-dimensional representations of physical spaces are often stored as point clouds, where a point cloud is made up of a plurality of points each having a geometric location in a given space. Point clouds have a range of applications. In one example, they may be used for Virtual Reality (VR) and Augmented Reality (AR) applications. In another example, the point cloud may be used for computer vision applications, such as automated vehicles.
The point cloud data can be very large, especially when it is time-varying. For example, LiDAR scanning may produce large amounts of sparse point cloud data that must be processed, analyzed, or transmitted very quickly for real-time vehicle control applications. As another example, a sophisticated VR application may involve dense point cloud data with fast real-time user movement. Therefore, efficient compression of point cloud data becomes a challenge.
Much work has been done on mechanisms for efficiently encoding the geometry of point clouds. This typically involves recursively splitting the geometric space into smaller and smaller sub-units until each occupied sub-unit contains only one point. Octree-based encoding processes have been developed for efficiently encoding this location or geometry data.
In addition to encoding the geometry, one or more attributes of each point may also be encoded. For example, in the case of VR, color or intensity (brightness) information may be encoded for each point. In the case of LiDAR scanning, reflectivity information may be encoded for each point. The points may have alternative or additional attributes.
It would be advantageous to provide a method and apparatus for efficiently and effectively compressing attribute data of a point cloud.
Drawings
Reference will now be made, by way of example, to the accompanying drawings which illustrate example embodiments of the present application, and in which:
FIG. 1 shows an example of an octree-based point cloud geometry;
FIG. 2 illustrates the application of a 2-point transform to an example child sub-volume;
FIG. 3 illustrates the recursive application of a region-adaptive hierarchical transformation in three directions for an example child node;
FIG. 4 illustrates a flow diagram of one example method of encoding attribute data of a point cloud;
FIG. 5 illustrates a flow diagram of an example method of decoding compressed attribute data of a point cloud;
FIG. 6 diagrammatically illustrates an example encoding process for point cloud attributes using top-down encoding and inter-layer prediction;
FIG. 7 graphically illustrates determination of residual AC coefficients in the process of FIG. 6;
FIG. 8 diagrammatically illustrates an example decoding process;
FIG. 9 illustrates, in block diagram form, an example encoder;
FIG. 10 illustrates, in block diagram form, an example decoder;
FIG. 11 shows an exemplary upsampling process in two-dimensional graphical form;
FIG. 12 illustrates an exemplary graph reflecting five nodes or points;
FIG. 13 illustrates an example chart of compression performance for one implementation when encoding YUV attributes of a dense point cloud;
FIG. 14 illustrates another example chart for one implementation when encoding reflectivity of a sparse point cloud;
FIG. 15 shows a simplified block diagram of an example embodiment of an encoder; and
fig. 16 shows a simplified block diagram of an example embodiment of a decoder.
Like reference numerals may be used to refer to like elements in the various views.
Detailed Description
Methods and apparatus for encoding attributes of a point cloud are described. The encoding may be top-down encoding. The method and apparatus may involve a prediction operation based on upsampling of attribute data from the parent depth in the coding tree. The method and apparatus may employ a transform that conforms to the DC coefficient property, such that the DC coefficient obtained by transforming the geometrically weighted sums of attributes of the child sub-volumes is the geometrically weighted sum of attributes of the parent volume.
In one aspect, the present application describes a method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being located within a space that is recursively split and that contains the points of the point cloud, each point having a respective attribute. The method may comprise: in a top-down encoding process for the recursively split space, for a volume containing sub-volumes that include a plurality of points, determining a predicted geometrically weighted sum of attributes for each occupied sub-volume of the volume based on a prediction operation that uses at least the geometrically weighted sum of attributes of the volume. The method may further comprise: applying a transform to the predicted geometrically weighted sums of attributes to produce predicted coefficients, and applying the transform to the corresponding original geometrically weighted sums of attributes of the occupied sub-volumes to produce original coefficients; determining a plurality of AC coefficients from a difference between the original coefficients and the predicted coefficients, wherein the residual coefficients comprise a DC coefficient and the plurality of AC coefficients; and encoding the plurality of AC coefficients to output the bitstream of compressed point cloud data.
In another aspect, the present application describes a method of decoding a bitstream of encoded point cloud attributes, the point cloud being located within a space that is recursively split and that contains the points of the point cloud, each point having a respective attribute. The method may comprise: in a top-down coding process for the recursively split space, for a volume containing sub-volumes that include a plurality of points, determining a predicted geometrically weighted sum of attributes for each occupied sub-volume of the volume based on a prediction operation that uses at least the geometrically weighted sum of attributes of the volume. The method may further comprise: applying a transform to the predicted geometrically weighted sums of attributes to produce predicted AC coefficients; decoding the bitstream to reconstruct residual AC coefficients; setting the DC coefficient to the geometrically weighted sum of attributes of the volume; adding the residual AC coefficients and the DC coefficient to the predicted AC coefficients to produce reconstructed coefficients; and inverse transforming the reconstructed coefficients to produce reconstructed geometrically weighted sums of attributes of the occupied sub-volumes. At the maximum depth, the reconstructed attribute geometric weighted sum of an occupied sub-volume is the reconstructed attribute.
In a further aspect, the present application describes an encoder and a decoder configured to implement such encoding and decoding methods.
In yet a further aspect, the present application describes a non-transitory computer-readable medium storing computer-executable program instructions that, when executed, cause one or more processors to perform the described encoding and/or decoding methods.
In yet another aspect, the present application describes computer-readable signals containing program instructions that, when executed by a computer, cause the computer to perform the described encoding and/or decoding methods.
Computer-implemented applications are also described, including terrain applications, map applications, automotive industry applications, autonomous driving applications, virtual reality applications, and cultural heritage applications, among others. These computer-implemented applications include processes of receiving a data stream or data file, unpacking the data stream or data file to obtain a bitstream of compressed point cloud data, and decoding the bitstream as described in the above aspects and their implementations. Thus, these computer-implemented applications utilize point cloud compression techniques in accordance with the aspects and implementations described throughout this application.
Methods of encoding and decoding point clouds, and encoders and decoders for encoding and decoding point clouds are also described. In some implementations, the receiving unit receives multiplexed data that is obtained by multiplexing the encoded point cloud data with other encoded data types (such as metadata, images, video, audio, and/or graphics). The receiving unit comprises a demultiplexing unit for separating the multiplexed data into encoded point cloud data and other encoded data, and at least one decoding unit (or decoder) for decoding the encoded point cloud data. In some other implementations, the transmitting unit transmits multiplexed data obtained by multiplexing the encoded point cloud data with other encoded data types (such as metadata, images, video, audio, and/or graphics). The transmitting unit comprises at least one encoding unit (or encoder) for encoding the point cloud data and a multiplexing unit for combining the encoded point cloud data with other encoded data into multiplexed data.
Other aspects and features of the present application will become apparent to those ordinarily skilled in the art upon review of the following description of examples in conjunction with the accompanying figures.
Any feature described in relation to one aspect or embodiment of the invention may be used in relation to one or more other aspects/embodiments. These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described herein.
The terms "node," "volume," and "sub-volume" are used interchangeably in the following description. It will be understood that a node is associated with a volume or sub-volume. A node is a particular point on the tree, which may be an internal node or a leaf node. A volume or sub-volume is a bounded physical space represented by the node. In some cases, the term "volume" may be used to refer to the largest bounded space defined to contain a point cloud. The volume may be recursively divided into sub-volumes for the purpose of constructing a tree structure of interconnected nodes for encoding the point cloud structure.
In this application, the term "and/or" is intended to cover all possible combinations and sub-combinations of the listed elements, including any element, any sub-combination, or all of the listed elements, alone or in combination, without necessarily excluding additional elements.
In this application, the phrase "at least one of … or …" is intended to cover any one or more of the listed elements, including any of the listed elements individually, any subcombination, or all of the elements, without necessarily excluding additional elements, and without necessarily requiring all of the elements.
Many of the examples below will mention compression of point cloud data. Point clouds provide a suitable example to illustrate the advantages of the present application, as the data in a point cloud is bulky and predictive operations can be used to improve compression. However, it will be understood that point clouds are an example and the present application provides methods and apparatus that may be applied to compress other types of data for other purposes.
A point cloud is a collection of points in a three-dimensional coordinate system. These points are often intended to represent the surface of one or more objects. Each point has a position (location) in the three-dimensional coordinate system. The position may be represented by three coordinates (X, Y, Z), which may be a cartesian coordinate system or any other coordinate system. The terms "position," "location," or "geometry" may be used interchangeably herein to refer to the location of a point in space.
A point may have other associated attributes such as color, which in some cases may also be a three-component value such as R, G, B or Y, Cb, Cr. Other associated attributes may include transparency, reflectivity, normal vectors, timestamps, etc., depending on the desired application for the point cloud data.
The point cloud may be static or dynamic. For example, a detailed scan or mapping of an object or terrain may be static point cloud data. LiDAR-based scanning of an environment for machine vision purposes, on the other hand, may be dynamic, in that the point cloud changes (at least potentially) over time, e.g., with each successive scan of a volume. A dynamic point cloud is thus a time-ordered sequence of point clouds.
Point cloud data may be used in many applications, including preservation (scanning of historical or cultural objects), mapping, machine vision (such as autonomous or semi-autonomous automobiles), and virtual reality systems, to name a few. Dynamic point cloud data for applications such as machine vision may be quite different from static point cloud data used, for example, for preservation purposes. Automotive vision, for instance, typically involves relatively low-resolution, achromatic, highly dynamic point clouds acquired by a LiDAR (or similar) sensor with a high capture frequency. Such point clouds are not intended for human consumption or viewing, but rather for machine-based object detection/classification in a decision-making process. The attributes may also be derived from a detection/classification algorithm that segments the point cloud into detected/classified objects; in such a case, the attribute value is typically a label of the object to which the point belongs. By way of example, a typical LiDAR frame contains tens of thousands of points, whereas high-quality virtual reality applications require millions of points. It is expected that, as computing speeds increase and new applications are discovered, the need for higher-resolution data will arise over time.
While point cloud data is useful, lack of efficient and effective compression (i.e., encoding and decoding processes) may prevent its acceptance and deployment.
One of the more common mechanisms for encoding point cloud data is the use of tree-based structures. In a tree-based structure, a bounded three-dimensional volume of the point cloud is recursively divided into sub-volumes. The nodes of the tree correspond to sub-volumes. The decision of whether or not to further partition a sub-volume may be based on the resolution of the tree and/or on whether any points are contained in the sub-volume. A leaf node may have an occupancy flag that indicates whether its associated sub-volume contains a point. A split flag may indicate whether the node has child nodes (i.e., whether the current volume has been further divided into sub-volumes). These flags may be entropy encoded in some cases, and predictive encoding may be used in some cases.
One commonly used tree structure is an octree. In this structure, the volumes/sub-volumes are all cubes, and each split of a sub-volume results in eight further sub-volumes/sub-cubes. Another commonly used tree structure is the KD-tree, where a volume (cube or rectangular cuboid) is recursively bisected by a plane orthogonal to one of the axes. Octree is a special case of a KD-tree, where a volume is divided by three planes, where each plane is perpendicular to one of the three axes. Both examples relate to a cube or rectangular cuboid; however, the application is not limited to such a tree structure, and the volumes and sub-volumes may have other shapes in some applications. The partitioning of the volume does not have to be into two sub-volumes (KD-trees) or eight sub-volumes (octree), but may involve other partitioning, including into non-rectangular shapes or involving non-adjacent sub-volumes.
For ease of explanation, the present application may make reference to octrees, and this is also because they are popular candidate tree structures for automotive applications, but it will be understood that the methods and apparatus described herein may be implemented using other tree structures.
In the description herein, reference may be made to a "level" or "depth" of a point cloud or of its tree representation. In the tree-based recursive splitting of sub-volumes, it will be appreciated that each successive split adds a further level or depth to the tree, which may extend as deep as the depth or level at which each occupied sub-volume contains a single point of the point cloud. The sub-volumes may also be referred to as "nodes". Conventionally, in this application, the root or top node or level is the largest defined volume containing at least a portion of the point cloud. For example, in some cases, a volume containing a point cloud may be partitioned into largest coding units (LCUs), and each LCU may be encoded independently. In some cases, the LCU may be the volume containing the entire point cloud. Also by convention, in the present application, a sub-volume at depth (or level) d may be sub-divided into sub-volumes at a "greater" depth d+1, such that the depth d increases as the resolution of the tree increases. In this sense, a "greater" depth or level refers to a higher-resolution level of smaller sub-volumes, in a top-down configuration of the tree with the root node at the top, at d = 0 or d = 1, depending on the convention chosen for the depth index d.
The present application may also relate to "upsampling" of attribute data, where attribute data from level d is used to construct a prediction of attribute data of level d +1, i.e. data is upsampled to predict data at a greater depth/higher resolution. The conventions for these terms or labels may be modified in some implementations without affecting the essential operation of the methods and apparatus described herein.
The geometry of the tree is often losslessly encoded. Flags or other bits defining the tree structure may be serialized in some cases. A binary encoder or a non-binary encoder may be used. Predictive operations may be used in some implementations to attempt to further compress the data. Entropy coding may also improve compression. At the decoder, the compressed data is losslessly decoded to reconstruct the geometry of the tree, enabling the decoder to determine the location of each point in the point cloud.
In some cases, the geometry may be encoded using lossy compression. In this case, the encoded octree represents an approximation of the original point cloud, which typically requires fewer bits than lossless encoding of an octree representing the original point cloud, but at the cost of distortion between the encoded geometry and the original geometry. In lossy compression, during the encoding process, the attribute associated with a reconstructed point of the lossily encoded and reconstructed point cloud may be found by interpolating from the attributes of the original points closest to that reconstructed point.
The Moving Picture Experts Group (MPEG) and the International Organization for Standardization (ISO) are engaged in ongoing discussions on standards for point cloud compression (PCC). For example, current work is reflected in MPEG-I Part 9 regarding geometry-based point cloud compression. Those of ordinary skill in the art will be familiar with options for compression of geometric point cloud data.
The point cloud may include more than just the geometric location of the points. In some cases, a dot has an attribute, such as color, reflectivity, transparency, a timestamp, or other characteristic that can be embodied in some sort of value or parameter. For example, the color may be a three-component color value, such as RGB or YUV, which are commonly used in video and images.
There are two competing attribute coding methods under consideration: Level of Detail (LoD) and the Region-Adaptive Hierarchical Transform (RAHT). LoD is described, for example, in "G-PCC codec description" (ISO/IEC JTC1/SC29/WG11, Macau, China, output document w18015, December 2018). RAHT is described, for example, in "Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform" (R. L. de Queiroz, P. A. Chou, IEEE Transactions on Image Processing, vol. 25(8), August 2016).
In general, LoD involves sampling the point cloud (PC) at several levels of detail, from 1 to L, to obtain an increasing (nested) sequence of point sets E_1 ⊆ E_2 ⊆ … ⊆ E_L = PC. The attributes are coded hierarchically: first for the points of E_1, then for the points of E_2 (not already in E_1), and so on. The already-coded attributes of E_{l-1} are used as predictors for the attributes in E_l, for example by computing a weighted average of neighboring coded attributes, subtracting this average from the original attribute to obtain a residual, and encoding the residual. This approach has similarities to scalable video coding.
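To make the LoD-style prediction step concrete, the following is a minimal sketch (the function name, the inverse-distance weighting and the choice of k nearest neighbors are illustrative assumptions, not the specific LoD design):

```python
import numpy as np

def lod_predict(point, coded_points, coded_attrs, k=3):
    """Predict an attribute as an inverse-distance-weighted average of the
    k nearest already-coded points (one possible LoD-style predictor)."""
    dists = np.linalg.norm(coded_points - point, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / np.maximum(dists[nearest], 1e-9)      # inverse-distance weights
    return np.sum(w * coded_attrs[nearest]) / np.sum(w)

# Residual to be encoded for a new point p in level E_l:
# residual = original_attr - lod_predict(p, E_prev_points, E_prev_attrs)
```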
LoD finds efficient application in the case of dense point clouds, such as may be encountered in VR applications, as one example. It does not necessarily perform well in the case of sparse point clouds, such as those produced by LiDAR, due, for example, to the lack of correlation between point locations. LoD is also relatively computationally demanding because it searches the unstructured geometry for the neighbors of the attribute being encoded; the search for neighboring points can be particularly costly in sparse point clouds, because neighboring points do not necessarily belong to neighboring nodes in an octree, which makes using the octree to find neighboring points inefficient.
RAHT is a form of 3D transform applicable to a 2 × 2 × 2 cube. It is applied by successively performing 2-point transforms in the X, Y and Z directions to generate a set of AC coefficients, finally reducing the cube to a single DC coefficient through the successive transforms. RAHT is orthogonal and relatively easy to implement in terms of computational complexity. RAHT finds efficient application in the case of sparse point clouds, but does not necessarily perform well in the case of dense point clouds. Further details regarding the RAHT transform are provided below.
According to one aspect of the present application, a method and apparatus for encoding point cloud attributes is described that performs well compared to both LoD and RAHT, whether the point cloud is dense or sparse. The method and apparatus may involve using per-node transforms rather than per-direction transforms as in RAHT, although in some embodiments the transforms may be direction-specific, RAHT being one possible example among others. In some cases, the method and apparatus may include prediction of attributes, particularly inter-depth prediction of attributes. In some cases, the sum of the attributes of a parent node at one depth level is used, at least in part, to predict the sum of the attribute values associated with its child sub-volumes at the next depth level. In some such cases, attribute data from one or more neighbors of the parent node may be used in predicting the sum of the attribute values of the child sub-volumes. The sum of attributes used may, in some cases, be a geometrically weighted sum or a mean of the attributes. In at least one example, the prediction operation applies a weight to the attribute from a neighboring node that reflects (the inverse of) the geometric distance from that neighboring node to the child sub-volume whose attribute value is to be predicted.
As seen in image and video coding, the transformation provides the possibility of compression gain by mapping pixel or voxel domain data to the spectral domain. The resulting transform domain data includes a DC component and a plurality of AC components. By concentrating the data in a DC component as well as several lower frequency AC components, the overall compression can be improved. In some cases, this is further combined with coefficient quantization in lossy coding schemes to further improve data compression at the cost of introducing distortion to the encoded data relative to the original data.
A difficulty with point cloud compression, compared to video or image compression, is that a point may not be present at every location in the partitioned volume. Toward the leaf nodes of the coding tree, some sub-volumes contain points and some do not.
FIG. 1 shows one example of an octree-based point cloud geometry as reflected at a depth d and a greater depth d+1. At depth d, the occupied sub-volumes are indicated by shading. The current sub-volume 102 is indicated by darker shading. At depth d+1, the child sub-volumes 104 of the current sub-volume 102 are shown shaded. At the highest-resolution depth, the occupied child sub-volumes 104 each contain a respective point of the point cloud, and in this example, each point has one or more respective attribute values.
RAHT starts at the deepest level (i.e., highest resolution), where each occupied sub-volume contains a single point. To perform attribute data compression using RAHT, a two-point transform is first applied in one direction (x, y, or z). Figure 2 graphically illustrates the application 201 of the 2-point transform to the example child sub-volumes 104 when applied along direction 200. If two child sub-volumes aligned in the direction of the transform are both occupied, i.e. both have attribute values, RAHT converts those values into a DC component and an AC component. If the attributes are given by c_1 and c_2, respectively, then the RAHT transform can be expressed as:

    (DC, AC)^T = RAHT(w_1, w_2) · (c_1, c_2)^T
For example, as depicted in FIG. 2, the attributes c_1 and c_2 of two corresponding child sub-volumes 210 and 211 are transformed by the two-point RAHT transform into a DC coefficient and an AC coefficient 213 associated with the merged sub-volume 212. The same process is applied to the attributes of sub-volumes 220 and 221, which are transformed into a DC coefficient and an AC coefficient 223 associated with the merged sub-volume 222. Because sub-volume 230 is not aligned with another occupied sub-volume along the direction 200, it is not transformed (or, equivalently, is transformed using a one-point identity transform to obtain a DC coefficient), and its (un)transformed coefficient (i.e., a DC coefficient) is associated with the merged sub-volume 232.
The elementary transform RAHT(w_1, w_2) can be defined as:

    RAHT(w_1, w_2) = 1/√(w_1 + w_2) · [ √w_1   √w_2 ; -√w_2   √w_1 ]

where w_1 is the number of points contained in the first child sub-volume, w_2 is the number of points contained in the second child sub-volume, and the two rows produce the DC and AC coefficients, respectively. At the deepest level, the numbers w_i are all equal to 1. After the elementary transform, the AC coefficient is encoded and the DC coefficient is kept as new information associated with the merging of the two child nodes. The merged sub-volume then has w_1 + w_2 associated points.
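A minimal numeric sketch of the elementary two-point transform described above (the function name is illustrative only, not part of the application):

```python
import math

def raht2(c1, c2, w1, w2):
    """Elementary two-point RAHT transform of attributes c1, c2 with point counts w1, w2.
    Returns (DC, AC)."""
    s = math.sqrt(w1 + w2)
    dc = (math.sqrt(w1) * c1 + math.sqrt(w2) * c2) / s
    ac = (-math.sqrt(w2) * c1 + math.sqrt(w1) * c2) / s
    return dc, ac

# Example: two leaf sub-volumes (w1 = w2 = 1) with reflectance attributes 10 and 14
dc, ac = raht2(10.0, 14.0, 1, 1)   # dc ≈ 16.97 (= 24/√2), ac ≈ 2.83
```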
By construction, the merged sub-volumes form a set of volumes belonging to a 2D structure perpendicular to the direction of the transform. For example, the merged sub-volumes 212, 222, and 232 belong to a plane perpendicular to the direction 200. The method may then be applied again along a second direction 250 perpendicular to the first direction 200.
Fig. 3 illustrates the recursive applications 201, 301 and 302 of RAHT in three successive orthogonal directions 200, 250 and 300 for the example child node 104. The second application 301 of RAHT to the merged sub-volumes 212, 222 and 232 in the second direction 250 provides two DC coefficients and one AC coefficient 323 associated with the further merged sub-volumes 312 and 322. The two further merged sub-volumes 312 and 322 belong to a 1D structure perpendicular to the two directions 200 and 250, i.e. they are aligned along the direction 300 perpendicular to the first two directions 200 and 250. The third application 302 of RAHT to the further merged sub-volumes 312 and 322 in the third direction 300 then provides a unique DC coefficient 332 and an AC coefficient 333.
As a result, the recursive application of RAHT to the child node 104 provides a unique DC coefficient 332 and a set 343 of AC coefficients (213, 223, 323, and 333 in the example of FIG. 3). It should be understood that this outcome, a single DC coefficient and a set of AC coefficients, is obtained regardless of the configuration of occupied sub-volumes within the child node 104.
The AC coefficients 343 obtained from the application of RAHT in the three directions are encoded in the bitstream. The obtained unique DC coefficient 332 becomes the "attribute data" of the parent node in the next round of recursive encoding, in which the parent node and its seven siblings in the octree are encoded using the same RAHT process. The process continues recursively in a bottom-up fashion until the root node is reached. At the root node, the AC coefficients and the final DC coefficient are encoded in the bitstream.
Top-down attribute coding
In one aspect of the present application, instead of using a bottom-up recursive transform like RAHT, the encoding process is top-down, i.e., starting from the root node and progressing downward toward the level of the sub-volume containing the individual points. Also, as mentioned above, in some implementations, a transformation of the "entire node" is applied to find the DC coefficients and associated AC coefficients of the sub-volumes.
In another aspect of the application, the transform is applied to the set of attribute geometric weighted sums of the child nodes within a node; the decoded attribute geometric weighted sum of each sub-volume then serves as the DC coefficient of the transform applied at the next level down.
In yet further aspects of the present application, inter-depth prediction may be incorporated. Inter-depth prediction may be used to predict attribute values of child nodes. More specifically, the prediction may be a prediction of a geometrically weighted sum of attribute values in the child nodes. This prediction of the geometric weighted sum of attributes is subtracted from the actual or original geometric weighted sum of attributes to obtain a residual geometric weighted sum of attributes. Note that the transformation may be applied before or after the subtraction. After transformation of the residual (or transformation of the predicted and original values before finding their difference), a set of residual AC coefficients is generated. The AC coefficients are encoded. Note that the DC coefficient does not need to be encoded because it is known from the inverse transform of the coefficient of the previous (parent) level.
At the encoder, the geometry is known, so that the number of points per sub-volume is known. At the decoder, the position of the points in space is also known from the decoding of the compressed point cloud geometry. Thus, both the encoder and decoder have structural information to know whether a sub-volume contains points, and the number of points w each node contains can be found in the octree using a simple bottom-up process.
The number of points in any given sub-volume may be designated as w_node, the number of points in the sub-volume (node) at depth d. For each point p, the attribute value may be designated attribute(p). The sum of the attribute values in a given node may be given by:

    A_node = Σ_{p in node} attribute(p)

The mean of the attributes in the node is thus:

    a_node = A_node / w_node

The geometrically weighted sum of the attribute values can be defined as:

    Â_node = A_node / √w_node
The transformation of the attribute information occurs in the domain of the geometrically weighted sums Â_node = A_node / √w_node. This domain is used because the construction of the orthogonal transform implies a "DC coefficient property": when the geometrically weighted sums of attributes of a node's set of child nodes are transformed, the resulting DC coefficient is the geometrically weighted sum of attributes of that node,

    DC = Â_node = A_node / √w_node

Thus, the quantity determined for each sub-volume, from the root node down to the leaf nodes, is the geometrically weighted sum of its attributes. This allows a top-down encoding process in which each layer inherits its DC values, i.e. the attribute geometric weighted sums of the parent nodes, from the inverse transform performed at the parent level.
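As an illustration (assuming the two-point RAHT form given earlier), the DC coefficient property can be checked numerically: transforming the child sums Â_1 = A_1/√w_1 and Â_2 = A_2/√w_2 yields the parent sum (A_1 + A_2)/√(w_1 + w_2):

```python
import math

def dc_of_children(A1, w1, A2, w2):
    """DC coefficient obtained by two-point RAHT applied to the children's
    geometrically weighted attribute sums."""
    hat1, hat2 = A1 / math.sqrt(w1), A2 / math.sqrt(w2)
    s = math.sqrt(w1 + w2)
    return (math.sqrt(w1) * hat1 + math.sqrt(w2) * hat2) / s

A1, w1, A2, w2 = 30.0, 3, 20.0, 2
assert abs(dc_of_children(A1, w1, A2, w2) - (A1 + A2) / math.sqrt(w1 + w2)) < 1e-9
```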
In encoding a set of child nodes within a node, the encoder determines the geometrically weighted sum of attributes Â_i for each child node. If there are k child nodes and the transform is designated T, the encoder applies the transform to the set of k child nodes to produce the transform domain coefficients:

    T(Â_1, Â_2, …, Â_k) = [DC, AC_1, …, AC_{k-1}]

The DC coefficient is already known, being the quantity Â_node of the parent node. The encoder therefore encodes only the AC coefficients, and then proceeds to perform the same process within each of the child nodes.
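A schematic sketch of this per-node step (the transform T is left abstract; `node`, `transform` and `encode_ac` are hypothetical placeholders rather than the application's API; recursion here is depth-first for brevity, whereas the method of FIG. 4 proceeds breadth-first):

```python
def encode_node(node, transform, encode_ac):
    """One step of top-down attribute encoding: transform the children's
    geometrically weighted attribute sums and encode only the AC coefficients."""
    child_sums = [c.A / c.w ** 0.5 for c in node.occupied_children()]  # Â_i = A_i / √w_i
    coeffs = transform(child_sums)        # [DC, AC_1, ..., AC_{k-1}]
    # By the DC coefficient property, coeffs[0] == node.A / node.w**0.5; it is not encoded.
    encode_ac(coeffs[1:])
    for child in node.occupied_children():
        if not child.is_leaf():
            encode_node(child, transform, encode_ac)
```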
The decoder starts at the root node, decodes the root DC coefficient and the AC coefficients, and then inverse transforms them to obtain the decoded quantities Â_dec,i of the root node's child nodes. For each of these child nodes, the decoder then uses the decoded quantity Â_dec,i of the i-th child as the (decoded) DC coefficient for that child node, decodes the AC coefficients, and applies the inverse transform to obtain the decoded geometrically weighted sum of attributes of each of the grandchild nodes within that child node. This process repeats until the decoding reaches the leaf nodes and the final decoded attribute values are obtained.
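The corresponding per-node decoding step, again as a sketch with placeholder names (`inverse_transform`, `decode_ac` and the node interface are assumptions):

```python
def decode_node(node, dc, inverse_transform, decode_ac):
    """One step of top-down attribute decoding: the DC coefficient is inherited from
    the parent level, and only the AC coefficients are read from the bitstream."""
    k = len(node.occupied_children())
    ac = decode_ac(k - 1)                          # k-1 AC coefficients for k children
    child_sums = inverse_transform([dc] + ac)      # decoded Â_i for each child
    for child, a_hat in zip(node.occupied_children(), child_sums):
        if child.is_leaf():
            child.attribute = a_hat                # at a leaf, w = 1, so Â is the attribute
        else:
            decode_node(child, a_hat, inverse_transform, decode_ac)
```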
It will be appreciated that in some implementations, quantization may be applied to transform domain coefficients to introduce lossy coding.
Reference will now be made to FIG. 4, which illustrates a flow diagram of one example method 400 of encoding attribute data of a point cloud. The method 400 assumes that the three-dimensional space within which the point cloud data is located has been partitioned appropriately using tree-based recursive partitions, such as octrees. From the partitioning and geometric encoding, the encoder knows which leaf nodes contain points and therefore knows the number w of points in each sub-volume (node) inside the tree. Likewise, the encoder learns the attribute values associated with each point in the point cloud from the raw point cloud data. As mentioned above, the attribute may be color, reflectivity, or any other characteristic.
Method 400 begins at operation 402 with determining the attribute geometric weighted sum Â for each child node at depth d+1 within the current (parent) node; operation 402 encodes the current (parent) node at depth d. In operation 404, the set of attribute geometric weighted sums of the child nodes is transformed to produce a set of transform domain coefficients. The transform T used in operation 404 conforms to the DC coefficient property, such that the DC coefficient equals Â_node, the attribute geometric weighted sum of the current node. In operation 406, if the current node is the root node, the DC coefficient is encoded. The encoding may be entropy encoding. If the current node is not the root node, the DC coefficient is not encoded, since the decoder will already know it from the coefficients at the upper depth that have already been decoded.
In operation 408, the AC coefficients are encoded. The encoding may be entropy encoding.
If the process is lossy, e.g. if the transform includes a quantization step to quantize the transform domain coefficients, then in operation 410 the encoder reconstructs the (decoded) attribute geometric weighted sum of each child node, as the decoder would, by dequantizing and inverse transforming the quantized transform domain coefficients. It then sets the DC coefficient of each child node to its reconstructed attribute geometric weighted sum.
In operation 412, the encoder evaluates whether there are additional nodes to encode at the current depth d. If so, it moves to the next sub-volume at depth d, as indicated by operation 414, and returns to operation 402. If not, it determines in operation 416 whether it is at the maximum depth. If so, the method ends; if not, the encoder moves down to the next depth, d → d+1, in operation 418 and returns to operation 402 to continue encoding at the next level. It will be appreciated that this example is a breadth-first encoding example.
Reference will now be made to FIG. 5, which illustrates a flow diagram of an example method 500 of decoding compressed attribute data of a point cloud. The method 500 assumes that the decoder has decoded the geometry of the point cloud and thus the location of the point in the partitioned three-dimensional space has been determined. Thus, the decoder knows the number w of points in each sub-volume in the tree-based structure of the encoded point cloud.
The method 500 begins with operation 502: if the current node is the root node, the decoder decodes the DC coefficient. In operation 504, the decoder decodes the AC coefficients of the current node from the bitstream. It then combines the DC coefficient with the decoded AC coefficients and inverse transforms (and dequantizes, if applicable) the coefficients to produce a collection of decoded attribute geometric weighted sums. These are the decoded attribute geometric weighted sums associated with the child nodes of the current node. Furthermore, each node's decoded attribute geometric weighted sum acts as the (decoded) DC coefficient for that node when the next level of nodes is decoded. Thus, in operation 508, the decoder may set the (decoded) DC coefficient of each child node to its corresponding reconstructed attribute geometric weighted sum.
In operation 510, the decoder evaluates whether there are additional nodes at the current depth d. If so, it returns to operation 504 to continue decoding. If not, the decoder determines in operation 512 whether it is already at the maximum depth (e.g., a leaf node); if so, the method ends. If not, the decoder moves down to the next depth, d → d+1, and returns to operation 504.
Top-down coding with inter-depth prediction
As mentioned above, inter-depth prediction may be applied to improve compression performance. The inter-depth prediction process uses information from the parent depth, such as attribute information from nodes adjacent to the parent node, to predict attribute information of the child nodes. The prediction is then subtracted from the actual attribute information at the child node level and the residual data is encoded. The use of attribute information such as parent level attribute information from neighboring nodes to predict child depth may be referred to as "upsampling".
In some examples, a "neighboring node" may include a node that is a sibling of the parent node within the sub-volume, such as seven siblings in an octree structure. In some examples, a "neighboring node" may include a node that shares a face with a parent node. In some examples, a "neighboring node" may include a node that shares an edge with a parent node. In some examples, a "neighboring node" may include a node that shares a vertex with a parent node.
Since the data encoded in these examples is the attribute geometric weighted sum

    Â = A / √w

the prediction operation aims at predicting this geometrically weighted sum. However, Â depends on the number of points w: since Â = √w · a, it grows progressively larger as w grows. Thus, in some implementations, to perform inter-depth prediction in a bounded domain, the upsampling process is performed in the mean attribute domain, A/w, which is naturally bounded by the range of attribute values. The attribute geometric weighted sum of a node may be converted to the mean attribute domain by dividing by √w; that is, the mean attribute a is obtained as

    a = Â / √w = A / w
The bounded nature of the mean attribute domain is advantageous both because it has a more physical meaning (the mean attribute is a physical quantity, such as a mean color, whereas the geometrically weighted sum of attributes generally is not) and because of the numerical stability of the upsampling process, thus resulting in more efficient prediction. Moreover, having a bounded domain simplifies fixed-point implementations.
The mean attribute values at depth d may then be used in an upsampling process to predict the upsampled mean attribute values at depth d+1. The mean attribute values used may come from the parent node and/or one or more neighboring nodes. Any of a variety of possible upsampling operations may be used, some examples of which are described further below. For a child node, the predicted upsampled mean attribute a_up may then be converted back to a predicted, upsampled attribute geometric weighted sum Â_up = √w · a_up at depth d+1. The upsampling thus produces a predicted geometric weighted sum of attributes for the set of child nodes. The encoder subtracts the predicted geometric weighted sums of attributes from the actual (original) geometric weighted sums of attributes to obtain residual values. These are then transformed to find the AC coefficients to be encoded.
In some cases, a transform is applied to the predicted geometric weighted sum of attributes to obtain predicted coefficients, and the transform is applied to the original geometric weighted sum of attributes to obtain original coefficients. The predicted coefficients are then subtracted from the original coefficients to obtain the AC coefficients for encoding.
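A sketch of the transform-domain variant just described, assuming a generic per-node transform T with the DC coefficient property (all function and field names are illustrative):

```python
import math

def residual_ac(node, transform, predict_child_means):
    """Residual AC coefficients for one node: transform the original and the predicted
    Â values of the occupied children and subtract, keeping only the AC part."""
    children = node.occupied_children()
    orig_hat = [c.A / math.sqrt(c.w) for c in children]                    # original Â_i
    pred_means = predict_child_means(node)                                 # a_i,up per child
    pred_hat = [math.sqrt(c.w) * m for c, m in zip(children, pred_means)]  # Â_i,up = √w_i · a_i,up
    orig_coeffs = transform(orig_hat)    # [DC, AC_1, ...]
    pred_coeffs = transform(pred_hat)
    return [o - p for o, p in zip(orig_coeffs[1:], pred_coeffs[1:])]       # residual AC only
```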
Referring now to fig. 6, an example encoding process 600 for point cloud attributes using top-down encoding and inter-layer prediction is diagrammatically illustrated. The process 600 is applied to encode property information for nodes within the parent sub-volume 602. The parent sub-volume 602 is partitioned into a set of child sub-volumes 604 at a depth d.
At depth d-1, the parent sub-volume 602 has adjacent occupied sub-volumes 606. In this example, the set of adjacent sub-volumes 606 may include any occupied sub-volume at depth d-1 that shares a vertex with the parent sub-volume 602. The encoder determines the attribute geometric weighted sum Â_i for the parent node and for each of the neighboring nodes. These attribute geometric weighted sums may be "reconstructed" (decoded) attribute geometric weighted sums, obtained by decoding the encoded coefficient data for their respective sub-volumes and inverse transforming the coefficients, particularly if the encoding is lossy due to the quantization used in the transform process. In this way, the encoder ensures that it works with the same data that the decoder will be able to use.

The encoder then applies a "normalization" to the signal by dividing the attribute geometric weighted sum Â_i of the parent sub-volume 602 and of each adjacent sub-volume 606 at depth d-1 by the corresponding value √w_i, converting it to the mean attribute domain. These values are known to both the encoder and the decoder, since the geometry of the point cloud is known to both. As a result, the encoder determines the attribute mean a_i = A_i/w_i for the parent node and for each of its occupied neighboring nodes. Using these values, the encoder applies an upsampling operation to generate a predicted attribute mean a_i,up for each occupied child sub-volume 604 of the parent sub-volume 602.

The encoder then inverse-normalizes the predicted attribute means to obtain the predicted attribute geometric weighted sum Â_i,up = √w_i · a_i,up for each occupied child sub-volume 604.
In this example, the encoder then transforms the predicted attribute geometric weighted sums to obtain predicted transform domain coefficients. The original attribute geometric weighted sum Â_i of each child sub-volume 604 is determined by the encoder and transformed to generate the original transform domain coefficients. The predicted AC coefficients are subtracted from the original AC coefficients to obtain residual AC coefficients, which the encoder then entropy encodes to output a bitstream of encoded data for the parent node 602.
Fig. 7 diagrammatically illustrates the determination of residual AC coefficients by subtracting predicted coefficients from the original coefficients.
Fig. 8 diagrammatically illustrates an example decoding process 800. The decoder uses the same prediction process to generate the predicted attribute geometric weighted sums of the child sub-volumes 604. It also reconstructs the residual AC coefficients by entropy decoding and inverse quantization. Note that the DC coefficient is not taken directly from the bitstream, but is known to the decoder from the coefficients already reconstructed and inverse transformed at the parent depth d-1: the (decoded) DC component is given by the reconstructed attribute geometric weighted sum of the parent node 602. The predicted AC coefficients are then added to the reconstructed residual AC coefficients to produce reconstructed AC coefficients; these may alternatively be referred to herein as decoded coefficients. The DC coefficient obtained from the parent depth and the reconstructed AC coefficients are then inverse transformed to obtain the reconstructed attribute geometric weighted sum of each child sub-volume 604 at depth d.
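A sketch of this reconstruction step, mirroring the residual computation above (same hypothetical helper names; the inverse transform is assumed to accept [DC, AC_1, …]):

```python
import math

def reconstruct_children(node, dc, res_ac, transform, inverse_transform, predict_child_means):
    """Rebuild each child's decoded Â from the inherited DC coefficient, the decoded
    residual AC coefficients, and the same prediction used by the encoder."""
    children = node.occupied_children()
    pred_hat = [math.sqrt(c.w) * m
                for c, m in zip(children, predict_child_means(node))]   # Â_i,up
    pred_ac = transform(pred_hat)[1:]                   # predicted AC coefficients
    rec_ac = [r + p for r, p in zip(res_ac, pred_ac)]   # reconstructed AC coefficients
    return inverse_transform([dc] + rec_ac)             # decoded Â_i for each child
```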
Referring now to fig. 9, an example encoder 900 for encoding attribute data of a point cloud is shown in block diagram form. The encoder 900 may be implemented using a combination of hardware and software, such as one or more processing units, memory, and processor-readable instructions. For clarity and ease of explanation, portions of the encoder 900 or ancillary elements (such as input, output, user interface devices) or other such components related to the encoding of the point cloud geometry are not illustrated.
The encoder 900 has original point cloud attribute information 902 and point cloud geometry 904. The encoder 900 includes a DC coefficient property compatible transform operator 906 to transform the original point cloud attributes in the form of a geometrically weighted sum of attributes into transform domain coefficients.
The same transformation operator 908 is applied to the predicted attribute geometric weighted sum obtained from the prediction/upsampling operator 910, which prediction/upsampling operator 910 uses as its input the attribute geometric weighted sum from the parent node and one or more of its neighbors (i.e., at the upper depth). The output of the transform operator 908 is a set of predicted AC coefficients, which are then subtracted from the original AC coefficients from the transform operator 906 to produce residual AC coefficients. These residual AC coefficients are quantized and encoded by quantizer and encoder 920 to produce an output bitstream of compressed point cloud attribute data.
The input to the prediction/upsampling operator 910 is provided via a decoding feedback loop 912, in which the quantized residual AC coefficients are inverse quantized by an inverse quantizer 922 and then added to the predicted AC coefficients to produce reconstructed AC coefficients. Together with the DC coefficient, this set of coefficients is then inverse transformed by an inverse transform operator 914 to produce the reconstructed attribute geometric weighted sums of the child nodes of the current node, which then serve as input to the prediction operation at the next level down.
An example of a corresponding decoder 1000 is illustrated in block diagram form in FIG. 10. The decoder 1000 has the point cloud geometry available to it from the point cloud location data that has previously been decoded. For ease of explanation, elements related to decoding of the point cloud geometry are not necessarily illustrated.

The decoder 1000 includes a decoder and inverse quantizer 1002 to decode and inverse quantize the residual AC coefficients encoded in the bitstream. The decoder 1000 also includes a prediction/upsampling operator 1010 that mirrors the corresponding component 910 of the encoder 900 (FIG. 9). The prediction/upsampling operator 1010 takes the decoded/reconstructed attribute geometric weighted sums from the upper depth (e.g., the parent node and one or more neighboring nodes) and produces predicted attribute geometric weighted sums. Those predicted sums are subjected to a transform 1004 to produce predicted AC coefficients, which are added to the reconstructed residual AC coefficients from the decoder and inverse quantizer 1002 to produce reconstructed AC coefficients. An inverse transform 1006 is applied to obtain the reconstructed attribute geometric weighted sums of the current node's child sub-volumes. Since the decoder 1000 processes the data in a top-down reconstruction, once it reaches a leaf node the decoded, inverse-transformed attribute data yields the reconstructed attribute information for each point.
Upsampling operations
As mentioned above, the prediction operation employs upsampling of attribute information from depth d-1 to produce predicted attribute geometric weighted sums for the nodes at depth d. In these examples, the upsampling is performed in the mean attribute domain a = A/w, to ensure that the values being upsampled are bounded and numerically stable, but it will be appreciated that in some implementations the prediction operation may be performed in the Â = A/√w domain.
Reference will now be made to fig. 11, which illustrates an upsampling process 1100 in two-dimensional diagrammatic form. At depth d-1 there is a parent node 1102 and neighboring nodes 1104a, 1104b, and 1104c (collectively 1104). For clarity and ease of explanation, this example is illustrated in two dimensions, but extensions to three dimensions will be understood in view of the description herein.
Parent node 1102 has a child node 1106 for which attribute information is to be predicted. The DC coefficient of each of the parent node 1102 and the neighboring nodes 1104 is known. Since the tree geometry is already known from the encoding/decoding of the point cloud geometry, the number of points w in any node is known to both the encoder and the decoder. Thus, by dividing by the corresponding value √w of each node, the attribute geometric weighted sums Â of the parent node 1102 and of the neighboring nodes 1104 can be converted to attribute means a = A/w.
The upsampling operation is then applied to generate the predicted mean attribute value a_up for the child node 1106. The upsampling operation takes as input the attribute means of the parent node 1102 and of its occupied neighboring nodes 1104. In this example implementation, the upsampling operation also takes into account distance metrics relating the child node 1106 to the parent node 1102 and to each neighboring node 1104. The distance metric may reflect the distance between the center point of the sub-volume corresponding to the child node 1106 and the center point of the sub-volume corresponding to the respective parent node 1102 or neighboring node 1104. The reciprocal of the distance, 1/dist_k, may reflect the relative weight of the correlation between the attribute information of node k at depth d-1 and the child node at depth d. Other or additional weighting factors may be used in other implementations of the upsampling operation. In one example, the predicted mean attribute of the child node 1106 may be given by the weighted sum:

    a_up = ( Σ_k a_k / dist_k ) / ( Σ_k 1 / dist_k )

where the sum runs over the parent node 1102 and its occupied neighboring nodes 1104, a_k is the attribute mean of node k, and dist_k is the distance from the center of the child sub-volume 1106 to the center of node k.
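A minimal sketch of this inverse-distance-weighted upsampling in the mean attribute domain (centers and means are assumed to be available as plain arrays; names are illustrative):

```python
import numpy as np

def upsample_child_mean(child_center, node_centers, node_means):
    """Predict a child sub-volume's mean attribute from the parent node and its
    occupied neighbors, weighting each contribution by the inverse distance."""
    dists = np.linalg.norm(node_centers - child_center, axis=1)
    w = 1.0 / np.maximum(dists, 1e-9)          # 1/dist_k weights
    return float(np.dot(w, node_means) / np.sum(w))

# node_centers / node_means hold the parent node and its occupied neighbors
# (nodes 1102 and 1104 in FIG. 11); the result is a_up for the child node 1106.
```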
in some implementations, the upsampling may be implemented using a FIR (finite impulse response) filter.
Full-node transformation
As mentioned above, the two-point RAHT transform can be applied successively in the x, y and z directions to find the DC transform domain coefficients and AC coefficients for the octree-based sub-volume. Review the formula for RAHT transform given by:
Figure GDA0003311958550000211
In a practical implementation, the transform is performed in three cascaded steps, for example by applying it in the X direction, then in the Y direction, then in the Z direction. However, it is mathematically possible to combine two elementary RAHT transforms to obtain a single orthogonal three-point transform. The generalization to more points follows by induction. Let a_1, a_2 and a_3 be the attribute information associated with three nodes, each containing w_i points. The first elementary RAHT transform is applied to the first two nodes to obtain a first DC coefficient DC_2p, a first AC coefficient AC_1, and unchanged third-node information a_3:

$$\begin{bmatrix} DC_{2p} \\ AC_1 \end{bmatrix} = \mathrm{RAHT}(w_1, w_2) \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$$

The second elementary RAHT transform is then applied to the first DC coefficient and the third-node information. As a result, a second DC coefficient DC_3p combining the three points is obtained, together with two AC coefficients in total (AC_1 and AC_2):

$$\begin{bmatrix} DC_{3p} \\ AC_2 \end{bmatrix} = \mathrm{RAHT}(w_1 + w_2, w_3) \begin{bmatrix} DC_{2p} \\ a_3 \end{bmatrix}$$

The orthogonal matrix RAHT(w_1, w_2, w_3) of the three-point orthogonal transform is then the product of the two elementary two-point RAHT transforms, each embedded in a 3×3 matrix acting as the identity on the coordinate it leaves unchanged.
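The construction above can be checked numerically. The sketch below implements the two-point RAHT matrix and cascades two of them into the three-point transform; it is a toy illustration, not the reference implementation, and the embedding of each 2×2 transform into a 3×3 matrix is one possible ordering of the coefficients.

```python
import numpy as np

def raht2(w1, w2):
    """Elementary two-point RAHT transform matrix for child weights w1 and w2."""
    return np.array([[ np.sqrt(w1), np.sqrt(w2)],
                     [-np.sqrt(w2), np.sqrt(w1)]]) / np.sqrt(w1 + w2)

def raht3(w1, w2, w3):
    """Three-point transform built by cascading two elementary RAHT transforms:
    first combine nodes 1 and 2, then combine their DC with node 3."""
    T1 = np.eye(3)
    T1[:2, :2] = raht2(w1, w2)                       # (a1, a2, a3) -> (DC_2p, AC_1, a3)
    T2 = np.eye(3)
    T2[np.ix_([0, 2], [0, 2])] = raht2(w1 + w2, w3)  # (DC_2p, AC_1, a3) -> (DC_3p, AC_1, AC_2)
    return T2 @ T1

w = (2.0, 1.0, 3.0)
T = raht3(*w)
assert np.allclose(T @ T.T, np.eye(3))                  # the cascaded transform is orthogonal
assert np.allclose(T[0], np.sqrt(w) / np.sqrt(sum(w)))  # DC row yields the geometrically weighted sum
```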
With this in mind, graph transforms may be considered next. Graph transforms are a general framework for constructing transforms on any set of points. This mathematical construction is described in detail below.
Reference will now be made to FIG. 12, which illustrates an example graph over five nodes or points. This set of points may be designated E. For two distinct points p_i ≠ p_j in E, a correlation factor d_ij between the two points may be determined. In many example implementations, the correlation factor is a decreasing function of the distance between the two points. For example, the inverse of the Euclidean distance (with a negative sign, as usual) may be taken:

$$d_{ij} = -\frac{1}{\lVert p_j - p_i \rVert_2}$$
The diagonal terms may be obtained by summing the off-diagonal terms row by row as follows:

$$d_{ii} = -\sum_{j \neq i} d_{ij}$$
By construction, the matrix D is symmetric and diagonally dominant. It is therefore diagonalizable in an orthonormal basis:

$$D = V \Lambda V^T, \quad \text{where } V^T = V^{-1}$$
A Laplace bilinear operator may be defined to operate on pairs (a, b) of attributes associated with a set of points E by:
$$\mathrm{Lap}(a, b) := a^T D\, b$$
A norm and a distance can then be derived:

$$\lVert a \rVert_{\mathrm{Lap}} := \mathrm{Lap}(a, a) = a^T D\, a$$

$$d_{\mathrm{Lap}}(a, b) := \lVert b - a \rVert_{\mathrm{Lap}}$$
For example, the distance may measure the distortion Δ between the original attribute and its coded version attribute_code (in the following expression, the symbol α is used instead of attribute):

$$\Delta = d_{\mathrm{Lap}}(\alpha, \alpha_{\mathrm{code}}) = (\alpha - \alpha_{\mathrm{code}})^T D\, (\alpha - \alpha_{\mathrm{code}})$$
Using the orthogonal decomposition of the matrix D, it is possible to obtain:

$$\Delta = \sum_i \lambda_i \left( V_i^T (\alpha - \alpha_{\mathrm{code}}) \right)^2$$

where V_i is the i-th column of V and λ_i is the corresponding eigenvalue. Thus, the graph transform GT for the set E of points is naturally:

$$GT(E) := V^T$$
By construction, the transform is orthogonal and well suited to attribute compression: a constant-step quantizer is first applied to the transformed attribute coefficients, and the quantized coefficients are then entropy encoded. Graph transforms are powerful in terms of compression efficiency when applied to large sets of points, but they are not as practical as RAHT, since a diagonal decomposition of the matrix must be performed, with a typical complexity of O(N²) for N points.
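A small numerical sketch of this construction: the Laplace matrix is built from negative inverse Euclidean distances, diagonalized, and the graph transform is taken as V^T. The point coordinates are arbitrary illustrative values.

```python
import numpy as np

def graph_transform(points):
    """Graph transform GT(E) = V^T for a set of points (illustrative sketch).

    points -- (N, 3) array of point positions.
    Returns (V^T, eigenvalues) of the Laplace matrix D.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    D = np.zeros_like(dist)
    off = dist > 0
    D[off] = -1.0 / dist[off]                 # d_ij = -1 / ||p_j - p_i||
    np.fill_diagonal(D, -D.sum(axis=1))       # d_ii = -sum_{j != i} d_ij
    lam, V = np.linalg.eigh(D)                # D = V diag(lam) V^T with V orthogonal
    return V.T, lam

pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 1]])
GT, lam = graph_transform(pts)
assert np.allclose(GT @ GT.T, np.eye(len(pts)))  # the transform is orthogonal
assert abs(lam[0]) < 1e-9                        # zero eigenvalue: the constant vector lies in the kernel of D
```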
Still considering the graph of FIG. 12, suppose it has N nodes, where each node contains a corresponding number w_i of points. The graph γ is constructed from the centers of the cubes associated with the nodes. From the graph γ and the weights w_i, a "weighted graph transform" WGT(w_i, γ) may be constructed. Such a transform maps the attribute information c_i associated with the nodes into one DC coefficient and N−1 AC coefficients:

$$\mathrm{WGT}(w_i, \gamma) : (c_1, \ldots, c_N) \mapsto (DC, AC_1, \ldots, AC_{N-1})$$
In some embodiments, to be compatible with attribute coding for point clouds, the transform is to satisfy the following relationship (using the notation introduced above, with w = Σ_i w_i):

$$DC = \frac{\sum_i \sqrt{w_i}\, c_i}{\sqrt{w}}$$
This is referred to above as the "DC coefficient property". The matrix D, with elements d_ij, is defined as the Laplace matrix of the graph transform obtained for the graph γ. A matrix D_W (i.e., a weighted Laplace matrix) may then be defined from D and the weights w_i, for example as:

$$D_W := W^{-1/2}\, D\, W^{-1/2}, \quad W = \mathrm{diag}(w_1, \ldots, w_N)$$

i.e., element-wise,

$$D_{W,ij} = \frac{d_{ij}}{\sqrt{w_i\, w_j}}$$
Since the matrix D_W is real and symmetric, it can be decomposed as:

$$D_W = V \Lambda V^T, \quad \text{where } V^T = V^{-1}$$

and, similarly to the unweighted graph transform, the weighted graph transform is defined as:

$$\mathrm{WGT}(w_i, \gamma) := V^T$$
It can be shown that the kernel of the matrix D_W is non-trivial,

$$D_W \begin{bmatrix} \sqrt{w_1} \\ \vdots \\ \sqrt{w_N} \end{bmatrix} = 0,$$

and that the associated column vector of V in the orthogonal decomposition is:

$$V_{DC} = \frac{1}{\sqrt{w}} \begin{bmatrix} \sqrt{w_1} \\ \vdots \\ \sqrt{w_N} \end{bmatrix}, \quad w = \sum_i w_i.$$
This shows that the weighted graph transform WGT(w_i, γ) satisfies the DC coefficient property set out above. If all of the weights w_i have the same value w, the weighted graph transform is identical to the well-known unweighted graph transform. This is in particular the case when the transform is applied to occupied leaf nodes, where systematically w = 1. Interestingly, the two-point weighted graph transform is identical to the two-point elementary RAHT transform:

$$\mathrm{WGT}(w_1, w_2) = \mathrm{RAHT}(w_1, w_2) = \frac{1}{\sqrt{w_1 + w_2}} \begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}$$

This means that RAHT is a special case of the WGT.
The present encoding and decoding processes may employ any transform that conforms to the DC coefficient property. This includes the weighted graph transforms described above, and therefore also RAHT as a special case, but is not limited to these.
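The exact entries of the weighted Laplace matrix appear above only as formula placeholders, so the following sketch assumes the form D_W = W^{-1/2} D W^{-1/2} with W = diag(w_i) reconstructed earlier; this is an assumption consistent with the stated kernel property, not necessarily the verbatim definition. The sketch builds the WGT and checks the DC coefficient property numerically.

```python
import numpy as np

def weighted_graph_transform(centers, w):
    """Weighted graph transform WGT(w_i, gamma) for node centers and point counts.
    Assumes D_W = W^{-1/2} D W^{-1/2}; see the lead-in text for this assumption."""
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    D = np.zeros_like(dist)
    off = dist > 0
    D[off] = -1.0 / dist[off]
    np.fill_diagonal(D, -D.sum(axis=1))
    sw = np.sqrt(np.asarray(w, dtype=float))
    DW = D / np.outer(sw, sw)             # assumed weighted Laplace matrix
    lam, V = np.linalg.eigh(DW)
    if V[:, 0] @ sw < 0:                  # orient the kernel (DC) eigenvector consistently
        V[:, 0] = -V[:, 0]
    return V.T

centers = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
w = np.array([2.0, 1.0, 3.0])             # point counts of the occupied child nodes
T = weighted_graph_transform(centers, w)
A = np.array([4.0, 2.0, 6.0])             # child geometrically weighted attribute sums
dc = T[0] @ A                             # DC output of the transform
assert np.isclose(dc, (np.sqrt(w) @ A) / np.sqrt(w.sum()))  # DC coefficient property holds
```

With two nodes, the same construction reproduces the two-point RAHT matrix up to the sign of the AC row, consistent with the statement that RAHT is a special case of the WGT.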
Applications to other tree structures
The examples described above are based on an octree geometry, where each node has eight children that divide the volume associated with the node into eight sub-volumes. The most common octrees have cubes (or cuboids) associated with the nodes, and all cubes (or cuboids) at the same depth have a common size.
Another popular tree structure for point cloud representation is the KD-tree. A KD-tree essentially splits a volume into two sub-volumes, for example by splitting an initial cuboid into two sub-cuboids along a plane parallel to one of its faces. The two sub-cuboids do not necessarily have the same size, i.e. the split may be unequal. The direction of the split (among the three directions) and the asymmetry of the split (if any) are the information needed to represent the KD-tree.
The encoding and decoding processes described herein may be applied to a KD-tree by directly applying the elementary RAHT transform to the two sub-volumes to obtain the node transform T_node. The upsampling process is then naturally performed in the direction perpendicular to the splitting plane.
More generally, the encoding and decoding processes can be applied to any tree structure, since the upsampling process using a weighted sum can be applied to any configuration of volumes and sub-volumes. For example, all neighboring nodes within a fixed threshold distance from the current node may be used as predictors for the occupied child nodes of the current node when upsampling the attribute means. The principles described above may be applied to other structures, attributes, transforms, and so on, as long as the DC coefficient property is respected. The generalized weighted graph transform can be applied to any tree and guarantees the DC coefficient property, thus providing another embodiment applicable to any tree structure.
Influence on compression Performance
The described techniques may perform well for both dense and sparse point clouds. They can be compared to RAHT-based procedures and LoD-based procedures.
FIG. 13 shows an example chart 1300 of the compression performance of one implementation when encoding the YUV attributes of a dense point cloud. Chart 1300 indicates bits-per-point on the x-axis and peak signal-to-noise ratio (PSNR) on the y-axis. The chart includes the present process, RAHT and LoD. It will be noted that this embodiment of the present process performs at least as well as, or better than, LoD and significantly better than RAHT on the combined YUV metric.

FIG. 14 illustrates another example chart 1400 of the compression performance of one implementation when encoding the reflectivity of a sparse point cloud. Chart 1400 again shows bits-per-point on the x-axis and PSNR on the y-axis. It will be noted that this embodiment of the present process performs roughly as well as RAHT and much better than LoD.
Complexity advantage of the proposed method of combining transform and upsampling
The complexity of the proposed method is the sum of the complexities of the transform process and the upsampling process. In some embodiments, very low complexity can be maintained for the transform process owing to the simplicity of the two-point RAHT transform and its recursive nature. The upsampling process is very local in space, as it considers neighboring nodes that typically share a face, an edge or a vertex with the parent node whose attributes are to be upsampled.
In contrast, the LoD method is much more computationally demanding, especially for sparse point clouds, since it has to search for long-range attribute correlations to allow efficient attribute prediction. This implies a computationally intensive long-range neighbor search.

In the presently described approach, the combination of transform and upsampling automatically benefits from long-range correlation, since two points that are far apart must belong to two neighboring nodes at a sufficiently low depth (i.e., close to the root node). At this sufficiently low depth, the transform and the upsampling ensure that the correlation between the two points is exploited.
Referring now to FIG. 15, a simplified block diagram of an example embodiment of an encoder 1500 is shown. The encoder 1500 includes a processor 1502, a memory 1504, and an encoding application 1506. The encoding application 1506 may include a computer program or application stored in the memory 1504 and containing instructions that, when executed, cause the processor 1502 to perform operations such as those described herein. For example, the encoding application 1506 may encode and output a bitstream encoded in accordance with the processes described herein. It will be appreciated that the encoding application 1506 may be stored on a non-transitory computer-readable medium, such as a compact disc, a flash memory device, random access memory, a hard disk, and the like. When the instructions are executed, the processor 1502 performs the operations and functions specified in the instructions so as to operate as a special-purpose processor implementing the described process(es). Such a processor may be referred to in some examples as a "processor circuit" or a "processor circuit arrangement".
Reference is now also made to FIG. 16, which shows a simplified block diagram of an example embodiment of a decoder 1600. The decoder 1600 includes a processor 1602, a memory 1604, and a decoding application 1606. The decoding application 1606 may comprise a computer program or application stored in the memory 1604 and containing instructions that, when executed, cause the processor 1602 to perform operations such as those described herein. It will be appreciated that the decoding application 1606 may be stored on a computer-readable medium, such as a compact disc, a flash memory device, random access memory, a hard disk, and the like. When the instructions are executed, the processor 1602 performs the operations and functions specified in the instructions so as to operate as a special-purpose processor implementing the described process(es). Such a processor may be referred to in some examples as a "processor circuit" or a "processor circuit arrangement".
It will be appreciated that a decoder and/or encoder in accordance with the present application may be implemented in a variety of computing devices, including but not limited to servers, appropriately programmed general purpose computers, machine vision systems, and mobile devices. The decoder or encoder may be implemented in software containing instructions for configuring one or more processors to perform the functions described herein. The software instructions may be stored on any suitable non-transitory computer readable memory, including CD, RAM, ROM, flash memory, etc.
It will be appreciated that the decoders and/or encoders described herein, as well as the modules, routines, processes, threads, or other software components implementing the methods/processes described for configuring an encoder or decoder, may be implemented using standard computer programming techniques and languages. The application is not limited to a particular processor, computer language, computer programming environment, data structure, or other such implementation details. Those skilled in the art will recognize that the described processes may be implemented as part of computer executable code stored in volatile or non-volatile memory, as part of an Application Specific Integrated Circuit (ASIC), and so forth.
The present application also provides a computer readable signal encoding data generated by applying an encoding process according to the present application.
Certain adaptations and modifications of the described embodiments can be made. The embodiments discussed above are therefore to be considered in all respects as illustrative and not restrictive.

Claims (30)

1. A method of encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being located within a space that is recursively split and contains points of the point cloud, each point having a respective attribute, the method comprising:
in a top-down encoding process with respect to the space being recursively split, for a volume containing a sub-volume comprising a plurality of points,
determining a predicted geometrically weighted sum of properties of each occupied sub-volume of the volume based on a prediction operation, the prediction operation being based on at least the geometrically weighted sum of properties of the volume;
applying a transform to the geometric weighted sum of predicted properties to produce predicted coefficients and to the corresponding original property geometric weighted sum of the occupied sub-volumes to produce original coefficients;
determining a plurality of AC coefficients from differences between the original coefficients and the predicted coefficients, wherein residual coefficients comprise a DC coefficient and the plurality of AC coefficients; and
encoding the plurality of AC coefficients to output the bitstream of compressed point cloud data.
2. The method of claim 1, wherein the one DC coefficient is obtained from a geometrically weighted sum of the properties of the volume.
3. The method of claim 2, further comprising: determining the geometrically weighted sum of the properties of the volume by obtaining the geometrically weighted sum of the properties of the volume from an encoding of a parent volume, the volume being part of the parent volume.
4. The method of any of claims 1 to 3, further comprising: determining the geometrically weighted sum of attributes by summing the attributes of all points located within the volume and dividing by the square root of the count of points located within the volume.
5. The method of any of claims 1 to 4, wherein the prediction operation is further based on a geometrically weighted sum of respective attributes of at least one neighboring volume, the at least one neighboring volume sharing at least one vertex with the volume.
6. The method of claim 5, wherein the prediction operation is based on upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the properties of the neighboring volumes.
7. The method of claim 6, wherein upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the properties of the neighboring volumes comprises: normalizing the geometrically weighted sums of properties by dividing each by the square root of the count of points in the respective sum to obtain respective property mean sums; upsampling the respective property mean sums to generate a predicted property mean sum for each occupied sub-volume; and de-normalizing the predicted property mean sum for each occupied sub-volume to obtain the predicted geometrically weighted sum of properties for each occupied sub-volume of the volume.
8. The method of any preceding claim, wherein the transform conforms to DC coefficient properties such that a DC coefficient resulting from a transform of a set of geometrically weighted sums of attributes of child volumes is a geometrically weighted sum of attributes of the volumes.
9. A method of decoding a bitstream of encoded point cloud attributes, the point cloud being located within a space that is recursively split and contains points of the point cloud, each point having a respective attribute, the method comprising:
in a top-down encoding process with respect to the space being recursively split, for a volume containing a sub-volume comprising a plurality of points,
determining a predicted geometrically weighted sum of properties of each occupied sub-volume of the volume based on a prediction operation, the prediction operation being based on at least the geometrically weighted sum of properties of the volume;
applying a transform to a geometrically weighted sum of the predicted attributes to produce predicted AC coefficients;
decoding the bitstream to reconstruct residual AC coefficients;
setting a DC coefficient to the geometrically weighted sum of the properties of the volume;
adding the residual AC coefficients and the DC coefficients to the predicted AC coefficients to produce reconstructed coefficients; and
inverse transforming the reconstructed coefficients to produce a reconstructed geometrically weighted sum of properties of the occupied sub-volume,
such that at a maximum depth the geometric weighted sum of the reconstructed properties of the occupied sub-volume is a reconstructed property.
10. The method of claim 9, further comprising: determining the geometrically weighted sum of the properties of the volume by taking the geometrically weighted sum of the properties of the volume from a decoding of a parent volume, the volume being part of the parent volume.
11. The method of claim 9 or 10, wherein the geometrically weighted sum of the attributes of the volume is based on a sum of attributes of all points located within the volume divided by a square root of a count of points located within the volume.
12. The method of any of claims 9 to 11, wherein the prediction operation is further based on a geometrically weighted sum of respective attributes of at least one neighboring volume, the at least one neighboring volume sharing at least one vertex with the volume.
13. The method of claim 12, wherein the prediction operation is based on upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the neighboring volumes.
14. The method of claim 13, wherein upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the neighboring volumes comprises: normalizing the geometrically weighted sums of properties by dividing each by the square root of the count of points in the respective sum to obtain respective property mean sums; upsampling the respective property mean sums to generate a predicted property mean sum for each occupied sub-volume; and de-normalizing the predicted property mean sum for each occupied sub-volume to obtain the predicted geometrically weighted sum of properties for each occupied sub-volume of the volume.
15. The method of any of claims 9 to 14, wherein the transformation conforms to DC coefficient properties such that a DC coefficient resulting from a transformation of a set of geometrically weighted sums of attributes of child volumes is a geometrically weighted sum of attributes of the volume.
16. An encoder for encoding a point cloud to generate a bitstream of compressed point cloud data, the point cloud being located within a space that is recursively split and contains points of the point cloud, each point having a respective attribute, the encoder comprising:
a memory;
at least one processor;
an encoding application stored in the memory and containing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to:
in a top-down encoding process with respect to the space being recursively split, for a volume containing a sub-volume comprising a plurality of points,
determining a predicted geometrically weighted sum of properties of each occupied sub-volume of the volume based on a prediction operation, the prediction operation being based on at least the geometrically weighted sum of properties of the volume;
applying a transform to the geometric weighted sum of predicted properties to produce predicted coefficients and to the corresponding original property geometric weighted sum of the occupied sub-volumes to produce original coefficients;
determining a plurality of AC coefficients from differences between the original coefficients and the predicted coefficients, wherein residual coefficients comprise a DC coefficient and the plurality of AC coefficients; and
encoding the plurality of AC coefficients to output the bitstream of compressed point cloud data.
17. Encoder in accordance with claim 16, in which the one DC coefficient is obtained from a geometrically weighted sum of the properties of the volume.
18. The encoder of claim 17, wherein the processor-executable instructions, when executed, further cause the at least one processor to: determining the geometrically weighted sum of the properties of the volume by obtaining the geometrically weighted sum of the properties of the volume from an encoding of a parent volume, the volume being part of the parent volume.
19. The encoder of any of claims 16 to 18, wherein the processor-executable instructions, when executed, further cause the at least one processor to determine the geometrically weighted sum of properties by summing the attributes of all points located within the volume and dividing by the square root of the count of points located within the volume.
20. Encoder according to any of the claims 16 to 19, wherein the prediction operation is further based on a geometrically weighted sum of respective properties of at least one neighboring volume, the at least one neighboring volume sharing at least one vertex with the volume.
21. Encoder according to claim 20, wherein the prediction operation is based on upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the properties of the neighboring volumes.
22. The encoder of claim 21, wherein upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the properties of the neighboring volumes comprises: normalizing the geometrically weighted sums of properties by dividing each by the square root of the count of points in the respective sum to obtain respective property mean sums; upsampling the respective property mean sums to generate a predicted property mean sum for each occupied sub-volume; and de-normalizing the predicted property mean sum for each occupied sub-volume to obtain the predicted geometrically weighted sum of properties for each occupied sub-volume of the volume.
23. Encoder according to any of the claims 16 to 22, wherein the transform conforms to DC coefficient properties such that a DC coefficient resulting from a transform of a set of geometrically weighted sums of attributes of child volumes is a geometrically weighted sum of attributes of the volumes.
24. A decoder for decoding a bitstream of encoded attributes of a point cloud located within a space that is recursively split and contains points in the point cloud, each point having a respective attribute, the decoder comprising:
a memory;
at least one processor;
a decoding application stored in the memory and containing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to:
in a top-down encoding process with respect to the space being recursively split, for a volume containing a sub-volume comprising a plurality of points,
determining a predicted geometrically weighted sum of properties of each occupied sub-volume of the volume based on a prediction operation, the prediction operation being based on at least the geometrically weighted sum of properties of the volume;
applying a transform to a geometrically weighted sum of the predicted attributes to produce predicted AC coefficients;
decoding the bitstream to reconstruct residual AC coefficients;
setting a DC coefficient to the geometrically weighted sum of the properties of the volume;
adding the residual AC coefficients and the DC coefficients to the predicted AC coefficients to produce reconstructed coefficients; and
inverse transforming the reconstructed coefficients to produce a reconstructed geometrically weighted sum of properties of the occupied sub-volume,
such that at a maximum depth the geometric weighted sum of the reconstructed properties of the occupied sub-volume is a reconstructed property.
25. The decoder of claim 24, wherein the processor-executable instructions, when executed, further cause the at least one processor to: determining the geometrically weighted sum of the properties of the volume by taking the geometrically weighted sum of the properties of the volume from a decoding of a parent volume, the volume being part of the parent volume.
26. The decoder of claim 24 or 25, wherein the geometrically weighted sum of the attributes of the volume is based on a sum of the attributes of all points located within the volume divided by a square root of a count of points located within the volume.
27. Decoder according to any of claims 24 to 26, wherein the prediction operation is further based on a geometrically weighted sum of respective properties of at least one neighboring volume, the at least one neighboring volume sharing at least one vertex with the volume.
28. The decoder of claim 27, wherein the prediction operation is based on upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the neighboring volumes.
29. The decoder of claim 28, wherein upsampling the geometrically weighted sum of the properties of the volume and at least one of the geometrically weighted sums of the neighboring volumes comprises: normalizing the geometrically weighted sums of properties by dividing each by the square root of the count of points in the respective sum to obtain respective property mean sums; upsampling the respective property mean sums to generate a predicted property mean sum for each occupied sub-volume; and de-normalizing the predicted property mean sum for each occupied sub-volume to obtain the predicted geometrically weighted sum of properties for each occupied sub-volume of the volume.
30. Decoder according to any of the claims 24 to 29, wherein the transform conforms to the DC coefficient properties such that a DC coefficient resulting from a transform of a set of geometrically weighted sums of attributes of child volumes is a geometrically weighted sum of attributes of the volumes.
CN202080021859.4A 2019-03-18 2020-03-12 Method and apparatus for predicting point cloud attribute encoding Pending CN113678466A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/356,098 2019-03-18
US16/356,098 US10964068B2 (en) 2019-03-18 2019-03-18 Methods and devices for predictive point cloud attribute coding
PCT/EP2020/056734 WO2020187710A1 (en) 2019-03-18 2020-03-12 Methods and devices for predictive point cloud attribute coding

Publications (1)

Publication Number Publication Date
CN113678466A true CN113678466A (en) 2021-11-19

Family

ID=69846079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080021859.4A Pending CN113678466A (en) 2019-03-18 2020-03-12 Method and apparatus for predicting point cloud attribute encoding

Country Status (4)

Country Link
US (1) US10964068B2 (en)
EP (1) EP3942830A1 (en)
CN (1) CN113678466A (en)
WO (1) WO2020187710A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024051617A1 (en) * 2022-09-09 2024-03-14 Douyin Vision Co., Ltd. Method, apparatus, and medium for point cloud coding

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10897269B2 (en) 2017-09-14 2021-01-19 Apple Inc. Hierarchical point cloud compression
US10861196B2 (en) 2017-09-14 2020-12-08 Apple Inc. Point cloud compression
US11818401B2 (en) 2017-09-14 2023-11-14 Apple Inc. Point cloud geometry compression using octrees and binary arithmetic encoding with adaptive look-up tables
US11113845B2 (en) 2017-09-18 2021-09-07 Apple Inc. Point cloud compression using non-cubic projections and masks
US10909725B2 (en) 2017-09-18 2021-02-02 Apple Inc. Point cloud compression
US10607373B2 (en) 2017-11-22 2020-03-31 Apple Inc. Point cloud compression with closed-loop color conversion
US10939129B2 (en) 2018-04-10 2021-03-02 Apple Inc. Point cloud compression
US10909726B2 (en) 2018-04-10 2021-02-02 Apple Inc. Point cloud compression
US10909727B2 (en) 2018-04-10 2021-02-02 Apple Inc. Hierarchical point cloud compression with smoothing
US11010928B2 (en) 2018-04-10 2021-05-18 Apple Inc. Adaptive distance based point cloud compression
US11017566B1 (en) 2018-07-02 2021-05-25 Apple Inc. Point cloud compression with adaptive filtering
US11202098B2 (en) 2018-07-05 2021-12-14 Apple Inc. Point cloud compression with multi-resolution video encoding
EP3595179B1 (en) * 2018-07-10 2023-07-05 BlackBerry Limited Methods and devices for lossy coding of point cloud occupancy
US11012713B2 (en) 2018-07-12 2021-05-18 Apple Inc. Bit stream structure for compressed point cloud data
EP3618287B1 (en) * 2018-08-29 2023-09-27 Université de Genève Signal sampling with joint training of learnable priors for sampling operator and decoder
US11367224B2 (en) 2018-10-02 2022-06-21 Apple Inc. Occupancy map block-to-patch information compression
WO2020190090A1 (en) * 2019-03-20 2020-09-24 엘지전자 주식회사 Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method
WO2020197228A1 (en) * 2019-03-22 2020-10-01 엘지전자 주식회사 Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method
US11057564B2 (en) 2019-03-28 2021-07-06 Apple Inc. Multiple layer flexure for supporting a moving image sensor
US11711544B2 (en) 2019-07-02 2023-07-25 Apple Inc. Point cloud compression with supplemental information messages
US11917205B2 (en) * 2019-07-05 2024-02-27 Tencent America LLC Techniques and apparatus for scalable lifting for point-cloud attribute coding
US11562507B2 (en) 2019-09-27 2023-01-24 Apple Inc. Point cloud compression using video encoding with time consistent patches
US11627314B2 (en) 2019-09-27 2023-04-11 Apple Inc. Video-based point cloud compression with non-normative smoothing
US11538196B2 (en) * 2019-10-02 2022-12-27 Apple Inc. Predictive coding for point cloud compression
EP4018669A4 (en) 2019-10-03 2022-11-02 LG Electronics Inc. Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US11158107B2 (en) * 2019-10-03 2021-10-26 Lg Electronics Inc. Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2021066312A1 (en) * 2019-10-03 2021-04-08 엘지전자 주식회사 Device for transmitting point cloud data, method for transmitting point cloud data, device for receiving point cloud data, and method for receiving point cloud data
US11895307B2 (en) 2019-10-04 2024-02-06 Apple Inc. Block-based predictive coding for point cloud compression
US11281917B2 (en) * 2019-10-31 2022-03-22 Aptiv Technologies Limited Multi-domain neighborhood embedding and weighting of point cloud data
US11798196B2 (en) 2020-01-08 2023-10-24 Apple Inc. Video-based point cloud compression with predicted patches
US11475605B2 (en) 2020-01-09 2022-10-18 Apple Inc. Geometry encoding of duplicate points
US11741637B2 (en) * 2020-02-10 2023-08-29 Tencent America LLC Node-based geometry and attribute coding for a point cloud
US11450031B2 (en) 2020-04-14 2022-09-20 Apple Inc. Significant coefficient flag encoding for point cloud attribute compression
WO2021215811A1 (en) * 2020-04-24 2021-10-28 엘지전자 주식회사 Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method
US11620768B2 (en) 2020-06-24 2023-04-04 Apple Inc. Point cloud geometry compression using octrees with multiple scan orders
US11615557B2 (en) 2020-06-24 2023-03-28 Apple Inc. Point cloud compression using octrees with slicing
US11611775B2 (en) * 2021-01-19 2023-03-21 Tencent America LLC Method and apparatus for point cloud coding
US11948338B1 (en) 2021-03-29 2024-04-02 Apple Inc. 3D volumetric content encoding using 2D videos and simplified 3D meshes
EP4156107A1 (en) * 2021-09-24 2023-03-29 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus of encoding/decoding point cloud geometry data sensed by at least one sensor
WO2024011472A1 (en) * 2022-07-13 2024-01-18 Oppo广东移动通信有限公司 Point cloud encoding and decoding methods, encoder and decoder, and computer storage medium
WO2024074121A1 (en) * 2022-10-04 2024-04-11 Douyin Vision Co., Ltd. Method, apparatus, and medium for point cloud coding

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10694210B2 (en) * 2016-05-28 2020-06-23 Microsoft Technology Licensing, Llc Scalable point cloud compression with transform, and corresponding decompression
US10223810B2 (en) * 2016-05-28 2019-03-05 Microsoft Technology Licensing, Llc Region-adaptive hierarchical transform and entropy coding for point cloud compression, and corresponding decompression
US10861196B2 (en) * 2017-09-14 2020-12-08 Apple Inc. Point cloud compression
US10897269B2 (en) * 2017-09-14 2021-01-19 Apple Inc. Hierarchical point cloud compression
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds
US11010931B2 (en) * 2018-10-02 2021-05-18 Tencent America LLC Method and apparatus for video coding
US11166048B2 (en) * 2018-10-02 2021-11-02 Tencent America LLC Method and apparatus for video coding
US10762667B2 (en) * 2018-11-30 2020-09-01 Point Cloud Compression, B.V. Method and apparatus for compression of point cloud data

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024051617A1 (en) * 2022-09-09 2024-03-14 Douyin Vision Co., Ltd. Method, apparatus, and medium for point cloud coding

Also Published As

Publication number Publication date
US20200302651A1 (en) 2020-09-24
EP3942830A1 (en) 2022-01-26
WO2020187710A1 (en) 2020-09-24
US10964068B2 (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN113678466A (en) Method and apparatus for predicting point cloud attribute encoding
US10904564B2 (en) Method and apparatus for video coding
de Oliveira Rente et al. Graph-based static 3D point clouds geometry coding
US11166048B2 (en) Method and apparatus for video coding
EP3595181B1 (en) Predictor-copy coding mode for coding of point clouds
CN110996098B (en) Method and device for processing point cloud data
Kathariya et al. Scalable point cloud geometry coding with binary tree embedded quadtree
Chou et al. A volumetric approach to point cloud compression—Part I: Attribute compression
Guarda et al. Point cloud coding: Adopting a deep learning-based approach
US11836954B2 (en) 3D point cloud compression system based on multi-scale structured dictionary learning
EP2850835B1 (en) Estimation, encoding and decoding of motion information in multidimensional signals through motion zones, and of auxiliary information through auxiliary zones
CN108028941B (en) Method and apparatus for encoding and decoding digital images by superpixel
CN112385236B (en) Method for encoding and decoding point cloud, encoder and decoder, and storage medium
Nguyen et al. Lossless coding of point cloud geometry using a deep generative model
JP2019521417A (en) Method of encoding point cloud representing a scene, encoder system, and non-transitory computer readable recording medium storing program
Pavez et al. Region adaptive graph Fourier transform for 3D point clouds
Guarda et al. Deep learning-based point cloud coding: A behavior and performance study
Hou et al. Sparse representation for colors of 3D point cloud via virtual adaptive sampling
US20220292730A1 (en) Method and apparatus for haar-based point cloud coding
WO2022131948A1 (en) Devices and methods for sequential coding for point cloud compression
WO2023015530A1 (en) Point cloud encoding and decoding methods, encoder, decoder, and computer readable storage medium
EP4216553A1 (en) Point cloud decoding and encoding method, and decoder, encoder and encoding and decoding system
CN114708343A (en) Three-dimensional point cloud coding and decoding method, compression method and device based on map dictionary learning
US20230071581A1 (en) Methods and devices for multi-point direct coding in point cloud compression
CN114915792A (en) Point cloud coding and decoding method and device based on two-dimensional regularized planar projection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination