JP2024501966A

JP2024501966A - Hybrid tree coding for inter and intra prediction for geometry coding

Info

Publication number: JP2024501966A
Application number: JP2023539021A
Authority: JP
Inventors: Bappaditya Ray; Adarsh Krishnan Ramasubramonian; Luong Pham Van; Geert Van der Auwera; Marta Karczewicz
Original assignee: Qualcomm Incorporated
Priority date: 2020-12-29
Filing date: 2021-12-28
Publication date: 2024-01-17
Also published as: TW202234894A; EP4272166A1; KR20230127219A; WO2022147015A1

Abstract

A device for decoding a bitstream containing point cloud data is configured to determine an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud, and to directly decode the position of each of the one or more points in the leaf node. To directly decode the position of each of the one or more points, one or more processors of the device are further configured to generate a prediction of the one or more points and to determine the one or more points based on the prediction.

Description

This application claims priority to U.S. Patent Application No. 17/562,398, filed December 27, 2021, and U.S. Provisional Patent Application No. 63/131,546, filed December 29, 2020, the entire contents of each of which are incorporated herein by reference. U.S. Patent Application No. 17/562,398, filed December 27, 2021, claims the benefit of U.S. Provisional Patent Application No. 63/131,546, filed December 29, 2020.

This disclosure relates to point cloud encoding and decoding.

“Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression”, ISO/IEC JTC 1/SC29/WG 7 MDS19617, Teleconference, October 2020
G-PCC Codec Description, ISO/IEC JTC 1/SC29/WG 7 MDS19620, Teleconference, October 2020

In general, this disclosure describes a hybrid tree coding method that combines octree coding and predictive coding for enhanced inter/intra prediction at the block level for point cloud compression.

In one example, this disclosure describes a method of coding a point cloud, the method comprising: determining an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud and the position of each of the one or more points in the leaf node is directly signaled; generating a prediction of the one or more points using intra prediction or inter prediction; and coding a syntax element that indicates whether the one or more points are predicted using intra prediction or inter prediction.

According to one example of this disclosure, a device for decoding a bitstream that includes point cloud data includes a memory for storing the point cloud data, and one or more processors coupled to the memory and implemented in circuitry. The one or more processors are configured to determine an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud, and to directly decode the position of each of the one or more points in the leaf node. To directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to generate a prediction of the one or more points and to determine the one or more points based on the prediction.

According to another example of this disclosure, a method of decoding a point cloud includes: determining an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decoding the position of each of the one or more points in the leaf node. Directly decoding the position of each of the one or more points in the leaf node includes generating a prediction of the one or more points and determining the one or more points based on the prediction.

According to another example of this disclosure, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to determine an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud, and to directly decode the position of each of the one or more points in the leaf node. To directly decode the position of each of the one or more points in the leaf node, the instructions cause the one or more processors to generate a prediction of the one or more points and to determine the one or more points based on the prediction.

According to another example of this disclosure, an apparatus includes: means for determining an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and means for directly decoding the position of each of the one or more points in the leaf node. The means for directly decoding the position of each of the one or more points in the leaf node comprises means for generating a prediction of the one or more points and means for determining the one or more points based on the prediction.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

A block diagram illustrating an example encoding and decoding system that may perform the techniques of this disclosure.
A block diagram illustrating an example Geometry Point Cloud Compression (G-PCC) encoder.
A block diagram illustrating an example G-PCC decoder.
A conceptual diagram illustrating an example octree split for geometry coding.
A conceptual diagram illustrating an example of a prediction tree.
A conceptual diagram illustrating an example rotating LIDAR acquisition model.
A conceptual diagram illustrating an example motion estimation flowchart for InterEM.
A conceptual diagram illustrating an example algorithm for estimating global motion.
A conceptual diagram illustrating an example algorithm for estimating a local node motion vector.
A conceptual diagram illustrating an example of high-level octree splitting.
A conceptual diagram illustrating an example of local predictive tree generation.
A conceptual diagram illustrating an example current set of points (O0 to O12) and a reference set of points (R0 to R12), where N=M=13.
A conceptual diagram illustrating an example current set of points and a motion-compensated reference set of points, where N=M=13.
A conceptual diagram illustrating an example range-finding system that may be used with one or more techniques of this disclosure.
A conceptual diagram illustrating an example vehicle-based scenario in which one or more techniques of this disclosure may be used.
A conceptual diagram illustrating an example extended reality system in which one or more techniques of this disclosure may be used.
A conceptual diagram illustrating an example mobile device system in which one or more techniques of this disclosure may be used.
A flowchart illustrating an example operation for decoding a bitstream that includes point cloud data.

A point cloud is a collection of points in three-dimensional (3D) space. The points may correspond to points on objects within the 3D space. Thus, a point cloud may be used to represent the physical content of the 3D space. Point clouds may have utility in a wide variety of situations. For example, point clouds may be used in the context of autonomous vehicles to represent the positions of objects on a roadway. In another example, point clouds may be used in the context of representing the physical content of an environment for the purpose of positioning virtual objects in an augmented reality (AR) or mixed reality (MR) application. Point cloud compression is a process for encoding and decoding point clouds. Encoding a point cloud may reduce the amount of data required for storage and transmission of the point cloud.

Two primary proposals previously existed for signaling the locations of points in a point cloud: octree coding and predictive tree coding. As part of encoding point cloud data using octree coding, a G-PCC encoder may generate an octree. Each node of the octree corresponds to a cuboid space. A node of the octree may have zero child nodes or eight child nodes. In other examples, nodes may be split into child nodes according to other tree structures. The child nodes of a parent node correspond to equally sized cuboids within the cuboid corresponding to the parent node. The positions of individual points of the point cloud may be signaled relative to the origin of a node. If a node does not contain any points of the point cloud, the node is said to be unoccupied. If a node is unoccupied, it may not be necessary to signal additional data for the node. Conversely, if a node contains one or more points of the point cloud, the node is said to be occupied.
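As a non-limiting illustration of the occupancy concept described above, the following Python sketch computes which of the eight equally sized child cuboids of a node are occupied. The function names `child_index` and `occupancy_byte`, the bit ordering of the mask, and the use of integer coordinates are assumptions made for this sketch only; they are not taken from this disclosure or from the G-PCC specification.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int, int]


def child_index(point: Point, origin: Point, half: int) -> int:
    """Return the index (0..7) of the child cuboid that contains the point.

    The bit ordering (x is the most significant bit) is an illustrative
    assumption, not an ordering mandated by any specification.
    """
    x, y, z = point
    ox, oy, oz = origin
    return (
        (int(x >= ox + half) << 2)
        | (int(y >= oy + half) << 1)
        | int(z >= oz + half)
    )


def occupancy_byte(points: Iterable[Point], origin: Point, size: int) -> int:
    """Return an 8-bit mask with bit i set when child i holds at least one point."""
    half = size // 2
    mask = 0
    for point in points:
        mask |= 1 << child_index(point, origin, half)
    return mask
```

In this sketch, a mask of zero corresponds to a node whose children are all unoccupied, for which no further data would need to be signaled.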

When encoding point cloud data using predictive tree coding, the G-PCC encoder determines a prediction mode for each point of the point cloud. The prediction mode for a point may be one of the following:

・No prediction/zero prediction (0)
・Delta prediction (p0)
・Linear prediction (2*p0-p1)
・Parallelogram prediction (2*p0+p1-p2)

If the prediction mode for a point is "no prediction/zero prediction," the point is considered a root point (i.e., a root vertex), and the coordinates of the point (e.g., x, y, z coordinates) are signaled in the bitstream. If the prediction mode for a point is "delta prediction," the G-PCC encoder determines the difference (i.e., delta) between the coordinates of the point and the coordinates of a parent point, such as the root point or another point. If the prediction mode is "linear prediction," the G-PCC encoder determines predicted coordinates of the point using a linear prediction from the coordinates of two parent points. The G-PCC encoder signals the difference between the predicted coordinates determined using the linear prediction and the actual coordinates of the point. If the prediction mode is "parallelogram prediction," the G-PCC encoder uses three parent points to determine the predicted coordinates. The G-PCC encoder then signals the difference (e.g., a "primary residual") between the predicted coordinates and the actual coordinates of the point. The prediction relationships between points essentially define a tree of points.
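As a non-limiting illustration, the four prediction modes listed above can be sketched in Python as follows. The predictor formulas are used exactly as written in the list (0, p0, 2*p0-p1, 2*p0+p1-p2). The function names, the integer mode numbering 0 to 3, and the convention that `ancestors[0]` is p0, `ancestors[1]` is p1, and `ancestors[2]` is p2 are assumptions made for this sketch.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[int, int, int]


def predict(mode: int, ancestors: Sequence[Vec3] = ()) -> Vec3:
    """Return predicted coordinates for modes 0..3 (hypothetical numbering)."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("unknown prediction mode")
    if len(ancestors) < mode:
        raise ValueError("not enough parent points for this mode")
    if mode == 0:  # no prediction / zero prediction
        return (0, 0, 0)
    p0 = ancestors[0]
    if mode == 1:  # delta prediction: p0
        return p0
    p1 = ancestors[1]
    if mode == 2:  # linear prediction: 2*p0 - p1
        return tuple(2 * a - b for a, b in zip(p0, p1))
    p2 = ancestors[2]
    # parallelogram prediction, as written in the text: 2*p0 + p1 - p2
    return tuple(2 * a + b - c for a, b, c in zip(p0, p1, p2))


def residual(actual: Vec3, predicted: Vec3) -> Vec3:
    """Encoder side: difference between actual and predicted coordinates."""
    return tuple(a - p for a, p in zip(actual, predicted))


def reconstruct(predicted: Vec3, res: Vec3) -> Vec3:
    """Decoder side: add the signaled residual back to the prediction."""
    return tuple(p + r for p, r in zip(predicted, res))
```

With mode 0 the residual equals the coordinates themselves, which matches the description of a root point whose coordinates are signaled directly.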

It has been observed experimentally that octree coding may be better suited to dense point clouds than predictive tree coding. Point clouds acquired using 3D modeling are often dense enough that octree coding performs better. However, point clouds acquired using LiDAR, e.g., for automotive applications, tend to be somewhat sparse, and therefore predictive coding may perform better in those applications.

In some examples, an angular mode may be used to represent the coordinates of points in a spherical coordinate system. Because the conversion process between the spherical coordinate system and a Cartesian (e.g., x, y, z) coordinate system is not perfect, information may be lost. However, because the G-PCC encoder can perform the conversion process, the G-PCC encoder can signal a "secondary residual" for a point that indicates the difference between the Cartesian coordinates of the point that result from applying the conversion process to the spherical coordinates of the point and the original Cartesian coordinates of the point.
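As a non-limiting illustration of why the additional residual in the angular mode can restore the information lost in conversion, the following Python sketch quantizes a point into a simplified spherical representation, converts it back to Cartesian coordinates, and computes the remaining difference. The radius/azimuth/elevation parameterization, the `angle_steps` quantization parameter, and all function names are assumptions made for this sketch; an actual codec may use a different representation.

```python
import math
from typing import Tuple

Vec3 = Tuple[int, int, int]


def to_spherical(p: Vec3, angle_steps: int = 1024) -> Vec3:
    """Quantize a Cartesian point to (radius, azimuth step, elevation step)."""
    x, y, z = p
    radius = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.atan2(z, math.hypot(x, y))
    scale = angle_steps / (2 * math.pi)
    return (round(radius), round(azimuth * scale), round(elevation * scale))


def to_cartesian(s: Vec3, angle_steps: int = 1024) -> Vec3:
    """Convert quantized spherical coordinates back to integer Cartesian ones."""
    radius, az_step, el_step = s
    scale = 2 * math.pi / angle_steps
    azimuth, elevation = az_step * scale, el_step * scale
    return (
        round(radius * math.cos(elevation) * math.cos(azimuth)),
        round(radius * math.cos(elevation) * math.sin(azimuth)),
        round(radius * math.sin(elevation)),
    )


def conversion_residual(original: Vec3, angle_steps: int = 1024) -> Vec3:
    """Difference between the original point and its lossy round trip."""
    approx = to_cartesian(to_spherical(original, angle_steps), angle_steps)
    return tuple(o - a for o, a in zip(original, approx))
```

Because the decoder can run the same conversion, adding this residual to the converted coordinates recovers the original Cartesian position exactly in this sketch.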

This disclosure relates to a hybrid coding model in which both octree coding and direct coding are used to code a point cloud. For example, octree coding may first be used to split the space into nodes down to a particular level. The nodes at the particular level (and other occupied nodes of the octree that are not split further) may be referred to as "leaf nodes." The points within the volume of a leaf node may be coded using a "direct" coding mode.
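As a non-limiting illustration of the first stage of the hybrid model described above, the following Python sketch groups points into the leaf nodes obtained by splitting a cubic root volume down to a given level. The function name `split_into_leaves`, the root origin at (0, 0, 0), the non-negative integer coordinates, and the power-of-two root size are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, int, int]


def split_into_leaves(
    points: Iterable[Point], root_size: int, depth: int
) -> Dict[Point, List[Point]]:
    """Map each occupied leaf-node origin to the points inside that leaf.

    Assumes the root cuboid starts at (0, 0, 0), has a power-of-two edge
    length `root_size`, and is split uniformly `depth` times.
    """
    leaf_size = root_size >> depth
    if leaf_size < 1:
        raise ValueError("depth is too large for the given root size")
    leaves: Dict[Point, List[Point]] = defaultdict(list)
    for x, y, z in points:
        origin = (
            x // leaf_size * leaf_size,
            y // leaf_size * leaf_size,
            z // leaf_size * leaf_size,
        )
        leaves[origin].append((x, y, z))
    return dict(leaves)
```

Only occupied leaf nodes appear in the result; the points stored under each origin are the ones that a direct coding mode would subsequently handle.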

When encoding the points of a leaf node in the "direct" coding mode, the G-PCC encoder may select an intra-prediction mode for the leaf node or an inter-prediction mode for the leaf node. The G-PCC encoder may signal whether the points of the leaf node are encoded using the intra-prediction mode or the inter-prediction mode.

If the G-PCC encoder selects the intra-prediction mode for a leaf node, the G-PCC encoder may encode the points in the leaf node using predictive tree coding in much the same manner as described above. That is, the G-PCC encoder may select from among the four prediction modes and signal the coordinates for the points accordingly. However, rather than signaling coordinates relative to the origin of the entire space associated with the octree, the G-PCC encoder may signal coordinates relative to the origin of the leaf node. This may improve coding efficiency, particularly for the root node.
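As a non-limiting illustration of why signaling coordinates relative to the leaf node origin may help, the following Python sketch compares the magnitude of a coordinate expressed relative to the whole space with the same coordinate expressed relative to a leaf node origin. The function names are hypothetical, and the bit count shown is the raw binary length of the value, which is only a rough proxy for the cost after entropy coding.

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]


def to_local(point: Vec3, leaf_origin: Vec3) -> Vec3:
    """Express a point relative to the origin of its leaf node."""
    return tuple(c - o for c, o in zip(point, leaf_origin))


def to_global(local: Vec3, leaf_origin: Vec3) -> Vec3:
    """Decoder side: recover the position in the full space."""
    return tuple(c + o for c, o in zip(local, leaf_origin))


def raw_bits(value: int) -> int:
    """Raw binary length of a non-negative value (at least one bit)."""
    return max(1, value.bit_length())
```

For a point at (1030, 2050, 515) in a leaf node whose origin is (1024, 2048, 512), the local coordinates are (6, 2, 3), so the x component shrinks from 11 raw bits to 3 in this sketch.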

If the G-PCC encoder selects the inter-prediction mode for a leaf node, the G-PCC encoder may encode the points in the leaf node relative to a set of points in a reference frame. The reference frame may be a previously coded frame, analogous to a previous frame of video. The G-PCC encoder may perform motion estimation to identify a set of points in the reference frame that has a spatial arrangement similar to that of the points in the leaf node. A motion vector for the leaf node indicates the displacement between the points of the leaf node and the identified set of points in the reference frame.

The G-PCC encoder may signal a set of parameters for the leaf node. The parameters for the leaf node may include a reference index that identifies the reference frame. The parameters for the leaf node may also include a value indicating the number of points in the leaf node.

The parameters for the leaf node may also include a residual value for each of the points in the leaf node. The residual value for a point in the leaf node indicates the difference between the predicted coordinates of the point (determined by adding the motion vector of the leaf node to the point in the reference frame that corresponds to the point in the leaf node) and the actual coordinates of the point. In examples where the angular mode is used, the G-PCC encoder may also signal a secondary residual for the point.
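As a non-limiting illustration of the inter-prediction residual described above, the following Python sketch forms the prediction of each point by adding the leaf node's motion vector to the corresponding reference point and then computes and applies per-point residuals. The function names and the assumption that the i-th current point corresponds to the i-th reference point are made for this sketch only; how correspondences are established is not addressed here.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[int, int, int]


def inter_predict(ref_points: Sequence[Vec3], mv: Vec3) -> List[Vec3]:
    """Predict each point as its reference point displaced by the motion vector."""
    return [tuple(r + m for r, m in zip(ref, mv)) for ref in ref_points]


def inter_residuals(
    current: Sequence[Vec3], ref_points: Sequence[Vec3], mv: Vec3
) -> List[Vec3]:
    """Encoder side: actual coordinates minus predicted coordinates."""
    predicted = inter_predict(ref_points, mv)
    return [tuple(c - p for c, p in zip(cur, pred))
            for cur, pred in zip(current, predicted)]


def inter_reconstruct(
    ref_points: Sequence[Vec3], mv: Vec3, residuals: Sequence[Vec3]
) -> List[Vec3]:
    """Decoder side: prediction plus signaled residual."""
    predicted = inter_predict(ref_points, mv)
    return [tuple(p + r for p, r in zip(pred, res))
            for pred, res in zip(predicted, residuals)]
```

When the motion-compensated reference points align well with the current points, most residual components in this sketch are zero or small.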

In some examples, the parameters for the leaf node also include a motion vector difference (MVD). The MVD indicates the difference between the motion vector of the leaf node and a predicted motion vector. The predicted motion vector is the motion vector of a neighboring node of the octree. The parameters for the leaf node may include an index that identifies the neighboring node.

In other examples, analogous to merge mode in conventional video coding, the parameters for the leaf node do not include an MVD, and the motion vector of the leaf node may be assumed to be the same as the motion vector of the identified neighboring node.
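As a non-limiting illustration of the two motion vector signaling options described above, the following Python sketch derives a leaf node's motion vector either from a neighboring node's motion vector plus a signaled difference, or directly from the neighboring node's motion vector when no difference is signaled. The function name `derive_motion_vector` and the representation of the neighbor candidates as a list are assumptions made for this sketch.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[int, int, int]


def derive_motion_vector(
    neighbor_mvs: Sequence[Vec3],
    neighbor_index: int,
    mvd: Optional[Vec3] = None,
) -> Vec3:
    """Return the leaf node's motion vector.

    `neighbor_index` selects the neighboring node whose motion vector is
    the predictor. With `mvd` omitted, the predictor is reused unchanged
    (merge-like behavior); otherwise the difference is added to it.
    """
    predictor = neighbor_mvs[neighbor_index]
    if mvd is None:
        return predictor
    return tuple(p + d for p, d in zip(predictor, mvd))
```

In the merge-like case only the neighbor index needs to be conveyed, which is the potential saving relative to also conveying a difference.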

In some examples, signaling of the residual may be skipped. In some such examples where the angular mode is used, signaling of the primary residual may be skipped while the secondary residual is still signaled.

FIG. 1 is a block diagram illustrating an example encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) point cloud data, i.e., to supporting point cloud compression. In general, point cloud data includes any data for processing a point cloud. The coding may be effective in compressing and/or decompressing point cloud data.

As shown in FIG. 1, system 100 includes a source device 102 and a destination device 116. Source device 102 provides encoded point cloud data to be decoded by destination device 116. In particular, in the example of FIG. 1, source device 102 provides the point cloud data to destination device 116 via computer-readable storage medium 110. Source device 102 and destination device 116 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, terrestrial or marine vehicles, spacecraft, aircraft, robots, LIDAR devices, satellites, or the like. In some cases, source device 102 and destination device 116 may be equipped for wireless communication.

In the example of FIG. 1, source device 102 includes a data source 104, a memory 106, a G-PCC encoder 200, and an output interface 108. Destination device 116 includes an input interface 122, a G-PCC decoder 300, a memory 120, and a data consumer 118. According to this disclosure, G-PCC encoder 200 of source device 102 and G-PCC decoder 300 of destination device 116 may be configured to apply the techniques of this disclosure related to a hybrid tree coding method that combines octree coding and predictive coding for enhanced inter/intra prediction at the block level for point cloud compression. Thus, source device 102 represents an example of an encoding device, while destination device 116 represents an example of a decoding device. In other examples, source device 102 and destination device 116 may include other components or arrangements. For example, source device 102 may receive data (e.g., point cloud data) from an internal or external source. Likewise, destination device 116 may interface with an external data consumer rather than including a data consumer in the same device.

System 100 as shown in FIG. 1 is merely one example. In general, other digital encoding and/or decoding devices may perform the techniques of this disclosure related to a hybrid tree coding method that combines octree coding and predictive coding for enhanced inter/intra prediction at the block level for point cloud compression. Source device 102 and destination device 116 are merely examples of such devices in which source device 102 generates coded data for transmission to destination device 116. This disclosure refers to a device that performs coding (encoding and/or decoding) of data as a "coding" device. Thus, G-PCC encoder 200 and G-PCC decoder 300 represent examples of coding devices, specifically an encoder and a decoder, respectively. In some examples, source device 102 and destination device 116 may operate in a substantially symmetrical manner such that each of source device 102 and destination device 116 includes encoding and decoding components. Hence, system 100 may support one-way or two-way transmission between source device 102 and destination device 116, e.g., for streaming, playback, broadcasting, telephony, navigation, and other applications.

In general, data source 104 represents a source of data (i.e., raw, unencoded point cloud data) and may provide a sequential series of "frames" of the data to G-PCC encoder 200, which encodes the data for the frames. Data source 104 of source device 102 may include a point cloud capture device, such as any of a variety of cameras or sensors, e.g., a 3D scanner or a light detection and ranging (LIDAR) device, one or more video cameras, an archive containing previously captured data, and/or a data feed interface for receiving data from a data content provider. Alternatively or additionally, point cloud data may be computer-generated from scanner, camera, sensor, or other data. For example, data source 104 may generate computer graphics-based data as the source data, or may produce a combination of live data, archived data, and computer-generated data. In each case, G-PCC encoder 200 encodes the captured, pre-captured, or computer-generated data. G-PCC encoder 200 may rearrange the frames from the received order (sometimes referred to as "display order") into a coding order for coding. G-PCC encoder 200 may generate one or more bitstreams including the encoded data. Source device 102 may then output the encoded data via output interface 108 onto computer-readable storage medium 110 for reception and/or retrieval by, e.g., input interface 122 of destination device 116.

Memory 106 of source device 102 and memory 120 of destination device 116 may represent general-purpose memories. In some examples, memory 106 and memory 120 may store raw data, e.g., raw data from data source 104 and raw, decoded data from G-PCC decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions executable by, e.g., G-PCC encoder 200 and G-PCC decoder 300, respectively. Although memory 106 and memory 120 are shown separately from G-PCC encoder 200 and G-PCC decoder 300 in this example, it should be understood that G-PCC encoder 200 and G-PCC decoder 300 may also include internal memories for functionally similar or equivalent purposes. Furthermore, memory 106 and memory 120 may store encoded data, e.g., data output from G-PCC encoder 200 and input to G-PCC decoder 300. In some examples, portions of memory 106 and memory 120 may be allocated as one or more buffers, e.g., to store raw, decoded, and/or encoded data. For instance, memory 106 and memory 120 may store data representing a point cloud.

Computer-readable storage medium 110 may represent any type of medium or device capable of transporting the encoded data from source device 102 to destination device 116. In one example, computer-readable storage medium 110 represents a communication medium that enables source device 102 to transmit encoded data directly to destination device 116 in real time, e.g., via a radio frequency network or a computer-based network. According to a communication standard, such as a wireless communication protocol, output interface 108 may modulate a transmission signal including the encoded data, and input interface 122 may demodulate the received transmission signal. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 102 to destination device 116.

In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded data.

In some examples, source device 102 may output encoded data to file server 114 or another intermediate storage device that may store the encoded data generated by source device 102. Destination device 116 may access stored data from file server 114 via streaming or download. File server 114 may be any type of server device capable of storing encoded data and transmitting that encoded data to destination device 116. File server 114 may represent a web server (e.g., for a website), a File Transfer Protocol (FTP) server, a content delivery network device, or a network attached storage (NAS) device. Destination device 116 may access encoded data from file server 114 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both that is suitable for accessing encoded data stored on file server 114. File server 114 and input interface 122 may be configured to operate according to a streaming transmission protocol, a download transmission protocol, or a combination thereof.

Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee(TM)), a Bluetooth(TM) standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device to perform the functionality attributed to G-PCC encoder 200 and/or output interface 108, and destination device 116 may include an SoC device to perform the functionality attributed to G-PCC decoder 300 and/or input interface 122.

The techniques of this disclosure may be applied to encoding and decoding in support of any of a variety of applications, such as communication between autonomous vehicles, communication between scanners, cameras, sensors, and processing devices such as local or remote servers, geographic mapping, or other applications.

Input interface 122 of destination device 116 receives an encoded bitstream from computer-readable storage medium 110 (e.g., a communication medium, storage device 112, file server 114, or the like). The encoded bitstream may include signaling information defined by G-PCC encoder 200, which is also used by G-PCC decoder 300, such as syntax elements having values that describe characteristics and/or processing of coded units (e.g., slices, pictures, groups of pictures, sequences, or the like). Data consumer 118 uses the decoded data. For example, data consumer 118 may use the decoded data to determine the locations of physical objects. In some examples, data consumer 118 may comprise a display to present imagery based on a point cloud.

G-PCC encoder 200 and G-PCC decoder 300 each may be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of G-PCC encoder 200 and G-PCC decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device. A device including G-PCC encoder 200 and/or G-PCC decoder 300 may comprise one or more integrated circuits, microprocessors, and/or other types of devices.

G-PCC encoder 200 and G-PCC decoder 300 may operate according to a coding standard, such as a video point cloud compression (V-PCC) standard or a geometry point cloud compression (G-PCC) standard. This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data. An encoded bitstream generally includes a series of values for syntax elements representative of coding decisions (e.g., coding modes).

This disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used to decode encoded data. That is, G-PCC encoder 200 may signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time, such as might occur when storing syntax elements to storage device 112 for later retrieval by destination device 116.

ISO/IEC MPEG (JTC1/SC29/WG11) is studying the potential need for standardization of point cloud coding technology with a compression capability that significantly exceeds that of the current approaches. The group is working together on this exploration activity in a collaborative effort known as the 3-Dimensional Graphics Team (3DG) to evaluate compression technology designs proposed by experts in this area.

Point cloud compression activities are categorized into two different approaches. The first approach is "video point cloud compression" (V-PCC), which segments the 3D object and projects the segments onto multiple 2D planes (represented as "patches" in the 2D frame), which are further coded by a legacy 2D video codec such as a High Efficiency Video Coding (HEVC) (ITU-T H.265) codec. The second approach is "geometry-based point cloud compression" (G-PCC), which directly compresses the 3D geometry, i.e., the positions of a set of points in 3D space, and the associated attribute values (for each point associated with the 3D geometry). G-PCC addresses the compression of point clouds in both Category 1 (static point clouds) and Category 3 (dynamically acquired point clouds). A recent draft of the G-PCC standard is available in "Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression", ISO/IEC JTC 1/SC29/WG 7 MDS19617, Teleconference, October 2020, and a description of the codec is available in G-PCC Codec Description, ISO/IEC JTC 1/SC29/WG 7 MDS19620, Teleconference, October 2020.

A point cloud contains a set of points in 3D space and may have attributes associated with the points. The attributes may be color information such as R, G, B or Y, Cb, Cr, reflectance information, or other attributes. Point clouds may be captured by a variety of cameras or sensors, such as LIDAR sensors and 3D scanners, and may also be computer-generated. Point cloud data are used in a variety of applications including, but not limited to, construction (modeling), graphics (3D models for visualization and animation), and the automotive industry (LIDAR sensors used to help in navigation).

The 3D space occupied by point cloud data may be enclosed by a virtual bounding box. The positions of the points in the bounding box may be represented with a certain precision; therefore, the positions of one or more points may be quantized based on that precision. At the smallest level, the bounding box is split into voxels, where a voxel is the smallest unit of space, represented by a unit cube. A voxel in the bounding box may be associated with zero, one, or more than one point. The bounding box may be split into multiple cube/cuboid regions, which may be called tiles. Each tile may be coded into one or more slices. The partitioning of the bounding box into slices and tiles may be based on the number of points in each partition, or on other considerations (e.g., a particular region may be coded as a tile). The slice regions may be further partitioned using splitting decisions similar to those in video codecs.
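The quantization of point positions to voxels described above can be sketched as follows. This is a minimal illustration, not the G-PCC procedure: the function name `voxelize`, the `origin` and `voxel_size` parameters, and the use of floor division are all assumptions made for this example.

```python
from collections import Counter


def voxelize(points, origin, voxel_size):
    """Count how many points fall into each voxel of a regular grid.

    `origin` is the minimum corner of the bounding box and `voxel_size`
    is the edge length of one voxel (both hypothetical parameters).
    """
    counts = Counter()
    for x, y, z in points:
        # Floor division maps a coordinate to the index of its voxel.
        index = (
            int((x - origin[0]) // voxel_size),
            int((y - origin[1]) // voxel_size),
            int((z - origin[2]) // voxel_size),
        )
        counts[index] += 1
    return counts


points = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (1.5, 0.0, 2.9)]
occupied = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=1.0)
```

In this sketch the first two points land in the same voxel, which matches the statement that a voxel may be associated with more than one point; voxels that never appear in `occupied` are associated with zero points.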

FIG. 2 provides an overview of G-PCC encoder 200. FIG. 3 provides an overview of G-PCC decoder 300. The modules shown are logical and do not necessarily correspond one-to-one to implemented code in the reference implementation of the G-PCC codec, i.e., the TMC13 test model software studied by ISO/IEC MPEG (JTC1/SC29/WG11).

In both G-PCC encoder 200 and G-PCC decoder 300, point cloud positions are coded first. Attribute coding depends on the decoded geometry. Surface approximation analysis unit 212 and RAHT unit 218 of FIG. 2, and surface approximation synthesis unit 310 and RAHT unit 314 of FIG. 3, are options typically used for Category 1 data. LOD generation unit 220 and lifting unit 222 of FIG. 2, and LOD generation unit 316 and inverse lifting unit 318 of FIG. 3, are options typically used for Category 3 data. All the other modules are common between Categories 1 and 3.

For geometry, two different types of coding techniques exist: octree coding and predictive-tree coding. In the following, this disclosure focuses on octree coding. For Category 3 data, the compressed geometry is typically represented as an octree from the root all the way down to a leaf level of individual voxels. For Category 1 data, the compressed geometry is typically represented by a pruned octree (i.e., an octree from the root down to a leaf level of blocks larger than voxels) plus a model that approximates the surface within each leaf of the pruned octree. In this way, both Category 1 and Category 3 data share the octree coding mechanism, while Category 1 data may in addition approximate the voxels within each leaf with a surface model. The surface model used is a triangulation comprising 1 to 10 triangles per block, resulting in a triangle soup. The Category 1 geometry codec is therefore known as the Trisoup geometry codec, while the Category 3 geometry codec is known as the octree geometry codec.

FIG. 4 is a conceptual diagram illustrating an example octree split for geometry coding. Octree 400 includes eight child nodes. Some of those child nodes, such as node 402, have no child nodes. However, others of the child nodes, such as node 404, have child nodes, some of the child nodes of node 404 also have child nodes, and so on.

At each node of an octree, an occupancy is signaled (when not inferred) for one or more of its child nodes (up to eight nodes). Multiple neighborhoods are specified, including (a) nodes that share a face with the current octree node and (b) nodes that share a face, edge, or vertex with the current octree node. Within each neighborhood, the occupancy of a node and/or its children may be used to predict the occupancy of the current node or its children. For points that are sparsely populated in certain nodes of the octree, the codec also supports a direct coding mode in which the 3D positions of the points are encoded directly. A flag may be signaled to indicate that a direct mode is signaled. At the lowest level, the number of points associated with the octree node/leaf node may also be coded.
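The per-node occupancy described above can be pictured as one bit per child. The following is a minimal sketch under stated assumptions: the bit ordering (x in bit 2, y in bit 1, z in bit 0) and the helper names `child_index` and `occupancy_byte` are hypothetical and are not taken from the G-PCC specification, and neighbor-based prediction and entropy coding are omitted.

```python
def child_index(point, center):
    """Return which of the eight children of a node contains `point`.

    The octant is chosen by comparing each coordinate with the node center.
    The bit layout is an assumption made for this sketch.
    """
    x, y, z = point
    cx, cy, cz = center
    return (int(x >= cx) << 2) | (int(y >= cy) << 1) | int(z >= cz)


def occupancy_byte(points, center):
    """Build an 8-bit mask with bit k set when child k holds at least one point."""
    occupancy = 0
    for point in points:
        occupancy |= 1 << child_index(point, center)
    return occupancy


# Two points in opposite corners of a node centered at (2, 2, 2).
mask = occupancy_byte([(0, 0, 0), (3, 3, 3)], center=(2, 2, 2))
```

An encoder following this idea would recurse only into children whose bit is set, and a node with very few points could instead be a candidate for the direct coding mode mentioned above.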

Once the geometry is coded, the attributes corresponding to the geometry points are coded. When there are multiple attribute points corresponding to one reconstructed/decoded geometry point, an attribute value may be derived that is representative of the reconstructed point.

There are three attribute coding methods in G-PCC: Region Adaptive Hierarchical Transform (RAHT) coding, interpolation-based hierarchical nearest-neighbor prediction (Predicting Transform), and interpolation-based hierarchical nearest-neighbor prediction with an update/lifting step (Lifting Transform). RAHT and Lifting are typically used for Category 1 data, while Predicting is typically used for Category 3 data. However, either method may be used for any data, and, as with the geometry codecs in G-PCC, the attribute coding method used to code the point cloud is specified in the bitstream.

The coding of the attributes may be conducted in levels of detail (LOD), where with each level of detail a finer representation of the point cloud attributes may be obtained. Each level of detail may be specified based on a distance metric from the neighboring nodes or based on a sampling distance.

At G-PCC encoder 200, the residuals obtained as the output of the coding methods for the attributes are quantized. The quantized residuals may be coded using context-adaptive arithmetic coding.
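The quantization of residuals mentioned above can be illustrated with a uniform scalar quantizer. This is only a sketch: the text does not specify the quantizer, so the fixed `step` parameter and the round-to-nearest rule are assumptions made for this example, and the subsequent arithmetic coding stage is not shown.

```python
def quantize(residuals, step):
    """Map each residual to an integer level (round to nearest, assumed rule)."""
    return [int(round(value / step)) for value in residuals]


def dequantize(levels, step):
    """Reconstruct approximate residuals from the integer levels."""
    return [level * step for level in levels]


levels = quantize([7, -3, 12], step=4)
approx = dequantize(levels, step=4)
```

The reconstruction differs from the input by at most half a step, which is the loss a decoder would have to accept in exchange for fewer symbols to entropy code.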

In the example of FIG. 2, G-PCC encoder 200 may include a coordinate transform unit 202, a color transform unit 204, a voxelization unit 206, an attribute transfer unit 208, an octree analysis unit 210, a surface approximation analysis unit 212, an arithmetic encoding unit 214, a geometry reconstruction unit 216, a RAHT unit 218, an LOD generation unit 220, a lifting unit 222, a coefficient quantization unit 224, and an arithmetic encoding unit 226.

As shown in the example of FIG. 2, G-PCC encoder 200 may receive a set of positions and a set of attributes. The positions may include coordinates of points in a point cloud. The attributes may include information about the points in the point cloud, such as colors associated with the points in the point cloud.

Coordinate transform unit 202 may apply a transform to the coordinates of the points to transform the coordinates from an initial domain to a transform domain. This disclosure may refer to the transformed coordinates as transform coordinates. Color transform unit 204 may apply a transform to convert the color information of the attributes to a different domain. For example, color transform unit 204 may convert color information from an RGB color space to a YCbCr color space.

Furthermore, in the example of FIG. 2, voxelization unit 206 may voxelize the transform coordinates. Voxelization of the transform coordinates may include quantization and removing some points of the point cloud. In other words, multiple points of the point cloud may be subsumed within a single "voxel," which may thereafter be treated in some respects as one point. Furthermore, octree analysis unit 210 may generate an octree based on the voxelized transform coordinates. Additionally, in the example of FIG. 2, surface approximation analysis unit 212 may analyze the points to potentially determine a surface representation of sets of the points. Arithmetic encoding unit 214 may entropy encode syntax elements representing the information of the octree and/or the surfaces determined by surface approximation analysis unit 212. G-PCC encoder 200 may output these syntax elements in a geometry bitstream.

Geometry reconstruction unit 216 may reconstruct transform coordinates of points in the point cloud based on the octree, data indicating the surfaces determined by surface approximation analysis unit 212, and/or other information. The number of transform coordinates reconstructed by geometry reconstruction unit 216 may differ from the original number of points of the point cloud because of voxelization and surface approximation. This disclosure may refer to the resulting points as reconstructed points. Attribute transfer unit 208 may transfer attributes of the original points of the point cloud to the reconstructed points of the point cloud.

Furthermore, RAHT unit 218 may apply RAHT coding to the attributes of the reconstructed points. Alternatively or additionally, LOD generation unit 220 and lifting unit 222 may apply LOD processing and lifting, respectively, to the attributes of the reconstructed points. RAHT unit 218 and lifting unit 222 may generate coefficients based on the attributes. Coefficient quantization unit 224 may quantize the coefficients generated by RAHT unit 218 or lifting unit 222. Arithmetic encoding unit 226 may apply arithmetic coding to syntax elements representing the quantized coefficients. G-PCC encoder 200 may output these syntax elements in an attribute bitstream.

In the example of FIG. 3, G-PCC decoder 300 may include a geometry arithmetic decoding unit 302, an attribute arithmetic decoding unit 304, an octree synthesis unit 306, an inverse quantization unit 308, a surface approximation synthesis unit 310, a geometry reconstruction unit 312, a RAHT unit 314, an LOD generation unit 316, an inverse lifting unit 318, an inverse coordinate transform unit 320, and an inverse color transform unit 322.

G-PCC decoder 300 may obtain a geometry bitstream and an attribute bitstream. Geometry arithmetic decoding unit 302 of G-PCC decoder 300 may apply arithmetic decoding (e.g., context-adaptive binary arithmetic coding (CABAC) or another type of arithmetic decoding) to syntax elements in the geometry bitstream. Similarly, attribute arithmetic decoding unit 304 may apply arithmetic decoding to syntax elements in the attribute bitstream.

Octree synthesis unit 306 may synthesize an octree based on syntax elements parsed from the geometry bitstream. In instances where surface approximation is used in the geometry bitstream, surface approximation synthesis unit 310 may determine a surface model based on syntax elements parsed from the geometry bitstream and based on the octree.

Furthermore, geometry reconstruction unit 312 may perform a reconstruction to determine coordinates of points in a point cloud. Inverse coordinate transform unit 320 may apply an inverse transform to the reconstructed coordinates to convert the reconstructed coordinates (positions) of the points in the point cloud from the transform domain back into the initial domain.

Additionally, in the example of FIG. 3, inverse quantization unit 308 may inverse quantize attribute values. The attribute values may be based on syntax elements obtained from the attribute bitstream (e.g., including syntax elements decoded by attribute arithmetic decoding unit 304).

Depending on how the attribute values are encoded, RAHT unit 314 may perform RAHT coding to determine, based on the inverse quantized attribute values, color values for points of the point cloud. Alternatively, LOD generation unit 316 and inverse lifting unit 318 may determine color values for points of the point cloud using a level-of-detail-based technique.

Furthermore, in the example of FIG. 3, inverse color transform unit 322 may apply an inverse color transform to the color values. The inverse color transform may be the inverse of the color transform applied by color transform unit 204 of G-PCC encoder 200. For example, color transform unit 204 may transform color information from an RGB color space to a YCbCr color space. Accordingly, inverse color transform unit 322 may transform color information from the YCbCr color space to the RGB color space.
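The forward and inverse color transforms described above can be sketched as a matched pair of functions. The text does not state which conversion matrix is used, so the BT.601 full-range coefficients below are an assumption chosen for illustration, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward transform (assumed BT.601 full-range coefficients, 8-bit offset)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772 + 128.0
    cr = (r - y) / 1.402 + 128.0
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform: undoes rgb_to_ycbcr up to floating-point error."""
    r = y + 1.402 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    # Solve the luma equation for the remaining component.
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


original = (200.0, 120.0, 30.0)
restored = ycbcr_to_rgb(*rgb_to_ycbcr(*original))
```

Here `restored` matches `original` to within floating-point error; a real codec working on integer samples would also need a rounding and clipping policy, which this sketch leaves out.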

The various units of FIG. 2 and FIG. 3 are illustrated to assist with understanding the operations performed by G-PCC encoder 200 and G-PCC decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality and are preset in the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

Predictive geometry coding was introduced as an alternative to octree geometry coding. In predictive geometry coding, the nodes are arranged in a tree structure (which defines the prediction structure), and various prediction strategies are used to predict the coordinates of each node in the tree with respect to its predictors.

FIG. 5 shows an example of a prediction tree 410, represented as a directed graph in which the arrows point in the prediction direction. Node 412 is the root vertex and has no predictors. Nodes 414A and 414B have two children. The dashed-line node has three children. The white nodes have one child, and nodes 418A-418E are leaf nodes that have no children. Every node has only one parent node.

Four prediction strategies are specified for each node based on its parent (p0), grandparent (p1), and great-grandparent (p2):

・No prediction/zero prediction (0)
・Delta prediction (p0)
・Linear prediction (2*p0-p1)
・Parallelogram prediction (2*p0+p1-p2)

G-PCC encoder 200 may employ any algorithm to generate the prediction tree. The algorithm may be chosen based on the application or use case, and several strategies may be used. Some strategies are described in the G-PCC Codec Description, ISO/IEC JTC 1/SC29/WG 7 MDS19620, Teleconference, October 2020.
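
As one illustration of the four prediction strategies, the following minimal sketch computes the prediction for a node from the positions of its ancestors. The names predict and residual are hypothetical, and the parallelogram formula is taken exactly as it is written in this text.

```python
from typing import Tuple

Point = Tuple[int, int, int]


def predict(mode: int, p0: Point, p1: Point, p2: Point) -> Point:
    """Predict a node position from its parent p0, grandparent p1,
    and great-grandparent p2."""
    if mode == 0:  # no prediction / zero prediction
        return (0, 0, 0)
    if mode == 1:  # delta prediction
        return p0
    if mode == 2:  # linear prediction
        return tuple(2 * a - b for a, b in zip(p0, p1))
    if mode == 3:  # parallelogram prediction, formula as written in the text
        return tuple(2 * a + b - c for a, b, c in zip(p0, p1, p2))
    raise ValueError(f"unknown prediction mode: {mode}")


def residual(actual: Point, predicted: Point) -> Point:
    """Residual coordinate values that would be coded for the node."""
    return tuple(a - p for a, p in zip(actual, predicted))
```

An encoder could, for example, evaluate all four modes for a node and keep the one with the smallest residual. That selection rule is an assumption and is not specified here.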

For each node, the residual coordinate values are coded in the bitstream, starting from the root node, in depth-first order.

Predictive geometry coding is useful mainly for Category 3 (LIDAR-acquired) point cloud data, e.g., for low-latency applications.

An angular mode may be used in predictive geometry coding, in which the characteristics of LIDAR sensors are exploited to code the prediction tree more efficiently. The position coordinates are converted to (r, φ, i) (radius, azimuth, and laser index), and prediction is performed in this domain (the residuals are coded in the r, φ, i domain). Because of rounding errors, coding in r, φ, i is not lossless, so a second set of residuals, corresponding to the Cartesian coordinates, is coded. A description of the encoding and decoding strategies used for the angular mode of predictive geometry coding is reproduced below from the G-PCC Codec Description, ISO/IEC JTC 1/SC29/WG 7 MDS19620, Teleconference, October 2020.

The method focuses on point clouds acquired using a spinning lidar model. Here, the lidar has N lasers (e.g., N = 16, 32, 64) spinning around the Z axis according to an azimuth angle φ (see FIG. 6). Each laser may have a different elevation θ(i), i = 1...N, and height σ(i), i = 1...N. Suppose that laser i hits a point M with Cartesian integer coordinates (x, y, z), defined according to the coordinate system described in FIG. 6.

This method models the position of M with three parameters (r, φ, i), which are calculated as follows.

More precisely, the method uses a quantized version of (r, φ, i), in which the three integers (the quantized radius, the quantized azimuth, and the laser index i) are calculated as follows.

where
・(q_r, o_r) and (q_φ, o_φ) are quantization parameters controlling the precision of the quantized radius and the quantized azimuth, respectively.
・sign(t) is a function that returns 1 if t is positive and (-1) otherwise.
・|t| is the absolute value of t.
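
The conversion formulas themselves are not reproduced in this text. The following minimal sketch assumes the forms given in the cited G-PCC Codec Description: r = sqrt(x^2 + y^2), φ = atan2(y, x), a laser index i that minimizes |z + σ(j) - r*tan(θ(j))|, and quantization of the form sign(t)*floor(|t|/q + o). The function names and the use of an absolute value in the laser search are assumptions.

```python
import math
from typing import Sequence, Tuple


def sign(t: float) -> int:
    """Return 1 if t is positive and -1 otherwise, as defined in the text."""
    return 1 if t > 0 else -1


def quantize(value: float, q: float, o: float) -> int:
    """Assumed quantizer: sign(value) * floor(|value| / q + o)."""
    return sign(value) * math.floor(abs(value) / q + o)


def to_angular(
    x: int, y: int, z: int,
    heights: Sequence[float],      # sigma(j) for each laser j
    tan_thetas: Sequence[float],   # tan(theta(j)) for each laser j
    q_r: float, o_r: float,
    q_phi: float, o_phi: float,
) -> Tuple[int, int, int]:
    """Convert a Cartesian point to quantized (radius, azimuth, laser index)."""
    r = math.hypot(x, y)
    phi = math.atan2(y, x)
    # Pick the laser whose beam passes closest to the point.
    i = min(
        range(len(heights)),
        key=lambda j: abs(z + heights[j] - r * tan_thetas[j]),
    )
    r_q = math.floor(r / q_r + o_r)   # the radius is never negative
    phi_q = quantize(phi, q_phi, o_phi)
    return r_q, phi_q, i
```

The laser index is zero-based here for convenience, whereas the text numbers the lasers from 1 to N.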

To avoid reconstruction mismatches due to the use of floating-point operations, the values of σ(i), i = 1...N, and tan(θ(i)), i = 1...N, are pre-computed and quantized as follows.

where (q_σ, o_σ) and (q_θ, o_θ) are quantization parameters controlling the precision of the quantized laser heights and the quantized elevation tangents, respectively.

The reconstructed Cartesian coordinates are obtained as follows.

where app_cos(.) and app_sin(.) are approximations of cos(.) and sin(.). The calculations may use a fixed-point representation, a look-up table, and linear interpolation.

Note that the reconstructed coordinates may differ from (x, y, z) for various reasons:
- quantization
- approximations
- model imprecision
- model parameter imprecision

Let (r_x, r_y, r_z) be the reconstruction residuals, defined as follows.
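
The equations for the reconstruction and for the residuals are not reproduced in this text. The following minimal sketch assumes the forms given in the cited G-PCC Codec Description: the horizontal coordinates are the dequantized radius times the cosine or sine of the dequantized azimuth, the vertical coordinate is the dequantized radius times the dequantized elevation tangent minus the dequantized laser height, and each residual is the original coordinate minus its reconstruction. Here math.cos and math.sin stand in for app_cos and app_sin, and all function and parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def reconstruct(
    r_q: int, phi_q: int, i: int,
    heights_q: Sequence[int],      # quantized sigma(j) per laser
    tan_thetas_q: Sequence[int],   # quantized tan(theta(j)) per laser
    q_r: float, q_phi: float, q_sigma: float, q_theta: float,
) -> Tuple[int, int, int]:
    """Rebuild integer Cartesian coordinates from the quantized (r, phi, i)."""
    radius = r_q * q_r
    angle = phi_q * q_phi
    x_hat = round(radius * math.cos(angle))   # stand-in for app_cos
    y_hat = round(radius * math.sin(angle))   # stand-in for app_sin
    z_hat = round(radius * tan_thetas_q[i] * q_theta - heights_q[i] * q_sigma)
    return x_hat, y_hat, z_hat


def reconstruction_residuals(
    point: Tuple[int, int, int], recon: Tuple[int, int, int]
) -> Tuple[int, int, int]:
    """(r_x, r_y, r_z): the original coordinates minus the reconstruction."""
    return tuple(p - h for p, h in zip(point, recon))


def restore(
    recon: Tuple[int, int, int], res: Tuple[int, int, int]
) -> Tuple[int, int, int]:
    """Decoder side: the reconstruction plus the residuals gives the point."""
    return tuple(h + r for h, r in zip(recon, res))
```

A real codec would replace the floating-point trigonometry with the fixed-point approximations mentioned above so that the encoder and decoder reconstruct identical values.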

In this method, the encoder (e.g., G-PCC encoder 200) proceeds as follows.

・Encode the model parameters (the quantized laser heights and the quantized elevation tangents) and the quantization parameters q_r, q_σ, q_θ, and q_φ.
・Apply the geometry prediction scheme described in the Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression, ISO/IEC JTC 1/SC29/WG 7 MDS19617, Teleconference, October 2020, to the quantized (r, φ, i) representation.
○ A new predictor that exploits the characteristics of lidar may be introduced. For instance, the rotation speed of the lidar scanner around the z-axis is usually constant. Therefore, the current quantized azimuth may be predicted as follows.

where
○ (δ_φ(k)), k = 1...K, is a set of potential speeds from which the encoder may choose. The index k may be explicitly written to the bitstream or may be inferred from the context, based on a deterministic strategy applied by both the encoder and the decoder.

○ n(j) is the number of skipped points, which may be explicitly written to the bitstream or may be inferred from the context, based on a deterministic strategy applied by both the encoder and the decoder. n(j) is also referred to later as the "phi multiplier" and, in some implementations, may be used only with the delta predictor.

・Encode, with each node, the reconstruction residuals (r_x, r_y, r_z).
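
The equation for the azimuth predictor is not reproduced in this text. The following minimal sketch assumes that the predicted azimuth is the previous quantized azimuth plus n(j) times the selected speed δ_φ(k). The function names are hypothetical, and the rule for choosing n(j) is only one possible encoder-side strategy.

```python
from typing import Sequence


def predict_azimuth(
    phi_prev_q: int, n_j: int, delta_phi: Sequence[int], k: int
) -> int:
    """Assumed predictor: phi(j) = phi(j-1) + n(j) * delta_phi(k)."""
    return phi_prev_q + n_j * delta_phi[k]


def choose_multiplier(phi_cur_q: int, phi_prev_q: int, delta: int) -> int:
    """Hypothetical encoder choice: the number of skipped steps that best
    explains the azimuth change."""
    return round((phi_cur_q - phi_prev_q) / delta)
```

With a well-chosen multiplier, the azimuth residual stays small even when the scanner skips several angular steps between consecutive points.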
G-PCC decoder 300 proceeds as follows.

・Decode the model parameters (the quantized laser heights and the quantized elevation tangents) and the quantization parameters q_r, q_σ, q_θ, and q_φ.
・Decode the quantized (r, φ, i) parameters associated with the nodes, according to the geometry prediction scheme described in the Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression, ISO/IEC JTC 1/SC29/WG 7 MDS19617, Teleconference, October 2020.
・Compute the reconstructed coordinates as described above.
・Decode the residuals (r_x, r_y, r_z).
○ As discussed in the next section, lossy compression may be supported by quantizing the reconstruction residuals (r_x, r_y, r_z).
・Compute the original coordinates (x, y, z) as follows.

Lossy compression may be achieved by applying quantization to the reconstruction residuals (r_x, r_y, r_z) or by dropping points.

The quantized reconstruction residuals are calculated as follows.

where (q_x, o_x), (q_y, o_y), and (q_z, o_z) are quantization parameters controlling the precision of the quantized residuals in x, y, and z, respectively.
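
The residual quantization equation is not reproduced in this text. The following minimal sketch assumes the same quantizer form used for the other parameters, sign(r)*floor(|r|/q + o), and assumes that the decoder dequantizes by multiplying by q. The function names are hypothetical.

```python
import math


def quantize_residual(r: int, q: float, o: float) -> int:
    """Assumed quantizer for one residual component."""
    s = 1 if r > 0 else -1
    return s * math.floor(abs(r) / q + o)


def reconstruct_coordinate(coord_hat: int, r_q: int, q: float) -> float:
    """Decoder side: add the dequantized residual to the reconstruction."""
    return coord_hat + r_q * q
```

For example, with q = 4 and o = 0.5, a residual of 7 is quantized to 2 and dequantized to 8, so the decoded coordinate is off by 1. That error is the loss introduced by the quantization.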

Trellis quantization may be used to further improve the RD (rate-distortion) performance results.

The quantization parameters may change at the sequence/frame/slice/block level to achieve region-adaptive quality and for rate-control purposes.

G-PCC encoder 200 may be configured to perform motion estimation for inter prediction. The following describes the motion estimation (global and local) process applied in the InterEM software. InterEM is based on an octree-based coding extension for inter prediction. Although the motion estimation is applied to an octree-based framework, a similar process (or at least part of it) may also be applicable to predictive geometry coding.

There are two kinds of motion involved in the G-PCC InterEM software: a global motion matrix and local node motion vectors. The global motion parameters include a rotation matrix and a translation vector, which may be applied to all points in the predicted (reference) frame. The local node motion vector of a node of the octree is a motion vector that is applied only to the points within that node in the predicted (reference) frame. Details of the motion estimation algorithm in InterEM are described below.

FIG. 7 is a flowchart illustrating the motion estimation process. The inputs to the process include a predicted frame 420 and a current frame 422. G-PCC encoder 200 first estimates global motion at a global scale (424). After applying the estimated global motion to predicted frame 420 (426), G-PCC encoder 200 estimates local motion at a finer scale, the node level in the octree (428). Finally, G-PCC encoder 200 applies motion compensation with the estimated local node motion and encodes the determined motion vectors and points (430).

Aspects of FIG. 7 are described in more detail below. G-PCC encoder 200 may perform a process to estimate the global motion matrix and translation vector. In the InterEM software, the global motion matrix is defined to match feature points between the predicted frame (reference) and the current frame.

FIG. 8 shows an example of a global motion estimation process that may be performed by G-PCC encoder 200. In the example of FIG. 8, G-PCC encoder 200 finds feature points (432), samples the feature points (434), and performs motion estimation using a least mean squares (LMS) algorithm (436).

In the algorithm illustrated in FIG. 8, points that have a large position change between the predicted frame and the current frame may be defined as feature points. For each point in the current frame, the closest point in the predicted frame is found, and point pairs are built between the current frame and the predicted frame. If the distance between the paired points is greater than a threshold, the paired points are regarded as feature points.
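
The feature-point search described above may be sketched as follows. This is a brute-force illustration with hypothetical names, not the InterEM implementation. A practical encoder would likely use a spatial index such as a k-d tree for the nearest-neighbor search.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def find_feature_pairs(
    current: Sequence[Point], predicted: Sequence[Point], threshold: float
) -> List[Tuple[Point, Point]]:
    """Pair each current-frame point with its nearest predicted-frame point
    and keep the pairs that are farther apart than the threshold."""
    pairs = []
    for c in current:
        nearest = min(predicted, key=lambda p: squared_distance(c, p))
        if squared_distance(c, nearest) > threshold ** 2:
            pairs.append((c, nearest))
    return pairs
```

The returned pairs could then be subsampled and passed to the motion estimation step.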

After the feature points are found, sampling is performed on them to reduce the scale of the problem (e.g., by selecting a subset of the feature points to reduce the complexity of motion estimation). The LMS algorithm is then applied to derive the motion parameters by attempting to reduce the error between the respective feature points in the predicted frame and the current frame.

FIG. 9 shows an example process for estimating local node motion vectors. In the local node motion estimation algorithm shown in FIG. 9, the motion vectors are estimated recursively. The cost function used to select the most suitable motion vector may be based on the rate-distortion cost. In FIG. 9, path 440 shows the process for a current node that is not split into eight children, and path 442 shows the process for a current node that is split into eight children.

If the current node is not split into eight children (440), the motion vector that yields the lowest cost between the current node and the predicted node is determined. If the current node is split into eight children (442), the motion estimation algorithm is applied to each child, and the total cost under the split condition is obtained by adding up the estimated cost values of the child nodes. The decision whether to split is reached by comparing the costs of splitting and not splitting. If the node is split, each sub-node is assigned its own motion vector (or may be further split into its children). If the node is not split, the current node is assigned a motion vector.

Two parameters that affect the performance of motion vector estimation are the block size (BlockSize) and the minimum prediction unit size (MinPUSize). BlockSize defines the upper bound of the node size at which motion vector estimation is applied, and MinPUSize defines the lower bound.
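
The recursive split decision may be sketched as follows. The Node class and the search callback (which returns the best motion vector and its rate-distortion cost for a node) are hypothetical. The function is assumed to be called on nodes whose size equals BlockSize, and recursion stops once a node is no larger than MinPUSize.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MotionVector = Tuple[int, int, int]


@dataclass
class Node:
    size: int
    children: List["Node"] = field(default_factory=list)
    split: bool = False
    mv: Optional[MotionVector] = None


def estimate_local_motion(
    node: Node,
    search: Callable[[Node], Tuple[MotionVector, float]],
    min_pu_size: int,
) -> float:
    """Decide recursively whether to split `node` and return the cost of
    the cheaper option."""
    mv, cost_no_split = search(node)
    if node.children and node.size > min_pu_size:
        # Cost under the split condition: the sum of the children's costs.
        cost_split = sum(
            estimate_local_motion(child, search, min_pu_size)
            for child in node.children
        )
        if cost_split < cost_no_split:
            node.split, node.mv = True, None
            return cost_split
    # Not split: the node itself carries the motion vector.
    node.split, node.mv = False, mv
    return cost_no_split
```

When a node ends up not split, any decisions recorded on its children during the recursion are simply ignored.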

The InterEM software, which is essentially an octree coder, performs occupancy prediction and uses the global/local motion and the reference point cloud information while doing so. The InterEM software therefore does not perform direct motion compensation of points, which may include, for example, applying motion to the points in a reference frame in order to project them to the current frame. The difference between the actual points and the predicted points could then be coded, which may be more effective for inter prediction.

One or more techniques disclosed in this document may be applied independently or in combination. This disclosure proposes techniques for performing direct motion compensation while still benefiting from a flexible octree-partitioning-based coding structure. In the following, these techniques are presented mainly in the context of octree partitioning, but they may also be extended to OTQTBT (octree-quadtree-binary tree) partitioning scenarios.

G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to perform high-level partitioning and to process a mode flag. In one example of this disclosure, G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to perform octree-based partitioning (for occupancy prediction) on the current point cloud. However, it is possible to stop the octree partitioning at a certain level and then, rather than coding the occupancy, directly code the points inside that octree-leaf volume (referred to below as "direct prediction"). A leaf node size, or an octree depth value, may be signaled to specify the level at which the octree partitioning stops and the points are coded as an octree-leaf volume.

For each such octree leaf node, where the octree partitioning stops and "direct prediction" is activated, a flag may be signaled to indicate whether the set of points inside the octree-leaf volume is intra predicted or inter predicted. The maximum and minimum sizes for the octree leaves may be defined in the geometry parameter set.
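
As an illustration of stopping the partitioning at a signaled leaf node size, the following minimal sketch groups points by the octree-leaf volume that contains them. It assumes non-negative integer coordinates and cubic leaves aligned to multiples of the leaf size. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int, int]


def group_points_by_leaf(
    points: Sequence[Point], leaf_size: int
) -> Dict[Point, List[Point]]:
    """Map each leaf index (position divided by leaf_size) to the points
    that fall inside that leaf volume."""
    leaves = defaultdict(list)
    for x, y, z in points:
        key = (x // leaf_size, y // leaf_size, z // leaf_size)
        leaves[key].append((x, y, z))
    return dict(leaves)
```

Each resulting group corresponds to one octree-leaf volume, for which an intra/inter flag could then be signaled and the points coded directly.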

FIG. 10 is a conceptual diagram illustrating a high-level octree partitioning of octree 444. FIG. 10 shows an example of an octree leaf node for direct prediction that contains 13 points (O0 to O12). As a special case, the root node of the octree (no partitioning) may be coded using "direct prediction".

G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to perform intra prediction. When the flag value is set to intra, all the points inside the volume are intra predicted. For this purpose, a "local prediction tree" is generated. The generation of such a tree is non-normative (the points may be traversed in different orders, such as azimuth, Morton, or radial order, or in some other order). For each point, its prediction mode (0, 1, 2, 3), the number-of-children information, the primary residual, and the secondary residual (if angular mode is enabled) are signaled. In summary, the intra prediction is thus similar in function to predictive geometry coding.

Alternatively, a single prediction mode may be signaled for all the points in the octree-leaf volume, which may reduce the associated signaling cost. The radius value (if angular mode is enabled) or the (x, y, z) value (if angular mode is disabled) for the zero predictor may be set, for example, to the top-left point inside the octree-leaf volume. Alternatively, a zero predictor may be signaled for the octree-leaf volume, or an index indicating which point inside the octree-leaf volume to use as the zero predictor may be signaled. Moreover, clipping may be performed after prediction/reconstruction if a value falls outside the octree-leaf volume.

The syntax table for intra prediction may be similar to the syntax described for the prediction tree in the Text of ISO/IEC FDIS 23090-9 Geometry-based Point Cloud Compression, ISO/IEC JTC 1/SC29/WG 7 MDS19617, Teleconference, October 2020, the entire content of which is incorporated by reference.

G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to perform inter prediction. Assume the octree leaf has N points inside, (O(0), ..., O(N-1)). For inter prediction, motion estimation is performed at the encoder side with the current set of points in the octree-leaf volume to find the best match with a similar set of points in the reference point cloud frame (where the reference point cloud may be either not motion compensated or global-motion compensated). For inter prediction of an octree leaf, the following are signaled:

i. A reference index (if there are multiple reference point cloud frames to predict from).
ii. A motion vector difference (MVD), i.e., the difference between the actual MV and the predicted MV (as described in the discussion of MV prediction from neighbors).
iii. The number of points in the octree leaf (N).
iv. The primary residuals (and also the secondary residuals, if angular mode is enabled) for the N points (R'i), where each residual is a tuple of differences between 3D coordinates.

The following describes the motion compensation process, given the signaled reference index (if applicable) and MV for an octree node.

a. The current octree leaf has its top-left point at (X0, Y0, Z0) and dimensions (Sx, Sy, Sz), and the motion vector is MV = (MVx, MVy, MVz). The corresponding reference block in the reference point cloud frame therefore has its top-left point at (Xr, Yr, Zr) = (X0 - MVx, Y0 - MVy, Z0 - MVz) and a size of (Sx, Sy, Sz).

b. Fetch all the points inside this reference block and arrange them in a 1D array. The order may, for example, be predetermined/fixed or signaled for the octree leaf. Suppose there are M such points (R0, ..., R(M-1)) with coordinates in the reference frame, where Ri is a triplet giving the 3D coordinates of the i-th point, for i = 0...(M-1), as shown in FIG. 12.

c. All the points are motion compensated by applying the signaled MV, and the results are used as the predicted geometry positions (Pi), i.e., Pi = Ri + MV, as shown in FIG. 13.

d. If angular mode is enabled, the corresponding (r, φ, i) representation is derived for all M points.
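
Steps a to c may be sketched as follows. The half-open block bounds and the lexicographic ordering of the fetched points are assumptions made for illustration, since the text leaves the ordering to be predetermined or signaled. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def fetch_reference_points(
    ref_points: Sequence[Point], origin: Point, size: Point, mv: Point
) -> List[Point]:
    """Steps a and b: collect the reference-frame points inside the block
    whose top-left corner is origin - mv, as an ordered 1D list."""
    xr, yr, zr = (o - m for o, m in zip(origin, mv))
    sx, sy, sz = size
    inside = [
        p for p in ref_points
        if xr <= p[0] < xr + sx and yr <= p[1] < yr + sy and zr <= p[2] < zr + sz
    ]
    return sorted(inside)  # assumed fixed order: lexicographic


def motion_compensate(points: Sequence[Point], mv: Point) -> List[Point]:
    """Step c: Pi = Ri + MV."""
    return [tuple(c + m for c, m in zip(p, mv)) for p in points]
```

Because the reference block is offset by -MV, the motion-compensated points fall back inside the volume of the current octree leaf.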

In FIG. 12, the current set of points is labeled O0 to O12 and the reference set of points is labeled R0 to R12, with N = M = 13. In FIG. 13, the current set of points is labeled O0 to O12 and the motion-compensated reference set of points is labeled P0 to P12, with N = M = 13.

Three scenarios are possible here:

i. N = M (the current octree node and the reference block have the same number of points).

ii. N > M.

iii. N < M.

The first scenario, N = M, is as follows. The residuals (the primary residuals and, if applicable, the secondary residuals) are added directly to the motion-compensated points to generate the reconstruction: Pi + R'i.

The second scenario, N > M, is as follows. The motion-compensated points (Pi) in the 1D array are extended using the last value P(M-1), i.e., [P'0, ..., P'(M-1), P'(M), ..., P'(N-1)] = [P0, ..., P(M-1), P(M-1), ..., P(M-1)], and the residuals are then added directly to generate the reconstruction: P'i + R'i. Alternatively, a zero predictor is used for the extension.

The third scenario, N < M, is as follows. The residuals are added directly to the first N points, i.e., [P0, ..., P(N-1)], to generate the reconstruction: Pi + R'i.
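
The three scenarios may be handled by one routine, sketched below with a hypothetical function name. Using a zero predictor when the reference block is empty (M = 0) is an assumption, because padding with the last value is not possible in that case.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def reconstruct_leaf(
    predicted: Sequence[Point], residuals: Sequence[Point]
) -> List[Point]:
    """Add the N residuals to the M motion-compensated points, truncating
    or padding the predictions so that both lists have length N."""
    n = len(residuals)
    if len(predicted) >= n:
        # N = M or N < M: use the first N predictions.
        preds = list(predicted[:n])
    else:
        # N > M: repeat the last prediction (zero predictor if M = 0).
        pad = predicted[-1] if predicted else (0, 0, 0)
        preds = list(predicted) + [pad] * (n - len(predicted))
    return [
        tuple(p + r for p, r in zip(pt, res))
        for pt, res in zip(preds, residuals)
    ]
```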

G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to perform MV prediction from neighbors. The MV of the current octree leaf may be predicted from the MVs of spatio-temporally neighboring inter octree leaves, and the corresponding MV difference may be signaled. An MV prediction index may be signaled if there are multiple spatio-temporal candidates. In addition to the spatio-temporal neighbor candidates, previously used MV candidates may be added to the MV candidate list based on recent history.

MV information may also be merged with that of a spatio-temporal neighbor by means of a signaled "merge flag". A merge index may be signaled if there are multiple spatio-temporal candidates.
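
The decoder-side derivation of the MV may be sketched as follows. The candidate list is assumed to have been built already from neighboring leaves and recent history, and the function name and argument layout are hypothetical rather than actual syntax elements.

```python
from typing import Sequence, Tuple

MotionVector = Tuple[int, int, int]


def derive_mv(
    candidates: Sequence[MotionVector],
    merge_flag: bool,
    index: int = 0,
    mvd: MotionVector = (0, 0, 0),
) -> MotionVector:
    """Return the MV for the current octree leaf.

    With merge_flag set, the selected candidate is reused as is. Otherwise
    the signaled MV difference is added to the selected predictor."""
    predictor = candidates[index]
    if merge_flag:
        return predictor
    return tuple(p + d for p, d in zip(predictor, mvd))
```

On the encoder side, the MVD would correspondingly be the actual MV minus the chosen predictor.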

G-PCC encoder 200 and/or G-PCC decoder 300 may be configured to skip the primary residuals.

With good inter prediction, the primary residuals, which apply when angular mode is enabled, are typically small or even close to zero. In such cases, the primary residuals may be skipped entirely for all the points in the octree-leaf volume. A primary_residual_skip flag may therefore be signaled for the octree-leaf volume. In this case, the difference between the original points and the predicted points is coded entirely in the secondary residuals.

Alternatively, primary_residual_skip_flag may be signaled at an octree level higher than the octree-leaf volume, in which case it applies to the one or more octree leaves associated with that octree level.

The table below is the syntax table for inter-predicted octree leaves.

Examples in the various aspects of this disclosure may be used individually or in any combination.

FIG. 14 is a conceptual diagram illustrating an example distance measurement system 600 that may be used with one or more techniques of this disclosure. In the example of FIG. 14, distance measurement system 600 includes an illuminator 602 and a sensor 604. Illuminator 602 may emit light 606. In some examples, illuminator 602 may emit light 606 as one or more laser beams. Light 606 may be in one or more wavelengths, such as an infrared wavelength or a visible light wavelength. In other examples, light 606 is not coherent laser light. When light 606 encounters an object, such as object 608, light 606 creates returning light 610. Returning light 610 may include backscattered and/or reflected light. Returning light 610 may pass through a lens 611 that directs returning light 610 to create an image 612 of object 608 on sensor 604. Sensor 604 generates signals 618 based on image 612. Image 612 may comprise a set of points (e.g., as represented by the dots in image 612 of FIG. 14).

In some examples, illuminator 602 and sensor 604 may be mounted on a spinning structure so that illuminator 602 and sensor 604 capture a 360-degree view of an environment. In other examples, distance measurement system 600 may include one or more optical components (e.g., mirrors, collimators, diffraction gratings, etc.) that enable illuminator 602 and sensor 604 to detect objects within a specific range (e.g., up to 360 degrees). Although the example of FIG. 14 shows only a single illuminator 602 and sensor 604, distance measurement system 600 may include multiple sets of illuminators and sensors.

In some examples, illuminator 602 generates a structured light pattern. In such examples, distance measurement system 600 may include multiple sensors 604 upon which respective images of the structured light pattern are formed. Distance measurement system 600 may use the disparities between the images of the structured light pattern to determine the distance to an object 608 from which the structured light pattern backscatters. Structured-light-based distance measurement systems may have a high level of accuracy (e.g., accuracy in the sub-millimeter range) when object 608 is relatively close to sensor 604 (e.g., 0.2 meters to 2 meters). This high level of accuracy may be useful in facial recognition applications, such as unlocking mobile devices (e.g., mobile phones, tablet computers, etc.), and for security applications.

In some examples, distance measurement system 600 is a time-of-flight (ToF)-based system. In some examples where distance measurement system 600 is a ToF-based system, illuminator 602 generates pulses of light. In other words, illuminator 602 may modulate the amplitude of emitted light 606. In such examples, sensor 604 detects returning light 610 from the pulses of light 606 generated by illuminator 602. Distance measurement system 600 may then determine the distance to object 608, from which light 606 backscatters, based on the delay between when light 606 was emitted and when it was detected and on the known speed of light in air. In some examples, rather than (or in addition to) modulating the amplitude of emitted light 606, illuminator 602 may modulate the phase of emitted light 606. In such examples, sensor 604 may detect the phase of returning light 610 from object 608 and determine the distances to points on object 608 using the speed of light and the time difference between when illuminator 602 generated light 606 at a specific phase and when sensor 604 detected returning light 610 at that specific phase.
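
For a pulsed ToF system, the distance computation may be sketched as follows. The light travels to the object and back, so the one-way distance is half the round-trip path. The constant is the speed of light in vacuum, which is used here as an approximation of the speed in air (the two differ by roughly 0.03%). The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # in vacuum; slightly lower in air


def tof_distance(delay_s: float, speed: float = SPEED_OF_LIGHT_M_PER_S) -> float:
    """Distance in meters to the object, given the round-trip delay in
    seconds between emission and detection."""
    return speed * delay_s / 2.0
```

For example, a round-trip delay of one microsecond corresponds to a distance of about 150 meters.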

In other examples, a point cloud may be generated without using illuminator 602. For instance, in some examples, sensor 604 of distance measurement system 600 may include two or more optical cameras. In such examples, distance measurement system 600 may use the optical cameras to capture stereo images of the environment, including object 608. Distance measurement system 600 (e.g., point cloud generator 620) may then calculate the disparities between locations in the stereo images. Distance measurement system 600 may then use the disparities to determine distances to the locations shown in the stereo images. From these distances, point cloud generator 620 may generate a point cloud.
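As a hedged illustration of the stereo approach described above, the following Python sketch converts a disparity into a depth and then into a 3D point. It assumes a rectified stereo pair and a pinhole camera model, in which depth equals focal length times baseline divided by disparity. The parameter names (focal_px, baseline_m, cx, cy) are hypothetical and do not appear in this disclosure.

```python
from typing import Optional, Tuple


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> Optional[float]:
    """Depth in meters for a rectified stereo pair; None if disparity is invalid."""
    if disparity_px <= 0:
        return None  # zero or negative disparity carries no usable depth
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, disparity_px: float, focal_px: float,
                 baseline_m: float, cx: float, cy: float
                 ) -> Optional[Tuple[float, float, float]]:
    """Turn one pixel (u, v) with a disparity into a 3D point (x, y, z)."""
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    if z is None:
        return None
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)
```

Applying back_project to every pixel with a valid disparity yields the set of points from which a point cloud could be assembled.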

センサー604は、色および反射情報などの、オブジェクト608の他の属性も検出し得る。図14の例では、点群生成器620は、センサー604によって生成された信号618に基づいて点群を生成し得る。距離測定システム600および/または点群生成器620は、データソース104(図1)の一部を形成し得る。 Sensor 604 may also detect other attributes of object 608, such as color and reflective information. In the example of FIG. 14, point cloud generator 620 may generate a point cloud based on signal 618 generated by sensor 604. Distance measurement system 600 and/or point cloud generator 620 may form part of data source 104 (FIG. 1).

FIG. 15 is a conceptual diagram illustrating an example vehicle-based scenario in which one or more techniques of this disclosure may be used. In the example of FIG. 15, vehicle 700 includes a laser package 702, such as a LIDAR system. Although not shown in the example of FIG. 15, vehicle 700 may also include a data source and a G-PCC encoder, such as G-PCC encoder 200 (FIG. 1). In the example of FIG. 15, laser package 702 emits laser beams 704 that reflect off pedestrians 706 or other objects in the road. The data source of vehicle 700 may generate a point cloud based on signals generated by laser package 702. The G-PCC encoder of vehicle 700 may encode the point cloud to generate bitstream 708. Bitstream 708 may include many fewer bits than the unencoded point cloud obtained by the G-PCC encoder. An output interface of vehicle 700 (e.g., output interface 108 (FIG. 1)) may transmit bitstream 708 to one or more other devices. Thus, vehicle 700 may be able to transmit bitstream 708 to other devices more quickly than the unencoded point cloud data. Additionally, bitstream 708 may require less data storage capacity.

In the example of FIG. 15, vehicle 700 may transmit bitstream 708 to another vehicle 710. Vehicle 710 may include a G-PCC decoder, such as G-PCC decoder 300 (FIG. 1). The G-PCC decoder of vehicle 710 may decode bitstream 708 to reconstruct the point cloud. Vehicle 710 may use the reconstructed point cloud for various purposes. For example, vehicle 710 may determine, based on the reconstructed point cloud, that pedestrian 706 is in the road ahead of vehicle 700 and may therefore begin slowing down, e.g., even before the driver of vehicle 710 realizes that pedestrian 706 is in the road. Thus, in some examples, vehicle 710 may perform an autonomous navigation operation, generate a notification or warning, or perform another action based on the reconstructed point cloud.

Additionally or alternatively, vehicle 700 may transmit bitstream 708 to server system 712. Server system 712 may use bitstream 708 for various purposes. For example, server system 712 may store bitstream 708 for subsequent reconstruction of the point cloud. In this example, server system 712 may use the point cloud along with other data (e.g., vehicle telemetry data generated by vehicle 700) to train an autonomous driving system. In other examples, server system 712 may store bitstream 708 for subsequent reconstruction for forensic crash investigations (e.g., if vehicle 700 collides with pedestrian 706), or may transmit notifications or instructions for navigation to vehicle 700 or vehicle 710.

FIG. 16 is a conceptual diagram illustrating an example extended reality system in which one or more techniques of this disclosure may be used. Extended reality (XR) is a term used to cover a range of technologies that includes augmented reality (AR), mixed reality (MR), and virtual reality (VR). In the example of FIG. 16, user 800 is located at first location 802. User 800 wears XR headset 804. As an alternative to XR headset 804, user 800 may use a mobile device (e.g., a mobile phone, tablet computer, etc.). XR headset 804 includes a depth detection sensor, such as a LIDAR system, that detects the positions of points on object 806 at first location 802. A data source of XR headset 804 may use the signals generated by the depth detection sensor to generate a point cloud representation of object 806 at first location 802. XR headset 804 may include a G-PCC encoder (e.g., G-PCC encoder 200 of FIG. 1) configured to encode the point cloud to generate bitstream 808.

XR headset 804 may transmit bitstream 808 (e.g., via a network such as the Internet) to XR headset 810 worn by user 812 at second location 814. XR headset 810 may decode bitstream 808 to reconstruct the point cloud. XR headset 810 may use the point cloud to generate an XR visualization (e.g., an AR, MR, or VR visualization) representing object 806 at first location 802. Thus, in some examples, such as when XR headset 810 generates a VR visualization, user 812 at location 814 may have a 3D immersive experience of first location 802. In some examples, XR headset 810 may determine a position of a virtual object based on the reconstructed point cloud. For instance, XR headset 810 may determine, based on the reconstructed point cloud, that an environment (e.g., first location 802) includes a flat surface and then determine that a virtual object (e.g., a cartoon character) is to be positioned on the flat surface. XR headset 810 may generate an XR visualization in which the virtual object is at the determined position. For instance, XR headset 810 may show the cartoon character sitting on the flat surface.

FIG. 17 is a conceptual diagram illustrating an example mobile device system in which one or more techniques of this disclosure may be used. In the example of FIG. 17, mobile device 900, such as a mobile phone or tablet computer, includes a depth detection sensor, such as a LIDAR system, that detects the positions of points on objects 902 in the environment of mobile device 900. A data source of mobile device 900 may use the signals generated by the depth detection sensor to generate a point cloud representation of objects 902. Mobile device 900 may include a G-PCC encoder (e.g., G-PCC encoder 200 of FIG. 1) configured to encode the point cloud to generate bitstream 904. In the example of FIG. 17, mobile device 900 may transmit the bitstream to remote device 906, such as a server system or another mobile device. Remote device 906 may decode bitstream 904 to reconstruct the point cloud. Remote device 906 may use the point cloud for various purposes. For example, remote device 906 may use the point cloud to generate a map of the environment of mobile device 900. For instance, remote device 906 may generate a map of the interior of a building based on the reconstructed point cloud. In another example, remote device 906 may generate imagery (e.g., computer graphics) based on the point cloud. For instance, remote device 906 may use points of the point cloud as vertices of polygons and use color attributes of the points as the basis for shading the polygons. In some examples, remote device 906 may perform facial recognition using the point cloud.

図18は、点群データを含むビットストリームを復号するための例示的な動作を示すフローチャートである。G-PCCデコーダ300は、点群を復号することの一部として、図18の動作を実行し得る。図18の例では、G-PCCデコーダ300は、点群を含む空間の8分木ベースの分割を定義する、8分木を決定する(1000)。8分木のリーフノードは、点群の1つまたは複数の点を含む。 FIG. 18 is a flowchart illustrating example operations for decoding a bitstream that includes point cloud data. G-PCC decoder 300 may perform the operations of FIG. 18 as part of decoding the point cloud. In the example of FIG. 18, G-PCC decoder 300 determines an octree (1000) that defines an octree-based partitioning of the space containing the point cloud. A leaf node of the octree contains one or more points of the point cloud.

G-PCC decoder 300 directly codes the position of each of the one or more points in the leaf node (1002). To directly code the position of each of the one or more points in the leaf node, G-PCC decoder 300 generates a prediction of the one or more points (1004) and determines the one or more points based on the prediction (1006). To directly decode the position of each of the one or more points in the leaf node, G-PCC decoder 300 may be configured to receive a flag, where a first value for the flag indicates that the prediction of the one or more points is generated by intra prediction and a second value for the flag indicates that the prediction of the one or more points is generated by inter prediction, and to decode the one or more points using intra prediction or inter prediction based on the value of the flag.

G-PCC decoder 300 may be configured to receive, in the bitstream for the point cloud, an octree leaf volume that specifies the volume of the leaf node. For example, assume that the entire point cloud is encapsulated in a W×W×W cuboid. The point cloud may be partitioned recursively, and for a given partitioning depth d, the octree leaf volume is W/2^d × W/2^d × W/2^d. At this level, a (binary) occupancy flag may be signaled; an occupancy flag equal to 1 indicates that the cuboid contains at least one point. When the occupancy flag is 1, a further intra or inter flag may be signaled, indicating whether the points inside the cuboid are intra predicted or inter predicted, respectively.
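The leaf volume arithmetic and the two-level flag signaling described above can be sketched in Python as follows. This is an illustrative sketch only: the bit source is modeled as a plain iterator of 0/1 values rather than an entropy-coded bitstream, and the mapping of flag value 0 to intra prediction and 1 to inter prediction is an assumption, since the description only refers to a first value and a second value.

```python
from typing import Iterator, Optional, Tuple


def leaf_edge_length(w: float, depth: int) -> float:
    """Edge length W / 2^d of an octree leaf cuboid at partitioning depth d."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return w / (2 ** depth)


def parse_leaf_flags(bits: Iterator[int]) -> Tuple[bool, Optional[str]]:
    """Read the occupancy flag and, if occupied, the intra/inter flag.

    Returns (occupied, mode) where mode is "intra", "inter", or None.
    Assumption: flag value 0 means intra prediction, 1 means inter prediction.
    """
    occupied = next(bits) == 1
    if not occupied:
        return (False, None)   # empty cuboid: no prediction flag follows
    mode = "inter" if next(bits) == 1 else "intra"
    return (True, mode)
```

For instance, with W = 1024 and d = 3, each leaf cuboid is 128 units on a side, and the bit sequence 1, 1 would mark a leaf as occupied and inter predicted under the assumed mapping.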

To generate the prediction of the one or more points, G-PCC decoder 300 may be further configured to generate the prediction of the one or more points using intra prediction, and to generate the prediction of the one or more points using intra prediction, G-PCC decoder 300 may be further configured to determine a local prediction tree for the one or more points.
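One way to picture a local prediction tree is the following Python sketch, in which each point is predicted from its ancestors in the tree and then corrected by a residual. The four prediction modes (no prediction, delta, linear, parallelogram) follow the modes commonly described for G-PCC predictive geometry coding; their use here, the node layout, and the single residual per point are assumptions for illustration and are not a statement of how G-PCC decoder 300 is implemented.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int, int]
# One tree node: (parent index or None for the root, prediction mode, residual)
Node = Tuple[Optional[int], int, Point]


def predict(mode: int, ancestors: Sequence[Point]) -> Point:
    """Predict a position from up to three ancestors (parent first)."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("unknown prediction mode")
    if len(ancestors) < mode:          # mode n needs n ancestors
        raise ValueError("not enough ancestors for this mode")
    if mode == 0:                      # no prediction
        return (0, 0, 0)
    p0 = ancestors[0]
    if mode == 1:                      # delta: parent position
        return p0
    p1 = ancestors[1]
    if mode == 2:                      # linear: 2*p0 - p1
        return tuple(2 * a - b for a, b in zip(p0, p1))
    p2 = ancestors[2]                  # parallelogram: p0 + p1 - p2
    return tuple(a + b - c for a, b, c in zip(p0, p1, p2))


def decode_local_tree(nodes: Sequence[Node]) -> List[Point]:
    """Decode all points; each node's parent must appear earlier in the list."""
    points: List[Point] = []
    parents: List[Optional[int]] = []
    for parent, mode, residual in nodes:
        ancestors: List[Point] = []
        idx = parent
        while idx is not None and len(ancestors) < 3:
            ancestors.append(points[idx])
            idx = parents[idx]
        pred = predict(mode, ancestors)
        points.append(tuple(p + r for p, r in zip(pred, residual)))
        parents.append(parent)
    return points
```

Because the tree is local to one leaf node, the root is coded without prediction in this sketch and every other point depends only on points inside the same leaf.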

To determine the one or more points based on the prediction, G-PCC decoder 300 may be configured to receive, in the bitstream for the point cloud, at least one of a prediction mode, a first-order residual, and a second-order residual for each of the one or more points. To generate the prediction of the one or more points, G-PCC decoder 300 may be configured to generate the prediction of the one or more points using inter prediction, and to generate the prediction of the one or more points using inter prediction, G-PCC decoder 300 may be further configured to perform motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

To generate the prediction of the one or more points, G-PCC decoder 300 may be further configured to generate the prediction of the one or more points using inter prediction, and to generate the prediction of the one or more points using inter prediction, G-PCC decoder 300 may be further configured to perform motion compensation to predict the one or more points based on a set of points in a reference point cloud frame. To perform motion compensation, G-PCC decoder 300 may be further configured to apply a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points. G-PCC decoder 300 may be configured to predict the motion vector based on motion vectors of spatio-temporally neighboring inter octree leaves.
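The motion compensation step described above can be sketched in Python as a translation of the reference points by a motion vector. The description states that the motion vector may be predicted from the motion vectors of spatio-temporally neighboring inter octree leaves but does not fix a predictor in this passage; the component-wise median used below, the zero vector fallback, and the function names are assumptions for illustration.

```python
from statistics import median_low
from typing import List, Sequence, Tuple

Vec3 = Tuple[int, int, int]


def motion_compensate(ref_points: Sequence[Vec3], mv: Vec3) -> List[Vec3]:
    """Predict the current points by shifting each reference point by mv."""
    return [(x + mv[0], y + mv[1], z + mv[2]) for (x, y, z) in ref_points]


def predict_mv(neighbor_mvs: Sequence[Vec3]) -> Vec3:
    """Predict a motion vector from neighboring leaves' motion vectors.

    Assumption: component-wise (low) median; zero vector if no neighbors.
    """
    if not neighbor_mvs:
        return (0, 0, 0)
    return tuple(median_low([mv[i] for mv in neighbor_mvs]) for i in range(3))


def reconstruct_mv(predicted: Vec3, difference: Vec3) -> Vec3:
    """Add a signaled motion vector difference to the predicted vector."""
    return tuple(p + d for p, d in zip(predicted, difference))
```

With a predictor of this kind, only the difference between the actual and predicted motion vectors would need to be signaled, which is typically smaller than the motion vector itself.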

G-PCCデコーダ300は、点群データから点群を再構成するようにさらに構成され得る。点群を再構成することの一部として、G-PCCデコーダ300は、平面位置に基づいて、点群の1つまたは複数の点の位置を決定するようにさらに構成され得る。 G-PCC decoder 300 may be further configured to reconstruct a point cloud from the point cloud data. As part of reconstructing the point cloud, G-PCC decoder 300 may be further configured to determine the location of one or more points of the point cloud based on the planar position.

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence and may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the terms "processor" and "processing circuitry," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

本開示の技法は、ワイヤレスハンドセット、集積回路(IC)、またはICのセット(たとえば、チップセット)を含む、多種多様なデバイスまたは装置に実装され得る。開示した技法を実行するように構成されたデバイスの機能的態様を強調するために、様々な構成要素、モジュール、またはユニットについて本開示で説明したが、それらは必ずしも異なるハードウェアユニットによる実現を必要とするとは限らない。むしろ、上記で説明したように、様々なユニットは、コーデックハードウェアユニットにおいて組み合わされてもよく、または好適なソフトウェアおよび/もしくはファームウェアとともに、上記で説明したような1つもしくは複数のプロセッサを含む、相互動作可能なハードウェアユニットの集合によって提供されてもよい。 The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or sets of ICs (eg, chipsets). Although various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, they do not necessarily require implementation by different hardware units. Not necessarily. Rather, as explained above, the various units may be combined in a codec hardware unit, or include one or more processors as explained above, together with suitable software and/or firmware. It may also be provided by a collection of interoperable hardware units.

以下の番号付き条項は、本開示で説明するデバイスおよび技法の1つまたは複数の態様を示す。 The numbered sections below indicate one or more aspects of the devices and techniques described in this disclosure.

Clause 1A: A method of coding a point cloud, the method comprising: determining an octree that defines an octree-based partitioning of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud and the position of each of the one or more points in the leaf node is directly signaled; generating a prediction of the one or more points using intra prediction or inter prediction; and coding a syntax element indicating whether the one or more points are predicted using intra prediction or inter prediction.

条項2A:リーフノードのボリュームを指定する8分木リーフボリュームが、ビットストリームにおいてシグナリングされる、条項1Aの方法。 Clause 2A: The method of Clause 1A, wherein an octree leaf volume specifying the volume of the leaf node is signaled in the bitstream.

条項3A:1つまたは複数の点の予測を生成するステップが、イントラ予測を使用して、1つまたは複数の点の予測を生成するステップを含み、イントラ予測を使用して、1つまたは複数の点の予測を生成するステップが、1つまたは複数の点のためのローカル予測木を決定するステップを含む、条項1Aまたは2Aの方法。 Clause 3A: The step of generating the one or more point predictions includes the step of generating the one or more point predictions using the intra prediction, and the step of generating the one or more point predictions using the intra prediction. The method of clause 1A or 2A, wherein the step of generating a prediction for the points includes the step of determining a local prediction tree for the one or more points.

条項4A:予測モード、1次残差、および2次残差のうちの少なくとも1つが、1つまたは複数の点の各々についてシグナリングされる、条項3Aの方法。 Clause 4A: The method of Clause 3A, wherein at least one of a prediction mode, a first-order residual, and a second-order residual is signaled for each of the one or more points.

Clause 5A: The method of Clause 1A or 2A, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and wherein generating the prediction of the one or more points using inter prediction comprises performing motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

Clause 6A: The method of any of Clauses 1A, 2A, or 5A, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and wherein generating the prediction of the one or more points using inter prediction comprises performing motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

条項7A:動き補償を実行するステップが、1つまたは複数の点の予測を決定するために、参照点群フレーム内の点のセットに動きベクトルを適用するステップを含む、条項6Aの方法。 Clause 7A: The method of Clause 6A, wherein the step of performing motion compensation comprises applying a motion vector to a set of points in a reference point cloud frame to determine a prediction of the one or more points.

条項8A:空間時間的近隣インター8分木リーフの動きベクトルに基づいて、動きベクトルを予測するステップをさらに含む、条項7Aの方法。 Clause 8A: The method of Clause 7A, further comprising predicting a motion vector based on the motion vectors of the spatiotemporal neighboring inter-octree leaves.

Clause 9A: The method of any of Clauses 1A to 8A, further comprising generating the point cloud.

Clause 10A: A device for processing a point cloud, the device comprising one or more means for performing the method of any of Clauses 1A to 9A.

Clause 11A: The device of Clause 10A, wherein the one or more means comprise one or more processors implemented in circuitry.

Clause 12A: The device of Clause 10A or 11A, further comprising a memory to store data representing the point cloud.

Clause 13A: The device of any of Clauses 10A to 12A, wherein the device comprises a decoder.

Clause 14A: The device of any of Clauses 10A to 13A, wherein the device comprises an encoder.

Clause 15A: The device of any of Clauses 10A to 14A, further comprising a device to generate the point cloud.

Clause 16A: The device of any of Clauses 10A to 15A, further comprising a display to present imagery based on the point cloud.

Clause 17A: A computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to perform the method of any of Clauses 1A to 9A.

Clause 1B: A device for decoding a bitstream that includes point cloud data, the device comprising: a memory to store the point cloud data; and one or more processors coupled to the memory and implemented in circuitry, the one or more processors configured to: determine an octree that defines an octree-based partitioning of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode the position of each of the one or more points in the leaf node, wherein to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to: generate a prediction of the one or more points; and determine the one or more points based on the prediction.

Clause 2B: The device of Clause 1B, wherein to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to: receive a flag, wherein a first value for the flag indicates that the prediction of the one or more points is generated by intra prediction and a second value for the flag indicates that the prediction of the one or more points is generated by inter prediction; and decode the one or more points using intra prediction or inter prediction based on the value of the flag.

条項3B:1つまたは複数のプロセッサが、点群を含むビットストリームにおいて、リーフノードのボリュームを指定する8分木リーフボリュームを受信するようにさらに構成される、条項1Bのデバイス。 Clause 3B: The device of Clause 1B, wherein the one or more processors are further configured to receive octree leaf volumes specifying volumes of leaf nodes in the bitstream that includes the point cloud.

Clause 4B: The device of Clause 1B, wherein to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using intra prediction, and wherein to generate the prediction of the one or more points using intra prediction, the one or more processors are further configured to determine a local prediction tree for the one or more points.

条項5B:予測に基づいて、1つまたは複数の点を決定するために、1つまたは複数のプロセッサが、点群を含むビットストリームにおいて、1つまたは複数の点の各々のための予測モード、1次残差、および2次残差のうちの少なくとも1つを受信するようにさらに構成される、条項1Bのデバイス。 Clause 5B: In order to determine the one or more points based on the prediction, the one or more processors determine a prediction mode for each of the one or more points in the bitstream comprising the point cloud; The device of clause 1B, further configured to receive at least one of a first-order residual, and a second-order residual.

Clause 6B: The device of Clause 1B, wherein to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using inter prediction, and wherein to generate the prediction of the one or more points using inter prediction, the one or more processors are further configured to perform motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

Clause 7B: The device of Clause 1B, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using inter prediction, and wherein, to generate the prediction of the one or more points using inter prediction, the one or more processors are further configured to perform motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

Clause 8B: The device of Clause 7B, wherein, to perform motion compensation, the one or more processors are further configured to apply a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points.
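Applying a motion vector to a set of reference points can be as simple as a component-wise translation. The following Python sketch assumes a single translational motion vector per leaf node; the function name `motion_compensate` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]


def motion_compensate(
    reference_points: List[Point], motion_vector: Point
) -> List[Point]:
    """Translate every reference point by the motion vector to form the prediction."""
    return [
        tuple(c + d for c, d in zip(point, motion_vector))
        for point in reference_points
    ]
```

The translated points then serve as the prediction from which the points of the current leaf node are determined.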

Clause 9B: The device of Clause 8B, wherein the one or more processors are further configured to predict the motion vector based on a motion vector of a spatio-temporal neighboring inter octree leaf.
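Motion vector prediction from neighboring leaves might be sketched as follows. The clause does not say how the neighbors' motion vectors are combined; the component-wise median used here, the zero-vector fallback, and the signaled motion vector difference are assumptions made only for this illustration.

```python
from statistics import median_low
from typing import List, Tuple

Vector = Tuple[int, int, int]


def predict_motion_vector(neighbor_mvs: List[Vector]) -> Vector:
    """Component-wise median of the neighbors' motion vectors (assumed rule)."""
    if not neighbor_mvs:
        return (0, 0, 0)  # assumed fallback when no inter-coded neighbor exists
    return tuple(median_low([mv[i] for mv in neighbor_mvs]) for i in range(3))


def reconstruct_motion_vector(
    neighbor_mvs: List[Vector], mv_difference: Vector
) -> Vector:
    """Add a signaled difference to the predicted motion vector."""
    predicted = predict_motion_vector(neighbor_mvs)
    return tuple(p + d for p, d in zip(predicted, mv_difference))
```

`median_low` keeps the result an integer that actually occurs among the neighbors, which avoids fractional motion vectors in this sketch.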

Clause 10B: The device of Clause 1B, wherein the one or more processors are further configured to reconstruct the point cloud from the point cloud data.

Clause 11B: The device of Clause 10B, wherein the one or more processors are configured to, as part of reconstructing the point cloud, determine positions of one or more points of the point cloud based on a planar position.

Clause 11B: The device of Clause 10B, wherein the one or more processors are configured to, as part of reconstructing the point cloud, determine positions of one or more points of the point cloud based on the directly decoded position of each of the one or more points in the leaf node.

Clause 12B: The device of Clause 11B, wherein the one or more processors are further configured to generate a map of an interior of a building based on the point cloud.

Clause 13B: The device of Clause 11B, wherein the one or more processors are further configured to perform an autonomous navigation operation based on the point cloud.

Clause 14B: The device of Clause 11B, wherein the one or more processors are further configured to generate computer graphics based on the point cloud.

Clause 15B: The device of Clause 11B, wherein the one or more processors are configured to: determine a position of a virtual object based on the point cloud; and generate an extended reality (XR) visualization in which the virtual object is at the determined position.

Clause 16B: The device of Clause 11B, further comprising a display to present imagery based on the point cloud.

Clause 17B: The device of Clause 1B, wherein the device is a mobile phone or a tablet computer.

Clause 18B: The device of Clause 1B, wherein the device is a vehicle.

Clause 19B: The device of Clause 1B, wherein the device is an extended reality device.

Clause 20B: A method of decoding a point cloud, the method comprising: determining an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decoding a position of each of the one or more points in the leaf node, wherein directly decoding the position of each of the one or more points in the leaf node comprises: generating a prediction of the one or more points; and determining the one or more points based on the prediction.
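The octree-based partition on which the method relies can be illustrated with a small recursive Python sketch. It assumes a cubic root volume with a power-of-two edge length and stops subdividing at a chosen leaf size; the function name `build_octree_leaves` and its parameters are hypothetical and are not part of the disclosure.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int, int]


def build_octree_leaves(
    points: List[Point], origin: Point, log2_size: int, leaf_log2_size: int
) -> Dict[Point, List[Point]]:
    """Map each occupied leaf node (keyed by its origin) to the points it contains."""
    if not points:
        return {}
    if log2_size == leaf_log2_size:
        return {origin: list(points)}
    half = 1 << (log2_size - 1)
    # Group points by the child octant they fall into (one bit per axis).
    children: Dict[Tuple[int, int, int], List[Point]] = {}
    for p in points:
        octant = tuple(int(c - o >= half) for c, o in zip(p, origin))
        children.setdefault(octant, []).append(p)
    leaves: Dict[Point, List[Point]] = {}
    for octant, child_points in children.items():
        child_origin = tuple(o + bit * half for o, bit in zip(origin, octant))
        leaves.update(
            build_octree_leaves(child_points, child_origin, log2_size - 1, leaf_log2_size)
        )
    return leaves
```

Only occupied octants are visited, and the points collected in each leaf are the ones whose positions would then be directly decoded by prediction.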

Clause 21B: The method of Clause 20B, wherein directly decoding the position of each of the one or more points in the leaf node further comprises: receiving a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction, and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and decoding the one or more points using intra prediction or inter prediction based on the value of the flag.

Clause 22B: The method of Clause 20B, further comprising receiving, in a bitstream for the point cloud, an octree leaf volume that specifies a volume of the leaf node.

Clause 23B: The method of Clause 20B, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using intra prediction, and wherein generating the prediction of the one or more points using intra prediction comprises determining a local prediction tree for the one or more points.

Clause 24B: The method of Clause 20B, wherein determining the one or more points based on the prediction comprises receiving, in a bitstream for the point cloud, at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points.

Clause 25B: The method of Clause 20B, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and wherein generating the prediction of the one or more points using inter prediction comprises performing motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

Clause 26B: The method of Clause 20B, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and wherein generating the prediction of the one or more points using inter prediction comprises performing motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

Clause 27B: The method of Clause 26B, wherein performing motion compensation comprises applying a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points.

Clause 28B: The method of Clause 27B, further comprising predicting the motion vector based on a motion vector of a spatio-temporal neighboring inter octree leaf.

Clause 29B: A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to: determine an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode a position of each of the one or more points in the leaf node, wherein, to directly decode the position of each of the one or more points in the leaf node, the instructions cause the one or more processors to: generate a prediction of the one or more points; and determine the one or more points based on the prediction.

Clause 30B: An apparatus comprising: means for determining an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and means for directly decoding a position of each of the one or more points in the leaf node, wherein the means for directly decoding the position of each of the one or more points in the leaf node comprises: means for generating a prediction of the one or more points; and means for determining the one or more points based on the prediction.

Clause 1C: A device for decoding a bitstream that includes point cloud data, the device comprising: a memory to store the point cloud data; and one or more processors coupled to the memory and implemented in circuitry, the one or more processors configured to: determine an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode a position of each of the one or more points in the leaf node, wherein, to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to: generate a prediction of the one or more points; and determine the one or more points based on the prediction.

Clause 2C: The device of Clause 1C, wherein, to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to: receive a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction, and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and decode the one or more points using intra prediction or inter prediction based on the value of the flag.

Clause 3C: The device of Clause 1C or 2C, wherein the one or more processors are further configured to receive, in the bitstream that includes the point cloud, an octree leaf volume that specifies a volume of the leaf node.

Clause 4C: The device of any of Clauses 1C to 3C, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using intra prediction, and wherein, to generate the prediction of the one or more points using intra prediction, the one or more processors are further configured to determine a local prediction tree for the one or more points.

Clause 5C: The device of any of Clauses 1C to 4C, wherein, to determine the one or more points based on the prediction, the one or more processors are further configured to receive, in the bitstream that includes the point cloud, at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points.

Clause 6C: The device of any of Clauses 1C to 5C, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using inter prediction, and wherein, to generate the prediction of the one or more points using inter prediction, the one or more processors are further configured to perform motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

Clause 7C: The device of any of Clauses 1C to 5C, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using inter prediction, and wherein, to generate the prediction of the one or more points using inter prediction, the one or more processors are further configured to perform motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

Clause 8C: The device of Clause 7C, wherein, to perform motion compensation, the one or more processors are further configured to apply a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points.

Clause 9C: The device of Clause 8C, wherein the one or more processors are further configured to predict the motion vector based on a motion vector of a spatio-temporal neighboring inter octree leaf.

Clause 10C: The device of any of Clauses 1C to 9C, wherein the one or more processors are further configured to reconstruct the point cloud from the point cloud data.

Clause 11C: The device of Clause 10C, wherein the one or more processors are configured to, as part of reconstructing the point cloud, determine positions of one or more points of the point cloud based on the directly decoded position of each of the one or more points in the leaf node.

Clause 12C: The device of Clause 11C, wherein the one or more processors are further configured to generate a map of an interior of a building based on the point cloud.

Clause 13C: The device of Clause 11C, wherein the one or more processors are further configured to perform an autonomous navigation operation based on the point cloud.

Clause 14C: The device of Clause 11C, wherein the one or more processors are further configured to generate computer graphics based on the point cloud.

Clause 15C: The device of Clause 11C, wherein the one or more processors are configured to: determine a position of a virtual object based on the point cloud; and generate an extended reality (XR) visualization in which the virtual object is at the determined position.

Clause 16C: The device of any of Clauses 11C to 15C, further comprising a display to present imagery based on the point cloud.

Clause 17C: The device of any of Clauses 1C to 16C, wherein the device is a mobile phone or a tablet computer.

Clause 18C: The device of any of Clauses 1C to 16C, wherein the device is a vehicle.

Clause 19C: The device of any of Clauses 1C to 16C, wherein the device is an extended reality device.

Clause 20C: A method of decoding a point cloud, the method comprising: determining an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decoding a position of each of the one or more points in the leaf node, wherein directly decoding the position of each of the one or more points in the leaf node comprises: generating a prediction of the one or more points; and determining the one or more points based on the prediction.

Clause 21C: The method of Clause 20C, wherein directly decoding the position of each of the one or more points in the leaf node further comprises: receiving a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction, and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and decoding the one or more points using intra prediction or inter prediction based on the value of the flag.

Clause 22C: The method of Clause 20C or 21C, further comprising receiving, in a bitstream for the point cloud, an octree leaf volume that specifies a volume of the leaf node.

Clause 23C: The method of any of Clauses 20C to 22C, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using intra prediction, and wherein generating the prediction of the one or more points using intra prediction comprises determining a local prediction tree for the one or more points.

Clause 24C: The method of any of Clauses 20C to 23C, wherein determining the one or more points based on the prediction comprises receiving, in a bitstream for the point cloud, at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points.

Clause 25C: The method of any of Clauses 20C to 24C, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and wherein generating the prediction of the one or more points using inter prediction comprises performing motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

Clause 26C: The method of any of Clauses 20C to 25C, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and wherein generating the prediction of the one or more points using inter prediction comprises performing motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

Clause 27C: The method of Clause 26C, wherein performing motion compensation comprises applying a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points.

Clause 28C: The method of Clause 27C, further comprising predicting the motion vector based on a motion vector of a spatio-temporal neighboring inter octree leaf.

Various examples have been described. These and other examples are within the scope of the following claims.

100 Encoding and decoding system, system
102 Source device
104 Data source
106 Memory
108 Output interface
110 Computer-readable recording medium
112 Storage device
114 File server
116 Destination device
118 Data consumer
120 Memory
122 Input interface
200 G-PCC encoder
202 Coordinate transform unit
204 Color transform unit
206 Voxelization unit
208 Attribute transfer unit
210 Octree analysis unit
212 Surface approximation analysis unit
214, 226 Arithmetic encoding unit
216, 312 Geometry reconstruction unit
218, 314 RAHT unit
220 LOD generation unit
222 Lifting unit
224 Coefficient quantization unit
300 G-PCC decoder
302 Geometry arithmetic decoding unit
304 Attribute arithmetic decoding unit
306 Octree synthesis unit
308 Inverse quantization unit
310 Surface approximation synthesis unit
316 LoD generation unit, LOD generation unit
318 Inverse lifting unit
320 Inverse coordinate transform unit
322 Inverse color transform unit
400, 444 Octree
402, 404 Node
410 Prediction tree
412, 414A, 414B, 418A-418E Node
420 Predicted frame
422 Current frame
440, 442 Path
600 Distance measuring system
602 Illuminator
604 Sensor
606 Light, emitted light
608, 806, 902 Object
610 Return light
611 Lens
612 Image
618 Signal
620 Point cloud generator
700, 710 Vehicle
702 Laser package
704 Laser beam
706 Pedestrian
708, 808, 904 Bitstream
712 Server system
800, 812 User
802 First location
804, 810 XR headset
814 Second location
900 Mobile device
902 Object
906 Remote device

Claims

1. A device for decoding a bitstream that includes point cloud data, the device comprising:
a memory to store the point cloud data; and
one or more processors coupled to the memory and implemented in circuitry, the one or more processors configured to:
determine an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and
directly decode a position of each of the one or more points in the leaf node, wherein, to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to: generate a prediction of the one or more points; and determine the one or more points based on the prediction.

2. The device of claim 1, wherein, to directly decode the position of each of the one or more points in the leaf node, the one or more processors are further configured to:
receive a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction, and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and decode the one or more points using intra prediction or inter prediction based on the value of the flag.

3. The device of claim 1, wherein the one or more processors are further configured to receive, in the bitstream that includes the point cloud, an octree leaf volume that specifies a volume of the leaf node.

4. The device of claim 1, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using intra prediction, and
wherein, to generate the prediction of the one or more points using intra prediction, the one or more processors are further configured to determine a local prediction tree for the one or more points.

5. The device of claim 1, wherein, to determine the one or more points based on the prediction, the one or more processors are further configured to receive, in the bitstream that includes the point cloud, at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points.

6. The device of claim 1, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using inter prediction, and
wherein, to generate the prediction of the one or more points using inter prediction, the one or more processors are further configured to perform motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

7. The device of claim 1, wherein, to generate the prediction of the one or more points, the one or more processors are further configured to generate the prediction of the one or more points using inter prediction, and
wherein, to generate the prediction of the one or more points using inter prediction, the one or more processors are further configured to perform motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

8. The device of claim 7, wherein, to perform motion compensation, the one or more processors are further configured to apply a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points.

9. The device of claim 8, wherein the one or more processors are further configured to predict the motion vector based on motion vectors of spatiotemporal neighboring inter-octree leaves.

10. The device of claim 1, wherein the one or more processors are further configured to reconstruct the point cloud from the point cloud data.

11. The device of claim 10, wherein the one or more processors are configured to, as part of reconstructing the point cloud, determine positions of one or more points of the point cloud based on the directly decoded position of each of the one or more points in the leaf node.

12. The device of claim 11, wherein the one or more processors are further configured to generate a map of a building interior based on the point cloud.

13. The device of claim 11, wherein the one or more processors are further configured to perform an autonomous navigation operation based on the point cloud.

14. The device of claim 11, wherein the one or more processors are further configured to generate computer graphics based on the point cloud.

15. The device of claim 11, wherein the one or more processors are configured to:
determine a position of a virtual object based on the point cloud; and generate an extended reality (XR) visualization in which the virtual object is at the determined position.

16. The device of claim 11, further comprising a display to present imagery based on the point cloud.

17. The device of claim 1, wherein the device is a mobile phone or a tablet computer.

18. The device of claim 1, wherein the device is a vehicle.

19. The device of claim 1, wherein the device is an extended reality device.

20. A method of decoding a point cloud, the method comprising:
determining an octree that defines an octree-based partition of a space containing the point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and
directly decoding a position of each of the one or more points in the leaf node, wherein directly decoding the position of each of the one or more points in the leaf node comprises:
generating a prediction of the one or more points; and
determining the one or more points based on the prediction.

21. The method of claim 20, wherein directly decoding the position of each of the one or more points in the leaf node further comprises:
receiving a flag, wherein a first value of the flag indicates that the prediction of the one or more points is generated by intra prediction, and a second value of the flag indicates that the prediction of the one or more points is generated by inter prediction; and
decoding the one or more points using intra prediction or inter prediction based on the value of the flag.

22. The method of claim 20, further comprising receiving, in a bitstream for the point cloud, an octree leaf volume that specifies a volume of the leaf node.

23. The method of claim 20, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using intra prediction, and
wherein generating the prediction of the one or more points using intra prediction comprises determining a local prediction tree for the one or more points.

24. The method of claim 20, wherein determining the one or more points based on the prediction comprises receiving, in a bitstream for the point cloud, at least one of a prediction mode, a primary residual, and a secondary residual for each of the one or more points.

25. The method of claim 20, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and
wherein generating the prediction of the one or more points using inter prediction comprises performing motion estimation with the one or more points to determine a similar set of points in a reference point cloud frame.

26. The method of claim 20, wherein generating the prediction of the one or more points comprises generating the prediction of the one or more points using inter prediction, and
wherein generating the prediction of the one or more points using inter prediction comprises performing motion compensation to predict the one or more points based on a set of points in a reference point cloud frame.

27. The method of claim 26, wherein performing motion compensation comprises applying a motion vector to the set of points in the reference point cloud frame to determine the prediction of the one or more points.

28. The method of claim 27, further comprising predicting the motion vector based on motion vectors of spatiotemporal neighboring inter-octree leaves.

29. A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
determine an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and directly decode a position of each of the one or more points in the leaf node, wherein, to directly decode the position of each of the one or more points in the leaf node, the instructions cause the one or more processors to:
generate a prediction of the one or more points; and determine the one or more points based on the prediction.

30. A device comprising:
means for determining an octree that defines an octree-based partition of a space containing a point cloud, wherein a leaf node of the octree contains one or more points of the point cloud; and
means for directly decoding a position of each of the one or more points in the leaf node, wherein the means for directly decoding the position of each of the one or more points in the leaf node comprises:
means for generating a prediction of the one or more points; and
means for determining the one or more points based on the prediction.