CN116711313A - Inter-prediction codec for geometric point cloud compression
- Publication number: CN116711313A (application CN202180087697.9A)
- Authority: CN (China)
Abstract
Examples of processing point clouds include: in response to determining to predict a current point in the point cloud using the prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting a current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
Description
The present application claims priority to U.S. patent application Ser. No. 17/646,217, filed December 28, 2021, which claims the benefit of U.S. provisional application No. 63/131,716, filed December 29, 2020, U.S. provisional application No. 63/134,492, filed January 6, 2021, U.S. provisional application No. 63/170,907, filed April 5, 2021, U.S. provisional application No. 63/177,186, filed April 20, 2021, U.S. provisional application No. 63/179,892, filed April 26, 2021, and U.S. provisional application No. 63/218,170, filed July 2, 2021, the entire contents of each of which are incorporated herein by reference.
Technical Field
The present disclosure relates to point cloud encoding and decoding.
Drawings
Fig. 1 is a block diagram illustrating an exemplary encoding and decoding system that may perform the techniques of this disclosure.
Fig. 2 is a block diagram illustrating an exemplary geometry point cloud compression (G-PCC) encoder in accordance with one or more aspects of the present disclosure.
Fig. 3 is a block diagram illustrating an exemplary G-PCC decoder according to one or more aspects of the present disclosure.
Fig. 4 is a conceptual diagram illustrating octree partitioning for geometry codec according to one or more aspects of the present disclosure.
Fig. 5A and 5B are conceptual diagrams of a rotational LIDAR acquisition model according to one or more aspects of the present disclosure.
Fig. 6 is a conceptual diagram of a prediction tree for predicting geometry codec according to one or more aspects of the present disclosure.
Fig. 7 is a conceptual diagram illustrating an exemplary inter-prediction process for predicting points in a point cloud according to one or more aspects of the present disclosure.
Fig. 8 is a flow diagram illustrating an exemplary technique for inter-predicting points in a point cloud in accordance with one or more aspects of the present disclosure.
Fig. 9 is a conceptual diagram illustrating an exemplary ranging system that may be used with one or more techniques of this disclosure.
FIG. 10 is a conceptual diagram illustrating an exemplary vehicle-based scenario in which one or more techniques of the present disclosure may be used.
Fig. 11 is a conceptual diagram illustrating an example augmented reality system in which one or more techniques of the present disclosure may be used.
Fig. 12 is a conceptual diagram illustrating an exemplary mobile device system in which one or more techniques of this disclosure may be used.
Disclosure of Invention
In general, this disclosure describes techniques for encoding and decoding nodes of a point cloud using inter-prediction, for example for the geometry point cloud compression (G-PCC) standard currently being developed. However, the exemplary techniques are not limited to the G-PCC standard. In some examples of G-PCC, the coordinates of the position of a node (also referred to as a point) of the point cloud may be converted into the (r, φ, i) domain, in which the position of the node is represented by three parameters: a radius r, an azimuth angle φ, and a laser index i (e.g., a laser identifier). When predictive geometry codec is performed using the angular mode in G-PCC, the G-PCC codec may perform prediction in the (r, φ, i) domain. For example, to codec a particular node of a particular frame of a point cloud, a G-PCC codec may determine a predicted radius r, azimuth angle φ, and laser index i for the particular node based on another node of the particular frame, and add the predicted radius r, azimuth angle φ, and laser index i to residual data (e.g., a residual radius r, a residual azimuth angle φ, and a residual laser index i) to determine a reconstructed radius r, azimuth angle φ, and laser index i for the particular node. Because the encoding of residual data may account for a large portion of the encoding overhead, the codec efficiency (e.g., the number of bits used to encode the point) may be a function of how close the predicted radius r, azimuth angle φ, and laser index i of a particular node are to the reconstructed radius r, azimuth angle φ, and laser index i of that node. It may therefore be desirable to be able to generate predicted parameters for a node that are as close as possible to the actual parameters of the node.
In accordance with one or more techniques of this disclosure, a G-PCC codec may predict a current point of a current frame of a point cloud using inter-frame prediction. For example, to predict the current point using inter-frame prediction, a G-PCC codec may identify a reference point in a frame different from the current frame, and predict one or more parameters (e.g., radius r, azimuth angle φ, and laser index i) of the current point based on one or more parameters of the reference point. In cases where the parameters of the reference point are closer to the parameters of the current point than those of the parent point or other available points in the current frame, predicting the current point using inter-prediction may reduce the size of the residual data. In this way, the techniques of this disclosure may enable a G-PCC codec to improve codec efficiency.
In one example, a method of processing a point cloud includes: in response to determining to predict a current point in the point cloud using the prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting a current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
In another example, an apparatus for processing a point cloud includes: a memory configured to store at least a portion of the point cloud; and one or more processors implemented in circuitry and configured to: in response to determining to predict a current point in the point cloud using the prediction geometry codec, select a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predict a current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
In another example, a computer-readable storage medium stores instructions that, when executed by one or more processors, cause the one or more processors to: in response to determining to predict a current point in the point cloud using the prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting a current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Detailed Description
When predictive geometry codec is performed using the angular mode in G-PCC, the G-PCC codec may perform prediction in the (r, φ, i) domain. For example, to codec a particular node of a particular frame of a point cloud, a G-PCC codec may determine a predicted radius r, azimuth angle φ, and laser index i for the particular node based on another node of the particular frame, and add the predicted radius r, azimuth angle φ, and laser index i to residual data (e.g., a residual radius r, a residual azimuth angle φ, and a residual laser index i) to determine a reconstructed radius r, azimuth angle φ, and laser index i for the particular node. Because the encoding of residual data may account for a large portion of the encoding overhead, the codec efficiency (e.g., the number of bits used to encode the point) may be a function of how close the predicted radius r, azimuth angle φ, and laser index i of a particular node are to the reconstructed radius r, azimuth angle φ, and laser index i of that node. It may therefore be desirable to be able to generate predicted parameters for a node that are as close as possible to the actual parameters of the node.
In accordance with one or more techniques of this disclosure, a G-PCC codec may predict a current point of a current frame of a point cloud using inter-frame prediction. For example, to predict the current point using inter-frame prediction, a G-PCC codec may identify a reference point in a frame different from the current frame, and predict one or more parameters (e.g., radius r, azimuth angle φ, and laser index i) of the current point based on one or more parameters of the reference point. In cases where the parameters of the reference point are closer to the parameters of the current point than those of the parent point or other available points in the current frame, predicting the current point using inter-prediction may reduce the size of the residual data. In this way, the techniques of this disclosure may enable a G-PCC codec to improve codec efficiency.
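For purposes of illustration only, the following Python sketch (the names, values, and data layout are hypothetical and are not part of any G-PCC specification) shows how a reconstructed (r, φ, i) triple could be obtained by adding signaled residuals to a predictor, whether the predictor is taken from the current frame (intra prediction) or from a reference frame (inter prediction); a closer inter predictor yields a smaller residual to code.

```python
from typing import NamedTuple

class PolarPoint(NamedTuple):
    """A point in the (radius, azimuth, laser index) domain."""
    r: int
    phi: int
    laser: int

def reconstruct(predictor: PolarPoint, residual: PolarPoint) -> PolarPoint:
    """Add the signaled residual to the predicted parameters.

    The predictor may come from the current frame (intra prediction) or from
    a previously decoded reference frame (inter prediction); the
    reconstruction step is the same either way.
    """
    return PolarPoint(predictor.r + residual.r,
                      predictor.phi + residual.phi,
                      predictor.laser + residual.laser)

# Hypothetical values: the inter predictor is closer to the actual point,
# so the residual that would have to be coded is smaller.
actual = PolarPoint(r=1000, phi=5030, laser=7)
intra_pred = PolarPoint(r=900, phi=5000, laser=7)   # e.g., parent point in current frame
inter_pred = PolarPoint(r=995, phi=5028, laser=7)   # e.g., reference point in reference frame

intra_res = PolarPoint(*(a - p for a, p in zip(actual, intra_pred)))
inter_res = PolarPoint(*(a - p for a, p in zip(actual, inter_pred)))
assert reconstruct(inter_pred, inter_res) == actual
print(intra_res, inter_res)  # the inter residual has smaller magnitudes
```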
Fig. 1 is a block diagram illustrating an exemplary encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure relate generally to coding (encoding and/or decoding) point cloud data, i.e., to supporting point cloud compression. In general, point cloud data includes any data for processing a point cloud. The coding may be effective at compressing and/or decompressing the point cloud data.
As shown in fig. 1, the system 100 includes a source device 102 and a target device 116. The source device 102 provides encoded point cloud data to be decoded by the target device 116. In particular, in the example of fig. 1, the source device 102 provides point cloud data to the target device 116 via the computer-readable medium 110. The source device 102 and the target device 116 may comprise any of a variety of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, ground or marine vehicles, spacecraft, aircraft, robots, laser radar (LIDAR) devices, satellites, and the like. In some cases, the source device 102 and the target device 116 may be equipped for wireless communication.
In the example of fig. 1, source device 102 includes a data source 104, a memory 106, a G-PCC encoder 200, and an output interface 108. Target device 116 includes input interface 122, G-PCC decoder 300, memory 120, and data consumer 118. In accordance with the present disclosure, the G-PCC encoder 200 of the source device 102 and the G-PCC decoder 300 of the target device 116 may be configured to apply the techniques of the present disclosure in connection with predictive geometry codec. Thus, the source device 102 represents an example of an encoding device, while the target device 116 represents an example of a decoding device. In other examples, the source device 102 and the target device 116 may include other components or arrangements. For example, the source device 102 may receive data (e.g., point cloud data) from an internal or external source. Likewise, the target device 116 may interact with an external data consumer rather than including the data consumer in the same device.
The system 100 shown in fig. 1 is merely one example. In general, other digital encoding and/or decoding devices may perform the techniques of the present disclosure in connection with predictive geometry codec. The source device 102 and the target device 116 are merely examples of devices in which the source device 102 generates encoded data for transmission to the target device 116. The present disclosure refers to "codec" devices as devices that perform data encoding (encoding and/or decoding). Thus, the G-PCC encoder 200 and the G-PCC decoder 300 represent examples of codec devices, in particular encoders and decoders, respectively. In some examples, the source device 102 and the target device 116 may operate in a substantially symmetrical manner such that each of the source device 102 and the target device 116 includes encoding and decoding components. Thus, the system 100 may support unidirectional or bidirectional transmission between the source device 102 and the target device 116, for example, for streaming, playback, broadcasting, telephony, navigation, and other applications.
In general, the data source 104 represents a source of data (i.e., raw, unencoded point cloud data), and may provide a series of "frames" of data to the G-PCC encoder 200, which encodes the data of the frames. The data source 104 of the source device 102 may include a point cloud capture device, such as any of a variety of cameras or sensors, for example, a 3D scanner or light detection and ranging (LIDAR) device, one or more cameras, an archive containing previously captured data, and/or a data feed interface to receive data from a data content provider. Alternatively or additionally, the point cloud data may be computer-generated from scanner, camera, sensor, or other data. For example, the data source 104 may generate computer graphics-based data as source data, or produce a combination of live data, archived data, and computer-generated data. In each case, the G-PCC encoder 200 encodes the captured, pre-captured, or computer-generated data. The G-PCC encoder 200 may rearrange the frames from the receiving order (sometimes referred to as the "display order") into a codec order for coding. The G-PCC encoder 200 may generate one or more bitstreams including the encoded data. The source device 102 may then output the encoded data via the output interface 108 onto the computer-readable medium 110 for receipt and/or retrieval by, for example, the input interface 122 of the target device 116.
The memory 106 of the source device 102 and the memory 120 of the target device 116 may represent general purpose memory. In some examples, memory 106 and memory 120 may store raw data, such as raw data from data source 104 and raw decoded data from G-PCC decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions executable by, for example, G-PCC encoder 200 and G-PCC decoder 300, respectively. Although memory 106 and memory 120 are shown separately from G-PCC encoder 200 and G-PCC decoder 300 in this example, it should be appreciated that G-PCC encoder 200 and G-PCC decoder 300 may also include internal memory for functionally similar or equivalent purposes. Further, memory 106 and memory 120 may store encoded data, for example, output from G-PCC encoder 200 and input to G-PCC decoder 300. In some examples, portions of memory 106 and memory 120 may be allocated as one or more buffers, e.g., to store raw, decoded, and/or encoded data. For example, memory 106 and memory 120 may store data representing a point cloud.
Computer-readable medium 110 may represent any type of medium or device capable of transmitting encoded data from source device 102 to destination device 116. In one example, the computer-readable medium 110 represents a communication medium that enables the source device 102 to send encoded data directly to the target device 116 in real-time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmit signal including encoded data according to a communication standard, such as a wireless communication protocol, and input interface 122 may demodulate a received transmit signal. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device operable to facilitate communication from the source device 102 to the target device 116.
In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, the target device 116 may access encoded data from the storage device 112 via the input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media such as hard drives, blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded data.
In some examples, source device 102 may output the encoded data to file server 114 or another intermediate storage device that may store the encoded data generated by source device 102. The target device 116 may access the stored data from the file server 114 via streaming or download. File server 114 may be any type of server device capable of storing encoded data and transmitting the encoded data to target device 116. File server 114 may represent a web server (e.g., for a web site), a File Transfer Protocol (FTP) server, a content delivery network device, or a Network Attached Storage (NAS) device. The target device 116 may access the encoded data from the file server 114 over any standard data connection, including an internet connection. This may include a wireless channel (e.g., wi-Fi connection), a wired connection (e.g., digital Subscriber Line (DSL), cable modem, etc.), or a combination of both suitable for accessing encoded data stored on file server 114. The file server 114 and the input interface 122 may be configured to operate in accordance with a streaming protocol, a download transfer protocol, or a combination thereof.
Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components operating according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to a cellular communication standard, such as 4G, 4G-LTE (Long Term Evolution), LTE-Advanced, 5G, and the like. In some examples where output interface 108 includes a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to other wireless standards, such as the IEEE 802.11 specification, the IEEE 802.15 specification (e.g., ZigBee™), the Bluetooth™ standard, and the like. In some examples, source device 102 and/or target device 116 may include respective system-on-chip (SoC) devices. For example, source device 102 may include a SoC device to perform functions attributed to G-PCC encoder 200 and/or output interface 108, and target device 116 may include a SoC device to perform functions attributed to G-PCC decoder 300 and/or input interface 122.
The techniques of this disclosure may be applied to encoding and decoding to support any of a variety of applications, such as communication between autonomous vehicles, communication between scanners, cameras, sensors, and processing devices (such as local or remote servers), geographic mapping, or other applications.
The input interface 122 of the target device 116 receives the encoded bitstream from the computer readable medium 110 (e.g., communication medium, storage device 112, file server 114, etc.). The encoded bitstream may include signaling information defined by the G-PCC encoder 200, which is also used by the G-PCC decoder 300, such as syntax elements having values describing characteristics and/or processing of the encoded decoding units (e.g., slices, pictures, groups of pictures, sequences, etc.). The data consumer 118 uses the decoded data. For example, the data consumer 118 may use the decoded data to determine the location of the physical object. In some examples, the data consumer 118 may include a display that presents images based on a point cloud.
G-PCC encoder 200 and G-PCC decoder 300 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combination thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the G-PCC encoder 200 and the G-PCC decoder 300 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. The devices that include G-PCC encoder 200 and/or G-PCC decoder 300 may include one or more integrated circuits, microprocessors, and/or other types of devices.
The G-PCC encoder 200 and the G-PCC decoder 300 may operate according to a codec standard, such as the video point cloud compression (V-PCC) standard or the geometry point cloud compression (G-PCC) standard. This disclosure may generally refer to the coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data. An encoded bitstream typically includes a series of values representing syntax elements for codec decisions (e.g., codec modes).
The present disclosure may generally mention "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the transmission of values of syntax elements and/or other data used to decode encoded data. That is, the G-PCC encoder 200 may signal the value of the syntax element in the bitstream. Typically, signaling refers to generating a value in the bitstream. As described above, the source device 102 may transmit the bitstream to the target device 116 in substantially real-time or non-real-time, such as may occur when the syntax elements are stored to the storage device 112 for later retrieval by the target device 116.
ISO/IEC MPEG (JTC 1/SC 29/WG 11) is investigating the potential need for standardization of point cloud codec technology with a compression capability that significantly exceeds that of current approaches, and will target the creation of the standard. The group is working together on this exploration activity in a collaborative effort known as the 3-Dimensional Graphics Team (3DG) to evaluate compression technology designs proposed by domain experts.
Point cloud compression activity is categorized into two different approaches. The first approach is "video point cloud compression" (V-PCC), which segments the 3D object and projects the segments (represented as "patches" in 2D frames) into multiple 2D planes, which are further encoded by a conventional 2D video codec, such as a High Efficiency Video Coding (HEVC) (ITU-T H.265) codec. The second approach is "geometry-based point cloud compression" (G-PCC), which directly compresses the 3D geometry, i.e., the positions of a set of points in 3D space and the associated attribute values (for each point associated with the 3D geometry). G-PCC addresses the compression of point clouds in class 1 (static point clouds) and class 3 (dynamically acquired point clouds). A draft of the G-PCC standard is available in the G-PCC DIS (ISO/IEC JTC1/SC29/WG11 w19328, Brussels, Belgium, January 2020), and a description of the codec is available in the G-PCC Codec Description v8 (ISO/IEC JTC1/SC29/WG11 w19525, Brussels, Belgium, January 2020).
The point cloud contains a set of points in 3D space and may have attributes associated with the points. The attribute may be color information such as R, G, B or Y, cb, cr or reflectivity information or other attributes. The point cloud may be captured by various cameras or sensors, such as LIDAR sensors and 3D scanners, and may also be generated by a computer. The point cloud data is used for a variety of applications including, but not limited to, construction (modeling), graphics (3D models for visualization and animation), and the automotive industry (LIDAR sensors for aiding navigation).
The 3D space occupied by the point cloud data may be enclosed by a virtual bounding box. The positions of the points in the bounding box may be represented with a certain precision; thus, the positions of one or more points may be quantized based on that precision. At the smallest level, the bounding box is split into voxels, which are the smallest units of space represented by a unit cube. A voxel in the bounding box may be associated with zero, one, or more than one point. The bounding box may be split into multiple cube/cuboid regions, which may be called tiles. Each tile may be encoded into one or more slices. The partitioning of the bounding box into slices and tiles may be based on the number of points in each partition, or based on other considerations (e.g., a particular region may be encoded as a slice). The slice regions may be further partitioned using partitioning decisions similar to those in video codecs.
Fig. 2 provides an overview of G-PCC encoder 200. Fig. 3 provides an overview of a G-PCC decoder 300. The modules shown are logical modules, not necessarily in one-to-one correspondence with the implementation code in the reference implementation of the G-PCC codec, i.e. TMC13 test model software for ISO/IEC MPEG (JTC 1/SC 29/WG 11) research.
In both the G-PCC encoder 200 and the G-PCC decoder 300, the point cloud positioning is first encoded. The attribute codec depends on the decoded geometry. In fig. 2 and 3, modules 212, 218, 310, and 314 are options that are typically used for category 1 data. Modules 220, 222, 316, and 318 are options that are commonly used for category 3 data. All other modules are generic between category 1 and category 3.
For class 3 data, the compressed geometry is typically represented as an octree from the root all the way down to a leaf level of individual voxels. For class 1 data, the compressed geometry is typically represented by a pruned octree (i.e., an octree from the root down to a leaf level of blocks larger than voxels) plus a model that approximates the surface within each leaf of the pruned octree. In this way, class 1 and class 3 data share the octree codec mechanism, while class 1 data may additionally approximate the voxels within each leaf with a surface model. The surface model used is a triangulation comprising 1-10 triangles per block, resulting in a triangle soup. The class 1 geometry codec is therefore referred to as the trisoup geometry codec, while the class 3 geometry codec is referred to as the octree geometry codec.
Fig. 4 is a conceptual diagram illustrating an exemplary octree partitioning for geometry codec in accordance with the techniques of the present disclosure. In the example shown in Fig. 4, octree 400 may be partitioned into a series of nodes. For example, each node may be a cubic node. At each node of the octree, when the occupancy is not inferred by G-PCC decoder 300, G-PCC encoder 200 may signal to G-PCC decoder 300 the occupancy (i.e., whether the child node is occupied by one or more points of the point cloud) of one or more of the node's child nodes (up to eight nodes per node). Multiple neighborhoods are specified, including (a) nodes that share a face with the current octree node, (b) nodes that share a face, edge, or vertex with the current octree node, and so on. Within each neighborhood, the occupancy of a node and/or its child nodes may be used to predict the occupancy of the current node or its child nodes. For points sparsely populating certain nodes of the octree, the codec also supports a direct codec mode, in which the 3D positions of the points are encoded directly. A flag may be signaled to indicate that direct mode is signaled. For the direct mode, the positions of points in the point cloud may be encoded directly without any compression. At the lowest level, the number of points associated with the octree nodes/leaf nodes may also be encoded.
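For illustration, the following minimal Python sketch (not TMC13 code; the function names and point layout are assumptions) shows how an 8-bit occupancy word could be derived for an octree node from the points it contains, with one bit per child octant.

```python
def child_index(x: int, y: int, z: int, node_size_log2: int) -> int:
    """Return which of the 8 children (octant 0..7) of a node contains the point.

    The bit of each coordinate one level below the current node size selects
    the octant along that axis.
    """
    bit = node_size_log2 - 1
    return (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)

def occupancy_byte(points, node_size_log2: int) -> int:
    """8-bit mask with bit i set if child octant i contains at least one point."""
    mask = 0
    for (x, y, z) in points:
        mask |= 1 << child_index(x, y, z, node_size_log2)
    return mask

# Points inside a node of size 2^3 = 8: two of the eight octants are occupied.
print(bin(occupancy_byte([(0, 0, 0), (1, 2, 3), (7, 7, 7)], node_size_log2=3)))  # 0b10000001
```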
Once the geometry is encoded, the attributes corresponding to the geometry points are encoded. When there are multiple attribute values corresponding to one reconstructed/decoded geometry point, an attribute value representative of the reconstructed point may be derived.
There are three attribute codec methods in G-PCC: region adaptive hierarchical transform (RAHT) codec, interpolation-based hierarchical nearest-neighbor prediction (predicting transform), and interpolation-based hierarchical nearest-neighbor prediction with an update/lifting step (lifting transform). RAHT and the lifting transform are typically used for class 1 data, while the predicting transform is typically used for class 3 data. However, either method may be used for any data and, as with the geometry codec in G-PCC, the attribute codec method used for the point cloud is specified in the bitstream.
Encoding of the attributes may be performed in levels of detail (LOD), where a finer representation of the point cloud attributes may be obtained by each level of detail. Each level of detail may be specified based on a distance metric from neighboring nodes or based on a sampling distance.
At the G-PCC encoder 200, the residual obtained as an output of the codec method for the attribute is quantized (e.g., by one of the arithmetic coding units 214 and/or 226). The quantized residual may be encoded using context-adaptive arithmetic coding.
In the example of fig. 2, the G-PCC encoder 200 may include a coordinate transformation unit 202, a color transformation unit 204, a voxelization unit 206, an attribute transfer unit 208, an octree analysis unit 210, a surface approximation analysis unit 212, an arithmetic coding unit 214, a geometry reconstruction unit 216, a RAHT unit 218, a LOD generation unit 220, a lifting unit 222, a coefficient quantization unit 224, and an arithmetic coding unit 226.
As shown in the example of fig. 2, G-PCC encoder 200 may receive a set of locations and a set of attributes. The positioning may include coordinates of points in the point cloud. The attributes may include information about points in the point cloud, such as colors associated with the points in the point cloud.
The coordinate transformation unit 202 may apply a transformation to the point coordinates to transform the coordinates from an initial domain to a transformation domain. The present disclosure may refer to the transformed coordinates as transformed coordinates. The color transformation unit 204 may apply a transformation to transform the color information of the attribute to a different domain. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space.
Further, in the example of fig. 2, the voxelization unit 206 may voxelize the transformed coordinates. Voxelization of the transformed coordinates may include quantizing and removing some points of the point cloud. In other words, multiple points of a point cloud may be incorporated into a single "voxel" which may then be considered a point in some aspects. Further, the octree analysis unit 210 may generate an octree based on the voxelized transformation coordinates. Further, in the example of fig. 2, the surface approximation analysis unit 212 may analyze points to potentially determine a surface representation of multiple sets of points. The arithmetic coding unit 214 may entropy-encode syntax elements representing the octree and/or information of the surface determined by the surface approximation analysis unit 212. The G-PCC encoder 200 may output these syntax elements in a geometry bitstream.
The geometry reconstruction unit 216 may reconstruct transformed coordinates of points in the point cloud based on the octree, data indicative of the surface determined by the surface approximation analysis unit 212, and/or other information. The number of transformed coordinates reconstructed by the geometry reconstruction unit 216 may be different from the original points of the point cloud due to the voxelization and surface approximation. The present disclosure may refer to the resulting points as reconstruction points. The attribute transfer unit 208 may transfer the attribute of the original point of the point cloud to the reconstructed point of the point cloud. As shown in fig. 2, the attribute transfer unit 208 may transfer the attribute to one or both of the RAHT unit 218 and the LOD generation unit 220.
In addition, the RAHT unit 218 may apply RAHT codec to the attributes of the reconstruction point. Alternatively or additionally, the LOD generation unit 220 and the lifting unit 222 may apply LOD processing and lifting, respectively, to the attributes of the reconstruction points. The RAHT unit 218 and the lifting unit 222 may generate coefficients based on the attributes. The coefficient quantization unit 224 may quantize the coefficients generated by the RAHT unit 218 or the lifting unit 222. The arithmetic coding unit 226 may apply arithmetic coding to syntax elements representing quantized coefficients. The G-PCC encoder 200 may output these syntax elements in the attribute bitstream.
In the example of fig. 3, the G-PCC decoder 300 may include a geometry arithmetic decoding unit 302, an attribute arithmetic decoding unit 304, an octree synthesis unit 306, an inverse quantization unit 308, a surface approximation synthesis unit 310, a geometry reconstruction unit 312, a RAHT unit 314, a LOD generation unit 316, an inverse lifting unit 318, an inverse transform coordinate unit 320, and an inverse transform color unit 322.
G-PCC decoder 300 may obtain a geometry bit stream and an attribute bit stream. The geometry arithmetic decoding unit 302 of the decoder 300 may apply arithmetic decoding (e.g., context Adaptive Binary Arithmetic Coding (CABAC) or other types of arithmetic decoding) to syntax elements in the geometry bitstream. Similarly, the attribute arithmetic decoding unit 304 may apply arithmetic decoding to syntax elements in the attribute bit stream.
The octree synthesis unit 306 may synthesize octrees based on syntax elements parsed from the geometry bitstream. In the case of using surface approximations in the geometry bitstream, the surface approximation synthesis unit 310 may determine the surface model based on syntax elements parsed from the geometry bitstream and based on octrees.
Further, the geometry reconstruction unit 312 may perform reconstruction to determine coordinates of points in the point cloud. The inverse transformation coordinate unit 320 may apply an inverse transformation to the reconstructed coordinates to convert the reconstructed coordinates (locations) of the points in the point cloud from the transformation domain back to the original domain.
In addition, in the example of fig. 3, the inverse quantization unit 308 may inversely quantize the attribute value. The attribute value may be based on syntax elements obtained from the attribute bitstream (e.g., including syntax elements decoded by the attribute arithmetic decoding unit 304). As shown in fig. 3, the inverse quantization unit 308 may transmit the attribute value to one or both of the RAHT unit 314 and the LOD generation unit 316.
Depending on how the attribute values are encoded, the RAHT unit 314 may perform RAHT codec to determine color values of points of the point cloud based on the inversely quantized attribute values. Alternatively, the LOD generation unit 316 and the inverse boost unit 318 may determine the color value of the point cloud using a level of detail-based technique.
Further, in the example of fig. 3, the inverse transform color unit 322 may apply an inverse color transform to the color values. The inverse color transform may be an inverse of the color transform applied by the color transform unit 204 of the encoder 200. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space. Accordingly, the inverse color transform unit 322 may transform color information from the YCbCr color space to the RGB color space.
The various elements of fig. 2 and 3 are shown to aid in understanding the operations performed by the encoder 200 and decoder 300. These units may be implemented as fixed function circuits, programmable circuits or a combination thereof. The fixed function circuit refers to a circuit that provides a specific function and is preset according to an operation that can be performed. Programmable circuitry refers to circuitry that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, the programmable circuit may execute software or firmware that causes the programmable circuit to operate in a manner defined by instructions of the software or firmware. Fixed function circuitry may execute software instructions (e.g., to receive parameters or output parameters) but the type of operation that fixed function circuitry performs is typically not variable. In some examples, one or more of the units may be different circuit blocks (fixed function or programmable), and in some examples, one or more of the units may be an integrated circuit.
Predictive geometry codec was introduced as an alternative to the octree geometry codec, where the nodes are arranged in a tree structure (which defines the prediction structure), and various prediction strategies are used to predict the coordinates of each node in the tree relative to its predictors. Fig. 6 is a conceptual diagram illustrating an example of a prediction tree 600, a directed graph whose arrows point in the prediction direction. The horizontally shaded node is the root vertex and has no predictor; the grid-shaded nodes have two child nodes; the diagonally shaded node has 3 child nodes; the unshaded nodes each have one child node; and the vertically shaded nodes are leaf nodes, which have no child nodes. Every node has only one parent node.
Four prediction strategies may be assigned to each node based on its parent node (p0), grandparent node (p1), and great-grandparent node (p2). The prediction strategies include no prediction, delta prediction (p0), linear prediction (2 × p0 − p1), and parallelogram prediction (2 × p0 + p1 − p2).
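A small Python sketch of the four candidate predictors exactly as listed above (the helper name and the tuple layout are assumptions made for illustration):

```python
import numpy as np

def predictor_candidates(p0, p1, p2):
    """Candidate predictors for a node from its parent (p0), grandparent (p1),
    and great-grandparent (p2), using the strategies listed above.
    Each argument is a 3-component position (e.g., in the (r, phi, i) domain).
    """
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    return {
        "none": np.zeros_like(p0),          # no prediction
        "delta": p0,                        # delta prediction
        "linear": 2 * p0 - p1,              # linear prediction
        "parallelogram": 2 * p0 + p1 - p2,  # parallelogram prediction, as listed above
    }

cands = predictor_candidates((100, 50, 3), (90, 40, 3), (80, 30, 3))
print(cands["linear"])  # [110  60   3]
```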
The encoder (e.g., G-PCC encoder 200) may employ any algorithm to generate the prediction tree; the algorithm used may be determined based on the application/use case, and several strategies may be used. The encoder may encode the residual coordinate values of each node in the bitstream, starting from the root node, in a depth-first manner. Predictive geometry codec is particularly useful for class 3 (LIDAR-acquired) point cloud data, e.g., for low-latency applications.
The angular mode may be used with predictive geometry codec, where the characteristics of the LIDAR sensor may be exploited to codec the prediction tree more efficiently. The coordinates of the positions are converted to (r, φ, i) (radius, azimuth, and laser index), and prediction is performed in this domain (the residuals are coded in the r, φ, i domain). Due to rounding errors, coding in r, φ, i is not lossless, so a second set of residuals corresponding to the Cartesian coordinates may be coded. A description of the encoding and decoding strategies used for the angular mode of predictive geometry codec is reproduced below. The description is based on Figs. 5A and 5B, which are conceptual diagrams of a rotating LIDAR acquisition model.
The techniques of this disclosure may be applied at least to point clouds acquired using a rotating LIDAR model. Here, the LIDAR 502 has N lasers (e.g., N = 16, 32, 64) rotating around the Z-axis according to an azimuth angle φ (see Figs. 5A and 5B). Each laser may have a different elevation angle θ(i), i = 1…N, and height ς(i), i = 1…N. Laser i hits a point M with Cartesian integer coordinates (x, y, z), defined according to the coordinate system 500 depicted in Fig. 5A.
The position of M is modeled with three parameters (r, φ, i), which may be computed as follows:
· r = sqrt(x² + y²)
· φ = atan2(y, x)
· i is the index of the laser that acquired the point.
The codec process may use a quantized version of (r, φ, i), denoted (r̃, φ̃, i), where the three integers r̃, φ̃, and i may be computed as follows:
· r̃ = sign(r) × floor(|r| / q_r + o_r)
· φ̃ = sign(φ) × floor(|φ| / q_φ + o_φ)
· i = i
where
· (q_r, o_r) and (q_φ, o_φ) are quantization parameters controlling the precision of r̃ and φ̃, respectively,
· sign(t) is the function that returns 1 if t is positive and (−1) otherwise, and
· |t| is the absolute value of t.
To avoid reconstruction mismatches due to the use of floating-point operations, the values of ς(i), i = 1…N, and tan(θ(i)), i = 1…N, may be pre-computed and quantized as follows:
· z̃(i) = sign(ς(i)) × floor(|ς(i)| / q_ς + o_ς)
· t̃(i) = sign(tan(θ(i))) × floor(|tan(θ(i))| / q_θ + o_θ)
where
· (q_ς, o_ς) and (q_θ, o_θ) are quantization parameters controlling the precision of z̃(i) and t̃(i), respectively (a sketch of this quantizer follows).
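The following Python sketch mirrors the sign/floor quantizer shown above; the quantization parameter values used below are arbitrary assumptions, not values taken from the G-PCC specification.

```python
import math

def quantize(value: float, q: float, o: float) -> int:
    """sign(value) * floor(|value| / q + o), matching the formulas above."""
    sign = 1 if value >= 0 else -1   # sign(0) is taken as +1 here
    return sign * math.floor(abs(value) / q + o)

# Hypothetical quantization parameters (q, o) for the radius and azimuth.
q_r, o_r = 0.5, 0.5
q_phi, o_phi = 0.001, 0.5

x, y = 123.4, -56.7
r = math.hypot(x, y)        # r = sqrt(x^2 + y^2)
phi = math.atan2(y, x)      # azimuth angle
r_tilde = quantize(r, q_r, o_r)
phi_tilde = quantize(phi, q_phi, o_phi)
print(r_tilde, phi_tilde)
```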
The reconstructed Cartesian coordinates may be obtained as follows:
· x̂ = round(r̃ × q_r × app_cos(φ̃ × q_φ))
· ŷ = round(r̃ × q_r × app_sin(φ̃ × q_φ))
· ẑ = round(r̃ × q_r × t̃(i) × q_θ − z̃(i) × q_ς)
where app_cos(.) and app_sin(.) are approximations of cos(.) and sin(.). The computation may use a fixed-point representation, a look-up table, and linear interpolation.
In some examples, (x̂, ŷ, ẑ) may be different from (x, y, z) for various reasons, including:
- quantization
- approximations
- model imprecision
- model parameter imprecision
In some examples, the reconstruction residuals (r_x, r_y, r_z) may be defined as follows (a sketch of this reconstruction and residual computation follows these definitions):
- r_x = x − x̂
- r_y = y − ŷ
- r_z = z − ẑ
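The Python sketch below mirrors the reconstruction and residual formulas above; math.cos/math.sin stand in for the codec's app_cos/app_sin fixed-point approximations, and the argument names are assumptions made for illustration.

```python
import math

def reconstruct_cartesian(r_t, phi_t, i, q_r, q_phi, q_theta, q_zeta, tan_theta_t, z_t):
    """Approximate (x_hat, y_hat, z_hat) from the quantized (r~, phi~, i) triple.

    tan_theta_t[i] and z_t[i] are the quantized per-laser model parameters;
    math.cos/math.sin replace the app_cos/app_sin approximations used by the codec.
    """
    x_hat = round(r_t * q_r * math.cos(phi_t * q_phi))
    y_hat = round(r_t * q_r * math.sin(phi_t * q_phi))
    z_hat = round(r_t * q_r * tan_theta_t[i] * q_theta - z_t[i] * q_zeta)
    return x_hat, y_hat, z_hat

def reconstruction_residual(original, reconstructed):
    """(r_x, r_y, r_z): the difference that is coded with each node."""
    return tuple(o - rec for o, rec in zip(original, reconstructed))
```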
In this method, the encoder (e.g., G-PCC encoder 200) may proceed as follows:
· Encode the model parameters t̃(i) and z̃(i), and the quantization parameters q_r, q_θ, and q_φ.
· Apply the geometry prediction scheme described in the G-PCC DIS to the representation (r̃, φ̃, i).
ο A new predictor leveraging LIDAR characteristics may be introduced. For example, the rotation speed of the LIDAR scanner around the z-axis is typically constant. Therefore, the G-PCC codec may predict the current azimuth φ̃(j) as follows (a sketch of this azimuth predictor follows this list):
φ̃(j) = φ̃(j − 1) + n(j) × δφ(k)
where
ο (δφ(k)), k = 1…K, is a set of potential speeds from which the encoder may select. The index k may be explicitly written to the bitstream or may be inferred from the context based on a deterministic policy applied by both the encoder and the decoder, and
ο n(j) is the number of skipped points, which may be explicitly written to the bitstream or may be inferred from the context based on a deterministic policy applied by both the encoder and the decoder.
· Encode with each node the reconstruction residuals (r_x, r_y, r_z).
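A Python sketch of the azimuth predictor above; the candidate speeds and the skipped-point count used below are arbitrary illustrative values.

```python
def predict_azimuth(prev_phi: int, n_skipped: int, delta_phi_candidates, k: int) -> int:
    """phi~(j) = phi~(j-1) + n(j) * delta_phi(k), exploiting the (roughly)
    constant rotation speed of the LIDAR scanner around the z-axis.

    delta_phi_candidates is the set (delta_phi(k)), k = 1..K; k and n_skipped
    are either written to the bitstream or inferred identically at the
    encoder and the decoder.
    """
    return prev_phi + n_skipped * delta_phi_candidates[k]

# Hypothetical: previous quantized azimuth 5000, two skipped points, speed index 0.
print(predict_azimuth(5000, n_skipped=2, delta_phi_candidates=[17, 34], k=0))  # 5034
```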
The decoder (e.g., G-PCC decoder 300) may proceed as follows:
· Decode the model parameters t̃(i) and z̃(i), and the quantization parameters q_r, q_θ, and q_φ.
· Decode the (r̃, φ̃, i) parameters associated with the nodes according to the geometry prediction scheme described in the G-PCC Draft International Standard (DIS).
· Compute the reconstructed coordinates (x̂, ŷ, ẑ) as described above.
· Decode the residuals (r_x, r_y, r_z).
ο Lossy compression may be supported by quantizing the reconstruction residuals (r_x, r_y, r_z).
· Compute the original coordinates (x, y, z) as follows:
ο x = x̂ + r_x
ο y = ŷ + r_y
ο z = ẑ + r_z
Lossy compression may be achieved by applying quantization to the reconstruction residuals (r_x, r_y, r_z) or by discarding points.
The quantized reconstruction residuals are computed as follows:
· r̃_x = sign(r_x) × floor(|r_x| / q_x + o_x)
· r̃_y = sign(r_y) × floor(|r_y| / q_y + o_y)
· r̃_z = sign(r_z) × floor(|r_z| / q_z + o_z)
where (q_x, o_x), (q_y, o_y), and (q_z, o_z) are quantization parameters controlling the precision of r̃_x, r̃_y, and r̃_z, respectively.
In some examples, G-PCC encoder 200 and/or G-PCC decoder 300 may use trellis quantization to further improve RD (rate-distortion) performance. The quantization parameters may be changed at the sequence/frame/slice/block level to achieve region-adaptive quality and for rate control purposes.
One or more disadvantages may exist with the above-described techniques. Prediction geometry codec utilizes a mechanism of rotating LIDAR to predict the location of one point in a point cloud from another point in the point cloud. However, this mechanism is limited to points within the same point cloud frame. No information from points in the previously encoded frame (i.e., reference frame) can be used for prediction.
In accordance with one or more techniques of this disclosure, a G-PCC codec (e.g., G-PCC encoder 200 and/or G-PCC decoder 300) may perform point cloud compression using inter-frame prediction. By using inter-prediction, the G-PCC codec may use redundancy across points of the frame to provide additional bit rate savings. Examples in various aspects of the disclosure may be used alone or in any combination.
Although the discussion focuses primarily on polar coordinate systems, the methods disclosed in the present application are also applicable to other coordinate systems, such as Cartesian coordinate systems, spherical coordinate systems, or any custom coordinate system that may be used to represent and/or codec point cloud positions and attributes.
The G-PCC codec may determine whether to use inter-prediction or intra-prediction to codec the point. For example, the G-PCC encoder may perform an analysis to determine whether it is beneficial (e.g., in terms of bit rate or other conditions) to use inter-prediction or intra-prediction to code a particular point. In some examples, the G-PCC codec may perform this determination in one of the following ways:
1. A first set of conditions is used at the encoder, and the same first set of conditions is used at the decoder.
2. A syntax element indicating an inter prediction mode is signaled at the encoder using a first set of conditions, and the decoder determines the mode based on the signaled syntax element.
The first set of conditions may be derived using one or more characteristics of the points, such as Cartesian positioning, angular coordinates (radius, azimuth, elevation), prediction modes of neighboring nodes, and the like. Some examples of conditions included in the first set of conditions are shown below:
1. the point belongs to a frame that is not an intra-frame codec frame or a frame that does not correspond to a random access point.
2. The azimuth of the current point is different from the azimuth of the previously decoded point of the current frame, or the delta azimuth between the current point and the previous point is non-zero. In some examples, an approximation of the Δazimuth angle or a quantized version of the Δazimuth angle may be used to determine the inter prediction mode of the current node.
3. The point belongs to a particular type of slice (i.e., a P slice or a B slice, indicating that inter prediction may be applied).
4. A flag is signaled to indicate whether inter prediction is enabled for a particular frame. The flag may be signaled in a syntax element associated with the frame (e.g., a slice header, frame header, etc.) or in a parameter set referenced by the frame. For example, an inter_prediction_enabled_flag may be signaled in a geometry parameter set (GPS) to specify whether inter prediction is enabled for point cloud frames/slices that reference the GPS. When inter_prediction_enabled_flag indicates that inter prediction is not enabled, the inter prediction indication for points in the point cloud may not be signaled.
In some alternatives, the inter prediction mode of a point may be determined based on the residual of azimuth or a phi multiplier associated with the point.
In some alternatives, inter prediction (or one or more inter prediction modes) may be enabled only when the angular mode is enabled (i.e., inter prediction may be disabled if the angular mode is not enabled). A sketch combining several of these conditions is given below.
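The following Python sketch uses assumed flag names (only inter_prediction_enabled_flag is named in the text above) to show how the first set of conditions could gate whether a point may be inter predicted.

```python
def may_use_inter_prediction(frame_is_intra: bool,
                             angular_mode_enabled: bool,
                             inter_prediction_enabled_flag: bool,
                             delta_azimuth: int) -> bool:
    """Example gating of inter prediction for a point, combining several of the
    conditions described above (frame type, enable flag, angular mode, and a
    non-zero delta azimuth relative to the previously decoded point).
    """
    if frame_is_intra or not inter_prediction_enabled_flag:
        return False
    if not angular_mode_enabled:        # inter prediction tied to the angular mode
        return False
    return delta_azimuth != 0
```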
The G-PCC codec may select the reference frame. In some examples, the G-PCC codec may use a previously decoded frame (or in some cases a previous frame in decoding order) as a reference frame. In other examples, an indication of the frame number (using LSB values or delta frame number values) may be used to designate the reference frame. More generally, two or more frames may be designated as reference frames, and a point may be encoded using inter-prediction from any reference frame (an indication of the reference frame associated with the point may be signaled or derived).
In another example, prediction (e.g., bi-prediction) may be performed from two or more frames. Thus, the G-PCC codec may predict the point of the current frame based on the reference point in the first reference frame and the reference point in the second reference frame.
The G-PCC codec may utilize multiple inter prediction modes. When inter-predicting a point, there may be one or more methods to predict the point from a reference frame. Each type of prediction may be specified using a different mode value (e.g., the G-PCC codec may signal a syntax element indicating an inter prediction mode for the current point). As one exemplary inter prediction mode, a point may be predicted from a zero motion candidate from a reference frame (e.g., a reference point in a reference frame may be a zero motion candidate). As another exemplary inter prediction mode, a point may be predicted from global motion candidates from a reference frame. As another exemplary inter prediction mode, a point may be predicted from a candidate point from a reference frame, and other parameters (e.g., motion vectors, etc.) may be used to specify the candidate point.
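A Python sketch of selecting an inter predictor according to a signaled mode value, along the lines described above (the use of a single frame-level global motion vector here is an assumption made for illustration):

```python
import numpy as np

def inter_predictor(mode: int, reference_point: np.ndarray,
                    global_motion: np.ndarray) -> np.ndarray:
    """Pick the inter predictor for the current point.

    mode 0: zero-motion candidate   -> the reference point itself
    mode 1: global-motion candidate -> the reference point shifted by a
            frame-level global motion (e.g., sensor motion between frames)
    """
    if mode == 0:
        return reference_point
    return reference_point + global_motion
```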
Some details regarding exemplary different types of inter prediction are provided below.
The G-PCC codec may utilize partial inter prediction modes. In one example, inter-prediction may be applied to only a subset of the characteristics of a point in the point cloud being predicted. The one or more characteristics to which inter-prediction is not applied may be encoded using intra-prediction or by other techniques. Some examples of partial inter prediction modes are as follows (a sketch follows this list):
1. when a point is indicated as being inter predicted, the radius of the point may be predicted from a reference point (or reference frame) when the angle mode is enabled; azimuth and laser IDs (or elevation) may be derived based on intra-prediction or other techniques.
2. If a point is indicated as inter-predicted, when the angular mode is enabled, both the radius and azimuth of the point are predicted from a reference point (or reference frame); the G-PCC codec may derive the laser ID (or elevation) based on intra-prediction or other techniques.
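The sketch below (a hypothetical Python helper) illustrates partial inter prediction in the sense of example 1 above: only the radius is taken from the reference frame, while the azimuth and laser ID come from an intra predictor.

```python
def partial_inter_predict(reference_point, intra_predictor):
    """Combine an inter-predicted radius with intra-predicted azimuth and laser ID.

    Both arguments are (r, phi, laser) triples; only the radius is taken from
    the reference point, matching example 1 of the partial inter prediction
    modes listed above.
    """
    r_ref, _, _ = reference_point
    _, phi_intra, laser_intra = intra_predictor
    return (r_ref, phi_intra, laser_intra)
```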
The G-PCC codec may perform signaling of inter prediction modes. In some examples, the inter prediction mode associated with a point may be signaled in the bitstream. This may be signaled as a flag or as a mode value. The following are some exemplary inter prediction mode signaling techniques:
1. An inter_pred_flag equal to 0 may indicate that the point is predicted using intra prediction, and an inter_pred_flag equal to 1 may indicate that the point is predicted using inter prediction. In some examples, based on an indication that the point is predicted using inter prediction, a mode value may be signaled to specify the method used to perform inter prediction of the point.
2. When predicting points with inter prediction, a mode value may be signaled, where a mode value of 0 specifies zero motion candidates for prediction and a mode value of 1 specifies global motion candidates for prediction.
In this disclosure, inter prediction mode may be used interchangeably with an indication of whether to encode or decode a point using inter prediction and the particular inter prediction technique used.
Certain conditions may affect how the G-PCC codec signals the inter prediction mode. In some cases, the inter prediction mode may be signaled only when a second set of conditions applies. The second set of conditions may be derived using one or more characteristics of the points, such as the Cartesian position, the angular coordinates (radius, azimuth, elevation), the prediction modes of neighboring nodes, and the like. Some examples of conditions included in the second set of conditions are shown below:
1. The point belongs to a frame that is not an intra-frame codec frame or a frame that does not correspond to a random access point.
2. The azimuth of the current point is different from the azimuth of the previously decoded point, or the delta azimuth between the current point and the previous point is non-zero. In some examples, an approximation of the Δazimuth angle or a quantized version of the Δazimuth angle may be used to determine the inter prediction mode of the current node.
3. Delta azimuth between the current point and the previous point is greater than a threshold; the value of the threshold may be fixed or signaled in the bitstream or derived based on syntax elements signaled in the bitstream.
More generally, the second set of conditions may include one or more criteria including, but not limited to, a Δazimuth value, a Δradius value, a Δlaser ID value, a Δelevation value. Such a delta value of the characteristic may be indicated by a corresponding residual value signaled for the point in the bitstream. The second set of conditions may also include delta coordinate values (Δx, Δy, Δz) in the cartesian domain.
When the inter prediction mode is not signaled for a specific point, the inter prediction mode may be inferred for that point as a fixed value.
4. When the inter prediction mode is not signaled for the point, it is assumed that the value is 0 (or a value specifying intra prediction).
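By way of illustration only, the following C++ sketch shows how a codec might combine the example conditions above to decide whether the inter prediction mode is signaled for a point; the function and parameter names (frameIsIntraCoded, deltaAzimuth, azimuthThreshold) are hypothetical and are not part of any normative syntax.

#include <cstdint>

// Sketch: returns true when, under the example conditions above, the inter
// prediction mode would be signaled for the current point; otherwise the mode
// is inferred (e.g., to 0, i.e., intra prediction).
bool interModeIsSignaled(bool frameIsIntraCoded, bool frameIsRandomAccessPoint,
                         int64_t deltaAzimuth, int64_t azimuthThreshold)
{
  if (frameIsIntraCoded || frameIsRandomAccessPoint)
    return false;                          // condition 1: frame allows inter prediction
  if (deltaAzimuth == 0)
    return false;                          // condition 2: delta azimuth must be non-zero
  int64_t mag = deltaAzimuth < 0 ? -deltaAzimuth : deltaAzimuth;
  return mag > azimuthThreshold;           // condition 3: delta azimuth exceeds the threshold
}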
5: Optimizing signaling. In some cases, the inter prediction mode may not be signaled for every point, but may instead be derived from syntax elements signaled in the bitstream.
1. The inter prediction mode may be signaled for the prediction tree and may be applied to all points in the prediction tree.
2. The inter prediction mode may be signaled in the geometry slice and may be applied to all points in the slice.
3. The inter prediction mode may be signaled for a prediction block (which may be specified as a fixed number of points, a signaled number of points, etc.), and may be applied to all points in the prediction block.
In some examples, the inter prediction mode may be signaled only to some nodes specified by the tree depth.
4. The inter prediction mode may be signaled only for the root node of the prediction tree.
In one example, when inter-prediction is applied for some characteristics of the point (e.g., radius) and intra-prediction is applied for other characteristics of the point (e.g., azimuth, laser ID), only a subset of intra-prediction modes may be allowed (e.g., for azimuth and laser ID, only mode 0 of intra-prediction may be allowed). In some examples, if only one intra prediction mode is allowed, the mode value is not signaled and is inferred at the decoder. In some examples, if a subset (two or more) of intra-prediction modes is allowed, more efficient coding of the mode syntax element may be applied (e.g., fewer bits may be sufficient to code the index into the subset of modes).
For each point, and corresponding to each prediction mode, the G-PCC codec may derive a prediction value of one or more characteristics of the current point based on the positions of one or more points in the reference frame. The following is an exemplary prediction process:
when the zero motion candidate is selected, the prediction candidate may be selected as a point having the nearest azimuth and laser ID corresponding to specific values of azimuth and laser ID (e.g., reconstructed azimuth value and laser ID of the current point); the value of the radius of the prediction candidate is selected as the predicted value of the radius of the current node.
When selecting the zero motion candidate, the prediction candidate may be selected as a point having the nearest azimuth and laser ID corresponding to specific values of azimuth and laser ID (e.g., reconstructed azimuth value and laser ID of previously decoded and reconstructed point); the values of the radius, azimuth and laser ID of the predicted candidates are selected as predicted values of the radius, azimuth and laser ID of the current node.
When selecting the global motion candidate, the prediction candidate may be selected as a point having a nearest azimuth and laser ID corresponding to specific values of azimuth and laser ID after applying global motion compensation to the prediction candidate; the value of the radius of the prediction candidate is selected as the predicted value of the radius of the current node.
When selecting the global motion candidate, the prediction candidate after applying global motion compensation to the prediction candidate may be selected as a point having a nearest azimuth and laser ID corresponding to specific values of azimuth and laser ID (e.g., a reconstructed azimuth value and laser ID of a previously decoded and reconstructed point); the values of the radius, azimuth and laser ID of the predicted candidates are selected as predicted values of the radius, azimuth and laser ID of the current node.
When a general candidate is selected, the prediction candidate may be selected as a point having a nearest azimuth and laser ID corresponding to specific values of azimuth and laser ID after a motion vector (associated with a current point) is applied to the prediction candidate; the value of the radius of the prediction candidate is selected as the predicted value of the radius of the current node.
When a general candidate is selected, a prediction candidate after applying a motion vector to the prediction candidate may be selected as a point having a nearest azimuth and laser ID corresponding to specific values of azimuth and laser ID (e.g., a reconstructed azimuth value and laser ID of a previously decoded and reconstructed point); the values of the radius, azimuth and laser ID of the predicted candidates are selected as predicted values of the radius, azimuth and laser ID of the current node.
In some examples, the motion vector may be used to derive a point that is used to derive the radius, azimuth, and laser ID (or elevation) of the current point, or a prediction of these values. In some examples, motion vectors may be used to derive predictions of x, y, z coordinates, or predictions of these values.
In some examples, more than one point in the reference frame may be used to derive the prediction value. The number of points may be signaled in the slice header or in the GPS. In some examples, indices of two or more points in the reference frame may be signaled to specify the particular points to be used for inter prediction.
4. When the zero motion candidate is selected, two prediction candidates are selected as two points having two nearest azimuth angles and laser IDs corresponding to specific values of azimuth angles and laser IDs; the average of the radii of the two prediction candidates is selected as the predicted value of the radius of the current node.
5. When two or more prediction candidates are selected as candidates for inter prediction (e.g., based on laser ID and/or azimuthal proximity), the prediction candidate may be selected by signaling an index into a list comprising the two or more prediction candidates. In some alternatives, a weighted average (with weights calculated based on the difference between the laser ID value of the current point and that of the candidate and/or the difference between the azimuth of the current point and that of the candidate) may also be used as a prediction candidate.
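By way of illustration only, the following C++ sketch shows one possible search for the prediction candidate described above, i.e., the reference frame point with the requested laser ID whose azimuth is closest to the target azimuth; the names SphPoint and findNearestCandidate are illustrative and not normative.

#include <cstdint>
#include <vector>

struct SphPoint { int64_t radius; int64_t azimuth; int laserId; };

// Sketch: among the (possibly motion-compensated) reference frame points, find the
// candidate with the given laser ID whose azimuth is closest to the target azimuth;
// its radius (or radius, azimuth, and laser ID) would then serve as the prediction.
// Returns -1 when no candidate with that laser ID exists.
int findNearestCandidate(const std::vector<SphPoint>& refFrame,
                         int64_t targetAzimuth, int targetLaserId)
{
  int best = -1;
  int64_t bestDiff = 0;
  for (size_t i = 0; i < refFrame.size(); ++i) {
    if (refFrame[i].laserId != targetLaserId)
      continue;
    int64_t diff = refFrame[i].azimuth - targetAzimuth;
    if (diff < 0) diff = -diff;
    if (best < 0 || diff < bestDiff) { best = static_cast<int>(i); bestDiff = diff; }
  }
  return best;
}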
The G-PCC codec may perform preparation of the reference frame. In some examples, the reference frame may be motion compensated prior to inter prediction. For example, one or more points of a reference frame may be associated with motion (e.g., global motion), and the prediction for a subsequent frame may be obtained from the positions of the motion-compensated points. The motion may be estimated or obtained from external information, such as GPS information. Thus, the "reference frame" used for inter prediction may be a "compensated reference frame". In other words, the reference frame may be a motion compensated reference frame.
In some examples, one or more points in the compensated reference frame may be considered zero motion candidates for the current frame. In some examples, points in the reference frame to which motion compensation is applied may belong to a particular feature (e.g., object) in the point cloud, or they may be marked (e.g., ground, non-ground) by some estimation algorithm. Global motion parameters (e.g., rotation, translation) may be signaled in the bitstream.
In some examples, in addition to the global motion parameter, additional adjustment values for one or more of the following may be signaled: x, y, z, radius, azimuth, laser id. For example, when preparing a predicted frame by applying motion compensation (e.g., using motion parameters), one or more additional adjustment values for x, y, z may be applied in the cartesian domain; one or more adjustment values for radius, azimuth, laser id may be applied in the spherical domain.
In some examples, an adjustment value (representing a "global" adjustment) may be specified for the entire sequence; these values may also be incorporated into the global motion parameter.
In some examples, an adjustment value (representing a "group" adjustment) may be specified for a group of frames; these values may also be incorporated into the global motion parameter.
In some examples, an adjustment value may be specified for a frame (representing a "frame" adjustment); these values may also be incorporated into the global motion parameter.
In some examples, adjustment values (representing "local" adjustments) may be specified for different regions of the point cloud; these values may also be incorporated in the global motion parameters or signaled as modifications to the global motion applicable to the frame.
One or more of the global adjustment, frame adjustment, and local adjustment may be signaled; when more than one of these parameters is present, the respective adjustments may be applied successively or together (e.g., global, frame, and local x-adjustments xg, xf, and xl may be applied together as xg + xf + xl to points belonging to the region). In other examples, all adjustments of one level (e.g., global) may be applied before the adjustments of a different level (e.g., frame) are applied.
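By way of illustration only, the following C++ sketch shows the combined application of global, frame, and local x-adjustments described above; the names are illustrative, and the same pattern would apply to the other components (y, z, radius, azimuth, laser ID).

#include <cstdint>

// Sketch: combining global (xg), frame (xf), and local (xl) x-adjustments when all
// three are present; the local adjustment applies only to points of the region to
// which it pertains.
int64_t applyAdjustmentsX(int64_t x, int64_t xg, int64_t xf, int64_t xl,
                          bool pointInLocalRegion)
{
  int64_t adjusted = x + xg + xf;
  if (pointInLocalRegion)
    adjusted += xl;
  return adjusted;
}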
The motion vectors associated with the inter prediction modes of points may be signaled in the bitstream. In some cases, multiple points may share the same motion vector, which may be signaled in a slice or in a parameter set (e.g., the GPS or a parameter set dedicated to motion parameters). In some cases, a motion vector associated with a particular point may be predicted from spatially or temporally neighboring inter-predicted points, and only the difference between the actual motion vector and the predicted motion vector may be signaled.
The G-PCC codec may perform the context selection. For example, the G-PCC codec may determine a context to use for predicting the geometry parameters/syntax elements based on one or more syntax elements signaled in the bitstream or variables derived therefrom. The use of additional contexts may improve the compression efficiency of the codec.
In some examples, the G-PCC codec may utilize a first parameter associated with a point to determine a context of other syntax elements associated with the point. For example, the context of one or more syntax elements associated with the phi multiplier (e.g., ptn_phi_mult_abs_gt0_flag, ptn_phi_mult_sign_flag, ptn_phi_mult_abs_gt1_flag, ptn_phi_mult_abs_minus2, ptn_phi_mult_abs_minus9) may be selected based on the inter-frame flag.
In one example, the inter flag indicates whether to select a first set of contexts in the case of using inter prediction or a second set of contexts in the case of using intra prediction. This may apply to the encoding of the phi multiplier, primary residual and secondary residual in case of angular mode enabled prediction geometry (spherical positioning codec: azimuth, radius, elevation or laser ID).
If the context of the syntax element depends on a first set of conditions, one context may be selected when the first set of conditions is true and the first parameter associated with the point takes a first value, and a second context may be selected when the first set of conditions is true and the first parameter associated with the point takes a second value different from the first value.
The different choices of the first parameter are as follows; each selection may present a different trade-off between compression efficiency and storage (more contexts may require more storage associated with context states):
1. in one example, the first parameter may be an inter-frame flag associated with the point.
2. In another example, the first parameter may be a prediction mode associated with the point (e.g., a prediction mode used in predicting geometry).
3. In another example, the first parameter may be a tuple consisting of the inter flag and the prediction mode associated with the point, e.g., (InterFlag, PredMode).
4. When there is more than one inter prediction candidate (e.g., zero motion candidate, global motion candidate, etc.), let InterPredMode represent the various inter prediction candidates (e.g., 0 for intra, 1 for the zero motion candidate, 2 for the global motion candidate, etc.). The first parameter may be InterPredMode.
In some examples, the G-PCC codec may use different conditions/parameters for different types of frames. For example, for intra-coded frames (not predicting with reference to other frames in the sequence), a second parameter may be selected to determine context as described in the present disclosure, while for other frames (e.g., inter-coded frames), a third parameter may be selected to determine context as described in the present disclosure. The second parameter and the third parameter may be different (e.g., for an intra frame, the prediction mode may be selected as the second parameter, and for an inter frame, the tuple of the prediction mode and the inter flag may be selected as the third parameter). The G-PCC codec may extend similar context selections to one or more syntax elements/components (primary residual, secondary residual, etc.).
Fig. 7 is a conceptual diagram illustrating an exemplary inter-prediction process for predicting points in a point cloud according to one or more aspects of the present disclosure. As shown in FIG. 7, the current frame 750 may include a plurality of points 752A-752L (collectively, "points 752") and the reference frame 754 may include a plurality of points 756A-756L (collectively, "points 756"). The reference frame 754 may be a frame that is encoded/decoded and/or reconstructed before the current frame 750 is encoded/decoded and/or reconstructed (e.g., the reference frame 754 may be located before the current frame 750 in the codec order). The G-PCC codec may utilize inter-prediction to predict one or more of the points 752 of the current frame 750 based on one or more of the points 756 of the reference frame 754. For example, a G-PCC decoder (or a reconstruction loop of a G-PCC encoder) may predict one or more parameters (e.g., (r, phi, i)) of a current point 752A of points 752 based on one or more of points 756.
To perform inter prediction to predict a current point in a current frame, the G-PCC codec may determine a reference point in a reference frame different from the current frame, and predict one or more parameters of the current point based on the reference point. For example, to predict the current point 752A, the G-PCC codec may determine the reference point 756A and predict one or more parameters of the current point 752A based on one or more parameters of the reference point 756A. The determined reference point may be referred to as an identified reference point.
The G-PCC codec may use any suitable technique to determine the reference point. As one example, the G-PCC codec may determine a pivot point in the current frame that is located before the current point in the codec order, and determine the reference point based on one or more parameters of the pivot point. For example, in the case where the codec order is counter-clockwise, the G-PCC codec may determine that point 752B is a point prior to the current point 752A (e.g., the point immediately preceding the current point in the codec order) (i.e., determine that point 752B is the pivot point), and determine the reference point based on one or more parameters of pivot point 752B.
To determine a reference point based on one or more parameters of the pivot point, the G-PCC codec may determine a reference pivot point based on an azimuth of the pivot point in a reference frame; and determining the reference point based on the reference pivot point. For example, the G-PCC codec may determine a point in reference frame 754 that has the same azimuth (or the same azimuth and the same laser ID) as pivot point 752B. In the example of fig. 7, the G-PCC codec may determine that point 756B is the reference pivot point because point 756B has the same azimuth as pivot point 752B. Although the reference pivot point in the example of fig. 7 corresponds to an actual point (e.g., the actual point in frame 754), the techniques of this disclosure are not necessarily limited thereto. For example, in some examples, the reference pivot point may be a virtual point that does not correspond to a reconstruction point in the reference frame 754.
In some examples, the G-PCC codec may determine the reference pivot point based on an actual (e.g., un-scaled) azimuth of the pivot point. In other examples, the G-PCC codec may determine the reference pivot point based on the scaled azimuth of the pivot point. For example, the G-PCC codec may determine the scaled azimuth by scaling the azimuth of the pivot point by a constant value.
To determine the reference point based on the reference pivot point, the G-PCC codec may identify, as the reference point in the reference frame, a point having an azimuth greater than the azimuth of the reference pivot point. For example, the G-PCC codec may determine which of the points 756 have azimuth values greater than the azimuth value of the reference pivot point, and select (from the set of points 756 having azimuth values greater than the azimuth value of the reference pivot point) the point with the smallest azimuth value. In the example of fig. 7, point 756A may be the point in reference frame 754 having the smallest azimuth that is greater than the azimuth of reference pivot point 756B. Thus, the G-PCC codec may identify point 756A as the reference point for performing inter prediction of the current point 752A.
In some examples, the G-PCC codec may determine the reference point based on an actual (e.g., un-scaled) azimuth of the reference pivot point. In other examples, the G-PCC codec may determine the reference point based on a scaled azimuth of the reference pivot point. For example, the G-PCC codec may determine the scaled azimuth of the reference pivot point by scaling the azimuth of the pivot point by a constant value. Thus, in some examples, the G-PCC codec may determine the reference point by identifying, as the reference point, the point (e.g., point 756A) having the smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point. In some examples, the G-PCC codec may instead utilize the point having the second smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point. For example, the G-PCC codec may determine the reference point by identifying, as the reference point, the point (e.g., point 756L) having the second smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point.
The G-PCC codec may predict the parameters of the current point 752A based on the parameters of the reference point 756A. For example, the G-PCC codec may signal residual data representing the difference between the parameters of the current point 752A and the reference point 756A. The G-PCC decoder may add residual data to the parameters of reference point 756A to reconstruct the parameters of current point 752A.
Although discussed above as using a single reference point in a single reference frame, the techniques of this disclosure are not limited thereto. As one example, multiple reference points in a single reference frame may be used together to predict the current point. For example, the G-PCC codec may determine a plurality of reference points in the reference frame based on the reference pivot point. The G-PCC codec may predict one or more parameters of a current point in the current frame based on the plurality of reference points. As another example, a current point may be predicted using reference points from multiple reference frames.
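By way of illustration only, the following C++ sketch mirrors the reference-point search described for FIG. 7, assuming the reference frame points of one laser are stored sorted by (possibly scaled) azimuth; the names refAzimuths and findReferencePoint are illustrative and not normative.

#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: refAzimuths holds the (possibly scaled) azimuths of reference frame
// points that share the pivot point's laser ID, sorted in increasing order.
// Returns the index of the reference point (the first azimuth strictly greater
// than the (scaled) azimuth of the reference pivot point), or -1 when none exists.
int findReferencePoint(const std::vector<int64_t>& refAzimuths, int64_t refPivotAzimuth)
{
  auto it = std::upper_bound(refAzimuths.begin(), refAzimuths.end(), refPivotAzimuth);
  if (it == refAzimuths.end())
    return -1;   // no point with a larger azimuth: inter prediction unavailable
  return static_cast<int>(it - refAzimuths.begin());
}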
As described above, the G-PCC codec may perform azimuth prediction. Let (r, phi, laser ID) be the three coordinates of the pivot point in the spherical coordinate system (called radius, azimuth and laser ID). The techniques disclosed herein may also be applied to other coordinate systems.
In some examples, the G-PCC codec may codec points in the current point cloud frame in an orderly manner as follows:
1. For a current point in the current frame, the G-PCC codec may select a pivot point in the current frame that is located before the current point in decoding order. In some examples, the pivot point is the previous point in the current frame in decoding order. In some examples, the pivot point is the second previous point in the current frame in decoding order. More generally, more than one previous point may be selected as the pivot point for the current point. In some examples, the pivot point may be a virtual point derived based on a previously decoded point in the current frame and an azimuthal displacement that is a multiple of an azimuth quantization scale value (predetermined or derived from a signaled syntax element).
2. The G-PCC codec may select one point in the reference frame, i.e., the reference pivot point associated with the pivot point. The reference pivot point may be selected as a point in the reference frame having the same azimuth and laser ID as the pivot point. In some examples, points with other laser ID values may also be candidates for the reference pivot point (e.g., the reference pivot point may be selected as a point in the reference frame that has the same azimuth as the pivot point and a laser ID within the range [LaserID - M, LaserID + M], where LaserID is the laser ID of the pivot point and M is a fixed value (e.g., 1), or M may be selected based on the distance of the pivot point from the origin, or derived as a function of LaserID (e.g., M may be smaller for smaller LaserID values and larger for larger LaserID values)). In some examples, a distance metric may be defined using azimuth and laser ID, and the reference pivot point may be selected as the point in the reference frame that has the minimum distance from the pivot point; normalized azimuth values may be obtained by scaling the azimuth values by a first constant value, and normalized laser ID values may be obtained by scaling the laser ID values by a second constant value. In some examples, the reference pivot point may be selected as a point in the reference frame that has the same scaled azimuth value and laser ID as the pivot point.
3. The G-PCC codec may select a reference point in the reference frame that is associated with the reference pivot point. The reference point may be selected as the point in the reference frame having the smallest azimuth that is greater than the azimuth of the reference pivot point and having the same laser ID as the reference pivot point. In some examples, the reference point may be selected as the point in the reference frame having the second smallest azimuth that is greater than the azimuth of the reference pivot point and having the same laser ID as the reference pivot point. In some examples, when a reference point is not available, inter prediction may be disabled for the current point. In some examples, the reference point may be selected as the reference pivot point itself.
4. The G-PCC codec may calculate a first residual between the reference point and the reference pivot point.
5. The G-PCC codec may use the first residual to derive a first prediction for the current point. The prediction may be derived by adding a component of the first residual to the corresponding component of the pivot point (e.g., the radius prediction may be obtained by adding the radius component of the first residual to the radius component of the pivot point, and similarly for azimuth). In some examples, the first prediction may be set equal to the reference point.
6. The G-PCC codec may code a second residual between the first prediction and the position of the current point.
7. Composition of the residuals: the one or more residuals disclosed in this disclosure may include one or more of the following: a radius residual between the reference pivot point and the reference point, and an azimuth residual between the reference pivot point and the reference point.
8. The G-PCC codec may derive the current point based on the first prediction and the second residual. In some examples, the G-PCC codec may derive the current point from the second residual alone (e.g., not based on the first prediction). A sketch of steps 4-8 follows this list.
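By way of illustration only, the following C++ sketch shows steps 4-8 above at the decoder side, per component (radius, azimuth, laser ID); the names Sph and reconstructCurrentPoint are illustrative and not normative.

#include <cstdint>

struct Sph { int64_t r; int64_t phi; int64_t laserId; };

// Sketch of steps 4-8: the first residual is the component-wise difference between
// the reference point and the reference pivot point, the first prediction adds that
// residual to the pivot point, and the decoded second residual completes the
// reconstruction of the current point.
Sph reconstructCurrentPoint(const Sph& pivot, const Sph& refPivot,
                            const Sph& refPoint, const Sph& secondResidual)
{
  Sph pred;
  pred.r       = pivot.r       + (refPoint.r       - refPivot.r);
  pred.phi     = pivot.phi     + (refPoint.phi     - refPivot.phi);
  pred.laserId = pivot.laserId + (refPoint.laserId - refPivot.laserId);

  Sph cur;
  cur.r       = pred.r       + secondResidual.r;
  cur.phi     = pred.phi     + secondResidual.phi;
  cur.laserId = pred.laserId + secondResidual.laserId;
  return cur;
}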
The G-PCC codec may apply one or more of the techniques described above to quantized azimuth values; the scale value used for quantization may be derived from a signaled value or may be predetermined. The quantized azimuth value and the laserID may be used to search for the inter-predicted point in the reference frame. For example, the azimuth of a previously decoded and reconstructed point may be quantized, and the inter-predicted point whose quantized azimuth and laserID are closest to those of the previous point may be selected as the prediction of the azimuth, radius, and laserID of the current point; the prediction may be quantized/dequantized, or left unquantized.
The G-PCC codec may apply one or more of the techniques described above to quantized laserID values; the scale value used for quantization may be derived from a signaled value or may be predetermined. The azimuth and the quantized laserID value may be used to search for the inter-predicted point in the reference frame. For example, the laserID of a previously decoded and reconstructed point may be quantized, and the inter-predicted point whose azimuth and quantized laserID are closest to those of the previous point may be selected as the prediction of the azimuth, radius, and laserID of the current point; the prediction may be quantized/dequantized, or left unquantized.
The G-PCC codec may apply one or more of the techniques described above to quantized azimuth and quantized laserID values; the scale values used for quantization may be derived from signaled values or may be predetermined. The quantized azimuth and quantized laserID values may be used to search for the inter-predicted point in the reference frame. For example, the azimuth and the laserID of a previously decoded and reconstructed point may be quantized, and the inter-predicted point whose quantized azimuth and quantized laserID are closest to those of the previous point may be selected as the prediction of the azimuth, radius, and laserID of the current point; the prediction may be quantized/dequantized, or left unquantized.
In some examples, a reference frame may refer to a set of (radius, azimuth, laserID) tuples derived from the reference frame. For example, for each point in the reference frame, if there is no other point in the set that has the same azimuth and laser ID, the G-PCC codec may add (radius, azimuth, laserID) to the set. In some cases, a quantized value of azimuth may be added. In some cases, even if another tuple with the same phi and laserID, e.g., (r1, phi, laserID), already exists in the set, (r, phi, laserID) may be added if the value of r is less than the value of r1 (in which case the existing tuple (r1, phi, laserID) may be replaced with the new tuple (r, phi, laserID)). In some cases, points in the reference frame may be in the x, y, z domain; the points may be stored for reference as they are, or after converting them into the spherical domain. In some cases, the motion-compensated positions may be added to the reference frame. The compensation may be based on signaled motion vectors (e.g., global motion vectors with rotation and/or translation) associated with the current frame and the reference frame.
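By way of illustration only, the following C++ sketch builds the reference set described above, keeping one entry per (quantized azimuth, laser ID) pair and replacing an existing entry when a point with a smaller radius is found; a simple right shift stands in for the azimuth quantization, and the names are illustrative rather than normative.

#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct RefPoint { int64_t r; int64_t phi; int laserId; };

// Sketch: build the set of (radius, azimuth, laserID) tuples for the reference frame.
std::map<std::pair<int64_t, int>, RefPoint>
buildReferenceSet(const std::vector<RefPoint>& refFramePoints, int azimScaleLog2)
{
  std::map<std::pair<int64_t, int>, RefPoint> refSet;
  for (const RefPoint& p : refFramePoints) {
    std::pair<int64_t, int> key(p.phi >> azimScaleLog2, p.laserId);  // simplified quantization
    auto it = refSet.find(key);
    if (it == refSet.end() || p.r < it->second.r)
      refSet[key] = p;   // add, or replace an existing tuple with one of smaller radius
  }
  return refSet;
}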
The G-PCC codec may perform context selection for inter prediction modes. As described above, the G-PCC codec may signal the inter prediction flag and the inter prediction mode. The context used by the G-PCC codec to code the inter prediction flag or inter prediction mode may be selected as follows: the G-PCC codec may set b1, b2, b3, ..., bN to the inter prediction flag values of the N previously decoded points (b1 being the immediately preceding node, b2 immediately preceding b1, etc.). The G-PCC codec may select the context index as a number generated using b1, b2, ..., bN as follows: ctxIdx = b1 + (b2 << 1) + (b3 << 2) + ... + (bN << (N - 1)). In another example, the G-PCC codec may select the context index as b1 + b2 + ... + bN. In some examples, the values of b1, b2, ..., bN may also be used to select the context of other syntax elements such as inter prediction modes, motion vectors, and the like. In some examples, N may be fixed at 5.
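By way of illustration only, the following C++ sketch shows the two context-index derivations described above for the inter prediction flags of the N previously decoded points (stored here as b[0]..b[N-1], with b[0] corresponding to b1); the function names are illustrative.

// Sketch of ctxIdx = b1 + (b2 << 1) + ... + (bN << (N - 1)).
int ctxIdxBitwise(const bool b[], int N)
{
  int ctxIdx = 0;
  for (int i = 0; i < N; ++i)
    ctxIdx += (b[i] ? 1 : 0) << i;
  return ctxIdx;
}

// Sketch of the alternative ctxIdx = b1 + b2 + ... + bN.
int ctxIdxSum(const bool b[], int N)
{
  int ctxIdx = 0;
  for (int i = 0; i < N; ++i)
    ctxIdx += b[i] ? 1 : 0;
  return ctxIdx;
}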
The G-PCC codec may perform improved radius residual coding. The radius, azimuth, and laserID associated with each point may be coded in the predictive geometry codec. The residual of the radius component of each point may be coded with an "equal_to_zero" flag, a sign bit, the number of bits of the residual, and/or the residual itself. The number of bits is an indicator of the relative magnitude of the radius residual. When points captured by the LIDAR are close to each other in a point cloud (e.g., points of an object or building captured by one of the lasers), the radius may not change much between the points. Since the object (likely a solid) is continuous, successive LIDAR scan points may be returned to the sensor with the azimuth difference between successive points close to the azimuth sampling step. The difference in azimuth, or azimuth residual, of neighboring points (of the same laser) on the object is therefore close to zero.
The context for the number of bits of the radius residual may be selected based on the azimuth residual of the current point. For example, based on the magnitude of the azimuth residual, the context may be switched as follows, where N is a fixed value (signaled or predetermined). Table 1 below provides one example of selecting the context index for the number of bits of the radius residual based on the azimuth residual and the fixed value N.
Table 1: context index for the number of bits of a radius residual
It is noted that other methods of selecting the context index for the radius residual may also be used.
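The following C++ sketch is illustrative only; the actual mapping is given by Table 1, and the sketch simply shows one plausible way of switching the context index based on the magnitude of the azimuth residual and the fixed value N described above.

#include <cstdint>

// Sketch only: one plausible context-index mapping for the number of bits of the
// radius residual, switched on the magnitude of the azimuth residual.
int radiusBitsCtxIdx(int64_t azimResidual, int64_t N)
{
  int64_t mag = azimResidual < 0 ? -azimResidual : azimResidual;
  if (mag == 0)
    return 0;   // azimuth residual equal to zero
  if (mag <= N)
    return 1;   // small azimuth residual
  return 2;     // large azimuth residual
}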
The following specific examples illustrate specific embodiments of several techniques of the present disclosure.
Example a:
For each point in a point cloud frame that is not an all-intra coded frame (random access point), a flag may be used to specify whether the point is coded using inter prediction. When inter prediction is applied, a "zero motion vector" candidate is selected to predict the radius of the point from the point in the reference frame that has the same laser ID value as the current point and the quantized azimuth value closest to that of the current point. The syntax structure is changed as follows (the <ADD> … </ADD> tag means addition, and the <DELETE> … </DELETE> tag means deletion):
<ADD>
ptn_inter_flag[nodeIdx] equal to 0 specifies that the radius residual of the current node is coded using intra prediction. ptn_inter_flag[nodeIdx] equal to 1 specifies that the radius residual of the current node is coded using inter prediction. When not present, the value of ptn_inter_flag[nodeIdx] is inferred to be equal to 0.
</ADD>
When the current point cloud frame is a random access point or an all-intra frame, the value of InterEnableFlag is set to 0; otherwise, it is set to 1.
In some alternatives, when ptn_inter_flag is equal to 1, the syntax element ptn_pred_mode[] is not signaled and is inferred to be a default value (e.g., equal to 0, corresponding to prediction geometry mode 0).
The modifications to the position prediction process are as follows:
Position prediction process
The inputs to this process are:
the variable predMode indicates the prediction mode of the current node,
- arrays aPos0, aPos1 and aPos2, with values aPosX[k], k = 0..2 and X = 0..2, where each array contains the position associated with the X-th ancestor node in the depth-first tree traversal order,
a variable curDepth indicating the node distance between the current node and the root node of the current prediction tree,
<ADD>
-a variable interFlag indicating whether the radius of the current node is coded using inter prediction.
</ADD>
The output of this process is an array predPos, with values predPos[k], k = 0..2, indicating the predicted point position associated with the nodeIdx-th tree node.
When predMode is equal to 0, the predicted point location is calculated as follows,
When predMode is equal to 1, the predicted point position is the position associated with the first ancestor:
for(k=0;k<3;k++)
predPos[k]=aPos0[k]
When predMode is equal to 2, the predicted point locations are linear combinations of the locations associated with the first two ancestors
for(k=0;k<3;k++)
predPos[k]=aPos0[k]+aPos0[k]-aPos1[k]
Otherwise (predMode equal to 3), the predicted point position is a linear combination of the positions associated with all three ancestors:
for(k=0;k<3;k++)
predPos[k]=aPos0[k]+aPos1[k]-aPos2[k]
<ADD>
When interFlag is equal to 1, predPos[0] is derived as follows. Let refFramePos[k] (k = 0..2) be the position of the point in the reference frame such that refFramePos[2] = predPos[2] and the absolute difference between the values of refFramePos[1] and predPos[1] is the lowest among the points of the reference frame. predPos[0] is set equal to refFramePos[0].
</ADD>
In one alternative, quantized values of refFramePos[1] and predPos[1] are used for measuring the absolute difference, where the quantization scale may be a fixed number (e.g., geom_angular_azimuth_step_minus1 + 1).
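By way of illustration only, the following C++ sketch reflects the predPos[0] derivation added in Example A: among reference frame points with the same laser ID as predPos, the point with the closest azimuth is found and its radius is copied; in the quantized alternative, refFramePos[1] and predPos[1] would be replaced by their quantized values before taking the absolute difference. Names are illustrative.

#include <array>
#include <cstdint>
#include <vector>

using Pos = std::array<int64_t, 3>;   // [0] = radius, [1] = azimuth, [2] = laser ID

// Sketch of the inter derivation of predPos[0].
void deriveInterRadius(Pos& predPos, const std::vector<Pos>& refFrame)
{
  bool found = false;
  int64_t bestDiff = 0;
  for (const Pos& refFramePos : refFrame) {
    if (refFramePos[2] != predPos[2])
      continue;
    int64_t diff = refFramePos[1] - predPos[1];
    if (diff < 0) diff = -diff;
    if (!found || diff < bestDiff) {
      found = true;
      bestDiff = diff;
      predPos[0] = refFramePos[0];   // radius of the closest-azimuth reference point
    }
  }
}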
Example B:
in example B, when inter prediction is used, the prediction mode may not be signaled. For example, when inter prediction is used, the prediction mode is inferred to be equal to 0. The syntax structure is changed as follows (< ADD > … </ADD > tag means addition, < DELETE > … </DELETE > tag means deletion):
When ptn_pred_mode[nodeIdx] is not signaled, ptn_pred_mode[nodeIdx] may be inferred to be equal to 0. In some examples, the following inference may be added: when ptn_inter_flag[nodeIdx] is equal to 1, ptn_pred_mode[nodeIdx] is inferred to be equal to 0.
Example C:
in example C, the signaling order of phi and prediction modes may not be modified. The signaling of the prediction mode may be conditioned on the signaling of the inter-frame flag. The signaling of ptn_phi_mult_abs_gt0_flag may also be conditioned on an inter flag. The syntax structure is changed as follows (< ADD > … </ADD > tag means addition, < DELETE > … </DELETE > tag means deletion):
When ptn_inter_flag[nodeIdx] is not signaled, ptn_inter_flag[nodeIdx] is inferred to be equal to 0.
When ptn_phi_mult_abs_gt0_flag[nodeIdx] is not signaled, ptn_phi_mult_abs_gt0_flag[nodeIdx] is inferred to be equal to 0. In another example, the following inference is added: when ptn_inter_flag[nodeIdx] is equal to 1, ptn_phi_mult_abs_gt0_flag[nodeIdx] is inferred to be equal to 1.
Example D:
in example D, the order of the prediction mode and phi syntax elements may not be changed. For example, the inter flag may be signaled after the prediction mode but before the phi syntax element.
When ptn_pred_mode[nodeIdx] is greater than 0 and InterEnableFlag is equal to 1 (ptn_inter_flag[] is not signaled), ptn_inter_flag[nodeIdx] is inferred to be equal to 0.
Example E:
example E is similar to example a, modified in that the signaling of the inter flag also depends on the value of pred_mode. The whole of the following syntax is added.
When InterEnableFlag is equal to 1 and PtnPhiMult[nodeIdx] is equal to 0, or ptn_pred_mode[nodeIdx] is not equal to 0, ptn_inter_flag[nodeIdx] is inferred to be equal to 0.
Example F:
In this example, the prediction predPos[0] for a point is obtained by comparing not only the azimuth angles of points having the same laser ID as the current point, but also the azimuth angles of points of adjacent laser IDs.
When interFlag is equal to 1, the G-PCC codec may derive predPos[0] as follows. The G-PCC codec may set refFramePos[k] (k = 0..2) to the position of the point in the reference frame such that refFramePos[2] takes one of the values {predPos[2] - 1, predPos[2], predPos[2] + 1} and the absolute difference between the values of refFramePos[1] and predPos[1] is the lowest among the points of the reference frame. The G-PCC codec may set predPos[0] equal to refFramePos[0].
In one case, the reference point may be obtained taking into account points from more laser IDs in the reference frame.
In another case, instead of comparing azimuth angles (i.e., refFramePos [1] and predPos [1 ]), a weighted cost function may be specified as follows:
J(refPos)=w1*f1(refPos[1],predPos[1])+w2*f2(refPos[2],predPos[2])
The point in the reference frame located at refPos[k] (k = 0..2) that has the smallest value of J(refPos) is selected for predicting the radius.
In one case, the value of f1 (x, y) may be the square of the difference between x and y, or more generally, any function that indicates the distance between the current point and the reference point in the azimuth (or second) dimension.
In one case, the value of f2 (x, y) may be the square of the difference/absolute difference between x and y, or more generally, any function indicative of the distance between the current point and the reference point in the laser ID (or third) dimension.
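By way of illustration only, the following C++ sketch evaluates the weighted cost J(refPos) above, taking f1 and f2 as squared differences in the azimuth and laser ID dimensions; w1 and w2 are illustrative weights, and the candidate with the smallest J(refPos) would be selected for prediction.

#include <array>
#include <cstdint>

using Pos = std::array<int64_t, 3>;   // [0] = radius, [1] = azimuth, [2] = laser ID

// Sketch of J(refPos) = w1*f1(refPos[1], predPos[1]) + w2*f2(refPos[2], predPos[2]).
int64_t costJ(const Pos& refPos, const Pos& predPos, int64_t w1, int64_t w2)
{
  int64_t d1 = refPos[1] - predPos[1];   // azimuth (second dimension) difference
  int64_t d2 = refPos[2] - predPos[2];   // laser ID (third dimension) difference
  return w1 * d1 * d1 + w2 * d2 * d2;
}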
In another case, more than one reference candidate may be selected for a point, and an inter prediction candidate index may be signaled to specify which reference candidate to select for the point.
For example, for the current point, three points in the reference frame may be selected as follows:
- the point with laser ID equal to predPos[2] - 1 whose azimuth value is closest to predPos[1],
- the point with laser ID equal to predPos[2] whose azimuth value is closest to predPos[1],
- the point with laser ID equal to predPos[2] + 1 whose azimuth value is closest to predPos[1].
An index into the set/list may be signaled to specify the point selected for prediction.
More generally, for each current point that is encoded and decoded with inter prediction for a radius:
-selecting a reference frame based on a predetermined decision (e.g. a previously decoded frame) or based on one or more signaled indications (e.g. frame index/counter values).
Selecting a set RefCandSuperSet in points in the reference frame based on a first set of conditions (e.g., one or more methods disclosed herein based on laser ID range and/or azimuth range)
Selecting a subset RefCandSubset from RefCandSuperSet based on a second set of conditions (e.g. the azimuth angle of the particular laser ID closest to the current point, etc.)
- if there is more than one entry in RefCandSubset, arranging the entries in the list RefCandList based on a third set of conditions (e.g., arranging the entries by increasing laser ID difference from the current point); otherwise (only one candidate in RefCandSubset), the single candidate in RefCandSubset is used for prediction,
- signaling the index of the entry in RefCandList to be used for prediction,
- predicting the radius of the current point using the selected entry.
In some examples, when there is no point in the reference frame in RefCandSubset, inter prediction may be disabled, and in this case an inter flag may or may not be signaled.
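By way of illustration only, the following C++ sketch follows the candidate-list procedure above: for each laser ID within [curLaserId - M, curLaserId + M] (RefCandSuperSet), the reference point whose azimuth is closest to the current point is kept (RefCandSubset), and the surviving entries are ordered by increasing laser ID difference from the current point (RefCandList); a signaled index would then select the entry used to predict the radius. All names are illustrative.

#include <algorithm>
#include <cstdint>
#include <vector>

struct Cand { int64_t r; int64_t phi; int laserId; };

// Sketch: build RefCandList for the current point (curPhi, curLaserId).
std::vector<Cand> buildRefCandList(const std::vector<Cand>& refFrame,
                                   int64_t curPhi, int curLaserId, int M)
{
  std::vector<Cand> refCandList;
  for (int lid = curLaserId - M; lid <= curLaserId + M; ++lid) {
    bool found = false;
    Cand best{};
    int64_t bestDiff = 0;
    for (const Cand& c : refFrame) {
      if (c.laserId != lid)
        continue;
      int64_t diff = c.phi - curPhi;
      if (diff < 0) diff = -diff;
      if (!found || diff < bestDiff) { found = true; best = c; bestDiff = diff; }
    }
    if (found)
      refCandList.push_back(best);   // closest-azimuth point for this laser ID
  }
  std::sort(refCandList.begin(), refCandList.end(),
            [curLaserId](const Cand& a, const Cand& b) {
              int da = a.laserId > curLaserId ? a.laserId - curLaserId : curLaserId - a.laserId;
              int db = b.laserId > curLaserId ? b.laserId - curLaserId : curLaserId - b.laserId;
              return da < db;        // increasing laser ID difference from the current point
            });
  return refCandList;
}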
Example G:
according to one or more aspects of the present disclosure, the G-PCC codec may signal parameters in the bitstream for deriving the azimuth quantization/scale. In some examples, the G-PCC codec may use previously decoded points in the current point cloud frame to inter-predict all three components.
The azimuth scale value (used to derive inter prediction candidates) may be signaled in the GPS as follows:
in some examples, inter_azim_scale_log2 may be signaled regardless of inter_prediction_enabled_flag.
In another example, the angle mode may also be used to condition the signaling of inter_azim_scale_log2.
The following are exemplary semantics of the above-described syntax elements.
inter_prediction_enabled_flag equal to 1 specifies that inter prediction may be used for point cloud frames referring to the GPS. inter_prediction_enabled_flag equal to 0 specifies that inter prediction is not used for point cloud frames referring to the GPS.
inter_azim_scale_log2 is used to scale the azimuth values of points that may be used in deriving the inter prediction reference. The value of inter_azim_scale_log2 should be in the range of 0 to NumAzimBits, inclusive.
(The variable NumAzimBits may specify the maximum number of bits used to represent the azimuth.)
In some examples, the inter_azim_scale_log2 syntax element may be an example of a syntax element specifying a base 2 logarithmic value for the scale factor. The G-PCC codec may determine a scaled azimuth (e.g., a scaled azimuth of the pivot point) based on the scaling factor.
When coding a frame, the reconstructed spherical positions of one or more points in the frame may be stored in a reference frame, which may be used by future frames as an inter-prediction reference. The reconstructed spherical positions may be stored in a hash table as follows:
the G-PCC codec may obtain a spherical coordinate representation of the reconstruction point.
The G-PCC codec may obtain the derived azimuth and laser index from the reconstruction point and use the tuple of the derived azimuth and laser index as an index to the hash table.
For example, the derived quantized azimuth, denoted quantized(val), may be derived by the G-PCC codec as follows:
int offset = azimScaleLog2 ? (1 << (azimScaleLog2 - 1)) : 0;
quantized(val) = val >= 0 ? (val + offset) >> azimScaleLog2
                          : -((-val + offset) >> azimScaleLog2);
Note that in some cases, azimScaleLog2 may not be signaled and is inferred to be one of:
value 0 (i.e. without quantization).
The largest power of 2 that is less than or equal to the azimuth speed signaled in the bitstream.
The largest power of 2 that is less than or equal to (azimuth speed / 2) signaled in the bitstream.
Different sequences may have different azimuthal scaling values.
Each entry in the hash table may be derived from one or more reconstructed points.
In one alternative, the hash table entry of the frame may be the first point in the reconstructed frame with specific values of the derived azimuth and the laser index in decoding order.
In one alternative, the hash table entry of the frame may be the point of the minimum radius of the reconstructed frame among the points having specific values of the derived azimuth and the laser index.
In a further alternative, the hash table entry of the frame may be a point value corresponding to an average, mean, median, weighted average, geometric mean, etc. calculated using points having specific values of the derived azimuth and the laser index.
In some examples, the reconstructed point location (r, phi, laserID) in the spherical domain may be entered into the azimuth table as follows: table index= (quantized (Phi), laserID), table entry= (r, phi, laserID).
When the current point (in the current frame) is encoded, the following steps may be performed:
The G-PCC codec may set prevNode = (r1, phi1, laserID1) as the previous node in the current frame in decoding order. In some alternatives, prevNode may be selected as the parent node of the current node.
In the table associated with the reference frame, the G-PCC codec may check whether there is an index (quantized(phi), laserID1) where quantized(phi) is greater than quantized(phi1). If such an index does not exist, the G-PCC codec may not apply inter prediction to the current node. If at least one such index exists, the G-PCC codec may select the index with the smallest quantized(phi) that is greater than quantized(phi1).
The G-PCC codec may use the entry associated with the index selected above, (r_inter, phi_inter, laserID_inter), as the inter prediction of the current point.
In some examples, the following code may be used to implement the above techniques with a collection of hash tables (rather than a 2D hash table), where the function phiQuantized() corresponds to the quantized() function described previously. In the following, refPointVals is a vector of hash tables storing the reference points, refPointVals[currLaserId] is the hash table corresponding to the index laserID = currLaserId, and all entries in refPointVals[currLaserId] are stored with increasing values of quantized(phi). Because the table is stored in increasing order of quantized(phi), the upper_bound function is used to search for the first quantized(phi) that is greater than the quantized current azimuth, quantized(currAzim). When idx is equal to refPointVals[currLaserId].end(), no such table entry exists and, therefore, inter prediction is not used. When an entry exists, idx->second specifies the inter prediction candidate (r_inter, phi_inter, laserID_inter).
In some alternatives, the hash table may not store the points in increasing order of the quantized(phi) value, but may instead use a hash function to insert and retrieve the table entries. In another alternative, a generic table/data structure may be used to store the points from the reference frame. The techniques described above may be implemented using code along the lines of the following illustrative sketch.
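The following C++ sketch is illustrative only and is consistent with the description above, assuming one std::map per laser ID keyed by the quantized azimuth so that upper_bound finds the first entry whose quantized azimuth is greater than that of the previous node; the names refPointVals, phiQuantized, and SphPos mirror the description but are not normative.

#include <cstdint>
#include <map>
#include <vector>

struct SphPos { int64_t r; int64_t phi; int laserId; };

// refPointVals[laserId]: quantized azimuth -> stored reference point for that laser.
using RefTables = std::vector<std::map<int64_t, SphPos>>;

// Same rounding as the quantized() derivation given earlier.
int64_t phiQuantized(int64_t val, int azimScaleLog2)
{
  int64_t offset = azimScaleLog2 ? (int64_t(1) << (azimScaleLog2 - 1)) : 0;
  return val >= 0 ? (val + offset) >> azimScaleLog2
                  : -((-val + offset) >> azimScaleLog2);
}

// Returns true and fills interPred with (r_inter, phi_inter, laserID_inter) when an
// inter prediction candidate exists; returns false (no inter prediction) otherwise.
bool findInterCandidate(const RefTables& refPointVals, const SphPos& prevNode,
                        int azimScaleLog2, SphPos& interPred)
{
  const auto& table = refPointVals[prevNode.laserId];
  auto idx = table.upper_bound(phiQuantized(prevNode.phi, azimScaleLog2));
  if (idx == table.end())
    return false;            // no entry with a larger quantized azimuth
  interPred = idx->second;
  return true;
}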
If inter prediction is used, the G-PCC codec may signal a flag indicating that inter prediction is used.
The residual may be coded using the inter prediction candidate derived above (at the decoder, the prediction candidate is added to the residual to derive the reconstructed spherical-coordinate position of the current point).
Fig. 8 is a flow diagram illustrating an exemplary technique for inter-predicting points in a point cloud in accordance with one or more aspects of the present disclosure. Although described with respect to G-PCC encoder 200 (fig. 1 and 2), it should be appreciated that other devices may be configured to perform a method similar to the method of fig. 8.
G-PCC encoder 200 may determine whether to use prediction geometry codec to predict the current point in the current point cloud frame (850). In response to determining to predict the current point using the prediction geometry codec (yes branch of 850), G-PCC encoder 200 may select a prediction mode for the current point from the set of prediction modes (852). The prediction mode set may include at least an intra prediction mode and an inter prediction mode. The G-PCC encoder 200 may use various techniques to select the prediction mode for the current point, but may generally select the prediction mode that results in minimizing the number of bits that need to be signaled. In response to determining not to use predictive geometry codec to predict the current point (the "no" branch of 850), G-PCC encoder 200 may use a different technique, such as octree geometry codec, to predict the current point (860).
G-PCC encoder 200 may determine whether an inter prediction mode is selected for the current point (854). In response to selecting the inter prediction mode for the current point ("yes" branch of 854), G-PCC encoder 200 may use inter prediction to predict the current point of the point cloud (856). In response to not selecting the inter prediction mode for the current point ("no" branch of 854), G-PCC encoder 200 may predict the current point of the point cloud using a different prediction geometry codec technique, such as intra prediction (858).
To predict the current point of the point cloud using inter-prediction (856), G-PCC encoder 200 may perform the techniques discussed above with reference to fig. 7. For example, G-PCC encoder 200 may determine a pivot point in a current point cloud frame (i.e., a current point cloud or a current frame), determine a reference point cloud frame (i.e., a reference point cloud or a reference frame), determine a reference pivot point in the reference point cloud frame based on the pivot point in the current point cloud frame, determine a reference point in the reference point cloud frame based on the reference pivot point, and encode residual data representing differences between parameters of the reference point (e.g., radius r, azimuth angle phi, and laser index i) and parameters of the current point (e.g., in a bitstream).
In some examples, G-PCC encoder 200 may encode a first syntax element in the bitstream that indicates whether to encode the current point using inter prediction. For example, G-PCC encoder 200 may encode the inter-prediction flag with a value indicating whether the current point is encoded using inter-prediction (e.g., a value of 0 may indicate that the current point is encoded using intra-prediction, and a value of 1 may indicate that the current point is encoded using inter-prediction). The G-PCC decoder may determine whether to use inter prediction to encode the current point based on the value of the first syntax element.
The signaling of the first syntax element (e.g., indicating whether the current point is encoded using inter prediction) may be conditional. As a first example condition, G-PCC encoder 200 may encode a second syntax element indicating whether inter prediction is enabled (e.g., for the current point cloud frame). In the case where G-PCC encoder 200 encodes the second syntax element with a value indicating that inter prediction is enabled, G-PCC encoder 200 may encode the first syntax element. On the other hand, in the case where G-PCC encoder 200 encodes the second syntax element with a value indicating that inter prediction is not enabled, G-PCC encoder 200 may not encode the first syntax element. As another example condition, the G-PCC encoder may signal (e.g., encode) the first syntax element if the current point cloud frame is not an intra-coded frame (e.g., and not encode the first syntax element if the current point cloud frame is an intra-coded frame). As another example condition, the G-PCC encoder may signal (e.g., encode) the first syntax element if the current point cloud frame does not correspond to a random access point (e.g., and not encode the first syntax element if the current point cloud frame corresponds to a random access point). As another example condition, the G-PCC encoder may signal (e.g., encode) the first syntax element if the slice including the current point is of a particular type (e.g., a slice type that allows inter prediction, such as a P slice or a B slice). As another example condition, the G-PCC encoder may signal (e.g., encode) the first syntax element if the angular mode is enabled (e.g., and not encode the first syntax element if the angular mode is not enabled).
G-PCC encoder 200 may encode one or more syntax elements using context-adaptive binary arithmetic coding (CABAC). In some examples, G-PCC encoder 200 may select a context based on a value of an inter-prediction syntax element (e.g., the first syntax element). As one example, G-PCC encoder 200 may select a context for CABAC encoding one or more syntax elements representing the phi multiplier (e.g., ptn_phi_mult_abs_gt0_flag, ptn_phi_mult_sign_flag, ptn_phi_mult_abs_gt1_flag, ptn_phi_mult_abs_minus2, ptn_phi_mult_abs_minus9) based on the value of the inter-prediction syntax element. As another example, G-PCC encoder 200 may select a context for CABAC encoding one or more syntax elements representing the primary residual data (e.g., ptn_residual_abs_gt0_flag, ptn_residual_sign_flag, ptn_residual_abs_log2, ptn_residual_abs_remaining) based on the value of the inter-prediction syntax element. As another example, G-PCC encoder 200 may select a context for CABAC encoding an instance of the inter-prediction syntax element based on the values of N previous instances (e.g., 2, 3, 4, 5, 6, 7, or 8 previous instances) of the inter-prediction syntax element (e.g., the instances of the inter-prediction syntax element coded for previous points).
Fig. 9 is a conceptual diagram illustrating an exemplary ranging system 700 that may be used with one or more techniques of this disclosure. In the example of fig. 9, ranging system 700 includes an illuminator 702 and a sensor 704. The illuminator 702 may emit light 706. In some examples, illuminator 702 can emit light 706 as one or more laser beams. The light 706 may take one or more wavelengths, such as infrared wavelengths or visible wavelengths. In other examples, the light 706 is not a coherent laser. When the light 706 encounters an object, such as object 708, the light 706 generates return light 710. The return light 710 may include back-scattered light and/or reflected light. The return light 710 may pass through a lens 711 that directs the return light 710 to create an image 712 of the object 708 on the sensor 704. The sensor 704 generates a signal 714 based on the image 712. The image 712 may include a set of points (e.g., as represented by the points in the image 712 of fig. 9).
In some examples, the illuminator 702 and the sensor 704 may be mounted on a rotating structure such that the illuminator 702 and the sensor 704 capture a 360 degree view of the environment (e.g., a rotating LIDAR sensor). In other examples, ranging system 700 may include one or more optical components (e.g., mirrors, collimators, diffraction gratings, etc.) that enable illuminator 702 and sensor 704 to detect a distance of an object within a particular range (e.g., up to 360 degrees). Although the example of fig. 9 shows only a single illuminator 702 and sensor 704, ranging system 700 may include multiple illuminators and sensor sets.
In some examples, illuminator 702 generates a structured light pattern. In such an example, ranging system 700 may include a plurality of sensors 704 on which respective images of the structured light pattern are formed. Ranging system 700 may use the differences between the images of the structured light pattern to determine a distance to object 708 from which the structured light pattern is backscattered. Structured light based ranging systems can have a high level of accuracy (e.g., accuracy in the sub-millimeter range) when the object 708 is relatively close to the sensor 704 (e.g., 0.2 meters to 2 meters). Such a high level of precision may be useful in facial recognition applications, such as unlocking mobile devices (e.g., mobile phones, tablet computers, etc.) and for security applications.
In some examples, ranging system 700 is a time-of-flight (ToF) based system. In some examples where ranging system 700 is a ToF-based system, illuminator 702 generates pulses of light. In other words, the illuminator 702 can modulate the amplitude of the emitted light 706. In such an example, the sensor 704 detects return light 710 from the light pulse 706 generated by the illuminator 702. Ranging system 700 may then determine the distance to object 708 from which light 706 is backscattered based on the delay between the time light 706 is emitted and detected and the known speed of light in air. In some examples, illuminator 702 can modulate the phase of emitted light 706 instead of (or in addition to) modulating the amplitude of emitted light 706. In such an example, the sensor 704 may detect the phase of the return light 710 from the object 708 and determine a distance to a point on the object 708 using the speed of light and based on a time difference between a time the illuminator 702 generates the light 706 at a particular phase and a time the sensor 704 detects the return light 710 at the particular phase.
In other examples, the point cloud may be generated without using the illuminator 702. For example, in some examples, the sensor 704 of the ranging system 700 may include two or more optical cameras. In such examples, ranging system 700 may use the optical cameras to capture stereoscopic images of the environment including object 708. Ranging system 700 may include a point cloud generator 716 that may calculate the disparities between corresponding locations in the stereoscopic images. The ranging system 700 may then use the disparities to determine distances to the locations shown in the stereoscopic images. From these distances, point cloud generator 716 may generate a point cloud.
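A minimal sketch of the stereo case described above, assuming a calibrated and rectified camera pair: the distance to a location follows from the disparity (the horizontal offset of the same feature between the two images) via the standard pinhole relation Z = f·B/d. The function and parameter names are illustrative assumptions, not part of the disclosure.

```cpp
#include <iostream>

// Depth from stereo disparity for a rectified camera pair:
//   Z = focal_length_px * baseline_m / disparity_px
double depthFromDisparityMeters(double focalLengthPx, double baselineMeters, double disparityPx) {
    return (focalLengthPx * baselineMeters) / disparityPx;
}

int main() {
    // e.g. an 800-pixel focal length, 12 cm baseline, and 16-pixel disparity give 6 m.
    std::cout << depthFromDisparityMeters(800.0, 0.12, 16.0) << " m\n";
    return 0;
}
```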
The sensor 704 may also detect other properties of the object 708, such as color and reflectivity information. In the example of fig. 9, a point cloud generator 716 may generate a point cloud based on the signals 714 generated by the sensor 704. Ranging system 700 and/or point cloud generator 716 may form part of data source 104 (fig. 1). Thus, the point cloud generated by ranging system 700 may be encoded and/or decoded according to any of the techniques of this disclosure.
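As a loose illustration of how a point cloud generator such as point cloud generator 716 might turn ranging measurements into Cartesian points, the sketch below converts (range, azimuth, elevation) samples into (x, y, z) positions. The struct and function names are assumptions for this sketch; the disclosure does not prescribe this representation.

```cpp
#include <cmath>
#include <vector>

struct RangeSample { double rangeM, azimuthRad, elevationRad; };  // one ranging measurement
struct Point3 { double x, y, z; };                                // one point of the cloud

// Convert spherical ranging measurements into Cartesian point positions.
std::vector<Point3> toCartesian(const std::vector<RangeSample>& samples) {
    std::vector<Point3> cloud;
    cloud.reserve(samples.size());
    for (const auto& s : samples) {
        const double horizontal = s.rangeM * std::cos(s.elevationRad);
        cloud.push_back({horizontal * std::cos(s.azimuthRad),
                         horizontal * std::sin(s.azimuthRad),
                         s.rangeM * std::sin(s.elevationRad)});
    }
    return cloud;
}
```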
FIG. 10 is a conceptual diagram illustrating an exemplary vehicle-based scenario in which one or more techniques of the present disclosure may be used. In the example of fig. 10, a vehicle 800 includes a ranging system 802. Ranging system 802 may be implemented in the manner discussed with respect to fig. 9. Although not shown in the example of fig. 10, vehicle 800 may also include a data source, such as data source 104 (fig. 1), and a G-PCC encoder, such as G-PCC encoder 200 (fig. 1). In the example of fig. 10, ranging system 802 emits a laser beam 804 that is reflected from a pedestrian 806 or other object in the roadway. The data source of vehicle 800 may generate a point cloud based on the signals generated by ranging system 802. The G-PCC encoder of vehicle 800 may encode the point cloud to generate bit streams 808, such as a geometry bitstream (fig. 2) and an attribute bitstream (fig. 2).
An output interface of the vehicle 800, such as the output interface 108 (fig. 1), may send the bit stream 808 to one or more other devices. Bit stream 808 may include significantly fewer bits than the uncoded point cloud obtained by the G-PCC encoder. Thus, the vehicle 800 may be able to send the bit stream 808 to other devices faster than the uncoded point cloud data. In addition, the bit stream 808 may require less data storage capacity.
In the example of fig. 10, a vehicle 800 may send a bit stream 808 to another vehicle 810. Vehicle 810 may include a G-PCC decoder, such as G-PCC decoder 300 (fig. 1). The G-PCC decoder of vehicle 810 may decode bit stream 808 to reconstruct the point cloud. The vehicle 810 may use the reconstructed point cloud for various purposes. For example, the vehicle 810 may determine that the pedestrian 806 is in a road in front of the vehicle 800 based on the reconstructed point cloud, and thus begin decelerating, e.g., even before the driver of the vehicle 810 realizes that the pedestrian 806 is in the road. Thus, in some examples, vehicle 810 may perform autonomous navigational operations based on the reconstructed point cloud.
Additionally or alternatively, the vehicle 800 may send the bit stream 808 to a server system 812. The server system 812 may use the bit stream 808 for various purposes. For example, the server system 812 may store the bit stream 808 for subsequent reconstruction of the point cloud. In this example, server system 812 may use the point cloud along with other data (e.g., vehicle telemetry data generated by vehicle 800) to train the autonomous driving system. In other examples, the server system 812 may store the bit stream 808 for subsequent reconstruction for forensic incident investigation.
Fig. 11 is a conceptual diagram illustrating an example extended reality system in which one or more techniques of the present disclosure may be used. Extended reality (XR) is a term used to encompass a range of technologies including augmented reality (AR), mixed reality (MR), and virtual reality (VR). In the example of fig. 11, the user 900 is located at a first location 902. The user 900 wears an XR headset 904. As an alternative to XR headset 904, user 900 may use a mobile device (e.g., mobile phone, tablet, etc.). XR headset 904 includes a depth detection sensor, such as a ranging system, that detects the location of a point on object 906 at location 902. The data source of the XR headset 904 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 906 at the location 902. XR headset 904 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode a point cloud to generate bit stream 908.
The XR headset 904 may send a bit stream 908 to an XR headset 910 worn by a user 912 at a second location 914 (e.g., via a network such as the internet). XR headset 910 may decode bit stream 908 to reconstruct the point cloud. XR headset 910 may use the point cloud to generate an XR visualization (e.g., AR, MR, VR visualization) that represents object 906 at location 902. Thus, in some examples, such as when XR headset 910 generates a VR visualization, user 912 may have a 3D immersive experience of location 902. In some examples, XR headset 910 may determine the location of the virtual object based on the reconstructed point cloud. For example, XR headset 910 may determine that the environment (e.g., location 902) includes a flat surface based on the reconstructed point cloud, and then determine that a virtual object (e.g., cartoon character) is to be positioned on the flat surface. XR headset 910 may generate an XR visualization in which the virtual object is in the determined location. For example, XR headset 910 may display a cartoon character sitting on a flat surface.
Fig. 12 is a conceptual diagram illustrating an exemplary mobile device system in which one or more techniques of this disclosure may be used. In the example of fig. 12, a mobile device 1000 (e.g., a wireless communication device) such as a mobile phone or tablet includes a ranging system, such as a LIDAR system, that detects the locations of points on an object 1002 in the environment of the mobile device 1000. The data source of the mobile device 1000 may use the signals generated by the ranging system to generate a point cloud representation of the object 1002. The mobile device 1000 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode the point cloud to generate the bit stream 1004. In the example of fig. 12, mobile device 1000 can send the bit stream 1004 to a remote device 1006, such as a server system or other mobile device. Remote device 1006 may decode bit stream 1004 to reconstruct the point cloud. Remote device 1006 may use the point cloud for various purposes. For example, the remote device 1006 may use the point cloud to generate a map of the environment of the mobile device 1000. For example, the remote device 1006 may generate a map of the interior of a building based on the reconstructed point cloud. In another example, the remote device 1006 may generate an image (e.g., a computer graphic) based on the point cloud. For example, the remote device 1006 may use points of the point cloud as vertices of polygons and the color attributes of the points as a basis for coloring the polygons. In some examples, remote device 1006 may use the reconstructed point cloud for facial recognition or other security applications.
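The rendering use mentioned above (points as polygon vertices, point colors as vertex colors) could look roughly like the following sketch; the types and the normalization to [0, 1] color components are assumptions made for illustration rather than anything specified by the disclosure.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct CloudPoint { float x, y, z; std::uint8_t r, g, b; };  // decoded point with a color attribute
struct Vertex { std::array<float, 3> position; std::array<float, 3> color; };  // renderer-side vertex

// Build a vertex buffer in which each point supplies a vertex position and a vertex color.
std::vector<Vertex> toVertexBuffer(const std::vector<CloudPoint>& cloud) {
    std::vector<Vertex> vertices;
    vertices.reserve(cloud.size());
    for (const auto& p : cloud) {
        vertices.push_back({{p.x, p.y, p.z},
                            {p.r / 255.0f, p.g / 255.0f, p.b / 255.0f}});
    }
    return vertices;  // e.g. uploaded to a GPU and drawn as triangles
}
```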
The following numbered clauses may illustrate one or more aspects of the present disclosure:
Clause 1A. A method of processing a point cloud, the method comprising: points of the point cloud are selectively encoded and decoded using inter-frame prediction.
Clause 2A the method of clause 1A, wherein encoding and decoding the point using inter-frame prediction comprises: the value of the radius residual for the point is encoded using inter prediction.
Clause 3A the method of clause 2A, further comprising: a first syntax element indicating whether to encode the point using inter prediction is encoded via a bitstream.
Clause 4A the method of clause 3A, further comprising: encoding, via the bitstream, a second syntax element indicating whether inter prediction is enabled, wherein encoding the first syntax element includes encoding the first syntax element in response to the second syntax element indicating that inter prediction is enabled.
Clause 5A the method of clause 3A or clause 4A, wherein the first syntax element comprises a ptn_inter_flag syntax element.
Clause 6A the method of clause 4A or 5A, wherein the second syntax element comprises an inter-prediction-enabled flag syntax element.
Clause 7A the method of clause 4A, wherein the second syntax element indicates whether inter prediction is enabled for a particular frame including the point.
Clause 8A the method of any of clauses 1A-7A, further comprising: determining that an angular mode is enabled; and in response to determining that the angular mode is enabled, determining that inter prediction is enabled.
Clause 9A the method of any of clauses 1A-8A, wherein encoding the point using inter-prediction comprises encoding a first subset of the characteristics of the point using inter-prediction, the method further comprising: encoding a second subset of the characteristics of the point using intra prediction.
Clause 10A the method of any of clauses 1A-9A, wherein encoding the point using inter-prediction comprises determining a value of the point based on a reference frame.
Clause 11A the method of any of clauses 1A-10A, further comprising: one or more syntax elements for predicting the geometry parameters are encoded in the bitstream.
Clause 12A the method of clause 11A, further comprising: determining a context for context-adaptive coding of the one or more syntax elements for predicting the geometry parameters based on values of one or more other syntax elements coded in the bitstream.
Clause 13A the method of any of clauses 1A-12A, further comprising: one or more syntax elements having values representing global motion parameters are encoded in a bitstream.
Clause 14A the method of any of clauses 1A-13A, further comprising: encoding and decoding a first syntax element in the bitstream specifying whether inter prediction is enabled; and in response to the first syntax element specifying that inter prediction is enabled, encoding and decoding a second syntax element in the bitstream specifying a scaling of an azimuth angle.
Clause 15A the method of any of clauses 1A-13A, further comprising: encoding and decoding a first syntax element in the bitstream specifying whether inter prediction is enabled; and encoding and decoding a second syntax element in the bitstream specifying a scaling of an azimuth angle, regardless of whether the first syntax element specifies that inter prediction is enabled.
Clause 16A the method of clause 14A or clause 15A, wherein: the first syntax element specifying whether inter prediction is enabled comprises an inter_prediction_enabled_flag syntax element, and the second syntax element specifying the scaling of the azimuth angle comprises an inter_azim_scale_log2 syntax element.
Clause 17A an apparatus for processing a point cloud, the apparatus comprising one or more components for performing the method of any of clauses 1A-16A.
Clause 18A the device of clause 17A, wherein the one or more components comprise one or more processors implemented in a circuit.
Clause 19A the apparatus of any of clauses 17A or 18A, further comprising a memory for storing the data representing the point cloud.
Clause 20A the device of any of clauses 17A-19A, wherein the device comprises a decoder.
Clause 21A the apparatus of any of clauses 17A-20A, wherein the apparatus comprises an encoder.
Clause 22A the device of any of clauses 17A-21A, further comprising a device for generating the point cloud.
Clause 23A the device of any of clauses 17A-22A, further comprising a display for presenting an image based on the point cloud.
Clause 24A, a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to perform the method of any of clauses 1A-16A.
Clause 1B, a method of processing a point cloud, the method comprising: in response to determining to predict a current point in the point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting the current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
Clause 2B the method of clause 1B, wherein selecting the prediction mode comprises: encoding, via a bitstream, a first syntax element indicating whether to encode the current point using inter prediction; and responsive to the first syntax element indicating that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point.
Clause 3B the method of clause 2B, further comprising: encoding, via the bitstream, a second syntax element indicating whether inter prediction is enabled, wherein encoding the first syntax element includes encoding the first syntax element in response to the second syntax element indicating that inter prediction is enabled.
Clause 4B the method of clause 3B, wherein the second syntax element indicates whether inter prediction is enabled for a particular frame including the current point.
Clause 5B the method of clause 1B, wherein the current point is in a current frame, and wherein predicting the current point using inter-frame prediction comprises: determining a reference point in a reference frame different from the current frame; and predicting one or more parameters of the current point based on the reference point.
Clause 6B the method of clause 5B, wherein the reference frame is a motion compensated reference frame.
Clause 7B the method of clause 5B, wherein predicting the one or more parameters comprises: one or more of an azimuth, a laser identifier, and a radius of the current point are predicted.
Clause 8B the method of clause 5B, wherein determining the reference point comprises: determining a pivot point in the current frame that is located before the current point in encoding and decoding order; and determining the reference point based on one or more parameters of the pivot point.
Clause 9B the method of clause 8B, wherein determining the reference point based on the one or more parameters of the pivot point comprises: in the reference frame, determining a reference pivot point based on an azimuth of the pivot point; and determining the reference point based on the reference pivot point.
Clause 10B the method of clause 9B, wherein determining the reference pivot point further comprises determining the reference pivot point based on a laser identifier of the pivot point.
Clause 11B the method of clause 9B, wherein the reference pivot point is a virtual point in the reference frame.
Clause 12B the method of clause 9B, wherein determining the reference pivot point based on the azimuth of the pivot point comprises: the reference pivot point is determined based on the scaled azimuth of the pivot point.
Clause 13B the method of clause 12B, further comprising: encoding and decoding a syntax element specifying a base-2 logarithm value of a scaling factor; and determining the scaled azimuth of the pivot point based on the scaling factor.
Clause 14B the method of clause 9B, wherein determining the reference point based on the reference pivot point comprises: a point having an azimuth greater than an azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 15B the method of clause 14B, wherein identifying the point in the reference frame having an azimuth angle greater than an azimuth angle of the reference pivot point comprises: a point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 16B the method of clause 15B, wherein identifying the point having a scaled azimuth that is greater than the scaled azimuth of the reference pivot point comprises: a point having a minimum scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 17B the method of clause 15B, wherein identifying the point having a scaled azimuth that is greater than the scaled azimuth of the reference pivot point comprises: a point having a second-smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 18B the method of clause 9B, wherein determining the reference point comprises: determining a plurality of reference points in the reference frame and based on the reference pivot point, wherein predicting the one or more parameters of the current point based on the reference points includes predicting the one or more parameters of the current point based on the plurality of reference points.
Clause 19B the method of clause 5B, wherein the identified reference point comprises a zero motion candidate.
Clause 20B the method of clause 1B, further comprising: encoding, via a bitstream, a third syntax element indicating an inter-prediction mode for the current point, wherein predicting the current point of the point cloud using inter-prediction comprises predicting the current point using the inter-prediction mode.
Clause 21B the method of clause 1B, wherein selecting the prediction mode comprises: encoding, via a bitstream, an inter-prediction syntax element indicating whether to encode the current point using inter-prediction; and when the inter prediction syntax element indicates that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point, the method further comprising: a context for context-adaptive binary arithmetic coding (CABAC) of one or more syntax elements is selected based on a value of the inter-prediction syntax element.
Clause 22B the method of clause 21B, wherein the one or more syntax elements comprise one or more of: one or more syntax elements representing a phi multiplier; and one or more syntax elements representing the main residual data.
Clause 23B the method of clause 21B, further comprising: a context for CABAC encoding and decoding the inter-prediction syntax element is selected based on values of inter-prediction syntax elements of N previous points of the point cloud.
Clause 24B the method of clause 23B, wherein N is 5.
Clause 25B an apparatus for processing a point cloud, the apparatus comprising: a memory configured to store at least a portion of the point cloud; and one or more processors implemented in the circuit and configured to: in response to determining to predict a current point in the point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting the current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
Clause 26B the device of clause 25B, wherein, to select the prediction mode, the one or more processors are configured to: encoding, via a bitstream, a first syntax element indicating whether to encode the current point using inter prediction; and responsive to the first syntax element indicating that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point.
Clause 27B the device of clause 26B, wherein the one or more processors are further configured to: a second syntax element indicating whether inter prediction is enabled is encoded via the bitstream, wherein to encode the first syntax element, the one or more processors are configured to encode the first syntax element in response to the second syntax element indicating that inter prediction is enabled.
Clause 28B the apparatus of clause 27B, wherein the second syntax element indicates whether inter prediction is enabled for a particular frame that includes the current point.
Clause 29B, the device of clause 25B, wherein the current point is in a current frame, and wherein, to predict the current point using inter-prediction, the one or more processors are configured to: determining a reference point in a reference frame different from the current frame; and predicting one or more parameters of the current point based on the reference point.
Clause 30B the apparatus of clause 29B, wherein the reference frame is a motion compensated reference frame.
Clause 31B, the device of clause 29B, wherein, to predict the one or more parameters, the one or more processors are configured to: one or more of an azimuth, a laser identifier, and a radius of the current point are predicted.
Clause 32B the device of clause 29B, wherein, to determine the reference point, the one or more processors are configured to: determining a pivot point in the current frame that is located before the current point in encoding and decoding order; and determining the reference point based on one or more parameters of the pivot point.
Clause 33B the device of clause 32B, wherein to determine the reference point based on the one or more parameters of the pivot point, the one or more processors are configured to: in the reference frame, determining a reference pivot point based on an azimuth of the pivot point; and determining the reference point based on the reference pivot point.
Clause 34B the device of clause 33B, wherein to determine the reference pivot point, the one or more processors are configured to determine the reference pivot point based on a laser identifier of the pivot point.
Clause 35B the device of clause 33B, wherein the reference pivot point is a virtual point in the reference frame.
Clause 36B the device of clause 33B, wherein to determine the reference pivot point based on the azimuth of the pivot point, the one or more processors are configured to: the reference pivot point is determined based on the scaled azimuth of the pivot point.
Clause 37B the device of clause 36B, wherein the one or more processors are further configured to: encoding and decoding a syntax element specifying a base-2 logarithm value of a scaling factor; and determining the scaled azimuth of the pivot point based on the scaling factor.
Clause 38B the device of clause 33B, wherein to determine the reference point based on the reference pivot point, the one or more processors are configured to: a point having an azimuth greater than an azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 39B, the device of clause 38B, wherein to identify the point in the reference frame having an azimuth angle greater than an azimuth angle of the reference pivot point, the one or more processors are configured to: a point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 40B, the device of clause 39B, wherein to identify the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point, the one or more processors are configured to: a point having a minimum scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 41B, the device of clause 39B, wherein to identify the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point, the one or more processors are configured to: a point having a second-smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 42B, the device of clause 33B, wherein to determine the reference point, the one or more processors are configured to: in the reference frame and based on the reference pivot point, a plurality of reference points are determined, wherein to predict the one or more parameters of the current point based on the reference points, the one or more processors are configured to predict the one or more parameters of the current point based on the plurality of reference points.
Clause 43B the device of clause 29B, wherein the identified reference point comprises a zero motion candidate.
Clause 44B the device of clause 25B, wherein the one or more processors are further configured to: a third syntax element indicating an inter-prediction mode for the current point is encoded via a bitstream, wherein to predict the current point of the point cloud using inter-prediction, the one or more processors are configured to predict the current point using the inter-prediction mode.
Clause 45B, the device of clause 25B, wherein, to select the prediction mode, the one or more processors are configured to: encoding, via a bitstream, an inter-prediction syntax element indicating whether to encode the current point using inter-prediction; and when the inter prediction syntax element indicates that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point, wherein the one or more processors are further configured to: a context for context-adaptive binary arithmetic coding (CABAC) of one or more syntax elements is selected based on a value of the inter-prediction syntax element.
Clause 46B the device of clause 45B, wherein the one or more syntax elements comprise one or more of: one or more syntax elements representing a phi multiplier; and one or more syntax elements representing the main residual data.
Clause 47B the device of clause 45B, the one or more processors being further configured to: a context for CABAC encoding and decoding the inter-prediction syntax element is selected based on values of inter-prediction syntax elements of N previous points of the point cloud.
Clause 48B the device of clause 47B, wherein N is 5.
Clause 49B the device of clause 25B, further comprising: a rotating LIDAR sensor, wherein the one or more processors are configured to generate the point cloud based on data generated by the rotating LIDAR sensor.
Clause 50B the device of clause 25B, wherein the device is a vehicle comprising the rotating LIDAR sensor.
Clause 51B the device of clause 25B, wherein the device is a wireless communication device.
Clause 52B, a computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to: in response to determining to predict a current point in a point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting the current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
Clause 1C a method of processing a point cloud, the method comprising: in response to determining to predict a current point in the point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting the current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
Clause 2C the method of clause 1C, wherein selecting the prediction mode comprises: encoding, via a bitstream, a first syntax element indicating whether to encode the current point using inter prediction; and responsive to the first syntax element indicating that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point.
Clause 3C the method of clause 2C, further comprising: encoding, via the bitstream, a second syntax element indicating whether inter prediction is enabled, wherein encoding the first syntax element includes encoding the first syntax element in response to the second syntax element indicating that inter prediction is enabled.
Clause 4C the method of clause 3C, wherein the second syntax element indicates whether inter prediction is enabled for a particular frame including the current point.
Clause 5C the method of any of clauses 1C-4C, wherein the current point is in a current frame, and wherein predicting the current point using inter-prediction comprises: determining a reference point in a reference frame different from the current frame; and predicting one or more parameters of the current point based on the reference point.
Clause 6C the method of clause 5C, wherein the reference frame is a motion compensated reference frame.
Clause 7C the method of clause 5C or 6C, wherein predicting the one or more parameters comprises: one or more of an azimuth, a laser identifier, and a radius of the current point are predicted.
Clause 8C the method of any of clauses 5C-7C, wherein determining the reference point comprises: determining a pivot point in the current frame that is located before the current point in encoding and decoding order; and determining the reference point based on one or more parameters of the pivot point.
Clause 9C the method of clause 8C, wherein determining the reference point based on the one or more parameters of the pivot point comprises: in the reference frame, determining a reference pivot point based on an azimuth of the pivot point; and determining the reference point based on the reference pivot point.
Clause 10C the method of clause 9C, wherein determining the reference pivot point further comprises determining the reference pivot point based on a laser identifier of the pivot point.
Clause 11C the method of clause 9C or 10C, wherein the reference pivot point is a virtual point in the reference frame.
Clause 12C the method of clause 9C, wherein determining the reference pivot point based on the azimuth of the pivot point comprises: the reference pivot point is determined based on the scaled azimuth of the pivot point.
Clause 13C the method of clause 12C, further comprising: encoding and decoding a syntax element specifying a base-2 logarithm value of a scaling factor; and determining a scaled azimuth of the pivot point based on the scaling factor.
Clause 14C the method of any of clauses 9C-13C, wherein determining the reference point based on the reference pivot point comprises: a point having an azimuth greater than an azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 15C the method of clause 14C, wherein identifying the point in the reference frame having an azimuth angle greater than an azimuth angle of the reference pivot point comprises: a point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 16C, the method of clause 15C, wherein identifying the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point comprises: a point having a minimum scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 17C, the method of clause 15C, wherein identifying the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point comprises: a point having a second-smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 18C the method of any of clauses 9C-17C, wherein determining the reference point comprises: determining a plurality of reference points in the reference frame and based on the reference pivot point, wherein predicting the one or more parameters of the current point based on the reference points includes predicting the one or more parameters of the current point based on the plurality of reference points.
Clause 19C the method of any of clauses 5C-18C, wherein the identified reference point comprises a zero motion candidate.
Clause 20C the method of any of clauses 1C-19C, further comprising: encoding, via a bitstream, a third syntax element indicating an inter-prediction mode for the current point, wherein predicting the current point of the point cloud using inter-prediction comprises predicting the current point using the inter-prediction mode.
Clause 21C the method of any of clauses 1C-20C, wherein selecting the prediction mode comprises: encoding, via a bitstream, an inter-prediction syntax element indicating whether to encode the current point using inter-prediction; and when the inter prediction syntax element indicates that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point, the method further comprising: a context for context-adaptive binary arithmetic coding (CABAC) of one or more syntax elements is selected based on a value of the inter-prediction syntax element.
Clause 22C the method of clause 21C, wherein the one or more syntax elements comprise one or more of: one or more syntax elements representing a phi multiplier; and one or more syntax elements representing the main residual data.
Clause 23C the method of clause 21C or 22C, further comprising: a context for CABAC encoding and decoding the inter-prediction syntax element is selected based on values of inter-prediction syntax elements of N previous points of the point cloud.
Clause 24C the method of clause 23C, wherein N is 5.
Clause 25C an apparatus for processing a point cloud, the apparatus comprising: a memory configured to store at least a portion of the point cloud; and one or more processors implemented in the circuit and configured to: in response to determining to predict a current point in the point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting the current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
Clause 26C the device of clause 25C, wherein, to select the prediction mode, the one or more processors are configured to: encoding, via a bitstream, a first syntax element indicating whether to encode the current point using inter prediction; and responsive to the first syntax element indicating that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point.
Clause 27C the device of clause 26C, wherein the one or more processors are further configured to: a second syntax element indicating whether inter prediction is enabled is encoded via the bitstream, wherein to encode the first syntax element, the one or more processors are configured to encode the first syntax element in response to the second syntax element indicating that inter prediction is enabled.
Clause 28C the apparatus of clause 27C, wherein the second syntax element indicates whether inter prediction is enabled for a particular frame including the current point.
Clause 29C the apparatus of any of clauses 25C-28C, wherein the current point is in a current frame, and wherein to predict the current point using inter-frame prediction, the one or more processors are configured to: determining a reference point in a reference frame different from the current frame; and predicting one or more parameters of the current point based on the reference point.
Clause 30C the device of clause 29C, wherein the reference frame is a motion compensated reference frame.
Clause 31C the apparatus of clause 29C or 30C, wherein, to predict the one or more parameters, the one or more processors are configured to: one or more of an azimuth, a laser identifier, and a radius of the current point are predicted.
Clause 32C the apparatus of any of clauses 29C-31C, wherein to determine the reference point, the one or more processors are configured to: determining a pivot point in the current frame that is located before the current point in encoding and decoding order; and determining the reference point based on one or more parameters of the pivot point.
Clause 33C the apparatus of clause 32C, wherein to determine the reference point based on the one or more parameters of the pivot point, the one or more processors are configured to: in the reference frame, determining a reference pivot point based on an azimuth of the pivot point; and determining the reference point based on the reference pivot point.
Clause 34C the device of clause 33C, wherein to determine the reference pivot point, the one or more processors are configured to determine the reference pivot point based on a laser identifier of the pivot point.
Clause 35C the device of clause 33C or 34C, wherein the reference pivot point is a virtual point in the reference frame.
Clause 36C the apparatus of any of clauses 33C-35C, wherein to determine the reference pivot point based on the azimuth of the pivot point, the one or more processors are configured to: the reference pivot point is determined based on the scaled azimuth of the pivot point.
Clause 37C the apparatus of clause 36C, wherein the one or more processors are further configured to: encoding and decoding a syntax element specifying a base-2 logarithm value of a scaling factor; and determining a scaled azimuth of the pivot point based on the scaling factor.
Clause 38C the apparatus of any of clauses 33C-37C, wherein to determine the reference point based on the reference pivot point, the one or more processors are configured to: a point having an azimuth greater than an azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 39C, the device of clause 38C, wherein to identify the point in the reference frame having an azimuth angle greater than an azimuth angle of the reference pivot point, the one or more processors are configured to: a point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point is identified as the reference point in the reference frame.
Clause 40C, the device of clause 39C, wherein to identify the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point, the one or more processors are configured to: a point having a minimum scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 41C, the device of clause 39C, wherein to identify the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point, the one or more processors are configured to: a point having a second-smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
Clause 42C the apparatus of any of clauses 33C-41C, wherein, to determine the reference point, the one or more processors are configured to: in the reference frame and based on the reference pivot point, a plurality of reference points are determined, wherein to predict the one or more parameters of the current point based on the reference points, the one or more processors are configured to predict the one or more parameters of the current point based on the plurality of reference points.
Clause 43C the apparatus of any of clauses 29C-42C, wherein the identified reference point comprises a zero motion candidate.
Clause 44C the apparatus of any of clauses 25C-43C, wherein the one or more processors are further configured to: a third syntax element indicating an inter-prediction mode for the current point is encoded via a bitstream, wherein to predict the current point of the point cloud using inter-prediction, the one or more processors are configured to predict the current point using the inter-prediction mode.
Clause 45C the apparatus of any of clauses 25C-44C, wherein, to select the prediction mode, the one or more processors are configured to: encoding, via a bitstream, an inter-prediction syntax element indicating whether to encode the current point using inter-prediction; and when the inter prediction syntax element indicates that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point, wherein the one or more processors are further configured to: a context for context-adaptive binary arithmetic coding (CABAC) of one or more syntax elements is selected based on a value of the inter-prediction syntax element.
Clause 46C the device of clause 45C, wherein the one or more syntax elements comprise one or more of: one or more syntax elements representing a phi multiplier; and one or more syntax elements representing the main residual data.
Clause 47C the apparatus of clause 45C or 46C, the one or more processors being further configured to: a context for CABAC encoding and decoding the inter-prediction syntax element is selected based on values of inter-prediction syntax elements of N previous points of the point cloud.
Clause 48C the device of clause 47C, wherein N is 5.
Clause 49C the apparatus of any of clauses 25C-48C, further comprising: a rotating LIDAR sensor, wherein the one or more processors are configured to generate the point cloud based on data generated by the rotating LIDAR sensor.
Clause 50C the apparatus of any one of clauses 25C-49C, wherein the apparatus is a vehicle comprising the rotating LIDAR sensor.
Clause 51C the apparatus of any of clauses 25C-49C, wherein the apparatus is a wireless communication apparatus.
Clause 52C, a computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to: in response to determining to predict a current point in a point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and predicting the current point of the point cloud using inter prediction in response to selecting the inter prediction mode for the current point.
It should be appreciated that, according to an example, certain acts or events of any of the techniques described herein can be performed in a different order, may be added, combined, or omitted entirely (e.g., not all of the described acts or events are necessary for the practice of the technique). Further, in some examples, an action or event may be performed concurrently, e.g., by multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.
In one or more examples, the described functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. A computer-readable medium may include a computer-readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium (which is non-transitory) or (2) a communication medium (such as a signal or carrier wave). Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Further, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave can be included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but rather refer to non-transitory, tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the terms "processor" and "processing circuitry" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Furthermore, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Furthermore, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques but do not necessarily require realization by different hardware units. Rather, as noted above, the various units may be incorporated in a codec hardware unit or provided by a collection of interoperable hardware units, including one or more processors as noted above, in combination with appropriate software and/or firmware.
Various examples have been described. These examples, as well as other examples, are within the scope of the following claims.
Claims (52)
1. A method of processing a point cloud, the method comprising:
in response to determining to predict a current point in the point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and
in response to selecting the inter prediction mode for the current point, inter prediction is used to predict the current point of the point cloud.
2. The method of claim 1, wherein selecting the prediction mode comprises:
encoding, via a bitstream, a first syntax element indicating whether to encode the current point using inter prediction; and
the inter prediction mode is selected as the prediction mode for the current point in response to the first syntax element indicating that the current point is encoded using inter prediction.
3. The method of claim 2, further comprising:
a second syntax element indicating whether inter prediction is enabled is encoded via the bitstream,
Wherein encoding the first syntax element includes encoding the first syntax element in response to the second syntax element indicating that inter-prediction is enabled.
4. The method of claim 3, wherein the second syntax element indicates whether inter-prediction is enabled for a particular frame that includes the current point.
5. The method of claim 1, wherein the current point is in a current frame, and wherein predicting the current point using inter-prediction comprises:
determining a reference point in a reference frame different from the current frame; and
one or more parameters of the current point are predicted based on the reference point.
6. The method of claim 5, wherein the reference frame is a motion compensated reference frame.
7. The method of claim 5, wherein predicting the one or more parameters comprises:
one or more of an azimuth, a laser identifier, and a radius of the current point are predicted.
8. The method of claim 5, wherein determining the reference point comprises:
determining a pivot point in the current frame that is located before the current point in encoding and decoding order; and
the reference point is determined based on one or more parameters of the pivot point.
9. The method of claim 8, wherein determining the reference point based on the one or more parameters of the pivot point comprises:
in the reference frame, determining a reference pivot point based on an azimuth of the pivot point; and
the reference point is determined based on the reference pivot point.
10. The method of claim 9, wherein determining the reference pivot point further comprises determining the reference pivot point based on a laser identifier of the pivot point.
11. The method of claim 9, wherein the reference pivot point is a virtual point in the reference frame.
12. The method of claim 9, wherein determining the reference pivot point based on an azimuth of the pivot point comprises:
the reference pivot point is determined based on the scaled azimuth of the pivot point.
13. The method of claim 12, further comprising:
encoding and decoding a syntax element specifying a base-2 logarithm value of a scaling factor; and
a scaled azimuth of the pivot point is determined based on the scaling factor.
14. The method of claim 9, wherein determining the reference point based on the reference pivot point comprises:
A point having an azimuth greater than an azimuth of the reference pivot point is identified as the reference point in the reference frame.
15. The method of claim 14, wherein identifying the point in the reference frame having an azimuth that is greater than an azimuth of the reference pivot point comprises:
a point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point is identified as the reference point in the reference frame.
16. The method of claim 15, wherein identifying the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point comprises:
a point having a minimum scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
17. The method of claim 15, wherein identifying the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point comprises:
a point having a second-smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point is identified as the point.
18. The method of claim 9, wherein determining the reference point comprises:
a plurality of reference points are determined in the reference frame and based on the reference pivot point,
Wherein predicting the one or more parameters of the current point based on the reference points comprises predicting the one or more parameters of the current point based on the plurality of reference points.
19. The method of claim 5, wherein the identified reference point comprises a zero motion candidate.
20. The method of claim 1, further comprising:
a third syntax element indicating an inter prediction mode for the current point is encoded via a bitstream,
wherein predicting the current point of the point cloud using inter prediction includes predicting the current point using the inter prediction mode.
21. The method of claim 1, wherein selecting the prediction mode comprises:
encoding, via a bitstream, an inter-prediction syntax element indicating whether to encode the current point using inter-prediction; and
when the inter prediction syntax element indicates that the current point is encoded using inter prediction, selecting the inter prediction mode as the prediction mode for the current point, the method further comprising:
a context for context-adaptive binary arithmetic coding (CABAC) of one or more syntax elements is selected based on a value of the inter-prediction syntax element.
22. The method of claim 21, wherein the one or more syntax elements comprise one or more of:
one or more syntax elements representing a phi multiplier; and
one or more syntax elements representing main residual data.
23. The method of claim 21, further comprising:
a context for CABAC encoding and decoding the inter-prediction syntax element is selected based on values of inter-prediction syntax elements of N previous points of the point cloud.
24. The method of claim 23, wherein N is 5.
25. An apparatus for processing a point cloud, the apparatus comprising:
a memory configured to store at least a portion of the point cloud; and
one or more processors implemented in circuitry and configured to:
in response to determining to predict a current point in the point cloud using prediction geometry codec, selecting a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and
in response to selecting the inter prediction mode for the current point, inter prediction is used to predict the current point of the point cloud.
26. The device of claim 25, wherein to select the prediction mode, the one or more processors are configured to:
encoding, via a bitstream, a first syntax element indicating whether to encode the current point using inter prediction; and
the inter prediction mode is selected as the prediction mode for the current point in response to the first syntax element indicating that the current point is encoded using inter prediction.
27. The device of claim 26, wherein the one or more processors are further configured to:
a second syntax element indicating whether inter prediction is enabled is encoded via the bitstream,
wherein to codec the first syntax element, the one or more processors are configured to codec the first syntax element in response to the second syntax element indicating that inter prediction is enabled.
28. The apparatus of claim 27, wherein the second syntax element indicates whether inter-prediction is enabled for a particular frame that includes the current point.
29. The device of claim 25, wherein the current point is in a current frame, and wherein to predict the current point using inter-prediction, the one or more processors are configured to:
Determining a reference point in a reference frame different from the current frame; and
one or more parameters of the current point are predicted based on the reference point.
30. The apparatus of claim 29, wherein the reference frame is a motion compensated reference frame.
31. The device of claim 29, wherein to predict the one or more parameters, the one or more processors are configured to:
one or more of an azimuth, a laser identifier, and a radius of the current point are predicted.
32. The device of claim 29, wherein to determine the reference point, the one or more processors are configured to:
determining a pivot point in the current frame that is located before the current point in encoding and decoding order; and
the reference point is determined based on one or more parameters of the pivot point.
33. The device of claim 32, wherein to determine the reference point based on the one or more parameters of the pivot point, the one or more processors are configured to:
in the reference frame, determining a reference pivot point based on an azimuth of the pivot point; and
the reference point is determined based on the reference pivot point.
34. The device of claim 33, wherein to determine the reference pivot point, the one or more processors are configured to determine the reference pivot point based on a laser identifier of the pivot point.
35. The apparatus of claim 33, wherein the reference pivot point is a virtual point in the reference frame.
36. The device of claim 33, wherein to determine the reference pivot point based on an azimuth of the pivot point, the one or more processors are configured to:
determine the reference pivot point based on a scaled azimuth of the pivot point.
37. The device of claim 36, wherein the one or more processors are further configured to:
code a syntax element specifying a base-2 logarithm value of a scaling factor; and
determine the scaled azimuth of the pivot point based on the scaling factor.
38. The device of claim 33, wherein to determine the reference point based on the reference pivot point, the one or more processors are configured to:
identify, in the reference frame, a point having an azimuth greater than an azimuth of the reference pivot point as the reference point.
39. The device of claim 38, wherein to identify the point in the reference frame having an azimuth greater than an azimuth of the reference pivot point, the one or more processors are configured to:
identify, in the reference frame, a point having a scaled azimuth greater than a scaled azimuth of the reference pivot point as the reference point.
40. The device of claim 39, wherein to identify the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point, the one or more processors are configured to:
identify, as the point, a point having a smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point.
41. The device of claim 39, wherein to identify the point having a scaled azimuth that is greater than a scaled azimuth of the reference pivot point, the one or more processors are configured to:
identify, as the point, a point having a second-smallest scaled azimuth that is greater than the scaled azimuth of the reference pivot point.
42. The device of claim 33, wherein to determine the reference point, the one or more processors are configured to:
determine a plurality of reference points in the reference frame based on the reference pivot point,
wherein to predict the one or more parameters of the current point based on the reference point, the one or more processors are configured to predict the one or more parameters of the current point based on the plurality of reference points.
43. The device of claim 29, wherein the reference point comprises a zero motion candidate.
44. The device of claim 25, wherein the one or more processors are further configured to:
code, via a bitstream, a third syntax element indicating an inter-prediction mode for the current point,
wherein to predict the current point of the point cloud using inter prediction, the one or more processors are configured to predict the current point using the inter-prediction mode.
45. The device of claim 25, wherein to select the prediction mode, the one or more processors are configured to:
code, via a bitstream, an inter-prediction syntax element indicating whether to code the current point using inter prediction; and
select the inter-prediction mode as the prediction mode for the current point when the inter-prediction syntax element indicates that the current point is coded using inter prediction, wherein the one or more processors are further configured to:
select a context for context-adaptive binary arithmetic coding (CABAC) of one or more syntax elements based on a value of the inter-prediction syntax element.
46. The device of claim 45, wherein the one or more syntax elements comprise one or more of:
one or more syntax elements representing a phi multiplier; and
one or more syntax elements representing primary residual data.
47. The device of claim 45, wherein the one or more processors are further configured to:
select a context for CABAC coding of the inter-prediction syntax element based on values of inter-prediction syntax elements of N previous points of the point cloud.
48. The device of claim 47, wherein N is 5.
49. The device of claim 25, further comprising:
a rotating LIDAR sensor, wherein the one or more processors are configured to generate the point cloud based on data generated by the rotating LIDAR sensor.
50. The device of claim 25, wherein the device is a vehicle comprising the rotating LIDAR sensor.
51. The device of claim 25, wherein the device is a wireless communication device.
52. A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
in response to determining to predict a current point in a point cloud using predictive geometry coding, select a prediction mode for the current point from a set of prediction modes, wherein the set of prediction modes includes at least an intra-prediction mode and an inter-prediction mode; and
in response to selecting the inter-prediction mode for the current point, predict the current point of the point cloud using inter prediction.
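The following sketches are illustrative only and are not part of the claims. This first Python sketch summarizes, from a decoder's perspective, the signalling structure of claims 26-28 and 44: a frame-level syntax element gates a per-point inter-prediction flag, which in turn gates a syntax element selecting the specific inter-prediction mode. The `reader` object and the method names on it are hypothetical placeholders, not actual G-PCC syntax or API.

```python
def decode_point_prediction_mode(reader, frame_inter_enabled: bool):
    """Hypothetical decoder-side flow for claims 26-28 and 44.

    `reader` is assumed to be an entropy-decoder wrapper; its methods
    (decode_flag, decode_inter_pred_mode, decode_intra_pred_mode) are
    placeholders for the corresponding syntax-element parsers.
    """
    if not frame_inter_enabled:
        # The second syntax element (claims 27-28) indicated that inter prediction
        # is disabled for this frame: the first syntax element is not coded at all,
        # and the point falls back to an intra-prediction mode.
        return ("intra", reader.decode_intra_pred_mode())
    use_inter = reader.decode_flag()  # first syntax element (claim 26)
    if use_inter:
        # Third syntax element (claim 44): which inter-prediction mode to use.
        return ("inter", reader.decode_inter_pred_mode())
    return ("intra", reader.decode_intra_pred_mode())
```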
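This second sketch illustrates one way the reference-point search of claims 29-42 could work: the decoder takes the pivot point that precedes the current point in coding order, treats its scaled azimuth and laser identifier as the virtual reference pivot point of claims 33-35 in a motion-compensated reference frame, and selects as inter predictors the points whose scaled azimuth is the smallest (and, optionally, second-smallest) value greater than the pivot's. The data structures, the use of a right shift for azimuth scaling, and the candidate count are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

class Point(NamedTuple):
    radius: int     # quantized radius
    azimuth: int    # quantized azimuth
    laser_id: int   # laser identifier

def scaled_azimuth(azimuth: int, log2_scale: int) -> int:
    # Claims 36-37: a scaling factor is signalled as a base-2 logarithm value.
    # It is assumed here to be applied as a right shift that coarsens the azimuth.
    return azimuth >> log2_scale

def find_reference_points(pivot: Point,
                          reference_frame: List[Point],
                          log2_scale: int,
                          num_candidates: int = 2) -> List[Point]:
    """Claims 32-42: find candidate reference points for the pivot point.

    The pivot's scaled azimuth and laser identifier stand in for the virtual
    reference pivot point of claims 33-35.
    """
    pivot_sa = scaled_azimuth(pivot.azimuth, log2_scale)
    # Restrict to points from the same laser (claim 34), ordered by scaled azimuth.
    same_laser = sorted(
        (p for p in reference_frame if p.laser_id == pivot.laser_id),
        key=lambda p: scaled_azimuth(p.azimuth, log2_scale),
    )
    # Claims 38-41: keep points whose scaled azimuth exceeds the pivot's; the first
    # candidate has the smallest such azimuth, the next the second-smallest, etc.
    candidates = [p for p in same_laser
                  if scaled_azimuth(p.azimuth, log2_scale) > pivot_sa]
    return candidates[:num_candidates]

def predict_parameters(reference: Point) -> Tuple[int, int, int]:
    # Claim 31: the predictor for the current point is formed from the reference
    # point's radius, azimuth, and laser identifier.
    return reference.radius, reference.azimuth, reference.laser_id
```

The zero motion candidate of claim 43 would correspond to using the reference point without applying any motion compensation to the reference frame; that variation is not shown here.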
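Finally, claims 23-24 and 45-48 derive the CABAC context for the inter-prediction flag from the flags of the N most recently coded points (N = 5). The sketch below shows one plausible realization in which the context index is the count of inter-coded points in that window; the specific mapping from history to context index is an assumption, and only the dependence on the last N flags comes from the claims.

```python
from collections import deque

class InterFlagContextModel:
    """Hypothetical context selection for the per-point inter-prediction flag
    (claims 23-24 and 45-48)."""

    def __init__(self, history_length: int = 5):
        # Flags of the N most recently coded points (claims 24 and 48: N = 5).
        self.history = deque(maxlen=history_length)

    def context_index(self) -> int:
        # One plausible mapping: the number of inter-coded points among the last
        # N selects one of N + 1 contexts.
        return sum(self.history)

    def update(self, used_inter: bool) -> None:
        self.history.append(1 if used_inter else 0)
```

Per claims 45-46, the value of the decoded inter-prediction flag itself could then also select the contexts used for the phi-multiplier and residual syntax elements of the same point.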
Applications Claiming Priority (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63/131,716 | 2020-12-29 | ||
US63/134,492 | 2021-01-06 | ||
US63/170,907 | 2021-04-05 | ||
US63/177,186 | 2021-04-20 | ||
US63/179,892 | 2021-04-26 | ||
US63/218,170 | 2021-07-02 | ||
US17/646,217 US20220207780A1 (en) | 2020-12-29 | 2021-12-28 | Inter prediction coding for geometry point cloud compression |
US17/646,217 | 2021-12-28 | ||
PCT/US2021/065483 WO2022147100A1 (en) | 2020-12-29 | 2021-12-29 | Inter prediction coding for geometry point cloud compression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116711313A true CN116711313A (en) | 2023-09-05 |
Family
ID=87836123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202180087697.9A Pending CN116711313A (en) | 2020-12-29 | 2021-12-29 | Inter-prediction codec for geometric point cloud compression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116711313A (en) |
- 2021-12-29: CN application CN202180087697.9A, publication CN116711313A (en), status: active (Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220207780A1 (en) | Inter prediction coding for geometry point cloud compression | |
CA3198854A1 (en) | Inter prediction coding for geometry point cloud compression | |
US20230105931A1 (en) | Inter prediction coding with radius interpolation for predictive geometry-based point cloud compression | |
EP4272166A1 (en) | Hybrid-tree coding for inter and intra prediction for geometry coding | |
US20220215596A1 (en) | Model-based prediction for geometry point cloud compression | |
US20230099908A1 (en) | Coding point cloud data using direct mode for inter-prediction in g-pcc | |
WO2023059987A1 (en) | Inter prediction coding with radius interpolation for predictive geometry-based point cloud compression | |
WO2022147008A1 (en) | Model-based prediction for geometry point cloud compression | |
KR20230125786A (en) | Global motion estimation using road and ground object labels for geometry-based point cloud compression | |
CN116711313A (en) | Inter-prediction codec for geometric point cloud compression | |
US20230345045A1 (en) | Inter prediction coding for geometry point cloud compression | |
US20230230290A1 (en) | Prediction for geometry point cloud compression | |
US11949909B2 (en) | Global motion estimation using road and ground object labels for geometry-based point cloud compression | |
US20240185470A1 (en) | Decoding attribute values in geometry-based point cloud compression | |
US20230345044A1 (en) | Residual prediction for geometry point cloud compression | |
US20240348769A1 (en) | Inter prediction candidate selection in point cloud compression | |
US20230177739A1 (en) | Local adaptive inter prediction for g-pcc | |
US20240037804A1 (en) | Using vertical prediction for geometry point cloud compression | |
US20240233199A1 (en) | Inter prediction for predictive geometry coding | |
TW202408244A (en) | Inter prediction coding for geometry point cloud compression | |
WO2023205318A1 (en) | Improved residual prediction for geometry point cloud compression | |
WO2024220278A1 (en) | Inter prediction candidate selection in point cloud compression | |
CN116648914A (en) | Global motion estimation using road and ground object markers for geometry-based point cloud compression | |
CN118525300A (en) | Prediction for geometric point cloud compression | |
CN116636204A (en) | Mixed tree coding for inter and intra prediction for geometric coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40092081; Country of ref document: HK |