WO2019068259A1 - Point cloud coding - Google Patents

Point cloud coding

Info

Publication number
WO2019068259A1
Authority
WO
WIPO (PCT)
Prior art keywords
tree
point cloud
initial
depth
final
Prior art date
Application number
PCT/CN2018/109296
Other languages
French (fr)
Inventor
Zhu Li
Shan Liu
Jose Alvarez
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Publication of WO2019068259A1 publication Critical patent/WO2019068259A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/40 Tree coding, e.g. quadtree, octree

Definitions

  • the disclosed embodiments relate to video coding in general and point cloud coding in particular.
  • a method comprises: obtaining a point cloud of an object, wherein the point cloud comprises points; generating an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced; generating a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and encoding an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
  • the method further comprises further encoding a bounding box describing diagonal corner points of the object.
  • the method further comprises further encoding the initial depth.
  • the method further comprises further encoding the final k-d tree.
  • the method further comprises transmitting the encoded point cloud.
  • the method further comprises determining the initial depth based on a total number of the points.
  • the method further comprises further determining the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
  • the constraint is an amount of memory or a processing power.
  • the method further comprises generating additional nodes beyond the initial depth.
  • the method further comprises further generating the additional nodes based on an MST operation.
  • the method further comprises further generating the additional nodes based on an average residual from the MST operation.
  • the method further comprises further generating the final k-d tree using the initial k-d tree and the additional nodes.
  • the point cloud, the initial k-d tree, and the final k-d tree are 3D.
  • an apparatus comprises a memory; and a processor coupled to the memory and configured to: obtain a point cloud of an object, wherein the point cloud comprises points, generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced, generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth, and encode an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
  • the processor is further configured to further encode a bounding box describing diagonal corner points of the object.
  • the processor is further configured to further encode the initial depth.
  • the processor is further configured to further encode the final k-d tree.
  • the apparatus further comprises a transmitter coupled to the processor and configured to transmit the encoded point cloud.
  • the processor is further configured to determine the initial depth based on a total number of the points.
  • the processor is further configured to further determine the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
  • the constraint is an amount of memory or a processing power.
  • the processor is further configured to generate additional nodes beyond the initial depth.
  • the processor is further configured to further generate the additional nodes based on an MST operation.
  • the processor is further configured to further generate the additional nodes based on an average residual from the MST operation.
  • the processor is further configured to further generate the final k-d tree using the initial k-d tree and the additional nodes.
  • the point cloud, the initial k-d tree, and the final k-d tree are 3D.
  • a computer program product comprises computer executable instructions stored on a non-transitory medium that when executed by a processor cause an apparatus to obtain a point cloud of an object, wherein the point cloud comprises points; generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced; generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and encode an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
  • the instructions further cause the apparatus to further encode a bounding box describing diagonal corner points of the object.
  • the instructions further cause the apparatus to further encode the initial depth.
  • the instructions further cause the apparatus to further encode the final k-d tree.
  • the instructions further cause the apparatus to transmit the encoded point cloud.
  • the instructions further cause the apparatus to determine the initial depth based on a total number of the points.
  • the instructions further cause the apparatus to further determine the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
  • the constraint is an amount of memory or a processing power.
  • the instructions further cause the apparatus to generate additional nodes beyond the initial depth.
  • the instructions further cause the apparatus to further generate the additional nodes based on an MST operation.
  • the instructions further cause the apparatus to further generate the additional nodes based on an average residual from the MST operation.
  • the instructions further cause the apparatus to further generate the final k-d tree using the initial k-d tree and the additional nodes.
  • the point cloud, the initial k-d tree, and the final k-d tree are 3D.
  • a method comprises receiving an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth; extracting the characteristics from the encoded point cloud; and generating a point cloud based on the characteristics.
  • the point cloud, the initial k-d tree, and the final k-d tree are 3D.
  • an apparatus comprises a memory; and a processor coupled to the memory and configured to: receive an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth, extract the characteristics from the encoded point cloud, and generate a point cloud based on the characteristics.
  • the point cloud, the initial k-d tree, and the final k-d tree are 3D.
  • a computer program product comprises computer executable instructions stored on a non-transitory medium that when executed by a processor cause an apparatus to: receive an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth; extract the characteristics from the encoded point cloud; and generate a point cloud based on the characteristics.
  • the point cloud, the initial k-d tree, and the final k-d tree are 3D.
  • the preceding embodiments provide for generating an initial k-d tree that is balanced but lossy, then generating nodes beyond the initial k-d tree to create a final k-d tree that may be unbalanced but lossless.
  • the embodiments provide geometric and scalable coding. By increasing some processing at the encoding stage, the embodiments provide for more efficient coding and thus more efficient communication.
  • FIG. 1 is a schematic diagram of a coding system.
  • FIG. 2 is a flowchart illustrating a method of point cloud encoding and communication according to an embodiment of the disclosure.
  • FIGS. 3A-3D are diagrams demonstrating building of a k-d tree of dimension two.
  • FIG. 4 is a flowchart illustrating a method of point cloud communication and decoding according to an embodiment of the disclosure.
  • FIG. 5 is a schematic diagram of an apparatus according to an embodiment of the disclosure.
  • ASIC application-specific integrated circuit
  • CPU central processing unit
  • DSP digital signal processor
  • FPGA field-programmable gate array
  • LCD liquid crystal display
  • RAM random-access memory
  • ROM read-only memory
  • TCAM ternary content-addressable memory
  • 3D three-dimensional.
  • FIG. 1 is a schematic diagram of a coding system 100.
  • the coding system 100 comprises a source device 110, a medium 150, and a destination device 160.
  • the source device 110 and the destination device 160 are mobile phones, tablet computers, desktop computers, notebook computers, or other suitable devices.
  • the medium 150 is a local network, a radio network, the Internet, or another suitable medium.
  • the source device 110 comprises a video generator 120, an encoder 130, and an output interface 140.
  • the video generator 120 is a camera or another device suitable for generating video. Videos include any visual representations of volumetric spaces or other multidimensional data.
  • the encoder 130 may be referred to as a codec.
  • the encoder 130 performs encoding according to a set of rules, for instance as described in "High Efficiency Video Coding," ITU-T H.265, December 2016 ("H.265").
  • the output interface 140 is an antenna or another component suitable for transmitting data to the destination device 160.
  • the video generator 120, the encoder 130, and the output interface 140 are in any suitable combination of devices.
  • the destination device 160 comprises an input interface 170, a decoder 180, and a display 190.
  • the input interface 170 is an antenna or another component suitable for receiving data from the source device 110.
  • the decoder 180 may also be referred to as a codec.
  • the decoder 180 performs decoding according to a set of rules, for instance as described in H.265.
  • the display 190 is an LCD screen or another component suitable for displaying videos.
  • the input interface 170, the decoder 180, and the display 190 are in any suitable combination of devices.
  • the video generator 120 captures a video
  • the encoder 130 encodes the video to create an encoded video
  • the output interface 140 transmits the encoded video over the medium 150 and towards the destination device 160.
  • the source device 110 may locally store the video or the encoded video, or the source device 110 may instruct storage of the video or the encoded video on another device.
  • the encoded video comprises data defined at various levels, including slices and blocks.
  • a slice is a spatially distinct region of a video frame that the encoder 130 encodes separately from any other region in the video frame.
  • a block is a group of pixels arranged in a rectangle. Blocks may also be referred to as units or coding units.
  • the input interface 170 receives the encoded video from the source device 110, the decoder 180 decodes the encoded video to obtain a decoded video, and the display 190 displays the decoded video.
  • the decoder 180 may decode the encoded video in a reverse manner compared to how the encoder 130 encodes the video.
  • the destination device 160 locally stores the encoded video or the decoded video, or the destination device 160 instructs storage of the encoded video or the decoded video on another device.
  • though the coding system 100 is described as coding and communicating videos, which are simply series of images, the same concepts apply to single images.
  • the video generator 120 may be a traditional camera, an infrared camera, a time-of-flight camera, a laser system, a scanner, or another device that scans objects and generates point clouds representing the objects.
  • the objects and the point clouds may be 3D.
  • the point clouds comprise points, which may be more abundant in regions of objects that are more complex and may be less abundant in regions of objects that are less complex.
  • a point cloud representing a human comprises more points in a facial region and fewer points in a torso region covered by a uniformly-colored shirt.
  • the point clouds comprise hundreds of thousands or millions of points, so the point clouds require significant data to encode and significant bandwidth to communicate. There is therefore a desire to efficiently code, and thus communicate, the point clouds.
  • the embodiments provide for generating an initial k-d tree that is balanced but lossy, then generating nodes beyond the initial k-d tree to create a final k-d tree that may be unbalanced but lossless.
  • the embodiments provide geometric and scalable coding.
  • Geometric coding refers to coding of spatial positions of points, as opposed to attribute coding, which refers to coding values of points.
  • Scalable coding refers to coding that works for all levels of partitioning a point cloud.
  • the embodiments are discussed in the context of 3D point clouds and k-d trees of dimension three, but apply to point clouds and k-d trees of any dimension. In addition, the embodiments may apply in similar manners to data structures other than point clouds. By increasing some processing at the encoding stage, the embodiments provide for more efficient coding and thus more efficient communication.
  • FIG. 2 is a flowchart illustrating a method 200 of point cloud encoding and communication according to an embodiment of the disclosure.
  • the source device 110 performs the method 200.
  • the encoder 130 obtains a point cloud of an object.
  • the video generator 120 generates the point cloud and the encoder 130 receives the point cloud from the video generator 120, or the encoder 130 receives the point cloud from another device.
  • the point cloud comprises N points, where N is a positive integer.
  • the object is a 3D object such as a human, so the point cloud is also 3D.
  • the encoder 130 determines an initial depth of an initial k-d tree. For instance, the encoder 130 determines the initial depth based on an inequality relating the following quantities:
  • N is the total number of points in the point cloud
  • D is the initial depth of the initial k-d tree
  • M is a predetermined maximum number of levels the encoder 130 can efficiently encode given a constraint such as an amount of memory or processing power. Depth and levels are described below.
  • a manufacturer of the source device 110 predetermines M and stores M in a memory of the source device 110. M may be about 1,000. The manufacturer, a user of the source device 110, or another entity may adjust M.
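Because the inequality itself does not survive in this text, the depth determination above can only be sketched under an assumption. A common choice, used below for illustration, is the smallest balanced depth whose leaves can hold all N points, capped at M; this is a plausible reading, not the patent's exact rule.

```python
from math import ceil, log2

def initial_depth(n_points: int, max_levels: int) -> int:
    """Pick an initial k-d tree depth D for a balanced tree over n_points.

    Assumption (not from the source): D is the smallest depth whose 2**D
    leaves could hold all points one per leaf, capped at max_levels (M).
    """
    if n_points <= 1:
        return 0
    return min(ceil(log2(n_points)), max_levels)
```

For example, a point cloud of 8 points yields D = 3 when M is large, while a very large cloud is clamped to D = M.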
  • the encoder 130 generates an initial k-d tree of the point cloud.
  • the initial k-d tree comprises the initial depth. Because the point cloud is 3D, the initial k-d tree is a k-d tree of dimension three. Though the initial k-d tree is of dimension three, for ease of understanding generation of a k-d tree, FIGS. 3A-3D demonstrate building of a k-d tree of dimension two.
  • FIGS. 3A-3D are diagrams 300 demonstrating building of a k-d tree of dimension two.
  • the k-d tree is a binary tree in which every node is a k-dimensional point.
  • a binary tree is a data structure in which each node has at most two children.
  • the k-d tree describes N points in a point cloud.
  • the k-d tree is 2D because its nodes cut in both the x direction and the y direction.
  • FIG. 3A is a diagram showing a root node 310 at a zeroth level.
  • the root node 310 is called a root node because it is the only node in the k-d tree at a zeroth level.
  • a level refers to a number of cuts in the k-d tree.
  • the root node 310 comprises all N points.
  • FIG. 3B is a diagram showing the root node 310 cut into a node 320 and a node 330 at a first level.
  • the root node 310 is a parent of the nodes 320, 330, and the nodes 320, 330 are children of the root node 310.
  • the nodes 320, 330 are cut in the y direction.
  • Each of the nodes 320, 330 comprises N/2 points.
  • FIG. 3C is a diagram showing the node 320 cut into a node 340 and a node 350 at a second level.
  • the node 320 is a parent of the nodes 340, 350, and the nodes 340, 350 are children of the node 320.
  • the nodes 340, 350 are cut in the x direction.
  • Each of the nodes 340, 350 comprises N/4 points.
  • FIG. 3D is a diagram showing the node 350 cut into a leaf node 360 and a leaf node 370 at a third level.
  • the leaf nodes 360, 370 are called leaf nodes because they are the lowest-level nodes in the k-d tree.
  • the node 350 is a parent of the leaf nodes 360, 370, and the leaf nodes 360, 370 are children of the node 350.
  • the leaf nodes 360, 370 are cut in the y direction.
  • Each of the leaf nodes 360, 370 comprises N/8 points.
  • the k-d tree has a depth of three because its nodes split to a third level. Depth may also be referred to as height.
  • the k-d tree is an unbalanced k-d tree because the node 320 splits to a second level, but the node 330 does not, and because the node 350 splits to a third level, but the node 340 does not.
  • the k-d tree would be balanced if every branch of the k-d tree comprised the same number of levels, or numbers of levels within one of each other.
  • a branch is a progression of parent and child relationships. For instance, one branch comprises the root node 310; the nodes 320, 350; and the leaf nodes 360, 370.
  • the nodes are not cut based on their size. For instance, the node 340 is larger than the node 350. Rather, the nodes are cut based on the locations of the points so that each child node associated with the same parent node comprises the same or substantially the same number of points.
  • the equal sizes of the nodes 320, 330 in FIG. 3B indicate the points are evenly distributed or substantially evenly distributed between the left side and the right side of the root node 310 in FIG. 3A.
  • the unequal sizes of the nodes 340, 350 in FIG. 3C indicate the points are unevenly distributed between the top side and the bottom side of the node 320 in FIG. 3B.
  • the points are less heavily distributed in the node 340 and more heavily distributed in the node 350.
  • the unequal sizes of the leaf nodes 360, 370 in FIG. 3D indicate the points are unevenly distributed between the left side and the right side of the node 350 in FIG. 3C.
  • the points are more heavily distributed in the leaf node 360 and less heavily distributed in the leaf node 370.
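The median-cut construction that FIGS. 3A-3D walk through, cutting each node at the median of its points and alternating axes so that sibling nodes hold substantially equal point counts, can be sketched as follows. The dictionary-based node representation and the function name are illustrative, not taken from the patent.

```python
def build_kd_tree(points, depth=0, k=2):
    """Recursively split points at the median along axis depth % k.

    Each internal node stores its cut axis, cut value, and two children,
    so every cut leaves roughly half of the points on each side, as in
    FIGS. 3A-3D.
    """
    if len(points) <= 1:
        return {"points": points}           # leaf node
    axis = depth % k                        # cycle through x, y (, z) cuts
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "cut": pts[mid][axis],
        "left": build_kd_tree(pts[:mid], depth + 1, k),
        "right": build_kd_tree(pts[mid:], depth + 1, k),
    }
```

Calling the function with k = 3 would produce the dimension-three tree the embodiments actually use; the recursion is identical, only the axis cycle lengthens.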
  • the initial k-d tree is similar to the k-d tree shown in FIGS. 3A-3D.
  • the initial k-d tree is balanced.
  • the initial k-d tree is of dimension three instead of dimension two
  • the initial k-d tree is of the initial depth determined at step 220 instead of three.
  • the nodes in the initial k-d tree are rectangular prisms instead of squares like in FIGS. 3A-3D.
  • the encoder 130 generates additional nodes beyond the initial depth. To do so, the encoder 130 samples m sampling nodes, which are a percentage of the leaf nodes of the initial k-d tree, meaning the lowest-level nodes in the initial k-d tree at level D, where m is a positive integer.
  • a manufacturer of the source device 110 determines the percentage and stores the percentage in a memory of the source device 110. The manufacturer, a user of the source device 110, or another entity may adjust the percentage.
  • the number of leaf nodes in the initial k-d tree is 2^D, and the percentage may be about 5%–10% of 2^D.
  • the encoder 130 performs an MST operation on all of the sampling nodes to connect points in the sampling nodes.
  • the encoder 130 calculates a residual R for each point in a sampling node, where the residual is a number of bits needed to describe a distance of a currently-examined point from a previously-examined point in the MST operation.
  • the encoder 130 calculates a sum of all of the residuals and averages the sum to obtain an average residual R_avg.
  • the encoder 130 determines whether to split each sampling node based on an inequality evaluated for each of the m sampling nodes, comparing the node's residual against the average residual, for instance R > λ·R_avg, where λ is a threshold factor.
  • a manufacturer of the source device 110 determines λ and stores λ in a memory of the source device 110. The manufacturer, a user of the source device 110, or another entity may adjust λ. λ may be about 1.2–2.0. If the inequality is true, then the encoder 130 splits the sampling node into two additional nodes, specifically two child nodes. If the inequality is false, then the encoder 130 does not split the sampling node into two child nodes. The encoder 130 continues splitting the sampling nodes until the inequality is false for every sampling node. Once the encoder 130 has done so for each sampling node, the encoder 130 calculates the depth difference ΔD = D' − D, where:
  • ΔD is the depth difference
  • D' is a final depth of the sampling node with the highest level
  • D is the initial depth of the initial k-d tree.
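The residual computation and split decision can be sketched as follows, using a Prim-style greedy traversal for the MST operation and assuming the split test compares a node's mean residual against λ times the average residual. The residual measure, bits needed for the distance to the previously connected point, follows the description above, but the helper names and the exact form of the inequality are assumptions.

```python
from math import ceil, dist, log2

def mst_residuals(points):
    """Greedy (Prim-style) MST traversal over the points of one node.

    The residual of each newly connected point is the number of bits
    needed to describe its distance to the nearest already-connected point.
    """
    visited = [points[0]]
    remaining = list(points[1:])
    residuals = []
    while remaining:
        # pick the remaining point closest to the already-connected set
        p, d = min(((q, min(dist(q, v) for v in visited)) for q in remaining),
                   key=lambda t: t[1])
        residuals.append(max(1, ceil(log2(d + 1))))  # bits for the distance
        visited.append(p)
        remaining.remove(p)
    return residuals

def should_split(node_residuals, avg_residual, lam=1.5):
    """Assumed split rule: split when the node's mean residual exceeds
    lam (the threshold factor) times the average residual."""
    node_avg = sum(node_residuals) / len(node_residuals)
    return node_avg > lam * avg_residual
```

The encoder would repeat this test on each newly created child until no sampling node satisfies the inequality, then take ΔD from the deepest resulting node.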
  • the encoder 130 generates a final k-d tree of the point cloud.
  • the final k-d tree comprises the initial k-d tree and the additional nodes.
  • the final k-d tree comprises the final depth and is unbalanced.
  • the encoder 130 encodes the point cloud as an encoded point cloud.
  • the encoded point cloud may be referred to as a bitstream.
  • the encoded point cloud comprises a bounding box, D, ΔD, and the final k-d tree.
  • the bounding box describes diagonal corner points of the object, for instance a top-right point of the object and a bottom-left point of the object.
  • D is the initial depth of the initial k-d tree determined at step 220.
  • ΔD is the depth difference calculated at step 240.
  • the final k-d tree comprises, for each node, a dimension, a cut value, and a description of the points using the MST operation.
  • the dimension is a direction by which the node being encoded is cut. For instance, in FIG. 3B, the root node 310 is cut into the nodes 320, 330 in the y direction.
  • the dimension may be encoded with a bit map in which each node has a dimension value indicating a split along the x direction, the y direction, or the z direction. Thus, no node has a dimension value indicating no split.
  • the cut value is K bits, where K is a resolution of the point cloud. For instance, K is 10.
  • the final k-d tree comprises, for each node, a dimension, a cut value, an indication of whether the node is a parent, an indication of whether the node’s child is to the left or the right, and a description of the points using a TSP operation.
  • the TSP operation produces an optimal traversal of all the points in the node, creating the smallest differential residual for the node.
  • a 0 or 1 bit may provide both the indication of whether the node is a parent and the indication of whether the node’s child is to the left or the right.
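The per-node fields can be sketched as a bit-level layout. The 2-bit dimension code and the field order below are assumptions for illustration, since the patent's actual bit map is not reproduced in this text.

```python
def encode_node(axis: int, cut: int, k_bits: int = 10) -> str:
    """Pack one node's dimension and cut value into a bit string.

    Assumed layout (illustrative): 2 bits for the cut dimension
    (00 = x, 01 = y, 10 = z), followed by the cut value in k_bits bits,
    where k_bits is the resolution K of the point cloud.
    """
    if not 0 <= axis <= 2:
        raise ValueError("axis must select x, y, or z")
    if not 0 <= cut < 2 ** k_bits:
        raise ValueError("cut value exceeds the point-cloud resolution")
    return format(axis, "02b") + format(cut, f"0{k_bits}b")
```

With K = 10, each node costs 12 bits for these two fields before entropy coding; the parent/child indication described above would add roughly one further bit per node.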
  • the encoder 130 encodes the point cloud using entropy encoding such as arithmetic entropy coding or machine-learning-based entropy compression.
  • PAQ is a series of lossless data compression archivers that use a context mixing algorithm, which may be trained on representative point-cloud data streams.
  • the PAQ version may be version 8 (PAQ8), which combines the predictions of various models by a weighted summation via a shallow neural network. An adaptive probability map may reduce a prediction error before PAQ. After encoding every bit, the neural network weights are adjusted along a cost gradient.
  • the output interface 140 transmits the encoded point cloud. Specifically, the output interface 140 transmits the encoded point cloud to the input interface 170 of the destination device 160 over the medium 150.
  • the source device 110 may perform the method 200 at time intervals such as 60 times per second or 120 times per second. In that case, the collection of point clouds, which are 3D, includes a fourth dimension of time.
  • FIG. 4 is a flowchart illustrating a method 400 of point cloud communication and decoding according to an embodiment of the disclosure.
  • the destination device 160 performs the method 400.
  • the input interface 170 receives an encoded point cloud comprising characteristics.
  • the input interface 170 may do so in response to the output interface 140 transmitting the encoded point cloud from step 270 of FIG. 2.
  • the characteristics may comprise the bounding box, D, ΔD, and the final k-d tree described at step 270 in FIG. 2.
  • the decoder 180 extracts the characteristics from the encoded point cloud.
  • the decoder 180 generates a point cloud based on the characteristics.
  • the point cloud may be the point cloud described at step 210 in FIG. 2.
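The receive-extract-generate flow of the method 400 can be sketched as parsing a header that mirrors the encoder output: the bounding box, D, the depth difference, and then the encoded tree payload. The exact byte layout below (six 32-bit bounding-box coordinates followed by two 32-bit integers) is an assumption for illustration, not a format the patent specifies.

```python
import struct

HEADER_FMT = "<6f2I"  # assumed: 6 bounding-box floats, then D and delta-D

def extract_characteristics(bitstream: bytes) -> dict:
    """Parse the assumed header of an encoded point cloud.

    Returns the bounding box as two diagonal corner points, the initial
    depth D, and the final depth D' = D + delta-D; the remaining bytes
    carry the encoded final k-d tree.
    """
    size = struct.calcsize(HEADER_FMT)
    x0, y0, z0, x1, y1, z1, d, delta = struct.unpack(HEADER_FMT,
                                                     bitstream[:size])
    return {
        "bounding_box": ((x0, y0, z0), (x1, y1, z1)),
        "initial_depth": d,
        "final_depth": d + delta,
        "tree_payload": bitstream[size:],
    }
```

The decoder 180 would then walk the tree payload, reversing the node encoding and the MST or TSP point descriptions to regenerate the point positions inside the bounding box.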
  • FIG. 5 is a schematic diagram of an apparatus 500 according to an embodiment of the disclosure.
  • the apparatus 500 may implement the disclosed embodiments.
  • the apparatus 500 comprises ingress ports 510 and an RX 520 for receiving data; a processor, logic unit, baseband unit, or CPU 530 to process the data; a TX 540 and egress ports 550 for transmitting the data; and a memory 560 for storing the data.
  • the apparatus 500 may also comprise OE components, EO components, or RF components coupled to the ingress ports 510, the RX 520, the TX 540, and the egress ports 550 for ingress or egress of optical, electrical, or RF signals.
  • the processor 530 is any combination of hardware, middleware, firmware, or software.
  • the processor 530 comprises any combination of one or more CPU chips, cores, FPGAs, ASICs, or DSPs.
  • the processor 530 communicates with the ingress ports 510, the RX 520, the TX 540, the egress ports 550, and the memory 560.
  • the processor 530 comprises a point cloud coding component 570, which implements the disclosed embodiments. The inclusion of the point cloud coding component 570 therefore provides a substantial improvement to the functionality of the apparatus 500 and effects a transformation of the apparatus 500 to a different state.
  • the memory 560 stores the point cloud coding component 570 as instructions, and the processor 530 executes those instructions.
  • the memory 560 comprises any combination of disks, tape drives, or solid-state drives.
  • the apparatus 500 may use the memory 560 as an over-flow data storage device to store programs when the apparatus 500 selects those programs for execution and to store instructions and data that the apparatus 500 reads during execution of those programs.
  • the memory 560 may be volatile or non-volatile and may be any combination of ROM, RAM, TCAM, or SRAM.
  • An apparatus comprises a memory means and a processor means coupled to the memory means and configured to obtain a point cloud of an object, wherein the point cloud comprises points, generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced, generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth, and encode an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.

Abstract

A method comprises obtaining a point cloud of an object, wherein the point cloud comprises points; generating an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced; generating a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and encoding an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.

Description

Point Cloud Coding
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to United States provisional patent application number 62/734,831, filed September 21, 2018 by Futurewei Technologies, Inc. and titled "Point Cloud Coding," and United States provisional patent application number 62/566,761, filed October 2, 2017 by Futurewei Technologies, Inc. and titled "Method and Apparatus for Lossless Point Cloud Geometry Compression," which are incorporated by reference.
TECHNICAL FIELD
The disclosed embodiments relate to video coding in general and point cloud coding in particular.
BACKGROUND
Videos use a relatively large amount of data, so communication of videos uses a relatively large amount of bandwidth. However, many networks operate at or near their bandwidth capacities. In addition, customers demand high video quality, which requires using even more data. There is therefore a desire to both reduce the amount of data videos use and improve video quality. One solution is to compress videos during an encoding process and decompress the videos during a decoding process.
SUMMARY
In one embodiment, a method comprises: obtaining a point cloud of an object, wherein the point cloud comprises points; generating an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced; generating a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and encoding an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
In any of the preceding embodiments, the method further comprises further encoding a bounding box describing diagonal corner points of the object.
In any of the preceding embodiments, the method further comprises further encoding the initial depth.
In any of the preceding embodiments, the method further comprises further encoding the final k-d tree.
In any of the preceding embodiments, the method further comprises transmitting the encoded point cloud.
In any of the preceding embodiments, the method further comprises determining the initial depth based on a total number of the points.
In any of the preceding embodiments, the method further comprises further determining the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
In any of the preceding embodiments, the constraint is an amount of memory or a processing power.
In any of the preceding embodiments, the method further comprises generating additional nodes beyond the initial depth.
In any of the preceding embodiments, the method further comprises further generating the additional nodes based on an MST operation.
In any of the preceding embodiments, the method further comprises further generating the additional nodes based on an average residual from the MST operation.
In any of the preceding embodiments, the method further comprises further generating the final k-d tree using the initial k-d tree and the additional nodes.
In any of the preceding embodiments, the point cloud, the initial k-d tree, and the final k-d tree are 3D.
In another embodiment, an apparatus comprises a memory; and a processor coupled to the memory and configured to: obtain a point cloud of an object, wherein the point cloud comprises points, generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced, generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth, and encode an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
In any of the preceding embodiments, the processor is further configured to further encode a bounding box describing diagonal corner points of the object.
In any of the preceding embodiments, the processor is further configured to further encode the initial depth.
In any of the preceding embodiments, the processor is further configured to further encode the final k-d tree.
In any of the preceding embodiments, the apparatus further comprises a transmitter coupled to the processor and configured to transmit the encoded point cloud.
In any of the preceding embodiments, the processor is further configured to determine the initial depth based on a total number of the points.
In any of the preceding embodiments, the processor is further configured to further determine the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
In any of the preceding embodiments, the constraint is an amount of memory or a processing power.
In any of the preceding embodiments, the processor is further configured to generate additional nodes beyond the initial depth.
In any of the preceding embodiments, the processor is further configured to further generate the additional nodes based on an MST operation.
In any of the preceding embodiments, the processor is further configured to further generate the additional nodes based on an average residual from the MST operation.
In any of the preceding embodiments, the processor is further configured to further generate the final k-d tree using the initial k-d tree and the additional nodes.
In any of the preceding embodiments, the point cloud, the initial k-d tree, and the final k-d tree are 3D.
In yet another embodiment, a computer program product comprises computer executable instructions stored on a non-transitory medium that when executed by a processor cause an apparatus to obtain a point cloud of an object, wherein the point cloud comprises points; generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced; generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and encode an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
In any of the preceding embodiments, the instructions further cause the apparatus to further encode a bounding box describing diagonal corner points of the object.
In any of the preceding embodiments, the instructions further cause the apparatus to further encode the initial depth.
In any of the preceding embodiments, the instructions further cause the apparatus to further encode the final k-d tree.
In any of the preceding embodiments, the instructions further cause the apparatus to transmit the encoded point cloud.
In any of the preceding embodiments, the instructions further cause the apparatus to determine the initial depth based on a total number of the points.
In any of the preceding embodiments, the instructions further cause the apparatus to further determine the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
In any of the preceding embodiments, the constraint is an amount of memory or a processing power.
In any of the preceding embodiments, the instructions further cause the apparatus to generate additional nodes beyond the initial depth.
In any of the preceding embodiments, the instructions further cause the apparatus to further generate the additional nodes based on an MST operation.
In any of the preceding embodiments, the instructions further cause the apparatus to further generate the additional nodes based on an average residual from the MST operation.
In any of the preceding embodiments, the instructions further cause the apparatus to further generate the final k-d tree using the initial k-d tree and the additional nodes.
In any of the preceding embodiments, the point cloud, the initial k-d tree, and the final k-d tree are 3D.
In yet another embodiment, a method comprises receiving an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth; extracting the characteristics from the encoded point cloud; and generating a point cloud based on the characteristics.
In any of the preceding embodiments, the point cloud, the initial k-d tree, and the final k-d tree are 3D.
In yet another embodiment, an apparatus comprises a memory; and a processor coupled to the memory and configured to: receive an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth, extract the characteristics from the encoded point cloud, and generate a point cloud based on the characteristics.
In any of the preceding embodiments, the point cloud, the initial k-d tree, and the final k-d tree are 3D.
In yet another embodiment, a computer program product comprises computer executable instructions stored on a non-transitory medium that when executed by a processor cause an apparatus to: receive an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth; extract the characteristics from the encoded point cloud; and generate a point cloud based on the characteristics.
In any of the preceding embodiments, the point cloud, the initial k-d tree, and the final k-d tree are 3D.
The preceding embodiments provide for generating an initial k-d tree that is balanced but lossy, then generating nodes beyond the initial k-d tree to create a final k-d tree that may be unbalanced but lossless. The embodiments provide geometric and scalable coding. By increasing some processing at the encoding stage, the embodiments provide for more efficient coding and thus more efficient communication.
Any of the above embodiments may be combined with any of the other above embodiments to create a new embodiment. These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
FIG. 1 is a schematic diagram of a coding system.
FIG. 2 is a flowchart illustrating a method of point cloud encoding and communication according to an embodiment of the disclosure.
FIGS. 3A-3D are diagrams demonstrating building of a k-d tree of dimension two.
FIG. 4 is a flowchart illustrating a method of point cloud communication and decoding according to an embodiment of the disclosure.
FIG. 5 is a schematic diagram of an apparatus according to an embodiment of the disclosure.
DETAILED DESCRIPTION
It should be understood at the outset that, although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
The following abbreviations apply:
ASIC: application-specific integrated circuit
CPU: central processing unit
DSP: digital signal processor
EO: electrical-to-optical
FPGA: field-programmable gate array
LCD: liquid crystal display
MST: minimum spanning tree
OE: optical-to-electrical
RAM: random-access memory
RF: radio frequency
ROM: read-only memory
RX: receiver unit
SRAM: static RAM
TCAM: ternary content-addressable memory
TSP: traveling salesman problem
TX: transmitter unit
2D: two-dimensional
3D: three-dimensional.
FIG. 1 is a schematic diagram of a coding system 100. The coding system 100 comprises a source device 110, a medium 150, and a destination device 160. The source device 110 and the destination device 160 are mobile phones, tablet computers, desktop computers, notebook computers, or other suitable devices. The medium 150 is a local network, a radio network, the Internet, or another suitable medium.
The source device 110 comprises a video generator 120, an encoder 130, and an output interface 140. The video generator 120 is a camera or another device suitable for generating video. Videos include any visual representations of volumetric spaces or other multidimensional data. The encoder 130 may be referred to as a codec. The encoder 130 performs encoding according to a set of rules, for instance as described in "High Efficiency Video Coding," ITU-T H.265, December 2016 ("H.265"). The output interface 140 is an antenna or another component suitable for transmitting data to the destination device 160. Alternatively, the video generator 120, the encoder 130, and the output interface 140 are in any suitable combination of devices.
The destination device 160 comprises an input interface 170, a decoder 180, and a display 190. The input interface 170 is an antenna or another component suitable for receiving data from the source device 110. The decoder 180 may also be referred to as a codec. The decoder 180 performs decoding according to a set of rules, for instance as described in H.265. The display 190 is an LCD screen or another component suitable for displaying videos. Alternatively, the input interface 170, the decoder 180, and the display 190 are in any suitable combination of devices.
In operation, in the source device 110, the video generator 120 captures a video, the encoder 130 encodes the video to create an encoded video, and the output interface 140 transmits the encoded video over the medium 150 and towards the destination device 160. The source device 110 may locally store the video or the encoded video, or the source device 110 may instruct storage of the video or the encoded video on another device. The encoded video comprises data defined at various levels, including slices and blocks. A slice is a spatially distinct region of a video frame that the encoder 130 encodes separately from any other region in the video frame. A block is a group of pixels arranged in a rectangle. Blocks may also be referred to as units or coding units. In the destination device 160, the input interface 170 receives the encoded video from the source device 110, the decoder 180 decodes the encoded video to obtain a decoded video, and the display 190 displays the decoded video. The decoder 180 may decode the encoded video in a reverse manner compared to how the encoder 130 encodes the video. The destination device 160 locally stores the encoded video or the decoded video, or the destination device 160 instructs storage of the encoded video or the decoded video on another device. Though the coding system 100 is described as coding and communicating videos, which are simply series of images, the same concepts apply to single images.
The video generator 120 may be a traditional camera, an infrared camera, a time-of-flight camera, a laser system, a scanner, or another device that scans objects and generates point clouds representing the objects. The objects and the point clouds may be 3D. The point clouds comprise points, which may be more abundant in regions of objects that are more complex and may be less abundant in regions of objects that are less complex. For instance, a point cloud representing a human comprises more points in a facial region and fewer points in a torso region covered by a uniformly-colored shirt. The point clouds comprise hundreds of thousands or millions of points, so the point clouds require significant data to encode and significant bandwidth to communicate. There is therefore a desire to efficiently code, and thus communicate, the point clouds.
Disclosed herein are embodiments for point cloud coding. The embodiments provide for generating an initial k-d tree that is balanced but lossy, then generating nodes beyond the initial k-d tree to create a final k-d tree that may be unbalanced but lossless. The embodiments provide geometric and scalable coding. Geometric coding refers to coding of spatial positions of points, as opposed to attribute coding, which refers to coding values of points. Scalable coding refers to coding that works for all levels of partitioning a point cloud. The embodiments are discussed in the context of 3D point clouds and k-d trees of dimension three, but apply to point clouds and k-d trees of any dimension. In addition, the embodiments may apply in similar manners to data structures other than point clouds. By increasing some processing at the encoding stage, the embodiments provide for more efficient coding and thus more efficient communication.
FIG. 2 is a flowchart illustrating a method 200 of point cloud encoding and communication according to an embodiment of the disclosure. The source device 110 performs the method 200. At step 210, the encoder 130 obtains a point cloud of an object. For instance, the video generator 120 generates the point cloud and the encoder 130 receives the point cloud from the video generator 120, or the encoder 130 receives the point cloud from another device. The point cloud comprises N points, where N is a positive integer. The object is a 3D object such as a human, so the point cloud is also 3D.
At step 220, the encoder 130 determines an initial depth of an initial k-d tree. For instance, the encoder 130 determines the initial depth based on the following inequality:
N/2^D < M.    (1)
N is the total number of points in the point cloud, D is the initial depth of the initial k-d tree, and M is a predetermined maximum number of levels the encoder 130 can efficiently encode given a constraint such as an amount of memory or processing power. Depth and levels are described below. A manufacturer of the source device 110 predetermines M and stores M in a memory of the source device 110. M may be about 1,000. The manufacturer, a user of the source device 110, or another entity may adjust M.
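Inequality (1) can be solved in closed form for the smallest qualifying initial depth. The following Python sketch is illustrative only; the function name and the clamping to zero are assumptions, not part of the disclosure:

```python
import math

def initial_depth(n_points: int, m_limit: int) -> int:
    """Smallest depth D satisfying inequality (1): N / 2**D < M.

    Solving N / 2**D < M for D gives D > log2(N / M), so the smallest
    integer depth is floor(log2(N / M)) + 1, clamped at 0 when the
    inequality already holds with no splitting.
    """
    if n_points < m_limit:
        return 0
    return int(math.floor(math.log2(n_points / m_limit))) + 1
```

For instance, with N = 1,000,000 points and M = 1,000, the sketch yields D = 10, since 1,000,000 / 2^10 ≈ 977 < 1,000 while 1,000,000 / 2^9 ≈ 1,953 is not.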
At step 230, the encoder 130 generates an initial k-d tree of the point cloud. The initial k-d tree comprises the initial depth. Because the point cloud is 3D, the initial k-d tree is a k-d tree of dimension three. Though the initial k-d tree is of dimension three, for ease of understanding generation of a k-d tree, FIGS. 3A-3D demonstrate building of a k-d tree of dimension two.
FIGS. 3A-3D are diagrams 300 demonstrating building of a k-d tree of dimension two. The k-d tree is a binary tree in which every node is a k-dimensional point. A binary tree is a data structure in which each node has at most two children. The k-d tree describes N points in a point cloud. For FIGS. 3A-3D, the k-d tree is 2D because its nodes cut in both the x direction and the y direction.
FIG. 3A is a diagram showing a root node 310 at a zeroth level. The root node 310 is called a root node because it is the only node in the k-d tree at the zeroth level. A level refers to a number of cuts in the k-d tree. The root node 310 comprises all N points.
FIG. 3B is a diagram showing the root node 310 cut into a node 320 and a node 330 at a first level. The root node 310 is a parent of the nodes 320, 330, and the nodes 320, 330 are children of the root node 310. The nodes 320, 330 are cut in the y direction. Each of the nodes 320, 330 comprises N/2 points.
FIG. 3C is a diagram showing the node 320 cut into a node 340 and a node 350 at a second level. The node 320 is a parent of the nodes 340, 350, and the nodes 340, 350 are children of the node 320. The nodes 340, 350 are cut in the x direction. Each of the nodes 340, 350 comprises N/4 points.
FIG. 3D is a diagram showing the node 350 cut into a leaf node 360 and a leaf node 370 at a third level. The leaf nodes 360, 370 are called leaf nodes because they are the lowest-level nodes in the k-d tree. The node 350 is a parent of the leaf nodes 360, 370, and the leaf nodes 360, 370 are children of the node 350. The leaf nodes 360, 370 are cut in the y direction. Each of the leaf nodes 360, 370 comprises N/8 points.
In FIGS. 3A-3D, the k-d tree has a depth of three because its nodes split to a third level. Depth may also be referred to as height. The k-d tree is an unbalanced k-d tree because the node 320 splits to a second level but the node 330 does not, and because the node 350 splits to a third level but the node 340 does not. The k-d tree would be balanced if each branch of the k-d tree comprised the same number of levels, or within one of the same number of levels. A branch is a progression of parent and child relationships. For instance, one branch comprises the root node 310; the nodes 320, 350; and the leaf nodes 360, 370.
The nodes are not cut based on their size. For instance, the node 340 is larger than the node 350. Rather, the nodes are cut based on the locations of the points so that each child node associated with the same parent node comprises the same, or substantially the same, number of points. Thus, the equal sizes of the nodes 320, 330 in FIG. 3B indicate that the points are evenly distributed, or substantially evenly distributed, between the left side and the right side of the root node 310 in FIG. 3A. In contrast, the unequal sizes of the nodes 340, 350 in FIG. 3C indicate that the points are unevenly distributed between the top side and the bottom side of the node 320 in FIG. 3B. Specifically, within the node 320, the points are less heavily distributed in the node 340 and more heavily distributed in the node 350. Similarly, the unequal sizes of the leaf nodes 360, 370 in FIG. 3D indicate that the points are unevenly distributed between the left side and the right side of the node 350 in FIG. 3C. Specifically, within the node 350, the points are more heavily distributed in the leaf node 360 and less heavily distributed in the leaf node 370.
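The cut-at-the-median construction demonstrated in FIGS. 3A-3D can be sketched as follows. This is an illustrative Python sketch, not an implementation from the disclosure; the dict-based node representation and function name are assumptions:

```python
def build_kd_tree(points, depth_limit, level=0):
    """Recursively split `points` at the median so each child holds
    substantially the same number of points, cycling the cut axis per
    level as in FIGS. 3A-3D. Leaves hold their points directly.
    """
    if level == depth_limit or len(points) <= 1:
        return {"points": points}
    axis = level % len(points[0])        # alternate x, y (and z in 3D)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                  # median index: balanced split
    return {
        "axis": axis,
        "cut": pts[mid][axis],           # cut value at the median point
        "left": build_kd_tree(pts[:mid], depth_limit, level + 1),
        "right": build_kd_tree(pts[mid:], depth_limit, level + 1),
    }
```

Splitting a four-point 2D cloud to a depth limit of one, for example, yields one cut along the first axis and two leaves of two points each, regardless of where those points sit spatially.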
Returning to step 230 in FIG. 2, the encoder 130 generates the initial k-d tree in a manner similar to the k-d tree shown in FIGS. 3A-3D. However, the initial k-d tree is balanced, is of dimension three instead of dimension two, and is of the initial depth determined at step 220 instead of three. Thus, the nodes in the initial k-d tree are rectangular prisms instead of the rectangles shown in FIGS. 3A-3D.
At step 240, the encoder 130 generates additional nodes beyond the initial depth. To do so, the encoder 130 samples m sampling nodes, which are a percentage of the leaf nodes from the initial k-d tree, meaning the lowest-level nodes in the initial k-d tree at level D, where m is a positive integer. A manufacturer of the source device 110 determines the percentage and stores the percentage in a memory of the source device 110. The manufacturer, a user of the source device 110, or another entity may adjust the percentage. The number of leaf nodes in the initial k-d tree is 2^D, and the percentage may be about 5%–10% of 2^D. The encoder 130 performs an MST operation on all of the sampling nodes to connect the points in the sampling nodes. The encoder 130 calculates a residual R for each point in a sampling node, where the residual is the number of bits needed to describe the distance of a currently-examined point from a previously-examined point in the MST operation. The encoder 130 calculates a sum of all of the residuals and averages the sum to obtain an average residual R_avg. The encoder 130 then determines whether to split each sampling node based on the following inequality for each of the m sampling nodes:
R_leaf,i < α R_avg.    (2)
R_leaf,i is the residual of sampling node i, where i = 1, 2, …, m; α is a performance factor comparing MST to TSP; and R_avg is the average residual described above. A manufacturer of the source device 110 determines α and stores α in a memory of the source device 110. The manufacturer, a user of the source device 110, or another entity may adjust α. α may be about 1.2–2.0. If the inequality is true, then the encoder 130 splits the sampling node into two additional nodes, specifically two child nodes. If the inequality is false, then the encoder 130 does not split the sampling node into two child nodes. The encoder 130 continues splitting the sampling nodes until the inequality is false for every sampling node. Once the encoder 130 has done so for each sampling node, the encoder 130 calculates the following:
Δ = D′ – D.    (3)
Δ is the depth difference, D′ is the final depth of the sampling node with the highest level, and D is the initial depth of the initial k-d tree.
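Inequality (2) and equation (3) can be illustrated with a short sketch. The function names, the residual values in the usage note, and the default α are illustrative placeholders, not data from the disclosure:

```python
def split_decision(leaf_residuals, alpha=1.5):
    """Apply inequality (2): a sampling node i is split when its
    residual R_leaf,i is below alpha * R_avg, where R_avg is the
    average residual over the sampled leaves. Returns the indices
    of the sampling nodes to split.
    """
    r_avg = sum(leaf_residuals) / len(leaf_residuals)
    return [i for i, r in enumerate(leaf_residuals) if r < alpha * r_avg]

def depth_difference(final_depth, initial_depth):
    """Equation (3): the depth difference is D' - D."""
    return final_depth - initial_depth
```

For residuals [1, 2, 3, 10] with α = 1.5, R_avg is 4, so nodes 0, 1, and 2 fall below the threshold of 6 and are split, while node 3 is not.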
At step 250, the encoder 130 generates a final k-d tree of the point cloud. The final k-d tree comprises the initial k-d tree and the additional nodes. The final k-d tree comprises the final depth and is unbalanced.
At step 260, the encoder 130 encodes the point cloud as an encoded point cloud. The encoded point cloud may be referred to as a bitstream. The encoded point cloud comprises a bounding box, D, Δ, and the final k-d tree. The bounding box describes diagonal corner points of the object, for instance a top-right point of the object and a bottom-left point of the object. D is the initial depth of the initial k-d tree determined at step 220. Δ is the depth difference calculated at step 240. For levels 0–D, the final k-d tree comprises, for each node, a dimension, a cut value, and a description of the points using the MST operation. The dimension is the direction by which the node being encoded is cut. For instance, in FIG. 3B, the root node 310 is cut into the nodes 320, 330 in the y direction. The dimension may be encoded with the following bit map:
00: no split
01: x direction
10: y direction
11: z direction.
For levels 0–D, each node has a dimension value indicating a split along the x direction, the y direction, or the z direction; thus, no node at these levels has a dimension value indicating no split. The cut value is K bits, where K is a resolution of the point cloud. For instance, K is 10 bits. For levels (D+1)–D′, the final k-d tree comprises, for each node, a dimension, a cut value, an indication of whether the node is a parent, an indication of whether the node's child is to the left or the right, and a description of the points using a TSP operation. The TSP operation produces an optimal traversal of all the points in the node, creating the smallest differential residual for the node. The dimension and the cut value are described above. However, for levels (D+1)–D′, some nodes may not have children and therefore may have a dimension value indicating no split. A 0 or 1 bit may provide both the indication of whether the node is a parent and the indication of whether the node's child is to the left or the right.
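A minimal sketch of the 2-bit dimension bit map and the K-bit cut value described above follows. Packing both fields into a single integer is a simplification for illustration and is not the actual bitstream syntax:

```python
# 2-bit codes for the split dimension of each node, per the bit map
# in the description: 00 no split, 01 x, 10 y, 11 z.
DIM_CODES = {"none": 0b00, "x": 0b01, "y": 0b10, "z": 0b11}

def encode_node_header(dimension, cut_value, resolution_bits=10):
    """Pack one node's 2-bit dimension code followed by a K-bit cut
    value (K = resolution of the point cloud, e.g. 10 bits) into a
    single integer.
    """
    code = DIM_CODES[dimension]
    mask = (1 << resolution_bits) - 1       # keep only K bits of the cut
    return (code << resolution_bits) | (cut_value & mask)
```

With K = 10, a node cut in the y direction at cut value 5 packs to the 12-bit pattern 10_0000000101.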
The encoder 130 encodes the point cloud using entropy encoding such as arithmetic entropy coding or machine-learning-based entropy compression. One type of machine-learning-based entropy compression is PAQ, a family of lossless data compression archivers that use a context-mixing algorithm trained on representative point-cloud data streams. One version of PAQ, version 8, combines the predictions of various models by weighted summation via a shallow neural network. An adaptive probability map may reduce a prediction error before PAQ. After encoding every bit, the neural network weights are adjusted along a cost gradient.
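As a toy illustration of an adaptive probability map, the following sketch nudges a predicted bit probability toward each observed bit; the update rule and rate are illustrative assumptions and are not taken from PAQ or from the disclosure:

```python
def adaptive_probability(bits, p0=0.5, rate=0.05):
    """Toy adaptive probability map: after coding each bit, move the
    predicted probability of a 1 toward the observed bit by a fixed
    fraction `rate`. Returns the final probability and the prediction
    made before each bit.
    """
    p = p0
    predictions = []
    for b in bits:
        predictions.append(p)       # prediction used to code this bit
        p += rate * (b - p)         # adapt toward the observed bit
    return p, predictions
```

A run of identical bits drives the prediction steadily toward that bit value, which is what lets a downstream arithmetic coder spend fewer bits on predictable streams.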
Finally, at step 270, the output interface 140 transmits the encoded point cloud. Specifically, the output interface 140 transmits the encoded point cloud to the input interface 170 of the destination device 160 over the medium 150. The source device 110 may perform the method 200 at intervals such as 60 times per second or 120 times per second. In that case, the collection of point clouds, which are 3D, includes a fourth dimension of time.
FIG. 4 is a flowchart illustrating a method 400 of point cloud communication and decoding according to an embodiment of the disclosure. The destination device 160 performs the method 400. At step 410, the input interface 170 receives an encoded point cloud comprising characteristics. The input interface 170 may do so in response to the output interface 140 transmitting the encoded point cloud from step 270 of FIG. 2. The characteristics may comprise the bounding box, D, Δ, and the final k-d tree described at step 270 in FIG. 2. At step 420, the decoder 180 extracts the characteristics from the encoded point cloud. Finally, at step 430, the decoder 180 generates a point cloud based on the characteristics. The point cloud may be the point cloud described at step 210 in FIG. 2.
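The extraction at step 420 can be sketched as follows, assuming the characteristics have already been entropy-decoded into a dictionary; the field names and container layout are illustrative assumptions rather than the encoded syntax:

```python
def extract_characteristics(header):
    """Pull the fields the decoder needs from a decoded header: the
    bounding box (two diagonal corner points of the object), the
    initial depth D, and the depth difference delta, from which the
    final depth D' = D + delta is recovered.
    """
    d = header["initial_depth"]
    delta = header["depth_difference"]
    return {
        "bounding_box": header["bounding_box"],
        "initial_depth": d,
        "final_depth": d + delta,   # equation (3) rearranged: D' = D + delta
    }
```

Because only the depth difference is signaled, the decoder recovers the final depth by rearranging equation (3) rather than reading D′ directly.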
FIG. 5 is a schematic diagram of an apparatus 500 according to an embodiment of the disclosure. The apparatus 500 may implement the disclosed embodiments. The apparatus 500 comprises ingress ports 510 and an RX 520 for receiving data; a processor, logic unit, baseband unit, or CPU 530 to process the data; a TX 540 and egress ports 550 for transmitting the data; and a memory 560 for storing the data. The apparatus 500 may also comprise OE components, EO components, or RF components coupled to the ingress ports 510, the RX 520, the TX 540, and the egress ports 550 for ingress or egress of optical, electrical, or RF signals.
The processor 530 is any combination of hardware, middleware, firmware, or software. The processor 530 comprises any combination of one or more CPU chips, cores, FPGAs, ASICs, or DSPs. The processor 530 communicates with the ingress ports 510, the RX 520, the TX 540, the egress ports 550, and the memory 560. The processor 530 comprises a point cloud coding component 570, which implements the disclosed embodiments. The inclusion of the point cloud coding component 570 therefore provides a substantial improvement to the functionality of the apparatus 500 and effects a transformation of the apparatus 500 to a different state. Alternatively, the memory 560 stores the point cloud coding component 570 as instructions, and the processor 530 executes those instructions.
The memory 560 comprises any combination of disks, tape drives, or solid-state drives. The apparatus 500 may use the memory 560 as an over-flow data storage device to store programs when the apparatus 500 selects those programs for execution and to store instructions and data that the apparatus 500 reads during execution of those programs. The memory 560 may be volatile or non-volatile and may be any combination of ROM, RAM, TCAM, or SRAM.
An apparatus comprises a memory means and a processor means coupled to the memory means and configured to obtain a point cloud of an object, wherein the point cloud comprises points, generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced, generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth, and encode an encoded point cloud, wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
The term “about” means a range including ±10% of the subsequent number unless otherwise stated. The term “substantially” means within ±10%. While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

Claims (45)

  1. A method comprising:
    obtaining a point cloud of an object, wherein the point cloud comprises points;
    generating an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced;
    generating a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and
    encoding an encoded point cloud,
    wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
  2. The method of claim 1, further comprising further encoding a bounding box describing diagonal corner points of the object.
  3. The method of any of claims 1-2, further comprising further encoding the initial depth.
  4. The method of any of claims 1-3, further comprising further encoding the final k-d tree.
  5. The method of any of claims 1-4, further comprising transmitting the encoded point cloud.
  6. The method of any of claims 1-5, further comprising determining the initial depth based on a total number of the points.
  7. The method of any of claims 1-6, further comprising further determining the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
  8. The method of any of claims 1-7, wherein the constraint is an amount of memory or a processing power.
  9. The method of any of claims 1-8, further comprising generating additional nodes beyond the initial depth.
  10. The method of any of claims 1-9, further comprising further generating the additional nodes based on a minimum spanning tree (MST) operation.
  11. The method of any of claims 1-10, further comprising further generating the additional nodes based on an average residual from the MST operation.
  12. The method of any of claims 1-11, further comprising further generating the final k-d tree using the initial k-d tree and the additional nodes.
  13. The method of any of claims 1-12, wherein the point cloud, the initial k-d tree, and the final k-d tree are three-dimensional (3D).
  14. An apparatus comprising:
    a memory; and
    a processor coupled to the memory and configured to:
    obtain a point cloud of an object, wherein the point cloud comprises points,
    generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced,
    generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth, and
    encode an encoded point cloud,
    wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
  15. The apparatus of claim 14, wherein the processor is further configured to further encode a bounding box describing diagonal corner points of the object.
  16. The apparatus of any of claims 14-15, wherein the processor is further configured to further encode the initial depth.
  17. The apparatus of any of claims 14-16, wherein the processor is further configured to further encode the final k-d tree.
  18. The apparatus of any of claims 14-17, further comprising a transmitter coupled to the processor and configured to transmit the encoded point cloud.
  19. The apparatus of any of claims 14-18, wherein the processor is further configured to determine the initial depth based on a total number of the points.
  20. The apparatus of any of claims 14-19, wherein the processor is further configured to further determine the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
  21. The apparatus of any of claims 14-20, wherein the constraint is an amount of memory or a processing power.
  22. The apparatus of any of claims 14-21, wherein the processor is further configured to generate additional nodes beyond the initial depth.
  23. The apparatus of any of claims 14-22, wherein the processor is further configured to further generate the additional nodes based on a minimum spanning tree (MST) operation.
  24. The apparatus of any of claims 14-23, wherein the processor is further configured to further generate the additional nodes based on an average residual from the MST operation.
  25. The apparatus of any of claims 14-24, wherein the processor is further configured to further generate the final k-d tree using the initial k-d tree and the additional nodes.
  26. The apparatus of any of claims 14-25, wherein the point cloud, the initial k-d tree, and the final k-d tree are three-dimensional (3D).
  27. A computer program product comprising computer executable instructions stored on a non-transitory medium that when executed by a processor cause an apparatus to:
    obtain a point cloud of an object, wherein the point cloud comprises points;
    generate an initial k-d tree of the point cloud, wherein the initial k-d tree comprises an initial depth and is balanced;
    generate a final k-d tree of the point cloud, wherein the final k-d tree comprises a final depth and is unbalanced, and wherein the final depth is greater than the initial depth; and
    encode an encoded point cloud,
    wherein the encoded point cloud comprises a depth difference between the final depth and the initial depth.
  28. The computer program product of claim 27, wherein the instructions further cause the apparatus to further encode a bounding box describing diagonal corner points of the object.
  29. The computer program product of any of claims 27-28, wherein the instructions further cause the apparatus to further encode the initial depth.
  30. The computer program product of any of claims 27-29, wherein the instructions further cause the apparatus to further encode the final k-d tree.
  31. The computer program product of any of claims 27-30, wherein the instructions further cause the apparatus to transmit the encoded point cloud.
  32. The computer program product of any of claims 27-31, wherein the instructions further cause the apparatus to determine the initial depth based on a total number of the points.
  33. The computer program product of any of claims 27-32, wherein the instructions further cause the apparatus to further determine the initial depth based on a predetermined maximum number of levels that can be efficiently encoded given a constraint.
  34. The computer program product of any of claims 27-33, wherein the constraint is an amount of memory or a processing power.
  35. The computer program product of any of claims 27-34, wherein the instructions further cause the apparatus to generate additional nodes beyond the initial depth.
  36. The computer program product of any of claims 27-35, wherein the instructions further cause the apparatus to further generate the additional nodes based on a minimum spanning tree (MST) operation.
  37. The computer program product of any of claims 27-36, wherein the instructions further cause the apparatus to further generate the additional nodes based on an average residual from the MST operation.
  38. The computer program product of any of claims 27-37, wherein the instructions further cause the apparatus to further generate the final k-d tree using the initial k-d tree and the additional nodes.
  39. The computer program product of any of claims 27-38, wherein the point cloud, the initial k-d tree, and the final k-d tree are three-dimensional (3D).
  40. A method comprising:
    receiving an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth;
    extracting the characteristics from the encoded point cloud; and
    generating a point cloud based on the characteristics.
  41. The method of claim 40, wherein the point cloud, the initial k-d tree, and the final k-d tree are three-dimensional (3D).
  42. An apparatus comprising:
    a memory; and
    a processor coupled to the memory and configured to:
    receive an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth,
    extract the characteristics from the encoded point cloud, and
    generate a point cloud based on the characteristics.
  43. The apparatus of claim 42, wherein the point cloud, the initial k-d tree, and the final k-d tree are three-dimensional (3D).
  44. A computer program product comprising computer executable instructions stored on a non-transitory medium that when executed by a processor cause an apparatus to:
    receive an encoded point cloud comprising characteristics, wherein the characteristics comprise a bounding box describing diagonal corner points of an object, an initial depth of an initial k-d tree, a final k-d tree associated with a final depth, and a depth difference between the final depth and the initial depth;
    extract the characteristics from the encoded point cloud; and
    generate a point cloud based on the characteristics.
  45. The computer program product of claim 44, wherein the point cloud, the initial k-d tree, and the final k-d tree are three-dimensional (3D).
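The encoder-side steps recited in claims 1-3 can be illustrated with a minimal sketch: build a balanced k-d tree by median splits down to the initial depth, and gather the header fields (bounding box as two diagonal corner points, initial depth, and the depth difference between the final and initial trees). This is an illustrative sketch only, not the disclosed implementation — the claims prescribe no language or data layout, and every name here (`build_balanced_kdtree`, `encode_header`, `bbox`, `depth_difference`) is hypothetical:

```python
def build_balanced_kdtree(points, depth, axis=0):
    """Recursively split 3D points at the median coordinate down to
    the given depth.  Median splits put equal numbers of points on
    each side, so the tree stays balanced and its structure is fully
    described by the depth alone."""
    if depth == 0 or len(points) <= 1:
        return {"points": points}  # leaf node
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    next_axis = (axis + 1) % 3  # cycle through x, y, z
    return {
        "split": points[mid][axis],
        "axis": axis,
        "left": build_balanced_kdtree(points[:mid], depth - 1, next_axis),
        "right": build_balanced_kdtree(points[mid:], depth - 1, next_axis),
    }

def encode_header(points, initial_depth, final_depth):
    """Collect the header fields named in claims 1-3: a bounding box
    given by two diagonal corner points of the object, the initial
    depth, and the depth difference between the final (unbalanced)
    and initial (balanced) trees."""
    corner_min = tuple(min(p[i] for p in points) for i in range(3))
    corner_max = tuple(max(p[i] for p in points) for i in range(3))
    return {
        "bbox": (corner_min, corner_max),
        "initial_depth": initial_depth,
        "depth_difference": final_depth - initial_depth,
    }
```

Because every leaf of the balanced initial tree sits at the same level, a decoder can rebuild its shape from the initial depth alone; only the additional, unbalanced levels (claims 9-12) need extra signaling, which is why claim 1 encodes a depth difference rather than the full final depth.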
PCT/CN2018/109296 2017-10-02 2018-10-08 Point cloud coding WO2019068259A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762566761P 2017-10-02 2017-10-02
US62/566,761 2017-10-02
US201862734831P 2018-09-21 2018-09-21
US62/734,831 2018-09-21

Publications (1)

Publication Number Publication Date
WO2019068259A1 (en) 2019-04-11

Family

ID=65994159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109296 WO2019068259A1 (en) 2017-10-02 2018-10-08 Point cloud coding

Country Status (1)

Country Link
WO (1) WO2019068259A1 (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682103A (en) * 2012-04-28 2012-09-19 北京建筑工程学院 Three-dimensional space index method aiming at massive laser radar point cloud models
CN103544249A (en) * 2013-10-11 2014-01-29 北京建筑大学 Method for indexing scattered point cloud space of historic building
CN104040592A (en) * 2011-11-07 2014-09-10 汤姆逊许可公司 Predictive position decoding
CN105139449A (en) * 2015-08-24 2015-12-09 上海卫高网络科技有限公司 Three-dimensional model compression method based on three-dimensional mesh subdivision and coding
US20170046589A1 (en) * 2013-11-07 2017-02-16 Autodesk, Inc. Pre-segment point cloud data to run real-time shape extraction faster
WO2017050858A1 (en) * 2015-09-23 2017-03-30 Koninklijke Philips N.V. Generation of triangle mesh for a three dimensional image
CN106951643A (en) * 2017-03-22 2017-07-14 广东工业大学 A kind of complicated outside plate three dimensional point cloud compressing method of hull and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115102935A (en) * 2022-06-17 2022-09-23 腾讯科技(深圳)有限公司 Point cloud encoding method, point cloud decoding method and related equipment
CN115102935B (en) * 2022-06-17 2024-02-09 腾讯科技(深圳)有限公司 Point cloud encoding method, point cloud decoding method and related equipment
CN115379191A (en) * 2022-08-22 2022-11-22 腾讯科技(深圳)有限公司 Point cloud decoding method, point cloud encoding method and related equipment
CN115379191B (en) * 2022-08-22 2024-03-19 腾讯科技(深圳)有限公司 Point cloud decoding method, point cloud encoding method and related equipment


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 18864475; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase — Ref country code: DE
122 Ep: pct application non-entry in european phase — Ref document number: 18864475; Country of ref document: EP; Kind code of ref document: A1