WO2019081087A1 - A method for volumetric video encoding and decoding - Google Patents

A method for volumetric video encoding and decoding

Info

Publication number
WO2019081087A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
information
node
octree
branch
Prior art date
Application number
PCT/EP2018/071091
Other languages
French (fr)
Inventor
Payman Aflaki Beni
Jaakko Olli Taavetti KERÄNEN
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2019081087A1 publication Critical patent/WO2019081087A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005 Tree description, e.g. octree, quadtree
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/40 Tree coding, e.g. quadtree, octree
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/388 Volumetric displays, i.e. systems where the image is built up from picture elements distributed through a volume

Definitions

  • the present invention relates to a method for encoding and decoding volumetric video data.
  • 360-degree viewing camera devices with multiple lenses per viewing direction are becoming more and more popular and affordable for both consumer and professional usage.
  • multi-camera captured scenes can be reconstructed in three dimensions (3D) if the camera location and pose information is known.
  • reconstruction's quality and coverage may depend on the distribution of the cameras and their capture capabilities.
  • Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized so that the MCDs provide different viewpoints in the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world. The volumetric presentation of the scene is constructed based on the information captured by said several MCDs.
  • a method comprises determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining the branch nodes having at least one voxel assigned to the branch node; determining the leaf nodes having at least one voxel assigned to the leaf node; and selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
  • the method further comprises deciding said information to be communicated to the decoder on the basis of at least one of the following: node structure similarity of the current octree to other octrees located in adjacent or further spatial locations; node structure similarity of the current octree to the node structure of the current, adjacent, or further octrees at previous times.
  • the information to be communicated to the decoder lacks information about content of the leaf nodes.
  • the method further comprises deciding said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
  • the information to be communicated to the decoder comprises sequence numbers of the nodes.
  • the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
  • the method further comprises partitioning the octree into a plurality of subtrees; and deciding the information to be communicated to the decoder subtree-specifically.
  • the method further comprises recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for the mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
  • the method further comprises assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
  • the method further comprises prefixing each subtree with voxel coordinates of its root node.
  • the second and the third aspects relate to an apparatus and a computer readable storage medium stored with code thereon, which are arranged to carry out the above method and one or more of the embodiments related thereto.
  • Figure 1 shows an example of an octree data structure
  • Figure 2 shows an example of a texture map for voxel attributes
  • Figure 3 shows a flowchart of an encoding method in accordance with an embodiment
  • Figure 4 shows an example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments
  • Figure 5 shows another example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments
  • Figure 6 shows a simplified block diagram of a system comprising a plurality of multi-camera units for generating volumetric video in accordance with an embodiment
  • Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device
  • Figure 8 shows an apparatus according to an example embodiment
  • Figure 9 shows an example of an arrangement for wireless communication comprising a plurality of apparatuses, networks and network elements.
  • Voxels may be formed from the captured 3D points e.g. by merging the 3D points into voxels comprising a plurality of 3D points such that for a selected 3D point, all neighbouring 3D points within a predefined threshold from the selected 3D point are merged into a voxel without exceeding a maximum number of 3D points in a voxel.
  • a voxel implicitly describes a finite volume of space, such as a cube delimited by the neighbouring cells/cubes, whereas a point has no shape and a volume of zero.
  • a point cloud may still describe a plurality of voxel-like volumes if the points are interpreted as origin points and accompanied by information about the dimensions of each volume. This makes it possible to convert a 3D point cloud to a voxel array, and vice versa.
  • a video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form.
  • encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
  • Volumetric video data represents a three-dimensional scene or object and can be used as input for Augmented reality (AR), Virtual Reality (VR) and Mixed Reality (MR) applications.
  • Volumetric video data describes geometry, such as shape, size, position in 3D-space, and respective attributes (e.g. colour, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances, such as frames in 2D video.
  • 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes.
  • Infrared, lasers, time-of-flight, and structured light are all examples of devices that can be used to construct 3D video data.
  • Representation of the 3D data depends on how the data is used.
  • Dense voxel arrays have been used to represent volumetric medical data.
  • In 3D graphics, polygonal meshes are extensively used.
  • Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
  • Another way to represent 3D data is to code it as a set of texture and depth maps, as is the case in the multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
  • the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between endpoints, then efficient compression becomes essential.
  • the octree data structure is used extensively to encode geometry induced by the point cloud.
  • Figure 1 shows an example of the geometry of a voxel octree, where each node represents a point/voxel.
  • the root node is the point cloud aligned bounding box.
  • Each node is recursively subdivided into a plurality of child nodes.
  • the branch nodes may be subdivided into an arbitrary number of child nodes, but typically into eight child nodes. In the following examples it is assumed, for the sake of clarity, that the number of child nodes is eight. Nodes having child nodes are called branch nodes and nodes having no child nodes are called leaf nodes.
  • each dark node represents a branch or a leaf node, associated with at least one voxel with texture information in the volumetric presentation, i.e. an occupied voxel. Only non-empty nodes continue to be subdivided. A node that contains a single point from the point cloud is only subdivided until the desired maximum resolution is reached.
  • a point cloud is represented as an octree
  • the point coordinates are quantized to integers, and the volume of space represented by the point becomes explicit: the position and dimensions of each point ultimately are represented as a node (i.e. a unique subvolume) within the octree. If the octree is converted back into a point cloud, the non-empty leaf node centre points can be used to represent points.
  • Each level in the octree is called a Level of Detail (LOD).
  • Figure 1 illustrates four LODs.
  • the voxel attributes (e.g. colour, normal, reflectance) are "mipmapped", i.e. set to the average of the respective attributes of all the enclosed points
  • LODs are generally useful for efficiency: the smaller LODs can be used for low-precision tasks that require fast performance.
  • the octree structure is typically serialised using occupancy coding. In the serialisation method, each node starting from the root node is coded as an eight bit mask of occupied children where a one in the mask means that the child spatially contains points of the point cloud and a zero indicates that there are no points in that spatial region of the octree subdivided space.
  • solid voxels may be stored in aggregate instead of one voxel per node.
  • leaf nodes may contain a small voxel array, e.g., 2 x 2 x 2 or 4 x 4 x 4 voxels. This reduces the total number of nodes required by the octree, and allows a GPU to read the voxel values faster.
  • uncompressed dynamic volumetric video which may include both point cloud and voxelized video, requires very large storage space/bandwidth.
  • Each point in a 3D point cloud is characterised by at least two modalities: its geometrical structure and colour (texture). There may be other modalities such as normal, reflectance and material.
  • Typically, but not necessarily, each modality is coded independently. The structuring of octrees needs to be carried out prior to encoding the related RGB values to the correct solid voxel in the octree presentation. Therefore, for interchange purposes, it is necessary to compress the volumetric video data.
  • the method comprises determining (300) a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
  • step 304 determines if the current branch node is associated with at least one respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map. It should be noted that if a branch has children, i.e. is a branch node, it may always have at least one respective voxel associated to it. Step 306 determines whether or not the leaf node has a respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map.
  • two different methods are introduced for determining the structure of the octree presentation, i.e. determining the octree presentation in terms of either the branch nodes and leaf nodes having voxel presentation in the volumetric content or the leaf nodes having no voxel presentation in the volumetric content.
  • the encoder is enabled to select either of these node presentations to be communicated to the decoder, whereupon the encoder makes the selection by aiming to minimize the bit rate required to transmit said information to the decoder.
  • the 3D space of the scene may be presented with an octree presentation, defining the location of branch and leaf nodes and the respective colour values in terms of e.g. RGB or YUV.
  • the respective colour values will be mapped to the branch nodes respectively.
  • the branch nodes (nodes having at least one child node) can be identified and informed to the decoder.
  • the decoder may determine the whole structure of the octree.
  • the leaf nodes (nodes having no child node) can be identified and informed to the decoder.
  • the decoder may as well determine the whole structure of the octree.
  • the information to be communicated to the decoder lacks information about content of the leaf nodes.
  • the coding of voxel geometry (i.e. the structure of the octree) and their attributes (i.e. the leaf node content) are separated.
  • only information about the structure of the octree is delivered to the decoder, while the leaf node contents are presented separately. The separation may provide benefits especially when considering dynamic scenes with multiple frames. In that situation, multiple frames may share the same leaf node contents while only containing information about changes in the octree structure. Thus, avoiding transmitting the same leaf node content several times reduces redundancy.
  • the method based on determining the branch nodes, which is herein referred to as MODE1, is best suited for sparse volumes with few solid voxels.
  • in MODE1, the nodes which have at least one leaf node with voxel presentation in the volumetric content are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
  • Figure 4 shows an example octree structure.
  • the root node represents the 3D space of the scene in question on a very rough-scale level. In other words, the root node always exists, otherwise the octree does not exist in the first place. Therefore, the numbering may be started from the first child node onwards.
  • on the first two levels only nodes 0, 5, 13, 16, and 23 have at least one leaf node with voxel presentation in the volumetric content.
  • the third level is the lowest LOD associated with the smallest presented spatial information in the volumetric presentation i.e. the third level provides the accuracy of detail for determining the size of a voxel.
  • node 13 has child nodes 24, 25, 26, 27, 28, 30 and 31 as leaf nodes having voxel presentation in the volumetric content and the child node 29 as a leaf node having no voxel presentation in the volumetric content
  • node 16 has a child node 32 as a leaf node having voxel presentation in the volumetric content and child nodes 33 - 39 as leaf nodes having no voxel presentation in the volumetric content
  • node 23 has child nodes 40 - 43 as leaf nodes having no voxel presentation in the volumetric content and child nodes 44 - 47 as leaf nodes having voxel presentation in the volumetric content.
  • the method based on determining the leaf nodes having no voxel presentation in the volumetric content associated to them, which is herein referred to as MODE2, is best suited for volumes with many empty voxels or many voxels without a colour representation.
  • MODE2 only the leaf nodes having no voxel presentation in the volumetric content associated to them are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
  • nodes 1 - 4, 6 - 12, 14, 15, 17 - 22, 29, and 33 - 43 are leaf nodes having no voxel presentation in the volumetric content associated to them. Accordingly, the number of leaf nodes having no voxel presentation in the volumetric content associated to them is considerably higher for this octree structure compared to the number of branch or leaf nodes having voxel presentation in the volumetric content associated to them. Therefore, in such cases, the encoder decides to present the octree structure to the decoder by reporting the branch and leaf nodes having voxel presentation in the volumetric content.
  • said information to be communicated to the decoder is decided on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
  • Figure 5 shows another example of octree structure.
  • dark nodes with a downward arrow represent branch nodes whose eight child nodes are all leaf nodes having voxel presentation in the volumetric content.
  • the octree structure of Figure 5 comprises only five leaf nodes having no voxel presentation in the volumetric content: nodes 1, 3, 4, 6 and 36.
  • the decision in the encoder will be to operate in MODE2 and to communicate the leaf nodes.
  • the information to be communicated to the decoder comprises sequence numbers of the nodes.
  • the sequence numbers of the leaf nodes, i.e. 1, 3, 4, 6 and 36, are communicated to the decoder.
  • the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch and leaf nodes having voxel presentation in the volumetric content or information about said leaf nodes having no voxel presentation in the volumetric content.
  • the communication between the encoder and decoder may include a flag, which may be communicated for each 3D space (cubic) separately informing the decoder that the communicated numbers are presenting either the branch nodes or the leaf nodes.
  • mode_Flag = 1 means that MODE 2 is selected (leaf nodes having no voxel presentation in the volumetric content are communicated)
  • since the flag is only one bit, the overhead caused by including the flag is negligible compared to the bitrate reduction achieved by introducing the two different modes, and hence it is expected to bring a bitrate reduction to the whole system.
  • a typical volumetric 3D scene/object is composed of several thin surfaces surrounded by empty space, due to data acquisition devices recording only the surfaces of objects and surrounding environment. Consequently, the larger nodes in the octree typically have a very low density. Volumetric scenes may also have a large total number of voxels due to being captured in high resolution or simply having large dimensions. If a single presentation mode were to be chosen for the entire octree, it would likely end up being MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) due to the overall sparseness of the octree.
  • the method further comprises partitioning the octree into a plurality of subtrees and deciding the information to be communicated to the decoder subtree-specifically.
  • the mode for each subtree may depend on the characteristics of the local region represented by the subtree. For example, dense regions such as nodes covering object surfaces can use the most appropriate presentation mode.
  • the mode selection for each subtree may depend on the selected mode for the surrounding subtrees and taking into account the available prediction references from other subtrees.
  • a rate distortion algorithm may decide which mode should be selected to present the current subtree with the least amount of information to be communicated, taking into account the modes selected for the surrounding subtrees.
  • not only the surrounding subtrees, but also the subtrees which do not have any spatial connection to the current subtree may be taken into account.
  • the mode selection of those subtrees may be used for the current subtree mode selection.
  • the mode selection for the current subtree or any adjacent subtree in a previous time stamp may be used for mode selection.
  • the selected mode for the current subtree at time T5 may depend on the selected mode for the current subtree in at least one of the earlier times T0, T1, T2, T3, T4.
  • the mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another/the same subtree in a previous time stamp. In this case, the similarity of the branch/leaf nodes to be communicated may help in deciding which mode to select to reduce the information that should be communicated. Similarly, the mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another subtree in an adjacent (or farther away) location. Similarly in this case, the similarity of the branch/leaf nodes to be communicated may help in deciding which mode to select to reduce the information that should be communicated.
  • the partitioning may further comprise recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for the mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes (a code sketch of this heuristic is given at the end of this list).
  • the encoder may carry out the partitioning and quickly choose the presentation mode for each subtree, for example, as follows:
  • the subtree size of each branch node is recorded as an attribute of the branch node.
  • the subtree size is equal to the total number of child nodes that the branch contains.
  • Branches are mipmapped during the construction of the octree. That is, the attributes of child nodes are averaged, and the average values (e.g., colour) are used as the attributes of the branch node.
  • the Alpha component of the mipmapped colours can be used to keep track of the density of each node.
  • the Alpha of a solid voxel is considered 1.0. Therefore, the Alpha of a node is a value between 0.0 and 1.0, depending on how many solid voxels it contains.
  • the octree is partitioned by looking at nodes' subtree size and Alpha (density) attributes. The operation is started at the root node.
  • Low density and large size means that the encoder should look for a smaller node within the node children.
  • Low density and small size means that MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) is chosen.
  • High density and large size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen, unless the subtree size exceeds the range of the index numbering. In the latter case, the encoder should look for a smaller node within the node children.
  • High density and small size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen.
  • the partitioning may further comprise assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of the parent subtree and with the sequence number of the node within the parent subtree to which the child subtree attaches.
  • the partitioning may further comprise prefixing each subtree with the voxel coordinates of its root node.
  • Both of the above embodiments provide the necessary information to the decoder for reconstructing the subtrees into one large octree; however, the latter embodiment may require more bits than the ID number and parent index.
  • the method enables the encoder to determine the most efficient presentation mode of an octree structure and inform the decoder about the necessary data, thereby reducing the required bandwidth.
  • embodiments utilize the separation of the octree presentation into the structure of the octree and the attributes, and provide a straightforward improvement in compression efficiency through the simplified presentation of the structure of the octree only.
  • the partitioning of the octree into subtrees according to embodiments enables the encoder to choose the most appropriate presentation mode for each subtree, thereby further enhancing the compression efficiency.
  • volumetric video may be captured using one or more multi-camera devices (MCDs).
  • Figure 6 is a simplified block diagram of a system 200 comprising a plurality of multi-camera units 130, 140, 150. Although Figure 6 only depicts three multi-camera units 130, 140, 150, the system may have two multi-camera units 130, 140 or more than three multi-camera units.
  • the system 200 has information about the location and orientation of each of the multi-camera units 130, 140, 150 of the system. The location and orientation information may have been stored into a camera database 210. This information may have been entered manually or the system 200 may comprise elements which can determine the location and orientation of each of the multi-camera units 130, 140, 150 of the system.
  • the changed location and/or orientation information may be updated in the camera database 210.
  • the system 200 may be controlled by a controller 202, which may be a server or another appropriate element capable of communicating with the multi-camera units 130, 140, 150 and the camera database 210.
  • the location and/or the orientation of the multi-camera units 130, 140, 150 may not be stored into the database 210 but only to each individual multi-camera unit 130, 140, 150. Hence, the location and/or the orientation of the multi-camera units 130, 140, 150 may be requested from the multi-camera units 130, 140, 150 when needed. As an example, if the first multi-camera unit 130 needs to know the location and orientation of the second multi-camera unit 140, the first multi-camera unit 130 may request that information from the second multi-camera unit 140. If some information regarding the second multi-camera unit 140 is still needed, the first multi-camera unit 130 may request the missing information from the controller 202, for example.
  • the multi-camera system may be used to reconstruct multi-camera captured scenes in 3D if the camera locations and pose information are accurately known. Such a reconstruction's quality and coverage depends on the distribution of the cameras and their capture capabilities.
  • Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized in the controller 202 so that the MCDs provide different viewpoints in the same world.
  • volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world.
  • Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device 50 depicted in Figure 8, which may incorporate a controller according to an embodiment of the invention.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require transmission of radio frequency signals.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 further may comprise a display 32 in the form of a liquid crystal display.
  • the display may be any suitable display technology suitable to display an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus 50 may comprise a combination of different kinds of energy devices, for example a rechargeable battery and a solar cell.
  • the apparatus may further comprise an infrared port 41 for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/FireWire wired connection.
  • the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the invention may store both data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a universal integrated circuit card (UICC) reader and a universal integrated circuit card for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 60 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 comprises a camera 42 capable of recording or detecting imaging.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired and/or wireless networks including, but not limited to a wireless cellular telephone network (such as a global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), long term evolution (LTE) based network, code division multiple access (CDMA) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system shown in Figure 8 shows a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, a tablet computer.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access
  • a communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection. In the following some example implementations of apparatuses utilizing the present invention will be described in more detail.
  • embodiments of the invention operating within a wireless communication device
  • the invention as described above may be implemented as a part of any apparatus comprising a circuitry in which radio frequency signals are transmitted and received.
  • embodiments of the invention may be implemented in a mobile phone, in a base station, in a computer such as a desktop computer or a tablet computer comprising radio frequency communication means (e.g. wireless local area network, cellular radio, etc.).
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • an apparatus comprises means for determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; means for determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; means for determining the branch nodes having at least one voxel assigned to the branch node; means for determining the leaf nodes having at least one voxel assigned to the leaf node; and means for selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
  • an aspect may comprise, for example, a computer memory system connectable to a processor and having one or more programmable operational characteristics, said characteristics being defined through configuration by said computer based on the type of said processor, wherein said system is connectable to said processor by a bus, wherein a programmable operational characteristic of said system is configured to determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determine the branch nodes having at least one voxel assigned to the branch node; determine the leaf nodes having at least one voxel assigned to the leaf node; and select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
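
As promised in the partitioning embodiment above, the following is a minimal sketch of the size/density heuristic for choosing a presentation mode per subtree. The class name, attribute names, threshold values and index range are illustrative assumptions and are not values given in this publication; the node is merely assumed to carry the mipmapped density (Alpha) and the recorded subtree size as attributes, as described in the list above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BranchInfo:
    alpha: float            # mipmapped density, 0.0 (empty) .. 1.0 (fully solid)
    subtree_size: int       # total number of child nodes the branch contains
    children: List["BranchInfo"] = field(default_factory=list)

def partition_subtrees(node, alpha_low=0.5, size_large=4096, max_index=65535, out=None):
    """Assign a presentation mode per subtree from its density and size.

    Returns a list of (subtree_root, mode) pairs, where mode 1 stands for MODE1
    (occupied branch and leaf nodes are communicated) and mode 2 for MODE2
    (empty leaf nodes are communicated).
    """
    if out is None:
        out = []
    low_density = node.alpha < alpha_low
    large = node.subtree_size > size_large
    if large and (low_density or node.subtree_size > max_index):
        # Large and sparse, or too large for the index numbering range:
        # look for smaller nodes among the children.
        for child in node.children:
            partition_subtrees(child, alpha_low, size_large, max_index, out)
    elif low_density:
        out.append((node, 1))   # low density, small size: MODE1
    else:
        out.append((node, 2))   # high density, manageable size: MODE2
    return out
```

The threshold parameters would in practice be tuned by the encoder; the sketch only mirrors the four size/density cases listed above.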

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method comprising determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining the branch nodes having at least one voxel assigned to the branch node; determining the leaf nodes having at least one voxel assigned to the leaf node; and selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.

Description

A METHOD FOR VOLUMETRIC VIDEO ENCODING AND DECODING
TECHNICAL FIELD
[0001] The present invention relates to a method for encoding and decoding volumetric video data.
BACKGROUND
[0002] 360-degree viewing camera devices with multiple lenses per viewing direction are becoming more and more popular and affordable for both consumer and professional usage. Moreover, such multi-camera captured scenes can be reconstructed in three dimensions (3D) if the camera location and pose information is known. Such a
reconstruction's quality and coverage may depend on the distribution of the cameras and their capture capabilities.
[0003] Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized so that the MCDs provide different viewpoints in the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world. The volumetric
presentation of the scene is constructed based on the information captured by said several MCDs.
[0004] The total amount of information to represent the scene, which is required to be encoded and transmitted, easily becomes very high. This poses significant burdens to both computational capacity of the encoder and the transmission capacity of the current broadcasting data delivery infrastructure.
SUMMARY
[0005] Now in order to at least alleviate the above problems, an enhanced encoding method is introduced herein.
[0006] A method according to a first aspect comprises determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining the branch nodes having at least one voxel assigned to the branch node; determining the leaf nodes having at least one voxel assigned to the leaf node; and selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
[0007] According to an embodiment, the method further comprises deciding said information to be communicated to the decoder on the basis of at least one of the following: node structure similarity of the current octree to other octrees located in adjacent or further spatial locations; node structure similarity of the current octree to the node structure of the current, adjacent, or further octrees at previous times.
[0008] According to an embodiment, the information to be communicated to the decoder lacks information about content of the leaf nodes.
[0009] According to an embodiment, the method further comprises deciding said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
[0010] According to an embodiment, the information to be communicated to the decoder comprises sequence numbers of the nodes.
[0011] According to an embodiment, the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
[0012] According to an embodiment, the method further comprises partitioning the octree into a plurality of subtrees; and deciding the information to be communicated to the decoder subtree-specifically.
[0013] According to an embodiment, the method further comprises recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for the mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
[0014] According to an embodiment, the method further comprises assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
[0015] According to an embodiment, the method further comprises prefixing each subtree with voxel coordinates of its root node.
[0016] The second and the third aspects relate to an apparatus and a computer readable storage medium stored with code thereon, which are arranged to carry out the above method and one or more of the embodiments related thereto.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
[0018] Figure 1 shows an example of an octree data structure;
[0019] Figure 2 shows an example of a texture map for voxel attributes;
[0020] Figure 3 shows a flowchart of an encoding method in accordance with an embodiment;
[0021] Figure 4 shows an example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments;
[0022] Figure 5 shows another example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments;
[0023] Figure 6 shows a simplified block diagram of a system comprising a plurality of multi-camera units for generating volumetric video in accordance with an embodiment;
[0024] Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device;
[0025] Figure 8 shows an apparatus according to an example embodiment;
[0026] Figure 9 shows an example of an arrangement for wireless communication comprising a plurality of apparatuses, networks and network elements.
DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS
[0027] The following embodiments are exemplary. Although the specification may refer to "an", "one", or "some" embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
[0028] The terms voxel and point, as in voxel array and 3D point cloud, are closely related. In general, 3D points are positioned in 3D space without constraints, e.g. in floating point coordinates, while a voxel represents a small volume of space inside a fixed 3D grid, using discrete (integer) coordinates. Voxels may be formed from the captured 3D points e.g. by merging the 3D points into voxels comprising a plurality of 3D points such that for a selected 3D point, all neighbouring 3D points within a predefined threshold from the selected 3D point are merged into a voxel without exceeding a maximum number of 3D points in a voxel.
[0029] A voxel implicitly describes a finite volume of space, such as a cube delimited by the neighbouring cells/cubes, whereas a point has no shape and a volume of zero. A point cloud may still describe a plurality of voxel-like volumes if the points are interpreted as origin points and accompanied by information about the dimensions of each volume. This makes it possible to convert a 3D point cloud to a voxel array, and vice versa.
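As a concrete illustration of the point-to-voxel conversion just described, the following sketch quantizes unconstrained floating-point 3D points onto a fixed voxel grid and back. The function names, the grid origin parameter and the use of NumPy are illustrative assumptions; the threshold-based merging with a maximum point count per voxel mentioned in [0028] is not implemented here.

```python
import numpy as np

def points_to_voxels(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Quantize floating-point 3D points onto a fixed voxel grid.

    Returns a dict mapping integer (i, j, k) voxel coordinates to the list of
    indices of the points merged into that voxel.
    """
    pts = np.asarray(points, dtype=np.float64)
    grid = np.floor((pts - np.asarray(origin)) / voxel_size).astype(np.int64)
    voxels = {}
    for idx, coord in enumerate(map(tuple, grid)):
        voxels.setdefault(coord, []).append(idx)
    return voxels

def voxels_to_points(voxels, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Convert occupied voxels back into a point cloud using the voxel centres."""
    centres = [(np.asarray(c) + 0.5) * voxel_size + np.asarray(origin) for c in voxels]
    return np.vstack(centres) if centres else np.empty((0, 3))
```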
[0030] A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form. Typically the encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
[0031 ] Volumetric video data represents a three-dimensional scene or object and can be used as input for Augmented reality (AR), Virtual Reality (VR) and Mixed Reality (MR) applications. Volumetric video data describes geometry, such as shape, size, position in 3D-space, and respective attributes (e.g. colour, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances, such as frames in 2D video.
[0032] Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight, and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is to code it as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
[0033] In the cases of dense point clouds or voxel arrays, the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between endpoints, then efficient compression becomes essential.
[0034] The octree data structure is used extensively to encode geometry induced by the point cloud. Figure 1 shows an example of the geometry of a voxel octree, where each node represents a point/voxel. The root node is the point cloud aligned bounding box. Each node is recursively subdivided into a plurality of child nodes. In general, the branch nodes may be subdivided into an arbitrary number of child nodes, but typically into eight child nodes. In the following examples it is assumed, for the sake of clarity, that the number of child nodes is eight. Nodes having child nodes are called branch nodes and nodes having no child nodes are called leaf nodes. In Figure 1, each dark node represents a branch or a leaf node, associated with at least one voxel with texture information in the volumetric presentation, i.e. an occupied voxel. Only non-empty nodes continue to be subdivided. A node that contains a single point from the point cloud is only subdivided until the desired maximum resolution is reached. When a point cloud is represented as an octree, the point coordinates are quantized to integers, and the volume of space represented by the point becomes explicit: the position and dimensions of each point ultimately are represented as a node (i.e. a unique subvolume) within the octree. If the octree is converted back into a point cloud, the non-empty leaf node centre points can be used to represent points.
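A minimal sketch of the recursive subdivision just described, using hypothetical names (OctreeNode, build_octree) that are not taken from this publication. It assumes voxel coordinates are already integers and that the root cube has a power-of-two edge length; only non-empty nodes are subdivided, and subdivision stops at the desired maximum level of detail.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]

@dataclass
class OctreeNode:
    origin: Voxel            # minimum corner of the cube, in integer voxel coordinates
    size: int                # edge length of the cube (a power of two)
    occupied: bool = False   # True if at least one voxel falls inside the cube
    children: Optional[List["OctreeNode"]] = None  # eight children when subdivided

    @property
    def is_leaf(self) -> bool:
        return self.children is None

def build_octree(node: OctreeNode, voxels: Set[Voxel], max_lod: int, lod: int = 0) -> None:
    """Recursively subdivide non-empty nodes until max_lod or unit size is reached."""
    x0, y0, z0 = node.origin
    s = node.size
    inside = {v for v in voxels
              if x0 <= v[0] < x0 + s and y0 <= v[1] < y0 + s and z0 <= v[2] < z0 + s}
    node.occupied = bool(inside)
    if not inside or lod == max_lod or s == 1:
        return  # empty nodes and nodes at maximum resolution remain leaf nodes
    half = s // 2
    node.children = [OctreeNode((x0 + dx * half, y0 + dy * half, z0 + dz * half), half)
                     for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
    for child in node.children:
        build_octree(child, inside, max_lod, lod + 1)
```

Usage would look like: root = OctreeNode((0, 0, 0), 8); build_octree(root, occupied_voxels, max_lod=3).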
[0035] Each level in the octree is called a Level of Detail (LOD). Figure 1 illustrates four LODs. At each LOD, the voxel attributes (e.g. colour, normal, reflectance) are
"mipmapped"; i.e. set to the average of the respective attributes of all the enclosed points.
LODs are generally useful for efficiency: the smaller LODs can be used for low-precision tasks that require fast performance. The octree structure is typically serialised using occupancy coding. In the serialisation method, each node starting from the root node is coded as an eight bit mask of occupied children where a one in the mask means that the child spatially contains points of the point cloud and a zero indicates that there are no points in that spatial region of the octree subdivided space.
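Reusing the OctreeNode sketch above, the following is one possible reading of the occupancy coding just described: each subdivided node contributes one byte in breadth-first order, with bit i set when child i is occupied. The traversal order and bit ordering are assumptions made for illustration.

```python
from collections import deque

def serialize_occupancy(root) -> bytes:
    """Breadth-first occupancy coding: one 8-bit mask per subdivided node."""
    out = bytearray()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.children is None:
            continue  # leaf nodes carry no occupancy byte of their own
        mask = 0
        for i, child in enumerate(node.children):
            if child.occupied:
                mask |= 1 << i
                queue.append(child)  # only occupied children are traversed further
        out.append(mask)
    return bytes(out)
```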
[0036] For practical purposes and due to the characteristics of Graphics Processing Units (GPU), solid voxels may be stored in aggregate instead of one voxel per node. In other words, leaf nodes may contain a small voxel array, e.g., 2 x 2 x 2 or 4 x 4 x 4 voxels. This reduces the total number of nodes required by the octree, and allows a GPU to read the voxel values faster.
[0037] While octree based voxel coding is efficient in the compression of geometry, the opportunities to use the specific statistics of the attributes for compression diminish when used for coding geometry combined with attributes. Therefore, the coding of voxel geometry and their attributes are typically separated. For example in the reference software framework of the development of the point cloud compression technology standard in JTC1/SC29/WG11 (MPEG), geometry is coded using occupancy codes based on an octree data structure, while the colour attributes are mapped into a 2D sampling grid and then coded using standard JPEG. Figure 2 shows an example of a 2D texture map for representing the colour attributes of the voxels.
[0038] As becomes evident from the above, uncompressed dynamic volumetric video, which may include both point cloud and voxelized video, requires very large storage space/bandwidth. Each point in a 3D point cloud is characterised by at least two modalities: its geometrical structure and colour (texture). There may be other modalities such as normal, reflectance and material. Typically, but not necessarily, each modality is coded independently. The structuring of octrees needs to be carried out prior to encoding the related RGB values to the correct solid voxel in the octree presentation. Therefore, for interchange purposes, it is necessary to compress the volumetric video data.
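The separation of geometry and attributes can be pictured with the short sketch below, which packs per-voxel RGB values into a near-square 2D image so that any standard 2D codec (JPEG is mentioned above) can compress them independently of the octree structure. The function name and the row-major packing order are illustrative assumptions, not the mapping used by the MPEG reference software.

```python
import math
import numpy as np

def colours_to_texture(colours):
    """Pack a list of (R, G, B) triplets into a 2D texture map, row by row."""
    n = len(colours)
    width = max(1, math.ceil(math.sqrt(n)))
    height = max(1, math.ceil(n / width))
    tex = np.zeros((height, width, 3), dtype=np.uint8)
    for idx, rgb in enumerate(colours):
        tex[idx // width, idx % width] = rgb
    return tex  # hand this to a 2D image codec; the octree structure is coded separately
```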
[0039] Now in order to at least alleviate the above problems, a method for enhanced encoding of 3D volumetric data is presented hereinafter.
[0040] The method, which is disclosed in Figure 3, comprises determining (300) a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determining (302) an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing a 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining (304) the branch nodes having at least one voxel assigned to the branch node; determining (306) the leaf nodes having at least one voxel assigned to the leaf node; and selecting (308) either information about branch nodes and leaf nodes having at least one voxel associated with them or information about leaf nodes having no voxel associated with them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
[0041] Accordingly, step 304 determines if the current branch node is associated with at least one respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map. It should be noted that if a branch has children, i.e. is a branch node, it may always have at least one respective voxel associated to it. Step 306 determines whether or not the leaf node has a respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map.
[0042] Thus, two different methods for determining the structure of the octree presentation are introduced, i.e. determining the octree presentation in terms of either the branch nodes and leaf nodes having a voxel presentation in the volumetric content or the leaf nodes having no voxel presentation in the volumetric content. The encoder is enabled to select either of these presentations to be communicated to the decoder, whereupon the encoder makes the selection by aiming to minimize the bit rate required to transmit said information to the decoder.
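As a purely illustrative sketch, the two candidate presentations could be collected as follows, assuming an in-memory octree in which every numbered node object carries the hypothetical attributes number (its sequence number), children (a list of child nodes, empty for leaf nodes) and occupied (whether at least one voxel is associated with it), and in which empty child positions are materialised as unoccupied leaf nodes, as in Figure 4.

```python
# Illustrative only: collect the two candidate node reports.
# The node attributes 'number', 'children' and 'occupied' are hypothetical.

def candidate_reports(root):
    """Return (mode1_nodes, mode2_nodes) as sorted lists of sequence numbers.

    mode1_nodes: branch and leaf nodes having at least one voxel associated with them.
    mode2_nodes: leaf nodes having no voxel associated with them.
    """
    mode1, mode2 = [], []
    stack = list(root.children)          # the root node itself is not numbered
    while stack:
        node = stack.pop()
        if node.occupied:
            mode1.append(node.number)    # occupied branch or leaf node
        elif not node.children:
            mode2.append(node.number)    # empty leaf node
        stack.extend(node.children)
    return sorted(mode1), sorted(mode2)
```

The encoder would then transmit whichever list implies the smaller bit rate, as discussed in the following paragraphs.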
[0043] The 3D space of the scene may be presented with an octree presentation, defining the location of branch and leaf nodes and the respective colour values in terms of e.g. RGB or YUV. When the structure of the octree is known, the respective colour values will be mapped to the corresponding nodes. Based on the known structure of the octree, the branch nodes (nodes having at least one child node) can be identified and informed to the decoder. On the basis of this information, the decoder may determine the whole structure of the octree. In the other approach, the leaf nodes (nodes having no child node) can be identified and informed to the decoder. On the basis of this information, the decoder may as well determine the whole structure of the octree.
[0044] According to an embodiment, the information to be communicated to the decoder lacks information about content of the leaf nodes. As described above, the coding of voxel geometry (i.e. the structure of the octree) and their attributes (i.e. the leaf node content) are separated. For minimizing the bit rate to be transmitted to the decoder, only information about the structure of the octree is delivered to the decoder, and the leaf node contents are presented separately. The separation may provide benefits especially when considering dynamic scenes with multiple frames. In that situation, multiple frames may share the same leaf node contents while only containing information about changes in the octree structure. Thus, avoiding transmitting the same leaf node content several times reduces redundancy.
[0045] The two methods for determining the structure of the octree presentation are now illustrated more in detail using an example octree structure depicted in Figure 4. Herein, it may be assumed there is a well-defined sequential numbering (indexing) of all of the nodes present in the octree. This has practical implications with regard to how these numbers are presented as bits. To reduce the total number of bits required for the numbers, the sequential numbers may be limited, e.g., to 16 bits and a large octree may be split into several smaller octrees, each having their own internal numbering of nodes. Another approach would be to use a high number of bits for the sequential numbers for runtime use but compress the list of node numbers when writing out the data. This would make it more straightforward to handle the data in memory without wasting bits in the presentation.
[0046] The method based on determining the branch nodes, which is herein referred to as MODE1, is best suited for sparse volumes with few solid voxels. In MODE1, the branch and leaf nodes which have a voxel presentation in the volumetric content are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
[0047] Figure 4 shows an example octree structure. In an octree representation of a 3D volumetric representation of a scene, the root node represents the 3D space of the scene in question at a very coarse level. In other words, the root node always exists, otherwise the octree does not exist in the first place. Therefore, the numbering may be started from the first child node onwards. As shown in Figure 4, on the first two levels only nodes 0, 5, 13, 16, and 23 have at least one leaf node with voxel presentation in the volumetric content.
[0048] In the example of Figure 4, the third level is the lowest LOD, associated with the smallest spatial unit presented in the volumetric presentation, i.e. the third level determines the size of a voxel. On the third level, node 13 has child nodes 24, 25, 26, 27, 28, 30 and 31 as leaf nodes having voxel presentation in the volumetric content and the child node 29 as a leaf node having no voxel presentation in the volumetric content. Similarly, node 16 has a child node 32 as a leaf node having voxel presentation in the volumetric content and child nodes 33 - 39 as leaf nodes having no voxel presentation in the volumetric content, and node 23 has child nodes 40 - 43 as leaf nodes having no voxel presentation in the volumetric content and child nodes 44 - 47 as leaf nodes having voxel presentation in the volumetric content.
[0049] In MODE1 , only the leaf nodes with a voxel presentation in the volumetric content and the branch nodes with at least one leaf node with a voxel presentation in the volumetric content are reported to the decoder, and based on this information the decoder may create the whole structure of the octree up to the lowest level of the nodes.
Thereafter, the respective colour values will be mapped from the colour table to the correct leaf nodes (each leaf node presenting a voxel). MODE1 is very beneficial in the cases where most of the 3D space presented by the octree, such as a cube, is empty and there are not many nodes with children. Hence, only a small number of node numbers needs to be communicated to the decoder to present the structure of the octree.
[0050] The method based on determining the leaf nodes having no voxel presentation in the volumetric content associated to them, which is herein referred to as MODE2, is best suited for volumes with many non-empty voxels or many voxels having a colour representation. In MODE2, only the leaf nodes having no voxel presentation in the volumetric content associated to them are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
[0051] In the example octree structure of Figure 4, nodes 1 - 4, 6 - 12, 14, 15, 17 - 22, 29, and 33 - 43 are leaf nodes having no voxel presentation in the volumetric content associated to them. Accordingly, the number of leaf nodes having no voxel presentation in the volumetric content associated to them is considerably higher for this octree structure compared to the number of branch or leaf nodes having voxel presentation in the volumetric content associated to them. Therefore, in such cases, the encoder decides to present the octree structure to the decoder by reporting the branch and leaf nodes having voxel presentation in the volumetric content, i.e. to operate in MODE1.
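For illustration, the Figure 4 example above can be worked through as follows; the 16-bit width per sequence number is merely the assumption discussed in paragraph [0045], not a normative value.

```python
# Worked example for the octree of Figure 4 (illustrative only).

# MODE1 report: branch and leaf nodes having voxel presentation.
mode1_nodes = ([0, 5, 13, 16, 23]                    # occupied nodes on the first two levels
               + [24, 25, 26, 27, 28, 30, 31]        # occupied children of node 13
               + [32]                                 # occupied child of node 16
               + [44, 45, 46, 47])                    # occupied children of node 23

# MODE2 report: leaf nodes having no voxel presentation.
mode2_nodes = (list(range(1, 5)) + list(range(6, 13)) + [14, 15]
               + list(range(17, 23)) + [29] + list(range(33, 44)))

BITS_PER_NUMBER = 16                                  # assumed index width, see [0045]
cost_mode1 = BITS_PER_NUMBER * len(mode1_nodes)       # 16 * 17 = 272 bits
cost_mode2 = BITS_PER_NUMBER * len(mode2_nodes)       # 16 * 31 = 496 bits

# Since cost_mode1 < cost_mode2, the encoder reports the MODE1 node list here.
```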
[0052] According to an embodiment, said information to be communicated to the decoder is decided on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly. The smaller the number of nodes, the smaller the bit rate required to transmit said information, and the mode selection may be carried out accordingly.
[0053] Figure 5 shows another example of an octree structure. In Figure 5, dark nodes with a downward arrow represent branch nodes having each of their eight child nodes as leaf nodes having voxel presentation in the volumetric content. Thus, the octree structure of Figure 5 comprises only five leaf nodes having no voxel presentation in the volumetric content: nodes 1, 3, 4, 6 and 36. Since the number of branch and leaf nodes having voxel presentation in the volumetric content is considerably higher than the number of leaf nodes having no voxel presentation in the volumetric content in the octree presentation, which is typical for volumes with a plurality of non-empty voxels or voxels having a colour representation, it is more beneficial from a compression point of view to represent the octree structure with the leaf nodes having no voxel presentation in the volumetric content as compared to the octree structure representation with the branch and leaf nodes having voxel presentation in the volumetric content. Thus, the decision in the encoder will be to operate in MODE2 and to communicate the leaf nodes.
[0054] According to an embodiment, the information to be communicated to the decoder comprises sequence numbers of the nodes. For example, in the case of Figure 5 the sequence numbers of the leaf nodes, i.e. 1, 3, 4, 6 and 36, are communicated to the decoder.
[0055] According to an embodiment, the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch and leaf nodes having voxel presentation in the volumetric content or information about said leaf nodes having no voxel presentation in the volumetric content. Thus, the communication between the encoder and decoder may include a flag, which may be communicated for each 3D space (cube) separately, informing the decoder whether the communicated numbers present the branch nodes or the leaf nodes. The flag may be determined, for example, as follows:
mode_Flag = 0 (MODE 1 is selected: branch and leaf nodes having voxel presentation in the volumetric content are communicated)
mode_Flag = 1 (MODE 2 is selected: leaf nodes having no voxel presentation in the volumetric content are communicated)
[0056] Since the flag is only one bit, the overhead caused by including the flag is negligible compared to the bitrate reduction achieved by introducing the two different modes, and hence, it is expected to bring bitrate reduction to the whole system.
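Purely as an illustration of how the signalled information could be laid out, the following sketch packs the mode flag, a node count and 16-bit sequence numbers into a byte stream; the field widths, the presence of an explicit count and the byte alignment of the flag (which is conceptually a single bit, as noted above) are assumptions of the sketch, not a defined syntax.

```python
import struct

# Illustrative packing of the signalled information (assumed field widths).

def pack_node_report(mode_flag, node_numbers):
    """mode_flag: 0 = MODE 1 (occupied branch/leaf nodes), 1 = MODE 2 (empty leaf nodes)."""
    header = struct.pack('<BH', mode_flag & 0x01, len(node_numbers))  # flag byte + 16-bit count
    payload = struct.pack('<%dH' % len(node_numbers), *node_numbers)  # 16-bit sequence numbers
    return header + payload

def unpack_node_report(data):
    mode_flag, count = struct.unpack_from('<BH', data, 0)
    numbers = list(struct.unpack_from('<%dH' % count, data, 3))
    return mode_flag, numbers

# Example for the octree of Figure 5: MODE 2 with empty leaf nodes 1, 3, 4, 6 and 36.
bitstream = pack_node_report(1, [1, 3, 4, 6, 36])
assert unpack_node_report(bitstream) == (1, [1, 3, 4, 6, 36])
```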
[0057] A typical volumetric 3D scene/object is composed of several thin surfaces surrounded by empty space, due to data acquisition devices recording only the surfaces of objects and surrounding environment. Consequently, the larger nodes in the octree typically have a very low density. Volumetric scenes may also have a large total number of voxels due to being captured in high resolution or simply having large dimensions. If a single presentation mode were to be chosen for the entire octree, it would likely end up being MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) due to the overall sparseness of the octree.
[0058] According to an embodiment, the method further comprises partitioning the octree into a plurality of subtrees and deciding the information to be communicated to the decoder subtree-specifically. Herein, the mode for each subtree may depend on the characteristics of the local region represented by the subtree. For example, dense regions such as nodes covering object surfaces can use the most appropriate presentation mode.
[0059] In another embodiment, the mode selection for each subtree may depend on the selected mode for the surrounding subtrees, taking into account the available prediction references from other subtrees. In this case, a rate distortion algorithm may decide which mode should be selected to present the current subtree with the least amount of information to be communicated, taking into account the modes selected for the surrounding subtrees (intra prediction).
[0060] In another embodiment, not only the surrounding subtrees, but also the subtrees which do not have any spatial connection to the current subtree may be taken into account. In this case, the mode selection of those subtrees may be used for the current subtree mode selection.
[0061] In another embodiment, the mode selection for the current subtree or any adjacent subtree at a previous time stamp may be used for mode selection. In this case, the selected mode for the current subtree at time T5 may depend on the selected mode for the current subtree at one or more of the times T0, T1, T2, T3, T4, where T0 < T1 < T2 < T3 < T4 < T5. It should be noted that the reference selected mode may go further back than T0.
[0062] The mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another/the same subtree at a previous time stamp. In this case, the similarity of the branch/leaf nodes to be communicated may help in deciding which mode to select to reduce the information that should be communicated. Similarly, the mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another subtree in an adjacent (or farther away) location. Similarly, in this case, the similarity of the branch/leaf nodes to be
communicated may help in deciding which mode to select to reduce the information that should be communicated.
[0063] Due to the large number of nodes and voxels in the octree (e.g., in the order of hundreds of millions), it is beneficial for an encoder to use techniques that speed up determining which presentation mode to choose.
[0064] According to an embodiment, the partitioning may further comprise recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
[0065] Thus, the encoder may carry out the partitioning and quickly choose the presentation mode for each subtree, for example, as follows:
[0066] During the construction of the octree, the subtree size of each branch node is recorded as an attribute of the branch node. The subtree size is equal to the total number of child nodes that the branch contains.
[0067] Branches are mipmapped during the construction of the octree. That is, the attributes of child nodes are averaged, and the average values (e.g., colour) are used as the attributes of the branch node. The Alpha component of the mipmapped colours can be used to keep track of the density of each node. The Alpha of a solid voxel is considered 1.0. Therefore, the Alpha of a node is a value between 0.0 and 1.0, depending on how many solid voxels it contains.
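The mipmapping and density tracking described above could be realised, for example, as in the following sketch; the node attributes (children, is_leaf, colour, alpha) are hypothetical names, and the alpha-weighted colour averaging is one possible choice rather than a prescribed one.

```python
# Illustrative sketch only: mipmap child attributes and track density ("Alpha").

def mipmap(node):
    """Post-order mipmapping; returns (r, g, b, alpha) for the node.

    A solid leaf voxel has alpha 1.0 and a missing (empty) child contributes 0.0,
    so the alpha of a branch node is the fraction of solid voxels it contains.
    """
    if node.is_leaf():
        node.alpha = 1.0
        return (*node.colour, 1.0)
    values = [mipmap(c) if c is not None else (0.0, 0.0, 0.0, 0.0)
              for c in node.children]                         # eight slots, None = empty
    alpha = sum(v[3] for v in values) / 8.0                   # density in [0.0, 1.0]
    if alpha > 0.0:
        r = sum(v[0] * v[3] for v in values) / (8.0 * alpha)  # alpha-weighted colour average
        g = sum(v[1] * v[3] for v in values) / (8.0 * alpha)
        b = sum(v[2] * v[3] for v in values) / (8.0 * alpha)
    else:
        r = g = b = 0.0
    node.colour, node.alpha = (r, g, b), alpha
    return (r, g, b, alpha)
```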
[0068] The octree is partitioned by looking at the nodes' subtree size and Alpha (density) attributes. The operation is started at the root node and proceeds according to the following rules (an illustrative sketch of this traversal is given after the list below).
a. Low density and large size means that the encoder should look for a smaller node within the node children.
b. Low density and small size means that MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) is chosen.
c. High density and large size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen, unless the subtree size exceeds the range of the index numbering. In the latter case, the encoder should look for a smaller node within the node's children.
d. High density and small size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen.
e. If none of the above is applicable, the operation will recursively continue to the children of the node. It is noted that the actual values for "low/high" density and "small/large" branches may vary depending on the situation and they may be tuned as encoder parameters.
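A compact sketch of the traversal implied by rules (a)-(e) above is given below; the concrete thresholds, the constant names and the node attributes (alpha, subtree_size, children) are assumptions, and in practice the thresholds would be tuned as encoder parameters as noted in the text.

```python
# Illustrative sketch of the subtree partitioning rules (a)-(e) above.

DENSITY_THRESHOLD = 0.5    # boundary between "low" and "high" density (Alpha)
SIZE_THRESHOLD = 4096      # boundary between "small" and "large" subtree size
MAX_INDEX = 1 << 16        # range of the sequence-number indexing (assumed 16 bits)

MODE1, MODE2 = 0, 1

def partition(node, out):
    """Appends (node, mode) pairs to 'out'; recurses where a smaller node is needed."""
    low_density = node.alpha < DENSITY_THRESHOLD
    small_size = node.subtree_size < SIZE_THRESHOLD
    if low_density and not small_size:
        for child in filter(None, node.children):     # rule (a): look for smaller nodes
            partition(child, out)
    elif low_density and small_size:
        out.append((node, MODE1))                     # rule (b)
    elif node.subtree_size < MAX_INDEX:
        out.append((node, MODE2))                     # rules (c) and (d)
    else:
        for child in filter(None, node.children):     # rule (c) overflow, or rule (e)
            partition(child, out)
    return out
```

The recursion stops as soon as a rule assigns a mode, so dense surface regions and sparse empty regions tend to end up in different subtrees with different presentation modes.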
[0069] After decoding, the subtrees are again connected together into one large octree.
[0070] According to an embodiment, the partitioning may further comprise assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of the parent subtree and with the sequence number of the node within the parent subtree to which the child subtree attaches.
[0071] According to another embodiment, the partitioning may further comprise prefixing each subtree with the voxel coordinates of its root node.
[0072] Both of the above embodiments provide the necessary information to the decoder for reconstructing the subtrees into one large octree; however, the latter embodiment may require more bits than the ID number and parent index.
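As a sketch only, the two prefixing alternatives could be represented by headers such as the following; all names and field choices are assumptions rather than a defined syntax.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative sketch of the two subtree-prefixing alternatives ([0070], [0071]).
# All field names are hypothetical.

@dataclass
class SubtreeHeaderById:               # alternative of paragraph [0070]
    subtree_id: int                    # unique ID of this subtree
    parent_id: int                     # ID of the parent subtree
    attach_index: int                  # sequence number of the attachment node in the parent

@dataclass
class SubtreeHeaderByCoords:           # alternative of paragraph [0071]
    root_voxel: Tuple[int, int, int]   # voxel coordinates of the subtree's root node

# An ID plus an attachment index typically needs fewer bits than three full voxel
# coordinates, which reflects the observation made in paragraph [0072].
```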
[0073] As becomes evident from the above, significant advantages may be obtained through one or more of the disclosed embodiments. The method improves the
compression efficiency in both storing and delivering volumetric data. The method enables the encoder to determine the most efficient presentation mode of an octree structure and inform the decoder about the necessary data, thereby reducing the bandwidth
consumption of the network and simplifying the operation of the decoder. The
embodiments utilize the separation of the octree presentation into the structure of the octree and the attributes, and provide a straightforward improvement in compression efficiency through the simplified presentation of the structure of the octree only. The partitioning of the octree into subtrees according to embodiments enables the encoder to choose the most appropriate presentation mode for each subtree, thereby further enhancing the compression efficiency.
[0074] As mentioned above, volumetric video may be captured using one or more multi-camera devices (MCDs). Figure 6 is a simplified block diagram of a system 200 comprising a plurality of multi-camera units 130, 140, 150. Although Figure 6 only depicts three multi-camera units 130, 140, 150, the system may have two multi-camera units 130, 140 or more than three multi-camera units.
[0075] It is assumed that the system 200 has information about the location and orientation of each of the multi-camera units 130, 140, 150 of the system. The location and orientation information may have been stored into a camera database 210. This information may have been entered manually or the system 200 may comprise elements which can determine the location and orientation of each of the multi-camera units 130, 140, 150 of the system. If the location and/or the orientation of any of the multi-camera units 130, 140, 150 changes, the changed location and/or orientation information may be updated in the camera database 210. The system 200 may be controlled by a controller 202, which may be a server or another appropriate element capable of communicating with the multi-camera units 130, 140, 150 and the camera database 210.
[0076] The location and/or the orientation of the multi-camera units 130, 140, 150 may not be stored into the database 210 but only to each individual multi-camera unit 130, 140, 150. Hence, the location and/or the orientation of the multi-camera units 130, 140, 150 may be requested from the multi-camera units 130, 140, 150 when needed. As an example, if the first multi-camera unit 130 needs to know the location and orientation of the second multi-camera unit 140, the first multi-camera unit 130 may request that information from the second multi-camera unit 140. If some information regarding the second multi-camera unit 140 is still needed, the first multi-camera unit 130 may request the missing information from the controller 202, for example.
[0077] The multi-camera system, as disclosed in Figure 6, may be used to reconstruct multi-camera captured scenes in 3D if the camera locations and pose information are accurately known. Such a reconstruction's quality and coverage depend on the distribution of the cameras and their capture capabilities. Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized in the controller 202 so that the MCDs provide different viewpoints in the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world.
[0078] The following describes in further detail suitable apparatus and possible mechanisms for implementing the embodiments of the invention. In this regard reference is first made to Figure 7 which shows a schematic block diagram of an exemplary apparatus or electronic device 50 depicted in Figure 8, which may incorporate a controller according to an embodiment of the invention.
[0079] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require transmission of radio frequency signals.
[0080] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The term battery discussed in connection with the
embodiments may also be one of these mobile energy devices. Further, the apparatus 50 may comprise a combination of different kinds of energy devices, for example a rechargeable battery and a solar cell. The apparatus may further comprise an infrared port 41 for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/FireWire wired connection.
[0081 ] The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
[0082] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a universal integrated circuit card (UICC) reader and a universal integrated circuit card for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
[0083] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 60 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
[0084] In some embodiments of the invention, the apparatus 50 comprises a camera 42 capable of recording or detecting imaging.
[0085] With respect to Figure 9, an example of a system within which embodiments of the present invention can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired and/or wireless networks including, but not limited to a wireless cellular telephone network (such as a global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), long term evolution (LTE) based network, code division multiple access (CDMA) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
[0086] For example, the system shown in Figure 9 comprises a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
[0087] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, a tablet computer. The apparatus 50 may be stationary or mobile when carried by an individual who is moving.
[0088] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows
communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
[0089] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, Long Term Evolution wireless communication technique (LTE) and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection. In the following some example implementations of apparatuses utilizing the present invention will be described in more detail.
[0090] Although the above examples describe embodiments of the invention operating within a wireless communication device, it would be appreciated that the invention as described above may be implemented as a part of any apparatus comprising a circuitry in which radio frequency signals are transmitted and received. Thus, for example, embodiments of the invention may be implemented in a mobile phone, in a base station, in a computer such as a desktop computer or a tablet computer comprising radio frequency communication means (e.g. wireless local area network, cellular radio, etc.).
[0091] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
[0092] Thus, without limiting the scope to any specific implementation means, an apparatus according to an aspect comprises means for determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; means for determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; means for determining the branch nodes having at least one voxel assigned to the branch node; means for determining the leaf nodes having at least one voxel assigned to the leaf node; and means for selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
[0093] If implemented at least partly as software, an aspect may comprise, for example, a computer memory system connectable to a processor and having one or more programmable operational characteristics, said characteristics being defined through configuration by said computer based on the type of said processor, wherein said system is connectable to said processor by a bus, and wherein a programmable operational characteristic of said system is configured to determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determine the branch nodes having at least one voxel assigned to the branch node; determine the leaf nodes having at least one voxel assigned to the leaf node; and select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
[0094] Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
[0095] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
[0096] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.

Claims

CLAIMS:
1. A method comprising:
determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
determining the branch nodes having at least one voxel assigned to the branch node;
determining the leaf nodes having at least one voxel assigned to the leaf node; and
selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
2. The method according to claim 1, further comprising
deciding said information to be communicated to the decoder on the basis of at least one of the following:
node structure similarity of current octree to other octrees located in adjacent or further spatial locations;
node structure similarity of current octree to node structure of current octree, adjacent, or further octrees in the previous times.
3. The method according to claim 1 or 2, wherein the information to be communicated to the decoder lacks information about content of the leaf nodes.
4. The method according to any preceding claim, further comprising
deciding said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
5. The method according to any preceding claim, wherein the information to be communicated to the decoder comprises sequence numbers of the nodes.
6. The method according to any preceding claim, further comprising
transmitting a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
7. The method according to any preceding claim, further comprising
partitioning the octree into a plurality of subtrees; and
deciding the information to be communicated to the decoder subtree-specifically.
8. The method according to claim 7, further comprising
recording a subtree size of each branch node as an attribute of the branch node;
mipmapping the branch nodes;
determining a density component for mipmapped values of the branch nodes; and
partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
9. The method according to claim 7 or 8, further comprising
assigning a unique ID number for each subtree; and
prefixing an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
10. The method according to claim 7 or 8, further comprising
prefixing each subtree with voxel coordinates of its root node.
11. An apparatus comprising
at least one processor and at least one memory, said at least one memory having code stored thereon which, when executed by said at least one processor, causes the apparatus to: determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node
representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
determine the branch nodes having at least one voxel assigned to the branch node;
determine the leaf nodes having at least one voxel assigned to the leaf node; and
select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
12. The apparatus according to claim 11, further comprising code, which when executed by said at least one processor, causes the apparatus to
decide said information to be communicated to the decoder on the basis of at least one of the following:
node structure similarity of current octree to other octrees located in adjacent or further spatial locations;
node structure similarity of current octree to node structure of current octree, adjacent, or further octrees in the previous times.
13. The apparatus according to claim 11 or 12, wherein the information to be communicated to the decoder lacks information about content of the leaf nodes.
14. The apparatus according to any of claims 11 - 13, further comprising code, which when executed by said at least one processor, causes the apparatus to
decide said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
15. The apparatus according to any of claims 11 - 14, wherein the information to be communicated to the decoder comprises sequence numbers of the nodes.
16. The apparatus according to any of claims 11 - 15, further comprising code, which when executed by said at least one processor, causes the apparatus to
transmit a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
17. The apparatus according to any of claims 11 - 16, further comprising code, which when executed by said at least one processor, causes the apparatus to
partition the octree into a plurality of subtrees; and
decide the information to be communicated to the decoder subtree-specifically.
18. The apparatus according to claim 17, further comprising code, which when executed by said at least one processor, causes the apparatus to
record a subtree size of each branch node as an attribute of the branch node; mipmap the branch nodes;
determine a density component for mipmapped values of the branch nodes; and partition the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
19. The apparatus according to claim 17 or 18, further comprising code, which when executed by said at least one processor, causes the apparatus to
assign a unique ID number for each subtree; and
prefix an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
20. The apparatus according to claim 17 or 18, further comprising code, which when executed by said at least one processor, causes the apparatus to
prefix each subtree with voxel coordinates of its root node.
21. A computer readable storage medium having code stored thereon for use by an apparatus, which code, when executed by a processor, causes the apparatus to perform the method according to at least one of claims 1 to 10.
22. An apparatus comprising means for determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
means for determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
means for determining the branch nodes having at least one voxel assigned to the branch node;
means for determining the leaf nodes having at least one voxel assigned to the leaf node; and
means for selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
23. An apparatus according to claim 22, further comprising means for performing the method according to at least one of claims 2 to 10.
24. A computer memory system connectable to a processor and having one or more programmable operational characteristics, wherein said system is connectable to said processor by a bus, and wherein a programmable operational characteristic of said system is configured to
determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
determine the branch nodes having at least one voxel assigned to the branch node;
determine the leaf nodes having at least one voxel assigned to the leaf node; and select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
PCT/EP2018/071091 2017-10-26 2018-08-03 A method for volumetric video encoding and decoding WO2019081087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1717574.6A GB2568232A (en) 2017-10-26 2017-10-26 A method for volumetric video encoding and decoding
GB1717574.6 2017-10-26

Publications (1)

Publication Number Publication Date
WO2019081087A1 true WO2019081087A1 (en) 2019-05-02

Family

ID=60580321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/071091 WO2019081087A1 (en) 2017-10-26 2018-08-03 A method for volumetric video encoding and decoding

Country Status (2)

Country Link
GB (1) GB2568232A (en)
WO (1) WO2019081087A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021062530A1 (en) * 2019-10-01 2021-04-08 Blackberry Limited Angular mode syntax for tree-based point cloud coding
CN113544747A (en) * 2019-07-12 2021-10-22 腾讯美国有限责任公司 Method and device for geometrical merging mode of point cloud coding and decoding
CN114051730A (en) * 2019-07-01 2022-02-15 Lg 电子株式会社 Apparatus and method for processing point cloud data
CN114503586A (en) * 2019-10-03 2022-05-13 Lg电子株式会社 Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US20220229492A1 (en) * 2019-10-09 2022-07-21 Huawei Technologies Co., Ltd. Eye gaze tracking
RU2799041C1 (en) * 2019-12-02 2023-07-03 TEНСЕНТ АМЕРИКА ЭлЭлСи Method and device for encoding point cloud

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013117001A1 (en) * 2012-02-09 2013-08-15 Thomson Licensing Efficient compression of 3d models based on octree decomposition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100695142B1 (en) * 2004-03-08 2007-03-14 삼성전자주식회사 Adaptive nth power of 2 n-ary tree generating method, and method and apparatus for encoding and decoding 3D volume data using it

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013117001A1 (en) * 2012-02-09 2013-08-15 Thomson Licensing Efficient compression of 3d models based on octree decomposition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BAS DADO ET AL: "Geometry and Attribute Compression for Voxel Scenes", COMPUTER GRAPHICS FORUM, vol. 35, no. 2, 1 May 2016 (2016-05-01), GB, pages 397 - 407, XP055454967, ISSN: 0167-7055, DOI: 10.1111/cgf.12841 *
CHEN H H ET AL: "A SURVEY OF CONSTRUCTION AND MANIPULATION OF OCTREES", COMPUTER VISION GRAPHICS AND IMAGE PROCESSING, ACADEMIC PRESS, DULUTH, MA, US, vol. 43, no. 3, 1 September 1988 (1988-09-01), pages 409 - 431, XP000007912, DOI: 10.1016/0734-189X(88)90092-8 *
JULIUS KAMMERL ET AL: "Real-time compression of point cloud streams", ROBOTICS AND AUTOMATION (ICRA), 2012 IEEE INTERNATIONAL CONFERENCE ON, IEEE, 14 May 2012 (2012-05-14), pages 778 - 785, XP032450378, ISBN: 978-1-4673-1403-9, DOI: 10.1109/ICRA.2012.6224647 *
MEKURIA RUFAEL ET AL: "Design, Implementation, and Evaluation of a Point Cloud Codec for Tele-Immersive Video", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, US, vol. 27, no. 4, 1 April 2017 (2017-04-01), pages 828 - 842, XP011644783, ISSN: 1051-8215, [retrieved on 20170403], DOI: 10.1109/TCSVT.2016.2543039 *
RUWEN SCHNABEL ET AL: "Octree-Based Point Cloud Compression", 29 July 2006 (2006-07-29), pages 1 - 11, XP008150338, ISBN: 1-56881-352-X, Retrieved from the Internet <URL:http://cg.cs.uni-bonn.de/aigaion2root/attachments/schnabel-2006-octree.pdf> *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114051730A (en) * 2019-07-01 2022-02-15 Lg 电子株式会社 Apparatus and method for processing point cloud data
JP2022538857A (en) * 2019-07-01 2022-09-06 エルジー エレクトロニクス インコーポレイティド Point cloud data processing apparatus and method
JP7440546B2 (en) 2019-07-01 2024-02-28 エルジー エレクトロニクス インコーポレイティド Point cloud data processing device and method
CN113544747A (en) * 2019-07-12 2021-10-22 腾讯美国有限责任公司 Method and device for geometrical merging mode of point cloud coding and decoding
CN113544747B (en) * 2019-07-12 2023-08-29 腾讯美国有限责任公司 Method and device for geometric merging mode of point cloud coding and decoding
WO2021062530A1 (en) * 2019-10-01 2021-04-08 Blackberry Limited Angular mode syntax for tree-based point cloud coding
CN114503586A (en) * 2019-10-03 2022-05-13 Lg电子株式会社 Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US20220229492A1 (en) * 2019-10-09 2022-07-21 Huawei Technologies Co., Ltd. Eye gaze tracking
US11899837B2 (en) * 2019-10-09 2024-02-13 Huawei Technologies Co., Ltd. Eye gaze tracking
RU2799041C1 (en) * 2019-12-02 2023-07-03 TEНСЕНТ АМЕРИКА ЭлЭлСи Method and device for encoding point cloud

Also Published As

Publication number Publication date
GB201717574D0 (en) 2017-12-13
GB2568232A (en) 2019-05-15

Similar Documents

Publication Publication Date Title
US11889113B2 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2019081087A1 (en) A method for volumetric video encoding and decoding
WO2019076503A1 (en) An apparatus, a method and a computer program for coding volumetric video
KR20210040273A (en) Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus and point cloud data reception method
KR102609776B1 (en) Point cloud data processing method and device
KR20200140824A (en) Method and apparatus for encoding/decoding a point cloud representing a 3D object
US20230046917A1 (en) In-tree geometry quantization of point clouds
US20220366610A1 (en) Point cloud data processing method and apparatus
WO2022023002A1 (en) Methods and apparatus for encoding and decoding a 3d mesh as a volumetric content
US20230290006A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
KR20200140825A (en) Method and apparatus for encoding/decoding a point cloud representing a 3D object
CN115668919A (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US10997795B2 (en) Method and apparatus for processing three dimensional object image using point cloud data
EP4228267A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
KR20220035229A (en) Method and apparatus for delivering volumetric video content
US20220321914A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20220191498A1 (en) Signalling Profile Level Exemptions For Immersive Video
US20230388542A1 (en) A method and apparatus for adapting a volumetric video to client devices
US20230154052A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method
US20230362409A1 (en) A method and apparatus for signaling depth of multi-plane images-based volumetric video
US20230334719A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20240153147A1 (en) V3c syntax extension for mesh compression
US20230232042A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20230239451A1 (en) A method and apparatus for encoding and decoding volumetric content in and from a data stream
US20230281878A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18750185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18750185

Country of ref document: EP

Kind code of ref document: A1