WO2019081087A1 - A method for volumetric video encoding and decoding - Google Patents

A method for volumetric video encoding and decoding

Info

Publication number
WO2019081087A1
Authority
WO
WIPO (PCT)
Prior art keywords
nodes
information
node
octree
branch
Prior art date
Application number
PCT/EP2018/071091
Other languages
French (fr)
Inventor
Payman Aflaki Beni
Jaakko Olli Taavetti KERÄNEN
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2019081087A1 publication Critical patent/WO2019081087A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005 Tree description, e.g. octree, quadtree
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/40 Tree coding, e.g. quadtree, octree
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/14 Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30 Image reproducers
    • H04N13/388 Volumetric displays, i.e. systems where the image is built up from picture elements distributed through a volume

Definitions

  • the present invention relates to a method for encoding and decoding volumetric video data.
  • 360-degree viewing camera devices with multiple lenses per viewing direction are becoming more and more popular and affordable for both consumer and professional usage.
  • multi-camera captured scenes can be reconstructed in three dimensions (3D) if the camera location and pose information is known.
  • reconstruction's quality and coverage may depend on the distribution of the cameras and their capture capabilities.
  • Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized so that the MCDs provide different viewpoints in the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world. The volumetric presentation of the scene is constructed based on the information captured by said several MCDs.
  • a method comprises determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining the branch nodes having at least one voxel assigned to the branch node; determining the leaf nodes having at least one voxel assigned to the leaf node; and selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
  • the method further comprises deciding said information to be communicated to the decoder on the basis of at least one of the following: node structure similarity of the current octree to other octrees located in adjacent or further spatial locations; node structure similarity of the current octree to the node structure of the current, adjacent, or further octrees at previous times.
  • the information to be communicated to the decoder lacks information about content of the leaf nodes.
  • the method further comprises deciding said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
  • the information to be communicated to the decoder comprises sequence numbers of the nodes.
  • the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
  • the method further comprises partitioning the octree into a plurality of subtrees; and deciding the information to be communicated to the decoder subtree-specifically.
  • the method further comprises recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for the mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
  • the method further comprises assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
  • the method further comprises prefixing each subtree with voxel coordinates of its root node.
  • the second and the third aspects relate to an apparatus and a computer readable storage medium stored with code thereon, which are arranged to carry out the above method and one or more of the embodiments related thereto.
  • Figure 1 shows an example of an octree data structure
  • Figure 2 shows an example of a texture map for voxel attributes
  • Figure 3 shows a flowchart of an encoding method in accordance with an embodiment
  • Figure 4 shows an example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments
  • Figure 5 shows another example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments
  • Figure 6 shows a simplified block diagram of a system comprising a plurality of multi-camera units for generating volumetric video in accordance with an embodiment
  • Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device
  • Figure 8 shows an apparatus according to an example embodiment
  • Figure 9 shows an example of an arrangement for wireless communication comprising a plurality of apparatuses, networks and network elements.
  • Voxels may be formed from the captured 3D points e.g. by merging the 3D points into voxels comprising a plurality of 3D points such that for a selected 3D point, all neighbouring 3D points within a predefined threshold from the selected 3D point are merged into a voxel without exceeding a maximum number of 3D points in a voxel.
  • a voxel implicitly describes a finite volume of space, such as a cube delimited by the neighbouring cells/cubes, whereas a point has no shape and a volume of zero.
  • a point cloud may still describe a plurality of voxel-like volumes if the points are interpreted as origin points and accompanied by information about the dimensions of each volume. This makes it possible to convert a 3D point cloud to a voxel array, and vice versa.
  • a video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form.
  • encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
  • Volumetric video data represents a three-dimensional scene or object and can be used as input for Augmented reality (AR), Virtual Reality (VR) and Mixed Reality (MR) applications.
  • Volumetric video data describes geometry, such as shape, size, position in 3D-space, and respective attributes (e.g. colour, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances, such as frames in 2D video.
  • 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes.
  • Infrared, lasers, time-of-flight, and structured light are all examples of devices that can be used to construct 3D video data.
  • Representation of the 3D data depends on how the data is used.
  • Dense voxel arrays have been used to represent volumetric medical data.
  • In 3D graphics, polygonal meshes are extensively used.
  • Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
  • Another way to represent 3D data is to code it as a set of texture and depth maps, as is the case in the multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
  • the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between endpoints, then efficient compression becomes essential.
  • the octree data structure is used extensively to encode geometry induced by the point cloud.
  • Figure 1 shows an example of the geometry of a voxel octree, where each node represents a point/voxel.
  • the root node is the point cloud aligned bounding box.
  • Each node is recursively subdivided into a plurality of child nodes.
  • the branch nodes may be subdivided into an arbitrary number of child nodes, but typically into eight child nodes. In the following examples it is assumed, for the sake of clarity, that the number of child nodes is eight. Nodes having child nodes are called branch nodes and nodes having no child nodes are called leaf nodes.
  • each dark node represents a branch or a leaf node, associated with at least one voxel with texture information in the volumetric presentation, i.e. an occupied voxel. Only non-empty nodes continue to be subdivided. A node that contains a single point from the point cloud is only subdivided until the desired maximum resolution is reached.
  • a point cloud is represented as an octree
  • the point coordinates are quantized to integers, and the volume of space represented by the point becomes explicit: the position and dimensions of each point ultimately are represented as a node (i.e. a unique subvolume) within the octree. If the octree is converted back into a point cloud, the non-empty leaf node centre points can be used to represent points.
  • Each level in the octree is called a Level of Detail (LOD).
  • Figure 1 illustrates four LODs.
  • the voxel attributes (e.g. colour, normal, reflectance) are "mipmapped", i.e. set to the average of the respective attributes of all the enclosed points
  • LODs are generally useful for efficiency: the smaller LODs can be used for low-precision tasks that require fast performance.
  • the octree structure is typically serialised using occupancy coding. In the serialisation method, each node starting from the root node is coded as an eight bit mask of occupied children where a one in the mask means that the child spatially contains points of the point cloud and a zero indicates that there are no points in that spatial region of the octree subdivided space.
  • solid voxels may be stored in aggregate instead of one voxel per node.
  • leaf nodes may contain a small voxel array, e.g., 2 x 2 x 2 or 4 x 4 x 4 voxels. This reduces the total number of nodes required by the octree, and allows a GPU to read the voxel values faster.
  • uncompressed dynamic volumetric video which may include both point cloud and voxelized video, requires very large storage space/bandwidth.
  • Each point in a 3D point cloud is characterised by at least two modalities: its geometrical structure and colour (texture). There may be other modalities such as normal, reflectance and material.
  • Typically, but not necessarily, each modality is coded independently. The structuring of octrees needs to be carried out prior to encoding the related RGB values to the correct solid voxel in the octree presentation. Therefore, for interchange purposes, it is necessary to compress the volumetric video data.
  • the method comprises determining (300) a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
  • step 304 determines if the current branch node is associated with at least one respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map. It should be noted that if a branch has children, i.e. is a branch node, it may always have at least one respective voxel associated to it. Step 306 determines whether or not the leaf node has a respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map.
  • two different methods are introduced for determining the structure of the octree presentation, i.e. determining the octree presentation in terms of either the branch nodes and leaf nodes having voxel presentation in the volumetric content or the leaf nodes having no voxel presentation in the volumetric content.
  • the encoder is enabled to select either of these node presentations to be communicated to the decoder, whereupon the encoder makes the selection by aiming to minimize the bit rate required to transmit said information to the decoder.
  • the 3D space of the scene may be presented with an octree presentation, defining the location of branch and leaf nodes and the respective colour values in terms of e.g. RGB or YUV.
  • the respective colour values will be mapped to the branch nodes respectively.
  • the branch nodes (nodes having at least one child node) can be identified and informed to the decoder.
  • the decoder may determine the whole structure of the octree.
  • the leaf nodes (nodes having no child node) can be identified and informed to the decoder.
  • the decoder may as well determine the whole structure of the octree.
  • the information to be communicated to the decoder lacks information about content of the leaf nodes.
  • the coding of voxel geometry (i.e. the structure of the octree) and their attributes (i.e. the leaf node content) are separated.
  • only information about the structure of the octree is delivered to the decoder, while the leaf node contents are presented separately. The separation may provide benefits especially when considering dynamic scenes with multiple frames. In that situation, multiple frames may share the same leaf node contents while only containing information about changes in the octree structure. Thus, avoiding transmitting the same leaf node content several times reduces redundancy.
  • the method based on determining the branch nodes, which is herein referred to as MODE1, is best suited for sparse volumes with few solid voxels.
  • in MODE1, the nodes which have at least one leaf node with voxel presentation in the volumetric content are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
  • Figure 4 shows an example octree structure.
  • the root node represents the 3D space of the scene in question on a very rough-scale level. In other words, the root node always exists, otherwise the octree does not exist in the first place. Therefore, the numbering may be started from the first child node onwards.
  • on the first two levels only nodes 0, 5, 13, 16, and 23 have at least one leaf node with voxel presentation in the volumetric content.
  • the third level is the lowest LOD associated with the smallest presented spatial information in the volumetric presentation i.e. the third level provides the accuracy of detail for determining the size of a voxel.
  • node 13 has child nodes 24, 25, 26, 27, 28, 30 and 31 as leaf nodes having voxel presentation in the volumetric content and the child node 29 as a leaf node having no voxel presentation in the volumetric content
  • node 16 has a child node 32 as a leaf node having voxel presentation in the volumetric content and child nodes 33 - 39 as leaf nodes having no voxel presentation in the volumetric content
  • node 23 has child nodes 40 - 43 as leaf nodes having no voxel presentation in the volumetric content and child nodes 44 - 47 as leaf nodes having voxel presentation in the volumetric content.
  • the method based on determining the leaf nodes having no voxel presentation in the volumetric content associated to them, which is herein referred to as MODE2, is best suited for volumes with many empty voxels or many voxels without a colour representation.
  • MODE2 only the leaf nodes having no voxel presentation in the volumetric content associated to them are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
  • nodes 1 - 4, 6 - 12, 14, 15, 17 - 22, 29, and 33 - 43 are leaf nodes having no voxel presentation in the volumetric content associated to them. Accordingly, the number of leaf nodes having no voxel presentation in the volumetric content associated to them is considerably higher for this octree structure compared to the number of branch or leaf nodes having voxel presentation in the volumetric content associated to them. Therefore, in such cases, the encoder decides to present the octree structure to the decoder by reporting the branch and leaf nodes having voxel presentation in the volumetric content.
  • said information to be communicated to the decoder is decided on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
  • Figure 5 shows another example of octree structure.
  • dark nodes with a downward arrow represent branch nodes whose eight child nodes are all leaf nodes having voxel presentation in the volumetric content.
  • the octree structure of Figure 5 comprises only five leaf nodes having no voxel presentation in the volumetric content: nodes 1, 3, 4, 6 and 36.
  • the decision in the encoder will be to operate in MODE2 and to communicate the leaf nodes.
  • the information to be communicated to the decoder comprises sequence numbers of the nodes.
  • the sequence numbers of the leaf nodes, i.e. 1, 3, 4, 6 and 36, are communicated to the decoder.
  • the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch and leaf nodes having voxel presentation in the volumetric content or information about said leaf nodes having no voxel presentation in the volumetric content.
  • the communication between the encoder and decoder may include a flag, which may be communicated for each 3D space (cubic) separately informing the decoder that the communicated numbers are presenting either the branch nodes or the leaf nodes.
  • mode_Flag = 1 means that MODE 2 is selected (leaf nodes having no voxel presentation in the volumetric content are communicated)
  • since the flag is only one bit, the overhead caused by including the flag is negligible compared to the bitrate reduction achieved by introducing the two different modes, and hence it is expected to bring a bitrate reduction to the whole system.
  • a typical volumetric 3D scene/object is composed of several thin surfaces surrounded by empty space, due to data acquisition devices recording only the surfaces of objects and surrounding environment. Consequently, the larger nodes in the octree typically have a very low density. Volumetric scenes may also have a large total number of voxels due to being captured in high resolution or simply having large dimensions. If a single presentation mode were to be chosen for the entire octree, it would likely end up being MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) due to the overall sparseness of the octree.
  • the method further comprises partitioning the octree into a plurality of subtrees and deciding the information to be communicated to the decoder subtree-specifically.
  • the mode for each subtree may depend on the characteristics of the local region represented by the subtree. For example, dense regions such as nodes covering object surfaces can use the most appropriate presentation mode.
  • the mode selection for each subtree may depend on the selected mode for the surrounding subtrees and taking into account the available prediction references from other subtrees.
  • a rate distortion algorithm may decide which mode should be selected to present the current subtree with the least amount of information to be communicated, taking into account the modes selected for the surrounding subtrees.
  • not only the surrounding subtrees, but also the subtrees which do not have any spatial connection to the current subtree may be taken into account.
  • the mode selection of those subtrees may be used for the current subtree mode selection.
  • the mode selection for the current subtree or any adjacent subtree in a previous time stamp may be used for mode selection.
  • the selected mode for the current subtree at time T5 may depend on the selected mode for the current subtree in at least one of the earlier times T0, T1, T2, T3, T4.
  • the mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another/the same subtree in a previous time stamp. In this case, the similarity of the branch/leaf nodes to be communicated may help in deciding which mode to select to reduce the information that should be communicated. Similarly, the mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another subtree in an adjacent (or farther away) location. Similarly in this case, the similarity of the branch/leaf nodes to be communicated may help in deciding which mode to select to reduce the information that should be communicated.
  • the partitioning may further comprise recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for the mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes (a code sketch of this heuristic is given at the end of this list).
  • the encoder may carry out the partitioning and quickly choose the presentation mode for each subtree, for example, as follows:
  • the subtree size of each branch node is recorded as an attribute of the branch node.
  • the subtree size is equal to the total number of child nodes that the branch contains.
  • Branches are mipmapped during the construction of the octree. That is, the attributes of child nodes are averaged, and the average values (e.g., colour) are used as the attributes of the branch node.
  • the Alpha component of the mipmapped colours can be used to keep track of the density of each node.
  • the Alpha of a solid voxel is considered 1.0. Therefore, the Alpha of a node is a value between 0.0 and 1.0, depending on how many solid voxels it contains.
  • the octree is partitioned by looking at nodes' subtree size and Alpha (density) attributes. The operation is started at the root node.
  • Low density and large size means that the encoder should look for a smaller node within the node children.
  • Low density and small size means that MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) is chosen.
  • High density and large size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen, unless the subtree size exceeds the range of the index numbering. In the latter case, the encoder should look for a smaller node within the node children.
  • High density and small size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen.
  • the partitioning may further comprise assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of the parent subtree and with the sequence number of the node within the parent subtree to which the child subtree attaches.
  • the partitioning may further comprise prefixing each subtree with the voxel coordinates of its root node.
  • Both of the above embodiments provide the necessary information to the decoder for reconstructing the subtrees into one large octree; however, the latter embodiment may require more bits than the ID number and parent index.
  • the method enables the encoder to determine the most efficient presentation mode of an octree structure and inform the decoder about the necessary data, thereby reducing the required bandwidth.
  • embodiments utilize the separation of the octree presentation into the structure of the octree and the attributes, and provide a straightforward improvement in compression efficiency through the simplified presentation of the structure of the octree only.
  • the partitioning of the octree into subtrees according to embodiments enables the encoder to choose the most appropriate presentation mode for each subtree, thereby further enhancing the compression efficiency.
  • volumetric video may be captured using one or more multi-camera devices (MCDs).
  • Figure 6 is a simplified block diagram of a system 200 comprising a plurality of multi-camera units 130, 140, 150. Although Figure 6 only depicts three multi-camera units 130, 140, 150, the system may have two multi-camera units 130, 140 or more than three multi-camera units.
  • the system 200 has information about the location and orientation of each of the multi-camera units 130, 140, 150 of the system. The location and orientation information may have been stored into a camera database 210. This information may have been entered manually or the system 200 may comprise elements which can determine the location and orientation of each of the multi-camera units 130, 140, 150 of the system.
  • the changed location and/or orientation information may be updated in the camera database 210.
  • the system 200 may be controlled by a controller 202, which may be a server or another appropriate element capable of communicating with the multi-camera units 130, 140, 150 and the camera database 210.
  • the location and/or the orientation of the multi-camera units 130, 140, 150 may not be stored into the database 210 but only to each individual multi-camera unit 130, 140, 150. Hence, the location and/or the orientation of the multi-camera units 130, 140, 150 may be requested from the multi-camera units 130, 140, 150 when needed. As an example, if the first multi-camera unit 130 needs to know the location and orientation of the second multi-camera unit 140, the first multi-camera unit 130 may request that information from the second multi-camera unit 140. If some information regarding the second multi-camera unit 140 is still needed, the first multi-camera unit 130 may request the missing information from the controller 202, for example.
  • the multi-camera system may be used to reconstruct multi-camera captured scenes in 3D if the camera locations and pose information are accurately known. Such a reconstruction's quality and coverage depends on the distribution of the cameras and their capture capabilities.
  • Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized in the controller 202 so that the MCDs provide different viewpoints in the same world.
  • volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world.
  • Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device 50 depicted in Figure 8, which may incorporate a controller according to an embodiment of the invention.
  • the electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require transmission of radio frequency signals.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 further may comprise a display 32 in the form of a liquid crystal display.
  • the display may be any suitable display technology suitable to display an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus 50 may comprise a combination of different kinds of energy devices, for example a rechargeable battery and a solar cell.
  • the apparatus may further comprise an infrared port 41 for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/FireWire wired connection.
  • the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the invention may store both data and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a universal integrated circuit card (UICC) reader and a universal integrated circuit card for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 60 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 comprises a camera 42 capable of recording or detecting imaging.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired and/or wireless networks including, but not limited to a wireless cellular telephone network (such as a global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), long term evolution (LTE) based network, code division multiple access (CDMA) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system shown in Figure 8 shows a mobile telephone network 11 and a representation of the internet 28.
  • Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, a tablet computer.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access
  • a communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection. In the following some example implementations of apparatuses utilizing the present invention will be described in more detail.
  • embodiments of the invention operating within a wireless communication device
  • the invention as described above may be implemented as a part of any apparatus comprising a circuitry in which radio frequency signals are transmitted and received.
  • embodiments of the invention may be implemented in a mobile phone, in a base station, in a computer such as a desktop computer or a tablet computer comprising radio frequency communication means (e.g. wireless local area network, cellular radio, etc.).
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • an apparatus comprises means for determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; means for determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; means for determining the branch nodes having at least one voxel assigned to the branch node; means for determining the leaf nodes having at least one voxel assigned to the leaf node; and means for selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
  • an aspect may comprise, for example, a computer memory system connectable to a processor and having one or more programmable operational characteristics, said characteristics being defined through configuration by said computer based on the type of said processor, wherein said system is connectable to said processor by a bus, wherein a programmable operational characteristic of said system is configured to determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determine the branch nodes having at least one voxel assigned to the branch node; determine the leaf nodes having at least one voxel assigned to the leaf node; and select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well-established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
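
As promised in the partitioning embodiment above, the following is a minimal sketch of the size/density heuristic for choosing a presentation mode per subtree. The class name, attribute names, threshold values and index range are illustrative assumptions and are not values given in this publication; the node is merely assumed to carry the mipmapped density (Alpha) and the recorded subtree size as attributes, as described in the list above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BranchInfo:
    alpha: float            # mipmapped density, 0.0 (empty) .. 1.0 (fully solid)
    subtree_size: int       # total number of child nodes the branch contains
    children: List["BranchInfo"] = field(default_factory=list)

def partition_subtrees(node, alpha_low=0.5, size_large=4096, max_index=65535, out=None):
    """Assign a presentation mode per subtree from its density and size.

    Returns a list of (subtree_root, mode) pairs, where mode 1 stands for MODE1
    (occupied branch and leaf nodes are communicated) and mode 2 for MODE2
    (empty leaf nodes are communicated).
    """
    if out is None:
        out = []
    low_density = node.alpha < alpha_low
    large = node.subtree_size > size_large
    if large and (low_density or node.subtree_size > max_index):
        # Large and sparse, or too large for the index numbering range:
        # look for smaller nodes among the children.
        for child in node.children:
            partition_subtrees(child, alpha_low, size_large, max_index, out)
    elif low_density:
        out.append((node, 1))   # low density, small size: MODE1
    else:
        out.append((node, 2))   # high density, manageable size: MODE2
    return out
```

The threshold parameters would in practice be tuned by the encoder; the sketch only mirrors the four size/density cases listed above.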

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method comprising determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining the branch nodes having at least one voxel assigned to the branch node; determining the leaf nodes having at least one voxel assigned to the leaf node; and selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.

Description

A METHOD FOR VOLUMETRIC VIDEO ENCODING AND DECODING
TECHNICAL FIELD
[0001] The present invention relates to a method for encoding and decoding volumetric video data.
BACKGROUND
[0002] 360-degree viewing camera devices with multiple lenses per viewing direction are becoming more and more popular and affordable for both consumer and professional usage. Moreover, such multi-camera captured scenes can be reconstructed in three dimensions (3D) if the camera location and pose information is known. Such a
reconstruction's quality and coverage may depend on the distribution of the cameras and their capture capabilities.
[0003] Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized so that the MCDs provide different viewpoints in the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world. The volumetric
presentation of the scene is constructed based on the information captured by said several MCDs.
[0004] The total amount of information to represent the scene, which is required to be encoded and transmitted, easily becomes very high. This poses significant burdens to both computational capacity of the encoder and the transmission capacity of the current broadcasting data delivery infrastructure.
SUMMARY
[0005] Now in order to at least alleviate the above problems, an enhanced encoding method is introduced herein.
[0006] A method according to a first aspect comprises determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining the branch nodes having at least one voxel assigned to the branch node; determining the leaf nodes having at least one voxel assigned to the leaf node; and selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing the bit rate required to transmit said information.
[0007] According to an embodiment, the method further comprises deciding said information to be communicated to the decoder on the basis of at least one of the following: node structure similarity of the current octree to other octrees located in adjacent or further spatial locations; node structure similarity of the current octree to the node structure of the current, adjacent, or further octrees at previous times.
[0008] According to an embodiment, the information to be communicated to the decoder lacks information about content of the leaf nodes.
[0009] According to an embodiment, the method further comprises deciding said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
[0010] According to an embodiment, the information to be communicated to the decoder comprises sequence numbers of the nodes.
[0011] According to an embodiment, the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
[0012] According to an embodiment, the method further comprises partitioning the octree into a plurality of subtrees; and deciding the information to be communicated to the decoder subtree-specifically.
[0013] According to an embodiment, the method further comprises recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for the mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
[0014] According to an embodiment, the method further comprises assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
[0015] According to an embodiment, the method further comprises prefixing each subtree with voxel coordinates of its root node.
[0016] The second and the third aspects relate to an apparatus and a computer readable storage medium stored with code thereon, which are arranged to carry out the above method and one or more of the embodiments related thereto.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
[0018] Figure 1 shows an example of an octree data structure;
[0019] Figure 2 shows an example of a texture map for voxel attributes;
[0020] Figure 3 shows a flowchart of an encoding method in accordance with an embodiment;
[0021] Figure 4 shows an example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments;
[0022] Figure 5 shows another example of node presentation of an octree data structure utilized in a method in accordance with a plurality of embodiments;
[0023] Figure 6 shows a simplified block diagram of a system comprising a plurality of multi-camera units for generating volumetric video in accordance with an embodiment;
[0024] Figure 7 shows a schematic block diagram of an exemplary apparatus or electronic device;
[0025] Figure 8 shows an apparatus according to an example embodiment;
[0026] Figure 9 shows an example of an arrangement for wireless communication comprising a plurality of apparatuses, networks and network elements.
DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS
[0027] The following embodiments are exemplary. Although the specification may refer to "an", "one", or "some" embodiment(s) in several locations, this does not necessarily mean that each such reference is to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.
[0028] The terms voxel and point, as in voxel array and 3D point cloud, are closely related. In general, 3D points are positioned in 3D space without constraints, e.g. in floating point coordinates, while a voxel represents a small volume of space inside a fixed 3D grid, using discrete (integer) coordinates. Voxels may be formed from the captured 3D points e.g. by merging the 3D points into voxels comprising a plurality of 3D points such that for a selected 3D point, all neighbouring 3D points within a predefined threshold from the selected 3D point are merged into a voxel without exceeding a maximum number of 3D points in a voxel.
[0029] A voxel implicitly describes a finite volume of space, such as a cube delimited by the neighbouring cells/cubes, whereas a point has no shape and a volume of zero. A point cloud may still describe a plurality of voxel-like volumes if the points are interpreted as origin points and accompanied by information about the dimensions of each volume. This makes it possible to convert a 3D point cloud to a voxel array, and vice versa.
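As a concrete illustration of the point-to-voxel conversion just described, the following sketch quantizes unconstrained floating-point 3D points onto a fixed voxel grid and back. The function names, the grid origin parameter and the use of NumPy are illustrative assumptions; the threshold-based merging with a maximum point count per voxel mentioned in [0028] is not implemented here.

```python
import numpy as np

def points_to_voxels(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Quantize floating-point 3D points onto a fixed voxel grid.

    Returns a dict mapping integer (i, j, k) voxel coordinates to the list of
    indices of the points merged into that voxel.
    """
    pts = np.asarray(points, dtype=np.float64)
    grid = np.floor((pts - np.asarray(origin)) / voxel_size).astype(np.int64)
    voxels = {}
    for idx, coord in enumerate(map(tuple, grid)):
        voxels.setdefault(coord, []).append(idx)
    return voxels

def voxels_to_points(voxels, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Convert occupied voxels back into a point cloud using the voxel centres."""
    centres = [(np.asarray(c) + 0.5) * voxel_size + np.asarray(origin) for c in voxels]
    return np.vstack(centres) if centres else np.empty((0, 3))
```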
[0030] A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form. Typically the encoder discards some information in the original video sequence in order to represent the video in a more compact form (that is, at lower bitrate).
[0031 ] Volumetric video data represents a three-dimensional scene or object and can be used as input for Augmented reality (AR), Virtual Reality (VR) and Mixed Reality (MR) applications. Volumetric video data describes geometry, such as shape, size, position in 3D-space, and respective attributes (e.g. colour, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances, such as frames in 2D video.
[0032] Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight, and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is to code it as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
[0033] In the cases of dense point clouds or voxel arrays, the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between endpoints, then efficient compression becomes essential.
[0034] The octree data structure is used extensively to encode geometry induced by the point cloud. Figure 1 shows an example of the geometry of a voxel octree, where each node represents a point/voxel. The root node is the point cloud aligned bounding box. Each node is recursively subdivided into a plurality of child nodes. In general, the branch nodes may be subdivided into an arbitrary number of child nodes, but typically into eight child nodes. In the following examples it is assumed, for the sake of clarity, that the number of child nodes is eight. Nodes having child nodes are called branch nodes and nodes having no child nodes are called leaf nodes. In Figure 1, each dark node represents a branch or a leaf node, associated with at least one voxel with texture information in the volumetric presentation, i.e. an occupied voxel. Only non-empty nodes continue to be subdivided. A node that contains a single point from the point cloud is only subdivided until the desired maximum resolution is reached. When a point cloud is represented as an octree, the point coordinates are quantized to integers, and the volume of space represented by the point becomes explicit: the position and dimensions of each point ultimately are represented as a node (i.e. a unique subvolume) within the octree. If the octree is converted back into a point cloud, the non-empty leaf node centre points can be used to represent points.
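A minimal sketch of the recursive subdivision just described, using hypothetical names (OctreeNode, build_octree) that are not taken from this publication. It assumes voxel coordinates are already integers and that the root cube has a power-of-two edge length; only non-empty nodes are subdivided, and subdivision stops at the desired maximum level of detail.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]

@dataclass
class OctreeNode:
    origin: Voxel            # minimum corner of the cube, in integer voxel coordinates
    size: int                # edge length of the cube (a power of two)
    occupied: bool = False   # True if at least one voxel falls inside the cube
    children: Optional[List["OctreeNode"]] = None  # eight children when subdivided

    @property
    def is_leaf(self) -> bool:
        return self.children is None

def build_octree(node: OctreeNode, voxels: Set[Voxel], max_lod: int, lod: int = 0) -> None:
    """Recursively subdivide non-empty nodes until max_lod or unit size is reached."""
    x0, y0, z0 = node.origin
    s = node.size
    inside = {v for v in voxels
              if x0 <= v[0] < x0 + s and y0 <= v[1] < y0 + s and z0 <= v[2] < z0 + s}
    node.occupied = bool(inside)
    if not inside or lod == max_lod or s == 1:
        return  # empty nodes and nodes at maximum resolution remain leaf nodes
    half = s // 2
    node.children = [OctreeNode((x0 + dx * half, y0 + dy * half, z0 + dz * half), half)
                     for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
    for child in node.children:
        build_octree(child, inside, max_lod, lod + 1)
```

Usage would look like: root = OctreeNode((0, 0, 0), 8); build_octree(root, occupied_voxels, max_lod=3).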
[0035] Each level in the octree is called a Level of Detail (LOD). Figure 1 illustrates four LODs. At each LOD, the voxel attributes (e.g. colour, normal, reflectance) are
"mipmapped"; i.e. set to the average of the respective attributes of all the enclosed points.
LODs are generally useful for efficiency: the smaller LODs can be used for low-precision tasks that require fast performance. The octree structure is typically serialised using occupancy coding. In the serialisation method, each node starting from the root node is coded as an eight bit mask of occupied children where a one in the mask means that the child spatially contains points of the point cloud and a zero indicates that there are no points in that spatial region of the octree subdivided space.
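Reusing the OctreeNode sketch above, the following is one possible reading of the occupancy coding just described: each subdivided node contributes one byte in breadth-first order, with bit i set when child i is occupied. The traversal order and bit ordering are assumptions made for illustration.

```python
from collections import deque

def serialize_occupancy(root) -> bytes:
    """Breadth-first occupancy coding: one 8-bit mask per subdivided node."""
    out = bytearray()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.children is None:
            continue  # leaf nodes carry no occupancy byte of their own
        mask = 0
        for i, child in enumerate(node.children):
            if child.occupied:
                mask |= 1 << i
                queue.append(child)  # only occupied children are traversed further
        out.append(mask)
    return bytes(out)
```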
[0036] For practical purposes and due to the characteristics of Graphics Processing Units (GPU), solid voxels may be stored in aggregate instead of one voxel per node. In other words, leaf nodes may contain a small voxel array, e.g., 2 x 2 x 2 or 4 x 4 x 4 voxels. This reduces the total number of nodes required by the octree, and allows a GPU to read the voxel values faster.
[0037] While octree based voxel coding is efficient in the compression of geometry, the opportunities to use the specific statistics of the attributes for compression diminish when used for coding geometry combined with attributes. Therefore, the coding of voxel geometry and their attributes are typically separated. For example in the reference software framework of the development of the point cloud compression technology standard in JTC1/SC29/WG11 (MPEG), geometry is coded using occupancy codes based on an octree data structure, while the colour attributes are mapped into a 2D sampling grid and then coded using standard JPEG. Figure 2 shows an example of a 2D texture map for representing the colour attributes of the voxels.
[0038] As becomes evident from the above, uncompressed dynamic volumetric video, which may include both point cloud and voxelized video, requires very large storage space/bandwidth. Each point in a 3D point cloud is characterised by at least two modalities: its geometrical structure and colour (texture). There may be other modalities such as normal, reflectance and material. Typically, but not necessarily, each modality is coded independently. The structuring of octrees needs to be carried out prior to encoding the related RGB values to the correct solid voxel in the octree presentation. Therefore, for interchange purposes, it is necessary to compress the volumetric video data.
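The separation of geometry and attributes can be pictured with the short sketch below, which packs per-voxel RGB values into a near-square 2D image so that any standard 2D codec (JPEG is mentioned above) can compress them independently of the octree structure. The function name and the row-major packing order are illustrative assumptions, not the mapping used by the MPEG reference software.

```python
import math
import numpy as np

def colours_to_texture(colours):
    """Pack a list of (R, G, B) triplets into a 2D texture map, row by row."""
    n = len(colours)
    width = max(1, math.ceil(math.sqrt(n)))
    height = max(1, math.ceil(n / width))
    tex = np.zeros((height, width, 3), dtype=np.uint8)
    for idx, rgb in enumerate(colours):
        tex[idx // width, idx % width] = rgb
    return tex  # hand this to a 2D image codec; the octree structure is coded separately
```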
[0039] Now in order to at least alleviate the above problems, a method for enhanced encoding of 3D volumetric data is presented hereinafter.
[0040] The method, which is disclosed in Figure 3, comprises determining (300) a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determining (302) an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing a 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determining (304) the branch nodes having at least one voxel assigned to the branch node; determining (306) the leaf nodes having at least one voxel assigned to the leaf node; and selecting (308) either information about branch nodes and leaf nodes having at least one voxel associated with them or information about leaf nodes having no voxel associated with them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
[0041] Accordingly, step 304 determines if the current branch node is associated with at least one respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map. It should be noted that if a branch has children, i.e. is a branch node, it may always have at least one respective voxel associated to it. Step 306 determines whether or not the leaf node has a respective representation in the volumetric content and hence, whether or not it has an associated colour value, such as an RGB value in the texture map.
[0042] Thus, two different methods for determining the structure of the octree presentation are introduced, i.e. determining the octree presentation in terms of either the branch nodes and leaf nodes having a voxel presentation in the volumetric content or the leaf nodes having no voxel presentation in the volumetric content. The encoder is enabled to select either of these presentations to be communicated to the decoder, whereupon the encoder makes the selection by aiming to minimize the bit rate required to transmit said information to the decoder.
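As a purely illustrative sketch, the two candidate presentations could be collected as follows, assuming an in-memory octree in which every numbered node object carries the hypothetical attributes number (its sequence number), children (a list of child nodes, empty for leaf nodes) and occupied (whether at least one voxel is associated with it), and in which empty child positions are materialised as unoccupied leaf nodes, as in Figure 4.

```python
# Illustrative only: collect the two candidate node reports.
# The node attributes 'number', 'children' and 'occupied' are hypothetical.

def candidate_reports(root):
    """Return (mode1_nodes, mode2_nodes) as sorted lists of sequence numbers.

    mode1_nodes: branch and leaf nodes having at least one voxel associated with them.
    mode2_nodes: leaf nodes having no voxel associated with them.
    """
    mode1, mode2 = [], []
    stack = list(root.children)          # the root node itself is not numbered
    while stack:
        node = stack.pop()
        if node.occupied:
            mode1.append(node.number)    # occupied branch or leaf node
        elif not node.children:
            mode2.append(node.number)    # empty leaf node
        stack.extend(node.children)
    return sorted(mode1), sorted(mode2)
```

The encoder would then transmit whichever list implies the smaller bit rate, as discussed in the following paragraphs.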
[0043] The 3D space of the scene may be presented with an octree presentation, defining the location of branch and leaf nodes and the respective colour values in terms of e.g. RGB or YUV. When the structure of the octree is known, the respective colour values will be mapped to the corresponding nodes. Based on the known structure of the octree, the branch nodes (nodes having at least one child node) can be identified and informed to the decoder. On the basis of this information, the decoder may determine the whole structure of the octree. In the other approach, the leaf nodes (nodes having no child node) can be identified and informed to the decoder. On the basis of this information, the decoder may as well determine the whole structure of the octree.
[0044] According to an embodiment, the information to be communicated to the decoder lacks information about content of the leaf nodes. As described above, the coding of voxel geometry (i.e. the structure of the octree) and their attributes (i.e. the leaf node content) are separated. For minimizing the bit rate to be transmitted to the decoder, only information about the structure of the octree is delivered to the decoder, and the leaf node contents are presented separately. The separation may provide benefits especially when considering dynamic scenes with multiple frames. In that situation, multiple frames may share the same leaf node contents while only containing information about changes in the octree structure. Thus, avoiding transmitting the same leaf node content several times reduces redundancy.
[0045] The two methods for determining the structure of the octree presentation are now illustrated more in detail using an example octree structure depicted in Figure 4. Herein, it may be assumed there is a well-defined sequential numbering (indexing) of all of the nodes present in the octree. This has practical implications with regard to how these numbers are presented as bits. To reduce the total number of bits required for the numbers, the sequential numbers may be limited, e.g., to 16 bits and a large octree may be split into several smaller octrees, each having their own internal numbering of nodes. Another approach would be to use a high number of bits for the sequential numbers for runtime use but compress the list of node numbers when writing out the data. This would make it more straightforward to handle the data in memory without wasting bits in the presentation.
[0046] The method based on determining the branch nodes, which is herein referred to as MODE1, is best suited for sparse volumes with few solid voxels. In MODE1, the branch and leaf nodes which have a voxel presentation in the volumetric content are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
[0047] Figure 4 shows an example octree structure. In an octree representation of a 3D volumetric representation of a scene, the root node represents the 3D space of the scene in question at a very coarse level. In other words, the root node always exists, otherwise the octree does not exist in the first place. Therefore, the numbering may be started from the first child node onwards. As shown in Figure 4, on the first two levels only nodes 0, 5, 13, 16, and 23 have at least one leaf node with voxel presentation in the volumetric content.
[0048] In the example of Figure 4, the third level is the lowest LOD, associated with the smallest spatial unit presented in the volumetric presentation, i.e. the third level determines the size of a voxel. On the third level, node 13 has child nodes 24, 25, 26, 27, 28, 30 and 31 as leaf nodes having voxel presentation in the volumetric content and the child node 29 as a leaf node having no voxel presentation in the volumetric content. Similarly, node 16 has a child node 32 as a leaf node having voxel presentation in the volumetric content and child nodes 33 - 39 as leaf nodes having no voxel presentation in the volumetric content, and node 23 has child nodes 40 - 43 as leaf nodes having no voxel presentation in the volumetric content and child nodes 44 - 47 as leaf nodes having voxel presentation in the volumetric content.
[0049] In MODE1 , only the leaf nodes with a voxel presentation in the volumetric content and the branch nodes with at least one leaf node with a voxel presentation in the volumetric content are reported to the decoder, and based on this information the decoder may create the whole structure of the octree up to the lowest level of the nodes.
Thereafter, the respective colour values will be mapped from the colour table to the correct leaf nodes (each leaf node presenting a voxel). MODE1 is very beneficial in the cases where most of the 3D space presented by the octree, such as a cube, is empty and there are not many nodes with children. Hence, only a small number of node numbers needs to be communicated to the decoder to present the structure of the octree.
[0050] The method based on determining the leaf nodes having no voxel presentation in the volumetric content associated to them, which is herein referred to as MODE2, is best suited for volumes with many non-empty voxels or many voxels having a colour representation. In MODE2, only the leaf nodes having no voxel presentation in the volumetric content associated to them are reported and hence, it is possible to create the octree structure based on the node numbers communicated.
[0051] In the example octree structure of Figure 4, nodes 1 - 4, 6 - 12, 14, 15, 17 - 22, 29, and 33 - 43 are leaf nodes having no voxel presentation in the volumetric content associated to them. Accordingly, the number of leaf nodes having no voxel presentation in the volumetric content associated to them is considerably higher for this octree structure compared to the number of branch or leaf nodes having voxel presentation in the volumetric content associated to them. Therefore, in such cases, the encoder decides to present the octree structure to the decoder by reporting the branch and leaf nodes having voxel presentation in the volumetric content, i.e. to operate in MODE1.
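For illustration, the Figure 4 example above can be worked through as follows; the 16-bit width per sequence number is merely the assumption discussed in paragraph [0045], not a normative value.

```python
# Worked example for the octree of Figure 4 (illustrative only).

# MODE1 report: branch and leaf nodes having voxel presentation.
mode1_nodes = ([0, 5, 13, 16, 23]                    # occupied nodes on the first two levels
               + [24, 25, 26, 27, 28, 30, 31]        # occupied children of node 13
               + [32]                                 # occupied child of node 16
               + [44, 45, 46, 47])                    # occupied children of node 23

# MODE2 report: leaf nodes having no voxel presentation.
mode2_nodes = (list(range(1, 5)) + list(range(6, 13)) + [14, 15]
               + list(range(17, 23)) + [29] + list(range(33, 44)))

BITS_PER_NUMBER = 16                                  # assumed index width, see [0045]
cost_mode1 = BITS_PER_NUMBER * len(mode1_nodes)       # 16 * 17 = 272 bits
cost_mode2 = BITS_PER_NUMBER * len(mode2_nodes)       # 16 * 31 = 496 bits

# Since cost_mode1 < cost_mode2, the encoder reports the MODE1 node list here.
```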
[0052] According to an embodiment, said information to be communicated to the decoder is decided on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly. The smaller the number of nodes, the smaller the bit rate required to transmit said information, and the mode selection may be carried out accordingly.
[0053] Figure 5 shows another example of an octree structure. In Figure 5, dark nodes with a downward arrow represent branch nodes having each of their eight child nodes as leaf nodes having voxel presentation in the volumetric content. Thus, the octree structure of Figure 5 comprises only five leaf nodes having no voxel presentation in the volumetric content: nodes 1, 3, 4, 6 and 36. Since the number of branch and leaf nodes having voxel presentation in the volumetric content is considerably higher than the number of leaf nodes having no voxel presentation in the volumetric content in the octree presentation, which is typical for volumes with a plurality of non-empty voxels or voxels having a colour representation, it is more beneficial from a compression point of view to represent the octree structure with the leaf nodes having no voxel presentation in the volumetric content as compared to the octree structure representation with the branch and leaf nodes having voxel presentation in the volumetric content. Thus, the decision in the encoder will be to operate in MODE2 and to communicate the leaf nodes.
[0054] According to an embodiment, the information to be communicated to the decoder comprises sequence numbers of the nodes. For example, in the case of Figure 5 the sequence numbers of the leaf nodes, i.e. 1, 3, 4, 6 and 36, are communicated to the decoder.
[0055] According to an embodiment, the method further comprises transmitting a flag to the decoder for indicating whether the information contains information about said branch and leaf nodes having voxel presentation in the volumetric content or information about said leaf nodes having no voxel presentation in the volumetric content. Thus, the communication between the encoder and decoder may include a flag, which may be communicated for each 3D space (cube) separately, informing the decoder whether the communicated numbers present the branch nodes or the leaf nodes. The flag may be determined, for example, as follows:
mode_Flag = 0 (MODE 1 is selected: branch and leaf nodes having voxel presentation in the volumetric content are communicated)
mode_Flag = 1 (MODE 2 is selected: leaf nodes having no voxel presentation in the volumetric content are communicated)
[0056] Since the flag is only one bit, the overhead caused by including the flag is negligible compared to the bitrate reduction achieved by introducing the two different modes, and hence, it is expected to bring bitrate reduction to the whole system.
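Purely as an illustration of how the signalled information could be laid out, the following sketch packs the mode flag, a node count and 16-bit sequence numbers into a byte stream; the field widths, the presence of an explicit count and the byte alignment of the flag (which is conceptually a single bit, as noted above) are assumptions of the sketch, not a defined syntax.

```python
import struct

# Illustrative packing of the signalled information (assumed field widths).

def pack_node_report(mode_flag, node_numbers):
    """mode_flag: 0 = MODE 1 (occupied branch/leaf nodes), 1 = MODE 2 (empty leaf nodes)."""
    header = struct.pack('<BH', mode_flag & 0x01, len(node_numbers))  # flag byte + 16-bit count
    payload = struct.pack('<%dH' % len(node_numbers), *node_numbers)  # 16-bit sequence numbers
    return header + payload

def unpack_node_report(data):
    mode_flag, count = struct.unpack_from('<BH', data, 0)
    numbers = list(struct.unpack_from('<%dH' % count, data, 3))
    return mode_flag, numbers

# Example for the octree of Figure 5: MODE 2 with empty leaf nodes 1, 3, 4, 6 and 36.
bitstream = pack_node_report(1, [1, 3, 4, 6, 36])
assert unpack_node_report(bitstream) == (1, [1, 3, 4, 6, 36])
```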
[0057] A typical volumetric 3D scene/object is composed of several thin surfaces surrounded by empty space, due to data acquisition devices recording only the surfaces of objects and surrounding environment. Consequently, the larger nodes in the octree typically have a very low density. Volumetric scenes may also have a large total number of voxels due to being captured in high resolution or simply having large dimensions. If a single presentation mode were to be chosen for the entire octree, it would likely end up being MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) due to the overall sparseness of the octree.
[0058] According to an embodiment, the method further comprises partitioning the octree into a plurality of subtrees and deciding the information to be communicated to the decoder subtree-specifically. Herein, the mode for each subtree may depend on the characteristics of the local region represented by the subtree. For example, dense regions such as nodes covering object surfaces can use the most appropriate presentation mode.
[0059] In another embodiment, the mode selection for each subtree may depend on the selected mode for the surrounding subtrees, taking into account the available prediction references from other subtrees. In this case, a rate distortion algorithm may decide which mode should be selected to present the current subtree with the least amount of information to be communicated, taking into account the modes selected for the surrounding subtrees (intra prediction).
[0060] In another embodiment, not only the surrounding subtrees, but also the subtrees which do not have any spatial connection to the current subtree may be taken into account. In this case, the mode selection of those subtrees may be used for the current subtree mode selection.
[0061] In another embodiment, the mode selection for the current subtree or any adjacent subtree at a previous time stamp may be used for mode selection. In this case, the selected mode for the current subtree at time T5 may depend on the selected mode for the current subtree at one or more of the times T0, T1, T2, T3, T4, where T0 < T1 < T2 < T3 < T4 < T5. It should be noted that the reference selected mode may go further back than T0.
[0062] The mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another/the same subtree at a previous time stamp. In this case, the similarity of the branch/leaf nodes to be communicated may help in deciding which mode to select to reduce the information that should be communicated. Similarly, the mode dependency may refer to the case where the subtree structure is similar to the one which was reported for another subtree in an adjacent (or farther away) location. Similarly, in this case, the similarity of the branch/leaf nodes to be
communicated may help in deciding which mode to select to reduce the information that should be communicated.
[0063] Due to the large number of nodes and voxels in the octree (e.g., in the order of hundreds of millions), it is beneficial for an encoder to use techniques that speed up determining which presentation mode to choose.
[0064] According to an embodiment, the partitioning may further comprise recording a subtree size of each branch node as an attribute of the branch node; mipmapping the branch nodes; determining a density component for mipmapped values of the branch nodes; and partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
[0065] Thus, the encoder may carry out the partitioning and quickly choose the presentation mode for each subtree, for example, as follows:
[0066] During the construction of the octree, the subtree size of each branch node is recorded as an attribute of the branch node. The subtree size is equal to the total number of child nodes that the branch contains.
[0067] Branches are mipmapped during the construction of the octree. That is, the attributes of child nodes are averaged, and the average values (e.g., colour) are used as the attributes of the branch node. The Alpha component of the mipmapped colours can be used to keep track of the density of each node. The Alpha of a solid voxel is considered 1.0. Therefore, the Alpha of a node is a value between 0.0 and 1.0, depending on how many solid voxels it contains.
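The mipmapping and density tracking described above could be realised, for example, as in the following sketch; the node attributes (children, is_leaf, colour, alpha) are hypothetical names, and the alpha-weighted colour averaging is one possible choice rather than a prescribed one.

```python
# Illustrative sketch only: mipmap child attributes and track density ("Alpha").

def mipmap(node):
    """Post-order mipmapping; returns (r, g, b, alpha) for the node.

    A solid leaf voxel has alpha 1.0 and a missing (empty) child contributes 0.0,
    so the alpha of a branch node is the fraction of solid voxels it contains.
    """
    if node.is_leaf():
        node.alpha = 1.0
        return (*node.colour, 1.0)
    values = [mipmap(c) if c is not None else (0.0, 0.0, 0.0, 0.0)
              for c in node.children]                         # eight slots, None = empty
    alpha = sum(v[3] for v in values) / 8.0                   # density in [0.0, 1.0]
    if alpha > 0.0:
        r = sum(v[0] * v[3] for v in values) / (8.0 * alpha)  # alpha-weighted colour average
        g = sum(v[1] * v[3] for v in values) / (8.0 * alpha)
        b = sum(v[2] * v[3] for v in values) / (8.0 * alpha)
    else:
        r = g = b = 0.0
    node.colour, node.alpha = (r, g, b), alpha
    return (r, g, b, alpha)
```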
[0068] The octree is partitioned by looking at the nodes' subtree size and Alpha (density) attributes. The operation is started at the root node and proceeds according to the following rules (an illustrative sketch of this traversal is given after the list below).
a. Low density and large size means that the encoder should look for a smaller node within the node children.
b. Low density and small size means that MODE 1 (branch and leaf nodes having voxel presentation in the volumetric content communicated) is chosen.
c. High density and large size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen, unless the subtree size exceeds the range of the index numbering. In the latter case, the encoder should look for a smaller node within the node's children.
d. High density and small size means that MODE 2 (leaf nodes having no voxel presentation in the volumetric content communicated) is chosen.
e. If none of the above is applicable, the operation will recursively continue to the children of the node. It is noted that the actual values for "low/high" density and "small/large" branches may vary depending on the situation and they may be tuned as encoder parameters.
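A compact sketch of the traversal implied by rules (a)-(e) above is given below; the concrete thresholds, the constant names and the node attributes (alpha, subtree_size, children) are assumptions, and in practice the thresholds would be tuned as encoder parameters as noted in the text.

```python
# Illustrative sketch of the subtree partitioning rules (a)-(e) above.

DENSITY_THRESHOLD = 0.5    # boundary between "low" and "high" density (Alpha)
SIZE_THRESHOLD = 4096      # boundary between "small" and "large" subtree size
MAX_INDEX = 1 << 16        # range of the sequence-number indexing (assumed 16 bits)

MODE1, MODE2 = 0, 1

def partition(node, out):
    """Appends (node, mode) pairs to 'out'; recurses where a smaller node is needed."""
    low_density = node.alpha < DENSITY_THRESHOLD
    small_size = node.subtree_size < SIZE_THRESHOLD
    if low_density and not small_size:
        for child in filter(None, node.children):     # rule (a): look for smaller nodes
            partition(child, out)
    elif low_density and small_size:
        out.append((node, MODE1))                     # rule (b)
    elif node.subtree_size < MAX_INDEX:
        out.append((node, MODE2))                     # rules (c) and (d)
    else:
        for child in filter(None, node.children):     # rule (c) overflow, or rule (e)
            partition(child, out)
    return out
```

The recursion stops as soon as a rule assigns a mode, so dense surface regions and sparse empty regions tend to end up in different subtrees with different presentation modes.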
[0069] After decoding, the subtrees are again connected together into one large octree.
[0070] According to an embodiment, the partitioning may further comprise assigning a unique ID number for each subtree; and prefixing an encoded output of a subtree with the ID number of the parent subtree and with the sequence number of the node within the parent subtree to which the child subtree attaches.
[0071] According to another embodiment, the partitioning may further comprise prefixing each subtree with the voxel coordinates of its root node.
[0072] Both of the above embodiments provide the necessary information to the decoder for reconstructing the subtrees into one large octree; however, the latter embodiment may require more bits than the ID number and parent index.
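As a sketch only, the two prefixing alternatives could be represented by headers such as the following; all names and field choices are assumptions rather than a defined syntax.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative sketch of the two subtree-prefixing alternatives ([0070], [0071]).
# All field names are hypothetical.

@dataclass
class SubtreeHeaderById:               # alternative of paragraph [0070]
    subtree_id: int                    # unique ID of this subtree
    parent_id: int                     # ID of the parent subtree
    attach_index: int                  # sequence number of the attachment node in the parent

@dataclass
class SubtreeHeaderByCoords:           # alternative of paragraph [0071]
    root_voxel: Tuple[int, int, int]   # voxel coordinates of the subtree's root node

# An ID plus an attachment index typically needs fewer bits than three full voxel
# coordinates, which reflects the observation made in paragraph [0072].
```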
[0073] As becomes evident from the above, significant advantages may be obtained through one or more of the disclosed embodiments. The method improves the
compression efficiency in both storing and delivering volumetric data. The method enables the encoder to determine the most efficient presentation mode of an octree structure and inform the decoder about the necessary data, thereby reducing the bandwidth
consumption of the network and simplifying the operation of the decoder. The
embodiments utilize the separation of the octree presentation into the structure of the octree and the attributes, and provide a straightforward improvement in compression efficiency through the simplified presentation of the structure of the octree only. The partitioning of the octree into subtrees according to embodiments enables the encoder to choose the most appropriate presentation mode for each subtree, thereby further enhancing the compression efficiency.
[0074] As mentioned above, volumetric video may be captured using one or more multi-camera devices (MCDs). Figure 6 is a simplified block diagram of a system 200 comprising a plurality of multi-camera units 130, 140, 150. Although Figure 6 only depicts three multi-camera units 130, 140, 150, the system may have two multi-camera units 130, 140 or more than three multi-camera units.
[0075] It is assumed that the system 200 has information about the location and orientation of each of the multi-camera units 130, 140, 150 of the system. The location and orientation information may have been stored into a camera database 210. This information may have been entered manually or the system 200 may comprise elements which can determine the location and orientation of each of the multi-camera units 130, 140, 150 of the system. If the location and/or the orientation of any of the multi-camera units 130, 140, 150 changes, the changed location and/or orientation information may be updated in the camera database 210. The system 200 may be controlled by a controller 202, which may be a server or another appropriate element capable of communicating with the multi-camera units 130, 140, 150 and the camera database 210.
[0076] The location and/or the orientation of the multi-camera units 130, 140, 150 may not be stored into the database 210 but only to each individual multi-camera unit 130, 140, 150. Hence, the location and/or the orientation of the multi-camera units 130, 140, 150 may be requested from the multi-camera units 130, 140, 150 when needed. As an example, if the first multi-camera unit 130 needs to know the location and orientation of the second multi-camera unit 140, the first multi-camera unit 130 may request that information from the second multi-camera unit 140. If some information regarding the second multi-camera unit 140 is still needed, the first multi-camera unit 130 may request the missing information from the controller 202, for example.
[0077] The multi-camera system, as disclosed in Figure 6, may be used to reconstruct multi-camera captured scenes in 3D if the camera locations and pose information are accurately known. Such a reconstruction's quality and coverage depend on the distribution of the cameras and their capture capabilities. Volumetric video may be captured using one or more multi-camera devices (MCDs). When multiple MCDs are in use, the captured footage may be synchronized in the controller 202 so that the MCDs provide different viewpoints in the same world. In contrast to traditional 2D/3D video, volumetric video describes a 3D model of the world where the viewer is free to move and look around to observe different parts of the world.
[0078] The following describes in further detail suitable apparatus and possible mechanisms for implementing the embodiments of the invention. In this regard reference is first made to Figure 7 which shows a schematic block diagram of an exemplary apparatus or electronic device 50 depicted in Figure 8, which may incorporate a controller according to an embodiment of the invention.
[0079] The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require transmission of radio frequency signals.
[0080] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The term battery discussed in connection with the
embodiments may also be one of these mobile energy devices. Further, the apparatus 50 may comprise a combination of different kinds of energy devices, for example a rechargeable battery and a solar cell. The apparatus may further comprise an infrared port 41 for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/FireWire wired connection.
[0081 ] The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
[0082] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a universal integrated circuit card (UICC) reader and a universal integrated circuit card for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
[0083] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 60 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
[0084] In some embodiments of the invention, the apparatus 50 comprises a camera 42 capable of recording or detecting imaging.
[0085] With respect to Figure 9, an example of a system within which embodiments of the present invention can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired and/or wireless networks including, but not limited to a wireless cellular telephone network (such as a global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), long term evolution (LTE) based network, code division multiple access (CDMA) network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
[0086] For example, the system shown in Figure 9 comprises a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
[0087] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22, a tablet computer. The apparatus 50 may be stationary or mobile when carried by an individual who is moving.
[0088] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows
communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
[0089] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global system for mobile communications (GSM), universal mobile telecommunications system (UMTS), time division multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11, Long Term Evolution wireless communication technique (LTE) and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection. In the following some example implementations of apparatuses utilizing the present invention will be described in more detail.
[0090] Although the above examples describe embodiments of the invention operating within a wireless communication device, it would be appreciated that the invention as described above may be implemented as a part of any apparatus comprising a circuitry in which radio frequency signals are transmitted and received. Thus, for example, embodiments of the invention may be implemented in a mobile phone, in a base station, in a computer such as a desktop computer or a tablet computer comprising radio frequency communication means (e.g. wireless local area network, cellular radio, etc.).
[0091] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits or any combination thereof. While various aspects of the invention may be illustrated and described as block diagrams or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
[0092] Thus, without limiting the scope to any specific implementation means, an apparatus according to an aspect comprises means for determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; means for determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; means for determining the branch nodes having at least one voxel assigned to the branch node; means for determining the leaf nodes having at least one voxel assigned to the leaf node; and means for selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
[0093] If implemented at least partly as software, an aspect may comprise, for example, a computer memory system connectable to a processor and having one or more programmable operational characteristics, said characteristics being defined through configuration by said computer based on the type of said processor, wherein said system is connectable to said processor by a bus, and wherein a programmable operational characteristic of said system is configured to determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels; determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node; determine the branch nodes having at least one voxel assigned to the branch node; determine the leaf nodes having at least one voxel assigned to the leaf node; and select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
[0094] Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
[0095] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
[0096] The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.

Claims

CLAIMS:
1. A method comprising:
determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
determining the branch nodes having at least one voxel assigned to the branch node;
determining the leaf nodes having at least one voxel assigned to the leaf node; and
selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
2. The method according to claim 1, further comprising
deciding said information to be communicated to the decoder on the basis of at least one of the following:
node structure similarity of current octree to other octrees located in adjacent or further spatial locations;
node structure similarity of current octree to node structure of current octree, adjacent, or further octrees in the previous times.
3. The method according to claim 1 or 2, wherein the information to be communicated to the decoder lacks information about content of the leaf nodes.
4. The method according to any preceding claim, further comprising
deciding said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
5. The method according to any preceding claim, wherein the information to be communicated to the decoder comprises sequence numbers of the nodes.
6. The method according to any preceding claim, further comprising
transmitting a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
7. The method according to any preceding claim, further comprising
partitioning the octree into a plurality of subtrees; and
deciding the information to be communicated to the decoder subtree-specifically.
8. The method according to claim 7, further comprising
recording a subtree size of each branch node as an attribute of the branch node;
mipmapping the branch nodes;
determining a density component for mipmapped values of the branch nodes; and
partitioning the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
9. The method according to claim 7 or 8, further comprising
assigning a unique ID number for each subtree; and
prefixing an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
10. The method according to claim 7 or 8, further comprising
prefixing each subtree with voxel coordinates of its root node.
11. An apparatus comprising
at least one processor and at least one memory, said at least one memory having code stored thereon which, when executed by said at least one processor, causes the apparatus to: determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node
representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
determine the branch nodes having at least one voxel assigned to the branch node;
determine the leaf nodes having at least one voxel assigned to the leaf node; and
select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
12. The apparatus according to claim 11, further comprising code, which when executed by said at least one processor, causes the apparatus to
decide said information to be communicated to the decoder on the basis of at least one of the following:
node structure similarity of current octree to other octrees located in adjacent or further spatial locations;
node structure similarity of current octree to node structure of current octree, adjacent, or further octrees in the previous times.
13. The apparatus according to claim 11 or 12, wherein the information to be communicated to the decoder lacks information about content of the leaf nodes.
14. The apparatus according to any of claims 11 - 13, further comprising code, which when executed by said at least one processor, causes the apparatus to
decide said information to be communicated to the decoder on the basis of the number of nodes required to enable the decoder to create the octree presentation correctly.
15. The apparatus according to any of claims 11 - 14, wherein the information to be communicated to the decoder comprises sequence numbers of the nodes.
16. The apparatus according to any of claims 11 - 15, further comprising code, which when executed by said at least one processor, causes the apparatus to
transmit a flag to the decoder for indicating whether the information contains information about said branch nodes and leaf nodes having at least one voxel assigned to them or information about said leaf nodes having no voxel assigned to them.
17. The apparatus according to any of claims 11 - 16, further comprising code, which when executed by said at least one processor, causes the apparatus to
partition the octree into a plurality of subtrees; and
decide the information to be communicated to the decoder subtree-specifically.
18. The apparatus according to claim 17, further comprising code, which when executed by said at least one processor, causes the apparatus to
record a subtree size of each branch node as an attribute of the branch node; mipmap the branch nodes;
determine a density component for mipmapped values of the branch nodes; and partition the octree into the plurality of subtrees based on the subtree size and the density component of the branch nodes.
19. The apparatus according to claim 17 or 18, further comprising code, which when executed by said at least one processor, causes the apparatus to
assign a unique ID number for each subtree; and
prefix an encoded output of a subtree with the ID number of a parent subtree and with the sequence number of the node within the parent subtree to which a child subtree attaches.
20. The apparatus according to claim 17 or 18, further comprising code, which when executed by said at least one processor, causes the apparatus to
prefix each subtree with voxel coordinates of its root node.
21. A computer readable storage medium having code stored thereon for use by an apparatus, which code, when executed by a processor, causes the apparatus to perform the method according to at least one of claims 1 to 10.
22. An apparatus comprising means for determining a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
means for determining an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
means for determining the branch nodes having at least one voxel assigned to the branch node;
means for determining the leaf nodes having at least one voxel assigned to the leaf node; and
means for selecting either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
23. An apparatus according to claim 22, further comprising means for performing the method according to at least one of claims 2 to 10.
24. A computer memory system connectable to a processor and having one or more programmable operational characteristics, wherein said system is connectable to said processor by a bus, and wherein a programmable operational characteristic of said system is configured to
determine a three-dimensional (3D) volumetric representation of a scene as a plurality of voxels;
determine an octree presentation about the plurality of voxels, said octree presentation comprising a plurality of levels, a first level comprising a root node representing the 3D space within the scene, and each subsequent level being recursively divided into a plurality of child nodes comprising branch nodes having child nodes on the next level and leaf nodes lacking child nodes, wherein each of said voxels is assigned to one node;
determine the branch nodes having at least one voxel assigned to the branch node;
determine the leaf nodes having at least one voxel assigned to the leaf node; and select either information about branch nodes and leaf nodes having at least one voxel assigned to them or information about leaf nodes having no voxel assigned to them to be communicated to a decoder on the basis of minimizing bit rate required to transmit said information.
PCT/EP2018/071091 2017-10-26 2018-08-03 A method for volumetric video encoding and decoding WO2019081087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1717574.6A GB2568232A (en) 2017-10-26 2017-10-26 A method for volumetric video encoding and decoding
GB1717574.6 2017-10-26

Publications (1)

Publication Number Publication Date
WO2019081087A1 true WO2019081087A1 (en) 2019-05-02

Family

ID=60580321

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/071091 WO2019081087A1 (en) 2017-10-26 2018-08-03 A method for volumetric video encoding and decoding

Country Status (2)

Country Link
GB (1) GB2568232A (en)
WO (1) WO2019081087A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021062530A1 (en) * 2019-10-01 2021-04-08 Blackberry Limited Angular mode syntax for tree-based point cloud coding
CN113544747A (en) * 2019-07-12 2021-10-22 腾讯美国有限责任公司 Method and device for geometrical merging mode of point cloud coding and decoding
CN114051730A (en) * 2019-07-01 2022-02-15 Lg 电子株式会社 Apparatus and method for processing point cloud data
CN114503586A (en) * 2019-10-03 2022-05-13 Lg电子株式会社 Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US20220229492A1 (en) * 2019-10-09 2022-07-21 Huawei Technologies Co., Ltd. Eye gaze tracking
RU2799041C1 (en) * 2019-12-02 2023-07-03 TEНСЕНТ АМЕРИКА ЭлЭлСи Method and device for encoding point cloud

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013117001A1 (en) * 2012-02-09 2013-08-15 Thomson Licensing Efficient compression of 3d models based on octree decomposition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100695142B1 (en) * 2004-03-08 2007-03-14 삼성전자주식회사 Adaptive nth power of 2 n-ary tree generating method, and method and apparatus for encoding and decoding 3D volume data using it

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013117001A1 (en) * 2012-02-09 2013-08-15 Thomson Licensing Efficient compression of 3d models based on octree decomposition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BAS DADO ET AL: "Geometry and Attribute Compression for Voxel Scenes", COMPUTER GRAPHICS FORUM, vol. 35, no. 2, 1 May 2016 (2016-05-01), GB, pages 397 - 407, XP055454967, ISSN: 0167-7055, DOI: 10.1111/cgf.12841 *
CHEN H H ET AL: "A SURVEY OF CONSTRUCTION AND MANIPULATION OF OCTREES", COMPUTER VISION GRAPHICS AND IMAGE PROCESSING, ACADEMIC PRESS, DULUTH, MA, US, vol. 43, no. 3, 1 September 1988 (1988-09-01), pages 409 - 431, XP000007912, DOI: 10.1016/0734-189X(88)90092-8 *
JULIUS KAMMERL ET AL: "Real-time compression of point cloud streams", ROBOTICS AND AUTOMATION (ICRA), 2012 IEEE INTERNATIONAL CONFERENCE ON, IEEE, 14 May 2012 (2012-05-14), pages 778 - 785, XP032450378, ISBN: 978-1-4673-1403-9, DOI: 10.1109/ICRA.2012.6224647 *
MEKURIA RUFAEL ET AL: "Design, Implementation, and Evaluation of a Point Cloud Codec for Tele-Immersive Video", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, US, vol. 27, no. 4, 1 April 2017 (2017-04-01), pages 828 - 842, XP011644783, ISSN: 1051-8215, [retrieved on 20170403], DOI: 10.1109/TCSVT.2016.2543039 *
RUWEN SCHNABEL ET AL: "Octree-Based Point Cloud Compression", 29 July 2006 (2006-07-29), pages 1 - 11, XP008150338, ISBN: 1-56881-352-X, Retrieved from the Internet <URL:http://cg.cs.uni-bonn.de/aigaion2root/attachments/schnabel-2006-octree.pdf> *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114051730A (en) * 2019-07-01 2022-02-15 Lg 电子株式会社 Apparatus and method for processing point cloud data
JP2022538857A (en) * 2019-07-01 2022-09-06 エルジー エレクトロニクス インコーポレイティド Point cloud data processing apparatus and method
JP7440546B2 (en) 2019-07-01 2024-02-28 エルジー エレクトロニクス インコーポレイティド Point cloud data processing device and method
CN113544747A (en) * 2019-07-12 2021-10-22 腾讯美国有限责任公司 Method and device for geometrical merging mode of point cloud coding and decoding
CN113544747B (en) * 2019-07-12 2023-08-29 腾讯美国有限责任公司 Method and device for geometric merging mode of point cloud coding and decoding
WO2021062530A1 (en) * 2019-10-01 2021-04-08 Blackberry Limited Angular mode syntax for tree-based point cloud coding
CN114503586A (en) * 2019-10-03 2022-05-13 Lg电子株式会社 Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US20220229492A1 (en) * 2019-10-09 2022-07-21 Huawei Technologies Co., Ltd. Eye gaze tracking
US11899837B2 (en) * 2019-10-09 2024-02-13 Huawei Technologies Co., Ltd. Eye gaze tracking
RU2799041C1 (en) * 2019-12-02 2023-07-03 TEНСЕНТ АМЕРИКА ЭлЭлСи Method and device for encoding point cloud

Also Published As

Publication number Publication date
GB201717574D0 (en) 2017-12-13
GB2568232A (en) 2019-05-15

Similar Documents

Publication Publication Date Title
US11889113B2 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2019081087A1 (en) A method for volumetric video encoding and decoding
WO2019076503A1 (en) An apparatus, a method and a computer program for coding volumetric video
KR20210040273A (en) Point cloud data transmission apparatus, point cloud data transmission method, point cloud data reception apparatus and point cloud data reception method
KR102609776B1 (en) Point cloud data processing method and device
KR20200140824A (en) Method and apparatus for encoding/decoding a point cloud representing a 3D object
US20230046917A1 (en) In-tree geometry quantization of point clouds
US20220366610A1 (en) Point cloud data processing method and apparatus
WO2022023002A1 (en) Methods and apparatus for encoding and decoding a 3d mesh as a volumetric content
US20230290006A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
KR20200140825A (en) Method and apparatus for encoding/decoding a point cloud representing a 3D object
CN115668919A (en) Point cloud data transmitting device, point cloud data transmitting method, point cloud data receiving device, and point cloud data receiving method
US10997795B2 (en) Method and apparatus for processing three dimensional object image using point cloud data
EP4228267A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
KR20220035229A (en) Method and apparatus for delivering volumetric video content
US20220321914A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20220191498A1 (en) Signalling Profile Level Exemptions For Immersive Video
US20230388542A1 (en) A method and apparatus for adapting a volumetric video to client devices
US20230154052A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method
US20230362409A1 (en) A method and apparatus for signaling depth of multi-plane images-based volumetric video
US20230334719A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20240153147A1 (en) V3c syntax extension for mesh compression
US20230232042A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US20230239451A1 (en) A method and apparatus for encoding and decoding volumetric content in and from a data stream
US20230281878A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device and point cloud data reception method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18750185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18750185

Country of ref document: EP

Kind code of ref document: A1