WO2024012765A1 - Method, apparatus and computer program product for video encoding and video decoding - Google Patents
Method, apparatus and computer program product for video encoding and video decoding
- Publication number
- WO2024012765A1 (PCT/EP2023/064327)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- subdivision
- triangle
- edges
- triangles
- iteration count
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/04—Texture mapping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
- G06T17/205—Re-meshing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- the present solution generally relates to encoding, signaling and rendering a volumetric video.
- Volumetric video data represents a three-dimensional scene or object.
- Such data describes geometry (shape, size, position in three-dimensional (3D) space) and respective attributes (e.g., colour, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances (like frames in two-dimensional (2D) video).
- Temporal information about the scene can be included in the form of individual capture instances, i.e., “frames” in 2D video, or other means, e.g., position of an object as a function of time.
- encoding, signalling and decoding process to generate a consistent connectivity and consistent geometry in transitions between areas that are encoded with a different number of subdivision iterations.
- a set of additional coefficients is calculated and signalled along the bitstream to improve the smoothness or fidelity of the reconstructed geometry at the decoder side, especially at the transitions between areas with a different number of subdivision iterations.
- the method generates extra vertices during the adaptive subdivision process in the encoder, which may allow generating a smoother geometry transition between high density (high subdivision iteration count) areas and low density (low subdivision iteration count) areas at the decoder side.
- the smoother transitions may provide a better prediction temporally and may enable better compression efficiency of the system.
- an apparatus comprising means for receiving one or more mesh frames from sequences representing volumetric video; means for generating base meshes from the mesh frames that are approximations of the original mesh frames but with fewer vertices; means for segmenting each base mesh into patches comprising triangles formed by edges belonging only to one triangle, edges shared by one or two triangles, and vertices connected by edges or shared edges; means for assigning each patch with a subdivision iteration count and corresponding displacement vectors; means for adaptively subdividing edges of the triangles; means for signalling the subdivision iteration count and an adaptive subdivision flag indicative of usage of the adaptive subdivision in or along the patch information; and means for encoding the sequence of base mesh frames and corresponding displacement vectors, and patch information in a bitstream.
- the apparatus comprises means for determining a subdivision iteration count for the edges, wherein the subdivision iteration count is zero, one, two, or any positive integer.
- the apparatus comprises means for creating a list of boundary triangles, which contain at least one edge connected to a triangle of another patch, wherein other triangles are inner triangles.
- the apparatus comprises means for adaptively subdividing edges of the triangles into two edges.
- the apparatus comprises means for adding additional vertices on one or more subdivided edges; means for encoding corresponding displacement vectors for the additional vertices; and means for signalling the presence of additional displacement vectors along the bitstream.
- the apparatus comprises means for post-processing on a transition area to decimate the resulting triangulation.
- an apparatus comprising at least one processor, a memory and a computer program stored in said memory, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive one or more mesh frames from sequences representing volumetric video; generate base meshes from the mesh frames that are approximations of the original mesh frames but with fewer vertices; segment each base mesh into patches comprising triangles formed by edges belonging only to one triangle, edges shared by one or two triangles, and vertices connected by edges or shared edges; assign each patch with a subdivision iteration count and corresponding displacement vectors; adaptively subdivide edges of the triangles; signal the subdivision iteration count and an adaptive subdivision flag indicative of usage of the adaptive subdivision in or along the patch information; and encode the sequence of base mesh frames and corresponding displacement vectors, and patch information in a bitstream.
- a method comprising receiving one or more mesh frames from sequences representing volumetric video; generating base meshes from the mesh frames that are approximations of the original mesh frames but with fewer vertices; segmenting each base mesh into patches comprising triangles formed by edges belonging only to one triangle, edges shared by one or two triangles, and vertices connected by edges or shared edges; assigning each patch with a subdivision iteration count and corresponding displacement vectors; adaptively subdividing edges of the triangles; signalling the subdivision iteration count and an adaptive subdivision flag indicative of usage of the adaptive subdivision in or along the patch information; and encoding the sequence of base mesh frames and corresponding displacement vectors, and patch information in a bitstream.
- a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to receive one or more mesh frames from sequences representing volumetric video; generate base meshes from the mesh frames that are approximations of the original mesh frames but with fewer vertices; segment each base mesh into patches comprising triangles formed by edges belonging only to one triangle, edges shared by one or two triangles, and vertices connected by edges or shared edges; assign each patch with a subdivision iteration count and corresponding displacement vectors; adaptively subdivide edges of the triangles; signal the subdivision iteration count and an adaptive subdivision flag indicative of usage of the adaptive subdivision in or along the patch information; and encode the sequence of base mesh frames and corresponding displacement vectors, and patch information in a bitstream.
- an apparatus comprising means for obtaining a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; means for decoding base meshes from the bitstream that are approximations of the original mesh frames but with fewer vertices, decoding the corresponding displacement vectors, and decoding the encoded patch information; means for extracting a subdivision iteration count and an adaptive subdivision flag from the patch information; means for reconstructing the base meshes and applying the corresponding displacement vectors to generate mesh frames; and means for adaptively subdividing edges of the triangles of the mesh frames.
- a method comprising obtaining a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; decoding base meshes from the bitstream that are approximations of the original mesh frames but with fewer vertices, decoding the corresponding displacement vectors, and decoding the encoded patch information; extracting a subdivision iteration count and an adaptive subdivision flag from the patch information; reconstructing the base meshes and applying the corresponding displacement vectors to generate mesh frames; and adaptively subdividing edges of the triangles of the mesh frames.
- an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: obtain a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; decode base meshes from the bitstream that are approximations of the original mesh frames but with fewer vertices, decode the corresponding displacement vectors, and decode the encoded patch information; extract a subdivision iteration count and an adaptive subdivision flag from the patch information; reconstruct the base meshes and apply the corresponding displacement vectors to generate mesh frames; and adaptively subdivide edges of the triangles of the mesh frames.
- a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to obtain a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; decode base meshes from the bitstream that are approximations of the original mesh frames but with fewer vertices, decode the corresponding displacement vectors, and decode the encoded patch information; extract a subdivision iteration count and an adaptive subdivision flag from the patch information; reconstruct the base meshes and apply the corresponding displacement vectors to generate mesh frames; and adaptively subdivide edges of the triangles of the mesh frames.
- the computer program product is embodied on a non-transitory computer readable medium.
- Fig. 1a shows an example of volumetric media conversion at an encoder
- Fig. 1b shows an example of a volumetric media reconstruction at a decoder
- Fig. 2 shows an example of block to patch mapping
- Figs. 3a - 3c show examples of an atlas coordinate system; a local 3D patch coordinate system; and a final target coordinate system;
- Fig. 4 shows examples of different types of elements for creation of objects with polygon meshes
- Fig. 5 shows extensions to the V-PCC encoder to support mesh encoding
- Fig. 6 shows extensions to the V-PCC decoder to support mesh decoding
- Fig. 7 illustrates an example in which each triangle is converted into four triangles by connecting the triangle edge midpoints
- Fig. 8 shows an example of multi-resolution analysis of a mesh
- Fig. 9 illustrates a pre-processing module and an encoder module according to an embodiment
- Fig. 10 shows an example of decimation, uv-atlas isocharting and subdivision surface fitting, in accordance with an approach
- Fig. 11 illustrates an intra frame encoder scheme according to an embodiment
- Fig. 12 illustrates an inter frame encoder scheme according to an embodiment
- Fig. 13 shows a decoding process according to an embodiment
- Fig. 14 illustrates a decoding process in intra mode according to an embodiment
- Fig. 15 illustrates a decoding process in inter mode according to an embodiment
- Figs. 16a — 16c illustrate an example of a non-uniform subdivision
- Fig. 16d illustrates a base mesh, the mesh subdivided once, and the mesh subdivided twice;
- Figs. 17a — 17c illustrate how enabling non-uniform subdivision may introduce cracks between triangles that belong to areas being subdivided at different levels
- Fig. 18 shows an example of two patches with different subdivision levels, and adaptive triangle splits to avoid T-junctions
- Fig. 19 illustrates an adaptive subdivision dictionary
- Fig. 20 illustrates how elongated triangles caused by adaptive triangle splits on a high-valence vertex may lead to artefacts at a rendering stage
- Figs. 21, 22 and 23 illustrate an example of a dictionary
- Figs. 24a — 24d illustrate an example of an adaptive subdivision of prior art
- Figs. 25a — 25d illustrate an example of an adaptive subdivision according to an embodiment
- Fig. 26a is a flowchart illustrating a method according to an embodiment
- Fig. 26b is a flowchart illustrating a method according to another embodiment.
- Fig. 27 shows an apparatus according to an embodiment.
- the present embodiments relate to the encoding, signalling and rendering of volumetric video.
- 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes.
- Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data.
- Representation of the 3D data depends on how the 3D data is used.
- Dense voxel arrays have been used to represent volumetric medical data.
- polygonal meshes are extensively used.
- Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
- Another way to represent 3D data is to code this 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.
- Visual volumetric video comprising a sequence of visual volumetric frames, if uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.
- V3C enables the encoding and decoding processes of a variety of volumetric media by using video and image coding technologies. This is achieved through first a conversion of such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C video components, before coding such information.
- Such representations may include occupancy, geometry, and attribute components.
- the occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation.
- the geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g., texture or material information, of such 3D data.
- An example is shown in Figures 1a and 1b, where Figure 1a presents volumetric media conversion at an encoder, and Figure 1b presents volumetric media reconstruction at a decoder side.
- the 3D media is converted to a series of 2D representations: occupancy, geometry, and attributes. Additional information may also be included in the bitstream to enable inverse reconstruction.
- An atlas consists of multiple elements, called patches. Each patch identifies a region in all available 2D components and contains information necessary to perform the appropriate inverse projection of this region back to the 3D space. The shape of such regions is determined through a 2D bounding box associated with each patch as well as their coding order. The shape of these regions is also further refined after the consideration of the occupancy information. Atlases may be partitioned into patch packing blocks of equal size.
- Figure 2 shows an example of block to patch mapping with 4 projected patches onto an atlas when asps_patch_precedence_order_flag is equal to 0. Projected points are represented with dark grey. The area that does not contain any projected points is represented with light grey. Patch packing blocks are represented with dashed lines. The number inside each patch packing block represents the patch index of the patch to which it is mapped.
- Axes orientations are specified for internal operations. For instance, the origin of the atlas coordinates is located on the top-left corner of the atlas frame. For the reconstruction step, an intermediate axes definition for a local 3D patch coordinate system is used. The 3D local patch coordinate system is then converted to the final target 3D coordinate system using appropriate transformation steps.
- Figure 3a shows an example of a single patch packed onto an atlas image.
- This patch is then converted to a local 3D patch coordinate system (U, V, D) defined by the projection plane with origin O’, tangent (U), bi-tangent (V), and normal (D) axes.
- the projection plane is equal to the sides of an axis-aligned 3D bounding box, as shown in Figure 3b.
- the location of the bounding box in the 3D model coordinate system, defined by a left-handed system with axes (X, Y, Z), can be obtained by adding offsets TilePatch3dOffsetU, TilePatch3dOffsetV, and TilePatch3dOffsetD, as illustrated in Figure 3c.
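- As a rough illustration of this conversion, the following C++ sketch adds the decoded patch offsets along the tangent, bi-tangent and normal axes to map a sample back to the model coordinate system; the struct, field names and axis handling are assumptions for illustration, not the normative ISO/IEC 23090-5 process:

```cpp
#include <array>

// Illustrative only: the struct and axis handling are assumed, not normative.
struct PatchParams {
    int axisU, axisV, axisD;        // indices 0..2 selecting X, Y or Z
    int offsetU, offsetV, offsetD;  // TilePatch3dOffsetU / V / D
};

// Map a sample at local patch coordinates (u, v) with depth d into the 3D
// model coordinate system by adding the patch offsets along the tangent (U),
// bi-tangent (V) and normal (D) axes of the projection plane.
std::array<int, 3> toModelCoords(const PatchParams& p, int u, int v, int d) {
    std::array<int, 3> pos{};
    pos[p.axisU] = p.offsetU + u;
    pos[p.axisV] = p.offsetV + v;
    pos[p.axisD] = p.offsetD + d;
    return pos;
}
```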
- V3C may be used by applications targeting volumetric content.
- One such application is MPEG immersive video (MIV) (ISO/IEC 23090-12).
- MIV enables volumetric video coding for applications in which a scene is recorded with multiple RGB(D) (red, green, blue, and optionally depth) cameras with overlapping fields of view (FoVs).
- One example setup is a linear array of cameras pointing towards a scene. This multiscopic view of the scene allows a 3D reconstruction and therefore 6DoF/3DoF+ consumption.
- MIV uses the patch data unit concept from V3C and extends it by using camera views for reprojection.
- Coded V3C video components are referred to in this disclosure as video bitstreams, while a coded atlas is referred to as the atlas bitstream.
- Video bitstreams and atlas bitstreams may be further split into smaller units, referred to here as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream.
- V3C patch information is contained in the atlas bitstream, atlas_sub_bitstream(), which contains a sequence of NAL (Network Abstraction Layer) units.
- A NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are contained in NAL units, each of which contains an integer number of bytes.
- a NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical, except that in the sample stream format specified in Annex D of ISO/IEC 23090-5 each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.
- NAL units in the atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units.
- nal_unit_type specifies the type of the RBSP data structure contained in the NAL unit as specified in Table 4 of ISO/IEC 23090-5.
- nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies.
- the value of nal_layer_id shall be in the range of 0 to 62, inclusive.
- the value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (i.e., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.
- rbsp_byte[ i ] is the i-th byte of an RBSP.
- An RBSP is specified as an ordered sequence of bytes as follows:
- If the SODB (string of data bits) is empty (i.e., zero bits in length), the RBSP is also empty.
- Otherwise, the RBSP contains the SODB as follows:
- The first byte of the RBSP contains the first (most significant, left-most) eight bits of the SODB; the next byte of the RBSP contains the next eight bits of the SODB, etc., until fewer than eight bits of the SODB remain.
- The rbsp_trailing_bits( ) syntax structure is present after the SODB as follows:
- The first (most significant, left-most) bits of the final RBSP byte contain the remaining bits of the SODB (if any).
- The next bit consists of a single bit equal to 1 (i.e., rbsp_stop_one_bit).
- One or more cabac_zero_word 16-bit syntax elements equal to 0x0000 may be present in some RBSPs after the rbsp_trailing_bits( ) at the end of the RBSP.
- Syntax structures having these RBSP properties are denoted in the syntax tables using an "_rbsp" suffix. These structures are carried within NAL units as the content of the rbsp_byte[ i ] data bytes. As an example, the following may be considered as typical content:
- atlas_frame_parameter_set_rbsp( ), which is used to carry parameters related to the atlas on a frame level that are valid for one or more atlas frames.
- sei_rbsp( ), used to carry SEI (Supplemental Enhancement Information) messages in NAL units.
- the decoder can extract the SODB from the RBSP by concatenating the bits of the bytes of the RBSP and discarding the rbsp_stop_one_bit, which is the last (least significant, right-most) bit equal to 1, and discarding any following (less significant, farther to the right) bits that follow it, which are equal to 0.
- the data necessary for the decoding process is contained in the SODB part of the RBSP.
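- The extraction rule described above can be sketched in a few lines of C++; this is an illustrative reading of the rule, not code from the specification:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Locate the rbsp_stop_one_bit (the last bit equal to 1) and return how many
// SODB bits precede it; those bits, read bytewise MSB-first, are the SODB.
size_t extractSodbBitCount(const std::vector<uint8_t>& rbsp) {
    for (size_t i = rbsp.size(); i-- > 0; ) {
        if (rbsp[i] == 0) continue;              // skip trailing zero bytes
        for (int bit = 0; bit < 8; ++bit) {      // bit 0 = least significant
            if (rbsp[i] & (1u << bit))
                return i * 8 + (7 - bit);        // bits before the stop bit
        }
    }
    return 0;  // empty RBSP: the SODB is also empty
}
```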
- atlas_tile_group_layer_rbsp( ) contains metadata information for a list of tile groups, which represent sections of a frame. Each tile group may contain several patches for which the metadata syntax is described below.
- Annex F of the V3C/V-PCC specification (ISO/IEC 23090-5) describes different SEI messages that have been defined for V3C MIV purposes. SEI messages assist in processes related to decoding, reconstruction, display, or other purposes. Annex F (23090-5) defines two types of SEI messages: essential and non-essential. V3C SEI messages are signalled in sei_rbsp( ), which is documented below.
- Non-essential SEI messages are not required by the decoding process. Conforming decoders are not required to process this information for output order conformance.
- When present in the bitstream, non-essential SEI messages shall obey the syntax and semantics as specified in Annex F (23090-5). When the content of a non-essential SEI message is conveyed for the application by some means other than presence within the bitstream, the representation of the content of the SEI message is not required to use the same syntax specified in Annex F (23090-5). For the purpose of counting bits, only the appropriate bits that are actually present in the bitstream are counted.
- Essential SEI messages are an integral part of the V3C bitstream and should not be removed from the bitstream.
- the essential SEI messages are categorized into two types:
- Type-A essential SEI messages: these SEIs contain information required to check bitstream conformance and for output timing decoder conformance. Every V3C decoder conforming to point A should not discard any relevant Type-A essential SEI messages and shall consider them for bitstream conformance and for output timing decoder conformance.
- Type-B essential SEI messages: V3C decoders that wish to conform to a particular reconstruction profile should not discard any relevant Type-B essential SEI messages and shall consider them for 3D point cloud reconstruction and conformance purposes.
- the V3C specification was designed so that amendments or new editions can be created in the future.
- a number of fields for future extensions to parameter sets were reserved.
- V3C introduced an extension in the VPS related to MIV and the packed video component.
- a polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modelling.
- the faces usually consist of triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but may also be more generally composed of concave polygons, or even polygons with holes.
- Objects created with polygon meshes are represented by different types of elements. These include vertices, edges, faces, polygons and surfaces as shown in Fig. 4. In many applications, only vertices, edges and either faces or polygons are stored.
- Polygon meshes are defined by the following elements:
- Vertex: a position in 3D space defined as (x, y, z), along with other information such as color (r, g, b), normal vector and texture coordinates.
- Edge: a connection between two vertices.
- Face: a closed set of edges, in which a triangle face has three edges, and a quad face has four edges.
- Polygon: a coplanar set of faces. In systems that support multi-sided faces, polygons and faces are equivalent. Mathematically a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.
- Groups: some mesh formats contain groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.
- UV coordinates: most mesh formats also support some form of UV coordinates, which are a separate 2D representation of the mesh “unfolded” to show what portion of a 2-dimensional texture map applies to different polygons of the mesh. It is also possible for meshes to contain other vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).
- When rendering volumetric content, such as V-PCC or MIV content, a choice can be made to either render the content as points or meshes.
- the position of the point and the colour of the point are defined by the geometry and texture maps (i.e., V3C video components) encoded as video.
- This type of content should ideally be rendered as points, considering that every encoded pixel represents a vertex and its colour.
- mobile GPU architecture is, however, not ideal for rendering point primitives, due to the high number of points required to represent solid surfaces. Instead, better quality and performance are achieved through mesh-based representations, where a surface is described more conservatively using fewer vertices that are connected through faces. High visual detail is preserved through a high-resolution texture map.
- Triangle mesh projection and rasterization is very efficient on mobile GPUs and provides a suitable trade-off for faster and more efficient rendering in exchange for a reduction in geometric detail.
- Current generation HW is highly optimized for such pipelines and mesh-based rendering remains dominant.
- the quality gains for mesh-based rendering are even more significant on mobile HW, where the limited capacity of batteries and the constrained cooling capabilities set additional limitations in terms of power consumption.
- Figure 5 and Figure 6 show the extensions to the V-PCC encoder 500 and decoder 600 to support mesh encoding and mesh decoding, respectively.
- the input mesh data is demultiplexed 502 into vertex coordinate+attributes and vertex connectivity.
- the vertex coordinate+attributes data is coded using MPEG-I V-PCC encoder 504, whereas the vertex connectivity data is coded as auxiliary data. Both of these, i.e. coded vertex coordinate+attributes data and the vertex connectivity data, are multiplexed 506 to create the final compressed output bitstream 508.
- Vertex ordering 510 is carried out on the reconstructed vertex coordinates at the output of MPEG-I V-PCC to reorder the vertices for optimal vertex connectivity encoding 512.
- the input bitstream is demultiplexed 602 to separate the compressed bitstreams for vertex coordinates+attributes and vertex connectivity.
- the vertex coordinates+attributes data is decoded using MPEG-I V-PCC decoder 604.
- Vertex reordering 606 is carried out on the reconstructed vertex coordinates at the output of MPEG-I V-PCC decoder 604 to match the vertex order at the encoder.
- the vertex connectivity data is also decoded 608 and everything is multiplexed 610 to generate the reconstructed mesh 612.
- Mesh data may also be compressed directly, without projecting it into 2D planes as in V-PCC based mesh coding.
- the main idea of such an algorithm is to traverse mesh triangles in a deterministic way so that each new triangle is encoded next to an already encoded triangle. This may enable prediction of vertex-specific information from the previously encoded data by adding a delta to the previous data. Symbols may be utilized to signal how each new triangle is connected to the previously encoded part of the mesh. Connecting triangles in such a way results on average in 1 to 2 bits per triangle when combined with existing binary encoding techniques.
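- A minimal C++ sketch of the delta idea follows; real traversal-based coders use stronger predictors (e.g., parallelogram prediction) and entropy coding, so this is only an illustration:

```cpp
#include <cstdint>
#include <vector>

struct Vtx { int32_t x, y, z; };

// Delta coding along a deterministic traversal order: each vertex is
// predicted from the previously encoded one and only the residual is stored.
std::vector<Vtx> deltaEncode(const std::vector<Vtx>& traversalOrder) {
    std::vector<Vtx> residuals;
    Vtx prev{0, 0, 0};
    for (const Vtx& v : traversalOrder) {
        residuals.push_back({v.x - prev.x, v.y - prev.y, v.z - prev.z});
        prev = v;
    }
    return residuals;  // the decoder reverses this by cumulative summation
}
```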
- V-DMC: video-based dynamic mesh coding.
- the deformed mesh obtained by m_n(i) + d_n(i), i.e., by adding the displacement vectors d_n(i) to the subdivided mesh vertices m_n(i), generates the best approximation of the original mesh at that resolution, given the base mesh and prior subdivision levels.
- the displacement vectors may undergo a lazy wavelet transform prior to compression.
- the attribute map of the original mesh is transferred to the deformed mesh at the highest resolution (i.e., subdivision level) such that texture coordinates are obtained for the deformed mesh and a new attribute map is generated.
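- The subdivision-plus-displacement reconstruction described above can be illustrated as follows in C++; the data layout and helper names are assumptions, not the V-DMC reference implementation:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct V3 { float x, y, z; };
using Tri = std::array<uint32_t, 3>;

// One uniform midpoint subdivision iteration: each triangle is split into
// four by connecting its edge midpoints; an edge->midpoint map guarantees
// that a shared edge produces a single new vertex.
void subdivideOnce(std::vector<V3>& verts, std::vector<Tri>& tris) {
    std::map<std::pair<uint32_t, uint32_t>, uint32_t> midOfEdge;
    auto mid = [&](uint32_t a, uint32_t b) -> uint32_t {
        std::pair<uint32_t, uint32_t> key = std::minmax(a, b);
        auto it = midOfEdge.find(key);
        if (it != midOfEdge.end()) return it->second;
        verts.push_back({(verts[a].x + verts[b].x) * 0.5f,
                         (verts[a].y + verts[b].y) * 0.5f,
                         (verts[a].z + verts[b].z) * 0.5f});
        return midOfEdge[key] = uint32_t(verts.size() - 1);
    };
    std::vector<Tri> out;
    for (const Tri& t : tris) {
        uint32_t m01 = mid(t[0], t[1]), m12 = mid(t[1], t[2]), m20 = mid(t[2], t[0]);
        out.push_back({t[0], m01, m20});
        out.push_back({t[1], m12, m01});
        out.push_back({t[2], m20, m12});
        out.push_back({m01, m12, m20});  // central triangle
    }
    tris = std::move(out);
}

// Deformed mesh m_n(i) + d_n(i): add the decoded displacement to each vertex.
void applyDisplacements(std::vector<V3>& verts, const std::vector<V3>& d) {
    for (size_t i = 0; i < verts.size() && i < d.size(); ++i) {
        verts[i].x += d[i].x;
        verts[i].y += d[i].y;
        verts[i].z += d[i].z;
    }
}
```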
- the encoding process can be separated into two main modules: the pre-processing module and the actual encoder module as illustrated in Fig. 9, in accordance with an approach.
- the encoder is composed of a pre-processing module that generates a base mesh and the displacement vectors, given the input mesh sequence and its attribute maps.
- the encoder module generates the compressed bitstream by ingesting the inputs and outputs of the pre-processing module.
- the pre-processing mainly comprises three steps: decimation (reducing the original mesh resolution to produce a base mesh), uv-atlas isocharting (creating a parameterization of the base mesh) and subdivision surface fitting, as illustrated in Figure 10, in accordance with an approach.
- Figure 12 illustrates an inter frame encoder scheme, similar to the intra case, but with the base mesh connectivity being constrained for all frames of a group of frames.
- a motion encoder is used to efficiently encode displacements between base meshes compared to the base mesh of the first frame of the group of frames.
- the base mesh connectivity of the first frame of a group of frames is imposed on the subsequent frames’ base meshes to improve compression performance.
- a sub-bitstream that contains all metadata required to decode and reconstruct the mesh sequence based on the aforementioned sub-bitstreams.
- the signalling of the metadata is based on the V3C syntax and includes necessary extensions that are specific to meshes.
- the decoding process is illustrated on Figure 13, in accordance with an approach.
- First, the compressed bitstream is demultiplexed and decoded by a decoder module 130 into sub-bitstreams that are reconstructed, i.e., metadata, the reconstructed base mesh, reconstructed displacements and the reconstructed attribute map data.
- the reconstruction of the mesh sequence is performed based on that data in the postprocessing module 131.
- Both the decoder module 130 and the post-processing module 131 may be controlled by an application module 132.
- Figs. 14 and 15 illustrate the decoding process in INTRA and INTER mode, respectively.
- the signalling of the metadata and substreams produced by the encoder and ingested by the decoder was proposed as an extension of V3C in the technical submission to the dynamic mesh coding CfP, and should be considered as purely indicative for the moment. It mainly consists of additional V3C unit header syntax, additional V3C unit payload syntax, and a mesh intra patch data unit.
- V3C unit header syntax is illustrated in the following:
- V3C unit payload syntax is illustrated in the following:
- Mesh Intra patch data unit syntax is illustrated in the following:
- the dynamic mesh is encoded based on a multi-resolution approach that first encodes a low-resolution mesh, called “base mesh”, that is then enhanced in resolution and quality with mesh subdivision steps and vertex displacement vectors that may further be encoded with a lazy wavelet transform, as explained earlier in this document.
- the mesh subdivision step is performed uniformly over the whole base mesh.
- Figures 16a to 16c illustrate an example of a non-uniform subdivision.
- meshes typically do not have a uniform density of triangles, but may have denser areas to convey more detail on salient areas such as the face of a person.
- the density becomes more uniform, an example of which is illustrated in Fig. 16b.
- Uniform subdivision (here represented without recomputing vertex displacements) does not lead to a similar sampling when compared to the original mesh, an example of which is illustrated in Fig. 16c. It is however possible to define different iteration counts for the subdivision per triangle of the base mesh, and therefore obtain a denser sampling in selected areas of the mesh, as in the original mesh.
- Figure 16d illustrates an example of a base mesh S0, the base mesh subdivided once S1, and the base mesh subdivided twice S2.
- FIG. 17a shows a base mesh triangle.
- triangle T'0 is not subdivided and is now noted T0, while T'1 is subdivided into four triangles (T1, T2, T3, T4), also creating three new vertices (v4, v5, v6).
- the new vertex v5 does not belong to T0.
- a crack appears after quantization of vertex coordinates (after compression) due to the displacement of vertex v5.
- An empty space X (the hashed area) appears between triangles T0, T1 and T4.
- Figure 18 shows an example of two patches with different subdivision levels, and adaptive triangle splits to avoid T-junctions.
- the lowest vertex (illustrated as a hashed circle) is connected to elongated triangles that are not optimal for texture mapping, shading and rendering.
- Figure 19 illustrates an adaptive subdivision dictionary (thicker lines): four cases may occur for a triangle, with (a) three edges to be subdivided (regular case), (b) two edges to be subdivided and one not, (c) one edge to be subdivided and two not, and finally (d) no edge to be subdivided.
- the subdivision dictionary with four cases as shown in Figure 19 may have the advantage that it does not add vertices, while ensuring consistent connectivity.
- the approach in cases (b) and (c) of Figure 19 may, however, create very elongated triangles around vertices that have many neighbors. This may create a representation of the geometry that is not suitable for texture mapping or rendering, as it creates aliasing or ripple effects as illustrated in Figure 20.
- Elongated triangles caused by adaptive triangle splits on a high-valence (i.e., high number of neighbors) vertex (left) may lead to artefacts (right) at the rendering stage, often called aliasing (or ripple) effects.
- the input mesh sequence has been pre-processed, creating base meshes, patches, subdivision levels and displacement vectors for each frame.
- the pre-processing block (e.g. the block 91 in Fig. 9) generates base meshes that are good approximations of the original mesh frames but with fewer vertices, which means that they are at a lower resolution.
- the pre-processing is performed in such a way that the base mesh topology is the same for all frames inside the given GoF.
- each base mesh is assumed to have been segmented into patches (clusters of connected triangles), in a consistent manner over the whole GoF in Random Access conditions or even on the whole sequence if the base mesh connectivity is preserved over the whole sequence.
- each patch is assigned with a subdivision iteration count and corresponding displacement vectors such that the reconstructed mesh may have a higher resolution in some areas, corresponding to patches with higher subdivision iteration count, than others.
- the base mesh m(i), the displacement vectors d(i) and the attribute map data A(i) may be encoded by an encoder 92.
- the adaptive subdivision is performed both at the encoder 92 and a decoder 130 (Fig. 13).
- the following representations of shared edges and edges are considered. An edge belongs to only one triangle, while a shared edge is shared by one or two triangles. Contours of patches are provided as chains of edges and shared edges. Each edge is assigned the subdivision iteration count of the patch it belongs to.
- an adaptive subdivision dictionary is used that takes into account the fact that some edges may be subdivided further, even more than once, compared to the other triangle edges. Furthermore, the creation of a minimal set of additional vertices is enabled, which further smooths the transition areas and may lead to better-shaped triangles, while improving compression performance.
- An example of the dictionary is illustrated in Figures 21, 22 and 23 and comprises 9 example cases, one for each combination of triangle exterior edge types. These edge types are defined as follows: an edge for which two or more subdivision steps need to be performed compared to the current subdivision level (indicated with reference G in Figs. 21 — 23), an edge for which one more subdivision needs to be performed compared to the current subdivision level (indicated with reference B in Figs. 21 — 23), and an edge for which no subdivision will be further performed (indicated with reference R in Figs. 21 — 23).
- an “rgb” code is assigned to a current triangle where r is the number of red edges in the triangle, g is the number of green edges, and b is the number of blue edges in the triangle.
- the code ‘300’ corresponds to a triangle with three red exterior edges, i.e., one that requires no further subdivision.
- The derivation process of the initial subdivision iteration count (noted sub_ic) of an edge, given patches with different subdivision iteration counts, is as follows. It is noted here that it may be trivial if all patches have the same subdivision iteration count. This is the initialization that maps a colour to the edges of triangles following the aforementioned rgb code: red if the edge does not have to be subdivided, green if the edge subdivision iteration count is above 1, and blue if the edge subdivision iteration count is equal to 1.
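- The colour initialization and the rgb code computation described above can be sketched as follows (illustrative C++, with assumed names):

```cpp
#include <array>

// Edge labels following the colour convention above: R = no further
// subdivision, G = two or more further subdivisions, B = exactly one.
enum class EdgeLabel { R, G, B };

// Initial label of an edge from its subdivision iteration count (sub_ic).
EdgeLabel initialLabel(int subIc) {
    if (subIc == 0) return EdgeLabel::R;  // no subdivision needed
    if (subIc > 1)  return EdgeLabel::G;  // more than one iteration remains
    return EdgeLabel::B;                  // exactly one iteration remains
}

// "rgb" code of a triangle: r, g and b count its red, green and blue exterior
// edges; e.g., code 300 = three red edges (no further subdivision at all).
int rgbCode(const std::array<EdgeLabel, 3>& edges) {
    int r = 0, g = 0, b = 0;
    for (EdgeLabel e : edges) {
        if (e == EdgeLabel::R) ++r;
        else if (e == EdgeLabel::G) ++g;
        else ++b;
    }
    return r * 100 + g * 10 + b;
}
```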
- the first step is to create, for each patch, a list of boundary triangles.
- boundary triangles are triangles that contain an edge which is connected to a triangle from another patch. All other triangles of the patch are considered as inner triangles. It is worth noting that if a triangle T contains a vertex that is connected to a triangle from another patch, but the edges of the triangle T are not connected to a triangle of another patch, then the triangle T is considered an inner triangle.
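- An illustrative C++ sketch of this boundary/inner classification (names and data layout assumed):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Tri = std::array<uint32_t, 3>;

// A triangle is a boundary triangle if at least one of its edges is shared
// with a triangle from a different patch; sharing only a vertex is not
// enough. patchOf[t] gives the patch index of triangle t.
std::vector<char> markBoundaryTriangles(const std::vector<Tri>& tris,
                                        const std::vector<int>& patchOf) {
    // Map each undirected edge to the patches of the triangles using it.
    std::map<std::pair<uint32_t, uint32_t>, std::vector<int>> edgePatches;
    for (size_t t = 0; t < tris.size(); ++t)
        for (int k = 0; k < 3; ++k) {
            std::pair<uint32_t, uint32_t> e =
                std::minmax(tris[t][k], tris[t][(k + 1) % 3]);
            edgePatches[e].push_back(patchOf[t]);
        }
    std::vector<char> isBoundary(tris.size(), 0);
    for (size_t t = 0; t < tris.size(); ++t)
        for (int k = 0; k < 3; ++k) {
            std::pair<uint32_t, uint32_t> e =
                std::minmax(tris[t][k], tris[t][(k + 1) % 3]);
            for (int p : edgePatches[e])
                if (p != patchOf[t]) isBoundary[t] = 1;  // edge crosses patches
        }
    return isBoundary;
}
```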
- Figure 21 shows an adaptive subdivision dictionary with rgb codes 300, 003 and 030. These are the cases where subdivision either does not happen (300) or is regular (003 and 030); no extra vertices are created.
- Figure 22 shows an adaptive subdivision dictionary with codes 021, 201 and 102.
- For code 021, subdivision is regular in the triangles that are not connected to the blue edge.
- For code 201, a single edge is added to connect the midpoint of the blue edge with the opposite vertex of the triangle.
- For code 102, two edges are added to connect the midpoints of the blue edges. No extra vertices are created for these three cases.
- Figure 23 shows an adaptive subdivision dictionary with codes 120, 111 and 210.
- For code 120, a midpoint vertex is created on the red edge to allow an almost uniform triangulation, close to code 021.
- For code 111, a midpoint vertex is created on the red edge; the first subdivision of the triangle is regular, while the second iteration only adds two edges to connect vertices from the green edge.
- For code 210, two midpoint vertices are created for the two red edges, leading to a triangulation that is similar to case 111.
- the additional vertices that are added during the processing of triangles matching the cases with codes 120, 111 and 210 can be interpolated by the decoder and encoder without additional displacement vectors being created, encoded or decoded, by computing the position of these vertices as the midpoint of their respective edge in the triangle.
- introducing a displacement vector for these extra vertices may improve the quality of the mesh reconstruction and encoding.
- the encoder may include in the patch data unit the positions of the extra vertices to be added, e.g., as raw data.
- the decoder, even without a vertex id, can use this extra vertex position information, for example by finding the nearest extra point obtained by simple midpoint prediction, with a kd-tree-based nearest neighbor search.
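- A simplified C++ sketch of that matching step; a linear scan replaces the kd-tree for brevity:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct V3 { float x, y, z; };

// Decoder-side matching: predicted midpoints are computed first, then each
// received raw extra-vertex position is snapped to the nearest predicted
// midpoint. A kd-tree (as mentioned above) would make this sublinear.
size_t nearestMidpoint(const V3& raw, const std::vector<V3>& midpoints) {
    size_t best = 0;
    float bestD = std::numeric_limits<float>::max();
    for (size_t i = 0; i < midpoints.size(); ++i) {
        const float dx = raw.x - midpoints[i].x;
        const float dy = raw.y - midpoints[i].y;
        const float dz = raw.z - midpoints[i].z;
        const float d = dx * dx + dy * dy + dz * dz;  // squared distance
        if (d < bestD) { bestD = d; best = i; }
    }
    return best;  // index of the predicted extra vertex this position refines
}
```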
- Figures 24a — 24d illustrate an example of an adaptive subdivision without utilizing the procedure disclosed above.
- the central triangle belongs to a patch that is not to be subdivided, while the left and right triangles are subdivided with three iterations.
- a prior art dictionary is applied for a first subdivision iteration.
- the prior art dictionary is applied after two additional subdivisions.
- elongated triangles produced by this method are painted in dark grey.
- Figures 25a — 25d illustrate the same example for the four cases of dictionaries according to the procedure described above.
- the prior art leads to suboptimal triangulations.
- the triangle in the middle is classified as rgb code 120 and is therefore subdivided.
- the resulting exterior triangles are classified as 120, 210 and 210 respectively, while the interior triangle is classified as 003 as its neighbors will undergo a subdivision.
- In Figure 25c it can be seen that applying the disclosed dictionary generates 7 new vertices.
- a post-processing can be applied to decimate the result so as to reduce the number of neighbors of one of the extra vertices. This improves the triangle shapes of the end result.
- the approach disclosed in this disclosure may require fewer additional vertices than a regular subdivision, while creating triangles that have good shapes. Because, in some cases, a few extra vertices may have too many neighbors, an additional post-processing step is applied on the transition area to decimate the resulting triangulation, leading to an even better result, with fewer additional vertices.
- the post-processing step can be implemented with a simplification algorithm based on a quadric error metric, like the one used to generate the base mesh in the pre-processing step, but other choices are also possible.
- Displacement vectors can be computed for the extra generated vertices with the approach disclosed above. Their wavelet transform can be computed as for other vertices, for example with the midpoint prediction.
- Signalling of the approach includes at least the following signals:
- adaptive_subdivision_flag, having a value indicative of either usage or non-usage of the adaptive subdivision.
- another flag such as adaptive_subdivision_post_simplification may also be signalled to indicate if postprocessing simplification is used as well.
- the simplification algorithm can be signalled on the parameter set level as well.
- the simplification algorithm can be signalled on a patch level.
- the simplification algorithm can be signalled in a SEI message.
- the number of additional vertices introduced by the adaptive subdivision is encoded in the patch data unit (in V3C nomenclature) as adaptive_subdivision_additional_vertices_number_minus1.
- the vertex indices of these additional vertices can be included in the patch data unit as well.
- the displacement vectors of these additional vertices are packed in the video component like other displacement vectors in the order of their respective index.
- the index of the additional vertices may be computed as follows:
- a new index may be computed as New_id = min_id * total_number_of_vertices + max_id, and all newly produced indices for all patches are sorted. The smallest of these new indices will be mapped to an id equal to mesh_vertex_count_minus1 + 1, and so forth, to produce a consistent indexing.
- the generated indices may be used to order displacement vectors or wavelet coefficients at the end of their respective patch in the displacement video component.
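- The index derivation just described can be sketched as follows (illustrative C++; duplicate New_id values arising from an edge shared by two patches are mapped once):

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// For a vertex added on edge (min_id, max_id):
//   New_id = min_id * total_number_of_vertices + max_id.
// All New_id values are sorted; the smallest is remapped to
// mesh_vertex_count_minus1 + 1, the next to + 2, and so forth.
std::map<uint64_t, uint32_t> assignAdditionalIndices(
    const std::vector<std::pair<uint32_t, uint32_t>>& edges,  // (min_id, max_id)
    uint64_t totalVertexCount, uint32_t meshVertexCountMinus1) {
    std::vector<uint64_t> newIds;
    for (const auto& e : edges)
        newIds.push_back(uint64_t(e.first) * totalVertexCount + e.second);
    std::sort(newIds.begin(), newIds.end());
    std::map<uint64_t, uint32_t> mapping;
    uint32_t next = meshVertexCountMinus1 + 1;
    for (uint64_t id : newIds)
        if (mapping.emplace(id, next).second) ++next;  // dedupe shared edges
    return mapping;
}
```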
- asps_adaptive_subdivision_flag indicates that the adaptive subdivision process is used.
- asps_adaptive_subdivision_post_simplification, if true, indicates that a post-processing simplification filter is applied after the adaptive subdivision.
- asps_adaptive_subdivision_additional_displacements_flag indicates the presence of additional displacements to be used in the adaptive subdivision process.
- the adaptive subdivision process signaling information is included in the atlas frame parameter set (afps).
- afps_adaptive_subdivision_flag indicates that the adaptive subdivision process is used.
- afps_adaptive_subdivision_post_simplification, if afps_adaptive_subdivision_flag is true, indicates that a post-processing simplification filter is applied after the adaptive subdivision.
- afps_adaptive_subdivision_additional_displacements_flag indicates the presence of additional displacements to be used in the adaptive subdivision process.
- the rgb code used in the adaptive subdivision process may be explicitly signalled per triangle in patches that have a lower subdivision iteration count than their neighboring patches. In order to allow for enhanced coding, and because some colour indices may be more frequent than others, the signalling may be encoded using, for example, Exp-Golomb codes or arithmetic coding such as CABAC.
- the colour code can be signalled with one code value per triangle within the patch, or differentially by using the colour code difference between the colour code of the previous triangle and that of the current triangle; the order of triangles is either implicit, using the triangle indices as encoded in the base mesh, or explicit in the patch data unit by listing the order of triangles.
- mdu_adaptive_subdivision_explicit_triangle_code_present indicates that the adaptive subdivision colour code is present and signalled per triangle.
- mdu_triangle_index_diff[ tileID ][ patchIdx ] represents the triangle index difference with the previous triangle index for triangles in the current patch patchIdx of the current tile tileID.
- mdu_as_triangle_color_code[ tileID ][ patchIdx ] represents the adaptive subdivision colour code for the current triangle in the current patch patchIdx and current tile tileID.
- the color code may be also signalled per edge of the patch.
- extra vertices introduced by the adaptive subdivision process may be signalled in the bitstream with the following information.
- mdu_transform a transform
- mdu_additional_displacements_packing_mode indicates the packing mode used for the additional displacements assigned to extra vertices that are introduced in the adaptive subdivision process.
- the mdu_additional_displacements_packing_mode may take the following values: 0 for incremental packing, 1 for trailing packing, etc.
- Incremental packing of additional displacements introduced by adaptive subdivision describes a packing technique that embeds for each subdivision iteration these additional displacements (for this subdivision iteration) after the patch displacements for the current subdivision iteration.
- Trailing packing of additional displacements introduced by adaptive subdivision describes a packing technique that embeds these additional displacements after all the patch displacements. Additional displacements are sorted based on their vertex index and corresponding subdivision iteration.
- mdu_additional_displacements_count_minus1 indicates the number of additional displacements for the current patch.
- mdu_additional_vertices_prediction_mode indicates the mode used to predict the position of the additional vertices produced by the adaptive subdivision process within the patch. The possible modes may include 0 for midpoint prediction, 1 for one-ring-average prediction, 2 for quadric-energy-based prediction, etc.
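- The two packing modes can be sketched as follows (illustrative C++; the container layout is an assumption):

```cpp
#include <vector>

struct Disp { float x, y, z; };

// For each subdivision iteration: the regular patch displacements and the
// additional displacements introduced by the adaptive subdivision (already
// sorted by vertex index, as described above).
struct IterationDisps {
    std::vector<Disp> patch;
    std::vector<Disp> additional;
};

// Mode 0 (incremental): each iteration's additional displacements follow the
// patch displacements of that iteration. Mode 1 (trailing): all additional
// displacements are appended after all patch displacements.
std::vector<Disp> packDisplacements(const std::vector<IterationDisps>& iters,
                                    bool incremental) {
    std::vector<Disp> out;
    for (const auto& it : iters) {
        out.insert(out.end(), it.patch.begin(), it.patch.end());
        if (incremental)
            out.insert(out.end(), it.additional.begin(), it.additional.end());
    }
    if (!incremental)
        for (const auto& it : iters)
            out.insert(out.end(), it.additional.begin(), it.additional.end());
    return out;
}
```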
- Midpoint prediction mode consists in predicting the additional vertex position on an edge e[v0, v1] connecting vertices v0 and v1, by averaging the positions of the two vertices v0 and v1.
- One-ring-average prediction mode consists in predicting the additional vertex position on an edge e[v0, v1] connecting vertices v0 and v1, by averaging the positions of vertices v0, v1, v2 and v3, where v2 and v3 are the vertices (different from v0 and v1) respectively belonging to the triangles (v0, v1, v2) and (v0, v1, v3) that are connected to the edge e.
- If e is a border edge belonging to a unique triangle (v0, v1, v2), the position is predicted as the average of the v0, v1 and v2 positions.
- Quadric-energy prediction mode consists in predicting the position of the additional vertex on an edge e[v0,v1] as in the quadric-energy based mesh decimation process, i.e., by estimating the extremum of a quadric equation, e.g., based on the two respective planes formed by the two triangles that contain the edge e.
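- The midpoint and one-ring-average modes can be sketched as follows (illustrative C++; the quadric-energy mode is omitted for brevity):

```cpp
#include <initializer_list>

struct V3 { float x, y, z; };

static V3 avg(std::initializer_list<V3> pts) {
    V3 s{0.0f, 0.0f, 0.0f};
    for (const V3& p : pts) { s.x += p.x; s.y += p.y; s.z += p.z; }
    const float n = float(pts.size());
    return {s.x / n, s.y / n, s.z / n};
}

// Midpoint prediction: average of the edge endpoints v0 and v1.
V3 midpointPredict(const V3& v0, const V3& v1) { return avg({v0, v1}); }

// One-ring-average prediction: also average the opposite vertices v2 and v3
// of the (one or two) triangles containing edge e[v0, v1]; for a border edge
// only v2 exists and the average of v0, v1 and v2 is used.
V3 oneRingAveragePredict(const V3& v0, const V3& v1,
                         const V3* v2, const V3* v3) {
    if (v2 && v3) return avg({v0, v1, *v2, *v3});
    if (v2) return avg({v0, v1, *v2});
    return avg({v0, v1});  // fallback when no opposite vertex is known
}
```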
- the method for decoding generally comprises obtaining 2605 a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; decoding 2610 base meshes from the bitstream that are approximations of the original mesh frames but with fewer vertices, decoding the corresponding displacement vectors, and decoding the encoded patch information; extracting 2615 a subdivision iteration count and an adaptive subdivision flag from the patch information; reconstructing 2620 the base meshes and applying the corresponding displacement vectors to generate mesh frames; and adaptively subdividing 2625 edges of the triangles of the mesh frames.
- Each of the steps can be implemented by a respective module of a computer system.
- the method for encoding generally comprises obtaining 2655 one or more mesh frames from sequences representing volumetric video; generating 2660 base meshes from the mesh frames that are approximations of the original mesh frames but with fewer vertices; segmenting 2665 each base mesh into patches; assigning 2670 each patch with a subdivision iteration count and corresponding displacement vectors; adaptively subdividing 2675 edges of the triangles; signalling 2680 the subdivision iteration count and an adaptive subdivision flag indicative of usage of the adaptive subdivision in or along the patch information; and encoding 2685 the sequence of base mesh frames and corresponding displacement vectors, and patch information in a bitstream.
- Each of the steps can be implemented by a respective module of a computer system.
- An apparatus comprises means for obtaining a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; means for decoding base meshes from the bitstream that are approximations of the original mesh frames but with fewer vertices, decoding the corresponding displacement vectors, and decoding the encoded patch information; means for extracting a subdivision iteration count and an adaptive subdivision flag from the patch information; means for reconstructing the base meshes and applying the corresponding displacement vectors to generate mesh frames; and means for adaptively subdividing edges of the triangles of the mesh frames.
- the means comprise at least one processor and a memory including computer program code, wherein the processor may further comprise processor circuitry.
- the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 26 according to various embodiments.
- the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 26a and/or 26b according to various embodiments.
- Figure 27 shows a video coding system according to an example embodiment as a schematic block diagram of an electronic device 50, which may incorporate a codec.
- the electronic device may comprise an encoder or a decoder.
- the electronic device 50 may for example be a mobile terminal or a user equipment of a wireless communication system or a camera device.
- the electronic device 50 may also be comprised in a local or a remote server or a graphics processing unit of a computer.
- the device may also be comprised as part of a head-mounted display device.
- the apparatus 50 may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention, the display may be any display technology suitable for displaying an image or video.
- the apparatus 50 may further comprise a keypad 34.
- any suitable data or user interface mechanism may be employed.
- the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
- the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
- the apparatus 50 may further comprise an audio output device, which in embodiments of the invention may be any one of an earpiece 38, a speaker, or an analogue or digital audio output connection.
- the apparatus 50 may also comprise a battery (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as a solar cell, a fuel cell or a clockwork generator).
- the apparatus may further comprise a camera 42 capable of recording or capturing images and/or video.
- the camera 42 may be a multi-lens camera system having at least two camera sensors.
- the camera is capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing.
- the apparatus may receive the video and/or image data for processing from another device prior to transmission and/or storage.
- the apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
- the apparatus or the controller 56 may comprise one or more processors or processor circuitry and be connected to memory 58, which may store data in the form of image, video and/or audio data, and/or may store instructions for implementation on the controller 56 or to be executed by the processors or the processor circuitry.
- the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of image, video and/or audio data or assisting in coding and decoding carried out by the controller.
- the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC (Universal Integrated Circuit Card) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
- the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network.
- the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
- the apparatus may comprise one or more wired interfaces configured to transmit and/or receive data over a wired connection, for example an electrical cable or an optical fiber connection.
- the various embodiments may provide advantages. For example, the present embodiments improve rendering quality around edges and improve the allocation of vertices around object edges. Further, the present embodiments improve rendering performance by avoiding overly dense meshes.
- the present embodiments provide a scalable approach for mesh quality signaling, which may be adjusted according to client device capabilities. The present embodiments can be used to remove the most visible rendering artefacts that are typical for mesh-based rendering of V3C content. A hypothetical client-side heuristic is sketched below.
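As an illustration of capability-driven adjustment, a client could cap the signalled subdivision iteration count by its rendering budget. This is a hypothetical heuristic, not part of the described method; it relies only on the fact that a full 1-to-4 subdivision iteration quadruples the triangle count:

```python
def effective_iteration_count(signalled_count: int,
                              base_triangles: int,
                              triangle_budget: int) -> int:
    """Cap the signalled subdivision iteration count so that the fully
    subdivided mesh stays within the device's triangle budget."""
    count, triangles = 0, base_triangles
    while count < signalled_count and triangles * 4 <= triangle_budget:
        triangles *= 4
        count += 1
    return count
```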
- a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
- a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of various embodiments.
- a computer program product according to an embodiment can be embodied on a non- transitory computer readable medium.
- the computer program product can be downloaded over a network in a data packet.
- the different functions discussed herein may be performed in a different order and/or concurrently with one another.
- one or more of the above-described functions and embodiments may be optional or may be combined.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiments relate to a method for encoding and decoding volumetric video. The embodiments also relate to equipment for implementing the methods. The decoding method comprises obtaining a bitstream representing volumetric video comprising encoded base meshes, corresponding displacement vectors and encoded patch information; decoding, from the bitstream, base meshes that are approximations of the original mesh frames but with fewer vertices and decoding the corresponding displacement vectors, as well as decoding the encoded patch information; extracting a subdivision iteration count and an adaptive subdivision flag from the patch information; reconstructing the base meshes and applying the corresponding displacement vectors to generate mesh frames; and adaptively subdividing the edges of the triangles of the mesh frames.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI20225669 | 2022-07-14 | ||
FI20225669 | 2022-07-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024012765A1 (fr) | 2024-01-18 |
Family
ID=86776150
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2023/064327 WO2024012765A1 (fr) | 2022-07-14 | 2023-05-29 | Procédé, appareil et produit-programme informatique de vidéocodage et de décodage vidéo |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024012765A1 (fr) |
Non-Patent Citations (2)
Title |
---|
JUNGSUN KIM (APPLE) ET AL: "[V-DMC] VDMC support in the V3C framework", no. m60748, 28 October 2022 (2022-10-28), XP030305216, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/140_Mainz/wg11/m60748-v5-m60784_v4.zip m60784_v4/m60748_vdmc_v2.pptx> [retrieved on 20221028] * |
KHALED MAMMOU (APPLE) ET AL: "[V-CG] Apple's Dynamic Mesh Coding CfP Response", no. m59281, 16 April 2022 (2022-04-16), XP030301428, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/138_OnLine/wg11/m59281-v2-m59281-v2.zip WG07_Apple_Response_DynamicMesh_CFP_final_dscriptors.docx> [retrieved on 20220416] * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12101457B2 (en) | Apparatus, a method and a computer program for volumetric video | |
US20220383552A1 (en) | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method | |
US11711535B2 (en) | Video-based point cloud compression model to world signaling information | |
CN114097229A (zh) | 点云数据发送设备、点云数据发送方法、点云数据接收设备和点云数据接收方法 | |
EP4399877A1 (fr) | Appareil, procédé et programme informatique destinés à une vidéo volumétrique | |
US20220230360A1 (en) | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method | |
WO2021260266A1 (fr) | Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique | |
WO2021191495A1 (fr) | Procédé, appareil et produit-programme d'ordinateur pour codage vidéo et décodage vidéo | |
EP4373097A1 (fr) | Dispositif d'émission de données de nuage de points, procédé d'émission de données de nuage de points, dispositif de réception de données de nuage de points et procédé de réception de données de nuage de points | |
EP4373096A1 (fr) | Dispositif et procédé de transmission de données de nuage de points, et dispositif et procédé de réception de données en nuage de points | |
CN115668919A (zh) | 点云数据发送装置、点云数据发送方法、点云数据接收装置和点云数据接收方法 | |
CN116438799A (zh) | 点云数据发送装置、点云数据发送方法、点云数据接收装置和点云数据接收方法 | |
US20230171427A1 (en) | Method, An Apparatus and a Computer Program Product for Video Encoding and Video Decoding | |
WO2023144445A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
EP4329311A1 (fr) | Dispositif de transmission de données de nuage de points, procédé de transmission de données de nuage de points, dispositif de réception de données de nuage de points et procédé de réception de données de nuage de points | |
EP4369716A1 (fr) | Dispositif d'émission de données de nuage de points, procédé d'émission de données de nuage de points, dispositif de réception de données de nuage de points et procédé de réception de données de nuage de points | |
WO2024012765A1 (fr) | Procédé, appareil et produit-programme informatique de vidéocodage et de décodage vidéo | |
EP4443880A1 (fr) | Procédé, appareil et produit de programme informatique pour coder et décoder un contenu multimédia volumétrique | |
WO2024084128A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
WO2024170819A1 (fr) | Procédé, appareil et produit-programme informatique d'encodage et de décodage de vidéo | |
WO2023001623A1 (fr) | Signalisation de connectivité de pièces v3c pour compression de maillage | |
US20230298218A1 (en) | V3C or Other Video-Based Coding Patch Correction Vector Determination, Signaling, and Usage | |
EP4412208A1 (fr) | Procédé d'émission de données de nuage de points, dispositif d'émission de données de nuage de points, procédé de réception de données de nuage de points et dispositif de réception de données de nuage de points | |
WO2024209129A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
US20240179347A1 (en) | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23731128; Country of ref document: EP; Kind code of ref document: A1 |