EP3987774A1 - Apparatus, method and computer program for volumetric video - Google Patents

Apparatus, method and computer program for volumetric video

Info

Publication number
EP3987774A1
Authority
EP
European Patent Office
Prior art keywords
patch
surface normal
image
representation
auxiliary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20826896.1A
Other languages
German (de)
English (en)
Other versions
EP3987774A4 (fr)
Inventor
Sebastian Schwarz
Kimmo Roimela
Mika Pesonen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of EP3987774A1
Publication of EP3987774A4
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/172 Processing image signals comprising non-image signal components, e.g. headers or format information
    • H04N13/178 Metadata, e.g. disparity information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/282 Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • FIGs. 2a and 2b show a compression and a decompression process for 3D volumetric video
  • a video codec comprises an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form.
  • An encoder may discard some information in the original video sequence in order to represent the video in a more compact form (i.e. at lower bitrate).
  • Volumetric video data represents a three-dimensional scene or object, and thus such data can be viewed from any viewpoint.
  • Volumetric video data can be used as an input for augmented reality (AR), virtual reality (VR) and mixed reality (MR) applications.
  • AR augmented reality
  • VR virtual reality
  • MR mixed reality
  • Such data describes geometry (shape, size, position in 3D-space) and respective attributes (e.g. color, opacity, reflectance, ...), together with any possible temporal changes of the geometry and attributes at given time instances (e.g. frames in 2D video).
  • Volumetric video is either generated from 3D models, i.e. computer-generated imagery (CGI), or captured from real-world scenes using a variety of capture solutions, e.g.
  • CGI computer-generated imagery
  • each point of each 3D surface is described as a 3D point with color and/or other attribute information such as surface normal or material reflectance.
  • A point cloud is a set of data points in a coordinate system, for example in a three-dimensional coordinate system defined by X, Y, and Z coordinates.
  • the points may represent an external surface of an object in the screen space, e.g. in a three-dimensional space.
  • Figure 1b illustrates a predicted representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
  • a coding unit may be defined as a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples.
  • a CU with the maximum allowed size may be named an LCU (largest coding unit) or a coding tree unit (CTU), and the video picture is divided into non-overlapping LCUs.
  • a picture can be partitioned in tiles, which are rectangular and contain an integer number of LCUs.
  • the partitioning to tiles forms a regular grid, where heights and widths of tiles differ from each other by one LCU at the maximum.
  • a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit.
  • a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL unit. The division of each picture into slice segments is a partitioning.
  • Texture picture(s) and the respective geometry picture(s), if any, and the respective attribute picture(s) may have the same or different chroma format.
  • Terms texture image and texture picture may be used interchangeably.
  • Terms geometry image and geometry picture may be used interchangeably.
  • a specific type of a geometry image is a depth image.
  • Embodiments described in relation to a geometry image equally apply to a depth image, and embodiments described in relation to a depth image equally apply to a geometry image.
  • Terms attribute image and attribute picture may be used interchangeably.
  • a geometry picture and/or an attribute picture may be treated as an auxiliary picture in video/image encoding and/or decoding.
  • Each point cloud frame represents a dataset of points within a 3D volumetric space, each point having unique coordinates and attributes.
  • An example of a point cloud frame is shown on Figure 3a.
  • the patch generation process decomposes the point cloud frame by converting 3D samples to 2D samples on a given projection plane using a strategy that provides the best compression.
  • the patch generation process aims at decomposing the point cloud into a minimum number of patches with smooth boundaries, while also minimizing the reconstruction error.
  • In TMC2v0, the following approach is implemented.
  • each point is associated with the plane that has the closest normal (i.e., maximizes the dot product of the point normal n_p and the plane normal n_pidx).
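For illustration, this clustering step may be sketched as follows (a minimal NumPy sketch, not the TMC2 source; names are illustrative):

```python
import numpy as np

# The six axis-aligned V-PCC projection plane normals (+X/-X, +Y/-Y, +Z/-Z).
PLANE_NORMALS = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)

def cluster_points(point_normals: np.ndarray) -> np.ndarray:
    """For each point normal n_p, return the index p_idx of the plane whose
    normal n_pidx maximizes the dot product dot(n_p, n_pidx)."""
    scores = point_normals @ PLANE_NORMALS.T   # (N, 3) @ (3, 6) -> (N, 6)
    return scores.argmax(axis=1)
```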
  • the packing process aims at mapping the extracted patches onto a 2D grid while trying to minimize the unused space, and guaranteeing that every TxT (e.g., 16x16) block of the grid is associated with a unique patch.
  • T is a user-defined parameter that is encoded in the bitstream and sent to the decoder.
  • TMC2v0 uses a simple packing strategy that iteratively tries to insert patches into a WxH grid.
  • W and H are user defined parameters, which correspond to the resolution of the geometry/texture images that will be encoded.
  • the patch location is determined through an exhaustive search that is performed in raster scan order. The first location that can guarantee an overlapping-free insertion of the patch is selected and the grid cells covered by the patch are marked as used. If no empty space in the current resolution image can fit a patch, then the height H of the grid is temporarily doubled and search is applied again. At the end of the process, H is clipped so as to fit the used grid cells.
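A minimal sketch of this first-fit raster-scan packing, assuming the grid is held as a boolean occupancy array in units of TxT blocks (the function name and structure are illustrative, not taken from TMC2):

```python
import numpy as np

def place_patch(grid: np.ndarray, ph: int, pw: int):
    """First-fit placement of a ph x pw patch (in units of TxT blocks) into
    a boolean occupancy grid: scan in raster order, take the first position
    with no overlap, and double the grid height if nothing fits."""
    H, W = grid.shape
    for row in range(H - ph + 1):
        for col in range(W - pw + 1):
            if not grid[row:row + ph, col:col + pw].any():
                grid[row:row + ph, col:col + pw] = True   # mark cells used
                return grid, (row, col)
    # no overlap-free position exists: temporarily double H and search again
    grid = np.vstack([grid, np.zeros_like(grid)])
    return place_patch(grid, ph, pw)
```

After all patches are placed, H would then be clipped to the last used row, as described above.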
  • the image generation process exploits the 3D to 2D mapping computed during the packing process to store the geometry and texture of the point cloud as images.
  • each patch is projected onto two images, referred to as layers. More precisely, let H(u,v) be the set of points of the current patch that get projected to the same pixel (u, v).
  • the first layer, also called the near layer, stores the point of H(u,v) with the lowest depth D0.
  • the second layer, referred to as the far layer, captures the point of H(u,v) with the highest depth within the interval [D0, D0+D], where D is a user-defined parameter that describes the surface thickness.
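The two-layer construction may be sketched as follows (illustrative Python; H(u,v), D0 and the surface thickness D follow the description above, everything else is an assumption):

```python
import numpy as np
from collections import defaultdict

def build_layers(projected, width, height, surface_thickness):
    """Build the near/far depth layers of one patch from a list of
    (u, v, depth) samples; H[(u, v)] collects the points that project
    to the same pixel."""
    H = defaultdict(list)
    for u, v, depth in projected:
        H[(u, v)].append(depth)

    near = np.zeros((height, width))   # layer 0: lowest depth D0
    far = np.zeros((height, width))    # layer 1: highest depth in [D0, D0+D]
    for (u, v), depths in H.items():
        d0 = min(depths)
        in_range = [d for d in depths if d0 <= d <= d0 + surface_thickness]
        near[v, u] = d0
        far[v, u] = max(in_range)
    return near, far
```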
  • the generated videos have the following characteristics: geometry: WxH YUV420 8-bit, where the geometry video is monochromatic; and texture: WxH YUV420 8-bit, where the texture generation procedure exploits the reconstructed/smoothed geometry in order to compute the colors to be associated with the re-sampled points.
  • Each block of TxT (e.g., 16x16) pixels is processed independently.
  • if the block is empty, its pixels are filled by copying either the last row or column of the previous TxT block in raster order.
  • if the block has both empty and filled pixels (i.e. a so-called edge block), then the empty pixels are iteratively filled with the average value of their non-empty neighbors.
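The edge-block rule may be sketched as follows (illustrative Python; the fully-empty-block copy rule from the previous item is assumed to be handled separately):

```python
import numpy as np

def pad_edge_block(block: np.ndarray, occupied: np.ndarray) -> np.ndarray:
    """Iteratively fill empty pixels of a TxT edge block with the average
    value of their non-empty 4-neighbours, as described above."""
    block = block.astype(float).copy()
    occupied = occupied.copy()
    if not occupied.any():
        return block   # a fully empty block is padded by the copy rule instead
    T = block.shape[0]
    while not occupied.all():
        for y in range(T):
            for x in range(T):
                if occupied[y, x]:
                    continue
                neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                vals = [block[j, i] for j, i in neighbours
                        if 0 <= j < T and 0 <= i < T and occupied[j, i]]
                if vals:
                    block[y, x] = sum(vals) / len(vals)
                    occupied[y, x] = True
    return block
```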
  • the generated images/layers are stored as video frames and compressed using a video codec.
  • mapping information providing for each TxT block its associated patch index is encoded as follows:
  • the occupancy map consists of a binary map that indicates for each cell of the grid whether it belongs to the empty space or to the point cloud.
  • one cell of the 2D grid produces a pixel during the image generation process.
  • when considering an occupancy map as an image, it may be considered to comprise occupancy patches.
  • the occupancy map compression leverages the auxiliary information described in previous section, in order to detect the empty TxT blocks (i.e., blocks with patch index 0).
  • the remaining blocks are encoded as follows.
  • the occupancy map could be encoded with a precision of B0xB0 blocks.
  • the generated binary image covers only a single colour plane. However, given the prevalence of 4:2:0 codecs, it may be desirable to extend the image with "neutral" or fixed value chroma planes (e.g. adding chroma planes with all sample values equal to 0 or 128, assuming the use of an 8-bit codec).
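A minimal sketch of wrapping the binary occupancy map into an 8-bit 4:2:0 frame with fixed-value chroma planes, as suggested above (the function name and the 0/1 luma convention are assumptions):

```python
import numpy as np

def occupancy_to_yuv420(occupancy: np.ndarray, neutral: int = 128):
    """Use the binary occupancy map as the luma plane of an 8-bit 4:2:0
    frame and add fixed-value ("neutral") chroma planes. Even image
    dimensions are assumed for the half-resolution chroma."""
    h, w = occupancy.shape
    y = occupancy.astype(np.uint8)                    # 0/1 values, preserved
    u = np.full((h // 2, w // 2), neutral, dtype=np.uint8)
    v = np.full((h // 2, w // 2), neutral, dtype=np.uint8)
    return y, u, v
```

Keeping the luma samples at 0/1 relies on the lossless coding tools mentioned below to preserve the exact values.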
  • the obtained video frame is compressed by using a video codec with lossless coding tool support (e.g., AVC, HEVC RExt, HEVC-SCC).
  • The occupancy map is simplified by detecting empty and non-empty blocks of resolution TxT in the occupancy map, and only for the non-empty blocks the patch index is encoded as follows:
  • a list of candidate patches is created for each TxT block by considering all the patches that contain that block.
  • the list of candidates is sorted in the reverse order of the patches.
  • the point cloud geometry reconstruction process exploits the occupancy map information in order to detect the non-empty pixels in the geometry/texture images/layers.
  • the 3D positions of the points associated with those pixels are computed by leveraging the auxiliary patch information and the geometry images. More precisely, let P be the point associated with the pixel (u, v), let (δ0, s0, r0) be the 3D location of the patch to which it belongs, and let (u0, v0, u1, v1) be its 2D bounding box. P can be expressed in terms of depth δ(u, v), tangential shift s(u, v) and bi-tangential shift r(u, v) as follows: δ(u, v) = δ0 + g(u, v); s(u, v) = s0 - u0 + u; r(u, v) = r0 - v0 + v.
  • g(u, v) is the luma component of the geometry image.
  • the texture values are directly read from the texture images.
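A sketch of this reconstruction (illustrative Python; variable names mirror the formulas above, with d0 standing in for δ0, and the patch metadata is assumed to come from the decoded auxiliary patch information):

```python
import numpy as np

def reconstruct_points(g, occupancy, d0, s0, r0, u0, v0):
    """Recover (depth, tangential, bi-tangential) coordinates for every
    occupied pixel, where g is the luma plane of the geometry image and
    occupancy is the binary occupancy map."""
    points = []
    for v, u in zip(*np.nonzero(occupancy)):
        depth = d0 + int(g[v, u])       # δ(u, v) = δ0 + g(u, v)
        s = s0 - u0 + u                 # tangential shift
        r = r0 - v0 + v                 # bi-tangential shift
        points.append((depth, s, r))
    return np.array(points)
```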
  • signalling surface normals in additional video streams suffers from at least two major restrictions.
  • the required precision to reflect the [0; 180] degree floating point range is not supported; only 0.35 degree precision is available for 10-bit video signals.
  • the required dimensionality is not given, i.e. YUV420 chroma sub-sampling allows for only one full resolution plane and leads to lower precision in the surface normal signalling. Consequently, to avoid any sub-sampling problems, three individual video streams would be required, which naturally causes significant overhead in the signalling.
  • auxiliary patch information comprises metadata relating to surface properties of the patch and one or more indicators of the surface normal of the patch for configuring reconstruction of the 3D representation of said at least one object; and encoding (404) the geometry image, the texture image, the occupancy map and the auxiliary patch information in or along a bitstream.
  • the fact that V-PCC already packs patches with related surface information, as described above, is utilised by introducing a per-patch surface normal signalling into the auxiliary patch information to increase the precision of video-based surface normal vector signalling in V-PCC.
  • the surface normal of the patch is indicated as a residual between a clustered projection plane surface normal of the patch and an actual value of the surface normal of the patch.
  • each patch is clustered based on its dominant projection plane surface vector.
  • the surface normals signalled in an independent video track can be calculated as a residual between the projection plane surface normal and their actual value.
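For illustration, the residual computation may look as follows (a sketch under the assumption that both normals are unit vectors; quantization of the residual is omitted):

```python
import numpy as np

def normal_residual(actual_normal, plane_normal):
    """Residual between the clustered projection plane normal and the
    actual patch normal; only this (small) difference needs to be coded,
    which reduces the quantization range."""
    actual = np.asarray(actual_normal, dtype=float)
    actual /= np.linalg.norm(actual)
    residual = actual - np.asarray(plane_normal, dtype=float)
    return residual                     # decoder: plane_normal + residual

# Example: a patch clustered to the +Z plane whose true normal deviates slightly
res = normal_residual([0.1, -0.05, 0.99], [0, 0, 1])
```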
  • auxiliary information bit stream syntax of V-PCC is disclosed in the document MPEG N18180, also referred to as ISO/IEC 23090-5:2018(E) "Study of CD". According to an embodiment, said one or more indicators are introduced in a syntax element, which may be referred to as pdu_SNV_residual_flag, added to pdu_normal_axis or any other suitable syntax structure of ISO/IEC 23090-5 (or similar volumetric video coding technology).
  • a syntax element, which may be referred to as pdu_SNV_residual_quantization_max, may also be added to pdu_normal_axis in order to signal the resulting maximum quantization range.
  • Such signalling is a simple but effective way of introducing a per-patch surface normal signalling into the auxiliary patch information, which increases the precision by reducing quantization range.
  • This process may be a preferred operation mode for surface normal signaling, or the process can be switched on or off, e.g. by using said syntax elements as per-patch flags, as shown in Table 1.
  • the clustered projection plane surface normal of the patch is used as a basis for calculating the residual.
  • the surface normal of the patch is indicated as a residual between an averaged surface normal of each pixel of the patch and an actual value of the surface normal of the patch. Such signalling provides a more precise value for the surface normal of the patch, from which the residual value may also be calculated more precisely.
  • the signalling of the averaged surface normal of each pixel of the patch is carried out by three elements indicating tangent, bitangent and normal component of the surface normal relative to the projection plane of the patch in question.
  • the syntax element pdu_SNV_residual_flag may be replaced by three syntax elements indicating the tangent, bitangent and normal components of the surface normal relative to the projection plane of the patch in question.
  • the signalling of the averaged surface normal of each pixel of the patch is configured to be carried out by three syntax elements indicating the normal direction in three dimensions.
  • the signalling of the averaged surface normal of each pixel of the patch is configured to be carried out by two syntax elements indicating tangent and bitangent components of the surface normal relative to the projection plane of the patch in question, with a third normal component derived so that the resulting vector has unit length.
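A sketch of deriving the third component from the two signalled ones via the unit-length rule (illustrative Python; the sign convention, i.e. the derived normal component pointing away from the projection plane, is an assumption):

```python
import math

def derive_normal(tangent: float, bitangent: float) -> tuple:
    """Given the signalled tangent and bitangent components, derive the
    normal component so that the resulting vector has unit length."""
    n_sq = 1.0 - tangent * tangent - bitangent * bitangent
    normal = math.sqrt(max(n_sq, 0.0))   # clamp against quantization noise
    return (tangent, bitangent, normal)
```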
  • the resulting maximum quantization range shall be signalled.
  • the signalling may be carried out either on a sequence level, or on a per-patch level, e.g. as shown in the examples of Tables 1 and 2.
  • the surface normal of the patch is indicated as a residual between a vector pointing toward a focus point of the normal of each pixel of the patch and an actual value of the surface normal of the patch.
  • a more precise value for the surface normal of the patch may be calculated as a patch normal focus point NFP, which represents the point towards which all per-point normals of a patch are pointing (or from which they are pointing away). Then, the following process can be used to generate normal vectors with reduced range, i.e. a lower residual:
  • Np points away from NFP
  • Np points toward NFP
  • the benefit of this embodiment is a reduced residual from the surface normal predictor, thus reducing the quantisation range and improving coding efficiency.
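For illustration, the focus-point prediction and residual may be sketched as follows (illustrative Python; the sign rule implements the two cases listed above, everything else is an assumption):

```python
import numpy as np

def nfp_residual(point, normal, nfp):
    """Predict the per-point normal as the unit vector from the
    reconstructed point towards the patch normal focus point NFP,
    flipping the sign when Np points away from NFP, and return the
    residual to the actual (unit) normal."""
    direction = np.asarray(nfp, dtype=float) - np.asarray(point, dtype=float)
    direction /= np.linalg.norm(direction)
    if np.dot(normal, direction) < 0:    # case: Np points away from NFP
        direction = -direction
    return np.asarray(normal, dtype=float) - direction
```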
  • the signalling of the vector pointing towards the focus point of the normal of each pixel of the patch is carried out by three syntax elements indicating the focus point in three dimensions.
  • the syntax element pdu_SNV_residual_flag may be replaced by three syntax elements, such as pdu_SNV_focus_point_X, pdu_SNV_focus_point_Y and pdu_SNV_focus_point_Z, as shown in Table 3.
  • the vector pointing towards the focus point of the normal of each pixel of the patch could also be coded as a 3D residual, or a 2D residual followed by computing the third component via the unit-length rule in the decoder.
  • chroma-subsampling leads to lower precision when signalling the three elements of a surface normal vector.
  • the method further comprises obtaining a surface normal identifiable by three orthogonal surface normal vector components; identifying, per a spatial and a temporal unit, most significant surface normal vector component; assigning the most significant surface normal vector component to a luminance component for surface normal generation; and assigning the two remaining surface normal vector components to chrominance components to be subsampled for surface normal image generation.
  • the identification of the most significant surface normal vector component may be performed with rate-distortion optimization, which includes or estimates downsampling of the sample arrays of the other two surface normal vector components prior to encoding,
  • Different spatial units may be pre-determined (e.g. in a coding standard) or selected by the encoder, such as an entire picture, a slice, a tile, a set of slices or tiles, a patch, a block (such as a CTU in HEVC).
  • Different temporal units may be pre-determined (e.g. in a coding standard) or selected by the encoder, such as a single picture, a group of pictures, or a coded video sequence.
  • Component selection data is encoded into or along the bitstream, for example as an SEI message.
  • the component selection data may for example indicate which surface normal vector component is selected as the most significant (i.e., conveyed in the luma component) per an indicated spatial unit within the persistence scope (e.g., until the next SEI message of the same type).
  • the mapping of the other two surface normal vector components to the particular chroma components may follow a pre-defined algorithm.
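A minimal sketch of this component assignment (illustrative Python; the most significant component is selected here by mean magnitude as a simple stand-in for the rate-distortion optimization described above, and even image dimensions are assumed for the 2x2 chroma subsampling):

```python
import numpy as np

def pack_normal_image(nx, ny, nz):
    """Assign the most significant surface normal vector component to the
    full-resolution luma plane and subsample the two remaining components
    into the 4:2:0 chroma planes."""
    planes = [nx, ny, nz]
    msc = int(np.argmax([np.abs(p).mean() for p in planes]))
    luma = planes[msc]                                   # full resolution
    chroma = [p for i, p in enumerate(planes) if i != msc]
    # 2x2 average-downsample the remaining components (4:2:0 subsampling)
    sub = [p.reshape(p.shape[0] // 2, 2, p.shape[1] // 2, 2).mean(axis=(1, 3))
           for p in chroma]
    return msc, luma, sub[0], sub[1]    # msc would be conveyed, e.g. in an SEI
```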
  • the syntax element pdu_SNV_element may be included in the example of the first embodiment, indicating the residuals against the projection plane, and in the second embodiment, indicating the residual against a per-patch signalled normal vector, respectively.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method comprising: projecting a 3D representation of at least one object onto at least one 2D patch; generating a geometry image, a texture image, an occupancy map and auxiliary patch information from the 2D patch, the auxiliary patch information comprising metadata relating to surface properties of the patch and one or more indicators of the surface normal of the patch for configuring reconstruction of the 3D representation of the at least one object; and encoding the geometry image, the texture image, the occupancy map and the auxiliary patch information in or along a bitstream.
EP20826896.1A 2019-06-20 2020-05-18 Apparatus, method and computer program for volumetric video Pending EP3987774A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20195543 2019-06-20
PCT/FI2020/050331 WO2020254719A1 (fr) 2019-06-20 2020-05-18 Apparatus, method and computer program for volumetric video

Publications (2)

Publication Number Publication Date
EP3987774A1 (fr) 2022-04-27
EP3987774A4 (fr) 2023-06-28

Family

ID=74036970

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20826896.1A 2019-06-20 2020-05-18 Apparatus, method and computer program for volumetric video Pending EP3987774A4 (fr)

Country Status (2)

Country Link
EP (1) EP3987774A4 (fr)
WO (1) WO2020254719A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024029348A1 (fr) * 2022-08-01 2024-02-08 Sony Group Corporation Information processing device and method

Also Published As

Publication number Publication date
EP3987774A4 (fr) 2023-06-28
WO2020254719A1 (fr) 2020-12-24

Similar Documents

Publication Publication Date Title
EP3751857A1 Method, apparatus and computer program product for encoding and decoding volumetric video
US12101457B2 Apparatus, a method and a computer program for volumetric video
EP3614674A1 Apparatus, method and computer program for volumetric video
US20210144404A1 Apparatus, a method and a computer program for volumetric video
US20230068178A1 A method, an apparatus and a computer program product for volumetric video encoding and decoding
US11659151B2 Apparatus, a method and a computer program for volumetric video
WO2019158821A1 Apparatus, method and computer program for volumetric video
US11711535B2 Video-based point cloud compression model to world signaling information
WO2019185985A1 Apparatus, method and computer program for volumetric video
EP4399877A1 Apparatus, method and computer program for volumetric video
WO2021191495A1 Method, apparatus and computer program product for video encoding and video decoding
US11974026B2 Apparatus, a method and a computer program for volumetric video
US12069314B2 Apparatus, a method and a computer program for volumetric video
WO2021170906A1 Apparatus, method and computer program for volumetric video
US12047604B2 Apparatus, a method and a computer program for volumetric video
EP3987774A1 Apparatus, method and computer program for volumetric video
EP3699867A1 Apparatus, method and computer program for volumetric video
WO2021165566A1 Apparatus, method and computer program for volumetric video
WO2019234290A1 Apparatus, method and computer program for volumetric video
WO2019162564A1 Apparatus, method and computer program for volumetric video
WO2019185983A1 Method, apparatus and computer program product for encoding and decoding digital volumetric video
WO2022074286A1 Method, apparatus and computer program product for video encoding and decoding
WO2023041838A1 Apparatus, method and computer program for volumetric video
EP4393148A1 Apparatus, method and computer program for volumetric video
WO2024079383A1 Apparatus, method and computer program for volumetric video

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220112

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20230525

RIC1 Information provided on ipc code assigned before grant

Ipc: G06T 15/50 20110101ALI20230519BHEP

Ipc: G06T 9/00 20060101ALI20230519BHEP

Ipc: G06T 15/04 20110101ALI20230519BHEP

Ipc: H04N 19/597 20140101ALI20230519BHEP

Ipc: H04N 13/178 20180101ALI20230519BHEP

Ipc: H04N 13/268 20180101ALI20230519BHEP

Ipc: H04N 13/161 20180101AFI20230519BHEP