EP3987774A1 - An apparatus, a method and a computer program for volumetric video - Google Patents
An apparatus, a method and a computer program for volumetric video
- Publication number
- EP3987774A1 (application EP20826896.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- patch
- surface normal
- image
- representation
- auxiliary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/001—Model-based coding, e.g. wire frame
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/172—Processing image signals image signals comprising non-image signal components, e.g. headers or format information
- H04N13/178—Metadata, e.g. disparity information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/282—Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Definitions
- FIGs. 2a and 2b show a compression and a decompression process for 3D volumetric video
- a video codec comprises an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form.
- An encoder may discard some information in the original video sequence in order to represent the video in a more compact form (i.e. at lower bitrate).
- Volumetric video data represents a three-dimensional scene or object, and thus such data can be viewed from any viewpoint.
- Volumetric video data can be used as an input for augmented reality (AR), virtual reality (VR) and mixed reality (MR) applications.
- Such data describes geometry (shape, size, position in 3D-space) and respective attributes (e.g. color, opacity, reflectance, ...), together with any possible temporal changes of the geometry and attributes at given time instances (e.g. frames in 2D video).
- Volumetric video is either generated from 3D models, i.e. computer-generated imagery (CGI), or captured from real-world scenes using a variety of capture solutions, e.g.
- each point of each 3D surface is described as a 3D point with color and/or other attribute information such as surface normal or material reflectance.
- Point cloud is a set of data points in a coordinate system, for example in a three-dimensional coordinate system being defined by X, Y, and Z coordinates.
- the points may represent an external surface of an object in the screen space, e.g. in a three-dimensional space.
- Figure 1b illustrates a predicted representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
- a coding unit may be defined as a coding block of luma samples, two corresponding coding blocks of chroma samples of a picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is coded using three separate color planes and syntax structures used to code the samples.
- a CU with the maximum allowed size may be named LCU (largest coding unit) or coding tree unit (CTU), and the video picture is divided into non-overlapping LCUs.
- a picture can be partitioned in tiles, which are rectangular and contain an integer number of LCUs.
- the partitioning to tiles forms a regular grid, where heights and widths of tiles differ from each other by one LCU at the maximum.
- a slice is defined to be an integer number of coding tree units contained in one independent slice segment and all subsequent dependent slice segments (if any) that precede the next independent slice segment (if any) within the same access unit.
- a slice segment is defined to be an integer number of coding tree units ordered consecutively in the tile scan and contained in a single NAL unit. The division of each picture into slice segments is a partitioning.
- Texture picture(s) and the respective geometry picture(s), if any, and the respective attribute picture(s) may have the same or different chroma format.
- Terms texture image and texture picture may be used interchangeably.
- Terms geometry image and geometry picture may be used interchangeably.
- a specific type of a geometry image is a depth image.
- Embodiments described in relation to a geometry image equally apply to a depth image, and embodiments described in relation to a depth image equally apply to a geometry image.
- Terms attribute image and attribute picture may be used interchangeably.
- a geometry picture and/or an attribute picture may be treated as an auxiliary picture in video/image encoding and/or decoding.
- Each point cloud frame represents a dataset of points within a 3D volumetric space that have unique coordinates and attributes.
- An example of a point cloud frame is shown on Figure 3a.
- the patch generation process decomposes the point cloud frame by converting 3D samples to 2D samples on a given projection plane using a strategy that provides the best compression.
- the patch generation process aims at decomposing the point cloud into a minimum number of patches with smooth boundaries, while also minimizing the reconstruction error.
- in TMC2v0, the following approach is implemented.
- each point is associated with the plane that has the closest normal (i.e., the plane that maximizes the dot product of the point normal n_p and the plane normal n_p_idx).
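To make the clustering step concrete, here is a minimal Python sketch (illustrative only, not code from the patent or the V-PCC test model; function and variable names are assumptions) of assigning each point to the projection plane whose normal maximizes the dot product with the point normal:

```python
import numpy as np

# The six axis-aligned candidate projection plane normals used by
# TMC2-style encoders.
PLANE_NORMALS = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=np.float64)

def cluster_points(point_normals: np.ndarray) -> np.ndarray:
    """Return, for each point normal (N x 3), the index of the plane
    whose normal yields the largest dot product n_p . n_p_idx."""
    scores = point_normals @ PLANE_NORMALS.T   # (N, 3) @ (3, 6) -> (N, 6)
    return np.argmax(scores, axis=1)

# Example: a normal pointing mostly along -Y is clustered to plane index 3,
# one pointing along +Z to plane index 4.
normals = np.array([[0.1, -0.9, 0.2], [0.0, 0.0, 1.0]])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
print(cluster_points(normals))  # -> [3 4]
```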
- the packing process aims at mapping the extracted patches onto a 2D grid while trying to minimize the unused space, and guaranteeing that every TxT (e.g., 16x16) block of the grid is associated with a unique patch.
- T is a user-defined parameter that is encoded in the bitstream and sent to the decoder.
- TMC2v0 uses a simple packing strategy that iteratively tries to insert patches into a WxH grid.
- W and H are user defined parameters, which correspond to the resolution of the geometry/texture images that will be encoded.
- the patch location is determined through an exhaustive search that is performed in raster scan order. The first location that can guarantee an overlapping-free insertion of the patch is selected and the grid cells covered by the patch are marked as used. If no empty space in the current resolution image can fit a patch, then the height H of the grid is temporarily doubled and the search is applied again. At the end of the process, H is clipped so as to fit the used grid cells.
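The packing loop described above can be sketched as follows; this is a simplified illustration under assumed data structures (block-aligned patch bounding boxes, no patch rotation), not the TMC2v0 reference implementation:

```python
import numpy as np

def pack_patches(patch_sizes, W=1280, H=720, T=16):
    """patch_sizes: list of (height, width) in pixels. Returns pixel
    placements (top, left) and the clipped frame height."""
    used = np.zeros((H // T, W // T), dtype=bool)   # TxT grid occupancy
    placements = []
    for ph, pw in patch_sizes:
        bh, bw = -(-ph // T), -(-pw // T)           # patch size in blocks
        while True:
            pos = None
            for y in range(used.shape[0] - bh + 1):          # raster scan
                for x in range(used.shape[1] - bw + 1):
                    if not used[y:y + bh, x:x + bw].any():   # overlap-free
                        pos = (y, x)
                        break
                if pos is not None:
                    break
            if pos is not None:
                y, x = pos
                used[y:y + bh, x:x + bw] = True
                placements.append((y * T, x * T))
                break
            # No fit at the current resolution: double H and search again.
            used = np.vstack([used, np.zeros_like(used)])
    rows = np.flatnonzero(used.any(axis=1))          # clip H to used cells
    return placements, (int(rows.max()) + 1) * T if rows.size else 0

print(pack_patches([(64, 48), (700, 1200), (300, 200)]))
```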
- the image generation process exploits the 3D to 2D mapping computed during the packing process to store the geometry and texture of the point cloud as images.
- each patch is projected onto two images, referred to as layers. More precisely, let H(u,v) be the set of points of the current patch that get projected to the same pixel (u, v).
- the first layer, also called the near layer, stores the point of H(u,v) with the lowest depth D0.
- the second layer, referred to as the far layer, captures the point of H(u,v) with the highest depth within the interval [D0, D0+D], where D is a user-defined parameter that describes the surface thickness.
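A hedged sketch of the two-layer image generation follows; the container types and the surface-thickness default are assumptions, but the near/far selection mirrors the description above:

```python
from collections import defaultdict

def build_layers(projected_points, D=4):
    """projected_points: iterable of ((u, v), depth) pairs for one patch."""
    H = defaultdict(list)              # H(u, v): all depths hitting (u, v)
    for (u, v), depth in projected_points:
        H[(u, v)].append(depth)
    near, far = {}, {}
    for uv, depths in H.items():
        d0 = min(depths)               # near layer: lowest depth D0
        near[uv] = d0
        # far layer: highest depth still inside the thickness interval
        far[uv] = max(d for d in depths if d0 <= d <= d0 + D)
    return near, far

pts = [((3, 5), 10), ((3, 5), 12), ((3, 5), 17), ((4, 5), 9)]
print(build_layers(pts))
# near: {(3, 5): 10, (4, 5): 9}; far: {(3, 5): 12, (4, 5): 9}
# (depth 17 is outside [10, 14] and is therefore dropped from the far layer)
```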
- the generated videos have the following characteristics: geometry: WxH YUV420 8-bit, where the geometry video is monochromatic, and texture: WxH YUV420 8-bit, where the texture generation procedure exploits the reconstructed/smoothed geometry in order to compute the colors to be associated with the re-sampled points.
- Each block of TxT (e.g., 16x16) pixels is processed independently.
- the pixels of the block are filled by copying either the last row or column of the previous TxT block in raster order.
- the block has both empty and filled pixels (i.e. a so-called edge block), then the empty pixels are iteratively filled with the average value of their non-empty neighbors.
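The edge-block padding rule can be illustrated with the following sketch (an assumed helper, not the reference padding filter):

```python
import numpy as np

def pad_edge_block(block: np.ndarray, occupied: np.ndarray) -> np.ndarray:
    """Fill empty pixels of one TxT edge block by iteratively averaging
    their already-filled 4-neighbours, as described above."""
    assert occupied.any(), "edge blocks have at least one filled pixel"
    block = block.astype(np.float64).copy()
    occupied = occupied.copy()
    while not occupied.all():
        for y, x in zip(*np.nonzero(~occupied)):
            vals = [block[ny, nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < block.shape[0] and 0 <= nx < block.shape[1]
                    and occupied[ny, nx]]
            if vals:                       # average of non-empty neighbours
                block[y, x] = np.mean(vals)
                occupied[y, x] = True
    return block

b = np.array([[100, 0], [0, 0]], dtype=float)
occ = np.array([[True, False], [False, False]])
print(pad_edge_block(b, occ))   # all empty pixels converge to 100.0
```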
- the generated images/layers are stored as video frames and compressed using a video codec.
- mapping information providing for each TxT block its associated patch index is encoded as follows:
- the occupancy map consists of a binary map that indicates for each cell of the grid whether it belongs to the empty space or to the point cloud.
- one cell of the 2D grid produces a pixel during the image generation process.
- when considering an occupancy map as an image, it may be considered to comprise occupancy patches.
- the occupancy map compression leverages the auxiliary information described in previous section, in order to detect the empty TxT blocks (i.e., blocks with patch index 0).
- the remaining blocks are encoded as follows.
- the occupancy map could be encoded with a precision of B0xB0 blocks.
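As an illustration of reduced-precision occupancy coding, the sketch below downsamples a per-pixel binary map to B0xB0 precision by marking a block occupied when any of its pixels is occupied; this is one plausible reading of the description, not normative code:

```python
import numpy as np

def downsample_occupancy(occ: np.ndarray, B0: int = 4) -> np.ndarray:
    """Lossy B0xB0-precision occupancy: 1 if any pixel in the block is set."""
    h, w = occ.shape
    assert h % B0 == 0 and w % B0 == 0, "pad the map to a multiple of B0 first"
    blocks = occ.reshape(h // B0, B0, w // B0, B0)
    return blocks.any(axis=(1, 3)).astype(np.uint8)

occ = np.zeros((8, 8), dtype=np.uint8)
occ[0, 5] = 1                       # a single occupied pixel
print(downsample_occupancy(occ))    # -> [[0 1]
                                    #     [0 0]]
```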
- the generated binary image covers only a single colour plane. However, given the prevalence of 4:2:0 codecs, it may be desirable to extend the image with "neutral" or fixed value chroma planes (e.g. adding chroma planes with all sample values equal to 0 or 128, assuming the use of an 8-bit codec).
- the obtained video frame is compressed by using a video codec with lossless coding tool support (e.g., AVC, HEVC RExt, HEVC-SCC).
- the occupancy map is simplified by detecting empty and non-empty blocks of resolution TxT in the occupancy map, and only for the non-empty blocks the patch index is encoded as follows:
- a list of candidate patches is created for each TxT block by considering all the patches that contain that block.
- the list of candidates is sorted in the reverse order of the patches.
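A small sketch of the candidate-list construction, under the assumption that patch bounding boxes are block-aligned:

```python
def candidate_patches(block_xy, patches):
    """patches: list of (x0, y0, x1, y1) block-aligned bounding boxes,
    in patch generation order. Returns candidate patch indices for the
    given block, in reverse order of the patches."""
    bx, by = block_xy
    cands = [i for i, (x0, y0, x1, y1) in enumerate(patches)
             if x0 <= bx < x1 and y0 <= by < y1]
    return sorted(cands, reverse=True)

# Block (2, 3) is covered by both patches; the later patch comes first.
print(candidate_patches((2, 3), [(0, 0, 4, 4), (1, 2, 3, 5)]))  # -> [1, 0]
```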
- the point cloud geometry reconstruction process exploits the occupancy map information in order to detect the non-empty pixels in the geometry/texture images/layers.
- the 3D positions of the points associated with those pixels are computed by leveraging the auxiliary patch information and the geometry images. More precisely, let P be the point associated with the pixel (u, v) and let (δ0, s0, r0) be the 3D location of the patch to which it belongs and (u0, v0, u1, v1) its 2D bounding box. P can be expressed in terms of depth δ(u, v), tangential shift s(u, v) and bi-tangential shift r(u, v) as follows: δ(u, v) = δ0 + g(u, v), s(u, v) = s0 - u0 + u, r(u, v) = r0 - v0 + v.
- g(u, v) is the luma component of the geometry image.
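The per-pixel reconstruction relations above translate directly into code; the following sketch uses assumed parameter names and omits the per-plane axis permutation:

```python
import numpy as np

def reconstruct_point(u, v, g, delta0, s0, r0, u0, v0):
    """g: geometry image (luma plane). Returns the (depth, tangential,
    bi-tangential) coordinates of the point projected to pixel (u, v)."""
    d = delta0 + int(g[v, u])    # depth:            d(u,v) = delta0 + g(u,v)
    s = s0 - u0 + u              # tangential shift: s(u,v) = s0 - u0 + u
    r = r0 - v0 + v              # bi-tangential:    r(u,v) = r0 - v0 + v
    return d, s, r

g = np.full((16, 16), 7, dtype=np.uint8)
print(reconstruct_point(5, 9, g, delta0=100, s0=40, r0=60, u0=2, v0=3))
# -> (107, 43, 66)
```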
- the texture values are directly read from the texture images.
- signalling surface normals in additional video streams suffers from at least two major restrictions.
- the required precision to reflect the [0; 180] floating point range is not supported; only 0.35 degree precision is available for 10-bit video signals.
- the required dimensionality is not given, i.e. YUV420 chroma sub-sampling allows for only one full resolution plane and leads to lower precision in the surface normal signalling. Consequently, to avoid any sub-sampling problems, three individual video streams would be required, which naturally causes significant overhead in the signalling.
- the auxiliary patch information comprises metadata relating to surface properties of the patch and one or more indicators of the surface normal of the patch for configuring reconstruction of the 3D representation of said at least one object; and encoding (404) the geometry image, the texture image, the occupancy map and the auxiliary patch information in or along a bitstream.
- the fact that V-PCC already packs patches with related surface information, as described above, is utilised by introducing per-patch surface normal signalling into the auxiliary patch information to increase the precision of video-based surface normal vector signalling in V-PCC.
- the surface normal of the patch is indicated as a residual between a clustered projection plane surface normal of the patch and an actual value of the surface normal of the patch.
- each patch is clustered based on its dominant projection plane surface vector.
- the surface normals signalled in an independent video track can be calculated as residual between the projection plane surface normal and their actual value.
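A minimal sketch of this residual scheme (names are illustrative):

```python
import numpy as np

PLANE_NORMALS = np.array([
    [1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1],
], dtype=np.float64)

def normal_residual(actual_normal, plane_idx):
    """Residual = actual surface normal - clustered projection plane normal,
    which the decoder already knows from the patch metadata."""
    return actual_normal - PLANE_NORMALS[plane_idx]

def reconstruct_normal(residual, plane_idx):
    n = PLANE_NORMALS[plane_idx] + residual
    return n / np.linalg.norm(n)     # renormalise after dequantisation

n = np.array([0.2, -0.95, 0.1])
n /= np.linalg.norm(n)
res = normal_residual(n, plane_idx=3)        # plane normal (0, -1, 0)
print(res, reconstruct_normal(res, 3))       # small residual, exact recovery
```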
- the auxiliary information bit stream syntax of V-PCC is disclosed in the document MPEG N18180, also referred to as ISO/IEC 23090-5:2018(E) "Study of CD". According to an embodiment, said one or more indicators are introduced in the auxiliary patch information.
- a syntax element, which may be referred to as pdu_SNV_residual_flag, is added to pdu_normal_axis or any other suitable syntax structure of ISO/IEC 23090-5 (or similar volumetric video coding technology).
- a syntax element, which may be referred to as pdu_SNV_residual_quantization_max, may also be added to pdu_normal_axis in order to signal the resulting maximum quantization range.
- Such signalling is a simple but effective way of introducing a per-patch surface normal signalling into the auxiliary patch information, which increases the precision by reducing quantization range.
- This process may be a preferred operation mode for surface normal signaling, or the process can be switched on or off, e.g. by using said syntax elements as per-patch flags, as shown in Table 1.
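The precision gain from signalling a maximum quantization range can be illustrated numerically; the sketch below is a generic uniform quantizer, not the codec's actual quantization:

```python
def quantise(value, max_abs, bits=10):
    """Uniformly quantise value in [-max_abs, +max_abs] to 'bits' bits."""
    levels = (1 << bits) - 1
    q = round((value + max_abs) / (2 * max_abs) * levels)
    return max(0, min(levels, q))

def dequantise(q, max_abs, bits=10):
    levels = (1 << bits) - 1
    return q / levels * (2 * max_abs) - max_abs

# Full range +/-1.0 vs. residual range +/-0.25: the same 10 bits yield a
# 4x finer reconstruction step over the reduced range.
for max_abs in (1.0, 0.25):
    q = quantise(0.1, max_abs)
    print(max_abs, q, dequantise(q, max_abs))
```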
- the clustered projection plane surface normal of the patch is used as a basis for calculating the residual.
- the surface normal of the patch is indicated as a residual between an averaged surface normal of each pixel of the patch and an actual value of the surface normal of the patch. Such signalling provides a more precise value for the surface normal of the patch, from which the residual value may also be calculated more precisely.
- the signalling of the averaged surface normal of each pixel of the patch is carried out by three elements indicating tangent, bitangent and normal component of the surface normal relative to the projection plane of the patch in question.
- the syntax element pdu_SNV_residual_flag may be replaced by three syntax elements indicating the tangent, bitangent and normal components of the surface normal relative to the projection plane of the patch in question.
- the signalling of the averaged surface normal of each pixel of the patch is configured to be carried out by three syntax elements indicating the normal direction in three dimensions.
- the signalling of the averaged surface normal of each pixel of the patch is configured to be carried out by two syntax elements indicating tangent and bitangent components of the surface normal relative to the projection plane of the patch in question, with a third normal component derived so that the resulting vector has unit length.
- the resulting maximum quantization range shall be signalled.
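A sketch of the decoder-side derivation of the third component via the unit-length rule (the sign convention is an assumption; it would need to be signalled or inferred):

```python
import math

def derive_normal_component(t, b, sign=+1.0):
    """Given the signalled tangent t and bitangent b components, derive the
    third so that t^2 + b^2 + n^2 = 1. The max() clamp guards against
    quantisation noise pushing t^2 + b^2 slightly above 1."""
    n_sq = max(0.0, 1.0 - t * t - b * b)
    return sign * math.sqrt(n_sq)

t, b = 0.3, -0.4
n = derive_normal_component(t, b)
print(n, t * t + b * b + n * n)   # ~0.866, 1.0
```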
- the signalling may be carried out either on a sequence level, or on a per-patch level, e.g. as shown in the examples of Tables 1 and 2.
- the surface normal of the patch is indicated as a residual between a vector pointing toward a focus point of the normal of each pixel of the patch and an actual value of the surface normal of the patch.
- a more precise value for the surface normal of the patch may be calculated as a patch normal focus point NFP, which represents the point towards which all normals per point of a patch are pointing (or from which they are pointing away). Then, the following process can be used to generate normal vectors with reduced range, i.e. lower residual:
- Np points away from NFP
- Np points toward NFP
- the benefit of this embodiment is a reduced residual from the surface normal predictor, thus reducing the quantisation range and improving coding efficiency.
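The focus-point prediction can be sketched as follows; the exact per-point sign decision used by the patent is not spelled out here, so the dot-product test is an assumption:

```python
import numpy as np

def nfp_residuals(points, normals, nfp):
    """points, normals: (N, 3) arrays; nfp: (3,) patch normal focus point.
    Each normal is predicted by the unit vector towards (or away from) NFP,
    leaving only a small residual to quantise when NFP fits the patch."""
    to_nfp = nfp - points
    to_nfp /= np.linalg.norm(to_nfp, axis=1, keepdims=True)
    # Decide per point whether N_p points toward or away from NFP.
    sign = np.where((normals * to_nfp).sum(axis=1) >= 0, 1.0, -1.0)
    predictor = sign[:, None] * to_nfp
    return normals - predictor, sign

pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
nrm = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
res, sign = nfp_residuals(pts, nrm, nfp=np.array([1.0, 0.0, 0.0]))
print(res, sign)   # second point: zero residual, normal points away from NFP
```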
- the signalling of the vector pointing towards the focus point of the normal of each pixel of the patch is carried out by three syntax elements indicating the focus point in three dimensions.
- the syntax element pdu_SNV_residual_flag may be replaced by three syntax elements, such as pdu_SNV_focus_point_X, pdu_SNV_focus_point_Y and pdu_SNV_focus_point_Z, as shown in Table 3.
- the vector pointing towards the focus point of the normal of each pixel of the patch could also be coded as a 3D residual, or a 2D residual followed by computing the third component via the unit-length rule in the decoder.
- chroma-subsampling leads to lower precision when signalling the three elements of a surface normal vector.
- the method further comprises obtaining a surface normal identifiable by three orthogonal surface normal vector components; identifying, per spatial and temporal unit, the most significant surface normal vector component; assigning the most significant surface normal vector component to a luminance component for surface normal image generation; and assigning the two remaining surface normal vector components to chrominance components to be subsampled for surface normal image generation.
- the identification of the most significant surface normal vector component may be performed with rate-distortion optimization, which includes or estimates: downsampling of the sample planes of the other two surface normal vector components prior to encoding,
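A hedged sketch of the component-to-plane mapping; total energy is used as a simple stand-in for the rate-distortion criterion described above, and the 2x2 averaging models 4:2:0 subsampling:

```python
import numpy as np

def pack_normals_yuv420(nx, ny, nz):
    """nx, ny, nz: HxW planes of normal components in [-1, 1]. The most
    significant component goes to full-resolution luma; the other two are
    2x2-averaged into chroma planes (the chroma mapping is assumed)."""
    planes = {'x': nx, 'y': ny, 'z': nz}
    # Pick the most significant component by total energy (a simple proxy
    # for the rate-distortion optimisation described above).
    major = max(planes, key=lambda k: float(np.abs(planes[k]).sum()))
    luma = planes.pop(major)
    chroma = [p.reshape(p.shape[0] // 2, 2, p.shape[1] // 2, 2).mean(axis=(1, 3))
              for p in planes.values()]
    return major, luma, chroma

np.random.seed(0)
nx = np.random.uniform(-1, 1, (4, 4))
ny = np.zeros((4, 4)); nz = np.full((4, 4), 0.9)
major, luma, chroma = pack_normals_yuv420(nx, ny, nz)
print(major, luma.shape, chroma[0].shape)   # 'z', (4, 4), (2, 2)
```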
- Different spatial units may be pre-determined (e.g. in a coding standard) or selected by the encoder, such as an entire picture, a slice, a tile, a set of slices or tiles, a patch, a block (such as a CTU in HEVC).
- Different temporal units may be pre-determined (e.g. in a coding standard) or selected by the encoder, such as a single picture, a group of pictures, or a coded video sequence.
- Component selection data is encoded into or along the bitstream, for example as an SEI message.
- the component selection data may for example indicate which surface normal vector component is selected as the most significant (i.e., conveyed in the luma component) per an indicated spatial unit within the persistence scope (e.g., until the next SEI message of the same type).
- the mapping of the other two surface normal vector components to the particular chroma components may follow a pre-defined algorithm.
- a syntax element, which may be referred to as pdu_SNV_element, may be included in the example of the first embodiment of indicating the residuals against the projection plane and in the example of the second embodiment of indicating the residual against a per-patch signalled normal vector, respectively.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI20195543 | 2019-06-20 | ||
PCT/FI2020/050331 WO2020254719A1 (fr) | 2019-06-20 | 2020-05-18 | An apparatus, a method and a computer program for volumetric video |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3987774A1 true EP3987774A1 (fr) | 2022-04-27 |
EP3987774A4 EP3987774A4 (fr) | 2023-06-28 |
Family
ID=74036970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20826896.1A Pending EP3987774A4 (fr) | 2019-06-20 | 2020-05-18 | An apparatus, a method and a computer program for volumetric video |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3987774A4 (fr) |
WO (1) | WO2020254719A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024029348A1 (fr) * | 2022-08-01 | 2024-02-08 | Sony Group Corporation | Information processing device and method |
2020
- 2020-05-18 WO PCT/FI2020/050331 patent/WO2020254719A1/fr active Application Filing
- 2020-05-18 EP EP20826896.1A patent/EP3987774A4/fr active Pending
Also Published As
Publication number | Publication date |
---|---|
EP3987774A4 (fr) | 2023-06-28 |
WO2020254719A1 (fr) | 2020-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3751857A1 (fr) | Procédé, appareil et produit programme informatique de codage et décodage de vidéos volumétriques | |
US12101457B2 (en) | Apparatus, a method and a computer program for volumetric video | |
EP3614674A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
US20210144404A1 (en) | Apparatus, a method and a computer program for volumetric video | |
US20230068178A1 (en) | A method, an apparatus and a computer program product for volumetric video encoding and decoding | |
US11659151B2 (en) | Apparatus, a method and a computer program for volumetric video | |
WO2019158821A1 (fr) | Appareil, procédé et programme informatique de vidéo volumétrique | |
US11711535B2 (en) | Video-based point cloud compression model to world signaling information | |
WO2019185985A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
EP4399877A1 (fr) | Appareil, procédé et programme informatique destinés à une vidéo volumétrique | |
WO2021191495A1 (fr) | Procédé, appareil et produit-programme d'ordinateur pour codage vidéo et décodage vidéo | |
US11974026B2 (en) | Apparatus, a method and a computer program for volumetric video | |
US12069314B2 (en) | Apparatus, a method and a computer program for volumetric video | |
WO2021170906A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
US12047604B2 (en) | Apparatus, a method and a computer program for volumetric video | |
EP3987774A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
EP3699867A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
WO2021165566A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
WO2019234290A1 (fr) | Appareil, procédé et programme d'ordinateur pour vidéo volumétrique | |
WO2019162564A1 (fr) | Appareil, procédé et programme d'ordinateur pour vidéo volumétrique | |
WO2019185983A1 (fr) | Procédé, appareil et produit-programme d'ordinateur destinés au codage et au décodage de vidéo volumétrique numérique | |
WO2022074286A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
WO2023041838A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
EP4393148A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
WO2024079383A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20220112 |
|
AK | Designated contracting states |
Kind code of ref document: A1 |
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20230525 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06T 15/50 20110101ALI20230519BHEP |
Ipc: G06T 9/00 20060101ALI20230519BHEP |
Ipc: G06T 15/04 20110101ALI20230519BHEP |
Ipc: H04N 19/597 20140101ALI20230519BHEP |
Ipc: H04N 13/178 20180101ALI20230519BHEP |
Ipc: H04N 13/268 20180101ALI20230519BHEP |
Ipc: H04N 13/161 20180101AFI20230519BHEP |