WO2022219230A1 - Method, apparatus and computer program product for video encoding and video decoding - Google Patents
Method, apparatus and computer program product for video encoding and video decoding
- Publication number
- WO2022219230A1 WO2022219230A1 PCT/FI2022/050084 FI2022050084W WO2022219230A1 WO 2022219230 A1 WO2022219230 A1 WO 2022219230A1 FI 2022050084 W FI2022050084 W FI 2022050084W WO 2022219230 A1 WO2022219230 A1 WO 2022219230A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- patch
- distribution
- parameters
- determined
- normal direction
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 51
- 238000004590 computer program Methods 0.000 title claims description 27
- 238000009826 distribution Methods 0.000 claims abstract description 64
- 230000001419 dependent effect Effects 0.000 claims description 10
- 230000008569 process Effects 0.000 description 29
- 238000007906 compression Methods 0.000 description 23
- 230000006835 compression Effects 0.000 description 18
- 238000002156 mixing Methods 0.000 description 12
- 238000013459 approach Methods 0.000 description 11
- 239000003086 colorant Substances 0.000 description 11
- 238000009877 rendering Methods 0.000 description 11
- 230000000694 effects Effects 0.000 description 7
- 239000000463 material Substances 0.000 description 7
- 238000012856 packing Methods 0.000 description 6
- 230000011664 signaling Effects 0.000 description 6
- 230000000007 visual effect Effects 0.000 description 6
- 230000006837 decompression Effects 0.000 description 5
- 238000001914 filtration Methods 0.000 description 5
- 239000000203 mixture Substances 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 230000002123 temporal effect Effects 0.000 description 5
- 230000015572 biosynthetic process Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 210000003128 head Anatomy 0.000 description 4
- 238000003786 synthesis reaction Methods 0.000 description 4
- 238000013507 mapping Methods 0.000 description 3
- 238000013139 quantization Methods 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 238000005266 casting Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000005315 distribution function Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000009499 grossing Methods 0.000 description 2
- 230000033001 locomotion Effects 0.000 description 2
- 238000013138 pruning Methods 0.000 description 2
- 239000013598 vector Substances 0.000 description 2
- 230000004931 aggregating effect Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000011049 filling Methods 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 230000004886 head movement Effects 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000001303 quality assessment method Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 230000007480 spreading Effects 0.000 description 1
- 238000003892 spreading Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/15—Processing image signals for colour aspects of image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/172—Processing image signals image signals comprising non-image signal components, e.g. headers or format information
- H04N13/178—Metadata, e.g. disparity information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/282—Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N2213/00—Details of stereoscopic systems
- H04N2213/003—Aspects relating to the "2D+depth" image format
Definitions
- an apparatus for decoding comprising means for receiving an encoded bitstream; means for determining a viewing direction; means for determining parameters relating to distribution of reflected light ray for a three-dimensional point of a surface; means for decoding surface normal direction from a patch direction unit; means for computing a distribution for each viewing direction based on the determined viewing direction and the determined parameters; and means for using the computed distribution and the surface normal direction to render a patch.
- parameters and the surface normal direction are determined at a three-dimensional point of a non-diffuse patch.
- information on the surface normal direction is encoded to a patch description unit (PDU).
- PDU patch description unit
- parameters relating to distribution of reflected light are determined by von Mises-Fisher (vMF) distribution.
- vMF von Mises-Fisher
- Fig. 4 shows an example of a de-compression process of a volumetric video
- Fig. 8 shows examples of vMF distributions with different concentration values
- Fig. 10 shows an example of patch normal direction and viewing camera direction that sample vMF distributions
- a video codec comprises an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can un-compress the compressed video representation back into a viewable form.
- An encoder may discard some information in the original video sequence in order to represent the video in a more compact form (i.e. at lower bitrate).
- Figure 1 illustrates an encoding process of an image as an example.
- Figure 2 illustrates a predicted representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
- Volumetric video data represents a three-dimensional scene or object and can be used as input for AR (Augmented Reality), VR (Virtual Reality) and MR (Mixed Reality) applications.
- Such data describes geometry (shape, size, position in 3D-space) and respective attributes (e.g. colour, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances (like frames in 2D video).
- Volumetric video is either generated from 3D models, i.e. CGI (Computer Generated Imagery), or captured from real-world scenes using a variety of capture solutions, e.g. multi camera, laser scan, combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible.
- CGI Computer Generated Imagery
- Representation formats for such volumetric data comprise, for example, triangle meshes, point clouds, or voxel.
- Temporal information about the scene can be included in the form of individual capture instances, i.e. “frames” in 2D video, or other means, e.g. position of an object as a function of time.
- Since volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR, or MR applications, especially for providing 6DOF viewing capabilities.
- 3D data acquisition devices have enabled the reconstruction of highly detailed volumetric video representations of natural scenes.
- Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data.
- Representation of the 3D data depends on how the 3D data is used.
- Dense Voxel arrays have been used to represent volumetric medical data.
- polygonal meshes are extensively used.
- Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
- Another way to represent 3D data is coding this 3D data as set of texture and depth map as is the case in the multi-view plus depth.
- Projecting volumetric models onto 2D planes allows for using standard 2D video coding tools with highly efficient temporal compression.
- coding efficiency is increased greatly.
- 6DOF capabilities are improved.
- Using several geometries for individual objects improves the coverage of the scene further.
- standard video encoding hardware can be utilized for real-time compression/decompression of the projected planes. The projection and reverse projection steps are of low complexity.
- Figure 3 illustrates an overview of an example of a compression process of a volumetric video. Such process may be applied for example in MPEG Point Cloud Coding (PCC).
- PCC MPEG Point Cloud Coding
- the process starts with an input point cloud frame 301 that is provided for patch generation 302, geometry image generation 304 and texture image generation 305.
- the patch generation 302 process aims at decomposing the point cloud into a minimum number of patches with smooth boundaries, while also minimizing the reconstruction error.
- the normal at every point can be estimated.
- An initial clustering of the point cloud can then be obtained by associating each point with one of the following six oriented planes, defined by their normals: (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0) and (0.0, 0.0, -1.0).
- the initial clustering may then be refined by iteratively updating the cluster index associated with each point based on its normal and the cluster indices of its nearest neighbours.
- the final step may comprise extracting patches by applying a connected component extraction procedure.
- the patch location is determined through an exhaustive search that is performed in raster scan order.
- the first location that can guarantee an overlapping-free insertion of the patch is selected and the grid cells covered by the patch are marked as used. If no empty space in the current resolution image can fit a patch, then the height H of the grid may be temporarily doubled, and search is applied again. At the end of the process, H is clipped so as to fit the used grid cells.
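The raster-scan placement described above can be illustrated with a short sketch. This is a simplified, hypothetical illustration of the idea (occupancy tested per grid cell, grid height doubled when no free location exists), not the encoder's actual packing implementation; function and variable names are assumptions.

```python
import numpy as np

def place_patch(grid, patch_mask):
    """Find the first raster-scan location where patch_mask fits without overlap.

    grid: 2D bool array of used cells; patch_mask: 2D bool array of the patch footprint.
    Returns the (possibly enlarged) grid and the chosen (row, col) location.
    Illustrative sketch only, not the reference packing code.
    """
    ph, pw = patch_mask.shape
    while True:
        H, W = grid.shape
        for v0 in range(H - ph + 1):          # raster scan order: top to bottom,
            for u0 in range(W - pw + 1):      # left to right
                window = grid[v0:v0 + ph, u0:u0 + pw]
                if not np.any(window & patch_mask):    # overlap-free insertion found
                    grid[v0:v0 + ph, u0:u0 + pw] |= patch_mask
                    return grid, (v0, u0)
        # no free location: temporarily double the grid height and search again
        # (at the end of the whole packing process, H would be clipped to the used cells)
        grid = np.vstack([grid, np.zeros_like(grid)])
```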
- the geometry image generation 304 and the texture image generation 305 are configured to generate geometry images and texture images respectively.
- the image generation process may exploit the 3D to 2D mapping computed during the packing process to store the geometry and texture of the point cloud as images.
- each patch may be projected onto two images, referred to as layers.
- Let H(u, v) be the set of points of the current patch that get projected to the same pixel (u, v).
- the first layer, also called the near layer, stores the point of H(u, v) with the lowest depth D0.
- the second layer, referred to as the far layer, captures the point of H(u, v) with the highest depth within the interval [D0, D0+Δ], where Δ is a user-defined parameter that describes the surface thickness.
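As a sketch of how the two layers could be derived, assume per-pixel lists of projected depths for a patch; the near layer keeps the minimum depth D0 and the far layer keeps the largest depth within [D0, D0 + Δ]. The data structure and names are illustrative assumptions only.

```python
def build_layers(depths_per_pixel, surface_thickness):
    """depths_per_pixel: dict mapping (u, v) -> list of depths of points projected there.
    Returns (near_layer, far_layer) dicts; illustrative sketch of the two-layer scheme."""
    near_layer, far_layer = {}, {}
    for (u, v), depths in depths_per_pixel.items():
        d0 = min(depths)                    # near layer: lowest depth D0
        near_layer[(u, v)] = d0
        # far layer: highest depth within [D0, D0 + surface_thickness]
        far_layer[(u, v)] = max(d for d in depths if d <= d0 + surface_thickness)
    return near_layer, far_layer
```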
- the generated videos may have the following characteristics:
- the geometry images and the texture images may be provided to image padding 307.
- the image padding 307 may also receive as an input an occupancy map (OM) 306 to be used with the geometry images and texture images.
- the occupancy map 306 may comprise a binary map that indicates for each cell of the grid whether it belongs to the empty space or to the point cloud.
- the occupancy map (OM) may be a binary image in which occupied pixels and non-occupied pixels are distinguished and depicted respectively.
- the occupancy map may alternatively comprise a non-binary image allowing additional information to be stored in it. Therefore, the representative values of the DOM (Deep Occupancy Map) may comprise binary values or other values, for example integer values. It should be noticed that one cell of the 2D grid may produce a pixel during the image generation process. Such an occupancy map may be derived from the packing process 303.
- the following metadata may be encoded/decoded for every patch: - index of the projection plane o Index 0 for the planes (1.0, 0.0, 0.0) and (-1.0, 0.0, 0.0) o Index 1 for the planes (0.0, 1.0, 0.0) and (0.0, -1.0, 0.0) o Index 2 for the planes (0.0, 0.0, 1.0) and (0.0, 0.0, -1.0) - 2D bounding box (u0, v0, u1, v1) - 3D location (x0, y0, z0) of the patch represented in terms of depth δ0, tangential shift s0 and bitangential shift r0.
- (δ0, s0, r0) may be calculated according to the index of the projection plane (Index 0, Index 1 or Index 2); a sketch of a commonly used axis assignment is given below.
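The concrete mapping from the 3D patch origin to depth, tangential and bitangential coordinates is not reproduced in this excerpt; the sketch below assumes the axis assignment commonly used in V-PCC and is given for illustration only.

```python
def to_patch_coords(x0, y0, z0, projection_index):
    """Map the 3D patch origin (x0, y0, z0) to (delta0, s0, r0) for a given projection
    plane index. The axis assignment below follows a commonly used V-PCC convention
    and is an assumption for illustration; the exact mapping is defined in the codec text."""
    if projection_index == 0:      # X-axis planes (1, 0, 0) / (-1, 0, 0)
        return x0, z0, y0          # depth, tangential shift, bitangential shift
    if projection_index == 1:      # Y-axis planes (0, 1, 0) / (0, -1, 0)
        return y0, z0, x0
    if projection_index == 2:      # Z-axis planes (0, 0, 1) / (0, 0, -1)
        return z0, x0, y0
    raise ValueError("unknown projection plane index")
```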
- mapping information providing for each TxT block its associated patch index may be encoded as follows: - For each TxT block, let L be the ordered list of the indexes of the patches such that their 2D bounding box contains that block. The order in the list is the same as the order used to encode the 2D bounding boxes. L is called the list of candidate patches.
- the empty space between patches is considered as a patch and is assigned the special index 0, which is added to the candidate patches list of all the blocks.
- the occupancy map consists of a binary map that indicates for each cell of the grid whether it belongs to the empty space or to the point cloud.
- One cell of the 2D grid produces a pixel during the image generation process.
- the compression process may comprise one or more of the following example operations:
- Binary values may be associated with B0xB0 sub-blocks belonging to the same TxT block.
- a value 1 is associated with a sub-block if it contains at least one non-padded pixel, and 0 otherwise. If a sub-block has a value of 1 it is said to be full, otherwise it is an empty sub-block.
- a binary information may be encoded for each TxT block to indicate whether it is full or not.
- extra information indicating the location of the full/empty sub-blocks may be encoded as follows: o Different traversal orders may be defined for the sub-blocks, for example horizontally, vertically, or diagonally starting from the top right or top left corner. o The encoder chooses one of the traversal orders and may explicitly signal its index in the bitstream. o The binary values associated with the sub-blocks may be encoded by using a run-length encoding strategy, as sketched below.
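A minimal sketch of this signalling idea: for one TxT block, the binary full/empty values of its B0xB0 sub-blocks are visited in one of several traversal orders and then run-length encoded. The traversal names and the run-length format are illustrative assumptions, not the normative syntax.

```python
def encode_block_occupancy(sub_blocks, order="horizontal"):
    """sub_blocks: 2D list of 0/1 values (1 = full, 0 = empty) for one TxT block.
    Returns (order_index, runs), where runs is a simple run-length encoding of the
    chosen traversal. Illustrative only; the normative traversals and coding differ."""
    orders = ["horizontal", "vertical", "diagonal"]
    n, m = len(sub_blocks), len(sub_blocks[0])
    if order == "horizontal":
        seq = [v for row in sub_blocks for v in row]
    elif order == "vertical":
        seq = [sub_blocks[r][c] for c in range(m) for r in range(n)]
    else:  # a simple diagonal traversal starting from the top-left corner
        seq = [sub_blocks[r][d - r] for d in range(n + m - 1)
               for r in range(max(0, d - m + 1), min(n, d + 1))]
    runs, count = [], 1                      # run-length encode the traversed sequence
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    runs.append((seq[-1], count))
    return orders.index(order), runs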
- FIG. 4 illustrates an overview of a de-compression process for MPEG Point Cloud Coding (PCC).
- a de-multiplexer 401 receives a compressed bitstream, and after de-multiplexing, provides compressed texture video and compressed geometry video to video decompression 402.
- the de-multiplexer 401 transmits the compressed occupancy map to occupancy map decompression 403. It may also transmit compressed auxiliary patch information to auxiliary patch-info decompression 404.
- Decompressed geometry video from the video decompression 402 is delivered to geometry reconstruction 405, as are the decompressed occupancy map and decompressed auxiliary patch information.
- the point cloud geometry reconstruction 405 process exploits the occupancy map information in order to detect the non-empty pixels in the geometry/texture images/layers. The 3D positions of the points associated with those pixels may be computed by leveraging the auxiliary patch information and the geometry images.
- the reconstructed geometry image may be provided for smoothing 406, which aims at alleviating potential discontinuities that may arise at the patch boundaries due to compression artifacts.
- the implemented approach moves boundary points to the centroid of their nearest neighbours.
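A compact sketch of this boundary smoothing step, which moves each patch-boundary point to the centroid of its nearest neighbours: the neighbour search here uses a KD-tree for brevity, and the neighbourhood size is an illustrative parameter.

```python
import numpy as np
from scipy.spatial import cKDTree

def smooth_boundaries(points, boundary_indices, k=8):
    """Move each boundary point to the centroid of its k nearest neighbours.

    points: (N, 3) array of reconstructed positions; boundary_indices: indices of
    patch-boundary points. Illustrative sketch of the smoothing described above."""
    tree = cKDTree(points)
    smoothed = points.copy()
    for i in boundary_indices:
        _, nn = tree.query(points[i], k=k + 1)      # +1: the point itself is returned first
        smoothed[i] = points[nn[1:]].mean(axis=0)   # centroid of the k neighbours
    return smoothed
```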
- the smoothed geometry may be transmitted to texture reconstruction 407, which also receives a decompressed texture video from video decompression 402.
- the texture reconstruction 407 outputs a reconstructed point cloud.
- the texture values for the texture reconstruction are directly read from the texture images.
- V3C Visual volumetric video-based Coding
- ISO/IEC 23090-5 (formerly V-PCC (Video-based Point Cloud Compression)) and ISO/IEC 23090-12 (formerly MIV (MPEG Immersive Video)).
- V3C will not be issued as a separate document, but as part of ISO/IEC 23090-5 (expected to include clauses 1-8 of the current V-PCC text).
- ISO/IEC 23090-12 will refer to this common part.
- ISO/IEC 23090-5 will be renamed to V3C PCC, ISO/IEC 23090-12 renamed to V3C MIV.
- MIV relates to the compression of immersive video content, also known as volumetric video, in which a real or virtual 3D scene is captured by multiple real or virtual cameras. MIV enables storage and distribution of immersive video content over existing and future networks, for playback with 6 degrees of freedom (6 DoF) of view position and orientation within a limited viewing space and with different fields of view depending on the capture setup.
- 6 DoF degrees of freedom
- Figure 5 shows an example of an encoding process in the MIV extension of V3C.
- the encoding process comprises preparation of source material; per-group encoding; bitstream formatting and video encoding.
- the source material comprises source views 500, including view parameters, geometry component, attribute components, and optionally also entity map.
- the source material is processed by geometry quality assessment 505; split source in groups 510; synthesize inpainted background 515; and view labelling 520.
- the groups are encoded 525, the details of which - according to an example - are given in Figure 6.
- Bitstream formatting 530 is performed on parameter set, view parameters list and atlas data.
- Video sub bitstream encoding 550 is based on raw geometry, attribute and occupancy video data, which - after encoding - are packed 535 and multiplexed 540 with formatted bitstream 530.
- Figure 6 shows a detailed example of the encoding process of Figure 5 (element 525).
- the process comprises automatic parameter selection 605; and optionally separation to entity layers 610.
- the encoding comprises pixel pruning 615 and aggregating pruning masks 620.
- the clusters are split 630; patches are packed 635; patch attribute average value is modified 640; and color correction 645 is optionally performed.
- video data is generated 650; geometry is quantized 655 and scaled 660.
- occupancy is scaled 665.
- Figure 7 shows an example of a renderer of the MIV extension of V3C. The rendering is based on a decoded access unit 700.
- the entity filtering 705 is optionally performed, followed by patch culling 710.
- The reconstruction process comprises occupancy reconstruction 715 and optional attribute average value restoration 725, which together with the output of patch culling are reconstructed as pruned views 720.
- Geometry processes comprise optional geometry scaling 730, optional depth value decoding 735 and optional depth estimation 740.
- For view synthesis, the reconstructed pruned views are unprojected to the global coordinate system 745 and reprojected and merged into a viewport 750.
- inpainting 755 and view space handling 760 are performed.
- the present embodiments are related generally to volumetric video coding and more specifically MPEG-I Immersive Video (MIV).
- MIV MPEG-I Immersive Video
- the goal of MIV is to project regions of the 3D scene into 2D patches, organize these patches into atlases, and provide the required metadata along with coded video bitstreams which would enable a client to synthesize dense views in a so-called "viewing volume".
- a viewer can observe the surrounding scene from any position and angle within the viewing volume.
- the fact that the viewer can freely move and rotate his/her head and observe the correctly rendered scene at any time in a fluid manner is referred to as a "6-DOF" (six degrees of freedom) experience.
- the aim of the present embodiments is to provide view-dependent patches for 6-DOF (Degrees-of-Freedom) experiences supporting specular content.
- MIV MPEG-I Immersive Video
- the solution based on lumigraph (1) uses a proxy geometry instead of depth maps, and encodes viewmaps for each proxy geometry facet. These viewmaps are used to derive blending weights that favour the cameras that are the most aligned with the virtual view.
- the issues are related to the memory footprint of these viewmaps and to problems of continuity. For rotations, when the blending weights change from a favoured camera view to another one, very noticeable transitions are observed, especially if the number of input cameras is low.
- the approach is also not scalable with the number of input cameras in terms of compression, streaming or rendering.
- a solution related to lightfield (2) based rendering can represent specular effects. This is done by creating a very dense set of input views (thousands of views compared to 15 to 25 in MIV), that must be arranged on a sphere, and a proxy mesh is generated for all these views. Because the inter-camera distances are small and the viewing volume inside the sphere of cameras is small as well, specular features can be rendered with high fidelity, by blending views with a disk-based approach that always favours the camera that is the most aligned with the virtual view to synthesize. The issue with this approach is the amount of input views, which is two orders of magnitude larger than for MIV and severely limits compression and streaming experiences. Using less dense input cameras causes blending artifacts with multiple noticeable reflections.
- the present embodiments propose adding a level of modelling.
- BRDFs Bidirectional Reflectance Distribution Functions
- evaluate parameters, for example a BRDF, defining the relation between the incoming and outgoing radiances at a given point P on one or more surface patches.
- the parameters can be estimated e.g. by using the von Mises-Fisher distribution; encode the vMF parameters or other parameters relating to the BRDF for each non-diffuse patch;
- This surface normal direction can be estimated in many ways, for example, as the average of the patch vertex or face normals.
- incorporating the vMF parameters, or other parameters relating to the BRDF, into metadata.
- a BRDF is used as an example.
- the BRDF describes the distribution of reflected light rays in a particular direction with respect to the incident light.
- the light sources and BRDFs are unknown in the general context. Therefore, the estimation of a BRDF approximation may be obtained by fitting, for example, a vMF distribution, by addressing an inverse problem.
- the glossy BRDF is treated at a surface point as a directional distribution modelled with a spherical vMF distribution.
- the vMF is an isotropic distribution for directional data and statistics, and a generalization of the von Mises Distribution to higher dimensions.
- a “point” used in the disclosure refers to a three-dimensional (3D) point in a space (as in point clouds), as well as its projection in camera views (pixel in multiview and depth).
- a BRDF approximation may be encoded at a patch level with a vMF distribution as well as the surface normal direction of patch. Based on these metadata, the client can perform a per-pixel blending with non-linearly estimated weights.
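To illustrate the per-pixel blending idea, the sketch below evaluates a vMF lobe, using the decoded per-patch concentration, centred on the virtual viewing direction, and uses the resulting densities as non-linear blending weights for the source cameras: a high concentration (specular patch) strongly favours cameras aligned with the virtual view, while a low concentration (diffuse patch) blends almost uniformly. This particular weight definition is an assumption for illustration; the embodiments only require that the decoded distribution and patch normal drive the blending.

```python
import numpy as np

def vmf_density(x, mu, kappa):
    """3D von Mises-Fisher density with normalization C3(kappa) = kappa / (4*pi*sinh(kappa))."""
    kappa = max(kappa, 1e-6)   # guard against the diffuse limit kappa -> 0
    return kappa / (4.0 * np.pi * np.sinh(kappa)) * np.exp(kappa * np.dot(mu, x))

def blend_weights(point, view_dir, cam_positions, kappa):
    """Per-point blending weights for view synthesis (illustrative sketch).

    point: 3D point position, view_dir: unit vector from the point towards the
    virtual viewport, cam_positions: list of source camera centres, kappa: decoded
    vMF concentration of the patch the point belongs to."""
    point = np.asarray(point, dtype=float)
    weights = []
    for cam in cam_positions:
        cam_dir = np.asarray(cam, dtype=float) - point
        cam_dir /= np.linalg.norm(cam_dir)
        weights.append(vmf_density(cam_dir, view_dir, kappa))
    weights = np.asarray(weights)
    return weights / weights.sum()
```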
- the present solution is applicable to MIV and to Video Point Cloud Coding in V3C.
- vMF is best fit for spherical data.
- the light distribution can be modelled using vMF.
- the vMF parameter can be estimated with simple non-biased estimators.
- the BRDF of a point on a surface can be modelled using a vMF or a mixture of vMFs.
- Equation 2: The mean direction (μ) and concentration (κ) are estimated as follows, where the xᵢ are un-normalized directions of observations (in Equation 2, the xᵢ and μ are vectors): μ = Σᵢ xᵢ / ||Σᵢ xᵢ|| and R = ||Σᵢ xᵢ|| / N (Equation 2). An approximation of the concentration parameter is given by (in Equation 3, κ and R are scalars): κ ≈ R(3 - R²) / (1 - R²) (Equation 3).
- A higher value of the concentration (κ) indicates that the point is more specular, and a lower value of the concentration (κ) indicates that the point belongs to a diffuse surface, as shown in Figure 8, which presents examples of vMF distributions with different concentration (κ) values from diffuse to increasingly specular.
- the arrows 801 in each example illustrate the mean direction (μ).
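The estimators in Equations 2 and 3 can be written as a short routine. The sketch below assumes the observation directions are given as 3D vectors of roughly unit length and uses the standard spherical-data (d = 3) approximation of the concentration; names are illustrative.

```python
import numpy as np

def estimate_vmf(directions):
    """Estimate the vMF mean direction and concentration (Equations 2 and 3).

    directions: (N, 3) array of observation directions x_i (assumed unit or near-unit
    vectors). Returns (mu, kappa) using the spherical (d = 3) approximation."""
    x = np.asarray(directions, dtype=float)
    s = x.sum(axis=0)                                  # resultant vector  sum_i x_i
    mu = s / np.linalg.norm(s)                         # Equation 2: mean direction
    r_bar = np.linalg.norm(s) / len(x)                 # mean resultant length R
    kappa = r_bar * (3.0 - r_bar ** 2) / (1.0 - r_bar ** 2)   # Equation 3 approximation
    return mu, kappa
```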
- vMF parameter evaluation can be performed either per point (or projected point as a pixel) or per patch, based on the 3D location of the point and its colours observed from the multiple input camera texture and depth values.
- the colour of the point is the observation of how the light in the scene has been reflected by that point in the viewing direction of the camera.
- a point or a patch BRDF can be represented with a single vMF lobe or a mixture of vMF lobes.
- Multiple lobes can be due to multiple reflections, such as inter-reflections in a scene.
- additional merging of the lobes may be necessary to simplify the representation and its estimation.
- the vMF distribution and BRDF in general can be defined in a point or patch local coordinate system, where the z-axis is the point or patch normal direction, and the x-axis and y-axis are defined and chosen on the point or patch tangential plane.
- the vMF parameters can also be stored in the world coordinates of the scene, but processing the representation requires the knowledge of the point or patch normal direction in any case.
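As a small sketch of the local frame mentioned above: given the patch (or point) normal as the z-axis, an orthonormal tangent basis can be chosen on the tangential plane in any consistent way, for example as below. The particular construction is an illustrative choice, not one mandated by the embodiments.

```python
import numpy as np

def local_frame(normal):
    """Build an orthonormal (x, y, z) frame whose z-axis is the given normal.

    The tangent axes are an arbitrary but consistent choice on the tangential plane."""
    z = np.asarray(normal, dtype=float)
    z = z / np.linalg.norm(z)
    # pick any vector not parallel to z to seed the tangent construction
    helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return x, y, z
```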
- Patch parameter estimation is described below with reference to MIV and to point cloud coding, which are alternatives to each other.
- an MIV encoder receives multiple camera views with depth, as well as camera extrinsic and intrinsic information.
- the intrinsic and extrinsic parameters may comprise one or more of the following: a focal length, an image sensor format, a principal point, an orientation of the camera, i.e., a position of the camera center and the direction of the camera.
- the encoder is configured to compare the received views to each other in order to classify points into diffuse or specular. This may be done by comparing each pair of views and identifying points which are unprojected onto the same 3D location, up to some threshold, and which exhibit different colours and light intensities in the various input cameras where they are visible. These points are then classified as specular. If the points share the same colour or light intensity in all views, up to a threshold, they are classified as diffuse. For each point, a normal direction is also estimated and stored per view. Points that are classified as specular are then aggregated into masks and merged into specular patches for each view.
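The classification step can be illustrated as follows: for a point visible in several views, the spread of its observed colours decides between diffuse and specular. The threshold value and the colour-spread metric are illustrative assumptions, not values from the source.

```python
import numpy as np

def classify_point(observed_colors, threshold=10.0):
    """Classify a 3D point as 'diffuse' or 'specular' from its colours in the input views.

    observed_colors: (M, 3) array of the colours of the same unprojected 3D point seen
    from the M cameras where it is visible. Colours agreeing up to the threshold mean
    diffuse; otherwise the point is specular. Illustrative sketch only."""
    colors = np.asarray(observed_colors, dtype=float)
    spread = np.max(np.linalg.norm(colors - colors.mean(axis=0), axis=1))
    return "diffuse" if spread <= threshold else "specular"
```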
- an encoder receives a surface light field (SLF) / view-dependent point cloud, that is, a point cloud with more than a single colour attribute (n > 1), the attributes representing different viewing angles.
- SLF surface light field
- Different activity regions, for example "low", "medium" and "high", are analyzed at the encoder. Comparing luma values is one of the options, where the encoder may compare the luma values Y of a 3D point p(X, Y, Z) observed from all the available cameras C0, C1, ... Cn. Based on this analysis, each point in a point cloud is classified into one of three different activity areas (low, medium and high).
- the V-PCC encoder adds an extra constraint so that each created patch belongs to a single activity region and no patch spans different activity regions, i.e., every patch is of either high, medium or low SLF activity.
- vMF parameters are evaluated on the segmented patches as defined in Equation 2 and Equation 3.
- Per-patch colour selection: Any number of colours can be selected per patch. For simplicity, in this solution only two colours are evaluated. Colour selection can be o the maximum colour cMax and the minimum colour cMin per patch, o or other colour selection criteria, such as criteria based on reconstruction and error minimization, can be followed.
- Selected colours are packed as video channel and encoded.
- a patch data unit syntax for signalling vMF parameters, or other parameters relating to BRDF, is given in the following:
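The actual syntax table is not reproduced in this excerpt. As a purely hypothetical sketch of what such per-patch signalling could carry, the structure below lists candidate fields; all names and types are illustrative assumptions, not the normative syntax element names of the patent or of V3C.

```python
from dataclasses import dataclass

@dataclass
class VmfPatchDataUnit:
    """Hypothetical sketch of extra per-patch fields for vMF/BRDF signalling.

    Every field name here is an illustrative assumption."""
    pdu_is_non_diffuse_flag: bool   # patch carries view-dependent (specular) content
    pdu_normal_direction: tuple     # quantized surface normal of the patch (x, y, z)
    pdu_vmf_lobe_count: int         # number of vMF lobes in the BRDF approximation
    pdu_vmf_mean_direction: list    # per lobe, quantized mean direction mu
    pdu_vmf_concentration: list     # per lobe, quantized concentration kappa
```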
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Library & Information Science (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
The embodiments relate to a method for encoding, comprising: receiving input relating to three-dimensional content (1110); generating one or more two-dimensional patches from the input (1120); determining parameters relating to the distribution of a reflected light ray in a direction with respect to the incident light, defining a relation between incoming and outgoing radiances at a three-dimensional point of a surface (1130); determining the surface normal direction of a patch (1140); incorporating the determined parameters and information on the determined surface normal direction into metadata (1150); and associating the metadata with an encoded bitstream (1160). The embodiments also relate to a method for decoding, and to apparatuses for implementing the methods.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI20215433 | 2021-04-13 | ||
FI20215433 | 2021-04-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022219230A1 true WO2022219230A1 (fr) | 2022-10-20 |
Family
ID=83640198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FI2022/050084 WO2022219230A1 (fr) | 2021-04-13 | 2022-02-11 | Procédé, appareil et produit-programme d'ordinateur de codage vidéo et de décodage vidéo |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022219230A1 (fr) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200351484A1 (en) * | 2019-04-30 | 2020-11-05 | Nokia Technologies Oy | Apparatus, a method and a computer program for volumetric video |
-
2022
- 2022-02-11 WO PCT/FI2022/050084 patent/WO2022219230A1/fr active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200351484A1 (en) * | 2019-04-30 | 2020-11-05 | Nokia Technologies Oy | Apparatus, a method and a computer program for volumetric video |
Non-Patent Citations (5)
Title |
---|
ANONYMOUS: "MPEG-I VIDEO CODING SUBGROUP Information technology-Coded Representation of Immersive Media - Part 12: Immersive Video, w20001", 133RD MEETING OF THE MPEG. ISO/IEC JTC 1/SC 29/WG 04, 30 January 2021 (2021-01-30), Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/133-OnLine/wg11/MDS20001-WG04_N00049.zip> [retrieved on 20220515] * |
HAN, C. ET AL.: "Frequency Domain Normal Map Filtering. In: SIGGRAPH 2007", ACM, 29 July 2007 (2007-07-29), XP058336086, Retrieved from the Internet <URL:https://dl.acm.org/doi/10.1145/1275808.1276412> [retrieved on 20220515], DOI: 10.1145/1275808.1276412 * |
NAIK, D. ET AL.: "Surface Lightfield Support in Video-based Point Cloud Coding", 2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP, 16 December 2020 (2020-12-16), pages 1 - 6, XP055982435, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/document/9287115> [retrieved on 20220515], DOI: 10.1109/MMSP48831.2020.9287115 * |
RONDAO-ALFACE, P. ET AL.: "Multiple Texture Patches Per Geometry Patch, m55977", 133RD MEETING OF THE MPEG. ISO/IEC JTC1/SC29/WG 04, 12 January 2021 (2021-01-12), XP030290853, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/133-OnLine/wg11/mSS977-v3-m55977_Multiple_Texture_Patches_Per_Geometry_Patch_revised.zip> [retrieved on 20220515] * |
SALAHIEH, B. ET AL.: "Test Model 8 for MPEG Immersive Video, w20002", 133RD MEETING OF THE MPEG. ISO/IEC JTC 1/SC 29/WG 4, 30 January 2021 (2021-01-30), XP030293031, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/133_OnLine/wg11/MDS20002_WG04_N00050-v2.zip>> [retrieved on 20220515] * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3751857A1 (fr) | Procédé, appareil et produit programme informatique de codage et décodage de vidéos volumétriques | |
US11599968B2 (en) | Apparatus, a method and a computer program for volumetric video | |
US11509933B2 (en) | Method, an apparatus and a computer program product for volumetric video | |
US11202086B2 (en) | Apparatus, a method and a computer program for volumetric video | |
US20230068178A1 (en) | A method, an apparatus and a computer program product for volumetric video encoding and decoding | |
US12101457B2 (en) | Apparatus, a method and a computer program for volumetric video | |
WO2019034807A1 (fr) | Codage et décodage séquentiels de vidéo volumétrique | |
US12096027B2 (en) | Method, an apparatus and a computer program product for volumetric video encoding and decoding | |
WO2021260266A1 (fr) | Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique | |
EP4032261A1 (fr) | Modèle de compression de nuage de points sur la base de vidéo pour des informations de signalisation mondiale | |
WO2021191495A1 (fr) | Procédé, appareil et produit-programme d'ordinateur pour codage vidéo et décodage vidéo | |
EP3729805B1 (fr) | Procédé et appareil de codage et de décodage de données vidéo volumétriques | |
WO2021205068A1 (fr) | Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique | |
EP4162691A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
WO2021191500A1 (fr) | Appareil, procédé et programme informatique pour vidéo volumétrique | |
EP3698332A1 (fr) | Appareil, procédé, et programme d'ordinateur pour vidéo volumétrique | |
WO2022219230A1 (fr) | Procédé, appareil et produit-programme d'ordinateur de codage vidéo et de décodage vidéo | |
WO2019185983A1 (fr) | Procédé, appareil et produit-programme d'ordinateur destinés au codage et au décodage de vidéo volumétrique numérique | |
WO2019211519A1 (fr) | Procédé et appareil de codage et de décodage de vidéo volumétrique | |
WO2022074286A1 (fr) | Procédé, appareil et produit-programme informatique de codage et de décodage vidéo | |
JP2024514066A (ja) | 光効果をサポートする容積ビデオ | |
Colleu | A floating polygon soup representation for 3D video | |
Dziembowski et al. | Test Model 15 for MPEG immersive video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22787684 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22787684 Country of ref document: EP Kind code of ref document: A1 |