WO2022074286A1 - Method, apparatus and computer program product for video encoding and decoding - Google Patents

Method, apparatus and computer program product for video encoding and decoding

Info

Publication number
WO2022074286A1
Authority
WO
WIPO (PCT)
Prior art keywords
patches
atlas
information
atlas images
images
Prior art date
Application number
PCT/FI2021/050630
Other languages
English (en)
Inventor
Payman AFLAKI-BENI
Sebastian Schwarz
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Publication of WO2022074286A1


Classifications

    • H04N13/282: Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • G06T9/001: Model-based coding, e.g. wire frame
    • H04N13/161: Encoding, multiplexing or demultiplexing different image signal components
    • H04N13/178: Metadata, e.g. disparity information
    • H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/70: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/85: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N2213/003: Aspects relating to the "2D+depth" image format

Definitions

  • the present solution generally relates to encoding and decoding of digital volumetric video.
  • new image and video capture devices are available. These devices are able to capture visual and audio content all around them, i.e. they can capture the whole angular field of view, sometimes referred to as a 360-degree field of view. More precisely, they can capture a spherical field of view (i.e., 360 degrees in all spatial directions).
  • new types of output technologies have been invented and produced, such as head-mounted displays. These devices allow a person to see visual content all around them, giving a feeling of being “immersed” into the scene captured by the 360-degree camera.
  • the new capture and display paradigm, where the field of view is spherical, is commonly referred to as virtual reality (VR) and is believed to be the common way people will experience media content in the future.
  • For volumetric video, a scene may be captured using one or more 3D (three-dimensional) cameras. The cameras are in different positions and orientations within a scene.
  • One issue to consider is that compared to 2D (two-dimensional) video content, volumetric 3D video content has much more data, so viewing it requires lots of bandwidth (with or without transferring it from a storage location to a viewing device): disk I/O, network traffic, memory bandwidth, GPU (Graphics Processing Unit) upload. Capturing volumetric content also produces a lot of data, particularly when there are multiple capture devices used in parallel.

Summary
  • a method comprising:
  • a patch comprises a volumetric video data component
  • each atlas image includes the patches from the same temporal projection
  • the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image;
  • an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
  • create atlas images including at least two patches, wherein each atlas image includes the patches from the same temporal projection; and pack the information on several patches and/or atlas images, wherein the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image;
  • an apparatus comprising:
  • the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image;
  • a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to:
  • each atlas image includes the patches from the same temporal projection
  • the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image;
  • the packed information comprises a list of atlas image identifications and a list of atlas image information being shared by such atlas images.
  • the packed information comprises a list of patch identifications and a list of patch information being shared by such patches.
  • the packed information comprises rotation or width of patches to be shared.
  • a reference atlas image and/or reference region comprising the packing information is indicated in or along a bitstream.
  • Atlas image packed information is communicated prior to a first atlas image included in the list of atlas images.
  • the packed information for the atlas images is the shared tile information.
  • Fig. 1 shows an example of an encoding process
  • Fig. 2 shows an example of a decoding process
  • Fig. 3 shows an example of a compression process of a volumetric video
  • Fig. 4 shows an example of a de-compression process of a volumetric video
  • Fig. 5 is a flowchart illustrating a method according to an embodiment
  • Fig. 6 shows an apparatus according to an embodiment.
  • In the following, several embodiments will be described in the context of volumetric video encoding and decoding.
  • the several embodiments enable packing and signaling volumetric video in one video component.
  • a video codec comprises an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can un-compress the compressed video representation back into a viewable form.
  • An encoder may discard some information in the original video sequence in order to represent the video in a more compact form (i.e. at lower bitrate).
  • Figure 1 illustrates an encoding process of an image as an example.
  • Figure 1 shows an image to be encoded (In); a predicted representation of an image block (P'n); a prediction error signal (Dn); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).
  • An example of a decoding process is illustrated in Figure 2.
  • Figure 2 illustrates a predicted representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
  • Volumetric video refers to a visual content that may have been captured using one or more three-dimensional (3D) cameras. When multiple cameras are in use, the captured footage is synchronized so that the cameras provide different viewpoints to the same world.
  • volumetric video describes a 3D model of the world where the viewer is free to move and observe different parts of the world.
  • Volumetric video enables the viewer to move in six degrees of freedom (6DOF): in contrast to common 360° video, where the user has from 2 to 3 degrees of freedom (yaw, pitch, and possibly roll), a volumetric video represents a 3D volume of space rather than a flat image plane.
  • Volumetric video frames contain a large amount of data because they model the contents of a 3D volume instead of just a two-dimensional (2D) plane. However, only a relatively small part of the volume changes over time. Therefore, it may be possible to reduce the total amount of data by only coding information about an initial state and changes which may occur between frames.
  • Volumetric video can be rendered from synthetic 3D animations, reconstructed from multi-view video using 3D reconstruction techniques such as structure from motion, or captured with a combination of cameras and depth sensors such as LiDAR (Light Detection and Ranging), for example.
  • Volumetric video data represents a three-dimensional scene or object, and can be used as input for AR (Augmented Reality), VR (Virtual Reality) and MR (Mixed Reality) applications.
  • Such data describes geometry (shape, size, position in three-dimensional space) and respective attributes (e.g. color, opacity, reflectance, ...), plus any possible temporal changes of the geometry and attributes at given time instances (like frames in two-dimensional (2D) video).
  • Volumetric video is either generated from three-dimensional (3D) models, i.e. CGI (Computer Generated Imagery), or captured from real-world scenes using a variety of capture solutions, e.g. multi-camera, laser scan, combination of video and dedicated depth sensors, and more.
  • volumetric data comprises triangle meshes, point clouds, or voxels.
  • Temporal information about the scene can be included in the form of individual capture instances, i.e. “frames” in 2D video, or by other means, e.g. the position of an object as a function of time.
  • Since volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR or MR applications, especially for providing 6DOF viewing capabilities.
  • 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes.
  • Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data.
  • Representation of the 3D data depends on how the 3D data is used.
  • Dense voxel arrays have been used to represent volumetric medical data.
  • polygonal meshes are extensively used.
  • Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes, where the topology is not necessarily a 2D manifold.
  • Another way to represent 3D data is to code this 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.
  • the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations are to be stored or interchanged between entities, then efficient compression becomes essential.
  • Standard volumetric video representation formats, such as point clouds, meshes and voxels, suffer from poor temporal compression performance. Identifying correspondences for motion compensation in 3D space is an ill-defined problem, as both geometry and respective attributes may change. For example, temporally successive “frames” do not necessarily have the same number of meshes, points or voxels. Therefore, compression of dynamic 3D scenes is inefficient. 2D-video-based approaches for compressing volumetric data, i.e. multiview and depth, have much better compression efficiency, but rarely cover the full scene. Therefore, they provide only limited 6DOF capabilities.
  • a 3D scene represented as meshes, points, and/or voxel can be projected onto one, or more, geometries. These geometries are “unfolded” onto 2D planes (two planes per geometry: one for texture, one for depth), which are then encoded using standard 2D video compression technologies. Relevant projection geometry information is transmitted alongside the encoded video files to the decoder. The decoder decodes the video and performs the inverse projection to regenerate the 3D scene in any desired representation format (not necessarily the starting format).
  • Projecting volumetric models onto 2D planes allows for using standard 2D video coding tools with highly efficient temporal compression.
  • coding efficiency may be increased greatly.
  • 6DOF capabilities may be improved.
  • Using several geometries for individual objects improves the coverage of the scene further.
  • standard video encoding hardware can be utilized for real-time compression/de-compression of the projected planes. The projection and reverse projection steps are of low complexity.
  • Figure 3 illustrates an overview of an example of a compression process of a volumetric video. Such process may be applied for example in MPEG Point Cloud Coding (PCC).
  • the process starts with an input point cloud frame 301 that is provided for patch generation 302, geometry image generation 304 and texture image generation 305.
  • the patch generation 302 process aims at decomposing the point cloud frame by converting 3D samples to 2D samples on a given projection plane using a strategy that provides the best compression.
  • the patch generation process aims at decomposing the point cloud into a minimum number of patches with smooth boundaries, while also minimizing reconstruction error.
  • the normal at every point can be estimated.
  • the tangent plane and its corresponding normal are defined for each point, based on the point’s m nearest neighbors within a predefined search distance.
  • the barycenter c̄ is computed as follows: c̄ = (1/m) Σ_{i=1..m} p_i.
  • the normal n̄ is estimated from the eigen decomposition of the covariance matrix Σ_{i=1..m} (p_i - c̄)(p_i - c̄)^T of the defined point set, as the eigenvector associated with its smallest eigenvalue.
  • each point is associated with a corresponding plane of a point cloud bounding box.
  • Each plane is defined by a corresponding normal n_p with values: (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0).
  • each point may be associated with the plane that has the closest normal (i.e. maximizes the dot product of the point normal n̄ and the plane normal n_p).
  • the initial clustering may then be refined by iteratively updating the cluster index associated with each point based on its normal and the cluster indices of its nearest neighbors.
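  • As a concrete illustration of the normal estimation and initial clustering described above, the following is a minimal Python sketch (assuming numpy; the neighbor search and all helper names are illustrative, not part of this disclosure):

```python
import numpy as np

# The six axis-aligned plane normals of the point cloud bounding box.
AXIS_NORMALS = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [-1, 0, 0], [0, -1, 0], [0, 0, -1],
], dtype=float)

def estimate_normal(neighbors):
    """Estimate a point's normal from an (m, 3) array of its nearest neighbors."""
    c = neighbors.mean(axis=0)              # barycenter c of the m neighbors
    d = neighbors - c
    cov = d.T @ d                           # 3x3 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    return eigvecs[:, 0]                    # eigenvector of the smallest eigenvalue

def initial_cluster_index(normal):
    """Associate a point with the plane whose normal maximizes the dot product."""
    return int(np.argmax(AXIS_NORMALS @ normal))
```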
  • the next step may comprise extracting patches by applying a connected component extraction procedure.
  • Patch info determined at patch generation 302 for the input point cloud frame 301 is delivered to packing process 303, to geometry image generation 304 and to texture image generation 305.
  • the packing process 303 aims at generating the geometry and texture maps, by appropriately considering the generated patches and by trying to efficiently place the geometry or texture data that corresponds to each patch onto a 2D grid of size WxH. Such placement also accounts for a minimum size block TxT (e.g. 16 x 16), which specifies the minimum distance between distinct patches as placed on this 2D grid. It should be noted that T may be a user-defined parameter. Parameter T may be encoded in the bitstream and sent to the decoder.
  • the packing method may use a search algorithm as follows:
  • patches may be placed on a 2D grid in a manner that would guarantee a non-overlapping insertion.
  • Samples belonging to a patch (rounded to a value that is a multiple of T) are considered as occupied blocks.
  • a safeguard distance of at least one block (a multiple of T) is enforced between adjacent patches.
  • Patches may be processed in an orderly manner, based on the patch index list. Each patch from the list is iteratively placed on the grid.
  • the grid resolution depends on the original point cloud size, and its width (W) and height (H) may be encoded in the bitstream and transmitted to the decoder. In the case that there is no empty space available for the next patch, the height value of the grid is initially doubled, and the insertion of this patch is evaluated again.
  • the height is trimmed to the minimum needed value. However, this value is not allowed to be set lower than the originally specified value in the encoder.
  • the final values for W and H correspond to the frame resolution that is used to encode the texture and geometry video signals using the appropriate video codec.
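  • A minimal sketch of such a placement search, assuming patch sizes are given in multiples of T, patches fit within the grid width, and processing follows the patch index list (all names are illustrative; a real encoder may search differently):

```python
def place_patches(patch_sizes, W, H, T=16):
    """Greedy non-overlapping placement of patches on a WxH grid of TxT blocks.

    patch_sizes: list of (width, height) in T-blocks, in patch index list order.
    Returns pixel positions and the final (possibly doubled) grid height in pixels.
    """
    bw, bh = W // T, H // T
    occupied = [[False] * bw for _ in range(bh)]
    positions = []

    def fits(u, v, pw, ph):
        return all(not occupied[v + j][u + i]
                   for j in range(ph) for i in range(pw))

    for pw, ph in patch_sizes:
        placed = False
        while not placed:
            for v in range(bh - ph + 1):
                for u in range(bw - pw + 1):
                    if fits(u, v, pw, ph):
                        for j in range(ph):
                            for i in range(pw):
                                occupied[v + j][u + i] = True
                        positions.append((u * T, v * T))
                        placed = True
                        break
                if placed:
                    break
            if not placed:
                # no empty space for this patch: double the grid height and retry
                occupied.extend([False] * bw for _ in range(bh))
                bh *= 2
    return positions, bh * T  # the height is later trimmed to the minimum needed value
```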
  • the geometry image generation 304 and the texture image generation 305 are configured to generate geometry images and texture images, respectively.
  • the image generation process may exploit the 3D to 2D mapping computed during the packing process to store the geometry and texture of the point cloud as images.
  • each patch may be projected onto two images, referred to as layers.
  • let H(u, v) be the set of points of the current patch that get projected to the same pixel (u, v).
  • the first layer, also called the near layer, stores the point of H(u, v) with the lowest depth D0.
  • the second layer, referred to as the far layer, captures the point of H(u, v) with the highest depth within the interval [D0, D0+Δ], where Δ is a user-defined parameter that describes the surface thickness.
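  • For instance, if H(u, v) is given as the list of depth values of the points projecting to one pixel, the two layer values could be derived as in this sketch (delta stands for the surface thickness Δ; names are illustrative):

```python
def layer_depths(h_uv, delta):
    """Return (near, far) layer depths for the point set H(u, v) of one pixel."""
    d0 = min(h_uv)                                  # near layer: lowest depth D0
    d1 = max(d for d in h_uv if d <= d0 + delta)    # far layer within [D0, D0+delta]
    return d0, d1
```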
  • the generated videos may have the following characteristics:
  • the geometry video is monochromatic.
  • the texture generation procedure exploits the reconstructed/smoothed geometry in order to compute the colors to be associated with the re-sampled points.
  • the geometry images and the texture images may be provided to image padding 307.
  • the image padding 307 may also receive as an input an occupancy map (OM) 306 to be used with the geometry images and texture images.
  • the occupancy map 306 may comprise a binary map that indicates for each cell of the grid whether it belongs to the empty space or to the point cloud.
  • the occupancy map (OM) may be a binary image where occupied and non-occupied pixels are distinguished and depicted, respectively.
  • the occupancy map may alternatively comprise a non-binary image allowing additional information to be stored in it. Therefore, the representative values of the DOM (Deep Occupancy Map) may comprise binary values or other values, for example integer values.
  • the padding process 307 aims at filling the empty space between patches in order to generate a piecewise smooth image suited for video compression. For example, in a simple padding strategy, each block of TxT (e.g. 16x16) pixels is compressed independently. If the block is empty (i.e. unoccupied, i.e. all its pixels belong to empty space), then the pixels of the block are filled by copying either the last row or column of the previous TxT block in raster order. If the block is full (i.e. occupied, i.e., no empty pixels), nothing is done. If the block has both empty and filled pixels (i.e. edge block), then the empty pixels are iteratively filled with the average value of their non-empty neighbors.
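  • The simple padding strategy above can be sketched as follows (assuming numpy; `img` is the geometry or texture image and `occ` a boolean occupancy mask; illustrative only):

```python
import numpy as np

def pad_blocks(img, occ, T=16):
    """Fill the empty space between patches block by block (TxT, e.g. 16x16)."""
    img = img.astype(float)
    H, W = img.shape
    for by in range(0, H, T):
        for bx in range(0, W, T):
            block = img[by:by + T, bx:bx + T]
            mask = occ[by:by + T, bx:bx + T].copy()
            if mask.all():
                continue                    # full block: nothing is done
            if not mask.any():
                # empty block: copy the last column/row of the previous block
                if bx > 0:
                    block[:, :] = img[by:by + T, bx - 1:bx]
                elif by > 0:
                    block[:, :] = img[by - 1:by, bx:bx + T]
                continue
            # edge block: iteratively fill empty pixels with the average
            # of their non-empty 4-neighbours
            h, w = block.shape
            while not mask.all():
                for y, x in zip(*np.nonzero(~mask)):
                    vals = [block[ny, nx]
                            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx]]
                    if vals:
                        block[y, x] = sum(vals) / len(vals)
                        mask[y, x] = True
    return img
```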
  • the padded geometry images and padded texture images may be provided for video compression 308.
  • the generated images/layers may be stored as video frames and compressed using for example the H.265 video codec according to the video codec configurations provided as parameters.
  • the video compression 308 also generates reconstructed geometry images to be provided for smoothing 309, wherein a smoothed geometry is determined based on the reconstructed geometry images and patch info from the patch generation 302.
  • the smoothed geometry may be provided to texture image generation 305 to adapt the texture images.
  • the patch may be associated with auxiliary information being encoded/decoded for each patch as metadata.
  • the auxiliary information may comprise the index of the projection plane, the 2D bounding box, and the 3D location of the patch.
  • Metadata may be encoded/decoded for every patch:
  • mapping information providing for each TxT block its associated patch index may be encoded as follows:
  • let L be the ordered list of the indexes of the patches such that their 2D bounding box contains that block.
  • the order in the list is the same as the order used to encode the 2D bounding boxes.
  • L is called the list of candidate patches.
  • the empty space between patches is considered as a patch and is assigned the special index 0, which is added to the candidate patches list of all the blocks.
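  • A sketch of deriving the candidate list L for one TxT block, assuming each patch carries its 2D bounding box in encoding order (hypothetical names):

```python
def candidate_patch_list(block_u, block_v, bboxes):
    """Ordered candidate list L for the TxT block at grid position (block_u, block_v).

    bboxes: (u0, v0, u1, v1) bounding boxes in T-blocks, in encoding order;
    patch indexes start at 1 because the special index 0 (empty space)
    is added to the candidate list of every block.
    """
    L = [0]
    for idx, (u0, v0, u1, v1) in enumerate(bboxes, start=1):
        if u0 <= block_u < u1 and v0 <= block_v < v1:
            L.append(idx)
    return L
```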
  • the occupancy map consists of a binary map that indicates for each cell of the grid whether it belongs to the empty space or to the point cloud.
  • One cell of the 2D grid produces a pixel during the image generation process.
  • the occupancy map compression 310 leverages the auxiliary information described in the previous section, in order to detect the empty TxT blocks (i.e. blocks with patch index 0).
  • the remaining blocks may be encoded as follows:
  • the occupancy map can be encoded with a precision of B0xB0 blocks.
  • the compression process may comprise one or more of the following example operations:
  • Binary values may be associated with B0xB0 sub-blocks belonging to the same TxT block.
  • a value of 1 is associated with a sub-block if it contains at least one non-padded pixel, and 0 otherwise. If a sub-block has a value of 1, it is said to be full; otherwise it is an empty sub-block.
  • binary information may be encoded for each TxT block to indicate whether it is full or not.
  • extra information indicating the location of the full/empty sub-blocks may be encoded as follows:
    o Different traversal orders may be defined for the sub-blocks, for example horizontally, vertically, or diagonally starting from the top right or top left corner.
    o The encoder chooses one of the traversal orders and may explicitly signal its index in the bitstream.
    o The binary values associated with the sub-blocks may be encoded by using a run-length encoding strategy.
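  • The run-length step could look like the following sketch, where `bits` already holds the sub-block values in the traversal order chosen by the encoder (illustrative only):

```python
def run_length_encode(bits):
    """Run-length encode a non-empty sequence of binary sub-block values.

    Returns the initial value and the list of run lengths; the chosen
    traversal order index is signalled separately, as described above.
    """
    runs, prev, count = [], bits[0], 0
    for b in bits:
        if b == prev:
            count += 1
        else:
            runs.append(count)
            prev, count = b, 1
    runs.append(count)
    return bits[0], runs
```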
  • An atlas is a collection of 2D bounding boxes, i.e. patches, projected into a rectangular frame that corresponds to a 3D bounding box in 3D space, which may be a subset of a point cloud.
  • the patch in the V-PCC notation is a rectangular region within an atlas, i.e. a collection of information that represents a 3D bounding box of the point cloud and associated geometry and attribute description along with the atlas information that is required to reconstruct the 3D point positions and their corresponding attributes from the 2D projections.
  • An atlas frame may be partitioned into tiles. The partitioned tiles may be presented as one or more tile rows and one or more tile columns.
  • a tile is a rectangular region of an atlas frame. The tiles can further be divided into tile groups. Only rectangular tile groups are supported. In this mode, a tile group contains a number of tiles of an atlas frame that collectively form a rectangular region of the atlas frame.
  • FIG. 4 illustrates an overview of a de-compression process for MPEG Point Cloud Coding (PCC).
  • a de-multiplexer 401 receives a compressed bitstream, and after de-multiplexing, provides compressed texture video and compressed geometry video to video decompression 402.
  • the de-multiplexer 401 transmits the compressed occupancy map to occupancy map decompression 403. It may also transmit compressed auxiliary patch information to auxiliary patch-info decompression 404.
  • Decompressed geometry video from the video decompression 402 is delivered to geometry reconstruction 405, as are the decompressed occupancy map and decompressed auxiliary patch information.
  • the point cloud geometry reconstruction 405 process exploits the occupancy map information in order to detect the non-empty pixels in the geometry/texture images/layers. The 3D positions of the points associated with those pixels may be computed by leveraging the auxiliary patch information and the geometry images.
  • the reconstructed geometry image may be provided for smoothing 406, which aims at alleviating potential discontinuities that may arise at the patch boundaries due to compression artifacts.
  • the implemented approach moves boundary points to the centroid of their nearest neighbors.
  • the smoothed geometry may be transmitted to texture reconstruction 407, which also receives a decompressed texture video from video decompression 402.
  • the texture reconstruction 407 outputs a reconstructed point cloud.
  • the texture values for the texture reconstruction are directly read from the texture images.
  • the point cloud geometry reconstruction process exploits the occupancy map information in order to detect the non-empty pixels in the geometry/texture images/layers.
  • the 3D positions of the points associated with those pixels are computed by leveraging the auxiliary patch information and the geometry images. More precisely, let P be the point associated with the pixel (u, v), let (δ0, s0, r0) be the 3D location of the patch to which it belongs, and let (u0, v0, u1, v1) be its 2D bounding box. P can then be expressed in terms of depth δ(u, v), tangential shift s(u, v) and bi-tangential shift r(u, v) as: δ(u, v) = δ0 + g(u, v); s(u, v) = s0 - u0 + u; r(u, v) = r0 - v0 + v, where g(u, v) is the luma component of the geometry image.
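  • Under these definitions, the per-pixel reconstruction can be sketched as follows (the axis convention in AXES and all names are assumptions for illustration, not the codec-defined mapping):

```python
from dataclasses import dataclass

@dataclass
class PatchInfo:
    d0: int           # depth offset (delta0) of the patch
    s0: int           # tangential offset
    r0: int           # bi-tangential offset
    u0: int           # 2D bounding box origin
    v0: int
    plane_index: int  # index of the projection plane

# Assumed mapping from (depth, tangent, bi-tangent) to (x, y, z) per axis pair.
AXES = {0: (0, 1, 2), 1: (1, 2, 0), 2: (2, 0, 1)}

def reconstruct_point(u, v, g, patch):
    """Recover the 3D position of the point projected to pixel (u, v);
    g is the decoded geometry image (luma values)."""
    d = patch.d0 + g[v][u]         # delta(u, v) = delta0 + g(u, v)
    s = patch.s0 - patch.u0 + u    # tangential shift
    r = patch.r0 - patch.v0 + v    # bi-tangential shift
    xyz = [0, 0, 0]
    n, t, b = AXES[patch.plane_index % 3]
    xyz[n], xyz[t], xyz[b] = d, s, r
    return tuple(xyz)
```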
  • the texture values can be directly read from the texture images.
  • the result of the decoding process is a 3D point cloud reconstruction.
  • One way to compress a time-varying volumetric scene/object is to project 3D surfaces onto some number of pre-defined 2D planes. Regular 2D video compression algorithms can then be used to compress various aspects of the projected surfaces. Such a projection is presented using different patches. Each set of patches may represent a specific object or specific parts of a scene. One or more patches may form a tile, and it is possible to create a group of tiles including one or more tiles.
  • V3C (Visual Volumetric Video-based Coding) is the core specification shared by ISO/IEC 23090-5 (formerly V-PCC (Video-based Point Cloud Compression)) and ISO/IEC 23090-12 (formerly MIV (MPEG Immersive Video)).
  • V3C will not be issued as a separate document, but as part of ISO/IEC 23090-5 (expected to include clauses 1 -8 of the current V-PCC text).
  • ISO/IEC 23090-12 will refer to this common part.
  • ISO/IEC 23090-5 is expected to be renamed to V3C PCC, ISO/IEC 23090-12 renamed to V3C MIV.
  • a vpcc_unit consists of header and payload pairs, with dedicated syntax defined for the vpcc_unit and vpcc_unit_header structures.
  • V3C MIV comprises a parameter vme_packed_video_present_flag which enables packing of some general information of each atlas.
  • vme_packed_video_present_flag[ j ] equal to 0 indicates that the atlas with ID j does not have packed data.
  • vme_packed_video_present_flag[ j ] equal to 1 indicates that the atlas with ID j has packed data.
  • when vme_packed_video_present_flag[ j ] is not present, it is inferred to be equal to 0.
  • the present embodiments target introducing a set of parameters to be included as packed information to share specific characteristics of the patches and/or atlas images in the atlas image or tile. Such information is similar or identical among the patches and/or atlas images, and hence there is no need to send it separately for each patch and/or atlas image.
  • the present embodiments consider that at least two patches are present in the current atlas.
  • the present embodiments introduce a set of parameters to be included in the packed information for each atlas image, where the parameters belong to the patches and/or atlas images and are similar between them.
  • the present embodiments have advantages, since they reduce the amount of overhead sent for each atlas image or group of images.
  • the packed information may share the same information on patches among all patches that have similar criteria. Similarly, the packed information may share the same information on atlas images among all atlas images that have similar criteria.
  • the atlas bitstream may contain a signal that informs the decoder whether or not the packed information exists, wherein the packed information is packed patch information or packed atlas image information.
  • the packed information is packed patch information or packed atlas image information.
  • the flag indicating the similar criteria for atlas images for a group of pictures may be assigned for a specific number of atlas images, e.g. 16 or 32. Alternatively, it may follow the number of images included in an encoding GOP decided at the encoder side. Alternatively, it can be adaptive to the content, meaning that it can continue until a certain criterion is met.
  • the criterion may be a sudden change in the content, radical rotation of an object, a big change in the illumination of the scene, an object entering or exiting the scene, scene cuts, etc.
  • the signal will be received on the decoder side, and the respective packed information will be used for the respective patches or tile images.
  • the packed information presented in the V3C MIV standard includes some of the information related to patches/atlas images to be shared among all or a specific number of patches/atlas images, e.g. the rotation or width of patches. Therefore, in the packed information, there may be a list of patch entityIDs that share one or more specific criteria. Alternatively, it may be assigned that one or more specific criteria are shared among some patches which are presented with their patch entityIDs.
  • the present embodiments are not limited to patches, and the packed information may belong to tiles or, in general, atlas images as well.
  • if the packed information belongs to tiles, the packed information is defined for two or more tiles in an atlas image.
  • if the packed information belongs to atlas images, then the packed information is defined per sequence of atlas images, e.g. one GOP (group of pictures).
  • the packed information proposed in the present disclosure only belongs to patches.
  • the packed information includes, but is not limited to, the following (the syntax elements which have a descriptor assigned to them):
  • the packed information included may represent the atlas image information which is packed to be shared between a series of atlas images.
  • the atlas image packed information should be communicated prior to the first atlas image included in the list of atlas images with packed information.
  • the packed information includes, but is not limited to, the following: atgdu_patch_mode[ p ]
  • the atlas images may share similar tile information.
  • the packed information includes, but is not limited to, the following (the syntax elements which have a descriptor assigned to them). The general supplemental enhancement information message syntax is presented below:
  • the aforementioned parameters will be included in the respective presentation of packed information, i.e. atlas level (for patch information) or sequence-of-pictures level (for atlas image information).
  • it may be signalled which atlas images are going to share the respective packed atlas image information; or similarly, which patches are going to share the respective packed patch information.
  • pi_ref_atlas_id [ j ] indicates the reference atlas to get packed information from. If pi_ref_atlas_id [ j ] is equal to j then new packed information is signalled. Otherwise, packed information is copied from the packed information for the atlas with the indicated index.
  • pi_ref_region_id [ i ] indicates the reference region to get packed information from. If pi_ref_region_id [ i ] is equal to i then new packed information is signalled. Otherwise, packed information is copied from the packed information for the region with the indicated region id.
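  • Taken together, the reference rule can be sketched as follows (store is a hypothetical mapping of packed information keyed by atlas ID; region handling would be analogous):

```python
def resolve_packed_info(store, atlas_id, pi_ref_atlas_id, new_info=None):
    """Resolve the packed information for one atlas per the rule above."""
    if pi_ref_atlas_id == atlas_id:
        store[atlas_id] = new_info                 # new packed information is signalled
    else:
        store[atlas_id] = store[pi_ref_atlas_id]   # copied from the indicated atlas
    return store[atlas_id]
```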
  • the method generally comprises receiving 510 as an input a volumetric video frame comprising volumetric content; projecting 520 the volumetric content to at least two patches in a temporal order, wherein a patch comprises a volumetric video data component; creating 530 atlas images including at least two patches, wherein each atlas image includes the patches from the same temporal projection; packing 540 the information on several patches and/or atlas images, wherein the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image; signaling 550, in or along a bitstream, at least an indication that the atlas images comprise packed information; and transmitting 560 the encoded bitstream to a storage for rendering.
  • Each of the steps can be implemented by a respective module of a computer system.
  • An apparatus comprises means for receiving as an input a volumetric video frame comprising volumetric content; means for projecting the volumetric content to at least two patches in a temporal order, wherein a patch comprises a volumetric video data component; means for creating atlas images including at least two patches, wherein each atlas image includes the patches from the same temporal projection; means for packing the information on several patches and/or atlas images, wherein the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image; means for signaling, in or along a bitstream, at least an indication that the atlas images comprise packed information; and means for transmitting the encoded bitstream to a storage for rendering.
  • the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
  • the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 5 according to various embodiments.
  • the various embodiments may provide advantages. For example, the present embodiments enable reducing the required bitrate to signal the patch information. In addition, the present embodiments enable reducing the required bitrate to signal the atlas image information.
  • a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
  • a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
  • the computer program code comprises one or more operational characteristics.
  • Said operational characteristics are defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus, wherein a programmable operational characteristic of the system comprises receiving as an input a volumetric video frame comprising volumetric content; projecting the volumetric content to at least two patches in a temporal order, wherein a patch comprises a volumetric video data component; creating atlas images including at least two patches, wherein each atlas image includes the patches from the same temporal projection; packing the information on several patches and/or atlas images, wherein the packed information comprises a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image; signaling, in or along a bitstream, at least an indication that the atlas images comprise packed information; and transmitting the encoded bitstream to a storage for rendering.
  • a computer program product according to an embodiment can be embodied on a non-transitory computer readable medium. According to another embodiment, the computer program product can be downloaded over a network in a data packet.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments relate to a method comprising: receiving as an input a volumetric video frame comprising volumetric content (510); projecting the volumetric content to at least two patches in a temporal order, a patch comprising a volumetric video data component (520); creating atlas images including at least two patches, each atlas image including the patches from the same temporal projection (530); packing the information on several patches and/or atlas images, the packed information comprising a set of parameters relating to said patches and/or atlas images to be shared among more than one patch/atlas image (540); signaling, in or along a bitstream, at least an indication that the atlas images comprise packed information (550); and transmitting the encoded bitstream to a storage for rendering (560). Embodiments also relate to an apparatus and a computer program product.
PCT/FI2021/050630 2020-10-05 2021-09-24 Method, apparatus and computer program product for video encoding and decoding WO2022074286A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20205969 2020-10-05

Publications (1)

Publication Number Publication Date
WO2022074286A1 (fr) 2022-04-14

Family

ID=81126478

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2021/050630 WO2022074286A1 (fr) 2020-10-05 2021-09-24 Method, apparatus and computer program product for video encoding and decoding

Country Status (1)

Country Link
WO (1) WO2022074286A1 (fr)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151913A1 (en) * 2017-11-09 2020-05-14 Samsung Electronics Co., Ltd. Point cloud compression using non-orthogonal projection
WO2020150148A1 (fr) * 2019-01-14 2020-07-23 Futurewei Technologies, Inc. Efficient patch rotation in point cloud coding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"3 DG OF ISO/IEC JTC1/SC29/WG11 W19329 Information technology- Coded Representation of Immersive Media - Part 5: Visual Volumetric Video-based Coding (V3C) and Video-based Point Cloud Compression (V-PCC", THE 130TH MEETING OF MPEG, 9 May 2020 (2020-05-09), Alpbach, Retrieved from the Internet <URL:https://dms.mpeg.expert/doc_end_user/documents/130_Alpbach/wg11/w19329.zip> [retrieved on 20211222] *
"V-PCC Codec description", 128. MPEG MEETING; 20191007 - 20191011; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), 30 December 2019 (2019-12-30), XP030225590 *
ARASH VOSOUGHI, BYEONGDOO CHOI, SEHOON YEA, STEPHAN WENGER, SHAN LIU: "[V-PCC][CE2.19 related][New proposal] Dynamic point cloud partition packing using tile groups", 127th MPEG meeting, 8-12 July 2019, Gothenburg (Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11), 4 July 2019 (2019-07-04), XP030207640 *

Similar Documents

Publication Publication Date Title
EP3751857A1 (fr) Procédé, appareil et produit programme informatique de codage et décodage de vidéos volumétriques
US11509933B2 (en) Method, an apparatus and a computer program product for volumetric video
US20230068178A1 (en) A method, an apparatus and a computer program product for volumetric video encoding and decoding
US20230050860A1 (en) An apparatus, a method and a computer program for volumetric video
WO2019158821A1 (fr) Appareil, procédé et programme informatique de vidéo volumétrique
US20210092430A1 (en) Video-Based Point Cloud Compression Model to World Signalling Information
WO2021191495A1 (fr) Procédé, appareil et produit-programme d&#39;ordinateur pour codage vidéo et décodage vidéo
US20220217400A1 (en) Method, an apparatus and a computer program product for volumetric video encoding and decoding
US11974026B2 (en) Apparatus, a method and a computer program for volumetric video
US20230129875A1 (en) A method, an apparatus and a computer program product for volumetric video encoding and video decoding
WO2021170906A1 (fr) Appareil, procédé et programme informatique pour vidéo volumétrique
US20220159297A1 (en) An apparatus, a method and a computer program for volumetric video
WO2021205068A1 (fr) Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique
WO2021260266A1 (fr) Procédé, appareil et produit-programme informatique pour codage vidéo volumétrique
US20220329871A1 (en) An Apparatus, A Method and a Computer Program for Volumetric Video
WO2023144445A1 (fr) Procédé, appareil et produit-programme informatique de codage et de décodage vidéo
EP3699867A1 (fr) Appareil, procédé et programme informatique pour vidéo volumétrique
EP3987774A1 (fr) Appareil, procédé et programme informatique pour vidéo volumétrique
EP4032314A1 (fr) Procédé, appareil et produit-programme informatique pour codage vidéo et décodage vidéo
WO2022074286A1 (fr) Procédé, appareil et produit-programme informatique de codage et de décodage vidéo
WO2019185983A1 (fr) Procédé, appareil et produit-programme d&#39;ordinateur destinés au codage et au décodage de vidéo volumétrique numérique
WO2021165566A1 (fr) Appareil, procédé et programme informatique pour vidéo volumétrique
WO2023041838A1 (fr) Appareil, procédé et programme informatique pour vidéo volumétrique
WO2023047021A2 (fr) Procédé, appareil et produit-programme informatique de codage et de décodage vidéo
WO2019211519A1 (fr) Procédé et appareil de codage et de décodage de vidéo volumétrique

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21877057

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21877057

Country of ref document: EP

Kind code of ref document: A1