WO2021214395A1 - Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues (Methods and devices for encoding and decoding a multi-view video sequence)
- Publication number
- WO2021214395A1 (PCT/FR2021/050551)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- patch
- transformation
- atlas
- data stream
- decoded
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/59—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Definitions
- the invention relates to so-called immersive videos, representative of a scene captured by one or more cameras. More particularly, the invention relates to the encoding and decoding of such videos.
- the scene is conventionally captured by a set of cameras, as illustrated in figure 1.
- These cameras can be of 2D type (cameras C1, C2, C3, C4 in FIG. 1), or of 360-degree type, i.e. capturing the entire scene at 360 degrees around the camera (camera C5 in FIG. 1).
- one or more views are calculated from the decoded views.
- FIG. 2 illustrates an example of a coding-decoding system using such a selection of data from the multi-view video making it possible to synthesize intermediate views on the decoder side.
- one or more basic views are encoded by a 2D encoder, for example an HEVC type encoder, or by a multi-view encoder.
- the other views (Ts, Ds) undergo a processing which makes it possible to extract certain zones from each of these views.
- the extracted areas, also called patches hereafter, are gathered in images called atlases.
- the atlases are coded, for example by a conventional 2D video coder such as an HEVC coder.
- the atlases are decoded, providing the decoded patches to the view synthesis algorithm to produce intermediate views from the base views and decoded patches.
- the patches make it possible to transmit the same area seen from another point of view.
- the patches make it possible to transmit the occlusions, that is to say the parts of the scene which are not visible from a given view.
- the MIV system (MPEG-I part 12) in its reference implementation (TMIV for “Test Model for Immersive Video”) generates atlases formed from a set of patches.
- Figure 3 illustrates an example of extracting patches (Patch2, Patch5, Patch8, Patch3, Patch7) from views (V0, V1, V2) and the creation of associated atlases, for example two atlases A0 and A1.
- These atlases A0 and A1 each comprise a texture image T0, T1 and a corresponding depth map D0, D1.
- the atlas A0 comprises a texture T0 and a depth D0,
- the atlas A1 comprises a texture T1 and a depth D1.
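As an illustration, the atlas organisation described above can be sketched with the following data structures; the `Patch` and `Atlas` names and fields are illustrative assumptions, not taken from the MIV specification:

```python
from dataclasses import dataclass, field

@dataclass
class Patch:
    """A rectangular set of pixels extracted from a source view."""
    source_view: int   # index of the view the pixels come from
    x: int             # position of the patch inside the atlas image
    y: int
    width: int
    height: int

@dataclass
class Atlas:
    """An image gathering patches, with a texture and a depth component."""
    texture: list      # 2D array of texture samples (T0, T1, ...)
    depth: list        # 2D array of depth samples (D0, D1, ...)
    patches: list = field(default_factory=list)

# A small atlas in the spirit of A0 in Figure 3: a texture T0, a depth D0,
# and one patch extracted from view V1.
a0 = Atlas(texture=[[0] * 8 for _ in range(8)],
           depth=[[0] * 8 for _ in range(8)])
a0.patches.append(Patch(source_view=1, x=0, y=0, width=4, height=4))
```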
- the patches are gathered in images and coded by a conventional 2D video coder.
- it is necessary to have an optimal arrangement of the patches in the atlases.
- it is necessary not only to reduce the cost of compressing such patches, but also the number of pixels to be processed by the decoder.
- the devices for rendering such videos have more limited resources than the devices for encoding such videos.
- the invention improves the state of the art. For this purpose, it relates to a method of decoding a coded data stream representative of a multi-view video, said coded data stream comprising coded data representative of at least one atlas, said at least one atlas corresponding to an image comprising at least one patch, said at least one patch corresponding to a set of pixels extracted from at least one component of a view of the multi-view video, said view not being encoded in said encoded data stream.
- the decoding process includes:
- the invention also relates to a method of encoding a data stream representative of a multi-view video, the encoding method comprises:
- the invention also makes it possible to apply transformations to the patches of an atlas which are different from one patch to another, or which possibly have different parameters.
- the arrangement of patches in an atlas is thus optimized for compression.
- the transformations used for the patches of the atlas make it possible, on the one hand, to optimize the occupancy rate of the pixels of the atlas, by using transformations such as rotation or subsampling at encoding so as to better arrange the patches within the atlas image.
- the transformations make it possible to optimize the cost of compressing the patches, in particular by modifying the pixel values of these patches, for example by reducing the dynamics of the pixels or by sub-sampling, which leads to coding fewer pixels, or by using an optimal arrangement of the patches in the atlas image making it possible to have as few pixels as possible to code.
- the reduction in the occupation rate of the pixels of the atlas also makes it possible to reduce the rate of pixels to be processed by the decoder, and therefore to reduce the complexity of the decoding.
- it is determined, from at least one syntax element decoded from said coded data stream for said at least one patch, whether a transformation must be applied to said at least one decoded patch.
- a syntax element is explicitly coded in the data stream to indicate whether a transformation must be applied to the decoded patch and which transformation to apply.
- said at least one decoded syntax element comprises at least one indicator indicating whether a transformation must be applied to said at least one patch and, if the indicator indicates that a transformation must be applied to said at least one patch, said at least one syntax element optionally comprises at least one parameter of said transformation.
- the transformation to be applied to the patch is coded in the form of an indicator indicating whether a transformation must be applied to the patch or not, and in the positive case, optionally the parameter(s) of the transformation to apply.
- a binary flag can indicate whether a transformation should be applied to the patch, and if so, a code indicating which transformation is used, and possibly one or more parameters of the transformation, such as a scale factor, a function modifying the dynamics of the pixels, an angle of rotation, etc.
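This flag/code/parameter signaling can be sketched over a simplified symbol stream; the transformation codes and layout below are illustrative assumptions, not the actual codec syntax:

```python
# Hypothetical transformation codes, for illustration only.
IDENTITY, ROTATION, SUBSAMPLING, MAPPING = 0, 1, 2, 3

def write_patch_transform(stream, transform, params):
    """Append the per-patch syntax elements to the symbol stream."""
    if transform == IDENTITY:
        stream.append(0)              # binary flag: no transformation
    else:
        stream.append(1)              # binary flag: a transformation applies
        stream.append(transform)      # code indicating which transformation
        stream.append(len(params))    # number of parameters
        stream.extend(params)         # e.g. scale factor or rotation angle

def read_patch_transform(stream, pos):
    """Parse the elements written above; return (transform, params, new_pos)."""
    if stream[pos] == 0:
        return IDENTITY, [], pos + 1
    transform = stream[pos + 1]
    n = stream[pos + 2]
    return transform, stream[pos + 3:pos + 3 + n], pos + 3 + n

stream = []
write_patch_transform(stream, ROTATION, [90])  # rotation by 90 degrees
write_patch_transform(stream, IDENTITY, [])
```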
- the transformation parameters can be defined by default at the encoder.
- said at least one parameter of said transformation to be applied to said patch has a value which is encoded by prediction with respect to a prediction value.
- the prediction value is encoded in a header of a view, of a component of the atlas, or of the atlas itself.
- the prediction value corresponds to the value of a parameter of a transformation applied to a patch belonging to the group comprising:
- the determination, for said at least one decoded patch, of whether a transformation must be applied to said at least one decoded patch is carried out if a syntax element decoded from a header of the data stream indicates an activation of the application of transformations to the patches encoded in the data stream, said syntax element being encoded in a header of a view, of a component of a view, or of said atlas.
- a high level syntax element is encoded in the data stream to indicate the use of transformations to be applied to the patches of the multi-view video.
- a transformation must be applied to said at least one decoded patch if a characteristic of said decoded patch meets a criterion.
- the indication of the use of a transformation to be applied to the patch is not coded explicitly in the data stream. Such an indication is inferred from a characteristic of the decoded patch. This particular embodiment of the invention allows the use of patch transformations without incurring additional coding cost to signal the use of transformations.
- the characteristic corresponds to an energy E calculated from the value of the pixels of said at least one decoded patch, the transformation to be applied to said at least one patch corresponding to a multiplication of the value of said pixels by a determined factor, when the energy E is less than a threshold.
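A minimal sketch of this decoder-side inference, assuming the energy E is computed as the mean of squared decoded pixel values; both this definition and the threshold/factor defaults are illustrative assumptions:

```python
def infer_scaling(patch_pixels, threshold=1000.0, factor=2):
    """Decide, without any signaling, whether the decoded patch pixels
    must be multiplied by a determined factor.

    The energy E is the mean of squared decoded values (an assumed
    definition); when E is below the threshold, the multiplication
    transformation is applied to the patch."""
    e = sum(p * p for p in patch_pixels) / len(patch_pixels)
    if e < threshold:
        return [p * factor for p in patch_pixels]
    return patch_pixels
```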
- an order in which said transformations must be applied is predefined. According to this particular embodiment of the invention, no signaling is necessary to indicate the order in which the transformations are applied. This order is defined at the encoder and at the decoder and remains the same for all the patches to which these transformations apply.
- the invention also relates to a device for decoding an encoded data stream representative of a multi-view video, said encoded data stream comprising encoded data representative of at least one atlas, said at least one atlas corresponding to an image comprising at least one patch, said at least one patch corresponding to a set of pixels extracted from at least one component of a view of the multi-view video, said view not being encoded in said encoded data stream, the decoding device comprising a processor and a memory configured for:
- such a device is included in a terminal.
- the invention also relates to a device for encoding a data stream representative of a multi-view video, comprising a processor and a memory configured for:
- such a device is included in a terminal.
- the coding method, and respectively the decoding method according to the invention can be implemented in various ways, in particular in wired form or in software form.
- the coding method, respectively the decoding method is implemented by a computer program.
- the invention also relates to a computer program comprising instructions for implementing the coding method or the decoding method according to any one of the particular embodiments described above, when said program is executed by a processor.
- Such a program can use any programming language. It can be downloaded from a communications network and/or recorded on a computer-readable medium.
- This program can be in the form of source code, object code, or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.
- the invention also relates to a recording medium or information medium readable by a computer, and comprising instructions of a computer program as mentioned above.
- the aforementioned recording media can be any entity or device capable of storing the program.
- the medium may comprise a storage means, such as a ROM, for example a CD ROM or a microelectronic circuit ROM, a USB key, or else a magnetic recording means, for example a hard disk.
- the recording media can correspond to a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by radio or by other means.
- the program according to the invention can in particular be downloaded from an Internet type network.
- the recording media can correspond to an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the method in question.
- FIG. 1 schematically illustrates an example of a system for capturing a multi-view of a scene.
- FIG. 2 schematically illustrates an example of a multi-view coder based on the coding of patches.
- Figure 3 illustrates an example of patch extraction and atlas creation.
- FIG. 4 illustrates steps of a coding method according to a particular embodiment of the invention.
- FIG. 5 illustrates steps of a decoding method according to a particular embodiment of the invention.
- FIG. 6 illustrates an example of data flow according to a particular embodiment of the invention.
- FIG. 7 illustrates an example of the architecture of a coding device according to a particular embodiment of the invention.
- FIG. 8 illustrates an example of the architecture of a decoding device according to a particular embodiment of the invention.
- FIG. 4 illustrates the steps of a method for encoding a multi-view video in at least one encoded data stream according to a particular embodiment of the invention.
- the multi-view video is encoded according to an encoding scheme as presented in relation to FIG. 2, in which one or more so-called base views are encoded in the data stream and in which sub-images or patches comprising texture and depth data are also encoded in the data stream. These patches come from additional views which are not fully encoded in the data stream. Such patches and one or more base views allow the decoder to synthesize other views of the scene, also called virtual views, synthesized views or intermediate views hereafter. These synthesized views are not themselves encoded in the data stream. The steps of such a coding scheme relating to a particular embodiment of the invention are described below.
- the scene is captured by a set of cameras C1, C2, ..., CN as in FIG. 1.
- Each camera generates a view, comprising at least one so-called texture component which varies over time.
- the texture component of a view is a sequence of 2D images corresponding to the images captured by the camera placed at the viewpoint of the view.
- Each view also includes a depth component, called the depth map, which is determined for each image in the view.
- the depth map can be generated in a known manner by estimating the depth using the texture, or by capturing volumetric data of the scene using devices based on Lidar ("Light Detection and Ranging") technology.
- hereafter, the term "view" will be used to indicate a sequence of texture images and depth maps representative of the scene captured from a point of view.
- a view can also correspond to a texture image and a depth map of a view at a given time.
- the encoder then proceeds to the steps described below, for example according to the encoding scheme defined in Basel Salahieh, Bart Kroon, Joël Jung, Marek Domanski, Test Model 4 for Immersive Video, ISO/IEC JTC 1/SC 29/WG 11 N19002, Brussels, BE, January 2020.
- one or more so-called base views are selected from the captured views of the multi-view video.
- the base views are selected from the set of captured views of the multi-view video by known means. For example, spatial subsampling can be done to select every other view. In another example, the content of the views can be used to determine which views are to be kept as base views. In yet another example, the camera settings (position, orientation, focal length) can be used to determine which views to select as base views. At the end of step E40, a number of views are selected to be base views.
- a “pruning” method is applied to the additional views to identify for each additional view one or more patches to be transmitted to the decoder.
- This step makes it possible to determine the patches to be transmitted by extracting from the images of additional views the areas necessary for the synthesis of intermediate views. For example, such zones correspond to occlusion zones not visible in the basic views, or else to visible zones having undergone a change in illumination, or even having a lower quality in the basic views.
- the extracted areas are of arbitrary size and shape.
- a grouping of pixels connected to their neighbors is performed to create, from the zones extracted from the same view, one or more rectangular patches that are easier to code and to arrange.
- the encoder determines one or more transformations which will be applied to the patch when it is arranged in an atlas.
- the patches can be patches comprising a texture component and / or a depth component.
- the patches are arranged in the atlases so as to minimize the cost of coding the atlases and / or reduce the number of pixels to be processed by the decoder.
- the patches can undergo transformations, among which:
- the encoder then goes through each patch and determines one or more transformations to apply to the patch.
- an “identity” transformation, in other words no transformation, can also be included in the list of transformations to be tested for the patch.
- the selection of a transformation among the possible transformations can be made by evaluating a rate-distortion criterion calculated on the reconstructed signal, using the rate necessary to encode the transformed patch and the distortion calculated between the original patch and the transformed patch, coded then reconstructed. The selection can also be made based on the evaluation of the quality of the additional view synthesized using the patch being processed.
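The rate-distortion selection can be sketched as follows, assuming the (rate, distortion) pairs have already been measured for each candidate transformation; the candidate names and figures in the usage example are placeholders:

```python
def select_transform(candidates, lmbda=0.5):
    """Choose the transformation minimising the cost J = D + lambda * R.

    `candidates` maps a transformation name to a (rate, distortion)
    pair obtained by coding the transformed patch and reconstructing
    it; `lmbda` is the Lagrangian multiplier."""
    best_name, best_cost = None, float("inf")
    for name, (rate, distortion) in candidates.items():
        cost = distortion + lmbda * rate
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name

# Placeholder (rate, distortion) measurements, not real figures.
choice = select_transform({"identity": (100, 4.0), "subsample_2": (40, 6.0)})
```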
- the factors Nv, Nh, and Ne can be tested.
- the factors Nv, Nh, and Ne are equal to 2. In other embodiments, other values are possible, such as 4, 8 or 16.
- The transformation corresponding to a modification of the values of the pixels is also called a “mapping” in English.
- Such a mapping transformation can for example consist in dividing all the values of the pixels of the patch by a given value Dv.
- Dv is 2.
- other values are possible, such as 4, 8, or 16.
- the parameter P of the transformation is then a list of triples (x1, a, b) for each linear part of the mapping.
- the mapping can also be a LUT (“LookUp Table”) which is an array associating a value y with an input x.
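The three mapping variants mentioned above (division by a value Dv, a piecewise-linear function given as (x1, a, b) triples, and a LUT) can be sketched as below; the function names are illustrative:

```python
def map_divide(pixels, dv=2):
    """Reduce the dynamics: divide every pixel value by Dv (here Dv = 2)."""
    return [p // dv for p in pixels]

def map_piecewise(pixels, segments):
    """Piecewise-linear mapping given as (x1, a, b) triples:
    for inputs x >= x1 (up to the next segment), y = a * x + b."""
    def f(x):
        a, b = 1, 0                        # identity before the first segment
        for x1, seg_a, seg_b in segments:  # segments sorted by x1
            if x >= x1:
                a, b = seg_a, seg_b
        return a * x + b
    return [f(p) for p in pixels]

def map_lut(pixels, lut):
    """LUT mapping: the array associates an output value y with an input x."""
    return [lut[p] for p in pixels]
```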
- the determination of a transformation associated with a patch can also take into account the number of atlases available to encode the multi-view video and simulate the arrangement of the patches in the atlases, in order to optimize the rate/distortion cost of coding the atlases or the quality of the synthesis of intermediate views in a global way.
- at the end of step E42, a list of transformed patches is available. Each patch is associated with the transformation(s) determined for this patch and the associated parameters.
- the patches are arranged in one or more atlases.
- the number of atlases depends for example on parameters defined at the input of the encoder, in particular the size of an atlas (width and height) and the maximum number M of pixels for the texture and depth of all the atlases per image or given instant. This maximum number M corresponds to the number of pixels to be processed by the decoder for one instant of the multi-view video.
- each base view is encoded in an atlas comprising a patch comprising a texture component and a depth component of the base view at a given instant. According to this particular mode, there are then as many atlases as there are base views and as many atlases as necessary to transport all the patches extracted from the additional views.
- an atlas may include a base view and patches, or a base view may be clipped and represented on multiple atlases if the view size is larger than the size of an atlas.
- a patch of an atlas can then correspond to an entire image of a base view or to a part of a base view or to an area extracted from an additional view.
- the texture pixels of the patches are arranged in the texture component of an atlas and the depth pixels of the patches are arranged in the depth component of an atlas.
- An atlas may only have a single texture or depth component, or it may include a texture component and a depth component.
- an atlas can also include other types of component including information useful for the synthesis of intermediate views.
- other types of components may include information such as a reflectance index, to indicate how transparent the corresponding area is, or even confidence information about the depth value at that location.
- the coder goes through all the patches in the patch list. For each patch, the encoder determines in which atlas this patch will be encoded. This list includes transformed patches and untransformed patches. Untransformed patches are either patches comprising areas extracted from additional views that have not undergone any transformation or only the identity transformation, or patches comprising base view images. It is considered here that when a patch must undergo a transformation, it has already been transformed.
- An atlas is a set of patches spatially rearranged in an image. This image is intended to be encoded. This arrangement is intended to occupy the space in the atlas images to be encoded as well as possible. Indeed, one of the objectives of video coding is to minimize the number of pixels to be decoded before being able to synthesize a view. For this, the patches are arranged in the atlases so as to maximize the number of patches in an atlas. Such a method is described in Basel Salahieh, Bart Kroon, Joël Jung, Marek Domanski, Test Model 4 for Immersive Video, ISO/IEC JTC 1/SC 29/WG 11 N19002, Brussels, BE, January 2020.
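The packing objective can be illustrated with a naive shelf-packing sketch; the actual TMIV packer is far more elaborate, and this only shows the goal of fitting as many patches as possible into an atlas image:

```python
def pack_patches(patches, atlas_w, atlas_h):
    """Naive shelf packing: place patches left to right on rows ("shelves").

    `patches` is a list of (width, height) pairs; returns a list of
    (x, y, width, height) placements plus the patches that did not fit."""
    placements, leftover = [], []
    x = y = shelf_h = 0
    for w, h in sorted(patches, key=lambda p: -p[1]):   # tallest first
        if x + w > atlas_w:                             # start a new shelf
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + h > atlas_h or w > atlas_w:
            leftover.append((w, h))                     # does not fit
            continue
        placements.append((x, y, w, h))
        x += w
        shelf_h = max(shelf_h, h)
    return placements, leftover
```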
- at the end of step E43, a list of patches for each atlas is generated. It should be noted that this arrangement also determines the number of atlases to be coded for a given moment.
- each atlas which includes a texture component and/or a depth component in the form of a 2D image is encoded using a conventional video encoder such as HEVC, VVC, MV-HEVC, 3D-HEVC, etc.
- the base views are considered here as patches.
- the coding of atlases therefore involves the coding of the basic views.
- the information associated with each atlas is encoded in the data stream.
- This information is conventionally encoded by an entropy encoder.
- the patch list includes the following, for each patch in the list:
- in step E45, for at least some patches of the atlas, information relating to the transformations to be applied to the patch during decoding is encoded in the data stream.
- the transformations to be applied to the patch during decoding correspond to the inverse transformations applied to the patch when arranging the patch in the atlas and determined above.
- for each patch, information indicating the transformation to be applied is transmitted.
- it is the transformation to be applied at decoding which is indicated, and not the transformation applied at encoding (which corresponds to the inverse of the transformation applied at decoding).
- the information transmitted on the transformation to be applied may correspond to information indicating the transformation applied to the coding, the decoder then deducing the transformation to be applied from this information.
- the information indicating the transformation to be applied can be an index indicating the transformation to be applied in a list of possible transformations.
- a list can further include an identity transformation.
- an index indicating the identity transformation can thus be encoded.
- a binary indicator can be coded to indicate whether the patch is transformed or not, and if the binary indicator indicates that the patch has been transformed, an index indicating the transformation to be applied in the list of possible transformations is coded.
- only the binary indicator can be coded to indicate whether the patch is transformed or not.
- the list of possible transformations can be known to the decoder and therefore does not need to be transmitted in the data stream. In other embodiments, the list of possible transformations can be coded in the data stream, for example in a header of a view or of the multi-view video.
- the parameters associated with the transformations to be applied can also be defined by default and known to the decoder.
- the parameters associated with a transformation applied to the patch are encoded in the data stream for each patch.
- the parameter associated with the transformation can correspond to a value of an interpolation to be applied for all the dimensions or a value of an interpolation to apply for each of the dimensions.
- the parameters of this transformation correspond to the characteristics of the mapping to be applied: parameters of a linear or piecewise-linear function, look-up table (LUT), etc.
- the possible LUT (s) can be known to the decoder.
- the parameter corresponds to the angle of rotation selected from among the possible rotations.
- the parameters associated with a transformation can be encoded as they are or else by prediction with respect to a prediction value.
- a prediction value can be defined and encoded in the data stream in a header of a view, of a component, of an image of a view, or of an atlas including the current patch.
- the P value of a parameter will be predicted by a Ppred value encoded at the level of the atlas.
- the difference between Ppred and P is then coded for each patch in the atlas.
- the prediction value Ppred can correspond to the value of the parameter used for a patch previously processed.
- it can be the previous patch in the patch processing order, or the previous patch belonging to the same view as the current patch.
- the prediction value of the parameter can also be obtained by a mechanism similar to the “Merge” mode of an HEVC type encoder.
- For each patch a list of candidate patches is defined and an index pointing to one of these candidate patches is coded for the patch.
- it is not necessary to transmit an index, a criterion being usable to identify the patch among the list of candidate patches.
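The predictive coding of a transformation parameter described above (a value P predicted by a value Ppred carried, for example, in the atlas header) amounts to transmitting only a residual:

```python
def encode_param(p, ppred):
    """Predictive coding: only the residual P - Ppred is coded per patch."""
    return p - ppred

def decode_param(residual, ppred):
    """Decoder side: the parameter is rebuilt as Ppred + residual."""
    return ppred + residual

ppred = 4                          # prediction value from the atlas header
residual = encode_param(6, ppred)  # residual coded for the current patch
```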
- the information indicating whether the patch must undergo a transformation can be decomposed into a part which indicates the use of the transformation (for example a binary flag) and a part which indicates the parameters of the transformation, if usage is enabled.
- This signaling mechanism can be used independently for each possible transformation for the patch.
- a binary indicator can be coded at the level of a header of an atlas, or of a view or of a component, to activate the use of a determined transformation for the patches of this atlas, of this view or of this component.
- the application of the transformation determined for a patch then depends on the value of this binary indicator.
- two binary indicators IA and IB, associated respectively with the activation of a transformation A and with the activation of a transformation B, are encoded in a header of an atlas.
- the value of the binary indicator IA indicates that the use of transformation A is possible, while the value of the binary indicator IB indicates that the use of transformation B is not possible.
- for each patch, a binary indicator will indicate whether transformation A is applied to the patch, and possibly the associated parameters. It is not necessary in this example to code for each patch a binary indicator indicating whether transformation B is applied.
- an atlas can comprise a patch for which a determined transformation can be applied as a function of the indicator coded for this patch and a patch for which the same transformation cannot be applied.
- no indicator for this transformation is encoded in the patch information.
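The gating of per-patch flags by these header-level activation indicators can be sketched as follows; the names and the flat flag list are illustrative assumptions:

```python
def read_patch_flags(header_active, patch_bits):
    """Parse per-patch transformation flags, gated by the header.

    `header_active` maps a transformation name to its activation
    indicator (IA, IB, ...) decoded from the atlas header;
    `patch_bits` holds the per-patch flags actually coded."""
    bits = iter(patch_bits)
    applied = {}
    for name, active in header_active.items():
        # When the header disables the transformation, no per-patch
        # flag is coded at all and it is never applied.
        applied[name] = bool(next(bits)) if active else False
    return applied

# Transformation A activated in the header, B deactivated: only one
# per-patch flag is present in the stream.
flags = read_patch_flags({"A": True, "B": False}, [1])
```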
- no information indicating a transformation is coded at the patch level. This is deduced at the decoder from a characteristic of the patch. The transformation is then applied to the patch as soon as it meets a certain criterion. This particular mode will be described in more detail below in relation to the decoding method.
- FIG. 5 illustrates the steps of a method for decoding an encoded data stream representative of a multi-view video according to a particular embodiment of the invention.
- the encoded data stream was generated by the encoding method described in relation to FIG. 4.
- the atlas information is decoded. This information is conventionally decoded by a suitable entropy decoder.
- this information can be an index indicating a transformation among a list of possible transformations, or else for each possible transformation, an indicator indicating whether the transformation must be applied to the patch.
- the information may correspond to a binary indicator indicating the use of the transformation or a value of an interpolation to be applied for all dimensions.
- the information may correspond to a binary indicator indicating the use of the transformation or for each of the dimensions a value of an interpolation to apply.
- the information may include information indicating the use of the mapping, and possibly information representative of the characteristics of the mapping to be applied (parameters of a linear or piecewise-linear function, look-up table, etc.).
- the parameter will indicate which rotation has been selected among the possible rotations.
- the information transmitted making it possible to identify a transformation to be applied to the patch is decoded in a manner suited to the coding applied. Thus, it can be decoded as is (direct decoding) or in a predictive manner, in a manner similar to the encoder.
- the information making it possible to identify a transformation to be applied to the patch can comprise a part which indicates the use of the transformation (binary indicator) and a part which indicates the parameters of the transformation, if its use is enabled.
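As a rough illustration, this two-part signaling (a binary use indicator followed, only when set, by parameters that may be coded predictively as a residual) could be parsed as sketched below. The `Bitstream` helper and all function names are hypothetical stand-ins for the entropy decoder, not elements of the specification:

```python
class Bitstream:
    """Toy list-backed reader standing in for the entropy decoder."""
    def __init__(self, symbols):
        self.symbols = list(symbols)

    def read_flag(self):
        return bool(self.symbols.pop(0))

    def read_value(self):
        return self.symbols.pop(0)


def decode_patch_transform(bs, predicted_param=None):
    """Decode the transformation info for one patch: a binary use
    indicator, then (only if set) the transformation parameter,
    possibly coded as a residual with respect to a prediction."""
    if not bs.read_flag():
        return None  # transformation not used for this patch
    param = bs.read_value()
    if predicted_param is not None:
        param += predicted_param  # predictive decoding: add the prediction
    return param
```

When the indicator is 0, no parameter is read at all, which mirrors the conditional presence of the parameter part described above.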
- the decoding, for a given patch, of the information identifying a transformation to be applied to the patch may depend on a binary activation indicator encoded in the header of the atlas, view or component to which the patch belongs.
- the information identifying a transformation to be applied to the patch is not encoded with the information of the patch, but derived from the characteristics of the decoded patch.
- Dv, a determined factor for modifying the patch values.
- the patch is interpolated by a determined factor, for example a factor of 2 in the vertical dimension.
- the patch dimensions considered here are the patch dimensions decoded from information from the atlas in which the patch was encoded. These are therefore the dimensions of the patch before transformation at the decoder (and therefore after transformation at the encoder).
- This variant makes it possible to mix, in the same atlas, "long" patches for which sub-sampling is not worthwhile and "long" patches which are sub-sampled without signaling, making them satisfy the criterion that allows the decoder to interpolate them.
- Other threshold values can be used, for example more restrictive values such as 0.9 < H/W < 1.1.
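A minimal sketch of this implicit mode, assuming the criterion is an aspect-ratio test on the decoded patch dimensions and the transformation is a vertical interpolation by a factor of 2 (nearest-neighbour); the thresholds and names below are illustrative only:

```python
import numpy as np

def maybe_interpolate(patch, low=0.9, high=1.1, factor=2):
    """Apply the vertical interpolation only when the decoded patch
    satisfies the aspect-ratio criterion; no per-patch indicator is
    read from the stream in this mode."""
    h, w = patch.shape  # dimensions decoded from the atlas information
    if low < h / w < high:
        return np.repeat(patch, factor, axis=0)  # nearest-neighbour upsample
    return patch
```

A patch that was sub-sampled at the encoder meets the criterion and is interpolated; a patch outside the threshold range passes through unchanged.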
- each atlas, which includes a 2D texture component and/or a 2D depth component, is decoded using a conventional video decoder such as AVC, HEVC, VVC, MV-HEVC, 3D-HEVC, etc.
- the decoded patches are reconstructed by applying the transformation identified during step E50 to the texture component and / or to the depth component of each patch in its atlas depending on whether the transformation applies to texture, depth or both components.
- this step consists of individually modifying each patch by applying the transformation identified for this patch. This can be done in different ways, for example: by modifying the pixels of the patch in the atlas that contains it, by copying the modified patch into a buffer memory area, or by copying the transformed patch into the view associated with it.
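The third variant (copying the transformed patch into its associated view) might be sketched as follows; the positions, sizes and the `transform` callable are hypothetical inputs standing in for the decoded atlas information:

```python
import numpy as np

def reconstruct_patch(atlas, patch_pos, patch_size, transform, view, view_pos):
    """Extract one patch from its atlas, apply the identified
    transformation, and copy the result into the associated view."""
    y, x = patch_pos
    h, w = patch_size            # dimensions decoded from the atlas info
    patch = transform(atlas[y:y + h, x:x + w])
    vy, vx = view_pos
    ph, pw = patch.shape         # dimensions after transformation
    view[vy:vy + ph, vx:vx + pw] = patch
    return view
```

Note that the region written in the view uses the post-transformation dimensions, which may differ from the dimensions stored in the atlas (e.g. after interpolation).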
- each patch to be reconstructed can have one of the following transformations applied:
- the transmitted mapping parameters can be either the encoder mapping parameters (and then the decoder will have to apply the inverse mapping function), or the decoder mapping parameters (and then the encoder will have to apply the inverse mapping function).
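As an illustration of this convention, assuming a simple linear mapping of pixel values (one of the mapping forms mentioned above), whichever side's parameters are transmitted, the other side applies the inverse function:

```python
def make_linear_mapping(a, b):
    """Build a linear pixel-value mapping v -> a*v + b and its inverse.
    If the encoder's mapping parameters (a, b) are transmitted, the
    decoder applies the inverse; with the opposite convention, the
    encoder applies the inverse."""
    forward = lambda v: a * v + b
    inverse = lambda v: (v - b) / a
    return forward, inverse
```

Piecewise linear functions or look-up tables would follow the same pattern, with a per-segment or per-entry inverse.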
- it is possible to apply several transformations to a patch at the encoder. These transformations are signaled in the stream in the information encoded for the patch, or else deduced from the characteristics of the decoded patch. For example, it is possible to apply at the encoder a downsampling by a factor of 2 in each dimension of the patch, followed by a mapping of the pixel values of the patch, then a rotation.
- the order of the transformations to be applied is predefined and known to the encoder and to the decoder. For example, at the encoder the order is as follows: rotation, then downsampling, then mapping.
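Such a predefined order could be sketched as below, the decoder undoing each transformation in the reverse order. The 90-degree rotation, factor-2 sampling and value-doubling mapping are arbitrary illustrative choices, not choices imposed by the method:

```python
import numpy as np

def encode_patch(patch):
    """Encoder side, fixed order: rotation, then downsampling, then mapping."""
    p = np.rot90(patch)       # rotation (90 degrees, for illustration)
    p = p[::2, ::2]           # downsampling by a factor of 2 in each dimension
    return p * 2              # example pixel-value mapping

def decode_patch(patch):
    """Decoder side: inverse transformations, applied in reverse order."""
    p = patch / 2                                       # inverse mapping
    p = np.repeat(np.repeat(p, 2, axis=0), 2, axis=1)   # interpolation by 2
    return np.rot90(p, -1)                              # inverse rotation
```

Because downsampling discards samples, the round trip is only exact for content the interpolation can recover (a constant patch in this toy example).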
- at the end of step E52, a set of reconstructed patches is available.
- At least one intermediate view is synthesized using at least one basic view and at least one patch reconstructed previously.
- the chosen virtual view synthesis algorithm is applied to the decoded and reconstructed data from the multi-view video that has been transmitted to the decoder. As explained previously, this algorithm relies on the pixels of the components of the base views and patches to produce a view from a point of view between the cameras.
- the synthesis algorithm uses at least two textures and two depth maps, taken from base views and / or additional views to generate an intermediate view.
- Synthesizers are known and belong, for example, to the DIBR (“Depth Image Based Rendering”) category.
- algorithms frequently used by standards organizations are:
- RVS, for "Reference View Synthesizer", initiated by the University of Brussels and improved by Philips, begins by projecting the reference views using a computed disparity.
- the references are partitioned into triangles and warped. The warped views from each reference are then blended, and a basic "inpainting"-type filling is applied to fill the disocclusions;
- VVS, for "Versatile View Synthesizer", developed by Orange, sorts the references, applies a warping of certain information from the depth maps, then a conditional fusion of these depths. A backward warping of the textures is then applied, followed by a fusion of the different textures and different depths. Finally, a spatio-temporal "inpainting"-type filling is applied, before spatial filtering of the intermediate image.
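The principle shared by such DIBR synthesizers, reprojecting reference pixels according to their depth, can be caricatured in one dimension. The pinhole disparity model (baseline * focal / depth) and all parameters below are illustrative only, not the RVS or VVS algorithms themselves:

```python
import numpy as np

def forward_warp(texture, depth, baseline, focal):
    """Toy 1D DIBR step: shift each pixel horizontally by its disparity
    (baseline * focal / depth) to synthesise a nearby viewpoint.
    Unfilled positions (disocclusions) stay at 0, to be inpainted."""
    h, w = texture.shape
    out = np.zeros_like(texture)
    for y in range(h):
        for x in range(w):
            d = int(round(baseline * focal / depth[y, x]))
            if 0 <= x + d < w:
                out[y, x + d] = texture[y, x]
    return out
```

Real synthesizers add the steps listed above (triangle warping, blending or conditional fusion of several references, then inpainting of the remaining holes).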
- FIG. 6 illustrates an example of a data stream according to a particular embodiment of the invention and in particular the atlas information encoded in the stream and making it possible to identify one or more transformations to be applied to the patches of the atlas.
- the data stream was generated according to the encoding method according to any one of the particular embodiments described in relation to FIG. 4, and it is suitable for being decoded by the decoding method according to any one of the particular embodiments described in relation to FIG. 5.
- such a stream comprises in particular:
- the patch information and in particular a Trf indicator indicating whether or not the transformation is used for the patch,
- a Par parameter of the transformation, for example in the form of a residue obtained with respect to the prediction value Ppred, when the latter is encoded.
- FIG. 7 shows the simplified structure of a COD coding device suitable for implementing the coding method according to any one of the particular embodiments of the invention.
- the steps of the coding method are implemented by computer program instructions.
- the coding device COD has the conventional architecture of a computer and comprises in particular a memory MEM, a processing unit UT, equipped for example with a processor PROC, and controlled by the computer program PG stored in MEM memory.
- the PG computer program comprises instructions for implementing the steps of the encoding method as described above, when the program is executed by the processor PROC.
- the code instructions of the computer program PG are for example loaded into a RAM memory (not shown) before being executed by the processor PROC.
- the processor PROC of the processing unit UT notably implements the steps of the coding method described above, according to the instructions of the computer program PG.
- FIG. 8 shows the simplified structure of a decoding device DEC suitable for implementing the decoding method according to any one of the particular embodiments of the invention.
- the decoding device DEC has the conventional architecture of a computer and notably comprises a MEMO memory, a UTO processing unit, equipped for example with a PROCO processor, and controlled by the PGO computer program stored in MEMO memory.
- the computer program PGO comprises instructions for implementing the steps of the decoding method as described above, when the program is executed by the processor PROCO.
- the code instructions of the computer program PGO are for example loaded into a RAM memory (not shown) before being executed by the processor PROCO.
- the processor PROCO of the processing unit UTO notably implements the steps of the decoding method described above, according to the instructions of the computer program PGO.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
Description
Claims
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/919,642 US20230164352A1 (en) | 2020-04-22 | 2021-03-29 | Methods and devices for coding and decoding a multi-view video sequence |
JP2022564436A JP2023522456A (ja) | 2020-04-22 | 2021-03-29 | マルチビュービデオシーケンスをコード化および復号するための方法およびデバイス |
KR1020227040112A KR20230002802A (ko) | 2020-04-22 | 2021-03-29 | 멀티-뷰 비디오 시퀀스를 코딩 및 디코딩하기 위한 방법들 및 디바이스들 |
CN202180030246.1A CN115428456A (zh) | 2020-04-22 | 2021-03-29 | 用于编码和解码多视图视频序列的方法和设备 |
EP21721150.7A EP4140136A1 (fr) | 2020-04-22 | 2021-03-29 | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues |
BR112022020642A BR112022020642A2 (pt) | 2020-04-22 | 2021-03-29 | Processos e dispositivos de codificação e decodificação de uma sequência de video de múltiplas vistas |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FRFR2003994 | 2020-04-22 | ||
FR2003994A FR3109685A1 (fr) | 2020-04-22 | 2020-04-22 | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021214395A1 true WO2021214395A1 (fr) | 2021-10-28 |
Family
ID=71452477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2021/050551 WO2021214395A1 (fr) | 2020-04-22 | 2021-03-29 | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues |
Country Status (8)
Country | Link |
---|---|
US (1) | US20230164352A1 (fr) |
EP (1) | EP4140136A1 (fr) |
JP (1) | JP2023522456A (fr) |
KR (1) | KR20230002802A (fr) |
CN (1) | CN115428456A (fr) |
BR (1) | BR112022020642A2 (fr) |
FR (1) | FR3109685A1 (fr) |
WO (1) | WO2021214395A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11457199B2 (en) * | 2020-06-22 | 2022-09-27 | Electronics And Telecommunications Research Institute | Method for processing immersive video and method for producing immersive video |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090232212A1 (en) * | 2005-04-12 | 2009-09-17 | Peter Amon | Adaptive Interpolation in Image or Video Encoding |
EP2755388A1 (fr) * | 2011-11-07 | 2014-07-16 | Nippon Telegraph And Telephone Corporation | Procédé, dispositif et programme pour coder et décoder une image |
US20160234492A1 (en) * | 2015-02-11 | 2016-08-11 | Qualcomm Incorporated | Coding tree unit (ctu) level adaptive loop filter (alf) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20200111643A (ko) * | 2019-03-19 | 2020-09-29 | 한국전자통신연구원 | 이머시브 영상 처리 방법 및 이머시브 영상 합성 방법 |
BR112021020654A2 (pt) * | 2019-05-14 | 2022-01-25 | Intel Corp | Dispositivo, pelo menos um meio legível por máquina, sistema e método para codificação de vídeo imersivo |
WO2021002633A2 (fr) * | 2019-07-04 | 2021-01-07 | 엘지전자 주식회사 | Dispositif de transmission de données de nuage de points, procédé de transmission de données de nuage de points, dispositif de réception de données de nuage de points et procédé de réception de données de nuage de points |
CN115004230A (zh) * | 2020-01-14 | 2022-09-02 | 华为技术有限公司 | 用于v-pcc的缩放参数 |
-
2020
- 2020-04-22 FR FR2003994A patent/FR3109685A1/fr not_active Withdrawn
-
2021
- 2021-03-29 KR KR1020227040112A patent/KR20230002802A/ko unknown
- 2021-03-29 CN CN202180030246.1A patent/CN115428456A/zh active Pending
- 2021-03-29 JP JP2022564436A patent/JP2023522456A/ja active Pending
- 2021-03-29 WO PCT/FR2021/050551 patent/WO2021214395A1/fr unknown
- 2021-03-29 EP EP21721150.7A patent/EP4140136A1/fr active Pending
- 2021-03-29 US US17/919,642 patent/US20230164352A1/en active Pending
- 2021-03-29 BR BR112022020642A patent/BR112022020642A2/pt unknown
Non-Patent Citations (6)
Title |
---|
ARICI T ET AL: "A Histogram Modification Framework and Its Application for Image Contrast Enhancement", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 18, no. 9, 1 September 2009 (2009-09-01), pages 1921 - 1935, XP011268498, ISSN: 1057-7149, DOI: 10.1109/TIP.2009.2021548 * |
BASEL SALAHIEH; BART KROON; JOEL JUNG; MAREK DOMANSKI: "Test Model 4 for Immersive Video", ISO/IEC JTC 1/SC 29/WG 11 N19002, BRUSSELS, BE, January 2020 (2020-01-01) |
H. KARIM ET AL: "Scalable multiple description video coding for stereoscopic 3D", IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, vol. 54, no. 2, 1 May 2008 (2008-05-01), pages 745 - 752, XP055065376, ISSN: 0098-3063, DOI: 10.1109/TCE.2008.4560156 * |
LOU J ET AL: "Motorola Mobility's adaptive interpolation filter", 96. MPEG MEETING; 21-3-2011 - 25-3-2011; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), no. m19888, 16 March 2011 (2011-03-16), XP030048455 * |
MURINTO ET AL: "Image enhancement using piecewise linear contrast stretch methods based on unsharp masking algorithms for leather image processing", 2017 3RD INTERNATIONAL CONFERENCE ON SCIENCE IN INFORMATION TECHNOLOGY (ICSITECH), IEEE, 25 October 2017 (2017-10-25), pages 669 - 673, XP033296949, ISBN: 978-1-5090-5864-8, [retrieved on 20180112], DOI: 10.1109/ICSITECH.2017.8257197 * |
VIDEO: "Working Draft 3 of Metadata for Immersive Video", vol. ties/16, 13 January 2020 (2020-01-13), pages 1 - 49, XP044281540, Retrieved from the Internet <URL:https://www.itu.int/ifa/t/2017/sg16/docs/200622/td/ties/gen/T17-SG16-200622-TD-GEN-0444!A6!ZIP-E.zip MIV_WD3_clean.docx> [retrieved on 20200113] * |
Also Published As
Publication number | Publication date |
---|---|
JP2023522456A (ja) | 2023-05-30 |
BR112022020642A2 (pt) | 2022-11-29 |
FR3109685A1 (fr) | 2021-10-29 |
EP4140136A1 (fr) | 2023-03-01 |
KR20230002802A (ko) | 2023-01-05 |
US20230164352A1 (en) | 2023-05-25 |
CN115428456A (zh) | 2022-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3061246B1 (fr) | Procédé de codage et de décodage d'images, dispositif de codage et de décodage d'images et programmes d'ordinateur correspondants | |
WO2019211541A2 (fr) | Procédé et dispositif de décodage d'une vidéo multi-vue, et procédé et dispositif de traitement d'images | |
FR3012004A1 (fr) | Procede de codage et de decodage d'images, dispositif de codage et de decodage d'images et programmes d'ordinateur correspondants | |
WO2010043809A1 (fr) | Prediction d'une image par compensation en mouvement en avant | |
FR2920632A1 (fr) | Procede et dispositif de decodage de sequences video avec masquage d'erreurs | |
FR3084552A1 (fr) | Procede de formation d'une sequence d'images de sortie a partir d'une sequence d'images d'entree, procede de reconstruction d'une sequence d'images d'entree a partir d'une sequence d'images de sortie, dispositifs, equipement serveur, equipement client et programmes d'ordinateurs associes | |
WO2021214395A1 (fr) | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues | |
WO2019008254A1 (fr) | Procédé de codage et décodage d'images, dispositif de codage et décodage et programmes d'ordinateur correspondants | |
WO2020188172A1 (fr) | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues | |
EP1714498B1 (fr) | Procede de recherche de la directon de prediction en codage video intra-image | |
EP3158749B1 (fr) | Procédé de codage et de décodage d'images, dispositif de codage et de décodage d'images et programmes d'ordinateur correspondants | |
WO2018073523A1 (fr) | Procédé de codage et de décodage de paramètres d'image, dispositif de codage et de décodage de paramètres d'image et programmes d'ordinateur correspondants | |
WO2020070409A1 (fr) | Codage et décodage d'une vidéo omnidirectionnelle | |
WO2019115899A1 (fr) | Procédés et dispositifs de codage et de décodage d'une séquence vidéo multi-vues représentative d'une vidéo omnidirectionnelle | |
WO2019008253A1 (fr) | Procédé de codage et décodage d'images, dispositif de codage et décodage et programmes d'ordinateur correspondants | |
WO2021160955A1 (fr) | Procédé et dispositif de traitement de données de vidéo multi-vues | |
WO2022069809A1 (fr) | Codage et decodage d'une video multi-vues | |
FR3106014A1 (fr) | Synthèse itérative de vues à partir de données d’une vidéo multi-vues | |
WO2022269163A1 (fr) | Procédé de construction d'une image de profondeur d'une vidéo multi-vues, procédé de décodage d'un flux de données représentatif d'une vidéo multi-vues, procédé de codage, dispositifs, système, équipement terminal, signal et programmes d'ordinateur correspondants | |
WO2020260034A1 (fr) | Procede et dispositif de traitement de donnees de video multi-vues | |
FR3064145A1 (fr) | Procede de codage et decodage d'images, dispositif de codage et decodage et programmes d'ordinateur correspondants |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21721150 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022564436 Country of ref document: JP Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112022020642 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 20227040112 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2021721150 Country of ref document: EP Effective date: 20221122 |
|
ENP | Entry into the national phase |
Ref document number: 112022020642 Country of ref document: BR Kind code of ref document: A2 Effective date: 20221011 |