EP4140136A1 - Methods and devices for encoding and decoding a multi-view video sequence - Google Patents

Methods and devices for encoding and decoding a multi-view video sequence

Info

Publication number
EP4140136A1
Authority
EP
European Patent Office
Prior art keywords
patch
transformation
atlas
data stream
decoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21721150.7A
Other languages
English (en)
French (fr)
Inventor
Félix Henry
Joël JUNG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
Orange SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orange SA filed Critical Orange SA
Publication of EP4140136A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: using predictive coding
    • H04N19/597: using predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/10: using adaptive coding
    • H04N19/169: using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the coding unit being an image region, e.g. an object
    • H04N19/176: the image region being a block, e.g. a macroblock
    • H04N19/59: using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/85: using pre-processing or post-processing specially adapted for video compression

Definitions

  • the invention relates to so-called immersive videos, representative of a scene captured by one or more cameras. More particularly, the invention relates to the encoding and decoding of such videos.
  • the scene is conventionally captured by a set of cameras, as illustrated in FIG. 1.
  • These cameras can be of 2D type (cameras C1, C2, C3, C4 in FIG. 1), or of 360 type, i.e. capturing the entire scene at 360 degrees around the camera (camera C5 in FIG. 1).
  • one or more views are calculated from the decoded views.
  • FIG. 2 illustrates an example of a coding-decoding system using such a selection of data from the multi-view video making it possible to synthesize intermediate views on the decoder side.
  • one or more base views are encoded by a 2D encoder, for example an HEVC type encoder, or by a multi-view encoder.
  • the other views (Ts, Ds) undergo a processing which makes it possible to extract certain zones from each of these views.
  • the extracted areas, also called patches hereafter, are gathered in images called atlases.
  • the atlases are coded by a conventional 2D video coder, for example an HEVC coder.
  • the atlases are decoded, providing the decoded patches to the view synthesis algorithm to produce intermediate views from the base views and decoded patches.
  • the patches make it possible to transmit the same area seen from another point of view.
  • the patches make it possible to transmit the occlusions, that is to say the parts of the scene which are not visible from a given view.
  • the MIV system (MPEG-I part 12) in its reference implementation (TMIV for “Test Model for Immersive Video”) generates atlases formed from a set of patches.
  • Figure 3 illustrates an example of extracting patches (Patch2, Patch5, Patch8, Patch3, Patch7) from views (V0, V1, V2) and the creation of associated atlases, for example two atlases A0 and A1.
  • These atlases A0 and A1 each comprise a texture image T0, T1 and a corresponding depth map D0, D1.
  • the atlas A0 comprises a texture T0 and a depth D0.
  • the atlas A1 comprises a texture T1 and a depth D1.
  • the patches are gathered in images and coded by a conventional 2D video coder.
  • it is necessary to have an optimal arrangement of the patches in the atlases.
  • it is necessary not only to reduce the cost of compressing such patches, but also the number of pixels to be processed by the decoder.
  • the devices for rendering such videos have more limited resources than the devices for encoding such videos.
  • the invention improves the state of the art. For this purpose, it relates to a method of decoding a coded data stream representative of a multi-view video, said coded data stream comprising coded data representative of at least one atlas, said at least one atlas corresponding to an image comprising at least one patch, said at least one patch corresponding to a set of pixels extracted from at least one component of a view of the multi-view video, said view not being encoded in said encoded data stream.
  • the decoding process includes:
  • the invention also relates to a method of encoding a data stream representative of a multi-view video, the encoding method comprises:
  • the invention also makes it possible to apply transformations to the patches of an atlas which are different from one patch to another, or which possibly have different parameters.
  • the arrangement of patches in an atlas is thus optimized for compression.
  • the transformations used for the patches of the atlas make it possible, on the one hand, to optimize the occupancy rate of the pixels of the atlas, by using transformations such as rotation or subsampling at encoding so as to arrange the patches within the atlas image.
  • On the other hand, the transformations make it possible to optimize the cost of compressing the patches, in particular by modifying the pixel values of these patches, for example by reducing the dynamics of the pixels or by sub-sampling, which leads to coding fewer pixels, or even by using an optimal arrangement of the patches in the atlas image making it possible to have as few pixels as possible to code.
  • the reduction in the occupation rate of the pixels of the atlas also makes it possible to reduce the rate of pixels to be processed by the decoder, and therefore to reduce the complexity of the decoding.
  • it is determined whether a transformation must be applied to said at least one decoded patch, from at least one syntax element decoded from said coded data stream for said at least one patch.
  • a syntax element is explicitly coded in the data stream to indicate whether a transformation must be applied to the decoded patch and which transformation to apply.
  • said at least one decoded syntax element comprises at least one indicator indicating whether a transformation must be applied to said at least one patch and, if the indicator indicates that a transformation must be applied to said at least one patch, said at least one syntax element optionally comprises at least one parameter of said transformation.
  • the transformation to be applied to the patch is coded in the form of an indicator indicating whether a transformation must be applied to the patch or not, and in the positive case, optionally the parameter(s) of the transformation to apply.
  • For example, a binary flag can indicate whether a transformation should be applied to the patch, and if so, a code indicating which transformation is used, and possibly one or more parameters of the transformation, such as a scale factor, a pixel-dynamics modification function, an angle of rotation, etc.
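As an illustration only, the following Python sketch shows how such per-patch signaling could be parsed; the field names, bit widths and the `reader` interface are assumptions for the example, not the actual syntax of the stream.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PatchTransformInfo:
    use_transform: bool                          # binary flag: transformation applied or not
    transform_id: Optional[int] = None           # code indicating which transformation is used
    params: list = field(default_factory=list)   # e.g. scale factor, mapping, rotation angle

def parse_patch_transform(reader) -> PatchTransformInfo:
    """Decode the optional per-patch transformation signaling (hypothetical layout)."""
    if not reader.read_flag():                   # no transformation for this patch
        return PatchTransformInfo(use_transform=False)
    transform_id = reader.read_uint(2)           # index into the list of possible transformations
    n_params = reader.read_uint(3)               # number of optional parameters
    params = [reader.read_uint(8) for _ in range(n_params)]
    return PatchTransformInfo(True, transform_id, params)
```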
  • the transformation parameters can be defined by default at the encoder.
  • said at least one parameter of said transformation to be applied to said patch has a value which is encoded by prediction with respect to a prediction value.
  • the prediction value is encoded in a header of a view, or of a component of the atlas or of the atlas.
  • the prediction value corresponds to the value of a parameter of a transformation applied to a patch belonging to the group comprising:
  • the determination, for said at least one decoded patch, of whether a transformation must be applied to said at least one decoded patch is carried out if a syntax element decoded from a header of the data stream indicates an activation of the application of transformations to the patches encoded in the data stream, said syntax element being encoded in a header of a view or of a component of a view or of said atlas.
  • a high level syntax element is encoded in the data stream to indicate the use of transformations to be applied to the patches of the multi-view video.
  • a transformation must be applied to said at least one decoded patch if a characteristic of said decoded patch meets a criterion.
  • the indication of the use of a transformation to be applied to the patch is not coded explicitly in the data stream. Such an indication is inferred from a characteristic of the decoded patch. This particular embodiment of the invention allows the use of patch transformations without incurring additional coding cost to signal the use of transformations.
  • the characteristic corresponds to an energy E calculated from the value of the pixels of said at least one decoded patch, the transformation to be applied to said at least one patch corresponding to a multiplication of the value of said pixels by a determined factor, when the energy E is less than a threshold.
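A minimal sketch of this implicit rule, assuming the energy E is computed as the sum of squared pixel values and assuming illustrative values for the threshold and the factor:

```python
import numpy as np

def maybe_rescale(patch: np.ndarray, factor: int = 2, threshold: float = 1e6) -> np.ndarray:
    """Apply the multiplication only when the energy of the decoded patch is low."""
    energy = float(np.sum(patch.astype(np.int64) ** 2))  # energy E computed from the pixels
    if energy < threshold:
        return patch * factor   # inverse of the dynamics reduction applied at the encoder
    return patch
```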
  • an order in which said transformations must be applied is predefined. According to this particular embodiment of the invention, no signaling is necessary to indicate the order in which the transformations are applied. This order is defined at the encoder and at the decoder and remains the same for all the patches to which these transformations apply.
  • the invention also relates to a device for decoding an encoded data stream representative of a multi-view video, said encoded data stream comprising encoded data representative of at least one atlas, said at least one atlas corresponding to an image comprising at least one patch, said at least one patch corresponding to a set of pixels extracted from at least one component of a view of the multi-view video, said view not being encoded in said encoded data stream, the decoding device comprising a processor and a memory configured for:
  • such a device is included in a terminal.
  • the invention also relates to a device for encoding a data stream representative of a multi-view video, comprising a processor and a memory configured for:
  • such a device is included in a terminal.
  • the coding method, and respectively the decoding method according to the invention can be implemented in various ways, in particular in wired form or in software form.
  • the coding method, respectively the decoding method is implemented by a computer program.
  • the invention also relates to a computer program comprising instructions for implementing the coding method or the decoding method according to any one of the particular embodiments described above, when said program is executed by a processor.
  • Such a program can use any programming language. It can be downloaded from a communications network and/or recorded on a computer-readable medium.
  • This program can use any programming language, and be in the form of source code, object code, or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.
  • the invention also relates to a recording medium or information medium readable by a computer, and comprising instructions of a computer program as mentioned above.
  • the aforementioned recording media can be any entity or device capable of storing the program.
  • the medium may comprise a storage means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, a USB key, or else a magnetic recording means, for example a hard disk.
  • the recording media can correspond to a transmissible medium such as an electrical or optical signal, which can be conveyed via an electrical or optical cable, by radio or by other means.
  • the program according to the invention can in particular be downloaded from an Internet type network.
  • the recording media can correspond to an integrated circuit in which the program is incorporated, the circuit being adapted to execute or to be used in the execution of the method in question.
  • FIG. 1 schematically illustrates an example of a system for capturing multiple views of a scene.
  • FIG. 2 schematically illustrates an example of a multi-view coder based on the coding of patches.
  • Figure 3 illustrates an example of patch extraction and atlas creation.
  • FIG. 4 illustrates steps of a coding method according to a particular embodiment of the invention.
  • FIG. 5 illustrates steps of a decoding method according to a particular embodiment of the invention.
  • FIG. 6 illustrates an example of data flow according to a particular embodiment of the invention.
  • FIG. 7 illustrates an example of the architecture of a coding device according to a particular embodiment of the invention.
  • FIG. 8 illustrates an example of the architecture of a decoding device according to a particular embodiment of the invention.
  • FIG. 4 illustrates the steps of a method for encoding a multi-view video in at least one encoded data stream according to a particular embodiment of the invention.
  • the multi-view video is encoded according to an encoding scheme as presented in relation to FIG. 2, in which one or more so-called base views are encoded in the data stream and in which sub-images or patches comprising texture and depth data are also encoded in the data stream. These patches come from additional views which are not fully encoded in the data stream. Such patches and one or more base views allow the decoder to synthesize other views of the scene, also called virtual views, synthesized views or intermediate views hereafter. These synthesized views are not encoded in the data stream. The steps of such a coding scheme relating to a particular embodiment of the invention are described below.
  • the scene is captured by a set of cameras C1, C2, ..., CN as in FIG. 1.
  • Each camera generates a view, comprising at least one so-called texture component which varies over time.
  • the texture component of a view is a sequence of 2D images corresponding to the images captured by the camera placed at the viewpoint of the view.
  • Each view also includes a depth component, called the depth map, which is determined for each image in the view.
  • the depth map can be generated in a known manner by estimating the depth using the texture, or by capturing volumetric data of the scene using devices based on Lidar ("Light Detection and Ranging") technology.
  • Hereafter, the term view will be used to indicate a sequence of texture images and depth maps representative of the scene captured from a point of view.
  • A view can also correspond to a texture image and a depth map of a view at a given time.
  • the encoder then proceeds to the steps which are described below, for example according to the encoding scheme defined in Basel Salahieh, Bart Kroon, Joël Jung, Marek Domański, Test Model 4 for Immersive Video, ISO/IEC JTC 1/SC 29/WG 11 N19002, Brussels, BE, January 2020.
  • one or more so-called base views are selected from the captured views of the multi-view video.
  • the base views are selected from the set of captured views of the multi-view video by known means. For example, spatial subsampling can be done to select every other view. In another example, the content of the views can be used to determine which views are to be kept as base views. In yet another example, the camera settings (position, orientation, focal length) can be used to determine which views to select as base views. At the end of step E40, a number of views are selected to be base views.
  • a “pruning” method is applied to the additional views to identify for each additional view one or more patches to be transmitted to the decoder.
  • This step makes it possible to determine the patches to be transmitted by extracting from the images of the additional views the areas necessary for the synthesis of intermediate views. For example, such zones correspond to occlusion zones not visible in the base views, or else to visible zones having undergone a change in illumination, or even having a lower quality in the base views.
  • the extracted areas are of arbitrary size and shape.
  • A grouping of pixels connected to their neighbors is carried out to create, from the zones extracted from the same view, one or more rectangular patches which are easier to code and to arrange.
  • the encoder determines one or more transformations which will be applied to the patch when it is arranged in an atlas.
  • the patches can be patches comprising a texture component and / or a depth component.
  • the patches are arranged in the atlases so as to minimize the cost of coding the atlases and / or reduce the number of pixels to be processed by the decoder.
  • the patches can undergo transformations, among which:
  • the encoder then goes through each patch and determines one or more transformations to apply to the patch.
  • an “identity” transformation, in other words no transformation, can also be included in the list of transformations to be tested for the patch.
  • the selection of a transformation among the possible transformations can be made by evaluating a rate-distortion criterion calculated on the reconstructed signal, using the rate necessary to encode the transformed patch and the distortion calculated between the original patch and the transformed patch, coded then reconstructed. The selection can also be made based on the evaluation of the quality of the additional view synthesized using the patch being processed.
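A sketch of such a rate-distortion selection, with the encoder abstracted behind a callable; the Lagrangian cost J = D + λR and the callable interfaces are illustrative, not the method mandated by the text.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def select_transform(patch, candidates, encode_fn, lam):
    """candidates: list of (apply, inverse) callables, including identity.
    encode_fn(transformed) -> (bits, reconstructed transformed patch)."""
    best, best_cost = None, float("inf")
    for apply_t, inverse_t in candidates:
        transformed = apply_t(patch)
        bits, recon_t = encode_fn(transformed)     # rate R for the transformed patch
        recon = inverse_t(recon_t)                 # coded then reconstructed patch
        cost = mse(patch, recon) + lam * bits      # J = D + lambda * R
        if cost < best_cost:
            best, best_cost = (apply_t, inverse_t), cost
    return best
```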
  • the factors Nv, Nh, and Ne can be tested.
  • the factors Nv, Nh, and Ne are equal to 2. In other embodiments, other values are possible, such as 4, 8 or 16.
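For instance, taking Nv and Nh as vertical and horizontal subsampling factors, a one-line sketch (this interpretation of the factors is an assumption):

```python
import numpy as np

def subsample(patch: np.ndarray, nv: int = 2, nh: int = 2) -> np.ndarray:
    """Keep one pixel out of nv rows and one pixel out of nh columns."""
    return patch[::nv, ::nh]
```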
  • The transformation corresponding to a modification of the values of the pixels is also called a “mapping”.
  • Such a mapping transformation can for example consist in dividing all the values of the pixels of the patch by a given value Dv.
  • Dv is 2.
  • other values are possible, such as 4, 8, or 16.
  • the parameter P of the transformation is then a list of triples (x1, a, b), one for each linear part of the mapping.
  • the mapping can also be a LUT (“LookUp Table”), which is an array associating a value y with an input x.
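The three mapping variants mentioned above can be sketched as follows; the exact forms (integer division, segment convention for the (x1, a, b) triples, LUT indexing) are assumptions for illustration:

```python
import numpy as np

def map_divide(patch: np.ndarray, dv: int = 2) -> np.ndarray:
    """Reduce the pixel dynamics by dividing all values by Dv."""
    return patch // dv

def map_piecewise(patch: np.ndarray, pieces) -> np.ndarray:
    """pieces: list of (x1, a, b) triples; for x >= x1 (up to the next x1), y = a*x + b."""
    out = np.zeros_like(patch, dtype=np.int64)
    for x1, a, b in sorted(pieces):              # later segments overwrite earlier ones
        mask = patch >= x1
        out[mask] = (a * patch[mask] + b).astype(np.int64)
    return out

def map_lut(patch: np.ndarray, lut) -> np.ndarray:
    """LUT: array associating a value y with each input value x."""
    return np.asarray(lut)[patch]
```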
  • the determination of a transformation associated with a patch can also take into account the number of atlases available to encode the multi-view video and simulate the arrangement of the patches in the atlases in order to optimize the rate/distortion cost of coding the atlases or the quality of the synthesis of intermediate views in a global way.
  • At the end of step E42, a list of transformed patches is available. Each patch is associated with the transformation(s) determined for this patch and the associated parameters.
  • the patches are arranged in one or more atlases.
  • the number of atlases depends for example on parameters defined at the encoder input, which are in particular the size of an atlas (width and height) and the maximum number M of pixels for the texture and depth of all the atlases per image or given instant. This maximum number M corresponds to the number of pixels to be processed by the decoder for an instant of the multi-view video.
  • each base view is encoded in an atlas comprising a patch comprising a texture component and a depth component of the base view at a given instant. According to this particular mode, there are then as many atlases as there are base views and as many atlases as necessary to transport all the patches extracted from the additional views.
  • an atlas may include a base view and patches, or a base view may be clipped and represented on multiple atlases if the view size is larger than the size of an atlas.
  • a patch of an atlas can then correspond to an entire image of a base view or to a part of a base view or to an area extracted from an additional view.
  • the texture pixels of the patches are arranged in the texture component of an atlas and the depth pixels of the patches are arranged in the depth component of an atlas.
  • An atlas may only have a single texture or depth component, or it may include a texture component and a depth component.
  • an atlas can also include other types of component including information useful for the synthesis of intermediate views.
  • other types of components may include information such as a reflectance index, an index indicating how transparent the corresponding area is, or even confidence information about the depth value at that location.
  • the coder goes through all the patches in the patch list. For each patch, the encoder determines in which atlas this patch will be encoded. This list includes transformed patches and untransformed patches. Untransformed patches are either patches comprising areas extracted from additional views that have undergone no transformation or the identity transformation, or patches comprising base view images. It is considered here that when the patch must undergo a transformation, it has already been transformed.
  • An atlas is a set of spatially rearranged patches in an image. This image is intended to be encoded. This arrangement is intended to occupy the space in the atlas images to be encoded as well as possible. Indeed, one of the objectives of video coding is to minimize the number of pixels to be decoded before being able to synthesize a view. For this, the patches are arranged in the atlases so as to maximize the number of patches in an atlas. Such a method is described in Basel Salahieh, Bart Kroon, Joël Jung, Marek Domański, Test Model 4 for Immersive Video, ISO/IEC JTC 1/SC 29/WG 11 N19002, Brussels, BE, January 2020.
  • At the end of step E43, a list of patches for each atlas is generated. It should be noted that this arrangement also determines the number of atlases to be coded for a given moment.
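A toy shelf-packing sketch of this arrangement step; the actual TMIV packer is more elaborate, and the sorting and shelf policy here are only assumptions illustrating the goal of maximizing atlas occupancy.

```python
def pack_patches(patches, atlas_w, atlas_h):
    """patches: list of (patch_id, width, height); returns patch_id -> (x, y) position.
    Patches that do not fit are left for the next atlas."""
    positions, x, y, shelf_h = {}, 0, 0, 0
    for pid, w, h in sorted(patches, key=lambda p: -p[2]):  # tallest first
        if w > atlas_w:
            continue                 # cannot fit in this atlas at all
        if x + w > atlas_w:          # current shelf is full: open a new one
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + h > atlas_h:          # atlas is full
            break
        positions[pid] = (x, y)
        x += w
        shelf_h = max(shelf_h, h)
    return positions
```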
  • each atlas which includes a texture component and/or a depth component in the form of a 2D image is encoded using a conventional video encoder such as HEVC, VVC, MV-HEVC, 3D-HEVC, etc.
  • the base views are considered here as patches.
  • the coding of atlases therefore involves the coding of the base views.
  • the information associated with each atlas is encoded in the data stream.
  • This information is conventionally encoded by an entropy encoder.
  • the patch list includes the following, for each patch in the list:
  • In a step E45, for at least some patches of the atlas, information relating to the transformations to be applied to the patch during decoding is encoded in the data stream.
  • the transformations to be applied to the patch during decoding correspond to the inverse transformations applied to the patch when arranging the patch in the atlas and determined above.
  • For each patch, information indicating the transformation to be applied is transmitted.
  • It is the transformation to be applied at decoding which is indicated, and not the transformation applied at encoding (which corresponds to the inverse of the transformation applied at decoding).
  • the information transmitted on the transformation to be applied may correspond to information indicating the transformation applied to the coding, the decoder then deducing the transformation to be applied from this information.
  • the information indicating the transformation to be applied can be an index indicating the transformation to be applied in a list of possible transformations.
  • a list can further include an identity transformation.
  • an index indicating the identity transformation can thus be encoded.
  • a binary indicator can be coded to indicate whether the patch is transformed or not, and if the binary indicator indicates that the patch has been transformed, an index indicating the transformation to be applied in the list of possible transformations is coded.
  • only the binary indicator can be coded to indicate whether the patch is transformed or not.
  • the list of possible transformations can be known to the decoder and therefore does not need to be transmitted in the data stream. In other embodiments, the list of possible transformations can be coded in the data stream, for example in a header of a view or of the multi-view video.
  • the parameters associated with the transformations to be applied can also be defined by default and known to the decoder.
  • the parameters associated with a transformation applied to the patch are encoded in the data stream for each patch.
  • the parameter associated with the transformation can correspond to a value of an interpolation to be applied for all the dimensions or a value of an interpolation to apply for each of the dimensions.
  • the parameters of this transformation correspond to the characteristics of the mapping to be applied: parameters of a linear function, a piecewise linear function, a look-up table (LUT), etc.
  • the possible LUT (s) can be known to the decoder.
  • the parameter corresponds to the angle of rotation selected from among the possible rotations.
  • the parameters associated with a transformation can be encoded as they are or else by prediction with respect to a prediction value.
  • a prediction value can be defined and encoded in the data stream in a header of a view, of a component, of an image of a view, or of an atlas including the current patch.
  • the P value of a parameter will be predicted by a Ppred value encoded at the level of the atlas.
  • the difference between Ppred and P is then coded for each patch in the atlas.
  • the prediction value Ppred can correspond to the value of the parameter used for a patch previously processed.
  • it can be the previous patch in the patch processing order, or the previous patch belonging to the same view as the current patch.
  • the prediction value of the parameter can also be obtained by a mechanism similar to the “Merge” mode of an HEVC type encoder.
  • For each patch, a list of candidate patches is defined and an index pointing to one of these candidate patches is coded for the patch.
  • it is not necessary to transmit an index, a criterion being able to be used to identify the patch among the list of candidate patches.
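A sketch of both prediction mechanisms just described: residual coding of a parameter against Ppred, and a Merge-like selection from a candidate list (the candidate ordering and the index-free criterion are assumptions for the example).

```python
def encode_param(p: int, p_pred: int) -> int:
    """Only the residual P - Ppred is written to the stream."""
    return p - p_pred

def decode_param(residual: int, p_pred: int) -> int:
    """The decoder rebuilds the parameter from the same prediction value."""
    return p_pred + residual

def merge_predict(candidate_params, index=None):
    """Merge-like prediction: candidate_params come from previously processed
    patches; either a decoded index selects one, or a fixed criterion
    (here, the most recently processed candidate) is used without signaling."""
    return candidate_params[index] if index is not None else candidate_params[-1]
```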
  • the information indicating whether the patch must undergo a transformation can be decomposed into a part which indicates the use of the transformation (for example a binary flag) and a part which indicates the parameters of the transformation, if usage is enabled.
  • This signaling mechanism can be used independently for each possible transformation for the patch.
  • a binary indicator can be coded at the level of a header of an atlas, or of a view or of a component, to activate the use of a determined transformation for the patches of this atlas, of this view or of this component.
  • the application of the transformation determined for a patch then depends on the value of this binary indicator.
  • two binary indicators IA and IB, associated respectively with the activation of a transformation A and with the activation of a transformation B, are encoded in a header of an atlas.
  • the value of the binary indicator IA indicates that the use of transformation A is possible, while the value of the binary indicator IB indicates that the use of transformation B is not possible.
  • For each patch, a binary indicator will indicate whether transformation A is applied to the patch, and possibly the associated parameters. It is not necessary in this example to code for each patch a binary indicator to indicate whether the transformation B is applied to the patch.
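A sketch of this header-level gating, with illustrative field names: the per-patch flag for a transformation is only read from the stream when the corresponding activation indicator is set in the atlas header.

```python
def decode_patch_transform_flags(reader, atlas_header):
    """atlas_header: dict of activation indicators, e.g. {'A': True, 'B': False}."""
    flags = {}
    for name, enabled in atlas_header.items():
        # the per-patch indicator is decoded only for activated transformations
        flags[name] = reader.read_flag() if enabled else False
    return flags
```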
  • an atlas can comprise a patch for which a determined transformation can be applied as a function of the indicator coded for this patch and a patch for which the same transformation cannot be applied.
  • no indicator for this transformation is encoded in the patch information.
  • no information indicating a transformation is coded at the patch level. This is deduced at the decoder from a characteristic of the patch. The transformation is then applied to the patch as soon as it meets a certain criterion. This particular mode will be described in more detail below in relation to the decoding method.
  • FIG. 5 illustrates the steps of a method for decoding an encoded data stream representative of a multi-view video according to a particular embodiment of the invention.
  • the encoded data stream was generated by the encoding method described in relation to FIG. 4.
  • In a step E50, the atlas information is decoded. This information is conventionally decoded by a suitable entropy decoder.
  • this information can be an index indicating a transformation among a list of possible transformations, or else for each possible transformation, an indicator indicating whether the transformation must be applied to the patch.
  • the information may correspond to a binary indicator indicating the use of the transformation or a value of an interpolation to be applied for all dimensions.
  • the information may correspond to a binary indicator indicating the use of the transformation or for each of the dimensions a value of an interpolation to apply.
  • the information may include information indicating the use of the mapping, and possibly information representative of the characteristics of the mapping to be applied (parameters of a linear function, piecewise linear function, look-up table, etc.).
  • the parameter will indicate which rotation has been selected among the possible rotations.
  • the information transmitted making it possible to identify a transformation to be applied to the patch is decoded in a manner suited to the coding applied. Thus, it can be decoded as is (direct decoding) or in a predictive manner, in a manner similar to the encoder.
  • the information making it possible to identify a transformation to be applied to the patch can comprise a part which indicates the use of the transformation (binary indicator) and a part which indicates the parameters of the transformation, if usage is enabled.
  • the decoding, for a given patch, of information identifying a transformation to be applied to the patch may depend on a binary activation indicator encoded in the header of the atlas, view or component to which the patch belongs.
  • the information identifying a transformation to be applied to the patch is not encoded with the information of the patch, but derived from the characteristics of the decoded patch.
  • For example, the values of the pixels of the decoded patch are multiplied by Dv, a determined factor for modifying the patch values.
  • the patch is interpolated by a determined factor, for example a factor of 2 in the vertical dimension.
  • the patch dimensions considered here are the patch dimensions decoded from information from the atlas in which the patch was encoded. These are therefore the dimensions of the patch before transformation at the decoder (and therefore after transformation at the encoder).
  • This variant makes it possible to mix in the same atlas “long” patches for which it is not worthwhile to sub-sample, and “long” patches which are sub-sampled without signaling, the sub-sampling making them meet the criterion which allows them to be interpolated at the decoder.
  • Other threshold values can be used, for example more restrictive values like 0.9 < H/W < 1.1.
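A sketch of this dimension-based criterion, using the 0.9 < H/W < 1.1 variant and a factor-2 vertical interpolation by simple row repetition; both the threshold values and the interpolation kernel are illustrative.

```python
import numpy as np

def maybe_interpolate(patch: np.ndarray, lo: float = 0.9, hi: float = 1.1,
                      factor: int = 2) -> np.ndarray:
    """Interpolate the decoded patch only when its decoded dimensions meet the criterion."""
    h, w = patch.shape[:2]
    if lo < h / w < hi:
        return np.repeat(patch, factor, axis=0)  # vertical upsampling by replication
    return patch
```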
  • each atlas which includes a 2D texture component and / or a 2D depth component, is decoded using a conventional video decoder such as AVC or HEVC, VVC, MV-HEVC, 3D-HEVC, etc.
  • the decoded patches are reconstructed by applying the transformation identified during step E50 to the texture component and / or to the depth component of each patch in its atlas depending on whether the transformation applies to texture, depth or both components.
  • this step consists of individually modifying each patch by applying the transformation identified for this patch. This can be done in different ways, for example: by modifying the pixels of the patch in the atlas that contains it, by copying the modified patch into a buffer memory area, or by copying the transformed patch into the view associated with it.
  • each patch to be reconstructed can have one of the following transformations applied:
  • the transmitted mapping parameters can be either the encoder mapping parameters (and then the decoder will have to apply the inverse mapping function), or the decoder mapping parameters (and then the encoder will have to apply the inverse mapping function).
  • it is possible to apply at the encoder several transformations to a patch. These transformations are signaled in the stream in the information encoded for the patch, or else deduced from the characteristics of the decoded patch. For example, it is possible to apply at the encoder a downsampling by a factor of 2 in each dimension of the patch, followed by a mapping of the pixel values of the patch, then a rotation.
  • the order of the transformations to be applied is predefined and known to the encoder and to the decoder. For example, at the encoder the order is as follows: rotation, then downsampling, then mapping.
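A sketch of such a shared pipeline: the encoder applies the transformations in the predefined order and the decoder applies the inverse transformations in the reverse order; the transform objects with apply/inverse methods are an assumed interface.

```python
ENCODER_ORDER = ["rotation", "downsampling", "mapping"]  # predefined, known to both sides

def apply_pipeline(patch, transforms):
    """transforms: dict mapping a name in ENCODER_ORDER to an object with .apply/.inverse."""
    for name in ENCODER_ORDER:
        if name in transforms:
            patch = transforms[name].apply(patch)
    return patch

def invert_pipeline(patch, transforms):
    for name in reversed(ENCODER_ORDER):     # inverse transformations in reverse order
        if name in transforms:
            patch = transforms[name].inverse(patch)
    return patch
```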
  • At the end of step E52, a set of reconstructed patches is available.
  • At least one intermediate view is synthesized using at least one base view and at least one patch reconstructed previously.
  • the chosen virtual view synthesis algorithm is applied to the decoded and reconstructed data from the multi-view video that has been transmitted to the decoder. As explained previously, this algorithm relies on the pixels of the components of the base views and patches to produce a view from a point of view between the cameras.
  • the synthesis algorithm uses at least two textures and two depth maps, taken from base views and / or additional views to generate an intermediate view.
  • Synthesizers are known and belong, for example, to the DIBR (“Depth Image Based Rendering”) category.
  • algorithms frequently used by standards organizations are:
  • RVS, for “Reference View Synthesizer”, initiated by the University of Brussels and improved by Philips, begins by projecting the reference views using a calculated disparity.
  • the references are partitioned into triangles and warped. Then the warped views of each reference are blended, then a basic “inpainting” type filling is applied to fill the disocclusions;
  • VVS, for “Versatile View Synthesizer”, developed by Orange, sorts the references, applies a deformation of certain information from the depth maps, then a conditional fusion of these depths. Then a backward warping of the textures is applied, then a fusion of the different textures and different depths. Finally, a spatio-temporal “inpainting” type filling is applied, before spatial filtering of the intermediate image.
  • FIG. 6 illustrates an example of a data stream according to a particular embodiment of the invention and in particular the atlas information encoded in the stream and making it possible to identify one or more transformations to be applied to the patches of the atlas.
  • the data stream was generated according to the encoding method according to any one of the particular embodiments described in relation to FIG. 4, and it is suitable for being decoded by the decoding method according to any one of the particular embodiments described in relation to FIG. 5.
  • such a flow comprises in particular:
  • the patch information and in particular a Trf indicator indicating whether or not the transformation is used for the patch,
  • a Par parameter of the transformation for example in the form of a residue obtained with respect to the prediction value Ppred, when the latter is encoded.
  • FIG. 7 shows the simplified structure of a COD coding device suitable for implementing the coding method according to any one of the particular embodiments of the invention.
  • the steps of the coding method are implemented by computer program instructions.
  • the coding device COD has the conventional architecture of a computer and comprises in particular a memory MEM, a processing unit UT, equipped for example with a processor PROC, and controlled by the computer program PG stored in MEM memory.
  • the PG computer program comprises instructions for implementing the steps of the encoding method as described above, when the program is executed by the processor PROC.
  • the code instructions of the computer program PG are for example loaded into a RAM memory (not shown) before being executed by the processor PROC.
  • the processor PROC of the processing unit UT notably implements the steps of the coding method described above, according to the instructions of the computer program PG.
  • FIG. 8 shows the simplified structure of a decoding device DEC suitable for implementing the decoding method according to any one of the particular embodiments of the invention.
  • the decoding device DEC has the conventional architecture of a computer and notably comprises a memory MEM0, a processing unit UT0, equipped for example with a processor PROC0, and controlled by the computer program PG0 stored in memory MEM0.
  • the computer program PG0 comprises instructions for implementing the steps of the decoding method as described above, when the program is executed by the processor PROC0.
  • the code instructions of the computer program PG0 are for example loaded into a RAM memory (not shown) before being executed by the processor PROC0.
  • the processor PROC0 of the processing unit UT0 notably implements the steps of the decoding method described above, according to the instructions of the computer program PG0.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
EP21721150.7A 2020-04-22 2021-03-29 Methods and devices for encoding and decoding a multi-view video sequence Pending EP4140136A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR2003994A FR3109685A1 (fr) 2020-04-22 2020-04-22 Methods and devices for encoding and decoding a multi-view video sequence
PCT/FR2021/050551 WO2021214395A1 (fr) 2020-04-22 2021-03-29 Methods and devices for encoding and decoding a multi-view video sequence

Publications (1)

Publication Number Publication Date
EP4140136A1 true EP4140136A1 (de) 2023-03-01

Family

ID=71452477

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21721150.7A Pending EP4140136A1 (de) 2020-04-22 2021-03-29 Verfahren und vorrichtungen zur codierung und decodierung einer mehransichtsvideosequenz

Country Status (8)

Country Link
US (1) US20230164352A1 (de)
EP (1) EP4140136A1 (de)
JP (1) JP2023522456A (de)
KR (1) KR20230002802A (de)
CN (1) CN115428456A (de)
BR (1) BR112022020642A2 (de)
FR (1) FR3109685A1 (de)
WO (1) WO2021214395A1 (de)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11457199B2 (en) * 2020-06-22 2022-09-27 Electronics And Telecommunications Research Institute Method for processing immersive video and method for producing immversive video

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102005016827A1 (de) * 2005-04-12 2006-10-19 Siemens Ag Adaptive Interpolation bei der Bild- oder Videokodierung
CN103228228B (zh) * 2010-07-12 2016-04-13 3形状股份有限公司 使用纹理特征的3d对象建模
JP5711098B2 (ja) * 2011-11-07 2015-04-30 日本電信電話株式会社 画像符号化方法,画像復号方法,画像符号化装置,画像復号装置およびそれらのプログラム
US10057574B2 (en) * 2015-02-11 2018-08-21 Qualcomm Incorporated Coding tree unit (CTU) level adaptive loop filter (ALF)
KR20200111643A (ko) * 2019-03-19 2020-09-29 한국전자통신연구원 이머시브 영상 처리 방법 및 이머시브 영상 합성 방법
BR112021020654A2 (pt) * 2019-05-14 2022-01-25 Intel Corp Dispositivo, pelo menos um meio legível por máquina, sistema e método para codificação de vídeo imersivo
KR102292195B1 (ko) * 2019-07-04 2021-08-24 엘지전자 주식회사 포인트 클라우드 데이터 송신 장치, 포인트 클라우드 데이터 송신 방법, 포인트 클라우드 데이터 수신 장치 및 포인트 클라우드 데이터 수신 방법
CN115004230A (zh) * 2020-01-14 2022-09-02 华为技术有限公司 用于v-pcc的缩放参数

Also Published As

Publication number Publication date
JP2023522456A (ja) 2023-05-30
FR3109685A1 (fr) 2021-10-29
BR112022020642A2 (pt) 2022-11-29
WO2021214395A1 (fr) 2021-10-28
US20230164352A1 (en) 2023-05-25
KR20230002802A (ko) 2023-01-05
CN115428456A (zh) 2022-12-02

Similar Documents

Publication Publication Date Title
EP3061246B1 Method for encoding and decoding images, device for encoding and decoding images and corresponding computer programs
EP3878170B1 View synthesis
WO2019211541A2 Method and device for decoding a multi-view video, and method and device for image processing
FR3012004A1 Method for encoding and decoding images, device for encoding and decoding images and corresponding computer programs
WO2010043809A1 Prediction of an image by forward motion compensation
FR2920632A1 Method and device for decoding video sequences with error concealment
EP4140136A1 Methods and devices for encoding and decoding a multi-view video sequence
FR3026261A1 Method for encoding and decoding integral images, device for encoding and decoding integral images and corresponding computer programs
FR3068557A1 Method for encoding and decoding images, encoding and decoding device and corresponding computer programs
EP3939304A1 Methods and devices for encoding and decoding multi-view sequences
EP1714498B1 Method for determining the optimal prediction direction for intra-predictive video coding
EP3158749B1 Method for encoding and decoding images, device for encoding and decoding images and corresponding computer programs
EP3529987A1 Method for encoding and decoding image parameters, device for encoding and decoding image parameters and corresponding computer programs
WO2020070409A1 Encoding and decoding of an omnidirectional video
WO2019115899A1 Methods and devices for encoding and decoding a multi-view video sequence representative of an omnidirectional video
WO2019008253A1 Method for encoding and decoding images, encoding and decoding device and corresponding computer programs
EP4104446A1 Method and device for processing multi-view video data
EP4222950A1 Method for encoding and decoding a multi-view video
FR3106014A1 Iterative synthesis of views from data of a multi-view video
EP4360319A1 Method for constructing a depth image from a multi-view video, method for decoding a data stream
WO2020260034A1 Method and device for processing multi-view video data
FR3064145A1 Method for encoding and decoding images, encoding and decoding device and corresponding computer programs

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220929

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
RAP3 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ORANGE