US11736725B2 - Methods for encoding decoding of a data flow representing of an omnidirectional video - Google Patents


Info

Publication number
US11736725B2
Authority
US
United States
Prior art keywords
image
video
representative
reference sub
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US17/500,362
Other versions
US20220046279A1
Inventor
Thibaud Biatek
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telediffusion de France ets Public de Diffusion
Original Assignee
Telediffusion de France ets Public de Diffusion
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telediffusion de France ets Public de Diffusion
Priority to US17/500,362
Publication of US20220046279A1
Application granted
Publication of US11736725B2
Status: Active

Classifications

    • H04N19/00 (H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION) Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, including the following subgroups:
    • H04N19/597 Predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/172 Adaptive coding characterised by the coding unit, the unit being an image region that is a picture, frame or field
    • H04N19/174 Adaptive coding characterised by the coding unit, the unit being an image region that is a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • H04N19/187 Adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H04N19/30 Coding using hierarchical techniques, e.g. scalability
    • H04N19/34 Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/61 Transform coding in combination with predictive coding
    • H04N19/70 Coding characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the invention is situated in the field of video compression, and more particularly that of techniques for encoding and decoding immersive or omnidirectional (e.g. 180°, 360° in 2D or 3D) video.
  • An omnidirectional video can be used to represent a scene from a central point and to do so in every direction.
  • the term “360° video content” is used when the totality of the field is used.
  • a subset of the field can also be captured, for example covering only 180°.
  • the content can be captured monoscopically (2D) or stereoscopically (3D).
  • This type of content can be generated by assembling sequences of images captured by different cameras or else it can be generated synthetically by computer (e.g. in VR video games).
  • the images of such a video content enable the rendering, via an appropriate device, of the video along any direction whatsoever.
  • a user can control the direction in which the captured scene is displayed and can navigate continuously in every possible direction.
  • Such 360° video contents can for example be rendered by using a virtual reality helmet offering the user an impression of immersion in the scene captured by the 360° video content.
  • Such 360° video contents necessitate reception devices adapted to this type of content (a virtual reality helmet for example) in order to offer the functions of immersion and control of the displayed view by the user.
  • the content captured specifically for a 360° video broadcast can have been already captured for a 2D or 3D video broadcast. In this case, it is the totality of the 360° content projected on a plane that is broadcast.
  • there are techniques of video encoding by layers, known as scalable video encoding, used to encode a 2D video stream in several successive layers of refinements offering different levels of rebuilding of the 2D video.
  • spatial scalability enables the encoding of a video signal in several layers of increasing spatial resolution.
  • Scalability in terms of PSNR (Peak Signal to Noise Ratio) enables the encoding of a video signal, for a fixed spatial resolution, in several layers of rising quality.
  • Scalability in the colorimetric space enables the encoding of a video signal in several layers represented in increasingly wider colorimetric spaces.
  • the US document 2016/156917 describes a method for the scalable encoding of a video that can be a multiview video and wherein each view of the multiview video is encoded in a layer of the stream and predicted by another view of the multiview video.
  • the invention improves on the prior art. To this effect, it concerns a method for encoding a data stream representative of an omnidirectional video, comprising:
  • the invention thus reduces the cost of transmission of the video streams when the video contents must be transmitted in 2D view as well as in 360° view or in 3D view and in 3D-360° view.
  • a classic 2D or 3D video decoder will decode only the base layer or one of the base layers to rebuild a 2D or 3D video of the scene and a compatible 360° decoder will decode the base layer or layers and at least one enhancement layer to rebuild the 360° video.
  • the use of a prediction of the at least one base layer to encode the enhancement layer thus reduces the cost of encoding the enhancement layer.
  • the invention also concerns a method for decoding a data stream representative of an omnidirectional video, comprising:
  • omnidirectional video herein is understood to mean equally well a video of a scene, for which the totality of the field (360°) is captured and a video of a scene for which a sub-part of the 360° field is captured, for example 180°, 160°, 255.6°, or the like.
  • the omnidirectional video is therefore representative of a scene captured on at least one continuous part of the 360° field.
  • the prediction of the enhancement layer relative to the at least one base layer comprises, in order to encode or rebuild at least one image of the enhancement layer:
  • the prediction in the enhancement layer is carried out by the addition, during the encoding or decoding of an image of the enhancement layer, of a reference image in which the images rebuilt from base layers are projected.
  • a new reference image is added into the memory of reference images of the enhancement layer. This new reference image is generated by geometrical projection of all the base images rebuilt from the base layers at a time instant.
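For illustration only, the following Python sketch shows how such a projected reference image could be inserted into the reference-image memory of the enhancement layer alongside the usual temporal references. All names (ReferencePictureMemory, build_interlayer_reference, the projector callbacks) are assumptions made for the example, not elements of the patent or of any particular codec.

```python
import numpy as np

class ReferencePictureMemory:
    """Toy reference-image memory for the enhancement layer."""
    def __init__(self):
        self.pictures = []                                 # (kind, image) pairs

    def add_temporal_reference(self, rebuilt_image):
        self.pictures.append(("temporal", rebuilt_image))

    def add_interlayer_reference(self, i_ref):
        # I_ref is generated by geometrical projection of the base images
        # rebuilt from the base layers at the current time instant.
        self.pictures.append(("inter-layer", i_ref))

def build_interlayer_reference(rebuilt_base_images, projectors, width, height):
    """Project every rebuilt base image of the current instant onto one reference image."""
    i_ref = np.zeros((height, width), dtype=np.float32)    # only partially filled by the projections
    for image, project in zip(rebuilt_base_images, projectors):
        project(image, i_ref)                              # each projector writes its covered samples
    return i_ref
```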
  • the data stream comprises a piece of information representative of a type of geometrical projection used to represent the omnidirectional video.
  • the view presented by the 2D or 3D video is a view extracted from the omnidirectional video.
  • the data stream comprises a piece of information representative of a type of geometrical projection used to extract a view of the omnidirectional video and of its parameters of location.
  • such a piece of information representative of parameters of projection and of location of said base image is encoded in the data stream for each image of the 360° video.
  • this variant is used to take account of a shift in the scene of a view serving as a prediction for the enhancement layer.
  • the images of the video of the base layer can correspond to images captured while moving in the scene, for example to track an object in motion in the scene.
  • the view can be captured by a camera in motion or successively by several cameras located at different viewpoints in the scene, to track a ball or a player during a football match for example.
  • the data stream comprises at least two base layers, each base layer being representative of a 2D or 3D video, each base layer being respectively represented by a view of the scene, the at least two base layers being encoded independently of each other.
  • an image of the enhancement layer is encoded by means of a group of tiles, each tile covering a region of the image of the enhancement layer, each region being distinct and separated from the other regions of the image of the enhancement layer, each tile being encoded by prediction relative to the at least one base layer.
  • the decoding of the enhancement layer comprises the rebuilding of a part of the image of the enhancement layer, the rebuilding of said part of the image comprising the decoding of the tiles of the enhancement layer covering the part of the image of the enhancement layer to be rebuilt, and the decoding of the at least one base layer comprising the decoding of the base layers used to predict the tiles covering the part of the image of the enhancement layer to be rebuilt.
  • Such a particular embodiment of the invention enables the rebuilding of only one part of the omnidirectional image and not the entire image. Typically, only the part being viewed by the user is rebuilt. Thus, it is not necessary to decode all the base layers of the video stream or even send them to the receiver. Indeed, with a user being unable to simultaneously see the entire image of the omnidirectional video, it is possible to encode an omnidirectional image by a tile mechanism enabling the independent encoding of the regions of the omnidirectional image so as to then make it possible to decode only those regions of the omnidirectional image that are visible to the user.
  • the independent encoding of the base layers thus makes it possible to rebuild the tiles of the omnidirectional image separately and to limit the complexity when decoding by avoiding the decoding of unnecessary base layers.
  • a piece of information identifying the at least one base layer used to predict the tile is decoded from the data stream.
  • the invention also relates to a device for encoding a data stream representative of an omnidirectional video.
  • the encoding device comprises means of encoding in said stream of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video, and means of encoding, in said stream, at least one enhancement layer representative of the omnidirectional video, said means of encoding the enhancement layer comprising means of prediction of the enhancement layer relative to the at least one base layer.
  • the invention also relates to a device for decoding a data stream representative of an omnidirectional video.
  • the decoding device comprises means for the decoding, in said stream, of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video, and means of decoding, in said stream, at least one enhancement layer representative of the omnidirectional video, said means for decoding the enhancement layer comprising means of prediction of the enhancement layer relative to the at least one base layer.
  • the encoding device and decoding device respectively are especially adapted to implementing the method of encoding and decoding respectively described here above.
  • the encoding device and decoding device respectively could of course comprise the different characteristics of the encoding method and decoding method respectively, according to the invention.
  • the characteristics and advantages of this encoding and decoding device respectively are the same as those of the encoding and decoding method respectively and are not described in more ample detail.
  • the decoding device is comprised in a terminal.
  • the invention also relates to a signal representative of an omnidirectional video comprising encoded data of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video and encoded data of at least one enhancement layer representative of the omnidirectional video, the at least one enhancement layer being encoded by prediction relative to the at least one base layer.
  • an image of the enhancement layer is encoded by means of a group of tiles, each tile covering a region of the image of the enhancement layer, each region being distinct and separated from the other regions of the image of the enhancement layer, each tile being encoded by prediction relative to the at least one base layer.
  • the signal also comprises for each tile a piece of information identifying the at least one base layer used to predict the tile.
  • the invention also relates to a computer program comprising instructions to implement the method of encoding or the method of decoding according to any one of the particular embodiments described here above when said program is executed by a processor.
  • a program can use any programming language whatsoever. It can be downloaded from a communications network and/or recorded on a medium readable by computer.
  • This program can use any programming language whatsoever and be in the form of source code, object code or intermediate code between source code and object code, such as in a partially compiled form or in any other desirable form whatsoever.
  • the invention concerns a recording support or medium or information support or medium readable by a computer, comprising instructions of a computer program such as is mentioned here above.
  • the recording media mentioned here above can be any entity or device capable of storing the program.
  • the medium can comprise a storage means such as a read-only memory (ROM) type memory, for example a CD-ROM or a microelectronic circuit ROM, a flash memory mounted on a detachable storage medium, such as a USB stick, or again a magnetic mass memory of the hard-disk drive (HDD) or solid-state drive (SSD) type or a combination of memories working according to one or more data-recording technologies.
  • the recording medium can correspond to a transmissible medium such as an electrical or optical signal that can be conveyed via an electrical or optical cable, by radio or by other means.
  • the proposed computer program can be downloaded from an Internet type network.
  • the recording medium can correspond to an integrated circuit into which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
  • the encoding or decoding method according to the invention can therefore be implemented in various ways, especially in wired form or in software form.
  • FIG. 1 A illustrates the steps of the method of encoding according to one particular embodiment of the invention
  • FIG. 1 B illustrates an example of a signal generated according to the method of encoding implemented according to one particular embodiment of the invention
  • FIG. 2 A illustrates an image of a view of a scene captured by a 360° video encoded in a base layer
  • FIG. 2 B illustrates the image illustrated in FIG. 2 A projected in the reference frame of an image of the 360° video
  • FIG. 2 C illustrates an image of the 360° video encoded in an enhancement layer
  • FIGS. 2 D and 2 E each illustrate an image of two views of a scene captured by a 360° video and each encoded in a base layer
  • FIG. 2 F illustrates the images of two views illustrated in FIGS. 2 D and 2 E projected in the reference frame of an image of the 360° video
  • FIG. 2 G illustrates an image of the 360° video encoded in an enhancement layer
  • FIG. 3 illustrates steps of the method of decoding according to one particular embodiment of the invention
  • FIG. 4 A illustrates an example of an encoder configured to implement the method of encoding according to one particular embodiment of the invention
  • FIG. 4 B illustrates a device adapted to implementing the method of encoding according to another particular embodiment of the invention
  • FIG. 5 A illustrates an example of a decoder configured to implement the method of decoding according to one particular embodiment of the invention
  • FIG. 5 B illustrates a device adapted to implementing the method of decoding according to another particular embodiment of the invention
  • FIGS. 6 A and 6 B respectively illustrate an image of the 360° omnidirectional video encoded by independent tiles and a reference image generated from two views of two base layers and used to encode the image of FIG. 6 A ,
  • FIGS. 7 A-C respectively illustrate a projection in a 2D plane of a 360° omnidirectional video with cubemap type projection, a 3D spherical representation in an XYZ reference frame of the 360° omnidirectional video and a view extracted from the 360° immersive content in a 2D plane according to a rectilinear projection,
  • FIG. 7 D illustrates the relationship between different geometrical projections
  • FIG. 8 illustrates the procedure for building the reference image.
  • the images of FIGS. 2 A, 2 C-2 E and 2 G and of FIGS. 7 A-7 B are extracted from 360° videos made available by LetInVR within the framework of the JVET (Joint Video Exploration Team) (JVET-D0179: Test Sequences for Virtual Reality Video Coding from LetinVR, 15-21 Oct. 2016).
  • the general principle of the invention is that of encoding a data stream scalably, thus making it possible to rebuild and render a 360° video when a receiver is adapted to receiving and rendering such a 360° video and rebuilding and rendering a 2D or 3D video when the receiver is adapted only to rendering a 2D or 3D video.
  • the 2D or 3D video is encoded in a base layer and the 360° video is encoded in an enhancement or improvement layer predicted from the base layer.
  • the stream can comprise several base layers each corresponding to a 2D or 3D video corresponding to a view of the scene.
  • the enhancement layer is thus encoded by prediction on the basis of all or a part of the base layers comprised in the stream.
  • FIG. 1 A illustrates steps of the method of encoding according to one particular embodiment of the invention.
  • a 360° video is encoded scalably by extracting views from the 360° video and by encoding each view in a base layer.
  • view is understood here to mean a sequence of images acquired from a viewpoint of the scene captured by the 360° video.
  • Such a sequence of images can be a sequence of monoscopic images in the case of a 360° video in 2D or a sequence of stereoscopic images in the case of a 360° video in 3D.
  • each image comprises a left-hand view and a right-hand view encoded jointly, for example in the form of an image generated by placing the left-hand and right-hand views side by side or one above the other.
  • the encoder encoding such a sequence of stereoscopic images in a base layer or an enhancement layer will then encode each image comprising a left-hand view and a right-hand view as a classic sequence of 2D images.
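As a minimal illustration of this side-by-side (or top-bottom) packing, the following Python sketch assembles a stereoscopic pair into a single frame that a classic 2D encoder can then process; the function name and the use of NumPy arrays are assumptions made for the example.

```python
import numpy as np

def pack_stereo_pair(left_view: np.ndarray, right_view: np.ndarray, side_by_side: bool = True) -> np.ndarray:
    """Pack a left-hand view and a right-hand view into one image so that the pair
    can be encoded as a classic 2D image (side by side, or one above the other)."""
    assert left_view.shape == right_view.shape, "both views must have the same dimensions"
    axis = 1 if side_by_side else 0
    return np.concatenate([left_view, right_view], axis=axis)
```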
  • the omnidirectional video is a 360° video in 2D.
  • the number of base layers is independent of the number of views used to generate the 360° video.
  • the number of base layers encoded in the scalable data stream is for example determined during the production of the content or it can be determined by the encoder for purposes of optimizing the bit rate.
  • a first and a second view are extracted from the 360° video.
  • the views [ 1 ] and [ 2 ] are respectively encoded during an encoding step 12 for encoding a base layer BL[ 1 ] and an encoding step 13 for encoding a base layer BL[ 2 ].
  • the base layers BL[ 1 ] and BL[ 2 ] are encoded independently of one another, i.e. there is no dependence of encoding (prediction, encoding context, etc.) between the encoding of the images of the base layer BL[ 1 ] and the encoding of the images of the base layer BL[ 2 ].
  • Each base layer BL[ 1 ] or BL[ 2 ] is decodable independently of the others.
  • this particular embodiment of the invention requires that the decoder should be capable of decoding both base layers to render a classic 2D video.
  • Each encoded/rebuilt image of the base layers BL[ 1 ] and BL[ 2 ] is then projected (steps 14 and 15 respectively) geometrically onto a same reference image I ref .
  • the result of this is a partially filled reference image that contains the samples interpolated from the projected view or views of the base layer. The building of the reference image is described in greater detail with reference to FIG. 8 .
  • FIGS. 2 A- 2 C illustrate one embodiment in which a single base layer is used.
  • the images of the 360° video have a spatial resolution of 3840×1920 pixels and are generated by an equirectangular projection and the 360° image sequence has a frequency of 30 images per second.
  • FIG. 2 C illustrates an image of the 360° video at a time instant t encoded in the enhancement layer.
  • An image at the time instant t of the view extracted from the 360° video is illustrated in FIG. 2 A .
  • the yaw and pitch coordinates correspond to the coordinates of the center (P in FIG. 2 B ) of the geometrical projection of an image of the view of the base layer; they correspond respectively to the longitude and latitude angles of the point P in the pivot format illustrated in FIG. 7 B .
  • the horizontal FOV and vertical FOV parameters correspond respectively to the horizontal and vertical sizes of an image of the extracted view centered on the point P in the pivot format illustrated in FIG. 7 B ; this image of the extracted view is represented in FIG. 7 C .
  • FIG. 2 B illustrates the reference image I ref used to predict the image of the 360° video at the instant t after equirectangular geometrical projection of the image of the base layer illustrated in FIG. 2 A .
  • FIGS. 2 D- 2 G illustrate an embodiment in which two base layers are used.
  • the images of the 360° video have a spatial resolution of 3840×1920 pixels and are generated by an equirectangular projection and the 360° image sequence has a frequency of 30 images per second.
  • FIG. 2 G illustrates an image of the 360° video at a time instant t encoded in the enhancement layer.
  • An image at the time instant t of a first view extracted from the 360° video is illustrated in FIG. 2 D .
  • An image at the time instant t of a second view extracted from the 360° video is illustrated in FIG. 2 E .
  • FIG. 2 F illustrates the reference image I ref used to predict the image of the 360° video at the instant t after equirectangular geometrical projection of the images of the first view and of the second view illustrated respectively in FIGS. 2 D and 2 E .
  • the representation of a 360° omnidirectional video in a plane is defined by a geometrical transformation characterizing the way in which a 360° omnidirectional content represented in a sphere is adapted to a representation in a plane.
  • the spherical representation of the data is used as a pivot format; it makes it possible to represent the points captured by the omnidirectional video device.
  • Such an XYZ 3D spherical representation is illustrated in FIG. 7 B .
  • the 360° video is represented by means of an equirectangular geometrical transformation that can be seen as a projection of the points on a cylinder surrounding the sphere.
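The equirectangular transformation can be sketched as follows in Python; the axis convention chosen here (x forward, y up, z to the side) is only one common possibility, and real tools such as 360Lib may orient the sphere differently.

```python
import math

def equirect_to_sphere(u, v, width, height):
    """Map a pixel (u, v) of a width x height equirectangular image onto a unit
    vector of the spherical pivot format (longitude/latitude convention assumed)."""
    lon = (u / width - 0.5) * 2.0 * math.pi          # longitude (yaw) in [-pi, pi]
    lat = (0.5 - v / height) * math.pi               # latitude (pitch) in [-pi/2, pi/2]
    return (math.cos(lat) * math.cos(lon),
            math.sin(lat),
            math.cos(lat) * math.sin(lon))

def sphere_to_equirect(x, y, z, width, height):
    """Inverse mapping: unit vector back to equirectangular pixel coordinates."""
    lon = math.atan2(z, x)
    lat = math.asin(max(-1.0, min(1.0, y)))
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v
```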
  • Other geometrical transformations are of course possible, for example a cubemap projection corresponding to a projection of points on a cube enclosing the sphere, the faces of the cube then being unfolded onto a plane to form the 2D image.
  • a cubemap projection is for example illustrated in FIG. 7 A .
  • FIG. 7 D illustrates a more detailed view of the relationship between the different formats mentioned here above.
  • the passage from an equirectangular format A to a cubemap format B is done through a pivot format C characterized by a representation of the samples in a spherical XYZ system illustrated in FIG. 7 B .
  • the extraction of a view D from the format A is done through this pivot format C.
  • the extraction of a view of the immersive content is characterized by a geometrical transformation, for example by making a rectilinear projection of the points of the sphere along a plane illustrated by the plane ABCD in FIG. 7 C .
  • This projection is characterized by parameters of location such as yaw, pitch and the horizontal and vertical field of view (FOV).
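A rectilinear view extraction driven by these yaw, pitch and FOV parameters can be sketched as follows; nearest-neighbour sampling and the axis convention are simplifications made for the example, not the normative 360Lib processing.

```python
import math
import numpy as np

def extract_rectilinear_view(equirect, yaw, pitch, hfov, vfov, view_w, view_h):
    """Extract a 2D view centred on (yaw, pitch), with the given fields of view,
    from an equirectangular image by rectilinear projection (plane ABCD of FIG. 7C)."""
    src_h, src_w = equirect.shape[:2]
    view = np.zeros((view_h, view_w) + equirect.shape[2:], dtype=equirect.dtype)
    # orthonormal camera frame of the extracted view: forward, right and up vectors
    f = np.array([math.cos(pitch) * math.cos(yaw), math.sin(pitch), math.cos(pitch) * math.sin(yaw)])
    r = np.array([-math.sin(yaw), 0.0, math.cos(yaw)])
    up = np.cross(r, f)
    for v in range(view_h):
        for u in range(view_w):
            px = (2.0 * (u + 0.5) / view_w - 1.0) * math.tan(hfov / 2.0)
            py = (1.0 - 2.0 * (v + 0.5) / view_h) * math.tan(vfov / 2.0)
            d = f + px * r + py * up                        # direction seen by this view pixel
            d = d / np.linalg.norm(d)
            lon = math.atan2(d[2], d[0])
            lat = math.asin(max(-1.0, min(1.0, float(d[1]))))
            su = int((lon / (2.0 * math.pi) + 0.5) * src_w) % src_w
            sv = min(int((0.5 - lat / math.pi) * src_h), src_h - 1)
            view[v, u] = equirect[sv, su]                   # nearest-neighbour sampling
    return view
```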
  • such geometrical transformations are described for example in JVET-G1003, “Algorithm descriptions of projection format conversion and video quality metrics in 360Lib Version 4”, Y. Ye, E. Alshina, J. Boyce, JVET of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th meeting, Turin, IT, 13-21 Jul. 2017.
  • FIG. 8 illustrates the different steps enabling the passage between two formats.
  • a table of correspondence is first of all built at E 80 in order to place the position of each sample in the destination image (I ref ) in correspondence with its corresponding position in the source format (corresponding to the rebuilt images of the base layers BL[ 1 ] and BL[ 2 ] in the example described in FIG. 1 A ). For each position (u,v) in the destination image, the following steps apply:
  • each pixel (u,v) in the destination image (I ref ) is interpolated relative to the value of the corresponding position (u′,v′) in the source image during a step E 84 (corresponding to the rebuilt images of the base layers BL[ 1 ] and BL[ 2 ] in the example described with reference to FIG. 1 A ).
  • An interpolation of (u′,v′) can be done before assigning the value, by applying a Lanczos type interpolation filter to the decoded image of the base layer at the position placed in correspondence.
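A sketch of this two-step procedure (correspondence table, then interpolation) is given below; bilinear interpolation stands in for the Lanczos filter mentioned in the text, and all names and conventions are assumptions made for the example.

```python
import math
import numpy as np

def build_correspondence_table(dst_w, dst_h, view_w, view_h, yaw, pitch, hfov, vfov):
    """For each position (u, v) of the destination image I_ref (equirectangular),
    compute the corresponding position (u', v') in the source view, when it exists."""
    f = np.array([math.cos(pitch) * math.cos(yaw), math.sin(pitch), math.cos(pitch) * math.sin(yaw)])
    r = np.array([-math.sin(yaw), 0.0, math.cos(yaw)])
    up = np.cross(r, f)
    table = {}
    for v in range(dst_h):
        for u in range(dst_w):
            lon = ((u + 0.5) / dst_w - 0.5) * 2.0 * math.pi
            lat = (0.5 - (v + 0.5) / dst_h) * math.pi
            d = np.array([math.cos(lat) * math.cos(lon), math.sin(lat), math.cos(lat) * math.sin(lon)])
            depth = float(np.dot(d, f))
            if depth <= 0.0:                                # behind the view plane: not covered
                continue
            px = float(np.dot(d, r)) / depth                # coordinates on the view plane
            py = float(np.dot(d, up)) / depth
            su = (px / math.tan(hfov / 2.0) + 1.0) * view_w / 2.0 - 0.5
            sv = (1.0 - py / math.tan(vfov / 2.0)) * view_h / 2.0 - 0.5
            if 0.0 <= su <= view_w - 1 and 0.0 <= sv <= view_h - 1:
                table[(u, v)] = (su, sv)
    return table

def fill_reference_image(i_ref, base_image, table):
    """Interpolate each covered pixel of I_ref from the rebuilt image of the base layer
    (bilinear interpolation here, in place of the Lanczos-type filter of the text)."""
    for (u, v), (su, sv) in table.items():
        x0, y0 = int(su), int(sv)
        x1 = min(x0 + 1, base_image.shape[1] - 1)
        y1 = min(y0 + 1, base_image.shape[0] - 1)
        ax, ay = su - x0, sv - y0
        top = (1 - ax) * base_image[y0, x0] + ax * base_image[y0, x1]
        bot = (1 - ax) * base_image[y1, x0] + ax * base_image[y1, x1]
        i_ref[v, u] = (1 - ay) * top + ay * bot
```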
  • the 360° video is encoded in an enhancement layer EL by prediction relative to the base layers BL[ 1 ] and BL[ 2 ] in using the reference image I ref generated from the base layers.
  • the data encoded during the steps 12 , 13 and 16 are multiplexed in order to form a binary stream comprising the encoded data of the base layers BL[ 1 ] and BL[ 2 ] and the enhancement layer EL.
  • the projection data used to build the reference image I ref are also encoded in the binary stream and transmitted to the decoder.
  • the encoding steps 12 , 13 and 16 can advantageously be implemented by standard video encoders, for example by an SHVC encoder, the scalable extension of the HEVC standard.
  • FIG. 1 B illustrates an example of a binary stream generated according to the method described with reference to FIG. 1 A .
  • the binary stream comprises:
  • the information representative of the projection and location parameters of a view of the base layer can for example be encoded in the form of coordinates of the view (yaw, pitch, HFOV, VFOV) matched with the type of projection (rectilinear projection) used to extract the view.
  • the information representative of the parameters of projection and location of a view of a base layer can be encoded only once in the binary stream. It is thus valid for the entire image sequence.
  • the information representative of the parameters of projection and location of a view of a base layer can be encoded several times in the binary stream, for example at each image, or at each group of images. It is thus valid only for one image or one group of images.
  • such a variant procures the advantage wherein the view extracted at each instant in time of the sequence can correspond to a view of an object that is in motion in this scene and is tracked in the course of time.
  • such a variant procures the advantage wherein the video sequence encoded in a base layer can change its viewpoint in the course of time thus making it possible to track an event via different viewpoints in the course of time.
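As an illustration of this signalling choice (parameters sent once per sequence, or refreshed at each image or group of images), a hypothetical syntax container could look like the following; the field names are assumptions for the example, not standard syntax elements.

```python
from dataclasses import dataclass

@dataclass
class ViewProjectionInfo:
    """Hypothetical projection and location parameters of a view encoded in a base layer."""
    projection_type: str   # e.g. "rectilinear" for a view extracted from the 360 video
    yaw: float             # location of the view centre on the sphere (radians)
    pitch: float
    hfov: float            # horizontal field of view (radians)
    vfov: float            # vertical field of view (radians)

# Encoded once for the whole sequence when the viewpoint is static ...
sequence_level = ViewProjectionInfo("rectilinear", yaw=0.35, pitch=-0.10, hfov=1.57, vfov=1.05)

# ... or refreshed at each image (or group of images) to track a moving object or viewpoint.
per_image = {t: ViewProjectionInfo("rectilinear", yaw=0.35 + 0.01 * t, pitch=-0.10, hfov=1.57, vfov=1.05)
             for t in range(30)}
```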
  • FIG. 3 illustrates steps of the method of decoding according to one particular embodiment of the invention.
  • the scalable binary stream representative of the 360° video is demultiplexed during a step 30 .
  • the encoded data of the base layers BL[ 1 ] and BL[ 2 ] in the example described herein are sent to a decoder to be decoded (steps 31 , 33 respectively).
  • the rebuilt images of the base layers are projected (steps 32 , 34 respectively) similarly to the encoding method on a reference image I ref to serve as a prediction for the enhancement layer EL.
  • the geometrical projection is carried out from projection data provided in the binary stream (type of projection, information on projection and on location of the view).
  • the encoded data of the enhancement layer EL are decoded (step 35 ) and the images of the 360° video are rebuilt in using the reference images I ref generated from geometrical projections made on the base layers, as specified here above.
  • the scalable binary stream representative of the 360° video thus makes it possible to address any type of receiver.
  • Such a scalable stream also makes each receiver capable of decoding and rebuilding a 2D video or a 360° video according to its capacities.
  • classic receivers such as PCs, television sets, tablets, etc. will decode only one base layer and render a sequence of 2D images
  • receivers adapted to 360° video such as virtual reality helmets, smartphones, etc. will decode the base layers and the enhancement layers and render 360° video.
  • FIG. 4 A provides a more detailed illustration of the steps encoding a base layer and an enhancement layer of the method described here above according to one particular embodiment of the invention.
  • the case described here is that of an enhancement layer encoding a 360° omnidirectional video by prediction from a base layer encoding a view k.
  • Each image of the view k to be encoded is sub-divided into blocks of pixels and each block of pixels is then encoded classically by spatial or temporal prediction in using a previously built reference image of the sequence of images of the view k.
  • a prediction module P determines a prediction for a current block B k c .
  • the current block B k c is encoded by spatial prediction relative to other blocks of the same image or else by temporal prediction relative to a block of a reference image of the view k previously encoded and rebuilt and stored in the memory MEM b .
  • the prediction residue is obtained by computing the difference between the current block B k c and the prediction determined by the prediction module P.
  • This prediction residue is then transformed by a transformation module T implementing for example a DCT (discrete cosine transform).
  • the transformed coefficients of the residue block are then quantified by a quantification module Q and then encoded by the entropic encoding module C to form the encoded data of the base layer BL[k].
  • the prediction residue is rebuilt, via an inverse quantification performed by the module Q⁻¹ and an inverse transform performed by the module T⁻¹, and added to the prediction determined by the prediction module P to rebuild the current block.
  • the rebuilt current block is then stored in order to rebuild the current image and so that this rebuilt current image can serve as a reference during the encoding of following images of the view k.
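The T, Q, Q⁻¹ and T⁻¹ modules of this encoding loop can be sketched as follows with a toy floating-point DCT and uniform quantization; real codecs use integer transforms and more elaborate quantization, so this is only an illustration of the principle.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis, applied separably to the rows and columns of a block."""
    j = np.arange(n)                       # sample index
    k = j.reshape(-1, 1)                   # frequency index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def encode_block(block, prediction, qstep=16):
    """Prediction residue -> transform (module T) -> quantization (module Q)."""
    c = dct_matrix(block.shape[0])
    residual = block.astype(np.float64) - prediction
    coeffs = c @ residual @ c.T
    return np.round(coeffs / qstep).astype(np.int32)   # levels passed to the entropic encoder C

def rebuild_block(levels, prediction, qstep=16):
    """Inverse quantization (Q^-1) and inverse transform (T^-1), then addition of the prediction."""
    c = dct_matrix(levels.shape[0])
    residual = c.T @ (levels.astype(np.float64) * qstep) @ c
    return residual + prediction
```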
  • a projection module PROJ carries out a geometrical projection of the rebuilt image in the reference image I ref of the 360° video as illustrated in FIG. 2 B and according to the geometrical transformation described here above.
  • the reference image I ref obtained by projection of the rebuilt image of the base layer is stored in the memory of the enhancement layer MEM e .
  • the 360° omnidirectional video is encoded image by image and block by block.
  • Each block of pixels is encoded classically by spatial or temporal prediction in using a reference image previously rebuilt and stored in the memory MEM e .
  • a prediction module P determines a prediction for a current block B e c of a current image of the 360° omnidirectional video.
  • the current block B e c is encoded by spatial prediction relative to other blocks of the same image or else by temporal prediction relative to a block of a previously encoded and rebuilt reference image of the 360° video, stored in the memory MEM e .
  • the current block B e c can also be encoded by interlayer prediction relative to a block co-localized in the reference image I ref obtained from the base layer.
  • such an encoding mode is signaled in the encoded data EL of the enhancement layer by an Inter encoding mode indicating a temporal encoding of the block, a zero motion vector and a reference index indicating that the reference image used from the memory MEM e is the image I ref .
  • These pieces of information are encoded by an entropic encoder C.
  • Such a particular embodiment of the invention enables the reutilization of the existing syntax of the temporal encoding modes of the existing standards. Other types of signaling are of course possible.
  • the prediction mode used to encode a current block B e c is for example selected from among the possible prediction modes as the one that minimizes a bit rate/distortion criterion.
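The bit rate/distortion criterion mentioned above can be sketched as the classic Lagrangian cost J = D + λ·R applied to each candidate prediction; the following helper is an assumed illustration of the selection principle, not the actual decision process of the encoder.

```python
import numpy as np

def choose_prediction_mode(original_block, candidates, lagrange_multiplier):
    """Select, among candidate predictions, the one minimising J = D + lambda * R.
    Each candidate is a tuple (mode_name, predicted_block, estimated_rate_in_bits)."""
    best_cost, best_mode, best_pred = None, None, None
    for mode, pred, rate in candidates:
        distortion = float(np.sum((original_block.astype(np.float64) - pred) ** 2))  # SSE
        cost = distortion + lagrange_multiplier * rate
        if best_cost is None or cost < best_cost:
            best_cost, best_mode, best_pred = cost, mode, pred
    return best_mode, best_pred

# Typical candidate list for an enhancement-layer block: spatial (intra) prediction,
# temporal prediction from a previously rebuilt 360 image, and inter-layer prediction
# taken as the co-located block of I_ref (signalled as an Inter mode with a zero motion
# vector and a reference index pointing at I_ref in the memory MEM_e).
```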
  • a prediction residue is obtained by computing the difference between the current block B e c and the prediction determined by the prediction module P.
  • This prediction residue is then transformed by a transformation module T implementing for example a DCT (discrete cosine transform) type of transform.
  • the transformed coefficients of the residue block are then quantified by a quantification module Q and then encoded by the entropic encoding module C to form encoded data of the enhancement layer EL.
  • the prediction residue is rebuilt, via an inverse quantification performed by the module Q⁻¹ and an inverse transform performed by the module T⁻¹, and added to the prediction determined by the prediction module P to rebuild the current block.
  • the rebuilt current block is then stored in order to rebuild the current image and each rebuilt current image can serve as a reference during the encoding of following images of the 360° omnidirectional video.
  • the encoding has been described here in the case of a single view k encoded in a base layer.
  • the method can be easily transposed to the case of several encoded views in an equivalent number of base layers.
  • Each image rebuilt at a time instant t of a base layer is projected on the same reference image I ref of the 360° video to encode an image of the 360° video at the instant t.
  • FIG. 4 B presents the simplified structure of an encoding device COD adapted to implementing the encoding method according to any one of the particular embodiments of the invention described here above.
  • Such an encoding device comprises a memory MEM 4 , a processing unit UT 4 , equipped for example with a processor PROC 4 .
  • the encoding method is implemented by a computer program PG 4 stored in a memory MEM 4 and managing the processing unit UT 4 .
  • the computer program PG 4 comprises instructions to implement the steps of the encoding method as described here above when the program is executed by the processor PROC 4 .
  • the code instructions of the computer program PG 4 are for example loaded into a memory (not shown) and then executed by the processor PROC 4 .
  • the processor PROC 4 of the processing unit UT 4 implements especially the steps of the encoding method described with reference to FIG. 1 A or 4 A according to the instructions of the computer program PG 4 .
  • the encoding method is implemented by functional modules (P, T, Q⁻¹, T⁻¹, C, PROJ).
  • the processing unit UT 4 cooperates with the different functional modules and the memory MEM 4 in order to implement the steps of the encoding method.
  • the memory MEM 4 especially includes the memories MEM b , MEM e .
  • a functional module can include a processor, a memory and program code instructions to implement the function corresponding to the module when the code instructions are executed by the processor.
  • a functional module can be implemented by any type of adapted encoding circuit such as, for example and non-exhaustively, microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate array (FPGA) circuits or wired logic units.
  • FIG. 5 A provides a more detailed illustration of the steps for decoding a base layer and an enhancement layer of the method described here above according to one particular embodiment of the invention.
  • we describe here the case of the decoding of an enhancement layer EL encoding a 360° omnidirectional video by prediction from a base layer BL[k] encoding a view k.
  • the view k and the 360° omnidirectional video are decoded image by image and block by block.
  • the data of the base layer BL[k] are decoded by an entropic decoding module D. Then, for a current block of a current image to be rebuilt, a prediction residue is rebuilt via an inverse quantification of the coefficients decoded entropically by an inverse quantification module Q⁻¹ and an inverse transform performed by an inverse transform module T⁻¹.
  • a prediction module P determines a prediction for the current block on the basis of the signaling data decoded by the entropic decoding module D. The prediction is added to the rebuilt prediction residue to rebuild the current block.
  • the rebuilt current block is then stored in order to rebuild the current image; this rebuilt current image is stored in the memory of reference images of the base layer MEM b so that it can serve as a reference during the decoding of following images of the view k.
  • a projection module PROJ makes a geometrical projection of the rebuilt image in the reference image I ref of the 360° omnidirectional video, as illustrated in FIG. 2 B and according to the geometrical transformation described here above.
  • the reference image I ref obtained by projection of the rebuilt image of the base layer is stored in the memory of reference images of the enhancement layer MEM e .
  • the data of the enhancement layer EL are decoded by an entropic decoding module D. Then, for a current block of a current image to be rebuilt, a prediction residue is rebuilt via an inverse quantification of the entropically decoded coefficients implemented by an inverse quantification module Q⁻¹ and an inverse transform implemented by an inverse transformation module T⁻¹.
  • a prediction module determines a prediction for the current block from the signaling data decoded by the entropic decoding module D.
  • the decoded syntax data indicate that the current block B e c is encoded by inter-layer prediction relative to a block co-localized in the reference image I ref obtained from the base layer.
  • the prediction module therefore determines that the prediction corresponds to the block co-located with the current block B e c in the reference image I ref .
  • the prediction is added to the rebuilt prediction residue to rebuild the current block.
  • the rebuilt current block is then stored in order to rebuild the current image of the enhancement layer.
  • This rebuilt image is stored in the reference image memory of the enhancement layer MEM e to serve as a reference during the decoding of following images of the 360° video.
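The inter-layer case on the decoder side can be sketched as follows: the predictor is simply the block co-located with the current block in I_ref (zero motion vector), to which the rebuilt prediction residue is added. Names are illustrative assumptions, not decoder syntax.

```python
import numpy as np

def decode_interlayer_block(i_ref, rebuilt_residual, block_y, block_x):
    """Rebuild a block signalled with inter-layer prediction: the prediction is the
    co-located block of the reference image I_ref obtained from the base layer, and the
    prediction residue has already been rebuilt by entropy decoding, Q^-1 and T^-1."""
    h, w = rebuilt_residual.shape
    prediction = i_ref[block_y:block_y + h, block_x:block_x + w].astype(np.float64)
    return prediction + rebuilt_residual
```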
  • FIG. 5 B presents the simplified structure of a decoding device DEC adapted to implementing the decoding method according to any one of the particular embodiments of the invention described here above.
  • Such a decoding device comprises a memory MEM 5 , a processing unit UT 5 , equipped for example with a processor PROC 5 .
  • the decoding method is implemented by a computer program PG 5 stored in a memory MEM 5 and driving the processing unit UT 5 .
  • the computer program PG 5 comprises instructions to implement the steps of the decoding method as described here above when the program is executed by the processor PROC 5 .
  • the code instructions of the computer program PG 5 are for example loaded into a memory (not shown) and then executed by the processor PROC 5 .
  • the processor PROC 5 of the processing unit UT 5 especially implements the steps of the decoding method described in relation to FIG. 3 or 5 A , according to the instructions of the computer program PG 5 .
  • the decoding method is implemented by functional modules (P, Q⁻¹, T⁻¹, D, PROJ).
  • the processing unit UT 5 cooperates with the different functional modules and the memory MEM 5 in order to implement the steps of the decoding method.
  • the memory MEM 5 can especially include the memories MEM b , MEM e .
  • a functional module can include a processor, a memory and program code instructions to implement the function corresponding to the module when the code instructions are executed by the processor.
  • a functional module can be implemented by any type of adapted encoding circuit such as, for example and non-exhaustively, microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate array (FPGA) circuits or wired logic units.
  • the blocks of an image of the enhancement layer are encoded by groups of blocks.
  • a group of blocks is also called a tile.
  • Each group of blocks, i.e. each tile, is encoded independently of the other tiles.
  • Each tile can then be decoded independently of the other tiles.
  • Such tiles (TE 0 -TE 11 ) are illustrated in FIG. 6 A , which represents an image of the 360° omnidirectional video at a time instant; 12 tiles are defined and entirely cover the image.
  • independent encoding of tiles is understood here to mean an encoding of the blocks of a tile that does not use any spatial prediction from a block of another tile of the image, nor temporal prediction from a block of a tile of the reference image not co-localized with the current tile.
  • Each tile is encoded by temporal prediction or inter-layer prediction on the basis of one or more of the base layers as illustrated in FIGS. 6 A and 6 B .
  • the tiles TE 4 and TE 7 are encoded by inter-layer prediction relative to the image of the view 1 projected in the reference image I ref , and the tiles TE 3 and TE 6 are encoded by inter-layer prediction relative to the image of the view 2 projected in the reference image I ref .
  • a receiver adapted to decoding and rendering a 360° video decodes only the tiles necessary for the current zone of the 360° image viewed by a user. Indeed, during the rendering of a 360° video, a user cannot, at an instant t, view the entire image of the video, i.e. he cannot look in all directions at the same time and can, at an instant t, view only the zone of the image facing his gaze.
  • such a viewing zone is represented by the zone ZV of FIG. 6 A .
  • only the base layers that have served for the prediction of the zone viewed by the user are decoded at the step 31 .
  • the base layer corresponding to the view 1 is decoded during the step 31 and only the tiles TE 4 , TE 5 , TE 7 and TE 8 are decoded during the step 35 of FIG. 3 on the basis of the enhancement layer EL.
  • at the step 35 , only the part of the image of the enhancement layer corresponding to the tiles TE 4 , TE 5 , TE 7 and TE 8 is rebuilt.
  • a tile of the enhancement layer EL can be encoded by prediction from several base layers, depending for example on the bit rate/distortion optimization choices made during the encoding of the blocks of the enhancement layer: a block of a tile may be encoded by prediction relative to a first base layer while another block of the same tile is encoded by prediction relative to another base layer distinct from the first. In this case, all the base layers used for the prediction of the blocks of a tile of the enhancement layer must be decoded.
  • the stream of encoded data comprises, for each tile of the enhancement layer, a piece of information identifying the base layers used to predict the tile.
  • syntax elements indicating the number of base layers used and an identifier of each base layer used are encoded in the data stream.
  • Such syntax elements are decoded for each tile of the enhancement layer to be decoded during the step 35 for decoding the enhancement layer.
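A sketch of how such per-tile syntax elements could drive the decoding is given below; the tile identifiers, the dependency lists and the container names are assumptions chosen to mirror the example of FIGS. 6 A-6 B, not actual stream syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TileInfo:
    """Hypothetical per-tile syntax: the base layers used to predict the blocks of the tile."""
    tile_id: int
    base_layer_ids: List[int] = field(default_factory=list)

def select_for_viewport(tiles, visible_tile_ids):
    """Return the enhancement-layer tiles covering the viewed zone and the base layers
    that must be decoded to predict them; everything else can be skipped."""
    needed_tiles = [t for t in tiles if t.tile_id in visible_tile_ids]
    needed_layers = sorted({layer for t in needed_tiles for layer in t.base_layer_ids})
    return needed_tiles, needed_layers

# Example loosely following FIG. 6A: the viewing zone ZV covers tiles TE4, TE5, TE7 and TE8,
# all predicted from the base layer of view 1 (dependencies assumed for the illustration).
tiles = [TileInfo(3, [2]), TileInfo(4, [1]), TileInfo(5, [1]),
         TileInfo(6, [2]), TileInfo(7, [1]), TileInfo(8, [1])]
_, layers = select_for_viewport(tiles, visible_tile_ids={4, 5, 7, 8})
print(layers)   # -> [1]: only the base layer of view 1 needs to be decoded
```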
  • the particular embodiment described here limits the use of the resources of the decoder and avoids the decoding of data that is unnecessary because it is not viewed by the user.
  • Such an embodiment can be implemented by any one of the encoding devices and any one of the decoding devices described here above.
  • such a reference image has large undefined zones, for example set to zero by default, which then use memory resources unnecessarily.
  • the rebuilt images of the base layers projected on the enhancement layer can be stored in reference sub-images.
  • a sub-image can be used for each base layer.
  • Each sub-image is stored in association with shift information enabling the encoder and/or the decoder to determine the location of the sub-image in the enhancement image.
  • Such a variant can be implemented independently of the decoder and/or of the encoder.
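This variant can be sketched as follows: each projected base layer is kept as a compact reference sub-image together with the shift locating it in the enhancement image, instead of a mostly empty full-size reference image. The class and function names are assumptions made for the example.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ReferenceSubImage:
    """Projected samples of one base layer plus the shift giving their location
    in the enhancement image, so that undefined zones need not be stored."""
    samples: np.ndarray    # only the region actually covered by the projection
    offset_x: int          # horizontal shift of the sub-image in the enhancement image
    offset_y: int          # vertical shift of the sub-image in the enhancement image

def interlayer_predictor(sub: ReferenceSubImage, x, y, block_h, block_w):
    """Fetch the co-located predictor of an enhancement-layer block at (x, y),
    translating the coordinates into the sub-image with the stored shift."""
    sx, sy = x - sub.offset_x, y - sub.offset_y
    return sub.samples[sy:sy + block_h, sx:sx + block_w]
```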

Abstract

A method for encoding a data stream representing an omnidirectional video. The method includes: encoding, in the stream, at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video; and encoding, in the stream, one enhancement layer representative of the omnidirectional video. The enhancement layer is encoded by prediction relative to the base layer. The prediction of the enhancement layer relative to the base layer includes: generating a reference sub-image obtained by geometrical projection on the reference sub-image of an image, called a base image, rebuilt from the base layer, and storing the reference sub-image in association with shift information enabling an encoder to determine the location of the reference sub-image in the enhancement image in a non-transitory computer-readable memory of reference sub-images of the enhancement layer.

Description

1. CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a Continuation application of U.S. Ser. No. 16/756,755, filed on Apr. 16, 2020, which is a Section 371 National Stage application of International Application No. PCT/EP2018/077922, filed Oct. 12, 2018, the contents of which are hereby incorporated by reference in their entireties.
2. FIELD OF THE INVENTION
The invention is situated in the field of video compression, and more particularly that of techniques for encoding and decoding immersive or omnidirectional (e.g. 180°, 360° in 2D or 3D) video.
3. PRIOR ART
An omnidirectional video can be used to represent a scene from a central point and to do so in every direction. The term “360° video content” is used when the totality of the field is used. A subset of the field can also be captured, for example covering only 180°. The content can be captured monoscopically (2D) or stereoscopically (3D). This type of content can be generated by assembling sequences of images captured by different cameras or else it can be generated synthetically by computer (e.g. in VR video games). The images of such a video content enable the rendering, via an appropriate device, of the video along any direction whatsoever. A user can control the direction in which the captured scene is displayed and can navigate continuously in every possible direction.
Such 360° video contents can for example be rendered by using a virtual reality helmet offering the user an impression of immersion in the scene captured by the 360° video content.
Such 360° video contents necessitate reception devices adapted to this type of content (a virtual reality helmet for example) in order to offer the functions of immersion and control of the displayed view by the user.
However most currently used video content receivers are not compatible with this type of 360° video content and enable the rendering of only classic 2D or 3D video contents. Indeed, the rendering of a 360° video content necessitates the application of geometrical transforms to the images of the video in order to render the desired viewing direction.
Thus the broadcasting of 360° video contents is not backwards-compatible with the existing fleet of video receivers and is limited solely to receivers adapted to contents of this type.
However it is observed that the content captured specifically for a 360° video broadcast can have been already captured for a 2D or 3D video broadcast. In this case, it is the totality of the 360° content projected on a plane that is broadcast.
In addition, the simultaneous broadcasting of a same content captured in different formats (2D or 3D and 360°) to address the different video receivers is costly in terms of bandwidth, since it is necessary to send as many video streams as there are possible formats, namely 2D, 3D, 360° views of the same captured scene.
There is therefore a need to optimize the encoding and the broadcasting of omnidirectional video contents, representative of a part (180°) of a scene or the totality (360°) of a scene and to do so monoscopically (2D) or stereoscopically (3D).
There are techniques of video encoding by layers, known as scalable video encoding, used to encode a 2D video stream in several successive layers of refinements offering different levels of rebuilding of the 2D video. For example, spatial scalability enables the encoding of a video signal in several layers of increasing spatial resolution. Scalability in terms of PSNR (Peak Signal to Noise Ratio) enables the encoding of a video signal for a fixed spatial resolution in several layers of rising quality. Scalability in the colorimetric space enables the encoding of a video signal in several layers represented in increasingly wider colorimetric spaces.
However, none of the existing techniques enables the generation of a video data stream representative of a scene that can be decoded by a classic 2D or 3D video decoder as well as by a 360° video decoder.
The US document 2016/156917 describes a method for the scalable encoding of a video that can be a multiview video and wherein each view of the multiview video is encoded in a layer of the stream and predicted by another view of the multiview video.
4. SUMMARY OF THE INVENTION
The invention improves on the prior art. To this effect, it concerns a method for encoding a data stream representative of an omnidirectional video, comprising:
    • the encoding in said stream of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video (360°, 180° etc),
    • the encoding in said stream of at least one enhancement layer representative of the omnidirectional video, the at least one enhancement layer being encoded by prediction relative to the at least one base layer.
The invention thus reduces the cost of transmission of the video streams when the video contents must be transmitted in 2D view as well as in 360° view or in 3D view and in 3D-360° view. Thus, a classic 2D or 3D video decoder will decode only the base layer or one of the base layers to rebuild a 2D or 3D video of the scene and a compatible 360° decoder will decode the base layer or layers and at least one enhancement layer to rebuild the 360° video. The use of a prediction of the at least one base layer to encode the enhancement layer thus reduces the cost of encoding the enhancement layer.
Correlatively, the invention also concerns a method for decoding a data stream representative of an omnidirectional video, comprising:
    • the decoding, from said stream, of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video,
    • the decoding, from said stream, of at least one enhancement layer representative of the omnidirectional video, the at least one enhancement layer being decoded by prediction relative to the at least one base layer.
The term “omnidirectional video” herein is understood to mean equally well a video of a scene, for which the totality of the field (360°) is captured and a video of a scene for which a sub-part of the 360° field is captured, for example 180°, 160°, 255.6°, or the like. The omnidirectional video is therefore representative of a scene captured on at least one continuous part of the 360° field.
According to one particular embodiment of the invention, the prediction of the enhancement layer relative to the at least one base layer comprises, in order to encode or rebuild at least one image of the enhancement layer:
    • the generating of a reference image obtained by geometrical projection on the reference image of an image, called a base image, rebuilt from the at least one base layer,
    • the storing of said reference image in a memory of reference images of the enhancement layer.
Advantageously, the prediction in the enhancement layer is carried out by the addition, during the encoding or decoding of an image of the enhancement layer, of a reference image in which the images rebuilt from base layers are projected. Thus, a new reference image is added into the memory of reference images of the enhancement layer. This new reference image is generated by geometrical projection of all the base images rebuilt from the base layers at a time instant.
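By way of illustration only, the following Python sketch shows one possible way of managing such a memory of reference images on the enhancement layer side: the projected image Iref is exposed alongside the temporal reference images so that it can be addressed through a reference index. The class and method names are assumptions made for the example and do not correspond to a defined syntax.

import numpy as np

class EnhancementReferenceMemory:
    """Illustrative reference-image memory of the enhancement layer (assumption, not a normative structure)."""
    def __init__(self, max_temporal_refs=4):
        self.temporal_refs = []      # previously rebuilt images of the enhancement layer
        self.interlayer_ref = None   # Iref, the base image(s) projected onto the 360° format
        self.max_temporal_refs = max_temporal_refs

    def add_temporal_reference(self, rebuilt_image):
        # keep only the most recent rebuilt enhancement images
        self.temporal_refs.insert(0, rebuilt_image)
        del self.temporal_refs[self.max_temporal_refs:]

    def set_interlayer_reference(self, iref):
        # Iref is regenerated at each time instant from the rebuilt base image(s)
        self.interlayer_ref = iref

    def reference_list(self):
        # the inter-layer reference is appended to the temporal ones, so the
        # encoder or decoder can designate it with an ordinary reference index
        refs = list(self.temporal_refs)
        if self.interlayer_ref is not None:
            refs.append(self.interlayer_ref)
        return refs

# one new Iref is inserted for each image of the enhancement layer to be encoded or decoded
memory = EnhancementReferenceMemory()
memory.set_interlayer_reference(np.zeros((1920, 3840), dtype=np.uint8))
print(len(memory.reference_list()))   # 1: only the inter-layer reference so far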
According to another particular embodiment of the invention, the data stream comprises a piece of information representative of a type of geometrical projection used to represent the omnidirectional video.
According to another particular embodiment of the invention, the view represented by the 2D or 3D video is a view extracted from the omnidirectional video.
According to another particular embodiment of the invention, the data stream comprises a piece of information representative of a type of geometrical projection used to extract a view of the omnidirectional video and of its parameters of location.
According to one variant, such a piece of information representative of parameters of projection and of location of said base image is encoded in the data stream in each image of the 360° video. Advantageously, this variant is used to take account of a shift in the scene of a view serving as a prediction for the enhancement layer. For example, the images of the video of the base layer can correspond to images captured while moving in the scene, for example to track an object in motion in the scene. For example, the view can be captured by a camera in motion or successively by several cameras located at different viewpoints in the scene, to track a ball or a player during a football match for example.
According to another particular embodiment of the invention, the data stream comprises at least two base layers, each base layer being representative of a 2D or 3D video, each base layer being respectively representative of a view of the scene, the at least two base layers being encoded independently of each other.
Thus, it is possible to have several independent base layers in the stream enabling several 2D or 3D views of the 360° video to be rebuilt independently.
According to another particular embodiment of the invention, an image of the enhancement layer is encoded by means of a group of tiles, each tile covering a region of the image of the enhancement layer, each region being distinct and separated from the other regions of the image of the enhancement layer, each tile being encoded by prediction relative to the at least one base layer. The decoding of the enhancement layer comprises the rebuilding of a part of the image of the enhancement layer, the rebuilding of said part of the image comprising the decoding of the tiles of the enhancement layer covering the part of the image of the enhancement layer to be rebuilt, and the decoding of the at least one base layer comprising the decoding of the base layers used to predict the tiles covering the part of the image of the enhancement layer to be rebuilt.
Such a particular embodiment of the invention enables the rebuilding of only one part of the omnidirectional image and not the entire image. Typically, only the part being viewed by the user is rebuilt. Thus, it is not necessary to decode all the base layers of the video stream or even send them to the receiver. Indeed, with a user being unable to simultaneously see the entire image of the omnidirectional video, it is possible to encode an omnidirectional image by a tile mechanism enabling the independent encoding of the regions of the omnidirectional image so as to then make it possible to decode only those regions of the omnidirectional image that are visible to the user.
Through the particular embodiment of the invention, the independent encoding of the base layers thus makes it possible to rebuild the tiles of the omnidirectional image separately and to limit the complexity when decoding by avoiding the decoding of unnecessary base layers.
Advantageously, for each tile of the enhancement layer to be decoded, a piece of information identifying the at least one base layer used to predict the tile is decoded from the data stream.
The invention also relates to a device for encoding a data stream representative of an omnidirectional video. The encoding device comprises means of encoding in said stream of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video, and means of encoding, in said stream, at least one enhancement layer representative of the omnidirectional video, said means of encoding the enhancement layer comprising means of prediction of the enhancement layer relative to the at least one base layer.
The invention also relates to a device for decoding a data stream representative of an omnidirectional video. The decoding device comprises means for the decoding, in said stream, of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video, and means of decoding, in said stream, at least one enhancement layer representative of the omnidirectional video, said means for decoding the enhancement layer comprising means of prediction of the enhancement layer relative to the at least one base layer.
The encoding device and decoding device respectively are especially adapted to implementing the method of encoding and decoding respectively described here above. The encoding device and decoding device respectively could of course comprise the different characteristics of the encoding method and decoding method respectively, according to the invention. Thus, the characteristics and advantages of this encoding and decoding device respectively are the same as those of the encoding and decoding method respectively and are not described in more ample detail.
According to one particular embodiment of the invention, the decoding device is comprised in a terminal.
The invention also relates to a signal representative of an omnidirectional video comprising encoded data of at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video and encoded data of at least one enhancement layer representative of the omnidirectional video, the at least one enhancement layer being encoded by prediction relative to the at least one base layer.
According to one particular embodiment of the invention, an image of the enhancement layer is encoded by means of a group of tiles, each tile covering a region of the image of the enhancement layer, each region being distinct and separated from the other regions of the image of the enhancement layer, each tile being encoded by prediction relative to the at least one base layer. According to one particular embodiment of the invention, the signal also comprises for each tile a piece of information identifying the at least one base layer used to predict the tile. Thus, only the base layers needed for decoding a tile to be decoded are decoded, thus optimizing the use of the resources of the decoder.
The invention also relates to a computer program comprising instructions to implement the method of encoding or the method of decoding according to any one of the particular embodiments described here above when said program is executed by a processor. Such a program can be downloaded from a communications network and/or recorded on a medium readable by computer. It can use any programming language whatsoever and be in the form of source code, object code or intermediate code between source code and object code, such as in a partially compiled form or in any other desirable form whatsoever.
According to yet another aspect, the invention concerns a recording support or medium or information support or medium readable by a computer, comprising instructions of a computer program such as is mentioned here above. The recording media mentioned here above can be any entity or device capable of storing the program. For example, the medium can comprise a storage means such as a read-only memory (ROM) type memory, for example a CD-ROM or a microelectronic circuit ROM, a flash memory mounted on a detachable storage medium, such as a USB stick, or again a magnetic mass memory of the hard-disk drive (HDD) or solid-state drive (SSD) type or a combination of memories working according to one or more data-recording technologies. Furthermore, the recording medium can correspond to a transmissible medium such as an electrical or optical signal that can be conveyed via an electrical or optical cable, by radio or by other means. In particular, the proposed computer program can be downloaded from an Internet type network.
As an alternative, the recording medium can correspond to an integrated circuit into which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
The encoding or decoding method according to the invention can therefore be implemented in various ways, especially in wired form or in software form.
5. LIST OF FIGURES
Other features and advantages of the invention shall appear more clearly from the following description of one particular embodiment, given by way of a simple illustrative and non-exhaustive example, and from the appended drawings, of which:
FIG. 1A illustrates the steps of the method of encoding according to one particular embodiment of the invention,
FIG. 1B illustrates an example of a signal generated according to the method of encoding implemented according to one particular embodiment of the invention,
FIG. 2A illustrates an image of a view of a scene captured by a 360° video encoded in a base layer,
FIG. 2B illustrates the image illustrated in FIG. 2A projected in the referential of an image of the 360° video,
FIG. 2C illustrates an image of the 360° video encoded in an enhancement layer,
FIGS. 2D and 2E each illustrate an image of two views of a scene captured by a 360° video and each encoded in a base layer,
FIG. 2F illustrates the images of two views illustrated in FIGS. 2D and 2E projected in the referential of an image of the 360° video,
FIG. 2G illustrates an image of the 360° video encoded in an enhancement layer,
FIG. 3 illustrates steps of the method of decoding according to one particular embodiment of the invention,
FIG. 4A illustrates an example of an encoder configured to implement the method of encoding according to one particular embodiment of the invention,
FIG. 4B illustrates a device adapted to implementing the method of encoding according to another particular embodiment of the invention,
FIG. 5A illustrates an example of a decoder configured to implement the method of decoding according to one particular embodiment of the invention,
FIG. 5B illustrates a device adapted to implementing the method of decoding according to another particular embodiment of the invention,
FIGS. 6A and 6B respectively illustrate an image of the 360° omnidirectional video encoded by independent tiles and a reference image generated from two views of two base layers and used to encode the image of FIG. 6A,
FIGS. 7A-C respectively illustrate a projection in a 2D plane of a 360° omnidirectional video with cubemap type projection, a 3D spherical representation in an XYZ referential of the 360° omnidirectional video and a view extracted from the 360° immersive content in a 2D plane according to a rectilinear projection,
FIG. 7D illustrates the relationship between different geometrical projections,
FIG. 8 illustrates the procedure for building the reference image.
The images of FIGS. 2A, C-E and G and of FIGS. 7A-B are extracted from 360° videos made available by LetInVR within the framework of the JVET (Joint Video Exploration Team) (JVET-D0179: Test Sequences for Virtual Reality Video Coding from LetinVR, 15-21 Oct. 2016).
6. DESCRIPTION OF ONE EMBODIMENT OF THE INVENTION 6.1 General Principle
The general principle of the invention is that of encoding a data stream scalably, thus making it possible to rebuild and render a 360° video when a receiver is adapted to receiving and rendering such a 360° video and rebuilding and rendering a 2D or 3D video when the receiver is adapted only to rendering a 2D or 3D video.
In order to reduce the cost of transmission of a stream comprising the 2D or 3D video as well as the 360° video, according to the invention, the 2D or 3D video is encoded in a base layer and the 360° video is encoded in an enhancement or improvement layer predicted from the base layer.
According to one particular embodiment of the invention, the stream can comprise several base layers each corresponding to a 2D or 3D video corresponding to a view of the scene. The enhancement layer is thus encoded by prediction on the basis of all or a part of the base layers comprised in the stream.
6.2 Examples of Implementation
FIG. 1A illustrates steps of the method of encoding according to one particular embodiment of the invention. According to this particular embodiment of the invention, a 360° video is encoded scalably by extracting views from a 360° video and by encoding each view in a base layer. The term “view” is understood here to mean a sequence of images acquired from a viewpoint of the scene captured by the 360° video. Such a sequence of images can be a sequence of monoscopic images in the case of a 360° video in 2D or a sequence of stereoscopic images in the case of a 360° video in 3D. In the case of a sequence of stereoscopic images, each image comprises a left-hand view and a right-hand view encoded jointly for example in the form of an image generated by means of left-hand and right-hand views placed side by side or one above the other. The encoder encoding such a sequence of stereoscopic images in a base layer or an enhancement layer will then encode each image comprising a left-hand view and a right-hand view as a classic sequence of 2D images.
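As a purely illustrative sketch of the frame packing mentioned above, the following Python function assembles a left-hand view and a right-hand view into a single image, side by side or one above the other, so that the pair can be encoded as a classic sequence of 2D images; the function name and layout flag are assumptions made for the example.

import numpy as np

def pack_stereo(left_view, right_view, layout="side_by_side"):
    # both views must share the same dimensions before packing
    if left_view.shape != right_view.shape:
        raise ValueError("left and right views must have the same dimensions")
    if layout == "side_by_side":
        return np.concatenate([left_view, right_view], axis=1)   # width is doubled
    if layout == "top_bottom":
        return np.concatenate([left_view, right_view], axis=0)   # height is doubled
    raise ValueError("unknown layout")

left = np.zeros((960, 1920), dtype=np.uint8)
right = np.ones((960, 1920), dtype=np.uint8)
packed = pack_stereo(left, right)   # a 960x3840 image encoded as an ordinary 2D image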
Here below, we describe an embodiment in which the omnidirectional video is a 360° video in 2D.
Here we describe an embodiment where two base layers are used to encode the enhancement layer. Generally, the method described here applies to the case where a number of views N, with N greater than or equal to 1, is used for the encoding of the enhancement layer.
The number of base layers is independent of the number of views used to generate the 360° video. The number of base layers encoded in the scalable data stream is for example determined during the production of the content or it can be determined by the encoder for purposes of optimizing the bit rate.
During the steps 10 and 11, a first and a second view are extracted from the 360° video. The views [1] and [2] are respectively encoded during an encoding step 12 for encoding a base layer BL[1] and an encoding step 13 for encoding a base layer BL[2].
In one particular embodiment described here, the base layers BL[1] and BL[2] are encoded independently of one another, i.e. there is no dependence of encoding (prediction, encoding context, etc.) between the encoding of the images of the base layer BL[1] and the encoding of the images of the base layer BL[2]. Each base layer BL[1] or BL[2] is decodable independently of the others. According to another particular embodiment, it is possible to encode the base layers BL[1] and BL[2] dependently, for example to gain in compression efficiency. However, this particular embodiment of the invention requires that the decoder should be capable of decoding both base layers to render a classic 2D video.
Each encoded/rebuilt image of the base layers BL[1] and BL[2] is then projected (steps 14 and 15 respectively) geometrically onto a same reference image Iref. The result of this is a partially filled reference image that contains the samples interpolated from the projected view or views of the base layer. The building of the reference image is described in greater detail with reference to FIG. 8.
FIGS. 2A-2C illustrate one embodiment in which a single base layer is used. According to this embodiment, the images of the 360° video have a spatial resolution of 3840×1920 pixels and are generated by an equirectangular projection and the 360° image sequence has a frequency of 30 images per second. FIG. 2C illustrates an image of the 360° video at a time instant t encoded in the enhancement layer.
An image at the time instant t of the view extracted from the 360° video is illustrated in FIG. 2A. Such a view is for example extracted from the 360° video by means of the following coordinates: yaw=20°, pitch=5°, horizontal FOV (field of view)=110° and vertical FOV=80°; the spatial resolution of the images of the extracted view is 1920×960 pixels and the time frequency is 30 images per second. The yaw and pitch coordinates correspond to the coordinates of the center (P in FIG. 2B) of the geometrical projection of an image of the view of the base layer; they correspond respectively to the angle θ and the angle φ of the point P in the pivot format illustrated in FIG. 7B. The horizontal FOV and vertical FOV parameters correspond respectively to the horizontal and vertical sizes of an image of the extracted view centered on the point P in the pivot format illustrated in FIG. 7B; this image of the extracted view is represented in FIG. 7C.
FIG. 2B illustrates the reference image Iref used to predict the image of the 360° video at the instant t after equirectangular geometrical projection of the image of the base layer illustrated in FIG. 2A.
FIGS. 2D-2G illustrate an embodiment in which two base layers are used. According to this embodiment, the images of the 360° video have a spatial resolution of 3840×1920 pixels and are generated by an equirectangular projection and the 360° image sequence has a frequency of 30 images per second. FIG. 2G illustrates an image of the 360° video at a time instant t encoded in the enhancement layer.
An image at the time instant t of a first view extracted from the 360° video is illustrated in FIG. 2D. This first view is for example extracted from the 360° video by means of the following coordinates: yaw=20°, pitch=5°, horizontal FOV (field of view)=110° and vertical FOV=80°; the spatial resolution of the images of the first extracted view is 1920×960 pixels and the time frequency is 30 images per second.
An image at the time instant t of a second view extracted from the 360° video is illustrated in FIG. 2E. This second view is for example extracted from the 360° video using the coordinates: yaw=20°, pitch=5°, horizontal FOV (field of view)=110° and vertical FOV=80°; the spatial resolution of the images of the second extracted view is 1920×960 pixels and the time frequency is 30 images per second.
FIG. 2F illustrates the reference image Iref used to predict the image of the 360° video at the instant t after equirectangular geometrical projection of the images of the first view and of the second view illustrated respectively in FIGS. 2D and 2E.
In order to project the rebuilt images of the base layers in the reference image, the following steps of geometrical transformation are applied.
The representation of a 360° omnidirectional video in a plane is defined by a geometrical transformation characterizing the way in which a 360° omnidirectional content represented in a sphere is adapted to a representation in a plane. The spherical representation of the data is used as a pivot format; it makes it possible to represent the points captured by the omnidirectional video device. Such an XYZ 3D spherical representation is illustrated in FIG. 7B.
For example, the 360° video is represented by means of an equirectangular geometrical transformation that can be seen as a projection of the points on a cylinder surrounding the sphere. Other geometrical transformations are of course possible, for example a cubemap projection corresponding to a projection of points on a cube enclosing the sphere, the faces of the cube then being unfolded onto a plane to form the 2D image. Such a cubemap projection is for example illustrated in FIG. 7A.
FIG. 7D illustrates a more detailed view of the relationship between the different formats mentioned here above. The passage from an equirectangular format A to a cubemap format B is done through a pivot format C characterized by a representation of the samples in a spherical XYZ system illustrated in FIG. 7B. In the same way, the extraction of a view D from the format A is done through this pivot format C. The extraction of a view of the immersive content is characterized by a geometrical transformation, for example by making a rectilinear projection of the points of the sphere along a plane illustrated by the plane ABCD in FIG. 7C. This projection is characterized by parameters of location such as yaw, pitch and the horizontal and vertical field of view (FOV). The mathematical properties of these different geometrical transformations are documented in the document JVET-G1003 (“Algorithm descriptions of projection format conversion and video quality metrics in 360Lib Version 4”, Y. Ye, E. Alshina, J. Boyce, JVET of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG 11, 7th meeting, Turin, IT, 13-21 Jul. 2017).
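To give a concrete, non-normative illustration of the pivot format, the following Python sketch converts an equirectangular sample position (u, v) to a point (X, Y, Z) of the unit sphere and back. The axis and angle conventions used here are assumptions chosen for the example; the exact mathematics used by the method are those documented in JVET-G1003.

import math

def equirect_to_sphere(u, v, width, height):
    # longitude/latitude of the sample centre, assuming a full 360°x180° equirectangular image
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi       # in [-pi, pi]
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi      # in [-pi/2, pi/2]
    x = math.cos(lat) * math.cos(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.sin(lon)
    return x, y, z

def sphere_to_equirect(x, y, z, width, height):
    lon = math.atan2(z, x)
    lat = math.asin(max(-1.0, min(1.0, y)))
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return u, v

# round trip for one sample of a 3840x1920 equirectangular image
x, y, z = equirect_to_sphere(1000, 500, 3840, 1920)
print(sphere_to_equirect(x, y, z, 3840, 1920))   # approximately (1000.0, 500.0)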
FIG. 8 illustrates the different steps enabling the passage between two formats. A table of correspondence is first of all built at E80 in order to place the position of each sample in the destination image (Iref), in correspondence with its corresponding position in the source format (corresponding to the rebuilt images of the base layers BL[1] and BL[2] in the example described in FIG. 1A). For each position (u,v) in the destination image, the following steps apply:
    • At E81: passage of the coordinates (u,v) of the destination image into the pivot system XYZ.
    • At E82: projection of the XYZ coordinates of the pivot system in the source image (u′,v′).
    • At E83: updating the table of correspondence relating the positions in the destination format and in the source format.
Once the table of correspondence is built, the value of each pixel (u,v) in the destination image (Iref) is interpolated relative to the value of the corresponding position (u′,v′) in the source image during a step E84 (corresponding to the rebuilt images of the base layers BL[1] and BL[2] in the example described with reference to FIG. 1A). The interpolation can be performed at (u′,v′) before assigning the value, by applying a Lanczos-type interpolation filter to the decoded image of the base layer at the position placed in correspondence.
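A minimal Python sketch of this procedure is given below. It builds the table of correspondence (steps E80 to E83) from a mapping function that is assumed to chain the passage to the pivot system and the projection into the source image, and then fills the reference image (step E84); for brevity, the sample value is taken by nearest-neighbour rounding instead of the Lanczos filter mentioned above, and all data structures are assumptions made for the example.

import numpy as np

def build_correspondence(dst_shape, dst_to_src):
    # dst_to_src(u, v) is assumed to perform steps E81 and E82 (destination -> pivot XYZ -> source)
    # and to return None when the destination sample has no correspondence in the source view
    table = {}
    height, width = dst_shape
    for v in range(height):                 # E80: scan every destination position
        for u in range(width):
            src = dst_to_src(u, v)          # E81 + E82
            if src is not None:
                table[(u, v)] = src         # E83: update the table of correspondence
    return table

def fill_reference(dst_shape, source, table):
    iref = np.zeros(dst_shape, dtype=source.dtype)    # non-defined zones stay at zero
    src_h, src_w = source.shape
    for (u, v), (us, vs) in table.items():            # E84: interpolation of each mapped pixel
        iu, iv = int(round(us)), int(round(vs))       # nearest neighbour instead of Lanczos, for brevity
        if 0 <= iu < src_w and 0 <= iv < src_h:
            iref[v, u] = source[iv, iu]
    return iref

# toy usage with an identity mapping, purely to show the mechanics
src = np.arange(16, dtype=np.uint8).reshape(4, 4)
table = build_correspondence((4, 4), lambda u, v: (u, v))
print(fill_reference((4, 4), src, table))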
At a step 16 of the encoding method illustrated in FIG. 1A, the 360° video is encoded in an enhancement layer EL by prediction relative to the base layers BL[1] and BL[2], using the reference image Iref generated from the base layers.
At a step 17, the data encoded during the steps 12, 13 and 16 are multiplexed in order to form a binary stream comprising the encoded data of the base layers BL[1] and BL[2] and the enhancement layer EL. The projection data used to build the reference image Iref are also encoded in the binary stream and transmitted to the decoder.
The encoding steps 12, 13 and 16 can advantageously be implemented by standard video encoders, for example by an encoder conforming to SHVC, the scalable extension of the HEVC standard.
FIG. 1B illustrates an example of a binary stream generated according to the method described with reference to FIG. 1A. In this example, the binary stream comprises:
    • the encoded data of the base layers BL[1] and BL[2],
    • a piece of information PRJ representative of the type of geometrical projection used to represent the omnidirectional content, for example a value indicating an equirectangular projection,
    • a piece of information PRJ_B1, PRJ_B2 respectively, representative of the projection used to extract the view and its location parameters in the 360° video from the view of the base layer BL[1] and BL[2] respectively.
The information representative of the projection and location parameters of a view of the base layer can for example be encoded in the form of coordinates of the view (yaw, pitch, HFOV, VFOV) matched with the type of projection (rectilinear projection) used to extract the view.
The information representative of the parameters of projection and location of a view of a base layer can be encoded only once in the binary stream. It is thus valid for the entire image sequence.
The information representative of the parameters of projection and location of a view of a base layer can be encoded several times in the binary stream, for example at each image, or at each group of images. It is thus valid only for one image or one group of images.
When the information representative of the parameters of projection and location of a view is encoded at each image, such a variant procures the advantage wherein the view extracted at each instant in time of the sequence can correspond to a view of an object that is in motion in this scene and is tracked in the course of time.
When the information representative of parameters of projection and location of a view is encoded for a group of images, such a variant procures the advantage wherein the video sequence encoded in a base layer can change its viewpoint in the course of time thus making it possible to track an event via different viewpoints in the course of time.
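By way of illustration, the projection information described above can be pictured as the following structures; the field names are assumptions made for the example and do not constitute a normative syntax. Depending on the variant, the per-view parameters are carried once per sequence, once per group of images or once per image.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ViewProjectionInfo:              # plays the role of PRJ_B[k]
    projection: str                    # e.g. "rectilinear", the projection used to extract the view
    yaw_deg: float                     # location of the centre of the view in the 360° content
    pitch_deg: float
    hfov_deg: float                    # horizontal field of view
    vfov_deg: float                    # vertical field of view

@dataclass
class StreamProjectionInfo:            # plays the role of PRJ
    omni_projection: str               # e.g. "equirectangular"
    base_views: List[ViewProjectionInfo] = field(default_factory=list)

prj = StreamProjectionInfo(
    omni_projection="equirectangular",
    base_views=[ViewProjectionInfo("rectilinear", yaw_deg=20.0, pitch_deg=5.0,
                                   hfov_deg=110.0, vfov_deg=80.0)],
)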
FIG. 3 illustrates steps of the method of decoding according to one particular embodiment of the invention.
According to this particular embodiment of the invention, the scalable binary stream representative of the 360° video is demultiplexed during a step 30. The encoded data of the base layers BL[1] and BL[2] in the example described herein are sent to a decoder to be decoded (steps 31, 33 respectively).
Then, the rebuilt images of the base layers are projected (steps 32, 34 respectively), similarly to the encoding method, on a reference image Iref to serve as a prediction for the enhancement layer EL. The geometrical projection is carried out from projection data provided in the binary stream (type of projection, information on projection and on location of the view).
The encoded data of the enhancement layer EL are decoded (step 35) and the images of the 360° video are rebuilt using the reference images Iref generated from geometrical projections made on the base layers, as specified here above.
The scalable binary stream representative of the 360° video thus makes it possible to address any type of receiver. Such a scalable stream also makes each receiver capable of decoding and rebuilding a 2D video or a 360° video according to its capacities.
According to the decoding method described here above, classic receivers such as PCs, television sets, tablets, etc. will decode only one base layer and render a sequence of 2D images, while receivers adapted to 360° video such as virtual reality helmets, smartphones, etc. will decode the base layers and the enhancement layers and render 360° video.
FIG. 4A provides a more detailed illustration of the steps of encoding a base layer and an enhancement layer of the method described here above according to one particular embodiment of the invention. Here, we describe the case of the encoding of an enhancement layer encoding a 360° omnidirectional video by prediction from a base layer encoding a view k.
Each image of the view k to be encoded is sub-divided into blocks of pixels and each block of pixels is then encoded classically by spatial or temporal prediction using a previously built reference image of the sequence of images of the view k.
Classically, a prediction module P determines a prediction for a current block B_k^c. The current block B_k^c is encoded by spatial prediction relative to other blocks of the same image or else by temporal prediction relative to a block of a reference image of the view k previously encoded and rebuilt and stored in the memory MEMb.
The prediction residue is obtained by computing the difference between the current block B_k^c and the prediction determined by the prediction module P.
This prediction residue is then transformed by a transformation module T implementing for example a DCT (discrete cosine transform). The transformed coefficients of the residue block are then quantified by a quantification module Q and then encoded by the entropic encoding module C to form the encoded data of the base layer BL[k].
The prediction residue is rebuilt, via an inverse quantification performed by the module Q−1 and an inverse transform performed by the module T−1, and added to the prediction determined by the prediction module P to rebuild the current block.
The rebuilt current block is then stored in order to rebuild the current image and so that this rebuilt current image can serve as a reference during the encoding of following images of the view k.
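The block coding loop just described can be summarized by the following schematic Python sketch, in which a plain 8x8 DCT-II and a uniform quantizer stand in for the actual transform T and quantification Q of the codec; the step size and block content are arbitrary values chosen for the example.

import numpy as np

N = 8
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)             # orthonormal DCT-II matrix

def encode_block(current, prediction, qstep=16.0):
    residue = current.astype(np.float64) - prediction       # prediction residue
    coefficients = C @ residue @ C.T                        # transform T
    return np.round(coefficients / qstep)                   # quantification Q

def rebuild_block(levels, prediction, qstep=16.0):
    coefficients = levels * qstep                           # inverse quantification Q^-1
    residue = C.T @ coefficients @ C                        # inverse transform T^-1
    return np.clip(np.round(prediction + residue), 0, 255).astype(np.uint8)

current = np.random.randint(0, 256, (N, N))
prediction = np.full((N, N), 128.0)
levels = encode_block(current, prediction)                  # the levels would then be entropy coded
rebuilt = rebuild_block(levels, prediction)                 # rebuilt block stored as a reference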
When the current image of the view k is rebuilt, a projection module PROJ carries out a geometrical projection of the rebuilt image in the reference image Iref of the 360° video as illustrated in FIG. 2B and according to the geometrical transformation described here above.
The reference image Iref obtained by projection of the rebuilt image of the base layer is stored in the memory of the enhancement layer MEMe.
Just as in the case of the base layer, the 360° omnidirectional video is encoded image by image and block by block. Each block of pixels is encoded classically by spatial or temporal prediction using a reference image previously rebuilt and stored in the memory MEMe.
Classically, a prediction module P determines a prediction for a current block B_e^c of a current image of the 360° omnidirectional video. The current block B_e^c is encoded by spatial prediction relative to other blocks of the same image or else by temporal prediction relative to a block of a previously encoded and rebuilt reference image of the 360° video, stored in the memory MEMe.
According to the invention, advantageously, the current block B_e^c can also be encoded by inter-layer prediction relative to a block co-localized in the reference image Iref obtained from the base layer. For example, such an encoding mode is signaled in the encoded data EL of the enhancement layer by an Inter encoding mode indicating a temporal encoding of the block, a zero motion vector and a reference index designating, among the reference images of the memory MEMe, the image Iref. These pieces of information are encoded by an entropic encoder C. Such a particular embodiment of the invention enables the reuse of the existing syntax of the temporal encoding modes of the existing standards. Other types of signaling are of course possible.
The prediction mode used to encode a current block B_e^c is for example selected from among the possible prediction modes as the one that minimizes a bit rate/distortion criterion.
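The following Python sketch illustrates this kind of selection in a deliberately simplified form: each candidate prediction, including the block co-localized in Iref for inter-layer prediction, is scored with a Lagrangian cost J = D + λ·R, where the distortion is a sum of squared differences and the rate is approximated by a crude proxy; these models and names are assumptions made for the example only.

import numpy as np

def rd_cost(current, prediction, signalling_bits, lagrangian=50.0):
    residue = current.astype(np.float64) - prediction
    distortion = float(np.sum(residue ** 2))                       # sum of squared differences
    rate = signalling_bits + 0.1 * float(np.sum(np.abs(residue)))  # rough residue-rate proxy
    return distortion + lagrangian * rate

def select_mode(current, candidates):
    # candidates: list of (mode_name, prediction_block, signalling_bits)
    best = min(candidates, key=lambda c: rd_cost(current, c[1], c[2]))
    return best[0]

block = np.random.randint(0, 256, (8, 8))
chosen = select_mode(block, [
    ("intra", np.full((8, 8), 128.0), 6),
    ("inter_temporal", block + np.random.randint(-3, 4, (8, 8)), 9),
    ("inter_layer_Iref", block + np.random.randint(-1, 2, (8, 8)), 8),
])
print(chosen)    # usually the candidate closest to the block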
Once a prediction mode is selected for the current block B_e^c, a prediction residue is obtained by computing the difference between the current block B_e^c and the prediction determined by the prediction module P.
This prediction residue is then transformed by a transformation module T implementing for example a DCT (discrete cosine transform) type of transform. The transformed coefficients of the residue block are then quantified by a quantification module Q and then encoded by the entropic encoding module C to form encoded data of the enhancement layer EL.
The prediction residue is rebuilt, via an inverse quantification performed by the module Q−1 and an inverse transform performed by the module T−1, and added to the prediction determined by the prediction module P to rebuild the current block.
The rebuilt current block is then stored in order to rebuild the current image and each rebuilt current image can serve as a reference during the encoding of following images of the 360° omnidirectional video.
The encoding has been described here in the case of a single view k encoded in a base layer. The method can be easily transposed to the case of several views encoded in an equivalent number of base layers. Each image rebuilt at a time instant t of a base layer is projected on the same reference image Iref of the 360° video to encode an image of the 360° video at the instant t.
FIG. 4B presents the simplified structure of an encoding device COD adapted to implementing the encoding method according to any one of the particular embodiments of the invention described here above.
Such an encoding device comprises a memory MEM4, a processing unit UT4, equipped for example with a processor PROC4.
According to one particular embodiment of the invention, the encoding method is implemented by a computer program PG4 stored in a memory MEM4 and managing the processing unit UT4. The computer program PG4 comprises instructions to implement the steps of the encoding method as described here above when the program is executed by the processor PROC4.
At initialization, the code instructions of the computer program PG4 are for example loaded into a memory (not shown) and then executed by the processor PROC4. The processor PROC4 of the processing unit UT4 implements especially the steps of the encoding method described with reference to FIG. 1A or 4A according to the instructions of the computer program PG4.
According to another particular embodiment of the invention, the encoding method is implemented by functional modules (P, T, Q, Q−1, T−1, C, PROJ). To this end, the processing unit UT4 cooperates with the different functional modules and the memory MEM4 in order to implement the steps of the encoding method. The memory MEM4 especially includes the memories MEMb, MEMe.
The different functional modules described here above can be in hardware or software form. In software form, such a functional module can include a processor, a memory and program code instructions to implement the function corresponding to the module when the code instructions are executed by the processor. In hardware form, such a functional module can be implemented by any type of adapted encoding circuit such as, for example and non-exhaustively, microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate array (FPGA) circuits or logic unit wiring.
FIG. 5A provides a more detailed illustration of the steps for decoding a base layer and an enhancement layer of the method described here above according to one particular embodiment of the invention. Here, we describe the case of a decoding of an enhancement layer EL encoding a 360° omnidirectional video by prediction from a base layer BL[k] encoding a view k.
The view k and the 360° omnidirectional video are decoded image by image and block by block.
Classically, the data of the base layer BL[k] are decoded by an entropic decoding module D. Then, for a current block of a current image to be rebuilt, a prediction residue is rebuilt via an inverse quantification of the entropically decoded coefficients, performed by an inverse quantification module Q−1, and an inverse transform performed by an inverse transform module T−1. A prediction module P determines a prediction for the current block on the basis of the signaling data decoded by the entropic decoding module D. The prediction is added to the rebuilt prediction residue to rebuild the current block.
The rebuilt current block is then stored in order to rebuild the current image. This rebuilt current image is stored in the memory of reference images of the base layer MEMb so that it can serve as a reference during the decoding of following images of the view k.
When the current image of the view k is rebuilt, a projection module PROJ makes a geometrical projection of the rebuilt image in the reference image Iref of the 360° omnidirectional video, as illustrated in FIG. 2B and according to the geometrical transformation described here above.
The reference image Iref obtained by projection of the rebuilt image of the base layer is stored in the memory of reference images of the enhancement layer MEMe.
The data of the enhancement layer EL are decoded by an entropic decoding module D. Then, for a current block of a current image to be rebuilt, a prediction residue is rebuilt via an inverse quantification of the entropically decoded coefficients, implemented by an inverse quantification module Q−1, and an inverse transform implemented by an inverse transformation module T−1. A prediction module determines a prediction for the current block from the signaling data decoded by the entropic decoding module D.
For example, the decoded syntax data indicate that the current block B_e^c is encoded by inter-layer prediction relative to a block co-localized in the reference image Iref obtained from the base layer. The prediction module therefore determines that the prediction corresponds to the block co-localized with the current block B_e^c in the reference image Iref.
The prediction is added to the rebuilt prediction residue to rebuild the current block. The rebuilt current block is then stored in order to rebuild the current image of the enhancement layer.
This rebuilt image is stored in the reference image memory of the enhancement layer MEMe to serve as a reference during the decoding of following images of the 360° video.
FIG. 5B presents the simplified structure of a decoding device DEC adapted to implementing the decoding method according to any one of the particular embodiments of the invention described here above.
Such a decoding device comprises a memory MEM5, a processing unit UT5, equipped for example with a processor PROC5.
According to one particular embodiment of the invention, the decoding method is implemented by a computer program PG5 stored in a memory MEM5 and driving the processing unit UT5. The computer program PG5 comprises instructions to implement the steps of the decoding method as described here above when the program is executed by the processor PROC5.
At initialization, the code instructions of the computer program PG5 are for example loaded into a memory (not shown) and then executed by the processor PROC5. The processor PROC5 of the processing unit UT5 especially implements the steps of the decoding method described in relation to FIG. 3 or 5A, according to the instructions of the computer program PG5.
According to another particular embodiment of the invention, the decoding method is implemented by functional modules (P, Q−1, T−1, D, PROJ). To this end, the processing unit UT5 cooperates with the different functional modules and the memory MEM5 in order to implement the steps of the decoding method. The memory MEM5 can especially include the memories MEMb, MEMe.
The different functional modules described here above can be in hardware or software form. In software form, such a functional module can include a processor, a memory and program code instructions to implement the function corresponding to the module when the code instructions are executed by the processor. In hardware form, such a functional module can be implemented by any type of adapted encoding circuit such as, for example and non-exhaustively, microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate array (FPGA) circuits or logic unit wiring.
According to one particular embodiment of the invention, the blocks of an image of the enhancement layer are encoded by groups of blocks. Such a group of blocks is also called a tile. Each group of blocks, i.e. each tile, is encoded independently of the other tiles.
Each tile can then be decoded independently of the other tiles. Such tiles (TE0-TE11) are illustrated in FIG. 6A, which is representative of an image of the 360° omnidirectional video at a time instant, in which 12 tiles are defined and entirely cover the image.
The term “independent encoding of tiles” is understood here to mean an encoding of the blocks of a tile that does not use any spatial prediction from a block of another tile of the image, or temporal prediction from a block of a tile of the reference image not co-localized with the current tile.
Each tile is encoded by temporal prediction or inter-layer prediction on the basis of one or more of the base layers, as illustrated in FIGS. 6A and 6B. In FIGS. 6A and 6B, the tiles TE4 and TE7 are encoded by inter-layer prediction relative to the image of the view 1 projected in the reference image Iref, and the tiles TE3 and TE6 are encoded by inter-layer prediction relative to the image of the view 2 projected in the reference image Iref.
According to this particular embodiment of the invention, it can happen that a receiver adapted to decoding and rendering a 360° video decodes only the tiles necessary for the current zone of the 360° image viewed by a user. Indeed, during the rendering of a 360° video, a user cannot, at an instant t, view the entire image of the video, i.e. he cannot look in all directions at the same time and can, at an instant t, view only the zone of the image facing his gaze.
For example, such a viewing zone is represented by the zone ZV of FIG. 6A. Thus, according to this embodiment, only the base layers that have served for the prediction of the zone viewed by the user are decoded at the step 31. In the example described in FIGS. 6A and 6B, only the base layer corresponding to the view 1 is decoded during the step 31, and only the tiles TE4, TE5, TE7 and TE8 are decoded during the step 35 of FIG. 3 on the basis of the enhancement layer EL. During the step 35, only the part of the image of the enhancement layer corresponding to the tiles TE4, TE5, TE7 and TE8 is rebuilt. The particular embodiment described with reference to FIGS. 6A and 6B is described here in the case where the tiles of the enhancement layer EL to be decoded depend on only one base layer (that of the view 1). According to other variants, a tile of the enhancement layer EL can be encoded by prediction from several base layers, depending for example on the bit rate/distortion optimization choices made during the encoding of the blocks of the enhancement layer: a block of a tile may be encoded by prediction relative to a first base layer and another block of the same tile may be encoded by prediction relative to another base layer distinct from the first base layer. In this case, all the base layers used for the prediction of the blocks of a tile of the enhancement layer must be decoded.
To this end, the stream of encoded data comprises, for each tile of the enhancement layer, a piece of information identifying the base layers used to predict the tile.
For example, for each tile, syntax elements indicating the number of base layers used and an identifier of each base layer used are encoded in the data stream. Such syntax elements are decoded for each tile of the enhancement layer to be decoded during the step 35 for decoding the enhancement layer.
The particular embodiment described here limits the use of the resources of the decoder and avoids the decoding of data that is unnecessary because it is not viewed by the user. Such an embodiment can be implemented by any one of the encoding devices and any one of the decoding devices described here above.
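A minimal Python sketch of this selective decoding is given below: from the zone currently viewed and from the per-tile list of base-layer identifiers decoded from the stream, it determines which tiles of the enhancement layer and which base layers have to be decoded. The tiling, identifiers and rectangle conventions are hypothetical values chosen for the example.

def visible_tiles(tiles, viewing_zone):
    # tiles and viewing_zone are (x, y, width, height) rectangles in the enhancement image
    zx, zy, zw, zh = viewing_zone
    needed = []
    for tile_id, (x, y, w, h) in tiles.items():
        if x < zx + zw and zx < x + w and y < zy + zh and zy < y + h:
            needed.append(tile_id)
    return needed

def base_layers_to_decode(tile_base_layers, needed_tiles):
    # tile_base_layers: for each tile, the list of base-layer identifiers decoded from the stream
    layers = set()
    for tile_id in needed_tiles:
        layers.update(tile_base_layers[tile_id])
    return layers

# hypothetical 2x2 tiling of a 3840x1920 enhancement image
tiles = {0: (0, 0, 1920, 960), 1: (1920, 0, 1920, 960),
         2: (0, 960, 1920, 960), 3: (1920, 960, 1920, 960)}
tile_base_layers = {0: [1], 1: [2], 2: [1], 3: [1, 2]}
viewing_zone = (1500, 700, 800, 600)      # zone currently viewed by the user
needed = visible_tiles(tiles, viewing_zone)
print(needed, base_layers_to_decode(tile_base_layers, needed))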
The methods of encoding and decoding described here above have been described in the case where the rebuilt images of the base layers are projected at the steps 14 and 15 of FIG. 1A and at the steps 32, 34 of FIG. 3 on a same reference image inserted in the memory of reference images of the enhancement layer.
When the number of base layers is limited, for example when it is 1 or 2, such a reference image has large-sized, non-defined zones, for example set at zero by default, which then use memory resources unnecessarily.
According to other variants, the rebuilt images of the base layers projected on the enhancement layer can be stored in reference sub-images. For example, a sub-image can be used for each base layer. Each sub-image is stored in association with shift information enabling the encoder and/or the decoder to determine the location of the sub-image in the enhancement image. Such a variant gives the advantage of saving memory space by avoiding the need for a reference image in the enhancement layer, for which the majority of the samples are zero.
Such a variant can be implemented equally by the encoder and/or by the decoder.
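As an illustration of this variant, the following Python sketch stores a projected base image as a reference sub-image together with its shift information, and returns the co-localized block used as a predictor for a block of the enhancement image; the structure and names are assumptions made for the example.

import numpy as np

class ReferenceSubImage:
    def __init__(self, samples, shift_x, shift_y):
        self.samples = samples       # rebuilt base image projected onto the 360° format
        self.shift_x = shift_x       # location of the sub-image in the enhancement image
        self.shift_y = shift_y

    def colocated_block(self, x, y, width, height):
        # (x, y) are coordinates of the block to predict, expressed in the enhancement image
        lx, ly = x - self.shift_x, y - self.shift_y
        sub_h, sub_w = self.samples.shape
        if lx < 0 or ly < 0 or lx + width > sub_w or ly + height > sub_h:
            return None              # the block is not covered by this sub-image
        return self.samples[ly:ly + height, lx:lx + width]

# one sub-image per base layer, much smaller than the 3840x1920 enhancement image
sub_image = ReferenceSubImage(np.zeros((1080, 1400), dtype=np.uint8), shift_x=1100, shift_y=400)
predictor = sub_image.colocated_block(1200, 500, 64, 64)   # 64x64 predictor, or None if uncovered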
Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.

Claims (14)

What is claimed is:
1. A method for encoding a data stream representative of an omnidirectional video, wherein the method comprises the following acts performed by an encoding device:
encoding, in said stream, at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video; and
encoding, in said stream, one enhancement layer representative of the omnidirectional video, the enhancement layer being encoded by prediction relative to the at least one base layer,
wherein the prediction of the enhancement layer relative to the at least one base layer comprises, in order to encode one image of the enhancement layer, called enhancement image, for each of said at least one base layer:
generating a reference sub-image obtained by geometrical projection on said reference sub-image of an image, called a base image, rebuilt from said base layer, said reference sub-image serving as a prediction for said at least one enhancement image,
storing said reference sub-image in association with shift information enabling an encoder to determine the location of the reference sub-image in the enhancement image in a non-transitory computer-readable memory of reference sub-images of the enhancement layer.
2. The method according to claim 1, wherein the data stream comprises a piece of information representative of a type of a geometrical projection used to represent the omnidirectional video.
3. The method according to claim 1, wherein the view represented by the 2D or 3D video is a view extracted from the omnidirectional video.
4. The method according to claim 3,
wherein the data stream comprises a piece of information representative of parameters of projection and of location of said base image in an image of the omnidirectional video, said information being used to project the base image on the reference sub-image.
5. The method according to claim 4, wherein said piece of information representative of the parameters of projection and of location of said base image is encoded in the data stream at each image of the omnidirectional video.
6. A method for decoding a data stream representative of an omnidirectional video, wherein the method comprises the following acts performed by a decoding device:
decoding, from said stream, at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video,
decoding, from said stream, one enhancement layer representative of the omnidirectional video, the enhancement layer being decoded by prediction relative to the at least one base layer,
wherein the prediction of the enhancement layer relative to the at least one base layer comprises, in order to decode one image of the enhancement layer, called enhancement image, for each of said at least one base layer:
generating a reference sub-image obtained by geometrical projection on said reference sub-image of an image, called a base image, rebuilt from said base layer, said reference sub-image serving as a prediction for said at least one enhancement image,
storing said reference sub-image in association with shift information enabling a decoder to determine the location of the reference sub-image in the enhancement image in a non-transitory computer-readable memory of reference sub-images of the enhancement layer.
7. The method according to claim 6, wherein the data stream comprises a piece of information representative of a type of a geometrical projection used to represent the omnidirectional video.
8. The method according to claim 6, wherein the view represented by the 2D or 3D video is a view extracted from the omnidirectional video.
9. The method according to claim 8,
wherein the data stream comprises a piece of information representative of parameters of projection and of location of said base image in an image of the omnidirectional video, said information being used to project the base image on the reference sub-image.
10. The method according to claim 9, wherein said piece of information representative of the parameters of projection and of location of said base image is encoded in the data stream at each image of the omnidirectional video.
11. The method according to claim 6, wherein the data stream comprises at least two base layers, each base layer being representative of a 2D or 3D video, respectively representative of a view of the scene, the at least two base layers being encoded independently of each other.
12. A device for encoding a data stream representative of an omnidirectional video, wherein the device comprises:
a processor; and
a non-transitory computer-readable medium comprising instructions stored thereon, which when executed by the processor configure the device to:
encode, in said stream, at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video; and
encode, in said stream, one enhancement layer representative of the omnidirectional video, the enhancement layer being encoded by predicting the enhancement layer relative to the at least one base layer,
wherein the prediction of the enhancement layer relative to the at least one base layer comprises, in order to encode one image of the enhancement layer, called enhancement image, for each of said at least one base layer:
generating a reference sub-image obtained by geometrical projection on said reference sub-image of an image, called a base image, rebuilt from said base layer, said reference sub-image serving as a prediction for said at least one enhancement image,
storing said reference sub-image in association with shift information enabling an encoder to determine the location of the reference sub-image in the enhancement image in a non-transitory computer-readable memory of reference sub-images of the enhancement layer.
13. A device for decoding a data stream representative of an omnidirectional video, the device comprises:
a processor; and
a non-transitory computer-readable medium comprising instructions stored thereon, which when executed by the processor configure the device to:
decode, in said stream, at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video; and
decode, in said stream, one enhancement layer representative of the omnidirectional video, the enhancement layer being decoded by prediction relative to the at least one base layer,
wherein the prediction of the enhancement layer relative to the at least one base layer comprises, in order to decode one image of the enhancement layer, called enhancement image, for each of said at least one base layer:
generating a reference sub-image obtained by geometrical projection on said reference sub-image of an image, called a base image, rebuilt from said base layer, said reference sub-image serving as a prediction for said at least one enhancement image,
storing said reference sub-image in association with shift information enabling a decoder to determine the location of the reference sub-image in the enhancement image in a non-transitory computer-readable memory of reference sub-images of the enhancement layer.
14. A non-transitory computer-readable medium comprising instructions stored thereon, which when executed by a processor of an encoding device or respectively a decoding device configure the encoding device or respectively the decoding device to:
encode or respectively decode a data stream representative of an omnidirectional video by:
encoding or respectively decoding, in said stream, at least one base layer representative of a 2D or 3D video, the 2D or 3D video being representative of a view of a same scene captured by the omnidirectional video; and
encoding or respectively decoding, in said stream, one enhancement layer representative of the omnidirectional video, the enhancement layer being encoded by prediction relative to the at least one base layer,
wherein the prediction of the enhancement layer relative to the at least one base layer comprises, in order to encode one image of the enhancement layer, called enhancement image, for each of said at least one base layer:
generating a reference sub-image obtained by geometrical projection on said reference sub-image of an image, called a base image, rebuilt from said base layer, said reference sub-image serving as a prediction for said at least one enhancement image,
storing said reference sub-image in association with shift information enabling the encoding device or the decoding device to determine the location of the reference sub-image in the enhancement image in a non-transitory computer-readable memory of reference sub-images of the enhancement layer.
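To make the prediction step shared by claims 6 and 12 to 14 concrete, the following Python sketch generates a reference sub-image from a rebuilt base image and stores it, together with its shift information, in a memory of reference sub-images. This is a minimal illustration under stated assumptions: the geometrical projection is stubbed out as a simple resampling, and the class and function names are invented here, not taken from the patent or from any codec implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ReferenceSubImage:
    """A projected base image plus the shift locating it inside the enhancement image."""
    pixels: np.ndarray
    shift_x: int
    shift_y: int


def generate_reference_sub_image(base_image: np.ndarray,
                                 sub_image_size: tuple,
                                 shift: tuple) -> ReferenceSubImage:
    """Project the rebuilt base image onto a reference sub-image.

    A real encoder/decoder would apply the signalled geometrical projection
    (e.g. an equirectangular mapping); nearest-neighbour resampling stands in
    for that mapping in this sketch.
    """
    height, width = sub_image_size
    rows = np.linspace(0, base_image.shape[0] - 1, height).astype(int)
    cols = np.linspace(0, base_image.shape[1] - 1, width).astype(int)
    projected = base_image[np.ix_(rows, cols)]
    return ReferenceSubImage(projected, shift_x=shift[0], shift_y=shift[1])


def store_reference_sub_image(memory: list, ref: ReferenceSubImage) -> None:
    """Keep the sub-image together with its shift information so that the
    predictor can later locate it in the enhancement image."""
    memory.append(ref)


if __name__ == "__main__":
    rebuilt_base = np.zeros((180, 320), dtype=np.uint8)   # base image rebuilt from the base layer
    reference_memory = []                                  # memory of reference sub-images of the enhancement layer
    ref = generate_reference_sub_image(rebuilt_base, sub_image_size=(360, 640), shift=(640, 0))
    store_reference_sub_image(reference_memory, ref)
    # The enhancement image can then be predicted from reference_memory[0].pixels
    # placed at (shift_x, shift_y) within the enhancement image.
```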
US17/500,362 2017-10-19 2021-10-13 Methods for encoding decoding of a data flow representing of an omnidirectional video Active US11736725B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/500,362 US11736725B2 (en) 2017-10-19 2021-10-13 Methods for encoding decoding of a data flow representing of an omnidirectional video

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
FR1759822 2017-10-19
FR1759822A FR3072850B1 (en) 2017-10-19 2017-10-19 CODING AND DECODING METHODS OF A DATA FLOW REPRESENTATIVE OF AN OMNIDIRECTIONAL VIDEO
PCT/EP2018/077922 WO2019076764A1 (en) 2017-10-19 2018-10-12 Methods for encoding and decoding a data flow representing an omnidirectional video
US202016756755A 2020-04-16 2020-04-16
US17/500,362 US11736725B2 (en) 2017-10-19 2021-10-13 Methods for encoding decoding of a data flow representing of an omnidirectional video

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US16/756,755 Continuation US11172223B2 (en) 2017-10-19 2018-10-12 Methods for encoding decoding of a data flow representing of an omnidirectional video
PCT/EP2018/077922 Continuation WO2019076764A1 (en) 2017-10-19 2018-10-12 Methods for encoding and decoding a data flow representing an omnidirectional video

Publications (2)

Publication Number Publication Date
US20220046279A1 (en) 2022-02-10
US11736725B2 (en) 2023-08-22

Family

ID=61187409

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/756,755 Active US11172223B2 (en) 2017-10-19 2018-10-12 Methods for encoding decoding of a data flow representing of an omnidirectional video
US17/500,362 Active US11736725B2 (en) 2017-10-19 2021-10-13 Methods for encoding decoding of a data flow representing of an omnidirectional video

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US16/756,755 Active US11172223B2 (en) 2017-10-19 2018-10-12 Methods for encoding decoding of a data flow representing of an omnidirectional video

Country Status (5)

Country Link
US (2) US11172223B2 (en)
EP (1) EP3698546A1 (en)
CN (1) CN111357292A (en)
FR (1) FR3072850B1 (en)
WO (1) WO2019076764A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11317104B2 (en) * 2019-05-15 2022-04-26 Tencent America LLC Method and apparatus for video coding

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080267289A1 (en) 2006-01-11 2008-10-30 Huawei Technologies Co., Ltd. Method And Device For Performing Interpolation In Scalable Video Coding
US20110012994A1 (en) 2009-07-17 2011-01-20 Samsung Electronics Co., Ltd. Method and apparatus for multi-view video coding and decoding
US20120213275A1 (en) 2011-02-22 2012-08-23 Kwon Nyeong-Kyu Scalable video coding and devices performing the scalable video coding
US20140064364A1 (en) 2012-09-04 2014-03-06 Research In Motion Limited Methods and devices for inter-layer prediction in scalable video compression
US20140098880A1 (en) 2012-10-05 2014-04-10 Qualcomm Incorporated Prediction mode information upsampling for scalable video coding
JP2015035641A (en) 2013-07-10 2015-02-19 シャープ株式会社 Image decoder and image encoder
WO2015048176A1 (en) 2013-09-24 2015-04-02 Vid Scale, Inc. Inter-layer prediction for scalable video coding
US20150256838A1 (en) 2012-09-30 2015-09-10 Sharp Kabushiki Kaisha Signaling scalability information in a parameter set
US20150281708A1 (en) 2012-03-22 2015-10-01 Mediatek Inc. a corporation Method and apparatus of scalable video coding
US20160014425A1 (en) 2012-10-01 2016-01-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Scalable video coding using inter-layer prediction contribution to enhancement layer prediction
CN105308966A (en) 2013-04-05 2016-02-03 三星电子株式会社 Video encoding method and apparatus thereof, and a video decoding method and apparatus thereof
US20160156917A1 (en) 2013-07-11 2016-06-02 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN105659600A (en) 2013-07-17 2016-06-08 汤姆逊许可公司 Method and device for decoding a scalable stream representative of an image sequence and corresponding coding method and device
WO2016108188A1 (en) 2014-12-31 2016-07-07 Nokia Technologies Oy Inter-layer prediction for scalable video coding and decoding
WO2016185090A1 (en) 2015-05-20 2016-11-24 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US20170238011A1 (en) 2016-02-17 2017-08-17 Telefonaktiebolaget Lm Ericsson (Publ) Methods and Devices For Encoding and Decoding Video Pictures
US20180041764A1 (en) * 2016-08-08 2018-02-08 Mediatek Inc. View-Independent Decoding For Omnidirectional Video
WO2018045108A1 (en) 2016-09-02 2018-03-08 Vid Scale, Inc. Method and system for signaling of 360-degree video information
US20180376126A1 (en) * 2017-06-26 2018-12-27 Nokia Technologies Oy Apparatus, a method and a computer program for omnidirectional video
US20190349598A1 (en) 2017-01-03 2019-11-14 Nokia Technologies Oy An Apparatus, a Method and a Computer Program for Video Coding and Decoding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9774882B2 (en) * 2009-07-04 2017-09-26 Dolby Laboratories Licensing Corporation Encoding and decoding architectures for format compatible 3D video delivery

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080267289A1 (en) 2006-01-11 2008-10-30 Huawei Technologies Co., Ltd. Method And Device For Performing Interpolation In Scalable Video Coding
US20110012994A1 (en) 2009-07-17 2011-01-20 Samsung Electronics Co., Ltd. Method and apparatus for multi-view video coding and decoding
CN102577376A (en) 2009-07-17 2012-07-11 三星电子株式会社 Method and apparatus for multi-view video coding and decoding
US20120213275A1 (en) 2011-02-22 2012-08-23 Kwon Nyeong-Kyu Scalable video coding and devices performing the scalable video coding
US20150281708A1 (en) 2012-03-22 2015-10-01 Mediatek Inc. a corporation Method and apparatus of scalable video coding
US20140064364A1 (en) 2012-09-04 2014-03-06 Research In Motion Limited Methods and devices for inter-layer prediction in scalable video compression
US20150256838A1 (en) 2012-09-30 2015-09-10 Sharp Kabushiki Kaisha Signaling scalability information in a parameter set
US20160014425A1 (en) 2012-10-01 2016-01-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Scalable video coding using inter-layer prediction contribution to enhancement layer prediction
US20140098880A1 (en) 2012-10-05 2014-04-10 Qualcomm Incorporated Prediction mode information upsampling for scalable video coding
US20160037178A1 (en) 2013-04-05 2016-02-04 Samsung Electronics Co., Ltd. Video encoding method and apparatus thereof and a video decoding method and apparatus thereof
US10728565B2 (en) 2013-04-05 2020-07-28 Samsung Electronics Co., Ltd. Video encoding method and apparatus thereof and a video decoding method and apparatus thereof
CN105308966A (en) 2013-04-05 2016-02-03 三星电子株式会社 Video encoding method and apparatus thereof, and a video decoding method and apparatus thereof
JP2015035641A (en) 2013-07-10 2015-02-19 シャープ株式会社 Image decoder and image encoder
US20160156917A1 (en) 2013-07-11 2016-06-02 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN105659600A (en) 2013-07-17 2016-06-08 汤姆逊许可公司 Method and device for decoding a scalable stream representative of an image sequence and corresponding coding method and device
US20160165244A1 (en) 2013-07-17 2016-06-09 Thomson Licensing Method and device for decoding a scalable stream representative of an image sequence and corresponding coding method and device
US9961353B2 (en) 2013-07-17 2018-05-01 Thomson Licensing Method and device for decoding a scalable stream representative of an image sequence and corresponding coding method and device
WO2015048176A1 (en) 2013-09-24 2015-04-02 Vid Scale, Inc. Inter-layer prediction for scalable video coding
WO2016108188A1 (en) 2014-12-31 2016-07-07 Nokia Technologies Oy Inter-layer prediction for scalable video coding and decoding
WO2016185090A1 (en) 2015-05-20 2016-11-24 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US20170238011A1 (en) 2016-02-17 2017-08-17 Telefonaktiebolaget Lm Ericsson (Publ) Methods and Devices For Encoding and Decoding Video Pictures
US20180041764A1 (en) * 2016-08-08 2018-02-08 Mediatek Inc. View-Independent Decoding For Omnidirectional Video
WO2018045108A1 (en) 2016-09-02 2018-03-08 Vid Scale, Inc. Method and system for signaling of 360-degree video information
US20190349598A1 (en) 2017-01-03 2019-11-14 Nokia Technologies Oy An Apparatus, a Method and a Computer Program for Video Coding and Decoding
US20180376126A1 (en) * 2017-06-26 2018-12-27 Nokia Technologies Oy Apparatus, a method and a computer program for omnidirectional video

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
"Algorithm descriptions of projection format conversion and video quality metrics in 360Lib Version 4", 119. MPEG MEETING; 20170717 - 20170721; TORINO; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), 1 October 2017 (2017-10-01), XP030023717
"Algorithm Descriptions of Projection Format Conversion and Video Quality Metrics in 360Lib Version 4", 119. MPEG Meeting; Jul. 17, 2017-Jul. 21, 2017; Torino; (Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11),, No. N17056, Oct. 1, 2017 (Oct. 1, 2017), XP030023717.
Boyce, J. M. et al., "Overview of SHVC: Scalable Extensions of the High Efficiency Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, No. 1, Jan. 2016.
Chinese Office Action, including Search Report, dated Dec. 19, 2022 for parallel Chinese Application No. 201880068327.9.
English translation of Written Opinion of the International Searching Authority dated Dec. 13, 2018 for corresponding International Application No. PCT/EP2018/077922, filed Oct. 12, 2018.
Final Office Action dated Apr. 1, 2021 from the United States Patent and Trademark Office for U.S. Appl. No. 16/756,755, filed Apr. 16, 2020.
H-M Oh et al., "Omnidirectional Fisheye Video SEI message", 29. JCT-VC Meeting; Oct. 23, 2017-Oct. 27, 2017; Macau; (Joint Collaborative Team on Video Coding of ISO/IEC JTC1/SC29/WG11 and ITU-T SG.16); URL: Http://WFTP3.ITU.INT/AV-Arch/JCTVC-Site/, No. JCTVC-AC0034, Oct. 11, 2017 (Oct. 11, 2017), XP030118311.
International Search Report and Written Opinion dated Nov. 27, 2018 for corresponding International Application No. PCT/EP2018/077922, filed Oct. 12, 2018.
Kashyap Kammachi-Sreedhar et al., "Standard-compliant Multiview Video Coding and Streaming for Virtual Reality Applications", 2016 IEEE International Symposium on Multimedia.
Notice of Allowance dated Jul. 14, 2021 from the United States Patent and Trademark Office for U.S. Appl. No. 16/756,755, filed Apr. 16, 2020.
Office Action dated Sep. 25, 2020 from the United States Patent and Trademark Office for U.S. Appl. No. 16/756,755, filed Apr. 16, 2020.
Sun, W. et al., "Test Sequences for Virtual Reality Video Coding from LetinVR", Joint Video Exploration Team (JVET), of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 4th Meeting: Chengdu, CN, Oct. 15-21, 2016, Document: JVET-D0179.
Y-K Wang (Qualcomm): "On ERP Equations for Sample Location Remapping and Sphere Coverage Signalling", 29. JCT-VC Meeting; Oct. 23, 2017-Oct. 27, 2017; Macau; (Joint Collaborative Team on Video Coding of ISO/IEC JTC1/SC29/WG11 and ITU-T SG.16); URL: Http://WFTP3.ITU.INT/AV-arch/JCTVC-Site/, No. JCTVC-AC0024-v6, Oct. 10, 2017 (Oct. 10, 2017), XP030118298.

Also Published As

Publication number Publication date
US11172223B2 (en) 2021-11-09
EP3698546A1 (en) 2020-08-26
CN111357292A (en) 2020-06-30
US20220046279A1 (en) 2022-02-10
FR3072850A1 (en) 2019-04-26
US20200267411A1 (en) 2020-08-20
WO2019076764A1 (en) 2019-04-25
FR3072850B1 (en) 2021-06-04

Similar Documents

Publication Publication Date Title
US10249019B2 (en) Method and apparatus for mapping omnidirectional image to a layout output format
US11778171B2 (en) Apparatus, a method and a computer program for video coding and decoding
US10620441B2 (en) Viewport-aware quality metric for 360-degree video
CN109155861B (en) Method and apparatus for encoding media content and computer-readable storage medium
TWI754644B (en) encoding device
JP2022537576A (en) Apparatus, method and computer program for video encoding and decoding
CN107409214B (en) Method and apparatus for decoding inter-layer video and method and apparatus for encoding inter-layer video
US20210227236A1 (en) Scalability of multi-directional video streaming
CN111819855B (en) Cancellation flag indication in video streams
JP7373597B2 (en) Image decoding device
CN107005705B (en) Method and apparatus for encoding or decoding multi-layered image using inter-layer prediction
US20120262545A1 (en) Method for coding and decoding a 3d video signal and corresponding devices
US11736725B2 (en) Methods for encoding decoding of a data flow representing of an omnidirectional video
JP7416820B2 (en) Null tile coding in video coding
CN115699768A (en) Image coding method based on POC information and non-reference picture mark in video or image coding system
CN116134821A (en) Method and apparatus for processing high level syntax in an image/video coding system
TWI637356B (en) Method and apparatus for mapping omnidirectional image to a layout output format
CN112997502B (en) Encoding device, decoding device, encoding method, and decoding method
CN114982242A (en) Method and apparatus for signaling picture segmentation information
CN115004708A (en) Method and apparatus for signaling image information
CN114982231A (en) Image decoding method and apparatus therefor
CN115004709A (en) Method and apparatus for signaling slice-related information
CN116134816A (en) Method and apparatus for processing general constraint information in image/video coding system
CN114902681A (en) Method and apparatus for signaling information related to slice in image/video encoding/decoding system

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE