WO2018069215A1 - Method, apparatus and stream for encoding transparency and shadow information of an immersive video format - Google Patents

Method, apparatus and stream for encoding transparency and shadow information of an immersive video format

Info

Publication number
WO2018069215A1
WO2018069215A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
data
layer
mapping surface
bitstream
Prior art date
Application number
PCT/EP2017/075620
Other languages
English (en)
Inventor
Gerard Briand
Renaud Dore
Mary-Luc Champel
Izabela Orlac
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from EP16306346.4A external-priority patent/EP3310052A1/fr
Priority claimed from EP16306347.2A external-priority patent/EP3310057A1/fr
Priority claimed from EP16306348.0A external-priority patent/EP3310053A1/fr
Application filed by Thomson Licensing
Publication of WO2018069215A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/29 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]

Definitions

  • the present disclosure relates to the domain of immersive video content.
  • the present disclosure is also understood in the context of the formatting of the data representative of the immersive content, for example for rendering on end-user devices such as mobile devices or Head-Mounted Displays.
  • 2. Background
  • Display systems such as a head-mounted display (HMD) or a CAVE allow a user to browse into an immersive video content.
  • the immersive video content may be obtained with CGI (Computer-generated imagery) techniques. With such immersive video content, it is possible to compute the content according to the point of view of the user watching it, but with unrealistic graphical quality.
  • the immersive video content may be obtained with the mapping of a video (e.g. a video acquired by several cameras) on a surface such as a sphere or a cube.
  • Such an immersive video content provides good image quality, but issues related to parallax appear, especially for foreground objects of the scene, i.e. objects close to the cameras.
  • free-viewpoint video is a technique for representation and coding of multi-view video and subsequent re-rendering from arbitrary viewpoints. While it improves the user experience in an immersive context, the amount of data to be transported to the renderer is very large and may be an issue.
  • the present disclosure relates to a method of encoding data representative of an omnidirectional video into a bitstream, the method comprising:
  • the present disclosure relates to a device configured to encode data representative of an omnidirectional video into a bitstream, the device comprising a memory associated with at least one processor configured to:
  • the present disclosure relates to a device configured to encode data representative of an omnidirectional video into a bitstream, the device comprising:
  • the present disclosure relates to a device configured to decode data representative of an omnidirectional video from a bitstream, the device comprising a memory associated with at least one processor configured to:
  • the present disclosure relates to a device configured to decode data representative of an omnidirectional video from a bitstream, the device comprising:
  • the shadow information is encoded into (respectively decoded from) a second layer of the bitstream with data representative of the first mapping surface, the second layer referring to a second video track.
  • the shadow information is encoded into (respectively decoded from) the first layer, the shadow information comprising a pointer to a second video track.
  • the second video track comprises data representative of transparency to apply to the first video.
  • the second video track comprises data representative of shadow associated with data representative of background obtained from the first video and data representative of transparency.
  • the present disclosure also relates to a bitstream carrying data representative of a first mapping surface in a first layer, the first layer referring to a first video track comprising background data of an omnidirectional video and shadow information associated with at least a part of the first mapping surface, the at least a part being deformed to correspond to the topology of at least a part of a scene represented in the first video and comprising shadow.
  • the present disclosure also relates to a computer program product comprising program code instructions to execute the steps of the method of encoding or decoding data representative of an omnidirectional video, when this program is executed on a computer.
  • the present disclosure also relates to a (non-transitory) processor readable medium having stored therein instructions for causing a processor to perform at least the abovementioned method of encoding or decoding data representative of an omnidirectional video.
  • the present disclosure also relates to a (non-transitory) processor readable medium having stored therein instructions for causing a processor to perform at least the abovementioned method of rendering an omnidirectional video from a bitstream carrying data representative of the omnidirectional video, when this program is executed on a computer.
  • FIG. 1 shows an immersive content, according to a particular embodiment of the present principles
  • FIG. 2 shows an example of an equirectangular mapping function, according to a specific embodiment of the present principles
  • FIG. 4 shows a functional overview of an encoding and decoding system according to an example of the present principles
  • figures 5 to 9 each show a first system configured to process an immersive content, according to specific embodiments of the present principles
  • figures 10 to 12 each show a second system configured to process an immersive content, according to specific embodiments of the present principles
  • FIG. 13 shows a first embodiment of a device configured to render an immersive content, according to an example of the present disclosure
  • figure 17 shows a process of decoding and rendering an immersive content, according to a particular embodiment of the present principles
  • figure 18 shows the obtaining of an immersive content from several video contents, according to a specific embodiment of the present principles
  • FIG. 19 shows an example of an architecture of a device configured for implementing the method(s) of figures 28 to 33, in accordance with an example of the present principles
  • figures 20 to 25 each show the syntax of a data stream, according to example embodiments of the present principles
  • FIG. 26 shows a process of obtaining shadow information for an immersive content, according to a particular embodiment of the present principles
  • FIG. 27 shows a mapping surface used to represent at least a part of an immersive content comprising one or more shadows, according to a particular embodiment of the present principles
  • the present principles will be described in reference to a first particular embodiment of a method of (and a device configured for) encoding data representative of an omnidirectional video (also called immersive video) into a bitstream.
  • data representative of a first mapping surface (e.g. a sphere, a cube, a cylinder, an octahedron) is encoded into a first layer of the bitstream.
  • the first layer refers to (or points to) a first video track that comprises a first video representing the background of the omnidirectional video, the first mapping surface being used to map the first video on it.
  • the first video is a "flat" video in the sense that there is no parallax between the objects represented in the first video.
  • Data representative of a second mapping surface is encoded into a second layer of the bitstream.
  • the second mapping surface is different from the first mapping surface; for example, the dimensions of the first and second mapping surfaces are different, the centers of the first and second mapping surfaces are different and/or the types of the first and second mapping surfaces are different.
  • the second layer refers to (or points to) a second video track that comprises a second video representing at least a part of the foreground of the omnidirectional video, the second mapping surface being used to map the second video on it.
  • the second video is a "flat" video in the sense that there is no parallax between the objects represented in the second video.
  • shadow information is encoded into the bitstream.
  • the shadow information is intended to be mapped on the first mapping surface where the shadow has to appear on the first "background" video.
  • the first mapping surface is partially deformed to correspond to the topology of the scene represented in the first video, especially on the parts of the scene where the shadow(s) has to appear.
  • the addition of shadow information associated with the same first mapping surface as the one used to represent the first "background" video makes it possible to enhance the realism of the rendering of the scene and to bring parallax into the omnidirectional video composed of the first "background" video and the shadow information.
  • a corresponding method of (and a device configured for) decoding data representative of the omnidirectional video is also described with regard to the third aspect of the present principles.
  • the bitstream may comprise a first layer referring to the background, a second layer referring to the foreground, a third layer referring to transparency information associated with the foreground.
  • the bitstream may comprise a first layer referring to the background, a second layer referring to the foreground, a third layer referring to transparency information associated with the foreground and a fourth layer referring to the shadow information.
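  • The bullets above describe a bitstream organized as layers, each layer referring to a video track and carrying data representative of a mapping surface, transparency information and/or shadow information. A minimal Python sketch of such a container description is given below; all class and field names are editorial assumptions and do not correspond to the actual syntax of the bitstream described with regard to figures 20 to 25.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class MappingSurface:
    surface_type: int                     # e.g. 3 for a sphere; other integers for cube, cylinder, ...
    dimensions: Tuple[float, ...]         # e.g. (radius,) for a sphere
    center: Tuple[float, float, float]    # center point, possibly an offset w.r.t. another surface

@dataclass
class Layer:
    layer_type: str                       # "background", "foreground", "transparency" or "shadow"
    video_track_id: int                   # pointer to the video track the layer refers to
    mapping_surface: MappingSurface
    transparency_info: Optional[dict] = None   # e.g. key color, background pixel coordinates or shader
    shadow_info: Optional[dict] = None         # e.g. pointer to a grayscale shadow video track

@dataclass
class OmnidirectionalBitstream:
    layers: List[Layer] = field(default_factory=list)
    video_tracks: Dict[int, bytes] = field(default_factory=dict)   # track id -> encoded video data

# Example: background, foreground and transparency layers, as in the bullets above.
stream = OmnidirectionalBitstream()
stream.layers.append(Layer("background", 1, MappingSurface(3, (10.0,), (0.0, 0.0, 0.0))))
stream.layers.append(Layer("foreground", 2, MappingSurface(3, (5.0,), (0.0, 0.0, 0.0))))
stream.layers.append(Layer("transparency", 3, MappingSurface(3, (5.0,), (0.0, 0.0, 0.0))))
```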
  • a video or a video content is understood as a sequence of successive still images, the sequence comprising one or more still image(s).
  • the omnidirectional video consequently comprises one or more successive image(s).
  • the display device used to visualize the immersive content 11 is for example a HMD (Head-Mounted Display), worn on the head of a user or as part of a helmet.
  • the HMD advantageously comprises one or more display screens (for example LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode) or LCOS (Liquid Crystal On Silicon)) and sensor(s) configured for measuring the change(s) of position of the HMD, for example gyroscopes or an IMU (Inertial Measurement Unit), according to one, two or three axes of the real world (pitch, yaw and/or roll axis).
  • the part 12 of the immersive content 11 corresponding to the measured position of the HMD is advantageously determined with a specific function establishing the relationship between the point of view associated with the HMD in the real world and the point of view of a virtual camera associated with the immersive content 11.
  • Controlling the part 12 of the video content to be displayed on the display screen(s) of the HMD according to the measured position of the HMD enables a user wearing the HMD to browse into the immersive content, which is larger than the field of view associated with the display screen(s) of the HMD.
  • the immersive system is a CAVE (Cave Automatic Virtual Environment) system, wherein the immersive content is projected onto the walls of a room.
  • the walls of the CAVE are for example made up of rear-projection screens or flat panel displays. The user may thus move his/her gaze over the different walls of the room.
  • the CAVE system is advantageously provided with cameras acquiring images of the user to determine by video processing of these images the gaze direction of the user.
  • the gaze or the pose of the user is determined with a tracking system, for example an infrared tracking system, the user wearing infrared sensors.
  • the immersive system is a tablet with a tactile display screen, the user browsing into the content by scrolling the content with one or more fingers sliding onto the tactile display screen.
  • the immersive content 11 and the part 12 as well may comprise foreground object(s) and background object(s).
  • the background object(s) may be obtained for example from a first video representing the background of the immersive content 11.
  • the foreground object(s) may be obtained for example from one or more second videos each representing one or more of the foreground objects, the immersive content being obtained by compositing of the first video with the second video(s).
  • the immersive content 11 is not limited to a 4π steradian video content but extends to any video content (or audio-visual content) having a size greater than the field of view 12.
  • the immersive content may be for example a 2π, 2.5π, 3π steradian content and so on.
  • An immersive video is a video encoded on at least one rectangular image that is a two-dimensional array of pixels (i.e. elements of color information) like a "regular" video.
  • the image is first mapped on the inner face of a convex volume, also called mapping surface (e.g. a sphere, a cube, a pyramid), and, second, a part of this volume is captured by a virtual camera.
  • Images captured by the virtual camera are rendered on the screen of an immersive display device (e.g. an HMD).
  • a stereoscopic video is encoded on one or two rectangular images, projected on two mapping surfaces which are combined to be captured by two virtual cameras according to the characteristics of the device. Pixels are encoded according to a mapping function in the image.
  • mapping function depends on the mapping surface.
  • For the same mapping surface, several mapping functions may be possible.
  • the faces of a cube may be structured according to different layouts within the image surface.
  • a sphere may be mapped according to an equirectangular projection or to a gnomonic projection for example.
  • Figure 2 shows an example of an equirectangular mapping function.
  • the sequence of image(s) of an immersive video is encoded on a rectangular image 21 meant to be mapped on a spherical mapping surface 22.
  • the mapping function 23 establishes a mapping between each pixel of the image 21 and a point on the mapping surface 22 (and vice versa).
  • the mapping function 23 is based on the equirectangular projection (also called equidistant cylindrical projection).
  • the picture in the image 21 is distorted.
  • the distances are respected at the equator and stretched at the poles. Straight lines are no longer straight and perspectives are distorted.
  • the mapping function 23 is based on the equidistant conic projection for instance.
  • the projection function 25 consists in selecting a part of the mapping surface 22 as seen by a camera located at the center of the sphere, the camera being configured in terms of field of view and resolution in order to produce an image that directly fits with the screen 24.
  • the chosen field of view depends on the characteristics of the display device. For HMDs, for example, the angle of the field of view is close to the human stereoscopic vision field, which is around one hundred and twenty degrees.
  • the aiming direction of the camera corresponds to the direction the user is looking toward and the virtual camera controller of the immersive video rendering device is used to modify the aiming direction of the camera.
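  • As an editorial illustration of the equirectangular mapping function 23 and of the projection function 25 described above, the following minimal Python sketch converts a pixel of the image 21 into a direction on the spherical mapping surface 22 and back; it is a generic equirectangular conversion written under the assumption of a full 360°x180° image, not the exact mapping function mandated by the present principles.

```python
import math

def pixel_to_direction(u, v, width, height):
    """Map a pixel (u, v) of the equirectangular image to a unit direction on the sphere."""
    lon = (u / width) * 2.0 * math.pi - math.pi      # longitude in [-pi, pi]
    lat = math.pi / 2.0 - (v / height) * math.pi     # latitude in [-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

def direction_to_pixel(d, width, height):
    """Inverse mapping: used by a virtual camera at the sphere center to sample the image."""
    x, y, z = d
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return (u, v)

# Round trip for a sample pixel of a 4096x2048 equirectangular image.
d = pixel_to_direction(1024.0, 512.0, 4096, 2048)
print(direction_to_pixel(d, 4096, 2048))   # approximately (1024.0, 512.0)
```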
  • the sequence of images is encoded on a rectangular (or square) image 31 meant to be mapped on a cubical mapping surface 32.
  • the mapping function 33 establishes a correspondence between squares in the image 31 and faces of the cube 32. Vice versa, the mapping function determines how the faces of the cube 32 are organized within the surface of the image 31. Images on each face are not distorted. However, in the total image of the image 31, lines are piece-wise straight and perspectives are broken. The image may contain empty squares (filled with default or random color information, white in the example of figure 3).
  • the projection function works as the projection function of figure 2.
  • a camera is placed at the center of the cube 32 and captures an image that fits the screen of the immersive rendering device.
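  • The following minimal Python sketch illustrates, in the spirit of the mapping function 33 described above, how cube faces may be associated with squares of the image 31; the 3x2 face layout used here is only an assumed example, since several layouts are possible.

```python
# Assumed 3x2 layout of the cube faces within the image 31 (one of several possible layouts):
#   +X +Y +Z
#   -X -Y -Z
FACE_LAYOUT = {"+x": (0, 0), "+y": (1, 0), "+z": (2, 0),
               "-x": (0, 1), "-y": (1, 1), "-z": (2, 1)}

def face_pixel_to_image_pixel(face, s, t, face_size):
    """Map a pixel (s, t) of a given cube face to its position in the packed image 31."""
    col, row = FACE_LAYOUT[face]
    return (col * face_size + s, row * face_size + t)

def image_pixel_to_face_pixel(u, v, face_size):
    """Inverse mapping: recover the face and the local coordinates from an image pixel."""
    col, row = u // face_size, v // face_size
    inverse = {pos: f for f, pos in FACE_LAYOUT.items()}
    return inverse[(col, row)], u % face_size, v % face_size

print(face_pixel_to_image_pixel("-y", 10, 20, 512))   # (522, 532)
print(image_pixel_to_face_pixel(522, 532, 512))       # ('-y', 10, 20)
```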
  • the pre-processing module 300 may also accept an omnidirectional video in a particular format (for example, equirectangular) as input, and pre-processes the video to change the mapping into a format more suitable for encoding. Depending on the acquired video data representation, the pre-processing module 300 may perform a mapping space change.
  • the encoding device 400 and the encoding method will be described with respect to other figures of the specification.
  • the data, which may encode immersive video data or 3D CGI encoded data for instance, are sent to a network interface 500, which can typically be implemented in any network interface, for instance one present in a gateway.
  • the data are then transmitted through a communication network, such as the internet, but any other network may be foreseen.
  • Network interface 600 can be implemented in a gateway, in a television, in a set-top box, in a head mounted display device, in an immersive (projective) wall or in any immersive video rendering device.
  • the data are sent to a decoding device 700.
  • Decoding function is one of the processing functions described in the following figures 5 to 15.
  • Decoded data are then processed by a player 800.
  • Player 800 prepares the data for the rendering device 900 and may receive external data from sensors or users input data. More precisely, the player 800 prepares the part of the video content that is going to be displayed by the rendering device 900.
  • the processing device can also comprise a second communication interface with a wide access network such as the internet and access content located in the cloud, directly or through a network device such as a home or a local gateway.
  • the processing device can also access a local storage through a third interface such as a local access network interface of Ethernet type.
  • the processing device may be a computer system having one or several processing units.
  • it may be a smartphone which can be connected through wired or wireless links to the immersive video rendering device, or which can be inserted in a housing in the immersive video rendering device and communicate with it through a connector or wirelessly.
  • Communication interfaces of the processing device are wireline interfaces (for example a bus interface, a wide area network interface, a local area network interface) or wireless interfaces (such as an IEEE 802.11 interface or a Bluetooth® interface).
  • the immersive video rendering device can be provided with an interface to a network directly or through a gateway to receive and/or transmit content.
  • the system comprises an auxiliary device which communicates with the immersive video rendering device and with the processing device.
  • this auxiliary device can contain at least one of the processing functions.
  • the immersive video rendering device may comprise one or several displays.
  • the device may employ optics such as lenses in front of each of its displays.
  • the display can also be a part of the immersive display device like in the case of smartphones or tablets.
  • displays and optics may be embedded in a helmet, in glasses, or in a visor that a user can wear.
  • the immersive video rendering device may also integrate several sensors, as described later on.
  • the immersive video rendering device can also comprise several interfaces or connectors. It might comprise one or several wireless modules in order to communicate with sensors, processing functions, handheld or other body parts related devices or sensors.
  • Figure 5 illustrates a particular embodiment of a system configured to decode, process and render immersive videos.
  • the system comprises an immersive video rendering device 10, sensors 20, user input devices 30, a computer 40 and a gateway 50 (optional).
  • FIG. 7 shows a third embodiment related to the one shown in Figure 2.
  • the game console 60 processes the content data.
  • Game console 60 sends data and optionally control commands to the immersive video rendering device 10.
  • the game console 60 is configured to process data representative of an immersive video and to send the processed data to the immersive video rendering device 10 for display. Processing can be done exclusively by the game console 60 or part of the processing can be done by the immersive video rendering device 10.
  • the game console 60 is connected to internet, either directly or through a gateway or network interface 50.
  • the game console 60 obtains the data representative of the immersive video from the internet.
  • the game console 60 obtains the data representative of the immersive video from a local storage (not represented) where the data representative of the immersive video are stored, said local storage can be on the game console 60 or on a local server accessible through a local area network for instance (not represented).
  • Immersive video rendering device 70 is described with reference to
  • Sensors used for pose estimation are, for instance, gyroscopes, accelerometers or compasses. More complex systems, for example using a rig of cameras, may also be used. In this case, the at least one processor performs image processing to estimate the pose of the device 10. Some other measurements are used to process the content according to environment conditions or the user's reactions. Sensors used for observing the environment and users are, for instance, microphones, light sensors or contact sensors. More complex systems may also be used, for example a video camera tracking the user's eyes. In this case the at least one processor performs image processing to operate the expected measurement.
  • Figure 10 shows an example of a system of the second type. It comprises a display 1000 which is an immersive (projective) wall which receives data from a computer 4000.
  • the computer 4000 may receive immersive video data from the internet.
  • the computer 4000 is usually connected to internet, either directly or through a gateway 5000 or network interface.
  • the immersive video data are obtained by the computer 4000 from a local storage (not represented) where the data representative of an immersive video are stored, said local storage can be in the computer 4000 or in a local server accessible through a local area network for instance (not represented).
  • This system may also comprise sensors 2000 and user input devices 3000.
  • the immersive wall 1000 can be of OLED or LCD type. It can be equipped with one or several cameras.
  • the immersive wall 1000 may process data received from the sensor 2000 (or the plurality of sensors 2000).
  • the data received from the sensors 2000 may be related to lighting conditions, temperature, environment of the user, e.g. position of objects.
  • Figure 11 shows another example of a system of the second type. It comprises an immersive (projective) wall 6000 which is configured to process (e.g. decode and prepare data for display) and display the video content. It further comprises sensors 2000, user input devices 3000.
  • This system may also comprise sensors 2000 and user input devices 3000.
  • the immersive wall 6000 can be of OLED or LCD type. It can be equipped with one or several cameras.
  • the immersive wall 6000 may process data received from the sensor 2000 (or the plurality of sensors 2000). The data received from the sensors 2000 may be related to lighting conditions, temperature, environment of the user, e.g. position of objects.
  • the immersive wall 6000 may process the video data (e.g. decoding them and preparing them for display) according to the data received from these sensors/user input devices.
  • the sensor signals can be received through a communication interface of the immersive wall.
  • This communication interface can be of Bluetooth type, of WIFI type or any other type of connection, preferentially wireless but can also be a wired connection.
  • the immersive wall 6000 may comprise at least one communication interface to communicate with the sensors and with internet.
  • Gaming console 7000 sends instructions and user input parameters to the immersive wall 6000.
  • Immersive wall 6000 processes the immersive video content possibly according to input data received from sensors 2000 and user input devices 3000 and gaming consoles 7000 in order to prepare the content for display.
  • the immersive wall 6000 may also comprise internal memory to store the content to be displayed.
  • Figure 16 shows a process of obtaining, encoding and/or formatting data representative of an omnidirectional video, according to a particular embodiment of the present principles.
  • Operations 161 to 164 refer to background data, i.e. data that are intended to be used to form the background of the omnidirectional video that results from the compositing of the first video with one or more second videos comprising one or more foreground objects.
  • An example of such an omnidirectional video is shown on figure 18.
  • Figure 18 illustrates a non-limitative example of an omnidirectional video 185 resulting from the combination of a first "flat" omnidirectional video 181 acquired with a plurality of cameras and of a second "flat" video 182.
  • the first video 181 is mapped onto a first mapping surface 183 (e.g. a first sphere) and the second video is mapped on a second mapping surface 184 (e.g. a second sphere having a radius smaller than the radius of the first sphere).
  • the first video corresponds to the background of the omnidirectional video 185 and the second video corresponds to the foreground of the omnidirectional video 185.
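  • As suggested by figure 18, the compositing of the first "background" video with the second "foreground" video may rely on classical alpha blending once both videos have been mapped and projected to the same viewport; the sketch below is a minimal editorial illustration of such a blending step, not a specific implementation of the present principles.

```python
def composite_pixel(bg_rgb, fg_rgb, alpha_255):
    """Blend a foreground pixel over a background pixel using an alpha value in [0, 255]."""
    a = alpha_255 / 255.0
    return tuple(round(a * f + (1.0 - a) * b) for f, b in zip(fg_rgb, bg_rgb))

# A fully transparent foreground keeps the background, an opaque foreground replaces it.
print(composite_pixel((10, 20, 30), (200, 100, 50), 0))    # (10, 20, 30)
print(composite_pixel((10, 20, 30), (200, 100, 50), 255))  # (200, 100, 50)
print(composite_pixel((10, 20, 30), (200, 100, 50), 128))  # roughly a half-way blend
```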
  • a first video is acquired.
  • the first video is an omnidirectional video of a scene acquired for example with an omnidirectional camera, that is, a camera with a 360-degree field of view in the horizontal plane or with a visual field that covers the entire sphere (4π steradians).
  • the first video may be acquired with a rig of several cameras that enables acquisition of the whole sphere or part of it.
  • the first video is not acquired but retrieved from the cloud, a library of omnidirectional videos or any storage unit or apparatus.
  • An audio track associated with the first video may also be optionally acquired.
  • the first video is processed.
  • the first video may be for example stitched if acquired with a plurality of cameras.
  • it is signalled to a video encoder in which format the first video is to be encoded, for example according to the H.264 standard or the HEVC standard.
  • it is further signalled which first mapping surface is to be used to represent the first video.
  • the first mapping surface may be for example chosen by an operator from a list of available mapping surfaces, e.g. sphere, squished sphere, cylinder, cube, octahedron, icosahedron, truncated pyramid. Dimensions (i.e. the size) of the first mapping surface and the coordinates of its centre point may also be defined.
  • the first mapping surface (and associated dimensions and centre point) is determined by default by the system. For example, it may be decided that a first video representative of background information is mapped by default on a sphere having a determined radius and a determined centre point, thus requiring no input from the operator.
  • the type of the first mapping surface, its dimensions and its centre point advantageously form a set of metadata associated with the first video.
  • the sound information acquired with the first video, when any sound has been acquired, is encoded into an audio track according to a determined format, for example according to the AAC (Advanced Audio Coding) standard, WMA (Windows Media Audio), or MPEG-1/2 Audio Layer 3.
  • the data of the first video is encoded into a first video track according to a determined format, for example according to H.264/MPEG-4 AVC: “Advanced video coding for generic audiovisual Services”, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.264, Telecommunication Standardization Sector of ITU, February 2014 or according to HEVC/H265: "ITU-T H.265 TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (10/2014), SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding, Recommendation ITU-T H.265".
  • Operations 165 to 168 refer to foreground data and/or transparency data and/or shadow data, i.e. data that are intended to be used to form the foreground of the omnidirectional video that results from the compositing of the first video with one or more second videos comprising one or more foreground objects, and/or to form shadow(s) on the background resulting from the foreground objects.
  • a second video is acquired.
  • the second video corresponds to an omnidirectional video or to a standard video, i.e. a video with a standard field of view.
  • the second video is composed by an operator, for example when the second video represents one or more shadows to be overlaid onto the background of the omnidirectional video.
  • the second video is not acquired but retrieved from the cloud, a library of omnidirectional videos or any storage unit or apparatus.
  • An audio track associated with the second video may also be optionally acquired.
  • the second video is processed.
  • Chroma-key operation is for example performed on the second video to extract transparency information associated with the second video.
  • chroma-key operation consists in classifying the pixels of the images of the second video according to their color and in associating a value, for example between 0 and 255, with each pixel, depending on the color associated with the pixel. This value is called the alpha value and is representative of transparency.
  • the transparency information obtained from the second video and associated with the second video is used when compositing the second video with another video, e.g. the first video.
  • the background of the second video, e.g. the parts of the second video having blue or green pixels, is usually assigned the value 0 and will be removed when combining the second video with the first video, the background of the second video being fully transparent.
  • the foreground object(s) of the second video are assigned an alpha value equal or close to 1 and are considered as being opaque.
  • the second video is further processed after the chroma-key operation to assign to each pixel of the background (i.e. each pixel having a color information corresponding to the background, e.g. a blue or green color information) a determined color value (e.g. blue or green) that has a fixed value and that is the same for each and every pixel of the background.
  • such further processing enables better control of the background of the second video and improves the background removal when compositing.
  • the determined color value may be stored and signalled as a metadata.
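  • The following minimal Python sketch illustrates the chroma-key classification described above, deriving an alpha value between 0 and 255 for each pixel from its distance to the determined background key color; the distance metric and the threshold values are editorial assumptions, not parameters defined by the present principles.

```python
def chroma_key_alpha(pixel_rgb, key_rgb, min_dist=40.0, max_dist=90.0):
    """Return an alpha value in [0, 255]: 0 for the key-colored background, 255 for the foreground."""
    dist = sum((p - k) ** 2 for p, k in zip(pixel_rgb, key_rgb)) ** 0.5
    if dist <= min_dist:
        return 0                                     # background: fully transparent
    if dist >= max_dist:
        return 255                                   # foreground: fully opaque
    return round(255 * (dist - min_dist) / (max_dist - min_dist))   # soft edge around the key color

key = (0, 177, 64)                                   # assumed green-screen key color
print(chroma_key_alpha((2, 180, 60), key))           # 0   (background pixel)
print(chroma_key_alpha((200, 120, 90), key))         # 255 (foreground pixel)
```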
  • at operation 166, it may also be signalled to a video encoder in which format the second video is to be encoded, for example according to the H.264 standard or the HEVC standard.
  • when transparency information has been obtained from the second video, it may also be signalled to the video encoder in which format the transparency information is to be encoded.
  • the HEVC standard proposes a specific configuration to efficiently encode a grayscale video such as the transparency information.
  • it is further signalled which second mapping surface is to be used to represent the second video.
  • the second mapping surface may be for example chosen by an operator from a list of available mapping surfaces, e.g. sphere, squished sphere, cylinder, cube, octahedron, icosahedron, truncated pyramid.
  • the sound information acquired with the second video, when any sound has been acquired, is encoded into an audio track according to a determined format, for example according to AAC (Advanced Audio Coding, ISO/IEC 14496-3, Information technology — Coding of audio-visual objects — Part 3: Audio), WMA (Windows Media Audio), or MPEG-1/2 Audio Layer 3.
  • the data of the second video is encoded into a second video track according to a determined format, for example according to H.264/MPEG-4 AVC: “Advanced video coding for generic audiovisual Services”, SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Recommendation ITU-T H.264, Telecommunication Standardization Sector of ITU, February 2014 or according to HEVC/H265: "ITU-T H.265 TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU (10/2014), SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS, Infrastructure of audiovisual services - Coding of moving video, High efficiency video coding, Recommendation ITU-T H.265".
  • the data representative of transparency may be encoded into a third video track using the same format as the one used to encode the associated second video.
  • the data comprised in the first, second and/or third video tracks are encoded with the signalling information and the metadata obtained from the operations 162 and 166.
  • the encoding process results in a container comprising one or more bitstreams having a determined syntax, as described in more detail with regard to figures 20 to 25.
  • the bitstream(s) obtained at operation 169 is (are) stored in a memory device and/or transmitted to be decoded and processed, e.g. to render the data representative of the omnidirectional video comprised in such bitstream(s), as described in more detail with regard to figure 17.
  • Figures 20 to 25 each show the syntax of a set of one or more bitstreams, according to example embodiments of the present principles.
  • the data representative of a mapping surface comprises for example the type of the mapping surface (e.g. a sphere or a cube, identified with an integer, e.g. 3 for a sphere), its dimensions and the coordinates of its centre point.
  • the coordinates may be expressed in the same space as the one used for the first mapping surface.
  • the coordinates of the center point of the second mapping surface are expressed with offset with regard to the coordinates of the center point of the first mapping surface.
  • the second layer 2212 may additionally comprise information on the type of the second layer, i.e. a layer representative of foreground and transparency.
  • the variant of figure 22 has the advantage of minimizing the data to be encoded into the bitstream 221 with regard to the amount of data to be encoded into the bitstream 211 of figure 21, the data representative of the second mapping surface being encoded only once, while the data representative of the second mapping surface is encoded twice in the bitstream 211, one time for the second layer 2012 and one time for the third layer 2113.
  • Figure 23 shows the syntax of a set of one or more bitstreams, according to a second example.
  • a bitstream 231 is represented with a plurality of layers 2011, 2012, 2113 to 201n, n being an integer greater than or equal to 2.
  • the first layer 2011 and the nth layer 201n are identical to the first layer 2011 and the nth layer 201n described with regard to figure 20.
  • Figure 23 shows a variant of figures 21 and/or 22 for encoding transparency information associated with a foreground video.
  • the second layer 2212 refers to the second video track 203 that comprises data representative of foreground object(s).
  • the second layer 2312 further comprises information representative of transparency.
  • the transparency information may correspond to the color value of a pixel of the background of second video or to coordinates x, y of such a pixel of the background within the second video.
  • Such transparency information may be used at the decoding/rendering stage for the chroma-key operation. Knowing the color value of the background of the second video, the decoder/renderer may be able to retrieve the transparency information.
  • the second layer 2212 also comprises data representative of a second mapping surface associated with the second video of the second video track 202.
  • the transparency information corresponds to a chroma-key shader, i.e. a program intended to be executed by GPUs at the rendering stage, the chroma-key shader comprising the instructions and parameters needed by the renderer to perform the chroma-key operation.
  • Figure 17 shows a process of obtaining, decoding and/or interpreting data representative of an omnidirectional video from the one or more bitstreams obtained from the process of figure 16, according to a particular embodiment of the present principles.
  • data representative of additional second mapping surfaces may be obtained/decoded from additional second layers of the bitstream, each additional second layer referring to a video track comprising data of a video corresponding to at least a part of the foreground of the omnidirectional video.
  • Figure 29 illustrates a method for encoding data representative of an omnidirectional video implemented for example in a device 190 (described with regard to figure 19), according to a non-restrictive embodiment of a second aspect of the present principles.
  • in a step 294, transparency information associated with the second video is encoded into the bitstream.
  • the transparency information is for example encoded into the second layer or into a third layer different from the second layer.
  • the transparency information may correspond to a pointer to a third video track that comprises data of a third video representative of transparency, such as in an alpha channel.
  • the transparency information corresponds to a color value of a background pixel of the second video or to the coordinates of a pixel of the background of the second video.
  • the transparency information corresponds to a chroma-key shader intended to be executed by a renderer rendering the omnidirectional video.
  • in a step 324, transparency information associated with the second video is decoded from the bitstream.
  • the transparency information is for example decoded from the second layer or from a third layer different from the second layer.
  • the transparency information may correspond to a pointer to a third video track that comprises data of a third video representative of transparency, such as in an alpha channel.
  • the transparency information corresponds to a color value of a background pixel of the second video or to the coordinates of a pixel of the background of the second video.
  • the transparency information corresponds to a chroma-key shader intended to be executed by a renderer rendering the omnidirectional video.
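  • As an editorial illustration of the three transparency variants listed above, the minimal Python sketch below shows how a renderer could resolve the decoded transparency information into an alpha plane; the dictionary keys and the injected helper functions are assumptions made for the example only.

```python
def resolve_alpha(transparency_info, tracks, decode_grayscale_track, chroma_key, run_shader, frame):
    """Compute the alpha plane of `frame` from one of the signalled transparency variants."""
    kind = transparency_info["kind"]
    if kind == "alpha_track":      # variant 1: pointer to a third video track carrying the alpha channel
        return decode_grayscale_track(tracks[transparency_info["track_id"]])
    if kind == "key_color":        # variant 2: key color (or background pixel coordinates) of the second video
        return chroma_key(frame, transparency_info["color"])
    if kind == "shader":           # variant 3: chroma-key shader executed on the GPU at rendering time
        return run_shader(transparency_info["program"], frame)
    raise ValueError(f"unknown transparency variant: {kind}")

# Toy usage with stand-in helpers (a real renderer would decode video tracks and run GPU shaders).
alpha = resolve_alpha({"kind": "key_color", "color": (0, 177, 64)},
                      tracks={}, decode_grayscale_track=lambda t: t,
                      chroma_key=lambda frame, color: [[255] * len(row) for row in frame],
                      run_shader=lambda prog, frame: frame,
                      frame=[[(200, 120, 90)]])
print(alpha)   # [[255]]
```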
  • shadow information associated with at least a part of the first mapping surface is encoded into the bitstream.
  • the part of the first mapping surface with which the shadow information is associated is deformed to correspond to the topology of the part(s) of a scene represented in the first video, which comprise shadow(s).
  • the shadow information is encoded into a second layer of the bitstream or into the first layer and advantageously points to a second video track that comprises data of a second video representative of shadow (either as a grayscale or as a video representative of shadow with an alpha key, i.e. with data representative of transparency).
  • the shadow information is associated with a deformed first mapping surface of the same type as the deformed first mapping surface associated with the first video but with smaller dimensions.
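  • The following minimal Python sketch is an editorial illustration of how a renderer could exploit the shadow information described above: a vertex of the shadowed part of the first mapping surface is displaced radially to follow the topology of the scene (given here by an assumed per-vertex depth), and the shadow grayscale is applied as an attenuation of the background color; none of the names used below belongs to the disclosed syntax.

```python
import math

def deform_sphere_vertex(direction, radius, scene_depth=None):
    """Place a vertex of the first mapping surface, pulling it to the scene depth where shadow applies."""
    r = scene_depth if scene_depth is not None else radius
    return tuple(r * c for c in direction)

def apply_shadow(bg_rgb, shadow_gray_255):
    """Darken the background color with a shadow grayscale value (255 = no shadow, 0 = full shadow)."""
    k = shadow_gray_255 / 255.0
    return tuple(round(c * k) for c in bg_rgb)

# A vertex of the shadowed part is moved from the sphere (radius 10.0) onto the ground at depth 4.2,
# and the background texel under it is attenuated by the shadow map.
d = (0.0, -1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
print(deform_sphere_vertex(d, 10.0, scene_depth=4.2))
print(apply_shadow((180, 170, 160), 120))
```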
  • the present disclosure also relates to a method (and a device configured) for displaying images rendered from the data stream comprising the information representative of the object of the scene and to a method (and a device configured) for rendering and displaying the object with a flat video.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a computer program product, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, Smartphones, tablets, computers, mobile phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, texture processing, and other processing of images and related texture information and/or depth information.
  • examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”).
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method and a device for encoding data representative of an omnidirectional video into a bitstream, comprising encoding data representative of a first mapping surface into a first layer of the bitstream, the first layer referring to a first video track that comprises data corresponding to the background of the omnidirectional video, and encoding shadow information associated with at least a part of the first mapping surface into the bitstream, said part being deformed to correspond to the topology of at least a part of a scene represented in the first video and comprising a shadow. The present invention also relates to a corresponding decoding method and apparatus.
PCT/EP2017/075620 2016-10-12 2017-10-09 Method, apparatus and stream for encoding transparency and shadow information of an immersive video format WO2018069215A1 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
EP16306346.4A EP3310052A1 (fr) 2016-10-12 2016-10-12 Method, apparatus and stream for immersive video format
EP16306347.2A EP3310057A1 (fr) 2016-10-12 2016-10-12 Method, apparatus and stream for encoding shadow and transparency information for an immersive video format
EP16306347.2 2016-10-12
EP16306346.4 2016-10-12
EP16306348.0A EP3310053A1 (fr) 2016-10-12 2016-10-12 Method and apparatus for encoding transparency information for an immersive video format
EP16306348.0 2016-10-12

Publications (1)

Publication Number Publication Date
WO2018069215A1 true WO2018069215A1 (fr) 2018-04-19

Family

ID=60037620

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/075620 WO2018069215A1 (fr) 2017-10-09 Method, apparatus and stream for encoding transparency and shadow information of an immersive video format

Country Status (1)

Country Link
WO (1) WO2018069215A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110211043A1 (en) * 2008-11-04 2011-09-01 Koninklijke Philips Electronics N.V. Method and system for encoding a 3d image signal, encoded 3d image signal, method and system for decoding a 3d image signal
US20140178029A1 (en) * 2012-12-26 2014-06-26 Ali Fazal Raheman Novel Augmented Reality Kiosks
US20150215623A1 (en) * 2014-01-24 2015-07-30 Lucasfilm Entertainment Company Ltd. Dynamic lighting capture and reconstruction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DALAI FELINTO ET AL: "Production framework for full panoramic scenes with photorealistic augmented reality", INFORMATICA (CLEI), 2012 XXXVIII CONFERENCIA LATINOAMERICANA EN, IEEE, 1 October 2012 (2012-10-01), pages 1 - 10, XP032330805, ISBN: 978-1-4673-0794-9, DOI: 10.1109/CLEI.2012.6427123 *
NICOLAS H: "Scalable Video Compression Scheme for Tele-Surveillance Applications Based on Cast Shadow Detection and Modelling", IMAGE PROCESSING, 2005. ICIP 2005. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA,IEEE, vol. 3, 11 September 2005 (2005-09-11), pages 493 - 496, XP010852697, ISBN: 978-0-7803-9134-5, DOI: 10.1109/ICIP.2005.1530436 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020141995A1 (fr) * 2019-01-03 2020-07-09 Telefonaktiebolaget Lm Ericsson (Publ) Support of augmented reality in an omnidirectional media format
WO2023083218A1 (fr) * 2021-11-11 2023-05-19 华为技术有限公司 Method for displaying a highlighted image when screen mirroring is set up, and related apparatus and system

Similar Documents

Publication Publication Date Title
JP7241018B2 (ja) Method, apparatus and stream for immersive video format
KR102600011B1 (ko) Methods and devices for encoding and decoding three degrees of freedom and volumetric-compatible video streams
CN109478344B (zh) Method and device for compositing images
KR20200065076A (ko) Method, apparatus and stream for volumetric video format
US20180165830A1 (en) Method and device for determining points of interest in an immersive content
KR20200083616A (ko) Method, apparatus and stream for encoding/decoding volumetric video
US20220094903A1 (en) Method, apparatus and stream for volumetric video format
US20190268584A1 (en) Methods, devices and stream to provide indication of mapping of omnidirectional images
JP2021502033A (ja) Method, apparatus and stream for encoding/decoding volumetric video
EP3562159A1 (fr) Method, apparatus and stream for volumetric video format
KR20190046850A (ko) Method, apparatus and stream for immersive video format
US20200045342A1 (en) Methods, devices and stream to encode global rotation motion compensated images
EP3547703A1 (fr) Method, apparatus and stream for volumetric video format
WO2018069215A1 (fr) Method, apparatus and stream for encoding transparency and shadow information of an immersive video format
WO2015185537A1 (fr) Method and device for reconstructing the face of a user wearing a head-mounted display
JP2023506832A (ja) Volumetric video with auxiliary patches
KR102607709B1 (ko) Methods and devices for encoding and decoding three degrees of freedom and volumetric-compatible video streams
EP3310057A1 (fr) Method, apparatus and stream for encoding shadow and transparency information for an immersive video format
EP3310053A1 (fr) Method and apparatus for encoding transparency information for an immersive video format
EP3310052A1 (fr) Method, apparatus and stream for immersive video format
EP3709659A1 (fr) Method and apparatus for encoding and decoding volumetric video
US20230032599A1 (en) Methods and apparatuses for encoding, decoding and rendering 6dof content from 3dof+ composed elements
US20230217006A1 (en) A method and apparatuses for delivering a volumetric video content
US20230215080A1 (en) A method and apparatus for encoding and decoding volumetric video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17780749

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17780749

Country of ref document: EP

Kind code of ref document: A1