WO2020068935A1 - Virtual reality viewpoint viewport center point correspondence signaling - Google Patents

Virtual reality viewpoint viewport center point correspondence signaling

Info

Publication number
WO2020068935A1
Authority
WO
WIPO (PCT)
Prior art keywords
viewport
viewpoint
video
viewpoints
correspondence
Prior art date
Application number
PCT/US2019/052894
Other languages
English (en)
Inventor
Ye-Kui Wang
Yuqun FAN
Peiyun Di
Original Assignee
Futurewei Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Futurewei Technologies, Inc. filed Critical Futurewei Technologies, Inc.
Publication of WO2020068935A1

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
            • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
              • H04N 13/106 Processing image signals
                • H04N 13/128 Adjusting depth or disparity
                • H04N 13/161 Encoding, multiplexing or demultiplexing different image signal components
            • H04N 2013/0074 Stereoscopic image analysis
              • H04N 2013/0081 Depth or disparity estimation from stereoscopic image signals
          • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
              • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
          • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N 21/21 Server components or server architectures
                • H04N 21/218 Source of audio or video content, e.g. local disk arrays
                  • H04N 21/21805 Source of audio or video content, e.g. local disk arrays, enabling multiple viewpoints, e.g. using a plurality of cameras
              • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
                • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
                  • H04N 21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
                    • H04N 21/234345 Reformatting operations performed only on part of the stream, e.g. a region of the image or a time segment
                • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
                  • H04N 21/2365 Multiplexing of several video streams
            • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
                  • H04N 21/4347 Demultiplexing of several video streams
              • H04N 21/47 End-user applications
                • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
                  • H04N 21/4728 End-user interface for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
            • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N 21/81 Monomedia components thereof
                • H04N 21/816 Monomedia components thereof involving special video data, e.g. 3D video

Definitions

  • the present disclosure is generally related to virtual reality (VR), also referred to as omnidirectional media, immersive media, and 360 degree video, and is specifically related to mechanisms for signaling viewport center point correspondences between VR video viewpoints.
  • VR virtual reality
  • HMD head mounted displays
  • VR video often also referred to as 360 degree video or omnidirectional video
  • Example applications include gaming, training, education, sports video, online shopping, adult entertainment, and so on.
  • the disclosure includes a method implemented in a decoder.
  • the method comprises processing a virtual reality (VR) video stream, wherein the VR video stream comprises a plurality of viewpoints included in a viewpoint set, wherein each of the viewpoints corresponds to one particular omnidirectional video camera for capturing an omnidirectional video at a particular location, and wherein the VR video stream contains information indicative of a plurality of viewport centers for the viewpoints in the viewpoint set.
  • the method further comprises presenting a first viewport of a first viewpoint of the viewpoint set to a user.
  • the method further comprises switching from the first viewpoint to a second viewpoint of the viewpoint set.
  • the method further comprises determining a second viewport of the second viewpoint based on the information indicative of a plurality of viewport centers for the viewpoints in the viewpoint set.
  • In some video coding systems, viewpoints receive a default viewport. The effect is that a user looking at a first object at a first viewpoint via a first viewport can switch to a second viewpoint. However, when the switch is made, the user has to manually reorient from the default viewport to a second viewport in order to find the object being watched.
  • the present disclosure includes a mechanism to signal correspondences between viewport centers of related viewpoints. In this way, a user viewing an object at a first viewpoint can automatically be reoriented to that object upon switching to a second viewpoint based on the viewport center correspondences between the viewpoints.
  • the mechanisms support increased functionality at the decoder. Further, some systems indicate correspondences between viewports according to spatial regions. However, viewpoint center points can be encoded using less data than encoding spatial regions. As such, the present disclosure supports increased coding efficiency. Hence, the disclosed mechanisms provide for reduced memory usage at the encoder and the decoder, as well as the reduced network resource usage to communicate such data.
  • another implementation of the aspect provides, wherein the information indicative of the plurality of viewport centers indicates that the second viewport and the first viewport have corresponding viewport centers.
  • another implementation of the aspect provides, wherein the information indicative of the plurality of viewport centers is coded as a pair in a viewport center point correspondence (vcpc) sample entry.
  • vcpc viewport center point correspondence
  • another implementation of the aspect provides, wherein the information indicative of the plurality of viewport centers is coded as a set containing a plurality of viewport centers in a vcpc sample entry.
  • another implementation of the aspect provides, wherein the information indicative of the plurality of viewport centers is coded in a timed metadata track related to the plurality of viewpoints.
  • another implementation of the aspect provides, wherein the information indicative of the plurality of viewport centers is coded in a sphere region structure.
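  • A minimal sketch of how such a pair or set of viewport centers could be modeled is shown below; the field names and the use of azimuth/elevation angles for the center points are assumptions for illustration, not the normative vcpc sample entry syntax.

```python
# Hypothetical sketch (not the normative OMAF syntax): one way to model a
# viewport center point correspondence entry carried in a 'vcpc' sample entry.
from dataclasses import dataclass
from typing import List

@dataclass
class ViewportCenter:
    viewpoint_id: int        # identifier of the viewpoint (omnidirectional camera)
    center_azimuth: float    # azimuth of the viewport center, in degrees
    center_elevation: float  # elevation of the viewport center, in degrees

@dataclass
class VcpcEntry:
    """A pair or a larger set of viewport centers that point at the same
    location in the scene from different viewpoints."""
    centers: List[ViewportCenter]

# Example: the same court position seen from two viewpoints of a basketball game.
entry = VcpcEntry(centers=[
    ViewportCenter(viewpoint_id=1, center_azimuth=30.0, center_elevation=-5.0),
    ViewportCenter(viewpoint_id=2, center_azimuth=-140.0, center_elevation=-7.5),
])
```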
  • the disclosure includes a method implemented in an encoder.
  • the method comprises receiving, at a processor of the encoder, a VR video signal filmed from a plurality of viewpoints.
  • the method further comprises determining, by the processor, a correspondence between viewport centers for the viewpoints.
  • the method further comprises encoding, by the processor, the correspondence between the viewport centers for the viewpoints in a bitstream.
  • the method further comprises transmitting, by a transmitter of the encoder, the bitstream containing the correspondence between the viewport centers for the viewpoints to support viewpoint transitions when displaying the VR video signal.
  • In some video coding systems, viewpoints receive a default viewport. The effect is that a user looking at a first object at a first viewpoint via a first viewport can switch to a second viewpoint.
  • the present disclosure includes a mechanism to signal correspondences between viewport centers of related viewpoints. In this way, a user viewing an object at a first viewpoint can automatically be reoriented to that object upon switching to a second viewpoint based on the viewport center correspondences between the viewpoints. Accordingly, the mechanisms support increased functionality at the decoder. Further, some systems indicate correspondences between viewports according to spatial regions. However, viewpoint center points can be encoded using less data than encoding spatial regions. As such, the present disclosure supports increased coding efficiency. Hence, the disclosed mechanisms provide for reduced memory usage at the encoder and the decoder, as well as the reduced network resource usage to communicate such data.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded as a pair in a vcpc sample entry.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded as a set containing a plurality of viewport centers in a vcpc sample entry.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded in a timed metadata track related to the plurality of viewpoints.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded in a sphere region structure.
  • the disclosure includes a method implemented in a decoder. The method comprises receiving, by a receiver of the decoder, a bitstream including at least a portion of a coded VR video filmed from a plurality of viewpoints and including a correspondence between viewport centers for the viewpoints.
  • the method further comprises decoding, by a processor of the decoder, the portion of the VR video at a center point of a source viewport at a source viewpoint.
  • the method further comprises forwarding, by the processor, the portion of the VR video at the source viewport toward a display.
  • the method further comprises determining, by the processor, to switch from the source viewpoint to a destination viewpoint.
  • the method further comprises determining, by the processor, a destination viewport at the destination viewpoint based on the source viewport and the correspondence between viewport centers for the viewpoints.
  • the method further comprises decoding, by the processor, the portion of the VR video at a center point of the destination viewport at the destination viewpoint.
  • the method further comprises forwarding, by the processor, the portion of the VR video at the destination viewport toward the display.
  • In some video coding systems, viewpoints receive a default viewport.
  • The effect is that a user looking at a first object at a first viewpoint via a first viewport can switch to a second viewpoint.
  • However, when the switch is made, the user has to manually reorient from the default viewport to a second viewport in order to find the object being watched.
  • the present disclosure includes a mechanism to signal correspondences between viewport centers of related viewpoints. In this way, a user viewing an object at a first viewpoint can automatically be reoriented to that object upon switching to a second viewpoint based on the viewport center correspondences between the viewpoints. Accordingly, the mechanisms support increased functionality at the decoder. Further, some systems indicate correspondences between viewports according to spatial regions.
  • viewpoint center points can be encoded using less data than encoding spatial regions.
  • the present disclosure supports increased coding efficiency.
  • the disclosed mechanisms provide for reduced memory usage at the encoder and the decoder, as well as the reduced network resource usage to communicate such data.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded as a pair in a vcpc sample entry.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded as a set containing a plurality of viewport centers in a vcpc sample entry.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded in a timed metadata track related to the plurality of viewpoints.
  • another implementation of the aspect provides, wherein the correspondence between the viewport centers is coded in a sphere region structure.
  • the disclosure includes a video coding device comprising a processor, a receiver coupled to the processor, and a transmitter coupled to the processor, the processor, receiver, and transmitter configured to perform the method of any of the preceding aspects.
  • the disclosure includes a non-transitory computer readable medium comprising a computer program product for use by a video coding device, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium such that, when executed by a processor, the instructions cause the video coding device to perform the method of any of the preceding aspects.
  • the disclosure includes an encoder comprising a receiving means for receiving a VR video signal filmed from a plurality of viewpoints.
  • the encoder further comprises a correspondence determination means for determining a correspondence between viewport centers for the viewpoints.
  • the encoder further comprises an encoding means for encoding the correspondence between the viewport centers for the viewpoints in a bitstream.
  • the encoder further comprises a transmitting means for transmitting the bitstream containing the correspondence between the viewport centers for the viewpoints to support viewpoint transitions when displaying the VR video signal.
  • the disclosure includes a decoder comprising a receiving means for receiving a bitstream including at least a portion of a coded VR video filmed from a plurality of viewpoints and including a correspondence between viewport centers for the viewpoints.
  • the decoder further comprises a decoding means for decoding the portion of the VR video at a center point of a source viewport at a source viewpoint, and decoding the portion of the VR video at a center point of a destination viewport at a destination viewpoint.
  • the decoder further comprises a determining means for determining to switch from the source viewpoint to the destination viewpoint, and determining the destination viewport at the destination viewpoint based on the source viewport and the correspondence between viewport centers for the viewpoints.
  • the decoder further comprises a forwarding means for forwarding the portion of the VR video at the source viewport toward a display, and forwarding the portion of the VR video at the destination viewport toward the display.
  • any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
  • FIG. 1 is a schematic diagram of an example system for VR based video coding.
  • FIG. 2 is a flowchart of an example method of coding a VR picture bitstream.
  • FIG. 3 is a flowchart of an example method of coding a video signal.
  • FIG. 4 is a schematic diagram of an example coding and decoding (codec) system for video coding.
  • FIG. 5 is a schematic diagram illustrating an example video encoder.
  • FIG. 6 is a schematic diagram illustrating an example video decoder.
  • FIG. 7 is a schematic diagram illustrating an example system for capturing VR video from multiple viewpoints.
  • FIG. 8 is a schematic diagram of a pair of viewpoints with corresponding viewport center points.
  • FIG. 9 is a schematic diagram of a set of viewpoints with corresponding viewport center points.
  • FIG. 10 is a schematic diagram of an example VR video file for multiple viewpoints.
  • FIG. 11 is an embodiment of a method of displaying a VR video at a decoder based on a viewport center point correspondence between multiple viewpoints.
  • FIG. 12 is another embodiment of a method of displaying a VR video at a decoder based on a viewport center point correspondence between multiple viewpoints.
  • FIG. 13 is an embodiment of a method of signaling a viewport center point correspondence between multiple viewpoints in a VR video from an encoder.
  • FIG. 14 is a schematic diagram of an example video coding device.
  • FIG. 15 is a schematic diagram of an embodiment of a system for signaling a viewport center point correspondence between multiple viewpoints in a VR video.
  • Video coding standards include International Telecommunication Union Telecommunication Standardization Sector (ITU-T) document H.261, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Motion Picture Experts Group (MPEG)-1 Part 2, ITU-T H.262 or ISO/IEC MPEG-2 Part 2, ITU-T H.263, ISO/IEC MPEG-4 Part 2, Advanced Video Coding (AVC), also known as ITU-T H.264 or ISO/IEC MPEG-4 Part 10, and High Efficiency Video Coding (HEVC), also known as ITU-T H.265 or MPEG-H Part 2.
  • ITU-T International Telecommunication Union Telecommunication Standardization Sector
  • ISO/IEC International Organization for Standardization/International Electrotechnical Commission
  • MPEG Motion Picture Experts Group
  • ITU-T H.263; ISO/IEC MPEG-4 Part 2
  • AVC Advanced Video Coding
  • HEVC High Efficiency Video Coding
  • AVC includes extensions such as Scalable Video Coding (SVC), Multiview Video Coding (MVC) and Multiview Video Coding plus Depth (MVC+D), and three dimensional (3D) AVC (3D-AVC).
  • HEVC includes extensions such as Scalable HEVC (SHVC), Multiview HEVC (MV-HEVC), and 3D HEVC (3D-HEVC).
  • File format standards include the ISO base media file format (ISOBMFF) (ISO/IEC 14496-12, hereinafter “ISO/IEC 14496-12”) and other file format standards derived from ISOBMFF, including the MPEG-4 file format (ISO/IEC 14496-14), the 3rd Generation Partnership Project (3GPP) file format (3GPP TS 26.244), and the AVC file format (ISO/IEC 14496-15, hereinafter “ISO/IEC 14496-15”).
  • ISO/IEC 14496-12 specifies the ISO base media file format.
  • Other documents extend the ISO base media file format for specific applications. For instance, ISO/IEC 14496-15 describes the carriage of Network Abstraction Layer (NAL) unit structured video in the ISO base media file format.
  • NAL Network Abstraction Layer
  • H.264/AVC and HEVC are examples of NAL unit structured video.
  • ISO/IEC 14496-15 includes sections describing the carriage of H.264/AVC NAL units. Additionally, section 8 of ISO/IEC 14496-15 describes the carriage of HEVC NAL units. Thus, section 8 of ISO/IEC 14496-15 is said to describe the HEVC file format.
  • ISOBMFF is used as the basis for many codec encapsulation formats, such as the AVC File Format, as well as for many multimedia container formats, such as the MPEG-4 File Format, the 3GPP File Format, and the DVB File Format.
  • codec encapsulation formats such as the AVC File Format
  • multimedia container formats such as the MPEG-4 File Format, the 3GPP File Format, and the DVB File Format.
  • static media such as images, as well as metadata, can be stored in a file conforming to ISOBMFF.
  • Files structured according to ISOBMFF may be used for many purposes, including local media file playback, progressive downloading of a remote file, segments for Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (DASH), containers for content to be streamed and corresponding packetization instructions, and recording of received real-time media streams.
  • DASH Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP)
  • ISOBMFF can be employed for streaming, e.g., for progressive download or DASH.
  • movie fragments defined in ISOBMFF can be used for streaming. In addition to continuous media, such as audio and video, static media, such as images, as well as metadata, can be stored in a file conforming to ISOBMFF.
  • Such file formats and streaming mechanisms can be employed to encode, signal, decode, and display a VR video.
  • a VR video can be recorded from multiple viewpoints.
  • a viewpoint is the position of a camera used to capture video.
  • multiple cameras can be positioned at multiple locations to record a scene, an event, etc.
  • such cameras may include a camera array and/or fisheye camera(s) capable of capturing wide angle video.
  • a VR camera mechanism can capture a sphere of video, or sub-portions thereof. Only a portion of the sphere may be displayed to a user. Upon viewing, a user can control a viewing orientation from the viewpoint.
  • a VR video can be taken of a basketball game from multiple viewpoints on, around, and/or above the court.
  • a user may be allowed to view the game from a viewpoint of choice and at an orientation/angle of choice from the selected viewpoint.
  • a default viewing orientation/angle can be employed for each viewpoint. Accordingly, when a user switches to a viewpoint, the decoder can employ the default angle to orient the user until the user can select the desired viewing orientation.
  • This implementation has certain drawbacks. For example, a user may wish to pay attention to a particular object in a scene, such as a basketball or a particular player in a basketball game.
  • the user’s viewing angle is reset to the default value each time the user switches between viewpoints. Accordingly, a user viewing a basketball at a first viewpoint would be reoriented to a default angle upon switching to a second viewpoint. This would likely result in losing sight of the basketball. The user would then likely have to search for the current location of the basketball from the new viewpoint.
  • the result is that default viewing orientations may create discontinuities in a user’s viewing experience and create a poor viewing experience in some cases.
  • viewport center points are also referred to herein as viewport centers.
  • video data related to the viewpoints may be included in tracks of a video file.
  • a timed metadata track that contains data relevant to multiple viewpoints can also be included in the video file.
  • Viewport center point correspondences between the viewpoints may be included in the timed metadata track and/or in the tracks containing video data related to the associated viewpoints.
  • Such information can indicate correspondences between viewpoint pairs and/or viewpoint sets. Specifically, such information can denote that a first viewport at a first viewpoint orients toward the same location in the VR space as a corresponding second viewport at a second viewpoint.
  • the correspondence is indicated by including pairs and/or sets of center points of the corresponding viewports at the associated viewpoints.
  • Signaling viewport center point correspondences may be employed as an alternative to viewpoint spatial region correspondences.
  • viewport center point correspondences can be encoded using fewer bits and associated actions can be computed more simply than when denoting similar information using viewpoint spatial region correspondences (e.g., described in terms of viewport boundaries and/or viewpoint angles). Using such information, a user can switch between viewpoints.
  • the decoder can check the relevant viewpoint and/or metadata track(s) to determine a correspondence between a center point of a source viewport at a source viewpoint and a center point of a destination viewport at a destination viewpoint.
  • the decoder can automatically orient the user toward a destination viewport at the destination viewpoint that corresponds to the source viewport selected by the user at the source viewpoint.
  • a user watching a basketball at a source viewpoint can be automatically oriented toward the basketball upon switching to the destination viewpoint.
  • This allows the decoder to provide a consistent view to a user upon switching between viewpoints.
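  • A sketch of this decoder-side lookup, under the assumption that each correspondence entry maps a viewpoint identifier to a viewport center given as (azimuth, elevation), might look as follows.

```python
# Hypothetical decoder-side lookup (structure assumed, not normative): given the
# user's current viewport center at the source viewpoint, pick the corresponding
# center at the destination viewpoint from the signaled correspondence entries.
def find_destination_center(entries, source_vp, source_center, dest_vp, tol=1.0):
    """entries: list of dicts mapping viewpoint_id -> (azimuth, elevation).
    source_center: (azimuth, elevation) of the viewport the user is watching."""
    for entry in entries:
        if source_vp not in entry or dest_vp not in entry:
            continue
        src_az, src_el = entry[source_vp]
        # Use the entry whose source center is closest to the current viewport.
        if abs(src_az - source_center[0]) <= tol and abs(src_el - source_center[1]) <= tol:
            return entry[dest_vp]
    return None  # no correspondence signaled: fall back to the default viewport

# Example: one correspondence pair between viewpoint 1 and viewpoint 2.
entries = [{1: (30.0, -5.0), 2: (-140.0, -7.5)}]
print(find_destination_center(entries, source_vp=1, source_center=(30.5, -5.2), dest_vp=2))
```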
  • a viewport center point correspondence (vcpc) sample function (VcpcSample()) and/or a vcpc sample entry function (VcpcSampleEntry()) can be employed to describe the viewport center correspondences between viewpoints.
  • VcpcSample() and/or VcpcSampleEntry() can be included in a sphere region structure (SphereRegionStruct()) object and/or in a sphere region sample (SphereRegionSample()) object, which can then be included in a timed metadata track and/or in corresponding video tracks.
  • SphereRegionStruct() sphere region structure
  • SphereRegionSample() sphere region sample
  • Such data can be included at the beginning of the relevant track to indicate initial correspondences between the viewpoints.
  • viewport center point correspondence information can be updated at a corresponding temporal location in the relevant track to denote such changes over time.
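  • One possible (non-normative) way to model such timed updates is sketched below; the track layout and sample contents are assumptions for illustration rather than the ISOBMFF box syntax.

```python
# Minimal sketch of timed correspondence updates (assumed layout, not the OMAF
# box syntax): the initial entry is given at the start of the metadata track and
# later samples override it when cameras or regions of interest move.
import bisect

class VcpcMetadataTrack:
    def __init__(self):
        self.times = []    # sample presentation times, in seconds, sorted
        self.samples = []  # each sample: list of {viewpoint_id: (azimuth, elevation)}

    def add_sample(self, t, correspondences):
        i = bisect.bisect(self.times, t)
        self.times.insert(i, t)
        self.samples.insert(i, correspondences)

    def correspondences_at(self, t):
        """Return the most recent correspondence set at or before time t."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.samples[i] if i >= 0 else []

track = VcpcMetadataTrack()
track.add_sample(0.0, [{1: (30.0, -5.0), 2: (-140.0, -7.5)}])   # initial correspondence
track.add_sample(12.0, [{1: (45.0, -4.0), 2: (-120.0, -6.0)}])  # updated later in the track
```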
  • FIG. 1 is a schematic diagram of an example system 100 for VR based video coding.
  • System 100 includes a multi-directional camera 101, a VR coding device 104 including an encoder 103, a decoder 107, and a rendering device 109.
  • the multi-directional camera 101 comprises an array of camera devices. Each camera device is pointed at a different angle so that the multi-directional camera 101 can take multiple directional video streams of the surrounding environment from a plurality of angles.
  • multi-directional camera 101 can take video of the environment as a sphere with the multi-directional camera 101 at the center of the sphere.
  • as used herein, sphere and spherical video refer to both a geometrical sphere and sub-portions of a geometrical sphere, such as spherical caps, spherical domes, spherical segments, etc.
  • a multi-directional camera 101 may take one hundred and eighty degree video to cover half of the environment so that a production crew can remain behind the multi-directional camera 101.
  • a multi-directional camera 101 can also take video in three hundred sixty degrees (or any sub-portion thereof). However, a portion of the floor under the multi-directional camera 101 may be omitted, which results in video of less than a perfect sphere.
  • sphere is a general term used for clarity of discussion and should not be considered limiting from a geometrical standpoint. It should be noted that in some examples a multi-directional camera 101 may include a camera that includes one or more fisheye lenses (e.g., instead of an array of cameras).
  • a VR coding device 104 may be a computing system including specialized VR coding software.
  • the VR coding device 104 may include an encoder 103 (a.k.a., a video encoder). In some examples, the encoder 103 can also be included in a separate computer system from the VR coding device 104.
  • the VR coding device 104 is configured to convert the multiple directional video streams into a single multiple directional video stream including the entire recorded area from all relevant angles. This conversion may be referred to as image stitching. For example, frames from each video stream that are captured at the same time can be stitched together to create a single spherical image. A spherical video stream can then be created from the spherical images.
  • For clarity of discussion, it should be noted that the terms frame, picture, and image may be used interchangeably herein unless specifically noted.
  • the spherical video stream can then be forwarded to the encoder 103 for compression.
  • An encoder 103 is a device and/or program capable of converting information from one format to another for purposes of standardization, speed, and/or compression.
  • Standardized encoders 103 are configured to encode rectangular and/or square images. Accordingly, the encoder 103 is configured to map each spherical image from the spherical video stream into a plurality of rectangular sub-pictures. The sub-pictures can then be placed in separate sub-picture video streams. As such, each sub-picture video stream displays a stream of images over time as recorded from a sub-portion of the spherical video stream.
  • the encoder 103 can then encode each sub-picture video stream to compress the video stream to a manageable file size.
  • the encoding process is discussed in more detail below.
  • the encoder 103 partitions each frame from each sub-picture video stream into pixel blocks, compresses the pixel blocks by inter-prediction and/or intra-prediction to create coding blocks including prediction blocks and residual blocks, applies transforms to the residual blocks for further compression, and applies various filters to the blocks.
  • the compressed blocks as well as corresponding syntax are stored in bitstream(s), for example in ISOBMFF and/or in omnidirectional media format (OMAF).
  • the VR coding device 104 may store the encoded bitstream(s) in memory, locally, and/or on a server, for communication to a decoder 107 on demand.
  • the data can be forwarded via a network 105, which may include the Internet, a mobile telecommunications network (e.g., a long term evolution (LTE) based data network), or another data communication system.
  • LTE long term evolution
  • the decoder 107 (a.k.a., a video decoder) is a device at the user’s location that is configured to reverse the coding process to reconstruct the sub-picture video streams from the encoded bitstream(s).
  • the decoder 107 also merges the sub-picture video streams to reconstruct the spherical video stream.
  • the spherical video stream, or sub-portions thereof, can then be forwarded to the rendering device 109.
  • the rendering device 109 is a device configured to display the spherical video stream to the user.
  • the rendering device 109 may include an HMD that attaches to the user’s head and covers the user’s eyes.
  • the rendering device 109 may include a screen for each eye, cameras, motion sensors, speakers, etc. and may communicate with the decoder 107 via wireless and/or wired connections.
  • the rendering device 109 may display a sub-portion of the spherical video stream to the user.
  • the sub-portion shown is based on a field of view (FOV) and/or viewport of the rendering device 109.
  • a FOV is the observable area of the recorded environment that is displayed to a user by the rendering device 109.
  • the FOV can be described as a conical projection between a user’s eye and extending into the virtual environment.
  • a viewport is a two dimensional plane upon which a three dimensional environment is projected.
  • a viewport describes the area of the virtual environment displayed on a screen or screens of a rendering device.
  • a FOV describes the portion of the virtual environment seen by the user.
  • viewport and FOV may be used interchangeably in many cases, but may include different technical details.
  • a FOV can be described in terms of pixels, coordinates, and/or bounds while a viewport can be described in terms of angles.
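  • As an illustration of that distinction, the hedged sketch below maps a viewport given in angles to an approximate pixel region of an equirectangular frame; the axis conventions and frame layout are assumptions for illustration.

```python
# Hedged sketch: mapping a viewport given in angles (center azimuth/elevation and
# angular extents) to an approximate pixel region of an equirectangular frame.
def viewport_to_pixel_region(center_az, center_el, h_fov, v_fov, width, height):
    cx = (center_az + 180.0) / 360.0 * width   # azimuth -180..180 -> column
    cy = (90.0 - center_el) / 180.0 * height   # elevation 90..-90 -> row
    half_w = h_fov / 360.0 * width / 2.0
    half_h = v_fov / 180.0 * height / 2.0
    return (int(cx - half_w), int(cy - half_h), int(cx + half_w), int(cy + half_h))

# A 90x90 degree viewport looking slightly below the horizon on a 4K ERP frame.
print(viewport_to_pixel_region(30.0, -5.0, 90.0, 90.0, 3840, 1920))
```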
  • the rendering device 109 may change the position of the FOV/viewport based on user head movement by employing the motion tracking sensors. This allows the user to see different portions of the spherical video stream depending on head movement.
  • the rendering device 109 may offset the FOV for each eye based on the user’s interpupillary distance (IPD) to create the impression of a three dimensional space.
  • IPD interpupillary distance
  • the rendering device 109 may be a computer screen or television screen that changes a FOV/viewport based on user input.
  • FIG. 2 is a flowchart of an example method 200 of coding a VR picture bitstream as a plurality of sub-picture bitstreams, for example by employing the components of system 100.
  • a multi-directional camera set such as multi-directional camera 101
  • the multiple directional video streams include views of an environment at various angles.
  • the multiple directional video streams may capture video from three hundred sixty degrees, one hundred eighty degrees, two hundred forty degrees, etc. around the camera in the horizontal plane.
  • the multiple directional video streams may also capture video from three hundred sixty degrees, one hundred eighty degrees, two hundred forty degrees, etc. around the camera in the vertical plane.
  • the result is to create video that includes information sufficient to cover a spherical area around the camera over some period of time.
  • each directional video stream includes a series of images taken at a corresponding angle.
  • the multiple directional video streams are synchronized by ensuring frames from each directional video stream that were captured at the same time domain position are processed together.
  • the frames from the directional video streams can then be stitched together in the space domain to create a spherical video stream.
  • each frame of the spherical video stream contains data taken from the frames of all the directional video streams that occur at a common temporal position.
  • a fisheye lens may capture a single video stream at a wide angle.
  • a single multi directional stream may be captured at step 201, which may allow step 203 to be omitted in some cases.
  • the spherical video stream is mapped into rectangular sub-picture video streams.
  • This process may also be referred to as projecting the spherical video stream into rectangular sub-picture video streams.
  • encoders and decoders are generally designed to encode rectangular and/or square frames. Accordingly, mapping the spherical video stream into rectangular sub-picture video streams creates video streams that can be encoded and decoded by non-VR specific encoders and decoders, respectively. It should be noted that steps 203 and 205 are specific to VR video processing, and hence may be performed by specialized VR hardware, software, or combinations thereof.
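  • Assuming the spherical frame has been projected to an equirectangular picture, the mapping of step 205 can be sketched as a simple tiling of the projected frame into rectangular sub-pictures, as shown below; the grid layout is an assumption for illustration.

```python
# Sketch of step 205 under the assumption of an equirectangular projection:
# split the projected frame into a grid of rectangular sub-pictures, one per
# sub-picture video stream.
import numpy as np

def split_into_subpictures(erp_frame, rows, cols):
    """erp_frame: HxWx3 array. Returns {(row, col): sub-picture array}."""
    h, w = erp_frame.shape[0] // rows, erp_frame.shape[1] // cols
    return {(r, c): erp_frame[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)}

frame = np.zeros((1920, 3840, 3), dtype=np.uint8)      # one equirectangular frame
tiles = split_into_subpictures(frame, rows=4, cols=8)  # 32 sub-picture streams
```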
  • the rectangular sub-picture video streams can be forwarded to an encoder, such as encoder 103.
  • the encoder then encodes the sub-picture video streams as sub-picture bitstreams in a corresponding media file format.
  • each sub-picture video stream can be treated by the encoder as a video signal.
  • the encoder can encode each frame of each sub-picture video stream via inter-prediction, intra-prediction, etc. Such encoding and corresponding decoding as well as encoders and decoders are discussed in detail with respect to the FIGS below.
  • the sub-picture video streams can be stored in ISOBMFF. For example, the sub-picture video streams are captured at a specified resolution.
  • the sub-picture video streams can then be downsampled to various lower resolutions for encoding.
  • Each resolution can be referred to as a representation.
  • Lower quality representations lose image clarity while reducing file size. Accordingly, lower quality representations can be transmitted to a user using fewer network resources (e.g., time, bandwidth, etc.) than higher quality representations with an attendant loss of visual quality.
  • Each representation can be stored in a corresponding set of tracks. Hence, tracks can be sent to a user, where the tracks include the sub-picture bitstreams at various resolutions (e.g., visual quality).
  • the sub-picture bitstreams can be sent to the decoder as tracks.
  • all sub-picture bitstreams are transmitted at the same quality by transmitting tracks from the same representation.
  • the tracks containing sub-picture bitstreams with data in the user’s FOV may be sent at higher resolutions by selecting higher quality representations.
  • Tracks containing sub-picture bitstreams with areas outside the user’s FOV can be sent at progressively lower resolutions by selecting lower quality representations. This may be referred to as viewport dependent coding.
  • the tracks may include relatively short video segments (e.g., about three seconds), and hence the representations selected for particular areas of the video can change over time based on changes in FOV. This allows quality to change as the user’s FOV changes.
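  • The sketch below illustrates this kind of viewport dependent selection; the three-step quality ladder and tile layout are illustrative assumptions.

```python
# Illustrative sketch of viewport-dependent representation selection: tiles
# inside the FOV get the highest-quality representation, neighbouring tiles a
# medium one, and all remaining tiles the lowest one.
def select_representations(tiles_in_fov, all_tiles, ladder=("high", "medium", "low")):
    choices = {}
    for tile in all_tiles:
        if tile in tiles_in_fov:
            choices[tile] = ladder[0]          # full quality inside the viewport
        elif any(abs(tile[0] - t[0]) + abs(tile[1] - t[1]) == 1 for t in tiles_in_fov):
            choices[tile] = ladder[1]          # medium quality for adjacent tiles
        else:
            choices[tile] = ladder[-1]         # lowest quality far from the viewport
    return choices

all_tiles = [(r, c) for r in range(4) for c in range(8)]
print(select_representations({(1, 2), (1, 3), (2, 2), (2, 3)}, all_tiles))
```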
  • a decoder, such as decoder 107, receives the tracks containing the sub-picture bitstreams.
  • the decoder can then decode the sub-picture bitstreams into sub-picture video streams for display.
  • the decoding process involves the reverse of the encoding process (e.g., using inter-prediction and intra-prediction), and is discussed in more detail with respect to the FIGS below.
  • the decoder can merge the sub-picture video streams into the spherical video stream for presentation on a rendering device.
  • the decoder can employ a so-called lightweight merging algorithm that selects frames from each sub-picture video stream that occur at the same presentation time and merges them together based on the position and/or angle associated with the corresponding sub-picture video stream.
  • the decoder may also employ filters to smooth edges between the sub-picture video streams, remove artifacts, etc.
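  • A minimal sketch of such a merge, assuming the sub-pictures form a regular grid, is shown below.

```python
# Minimal sketch of the lightweight merge at step 213 (regular grid assumed):
# sub-picture frames that share a presentation time are pasted back into one
# spherical (equirectangular) frame according to their known grid positions.
import numpy as np

def merge_subpictures(tiles, rows, cols):
    """tiles: {(row, col): HxWx3 array} decoded at the same presentation time."""
    th, tw = next(iter(tiles.values())).shape[:2]
    frame = np.zeros((rows * th, cols * tw, 3), dtype=np.uint8)
    for (r, c), tile in tiles.items():
        frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = tile
    return frame
```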
  • the decoder can then forward the spherical video stream to a rendering device, such as rendering device 109.
  • the rendering device renders a viewport of the spherical video stream for presentation to the user.
  • areas of the spherical video stream outside of the FOV at each point in time are not rendered.
  • the user can select and view a sub-portion of the virtual environment as recorded, and hence can experience the virtual environment as if present at the time of recording.
  • FIG. 3 is a flowchart of an example method 300 of coding a video signal.
  • method 300 may receive a plurality of sub-picture video streams from step 205 of method 200.
  • Method 300 treats each sub-picture video stream as a video signal input.
  • Method 300 applies steps 301-317 to each sub-picture video stream in order to implement steps 207- 211 of method 200.
  • the output video signal from method 300 includes the decoded sub picture video streams, which can be merged and displayed according to steps 213 and 215 of method 200.
  • method 300 can be implemented on a system 100.
  • Method 300 encodes a video signal, for example including sub-picture video streams, at an encoder.
  • the encoding process compresses the video signal by employing various mechanisms to reduce the video file size. A smaller file size allows the compressed video file to be transmitted toward a user, while reducing associated bandwidth overhead.
  • the decoder then decodes the compressed video file to reconstruct the original video signal for display to an end user.
  • the decoding process generally mirrors the encoding process to allow the decoder to consistently reconstruct the video signal.
  • the video signal is input into the encoder.
  • the video signal may be an uncompressed video file stored in memory.
  • the video file may be captured by a video capture device, such as a video camera, and encoded to support live streaming of the video.
  • the video file may include both an audio component and a video component.
  • the video component contains a series of image frames that, when viewed in a sequence, gives the visual impression of motion.
  • the frames contain pixels that are expressed in terms of light, referred to herein as luma components (or luma samples), and color, which is referred to as chroma components (or color samples). It should be noted that a frame may also be referred to as a picture, a sub-frame as a sub-picture, etc.
  • the video signal is partitioned into blocks.
  • Partitioning includes subdividing the pixels in each frame into square and/or rectangular blocks for compression.
  • HEVC also known as H.265 and MPEG-H Part 2
  • the frame can first be divided into coding tree units (CTUs), which are blocks of a predefined size (e.g., sixty four pixels by sixty four pixels).
  • CTUs coding tree units
  • Coding trees may be employed to divide the CTUs into blocks and then recursively subdivide the blocks until configurations are achieved that support further encoding.
  • luma components of a frame may be subdivided until the individual blocks contain relatively homogenous lighting values.
  • chroma components of a frame may be subdivided until the individual blocks contain relatively homogenous color values. Accordingly, partitioning mechanisms vary depending on the content of the video frames.
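  • The following sketch illustrates the idea of recursive splitting driven by block homogeneity; the variance threshold and the pure quad split are simplifications for illustration, not the HEVC partitioning rules.

```python
# Conceptual sketch of coding-tree partitioning: a 64x64 CTU is split
# recursively while the block's sample values remain too inhomogeneous to code
# as a single unit.
import numpy as np

def quadtree_split(block, x=0, y=0, min_size=8, var_threshold=100.0):
    """Return a list of (x, y, size) leaf blocks for one luma CTU."""
    size = block.shape[0]
    if size <= min_size or np.var(block) < var_threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            sub = block[dy:dy + half, dx:dx + half]
            leaves += quadtree_split(sub, x + dx, y + dy, min_size, var_threshold)
    return leaves

ctu = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(len(quadtree_split(ctu)), "leaf blocks")
```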
  • inter-prediction and/or intra-prediction may be employed.
  • Inter-prediction is designed to take advantage of the fact that objects in a common scene tend to appear in successive frames. Accordingly, a block depicting an object in a reference frame need not be repeatedly described in adjacent frames. Specifically, an object, such as a table, may remain in a constant position over multiple frames. Hence, the table is described once and adjacent frames can refer back to the reference frame.
  • Pattern matching mechanisms may be employed to match objects over multiple frames. Further, moving objects may be represented across multiple frames, for example due to object movement or camera movement. As a particular example, a video may show an automobile that moves across the screen over multiple frames.
  • Motion vectors can be employed to describe such movement, or lack thereof.
  • a motion vector is a two-dimensional vector that provides an offset from the coordinates of an object in a frame to the coordinates of the object in a reference frame.
  • inter-prediction can encode an image block in a current frame as a set of motion vectors indicating an offset from a corresponding block in a reference frame.
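  • The sketch below illustrates this idea with integer-pel motion compensation; bounds checking and sub-pel interpolation are omitted for brevity.

```python
# Sketch of the inter-prediction idea (integer-pel only, bounds unchecked): a
# block in the current frame is represented by a motion vector into a reference
# frame plus a residual block.
import numpy as np

def predict_block(reference, x, y, mv, size=16):
    dx, dy = mv                               # motion vector (horizontal, vertical)
    return reference[y + dy:y + dy + size, x + dx:x + dx + size]

def encode_block(current, reference, x, y, mv, size=16):
    pred = predict_block(reference, x, y, mv, size)
    residual = current[y:y + size, x:x + size].astype(np.int16) - pred.astype(np.int16)
    return mv, residual                        # only these are coded, not the raw pixels

def decode_block(reference, x, y, mv, residual, size=16):
    return predict_block(reference, x, y, mv, size).astype(np.int16) + residual
```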
  • Intra-prediction encodes blocks in a common frame. Intra-prediction takes advantage of the fact that luma and chroma components tend to cluster in a frame. For example, a patch of green in a portion of a tree tends to be positioned adjacent to similar patches of green. Intra-prediction employs multiple directional prediction modes (e.g., thirty-three in HEVC), a planar mode, and a direct current (DC) mode. The directional modes indicate that a current block is similar/the same as samples of a neighbor block in a corresponding direction. Planar mode indicates that a series of blocks along a row/column (e.g., a plane) can be interpolated based on neighbor blocks at the edges of the row.
  • a row/column e.g., a plane
  • Planar mode in effect, indicates a smooth transition of light/color across a row/column by employing a relatively constant slope in changing values.
  • DC mode is employed for boundary smoothing and indicates that a block is similar/the same as an average value associated with samples of all the neighbor blocks associated with the angular directions of the directional prediction modes.
  • intra-prediction blocks can represent image blocks as various relational prediction mode values instead of the actual values.
  • inter-prediction blocks can represent image blocks as motion vector values instead of the actual values. In either case, the prediction blocks may not exactly represent the image blocks in some cases. Any differences are stored in residual blocks. Transforms may be applied to the residual blocks to further compress the file.
  • various filtering techniques may be applied.
  • the filters are applied according to an in-loop filtering scheme.
  • the block based prediction discussed above may result in the creation of blocky images at the decoder. Further, the block based prediction scheme may encode a block and then reconstruct the encoded block for later use as a reference block.
  • the in-loop filtering scheme iteratively applies noise suppression filters, de-blocking filters, adaptive loop filters, and sample adaptive offset (SAO) filters to the blocks/frames. These filters mitigate such blocking artifacts so that the encoded file can be accurately reconstructed. Further, these filters mitigate artifacts in the reconstructed reference blocks so that artifacts are less likely to create additional artifacts in subsequent blocks that are encoded based on the reconstructed reference blocks.
  • bitstream includes the data discussed above as well as any signaling data (e.g., syntax) desired to support proper video signal reconstruction at the decoder.
  • data may include partition data, prediction data, residual blocks, and various flags providing coding instructions to the decoder.
  • the bitstream may be stored in memory for transmission toward a decoder upon request, for example as a track and/or track fragment in ISOBMFF.
  • the bitstream may also be broadcast and/or multicast toward a plurality of decoders.
  • the creation of the bitstream is an iterative process. Accordingly, steps 301, 303, 305, 307, and 309 may occur continuously and/or simultaneously over many frames and blocks. The order shown is presented for clarity and ease of discussion, and is not intended to limit the video coding process to a particular order.
  • the decoder receives the bitstream and begins the decoding process at step 311.
  • the decoder can employ an entropy decoding scheme to convert the bitstream into corresponding syntax and video data.
  • the decoder employs the syntax data from the bitstream to determine the partitions for the frames at step 311. The partitioning should match the results of block partitioning at step 303.
  • Entropy encoding/decoding which may be employed in step 311, is now described.
  • the encoder makes many choices during the compression process, such as selecting block partitioning schemes from several possible choices based on the spatial positioning of values in the input image(s). Signaling the exact choices may employ a large number of bins.
  • a bin is a binary value that is treated as a variable (e.g., a bit value that may vary depending on context).
  • Entropy coding allows the encoder to discard any options that are clearly not viable for a particular case, leaving a set of allowable options.
  • Each allowable option is then assigned a code word. The length of the code word is based on the number of allowable options (e.g., one bin for two options, two bins for three to four options, etc.).
  • the encoder then encodes the code word for the selected option. This scheme reduces the size of the code words as the code words are as big as desired to uniquely indicate a selection from a small sub-set of allowable options as opposed to uniquely indicating the selection from a potentially large set of all possible options.
  • the decoder then decodes the selection by determining the set of allowable options in a similar manner to the encoder. By determining the set of allowable options, the decoder can read the code word and determine the selection made by the encoder.
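  • A small worked example of the codeword-length idea (simplified fixed-length coding rather than CABAC itself) is given below.

```python
# Worked example of the idea above: the codeword only needs to distinguish the
# allowable options that survive after context-based elimination.
import math

def codeword_bins(num_allowable_options):
    """One bin for two options, two bins for three or four options, and so on."""
    return max(1, math.ceil(math.log2(num_allowable_options)))

print(codeword_bins(2))   # 1 bin
print(codeword_bins(4))   # 2 bins
print(codeword_bins(35))  # 6 bins, versus more if all possible options were coded
```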
  • the decoder performs block decoding. Specifically, the decoder employs reverse transforms to generate residual blocks. Then the decoder employs the residual blocks and corresponding prediction blocks to reconstruct the image blocks according to the partitioning.
  • the prediction blocks may include both intra-prediction blocks and inter-prediction blocks as generated at the encoder at step 305.
  • the reconstructed image blocks are then positioned into frames of a reconstructed video signal according to the partitioning data determined at step 311. Syntax for step 313 may also be signaled in the bitstream via entropy coding as discussed above.
  • At step 315, filtering is performed on the frames of the reconstructed video signal in a manner similar to step 307 at the encoder. For example, noise suppression filters, de-blocking filters, adaptive loop filters, and SAO filters may be applied to the frames to remove blocking artifacts.
  • the video signal can be forwarded for merging at step 317 and then output to a display, such as an HMD, for viewing by an end user.
  • FIG. 4 is a schematic diagram of an example coding and decoding (codec) system 400 for video coding.
  • codec system 400 provides functionality to support encoding and decoding sub-picture video streams according to methods 200 and 300. Further, codec system 400 can be employed to implement an encoder 103 and/or a decoder 107 of system 100.
  • Codec system 400 is generalized to depict components employed in both an encoder and a decoder.
  • Codec system 400 receives and partitions frames from a video signal (e.g., including a sub-picture video stream) as discussed with respect to steps 301 and 303 in operating method 300, which results in a partitioned video signal 401.
  • Codec system 400 then compresses the partitioned video signal 401 into a coded bitstream when acting as an encoder as discussed with respect to steps 305, 307, and 309 in method 300.
  • codec system 400 When acting as a decoder, codec system 400 generates an output video signal from the bitstream as discussed with respect to steps 311, 313, 315, and 317 in operating method 300.
  • the codec system 400 includes a general coder control component 411, a transform scaling and quantization component 413, an intra-picture estimation component 415, an intra-picture prediction component 417, a motion compensation component 419, a motion estimation component 421, a scaling and inverse transform component 429, a filter control analysis component 427, an in-loop filters component 425, a decoded picture buffer component 423, and a header formatting and context adaptive binary arithmetic coding (CABAC) component 431.
  • Such components are coupled as shown.
  • black lines indicate movement of data to be encoded/decoded while dashed lines indicate movement of control data that controls the operation of other components.
  • the components of codec system 400 may all be present in the encoder.
  • the decoder may include a subset of the components of codec system 400.
  • the decoder may include the intra picture prediction component 417, the motion compensation component 419, the scaling and inverse transform component 429, the in-loop filters component 425, and the decoded picture buffer component 423. These components are now described.
  • the partitioned video signal 401 is a captured video sequence that has been partitioned into blocks of pixels by a coding tree.
  • a coding tree employs various split modes to subdivide a block of pixels into smaller blocks of pixels. These blocks can then be further subdivided into smaller blocks.
  • the blocks may be referred to as nodes on the coding tree. Larger parent nodes are split into smaller child nodes.
  • the number of times a node is subdivided is referred to as the depth of the node/coding tree.
  • the divided blocks can be included in coding units (CUs) in some cases.
  • a CU can be a sub-portion of a CTU that contains a luma block, red difference chroma (Cr) block(s), and a blue difference chroma (Cb) block(s) along with corresponding syntax instructions for the CU.
  • the split modes may include a binary tree (BT), triple tree (TT), and a quad tree (QT) employed to partition a node into two, three, or four child nodes, respectively, of varying shapes depending on the split modes employed.
  • the partitioned video signal 401 is forwarded to the general coder control component 411, the transform scaling and quantization component 413, the intra picture estimation component 415, the filter control analysis component 427, and the motion estimation component 421 for compression.
  • the general coder control component 411 is configured to make decisions related to coding of the images of the video sequence into the bitstream according to application constraints. For example, the general coder control component 411 manages optimization of bitrate/bitstream size versus reconstruction quality. Such decisions may be made based on storage space/bandwidth availability and image resolution requests. The general coder control component 411 also manages buffer utilization in light of transmission speed to mitigate buffer underrun and overrun issues. To manage these issues, the general coder control component 411 manages partitioning, prediction, and filtering by the other components.
  • the general coder control component 411 may dynamically increase compression complexity to increase resolution and increase bandwidth usage or decrease compression complexity to decrease resolution and bandwidth usage. Hence, the general coder control component 411 controls the other components of codec system 400 to balance video signal reconstruction quality with bitrate concerns.
  • the general coder control component 411 creates control data, which controls the operation of the other components.
  • the control data is also forwarded to the header formatting and CABAC component 431 to be encoded in the bitstream to signal parameters for decoding at the decoder.
  • the partitioned video signal 401 is also sent to the motion estimation component 421 and the motion compensation component 419 for inter-prediction.
  • a frame or slice of the partitioned video signal 401 may be divided into multiple video blocks.
  • Motion estimation component 421 and the motion compensation component 419 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction.
  • Codec system 400 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
  • Motion estimation component 421 and motion compensation component 419 may be highly integrated, but are illustrated separately for conceptual purposes.
  • Motion estimation, performed by motion estimation component 421, is the process of generating motion vectors, which estimate motion for video blocks.
  • a motion vector, for example, may indicate the displacement of a coded object relative to a predictive block.
  • a predictive block is a block that is found to closely match the block to be coded, in terms of pixel difference.
  • a predictive block may also be referred to as a reference block.
  • Such pixel difference may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics.
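  • For illustration only (not taken from this disclosure), the SAD and SSD difference metrics mentioned above can be computed over equally sized, flattened pixel blocks as in the following minimal C++ sketch; the block layout and types are illustrative assumptions.

      #include <cstdint>
      #include <cstdlib>
      #include <vector>

      // Sum of absolute differences between a candidate predictive block and the
      // block being coded; lower values indicate a closer pixel-domain match.
      int sad(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref) {
          int acc = 0;
          for (size_t i = 0; i < cur.size(); ++i)
              acc += std::abs(static_cast<int>(cur[i]) - static_cast<int>(ref[i]));
          return acc;
      }

      // Sum of squared differences; penalizes large per-pixel errors more heavily.
      long long ssd(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref) {
          long long acc = 0;
          for (size_t i = 0; i < cur.size(); ++i) {
              long long d = static_cast<int>(cur[i]) - static_cast<int>(ref[i]);
              acc += d * d;
          }
          return acc;
      }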
  • HEVC employs several coded objects including a CTU, coding tree blocks (CTBs), and CUs.
  • a CTU can be divided into CTBs, which can then be divided into coding blocks (CBs) for inclusion in CUs.
  • a CU can be encoded as a prediction unit (PU) containing prediction data and/or a transform unit (TU) containing transformed residual data for the CU.
  • the motion estimation component 421 generates motion vectors, PUs, and TUs by using a rate-distortion analysis as part of a rate distortion optimization process. For example, the motion estimation component 421 may determine multiple reference blocks, multiple motion vectors, etc. for a current block/frame, and may select the reference blocks, motion vectors, etc. having the best rate-distortion characteristics. The best rate-distortion characteristics balance both quality of video reconstruction (e.g., amount of data loss by compression) with coding efficiency (e.g., size of the final encoding).
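  • As an illustration of selecting the candidate with the best rate-distortion characteristics, a common formulation (an assumption about form, not necessarily the disclosed implementation) minimizes a Lagrangian cost J = D + λ·R, as in this minimal C++ sketch with hypothetical types:

      #include <limits>
      #include <vector>

      struct Candidate {
          double distortion;  // e.g., SSD between reconstruction and original
          double rate;        // estimated bits to signal this choice
      };

      // Returns the index of the candidate (motion vector, reference block, mode, ...)
      // with the lowest Lagrangian cost J = D + lambda * R.
      int selectBestCandidate(const std::vector<Candidate>& candidates, double lambda) {
          int best = -1;
          double bestCost = std::numeric_limits<double>::max();
          for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
              double cost = candidates[i].distortion + lambda * candidates[i].rate;
              if (cost < bestCost) { bestCost = cost; best = i; }
          }
          return best;
      }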
  • codec system 400 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer component 423. For example, video codec system 400 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation component 421 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision. The motion estimation component 421 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture.
  • Motion estimation component 421 outputs the calculated motion vector as motion data to the header formatting and CABAC component 431 for encoding and to the motion compensation component 419.
  • Motion compensation, performed by motion compensation component 419, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation component 421. Again, motion estimation component 421 and motion compensation component 419 may be functionally integrated, in some examples.
  • motion compensation component 419 may locate the predictive block to which the motion vector points. A residual video block is then formed by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values.
  • motion estimation component 421 performs motion estimation relative to luma components
  • motion compensation component 419 uses motion vectors calculated based on the luma components for both chroma components and luma components.
  • the predictive block and residual block are forwarded to transform scaling and quantization component 413.
  • the partitioned video signal 401 is also sent to intra-picture estimation component 415 and intra-picture prediction component 417.
  • intra-picture estimation component 415 and intra picture prediction component 417 may be highly integrated, but are illustrated separately for conceptual purposes.
  • the intra-picture estimation component 415 and intra-picture prediction component 417 intra-predict a current block relative to blocks in a current frame, as an alternative to the inter-prediction performed by motion estimation component 421 and motion compensation component 419 between frames, as described above.
  • the intra picture estimation component 415 determines an intra-prediction mode to use to encode a current block.
  • intra-picture estimation component 415 selects an appropriate intra-prediction mode to encode a current block from multiple tested intra prediction modes. The selected intra-prediction modes are then forwarded to the header formatting and CABAC component 431 for encoding.
  • the intra-picture estimation component 415 calculates rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and selects the intra-prediction mode having the best rate-distortion characteristics among the tested modes.
  • Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original unencoded block that was encoded to produce the encoded block, as well as a bitrate (e.g., a number of bits) used to produce the encoded block.
  • the intra picture estimation component 415 calculates ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
  • intra-picture estimation component 415 may be configured to code depth blocks of a depth map using a depth modeling mode (DMM) based on rate-distortion optimization (RDO).
  • the intra-picture prediction component 417 may generate a residual block from the predictive block based on the selected intra-prediction modes determined by intra-picture estimation component 415 when implemented on an encoder or read the residual block from the bitstream when implemented on a decoder.
  • the residual block includes the difference in values between the predictive block and the original block, represented as a matrix.
  • the residual block is then forwarded to the transform scaling and quantization component 413.
  • the intra-picture estimation component 415 and the intra-picture prediction component 417 may operate on both luma and chroma components.
  • the transform scaling and quantization component 413 is configured to further compress the residual block.
  • the transform scaling and quantization component 413 applies a transform, such as a discrete cosine transform (DCT), a discrete sine transform (DST), or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used.
  • the transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain.
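  • For illustration, a textbook floating-point 2-D DCT-II of an N×N residual block is sketched below; real codecs use scaled integer approximations, so this is an assumption about form rather than the disclosed transform.

      #include <cmath>
      #include <cstddef>
      #include <vector>

      // Naive separable 2-D DCT-II of an N x N residual block (floating point).
      // out[0][0] is the DC term; higher indices represent higher spatial frequencies.
      std::vector<std::vector<double>> dct2d(const std::vector<std::vector<double>>& block) {
          const size_t n = block.size();
          const double pi = std::acos(-1.0);
          std::vector<std::vector<double>> out(n, std::vector<double>(n, 0.0));
          for (size_t u = 0; u < n; ++u) {
              for (size_t v = 0; v < n; ++v) {
                  double sum = 0.0;
                  for (size_t x = 0; x < n; ++x)
                      for (size_t y = 0; y < n; ++y)
                          sum += block[x][y] *
                                 std::cos((2 * x + 1) * u * pi / (2.0 * n)) *
                                 std::cos((2 * y + 1) * v * pi / (2.0 * n));
                  double au = (u == 0) ? std::sqrt(1.0 / n) : std::sqrt(2.0 / n);
                  double av = (v == 0) ? std::sqrt(1.0 / n) : std::sqrt(2.0 / n);
                  out[u][v] = au * av * sum;
              }
          }
          return out;
      }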
  • the transform scaling and quantization component 413 is also configured to scale the transformed residual information, for example based on frequency.
  • Such scaling involves applying a scale factor to the residual information so that different frequency information is quantized at different granularities, which may affect final visual quality of the reconstructed video.
  • the transform scaling and quantization component 413 is also configured to quantize the transform coefficients to further reduce bitrate.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients.
  • the degree of quantization may be modified by adjusting a quantization parameter.
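  • A minimal sketch of the quantization step described above, assuming the common step-size relationship Qstep ≈ 2^((QP−4)/6) used by AVC/HEVC-style codecs; the exact scaling, offsets, and integer arithmetic in a real codec differ.

      #include <cmath>
      #include <cstdlib>
      #include <vector>

      // Map a quantization parameter to an approximate quantization step size.
      double qpToStep(int qp) { return std::pow(2.0, (qp - 4) / 6.0); }

      // Uniform quantization with a rounding offset; a larger QP gives coarser levels,
      // fewer bits, and more distortion.
      std::vector<int> quantize(const std::vector<double>& coeffs, int qp, double offset = 0.5) {
          double step = qpToStep(qp);
          std::vector<int> levels(coeffs.size());
          for (size_t i = 0; i < coeffs.size(); ++i) {
              int sign = coeffs[i] < 0 ? -1 : 1;
              levels[i] = sign * static_cast<int>(std::abs(coeffs[i]) / step + offset);
          }
          return levels;
      }

      // Inverse quantization used by the decoder (and the encoder's reconstruction
      // loop) to recover approximate coefficient values.
      std::vector<double> dequantize(const std::vector<int>& levels, int qp) {
          double step = qpToStep(qp);
          std::vector<double> coeffs(levels.size());
          for (size_t i = 0; i < levels.size(); ++i) coeffs[i] = levels[i] * step;
          return coeffs;
      }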
  • the transform scaling and quantization component 413 may then perform a scan of the matrix including the quantized transform coefficients.
  • the quantized transform coefficients are forwarded to the header formatting and CABAC component 431 to be encoded in the bitstream.
  • the scaling and inverse transform component 429 applies a reverse operation of the transform scaling and quantization component 413 to support motion estimation.
  • the scaling and inverse transform component 429 applies inverse scaling, transformation, and/or quantization to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block which may become a predictive block for another current block.
  • the motion estimation component 421 and/or motion compensation component 419 may calculate a reference block by adding the residual block back to a corresponding predictive block for use in motion estimation of a later block/frame. Filters are applied to the reconstructed reference blocks to mitigate artifacts created during scaling, quantization, and transform. Such artifacts could otherwise cause inaccurate prediction (and create additional artifacts) when subsequent blocks are predicted.
  • the filter control analysis component 427 and the in-loop filters component 425 apply the filters to the residual blocks and/or to reconstructed image blocks.
  • the transformed residual block from the scaling and inverse transform component 429 may be combined with a corresponding prediction block from intra-picture prediction component 417 and/or motion compensation component 419 to reconstruct the original image block.
  • the filters may then be applied to the reconstructed image block.
  • the filters may instead be applied to the residual blocks.
  • the filter control analysis component 427 and the in-loop filters component 425 are highly integrated and may be implemented together, but are depicted separately for conceptual purposes. Filters applied to the reconstructed reference blocks are applied to particular spatial regions and include multiple parameters to adjust how such filters are applied.
  • the filter control analysis component 427 analyzes the reconstructed reference blocks to determine where such filters should be applied and sets corresponding parameters. Such data is forwarded to the header formatting and CABAC component 431 as filter control data for encoding.
  • the in-loop filters component 425 applies such filters based on the filter control data.
  • the filters may include a deblocking filter, a noise suppression filter, a sample adaptive offset (SAO) filter, and an adaptive loop filter. Such filters may be applied in the spatial/pixel domain (e.g., on a reconstructed pixel block) or in the frequency domain, depending on the example.
  • the filtered reconstructed image block, residual block, and/or prediction block are stored in the decoded picture buffer component 423 for later use in motion estimation as discussed above.
  • the decoded picture buffer component 423 stores and forwards the reconstructed and filtered blocks toward a display as part of an output video signal.
  • the decoded picture buffer component 423 may be any memory device capable of storing prediction blocks, residual blocks, and/or reconstructed image blocks.
  • the header formatting and CABAC component 431 receives the data from the various components of codec system 400 and encodes such data into a coded bitstream for transmission toward a decoder. Specifically, the header formatting and CABAC component 431 generates various headers to encode control data, such as general control data and filter control data. Further, prediction data, including intra-prediction and motion data, as well as residual data in the form of quantized transform coefficient data are all encoded in the bitstream. The final bitstream includes all information desired by the decoder to reconstruct the original partitioned video signal 401.
  • Such information may also include intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, indications of most probable intra-prediction modes, an indication of partition information, etc.
  • Such data may be encoded by employing entropy coding.
  • the information may be encoded by employing context adaptive variable length coding (CAVLC), CABAC, syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique.
  • the coded bitstream may be transmitted to another device (e.g., a video decoder) or archived for later transmission or retrieval.
  • FIG. 5 is a block diagram illustrating an example video encoder 500.
  • Video encoder 500 may be employed to implement the encoding functions of codec system 400 and/or implement steps 301, 303, 305, 307, and/or 309 of method 300. Further, encoder 500 may be employed to implement steps 205-209 of method 200 as well as encoder 103.
  • Encoder 500 partitions an input video signal (e.g., a sub-picture video stream), resulting in a partitioned video signal 501, which is substantially similar to the partitioned video signal 401. The partitioned video signal 501 is then compressed and encoded into a bitstream by components of encoder 500.
  • the partitioned video signal 501 is forwarded to an intra-picture prediction component 517 for intra-prediction.
  • the intra-picture prediction component 517 may be substantially similar to intra-picture estimation component 415 and intra-picture prediction component 417.
  • the partitioned video signal 501 is also forwarded to a motion compensation component 521 for inter-prediction based on reference blocks in a decoded picture buffer component 523.
  • the motion compensation component 521 may be substantially similar to motion estimation component 421 and motion compensation component 419.
  • the prediction blocks and residual blocks from the intra-picture prediction component 517 and the motion compensation component 521 are forwarded to a transform and quantization component 513 for transformation and quantization of the residual blocks.
  • the transform and quantization component 513 may be substantially similar to the transform scaling and quantization component 413.
  • the transformed and quantized residual blocks and the corresponding prediction blocks (along with associated control data) are forwarded to an entropy coding component 531 for coding into a bitstream.
  • the entropy coding component 531 may be substantially similar to the header formatting and CABAC component 431.
  • the transformed and quantized residual blocks and/or the corresponding prediction blocks are also forwarded from the transform and quantization component 513 to an inverse transform and quantization component 529 for reconstruction into reference blocks for use by the motion compensation component 521.
  • the inverse transform and quantization component 529 may be substantially similar to the scaling and inverse transform component 429.
  • In-loop filters in an in-loop filters component 525 are also applied to the residual blocks and/or reconstructed reference blocks, depending on the example.
  • the in-loop filters component 525 may be substantially similar to the filter control analysis component 427 and the in-loop filters component 425.
  • the in-loop filters component 525 may include multiple filters as discussed with respect to in-loop filters component 425.
  • the filtered blocks are then stored in a decoded picture buffer component 523 for use as reference blocks by the motion compensation component 521.
  • the decoded picture buffer component 523 may be substantially similar to the decoded picture buffer component 423.
  • the encoder 500 may encode video into one or more tracks. As discussed in more detail below, VR video can be recorded from multiple viewpoints. Video from each viewpoint can then be encoded in a corresponding set of tracks. This allows the decoder to swap between tracks based on user input, which allows a user to swap between viewpoints as desired. A user may wish to continuously watch a particular object or location in the virtual environment when switching between viewpoints. In order to allow the user to maintain a consistent view, the encoder 500 can be configured to encode data indicating correspondences between viewport center points of related viewpoints. This allows the decoder to determine the correspondences and determine the FOV and/or viewport used by the user at a first viewpoint when a viewpoint switch is requested.
  • the decoder can then determine a FOV/viewport at a second viewpoint that corresponds to the FOV/viewport used at the first viewpoint based on the correspondences encoded by the encoder 500. Accordingly, when the user switches between viewpoints, the decoder can display a FOV/viewport at the second viewpoint that points toward the same location previously viewed by the user at the first viewpoint. For example, such correspondences can be encoded in a timed metadata track and/or in corresponding video tracks.
  • FIG. 6 is a block diagram illustrating an example video decoder 600.
  • Video decoder 600 may be employed to implement the decoding functions of codec system 400 and/or implement steps 311, 313, 315, and/or 317 of operating method 300. Further, decoder 600 may be employed to implement steps 211-213 of method 200 as well as decoder 107.
  • Decoder 600 receives a plurality of tracks containing picture bitstreams and/or sub-picture bitstreams, for example from an encoder 500, generates a reconstructed output video signal, for example by merging sub-picture video streams into a spherical video stream, and forwards the spherical video stream for display to a user via a rendering device.
  • the bitstreams are received by an entropy decoding component 633.
  • the entropy decoding component 633 is configured to implement an entropy decoding scheme, such as CAVLC, CABAC, SBAC, PIPE coding, or other entropy coding techniques.
  • the entropy decoding component 633 may employ header information to provide a context to interpret additional data encoded as codewords in the bitstreams.
  • the decoded information includes any desired information to decode the video signal, such as general control data, filter control data, partition information, motion data, prediction data, and quantized transform coefficients from residual blocks.
  • the quantized transform coefficients are forwarded to an inverse transform and quantization component 629 for reconstruction into residual blocks.
  • the inverse transform and quantization component 629 may be similar to inverse transform and quantization component 529.
  • the reconstructed residual blocks and/or prediction blocks are forwarded to intra picture prediction component 617 for reconstruction into image blocks based on intra prediction operations.
  • the intra-picture prediction component 617 may be similar to intra picture estimation component 415 and intra-picture prediction component 417. Specifically, the intra-picture prediction component 617 employs prediction modes to locate a reference block in the frame and applies a residual block to the result to reconstruct intra-predicted image blocks.
  • the reconstructed intra-predicted image blocks and/or the residual blocks and corresponding inter-prediction data are forwarded to a decoded picture buffer component 623 via an in-loop filters component 625, which may be substantially similar to decoded picture buffer component 423 and in-loop filters component 425, respectively.
  • the in-loop filters component 625 filters the reconstructed image blocks, residual blocks, and/or prediction blocks, and such information is stored in the decoded picture buffer component 623.
  • Reconstructed image blocks from decoded picture buffer component 623 are forwarded to a motion compensation component 621 for inter-prediction.
  • the motion compensation component 621 may be substantially similar to motion estimation component 421 and/or motion compensation component 419. Specifically, the motion compensation component 621 employs motion vectors from a reference block to generate a prediction block and applies a residual block to the result to reconstruct an image block.
  • the resulting reconstructed blocks may also be forwarded via the in-loop filters component 625 to the decoded picture buffer component 623.
  • the decoded picture buffer component 623 continues to store additional reconstructed image blocks, which can be reconstructed into frames via the partition information. Such frames may also be placed in a sequence. The sequence is output toward a display as a reconstructed output video signal.
  • the decoder 600 may receive a set of tracks containing VR video recorded from multiple viewpoints. This allows the decoder 600 to swap between tracks based on user input, which allows a user to swap between viewpoints as desired. A user may wish to continuously watch a particular object or location in the virtual environment when switching between viewpoints. In order to allow the user to maintain a consistent view, the tracks may contain data indicating correspondences between viewport center points of related viewpoints. This allows the decoder 600 to determine the correspondences and determine the FOV and/or viewport used by the user at a first viewpoint when a viewpoint switch is requested.
  • the decoder 600 can then determine a FOV/viewport at a second viewpoint that corresponds to the FOV/viewport used at the first viewpoint based on the correspondences encoded by the encoder. Accordingly, when the user switches between viewpoints, the decoder 600 can display a FOV/viewport at the second viewpoint that points toward the same location previously viewed by the user at the first viewpoint. For example, such correspondences can be encoded in a timed metadata track and/or in corresponding video tracks.
  • FIG. 7 is a schematic diagram illustrating an example system 700 for capturing VR video from multiple viewpoints 702, 703, and 704.
  • Multiple viewpoints 702, 703, and 704 are shown as an example. In other examples, fewer or more viewpoints may be provided.
  • the system 700 is implemented to capture activity at a particular scene 701 (e.g., a stadium) using a plurality of cameras positioned at corresponding viewpoints 702, 703, and 704.
  • the cameras may be similar to the multi-directional cameras 101 described above in connection with FIG. 1.
  • the cameras may capture VR videos from fixed positions at viewpoint 702 and viewpoint 703, together with a camera that has the ability to continuously change position along a rail 705 in order to capture VR videos from a variety of different positions denoted as viewpoint 704.
  • By sliding along the rail 705, the camera is able to capture the VR video from different positions, and hence viewpoint 704 may change over time.
  • the camera at viewpoint 704 may be mounted in other ways in order to be moveable in one or more directions.
  • the cameras may each record a sphere of video looking outward from the perspective of the corresponding viewpoint 702, 703, and 704.
  • a viewpoint 702, 703, and 704 is the center of a sphere of video data as recorded from a specified location.
  • video (and audio) can be recorded from viewpoints 702, 703, and 704.
  • the video for each viewpoint can then be stored in a set of corresponding tracks.
  • video from a viewpoint 702 can be downsampled and stored at various resolutions in tracks as part of an adaptation set for viewpoint 702.
  • Adaptation sets for viewpoints 703 and 704 can also be stored in corresponding tracks.
  • a decoder can receive user input and, based on the user input, select an adaptation set with corresponding tracks for display. This in turn allows a user to direct the decoder to switch between viewpoints 702, 703, and 704. The result is the user can experience VR video from a first viewpoint (e.g., viewpoint 702) at a first time and then switch to experience VR video from a second viewpoint (e.g., viewpoint 703 or 704) at a second time.
  • One mechanism to enable such a viewpoint switch is to provide a default orientation for each viewpoint 702, 703, and 704.
  • An orientation is a direction of view pointing outward from the center of a corresponding viewpoint 702, 703, and/or 704.
  • An orientation may be described in terms of angle, coordinates, etc.
  • a specified orientation may result in a corresponding FOV and viewport for viewing video from the viewpoint 702, 703, and/or 704.
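  • As an illustration of how a specified orientation maps to a viewing direction from the center of a viewpoint, a minimal C++ sketch is shown below; the yaw/pitch axis convention is an assumption, since OMAF defines its own coordinate system.

      #include <cmath>

      struct Vec3 { double x, y, z; };

      // Convert an orientation given as yaw (rotation about the vertical axis) and
      // pitch (elevation) in degrees into a unit direction vector pointing outward
      // from the viewpoint center. Roll does not change the direction, only the
      // rotation of the viewport around it.
      Vec3 orientationToDirection(double yawDeg, double pitchDeg) {
          const double d2r = std::acos(-1.0) / 180.0;
          double yaw = yawDeg * d2r;
          double pitch = pitchDeg * d2r;
          Vec3 v;
          v.x = std::cos(pitch) * std::cos(yaw);
          v.y = std::cos(pitch) * std::sin(yaw);
          v.z = std::sin(pitch);
          return v;
      }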
  • a system for encoding VR video from multiple viewpoints 702, 703, and 704 can be implemented as follows. Tracks belonging to the same viewpoint may have the same value of track_group_id for track_group_type 'vipo'. The track_group_id of tracks from one viewpoint may differ from the track_group_id of tracks from any other viewpoint. By default, when this track grouping is not indicated for any track in a file, the file is considered as containing content for one viewpoint only.
  • Example syntax is as follows:
  • class ViewpointGroupBox extends TrackGroupTypeBox('vipo') {
  •     ViewpointPosStruct();
  • }
  • a Viewpoint Information Structure (ViewpointInfoStruct()) provides information of a viewpoint, including the position of the viewpoint and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system.
  • the syntax is as follows.
  • ViewpointPosStruct();
  • bit(7) reserved = 0;
  • bit(31) reserved = 0;
  • the semantics for this syntax are as follows.
  • the group_alignment_flag can be set equal to one to specify that the viewpoint belongs to a separate coordinate system (with its own origin) for the alignment of viewpoint groups and that the ViewpointGroupStruct is present.
  • the group_alignment_flag can be set equal to zero to specify that the viewpoint belongs to the common reference coordinate system.
  • viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of the viewpoint (when the position of the viewpoint is dynamic) in units of 10^-1 millimeters, in three dimensional (3D) space with (0, 0, 0) as the center of the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.
  • the viewpoint_gpspos_present_flag may be set equal to one to indicate that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present.
  • the viewpoint_gpspos_present_flag can be set equal to zero to indicate that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.
  • viewpoint_gpspos_longitude can indicate the longitude of the geolocation of the viewpoint in units of 2^-23 degrees.
  • viewpoint_gpspos_longitude shall be in the range of -180 * 2^23 to 180 * 2^23 - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.
  • the viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees.
  • the viewpoint_gpspos_latitude may be in the range of -90 * 2^23 to 90 * 2^23 - 1, inclusive.
  • a positive value represents northern latitude and a negative value represents southern latitude.
  • the viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above a World Geodetic System (WGS 84) reference ellipsoid.
  • the viewpoint_gcs_yaw, viewpoint_gcs_pitch, and viewpoint_gcs_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of the X, Y, and Z axes of the global coordinate system of the viewpoint relative to the common reference coordinate system, in units of 2^-16 degrees.
  • the viewpoint_gcs_yaw may be in the range of -180 * 2^16 to 180 * 2^16 - 1, inclusive.
  • the viewpoint_gcs_pitch may be in the range of -90 * 2^16 to 90 * 2^16, inclusive.
  • the viewpoint_gcs_roll may be in the range of -180 * 2^16 to 180 * 2^16 - 1, inclusive.
  • the vwpt_group_id indicates the identifier of a viewpoint group. All viewpoints in a viewpoint group share a common reference coordinate system.
  • the vwpt_group_description is a null-terminated UTF-8 string which indicates the description of a viewpoint group. A null string is allowed.
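  • The position, GPS, and rotation fields above are fixed-point integers; a hedged C++ sketch of converting them to floating-point units for rendering is shown below. The struct and function names are illustrative assumptions; only the units (10^-1 millimeters, 2^-23 degrees, and 2^-16 degrees) come from the semantics above.

      #include <cstdint>

      struct ViewpointPose {
          double posXMeters, posYMeters, posZMeters;
          double gpsLongitudeDeg, gpsLatitudeDeg;
          double gcsYawDeg, gcsPitchDeg, gcsRollDeg;
      };

      ViewpointPose decodePose(int32_t posX, int32_t posY, int32_t posZ,
                               int32_t gpsLon, int32_t gpsLat,
                               int32_t gcsYaw, int32_t gcsPitch, int32_t gcsRoll) {
          ViewpointPose p;
          p.posXMeters = posX * 1e-4;               // 10^-1 mm -> meters
          p.posYMeters = posY * 1e-4;
          p.posZMeters = posZ * 1e-4;
          p.gpsLongitudeDeg = gpsLon / 8388608.0;   // 2^-23 degrees per unit
          p.gpsLatitudeDeg  = gpsLat / 8388608.0;
          p.gcsYawDeg   = gcsYaw   / 65536.0;       // 2^-16 degrees per unit
          p.gcsPitchDeg = gcsPitch / 65536.0;
          p.gcsRollDeg  = gcsRoll  / 65536.0;
          return p;
      }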
  • An OMAF player may be expected to start with the initial viewpoint timed metadata. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player may switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.
  • a sample group may be employed for recommended viewports of multiple viewpoints.
  • a timed metadata track having sample entry type 'rcvp' may contain zero or one SampleToGroupBox with grouping type equal to 'vwpt'.
  • This SampleToGroupBox represents the assignment of samples in this timed metadata (and consequently the corresponding samples in the media tracks) to viewpoints.
  • an accompanying SampleGroupDescriptionBox with the same grouping type may be present, and may contain the identifier (ID) of the particular viewpoint to which this group of samples belongs.
  • viewpoint_id indicates the viewpoint identifier of the viewpoint to which this group of samples belongs.
  • Timed metadata for viewpoints may include dynamic viewpoint information.
  • the dynamic viewpoint timed metadata track indicates the viewpoint parameters that are dynamically changing over time.
  • An OMAF player may use the signalled information as follows when starting playing back of one viewpoint after switching from another viewpoint. If there is a recommended viewing orientation explicitly signaled, the OMAF player may parse this information and follow the recommended viewing orientation. Otherwise, the OMAF player may keep the same viewing orientation as in the switching-from viewpoint just before the switching occurs.
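  • The player behaviour described above can be sketched as follows; the types and the optional recommended-orientation parameter are hypothetical and only the decision order mirrors the text.

      #include <optional>

      struct ViewingOrientation { double yawDeg, pitchDeg, rollDeg; };

      // When starting playback of a switched-to viewpoint: follow an explicitly
      // signalled recommended viewing orientation if present, otherwise keep the
      // viewing orientation the user had in the switched-from viewpoint.
      ViewingOrientation orientationAfterSwitch(
              const std::optional<ViewingOrientation>& recommended,
              const ViewingOrientation& orientationBeforeSwitch) {
          if (recommended.has_value())
              return *recommended;
          return orientationBeforeSwitch;
      }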
  • the track sample entry type 'dyvp' can be used for dynamic viewpoints.
  • the sample entry of this sample entry type is specified as follows:
  • bit(7) reserved = 0;
  • if (!dynamic_gcs_rotation_flag)
  • the ViewpointPosStruct() is defined above but indicates the initial viewpoint position.
  • the dynamic_gcs_rotation_flag can be set equal to zero to specify that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system remain unchanged in all samples referring to this sample entry.
  • the dynamic_gcs_rotation_flag can be set equal to one to specify that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system are indicated in the samples.
  • ViewpointGlobalCoordinateSysRotationStruct() is defined in a clause above but indicates the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system for each sample referring to this sample entry.
  • the semantics of ViewpointInfoStruct() is specified above.
  • the first sample should have a group_alignment_flag set equal to one.
  • the ViewpointGroupStruct() can be absent.
  • the structure is inferred to be identical to the ViewpointGroupStruct() of the previous sample in decoding order.
  • the metadata indicates the initial viewpoint that should be used.
  • the initial viewpoint may be inferred to be the viewpoint that has the least value of viewpoint identifier among all viewpoints in the file.
  • the initial viewpoint timed metadata track, when present, should be indicated as being associated with all viewpoints in the file.
  • the track sample entry type 'invp' may be used.
  • the sample entry of this sample entry type is specified as follows:
  • the id_of_initial_viewpoint indicates the value of the viewpoint identifier of the initial viewpoint for the first sample to which this sample entry applies.
  • the id_of_initial_viewpoint indicates the value of the viewpoint identifier of the initial viewpoint for the sample.
  • OMAF includes the specification of the initial viewing orientation timed metadata. This metadata indicates initial viewing orientations that may be used when playing the associated media tracks or a single omnidirectional image stored as an image item.
  • the track sample entry type 'invo' may be used.
  • An example syntax is as follows:
  • bit(7) reserved = 0;
  • bit(7) reserved = 0;
  • the default orientation approach as described in the implementation above may cause a user to view a specified default FOV and viewport upon switching to a new viewpoint 702, 703, and/or 704. However, this may result in a negative user experience in some cases. For example, a user may wish to continuously view an object in the scene 701, such as a basketball, a particular player, a goal, etc. Such a consistency may not be possible using default orientations. For example, a user watching the ball at viewpoint 702 may wish to switch to viewpoint 704 to get a closer look. However, the default orientation at viewpoint 704 may be toward the goal. In such a case, the user loses the ball upon switching and is forced to find the ball again.
  • the encoder can store viewport center point correspondences between viewpoints 702, 703, and/or 704.
  • the decoder can determine the viewport viewed by the user at viewpoint 702 upon switching to viewpoint 704 (or viewpoint 703 in other examples).
  • the decoder can then use the viewport center point correspondences between viewpoint 702 and viewpoint 704 to determine a viewport at viewpoint 704 that matches the viewport at viewpoint 702.
  • the decoder can then employ the determined viewport at viewpoint 704 after making the switch.
  • the user is automatically oriented to the same location in the scene 701 after the switch between viewpoints 702, 703, and/or 704 as was viewed before the switch. For example, if the user is watching the ball at viewpoint 702, the user is automatically oriented to view the ball from viewpoint 704 upon switching.
  • the viewport center point correspondences are discussed in greater detail below.
  • FIG. 8 is a schematic diagram 800 of a pair of viewpoints 810 and 820 with corresponding viewport center points 813a and 823a.
  • a viewport correspondence is an indication that two or more viewports 813 and 823 are spatially related such that viewing the viewports 813 and 823 from a related viewpoint 810 and 820, respectively, provides a view of the same object 830.
  • the correspondences support switching in the present disclosure. Further, the correspondences between viewports 813 and 823 are denoted by the center points 813a and 823a of the viewports 813 and 823, respectively.
  • the correspondences shown in schematic diagram 800 can be used by an encoder 103, an encoder 500, a decoder 107, a decoder 600, and/or a codec system 400. Further, the correspondences shown in schematic diagram 800 can describe relationships between viewpoints 702, 703, and/or 704. In addition, the correspondences shown in schematic diagram 800 can be encoded in a bitstream and used to support selection of tracks to decode and display, and hence can be used as part of methods 200 and 300.
  • correspondences between viewports can be stored as viewpoint 810 and 820 pairs.
  • Viewpoints 810 and 820 each include a sphere 814 and 824, respectively, of video content in associated tracks.
  • a user viewing video from a viewpoint 810 and 820 has access to a sphere 814 and 824, respectively, of video content.
  • the video content is depicted to the user by projecting a portion of the video content from the sphere 814 and 824, depending on the user’s viewpoint 810 and 820, onto a viewport based on the current FOV 811 and 821, respectively, of the user.
  • a FOV such as FOV 811 and 821, is an angular orientation measured from the center of the associated viewpoint 810 and 820, respectively.
  • the angular orientation of an FOV 811 and 821 can be used to determine an area of the location 831 that a user can view from a corresponding viewpoint 810 and 820.
  • Each FOV corresponds to a viewport.
  • a viewport is a two dimensional planar shape with a width and a height that covers a viewable portion of the viewpoint sphere.
  • a portion of the viewpoint sphere that can be viewed by an FOV is projected onto a corresponding viewport.
  • a user wearing an HMD may point their head in a particular direction.
  • the angular orientation of the HMD relative to the viewpoint sphere of video content in such an example is the FOV.
  • the screen displaying content to the user is the viewport.
  • the portion of the viewpoint sphere of video content displayed on the viewport should match the angular orientation of the HMD. Hence, the viewport should match the FOV for proper operation of a VR system.
  • the viewpoints 810 and 820 can have many viewports covering different portions of the spheres 814 and 824, respectively. Such viewports can be as varied as the number of FOVs available for the corresponding viewpoint. Many of the viewports of the viewpoints 810 and 820 are unrelated as they may allow the user a view of the same location 831, but not the same object 830. However, certain viewports, such as viewports 813 and 823, correspond. This is because FOVs 811 and 821 project the same object 830 onto viewports 813 and 823.
  • corresponding viewports 813 and 823 may be employed to allow a user to view the same object 830 at the location 831 from a different perspective.
  • the center points 813a and 823a of each viewport 813 and 823, respectively, can be employed to indicate such correspondence.
  • a user can view the object 830 via an initial viewport 813 at an initial viewpoint 810.
  • the user can then decide to switch to a destination viewpoint 820.
  • the decoder can determine that the initial viewpoint 810 and the destination viewpoint 820 have corresponding viewports.
  • the decoder can employ stored viewport center point data to determine that the viewport center point 813a of the initial viewport 813 corresponds with the viewport center point 823a of the viewport 823.
  • the decoder can default to the viewport 823 containing the indicated viewport center point 823a. In this way, the decoder can perform a switch from viewpoint 810 to viewpoint 820 while orienting the user toward the same object 830 in order to maintain viewing consistency when switching between viewpoints 810 and 820.
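  • For illustration, a minimal C++ sketch of the decoder-side lookup just described is shown below, assuming the signalled pairs have already been parsed into an in-memory table; the structure and names are illustrative and not part of the disclosed syntax. When several correspondences are signalled for the same pair, a real player would additionally pick the one covered by the user's current field of view.

      #include <cstdint>
      #include <optional>
      #include <vector>

      struct SpherePoint { double azimuthDeg, elevationDeg; };  // a viewport center on the sphere

      struct CenterCorrespondence {
          uint32_t viewpointId[2];   // the viewpoint pair
          SpherePoint center[2];     // corresponding viewport centers, one per viewpoint
      };

      // Given a switch from one viewpoint to another, return the viewport center at
      // the switch-to viewpoint signalled as corresponding to the switch-from viewpoint.
      std::optional<SpherePoint> correspondingCenter(
              const std::vector<CenterCorrespondence>& table,
              uint32_t fromViewpoint, uint32_t toViewpoint) {
          for (const auto& c : table) {
              if (c.viewpointId[0] == fromViewpoint && c.viewpointId[1] == toViewpoint)
                  return c.center[1];
              if (c.viewpointId[1] == fromViewpoint && c.viewpointId[0] == toViewpoint)
                  return c.center[0];
          }
          return std::nullopt;  // no correspondence: fall back to a default orientation
      }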
  • groups of viewpoints 810 and 820 that have a view of the same object 830 may be referred to as a viewpoint set.
  • a center point (e.g., center point 813a of viewport 813 or center point 823a of viewport 823) may also be referred to as a viewport center in some cases.
  • correspondences between the viewport center points of viewpoints 810 and 820 in a viewpoint set can be encoded in a VR video stream. This allows a decoder to correctly determine corresponding viewports 813 and 823 based on the viewport centers when switching between viewpoints 810 and 820 in the viewpoint set.
  • the embodiment described in FIG. 8 can be implemented as discussed below.
  • the viewing orientation information (e.g., FOV related information) can be signaled to the client/decoder in a timed metadata track.
  • Viewpoint pairs and corresponding center points of viewport pairs are signaled in order to indicate an expected viewing orientation after switching from one viewpoint to another viewpoint.
  • the viewport center points correspondence information is signalled as a particular sample entry type, such as 'vcpc', in the timed metadata tracks.
  • the sample entry may be specified as follows:
  • class VcpcSampleEntry() extends SphereRegionSampleEntry('vcpc') {
  • num_viewpoint_pairs indicates the number of viewpoint pairs for which viewport center points correspondence is signalled in the samples to which this sample entry applies.
  • viewpoint_id[2] indicates the two viewpoint identifiers (IDs) of the viewpoint pair.
  • num_corresponding_viewport_centres indicates the number of corresponding viewport center points signalled in this sample for the i-th viewpoint pair.
  • SphereRegionStruct(0)[k] for k equal to 0 or 1 specifies the viewport center point corresponding to the viewpoint indicated by viewpoint_id[k].
  • a viewport center points correspondence timed metadata track with sample entry type of 'vcpc' may have no track reference of type 'cdsc' and in this case the correspondence may apply to the entire file.
  • Content providers can perform scene or object matching among video streams representing different viewpoints frame by frame, and choose a representative point of the scene or object, e.g., the center point of an object, as the corresponding viewport center point to be indicated by the VCPC timed metadata track.
  • the client checks whether the user's field of view in the switch-from viewpoint covers a corresponding viewport center point that is indicated by the time-aligned sample of the VCPC timed metadata track.
  • the client may render to the user the viewport in the switching-to viewpoint for which the corresponding center point is indicated by the time- aligned sample of the VCPC timed metadata track.
  • If the user's field of view covers more than one indicated viewport center point, the one that is closest to the center of the user's field of view may be chosen. If both the recommended viewport metadata information for the switch-to viewpoint and the VCPC timed metadata track are available, since neither type of information imposes mandatory OMAF player behaviour, the OMAF player may choose to follow either one or neither.
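  • The selection rule above can be sketched as follows; the cone-shaped FOV test and all names are simplifying assumptions rather than the disclosed behaviour (a real viewport is rectangular, not circular).

      #include <cmath>
      #include <optional>
      #include <vector>

      struct SpherePoint { double azimuthDeg, elevationDeg; };

      // Great-circle angular distance in degrees between two points on the sphere.
      double angularDistanceDeg(const SpherePoint& a, const SpherePoint& b) {
          const double d2r = std::acos(-1.0) / 180.0;
          double cosd = std::sin(a.elevationDeg * d2r) * std::sin(b.elevationDeg * d2r) +
                        std::cos(a.elevationDeg * d2r) * std::cos(b.elevationDeg * d2r) *
                        std::cos((a.azimuthDeg - b.azimuthDeg) * d2r);
          if (cosd > 1.0) cosd = 1.0;
          if (cosd < -1.0) cosd = -1.0;
          return std::acos(cosd) / d2r;
      }

      // Among the signalled center points, return the one covered by the user's FOV
      // that is closest to the FOV center, treating the FOV as a cone with half-angle
      // fovHalfAngleDeg; returns nothing if no center point is covered.
      std::optional<SpherePoint> pickCoveredCenter(const SpherePoint& fovCenter,
                                                   double fovHalfAngleDeg,
                                                   const std::vector<SpherePoint>& centers) {
          std::optional<SpherePoint> best;
          double bestDist = fovHalfAngleDeg;
          for (const auto& c : centers) {
              double d = angularDistanceDeg(fovCenter, c);
              if (d <= bestDist) { bestDist = d; best = c; }
          }
          return best;
      }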
  • FIG. 9 is a schematic diagram 900 of a set of viewpoints 910, 920, 940, and 950 with corresponding viewport center points 913a, 923a, 943a, and 953a.
  • the correspondences support switching in the present disclosure. Hence, the correspondences shown in schematic diagram 900 can be used by an encoder 103, an encoder 500, a decoder 107, a decoder 600, and/or a codec system 400. Further, the correspondences shown in schematic diagram 900 can describe relationships between viewpoints 702, 703, and/or 704. In addition, the correspondences shown in schematic diagram 900 can be encoded in a bitstream and used to support selection of tracks to decode and display, and hence can be used as part of methods 200 and 300.
  • Diagram 900 is substantially similar to diagram 800, but allows for sets including any number of viewpoints 910, 920, 940, and 950 instead of using pairs of viewpoints 810 and 820.
  • diagram 900 includes viewpoints 910, 920, 940, and 950, spheres 914, 924, 944, and 954, FOVs 911, 921, 941, and 951, and viewports 913, 923, 943, and 953, which are substantially similar to viewpoints 810 and 820, spheres 814 and 824, FOVs 811 and 821, and viewports 813 and 823.
  • FOVs 911, 921, 941, and 951 are all oriented toward same object 930 from different perspectives.
  • an encoder may encode a set of correspondences between viewport center point 913a at viewpoint 910, viewport center point 923a at viewpoint 920, viewport center point 943a at viewpoint 940, and viewport center point 953a at viewpoint 950 during VR video creation.
  • a decoder can then use such information to maintain viewing consistency when switching between viewpoints 910, 920, 940, and 950.
  • the viewport correspondence is signalled between a group of two or more viewports as follows.
  • the viewport center points correspondence timed metadata tracks that have a particular sample entry type, such as 'vcpc', can be used.
  • the sample entry can be specified as follows:
  • class VcpcSampleEntry() extends SphereRegionSampleEntry('vcpc') {
  • the semantics for this syntax are as follows.
  • the num_viewpoint_sets indicates the number of viewpoint sets for which viewport center points correspondence is signalled in the samples to which this sample entry applies.
  • the num_viewpoints_in_this_set[i] indicates the number of viewpoints in the i-th viewpoint set.
  • the viewpoint_id[i][j] indicates the viewpoint ID of the j-th viewpoint in the i-th viewpoint set.
  • the num_corresponding_viewport_centres_in_this_set[i] indicates the number of corresponding viewport center points signalled in this sample for the i-th viewpoint set.
  • the SphereRegionStruct(0)[i][k][j] specifies the k-th corresponding viewport center point of the j-th viewpoint in the i-th viewpoint set. For any particular value of k in the range of zero to num_corresponding_viewport_centres_in_this_set[i] - 1, inclusive, the sphere points indicated by SphereRegionStruct(0)[i][k][j] for j ranging from zero to num_viewpoints_in_this_set[i] - 1, inclusive, are viewport center points that correspond to each other for the viewpoints in the i-th viewpoint set.
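  • To make the set-based semantics above concrete, a hedged in-memory representation that a parser might build from one 'vcpc' sample is sketched below; the C++ struct and names are assumptions, not the bitstream syntax itself. For a fixed k, corresponding[k][0..numViewpoints-1] holds viewport centers that correspond to each other across the viewpoints of one set.

      #include <cstddef>
      #include <cstdint>
      #include <vector>

      struct SpherePoint { double azimuthDeg, elevationDeg; };  // parsed SphereRegionStruct(0) center

      // One viewpoint set from a 'vcpc' sample.
      struct ViewpointSetCorrespondence {
          std::vector<uint32_t> viewpointIds;                  // viewpoint_id[i][j]
          std::vector<std::vector<SpherePoint>> corresponding; // indexed as [k][j]
      };

      // A parsed sample simply holds num_viewpoint_sets such entries.
      using VcpcSample = std::vector<ViewpointSetCorrespondence>;

      // Example lookup: the center at the viewpoint with index toIndex (within the set)
      // that belongs to the k-th correspondence of that set.
      SpherePoint centerAt(const ViewpointSetCorrespondence& set, size_t k, size_t toIndex) {
          return set.corresponding[k][toIndex];
      }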
  • a viewport center points correspondence in the timed metadata track with a sample entry type of 'vcpc' may have no track reference of type 'cdsc'.
  • the correspondence applies to the entire file.
  • Content providers can perform scene or object matching among video streams representing different viewpoints frame by frame, and choose a representative point of the scene or object, such as the center point of an object, as the corresponding viewport center point to be indicated by the VCPC timed metadata track.
  • the client/decoder checks whether the user's field of view in the switch-from viewpoint covers a corresponding viewport center point that is indicated by the time-aligned sample of the VCPC timed metadata track.
  • the client may render to the user the viewport in the switching-to viewpoint for which the corresponding center point is indicated by the time-aligned sample of the VCPC timed metadata track.
  • If the user's field of view covers more than one indicated viewport center point, the one that is closest to the center of the user's field of view should be chosen. If both the recommended viewport metadata information for the switch-to viewpoint and the VCPC timed metadata track are available, since neither type of information imposes mandatory OMAF player behavior, the OMAF player may choose to follow either one or neither.
  • FIG. 10 is a schematic diagram of an example VR video file 1000 for multiple viewpoints.
  • VR video file 1000 may be employed to contain correspondences between spatial regions of viewpoints as discussed with respect to diagrams 800 and 900.
  • the VR video file 1000 can be encoded and/or decoded by an encoder 103, an encoder 500, a decoder 107, a decoder 600, and/or a codec system 400.
  • the VR video file 1000 can describe VR video from multiple viewpoints, such as viewpoints 702, 703, and/or 704.
  • the VR video file 1000 can contain encoded VR video, and hence can be generated by an encoder and read by a decoder to support video display as part of methods 200 and 300.
  • the VR video file 1000 can contain sets of tracks for corresponding viewpoints.
  • the VR video file 1000 can contain a set of viewpoint A tracks 1010, a set of viewpoint B tracks 1020, a set of viewpoint C tracks 1040, and a set of viewpoint D tracks 1050.
  • such tracks can contain video data as captured from viewpoint 910, viewpoint 920, viewpoint 940, and viewpoint 950, respectively.
  • VR video recorded at a viewpoint is stored in a corresponding adaptation set.
  • the adaptation set is downsampled to various lower resolutions. Then a track is generated for each resolution of the adaptation set.
  • the set of viewpoint A tracks 1010, set of viewpoint B tracks 1020, set of viewpoint C tracks 1040, and set of viewpoint D tracks 1050 contain the tracks associated with the adaptation set for the corresponding viewpoints.
  • the relevant tracks can then be forwarded to the decoder/client depending on the viewpoint selected by the user and the desired resolution based on the availability of network resources.
  • the VR video file 1000 can also contain a timed metadata track 1060.
  • the timed metadata track 1060 contains metadata relevant to all of the viewpoints and hence to all of the tracks 1010, 1020, 1040, and 1050.
  • viewport center point correspondences between viewpoints can be stored in the timed metadata track 1060, for example as one or more VcpcSample objects/functions.
  • correspondences between each of the relevant viewport center points for the viewpoints can be stored toward the beginning of the timed metadata track 1060.
  • Such information may be global in nature and can be used for the entire VR video file 1000.
  • the timed metadata track 1060 can be employed to contain the viewport center point correspondences between the viewpoints over the entire length of the VR video file 1000.
  • the viewport center point correspondences in the timed metadata track 1060 can then be used by a decoder when displaying VR video as contained in tracks 1010, 1020, 1040, and 1050.
  • FIG. 11 is an embodiment of a method 1100 of displaying a VR video at a decoder based on a viewport center point correspondence between multiple viewpoints, as discussed with respect to diagrams 800 and 900 and as applied to viewpoints such as viewpoints 702, 703, and/or 704.
  • method 1100 may be employed by a decoder 107, a decoder 600, and/or a codec system 400.
  • Method 1100 can also be employed to support viewpoint switching when displaying a VR video file, such as VR video file 1000, and hence can be employed to improve methods 200 and 300.
  • Method 1100 operates on a client including a decoder.
  • the decoder begins processing a VR video stream.
  • the VR video stream comprises a plurality of viewpoints, and the viewpoints are included in at least one viewpoint set.
  • a viewpoint set is a group of viewpoints that contain one or more viewport center point correspondences as described with respect to diagram 800 and/or 900.
  • Each of the viewpoints corresponds to one particular omnidirectional video camera used for capturing an omnidirectional video at a particular location.
  • the VR video stream contains information indicative of a plurality of viewport centers for the viewpoints in the viewpoint set.
  • the information indicative of the plurality of viewport center points may indicate that a center point of a second viewport and a center point of a first viewport are corresponding viewport centers associated with a second viewpoint and a first viewpoint, respectively.
  • the decoder presents a first viewport of a first viewpoint of the viewpoint set to a user, for example by forwarding the first viewport of the first viewpoint toward a display.
  • the decoder switches from the first viewpoint to a second viewpoint of the viewpoint set. For example, the decoder may determine to make such a switch upon receiving a command from the user.
  • the decoder determines a second viewport of the second viewpoint based on the information indicative of a plurality of viewport centers for the viewpoints in the viewpoint set as obtained and processed at step 1101.
  • the decoder can then forward video data associated with the second viewport toward the display.
  • the first/second viewpoint and the first/second viewport may also be referred to as a source/destination viewpoint and viewport, a switched from/switched to viewpoint and viewport, an initial/final viewpoint and viewport, initial/switched viewpoint and viewport, etc.
  • the information indicative of the plurality of viewport centers used to determine the second viewport based on the first viewport can be coded as a pair in a vcpc sample entry in some examples.
  • the information indicative of the plurality of viewport centers can also be coded as a set containing a plurality of viewport centers in one or more vcpc sample entries.
  • the information indicative of the plurality of viewport centers can be coded in a timed metadata track related to the plurality of viewpoints.
  • the information indicative of the plurality of viewport centers can be coded in a sphere region structure.
  • FIG. 12 is another embodiment of a method 1200 of displaying a VR video at a decoder based on a viewport center point correspondence between multiple viewpoints, as discussed with respect to diagrams 800 and 900 and as applied to viewpoints such as viewpoints 702, 703, and/or 704.
  • method 1200 may be employed by a decoder 107, a decoder 600, and/or a codec system 400.
  • Method 1200 can also be employed to support viewpoint switching when displaying a VR video file, such as VR video file 1000, and hence can be employed to improve methods 200 and 300.
  • a client operating a decoder receives a bitstream including at least a portion of a coded VR video filmed from a plurality of viewpoints.
  • the bitstream includes a correspondence between viewport centers for the viewpoints.
  • the correspondence between the viewport centers can be coded as a pair in a vcpc sample entry.
  • the correspondence between the viewport centers can be coded as a set containing a plurality of viewport centers in one or more vcpc sample entries.
  • the correspondence between the viewport centers may be coded in a timed metadata track related to the plurality of viewpoints.
  • the correspondence between the viewport centers can be coded in a sphere region structure, for example in the timed metadata track and containing the vcpc sample entries.
  • the decoder decodes the portion of the VR video at a center point of a current viewport at a current viewpoint.
  • the decoder also forwards the portion of the VR video at the current viewport toward a display.
  • the decoder determines to switch from the source viewpoint to a destination viewpoint, for example in response to user input. Accordingly, the decoder determines a destination viewport at the destination viewpoint based on the source viewport and the correspondence between viewport centers for the viewpoints.
  • the decoder decodes the portion of the VR video at a center point of the destination viewport at the destination viewpoint. The decoder can then forward the portion of the VR video at the destination viewport toward the display.
  • FIG. 13 is an embodiment of a method 1300 of signaling a viewport center point correspondence between multiple viewpoints in a VR video from an encoder, as discussed with respect to diagrams 800 and 900 and as applied to viewpoints such as viewpoints 702, 703, and/or 704.
  • method 1300 may be employed by an encoder 103, an encoder 500, and/or a codec system 400.
  • Method 1300 can also be employed to support generation of a VR video file, such as VR video file 1000, and hence can be employed to improve methods 200 and 300.
  • Method 1300 operates on an encoder, for example on a computer system configured to encode VR video.
  • the encoder receives a VR video signal filmed from a plurality of viewpoints.
  • the encoder determines a correspondence between viewport centers for the viewpoints. Data to support the determination of correspondences may be input by the user, received from the cameras, determined based on global positioning system (GPS) data, etc.
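  • One way to derive corresponding viewport centers (an illustrative assumption, not the only option consistent with the disclosure) is purely geometric: given the 3D positions of the viewpoints and of a tracked object or scene point, the viewport center at each viewpoint is the direction from that viewpoint toward the point. A minimal C++ sketch with hypothetical names and axis conventions follows.

      #include <cmath>

      struct Vec3 { double x, y, z; };
      struct SpherePoint { double azimuthDeg, elevationDeg; };

      // Direction from a viewpoint position toward a target point, expressed as the
      // azimuth/elevation of the viewport center on that viewpoint's sphere.
      SpherePoint viewportCenterToward(const Vec3& viewpointPos, const Vec3& target) {
          double dx = target.x - viewpointPos.x;
          double dy = target.y - viewpointPos.y;
          double dz = target.z - viewpointPos.z;
          double r2d = 180.0 / std::acos(-1.0);
          SpherePoint c;
          c.azimuthDeg = std::atan2(dy, dx) * r2d;
          c.elevationDeg = std::atan2(dz, std::sqrt(dx * dx + dy * dy)) * r2d;
          return c;
      }

      // An encoder could evaluate this per viewpoint for the same target (e.g., the
      // center point of a tracked object, frame by frame) and record the resulting
      // centers as a correspondence for the viewpoint pair or set.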
  • the encoder encodes the correspondence between the viewport centers for the viewpoints in a bitstream.
  • the correspondence between the viewport centers can be coded as a pair in a vcpc sample entry.
  • the correspondence between the viewport centers can be coded as a set containing a plurality of viewport centers in one or more vcpc sample entries.
  • the correspondence between the viewport centers may be coded in a timed metadata track related to the plurality of viewpoints.
  • the correspondence between the viewport centers can be coded in a sphere region structure, for example in the timed metadata track and containing the vcpc sample entries.
  • the encoder can transmit the bitstream containing the correspondence between the viewport centers for the viewpoints to support viewpoint transitions when displaying the VR video signal.
  • the encoder can transmit the bitstream toward a client with a decoder.
  • the encoder can transmit the bitstream toward a server, which can store the bitstream for further transmissions to client(s).
  • the correspondence between the viewport centers indicates a correspondence between a center point of a source viewport at a source viewpoint and a center point of a destination viewport at a destination viewpoint to maintain a consistent object view upon viewpoint switching at the client/decoder.
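One way the determination of correspondences could be grounded, assuming the positions of the viewpoints (e.g., from GPS data) and of a shared point of interest are known, is to compute the azimuth and elevation from each viewpoint toward that point and pair the resulting centers. The sketch below is such an illustration; the coordinate convention (x east, y north, z up, azimuth measured clockwise from north), the helper names, and the reuse of the structures from the earlier sketches are all assumptions.

```cpp
#include <cmath>
#include <cstdint>
// Reuses the hypothetical ViewportCenter / VcpcEntry structures sketched earlier.

struct Position { double x, y, z; };  // assumed metres: x east, y north, z up

// Converts the direction from 'from' toward 'target' into azimuth/elevation in
// units of 2^-16 degrees, matching the hypothetical ViewportCenter fields.
static void DirectionToCenter(const Position& from, const Position& target,
                              int32_t* centre_azimuth, int32_t* centre_elevation) {
    constexpr double kPi = 3.14159265358979323846;
    const double dx = target.x - from.x;
    const double dy = target.y - from.y;
    const double dz = target.z - from.z;
    const double horizontal = std::sqrt(dx * dx + dy * dy);
    const double azimuth_deg = std::atan2(dx, dy) * 180.0 / kPi;       // 0 = north, clockwise positive (assumed)
    const double elevation_deg = std::atan2(dz, horizontal) * 180.0 / kPi;
    *centre_azimuth = static_cast<int32_t>(std::lround(azimuth_deg * 65536.0));
    *centre_elevation = static_cast<int32_t>(std::lround(elevation_deg * 65536.0));
}

// Builds a correspondence pair for two viewpoints that both look at the same
// point of interest (e.g., a shared object visible from both camera positions).
VcpcEntry MakePair(uint32_t vp_a, const Position& pos_a,
                   uint32_t vp_b, const Position& pos_b,
                   const Position& point_of_interest) {
    ViewportCenter a{vp_a, 0, 0, 0};
    ViewportCenter b{vp_b, 0, 0, 0};
    DirectionToCenter(pos_a, point_of_interest, &a.centre_azimuth, &a.centre_elevation);
    DirectionToCenter(pos_b, point_of_interest, &b.centre_azimuth, &b.centre_elevation);
    VcpcEntry entry;
    entry.centers = {a, b};
    return entry;
}
```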
  • FIG. 14 is a schematic diagram of an example video coding device 1400 according to an embodiment of the disclosure.
  • the coding device 1400 is suitable for implementing the methods and processes disclosed herein.
  • the coding device 1400 comprises downstream ports 1410 and transceiver units (Tx/Rx) 1420 for transmitting data to and receiving data from a downstream direction; a processor 1430, logic unit, or central processing unit (CPU) to process the data; upstream ports 1450 coupled to the Tx/Rx units 1420 for transmitting data to and receiving data from an upstream direction; and a memory 1460 for storing the data.
  • the coding device 1400 may also comprise optical-to-electrical (OE) components and/or electrical-to-optical (EO) components coupled to the downstream ports 1410, the Tx/Rx units 1420, and the upstream ports 1450 for egress or ingress of optical or electrical signals.
  • the processor 1430 is implemented by hardware and software.
  • the processor 1430 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and digital signal processors (DSPs).
  • the processor 1430 is in communication with the downstream ports 1410, transceiver units 1420, upstream ports 1450, and memory 1460.
  • the processor 1430 comprises a coding module 1470.
  • the coding module 1470 implements the disclosed embodiments described above.
  • the coding module 1470 may implement an encoder 103, an encoder 500, a decoder 107, a decoder 600, and/or a codec system 400, depending on the example.
  • the coding module 1470 may implement method 200, method 300, method 1100, method 1200, and/or method 1300, depending on the example.
  • coding module 1470 may generate or decode a VR video file 1000.
  • the coding module 1470 can encode or decode VR video based on a timed metadata track 1060 that contains correspondences between viewport centers of different viewpoints, such as viewpoints 702, 703, 704, 810, 820, 910, 920, 940, and/or 950 as discussed with respect to diagrams 800 and 900.
  • coding module 1470 when acting as an encoder, can determine and encode correspondences between viewport center points for viewpoints in pairs and/or sets, for example in a SphereRegionStruct object in a timed metadata track.
  • when acting as a decoder, the coding module 1470 can obtain such correspondences from the bitstream and use them when switching between viewpoints to provide the user with a consistent view of a location/object.
  • the coding module 1470 can determine a source viewport at a source viewpoint and determine a destination viewport at a destination viewpoint based on the correspondences between the center points of such viewports.
  • the inclusion of the coding module 1470 therefore provides a substantial improvement to the functionality of the coding device 1400 and effects a transformation of the coding device 1400 to a different state.
  • the coding module 1470 is implemented as instructions stored in the memory 1460 and executed by the processor 1430.
  • the video coding device 1400 may also include input and/or output (I/O) devices 1480 for communicating data to and from a user.
  • the I/O devices 1480 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc.
  • the I/O devices 1480 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.
  • the memory 1460 comprises one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory 1460 may be volatile and/or non-volatile and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
  • FIG. 15 is a schematic diagram of an embodiment of a system 1500 for signaling a viewport center point correspondence between multiple viewpoints in a VR video.
  • the system 1500 is suitable for implementing the methods and processes disclosed herein and may, for example, implement method 200, method 300, method 1100, method 1200, and/or method 1300, depending on the example.
  • the system 1500 includes a video encoder 1502.
  • the encoder 1502 comprises a receiver 1501 for receiving a VR video signal filmed from a plurality of viewpoints.
  • the encoder 1502 also comprises a correspondence determination module 1503 for determining a correspondence between viewport centers for the viewpoints.
  • the encoder 1502 also comprises an encoding module 1505 for encoding the correspondence between the viewport centers for the viewpoints in a bitstream.
  • the encoder 1502 also comprises a transmitter 1507 for transmitting the bitstream containing the correspondence between the viewport centers for the viewpoints to support viewpoint transitions when displaying the VR video signal.
  • the encoder 1502 is further configured to perform other encoding related mechanisms as discussed herein.
  • the system 1500 also includes a video decoder 1510.
  • the decoder 1510 comprises a receiver 1511 for receiving a bitstream including at least a portion of a coded VR video filmed from a plurality of viewpoints and including a correspondence between viewport centers for the viewpoints.
  • the decoder 1510 also comprises a decoding module 1513 for decoding the portion of the VR video at a center point of a source viewport at a source viewpoint, and decoding the portion of the VR video at a center point of a destination viewport at a destination viewpoint.
  • the decoder 1510 also comprises a determining module 1515 for determining to switch from the source viewpoint to the destination viewpoint, and determining the destination viewport at the destination viewpoint based on the source viewport and the correspondence between viewport centers for the viewpoints.
  • the decoder 1510 also comprises a forwarding module 1517 for forwarding the portion of the VR video at the source viewport toward a display, and forwarding the portion of the VR video at the destination viewport toward the display.
  • the decoder 1510 is further configured to perform other decoding, display, and/or viewpoint switching related mechanisms as discussed herein.
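To show how the pieces described above might fit together, the following hypothetical end-to-end sketch reuses the helpers from the earlier sketches: an encoder-side step builds a correspondence pair for two viewpoints observing the same object, and a decoder-side step uses that pair when switching viewpoints. The positions, identifiers, and printed output are illustrative assumptions, not part of the signaled format.

```cpp
#include <cstdio>
// Uses the hypothetical Position, ViewportCenter, VcpcEntry types and the
// MakePair() / FindDestinationCenter() helpers from the earlier sketches.

int main() {
    // Encoder side: two camera positions and one shared point of interest (assumed metres).
    const Position viewpoint_1{0.0, 0.0, 1.5};
    const Position viewpoint_2{40.0, 0.0, 1.5};
    const Position object_of_interest{20.0, 30.0, 1.5};
    const VcpcEntry correspondence = MakePair(1, viewpoint_1, 2, viewpoint_2, object_of_interest);

    // Decoder side: the user watches the object from viewpoint 1 ...
    const ViewportCenter source = correspondence.centers[0];
    // ... and then switches to viewpoint 2.
    const auto destination =
        FindDestinationCenter({correspondence}, source, /*destination_viewpoint_id=*/2);
    if (destination) {
        std::printf("destination center: azimuth %.2f deg, elevation %.2f deg\n",
                    destination->centre_azimuth / 65536.0,
                    destination->centre_elevation / 65536.0);
    }
    return 0;
}
```

With the example positions above, the source viewport is centered about 33.7 degrees east of north at viewpoint 1 and the reported destination center is about -33.7 degrees at viewpoint 2, so both viewports remain centered on the same object across the switch.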

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A virtual reality (VR) video coding mechanism is disclosed. The mechanism includes receiving a bitstream comprising at least a portion of a coded VR video filmed from a plurality of viewpoints and comprising a correspondence between viewport centers for the viewpoints. The portion of the VR video at a center point of a source viewport at a source viewpoint is decoded and then forwarded to a display. The source viewpoint is switched to a destination viewpoint. A destination viewport is determined at the destination viewpoint based on the source viewport and the correspondence between the viewport centers for the viewpoints. The portion of the VR video at a center point of the destination viewport at the destination viewpoint is decoded and then forwarded to the display.
PCT/US2019/052894 2018-09-27 2019-09-25 Signalisation de correspondance de point central de fenêtre d'affichage d'un point de vue de réalité virtuelle WO2020068935A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862737651P 2018-09-27 2018-09-27
US62/737,651 2018-09-27

Publications (1)

Publication Number Publication Date
WO2020068935A1 true WO2020068935A1 (fr) 2020-04-02

Family

ID=69950841

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/052894 WO2020068935A1 (fr) 2018-09-27 2019-09-25 Signalisation de correspondance de point central de fenêtre d'affichage d'un point de vue de réalité virtuelle

Country Status (1)

Country Link
WO (1) WO2020068935A1 (fr)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170269685A1 (en) * 2016-03-17 2017-09-21 Sony Interactive Entertainment Inc. Spectating Virtual (VR) Environments Associated With VR User Interactivity
US20170310723A1 (en) * 2016-04-22 2017-10-26 Home Box Office, Inc. Streaming media state machine
US20170359624A1 (en) * 2016-06-08 2017-12-14 Sphere Optics Company, Llc Multi-view point/location omni-directional recording and viewing
US20180041715A1 (en) * 2016-06-27 2018-02-08 Adtile Technologies Inc. Multiple streaming camera navigation interface system
WO2018021067A1 (fr) * 2016-07-29 2018-02-01 ソニー株式会社 Dispositif de traitement d'image et procédé de traitement d'image
US20180063505A1 (en) * 2016-08-25 2018-03-01 Lg Electronics Inc. Method of transmitting omnidirectional video, method of receiving omnidirectional video, device for transmitting omnidirectional video, and device for receiving omnidirectional video

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021262385A1 (fr) * 2020-06-23 2021-12-30 Tencent America LLC Signalisation de débit de bande passante au moyen d'une piste de segment à indice de combo dans une diffusion en continu multimédia
CN114616801A (zh) * 2020-06-23 2022-06-10 腾讯美国有限责任公司 在媒体流式传输中使用组合索引段轨道用信号通知带宽上限
CN114616801B (zh) * 2020-06-23 2023-12-26 腾讯美国有限责任公司 视频编码的方法、装置、设备以及存储介质
US11973817B2 (en) 2020-06-23 2024-04-30 Tencent America LLC Bandwidth cap signaling using combo-index segment track in media streaming
EP4258222A4 (fr) * 2020-12-02 2024-05-22 Tencent Technology (Shenzhen) Company Limited Procédé de traitement de données et appareil pour média immersif, et support de stockage lisible par ordinateur

Similar Documents

Publication Publication Date Title
US11438600B2 (en) Immersive media metrics for virtual reality content with multiple viewpoints
US11917130B2 (en) Error mitigation in sub-picture bitstream based viewpoint dependent video coding
TWI712313B (zh) 感興趣區之發信號之系統及方法
US11706398B2 (en) Signaling a cancel flag in a video bitstream
TW201838407A (zh) 適應性擾動立方體之地圖投影
WO2020068935A1 (fr) Signalisation de correspondance de point central de fenêtre d'affichage d'un point de vue de réalité virtuelle
WO2020068284A1 (fr) Groupement de points de vue de réalité virtuelle (vr)
WO2019200227A1 (fr) Signalisation de correspondance de zone spatiale entre des points de vue de réalité virtuelle

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19864638

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19864638

Country of ref document: EP

Kind code of ref document: A1