WO2018223241A1 - Building and rendering immersive virtual reality experiences - Google Patents

Building and rendering immersive virtual reality experiences

Info

Publication number
WO2018223241A1
WO2018223241A1 (PCT/CA2018/050690)
Authority
WO
WIPO (PCT)
Prior art keywords
mapping
manifest
encoded stream
video
media objects
Application number
PCT/CA2018/050690
Other languages
French (fr)
Inventor
Stephane Levesque
Christian EVE-LEVESQUE
Daniel LORENZO
Jorel AMTHOR
Kevin Ouellet
Original Assignee
Vimersiv Inc.
Application filed by Vimersiv Inc.
Publication of WO2018223241A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G: PHYSICS
    • G02: OPTICS
    • G02B: OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00: Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01: Head-up displays
    • G02B 27/017: Head mounted
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14: Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F 3/147: Digital output to display device; Cooperation and interconnection of the display device with other functional units using display panels
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 5/00: Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G 5/36: Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G 5/37: Details of the operation on graphic patterns
    • G09G 5/377: Details of the operation on graphic patterns for mixing or overlaying two or more graphic patterns
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106: Processing image signals
    • H04N 13/156: Mixing image signals
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106: Processing image signals
    • H04N 13/161: Encoding, multiplexing or demultiplexing different image signal components
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10: Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/194: Transmission of image signals
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • G: PHYSICS
    • G02: OPTICS
    • G02B: OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00: Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B 27/01: Head-up displays
    • G02B 27/0179: Display position adjusting means not related to the information to be displayed
    • G02B 2027/0187: Display position adjusting means not related to the information to be displayed slaved to motion of at least a part of the body of the user, e.g. head, eye
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 2370/00: Aspects of data communication
    • G09G 2370/02: Networking aspects
    • G09G 2370/027: Arrangements and methods specific for the display of internet documents
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 2370/00: Aspects of data communication
    • G09G 2370/20: Details of the management of multiple sources of image data
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 3/00: Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes
    • G09G 3/001: Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes using specific devices not provided for in groups G09G3/02 - G09G3/36, e.g. using an intermediate record carrier such as a film slide; Projection systems; Display of non-alphanumerical information, solely or in combination with alphanumerical information, e.g. digital display on projected diapositive as background
    • G09G 3/003: Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes using specific devices not provided for in groups G09G3/02 - G09G3/36, e.g. using an intermediate record carrier such as a film slide; Projection systems; Display of non-alphanumerical information, solely or in combination with alphanumerical information, e.g. digital display on projected diapositive as background to produce spatial visual effects

Definitions

  • in addition to video element mapping manifests, mapping manifests may be implemented for still image content, where the mapping manifests may together make up a general mapping manifest defining the display of general VR content, including video, interactive, and still image content, in a 3D VR scene.
  • an interactive object code mapping manifest 350 maps a first interactive object code stream 351 , a second interactive object code stream 352, a third interactive object code stream 353, and the like into the scene.
  • the general mapping manifest may be provided to the VR device such that the VR device may decode and properly display provided media according to the mappings of the mapping manifests.
  • mapping manifests may be separate data structures or may be incorporated into an encoded stream or web content.
  • the encoded stream may include HTML defining a mapping manifest.
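  • As an illustration only (the patent does not fix a concrete manifest format), a mapping manifest could be carried as JSON inside the HTML of the encoded stream and read out on the client. The field names and the script-tag convention in the TypeScript sketch below are assumptions, not the disclosed format:

```typescript
// Hypothetical manifest shapes -- field names are illustrative, not from the patent.
interface ZoneConfig {
  id: string;                       // e.g. "main", "jumbotron"
  rect: { x: number; y: number; width: number; height: number }; // pixel zone in the combined frame
  geometry: "sphere" | "cylinder" | "cube" | "dome" | "plane";
  position: [number, number, number];
  orientation: [number, number, number];
  visible: boolean;
}

interface MappingManifest {
  video?: ZoneConfig[];
  image?: ZoneConfig[];
  web?: ZoneConfig[];
}

// One way HTML could "define" the manifest: embed it as JSON and parse it on the client.
function readManifestFromDocument(doc: Document): MappingManifest | null {
  const tag = doc.querySelector('script[type="application/json"][data-role="mapping-manifest"]');
  if (!tag || !tag.textContent) return null;
  return JSON.parse(tag.textContent) as MappingManifest;
}
```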
  • the method indicated generally by the reference numeral 400 may include a start block 441 , which passes control to a function block 442, which receives a plurality of video streams.
  • the function block 442 passes control to a function block 444, which multiplexes the plurality of video streams to obtain one combined video stream with specific video zones dedicated to initial individual video streams.
  • the function block 444 passes control to a function block 446, which updates the video element mapping manifest with specific properties for each video zone.
  • the function block 446 passes control to a function block 448, which positions specific video zones in the 3D virtual environment defined by position, orientation, and surface geometry.
  • the function block 448 passes control to an end block 449.
  • the method may further include a start block 451 , which passes control to a function block 452, which receives a plurality of interactive object code streams.
  • the function block 452 passes control to a function block 454, which multiplexes the plurality of interactive object code streams to obtain one combined interactive object code stream with specific interactive object code zones dedicated to initial individual interactive object code streams.
  • the function block 454 passes control to a function block 456, which updates the interactive object code element mapping manifest with specific properties for each interactive object code zone.
  • the function block 456 passes control to a function block 458, which positions specific interactive object code zones in the 3D virtual environment defined by position, orientation, and surface geometry.
  • multiple media objects from the same family, such as video elements, image elements, or interactive object code elements, may be combined via multiplexing and decoded as one media object.
  • the multiplexed media object then contains all media objects from the same family divided into multiple zones, which may be represented by areas of pixels. The configuration of each zone is stored and updated in the media object mapping manifest. Each zone is independent in positioning, orientation, visibility, and spatial surface geometry, while all media objects from the same family are decoded at the same time. Combining all media objects of a family and performing the decoding activity once improves the use of processing resources (see the sketch below).
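  • A minimal sketch of that zone bookkeeping, assuming zones are stored as pixel rectangles within the combined frame and converted to normalized texture coordinates so each zone can be sampled independently after a single decode; the field names and the example layout are hypothetical:

```typescript
// Convert a pixel zone of the multiplexed frame into a normalized UV window.
// V is flipped because texture coordinates usually run bottom-up while pixel rows run top-down.
function zoneToUvWindow(
  zone: { x: number; y: number; width: number; height: number },
  frameWidth: number,
  frameHeight: number
): { offset: [number, number]; repeat: [number, number] } {
  const repeat: [number, number] = [zone.width / frameWidth, zone.height / frameHeight];
  const offset: [number, number] = [
    zone.x / frameWidth,
    1 - (zone.y + zone.height) / frameHeight,
  ];
  return { offset, repeat };
}

// Example: a 1920x2160 combined frame whose lower 1920x270 strip holds a jumbotron feed.
const jumbotronWindow = zoneToUvWindow({ x: 0, y: 1890, width: 1920, height: 270 }, 1920, 2160);
// -> offset [0, 0], repeat [1, 0.125]
```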
  • a method of decoding, tiling, rendering and compositing is indicated generally by the reference numeral 500.
  • the method 500 receives media elements 540 including media content 542 and media description 544.
  • the media content is passed to a video decoder 546 or an image decoder 548, while the media description is passed to a projection plug-in 560.
  • a tiling unit 562 includes texture 564 drawn from the video decoder and image decoder, and tiling information 566 drawn from the projection plug-in.
  • a rendering unit 567 includes sub-texture 568 drawn from the tiling unit, and 3D geometry 569 drawn from the projection plug-in.
  • the output of the rendering unit is passed to a composition engine 580.
  • the method 500 further receives interactive layer elements 550 including logic 552 and media description 554.
  • the logic is passed to a coded object interpreter or virtual machine 556, while the media description is passed to a projection plug-in 570.
  • a tiling unit 572 includes texture 574 drawn from the coded object interpreter or virtual machine, and tiling information 576 drawn from the projection plug-in 570.
  • a rendering unit 577 includes sub-texture 578 drawn from the tiling unit 572, and 3D geometry 579 drawn from the projection plug-in 570. The output of the rendering unit 577 is passed to a composition engine 590.
  • the method may include combining media content, such as video and image elements or interactive object code elements, with media description configuration data; these combinations are labeled a Media Element and an Interactive Layer Element, respectively.
  • the Media Element and Interactive Layer Element are then decoded with Element specific decoding processes, such as a video decoder for video elements, an image decoder for image elements, and a coded object interpreter or virtual machine for interactive layers.
  • Media description configuration feeds a data stream to the projection plugin that will then construct the 3D surface geometry in the 3D scene environment.
  • the Element specific texture is then divided into sub-textures using tiling information data from the Projection Plugin.
  • the sub-texture is then applied to the specific 3D surface geometry previously created by the projection plugin to render layers.
  • the Composition Engine composes the scene by superimposing the rendered layers.
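  • Figure 5 describes the projection plug-in only functionally (media description in; tiling information and 3D surface geometry out). The interface below is one possible way to express that contract in code, assuming three.js as the rendering backend; the type and member names are invented for illustration:

```typescript
import * as THREE from "three";

// Hypothetical contract for a projection plug-in, following Figure 5:
// the media description goes in, tiling information and 3D surface geometry come out.
interface TileInfo {
  zoneId: string;
  uvOffset: [number, number];
  uvRepeat: [number, number];
}

interface ProjectionPlugin {
  // Tiling information used to cut the decoded texture into sub-textures.
  tiles(mediaDescription: unknown): TileInfo[];
  // The 3D surface geometry each sub-texture is applied to.
  geometryFor(zoneId: string): THREE.BufferGeometry;
}

// A trivial plug-in that puts a single zone on an inward-facing sphere.
const sphericalPlugin: ProjectionPlugin = {
  tiles: () => [{ zoneId: "main", uvOffset: [0, 0], uvRepeat: [1, 1] }],
  geometryFor: () => new THREE.SphereGeometry(50, 64, 32),
};
```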
  • a first immersive scene 610 is built from a main video stream 612 comprising equirectangular 360-degree video and a video stream 614 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 616 and rendered on a sphere 618.
  • a second immersive scene 620 is built from a main video stream 622 comprising full-dome 180-degree video and a video stream 624 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 626 and rendered on a sphere 628.
  • an immersive scene built with a spherical video and a floating jumbotron screen would be multiplexed as one video element.
  • the majority of the pixels would be allocated to the main video layer, being the spherical video in this example, and a small portion of the jumbotron feed could be multiplexed in the lower end of the combined video element.
  • Allocated pixels of video for the sphere and for the jumbotron are stored in the mapping manifest.
  • the rendering engine uses the mapping manifest to perform the scene construction applying the specific zones of the main video to a spatial sphere and to a floating plane in order to build, respectively, the spherical 360-degree video environment with the floating jumbotron.
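  • A sketch of that scene construction, assuming three.js as the renderer; the stream URL, zone proportions, and jumbotron placement are invented for illustration and would in practice come from the video element mapping manifest:

```typescript
import * as THREE from "three";

// One multiplexed video element: most of the frame is the equirectangular 360-degree video,
// the lower strip is the jumbotron feed (zone layout comes from the mapping manifest).
const video = document.createElement("video");
video.src = "combined-stream.mp4";   // placeholder URL
void video.play();                   // playback usually requires a user gesture in browsers

const scene = new THREE.Scene();

// Main zone -> inside of a sphere (360-degree environment).
const mainTexture = new THREE.VideoTexture(video);
mainTexture.repeat.set(1, 0.875);          // sample the top 87.5% of the frame
mainTexture.offset.set(0, 0.125);
const sphere = new THREE.Mesh(
  new THREE.SphereGeometry(50, 64, 32),
  new THREE.MeshBasicMaterial({ map: mainTexture, side: THREE.BackSide })
);
scene.add(sphere);

// Jumbotron zone -> floating plane, same decoded video, different UV window.
const jumboTexture = new THREE.VideoTexture(video);
jumboTexture.repeat.set(1, 0.125);         // sample the bottom 12.5% of the frame
jumboTexture.offset.set(0, 0);
const jumbotron = new THREE.Mesh(
  new THREE.PlaneGeometry(16, 9),
  new THREE.MeshBasicMaterial({ map: jumboTexture })
);
jumbotron.position.set(0, 10, -30);        // floating in front of and above the viewer
scene.add(jumbotron);
```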
  • the Twitter feed web element and the live stats web element would be multiplexed into one web page element.
  • Allocated pixels of dynamic HTML for the Twitter feed zone would be applied on a spatial cylinder geometry where the scrolling Twitter feed would appear, and the allocated pixels of dynamic HTML for the live stats floating table would also appear at a specific location as designated in the mapping manifest.
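  • One possible way to build such a combined web page element, with each web content placed in a fixed pixel region that a (hypothetical) web-element mapping manifest could reference; the URLs, sizes, and helper names are assumptions:

```typescript
// Build one web page that hosts several web contents in fixed pixel regions,
// so a single browser instance can render all of them at once.
interface WebZone {
  id: string;
  url: string;
  rect: { x: number; y: number; width: number; height: number };
}

function buildCombinedPage(zones: WebZone[]): string {
  const frames = zones
    .map(
      (z) =>
        `<iframe src="${z.url}" style="position:absolute;` +
        `left:${z.rect.x}px;top:${z.rect.y}px;` +
        `width:${z.rect.width}px;height:${z.rect.height}px;border:0"></iframe>`
    )
    .join("\n");
  return `<!DOCTYPE html><html><body style="margin:0">${frames}</body></html>`;
}

// Example zones (URLs and sizes are placeholders, not from the patent).
const combinedPage = buildCombinedPage([
  { id: "twitter-feed", url: "https://example.com/feed", rect: { x: 0, y: 0, width: 512, height: 1024 } },
  { id: "live-stats", url: "https://example.com/stats", rect: { x: 512, y: 0, width: 512, height: 256 } },
]);
```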
  • an exemplary interactive VR system is indicated generally by the reference numeral 700.
  • the system 700 includes a content management system (CMS) 710, which includes a publisher interface 712 connected to experiences storage 714.
  • CMS content management system
  • CDN content distribution network
  • the system 700 includes an operator interface 716 connected to web content storage 718, an images composition unit connected to images storage 724, and a videos composition unit 726 connected to videos storage 728.
  • a mapping manifest may be sent from the CMS to the experience manager.
  • the system 700 includes a client 730.
  • the client 730 includes a user interface 732 connected to an experience manager 734, which also receives input from the experiences storage 714.
  • the experience manager, in turn, is connected to a web content composition unit 736, which also receives input from the web content storage 718 and provides output to a browser 738, which, in turn, provides output to a projection and mapping unit 740.
  • An image codec 742 receives input from the images storage 724 and also provides output to the projection and mapping unit 740.
  • a video codec 744 receives input from the videos storage 728 and further provides output to the projection and mapping unit 740.
  • the projection and mapping unit 740 receives input directly from the experience manager 734 and provides output to a GPU 746.
  • a multiplexing unit is indicated generally by the reference numeral 800.
  • the multiplexing unit is configured to combine video sources using predefined layouts matching a projection.
  • the multiplexing unit 800 includes a hardware production portion 810, a CDN 820, a software production portion 830, and a server portion 840 connected to the software production portion.
  • the CDN receives input from both the hardware production portion 810 and the server 840.
  • the hardware production portion 810 includes a live VR camera feed 812 and a broadcast or television feed 814, both connected to a hardware multiplexing unit 816.
  • the hardware multiplexing unit 816 is connected to a videos storage 822 of the CDN 820.
  • the software production portion 830 includes a live VR camera feed 832 and a broadcast or television feed 834.
  • the server 840 includes a software multiplexing unit 846, where the software multiplexing unit 846 receives input from both the live VR camera feed 832 and the broadcast or television feed 834 of the software production portion 830.
  • an exemplary interactive VR system is indicated generally by the reference numeral 900.
  • an experience manager 910 provides from one to n experience templates 920, which are fed from one to two scene templates 940 from a renderer 930. The combined templates are fed to a media unit 960, which inserts video media 970 and image media 980, and feeds the combined media to a projection and format unit 950.
  • an experience algorithm is indicated generally by the reference numeral 1000.
  • the algorithm includes a function block 1010 that receives an experience identifier from a user interface, such as 732 of Figure 7, and passes an HTTP or HTTPS request to a function block 1020.
  • the function block 1020 fetches experience data such as 920 of Figure 9 from a server such as 840 of Figure 8 via an application programming interface (API), and passes parallel control to three function blocks 1030, 1040 and 1050, respectively.
  • the function block 1030 creates a web page including all web content in different frames, while the function block 1040 creates video decoders, and while the function block 1050 creates image decoders.
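  • A sketch of that flow in TypeScript; the API endpoint, payload shape, and helper names are assumptions for illustration, not the patent's interfaces:

```typescript
// Sketch of Figure 10: fetch the experience description over an API, then set up
// the web page, video decoders, and image decoders in parallel.
interface ExperienceData {
  webContents: { id: string; url: string }[];
  videoSources: string[];
  imageSources: string[];
}

async function loadExperience(experienceId: string): Promise<void> {
  const response = await fetch(`/api/experiences/${experienceId}`); // hypothetical endpoint
  const experience = (await response.json()) as ExperienceData;

  await Promise.all([
    createWebPage(experience.webContents),        // block 1030: one page, one frame per web content
    createVideoDecoders(experience.videoSources), // block 1040
    createImageDecoders(experience.imageSources), // block 1050
  ]);
}

// Stubs standing in for the blocks of Figure 10.
async function createWebPage(contents: { id: string; url: string }[]): Promise<void> {}
async function createVideoDecoders(sources: string[]): Promise<void> {}
async function createImageDecoders(sources: string[]): Promise<void> {}
```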
  • a function block 1110 receives a list of web contents and container sizes from an experience manager such as 734 of Figure 7 and passes control to a function block 1120.
  • the function block 1120 creates a web page including frames for each web content and passes control to a function block 1130.
  • the function block 1130 loads the locally created web page into a web view or web browser, and passes control to a function block 1140, which captures the web browser viewport or textual representation to the GPU each time the viewport is updated.
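  • A sketch of the capture step, assuming three.js on the rendering side; how the web view is rasterized into the canvas is platform specific and is left as a stand-in here:

```typescript
import * as THREE from "three";

// Sketch of blocks 1130-1140: the combined web page is rendered off-screen into a
// canvas, and that canvas is pushed to the GPU as a texture whenever the viewport changes.
const webViewCanvas = document.createElement("canvas");
webViewCanvas.width = 1024;
webViewCanvas.height = 1024;

const webTexture = new THREE.CanvasTexture(webViewCanvas);

function onViewportUpdated(): void {
  // ...platform-specific code draws the current web view into webViewCanvas here...
  webTexture.needsUpdate = true;   // re-upload the canvas contents to the GPU
}
```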
  • a texture algorithm is indicated generally by the reference numeral 1200.
  • a function block 1210 receives a list of media assets in the experience from the experience manager 734 of Figure 7 and/or 910 of Figure 9, and passes control to a function block 1220.
  • the function block 1220 places texture on the GPU and stores UVs, vertexes, and indices.
  • the UVs are for projecting a 2D image onto a 3D model's surface for texture mapping, where the letters U and V denote the axes of the 2D texture. This transfers texture by capturing from a video codec, for example, and mapping to a GPU.
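  • A minimal sketch of storing vertices, UVs, and indices for a surface that samples a decoder output, using three.js buffer attributes as an assumed backend; the quad and its UV layout are illustrative only:

```typescript
import * as THREE from "three";

// Vertices, UVs, and indices for a simple quad that will carry a media texture.
// The UVs select which part of the (possibly multiplexed) texture this surface samples.
const geometry = new THREE.BufferGeometry();

const vertices = new Float32Array([
  -1, -1, 0,   1, -1, 0,   1, 1, 0,   -1, 1, 0,
]);
const uvs = new Float32Array([
  0, 0,   1, 0,   1, 1,   0, 1,   // full texture; shrink these to sample a single zone
]);
const indices = [0, 1, 2, 0, 2, 3];

geometry.setAttribute("position", new THREE.BufferAttribute(vertices, 3));
geometry.setAttribute("uv", new THREE.BufferAttribute(uvs, 2));
geometry.setIndex(indices);

// Pairing the geometry with a texture captured from a decoder output (e.g. a video element).
const videoElement = document.createElement("video");
const material = new THREE.MeshBasicMaterial({ map: new THREE.VideoTexture(videoElement) });
const mesh = new THREE.Mesh(geometry, material);
```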
  • Table A includes pseudocode for an exemplary embodiment experience algorithm. Table A
  • an experience manager algorithm comparable to 1100 of Figure 11 is indicated generally by the reference numeral 1300.
  • the algorithm itself comprises function blocks 1310, 1320, 1330 and 1340, comparable to function blocks 1110, 1120, 1130 and 1140 of Figure 11, so duplicate description may be omitted.
  • an exemplary web page created by the function block 1320 is indicated generally by the reference numeral 1350. This is an example of a web page that could be generated to display three different web contents, shown here as a menu 1352, an advertising banner 1354 and a scoreboard 1356. It shall be understood that the layout and contents may vary. Moreover, the advertising banner, for example, may be tuned to a particular device, user, location, or the like.
  • This tunability may be accomplished by variable content within a manifest or by variably tuned manifests.
  • the result will be loaded into a virtual web view and will not be displayed on a user's display in the layout form 1360, but rather in the visual form 1370.
  • each part or web content will be sliced, sent to the GPU as needed, and placed in the virtual environment at the designated position.
  • the technology defined herein may be implemented by a system which comprises computer-readable storage containing instructions for instructing a processing device or processing devices to execute the methods described herein.
  • the system is implemented on a computing device, such as a smart phone.
  • the exemplary device may comprise a network interface.
  • the exemplary device may comprise processing units in a processing entity or processor for performing general and specific processing.
  • the processing entity is at least partially programmable and may include a general-purpose processor as can be found in a smartphone.
  • the processing entity may also include function-specific processing units such as a GPU which may comprise hardware codecs.
  • the device may also comprise computer-readable storage, which may be distributed over several different physical memory entities such as chips, layers, or the like.
  • the computer-readable storage may be accessible by the processing entity for storing/retrieving operational data.
  • the computer-readable memory may also comprise program instructions readable and executable by the processing entity instructing the processing entity to implement the methods described herein.
  • the network interface may be adapted for receiving virtual reality content such as video, still image and interactive such as web or HTML content.
  • the device may comprise logic for decoding the received content, including, for example, a video decoder for decoding MPEG-4, HEVC, or the like, a still image decoder for decoding JPEG or the like, and an interactive content decoder such as a web browser.
  • the device may also comprise buffers for storing the decoded content including a frame buffer for storing decoded video frames, an image buffer for storing decoded still images and an interactive content buffer for storing interactive content.
  • the computer-readable storage may also comprise one or more mapping manifests, including one or more general mapping manifests as described herein, which may be used by the system to determine how to provide the received VR content in a VR scene.
  • the device may be adapted for receiving mapping manifests such as over the network interface.
  • the device comprises computer-executable software code for causing the processing entity to implement one or more of the methods and techniques described herein, including for causing the device to map portions of received video, still image and/or interactive content, such as from respective buffers, onto a 3D VR scene, or more particularly onto surfaces or objects of a 3D VR scene such as texture, and to composite, render and display the resulting scene.
  • the creation of the 3D VR scene may also include generating 3D objects onto which to map the VR content.
  • the creation of the 3D VR objects may be done in accordance with or as instructed by the one or more mapping manifest(s).
  • the technology may include receiving a single video stream having images comprising different zones corresponding to different video content associated with different 3D surfaces in a virtual scene, buffering this image data in corresponding buffer zones and texturing the different 3D surfaces with the respective image data from the corresponding buffer zones. This may involve populating a frame buffer with different zones that contain different videos and generating a 3D scene with different surfaces corresponding to different zones. This feature may provide the benefit of providing the entire video content of a virtual reality experience in a single video stream, thereby reducing the resource drain of running multiple video decoders.
  • the present technology may include receiving a single video stream having images comprising different portions corresponding to different 3D geometries in a virtual representation, applying a distortion to the different portions in accordance with their respective 3D geometries and applying them as textures to the respective surfaces of 3D objects having their respective 3D geometry in a virtual scene.
  • different zones may be represented under different geometries in the 3D scene - e.g. zone #1 is stretched on a sphere and zone #2 is a cylinder. This feature may provide the benefit of providing access to both curved and planar content using a single stream with the same advantages as above.
  • the present technology may include receiving and applying at a content-receiving client a mapping manifest comprising mapping instructions for dividing image, video and web content into subdivisions and mapping the content of the subdivisions to respective 3D surfaces in a virtual scene.
  • the present technology may include populating a web cache with web content from a web page, assigning different portions of the web content to different respective 3D surfaces in a virtual scene, deriving a displayable output for each of the different portions, and displaying each displayable output on its respective 3D surface.
  • the different portions may further be associated with different geometries, and their displayable output distorted accordingly.
  • the present technology may include rendering a 3D scene comprising different 3D objects or surfaces by creating separate compositions for each of the 3D objects or surfaces and superimposing the resulting compositions in decreasing order of distance from a virtual camera.
  • that is, the 3D scene is built by compositing a superposition of layers.
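  • A sketch of that back-to-front superposition, assuming three.js and using render order as the compositing mechanism; the layer type is hypothetical:

```typescript
import * as THREE from "three";

// Superimpose per-layer compositions in decreasing order of distance from the
// virtual camera (painter's algorithm): the farthest layer is drawn first.
interface LayerComposition {
  name: string;          // e.g. "background", "main", "foreground", "interactive"
  mesh: THREE.Mesh;
}

function drawOrder(layers: LayerComposition[], camera: THREE.Camera): LayerComposition[] {
  return [...layers].sort(
    (a, b) =>
      camera.position.distanceTo(b.mesh.position) -
      camera.position.distanceTo(a.mesh.position)
  );
}

// One way to apply the ordering in three.js: objects with a higher renderOrder are drawn later.
// For a pure superposition the layer materials would also disable depth testing.
function applyDrawOrder(layers: LayerComposition[], camera: THREE.Camera): void {
  drawOrder(layers, camera).forEach((layer, index) => {
    layer.mesh.renderOrder = index;
  });
}
```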

Abstract

A virtual reality (VR) technology is provided that reduces the computational overhead required to present VR content and improves efficiency and battery life, including building and rendering immersive video experiences by building a mapping manifest; receiving a plurality of media objects; multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest; providing at least one of the mapping manifest or the encoded stream; and decoding the encoded stream and rendering the decoded stream in accordance with the mapping manifest.

Description

BUILDING AND RENDERING IMMERSIVE VIRTUAL REALITY EXPERIENCES
Cross-Reference
This application claims priority to U.S. provisional patent application no. 62/516,710 filed June 8, 2017, the content of which is hereby incorporated by reference.
Technical Field
The subject matter disclosed herein relates generally to the field of virtual reality, and more particularly to binocular virtual reality systems. The disclosed subject matter further relates to receiving, decoding, rendering, compositing and/or displaying virtual reality content, as well as to methods and systems for manipulating and rendering media objects, such as video, image, and/or interactively coded media objects, to create composited three-dimensional (3D) immersive scenes.
Background
Virtual Reality (VR) experiences, such as 360-degree video experiences, are growing in popularity. They may be consumed via head-mounted displays (HMD), mobile devices such as tablets and smart phones, Internet browsers, connected televisions (TV), and/or set-top boxes. Accessibility of such experiences is also growing. For example, resolution and processing power are generally increasing, streaming methods are being optimized, and higher speed Internet connections are becoming more widely available to unlock rapid access to yet higher quality experiences.
One of the challenges in VR displays is hardware meeting the computing power requirements. In a typical interaction, a VR device may simultaneously display video content, still image content, and interactive content. This section does not constitute an admission of prior art.
Summary
An immersive virtual reality (VR) system and related method are provided herein. Exemplary embodiments of the present disclosure provide improvements to VR technology that may increase computational efficiency, as well as decrease the computation costs and power requirements involved in the presentation of VR experiences. Optionally, they may increase the flexibility and range of options for displaying VR content and for increasing interactivity within the VR experience.
An exemplary method is provided for building and rendering 360-degree interactive video experiences, the method comprising receiving a plurality of media objects, and at least one of positioning, orienting, and applying spatial surface geometry to media objects. The method may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to media objects is done in accordance with a mapping manifest.
An exemplary method for providing immersive video experiences includes building a mapping manifest, receiving a plurality of media objects, multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest, and providing at least one of the mapping manifest or the encoded stream for an immersive video experience. The method may be applied where the encoded stream comprises hypertext markup language (HTML) defining the mapping manifest.
The method may further include decoding the encoded stream and rendering the decoded stream in accordance with the mapping manifest. The method may be applied where the received plurality of media objects is of a first type, and the method may further include receiving a second plurality of media objects of a different second type, multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest, providing the second encoded stream to the immersive device, and decoding the second encoded stream and rendering the second decoded stream on the immersive device. The method may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content. The method may further include at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects. The method may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
The method may be applied where the immersive video experience has a full or 360-degree spherical field of view. The method may be applied where the immersive video experience has a partial, such as 180-degree dome, field of view.
An exemplary program storage device is provided, tangibly embodying instructions executable by a processor for building a mapping manifest, receiving a plurality of media objects, multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest, and providing at least one of the mapping manifest or the encoded stream for an immersive video experience.
The device may be applied where the received plurality of media objects is of a first type, and further include instructions for receiving a second plurality of media objects of a different second type, multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest, and providing the second encoded stream for the immersive video experience. The device may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
The device may further include instructions for at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects. The device may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to plurality of media objects is done in accordance with the mapping manifest. The device may be applied where the immersive video experience has a full or 360-degree spherical field of view. The device may be applied where the immersive video experience has a partial, such as a 180-degree dome, field of view. An exemplary program storage device is provided, tangibly embodying instructions executable by a processor for receiving at least one encoded stream, receiving a mapping manifest, decoding the at least one encoded stream with a single decoder or interpreter, and rendering the decoded stream as an immersive video experience in accordance with the mapping manifest.
The device may be applied where the received encoded stream is of a first type and may further include instructions for receiving a second encoded stream of a second type different from a type of the at least one encoded stream, decoding the second encoded stream with a second decoder or interpreter, and rendering the second decoded stream as part of the immersive video experience in accordance with the mapping manifest. The device may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content. The device may be applied where the immersive video experience has a full or 360-degree spherical field of view. The device may be applied where the immersive video experience has a partial, such as 180-degree dome, field of view.
Brief Description of the Drawings
The present disclosure may be better understood by way of the following detailed description when considered with reference to the appended drawings, in which:
Figure 1 is a block diagram illustrating the multiple media elements used to perform compositing of the 3D immersive scene in accordance with an exemplary embodiment of the present disclosure;
Figure 2 is a schematic diagram showing conceptually how the multiple media elements and their components are combined to create a 3D immersive scene in accordance with an exemplary embodiment of the present disclosure;
Figure 3 is a block diagram illustrating an exemplary mapping of media elements according to mapping manifests and the process for multiplexing many media objects of a family of media elements into one element in accordance with an exemplary embodiment of the present disclosure;
Figure 4 is a process diagram illustrating exemplary processes for multiplexing many media objects to enable one decoding activity to construct a scene in accordance with an exemplary embodiment of the present disclosure;
Figure 5 is a block diagram illustrating exemplary processes for a compositing engine to build the 3D environment scene in accordance with an exemplary embodiment of the present disclosure;
Figure 6 is a schematic diagram illustrating exemplary combinations of two video elements to construct a 360-degree (Spherical) and a 180-degree (Full-Dome) 3D scene both with a floating jumbotron-style video element in accordance with an exemplary embodiment of the present disclosure;
Figure 7 is a block diagram illustrating an interactive VR system having content supply and delivery sides in accordance with an exemplary embodiment of the present disclosure;
Figure 8 is a block diagram illustrating an interactive VR system configured to combine video sources using a predefined layout matching a projection in accordance with an exemplary embodiment of the present disclosure;
Figure 9 is a block diagram illustrating an interactive VR system with projection based on experience, scene and media in accordance with an exemplary embodiment of the present disclosure;
Figure 10 is a process diagram illustrating an interactive VR process for fetching and implementing experience in accordance with an exemplary embodiment of the present disclosure;
Figure 11 is a process diagram illustrating an interactive VR process for capturing web content from viewport to GPU in accordance with an exemplary embodiment of the present disclosure;
Figure 12 is a process diagram illustrating an interactive VR process for placing media object textures on a GPU in accordance with an exemplary embodiment of the present disclosure; and
Figure 13 is a schematic diagram illustrating an interactive VR process for generating an exemplary web page in accordance with an exemplary embodiment of the present disclosure.
Detailed Description
An interactive virtual reality (VR) system and related method are provided herein. An exemplary embodiment implements a binocular virtual reality system on a smartphone for an immersive experience. It shall be understood that an immersive experience does not necessarily have to allow a full 360 degrees of rotation or total immersion. A feature of the present disclosure is to reduce the computational requirements necessary for user hardware to adequately function in a virtual reality (VR) system. One of the challenges in VR displays is hardware meeting the computing power requirements. In a typical interaction, a VR device may simultaneously display video content, still image content, and interactive content. For example, if a head-mounted display (HMD) is used to play a three- dimensional (3D) movie, the 3D movie might be shown on a simulated theater screen a certain distance in front of the viewer. Foreground imagery, which may be still image content, may surround the display showing theater seats, curtains, or the like. There may also be background imagery filling the space behind the scene, such as for example when the viewer turns to look away from the virtual screen. It may be desirable for there to be multiple video feeds shown at the same time, for example, in a virtual display of multiple TV screens, or on a Jumbotron in a virtual arena. A VR display may also display interactive content such as Hypertext Mark-up Language (HTML) content obtained from the web.
When consuming a VR experience, which might include watching a 360- degree video, for example, such video might be live or pre-recorded video-on- demand (VOD). Either way, the user, even though they may feel immersed in the experience, may wish to increase engagement with enrichment layers of interactivity or additional information.
When content from multiple sources is displayed, multiple instances of decoders may be instantiated. For example, video data is typically provided in encoded form, such as Moving Picture Experts Group (MPEG) encoded format, which requires decoding by a codec prior to display. If more than one video source is presented to a viewer, multiple codec instances may be executed to decode the multiple video sources. The same goes for images, which are typically encoded in Joint Photographic Experts Group (JPEG) form, and web content where multiple browser instances may be used to decode the HTML. Such decoders are computationally costly. In fact, when using a smartphone-based VR display, for example, battery life is notoriously short and overheating can be a problem.
An exemplary embodiment of the present disclosure uses a single decoder for each type of data (i.e., a single video codec, a single image codec, and a single browser instance) to decode the data of each respective type. That is, a decoder may comprise a non-displayed browser component or interpreter, such as for interpreting hypertext markup language (HTML). Rather than decoding multiple videos using multiple codecs, those videos are merged into a single larger video which is then decoded at the user hardware using a single codec. For example, if two 1920x1080 videos are to be displayed, such as for left and right stereoscopic viewing, a 1920x2160 video is provided comprising the content of both videos such that a single codec is used to decode it. The same is done for images, and the same is done for interactive content, which in the present example is web data, and more specifically HTML data. This is provided in a single web page and decoded using a single browser. The computational cost of decoding a double-sized video image is far lower than the cost of decoding two single-sized video images. Hence there is a significant net savings in computational cost at the user hardware. Thus, for example, battery life for the user hardware may be greatly improved.
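As a sketch only, the stacked-frame idea can be illustrated with a browser video element decoded once and split into the two eye views by copying different source rectangles from the same frame; the file name and frame layout are assumptions for illustration:

```typescript
// Decode the stacked 1920x2160 video once, then split each frame into the two
// 1920x1080 eye views by copying different source rectangles out of the same frame.
const stacked = document.createElement("video");
stacked.src = "stacked-stereo.mp4";   // placeholder URL

const leftEye = document.createElement("canvas");
const rightEye = document.createElement("canvas");
leftEye.width = rightEye.width = 1920;
leftEye.height = rightEye.height = 1080;

function splitFrame(): void {
  const left = leftEye.getContext("2d");
  const right = rightEye.getContext("2d");
  if (left && right) {
    // drawImage(source, sx, sy, sWidth, sHeight, dx, dy, dWidth, dHeight)
    left.drawImage(stacked, 0, 0, 1920, 1080, 0, 0, 1920, 1080);      // top half of the frame
    right.drawImage(stacked, 0, 1080, 1920, 1080, 0, 0, 1920, 1080);  // bottom half of the frame
  }
  requestAnimationFrame(splitFrame);
}
requestAnimationFrame(splitFrame);
```

In a real renderer the split can instead be done purely with texture coordinates, which avoids the intermediate canvas copies altogether.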
The output of the decoders is fed to a compositing and rendering engine which receives the decoded data, cuts it into appropriate pieces (such as separating the two stacked video frames in the 1920x1080 example above), places them at appropriate locations in a 3D space, and renders the images for display on the smartphone screen. Although this is described in the context of a smartphone-based VR unit, it shall be understood that the presently disclosed technique has application in any type of VR unit.
An exemplary method and system are provided for building and rendering
360-degree interactive video experiences and the like. The description is intended to provide a skilled and knowledgeable reader with the ability to understand the present disclosure. To this end, examples are used to illustrate possible implementations. This is not intended to be limiting, and those of ordinary skill in the pertinent art may recognize possible variants and
modifications falling within the scope and spirit of the present disclosure.
As shown in Figure 1, an exemplary embodiment method, indicated generally by the reference numeral 100, is provided for compositing a three-dimensional (3D) immersive scene 110 from media objects or elements 120, such as video streams 124, image streams 122, cameras 126, screen captures 128, other media elements 125, interactive coded object streams 130, and the like. These elements build the 3D environment, including a background layer 112, a main video content layer 114, a foreground layer 116, and a combination of interactive layers 118 fixed in space 132, fixed to the camera 134, and fixed to the main video content layer 136.
Turning to Figure 2, a method indicated generally by the reference numeral 200 includes positioning media objects at different positions and orientations and applying them to specific spatial surface geometries, such as a sphere, cylinder, cube, dome, plane, and/or custom geometry. For example, a scene description 210 includes a background descriptor 212, a main content descriptor 214, a foreground descriptor 216, an interactive layers descriptor 218, a subtitles descriptor 219, and a HUD descriptor 211. Layers 220 corresponding to the scene description 210 are formed, including a background layer 222 corresponding to the background descriptor 212, a main layer 224 corresponding to the main content descriptor 214, a foreground layer 226 corresponding to the foreground descriptor 216, an interactive layer 228 corresponding to the interactive layers descriptor 218, a subtitle layer 229 corresponding to the subtitles descriptor 219, and a HUD layer 221 corresponding to the HUD descriptor 211. A composition 230 corresponding to the layers 220 and scene description 210 is formed, including a background feature 232 corresponding to the background layer 222 and background descriptor 212, a main content feature 234 corresponding to the main layer 224 and the main content descriptor 214, a foreground feature 236 corresponding to the foreground layer 226 and the foreground descriptor 216, an interactive feature 238 corresponding to the interactive layer 228 and the interactive layers descriptor 218, a subtitle feature 239 corresponding to the subtitle layer 229 and the subtitles descriptor 219, and a HUD feature 231 corresponding to the HUD layer 221 and the HUD descriptor 211.
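The descriptor-to-layer-to-composition correspondence above can be pictured with a few illustrative data types. The following TypeScript sketch is a hypothetical data model only; the field names and value choices are assumptions, not the patent's actual structures.

// Illustrative types only; descriptor names follow the figure, fields are assumed.
type SurfaceGeometry = "sphere" | "cylinder" | "cube" | "dome" | "plane" | "custom";
type Anchor = "fixedInSpace" | "fixedToCamera" | "fixedToMainContent";

interface LayerDescriptor {
  source: string;                      // media object reference
  geometry: SurfaceGeometry;
  position: [number, number, number];
  orientation: [number, number, number];
  anchor?: Anchor;                     // mainly relevant for interactive layers
}

interface SceneDescription {
  background: LayerDescriptor;
  mainContent: LayerDescriptor;
  foreground?: LayerDescriptor;
  interactiveLayers?: LayerDescriptor[];
  subtitles?: LayerDescriptor;
  hud?: LayerDescriptor;
}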
Turning now to Figure 3, the method may further provide for the scene configuration, that is, the complete set of configuration parameters for all media objects contained in the scene, to be stored in general mapping manifests, indicated generally by the reference numeral 300. Each mapping manifest is configurable and may be updated via real-time data streams to dynamically update the scene configuration while the user is immersed within the scene, for example to swap in a new background layer and/or to add a new layer of interactivity based on user behavior. For example, a video element mapping manifest 340 maps a first video stream 341, a second video stream 342, a third video stream 343, and/or the like into the scene.
A similar mapping manifest may be implemented for still image content, and the mapping manifests may together make up a general mapping manifest defining the display of general VR content, including video, interactive, and still image content, in a 3D VR scene. For example, an interactive object code mapping manifest 350 maps a first interactive object code stream 351, a second interactive object code stream 352, a third interactive object code stream 353, and the like into the scene.
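By way of non-limiting illustration, a general mapping manifest of the kind described above might be organized as follows. The TypeScript field names and the zone/rectangle representation are assumptions chosen to mirror the description, not a definitive format.

// Hypothetical shape of a general mapping manifest; all names are assumed.
interface ZoneMapping {
  streamId: string;                 // original stream this zone carries
  rect: { x: number; y: number; width: number; height: number };
  geometry: "sphere" | "cylinder" | "plane" | "dome" | "custom";
  position: [number, number, number];
  orientation: [number, number, number];
  visible: boolean;
}

interface ElementMappingManifest {
  kind: "video" | "image" | "interactive";
  combinedWidth: number;            // size of the single multiplexed frame
  combinedHeight: number;
  zones: ZoneMapping[];
}

// A general mapping manifest groups one manifest per media family, so the
// client needs only one decoder of each type.
interface GeneralMappingManifest {
  video?: ElementMappingManifest;
  image?: ElementMappingManifest;
  interactive?: ElementMappingManifest;
}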
The general mapping manifest may be provided to the VR device so that the VR device may decode and properly display the provided media according to the mappings of the mapping manifests. There may be several mapping manifests, each of which may be provided as a plug-in to the VR software, for example, and mapping manifests may be switched, modified, updated, and even received at runtime to allow the scene construction or display to change on the fly.
The presently disclosed technology can also be used to create custom solutions for content providers, since the software can be provided with a mapping manifest tailored to their content organization. Thus, a content provider may offer a VR application embodying the present technology which not only allows users to view their content in VR, but also offers far greater performance than existing VR applications. It shall be understood that the mapping manifests may be separate data structures or may be incorporated into an encoded stream or web content. For example, the encoded stream may include HTML defining a mapping manifest.
As shown in Figure 4, the method indicated generally by the reference numeral 400 may include a start block 441 , which passes control to a function block 442, which receives a plurality of video streams. The function block 442 passes control to a function block 444, which multiplexes the plurality of video streams to obtain one combined video stream with specific video zones dedicated to initial individual video streams. The function block 444 passes control to a function block 446, which updates the video element mapping manifest with specific properties for each video zone. The function block 446 passes control to a function block 448, which positions specific video zones in the 3D virtual environment defined by position, orientation, and surface geometry. The function block 448, in turn, passes control to an end block 449. The method may further include a start block 451 , which passes control to a function block 452, which receives a plurality of interactive object code streams. The function block 452 passes control to a function block 454, which multiplexes the plurality of interactive object code streams to obtain one combined interactive object code stream with specific interactive object code zones dedicated to initial individual interactive object code streams. The function block 454 passes control to a function block 456, which updates the interactive object code element mapping manifest with specific properties for each interactive object code zone. The function block 456 passes control to a function block 458, which positions specific interactive object code zones in the 3D virtual environment defined by position, orientation, and surface geometry. The function block 458, in turn, passes control to an end block 459.
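The multiplexing and manifest-update steps of Figure 4 can be sketched as follows, reusing the illustrative ZoneMapping and ElementMappingManifest types from the earlier sketch. The vertical-stacking layout and the default placement values are assumptions for illustration only.

// Sketch: stack incoming streams into zones of one combined stream and
// record each zone in the (illustrative) element mapping manifest.
interface SourceStream {
  id: string;
  width: number;
  height: number;
}

function buildCombinedLayout(sources: SourceStream[]): ElementMappingManifest {
  // Simple vertical stacking; a real layout could pack zones arbitrarily.
  let offsetY = 0;
  const zones: ZoneMapping[] = sources.map((s) => {
    const zone: ZoneMapping = {
      streamId: s.id,
      rect: { x: 0, y: offsetY, width: s.width, height: s.height },
      geometry: "plane",          // placement details come later, per zone
      position: [0, 0, -2],
      orientation: [0, 0, 0],
      visible: true,
    };
    offsetY += s.height;
    return zone;
  });

  return {
    kind: "video",
    combinedWidth: Math.max(...sources.map((s) => s.width)),
    combinedHeight: offsetY,
    zones,
  };
}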
Thus, the combination of multiple media objects from the same family, such as video elements, image elements, and interactive object code elements, via multiplexing, may be decoded as one media object. The multiplexed media object then contains all media objects from the same family, divided into multiple zones which may be represented by areas of pixels. The configuration of each zone is stored and updated in the media object mapping manifest. Each zone is independent in positioning, orientation, visibility, and spatial surface geometry, while all media objects from the same family are decoded at the same time. Combining all media objects of a family and performing the decoding once improves the use of processing resources.
Turning to Figure 5, a method of decoding, tiling, rendering and compositing is indicated generally by the reference numeral 500. The method 500 receives media elements 540 including media content 542 and media description 544. The media content is passed to a video decoder 546 or an image decoder 548, while the media description is passed to a projection plug-in 560. A tiling unit 562 includes texture 564 drawn from the video decoder and image decoder, and tiling information 566 drawn from the projection plug-in. A rendering unit 567 includes sub-texture 568 drawn from the tiling unit, and 3D geometry 569 drawn from the projection plug-in. The output of the rendering unit is passed to a composition engine 580. The method 500 further receives interactive layer elements 550 including logic 552 and media description 554. The logic is passed to a coded object interpreter or virtual machine 556, while the media description is passed to a projection plug-in 570. A tiling unit 572 includes texture 574 drawn from the coded object interpreter or virtual machine, and tiling information 576 drawn from the projection plug-in 570. A rendering unit 577 includes sub-texture 578 drawn from the tiling unit 572, and 3D geometry 579 drawn from the projection plug-in 570. The output of the rendering unit 577 is passed to a composition engine 590.
Thus, the method may include combining media content, such as video elements and image elements, or interactive object code elements, with media description configuration data; these combinations are labeled Media Element and Interactive Layer Element, respectively. The Media Element and Interactive Layer Element are then decoded with element-specific decoding processes, such as a video decoder for video elements, an image decoder for image elements, and a coded object interpreter or virtual machine for interactive layers. The media description configuration feeds a data stream to the projection plug-in, which then constructs the 3D surface geometry in the 3D scene environment. The element-specific texture is then divided into sub-textures using tiling information data from the projection plug-in. Each sub-texture is then applied to the specific 3D surface geometry previously created by the projection plug-in to render the layers. Finally, the composition engine composes the scene by superimposing all of the rendered layers.
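The tiling step can be pictured as converting each zone of the shared texture into normalized texture coordinates that the renderer binds to the corresponding 3D surface. The following TypeScript sketch assumes a simple pixel-rectangle-to-UV conversion and an illustrative 4096x4096 combined texture; both are assumptions, not values from the disclosure.

// Sketch: express a pixel zone of the shared texture as normalized UVs.
interface UvRect { u0: number; v0: number; u1: number; v1: number }

function zoneToUv(
  zone: { x: number; y: number; width: number; height: number },
  texWidth: number,
  texHeight: number
): UvRect {
  return {
    u0: zone.x / texWidth,
    v0: zone.y / texHeight,
    u1: (zone.x + zone.width) / texWidth,
    v1: (zone.y + zone.height) / texHeight,
  };
}

// Example: a jumbotron zone occupying a strip of a 4096x4096 combined texture.
const jumbotronUv = zoneToUv(
  { x: 0, y: 3584, width: 1920, height: 512 },
  4096,
  4096
);
// jumbotronUv is then bound to the floating-plane geometry at render time.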
Turning now to Figure 6, two exemplary immersive scenes are indicated generally by the reference numeral 600. A first immersive scene 610 is built from a main video stream 612 comprising equirectangular 360-degree video and a video stream 614 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 616 and rendered on a sphere 618. A second immersive scene 620 is built from a main video stream 622 comprising full-dome 180-degree video and a video stream 624 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 626 and rendered on a sphere 628.
Thus, an immersive scene built with a spherical video and a floating jumbotron screen would be multiplexed as one video element. The majority of the pixels would be allocated to the main video layer, being the spherical video in this example, and the jumbotron feed could be multiplexed into a small portion at the lower end of the combined video element. The pixels allocated to the sphere and to the jumbotron are recorded in the mapping manifest.
The rendering engine uses the mapping manifest to perform the scene construction, applying the specific zones of the main video to a spatial sphere and to a floating plane in order to build, respectively, the spherical 360-degree video environment and the floating jumbotron. Following from that same example, if, in addition to the floating jumbotron screen, a cylindrical scrolling Twitter feed and a live stats table need to be composited into the scene, the Twitter feed web element and the live stats web element would be multiplexed into one web page element. The pixels of dynamic HTML allocated to the Twitter feed zone would be applied to a spatial cylinder geometry where the scrolling Twitter feed would appear, and the pixels of dynamic HTML allocated to the live stats floating table would also appear at a specific location as designated in the mapping manifest. There is a link between the position of a zone in the frame buffer and its position in the 3D scene. The configuration and properties of the frame buffer zones, including video, interactive, and image elements, are described in a mapping manifest.
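A manifest instance for this arena example might look as follows, reusing the illustrative GeneralMappingManifest type sketched earlier. All sizes, positions, and stream identifiers are hypothetical placeholders.

// Illustrative manifest: one multiplexed video (spherical main feed plus a
// jumbotron strip) and one multiplexed web page (scrolling feed plus stats).
const arenaManifest: GeneralMappingManifest = {
  video: {
    kind: "video",
    combinedWidth: 3840,
    combinedHeight: 2160,
    zones: [
      { streamId: "main-360", rect: { x: 0, y: 0, width: 3840, height: 1920 },
        geometry: "sphere", position: [0, 0, 0], orientation: [0, 0, 0], visible: true },
      { streamId: "jumbotron", rect: { x: 0, y: 1920, width: 1280, height: 240 },
        geometry: "plane", position: [0, 2, -5], orientation: [0, 0, 0], visible: true },
    ],
  },
  interactive: {
    kind: "interactive",
    combinedWidth: 2048,
    combinedHeight: 1024,
    zones: [
      { streamId: "social-feed", rect: { x: 0, y: 0, width: 2048, height: 512 },
        geometry: "cylinder", position: [0, -1, 0], orientation: [0, 0, 0], visible: true },
      { streamId: "live-stats", rect: { x: 0, y: 512, width: 1024, height: 512 },
        geometry: "plane", position: [2, 0, -4], orientation: [0, -30, 0], visible: true },
    ],
  },
};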
As shown in Figure 7, an exemplary interactive VR system is indicated generally by the reference numeral 700. On a content supply side, the system 700 includes a content management system (CMS) 710, which includes a publisher interface 712 connected to experiences storage 714. Also on the supply side, the system 700 includes a content distribution network (CDN) 720, which includes an operator interface 716 connected to web content storage 718, an images composition unit connected to images storage 724, and a videos composition unit 726 connected to videos storage 728. A mapping manifest may be sent from the CMS to the experience manager.
On a content delivery side, the system 700 includes a client 730. The client 730 includes a user interface 732 connected to an experience manager 734, which also receives input from the experiences storage 714. The
experience manager, in turn, is connected to a web content composition unit 736, which also receives input from the web content storage 718 and provides output to a browser 738, which, in turn, provides output to a projection and mapping unit 740. An image codec 742 receives input from the images storage 724 and also provides output to the projection and mapping unit 740. A video codec 744 receives input from the videos storage 728 and further provides output to the projection and mapping unit 740. In addition, the projection and mapping unit 740 receives input directly from the experience manager 734 and provides output to a GPU 746.
Turning to Figure 8, a multiplexing unit is indicated generally by the reference numeral 800. The multiplexing unit is configured to combine video sources using predefined layouts matching a projection. The multiplexing unit 800 includes a hardware production portion 810, a CDN 820, a software production portion 830, and a server portion 840 connected to the software production portion. The CDN receives input from both the hardware production portion 810 and the server 840. The hardware production portion 810 includes a live VR camera feed 812 and a broadcast or television feed 814, both connected to a hardware
multiplexing unit 816. The hardware multiplexing unit 816 is connected to a videos storage 822 of the CDN 820.
The software production portion 830 includes a live VR camera feed 832 and a broadcast or television feed 834. The server 840 includes a software multiplexing unit 846, where the software multiplexing unit 846 receives input from both the live VR camera feed 832 and the broadcast or television feed 834 of the software production portion 830.
As shown in Figure 9, an exemplary interactive VR system is indicated generally by the reference numeral 900. Here, an experience manager 910 provides from one to n experience templates 920, which are fed into one to two scene templates 940 from a renderer 930. The combined templates are fed to a media unit 960, which inserts video media 970 and image media 980, and feeds the combined media to a projection and format unit 950.
Turning to Figure 10, an experience algorithm is indicated generally by the reference numeral 1000. The algorithm includes a function block 1010 that receives an experience identifier from a user interface such as 732 of Figure 7 and passes an HTTP or HTTPS request to a function block 1020. The function block 1020, in turn, fetches experience data such as 920 of Figure 9 from a server such as 840 of Figure 8 via an application programming interface (API), and passes control in parallel to three function blocks 1030, 1040 and 1050, respectively. The function block 1030 creates a web page including all web content in different frames, while the function block 1040 creates video decoders and the function block 1050 creates image decoders.
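The flow of Figure 10 can be sketched as follows; the API endpoint, response fields, and helper functions are hypothetical placeholders standing in for the CMS/CDN interfaces, which are not specified here.

// Sketch of the experience-loading flow under assumed names.
async function loadExperience(experienceId: string): Promise<void> {
  // Hypothetical API endpoint; the real server layout is not specified here.
  const response = await fetch(`https://api.example.com/experiences/${experienceId}`);
  const experience = await response.json();

  // Run the three set-up steps in parallel, as in blocks 1030/1040/1050.
  await Promise.all([
    createWebPage(experience.webContents),   // one page, one frame per web content
    createVideoDecoder(experience.videos),   // decoder for the multiplexed video stream
    createImageDecoder(experience.images),   // decoder for the multiplexed image
  ]);
}

// Placeholders standing in for the device-specific implementations.
async function createWebPage(webContents: unknown): Promise<void> {}
async function createVideoDecoder(videos: unknown): Promise<void> {}
async function createImageDecoder(images: unknown): Promise<void> {}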
Turning to Figure 11, an experience manager algorithm is indicated generally by the reference numeral 1100. Here, a function block 1110 receives a list of web contents and container sizes from an experience manager such as 734 of Figure 7 and passes control to a function block 1120. The function block 1120 creates a web page including frames for each web content and passes control to a function block 1130. The function block 1130, in turn, loads the locally created web page into a web view or web browser, and passes control to a function block 1140, which captures the web browser viewport, or a texture representation thereof, to the GPU each time the viewport is updated.
Turning now to Figure 12, a texture algorithm is indicated generally by the reference numeral 1200. Here, a function block 1210 receives a list of media assets in the experience from the experience manager 734 of Figure 7 and/or 910 of Figure 9, and passes control to a function block 1220. For each media element, the function block 1220 places texture on the GPU and stores UVs, vertices, and indices. Here, the UVs are for projecting a 2D image onto a 3D model's surface for texture mapping, where the letters U and V denote the axes of the 2D texture. This transfers texture by capturing from a video codec, for example, and mapping to the GPU.
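A minimal sketch of block 1220 follows, assuming each media zone is placed on a simple quad; it reuses the illustrative UvRect type from the tiling sketch above. The vertex layout and UV orientation are assumptions, since texture-coordinate conventions vary by graphics API.

// Sketch: build the quad geometry (vertices, indices) for one media zone,
// with UVs selecting that zone's pixels inside the shared texture.
interface Quad {
  vertices: Float32Array; // x, y, z per corner
  uvs: Float32Array;      // u, v per corner
  indices: Uint16Array;   // two triangles
}

function quadForZone(uv: UvRect, width: number, height: number, z: number): Quad {
  const hw = width / 2;
  const hh = height / 2;
  return {
    vertices: new Float32Array([
      -hw, -hh, z,   hw, -hh, z,   hw, hh, z,   -hw, hh, z,
    ]),
    uvs: new Float32Array([
      uv.u0, uv.v1,  uv.u1, uv.v1,  uv.u1, uv.v0,  uv.u0, uv.v0,
    ]),
    indices: new Uint16Array([0, 1, 2, 0, 2, 3]),
  };
}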
Table A includes pseudocode for an exemplary embodiment experience algorithm. Table A
"id": "P0OvmqXpNE9",
"customerld": "5Av0mVxwQ8X",
"title": "My Experience",
"thumbnailUrl": "http://www.site.com/myexperience_thumb.jpg", "defaultBackgroundColor": "#000000",
"defaultBackgroundMediald": null,
"widgetld": 6Bn8ISnAC9z,
"tags": "model, sample, patent",
"scenes": [{
"id": "5Av0mVxwQ8X",
"experienced": "P0OvmqXpNE9",
"title": "Main Scene",
"thumbnailUrl":
://www.site.com/mainscene_thumb.jpg",
"backgroundColor": "#00FF00",
"backgroundMediald": "3OPqmdyzk0b",
"contentMediald": "20Pmqczxl6c",
"startAction": null,
"endAction": null,
"live": false,
"contentMedia": {
"id": "7n9wZ0LmgA6",
"title": "My Video",
"description": "This is our default scene", "author": "SpherePlay",
"date": "2018-06-01 ",
"mediaUrl":
"http://www.site.com/video180.m3u8",
"subtitlesUrl":
"http://www.site.com/subtitles_fr.smi",
"thumbnailUrl":
"http://www.site.com/video180_thumb.jpg",
"type": "video",
"projection": {
"type": "fulldome",
"params": "180"
}
"format": "mono"
},
"backgroundMedia": {
"id": "3OPqmdyzk0b",
"title": "My image",
"description": "This is another scene", "author": "SpherePlay",
"date": "2017-09-27",
"mediaUrl":
"http://www.site.com/image360stereo.mp4",
"subtitlesUrl": null,
"thumbnailUrl":
"http://www.site.com/image360_thumb.jpg",
"type": "image",
"projection": {
"type": "sphere",
"params": null
},
"format": "mono"
}
}
]
}
As shown in Figure 13, an experience manager algorithm comparable to 1100 of Figure 11, here with exemplary output, is indicated generally by the reference numeral 1300. The algorithm itself comprises function blocks 1310, 1320, 1330 and 1340 comparable to function blocks 1110, 1120, 1130 and 1140 of Figure 11, so duplicate description may be omitted. Here, an exemplary web page created by the function block 1320 is indicated generally by the reference numeral 1350. This is an example of a web page that could be generated to display three different web contents, shown here as a menu 1352, an advertising banner 1354 and a scoreboard 1356. It shall be understood that the layout and contents may vary. Moreover, the advertising banner, for example, may be tuned to a particular device, user, location, or the like. This tunability may be accomplished by variable content within a manifest or by variably tuned manifests. The result will be loaded into a virtual web view and will not be displayed on a user's display in the layout form 1360, but rather in the visual form 1370. In operation, each part or web content will be sliced, sent to the GPU as needed, and placed in the virtual environment at the designated position.
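A page of the kind shown at 1350 might be generated as in the following sketch; the frame sizes, URLs, and identifiers are hypothetical placeholders, and the single generated page is what allows one browser instance to decode all of the interactive content.

// Sketch: one locally generated page holding every web content in its own
// fixed-size frame, later captured as a texture and sliced per zone.
interface WebContent { id: string; url: string; width: number; height: number }

function buildExperiencePage(contents: WebContent[]): string {
  const frames = contents
    .map(
      (c) =>
        `<iframe id="${c.id}" src="${c.url}" ` +
        `width="${c.width}" height="${c.height}" frameborder="0"></iframe>`
    )
    .join("\n");
  return `<!DOCTYPE html><html><body style="margin:0">${frames}</body></html>`;
}

// Example layout matching Figure 13: menu, advertising banner, scoreboard.
const page = buildExperiencePage([
  { id: "menu",       url: "https://example.com/menu",   width: 512,  height: 1024 },
  { id: "ad-banner",  url: "https://example.com/banner", width: 1024, height: 256 },
  { id: "scoreboard", url: "https://example.com/stats",  width: 1024, height: 512 },
]);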
The technology defined herein may be implemented by a system which comprises computer-readable storage containing instructions for causing a processing device or processing devices to execute the methods described herein. In one example, the system is implemented on a computing device, such as a smartphone.
The exemplary device may comprise a network interface. The exemplary device may comprise processing units in a processing entity or processor for performing general and specific processing. The processing entity is at least partially programmable and may include a general-purpose processor as can be found in a smartphone. The processing entity may also include function-specific processing units such as a GPU, which may comprise hardware codecs. The device may also comprise computer-readable storage, which may be distributed over several different physical memory entities such as chips, layers, or the like. The computer-readable storage may be accessible by the processing entity for storing/retrieving operational data. The computer-readable storage may also comprise program instructions readable and executable by the processing entity, instructing the processing entity to implement the methods described herein.
The network interface may be adapted for receiving virtual reality content such as video, still image, and interactive content such as web or HTML content. The device may comprise logic for decoding the received content, including, for example, a video decoder for decoding MPEG-4, HEVC, or the like, a still image decoder for decoding JPEG or the like, and an interactive content decoder such as a web browser. The device may also comprise buffers for storing the decoded content, including a frame buffer for storing decoded video frames, an image buffer for storing decoded still images, and an interactive content buffer for storing interactive content.
The computer-readable storage may also comprise one or more mapping manifests, including one or more general mapping manifests as described herein, which may be used by the system to determine how to present the received VR content in a VR scene. In some embodiments, the device may be adapted for receiving mapping manifests, such as over the network interface.
The device comprises computer-executable software code for causing the processing entity to implement one or more of the methods and techniques described herein, including for causing the device to map portions of received video, still image and/or interactive content, such as from respective buffers, onto a 3D VR scene, or more particularly onto surfaces or objects of a 3D VR scene as textures, and to composite, render and display the resulting scene. The creation of the 3D VR scene may also include generating 3D objects onto which to map the VR content. The creation of the 3D VR objects may be done in accordance with, or as instructed by, the one or more mapping manifests.
In certain embodiments, the technology may include receiving a single video stream having images comprising different zones corresponding to different video content associated with different 3D surfaces in a virtual scene, buffering this image data in corresponding buffer zones, and texturing the different 3D surfaces with the respective image data from the corresponding buffer zones. This may involve populating a frame buffer with different zones that contain different videos and generating a 3D scene with different surfaces corresponding to the different zones. This feature may provide the benefit of delivering the entire video content of a virtual reality experience in a single video stream, thereby reducing the resource drain of running multiple video decoders.
Moreover, the present technology may include receiving a single video stream having images comprising different portions corresponding to different 3D geometries in a virtual representation, applying a distortion to the different portions in accordance with their respective 3D geometries and applying them as textures to the respective surfaces of 3D objects having their respective 3D geometry in a virtual scene. For example, different zones may be represented under different geometries in the 3D scene - e.g. zone #1 is stretched on a sphere and zone #2 is a cylinder. This feature may provide the benefit of providing access to both curved and planar content using a single stream with the same advantages as above.
Moreover, the present technology may include receiving and applying, at a content-receiving client, a mapping manifest comprising mapping instructions for dividing image, video and web content into subdivisions and mapping the content of the subdivisions to respective 3D surfaces in a virtual scene. This too appears to be outside of the concern of the cited references. Usefully, this feature may allow a client device to be configured to use a single decoder for each of the three types of media used (video, image, interactive).
Moreover, the present technology may include populating a web cache with web content from a web page, assigning different portions of the web content to different respective 3D surfaces in a virtual scene, deriving a displayable output for each of the different portions, and displaying each displayable output on its respective 3D surface. The different portions may further be associated with different geometries, and their displayable output distorted accordingly. For example, a frame buffer may be populated with different zones that contain bilateral interactive content such as HTML content, where different bilateral interactive zones may be represented under different geometries in the 3D scene - zone #1 is stretched on a sphere and zone #2 is stretched on a cylinder. This feature may allow the use of a web address to obtain all the different interactive elements of a VR experience, and to apply these interactive elements where desired in the scene, for which only a single web browser instance is needed.
Moreover, the present technology may include rendering a 3D scene comprising different 3D objects or surfaces by creating separate compositions for each of the 3D objects or surfaces and superimposing the resulting compositions in decreasing order of distance from a virtual camera. For example, a 3D scene may be rendered by compositing a superposition of layers, i.e., superimposing different separately composited objects of the 3D scene. This may provide the advantage of reducing the computational burden of rendering a 3D scene.
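A minimal sketch of such back-to-front composition follows, assuming each layer has already been composited separately and knows its distance from the virtual camera; the RenderedLayer shape and the no-op draw calls are illustrative placeholders.

// Sketch: superimpose separately composited layers, farthest first.
interface RenderedLayer {
  name: string;
  distance: number;            // distance from the virtual camera
  draw(): void;                // issues the layer's draw calls
}

function composeScene(layers: RenderedLayer[]): void {
  [...layers]
    .sort((a, b) => b.distance - a.distance) // decreasing distance
    .forEach((layer) => layer.draw());
}

composeScene([
  { name: "foreground", distance: 1.5, draw: () => {} },
  { name: "background", distance: 50,  draw: () => {} },
  { name: "mainVideo",  distance: 10,  draw: () => {} },
]);
// Draw order: background, mainVideo, foreground.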
Although various embodiments have been illustrated, this was for the purpose of describing, but not limiting, the present invention. Various possible modifications and different configurations will become apparent to those of ordinary skill in the pertinent art and are within the scope and spirit of the present invention, which is defined more particularly by the attached claims.

Claims

1. A method for providing immersive video experiences, the method comprising:
building a mapping manifest;
receiving a plurality of media objects;
multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest; and
providing the mapping manifest and the encoded stream for an immersive video experience.
2. The method of Claim 1 wherein the encoded stream comprises hypertext markup language (HTML) defining the mapping manifest.
3. The method of Claim 1 or 2, further comprising:
decoding the encoded stream; and
rendering the decoded stream in accordance with the mapping manifest.
4. The method of Claim 3, further comprising at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects.
5. The method of Claim 3 or 4 wherein the received plurality of media objects is of a first type, the method further comprising:
receiving a second plurality of media objects of a different second type;
multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest;
providing the second encoded stream to the immersive device; and
decoding the second encoded stream and rendering the second decoded stream on the immersive device.
6. The method of Claim 5 wherein the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
7. The method of Claim 6 wherein the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
8. The method of any one of Claims 1 to 7, wherein the immersive video experience has a full spherical environment field of view.
9. The method of any one of Claims 1 to 7, wherein the immersive video experience has a partial environment field of view.
10. The method of Claim 9 wherein the partial environment field of view comprises a dome.
11. The method of any one of Claims 3 to 7 wherein rendering comprises:
generating image data from the decoded streams using a browser or interpreter; and
based on the mapping manifest, projecting portions of the generated image data upon a surface field of view using a GPU.
12. The method of any one of Claims 3 to 7, further comprising:
projecting media portions of the decoded streams upon a surface field of view using a GPU.
13. A program storage device tangibly embodying instructions executable by a processor for:
building a mapping manifest;
receiving a plurality of media objects;
multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest; and
providing the mapping manifest and the encoded stream for an immersive video experience.
14. The device of Claim 13 wherein the received plurality of media objects is of a first type, further comprising instructions for:
receiving a second plurality of media objects of a different second type;
multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest; and
providing the second encoded stream for the immersive video experience.
15. The device of Claim 14 wherein the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
16. The device of Claim 13, 14 or 15, further comprising instructions for at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects.
17. The device of any one of Claims 13 to 16 wherein the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
18. The device of any one of Claims 13 to 17 wherein the immersive video experience has a full spherical environment field of view.
19. The device of any one of Claims 13 to 17 wherein the immersive video experience has a partial environment field of view.
20. A program storage device tangibly embodying instructions executable by a processor for:
receiving at least one encoded stream;
receiving a mapping manifest;
decoding the at least one encoded stream with a single decoder or interpreter; and
rendering the decoded stream as an immersive video experience in accordance with the mapping manifest.
21. The device of Claim 20 wherein the received encoded stream is of a first type, further comprising instructions for:
receiving a second encoded stream of a second type different from a type of the at least one encoded stream;
decoding the second encoded stream with a second decoder or interpreter; and
rendering the second decoded stream as part of the immersive video experience in accordance with the mapping manifest.
22. The device of Claim 21 wherein the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
23. The device of Claim 20, 21 or 22 wherein the immersive video experience has a full spherical environment field of view.
24. The device of Claim 20, 21 or 22 wherein the immersive video experience has a partial environment field of view.
25. The device of Claim 20, 21 or 22 wherein rendering comprises:
generating image data from the decoded streams using a browser or interpreter; and
based on the mapping manifest, projecting portions of the generated image data upon a surface field of view using a GPU.
26. The device of Claim 20, 21 or 22, further comprising instructions for a GPU to project media portions of the decoded streams upon a surface field of view.
PCT/CA2018/050690 2017-06-08 2018-06-08 Building and rendering immersive virtual reality experiences WO2018223241A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762516710P 2017-06-08 2017-06-08
US62/516,710 2017-06-08

Publications (1)

Publication Number Publication Date
WO2018223241A1 true WO2018223241A1 (en) 2018-12-13

Family

ID=64565697

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2018/050690 WO2018223241A1 (en) 2017-06-08 2018-06-08 Building and rendering immersive virtual reality experiences

Country Status (1)

Country Link
WO (1) WO2018223241A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113784105A (en) * 2021-09-10 2021-12-10 上海曼恒数字技术股份有限公司 Information processing method and system for immersive VR terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6567086B1 (en) * 2000-07-25 2003-05-20 Enroute, Inc. Immersive video system using multiple video streams
US20070005795A1 (en) * 1999-10-22 2007-01-04 Activesky, Inc. Object oriented video system
US20120007752A1 (en) * 2007-08-13 2012-01-12 Samsung Electronics Co., Ltd. Method and apparatus for encoding/decoding metadata
US20120131432A1 (en) * 2010-11-24 2012-05-24 Edward Wayne Goddard Systems and methods for delta encoding, transmission and decoding of html forms
WO2016024892A1 (en) * 2014-08-13 2016-02-18 Telefonaktiebolaget L M Ericsson (Publ) Immersive video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18814029; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18814029; Country of ref document: EP; Kind code of ref document: A1)