WO2018223241A1 - Building and rendering immersive virtual reality experiences - Google Patents
Building and rendering immersive virtual reality experiences
- Publication number
- WO2018223241A1 (PCT/CA2018/050690)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mapping
- manifest
- encoded stream
- video
- media objects
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- G—PHYSICS
- G02—OPTICS
- G02B—OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
- G02B27/00—Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
- G02B27/01—Head-up displays
- G02B27/017—Head mounted
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/14—Digital output to display device ; Cooperation and interconnection of the display device with other functional units
- G06F3/147—Digital output to display device ; Cooperation and interconnection of the display device with other functional units using display panels
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G5/00—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
- G09G5/36—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
- G09G5/37—Details of the operation on graphic patterns
- G09G5/377—Details of the operation on graphic patterns for mixing or overlaying two or more graphic patterns
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/156—Mixing image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/194—Transmission of image signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
-
- G—PHYSICS
- G02—OPTICS
- G02B—OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
- G02B27/00—Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
- G02B27/01—Head-up displays
- G02B27/0179—Display position adjusting means not related to the information to be displayed
- G02B2027/0187—Display position adjusting means not related to the information to be displayed slaved to motion of at least a part of the body of the user, e.g. head, eye
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G2370/00—Aspects of data communication
- G09G2370/02—Networking aspects
- G09G2370/027—Arrangements and methods specific for the display of internet documents
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G2370/00—Aspects of data communication
- G09G2370/20—Details of the management of multiple sources of image data
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G3/00—Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes
- G09G3/001—Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes using specific devices not provided for in groups G09G3/02 - G09G3/36, e.g. using an intermediate record carrier such as a film slide; Projection systems; Display of non-alphanumerical information, solely or in combination with alphanumerical information, e.g. digital display on projected diapositive as background
- G09G3/003—Control arrangements or circuits, of interest only in connection with visual indicators other than cathode-ray tubes using specific devices not provided for in groups G09G3/02 - G09G3/36, e.g. using an intermediate record carrier such as a film slide; Projection systems; Display of non-alphanumerical information, solely or in combination with alphanumerical information, e.g. digital display on projected diapositive as background to produce spatial visual effects
Definitions
- the subject matter disclosed herein relates generally to the field of virtual reality, and more particularly to binocular virtual reality systems.
- the disclosed subject matter further relates to receiving, decoding, rendering, compositing and/or displaying virtual reality content, as well as to methods and systems for manipulating and rendering media objects, such as video, image, and/or interactively coded media objects, to create composited three-dimensional (3D) immersive scenes.
- VR experiences, such as 360-degree video experiences, can be viewed on head-mounted displays (HMDs), mobile devices such as tablets and smart phones, Internet browsers, connected televisions, and set-top boxes. Accessibility of such experiences is also growing.
- Display resolution and processing power are generally increasing, streaming methods are being optimized, and higher-speed Internet connections are becoming more widely available to unlock rapid access to yet higher quality experiences.
- a VR device may simultaneously display video content, still image content, and interactive content. This section does not constitute an admission of prior art.
Summary
- An immersive virtual reality (VR) system and related method are provided herein.
- Exemplary embodiments of the present disclosure provide improvements to VR technology that may increase computational efficiency, as well as decrease the computation costs and power requirements involved in providing VR experiences. Such improvements may increase the flexibility and range of options for displaying VR content and for increasing interactivity within the VR experience.
- An exemplary method is provided for building and rendering 360-degree interactive video experiences, the method comprising receiving a plurality of media objects, and at least one of positioning, orienting, and applying spatial surface geometry to the media objects.
- the method may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to media objects is done in accordance with a mapping manifest.
- An exemplary method for providing immersive video experiences includes building a mapping manifest, receiving a plurality of media objects, multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest, and providing at least one of the mapping manifest or the encoded stream for an immersive video experience.
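The provide-side flow above — build a mapping manifest, receive media objects, multiplex them per the manifest, provide manifest and stream — can be sketched as follows. All field and function names (`build_mapping_manifest`, `rect`, `geometry`) are illustrative assumptions, not terms defined by the patent.

```python
def build_mapping_manifest(zones):
    """Build a mapping manifest describing where each media object lands
    inside the combined (multiplexed) stream and in the 3D scene.
    All field names are illustrative, not taken from the patent."""
    return {
        "version": 1,
        "zones": [
            {
                "id": z["id"],
                "rect": z["rect"],  # [x, y, width, height] in the combined frame
                "position": z.get("position", [0.0, 0.0, 0.0]),
                "orientation": z.get("orientation", [0.0, 0.0, 0.0]),
                "geometry": z.get("geometry", "plane"),  # sphere, dome, plane, ...
            }
            for z in zones
        ],
    }

def multiplex(media_objects, manifest):
    """Tag each media object with its manifest zone, standing in for the
    real multiplexing of many objects into one encoded stream."""
    by_id = {z["id"]: z for z in manifest["zones"]}
    return [{"zone": by_id[m["id"]], "payload": m["payload"]} for m in media_objects]

manifest = build_mapping_manifest([
    {"id": "main", "rect": [0, 0, 1920, 1080], "geometry": "sphere"},
    {"id": "jumbotron", "rect": [0, 1080, 1920, 1080]},
])
stream = multiplex(
    [{"id": "main", "payload": b"..."},
     {"id": "jumbotron", "payload": b"..."}],
    manifest,
)
print(stream[0]["zone"]["geometry"])  # sphere
```

Both the manifest and the tagged stream would then be provided to the client, which needs only the manifest to recover each object from the single stream.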
- the method may be applied where the encoded stream comprises hypertext markup language (HTML) defining the mapping manifest.
- the method may further include decoding the encoded stream and rendering the decoded stream in accordance with the mapping manifest.
- the method may be applied where the received plurality of media objects is of a first type, and the method may further include receiving a second plurality of media objects of a different second type, multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest, providing the second encoded stream to the immersive device, and decoding the second encoded stream and rendering the second decoded stream on the immersive device.
- the method may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
- the method may further include at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects.
- the method may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
- the method may be applied where the immersive video experience has a full or 360-degree spherical field of view.
- the method may be applied where the immersive video experience has a partial, such as 180-degree dome, field of view.
- An exemplary program storage device tangibly embodying instructions executable by a processor for building a mapping manifest, receiving a plurality of media objects, multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest, and providing at least one of the mapping manifest or the encoded stream for an immersive video experience.
- the device may be applied where the received plurality of media objects is of a first type, and further include instructions for receiving a second plurality of media objects of a different second type, multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest, and providing the second encoded stream for the immersive video experience.
- the device may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
- the device may further include instructions for at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects.
- the device may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
- the device may be applied where the immersive video experience has a full or 360-degree spherical field of view.
- the device may be applied where the immersive video experience has a partial, such as a 180-degree dome, field of view.
- An exemplary program storage device tangibly embodying instructions executable by a processor for receiving at least one encoded stream, receiving a mapping manifest, decoding the at least one encoded stream with a single decoder or interpreter, and rendering the decoded stream as an immersive video experience in accordance with the mapping manifest.
- the device may be applied where the received encoded stream is of a first type and may further include instructions for receiving a second encoded stream of a second type different from a type of the at least one encoded stream, decoding the second encoded stream with a second decoder or interpreter, and rendering the second decoded stream as part of the immersive video experience in accordance with the mapping manifest.
- the device may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
- the device may be applied where the immersive video experience has a full or 360-degree spherical field of view.
- the device may be applied where the immersive video experience has a partial, such as 180-degree dome, field of view.
- Figure 1 is a block diagram illustrating the multiple media elements used to perform compositing of the 3D immersive scene in accordance with an exemplary embodiment of the present disclosure
- Figure 2 is a schematic diagram showing conceptually how the multiple media elements and their components are combined to create a 3D immersive scene in accordance with an exemplary embodiment of the present disclosure
- Figure 3 is a block diagram illustrating an exemplary mapping of media elements according to mapping manifests and the process for multiplexing many media objects of a family of media elements into one element in accordance with an exemplary embodiment of the present disclosure
- Figure 4 is a process diagram illustrating exemplary processes for multiplexing many media objects to enable one decoding activity to construct a scene in accordance with an exemplary embodiment of the present disclosure
- Figure 5 is a block diagram illustrating exemplary processes for a compositing engine to build the 3D environment scene in accordance with an exemplary embodiment of the present disclosure
- Figure 6 is a schematic diagram illustrating exemplary combinations of two video elements to construct a 360-degree (Spherical) and a 180-degree (Full-Dome) 3D scene, both with a floating jumbotron-style video element, in accordance with an exemplary embodiment of the present disclosure
- Figure 7 is a block diagram illustrating an interactive VR system having content supply and delivery sides in accordance with an exemplary embodiment of the present disclosure
- Figure 8 is a block diagram illustrating an interactive VR system
- Figure 9 is a block diagram illustrating an interactive VR system with projection based on experience, scene and media in accordance with an exemplary embodiment of the present disclosure
- Figure 10 is a process diagram illustrating an interactive VR process for fetching and implementing experience in accordance with an exemplary embodiment of the present disclosure
- Figure 11 is a process diagram illustrating an interactive VR process for capturing web content from viewport to GPU in accordance with an exemplary embodiment of the present disclosure
- Figure 12 is a process diagram illustrating an interactive VR process for placing media object textures on a GPU in accordance with an exemplary embodiment of the present disclosure.
- Figure 13 is a schematic diagram illustrating an interactive VR process for generating an exemplary web page in accordance with an exemplary embodiment of the present disclosure.
- An interactive virtual reality (VR) system and related method are provided herein.
- An exemplary embodiment implements a binocular virtual reality system on a smartphone for an immersive experience. It shall be understood that an immersive experience does not necessarily have to allow a full 360 degrees of rotation or total immersion.
- a feature of the present disclosure is to reduce the computational requirements necessary for user hardware to adequately function in a virtual reality (VR) system.
- One of the challenges in VR displays is hardware meeting the computing power requirements.
- a VR device may simultaneously display video content, still image content, and interactive content. For example, if a head-mounted display (HMD) is used to play a three- dimensional (3D) movie, the 3D movie might be shown on a simulated theater screen a certain distance in front of the viewer.
- Foreground imagery, which may be still image content, may surround the display, showing theater seats, curtains, or the like. There may also be background imagery filling the space behind the scene, visible for example when the viewer turns to look away from the virtual screen. It may be desirable for there to be multiple video feeds shown at the same time, for example, in a virtual display of multiple TV screens, or on a Jumbotron in a virtual arena.
- a VR display may also display interactive content such as Hypertext Mark-up Language (HTML) content obtained from the web.
- VR video-on-demand (VOD)
- When content from multiple sources is displayed, multiple instances of decoders may be instantiated.
- video data is typically provided in encoded form, such as Moving Picture Experts Group (MPEG) encoded format, which requires decoding by a codec prior to display. If more than one video source is presented to a viewer, multiple codec instances may be executed to decode the multiple video sources.
- JPEG (Joint Photographic Experts Group)
- Such decoders are computationally costly. In fact, when using a smartphone-based VR display for example, battery life is notoriously short and overheating can be a problem.
- An exemplary embodiment of the present disclosure uses a single decoder for each type of data (i.e., a single video codec, a single image codec, and a single browser instance) to decode the data of each respective type.
- a decoder may comprise a non-displayed browser component or interpreter, such as for interpreting hypertext markup language (HTML).
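The one-decoder-per-type idea can be sketched as a small registry; the factory callables below are illustrative stand-ins for a real video codec, image codec, or non-displayed browser instance.

```python
# A registry holding at most one decoder per media type.
decoders = {}

def get_decoder(media_type, factory):
    """Lazily create, then reuse, a single decoder per media type."""
    if media_type not in decoders:
        decoders[media_type] = factory()
    return decoders[media_type]

a = get_decoder("video", lambda: object())
b = get_decoder("video", lambda: object())  # reuses the existing instance
print(a is b, len(decoders))  # True 1
```

However many video objects a scene contains, only the single `"video"` entry is ever instantiated; the same holds for the image and browser entries.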
- for example, where two stacked 1920x1080 videos are to be displayed, a single 1920x2160 video is provided comprising the content of both videos such that a single codec is used to decode it.
- the same is done for images, and the same is done for interactive content, which in the present example is web data, and more specifically HTML data.
- This is provided in a single web page and decoded using a single browser.
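The stacking step can be sketched with nested lists standing in for decoded frames; concatenating two same-width frames vertically is analogous to combining two 1920x1080 videos into one 1920x2160 video (the toy 4x2 frames here are purely illustrative).

```python
def stack_frames(top, bottom):
    """Stack two frames (lists of pixel rows) vertically so one codec
    pass can later decode both; the frame widths must match."""
    assert len(top[0]) == len(bottom[0]), "frame widths must match"
    return top + bottom

# Two toy 4x2 "frames" standing in for two 1920x1080 videos.
video_a = [[("a", x, y) for x in range(4)] for y in range(2)]
video_b = [[("b", x, y) for x in range(4)] for y in range(2)]

combined = stack_frames(video_a, video_b)  # 4 wide, 4 tall (cf. 1920x2160)
print(len(combined[0]), len(combined))  # 4 4
```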
- the computational cost of decoding a double-sized video image is far lower than the cost of decoding two single-sized video images.
- battery life for the user hardware may be greatly improved.
- the output of the decoders is fed to a compositing and rendering engine which receives the decoded data and cuts it out into appropriate pieces, such as separating the two stacked video frames in the 1920x1080 example above, places them in an appropriate location in a 3D space, and renders the images for display on the smart phone screen.
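The cutting step can be sketched as plain slicing: the compositing engine reads each zone's pixel rectangle from the manifest and crops it out of the single decoded frame. The rectangle coordinates here are illustrative.

```python
def extract_zone(frame, rect):
    """Crop one zone (x, y, width, height in pixels) out of a decoded
    combined frame, ready to be placed at its position in the 3D space."""
    x, y, w, h = rect
    return [row[x:x + w] for row in frame[y:y + h]]

# A 4x4 combined frame: top half from stream A, bottom half from stream B.
frame = [["A"] * 4 for _ in range(2)] + [["B"] * 4 for _ in range(2)]
top = extract_zone(frame, (0, 0, 4, 2))
bottom = extract_zone(frame, (0, 2, 4, 2))
print(top[0][0], bottom[0][0])  # A B
```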
- An exemplary method and system are provided for building and rendering immersive virtual reality experiences.
- an exemplary method is provided for compositing a three-dimensional (3D) immersive scene 110 from media objects or elements 120, such as video streams 124, image streams 122, cameras 126, screen captures 128, other media elements 125, interactive coded object streams 130 and the like, to build the 3D environment world including a background layer 112, a main video content layer 114, a foreground layer 116, and a combination of interactive layers 118 fixed in space 132, fixed to camera 134, and fixed to main video content layer 136.
- a method indicated generally by the reference numeral 200 includes the positioning of media objects at different positions, orientations, and applied on specific spatial surface geometries such as on a sphere, cylinder, cube, dome, plane, and/or custom geometries.
- a scene description 210 includes a background descriptor 212, main content descriptor 214, foreground descriptor 216, interactive layers descriptor 218, subtitles descriptor 219, and a HUD descriptor 211.
- Layers 220 corresponding to the scene description 210 are formed, including a background layer 222 corresponding to the background descriptor 212, a main layer 224 corresponding to the main content descriptor 214, a foreground layer 226 corresponding to the foreground descriptor 216, an interactive layer 228 corresponding to the interactive layers descriptor 218, a subtitle layer 229 corresponding to the subtitles descriptor 219, and a HUD layer 221 corresponding to the HUD descriptor 211.
- a composition 230 corresponding to the layers 220 and scene description 210 is formed, including a background feature 232 corresponding to the background layer 222 and background descriptor 212, a main content feature 234 corresponding to the main layer 224 and the main content descriptor 214, a foreground feature 236 corresponding to the foreground layer 226 and the foreground descriptor 216, an interactive feature 238 corresponding to the interactive layer 228 and the interactive layers descriptor 218, a subtitle feature 239 corresponding to the subtitle layer 229 and the subtitles descriptor 219, and a HUD feature 231 corresponding to the HUD layer 221 and the HUD descriptor 211.
- the method may further provide for the scene configuration, that is, the complete set of configuration parameters for all media objects contained in the scene, to be stored in the general mapping manifests indicated generally by the reference numeral 300.
- Each mapping manifest is configurable and updated via real-time data streams to update dynamically the scene configuration while the user is immersed within the scene, such as to update a new background layer, and/or update a new layer of interactivity based on user behavior.
- a video element mapping manifest 340 maps a first video stream 341, a second video stream 342, a third video stream 343, and/or the like into the scene.
- mapping manifests may be implemented for still image content, where the mapping manifests may together make up a general mapping manifest defining the display of general VR content including video, interactive and still image content in a 3D VR scene.
- an interactive object code mapping manifest 350 maps a first interactive object code stream 351, a second interactive object code stream 352, a third interactive object code stream 353, and the like into the scene.
- the general mapping manifest may be provided to the VR device such that the VR device may decode and properly display provided media according to the mappings of the mapping manifests.
- mapping manifests may be separate data structures or may be incorporated into an encoded stream or web content.
- the encoded stream may include HTML defining a mapping manifest.
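One way HTML could carry a mapping manifest is as a JSON payload embedded in the page. This sketch — including the script-tag layout, the `mapping-manifest` id, and the field names — is an assumption for illustration; it extracts the manifest with Python's standard-library HTML parser.

```python
import json
from html.parser import HTMLParser

# Hypothetical page layout: the manifest rides inside the web content itself.
PAGE = """
<html><body>
<script type="application/json" id="mapping-manifest">
{"zones": [{"id": "main", "rect": [0, 0, 1920, 1080], "geometry": "sphere"}]}
</script>
</body></html>
"""

class ManifestExtractor(HTMLParser):
    """Collect the text inside the tagged <script> element."""
    def __init__(self):
        super().__init__()
        self.in_manifest = False
        self.raw = ""
    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("id", "mapping-manifest") in attrs:
            self.in_manifest = True
    def handle_endtag(self, tag):
        if tag == "script":
            self.in_manifest = False
    def handle_data(self, data):
        if self.in_manifest:
            self.raw += data

parser = ManifestExtractor()
parser.feed(PAGE)
manifest = json.loads(parser.raw)
print(manifest["zones"][0]["geometry"])  # sphere
```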
- the method indicated generally by the reference numeral 400 may include a start block 441, which passes control to a function block 442, which receives a plurality of video streams.
- the function block 442 passes control to a function block 444, which multiplexes the plurality of video streams to obtain one combined video stream with specific video zones dedicated to initial individual video streams.
- the function block 444 passes control to a function block 446, which updates the video element mapping manifest with specific properties for each video zone.
- the function block 446 passes control to a function block 448, which positions specific video zones in the 3D virtual environment defined by position, orientation, and surface geometry.
- the function block 448 passes control to an end block 449.
- the method may further include a start block 451, which passes control to a function block 452, which receives a plurality of interactive object code streams.
- the function block 452 passes control to a function block 454, which multiplexes the plurality of interactive object code streams to obtain one combined interactive object code stream with specific interactive object code zones dedicated to initial individual interactive object code streams.
- the function block 454 passes control to a function block 456, which updates the interactive object code element mapping manifest with specific properties for each interactive object code zone.
- the function block 456 passes control to a function block 458, which positions specific interactive object code zones in the 3D virtual environment defined by position, orientation, and surface geometry.
- the combination of multiple media objects from the same families, such as video elements, image elements, and interactive object code elements, via multiplexing may be decoded as one media object.
- the multiplexed media object then contains all media objects from a same family, divided into multiple zones which may be represented by areas of pixels. The configuration of each zone is stored and updated in the media object mapping manifest. Each zone is independent in positioning, orientation, visibility, and spatial surface geometry, even though all media objects from the same family are decoded at the same time. Combining all media objects of a family into one object and performing the decoding activity once offers an improvement in the use of processing resources.
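A per-zone configuration record might look like the following sketch. The field names are illustrative, but the shape matches the text: each zone is positioned, oriented, shown or hidden, and given surface geometry independently, while the pixel data is decoded once for the whole family.

```python
from dataclasses import dataclass

@dataclass
class ZoneConfig:
    """Illustrative per-zone entry of a media object mapping manifest."""
    rect: tuple                           # (x, y, w, h) pixel area in the combined object
    position: tuple = (0.0, 0.0, 0.0)
    orientation: tuple = (0.0, 0.0, 0.0)
    visible: bool = True
    geometry: str = "plane"               # sphere, cylinder, cube, dome, plane, ...

manifest = {
    "sphere_video": ZoneConfig(rect=(0, 0, 1920, 1080), geometry="sphere"),
    "jumbotron": ZoneConfig(rect=(0, 1080, 1920, 135),
                            position=(0.0, -1.0, -3.0)),
}

# A runtime update touches one zone's configuration without re-decoding:
manifest["jumbotron"].visible = False
print(manifest["sphere_video"].visible, manifest["jumbotron"].visible)  # True False
```

This also mirrors the earlier point that manifests may be updated via real-time data streams while the user is immersed in the scene.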
- a method of decoding, tiling, rendering and compositing is indicated generally by the reference numeral 500.
- the method 500 receives media elements 540 including media content 542 and media description 544.
- the media content is passed to a video decoder 546 or an image decoder 548, while the media description is passed to a projection plug-in 560.
- a tiling unit 562 includes texture 564 drawn from the video decoder and image decoder, and tiling information 566 drawn from the projection plug-in.
- a rendering unit 567 includes sub-texture 568 drawn from the tiling unit, and 3D geometry 569 drawn from the projection plug-in.
- the output of the rendering unit is passed to a composition engine 580.
- the method 500 further receives interactive layer elements 550 including logic 552 and media description 554.
- the logic is passed to a coded object interpreter or virtual machine 556, while the media description is passed to a projection plug-in 570.
- a tiling unit 572 includes texture 574 drawn from the coded object interpreter or virtual machine, and tiling information 576 drawn from the projection plug-in 570.
- a rendering unit 577 includes sub-texture 578 drawn from the tiling unit 572, and 3D geometry 579 drawn from the projection plug-in 570. The output of the rendering unit 577 is passed to a composition engine 590.
- the method may include the combination of media content such as video elements and image elements, or interactive object code elements, respectively, with media description configuration data; labeled together as Media Element and Interactive Layer Element, respectively.
- the Media Element and Interactive Layer Element are then decoded with Element specific decoding processes, such as a video decoder for video elements, an image decoder for image elements, and a coded object interpreter or virtual machine for interactive layers.
- Element specific decoding processes such as a video decoder for video elements, an image decoder for image elements, and a coded object interpreter or virtual machine for interactive layers.
- Media description configuration feeds a data stream to the projection plugin that will then construct the 3D surface geometry in the 3D scene environment.
- the Element specific texture is then divided into sub-textures using tiling information data from the Projection Plugin.
- the sub-texture is then applied to the specific 3D surface geometry previously created by the projection plugin to render layers.
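The tiling step amounts to converting each zone's pixel rectangle in the combined texture into normalized UV coordinates for its sub-texture. A minimal sketch, assuming v runs top-down and an illustrative 1920x2160 atlas:

```python
def zone_to_uv(rect, atlas_w, atlas_h):
    """Map a pixel rectangle (x, y, w, h) inside the combined texture to
    normalized (u0, v0, u1, v1) coordinates for sampling the sub-texture."""
    x, y, w, h = rect
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)

# A jumbotron strip occupying the lower 135 rows of a 1920x2160 atlas:
u0, v0, u1, v1 = zone_to_uv((0, 2025, 1920, 135), 1920, 2160)
print(u0, v0, u1, v1)  # 0.0 0.9375 1.0 1.0
```

A real rendering engine may flip the v axis depending on its texture conventions.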
- the Composition Engine composes the scene by combining all of the rendered layers.
- a first immersive scene 610 is built from a main video stream 612 comprising equirectangular 360-degree video and a video stream 614 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 616 and rendered on a sphere 618.
- a second immersive scene 620 is built from a main video stream 622 comprising full-dome 180-degree video and a video stream 624 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 626 and rendered on a sphere 628.
- an immersive scene built with a spherical video and a floating jumbotron screen would be multiplexed as one video element.
- the majority of the pixels would be allocated to the main video layer, being the spherical video in this example, and a small portion of the jumbotron feed could be multiplexed in the lower end of the combined video element.
- Allocated pixels of video for the sphere and for the jumbotron are stored in the mapping manifest.
- the rendering engine uses the mapping manifest to perform the scene construction applying the specific zones of the main video to a spatial sphere and to a floating plane in order to build, respectively, the spherical 360-degree video environment with the floating jumbotron.
- the Twitter feed web element and the live stats web element would be multiplexed into one web page element.
- Allocated pixels of dynamic HTML for the Twitter feed zone would be applied on a spatial cylinder geometry where the scrolling Twitter feed would appear, and the allocated pixels of dynamic HTML for the live stats floating table would also appear at a specific location as designated in the mapping manifest.
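Multiplexing two web elements into one page might look like the following sketch; the element ids, zone height, and page layout are assumptions for illustration. The key point is that one browser instance renders both zones, which are later cropped apart by pixel area.

```python
def build_combined_page(zone_ids, zone_height):
    """Lay out several interactive zones in a single page so that one
    browser instance renders them all; each zone's pixel area is then
    recorded in the mapping manifest for cropping."""
    divs = "\n".join(
        f'<div id="{zid}" style="height:{zone_height}px">{zid}</div>'
        for zid in zone_ids
    )
    return f"<html><body>{divs}</body></html>"

page = build_combined_page(["twitter-feed", "live-stats"], 540)
print(page.count("<div"))  # 2
```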
- an exemplary interactive VR system is indicated generally by the reference numeral 700.
- the system 700 includes a content management system (CMS) 710, which includes a publisher interface 712 connected to experiences storage 714.
- CDN (content distribution network)
- the system 700 includes an operator interface 716 connected to web content storage 718, an images composition unit connected to images storage 724, and a videos composition unit 726 connected to videos storage 728.
- a mapping manifest may be sent from the CMS to the experience manager.
- the system 700 includes a client 730.
- the client 730 includes a user interface 732 connected to an experience manager 734, which also receives input from the experiences storage 714.
- the experience manager, in turn, is connected to a web content composition unit 736, which also receives input from the web content storage 718 and provides output to a browser 738, which, in turn, provides output to a projection and mapping unit 740.
- An image codec 742 receives input from the images storage 724 and also provides output to the projection and mapping unit 740.
- a video codec 744 receives input from the videos storage 728 and further provides output to the projection and mapping unit 740.
- the projection and mapping unit 740 receives input directly from the experience manager 734 and provides output to a GPU 746.
- a multiplexing unit is indicated generally by the reference numeral 800.
- the multiplexing unit is configured to combine video sources using predefined layouts matching a projection.
- the multiplexing unit 800 includes a hardware production portion 810, a CDN 820, a software production portion 830, and a server portion 840 connected to the software production portion.
- the CDN receives input from both the hardware production portion 810 and the server 840.
- the hardware production portion 810 includes a live VR camera feed 812 and a broadcast or television feed 814, both connected to a hardware multiplexing unit 816.
- the hardware multiplexing unit 816 is connected to a videos storage 822 of the CDN 820.
- the software production portion 830 includes a live VR camera feed 832 and a broadcast or television feed 834.
- the server 840 includes a software multiplexing unit 846, where the software multiplexing unit 846 receives input from both the live VR camera feed 832 and the broadcast or television feed 834 of the software production portion 830.
- an exemplary interactive VR system is indicated generally by the reference numeral 900.
- an experience manager 910 provides from one to n experience templates 920, into which are fed from one to two scene templates 940 from a renderer 930. The combined templates are fed to a media unit 960, which inserts video media 970 and image media 980, and feeds the combined media to a projection and format unit 950.
- an experience algorithm is indicated generally by the reference numeral 1000.
- the algorithm includes a function block 1010 that receives an experience identifier from a user interface such as 732 of Figure 7 and passes an HTTP or HTTPS request to a function block 1020.
- the function block 1020 fetches experience data such as 920 of Figure 9 from a server such as 840 of Figure 8 via an application programming interface (API), and passes parallel control to three function blocks 1030, 1040 and 1050, respectively.
- the function block 1030 creates a web page including all web content in different frames, while the function block 1040 creates video decoders, and while the function block 1050 creates image decoders.
- a function block 1110 receives a list of web contents and container sizes from an experience manager such as 734 of Figure 7 and passes control to a function block 1120.
- the function block 1120 creates a web page including frames for each web content and passes control to a function block 1130.
- the function block 1130 loads the locally created web page into a web view or web browser, and passes control to a function block 1140, which captures the web browser viewport or texture representation to the GPU each time the viewport is updated.
- a texture algorithm is indicated generally by the reference numeral 1200.
- a function block 1210 receives a list of media assets in the experience from the experience manager 734 of Figure 7 and/or 910 of Figure 9, and passes control to a function block 1220.
- the function block 1220 places texture on the GPU and stores UVs, vertexes, and indices.
- the UVs are for projecting a 2D image onto a 3D model's surface for texture mapping, where the letters U and V denote the axes of the 2D texture. The texture is transferred by capturing from a video codec, for example, and mapping it to the GPU.
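The UV derivation for a zone of a multiplexed texture can be sketched as simple normalization of the zone's pixel rectangle into the [0, 1] texture coordinate space. This is a hypothetical helper for illustration, not code from the disclosure:

```python
def zone_uvs(frame_w, frame_h, rect):
    """Compute the four corner UVs of a pixel rectangle within a
    multiplexed frame, normalized to the [0, 1] texture coordinate space.
    rect is (x, y, width, height) in pixels."""
    x, y, w, h = rect
    u0, v0 = x / frame_w, y / frame_h
    u1, v1 = (x + w) / frame_w, (y + h) / frame_h
    # Corner UVs in winding order; v follows the image's row direction.
    return [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
```

A surface assigned the lower half of a 1920x2160 frame, for example, would receive UVs spanning v = 0.5 to v = 1.0 rather than the full texture.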
- Table A includes pseudocode for an exemplary embodiment experience algorithm. Table A
- an experience manager algorithm comparable to 1100 of Figure 11 is indicated generally by the reference numeral 1300.
- the algorithm comprises function blocks 1310, 1320, 1330 and 1340 comparable to function blocks 1110, 1120, 1130 and 1140 of Figure 11, respectively, so duplicate description is omitted.
- an exemplary web page created by the function block 1320 is indicated generally by the reference numeral 1350. This is an example of a web page that could be generated to display three different web contents, shown here as a menu 1352, an advertising banner 1354 and a scoreboard 1356. It shall be understood that the layout and contents may vary. Moreover, the advertising banner, for example, may be tuned to a particular device, user, location, or the like.
- This tunability may be accomplished by variable content within a manifest or by variably tuned manifests.
- the result will be loaded into a virtual web view and will not be displayed on a user's display in the layout form 1360, but rather in the visual form 1370.
- each part or web content will be sliced, sent to the GPU as needed, and placed in the virtual environment at the designated position.
- the technology defined herein may be implemented by a system which comprises computer-readable storage containing instructions for instructing a processing device or processing devices to execute the methods described herein.
- the system is implemented on a computing device, such as a smart phone.
- the exemplary device may comprise a network interface.
- the exemplary device may comprise processing units in a processing entity or processor for performing general and specific processing.
- the processing entity is at least partially programmable and may include a general-purpose processor as can be found in a smartphone.
- the processing entity may also include function-specific processing units such as a GPU which may comprise hardware codecs.
- the device may also comprise computer-readable storage, which may be distributed over several different physical memory entities such as chips, layers, or the like.
- the computer-readable storage may be accessible by the processing entity for storing/retrieving operational data.
- the computer-readable memory may also comprise program instructions readable and executable by the processing entity instructing the processing entity to implement the methods described herein.
- the network interface may be adapted for receiving virtual reality content such as video, still image, and interactive content such as web or HTML content.
- the device may comprise logic for decoding the received content, including, for example, a video decoder for decoding MPEG-4, HEVC, or the like, a still image decoder for decoding JPEG or the like, and an interactive content decoder such as a web browser.
- the device may also comprise buffers for storing the decoded content including a frame buffer for storing decoded video frames, an image buffer for storing decoded still images and an interactive content buffer for storing interactive content.
- the computer-readable storage may also comprise one or more mapping manifests, including one or more general mapping manifests as described herein which may be used by the system to determine how to provide the received VR content in a VR scene.
- the device may be adapted for receiving mapping manifests such as over the network interface.
- the device comprises computer-executable software code for causing the processing entity to implement one or more of the methods and techniques described herein, including for causing the device to map portions of received video, still image and/or interactive content, such as from respective buffers, onto a 3D VR scene, or more particularly onto surfaces or objects of a 3D VR scene such as texture, and to composite, render and display the resulting scene.
- the creation of the 3D VR scene may also include generating 3D objects onto which to map the VR content.
- the creation of the 3D VR objects may be done in accordance with or as instructed by the one or more mapping manifest(s).
- the technology may include receiving a single video stream having images comprising different zones corresponding to different video content associated with different 3D surfaces in a virtual scene, buffering this image data in corresponding buffer zones and texturing the different 3D surfaces with the respective image data from the corresponding buffer zones. This may involve populating a frame buffer with different zones that contain different videos and generating a 3D scene with different surfaces corresponding to different zones. This feature may provide the benefit of providing the entire video content of a virtual reality experience in a single video stream, thereby reducing the resource drain of running multiple video decoders.
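The buffering step can be sketched as slicing the single decoded frame into per-zone buffers, one per 3D surface. The sketch below uses nested lists of "pixels" in place of a real frame buffer; the names and toy shapes are illustrative assumptions:

```python
def slice_frame(frame, zones):
    """Split a decoded frame (a list of pixel rows) into per-zone buffers.
    Each zone is (name, x, y, width, height); returns {name: rows}."""
    buffers = {}
    for name, x, y, w, h in zones:
        buffers[name] = [row[x:x + w] for row in frame[y:y + h]]
    return buffers

# A toy 4x4 "frame" whose top half holds video A and bottom half video B.
frame = [["A"] * 4, ["A"] * 4, ["B"] * 4, ["B"] * 4]
zones = [("background", 0, 0, 4, 2), ("jumbotron", 0, 2, 4, 2)]
buffers = slice_frame(frame, zones)
```

Each resulting buffer could then be uploaded as the texture of its corresponding 3D surface, so that a single decode pass feeds every surface in the scene.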
- the present technology may include receiving a single video stream having images comprising different portions corresponding to different 3D geometries in a virtual representation, applying a distortion to the different portions in accordance with their respective 3D geometries and applying them as textures to the respective surfaces of 3D objects having their respective 3D geometry in a virtual scene.
- different zones may be represented under different geometries in the 3D scene - e.g. zone #1 is stretched on a sphere and zone #2 is a cylinder. This feature may provide the benefit of providing access to both curved and planar content using a single stream with the same advantages as above.
- the present technology may include receiving and applying at a content-receiving client a mapping manifest comprising mapping instructions for dividing image, video and web content into subdivisions and mapping the content of the subdivisions to respective 3D surfaces in a virtual scene.
- the present technology may include populating a web cache with web content from a web page, assigning different portions of the web content to different respective 3D surfaces in a virtual scene, deriving a displayable output for each of the different portions, and displaying each of the displayable outputs on its respective 3D surface.
- the different portions may further be associated with different geometries, and their displayable output distorted accordingly.
- the present technology may include rendering a 3D scene comprising different 3D objects or surfaces by creating separate compositions for each of the 3D objects or surfaces and superimposing the resulting compositions in decreasing order of distance from a virtual camera.
- the result is a 3D scene composited as a superposition of layers.
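The superposition in decreasing order of distance from the virtual camera is essentially a painter's-algorithm ordering, which might be sketched as follows (the function and surface names are hypothetical):

```python
def composite_order(surfaces):
    """Order 3D surfaces back-to-front (painter's algorithm): farther
    surfaces are composited first so nearer ones are superimposed on top.
    Each surface is (name, distance_from_camera)."""
    return [name for name, dist in sorted(surfaces, key=lambda s: -s[1])]

# The distant background is drawn first; the near HUD layer last, on top.
order = composite_order([("HUD", 0.5), ("background", 100.0), ("main", 10.0)])
```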
Abstract
A virtual reality (VR) technology is provided that reduces the computational overhead required to present VR content and improves efficiency and battery life, including building and rendering immersive video experiences by building a mapping manifest; receiving a plurality of media objects; multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest; providing at least one of the mapping manifest or the encoded stream; and decoding the encoded stream and rendering the decoded stream in accordance with the mapping manifest.
Description
BUILDING AND RENDERING IMMERSIVE VIRTUAL REALITY EXPERIENCES
Cross-Reference
This application claims priority to U.S. provisional patent application no. 62/516,710 filed June 8, 2017, the content of which is hereby incorporated by reference.
Technical Field
The subject matter disclosed herein relates generally to the field of virtual reality, and more particularly to binocular virtual reality systems. The disclosed subject matter further relates to receiving, decoding, rendering, compositing and/or displaying virtual reality content, as well as to methods and systems for manipulating and rendering media objects, such as video, image, and/or interactively coded media objects, to create composited three-dimensional (3D) immersive scenes.
Background
Virtual Reality (VR) experiences, such as 360-degree video experiences, are growing in popularity. They may be consumed via head-mounted displays (HMD), mobile devices such as tablets and smart phones, Internet browsers, connected televisions (TV), and/or set-top boxes. Accessibility of such experiences is also growing. For example, resolution and processing power is generally increasing, streaming methods are being optimized, and higher speed Internet connections are becoming more widely available to unlock rapid access to yet higher quality experiences.
One of the challenges in VR displays is hardware meeting the computing power requirements. In a typical interaction, a VR device may simultaneously display video content, still image content, and interactive content. This section does not constitute an admission of prior art.
Summary
An immersive virtual reality (VR) system and related method are provided herein. Exemplary embodiments of the present disclosure provide improvements to VR technology that may increase computational efficiency, as well as decrease computation costs and power requirements involved, in the
presentation of VR experiences. Optionally, they may increase the flexibility and range of options for displaying VR content and for increasing interactivity within the VR experience.
An exemplary method is provided for building and rendering 360-degree interactive video experiences, the method comprising receiving a plurality of media objects, and at least one of positioning, orienting, and applying spatial surface geometry to media objects. The method may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to media objects is done in accordance with a mapping manifest.
An exemplary method for providing immersive video experiences includes building a mapping manifest, receiving a plurality of media objects, multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest, and providing at least one of the mapping manifest or the encoded stream for an immersive video experience. The method may be applied where the encoded stream comprises hypertext markup language (HTML) defining the mapping manifest.
The method may further include decoding the encoded stream and rendering the decoded stream in accordance with the mapping manifest. The method may be applied where the received plurality of media objects is of a first type, and the method may further include receiving a second plurality of media objects of a different second type, multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest, providing the second encoded stream to the immersive device, and decoding the second encoded stream and rendering the second decoded stream on the immersive device. The method may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
The method may further include at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects. The method may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
The method may be applied where the immersive video experience has a full or 360-degree spherical field of view. The method may be applied where the immersive video experience has a partial, such as 180-degree dome, field of view.
An exemplary program storage device is provided, tangibly embodying instructions executable by a processor for building a mapping manifest, receiving a plurality of media objects, multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest, and providing at least one of the mapping manifest or the encoded stream for an immersive video experience.
The device may be applied where the received plurality of media objects is of a first type, and further include instructions for receiving a second plurality of media objects of a different second type, multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest, and providing the second encoded stream for the immersive video experience. The device may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
The device may further include instructions for at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects. The device may be applied where the at least one of positioning, orienting, and applying spatial surface geometry to plurality of media objects is done in accordance with the mapping manifest. The device may be applied where the immersive video experience has a full or 360-degree spherical field of view. The device may be applied where the immersive video experience has a partial, such as a 180-degree dome, field of view.
An exemplary program storage device is provided, tangibly embodying instructions executable by a processor for receiving at least one encoded stream, receiving a mapping manifest, decoding the at least one encoded stream with a single decoder or interpreter, and rendering the decoded stream as an immersive video experience in accordance with the mapping manifest.
The device may be applied where the received encoded stream is of a first type and may further include instructions for receiving a second encoded stream of a second type different from a type of the at least one encoded stream, decoding the second encoded stream with a second decoder or interpreter, and rendering the second decoded stream as part of the immersive video experience in accordance with the mapping manifest. The device may be applied where the first type is one of video, image or web content, and the second type is a different one of video, image or web content. The device may be applied where the immersive video experience has a full or 360-degree spherical field of view. The device may be applied where the immersive video experience has a partial, such as 180-degree dome, field of view.
Brief Description of the Drawings
The present disclosure may be better understood by way of the following detailed description when considered with reference to the appended drawings, in which:
Figure 1 is a block diagram illustrating the multiple media elements used to perform compositing of the 3D immersive scene in accordance with an exemplary embodiment of the present disclosure;
Figure 2 is a schematic diagram showing conceptually how the multiple media elements and their components are combined to create a 3D immersive scene in accordance with an exemplary embodiment of the present disclosure;
Figure 3 is a block diagram illustrating an exemplary mapping of media elements according to mapping manifests and the process for multiplexing many media objects of a family of media elements into one element in accordance with an exemplary embodiment of the present disclosure;
Figure 4 is a process diagram illustrating exemplary processes for multiplexing many media objects to enable one decoding activity to construct a scene in accordance with an exemplary embodiment of the present disclosure;
Figure 5 is a block diagram illustrating exemplary processes for a compositing engine to build the 3D environment scene in accordance with an exemplary embodiment of the present disclosure;
Figure 6 is a schematic diagram illustrating exemplary combinations of two video elements to construct a 360-degree (Spherical) and a 180-degree (Full-Dome) 3D scene, both with a floating jumbotron-style video element, in accordance with an exemplary embodiment of the present disclosure;
Figure 7 is a block diagram illustrating an interactive VR system having content supply and delivery sides in accordance with an exemplary embodiment of the present disclosure;
Figure 8 is a block diagram illustrating an interactive VR system configured to combine video sources using a predefined layout matching a projection in accordance with an exemplary embodiment of the present disclosure;
Figure 9 is a block diagram illustrating an interactive VR system with projection based on experience, scene and media in accordance with an exemplary embodiment of the present disclosure;
Figure 10 is a process diagram illustrating an interactive VR process for fetching and implementing experience in accordance with an exemplary embodiment of the present disclosure;
Figure 11 is a process diagram illustrating an interactive VR process for capturing web content from viewport to GPU in accordance with an exemplary embodiment of the present disclosure;
Figure 12 is a process diagram illustrating an interactive VR process for placing media object textures on a GPU in accordance with an exemplary embodiment of the present disclosure; and
Figure 13 is a schematic diagram illustrating an interactive VR process for generating an exemplary web page in accordance with an exemplary embodiment of the present disclosure.
Detailed Description
An interactive virtual reality (VR) system and related method are provided herein. An exemplary embodiment implements a binocular virtual reality system on a smartphone for an immersive experience. It shall be understood that an immersive experience does not necessarily have to allow a full 360 degrees of rotation or total immersion. A feature of the present disclosure is to reduce the computational requirements necessary for user hardware to adequately function in a virtual reality (VR) system. One of the challenges in VR displays is hardware meeting the computing power requirements. In a typical interaction, a VR device may simultaneously display video content, still image content, and interactive content. For example, if a head-mounted display (HMD) is used to play a three- dimensional (3D) movie, the 3D movie might be shown on a simulated theater screen a certain distance in front of the viewer. Foreground imagery, which may be still image content, may surround the display showing theater seats, curtains, or the like. There may also be background imagery filling the space behind the scene, such as for example when the viewer turns to look away from the virtual screen. It may be desirable for there to be multiple video feeds shown at the same time, for example, in a virtual display of multiple TV screens, or on a Jumbotron in a virtual arena. A VR display may also display interactive content such as Hypertext Mark-up Language (HTML) content obtained from the web.
When consuming a VR experience, which might include watching a 360- degree video, for example, such video might be live or pre-recorded video-on- demand (VOD). Either way, the user, even though they may feel immersed in the experience, may wish to increase engagement with enrichment layers of interactivity or additional information.
When content from multiple sources is displayed, multiple instances of decoders may be instantiated. For example, video data is typically provided in encoded form, such as Moving Picture Experts Group (MPEG) encoded format, which requires decoding by a codec prior to display. If more than one video source is presented to a viewer, multiple codec instances may be executed to decode the multiple video sources. The same goes for images, which are typically encoded in Joint Photographic Experts Group (JPEG) form, and web content, where multiple browser instances may be used to decode the HTML. Such decoders are computationally costly. In fact, when using a smartphone-based VR display, for example, battery life is notoriously short and overheating can be a problem.
An exemplary embodiment of the present disclosure uses a single decoder for each type of data (i.e., a single video codec, a single image codec, and a single browser instance) to decode the data of each respective type. That is, a decoder may comprise a non-displayed browser component or interpreter, such as for interpreting hypertext markup language (HTML). Rather than decoding multiple videos using multiple codecs, those videos are merged into a single larger video which is then decoded at the user hardware using a single codec. For example, if two 1920x1080 videos are to be displayed, such as for left and right stereoscopic viewing, a 1920x2160 video is provided comprising the content of both videos such that a single codec is used to decode it. The same is done for images, and the same is done for interactive content, which in the present example is web data, and more specifically HTML data. This is provided in a single web page and decoded using a single browser. The computational cost of decoding a double-sized video image is far lower than the cost of decoding two single-sized video images. Hence there is a significant net savings in computational cost at the user hardware. Thus, for example, battery life for the user hardware may be greatly improved.
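The merge-then-split idea can be sketched with toy frames represented as lists of pixel rows. This is a simplified illustration with hypothetical helper names; a real implementation would operate on encoded video, stacking before encoding on the production side and splitting after the single decode on the client:

```python
def stack_frames(top, bottom):
    """Stack two equal-width frames vertically into one frame so that a
    single decoder instance can decode both (e.g. two 1920x1080 frames
    become one 1920x2160 frame). Frames are lists of pixel rows."""
    assert len(top[0]) == len(bottom[0]), "frames must share a width"
    return top + bottom

def split_frames(stacked, top_height):
    """Inverse operation on the client: recover the two original frames."""
    return stacked[:top_height], stacked[top_height:]

left_eye = [[1, 1], [1, 1]]    # toy 2x2 frame for one video
right_eye = [[2, 2], [2, 2]]   # toy 2x2 frame for the other
stacked = stack_frames(left_eye, right_eye)
recovered_left, recovered_right = split_frames(stacked, 2)
```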
The output of the decoders is fed to a compositing and rendering engine which receives the decoded data and cuts it out into appropriate pieces, such as separating the two stacked video frames in the 1920x1080 example above, places them in an appropriate location in a 3D space, and renders the images for display on the smart phone screen. Although this is described in the context of a smart phone-based VR unit, it shall be understood that the presently disclosed technique has application in any type of VR unit.
An exemplary method and system are provided for building and rendering 360-degree interactive video experiences and the like. The description is intended to provide a skilled and knowledgeable reader with the ability to understand the present disclosure. To this end, examples are used to illustrate possible implementations. This is not intended to be limiting, and those of ordinary skill in the pertinent art may recognize possible variants and modifications falling within the scope and spirit of the present disclosure.
As shown in Figure 1, an exemplary embodiment method, indicated generally by the reference numeral 100, is provided for compositing a three-dimensional (3D) immersive scene 110 from media objects or elements 120 such as video streams 124, image streams 122, cameras 126, screen captures 128, other media elements 125, interactive coded object streams 130 and the like to build the 3D environment world including a background layer 112, a main video content layer 114, a foreground layer 116, and a combination of interactive layers 118 fixed in space 132, fixed to camera 134, and fixed to main video content layer 136.
Turning to Figure 2, a method indicated generally by the reference numeral 200 includes the positioning of media objects at different positions and orientations, applied on specific spatial surface geometries such as a sphere, cylinder, cube, dome, plane, and/or custom geometries. For example, a scene description 210 includes a background descriptor 212, a main content descriptor 214, a foreground descriptor 216, an interactive layers descriptor 218, a subtitles descriptor 219, and a HUD descriptor 211. Layers 220 corresponding to the scene description 210 are formed, including a background layer 222 corresponding to the background descriptor 212, a main layer 224 corresponding to the main content descriptor 214, a foreground layer 226 corresponding to the foreground descriptor 216, an interactive layer 228 corresponding to the interactive layers descriptor 218, a subtitle layer 229 corresponding to the subtitles descriptor 219, and a HUD layer 221 corresponding to the HUD descriptor 211. A composition 230 corresponding to the layers 220 and scene description 210 is formed, including a background feature 232 corresponding to the background layer 222 and background descriptor 212, a main content feature 234 corresponding to the main layer 224 and the main content descriptor 214, a foreground feature 236 corresponding to the foreground layer 226 and the foreground descriptor 216, an interactive feature 238 corresponding to the interactive layer 228 and the interactive layers descriptor 218, a subtitle feature 239 corresponding to the subtitle layer 229 and the subtitles descriptor 219, and a HUD feature 231 corresponding to the HUD layer 221 and the HUD descriptor 211.
Turning now to Figure 3, the method may further provide for the scene configuration, that is, the complete set of configuration parameters for all media objects contained in the scene, to be stored in the general mapping manifests indicated generally by the reference numeral 300. Each mapping manifest is configurable and updated via real-time data streams to update the scene configuration dynamically while the user is immersed within the scene, such as to update a new background layer and/or a new layer of interactivity based on user behavior. For example, a video element mapping manifest 340 maps a first video stream 341, a second video stream 342, a third video stream 343, and/or the like into the scene.
A similar mapping manifest may be implemented for still image content, where the mapping manifests may together make up a general mapping manifest defining the display of general VR content including video, interactive and still image content in a 3D VR scene. For example, an interactive object code mapping manifest 350 maps a first interactive object code stream 351 , a second interactive object code stream 352, a third interactive object code stream 353, and the like into the scene.
The general mapping manifest may be provided to the VR device such that the VR device may decode and properly display provided media according to the mappings of the mapping manifests. There may be several mapping manifests, where each may be provided as a plug-in to VR software, for example, and mapping manifests may be switched, modified/updated and even received on-the-go to allow a change in the scene construction or display on-the-fly.
The presently disclosed technology can also be adapted to create custom solutions for content providers, as software can be provided with a mapping manifest adapted for their content organization. Thus, a content provider may offer a VR application embodying the present technology which not only allows users to view their content in VR, but also boasts far greater performance than existing VR applications. It shall be understood that the mapping manifests may be separate data structures or may be incorporated into an encoded stream or web content. For example, the encoded stream may include HTML defining a mapping manifest.
As shown in Figure 4, the method indicated generally by the reference numeral 400 may include a start block 441 , which passes control to a function block 442, which receives a plurality of video streams. The function block 442 passes control to a function block 444, which multiplexes the plurality of video streams to obtain one combined video stream with specific video zones dedicated to initial individual video streams. The function block 444 passes control to a function block 446, which updates the video element mapping manifest with specific properties for each video zone. The function block 446 passes control to a function block 448, which positions specific video zones in the 3D virtual environment defined by position, orientation, and surface geometry. The function block 448, in turn, passes control to an end block 449. The method may further include a start block 451 , which passes control to a function block 452, which receives a plurality of interactive object code streams. The function block 452 passes control to a function block 454, which multiplexes the plurality of interactive object code streams to obtain one combined interactive object code stream with specific interactive object code zones dedicated to initial individual interactive object code streams. The function block 454 passes control to a function block 456, which updates the interactive object code element mapping manifest with specific properties for each interactive object code zone. The function block 456 passes control to a function block 458, which positions specific interactive object code zones in the 3D virtual environment defined by position, orientation, and surface geometry. The function block 458, in turn, passes control to an end block 459.
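The multiplexing and manifest-update steps of function blocks 442 through 446 might be sketched as follows. Frames are represented as toy lists of rows, and the manifest field names are assumptions for illustration:

```python
def multiplex_streams(streams):
    """Sketch of blocks 442-446: combine several video streams into one
    stream of vertically stacked frames and record each input stream's
    zone in a video element mapping manifest. Each stream is a list of
    frames; each frame is a list of equal-width pixel rows."""
    manifest = []
    y = 0
    for i, stream in enumerate(streams):
        h = len(stream[0])  # rows per frame of this stream
        manifest.append({"stream": i, "y": y, "height": h})
        y += h
    # For each time step, concatenate the corresponding frames' rows into
    # one combined frame, so a single decode covers every zone.
    combined = [sum(frames, []) for frames in zip(*streams)]
    return combined, manifest

stream_a = [[[1], [1]]]  # one 2-row frame of width 1
stream_b = [[[2]]]       # one 1-row frame of width 1
combined, manifest = multiplex_streams([stream_a, stream_b])
```

Block 448's positioning step would then attach position, orientation, and surface geometry to each manifest entry, as the mapping manifest examples above suggest.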
Thus, multiple media objects from the same family, such as video elements, image elements, or interactive object code elements, may be combined via multiplexing and decoded as one media object. The multiplexed media object contains all media objects from the same family divided into multiple zones, which may be represented by areas of pixels. The configuration of each zone
is stored and updated in the media object mapping manifest. Each zone is independent of the other media objects of the same family in positioning, orientation, visibility, and spatial surface geometry, yet all zones are decoded at the same time. Combining all media objects of a family into one object and performing the decoding once thus optimizes the use of processing resources.
Turning to Figure 5, a method of decoding, tiling, rendering and compositing is indicated generally by the reference numeral 500. The method 500 receives media elements 540 including media content 542 and media description 544. The media content is passed to a video decoder 546 or an image decoder 548, while the media description is passed to a projection plug-in 560. A tiling unit 562 includes texture 564 drawn from the video decoder and image decoder, and tiling information 566 drawn from the projection plug-in. A rendering unit 567 includes sub-texture 568 drawn from the tiling unit, and 3D geometry 569 drawn from the projection plug-in. The output of the rendering unit is passed to a composition engine 580. The method 500 further receives interactive layer elements 550 including logic 552 and media description 554. The logic is passed to a coded object interpreter or virtual machine 556, while the media description is passed to a projection plug-in 570. A tiling unit 572 includes texture 574 drawn from the coded object interpreter or virtual machine, and tiling information 576 drawn from the projection plug-in 570. A rendering unit 577 includes sub-texture 578 drawn from the tiling unit 572, and 3D geometry 579 drawn from the projection plug-in 570. The output of the rendering unit 577 is passed to a composition engine 590.
Thus, the method may include the combination of media content, such as video elements and image elements, or interactive object code elements, with media description configuration data; these combinations are labeled as a Media Element and an Interactive Layer Element, respectively. The Media Element and Interactive Layer Element are then decoded with element-specific decoding processes: a video decoder for video elements, an image decoder for image elements, and a coded object interpreter or virtual machine for interactive layers. The media description configuration feeds a data stream to the projection plug-in, which then constructs the 3D surface geometry in the 3D scene environment. The element-specific texture is then divided into sub-textures using tiling information data from the projection plug-in. Each sub-texture is then applied to the specific 3D surface geometry previously created by the projection plug-in to render layers. Then, the composition engine composes the scene by superimposing all of the rendered layers.
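The tiling step, in which an element-specific texture is divided into sub-textures using the tiling information from the projection plug-in, may be illustrated by the following sketch; the tiling-info keys used here are hypothetical:

```python
def extract_subtexture(texture, tile):
    """Crop the rectangular sub-texture described by one tiling-info
    entry (hypothetical keys x, y, width, height) out of a decoded
    texture, represented here as a list of pixel rows."""
    return [row[tile["x"]:tile["x"] + tile["width"]]
            for row in texture[tile["y"]:tile["y"] + tile["height"]]]

# A 4x4 decoded texture with distinct pixel values.
texture = [[r * 4 + c for c in range(4)] for r in range(4)]
tile = {"x": 2, "y": 1, "width": 2, "height": 2}
sub = extract_subtexture(texture, tile)
print(sub)   # [[6, 7], [10, 11]]
```

Each sub-texture produced this way would then be handed to the rendering unit together with the 3D geometry assigned to its zone.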
Turning now to Figure 6, two exemplary immersive scenes are indicated generally by the reference numeral 600. A first immersive scene 610 is built from a main video stream 612 comprising equirectangular 360-degree video and a video stream 614 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 616 and rendered on a sphere 618. A second immersive scene 620 is built from a main video stream 622 comprising full-dome 180-degree video and a video stream 624 comprising a jumbotron feed. These video streams are mapped using a video element tiling mapping manifest 626 and rendered on a sphere 628.
Thus, an immersive scene built with a spherical video and a floating jumbotron screen would be multiplexed as one video element. The majority of the pixels would be allocated to the main video layer, being the spherical video in this example, and a small portion of the jumbotron feed could be multiplexed in the lower end of the combined video element. Allocated pixels of video for the sphere and for the jumbotron are stored in the mapping manifest.
The rendering engine uses the mapping manifest to perform the scene construction, applying the specific zones of the main video to a spatial sphere and to a floating plane in order to build, respectively, the spherical 360-degree video environment and the floating jumbotron. Following from that same example, if, in addition to the jumbotron floating screen, a cylindrical scrolling Twitter feed and a live stats table need to be composited in the scene, the Twitter feed web element and the live stats web element would be multiplexed into one web page element. Allocated pixels of dynamic HTML for the Twitter feed zone would be applied on spatial cylinder geometry where the scrolling Twitter feed would appear, and the allocated pixels of dynamic HTML for the live stats floating table would also appear at a specific location as designated in the mapping manifest.
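A mapping manifest for such a scene might, purely for illustration, allocate pixels as follows; the zone names, rectangle values, and field names are hypothetical and are not taken from the disclosed manifest format:

```python
# Hypothetical manifest: the bulk of the combined frame feeds the sphere,
# and a lower strip feeds the floating jumbotron plane.
manifest = {
    "zones": [
        {"name": "main", "rect": [0, 0, 3840, 1920],
         "geometry": "sphere", "position": [0.0, 0.0, 0.0]},
        {"name": "jumbotron", "rect": [0, 1920, 1280, 240],
         "geometry": "plane", "position": [0.0, 2.0, -5.0]},
    ]
}

def pixels_allocated(manifest, name):
    """Return the pixel area reserved for a named zone in the manifest."""
    zone = next(z for z in manifest["zones"] if z["name"] == name)
    _, _, width, height = zone["rect"]
    return width * height

print(pixels_allocated(manifest, "main"))       # 7372800 pixels for the sphere
print(pixels_allocated(manifest, "jumbotron"))  # 307200 pixels for the jumbotron
```

The rectangle locates each zone in the combined frame buffer, while the geometry and position fields tell the rendering engine where that zone's pixels end up in the 3D scene.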
There is a link between the position of the zone in the frame buffer and the position in the 3D scene. The configuration and properties of frame buffer zones, including video, interactive, and image elements, are described in a mapping manifest.
As shown in Figure 7, an exemplary interactive VR system is indicated generally by the reference numeral 700. On a content supply side, the system 700 includes a content management system (CMS) 710, which includes a publisher interface 712 connected to experiences storage 714. Also on the supply side, the system 700 includes a content distribution network (CDN) 720, which includes an operator interface 716 connected to web content storage 718, an images composition unit connected to images storage 724, and a videos composition unit 726 connected to videos storage 728. A mapping manifest may be sent from the CMS to the experience manager.
On a content delivery side, the system 700 includes a client 730. The client 730 includes a user interface 732 connected to an experience manager 734, which also receives input from the experiences storage 714. The
experience manager, in turn, is connected to a web content composition unit 736, which also receives input from the web content storage 718 and provides output to a browser 738, which, in turn, provides output to a projection and mapping unit 740. An image codec 742 receives input from the images storage 724 and also provides output to the projection and mapping unit 740. A video codec 744 receives input from the videos storage 728 and further provides output to the projection and mapping unit 740. In addition, the projection and mapping unit 740 receives input directly from the experience manager 734 and provides output to a GPU 746.
Turning to Figure 8, a multiplexing unit is indicated generally by the reference numeral 800. The multiplexing unit is configured to combine video sources using predefined layouts matching a projection. The multiplexing unit 800 includes a hardware production portion 810, a CDN 820, a software production portion 830, and a server portion 840 connected to the software production portion. The CDN receives input from both the hardware production portion 810 and the server 840.
The hardware production portion 810 includes a live VR camera feed 812 and a broadcast or television feed 814, both connected to a hardware
multiplexing unit 816. The hardware multiplexing unit 816 is connected to a videos storage 822 of the CDN 820.
The software production portion 830 includes a live VR camera feed 832 and a broadcast or television feed 834. The server 840 includes a software multiplexing unit 846, where the software multiplexing unit 846 receives input from both the live VR camera feed 832 and the broadcast or television feed 834 of the software production portion 830.
As shown in Figure 9, an exemplary interactive VR system is indicated generally by the reference numeral 900. Here, an experience manager 910 provides from one to n experience templates 920, into which are fed from one to two scene templates 940 from a renderer 930. The combined templates are fed to a media unit 960, which inserts video media 970 and image media 980, and feeds the combined media to a projection and format unit 950.
Turning to Figure 10, an experience algorithm is indicated generally by the reference numeral 1000. The algorithm includes a function block 1010 that receives an experience identifier from a user interface such as 732 of Figure 7 and passes an HTTP or HTTPS request to a function block 1020. The function block 1020, in turn, fetches experience data such as 920 of Figure 9 from a server such as 840 of Figure 8 via an application programming interface (API), and passes parallel control to three function blocks 1030, 1040 and 1050, respectively. The function block 1030 creates a web page including all web content in different frames, the function block 1040 creates video decoders, and the function block 1050 creates image decoders.
Turning to Figure 11, an experience manager algorithm is indicated generally by the reference numeral 1100. Here, a function block 1110 receives a list of web contents and container sizes from an experience manager such as 734 of Figure 7 and passes control to a function block 1120. The function block 1120 creates a web page including frames for each web content and passes control to a function block 1130. The function block 1130, in turn, loads the locally created web page to a web view or web browser, and passes control to a function block 1140, which captures the web browser viewport or textual representation to the GPU each time that the viewport is updated.
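The web page created for the web contents might, for illustration only, be generated along the following lines; the frame markup, layout scheme, and URLs below are hypothetical, not taken from the disclosure:

```python
def build_page(contents):
    """Generate a minimal HTML page with one absolutely positioned frame
    per web content, sized from the supplied container list (hypothetical
    schema with keys url, w, h)."""
    frames = "\n".join(
        '<iframe src="{url}" style="position:absolute;'
        'width:{w}px;height:{h}px;border:0;"></iframe>'.format(**c)
        for c in contents)
    return "<html><body>\n{}\n</body></html>".format(frames)

page = build_page([
    {"url": "https://example.com/menu", "w": 400, "h": 100},
    {"url": "https://example.com/scoreboard", "w": 300, "h": 200},
])
print(page.count("<iframe"))   # 2 frames, one per web content
```

A page of this kind would then be loaded into the virtual web view, and its rendered viewport captured to the GPU as the text above describes.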
Turning now to Figure 12, a texture algorithm is indicated generally by the reference numeral 1200. Here, a function block 1210 receives a list of media assets in the experience from the experience manager 734 of Figure 7 and/or 910 of Figure 9, and passes control to a function block 1220. For each media element, the function block 1220 places texture on the GPU and stores UVs, vertexes, and indices. Here, the UVs are coordinates for projecting a 2D image onto a 3D model's surface for texture mapping, where the letters U and V denote the axes of the 2D texture. This transfers texture by capturing from a video codec, for example, and mapping to a GPU.
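A UV lookup of the kind described above may be illustrated as follows; the sampling convention used (v = 0 at the bottom row, nearest-texel lookup) is an assumption made for this sketch:

```python
def sample_uv(image, u, v):
    """Sample a 2D texture at normalized UV coordinates (u to the right,
    v upward): the lookup a GPU performs when texturing a 3D surface."""
    height, width = len(image), len(image[0])
    x = min(int(u * width), width - 1)
    y = min(int((1.0 - v) * height), height - 1)  # v = 0 is the bottom row
    return image[y][x]

image = [[10, 20],   # 2x2 texture: top row
         [30, 40]]   #              bottom row
print(sample_uv(image, 0.0, 1.0))   # top-left texel -> 10
print(sample_uv(image, 0.9, 0.0))   # bottom-right texel -> 40
```

In practice the UVs stored per vertex by the function block 1220 tell the GPU which texel of the uploaded texture lands on which point of the 3D surface.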
Table A includes pseudocode for an exemplary embodiment experience algorithm.

Table A

{
    "id": "P0OvmqXpNE9",
    "customerId": "5Av0mVxwQ8X",
    "title": "My Experience",
    "thumbnailUrl": "http://www.site.com/myexperience_thumb.jpg",
    "defaultBackgroundColor": "#000000",
    "defaultBackgroundMediaId": null,
    "widgetId": "6Bn8ISnAC9z",
    "tags": "model, sample, patent",
    "scenes": [{
        "id": "5Av0mVxwQ8X",
        "experienceId": "P0OvmqXpNE9",
        "title": "Main Scene",
        "thumbnailUrl": "http://www.site.com/mainscene_thumb.jpg",
        "backgroundColor": "#00FF00",
        "backgroundMediaId": "3OPqmdyzk0b",
        "contentMediaId": "20Pmqczxl6c",
        "startAction": null,
        "endAction": null,
        "live": false,
        "contentMedia": {
            "id": "7n9wZ0LmgA6",
            "title": "My Video",
            "description": "This is our default scene",
            "author": "SpherePlay",
            "date": "2018-06-01",
            "mediaUrl": "http://www.site.com/video180.m3u8",
            "subtitlesUrl": "http://www.site.com/subtitles_fr.smi",
            "thumbnailUrl": "http://www.site.com/video180_thumb.jpg",
            "type": "video",
            "projection": {
                "type": "fulldome",
                "params": "180"
            },
            "format": "mono"
        },
        "backgroundMedia": {
            "id": "3OPqmdyzk0b",
            "title": "My image",
            "description": "This is another scene",
            "author": "SpherePlay",
            "date": "2017-09-27",
            "mediaUrl": "http://www.site.com/image360stereo.mp4",
            "subtitlesUrl": null,
            "thumbnailUrl": "http://www.site.com/image360_thumb.jpg",
            "type": "image",
            "projection": {
                "type": "sphere",
                "params": null
            },
            "format": "mono"
        }
    }]
}
As shown in Figure 13, an experience manager algorithm comparable to 1100 of Figure 11, here with exemplary output, is indicated generally by the reference numeral 1300. The algorithm itself comprises function blocks 1310, 1320, 1330 and 1340 comparable to function blocks 1110, 1120, 1130 and 1140 of Figure 11, so duplicate description may be omitted. Here, an exemplary web page created by the function block 1320 is indicated generally by the reference numeral 1350. This is an example of a web page that could be
generated to display three different web contents, shown here as a menu 1352, an advertising banner 1354 and a scoreboard 1356. It shall be understood that the layout and contents may vary. Moreover, the advertising banner, for example, may be tuned to a particular device, user, location, or the like. This tunability may be accomplished by variable content within a manifest or by variably tuned manifests. The result will be loaded into a virtual web view and will not be displayed on a user's display in the layout form 1360, but rather in the visual form 1370. In operation, each part or web content will be sliced, sent to the GPU as needed, and placed in the virtual environment at the designated position.
The technology defined herein may be implemented by a system which comprises computer-readable storage containing instructions for instructing a processing device or processing devices to execute the methods described herein. In one example, the system is implemented on a computing device, such as a smart phone.
The exemplary device may comprise a network. The exemplary device may comprise processing units in a processing entity or processor for performing general and specific processing. The processing entity is at least partially programmable and may include a general-purpose processor as can be found in a smartphone. The processing entity may also include function-specific processing units such as a GPU which may comprise hardware codecs. The device may also comprise computer-readable storage, which may be distributed over several different physical memory entities such as chips, layers, or the like. The computer-readable storage may be accessible by the processing entity for storing/retrieving operational data. The computer-readable memory may also comprise program instructions readable and executable by the processing entity instructing the processing entity to implement the methods described herein.
The network interface may be adapted for receiving virtual reality content such as video, still image and interactive such as web or HTML content. The device may comprise logic for decoding the received content, including, for example, a video decoder for decoding MPEG-4, HEVC, or the like, a still image decoder for decoding JPEG or the like, and an interactive content decoder such
as a web browser. The device may also comprise buffers for storing the decoded content including a frame buffer for storing decoded video frames, an image buffer for storing decoded still images and an interactive content buffer for storing interactive content.
The computer-readable storage may also comprise one or more mapping manifests, including one or more general mapping manifests as described herein, which may be used by the system to determine how to provide the received VR content in a VR scene. In some embodiments, the device may be adapted for receiving mapping manifests, such as over the network interface.
The device comprises computer-executable software code for causing the processing entity to implement one or more of the methods and techniques described herein, including for causing the device to map portions of received video, still image and/or interactive content, such as from respective buffers, onto a 3D VR scene, or more particularly onto surfaces or objects of a 3D VR scene such as texture, and to composite, render and display the resulting scene. The creation of the 3D VR scene may also include generating 3D objects onto which to map the VR content. The creation of the 3D VR objects may be done in accordance with or as instructed by the one or more mapping manifest(s).
In certain embodiments, the technology may include receiving a single video stream having images comprising different zones corresponding to different video content associated with different 3D surfaces in a virtual scene, buffering this image data in corresponding buffer zones and texturing the different 3D surfaces with the respective image data from the corresponding buffer zones. This may involve populating a frame buffer with different zones that contain different videos and generating a 3D scene with different surfaces corresponding to different zones. This feature may provide the benefit of providing the entire video content of a virtual reality experience in a single video stream, thereby reducing the resource drain of running multiple video decoders.
Moreover, the present technology may include receiving a single video stream having images comprising different portions corresponding to different 3D geometries in a virtual representation, applying a distortion to the different portions in accordance with their respective 3D geometries and applying them as
textures to the respective surfaces of 3D objects having their respective 3D geometry in a virtual scene. For example, different zones may be represented under different geometries in the 3D scene - e.g. zone #1 is stretched on a sphere and zone #2 is a cylinder. This feature may provide the benefit of providing access to both curved and planar content using a single stream with the same advantages as above.
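The geometry-dependent mapping may be illustrated as follows: the same normalized (u, v) coordinate lands on different 3D positions depending on whether the manifest assigns the zone to a cylinder or to an equirectangular sphere. The parameterizations below are conventional choices assumed for the sketch, not part of the disclosure:

```python
import math

def cylinder_point(u, v, radius=1.0, height=1.0):
    """Map a normalized (u, v) onto a cylinder: u wraps around the
    circumference, v runs along the vertical axis."""
    angle = 2.0 * math.pi * u
    return (radius * math.cos(angle), v * height, radius * math.sin(angle))

def sphere_point(u, v, radius=1.0):
    """Equirectangular mapping: u is longitude, v is latitude."""
    lon, lat = 2.0 * math.pi * u, math.pi * (v - 0.5)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.sin(lat),
            radius * math.cos(lat) * math.sin(lon))

# The same texture coordinate ends up at different 3D positions
# depending on the geometry the manifest assigns to the zone.
print(cylinder_point(0.25, 0.5))  # a point on the cylinder wall, half-way up
print(sphere_point(0.25, 0.5))    # a point on the sphere's equator
```

The renderer would evaluate such a parameterization per vertex of the zone's surface mesh, which is what stretches a flat zone of pixels over a curved geometry.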
Moreover, the present technology may include receiving and applying, at a content-receiving client, a mapping manifest comprising mapping instructions for dividing image, video and web content into subdivisions and mapping the content of the subdivisions to respective 3D surfaces in a virtual scene. This too appears to be outside of the concern of the cited references. Usefully, this feature may allow a client device to be configured to use a single decoder for each of the three types of media used (video, image, interactive).
Moreover, the present technology may include populating a web cache with web content from a web page, assigning different portions of the web content to different respective 3D surfaces in a virtual scene, deriving a displayable output for each of the different portions, and displaying each of the displayable outputs on its respective 3D surface. The different portions may further be associated with different geometries, and their displayable output distorted accordingly. A frame buffer may be populated with different zones that contain bilateral interactive content such as HTML content. For example, different bilateral interactive zones may be represented under different geometries in the 3D scene: zone #1 is stretched on a sphere and zone #2 is stretched on a cylinder. This feature may allow the use of a web address to obtain all the different interactive elements of a VR experience, and to apply these interactive elements where desired in the scene, for which only a single web browser instance is needed.
Moreover, the present technology may include rendering a 3D scene comprising different 3D objects or surfaces by creating separate compositions for each of the 3D objects or surfaces and superimposing the resulting compositions in decreasing order of distance from a virtual camera; that is, the scene is rendered by superimposing the different separately composited objects of the 3D scene as layers. This may provide the advantage of reducing the computational burden of rendering a 3D scene.
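The back-to-front superposition may be sketched as follows, using a one-dimensional scanline of pixels and None to mark transparency; both are simplifications made only for illustration:

```python
def composite(layers):
    """Superimpose separately rendered layers in decreasing order of
    distance from the camera: farther layers are drawn first, and nearer
    layers overwrite them wherever they are opaque."""
    ordered = sorted(layers, key=lambda layer: layer["distance"], reverse=True)
    out = [None] * 4                       # a 4-pixel scanline for illustration
    for layer in ordered:
        for i, pixel in enumerate(layer["pixels"]):
            if pixel is not None:          # None marks a transparent pixel
                out[i] = pixel
    return out

background = {"distance": 10.0, "pixels": ["b", "b", "b", "b"]}
jumbotron  = {"distance": 3.0,  "pixels": [None, "j", "j", None]}
print(composite([jumbotron, background]))   # ['b', 'j', 'j', 'b']
```

Because each layer is composited separately and only merged at the end, a layer whose content has not changed need not be re-rendered, which is one way the superposition can reduce the rendering burden.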
Although various embodiments have been illustrated, this was for the purpose of describing, but not limiting, the present invention. Various possible modifications and different configurations will become apparent to those of ordinary skill in the pertinent art and are within the scope and spirit of the present invention, which is defined more particularly by the attached claims.
Claims
1. A method for providing immersive video experiences, the method comprising:
building a mapping manifest;
receiving a plurality of media objects;
multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest; and
providing the mapping manifest and the encoded stream for an immersive video experience.
2. The method of Claim 1 wherein the encoded stream comprises hypertext markup language (HTML) defining the mapping manifest.
3. The method of Claim 1 or 2, further comprising:
decoding the encoded stream; and
rendering the decoded stream in accordance with the mapping manifest.
4. The method of Claim 3, further comprising at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects.
5. The method of Claim 3 or 4 wherein the received plurality of media objects is of a first type, the method further comprising:
receiving a second plurality of media objects of a different second type;
multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest;
providing the second encoded stream to the immersive device; and
decoding the second encoded stream and rendering the second decoded stream on the immersive device.
6. The method of Claim 5 wherein the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
7. The method of Claim 6 wherein the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
8. The method of any one of Claims 1 to 7, wherein the immersive video experience has a full spherical environment field of view.
9. The method of any one of Claims 1 to 7, wherein the immersive video experience has a partial environment field of view.
10. The method of Claim 9 wherein the partial environment field of view comprises a dome.
11. The method of any one of Claims 3 to 7 wherein rendering comprises:
generating image data from the decoded streams using a browser or interpreter; and
based on the mapping manifest, projecting portions of the generated image data upon a surface field of view using a GPU.
12. The method of any one of Claims 3 to 7, further comprising:
projecting media portions of the decoded streams upon a surface field of view using a GPU.
13. A program storage device tangibly embodying instructions executable by a processor for:
building a mapping manifest;
receiving a plurality of media objects;
multiplexing the received plurality of media objects into an encoded stream in accordance with the mapping manifest; and
providing the mapping manifest and the encoded stream for an immersive video experience.
14. The device of Claim 13 wherein the received plurality of media objects is of a first type, further comprising instructions for:
receiving a second plurality of media objects of a different second type;
multiplexing the received second plurality of media objects into a second encoded stream in accordance with the mapping manifest; and
providing the second encoded stream for the immersive video experience.
15. The device of Claim 14 wherein the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
16. The device of Claim 13, 14 or 15, further comprising instructions for at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects.
17. The device of any one of Claims 13 to 16 wherein the at least one of positioning, orienting, and applying spatial surface geometry to the plurality of media objects is done in accordance with the mapping manifest.
18. The device of any one of Claims 13 to 17 wherein the immersive video experience has a full spherical environment field of view.
19. The device of any one of Claims 13 to 17 wherein the immersive video experience has a partial environment field of view.
20. A program storage device tangibly embodying instructions executable by a processor for:
receiving at least one encoded stream;
receiving a mapping manifest;
decoding the at least one encoded stream with a single decoder or interpreter; and
rendering the decoded stream as an immersive video experience in accordance with the mapping manifest.
21. The device of Claim 20 wherein the received encoded stream is of a first type, further comprising instructions for:
receiving a second encoded stream of a second type different from a type of the at least one encoded stream;
decoding the second encoded stream with a second decoder or interpreter; and
rendering the second decoded stream as part of the immersive video experience in accordance with the mapping manifest.
22. The device of Claim 21 wherein the first type is one of video, image or web content, and the second type is a different one of video, image or web content.
23. The device of Claim 20, 21 or 22 wherein the immersive video experience has a full spherical environment field of view.
24. The device of Claim 20, 21 or 22 wherein the immersive video experience has a partial environment field of view.
25. The device of Claim 20, 21 or 22 wherein rendering comprises:
generating image data from the decoded streams using a browser or interpreter; and
based on the mapping manifest, projecting portions of the generated image data upon a surface field of view using a GPU.
26. The device of Claim 20, 21 or 22, further comprising instructions for a GPU to project media portions of the decoded streams upon a surface field of view.
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
US201762516710P | 2017-06-08 | 2017-06-08 |
US 62/516,710 | 2017-06-08 | |

Publications (1)

Publication Number | Publication Date
---|---
WO2018223241A1 (en) | 2018-12-13

Family

ID=64565697

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CA2018/050690 (WO2018223241A1) | Building and rendering immersive virtual reality experiences | 2017-06-08 | 2018-06-08

Country Status (1)

Country | Link
---|---
WO | WO2018223241A1 (en)
Cited By (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN113784105A * | 2021-09-10 | 2021-12-10 | 上海曼恒数字技术股份有限公司 | Information processing method and system for immersive VR terminal

Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US6567086B1 * | 2000-07-25 | 2003-05-20 | Enroute, Inc. | Immersive video system using multiple video streams
US20070005795A1 * | 1999-10-22 | 2007-01-04 | Activesky, Inc. | Object oriented video system
US20120007752A1 * | 2007-08-13 | 2012-01-12 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding/decoding metadata
US20120131432A1 * | 2010-11-24 | 2012-05-24 | Edward Wayne Goddard | Systems and methods for delta encoding, transmission and decoding of html forms
WO2016024892A1 * | 2014-08-13 | 2016-02-18 | Telefonaktiebolaget L M Ericsson (Publ) | Immersive video
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18814029; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18814029; Country of ref document: EP; Kind code of ref document: A1