WO2019048733A1 - Transmission of video content based on feedback - Google Patents

Transmission of video content based on feedback

Info

Publication number
WO2019048733A1
Authority
WO
WIPO (PCT)
Prior art keywords
viewport
aggregate
video content
content data
UEs
Application number
PCT/FI2018/050615
Other languages
French (fr)
Inventor
Athul Prasad
Miska Hannuksela
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2019048733A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/012 Head tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/10 Geometric effects
    • G06T 15/20 Perspective computation
    • G06T 15/205 Image-based rendering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H04L 65/61 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L 65/611 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for multicast or broadcast
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/131 Protocols for games, networked simulations or virtual reality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/194 Transmission of image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/21805 Source of audio or video content, e.g. local disk arrays, enabling multiple viewpoints, e.g. using a plurality of cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N 21/2668 Creating a channel for a dedicated end-user group, e.g. insertion of targeted commercials based on end-user profiles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/65 Transmission of management data between client and server
    • H04N 21/658 Transmission by the client directed to the server
    • H04N 21/6587 Control parameters, e.g. trick play commands, viewpoint selection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/816 Monomedia components thereof involving special video data, e.g. 3D video

Definitions

  • the video content may represent, but is not limited to, panoramic virtual reality content transmitted to a plurality of user equipment in a broadcast or multicast transmission.
  • Video content data, which may or may not comprise audio, is particularly challenging given the amount of data that is required and the limited availability of spectrum resources.
  • a first aspect provides a method comprising: receiving from a plurality of remote user equipment (UE) information relating to respective viewports; determining data representing an aggregate viewport based on the viewport information from the plurality of UEs; and transmitting video content data to the plurality of UEs based on the determined aggregate viewport.
  • the method may further comprise: transmitting the aggregate viewport data to the UEs; receiving from one or more of the UEs an indication that their respective user viewport does not relate to the aggregate viewport; determining an updated aggregate viewport based on said received indication; transmitting to the plurality of UEs the updated aggregate viewport; and transmitting to the plurality of UEs video content data based on the updated aggregate viewport.
  • UE: remote user equipment
  • the indication that the respective user viewport does not relate to the aggregate viewport may comprise an indication that said UE has terminated receiving the video content data.
  • the indication that the respective user viewport does not relate to the aggregate viewport may comprise an indication that video content data is not being viewed at said UE.
  • Determining the updated aggregate viewport may comprise disregarding viewport information received from said UE.
  • the indication that the respective user viewport no longer corresponds to the aggregate viewport may comprise an indication that the viewport has changed by more than a predetermined amount.
  • the indication that the respective user viewport no longer corresponds to the aggregate viewport may comprise an indication that the viewport is outside of the aggregate viewport.
  • the viewport information may comprise information from which can be derived information relating to a viewing angle associated with the UE or a display device associated with the UE.
  • the viewing angle information may comprise a viewing orientation.
  • the viewing angle information may comprise the extent of the viewport.
  • the extent of the viewport may be represented by the horizontal and vertical fields of view of the viewport.
  • the viewport information may further comprise an indication of whether the content is monoscopic or stereoscopic.
  • the viewing angle information may be quantised and/or compressed and wherein the method may further comprise reconstructing and/or decompressing the viewing angle information prior to determination.
  • Transmitting the video content data to the plurality of UEs may comprise transmitting video content data corresponding to the aggregate viewport at a higher quality relative to other video content data outside of the aggregate viewport.
  • the video content corresponding to the aggregate viewport may be transmitted at a higher resolution and/or data rate than other video content data outside of the aggregate viewport.
  • the method may further comprise caching video content outside of, but within a limited predetermined spatial distance of, the aggregate viewport, and transmitting said cached video content at a higher quality responsive to receiving an indication that a UE's respective user viewport does not relate to the aggregate viewport.
  • the cached video content may be transmitted prior to transmitting to the UEs the updated aggregate viewport.
  • the video content data may be provided as a plurality of tiles, each tile corresponding to a sub-portion of the overall content data and representing a respective spatial display position.
  • the video content data may be transmitted to the UEs using one or more of a multicast or broadcast transmission.
  • the aggregate viewport data may be transmitted to the UEs using one or more of a multicast or broadcast transmission.
  • the video content data may represent a panoramic image or video.
  • the video content data may be virtual reality (VR) content data.
  • VR: virtual reality
  • the user viewport information from each of the plurality of UEs may be indicative of the user viewport of a respective VR headset.
  • the method may further comprise transmitting configuration data to each of the UEs for configuring them to transmit the viewport information.
  • the method may be performed at a base station of a mobile network.
  • a further aspect provides a method, comprising: transmitting to a remote system information relating to a viewport of a user equipment (UE); receiving further video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from one or more other UEs.
  • UE: user equipment
  • the method may further comprise: receiving the aggregate viewport data; determining that the viewport does not relate to the aggregate viewport; and transmitting an indication that the viewport does not relate to the aggregate viewport to the remote system.
  • the determination that the viewport does not relate to the aggregate viewport may be made responsive to the UE terminating receiving the video content data, the transmitted indication identifying such to the remote system.
  • the determination that the viewport does not relate to the aggregate viewport may be made responsive to the video content data not being viewed at the UE, the transmitted indication identifying such to the remote system.
  • the method may further comprise terminating transmission of the viewport information.
  • the determination that the viewport does not relate to the aggregate viewport may be made responsive to the viewport of the UE changing by more than a predetermined amount, the transmitted indication comprising updated viewport information.
  • the determination that the viewport does not relate to the aggregate viewport may be made responsive to the viewport of the UE being outside of the aggregate viewport, the transmitted indication comprising updated viewport information.
  • the transmitted viewport information may comprise information from which can be derived information relating to a viewing angle associated with the UE or a display device associated with the UE.
  • the transmitted viewing angle information may comprise a viewing orientation.
  • the transmitted viewing angle information may comprise the extent of the viewport.
  • the extent of the viewport may be represented by the horizontal and vertical fields of view of the viewport.
  • the transmitted viewport information may further comprise an indication of whether the UE prefers monoscopic or stereoscopic content.
  • the method may further comprise quantising and/or compressing the viewing angle information prior to transmitting.
  • the received video content data corresponding to the aggregate viewport may be at a higher quality relative to other video content data outside of the aggregate viewport.
  • the received video content data may be at a higher resolution and/or data rate than other video content data outside of the aggregate viewport.
  • the method may further comprise caching received video content data outside of, but within a limited predetermined spatial distance of, the aggregate viewport, and displaying said cached video content responsive to determining that the viewport does not relate to the aggregate viewport.
  • the video content data may be provided as a plurality of tiles, each tile corresponding to a sub-portion of the overall video content data and representing a respective spatial display position.
  • the video content data may be received using one or more of a multicast or broadcast transmission.
  • the aggregate viewport data may be received using one or more of a multicast or broadcast transmission.
  • the video content data may represent a panoramic image or video.
  • the video content data may be virtual reality (VR) content data.
  • the user viewport information may be indicative of the user viewport of a VR headset.
  • the method may further comprise receiving from a remote device configuration data for configuring the UE to transmit the viewport information, and transmitting the viewport information responsive to installing or executing the configuration data.
  • the method may be performed at a mobile terminal for connection to a mobile network.
  • a further aspect may provide a computer program comprising instructions that, when executed by a computer, control it to perform the method of any preceding method definition.
  • a further aspect may provide an apparatus configured to perform the method steps of any preceding method definition.
  • a further aspect may provide a non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: receiving from a plurality of remote user equipment (UE) information relating to respective viewports; determining data representing an aggregate viewport based on the viewport information from the plurality of UEs; and transmitting video content data to the plurality of UEs based on the determined aggregate viewport.
  • UE: remote user equipment
  • a further aspect may provide a non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: transmitting to a remote system information relating to a viewport of a user equipment (UE); and receiving video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from one or more other UEs.
  • UE: user equipment
  • Figure 1 is a perspective view of a VR display system, useful for understanding the embodiments
  • Figure 2a is a block diagram of a computer network including the Figure 1 VR display system, according to embodiments;
  • Figure 3a is a schematic top-plan view of a virtual space
  • Figure 3b is a schematic internal view of part of the Figure 3a virtual space
  • Figure 4 is a block diagram of components of a VR content provider forming part of the Figure 2 VR display system
  • Figure 5 is a schematic diagram of a 5G network architecture;
  • Figure 6 is a schematic diagram of a 5G network architecture according to embodiments;
  • Figure 7 is a flow diagram showing processing operations performed at, for example, a 5G base station according to embodiments.
  • Figure 8 is a flow diagram showing processing operations performed at a user equipment according to embodiments.
  • Figure 9 is a flow diagram showing processing operations performed at a 5G base station, content provider and user equipment according to embodiments;
  • Figure 10 is a perspective view of a 360° space and a flattened two-dimensional (2D) space according to embodiments;
  • Figure 11 is a frequency-time graph for illustrating timing and bandwidth of feedback information and content data according to embodiments
  • Figure 12 is a schematic view showing edge content in relation to an aggregate viewport according to embodiments
  • Figures 13a - 13c are schematic views of viewports from respective UEs.
  • Figures 13d - 13e are schematic views of alternative aggregate viewports derived from the viewports of Figures 13a - 13c.
  • Embodiments herein relate to the transmission of video data to one or more remote users by means of, for example, over-the-air technology.
  • the video data may represent, for example, panoramic virtual reality (VR) content data representing immersive video content.
  • the VR content data may represent omnidirectional (a.k.a. 360-degree) video content for user exploration using three degrees of freedom (3DoF), wherein the user can choose the viewing orientation, which may be indicated by extrinsic rotation angles around three orthogonal coordinate axes.
  • the VR content data may or may not be accompanied by an audio stream, which audio stream may comprise spatial audio data.
  • the methods and systems are however applicable to any form of video data for transmission from a source to a destination.
  • VR content data are intended also to cover related technologies such as Augmented Reality (AR) and Mixed Reality (MR).
  • the transmission of the VR content data may be by means of video streaming between a content provider and a plurality of users, potentially a large number of users, using a multicast and/or broadcast transmission over the air.
  • the transmission may employ an IP network. Again, this is given by way of example.
  • the transmission mechanism may use 3G, 4G LTE or 5G wireless networks, e.g. base stations and protocols, but the methods and systems described herein are applicable to future wireless technologies also.
  • multicast generally refers to the transmission of data between a source and a logical group of identified receivers, for example receivers that are members of a particular multicast group
  • broadcast refers to transmission of data to all receivers within a defined area, e.g. within a cell corresponding to a base station.
  • MBMS: Multimedia Broadcast / Multicast Service
  • Embodiments relate to view-dependent video transmission, where the VR content data that is transmitted from the source to user end systems is dependent on the position or orientation of users who may be wearing a VR user device.
  • the transmitted VR content data may represent a part of an overall, wide-angle video scene.
  • the video scene may cover a field which is greater than a viewer's typical field-of-view, e.g. greater than 180°. Therefore, embodiments are particularly suited to applications where a user may consume and/or interact with an overall video scene greater than 180° and possibly up to 360°.
  • VR: virtual reality
  • the VR display system may be provided with a live or stored feed from a video content source, the feed representing a VR space for immersive output through the display system.
  • audio is provided, which may be spatial audio.
  • Nokia's OZO (RTM) VR camera is an example of a VR capture device which comprises a camera and microphone array to provide VR content data and a spatial audio signal, but it will be appreciated that the embodiments are not limited to VR applications nor the use of microphone arrays at the video capture point.
  • Figure 1 is a schematic illustration of a VR display system 1.
  • the VR system 1 includes a VR headset 20, for displaying visual data in a virtual reality space, and a VR media player 10 for rendering visual content data on the VR headset 20.
  • a virtual space is any computer-generated version of a space, for example a captured real world space, in which a user can be immersed.
  • the VR headset 20 may be of any suitable type.
  • the VR headset 20 may be configured to provide VR video and audio content data to a user. As such, the user may be immersed in virtual space.
  • the VR headset 20 receives visual content from a VR media player 10.
  • the VR media player 10 may be part of a separate device which is connected to the VR headset 20 by a wired or wireless connection.
  • the VR media player 10 may include a games console, or a PC configured to communicate visual data to the VR headset 20.
  • the VR media player 10 may form part of the display for the VR headset 20.
  • the media player 10 may comprise a mobile phone, smartphone or tablet computer configured to play content through its display.
  • the device may be a touchscreen device having a large display over a major surface of the device, through which video content can be displayed.
  • the device may be inserted into a holder of a VR headset 20.
  • a smart phone or tablet computer may display visual data which is provided to a user's eyes via respective lenses in the VR headset 20.
  • the VR display system 1 may also include hardware configured to convert the device to operate as part of VR display system 1.
  • VR media player 10 may be integrated into the VR display device 20.
  • VR media player 10 may be implemented in software.
  • a device comprising VR media player software is referred to as the VR media player 10.
  • the VR display system 1 may include means for determining the spatial position and/or orientation of the user's head. Over successive time frames, a measure of movement may therefore be calculated and stored. Such means may comprise part of the VR media player 10. Alternatively, the means may comprise part of the VR display device 20.
  • the VR display device 20 may incorporate motion tracking sensors which may include one or more of gyroscopes, accelerometers and structured light systems.
  • the VR display device 20 will typically comprise two digital screens for displaying stereoscopic video images of the virtual world in front of respective eyes of the user, and also two speakers for delivering audio, if provided from the VR system. In some VR display devices, a single screen may be divided into two portions for displaying a stereo image pair.
  • the embodiments herein, which primarily relate to the delivery of VR content data, are not limited to a particular type of VR display device 20.
  • the VR content data may, for example, be monoscopic instead of stereoscopic.
  • the VR display system 1 may be configured to display visual data to the user based on the spatial position of the display device 20 and/or the orientation of the user's head.
  • a detected change in spatial position and/or orientation i.e. a form of movement, may result in a corresponding change in the visual data to reflect a position or orientation transformation of the user with reference to the virtual space into which the visual data is projected. This allows VR content data to be consumed with the user experiencing an omnidirectional 3D VR environment.
  • the VR display device 20 may display non-VR content data captured with two-dimensional video or image devices, such as a smartphone or a camcorder, for example.
  • Such non-VR content data may include a framed video or a still image.
  • the non-VR source content data may be 2D, stereoscopic or 3D.
  • the non-VR source content data includes visual source content, and may optionally include audio source content.
  • Such audio source content may be spatial audio source content.
  • Spatial audio may refer to directional rendering of audio in the virtual space such that a detected change in the orientation of the user's head may result in a corresponding change in the spatial audio rendering to reflect an orientation transformation of the user with reference to the virtual space in which the spatial audio data is rendered.
  • the display of the VR display device 20 is described in more detail below.
  • the angular extent of the virtual environment observable through the VR display device 20 is called the viewport of the display device. More generally, the viewport is any representation of a sub-section of the overall VR space that the user is seeing via the VR display device 20.
  • the actual viewport observed by a user may depend on the inter-pupillary distance and on the distance between the lenses of the headset and the user's eyes, but the extent of the viewport can be considered to be approximately the same for all users of a given display device when the display device is being worn by the user.
  • the viewport of the VR display device 20 may be represented by viewport data determined locally, i.e. at the VR display device 20 and/or at the VR media player 10 using known techniques.
  • the viewport data of the VR display device 20 may be representative of a viewing angle, comprising one or more of: viewing orientation (e.g. the spherical coordinates of the centre point of the viewport and a rotation angle of the viewport), the extents of the viewport (e.g. the horizontal and vertical field of view of the viewport).
  • the viewport data may be accompanied by data indicative of the type of content, e.g. monoscopic or stereoscopic VR data.
  • Figure 2 shows a generalised VR system 1, comprising the above-described media player 10 and VR display device 20.
  • a remote content provider 30 may store and transmit streaming video data which, in the context of embodiments, is VR content data for display to the VR display device 20.
  • the content provider 30 streams the VR content data over a data network 40, which may be any network, for example an IP network such as an over-the-air network, such as a 3G, 4G or 5G mobile IP network, a multicast network, or a broadcast network. If data network 40 is unidirectional, a return channel for providing feedback from VR display device 20 to the content provider 30 may be provided by another data network.
  • the remote content provider 30 may or may not be the location or system where the VR content data is captured and processed.
  • the VR content data may cover, horizontally, the full 360° coverage or extent around the capturing position of the VR capture device.
  • the vertical coverage may vary and can be, for example, 180°.
  • Such images may be represented by a sphere that is mapped onto a two-dimensional image plane using equi-rectangular projection (ERP).
  • ERP: equi-rectangular projection
  • the horizontal co-ordinates may be considered equivalent to longitude and the vertical coordinates equivalent to latitude with no transformation or scaling applied.
  • the process of forming a monoscopic equi-rectangular panoramic picture can be performed by (i) stitching individual images captured by the different cameras of a 360° capture device onto a spherical image; (ii) projecting the spherical image onto the wall of a cylinder; (iii) unfolding the cylinder to a two-dimensional image. Some stages may be merged.
  • a panoramic image may have less than a 360° horizontal coverage and up to 180° vertical coverage, but otherwise may have the characteristics of equi-rectangular projection format.
  • a multitude of projection formats exist for 360 degree video including, but not limited to, cubemap projection, octahedron projection, and equal area projection.
  • 360° video content data can be monoscopic or stereoscopic.
  • for monoscopic content, a monoscopic display, such as a conventional flat panel display, is used, or the same video is displayed for both eyes on a stereoscopic display, such as the VR display device 20.
  • for stereoscopic content, the displayed content comprises two views, corresponding to the left eye and the right eye respectively.
  • the VR content data for the overall video scene, for example a 360 degree video scene, may be arranged as a series of two-dimensional areas or tiles, each representing a respective spatial part of the scene. Therefore, the VR media player 10 may only receive from the content provider 30 those tiles which are within the current viewport, at least at a high quality.
  • This viewport-dependent transmission of VR content data, or viewport-adaptive streaming (VAS) is useful for reducing the streaming bit-rate with little or no impact on subjective quality.
  • Each tile may be represented by a video segment, which is a temporal portion of the video content, typically in the order of seconds, e.g. two seconds or similar. Therefore, for a given tile, multiple segments may be downloaded, buffered and then decoded and rendered in a sequential order. Therefore, in some embodiments, there may be provided a plurality of groups of video segments, each group corresponding to a respective sub-area of the larger video content, and each segment within a group representing different temporal portions of the video content. The video content may therefore be divided in both the spatial and temporal domains.
  • Figure 3a is a schematic top plan view representing a 360° view field 50 in relation to a user 55 wearing the VR display device 20. Based on the user's position and the orientation of the VR display device 20, only a current viewport 60 may be streamed to the media player 10, i.e. the portions or segments of the video scene between the bounding lines 57.
  • Figure 3b shows, for example, a plurality of segments 80a - 80h when rendered to the VR display device 20 from the user's perspective. These may be termed "first segments" in that they represent the current viewport 60. Each segment 80a - 80h is effectively a tile representing a respective two-dimensional region of the viewport 60. Each segment 80a - 80h may represent a still image, or video data lasting several seconds in length.
  • segment used herein refers to video data representing a sub-portion of an overall image for a time interval.
  • the VR display device 20, or its associated VR media player 10, needs to feed back to the content provider 30 positional data indicative of the current viewport.
  • the positional data may represent the current viewport in a number of ways.
  • the viewport of the display device 20 may be represented by viewport data determined locally, i.e. at the VR display device 20 and/or at the VR media player 10 using known techniques.
  • the viewport data of the display device 20 may be representative of a viewing angle, comprising one or more of: viewing orientation (e.g. the spherical coordinates of the centre point of the viewport and a rotation angle of the viewport), the extents of the viewport (e.g. the horizontal and vertical field of view of the viewport).
  • the viewport data may be accompanied by data indicative of the type of content, e.g. monoscopic or stereoscopic VR content data.
  • the viewport data may be compressed or quantised; in other words, the display device 20 and/or the media player need only generate information sufficient for the content provider 30 (or a network node, e.g. a 5G base station or gigabit Node B (gNB)) to determine the viewport at that end.
  • this viewport dependent transmission would require transmitting updated VR content data to each respective VR display device 20 based on the individual viewports fed back from said VR display devices 20. This involves significant processing and spectrum resources.
  • Embodiments herein provide methods and systems for transmitting VR content data using multicast or broadcast protocols, which may involve receiving viewport information from a plurality of user-end devices, one or more of which may be the VR display device 20, determining an aggregate viewport based on the viewport information feedback, and transmitting subsequent VR content data based on the determined aggregate viewport.
  • high quality immersive VR content data may be transmitted to user-end devices using multicast or broadcast protocols for mass delivery, whilst making efficient use of over-the-air radio resources.
  • Embodiments are therefore suited to over-the-air links where spectral resources are limited, such as mobile networks, for example. Latency and caching issues may also be improved, as will become evident.
  • Other features may include the user-end devices 10, 20 determining whether their respective viewports correspond to the determined aggregate viewport and, if not, indicating this in subsequent feedback to the content provider 30 so that the aggregate viewport can be updated. Detailed embodiments are explained below.
  • Figure 4 is a schematic diagram of components of the content provider 30, or a computer system associated with the content provider.
  • the content provider 30 may have a controller 100, RAM 102, a memory 104, and, optionally, hardware keys 106 and a display 108.
  • the content provider 30 may comprise at least one network interface 110 for connection to the network 40 or other data networks, e.g. a modem which may be wired or wireless.
  • the network interface 110 may therefore be used to receive download requests from the VR display system 1 and to stream data to the VR display system 1.
  • a segment database 116 is also provided, for storing video data for streaming transmission to external devices, such as the VR display system 1, which transmission may be via an over-the-air network.
  • the controller 100 is connected to each of the other components in order to control operation thereof.
  • the memory 104 may be a non-volatile memory such as read only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD).
  • the memory 104 stores, amongst other things, an operating system 112 and may store software applications 114.
  • the RAM 102 is used by the controller 100 for the temporary storage of data.
  • the operating system 112 may contain code which, when executed by the controller 100 in conjunction with the RAM 102, controls operation of the hardware components.
  • the controller 100 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors comprising processor circuitry.
  • the content provider 30 may be a standalone computer, a server, a console, or a network thereof.
  • the content provider 30 may communicate with the VR display system 1 under control of one or more software applications 114, in accordance with steps to be described later on.
  • the content provider 30 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications.
  • the content provider 30 may be in communication with the remote server device in order to utilize the software application stored there.
  • Figure 5 shows an example 5G architecture.
  • the architecture comprises a media content server 120, akin to the content provider 30 in Figure 2, which stores VR content data for delivery to multiple users.
  • the media content server 120 may communicate with a 5G converged core 130 which represents one or more functional elements of a 5G core network.
  • the 5G converged core 130 may communicate with one or more spatial nodes, including a terrestrial broadcast 5G-gNB 140, a 5G-gNB 160 and a fixed network 150.
  • the terrestrial broadcast 5G-gNB 140 may be used to transmit content data and other data to large numbers of UEs 145 by a broadcast delivery.
  • the 5G-gNB 160 may be used to transmit content data and other data to UEs 145 by multicast delivery, and may also receive feedback data.
  • the fixed network 150 may be used to transmit content data and other data to UEs 145 also, and may receive feedback data.
  • the UEs 145 may comprise, for example, the VR media player 10 which may or may not be combined with the VR display device 20.
  • the reference to "XCast content delivery" indicates that the delivery mechanism may be one or a combination of unicast, multicast or broadcast.
  • FIG 6 shows an example VR content data broadcast scenario that may be employed using at least part of the Figure 5 architecture.
  • the scenario relates to providing VR content data to large numbers of UEs 145, for example in an indoor area although it is applicable to outdoor and a mixture of indoor and outdoor scenarios.
  • Each UE 145 is assumed to be able to transmit and receive data via a 5G network and is configured to decode and display the VR content data.
  • the media content server 120 may be in communication with the 5G-gNB 160, e.g. via the 5G converged core 130.
  • the 5G-gNB 160 may be suitable for multicast or broadcast transmission and may also receive feedback data from UEs 145 within its respective coverage area 170 via a low-latency data transport link.
  • the media content server 120 may in some embodiments be an edge cloud.
  • the media content server 120 may receive the VR content data from a VR content creation module 180.
  • the VR content creation module 180 may be remote from the media content server 120.
  • the VR content creation module 180 may be associated with a multi-directional camera device 185, such as Nokia's OZO camera.
  • processing operations performed by the 5G-gNB 160 in accordance with an embodiment will now be described, although similar nodes can perform said operations in a related context.
  • the processing operations may be performed under control of a software application stored at the 5G-gNB 160.
  • the 5G-gNB 160 may comprise components such as those shown in Figure 4, whereby the one or more software applications 114 perform the following operations.
  • a first operation S7.1 comprises receiving from a plurality of UEs 145 information relating to respective viewports, which may be relative to current video content.
  • Another operation S7.2 comprises determining data representing an aggregate viewport based on viewport information from the plurality of UEs 145.
  • Another operation S7.3 comprises transmitting video content data to the plurality of UEs 145 based on the determined aggregate viewport.
  • the transmitting operation S7.3 may be by means of multicast or broadcast using the 5G-gNB 160. It will be appreciated therefore that the video content data transmitted relates or corresponds to the aggregate viewport and is therefore suited to a large number of UEs 145 rather than using individual transmissions suited to individual UEs. Fewer spectrum resources are needed when multicast/broadcast is used to deliver the content, as compared to individual transmissions / unicast.
  • Optional operations may comprise, in an operation S7.4, transmitting the aggregate viewport determined in S7.2 to the plurality of UEs 145.
  • a further operation S7.5 may comprise receiving from one or more UEs 145 an indication that their respective viewport(s) do not relate or correspond to the aggregate viewport.
  • a further operation S7.6 may comprise determining an updated aggregate viewport based on said indication(s) in S7.5. Operation S7.6 may return to operation S7.3 whereby video content data is transmitted based on the (updated) viewport and to operation S7.4 whereby the (updated) aggregate viewport is transmitted to the plurality of UEs.
  • the processing operations may be performed under control of a software application stored at the UE 145.
  • the UE 145 may comprise components such as those shown in Figure 4, whereby the one or more software applications 114 perform the following operations.
  • a first operation S8.1 comprises transmitting to a remote system, e.g. the 5G-gNB 160, information relating to a viewport, which may be relative to currently viewed video content data.
  • An operation S8.2 comprises receiving video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from other UEs.
  • Optional operations may comprise, in an operation S8.3, determining that the viewport does not relate to the aggregate viewport, and, in an operation S8.4, transmitting an indication that the viewport does not relate to the aggregate viewport.
  • Referring to Figure 9, a more detailed flow diagram showing processing operations performed at each of the Figure 6 content server 120, the 5G-gNB 160 and one of the UEs 145 will now be described in accordance with another embodiment. Note that operations relating to the UE 145 may be performed for each UE in the coverage area of the 5G-gNB 160.
  • a first, optional operation S9.1 comprises using the 5G-gNB 160 to transmit configuration data for configuring the UEs 145 to determine and transmit (or report) their viewport data. This is an optional operation because in some cases, a UE 145 may already be configured to determine and/or report its viewport data.
  • Operation S9.1 may comprise transmitting a request for viewport data. The request may indicate that the UE 145 is required to report its viewport data only if the current viewport data of the UE does not relate to an aggregate viewport.
  • the current viewport data may include a current viewport and/or a predicted future viewport.
  • the UE 145 reports its current viewport data in an operation S9.2 to the 5G-gNB 160.
  • in an operation S9.3, the 5G-gNB 160 determines an aggregate viewport from the plurality of reported viewports.
  • the aggregate viewport will generally be larger than any individual viewport from a single UE 145.
  • the aggregate viewport may be determined for all UEs 145 or a subset of UEs 145.
  • An indication of the aggregate viewport is transmitted to the content server 120 in an operation S9.4.
  • An indication of the aggregate viewport is broadcast or multicast to the UEs 145 in an operation S9.5.
  • the content server 120 fetches VR content data based on the indicated aggregate viewport.
  • the UE 145 receives the indication of the aggregate viewport broadcast or multicast in step S9.5, as do the other UEs.
  • the 5G-gNB 160 broadcasts or multicasts the VR content data received from the content server 120 in operation S9.6.
  • the VR content data may comprise the aggregate viewport determined in operation S9.3.
  • the UE 145 displays (i.e. decodes, renders and outputs) the received VR content data.
  • the process returns to operation S9.2 whereby the viewport is reported to the 5G-gNB 160 for generating a new aggregate viewport.
  • the viewport information for the UEs 145 may comprise feedback indicative of the viewing angle.
  • the viewing angle may comprise one or more of the spherical co-ordinates of the centre point of the viewport, and a rotation angle of the viewport; the extents of the viewport, e.g. the horizontal and vertical extents; and additional information such as whether the displayed video content data is monoscopic or stereoscopic.
  • the VR content data that is broadcast or multicast may be optimised, for example sent at a higher quality than content data that is outside of the aggregate viewport.
  • the sent VR content data within the aggregate viewport may have a higher resolution and/or higher bit-rate than the other VR content data outside of the viewport.
  • VR content data from outside of the aggregate viewport may be sent in some situations.
  • by sending the optimised VR content data based on the aggregate viewport, high-quality VR content can be provided to large numbers of UEs with efficient use of the available spectrum.
  • the indication of the aggregate viewport may comprise a quantised or compressed set of information.
  • the aggregate viewing angle may be relative to ground, 3D co-ordinate information and/or compressed two-dimensional viewing angle grid information.
  • the UE 145 may determine if its current viewport is related to the aggregate viewport, for example located within the aggregate viewport.
  • the indication of the aggregate viewport may also indicate how the current VR content data is to be consumed, e.g. monoscopic or stereoscopic.
  • Figure 10 shows a 3D viewing space 200 and its equivalent, two-dimensional flattened space 210, both of which are divided into multiple regions 215 for quantising viewports.
  • the regions 215 may correspond to content data tiles, mentioned previously.
  • the viewport information that is transmitted to the 5G-gNB 160 may be rounded to the nearest tile boundary to provide a quantised viewport 220 (see the first sketch following this list).
  • the same quantisation method may be used by the 5G-gNB 160 to determine the aggregate viewport using the quantised viewports from each UE 145.
  • edge content which may either be just within or just outside of the perimeter of the aggregate viewport may be transmitted at lower quality (lower resolution or bit-rate) for local caching, thereby mitigating latency constraints and improving the quality of experience when updating the aggregate viewport.
  • the aggregate viewport may be determined to comprise a region around the combination of the viewports reported by the UEs 145. As an example, the region could include at least one tile in one or more directions to accommodate possible rapid changes in viewing directions of the users.
  • the 5G-gNB 160 may broadcast or multicast the edge content in monoscopic format instead of stereoscopic format, in order to provide a better quality of experience when updating the aggregate viewport.
  • the edge content may be cached at the 5G-gNB 160 or in the edge cloud 120 (where provided) for fast fetching and broadcasting or multicasting.
  • the 5G-gNB 160 may configure the UEs 145 using the Dynamic Adaptive Streaming over HTTP (DASH, as specified in ISO/IEC 23009-1) protocol stack.
  • the 5G-gNB 160 may act as a DASH aware network element (DANE) as defined in the Network Assisted DASH specification (SAND, ISO/IEC 23009-5).
  • DANE: DASH aware network element
  • the 5G-gNB 160 informs the UEs 145 of its capability and willingness to receive viewport SAND messages, e.g. with the DANECapabilities SAND message including an identifier of the viewport SAND message.
  • the 5G-gNB 160 may also inform the UEs 145 of desired characteristics of sending such messages, such as the minimum and/or maximum time interval between sending such messages.
  • the viewport SAND message may comprise one or more of the following: viewing orientation (e.g. the spherical co-ordinates of the centre point of the viewport and a rotation angle of the viewport); extents of the viewport; and additional information such as whether the displayed video content data is monoscopic or stereoscopic.
  • the viewport may not relate to the aggregate viewport when the user stops viewing the VR content data; in this respect, the use of a proximity sensor or cameras in the VR display device 20 may detect that the user has removed the VR display device from their head. In this case, the 5G-gNB 160 may update the aggregate viewport, disregarding the reported viewport, or lack of viewport reports, of said UE 145.
  • the viewport may not relate to the aggregate viewport when the current viewport, e.g. the viewing angle, is outside of the aggregate viewport. This may occur when any part of the current viewport of the UE 145 is outside of the perimeter of the aggregate viewport. Responsive thereto, the UE 145 may report the new viewport in the usual way to the 5G-gNB 160, which then updates the aggregate viewport using the new viewport of the UE 145.
  • the viewport may not relate to the aggregate viewport when an angular movement above a predetermined threshold (e.g. 15°) is detected. Responsive thereto, the UE 145 may report the new viewport in the usual way to the 5G-gNB 160, which then updates the aggregate viewport using the new viewport of the UE 145.
  • the methods and systems described enable the broadcasting or multicasting of high quality content, such as VR content data (which typically consumes a large amount of air interface resources) in an efficient manner.
  • This enables a limited amount of the actual VR content, which may represent immersive 360° panoramic content, to be transmitted over-the-air and in optimized format, depending on the UE 145 requirements.
  • Another benefit is that only some of UEs 145 will need to send viewport reports at a time, which reduces uplink traffic.
  • the VR content data may be encoded and streamed as a series of tiles.
  • the 5G-gNB 160 selects and broadcasts (or multicasts) high quality tile set tracks corresponding to the aggregate viewport and, in some embodiments, low quality tile set tracks for the areas not covered by the aggregate viewport.
  • the 5G-gNB 160 may transmit a tile base track which enables the UE 145 to merge the high quality tile set tracks and the low-quality tile set tracks into a single bit stream that can be decoded at the UE 145 with a single video decoder instance.
  • the 5G-gNB 160 may transmit the high- and low-quality streams by superimposing the high-quality stream, using a superior modulation and coding scheme, over a lower-quality transmission layer. For example, this may involve using the most robust modulation and coding scheme optimised for cell-edge users. This may enable the broadcaster to provide resource element mapping of only the lower-quality layer for users receiving such content. If there are users receiving high-quality content, such information may be signalled to those users also. If there are only a small number of users receiving higher-quality content, the lower-quality layer may be broadcast and the higher-quality content may be sent via unicast or multicast to the relevant users.
  • the relevant physical resource blocks may include a current viewport to physical resource block mapping.
  • the signalling from the 5G-gNB 160 may also include a clear mapping between the physical resource blocks in a broadcast channel over the air interface and the different types of broadcast content, in terms of high- or low-quality content. This enables the minimization of receiver complexity and reception of the most relevant content, as each user may receive and decode only the relevant content.
  • the aggregate viewport may also comprise 3D co-ordinate information.
  • empty spaces within the aggregate viewing angle of users may either be transmitted or omitted based on the probability of fast viewport changes or probability of near-time viewing of the VR content data.
  • TTI: Transmission Time Interval
  • the feedback signalling from UEs 145 to the 5G-gNB 160 is done using shorter TTIs in order to minimise the possible delay that could be caused by the signalling.
  • the frequency of short TTI resources may also be optimised, e.g. for every n time instances, in order to enable fast feedback and optimisation of the VR content data.
  • Figure 11 shows a possible time versus frequency graph, indicating the duration and frequency range of the shorter TTIs (sTTIs) for the viewport feedback data in relation to the broadcast VR content data.
  • the usage of shorter TTIs could enable the 5G-gNB 160 to configure faster viewport feedback from the viewing device, for example several times within a time duration of 1 ms.
  • Various other methods of viewport feedback may be employed, including dedicated time and frequency resources over which the UEs 145 may send the viewport feedback.
  • referring to Figure 12, edge content 245 may be defined in relation to the aggregate viewport 240.
  • the edge content 245 may be within a predetermined distance of the perimeter of the aggregate viewport 240, e.g. comprising a single tile adjacent and outside of the perimeter. This edge content 245 may be transmitted at a lower quality (lower resolution or bit-rate) than the content data corresponding to the aggregate viewport 240.
  • the edge content 245 may be cached in a cache at the content server / edge cloud 120 or at a cache of the 5G-gNB 160 for immediate broadcast upon receiving signalled information from one of the UEs 145 that its viewport has changed (or responsive to some other prediction that a viewport has changed).
  • the 5G-gNB 160 and/or cache may adaptively learn potential future changes in viewports or the demand for new content based on self-learning algorithms. This may be useful in indoor viewing arenas / VR movie theatres where there may be a higher probability of users making similar viewport changes.
  • the embodiments described propose the encoding of the VR content data within tiles, e.g. rather than region-of-interest based collective coding. This enables dynamic updating of the aggregate viewport without affecting current viewports of users with no change in viewport. This also enables minimal
  • Figures 13a - 13e show different ways in which an aggregate viewport may be determined.
  • first to third viewports are represented in Figures 13a - 13c by reference numerals 230, 231, 232 respectively. An indication of each viewport may be reported to the 5G-gNB 160 as described above.
  • Figure 13d shows a first example aggregate viewport 235 which is determined such that each tile of the viewports 230, 231 and 232 is included.
  • Figure 13e shows a second example in which a rectangular aggregate viewport 236 is determined such that the least number of non-represented tiles is included, i.e. one tile in this case.
  • a motion-constrained tile set (MCTS) is such a set of one or more tiles within a picture that the inter prediction (a.k.a. temporal prediction) process is constrained in encoding such that no sample value outside the motion-constrained tile set, and no sample value at a fractional sample position that is derived using one or more sample values outside the motion-constrained tile set, is used for inter prediction of any sample within the motion-constrained tile set.
  • An MCTS may be required to be rectangular. Additionally, the encoding of an MCTS is constrained in a manner that coding parameters, such as motion vector candidates, are not derived from blocks outside the MCTS.
  • MCTS motion-constrained tile set
  • viewport information received from a subset of UEs 145 is used to determine the aggregate viewport 235, 236. For example, viewport information of one or more UEs reporting viewports significantly different from majority of the UEs may be discarded when determining the aggregate view port. This enables to avoid unnecessarily wide aggregate viewports. In some embodiments, determining whether to discard a UE in this respect may be based on a distance between the reported viewport of the UE and a statistical location derived from all reported viewports, for example an average of centres of the reported viewports.
  • the aggregate viewport may be determined such that a predetermined proportion (for example 95 %) of UEs 145 will be served by the aggregate viewport, that is, their current viewport data corresponds to the determined aggregate viewport.
  • a unicast connection may be used and/or established to deliver the appropriate VR content to the discarded UEs.

Abstract

Methods and systems are disclosed relating to adaptive transmission of video content based on feedback, for example feedback from one or more user equipment receiving the video content using an over-the-air broadcast. One operation may comprise receiving from a plurality of remote user equipment (UE) information relating to respective viewports relative to video content. Another operation may comprise determining data representing an aggregate viewport based on the viewport information from the plurality of UEs. Another operation may comprise transmitting video content data to the plurality of UEs based on the determined aggregate viewport.

Description

Transmission of Video Content based on Feedback
Field
This disclosure relates to transmission of video content based on feedback. For example, the video content may represent, but is not limited to, panoramic virtual reality content transmitted to a plurality of user equipment in a broadcast or multicast transmission.
Background
Transmission of content, particularly video content, using over-the-air technology can be challenging, particularly if high quality video content with little or no latency is desired by the end user. Virtual Reality (VR) video content data, which may or may not comprise audio, is particularly challenging given the amount of data that is required and the limited availability of spectrum resources.
In terms of transmitting VR video content data to large numbers of users, for example using third generation (3G), fourth generation (4G / LTE-Advanced) and fifth generation (5G) wireless networks, the use of unicast (one to one) links between service provider and users has been considered, but this has implications in terms of deployment complexity and cost, as well as potentially limiting the number of users able to receive the VR video content data with acceptable quality.
Summary
A first aspect provides a method comprising: receiving from a plurality of remote user equipment (UE) information relating to respective viewports; determining data representing an aggregate viewport based on the viewport information from the plurality of UEs; and transmitting video content data to the plurality of UEs based on the determined aggregate viewport. The method may further comprise: transmitting the aggregate viewport data to the UEs; receiving from one or more of the UEs an indication that their respective user viewport does not relate to the aggregate viewport; determining an updated aggregate viewport based on said received indication; transmitting to the plurality of UEs the updated aggregate viewport; and transmitting to the plurality of UEs video content data based on the updated aggregate viewport. The indication that the respective user viewport does not relate to the aggregate viewport may comprise an indication that said UE has terminated receiving the video content data. The indication that the respective user viewport does not relate to the aggregate viewport may comprise an indication that video content data is not being viewed at said UE.
Determining the updated aggregate viewport may comprise disregarding viewport information received from said UE.
The indication that the respective user viewport no longer corresponds to the aggregate viewport may comprise an indication that the viewport has changed by more than a predetermined amount.
The indication that the respective user viewport no longer corresponds to the aggregate viewport may comprise an indication that the viewport is outside of the aggregate viewport. The viewport information may comprise information from which can be derived information relating to a viewing angle associated with the UE or a display device associated with the UE.
The viewing angle information may comprise a viewing orientation. The viewing angle information may comprise the extent of the viewport.
The extent of the viewport may be represented by the horizontal and vertical fields of view of the viewport. The viewport information may further comprise an indication of whether the content is monoscopic or stereoscopic.
The viewing angle information may be quantised and/or compressed, and the method may further comprise reconstructing and/or decompressing the viewing angle information prior to determination. Transmitting the video content data to the plurality of UEs may comprise transmitting video content data corresponding to the aggregate viewport at a higher quality relative to other video content data outside of the aggregate viewport. The transmitted video content may be transmitted at a higher resolution and/or data rate than other video content data outside of the aggregate viewport.
The method may further comprise caching video content outside of, but within a limited predetermined spatial distance of, the aggregate viewport, and transmitting said cached video content at a higher quality responsive to receiving the indication that a UE's respective user viewport does not relate to the aggregate viewport.
The cached video content may be transmitted prior to transmitting to the UEs the updated aggregate viewport.
The video content data may be provided as a plurality of tiles, each tile corresponding to a sub-portion of the overall content data and representing a respective spatial display position. The video content data may be transmitted to the UEs using one or more of a multicast or broadcast transmission. The aggregate viewport data may be transmitted to the UEs using one or more of a multicast or broadcast transmission.
The video content data may represent a panoramic image or video. The video content data may be virtual reality (VR) content data. The user viewport information from each of the plurality of UEs may be indicative of the user viewport of a respective VR headset.
The method may further comprise transmitting configuration data to each of the UEs for configuring them to transmit the viewport information.
The method may be performed at a base station of a mobile network.
A further aspect provides a method, comprising: transmitting to a remote system information relating to a viewport of a user equipment (UE); and receiving further video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from one or more other UEs.
The method may further comprise: receiving the aggregate viewport data; determining that the viewport does not relate to the aggregate viewport; and transmitting an indication that the viewport does not relate to the aggregate viewport to the remote system.
The determination that the viewport does not relate to the aggregate viewport may be made responsive to the UE terminating receiving the video content data, the transmitted indication identifying such to the remote system.
The determination that the viewport does not relate to the aggregate viewport may be made responsive to the video content data not being viewed at the UE, the transmitted indication identifying such to the remote system.
The method may further comprise terminating transmission of the viewport information.
The determination that the viewport does not relate to the aggregate viewport may be made responsive to the viewport of the UE changing by more than a predetermined amount, the transmitted indication comprising updated viewport information.
The determination that the viewport does not relate to the aggregate viewport may be made responsive to the viewport of the UE being outside of the aggregate viewport, the transmitted indication comprising updated viewport information.
The transmitted viewport information may comprise information from which can be derived information relating to a viewing angle associated with the UE or a display device associated with the UE.
The transmitted viewing angle information may comprise a viewing orientation. The transmitted viewing angle information may comprise the extent of the viewport. The extent of the viewport may be represented by the horizontal and vertical fields of view of the viewport. The transmitted viewport information may further comprise an indication of whether the UE prefers monoscopic or stereoscopic content.
The method may further comprise quantising and/or compressing the viewing angle information prior to transmitting.
The received video content data corresponding to the aggregate viewport may be at a higher quality relative to other video content data outside of the aggregate viewport.
The received video content data may be at a higher resolution and/or data rate than other video content data outside of the aggregate viewport.
The method may further comprise caching received video content data outside of, but within a limited predetermined spatial distance of, the aggregate viewport, and displaying said cached video content responsive to determining that the viewport does not relate to the aggregate viewport.
The video content data may be provided as a plurality of tiles, each tile corresponding to a sub-portion of the overall video content data and representing a respective spatial display position.
The video content data may be received using one or more of a multicast or broadcast transmission. The aggregate viewport data may be received using one or more of a multicast or broadcast transmission. The video content data may represent a panoramic image or video. The video content data may be virtual reality (VR) content data. The user viewport information may be indicative of the user viewport of a VR headset.
The method may further comprise receiving from a remote device configuration data for configuring the UE to transmit the viewport information, and transmitting the viewport information responsive to installing or executing the configuration data. The method may be performed at a mobile terminal for connection to a mobile network.
A further aspect may provide a computer program comprising instructions that when executed by a computer control it to perform the method of any preceding method definition.
A further aspect may provide an apparatus configured to perform the method steps of any preceding method definition.
A further aspect may provide a non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: receiving from a plurality of remote user equipment (UE) information relating to respective viewports; determining data representing an aggregate viewport based on the viewport information from the plurality of UEs; and transmitting video content data to the plurality of UEs based on the determined aggregate viewport.
A further aspect may provide a non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising:
transmitting to a remote system information relating to a viewport of a user equipment (UE); and receiving further video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from one or more other UEs.
Brief Description of the Drawings
Example embodiments will now be described by way of non-limiting example with reference to the accompanying drawings, in which:
Figure 1 is a perspective view of a VR display system, useful for understanding the embodiments;
Figure 2 is a block diagram of a computer network including the Figure 1 VR display system, according to embodiments;
Figure 3a is a schematic top-plan view of a virtual space;
Figure 3b is a schematic internal view of part of the Figure 3a virtual space;
Figure 4 is a block diagram of components of a VR content provider forming part of the Figure 2 VR display system;
Figure 5 is a schematic diagram of a 5G network architecture;
Figure 6 is a schematic diagram of a 5G network architecture according to embodiments;
Figure 7 is a flow diagram showing processing operations performed at, for example, a 5G base station according to embodiments;
Figure 8 is a flow diagram showing processing operations performed at a user equipment according to embodiments;
Figure 9 is a flow diagram showing processing operations performed at a 5G base station, content provider and user equipment according to embodiments;
Figure 10 is a perspective view of a 360° space and a flattened two-dimensional (2D) space according to embodiments;
Figure 11 is a frequency-time graph for illustrating timing and bandwidth of feedback information and content data according to embodiments;
Figure 12 is a schematic view showing edge content in relation to an aggregate viewport according to embodiments;
Figures 13a - 13c are schematic views of viewports from respective UEs; and
Figures 13d - 13e are schematic views of alternative aggregate viewports derived from the viewports of Figures 13a - 13c.
Detailed Description of Preferred Embodiments
Embodiments herein relate to the transmission of video data to one or more remote users by means of, for example, over-the-air technology. The video data may represent, for example, panoramic virtual reality (VR) content data representing immersive video content. The VR content data may represent omnidirectional (a.k.a. 360-degree) video content for user exploration using three degrees of freedom (3DoF), wherein the user can choose the viewing orientation, which may be indicated by extrinsic rotation angles around three orthogonal coordinate axes. The VR content data may or may not be accompanied by an audio stream, which audio stream may comprise spatial audio data. The methods and systems are however applicable to any form of video data for transmission from a source to a destination.
References to VR content data are intended also to cover related technologies such as Augmented Reality (AR) and Mixed Reality (MR). The transmission of the VR content data may be by means of video streaming between a content provider and a plurality of users, potentially a large number of users, by means of either or both of a multicast and/or broadcast transmission over-the-air. The transmission may employ an IP network. Again, this is given by way of example. The transmission mechanism may use 3G, 4G LTE or 5G wireless networks, e.g. base stations and protocols, but the methods and systems described herein are applicable to future wireless technologies also.
It will be appreciated that multicast generally refers to the transmission of data between a source and a logical group of identified receivers, for example receivers that are members of a particular multicast group, whereas broadcast refers to transmission of data to all receivers within a defined area, e.g. within a cell corresponding to a base station. In 3G and 4G LTE -Advanced wireless networks, multicast and broadcast networks may use the Multimedia Broadcast / Multicast Service (MBMS) for efficient content distribution, as has been used for television broadcasts, public warning systems and mission-critical communication systems.
Embodiments relate to view-dependent video transmission, where the VR content data that is transmitted from the source to user end systems is dependent on the position or orientation of users who may be wearing a VR user device.
In some embodiments, the transmitted VR content data may represent a part of an overall, wide-angle video scene. For example, the video scene may cover a field which is greater than a viewer's typical field-of-view, e.g. greater than 180°. Therefore, embodiments are particularly suited to applications where a user may consume and/or interact with an overall video scene greater than 180° and possibly up to 360°.
One use case is virtual reality (VR) content data whereby video content is streamed to a VR display system. As is known, the VR display system may be provided with a live or stored feed from a video content source, the feed representing a VR space for immersive output through the display system. In some embodiments, audio is provided, which may be spatial audio.
Nokia's OZO (RTM) VR camera is an example of a VR capture device which comprises a camera and microphone array to provide VR content data and a spatial audio signal, but it will be appreciated that the embodiments are not limited to VR applications nor the use of microphone arrays at the video capture point.
Figure 1 is a schematic illustration of a VR display system 1. The VR system 1 includes a VR headset 20, for displaying visual data in a virtual reality space, and a VR media player 10 for rendering visual content data on the VR headset 20.
In the context of this specification, a virtual space is any computer-generated version of a space, for example a captured real world space, in which a user can be immersed. The VR headset 20 may be of any suitable type. The VR headset 20 may be configured to provide VR video and audio content data to a user. As such, the user may be immersed in virtual space.
The VR headset 20 receives visual content from a VR media player 10. The VR media player 10 may be part of a separate device which is connected to the VR headset 20 by a wired or wireless connection. For example, the VR media player 10 may include a games console, or a PC configured to communicate visual data to the VR headset 20.
Alternatively, the VR media player 10 may form part of the display for the VR headset 20.
Here, the media player 10 may comprise a mobile phone, smartphone or tablet computer configured to play content through its display. For example, the device may be a touchscreen device having a large display over a major surface of the device, through which video content can be displayed. The device may be inserted into a holder of a VR headset 20. With these headsets, a smart phone or tablet computer may display visual data which is provided to a user's eyes via respective lenses in the VR headset 20. The VR display system 1 may also include hardware configured to convert the device to operate as part of VR display system 1. Alternatively, VR media player 10 may be integrated into the VR display device 20. VR media player 10 may be implemented in software. In some embodiments, a device comprising VR media player software is referred to as the VR media player 10.
The VR display system 1 may include means for determining the spatial position and/or orientation of the user's head. Over successive time frames, a measure of movement may therefore be calculated and stored. Such means may comprise part of the VR media player 10. Alternatively, the means may comprise part of the VR display device 20. For example, the VR display device 20 may incorporate motion tracking sensors which may include one or more of gyroscopes, accelerometers and structured light systems.
These sensors generate position data from which a current visual field-of-view (FOV) is determined and updated as the user changes position and/or orientation. The current visual FOV may also be termed a "Viewport" as will be used hereafter. The VR display device 20 will typically comprise two digital screens for displaying stereoscopic video images of the virtual world in front of respective eyes of the user, and also two speakers for delivering audio, if provided from the VR system. In some VR display devices, a single screen may be divided into two portions for displaying a stereo image pair. The embodiments herein, which primarily relate to the delivery of VR content data, are not limited to a particular type of VR display device 20. The VR content data may, for example, be monoscopic instead of stereoscopic.
The VR display system 1 may be configured to display visual data to the user based on the spatial position of the display device 20 and/or the orientation of the user's head. A detected change in spatial position and/or orientation, i.e. a form of movement, may result in a corresponding change in the visual data to reflect a position or orientation transformation of the user with reference to the virtual space into which the visual data is projected. This allows VR content data to be consumed with the user experiencing an omnidirectional 3D VR environment.
The VR display device 20 may display non-VR content data captured with two- dimensional video or image devices, such as a smartphone or a camcorder, for example. Such non-VR content data may include a framed video or a still image. The non-VR source content data may be 2D, stereoscopic or 3D. The non-VR source content data includes visual source content, and may optionally include audio source content. Such audio source content may be spatial audio source content. Spatial audio may refer to directional rendering of audio in the virtual space such that a detected change in the orientation of the user's head may result in a corresponding change in the spatial audio rendering to reflect an orientation transformation of the user with reference to the virtual space in which the spatial audio data is rendered. The display of the VR display device 20 is described in more detail below. In some embodiments, the angular extent of the virtual environment observable through the VR display device 20 is called the viewport of the display device. More generally, the viewport is any representation of a sub-section of the overall VR space that the user is seeing via the VR display device 20. The actual viewport observed by a user may depend on the inter-pupillary distance and on the distance between the lenses of the headset and the user's eyes, but the extent of the viewport can be considered to be approximately the same for all users of a given display device when the display device is being worn by the user.
The viewport of the VR display device 20 may be represented by viewport data determined locally, i.e. at the VR display device 20 and/or at the VR media player 10 using known techniques. For example, the viewport data of the VR display device 20 may be representative of a viewing angle, comprising one or more of: viewing orientation (e.g. the spherical coordinates of the centre point of the viewport and a rotation angle of the viewport), the extents of the viewport (e.g. the horizontal and vertical field of view of the viewport). The viewport data may be accompanied by data indicative of the type of content, e.g. monoscopic or stereoscopic VR data.
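By way of illustration only, the viewport data fields described above might be represented as a small record such as the following Python sketch; the field names and units are assumptions made for the example rather than anything defined by the embodiments.

```python
from dataclasses import dataclass

@dataclass
class Viewport:
    """Illustrative viewport report (all field names and units are assumptions)."""
    azimuth_deg: float      # spherical longitude of the viewport centre point
    elevation_deg: float    # spherical latitude of the viewport centre point
    tilt_deg: float         # rotation angle of the viewport about the view axis
    h_fov_deg: float        # horizontal extent (field of view) of the viewport
    v_fov_deg: float        # vertical extent (field of view) of the viewport
    stereoscopic: bool      # True if stereoscopic content is being rendered

# Example: a viewport centred slightly above the horizon, 90 x 90 degrees wide
example = Viewport(azimuth_deg=30.0, elevation_deg=10.0, tilt_deg=0.0,
                   h_fov_deg=90.0, v_fov_deg=90.0, stereoscopic=True)
```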
Figure 2 shows a generalised VR system 1, comprising the above-described media player 10 and VR display device 20. A remote content provider 30 may store and transmit streaming video data which, in the context of embodiments, is VR content data for display to the VR display device 20. The content provider 30 streams the VR content data over a data network 40, which may be any network, for example an IP network such as an over-the-air network, such as a 3G, 4G or 5G mobile IP network, a multicast network, or a broadcast network. If data network 40 is unidirectional, a return channel for providing feedback from VR display device 20 to the content provider 30 may be provided by another data network.
In embodiments to be described below, the use of a 5G network is assumed, but embodiments are not limited to such. The remote content provider 30 may or may not be the location or system where the VR content data is captured and processed. The VR content data may cover, horizontally, the full 360° coverage or extent around the capturing position of the VR capture device. The vertical coverage may vary and can be, for example, 180°. Such images may be represented by a sphere that is mapped onto a two-dimensional image plane using equi-rectangular projection (ERP). The horizontal co-ordinates may be considered equivalent to longitude and the vertical coordinates equivalent to latitude with no transformation or scaling applied. The process of forming a monoscopic equi-rectangular panoramic picture can be performed by (i) stitching individual images captured by the different cameras of a 360° capture device onto a spherical image; (ii) projecting the spherical image onto the wall of a cylinder; (iii) unfolding the cylinder to a two-dimensional image. Some stages may be merged.
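For illustration, the equi-rectangular mapping described above can be written out directly: the horizontal pixel position is proportional to longitude and the vertical position to latitude. The sketch below assumes a full 360° x 180° panorama with a top-left pixel origin; these conventions are assumptions made for the example, not requirements of the embodiments.

```python
def erp_pixel(longitude_deg, latitude_deg, width, height):
    """Map a spherical direction to equi-rectangular (ERP) pixel coordinates.

    Assumes longitude in [-180, 180) increasing to the right and
    latitude in [-90, 90] increasing upwards (top-left pixel origin).
    """
    x = (longitude_deg + 180.0) / 360.0 * width
    y = (90.0 - latitude_deg) / 180.0 * height
    return x, y

# Example: the centre of the panorama maps to the centre of the image
print(erp_pixel(0.0, 0.0, 3840, 1920))   # -> (1920.0, 960.0)
```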
In some cases, a panoramic image may have less than 360° horizontal coverage and up to 180° vertical coverage, but otherwise may have the characteristics of equi-rectangular projection format. In general, a multitude of projection formats exist for 360 degree video, including, but not limited to, cubemap projection, octahedron projection, and equal area projection.
360° video content data can be monoscopic or stereoscopic. For monoscopic video content data, a monoscopic display, such as a conventional flat panel display, is used, or the same video is displayed for both eyes on a stereoscopic display, such as the VR display device 20. For stereoscopic video content data, the displayed content comprises two views, corresponding to the left eye and the right eye respectively. The VR content data for the overall video scene, for example a 360 degree video scene, may be arranged as a series of two-dimensional areas or tiles, each representing a respective spatial part of the scene. Therefore, the VR media player 10 may only receive from the content provider 30 those tiles which are within the current viewport, at least at a high quality. This viewport-dependent transmission of VR content data, or viewport-adaptive streaming (VAS), is useful for reducing the streaming bit-rate with little or no impact on subjective quality.
Each tile may be represented by a video segment, which is a temporal portion of the video content, typically in the order of seconds, e.g. two seconds or similar. Therefore, for a given tile, multiple segments may be downloaded, buffered and then decoded and rendered in a sequential order. Therefore, in some embodiments, there may be provided a plurality of groups of video segments, each group corresponding to a respective sub-area of the larger video content, and each segment within a group representing different temporal portions of the video content. The video content may therefore be divided in both the spatial and temporal domains.
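By way of illustration only, such content can be addressed by (tile, segment) pairs, the tile identifying the spatial sub-area and the segment index the temporal portion. The two-second segment duration matches the example above; the identifier format below is an assumption.

```python
SEGMENT_DURATION_S = 2.0   # assumed temporal length of one segment

def segment_index(playback_time_s):
    """Return the index of the temporal segment covering the given playback time."""
    return int(playback_time_s // SEGMENT_DURATION_S)

def segment_id(tile_row, tile_col, playback_time_s):
    """Build an illustrative identifier for one tile's segment at a given time."""
    return f"tile_{tile_row}_{tile_col}/seg_{segment_index(playback_time_s):05d}"

# Example: the tile at grid position (1, 3), 7.5 seconds into the content
print(segment_id(1, 3, 7.5))   # -> tile_1_3/seg_00003
```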
Figure 3a is a schematic top plan view representing a 360° view field 50 in relation to a user 55 wearing the VR display device 20. Based on the user's position and the orientation of the VR display device 20, only a current viewport 60 may be streamed to the media player 10, i.e. the portions or segments of the video scene between the bounding lines 57.
Figure 3b shows, for example, a plurality of segments 80a - 80h when rendered to the VR display device 20 from the user's perspective. These may be termed "first segments" in that they represent the current viewport 60. Each segment 80a - 80h is effectively a tile representing a respective two-dimensional region of the viewport 60. Each segment 80a - 80h may represent a still image, or video data lasting several seconds in length.
For the avoidance of doubt, the term segment used herein refers to video data representing a sub-portion of an overall image for a time interval.
In order to indicate the current viewport 60, the VR display device 20, or its associated VR media player 10, needs to feed back to the content provider 30 positional data indicative of the current viewport.
The positional data may represent the current viewport in a number of ways. As mentioned above, the viewport of the display device 20 may be represented by viewport data determined locally, i.e. at the VR display device 20 and/or at the VR media player 10 using known techniques. For example, the viewport data of the display device 20 may be representative of a viewing angle, comprising one or more of: viewing orientation (e.g. the spherical coordinates of the centre point of the viewport and a rotation angle of the viewport), the extents of the viewport (e.g. the horizontal and vertical field of view of the viewport). The viewport data may be accompanied by data indicative of the type of content, e.g. monoscopic or stereoscopic VR content data. In some embodiments, the viewport data may be compressed or quantised; in other words, the display device 20 and/or the media player need only generate information sufficient for the content provider 30 (or a network node, e.g. a 5G base station or gigabit Node B (gNB)) to determine the viewport at that end.
In situations where the source is transmitting the VR content data to multiple users via unicast links, this viewport dependent transmission would require transmitting updated VR content data to each respective VR display device 20 based on the individual viewports fed back from said VR display devices 20. This involves significant processing and spectrum resources.
Embodiments herein provide methods and systems for transmitting VR content data using multicast or broadcast protocols, which may involve receiving viewport information from a plurality of user-end devices, one or more of which may be the VR display device 20, determining an aggregate viewport based on the viewport information feedback, and transmitting subsequent VR content data based on the determined aggregate viewport. In this way, high quality immersive VR content data may be transmitted to user-end devices using multicast or broadcast protocols for mass delivery, whilst making efficient use of over-the-air radio resources. Embodiments are therefore suited to over-the-air links where spectral resources are limited, such as mobile networks, for example. Latency and caching issues may also be improved, as will become evident. Other features may include the user-end devices 10, 20 determining if their respective viewports do not correspond to the determined aggregate viewport, and if not, informing said fact in subsequent feedback information to the content provider 30 in order that the aggregate viewport can be updated. Detailed embodiments are explained below.
Figure 4 is a schematic diagram of components of the content provider 30, or a computer system associated with the content provider. The content provider 30 may have a controller 100, RAM 102, a memory 104, and, optionally, hardware keys 106 and a display 108. The content provider 30 may comprise at least one network interface 110 for connection to the network 40 or other data networks, e.g. a modem which may be wired or wireless. The network interface 110 may therefore be used to receive download requests from the VR display system 1 and to stream data to the VR display system 1. A segment database 116 is also provided, for storing video data for streaming transmission to external devices, such as the VR display system 1 which may be via an over-the-air network.
The controller 100 is connected to each of the other components in order to control operation thereof.
The memory 104 may be a non-volatile memory such as read only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD). The memory 104 stores, amongst other things, an operating system 112 and may store software applications 114. The RAM 102 is used by the controller 100 for the temporary storage of data. The operating system 112 may contain code which, when executed by the controller 100 in conjunction with the RAM 102, controls operation of each of the hardware components of the content provider 30.
The controller 100 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors comprising processor circuitry.
The content provider 30 may be a standalone computer, a server, a console, or a network thereof. The content provider 30 may communicate with the VR display system 1 in accordance with one or more software applications 114, following steps to be described later on.
In some embodiments, the content provider 30 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The content provider 30 may be in communication with the remote server device in order to utilize the software application stored there.
Figure 5 shows an example 5G architecture. The architecture comprises a media content server 120, akin to the content provider 30 in Figure 2, which stores VR content data for delivery to multiple users. The media content server 120 may communicate with a 5G converged core 130 which represents one or more functional elements of a 5G core network. The 5G converged core 130 may communicate with one or more spatial nodes, including a terrestrial broadcast 5G-gNB 140, a 5G-gNB 160 and a fixed network 150. The terrestrial broadcast 5G-gNB 140 may be used to transmit content data and other data to large numbers of UEs 145 by a broadcast delivery. The 5G-gNB 160 may be used to transmit content data and other data to UEs 145 by multicast delivery, and may also receive feedback data. The fixed network 150 may be used to transmit content data and other data to UEs 145 also, and may receive feedback data. The UEs 145 may comprise, for example, the VR media player 10 which may or may not be combined with the VR display device 20. The reference to "XCast content delivery" indicates that the delivery mechanism may be one or a combination of unicast, multicast or broadcast.
Figure 6 shows an example VR content data broadcast scenario that may be employed using at least part of the Figure 5 architecture. The scenario relates to providing VR content data to large numbers of UEs 145, for example in an indoor area, although it is applicable to outdoor and a mixture of indoor and outdoor scenarios. Each UE 145 is assumed to be able to transmit and receive data via a 5G network and is configured to decode and display the VR content data. The media content server 120 may be in communication with the 5G-gNB 160, e.g. via the 5G converged core 130. The 5G-gNB 160 may be suitable for multicast or broadcast transmission and may also receive feedback data from UEs 145 within its respective coverage area 170 via a low-latency data transport link. The media content server 120 may in some embodiments be an edge cloud. The media content server 120 may receive the VR content data from a VR content creation module 180. The VR content creation module 180 may be remote from the media content server 120. The VR content creation module 180 may be associated with a multi-directional camera device 185, such as Nokia's OZO camera.
Referring to Figure 7, processing operations performed by the 5G-gNB 160 in accordance with an embodiment will now be described, although similar nodes can perform said operations in a related context. The processing operations may be performed under control of a software application stored at the 5G-gNB 160. The 5G-gNB 160 may comprise components such as those shown in Figure 4, whereby the one or more software applications 114 perform the following operations.
A first operation S7.1 comprises receiving from a plurality of UEs 145 information relating to respective viewports, which may be relative to current video content.
Another operation S7.2 comprises determining data representing an aggregate viewport based on viewport information from the plurality of UEs 145. Another operation S7.3 comprises transmitting video content data to the plurality of UEs 145 based on the determined aggregate viewport. The transmitting operation S7.3 may be by means of multicast or broadcast using the 5G-gNB 160. It will be appreciated therefore that the video content data transmitted relates or corresponds to the aggregate viewport and is therefore suited to a large number of UEs 145 rather than using individual transmissions suited to individual UEs. Fewer spectrum resources are needed when multicast/broadcast is used to deliver the content, compared with individual unicast transmissions.
Optional operations may comprise, in an operation S7.4, transmitting the aggregate viewport determined in S7.2 to the plurality of UEs 145. A further operation S7.5 may comprise receiving from one or more UEs 145 an indication that their respective viewport(s) do not relate or correspond to the aggregate viewport. A further operation S7.6 may comprise determining an updated aggregate viewport based on said indication(s) in S7.5. Operation S7.6 may return to operation S7.3 whereby video content data is transmitted based on the updated aggregate viewport and to operation S7.4 whereby the updated aggregate viewport is transmitted to the plurality of UEs.
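By way of illustration only, operations S7.1 - S7.6 may be summarised as a server-side control loop, as in the Python sketch below. Only the control flow is shown; the callables collect_viewport_reports, compute_aggregate, fetch_content and broadcast are hypothetical placeholders for network and media functions that are not defined by the embodiments.

```python
def gnb_feedback_loop(collect_viewport_reports, compute_aggregate,
                      fetch_content, broadcast):
    """Schematic control flow for operations S7.1 - S7.6 (callables are placeholders)."""
    aggregate = None
    while True:
        reports = collect_viewport_reports()                            # S7.1 / S7.5
        if reports or aggregate is None:
            aggregate = compute_aggregate(reports, previous=aggregate)  # S7.2 / S7.6
            broadcast(("aggregate_viewport", aggregate))                # S7.4
        content = fetch_content(aggregate)       # tiles covering the aggregate viewport
        broadcast(("video_content", content))                           # S7.3
```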
Referring to Figure 8, processing operations performed by a UE 145 in accordance with an embodiment will now be described. The processing operations may be performed under control of a software application stored at the UE 145. The UE 145 may comprise components such as those shown in Figure 4, whereby the one or more software applications 114 perform the following operations.
A first operation S8.1 comprises transmitting to a remote system, e.g. the 5G-gNB 160, information relating to a viewport, which may be relative to currently viewed video content data. An operation S8.2 comprises receiving video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from other UEs.
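By way of illustration only, the UE-side operations S8.1 - S8.4 (the latter two being the optional operations described in the following paragraph) might be combined as in the sketch below; send and viewport_is_within are hypothetical helpers, not functions defined by the embodiments.

```python
def ue_feedback_step(local_viewport, aggregate_viewport, send, viewport_is_within):
    """Schematic of operations S8.1 - S8.4 at one UE (helpers are placeholders)."""
    if aggregate_viewport is None:
        send(("viewport_report", local_viewport))                         # S8.1
    elif not viewport_is_within(local_viewport, aggregate_viewport):      # S8.3
        send(("viewport_mismatch", local_viewport))                       # S8.4
    # otherwise no report is needed, which keeps uplink traffic low
```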
Optional operations may comprise, in an operation S8.3, determining that the viewport does not relate to the aggregate viewport, and, in an operation S8.4, transmitting an indication that the viewport does not relate to the aggregate viewport.
Referring to Figure 9, a more detailed flow diagram showing processing operations in each of the Figure 6 content server 120, the 5G-gNB 160 and one of the UEs 145 will now be described in accordance with another embodiment. Note that operations relating to the UE 145 may be performed for each UE in the coverage area of the 5G-gNB 160.
A first, optional operation S9.1 comprises using the 5G-gNB 160 to transmit configuration data for configuring the UEs 145 to determine and transmit (or report) their viewport data. This is an optional operation because in some cases a UE 145 may already be configured to determine and/or report its viewport data. Operation S9.1 may comprise transmitting a request for viewport data. The request may indicate that the UE 145 is required to report its viewport data only if the current viewport data of the UE does not relate to an aggregate viewport. The current viewport data may include a current viewport and/or a predicted future viewport.
The UE 145 reports its current viewport data in an operation S9.2 to the 5G-gNB 160.
In an operation S9.3, responsive to receiving the viewport data, and viewport data from the other UEs 145, the 5G-gNB 160 determines an aggregate viewport from the plurality of viewports. The aggregate viewport will generally be larger than any individual viewport from a single UE 145. The aggregate viewport may be determined for all UEs 145 or a subset of UEs 145. An indication of the aggregate viewport is transmitted to the content server 120 in an operation S9.4. An indication of the aggregate viewport is broadcast or multicast to the UEs 145 in an operation S9.5. In an operation S9.6 the content server 120 fetches VR content data based on the indicated aggregate viewport. In an operation S9.7, the UE 145 receives the indication of the aggregate viewport broadcast or multicast in step S9.5, as do the other UEs. In an operation S9.8, the 5G-gNB 160 broadcasts or multicasts the VR content data received from the content server 120 in operation S9.6. The VR content data may comprise the aggregate viewport determined in operation S9.3. In an operation S9.9, the UE 145 displays (i.e. decodes, renders and outputs) the received VR content data. In an operation S9.10, if it is determined that the viewport of the UE 145 does not relate to the aggregate viewport, then the process returns to operation S9.2 whereby the viewport is reported to the 5G-gNB 160 for generating a new aggregate viewport.
It will be appreciated from Figures 7 - 9 that some operations may be re-ordered and/or performed in parallel. The number order of individual operations is not necessarily indicative of processing order and some operations may be optional.
In the above embodiments, the viewport information for the UEs 145 may comprise feedback indicative of the viewing angle. The viewing angle may comprise one or more of the spherical co-ordinates of the centre point of the viewport, and a rotation angle of the viewport; the extents of the viewport, e.g. the horizontal and vertical extents; and additional information such as whether the displayed video content data is monoscopic or stereoscopic.
In some embodiments, the VR content data that is broadcast or multicast, e.g. in operations S7.3 or S9.8, may be optimised, for example sent at a higher quality than content data that is outside of the aggregate viewport. For example, the sent VR content data within the aggregate viewport may have a higher resolution and/or higher bit-rate than the other VR content data outside of the viewport. It will therefore be appreciated that VR content data from outside of the aggregate viewport may be sent in some situations. However, by sending the optimised VR content data based on the aggregate viewport, high-quality VR content can be provided to large numbers of UEs with efficient use of the available spectrum.
The indication of the aggregate viewport may comprise a quantised or compressed set of information. For example, the aggregate viewing angle may be relative to ground, 3D co-ordinate information and/or compressed two-dimensional viewing angle grid information. Based on this information, the UE 145 may determine if its current viewport is related to the aggregate viewport, for example located within the aggregate viewport. The indication of the aggregate viewport may also indicate how the current VR content data is to be consumed, e.g. monoscopic or stereoscopic. For example, Figure 10 shows a 3D viewing space 200 and its equivalent, two-dimensional flattened space 210, both of which are divided into multiple regions 215 for quantising viewports. The regions 215 may correspond to content data tiles, mentioned previously. Hence, upon determining the horizontal and vertical extents of the current viewport, the viewport information that is transmitted to the 5G-gNB 160 may be rounded to the nearest tile boundary to provide a quantised viewport 220. The same quantisation method may be used by the 5G-gNB 160 to determine the aggregate viewport using the quantised viewports from each UE 145.
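One possible realisation of this quantisation, given purely as a sketch, is to snap the angular extents of the viewport outwards to tile boundaries so that the report identifies whole tiles rather than exact angles. The 12 x 6 tile grid over 360° x 180° is an assumed example, not a requirement of the embodiments.

```python
import math

TILES_H, TILES_V = 12, 6            # assumed tile grid over 360 x 180 degrees
TILE_W = 360.0 / TILES_H            # 30 degrees per tile horizontally
TILE_H = 180.0 / TILES_V            # 30 degrees per tile vertically

def quantise_viewport(centre_az, centre_el, h_fov, v_fov):
    """Snap a viewport to tile indices, rounding its extents outwards to tile edges."""
    left   = math.floor(((centre_az - h_fov / 2) % 360.0) / TILE_W)
    right  = math.floor(((centre_az + h_fov / 2) % 360.0) / TILE_W)
    top    = max(0, math.floor((90.0 - (centre_el + v_fov / 2)) / TILE_H))
    bottom = min(TILES_V - 1, math.floor((90.0 - (centre_el - v_fov / 2)) / TILE_H))
    cols = list(range(left, right + 1)) if left <= right else \
           list(range(left, TILES_H)) + list(range(0, right + 1))   # azimuth wrap-around
    return [(row, col) for row in range(top, bottom + 1) for col in cols]

# Example: a 90 x 90 degree viewport centred at azimuth 10, elevation 0
print(quantise_viewport(10.0, 0.0, 90.0, 90.0))
```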
In some embodiments, edge content which may either be just within or just outside of the perimeter of the aggregate viewport may be transmitted at lower quality (lower resolution or bit-rate) for local caching, thereby mitigating latency constraints and improving the quality of experience when updating the aggregate viewport. In some embodiments the aggregate viewport may be determined to comprise a region around the combination of the viewports reported by UEs 145. As an example, the region could include at least one tile in one or more directions to accommodate possible rapid changes in viewing directions of the users. Alternatively, or additionally, the 5G-gNB 160 may broadcast or multicast the edge content in monoscopic format instead of stereoscopic format, in order to provide a better quality of experience when updating the aggregate viewport. The edge content may be cached at the 5G-gNB 160 or in the edge cloud 120 (where provided) for fast fetching and broadcasting or multicasting.
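A minimal sketch of how such an edge content region might be derived is to grow the set of aggregate-viewport tiles by one tile in every direction and keep only the newly added tiles; the horizontal wrap-around and vertical clamping conventions below are assumptions made for the example.

```python
TILES_H, TILES_V = 12, 6   # assumed tile grid (same convention as the earlier sketch)

def edge_tiles(aggregate_tiles, margin=1):
    """Return tiles just outside the aggregate viewport, for low-quality caching."""
    aggregate = set(aggregate_tiles)
    grown = set()
    for row, col in aggregate:
        for dr in range(-margin, margin + 1):
            for dc in range(-margin, margin + 1):
                r = min(max(row + dr, 0), TILES_V - 1)   # clamp at the poles
                c = (col + dc) % TILES_H                 # wrap around in azimuth
                grown.add((r, c))
    return grown - aggregate

# Example: the one-tile ring around a 2 x 2 tile aggregate viewport
print(sorted(edge_tiles({(2, 5), (2, 6), (3, 5), (3, 6)})))
```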
Referring to operation S9.1 above, the 5G-gNB 160 may configure the UEs 145 using the Dynamic Adaptive Streaming over HTTP (DASH, as specified in ISO/IEC 23009-1) protocol stack. The 5G-gNB 160 may act as a DASH aware network element (DANE) as defined in the Network Assisted DASH specification (SAND, ISO/IEC 23009-5). The 5G-gNB 160 informs the UEs 145 of its capability and willingness to receive viewport SAND messages, e.g. with the DANECapabilities SAND message including an identifier of the viewport SAND message. The 5G-gNB 160 may also inform the UEs 145 of desired characteristics of sending such messages, such as the minimum and/or maximum time interval between sending such messages.
As a result, the UEs 145 generate their respective viewport SAND messages and transmit them to the 5G-gNB 160. The viewport SAND message may comprise one or more of the following: viewing orientation (e.g. the spherical co-ordinates of the centre point of the viewport and a rotation angle of the viewport); extents of the viewport; and additional information such as whether the displayed video content data is monoscopic or stereoscopic.
The aggregate viewport change notification, represented in steps S8.3 and S9.10, is generated by the UE 145 when the viewport does not relate to the aggregate viewport. This may occur in a number of situations. For example, the viewport may not relate to the aggregate viewport when the user terminates the viewing session of the VR content data. For example, when the user logs off of the viewing session, or otherwise terminates the session, the configured UE 145 may respond by informing the 5G-gNB 160 of said fact (e.g. viewport = off), which causes the 5G-gNB 160 to update the aggregate viewport, disregarding the reported viewport or lack of viewport reports of said UE. Additionally, or alternatively, the viewport may not relate to the aggregate viewport when the user stops viewing the VR content data; in this respect, the use of a proximity sensor or cameras in a VR viewing device 20 may detect that the user has removed the VR viewing device from their head. Similarly, this may cause the 5G-gNB 160 to update the aggregate viewport, disregarding the reported viewport or lack of viewport reports of said UE 145.
For example, the viewport may not relate to the aggregate viewport when the current viewport, e.g. the viewing angle, is outside of the aggregate viewport. This may occur when any part of the current viewport of the UE 145 is outside of the perimeter of the aggregate viewport. Responsive thereto, the UE 145 may report the new viewport in the usual way to the 5G-gNB 160, which then updates the aggregate viewport using the new viewport of UE 145.
For example, the viewport may not relate to the aggregate viewport when an angular movement above a predetermined threshold (e.g. 15°) is detected. Responsive thereto, the UE 145 may report the new viewport in the usual way to the 5G-gNB 160, which then updates the aggregate viewport using the new viewport of UE 145.
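The triggering conditions above might be combined into a single check at the UE, as in the following sketch. The 15° threshold follows the example given above; the great-circle distance between viewport centres and the parameter names are illustrative assumptions rather than anything mandated by the embodiments.

```python
import math

ANGULAR_THRESHOLD_DEG = 15.0   # example threshold from the text

def angular_distance_deg(az1, el1, az2, el2):
    """Great-circle angle between two viewing directions, in degrees."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_d = (math.sin(e1) * math.sin(e2) +
             math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def needs_report(current_tiles, aggregate_tiles, centre_now, centre_at_last_report,
                 session_active=True, headset_worn=True):
    """Decide whether the UE should signal that its viewport no longer relates
    to the aggregate viewport (illustrative combination of the conditions)."""
    if not session_active or not headset_worn:
        return True                                    # viewing has stopped
    if not set(current_tiles) <= set(aggregate_tiles):
        return True                                    # partly outside the aggregate
    moved = angular_distance_deg(*centre_now, *centre_at_last_report)
    return moved > ANGULAR_THRESHOLD_DEG               # large angular movement
```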
It will be appreciated from the above that the methods and systems described enable the broadcasting or multicasting of high quality content, such as VR content data (which typically consumes a large amount of air interface resources) in an efficient manner. This enables a limited amount of the actual VR content, which may represent immersive 360° panoramic content, to be transmitted over-the-air and in optimized format, depending on the UE 145 requirements. Another benefit is that only some of the UEs 145 will need to send viewport reports at a time, which reduces uplink traffic.
As mentioned above, the VR content data may be encoded and streamed as a series of tiles. In the above embodiment, the 5G-gNB 160 selects and broadcasts (or multicasts) high quality tile set tracks corresponding to the aggregate viewport and, in some embodiments, low quality tile set tracks for the areas not covered by the aggregate viewport.
In some embodiments, the 5G-gNB 160 may transmit a tile base track which enables the UE 145 to merge the high quality tile set tracks and the low-quality tile set tracks into a single bit stream that can be decoded at the UE 145 with a single video decoder instance.
In some embodiments, the 5G-gNB 160 may transmit the high and low-quality streams by superimposing the high-quality stream using a superior modulation and coding scheme over a lower quality transmission layer. For example, this may involve using the most robust modulation and coding scheme optimised for cell-edge users. This may enable the broadcaster to provide resource element mapping of only the lower-quality layer for users receiving such content. If there are users receiving high-quality content, such information may be signalled to said users receiving such content also. If there are only a small number of users receiving higher-quality content, the lower-quality layer may be broadcast and higher-quality content may be sent via unicast or multicast to the relevant users. The relevant physical resource blocks may include a current viewport to physical resource block mapping. Optionally, the signalling from the 5G-gNB 160 may also include a clear mapping, between the physical resource block in a broadcast channel over-the-air interface, with different types of broadcast content in terms of high or low-quality content. This enables the minimization of receiver complexity and reception of the most relevant content, as each user may receive and decode only the relevant content.
Embodiments above have been described in relation to the two-dimensional domain, but it will be appreciated that they can be extended to the 3D domain. For example, the aggregate viewport may also comprise 3D co-ordinate information. In the 3D domain, empty spaces within the aggregate viewing angle of users may either be transmitted or omitted based on the probability of fast viewport changes or probability of near-time viewing of the VR content data. As will be appreciated, a Transmission Time Interval (TTI) is a parameter in digital telecommunication networks relating to the encapsulation of data from higher layers into frames for transmission on the radio link layer. TTI refers to the duration of a transmission on the radio link. In some embodiments, it may be assumed that (depending on the latency constraints of the system) the feedback signalling from UEs 145 to the 5G-gNB 160 is done using shorter TTIs in order to minimise the possible delay that could be caused by the signalling. The frequency of short TTI resources may also be optimised, e.g. for every n time instances, in order to enable fast feedback and optimisation of the VR content data.
Figure 11 shows a possible time versus frequency graph, indicating the duration and frequency range of the shorter TTIs (sTTIs) for the viewport feedback data in relation to the broadcast VR content data. The usage of shorter TTIs could enable the 5G-gNB to configure faster viewport feedback from the viewing device, e.g. several times within a time duration of 1 ms. Various other methods of viewport feedback may be employed, including dedicated time and frequency resources over which the UEs 145 may send the viewport feedback.
Referring to Figure 12, in other embodiments, edge content 245 may be defined in relation to the aggregate viewport 240. The edge content 245 may be within a predetermined distance of the perimeter of the aggregate viewport 240, e.g. comprising a single tile adjacent and outside of the perimeter. This edge content 245 may be transmitted at a lower quality (lower resolution or bit-rate) than the content data corresponding to the aggregate viewport 240. The edge content 245 may be cached in a cache at the content server / edge cloud 120 or at a cache of the 5G-gNB 160 for immediate broadcast upon receiving signalled information from one of the UEs 145 that its viewport has changed (or responsive to some other prediction that a viewport has changed).
In this way, the amount of signalling needed between the 5G-gNB 160 and the content server / edge-cloud 120 is minimised and a more seamless viewing experience is delivered to end users. The 5G-gNB 160 and/or cache may adaptively learn potential future changes in viewports or the demand for new content based on self-learning algorithms. This may be useful in indoor viewing arenas / VR movie theatres where there may be a higher probability of users making similar viewport changes. From an encoding perspective, the embodiments described propose the encoding of the VR content data within tiles, e.g. rather than region-of-interest based collective coding. This enables dynamic updating of the aggregate viewport without affecting current viewports of users with no change in viewport. This also enables minimal
modifications of the broadcast VR content data if some users have minor changes in viewport. Collective encoding would require a change in the broadcast system. In Figure 7, our method implies that a change in aggregate viewport based on the change of viewport for some users leads to adding and removing only some tiles within the shown grid rather than changing the whole set of transmissions broadcast over-the-air.
For completeness, Figures 13a - 13e show different ways in which an aggregate viewport may be determined. Here, first to third viewports are represented in Figures 13a - 13c by reference numerals 230, 231, 232 respectively. Indication of each viewport 230, 231, 232 is received from a respective UE 145. Figure 13d shows a first example aggregate viewport 235, determined such that each tile of the viewports 230, 231, 232 is represented. Figure 13e shows a second example aggregate viewport 236, determined as a rectangular aggregate viewport that includes the least number of non-represented tiles, i.e. one tile in this case.
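By way of illustration only, the two constructions of Figures 13d and 13e might be sketched as follows: the first takes the union of the reported tile sets, while the second takes the smallest axis-aligned rectangle of tiles enclosing that union, which adds the fewest non-represented tiles among rectangles covering all reports. Azimuth wrap-around is ignored here for brevity, and the tile-set representation is an assumption.

```python
def aggregate_union(viewport_tile_sets):
    """Figure 13d style: every reported tile is included in the aggregate."""
    tiles = set()
    for tile_set in viewport_tile_sets:
        tiles |= set(tile_set)
    return tiles

def aggregate_rectangle(viewport_tile_sets):
    """Figure 13e style: smallest enclosing rectangle of tiles (no azimuth wrap)."""
    union = aggregate_union(viewport_tile_sets)
    rows = [r for r, _ in union]
    cols = [c for _, c in union]
    return {(r, c)
            for r in range(min(rows), max(rows) + 1)
            for c in range(min(cols), max(cols) + 1)}

# Example with three overlapping 2 x 2 tile viewports
reports = [{(0, 0), (0, 1), (1, 0), (1, 1)},
           {(1, 1), (1, 2), (2, 1), (2, 2)},
           {(0, 2), (0, 3), (1, 2), (1, 3)}]
print(len(aggregate_union(reports)), len(aggregate_rectangle(reports)))   # -> 10 12
```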
A motion-constrained tile set (MCTS) is a set of one or more tiles within a picture for which the inter prediction (a.k.a. temporal prediction) process is constrained in encoding such that no sample value outside the motion-constrained tile set, and no sample value at a fractional sample position that is derived using one or more sample values outside the motion-constrained tile set, is used for inter prediction of any sample within the motion-constrained tile set. An MCTS may be required to be rectangular. Additionally, the encoding of an MCTS is constrained in a manner that coding parameters, such as motion vector candidates, are not derived from blocks outside the MCTS.
Embodiments described above with reference to the term "tile" may be applied with reference to the term motion-constrained tile set (MCTS).
It is noted that switching from one set of tiles to another set of tiles being transmitted may require discontinuation of inter prediction and coding of newly introduced tile locations without inter prediction.
In some embodiments, viewport information received from a subset of UEs 145 is used to determine the aggregate viewport 235, 236. For example, viewport information of one or more UEs reporting viewports significantly different from the majority of the UEs may be discarded when determining the aggregate viewport. This helps avoid unnecessarily wide aggregate viewports. In some embodiments, determining whether to discard a UE in this respect may be based on a distance between the reported viewport of the UE and a statistical location derived from all reported viewports, for example an average of centres of the reported viewports. In another example, the aggregate viewport may be determined such that a predetermined proportion (for example 95 %) of UEs 145 will be served by the aggregate viewport, that is, their current viewport data corresponds to the determined aggregate viewport. A unicast connection may be used and/or established to deliver the appropriate VR content to the discarded UEs.
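A minimal sketch of this discarding rule: compute a mean viewport centre, then keep only the reports closest to it so that roughly the intended proportion of UEs remains served by the aggregate viewport. The percentile-style cut-off is one possible interpretation of the above, and averaging azimuth angles here ignores wrap-around for simplicity.

```python
import math

def filter_outlier_viewports(centres, serve_fraction=0.95):
    """Keep the reports closest to the mean centre so that roughly
    `serve_fraction` of UEs remain; return (kept, discarded) index lists."""
    mean_az = sum(az for az, _ in centres) / len(centres)   # ignores azimuth wrap
    mean_el = sum(el for _, el in centres) / len(centres)
    dist = [math.hypot(az - mean_az, el - mean_el) for az, el in centres]
    order = sorted(range(len(centres)), key=lambda i: dist[i])
    keep_n = max(1, math.ceil(serve_fraction * len(centres)))
    return order[:keep_n], order[keep_n:]

# Example: nine clustered viewports and one outlier; the outlier would be
# served over a unicast connection instead of widening the aggregate viewport.
centres = [(10 + i, 0) for i in range(9)] + [(170, -40)]
kept, discarded = filter_outlier_viewports(centres, serve_fraction=0.9)
print(discarded)   # -> [9]
```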
It will be appreciated that the above-described embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present application.
Moreover, the disclosure of the present application should be understood to include any novel features, or any novel combination of features, either explicitly or implicitly disclosed herein, or any generalization thereof, and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.

Claims

1. A method comprising:
receiving from a plurality of remote user equipment (UE) information relating to respective viewports;
determining data representing an aggregate viewport based on the viewport information from the plurality of UEs; and
transmitting video content data to the plurality of UEs based on the determined aggregate viewport;
transmitting the aggregate viewport data to the UEs;
receiving from one or more of the UEs an indication that their respective user viewport does not relate to the aggregate viewport;
determining an updated aggregate viewport based on said received indication;
transmitting to the plurality of UEs the updated aggregate viewport; and
transmitting to the plurality of UEs video content data based on the updated aggregate viewport.
2. The method of claim 1, wherein the indication that the respective user viewport does not relate to the aggregate viewport comprises an indication that said UE has terminated receiving the video content data.
3. The method of claim 1 or claim 2, wherein the indication that the respective user viewport does not relate to the aggregate viewport comprises an indication that video content data is not being viewed at said UE.
4. The method of claim 2 or claim 3, wherein determining the updated aggregate viewport comprises disregarding viewport information received from said UE.
5. The method of any of claims 1 to 4, wherein the indication that the respective user viewport no longer corresponds to the aggregate viewport comprises an indication that the viewport has changed by more than a predetermined amount.
6. The method of any of claims 1 to 5, wherein the indication that the respective user viewport no longer corresponds to the aggregate viewport comprises an indication that the viewport is outside of the aggregate viewport.
7. The method of any preceding claim, wherein the viewport information comprises information from which can be derived information relating to a viewing angle associated with the UE or a display device associated with the UE.
8. The method of claim 7, wherein the viewing angle information comprises a viewing orientation.
9. The method of claim 7 or claim 8, wherein the viewing angle information comprises the extent of the viewport.
10. The method of claim 9, wherein the extent of the viewport is represented by the horizontal and vertical fields of view of the viewport.
11. The method of any of claims 7 to 10, wherein the viewport information further comprises an indication of whether the content is monoscopic or stereoscopic.
12. The method of any of claims 7 to 11, wherein the viewing angle information is quantised and/or compressed and wherein the method further comprises
reconstructing and/or decompressing the viewing angle information prior to determination.
13. The method of any preceding claim, wherein transmitting the video content data to the plurality of UEs comprises transmitting video content data corresponding to the aggregate viewport at a higher quality relative to other video content data outside of the aggregate viewport.
14. The method of claim 13, wherein the transmitted video content data is transmitted at a higher resolution and/or data rate than other video content data outside of the aggregate viewport.
15. The method of claim 13 or claim 14, further comprising caching video content outside of, but within a limited predetermined spatial distance of, the aggregate viewport, and transmitting said cached video content at a higher quality responsive to receiving the indication that a UE's respective user viewport does not relate to the aggregate viewport.
16. The method of claim 15, wherein said cached video content is transmitted prior to transmitting to the UEs the updated aggregate viewport.
17. The method of any preceding claim, wherein the video content data is provided as a plurality of tiles, each tile corresponding to a sub-portion of the overall content data and representing a respective spatial display position.
18. The method of any preceding claim, wherein the video content data is transmitted to the UEs using one or more of a multicast or broadcast transmission.
19. The method of any preceding claim, wherein the aggregate viewport data is transmitted to the UEs using one or more of a multicast or broadcast transmission.
20. The method of any preceding claim, wherein the video content data represents a panoramic image or video.
21. The method of any preceding claim, wherein the video content data is virtual reality (VR) content data.
22. The method of claim 20, wherein the user viewport information from each of the plurality of UEs is indicative of the user viewport of a respective VR headset.
23. The method of any preceding claim, further comprising transmitting configuration data to each of the UEs for configuring them to transmit the viewport information.
24. The method of any preceding claim, performed at a base station of a mobile network.
25. A method, comprising:
transmitting to a remote system information relating to a viewport of a user equipment (UE);
receiving further video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from one or more other UEs;
receiving the aggregate viewport data;
determining that the viewport does not relate to the aggregate viewport; and
transmitting an indication that the viewport does not relate to the aggregate viewport to the remote system.
26. The method of claim 25, wherein the determination that the viewport does not relate to the aggregate viewport is made responsive to the UE terminating receiving the video content data, the transmitted indication identifying such to the remote system.
27. The method of claim 25 or claim 26, wherein the determination that the viewport does not relate to the aggregate viewport is made responsive to the video content data not being viewed at the UE, the transmitted indication identifying such to the remote system.
28. The method of claim 26 or claim 27, further comprising terminating
transmission of the viewport information.
29. The method of any of claims 25 to 28, wherein the determination that the viewport does not relate to the aggregate viewport is made responsive to the viewport of the UE changing by more than a predetermined amount, the transmitted indication comprising updated viewport information.
30. The method of any of claims 25 to 29, wherein the determination that the viewport does not relate to the aggregate viewport is made responsive to the viewport of the UE being outside of the aggregate viewport, the transmitted indication comprising updated viewport information.
31. The method of any of claims 25 to 30, wherein the transmitted viewport information comprises information from which can be derived information relating to a viewing angle associated with the UE or a display device associated with the UE.
32. The method of claim 31, wherein the transmitted viewing angle information comprises a viewing orientation.
33. The method of claim 31 or claim 32, wherein the transmitted viewing angle information comprises the extent of the viewport.
34. The method of claim 33, wherein the extent of the viewport is represented by the horizontal and vertical fields of view of the viewport.
35. The method of any of claims 31 to 34, wherein the transmitted viewport information further comprises an indication of whether the UE prefers monoscopic or stereoscopic content.
36. The method of any of claims 31 to 35, further comprising quantising and/or compressing the viewing angle information prior to transmitting.
37. The method of any of claims 25 to 36, wherein the received video content data corresponding to the aggregate viewport is at a higher quality relative to other video content data outside of the aggregate viewport.
38. The method of claim 37, wherein the received video content data is at a higher resolution and/or data rate than other video content data outside of the aggregate viewport.
39. The method of claim 37 or claim 38, further comprising caching received video content data outside of, but within a limited predetermined spatial distance of, the aggregate viewport, and displaying said cached video content responsive to
determining that the viewport does not relate to the aggregate viewport.
40. The method of any of claims 25 to 39, wherein the video content data is provided as a plurality of tiles, each tile corresponding to a sub-portion of the overall video content data and representing a respective spatial display position.
41. The method of any of claims 25 to 40, wherein the video content data is received using one or more of a multicast or broadcast transmission.
42. The method of any of claims 25 to 41, wherein the aggregate viewport data is received using one or more of a multicast or broadcast transmission.
43. The method of any of claims 25 to 42, wherein the video content data represents a panoramic image or video.
44. The method of any of claims 25 to 43, wherein the video content data is virtual reality (VR) content data.
45. The method of claim 44, wherein the user viewport information is indicative of the user viewport of a VR headset.
46. The method of any of claims 25 to 45, further comprising receiving from a remote device configuration data for configuring the UE to transmit the viewport information, and transmitting the viewport information responsive to installing or executing the configuration data.
47. The method of any of claims 25 to 46, performed at a mobile terminal for connection to a mobile network.
48. A computer program comprising instructions that, when executed by a computer, control it to perform the method of any preceding claim.
49. An apparatus configured to perform the method steps of any of claims 1 to 48.
50. A non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising:
receiving from a plurality of remote user equipment (UE) information relating to respective viewports;
determining data representing an aggregate viewport based on the viewport information from the plurality of UEs; and
transmitting video content data to the plurality of UEs based on the determined aggregate viewport;
transmitting the aggregate viewport data to the UEs;
receiving from one or more of the UEs an indication that their respective user viewport does not relate to the aggregate viewport;
determining an updated aggregate viewport based on said received indication;
transmitting to the plurality of UEs the updated aggregate viewport; and
transmitting to the plurality of UEs video content data based on the updated aggregate viewport.
51. A non-transitory computer-readable medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising:
transmitting to a remote system information relating to a viewport of a user equipment (UE);
receiving further video content data based on aggregate viewport data determined remotely using the transmitted viewport information and viewport information from one or more other UEs;
receiving the aggregate viewport data;
determining that the viewport does not relate to the aggregate viewport; and
transmitting an indication that the viewport does not relate to the aggregate viewport to the remote system.
PCT/FI2018/050615 2017-09-05 2018-08-31 Transmission of video content based on feedback WO2019048733A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1714244.9A GB2568020A (en) 2017-09-05 2017-09-05 Transmission of video content based on feedback
GB1714244.9 2017-09-05

Publications (1)

Publication Number Publication Date
WO2019048733A1 true WO2019048733A1 (en) 2019-03-14

Family

ID=60050652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2018/050615 WO2019048733A1 (en) 2017-09-05 2018-08-31 Transmission of video content based on feedback

Country Status (2)

Country Link
GB (1) GB2568020A (en)
WO (1) WO2019048733A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542209A (en) * 2020-03-30 2021-10-22 腾讯美国有限责任公司 Method, apparatus and readable storage medium for video signaling
CN114500971A (en) * 2022-02-12 2022-05-13 北京蜂巢世纪科技有限公司 Stadium 3D panoramic video generation method and device based on data sharing, head-mounted display equipment and medium
WO2022117185A1 (en) * 2020-12-03 2022-06-09 Nokia Technologies Oy Methods, user equipment and apparatus for controlling vr image in a communication network

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102022115806A1 (en) * 2022-06-24 2024-01-04 Valeo Comfort And Driving Assistance Method and system for providing an image to be displayed by an output device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10178414B2 (en) * 2015-10-14 2019-01-08 International Business Machines Corporation Aggregated region-based reduced bandwidth video streaming

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1087618A2 (en) * 1999-09-27 2001-03-28 Be Here Corporation Opinion feedback in presentation imagery
US20080104652A1 (en) * 2006-11-01 2008-05-01 Swenson Erik R Architecture for delivery of video content responsive to remote interaction
US20130208012A1 (en) * 2012-02-15 2013-08-15 Cenk Ergan Speculative Render Ahead and Caching in Multiple Passes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MARTENS, G.: "Bandwidth management for ODV tiled streaming with MPEG-DASH", MASTER THESIS, September 2015 (2015-09-01), Retrieved from the Internet <URL:http://hdl.handle.net/1942/19390> [retrieved on 20181112] *
WANG, H. ET AL., WIRELESS MULTICAST FOR ZOOMABLE VIDEO STREAMING, May 2015 (2015-05-01), XP055582100, Retrieved from the Internet <URL:https://arxiv.org/abs/1505.01933> [retrieved on 20181126] *

Also Published As

Publication number Publication date
GB2568020A (en) 2019-05-08
GB201714244D0 (en) 2017-10-18

Similar Documents

Publication Publication Date Title
Bao et al. Motion-prediction-based multicast for 360-degree video transmissions
US11489900B2 (en) Spatially unequal streaming
US9832450B2 (en) Methods and apparatus for generating and using reduced resolution images and/or communicating such images to a playback or content distribution device
CN110519652B (en) VR video playing method, terminal and server
KR20190142765A (en) Metrics and messages to enhance your experience with 360-degree adaptive streaming
CN110149542B (en) Transmission control method
WO2019048733A1 (en) Transmission of video content based on feedback
US20190200084A1 (en) Video Delivery
CN107533449B (en) Using motion location information to control ultra-wide video display in stadium settings
US11159823B2 (en) Multi-viewport transcoding for volumetric video streaming
US20220046223A1 (en) Multi-user viewport-adaptive immersive visual streaming
CN110798707B (en) Method, client and server for transmitting media data
US20240119660A1 (en) Methods for transmitting and rendering a 3d scene, method for generating patches, and corresponding devices and computer programs
JP2014520409A (en) Method and system for encoding multi-view video content
Hu et al. Mobile edge assisted live streaming system for omnidirectional video
US20220086470A1 (en) Transcoding ultra-high-definition panoramic videos
EP3386203B1 (en) Signalling of auxiliary content for a broadcast signal
Sharma et al. UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 18855009
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 18855009
Country of ref document: EP
Kind code of ref document: A1