GB2558206A - Video streaming - Google Patents

Video streaming

Info

Publication number
GB2558206A
Authority
GB
United Kingdom
Prior art keywords
data
scene
segments
entities
user devices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1621407.4A
Other versions
GB201621407D0 (en)
Inventor
Pesonen Mika
Rajala Johannes
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1621407.4A priority Critical patent/GB2558206A/en
Publication of GB201621407D0 publication Critical patent/GB201621407D0/en
Publication of GB2558206A publication Critical patent/GB2558206A/en
Withdrawn legal-status Critical Current

Classifications

    • H04N13/194 Transmission of image signals
    • G06T19/006 Mixed reality
    • H04N13/106 Processing image signals
    • H04N13/156 Mixing image signals
    • H04N13/161 Encoding, multiplexing or demultiplexing different image signal components
    • H04N13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N13/296 Synchronisation or control of image signal generators
    • H04N21/23614 Multiplexing of additional data and video streams
    • H04N21/2402 Monitoring of the downstream path of the transmission network, e.g. bandwidth available
    • H04N21/26291 Content or additional data distribution scheduling for providing content or additional data updates, e.g. updating software modules, stored at the client
    • H04N21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H04N21/816 Monomedia components involving special video data, e.g. 3D video
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream
    • H04N21/8146 Monomedia components involving graphical data, e.g. 3D object, 2D graphics

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Method comprising: providing a stereoscopic video of a scene for streaming; providing 3D entities to be rendered relative to the stereoscopic scene wherein attributes of the three dimensional objects are provided as separate data segments; and for sequential frames of the stereoscopic video: transmitting the stereoscopic video to remote user devices over a data channel; determining available bandwidth of the data channel for transmission of the 3D models; transmitting selected subsets of the object feature data blocks to the remote user devices based on the available bandwidth to enable at least partial rendering. Object characteristics may include texture, material update, position within the scene, spatial relationship, animation update, model updates, levels of detail (LOD), transformation, translation, rotation, scaling, scene hierarchy, and lighting. Data packets may also include interactive pages and links. Number of chunks distributed may be adjusted dynamically based on bandwidth. Each piece may have an associated priority. If bandwidth is above a threshold full scene data may be sent for 3 dimensional visualisation. Spatial audio sound signals may also be delivered. The information may represent mixed reality or augmented reality video.

Description

(71) Applicant(s): Nokia Technologies Oy, Karaportti 3, 02610 Espoo, Finland
(56) Documents Cited: WO 2016/050283 A1; US 20160241837 A1; WO 2012/170984 A1; US 20160026253 A1
(72) Inventor(s): Mika Pesonen, Johannes Rajala
(74) Agent and/or Address for Service: Nokia Technologies Oy, IPR Department, Karakaari 7, 02610 Espoo, Finland
(58) Field of Search: INT CL G06T, H04N; Other: WPI, EPODOC, X-FULL (designating the full text databases of all states)
(54) Title of the Invention: Video streaming
Abstract Title: Transmitting aspects of 3D models for augmented reality based on bandwidth
(57) Abstract: as reproduced in the Abstract section above.
[Drawings: Figures 1 to 16 on sheets 1/13 to 13/13; see the Brief Description of the Drawings below.]
Intellectual Property Office
Application No. GB1621407.4    RTM Date: June 2017
The following terms are registered trade marks and should be read as such wherever they occur in this document:
Bluetooth (Page 8)
WiMax (Page 8)
Wi-Fi (Pages 8 & 9)
OZO (Page 9)
JavaScript (Page 18)
Python (Page 18)
Lua (Page 18)
Intellectual Property Office is an operating name of the Patent Office www.gov.uk/ipo
Video Streaming
Field
This specification relates generally to methods and apparatus for mixed reality streaming. The specification further relates to, but is not limited to, methods and apparatus for mixed reality video streaming and related applications such as in computer gaming, 360 degrees stereoscopic video and three dimensional animation.
Background
The modern communications era has brought about a tremendous expansion of wireline and wireless networks. Computer networks, television networks, and telephony networks are experiencing an unprecedented technological expansion, fueled by consumer demand. Wireless and mobile networking technologies have addressed related consumer demands, while providing more flexibility and immediacy of information transfer.
Current and future networking technologies continue to facilitate ease of information transfer and convenience to users. Due to the now ubiquitous nature of electronic devices, people of all ages and education levels are utilizing electronic devices to communicate with other individuals or contacts, receive services and/or share information, media and other content. One area in which there is a demand to increase ease of information transfer relates to the streaming delivery of live or near-live video services to users of electronic devices. The services may be in the form of a particular media or communication application desired by the user, such as a video player, a game player, etc. The services may also be in the form of interactive applications in which the user may respond to a network device in order to perform a task or achieve a goal or take part in a movie or game by interaction.
In some situations, electronic devices may enhance the interaction that users have with their environment. Numerous use cases have developed around the concept of utilizing electronic terminals to enhance user interaction with mixed reality applications, for example applications visualizing a user’s local environment or part of a live or near-live video game or other video content. Mixed reality involves the merging of real and virtual worlds. In some cases, mixed reality involves mixing real world image data with virtual objects in order to produce environments and visualizations in which physical and digital objects co-exist and potentially also interact in real time. Mixed reality includes augmented reality, which uses digital imagery to augment or add to real world imagery, and virtual reality, which simulates real world environments using computer simulation.
The user’s electronic device may for example be a headset or pair of glasses/goggles provided with video screens for displaying stereoscopic video content and, usually, speakers or earphones for outputting audio content appropriate to the video content. The electronic device may be linked to a local client device, e.g. games console, which is in communication with a streaming content server over a network for receiving the live or near-live content.
The audio content may be spatial audio content, which is audio with a spatial percept. By means of a positioning device carried by the electronic device and/or the user, e.g. a radio positioning tag, the user’s position and/or orientation may determine interactions with the content.
Mixed reality and augmented reality are fast growing areas, which are currently also available on many mobile platforms (e.g., Symbian™, Android™, iPhone™, Windows Mobile™, etc.). The concept of mixed reality and augmented reality is to overlay graphics or information on a live or near-live video stream or a still image from a camera in a communication device. The graphics/information may be of any kind. The graphics/information about the environment and objects in it may be stored and retrieved as an information layer on top of a view of the real world.
In some situations, the video stream will be stereoscopic and the overlaid graphics may be three-dimensional (3D) graphics (e.g. 3D objects or other 3D entities) for being rendered at the client or user-end relative to the video stream. The video stream and, if provided, associated audio stream, may have priority in terms of bandwidth usage over the 3D content. Therefore, bandwidth limitations in a live or near-live video stream may not allow the full 3D content to be delivered every frame.
Summary
According to one aspect, there is provided a method comprising: providing first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices; providing second data representing one or more entities to be rendered in three-dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene: transmitting the first data to one or more remote user devices over a data channel; determining an available bandwidth of the data channel for transmission of the second data; and transmitting one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
A plurality of segments may be transmitted for each frame.
The number of segments may be adjusted dynamically based on the available bandwidth.
Each of the provided segments may have an associated priority, wherein the one or more selected subsets is based both on the available bandwidth and said priorities.
All data segments may be transmitted at least once over a plurality of sequential frames in an order which is based on said priorities.
The priorities may be indicative of the amount of data required to transmit the associated segments.
The priorities may be dynamically updated responsive to a prior selection of a given segment.
The segments may comprise one or more of: a scene update, a model update, a model position, a model transformation, an animation update, a material update, a scene hierarchy, texture, and lighting.
The method may further comprise transmitting a predetermined data segment irrespective of bandwidth. The predetermined data segment may be a scene update.
One or more of the segments may comprise data representing a displayable interactive page, or a link thereto, for display of the page at a user device and for receiving a user interaction from the user device. The interactive page may be a website or a link thereto.
The method may further comprise providing a set of full scene data representing the one or more entities for full three-dimensional rendering thereof at the one or more remote user devices, and wherein said full scene data is transmitted in the event that the determined bandwidth is above a first threshold and the selected subsets are transmitted only if the bandwidth is below said first threshold.
The method may further comprise providing third data representing a plurality of corresponding frames of audio data for streaming transmission to, and rendering at, the one or more user devices over the data channel. The third data may represent spatial audio signals.
The first and second data may represent mixed reality video content for streaming transmission to one or more user terminals.
According to a second aspect, there is provided a computer program comprising instructions that, when executed by a computer, control it to perform the method of: providing first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices; providing second data representing one or more entities to be rendered in three-dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene: transmitting the first data to one or more remote user devices over a data channel; determining an available bandwidth of the data channel for transmission of the second data; and transmitting one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
According to a third aspect, there is provided a non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising: providing first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices; providing second data representing one or more entities to be rendered in three-dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene: transmitting the first data to one or more remote user devices over a data channel; determining an available bandwidth of the data channel for transmission of the second data; and transmitting one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
According to a fourth aspect, there is provided an apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which when executed controls the at least one processor: to provide first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices; to provide second data representing one or more entities to be rendered in three-dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene: to transmit the first data to one or more remote user devices over a data channel; to determine an available bandwidth of the data channel for transmission of the second data; and to transmit one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
According to a fifth aspect, there is provided an apparatus configured to perform the method of providing first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices; providing second data representing one or more entities to be rendered in three-dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene: transmitting the first data to one or more remote user devices over a data channel; determining an available bandwidth of the data channel for transmission of the second data; and transmitting one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
Brief Description of the Drawings
Embodiments will now be described, by way of non-limiting example, with reference to the accompanying drawings, in which:
Figure 1 is a schematic block diagram of a data communications network in accordance with embodiments;
Figure 2 is a schematic block diagram of application or functional level components of elements of the Figure 1 data communications network in accordance with embodiments;
Figure 3 is a schematic block diagram of a processing pipeline of a user terminal shown in Figure 1 in accordance with embodiments;
Figure 4 is a schematic block diagram showing components of a server shown in Figure 1 in accordance with embodiments;
Figure 5 is a schematic block diagram showing components of a user terminal shown in Figure 1 in accordance with embodiments;
Figure 6 is a graphical representation of a frame of video data and real world data;
Figure 7 is a graphical representation of a corresponding frame of 3D data overlaid on the Figure 6 video and real world data;
Figure 8 is a flow diagram showing method steps for creating a priority pool of 3D segments in accordance with embodiments;
Figure 9 is a graphical representation of a generic priority pool;
Figure 10 is a flow diagram showing method steps for transmitting packets of 3D segments to user terminals in accordance with embodiments;
Figure 11 is a flow diagram showing method steps for generating a packet of 3D segments in accordance with embodiments;
Figure 12 is a flow diagram showing method steps of a further embodiment for transmitting packets of 3D segments to user terminals in accordance with embodiments;
Figure 13 is a graphical representation of an example priority pool;
Figure 14 is a graphical representation of a sequence of 3D packets generated based on the Figure 13 priority pool;
Figure 15 shows rendered views of a 3D object based on 3D segments representing different levels of detail; and
Figure 16 is a flow diagram showing a processing pipeline for a server and a user terminal, including functional level operations.
Detailed Description of Embodiments
Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Like reference numerals refer to like elements throughout. As used herein, the terms data, content, information and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the invention. Moreover, the term exemplary, as used herein, is not provided to convey any qualitative assessment, but instead merely to convey an illustration of an example. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the invention.
Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
As defined herein a computer-readable storage medium, which refers to a non-transitory, physical or tangible storage medium (e.g., volatile or non-volatile memory device), may be differentiated from a computer-readable transmission medium, which refers to an electromagnetic signal.
FIG. 1 illustrates a system diagram in which a server 10 for providing live or near-live streaming of video data for one or more mixed reality applications is shown in an example communications environment comprising a network 30. One or more user devices, or terminals 20, 40, 50 are also shown. A first user terminal 20 may be a device such as a computer or games console, but may in some embodiments be a mobile terminal or indeed any terminal capable of receiving streaming data from the server 10 and in some cases transmitting data to the server 10 and/or to other user terminals 40, 50. The first user terminal 20 may operate with an associated user output device 30 in the form of a headset or glasses/goggles provided with video screens for displaying stereoscopic video content and, usually, speakers or earphones for outputting audio content appropriate to the video content. The audio content may be spatial audio content, which is audio with a spatial percept. By means of a possible positioning device carried by the output device 30 and/or a user 32, e.g. a radio positioning tag, the user’s position and/or orientation may determine interactions with the content. A controller (not shown) may also be associated with the first user terminal 20, for example in the form of a hand held device comprising buttons, switches or the like to enable control signals to be inputted via the network 30.
The system may in some embodiments also comprise second and third user terminals 40, 50 similarly capable of receiving and transmitting data over the network 30. In some cases, an embodiment of the invention may further include one or more additional communications devices.
The network 30 may include a collection of various different nodes (of which the server 10, and the first to third user terminals 20,40, 50 may be examples), devices or functions that may be in communication with each other via corresponding wired and/or wireless interfaces. As such, the illustration of FIG. 1 should be understood to be an example of a broad view of certain elements of the system and not an all-inclusive or detailed view of the system or the network 30.
In some embodiments, the server 10 and the one or more user terminals 20, 40, 50 may be in data communications with each other via the network 30 using interfaces and protocols for wired or wireless internet communications, e.g. using TCP/IP, HTTP. Although not necessary, in some embodiments, the network 30 may be capable of supporting communication in accordance with any one or more of a number of First-Generation (1G), Second-Generation (2G), 2.5G, Third-Generation (3G), 3.5G, 3.9G, Fourth-Generation (4G) mobile communication protocols, Long Term Evolution (LTE), LTE advanced (LTE-A) and/or the like. In some embodiments, the network 30 may be a point-to-point (P2P) network. One or more communication terminals such as the server 10 and the first to third user terminals 20, 40, 50 may be in communication with each other via the network 30 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a Local Area Network (LAN), a Metropolitan Area Network (MAN), and/or a Wide Area Network (WAN), such as the Internet.
Furthermore, the server 10 and the first to third user terminals 20, 40, 50 may communicate in accordance with, for example, radio frequency (RF), near field communication (NFC), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including Local Area Network (LAN), Wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), Wireless Fidelity (WiFi), UltraWide Band (UWB), Wibree techniques and/or the like. As such, the server 10 and the first to third user terminals 20, 40, 50 may be enabled to communicate with the network 30 and each other by any of numerous different access mechanisms. For example, mobile access mechanisms such as Wideband Code Division Multiple Access (W-CDMA), CDMA2000,
Global System for Mobile communications (GSM), General Packet Radio Service (GPRS) and/or the like maybe supported as well as wireless access mechanisms such as WLAN, WiMAX, and/or the like and fixed access mechanisms such as Digital Subscriber Line (DSL), cable modems, Ethernet and/or the like.
Referring to Figure 2, high level modules for the server 10 and the first to third user terminals 20, 40, 50 are shown.
The server 10 comprises a content platform 60 for storage and delivery of three, and potentially four, types of data, namely: (a) video data 62 representing a scene, (b) audio data 64, (c) 3D data 66 representing one or more 3D entities for video display relative to the video data scene and (d) real world data 67 for display relative to the video data and the 3D entities, which may, for example, represent content captured by a camera of one of the user terminals. The real world data 67 is relevant for mixed reality applications, but is not essential. The data from the modules 62, 64, 66, 67 is provided to a stream encoder 68, the operation of which will be described below.
Typically, the video data 62 will represent a stereoscopic scene on which is overlaid the one or more 3D entities of the 3D data 66 over a sequence of video frames. The video data 62 may represent a background, for example. In some embodiments, the video data 62 may represent video frames of a location, an event or a game background. The term video scene is therefore not limited to any particular image or image type. Whilst stereoscopic, and therefore 3D in nature, the video data 62 of this definition is not to be confused with the 3D data 66 which represents 3D graphics for display relative to the video scene, and therefore which has some perceivable depth in front of, or behind, the video scene.
The audio data 64 may be multichannel audio in loudspeaker format, e.g. stereo signals, 4.0 signals, 5.1 signals, Dolby Atmos (RTM) signals or the like. Instead of loudspeaker format audio, the audio data 64 may be in a multi-microphone signal format, such as a raw eight-channel signal from a camera generating spatial video and audio, e.g. Nokia’s OZO camera.
In overview, the stream encoder 68 is arranged to stream the above data types 62, 64, 66, 67 based on available bandwidth with the aim of keeping corresponding frames or time slots substantially synchronised. The stream encoder 68 also controls how the 3D data 66 is streamed using methods to be described below. The stream encoder 68 reserves an amount of the current available bandwidth (e.g. measured in Mbit/sec) for streaming transmission of the video and audio data 62, 64 and, if provided, the real world data 67. These forms of data typically occupy the majority of the available bandwidth, leaving a minority portion available for the 3D data 66.
Conventionally, particularly in live or near-live streaming applications where users may start to consume the content at any time, there is limited available bandwidth for the 3D data 66 meaning that not all 3D data may be sent for every frame of the other data.
In embodiments herein, the 3D data 66 is further arranged, or divided, into elementary segments, each of which represents a respective property of the one or more 3D entities to enable at least a partial rendering or updating of said entities at the user end. The stream encoder 68 may be arranged to control which segments are transmitted, either individually or in combination, at a given time and in a particular sequence so that at least a partial rendering of the 3D entities can be performed and updated.
Referring still to Figure 2, each of the user terminals 20, 40,50 has an associated client 70, 72, 74. Figure 3 shows high level functional modules of the client 70 on one of the user terminals 20. The user terminal 20 comprises circuitry and/or software which provides a stream decoder 72 for receiving and decoding the streamed data, including possible metadata changes. The user terminal 20 also comprises a video decoder 74 for decoding the video data 62 and the audio data 64 from the stream. A mixed reality (MR) engine 76 is provided for creating, caching and rendering 3D entities from the 3D data 66, and for rendering the video and audio data 62, 64, all for output via an output module 78 at substantially synchronised time frames. The output module 78 may be a television screen, the screen of a mobile terminal, or a VR headset 30 as indicated in Figure 1.
Figure 4 shows an example schematic diagram of components of the server 10. The server 10 has a controller 80, an optional display 88, optional hardware keys 90, a memory 82, RAM 84 and an input and output interface 86. The content platform 60 also comprises, or has access to, a content storage memory 96 for storing the above-mentioned video data 62, audio data 64, 3D data 66 and, if provided, real world data 67. In some embodiments, the video data 62, audio data 64, 3D data 66 and real world data 67 may be received as a live feed from an external source.
The controller 80 is connected to each of the other components in order to control operation thereof.
The input and output interface 86 is provided for communicating with other terminals, e.g. the first to third user terminals 20, 40, 50 via the network 30 shown in Figure 1.
The memory 82 may be a non-volatile memory such as read only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD). The memory 82 stores, amongst other things, an operating system 92 and one or more software applications 94. The RAM 84 is used by the controller 80 for the temporary storage of data. The operating system 92 may contain code which, when executed by the controller 80 in conjunction with RAM 84, controls operation of each of the hardware components of the server.
The controller 80 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
In embodiments herein, the software application 94 is configured to provide the functionality of the stream encoder 68 mentioned above.
Figure 5 shows an example schematic diagram of components of one or more of the first to third user terminals 20, 40, 50. The user terminal 20, 40, 50 has a controller 100, an output device 108, optional hardware keys 110, a memory 102, RAM 104 and an input and output interface 106. The output device 108 may be a video screen and/or a VR headset 30 as indicated in Figure 1.
The controller 100 is connected to each of the other components in order to control operation thereof.
The input and output interface 106 is provided for communicating with the server 10 and possibly to other ones of the terminals 20, 40, 50 via the network 30 shown in Figure 1.
The memory 102 may be a non-volatile memory such as read only memory (ROM), a hard disk drive (HDD) or a solid state drive (SSD). The memory 102 stores, amongst other things, an operating system 112 and one or more software applications 114. The RAM 104 is used by the controller 100 for the temporary storage of data. The operating system 112 may contain code which, when executed by the controller 100 in conjunction with RAM 104, controls operation of each of the hardware components of the terminal.
The controller 100 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.
In embodiments herein, the software application 114 is configured to provide the client 70 of the first user terminal 20 mentioned above and shown in Figures 2 and 3.
Figures 6 and 7 visually illustrate the above sets of data using an example. Figure 6 shows a single frame of the video data 62 which represents, when processed and rendered by the video decoder 74 (see Figure 3), a video scene 120 of a sports stadium. The video scene 120 may change for subsequent frames, for example as the capturing camera pans or changes position. Further, a corresponding frame of real-world data 67 is provided and overlaid on the video scene 120. The real world data 67 in this case represents, when processed and rendered, first and second real-world objects 122, 124, which may be members of a sports team. The real world data 67 may be received as a live or near-live feed. For example, the players 122, 124 may be real people each carrying a positioning device such as a radio positioning (e.g. HAIP) tag as indicated. Over subsequent frames, the positioning of the players 122, 124 may change and so therefore will their positions. It will be appreciated that corresponding frames of audio data 64 may accompany the video and real-world data 62, 67.
Figure 7 shows how 3D data 66 may be represented when processed and rendered by the MR engine 76 (see Figure 3). The 3D data 66 in this case represents two 3D entities, or objects, which are an animated graphical scoreboard 130 and animated graphical mascot 132. These entities are overlaid onto the video scene 120. The appearance of the 3D entities 130,132 may change over subsequent frames. For example, they may animate to provide areas of visual interest; they may grow in size, change colour, shrink and/or pan off the video scene 120. The 3D data 66 is substantially synchronised with the other data.
Generally speaking, the 3D data 66 may represent any entity for rendering in three dimensions relative to a video scene. For example, in the present context the 3D data 66 may represent 3D models of commentators, mascots, depth maps of the playing surface, textual information such as the score board, statistics, the names of respective players, advertisements, lighting, weather effects, webpages etc.
The 3D data 66 may be provided as a so-called “full 3D scene” which comprises all necessary information for the 3D entities of a given frame to be rendered at a user terminal, e.g. user terminal 20. It will therefore require a relatively large amount of bandwidth for streaming transmission to users.
In present embodiments, the 3D data 66 may be provided in an additional or alternative form, namely as a plurality of elementary segments each of which represents a respective property of the entities to enable at least a partial rendering or update of the entities (either collectively or individually) at a user terminal.
Such segments are provided effectively by dividing the 3D data 66 into smaller data sets for each frame. Some segments may be particular to a given entity. Some segments may be general to all entities in the 3D scene. Examples of segments that may be provided include, but are not limited to:
Scene update: the identity and change in spatial relationship of all entities, which can be referenced from a previous full scene;
Animation update: an animation update for a particular 3D entity;
Model update: an updated set of 3D data for a particular 3D entity (model updates may also be provided in different levels-of-detail (LOD) to permit rendering from a relatively coarse to a fine resolution);
Material update: an updated set of material or texture data for a particular 3D entity;
Model transformation: a change in position and/or rotation and/or scaling for a particular 3D entity, for example defined in a matrix.
Scene hierarchy: a data representation of the parent-child relationships of a particular 3D entity in a scene, so that when a parent is moved then all of the children are automatically moved in relation.
The list is not exhaustive and segments may represent other properties related to how 3D objects appear when rendered.
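Purely by way of illustration, a segment of the kind listed above might be modelled as a small record carrying the kind of update, the entity it applies to (if any) and its size, which is the quantity the bandwidth-based selection described below operates on. The sketch is hypothetical; the names and fields are illustrative only and do not reflect an actual wire format.

```python
# Hypothetical sketch of a 3D data segment record; not an actual wire format.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class SegmentKind(Enum):
    SCENE_UPDATE = auto()          # identity/spatial-relationship changes for the entities
    ANIMATION_UPDATE = auto()      # animation update for a particular 3D entity
    MODEL_UPDATE = auto()          # updated 3D model data, possibly per level-of-detail
    MATERIAL_UPDATE = auto()       # updated material or texture data
    MODEL_TRANSFORMATION = auto()  # position/rotation/scaling change, e.g. a matrix
    SCENE_HIERARCHY = auto()       # parent-child relationships between entities

@dataclass
class Segment:
    kind: SegmentKind
    size_mbits: float                # size expressed as bandwidth needed to send it in one frame
    entity_id: Optional[str] = None  # None for segments that apply to the whole scene
    priority: int = 0                # lower number = higher priority (p0 is highest)
```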
At the content platform 60 of the server 10, the software application 94 may be configured to determine on a frame-by-frame basis how to appropriately transmit the 3D data 66.
In some embodiments, the software application 94 determines the available bandwidth for the 3D data 66 and either sends the full 3D scene if there is sufficient bandwidth, or alternatively sends a so-called “3D packet” comprising one or more 3D segments appropriate to the bandwidth. In this way, at least a partial rendering of the 3D scene or one or more 3D entities may be achieved or updated notwithstanding a reduced bandwidth allocation.
In some embodiments, for example, for a total bandwidth budget of 7 Mbit/s, 6 Mbit/s may be allocated for the video, audio and real-world data 62, 64, 67, leaving just 1 Mbit/s for the 3D data 66.
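As a toy restatement of this budget split (the figures are the example values above, not parameters of the method, and the function name is hypothetical):

```python
# Toy restatement of the example budget split; all values are illustrative.
TOTAL_BUDGET_MBITS = 7.0   # total channel budget in the example
AV_RESERVED_MBITS = 6.0    # reserved for video, audio and real-world data

def bandwidth_for_3d(total=TOTAL_BUDGET_MBITS, reserved=AV_RESERVED_MBITS):
    """Bandwidth left over for the 3D data, in Mbit/s (1.0 in this example)."""
    return max(total - reserved, 0.0)

print(bandwidth_for_3d())  # -> 1.0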
The 3D segments of 3D data 66 may be created manually or automatically by the software application 94.
Figure 8 is a flow diagram illustrating in overview the steps for creating a priority pool which, as will be explained, determines how packets of 3D segments are created. The priority pool is in effect an ordered list of the 3D segments. In a first step 8.1 the 3D segments are provided. In a second step 8.2 priorities are defined for the 3D segments. This may be performed manually or automatically, for example based on the file size of the 3D segments and/or on the nature of the 3D segments and their relative importance for creating a meaningful representation of an entity when rendered. In a third step 8.3 the prioritised 3D segments are stored as a priority pool and made available to the stream encoder 68.
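A minimal sketch of this Figure 8 flow, assuming priorities are derived automatically from file size (smaller segments get higher priority); in practice they could equally be assigned by hand or by relative importance. Function and field names are hypothetical.

```python
# Minimal sketch of the Figure 8 flow (steps 8.1-8.3), assuming priorities are
# derived automatically from file size. Hypothetical names throughout.
def build_priority_pool(segments):
    """segments: list of dicts with 'name' and 'size_mbits'.
    Returns the pool ordered from highest priority (p0) downwards."""
    pool = sorted(segments, key=lambda s: s["size_mbits"])   # step 8.2: define priorities
    for rank, seg in enumerate(pool):
        seg["priority"] = rank                               # 0 = highest priority
    return pool                                              # step 8.3: store as the priority pool

pool = build_priority_pool([
    {"name": "material update", "size_mbits": 0.3},
    {"name": "scene update", "size_mbits": 0.1},
    {"name": "animation update", "size_mbits": 0.2},
])
print([s["name"] for s in pool])  # ['scene update', 'animation update', 'material update']
```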
Figure 9 is a schematic view of a priority pool 139 shown by way of example. Within the priority pool 139 is stored the full 3D scene 140. Although strictly speaking not a 3D segment, it may be stored within the priority pool 139 and assigned the lowest priority level (P4) because it typically has the largest file size. 3D segments A - N are provided in the priority pool with higher priority levels. In some embodiments, a base 3D segment 142 is provided which has the highest priority, meaning that it is either always within a 3D packet, regardless of available bandwidth, or that it is within most 3D packets. The base 3D segment 142 may typically have the smallest file size. There may be more than one base 3D segment 142 in some embodiments. Where the base 3D segment 142 is always within a 3D packet, and the available bandwidth is lower than its file size, it will be appreciated that this may result in frame skips at the client end which should (in most cases) be relatively few in number. The other 3D segments 144, 146, 148 will typically have higher priority levels. Two or more 3D segments 142, 144, 146, 148 may have the same priority level.
The priority pool 139 may be updated dynamically based on a previous selection of 3D segments 142,144,146,148 for a 3D packet. This allows for all 3D segments to reach the user terminal over successive frames in order that the 3D rendering will accumulate to something approaching the full 3D data set. In some embodiments, the priority pool 139 may reset to its original state if the full scene data is sent when there is sufficient bandwidth.
Figure 10 is a flow diagram illustrating by way of example the steps for transmitting a 3D packet by the software application 94. In a first step 10.1 the available bandwidth for the 3D data is determined. In a second step 10.2 the priority pool 139 is provided. In step 10.3, a 3D packet is generated comprising one or more 3D segments based on the available bandwidth and the priority pool 139. In step 10.4 the 3D packet is transmitted in synchronisation with the corresponding video, audio and real-world data frame. A further 3D packet will be generated for a subsequent frame and so on.
Figure 11 is a flow diagram illustrating by way of example the steps for creating a 3D packet for a current frame. Other methods may be employed. In a first step 11.1, the base 3D segment 142 is added to the 3D packet, assuming a base segment is assigned. In a second step 11.2, it is determined if the next prioritised 3D segment in the priority pool 139 has a file size equal to, or less than, the available bandwidth. If so, then said 3D segment is added to the 3D packet in step 11.3 and the process returns to step 11.2 for the next prioritised segment. If not, then in step 11.4 the 3D packet is complete and ready for transmission. A subsequent, and optional, step 11.5 may update the priority pool based on the used 3D segment(s).
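Under the assumptions that the pool is already ordered by priority, that a base segment (if assigned) is always included, and that selection stops at the first segment that no longer fits, the Figure 11 steps might be sketched as follows; this is an illustration under those assumptions, not the actual implementation.

```python
# Sketch of the Figure 11 packet-building steps. Segment sizes and the
# bandwidth budget are assumed to be in the same units (e.g. Mbit per frame).
def build_packet(pool, budget, base_segment=None):
    """Return (packet, updated_pool). pool is ordered highest priority first."""
    packet, remaining, used = [], budget, []
    if base_segment is not None:               # step 11.1: base segment added regardless
        packet.append(base_segment)
        remaining -= base_segment["size_mbits"]
    for seg in pool:                           # step 11.2: next prioritised segment
        if seg["size_mbits"] <= remaining:     # fits the bandwidth still available?
            packet.append(seg)                 # step 11.3: add it to the 3D packet
            remaining -= seg["size_mbits"]
            used.append(seg)
        else:
            break                              # step 11.4: packet complete
    # step 11.5 (optional): demote the segments just used so that, over
    # successive frames, every segment eventually reaches the user terminal
    used_ids = {id(s) for s in used}
    updated_pool = [s for s in pool if id(s) not in used_ids] + used
    return packet, updated_pool
```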
Figure 12 is a flow diagram illustrating in a further embodiment the steps for transmitting a 3D packet on a frame-by-frame basis. In a first step 12.1, the current frame m is identified.
In a second step 12.2 the available bandwidth for the 3D data is determined. In a third step 12.3 it is determined if the available bandwidth is sufficient to transmit the full scene data. If so, then in step 12.4 the full scene data is transmitted and the next frame is identified in step 12.5 so that the process may repeat. If in step 12.3 the bandwidth is not sufficient, then in step 12.6 the 3D packet is generated based on the provided priority pool 139 (step 12.7) and transmitted in step 12.8. As before, the priority pool 139 may be updated based on the 3D segment(s) that occupy the 3D packet.
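Combining the Figure 12 decision with a packet builder of the kind sketched above, a per-frame server loop might look like the following; measure_bandwidth, build_packet and send are hypothetical stand-ins for the server's real transport and encoding machinery.

```python
# Sketch of the Figure 12 frame-by-frame decision (steps 12.1-12.8). The
# callables passed in are hypothetical stand-ins for the real machinery.
def stream_3d_data(frames, pool, original_pool, full_scene_size_mbits,
                   measure_bandwidth, build_packet, send):
    for frame in frames:                                 # steps 12.1 / 12.5: current frame
        budget = measure_bandwidth(frame)                # step 12.2: available 3D bandwidth
        if budget >= full_scene_size_mbits:              # step 12.3: enough for the full scene?
            send(frame, "full 3D scene")                 # step 12.4
            pool = list(original_pool)                   # optionally reset the priority pool
        else:
            packet, pool = build_packet(pool, budget)    # steps 12.6 / 12.7
            send(frame, packet)                          # step 12.8
    return pool
```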
A more detailed exemplary embodiment will now be explained with reference to Figures 13 and 14.
Figure 13 is a schematic view of a priority pool 160. A base 3D segment 150 is provided which is a scene update segment, having the highest priority level (p0). Its file size requires a bandwidth (shown in column 170) of 0.1 Mbit/s. A lesser priority (p1) is assigned to an animation update segment 151 for a first object, a model update segment 152 for the first object and a model update segment 153 for a second object. Each of said 3D segments has a bandwidth requirement of 0.2 Mbit/s. A still lesser priority (p2) is assigned to a material update segment 154 for the first object, having a bandwidth requirement of 0.3 Mbit/s. Finally, the full 3D scene 155 is provided having the lowest priority (p4) and having a bandwidth requirement of 1 Mbit/s.
Figure 14 is a schematic view of a sequence 190 of 3D packets 181 - 188 constructed using the Figure 11 method when applied to the Figure 13 priority pool 160.
For a first frame, the available bandwidth is 1 Mbit/s and hence the full 3D scene 155 may be transmitted. For a second frame, the available bandwidth drops to 0.3 Mbit/s and so the full scene data 155 cannot be used. Accordingly, the base 3D segment 150 is added to the 3D packet 182 by default and the next prioritised 3D segment, which is the animation update segment 151, is added. No further 3D segment can be added because the bandwidth limit has been reached. The finalised packet 182 is then transmitted. The priority pool 160 at this point may be updated to move the animation update segment 151 beneath the lowest priority segment 154 (remembering that the full scene data 155 is not considered a 3D segment). The model update segment 152 moves up the priority pool 160 in place of the animation update segment 151. For the next frame, the available bandwidth remains at 0.3 Mbit/s. Again, the base 3D segment 150 is added to the packet 183. Next, the model update segment 152 is added. No further 3D segment can be added because the bandwidth limit has been reached. The finalised 3D packet 183 is then transmitted. The process repeats for subsequent frames as shown. Note that for the seventh frame, the 3D packet 187 will only comprise the default base 3D segment 150 because of the low bandwidth of 0.15 Mbit/s. In the eighth frame, the bandwidth increases to 1 Mbit/s and hence the full scene data 155 may be transmitted. At this point, the priority pool 160 may be reset to its original state shown in Figure 13.
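For concreteness, the walkthrough above can be restated as a short self-contained script; the per-frame bandwidths below cover only the frames whose bandwidth is stated explicitly (the first, second, third, seventh and eighth), and the packet-building rule is the sketch of Figure 11 given earlier, so this is illustrative rather than normative.

```python
# Self-contained restatement of the Figures 13/14 walkthrough; only frames with
# an explicitly stated bandwidth are included.
FULL_SCENE = {"name": "full 3D scene 155", "size_mbits": 1.0}
BASE = {"name": "scene update 150", "size_mbits": 0.1}                # p0, always included
ORIGINAL_POOL = [
    {"name": "animation update 151 (object 1)", "size_mbits": 0.2},  # p1
    {"name": "model update 152 (object 1)", "size_mbits": 0.2},      # p1
    {"name": "model update 153 (object 2)", "size_mbits": 0.2},      # p1
    {"name": "material update 154 (object 1)", "size_mbits": 0.3},   # p2
]

def build_packet(pool, budget):
    packet, remaining, used = [BASE], budget - BASE["size_mbits"], []
    for seg in pool:
        if seg["size_mbits"] <= remaining:
            packet.append(seg)
            remaining -= seg["size_mbits"]
            used.append(seg)
        else:
            break
    used_ids = {id(s) for s in used}
    return packet, [s for s in pool if id(s) not in used_ids] + used

pool = list(ORIGINAL_POOL)
for label, budget in [("1st", 1.0), ("2nd", 0.3), ("3rd", 0.3), ("7th", 0.15), ("8th", 1.0)]:
    if budget >= FULL_SCENE["size_mbits"]:
        print(label, [FULL_SCENE["name"]])
        pool = list(ORIGINAL_POOL)              # reset to the original priority order
    else:
        packet, pool = build_packet(pool, budget)
        print(label, [s["name"] for s in packet])
```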
As mentioned above, by updating the priority pool 160, all 3D segments will reach the user terminals over a sequence of frames no matter when a user starts consuming the content.
In some embodiments, the 3D segments may comprise multiple segments representing a common entity with different respective levels of detail. A higher priority may be assigned to the 3D segment which represents the entity at a low resolution or level of detail, whereas a lower priority may be assigned to the 3D segment representing the same entity at a high resolution or level of detail.
For example, a spherical 3D object may be represented by first, second and third 3D segments having respective high to low priorities in the priority pool. Figure 15 shows how the corresponding object 190, 191, 192 may appear at the user terminal when rendered. In this way, the lower resolution versions may be transmitted more frequently than the higher resolution versions when bandwidth is restricted, so that the user sees at least a rough representation of the object whilst waiting for bandwidth to increase.
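For illustration, and reusing the Segment structure assumed earlier (the sizes here are invented), the three versions of the spherical object might be placed in the pool with the coarsest mesh highest:

sphere_low  = Segment("sphere model, low detail",   50)   # highest priority
sphere_mid  = Segment("sphere model, mid detail",  200)
sphere_high = Segment("sphere model, high detail", 800)   # lowest priority

# The coarse version is listed first, so under tight bandwidth it is the one
# most often selected and the user always sees at least a rough sphere.
lod_pool = [sphere_low, sphere_mid, sphere_high]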
Figure 16 is an example of a mixed reality streaming pipeline which is shown for completeness. The pipeline depicts the various processing stages 68, 72, 74, 76, 78 performed by the application programs 94, 114 and mentioned previously with reference to Figures 2 and 3.
At the user terminals 20, 40, 50 the software application 114 is configured to decode the various data sets, including 3D segments when received, for rendering.
In some embodiments, the mixed reality engine 76 may cache certain data, for example 3D model data for one or more entities, received either in the full 3D scene or in one or more object segment(s). In this way, high resolution models are available for the rest of the stream, if required.
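One possible client-side caching policy, sketched here with assumed names (ModelCache and its methods are not part of the disclosure), is to retain the most detailed model data received so far for each entity:

class ModelCache:
    """Keeps the most detailed model data received so far for each entity."""

    def __init__(self):
        self._models = {}  # entity id -> (detail level, model data)

    def offer(self, entity_id, detail_level, model_data):
        """Store the model only if it improves on what is already cached."""
        current = self._models.get(entity_id)
        if current is None or detail_level > current[0]:
            self._models[entity_id] = (detail_level, model_data)

    def best(self, entity_id):
        """Return the best cached model data, or None if nothing received yet."""
        entry = self._models.get(entity_id)
        return entry[1] if entry is not None else None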
Embodiments above are also applicable to so-called green screen mixed reality content, where the background may comprise dynamic 3D entities.
In some embodiments, the 3D data 66 may include website data, e.g. as a link or HTML with one or more images. One or more 3D segments may represent this website data, which may be stored and transmitted by the server 10 to one or more user terminals 20, 40, 50 using the above considerations. The website data may allow for user interactions.
For example, the website data in a segment may represent a 2D rectangle object to be presented in the 3D space, relative to the video data. For example, the scoreboard 130 shown in Figure 7 may be a 2D webpage that is interactive, e.g. the user being able to select links or buttons in the webpage using a controller. The controller may be used to draw a visual line from a virtual controller in the 3D space to the webpage so that the user may easily see where the controller is pointing. Alternatively, or additionally, a floating pointer may be provided for selection. For systems having no controller, the user's gaze may be used to select functionality in the displayed webpage. For example, the webpage may be floating in the Figure 7 scene, on which player statistics are displayed responsive to selection of a particular player. Another example is when watching an event, e.g. a music festival having multiple rooms or stages, where the webpage is a billboard floating in the 3D space displaying a schedule for the different rooms/stages. The user may be able to select from the webpage which room/stage they wish to view.
A new type of 3D update segment may be provided, for example an interaction update segment, that may be for a full scene or for a specific model. Data for this interaction may be a webpage expressed as HTML or a scripting language such as JavaScript, Python, Lua etc. that can be parsed and executed by the clients of the user terminals 20, 40, 50.
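Purely as an illustration of what such an interaction update segment might carry (the field names and payload layout below are invented for this example and are not defined by the disclosure), a scoreboard webpage could be expressed as:

interaction_update = {
    "segment_type": "interaction_update",
    "target": "scoreboard",      # a specific model, or "scene" for the full scene
    "payload_format": "html",    # could equally be a script, e.g. "lua" or "javascript"
    "payload": (
        "<html><body>"
        "<table id='scores'><tr><td>Player 1</td><td>42</td></tr></table>"
        "</body></html>"
    ),
}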
Although website data is given as an example embodiment, any form of displayable page for interaction may be used.
The embodiments provide advantages in that 3D entities may be delivered to end users gradually, in smaller segments, to provide some rendering capability when bandwidth is restricted. The 3D content is frame-synced with the video content in one-way traffic to the end user. End users may join the stream at any time, albeit potentially receiving decreased 3D quality. The embodiments enable different forms of mixed reality applications, including gaming, panoramic video etc. Special logic may not be required at the user end.
It will be appreciated that the above described embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present application.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.

Claims (19)
1. A method comprising:
providing first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices;
providing second data representing one or more entities to be rendered in three dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene:
transmitting the first data to one or more remote user devices over a data channel;
determining an available bandwidth of the data channel for transmission of the second data; and transmitting one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
2. The method of claim 1, wherein a plurality of segments are transmitted for each frame.
3. The method of claim 2, wherein the number of segments is adjusted dynamically based on the available bandwidth.
4. The method of any preceding claim, wherein each of the provided segments has an associated priority and wherein the one or more selected subsets is based both on the available bandwidth and said priorities.
5. The method of claim 4, wherein all data segments are transmitted at least once over a plurality of sequential frames in an order which is based on said priorities.
6. The method of claim 4 or claim 5, wherein the priorities are indicative of the amount of data required to transmit the associated segments.
7. The method of any of claims 4 to 6, wherein the priorities are dynamically updated responsive to a prior selection of a given segment.
8. The method of any preceding claim, wherein the segments comprise one or more of: a scene update, a model update, a model position, a model transformation, an animation update, a material update, a scene hierarchy, texture, and lighting.
9. The method of any preceding claim, further comprising transmitting a predetermined data segment irrespective of bandwidth.
10. The method of claim 9 when dependent on claim 8, wherein said predetermined data segment is a scene update.
11. The method of any preceding claim, wherein one or more of the segments comprises data representing a displayable interactive page, or a link thereto, for display of the page at a user device and for receiving a user interaction from the user device.
12. The method of any preceding claim, further comprising providing a set of full scene data representing the one or more entities for full three-dimensional rendering thereof at the one or more remote user devices, and wherein said full scene data is transmitted in the event that the determined bandwidth is above a first threshold and the selected subsets are transmitted only if the bandwidth is below said first threshold.
13. The method of any preceding claim, further comprising providing third data representing a plurality of corresponding frames of audio data for streaming transmission to, and rendering at, the one or more user devices over the data channel.
14. The method of claim 13, wherein the third data represents spatial audio signals.
15. The method of any preceding claim, wherein the first and second data represents mixed reality video content for streaming transmission to one or more user terminals.
16. A computer program comprising instructions that, when executed by a computer, control it to perform the method of any preceding claim.
17. A non-transitory computer-readable storage medium having stored thereon computer-readable code, which, when executed by at least one processor, causes the at least one processor to perform a method, comprising:
providing first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices;
providing second data representing one or more entities to be rendered in three dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene:
transmitting the first data to one or more remote user devices over a data channel;
determining an available bandwidth of the data channel for transmission of the second data; and transmitting one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
18. An apparatus, the apparatus having at least one processor and at least one memory having computer-readable code stored thereon which when executed controls the at least one processor:
to provide first data representing a plurality of frames of a stereoscopic video scene for streaming transmission to one or more remote user devices;
to provide second data representing one or more entities to be rendered in three dimensions relative to the stereoscopic scene, the second data being provided as a plurality of segments each of which represents a respective property of the one or more entities to enable at least a partial rendering or updating of the one or more entities at the user device(s); and for sequential frames of the video scene:
to transmit the first data to one or more remote user devices over a data channel;
to determine an available bandwidth of the data channel for transmission of the second data; and to transmit one or more selected subsets of the segments to the one or more remote user devices based on the available bandwidth to enable at least a partial rendering or updating of the one or more entities relative to the stereoscopic scene.
19. Apparatus configured to perform the method of any of claims 1 to 15.