WO2017114821A1

WO2017114821A1 - Video streams

Info

Publication number: WO2017114821A1
Application number: PCT/EP2016/082694
Authority: WO
Inventors: Emmanuel Thomas; Arjen VEENHUIZEN; Ray Van Brandenburg; Mattijs Oskar Van Deventer
Original assignee: Koninklijke Kpn N.V.; Nederlandse Organisatie Voor Toegepast-Natuurwetenschappelijk Onderzoek Tno
Priority date: 2015-12-28
Filing date: 2016-12-27
Publication date: 2017-07-06
Also published as: EP3398346A1

Abstract

An apparatus (10) is configured for providing multiple video streams by using metadata from at least one scene information stream relating to a scene and comprises a video coordinator (4) and at least two video generators (3). The at least one scene information stream comprises metadata descriptive of at least one event and includes a time indication of said event. The video coordinator (4) is configured for: - determining at least two partial views of the scene, each partial view covering at least a part of the scene, and - allocating a video generator to each partial view. Each video generator (3) is configured for: - generating, on the basis of the metadata from the scene information stream, a video stream for the allocated partial view, - transforming at least one time indication of the scene information stream into timestamps configured for synchronously rendering the video stream, - assigning the timestamps to the video stream, and - transmitting at least one of said video streams together with its assigned timestamps.

Description

Video streams

Field of the Invention

The present invention relates to video streams. More in particular, the present invention relates to providing multiple video streams of a scene.

Background of the Invention

It is known to remotely watch scenes, for example on a television or computer screen, while being able to choose a camera producing a particular view of the scene. A scene may be shot from several camera locations or camera angles simultaneously, each camera having its own camera location or camera angle and therefore producing its own view of the scene. When multiple cameras offering multiple different views are provided, a spectator is able to choose a view of her liking, for example a view of one of the goals when watching a football match.

Not only real scenes can be watched using multiple different views, but also virtual scenes, such as scenes of computer games. In computer games it is known for spectators to share the view of one of the players, or to be offered an overview of the game. However, the number of alternative views is typically very limited.

SourceTV by Valve Corporation (https://developer.valvesoftware.com/wiki/SourceTV) in principle allows a very large number of spectators to watch online games, each spectator being able to choose her own view. This is achieved by distributing game data via a network of distributed servers and proxies to spectator clients. Each proxy can serve up to 255 spectators. This means that for allowing very large numbers of spectators to simultaneously watch a game, the required number of servers and proxies can become prohibitive. In addition, each spectator requires her own spectator client, a client device for non-participants. As each spectator client requires a substantial amount of processing and therefore uses a significant amount of energy, the SourceTV solution is also inefficient with regard to the use of energy.

There is a need for a solution, preferably comprising an apparatus and a method, which allows a large number of spectators to view a scene, such as a scene of a computer game, from different (virtual) viewpoints chosen by the spectators, which solution has improved scaling properties.

Summary of the Invention

It is an object of the present invention to solve this problem by generating a plurality of video streams based on a single scene, allowing each spectator to select one or more of the produced video streams, while providing timestamps in those video streams for synchronization. This allows a stable composite video image to be formed from two or more synchronized video streams. Spectators may compose a desired video image from the images (or image frames) of the available video streams, thus limiting the number of video streams to be produced. By using the invention, video streams relating to scenes can be easily scaled without requiring substantial amounts of hardware.

More in particular, the invention provides a method of producing multiple video streams by using at least one scene information stream relating to a scene, wherein the scene information stream comprises metadata descriptive of at least one event, said metadataincluding a time indication of the event, the method comprising:

determining at least two partial views of the scene, each partial view covering at least a part of the scene,

- generating, on the basis of the metadata from the scene information stream, a video stream for each of the partial views,

transforming the at least one time indication of the scene information stream into timestamps configured for synchronously rendering the video streams,

assigning the timestamps to the video streams, and

- transmitting at least one of said video streams together with its assigned timestamps.

By determining at least two partial views of the scene, each partial view covering at least part of the scene, at least two parts of the scene can be offered to the spectators, each of those parts of the scene being represented by a separate video stream generated on the basis of the metadata from the scene information stream.

It is possible to combine two or more of such partial views into a new composite partial view, for example by combining adjacent or overlapping views, optionally in combination with cropping of the (partial) views and/or of the composite (partial) view. This allows the number of views to be significantly larger than the number of video streams, thus saving video stream generating apparatus and reducing the amount of data to be transmitted. However, this also requires such composite views to be precisely synchronized to avoid any "jumps" when, for example, a game character crosses the boundary of two adjacent partial views presented to the same spectator.

In order to achieve such a precise synchronization of the video streams of the partial views, the present invention advantageously uses time indications normally present in a scene information stream and transforms these time indications into timestamps which can be used for synchronization during the rendering and/or playing out of the views. More in particular, in accordance with the invention the time indications of the scene information stream are transformed into timestamps configured for synchronously rendering the video streams. Thus the video can be synchronized so as to provide synchronized partial views. In addition, by assigning the timestamps to the video streams, and transmitting each video stream together with its assigned timestamps, it is ensured that the timestamps are present in the video streams so as to allow synchronization of the rendered streams.

It is noted that determining the partial views is preferably carried out by the entity providing the video stream service and may be independent from user input. That is, the partial views may not be determined by users. Instead, user views may be composed from the partial view offered (e.g. a user device may generate a user view by requesting one or more partial views). Such user view may be generated by for example stitching (synchronized) video frames related to different partial views, and/or by cropping the desired user view from video data related to the one or more partial views. By composing a desired view from available partial views, it is no longer necessary to provide a virtually unlimited number of partial views.

As the partial views cover at least part of the scene, they may cover the entire scene.

However, it is preferred that at least most partial views cover only part of the scene. It is further preferred that each partial view covers a different part of the scene, although partial views may partially overlap. In some embodiments, it can be advantageous to provide at least one view of the entire scene.

Transforming the time indications into timestamps may be carried out in several ways. In an embodiment, transforming the time indications comprises applying a linear transform. However, other transforms are also possible, such as transforms in which a single time indication influences multiple timestamps. The transform may involve a calculation or a look-up table.

In an embodiment, the scene information stream is generated by a computer-generated game. That is, the invention can advantageously be used to provide video streams of computer game scenes. However, the scene information stream may also originate from a motion picture, such as a computer-generated motion picture.

At least two partial views may overlap. By providing overlapping views, the number of (partial) video streams can be reduced. Some partial views may entirely be constituted by parts of other partial views. The method may comprise composing a view to be displayed by using one or more of said partial views.

The so-called camera position of the partial views can be suitably chosen. The camera position refers to the angle and the perspective to the views. In an embodiment, the partial views correspond with an infinite distance camera position. That is, the partial views have the perspective corresponding with an infinite camera position. The angle of the camera position relative to the horizontal plane of the views may be suitably chosen, and may vary from 0° to 90°, for example 60°.

In some embodiments, the transmitting may be based on HTTP adaptive streaming. In such embodiment, the method may further comprise requesting the at least one video stream by using a spatial manifest structure.

The invention also provides a software program product comprising instructions allowing a processor to carry out the method described above. The software program product may be stored on a tangible carrier, such as a DVD or a USB stick. Alternatively, the software program product may be stored on a server from which it may be downloaded using the Internet. The software program product contains software instructions which can be carried out by the processor of a device, such as a server, a user device (for example a smartphone), and/or a monitoring device.

The invention further provides a spatial manifest data structure configured for use in the method described above, and in particular configured for producing multiple video streams from a scene information stream. More in particular, the spatial manifest data structure comprises

Identification information for selecting one or more video streams, each video stream representing a partial view of a scene, and position information representing the position of each of the partial views in the scene, wherein said spatial manifest structure is preferably configured for retrieving said one or more video streams using HTTP adaptive streaming, wherein said one or more video streams each comprise a plurality of chunks, and wherein said identification information comprises chunk identifiers for selecting said chunks to request transmission thereof. The spatial manifest data file preferably allows time synchronized video frames from the at least two video streams to be combined into a video frame for display. The invention still further provides an apparatus configured for generating multiple video streams on the basis of the metadata from at least one scene information stream relating to a scene, wherein the scene information stream comprises metadata descriptive of at least one event, the metadata including a time indication of said event, the apparatus comprising a video coordinator and at least two video generators, the video coordinator being configured for:

determining at least two partial views of the scene, each partial view covering at least a part of the scene, and

allocating each partial view to a video generator; and

each video generator being configured for:

- generating, from the metadata of the scene information stream, a video stream for the allocated partial view,

transforming the at least one time indication of the scene information stream into timestamps configured for synchronously rendering the video stream,

assigning the timestamps to the video stream, and

- transmitting the video stream together with its assigned timestamps.

By determining at least two partial views of the scene, each partial view covering at least part of the scene, and by allocating a video generator to each partial view, at least two different parts of the scene can be produced, each part of the scene being represented by a video stream generated from the scene information stream by a separate video generator.

As mentioned above, it is possible to combine two or more (partial) views into a new composite (partial) view, for example by combining adjacent or overlapping views. This allows the number of partial views to be significantly larger than the number of video generators.

In an embodiment, each video generator is configured for rendering video frames of the allocated partial view by using said metadata from said scene information stream, and encoding the video frames into a video stream. That is, each video generator is capable of generating video frames of the particular partial view allocated to the video generator and producing a video stream containing those frames.

In an embodiment, each video generator is configured for splitting (that is, segmenting) the video stream into time segments, preferably time segments suitable for HTTP-based adaptive streaming. A non-limiting example of HTTP-based adaptive streaming is MPEG DASH.

Different embodiments of the apparatus according to the invention are possible. In an embodiment, at least one video generator comprises:

at least one spectator client configured for generating the video stream for the allocated partial view,

- a control unit configured for controlling the transforming of the time indications of the scene information stream into timestamps, and

at least one encoder configured for encoding the video stream and

at least one multiplexer for assigning the timestamps to the video stream.

In this embodiment, different functions are carried out by different functional units. The control unit of the video generator can carry out the controlling of the transforming of the time indications. In some embodiments, the control unit of the video generator may not, or not only, control the transforming of time indications carried out by other units, but may carry out the transforming itself. That is, in some embodiments the control unit of a video generator can be configured for controlling the transforming of the time indications of the scene information stream into timestamps.

Advantageously, the at least one multiplexer of each video generator may be configured for segmenting the video stream into time segments. This allows time-segmented video streams to be produced (that is, generated).

In an embodiment, at least one video generator comprises:

at least one streaming client configured for generating the video stream for the allocated partial view,

a control unit configured for controlling the transforming of the time indications of the scene information stream into timestamps,

at least one encoder configured for encoding the video stream, and

at least one multiplexer for assigning the timestamps to the video stream.

It is noted that in this embodiment, the spatial manifest data file may advantageously be used by the spectator client of the video generator, preferably in addition to its regular end user use.

The time indications may be transformed into timestamps in various ways. It is preferred, however, that each video generator is configured for linearly transforming the time indications into timestamps. A linear transform has the benefits of efficiency and simplicity. A special embodiment of a linear transform is the identity transform, which may be used in some embodiments. However, a linear transform of the type y = a.x + b, with both a and b non-zero, is preferred. Those skilled in the art will understand that various other transformations may also be utilized, and that both linear transforms and non-linear transforms may be implemented by using a look-up table.

In embodiments of the apparatus described above, the video coordinator may comprise: - a rendering coordinator configured for coordinating the partial views of the video

generators,

an encoding coordinator configured for coordinating settings used for the encodings, a segmentation coordinator configured for coordinating the segmenting into time segments of the video streams and preferably also for producing manifest files describing relationships between the video streams, and

a video coordinator control unit configured for controlling the rendering coordinator, the encoding coordinator and the segmentation coordinator.

It is noted that the video coordinator controls the video generators mentioned above.

The apparatus may further comprise a scene information server configured for producing the scene information stream and a timer configured for supplying the time indications to the source server.

The invention yet further provides a system for supplying multiple video streams, the system comprising:

a communication network,

- at least one video streaming client connected to the communication network, and the apparatus as described above, the apparatus also being connected to the communication network.

In some embodiments of the system, the at least one video streaming client is not constituted by a separate unit, but by the video streaming client of the video generator (recursive embodiment).

The at least one video streaming client is preferably configured for retrieving video streams, preferably using a manifest file according to the MPEG DASH SRD standard format. It will be understood that the at least one video streaming client may additionally, or alternatively, be configured for retrieving a video stream according to another standard.

In an embodiment, the apparatus of the system is configured for transmitting selection information indicative of the partial views to the scene information server, and the scene information server is configured for transmitting partial scene information streams to the respective video generators. By selectively providing scene information, the amount of data transmitted by the scene information server can be further reduced.

The system described above may further comprise at least one display device connected to the at least one video streaming client. The at least one display device may be a separate device, or may be integral with at least one video streaming client.

It will be understood that for the purpose of the present invention the term video streaming client may refer to a device comprising a processor, a memory, and a receiver for receiving and/or transmitting a video stream according to the invention, as the case may be. Non-limiting of such as device are tablets, smartphones, television sets, game consoles, set-top boxes, as will readily be understood by those skilled in the art.

Brief description of the drawings

The present invention will further be explained with reference to exemplary embodiments illustrated in the drawings, in which:

Fig. 1 schematically shows a system for supplying multiple video streams in accordance with the invention.

Fig. 2 schematically shows a matrix of partial views which together constitute a composite view of a scene as used in the invention.

Fig. 3 schematically shows an exemplary embodiment of a video generator in accordance with the present invention.

Fig. 4 schematically shows an exemplary embodiment of a video coordinator in accordance with the present invention.

Fig. 5 schematically shows an alternative embodiment of a video generator in accordance with the present invention.

Fig. 6 schematically shows an exemplary embodiment of a method in accordance with the invention.

Fig. 7 schematically shows a transformation of time indications into timestamps in accordance with the invention. Fig. 8 schematically shows an exemplary embodiment of communication between a video coordinator and video generators in accordance with the present invention.

Fig. 9 schematically shows an exemplary embodiment of communication between a scene information server, a video generator and a CDN ingest node in accordance with the present invention.

Fig. 10 schematically shows an exemplary embodiment of communication between a CDN delivery server, a recursive video generator and a CDN ingest node in accordance with the present invention.

Fig. 1 1 schematically shows an exemplary embodiment of a computer program product in accordance with the invention.

Detailed description of embodiments

The present invention makes it possible to produce a plurality of video streams from a single scene, while providing timestamps in those video streams for synchronization. This allows a stable composite video image to be formed from two or more synchronized video streams. Spectators may therefore compose a desired video image from the available video streams. By using the invention, video streams relating to scenes can be easily scaled without requiring substantial amounts of hardware. In this document, the term scene may refer to an image of a source video.

An exemplary embodiment of a system according to the invention is schematically illustrated in Fig. 1. The system shown merely by way of non-limiting example in Fig. 1 comprises an apparatus 10 for producing (that is, generating) multiple video streams, a communication network 5, video streaming clients 6 and display devices 7. The apparatus 10 is shown to comprise a scene information server 1 , a timer 2, video generators 3 and a video coordinator 4.

The scene information server 1 can be configured for producing a scene information stream relating to a scene, such as a scene of a computer game or a scene of a film. The scene information stream produced by the scene information server 1 can contain metadata describing a scene, in particular events of a scene, but will typically not contain image data. It can thus considered to be a metadata stream. E.g. a stream containing only metadata. The scene information stream contains metadata descriptive of events of the scene, including one or more time indications, each time indication relating to an event in the scene. In the embodiment of Fig. 1 , a timer 2 is present to provide a time reference for the time indications. In other embodiments, the system clock of the server 1 may be used for this purpose. In some embodiments of the invention, the timer 2 is an integral part of the scene information server 1.

An event in a scene may for example relate to a door opening, a shot being fired or a body falling on the ground. Each event is provided with a time indication to allow synchronization at a later stage. The scene information server 1 inserts the time indications in the scene information stream sent to the video generators 3. As is be clear from the above, an event may be described as an action, user or computer generated, which causes a change, over time in the video data associated with a scene. The action may thus be described using metadata and may be used to generate (computer generated) images forming a video stream (such as a networked video game). In accordance with the invention, the apparatus 10 comprises at least two video generators. Each video generator 3 is configured for generating, on the basis of the metadata from the scene information stream output by the scene information server 1 , a video stream for a partial view of the scene. Each partial view covers at least a part of the scene, while the partial views may together cover the entire scene. That is, each video generator 3 can be said to be configured for producing a partial video stream. This allows spectators to select one or more of the produced partial video streams for rendering.

In addition to generating a video stream, a video generator has the task of allowing its video stream to be properly synchronized with the video streams of other video generators. To this end, a video generator is configured for transforming the time indications of the scene information stream into timestamps, assigning the timestamps to the respective video stream, and transmitting both the video stream and the timestamps assigned to it. In some embodiments, the timestamps can be embedded in the video stream, thus ensuring that the timestamps are transmitted together with the video stream. The timestamps can be used to synchronize the rendering of the video streams and, as mentioned before, are derived from events in the scene information stream. In embodiments of the invention the timestamps are derived from the time indications by a linear transformation, but the invention is not so limited.

In embodiments of the invention, the video generators 3 are also configured for generating video frames of the allocation partial view by using the metadata of the scene information stream. That is, each video generator uses its part of the metadata of the scene information stream to generate video frames which can later be displayed. The video generators may be further configured for encoding those video frames into a video stream, thus producing a video stream from the video frames. The video generators 3 transmit the video streams, via the network 5, to video streaming clients 6 of the spectators. The video streaming clients 6 use the display devices 7 to display the partial views represented by the video streams.

A video coordinator 4 is connected with the video generators 3 to coordinate the production (that is, the generation) of the video streams, for example by assigning partial views to the respective video generators. More in particular, the video coordinator can be configured for determining two or more partial views of the scene, each partial view covering at least part of the scene, and allocating a video generator to each partial view (or, conversely, to allocate a partial view to each video generator). In a typical embodiment, only a single video generator is allocated to each partial view in order to save resources.

A scene which can be displayed using the present invention is schematically illustrated in Fig. 2. The scene 20 is shown to be divided into 56 partial views. In the embodiment shown, the partial views all have the same size, but in other embodiments the sizes of the partial views may vary. Some partial views may cover, for example, a quarter of the scene, while others may cover only 1/56^th of the scene, as shown in Fig. 2, or even less. In the example shown, the partial views have numbers relating to their coordinates in the rectangular grid.

The partial views of the scene can have different "camera positions" or viewpoints of virtual cameras. In embodiments of the invention, the viewpoint of all partial views is at an acute angle of the scene, for example 30 to 45 degrees, while the distance is "infinite". The scene coverage of the partial views is in typical embodiments different for each partial view.

An exemplary embodiment of a video generator according to the invention is schematically illustrated in Fig. 3. The video generator 3 of Fig. 3 is shown to comprise a video generator control unit 31 , a spectator client 32, an encoder 33, a multiplexer 34 and an optional wall clock 35.

The spectator client 32 is configured for receiving at least part of the scene information stream, that is, at least the part of the partial view to which the video generator 3 was allocated. In some embodiments, the video generator 3, and hence the spectator client 32, can receive the entire scene information stream, while in other embodiments the video generator 3 only receives a selected part of the scene information stream. In an embodiment, the video generator 3, or another unit of the apparatus 10, is configured for transmitting selection information to the scene information server 1 , which selection information allows the video generator 3 to receive only the part of the scene information stream that pertains to the partial view to which the video generator was allocated. In such an embodiment, the scene information server can be configured for transmitting partial scene information streams to the respective video generators.

As mentioned above, the spectator client 32 receives a (partial or whole) scene information stream and outputs video frames. In typical embodiments, the spectator client 32 outputs those video frames together with time indications to the encoder 33. These time indications were already present in the scene information stream and are in those embodiments passed on by the spectator client. In other embodiments, the spectator client 32 outputs the video frames to the encoder 33 and the time indication to the control unit 31 , where the time indications are transformed into timestamps, which are then output to the multiplexer 34.

The encoder 33 receives the video frames from the spectator client 32 and outputs encoded video frames to the multiplexer 34. The multiplexer 34 multiplexes the encoded video frames into containers suitable for transmission and/or storage. The output of the multiplexer may comprise an MPEG transport stream, an RTP stream or an ISOBMFF.mp4 type file, for example. The multiplexer 34 may be constituted by a segmenter (formatter, packager), which produces a stream consisting of files. The video may be generated in a number of different quality levels, and each of these quality levels may comprise a plurality of segments (which may also be referred to as chunks or fragments).

In some embodiments, the multiplexer 34 also transforms the time indications into timestamps and adds those timestamps to the containers. This transformation is schematically illustrated in Fig. 7. Suitable timestamps are so-called presentation timestamps, but the invention is not limited to presentation timestamps.

The transformation of time indications into timestamps may for example be pipeline-based, API-based, delay-based or based upon a combination of two or three of these techniques.

In a pipeline-based embodiment, the spectator client 32, the encoder 33 and the multiplexer 33 constitute a pipeline, for example a GStreamer pipeline (see http://qstreamer.freedesktop.org). In such an embodiment, the spectator client 32 provides raw frames (for example in the RGB or YUV format, RGB and YUV being well-known color spaces), each frame having a time indication which corresponds with the game time (assuming the scene is a computer game scene). The encoder 33 encodes the raw frames into encoded frames and tracks frame numbers, for example by using a counter. The multiplexer adds (presentation type or other) timestamps and puts the resulting data into containers. Special transformation functions may be added between the spectator client 32 and the encoder 33, and/or between the encoder 33 and the multiplexer 34, to transform the time indications into frame numbers, and to transform frame numbers into (presentation) timestamps. The video generator control unit 31 coordinates these transformation functions and controls their settings.

In an API-based embodiment, the video generator control unit 31 is an application which controls the spectator client 32, the encoder 33 and the multiplexer 34 via an API (Application Programming Interface). As soon as the spectator client 32 has produced a new raw data frame, it informs the control unit 31. In response, the control unit 31 transforms the time indication of the raw frame into a frame number, and the frame is sent to the encoder 33. Similarly, the control unit 31 may transform the frame number into a timestamp when the encoded frame is transferred from the encoder 33 to the multiplexer 34.

In a delay-based embodiment, the control unit 31 controls (and/or knows) the processing time (that is, the delay) induced by the spectator client 32, the encoder 33 and the multiplexer 34. The control unit 31 receives the time indications (typically equal to the game time in game-based embodiments) from the spectator client 32 and determines the (presentation or other) timestamp to be inserted by the multiplexer 34. The transformation may in an embodiment of this type, be described by the general formula y = a.x + b - D, where D is the delay induced by the spectator client 32, the encoder 33 and the multiplexer 34, and where a and b are suitable constants.

Other embodiments may combine features of at least two of the pipeline-based, the API-based and the delay-based embodiments.

The control unit 31 of the video generator 3 is configured for exchanging information with the video coordinator 4 (see Fig. 1 ). The information the control unit 31 receives from the video coordinator 4 is, for example, the identification of the partial view.

The wall clock 35 which is present in this embodiment serves to provide time references for producing the timestamps. In game-based applications, in which the scene is produced by a computer game, the spectator client 32 may be synchronized with the game clock, which ensures that the frame generation rate is synchronous with the game time. If the spectator client 32 is not synchronized with the game clock, a buffer may be used between the spectator client 32 and the encoder 33 to ensure that a frame is only sent to the encoder when the frame is complete.

Although only a single spectator client 32 is illustrated in the example of Fig. 3, the invention is not so limited, and a video generator may include two or more spectator clients. Each video generator may therefore contain multiple spectator clients, thus being able to render multiple virtual camera views in parallel.

Additionally, or alternatively, each video generator may contain multiple encoders, thus being able to generate multiple resolutions, multiple quality levels and/or multiple bitrates.

It is noted that a video generator may be constituted by a hardware unit, such as a hardware unit comprising a microprocessor, or by a software unit such as a thread, a process or a virtual machine running on a computer host. Similarly, other units such as a video coordinator may also be implemented in hardware and/or software.

An exemplary embodiment of a video coordinator 4 is schematically illustrated in Fig. 4. The embodiment of Fig. 4 comprises a video coordinator control unit 41 , a rendering coordinator 42, an encoding coordinator 43, a multiplexing coordinator 44, and an optional wall clock 45.

The rendering coordinator 42 determines the positions of the virtual cameras defining the partial views, the directions of the virtual cameras and their fields of view. These parameters may be determined once, for example at the beginning of a game, but may in some embodiments be changed, for example during a game.

The encoding coordinator 43 determines the settings used for the encoding in the encoder 33, while the multiplexing coordinator 44 produces manifest files (MFs) which describe the structure of the partial views (the so-called tiling). In some embodiments, the multiplexing coordinator 44 describes for each partial view its name, virtual camera direction and position, timing, bitrate, video quality and/or other parameters. The manifest files are transmitted via the network (5 in Fig. 1 ). In an embodiment, a manifest file may be a spatial representation description (SRD) as specified in amendment 2 of MPEG-DASH part 1 (ISO/IEC 23009-1 ).

The video coordinator control unit 41 coordinates the rendering coordinator 42, the encoding coordinator 43, and the multiplexing coordinator 44. For example, the control unit 41 may cause the multiplexing coordinator to be in possession of the settings of the rendering coordinator and the encoding coordinator. It also communicates settings between video generators (3 in Fig. 1 ). In software-implemented embodiments, the control unit 41 may also create or delete video generators. When the game action moves to another area, a new video generator for a new partial view may be created, while an existing video generator may be deleted when there are no spectators for its partial view.

The wall clock 45 is, in a typical embodiment, synchronized with the wall clocks of the video generators. As a result, all processes with the video generator are synchronized, as a result of which buffer overflows or underruns will be avoided.

An alternative embodiment of a video generator 3 is schematically illustrated in Fig. 5. The embodiment of Fig. 5 also comprises a video generator control unit 31 , an encoder 33, a multiplexer 34 and an optional wall clock 35. However, instead of a spectator client 32, the embodiment of Fig. 5 comprises a streaming client 32'. This allows this embodiment to be recursive, as the output of a recursive video generator may be used as input for another recursive video generator. In other words, received video streams are decoded and then go to the past processing phase of stitching.

In some embodiments, the streaming client 32' of the (recursive) video generator 3 and the video streaming client 6 (see Fig. 1 ) may be the same. This allows the resources required for providing video streams in accordance with the invention to be further reduced.

Controlled by the video generator control unit 31 , the streaming client 32' selects a set of partial views, receives the video streams relating to those partial views, decodes these partial video streams, stitches the decoded partial video streams together and forwards the result to the encoder 33. This embodiment has the advantage that a streaming client requires less processing than a spectator client. Another advantage of this embodiment is that it may introduce less delay.

An embodiment of the method according to the invention is illustrated in Fig. 6. The method 60 of Fig. 6 comprises an initial or start step 61 , in which the method is initiated.

In step 62, at least two partial views of the scene are determined, each partial view covering at least part of the scene. In the following step 63, a video stream is generated from the scene information stream for each of the partial views. In step 64, the time indications of the scene information stream are transformed into timestamps configured for synchronously rendering the video streams. In step 65, the timestamps are assigned to the video streams, while in step 66 each video stream is transmitted together with its assigned timestamps. The method may end in step 66. It will be understood that the method 60 may be repeated as desired.

As mentioned before, the transformation of time indications into timestamps is schematically illustrated in Fig. 7. The transformation, which is carried out by each video generator 3, is preferably a linear transformation of the type y = a.x + b, where y is a timestamp, x is a time indication, and a and b are suitably chosen constants.

Fig. 8 shows an example of communication between a video coordinator 4 and video generators 3. First, the video coordinator 4 determines how it wants the spectator video be provided (for example how the spectator video is to be encoded, segmented and how the partial views are to be configured). These values may be pre-configured, or set by a human operator. Next, the video coordinator provides dedicated video generator instructions to each video generator. Each of the video generators starts the instructed video generating process (or processes) and confirms its successful start-up to the video coordinator by sending a message (labelled "200 OK" in the present example). Once the video coordinator 4 has received confirmation from all video generators 3, it publishes the manifest file (also referred to as spatial manifest data file). Users (that is, spectators) can now interactively watch the game.

There are various ways video generator instructions may be distributed. The video coordinator may push the video generator instructions via an HTTP GET message or a previously established websocket connection. It may also be carried out in pull-based manner, where video generators retrieve the video generator instructions from the video coordinator. Messaging and signalling protocols may be used as well, for example XMPP or SIP.

The code below provides an exemplary embodiment of a video generator instruction. The video generator uses the video generator instruction to configure its spectator Client, encoder and multiplexer (see Fig. 3) in order to generate segments for partial view A31 , see Fig. 2. The video generator instruction is formatted using XML, but a skilled person may use some other suitable formatting type, such as JSON or ASN.1.

<video_generator_instruction>

<timing_instruction>

<wall_clock_server_address url="wallclockserver.example <time_zero_reference utc="2015-09-18T09:30: 10Z"/>

<video_frame_rate fps="60" />

</timing_instruction>

<spectator_client_instruction>

<source_tv_proxy_address url="source_tv_proxy_1.example.com" />

<game_instance id="shoot_m_up_instance_qxw" />

direction phi="0" theta="180" psi="0" />

<field_of_view hor="48" vert="36" />

</camera>

<render_section hor="5760" vert="0" width="1920" height="1080" />

</screen>

</spectator_client_instruction>

<encoding_instruction>

<video_codec_parameters codecs="avc1.4d0228 "/>

</encoding_instruction>

<segmenting_instruction>

<ingest_node url="cdn_ingest.example.com" ingest_method="FTP" username="admin" password="admin" />

<file_name base_name="shoot_m_up_instance_qxw_A31_" start_value="0000" increment="1 " extension=".mp4" />

<segment_parameters duration="2">

</segmenting_instruction>

<video_generator_instruction>

The exemplary video generator instruction has four elements:

• <timing_instruction>. This element provides generic instructions about timing. It has the following sub-elements: o <wall_clock_server_address>. This element provides the URL attribute of the wall clock server that the video generator can use to synchronise its wall clock.

o <time_zero_reference>. This element provides the UTC time attribute that is used as "time zero" for the rendering, encoding and segmentation processes in the video generators.

o <video_frame_rate>. This element provides the frame rate attribute of the video generated by the video generator expressed in frames-per-second (fps).

<spectator_client_instruction>. This element provides instructions to the spectator client of the video generator. It has the following sub-elements:

o <source_tv_proxy_address >. This element provides the URL attribute of the Source TV

Proxy to which the Spectator Client connects in the present example,

o <game_instance>. This element identifies the game instance with an id attribute. It is used to distinguish game instances to the Source TV Proxy,

o <camera>. This element provides attributes of the virtual camera. Examples of such

attributes are:

^■ Position, for example 1000 km above the scene.

^■ Direction, for example pointing straight down.

^■ Lens parameters, such as focal length and aperture.

^■ Field of view, for example 48 degrees horizontally and 36 degrees vertically.

^■ Sensor, e.g. a 100 ISO sensor sensitivity.

The element can also provide attributes about the projection that is used to map the virtual world, e.g. the Mercator projection.

o <screen>. This element provides parameters about the virtual screen to which the

Spectator Client renders its game data and events. It may include the following sub- elements:

^■ <resolution>. The overall screen resolution for this Spectator Client, for example 7680 x 4320 pixels.

^■ <render_section>. This element identifies the section of the overall screen that should be rendered by the spectator client. In this embodiment, it is a rectangular area, of which the top-left pixel has pixel coordinates (5760,0) and a resolution of 1920 x 1080.

<encoding_instruction>. This element provides instructions to the encoder of the video generator. It has the following sub-elements (note that the video frame rate has already been provided as attribute in the generic <timing_instruction> element):

o <resolution>. This is the resolution to be used for the video encoding, 1920 x 1080 pixels in this embodiment.

o <video_codec_parameters>. This elements provides video codec parameters, for

example the codec and codec profile used.

<segmenting_instruction>. This element provides instructions to the Multiplexer of the video generator. It has the following sub-elements: o <ingest_node>. This element provides instructions to ingest segments into a Content Delivery Network (CDN). This embodiment uses File Transfer Protocol (FTP) as ingest method, it provides the address of the FTP server, as well as a username and password. A skilled person may also use other CDN ingest methods, like HTTP or a websocket. o <file_name>. This element provides the information that the multiplexer needs to generate the file names of the segments (start value 0000, increment 1 ). In this exemplary embodiment, the first generated segment is shoot_m_up_instance_qxw_A31_0000.mp4, the second is shoot_m_up_instance_qxw_A31_0001.mp4, etcetera. The .mp4 extension indicates that an ISOBMFF container is to be used. The skilled person may use other extensions and its associated containers, for example .ts .avi, .mov, .wmv or another. o <segment_parameters>. This element provides parameters of the segment. This

embodiment provides an attribute on the duration of a segment, for example 2 seconds.

The above embodiment focusses on video. A skilled person could apply the same approach to audio. For example, a virtual microphone could be placed and oriented in the virtual world, and audio parameters could be provided, including an audio sample rate, for example 96000 samples per second, and an audio codec, for example "mp4a.40.2". The audio could be provided separately, for example in a generic way for the whole virtual world or in a specific way to an identified partial view (which may also be referred to as video tile). The audio could also be integrated with the partial view and provided in the same .mp4 container as the video.

Fig. 9 schematically shows an embodiment of the streaming inputs and outputs of a video generator when it is generating partial views. The video generator 3 is tuned to a broadcast of game data and events (e.g. the broadcast being the scene information stream, and the game data (e.g. time indication) and events being the metadata), for example provided by the Source TV Proxy, which may be a Source TV proxy according to the prior art. Alternative embodiments use joining a multicast of game data and events, or retrieving these via unicast. If the virtual world is very large, and the generated partial view represents only a small portion of the virtual world, optimizations can be made to retrieve only a limited (filtered) set of game data and events. For example, the spectator client could provide the coordinates of the area of the virtual world that it is interested in, and the Source TV Proxy would provide only those game data and events relevant to that area.

Once the spectator client of the video generator 3 has received sufficient game data and events from the scene information server 1 , the video generator 3 can render one or more frames of the partial view. The frames are encoded by the encoder and segmented by the multiplexer (see Fig. 3). The multiplexer then uploads a generated segment with the correct file name to the CDN Ingest Node 8 indicated in the video generator Instruction. This process is repeated for each subsequent segment. The process illustrated in Fig. 9 can be coordinated by a video coordinator 4 (see Fig. 1 ).

The code below provides an embodiment of a video generator Instruction to a recursive video generator (see also Fig. 10). The video generator has a video streaming client to generate partial view B21. The video streaming client preferably uses a dedicated manifest file ("for internal use") to learn which partial views are available and what their properties are. It combines this information with the information from the video generator instruction. It then deduces that it needs to download partial views A31 , A32, A41 and A42. It uses these four partial views to compose a single partial view and reduces the resolution as instructed. Next, the newly generated partial view is encoded and segmented similar to the previous embodiment.

<video_generator_instruction>

<timing_instruction>

<video_frame_rate fps="60" />

</timing_instruction>

<video_streaming_client_instruction>

<render_section hor="3840" vert="0" width="3840" height="2160" />

</screen>

</video_streaming_client_instruction>

<encoding_instruction>

<video_codec_parameters video_codec="H.264" profile="high_4:2:2" />

</encoding_instruction>

<segmenting_instruction>

<file_name base_name="shoot_m_up_instance_qxw_B21_" start_value="0000" increment- "! " extension- '. mp4" />

<segment_parameters duration="2" />

</segmenting_instruction>

<video_generator_instruction> This embodiment is similar to the previous embodiment, the differences being:

• <timing_instruction>. Wall clock and zero time reference may be omitted, as the timing of the outgoing partial views should be the same as for the incoming ones.

• <video_streaming_client_instruction>. This element provides instructions to the video streaming client of the recursive video generator. It has the following sub-elements:

o <mpd>. This element provides the URL attribute to retrieve the manifest file, o <screen>. Similar to previous embodiment.

• <encoding_instruction>. Similar to the previous embodiment. • <segmenting_instruction>. Similar to the previous embodiment.

The above embodiment focusses on video. A skilled person could apply the same approach to audio. For example, the output audio could be a weighted average of the input audio from the composing partial views, a selected subset of those (zero, one or more), or a newly created downmix of the different available sound channels.

Fig. 10 schematically shows an embodiment of the streaming inputs and outputs of a recursive video generator (see Fig. 5) when it is generating partial views. The recursive video generator 3 retrieves the manifest file (MPD) from a server, e.g. a CDN Delivery Node 9, using the URL provided in the video generation instruction. In some alternative embodiments, the manifest file is retrieved from the video coordinator (4 in Fig. 1 ), the manifest file is pushed by the video coordinator to the video generation client, or the manifest file is provided as part of the video generation instruction. When the manifest file is received, it is analysed using the video generation instruction. It is determined which segments should be retrieved, and these are retrieved from a server, for example a CDN Delivery Node, which is not necessarily the one that provided the manifest file. In the present embodiment, segments of partial views A31 , A32, A41 and A42 a retrieved. Subsequently, frames are rendered and encoded, and a segment of partial view B21 is generated and uploaded to the CDN Ingest Node 8. This process is repeated for each subsequent output segment. The process illustrated in Fig. 10 can be coordinated by a video coordinator 4 (see Fig. 1 ).

Once all video generators have started generating partial views (e.g. video streams comprising partial views of a scene), the video coordinator publishes the manifest file to the users (that is, the spectators). This publication could be, for example, a publication of a hyperlink on a website, that hyperlink pointing to a location where the manifest file can be retrieved. The publication could also be the pushing of the hyperlink or manifest file to the subscribed user devices. A user device may have a video streaming client that parses the manifest file and starts retrieving the relevant segments, depending on the navigation by the user (spectator) through the virtual world.

The code below provides an embodiment of a manifest file (partial) as provided to the user, and may be used by the video streaming clients 6 shown in Fig. 1. Note that this manifest file is a standards-compliant MPD, following the MPEG-DASH-SRD standard (ISO/I EC 23009-1 :2014 Amd.2) and the associated MPEG-DASH standard (ISO/IEC 23009: 1 :2014).

<?xml version="1.0" encoding="UTF-8"?>

<MPD

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xmlns="urn:mpeg:dash:schema:mpd:201 1 "

xsi:schemaLocation="urn:mpeg:dash:schema:mpd:201 1 DASH-MPD.xsd"

[^■■■]>

<BaseURL>cdn. example. com</BaseURL>

<AdaptationSet mimeType="video/mp4" codecs="avc1.4d0228"> <SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="1 , 3840, 0, 1920, 1080, 7680, 4320, 37>

</SegmentList>

</Representation>

</AdaptationSet>

<EssentialProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="1 , 1920, 0, 1920, 1080, 3840, 2160, 27>

</SegmentList>

</Representation>

</AdaptationSet>

[^■■■]

</Period>

</MPD>

The manifest file provides, among others, the following elements:

• <MPD>. The manifest file as a whole.

• <Period>. The exemplary manifest file has a single period that lasts 7200 seconds (2 hours). A skilled person may include multiple periods, for example for advertisement insertion.

· <AdaptationSet>. This is an adaptation set for video with a specific encoding.

• <SupplementalProperty>. This supplementary property is specific to the MPEG-DASH-SRD:2014 standard. The value contains the following parameters.

o source_id=1. This identifies the (camera) source. This embodiment has only a single camera. Multiple cameras could be identified with multiple source_id.

o object_x=3840. This is the horizontal (x) coordinate of the top-left pixel,

o object_y=0. This is the vertical (y) coordinate of the top-left pixel,

o object_width=1920. This is the width of the partial view, expressed in number of pixels. o object_height=1080. This is the width of the partial view, expressed in number of pixels. o total_width=7680. This is the width of the overall screen, expressed in number of pixels. o total_height=4320. This is the height of the overall screen, expressed in number of pixels. <Representation>. This element contains the segments. The present AdaptationSet has only a single representation. A skilled person may include multiple representations, such that an adaptive streaming client (HAS client) can switch back to a lower bandwidth version of the stream when needed. The element has the following attributes:

o id="B31 ". An identifier that is unique in the context of the present MPD.

o bandwidth="5000000". Bandwidth expressed in bits per second (5 megabit per second). o width="1920". Width expressed in pixels.

o height- "! 080". Height expressed in pixels.

<SegmentList>. This element contains a list of segment URLs. Each segment lasts 2 seconds. <SegmentURL>. This element contains a single segment URL. Fig. 1 1 schematically shows a software program product 1 10 which contains instructions allowing a processor to carry out embodiments of the method of the invention. The software program product 1 10 may contain a tangible carrier, such as a DVD, on which the instructions are stored. An alternative carrier is a portable semiconductor memory, such as a so-called USB stick. In some embodiments, the tangible carrier may be constituted by a remote server from which the software program product may be downloaded, for example via the internet.

The use of partial views as disclosed in the present document is very similar to the use of so- called tiles in tiled video.

It will be understood that the description of the invention given above is not intended to limit the invention in any way. Singular nouns and the articles "a" and "an" are of course not meant to exclude the possibility of plurals. Devices mentioned in this document, such as smartphones, may be replaced with their successors, even if these successors are not yet known at the time of writing. As is well established in the law of patents, the abstract should never be used to limit the scope of the claims, and neither should reference numbers in the claims.

It will further be understood by those skilled in the art that the present invention is not limited to the embodiments mentioned above and that many additions and modifications are possible without departing from the scope of the invention as defined in the appending claims.

Claims

1. A method of providing multiple video streams by using metadata of at least one scene

information stream relating to a scene, wherein the metadata of the scene information stream is descriptive of at least one event, and the metadata includes at least one time indication of said event, the method comprising:

determining at least two partial views (A12; A22) of the scene, each partial view covering at least a part of the scene,

generating, on the basis of the metadata from the scene information stream, a video stream for each of the partial views,

- transforming the at least one time indication of the scene information stream into

timestamps configured for synchronously rendering the video streams,

assigning the timestamps to the video streams, and

transmitting at least one of said video streams together with its assigned timestamps. 2. The method according to claim 1 , wherein transforming the at least one time indication

comprises applying a linear transform.

3. The method according to claims 1 or 2, wherein the scene information stream is generated by a computer-generated game.

4. The method according to any of the preceding claims, wherein at least two partial views

overlap.

5. The method according to any of the preceding claims, further comprising composing a view to be displayed by using one or more of said partial views.

6. The method according to any of the preceding claims, wherein the partial views correspond with an infinite distance camera position. 7. The method according to any of the preceding claims, wherein the transmitting is based on

HTTP adaptive streaming, the method preferably further comprising:

requesting the at least one video stream by using a spatial manifest data structure.

8. A software program product (1 10) comprising instructions allowing a processor to carry out the method according to any of the preceding claims.

9. A spatial manifest data structure, preferably configured for use in the method according to any of claims 1 to 7, which spatial manifest data structure comprises:

Identification information for selecting one or more video streams, each video stream representing a partial view of a scene, and

position information representing the position of each of the partial views in the scene, wherein said spatial manifest structure is preferably configured for retrieving said one or more video streams using HTTP adaptive streaming,

wherein said one or more video streams each comprise a plurality of chunks, and

wherein said identification information comprises chunk identifiers for selecting said chunks to request transmission thereof.

0. An apparatus (10) configured for providing multiple video streams by using metadata of at least one scene information stream relating to a scene, wherein the metadata of the scene information stream is descriptive of at least one event and includes at least one time indication of said event, the apparatus comprising a video coordinator (4) and at least two video generators (3),

the video coordinator (4) being configured for:

allocating each partial view to a video generator; and

at least one video generator (3) being configured for:

generating, by using the metadata of the scene information stream, a video stream for an allocated partial view,

assigning the timestamps to the video stream, and

transmitting the video stream together with its assigned timestamps.

1. The apparatus according to claim 10, wherein at least one video generator (3) is configured for:

rendering video frames of the allocated partial view by using the metadata of the scene information stream, and

encoding the video frames into a video stream. 12. The apparatus according to claim 10 or 1 1 , wherein at least one video generator (3) is

configured for:

segmenting the video stream into time segments, preferably time segments suitable for HTTP-based adaptive streaming. 13. The apparatus according to any of claims 10 to 12, wherein at least one video generator (3) is configured for linearly transforming the time indications into timestamps.

14. The apparatus according to any of claims 10 to 13, wherein at least one video generator (3) comprises: at least one spectator client (32) configured for generating the video stream for an allocated partial view,

a control unit (31 ) configured for controlling the transforming of the time indications of the scene information stream into timestamps,

- at least one encoder (33) configured for encoding the video stream, and

at least one multiplexer (34) for assigning the timestamps to the video stream.

15. The apparatus according to any of claims 9 to 13, wherein at least one video generator (3) comprises:

- at least one streaming client (32^') configured for generating the video stream for the

allocated partial view,

at least one encoder (33) configured for encoding the video stream, and

- at least one multiplexer (34) for assigning the timestamps to the video stream.

16. The apparatus according to claims 14 or 15, wherein the at least one multiplexer (34) is further configured for segmenting the video stream into time segments. 17. The apparatus according to any of claims 10 to 16, wherein the video coordinator (4)

comprises:

a rendering coordinator (42) configured for coordinating the partial views of the video generators (3),

an encoding coordinator (43) configured for coordinating settings used for the encodings, - a segmentation coordinator (44) configured for coordinating the segmenting of the video streams into time segments, and

a video coordinator control unit (41 ) configured for controlling the rendering coordinator (42), the encoding coordinator (43) and the segmentation coordinator (44),

and wherein the segmentation coordinator (44) is preferably also configured for producing manifest files describing spatial relationships between the video streams, preferably the spatial relationships defining the spatial position of the partial views of each of the respective video streams in relation to the scene.

18. The apparatus according to any of claims 10 to 17, further comprising a scene information server (1 ) configured for producing the scene information stream and a timer (2) configured for supplying the time indications to the scene information server (1 ).

19. A system for providing multiple video streams, the system comprising:

a communication network (5),

at least one video streaming client (6) connected to the communication network, and the apparatus (10) according to any of claims 10 to 17, the apparatus also being connected to the communication network (5).

20. The system according to claim 19, wherein the at least one video streaming client (6) is

configured for retrieving at least one video stream, preferably using a manifest file according to the MPEG DASH SRD standard format.

21. The system according to claim 19 or 20, wherein the apparatus (10) is configured for

transmitting to the scene information server (1 ) selection information indicative of the partial views, and wherein the scene information server is configured for transmitting partial scene information streams to the respective video generators (3).

22. The system according to any of claims 19 to 21 , further comprising at least one display device (7) connected to the at least one video streaming client (6).