WO2018197743A1

WO2018197743A1 - Virtual reality viewport adaption

Info

Publication number: WO2018197743A1
Application number: PCT/FI2018/050141
Authority: WO
Inventors: Sujeet Shyamsundar Mate; Arto Lehtiniemi; Antti Eronen; Jussi LEPPÄNEN
Original assignee: Nokia Technologies Oy
Priority date: 2017-04-27
Filing date: 2018-02-27
Publication date: 2018-11-01
Also published as: GB2561882A; GB201706701D0

Abstract

The application relates to Virtual Reality content consumption, in particular, although not exclusively, 3DOF and 6DOF video viewing. According to a first aspect, the specification describes a method comprising determining a landing point (32) within a virtual reality scene based on a defined landing view (28) for the scene and one or more characteristics of a virtual reality display device (10) field of view (FOV1); and causing display of the virtual reality scene on the virtual reality display in dependence on the determined landing point (32).

Description

Virtual Reality Viewport Adaption

Field

The application relates to Virtual Reality content consumption, in particular, although not exclusively, 3DOF and 6DOF video viewing.

Background

When experiencing virtual reality (VR) content, such as a VR computer game, a VR movie or "Presence Capture" VR content, users generally wear a specially-adapted head-mounted display device (which may be referred to as a VR display device) which renders the visual content. An example of such a VR device is the Oculus Rift ® , which allows a user to watch 360-degree visual content captured, for example, by a Presence Capture device such as the Nokia OZO camera. Summary of Invention

According to a first aspect, the specification describes a method comprising determining a landing point within a virtual reality scene based on a defined landing view for the scene and one or more characteristics of a virtual reality display device field of view; and causing display of the virtual reality scene on the virtual reality display in dependence on the determined landing point.

The one or more characteristics of the virtual reality display device field of view may include horizontal field of view. The one or more characteristics of the virtual reality display device field of view may include vertical field of view.

The method may be performed by the virtual reality display. The method may be performed by a server or server system.

The method may comprise using user spatio-temporal co-ordinates to determine the landing point.

The method may comprise determining the landing point in response to a scene change. The method may further comprise: determining if a landing view from the landing point satisfies one or more constraints; and adapting the landing view in dependence on the constraint satisfaction. The step of adapting the landing view may comprise at least one of: cropping the landing view; or zooming into or out of the landing view.

The step of adapting the landing view may comprise determining a second landing point based on the determined landing point and at least one constraint.

The one or more constraints may comprise at least one of: a prioritised list of objects of interest and/or persons of interest in the virtual reality scene; a minimum size for objects of interest and/or persons of interest in the virtual reality scene; or a relative size for objects of interest and/or persons of interest in the virtual reality scene.

According to a second aspect, the specification describes apparatus configured to perform any of the methods of the first aspect.

According to a third aspect, the specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any of the methods according to the first aspect.

According to a fourth aspect, the specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to perform any of the methods of the first aspect.

According to a fifth aspect, the specification describes a computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by a least one processor, cause performance of the method of any of claims 1 to 11.

According to a sixth aspect, the specification describes apparatus comprising: means for determining a landing point within a virtual reality scene based on a defined landing view for the scene and one or more characteristics of a virtual reality display device field of view; and means for causing display of the virtual reality scene on the virtual reality display in dependence on the determined landing point. According to a seventh aspect, the specification describes a virtual reality system comprising: a virtual reality display device; and a virtual reality content server, wherein the system is configured to perform any of the methods of the first aspect.

According to an eighth aspect, the specification describes apparatus comprising:

at least one processor; and

at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to perform:

determining a landing point within a virtual reality scene based on a defined landing view for the scene and one or more characteristics of a virtual reality display device field of view; and

causing display of the virtual reality scene on the virtual reality display in dependence on the determined landing point.

According to a ninth aspect, the specification describes computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by a least one processor, cause performance of:

causing display of the virtual reality scene on the virtual reality display in dependence on the determined landing point. Brief Description of the Figures

For a more complete understanding of the methods, apparatuses and computer-readable instructions described herein, reference is now made to the following descriptions taken in connection with the accompanying drawings in which: Figure 1 is a schematic illustration of a VR display system which may be utilised during performance of various methods described herein with reference to Figures 2 to 10;

Figures 2A and 2B are schematic illustrations relating to methods of determining a landing point for providing a defined landing view for a user in a VR space;

Figure 3 shows multiple users viewing a VR space;

Figure 4 is a flowchart illustrating various operations relating to determining a landing point or a landing point with modified landing view;

Figure 5 shows an example of a client-centric approach to determining a landing point; Figure 6 shows an example of a client-centric approach to determining a landing point using the Omnidirectional Media Application Format (OMAF);

Figure 7 shows an example of a server-centric approach to determining a landing point; Figure 8 shows an example of a server-centric approach to determining a landing point using OMAF;

Figure 9 shows an example of a hybrid approach to determining a landing point;

Figure 10 shows an example of a hybrid approach to determining a landing point using OMAF; and

Figures 11A and 11B are schematic illustrations of example configurations of a VR display device and a content server respectively.

Detailed description

In the description and drawings, like reference numerals may refer to like elements throughout.

Figure 1 is a schematic illustration of a system 1 for providing virtual reality (VR) content for consumption by a user. As will be appreciated from the below discussion, VR content generally includes both a visual component and an audio component but, in some implementations, may include just a visual component and/or other modalities such as for example haptics, smell, or the like . As used herein, VR content may cover, but is not limited to, at least computer-generated VR content, content captured by a presence capture device (presence device-captured content) such as Nokia's OZO camera or the Ricoh's Theta, and a combination of computer-generated and presence-device captured content. Indeed, VR content may cover any type or combination of types of immersive media (or multimedia) content. The term VR as used herein may additionally be used to connote augmented reality and/or mixed reality.

The VR system 1 includes a VR display device 10, for displaying visual data in a virtual reality space, and a VR content server 12 for supplying visual data to the VR display device 10. Details of the components and features of the VR display device and content server are described below in more detail with reference to figures 11A and 11B.

When beginning a VR scene, such as when a user first enters a VR space or during a scene change, the initial view of the scene provided to the user is known as the landing view. The landing view, also known as a landing viewport or landing view volume, is the starting point to initiate VR content viewing. Consequently, it is an important parameter which ensures a deterministic viewing experience for the user. If the landing view provides the user with a view of an unimportant part of the VR content, or a view missing important elements of the VR content, there can be a high probability of the user missing out important events in the VR experience. For example, the user may miss important parts of a story, or other highlights.

Where VR content is provided by a content provider to multiple VR display devices, the situation is further complicated by a plethora of heterogeneous VR display devices that are available. Each of the VR display devices may have different characteristics, such as differing horizontal fields of view or other field of view characteristics. This can result in the landing view provided to some of the VR display devices missing important content.

Figure 2A is a schematic embodiment of a method for determining a landing point for providing a defined landing view a user in a VR space. A user 20 is entering an audio- visual volume for VR content consumption, which may be a 6DOF (6 degrees of freedom: x, y, z, roll, pitch and yaw) volume. The user is wearing a virtual reality display device 10, which as a field of view (FOV) 22. VR content is provided to the user via the VR display device 10, for example by being streamed from a VR content server (not shown) and displayed on a display in the VR display device 10.

A defined landing view 28 is associated with each scene in the VR content. The defined landing view 28 provides an initial view of the scene that includes the elements of the scene that have been deemed to be important for a viewer to see, for example in order to maintain storytelling coherence. In some embodiments, the defined landing view 28 may be defined manually by the content creator. In some embodiments, the defined landing view 28 may be defined automatically based on objects of interest (OOIs) or persons of interest (POIs) identified in the VR content. In some embodiments, the defined landing view 28 may be defined as a combination of content creator information and automatic analysis.

The defined landing view 28 has a certain width ω 24 and height H 26. These may be defined by a length in spatial units, such as meters, a number of pixels, and/or as angles. Angles can be expressed for example in degrees or radians. The defined landing view can also have a defined depth D 30, making the defined landing view part of a defined landing volume. In particular, the landing view 28 is the frontmost part of the landing volume. The landing view can be considered to be a plane having a 3D specific position (and thus orientation) and height and width dimensions. The landing view 28 is intended to be observed from a position on a line that intersects the centre of the landing view

perpendicular to the plane of the landing view. The centre here may be in terms of height and width of the landing view 28 or in terms of width only. To ensure that the incoming user 20 sees the landing view 28 as intended by the content creator or as automatically defined by the system, the system uses the defined landing view 28 coordinates and one or more display device FOV 22 characteristics to determine a landing point 32 for the user 20 at distance 34 "di" from the front of defined landing view 28.

The landing point 32 defines a spatial position in the VR space from which the user 20 initially views the provided VR content. The landing point 32 is determined such that the defined landing view 28 is within the FOV 22 of the VR display device 10. In this way, the user 20 is provided with an initial view of a VR scene that contains important features of that VR scene, and which can maintain coherence in the user 20 experience.

The display device FOV 22 characteristics which may be used for determining the landing point 32 may include a horizontal FOV. The horizontal FOV may be defined, for example, by a width of the FOV or a horizontal FOV angle. Similarly, the display device FOV 22 characteristics for determining the landing point 32 may also or alternatively include a vertical display device FOV characteristic. The vertical FOV characteristic may be defined, for example, by a height of the FOV or a vertical FOV angle. Other characteristics of the FOV 22 which may be used to determine the landing point 32 may include one or more of, for example: the depth of the FOV; the solid angle of the FOV; and the aspect ratio of the FOV. In some embodiments, the landing point may additionally be determined based of other characteristics of the VR display device 10, for example the resolution of the display, supported video coding formats, data rate, and/or user preferences. The user preferences may include an indication as to whether a user prefers to consume VR content with a small FOV but a high resolution or with a full FOV at a low resolution. If characteristics of the display device 22 are initially unknown, the landing point 32 may be determined based on default characteristics, for example a minimum expected FOV for any display device suitable for viewing the VR content. This would guarantee that for majority of display devices the initial view contains the desired features of the VR scene. A further user 20' may use a different VR display device 10' with a different FOV 22' to view the VR content. The different FOV 22' has different FOV characteristics which result in a different landing point 36 at distance 38 "d₂" from the defined landing view 28 being determined. In the example shown, the landing point 36 for the second user 20' is at a greater distance from the defined landing view 28 than the landing point 32 for the first user 20. The second user 20' therefore initially views the defined landing view 28 of the VR scene from greater distance than the first user 20.

By way of contrast, Figure 2B shows users 20, 20' using different VR display devices 10, 10' with different FOVs 22, 22' but positioned at the same distance 40 from the defined landing view 28. In the example shown, the FOV 22, 22' of each user 20, 20' does not fully incorporate the defined landing view 28. This can result in the user missing important elements of the VR content.

Figure 3 illustrates multiple users viewing a VR space. In the example shown, the users 42, 44, 46 are interacting with a 6DOF VR volume 48. A defined landing view 28, in this example in the form of a landing viewport defined by a width ω 24 and height H 26, is provided by the system. The landing view 28 should be visible to all users 42, 44, 46 upon initialisation of the 6DOF volume (or VR scene).

Each of the users 42, 44, 46 is equipped with a VR display device having different FOV characteristics. In the example shown, the fields of view of the VR displays are

characterised by an angle, though other examples are possible. The first user 42 has a VR display device FOV 52 characterised by an angle xi in the horizontal plane. In order to include the full width 24 of the defined view 28 within the FOV 52, the landing point for the first user 42 will be determined to be a distance di 58 from the landing view 28. A second user 44 has a VR display device FOV 54 characterised by a different angle a₂ in the horizontal plane to the first user 42. The landing point determined for the second user 44 may then be different to that of the first user 42, being determined to be a distance d₂ 60 from the landing view 28. In this example, the horizontal FOV of the second display device FOV 54 is narrower than that of the first display device FOV 52, and hence the landing point of the second user 44 is at a greater distance from the landing view 28 than the landing point of the first user 42.

A third user 46 has a VR display device FOV 56 characterised by a different angle a₃ in the horizontal plane to both the first user 42 and the second user 44. The landing point determined for the third user 44 may then be different to the landing point of the first user 42 and second user 44, being determined to be a distance d₃ 61 from the landing view 28. In this example, the horizontal viewing angle of the third display device FOV 56 is narrower than horizontal viewing angles of the both the first display device FOV 52 and second display device FOV 54, and hence the landing point of the third user 46 is at a greater distance from the landing view 28 than the landing points of the first user 42 and the second user 44.

In this way, each of the users 42, 44, 46 has the defined landing view visible on their respective display devices upon initialisation of the VR scene or volume 48. Each of the users 42, 44, 46 is, however, be presented with the defined landing view 28 from the perspective of a different landing point. A VR display device with smaller field of view results in a view from further away. To cover a given width of the landing view, different VR display devices with different FOVs ai, a.2, 0:3 receive different landing points di, d2, d3 respectively.

For a given FOV, the minimum distance d that the landing point could be from the defined landing view 28 in order to be fully visible to a user occurs at the distance at which the edges of the FOV intersect the edges of the defined landing view 28 that are nearest the user.

The FOV can have additional characteristics that may need to be taken into account when determining the landing point of a user. For example, in the situation described in relation to Figure 3, each of the VR display devices may have a FOV that is additionally

characterised by an angle in the vertical plane. When determining a landing point for the user, the landing point distance dh from the landing view 28 can be determined such that the full height 26 of the defined landing view 28 is within the VR device FOV. The larger of this distance dh and the landing point distance d_M from the defined landing view 28 determined such that the full width of the defined landing view 28 is within the FOV can then act as a lower bound for the determined landing point.

In some embodiments, constraints can be placed on the landing view presented to the user, for example that objects of interest should appear to the user at at least a

predetermined minimum size. Furthermore, objects of interest (OOIs) and persons of interest (POIs) in the scene may be given in a prioritised order. If the determined landing point provides the user a view of the defined landing view that does not satisfy these constraints, the landing point and landing view displayed to the user both may be modified to include the salient aspects of the defined landing view. Such modification may include, for example, cropping or zooming into the defined landing view. In some embodiments, a second landing view and second landing point may be determined based on the first landing view, the consumption device characteristics and the constraints. This may occur in situations where the determined first landing point does not provide a landing view that meets the constraints. Calculating a second landing view/ point allows for the landing view to be adapted to meet constraints. For example, the first landing view may result in certain OOIs and/or POIs being too small in order to accommodate the view or volume of interest. A sub-volume or view may then be determined such that the OOI/POI constrains are met. The second landing view/point may be determined such that it exploits the consumption device characteristics and scene semantic information to provide a landing view to the user that may avoid missing out the semantically significant content. The second landing view/point may be determined based on information comprising the OOI/POI

information in prioritized order and/or the OOI/POI height or width tolerances. This information may be signalled to the entity that calculates the second landing viewport and landing point.

Figure 4 shows a flowchart describing a method of determining a landing point or a landing point with modified landing view.

A defined landing view 28 may be specified in advance by the content creator 62, derived based on a scene analysis using an automated landing view module 64, or a combination of the two approaches. When VR content is requested for display on the VR display device, the system receives the defined landing view 66, and any associated constraints. The constraints are defined either by the VR content creator, based on automatic analysis of the VR scene, or by a combination of the two. Examples of such constraints include specifying that the landing point should not result in an OOI and/or POI in the VR scene occupying less than a given percentage of the FOV. The system also receives VR display device capabilities 68 relating to characteristics of the VR display device that the content has been requested for. Examples include

characteristics of the FOV of the VR display device, such as horizontal and vertical FOV angles. If characteristics of the VR display device are not available, the system may retrieve default characteristics for the VR display device from a memory or over a network interface from a remote server. A landing point within the requested VR scene requested for viewing on the display device is then determined 70 based on a defined landing view for the scene and one or more characteristics of a virtual reality display device field of view. In some embodiments, the spatio-temporal coordinates 72 of the VR display device user can also be used to determine the landing point.

The resultant landing view of the VR scene that would be presented to the user via the VR display device from the perspective of the landing point may then be analysed 74 to verify if the constraints are met. If the constraints are met, the landing point information is signalled to a player application 76 which requests the corresponding content from VR content server 78 and causes display of the VR scene on the VR display device in from the perspective of the determined landing point.

If the resulting landing view of the VR scene that would be presented to the user via the VR display device from the perspective of the landing point does not meet the constraints, the system may perform a landing view adaptation 80. This may comprise, for example, resizing, cropping or zooming in or out of the landing view. The resulting adapted landing view data may then be signalled to the player application 76 which requests the corresponding content from VR content server 78 and causes display of the VR scene on the VR display device in from the perspective of the determined landing point, using the determined adaptations to make the displayed landing view meet the constraints.

Therefore, in scenarios where a landing point change does not allow constraint compliance, viewport adaptation with, for example, cropping (Zoom-In) or frame resize (zoom-out) data in addition to the spatio-temporal coordinates of the landing point are signalled to the player. This enables the player to not only request the appropriate VR content data, but also to modify its rendering to satisfy the landing view constraints.

The signaling information may be delivered as a Media Presentation Description (MPD) using Moving Picture Experts Group Dynamic Adaptive Streaming over HTTP (MPEG DASH), as a Session Description Protocol (SDP) in the Session Initiation Protocol/Real Time Streaming Protocol (SIP/RTSP) based session setup or as Extensible Markup Language/ JavaScript Object Notation (XML/JSON) in case of other transport based streaming. The signaling information may consist of spatio-temporal coordinates of the landing point or MPEG- DASH compliant indexes corresponding to the resultant viewport data. The MPEG- DASH standard implementation is suited for client-centric approach or server centric approach. The SIP/SDP based setup may be implemented with the data exchange implemented over Real-time Transport Protocol Control Protocol (RTCP) messages and data streaming over the Real-time Transport Protocol (RTP).

In case of MPEG- DASH delivery, the consumption device may register its capabilities with the server, which results in a client-specific landing viewport generation, which is delivered as part of Omnidirectional Media Application Format (OMAF) metadata.

The method shown in figure 4 can be implemented by the VR system 1 in a number of ways. Figures 5 to 10 illustrate examples of different implementation embodiments which may be tailored for different deployment scenarios.

Figure 5 shows an embodiment of a client-centric approach to determining a landing point. This approach can reduce the need for changes from the server side. This embodiment can be suited for implementation as part of OZO player (or other content capture device player) SDK (software development kit) which can work with different content providers, where the service supplier may not have direct control over the VR content server.

A content requesting and playback application 82 in the VR display device 10 requests VR media content from a VR content server 12. The VR content server 12 supplies VR media data 84 for the requested content. The media content 84 may comprise a defined landing view for the requested VR content. In some embodiments, the VR content server 12 additionally supplies content metadata 86 relating to the requested VR content. This metadata may have been supplied by the VR content creator, and may include, for example, objects of interest (OOIs) or persons of interest (POIs).

The media data 84 is supplied to a scene and viewport creation application 88. The scene and viewport creation application 88 generates automated content related metadata 90 from the media data 84. This may include, for example, scene information, OOIs and/or POIs. The automated content related metadata 90 may be used in addition to or alternatively to any content metadata 86 supplied by the VR content server 12 to determine the landing point of the user in the VR space. In embodiments where no content metadata 86 is supplied by the VR content server 12, the automated content related metadata 90 may be used to determine the landing point of the user in the VR space instead. In some embodiments, the defined landing view is not supplied by the VR content server 12, but may instead be defined automatically by the scene and viewport creation application 88. The content metadata 86, if present, content consumption spatio-temporal co-ordinates 92 and automated content related metadata 90 are provided to a landing viewport and landing point generator 94. The landing viewport and landing point generator 94 determines the landing point for the user in the VR space based on the defined landing view, the content metadata 86 and/or the automated content related metadata 90 and/or the content consumption spatio-temporal co-ordinates 92, and characteristics of the VR display device, such as the display device FOV characteristics. In some embodiments a defined landing view may be generated by the landing viewport and landing point generator 94 based on the media data 84 supplied by the VR content server 12 and/ or the metadata 86, 90 supplied to the landing viewport and landing point generator 94.

The landing point 96 determined by the landing viewport and landing point generator 94 is then signalled to the content requesting and playback application 82. A landing viewport may also be signalled to the content requesting and playback application 82. The content requesting and playback application 82 then causes display of the requested VR scene on the VR display device in dependence on the determined landing point.

Figure 6 shows an example of a client-centric approach to determining a landing point using OMAF. The embodiment utilises a generic Omnidirectional Media Application

Format (OMAF) architecture. It should be noted that the illustrated architecture is only used as a placeholder, since the architecture may change when 6DOF VR comes into the scope of standardization. VR content is encoded using a VR encoder 98. The VR encoder 98 produces encoded files for audio and visual data, and optionally for metadata relating to the encoded video files. Stitched images and/or video may be created from multiple input images. The stitched images/video may be encoded as coded images. The stitched images/video may alternatively or additionally be encoded as a coded video bitstream. Captured audio may be encoded as an audio bitstream. The coded images, video, and/or audio may then be composed into a media file (F) according to a particular media container file format. The coded images, video, and/or audio may be compressed into a sequence of an initialization segment and media segments (Fs) for streaming. In some embodiments, the media container file format is the ISO Base Media File Format (ISOBMFF). The file encapsulator may also include metadata with the file and/or the segments. Such metadata may include, for example, projection and region-wise packing information that may assist in rendering the decoded packed frames. The F_s files are transmitted to a delivery platform 100. The delivery platform 100 may be a file or video segment data server. For example, the delivery platform 100 may be a MPEG- DASH delivery server. When requested, files from the delivery platform 100 may be delivered from the delivery platform 100 to a player and/ or playback device.

A decoder 102 receives encoded files (F') from the encoder 98 at a file/segment decapsulation module 104. These files are unpacked, and provided to the rest of the decoder. Data relating to a defined landing view is provided to a scene analysis and viewport creation application 88. The data may include content creator metadata such as scene information, OOIs and/or POIs, and/or a content creator defined landing view. The scene analysis and viewport creation application 88 may generate automated content related metadata from the supplied data. This may include, for example, additional scene information, OOIs and/or POIs, and/or an automatically defined landing view.

The automated content related metadata and content creator metadata are supplied by the scene analysis and viewport creation application 88 to a landing viewport and landing point generator 94. In some embodiments, user spatio-temporal co-ordinates are also supplied. The landing viewport and landing point generator 94 uses the metadata, including the defined landing view, and characteristics of the VR display device FOV to determine a landing point for the user in the VR space and a landing view for the user, as described in relation to figures 2 to 4.

The landing viewport and landing point generator 94 signals the landing point 96 to an image rendering application 106 in the decoder 102. The landing viewport and landing point generator 94 may also signal a landing view to the image rendering application 106. The landing viewport and landing point generator 94 issues a request to the delivery platform 100 for standards compliant, for example MPEG- DASH compliant, VR views that can be used to render the requested VR scene from the perspective of the determined landing point.

The delivery platform 100 transmits the requested VR views to the decoder 102. The requested VR views from the delivery platform 100 are additionally supplied in

dependence on orientation/viewport metadata determined by a head/eye tracking system 108 on the VR display device. The requested VR views are supplied to the file/ segment decapsulation module 104, where they are unpacked. The requested VR views are then decoded, along with the F' files received from the encoder. The resulting decoded views are provided to the image rendering module 106.

The image rendering module 106 additionally receives orientation/viewport metadata from the head/ eye tracking system 108 on the VR display device.

The image rendering module 106 uses the provided information to render the VR scene from the perspective of the determined landing point. The VR scene is then displayed to the user via a display 110. Audio 112 to go along with the scene may also be provided to the user.

Figure 7 shows an embodiment of a server-centric approach to determining a landing point. The server-centric approach may be tailored for setups where the streaming server can be modified by the content provider and keeps the client-side functionality to a minimum.

A content requesting and playback application 82 in the VR display device 10 requests VR media content from a VR content server 12. The VR display device 10 also communicates the device FOV characteristics to the VR content server 12. The FOV characteristic may be communicated on the fly when the VR content is requested. In some embodiments, the FOV characteristics of the VR display device 10 are pre-registered with the VR content server.

A VR content database 114 in the VR server 12 supplies content metadata 86 relating to the requested VR content to a scene analysis and viewport creation application 88 located at the VR server 12. This metadata may have been supplied by the VR content creator, and may include, for example, objects of interest (OOIs) and/or persons of interest (POIs), and/or a content creator defined landing view. The scene analysis and viewport creation application 88 generates automated content related metadata from the content metadata 86. This may include, for example, additional scene information, OOIs and/or POIs, or an automatically defined landing view.

The automated content related metadata and content creator defined metadata are supplied to a landing viewport and landing point generator 94 located at the VR server 12. The landing viewport and landing point generator 94 determines the landing point for the user in the VR space based on the defined landing view, the content creator metadata and/or the automated content related metadata, as well as characteristics of the VR display device, such as the display device FOV characteristics.

In some embodiments, user position tracking data is optionally provided by the VR display device 10 to the VR server 12. These user spatio-temporal coordinates may be used as an additional input into the landing viewport and landing point generator 94 and used to determine a landing point for the user.

The landing viewport and landing point generator 94 then signals the determined landing point and landing viewport 96 to the content requesting and playback application 82 on the VR display device 10. The landing viewport signalling may consist of timestamped information which may be leveraged by the player application to render the content. The VR server 12 also supplies media data 84 to the content requesting and playback application 82. The content requesting and playback application 82 uses the media data 84 and the determined landing point and landing viewport 96 to render the requested VR content for viewing by a user.

Figure 8 shows an example of a server-centric approach to determining a landing point using OMAF.

VR content is encoded using a VR encoder 98. The VR encoder 98 produces encoded files for audio and visual data, and optionally for metadata relating to the encoded video files. The encoder comprises a file/segment encapsulation module 116. A scene analysis and viewport creation application 88 located at the VR server is used to generate automated content related metadata from content creator metadata. The content creator metadata may include scene information, OOIs and/or POIs, and/or a content creator defined landing view. The scene and viewport creation application 88 can generate automated content related metadata from the supplied data. This may include, for example, additional scene information, OOIs and/or POIs, and/or an automatically defined landing view. The content creator metadata and the automated content related metadata can be input into the file/segment encapsulation module 116, and included with the encoded files F and F_s. A landing viewport and landing point generator 94 is located at the VR server. The landing viewport and landing point generator 94 generates a landing point for the user based the defined landing view signalled from the encoder as part of the content creator metadata or automated content related metadata, and FOV characteristics of the VR display device signalled from a head/ eye tracking system 108 on the VR display device. The landing viewport and landing point generator 94 may also additionally generate a landing view for the user based on the determined landing point. These are signalled to the delivery platform 100.

When VR content is requested, decoder 102 receives the encoded files from the encoder 98 and delivery platform 100, as well as orientation/viewport metadata from the head/eye tracking system 108. The encoded files include the landing point and landing view determined by the landing viewport and landing point generator 94. The encoded files may additionally include audio content relating to the requested VR content.

The decoder 102 decodes the received encoded files to render the VR for display 110 on the VR display device. It may also output audio 112 relating to the displayed VR content.

Figure 9 shows an embodiment of a hybrid approach to determining a landing point. In this embodiment the end to end chain is utilized for delivering an optimal viewing experience. The hybrid setup performs the computationally intensive scene analysis aspects on the server side, while providing the application a greater freedom to define implementation choices for the landing point and viewport generation.

A content requesting and playback application 82 in the VR display device 10 requests VR media content from a VR content server 12. A VR content database 114 in the VR server 12 supplies content metadata 86 relating to the requested VR content to a scene and viewport creation application 88 located at the VR server. This metadata may have been supplied by the VR content creator, and may include, for example, objects of interest (OOIs) and/or persons of interest (POIs), and/or a content creator defined landing view. The scene and viewport creation application 88 generates automated content related metadata from the content metadata 86. This may include, for example, additional scene information, OOIs and/or POIs, or an automatically defined landing view.

The automated content related metadata 90 and content creator defined metadata 86 are supplied to a landing viewport and landing point generator 94 located at the VR display device 10. The landing viewport and landing point generator 94 determines the landing point for the user in the VR space based on the defined landing view, the content creator metadata and/or the automated content related metadata, as well as characteristics of the VR display device, such as the display device FOV characteristics. It may additionally use content consumption spatio-temporal coordinates 92 supplied by the content requesting and playback application.

The landing viewport and landing point generator 94 then signals the determined landing point and landing viewport 96 to the content requesting and playback application 82 on the VR display device 10. The landing viewport signalling could consist of timestamped information which can be leveraged by the player application to render the content. The VR server 12 also supplies media data 84 to the content requesting and playback application 82. The content requesting and playback application 82 uses the media data 84 and the determined landing point and landing viewport 96 to render the requested VR content for viewing by a user. Figure 10 shows an example of a hybrid approach to determining a landing point using OMAF.

VR content is encoded using a VR encoder 98. The VR encoder 98 produces encoded files for audio and visual data, and optionally for metadata relating to the encoded video files. The encoder comprises a file/ segment encapsulation module 116.

A scene analysis and viewport creation application 88 located at the VR server is used to generate automated content related metadata from content creator metadata. The content creator metadata may include scene information, OOIs and/or POIs, and/or a content creator defined landing view. The scene and viewport creation application 88 can generate automated content related metadata from the supplied data. This can include, for example, additional scene information, OOIs and/or POIs, and/or an automatically defined landing view. The content creator metadata and the automated content related metadata can be input into the file/segment encapsulation module 116, and included with the encoded files F and F_s.

When VR content is requested by a VR display device, encoded files are transmitted to a decoder 102 in the VR display device. The decoder 102 receives encoded files (F') from the encoder 98. These files are unpacked, and provided to the rest of the decoder. The content creator metadata and automated content related metadata in the files are signalled to a landing viewport and landing point generator 94 located at the VR display device. The landing viewport and landing point generator 94 uses the metadata, including the defined landing view, and characteristics of the VR display device FOV to determine a landing point for the user in the VR space and a landing view for the user, as described in relation to figures 2 to 4. Spatio-temporal data determined from the head/eye tracking system 108 may also be incorporated into the determination of the landing point and landing view.

The landing viewport and landing point generator 94 signals the determined landing point and landing view 96 to an image rendering application 106 in the decoder 102. The landing viewport and landing point generator 94 issues a request to the delivery platform 100 for standards compliant, for example MPEG- DASH compliant, VR views that can be used to render the requested VR scene from the perspective of the determined landing point. The delivery platform 100 transmits the requested VR views to the decoder 102. The requested VR views from the delivery platform 100 are additionally supplied in

dependence on orientation/viewport metadata determined by a head/eye tracking system 108 on the VR display device. The requested VR views are then decoded, along with the F' files received from the encoder. The resulting decoded views are provided to the image rendering module 106 in the decoder 102.

The image rendering module 106 additionally receives orientation/viewport metadata from the head/eye tracking system 108 on the VR display device. The image rendering module 106 uses the provided information to render the VR scene from the perspective of the determined landing point. The VR scene is then displayed to the user via a display 110. Audio 112 to go along with the scene may also be provided to the user. Some further details of components and features of the above-described apparatuses 10, 12 and alternatives for them will now be described, primarily with reference to Figures 11A and 11B.

The controllers 1000, 1200 of each of the apparatuses 10, 12 comprise processing circuitry 1001, 1201 communicatively coupled with memory 1002, 1202. The memory 1002, 1202 has computer readable instructions 1002A, 1202A stored thereon, which when executed by the processing circuitry 1001, 1201 causes the processing circuitry 1001, 1201 to cause performance of various ones of the operations described with reference to Figures 1 to 11B. For example, the various modules of Figures 1 to 11B may be implemented as computer readable instructions 1002A, 1202A stored on memory 1002, 1202 and being executable by processing circuitry 1001, 1201. Therefore, the various modules discussed in relation to Figures 1 to 11B may be also referred to as "circuitry". The controllers 1000, 1200 may in some instances be referred to, in general terms, as "apparatus".

The processing circuitry 1001, 1201 of any of the UE/apparatuses 10, 12 described with reference to Figures 1 to 9B may be of any suitable composition and may include one or more processors 1001A, 1201A of any suitable type or suitable combination of types. For example, the processing circuitry 1001, 1201 may be a programmable processor that interprets computer program instructions 1002A, 1202A and processes data. The processing circuitry 1001, 1201 may include plural programmable processors.

Alternatively, the processing circuitry 1001, 1201 may be, for example, programmable hardware with embedded firmware. The processing circuitry 1001, 1201 may be termed processing means. The processing circuitry 1001, 1201 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs). In some instances, processing circuitry 1001, 1201 may be referred to as computing apparatus. The processing circuitry 1001, 1201 is coupled to the respective memory (or one or more storage devices) 1002, 1202 and is operable to read/write data to/from the memory 1002, 1202. The memory 1002, 1202 may comprise a single memory unit or a plurality of memory units, upon which the computer readable instructions (or code) 1002A, 1202A is stored. For example, the memory 1002, 1202 may comprise both volatile memory 1002-2, 1202-2 and non-volatile memory 1002-1, 1202-1. For example, the computer readable instructions 1002A, 1202A may be stored in the non-volatile memory 1002-1, 1202-1 and may be executed by the processing circuitry 1001, 1201 using the volatile memory 1002-2, 1202-2 for temporary storage of data or data and instructions. Examples of volatile memory include RAM, DRAM, and SDRAM etc. Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc.

The memories in general may be referred to as non-transitory computer readable memory media.

The term 'memory', in addition to covering memory comprising both non-volatile memory and volatile memory, may also cover one or more volatile memories only, one or more non-volatile memories only, or one or more volatile memories and one or more nonvolatile memories. The computer readable instructions 1002A, 1202A may be pre-programmed into the apparatuses 10, 12. Alternatively, the computer readable instructions 1002A, 1202A may arrive at the apparatus 10, 12 via an electromagnetic carrier signal or may be copied from a physical entity such as a computer program product, a memory device or a record medium such as a CD-ROM or DVD. The computer readable instructions 1002A, 1202A may provide the logic and routines that enables the UEs/apparatuses 10, 12 to perform the functionality described above. The combination of computer-readable instructions stored on memory (of any of the types described above) may be referred to as a computer program product.

Where applicable, wireless communication capability of the apparatuses 10, 12 may be provided by a single integrated circuit. It may alternatively be provided by a set of integrated circuits (i.e. a chipset). The wireless communication capability may

alternatively be a hardwired, application-specific integrated circuit (ASIC).

As will be appreciated, the apparatuses 10, 12 described herein may include various hardware components which may not have been shown in the Figures. For instance, the VR display device 10 may in some implementations include a portable computing device such as a mobile telephone or a tablet computer and so may contain components commonly included in a device of the specific type. Similarly, the apparatuses 10, 12 may comprise further optional software components which are not described in this specification since they may not have direct interaction to embodiments. Embodiments may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays FPGA, application specify circuits ASIC, signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor firmware such as the programmable content of a hardware device as instructions for a processor or configured or configuration settings for a fixed function device, gate array, programmable logic device, etc.

As used in this application, the term 'circuitry' refers to all of the following: (a) hardware- only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

This definition of 'circuitry' applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term "circuitry" would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term

"circuitry" would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in server, a cellular network device, or other network device.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above- described functions may be optional or may be combined. Similarly, it will also be appreciated that flow diagram of Figure 4 is an example only and that various operations depicted therein may be omitted, reordered and or combined.

Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims. It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope as defined in the appended claims.

Claims

1. A method comprising:

2. A method as claimed in claim 1, wherein the one or more characteristics of the virtual reality display device field of view include horizontal field of view.

3. A method as claimed in claim 1 or claim 2, wherein the one or more characteristics of the virtual reality display device field of view include vertical field of view.

4. A method as claimed in any preceding claim, wherein the method is performed by the virtual reality display.

5. A method as claimed in any preceding claim, wherein the method is performed by a server or server system.

6. A method as claimed in any preceding claim, comprising using user spatio- temporal co-ordinates to determine the landing point.

7. A method as claimed in any preceding claim, comprising determining the landing point in response to a scene change.

8. A method as claimed in any preceding claim, comprising:

determining if a landing view from the landing point satisfies one or more constraints; and

adapting the landing view in dependence on the constraint satisfaction.

9. A method as claimed in claim 8, wherein the step of adapting the landing view comprises at least one of: cropping the landing view; or zooming into or out of the landing view.

10. A method as claimed in claims 8 or 9, wherein the step of adapting the landing view comprises determining a second landing point based on the determined landing point and at least one constraint.

11. A method as claimed in claims 8 to 10, wherein the one or more constraints comprise at least one of: a prioritised list of objects of interest and/or persons of interest in the virtual reality scene; a minimum size for objects of interest and/or persons of interest in the virtual reality scene; or a relative size for objects of interest and/or persons of interest in the virtual reality scene.

12. Apparatus configured to perform a method according to any of claims 1 to 11.

13. Computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method according to any of claims 1 to 11.

14. Apparatus comprising:

at least one processor; and

at least one memory including computer program code, which when executed by the at least one processor, causes the apparatus to perform the method of any of claims 1 to 11.

15. A computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by a least one processor, cause performance of the method of any of claims 1 to 11.

16. Apparatus comprising:

means for determining a landing point within a virtual reality scene based on a defined landing view for the scene and one or more characteristics of a virtual reality display device field of view; and

means for causing display of the virtual reality scene on the virtual reality display in dependence on the determined landing point.

17. A virtual reality system comprising:

a virtual reality display device; and

a virtual reality content server,

wherein the system is configured to perform the method of any of claims 1 to 10.

18. Apparatus comprising:

at least one processor; and

19. A computer-readable medium having computer-readable code stored thereon, the computer readable code, when executed by a least one processor, cause performance of: determining a landing point within a virtual reality scene based on a defined landing view for the scene and one or more characteristics of a virtual reality display device field of view; and